United States
Environmental Protection
Agency
Office Of Water
(4503F)
EPA-841-B-97-007
June 1997
Linear Regression for Nonpoint
Source Pollution Analyses
INTRODUCTION
The purpose of this fact sheet is to demonstrate an
approach for describing the relationship between variables
using regression. The fact sheet is targeted toward
persons in state water quality monitoring agencies who are
responsible for nonpoint source assessments and
implementation of watershed management.
Regression can be used to model or predict the behavior
of one or more variables. The general regression model,
where e is an error term, is given as

y = β0 + β1x1 + β2x2 + ... + βnxn + e    (1)

In this equation, the behavior of a single dependent
variable (y) is modeled with one or more independent
variables (x1, ..., xn). The x's may be linear or nonlinear
(e.g., x1 can represent x², x³, x⁻¹, etc.). β0, ..., βn are
numerical constants that are computed using equations
described later. Nonlinear models are commonly applied
to physical systems, but they are somewhat more difficult
to analyze because iterative techniques are involved when
the model cannot be transformed to a linear model. The
use of two or more independent variables (x) in a linear
function to describe the behavior of y is referred to as
multiple linear regression. In either case, regression
techniques attempt to explain as much of the variation in
the dependent variable as possible.
In nonpoint source analyses, linear regression is often
used to determine the extent to which the value of a water
quality variable (y) is influenced by land use or hydrologic
factors (x) such as crop type, soil type, percentage of land
treatment, rainfall, or stream flow, or by another water
quality variable. Practical applications of these regression
results include the ability to predict the water quality
impacts due to changes in the independent variables.
SIMPLE LINEAR REGRESSION
The simplest form of regression is to consider one
dependent and one independent variable using
y = β0 + β1x + e    (2)

where y is the dependent variable, x is the independent
variable, and β0 and β1 are numerical constants
representing the y-intercept and slope, respectively.
Helsel and Hirsch (1995) summarize the key assumptions
regarding application of linear regression (Table 1). The
uses of a regression analysis should not be extended
beyond those supported by the assumptions met. Note
that the normality assumption (assumption 5) can be
relaxed when testing hypotheses and estimating confidence
intervals if the sample size is relatively large.
The first step in applying linear regression (assumption 1
in Table 1) is to examine the data to see if linear
regression makes sense; that is, to use a bivariate scatter
plot to see if the points approximate a straight line. If
they fall in a straight line, linear regression makes sense;
if they do not, a data transformation might be needed, or
perhaps a nonlinear relationship should be used.
To illustrate the use of linear regression, the fraction of
water (split) collected by a water and sediment sampler
in plot-sized runoff studies is used (Dressing et al.,
1987). In this data set the sampling percentage (split) was
measured for a range of flow rates. The scatter plot
(Figure 1) shows that linear regression can be applied.
Presuming that the data are representative (assumption 2
in Table 1), the next step is to develop the regression line
using the method of least squares (Freund, 1973). To
determine the values of β0 and β1 in Equation 2, the
following equations can be used (Helsel and Hirsch,
1995):
β1 = Sxy / SSx    (3)

where Sxy = Σ (xi - x̄)(yi - ȳ) is the sum of the xy cross
products and SSx = Σ (xi - x̄)² is the sum of the squares
of x, with both sums taken over the n observations, and

β0 = ȳ - β1 x̄    (4)

where n, x̄, and ȳ are the number of observations, the
mean of the independent variable (e.g., flow rate), and the
mean of the dependent variable (e.g., split), respectively.
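As an illustrative sketch of Equations 3 and 4 (shown here in Python, using the rows of Table 2 that are reproduced in this fact sheet; because the published fit may rest on additional observations and rounding, the estimates below differ slightly from the reported coefficients):

```python
# Least-squares slope and intercept (Equations 3 and 4).
# Data: the rows of Table 2 reproduced in this fact sheet
# (flow rate in gpm, split in percent). These estimates differ
# slightly from the beta1 = -0.0119 and beta0 = 3.1317 reported
# in the text, which may rest on additional observations.
flow = [19.2, 4.9, 44.4, 25.8, 37.6, 40.1, 47.4, 35.7, 13.9]
split = [3.12, 2.86, 2.70, 2.83, 2.60, 2.58, 2.49, 2.60, 3.19]

n = len(flow)
x_bar = sum(flow) / n
y_bar = sum(split) / n

# S_xy: sum of xy cross products; SS_x: sum of squares of x.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(flow, split))
ss_x = sum((x - x_bar) ** 2 for x in flow)

beta1 = s_xy / ss_x            # slope (Equation 3)
beta0 = y_bar - beta1 * x_bar  # intercept (Equation 4)
print(beta1, beta0)
```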
Table 1. Assumptions necessary for the purposes of linear regression.

Assumptions:
(1) The model form is correct: y is linearly related to x.
(2) The data used to fit the model are representative of the data of interest.
(3) The variance of the residuals is constant and does not depend on x or anything else.
(4) The residuals are independent.
(5) The residuals are normally distributed.

Purpose                                          Assumptions required
Predict y given x                                1, 2
Predict y and a variance for the prediction      1, 2, 3
Obtain best linear unbiased estimator of y       1, 2, 3, 4
Test hypotheses, estimate confidence or
  prediction intervals                           1, 2, 3, 4, 5

Reprinted from Helsel and Hirsch, Statistical Methods in Water Resources, 1995, page 225, with kind permission from Elsevier Science - NL, Sara
Burgerhartstraat 25, 1055 KV Amsterdam, The Netherlands.
For the data in the first two columns of Table 2 (the same
data displayed in Figure 1), Equations 3 and 4 were used
to compute a slope (β1) of -0.0119 and an intercept (β0) of
3.1317; the means x̄ and ȳ were computed as 28.89 and
2.79, respectively. Thus, the linear model for predicting
split versus flow rate is

ŷ = 3.1317 - 0.0119x    (5)
ASSUMPTION EVALUATION
The analyst must make sure that β0 and β1 make sense. In
this case, perhaps the best approach is to plot the
regression line with the raw data, as shown in Figure 1.
The third column in Table 2 contains the predicted split,
ŷi, computed using Equation 5 for each flow rate. The
Table 2. Observed split, predicted split, and residuals.

Flow Rate, xi   Split, yi   Predicted, ŷi   Residual, ei
    19.2          3.12         2.9028          0.2172
     4.9          2.86         3.0733         -0.2133
    44.4          2.70         2.6024          0.0976
    25.8          2.83         2.8241          0.0059
    37.6          2.60         2.6835         -0.0835
    40.1          2.58         2.6536         -0.0736
    47.4          2.49         2.5666         -0.0766
    35.7          2.60         2.7061         -0.1061
    13.9          3.19         2.9660          0.2240
Figure 1. Split versus flow rate.
predicted split, y,, is plotted as the regression line in
Figure 1. By visual inspection, β0 and β1 seem reasonable.
Residuals plotted as a function of predicted values of y,
residuals plotted as a function of time, and normal
probability plots of residuals are the most effective
approaches to evaluate the last three assumptions listed in
Table 1, respectively. The fourth column of Table 2
presents the residuals, ei, which are computed as the
observed split minus the predicted split (yi - ŷi).
The plot of residuals should appear to be a uniform band
of points around 0, as shown in Case A of Figure 2
(Ponce, 1980). In Figure 2, residuals are plotted as a
function of predicted values of y. The analyst should look
for two types of patterns when evaluating assumption 3
from Table 1 (e.g., constant variance). The first is a
pattern of increasing or decreasing variance with predicted
values of y, as depicted in Case B of Figure 2. The
second is a pattern (e.g., a trend, a curved line) of the
residual with predicted values of y. Both characteristics
are usually assessed based on a review of the residual
plots and professional judgment alone. The analyst may
also need to examine other variables besides predicted
values of y to fully evaluate assumption 3.
Independence of residuals (assumption 4 from Table 1)
can be evaluated by examining residuals plotted as a
function of time. The analyst should look for the same
patterns as before. As an alternative for evaluating
independence, the analyst can also plot the ith residual, ei,
as a function of the (i-1)th residual, e(i-1). One word of
caution is in order when reviewing any residual plot: If
there are more points in a certain section of the residual
plot, the residuals might not appear to be a uniform band
of points around 0 (as suggested in Case A of Figure 2);
instead, that section might have a somewhat wider band
(Helsel and Hirsch, 1995). This is an expected result.
The normality of residuals can be assessed by examining a
probability plot. Two problems with nonnormal residuals
are the loss of power in subsequent hypothesis tests and
increased prediction intervals, together with a false
impression of symmetry (Helsel and Hirsch, 1995).
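These residual checks can also be sketched numerically. The following minimal illustration (not part of the original fact sheet) applies Equation 5 to the Table 2 flow rates, recomputes the residuals, and uses the lag-1 correlation of successive residuals as a rough screen for independence:

```python
# Residual diagnostics sketch: residuals from Equation 5
# (predicted split = 3.1317 - 0.0119 * flow) and the lag-1
# correlation of successive residuals as a rough independence check.
flow = [19.2, 4.9, 44.4, 25.8, 37.6, 40.1, 47.4, 35.7, 13.9]
split = [3.12, 2.86, 2.70, 2.83, 2.60, 2.58, 2.49, 2.60, 3.19]

predicted = [3.1317 - 0.0119 * x for x in flow]
residuals = [y - p for y, p in zip(split, predicted)]  # approx. Table 2, col 4

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

lag1 = corr(residuals[:-1], residuals[1:])  # values near 0 suggest independence
print([round(r, 4) for r in residuals], round(lag1, 3))
```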
Figure 3 displays all three of these plots for the split data
analyzed from Table 2. From Figure 3, A and B, the
split residuals appear to be independent of predicted
values of y and time, as well as having constant variance.
Thus, the regression meets assumptions 3 and 4 listed in
Table 1. In this analysis, testing for residual
independence is important since the testing apparatus was
calibrated initially. The pumps or other equipment could
Figure 2. Plot of residuals versus predicted values.
(Source: Ponce, 1980)
A) Split residuals as a function of predicted values of split.
B) Split residuals as a function of time.
C) Probability plot of split residuals.
Figure 3. Plot of split residuals.
have suffered in performance over time, which in turn
could affect the results. In Figure 3C, the probability plot
suggests that the data might not rigorously follow the
normality assumption. However, upon inspection, any
normality violations appear to be relatively minor. The
residuals in Figure 3C would fall along the straight line if
they were normally distributed. A formal test can also be
computed to evaluate normality.

Had this analysis violated any of these assumptions, using
a different regression technique, transforming the data, or
adding variables to the regression would have to be
considered. Alternatively, the uses of the regression
results could be limited to those identified in Table 1 as
supported by the assumptions met.

The remaining steps in evaluating the regression are to

• Evaluate the proportion of variation in y explained
  by x.
• Determine whether β0 and β1 are significantly
  different from zero.
• Compute the confidence interval for β0.
• Compute the confidence interval for β1.

As the reader might imagine, many of these evaluations
have been incorporated into standard spreadsheet
software. Table 3 presents a typical format used to report
the results from a regression analysis. The top portion of
Table 3 also presents the equations used in computing the
analysis of variance (ANOVA) summary statistics. Sxy
and SSx, the sum of the xy cross products and the sum of
the squares of x, are defined in Equation 3.

The coefficient of determination, R², can be used to
evaluate what proportion of the variation can be explained
by the model (Gaugush, 1986). R², a measure of how
well the regression line fits the data, can be computed as

R² = [SSy - s²(n-2)] / SSy = 1 - SSE/SSy    (6)

where

SSy = Σ (yi - ȳ)²    (7)

and

SSE = Σ ei²    (8)

The residual, ei, is defined as yi - ŷi. Values for R² range
from 0 to 1, with 1 representing the case where all
observed y values are on the regression line. The
correlation coefficient, r, describes the strength of linear
relationships (Freund, 1973) and is computed as the
square root of R². The sign of r should be the same as
the sign of the slope. Values for r range from -1 to 1,
with the extreme values representing the strongest
association and 0 representing no linear association.

Using the split data from above, the sum of residuals
squared (SSE) is equal to 0.2227 and the sum of the
Table 4. Regression coefficients, standard errors, and t statistics.

                  Coefficients   Standard Error   t Statistic
Intercept (β0)       3.1317         0.072914         42.95
Flow Rate (β1)      -0.0119         0.002237         -5.33
squares of y (SSy) is 0.7093; thus, R² is equal to 1 -
(0.2227/0.7093) = 0.686, or 68.6 percent of the variance
is explained by the model. The correlation coefficient, r,
is then equal to -0.828. The overall model can also be
evaluated with the F statistic (28.41), which is computed
in Table 3. The F statistic is a measure of the variability
in the data set that is explained by the regression equation
in comparison to the variability that is not explained by
the regression equation. Since the p value of 0.0001366 is
less than 0.05, the overall model is significant at the 95
percent confidence level.
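The R² arithmetic can be verified directly from Equations 6 through 8. A minimal sketch, using the SSE and SSy values reported in the text:

```python
# Coefficient of determination and correlation coefficient for the
# split example, from the SSE and SSy values reported in the text.
sse = 0.2227   # sum of squared residuals (Equation 8)
ss_y = 0.7093  # sum of squares of y (Equation 7)

r_squared = 1 - sse / ss_y  # Equation 6
r = -(r_squared ** 0.5)     # negative because the fitted slope is negative
print(round(r_squared, 3), round(r, 3))  # -> 0.686 -0.828
```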
Are β0 and β1 significantly different from zero? The
standard errors for β0 and β1 in Table 4 can be calculated
as (Helsel and Hirsch, 1995)
Table 5. Percentiles of the t distribution with df degrees
of freedom (values of t such that 100(1-α)% of the
distribution is less than t).
SE(β1) = s / √SSx    (9)

SE(β0) = s √( Σ xi² / (n·SSx) )    (10)

where

s = √( Σ ei² / (n - 2) )    (11)
The value s is equal to the standard error of the regression
(which is the same as the standard deviation of the
residuals). The corresponding t statistics (with n - 2
degrees of freedom) for β0 and β1 are then equal to β0 and
β1 divided by their respective standard errors. The t
statistics may then be compared to values from the t
distribution to determine whether β0 or β1 are significantly
different from zero. In this case, β0 and β1 are both
significantly different from zero based on inspection of
their associated p values in Table 4.
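The t statistics can be reproduced from the Table 4 values; each coefficient is simply divided by its standard error (a minimal sketch):

```python
# t statistics: each coefficient divided by its standard error
# (coefficient and standard-error values from Table 4).
coefficients = {"intercept": 3.1317, "flow_rate": -0.0119}
standard_errors = {"intercept": 0.072914, "flow_rate": 0.002237}

t_stats = {k: coefficients[k] / standard_errors[k] for k in coefficients}
print({k: round(v, 2) for k, v in t_stats.items()})
```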
The confidence intervals for β0 and β1 can be computed
using the following formulas (Helsel and Hirsch, 1995):

β0 ± t(α/2, n-2) · SE(β0)    (12)

β1 ± t(α/2, n-2) · SE(β1)    (13)

where t(α/2, n-2) is the value from the t distribution with
n - 2 degrees of freedom (Table 5).
The confidence interval for the mean response at x = x0
can be computed as (Helsel and Hirsch, 1995)

ŷ ± t(α/2, n-2) · s · √( 1/n + (x0 - x̄)²/SSx )    (14)

In this example, ŷ is equal to the predicted split using
Equation 5 with the flow rate equal to x0. SSx and s can
be estimated by using Equations 3 and 11, respectively.
This interval is narrowest at x̄ and widens as x0 moves
farther from x̄. By calculating the interval at each point
along the regression line, a curve like the dashed line in
Figure 4 for the example data can be plotted. The
equation for the prediction interval for individual values
of y at x = x0 is (Helsel and Hirsch, 1995)

ŷ ± t(α/2, n-2) · s · √( 1 + 1/n + (x0 - x̄)²/SSx )    (15)
Figure 4 also shows this interval for the example data.
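The interval computations above can be sketched as follows (a Python illustration fitted to the Table 2 rows reproduced in this fact sheet, so the numbers are illustrative rather than the fact sheet's own; the critical value 2.365 is the t value for α/2 = 0.025 with 7 degrees of freedom):

```python
# Confidence interval for the mean response and prediction interval
# for an individual value at a chosen flow rate x0. Fitted to the
# Table 2 rows reproduced in this fact sheet; illustrative only.
flow = [19.2, 4.9, 44.4, 25.8, 37.6, 40.1, 47.4, 35.7, 13.9]
split = [3.12, 2.86, 2.70, 2.83, 2.60, 2.58, 2.49, 2.60, 3.19]

n = len(flow)
x_bar = sum(flow) / n
y_bar = sum(split) / n
ss_x = sum((x - x_bar) ** 2 for x in flow)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(flow, split)) / ss_x
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(flow, split))
s = (sse / (n - 2)) ** 0.5   # standard error of the regression (Equation 11)
t_crit = 2.365               # t(alpha/2 = 0.025) with n - 2 = 7 df

x0 = 30.0                    # flow rate at which to evaluate the intervals
y_hat = b0 + b1 * x0
half_ci = t_crit * s * (1 / n + (x0 - x_bar) ** 2 / ss_x) ** 0.5      # mean response
half_pi = t_crit * s * (1 + 1 / n + (x0 - x_bar) ** 2 / ss_x) ** 0.5  # individual value
print(round(y_hat, 3), round(half_ci, 3), round(half_pi, 3))
```

The prediction interval is always wider than the confidence interval for the mean because it carries the extra "1 +" term for the scatter of individual observations.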
One of the simplest (in theory) nonpoint source control
applications of linear regression is the regression of a
water quality indicator against an implementation
indicator. For example, flow-adjusted total suspended
solids (TSS) concentration could be regressed against the
percentage of all cropland for which delivery to the
stream is likely to be 50 percent or greater. A significant
negative slope would suggest (but not prove) that water
quality improved because of the implementation of
sediment control practices.

Another use of simple linear regression is to model a
water quality parameter versus time. In this case, a
significant slope would indicate change over time, and the
sign of the slope would indicate either improvement or
degradation, depending on the parameter analyzed.
However, a water quality parameter plotted versus time
will most likely be confounded by the variability in
precipitation and flows. Thus, considerable data
manipulation (transformation, stratification, etc.) might be
required before regression analysis can be successfully
applied. In these cases, it might be more appropriate to
apply one of the alternatives to regression described by
Helsel and Hirsch (1995).

In many cases water quality parameters are regressed
against flow. This approach is particularly relevant in
nonpoint source studies. In analysis of covariance,
regressions against flow are often performed prior to an
ANOVA. One of the implicit goals of nonpoint source
control is to change the relationship between flow and
pollutant concentration or load. In paired watershed
studies, measured parameters from paired samples are
often regressed against each other to compare the
watersheds (USEPA, 1993). These regression lines can
be compared over time to test for the impact of nonpoint
source control efforts (Spooner et al., 1985). The reader
is referred to Paired Watershed Study Design (USEPA,
1993), which demonstrates this technique.

NONLINEAR REGRESSION AND
TRANSFORMATIONS

Nonlinear regression (as discussed here) involves
transformation to linear equations, followed by simple
linear regression. Helsel and Hirsch (1995) provide a
detailed discussion on transformations using the "bulging
rule" described by Mosteller and Tukey (1977), which can
be used to select appropriate transformations. Crawford et
al. (1983) list the numerous regression models most often
applied by the U.S. Geological Survey for flow-adjusting
concentrations. The selection of which transformation to
use is ultimately based on an inspection of the residuals
and whether the assumptions described earlier are met.
Typical transformations include x², x³, ln x, 1/x, x^0.5, etc.
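One way to act on this guidance is to fit the regression under each candidate transformation and compare the residuals. A small sketch with synthetic data (generated here from an assumed reciprocal relationship y = 10/x, not from the fact sheet) shows the idea:

```python
# Choosing a transformation by comparing residual sums of squares.
# Synthetic data generated from y = 10/x, so regressing y on 1/x
# should fit far better than regressing y on x directly.
def fit_sse(x, y):
    """Least-squares fit of y on x; return the residual sum of squares."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b1 = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / \
         sum((a - xb) ** 2 for a in x)
    b0 = yb - b1 * xb
    return sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))

x = [1.0, 2.0, 4.0, 5.0, 8.0, 10.0]
y = [10.0 / v for v in x]

sse_raw = fit_sse(x, y)                       # y versus x
sse_recip = fit_sse([1.0 / v for v in x], y)  # y versus 1/x
print(sse_raw, sse_recip)
```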
Figure 4. Regression of split versus flow rate, with 95 percent
confidence limits for the mean response and for individual
estimates of split.
When the residuals do not exhibit constant variance
(heteroscedasticity), one of several common
transformations should be used. Logarithmic
transformations are used when the standard deviation in
the original scale is proportional to the mean of y. Square
root transformations are used when the variance is
proportional to the mean of y.
In many instances, the right
transformation will "fix" the nonlinear and heteroscedastic
problem. With data that are percentages or proportions
(between the values of 0 and 1), the variances at 0 and 1
are small. The arcsin of the square root of the individual
values is a common transformation that helps spread out
the values near 0 and 1 to increase their variance
(Snedecor and Cochran, 1980).
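A small sketch of this transformation (illustrative proportion values only):

```python
# Arcsine-square-root transformation for proportion data, which
# spreads out values near 0 and 1 (Snedecor and Cochran, 1980).
import math

proportions = [0.02, 0.10, 0.50, 0.90, 0.98]
transformed = [math.asin(math.sqrt(p)) for p in proportions]  # radians
print([round(t, 4) for t in transformed])
```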
There are several disadvantages when applying
transformations to regression applications. The most
important issue is that the regression line and confidence
intervals are symmetric in the transformed form of the
variables. When these lines are transformed back to their
normal units, the lines will no longer be symmetrical.
The most notable time in hydrology when this creates a
problem is when estimating mass loading. To estimate the
mass, the means for short time periods are regressed and
summed to estimate the total mass over a longer period.
This approach is acceptable if no transformations are
used; the analyst is simply summing the means. However, if a
log transformation was used, summing the mass over the
back-transformed values results in summing the median,
which will result in an estimate that is biased low for the
total mass (Helsel and Hirsch, 1995).
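A small numeric sketch of this bias (synthetic values, not from the fact sheet):

```python
# Retransformation bias: exponentiating the mean of the logs gives
# the geometric mean (for these symmetric-in-log data, the median),
# not the arithmetic mean, so back-transformed totals are biased low.
import math

values = [1.0, 2.0, 4.0, 8.0, 16.0]    # synthetic positive data
arith_mean = sum(values) / len(values)
mean_of_logs = sum(math.log(v) for v in values) / len(values)
back = math.exp(mean_of_logs)          # geometric mean, here 4.0 < 6.2
print(arith_mean, back)
```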
As an example of nonlinear regression, consider a
common relationship that is used to describe load (L) as a
function of discharge (Q):
L = aQ^b    (16)

Taking the logarithms of both sides yields

ln(L) = ln(a) + b ln(Q)    (17)
which has the same form as Equation 2, introduced at the
beginning of this document, where ln(L) corresponds to y,
ln(a) corresponds to β0, b corresponds to β1, and ln(Q)
corresponds to x. By taking the logarithms of both sides,
the nonlinear problem has been reduced to a simple linear
model. The only additional step that the analyst must
perform is to convert L and Q to ln(L) and ln(Q) before
using standard software. The analyst should be aware that
all of the confidence limits are in transformed units; when
they are plotted in normal units, the confidence intervals
will not be symmetric.
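A minimal sketch of this procedure, with synthetic data generated from assumed values a = 2.0 and b = 1.5 so the recovered coefficients can be checked (these values are illustrative, not from the fact sheet):

```python
# Log-log load-discharge regression (Equations 16 and 17):
# L = a * Q**b  becomes  ln(L) = ln(a) + b * ln(Q).
import math

a_true, b_true = 2.0, 1.5                  # assumed, for illustration
discharge = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
load = [a_true * q ** b_true for q in discharge]

lnq = [math.log(q) for q in discharge]     # x in Equation 2
lnl = [math.log(v) for v in load]          # y in Equation 2

n = len(lnq)
xb, yb = sum(lnq) / n, sum(lnl) / n
b = sum((x - xb) * (y - yb) for x, y in zip(lnq, lnl)) / \
    sum((x - xb) ** 2 for x in lnq)        # slope = exponent b
a = math.exp(yb - b * xb)                  # intercept = ln(a)
print(round(b, 6), round(a, 6))
```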
Figure 5 demonstrates how transforming the data may
improve the regression analysis. In Figure 5A, sulfate
concentrations (in milligrams per liter) are plotted as a
function of stream flow (in cubic feet per second). The
apparent downward trend is typical of a stream dilution
effect; however, the trend is clearly nonlinear. The trend
line plotted in this figure, as well as the residuals plotted
in Figure 5C, demonstrate that a linear model would tend
to over- and underestimate sulfate concentrations
depending on the flow. Figure 5B displays the same data
after computing the logarithms (base 10) of the sulfate and
flow data. A trend line fitted to these data and the
residual plot (Figure 5D) clearly demonstrate that
applying linear regression after log-transformation would
be appropriate for these data.
CONCLUSION
When properly used, regression analysis can be an
important tool for evaluating nonpoint source data.
However, the analyst should pay close attention that the
application of regression does not exceed the uses that are
supported in Table 1. In some instances it might be necessary
to select distribution-free approaches that tend to be more
robust. The reader is referred to Statistical Methods in
Water Resources (Helsel and Hirsch, 1995) for a more
complete discussion regarding distribution-free
approaches.
REFERENCES
Crawford, C.G., J.R. Slack, and R.M. Hirsch. 1983.
Nonparametric tests for trends in water-quality data using
the statistical analysis system. USGS Open File Report
83-550. U.S. Geological Survey, Reston, Virginia.
Dressing, S., J. Spooner, J.M. Kreglow, E.O. Beasley,
and P.W. Westerman. 1987. Water and sediment
sampler for plot and field studies. J. Environ. Qual.
16(1):59-64.
Freund, J.E. 1973. Modern elementary statistics.
Prentice-Hall, Englewood Cliffs, New Jersey.
Gaugush, R.F., ed. 1986. Statistical methods for reservoir
water quality investigations. Instruction Report E-86-2.
U.S. Army Engineer Waterways Experiment Station,
Vicksburg, Mississippi.
Helsel, D.R., and R.M. Hirsch. 1995. Statistical methods
in water resources. Elsevier, Amsterdam.
Mosteller, F., and J.W. Tukey. 1977. Data analysis and
regression. Addison-Wesley Publishers, Menlo Park,
California.
Ponce, S.L. 1980. Statistical methods commonly used in
water quality data. WSDG Technical Paper WSDG-TP-
00001. U.S. Department of Agriculture, Forest Service.
Remington, R.D., and M.A. Schork. 1970. Statistics
with applications to the biological and health sciences.
Prentice-Hall, Englewood Cliffs, New Jersey.
-------
Figure 5. Comparison of regression analyses using raw and log-transformed data.

Snedecor, G.W., and W.G. Cochran. 1980. Statistical
methods. Iowa State University Press, Ames, Iowa.

Spooner, J., R.P. Maas, S.A. Dressing, M.D. Smolen,
and F.J. Humenik. 1985. Documenting water quality
improvements from nonpoint source control programs.
In Proceedings of a national conference on nonpoint
source pollution, Kansas City, Missouri. U.S.
Environmental Protection Agency, Washington, DC.

USEPA. 1993. Paired watershed study design. EPA
841-F-93-009. U.S. Environmental Protection Agency,
Office of Water, Washington, DC.