United States
Environmental Protection
Agency
Scout 2008 Version 1.0
User Guide
Part III
RESEARCH AND DEVELOPMENT

-------
EPA/600/R-08/038
February 2009
www.epa.gov
Scout 2008 Version 1.0
User Guide
(Second Edition, December 2008)

John Nocerino
U.S. Environmental Protection Agency
Office of Research and Development
National Exposure Research Laboratory
Environmental Sciences Division
Technology Support Center
Characterization and Monitoring Branch
944 E. Harmon Ave.
Las Vegas, NV 89119

-------
Notice
The United States Environmental Protection Agency (EPA) through its Office of
Research and Development (ORD) funded and managed the research described here. It
has been peer reviewed by the EPA and approved for publication. Mention of trade
names and commercial products does not constitute endorsement or recommendation by
the EPA for use.
The Scout 2008 software was developed by Lockheed-Martin under a contract with the
USEPA. Use of any portion of Scout 2008 that does not comply with the Scout 2008
User Guide is not recommended.
Scout 2008 contains embedded licensed software. Any modification of the Scout 2008
source code may violate the embedded licensed software agreements and is expressly
forbidden.
The Scout 2008 software provided by the USEPA was scanned with McAfee VirusScan
and is certified free of viruses.
With respect to the Scout 2008 distributed software and documentation, neither the
USEPA nor any of its employees assumes any legal liability or responsibility for the
accuracy, completeness, or usefulness of any information, apparatus, product, or process
disclosed. Furthermore, the Scout 2008 software and documentation are supplied "as-is"
without guarantee or warranty, expressed or implied, including without limitation, any
warranty of merchantability or fitness for a specific purpose.
iii

-------
Acronyms and Abbreviations
% NDs	Percentage of Non-detect observations
ACL	alternative concentration limit
A-D, AD	Anderson-Darling test
AM	arithmetic mean
ANOVA	Analysis of Variance
AOC	area(s) of concern
B*	Between groups matrix
BC	Box-Cox-type transformation
BCA	bias-corrected accelerated bootstrap method
BD	break down point
BDL	below detection limit
BTV	background threshold value
BW	Black and White (for printing)
CERCLA	Comprehensive Environmental Response, Compensation, and
Liability Act
CL	compliance limit, confidence limits, control limits
CLT	central limit theorem
CMLE	Cohen's maximum likelihood estimate
COPC	contaminant(s) of potential concern
CV	Coefficient of Variation, cross validation
D-D	distance-distance
DA	discriminant analysis
DL	detection limit
DL/2 (t)	UCL based upon DL/2 method using Student's t-distribution
cutoff value
DL/2 Estimates	estimates based upon data set with non-detects replaced by half
of the respective detection limits
DQO	data quality objective
DS	discriminant scores
EA	exposure area
EDF	empirical distribution function
EM	expectation maximization
EPA	Environmental Protection Agency
EPC	exposure point concentration
FP-ROS (Land)	UCL based upon fully parametric ROS method using Land's H-
statistic
v

-------
Gamma ROS (Approx.)	UCL based upon Gamma ROS method using the gamma
approximate-UCL method
Gamma ROS (BCA)	UCL based upon Gamma ROS method using the bias-corrected
accelerated bootstrap method
GOF, G.O.F.	goodness-of-fit
H-UCL	UCL based upon Land's H-statistic
HBK	Hawkins Bradu Kass
HUBER	Huber estimation method
ID	identification code
IQR	interquartile range
K	Next K, Other K, Future K
KG	Kettenring Gnanadesikan
KM (%)	UCL based upon Kaplan-Meier estimates using the percentile
bootstrap method
KM (Chebyshev)	UCL based upon Kaplan-Meier estimates using the Chebyshev
inequality
KM (t)	UCL based upon Kaplan-Meier estimates using the Student's t-
distribution cutoff value
KM (z)	UCL based upon Kaplan-Meier estimates using standard normal
distribution cutoff value
K-M, KM	Kaplan-Meier
K-S, KS	Kolmogorov-Smirnov
LMS	least median squares
LN	lognormal distribution
Log-ROS Estimates estimates based upon data set with extrapolated non-detect
values obtained using robust ROS method
LPS	least percentile squares
MAD	Median Absolute Deviation
Maximum	Maximum value
MC	minimization criterion
MCD	minimum covariance determinant
MCL	maximum concentration limit
MD	Mahalanobis distance
Mean	classical average value
Median	Median value
Minimum	Minimum value
MLE	maximum likelihood estimate
MLE (t)	UCL based upon maximum likelihood estimates using Student's
t-distribution cutoff value
vi

-------
MLE (Tiku)	UCL based upon maximum likelihood estimates using the
Tiku's method
Multi Q-Q	multiple quantile-quantile plot
MVT	multivariate trimming
MVUE	minimum variance unbiased estimate
ND	non-detect or non-detects
NERL	National Exposure Research Laboratory
NumNDs	Number of Non-detects
NumObs	Number of Observations
OKG	Orthogonalized Kettenring Gnanadesikan
OLS	ordinary least squares
ORD	Office of Research and Development
PCA	principal component analysis
PCs	principal components
PCS	principal component scores
PLs	prediction limits
PRG	preliminary remediation goals
PROP	proposed estimation method
Q-Q	quantile-quantile
RBC	risk-based cleanup
RCRA	Resource Conservation and Recovery Act
ROS	regression on order statistics
RU	remediation unit
S	substantial difference
SD, Sd, sd	standard deviation
SLs	simultaneous limits
SSL	soil screening levels
S-W, SW	Shapiro-Wilk
TLs	tolerance limits
UCL	upper confidence limit
UCL95, 95% UCL	95% upper confidence limit
UPL	upper prediction limit
UPL95, 95% UPL	95% upper prediction limit
USEPA	United States Environmental Protection Agency
UTL	upper tolerance limit
Variance	classical variance
W*	Within groups matrix
vii

-------
WiB matrix	Inverse of W* cross-product B* matrix
WMW	Wilcoxon-Mann-Whitney
WRS	Wilcoxon Rank Sum
WSR	Wilcoxon Signed Rank
Wsum	Sum of weights
Wsum2	Sum of squared weights
viii

-------
Table of Contents
Notice	iii
Acronyms and Abbreviations	v
Table of Contents	ix
Chapter 9	341
Regression	341
9.1	Ordinary Least Squares (OLS) Linear Regression Method	343
9.2	OLS Quadratic/Cubic Regression Method	351
9.3	Least Median/Percentile Squares (LMS/LPS) Regression Method	357
9.3.1 Least Percentile of Squared Residuals (LPS) Regression	365
9.4	Iterative OLS Regression Method	376
9.5	Biweight Regression Method	387
9.6	Huber Regression Method	400
9.7	MVT Regression Method	411
9.8	PROP Regression Method	422
9.9	Method Comparison in Regression Module	435
9.9.1	Bivariate Fits	436
9.9.2	Multivariate R-R Plots	442
9.9.3	Multivariate Y-Y-hat Plots	445
References	449
ix

-------
Chapter 9
Regression
The Regression module in Scout offers most of the classical and robust multiple linear
regression methods (including regression diagnostic methods) available in the current
literature, similar to the Outlier/Estimates module. The multiple linear regression model
with p explanatory variables (x-variables, leverage variables) is given by:
y_i = (x_i1*b_1 + x_i2*b_2 + ... + x_ip*b_p) + e_i.
The residuals, e_i, are assumed to be normally distributed as N(0, σ²); i = 1, 2, ..., n.
The classical ordinary least squares (OLS) method has a "0" breakdown point and can be
distorted by the presence of even a single outlier, as with the classical mean vector and
covariance matrix.
Let x_i' = (x_i1, x_i2, ..., x_ip) and b' = (b_1, b_2, ..., b_p).
The objective here is to obtain a robust and resistant estimate, b̂, of b using the data set
(y_i, x_i'); i = 1, 2, ..., n. The ordinary least squares (OLS) estimate, b̂_OLS, of b is
obtained by minimizing the residual sum of squares, Σ r_i², where r_i = y_i − x_i'b̂_OLS.
Like the classical mean, the estimate, b̂_OLS, of b has a "zero" breakdown point. This
means that the estimate, b̂_OLS, can take an arbitrarily aberrant value in the presence of
even a single regression outlier (y-outlier) or leverage point (x-outlier), leading to a
distorted regression model. The use of robust procedures that eliminate or dampen the
influence of discordant observations on the estimates of regression parameters is
desirable.
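The "zero" breakdown point of OLS can be seen directly in a small numerical sketch (illustrative only; the data and names here are synthetic, not Scout's):

```python
import numpy as np

# Fit OLS by minimizing the residual sum of squares, then corrupt a single
# response value and refit.  One gross y-outlier is enough to move the
# OLS estimate arbitrarily far -- the "zero breakdown point" noted above.
rng = np.random.default_rng(0)
n = 20
x = rng.uniform(0, 1, n)
y = 0.4 + 0.5 * x + rng.normal(0, 0.02, n)

X = np.column_stack([np.ones(n), x])            # design matrix with intercept
b_clean, *_ = np.linalg.lstsq(X, y, rcond=None)  # close to (0.4, 0.5)

y_out = y.copy()
y_out[0] += 100.0                                # a single regression outlier
b_out, *_ = np.linalg.lstsq(X, y_out, rcond=None)  # badly distorted fit
```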
In regression applications, anomalies arising in the p-dimensional space of the predictor
variables (e.g., due to unexpected experimental conditions) are called leverage points.
Outliers in the response variable (e.g., due to unexpected outcomes, such as unusual
reactions to a drug) are called regression or vertical outliers. The leverage outliers are
divided into two categories: significant ("bad" or inconsistent) leverage points and
insignificant ("good" or consistent) leverage points.
The identification of outliers in a data set and the identification of outliers in a regression
model are two different problems. It is very desirable that a procedure distinguishes
between good and bad outliers. In practice, in order to achieve a high breakdown point,
some methods (e.g., the LMS method) fail to distinguish between good and bad leverage
points.
341

-------
In robust regression, the objective is twofold: 1) the identification of vertical (y-outliers,
regression outliers) outliers and distinguishing between significant and insignificant
leverage points, and 2) the estimation of regression parameters that are not influenced by
the presence of the anomalies. The robust estimates should be in close agreement with
classical OLS estimates when no outlying observations are present. Scout also offers
several formal graphical displays of the regression and leverage results.
Scout provides several methods to obtain multiple linear regression models. Those
available options include:
•	Ordinary Least Squares Regression (OLS)
Minimizes the sum of squared residuals.
•	Least Median/Percentile Squares Regression (LMS/LPS)
Minimizes the "hth" ordered squared residual (Rousseeuw, 1984).
•	Biweight Regression
Conducted using Tukey's biweight criterion (Beaton and Tukey, 1974).
•	Huber Regression
Conducted using the Huber influence function (Huber, 1981).
•	MVT Regression
Conducted using multivariate trimming methods (Devlin et al., 1981).
•	PROP Regression
Conducted using the PROP influence function (Singh and Nocerino, 1995).
Scout also provides the user with the option of identifying leverage outliers. If the
leverage option is selected, then the outliers arising in the p-dimensional space of the
predictor variables (X-space) are identified first. Those leverage points can be identified
using various options available in Scout. The leverage points are identified using the
same outlier methods as incorporated in the outlier module of Scout. The MDs for the
leverage option are computed using the selected x-variables only. The weights obtained
in the leverage option are used in the initial regression step. The regression procedure is
iterated several times to identify all of the regression outliers and bad leverage points.
This process also distinguishes between good and bad leverage points.
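The leverage-screening idea just described can be sketched roughly as follows. This is a minimal, hedged illustration, not Scout's actual implementation: the cutoff value, the simple reciprocal weight function, and all names here are our own choices for the sketch (a chi-square quantile with p degrees of freedom is a typical cutoff for squared Mahalanobis distances):

```python
import numpy as np

def leverage_weights(X, cutoff):
    # Squared Mahalanobis distances computed in the x-space only, as in
    # Scout's leverage option; points beyond the cutoff are downweighted.
    mu = X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    d2 = np.einsum('ij,jk,ik->i', diff, S_inv, diff)
    return np.where(d2 <= cutoff, 1.0, cutoff / d2)

def weighted_ols(X, y, w):
    # Initial weighted regression: scale rows by sqrt(w) and solve as OLS.
    sw = np.sqrt(w)
    Xd = np.column_stack([np.ones(len(y)), X])   # add intercept column
    b, *_ = np.linalg.lstsq(sw[:, None] * Xd, sw * y, rcond=None)
    return b
```

An iterative scheme would then recompute residual-based weights and refit several times, as the text describes.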
342

-------
9.1 Ordinary Least Squares (OLS) Linear Regression Method
1. Click Regression > OLS > Multiple Linear.

[Screenshot: Scout 2008 main window showing the Regression > OLS > Multiple Linear menu selection over the Wood data set.]
2. The "Select Variables" screen (Section 3.3) will appear.
° Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
o Click on the "Options" button.
[Screenshot: "OLS Options" dialog with "Display Intervals" and "Display Diagnostics" check boxes and a Confidence Coefficient of 0.95.]
o The "Display Intervals" check box will display the "Summary
Table for Prediction and Confidence Limits" in the output sheet.
o The "Display Diagnostics" check box will display the
"Regression Diagnostics Table" and the "Lack of Fit ANOVA
Table" (only if there are replicates in the independent variables).
o Click "OK" to continue or "Cancel" to cancel the options.
o If the results are to be produced using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will open a drop-down list of available variables. The
343

-------
user should select and click on an appropriate variable representing a
group variable.
° Click on the "Graphics" button and check all boxes.
[Screenshot: "Select OLS Graphics Options" dialog with check boxes for XY Plots, Y vs Y-Hat, Y vs Residuals, Y-Hat vs Residuals, Index Plots, and QQ Residuals; editable plot titles; the "Regression Line - Fixing Other Regressors at" choices (No Line, Minimum, Mean, Maximum, or Zero Values); Confidence Interval and Prediction Interval check boxes; and a Confidence Coefficient of 0.95.]
o A regression line can be drawn in the multivariate setting by
choosing a single independent (regressor) variable and fixing the
other variables at one of the provided choices using the
"Regression Line - Fixing Other Regressors at" option.
o Specify the confidence and/or prediction band for the regression
line using the "Confidence Interval" and the "Prediction
Interval" check boxes.
o Specify the "Confidence Level" for the bands.
o Click "OK" to continue or "Cancel" to cancel the options.
° Click "OK" to continue or "Cancel" to cancel the OLS procedure.
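The quantities in the output sheet that follows (s(Yhat), s(pred), and the LCL/UCL and LPL/UPL columns) can be re-derived from the standard OLS formulas. The sketch below is an independent, hedged re-computation with our own function and variable names; Scout's internal code is not available here:

```python
import numpy as np
from scipy import stats

def ols_interval_table(X, y, conf=0.95):
    # X includes the intercept column, so k = p + 1.
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ b
    dof = n - k
    mse = np.sum((y - yhat) ** 2) / dof           # sqrt(mse) = "Scale"
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # hat diagonals Hat[i,i]
    s_yhat = np.sqrt(mse * h)                     # se of the fitted mean
    s_pred = np.sqrt(mse * (1.0 + h))             # se of a new observation
    t = stats.t.ppf(0.5 + conf / 2.0, dof)        # Student's t cutoff
    return {
        'Yhat': yhat, 's(Yhat)': s_yhat, 's(pred)': s_pred,
        'LCL': yhat - t * s_yhat, 'UCL': yhat + t * s_yhat,
        'LPL': yhat - t * s_pred, 'UPL': yhat + t * s_pred,
    }
```

The prediction limits are always wider than the confidence limits because s(pred) carries the extra unit of error variance for a single new observation.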
344

-------
Output for OLS Regression.
Data Set used: Wood (predictor variables p = 5).

Ordinary Least Squares Linear Regression Analysis Output

Date/Time of Computation	10/30/2008 11:05:40 AM
User Selected Options
From File	D:\Narain\WorkDatInExcel\Wood
Full Precision	OFF
Confidence Level for Intervals	0.95
Display Confidence and Prediction Limits	True
Display Regression Diagnostics	True
Title for Residual QQ Plot	Linear OLS Regression - Residuals QQ Plot
Title for Residual Index Plot	Linear OLS Regression - Residuals Index Plot
Title for Y vs X Plots	Linear OLS Regression - Y vs X Plot
Confidence Level for Regression Line	0.95
Display Confidence Band	True
Display Prediction Band	True
Title for Y-Hat vs Residuals Plot	Linear OLS Regression - Y-Hat vs Residuals Plot
Title for Y vs Residuals Plot	Linear OLS Regression - Y vs Residuals Plot
Title for Y vs Y-Hat Plot	Linear OLS Regression - Y vs Y-Hat Plot

Number of Observations	20
Dependent Variable	y
Number of Selected Regression Variables	5
Independent Variable	x1
Independent Variable	x2
Independent Variable	x3
Independent Variable	x4
Independent Variable	x5



Correlation Matrix
	y	x1	x2	x3	x4	x5
y	1	-0.145	0.611	0.47	-0.6	0.629
x1	-0.145	1	-0.246	-0.604	0.528	-0.641
x2	0.611	-0.246	1	0.388	-0.498	0.248
x3	0.47	-0.604	0.388	1	-0.24	0.659
x4	-0.6	0.528	-0.498	-0.24	1	-0.512
x5	0.629	-0.641	0.248	0.659	-0.512	1
345

-------
Eigenvalues of Correlation Matrix
Eval 1	Eval 2	Eval 3	Eval 4	Eval 5	Eval 6
3.357	1.114	0.713	0.588	0.173	0.054
Sum of Eigenvalues	6







Regression Estimates and Inference Table
Parameter	DOF	Estimates	Std. Error	T-values	p-values	Tol Values	VIF
Intercept	1	0.422	0.169	2.494	0.0253	N/A	N/A
x1	1	0.441	0.117	3.77	0.00222	0.27	3.701
x2	1	-1.475	0.487	-3.029	0.00931	0.264	3.786
x3	1	-0.261	0.112	-2.332	0.0339	0.583	1.715
x4	1	0.0208	0.161	0.129	0.388	0.299	3.346
x5	1	0.171	0.203	0.84	0.27	0.268	3.725


OLS ANOVA Table
Source of Variation	SS	DOF	MS	F-Value	P-Value
Regression	0.0344	5	0.00687	11.81	0.0001
Error	0.00814	14	5.8158E-4
Total	0.0425	19

R Square	0.808
Adjusted R Square	0.74
Sqrt(MSE) = Scale	0.0241


Regression Table
Obs	Y Vector	Yhat	Residuals	Hat[i,i]	Res/Scale	Student Res
1	0.534	0.551	-0.0175	0.278	-0.725	-0.853
2	0.535	0.534	0.00114	0.132	0.0472	0.0507
3	0.57	0.54	0.03	0.22	1.243	1.407
4	0.45	0.441	0.00855	0.258	0.355	0.412
5	0.548	0.524	0.0242	0.222	1.002	1.137
6	0.431	0.442	-0.0109	0.259	-0.452	-0.525
7	0.481	0.459	0.0219	0.53	0.907	1.323
8	0.423	0.424	-8.415E-4	0.289	-0.0349	-0.0414
9	0.475	0.485	-0.00955	0.348	-0.396	-0.49
10	0.488	0.496	-0.01	0.449	-0.415	-0.559
11	0.554	0.506	0.0479	0.317	1.986	2.403
346

-------
Output for OLS Regression (continued).

Summary Table for Prediction and Confidence Limits
Obs	Y Vector	Yhat	s(Yhat)	s(pred)	LCL	UCL	LPL	UPL	Residuals
1	0.534	0.551	0.0127	0.0273	0.524	0.579	0.493	0.61	-0.0175
2	0.535	0.534	0.00875	0.0257	0.515	0.553	0.479	0.589	0.00114
3	0.57	0.54	0.0113	0.0266	0.516	0.564	0.483	0.597	0.03
4	0.45	0.441	0.0123	0.0271	0.415	0.468	0.383	0.499	0.00855
5	0.548	0.524	0.0114	0.0267	0.499	0.548	0.467	0.581	0.0242
6	0.431	0.442	0.0123	0.0271	0.416	0.468	0.384	0.5	-0.0109
7	0.481	0.459	0.0176	0.0298	0.421	0.497	0.395	0.523	0.0219
8	0.423	0.424	0.013	0.0274	0.396	0.452	0.365	0.483	-8.415E-4
9	0.475	0.485	0.0142	0.028	0.454	0.515	0.424	0.545	-0.00955
10	0.486	0.496	0.0162	0.029	0.461	0.531	0.434	0.558	-0.01
11	0.554	0.506	0.0136	0.0277	0.477	0.535	0.447	0.565	0.0479
12	0.519	0.548	0.0154	0.0286	0.515	0.581	0.486	0.609	-0.0289
13	0.492	0.504	0.0129	0.0274	0.476	0.531	0.445	0.562	-0.0117
14	0.517	0.547	0.00866	0.0256	0.529	0.566	0.492	0.602	-0.0304
15	0.502	0.516	0.00941	0.0259	0.496	0.536	0.461	0.572	-0.0141
16	0.508	0.495	0.0175	0.0298	0.458	0.533	0.431	0.559	0.0126
17	0.52	0.526	0.013	0.0274	0.498	0.554	0.467	0.585	-0.00615
18	0.506	0.499	0.0131	0.0274	0.471	0.527	0.44	0.558	0.00685
19	0.401	0.427	0.013	0.0274	0.399	0.455	0.368	0.486	-0.0261
20	0.568	0.555	0.0136	0.0277	0.526	0.584	0.495	0.614	0.0131

No replicates in the data - Lack of Fit ANOVA Table not displayed

Regression Diagnostics Table
Obs. #	Residuals	H[i,i]	CD[i]	t[i]	DFFITS
1	-0.0175	0.278	0.0466	-0.876	-0.543
2	0.00114	0.132	6.4894E-5	0.0507	0.0197
3	0.03	0.22	0.0929	1.518	0.806
4	0.00855	0.258	0.00985	0.414	0.245
5	0.0242	0.222	0.0616	1.193	0.638
6	-0.0109	0.259	0.0161	-0.53	-0.314
7	0.0219	0.53	0.329	1.414	1.502
8	-8.415E-4	0.289	1.1582E-4	-0.0414	-0.0264
9	-0.00955	0.348	0.0214	-0.495	-0.362
10	-0.01	0.449	0.0425	-0.566	-0.511
347

-------
Output for OLS Regression (continued).

[Figure: Linear OLS Regression - Residuals QQ Plot (residuals plotted against normal quantiles, with upper and lower limit lines).]
348

-------
Output for OLS Regression (continued).

-------
Output for OLS Regression (continued).
[Figure: Linear OLS Regression - Y vs Residuals Plot and Y-Hat vs Residuals Plot (residuals centered about 0.0).]
350

-------
9.2 OLS Quadratic/Cubic Regression Method
1. Click Regression > OLS > Quadratic or Cubic.
[Screenshot: Scout 2008 main window showing the Regression > OLS > Quadratic (or Cubic) menu selection over the Wood data set.]
2. The "Select Variables" screen (Section 3.3) will appear.
° Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
° Click on the "Options" button.
[Screenshot: "OLS Options" dialog with "Display Intervals" and "Display Diagnostics" check boxes and a Confidence Coefficient of 0.95.]
o The "Display Intervals" check box will display the "Summary
Table for Prediction and Confidence Limits" in the output sheet.
o The "Display Diagnostics" check box will display the
"Regression Diagnostics Table" and the "Lack of Fit ANOVA
Table" (only if there are replicates in the independent variables).
o Click "OK" to continue or "Cancel" to cancel the options.
o If the results are to be produced using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will open a drop-down list of available variables. The
351

-------
user should select and click on an appropriate variable representing a
group variable.
Click on the "Graphics" button and check all boxes.
[Screenshot: "Select OLS Graphics Options" dialog with plot check boxes, editable plot titles, the "Regression Line - Fixing Other Regressors at" choices, Confidence Interval and Prediction Interval check boxes, and a Confidence Coefficient of 0.95.]
o The "Regression Line - Fixing Other Regressors at" option is not
used in the quadratic/cubic regression module.
o Specify the confidence and/or prediction band for the regression
line using the "Confidence Interval" and the "Prediction
Interval" check boxes.
o Specify the "Confidence Level" for the bands.
o Click "OK" to continue or "Cancel" to cancel the options.
o Click "OK" to continue or "Cancel" to cancel the OLS procedure.
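The quadratic option amounts to ordinary OLS on an augmented design: the chosen regressor plus its square (the "Squared" term in the output below); the cubic option adds a cubed column as well. A minimal sketch with synthetic data (not the Wood data set shipped with Scout):

```python
import numpy as np

# Quadratic OLS as a linear model in (1, x1, x1**2).  Coefficients and
# noise level here are illustrative, not Scout's Wood-data results.
rng = np.random.default_rng(3)
x1 = rng.uniform(0.3, 0.8, 20)
y = -0.6 + 3.9 * x1 - 3.2 * x1**2 + rng.normal(0, 0.01, 20)

X = np.column_stack([np.ones_like(x1), x1, x1**2])  # intercept, x1, Squared
b, *_ = np.linalg.lstsq(X, y, rcond=None)           # (intercept, b1, b2)
resid = y - X @ b
scale = np.sqrt(resid @ resid / (len(y) - 3))       # Sqrt(MSE) = "Scale"
```

Note the large VIF values in the output below: x1 and x1² are strongly collinear, which is typical for polynomial terms that are not centered.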
352

-------
Output for OLS Regression.
Data Set used: Wood (predictor variables p = 5).

Ordinary Least Squares Quadratic Regression Analysis Output

Date/Time of Computation	10/30/2008 1:14:57 PM
User Selected Options
From File	D:\Narain\WorkDatInExcel\Wood
Full Precision	OFF
Confidence Level for Intervals	0.95
Display Confidence and Prediction Limits	True
Display Regression Diagnostics	True
Residual QQ Plot	Not Selected
Residual Index Plot	Not Selected
Title For Y vs X Plots	Quadratic OLS Regression - Y vs X Plot
Confidence Level for Regression Line	0.95
Display Confidence Band	True
Display Prediction Band	True
Y vs Residuals Plot	Not Selected
Y-Hat vs Residuals Plot	Not Selected
Y vs Y-Hat Plot	Not Selected

Number of Observations	20
Dependent Variable	y
Number of Selected Regression Variables	1
Independent Variable	x1











Correlation Matrix
	y	x1	Squared
y	1	0.997	0.629
x1	0.997	1	0.588
Squared	0.629	0.588	1

Eigenvalues of Correlation Matrix
Eval 1	Eval 2	Eval 3
2.493	0.505	0.0015
Sum of Eigenvalues	3






353

-------
Output for OLS Regression (continued).

Regression Estimates and Inference Table
Parameter	DOF	Estimates	Std. Error	T-values	p-values	Tol Values	VIF
Intercept	1	-0.643	0.26	-2.47	0.0252	N/A	N/A
x1	1	3.906	0.957	4.08	9.1497E-4	0.00573	174.6
Squared	1	-3.237	0.863	-3.75	0.00185	0.00573	174.6

OLS ANOVA Table
Source of Variation	SS	DOF	MS	F-Value	P-Value
Regression	0.0284	2	0.0142	17.2	0.0001
Error	0.0141	17	8.2683E-4
Total	0.0425	19

R Square	0.669
Adjusted R Square	0.63
Sqrt(MSE) = Scale	0.0288

Regression Table
Obs	Y Vector	Yhat	Residuals	Hat[i,i]	Res/Scale	Student Res
1	0.534	0.532	0.00157	0.103	0.0545	0.0576
2	0.535	0.528	0.00697	0.116	0.242	0.258
3	0.57	0.535	0.0346	0.0923	1.204	1.264
4	0.45	0.446	0.00411	0.16	0.143	0.156
5	0.548	0.525	0.0229	0.106	0.795	0.84
6	0.431	0.453	-0.0223	0.137	-0.774	-0.833
7	0.481	0.493	-0.0121	0.0873	-0.421	-0.441
8	0.423	0.418	0.00481	0.293	0.167	0.199
9	0.475	0.521	-0.0457	0.103	-1.591	-1.68
10	0.486	0.514	-0.0278	0.247	-0.966	-1.114
11	0.554	0.523	0.0305	0.149	1.062	1.151
12	0.519	0.503	0.0158	0.391	0.549	0.703
13	0.492	0.527	-0.0354	0.12	-1.231	-1.313
14	0.517	0.534	-0.0174	0.0993	-0.606	-0.639
15	0.502	0.52	-0.0179	0.103	-0.621	-0.656
16	0.508	0.515	-0.00653	0.099	-0.227	-0.239
17	0.52	0.534	-0.0136	0.101	-0.475	-0.501
18	0.506	0.457	0.0487	0.126	1.692	1.81
19	0.401	0.423	-0.0221	0.264	-0.767	-0.895
20	0.568	0.517	0.0509	0.101	1.772	1.869
354

-------
Output for OLS Regression (continued).

Summary Table for Prediction and Confidence Limits
Obs	Y Vector	Yhat	s(Yhat)	s(pred)	LCL	UCL	LPL	UPL	Residuals
1	0.534	0.532	0.00925	0.0302	0.513	0.552	0.469	0.596	0.00157
2	0.535	0.528	0.00981	0.0304	0.507	0.549	0.464	0.592	0.00697
3	0.57	0.535	0.00874	0.0301	0.517	0.554	0.472	0.599	0.0346
4	0.45	0.446	0.0115	0.031	0.422	0.47	0.381	0.511	0.00411
5	0.548	0.525	0.00934	0.0302	0.505	0.545	0.461	0.589	0.0229
6	0.431	0.453	0.0106	0.0307	0.431	0.476	0.389	0.518	-0.0223
7	0.481	0.493	0.0085	0.03	0.475	0.511	0.43	0.556	-0.0121
8	0.423	0.418	0.0156	0.0327	0.385	0.451	0.349	0.487	0.00481
9	0.475	0.521	0.00925	0.0302	0.501	0.54	0.457	0.584	-0.0457
10	0.486	0.514	0.0143	0.0321	0.484	0.544	0.446	0.582	-0.0278
11	0.554	0.523	0.0111	0.0308	0.5	0.547	0.458	0.589	0.0305
12	0.519	0.503	0.018	0.0339	0.465	0.541	0.432	0.575	0.0158
13	0.492	0.527	0.00998	0.0304	0.506	0.548	0.463	0.591	-0.0354
14	0.517	0.534	0.00906	0.0301	0.515	0.554	0.471	0.598	-0.0174
15	0.502	0.52	0.00922	0.0302	0.5	0.539	0.456	0.584	-0.0179
16	0.508	0.515	0.00905	0.0301	0.495	0.534	0.451	0.578	-0.00653
17	0.52	0.534	0.00916	0.0302	0.514	0.553	0.47	0.597	-0.0136
18	0.506	0.457	0.0102	0.0305	0.436	0.479	0.393	0.522	0.0487
19	0.401	0.423	0.0148	0.0323	0.392	0.454	0.355	0.491	-0.0221
20	0.568	0.517	0.00913	0.0302	0.498	0.536	0.453	0.581	0.0509

No replicates in the data - Lack of Fit ANOVA Table not displayed

Regression Diagnostics Table
Obs. #	Residuals	H[i,i]	CD[i]	t[i]	DFFITS
1	0.00157	0.103	1.2758E-4	0.0576	0.0196
2	0.00697	0.116	0.00292	0.258	0.0938
3	0.0346	0.0923	0.0542	1.328	0.423
4	0.00411	0.16	0.00154	0.156	0.0681
5	0.0229	0.106	0.0278	0.858	0.285
6	-0.0223	0.137	0.0366	-0.851	-0.339
7	-0.0121	0.0873	0.00621	-0.444	-0.137
8	0.00481	0.293	0.00549	0.199	0.128
9	-0.0457	0.103	0.109	-1.84	-0.625
10	-0.0278	0.247	0.136	-1.157	-0.663
355

-------
Output for OLS Regression (continued) - Quadratic Fit.
Output for OLS Regression (continued) - Cubic Fit.
356

-------
9.3 Least Median/Percentile Squares (LMS/LPS) Regression
Method
Break Down Point of LMS Regression Estimates
The breakdown (BD) points for the LMS (k = 0.5) and least percentile of squared
residuals (LPS, k > 0.5) regression methods as incorporated in Scout are summarized in
the following table. Note that LMS is labeled as LPS when k > 0.5. In the following, the
fraction k satisfies 0.5 ≤ k < 1; for example, for the median, k = 0.5, for the 75th
percentile, k = 0.75, and so forth.
Approximate Break Down Point for LMS or LPS Regression Estimates

No. of Explanatory Vars., p = 1
Minimizing Squared Residual	BD
Pos = [n/2], k = 0.5	(n-Pos)/n
Pos = [(n+1)/2]	(n-Pos)/n
Pos = [(n+p+1)/2]	(n-Pos)/n
LPS - Pos = [n*k], k > 0.5	(n-Pos)/n

No. of Explanatory Vars., p > 1
Minimizing Squared Residual	BD
Pos = [n/2], k = 0.5	(n-Pos-p+2)/n
Pos = [(n+1)/2]	(n-Pos-p+2)/n
Pos = [(n+p+1)/2]	(n-Pos-p+2)/n
LPS - Pos = [n*k], k > 0.5	(n-Pos-p+2)/n
Here [x] = greatest integer contained in x, and k represents a fraction: 0.5 ≤ k < 1. Pos stands for the
position/index of an entry in the ordered array (of size n) of squared residuals. The squared residual at
position Pos is being minimized. For example, when Pos = [n/2], the median of the squared residuals is
being minimized.
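The table above reduces to two formulas; a direct transcription (function and argument names are ours, chosen for the sketch):

```python
# BD formulas from the table: pos is the index [n/2], [(n+1)/2],
# [(n+p+1)/2], or [n*k] of the ordered squared residual being minimized,
# where [x] denotes the greatest integer contained in x.
def lms_breakdown(n, pos, p=1):
    if p <= 1:
        return (n - pos) / n          # p = 1 case
    return (n - pos - p + 2) / n      # p > 1 case

# Wood data example from this section: n = 20, p = 5, median criterion,
# Pos = [n/2] = 10, giving (20 - 10 - 5 + 2)/20 = 0.35 -- the value
# reported as "Approximate Breakdown Value" in the LMS output below.
```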
1. Click Regression > LMS.
[Screenshot: Scout 2008 main window showing the Regression > LMS/LPS menu selection over the Wood data set.]
2. The "Select Variables" screen (Section 3.3) will appear.
357

-------
° Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.
o Click on the "Options" button to get the options window and then click
on "User Specified" in "Subset Search Strategy" box.
[Screenshot: "LMS Regression Options" dialog: Subset Search Strategy (All Combinations, User Specified, Extensive, Quick); Minimization Criterion ([n/2], [(n+1)/2], or [(n+p+1)/2] squared residual, or percentile squared residual); Subsets to Search cutoffs (All, <=10,000, <=100,000, <=1,000,000, <=10,000,000); Percentage Outliers choices (maximum < 5% to < 50%); Display Intervals and Display Diagnostics check boxes; and a Confidence Coefficient of 0.95.]
Note: The Subset Search Strategy allows the user to specify the number of initial subsets of size p+1 to be
used to obtain the residuals (regression models) from a total of C(n, p+1) possible subsets. The user can
specify the Percentage of Outliers, the Outlier Probability (usually close to 1), and the Minimization
Criterion (the order of the squared residual to minimize) (Rousseeuw and Leroy, 1987).
o Specify "Subsets to Search." The default is "<=100,000."
o Specify "Percentage Outliers." The default is "< 25%."
o Specify "Outlier Probability." The default is "0.95."
o Specify "Minimization Criterion." The default is "Median
Squared Residual."
o Click on "OK" to continue or "Cancel" to cancel the options.
358

-------
° Click on "Graphics" for the graphics options and specify the preferred
graphs.
[Screenshot: "Select LMS Graphics Options" dialog with plot check boxes (XY Plots, Y vs Y-Hat, Y vs Residuals, Y-Hat vs Residuals, Index Plots, QQ Residuals), editable plot titles, the "Regression Line - Fixing Other Regressors at" choices, Confidence Interval and Prediction Interval check boxes, and a Confidence Coefficient of 0.95.]
o Specify the required graphs and the input parameters,
o Click on "OK" to continue or "Cancel" to cancel the options.
° Click on "OK" to continue or "Cancel" to cancel the computations.
Output example: The data set "WOOD.xls" was used for LMS regression. It has 5
predictor variables (p) and 20 observations. A total of 38760 subsets of size p+1 (6)
observations were used to find the best subset meeting the minimization criterion of least
median of squared residuals.
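The elemental-subset search just described can be sketched as follows. This is a minimal, hedged toy version with our own names: it fits an exact regression through every subset of p+1 observations and keeps the fit whose median squared residual over all n observations is smallest, whereas Scout enumerates all C(n, p+1) subsets only when that count is below the "Subsets to Search" cutoff:

```python
import numpy as np
from itertools import combinations

def lms_search(X, y):
    # X includes the intercept column, so each elemental subset has
    # k = p + 1 rows and determines an exact fit.
    n, k = X.shape
    best_b, best_crit = None, np.inf
    for idx in combinations(range(n), k):
        rows = list(idx)
        try:
            b = np.linalg.solve(X[rows], y[rows])  # exact fit through subset
        except np.linalg.LinAlgError:
            continue                               # skip singular subsets
        crit = np.median((y - X @ b) ** 2)         # median squared residual
        if crit < best_crit:
            best_b, best_crit = b, crit
    return best_b, best_crit
```

Because the criterion ignores the largest squared residuals, a handful of gross y-outliers cannot pull the winning fit away from the bulk of the data, which is the source of the high breakdown point tabulated above.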
359

-------
Output for LMS Regression.
Data Set used: Wood (predictor variables p = 5, Minimization Criterion = Median Squared Residuals).
Least Median Squared (LMS) Regression Analysis Output

Date/Time of Computation	3/4/2008 9:35:11 AM
User Selected Options
From File	D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Wood
Full Precision	OFF
Subset Search Strategy	User Specified Criteria
Percentage Outliers	Maximum Outliers <= 0.25
Outlier Probability	Outlier Probability <= 0.95
Search All Cutoff	Do all combinations if <= 100000
Minimization Criterion	Median of Squared Residuals
Title for Residual QQ Plot	LMS Regression - Residuals QQ Plot
Residual Index Plot	Not Selected
Y vs X Plots	Not Selected
Title for Y-Hat vs Residuals Plot	LMS Regression - Y-Hat vs Residuals Plot
Y vs Residuals Plot	Not Selected
Y vs Y-Hat Plot	Not Selected

Number of Selected Regression Variables	5
Number of Observations	20
Dependent Variable	y






Correlation Matrix

        y        x1       x2       x3       x4       x5
y       1       -0.145    0.611    0.47    -0.6      0.629
x1     -0.145    1       -0.246   -0.604    0.528   -0.641
x2      0.611   -0.246    1        0.388   -0.498    0.248
x3      0.47    -0.604    0.388    1       -0.24     0.659
x4     -0.6      0.528   -0.498   -0.24     1       -0.512
x5      0.629   -0.641    0.248    0.659   -0.512    1
Eigenvalues of Correlation Matrix

Eval 1   Eval 2   Eval 3   Eval 4   Eval 5   Eval 6
0.054    0.173    0.588    0.713    1.114    3.357
OLS Estimates of Regression Parameters

Intercept   x1      x2       x3      x4       x5
0.422       0.441   -1.475   0.261   0.0208   0.171
Output for LMS Regression (continued).
Stdv of Estimated Regression Parameters

Intercept   x1      x2      x3      x4      x5
0.169       0.117   0.487   0.112   0.161   0.203
OLS ANOVA Table

Source of Variation   SS        DOF   MS          F-Value   P-Value
Regression            0.0344    5     0.00687     11.81     0.0001
Error                 0.00814   14    5.8158E-4
Total                 0.0425    19

OLS Scale Estimate: 0.0241
R Square: 0.808


Least Median of Squared Residual Regression

Total Number of Elemental Subsets of size (6): 38760
Total Number of Elemental Subsets of size (6) Searched: 38760
Number of Non-Singular Elemental Subsets of size (6): 38760
Best Elemental Subset of size 6 Found

          y       x1      x2      x3      x4      x5
Obs #7    0.481   0.489   0.123   0.562   0.455   0.824
Obs #10   0.486   0.685   0.156   0.631   0.564   0.914
Obs #11   0.554   0.664   0.159   0.506   0.481   0.867
Obs #12   0.513   0.703   0.134   0.519   0.484   0.812
Obs #15   0.502   0.534   0.114   0.521   0.57    0.889
Obs #16   0.503   0.523   0.132   0.505   0.612   0.919

Best Subset satisfies minimization criterion.

LMS Estimates of Regression Parameters (Using Best Subset)

Intercept   x1      x2       x3       x4       x5
0.37        0.172   -0.073   -0.524   -0.441   0.644

Stdv of Estimated Regression Parameters (Using Best Subset)

Intercept   x1      x2      x3      x4      x5
0.874       0.604   2.516   0.579   0.832   1.051

Output for LMS Regression (continued).
Minimizing 10th Ordered Squared Residual

Value of Minimum Criterion: 2.0999E-6
Approximate Breakdown Value: 0.35
Unweighted Sigma Estimate based upon LMS Residuals: 0.125
Initial Robust LMS Scale Estimate (Adjusted for dimensionality): 0.00691

LMS Regression Table Based Upon Best Subset

Obs#   Y       Yhat    Residuals   Hat[i,i]   Res/Sigma   Student     Res/Scale   Weights   C Res/Scale
1      0.534   0.522   0.0122      0.278      0.0978      0.115       1.761       1         -0.725
2      0.535   0.527   0.00787     0.132      0.0632      0.0678      1.139       1         0.0472
3      0.57    0.569   8.8370E-4   0.22       0.00709     0.00803     0.128       1         1.243
4      0.45    0.652   -0.202      0.258      -1.625      -1.887      -29.31      0         0.355
5      0.548   0.538   0.00972     0.222      0.078       0.0885      1.407       1         1.002
6      0.431   0.662   -0.231      0.259      -1.857      -2.158      -33.5       0         -0.452
7      0.481   0.481   -3.43E-14   0.53       -2.75E-13   -4.02E-13   -4.97E-12   1         0.907
8      0.423   0.65    -0.227      0.289      -1.824      -2.163      -32.91      0         -0.0349
9      0.475   0.489   -0.0141     0.348      -0.113      -0.14       -2.037      1         -0.396
10     0.486   0.486   -4.35E-14   0.449      -3.49E-13   -4.71E-13   -6.30E-12   1         -0.415
11     0.554   0.554   -1.14E-14   0.317      -9.18E-14   -1.11E-13   -1.66E-12   1         1.986
12     0.519   0.519   -3.52E-14   0.41       -2.82E-13   -3.68E-13   -5.09E-12   1         -1.198
13     0.492   0.481   0.00145     0.287      0.0116      0.0138      0.21        1         -0.485
14     0.517   0.522   -0.00472    0.129      -0.0379     -0.0406     -0.684      1         -1.281
15     0.502   0.502   -6.55E-14   0.152      -5.26E-13   -5.71E-13   -9.48E-12   1         -0.587
16     0.508   0.508   -7.33E-14   0.526      -5.88E-13   -8.54E-13   -1.06E-11   1         0.524
17     0.52    0.521   -0.00114    0.289      -0.00913    -0.0108     -0.165      1         -0.255
18     0.506   0.522   -0.0161     0.294      -0.129      -0.154      -2.329      1         0.284
19     0.401   0.666   -0.265      0.292      -2.129      -2.53       -38.4       0         -1.084
20     0.568   0.567   9.6663E-4   0.318      0.00776     0.00939     0.14        1         0.545





Reweighted LMS Estimates of Regression Parameters

Intercept   x1      x2       x3       x4     x5
0.377       0.217   -0.085   -0.564   -0.4   0.607

Reweighted LMS Stdv of Estimated Regression Parameters

Intercept   x1       x2      x3       x4       x5
0.054       0.0421   0.198   0.0435   0.0654   0.0786

Output for LMS Regression (continued).


Reweighted LMS ANOVA Table

Source of Variation   SS          DOF   MS          F-Value   P-Value
Regression            0.0128      5     0.00255     46        0.0000
Error                 5.5517E-4   10    5.5517E-5
Total                 0.0133      15

R Square: 0.958
Final Reweighted LMS Scale Estimate: 0.00745

Reweighted LMS Regression Table

Obs#   Y       Yhat    Residuals   Hat[i,i]   Student   Res/Scale
1      0.534   0.526   0.00802     0.278      1.267     1.076
2      0.535   0.531   0.00444     0.132      0.639     0.595
3      0.57    0.57    2.3614E-4   0.22       0.0359    0.0317
4      0.45    0.64    -0.19       0.258      -29.67    -25.55
5      0.548   0.535   0.013       0.222      1.979     1.745
6      0.431   0.651   -0.22       0.259      -34.32    -29.54
7      0.481   0.474   0.00658     0.53       1.288     0.883
8      0.423   0.639   -0.215      0.289      -34.37    -28.98
9      0.475   0.483   -0.00775    0.348      -1.288    -1.04
10     0.486   0.488   -2.958E-4   0.449      -0.0535   -0.0397
11     0.554   0.557   -0.00274    0.317      -0.445    -0.368
12     0.519   0.525   -0.00642    0.41       -1.122    -0.862
13     0.492   0.489   0.00318     0.287      0.507     0.428
14     0.517   0.524   -0.00712    0.129      -1.023    -0.955
15     0.502   0.502   4.6552E-4   0.152      0.0679    0.0625
16     0.508   0.508   -7.691E-5   0.526      -0.015    -0.0103
17     0.52    0.521   -7.971E-4   0.289      -0.127    -0.107
18     0.508   0.515   -0.00928    0.294      -1.482    -1.246
19     0.401   0.655   -0.254      0.292      -40.5     -34.07
20     0.568   0.569   -0.00145    0.318      -0.235    -0.194

Final Weighted Correlation Matrix

        y         x1        x2        x3        x4        x5
y       1         0.75      0.271     -0.0959   -0.173    0.147
x1      0.75      1         0.497     -0.132    -0.0367   -0.097
x2      0.271     0.497     1         -0.226    -0.0031   -0.755
x3      -0.0959   -0.132    -0.226    1         0.733     0.138
x4      -0.173    -0.0367   -0.0031   0.733     1         0.245
x5      0.147     -0.097    -0.755    0.138     0.245     1

Eigenvalues of Final Weighted Correlation Matrix

Eval 1   Eval 2   Eval 3   Eval 4   Eval 5   Eval 6
0.012    0.196    0.39     1.431    1.604    2.368















Output for LMS Regression (continued).
[Figure: LMS Regression - Residuals QQ Plot. Standardized LMS residuals are plotted against normal quantiles, with horizontal reference lines at the upper limit (+2.5) and the lower limit (-2.5); x-axis: Normal Quantiles.]
Interpretation of Graphs: Observations which are outside of the horizontal lines in the graph are
considered to be regression outliers.
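The flagging rule the graph encodes can be written as a small helper (a hypothetical sketch; the cutoff of 2.5 is the band shown in this particular plot, not a universal constant): an observation is flagged when its scaled residual falls outside the horizontal lines.

```python
def flag_regression_outliers(residuals, scale, cutoff=2.5):
    """Return the 1-based indices of observations whose scaled residual
    falls outside the horizontal bands, i.e. |residual / scale| > cutoff."""
    return [i + 1 for i, r in enumerate(residuals) if abs(r / scale) > cutoff]
```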
9.3.1 Least Percentile of Squared Residuals (LPS) Regression
1. Click Regression > LMS/LPS.
[Screenshot: Scout 2008 with the Wood data set open and the Regression menu expanded, showing the OLS, LMS, Iterative OLS, Biweight, Huber, MVT, PROP, and Method Comparison options.]
2. The "Select Variables" screen (Section 3.3) will appear.
° Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
° To produce the results by using a group variable, select a group
variable by clicking the arrow below the "Group by Variable" button.
This will display a drop-down list of the available variables. Select
and click on an appropriate variable representing a group variable.
° Click on the "Options" button to get the options window and then click on
"User Specified" in "Subset Search Strategy" box.
[Screenshot: "LMS Regression Options" dialog — "Subset Search Strategy" (All Combinations, User Specified, Extensive, Quick); "Subsets to Search All" (<=10,000, <=100,000, <=1,000,000, <=10,000,000); "Minimization Criterion" ([n/2] Squared Res (LMS), [(n + 1)/2] Squared Res, [(n + p + 1)/2] Squared Res, Percentile Squared Res); "Percentage Outliers" (Maximum < 5% through Maximum < 50%); Display Intervals and Display Diagnostics check boxes; a Confidence Coefficient field (default 0.95); and OK and Cancel buttons.]
A
o Specify "Subsets to Search All." The default is "<=100,000."
o Specify "Percentage Outliers." The default is "<25%."
o Specify "Outlier Probability." The default is "0.95."
o Specify the "Minimization Criterion" as "Percentile Squared Res."
The default percentile is "0.75."
o Click on "OK" to continue or "Cancel" to cancel the options.
o Click on "Graphics" for the graphics options and specify the preferred
graphs.
[Screenshot: "Select LMS Graphics Options" dialog — check boxes for XY Plots, Y vs Y-Hat, Y vs Residuals, Y-Hat vs Residuals, Index Plots, and QQ Residuals; plot title fields; a "Regression Line - Fixing Other Regressors at" box (No Line, Minimum Values, Mean Values, Maximum Values, Zero Values); Confidence Interval and Prediction Interval check boxes; a Confidence Coefficient field (default 0.95); and OK and Cancel buttons.]
o Click on "OK" to continue or "Cancel" to cancel the options.
o Click on "OK" to continue or "Cancel" to cancel the computations.
Output for LPS Regression.
Data Set used: Bradu (predictor variables p = 3, Minimization Criterion = 0.75 percentile).
Least Percentile Squared (LPS) Regression Analysis Output

Date/Time of Computation: 2/25/2008 11:08:20 AM
User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision: OFF
Subset Search Strategy: User Specified Criteria
Percentage Outliers: Maximum Outliers <= 0.25
Percentage Outliers: Outlier Probability <= 0.95
Search All Cutoff: Do all combinations if <= 100000
Minimization Criterion: The 0.75 Percentile of Squared Residuals
Residual QQ Plot: Not Selected
Residual Index Plot: Not Selected
Y vs X Plots: Not Selected
Title for Y-Hat vs Residuals Plot: LMS Regression - Y-Hat vs Residuals Plot
Y vs Residuals Plot: Not Selected
Y vs Y-Hat Plot: Not Selected




Number of Selected Regression Variables: 3
Number of Observations: 75
Dependent Variable: y


Correlation Matrix

       y       x1      x2      x3
y      1       0.946   0.962   0.743
x1     0.946   1       0.979   0.703
x2     0.962   0.979   1       0.757
x3     0.743   0.703   0.757   1

Eigenvalues for Correlation Matrix

Eval 1   Eval 2   Eval 3   Eval 4
0.0172   0.0556   0.368    3.559

OLS Estimates of Regression Parameters

Intercept   x1      x2       x3
-0.388      0.239   -0.335   0.383
Output for LPS Regression (continued).
Stdv of Estimated Regression Parameters

Intercept   x1      x2      x3
0.416       0.262   0.155   0.129

OLS ANOVA Table

Source of Variation   SS      DOF   MS      F-Value   P-Value
Regression            543.3   3     181.1   35.77     0.0000
Error                 359.5   71    5.063
Total                 902.8   74

OLS Scale Estimate: 2.25
R Square: 0.602

Least 0.75 Percentile of Squared Residual Regression

Total Number of Elemental Subsets Searched: 10000
Number of Non-Singular Elemental Subsets: 10000
Best Elemental Subset of size 4 Found

          y      x1     x2     x3
Obs #19   0.1    0.8    2.9    1.6
Obs #1    9.7    10.1   19.6   28.3
Obs #3    10.3   10.7   20.2   31
Obs #72   -0.2   0.6    2      1.5

Best Subset satisfies minimization criterion.

Percentile Squared Estimates of Regression Parameters (Using Best Subset)

Intercept   x1      x2      x3
-1.045      0.219   0.272   0.113

Stdv of Estimated Regression Parameters (Using Best Subset)

Intercept   x1     x2      x3
0.572       0.36   0.213   0.177
Minimizing 56th Ordered Squared Residual

Value of Minimum Criterion: 0.606
Approximate Breakdown Value: 0.24
Unweighted Sigma Estimate based upon LPS Residuals: 3.088
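The LPS criterion minimizes the h-th ordered squared residual for the chosen percentile q. The exact rank rule Scout applies is an assumption, but taking h = int(n * q) reproduces the ranks shown in these outputs (h = 56 for q = 0.75 and h = 67 for q = 0.90 with n = 75):

```python
def lps_criterion(residuals, q):
    """Return the h-th ordered squared residual for percentile q,
    with h = int(n * q) (assumed rank rule; h is a 1-based rank)."""
    n = len(residuals)
    h = int(n * q)
    sq = sorted(r * r for r in residuals)
    return sq[h - 1]
```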

Output for LPS Regression (continued).
Initial Robust LPS Scale Estimate (Adjusted for dimensionality): 0.901

LPS (0.75th) Regression Table Based Upon Best Subset

Obs#   Y      Yhat      Residuals    Hat[i,i]   Res/Sigma   Student     Res/Scale   Weights   C Res/Scale
1      9.7    9.7       1.356E-11    0.063      4.393E-12   4.538E-12   1.506E-11   1         1.502
2      10.1   9.882     0.218        0.0599     0.0707      0.073       0.242       1         1.775
3      10.3   10.3      1.530E-11    0.0857     4.954E-12   5.181E-12   1.698E-11   1         1.334
4      9.5    10.56     -1.058       0.0805     -0.343      -0.357      -1.174      1         1.138
5      10     10.47     -0.469       0.0729     -0.152      -0.158      -0.52       1         1.36
6      10     10.17     -0.173       0.0756     -0.0559     -0.0582     -0.192      1         1.527
7      10.8   10.23     0.568        0.068      0.184       0.191       0.631       1         2.006
8      10.3   9.713     0.587        0.0631     0.19        0.196       0.652       1         1.705
9      9.6    10.22     -0.617       0.08       -0.2        -0.208      -0.685      1         1.204
10     9.9    9.778     0.122        0.0869     0.0394      0.0412      0.135       1         1.35
11     -0.2   11.85     -12.05       0.0942     -3.903      -4.101      -13.38      0         -3.48
12     -0.4   12.03     -12.43       0.144      -4.024      -4.349      -13.79      0         -4.165
13     0.7    12.5      -11.8        0.109      -3.822      -4.049      -13.1       0         -2.719
14     0.1    14.46     -14.36       0.564      -4.65       -7.04       -15.94      0         -1.69
15     -0.4   0.725     -1.125       0.0579     -0.364      -0.375      -1.249      1         -0.294
16     0.6    0.266     0.334        0.0759     0.108       0.113       0.371       1         0.385
17     -0.2   -0.587    0.387        0.0393     0.125       0.128       0.43        1         0.287
18     0      0.12      -0.12        0.0231     -0.0387     -0.0392     -0.133      1         -0.175
19     0.1    0.1       -2.54E-12    0.0312     -8.24E-13   -8.37E-13   -2.82E-12   1         0.29
20     0.4    0.807     -0.407       0.0476     -0.132      -0.135      -0.452      1         0.151
21     0.9    0.337     0.563        0.0294     0.182       0.185       0.625       1         0.299
22     0.3    0.128     0.172        0.0457     0.0557      0.057       0.191       1         0.415
23     -0.8   0.109     -0.909       0.0293     -0.294      -0.299      -1.009      1         -0.19
24     0.7    -0.0783   0.778        0.0261     0.252       0.255       0.864       1         0.602
25     -0.3   -0.781    0.481        0.022      0.156       0.158       0.534       1         -0.136
26     -0.8   0.333     -1.133       0.0318     -0.367      -0.373      -1.257      1         -0.214
27     -0.7   0.685     -1.385       0.0417     -0.449      -0.458      -1.538      1         -0.612
28     0.3    -0.207    0.507        0.0235     0.164       0.166       0.563       1         -0.108
29     0.3    -0.447    0.747        0.0178     0.242       0.244       0.829       1         0.176
30     -0.3   -0.208    -0.0924      0.0466     -0.0299     -0.0307     -0.103      1         -0.564
31     0      0.127     -0.127       0.059      -0.0412     -0.0424     -0.141      1         -0.12
32     -0.4   -0.249    -0.151       0.0364     -0.049      -0.0499     -0.168      1         0.247
33     -0.6   0.296     -0.896       0.0264     -0.29       -0.294      -0.995      1         -0.0485
34     -0.7   -0.879    0.179        0.032      0.0578      0.0588      0.198       1         -0.301
35     0.3    0.626     -0.326       0.0342     -0.105      -0.107      -0.361      1         -0.178
(The complete regression table is not shown.)
Output for LPS Regression (continued).
Reweighted LPS Estimates of Regression Parameters

Intercept   x1      x2      x3
-0.93       0.143   0.191   0.184

Reweighted LPS Stdv of Estimated Regression Parameters

Intercept   x1       x2       x3
0.13        0.0795   0.0718   0.0505

Reweighted LPS ANOVA Table

Source of Variation   SS      DOF   MS      F-Value   P-Value
Regression            865.3   3     288.4   635       0.0000
Error                 30.43   67    0.454
Total                 895.7   70

R Square: 0.966
Final Reweighted LPS Scale Estimate: 0.674

Reweighted LPS Regression Table

Obs#   Y      Yhat     Residuals   Hat[i,i]   Student   Res/Scale
1      9.7    9.475    0.225       0.063      0.346     0.335
2      10.1   9.671    0.429       0.0599     0.657     0.637
3      10.3   10.17    0.127       0.0857     0.197     0.189
4      9.5    10.44    -0.936      0.0805     -1.448    -1.388
5      10     10.31    -0.306      0.0729     -0.471    -0.454
6      10     9.893    0.107       0.0756     0.165     0.158

(The complete regression table is not shown.)

73     0.4    -0.157   0.557       0.0426     0.844     0.826
74     -0.9   -0.215   -0.685      0.05       -1.043    -1.016
75     0.2    -0.331   0.531       0.0621     0.813     0.788






Final Weighted Correlation Matrix

       y       x1      x2      x3
y      1       0.939   0.946   0.943
x1     0.939   1       0.985   0.977
x2     0.946   0.985   1       0.98
x3     0.943   0.977   0.98    1

Eigenvalues for Final Weighted Correlation Matrix

Eval 1   Eval 2   Eval 3   Eval 4
0.0142   0.0244   0.0762   3.885
Output for LPS Regression (continued).
[Figure: LMS Regression - Residuals QQ Plot for the 0.75 percentile criterion, with horizontal residual bands; x-axis: Normal Quantiles. Below it, the LMS Regression - Y-Hat vs Residuals Plot.]

Output for LPS Regression.
Data Set used: Bradu (predictor variables p = 3, Minimization Criterion = 0.9 percentile).
Least 0.9 Percentile of Squared Residual Regression

Total Number of Elemental Subsets Searched: 10000
Number of Non-Singular Elemental Subsets: 10000

Best Elemental Subset of size 4 Found

          y      x1     x2     x3
Obs #31   0      3.1    1.4    1
Obs #32   -0.4   0.5    2.4    0.3
Obs #3    10.3   10.7   20.2   31
Obs #45   -0.5   1.9    0.1    0.6

Best Subset satisfies minimization criterion.













Percentile Squared Estimates of Regression Parameters (Using Best Subset)

Intercept   x1      x2      x3
-0.951      0.167   0.171   0.194

Stdv of Estimated Regression Parameters (Using Best Subset)

Intercept   x1      x2      x3
0.554       0.349   0.206   0.171

Minimizing 67th Ordered Squared Residual

Value of Minimum Criterion: 1.664
Approximate Breakdown Value: 0.0933
Unweighted Sigma Estimate based upon LPS Residuals: 2.991
Initial Robust LPS Scale Estimate (Adjusted for dimensionality): 0.909

LPS (0.9th) Regression Table Based Upon Best Subset

Obs#   Y      Yhat        Residuals   Hat[i,i]   Res/Sigma   Student     Res/Scale   Weights   C Res/Scale
1      9.7    9.573       0.127       0.063      0.0424      0.0438      0.139       1         1.502
2      10.1   9.743       0.357       0.0599     0.119       0.123       0.393       1         1.775
3      10.3   10.3        1.723E-13   0.0857     5.760E-14   6.024E-14   1.895E-13   1         1.334
4      9.5    10.52       -1.024      0.0805     -0.342      -0.357      -1.126      1         1.138
5      10     10.41       -0.406      0.0729     -0.136      -0.141      -0.447      1         1.36
6      10     10          -0.00147    0.0756     -4.906E-4   -5.102E-4   -0.00161    1         1.527
7      10.8   10.02       0.783       0.068      0.262       0.271       0.861       1         2.006
8      10.3   9.637       0.663       0.0631     0.222       0.229       0.729       1         1.705
9      9.6    10.22       -0.618      0.08       -0.207      -0.215      -0.68       1         1.204
10     9.9    9.845       0.0552      0.0869     0.0185      0.0193      0.0607      1         1.35
11     -0.2   11.77       -11.97      0.0942     -4.003      -4.206      -13.17      0         -3.48
12     -0.4   12.16       -12.56      0.144      -4.199      -4.538      -13.82      0         -4.165
13     0.7    12.09       -11.39      0.109      -3.807      -4.034      -12.53      0         -2.719
14     0.1    13.29       -13.19      0.564      -4.408      -6.673      -14.5       0         -1.69
15     -0.4   0.52        -0.92       0.0579     -0.307      -0.317      -1.011      1         -0.294
16     0.6    5.8698E-4   0.599       0.0759     0.2         0.208       0.659       1         0.385
17     -0.2   -0.639      0.439       0.0393     0.147       0.15        0.483       1         0.287
18     0      0.0945      -0.0945     0.0231     -0.0316     -0.0319     -0.104      1         -0.175
19     0.1    -0.0122     0.112       0.0312     0.0375      0.0381      0.123       1         0.29
20     0.4    0.574       -0.174      0.0476     -0.0582     -0.0596     -0.191      1         0.151
21     0.9    0.228       0.672       0.0294     0.225       0.228       0.74        1         0.299
22     0.3    0.0303      0.27        0.0457     0.0902      0.0923      0.297       1         0.415
23     -0.8   -0.0692     -0.731      0.0293     -0.244      -0.248      -0.804      1         -0.19
24     0.7    -0.244      0.944       0.0261     0.316       0.32        1.039       1         0.602
25     -0.3   -0.706      0.406       0.022      0.136       0.137       0.447       1         -0.136
26     -0.8   0.247       -1.047      0.0318     -0.35       -0.356      -1.152      1         -0.214
27     -0.7   0.59        -1.29       0.0417     -0.431      -0.44       -1.419      1         -0.612
28     0.3    -0.126      0.426       0.0235     0.142       0.144       0.468       1         -0.108
29     0.3    -0.442      0.742       0.0178     0.248       0.25        0.816       1         0.176
30     -0.3   0.0288      -0.329      0.0466     -0.11       -0.113      -0.362      1         -0.564
31     0      2.146E-14   -2.15E-14   0.059      -7.17E-15   -7.39E-15   -2.36E-14   1         -0.12
32     -0.4   -0.4        2.520E-14   0.0364     8.425E-15   8.583E-15   2.772E-14   1         0.247
33     -0.6   0.119       -0.719      0.0264     -0.241      -0.244      -0.791      1         -0.0485
34     -0.7   -0.748      0.0484      0.032      0.0162      0.0165      0.0533      1         -0.301
35     0.3    0.559       -0.259      0.0342     -0.0865     -0.088      -0.285      1         -0.178
36     -1     0.132       -1.132      0.0231     -0.378      -0.383      -1.245      1         -0.522
37     -0.6   0.0819      -0.682      0.0587     -0.228      -0.235      -0.75       1         -0.102
38     0.9    -0.457      1.357       0.021      0.454       0.458       1.493       1         0.557
39     -0.7   -0.367      -0.333      0.035      -0.111      -0.113      -0.366      1         -0.567
40     -0.5   -0.294      -0.206      0.03       -0.069      -0.0701     -0.227      1         -0.0102
41     -0.1   0.453       -0.553      0.0524     -0.185      -0.19       -0.608      1         -0.49
42     -0.7   -0.206      -0.494      0.0554     -0.165      -0.17       -0.543      1         -0.482
43     0.6    -0.197      0.797       0.0606     0.266       0.275       0.877       1         0.766
44     -0.7   0.0561      -0.756      0.0406     -0.253      -0.258      -0.832      1         -0.801
45     -0.5   -0.5        -4.16E-14   0.029      -1.39E-14   -1.41E-14   -4.58E-14   1         -0.339
46     -0.4   0.0173      -0.417      0.0377     -0.14       -0.142      -0.459      1         -0.634
(The complete regression table is not shown.)
Output for LPS Regression (continued).
Reweighted LPS Estimates of Regression Parameters

Intercept   x1      x2      x3
-0.93       0.143   0.191   0.184

Reweighted LPS Stdv of Estimated Regression Parameters

Intercept   x1       x2       x3
0.13        0.0795   0.0718   0.0505

Reweighted LPS ANOVA Table

Source of Variation   SS      DOF   MS      F-Value   P-Value
Regression            865.3   3     288.4   635       0.0000
Error                 30.43   67    0.454
Total                 895.7   70

R Square: 0.966
Final Reweighted LPS Scale Estimate: 0.674

Reweighted LPS Regression Table

Obs#   Y      Yhat      Residuals   Hat[i,i]   Student   Res/Scale
1      9.7    9.475     0.225       0.063      0.346     0.335
2      10.1   9.671     0.429       0.0599     0.657     0.637
3      10.3   10.17     0.127       0.0857     0.197     0.189
4      9.5    10.44     -0.936      0.0805     -1.448    -1.388
5      10     10.31     -0.306      0.0729     -0.471    -0.454
6      10     9.893     0.107       0.0756     0.165     0.158
7      10.8   9.927     0.873       0.068      1.341     1.295
8      10.3   9.538     0.762       0.0631     1.168     1.13
9      9.6    10.13     -0.525      0.08       -0.812    -0.779
10     9.9    9.748     0.152       0.0869     0.236     0.225
11     -0.2   11.68     -11.88      0.0942     -18.52    -17.63
12     -0.4   12        -12.4       0.144      -19.89    -18.4
13     0.7    12.02     -11.32      0.109      -17.79    -16.79
14     0.1    13.4      -13.3       0.564      -29.88    -19.74
15     -0.4   0.497     -0.897      0.0579     -1.372    -1.332
16     0.6    -0.0111   0.611       0.0759     0.943     0.907
17     -0.2   -0.588    0.388       0.0393     0.587     0.575
18     0      0.0736    -0.0736     0.0231     -0.11     -0.109
19     0.1    0.033     0.067       0.0312     0.101     0.0994
20     0.4    0.568     -0.168      0.0476     -0.256    -0.25
(The complete regression table is not shown.)
Output for LPS Regression (continued).
[Figure: LMS Regression - Residuals QQ Plot for the 0.9 percentile criterion, with horizontal lines at the upper and lower limits (±2.5); x-axis: Normal Quantiles.]
The 75th percentile minimization criterion finds 14 observations (1 through 14) as outliers,
and the 90th percentile minimization criterion finds four observations (11, 12, 13 and 14) as outliers.
9.4 Iterative OLS Regression Method
1. Click Regression > Iterative OLS.
[Screenshot: Scout 2008 with the BRADU data set open and the Regression menu expanded, showing the OLS, LMS, Iterative OLS, Biweight, Huber, MVT, PROP, and Method Comparison options.]
2. The "Select Variables" screen (Section 3.3) will appear.
° Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
o To produce the results by using a group variable, select a group
variable by clicking the arrow below the "Group by Variable" button.
This will display a drop-down list of the available variables. Select
and click on an appropriate variable representing a group variable.
° Click on the "Options" button to get the options window.
[Screenshot: "IRLS Options" dialog — "Regression Value" with an "Alpha for Residual Outliers" field (default 0.05); "Number of Regression Iterations" (default 10, Max = 50); "Residuals MDs Distribution" (Beta or Chisquare); "Intermediate Iterations" (Do Not Display, Display Every 5th, Display Every 4th, Display Every 2nd, Display All); an "Identify Leverage Points" Leverage check box; "Select Leverage Distance Method" (Classical, Sequential Classical, Huber, PROP, MVT (Trimming)); "Number of Leverage Iterations" (default 10, Max = 50); "Leverage MDs Distribution" (Beta or Chisquare); "Initial Leverage Distances" (Classical, Sequential Classical, Robust (Median, 1.48MAD), OKG (Maronna Zamar), KG (Not Orthogonalized), MCD); "Leverage Value(s)" (default 0.05); Display Intervals and Display Diagnostics check boxes; a Confidence Coefficient field (default 0.95); and OK and Cancel buttons.]
o Specify the "Regression Value." The default is "0.05."
o Specify the "Number of Regression Iterations." The default is "10."
o Specify the "Residuals MDs Distribution." The default is "Beta."
o Specify the "Identify Leverage Points." The default is "On."
o Specify the "Select Leverage Distance Method." The default is "PROP."
o Specify the "Number of Leverage Iterations." The default is "10."
o Specify the "Initial Leverage Distances." The default is "OKG (Maronna Zamar)."
o Specify the "Leverage Value." The default is "0.05."
o Click "OK" to continue or "Cancel" to cancel the options.
° Click on the "Graphics" button to get the options window.
[Screenshot: "Options Regression Graphics" dialog — check boxes for XY Plots, Y vs Y-Hat, Y vs Residuals, Y-Hat vs Residuals, Residuals vs Leverage, and QQ Residuals; title fields (e.g., "IRLS Regression - Y vs X Plot", "IRLS Regression - Y vs Y-Hat Plot", "IRLS Regression - Y vs Residuals", "IRLS Regression - Y-Hat vs Resid", "IRLS Regression - Residuals vs Leverage", "IRLS Regression - Residuals QQ"); a "Regression Line - Fixing Other Regressors at" box (No Line, Minimum Values, Mean Values, Maximum Values, Zero Values); Confidence Interval and Prediction Interval check boxes; a Confidence Coefficient field (default 0.95); a "Graphics Distribution" box (Beta or Chisquare); a "Residual/Lev Alpha" field (default 0.05); and OK and Cancel buttons.]
o Specify the preferred plots and the input parameters.
o Click "OK" to continue or "Cancel" to cancel the options.
° Click "OK" to continue or "Cancel" to cancel the computations.
Output example: The data set "BRADU.xls" was used for iterative OLS regression. It
has 3 predictor variables (p) and 75 observations. When the "Leverage" option is on, the
leverage distances are calculated and outlying observations are obtained iteratively, using
the median and the OKG matrix as the initial estimates and PROP as the leverage method
(i.e., using the PROP influence function). Weights are then assigned to the observations,
and those weights are used in finding the regression outliers iteratively. When the leverage
option is off, all observations are assigned a weight of one (1), and the regression outliers
are then found iteratively. Finally, the estimated regression parameters are calculated.
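The iterative scheme just described can be sketched as follows (a simplified illustration, not Scout's implementation; the 2.5 cutoff and the scale estimate used here are assumptions): weighted OLS is refit, zero-weighting observations with large scaled residuals, until the 0/1 weights stop changing.

```python
import numpy as np

def iterative_ols(X, y, weights=None, cutoff=2.5, max_iter=10):
    """Iterative OLS sketch: refit weighted OLS, zero-weighting
    observations whose scaled residuals exceed `cutoff`, until the
    0/1 weights stabilize.  X: (n, p) predictors (no intercept
    column); y: (n,) response; weights: optional initial 0/1
    weights, e.g. from a leverage screen (default all ones)."""
    X1 = np.column_stack([np.ones(len(y)), np.asarray(X, float)])
    w = np.ones(len(y)) if weights is None else np.asarray(weights, float)
    beta = np.zeros(X1.shape[1])
    for _ in range(max_iter):
        W = np.diag(w)
        beta = np.linalg.lstsq(W @ X1, W @ y, rcond=None)[0]
        res = y - X1 @ beta
        # Scale from the currently retained observations; floored to
        # avoid division by zero when the retained points fit exactly.
        scale = max(res[w > 0].std(ddof=X1.shape[1]), 1e-12)
        new_w = (np.abs(res / scale) <= cutoff).astype(float)
        if np.array_equal(new_w, w):
            break
        w = new_w
    return beta, w
```

On a straight line contaminated by one gross outlier, the first fit flags the outlier, the second fit drops it, and the weights then stabilize.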
Output for Iterative OLS (Leverage ON with PROP function and OKG initial start).
Data Set Used: Bradu (predictor variables p = 3).
Regression Analysis Output

Date/Time of Computation: 3/4/2008 9:50:32 AM
User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision: OFF
Selected Regression Method: Iterative Reweighted Least Squares (IRLS)
Alpha for Residual Outliers: 0.05 (Used to Identify Vertical Regression Outliers)
Number of Regression Iterations: 10
Leverage: Identify Leverage Points (Outliers in X Space)
Selected Leverage Method: PROP
Initial Leverage Distance Method: OKG (Maronna Zamar) Matrix
Squared MDs: Beta Distribution used for Leverage Distances based upon Selected Regression (Leverage) Variables
Leverage Distance Alpha: 0.05 (Used to Identify Leverage Points)
Number of Leverage Iterations: 10 (Maximum Number if doesn't Converge)
Y vs Y-hat Plot: Not Selected
Y vs Residual Plot: Not Selected
Y-hat vs Residual Plot: Not Selected
Y vs X Plots: Not Selected
Title for Residual QQ Plot: IRLS Regression - Residuals QQ Plot
Residual Band Alpha: 0.05 (Used in Graphics Residual Bands)
Title Residual vs Distance Plot: IRLS Regression - Residuals vs Unsquared Leverage Distance Plot
Show Intermediate Results: Do Not Display Intermediate Results (Intermediate Results Shown on Another Output Sheet)
Leverage Points are Outliers in X-Space of Selected Regression Variables.




— —


Number of Selected Regression Variables: 3
Number of Observations: 75
Dependent Variable: y

Residual Values used with Graphics Display

Upper Residual Individual (0.05) MD: 1.94
Lower Residual Individual (0.05) MD: -1.94

Correlation Matrix

       y       x1      x2      x3
y      1       0.946   0.962   0.743
x1     0.946   1       0.979   0.703
x2     0.962   0.979   1       0.757
x3     0.743   0.703   0.757   1

Eigenvalues of Correlation Matrix

Eval 1   Eval 2   Eval 3   Eval 4
0.0172   0.0556   0.368    3.559

Ordinary Least Squares (OLS) Regression Resdtz







Estimates of Regression Parameters


	
	
	
Intercept
x1
x2
x3





-0 388
0 239
¦0 335
0 383





	
	
	






Stdv of Estimated Regression Parameters





Intercept
x1
x2
x3
I






0416
0 262
0155
0129
I






Output for Iterative OLS (Leverage ON) (continued).
ANOVA Table

Source of Variation   SS      DOF   MS      F-Value   P-Value
Regression            543.3   3     181.1   35.77     0.0000
Error                 359.5   71    5.063
Total                 902.8   74

R Square Estimates: 0.602
MAD Based Scale Estimates: 1.067
Weighted Scale Estimates: 2.25
IQR Estimates: 1.469
Det of COV[Regression Coefficients] Matrix: 5.5107E-8

Regression Parameters Vector Estimates

Intercept   x1       x2       x3
-0.0105     0.0624   0.0119   -0.107

Stdv of Regression Estimates Vector

Intercept   x1       x2       x3
0.197       0.0689   0.0684   0.0713

ANOVA Table

Source of Variation   SS      DOF   MS      F-Value   P-Value
Regression            0.898   3     0.299   0.94      0.4272
Error                 18.14   57    0.318
Total                 19.04   60

R Square Estimates: 0.0472
MAD Based Scale Estimates: 0.902
Weighted Scale Estimates: 0.564
Individual MD(0): 7.346
IQR Estimates: 1.236
Determinant of Leverage S Matrix: 1.357
Output for Iterative OLS (Leverage ON) (continued).
Leverage Option Regression Table

Obs   Y Vector   Yhat     Residuals   Hat[i,i]   Res/Scale   Student Res   Wts[u]      Res Dist   Lev Dist   OLS Dist
1     9.7        -2.174   11.87       0.063      21.05       21.74         2.364E-13   21.05      29.44      1.502
2     10.1       -2.265   12.36       0.0599     21.92       22.61         1.074E-13   21.92      30.21      1.775
3     10.3       -2.418   12.72       0.0857     22.54       23.58         8.86E-14    22.54      31.89      1.334
(The complete regression table is not shown.)
381

-------
Output for Iterative OLS (Leverage ON) (continued).
THE BREAK BETWEEN LEVERAGE AND REGRESSION IS HERE!
	
	
		
Results From the Regression Operation

Regression Parameters Vector Estimates

Intercept   x1       x2       x3
-0.18       0.0314   0.0399   -0.0517

Stdv of Regression Estimates Vector

Intercept   x1       x2       x3
0.104       0.0667   0.0405   0.0354

ANOVA Table

Source of Variation   SS      DOF   MS      F-Value   P-Value
Regression            0.847   3     0.282   0.909     0.4421
Error                 18.94   61    0.31
Total                 19.79   64

R Square Estimates: 0.0428
MAD Based Scale Estimates: 0.845
Weighted Scale Estimates: 0.557
Individual MD(0): 7.346
IQR Estimates: 1.132
Det of COV[Regression Coefficients] Matrix: 2.531E-12
Regression Table

Obs   Y Vector   Yhat      Residuals   Hat[i,i]   Res/Scale   Student Res   Wts[u]   Res Dist
1     9.7        -0.0386   9.739       0.063      17.48       18.06         0        17.48
2     10.1       -0.0825   10.18       0.0599     18.27       18.85         0        18.27
3     10.3       -0.105    10.41       0.0857     18.67       19.53         0        18.67
4     9.5        -0.155    9.655       0.0805     17.33       18.07         0        17.33
5     10         -0.107    10.11       0.0729     18.14       18.84         0        18.14
6     10         0.00379   9.996       0.0756     17.94       18.66         0        17.94
7     10.8       0.00449   10.8        0.068      19.37       20.07         0        19.37
8     10.3       -0.0807   10.38       0.0631     18.63       19.25         0        18.63
9     9.6        -0.167    9.767       0.08       17.53       18.27         0        17.53
10    9.9        -0.203    10.1        0.0869     18.13       18.98         0        18.13
11    -0.2       -0.136    -0.0641     0.0942     -0.115      -0.121        1        0.115
(The complete regression table is not shown.)
382

-------
Final Weighted Correlation Matrix

          y        x1       x2       x3
    y     1        0.89     0.917    0.0893
    x1    0.89     1        0.961    0.063
    x2    0.917    0.961    1        0.0261
    x3    0.0893   0.063    0.0261   1

Eigenvalues of Final Weighted Correlation Matrix

    Eval 1   Eval 2   Eval 3   Eval 4
    0.035    0.117    0.997    2.851
Output for Iterative OLS (Leverage ON) (continued).
IRLS Regression - Residuals QQ Plot
[Plot not reproduced: residuals QQ plot with Indv(0.05) cutoff lines at ±1.94; x-axis: Normal Quantiles.]
383

-------
IRLS Regression - Residuals vs Unsquared Leverage Distance Plot
[Plot not reproduced: standardized residuals vs unsquared leverage distances; horizontal lines at Indv-Res(0.05) = ±1.94, vertical lines at Indv-MD(0.05) = 2.74 and Max-MD(0.05) = 3.94; x-axis: Unsquared Leverage Distances.]
Interpretation of Graphs: Observations which are outside of the horizontal lines in both of the graphs are
considered to be regression outliers. The observations to the right of the vertical lines are considered to be
leverage outliers. Observations between the horizontal lines and to the right of the vertical lines represent
good leverage points.
Question: What are really bad leverage points for this data set in the context of a regression model?
Answer: There are contradictory opinions in this respect. As far as outliers are concerned, several
methods (e.g., MCD, PROP) can identify all of the 14 outliers present in this data set. However,
observations 1 through 10 should be considered to be good leverage points, as they enhance the regression
model and increase the coefficient of determination. Without those 10 points, fitting a regression model to
the remaining 65 points is meaningless. Observations 11 through 14 are outliers and bad leverage points.
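The classification logic described above can be sketched in code. This is a hypothetical illustration, not Scout's source code; the default cutoffs mirror the values reported in this output (a residual cutoff of 1.94 for the horizontal lines and a leverage-distance cutoff of 2.74 for the vertical line).

```python
# Hypothetical sketch of the outlier classification shown in the graphs above.
# res_cut and lev_cut correspond to Indv-Res(0.05) and Indv-MD(0.05).

def classify(std_residual, lev_distance, res_cut=1.94, lev_cut=2.74):
    """Classify one observation from its standardized residual and
    unsquared leverage distance."""
    vertical = abs(std_residual) > res_cut   # outside the horizontal lines
    leverage = lev_distance > lev_cut        # to the right of the vertical lines
    if vertical and leverage:
        return "bad leverage point"          # outlying in both y and x
    if leverage:
        return "good leverage point"         # outlying in x only
    if vertical:
        return "regression outlier"          # outlying in y only
    return "regular observation"

print(classify(0.5, 1.0))     # regular observation
print(classify(0.3, 30.0))    # good leverage point (like obs 1-10)
print(classify(4.6, 36.6))    # bad leverage point (like obs 11-14)
```

A point far out in x-space but close to the fitted plane (small residual, large distance) is "good" leverage: it stabilizes the fit rather than distorting it.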
384

-------
Output for Iterative OLS (Leverage OFF).
Regression Analysis Output

Date/Time of Computation: 3/4/2008 9:54:08 AM

User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision: OFF
Selected Regression Method: Iterative Reweighted Least Squares (IRLS)
Alpha for Residual Outliers: 0.05 (Used to Identify Vertical Regression Outliers)
Number of Regression Iterations: 10 (Maximum Number if doesn't Converge)
Leverage: Off
Y vs Y-hat Plot: Not Selected
Y vs Residual Plot: Not Selected
Y-hat vs Residual Plot: Not Selected
Y vs X Plots: Not Selected
Title for Residual QQ Plot: IRLS Regression - Residuals QQ Plot
Residual Band Alpha: 0.05 (Used in Graphics Residual Bands)
Title Residual vs Distance Plot: IRLS Regression - Residuals vs Unsquared Leverage Distance Plot
Show Intermediate Results: Do Not Display Intermediate Results
Intermediate Results Shown on Another Output Sheet

Number of Selected Regression Variables: 3
Number of Observations: 75
Dependent Variable: y

Residual Values used with Graphics Display
Upper Residual Individual (0.05) MD: 1.94
Lower Residual Individual (0.05) MD: -1.94

Correlation Matrix

          y       x1      x2      x3
    y     1       0.946   0.962   0.743
    x1    0.946   1       0.979   0.708
    x2    0.962   0.979   1       0.757
    x3    0.743   0.708   0.757   1

Eigenvalues of Correlation Matrix

    Eval 1   Eval 2   Eval 3   Eval 4
    0.0172   0.0556   0.368    3.559
385

-------
Output for Iterative OLS (Leverage OFF) (continued).
Ordinary Least Squares (OLS) Regression Results

Estimates of Regression Parameters

    Intercept   x1      x2       x3
    -0.388      0.239   -0.335   0.383

Stdv of Estimated Regression Parameters

    Intercept   x1      x2      x3
    0.416       0.262   0.155   0.129

ANOVA Table

    Source of Variation   SS      DOF   MS      F-Value   P-Value
    Regression            543.3   3     181.1   35.77     0.0000
    Error                 359.5   71    5.063
    Total                 902.8   74

R Square Estimate: 0.602
MAD Based Scale Estimate: 1.067
Weighted Scale Estimate: 2.25
IQR Estimate of Residuals: 1.468
Det. of COV[Regression Coefficients] Matrix: 5.5107E-8

Final Reweighted Regression Results

Estimates of Regression Parameters

    Intercept   x1      x2       x3
    -0.388      0.239   -0.335   0.383

Stdv of Estimated Regression Parameters

    Intercept   x1      x2      x3
    0.416       0.262   0.155   0.129

ANOVA Table

    Source of Variation   SS      DOF   MS      F-Value   P-Value
    Regression            543.3   3     181.1   35.77     0.0000
    Error                 359.5   71    5.063
    Total                 902.8   74

R Square Estimate: 0.602
MAD Based Scale Estimate: 1.067
Weighted Scale Estimate: 2.25
IQR Estimate of Residuals: 1.468
Det. of COV[Regression Coefficients] Matrix: 5.5107E-8

Regression Table

    Obs   Y Vector   Yhat    Residuals   Hat[i,i]   Res/Scale   Student Res   Wts[i,i]   Res Dist
    1     9.7        6.32    3.38        0.063      1.502       1.552         1          1.502
    2     10.1       6.105   3.995       0.0599     1.775       1.831         1          1.775
    3     10.3       7.297   3.003       0.0857     1.334       1.396         1          1.334
    4     9.5        6.939   2.561       0.0805     1.138       1.187         1          1.138
    5     10         6.939   3.061       0.0729     1.36        1.413         1          1.36
(The complete regression table is not shown.)
386

-------
Output for Iterative OLS (Leverage OFF) (continued).
IRLS Regression - Residuals QQ Plot
[Plot not reproduced: residuals QQ plot with Indv(0.05) cutoff lines at ±1.94; x-axis: Normal Quantiles.]
Interpretation of Graphs: Observations which are outside of the horizontal lines in the graph are
considered to be regression outliers. The Leverage Distances vs. Standardized Residuals plot is not
produced. The sequential classical method failed to identify all of the regression outliers.
9.5 Biweight Regression Method
1. Click Regression ► Biweight.
[Screenshot: the Scout menu bar with Regression ► Biweight selected; the DEFINE data set is loaded in the grid.]
2. The "Select Variables" screen (Section 3.3) will appear.
387

-------
o Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.
o Click on the "Options" button to get the options window.
[Screenshot: the "Biweight Options" window, with controls for the Regression Value (Residual Scale Tuning Constant), Number of Regression Iterations (Max = 50), Residuals MDs Distribution (Beta/Chisquare), Intermediate Iterations, Identify Leverage Points, Select Leverage Distance Method (Classical, Sequential Classical, Huber, PROP, MVT (Trimming)), Number of Leverage Iterations (Max = 50), Leverage MDs Distribution (Beta/Chisquare), Initial Leverage Distances (Classical, Sequential Classical, Robust (Median, 1.48MAD), OKG (Maronna Zamar), KG (Not Orthogonalized), MCD), Leverage Value(s), Display Intervals with a Confidence Coefficient (default 0.95), and Display Diagnostics.]
o Specify the "Regression Value." The default is "4.'
o Specify the "Number of Regression Iterations." The default is
"10"
o Specify the "Regression MDs Distribution." The default is
"Beta."
o Specify "Identify Leverage Points." The default is "On."
o Specify the "Select Leverage Distance Method." The default is
"PROP."
o Specify the "Number of Leverage Iterations." The default
is"10."
388

-------
o Specify the "Leverage Initial Distances" The default is "OKG
(Maronna Zaniar)."
o Specify the "Leverage Value." The default is "0.05."
o Click "OK" to continue or "Cancel" to cancel the options
o Click on the "Graphics" button to get the options window.
[Screenshot: the "Options Regression Graphics" window, with check boxes for XY Plots, Y vs Y-Hat, Y vs Residuals, Y-Hat vs Residuals, Residuals vs Leverage, and QQ Residuals; a title field for each plot; options for the regression line (fixing other regressors at minimum, mean, maximum, or zero values, or no line), confidence and prediction intervals with a Confidence Coefficient (default 0.95), the Graphics Distribution (Beta/Chisquare), and the Residual/Lev. Alpha (default 0.05).]
o Specify the preferred plots and the input parameters.
o Click "OK" to continue or "Cancel" to cancel the options.
o Click "OK" to continue or "Cancel" to cancel the computations.
Output example: The data set "DEFINE.xls" was used for Biweight regression. It>has 1
predictor variables (p) and 26 observations. When the "Leverage" option is on, the
leverage distances are calculated and outlying observations are obtained iteratively using
initial estimates as median and OKG matrix and the leverage option as PROP (i.e., using
PROP influence function). Then the weights are assigned to observations and those
weights are used in the finding the regression outliers iteratively. When the leverage
option is off, all observations are assigned one (1) as weights and then the regression
outliers are found using the Biweight tuning constant iteratively. Finally, the estimated
regression parameters are calculated.
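The iterative reweighting described above can be sketched as follows. This is a minimal, hypothetical illustration of Tukey's biweight weight function inside an IRLS loop, not the Scout implementation: the tuning constant c = 4 mirrors the default "Regression Value," the scale is a MAD estimate, and (as a simplifying choice for this sketch) the residuals are centered on their median so that a biased first fit does not zero out good observations.

```python
import numpy as np

def biweight_irls(x, y, c=4.0, iters=10):
    """Illustrative IRLS with Tukey's biweight weights (one predictor)."""
    X = np.column_stack([np.ones_like(y), x])   # intercept + predictor
    w = np.ones_like(y)
    for _ in range(iters):
        sw = np.sqrt(w)
        # weighted least-squares step: solve with rows scaled by sqrt(weight)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        r = y - X @ beta
        scale = 1.4826 * np.median(np.abs(r - np.median(r)))  # MAD scale
        u = (r - np.median(r)) / (c * scale)
        # biweight: smooth downweighting, weight 0 beyond the cutoff
        w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
    return beta, w

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=50)
y[:5] += 20.0                      # plant five gross vertical outliers
beta, w = biweight_irls(x, y)
print(beta)                        # close to the clean-data values [2, 3]
print(w[:5])                       # the planted outliers receive weight 0
```

Observations whose residuals exceed c MAD-units of scale get weight zero, which is why the outlying observations in the output tables below show Wts[i,i] = 0.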
389

-------
Output for Biweight (Leverage ON).
Data Set Used: Define (predictor variables p = 1).



Regression Analysis Output

Date/Time of Computation: 3/4/2008 10:03:07 AM

User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\DEFINE
Full Precision: OFF
Selected Regression Method: Biweight
Residual Biweight Tuning Constant: 4 (Used to Identify Vertical Regression Outliers)
Number of Regression Iterations: 10 (Maximum Number if doesn't Converge)
Leverage: Identify Leverage Points (Outliers in X-Space)
Selected Leverage Method: PROP
Initial Leverage Distance Method: OKG (Maronna Zamar) Matrix
Squared MDs: Beta Distribution used for Leverage Distances based upon Selected Regression (Leverage) Variables
Leverage Distance Alpha: 0.05 (Used to Identify Leverage Points)
Number of Leverage Iterations: 10 (Maximum Number if doesn't Converge)
Y vs Y-hat Plot: Not Selected
Y vs Residual Plot: Not Selected
Y-hat vs Residual Plot: Not Selected
Title For Y vs X Plots: Biweight Regression - Y vs X Plot
Title for Residual QQ Plot: Biweight Regression - Residuals QQ Plot
Residual Band Alpha: 0.05 (Used in Graphics Residual Bands)
Title Residual vs Distance Plot: Biweight Regression - Residuals vs Unsquared Leverage Distance Plot
Show Intermediate Results: Do Not Display Intermediate Results
Intermediate Results Shown on Another Output Sheet
Leverage Points are Outliers in X-Space of Selected Regression Variables.

Number of Selected Regression Variables: 1
Number of Observations: 26
Dependent Variable: Y

Residual Values used with Graphics Display
Upper Residual Individual (0.05) MD: 1.903
Lower Residual Individual (0.05) MD: -1.903

Correlation Matrix

          Y       X
    Y     1       0.218
    X     0.218   1
390

-------
Output for Biweight (Leverage ON) (continued).
Eigenvalues of Correlation Matrix

    Eval 1   Eval 2
    0.782    1.218

Ordinary Least Squares (OLS) Regression Results

Estimates of Regression Parameters

    Intercept   X
    22.06       0.256

Stdv of Estimated Regression Parameters

    Intercept   X
    4.107       0.233

ANOVA Table

    Source of Variation   SS      DOF   MS      F-Value   P-Value
    Regression            224.9   1     224.9   1.202     0.2838
    Error                 4490    24    187.1
    Total                 4715    25

R Square Estimate: 0.0477
MAD Based Scale Estimate: 8.862
Weighted Scale Estimate: 13.68
IQR Estimate of Residuals: 25.79
Det. of COV[Regression Coefficients] Matrix: 0.391

Initial Weighted Regression Iteration with Identified Leverage Points

Estimates of Regression Parameters

    Intercept   X
    22.06       0.256

Stdv of Estimated Regression Parameters

    Intercept   X
    4.107       0.233
391

-------
Output for Biweight (Leverage ON) (continued).
ANOVA Table

    Source of Variation   SS      DOF   MS      F-Value   P-Value
    Regression            224.9   1     224.9   1.202     0.2838
    Error                 4490    24    187.1
    Total                 4715    25

R Square Estimate: 0.0477
MAD Based Scale Estimate: 8.862
Weighted Scale Estimate: 13.68
Unsquared Leverage Distance Indiv-MD(0.05): 1.803
IQR Estimate of Residuals: 25.79
Determinant of Leverage S Matrix: 137.6

Regression Table with Leverage Option

    Obs   Y Vector   Yhat      Residuals   Hat[i,i]   Res/Scale   Student Res   Wts[i,i]   Res Dist   Lev Dist   OLS Res Dist
    1     6          22.32     -16.32      0.0827     -1.193      -1.246        1          1.193      1.052      1.193
    2     8.5        22.32     -13.82      0.0827     -1.01       -1.055        1          1.01       1.052      1.01
    3     8.5        22.58     -14.08      0.0758     -1.029      -1.07         1          1.029      0.966      1.029
    4     10         22.58     -12.58      0.0758     -0.919      -0.956        1          0.919      0.966      0.919
    5     12         23.09     -11.09      0.0638     -0.811      -0.838        1          0.811      0.796      0.811
    6     40         23.09     16.91       0.0638     1.237       1.278         1          1.237      0.796      1.237
    7     42.5       23.09     19.41       0.0638     1.419       1.467         1          1.419      0.796      1.419
    8     45         23.34     21.66       0.0587     1.583       1.632         1          1.583      0.711      1.583
    9     50         23.34     26.66       0.0587     1.949       2.009         1          1.949      0.711      1.949
    10    13         23.09     -10.09      0.0638     -0.737      -0.762        1          0.737      0.796      0.737
    11    14         23.09     -9.086      0.0638     -0.664      -0.687        1          0.664      0.796      0.664
    12    17         23.34     -6.342      0.0587     -0.464      -0.478        1          0.464      0.711      0.464
    13    17.4       23.34     -5.942      0.0587     -0.434      -0.448        1          0.434      0.711      0.434
    14    22         24.62     -2.62       0.0417     -0.192      -0.196        1          0.192      0.284      0.192
    15    24         24.62     -0.62       0.0417     -0.0454     -0.0463       1          0.0454     0.284      0.0454
    16    25         24.62     0.38        0.0417     0.0277      0.0283        1          0.0277     0.284      0.0277
    17    42.5       27.18     15.32       0.0514     1.12        1.15          1          1.12       0.568      1.12
    18    43         27.18     15.82       0.0514     1.157       1.188         1          1.157      0.568      1.157
    19    44.1       27.69     16.41       0.0603     1.2         1.238         1          1.2        0.739      1.2
    20    45.3       27.82     17.48       0.0628     1.278       1.32          1          1.278      0.781      1.278
    21    20         29.73     -9.734      0.119      -0.712      -0.758        1          0.712      1.421      0.712
    22    22         29.73     -7.734      0.119      -0.565      -0.602        1          0.565      1.421      0.565
    23    21         29.99     -8.99       0.129      -0.657      -0.704        1          0.657      1.506      0.657
    24    23         30.25     -7.245      0.14       -0.53       -0.571        1          0.53       1.581      0.53
(The complete regression table is not shown.)
392

-------
Output for Biweight (Leverage ON) (continued).
Final Reweighted Regression Results

Estimates of Regression Parameters

    Intercept   X
    11.64       0.358

Stdv of Estimated Regression Parameters

    Intercept   X
    1.372       0.0696

ANOVA Table

    Source of Variation   SS      DOF     MS      F-Value   P-Value
    Regression            346     1       346     26.45     0.0002
    Error                 173.3   13.25   13.08
    Total                 519.3   14.25

R Square Estimate: 0.666
MAD Based Scale Estimate: 7.695
Weighted Scale Estimate: 3.617
IQR Estimate of Residuals: 25.53
Det. of COV[Regression Coefficients] Matrix: 0.00416

Regression Table

    Obs   Y Vector   Yhat    Residuals   Hat[i,i]   Res/Scale   Student Res   Wts[i,i]   Res Dist
    1     6          11.99   -5.993      0.0827     -1.657      -1.73         0.678      1.657
    2     8.5        11.99   -3.493      0.0827     -0.966      -1.008        0.883      0.966
    3     8.5        12.35   -3.852      0.0758     -1.065      -1.108        0.859      1.065
    4     10         12.35   -2.352      0.0758     -0.65       -0.676        0.946      0.65
    5     12         13.07   -1.068      0.0638     -0.295      -0.305        0.989      0.295
    6     40         13.07   26.93       0.0638     7.446       7.696         0          7.446
    7     42.5       13.07   29.43       0.0638     8.138       8.41          0          8.138
    8     45         13.43   31.57       0.0587     8.73        8.998         0          8.73
    9     50         13.43   36.57       0.0587     10.11       10.42         0          10.11
    10    13         13.07   -0.0681     0.0638     -0.0188     -0.0195       1          0.0188
    11    14         13.07   0.932       0.0638     0.258       0.266         0.992      0.258
(The complete regression table is not shown.)

Final Weighted Correlation Matrix

          Y       X
    Y     1       0.867
    X     0.867   1

Eigenvalues of Final Weighted Correlation Matrix

    Eval 1   Eval 2
    0.133    1.867

-------
Output for Biweight (Leverage ON) (continued).

Biweight Regression - Residuals QQ Plot
[Plot not reproduced: residuals QQ plot with Indv(0.05) cutoff lines at ±1.90; x-axis: Normal Quantiles.]
-------
Output for Biweight (Leverage ON) (continued).
Biweight Regression - Residuals vs Unsquared Distance Plot
[Plot not reproduced: standardized residuals vs unsquared leverage distances, with horizontal residual cutoff lines at Indv-Res(0.05) = ±1.90 and vertical leverage-distance cutoff lines; x-axis: Unsquared Leverage Distances.]
Interpretation of Graphs: Observations which are outside of the horizontal lines in the graphs are
considered to be regression outliers. The observations to the right of the vertical lines are considered to be
leverage outliers. The regression lines are produced since there is only one predictor variable.
395

-------
Output for Biweight (Leverage OFF).

Regression Analysis Output

Date/Time of Computation: 3/5/2008 7:38:34 AM

User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\DEFINE
Full Precision: OFF
Selected Regression Method: Biweight
Residual Biweight Tuning Constant: 4 (Used to Identify Vertical Regression Outliers)
Number of Regression Iterations: 10 (Maximum Number if doesn't Converge)
Leverage: Off
Y vs Y-hat Plot: Not Selected
Y vs Residual Plot: Not Selected
Y-hat vs Residual Plot: Not Selected
Title For Y vs X Plots: Biweight Regression - Y vs X Plot
Title for Residual QQ Plot: Biweight Regression - Residuals QQ Plot
Residual Band Alpha: 0.05 (Used in Graphics Residual Bands)
Residual vs Distance Plot: Not Selected
Show Intermediate Results: Do Not Display Intermediate Results
Intermediate Results Shown on Another Output Sheet

Number of Selected Regression Variables: 1
Number of Observations: 26
Dependent Variable: Y

Residual Values used with Graphics Display
Upper Residual Individual (0.05) MD: 1.903
Lower Residual Individual (0.05) MD: -1.903

Correlation Matrix

          Y       X
    Y     1       0.218
    X     0.218   1

Eigenvalues of Correlation Matrix

    Eval 1   Eval 2
    0.782    1.218
396

-------
Output for Biweight (Leverage OFF) (continued).
Ordinary Least Squares (OLS) Regression Results

Estimates of Regression Parameters

    Intercept   X
    22.06       0.256

Stdv of Estimated Regression Parameters

    Intercept   X
    4.107       0.233

ANOVA Table

    Source of Variation   SS      DOF   MS      F-Value   P-Value
    Regression            224.9   1     224.9   1.202     0.2838
    Error                 4490    24    187.1
    Total                 4715    25

R Square Estimate: 0.0477
MAD Based Scale Estimate: 8.862
Weighted Scale Estimate: 13.68
IQR Estimate of Residuals: 25.79
Det. of COV[Regression Coefficients] Matrix: 0.391

Final Reweighted Regression Results

Estimates of Regression Parameters

    Intercept   X
    11.64       0.358

Stdv of Estimated Regression Parameters

    Intercept   X
    1.372       0.0696

ANOVA Table

    Source of Variation   SS      DOF     MS      F-Value   P-Value
    Regression            346     1       346     26.45     0.0002
    Error                 173.3   13.25   13.08
    Total                 519.3   14.25

R Square Estimate: 0.666
MAD Based Scale Estimate: 7.695
Weighted Scale Estimate: 3.617
IQR Estimate of Residuals: 25.53
Det. of COV[Regression Coefficients] Matrix: 0.00416
397

-------
Output for Biweight (Leverage OFF) (continued).
Regression Table

    Obs   Y Vector   Yhat    Residuals   Hat[i,i]   Res/Scale   Student Res   Wts[i,i]   Res Dist
    1     6          11.99   -5.993      0.0827     -1.657      -1.73         0.678      1.657
    2     8.5        11.99   -3.493      0.0827     -0.966      -1.008        0.883      0.966
    3     8.5        12.35   -3.852      0.0758     -1.065      -1.108        0.859      1.065
    4     10         12.35   -2.352      0.0758     -0.65       -0.676        0.946      0.65
    5     12         13.07   -1.068      0.0638     -0.295      -0.305        0.989      0.295
    6     40         13.07   26.93       0.0638     7.446       7.696         0          7.446
    7     42.5       13.07   29.43       0.0638     8.138       8.41          0          8.138
    8     45         13.43   31.57       0.0587     8.73        8.998         0          8.73
    9     50         13.43   36.57       0.0587     10.11       10.42         0          10.11
    10    13         13.07   -0.0681     0.0638     -0.0188     -0.0195       1          0.0188
    11    14         13.07   0.932       0.0638     0.258       0.266         0.992      0.258
    12    17         13.43   3.574       0.0587     0.988       1.018         0.879      0.988
    13    17.4       13.43   3.974       0.0587     1.099       1.132         0.851      1.099
    14    22         15.22   6.783       0.0417     1.875       1.916         0.599      1.875
    15    24         15.22   8.783       0.0417     2.428       2.481         0.386      2.428
    16    25         15.22   9.783       0.0417     2.705       2.763         0.281      2.705
    17    42.5       18.8    23.7        0.0514     6.553       6.728         0          6.553
    18    43         18.8    24.2        0.0514     6.691       6.87          0          6.691
    19    44.1       19.52   24.58       0.0603     6.797       7.012         0          6.797
    20    45.3       19.7    25.6        0.0629     7.079       7.313         0          7.079
    21    20         22.38   -2.382      0.119      -0.659      -0.702        0.945      0.659
    22    22         22.38   -0.382      0.119      -0.106      -0.113        0.999      0.106
    23    21         22.74   -1.74       0.129      -0.481      -0.516        0.97       0.481
    24    23         23.1    -0.0984     0.14       -0.0272     -0.0293       1          0.0272
    25    22.5       22.81   -0.312      0.131      -0.0862     -0.0925       0.999      0.0862
    26    24         23.1    0.902       0.14       0.249       0.269         0.992      0.249

Final Weighted Correlation Matrix

          Y       X
    Y     1       0.867
    X     0.867   1

Eigenvalues of Final Weighted Correlation Matrix

    Eval 1   Eval 2
    0.133    1.867
398

-------
Output for Biweight (Leverage OFF) (continued).
Biweight Regression - Residuals QQ Plot
[Plot not reproduced: residuals QQ plot with Indv(0.05) cutoff lines at ±1.90; x-axis: Normal Quantiles.]

Biweight Regression - Y vs X Plot
[Plot not reproduced: Y vs X scatter plot with the fitted regression lines.]
Interpretation of Graphs: Observations which are outside of the horizontal lines in the graph are
considered to be regression outliers. The Leverage Distances vs. Standardized Residuals plot is not
produced even if it is checked on. The regression lines are produced since there is only one predictor variable.
399

-------
9.6 Huber Regression Method
1. Click Regression ► Huber.
[Screenshot: the Scout menu bar with Regression ► Huber selected.]
2. The "Select Variables" screen (Section 3.3) will appear.
o Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.
o Click on the "Options" button to get the options window.
[Screenshot: the "Huber Options" window, with controls for the Regression Value (Residual Influence Function Alpha, default 0.05), Number of Regression Iterations (Max = 50), Residuals MDs Distribution (Beta/Chisquare), Intermediate Iterations, Identify Leverage Points, Select Leverage Distance Method (Classical, Sequential Classical, Huber, PROP, MVT (Trimming)), Number of Leverage Iterations (Max = 50), Leverage MDs Distribution (Beta/Chisquare), Initial Leverage Distances (Classical, Sequential Classical, Robust (Median, 1.48MAD), OKG (Maronna Zamar), KG (Not Orthogonalized), MCD), Leverage Value(s), Display Intervals with a Confidence Coefficient (default 0.95), and Display Diagnostics.]

o Specify the "Regression Value." The default is "0.05.'
400

-------
o Specify the "Number of Regression Iterations." The default
is"10 "
o Specify the "Regression MDs Distribution." The default is
"Beta."
o Specify the "Identify Leverage Points." The default is "On."
o Specify the "Select Leverage Distance Method." The default is
"PROP."
o Specify the "Number of Leverage Iterations." The default
is"10."
o Specify the "Leverage Initial Distances." The default is "OKG
(Maronna Zamar)."
o Specify the "Leverage Value." The default is "0.05."
o Click "OK" to continue or "Cancel" to cancel the options.
o Click on the "Graphics" button to get the options window.
[Screenshot: the "Options Regression Graphics" window, as described in Section 9.5, with the plot titles set to the Huber regression defaults and the Residual/Lev. Alpha set to 0.05.]
o Specify the preferred plots and the input parameters.
o Click "OK" to continue or "Cancel" to cancel the options.
o Click "OK" to continue or "Cancel" to cancel the computations.

401

-------
Output example: The data set "BRADU.xls" was used for Huber regression. It has 3
predictor variables (p) and 75 observations. When the "Leverage" option is on, the
leverage distances are calculated and outlying observations are obtained iteratively using
initial estimates as median and OKG matrix and the leverage option as PROP (i.e., using
PROP influence function). Then the weights are assigned to observations and those
weights are used in the finding the regression outliers iteratively. When the leverage
option is off, all observations are assigned one (1) as weights and then the regression
outliers are found using the Huber function iteratively. Finally, the estimated regression
parameters are calculated.
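The practical difference between the Huber and Biweight methods lies in their weight functions. The sketch below is a hypothetical illustration, not Scout's code: Huber weights stay at 1 for small standardized residuals and decay as k/|u| beyond a cutoff k, so outliers are downweighted but never given zero weight (unlike the biweight, which zeroes them). The cutoff k = 1.345 used here is a conventional choice for the Huber function; Scout instead derives its cutoff from the "Residual Influence Function Alpha" (default 0.05).

```python
import numpy as np

def huber_weights(residuals, scale, k=1.345):
    """Huber weight function (illustrative): weight 1 inside the cutoff,
    k/|u| outside, so large residuals are shrunk but never zeroed."""
    u = np.abs(np.asarray(residuals, dtype=float) / scale)
    return np.where(u <= k, 1.0, k / u)

r = np.array([0.1, -0.5, 1.0, 5.0, -10.0])
print(huber_weights(r, scale=1.0))
# -> [1.0, 1.0, 1.0, 0.269, 0.1345]: small residuals keep full weight,
#    the two large ones are downweighted but remain positive
```

Because the weights never reach zero, gross outliers still exert some pull on a Huber fit, which is one reason the Biweight and PROP options can be more aggressive at isolating them.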
Output for Huber (Leverage ON).
Data Set Used: Bradu (predictor variables p = 3).
Date/Time of Computation: 3/5/2008 7:51:39 AM

User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision: OFF
Selected Regression Method: Huber
Residual Influence Function Alpha: 0.05 (Used to Identify Vertical Regression Outliers)
Number of Regression Iterations: 10 (Maximum Number if doesn't Converge)
Leverage: Identify Leverage Points (Outliers in X-Space)
Selected Leverage Method: PROP
Initial Leverage Distance Method: OKG (Maronna Zamar) Matrix
Squared MDs: Beta Distribution used for Leverage Distances based upon Selected Regression (Leverage) Variables
Leverage Distance Alpha: 0.05 (Used to Identify Leverage Points)
Number of Leverage Iterations: 10 (Maximum Number if doesn't Converge)
Y vs Y-hat Plot: Not Selected
Y vs Residual Plot: Not Selected
Y-hat vs Residual Plot: Not Selected
Title For Y vs X Plots: Huber Regression - Y vs X Plot
Title for Residual QQ Plot: Huber Regression - Residuals QQ Plot
Residual Band Alpha: 0.05 (Used in Graphics Residual Bands)
Title Residual vs Distance Plot: Huber Regression - Residuals vs Unsquared Leverage Distance Plot
Show Intermediate Results: Do Not Display Intermediate Results
Intermediate Results Shown on Another Output Sheet
Leverage Points are Outliers in X-Space of Selected Regression Variables.

Number of Selected Regression Variables: 3
Number of Observations: 75
Dependent Variable: y

Residual Values used with Graphics Display
Upper Residual Individual (0.05) MD: 1.94
Lower Residual Individual (0.05) MD: -1.94

Correlation Matrix

          y       x1      x2      x3
    y     1       0.946   0.962   0.743
    x1    0.946   1       0.979   0.708
    x2    0.962   0.979   1       0.757
    x3    0.743   0.708   0.757   1
402

-------
Output for Huber (Leverage ON) (continued).
Eigenvalues of Correlation Matrix

    Eval 1   Eval 2   Eval 3   Eval 4
    0.0172   0.0556   0.368    3.559

Ordinary Least Squares (OLS) Regression Results

Estimates of Regression Parameters

    Intercept   x1      x2       x3
    -0.388      0.239   -0.335   0.383

Stdv of Estimated Regression Parameters

    Intercept   x1      x2      x3
    0.416       0.262   0.155   0.129

ANOVA Table

    Source of Variation   SS      DOF   MS      F-Value   P-Value
    Regression            543.3   3     181.1   35.77     0.0000
    Error                 359.5   71    5.063
    Total                 902.8   74

R Square Estimate: 0.602
MAD Based Scale Estimate: 1.067
Weighted Scale Estimate: 2.25
IQR Estimate of Residuals: 1.468
Det. of COV[Regression Coefficients] Matrix: 5.5107E-8

Initial Weighted Regression Iteration with Identified Leverage Points

Estimates of Regression Parameters

    Intercept   x1       x2       x3
    -0.0105     0.0624   0.0119   -0.107

Stdv of Estimated Regression Parameters

    Intercept   x1       x2       x3
    0.197       0.0689   0.0684   0.0713
403

-------
Output for Huber (Leverage ON) (continued).
ANOVA Table

    Source of Variation   SS      DOF   MS      F-Value   P-Value
    Regression            0.898   3     0.299   0.94      0.4272
    Error                 18.14   57    0.318
    Total                 19.04   60

R Square Estimate: 0.0472
MAD Based Scale Estimate: 0.902
Weighted Scale Estimate: 0.564
Unsquared Leverage Distance Indiv-MD(0.05): 2.743
IQR Estimate of Residuals: 1.236
Determinant of Leverage S Matrix: 1.357

Regression Table with Leverage Option

    Obs   Y Vector   Yhat      Residuals   Hat[i,i]   Res/Scale   Student Res   Wts[i,i]    Res Dist   Lev Dist   OLS Res Dist
    1     9.7        -2.174    11.87       0.063      21.05       21.74         2.364E-13   21.05      29.44      1.502
    2     10.1       -2.265    12.36       0.0599     21.92       22.61         1.074E-13   21.92      30.21      1.775
    3     10.3       -2.418    12.72       0.0857     22.54       23.58         1.886E-14   22.54      31.89      1.334
    4     9.5        -2.528    12.03       0.0805     21.32       22.23         6.931E-15   21.32      32.86      1.138
    5     10         -2.443    12.44       0.0729     22.06       22.91         1.266E-14   22.06      32.28      1.36
    6     10         -2.217    12.22       0.0756     21.66       22.52         7.228E-14   21.66      30.59      1.527
    7     10.8       -2.219    13.02       0.068      23.08       23.9          6.576E-14   23.08      30.68      2.006
    8     10.3       -2.24     12.54       0.0631     22.23       22.97         1.634E-13   22.23      29.8       1.705
    9     9.6        -2.475    12.07       0.08       21.4        22.32         1.768E-14   21.4       31.95      1.204
    10    9.9        -2.437    12.34       0.0869     21.87       22.89         5.017E-14   21.87      30.94      1.35
    11    -0.2       -2.782    2.582       0.0942     4.577       4.809         1.424E-16   4.577      36.64      3.48
    12    -0.4       -2.946    2.546       0.144      4.513       4.877         3.684E-17   4.513      37.96      4.165
    13    0.7        -2.589    3.289       0.109      5.83        6.177         1.069E-16   5.83       36.92      2.719
    14    0.1        -2.556    2.656       0.564      4.708       7.127         1.478E-18   4.708      41.09      1.69
    15    -0.4       0.0115    -0.412      0.0579     -0.73       -0.752        1           0.73       2.002      0.284
    16    0.6        0.177     0.423       0.0759     0.75        0.78          1           0.75       2.165      0.385
    17    -0.2       -0.0128   -0.187      0.0393     -0.332      -0.339        1           0.332      1.938      0.287
    18    0          -0.0819   0.0819      0.0231     0.11        0.111         1           0.11       0.786      0.175
    19    0.1        -0.0971   0.197       0.0312     0.349       0.355         1           0.349      1.287      0.29
    20    0.4        -0.0119   0.412       0.0476     0.73        0.748         1           0.73       2.067      0.151
    21    0.9        -0.0253   0.925       0.0294     1.64        1.665         1           1.64       1.059      0.299
    22    0.3        -0.151    0.451       0.0457     0.799       0.818         1           0.799      1.746      0.415
    23    -0.8       0.0561    -0.856      0.0293     -1.518      -1.54         1           1.518      1.163      0.19
(The complete regression table is not shown.)
404

-------
Output for Huber (Leverage ON) (continued).
Final Reweighted Regression Results

Estimates of Regression Parameters

    Intercept   x1      x2       x3
    -0.413      0.237   -0.382   0.44

Stdv of Estimated Regression Parameters

    Intercept   x1      x2      x3
    0.378       0.238   0.142   0.118

ANOVA Table

    Source of Variation   SS      DOF     MS      F-Value   P-Value
    Regression            610     3       203.3   48.96     0.0000
    Error                 290.9   70.05   4.153
    Total                 900.9   73.05

R Square Estimate: 0.677
MAD Based Scale Estimate: 1.158
Weighted Scale Estimate: 2.038
IQR Estimate of Residuals: 1.579
Det. of COV[Regression Coefficients] Matrix: 2.8216E-8

Regression Table

    Obs   Y Vector   Yhat    Residuals   Hat[i,i]   Res/Scale   Student Res   Wts[i,i]   Res Dist
    1     9.7        6.944   2.756       0.063      1.352       1.397         1          1.352
    2     10.1       6.722   3.378       0.0599     1.657       1.709         1          1.657
    3     10.3       8.045   2.255       0.0857     1.106       1.157         1          1.106
    4     9.5        7.667   1.833       0.0805     0.9         0.938         1          0.9
    5     10         7.651   2.349       0.0729     1.153       1.197         1          1.153
    6     10         7.201   2.799       0.0756     1.374       1.429         1          1.374
(The complete regression table is not shown.)

Final Weighted Correlation Matrix

          y       x1      x2      x3
    y     1       0.94    0.957   0.813
    x1    0.94    1       0.977   0.774
    x2    0.957   0.977   1       0.833
    x3    0.813   0.774   0.833   1

Eigenvalues of Final Weighted Correlation Matrix

    Eval 1   Eval 2   Eval 3   Eval 4
    0.0165   0.0618   0.27     3.652
405

-------
Output for Huber (Leverage ON) (continued).

Huber Regression - Residuals QQ Plot
[Plot: standardized residuals vs. normal quantiles, with individual-residual bands at Indiv-Res(0.05) = +/-1.94.]

Huber Regression - Residuals vs Unsquared Distance Plot
[Plot: standardized residuals vs. unsquared leverage distances, with individual-residual bands at +/-1.94 and vertical leverage cutoffs at Indiv-MD(0.05) = +2.74 and Max-MD(0.05) = +3.94.]
406

-------
Output for Huber (Leverage ON) (continued).

Huber Regression - Y vs X Plot
[Scatter plot: y vs. the selected X variable.]
Interpretation of Graphs: Observations which are outside of the horizontal lines in the graphs are
considered to be regression outliers. The observations to the right of the vertical lines are considered to be
leverage outliers. Regression lines are not produced since there are three predictor variables. Select other
"X" variables by using the drop-down bar in the graphics panel and click on "Redraw."
407

-------
Output for Huber (Leverage OFF).

Regression Analysis Output
Date/Time of Computation: 3/5/2008 8:15:51 AM
User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision: OFF
Selected Regression Method: Huber
Residual Influence Function Alpha: 0.05 (Used to Identify Vertical Regression Outliers)
Number of Regression Iterations: 10 (Maximum Number if doesn't Converge)
Leverage: Off
Y vs Y-hat Plot: Not Selected
Y vs Residual Plot: Not Selected
Y-hat vs Residual Plot: Not Selected
Title For Y vs X Plots: Huber Regression - Y vs X Plot
Title for Residual QQ Plot: Huber Regression - Residuals QQ Plot
Residual Band Alpha: 0.05 (Used in Graphics Residual Bands)
Title Residual vs Distance Plot: Huber Regression - Residuals vs Unsquared Leverage Distance Plot
Show Intermediate Results: Do Not Display Intermediate Results
Intermediate Results Shown on Another Output Sheet

Number of Selected Regression Variables: 3
Number of Observations: 75
Dependent Variable: y

Residual Values used with Graphics Display
Upper Residual Individual (0.05) MD: 1.94
Lower Residual Individual (0.05) MD: -1.94
Correlation Matrix
        y       x1      x2      x3
  y     1       0.946   0.962   0.743
  x1    0.946   1       0.979   0.708
  x2    0.962   0.979   1       0.757
  x3    0.743   0.708   0.757   1

Eigenvalues of Correlation Matrix
  Eval 1: 0.0172   Eval 2: 0.0556   Eval 3: 0.368   Eval 4: 3.559





Ordinary Least Squares (OLS) Regression Results

Estimates of Regression Parameters
  Intercept: -0.388   x1: 0.239   x2: -0.335   x3: 0.383

Stdv of Estimated Regression Parameters
  Intercept: 0.416   x1: 0.262   x2: 0.155   x3: 0.128

ANOVA Table
  Source of Variation   SS      DOF   MS      F-Value   P-Value
  Regression            543.3   3     181.1   35.77     0.0000
  Error                 359.5   71    5.063
  Total                 902.8   74

408

-------
Output for Huber (Leverage OFF) (continued).

R Square Estimate: 0.602
MAD Based Scale Estimate: 1.067
Weighted Scale Estimate: 2.25
IQR Estimate of Residuals: 1.468
Det. of COV[Regression Coefficients] Matrix: 5.5107E-8

Final Reweighted Regression Results

Estimates of Regression Parameters
  Intercept: -0.413   x1: 0.237   x2: -0.382   x3: 0.44

Stdv of Estimated Regression Parameters
  Intercept: 0.378   x1: 0.238   x2: 0.142   x3: 0.118

ANOVA Table
  Source of Variation   SS      DOF     MS      F-Value   P-Value
  Regression            610     3       203.3   48.96     0.0000
  Error                 290.9   70.05   4.153
  Total                 900.9   73.05

R Square Estimate: 0.677
MAD Based Scale Estimate: 1.158
Weighted Scale Estimate: 2.038
IQR Estimate of Residuals: 1.579
Det. of COV[Regression Coefficients] Matrix: 2.8216E-8

Regression Table
  Obs  Y Vector  Yhat   Residuals  Hat[i,i]  Res/Scale  Student Res  Wts[i,i]  Res Dist
  1    9.7       6.944  2.756      0.063     1.352      1.397        1         1.352
  2    10.1      6.722  3.378      0.0599    1.657      1.709        1         1.657
  3    10.3      8.045  2.255      0.0857    1.106      1.157        1         1.106
  4    9.5       7.667  1.833      0.0805    0.9        0.938        1         0.9
  5    10        7.651  2.349      0.0729    1.153      1.197        1         1.153
  6    10        7.201  2.799      0.0756    1.374      1.429        1         1.374
  7    10.8      6.895  3.905      0.068     1.916      1.985        1         1.916
(The complete regression table is not shown.)

Final Weighted Correlation Matrix
        y       x1      x2      x3
  y     1       0.94    0.957   0.813
  x1    0.94    1       0.977   0.774
  x2    0.957   0.977   1       0.833
  x3    0.813   0.774   0.833   1

Eigenvalues of Final Weighted Correlation Matrix
  Eval 1: 0.0165   Eval 2: 0.0618   Eval 3: 0.27   Eval 4: 3.652

-------
Output for Huber (Leverage OFF) (continued).

Huber Regression - Residuals QQ Plot
[Plot: standardized residuals vs. normal quantiles, with individual-residual bands at Indiv-Res(0.05) = +/-1.94.]

Huber Regression - Y vs X Plot
[Scatter plot: y vs. x1.]
Interpretation of Graphs: Observations which are outside of the horizontal lines in the graph are
considered to be regression outliers. The Leverage Distances vs. Standardized residuals plot is not
produced even if checked on. Regression lines are not produced since there are three predictor variables.
Select other "X" variables by using the drop-down bar in the graphics panel and click on "Redraw."
410

-------
9.7 MVT Regression Method
1. Click Regression > MVT.
[Screenshot: the Scout Regression menu, listing OLS, LMS, Iterative OLS, Biweight, Huber, MVT, PROP, and Method Comparison.]
2. The "Select Variables" screen (Section 3.3) will appear.
o Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.
° Click on the "Options" button to get the options window.
[Screenshot: the MVT Options window, with the Regression Values panel (Residual Trim Percent 0.1, Alpha for Residual Outliers 0.05), Number of Regression Iterations (10, Max = 50), Residuals MDs Distribution (Beta or Chisquare), Intermediate Iterations display choices (Do Not Display, Every 5th, Every 4th, Every 2nd, or All), and the Identify Leverage Points panel with the Select Leverage Distance Method choices (Classical, Sequential Classical, Huber, ...).]
-------
o Specify the "Number of Regression Iterations." The default is "10."
o Specify the "Residuals MDs Distribution." The default is "Beta."
o Specify the "Identify Leverage Points." The default is "On."
o Specify the "Select Leverage Distance Method." The default is "PROP."
o Specify the "Number of Leverage Iterations." The default is "10."
o Specify the "Leverage Initial Distances." The default is "OKG (Maronna Zamar)."
o Specify the "Leverage Value." The default is "0.05."
o Click "OK" to continue or "Cancel" to cancel the options.
o Click on the "Graphics" button to get the options window.
[Screenshot: the OptionsRegressionGraphics window, with check boxes for XY Plots, Y vs Y-Hat, Y vs Residuals, Y-Hat vs Residuals, Residuals vs Leverage, and QQ Residuals; editable plot titles (e.g., "MVT Regression - Y vs X Plot"); Regression Line options fixing other regressors at No Line, Minimum, Mean, Maximum, or Zero Values; Confidence Interval and Prediction Interval check boxes with Confidence Coefficient 0.95; Graphics Distribution (Beta or Chisquare); and Residual/Lev. Alpha 0.05.]
o Specify the preferred plots and the input parameters.
o Click "OK" to continue or "Cancel" to cancel the options.
o Click "OK" to continue or "Cancel" to cancel the computations.
412

-------
Output example: The data set "STACKLOSS.xls" was used for MVT regression. It has 3 predictor variables (p) and 21 observations. When the "Leverage" option is on, the leverage distances are calculated and outlying observations are identified iteratively, using the median and the OKG matrix as the initial estimates and PROP as the leverage option (i.e., using the PROP influence function). Weights are then assigned to the observations, and those weights are used in finding the regression outliers iteratively. When the leverage option is off, all observations are assigned weights of one (1), and the regression outliers are found iteratively using the trimming percentage and a critical alpha. Finally, the estimated regression parameters are calculated.
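The trimming loop described above can be sketched in a few lines of Python. This is a simplified illustration using NumPy, not Scout's actual source: the function name, the MAD-based scale, and the quantile cutoff are assumptions made for the sketch.

```python
import numpy as np

def trimmed_ols(X, y, trim=0.10, iters=10):
    """Illustrative MVT-style loop (not Scout's code): fit OLS, trim the
    'trim' fraction of observations with the largest MAD-scaled residuals,
    and refit until the retained set stops changing."""
    X1 = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    keep = np.ones(len(y), dtype=bool)          # start with every observation
    beta = np.zeros(X1.shape[1])
    for _ in range(iters):
        beta, *_ = np.linalg.lstsq(X1[keep], y[keep], rcond=None)
        res = y - X1 @ beta                     # residuals for ALL points
        scale = 1.4826 * np.median(np.abs(res - np.median(res)))  # MAD scale
        cutoff = np.quantile(np.abs(res / scale), 1.0 - trim)
        new_keep = np.abs(res / scale) <= cutoff
        if np.array_equal(new_keep, keep):      # trimmed set has converged
            break
        keep = new_keep
    return beta, keep

# Toy check: y = 2 + 3x with two planted regression outliers.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 21)
y = 2 + 3 * x + rng.normal(0, 0.1, 21)
y[:2] += 25                                     # plant vertical outliers
beta, keep = trimmed_ols(x.reshape(-1, 1), y)   # outliers end up trimmed
```

With a 10% trim on 21 observations, roughly the two largest scaled residuals are excluded each pass, so the planted outliers drop out and the refit recovers the clean line.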
413

-------
Output for MVT (Leverage ON).
Data Set Used: Stackloss (predictor variables p = 3).

Regression Analysis Output
Date/Time of Computation: 3/5/2008 8:22:37 AM
User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\STACKLOSS
Full Precision: OFF
Selected Regression Method: Multivariate Trimming (MVT)
Residual MVT Trimming Percentage: 0.1 (Used to Identify Vertical Regression Outliers)
Alpha for Residual Outliers: 0.05 (Planned Future Modification; Used to Compare Residual MVT MDs)
Number of Regression Iterations: 10 (Maximum Number if doesn't Converge)
Leverage: Identify Leverage Points (Outliers in X-Space)
Selected Leverage Method: PROP
Initial Leverage Distance Method: OKG (Maronna Zamar) Matrix
Squared MDs: Beta Distribution used for Leverage Distances based upon Selected Regression (Leverage) Variables
Leverage Distance Alpha: 0.05 (Used to Identify Leverage Points)
Number of Leverage Iterations: 10 (Maximum Number if doesn't Converge)
Y vs Y-hat Plot: Not Selected
Y vs Residual Plot: Not Selected
Y-hat vs Residual Plot: Not Selected
Title For Y vs X Plots: MVT Regression - Y vs X Plot
Title for Residual QQ Plot: MVT Regression - Residuals QQ Plot
Residual Band Alpha: 0.05 (Used in Graphics Residual Bands)
Title Residual vs Distance Plot: MVT Regression - Residuals vs Unsquared Leverage Distance Plot
Show Intermediate Results: Do Not Display Intermediate Results
Intermediate Results Shown on Another Output Sheet
Leverage Points are Outliers in X-Space of Selected Regression Variables.

Number of Selected Regression Variables: 3
Number of Observations: 21
Dependent Variable: Stack-Loss

Residual Values used with Graphics Display
Upper Residual Individual (0.05) MD: 1.889
Lower Residual Individual (0.05) MD: -1.889

Correlation Matrix
             Stack-Loss  Air-Flow  Temp   Acid-Conc
  Stack-Loss 1           0.782     0.5    0.92
  Air-Flow   0.782       1         0.391  0.878
  Temp       0.5         0.391     1      0.4
  Acid-Conc  0.92        0.878     0.4    1

Eigenvalues of Correlation Matrix
  Eval 1: 0.0532   Eval 2: 0.215   Eval 3: 0.734   Eval 4: 2.997

414

-------
Output for MVT (Leverage ON) (continued).

Ordinary Least Squares (OLS) Regression Results

Estimates of Regression Parameters
  Intercept: -39.92   Air-Flow: 0.716   Temp: 1.295   Acid-Conc: -0.152

Stdv of Estimated Regression Parameters
  Intercept: 11.9   Air-Flow: 0.135   Temp: 0.368   Acid-Conc: 0.156

ANOVA Table
  Source of Variation   SS      DOF   MS      F-Value   P-Value
  Regression            1890    3     630.1   59.9      0.0000
  Error                 178.8   17    10.52
  Total                 2069    20

R Square Estimate: 0.914
MAD Based Scale Estimate: 2.768
Weighted Scale Estimate: 3.243
IQR Estimate of Residuals: 4.313
Det. of COV[Regression Coefficients] Matrix: 1.0370E-5

Initial Weighted Regression Iteration with Identified Leverage Points

Estimates of Regression Parameters
  Intercept: -39.54   Air-Flow: 0.709   Temp: 1.291   Acid-Conc: -0.151

Stdv of Estimated Regression Parameters
  Intercept: 12.1   Air-Flow: 0.143   Temp: 0.373   Acid-Conc: 0.162

ANOVA Table
  Source of Variation   SS      DOF     MS      F-Value   P-Value
  Regression            1421    3       473.5   44.06     0.0000
  Error                 172.2   16.03   10.75
  Total                 1593    19.03

R Square Estimate: 0.892
MAD Based Scale Estimate: 2.738
Weighted Scale Estimate: 3.278
Unsquared Leverage Distance Indiv-MD(0.05): 2.619
IQR Estimate of Residuals: 4.169
Determinant of Leverage S Matrix: 5200

Regression Table with Leverage Option
  Obs  Y Vector  Yhat   Residuals  Hat[i,i]  Res/Scale  Student Res  Wts[i,i]  Res Dist  Lev Dist  OLS Res Dist
  1    42        38.58  3.417      0.302     1.042      1.247        0.562     1.042     2.931     0.997
  2    37        38.73  -1.734     0.318     -0.529     -0.641       0.497     0.529     3.02      0.591
  3    37        32.31  4.694      0.175     1.432      1.576        1         1.432     2.073     1.405
(The complete regression table is not shown.)

-------
Output for MVT (Leverage ON) (continued).

Final Reweighted Regression Results

Estimates of Regression Parameters
  Intercept: -42.45   Air-Flow: 0.857   Temp: 0.556   Acid-Conc: -0.109

Stdv of Estimated Regression Parameters
  Intercept: 7.385   Air-Flow: 0.0945   Temp: 0.264   Acid-Conc: 0.0968

ANOVA Table
  Source of Variation   SS      DOF   MS     F-Value   P-Value
  Regression            1890    3     630    158.1     0.0000
  Error                 59.78   15    3.986
  Total                 1950    18

R Square Estimate: 0.969
MAD Based Scale Estimate: 2.069
Weighted Scale Estimate: 1.996
IQR Estimate of Residuals: 2.995
Det. of COV[Regression Coefficients] Matrix: 3.4468E-7

Regression Table
  Obs  Y Vector  Yhat   Residuals  Hat[i,i]  Res/Scale  Student Res  Wts[i,i]  Res Dist
  1    42        39.4   2.604      0.302     1.305      1.561        1         1.305
  2    37        39.5   -2.504     0.318     -1.254     -1.519       1         1.254
  3    37        33.39  3.607      0.175     1.807      1.989        1         1.807
  4    28        20.73  7.273      0.129     3.643      3.902        0         3.643
  5    18        19.62  -1.616     0.0522    -0.81      -0.832       1         0.81
  6    18        20.17  -2.172     0.0775    -1.088     -1.133       1         1.088
(The complete regression table is not shown.)

Final Weighted Correlation Matrix
             Stack-Loss  Air-Flow  Temp   Acid-Conc
  Stack-Loss 1           0.836     0.474  0.979
  Air-Flow   0.836       1         0.418  0.869
  Temp       0.474       0.418     1      0.423
  Acid-Conc  0.979       0.869     0.423  1

Eigenvalues of Final Weighted Correlation Matrix
  Eval 1: 0.0169   Eval 2: 0.19   Eval 3: 0.724   Eval 4: 3.069

416

-------
Output for MVT (Leverage ON) (continued).

MVT Regression - Residuals QQ Plot
[Plot: standardized residuals vs. normal quantiles, with individual-residual bands at Indiv(0.05) = +/-1.89.]

MVT Regression - Residuals vs Unsquared Distance Plot
[Plot: standardized residuals vs. unsquared leverage distances, with individual-residual bands at +/-1.89 and vertical leverage cutoffs at Indiv-MD(0.05) = +2.62 and Max-MD(0.05) = +3.27.]

417

-------
Output for MVT (Leverage ON) (continued).

MVT Regression - Y vs X Plot
[Scatter plot: Stack-Loss vs. Air-Flow.]
Interpretation of Graphs: Observations which are outside of the horizontal lines in the graphs are
considered to be regression outliers. The observations to the right of the vertical lines are considered to be
leverage outliers. The regression lines are not produced since there are three predictor variables. Select
other "X" variables by using the drop-down bar in the graphics panel and click on "Redraw."
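In the regression tables above, the "Student Res" column can be reproduced from the "Res/Scale" and "Hat[i,i]" columns by the usual internal studentization formula. This relationship is assumed from standard regression diagnostics rather than quoted from the Scout source, and the tiny helper below is illustrative only.

```python
import math

def studentized(res_over_scale, hat):
    """Internal studentization: the scaled residual divided by
    sqrt(1 - hat), so high-leverage points get inflated residuals."""
    return res_over_scale / math.sqrt(1.0 - hat)

# Observation 1 of the final MVT table above has Res/Scale = 1.305 and
# Hat[i,i] = 0.302, which reproduces the tabulated Student Res of ~1.561.
value = studentized(1.305, 0.302)
```

The same check works on any row of the tables in this chapter, which is a quick way to spot OCR or transcription errors in the listings.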
418

-------
Output for MVT (Leverage OFF).

Regression Analysis Output
Date/Time of Computation: 11/10/2008 8:47:07 AM
User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\STACKLOSS
Full Precision: OFF
Selected Regression Method: Multivariate Trimming (MVT)
MVT Trim Percentage: 0.05 (Used to Identify Regression Outliers)
Number of Regression Iterations: 10 (Maximum Number if doesn't Converge)
Leverage: Off
Res-Lev Rectangle Alpha: 0.05 (Used with Graphics Confidence Bands)
Title for Residual QQ Plot: MVT Regression - Residuals QQ Plot
Title Residual vs Distance Plot: MVT Regression - Residuals vs Unsquared Distance Plot
Title For Y vs X Plots: MVT Regression - Y vs X Plot
Residual QQ Plot: Not Selected
Y vs Residual Plot: Not Selected
Y-hat vs Residual Plot: Not Selected
Show Intermediate Results: Do Not Display Intermediate Results
Intermediate Results Shown on Another Output Sheet

Number of Selected Regression Variables: 3
Number of Observations: 21
Dependent Variable: Stack-Loss

Ordinary Least Squares (OLS) Regression Results

Regression Parameters Vector Estimates
  Intercept: -39.92   Air-Flow: 0.716   Temp: 1.295   Acid-Conc: -0.152

Stdv of Regression Estimates Vector
  Intercept: 11.9   Air-Flow: 0.135   Temp: 0.368   Acid-Conc: 0.156

ANOVA Table
  Source of Variation   SS      DOF   MS      F-Value   P-Value
  Regression            1890    3     630.1   59.9      0.0000
  Error                 178.8   17    10.52
  Total                 2069    20

419

-------
Output for MVT (Leverage OFF) (continued).

R Square Estimate: 0.914
MAD Based Scale Estimate: 2.768
Weighted Scale Estimate: 3.243
IQR Estimate of Residuals: 4.313
Det. of COV[Regression Coefficients] Matrix: 1.0370E-5

Results From the Regression Operation

Regression Parameters Vector Estimates
  Intercept: -43.7   Air-Flow: 0.889   Temp: 0.817   Acid-Conc: -0.107

Stdv of Regression Estimates Vector
  Intercept: 9.432   Air-Flow: 0.119   Temp: 0.325   Acid-Conc: 0.125

ANOVA Table
  Source of Variation   SS      DOF   MS      F-Value   P-Value
  Regression            1957    3     652.3   98.82     0.0000
  Error                 105.6   16    6.601
  Total                 2063    19

R Square Estimate: 0.949
MAD Based Scale Estimate: 3.046
Weighted Scale Estimate: 2.569
IQR Estimate of Residuals: 3.365
Det. of COV[Regression Coefficients] Matrix: 2.2471E-6

Regression Table
  Obs  Y Vector  Yhat   Residuals  Hat[i,i]  Res/Scale  Student Res  Wts[i,i]  Res Dist
  1    42        39.94  2.062      0.302     0.803      0.96         1         0.803
  2    37        40.04  -3.045     0.318     -1.185     -1.435       1         1.185
  3    37        33.75  3.248      0.175     1.264      1.392        1         1.264
  4    28        21.7   6.302      0.129     2.453      2.627        1         2.453
  5    18        20.07  -2.065     0.0522    -0.804     -0.826       1         0.804
  6    18        20.88  -2.882     0.0775    -1.122     -1.168       1         1.122
  7    19        21.06  -2.055     0.219     -0.8       -0.905       1         0.8
  8    20        21.06  -1.055     0.219     -0.411     -0.465       1         0.411
  9    15        17.33  -2.325     0.14      -0.905     -0.976       1         0.905
(The complete regression table is not shown.)
420

-------
Output for MVT (Leverage OFF) (continued).

MVT Regression - Residuals QQ Plot
[Plot: standardized residuals vs. normal quantiles, with individual-residual bands at the 0.05 level.]

-------
Output for MVT (Leverage OFF) (continued).

MVT Regression - Y vs X Plot
[Scatter plot: Stack-Loss vs. Air-Flow.]
Interpretation of Graphs: Observations which are outside of the horizontal lines in the graph are considered to be regression outliers. The Leverage Distances vs. Standardized Residuals plot is not produced even if it is checked on. Regression lines are not produced since there are three predictor variables. Select other "X" variables by using the drop-down bar in the graphics panel and click on "Redraw."
Note: There are at least four regression outliers (1, 3, 4, and 21) in the data set of size 21 on the previous page. However, the trimming percentage selected is only 5%, which is equivalent to one outlier in a data set of size 21. The user may want to use the MVT method with a higher trimming percentage to identify all of the outliers.
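The note's arithmetic can be checked directly. Under a simplified reading of one trimming pass, the number of observations removed is roughly the trimming fraction times the sample size, rounded to the nearest whole observation; the helper below is illustrative, not Scout's code.

```python
def approx_trim_count(n_obs, trim_pct):
    """Rough number of observations removed per trimming pass:
    the trimming fraction of the sample, rounded to a whole observation."""
    return round(trim_pct * n_obs)

five_pct = approx_trim_count(21, 0.05)    # the 5% default: about one point
twenty_pct = approx_trim_count(21, 0.20)  # a higher percentage: four points
```

So with four outliers in 21 observations, a trimming percentage of roughly 20% (rather than the 5% default) is needed before all four can be removed.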
9.8 PROP Regression Method
1. Click Regression > PROP.
[Screenshot: the Scout Regression menu with PROP selected; the menu lists OLS, LMS, Iterative OLS, Biweight, Huber, MVT, PROP, and Method Comparison.]
2. The "Select Variables" screen (Section 3.3) will appear.
422

-------
o Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.
o Click on the "Options" button to get the options window.
[Screenshot: the PROP Options window, with Residual Influence Function Alpha (0.05), Number of Regression Iterations (10, Max = 50), Residuals MDs Distribution (Beta or Chisquare), Intermediate Iterations display choices, Identify Leverage Points (Leverage), Select Leverage Distance Method (Classical, Sequential Classical, Huber, PROP, or MVT (Trimming)), Number of Leverage Iterations (10, Max = 50), Leverage MDs Distribution (Beta or Chisquare), Initial Leverage Distances (Classical, Sequential Classical, Robust (Median, 1.48MAD), OKG (Maronna Zamar), KG (Not Orthogonalized), or MCD), Leverage Value(s), Display Intervals with Confidence Coefficient 0.95, Display Diagnostics, and Leverage Influence Function Alpha.]
o Specify the "Regression Value." The default is "0.05."
o Specify the "Number of Regression Iterations." The default is "10."
o Specify the "Residuals MDs Distribution." The default is "Beta."
o Specify the "Identify Leverage Points." The default is "On."
o Specify the "Select Leverage Distance Method." The default is "PROP."
o Specify the "Number of Leverage Iterations." The default is "10."
o Specify the "Leverage Initial Distances." The default is "OKG (Maronna Zamar)."
423

-------
o Specify the "Leverage Value." The default is "0.05."
o Click "OK" to continue or "Cancel" to cancel the options.
o Click on the "Graphics" button to get the options window.
[Screenshot: the OptionsRegressionGraphics window, with check boxes for XY Plots, Y vs Y-Hat, Y vs Residuals, Y-Hat vs Residuals, Residuals vs Leverage, and QQ Residuals; editable plot titles (e.g., "PROP Regression - Y vs X Plot"); Regression Line options fixing other regressors at No Line, Minimum, Mean, Maximum, or Zero Values; Confidence Interval and Prediction Interval check boxes with Confidence Coefficient 0.95; Graphics Distribution (Beta or Chisquare); and Residual/Lev. Alpha 0.05.]
o Specify the preferred plots and the input parameters.
o Click "OK" to continue or "Cancel" to cancel the options.
o Click "OK" to continue or "Cancel" to cancel the computations.
Output example: The data set "STARCLS.xls" was used for PROP regression. It has 1 predictor variable (p) and 47 observations. When the "Leverage" option is on, the leverage distances are calculated and outlying observations are identified iteratively, using the median and the OKG matrix as the initial estimates and PROP as the leverage option (i.e., using the PROP influence function). Weights are then assigned to the observations, and those weights are used in finding the regression outliers iteratively. When the leverage option is off, all observations are assigned weights of one (1), and the regression outliers are found iteratively using the PROP function. Finally, the estimated regression parameters are calculated.
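The reweighting loop described above can be sketched as iteratively reweighted least squares. Scout's PROP influence function is not reproduced here; the sketch substitutes the well-known Huber weight function as a stand-in, and all names are illustrative.

```python
import numpy as np

def irls(X, y, c=1.345, iters=10):
    """Iteratively reweighted least squares with a Huber-type weight
    function (a stand-in for Scout's PROP influence function): refit a
    weighted OLS, then shrink the weights of large scaled residuals."""
    X1 = np.column_stack([np.ones(len(y)), X])
    w = np.ones(len(y))                          # leverage off: start at 1
    beta = np.zeros(X1.shape[1])
    for _ in range(iters):
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)
        res = y - X1 @ beta
        scale = 1.4826 * np.median(np.abs(res - np.median(res)))  # MAD scale
        u = np.abs(res / scale)
        w = np.where(u <= c, 1.0, c / u)         # downweight large residuals
    return beta, w

# Toy check: a clean line with three gross vertical outliers.
rng = np.random.default_rng(1)
x = rng.uniform(0, 5, 40)
y = 1 + 2 * x + rng.normal(0, 0.2, 40)
y[:3] -= 10
beta, w = irls(x.reshape(-1, 1), y)              # outliers get tiny weights
```

This mirrors the description above: points with small scaled residuals keep weight 1, while gross outliers receive near-zero weights, much like the Wts[i,i] column in the output tables.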
424

-------
Output for PROP (Leverage ON).
Data Set Used: Star Cluster (predictor variables p = 1).

Regression Analysis Output
Date/Time of Computation: 3/12/2008 8:09:44 AM
User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\STARCLS
Full Precision: OFF
Selected Regression Method: PROP
Residual Influence Function Alpha: 0.05 (Used to Identify Vertical Regression Outliers)
Number of Regression Iterations: 10 (Maximum Number if doesn't Converge)
Leverage: Identify Leverage Points (Outliers in X-Space)
Selected Leverage Method: PROP
Initial Leverage Distance Method: OKG (Maronna Zamar) Matrix
Squared MDs: Beta Distribution used for Leverage Distances based upon Selected Regression (Leverage) Variables
Leverage Distance Alpha: 0.05 (Used to Identify Leverage Points)
Number of Leverage Iterations: 10 (Maximum Number if doesn't Converge)
Y vs Y-hat Plot: Not Selected
Y vs Residual Plot: Not Selected
Y-hat vs Residual Plot: Not Selected
Title For Y vs X Plots: PROP Regression - Y vs X Plot
Title for Residual QQ Plot: PROP Regression - Residuals QQ Plot
Residual Band Alpha: 0.05 (Used in Graphics Residual Bands)
Title Residual vs Distance Plot: PROP Regression - Residuals vs Unsquared Leverage Distance Plot
Show Intermediate Results: Do Not Display Intermediate Results
Intermediate Results Shown on Another Output Sheet
Leverage Points are Outliers in X-Space of Selected Regression Variables.

Number of Selected Regression Variables: 1
Number of Observations: 47
Dependent Variable: y

Residual Values used with Graphics Display
Upper Residual Individual (0.05) MD: 1.929
Lower Residual Individual (0.05) MD: -1.929

Correlation Matrix
      y       X
  y   1       -0.21
  X   -0.21   1

425

-------
Output for PROP (Leverage ON) (continued).

Eigenvalues of Correlation Matrix
  Eval 1: 0.79   Eval 2: 1.21

Ordinary Least Squares (OLS) Regression Results

Estimates of Regression Parameters
  Intercept: 6.793   X: -0.413

Stdv of Estimated Regression Parameters
  Intercept: 1.237   X: 0.286

ANOVA Table
  Source of Variation   SS      DOF   MS      F-Value   P-Value
  Regression            0.665   1     0.665   2.085     0.1557
  Error                 14.35   45    0.319
  Total                 15.01   46

R Square Estimate: 0.0443
MAD Based Scale Estimate: 0.651
Weighted Scale Estimate: 0.565
IQR Estimate of Residuals: 1.025
Det. of COV[Regression Coefficients] Matrix: 5.5584E-4

Initial Weighted Regression Iteration with Identified Leverage Points

Estimates of Regression Parameters
  Intercept: -7.97   X: 2.93

Stdv of Estimated Regression Parameters
  Intercept: 2.396   X: 0.543

(The complete regression table is not shown.)
426

-------
Output for PROP (Leverage ON) (continued).

ANOVA Table
  Source of Variation   SS      DOF     MS      F-Value   P-Value
  Regression            4.205   1       4.205   29.09     0.0000
  Error                 5.647   39.06   0.145
  Total                 9.852   40.06

R Square Estimate: 0.427
MAD Based Scale Estimate: 0.45
Weighted Scale Estimate: 0.38
Unsquared Leverage Distance Indiv-MD(0.05): 1.929
IQR Estimate of Residuals: 0.605
Determinant of Leverage S Matrix: 0.0118

Regression Table with Leverage Option
  Obs  Y Vector  Yhat   Residuals  Hat[i,i]  Res/Scale  Student Res  Wts[i,i]   Res Dist  Lev Dist  OLS Res Dist
  1    5.23      4.836  0.394      0.0222    1.037      1.049        1          1.037     0.348     0.43
  2    5.74      5.393  0.347      0.0373    0.914      0.931        1          0.914     1.405     1.472
  3    4.93      4.513  0.417      0.0219    1.096      1.108        1          1.096     1.362     0.182
  4    5.74      5.393  0.347      0.0373    0.914      0.931        1          0.914     1.405     1.472
  5    5.19      4.631  0.559      0.0213    1.471      1.487        1          1.471     0.993     0.308
  6    5.46      5.1    0.36       0.0271    0.948      0.961        1          0.948     0.483     0.903
  7    4.65      3.283  1.367      0.0781    3.596      3.745        0.0135     3.596     5.237     0.985
  8    5.27      5.422  -0.152     0.0387    -0.399     -0.407       1          0.399     1.497     0.647
  9    5.57      4.513  1.057      0.0219    2.779      2.81         1          2.779     1.362     0.951
  10   5.12      4.836  0.284      0.0222    0.748      0.756        1          0.748     0.348     0.235
  11   5.73      2.257  3.473      0.194     9.134      10.17        3.3033E-4  9.134     8.465     0.671
  12   5.45      5.012  0.438      0.025     1.153      1.168        1          1.153     0.206     0.863
  13   5.42      5.158  0.262      0.0287    0.689      0.699        1          0.689     0.667     0.847
  14   4.05      3.781  0.269      0.0444    0.708      0.724        0.0923     0.708     3.668     1.924
  15   4.26      4.601  -0.341     0.0214    -0.898     -0.908       1          0.898     1.086     1.347
  16   4.58      4.982  -0.402     0.0244    -1.058     -1.071       1          1.058     0.114     0.685
  17   3.94      4.426  -0.486     0.0229    -1.277     -1.292       1          1.277     1.639     1.957
  18   4.18      4.982  -0.802     0.0244    -2.11      -2.136       1          2.11      0.114     1.393
  19   4.18      4.426  -0.246     0.0229    -0.646     -0.653       1          0.646     1.639     1.532
  20   5.89      2.257  3.633      0.194     9.555      10.64        3.3033E-4  9.555     8.465     0.955
  21   4.38      4.601  -0.221     0.0214    -0.582     -0.589       1          0.582     1.086     1.134
  22   4.22      4.601  -0.381     0.0214    -1.003     -1.014       1          1.003     1.086     1.418
  23   4.42      4.982  -0.562     0.0244    -1.479     -1.497       1          1.479     0.114     0.968
(The complete regression table is not shown.)
427

-------
Output for PROP (Leverage ON) (continued).

Final Reweighted Regression Results

Estimates of Regression Parameters
  Intercept: -7.955   X: 2.926

Stdv of Estimated Regression Parameters
  Intercept: 1.911   X: 0.434

ANOVA Table
  Source of Variation   SS      DOF     MS      F-Value   P-Value
  Regression            5.401   1       5.401   45.44     0.0000
  Error                 4.614   38.83   0.119
  Total                 10.01   39.83

R Square Estimate: 0.539
MAD Based Scale Estimate: 0.45
Weighted Scale Estimate: 0.345
IQR Estimate of Residuals: 0.607
Det. of COV[Regression Coefficients] Matrix: 5.4829E-4

Regression Table
  Obs  Y Vector  Yhat   Residuals  Hat[i,i]  Res/Scale  Student Res  Wts[i,i]  Res Dist
  1    5.23      4.831  0.399      0.0222    1.157      1.17         1         1.157
  2    5.74      5.387  0.353      0.0373    1.023      1.043        1         1.023
  3    4.93      4.509  0.421      0.0219    1.22       1.234        1         1.22
  4    5.74      5.387  0.353      0.0373    1.023      1.043        1         1.023
  5    5.19      4.626  0.564      0.0213    1.635      1.652        1         1.635
  6    5.46      5.095  0.365      0.0271    1.06       1.075        1         1.06
  7    4.65      3.281  1.369      0.0781    3.972      4.137        0.0628    3.972
  8    5.27      5.416  -0.146     0.0387    -0.425     -0.433       1         0.425
(The complete regression table is not shown.)

Final Weighted Correlation Matrix
      y       X
  y   1       0.759
  X   0.759   1

Eigenvalues of Final Weighted Correlation Matrix
  Eval 1: 0.241   Eval 2: 1.759

428

-------
Output for PROP (Leverage ON) (continued).

PROP Regression - Residuals QQ Plot
[Plot: standardized residuals vs. normal quantiles, with individual-residual bands at Indiv(0.05) = +/-1.93.]

PROP Regression - Residuals vs Unsquared Distance Plot
[Plot: standardized residuals vs. unsquared leverage distances, with individual-residual bands at +/-1.93 and vertical leverage cutoffs at Indiv-MD(0.05) = +1.93 and Max-MD(0.05) = +3.10.]

429

-------
Output for PROP (Leverage ON) (continued).

PROP Regression - Y vs X Plot
[Scatter plot: y vs. X, with both the classical and robust regression lines shown.]
Interpretation of Graphs: Observations which are outside of the horizontal lines on the residual Q-Q plot or on the residuals versus unsquared leverage distances plot represent regression outliers. Observations lying to the right of the vertical lines represent leverage outliers; leverage points lying between the two horizontal lines represent good leverage points, and the rest of the leverage points represent bad leverage points. Both the classical and robust regression lines are also shown on the y vs. x scatter plot.
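The classification rules in this paragraph can be written out explicitly. The default cutoffs below mirror the 0.05-level bands printed in this output (+/-1.93); they are inputs to the sketch, not constants of the method, and the function name is illustrative.

```python
def classify_point(res_dist, lev_dist, res_cut=1.93, lev_cut=1.93):
    """Apply the graph-reading rules: the horizontal bands flag regression
    outliers and the vertical line flags leverage points; a leverage point
    inside the bands is 'good', outside them it is 'bad'."""
    outlier = res_dist > res_cut       # outside the horizontal bands
    leverage = lev_dist > lev_cut      # to the right of the vertical line
    if leverage and outlier:
        return "bad leverage point"
    if leverage:
        return "good leverage point"
    if outlier:
        return "regression outlier"
    return "regular observation"

# Values taken from the leverage-ON regression table (Res Dist, Lev Dist):
obs11 = classify_point(9.134, 8.465)   # bad leverage point
obs14 = classify_point(0.708, 3.668)   # good leverage point
obs9 = classify_point(2.779, 1.362)    # regression outlier
```

Applied to the Star Cluster output, this reproduces the reading of the plots: the giant stars are the points far to the right, and those that also fall outside the residual bands are the bad leverage points.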
430

-------
Output for PROP (Leverage OFF).
In order to demonstrate the usefulness of the leverage options (when several leverage points may be
present), the Star cluster data is considered again with the leverage option off.
The output thus obtained is given as follows.
Regression Analysis Output

Date/Time of Computation: 3/12/2008 8:15:48 AM
User Selected Options:
  From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\STARCLS
  Full Precision: OFF
  Selected Regression Method: PROP
  Residual Influence Function Alpha: 0.05 (Used to Identify Vertical Regression Outliers)
  Number of Regression Iterations: 10 (Maximum Number if doesn't Converge)
  Leverage: Off
  Y vs Y-hat Plot: Not Selected
  Y vs Residual Plot: Not Selected
  Y-hat vs Residual Plot: Not Selected
  Title for Y vs X Plots: PROP Regression - Y vs X Plot
  Title for Residual QQ Plot: PROP Regression - Residuals QQ Plot
  Residual Band Alpha: 0.05 (Used in Graphics Residual Bands)
  Title for Residual vs Distance Plot: PROP Regression - Residuals vs Unsquared Leverage Distance Plot
  Show Intermediate Results: Do Not Display Intermediate Results (Intermediate Results Shown on Another Output Sheet)

Number of Selected Regression Variables
Number of Observations: 47
Dependent Variable

Residual Values used with Graphics Display:
  Upper Residual Individual (0.05) MD: 1.929
  Lower Residual Individual (0.05) MD: -1.929

Correlation Matrix
       Y        X
  Y    1       -0.21
  X   -0.21     1

Eigenvalues of Correlation Matrix
  Eval 1: 0.79
  Eval 2: 1.21
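A quick check on the eigenvalues above: for a 2 x 2 correlation matrix with off-diagonal element r, the eigenvalues are 1 - |r| and 1 + |r|, which gives 0.79 and 1.21 for r = -0.21. A one-line sketch:

```python
def corr2x2_eigenvalues(r):
    """Eigenvalues of the correlation matrix [[1, r], [r, 1]].

    The characteristic equation (1 - e)**2 - r**2 = 0 gives
    e = 1 - |r| and e = 1 + |r|; they always sum to the trace, 2.
    """
    return 1 - abs(r), 1 + abs(r)
```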
431

-------
Output for PROP (Leverage OFF) (continued).
Ordinary Least Squares (OLS) Regression Results

Estimates of Regression Parameters
  Intercept: 6.793
  X: -0.413

Stdv of Estimated Regression Parameters
  Intercept: 1.237
  X: 0.286

ANOVA Table

Source of Variation    SS      DOF    MS      F-Value    P-Value
Regression             0.665   1      0.665   2.085      0.1557
Error                  14.35   45     0.319
Total                  15.01   46

R Square Estimate: 0.0443
MAD Based Scale Estimate: 0.651
Weighted Scale Estimate: 0.565
IQR Estimate of Residuals: 1.025
Det. of COV[Regression Coefficients] Matrix: 5.5584E-4
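For reference, the OLS quantities reported above (the parameter estimates, the ANOVA decomposition, the F-value, and the R square estimate) can be reproduced for bivariate data with textbook formulas; the sketch below is not Scout's implementation:

```python
def ols_anova(x, y):
    """Bivariate OLS with the ANOVA decomposition SS_total = SS_reg + SS_err."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    yhat = [intercept + slope * xi for xi in x]
    ss_reg = sum((yh - my) ** 2 for yh in yhat)               # 1 degree of freedom
    ss_err = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))   # n - 2 degrees of freedom
    f_value = (ss_reg / 1) / (ss_err / (n - 2))
    r_square = ss_reg / (ss_reg + ss_err)
    return intercept, slope, f_value, r_square
```

The P-value in the table is the upper-tail probability of this F statistic with (1, n - 2) degrees of freedom.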




Final Reweighted Regression Results

Estimates of Regression Parameters
  Intercept: 6.799
  X: -0.414

Stdv of Estimated Regression Parameters
  Intercept: 1.235
  X: 0.286

ANOVA Table

Source of Variation    SS      DOF     MS      F-Value    P-Value
Regression             0.668   1       0.668   2.102      0.1540
Error                  14.29   44.95   0.318
Total                  14.95   45.95
432

-------
Output for PROP (Leverage OFF) (continued).
R Square Estimate: 0.0447
MAD Based Scale Estimate: 0.651
Weighted Scale Estimate: 0.564
IQR Estimate of Residuals: 1.025
Det. of COV[Regression Coefficients] Matrix: 5.5303E-4
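The MAD based and IQR estimates of the residual scale reported above can be sketched as follows. The normal-consistency constants (1.4826 for the MAD and 1.349 for the IQR) are the usual textbook values and are assumed here rather than taken from Scout's documentation:

```python
from statistics import median, quantiles

def mad_scale(residuals):
    """Median absolute deviation from the median, scaled so the
    estimate is consistent with the standard deviation under normality."""
    med = median(residuals)
    return 1.4826 * median([abs(r - med) for r in residuals])

def iqr_scale(residuals):
    """Interquartile range of the residuals divided by the normal IQR
    (approximately 1.349 standard deviations)."""
    q1, q2, q3 = quantiles(residuals, n=4)
    return (q3 - q1) / 1.349
```

Both estimates are insensitive to a few extreme residuals, which is why they are reported alongside the weighted scale estimate.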




Regression Table

Obs   Y Vector   Yhat    Residuals   Hat[i,i]   Res/Scale   Student Res   Wts[i,i]   Res Dist
1     5.23       4.988    0.242      0.0222      0.429       0.433        1          0.429
2     5.74       4.910    0.830      0.0373      1.473       1.501        1          1.473
3     4.93       5.034   -0.104      0.0219     -0.184      -0.187        1          0.184
4     5.74       4.910    0.830      0.0373      1.473       1.501        1          1.473
5     5.19       5.017    0.173      0.0213      0.306       0.309        1          0.306
6     5.46       4.951    0.509      0.0271      0.903       0.915        1          0.903
7     4.65       5.208   -0.558      0.0781     -0.990      -1.031        1          0.990
8     5.27       4.906    0.364      0.0387      0.646       0.659        1          0.646
9     5.57       5.034    0.536      0.0219      0.951       0.961        1          0.951
10    5.12       4.988    0.132      0.0222      0.233       0.236        1          0.233
11    5.73       5.353    0.377      0.194       0.669       0.745        1          0.669
12    5.45       4.964    0.486      0.025       0.863       0.874        1          0.863
13    5.42       4.943    0.477      0.0287      0.846       0.859        1          0.846
14    4.05       5.138   -1.088      0.0444     -1.929      -1.974        1          1.929
15    4.26       5.022   -0.762      0.0214     -1.351      -1.366        1          1.351
16    4.58       4.968   -0.388      0.0244     -0.688      -0.696        1          0.688
17    3.94       5.046   -1.106      0.0229     -1.963      -1.985        0.951      1.963
18    4.18       4.968   -0.788      0.0244     -1.397      -1.415        1          1.397
19    4.18       5.046   -0.866      0.0229     -1.537      -1.555        1          1.537
20    5.89       5.353    0.537      0.194       0.952       1.061        1          0.952
21    4.38       5.022   -0.642      0.0214     -1.138      -1.150        1          1.138


(The complete regression table is not shown.)



Final Weighted Correlation Matrix
       Y         X
  Y    1        -0.212
  X   -0.212     1

Eigenvalues of Final Weighted Correlation Matrix
  Eval 1: 0.788
  Eval 2: 1.212

-------
Q-Q Plot of Standardized Residuals for PROP (Leverage OFF).
(Figure: PROP Regression - Residuals QQ Plot. Standardized residuals are plotted against the normal quantiles, with horizontal residual bands at Ind-Res(0.05) = ±1.93.)
434

-------
Q-Q Plot of Standardized Residuals for PROP (Leverage OFF) (continued).
Interpretation of Graphs: Observations (if any) lying outside of the horizontal lines in the Q-Q plot are
considered to be regression outliers. The Leverage Distances vs. standardized residuals plot is not
produced as the leverage option was not activated. Regression lines are produced since there is only one
predictor variable. It is easy to see from the above graph (where both the classical and robust regression
lines are overlapping and attracted toward the outliers) that one should use the leverage option to properly
identify all of the leverage points. Once the leverage points are identified, the robust regression method
should be used to distinguish between the good and bad leverage points.
9.9 Method Comparison in Regression Module
The "Method Comparison" option in the "Regression" drop-down menu can be used to
compare the regression estimates of bivariate data obtained using various classical and
robust regression methods. Regression lines for the selected regression methods are
drawn on two-dimensional scatter plots. These comparisons are done in the "Bivariate
Regression Fits" drop-down menu. The method comparison module also compares the
residuals obtained by a single regression method against the residuals obtained from one
or more other methods. A comparison of fits (Y-hat) from one method against fits from
the other methods is done in a similar way. These comparisons of the residuals and fits
from the various regression methods are done in "R-R Plots" and "Y-Y-hat Plots," respectively.
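The idea behind an R-R plot can be sketched without the GUI: pair the residuals from two fits of the same data, and look for observations that fall far from the 45-degree line, where the two methods disagree. The disagreement tolerance below is illustrative:

```python
def rr_pairs(y, yhat_a, yhat_b):
    """Residual-residual pairs for two regression fits of the same y;
    these are the coordinates plotted on an R-R plot."""
    return [(yi - a, yi - b) for yi, a, b in zip(y, yhat_a, yhat_b)]

def disagreements(y, yhat_a, yhat_b, tol=1.0):
    """Indices (1-based, like the Obs column in the regression table)
    where the two methods' residuals differ by more than tol."""
    return [i + 1 for i, (ra, rb) in enumerate(rr_pairs(y, yhat_a, yhat_b))
            if abs(ra - rb) > tol]
```

Observations flagged this way are typically the points that one method accommodates and the other treats as outliers.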
435

-------
9.9.1 Bivariate Fits
1. Click Regression > Method Comparison > Bivariate Regression Fits.

(Screenshot: the Scout 2008 main window with the Regression > Method Comparison > Bivariate Regression Fits menu path selected.)
2. The "Select Variables" screen (Section 3.3) will appear.
(Screenshot: the "Select Variables to Graph" screen, with the Y-axis variable, X-axis variable, and group variable selections and the "Options" button.)
• Select the Y axis variable and the X axis variable from the "Select
Variables to Graph" screen.
• If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.
o Click on "Options" for method comparison options.
436

-------
(Screenshot: the "Select Regression Method Comparison Options" window, with check-boxes for OLS, LMS, and the Iterative OLS, Biweight, Huber, MVT, and PROP methods, each w/o and with leverage, plus a "Title for Graph" field.)
o The options in the window shown above represent the available
regression methods.
(Screenshot: the options window with the "w/o Leverage" methods selected, showing the "Number of Regression Iterations" (default 10, maximum 50), the Iterative OLS and/or MVT "Critical Alpha" (0.05), the Biweight "Tuning Constant," the Huber and/or PROP "Influence Function Alpha" (0.05), the "Regression MDs Distribution" (Beta or Chisquare), and the MVT "Trimming Percentage" (0.1).)
o The options selected in the window shown above are the options
for the regression methods without the leverage option.
437

-------
 The "Iterative OLS w/o Leverage" requires the input of a
"Critical Alpha."
 The "Biweight w/o Leverage" requires the input of a
"Tuning Constant."
 The "Huber w/o Leverage" requires the input of an
"Influence Function Alpha."
 The "MVT w/o Leverage" requires the input of a "Critical
Alpha" and a "Trimming Percentage."
 The "PROP w/o Leverage" requires the input of an
"Influence Function Alpha."
(Screenshot: the options window with the "with Leverage" methods selected. In addition to the fields above, it shows the "Leverage Distance Method" (Classical, Sequential Classical, Huber, PROP, or MVT (Trimming)), the "Initial Leverage Estimates" (Classical, Sequential Classical, Robust (Median, MAD), OKG (Maronna Zamar), KG (Not Orthogonalized), or MCD), the leverage Huber and/or PROP "Influence Function Alpha" (0.05), the "Leverage MDs Distribution" (Beta or Chisquare), and the "Number of Leverage Iterations" (default 10, maximum 50).)
o Options in the window shown above represent options for the
regression methods with leverage.
 The "Leverage Distance Method" remains the same for
any of the regression methods.
 The "Classical" and "Sequential Classical" require the
input of a "Critical Alpha."
 The "Huber" and "PROP" require the input of an
"Influence Function Alpha" and the "Leverage MDs
Distribution."
 The "MVT" requires the input of a "Critical Alpha" and a
"Trimming Percentage."
 The Leverage Distance Method requires an "Initial
Leverage Estimates" selection to start the computations.
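The leverage distances used by these methods are ordinary (unsquared) Mahalanobis distances of the predictor values from their center. A minimal classical sketch for a single predictor is shown below; Scout's robust variants (Huber, PROP, MVT) replace the classical mean and standard deviation with robust estimates:

```python
def leverage_distances(x):
    """Classical unsquared leverage (Mahalanobis) distances for a
    single predictor: |x_i - mean| / standard deviation."""
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / (n - 1)
    return [abs(xi - mean) / var ** 0.5 for xi in x]
```

Points whose distance exceeds the leverage cut-off (derived from the selected MDs distribution) are the leverage points shown to the right of the vertical lines on the plots.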
438

-------
Graphical Display for Method Comparisons Option.
Data Set: Bradu (X1 vs. Y).
Methods: OLS, LMS, PROP w/o Leverage (PROP influence function alpha for regression outliers = 0.2).
(Figure: Regression Line Plot.)
Data Set: Bradu (X3 vs. Y).
Methods: OLS, LMS, PROP w/o Leverage (PROP influence function alpha = 0.2).
(Figure: Regression Line Plot.)
439

-------
It is noted that the LMS (green line) method finds different sets of outliers when
compared to the PROP (violet line) method. As shown earlier, in the multiple linear LMS
regression of y on x1, x2, and x3, observations 1 through 10 were identified as regression
outliers (and bad leverage points). Here, the LMS regression of y on x1 (and also of y on
x2) also identified the first 10 points as regression outliers, whereas the LMS regression
of y on x3 identified observations 11, 12, 13, and 14 as bad leverage points and regression
outliers. However, the PROP method, without the leverage option, identified
observations 11, 12, 13, and 14 as regression outliers and bad leverage points for all of
the regression models: y vs. x1, x2, and x3; y vs. x1; y vs. x2; and y vs. x3. In practice, it
is desirable to supplement statistical results with graphical displays. In the present
context, graphical displays also help the user to determine points that may represent good
(or bad) leverage points. Any regression model for this data set should be obtained after
excluding the first 10 points.
Output for Method Comparisons.
Data Set: Bradu (X1 vs. Y).
Methods: PROP w/o Leverage (influence function alpha = 0.2), PROP with Leverage (initial estimates as
OKG, influence function alpha = 0.05), and OLS.
440

-------
Output for Method Comparisons.
Data Set: Bradu (X1 vs. Y).
Methods: All (12 methods).
(Figure: Regression Line Plot with all 12 methods, the legend listing OLS, LMS, and the Iterative OLS, PROP, Huber, Biweight, and MVT methods, each without and with leverage.)
As mentioned before, the user should select the various options carefully. It is suggested
not to select all of the available options to generate a single graph. Such a graph will be
cluttered with many regression lines. This is illustrated in the above figure.
Note: Sometimes a line will be outside the frame of the graph. In such cases, a warning message (in
orange) will be printed in the Log Panel.
441

-------
9.9.2 Multivariate R-R Plots
1. Click Regression > Method Comparison > Multivariate R-R Plot.
(Screenshot: the Scout 2008 main window with the Regression > Method Comparison > Multivariate R-R Plot menu path selected.)
2. The "Select Variables" screen (Section 3.3) will appear.
• Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
• Click on the "Options" button to get the options window.
o The options in the window shown below are the options when all of
the check-boxes in the "Method(s) for Residuals on Y-Axis" are
checked. The default option is to plot the "Observed Y" against the
"OLS" residuals.
(Screenshot: the "Regression Method Comparison Residual-Residual Comparison Options" window. "Method for Residuals on X-Axis" offers Observed Y (the default), OLS, LMS, LPS, and the Iterative OLS, Biweight, Huber, MVT, and PROP methods w/o and with leverage; "Method(s) for Residuals on Y-Axis" offers the same methods as check-boxes. The window also contains the method option fields described in the previous sections, the "Residuals MDs Distribution" choice (Beta or Chisquare), the LMS/LPS search strategy, and the "Display User Selections," "Store Residuals to Worksheet," and "Use Default Title" options.)
442

-------
o The options required for the various regression methods are
discussed in the previous sections of this chapter.
o Select a method for X-axis and one or more methods for the Y-
axis.
o Specify the required parameters of the selected methods in the
various options boxes.
o The "Display User Selections" option stores the user selected options
for the various methods into a new worksheet for reference.
o The "Store Residuals to Worksheet" option stores the residuals of
each of the selected y-axis methods and the x-axis method in a new
worksheet.
o Click on "OK" to continue or "Cancel" to cancel the options
window.
• Click on "OK" to continue or "Cancel" to cancel the generation of R-R
Plots.
Output for R-R Plots.
Data Set: Bradu.
Methods: 13 (All) methods on Y-axis vs. Observed Y on X-axis.
(Figure: Residuals versus Residuals Plot of the residuals from all 13 methods against the Observed Y, the legend listing OLS, LMS, LPS, and the Sequential OLS, Biweight, Huber, MVT, and PROP methods w/o and with leverage.)
443

-------
Data Set: Bradu.
Methods: 5 (OLS, LMS, Biweight, Huber, and PROP with leverages) methods on Y-axis vs. OLS on
X-axis.
(Screenshot: the R-R plot options window with "OLS" selected as the method for residuals on the X-axis and "OLS," "LMS," "Biweight with Leverage," "Huber with Leverage," and "PROP with Leverage" checked as the methods for residuals on the Y-axis.)
444

-------
9.9.3 Multivariate Y-Y-hat Plots
1. Click Regression > Method Comparison > Multivariate Y-Y-hat Plot.
(Screenshot: the Scout 2008 main window with the Regression > Method Comparison > Multivariate Y-Y-hat Plot menu path selected.)
2. The "Select Variables" screen (Section 3.3) will appear.
• Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
• Click on the "Options" button to get the options window.
o The options in the window shown below are the options when all of
the check-boxes in the "Method(s) for Fits on Y-Axis" are checked.
The default option is to plot the "Observed Y" against the "OLS"
fits.
(Screenshot: the "Regression Method Comparison Y-Y-hat Comparison Options" window, analogous to the R-R plot options window, with "Method for Fits on X-Axis" (Observed Y as the default), "Method(s) for Fits on Y-Axis" check-boxes, and a "Store Y-hats to Worksheet" option in place of "Store Residuals to Worksheet.")
445

-------
o The options required for the various regression methods are
discussed in the previous sections of this chapter.
o Select a method for X-axis and one or more methods for the Y-
axis.
o Specify the required parameters of the selected methods in the
various options boxes.
o The "Display User Selections" option stores the user selected options
for the various methods into a new worksheet for reference.
o The "Store Y-hats to Worksheet" option stores the fits (Y-hat values)
of each of the selected y-axis methods and the x-axis method in a new
worksheet.
o Click on "OK" to continue or "Cancel" to cancel the options
window.
• Click on "OK" to continue or "Cancel" to cancel the generation of Y-Y-
hat Plots.
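A Y-Y-hat comparison can be sketched outside the GUI by computing fits from two methods and pairing them. The sketch below compares plain OLS fits with fits from a toy iteratively reweighted OLS that rejects points with large scaled residuals; the 2.5 cut-off and the hard-rejection rule are illustrative and are not Scout's Iterative OLS algorithm:

```python
from statistics import median

def weighted_ols(x, y, w):
    """Weighted bivariate least squares; returns (intercept, slope)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def reweighted_fits(x, y, cut=2.5, iters=10):
    """Fits from a simple iteratively reweighted OLS: points whose
    residual exceeds `cut` MAD-based scale units get weight 0 on the
    next pass (an illustrative hard-rejection scheme)."""
    w = [1.0] * len(x)
    for _ in range(iters):
        b0, b1 = weighted_ols(x, y, w)
        res = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        med = median(res)
        scale = 1.4826 * median([abs(r - med) for r in res])
        w = [1.0 if abs(r) <= cut * scale else 0.0 for r in res]
    return [b0 + b1 * xi for xi in x]
```

On a sample with one gross outlier, the reweighted fits follow the bulk of the data while the plain OLS fits are pulled toward the outlier; plotting one set of fits against the other is exactly what the Y-Y-hat plot displays.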
Output for Y-Y-hat Plots.
Data Set: Bradu.
Methods: 13 (All) methods on Y-axis vs. Observed Y on X-axis.
(Figure: Y-hat versus Y-hat Plot of the fits from all 13 methods against the Observed Y, the legend listing OLS, LMS, LPS, and the Sequential OLS, Biweight, Huber, MVT, and PROP methods w/o and with leverage.)
446

-------
Data Set: Bradu.
Methods: 5 (OLS, LMS, Biweight, Huber, and PROP with leverages) methods on Y-axis vs. OLS on X-axis.
(Screenshot: the Y-Y-hat plot options window with "OLS" selected as the method for fits on the X-axis and the five Y-axis methods checked.)
(Figure: Y-hat versus Y-hat Plot of the selected methods' fits against the OLS Y-hat.)
447

-------
Data Set: Bradu.
Methods: 3 (OLS, PROP with and without leverage) methods on Y-axis vs. PROP with leverage on X-
axis.
(Screenshot: the Y-Y-hat plot options window with "PROP with Leverage" selected as the method for fits on the X-axis and "OLS," "PROP w/o Leverage," and "PROP with Leverage" checked as the methods for fits on the Y-axis.)

(Figure: Y-hat versus Y-hat Plot of the OLS, PROP w/o Leverage, and PROP with Leverage fits against the PROP with Leverage Y-hat.)
448

-------
References
Agullo, J. (1997). "Exact Algorithms to Compute the Least Median of Squares Estimate in
Multiple Linear Regression," in Statistical Procedures and Related Topics, ed. Dodge, Y.,
Institute of Mathematical Statistics, Hayward, CA, 133-146.
Belsley, D.A., Kuh, E., and Welsch, R.E. (1980). Regression Diagnostics: Identifying
Influential Data and Sources of Collinearity, John Wiley and Sons, NY.
Chatterjee, S. and Machler, M. (1997). "Robust regression: A weighted least squares
approach," Communications in Statistics Theory and Methods, 26, 1381-1394.
Cook, R.D., and Weisberg, S. (1982). Residuals and Influence in Regression, Chapman &
Hall, London.
Dollinger, M.B., and Staudte, R.G. (1991). "Influence Functions of Iteratively Reweighted
Least Squares Estimators," Journal of the American Statistical Association, 86, 709-716.
Draper, N.R., and Smith, H. (1984). Applied Regression Analysis, 2nd ed., John Wiley and
Sons, NY.
Gnanadesikan, R., and Kettenring, J.R. (1972). "Robust Estimates, Residuals, and Outlier
Detection with Multi-response Data," Biometrics, 28, 81-124.
Hadi, A.S., and Simonoff, J.S. (1993). "Procedures for the Identification of Multiple
Outliers in Linear Models," Journal of the American Statistical Association, 88, 1264-
1272.
Hawkins, D.M., Bradu, D., and Kass, G.V. (1984). "Location of Several Outliers in
Multiple Regression Data Using Elemental Sets," Technometrics, 26, 197-208.
Hawkins, D.M., and Simonoff, J.S. (1993). "High Breakdown Regression and Multivariate
Estimation," Applied Statistics, 42, 423-432.
Hettmansperger, T.P., and Sheather, S.J. (1992). "A Cautionary Note on the Method of
Least Median Squares," The American Statistician, 46, 79-83.
Neter, J., Kutner, M.H., Nachtsheim, C.J., and Wasserman W. (1996). Applied Linear
Statistical Models, 4th ed., McGraw-Hill, Boston.
Olive, D.J. (2002). "Applications of Robust Distances for Regression," Technometrics, 44,
64-71.
Rousseeuw, P.J. (1984). "Least Median of Squares Regression," Journal of the American
Statistical Association, 79, 871-880.
449

-------
Rousseeuw, P.J., and Leroy, A.M. (1987). Robust Regression and Outlier Detection, John
Wiley and Sons, NY.
Ruppert, D. (1992). "Computing S-Estimators for Regression and Multivariate
Location/Dispersion," Journal of Computational and Graphical Statistics, 1, 253-270.
Ruppert, D., and Carroll, R.J. (1980). "Trimmed Least Squares Estimation in the Linear
Model," Journal of the American Statistical Association, 75, 828-838.
Simpson, D.G., Ruppert, D., and Carroll, R.J. (1992). "On One-Step GM Estimates and
Stability of Inferences in Linear Regression," Journal of the American Statistical
Association, 87, 439-450.
Simpson, J.R., and Montgomery, D.C. (1998a). "The development and evaluation of
alternative generalized M estimation techniques," Communications in Statistics —
Simulation and Computation, 27, 999-1018.
Singh, A. and Nocerino, J.M. (1995). Robust Procedures for the Identification of Multiple
Outliers, Handbook of Environmental Chemistry, Statistical Methods, Vol. 2. G, pp. 229-
277, Springer Verlag, Germany.
Stromberg, A.J. (1993). "Computing the Exact Least Median of Squares Estimate and
Stability Diagnostics in Multiple Linear Regression," SIAM Journal of Scientific and
Statistical Computing, 14, 1289-1299.
Welsh, A.H. (1986). "Bahadur Representations for Robust Scale Estimators Based on
Regression Residuals," The Annals of Statistics, 14, 1246-1251.
Yohai, V.J., and Zamar, R.H. (1988). "High Breakdown-Point Estimates of Regression by
Means of the Minimization of an Efficient Scale," Journal of the American Statistical
Association, 83, 406-413.
450

-------