-------
Notice
The United States Environmental Protection Agency (EPA) through its Office of
Research and Development (ORD) funded and managed the research described here. It
has been peer reviewed by the EPA and approved for publication. Mention of trade
names and commercial products does not constitute endorsement or recommendation by
the EPA for use.
The Scout 2008 software was developed by Lockheed-Martin under a contract with the
USEPA. Use of any portion of Scout 2008 that does not comply with the Scout 2008
User Guide is not recommended.
Scout 2008 contains embedded licensed software. Any modification of the Scout 2008
source code may violate the embedded licensed software agreements and is expressly
forbidden.
The Scout 2008 software provided by the USEPA was scanned with McAfee VirusScan
and is certified free of viruses.
With respect to the Scout 2008 distributed software and documentation, neither the
USEPA nor any of its employees assumes any legal liability or responsibility for the
accuracy, completeness, or usefulness of any information, apparatus, product, or process
disclosed. Furthermore, the Scout 2008 software and documentation are supplied "as-is"
without guarantee or warranty, expressed or implied, including, without limitation, any
warranty of merchantability or fitness for a specific purpose.
iii
-------
Acronyms and Abbreviations
% NDs Percentage of Non-detect observations
ACL alternative concentration limit
A-D, AD Anderson-Darling test
AM arithmetic mean
ANOVA Analysis of Variance
AOC area(s) of concern
B* Between groups matrix
BC Box-Cox-type transformation
BCA bias-corrected accelerated bootstrap method
BD break down point
BDL below detection limit
BTV background threshold value
BW Black and White (for printing)
CERCLA Comprehensive Environmental Response, Compensation, and
Liability Act
CL compliance limit, confidence limits, control limits
CLT central limit theorem
CMLE Cohen's maximum likelihood estimate
COPC contaminant(s) of potential concern
CV Coefficient of Variation, cross validation
D-D distance-distance
DA discriminant analysis
DL detection limit
DL/2 (t) UCL based upon DL/2 method using Student's t-distribution
cutoff value
DL/2 Estimates estimates based upon data set with non-detects replaced by half
of the respective detection limits
DQO data quality objective
DS discriminant scores
EA exposure area
EDF empirical distribution function
EM expectation maximization
EPA Environmental Protection Agency
EPC exposure point concentration
FP-ROS (Land) UCL based upon fully parametric ROS method using Land's H-
statistic
v
-------
Gamma ROS (Approx.) UCL based upon Gamma ROS method using the gamma
approximate-UCL method
Gamma ROS (BCA) UCL based upon Gamma ROS method using the bias-corrected
accelerated bootstrap method
GOF, G.O.F. goodness-of-fit
H-UCL UCL based upon Land's H-statistic
HBK Hawkins Bradu Kass
HUBER Huber estimation method
ID identification code
IQR interquartile range
K Next K, Other K, Future K
KG Kettenring Gnanadesikan
KM (%) UCL based upon Kaplan-Meier estimates using the percentile
bootstrap method
KM (Chebyshev) UCL based upon Kaplan-Meier estimates using the Chebyshev
inequality
KM (t) UCL based upon Kaplan-Meier estimates using the Student's t-
distribution cutoff value
KM (z) UCL based upon Kaplan-Meier estimates using standard normal
distribution cutoff value
K-M, KM Kaplan-Meier
K-S, KS Kolmogorov-Smirnov
LMS least median squares
LN lognormal distribution
Log-ROS Estimates estimates based upon data set with extrapolated non-detect
values obtained using robust ROS method
LPS least percentile squares
MAD Median Absolute Deviation
Maximum Maximum value
MC minimization criterion
MCD minimum covariance determinant
MCL maximum concentration limit
MD Mahalanobis distance
Mean classical average value
Median Median value
Minimum Minimum value
MLE maximum likelihood estimate
MLE (t) UCL based upon maximum likelihood estimates using Student's
t-distribution cutoff value
vi
-------
MLE (Tiku) UCL based upon maximum likelihood estimates using the
Tiku's method
Multi Q-Q multiple quantile-quantile plot
MVT multivariate trimming
MVUE minimum variance unbiased estimate
ND non-detect or non-detects
NERL National Exposure Research Laboratory
NumNDs Number of Non-detects
NumObs Number of Observations
OKG Orthogonalized Kettenring Gnanadesikan
OLS ordinary least squares
ORD Office of Research and Development
PCA principal component analysis
PCs principal components
PCS principal component scores
PLs prediction limits
PRG preliminary remediation goals
PROP proposed estimation method
Q-Q quantile-quantile
RBC risk-based cleanup
RCRA Resource Conservation and Recovery Act
ROS regression on order statistics
RU remediation unit
S substantial difference
SD, Sd, sd standard deviation
SLs simultaneous limits
SSL soil screening levels
S-W, SW Shapiro-Wilk
TLs tolerance limits
UCL upper confidence limit
UCL95, 95% UCL 95% upper confidence limit
UPL upper prediction limit
UPL95, 95% UPL 95% upper prediction limit
USEPA United States Environmental Protection Agency
UTL upper tolerance limit
Variance classical variance
W* Within groups matrix
vii
-------
W⁻¹B matrix Inverse of W* cross-product B* matrix
WMW Wilcoxon-Mann-Whitney
WRS Wilcoxon Rank Sum
WSR Wilcoxon Signed Rank
Wsum Sum of weights
Wsum2 Sum of squared weights
viii
-------
Table of Contents
Notice iii
Acronyms and Abbreviations v
Table of Contents ix
Chapter 9 341
Regression 341
9.1 Ordinary Least Squares (OLS) Linear Regression Method 343
9.2 OLS Quadratic/Cubic Regression Method 351
9.3 Least Median/Percentile Squares (LMS/LPS) Regression Method 357
9.3.1 Least Percentile of Squared Residuals (LPS) Regression 365
9.4 Iterative OLS Regression Method 376
9.5 Biweight Regression Method 387
9.6 Huber Regression Method 400
9.7 MVT Regression Method 411
9.8 PROP Regression Method 422
9.9 Method Comparison in Regression Module 435
9.9.1 Bivariate Fits 436
9.9.2 Multivariate R-R Plots 442
9.9.3 Multivariate Y-Y-hat Plots 445
References 449
ix
-------
Chapter 9
Regression
The Regression module in Scout offers most of the classical and robust multiple linear
regression methods (including regression diagnostic methods) available in the current
literature, similar to the Outlier/Estimates module. The multiple linear regression
model with p explanatory variables (x-variables, leverage variables) is given by:

y_i = (x_i1*b_1 + x_i2*b_2 + ... + x_ip*b_p) + e_i

The residuals, e_i, are assumed to be normally distributed as N(0, sigma^2); i = 1, 2, ..., n.
The classical ordinary least squares (OLS) method has a "0" break down point and can get
distorted by the presence of even a single outlier, as can the classical mean vector and the
covariance matrix.
Let x_i' = (x_i1, x_i2, ..., x_ip) and b' = (b_1, b_2, ..., b_p).

The objective here is to obtain a robust and resistant estimate, b-hat, of b using the data set,
(y_i, x_i'); i = 1, 2, ..., n. The ordinary least squares (OLS) estimate, b-hat_OLS, of b is
obtained by minimizing the residual sum of squares; namely, sum of r_i^2, where
r_i = y_i - x_i' b-hat_OLS. Like the classical mean, the estimate, b-hat_OLS, of b has a "zero" break
down point. This means that the estimate, b-hat_OLS, can take an arbitrarily aberrant value
due to the presence of even a single regression outlier (y-outlier) or leverage point (x-outlier),
leading to a distorted regression model. The use of robust procedures that eliminate or
dampen the influence of discordant observations on the estimates of the regression
parameters is desirable.
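The zero break down point of OLS is easy to demonstrate numerically. The following sketch (illustrative Python, not part of Scout; the data set and seed are invented for the demonstration) fits an OLS line to clean data and then to the same data with a single aberrant response:

```python
# Hedged sketch: a single y-outlier is enough to distort the OLS slope.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.1, size=20)   # clean data, true slope 0.5

def ols_slope(x, y):
    """Slope of the least-squares line y = b0 + b1*x."""
    X = np.column_stack([np.ones_like(x), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return b[1]

clean_slope = ols_slope(x, y)       # close to 0.5

y_bad = y.copy()
y_bad[-1] = 100.0                   # one aberrant response (regression outlier)
bad_slope = ols_slope(x, y_bad)     # grossly distorted by the single outlier

print(clean_slope, bad_slope)
```

A robust method with a positive break down point would leave the slope essentially unchanged by the single outlier.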
In regression applications, anomalies arising in the p-dimensional space of the predictor
variables (e.g., due to unexpected experimental conditions) are called leverage points.
Outliers in the response variable (e.g., due to unexpected outcomes, such as unusual
reactions to a drug) are called regression or vertical outliers. The leverage outliers are
divided into two categories: significant ("bad" or inconsistent) leverage points and
insignificant ("good" or consistent) leverage points.
The identification of outliers in a data set and the identification of outliers in a regression
model are two different problems. It is very desirable that a procedure distinguish
between good and bad outliers. In practice, in order to achieve a high break down point,
some methods (e.g., the LMS method) fail to distinguish between good and bad leverage
points.
341
-------
In robust regression, the objective is twofold: 1) the identification of vertical (y-outliers,
regression outliers) outliers and distinguishing between significant and insignificant
leverage points, and 2) the estimation of regression parameters that are not influenced by
the presence of the anomalies. The robust estimates should be in close agreement with
classical OLS estimates when no outlying observations are present. Scout also offers
several formal graphical displays of the regression and leverage results.
Scout provides several methods to obtain multiple linear regression models. The
available options include:
• Ordinary Least Squares Regression (OLS)
  Minimizes the sum of squared residuals.
• Least Median/Percentile Squares Regression (LMS/LPS)
  Minimizes the "h-th" ordered squared residual (Rousseeuw, 1984).
• Biweight Regression
  Conducted using Tukey's biweight criterion (Beaton and Tukey, 1974).
• Huber Regression
  Conducted using the Huber influence function (Huber, 1981).
• MVT Regression
  Conducted using multivariate trimming methods (Devlin et al., 1981).
• PROP Regression
  Conducted using the PROP influence function (Singh and Nocerino, 1995).
Scout also provides the user with the option of identifying leverage outliers. If the
leverage option is selected, then the outliers arising in the p-dimensional space of the
predictor variables (X-space) are identified first. Those leverage points can be identified
using the various options available in Scout; the leverage points are identified using the
same outlier methods as incorporated in the Outlier module of Scout. The MDs for the
leverage option are computed using the selected x-variables only. The weights obtained
in the leverage option are then used in the initial regression iteration. The regression
procedure is iterated a number of times to identify all of the regression outliers and bad
leverage points. This process also distinguishes between good and bad leverage points.
342
-------
9.1 Ordinary Least Squares (OLS) Linear Regression Method
1. Click Regression > OLS > Multiple Linear.
[Screenshot: Scout 2008 main window showing Regression > OLS > Multiple Linear selected from the Regression menu, with the Wood data set open.]
2. The "Select Variables" screen (Section 3.3) will appear.
° Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
o Click on the "Options" button.
[Screenshot: the OLS Options dialog with the "Display Intervals" and "Display Diagnostics" check boxes, the Confidence Coefficient entry (0.95), and the OK and Cancel buttons.]
o The "Display Intervals" check box will display the "Summary
Table for Prediction and Confidence Limits" in the output sheet.
o The "Display Diagnostics" check box will display the
"Regression Diagnostics Table" and the "Lack of Fit ANOVA
Table" (only if there are replicates in the independent variables).
o Click "OK" to continue or "Cancel" to cancel the options.
o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
343
-------
user should select and click on an appropriate variable representing a
group variable.
• Click on the "Graphics" button and check all boxes.
[Screenshot: the Select OLS Graphics Options dialog with check boxes for XY Plots, Y vs Y-Hat, Y vs Residuals, Y-Hat vs Residuals, Index Plots, and QQ Residuals; the plot title entries; the "Regression Line - Fixing Other Regressors at" options (No Line, Minimum Values, Mean Values, Maximum Values, Zero Values); the Confidence Interval and Prediction Interval check boxes; and the Confidence Coefficient entry (0.95).]
o A regression line can be drawn in the multivariate setting by
choosing a single independent (regressor) variable and fixing the
other variables at the provided options using the "Regression
Line - Fixing Other Regressors at" option.
o Specify the confidence and/or prediction band for the regression
line using the "Confidence Interval" and the "Prediction
Interval" check boxes.
o Specify the "Confidence Coefficient" for the bands.
o Click "OK" to continue or "Cancel" to cancel the options.
° Click "OK" to continue or "Cancel" to cancel the OLS procedure.
344
-------
Output for OLS Regression.
Data Set used: Wood (predictor variables p = 5).
Ordinary Least Squares Linear Regression Analysis Output

Date/Time of Computation: 10/30/2008 11:05:40 AM
User Selected Options
From File: D:\Narain\WorkDatInExcel\Wood
Full Precision: OFF
Confidence Level for Intervals: 0.95
Display Confidence and Prediction Limits: True
Display Regression Diagnostics: True
Title for Residual QQ Plot: Linear OLS Regression - Residuals QQ Plot
Title for Residual Index Plot: Linear OLS Regression - Residuals Index Plot
Title for Y vs X Plots: Linear OLS Regression - Y vs X Plot
Confidence Level for Regression Line: 0.95
Display Confidence Band: True
Display Prediction Band: True
Title for Y-Hat vs Residuals Plot: Linear OLS Regression - Y-Hat vs Residuals Plot
Title for Y vs Residuals Plot: Linear OLS Regression - Y vs Residuals Plot
Title for Y vs Y-Hat Plot: Linear OLS Regression - Y vs Y-Hat Plot

Number of Observations: 20
Dependent Variable: y
Number of Selected Regression Variables: 5
Independent Variables: x1, x2, x3, x4, x5
Correlation Matrix

        y        x1       x2       x3       x4       x5
y       1        -0.145   0.611    0.47     -0.6     0.629
x1      -0.145   1        -0.246   -0.604   0.528    -0.641
x2      0.611    -0.246   1        0.388    -0.498   0.248
x3      0.47     -0.604   0.388    1        -0.24    0.659
x4      -0.6     0.528    -0.498   -0.24    1        -0.512
x5      0.629    -0.641   0.248    0.659    -0.512   1
345
-------
Eigenvalues of Correlation Matrix

Eval 1   Eval 2   Eval 3   Eval 4   Eval 5   Eval 6
3.357    1.114    0.713    0.588    0.173    0.054

Sum of Eigenvalues: 6
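A quick arithmetic check (a standard linear-algebra fact, not additional Scout output): the eigenvalues of a p x p correlation matrix sum to its trace, p, which is why the tabled eigenvalues sum to 6 up to rounding.

```python
# Eigenvalues from the table above; their sum should equal the number of
# variables (the trace of the correlation matrix).
evals = [3.357, 1.114, 0.713, 0.588, 0.173, 0.054]
print(round(sum(evals), 3))   # 5.999, i.e. 6 up to rounding
```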
Regression Estimates and Inference Table

Parameter   DOF   Estimates   Std. Error   T-values   p-values   Tol Values   VIF
intercept   1     0.422       0.169        2.494      0.0253     N/A          N/A
x1          1     0.441       0.117        3.77       0.00222    0.27         3.701
x2          1     -1.475      0.487        -3.029     0.00931    0.264        3.786
x3          1     -0.261      0.112        -2.332     0.0339     0.583        1.715
x4          1     0.0208      0.161        0.129      0.388      0.299        3.346
x5          1     0.171       0.203        0.84       0.27       0.268        3.725
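Two relationships in the inference table above can be verified by hand (standard definitions, assumed to match Scout's): the T-value is the estimate divided by its standard error, and the variance inflation factor is the reciprocal of the tolerance.

```python
# Row x1 of the inference table: estimate 0.441, std. error 0.117, tolerance 0.27.
t_x1 = 0.441 / 0.117
vif_x1 = 1.0 / 0.27
print(round(t_x1, 2))    # 3.77, matching the table
print(round(vif_x1, 3))  # about 3.704, close to the tabled 3.701
```

The small VIF discrepancy comes from the tolerance being rounded to three digits in the output.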
OLS ANOVA Table

Source of Variation   SS        DOF   MS          F-Value   P-Value
Regression            0.0344    5     0.00687     11.81     0.0001
Error                 0.00814   14    5.8158E-4
Total                 0.0425    19

R Square: 0.808
Adjusted R Square: 0.74
Sqrt(MSE) = Scale: 0.0241
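The R Square values follow directly from the sums of squares in the ANOVA table. This is an illustrative arithmetic check using the tabled values (n = 20 observations, p = 5 predictors for the Wood data):

```python
# R Square and Adjusted R Square recomputed from the ANOVA sums of squares.
ss_total = 0.0425
ss_error = 0.00814
n, p = 20, 5

r2 = 1.0 - ss_error / ss_total
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

print(round(r2, 3))       # 0.808, matching the table
print(round(adj_r2, 2))   # 0.74, matching the table
```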
Regression Table

Obs   Y Vector   Yhat    Residuals   Hat[i,i]   Res/Scale   Student Res
 1    0.534      0.551   -0.0175     0.278      -0.725      -0.853
 2    0.535      0.534   0.00114     0.132      0.0472      0.0507
 3    0.57       0.54    0.03        0.22       1.243       1.407
 4    0.45       0.441   0.00855     0.258      0.355       0.412
 5    0.548      0.524   0.0242      0.222      1.002       1.137
 6    0.431      0.442   -0.0109     0.259      -0.452      -0.525
 7    0.481      0.459   0.0219      0.53       0.907       1.323
 8    0.423      0.424   -8.415E-4   0.289      -0.0349     -0.0414
 9    0.475      0.485   -0.00955    0.348      -0.396      -0.49
10    0.486      0.496   -0.01       0.449      -0.415      -0.559
11    0.554      0.506   0.0479      0.317      1.986       2.403
346
-------
Output for OLS Regression (continued).
Summary Table for Prediction and Confidence Limits

Obs   Y Vector   Yhat    s(Yhat)   s(pred)   LCL     UCL     LPL     UPL     Residuals
 1    0.534      0.551   0.0127    0.0273    0.524   0.579   0.493   0.61    -0.0175
 2    0.535      0.534   0.00875   0.0257    0.515   0.553   0.479   0.589   0.00114
 3    0.57       0.54    0.0113    0.0266    0.516   0.564   0.483   0.597   0.03
 4    0.45       0.441   0.0123    0.0271    0.415   0.468   0.383   0.499   0.00855
 5    0.548      0.524   0.0114    0.0267    0.499   0.548   0.467   0.581   0.0242
 6    0.431      0.442   0.0123    0.0271    0.416   0.468   0.384   0.5     -0.0109
 7    0.481      0.459   0.0176    0.0298    0.421   0.497   0.395   0.523   0.0219
 8    0.423      0.424   0.013     0.0274    0.396   0.452   0.365   0.483   -8.415E-4
 9    0.475      0.485   0.0142    0.028     0.454   0.515   0.424   0.545   -0.00955
10    0.486      0.496   0.0162    0.029     0.461   0.531   0.434   0.558   -0.01
11    0.554      0.506   0.0136    0.0277    0.477   0.535   0.447   0.565   0.0479
12    0.519      0.548   0.0154    0.0286    0.515   0.581   0.486   0.609   -0.0289
13    0.492      0.504   0.0129    0.0274    0.476   0.531   0.445   0.562   -0.0117
14    0.517      0.547   0.00866   0.0256    0.529   0.566   0.492   0.602   -0.0304
15    0.502      0.516   0.00941   0.0259    0.496   0.536   0.461   0.572   -0.0141
16    0.508      0.495   0.0175    0.0298    0.458   0.533   0.431   0.559   0.0126
17    0.52       0.526   0.013     0.0274    0.498   0.554   0.467   0.585   -0.00615
18    0.506      0.499   0.0131    0.0274    0.471   0.527   0.44    0.558   0.00685
19    0.401      0.427   0.013     0.0274    0.399   0.455   0.368   0.486   -0.0261
20    0.568      0.555   0.0136    0.0277    0.526   0.584   0.495   0.614   0.0131
No replicates in the data - Lack of Fit AN OVA T able not displayed
Regression Diagnostics Table

Obs. #   Residuals   H[i,i]   CD[i]       t[i]      DFFITS
 1       -0.0175     0.278    0.0466      -0.876    -0.543
 2       0.00114     0.132    6.4894E-5   0.0507    0.0197
 3       0.03        0.22     0.0929      1.518     0.806
 4       0.00855     0.258    0.00985     0.414     0.245
 5       0.0242      0.222    0.0616      1.193     0.638
 6       -0.0109     0.259    0.0161      -0.53     -0.314
 7       0.0219      0.53     0.329       1.414     1.502
 8       -8.415E-4   0.289    1.1582E-4   -0.0414   -0.0264
 9       -0.00955    0.348    0.0214      -0.495    -0.362
10       -0.01       0.449    0.0425      -0.566    -0.511
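Part of the diagnostics table can be reproduced from the Regression Table above using textbook formulas (standard definitions, assumed to match Scout's): the internally studentized residual divides a raw residual by scale*sqrt(1 - Hat[i,i]), and Cook's distance CD[i] combines that with the hat value and the number of estimated parameters.

```python
# Hedged sketch of the assumed formulas behind "Student Res" and CD[i].
import math

def studentized(residual, hat, scale):
    """Internally studentized residual."""
    return residual / (scale * math.sqrt(1.0 - hat))

def cooks_distance(residual, hat, scale, n_params):
    """Cook's distance for one observation (n_params = p + 1)."""
    r = studentized(residual, hat, scale)
    return r * r * hat / ((1.0 - hat) * n_params)

# Observation 1 of the OLS output above: residual -0.0175, Hat[1,1] = 0.278,
# scale = 0.0241, and p + 1 = 6 estimated parameters.
r1 = studentized(-0.0175, 0.278, 0.0241)
cd1 = cooks_distance(-0.0175, 0.278, 0.0241, 6)
print(round(r1, 2), round(cd1, 3))   # close to the tabled -0.853 and 0.0466
```

Small differences from the tabled values come from the rounded inputs (residual, hat, and scale are printed to three digits).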
347
-------
Output for OLS Regression (continued).
[Figure: Linear OLS Regression - Residuals QQ Plot]
348
-------
Output for OLS Regression (continued).
-------
Output for OLS Regression (continued).
[Figures: Linear OLS Regression - Y vs Residuals Plot and Linear OLS Regression - Y-Hat vs Residuals Plot]
350
-------
9.2 OLS Quadratic/Cubic Regression Method
1. Click Regression > OLS > Quadratic or Cubic.
[Screenshot: Scout 2008 main window showing Regression > OLS > Quadratic or Cubic selected from the Regression menu, with the Wood data set open.]
2. The "Select Variables" screen (Section 3.3) will appear.
° Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
° Click on the "Options" button.
[Screenshot: the OLS Options dialog with the "Display Intervals" and "Display Diagnostics" check boxes, the Confidence Coefficient entry (0.95), and the OK and Cancel buttons.]
o The "Display Intervals" check box will display the "Summary
Table for Prediction and Confidence Limits" in the output sheet.
o The "Display Diagnostics" check box will display the
"Regression Diagnostics Table" and the "Lack of Fit ANOVA
Table" (only if there are replicates in the independent variables).
o Click "OK" to continue or "Cancel" to cancel the options.
o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
351
-------
user should select and click on an appropriate variable representing a
group variable.
• Click on the "Graphics" button and check all boxes.
[Screenshot: the Select OLS Graphics Options dialog with the plot check boxes and titles, the "Regression Line - Fixing Other Regressors at" options, the Confidence Interval and Prediction Interval check boxes, and the Confidence Coefficient entry (0.95).]
o The "Regression Line - Fixing Other Regressors at" option is not
used in this quadratic regression module.
o Specify the confidence and/or prediction band for the regression
line using the "Confidence Interval" and the "Prediction
Interval" check boxes.
o Specify the "Confidence Coefficient" for the bands.
o Click "OK" to continue or "Cancel" to cancel the options.
o Click "OK" to continue or "Cancel" to cancel the OLS procedure.
352
-------
Output for OLS Quadratic Regression.
Data Set used: Wood (predictor variables p = 5).
Ordinary Least Squares Quadratic Regression Analysis Output

Date/Time of Computation: 10/30/2008 1:14:57 PM
User Selected Options
From File: D:\Narain\WorkDatInExcel\Wood
Full Precision: OFF
Confidence Level for Intervals: 0.95
Display Confidence and Prediction Limits: True
Display Regression Diagnostics: True
Residual QQ Plot: Not Selected
Residual Index Plot: Not Selected
Title For Y vs X Plots: Quadratic OLS Regression - Y vs X Plot
Confidence Level for Regression Line: 0.95
Display Confidence Band: True
Display Prediction Band: True
Y vs Residuals Plot: Not Selected
Y-Hat vs Residuals Plot: Not Selected
Y vs Y-Hat Plot: Not Selected

Number of Observations: 20
Dependent Variable: y
Number of Selected Regression Variables: 1
Independent Variable: x1
Correlation Matrix

          y       x1      Squared
y         1       0.997   0.629
x1        0.997   1       0.588
Squared   0.629   0.588   1
Eigenvalues of Correlation Matrix

Eval 1   Eval 2   Eval 3
2.493    0.505    0.0015

Sum of Eigenvalues: 3
353
-------
Output for OLS Regression (continued).
R egression E stimates and 1 nfetence T able
Paramater
DOF
Estimates
Std. Error
T-values
p-values
Tol Values
VIF
intercept
1
-0.G43
0 26
-2.47
0.0252
N/A
N/A
x1
1
3 90G
0 957
4.08
9.1497E-4
0.00573
174 6
Squared
1
-3.237
0 863
•3.75
0 00185
0.00573
174.6
OLS ANOVA Table

Source of Variation   SS        DOF   MS          F-Value   P-Value
Regression            0.0284    2     0.0142      17.2      0.0001
Error                 0.0141    17    8.2683E-4
Total                 0.0425    19

R Square: 0.669
Adjusted R Square: 0.63
Sqrt(MSE) = Scale: 0.0288
Regression Table

Obs   Y Vector   Yhat    Residuals   Hat[i,i]   Res/Scale   Student Res
 1    0.534      0.532   0.00157     0.103      0.0545      0.0576
 2    0.535      0.528   0.00697     0.116      0.242       0.258
 3    0.57       0.535   0.0346      0.0923     1.204       1.264
 4    0.45       0.446   0.00411     0.16       0.143       0.156
 5    0.548      0.525   0.0229      0.106      0.795       0.84
 6    0.431      0.453   -0.0223     0.137      -0.774      -0.833
 7    0.481      0.493   -0.0121     0.0873     -0.421      -0.441
 8    0.423      0.418   0.00481     0.293      0.167       0.199
 9    0.475      0.521   -0.0457     0.103      -1.591      -1.68
10    0.486      0.514   -0.0278     0.247      -0.966      -1.114
11    0.554      0.523   0.0305      0.149      1.062       1.151
12    0.519      0.503   0.0158      0.391      0.549       0.703
13    0.492      0.527   -0.0354     0.12       -1.231      -1.313
14    0.517      0.534   -0.0174     0.0993     -0.606      -0.639
15    0.502      0.52    -0.0179     0.103      -0.621      -0.656
16    0.508      0.515   -0.00653    0.099      -0.227      -0.239
17    0.52       0.534   -0.0136     0.101      -0.475      -0.501
18    0.506      0.457   0.0487      0.126      1.692       1.81
19    0.401      0.423   -0.0221     0.264      -0.767      -0.895
20    0.568      0.517   0.0509      0.101      1.772       1.869
354
-------
Output for OLS Regression (continued).
S ummaiy T able f01 Prediction and Confidence Links
Obs
V Vectoi
Yhat
s(Yhat)
s(pred)
LCL
UCL
LPL
UPL
Residuals
1
0 534
0 532
0.00925
0 0302
0 513
0 552
0.469
0 596
"0592
0 00157
0 00697
2
0 535
0 528
0 00981
0 0304
0 507
0 549
0 464
3
0.57
0 535
0 00874
0 0301
0 517
0.554
0 472
0.599
0 0346
4
0.45
0 446
00115
0 031
0.422
0.47
0.381
0.511
0 00411
5
0.548
0 525
0 00934
0 0302
0 505
0 545
0.461
0 589
0 0229
8
0 431
0 453
0 0106
0 0307
0 431
0 476
0 389
0 518
-0.0223
7
0 481
0 493
0 0085
0 03
0 475
0 511
0 43
0 556
-0 0121
8
0.423
0 418
0 0156
0 0327
0 385
0.451
0.349
0.487
0 00481
9
0 475
0 521
0 00925
0 0302
0.501
0.54
0.457
0.584
-0 0457
• 10
0.48G
0 514
0 0143
0 0321
0.484
0.544
0.446
0.582
¦0 0278
11
0.554
0 523
00111
0 0308
05
0 547
0 458
0 589
0 0305
12
0 519
0 503
0 018
0.0339
0.465
0.541
0.432
0.575
0 0158
13
0 492
0 527
0.00998
0 0304
0 506
0 548
0 463
0.582
-0 0354
14
0.517
0.534
0.00906
0 0301
0.515
0.554
0.471
0.598
¦0 0174
15
0.502
0.52
0.00922
0.0302
05
0 539
0.456
0 584
-0.0179
16
0 508
0 515
0 00905
0 0301
0 495
0 534
0 451
0 578
-0.00653
17
0 52
0 534
0 00916
0 0302
0 514
0 553
0 47
0 597
-0 0136
18
0 506
0 457
0.0102
0 0305
0 436
0 479
0 393
0.522
0 0487
19
0.401
0 423
0 0148
0 0323
0 392
0 454
0 355
0.491
-0 0221
20
0.568
0517
0.00913
0 0302
0 498
0 536
0 453
0 581
0 0509
No replicates in the data - Lack of Fit ANOVA Table not displayed
Regression Diagnostics Table

Obs. #   Residuals   H[i,i]   CD[i]       t[i]     DFFITS
 1       0.00157     0.103    1.2758E-4   0.0576   0.0196
 2       0.00697     0.116    0.00292     0.258    0.0938
 3       0.0346      0.0923   0.0542      1.328    0.423
 4       0.00411     0.16     0.00154     0.156    0.0681
 5       0.0229      0.106    0.0278      0.858    0.285
 6       -0.0223     0.137    0.0366      -0.851   -0.339
 7       -0.0121     0.0873   0.00621     -0.444   -0.137
 8       0.00481     0.293    0.00549     0.199    0.128
 9       -0.0457     0.103    0.109       -1.84    -0.625
10       -0.0278     0.247    0.136       -1.157   -0.663
355
-------
Output for OLS Regression (continued) - Quadratic Fit.
Output for OLS Regression (continued) - Cubic Fit.
356
-------
9.3 Least Median/Percentile Squares (LMS/LPS) Regression
Method
Break Down Point of LMS Regression Estimates

The break down (BD) points for the LMS (k = 0.5) and least percentile of squared residuals
(LPS, k > 0.5) regression methods as incorporated in Scout are summarized in the
following table. Note that LMS is labeled as LPS when k > 0.5. In the following, the
fraction k satisfies 0.5 < k < 1. For example, for the median the fraction is k = 0.5, for the
75th percentile the fraction is k = 0.75, and so forth.
Approximate Break Down Point for LMS or LPS Regression Estimates

Minimizing Squared Residual     BD (p = 1)    BD (p > 1)
Pos = [n/2], k = 0.5            (n-Pos)/n     (n-Pos-p+2)/n
Pos = [(n+1)/2]                 (n-Pos)/n     (n-Pos-p+2)/n
Pos = [(n+p+1)/2]               (n-Pos)/n     (n-Pos-p+2)/n
LPS ~ Pos = [n*k], k > 0.5      (n-Pos)/n     (n-Pos-p+2)/n

Here [x] = greatest integer contained in x, and k represents a fraction: 0.5 < k < 1. Pos stands for the
position/index of an entry in the ordered array (of size n) of squared residuals. The squared residual at
position Pos is being minimized. For example, when Pos = [n/2], the median of the squared residuals is being
minimized.
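The table's formulas can be evaluated directly. The following illustrative computation (using the table's formulas as given) reproduces the approximate break down value reported later in the LMS output for the Wood example, where n = 20, p = 5, and the default criterion is Pos = [n/2]:

```python
# Approximate break down points from the table above.
n, p = 20, 5
pos = n // 2                    # Pos = [n/2], i.e. k = 0.5 (the LMS default)

bd_p1 = (n - pos) / n           # single explanatory variable (p = 1)
bd_pg1 = (n - pos - p + 2) / n  # p > 1 explanatory variables

print(bd_p1)    # 0.5
print(bd_pg1)   # 0.35, matching the "Approximate Breakdown Value" in the output
```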
1. Click Regression > LMS.
[Screenshot: Scout 2008 main window showing Regression > LMS selected from the Regression menu, with the Wood data set open.]
2. The "Select Variables" screen (Section 3.3) will appear.
357
-------
o Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.
o Click on the "Options" button to get the options window and then click
on "User Specified" in the "Subset Search Strategy" box.
[Screenshot: the LMS Regression Options dialog with the Subset Search Strategy choices (All Combinations, User Specified, Extensive, Quick); the Minimization Criterion choices ([n/2] Squared Res. (LMS), [(n+1)/2] Squared Res., [(n+p+1)/2] Squared Res., Percentile Squared Res.); the Subsets to Search choices (All, <=10,000, <=100,000, <=1,000,000, <=10,000,000); the Percentage Outliers choices (< 5% through < 50%); the Display Intervals and Display Diagnostics check boxes; and the Confidence Coefficient entry (0.95).]
Note: The Subset Search Strategy allows the user to specify the number of initial subsets of size p+1 to be
used to obtain the residuals (regression models) from a total of C(n, p+1) subsets. The user can specify the
Percentage of Outliers, the Outlier Probability (usually closer to 1), and the Minimization Criterion (the order of
the squared residual to minimize) (Leroy and Rousseeuw, 1987).
o Specify "Subsets to Search." The default is "<=100,000."
o Specify "Percentage Outliers." The default is "< 25%."
o Specify "Outlier Probability." The default is "0.95."
o Specify "Minimization Criterion." The default is "Median
Squared Residual."
o Click on "OK" to continue or "Cancel" to cancel the options.
358
-------
° Click on "Graphics" for the graphics options and specify the preferred
graphs.
I Select LMS Graphics.Options
I? W Plots
P7 YvsY-Hat
Y vs Residuals
R Y-Hat vs Residuals
tv Index Plots
I* QQ Residuals
XV Plot Title
MS Regiession -Y vs Y-Hat Plot
Y vs Residuals Title
LMS Regression ¦ Y vs X Plot
YvsY-Hat Title
MS Regression - Y vs Residuals
Y-Hat vs Residuals Title
MS Regression - Y-Hat vs Resid
XV Plot Title
MS Regression - Residuals Index
QQ Residuals Title
Regression - Residuals QQ Plot
-Regression Line - Fixing Other Regressors at -
C No Line I51 Confidence Interval
Minimum Values
(* Mean Values
C Maximum Values
C Zero Values
(7 Predection Interval
"Confidence Coefficient
0.95
OK
Cancel
o Specify the required graphs and the input parameters.
o Click on "OK" to continue or "Cancel" to cancel the options.
° Click on "OK" to continue or "Cancel" to cancel the computations.
Output example: The data set "WOOD.xls" was used for LMS regression. It has 5
predictor variables (p) and 20 observations. A total of 38760 subsets of size p+1 (6)
observations were used to find the best subset meeting the minimization criterion of least
median of squared residuals.
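The elemental-subset search can be sketched in miniature. This is a hedged, brute-force illustration (not Scout's implementation): each subset of p+1 observations is fitted exactly, and the fit whose median squared residual is smallest is kept; the small data set and seed are invented for the demonstration, and math.comb(20, 6) confirms the 38760 subsets mentioned above.

```python
# Hedged sketch of an LMS elemental-subset search.
import itertools
import math
import numpy as np

print(math.comb(20, 6))    # 38760, the subset count for the Wood example

def lms_fit(X, y):
    """Best elemental-subset fit by the least-median-of-squares criterion."""
    n, p1 = X.shape                               # X includes the intercept column
    best = (np.inf, None)
    for rows in itertools.combinations(range(n), p1):
        sub_X, sub_y = X[list(rows)], y[list(rows)]
        try:
            coef = np.linalg.solve(sub_X, sub_y)  # exact fit to p + 1 points
        except np.linalg.LinAlgError:
            continue                              # skip singular subsets
        crit = np.median((y - X @ coef) ** 2)     # median squared residual
        if crit < best[0]:
            best = (crit, coef)
    return best

rng = np.random.default_rng(2)
x = rng.normal(size=12)
y = 1.0 + 2.0 * x
y[:3] += 10.0                                     # three gross outliers
X = np.column_stack([np.ones_like(x), x])
crit, b = lms_fit(X, y)
print(np.round(b, 3))                             # near [1, 2] despite the outliers
```

An OLS fit to the same contaminated data would be pulled well away from the true line, which is exactly the behavior the LMS criterion is designed to resist.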
359
-------
Output for LMS Regression.
Data Set used: Wood (predictor variables p = 5, Minimization Criterion = Median Squared Residuals).
Least Median Squared (LMS) Regression Analysis Output

Date/Time of Computation: 3/4/2008 9:35:11 AM
User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Wood
Full Precision: OFF
Subset Search Strategy: User Specified Criteria
Percentage Outliers: Maximum Outliers <= 0.25
Outlier Probability: <= 0.95
Search All Cutoff: Do all combinations if <= 100000
Minimization Criterion: Median of Squared Residuals
Title for Residual QQ Plot: LMS Regression - Residuals QQ Plot
Residual Index Plot: Not Selected
Y vs X Plots: Not Selected
Title for Y-Hat vs Residuals Plot: LMS Regression - Y-Hat vs Residuals Plot
Y vs Residuals Plot: Not Selected
Y vs Y-Hat Plot: Not Selected

Number of Selected Regression Variables: 5
Number of Observations: 20
Dependent Variable: y
Correlation Matrix

        y        x1       x2       x3       x4       x5
y       1        -0.145   0.611    0.47     -0.6     0.629
x1      -0.145   1        -0.246   -0.604   0.528    -0.641
x2      0.611    -0.246   1        0.388    -0.498   0.248
x3      0.47     -0.604   0.388    1        -0.24    0.659
x4      -0.6     0.528    -0.498   -0.24    1        -0.512
x5      0.629    -0.641   0.248    0.659    -0.512   1
Eigenvalues of Correlation Matrix

Eval 1   Eval 2   Eval 3   Eval 4   Eval 5   Eval 6
0.054    0.173    0.588    0.713    1.114    3.357
OLS Estimates of Regression Parameters

Intercept   x1      x2       x3       x4       x5
0.422       0.441   -1.475   -0.261   0.0208   0.171
360
-------
Output for LMS Regression (continued).
Stdv of Estimated Regression Parameters

Intercept   x1      x2      x3      x4      x5
0.169       0.117   0.487   0.112   0.161   0.203
OLS ANOVA Table

Source of Variation   SS        DOF   MS          F-Value   P-Value
Regression            0.0344    5     0.00687     11.81     0.0001
Error                 0.00814   14    5.8158E-4
Total                 0.0425    19

OLS Scale Estimate: 0.0241
R Square: 0.808
Least Median of Squared Residual Regression

Total Number of Elemental Subsets of size (6): 38760
Total Number of Elemental Subsets of size (6) Searched: 38760
Number of Non-Singular Elemental Subsets of size (6): 38760

Best Elemental Subset of size 6 Found

           y       x1      x2      x3      x4      x5
Obs # 7    0.481   0.489   0.123   0.562   0.455   0.824
Obs # 10   0.486   0.685   0.156   0.631   0.564   0.914
Obs # 11   0.554   0.664   0.159   0.506   0.481   0.867
Obs # 12   0.519   0.703   0.134   0.519   0.484   0.812
Obs # 15   0.502   0.534   0.114   0.521   0.57    0.889
Obs # 16   0.508   0.523   0.132   0.505   0.612   0.919
Best Subset satisfies minimization criterion.
LM S E stimates of R egression Parameters (UsingBestS ubset)
Intercept
x1
x2
x3
x4
x5
0.37
0.172
-0 073
•0 524
-0.441
0.G44
Stdv of Estimated Regression Parameters (Using Best Siinel]
Intercept
x1
x2
x3
x4
x5
0.874
0G04
2 516
0.579
0.832
1.051
Output for LMS Regression (continued).
Minimizing 10th Ordered Squared Residual
Value of Minimum Criterion: 2.0999E-6
Approximate Breakdown Value: 0.35
Unweighted Sigma Estimate based upon LMS Residuals: 0.125
Initial Robust LMS Scale Estimate (Adjusted for dimensionality): 0.00691

LMS Regression Table Based Upon Best Subset
Obs#   Y       Yhat    Residuals    Hat[i,i]   Res/Sigma    Student      Res/Scale    Weights   C Res/Scale
1      0.534   0.522    0.0122      0.278       0.0978       0.115        1.761       1         -0.725
2      0.535   0.527    0.00787     0.132       0.0632       0.0678       1.139       1          0.0472
3      0.57    0.569    8.8370E-4   0.22        0.00709      0.00803      0.128       1          1.243
4      0.45    0.652   -0.202       0.258      -1.625       -1.887      -29.31        0          0.355
5      0.548   0.538    0.00972     0.222       0.078        0.0885       1.407       1          1.002
6      0.431   0.662   -0.231       0.259      -1.857       -2.158      -33.5         0         -0.452
7      0.481   0.481   -3.43E-14    0.53       -2.75E-13    -4.02E-13    -4.97E-12    1          0.907
8      0.423   0.65    -0.227       0.289      -1.824       -2.163      -32.91        0         -0.0349
9      0.475   0.489   -0.0141      0.348      -0.113       -0.14        -2.037       1         -0.396
10     0.486   0.486   -4.35E-14    0.449      -3.49E-13    -4.71E-13    -6.30E-12    1         -0.415
11     0.554   0.554   -1.14E-14    0.317      -9.18E-14    -1.11E-13    -1.66E-12    1          1.986
12     0.519   0.519   -3.52E-14    0.41       -2.82E-13    -3.68E-13    -5.09E-12    1         -1.198
13     0.492   0.481    0.00145     0.287       0.0116       0.0138       0.21        1         -0.485
14     0.517   0.522   -0.00472     0.129      -0.0379      -0.0406      -0.684       1         -1.281
15     0.502   0.502   -6.55E-14    0.152      -5.26E-13    -5.71E-13    -9.48E-12    1         -0.587
16     0.508   0.508   -7.33E-14    0.526      -5.88E-13    -8.54E-13    -1.06E-11    1          0.524
17     0.52    0.521   -0.00114     0.289      -0.00913     -0.0108      -0.165       1         -0.255
18     0.506   0.522   -0.0161      0.294      -0.129       -0.154       -2.329       1          0.284
19     0.401   0.666   -0.265       0.292      -2.129       -2.53       -38.4         0         -1.084
20     0.568   0.567    9.6663E-4   0.318       0.00776      0.00939      0.14        1          0.545

Reweighted LMS Estimates of Regression Parameters
Intercept   x1      x2       x3       x4      x5
0.377       0.217   -0.085   -0.564   -0.4    0.607

Reweighted LMS Stdv of Estimated Regression Parameters
Intercept   x1       x2      x3       x4       x5
0.054       0.0421   0.198   0.0435   0.0654   0.0786
Output for LMS Regression (continued).
Reweighted LMS ANOVA Table
Source of Variation   SS          DOF   MS          F-Value   P-Value
Regression            0.0128       5    0.00255     46        0.0000
Error                 5.5517E-4   10    5.5517E-5
Total                 0.0133      15

R Square: 0.958
Final Reweighted LMS Scale Estimate: 0.00745

Reweighted LMS Regression Table
Obs#   Y       Yhat    Residuals    Hat[i,i]   Student    Res/Scale
1      0.534   0.526    0.00802     0.278        1.267      1.076
2      0.535   0.531    0.00444     0.132        0.639      0.595
3      0.57    0.57     2.3614E-4   0.22         0.0359     0.0317
4      0.45    0.64    -0.19        0.258      -29.67     -25.55
5      0.548   0.535    0.013       0.222        1.979      1.745
6      0.431   0.651   -0.22        0.259      -34.32     -29.54
7      0.481   0.474    0.00658     0.53         1.288      0.883
8      0.423   0.639   -0.215       0.289      -34.37     -28.98
9      0.475   0.483   -0.00775     0.348       -1.288     -1.04
10     0.486   0.486   -2.958E-4    0.449       -0.0535    -0.0397
11     0.554   0.557   -0.00274     0.317       -0.445     -0.368
12     0.519   0.525   -0.00642     0.41        -1.122     -0.862
13     0.492   0.489    0.00318     0.287        0.507      0.428
14     0.517   0.524   -0.00712     0.129       -1.023     -0.955
15     0.502   0.502    4.6552E-4   0.152        0.0679     0.0625
16     0.508   0.508   -7.691E-5    0.526       -0.015     -0.0103
17     0.52    0.521   -7.971E-4    0.289       -0.127     -0.107
18     0.508   0.515   -0.00928     0.294       -1.482     -1.246
19     0.401   0.655   -0.254       0.292      -40.5      -34.07
20     0.568   0.569   -0.00145     0.318       -0.235     -0.194

Final Weighted Correlation Matrix
         y         x1        x2        x3        x4        x5
y        1         0.75      0.271    -0.0959   -0.173     0.147
x1       0.75      1         0.497    -0.132    -0.0367   -0.097
x2       0.271     0.497     1        -0.226    -0.0031   -0.755
x3      -0.0959   -0.132    -0.226     1         0.733     0.138
x4      -0.173    -0.0367   -0.0031    0.733     1         0.245
x5       0.147    -0.097    -0.755     0.138     0.245     1

Eigenvalues of Final Weighted Correlation Matrix
Eval 1   Eval 2   Eval 3   Eval 4   Eval 5   Eval 6
0.012    0.196    0.39     1.431    1.604    2.368
Output for LMS Regression (continued).
[Figure: LMS Regression - Residuals QQ Plot, with horizontal upper and lower limit lines marking regression outliers.]
Interpretation of Graphs: Observations which are outside of the horizontal lines in the graph are
considered to be regression outliers.
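The elemental-subset search summarized in the output above (all 38,760 non-singular subsets of size 6 are fitted exactly, and the fit minimizing the median of the squared residuals is kept) can be sketched in a few lines. The following is a minimal single-predictor illustration of the LMS idea, not the Scout 2008 implementation; the toy data are hypothetical.

```python
# Minimal sketch of an LMS elemental-subset search (toy data, one predictor).
import itertools
import statistics

# Toy data: y = 1 + 2x exactly, except one gross outlier at x = 8.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3, 5, 7, 9, 11, 13, 15, 40]

best = None  # (median squared residual, intercept, slope)
# With p = 1 predictor, each elemental subset of 2 points determines
# an exact candidate fit, as in the "Best Elemental Subset" search.
for (i, j) in itertools.combinations(range(len(xs)), 2):
    if xs[i] == xs[j]:
        continue  # singular subset: no unique line through these points
    slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
    intercept = ys[i] - slope * xs[i]
    # Minimization criterion: median of the squared residuals over ALL points.
    msr = statistics.median((y - (intercept + slope * x)) ** 2
                            for x, y in zip(xs, ys))
    if best is None or msr < best[0]:
        best = (msr, intercept, slope)

msr, b0, b1 = best
print(b0, b1)  # -> 1.0 2.0  (the LMS line ignores the outlier)
```

Because the median of the squared residuals is unaffected by a minority of large residuals, the outlier at x = 8 cannot pull the fit, which is the high-breakdown property the output's "Approximate Breakdown Value" refers to.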
9.3.1 Least Percentile of Squared Residuals (LPS) Regression
1. Click Regression > LMS/LPS.
[Screenshot: Scout 2008 main window showing the Regression > LMS/LPS menu selection for the WOOD data set.]
2. The "Select Variables" screen (Section 3.3) will appear.
o Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.
o Click on the "Options" button to get the options window, and then click on
"User Specified" in the "Subset Search Strategy" box.
[Screenshot: "LMS Regression Options" dialog. Subset Search Strategy: All Combinations / User Specified / Extensive / Quick. Subsets to Search All: <=10,000 / <=100,000 / <=1,000,000 / <=10,000,000. Minimization Criterion: [n/2] Squared Res (LMS) / [(n + 1)/2] Squared Res / [(n + p + 1)/2] Squared Res / Percentile Squared Res. Percentage Outliers: Maximum < 5% through Maximum < 50%. Display Intervals with Confidence Coefficient 0.95; Display Diagnostics; OK / Cancel.]
o Specify "Subsets to Search All." The default is "<=100,000."
o Specify "Percentage Outliers." The default is "<25%."
o Specify "Outlier Probability." The default is "0.95."
o Specify the "Minimization Criterion" as "Percentile Squared Res."
The default percentile is "0.75."
o Click on "OK" to continue or "Cancel" to cancel the options.
o Click on "Graphics" for the graphics options and specify the preferred
graphs.
[Screenshot: "Select LMS Graphics Options" dialog. Check boxes for Y vs X Plots, Y vs Y-Hat, Y vs Residuals, Y-Hat vs Residuals, Index Plots, and QQ Residuals, each with an editable plot title (e.g., "LMS Regression - Residuals QQ Plot"). Regression Line - Fixing Other Regressors at: No Line / Minimum Values / Mean Values / Maximum Values / Zero Values; Confidence Interval and Prediction Interval check boxes; Confidence Coefficient 0.95; OK / Cancel.]
o Click on "OK" to continue or "Cancel" to cancel the options.
o Click on "OK" to continue or "Cancel" to cancel the computations.
Output for LPS Regression.
Data Set used: Bradu (predictor variables p = 3, Minimization Criterion = 0.75 percentile).
Least Percentile Squared (LPS) Regression Analysis Output

Date/Time of Computation: 2/25/2008 11:08:20 AM
User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision: OFF
Subset Search Strategy: User Specified Criteria
Percentage Outliers: Maximum Outliers <= 0.25
Percentage Outliers: Outlier Probability <= 0.95
Search All Cutoff: Do all combinations if <= 100000
Minimization Criterion: The 0.75 Percentile of Squared Residuals
Residual QQ Plot: Not Selected
Residual Index Plot: Not Selected
Y vs X Plots: Not Selected
Title for Y-Hat vs Residuals Plot: LMS Regression - Y-Hat vs Residuals Plot
Y vs Residuals Plot: Not Selected
Y vs Y-Hat Plot: Not Selected
Number of Selected Regression Variables: 3
Number of Observations: 75
Dependent Variable: y

Correlation Matrix
        y       x1      x2      x3
y       1       0.946   0.962   0.743
x1      0.946   1       0.979   0.708
x2      0.962   0.979   1       0.757
x3      0.743   0.708   0.757   1

Eigenvalues for Correlation Matrix
Eval 1   Eval 2   Eval 3   Eval 4
0.0172   0.0556   0.368    3.559

OLS Estimates of Regression Parameters
Intercept   x1      x2       x3
-0.388      0.239   -0.335   0.383
Output for LPS Regression (continued).
Stdv of Estimated Regression Parameters
Intercept   x1      x2      x3
0.416       0.262   0.155   0.129

OLS ANOVA Table
Source of Variation   SS      DOF   MS      F-Value   P-Value
Regression            543.3    3    181.1   35.77     0.0000
Error                 359.5   71    5.063
Total                 902.8   74

OLS Scale Estimate: 2.25
R Square: 0.602

Least 0.75 Percentile of Squared Residual Regression
Total Number of Elemental Subsets Searched: 10000
Number of Non-Singular Elemental Subsets: 10000

Best Elemental Subset of size 4 Found
          y      x1     x2     x3
Obs #19    0.1    0.8    2.9    1.6
Obs #1     9.7   10.1   19.6   28.3
Obs #3    10.3   10.7   20.2   31
Obs #72   -0.2    0.6    2      1.5

Best Subset satisfies minimization criterion.

Percentile Squared Estimates of Regression Parameters (Using Best Subset)
Intercept   x1      x2      x3
-1.045      0.219   0.272   0.113

Stdv of Estimated Regression Parameters (Using Best Subset)
Intercept   x1      x2      x3
0.572       0.36    0.213   0.177

Minimizing 56th Ordered Squared Residual
Value of Minimum Criterion: 0.606
Approximate Breakdown Value: 0.24
Unweighted Sigma Estimate based upon LPS Residuals: 3.088
Output for LPS Regression (continued).
Initial Robust LPS Scale Estimate (Adjusted for dimensionality): 0.901

LPS (0.75th) Regression Table Based Upon Best Subset
Obs#   Y      Yhat      Residuals    Hat[i,i]   Res/Sigma    Student      Res/Scale    Weights   C Res/Scale
1       9.7    9.7       1.356E-11   0.063       4.393E-12    4.538E-12    1.506E-11   1          1.502
2      10.1    9.882     0.218       0.0599      0.0707       0.073        0.242       1          1.775
3      10.3   10.3       1.530E-11   0.0857      4.954E-12    5.181E-12    1.698E-11   1          1.334
4       9.5   10.56     -1.058       0.0805     -0.343       -0.357       -1.174       1          1.138
5      10     10.47     -0.469       0.0729     -0.152       -0.158       -0.52        1          1.36
6      10     10.17     -0.173       0.0756     -0.0559      -0.0582      -0.192       1          1.527
7      10.8   10.23      0.568       0.068       0.184        0.191        0.631       1          2.006
8      10.3    9.713     0.587       0.0631      0.19         0.196        0.652       1          1.705
9       9.6   10.22     -0.617       0.08       -0.2         -0.208       -0.685       1          1.204
10      9.9    9.778     0.122       0.0869      0.0394       0.0412       0.135       1          1.35
11     -0.2   11.85    -12.05        0.0942     -3.903       -4.101      -13.38        0         -3.48
12     -0.4   12.03    -12.43        0.144      -4.024       -4.349      -13.79        0         -4.165
13      0.7   12.5     -11.8         0.109      -3.822       -4.049      -13.1         0         -2.719
14      0.1   14.46    -14.36        0.564      -4.65        -7.04       -15.94        0         -1.69
15     -0.4    0.725    -1.125       0.0579     -0.364       -0.375       -1.249       1         -0.294
16      0.6    0.266     0.334       0.0759      0.108        0.113        0.371       1          0.385
17     -0.2   -0.587     0.387       0.0393      0.125        0.128        0.43        1          0.287
18      0      0.12     -0.12        0.0231     -0.0387      -0.0392      -0.133       1         -0.175
19      0.1    0.1       2.54E-12    0.0312     -8.24E-13    -8.37E-13    -2.82E-12    1          0.29
20      0.4    0.807    -0.407       0.0476     -0.132       -0.135       -0.452       1          0.151
21      0.9    0.337     0.563       0.0294      0.182        0.185        0.625       1          0.299
22      0.3    0.128     0.172       0.0457      0.0557       0.057        0.191       1          0.415
23     -0.8    0.109    -0.909       0.0293     -0.294       -0.299       -1.009       1         -0.19
24      0.7   -0.0783    0.778       0.0261      0.252        0.255        0.864       1          0.602
25     -0.3   -0.781     0.481       0.022       0.156        0.158        0.534       1         -0.136
26     -0.8    0.333    -1.133       0.0318     -0.367       -0.373       -1.257       1         -0.214
27     -0.7    0.685    -1.385       0.0417     -0.449       -0.458       -1.538       1         -0.612
28      0.3   -0.207     0.507       0.0235      0.164        0.166        0.563       1         -0.108
29      0.3   -0.447     0.747       0.0178      0.242        0.244        0.829       1          0.176
30     -0.3   -0.208    -0.0924      0.0466     -0.0299      -0.0307      -0.103       1         -0.564
31      0      0.127    -0.127       0.059      -0.0412      -0.0424      -0.141       1         -0.12
32     -0.4   -0.249    -0.151       0.0364     -0.049       -0.0499      -0.168       1          0.247
33     -0.6    0.296    -0.896       0.0264     -0.29        -0.294       -0.995       1         -0.0485
34     -0.7   -0.879     0.179       0.032       0.0578       0.0588       0.198       1         -0.301
35      0.3    0.626    -0.326       0.0342     -0.105       -0.107       -0.361       1         -0.178
(The complete regression table is not shown.)
Output for LPS Regression (continued).
Reweighted LPS Estimates of Regression Parameters
Intercept   x1      x2      x3
-0.93       0.143   0.191   0.184

Reweighted LPS Stdv of Estimated Regression Parameters
Intercept   x1       x2       x3
0.13        0.0795   0.0718   0.0505

Reweighted LPS ANOVA Table
Source of Variation   SS       DOF   MS      F-Value   P-Value
Regression            865.3     3    288.4   635       0.0000
Error                  30.43   67    0.454
Total                 895.7    70

R Square: 0.966
Final Reweighted LPS Scale Estimate: 0.674

Reweighted LPS Regression Table
Obs#   Y      Yhat     Residuals   Hat[i,i]   Student   Res/Scale
1       9.7    9.475    0.225      0.063       0.346     0.335
2      10.1    9.671    0.429      0.0599      0.657     0.637
3      10.3   10.17     0.127      0.0857      0.197     0.189
4       9.5   10.44    -0.936      0.0805     -1.448    -1.388
5      10     10.31    -0.306      0.0729     -0.471    -0.454
6      10      9.893    0.107      0.0756      0.165     0.158
(The complete regression table is not shown.)
73      0.4   -0.157    0.557      0.0426      0.844     0.826
74     -0.9   -0.215   -0.685      0.05       -1.043    -1.016
75      0.2   -0.331    0.531      0.0621      0.813     0.788

Final Weighted Correlation Matrix
        y       x1      x2      x3
y       1       0.939   0.946   0.943
x1      0.939   1       0.985   0.977
x2      0.946   0.985   1       0.98
x3      0.943   0.977   0.98    1

Eigenvalues for Final Weighted Correlation Matrix
Eval 1   Eval 2   Eval 3   Eval 4
0.0142   0.0244   0.0762   3.885
Output for LPS Regression (continued).
[Figure: LMS Regression - Residuals QQ Plot.]
[Figure: LMS Regression - Y-Hat vs Residuals Plot.]
Output for LPS Regression.
Data Set used: Bradu (predictor variables p = 3, Minimization Criterion = 0.9 percentile).
Least 0.9 Percentile of Squared Residual Regression
Total Number of Elemental Subsets Searched: 10000
Number of Non-Singular Elemental Subsets: 10000

Best Elemental Subset of size 4 Found
          y      x1     x2     x3
Obs #31    0      3.1    1.4    1
Obs #32   -0.4    0.5    2.4    0.3
Obs #3    10.3   10.7   20.2   31
Obs #45   -0.5    1.9    0.1    0.6

Best Subset satisfies minimization criterion.

Percentile Squared Estimates of Regression Parameters (Using Best Subset)
Intercept   x1      x2      x3
-0.951      0.167   0.171   0.194

Stdv of Estimated Regression Parameters (Using Best Subset)
Intercept   x1      x2      x3
0.554       0.349   0.206   0.171

Minimizing 67th Ordered Squared Residual
Value of Minimum Criterion: 1.664
Approximate Breakdown Value: 0.0933
Unweighted Sigma Estimate based upon LPS Residuals: 2.991
Initial Robust LPS Scale Estimate (Adjusted for dimensionality): 0.909
LPS (0.9th) Regression Table Based Upon Best Subset
Obs#   Y      Yhat        Residuals     Hat[i,i]   Res/Sigma    Student      Res/Scale    Weights   C Res/Scale
1       9.7    9.573       0.127        0.063       0.0424       0.0438       0.139       1          1.502
2      10.1    9.743       0.357        0.0599      0.119        0.123        0.393       1          1.775
3      10.3   10.3         1.723E-13    0.0857      5.760E-14    6.024E-14    1.895E-13   1          1.334
4       9.5   10.52       -1.024        0.0805     -0.342       -0.357       -1.126       1          1.138
5      10     10.41       -0.406        0.0729     -0.136       -0.141       -0.447       1          1.36
6      10     10          -0.00147      0.0756     -4.906E-4    -5.102E-4    -0.00161     1          1.527
7      10.8   10.02        0.783        0.068       0.262        0.271        0.861       1          2.006
8      10.3    9.637       0.663        0.0631      0.222        0.229        0.729       1          1.705
9       9.6   10.22       -0.618        0.08       -0.207       -0.215       -0.68        1          1.204
10      9.9    9.845       0.0552       0.0869      0.0185       0.0193       0.0607      1          1.35
11     -0.2   11.77      -11.97         0.0942     -4.003       -4.206      -13.17        0         -3.48
12     -0.4   12.16      -12.56         0.144      -4.199       -4.538      -13.82        0         -4.165
13      0.7   12.09      -11.39         0.109      -3.807       -4.034      -12.53        0         -2.719
14      0.1   13.29      -13.19         0.564      -4.408       -6.673      -14.5         0         -1.69
15     -0.4    0.52       -0.92         0.0579     -0.307       -0.317       -1.011       1         -0.294
16      0.6    5.8698E-4   0.599        0.0759      0.2          0.208        0.659       1          0.385
17     -0.2   -0.639       0.439        0.0393      0.147        0.15         0.483       1          0.287
18      0      0.0945     -0.0945       0.0231     -0.0316      -0.0319      -0.104       1         -0.175
19      0.1   -0.0122      0.112        0.0312      0.0375       0.0381       0.123       1          0.29
20      0.4    0.574      -0.174        0.0476     -0.0582      -0.0596      -0.191       1          0.151
21      0.9    0.228       0.672        0.0294      0.225        0.228        0.74        1          0.299
22      0.3    0.0303      0.27         0.0457      0.0902       0.0923       0.297       1          0.415
23     -0.8   -0.0692     -0.731        0.0293     -0.244       -0.248       -0.804       1         -0.19
24      0.7   -0.244       0.944        0.0261      0.316        0.32         1.039       1          0.602
25     -0.3   -0.706       0.406        0.022       0.136        0.137        0.447       1         -0.136
26     -0.8    0.247      -1.047        0.0318     -0.35        -0.356       -1.152       1         -0.214
27     -0.7    0.59       -1.29         0.0417     -0.431       -0.44        -1.419       1         -0.612
28      0.3   -0.126       0.426        0.0235      0.142        0.144        0.468       1         -0.108
29      0.3   -0.442       0.742        0.0178      0.248        0.25         0.816       1          0.176
30     -0.3    0.0288     -0.329        0.0466     -0.11        -0.113       -0.362       1         -0.564
31      0      2.146E-14  -2.15E-14     0.059      -7.17E-15    -7.39E-15    -2.36E-14    1         -0.12
32     -0.4   -0.4         2.520E-14    0.0364      8.425E-15    8.583E-15    2.772E-14   1          0.247
33     -0.6    0.119      -0.719        0.0264     -0.241       -0.244       -0.791       1         -0.0485
34     -0.7   -0.748       0.0484       0.032       0.0162       0.0165       0.0533      1         -0.301
35      0.3    0.559      -0.259        0.0342     -0.0865      -0.088       -0.285       1         -0.178
36     -1      0.132      -1.132        0.0231     -0.378       -0.383       -1.245       1         -0.522
37     -0.6    0.0819     -0.682        0.0587     -0.228       -0.235       -0.75        1         -0.102
38      0.9   -0.457       1.357        0.021       0.454        0.458        1.493       1          0.557
39     -0.7   -0.367      -0.333        0.035      -0.111       -0.113       -0.366       1         -0.567
40     -0.5   -0.294      -0.206        0.03       -0.069       -0.0701      -0.227       1         -0.0102
41     -0.1    0.453      -0.553        0.0524     -0.185       -0.19        -0.608       1         -0.49
42     -0.7   -0.206      -0.494        0.0554     -0.165       -0.17        -0.543       1         -0.482
43      0.6   -0.197       0.797        0.0606      0.266        0.275        0.877       1          0.766
44     -0.7    0.0561     -0.756        0.0406     -0.253       -0.258       -0.832       1         -0.801
45     -0.5   -0.5        -4.16E-14     0.029      -1.39E-14    -1.41E-14    -4.58E-14    1         -0.339
46     -0.4    0.0173     -0.417        0.0377     -0.14        -0.142       -0.459       1         -0.634
(The complete regression table is not shown.)
Output for LPS Regression (continued).
Reweighted LPS Estimates of Regression Parameters
Intercept   x1      x2      x3
-0.93       0.143   0.191   0.184

Reweighted LPS Stdv of Estimated Regression Parameters
Intercept   x1       x2       x3
0.13        0.0795   0.0718   0.0505

Reweighted LPS ANOVA Table
Source of Variation   SS       DOF   MS      F-Value   P-Value
Regression            865.3     3    288.4   635       0.0000
Error                  30.43   67    0.454
Total                 895.7    70

R Square: 0.966
Final Reweighted LPS Scale Estimate: 0.674

Reweighted LPS Regression Table
Obs#   Y      Yhat      Residuals   Hat[i,i]   Student    Res/Scale
1       9.7    9.475     0.225      0.063        0.346      0.335
2      10.1    9.671     0.429      0.0599       0.657      0.637
3      10.3   10.17      0.127      0.0857       0.197      0.189
4       9.5   10.44     -0.936      0.0805      -1.448     -1.388
5      10     10.31     -0.306      0.0729      -0.471     -0.454
6      10      9.893     0.107      0.0756       0.165      0.158
7      10.8    9.927     0.873      0.068        1.341      1.295
8      10.3    9.538     0.762      0.0631       1.168      1.13
9       9.6   10.13     -0.525      0.08        -0.812     -0.779
10      9.9    9.748     0.152      0.0869       0.236      0.225
11     -0.2   11.68    -11.88       0.0942     -18.52     -17.63
12     -0.4   12       -12.4        0.144      -19.89     -18.4
13      0.7   12.02    -11.32       0.109      -17.79     -16.79
14      0.1   13.4     -13.3        0.564      -29.88     -19.74
15     -0.4    0.497    -0.897      0.0579      -1.372     -1.332
16      0.6   -0.0111    0.611      0.0759       0.943      0.907
17     -0.2   -0.588     0.388      0.0393       0.587      0.575
18      0      0.0736   -0.0736     0.0231      -0.11      -0.109
19      0.1    0.033     0.067      0.0312       0.101      0.0994
20      0.4    0.568    -0.168      0.0476      -0.256     -0.25
(The complete regression table is not shown.)
Output for LPS Regression (continued).
[Figure: LMS Regression - Residuals QQ Plot.]
The 75th percentile minimization criterion finds 14 observations (1 through 14) as outliers,
and the 90th percentile minimization criterion finds four observations (11, 12, 13 and 14) as outliers.
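The percentile criterion used above can be stated compactly: for a candidate fit, the objective is the k-th smallest squared residual, where k is determined by the chosen percentile and the sample size. A minimal sketch (an illustration of the criterion, not the Scout 2008 code; the residual vector in the example is hypothetical):

```python
# Sketch of the LPS minimization criterion: the k-th ordered squared residual.
def lps_criterion(residuals, percentile):
    ordered = sorted(r * r for r in residuals)
    k = int(percentile * len(residuals))  # rank of the ordered squared residual
    return ordered[k - 1]                 # 1-based "k-th ordered" value

# With n = 75 observations this reproduces the ranks quoted in the outputs:
# the 56th ordered squared residual for the 0.75 percentile and the 67th for 0.9.
n = 75
print(int(0.75 * n), int(0.9 * n))  # -> 56 67
```

Setting the percentile to 0.5 recovers the LMS criterion (the median), while larger percentiles trade breakdown value (0.24 vs. 0.0933 above) for efficiency when fewer outliers are present.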
9.4 Iterative OLS Regression Method
1. Click Regression > Iterative OLS.
[Screenshot: Scout 2008 main window showing the Regression > Iterative OLS menu selection for the BRADU data set.]
2. The "Select Variables" screen (Section 3.3) will appear.
o Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.
o Click on the "Options" button to get the options window.
[Screenshot: "IRLS Options" dialog. Regression Value (Alpha for Residual Outliers): 0.05. Number of Regression Iterations: 10 (Max = 50). Residuals MDs Distribution: Beta / Chisquare. Intermediate Iterations: Do Not Display / Display Every 5th / Display Every 4th / Display Every 2nd / Display All. Identify Leverage Points: Leverage. Select Leverage Distance Method: Classical / Sequential Classical / Huber / PROP / MVT (Trimming). Number of Leverage Iterations: 10 (Max = 50). Leverage MDs Distribution: Beta / Chisquare. Initial Leverage Distances: Classical / Sequential Classical / Robust (Median, 1.48MAD) / OKG (Maronna Zamar) / KG (Not Orthogonalized) / MCD. Leverage Value(s): 0.05. Leverage Influence Function Alpha; Display Intervals with Confidence Coefficient 0.95; Display Diagnostics; OK / Cancel.]
o Specify the "Regression Value." The default is "0.05."
o Specify the "Number of Regression Iterations." The default is "10."
o Specify the "Residuals MDs Distribution." The default is "Beta."
o Specify the "Identify Leverage Points." The default is "On."
o Specify the "Select Leverage Distance Method." The default is "PROP."
o Specify the "Number of Leverage Iterations." The default is "10."
o Specify the "Initial Leverage Distances." The default is "OKG (Maronna Zamar)."
o Specify the "Leverage Value." The default is "0.05."
o Click "OK" to continue or "Cancel" to cancel the options.
o Click on the "Graphics" button to get the options window.
[Screenshot: "Options Regression Graphics" dialog. Check boxes for XY Plots, Y vs Y-Hat, Y vs Residuals, Y-Hat vs Residuals, Residuals vs Leverage, and QQ Residuals, each with an editable plot title (e.g., "IRLS Regression - Residuals QQ Plot"). Regression Line - Fixing Other Regressors at: No Line / Minimum Values / Mean Values / Maximum Values / Zero Values; Confidence Interval and Prediction Interval check boxes; Confidence Coefficient 0.95; Graphics Distribution: Beta / Chisquare; Residual/Lev Alpha: 0.05; OK / Cancel.]
o Specify the preferred plots and the input parameters.
o Click "OK" to continue or "Cancel" to cancel the options.
° Click "OK" to continue or "Cancel" to cancel the computations.
Output example: The data set "BRADU.xls" was used for iterative OLS regression. It
has 3 predictor variables (p) and 75 observations. When the "Leverage" option is on, the
leverage distances are calculated and outlying observations are identified iteratively,
using the median and the OKG matrix as initial estimates and PROP as the leverage
method (i.e., using the PROP influence function). Weights are then assigned to the
observations, and those weights are used in finding the regression outliers iteratively.
When the leverage option is off, all observations are assigned weights of one (1), and
the regression outliers are then found iteratively. Finally, the estimated regression
parameters are calculated.
Output for Iterative OLS (Leverage ON with PROP function and OKG initial start).
Data Set Used: Bradu (predictor variables p = 3).
Regression Analysis Output

Date/Time of Computation: 3/4/2008 9:50:32 AM
User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision: OFF
Selected Regression Method: Iterative Reweighted Least Squares (IRLS)
Alpha for Residual Outliers: 0.05 (Used to Identify Vertical Regression Outliers)
Number of Regression Iterations: 10 (Maximum Number if doesn't Converge)
Leverage: Identify Leverage Points (Outliers in X Space)
Selected Leverage Method: PROP
Initial Leverage Distance Method: OKG (Maronna Zamar) Matrix
Squared MDs: Beta Distribution used for Leverage Distances based upon Selected Regression (Leverage) Variables
Leverage Distance Alpha: 0.05 (Used to Identify Leverage Points)
Number of Leverage Iterations: 10 (Maximum Number if doesn't Converge)
Y vs Y-hat Plot: Not Selected
Y vs Residual Plot: Not Selected
Y-hat vs Residual Plot: Not Selected
Y vs X Plots: Not Selected
Title for Residual QQ Plot: IRLS Regression - Residuals QQ Plot
Residual Band Alpha: 0.05 (Used in Graphics Residual Bands)
Title Residual vs Distance Plot: IRLS Regression - Residuals vs Unsquared Leverage Distance Plot
Show Intermediate Results: Do Not Display Intermediate Results (Intermediate Results Shown on Another Output Sheet)
Leverage Points are Outliers in X-Space of Selected Regression Variables.
Number of Selected Regression Variables: 3
Number of Observations: 75
Dependent Variable: y

Residual Values used with Graphics Display
Upper Residual Individual (0.05) MD: 1.94
Lower Residual Individual (0.05) MD: -1.94

Correlation Matrix
        y       x1      x2      x3
y       1       0.946   0.962   0.743
x1      0.946   1       0.979   0.708
x2      0.962   0.979   1       0.757
x3      0.743   0.708   0.757   1

Eigenvalues of Correlation Matrix
Eval 1   Eval 2   Eval 3   Eval 4
0.0172   0.0556   0.368    3.559

Ordinary Least Squares (OLS) Regression Results

Estimates of Regression Parameters
Intercept   x1      x2       x3
-0.388      0.239   -0.335   0.383

Stdv of Estimated Regression Parameters
Intercept   x1      x2      x3
0.416       0.262   0.155   0.129
Output for Iterative OLS (Leverage ON) (continued).
ANOVA Table
Source of Variation   SS      DOF   MS      F-Value   P-Value
Regression            543.3    3    181.1   35.77     0.0000
Error                 359.5   71    5.063
Total                 902.8   74

R Square Estimate: 0.602
MAD Based Scale Estimate: 1.067
Weighted Scale Estimate: 2.25
IQR Estimate: 1.469
Det. of COV [Regression Coefficients] Matrix: 5.5107E-8

Regression Parameters Vector Estimates
Intercept   x1       x2       x3
-0.0105     0.0624   0.0119   -0.107

Stdv of Regression Estimates Vector
Intercept   x1       x2       x3
0.197       0.0689   0.0684   0.0713

ANOVA Table
Source of Variation   SS      DOF   MS      F-Value   P-Value
Regression            0.898    3    0.299   0.94      0.4272
Error                 18.14   57    0.318
Total                 19.04   60

R Square Estimate: 0.0472
MAD Based Scale Estimate: 0.902
Weighted Scale Estimate: 0.564
Individual MD (0.05): 7.346
IQR Estimate: 1.236
Determinant of Leverage S Matrix: 1.357
Output for Iterative OLS (Leverage ON) (continued).
Leverage Option Regression Table
[Table: per-observation values of Obs, Y Vector, Yhat, Residuals, Hat[i,i], Res/Scale, Student Res, Wts[u], Res Dist, Lev Dist, and OLS Dist for the 75 observations.]
(The complete regression table is not shown.)
Output for Iterative OLS (Leverage ON) (continued).
Results From the Regression Operation

Regression Parameters Vector Estimates
Intercept   x1       x2       x3
-0.18       0.0314   0.0399   -0.0517

Stdv of Regression Estimates Vector
Intercept   x1       x2       x3
0.104       0.0667   0.0405   0.0354

ANOVA Table
Source of Variation   SS      DOF   MS      F-Value   P-Value
Regression            0.847    3    0.282   0.909     0.4421
Error                 18.94   61    0.31
Total                 19.79   64

R Square Estimate: 0.0428
MAD Based Scale Estimate: 0.845
Weighted Scale Estimate: 0.557
Individual MD (0.05): 7.346
IQR Estimate: 1.132
Det. of COV [Regression Coefficients] Matrix: 2.531E-12
Regression Table
Obs   Y Vector   Yhat       Residuals    Hat[i,i]   Res/Scale   Student Res   Wts[u]   Res Dist
1      9.7       -0.0386     9.739       0.063      17.48       18.06         0        17.48
2     10.1       -0.0825    10.18        0.0599     18.27       18.85         0        18.27
3     10.3       -0.105     10.41        0.0857     18.67       19.53         0        18.67
4      9.5       -0.155      9.655       0.0805     17.33       18.07         0        17.33
5     10         -0.107     10.11        0.0729     18.14       18.84         0        18.14
6     10          0.00379    9.996       0.0756     17.94       18.66         0        17.94
7     10.8        0.00449   10.8         0.068      19.37       20.07         0        19.37
8     10.3       -0.0807    10.38        0.0631     18.63       19.25         0        18.63
9      9.6       -0.167      9.767       0.08       17.53       18.27         0        17.53
10     9.9       -0.203     10.1         0.0869     18.13       18.98         0        18.13
11    -0.2       -0.136     -0.0641      0.0942     -0.115      -0.121        1         0.115
(The complete regression table is not shown.)
Final Weighted Correlation Matrix
        y        x1       x2       x3
y       1        0.89     0.917    0.0893
x1      0.89     1        0.961    0.063
x2      0.917    0.961    1        0.0261
x3      0.0893   0.063    0.0261   1

Eigenvalues of Final Weighted Correlation Matrix
Eval 1   Eval 2   Eval 3   Eval 4
0.035    0.117    0.997    2.851
Output for Iterative OLS (Leverage ON) (continued).
[Figure: IRLS Regression - Residuals QQ Plot, with horizontal residual bands at Indv-Res(0.05) = +/-1.94.]
[Figure: IRLS Regression - Residuals vs Unsquared Leverage Distance Plot, with horizontal residual bands at Indv-Res(0.05) = +/-1.94 and vertical leverage-distance lines at Indv-MD(0.05) = 2.7 and Max-MD(0.05) = 3.94.]
Interpretation of Graphs: Observations which are outside of the horizontal lines in both of the graphs are
considered to be regression outliers. The observations to the right of the vertical lines are considered to be
leverage outliers. Observations between the horizontal lines and to the right of the vertical lines represent
good leverage points.
Question: What are really bad leverage points for this data set in the context of a regression model?
Answer: There are contradictory opinions in this respect. So far as outliers are considered, several
methods (e.g., MCD, PROP) can identify all of the 14 outliers present in this data set. However,
observations 1 through 10 should be considered to be good leverage points, as they enhance the regression
model and increase the coefficient of determination. Without those 10 points, fitting a regression model to
the remaining 65 points is meaningless. Observations 11 through 14 are outliers and bad leverage points.
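The Hat[i,i] column in the regression tables above is the classical way to measure leverage: the diagonal of the hat matrix H = X(X'X)^-1 X'. A minimal sketch for simple regression, where Hat[i,i] = 1/n + (x_i - mean)^2/Sxx; the data and the 2p/n flagging rule of thumb are illustrative assumptions, not Scout's robust (PROP/OKG-based) leverage distances.

```python
# Hat-matrix diagonal for simple regression, flagging high-leverage points.
xs = [1.0, 2.0, 3.0, 4.0, 20.0]   # last point is far out in X-space
n = len(xs)
mean = sum(xs) / n
sxx = sum((x - mean) ** 2 for x in xs)
hat = [1 / n + (x - mean) ** 2 / sxx for x in xs]  # Hat[i,i] values

p = 2                             # parameters: intercept and slope
# A common rule of thumb flags Hat[i,i] > 2p/n as a leverage point.
flagged = [i + 1 for i, h in enumerate(hat) if h > 2 * p / n]
print(flagged)  # -> [5]
```

Classical hat values themselves can be masked by clusters of X-outliers, which is why the iterative methods above compute leverage distances from robust initial estimates instead.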
Output for Iterative OLS (Leverage OFF).
Regression Analysis Output

Date/Time of Computation: 3/4/2008 9:54:08 AM
User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision: OFF
Selected Regression Method: Iterative Reweighted Least Squares (IRLS)
Alpha for Residual Outliers: 0.05 (Used to Identify Vertical Regression Outliers)
Number of Regression Iterations: 10 (Maximum Number if doesn't Converge)
Leverage: Off
Y vs Y-hat Plot: Not Selected
Y vs Residual Plot: Not Selected
Y-hat vs Residual Plot: Not Selected
Y vs X Plots: Not Selected
Title for Residual QQ Plot: IRLS Regression - Residuals QQ Plot
Residual Band Alpha: 0.05 (Used in Graphics Residual Bands)
Title Residual vs Distance Plot: IRLS Regression - Residuals vs Unsquared Leverage Distance Plot
Show Intermediate Results: Do Not Display Intermediate Results (Intermediate Results Shown on Another Output Sheet)
Number of Selected Regression Variables: 3
Number of Observations: 75
Dependent Variable: y

Residual Values used with Graphics Display
Upper Residual Individual (0.05) MD: 1.94
Lower Residual Individual (0.05) MD: -1.94

Correlation Matrix
        y       x1      x2      x3
y       1       0.946   0.962   0.743
x1      0.946   1       0.979   0.708
x2      0.962   0.979   1       0.757
x3      0.743   0.708   0.757   1

Eigenvalues of Correlation Matrix
Eval 1   Eval 2   Eval 3   Eval 4
0.0172   0.0556   0.368    3.559
Output for Iterative OLS (Leverage OFF) (continued).
Ordinary Least Squares (OLS) Regression Results

Estimates of Regression Parameters
Intercept   x1      x2       x3
-0.388      0.239   -0.335   0.383

Stdv of Estimated Regression Parameters
Intercept   x1      x2      x3
0.416       0.262   0.155   0.129

ANOVA Table
Source of Variation   SS      DOF   MS      F-Value   P-Value
Regression            543.3   3     181.1   35.77     0.0000
Error                 359.5   71    5.063
Total                 902.8   74

R Square Estimate: 0.602
MAD Based Scale Estimate: 1.067
Weighted Scale Estimate: 2.25
IQR Estimate of Residuals: 1.468
Det. of COV[Regression Coefficients] Matrix: 5.5107E-8

Final Reweighted Regression Results

Estimates of Regression Parameters
Intercept   x1      x2       x3
-0.388      0.239   -0.335   0.383

Stdv of Estimated Regression Parameters
Intercept   x1      x2      x3
0.416       0.262   0.155   0.129

ANOVA Table
Source of Variation   SS      DOF   MS      F-Value   P-Value
Regression            543.3   3     181.1   35.77     0.0000
Error                 359.5   71    5.063
Total                 902.8   74

R Square Estimate: 0.602
MAD Based Scale Estimate: 1.067
Weighted Scale Estimate: 2.25
IQR Estimate of Residuals: 1.468
Det. of COV[Regression Coefficients] Matrix: 5.5107E-8
Regression Table
Obs   Y Vector   Yhat    Residuals   Hat[i,i]   Res/Scale   Student Res   Wts[i,i]   Res Dist
1     9.7        6.32    3.38        0.063      1.502       1.552         1          1.502
2     10.1       6.105   3.995       0.0599     1.775       1.831         1          1.775
3     10.3       7.297   3.003       0.0857     1.334       1.396         1          1.334
4     9.5        6.939   2.561       0.0805     1.138       1.187         1          1.138
5     10         6.939   3.061       0.0729     1.36        1.413         1          1.36
(The complete regression table is not shown.)
Output for Iterative OLS (Leverage OFF) (continued).
[Plot: IRLS Regression - Residuals QQ Plot. Horizontal residual band lines at Indv(0.05) = +/-1.94; x-axis: Normal Quantiles.]
Interpretation of Graphs: Observations that fall outside the horizontal lines in the graph are considered regression outliers. The Leverage Distances vs. Standardized Residuals plot is not produced. The sequential classical method failed to identify all of the regression outliers.
9.5 Biweight Regression Method
1. Click Regression > Biweight.
[Screenshot: Scout 4.0 with the data set D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\DEFINE open and the Regression menu expanded, showing OLS, LMS, Iterative OLS, Biweight, Huber, MVT, PROP, and Method Comparison; Biweight is highlighted.]
2. The "Select Variables" screen (Section 3.3) will appear.
387
-------
o Select the dependent variable and one or more independent variables from the "Select Variables" screen.
o If the results have to be produced by using a Group variable, then select a group variable by clicking the arrow below the "Group by Variable" button. This will result in a drop-down list of available variables. The user should select and click on an appropriate variable representing a group variable.
o Click on the "Options" button to get the options window.
[Screenshot: "Biweight Options" window]
Regression Value (Residual Scale Tuning Constant): 4
Number of Regression Iterations: 10 (Max = 50)
Residuals MDs Distribution: Beta / Chisquare
Intermediate Iterations: Do Not Display / Display Every 5th / Display Every 4th / Display Every 2nd / Display All
Identify Leverage Points: Leverage (checked)
Select Leverage Distance Method: Classical / Sequential Classical / Huber / PROP / MVT (Trimming)
Number of Leverage Iterations: 10 (Max = 50)
Leverage MDs Distribution: Beta / Chisquare
Initial Leverage Distances: Classical / Sequential Classical / Robust (Median, 1.48MAD) / OKG (Maronna Zamar) / KG (Not Orthogonalized) / MCD
Leverage Value(s) (Leverage Influence Function Alpha): 0.05
Display Intervals (checked); Confidence Coefficient: 0.95
Display Diagnostics (checked)
OK / Cancel
o Specify the "Regression Value." The default is "4.'
o Specify the "Number of Regression Iterations." The default is
"10"
o Specify the "Regression MDs Distribution." The default is
"Beta."
o Specify "Identify Leverage Points." The default is "On."
o Specify the "Select Leverage Distance Method." The default is
"PROP."
o Specify the "Number of Leverage Iterations." The default
is"10."
o Specify the "Leverage Initial Distances" The default is "OKG
(Maronna Zaniar)."
o Specify the "Leverage Value." The default is "0.05."
o Click "OK" to continue or "Cancel" to cancel the options
o Click on the "Graphics" button to get the options window.
[Screenshot: "OptionsRegressionGraphics" window]
XY Plots / Y vs Y-Hat / Y vs Residuals / Y-Hat vs Residuals / Residuals vs Leverage / QQ Residuals (all checked)
XY Plot Title: Biweight Regression - Y vs X Plot
Y vs Y-Hat Title: Biweight Regression - Y vs Y-Hat
Y vs Residuals Title: Biweight Regression - Y vs Residuals
Y-Hat vs Residuals Title: Biweight Regression - Y-Hat vs Residuals
Residuals vs Leverage Title: Biweight Regression - Residuals vs Leverage
QQ Residuals Title: Biweight Regression - Residuals QQ Plot
Regression Line - Fixing Other Regressors at: No Line / Minimum Values / Mean Values / Maximum Values / Zero Values
Confidence Interval (checked); Prediction Interval (checked); Confidence Coefficient: 0.95
Graphics Distribution: Beta / Chisquare
Residual/Lev. Alpha: 0.05
OK / Cancel
o Specify the preferred plots and the input parameters.
o Click "OK" to continue or "Cancel" to cancel the options.
o Click "OK" to continue or "Cancel" to cancel the computations.
Output example: The data set "DEFINE.xls" was used for Biweight regression. It>has 1
predictor variables (p) and 26 observations. When the "Leverage" option is on, the
leverage distances are calculated and outlying observations are obtained iteratively using
initial estimates as median and OKG matrix and the leverage option as PROP (i.e., using
PROP influence function). Then the weights are assigned to observations and those
weights are used in the finding the regression outliers iteratively. When the leverage
option is off, all observations are assigned one (1) as weights and then the regression
outliers are found using the Biweight tuning constant iteratively. Finally, the estimated
regression parameters are calculated.
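The iterative reweighting idea described above (with the leverage option off) can be sketched as follows. This is a minimal illustration, not Scout source code: the tuning constant c = 4 mirrors the dialog's default "Regression Value," while the MAD-based residual scale, the toy data, and all names are assumptions of the sketch.

```python
# A minimal IRLS sketch using Tukey's biweight weight function.
import numpy as np

def biweight_weights(scaled_resid, c=4.0):
    """Tukey biweight: full weight near 0, zero weight beyond c."""
    u = np.clip(np.abs(scaled_resid) / c, 0, 1)
    return (1 - u**2) ** 2

def irls_biweight(x, y, c=4.0, n_iter=10):
    """Iteratively reweighted least squares with biweight residual weights."""
    X1 = np.column_stack([np.ones(len(y)), x])     # add intercept column
    w = np.ones(len(y))                            # start with unit weights
    for _ in range(n_iter):
        W = np.sqrt(w)[:, None]                    # weighted LS solve
        beta, *_ = np.linalg.lstsq(X1 * W, y * W.ravel(), rcond=None)
        resid = y - X1 @ beta
        # Robust scale from the MAD of the residuals (an assumed choice).
        scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))
        w = biweight_weights(resid / scale, c)
    return beta, w

# Toy data: a clean line plus two gross vertical outliers.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 30)
y = 2 + 0.8 * x + rng.normal(0, 0.2, 30)
y[[5, 20]] += 15
beta, w = irls_biweight(x, y, c=4.0)
print(beta, w[[5, 20]])    # the outliers end up with near-zero weight
```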
Output for Biweight (Leverage ON).
Data Set Used: Define (predictor variables p = 1).
Regression Analysis Output

Date/Time of Computation: 3/4/2008 10:03:07 AM
User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\DEFINE
Full Precision: OFF
Selected Regression Method: Biweight
Residual Biweight Tuning Constant: 4 (Used to Identify Vertical Regression Outliers)
Number of Regression Iterations: 10 (Maximum Number if doesn't Converge)
Leverage: Identify Leverage Points (Outliers in X-Space)
Selected Leverage Method: PROP
Initial Leverage Distance Method: OKG (Maronna Zamar) Matrix
Squared MDs: Beta Distribution used for Leverage Distances based upon Selected Regression (Leverage) Variables
Leverage Distance Alpha: 0.05 (Used to Identify Leverage Points)
Number of Leverage Iterations: 10 (Maximum Number if doesn't Converge)
Y vs Y-hat Plot: Not Selected
Y vs Residual Plot: Not Selected
Y-hat vs Residual Plot: Not Selected
Title for Y vs X Plots: Biweight Regression - Y vs X Plot
Title for Residual QQ Plot: Biweight Regression - Residuals QQ Plot
Residual Band Alpha: 0.05 (Used in Graphics Residual Bands)
Title Residual vs Distance Plot: Biweight Regression - Residuals vs Unsquared Leverage Distance Plot
Show Intermediate Results: Do Not Display Intermediate Results (Intermediate Results Shown on Another Output Sheet)
Leverage Points are Outliers in X-Space of Selected Regression Variables.

Number of Selected Regression Variables: 1
Number of Observations: 26
Dependent Variable: Y

Residual Values used with Graphics Display
Upper Residual Individual (0.05) MD: 1.903
Lower Residual Individual (0.05) MD: -1.903

Correlation Matrix
      Y      X
Y     1      0.218
X     0.218  1
Output for Biweight (Leverage ON) (continued).
Eigenvalues of Correlation Matrix
Eval 1   Eval 2
0.782    1.218

Ordinary Least Squares (OLS) Regression Results

Estimates of Regression Parameters
Intercept   X
22.06       0.256

Stdv of Estimated Regression Parameters
Intercept   X
4.107       0.233

ANOVA Table
Source of Variation   SS      DOF   MS      F-Value   P-Value
Regression            224.9   1     224.9   1.202     0.2838
Error                 4490    24    187.1
Total                 4715    25

R Square Estimate: 0.0477
MAD Based Scale Estimate: 8.862
Weighted Scale Estimate: 13.68
IQR Estimate of Residuals: 25.79
Det. of COV[Regression Coefficients] Matrix: 0.391

Initial Weighted Regression Iteration with Identified Leverage Points

Estimates of Regression Parameters
Intercept   X
22.06       0.256

Stdv of Estimated Regression Parameters
Intercept   X
4.107       0.233
Output for Biweight (Leverage ON) (continued).
ANOVA Table
Source of Variation   SS      DOF   MS      F-Value   P-Value
Regression            224.9   1     224.9   1.202     0.2838
Error                 4490    24    187.1
Total                 4715    25

R Square Estimate: 0.0477
MAD Based Scale Estimate: 8.862
Weighted Scale Estimate: 13.68
Unsquared Leverage Distance Indiv-MD(0.05): 1.803
IQR Estimate of Residuals: 25.79
Determinant of Leverage S Matrix: 137.6
Regression Table with Leverage Option
Obs   Y Vector   Yhat    Residuals   Hat[i,i]   Res/Scale   Student Res   Wts[i,i]   Res Dist   Lev Dist   OLS Res Dist
1     6          22.32   -16.32      0.0827     -1.193      -1.246        1          1.193      1.052      1.193
2     8.5        22.32   -13.82      0.0827     -1.01       -1.055        1          1.01       1.052      1.01
3     8.5        22.58   -14.08      0.0758     -1.029      -1.07         1          1.029      0.966      1.029
4     10         22.58   -12.58      0.0758     -0.919      -0.956        1          0.919      0.966      0.919
5     12         23.09   -11.09      0.0638     -0.811      -0.838        1          0.811      0.796      0.811
6     40         23.09   16.91       0.0638     1.237       1.278         1          1.237      0.796      1.237
7     42.5       23.09   19.41       0.0638     1.419       1.467         1          1.419      0.796      1.419
8     45         23.34   21.66       0.0587     1.583       1.632         1          1.583      0.711      1.583
9     50         23.34   26.66       0.0587     1.949       2.009         1          1.949      0.711      1.949
10    13         23.09   -10.09      0.0638     -0.737      -0.762        1          0.737      0.796      0.737
11    14         23.09   -9.086      0.0638     -0.664      -0.687        1          0.664      0.796      0.664
12    17         23.34   -6.342      0.0587     -0.464      -0.478        1          0.464      0.711      0.464
13    17.4       23.34   -5.942      0.0587     -0.434      -0.448        1          0.434      0.711      0.434
14    22         24.62   -2.62       0.0417     -0.192      -0.196        1          0.192      0.284      0.192
15    24         24.62   -0.62       0.0417     -0.0454     -0.0463       1          0.0454     0.284      0.0454
16    25         24.62   0.38        0.0417     0.0277      0.0283        1          0.0277     0.284      0.0277
17    42.5       27.18   15.32       0.0514     1.12        1.15          1          1.12       0.568      1.12
18    43         27.18   15.82       0.0514     1.157       1.188         1          1.157      0.568      1.157
19    44.1       27.69   16.41       0.0603     1.2         1.238         1          1.2        0.739      1.2
20    45.3       27.82   17.48       0.0628     1.278       1.32          1          1.278      0.781      1.278
21    20         29.73   -9.734      0.119      -0.712      -0.758        1          0.712      1.421      0.712
22    22         29.73   -7.734      0.119      -0.565      -0.602        1          0.565      1.421      0.565
23    21         29.99   -8.99       0.129      -0.657      -0.704        1          0.657      1.506      0.657
24    23         30.25   -7.245      0.14       -0.53       -0.571        1          0.53       1.581      0.53
(The complete regression table is not shown.)
Output for Biweight (Leverage ON) (continued).
Final Reweighted Regression Results

Estimates of Regression Parameters
Intercept   X
11.64       0.358

Stdv of Estimated Regression Parameters
Intercept   X
1.372       0.0696

ANOVA Table
Source of Variation   SS      DOF     MS      F-Value   P-Value
Regression            346     1       346     26.45     0.0002
Error                 173.3   13.25   13.08
Total                 519.3   14.25

R Square Estimate: 0.666
MAD Based Scale Estimate: 7.695
Weighted Scale Estimate: 3.617
IQR Estimate of Residuals: 25.53
Det. of COV[Regression Coefficients] Matrix: 0.00416
Regression Table
Obs   Y Vector   Yhat    Residuals   Hat[i,i]   Res/Scale   Student Res   Wts[i,i]   Res Dist
1     6          11.99   -5.993      0.0827     -1.657      -1.73         0.678      1.657
2     8.5        11.99   -3.493      0.0827     -0.966      -1.008        0.883      0.966
3     8.5        12.35   -3.852      0.0758     -1.065      -1.108        0.859      1.065
4     10         12.35   -2.352      0.0758     -0.65       -0.676        0.946      0.65
5     12         13.07   -1.068      0.0638     -0.295      -0.305        0.989      0.295
6     40         13.07   26.93       0.0638     7.446       7.696         0          7.446
7     42.5       13.07   29.43       0.0638     8.138       8.41          0          8.138
8     45         13.43   31.57       0.0587     8.73        8.998         0          8.73
9     50         13.43   36.57       0.0587     10.11       10.42         0          10.11
10    13         13.07   -0.0681     0.0638     -0.0188     -0.0195       1          0.0188
11    14         13.07   0.932       0.0638     0.258       0.266         0.992      0.258
(The complete regression table is not shown.)

Final Weighted Correlation Matrix
      Y      X
Y     1      0.867
X     0.867  1

Eigenvalues of Final Weighted Correlation Matrix
Eval 1   Eval 2
0.133    1.867
Output for Biweight (Leverage ON) (continued).
[Plot: Biweight Regression - Residuals QQ Plot. Horizontal residual band lines at Indv(0.05) = +/-1.90; x-axis: Normal Quantiles.]
Output for Biweight (Leverage ON) (continued).
[Plot: Biweight Regression - Residuals vs Unsquared Leverage Distance Plot. Horizontal residual band lines at Indv-Res(0.05) = +/-1.90; vertical leverage cutoff lines at Indv-MD(0.05) and Max-MD(0.05); x-axis: Unsquared Leverage Distances.]
Interpretation of Graphs: Observations that fall outside the horizontal lines in the graphs are considered regression outliers. Observations to the right of the vertical lines are considered leverage outliers. The regression lines are produced since there is only one predictor variable.
Output for Biweight (Leverage OFF).
Regression Analysis Output

Date/Time of Computation: 3/5/2008 7:38:34 AM
User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\DEFINE
Full Precision: OFF
Selected Regression Method: Biweight
Residual Biweight Tuning Constant: 4 (Used to Identify Vertical Regression Outliers)
Number of Regression Iterations: 10 (Maximum Number if doesn't Converge)
Leverage: Off
Y vs Y-hat Plot: Not Selected
Y vs Residual Plot: Not Selected
Y-hat vs Residual Plot: Not Selected
Title for Y vs X Plots: Biweight Regression - Y vs X Plot
Title for Residual QQ Plot: Biweight Regression - Residuals QQ Plot
Residual Band Alpha: 0.05 (Used in Graphics Residual Bands)
Residual vs Distance Plot: Not Selected
Show Intermediate Results: Do Not Display Intermediate Results (Intermediate Results Shown on Another Output Sheet)

Number of Selected Regression Variables: 1
Number of Observations: 26
Dependent Variable: Y

Residual Values used with Graphics Display
Upper Residual Individual (0.05) MD: 1.903
Lower Residual Individual (0.05) MD: -1.903

Correlation Matrix
      Y      X
Y     1      0.218
X     0.218  1

Eigenvalues of Correlation Matrix
Eval 1   Eval 2
0.782    1.218
Output for Biweight (Leverage OFF) (continued).
Ordinary Least Squares (OLS) Regression Results

Estimates of Regression Parameters
Intercept   X
22.06       0.256

Stdv of Estimated Regression Parameters
Intercept   X
4.107       0.233

ANOVA Table
Source of Variation   SS      DOF   MS      F-Value   P-Value
Regression            224.9   1     224.9   1.202     0.2838
Error                 4490    24    187.1
Total                 4715    25

R Square Estimate: 0.0477
MAD Based Scale Estimate: 8.862
Weighted Scale Estimate: 13.68
IQR Estimate of Residuals: 25.79
Det. of COV[Regression Coefficients] Matrix: 0.391

Final Reweighted Regression Results

Estimates of Regression Parameters
Intercept   X
11.64       0.358

Stdv of Estimated Regression Parameters
Intercept   X
1.372       0.0696

ANOVA Table
Source of Variation   SS      DOF     MS      F-Value   P-Value
Regression            346     1       346     26.45     0.0002
Error                 173.3   13.25   13.08
Total                 519.3   14.25

R Square Estimate: 0.666
MAD Based Scale Estimate: 7.695
Weighted Scale Estimate: 3.617
IQR Estimate of Residuals: 25.53
Det. of COV[Regression Coefficients] Matrix: 0.00416
Output for Biweight (Leverage OFF) (continued).
Regression Table
Obs   Y Vector   Yhat    Residuals   Hat[i,i]   Res/Scale   Student Res   Wts[i,i]   Res Dist
1     6          11.99   -5.993      0.0827     -1.657      -1.73         0.678      1.657
2     8.5        11.99   -3.493      0.0827     -0.966      -1.008        0.883      0.966
3     8.5        12.35   -3.852      0.0758     -1.065      -1.108        0.859      1.065
4     10         12.35   -2.352      0.0758     -0.65       -0.676        0.946      0.65
5     12         13.07   -1.068      0.0638     -0.295      -0.305        0.989      0.295
6     40         13.07   26.93       0.0638     7.446       7.696         0          7.446
7     42.5       13.07   29.43       0.0638     8.138       8.41          0          8.138
8     45         13.43   31.57       0.0587     8.73        8.998         0          8.73
9     50         13.43   36.57       0.0587     10.11       10.42         0          10.11
10    13         13.07   -0.0681     0.0638     -0.0188     -0.0195       1          0.0188
11    14         13.07   0.932       0.0638     0.258       0.266         0.992      0.258
12    17         13.43   3.574       0.0587     0.988       1.018         0.879      0.988
13    17.4       13.43   3.974       0.0587     1.099       1.132         0.851      1.099
14    22         15.22   6.783       0.0417     1.875       1.916         0.599      1.875
15    24         15.22   8.783       0.0417     2.428       2.481         0.386      2.428
16    25         15.22   9.783       0.0417     2.705       2.763         0.281      2.705
17    42.5       18.8    23.7        0.0514     6.553       6.728         0          6.553
18    43         18.8    24.2        0.0514     6.691       6.87          0          6.691
19    44.1       19.52   24.58       0.0603     6.797       7.012         0          6.797
20    45.3       19.7    25.6        0.0629     7.079       7.313         0          7.079
21    20         22.38   -2.382      0.119      -0.659      -0.702        0.945      0.659
22    22         22.38   -0.382      0.119      -0.106      -0.113        0.999      0.106
23    21         22.74   -1.74       0.129      -0.481      -0.516        0.97       0.481
24    23         23.1    -0.0984     0.14       -0.0272     -0.0293       1          0.0272
25    22.5       22.81   -0.312      0.131      -0.0862     -0.0925      0.999      0.0862
26    24         23.1    0.902       0.14       0.249       0.269         0.992      0.249

Final Weighted Correlation Matrix
      Y      X
Y     1      0.867
X     0.867  1

Eigenvalues of Final Weighted Correlation Matrix
Eval 1   Eval 2
0.133    1.867
Output for Biweight (Leverage OFF) (continued).
[Plot: Biweight Regression - Residuals QQ Plot. Horizontal residual band lines at Indv(0.05) = +/-1.90; x-axis: Normal Quantiles.]
[Plot: Biweight Regression - Y vs X Plot, with the fitted regression lines overlaid.]
Interpretation of Graphs: Observations that fall outside the horizontal lines in the graph are considered regression outliers. The Leverage Distances vs. Standardized Residuals plot is not produced even if it is checked on. The regression lines are produced since there is only one predictor variable.
9.6 Huber Regression Method
1. Click Regression > Huber.
[Screenshot: Scout 4.0 with a data set open and the Regression menu expanded, showing OLS, LMS, Iterative OLS, Biweight, MVT, PROP, and Method Comparison; Huber is highlighted.]
2. The "Select Variables" screen (Section 3.3) will appear.
o Select the dependent variable and one or more independent variables from the "Select Variables" screen.
o If the results have to be produced by using a Group variable, then select a group variable by clicking the arrow below the "Group by Variable" button. This will result in a drop-down list of available variables. The user should select and click on an appropriate variable representing a group variable.
o Click on the "Options" button to get the options window.
[Screenshot: "Huber Options" window]
Regression Value (Residual Influence Function Alpha): 0.05
Number of Regression Iterations: 10 (Max = 50)
Residuals MDs Distribution: Beta / Chisquare
Intermediate Iterations: Do Not Display / Display Every 5th / Display Every 4th / Display Every 2nd / Display All
Identify Leverage Points: Leverage (checked)
Select Leverage Distance Method: Classical / Sequential Classical / Huber / PROP / MVT (Trimming)
Number of Leverage Iterations: 10 (Max = 50)
Leverage MDs Distribution: Beta / Chisquare
Initial Leverage Distances: Classical / Sequential Classical / Robust (Median, 1.48MAD) / OKG (Maronna Zamar) / KG (Not Orthogonalized) / MCD
Leverage Value(s) (Leverage Influence Function Alpha): 0.05
Display Intervals (checked); Confidence Coefficient: 0.95
Display Diagnostics (checked)
OK / Cancel
o Specify the "Regression Value." The default is "0.05.'
o Specify the "Number of Regression Iterations." The default
is"10 "
o Specify the "Regression MDs Distribution." The default is
"Beta."
o Specify the "Identify Leverage Points." The default is "On."
o Specify the "Select Leverage Distance Method." The default is
"PROP."
o Specify the "Number of Leverage Iterations." The default
is"10."
o Specify the "Leverage Initial Distances." The default is "OKG
(Maronna Zamar)."
o Specify the "Leverage Value." The default is "0.05."
o Click "OK" to continue or "Cancel" to cancel the options.
o Click on the "Graphics" button to get the options window.
[Screenshot: "OptionsRegressionGraphics" window]
XY Plots / Y vs Y-Hat / Y vs Residuals / Y-Hat vs Residuals / Residuals vs Leverage / QQ Residuals (checked as preferred)
XY Plot Title: Huber Regression - Y vs X Plot
Y vs Y-Hat Title: Huber Regression - Y vs Y-Hat Plot
Y vs Residuals Title: Huber Regression - Y vs Residuals
Y-Hat vs Residuals Title: Huber Regression - Y-Hat vs Residuals
Residuals vs Leverage Title: Huber Regression - Residuals vs Leverage
QQ Residuals Title: Huber Regression - Residuals QQ Plot
Regression Line - Fixing Other Regressors at: No Line / Minimum Values / Mean Values / Maximum Values / Zero Values
Confidence Interval; Prediction Interval (checked); Confidence Coefficient: 0.95
Graphics Distribution: Beta / Chisquare
Residual/Lev. Alpha: 0.05
OK / Cancel
o Specify the preferred plots and the input parameters.
o Click "OK" to continue or "Cancel" to cancel the options.
o Click "OK" to continue or "Cancel" to cancel the computations.
Output example: The data set "BRADU.xls" was used for Huber regression. It has 3
predictor variables (p) and 75 observations. When the "Leverage" option is on, the
leverage distances are calculated and outlying observations are obtained iteratively using
initial estimates as median and OKG matrix and the leverage option as PROP (i.e., using
PROP influence function). Then the weights are assigned to observations and those
weights are used in the finding the regression outliers iteratively. When the leverage
option is off, all observations are assigned one (1) as weights and then the regression
outliers are found using the Huber function iteratively. Finally, the estimated regression
parameters are calculated.
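The Huber weighting idea can be contrasted with the biweight used in Section 9.5: Huber downweights large scaled residuals but never gives them exactly zero weight, so extreme outliers retain some influence. The sketch below is a hypothetical illustration, not Scout source code; the cutoff k = 1.345 is a common textbook choice and an assumption here (Scout derives its cutoff from the "Residual Influence Function Alpha" of 0.05).

```python
# A minimal sketch of the Huber weight function (assumed cutoff k = 1.345).
import numpy as np

def huber_weights(scaled_resid, k=1.345):
    """Huber: weight 1 inside [-k, k], then k/|r| beyond the cutoff."""
    r = np.abs(scaled_resid)
    return np.where(r <= k, 1.0, k / np.maximum(r, 1e-12))

# Small residuals keep full weight; a residual of 10 keeps weight k/10.
print(huber_weights(np.array([0.0, 1.0, 2.0, 10.0])))
```

Note the design difference: with the biweight, a residual this extreme would receive weight exactly zero, while Huber's weight only decays like k/|r|.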
Output for Huber (Leverage ON).
Data Set Used: Bradu (predictor variables p = 3).
Regression Analysis Output

Date/Time of Computation: 3/5/2008 7:51:39 AM
User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision: OFF
Selected Regression Method: Huber
Residual Influence Function Alpha: 0.05 (Used to Identify Vertical Regression Outliers)
Number of Regression Iterations: 10 (Maximum Number if doesn't Converge)
Leverage: Identify Leverage Points (Outliers in X-Space)
Selected Leverage Method: PROP
Initial Leverage Distance Method: OKG (Maronna Zamar) Matrix
Squared MDs: Beta Distribution used for Leverage Distances based upon Selected Regression (Leverage) Variables
Leverage Distance Alpha: 0.05 (Used to Identify Leverage Points)
Number of Leverage Iterations: 10 (Maximum Number if doesn't Converge)
Y vs Y-hat Plot: Not Selected
Y vs Residual Plot: Not Selected
Y-hat vs Residual Plot: Not Selected
Title for Y vs X Plots: Huber Regression - Y vs X Plot
Title for Residual QQ Plot: Huber Regression - Residuals QQ Plot
Residual Band Alpha: 0.05 (Used in Graphics Residual Bands)
Title Residual vs Distance Plot: Huber Regression - Residuals vs Unsquared Leverage Distance Plot
Show Intermediate Results: Do Not Display Intermediate Results (Intermediate Results Shown on Another Output Sheet)
Leverage Points are Outliers in X-Space of Selected Regression Variables.

Number of Selected Regression Variables: 3
Number of Observations: 75
Dependent Variable: y

Residual Values used with Graphics Display
Upper Residual Individual (0.05) MD: 1.94
Lower Residual Individual (0.05) MD: -1.94

Correlation Matrix
       y      x1     x2     x3
y      1      0.946  0.962  0.743
x1     0.946  1      0.979  0.708
x2     0.962  0.979  1      0.757
x3     0.743  0.708  0.757  1
Output for Huber (Leverage ON) (continued).
Eigenvalues of Correlation Matrix
Eval 1   Eval 2   Eval 3   Eval 4
0.0172   0.0556   0.368    3.559

Ordinary Least Squares (OLS) Regression Results

Estimates of Regression Parameters
Intercept   x1      x2       x3
-0.388      0.239   -0.335   0.383

Stdv of Estimated Regression Parameters
Intercept   x1      x2      x3
0.416       0.262   0.155   0.129

ANOVA Table
Source of Variation   SS      DOF   MS      F-Value   P-Value
Regression            543.3   3     181.1   35.77     0.0000
Error                 359.5   71    5.063
Total                 902.8   74

R Square Estimate: 0.602
MAD Based Scale Estimate: 1.067
Weighted Scale Estimate: 2.25
IQR Estimate of Residuals: 1.468
Det. of COV[Regression Coefficients] Matrix: 5.5107E-8

Initial Weighted Regression Iteration with Identified Leverage Points

Estimates of Regression Parameters
Intercept   x1       x2       x3
-0.0105     0.0624   0.0119   -0.107

Stdv of Estimated Regression Parameters
Intercept   x1       x2       x3
0.197       0.0689   0.0684   0.0713
Output for Huber (Leverage ON) (continued).
ANOVA Table
Source of Variation   SS      DOF   MS      F-Value   P-Value
Regression            0.898   3     0.299   0.94      0.4272
Error                 18.14   57    0.318
Total                 19.04   60

R Square Estimate: 0.0472
MAD Based Scale Estimate: 0.902
Weighted Scale Estimate: 0.564
Unsquared Leverage Distance Indiv-MD(0.05): 2.743
IQR Estimate of Residuals: 1.236
Determinant of Leverage S Matrix: 1.357
Regression Table with Leverage Option
Obs   Y Vector   Yhat      Residuals   Hat[i,i]   Res/Scale   Student Res   Wts[i,i]    Res Dist   Lev Dist   OLS Res Dist
1     9.7        -2.174    11.87       0.063      21.05       21.74         2.364E-13   21.05      29.44      1.502
2     10.1       -2.265    12.36       0.0599     21.92       22.61         1.074E-13   21.92      30.21      1.775
3     10.3       -2.418    12.72       0.0857     22.54       23.58         1.886E-14   22.54      31.89      1.334
4     9.5        -2.528    12.03       0.0805     21.32       22.23         6.931E-15   21.32      32.86      1.138
5     10         -2.443    12.44       0.0729     22.06       22.91         1.266E-14   22.06      32.28      1.36
6     10         -2.217    12.22       0.0756     21.66       22.52         7.228E-14   21.66      30.59      1.527
7     10.8       -2.219    13.02       0.068      23.08       23.9          6.576E-14   23.08      30.68      2.006
8     10.3       -2.24     12.54       0.0631     22.23       22.97         1.634E-13   22.23      29.8       1.705
9     9.6        -2.475    12.07       0.08       21.4        22.32         1.768E-14   21.4       31.95      1.204
10    9.9        -2.437    12.34       0.0869     21.87       22.89         5.017E-14   21.87      30.94      1.35
11    -0.2       -2.782    2.582       0.0942     4.577       4.809         1.424E-16   4.577      36.64      3.48
12    -0.4       -2.946    2.546       0.144      4.513       4.877         3.684E-17   4.513      37.96      4.165
13    0.7        -2.589    3.289       0.109      5.83        6.177         1.069E-16   5.83       36.92      2.719
14    0.1        -2.556    2.656       0.564      4.708       7.127         1.478E-18   4.708      41.09      1.69
15    -0.4       0.0115    -0.412      0.0579     -0.73       -0.752        1           0.73       2.002      0.284
16    0.6        0.177     0.423       0.0759     0.75        0.78          1           0.75       2.165      0.385
17    -0.2       -0.0128   -0.187      0.0393     -0.332      -0.339        1           0.332      1.938      0.287
18    0          -0.0819   0.0819      0.0231     0.11        0.111         1           0.11       0.786      0.175
19    0.1        -0.0971   0.197       0.0312     0.349       0.355         1           0.349      1.287      0.29
20    0.4        -0.0119   0.412       0.0476     0.73        0.748         1           0.73       2.067      0.151
21    0.9        -0.0253   0.925       0.0294     1.64        1.665         1           1.64       1.059      0.299
22    0.3        -0.151    0.451       0.0457     0.799       0.818         1           0.799      1.746      0.415
23    -0.8       0.0561    -0.856      0.0293     -1.518      -1.54         1           1.518      1.163      0.19
(The complete regression table is not shown.)
Output for Huber (Leverage ON) (continued).
Final Reweighted Regression Results

Estimates of Regression Parameters
Intercept   x1      x2       x3
-0.413      0.237   -0.382   0.44

Stdv of Estimated Regression Parameters
Intercept   x1      x2      x3
0.378       0.238   0.142   0.118

ANOVA Table
Source of Variation   SS      DOF     MS      F-Value   P-Value
Regression            610     3       203.3   48.96     0.0000
Error                 290.9   70.05   4.153
Total                 900.9   73.05

R Square Estimate: 0.677
MAD Based Scale Estimate: 1.158
Weighted Scale Estimate: 2.038
IQR Estimate of Residuals: 1.579
Det. of COV[Regression Coefficients] Matrix: 2.8216E-8
Regression Table
Obs   Y Vector   Yhat    Residuals   Hat[i,i]   Res/Scale   Student Res   Wts[i,i]   Res Dist
1     9.7        6.944   2.756       0.063      1.352       1.397         1          1.352
2     10.1       6.722   3.378       0.0599     1.657       1.709         1          1.657
3     10.3       8.045   2.255       0.0857     1.106       1.157         1          1.106
4     9.5        7.667   1.833       0.0805     0.9         0.938         1          0.9
5     10         7.651   2.349       0.0729     1.153       1.197         1          1.153
6     10         7.201   2.799       0.0756     1.374       1.429         1          1.374
(The complete regression table is not shown.)

Final Weighted Correlation Matrix
       y      x1     x2     x3
y      1      0.94   0.957  0.813
x1     0.94   1      0.977  0.774
x2     0.957  0.977  1      0.833
x3     0.813  0.774  0.833  1

Eigenvalues of Final Weighted Correlation Matrix
Eval 1   Eval 2   Eval 3   Eval 4
0.0165   0.0618   0.27     3.652
Output for Huber (Leverage ON) (continued).
[Plot: Huber Regression - Residuals QQ Plot. Horizontal residual band lines at Indv(0.05) = +/-1.94; x-axis: Normal Quantiles.]
[Plot: Huber Regression - Residuals vs Unsquared Distance Plot. Horizontal residual band lines at Indv-Res(0.05) = +/-1.94; vertical leverage cutoff lines at Indv-MD(0.05) = 2.74 and Max-MD(0.05) = 3.94; x-axis: Unsquared Leverage Distances.]
Output for Huber (Leverage ON) (continued).
[Plot: Huber Regression - Y vs X Plot (y vs x1).]
Interpretation of Graphs: Observations that fall outside the horizontal lines in the graphs are considered regression outliers. Observations to the right of the vertical lines are considered leverage outliers. Regression lines are not produced since there are three predictor variables. Select other "X" variables by using the drop-down bar in the graphics panel and click on "Redraw."
Output for Huber (Leverage OFF).
Regression Analysis Output

Date/Time of Computation: 3/5/2008 8:15:51 AM
User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision: OFF
Selected Regression Method: Huber
Residual Influence Function Alpha: 0.05 (Used to Identify Vertical Regression Outliers)
Number of Regression Iterations: 10 (Maximum Number if doesn't Converge)
Leverage: Off
Y vs Y-hat Plot: Not Selected
Y vs Residual Plot: Not Selected
Y-hat vs Residual Plot: Not Selected
Title for Y vs X Plots: Huber Regression - Y vs X Plot
Title for Residual QQ Plot: Huber Regression - Residuals QQ Plot
Residual Band Alpha: 0.05 (Used in Graphics Residual Bands)
Title Residual vs Distance Plot: Huber Regression - Residuals vs Unsquared Leverage Distance Plot
Show Intermediate Results: Do Not Display Intermediate Results (Intermediate Results Shown on Another Output Sheet)

Number of Selected Regression Variables: 3
Number of Observations: 75
Dependent Variable: y

Residual Values used with Graphics Display
Upper Residual Individual (0.05) MD: 1.94
Lower Residual Individual (0.05) MD: -1.94

Correlation Matrix
       y      x1     x2     x3
y      1      0.946  0.962  0.743
x1     0.946  1      0.979  0.708
x2     0.962  0.979  1      0.757
x3     0.743  0.708  0.757  1

Eigenvalues of Correlation Matrix
Eval 1   Eval 2   Eval 3   Eval 4
0.0172   0.0556   0.368    3.559

Ordinary Least Squares (OLS) Regression Results

Estimates of Regression Parameters
Intercept   x1      x2       x3
-0.388      0.239   -0.335   0.383

Stdv of Estimated Regression Parameters
Intercept   x1      x2      x3
0.416       0.262   0.155   0.129

ANOVA Table
Source of Variation   SS      DOF   MS      F-Value   P-Value
Regression            543.3   3     181.1   35.77     0.0000
Error                 359.5   71    5.063
Total                 902.8   74
Output for Huber (Leverage OFF) (continued).
R Square Estimate: 0.602
MAD Based Scale Estimate: 1.067
Weighted Scale Estimate: 2.25
IQR Estimate of Residuals: 1.468
Det. of COV[Regression Coefficients] Matrix: 5.5107E-8

Final Reweighted Regression Results

Estimates of Regression Parameters
Intercept   x1      x2       x3
-0.413      0.237   -0.382   0.44

Stdv of Estimated Regression Parameters
Intercept   x1      x2      x3
0.378       0.238   0.142   0.118

ANOVA Table
Source of Variation   SS      DOF     MS      F-Value   P-Value
Regression            610     3       203.3   48.96     0.0000
Error                 290.9   70.05   4.153
Total                 900.9   73.05

R Square Estimate: 0.677
MAD Based Scale Estimate: 1.158
Weighted Scale Estimate: 2.038
IQR Estimate of Residuals: 1.579
Det. of COV[Regression Coefficients] Matrix: 2.8216E-8
Regression Table
Obs   Y Vector   Yhat    Residuals   Hat[i,i]   Res/Scale   Student Res   Wts[i,i]   Res Dist
1     9.7        6.944   2.756       0.063      1.352       1.397         1          1.352
2     10.1       6.722   3.378       0.0599     1.657       1.709         1          1.657
3     10.3       8.045   2.255       0.0857     1.106       1.157         1          1.106
4     9.5        7.667   1.833       0.0805     0.9         0.938         1          0.9
5     10         7.651   2.349       0.0729     1.153       1.197         1          1.153
6     10         7.201   2.799       0.0756     1.374       1.429         1          1.374
7     10.8       6.895   3.905       0.068      1.916       1.985         1          1.916
(The complete regression table is not shown.)

Final Weighted Correlation Matrix
       y      x1     x2     x3
y      1      0.94   0.957  0.813
x1     0.94   1      0.977  0.774
x2     0.957  0.977  1      0.833
x3     0.813  0.774  0.833  1

Eigenvalues of Final Weighted Correlation Matrix
Eval 1   Eval 2   Eval 3   Eval 4
0.0165   0.0618   0.27     3.652
Output for Huber (Leverage OFF) (continued).
[Plot: Huber Regression - Residuals QQ Plot. Horizontal residual band lines at Indv(0.05) = +/-1.94; x-axis: Normal Quantiles.]
[Plot: Huber Regression - Y vs X Plot (y vs x1).]
Interpretation of Graphs: Observations which are outside of the horizontal lines in the graph are
considered to be regression outliers. The Leverage Distances vs. Standardized residuals plot is not
produced even if checked on. Regression lines are not produced since there are three predictor variables.
Select other "X" variables by using the drop-down bar in the graphics panel and click on "Redraw."
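The horizontal lines on these plots are just a cutoff on scaled residuals. The idea can be sketched as below; this is illustrative only, not Scout's actual Huber procedure, which iterates the weights and takes its band from a Beta or chi-square distribution. The function name, the plain OLS fit, and the fixed normal-quantile cutoff are assumptions for the sketch.

```python
import numpy as np

def flag_regression_outliers(X, y, cutoff=1.96):
    """Flag observations whose scaled residuals fall outside +/- cutoff.

    Illustrative sketch: plain OLS fit, MAD-based scale, and a fixed
    normal-quantile band stand in for Scout's iterated Huber scheme.
    """
    X1 = np.column_stack([np.ones(len(y)), X])     # add intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # OLS coefficients
    res = y - X1 @ beta                            # raw residuals
    scale = 1.4826 * np.median(np.abs(res - np.median(res)))  # MAD scale
    return np.where(np.abs(res / scale) > cutoff)[0]  # indices outside the band
```

Observations whose indices are returned correspond to the points plotted outside the horizontal band.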
410
-------
9.7 MVT Regression Method
1. Click Regression > MVT.
[Screenshot: Scout main window with the data file open and the Regression menu expanded, showing the options OLS, LMS, Iterative OLS, Biweight, Huber, MVT, PROP, and Method Comparison.]
2. The "Select Variables" screen (Section 3.3) will appear.
o Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.
o Click on the "Options" button to get the options window.
[Screenshot: MVT Options window. Regression Values: Residual Trim Percent (0.1) and Alpha for Residual Outliers (0.05); Number of Regression Iterations (10, Max = 50); Residuals MDs Distribution (Beta or Chisquare); Intermediate Iterations (Do Not Display, Display Every 5th, Display Every 4th, Display Every 2nd, Display All); Identify Leverage Points (Leverage); Select Leverage Distance Method (Classical, Sequential Classical, Huber, PROP, MVT (Trimming)); Number of Leverage Iterations (10, Max = 50); Leverage MDs Distribution (Beta or Chisquare); Initial Leverage Distances (Classical, Sequential Classical, Robust (Median, 1.48MAD), OKG (Maronna Zamar), KG (Not Orthogonalized), MCD); Leverage Value(s) (0.05, the Leverage Influence Function Alpha); Display Intervals with Confidence Coefficient (0.95); Display Diagnostics; OK and Cancel buttons.]
o Specify the "Regression Value." The default is "0.05."
411
-------
o Specify the "Number of Regression Iterations." The default is "10."
o Specify the "Regression MDs Distribution." The default is "Beta."
o Specify the "Identify Leverage Points." The default is "On."
o Specify the "Select Leverage Distance Method." The default is "PROP."
o Specify the "Number of Leverage Iterations." The default is "10."
o Specify the "Leverage Initial Distances." The default is "OKG (Maronna Zamar)."
o Specify the "Leverage Value." The default is "0.05."
o Click "OK" to continue or "Cancel" to cancel the options.
o Click on the "Graphics" button to get the options window.
[Screenshot: Regression Graphics options window. Check boxes for XY Plots, Y vs Y-Hat, Y vs Residuals, Y-Hat vs Residuals, Residuals vs Leverage, and QQ Residuals, each with an editable title (e.g., "MVT Regression - Y vs X Plot"); Regression Line - Fixing Other Regressors at (No Line, Minimum Values, Mean Values, Maximum Values, Zero Values); Confidence Interval and Prediction Interval check boxes; Confidence Coefficient (0.95); Graphics Distribution (Beta or Chisquare); Residual/Lev. Alpha (0.05); OK and Cancel buttons.]
o Specify the preferred plots and the input parameters.
o Click "OK" to continue or "Cancel" to cancel the options.
o Click "OK" to continue or "Cancel" to cancel the computations.
412
-------
Output example: The data set "STACKLOSS.xls" was used for MVT regression. It has
3 predictor variables (p) and 21 observations. When the "Leverage" option is on, the
leverage distances are calculated and outlying observations are obtained iteratively,
using the median and the OKG matrix as initial estimates and PROP as the leverage
option (i.e., using the PROP influence function). Weights are then assigned to the
observations, and those weights are used in finding the regression outliers iteratively.
When the leverage option is off, all observations are assigned weights of one (1), and
the regression outliers are then found iteratively using the trimming percentage and a
critical alpha. Finally, the estimated regression parameters are calculated.
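The leverage-off procedure just described (unit weights, then iterative trimming of the worst residuals, then final estimates) can be sketched roughly as follows. This is a stand-in, not Scout's exact MVT algorithm, which trims on Mahalanobis distances of the residuals and can fold in the PROP-weighted leverage distances; the function name and the plain absolute-residual trimming rule are assumptions.

```python
import numpy as np

def mvt_regression(X, y, trim=0.1, iters=10):
    """Rough sketch of trimming-based robust regression.

    Refits weighted least squares after zero-weighting the trimmed
    fraction of observations with the largest absolute residuals.
    """
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])   # design matrix with intercept
    w = np.ones(n)                          # leverage off: all weights start at 1
    for _ in range(iters):
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)
        res = np.abs(y - X1 @ beta)         # absolute residuals
        k = max(1, int(round(trim * n)))    # how many observations to trim
        order = np.argsort(res)             # ascending by |residual|
        w = np.zeros(n)
        w[order[: n - k]] = 1.0             # keep all but the k largest
    return beta, w
```

The returned weights show which observations were trimmed on the final pass; the coefficients come from the remaining, untrimmed points.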
413
-------
Output for MVT (Leverage ON).
Data Set Used: Stackloss (predictor variables p = 3).
Regression Analysis Output

Date/Time of Computation: 3/5/2008 8:22:37 AM
User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\STACKLOSS
Full Precision: OFF
Selected Regression Method: Multivariate Trimming (MVT)
Residual MVT Trimming Percentage: 0.1 (Used to Identify Vertical Regression Outliers)
Alpha for Residual Outliers: 0.05 (Planned Future Modification; Used to Compare Residual MVT MDs)
Number of Regression Iterations: 10 (Maximum Number if doesn't Converge)
Leverage: Identify Leverage Points (Outliers in X-Space)
Selected Leverage Method: PROP
Initial Leverage Distance Method: OKG (Maronna Zamar) Matrix
Squared MDs: Beta Distribution used for Leverage Distances based upon Selected Regression (Leverage) Variables
Leverage Distance Alpha: 0.05 (Used to Identify Leverage Points)
Number of Leverage Iterations: 10 (Maximum Number if doesn't Converge)
Y vs Y-hat Plot: Not Selected
Y vs Residual Plot: Not Selected
Y-hat vs Residual Plot: Not Selected
Title For Y vs X Plots: MVT Regression - Y vs X Plot
Title for Residual QQ Plot: MVT Regression - Residuals QQ Plot
Residual Band Alpha: 0.05 (Used in Graphics Residual Bands)
Title Residual vs Distance Plot: MVT Regression - Residuals vs Unsquared Leverage Distance Plot
Show Intermediate Results: Do Not Display Intermediate Results (Intermediate Results Shown on Another Output Sheet)

Leverage Points are Outliers in X-Space of Selected Regression Variables.

Number of Selected Regression Variables: 3
Number of Observations: 21
Dependent Variable: Stack-Loss

Residual Values used with Graphics Display
Upper Residual Individual (0.05) MD: 1.889
Lower Residual Individual (0.05) MD: -1.889

Correlation Matrix
              Stack-Loss   Air-Flow   Temp    Acid-Conc
  Stack-Loss  1            0.782      0.5     0.92
  Air-Flow    0.782        1          0.391   0.878
  Temp        0.5          0.391      1       0.4
  Acid-Conc   0.92         0.878      0.4     1

Eigenvalues of Correlation Matrix
  Eval 1    Eval 2    Eval 3    Eval 4
  0.0532    0.215     0.734     2.997
414
-------
Output for MVT (Leverage ON) (continued).

Ordinary Least Squares (OLS) Regression Results

Estimates of Regression Parameters
              Intercept   Air-Flow   Temp    Acid-Conc
              -39.92      0.716      1.295   -0.152

Stdv of Estimated Regression Parameters
              Intercept   Air-Flow   Temp    Acid-Conc
              11.9        0.135      0.368   0.156

ANOVA Table
  Source of Variation    SS      DOF   MS      F-Value   P-Value
  Regression             1890    3     630.1   59.9      0.0000
  Error                  178.8   17    10.52
  Total                  2069    20

  R Square Estimate                              0.914
  MAD Based Scale Estimate                       2.768
  Weighted Scale Estimate                        3.243
  IQR Estimate of Residuals                      4.313
  Det. of COV[Regression Coefficients] Matrix    1.0370E-5

Initial Weighted Regression Iteration with Identified Leverage Points

Estimates of Regression Parameters
              Intercept   Air-Flow   Temp    Acid-Conc
              -39.54      0.709      1.291   -0.151

Stdv of Estimated Regression Parameters
              Intercept   Air-Flow   Temp    Acid-Conc
              12.1        0.143      0.373   0.162

ANOVA Table
  Source of Variation    SS      DOF     MS      F-Value   P-Value
  Regression             1421    3       473.5   44.06     0.0000
  Error                  172.2   16.03   10.75
  Total                  1593    19.03

  R Square Estimate                              0.892
  MAD Based Scale Estimate                       2.738
  Weighted Scale Estimate                        3.278
  Unsquared Leverage Distance Indiv-MD(0.05)     2.619
  IQR Estimate of Residuals                      4.169
  Determinant of Leverage S Matrix               5200

Regression Table with Leverage Option
  Obs  Y Vector  Yhat    Residuals  Hat[i,i]  Res/Scale  Student Res  Wts[i,i]  Res Dist  Lev Dist  OLS Res Dist
  1    42        38.58   3.417      0.302     1.042      1.247        0.562     1.042     2.931     0.997
  2    37        38.73   -1.734     0.318     -0.529     -0.641       0.497     0.529     3.02      0.591
  3    37        32.31   4.694      0.175     1.432      1.576        1         1.432     2.073     1.405
(The complete regression table is not shown.)
-------
Output for MVT (Leverage ON) (continued).

Final Reweighted Regression Results

Estimates of Regression Parameters
              Intercept   Air-Flow   Temp    Acid-Conc
              -42.45      0.857      0.556   -0.109

Stdv of Estimated Regression Parameters
              Intercept   Air-Flow   Temp    Acid-Conc
              7.385       0.0945     0.264   0.0968

ANOVA Table
  Source of Variation    SS      DOF   MS      F-Value   P-Value
  Regression             1890    3     630     158.1     0.0000
  Error                  59.78   15    3.986
  Total                  1950    18

  R Square Estimate                              0.969
  MAD Based Scale Estimate                       2.069
  Weighted Scale Estimate                        1.996
  IQR Estimate of Residuals                      2.995
  Det. of COV[Regression Coefficients] Matrix    3.4468E-7

Regression Table
  Obs  Y Vector  Yhat    Residuals  Hat[i,i]  Res/Scale  Student Res  Wts[i,i]  Res Dist
  1    42        39.4    2.604      0.302     1.305      1.561        1         1.305
  2    37        39.5    -2.504     0.318     -1.254     -1.519       1         1.254
  3    37        33.39   3.607      0.175     1.807      1.989        1         1.807
  4    28        20.73   7.273      0.129     3.643      3.902        0         3.643
  5    18        19.62   -1.616     0.0522    -0.81      -0.832       1         0.81
  6    18        20.17   -2.172     0.0775    -1.088     -1.133       1         1.088
(The complete regression table is not shown.)
Final Weighted Correlation Matrix
              Stack-Loss   Air-Flow   Temp    Acid-Conc
  Stack-Loss  1            0.836      0.474   0.979
  Air-Flow    0.836        1          0.418   0.869
  Temp        0.474        0.418      1       0.423
  Acid-Conc   0.979        0.869      0.423   1

Eigenvalues of Final Weighted Correlation Matrix
  Eval 1    Eval 2    Eval 3    Eval 4
  0.0169    0.19      0.724     3.069
416
-------
Output for MVT (Leverage ON) (continued).
[Graph: MVT Regression - Residuals QQ Plot — standardized residuals vs. normal quantiles, with horizontal Indv(0.05) bands.]

[Graph: MVT Regression - Residuals vs Unsquared Distance Plot — residuals vs. unsquared leverage distances, with horizontal Indiv-Res(0.05) bands at ±1.89 and vertical lines at Indiv-MD(0.05) = +2.62 and Max-MD(0.05) = +3.27.]
417
-------
Output for MVT (Leverage ON) (continued).
[Graph: MVT Regression - Y vs X Plot — Stack-Loss vs. Air-Flow scatter plot.]
Interpretation of Graphs: Observations which are outside of the horizontal lines in the graphs are
considered to be regression outliers. The observations to the right of the vertical lines are considered to be
leverage outliers. The regression lines are not produced since there are three predictor variables. Select
other "X" variables by using the drop-down bar in the graphics panel and click on "Redraw."
418
-------
Output for MVT (Leverage OFF).

Regression Analysis Output

Date/Time of Computation: 11/10/2008 8:47:07 AM
User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\STACKLOSS
Full Precision: OFF
Selected Regression Method: Multivariate Trimming (MVT)
MVT Trim Percentage: 0.05 (Used to Identify Regression Outliers)
Number of Regression Iterations: 10 (Maximum Number if doesn't Converge)
Leverage: Off
Res-Lev Rectangle Alpha: 0.05 (Used with Graphics Confidence Bands)
Title for Residual QQ Plot: MVT Regression - Residuals QQ Plot
Title Residual vs Distance Plot: MVT Regression - Residuals vs Unsquared Distance Plot
Title For Y vs X Plot(s): MVT Regression - Y vs X Plot
Residual QQ Plot: Not Selected
Y vs Residual Plot: Not Selected
Y-hat vs Residual Plot: Not Selected
Show Intermediate Results: Do Not Display Intermediate Results (Intermediate Results Shown on Another Output Sheet)

Number of Selected Regression Variables: 3
Number of Observations: 21
Dependent Variable: Stack-Loss

Ordinary Least Squares (OLS) Regression Results

Regression Parameters Vector Estimates
              Intercept   Air-Flow   Temp    Acid-Conc
              -39.92      0.716      1.295   -0.152

Stdv of Regression Estimates Vector
              Intercept   Air-Flow   Temp    Acid-Conc
              11.9        0.135      0.368   0.156

ANOVA Table
  Source of Variation    SS      DOF   MS      F-Value   P-Value
  Regression             1890    3     630.1   59.9      0.0000
  Error                  178.8   17    10.52
  Total                  2069    20
419
-------
Output for MVT (Leverage OFF) (continued).
  R Square Estimate                              0.914
  MAD Based Scale Estimate                       2.768
  Weighted Scale Estimate                        3.243
  IQR Estimate of Residuals                      4.313
  Det. of COV[Regression Coefficients] Matrix    1.0370E-5

Results From the Regression Operation

Regression Parameters Vector Estimates
              Intercept   Air-Flow   Temp    Acid-Conc
              -43.7       0.889      0.817   -0.107

Stdv of Regression Estimates Vector
              Intercept   Air-Flow   Temp    Acid-Conc
              9.432       0.119      0.325   0.125

ANOVA Table
  Source of Variation    SS      DOF   MS      F-Value   P-Value
  Regression             1957    3     652.3   98.82     0.0000
  Error                  105.6   16    6.601
  Total                  2063    19

  R Square Estimate                              0.949
  MAD Based Scale Estimate                       3.046
  Weighted Scale Estimate                        2.569
  IQR Estimate of Residuals                      3.365
  Det. of COV[Regression Coefficients] Matrix    2.2471E-6
Regression Table
  Obs  Y Vector  Yhat    Residuals  Hat[i,i]  Res/Scale  Student Res  Wts[i,i]  Res Dist
  1    42        39.94   2.062      0.302     0.803      0.96         1         0.803
  2    37        40.04   -3.045     0.318     -1.185     -1.435       1         1.185
  3    37        33.75   3.248      0.175     1.264      1.392        1         1.264
  4    28        21.7    6.302      0.129     2.453      2.627        1         2.453
  5    18        20.07   -2.065     0.0522    -0.804     -0.826       1         0.804
  6    18        20.88   -2.882     0.0775    -1.122     -1.168       1         1.122
  7    19        21.06   -2.055     0.219     -0.8       -0.905       1         0.8
  8    20        21.06   -1.055     0.219     -0.411     -0.465       1         0.411
  9    15        17.33   -2.325     0.14      -0.905     -0.976       1         0.905
(The complete regression table is not shown.)
420
-------
Output for MVT (Leverage OFF) (continued).
[Graph: MVT Regression - Residuals QQ Plot — standardized residuals vs. normal quantiles, with horizontal Indv(0.05) bands.]
-------
Output for MVT (Leverage OFF) (continued).
[Graph: MVT Regression - Y vs X Plot — Stack-Loss vs. Air-Flow scatter plot.]
Interpretation of Graphs: Observations which are outside of the horizontal lines in the graph are
considered to be regression outliers. The Leverage Distances vs. Standardized residuals plot is not
produced even if checked on. Regression lines are not produced since there are three predictor variables.
Select other "X" variables by using the drop-down bar in the graphics panel and click on "Redraw."
Note: There are at least four regression outliers (1, 3, 4, and 21) in the data set of size 21 on the previous
page. However, the trimming percentage selected is only 5%, which is equivalent to one outlier in a data
set of size 21. The user may want to use the MVT method with a higher trimming percentage to identify all
of the outliers.
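The note's arithmetic (how many observations a given trimming percentage can actually remove from n points) is easy to check directly; `max_trimmed` below is a hypothetical helper for illustration, not a Scout function.

```python
# With n = 21, a 5% trim covers only one observation, so at least a 20%
# trim would be needed to catch four outliers.
def max_trimmed(n, trim_pct):
    """Largest whole number of observations a trim percentage can remove."""
    return int(trim_pct * n)

print(max_trimmed(21, 0.05))  # 1
print(max_trimmed(21, 0.20))  # 4
```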
9.8 PROP Regression Method
1. Click Regression > PROP.
[Screenshot: Scout main window with the data file STARCLS open and the Regression menu expanded, showing the options OLS, LMS, Iterative OLS, Biweight, Huber, MVT, PROP, and Method Comparison.]
2. The "Select Variables" screen (Section 3.3) will appear.
422
-------
o Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.
o Click on the "Options" button to get the options window.
[Screenshot: PROP Options window. Regression Value: Residual Influence Function Alpha (0.05); Number of Regression Iterations (10, Max = 50); Residuals MDs Distribution (Beta or Chisquare); Intermediate Iterations (Do Not Display, Display Every 5th, Display Every 4th, Display Every 2nd, Display All); Identify Leverage Points (Leverage); Select Leverage Distance Method (Classical, Sequential Classical, Huber, PROP, MVT (Trimming)); Number of Leverage Iterations (10, Max = 50); Leverage MDs Distribution (Beta or Chisquare); Initial Leverage Distances (Classical, Sequential Classical, Robust (Median, 1.48MAD), OKG (Maronna Zamar), KG (Not Orthogonalized), MCD); Leverage Value(s) (the Leverage Influence Function Alpha); Display Intervals with Confidence Coefficient (0.95); Display Diagnostics; OK and Cancel buttons.]
o Specify the "Regression Value." The default is "0.05."
o Specify the "Number of Regression Iterations." The default is "10."
o Specify the "Regression MDs Distribution." The default is "Beta."
o Specify the "Identify Leverage Points." The default is "On."
o Specify the "Select Leverage Distance Method." The default is "PROP."
o Specify the "Number of Leverage Iterations." The default is "10."
o Specify the "Leverage Initial Distances." The default is "OKG (Maronna Zamar)."
423
-------
o Specify the "Leverage Value." The default is "0.05."
o Click "OK" to continue or "Cancel" to cancel the options.
o Click on the "Graphics" button to get the options window.
[Screenshot: Regression Graphics options window with all plots checked (XY Plots, Y vs Y-Hat, Y vs Residuals, Y-Hat vs Residuals, Residuals vs Leverage, QQ Residuals) and PROP titles (e.g., "PROP Regression - Y vs X Plot"); Regression Line - Fixing Other Regressors at (No Line, Minimum Values, Mean Values, Maximum Values, Zero Values); Confidence Interval and Prediction Interval check boxes; Confidence Coefficient (0.95); Graphics Distribution (Beta or Chisquare); Residual/Lev. Alpha (0.05); OK and Cancel buttons.]
o Specify the preferred plots and the input parameters.
o Click "OK" to continue or "Cancel" to cancel the options.
o Click "OK" to continue or "Cancel" to cancel the computations.
Output example: The data set "STARCLS.xls" was used for PROP regression. It has
one predictor variable (p = 1) and 47 observations. When the "Leverage" option is on, the
leverage distances are calculated and outlying observations are obtained iteratively,
using the median and the OKG matrix as initial estimates and PROP as the leverage
option (i.e., using the PROP influence function). Weights are then assigned to the
observations, and those weights are used in finding the regression outliers iteratively.
When the leverage option is off, all observations are assigned weights of one (1), and
the regression outliers are then found iteratively using the PROP function. Finally, the
estimated regression parameters are calculated.
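The iterative scheme just described (fit, scale the residuals, down-weight suspect observations, refit) can be sketched as below. Scout's actual PROP influence function and its Beta-distribution cutoffs are not reproduced here; the simple redescending weights and the MAD scale are stand-in assumptions, and the default cutoff merely echoes the Indv(0.05) value of 1.93 seen in the output that follows.

```python
import numpy as np

def reweighted_regression(X, y, cutoff=1.93, iters=10):
    """Iteratively reweighted regression sketch (not Scout's exact PROP)."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])  # design matrix with intercept
    w = np.ones(n)
    for _ in range(iters):
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)
        res = y - X1 @ beta
        scale = max(1.4826 * np.median(np.abs(res)), 1e-8)  # MAD-type scale
        d = np.abs(res) / scale                             # scaled distances
        w = np.where(d <= cutoff, 1.0, (cutoff / d) ** 2)   # down-weight outliers
    return beta, w
```

Observations that end up with small weights are the ones a robust method effectively sets aside, which is why the final coefficients can differ sharply from OLS when outliers are present.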
424
-------
Output for PROP (Leverage ON).
Data Set Used: Star Cluster (predictor variables p = 1).
Regression Analysis Output

Date/Time of Computation: 3/12/2008 8:09:44 AM
User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\STARCLS
Full Precision: OFF
Selected Regression Method: PROP
Residual Influence Function Alpha: 0.05 (Used to Identify Vertical Regression Outliers)
Number of Regression Iterations: 10 (Maximum Number if doesn't Converge)
Leverage: Identify Leverage Points (Outliers in X-Space)
Selected Leverage Method: PROP
Initial Leverage Distance Method: OKG (Maronna Zamar) Matrix
Squared MDs: Beta Distribution used for Leverage Distances based upon Selected Regression (Leverage) Variables
Leverage Distance Alpha: 0.05 (Used to Identify Leverage Points)
Number of Leverage Iterations: 10 (Maximum Number if doesn't Converge)
Y vs Y-hat Plot: Not Selected
Y vs Residual Plot: Not Selected
Y-hat vs Residual Plot: Not Selected
Title For Y vs X Plots: PROP Regression - Y vs X Plot
Title for Residual QQ Plot: PROP Regression - Residuals QQ Plot
Residual Band Alpha: 0.05 (Used in Graphics Residual Bands)
Title Residual vs Distance Plot: PROP Regression - Residuals vs Unsquared Leverage Distance Plot
Show Intermediate Results: Do Not Display Intermediate Results (Intermediate Results Shown on Another Output Sheet)

Leverage Points are Outliers in X-Space of Selected Regression Variables.

Number of Selected Regression Variables: 1
Number of Observations: 47
Dependent Variable: y

Residual Values used with Graphics Display
Upper Residual Individual (0.05) MD: 1.929
Lower Residual Individual (0.05) MD: -1.929

Correlation Matrix
       y       x
  y    1       -0.21
  x    -0.21   1
425
-------
Output for PROP (Leverage ON) (continued).
Eigenvalues of Correlation Matrix
  Eval 1    Eval 2
  0.79      1.21

Ordinary Least Squares (OLS) Regression Results

Estimates of Regression Parameters
              Intercept   x
              6.793       -0.413

Stdv of Estimated Regression Parameters
              Intercept   x
              1.237       0.286

ANOVA Table
  Source of Variation    SS      DOF   MS      F-Value   P-Value
  Regression             0.665   1     0.665   2.085     0.1557
  Error                  14.35   45    0.319
  Total                  15.01   46

  R Square Estimate                              0.0443
  MAD Based Scale Estimate                       0.651
  Weighted Scale Estimate                        0.565
  IQR Estimate of Residuals                      1.025
  Det. of COV[Regression Coefficients] Matrix    5.5584E-4

Initial Weighted Regression Iteration with Identified Leverage Points

Estimates of Regression Parameters
              Intercept   x
              -7.97       2.93

Stdv of Estimated Regression Parameters
              Intercept   x
              2.396       0.543
(The complete regression table is not shown.)
426
-------
Output for PROP (Leverage ON) (continued).
ANOVA Table
  Source of Variation    SS      DOF     MS      F-Value   P-Value
  Regression             4.205   1       4.205   29.09     0.0000
  Error                  5.647   39.06   0.145
  Total                  9.852   40.06

  R Square Estimate                              0.427
  MAD Based Scale Estimate                       0.45
  Weighted Scale Estimate                        0.38
  Unsquared Leverage Distance Indiv-MD(0.05)     1.929
  IQR Estimate of Residuals                      0.605
  Determinant of Leverage S Matrix               0.0118

Regression Table with Leverage Option
  Obs  Y Vector  Yhat    Residuals  Hat[i,i]  Res/Scale  Student Res  Wts[i,i]    Res Dist  Lev Dist  OLS Res Dist
  1    5.23      4.836   0.394      0.0222    1.037      1.049        1           1.037     0.348     0.43
  2    5.74      5.393   0.347      0.0373    0.914      0.931        1           0.914     1.405     1.472
  3    4.93      4.513   0.417      0.0219    1.096      1.108        1           1.096     1.362     0.182
  4    5.74      5.393   0.347      0.0373    0.914      0.931        1           0.914     1.405     1.472
  5    5.19      4.631   0.559      0.0213    1.471      1.487        1           1.471     0.993     0.308
  6    5.46      5.1     0.36       0.0271    0.948      0.961        1           0.948     0.483     0.903
  7    4.65      3.283   1.367      0.0781    3.596      3.745        0.0135      3.596     5.237     0.985
  8    5.27      5.422   -0.152     0.0387    -0.399     -0.407       1           0.399     1.497     0.647
  9    5.57      4.513   1.057      0.0219    2.779      2.81         1           2.779     1.362     0.951
  10   5.12      4.836   0.284      0.0222    0.748      0.756        1           0.748     0.348     0.235
  11   5.73      2.257   3.473      0.194     9.134      10.17        3.3033E-4   9.134     8.465     0.671
  12   5.45      5.012   0.438      0.025     1.153      1.168        1           1.153     0.206     0.863
  13   5.42      5.158   0.262      0.0287    0.689      0.699        1           0.689     0.667     0.847
  14   4.05      3.781   0.269      0.0444    0.708      0.724        0.0923      0.708     3.668     1.924
  15   4.26      4.601   -0.341     0.0214    -0.898     -0.908       1           0.898     1.086     1.347
  16   4.58      4.982   -0.402     0.0244    -1.058     -1.071       1           1.058     0.114     0.685
  17   3.94      4.426   -0.486     0.0229    -1.277     -1.292       1           1.277     1.639     1.957
  18   4.18      4.982   -0.802     0.0244    -2.11      -2.136       1           2.11      0.114     1.393
  19   4.18      4.426   -0.246     0.0229    -0.646     -0.653       1           0.646     1.639     1.532
  20   5.89      2.257   3.633      0.194     9.555      10.64        3.3033E-4   9.555     8.465     0.955
  21   4.38      4.601   -0.221     0.0214    -0.582     -0.589       1           0.582     1.086     1.134
  22   4.22      4.601   -0.381     0.0214    -1.003     -1.014       1           1.003     1.086     1.418
  23   4.42      4.982   -0.562     0.0244    -1.479     -1.497       1           1.479     0.114     0.968
(The complete regression table is not shown.)
427
-------
Output for PROP (Leverage ON) (continued).
Final Reweighted Regression Results

Estimates of Regression Parameters
              Intercept   x
              -7.955      2.926

Stdv of Estimated Regression Parameters
              Intercept   x
              1.911       0.434

ANOVA Table
  Source of Variation    SS      DOF     MS      F-Value   P-Value
  Regression             5.401   1       5.401   45.44     0.0000
  Error                  4.614   38.83   0.119
  Total                  10.01   39.83

  R Square Estimate                              0.539
  MAD Based Scale Estimate                       0.45
  Weighted Scale Estimate                        0.345
  IQR Estimate of Residuals                      0.607
  Det. of COV[Regression Coefficients] Matrix    5.4829E-4

Regression Table
  Obs  Y Vector  Yhat    Residuals  Hat[i,i]  Res/Scale  Student Res  Wts[i,i]  Res Dist
  1    5.23      4.831   0.399      0.0222    1.157      1.17         1         1.157
  2    5.74      5.387   0.353      0.0373    1.023      1.043        1         1.023
  3    4.93      4.509   0.421      0.0219    1.22       1.234        1         1.22
  4    5.74      5.387   0.353      0.0373    1.023      1.043        1         1.023
  5    5.19      4.626   0.564      0.0213    1.635      1.652        1         1.635
  6    5.46      5.095   0.365      0.0271    1.06       1.075        1         1.06
  7    4.65      3.281   1.369      0.0781    3.972      4.137        0.0628    3.972
  8    5.27      5.416   -0.146     0.0387    -0.425     -0.433       1         0.425
(The complete regression table is not shown.)

Final Weighted Correlation Matrix
       y       x
  y    1       0.759
  x    0.759   1

Eigenvalues of Final Weighted Correlation Matrix
  Eval 1    Eval 2
  0.241     1.759
428
-------
Output for PROP (Leverage ON) (continued).
[Graph: PROP Regression - Residuals QQ Plot — standardized residuals vs. normal quantiles, with horizontal Indv(0.05) bands at ±1.93.]

[Graph: PROP Regression - Residuals vs Unsquared Distance Plot — residuals vs. unsquared leverage distances, with horizontal Indiv-Res(0.05) bands at ±1.93 and vertical lines at Indiv-MD(0.05) = +1.93 and Max-MD(0.05) = +3.10.]
429
-------
Output for PROP (Leverage ON) (continued).
[Graph: PROP Regression - Y vs X Plot — y vs. x scatter plot showing the classical and robust regression lines.]
Interpretation of Graphs: Observations which are outside of the horizontal lines on the residual Q-Q plot
or on the residuals versus unsquared leverage distances plot represent regression outliers. Observations lying to
the right of the vertical lines represent leverage outliers; leverage points lying between the two horizontal
lines represent good leverage points, and the rest of the leverage points represent bad leverage points. Both
the classical and robust regression lines are also shown on the y vs. x scatter plot.
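Reading the two cutoffs together gives a four-way classification of each observation. The small helper below is illustrative only (not part of Scout); its default cutoffs echo the Indv(0.05) values from the output above.

```python
def classify_point(scaled_res, lev_dist, res_cut=1.93, lev_cut=1.93):
    """Classify an observation the way the Scout plots are read."""
    vertical_outlier = abs(scaled_res) > res_cut  # outside the horizontal band
    leverage_point = lev_dist > lev_cut           # right of the vertical line
    if leverage_point and vertical_outlier:
        return "bad leverage point"
    if leverage_point:
        return "good leverage point"
    if vertical_outlier:
        return "regression outlier"
    return "regular observation"

print(classify_point(0.5, 0.4))   # regular observation
print(classify_point(9.1, 8.5))   # bad leverage point
print(classify_point(0.7, 3.7))   # good leverage point
```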
430
-------
Output for PROP (Leverage OFF).
In order to demonstrate the usefulness of the leverage options (when several leverage points may be
present), the Star cluster data is considered again with the leverage option off.
The output thus obtained is given as follows.
Regression Analysis Output

Date/Time of Computation: 3/12/2008 8:15:48 AM
User Selected Options
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\STARCLS
Full Precision: OFF
Selected Regression Method: PROP
Residual Influence Function Alpha: 0.05 (Used to Identify Vertical Regression Outliers)
Number of Regression Iterations: 10 (Maximum Number if doesn't Converge)
Leverage: Off
Y vs Y-hat Plot: Not Selected
Y vs Residual Plot: Not Selected
Y-hat vs Residual Plot: Not Selected
Title For Y vs X Plots: PROP Regression - Y vs X Plot
Title for Residual QQ Plot: PROP Regression - Residuals QQ Plot
Residual Band Alpha: 0.05 (Used in Graphics Residual Bands)
Title Residual vs Distance Plot: PROP Regression - Residuals vs Unsquared Leverage Distance Plot
Show Intermediate Results: Do Not Display Intermediate Results (Intermediate Results Shown on Another Output Sheet)

Number of Selected Regression Variables: 1
Number of Observations: 47
Dependent Variable: y

Residual Values used with Graphics Display
Upper Residual Individual (0.05) MD: 1.929
Lower Residual Individual (0.05) MD: -1.929

Correlation Matrix
       y       x
  y    1       -0.21
  x    -0.21   1

Eigenvalues of Correlation Matrix
  Eval 1    Eval 2
  0.79      1.21
431
-------
Output for PROP (Leverage OFF) (continued).
Ordinary Least Squares (OLS) Regression Results

Estimates of Regression Parameters
              Intercept   x
              6.793       -0.413

Stdv of Estimated Regression Parameters
              Intercept   x
              1.237       0.286

ANOVA Table
  Source of Variation    SS      DOF   MS      F-Value   P-Value
  Regression             0.665   1     0.665   2.085     0.1557
  Error                  14.35   45    0.319
  Total                  15.01   46

  R Square Estimate                              0.0443
  MAD Based Scale Estimate                       0.651
  Weighted Scale Estimate                        0.565
  IQR Estimate of Residuals                      1.025
  Det. of COV[Regression Coefficients] Matrix    5.5584E-4

Final Reweighted Regression Results

Estimates of Regression Parameters
              Intercept   x
              6.799       -0.414

Stdv of Estimated Regression Parameters
              Intercept   x
              1.235       0.286

ANOVA Table
  Source of Variation    SS      DOF     MS      F-Value   P-Value
  Regression             0.668   1       0.668   2.102     0.1540
  Error                  14.29   44.95   0.318
  Total                  14.95   45.95
432
-------
Output for PROP (Leverage OFF) (continued).
  R Square Estimate                              0.0447
  MAD Based Scale Estimate                       0.651
  Weighted Scale Estimate                        0.564
  IQR Estimate of Residuals                      1.025
  Det. of COV[Regression Coefficients] Matrix    5.5303E-4

Regression Table
  Obs  Y Vector  Yhat    Residuals  Hat[i,i]  Res/Scale  Student Res  Wts[i,i]  Res Dist
  1    5.23      4.988   0.242      0.0222    0.429      0.433        1         0.429
  2    5.74      4.91    0.83       0.0373    1.473      1.501        1         1.473
  3    4.93      5.034   -0.104     0.0219    -0.184     -0.187       1         0.184
  4    5.74      4.91    0.83       0.0373    1.473      1.501        1         1.473
  5    5.19      5.017   0.173      0.0213    0.306      0.309        1         0.306
  6    5.46      4.951   0.509      0.0271    0.903      0.915        1         0.903
  7    4.65      5.208   -0.558     0.0781    -0.99      -1.031       1         0.99
  8    5.27      4.906   0.364      0.0387    0.646      0.659        1         0.646
  9    5.57      5.034   0.536      0.0219    0.951      0.961        1         0.951
  10   5.12      4.988   0.132      0.0222    0.233      0.236        1         0.233
  11   5.73      5.353   0.377      0.194     0.669      0.745        1         0.669
  12   5.45      4.964   0.486      0.025     0.863      0.874        1         0.863
  13   5.42      4.943   0.477      0.0287    0.846      0.859        1         0.846
  14   4.05      5.138   -1.088     0.0444    -1.929     -1.974       1         1.929
  15   4.26      5.022   -0.762     0.0214    -1.351     -1.366       1         1.351
  16   4.58      4.968   -0.388     0.0244    -0.688     -0.696       1         0.688
  17   3.94      5.046   -1.106     0.0229    -1.963     -1.985       0.951     1.963
  18   4.18      4.968   -0.788     0.0244    -1.397     -1.415       1         1.397
  19   4.18      5.046   -0.866     0.0229    -1.537     -1.555       1         1.537
  20   5.89      5.353   0.537      0.194     0.952      1.061        1         0.952
  21   4.38      5.022   -0.642     0.0214    -1.138     -1.15        1         1.138
(The complete regression table is not shown.)

Final Weighted Correlation Matrix
       y        x
  y    1        -0.212
  x    -0.212   1

Eigenvalues of Final Weighted Correlation Matrix
  Eval 1    Eval 2
  0.788     1.212
-------
Q-Q plot of Standardized Output for PROP (Leverage OFF).
[Graph: PROP Regression - Residuals QQ Plot — standardized residuals vs. normal quantiles, with horizontal Indv(0.05) bands at ±1.93.]
434
-------
Interpretation of Graphs: Observations (if any) lying outside of the horizontal lines in the Q-Q plot are
considered to be regression outliers. The Leverage Distances vs. standardized residuals plot is not
produced as the leverage option was not activated. Regression lines are produced since there is only one
predictor variable. It is easy to see from the above graph (where both the classical and robust regression
lines are overlapping and attracted toward the outliers) that one should use the leverage option to properly
identify all of the leverage points. Once the leverage points are identified, the robust regression method
should be used to distinguish between the good and bad leverage points.
9.9 Method Comparison in Regression Module
The "Method Comparison" option in the "Regression" drop-down menu can be used to
compare the regression estimates of bivariate data obtained using various classical and
robust regression methods. Regression lines for the selected regression methods are
drawn on two-dimensional scatter plots. These comparisons are done in the "Bivariate
Regression Fits" drop-down menu. The method comparison module also compares the
residuals obtained by a single regression method against residuals obtained from one or
more methods. A comparison of fits (Y-hat) from one method against fits from the other
methods is done in a similar way. These comparisons of the residuals and fits from the
various regression methods are done in "R-R Plots" and "Y-Y-hat Plots," respectively.
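An R-R plot is simply one method's residuals charted against another's; points off the 45-degree line are the observations the two methods treat differently. A minimal sketch of computing such a pair is below; the "robust" fit here is a single trimming pass, a stand-in assumption rather than one of Scout's iterative methods, and the function name is hypothetical.

```python
import numpy as np

def rr_pairs(X, y):
    """Pair OLS residuals with residuals from a simple robust refit."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])  # design matrix with intercept
    beta_ols, *_ = np.linalg.lstsq(X1, y, rcond=None)
    res_ols = y - X1 @ beta_ols
    scale = 1.4826 * np.median(np.abs(res_ols))          # MAD-type scale
    w = (np.abs(res_ols) / scale <= 1.93).astype(float)  # drop gross outliers
    sw = np.sqrt(w)
    beta_rob, *_ = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)
    res_rob = y - X1 @ beta_rob
    return res_ols, res_rob  # plot res_rob against res_ols to compare methods
```

For an outlier, the robust residual is typically larger in magnitude than the OLS residual, because the OLS line is pulled toward the outlier while the robust line is not.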
435
-------
9.9.1 Bivariate Fits
1. Click Regression > Method Comparison > Bivariate Regression Fits.
[Screenshot: Scout main window with the data file STARCLS open and the Regression menu expanded at Method Comparison.]
2. The "Select Variables" screen (Section 3.3) will appear.
[Screenshot: "Select Variables to Graph" screen with panels for the Y axis variable, the X axis variable, and the Group variable, plus Options, OK, and Cancel buttons.]
Select the Y axis variable and the X axis variable from the "Select
Variables to Graph" screen.
If the results are to be produced using a Group variable, select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.
o Click on "Options" for method comparison options.
[Screenshot: "Select Regression Method Comparison Options" window with check-boxes for the twelve methods (OLS, LMS, and Iterative OLS, Biweight, Huber, MVT, and PROP, each with and without leverage) and fields for the graph title and regression line plot.]

o The check-boxes in the window shown above represent the
available regression methods.
[Screenshot: "Select Regression Method Comparison Options" window with the w/o-leverage methods checked and their option fields: Number of Regression Iterations (Max = 50), Critical Alpha, Tuning Constant, Influence Function Alpha, Regression MDs Distribution (Beta or Chisquare), and Trimming Percentage.]

o The options selected in the window shown above are the options
for the regression methods without the leverage option.
o The "Iterative OLS w/o Leverage" requires the input of a
"Critical Alpha."
o The "Biweight w/o Leverage" requires the input of a
"Tuning Constant."
o The "Huber w/o Leverage" requires the input of an
"Influence Function Alpha."
o The "MVT w/o Leverage" requires the input of a "Critical
Alpha" and a "Trimming Percentage."
o The "PROP w/o Leverage" requires the input of an
"Influence Function Alpha."
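The "Tuning Constant" and "Influence Function Alpha" inputs control how quickly large standardized residuals are downweighted. A minimal sketch of the two weighting schemes follows (illustrative only; the constants 1.345 and 4.685 are conventional defaults from the robust-regression literature, not necessarily Scout's values):

```python
import numpy as np

# Weight functions behind the downweighting options (illustration only).

def huber_weight(u, c=1.345):
    """Huber: full weight for |u| <= c, then decays as c/|u| (never zero)."""
    u = np.abs(u)
    return np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))

def biweight(u, c=4.685):
    """Tukey biweight: smooth descent to exactly zero weight at |u| = c."""
    u = np.abs(u)
    return np.where(u <= c, (1.0 - (u / c) ** 2) ** 2, 0.0)

u = np.array([0.0, 1.0, 2.0, 5.0, 10.0])  # standardized residuals
print("Huber   :", np.round(huber_weight(u), 3))
print("Biweight:", np.round(biweight(u), 3))
```

The practical difference: Huber weights never reach zero, so every observation retains some influence, while the biweight completely rejects observations beyond its tuning constant.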
[Screenshot: "Select Regression Method Comparison Options" window with the with-leverage methods checked and their additional fields: Leverage Distance Method (Classical, Sequential Classical, Huber, PROP, or MVT (Trimming)), Initial Leverage Estimates (Classical, Sequential Classical, Robust (Median, MAD), OKG (Maronna Zamar), KG (Not Orthogonalized), or MCD), Influence Function Alpha, Leverage MDs Distribution (Beta or Chisquare), and Number of Leverage Iterations (Max = 50).]
o Options in the window shown above represent options for the
regression methods with leverage.
o The "Leverage Distance Method" remains the same for
any of the regression methods.
o The "Classical" and "Sequential Classical" methods require the
input of a "Critical Alpha."
o The "Huber" and "PROP" methods require the input of an
"Influence Function Alpha" and the "Leverage MDs
Distribution."
o The "MVT" requires the input of a "Critical Alpha" and a
"Trimming Percentage."
o The Leverage Distance Method requires an "Initial
Leverage Estimates" selection to start the computations.
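Leverage distances are computed in the predictor space: a point far from the bulk of the x-values is a leverage point even if its y-value happens to fit the line. A classical-version sketch follows (assuming NumPy; Scout's robust choices such as Huber, PROP, and MVT replace the mean and covariance with robust estimates, and its cutoffs come from beta or chi-square distributions rather than the rough fixed threshold used here):

```python
import numpy as np

# Classical leverage screen: Mahalanobis distances of the predictor rows.
rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(30, 2))
X[0] = [8.0, 8.0]  # an obvious leverage point in predictor space

mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mu
d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)  # squared Mahalanobis distances

cutoff = 6.0  # rough illustrative cutoff for p = 2 predictors
flagged = np.where(d2 > cutoff)[0]
print("flagged rows:", flagged)
```

Whether a flagged point is a good or a bad leverage point then depends on its regression residual, which is why the leverage option is combined with a robust fit.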
Graphical Display for Method Comparisons Option.
Data Set: Bradu (X1 vs. Y).
Methods: OLS, LMS, PROP w/o Leverage (PROP influence function alpha for regression outliers = 0.2).
[Figure: Regression Line Plot]
Data Set: Bradu (X3 vs. Y).
Methods: OLS, LMS, PROP w/o Leverage (PROP influence function alpha = 0.2).
[Figure: Regression Line Plot]
It is noted that the LMS (green line) method finds different sets of outliers when
compared to the PROP (violet line) method. As shown earlier, in the multiple linear LMS
regression of y on x1, x2, and x3, observations 1 through 10 were identified as regression
outliers (and bad leverage points). Here, the LMS regression of y on x1 (and also of y on
x2) also identified the first 10 points as regression outliers, whereas the LMS regression
of y on x3 identified observations 11, 12, 13, and 14 as bad leverage points and regression
outliers. However, the PROP method, without the leverage option, identified
observations 11, 12, 13, and 14 as regression outliers and bad leverage points for all of
the regression models: y vs. x1, x2, and x3; y vs. x1; y vs. x2; and y vs. x3. In practice, it
is desirable to supplement statistical results with graphical displays. In the present
context, graphical displays also help the user to determine points that may represent good
(or bad) leverage points. Any regression model for this data set should be obtained
without the first 10 points.
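The LMS behavior discussed above can be reproduced on a small synthetic data set. The sketch below uses an exhaustive search over two-point elemental lines (assuming NumPy; this brute-force search is feasible only for small n and is not the search strategy Scout uses) and shows LMS ignoring a clump of regression outliers:

```python
import numpy as np
from itertools import combinations

# Least Median of Squares for a bivariate fit by elemental-set search.
def lms_line(x, y):
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        med = np.median((y - intercept - slope * x) ** 2)  # median squared residual
        if best is None or med < best[0]:
            best = (med, intercept, slope)
    return best[1], best[2]

rng = np.random.default_rng(2)
x = np.arange(15, dtype=float)
y = 0.5 * x + rng.normal(0.0, 0.1, 15)
y[:4] += 20.0  # a clump of regression outliers (over a quarter of the data)
a, b = lms_line(x, y)
print("LMS intercept:", round(a, 2), "slope:", round(b, 2))
```

Because LMS minimizes the median of the squared residuals, a minority clump of outliers cannot move the fitted line, which is why its line can disagree sharply with the OLS line in the plots above.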
Output for Method Comparisons.
Data Set: Bradu (X1 vs. Y).
Methods: PROP w/o Leverage (influence function alpha = 0.2), PROP with Leverage (initial estimate as
OKG, influence function alpha = 0.05), and OLS.
Output for Method Comparisons.
Data Set: Bradu (X1 vs. Y).
Methods: All (12 methods).
[Figure: Regression Line Plot with all twelve regression lines overlaid; legend lists OLS, LMS, Iterative OLS, Biweight, Huber, MVT, and PROP, each with and without leverage.]
As mentioned before, the user should select the various options carefully. It is suggested
not to select all of the available options to generate a single graph; such a graph will be
cluttered with many regression lines, as illustrated in the above figure.
Note: Sometimes a line will be outside the frame of the graph. In such cases, a warning message (in
orange) will be printed in the Log Panel.
9.9.2 Multivariate R-R Plots
1. Click Regression > Method Comparison > Multivariate R-R Plot.

[Screenshot: Scout 2008 main window showing the Regression > Method Comparison > Multivariate R-R Plot menu path.]
2. The "Select Variables" screen (Section 3.3) will appear.
Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
Click on the "Options" button to get the options window.
o The options in the window shown below are the options when all
the check-boxes in the "Method(s) for Residuals on Y-Axis" are
checked. The default option plots the "Observed Y"
against the "OLS" residuals.
[Screenshot: "Regression Method Comparison Residual-Residual Comparison Options" window: "Method for Residuals on X-Axis" radio buttons (default: Observed Y), "Method(s) for Residuals on Y-Axis" check-boxes for all thirteen methods (OLS, LMS, LPS, and Iterative OLS, Biweight, Huber, MVT, and PROP with and without leverage), option fields for the selected methods, the Residuals MDs Distribution choice (Beta or Chisquare), the LMS/LPS Search Strategy (All Combinations or Extensive), and the "Store Residuals to Worksheet," "Use Default Title," and "Display User Selections" check-boxes.]
o The options required for the various regression methods are
discussed in the previous sections of this chapter.
o Select a method for X-axis and one or more methods for the Y-
axis.
o Specify the required parameters of the selected methods in the
various options boxes.
o The "Display User Selections" option stores the user-selected options
for the various methods in a new worksheet for reference.
o The "Store Residuals to Worksheet" option stores the residuals of
each of the selected y-axis methods and the x-axis method in a new
worksheet.
o Click on "OK" to continue or "Cancel" to cancel the options
window.
• Click on "OK" to continue or "Cancel" to cancel the generation of R-R
Plots.
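An R-R plot simply pairs the residuals of two fits observation by observation; cases the two methods treat differently fall far from the 45-degree line. A sketch of the underlying computation (assuming NumPy, and using a Theil-Sen fit as a generic robust stand-in; Theil-Sen is not one of Scout's methods):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.arange(25, dtype=float)
y = 3.0 * x + rng.normal(0.0, 1.0, 25)
y[24] = 0.0  # single regression outlier

# OLS residuals
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r_ols = y - X @ beta

# Theil-Sen residuals (median of pairwise slopes)
i, j = np.triu_indices(len(x), k=1)
slope = np.median((y[j] - y[i]) / (x[j] - x[i]))
intercept = np.median(y - slope * x)
r_ts = y - (intercept + slope * x)

# Each (r_ols[k], r_ts[k]) pair is one point on the R-R plot; large gaps
# between the two coordinates mark cases the methods treat differently.
disagreement = np.abs(r_ols - r_ts)
print("largest disagreement at row:", disagreement.argmax())
```

Here the outlying observation shows the largest disagreement because OLS partially absorbs it into the fit while the robust fit leaves it with a large residual.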
Output for R-R Plots.
Data Set: Bradu.
Methods: 13 (All) methods on Y-axis vs. Observed Y on X-axis.
[Figure: Residuals versus Residuals Plot of the thirteen methods' residuals against Observed Y; legend lists the methods.]
Data Set: Bradu.
Methods: 5 (OLS, LMS, Biweight, Huber and PROP with leverages) methods on Y-axis vs. OLS on X-axis.
[Screenshot: options window for this example, with OLS selected as the "Method for Residuals on X-Axis" and OLS, LMS, Biweight with Leverage, Huber with Leverage, and PROP with Leverage checked under "Method(s) for Residuals on Y-Axis."]
9.9.3 Multivariate Y-Y-hat Plots
1. Click Regression > Method Comparison > Multivariate Y-Y-hat Plot.

[Screenshot: Scout 2008 main window showing the Regression > Method Comparison > Multivariate Y-Y-hat Plot menu path.]
2. The "Select Variables" screen (Section 3.3) will appear.
Select the dependent variable and one or more independent variables from
the "Select Variables" screen.
o Click on the "Options" button to get the options window.
o The options in the window shown below are the options when all
the check-boxes in the "Method(s) for Fits on Y-Axis" are
checked. The default option plots the "Observed Y"
against the "OLS" fits.
[Screenshot: "Regression Method Comparison Y-Y-hat Comparison Options" window: "Method for Fits on X-Axis" radio buttons (default: Observed Y), "Method(s) for Fits on Y-Axis" check-boxes for all thirteen methods, option fields for the selected methods, and the "Store Y-hats to Worksheet," "Use Default Title," and "Display User Selections" check-boxes.]
o The options required for the various regression methods are
discussed in the previous sections of this chapter.
o Select a method for X-axis and one or more methods for the Y-
axis.
o Specify the required parameters of the selected methods in the
various options boxes.
o The "Display User Selections" option stores the user-selected options
for the various methods in a new worksheet for reference.
o The "Store Y-hats to Worksheet" option stores the fitted values
(Y-hats) of each of the selected y-axis methods and the x-axis
method in a new worksheet.
o Click on "OK" to continue or "Cancel" to cancel the options
window.
• Click on "OK" to continue or "Cancel" to cancel the generation of Y-Y-
hat Plots.
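A Y-Y-hat plot pairs observed values with a method's fitted values (or one method's fits with another's); well-fitted observations hug the 45-degree line. A minimal sketch of the computation behind such a plot (assuming NumPy; OLS only, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.arange(20, dtype=float)
y = 1.5 * x + rng.normal(0.0, 0.5, 20)
y[7] += 15.0  # one poorly fitted observation

# OLS fitted values
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

# Each (y[k], y_hat[k]) pair is one point on the plot; the vertical distance
# from the 45-degree line is just the residual.
off_line = np.abs(y - y_hat)
print("farthest from the 45-degree line: row", off_line.argmax())
```

Plotting one method's Y-hats against another's works the same way: the (y_hat_A[k], y_hat_B[k]) pairs replace the (observed, fitted) pairs.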
Output for Y-Y-hat Plots.
Data Set: Bradu.
Methods: 13 (All) methods on Y-axis vs. Observed Y on X-axis.
[Figure: Y-hat versus Y-hat Plot of the thirteen methods' fits against Observed Y; legend lists the methods.]
Data Set: Bradu.
Methods: 5 (OLS, LMS, Biweight, and PROP with leverages) methods on Y-axis vs. OLS on X-axis.
[Screenshot: options window for this example, with OLS selected as the "Method for Fits on X-Axis" and the chosen methods checked under "Method(s) for Fits on Y-Axis."]
[Figure: Y-hat versus Y-hat Plot of the selected methods' fits against the OLS Y-hat.]
Data Set: Bradu.
Methods: 3 (OLS, PROP with and without leverage) methods on Y-axis vs. PROP with leverage on X-axis.
[Screenshot: options window for this example, with PROP with Leverage selected as the "Method for Fits on X-Axis" and OLS, PROP w/o Leverage, and PROP with Leverage checked under "Method(s) for Fits on Y-Axis."]
[Figure: Y-hat versus Y-hat Plot of the OLS, PROP w/o Leverage, and PROP with Leverage fits against the PROP with Leverage Y-hat.]
References
Agullo, J. (1997). "Exact Algorithms to Compute the Least Median of Squares Estimate in
Multiple Linear Regression," in Statistical Procedures and Related Topics, ed. Dodge, Y.,
Institute of Mathematical Statistics, Hayward, CA, 133-146.
Belsley, D.A., Kuh, E., and Welsch, R.E. (1980). Regression Diagnostics: Identifying
Influential Data and Sources of Collinearity, John Wiley and Sons, NY.
Chatterjee, S. and Machler, M. (1997). "Robust regression: A weighted least squares
approach," Communications in Statistics Theory and Methods, 26, 1381-1394.
Cook, R.D., and Weisberg, S. (1982). Residuals and Influence in Regression, Chapman &
Hall, London.
Dollinger, M.B., and Staudte, R.G. (1991). "Influence Functions of Iteratively Reweighted
Least Squares Estimators," Journal of the American Statistical Association, 86, 709-716.
Draper, N.R., and Smith, H. (1984). Applied Regression Analysis, 2nd ed., John Wiley and
Sons, NY.
Gnanadesikan, R., and Kettenring, J.R. (1972). "Robust Estimates, Residuals, and Outlier
Detection with Multi-response Data," Biometrics, 28, 81-124.
Hadi, A.S., and Simonoff, J.S. (1993). "Procedures for the Identification of Multiple
Outliers in Linear Models," Journal of the American Statistical Association, 88, 1264-
1272.
Hawkins, D.M., Bradu, D., and Kass, G.V. (1984). "Location of Several Outliers in
Multiple Regression Data Using Elemental Sets," Technometrics, 26, 197-208.
Hawkins, D.M., and Simonoff, J.S. (1993). "High Break Down Regression and Multivariate
Estimation," Applied Statistics, 42, 423-432.
Hettmansperger, T.P., and Sheather, S.J. (1992). "A Cautionary Note on the Method of
Least Median Squares," The American Statistician, 46, 79-83.
Neter, J., Kutner, M.H., Nachtsheim, C.J., and Wasserman W. (1996). Applied Linear
Statistical Models, 4th ed., McGraw-Hill, Boston.
Olive, D.J. (2002). "Applications of Robust Distances for Regression," Technometrics, 44,
64-71.
Rousseeuw, P.J. (1984). "Least Median of Squares Regression," Journal of the American
Statistical Association, 79, 871-880.
Rousseeuw, P.J., and Leroy, A.M. (1987). Robust Regression and Outlier Detection, John
Wiley and Sons, NY.
Ruppert, D. (1992). "Computing S-Estimators for Regression and Multivariate
Location/Dispersion," Journal of Computational and Graphical Statistics, 1, 253-270.
Ruppert, D., and Carroll, R.J. (1980). "Trimmed Least Squares Estimation in the Linear
Model," Journal of the American Statistical Association, 75, 828-838.
Simpson, D.G., Ruppert, D., and Carroll, R.J. (1992). "On One-Step GM Estimates and
Stability of Inferences in Linear Regression," Journal of the American Statistical
Association, 87, 439-450.
Simpson, J.R., and Montgomery, D.C. (1998a). "The development and evaluation of
alternative generalized M estimation techniques," Communications in Statistics —
Simulation and Computation, 27, 999-1018.
Singh, A. and Nocerino, J.M. (1995). Robust Procedures for the Identification of Multiple
Outliers, Handbook of Environmental Chemistry, Statistical Methods, Vol. 2. G, pp. 229-
277, Springer Verlag, Germany.
Stromberg, A.J. (1993). "Computing the Exact Least Median of Squares Estimate and
Stability Diagnostics in Multiple Linear Regression," SIAM Journal of Scientific and
Statistical Computing, 14, 1289-1299.
Welsh, A.H. (1986). "Bahadur Representations for Robust Scale Estimators Based on
Regression Residuals," The Annals of Statistics, 14, 1246-1251.
Yohai, V.J., and Zamar, R.H. (1988). "High break down point estimates of regression by
means of the minimization of an efficient scale," Journal of the American Statistical
Association, 83, 406-413.