United States
Environmental Protection
Agency
Scout 2008 Version 1.0
User Guide
Part II
RESEARCH AND DEVELOPMENT
-------
EPA/600/R-08/038
February 2009
Scout 2008 Version 1.0
User Guide
(Second Edition, December 2008)
John Nocerino
U.S. Environmental Protection Agency
Office of Research and Development
National Exposure Research Laboratory
Environmental Sciences Division
Technology Support Center
Characterization and Monitoring Branch
944 E. Harmon Ave.
Las Vegas, NV 89119
Anita Singh, Ph.D.1
Robert Maichle1
Narain Armbya1
Ashok K. Singh, Ph.D.2
1Lockheed Martin Environmental Services
1050 E. Flamingo Road, Suite N240
Las Vegas, NV 89119
2Department of Hotel Management
University of Nevada, Las Vegas
Las Vegas, NV 89154
Although this work was reviewed by EPA and approved for publication, it may not necessarily reflect official
Agency policy. Mention of trade names and commercial products does not constitute endorsement or
recommendation for use.
U.S. Environmental Protection Agency
Office of Research and Development
Washington, DC 20460
7663cmb09
-------
Notice
The United States Environmental Protection Agency (EPA) through its Office of
Research and Development (ORD) funded and managed the research described here. It
has been peer reviewed by the EPA and approved for publication. Mention of trade
names and commercial products does not constitute endorsement or recommendation by
the EPA for use.
The Scout 2008 software was developed by Lockheed-Martin under a contract with the
USEPA. Use of any portion of Scout 2008 that does not comply with the Scout 2008
User Guide is not recommended.
Scout 2008 contains embedded licensed software. Any modification of the Scout 2008
source code may violate the embedded licensed software agreements and is expressly
forbidden.
The Scout 2008 software provided by the USEPA was scanned with McAfee VirusScan
and is certified free of viruses.
With respect to the Scout 2008 distributed software and documentation, neither the
USEPA, nor any of their employees, assumes any legal liability or responsibility for the
accuracy, completeness, or usefulness of any information, apparatus, product, or process
disclosed. Furthermore, the Scout 2008 software and documentation are supplied "as-is"
without guarantee or warranty, expressed or implied, including without limitation, any
warranty of merchantability or fitness for a specific purpose.
iii
-------
Acronyms and Abbreviations
% NDs Percentage of non-detect observations
ACL alternative concentration limit
A-D, AD Anderson-Darling test
AM arithmetic mean
ANOVA Analysis of Variance
AOC area(s) of concern
B* Between groups matrix
BC Box-Cox-type transformation
BCA bias-corrected accelerated bootstrap method
BD break down point
BDL below detection limit
BTV background threshold value
BW Black and White (for printing)
CERCLA Comprehensive Environmental Response, Compensation, and
Liability Act
CL compliance limit, confidence limits, control limits
CLT central limit theorem
CMLE Cohen's maximum likelihood estimate
COPC contaminant(s) of potential concern
CV Coefficient of Variation, cross validation
D-D distance-distance
DA discriminant analysis
DL detection limit
DL/2 (t) UCL based upon DL/2 method using Student's t-distribution
cutoff value
DL/2 Estimates estimates based upon data set with non-detects replaced by half
of the respective detection limits
DQO data quality objective
DS discriminant scores
EA exposure area
EDF empirical distribution function
EM expectation maximization
EPA Environmental Protection Agency
EPC exposure point concentration
FP-ROS (Land) UCL based upon fully parametric ROS method using Land's H-
statistic
v
-------
Gamma ROS (Approx.) UCL based upon Gamma ROS method using the gamma approximate-UCL method
Gamma ROS (BCA) UCL based upon Gamma ROS method using the bias-corrected accelerated bootstrap method
GOF, G.O.F. goodness-of-fit
H-UCL UCL based upon Land's H-statistic
HBK Hawkins Bradu Kass
HUBER Huber estimation method
ID identification code
IQR interquartile range
K Next K, Other K, Future K
KG Kettenring Gnanadesikan
KM (%) UCL based upon Kaplan-Meier estimates using the percentile bootstrap method
KM (Chebyshev) UCL based upon Kaplan-Meier estimates using the Chebyshev inequality
KM (t) UCL based upon Kaplan-Meier estimates using the Student's t-distribution cutoff value
KM (z) UCL based upon Kaplan-Meier estimates using standard normal distribution cutoff value
K-M, KM Kaplan-Meier
K-S, KS Kolmogorov-Smirnov
LMS least median squares
LN lognormal distribution
Log-ROS Estimates estimates based upon data set with extrapolated non-detect values obtained using robust ROS method
LPS least percentile squares
MAD Median Absolute Deviation
Maximum Maximum value
MC minimization criterion
MCD minimum covariance determinant
MCL maximum concentration limit
MD Mahalanobis distance
Mean classical average value
Median Median value
Minimum Minimum value
MLE maximum likelihood estimate
MLE (t) UCL based upon maximum likelihood estimates using Student's t-distribution cutoff value
vi
-------
MLE (Tiku) UCL based upon maximum likelihood estimates using the
Tiku's method
Multi Q-Q multiple quantile-quantile plot
MVT multivariate trimming
MVUE minimum variance unbiased estimate
ND non-detect or non-detects
NERL National Exposure Research Laboratory
NumNDs Number of Non-detects
NumObs Number of Observations
OKG Orthogonalized Kettenring Gnanadesikan
OLS ordinary least squares
ORD Office of Research and Development
PCA principal component analysis
PCs principal components
PCS principal component scores
PLs prediction limits
PRG preliminary remediation goals
PROP proposed estimation method
Q-Q quantile-quantile
RBC risk-based cleanup
RCRA Resource Conservation and Recovery Act
ROS regression on order statistics
RU remediation unit
S substantial difference
SD, Sd, sd standard deviation
SLs simultaneous limits
SSL soil screening levels
S-W, SW Shapiro-Wilk
TLs tolerance limits
UCL upper confidence limit
UCL95, 95% UCL 95% upper confidence limit
UPL upper prediction limit
UPL95, 95% UPL 95% upper prediction limit
USEPA United States Environmental Protection Agency
UTL upper tolerance limit
Variance classical variance
W* Within groups matrix
vii
-------
WiB matrix Inverse of W* cross-product B* matrix
WMW Wilcoxon-Mann-Whitney
WRS Wilcoxon Rank Sum
WSR Wilcoxon Signed Rank
Wsum Sum of weights
Wsum2 Sum of squared weights
viii
-------
Table of Contents
Notice iii
Acronyms and Abbreviations v
Table of Contents ix
Chapter 7 223
Outliers and Estimates 223
7.1 Univariate Outliers and Estimates 223
7.1.1 Dixon Test for Univariate Data 225
7.1.2 Rosner's Test for Univariate Data 226
7.1.3 MD-Based (Grubbs Test) Test for Univariate Data 228
7.1.4 Biweight Estimate for Univariate Data 229
7.2 Robust Estimation and Identification of Multiple Multivariate Outliers 231
7.2.1 Classical Outlier Testing 231
7.2.1.1 Mahalanobis' Distances 231
7.2.1.2 Multivariate Kurtosis 233
7.2.1.3 Identifying Causal Variables 235
7.2.2 Robust Outlier Testing 240
7.2.2.1 Sequential Classical 243
7.2.2.2 Huber 250
7.2.2.3 Extended MCD 259
7.2.2.4 MVT 268
7.2.2.5 PROP 278
7.2.3 Method Comparisons 289
References 299
Chapter 8 303
QA/QC 303
8.1 Univariate QA/QC 303
8.1.1 No Non-detects 303
8.1.1.1 Q-Q Plots with Limits 303
8.1.1.2 Interval Graphs 306
8.1.1.2.1 Compare Intervals 306
8.1.1.2.2 Intervals Index Plots 310
8.1.1.3 Control Charts 314
8.1.1.3.1 Using All Data 314
8.1.1.3.2 Using Training/Background 315
8.1.2 With Non-detects 318
8.1.2.1 Interval Graphs 318
8.1.2.2 Control Charts 321
8.1.2.2.1 Using All Data 321
8.1.2.2.2 Using Training/Background 322
8.2 Multivariate QA/QC 325
8.2.1 No Non-detects 325
8.2.1.1 MDs Q-Q Plots with Limits 325
8.2.1.2 MDs Control Chart 328
8.2.1.3 Prediction and Tolerance Ellipsoids 331
ix
-------
8.2.2 With Non-detects 334
8.2.2.1 MDs Control Charts 334
8.2.2.2 Prediction and Tolerance Ellipsoids 337
x
-------
Chapter 7
Outliers and Estimates
Outliers are inevitable in data sets originating from various applications. Many
graphical (Q-Q plots, box plots), classical (Dixon, Rosner, Welch, Max MD), sequential
classical (Max MD, Kurtosis), and robust estimation and outlier identification methods
(Biweight, Huber, MCD, MVE, MVT, OKG, PROP) are available in the literature. Classical
outlier tests suffer from masking effects (e.g., extreme outliers may mask intermediate
outliers). The use of robust outlier identification procedures is recommended to identify
multiple outliers, especially when dealing with multivariate data sets (having multiple
contaminants). Several univariate and multivariate outlier identification methods, both
classical and robust (e.g., based upon the Biweight, Huber, and PROP influence
functions), are available in the Scout software package.
7.1 Univariate Outliers and Estimates
For historical reasons and for the sake of comparison, some simple classical outlier
tests are also included in the Scout software package. Specifically, the classical Dixon
and Rosner outlier tests (often cited in the environmental literature) are available in
Scout. For details, refer to the ProUCL 4.00.04 Technical Guide. These classical tests may
be used on data sets with and without non-detect observations. For data sets with
non-detects, Scout offers two options for handling the NDs during outlier testing:
1) exclude the non-detects, or 2) replace the NDs by DL/2 values. These options are used
only to identify outliers, not to compute any estimates or limits used in the
decision-making process.
It is suggested that the classical outlier identification procedures (and the robust
procedures described later) be supplemented with graphical displays such as Q-Q
plots, box-and-whisker plots (also called box plots), and interquartile range (IQR) plots
(based on the upper quartile, Q3, and the lower quartile, Q1). These graphical displays are
available in Scout. Box plots with whiskers are sometimes used to identify univariate
outliers (e.g., EPA 2006). Typically, a box plot gives a good indication of extreme
observations (outliers) that may be present in a data set. The statistics used to construct
a box plot (lower quartile, median, upper quartile, and IQR) are not distorted by a small
number of outliers. On a box plot, observations beyond the two whiskers may be
considered candidates for potential outliers.
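The whisker rule is not spelled out above. The following minimal sketch assumes the common Tukey convention, with fences at Q1 - 1.5 x IQR and Q3 + 1.5 x IQR, which may differ from the rule Scout uses to draw its box plots. The function name and the data are hypothetical.

```python
import statistics


def boxplot_outlier_candidates(values, k=1.5):
    """Return the whisker fences and the values lying beyond them.

    Assumes Tukey-style fences at Q1 - k*IQR and Q3 + k*IQR (k = 1.5 is a
    common convention, not necessarily Scout's rule).
    """
    # Quartiles by linear interpolation between order statistics.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    candidates = [v for v in values if v < lower or v > upper]
    return (lower, upper), candidates


data = [2.1, 2.4, 2.5, 2.7, 3.0, 3.1, 3.3, 3.6, 12.0]  # hypothetical data
fences, flagged = boxplot_outlier_candidates(data)
print(fences, flagged)  # 12.0 lies above the upper fence
```

Because the quartiles ignore the extreme tail, the single large value does not move the fences. A classical mean and sd, by contrast, would be pulled toward it.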
On a normal Q-Q plot, observations that are well separated from the bulk (central part) of
the data typically represent potential outliers needing further investigation. Moreover,
significant and obvious jumps and breaks in a Q-Q plot (for any distribution) indicate
the presence of more than one population. Data sets exhibiting such behavior on Q-Q
plots should be partitioned into component sub-populations before estimating
statistics of interest (e.g., prediction intervals, confidence intervals).
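The coordinates behind a normal Q-Q plot can be sketched as follows. The code pairs each ordered observation with a standard normal quantile. It uses Blom plotting positions, (i - 0.375)/(n + 0.25), which is an assumed convention; Scout may use a different plotting-position formula. The function name and data are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Pair each ordered observation with a standard normal quantile.

    Uses Blom plotting positions (i - 0.375) / (n + 0.25); this is an
    assumption, not necessarily the formula Scout uses.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, sd 1
    return [(std_normal.inv_cdf((i - 0.375) / (n + 0.25)), x)
            for i, x in enumerate(ordered, start=1)]


points = normal_qq_points([4.1, 3.8, 5.0, 4.4, 4.7, 3.9, 15.2])  # hypothetical
for z, x in points:
    print(f"{z:7.3f}  {x:6.2f}")
```

When plotted as (z, x), the first six points fall close to a straight line. The last point sits far above that line, which is the visual cue for a potential outlier.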
223
-------
Dixon's Test (Extreme Value Test).
o Used to identify statistical outliers when the sample size is less than or equal
to 25.
o Used to identify outliers or extreme values in both the left tail (Case 1) and the
right tail (Case 2) of a data distribution. In environmental data sets, extremes
found in the right tail may represent potentially contaminated site areas
needing further investigation or remediation. The extremes in the left tail may
represent ND values.
o Assumes that the data without the suspected outliers are normally distributed;
therefore, it is necessary to perform a test for normality on the data without
the suspected outliers before applying this test.
o May suffer from masking in the presence of multiple outliers. This means that
if more than one outlier is suspected, this test may fail to identify all of the
outliers. Therefore, if you decide to use Dixon's test for multiple outliers,
apply the test to the least extreme value first.
Rosner's Test.
o Can be used to identify and detect up to 10 outliers in data sets of sizes 25 and
higher.
o Assumes that the data are normally distributed; therefore, it is necessary to
perform a test for normality before applying this test.
Depending upon the selected variables and the number of observations associated with
them, either Dixon's test or Rosner's test will be performed.
Biweight Estimates.
o Based on the estimation methods of Mosteller and Tukey (1977), Kafadar
(1981), and Lax (1985).
MD-based test (Grubbs Test).
o This is the multivariate extension of the univariate test known as the Grubbs
test, and it is based on the assumption of normality. The generalized distances
(MDs) of the data are calculated, and the observation with the largest distance is
removed from the data set if that distance exceeds the critical value. The test is
iterated until no outliers are detected.
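The iterative remove-and-retest loop described above can be sketched for a single variable, where the MD reduces to the squared standardized distance ((x - mean)/sd)^2. In this sketch the critical value is a hypothetical fixed cutoff passed in by the caller. A real test recomputes it for each sample size and the selected alpha, and this code does not reproduce how Scout computes it.

```python
import statistics


def iterative_max_md(values, critical_md):
    """Sequentially remove the value with the largest univariate MD.

    MD_i = ((x_i - mean) / sd) ** 2, using the sample sd (n - 1 divisor).
    `critical_md` is a caller-supplied, hypothetical fixed cutoff.
    """
    remaining = list(values)
    removed = []
    while len(remaining) > 2:
        mean = statistics.mean(remaining)
        sd = statistics.stdev(remaining)
        if sd == 0:
            break  # all remaining values are identical
        mds = [((x - mean) / sd) ** 2 for x in remaining]
        max_md = max(mds)
        if max_md <= critical_md:
            break  # no further outliers at this cutoff
        removed.append(remaining.pop(mds.index(max_md)))
    return removed, remaining


data = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 11.0, 9.0, 10.0, 40.0]  # hypothetical
outliers, kept = iterative_max_md(data, critical_md=5.0)
print(outliers)
```

In this example the first pass removes 40.0. The second pass finds no remaining MD above the cutoff, so the loop stops.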
224
-------
7.1.1 Dixon Test for Univariate Data
1. Click Outliers/Estimates > Univariate > Dixon.
2. The "Select Variables" screen (Section 3.2) will appear.
o Select one or more variables from the "Select Variables" screen.
o If the results are to be produced separately for each group, select a
group variable by clicking the arrow below the "Group by Variable"
button. This displays a drop-down list of available variables; click the
variable that represents the groups.
o The default number of suspected outliers is 2. To use this test, the user
needs an initial guess of the number of outliers that may be present in
the data set. This guess can be obtained from graphical displays such as
a Q-Q plot: on a Q-Q plot, high observations that are well separated from
the rest of the data may be considered potential (suspected) outliers.
o Click on the "OK" button to continue or on the "Cancel" button to cancel
the Outliers tests.
225
-------
Output for Dixon Test.
Dixon Outlier Test for Selected Variables
User Selected Options
Date/Time of Computation 7/10/2007 3:53:44 PM
From File D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
Full Precision OFF
Test for Suspected Outliers using Dixon Test 1
Dixon's Outlier Test for Group2
Number of data = 20
10% critical value: 0.401
5% critical value: 0.45
1% critical value: 0.535
1. 37.867 is a Potential Outlier (Upper Tail)
Test Statistic: 0.383
For 10% significance level, 37.867 is not an outlier.
For 5% significance level, 37.867 is not an outlier.
For 1% significance level, 37.867 is not an outlier.
2. 1.5 is a Potential Outlier (Lower Tail)
Test Statistic: 0.198
For 10% significance level, 1.5 is not an outlier.
For 5% significance level, 1.5 is not an outlier.
For 1% significance level, 1.5 is not an outlier.
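The output above does not show the formula behind the Dixon test statistic. As a hedged illustration, the sketch below assumes the r22 ratio that Dixon tables commonly use for 14 <= n <= 25, a range that includes the n = 20 example. The critical values are copied from the output; the function name and data are hypothetical, and this is not presented as Scout's code.

```python
def dixon_r22(values):
    """Dixon's r22 ratios for the largest and smallest observations.

    Assumes the r22 form commonly tabulated for 14 <= n <= 25:
      upper (Case 2): (x(n) - x(n-2)) / (x(n) - x(3))
      lower (Case 1): (x(3) - x(1)) / (x(n-2) - x(1))
    """
    x = sorted(values)
    n = len(x)
    if not 14 <= n <= 25:
        raise ValueError("this r22 sketch covers 14 <= n <= 25 only")
    upper = (x[-1] - x[-3]) / (x[-1] - x[2])
    lower = (x[2] - x[0]) / (x[-3] - x[0])
    return upper, lower


# n = 20 critical values as printed in the Dixon output above.
CRITICAL_N20 = {0.10: 0.401, 0.05: 0.450, 0.01: 0.535}

data = [float(v) for v in range(1, 20)] + [60.0]  # hypothetical data, n = 20
upper, lower = dixon_r22(data)
for alpha, crit in CRITICAL_N20.items():
    print(f"alpha={alpha}: largest value is an outlier? {upper > crit}")
```

A value is declared an outlier when its ratio exceeds the critical value. In the output above, 0.383 is below 0.401, so 37.867 is not flagged even at the 10% level.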
7.1.2 Rosner's Test for Univariate Data
1. Click Outliers/Estimates > Univariate > Rosner.
2. The "Select Variables" screen (Section 3.2) will appear.
o Select one or more variables from the "Select Variables" screen.
226
-------
o If the results are to be produced separately for each group, select a
group variable by clicking the arrow below the "Group by Variable"
button. This displays a drop-down list of available variables; click the
variable that represents the groups.
o Click on the "Options" button.
o The default number of suspected outliers is 1. To use this test, the user
needs an initial guess of the number of outliers that may be present in
the data set. This guess can be obtained from graphical displays such as
a Q-Q plot: on a Q-Q plot, high observations that are well separated from
the rest of the data may be considered potential (suspected) outliers.
o Click "OK" to continue or "Cancel" to cancel the Outliers tests.
Output for Rosner Test.
Rosner Outlier Test for Selected Variables
User Selected Options
Date/Time of Computation 7/10/2007 3:56:51 PM
From File D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
Full Precision OFF
Test for 'N' Suspected Outliers using Rosner 1
Rosner's Outlier Test for X
Number of data 53
Number of suspected outliers 1
Mean | sd | Potential outlier | Test value | Critical value (5%) | Critical value (1%)
5.110 | 4.337 | 12.11 | 1.61 | 3.151 | 3.504
For 5% Significance Level, there is no Potential Outlier
For 1% Significance Level, there is no Potential Outlier
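Rosner-type tests are usually computed sequentially. At each step, the value farthest from the current mean is found, R_i = |x_extreme - mean| / sd is computed, that value is removed, and the step repeats for the next suspected outlier. The sketch below assumes this generalized-ESD form. It does not reproduce Scout's tabulated critical values, so the comparison with them is left to the caller. The function name and data are hypothetical.

```python
import statistics


def rosner_r_statistics(values, k):
    """Return (mean, sd, extreme value, R_i) for i = 1..k.

    R_i = |x_extreme - mean| / sd, with mean and sd recomputed after each
    previously selected extreme value is removed. Comparing R_i with
    tabulated critical values (lambda_i) is left to the caller.
    """
    remaining = list(values)
    steps = []
    for _ in range(k):
        mean = statistics.mean(remaining)
        sd = statistics.stdev(remaining)  # n - 1 divisor
        extreme = max(remaining, key=lambda v: abs(v - mean))
        steps.append((mean, sd, extreme, abs(extreme - mean) / sd))
        remaining.remove(extreme)
    return steps


data = [5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 9.0]  # hypothetical data
for mean, sd, extreme, r in rosner_r_statistics(data, k=2):
    print(f"mean={mean:.3f} sd={sd:.3f} extreme={extreme} R={r:.3f}")
```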
227
-------
7.1.3 MD-Based (Grubbs Test) Test for Univariate Data
1. Click Outliers/Estimates > Univariate > MD-Based.
2. The "Select Variables" screen (Section 3.2) will appear.
o Select one or more variables from the "Select Variables" screen.
o If the results are to be produced separately for each group, select a
group variable by clicking the arrow below the "Group by Variable"
button. This displays a drop-down list of available variables; click the
variable that represents the groups.
o Click on "Options." The Options screen offers Critical Alpha values of 0.010,
0.025, 0.050, 0.100, 0.150, 0.200, and 0.250.
o Click "OK" to continue or "Cancel" to cancel the Outliers tests.
Output example: The data set "BRADU.xls" was used for the univariate MD-based
(Grubbs) test. The critical value of the maximum MD at the selected alpha is calculated
and compared with the maximum MD obtained from the data set.
228
-------
Output for MD-Based Grubbs Test.
Univariate MD Based (Grubbs Test) Outlier Analysis
User Selected Options
Date/Time of Computation 1/7/2008 4:10:24 PM
From File D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision OFF
x1
No Outliers Present
Initial Conditions
Max MD | Max MD (0.05)
5.796 | 10.78
7.1.4 Biweight Estimate for Univariate Data
1. Click Outliers/Estimates > Univariate > Biweight.
2. The "Select Variables" screen (Section 3.2) will appear.
o Select one or more variables from the "Select Variables" screen.
o If the results are to be produced separately for each group, select a
group variable by clicking the arrow below the "Group by Variable"
button. This displays a drop-down list of available variables; click the
variable that represents the groups.
o Click on "Options."
229
-------
The Options screen provides fields for the Tukey Location Tuning Constant, Tukey Scale
Tuning Constant, Lax/Kafadar Tuning Constant, and Maximum Number of Iterations (30 in
the screen shown).
o Click "OK" to continue or "Cancel" to cancel the Outliers tests.
Output example: The estimates of location and scale were computed using Tukey's
bisquare function and Kafadar's biweight function.
Output for Biweight Estimates.
Univariate Biweight Outlier Analysis
User Selected Options
Date/Time of Computation 1/7/2008 4:14:02 PM
From File D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision OFF
Number of Iterations 30
Tukey Location Tuning Constant 4
Tukey Scale Tuning Constant 4
Kafadar Scale Tuning Constant
Robust Biweight Estimates
Variable | Obs No. | Classical Location | Classical Scale | Tukey One-Step Location | Tukey One-Step Scale | Tukey Final Location | Tukey Final Scale | Kafadar Final Location | Kafadar Final Scale
x1 | 75 | 3.207 | 3.653 | 1.56 | 1.377 | 1.513 | 1.313 | 1.621 | 2.709
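The Tukey estimates above come from an iteratively reweighted biweight. The following minimal sketch shows one common textbook form. The location is a weighted mean with weights (1 - u^2)^2 for |u| < 1 and zero otherwise, where u = (x - T)/(c * MAD). The scale is the biweight midvariance. The tuning constants c = 6 (location) and c = 9 (scale) are values often used in the literature and are assumptions here, not Scout's defaults. Kafadar's variant is not reproduced, and the function names are hypothetical.

```python
import statistics


def biweight_location(x, c=6.0, max_iter=30, tol=1e-10):
    """Iterated Tukey biweight location, started at the median.

    Weights are (1 - u**2)**2 for |u| < 1 and 0 otherwise, with
    u = (x_i - T) / (c * MAD) and the MAD taken about the median.
    """
    t = statistics.median(x)
    mad = statistics.median([abs(v - t) for v in x])
    if mad == 0:
        return t
    for _ in range(max_iter):
        w = []
        for v in x:
            u = (v - t) / (c * mad)
            w.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
        total = sum(w)
        if total == 0:
            return t
        t_new = sum(wi * v for wi, v in zip(w, x)) / total
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return t


def biweight_scale(x, c=9.0):
    """Square root of the biweight midvariance, centered at the median."""
    m = statistics.median(x)
    mad = statistics.median([abs(v - m) for v in x])
    if mad == 0:
        return 0.0
    num = den = 0.0
    for v in x:
        u = (v - m) / (c * mad)
        if abs(u) < 1:  # points far from the median get zero weight
            num += (v - m) ** 2 * (1 - u * u) ** 4
            den += (1 - u * u) * (1 - 5 * u * u)
    return (len(x) * num) ** 0.5 / abs(den)


data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 30.0]  # hypothetical; one gross outlier
print(statistics.mean(data), biweight_location(data), biweight_scale(data))
```

The gross outlier pulls the classical mean up to about 12.86. It receives zero weight in both biweight estimates, so the biweight location stays near 10.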
230
-------
7.2 Robust Estimation and Identification of Multiple
Multivariate Outliers
A myriad of classical and robust outlier identification procedures are available in the
literature. Most of the procedures developed over roughly the last three decades of
research on robust estimation and outlier identification have been incorporated in
Scout. For the sake of comparison and completeness, some classical methods are also
available in Scout. A list of articles covering some of those procedures is
provided in the references.
Several formal graphical method-comparison tools have been incorporated in Scout.
Specifically, in both the Outliers module and the Regression module, the user can pick
several methods, and Scout will produce graphical displays comparing those methods.
Some examples illustrating those methods are included in this User Guide. Several
benchmark data sets from the literature are used throughout this User Guide.
7.2.1 Classical Outlier Testing
Because of their historical importance (Wilks 1963) and for the sake of completeness,
classical outlier methods have also been incorporated in Scout. The Classical outlier module
offers two tests for discordance: multivariate kurtosis and Mahalanobis distances
(sometimes called generalized distances). Multivariate kurtosis is also useful as a test for
deviation from normality in one or more dimensions. Both of these tests assume that the
data represent a random sample from a multivariate (p-dimensional, p > 1) normal
population.
7.2.1.1 Mahalanobis' Distances
The classical Mardia's multivariate kurtosis (Mardia 1970, 1974; Schwager and
Margolin 1982) outlier (and multinormality) test and the MD test (Ferguson 1961a,
1961b; Barnett and Lewis 1994) have been incorporated into Scout. The generalized
distance (MD-based) test is a multivariate extension of the univariate Grubbs test (Grubbs
1950). Scout also computes robustified multivariate kurtosis, skewness, and the largest
MD. As the output below shows, outliers have a strong influence on these statistics.
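To make the quantities in the Max MD output concrete, the sketch below computes squared Mahalanobis distances, d_i^2 = (x_i - xbar)' S^-1 (x_i - xbar). It uses the classical mean vector and the sample covariance matrix S with divisor n - 1. It is written in plain Python with a small Gauss-Jordan inverse so that it needs no extra packages. The function names and data are hypothetical, and the code does not compute Scout's critical values. In a sequential test, the largest d_i^2 would be compared with the critical value for the selected alpha.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows, mean):
    """Sample covariance matrix S with divisor n - 1."""
    n, p = len(rows), len(mean)
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - mean[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return [[v / (n - 1) for v in row] for row in s]


def invert(a):
    """Gauss-Jordan inversion with partial pivoting."""
    p = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(a)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is (nearly) singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def squared_mahalanobis(rows):
    """Return d_i^2 for every row, using the classical mean and S."""
    m = mean_vector(rows)
    s_inv = invert(covariance_matrix(rows, m))
    p = len(m)
    out = []
    for r in rows:
        d = [r[j] - m[j] for j in range(p)]
        out.append(sum(d[i] * s_inv[i][j] * d[j]
                       for i in range(p) for j in range(p)))
    return out


rows = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6), (10, 1)]  # hypothetical data
d2 = squared_mahalanobis(rows)
print([round(v, 3) for v in d2])
```

With the n - 1 divisor, the d_i^2 always sum to (n - 1)p, which is a convenient check on any implementation.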
1. Click Outliers/Estimates > Multivariate > Classical > Max MDs.
2. The "Select Variables" screen (Section 3.4) will appear.
231
-------
o Select two or more variables from the "Select Variables" screen.
o If the results are to be produced separately for each group, select a
group variable by clicking the arrow below the "Group by Variable"
button. This displays a drop-down list of available variables; click the
variable that represents the groups.
o Click on "Options." The Options screen offers Critical Alpha values of 0.010,
0.025, 0.050, 0.100, 0.150, 0.200, and 0.250.
o Select the required "Critical Alpha" and click "OK" to continue or
"Cancel" to cancel the Outliers tests.
-------
Output for Max MDs test for outliers.
Data Set used: Bradu (from the Hawkins, Bradu, and Kass 1984 article).
Classical Sequential Outlier Test Based Upon Maximum Mahalanobis Distance (Max-MDs)
User Selected Options
Date/Time of Computation 3/4/2008 8:42:40 AM
From File D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision OFF
Number of Observations 75
Number of Variables 4
Classical Mean Vector
y | x1 | x2 | x3
1.279 | 3.207 | 5.597 | 7.231
Classical Standard Deviation Vector
y | x1 | x2 | x3
3.493 | 3.653 | 8.239 | 11.74
Classical Covariance S Matrix
  | y | x1 | x2 | x3
y | 12.2 | 9.477 | 20.39 | 31.03
x1 | 9.477 | 13.34 | 28.47 | 41.24
x2 | 20.39 | 28.47 | 67.88 | 94.67
x3 | 31.03 | 41.24 | 94.67 | 137.8
Determinant 1906
Log of Determinant 7.553
Eigenvalues of Classical Covariance S Matrix
Eval 1 | Eval 2 | Eval 3 | Eval 4
0.914 | 1.688 | 5.538 | 223.1
Critical Alpha 0.05
Outlier Summary
Obs No | Max MD | Max MD (0.05)
14 | 43.7 | 17.43
12 | 27.75 | 17.39
11 | 34.26 | 17.34
13 | 56.99 | 17.29
Result: Observations 11, 12, 13, and 14 were identified as outliers. The classical method with a classical
start could not identify the first 10 observations as outliers.
7.2.1.2 Multivariate Kurtosis
Mardia's multivariate kurtosis is an extension of the univariate kurtosis and, thus, may
also be used as a univariate outlier test. Multivariate kurtosis is also used to test
multivariate normality.
233
-------
These tests, as implemented in Scout, are sequential: each test is repeated without the
previously identified discordant values, and the process stops when no further outliers are
found. For the multivariate kurtosis test, the discordant observation identified is the
point with the largest generalized distance from the sample classical mean vector.
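Mardia's kurtosis can be written in terms of the squared distances, b_{2,p} = (1/n) * sum of (d_i^2)^2, where the distances use the maximum likelihood covariance (divisor n). Under multivariate normality, b_{2,p} is close to p(p + 2) for large n. The sketch below is restricted to bivariate data (p = 2) so that the covariance inverse can be written in closed form. It also returns the point farthest from the classical mean, which is the point the sequential test would examine first. The function name and data are hypothetical.

```python
def mardia_kurtosis_2d(points):
    """Mardia's multivariate kurtosis b_{2,p} for bivariate (p = 2) data.

    Also returns the index of the point with the largest squared
    Mahalanobis distance from the classical mean vector. Uses the maximum
    likelihood covariance (divisor n), as in Mardia's definition.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    det = sxx * syy - sxy ** 2
    # Closed-form 2x2 inverse: [[syy, -sxy], [-sxy, sxx]] / det
    d2 = [((x - mx) ** 2 * syy - 2 * (x - mx) * (y - my) * sxy
           + (y - my) ** 2 * sxx) / det for x, y in points]
    b2p = sum(d ** 2 for d in d2) / n
    return b2p, d2.index(max(d2))


pts = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6), (10, 1)]  # hypothetical data
b2p, farthest = mardia_kurtosis_2d(pts)
print(f"b2,p = {b2p:.3f}; most distant point: index {farthest}")
```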
1. Click Outliers/Estimates > Multivariate > Classical > Kurtosis.
2. The "Select Variables" screen (Section 3.4) will appear.
o Select two or more variables from the "Select Variables" screen.
o If the results are to be produced separately for each group, select a
group variable by clicking the arrow below the "Group by Variable"
button. This displays a drop-down list of available variables; click the
variable that represents the groups.
o Click on "Options." The Options screen offers Critical Alpha values of 0.010,
0.025, 0.050, 0.100, 0.150, 0.200, and 0.250.
o Select the required "Critical Alpha" and click on "OK" to continue or
"Cancel" to cancel the Outliers tests.
234
-------
Output for Kurtosis test for outliers.
Data Set used: Bradu.
Classical Sequential Outlier Test Based Upon Multivariate Kurtosis
User Selected Options
Date/Time of Computation 3/4/2008 8:44:50 AM
From File D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision OFF
Number of Observations 75
Number of Variables 4
Classical Mean Vector
y | x1 | x2 | x3
1.279 | 3.207 | 5.597 | 7.231
Classical Standard Deviation Vector
y | x1 | x2 | x3
3.493 | 3.653 | 8.239 | 11.74
Classical Covariance S Matrix
  | y | x1 | x2 | x3
y | 12.2 | 9.477 | 20.39 | 31.03
x1 | 9.477 | 13.34 | 28.47 | 41.24
x2 | 20.39 | 28.47 | 67.88 | 94.67
x3 | 31.03 | 41.24 | 94.67 | 137.8
Determinant 1906
Log of Determinant 7.553
Eigenvalues of Classical Covariance S Matrix
Eval 1 | Eval 2 | Eval 3 | Eval 4
0.914 | 1.688 | 5.538 | 223.1
Critical Alpha 0.05
Outlier Summary
Obs No | Kurtosis | Kurtosis (0.05)
14 | 53.97 | 25.2
12 | 38.26 | 25.16
11 | 43.57 | 25.1
13 | 59.16 | 25.19
Result: Once again, only observations 11, 12, 13, and 14 were identified as outliers. The other 10
outliers could not be identified because of masking effects.
7.2.1.3 Identifying Causal Variables
Once an outlier test has been performed, the user may wish to identify the variables (if
any) which are responsible for each discordant observation. This can be done by
selecting the "Causal Variables" option from the pull-down menu. However, there are
235
-------
several other methods (e.g., Q-Q plots of individual variables, bivariate scatter plots with
tolerance ellipsoids) available in Scout that can also be used to identify variables that
might cause an observation to be an outlier. The details of this classical method can be
found in Garner et al. (1991a, 1991b). This method retests each discordant
observation with one variable excluded at a time. Thus, each discordant observation is
tested p times using all subsets of p-1 of the variables. A variable is listed as causal only
if the absence of that variable prevents rejection of the outlier. Although this procedure is
based on iterations of rigorous tests of hypotheses, the user should consider its results
only as general guidance and not as definitive proof of the cause. This method also
requires further research.
1. Click Outliers/Estimates > Multivariate > Classical > Causal > Distances
or Kurtosis.
2. The "Select Variables" screen (Section 3.4) will appear.
o Select two or more variables from the "Select Variables" screen.
o If the results are to be produced separately for each group, select a
group variable by clicking the arrow below the "Group by Variable"
button. This displays a drop-down list of available variables; click the
variable that represents the groups.
o Click on "Options."
236
-------
The Options screen offers Critical Alpha values of 0.010, 0.025, 0.050, 0.100, 0.150,
0.200, and 0.250.
o Select the required "Critical Alpha" and click "OK" to continue or
"Cancel" to cancel the Causal test.
-------
Output for Causal Variables using the MD test for outliers.
Data Set used: Bradu.
Classical Causal Variable Routine Using MDs
User Selected Options
Date/Time of Computation 11/6/2008 7:47:44 AM
From File D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision OFF
Number of Observations 75
Number of Columns 4
Critical Alpha 0.05
Max MD (4, 0.05) 17.43
Max MD (3, 0.05) 15.55
New Data Matrix (Outlier Rows)
Obs No | y | x1 | x2 | x3
14 | 0.1 | 11 | 34 | 34
12 | -0.4 | 12 | 23 | 37
11 | -0.2 | 11 | 24 | 35
13 | 0.7 | 12 | 26 | 34
Obs No | MD_Distance
14 | 43.7
12 | 27.75
11 | 34.26
13 | 56.98
Row 14
x2 is a Causal Variable
Obs No | Observed | Predicted
14 | 34 | 24.15
Row 12
y is a Causal Variable
Obs No | Observed | Predicted
12 | -0.4 | 11.33
Row 11
y is a Causal Variable
Obs No | Observed | Predicted
11 | -0.2 | 10.38
x3 is a Causal Variable
Obs No | Observed | Predicted
11 | 35 | 29.58
Row 13
y is a Causal Variable
Obs No | Observed | Predicted
13 | 0.7 | 9.81
x1 is a Causal Variable
Obs No | Observed | Predicted
13 | 12 | 11.32
x2 is a Causal Variable
Obs No | Observed | Predicted
13 | 26 |
x3 is a Causal Variable
Obs No | Observed | Predicted
13 | 34 | 32.54
238
-------
Results: Observations 14, 12, 11, and 13 are identified as outliers, with variable "x2" in observation 14,
variable "y" in 12, variables "y" and "x3" in 11, and all variables in observation 13 identified as
potential causal variables. The predicted value is obtained by regression, with the causal variable as the
dependent variable and the other variables as the independent variables.
Output for Causal Variables using the Kurtosis test for outliers.
Classical Causal Variable Routine Using Kurtosis
User Selected Options
Date/Time of Computation 1/16/2009 11:45:48 AM
From File D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision OFF
Number of Observations 75
Number of Columns 4
Critical Alpha 0.05
Full Cutoff 25.2002
Reg Cutoff 16.1643
New Data Matrix (Outlier Rows)
Obs No | y | x1 | x2 | x3
14 | 0.1 | 11 | 34 | 34
12 | -0.4 | 12 | 23 | 37
11 | -0.2 | 11 | 24 | 35
13 | 0.7 | 12 | 26 | 34
Obs No | Kurtosis | P Value
14 | 53.9678865248759 | N/A
12 | 38.2586440611149 | N/A
11 | 43.570927137092 | N/A
13 | 59.1622093139548 | N/A
Row 14
x2 is a Causal Variable
Obs No | Observed | Predicted
14 | 34 | 17.3214183128282
Row 12
y is a Causal Variable
Obs No | Observed | Predicted
12 | -0.4 | 11.9998995724309
Row 11
y is a Causal Variable
Obs No | Observed | Predicted
11 | -0.2 | 11.6785030135424
x3 is a Causal Variable
Obs No | Observed | Predicted
11 | 35 | 23.8400404984235
Row 13
y is a Causal Variable
Obs No | Observed | Predicted
13 | 0.7 | 12.0185193167481
x3 is a Causal Variable
Obs No | Observed | Predicted
13 | 34 | 26.7004459115258
239
-------
7.2.2 Robust Outlier Testing
Detecting multiple (one or more) anomalies in multivariate data sets is a complex problem, and considerable research has been performed on this topic. Many classical and robust outlier identification methods have been developed since the early 1970s. Mahalanobis distances (MDs) play the key role in identifying outlying observations. As is well known, multiple outliers tend to significantly influence the estimates (even robust estimates) of the means, variances, and covariances, and therefore the MDs. As a result, the MDs get distorted by the same observations that they are supposed to find. In an effort to identify the best possible method(s) for identifying outliers, the developers of Scout considered several classical as well as robust outlier identification and robust estimation methods.
Scientists working with multivariate data recognize that there is no substitute for graphical displays of their data. Therefore, the developers of Scout have put extra emphasis on formal graphical displays of multivariate data sets. The Scout outlier module offers the following methods to identify outliers in multivariate data sets:
o Classical (using the Max MD as proposed by Wilks in 1963)
o Sequential Classical
o Huber
o Extended Minimum Covariance Determinant (MCD)
o Proposed (PROP)
o Multivariate Trimming (MVT)
The critical values used for the Max MD statistics are obtained using the Bonferroni inequality and the scaled beta distribution of the MDs, $\frac{(n-1)^2}{n}\,\mathrm{Beta}\left(\frac{p}{2}, \frac{n-p-1}{2}\right)$. Because of its computational ease, an approximate chi-square distribution with p degrees of freedom is often used for the distribution of the MDs instead. The difference between the two options is significant, especially when the dimensionality, p, is large (p = 4 is already large enough for a sample of size n = 100). For details, refer to Singh (1993). For comparison's sake, both distributional options have been incorporated into Scout.
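The two distributional options can be compared numerically. The sketch below is a plain-Python illustration (no SciPy), not Scout's code, and all function names are hypothetical. It computes the scaled beta critical value, using the Bonferroni level α/n for the largest MD, and, for comparison, the chi-square critical value. The chi-square part uses a closed-form CDF that is valid only when the number of variables p is even.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the regularized incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def chi2_cdf_even_df(x, df):
    """Chi-square CDF in closed form; valid only for even degrees of freedom."""
    if df % 2:
        raise ValueError("this sketch only handles even df")
    half, term, total = x / 2.0, 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return 1.0 - math.exp(-half) * total


def _quantile(cdf, q, lo, hi, iters=200):
    """Invert a monotone CDF by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def md_critical_values(n, p, alpha, largest=True):
    """Return (scaled-beta, chi-square) critical values for squared MDs.

    largest=True applies the Bonferroni level alpha/n used for the Max (MD)."""
    level = 1.0 - (alpha / n if largest else alpha)
    a, b = p / 2.0, (n - p - 1) / 2.0
    beta_cv = (n - 1) ** 2 / n * _quantile(lambda t: beta_cdf(t, a, b), level, 0.0, 1.0)
    chi_cv = _quantile(lambda t: chi2_cdf_even_df(t, p), level, 0.0, 1000.0)
    return beta_cv, chi_cv


beta_cv, chi_cv = md_critical_values(75, 4, 0.05)
print(f"beta: {beta_cv:.2f}, chi-square: {chi_cv:.2f}")
```

For n = 75, p = 4, and α = 0.05, the beta-based value is about 17.43, which agrees with the "Max Squared MD (0.05)" reported in the Bradu outputs of this chapter, while the chi-square approximation gives about 19.4.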
All of the outlier methods listed above (except for the classical method) are iterative. The MCD method (Rousseeuw and Van Driessen, 1999) has been extended to accommodate other options. For example, instead of obtaining the MCD by minimizing the determinant of a covariance matrix based upon h = [(n + p + 1)/2] observations, one can choose other values for h. The objective is to find "real and only real outliers," not to identify outliers using the maximum breakdown point option. The multivariate trimming (MVT) method (Devlin et al., 1981) is also available in Scout to identify outliers.
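The subset size h and the resulting breakdown value can be computed directly. This small sketch uses hypothetical function names and assumes the usual finite-sample MCD breakdown value (n - h + 1)/n. For the Bradu data with 75% non-outliers, it is consistent with the 'h' Value of 56 and the Breakdown Value of 0.2667 shown in the MCD output later in this section.

```python
def mcd_h(n, p, fraction=None):
    """Size h of the MCD subset.

    fraction=None gives the default h = [(n + p + 1) / 2] (integer part);
    otherwise h = [fraction * n], e.g. fraction=0.75 for 75% non-outliers.
    """
    if fraction is None:
        return (n + p + 1) // 2
    return int(fraction * n)


def mcd_breakdown(n, h):
    """Finite-sample breakdown value (n - h + 1) / n of an h-subset MCD."""
    return (n - h + 1) / n


n, p = 75, 4
h_default = mcd_h(n, p)
h_user = mcd_h(n, p, 0.75)
print(h_default, h_user, round(mcd_breakdown(n, h_user), 4))  # 40 56 0.2667
```

The default h = 40 gives a breakdown value of 0.48, close to the maximum possible; raising h to 56 lowers the breakdown value but uses more of the data, in line with the aim of finding "real and only real outliers."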
The Huber and PROP methods represent M-estimation methods based upon the Huber (Huber, 1981) influence function and the PROP (Singh, 1993) influence function, respectively. The iterative process progressively reduces the influence of potential outliers; this is especially true for the PROP influence function. Convergence is normally achieved in fewer than 10 iterations. A default of 20 iterations is used in the Scout software. These iterative procedures require initial estimates of the population mean vector and the variance-covariance matrix, and several initial estimate methods are available in Scout. Specifically, classical maximum likelihood estimates (MLEs) and various robust estimates (e.g., median, MAD or IQR; median, OKG; median, KG; and MCD) are available as initial start estimates. The use of robust initial estimates is recommended to obtain improved and more resistant estimates of the population parameters at the final iteration. The use of the PROP influence function with a robust initial start (e.g., the OKG estimate of the covariance matrix) seems to be very effective in identifying multivariate outliers and in computing robust and resistant estimates of the mean vector and the covariance matrix.
1. The sequential classical method performs the classical procedure (with a classical initial start), iteratively removing outliers at each iteration. The MDs are compared to the Max (MD) critical value (e.g., for α = 0.1); that is, there is a hard rejection point (= Max (MD)) for outliers. This procedure suffers from masking effects when multiple outliers are present. The masking effect can be reduced if a robust initial start is chosen instead of the classical initial start, as illustrated in the graphical comparison section later in this chapter.
2. The Huber procedure uses the Huber influence function (Huber, 1981), which assigns unit weight to observations coming from the central part of the distribution and reduced weights to observations coming from the tails of the underlying distribution. However, outliers always retain some influence on the Huber estimates.
3. The MCD procedure uses the minimum covariance determinant (MCD) method of Rousseeuw and Leroy (1987). Its objective is to find the subset of observations in the data set whose covariance matrix has the lowest determinant. The fast algorithm for computing the minimum covariance determinant estimator has been incorporated into Scout.
4. The PROP procedure uses a smooth redescending influence function and cut-off points from the distribution of the MDs. Both options, the beta distribution and the chi-square distribution, are available in Scout; the beta distribution is recommended. With this influence function, extreme outliers coming from the tails of the contaminating distribution get almost negligible weights. This procedure provides an automatic way of dealing with the outlying observations present in a multivariate data set. A tuning constant, "c," can be used to make the influence function more resistant to outliers. In most applications, c = 1 works very well. However, when the proportion of discordant observations increases, smaller values of c (< 1.0) are recommended. This procedure also works very effectively in estimating the principal components (PCs), including the low-variance PCs. Probability plots of the PCs based upon this influence function are good enough to reveal all kinds of outliers, including those that might inflate variances and covariances inappropriately and those that might violate the correlation structure imposed by the bulk of the data.
5. Robust procedures based upon multivariate trimming (MVT) often work well in
estimating the robust PCs and revealing discordant observations. A fixed
proportion, p (0.05, 0.1, 0.2 etc.), of the observations with the largest values of
MDs is temporarily set aside and the estimates of the mean vector and variance
covariance matrix are recomputed based on the remaining n (1 - p) observations.
This iterative procedure, which requires computations of those MDs at each step,
stops as soon as the desired degree of accuracy has been achieved.
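The multivariate trimming loop in item 5 can be sketched in Python with NumPy. This is an illustration under stated assumptions, not Scout's implementation: the function names and the convergence tolerance are hypothetical, the trimming proportion is called `trim` to avoid a clash with the dimension p, and the classical estimates are used as the starting point.

```python
import numpy as np


def squared_mds(X, mean, cov):
    """Squared Mahalanobis distances of the rows of X."""
    diff = X - mean
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)


def mvt(X, trim=0.1, max_iter=50, tol=1e-8):
    """Multivariate trimming: re-estimate location and scatter after setting
    aside the fraction `trim` of observations with the largest squared MDs."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    keep = int(np.ceil(n * (1.0 - trim)))                  # observations retained
    mean, cov = X.mean(axis=0), np.cov(X, rowvar=False)    # classical start
    for _ in range(max_iter):
        retained = np.argsort(squared_mds(X, mean, cov))[:keep]
        new_mean = X[retained].mean(axis=0)
        new_cov = np.cov(X[retained], rowvar=False)
        done = (np.allclose(new_mean, mean, atol=tol)
                and np.allclose(new_cov, cov, atol=tol))
        mean, cov = new_mean, new_cov
        if done:
            break
    # Distances of all observations, including the trimmed ones
    return mean, cov, squared_mds(X, mean, cov)
```

Because the trimmed observations are only set aside temporarily, an observation excluded at one step can re-enter the retained set at the next step if its distance shrinks.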
As mentioned before, another important aspect of robust outlier detection is obtaining an initial estimate to start the iterative robust procedures. Scout provides several procedures to compute the initial estimates of the mean vector and the covariance matrix:
o Classical
o Sequential Classical
o Robust Median/MAD
o OKG (Orthogonalized Kettenring Gnanadesikan, Maronna-Zamar, 2002)
o KG (Kettenring Gnanadesikan, 1972)
o MCD
The classical method uses the mean vector and the variance-covariance matrix as the
initial estimate.
In the outlier module, the sequential classical method computes the mean vector and the covariance matrix iteratively by removing outliers (observations exceeding the Max (MD)) at each iteration.
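The sequential classical procedure can be sketched as follows. This is a hypothetical NumPy illustration, not Scout's code. The `cutoff` argument stands for the Max (MD) critical value (for example, 17.43 for the Bradu data), and the sketch assumes that every observation is re-tested at each iteration, so a previously removed observation can re-enter.

```python
import numpy as np


def sequential_classical(X, cutoff, max_iter=10):
    """Hard-rejection sequential classical estimates.

    At each iteration, the mean and covariance are computed from the
    observations with weight 1; observations whose squared MD exceeds
    `cutoff` get weight 0. The returned weights are the 0/1 decisions
    for the returned estimates."""
    X = np.asarray(X, dtype=float)
    weights = np.ones(X.shape[0])
    for _ in range(max_iter):
        kept = X[weights == 1]
        mean, cov = kept.mean(axis=0), np.cov(kept, rowvar=False)
        diff = X - mean
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        new_weights = (d2 <= cutoff).astype(float)
        if np.array_equal(new_weights, weights):
            break                      # no change: estimates have converged
        weights = new_weights
    return mean, cov, d2, weights
```

With a single distant outlier the first classical fit already flags it. With a tight cluster of several outliers, the first classical fit can leave all of them below the cutoff, so nothing is removed; this is the masking effect described for the sequential classical method.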
The Robust Median/MAD (Median Absolute Deviation) method uses the median vector as the initial estimate of the location vector. For the dispersion matrix, the classical variance-covariance matrix is used, but the diagonal elements are replaced by the simple robust estimate of the variance, given by $\left(\mathrm{MAD}/0.6745\right)^2$. If the MAD is equal (or approximately equal) to 0 (as in Fisher's Iris data set), then the interquartile range (IQR) is used in place of the MAD, separately for each variable. This is called the IQR fix in Scout.
The OKG (Orthogonalized Kettenring Gnanadesikan) method uses the median vector as the initial estimate of the location vector. The dispersion matrix is first estimated pairwise using the Kettenring Gnanadesikan (KG) approach. In practice, such a KG covariance matrix may not be positive definite (it can even yield negative eigenvalues). The KG dispersion matrix can be orthogonalized using the procedure described by Maronna and Zamar (2002) to obtain a positive definite dispersion matrix. This procedure is also available in Scout.
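Both robust starts described above can be sketched in Python with NumPy. This is an illustration, not Scout's code: the function names are hypothetical, the IQR fix is not implemented (a zero MAD would cause a division by zero here), and the OKG part follows the one-step orthogonalized Gnanadesikan-Kettenring construction of Maronna and Zamar (2002), with the median and MAD/0.6745 as the robust location and scale.

```python
import numpy as np


def mad_scale(v):
    """MAD/0.6745: a robust estimate of the standard deviation."""
    med = np.median(v)
    return np.median(np.abs(v - med)) / 0.6745


def median_mad_start(X):
    """Median vector, and the classical covariance matrix with its
    diagonal replaced by (MAD/0.6745)^2."""
    X = np.asarray(X, dtype=float)
    cov = np.cov(X, rowvar=False)
    np.fill_diagonal(cov, np.apply_along_axis(mad_scale, 0, X) ** 2)
    return np.median(X, axis=0), cov


def ogk(X):
    """One step of the orthogonalized Gnanadesikan-Kettenring estimator."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    D = np.apply_along_axis(mad_scale, 0, X)
    Y = X / D                                    # standardize each variable
    U = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            # Pairwise KG identity: cov(a, b) = (s(a + b)^2 - s(a - b)^2) / 4
            U[j, k] = U[k, j] = (mad_scale(Y[:, j] + Y[:, k]) ** 2
                                 - mad_scale(Y[:, j] - Y[:, k]) ** 2) / 4.0
    _, E = np.linalg.eigh(U)                     # U itself may not be positive definite
    Z = Y @ E                                    # project onto the eigenvectors
    gamma = np.apply_along_axis(mad_scale, 0, Z) ** 2
    nu = np.median(Z, axis=0)
    cov_y = E @ np.diag(gamma) @ E.T             # positive definite when all gamma > 0
    mu_y = E @ nu
    return D * mu_y, cov_y * np.outer(D, D)      # back to the original scale
```

The orthogonalization step is what guarantees positive definiteness: the final matrix is rebuilt from robust variances along orthogonal directions, so its eigenvalues are those variances (rescaled) rather than the possibly negative eigenvalues of the pairwise KG matrix.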
For the MCD method, the objective is to find a subset of some specified size, h
(n/2 Multivariate > Robust > Sequential Classical.
[Screen: Scout 4.0 main window (BRADU worksheet) with the Outliers/Estimates > Multivariate > Robust/Iterative menu open, listing Sequential Classical, Huber, Extended MCD, MVT, OKG Reweighted, PROP, and Method_Comparison.]
2. The "Select Variables" screen will appear (Section 3.4).
o Select the variables from the screen.
o If various groups are available and the analysis is to be done for those groups,
then group variables can be selected from the drop-down list by clicking on
the arrow below the "Group by Variable" button.
o Click the "Options" button for various options.
[Screen: Multivariate Robust Sequential Classical Options window with Correlation R Matrix (Do Not Display, Display), Intermediate Iterations (Do Not Display, Display Every 5th, Display Every 4th, Display Every 2nd, Display All), Select Number of Iterations [Max = 100], Cutoff for Outliers: Critical Alpha (0.05), and OK/Cancel buttons.]
o Specify whether the correlation matrix of the final robust estimate is to be displayed. The default option is "Do Not Display."
o Specify the number of iterations to be computed. The default is "10."
o Specify whether the intermediate iterations are to be displayed. The default is "Do Not Display."
o Click "OK" to continue or "Cancel" to cancel the options.
o Click the "Graphics" button for the graphics options and check the three
check boxes to get the following screen.
[Screen: Sequential Classical Graphics Options window with check boxes for Index MD Plot, Distance-Distance MD Plot, and QQ MD Plot; QQ Plot Options with MDs Distribution (Beta, Chi); plot titles (each "Sequential Classical"); Graphics Critical Alpha (0.05); and OK/Cancel buttons.]
o Specify the "Title for Index MD Plot." This is an index plot of the robust distances obtained using the sequential outlier estimates.
o Specify the "Title for Distance-Distance Plot." This is a plot of the classical Mahalanobis distances against the robust distances obtained using the sequential outlier estimates.
o Specify the "Title for Q-Q MD Plot." Select the distribution required for the "Q-Q Plot" and the "Graphics Critical Alpha" for identifying the outliers.
Note: The "Graphics Critical Alpha" should match the "Critical Alpha" from the Options Multivariate Robust Sequential Classical window to obtain the same outliers. The user should type suitable titles related to the data set.
o Click "OK" to continue or "Cancel" to cancel the options.
o Click "OK" to continue or "Cancel" to cancel the sequential classical
procedure.
Output example: The data set "BRADU.xls" was used for Sequential Classical. At the end of each iteration, the outliers are removed (down-weighted from 1 to 0) and the location and scale estimates are recalculated.
Output for Sequential Outliers method.
Data Set used: Bradu.

Multivariate Robust Sequential Classical Outlier Analysis
User Selected Options
Date/Time of Computation: 3/4/2008 9:04:35 AM
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision: OFF
Display Correlation R Matrix: Do Not Display Correlation R matrix
Number of Iterations: 10
Show Intermediate Results: Do Not Display Intermediate Results
Title for Index Plot: Sequential Classical
Title for Distance-Distance Plot: Sequential Classical
Title for QQ Plot: Sequential Classical
Graphics Critical Alpha: 0.05
MDs Distribution: Beta

Number of Observations: 75
Number of Selected Variables: 4
Max Squared MD (0.05): 17.43

Classical Mean Vector
y     | x1    | x2    | x3
1.279 | 3.207 | 5.597 | 7.231

Classical Covariance S Matrix
   | y     | x1    | x2    | x3
y  | 12.2  | 9.477 | 20.39 | 31.03
x1 | 9.477 | 13.34 | 28.47 | 41.24
x2 | 20.39 | 28.47 | 67.88 | 94.67
x3 | 31.03 | 41.24 | 94.67 | 137.8

Determinant: 1906
Log of Determinant: 7.553

Eigenvalues of Classical Covariance S Matrix
Eval 1 | Eval 2 | Eval 3 | Eval 4
0.914  | 1.688  | 5.538  | 223.1

3 Outliers were found using Classical Method
4 Outliers were found using Sequential Classical
Output for Sequential Outliers method (continued).
Squared MDs
Obs | Classical | Sequential | Weights
1   | 6.026 | 5.682 | 1
2   | 6.729 | 6.588 | 1
3   | 7.209 | 7.197 | 1
4   | 6.321 | 9.265 | 1
5   | 6.339 | 6.751 | 1
6   | 7.036 | 6.755 | 1
7   | 8.235 | 8.706 | 1
8   | 6.714 | 6.788 | 1
9   | 6.445 | 7.52  | 1
10  | 7.346 | 7.215 | 1
11  | 18.61 | 333.7 | 0
12  | 27.74 | 366.4 | 0
13  | 14.79 | 310.2 | 0
14  | 43.7  | 512.8 | 0
15  | 3.386 | 5.109 | 1
16  | 4.783 | 5.689 | 1
17  | 2.004 | 2.496 | 1
18  | 0.751 | 0.764 | 1
19  | 1.408 | 1.934 | 1
20  | 2.556 | 2.954 | 1
21  | 1.281 | 2.173 | 1
22  | 2.578 | 3.282 | 1
23  | 1.216 | 2.757 | 1
24  | 1.321 | 3.455 | 1
25  | 0.658 | 1.394 | 1
26  | 1.413 | 4.544 | 1
27  | 2.492 | 5.615 | 1
28  | 0.765 | 1.829 | 1
29  | 0.364 | 1.577 | 1
30  | 2.793 | 4.472 | 1
31  | 3.395 | 3.219 | 1
32  | 1.772 | 2.704 | 1
33  | 0.967 | 3.073 | 1
34  | 1.475 | 1.797 | 1
35  | 1.58  | 1.622 | 1
36  | 1.008 | 3.666 | 1
37  | 3.368 | 4.751 | 1
(The complete output table is not shown.)
Output for Sequential Outliers method (continued).
Sequential Classical Estimates

Sequential Classical Mean Vector
y     | x1    | x2    | x3
1.348 | 2.739 | 4.406 | 5.666

Sequential Classical Covariance S Matrix
   | y     | x1    | x2    | x3
y  | 12.8  | 10.63 | 23.09 | 34.89
x1 | 10.63 | 9.938 | 19.57 | 29.69
x2 | 23.09 | 19.57 | 43.69 | 64.82
x3 | 34.89 | 29.69 | 64.82 | 99.08

Determinant: 56.61
Log of Determinant: 4.036
Classical Kurtosis: 53.97
Sequential Kurtosis: 8064
Results: Four (4) observations (11, 12, 13, and 14), with squared distances greater than the Max (squared MD), were given zero (0) weights (hard rejection) and were considered to be outliers.
Graphical Output for Sequential Outliers method (continued).
[Figure: Sequential Classical index plot of the squared MDs against the observation index, with the 95% Maximum (Largest MD) Limit = 17.4345.]
[Figure: Sequential Classical distance-distance plot, with the classical squared MDs on the horizontal axis.]
[Figure: Sequential Classical Q-Q plot of the squared MDs against beta quantiles, with the 95% Maximum (Largest MD) Limit = 17.4345. Plot statistics: N = 75, p = 4, slope = 33.2001, intercept = -41.5886, correlation coefficient = 0.7184, critical correlation (0.05) = 0.9941, kurtosis = 8,063.6558, critical kurtosis (0.05) = 25.2002, skewness = 140.0560, critical skewness (0.05) = 2.3990.]
Graphical Interpretation: Observations between the "Warning (Individual MD) Limit" and "Maximum (Largest MD) Limit" lines represent borderline outliers and may require further investigation. The Warning Limit represents the critical value from the scaled beta distribution of the MDs at a specified level of significance (here, 0.05, that is, a 95% limit), and the Maximum (Largest MD) Limit represents the critical value of the Max (MD) obtained using the Bonferroni inequality (details in Singh, 1993).
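The three regions described above can be expressed as a simple rule. This is an illustrative sketch with a hypothetical function name. The limits are the 95% warning limit (9.1281) and maximum limit (17.4345) reported on the Bradu plots in this section, applied here to the classical squared MDs of observations 1, 13, and 14 from the Bradu table.

```python
def classify_squared_mds(d2_values, warning_limit, max_limit):
    """Label each squared MD as 'inlier', 'borderline', or 'outlier'."""
    labels = []
    for d2 in d2_values:
        if d2 > max_limit:
            labels.append("outlier")
        elif d2 > warning_limit:
            labels.append("borderline")   # may require further investigation
        else:
            labels.append("inlier")
    return labels


print(classify_squared_mds([6.026, 14.79, 43.7], 9.1281, 17.4345))
# ['inlier', 'borderline', 'outlier']
```

Observation 13 falls between the two limits, which is consistent with the classical method finding only three outliers in the Bradu output.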
7.2.2.2 Huber
1. Click Outlier/Estimates > Multivariate > Robust > Huber.
[Screen: Scout 4.0 main window (BRADU worksheet) with the Outliers/Estimates > Multivariate > Robust/Iterative menu open.]
2. The "Select Variables" screen will appear (Section 3.4).
o Select the variables from the screen.
o If various groups are available and an analysis is to be done for those groups,
then group variables can be selected from the drop-down list by clicking on
the arrow below the "Group by Variable" button.
o Click the "Options" button for various options.
[Screen: Multivariate Outlier Options window with Select Initial Estimates (Classical, Sequential Classical, Robust (Median, MAD), OKG (Maronna Zamar), KG (Not Orthogonalized), MCD), Cutoff for Outliers: Critical Alpha (0.05), Correlation R Matrix (Do Not Display, Display), MDs Distribution (Beta, Chi), Select Number of Iterations [Max = 50], Influence Function Alpha (0.05), Intermediate Iterations (Do Not Display, Display Every 5th, Display Every 4th, Display Every 2nd, Display All), and OK/Cancel buttons.]
o Specify the "Initial Estimates" to start the Huber iterative procedure.
The default is "OKG (Maronna Zamar)."
o Specify the distribution for the Mahalanobis distances in the "MDs
Distribution." The default is "Beta."
o Specify the "Critical Alpha," the cutoff for outliers. The default is
"0.05."
o Specify the "Number of Iterations." The default is "10."
o Specify the "Influence Function Alpha" for the Huber weighting process. The default is "0.05."
o Specify "Correlation R Matrix." The default is "Do Not Display."
o Specify "Intermediate Iterations." The default is "Do Not Display."
o Click "OK" to continue or "Cancel" to cancel the options.
o Click the "Graphs" button for various options.
[Screen: Graphics Options window with check boxes for Index MD Plot, Distance-Distance MD Plot, and QQ MD Plot; QQ Plot Options with MDs Distribution (Beta, Chi); plot titles (each "Huber Estimate"); Graphics Critical Alpha (0.05); and OK/Cancel buttons.]
o Specify the "Title for Index MD Plot." This is an index plot of the robust distances obtained using the Huber estimates.
o Specify the "Title for Distance-Distance Plot." This is a plot of the classical Mahalanobis distances against the robust distances obtained using the Huber estimates.
o Specify the "Title for Q-Q MD Plot." Select the distribution required for the "Q-Q Plot" and the "Graphics Critical Alpha" for identifying the outliers.
Note: The "Graphics Critical Alpha" should match the "Critical Alpha" from the Multivariate Outlier Options window to obtain the same outliers.
o Click "OK" to continue or "Cancel" to cancel the options.
o Click "OK" to continue or "Cancel" to cancel the Huber procedure.
Output example: The data set "BRADU.xls" was used for the Huber method. It has 75
observations and four variables. The initial estimates of location and scale for each group
were the median vector and the scale matrix obtained from the OKG method. The
outliers were found using the Huber influence function and the observations were given
weights accordingly. The weighted mean vector and the weighted covariance matrix
were then calculated.
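The soft rejection performed by the Huber method can be illustrated with a common form of Huber weights on Mahalanobis distances. This form is an assumption for illustration; the guide does not give Scout's exact weighting formula, and the function name is hypothetical. Here, `cutoff` plays the role of the "Influence Fn. Squared MD (0.05)" value (9.128 for the Bradu data), and the inputs are the classical squared MDs of observations 1, 12, and 14.

```python
import math


def huber_weights(squared_mds, cutoff):
    """Huber-type weights: 1 inside the cutoff, sqrt(cutoff)/MD beyond it.

    Weights decrease toward 0 for distant observations but never reach 0,
    so outliers keep some influence (soft rejection)."""
    c = math.sqrt(cutoff)
    return [1.0 if d2 <= cutoff else c / math.sqrt(d2) for d2 in squared_mds]


print([round(w, 3) for w in huber_weights([6.026, 27.74, 43.7], 9.128)])
# [1.0, 0.574, 0.457]
```

In an actual Huber iteration, the weights would be recomputed from the weighted mean vector and covariance matrix at each step, so the final weights differ from these one-step values.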
Output for Huber outliers method.
Data Set used: Bradu.

Huber Multivariate Outlier Analysis
User Selected Options
Date/Time of Computation: 1/8/2008 1:21:20 PM
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision: OFF
Critical Alpha: 0.05
Influence Function Alpha: 0.05
Initial Estimates: Robust Median Vector and OKG (Maronna-Zamar) Matrix
Display Correlation R Matrix: Do Not Display Correlation R matrix
Distribution of Squared MDs: Beta Distribution
Number of Iterations: 10
Show Intermediate Results: Do Not Display Intermediate Results
Title for Index Plot: Huber Estimate
Title for Distance-Distance Plot: Huber Estimate
Title for QQ Plot: Huber Estimate
Graphics Critical Alpha: 0.05
MDs Distribution: Beta

Number of Observations: 75
Number of Selected Variables: 4

Critical Values
Max Squared MD (0.05): 17.43
Individual Squared MD (0.05): 9.128
Multivariate Kurtosis (0.05): 25.2
Influence Fn. Squared MD (0.05): 9.128

Classical Mean Vector
y     | x1    | x2    | x3
1.279 | 3.207 | 5.597 | 7.231

Classical Covariance S Matrix
   | y     | x1    | x2    | x3
y  | 12.2  | 9.477 | 20.39 | 31.03
x1 | 9.477 | 13.34 | 28.47 | 41.24
x2 | 20.39 | 28.47 | 67.88 | 94.67
x3 | 31.03 | 41.24 | 94.67 | 137.8

Determinant: 1906.26080388263
Output for Huber outliers method (continued).
Eigenvalues for Classical Covariance S Matrix
Eval 1 | Eval 2 | Eval 3 | Eval 4
0.914  | 1.688  | 5.538  | 223.1

Median Vector
y   | x1  | x2  | x3
0.1 | 1.8 | 2.2 | 2.1

MAD/0.6745 Vector Representing Standard Deviation
y    | x1    | x2    | x3
0.89 | 1.927 | 1.631 | 1.779

OKG Mean Vector
y     | x1     | x2     | x3
1.258 | -7.552 | -6.052 | 2.521

Robust OKG (Maronna Zamar) Covariance S Matrix
   | y      | x1    | x2      | x3
y  | 0.598  | 0.243 | 0.115   | -0.231
x1 | 0.243  | 2.856 | 0.108   | 0.241
x2 | 0.115  | 0.108 | 2.175   | 0.00402
x3 | -0.231 | 0.241 | 0.00402 | 2.34

Determinant: 7.84553740487618

Robust OKG Eigenvalues
Eval 1 | Eval 2 | Eval 3 | Eval 4
0.53   | 2.149  | 2.316  | 2.975

Final Weighted Mean Vector
y     | x1    | x2   | x3
1.332 | 2.849 | 4.68 | 6.033

Final Covariance S Matrix
   | y     | x1    | x2    | x3
y  | 12.76 | 10.57 | 22.94 | 34.67
x1 | 10.57 | 10.14 | 20.08 | 30.37
x2 | 22.94 | 20.08 | 45    | 66.52
x3 | 34.67 | 30.37 | 66.52 | 101.4

Determinant: 119.506396724439
Output for Huber outliers method (continued).
Observations with Squared MDs greater than 17.43 may be considered as potential outliers!

Obs | Classical Squared MD | Initial Squared MD | Final Squared MD | Final Weights
74  | 2.742 | 2.635 | 3.069 | 1
75  | 3.632 | 2.62  | 5.12  | 1
(The complete output table is not shown.)

Multivariate Kurtosis
Classical | Initial | Final
53.97     | 101431  | 2076
Results: Four (4) observations (11, 12, 13, and 14), with squared distances greater than the critical squared MD of 17.43, were given weights between 0 and 1 (soft rejection) and were considered to be outliers. Note that, due to masking effects, the Huber method did not identify the remaining 10 outliers present in this data set.
[Figure: Huber Estimate index plot of the squared MDs against the observation index, with the 95% Warning (Individual MD) Limit = 9.1281 and the 95% Maximum (Largest MD) Limit = 17.4345.]
[Figure: Huber Estimate Q-Q plot of the squared MDs against beta quantiles, with the 95% Warning (Individual MD) Limit = 9.1281 and the 95% Maximum (Largest MD) Limit = 17.4345. Plot statistics: N = 75, p = 4, slope = 17.0079, intercept = -20.2021, correlation coefficient = 0.7323, critical correlation (0.05) = 0.9941, kurtosis = 2,075.7165, critical kurtosis (0.05) = 25.2002, skewness = 15,680.5992.]
Graphical Interpretation: Observations (if any) between the "Warning (Individual MD) Limit" and "Maximum (Largest MD) Limit" lines may require further investigation. Those observations have reduced weights between 0 and 1.
7.2.2.3 Extended MCD
1. Click Outlier/Estimates > Multivariate > Robust > Extended MCD.
[Screen: Scout 2008 main window (BRADU worksheet) with the Outliers/Estimates > Multivariate > Robust/Iterative menu open, listing Sequential Classical, Huber, Extended MCD, MVT, PROP, and Method Comparison.]
2. The "Select Variables" screen will appear (Section 3.4).
o Select the variables from the screen.
o If various groups are available and an analysis is to be done for those groups, then group variables can be selected from the drop-down list by clicking on the arrow below the "Group by Variable" button.
o Click the "Options" button for various options.
[Screen: Options Multivariate Outlier MCD window. MCD Options: Initial Subset Strategy, Adjust 'h' Value, Adjust Initial/Final Subsets, and Display Minimum Determinant Data [h,p] (all checked). Initial Subset Strategy: Size = p+1 (Default), Size = 'h', or All Subsets of Size = p+1 (p < 10). Adjust 'h' Value: (n + p + 1) / 2 (Default) or User Specified % Non-Outliers (0.75). Initial/Final Subsets: Number of Best Retained (10) and Number of Elemental Sets (500). OK and Cancel buttons.]
o When all of the checkboxes are checked in the "MCD Options," the
options window looks like the one above.
o "Initial Subset Strategy": this is used to specify the size of the initial subsets. It can be equal to the number of variables plus 1 (p + 1) or to h, the number of non-outliers.
o Specify the "Initial Subset Strategy." The default is "Size = p+1."
o "Adjust "h" Value": this is used to specify the number of non-outliers. It can be equal to (n + p + 1)/2 or to a user-specified percentage of non-outliers.
o Specify the Adjust "h" Value. The default is (n + p + 1)/2.
o "Adjust Initial/Final Subsets": this is used to specify the number of elemental subsets of size "p+1" or "h" used to start the C-step operations of the MCD algorithm, and the number of subsets with the lowest determinant of the scatter matrix to be retained to continue the C-steps until convergence.
o Specify the "Adjust Initial/Final Subsets." The defaults are 10 for "Number of Best Retained" and 500 for "Number of Elemental Sets."
o Click "OK" to continue or "Cancel" to cancel the options.
o Click the "Graphs" button for various options.
[Screen: MCD Graphics Options window with check boxes for Index MD Plot, Distance-Distance MD Plot, and QQ MD Plot; plot titles (each "MCD Estimate"); and OK/Cancel buttons.]
o Specify the "Title for Index MD Plot." This is an index plot of the robust distances obtained using the MCD estimates.
o Specify the "Title for Distance-Distance Plot." This is a plot of the classical Mahalanobis distances against the robust distances obtained using the MCD estimates.
o Specify the "Title for Q-Q MD Plot."
o Click "OK" to continue or "Cancel" to cancel the options.
o Click "OK" to continue or "Cancel" to cancel the MCD procedure.
Output example: The data set "BRADU.xls" was used for the MCD method. It has 75 observations and four variables. The MCD estimates of location and scale were obtained using the best "h" subset. This scale estimate was then adjusted for multivariate normality. Using the adjusted estimate, outliers were identified and weighted accordingly (hard weighting). The weighted mean vector and the weighted covariance matrix were then calculated.
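The core of the MCD search can be sketched with C-steps (concentration steps) in the spirit of the fast algorithm of Rousseeuw and Van Driessen (1999). This is a simplified, hypothetical NumPy sketch, not Scout's implementation: it draws random elemental subsets of size p + 1, applies two C-steps to each, keeps the `n_keep` subsets with the lowest determinants, and iterates those to convergence. The adjustment for multivariate normality and the final hard-weighting step are not included.

```python
import numpy as np


def _squared_mds(X, mean, cov):
    diff = X - mean
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)


def _c_steps(X, subset, h, steps):
    """Repeat C-steps: refit on the subset, then keep the h smallest squared MDs."""
    for _ in range(steps):
        mean = X[subset].mean(axis=0)
        cov = np.cov(X[subset], rowvar=False)
        new = np.sort(np.argsort(_squared_mds(X, mean, cov))[:h])
        if np.array_equal(new, subset):
            break
        subset = new
    return subset, np.linalg.det(np.cov(X[subset], rowvar=False))


def mcd_search(X, h, n_starts=500, n_keep=10, seed=0):
    """Return the best h-subset found, with its mean vector and covariance."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    rng = np.random.default_rng(seed)
    candidates = []
    for _ in range(n_starts):
        start = rng.choice(n, size=p + 1, replace=False)   # elemental subset
        cov = np.cov(X[start], rowvar=False)
        if np.linalg.matrix_rank(cov) < p:
            continue                                        # skip singular starts
        d2 = _squared_mds(X, X[start].mean(axis=0), cov)
        subset = np.sort(np.argsort(d2)[:h])
        candidates.append(_c_steps(X, subset, h, steps=2))
    candidates.sort(key=lambda pair: pair[1])               # lowest determinant first
    finals = [_c_steps(X, s, h, steps=100) for s, _ in candidates[:n_keep]]
    best, _ = min(finals, key=lambda pair: pair[1])
    return best, X[best].mean(axis=0), np.cov(X[best], rowvar=False)
```

In this sketch, `n_starts` and `n_keep` play roughly the roles of Scout's "Number of Elemental Sets" (default 500) and "Number of Best Retained" (default 10) options.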
Output for MCD outliers method.
Data Set used: Bradu.

Multivariate Robust MCD Outlier Analysis
User Selected Options
Date/Time of Computation: 11/8/2008 4:52:34 PM
From File: D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision: OFF
Number of Intermediate Subsets: 500
Number of Best Retained Subsets: 10
Initial Elemental Subset Size: Random Subsets of Initial Size of 'p + 1' will be used
User Selected 'h' Value: 'h' Value of '(n)*0.75' will be used
Selected Number of Initial Subsets: 500
Minimum Determinant Data [h,p]: Will Not be Displayed in Output
Title for Index Plot: MCD Estimate
Title for Distance-Distance Plot: MCD Estimate
Title for QQ Plot: MCD Estimate
Graphics Critical Alpha: 0.05

Number of Observations: 75
Number of Variables: 4
Size of Elemental Subsets: 5
Number of Initial Subsets: (illegible)
Number of Intermediate Subsets: 500
'h' Value: 56
Breakdown Value: 0.2667
Chisquare with 4 DoF (0.975): 11.1510
Unsquared Chisquare with 4 DoF (0.975): 3.3393
Chisquare with 4 DoF (0.50): 3.3531

Classical Mean Vector
y     | x1    | x2    | x3
1.279 | 3.207 | 5.597 | 7.231

Classical Covariance S Matrix
   | y     | x1    | x2    | x3
y  | 12.2  | 9.477 | 20.39 | 31.03
x1 | 9.477 | 13.34 | 28.47 | 41.24
x2 | 20.39 | 28.47 | 67.88 | 94.67
x3 | 31.03 | 41.24 | 94.67 | 137.8

Determinant: 1906.26080388263
Output for MCD outliers method (continued).
Best 'h' Subset of 56 Observations
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 44, 45, 46, 48, 49, 50, 51, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 69, 70, 71, 72, 73, 74, 75

MCD Mean Vector
y       | x1    | x2    | x3
-0.0893 | 1.554 | 1.861 | 1.707

MCD Covariance S Matrix
   | y        | x1     | x2       | x3
y  | 0.303    | 0.153  | -0.00848 | -0.138
x1 | 0.153    | 1.036  | 0.0987   | 0.167
x2 | -0.00848 | 0.0987 | 1.04     | 0.158
x3 | -0.138   | 0.167  | 0.158    | 1.084

MCD Determinant: 0.282788956390238
Adjustment Factor for Multinormality: 1.41188216564302

Adjusted MCD Covariance S Matrix
   | y      | x1    | x2     | x3
y  | 0.428  | 0.215 | -0.012 | -0.194
x1 | 0.215  | 1.463 | 0.139  | 0.236
x2 | -0.012 | 0.139 | 1.468  | 0.222
x3 | -0.194 | 0.236 | 0.222  | 1.53

Adjusted MCD Determinant: 1.12371519856147

Obs | Unsquared MDs | Weights
1   | 32.13 | 0
2   | 33.3  | 0
3   | 34.66 | 0
4   | 34.76 | 0
5   | 34.71 | 0
6   | 33.2  | 0
7   | 34.19 | 0
8   | 33.13 | 0
9   | 34.18 | 0
10  | 33.79 | 0
11  | 31.17 | 0
Output for MCD outliers method (continued).
12
13
32.15 |
31.63
!
°i °
i
i
i
—__—
14
35 3
0
_
15
W
1 932
1.927
1
T
17
TfT
1.651
0.699
1
__
19
1.227
20
1.794
1
21
1.656
1854
1
22
1
__ .
23
1 831
24
1.546
1
_
25
1.838
26
1 672
Z048~
1
...
27
23
29
1.176
fl'4
1
__
30
1.863
1
.
I
31
1.623
32
1.603
]m~
1
33
"34"
35"
I
2.036
1.642~~
1
"1
i
3G
37
1.491
l"796"_
1
.
38
1 811
1
39
2.122
1
i
___
"""1
..
.
_
i
_
_
40
1.074
T773~
41
42
43
1.663
2.331
44
45
46
1.925
2.003""
1.688
2.904"
47
48
1 617
184
49
1
50
1 241
1
263
-------
Output for MCD outliers method (continued).
52
2.336
1
1
53
3.05
54
1.953
1
55
1.208
1
5G
1 398
f~584~
1
57
1
5B
1.5
1
59
1.055
1
60
1.934
1
1
61
1.987
62
2.
1
63
1.641
1
64
• 1.677
1
65
1.885 | 1
66
1.736
1
67
1.246
1
68
2.339
1
69
1.568
1
70
2.123
1
71
0 912
1
72
0.815
1
73
1 761
1
74
1.639
2.VSf~
1
75
1
Obseivations with Unsquared MDs greater than 11.15
may be considered as potential outliers!
Weighted Mean Vector
y         x1        x2        x3
-0.0738   1.538     1.78      1.687

Weighted Covariance S Matrix
      y         x1        x2        x3
y     0.317     0.0587    0.00186   -0.105
x1    0.0587    1.132     0.0508    0.117
x2    0.00186   0.0508    1.152     0.141
x3    -0.105    0.117     0.141     1.07
Weighted S Determinant   0.410144007183786
Squared MDs
Obs   MCD MDs   Weights   Classical MDs
1     36.51     0         6.026
2     37.66     0         6.729
3     39.4      0         7.209
4     39.48     0         6.321
5     39.42     0         6.339
6     37.81     0         7.036
7     38.75     0         8.235
8     37.53     0         6.714
9     38.77     0         6.445
10    38.22     0         7.346
11    36.94     0         18.61
12    38.24     0         27.74
13    37.4      0         14.79
14    41.37     0         43.7
15    2.137     1         3.386
16    2.298     1         4.783
17    1.968     1         2.004
18    0.794     1         0.751
19    1.336     1         1.408
20    2.198     1         2.556
21    1.988     1         1.281
22    1.929     1         2.578
23    1.943     1         1.216
24    1.776     1         1.321
25    2.064     1         0.658
26    2.042     1         1.413
27    2.286     1         2.492
28    1.268     1         0.765
29    1.269     1         0.364
30    2.111     1         2.793
31    1.723     1         3.395
32    1.919     1         1.772
33    1.633     1         0.967
34    2.35      1         1.475
35    2.026     1         1.58
36    1.845     1         1.008
37    2.114     1         3.368
40    1.322     1         1.237
41    2.046     1         3.139
42    1.967     1         3.353
43    2.471     1         4.108
44    2.192     1         2.686
45              1         1.278
46              1         2.225
47    2.269     1         4.332
48              1         2.184
49              1         2.552
50              1         0.278
51              1         1.858
52    2.34      1         4.587
53              1         4.889
54              1         2.604
55    1.377     1         1.622
56              1         1.898
57              1         0.773
58    1.756     1         1.988
59    1.291     1         0.463
60              1         3.945
61    2.244     1         2.806
62    2.254     1         0.675
63    1.923     1         1.757
64    1.986     1         1.112
65    2.082     1         1.336
66    2.112     1         1.696
67    1.362     1         0.445
68    2.483     1         2.462
69    1.758     1         1.154
70    2.173     1         1.042
71    1.133     1         0.413
72    0.937     1         1.114
73    1.723     1         2.207
74    2.032     1         2.742
75    2.23      1         3.632

Classical Kurtosis   53.97
MCD Kurtosis         9.072E+11
Results: Fourteen (14) observations (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, and 14), with squared distances
greater than the Max squared MCD MDs, were given a weight of zero (0) (hard rejection) and were
considered to be outliers. Also, note the large value of the MCD kurtosis.
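The kurtosis values above are reported without a formula in this section. Assuming they are Mardia's multivariate kurtosis, b2,p = (1/n) Σ (d_i²)², computed from the squared MDs, the Python sketch below shows the calculation together with its large-sample reference value p(p + 2) = 24 for p = 4 multivariate normal variables. Values such as 53.97 (classical) and 9.072E+11 (MCD), far above the critical value of 25.2, point to heavy tails or outliers. The function names are illustrative.

```python
def mardia_kurtosis(squared_mds):
    """Mardia's multivariate kurtosis: the mean of the squared MDs, each squared."""
    return sum(d2 * d2 for d2 in squared_mds) / len(squared_mds)


def normal_reference_kurtosis(p):
    """Large-sample value of Mardia's kurtosis for p multivariate normal variables."""
    return p * (p + 2)


print(normal_reference_kurtosis(4))      # 24
print(mardia_kurtosis([2.0, 4.0, 6.0]))  # (4 + 16 + 36) / 3, about 18.67
```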
[Figures: plots titled "MCD Estimate", including a Q-Q plot of the MDs against chi-square quantiles, each marking the Chisquare (0.975) = 11.15 cutoff line. Q-Q plot statistics: N = 75, P = 4, Slope = 248.3939, Intercept = -216.3694.]
Graphical Interpretation: The observations above the "Chisquare (0.975)" line may be considered to be
potential outliers. Those observations have weights of zero (0). To be consistent with the literature,
chi-square quantiles (and not beta quantiles) have been used on the graphs generated for the MCD method.
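For p = 4 variables, the chi-square distribution with 4 degrees of freedom has the closed-form CDF F(x) = 1 - exp(-x/2)(1 + x/2), so its 0.975 quantile can be found by bisection. The Python sketch below is a check on the plotted cutoff under that assumption, not Scout's code. It gives about 11.14, close to the 11.15 line shown on the graphs. The closed form holds only for 4 degrees of freedom, and the function names are illustrative.

```python
import math


def chi2_cdf_4df(x):
    """CDF of the chi-square distribution with 4 degrees of freedom."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-x / 2.0) * (1.0 + x / 2.0)


def chi2_quantile_4df(prob, lo=0.0, hi=100.0, iters=200):
    """Invert the CDF by bisection on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_4df(mid) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(chi2_quantile_4df(0.975), 3))  # 11.143
```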
7.2.2.4 MVT
1. Click Outliers/Estimates > Multivariate > Robust > MVT.
2. The "Select Variables" screen will appear (Section 3.4).
o Select the variables from the screen.
o If various groups are available and an analysis is to be done for those groups,
then group variables can be selected from the drop-down list by clicking on
the arrow below the "Group by Variable" button.
o Click the "Options" button for various options.
[MVT Multivariate Outlier Options window: Select Initial Estimates (Classical, Sequential Classical, Robust (Median, MAD), OKG (Maronna Zamar), KG (Not Orthogonalized), MCD); Cutoff for Outliers (Critical Alpha, 0.05); Correlation R Matrix (Do Not Display, Display); Select Number of Iterations (10, Max = 50); MVT Trimming Percentage (Trim Percentage, 0.1); Intermediate Iterations (Do Not Display, Display Every 5th, Display Every 4th, Display Every 2nd, Display All); OK and Cancel buttons.]
o Specify the "Select Initial Estimates" to start the MVT iterative
procedure. The default is "OKG (Maronna Zamar)."
o Specify the distribution for Mahalanobis distances in the "MDs
Distribution." The default is "Beta."
o Specify the "Critical Alpha," the cutoff for outliers. The default is
"0.05."
o Specify the "Select Number of Iterations." The default is "10."
o Specify the "MVT Trimming Percentage" for the trimming process.
The default is "0.1."
o Specify whether or not to display the "Correlation R Matrix." The
default is "Do Not Display."
o Specify the "Intermediate Iterations." The default is "Do Not
Display."
o Click "OK" to continue or "Cancel" to cancel the options.
o Click the "Graphs" button for various options.
[MVT Graphics Options window: Index MD Plot, Distance-Distance MD Plot, and QQ MD Plot check boxes; QQ Plot Options (MDs Distribution: Beta or Chi); Title for Index MD Plot, Title for Distance-Distance Plot, and Title for QQ MD Plot (each set to "MVT Estimate"); Graphics Critical Alpha (0.05); OK and Cancel buttons.]
o Specify the "Title for Index MD Plot." This is an index plot of the
robust distances obtained using the MVT estimates.
o Specify the "Title for Distance-Distance Plot." This is a plot of the
classical Mahalanobis distances against the robust distances obtained
using the MVT estimates.
o Specify the "Title for Q-Q MD Plot."
o Select the distribution required for the "Q-Q Plot" and the "Graphics
Critical Alpha" for identifying the outliers.
Note: The "Graphics Critical Alpha" should match the "Critical Alpha" from the
Multivariate Outlier Options window to obtain the same outliers.
o Click "OK" to continue or "Cancel" to cancel the options.
o Click "OK" to continue or "Cancel" to cancel the MVT procedure.
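The Critical Alpha chosen above determines the squared-MD cutoffs listed under "Critical Values" in the output. The Python sketch below assumes the scaled beta distribution for squared MDs, d² ~ ((n - 1)²/n) Beta(p/2, (n - p - 1)/2). It uses α for the individual (warning) cutoff and a Bonferroni-type α/n for the maximum cutoff. These formula choices and the function names are assumptions, and the incomplete beta function is evaluated with the standard continued-fraction method.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def beta_ppf(prob, a, b, iters=200):
    """Beta quantile by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def squared_md_cutoffs(n, p, alpha):
    """Assumed individual (alpha) and maximum (alpha / n) squared-MD cutoffs."""
    scale = (n - 1) ** 2 / n
    a, b = p / 2.0, (n - p - 1) / 2.0
    individual = scale * beta_ppf(1.0 - alpha, a, b)
    maximum = scale * beta_ppf(1.0 - alpha / n, a, b)
    return individual, maximum


individual, maximum = squared_md_cutoffs(n=75, p=4, alpha=0.05)
print(round(individual, 3), round(maximum, 3))
```

With n = 75, p = 4, and α = 0.05, the sketch gives values within about 0.01 of the 9.128 and 17.43 reported in the output. The remaining differences may come from the numerical routines used.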
Output example: The data set "BRADU.xls" was used for the MVT method. It has 75
observations and four variables. The initial estimates of location and scale for each group
were the median vector and the scale matrix obtained from the OKG method. The
outliers were found by multivariate trimming: the observations with the largest MDs, up to
the selected trimming percentage, were given zero weights, and the remaining observations
were given unit weights. The weighted mean vector and the weighted covariance matrix
were then calculated.
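Based on the output example above and the results that follow (7 = 10% of 75 observations trimmed), one trimming step can be sketched in Python as below. The sketch assumes that the number of trimmed observations is the trimming percentage times n, rounded down, and that ties are broken by observation order. Scout's exact rounding and tie-breaking are not documented here, and the function name is illustrative.

```python
import math


def mvt_trim_weights(squared_mds, trim=0.1):
    """Zero weight for the floor(trim * n) observations with the largest squared MDs."""
    n = len(squared_mds)
    k = math.floor(trim * n)
    # Indices sorted from largest to smallest squared MD (stable for ties).
    order = sorted(range(n), key=lambda i: squared_mds[i], reverse=True)
    trimmed = set(order[:k])
    return [0.0 if i in trimmed else 1.0 for i in range(n)]


weights = mvt_trim_weights([float(i) for i in range(75)], trim=0.1)
print(weights.count(0.0))  # 7 observations trimmed out of 75
```

In a full MVT iteration, the weighted mean vector and covariance matrix would be recomputed from the untrimmed observations, new MDs computed, and the step repeated up to the selected number of iterations.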
Output for MVT outliers method.
Data Set used: Bradu.
MVT Multivariate Outlier Analysis
User Selected Options
Date/Time of Computation   1/8/2008 6:25:49 PM
From File   D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision   OFF
Critical Alpha   0.05
Trimming Percentage   0.1
Initial Estimates   Robust Median Vector and OKG (Maronna-Zamar) Matrix
Display Correlation R Matrix   Do Not Display Correlation R matrix
Number of Iterations   10
Show Intermediate Results   Do Not Display Intermediate Results
Title for Index Plot   MVT Estimate
Title for Distance-Distance Plot   MVT Estimate
Title for QQ Plot   MVT Estimate
Graphics Critical Alpha   0.05
MDs Distribution   Beta
Number of Observations   75
Number of Selected Variables   4

Critical Values
Max Squared MD (0.05)   17.43
Individual Squared MD (0.05)   9.128
Multivariate Kurtosis (0.05)   25.2

Classical Mean Vector
y         x1        x2        x3
1.279     3.207     5.597     7.231

Classical Covariance S Matrix
      y         x1        x2        x3
y     12.2      9.477     20.39     31.03
x1    9.477     13.34     28.47     41.24
x2    20.39     28.47     67.88     94.67
x3    31.03     41.24     94.67     137.8
Determinant   1906.26080388262
Eigenvalues for Classical Covariance S Matrix
Eval 1    Eval 2    Eval 3    Eval 4
0.914     1.688     5.538     223.1

Median Vector
y         x1        x2        x3
0.1       1.8       2.2       2.1

MAD/0.6745 Vector Representing Standard Deviation
y         x1        x2        x3
0.89      1.927     1.631     1.779

OKG Mean Vector
y         x1        x2        x3
1.258     -7.552    -6.052    2.521

Robust OKG (Maronna-Zamar) Covariance S Matrix
      y         x1        x2        x3
y     0.598     0.243     0.115     -0.231
x1    0.243     2.856     0.108     0.241
x2    0.115     0.108     2.175     0.00402
x3    -0.231    0.241     0.00402   2.34
Determinant   7.84553740487618

Robust OKG Eigenvalues
Eval 1    Eval 2    Eval 3    Eval 4
0.53      2.149     2.316     2.975

Final Weighted Mean Vector
y         x1        x2        x3
0.968     2.418     3.672     4.566

Final Covariance S Matrix
      y         x1        x2        x3
y     9.88      8.16      17.43     26.43
x1    8.16      7.891     14.78     22.53
x2    17.43     14.78     32.71     48.33
x3    26.43     22.53     48.33     74.4
Determinant   43.7076080629344
Observations with Squared MDs greater than 17.43
may be considered as potential outliers!

Observation   Classical    Initial      Final        Final
Number        Squared MD   Squared MD   Squared MD   Weights
1             6.026        654.4                     1
2             6.729        699.8        9.167        1
3             7.209        761.9        9.825        1
4             6.321        769.1                     0
5             6.339        765.6        9.918        1
6             7.036        701.8        9.236        1
7             8.235        740.2        11.12        0
8             6.714        692.4        8.997        1
9             6.445        740.8        10.86        0
10            7.346        719.1        9.925        1
11            18.61        691.2                     0
12            27.74        732.6        387.7        0
13            14.79        712.8        325.5        0
14            43.7         907          527.9        0
15            3.386        1.944        4.865        1
16            4.783        2.228        5.779        1
17            2.004        2.761                     1
18            0.751        0.303        0.692        1
19            1.408        0.675        1.821        1
20            2.556        1.234        2.838        1
21            1.281        1.165                     1
22            2.578        1.322        3.12         1
23            1.216        2.84                      1
24            1.321        1.542        3.501        1
25            0.658        3.846                     1
26            1.413        2.229        4.579        1
27            2.492        2.387        5.45         1
28            0.765        1.029        1.737        1
29            0.364        1.612        1.561        1
30            2.793        1.945        4.399        1
31            3.395        1.721        3.116        1
32            1.772        2.478        2.552        1
33            0.967        1.674        2.962        1
34            1.475        4.461        1.718        1
41            3.139        1.386        3.732        1
42            3.353        2.147        4.224        1
43            4.108        2.526        6.397        1
44            2.686        2.505        5.298        1
45            1.278        3.85         1.564        1
46            2.225        1.865        3.955        1
47            4.332        5.896        5.06         1
48            2.184        1.054        2.574        1
49            2.552        1.422        3.11         1
50            0.278        0.856        1.747        1
51            1.858        1.936        3.817        1
52            4.587        3.562        4.813        1
53            4.889        4.616        8.281        1
54            2.604        2.131        5.445        1
55            1.622        1.159        2.062        1
56            1.898        1.137        3.033        1
57            0.773        2.385        3.679        1
58            1.988        1.208        3.251        1
59            0.463        0.523        1.106        1
60            3.945        2.628        5.978        1
61            2.806        1.957        3.985        1
62            0.675        3.98         4.866        1
63            1.757        2.344        2.945        1
64            1.112        1.602        3.478        1
65            1.336        1.909        3.353        1
66            1.696        3.273        2.272        1
67            0.445        1.752        1.194        1
68            2.462        4.222        6.71         1
69            1.154        2.636        2.238        1
70            1.042        1.609        1.476        1
71            0.413        0.114        0.331        1
72            1.114        0.732        1.016        1
73            2.207        1.311        2.601        1
74            2.742        2.635        3.807        1
75            3.632        2.62         5.231        1

                        Classical   Initial   Final
Multivariate Kurtosis   53.97       101431    8820
Results: Seven (7) (= 10% of 75) observations (4, 7, 9, 11, 12, 13, and 14), with squared distances greater
than the Max (squared MD), were each given a zero (0) weight (hard rejection) and were considered to be
outliers. To identify all 14 of the outliers, one has to use a higher trimming percentage, such as
20%.
[Figures: plots titled "PROP Estimate", marking the 95% Maximum (Largest MD) Limit = 17.4345.]
[Figure: Q-Q plot of the MDs against beta quantiles, marking the 95% Maximum (Largest MD) Limit = 17.4345. Plot statistics: N = 75, P = 4, Slope = 260.0656, Intercept = -233.1933, Correlation Coefficient = 0.8414, Critical Correlation (0.05) = 0.9941, Critical Kurtosis (0.05) = 25.2002, Critical Skewness (0.05) = 2.3990.]
Graphical Interpretation: As before, the observations between the "Warning (Individual MD) Limit"
and "Maximum (Largest MD) Limit" lines may also represent potential outliers.
Note: In practice, depending upon the trimming percentage, this method may assign zero (0) weights to
some non-outlying observations with MDs smaller than the Max (MDs); that is, it may find more outliers
than are actually present in the data set. To overcome this problem, at the final iteration Scout compares
the MVT MDs with the critical value Max (MDs), and the observations with MDs less than the critical value
Max (MDs) are reassigned full unit weight. Estimates of the mean vector and the covariance matrix are
then recomputed.
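The final reassignment described in the note can be sketched in Python as follows. The sketch compares squared MDs with the squared critical value (17.43 for the Bradu data) and uses a strict "less than", as in the note. The weights and squared MDs in the example are hypothetical, and the function name is illustrative.

```python
def reassign_unit_weights(weights, squared_mds, max_cutoff):
    """Restore unit weight for observations whose squared MD is below the Max cutoff."""
    return [1.0 if d2 < max_cutoff else w for w, d2 in zip(weights, squared_mds)]


# Hypothetical final iteration: trimming zeroed three observations,
# but only the last one lies beyond the Max (MDs) critical value.
weights = [0.0, 0.0, 1.0, 0.0]
squared_mds = [12.4, 15.0, 3.2, 387.7]
print(reassign_unit_weights(weights, squared_mds, 17.43))  # [1.0, 1.0, 1.0, 0.0]
```

After this step, the weighted mean vector and covariance matrix would be recomputed with the new weights.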
7.2.2.5 PROP
1. Click Outliers/Estimates > Multivariate > Robust > PROP.
2. The "Select Variables" screen will appear (Section 3.4).
o Select the variables from the screen.
o If various groups are available and an analysis is to be done for those groups,
then group variables can be selected from the drop-down list by clicking on
the arrow below the "Group by Variable" button.
o Click the "Options" button for various options.
[PROP Multivariate Outlier Options window: Select Initial Estimates (Classical, Sequential Classical, Robust (Median, MAD), OKG (Maronna Zamar), KG (Not Orthogonalized), MCD); Cutoff for Outliers (Critical Alpha, 0.05); Select Number of Iterations (10, Max = 50); Influence Function Alpha (0.05); Correlation R Matrix (Do Not Display, Display); Intermediate Iterations (Do Not Display, Display Every 5th, Display Every 4th, Display Every 2nd, Display All); OK and Cancel buttons.]
o Specify the initial estimates listed in the "Select Initial Estimates" to
start the PROP iterative procedure. The default is "OKG (Maronna
Zamar)."
o Specify the distribution for the Mahalanobis distances in the "MDs
Distribution." The default is "Beta."
o Specify the "Critical Alpha," the cutoff for outliers. The default is
"0.05."
o Specify the "Select Number of Iterations." The default is "10."
o Specify the "Influence Function Alpha" for the PROP weighting
process. The default is "0.05."
o Specify whether or not to display the "Correlation R Matrix." The
default is "Do Not Display."
o Specify "Intermediate Iterations." The default is "Do Not Display."
o Click "OK" to continue or "Cancel" to cancel the options.
o Click the "Graphs" button for various options.
[PROP Graphics Options window: Index MD Plot, Distance-Distance MD Plot, and QQ MD Plot check boxes; QQ Plot Options (MDs Distribution: Beta or Chi); Title for Index MD Plot, Title for Distance-Distance Plot, and Title for QQ MD Plot (each set to "PROP Estimate"); Graphics Critical Alpha (0.05); OK and Cancel buttons.]
o Specify the "Title for Index MD Plot." This is an index plot of the
robust distances obtained using the PROP estimates.
o Specify the "Title for Distance-Distance Plot." This is a plot of the
classical Mahalanobis distances against the robust distances obtained
using the PROP estimates.
o Specify the "Title for Q-Q MD Plot."
o Select the distribution required for the "Q-Q Plot Options" and the
"Graphics Critical Alpha" for identifying the outliers.
Note: The "Graphics Critical Alpha" should match the "Critical Alpha" from the
Multivariate Outlier Options window to obtain the same outliers.
o Click "OK" to continue or "Cancel" to cancel the options.
o Click "OK" to continue or "Cancel" to cancel the PROP procedure.
Output example: The data set "BRADU.xls" was used for the PROP method. It has 75
observations and four variables. The initial estimates of location and scale for each group were
the median vector and the scale matrix obtained from the OKG method. The outliers were found
using the PROP influence function and the observations were given weights accordingly. The
weighted mean vector and the weighted covariance matrix were then calculated.
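This section does not spell out the PROP influence function. The Python sketch below assumes the weight w = 1 for d ≤ d0 and w = (d0/d) exp(-(d - d0)) for d > d0. Here d is an observation's unsquared MD and d0 is the square root of the "Influence Fn Squared MD (0.05)" critical value (9.128 in the output that follows). This assumed form reproduces, to the printed precision, the final weights reported in that output for observations 1 (1.861E-16), 14 (1.893E-18), and 53 (0.698). The function name is illustrative.

```python
import math


def prop_weight(squared_md, squared_cutoff):
    """Assumed PROP weight: 1 inside the cutoff, exponentially decaying outside it."""
    d = math.sqrt(squared_md)
    d0 = math.sqrt(squared_cutoff)
    if d <= d0:
        return 1.0
    return (d0 / d) * math.exp(-(d - d0))


CUTOFF = 9.128  # Influence Fn Squared MD (0.05) for the Bradu data
for obs, squared_md in [(1, 1350.0), (14, 1699.0), (53, 10.85), (15, 4.666)]:
    print(obs, f"{prop_weight(squared_md, CUTOFF):.3e}")
```

Unlike the 0/1 weights of hard rejection, this is a soft rejection. Observations just beyond the cutoff, such as #53, keep a reduced but nonzero weight.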
Output for PROP outliers method.
Data Set used: Bradu
PROP Multivariate Outlier Analysis
User Selected Options
Date/Time of Computation   1/8/2008 1:54:07 PM
From File   D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision   OFF
Critical Alpha   0.05
Influence Function Alpha   0.05
Initial Estimates   Robust Median Vector and OKG (Maronna-Zamar) Matrix
Display Correlation R Matrix   Do Not Display Correlation R matrix
Distribution of Squared MDs   Beta Distribution
Number of Iterations   10
Show Intermediate Results   Do Not Display Intermediate Results
Title for Index Plot   PROP Estimate
Title for Distance-Distance Plot   PROP Estimate
Title for QQ Plot   PROP Estimate
Graphics Critical Alpha   0.05
MDs Distribution   Beta
Number of Observations   75
Number of Selected Variables   4

Critical Values
Max Squared MD (0.05)   17.43
Individual Squared MD (0.05)   9.128
Multivariate Kurtosis (0.05)   25.2
Influence Fn Squared MD (0.05)   9.128

Classical Mean Vector
y         x1        x2        x3
1.279     3.207     5.597     7.231

Classical Covariance S Matrix
      y         x1        x2        x3
y     12.2      9.477     20.39     31.03
x1    9.477     13.34     28.47     41.24
x2    20.39     28.47     67.88     94.67
x3    31.03     41.24     94.67     137.8
Determinant   1906.26080388262
Eigenvalues for Classical Covariance S Matrix
Eval 1    Eval 2    Eval 3    Eval 4
0.914     1.688     5.538     223.1

Median Vector
y         x1        x2        x3
0.1       1.8       2.2       2.1

MAD/0.6745 Vector Representing Standard Deviation
y         x1        x2        x3
0.89      1.927     1.631     1.779

OKG Mean Vector
y         x1        x2        x3
1.258     -7.552    -6.052    2.521

Robust OKG (Maronna-Zamar) Covariance S Matrix
      y         x1        x2        x3
y     0.598     0.243     0.115     -0.231
x1    0.243     2.856     0.108     0.241
x2    0.115     0.108     2.175     0.00402
x3    -0.231    0.241     0.00402   2.34
Determinant   7.84553740487618

Robust OKG Eigenvalues
Eval 1    Eval 2    Eval 3    Eval 4
0.53      2.149     2.316     2.975

Final Weighted Mean Vector
y         x1        x2        x3
-0.0776   1.544     1.787     1.679

Final Covariance S Matrix
      y         x1        x2        x3
y     0.315     0.0675    0.0111    -0.117
x1    0.0675    1.129     0.0364    0.136
x2    0.0111    0.0364    1.146     0.161
x3    -0.117    0.136     0.161     1.057
Determinant   0.388282776749715
Observations with Squared MDs greater than 17.43
may be considered as potential outliers!

Observation   Classical    Initial      Final        Final
Number        Squared MD   Squared MD   Squared MD   Weights
1             6.026        654.4        1350         1.861E-16
2             6.729        699.8        1437         5.659E-17
3             7.209        761.9        1574         9.216E-18
4             6.321        769.1        1578         8.734E-18
5             6.339        765.6        1574         9.190E-18
6             7.036        701.8        1446         4.962E-17
7             8.235        740.2        1521         1.843E-17
8             6.714        692.4        1428         6.359E-17
9             6.445        740.8        1524         1.774E-17
10            7.346        719.1        1483         3.026E-17
11            18.61        691.2        1360         1.616E-16
12            27.74        732.6        1459         4.184E-17
13            14.79        712.8        1393         1.033E-16
14            43.7         907          1699         1.893E-18
15            3.386        1.944        4.666        1
16            4.783        2.228        5.315        1
17            2.004        2.761        3.841        1
18            0.751        0.303        0.627        1
19            1.408        0.675        1.769        1
20            2.556        1.234        4.802        1
21            1.281        1.165        3.964        1
22            2.578        1.322        3.721        1
23            1.216        2.84         3.999        1
24            1.321        1.542        3.126        1
25            0.658        3.846        4.222        1
26            1.413        2.229        4.15         1
27            2.492        2.387        5.235        1
28            0.765        1.029        1.709        1
29            0.364        1.612        1.619        1
30            2.793        1.945        4.703        1
31            3.395        1.721        2.993        1
32            1.772        2.478        3.73         1
33            0.967        1.674        2.75         1
34            1.475        4.461        5.473        1
35            1.58                                   1
36            1.008                                  1
37            3.368                                  1
38                                                   1
39            1.935        3.883        4.769        1
40            1.237        1.463        1.735        1
41            3.139        1.386        4.151        1
42            3.353        2.147        3.949        1
43            4.108        2.526        6.062        1
44            2.686        2.505                     1
45            1.278        3.85         4.4          1
46            2.225        1.865        3.711        1
47            4.332        5.896        8.531        1
48            2.184        1.054        3.712        1
49            2.552        1.422        5.112        1
50            0.278        0.856        2.346        1
51            1.858        1.936        3.421        1
52            4.587        3.562        5.498        1
53            4.889        4.616        10.85        0.698
54            2.604        2.131        4.937        1
55            1.622        1.159        1.877        1
56            1.898        1.137        2.77         1
57            0.773        2.385        3.327        1
58            1.988        1.208        3.164        1
59            0.463        0.523        1.691        1
60            3.945        2.628        5.543        1
61            2.806        1.957        5.029        1
62            0.675        3.98         5.127        1
63            1.757        2.344        3.823        1
64            1.112        1.602        3.936        1
65            1.336        1.909        4.628        1
66            1.696        3.273        4.468        1
67            0.445        1.752        1.921        1
68            2.462        4.222        6.464        1
69            1.154        2.636        3.062        1
70            1.042        1.609        5.021        1
71            0.413        0.114                     1
72            1.114        0.732                     1
73            2.207        1.311        3.196        1
74            2.742        2.635        4.093        1
75            3.632        2.62         5.414        1

                        Classical   Initial   Final
Multivariate Kurtosis   53.97       101431    414688
Results: Fourteen (14) observations (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, and 14), with squared distances
greater than the Max (squared MD), were each assigned an almost zero (0) weight (soft rejection) and may
be considered to be outliers. Another observation (#53) also received a reduced (<1) weight.
Note: PROP estimates with or without the 14 outliers are in close agreement with the classical estimates
without the 14 outliers.
[Figures: plots titled "PROP Estimate": an index plot, a distance-distance plot against the Classical MD, and a Q-Q plot against beta quantiles, marking the 95% Warning (Individual MD) Limit = 9.1281 and the 95% Maximum (Largest MD) Limit = 17.4345. Q-Q plot statistics: N = 75, P = 4, Slope = 260.0656, Intercept = -233.1933, Correlation Coefficient = 0.8414, Critical Correlation (0.05) = 0.9941, Critical Kurtosis (0.05) = 25.2002, Critical Skewness (0.05) = 2.3990.]
Graphical Interpretation: The observations (e.g., #53) between the "Warning (Individual MD) Limit"
and the "Maximum (Largest MD) Limit" received a reduced (<1) weight.
Note 1: If the initial estimate of the covariance matrix is not positive definite, then a warning message (in
orange) is displayed, so that one of the other options, which yields a positive definite covariance matrix,
can be selected. This is illustrated as follows.
Data Set used: Stackloss.
Select Initial Estimates: Robust MAD/Median.
Classical Mean Vector
Stack-Loss   Air-Flow     Temp         Acid-Conc
17.52        60.43        21.1         86.29

Classical Covariance S Matrix
             Stack-Loss   Air-Flow     Temp         Acid-Conc
Stack-Loss   103.5        85.76        28.15        21.79
Air-Flow     85.76        84.06        22.66        24.57
Temp         28.15        22.66        9.99         6.621
Acid-Conc    21.79        24.57        6.621        28.71
Determinant   62844.8754181548

Eigenvalues for Classical Covariance S Matrix
Eval 1       Eval 2       Eval 3       Eval 4
1.96         7.194        22.96        194.1

Median Vector
Stack-Loss   Air-Flow     Temp         Acid-Conc
15           58           20           87

MAD/0.6745 Vector Representing Standard Deviation
Stack-Loss   Air-Flow     Temp         Acid-Conc
5.93         5.93         2.965        4.448

Robust MAD Covariance S Matrix
             Stack-Loss   Air-Flow     Temp         Acid-Conc
Stack-Loss   35.17        85.76        28.15        21.79
Air-Flow     85.76        35.17        22.66        24.57
Temp         28.15        22.66        8.792        6.621
Acid-Conc    21.79        24.57        6.621        19.78
Determinant   171530.124177133

Robust MAD Eigenvalues
Eval 1       Eval 2       Eval 3       Eval 4
-50.94       -2.115       11.32        140.6

Initial Covariance Matrix is Not Positive Definite!
Please use other options to compute the Initial Covariance Matrix!
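In this output, the diagonal of the robust MAD matrix holds the squared MAD/0.6745 values (for example, 5.93² ≈ 35.17), while the off-diagonal entries are the classical covariances. A matrix assembled this way need not be positive definite. One way to detect the condition is to attempt a Cholesky factorization, which succeeds only for a positive definite matrix. The Python sketch below applies this check to the two matrices printed above: the classical matrix passes, and the robust MAD matrix, whose eigenvalues include -50.94 and -2.115, fails. This illustrates the condition only; it is not necessarily how Scout performs the check, and the function name is illustrative.

```python
import math


def is_positive_definite(matrix):
    """Return True if a Cholesky factorization of the symmetric matrix succeeds."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False  # a non-positive pivot: not positive definite
                lower[i][i] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True


# Stackloss matrices as printed above (Stack-Loss, Air-Flow, Temp, Acid-Conc).
CLASSICAL = [
    [103.5, 85.76, 28.15, 21.79],
    [85.76, 84.06, 22.66, 24.57],
    [28.15, 22.66, 9.99, 6.621],
    [21.79, 24.57, 6.621, 28.71],
]
ROBUST_MAD = [
    [35.17, 85.76, 28.15, 21.79],
    [85.76, 35.17, 22.66, 24.57],
    [28.15, 22.66, 8.792, 6.621],
    [21.79, 24.57, 6.621, 19.78],
]
print(is_positive_definite(CLASSICAL))   # True
print(is_positive_definite(ROBUST_MAD))  # False
```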
Note 2: If any of the elements of the MAD/0.6745 vector is less than a small threshold value (10 raised to a
negative power), then a fix, called the IQR Fix, is used. In such cases, the variability measure, MAD/0.6745,
is replaced by IQR/1.35. This is illustrated as follows.
Classical Mean Vector
sp-length-1   sp-width-1    pt-length-1   pt-width-1
5.006         3.428         1.462         0.246

Classical Covariance S Matrix
              sp-length-1   sp-width-1    pt-length-1   pt-width-1
sp-length-1   0.124         0.0992        0.0164        0.0103
sp-width-1    0.0992        0.144         0.0117        0.0093
pt-length-1   0.0164        0.0117        0.0302        0.00607
pt-width-1    0.0103        0.0093        0.00607       0.0111
Determinant   2.11308767598396E-06

Eigenvalues for Classical Covariance S Matrix
Eval 1        Eval 2        Eval 3        Eval 4
0.00903       0.0268        0.0369        0.236

Median Vector
sp-length-1   sp-width-1    pt-length-1   pt-width-1
5             3.4           1.5           0.2

MAD/0.6745 Vector Representing Standard Deviation
sp-length-1   sp-width-1    pt-length-1   pt-width-1
0.297         0.371         0.148         0

Adjusted by IQR/1.35 MAD/0.6745 Vector
sp-length-1   sp-width-1    pt-length-1   pt-width-1
0.297         0.371         0.148         0.0741

Robust MAD Covariance S Matrix
              sp-length-1   sp-width-1    pt-length-1   pt-width-1
sp-length-1   0.0879        0.0992        0.0164        0.0103
sp-width-1    0.0992        0.137         0.0117        0.0093
pt-length-1   0.0164        0.0117        0.022         0.00607
pt-width-1    0.0103        0.0093        0.00607       0.00549
Determinant   1.27592503630283E-07
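A Python sketch of the IQR Fix is shown below. It makes two assumptions. First, EPS is a hypothetical stand-in for the small threshold Scout uses. Second, the quartiles come from Python's statistics.quantiles with its default method, which may differ from Scout's quartile definition. In the iris example above, only the pt-width-1 entry (MAD = 0) is replaced, giving IQR/1.35 = 0.1/1.35 ≈ 0.0741. The function name is illustrative.

```python
import statistics

MAD_TO_SD = 0.6745  # MAD/0.6745 estimates a normal standard deviation
IQR_TO_SD = 1.35    # IQR/1.35 estimates a normal standard deviation
EPS = 1e-10         # hypothetical threshold for "MAD/0.6745 is too small"


def robust_scale(column, eps=EPS):
    """MAD/0.6745 for one variable, replaced by IQR/1.35 when it is below eps."""
    med = statistics.median(column)
    mad_sd = statistics.median([abs(x - med) for x in column]) / MAD_TO_SD
    if mad_sd >= eps:
        return mad_sd
    q1, _, q3 = statistics.quantiles(column, n=4)
    return (q3 - q1) / IQR_TO_SD


# A column dominated by one repeated value has MAD = 0, so the IQR Fix applies.
tied = [0.2] * 8 + [0.1, 0.3, 0.3, 0.4]
print(robust_scale([1, 2, 3, 4, 5, 6]))  # MAD = 1.5, so about 2.224
print(robust_scale(tied))                # IQR/1.35, greater than 0
```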
7.2.3 Method Comparisons
The Method Comparison module (available in the Outliers/Estimates drop-down menu) is
a formal graphical method of comparing various classical and robust outlier identification
methods incorporated in Scout. Specifically, selected classical and robust prediction and
tolerance ellipsoids (contour ellipses) are drawn on two-dimensional scatter plots of
selected variables. The main objective of this module is to compare the effectiveness of
the various outlier methods included in Scout.
These contour plots are displayed at the same two levels as the horizontal lines (warning
limit and maximum limit) displayed on the Q-Q plots of the MDs. The individual (Indv-MD)
contour (prediction ellipsoid) corresponds to the inner ellipsoid given by the
probability statement:
$$P\left(d_i^2 \le d_{\alpha}^2\right) = 1 - \alpha; \quad i = 1, 2, \ldots, n$$
The Simultaneous (Max-MD) outer contour ellipsoid corresponds to a tolerance ellipsoid
given by the probability statement:
$$P\left(d_i^2 \le d_{\max,\alpha}^2;\ i = 1, 2, \ldots, n\right) = 1 - \alpha$$
For details, refer to Singh (1993), and Singh and Nocerino (1995). The plots based upon
the classical MDs accommodate outliers as a part of the same population and often fail to
identify all of the outliers present in the data set. The outlying observations are more
prominent on the contour plots obtained using robustified distances and estimates.
Observations falling outside the outer ellipse (tolerance ellipsoid) are outliers, whereas
the observations lying between the inner (prediction ellipsoid) and the outer ellipses may
also represent potential outliers.
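For two plotted variables, each contour is the set of points x whose squared MD from the (classical or robust) mean equals the chosen cutoff. The Python sketch below traces such a contour from a 2 x 2 covariance matrix using its Cholesky factor. The mean, covariance, and cutoff values are hypothetical, and the function names are illustrative rather than Scout's.

```python
import math


def contour_points(mean, cov, squared_cutoff, num=72):
    """Points x with (x - mean)' cov^-1 (x - mean) = squared_cutoff (2 variables)."""
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    # Cholesky factor L of the 2 x 2 covariance matrix, cov = L L'.
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    r = math.sqrt(squared_cutoff)
    points = []
    for k in range(num):
        t = 2.0 * math.pi * k / num
        u, v = r * math.cos(t), r * math.sin(t)  # point on a circle of radius r
        points.append((mean[0] + l11 * u, mean[1] + l21 * u + l22 * v))
    return points


def squared_md(x, mean, cov):
    """Squared Mahalanobis distance of a 2-variable point."""
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det


mean = [1.5, 1.8]                 # hypothetical location estimate
cov = [[1.1, 0.05], [0.05, 1.2]]  # hypothetical scatter estimate
inner = contour_points(mean, cov, 6.0)   # hypothetical individual (Indv-MD) cutoff
outer = contour_points(mean, cov, 12.0)  # hypothetical simultaneous (Max-MD) cutoff
print(max(abs(squared_md(p, mean, cov) - 6.0) for p in inner))  # close to 0
```

Drawing the inner and outer contours with the individual and simultaneous cutoffs for two variables gives the prediction and tolerance ellipses described above.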
If the data set is categorized by a group column, then the contour ellipses (prediction or
tolerance ellipsoids) can be drawn separately for each of the groups included in the data
set. The plots shown here are obtained using some well-known data sets.
o Click Outliers/Estimates > Multivariate > Robust > Method Comparison.
o The following variable selection screen appears.
[Select Variables to Graph window: Variables list (Name, ID, Count: y, x1, x2, x3, each with 75 values); Select Y Axis Variable; Select X Axis Variable; Select Group Variable; Options, OK, and Cancel buttons.]
o Specify the variable for the Y-axis under "Select Y Axis Variable."
o Specify the variable for the X-axis under "Select X Axis Variable."
o Specify the group variable in the "Select Group Variable" drop-down if a
group variable is present in the data set.
o Click on "Options" to get the following window.
[Scatter/Contour Plot options window: Select Ellipse(s) (Classical, Robust); Ellipse Grouping (All Data, By Group); Classical Contour Plots (Individual [d0cut], Simultaneous [Max MD], Simultaneous/Individual); Classical Cutoff (Critical Alpha, 0.05); Robust Contour Plots (Individual [d0cut], Simultaneous [Max MD], Simultaneous/Individual); Select Estimation Method(s) (Sequential Classical, Huber, PROP, MVT, MCD); Robust Cutoff (Critical Alpha, 0.05); Label Individual Points (Observation Number, By Group Designation); Title; OK and Cancel buttons.]
Click the "Simultaneous/Individual" radio button in the "Classical Contour
Plots" box. Click "OK" to continue or "Cancel" to cancel the options window.
Data Set used: Bradu. Both classical prediction and simultaneous ellipsoids are drawn.
o Open the options window and uncheck the "Classical" option in the "Select
Ellipse(s)" box.
o Click the "Simultaneous/Individual" radio button in the "Robust Contour
Plots" box.
o Specify any of the estimation methods from the "Select Estimation Method(s)"
box; e.g., PROP.
o Specify the preferred "Robust Cutoff Critical Alpha," "Select Initial
Estimates," "Select Number of Iterations," "MDs Distribution," and
"Influence Alpha." Click "OK" to continue for the graph.
[Scatter/Contour Plot options window: Select Ellipse(s) with Robust checked and Classical unchecked; Ellipse Grouping: All Data; Robust Cutoff Critical Alpha 0.05; Robust Contour Plots: Simultaneous/Individual; Label Individual Points: Observation Number; Select Estimation Method(s): PROP; Select Initial Estimates: OKG (Maronna Zamar); MDs Distribution (Beta, Chisquare); Select Number of Iterations 10 (Max = 50); Huber and/or PROP Influence Alpha 0.05.]
Data Set used: Bradu. Both inner and outer ellipsoids are drawn using the PROP method.
o Comparing various robust methods.
[Scatter/Contour Plot options window: Select Ellipse(s): Robust; Ellipse Grouping: All Data; Robust Cutoff Critical Alpha 0.05; Robust Contour Plots: Individual [d0cut]; Label Individual Points: Observation Number; Select Estimation Method(s): Sequential Classical, Huber, PROP, MVT, and MCD all checked; Select Initial Estimates: OKG (Maronna Zamar); MDs Distribution: Beta; Select Number of Iterations 10 (Max = 50); Iterative Classical Critical Alpha 0.05; Huber and/or PROP Influence Alpha 0.05; Multivariate Trimming Percentage 0.1.]
Data Set used: Bradu
[Figure: Scatter/Contour Plot (x-axis: x1); legend entries include Sequential Classical, PROP, and MVT.]
Sequential Classical (with robust OKG initial start); the PROP and MCD ellipses overlap each other.
Note: The "Select Initial Estimates," "MDs Distribution," "Number of Iterations," and "Robust Cutoff"
remain the same for the sequential classical, Huber, PROP, and MVT methods.
Data Set used: Star Cluster.
Note: In this example, all of the methods (including the Huber method), except for the classical method,
identified the four main outliers present in the data set.
Data Set used: Stackloss.
Example with Group variable in the variable selection screen.
[Scatter/Contour Plot options window: Select Ellipse(s): Robust; Ellipse Grouping: By Group; Robust Cutoff Critical Alpha 0.05; Robust Contour Plots: Individual [d0cut]; Label Individual Points: By Group Designation; Select Estimation Method(s): Huber and PROP; Select Initial Estimates: OKG (Maronna Zamar); MDs Distribution: Beta; Select Number of Iterations 10 (Max = 50); Iterative Classical Critical Alpha 0.05; Huber and/or PROP Influence Alpha 0.05.]
o Check "By Group" in the "Ellipse Grouping" box and uncheck "All
Data."
o Click on the "By Group Designation" radio button in the "Label
Individual Points" box.
o Specify the preferred contour plots and the estimation methods.
o Click "OK" to continue or "Cancel" to cancel the options.
Data Set used: Fulliris (Fisher 1936 data set with 3 species).
Note: The user may select all of the options available in the options window. However, this selection will
result in a busy, cluttered graph (with overlapping ellipses) which is difficult to understand. The user should
select only the useful options from those available.
[Screenshot: OptionsGraphs_EDA_ScatterPlot options window with Classical and Robust ellipses, Classical and Robust Cutoff Critical Alpha 0.05, Simultaneous/Individual contour plots for both, Sequential Classical, PROP, MVT, and MCD estimation methods, Iterative Classical Critical Alpha 0.05, Huber and/or PROP Influence Alpha 0.05, and Multivariate Trimming Percentage 0.1.]
Data Set used: FullIRIS (Fisher's 1936 Iris data set with three species).
[Figure: Scatter/Contour Plot for the FullIRIS data (horizontal axis sp-width) showing groups 1, 2, and 3 with ellipses for Classical Individual, Sequential Classical, Huber, PROP, MVT, and MCD.]
References
Alqallaf, F.A., Konis, K.P., Martin, R.D., and Zamar, R.H. (2002). "Scalable Robust
Covariance and Correlation Estimates for Data Mining." In Proceedings of the
Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, ACM, Edmonton.
Barnett, V., and Lewis, T. (1978). Outliers in Statistical Data, John Wiley and Sons,
New York.
Bernholt, T., and Fischer, P. (2004). "The Complexity of Computing the MCD-
Estimator," Theoretical Computer Science, 326, 383-398.
Devlin, S.J., Gnanadesikan, R., and Kettenring, J.R. (1975). "Robust Estimation and
Outlier Detection with Correlation Coefficients," Biometrika, 62, 531-545.
Devlin, S.J., Gnanadesikan, R., and Kettenring, J.R. (1981). "Robust Estimation of
Dispersion Matrices and Principal Components," Journal of the American Statistical
Association, 76, 354-362.
Dixon, W.J. (1953). "Processing Data for Outliers," Biometrics, 9, 74-89.
Fung, W. (1993). "Unmasking Outliers and Leverage Points: a Confirmation," Journal of
the American Statistical Association, 88, 515-519.
Ferguson, T.S. (1961a). "On the Rejection of Outliers," Proc., Fourth Berkeley
Symposium. Math. Statist., 1, 253-287.
Ferguson, T.S. (1961b). "Rules of Rejection of Outliers," Rev., Inst. Int. de Statist, 3,
29-43.
Garner, F.C., Stapanian, M.A., and Fitzgerald, K.E. (1991a). "Finding Causes of
Outliers in Multivariate Environmental Data," Journal of Chemometrics, 5, 241-248.
Garner, F.C., Stapanian, M.A., Fitzgerald, K.E., Flatman, G.T., and Englund, E.J.
(1991b). "Properties of Two Multivariate Outlier Tests," Communications in
Statistics, 20, 667-687.
Grubbs, F.E. (1950). "Sample Criteria for Testing Outlying Observations," Ann. Math.
Statist., 21, 27-58.
Gnanadesikan, R., and Kettenring, J.R. (1972). "Robust Estimates, Residuals, and
Outlier Detection with Multi-response Data," Biometrics, 28, 81-124.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., and Stahel, W.A. (1986). Robust
Statistics: The Approach Based on Influence Functions, John Wiley & Sons, New
York.
Hawkins, D.M., Bradu, D., and Kass, G.V. (1984). "Location of Several Outliers in
Multiple Regression Data Using Elemental Sets," Technometrics, 26, 197-208.
Huber, P.J. (1981). Robust Statistics, John Wiley and Sons, NY.
Hubert, M., Rousseeuw, P.J., and Vanden Branden, K. (2005). "ROBPCA: A New
Approach to Robust Principal Component Analysis," Technometrics, 47, 64-79.
Kafadar, K. (1982). "A Biweight Approach to the One-Sample Problem," Journal of the
American Statistical Association, 77, 416-424.
Lax, D.A. (1985). "Robust Estimators of Scale: Finite Sample Performance in Long-
Tailed Symmetric Distributions," Journal of the American Statistical Association, 80,
736-741.
Mardia, K.V. (1970). "Measures of Multivariate Skewness and Kurtosis in Testing
Normality and Robustness Studies," Biometrika, 57, 519-530.
Mardia, K.V. (1974). "Applications of Some Measures of Multivariate Skewness and
Kurtosis in Testing Normality and Robustness Studies," Sankhya, 36, 115-128.
Maronna, R.A., and Zamar, R.H. (2002). "Robust Estimates of Location and Dispersion
for High-Dimensional Data sets," Technometrics, 44, 307-317.
Mehrotra, D.V. (1995). "Robust Elementwise Estimation of a Dispersion Matrix,"
Biometrics, 51, 1344-1351.
Mosteller, F., and Tukey, J.W. (1977). Data Analysis and Regression, Addison-Wesley,
Reading, MA.
Pena, D., and Prieto, F.J. (2001). "Multivariate Outlier Detection and Robust
Covariance Matrix Estimation," Technometrics, 286-299.
ProUCL 4.00.04. (2009). "ProUCL Version 4.00.04 User Guide." The software
ProUCL 4.00.04 can be downloaded from the web site at:
http://www.epa.gov/esd/tsc/software.htm.
ProUCL 4.00.04. (2009). "ProUCL Version 4.00.04 Technical Guide." The software
ProUCL 4.00.04 can be downloaded from the web site at:
http://www.epa.gov/esd/tsc/software.htm.
Rocke, D.M., and Woodruff, D.L. (1996). "Identification of Outliers in Multivariate
Data," Journal of the American Statistical Association, 91, 1047-1061.
Rocke, D.M., and Woodruff, D.L. (1997). "Robust Estimation of Multivariate Location
and Shape," Journal of Statistical Planning and Inference, 57, 245-255.
Rousseeuw, P.J., and van Zomeren, B.C. (1990). "Unmasking Multivariate Outliers and
Leverage Points," Journal of the American Statistical Association, 85, 633-651.
Rosner, B. (1975). "On Detection of Many Outliers," Technometrics, 17, 221-227.
Rousseeuw, P.J., and Leroy, A.M. (1987). Robust Regression and Outlier Detection,
John Wiley and Sons, NY.
Rousseeuw, P.J., and Van Driessen, K. (1999). "A Fast Algorithm for the Minimum
Covariance Determinant Estimator," Technometrics, 41, 212-223.
Schwager, S.J., and Margolin, B.H. (1982). "Detection of Multivariate Normal Outliers,"
Ann. Statist., 10, 943-954.
Scout. (2002). A Data Analysis Program, Technology Support Project, USEPA, NERL-
LV, Las Vegas, Nevada.
Scout. (2008). Technical Guide (under preparation).
Singh, A. (1993). Omnibus Robust Procedures for Assessment of Multivariate Normality
and Detection of Multivariate Outliers, In Multivariate Environmental Statistics, Patil
G.P. and Rao, C.R., Editors, pp. 445-488, Elsevier Science Publishers.
Singh, A., and Nocerino, J.M. (1995). Robust Procedures for the Identification of
Multiple Outliers, Handbook of Environmental Chemistry, Statistical Methods, Vol.
2. G, pp. 229-277, Springer Verlag, Germany.
Sinha, B.K. (1984). "Detection of Multivariate Outliers in Elliptically Symmetric
Distributions," Ann. Statist., 12, 1558-1565.
Stefanski, L.A., and Boos, D.D. (2002). "The Calculus of M-estimators," The American
Statistician, 56, 29-38.
Tukey, J.W. (1977). Exploratory Data Analysis, Addison-Wesley Publishing Company,
Reading, MA.
Wilks, S.S. (1963). "Multivariate Statistical Outliers," Sankhya, 25, 407-426.
Chapter 8
QA/QC
Issues related to the reliability of data are often grouped under the general heading of
"quality assurance and quality control" (QA/QC), a description that captures the idea that
data quality can not only be documented but can also be controlled through appropriate
practices and procedures. Even with the most stringent and costly controls, data will
never be perfect: errors are inevitable as samples are collected, prepared and analyzed.
One goal of QA/QC is to quantify these errors so that subsequent statistical analysis and
interpretation can take them into account. A second goal is to monitor the errors so that
spurious or biased data can be recognized and, if possible, corrected. A third goal is to
provide information that can be used to improve sampling practices and analytical
procedures so that the impact of errors can be minimized. Scout offers QA/QC methods
for data with and without non-detects. Kaplan-Meier (KM) estimates of mean and
standard deviation are used for data with non-detects.
Scout also allows the user to test the behavior of "Site/Test" data against
"Background/Training" data. In this module, the statistics and estimates are computed
from the "Background/Training" data only; the graphs and charts are then produced for
the whole data set, which includes both the "Background/Training" and the "Site/Test"
data. This module requires a group column that identifies which observations belong to
the "Site/Test" data.
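The Training/Background idea can be illustrated with a minimal sketch. It assumes the data arrive as two parallel lists (values and group IDs), uses classical estimates only, and flags points outside a simple mean +/- k*SD band with a hypothetical factor k; Scout's own intervals (prediction, tolerance, simultaneous, and so on) and its robust estimators are not reproduced here. The function name is also hypothetical.

```python
from statistics import mean, stdev


def training_test_flags(values, groups, test_group, k=3.0):
    """Estimate mean/SD from the Training/Background rows only, then flag
    every row (training and Site/Test) outside mean +/- k*SD.

    k is a hypothetical band width chosen for illustration."""
    training = [v for v, g in zip(values, groups) if g != test_group]
    if len(training) < 2:
        raise ValueError("need at least two training observations")
    m, s = mean(training), stdev(training)
    lower, upper = m - k * s, m + k * s
    flags = [not (lower <= v <= upper) for v in values]
    return m, s, (lower, upper), flags


# Groups 1 and 2 form the training set; group 3 is the Site/Test set.
values = [5.0, 5.1, 4.9, 5.2, 4.8, 7.5, 5.0]
groups = [1, 1, 1, 2, 2, 3, 3]
m, s, limits, flags = training_test_flags(values, groups, test_group=3)
print(m, limits, flags)
```

Only the training rows influence the limits, so an unusual Site/Test value (7.5 here) cannot inflate the SD that is used to judge it.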
8.1 Univariate QA/QC
Scout offers several univariate procedures to achieve the goals specified above. They
include Q-Q Plots with Limits, Interval Graphs and Control Charts. Classical and robust
methods have been incorporated in this module.
8.1.1 No Non-detects
8.1.1.1 Q-Q Plots with Limits
1. Click on QA/QC > Univariate > No NDs > Q-Q Plots with Limits.
2. The "Select Variables" screen (Section 3.4) will appear.
o Click on the "Options" button for the options window.
[Screenshot: QA/QC Univariate Q-Q Plots options window. Select Methods: Classical, PROP, Huber, MVT. Select Quantiles: Raw Data or Standardized Data. Influence Alpha 0.025; MDs Distribution Beta or Chisquared; # Iterations 20; Initial Estimate Mean/Stdv or Median/1.48MAD; Critical Alpha for Limits from 0.010 to 0.250 (0.050 selected); Display Lines: Regression Line, at Critical Value of Individual MD, at Critical Value of Max MD; Use Default Title.]
o Specify the method for computing the quantiles in "Select
Methods." The default method is "PROP."
o The robust methods require input parameters such as
"Influence Alpha" or "Trimming Percentage," "Initial
Estimates," "MDs Distribution," and "# Iterations."
o Specify the "Critical Alpha for Limits" for identifying the
outliers. Default is "0.05."
o Specify the quantiles for the x-axis using the "Select
Quantiles" option, and choose the lines to display in the
"Display Lines" box.
o Click "OK" to continue or "Cancel" to cancel the options.
• Click on "OK" to continue or "Cancel" to cancel the Q-Q Plots with
limits.
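Several of the robust options above (the "Median/1.48MAD" initial estimate and "# Iterations") can be illustrated with a Huber-type M-estimate of location. This is a generic textbook sketch, not Scout's implementation: Scout derives its Huber weights from the "Influence Alpha" setting, whereas this sketch uses a hypothetical fixed tuning constant c = 1.345 and keeps the 1.4826*MAD scale fixed. Function names are hypothetical.

```python
from statistics import median


def median_mad(x):
    """Robust starting values: the median and 1.4826 * MAD."""
    med = median(x)
    mad = median([abs(v - med) for v in x])
    return med, 1.4826 * mad


def huber_location(x, c=1.345, max_iter=25, tol=1e-10):
    """Huber M-estimate of location by iteratively reweighted averaging.

    Observations within c * scale of the current estimate get weight 1;
    farther ones are down-weighted by (c * scale) / |residual|."""
    mu, scale = median_mad(x)
    if scale == 0:
        return mu, scale
    cutoff = c * scale
    for _ in range(max_iter):
        w = [1.0 if abs(v - mu) <= cutoff else cutoff / abs(v - mu) for v in x]
        new_mu = sum(wi * v for wi, v in zip(w, x)) / sum(w)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu, scale


print(huber_location([1, 2, 3, 4, 100]))  # location stays near 3; the mean is 22
```

The single large value receives a weight of about 0.02, so it barely moves the estimate, which is the behavior the robust methods in Scout are designed to achieve.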
Output example: The data set "Bradu.xls" was used for the Q-Q Plot. The options used
were the default options.
[Figure: PROP Q-Q Plot of y With Limits (Bradu data). N = 75; Influence Alpha = 0.025; Mean = -0.066154; SD = 0.5560117; Correlation, R = 0.741. Upper Simultaneous Limit = 1.7298; Upper Individual Limit = 1.0111; Lower Individual Limit = -1.1434; Lower Simultaneous Limit = -1.8621. Horizontal axis: Theoretical Quantiles (Standardized Data).]
Note: The observations outside the simultaneous limits are considered as outliers.
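The individual and simultaneous limits can be sketched for the classical case. The sketch assumes the scaled beta distribution for squared MDs, d^2 ~ ((n-1)^2/n) * Beta(p/2, (n-p-1)/2), which for one variable (p = 1) can be evaluated through Student's t; the individual limit uses alpha and the simultaneous (largest-MD) limit uses alpha/n. These are assumptions, not Scout's documented code, and Scout's robust (PROP) weighting is not reproduced. With the FullIRIS training statistics (n = 100, mean 5.471, SD 0.6416383) the sketch gives simultaneous limits of about 3.30 and 7.64, close to the 3.299 and 7.643 printed in the training-set control chart in Section 8.1.1.3.2. The helpers t_cdf and t_ppf are hypothetical names written with the standard library only.

```python
import math


def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t with df degrees of freedom, t >= 0,
    by Simpson's rule on the density over [0, t]."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_ppf(p, df):
    """Quantile of Student's t for 0.5 <= p < 1 (assumed below 100), by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def md_limit_factor(n, alpha):
    """Multiple of the SD at which the squared MD of one variable reaches its
    (1 - alpha) scaled-beta critical value. Uses Beta(1/2, v/2) = T^2/(v + T^2)
    with T ~ t(v) and v = n - 2."""
    v = n - 2
    t = t_ppf(1 - alpha / 2, v)
    return math.sqrt((n - 1) ** 2 / n * t * t / (v + t * t))


def qq_limits(mean, sd, n, alpha=0.05):
    """Return (individual, simultaneous) limits as (lower, upper) pairs."""
    ind = md_limit_factor(n, alpha) * sd
    sim = md_limit_factor(n, alpha / n) * sd
    return (mean - ind, mean + ind), (mean - sim, mean + sim)


individual, simultaneous = qq_limits(5.471, 0.6416383, 100)
print(individual, simultaneous)
```

Because the simultaneous limit guards against the largest of n distances, it is always wider than the individual limit, which is why only points beyond it are treated as outliers.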
8.1.1.2 Interval Graphs
8.1.1.2.1 Compare Intervals
1. Click on QA/QC > Univariate > No NDs > Interval Graphs > Prediction,
Tolerance, Confidence, Simultaneous, or Individual > Compare Intervals.
2. The "Select Variables" screen (Section 3.4) will appear.
o Click on the "Options" button for the options window.
[Screenshot: Prediction Method Comparison options window. Select Methods: Classical, PROP, Huber, Tukey Biweight, Lax/Kafadar Biweight, MVT. Confidence Level 0.95; Title for Graph "Prediction Interval Comparison." PROP options: 10 iterations, Initial Estimate Median/1.48MAD, Influence Alpha 0.025, MDs Distribution Beta. Huber options: 10 iterations, Initial Estimate Median/1.48MAD, Influence Alpha 0.025. Tukey Biweight and Lax/Kafadar Biweight options: maximum 25 iterations, Initial Estimate Median/1.48MAD, tuning constants 4 (location) and 6 (scale). MVT options: iterations, Initial Estimate Median/1.48MAD, Trimming %, MDs Distribution Beta.]
o Select the methods to compare in "Select Methods" box. By
default, all methods are selected.
o Specify the various input parameters for the selected methods.
o Click "OK" to continue or "Cancel" to cancel the options.
• Click on "OK" to continue or "Cancel" to cancel the intervals
comparison.
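For the classical method, the prediction (next single observation) and confidence (population mean) intervals compared here follow the usual Student's t formulas, sketched below. With the FullIRIS training statistics (n = 100, mean 5.471, SD 0.6416383) the prediction formula agrees to within about 0.001 with the 4.191 and 6.751 limits in the training-set control chart in Section 8.1.1.3.2. The robust methods (PROP, Huber, Tukey, Lax/Kafadar, MVT) would substitute their own estimates of the mean and SD, which this sketch does not compute; t_cdf, t_ppf, and the interval function names are hypothetical.

```python
import math


def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t, t >= 0, by Simpson's rule on the density."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_ppf(p, df):
    """Quantile of Student's t for 0.5 <= p < 1, by bisection on t_cdf."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def prediction_interval(mean, sd, n, conf=0.95):
    """Interval expected to contain the next single observation."""
    t = t_ppf(1 - (1 - conf) / 2, n - 1)
    half = t * sd * math.sqrt(1 + 1 / n)
    return mean - half, mean + half


def confidence_interval(mean, sd, n, conf=0.95):
    """Interval for the population mean."""
    t = t_ppf(1 - (1 - conf) / 2, n - 1)
    half = t * sd / math.sqrt(n)
    return mean - half, mean + half


print(prediction_interval(5.471, 0.6416383, 100))
print(confidence_interval(5.471, 0.6416383, 100))
```

The prediction interval is roughly sqrt(n + 1) times wider than the confidence interval, because it must cover a new observation rather than only the uncertainty in the mean.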
Output example: The data set "Bradu.xls" was used for the Interval Comparison. The
options used were the default options.
Output for the Prediction Interval Comparison.
[Figure: Prediction Interval Comparison, intervals for y by method, with the median and mean marked; n = 75.]
Output for the Tolerance Interval Comparison.
[Figure: Tolerance Interval Comparison, intervals for y by method.]
Output for the Confidence Interval Comparison.
[Figure: Confidence Interval Comparison, intervals for y by method.]
Output for the Simultaneous Interval Comparison.
[Figure: Simultaneous Interval Comparison, intervals for y for the Classical, PROP, Huber, Tukey, Lax/Kafadar, and MVT methods, with each method's SD, the median, and the mean shown; n = 75.]
Output for the Individual Interval Comparison.
8.1.1.2.2 Intervals Index Plots
1. Click on QA/QC > Univariate > No NDs > Interval Graphs > Prediction,
Tolerance, Confidence, Simultaneous, or Individual > Intervals Index Plots.
2. The "Select Variables" screen (Section 3.4) will appear.
o Click on the "Options" button for the options window.
[Screenshot: QA/QC Tolerance Interval Index Plot options window. Select Method: Classical, PROP, Huber, Tukey Biweight, Lax/Kafadar Biweight, MVT. Confidence Level 0.95; Coverage 0.9; Influence Alpha 0.025; Initial Estimate Median/1.48MAD; maximum 25 iterations; MDs Distribution Beta; Use Default Title.]
o Select one of the methods for the interval in "Select Methods"
box. By default, "PROP" is selected.
o Specify the various input parameters for the selected method.
o Click "OK" to continue or "Cancel" to cancel the options.
• Click on "OK" to continue or "Cancel" to cancel the interval index
plots.
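The tolerance interval in these plots is controlled by two settings, Confidence Level (0.95) and Coverage (0.9): the limits are expected, with 95% confidence, to contain 90% of the population. A classical version can be sketched with Howe's approximation for the two-sided k factor and the Wilson-Hilferty approximation for the chi-square quantile, using only the standard library. With the FullIRIS training statistics (n = 100, mean 5.471, SD 0.6416383) it gives limits that round to the 4.269 and 6.673 shown in the training-set control chart in Section 8.1.1.3.2; whether Scout uses exactly these approximations is an assumption, and the robust estimates are not computed. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_ppf(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(p)
    a = 2 / (9 * df)
    return df * (1 - a + z * math.sqrt(a)) ** 3


def tolerance_interval(mean, sd, n, coverage=0.90, conf=0.95):
    """Two-sided normal tolerance limits mean +/- k*sd, with Howe's k factor."""
    df = n - 1
    z = NormalDist().inv_cdf((1 + coverage) / 2)
    k = math.sqrt(df * (1 + 1 / n) * z * z / chi2_ppf(1 - conf, df))
    return mean - k * sd, mean + k * sd


print(tolerance_interval(5.471, 0.6416383, 100))
```

Raising the coverage or the confidence level increases k, so the tolerance limits widen accordingly.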
Output example: The data set "Bradu.xls" was used for the Interval Index Plots. The
options used were the default options.
Output for the Prediction Interval Index Plot.
[Figure: PROP Prediction Interval (Next 1) for y. 95% prediction limits: Lower = -1.185427, Upper = 1.0531190. Number Obs = 75; PROP Mean = -0.066154, SD = 0.5560117. Horizontal axis: Index of Observations.]
Output for the Tolerance Interval Index Plot.
[Figure: PROP Tolerance Interval for y. 95% tolerance limits with 90% coverage: Lower = -1.146212, Upper = 1.0139043. Number Obs = 75; PROP Mean = -0.066154, SD = 0.5560117. Horizontal axis: Index of Observations.]
Output for the Confidence Interval Index Plot.
[Figure: PROP Confidence Interval for y. 95% confidence limits: Lower = -0.203927, Upper = 0.0716191. Number Obs = 75; PROP Mean = -0.066154, SD = 0.5560117. Horizontal axis: Index of Observations.]
Output for the Simultaneous Interval Index Plot.
[Figure: PROP Simultaneous Interval for y. 95% simultaneous limits: Lower = -1.862105, Upper = 1.7297375. Number Obs = 75; PROP Mean = -0.066154, SD = 0.5560117.]
Output for the Individual Interval Index Plot.
[Figure: PROP Individual Interval for y with 95% individual limits.]
8.1.1.3 Control Charts
8.1.1.3.1 Using All Data
1. Click on QA/QC > Univariate > No NDs > Control Charts > Using All
Data.
2. The "Select Variables" screen (Section 3.4) will appear.
o Click on the "Options" button for the options window.
[Screenshot: Univariate Control Chart options window. Select Method: Classical, PROP, Huber, Tukey Biweight, Lax/Kafadar Biweight, MVT. Confidence Level 0.95; Coverage 0.9; maximum 25 iterations; Prediction for the next K observations; Influence Alpha 0.025. Select Intervals: Prediction, Tolerance, Simultaneous, and Individual Intervals, Min/Max, and Sigma Limits. Initial Estimate Median/1.48MAD; MDs Distribution Beta; Sigma Factor 1.5; Use Default Title.]
o Select one of the methods for the interval in "Select Methods"
box. By default, "PROP" is selected.
o Specify the various intervals in the "Select Intervals" box.
o Specify the various options for the selected method and
intervals.
o Click "OK" to continue or "Cancel" to cancel the options.
• Click on "OK" to continue or "Cancel" to cancel the control charts.
Output example: The data set "Stackloss.xls" was used for the Control Charts. The
options used were the default options.
[Figure: PROP Control Limits for Stack-Loss (Stackloss data). Number Obs = 21; PROP Mean = 13.354037, SD = 4.2301554. 95% simultaneous limits: Lower = 2.2449926, Upper = 24.463082. 95% prediction limits: Lower = 4.1380623, Upper = 22.570012. Sigma limits (Sigma Factor = 1.5): Lower = 7.0088041, Upper = 19.699270. Minimum = 7, Maximum = 42. Horizontal axis: Index of Observations.]
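The Sigma Limits and Min/Max lines in this chart are simple functions of the estimates: mean +/- (Sigma Factor)*SD, and the smallest and largest observations. The sketch below reproduces the sigma limits 7.0088041 and 19.699270 from the printed PROP mean (13.354037), SD (4.2301554), and Sigma Factor 1.5, and applies them to the 21 stack-loss values of the standard Stackloss data (assumed here to match Stackloss.xls). The PROP estimates are taken as given rather than computed, and the function name is hypothetical.

```python
def sigma_limit_chart(values, mean, sd, sigma_factor=1.5):
    """Sigma limits, min/max, and 1-based indices of points outside the limits."""
    lower = mean - sigma_factor * sd
    upper = mean + sigma_factor * sd
    outside = [i + 1 for i, v in enumerate(values) if v < lower or v > upper]
    return {"limits": (lower, upper), "min": min(values), "max": max(values),
            "outside": outside}


stack_loss = [42, 37, 37, 28, 18, 18, 19, 20, 15, 14, 14,
              13, 11, 12, 8, 7, 8, 8, 9, 15, 15]
chart = sigma_limit_chart(stack_loss, mean=13.354037, sd=4.2301554)
print(chart)
```

Sigma limits carry no probability statement of their own; they are a quick visual band, whereas the prediction, tolerance, and simultaneous limits are tied to the chosen confidence level.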
8.1.1.3.2 Using Training/Background
1. Click on QA/QC > Univariate > No NDs > Control Charts >
Using Training/Background.
2. The "Select Variables" screen will appear.
[Screenshot: Select Specific Site and Variables window listing the variables sp-length, sp-width, pt-length, pt-width, and count (150 observations each), with a Selected list, an Options button, the "Select Group ID Column" drop-down, and the "Input Specific Test/Site from Group ID" box.]
o Select the variable of interest.
o Select the group variable using the "Select Group ID Column" drop-
down bar.
o Input the group name/number of the variable which is considered as the
test set in the "Input Specific Test/Site from Group ID" box.
o Click on the "Options" button for the options window.
[Screenshot: Univariate Control Chart options window. Select Method: Classical, PROP, Huber, Tukey Biweight, Lax/Kafadar Biweight, MVT. Select Intervals: Prediction, Tolerance, Simultaneous, and Individual Intervals, Min/Max, and Sigma Limits. Confidence Level 0.95; Coverage 0.9; maximum 25 iterations; Prediction for the next K observations; Influence Alpha 0.025; Initial Estimate Median/1.48MAD; MDs Distribution Beta; Sigma Factor; Use Default Title.]
o Select one of the methods for the interval in "Select Methods"
box. By default, "PROP" is selected.
o Specify the various intervals in the "Select Intervals" box.
o Specify the various options for the selected method and
intervals.
o Click "OK" to continue or "Cancel" to cancel the options.
• Click on "OK" to continue or "Cancel" to cancel the control charts.
Output example: The data set "FullIRIS.xls" was used for the Control Charts. The
options used were the default options.
[Figure: Classical Control Limits for sp-length Using Training Set. Test Set Group ID = 3; Obs in Test Set = 50; Obs in Training Set = 100. Classical statistics using the training set: Mean = 5.471, SD = 0.6416383. 95% simultaneous limits: Lower = 3.299, Upper = 7.643. 95% prediction limits: Lower = 4.191, Upper = 6.751. 95% tolerance limits with 90% coverage: Lower = 4.269, Upper = 6.673. Horizontal axis: Index of Observations.]
Note: The observations in "dark blue" with bigger point marks are from group 3. The intervals are
calculated using only the observations from groups 1 and 2, which were used as the
"Training/Background" set for this data set.
8.1.2 With Non-detects
8.1.2.1 Interval Graphs
1. Click on QA/QC > Univariate > With NDs > Interval Graphs
> Prediction, Tolerance, Confidence, Simultaneous, or Individual.
2. The "Select Variables" screen (Section 3.4) will appear.
o Click on the "Options" button for the options window.
[Screenshot: QA/QC Prediction Interval Index Plot options window. Graphs with NDs replaced by: Detection Limit (No Change), Normal ROS Estimates, Gamma ROS Estimates, Lognormal ROS Estimates, One Half (1/2) Detection Limit, or Zero. Confidence Level 0.95; Future K; Use Default Title.]
o Select the method to replace the non-detects with in "Graphs
with NDs replaced by" box. Default method is "Detection
Limit."
o Specify the various input parameters for the selected method.
o Click "OK" to continue or "Cancel" to cancel the options.
• Click on "OK" to continue or "Cancel" to cancel the interval index
plots.
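The simple substitution choices in the "Graphs with NDs replaced by" box can be sketched as follows; the ROS options fit a regression on order statistics and are not reproduced. The function and rule names are hypothetical, and the data layout (a value column holding the detection limit for each non-detect, plus a detect/non-detect flag) is an assumption. As the option name suggests, the substitution applies to the graphs, while the statistics for data with non-detects are Kaplan-Meier estimates.

```python
def replace_nondetects(values, detected, rule="dl"):
    """Return plotting values with non-detects replaced by a simple rule.

    values   -- measured value for detects, detection limit (DL) for non-detects
    detected -- True for a detect, False for a non-detect
    rule     -- "dl" (DL, no change), "half_dl" (DL/2), or "zero"
    """
    factor = {"dl": 1.0, "half_dl": 0.5, "zero": 0.0}[rule]
    return [v if d else factor * v for v, d in zip(values, detected)]


vals = [5.1, 4.0, 4.7, 4.0]
det = [True, False, True, False]
print(replace_nondetects(vals, det, "half_dl"))  # [5.1, 2.0, 4.7, 2.0]
```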
Output example: The data set "FullIRIS.xls" was used for the Interval Index Plots. The
options used were the default options.
Output for the Prediction Interval Index Plot with NDs.
[Figure: Classical Prediction Interval (Next 1) for sp-length. Kaplan Meier stats: Number Obs = 150; Number NDs = 6 (NDs = 1/2 Detection Limit, in red); Mean = 5.845; SD = 0.822. 95% prediction limits: Lower = 4.216, Upper = 7.474. Horizontal axis: Index of Observations.]
Note: The non-detect observations are replaced by "One Half (1/2) Detection Limit" indicated by the red
points.
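The Kaplan-Meier (KM) mean and SD used for data with non-detects can be sketched by treating each non-detect as left-censored at its detection limit: each distinct detected value receives a probability mass from the product-limit estimate, and the mean and SD are computed from those masses. This is a minimal sketch under stated assumptions: a non-detect whose detection limit equals a detected value is treated as lying below it, the case where the smallest observation is a non-detect is rejected rather than handled, and the SD uses the KM masses without any small-sample adjustment (Scout's exact SD convention is not stated here). With no non-detects, the sketch returns the ordinary mean and the divide-by-n SD. The function name is hypothetical.

```python
import math


def km_mean_sd(values, detected):
    """Kaplan-Meier mean and SD for left-censored data.

    values   -- measured value for detects, detection limit for non-detects
    detected -- True for a detect, False for a non-detect
    """
    detects = sorted({v for v, d in zip(values, detected) if d}, reverse=True)
    if not detects:
        raise ValueError("at least one detected value is required")
    cdf = 1.0      # KM estimate of P(X <= x) at the current detected value
    masses = []    # (detected value, probability mass)
    for x in detects:
        d = sum(1 for v, det in zip(values, detected) if det and v == x)
        at_risk = sum(1 for v in values if v <= x)  # detects and NDs at or below x
        below = cdf * (1 - d / at_risk)
        masses.append((x, cdf - below))
        cdf = below
    if cdf > 1e-12:
        raise ValueError("smallest observation is a non-detect; not handled here")
    mean = sum(x * p for x, p in masses)
    var = sum(p * (x - mean) ** 2 for x, p in masses)
    return mean, math.sqrt(var)


# The non-detect "<2" is redistributed to the only value below it (1.0).
print(km_mean_sd([1.0, 2.0, 3.0, 5.0], [True, False, True, True]))
```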
8.1.2.2 Control Charts
8.1.2.2.1 Using All Data
1. Click on QA/QC > Univariate > With NDs > Control Charts > Using All
Data.
2. The "Select Variables" screen (Section 3.4) will appear.
o Click on the "Options" button for the options window.
[Screenshot: QA/QC NDs Univariate Control Chart options window. Select KM Intervals: Prediction, Tolerance, Simultaneous, and Individual Intervals, Min/Max, and Sigma Limits. Confidence Level 0.95; Future K 1; Coverage 0.9; Graphics Alpha 0.025; Graphs with NDs replaced by: Detection Limit (No Change), Normal ROS Estimates, Gamma ROS Estimates, Lognormal ROS Estimates, One Half (1/2) Detection Limit, or Zero; Use Default Title.]
o Select the intervals to be displayed on the control chart from
the "Select KM Intervals" box.
o Specify the various parameters for the selected intervals.
o Select the method to replace the non-detects with in "Graphs
with NDs replaced by" box. Default method is "Detection
Limit."
o Click "OK" to continue or "Cancel" to cancel the options.
• Click on "OK" to continue or "Cancel" to cancel the control charts.
Output example: The data set "FullIRIS.xls" was used for the Control Charts. The
options used were the default options.
[Figure: Kaplan Meier Control Limits for sp-length. Number Obs = 150; Number NDs = 6 (NDs = Detection Limit, in red). Kaplan Meier stats: Mean = 5.8453333, SD = 0.8216720. 95% simultaneous limits: Lower = 2.956, Upper = 8.735. 95% prediction limits: Lower = 4.216, Upper = 7.474. 95% tolerance limits with 90% coverage: Lower = 4.345, Upper = 7.345. Horizontal axis: Index of Observations.]
Note: The non-detect observations are replaced by the "Detection Limit" and are indicated by the red points.
8.1.2.2.2 Using Training/Background
1. Click on QA/QC > Univariate > With NDs > Control Charts >
Using Training/Background.
2. The "Select Variables" screen will appear.
[Screenshot: Select Specific Site and Variables window listing the variables count, sp-length, sp-width, pt-length, and pt-width (150 observations each), with a Selected list, an Options button, the "Select Group ID Column" drop-down, and the "Input Specific Test/Site from Group ID" box.]
o Select the variable of interest.
o Select the group variable using the "Select Group ID Column" drop-
down bar.
o Input the group name/number of the variable which is considered as the
test set in the "Input Specific Test/Site from Group ID" box.
o Click on the "Options" button for the options window.
[Screenshot: QA/QC NDs Univariate Control Chart options window. Confidence Level 0.95. Select KM Intervals: Prediction, Tolerance, Simultaneous, and Individual Intervals, Min/Max, and Sigma Limits. Future K; Coverage 0.9; Graphics Alpha 0.025; Graphs with NDs replaced by: Detection Limit (No Change), Normal ROS Estimates, Gamma ROS Estimates, Lognormal ROS Estimates, One Half (1/2) Detection Limit, or Zero; Use Default Title.]
o Select the intervals to be displayed on the control chart from
the "Select KM Intervals" box.
o Specify the various parameters for the selected intervals.
o Select the method to replace the non-detects with in "Graphs
with NDs replaced by" box. Default method is "Detection
Limit."
o Click "OK" to continue or "Cancel" to cancel the options.
• Click on "OK" to continue or "Cancel" to cancel the control charts.
Output example: The data set "FullIRIS.xls" was used for the Control Charts. The
options used were the default options.
[Figure: Kaplan Meier Control Limits for sp-length Using Training Set. Obs in Training Set = 100; NDs in Training Set = 6 (NDs = Detection Limit, in red); Test Set Group ID = 3; Obs in Test Set = 50. Kaplan Meier stats using the training set: Mean = 5.474, SD = 0.6331856. 95% simultaneous limits: Lower = 3.331, Upper = 7.617. 95% prediction limits: Lower = 4.211, Upper = 6.737. 95% tolerance limits with 90% coverage: Lower = 4.288, Upper = 6.66. Horizontal axis: Index of Observations.]
Note: The observations in "dark blue" with bigger point marks are from group 3. The intervals are calculated
using only the observations from groups 1 and 2, which were used as the "Training/Background" set for
this data set. The red points indicate non-detects plotted at the detection limit.
8.2 Multivariate QA/QC
Several classical and robust multivariate procedures are available in the QA/QC module
of Scout. The multivariate module for data with non-detects uses Kaplan-Meier estimates.
This QA/QC module includes MDs Q-Q Plots with Limits, MDs Control Charts, and
Prediction and Tolerance Ellipsoids. The robust methods included in this module are
explained in Chapter 7.
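The multivariate procedures in this section are built on squared Mahalanobis distances (MDs), d_i^2 = (x_i - xbar)' S^(-1) (x_i - xbar). A minimal classical sketch follows: it uses the sample mean vector and the (n - 1)-divisor covariance matrix, and solves S y = (x_i - xbar) rather than inverting S. The robust methods of Chapter 7 would replace xbar and S with their robust estimates, which this sketch does not compute. A convenient check is that classical squared MDs always sum to (n - 1) * p. All function names are hypothetical.

```python
def mean_and_cov(rows):
    """Sample mean vector and covariance matrix (divisor n - 1)."""
    n, p = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    return mu, cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for i in range(m - 1, -1, -1):
        x[i] = (aug[i][m] - sum(aug[i][j] * x[j] for j in range(i + 1, m))) / aug[i][i]
    return x


def squared_mds(rows):
    """Classical squared Mahalanobis distance of each row from the mean."""
    mu, cov = mean_and_cov(rows)
    out = []
    for r in rows:
        diff = [r[j] - mu[j] for j in range(len(mu))]
        y = solve(cov, diff)
        out.append(sum(d * yi for d, yi in zip(diff, y)))
    return out


data = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 7]]
d2 = squared_mds(data)
print([round(v, 3) for v in d2], sum(d2))  # sum equals (n - 1) * p = 8 up to rounding
```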
8.2.1 No Non-detects
8.2.1.1 MDs Q-Q Plots with Limits
1. Click on QA/QC > Multivariate > No NDs > MDs Q-Q Plots with Limits.
2. The "Select Variables" screen (Section 3.4) will appear.
o Click on the "Options" button for the options window.
[Screenshot: Multivariate Options window. Select Method: Classical, Sequential Classical, Huber, MVT, PROP, MCD. Select Initial Estimates: Classical, Sequential Classical, Robust (Median, 1.48MAD), OKG (Maronna Zamar), KG (Not Orthogonalized), MCD. Critical Alpha 0.05; maximum 25 iterations; Influence Alpha 0.05; MDs Distribution Beta or Chisquared; Control Limits at: Critical Value of Individual MD, Critical Value of Maximum MD; Use Default Title; Title for Chart "Q-Q Plot of MDs with Limits."]
o Specify the method for computing the quantiles in "Select
Methods." The default method is "PROP."
o The robust methods require input parameters such as
"Influence Alpha" or "Trimming Percentage," "Initial
Estimates," "MDs Distribution," and "# Iterations."
o Specify the "Critical Alpha for Limits" for identifying the
outliers. Default is "0.05."
o Specify the lines for the control limits using the "Control
Limits at:" option. Both options are unchecked by default.
o Click "OK" to continue or "Cancel" to cancel the options.
o Click on "OK" to continue or "Cancel" to cancel the MDs Q-Q Plots with
limits.
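The "MDs Distribution: Beta" option and the two control-limit lines can be illustrated with the scaled beta distribution for classical squared MDs, d^2 ~ ((n-1)^2/n) * Beta(p/2, (n-p-1)/2). The sketch assumes the individual-MD line sits at the 1 - alpha quantile and the maximum-MD line at the 1 - alpha/n quantile. For n = 21 and p = 4 (the Stackloss setting) it gives about 8.17 and 11.89, close to the 11.89 maximum limit in the Stackloss MDs control chart; Scout's own code, including its PROP weighting, is not reproduced. The regularized incomplete beta function is evaluated with the standard continued-fraction (modified Lentz) method, and all function names are hypothetical.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))          # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1) / (a + b + 2):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def beta_ppf(q, a, b):
    """Quantile of Beta(a, b) by bisection on betainc."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if betainc(a, b, mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def md_critical_value(n, p, alpha):
    """(1 - alpha) quantile of the scaled beta distribution of squared MDs."""
    return (n - 1) ** 2 / n * beta_ppf(1 - alpha, p / 2, (n - p - 1) / 2)


n, p, alpha = 21, 4, 0.05
individual = md_critical_value(n, p, alpha)   # warning (individual MD) line
maximum = md_critical_value(n, p, alpha / n)  # largest-MD line
print(round(individual, 2), round(maximum, 2))
```

Unlike the chi-square approximation, the scaled beta distribution accounts for estimating the mean and covariance from the same n observations, which matters most for small samples such as the 21 Stackloss runs.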
Output example (Using All Data): The data set "Stackloss.xls" was used for the Q-Q
Plot. The options used were the default options.
[Figure: PROP Q-Q Plot of MDs with Limits (Stackloss data, n = 21): Squared MDs versus Scaled Beta Quantiles, with the individual-MD limit line and the maximum-MD limit line at 11.8906.]
Note: The observations above the maximum limit line are considered as outliers.
Output example (Using Training/Background): The data set "FullIRIS.xls" was used
for the Q-Q Plot. The options used were the default options.
Note: The observations in "dark blue" with bigger point marks are from group 3. The estimates of the mean
vector and the covariance matrix are calculated using only the observations from groups 1 and 2, which
were used as the "Training/Background" set for this data set.
8.2.1.2 MDs Control Chart
1. Click on QA/QC > Multivariate > No NDs > MDs Control Chart.
o Specify the lines for the control limits using the "Control
Limits at:" option. Both options are unchecked by default.
o Click "OK" to continue or "Cancel" to cancel the options.
• Click on "OK" to continue or "Cancel" to cancel the MDs Control Charts.
Output example (Using All Data): The data set "Stackloss.xls" was used for the MDs
Control Charts. The options used were the default options.
[Figure: PROP Multivariate Control Chart. Y axis: Squared MDs; X axis: Index of Observations. Horizontal lines mark the 95% Maximum (Largest MD) Limit (11.89) and the 95% Warning (Individual MD) Limit.]
Note: The observations above the maximum limit line are considered as outliers.
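Reading the control chart amounts to comparing each squared MD with the two horizontal lines. The sketch below takes both limits as inputs and labels each observation. The function name and the label strings are hypothetical.

```python
def classify_mds(d2, warning_limit, maximum_limit):
    """Label each squared MD relative to the warning and maximum limit lines.

    Returns (observation index starting at 1, squared MD, label) tuples.
    """
    labels = []
    for i, v in enumerate(d2, start=1):
        if v > maximum_limit:
            labels.append((i, v, "outlier"))      # above the maximum (largest MD) line
        elif v > warning_limit:
            labels.append((i, v, "warning"))      # between the warning and maximum lines
        else:
            labels.append((i, v, "in control"))   # at or below the warning line
    return labels
```

Only the observations labeled "outlier" correspond to the points the note above treats as outliers. Observations labeled "warning" exceed the individual limit but not the maximum limit.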
Output example (Using Training/Background): The data set "FullIris.xls" was used
for the MDs Control Charts. The options used were the default options.
[Figure: PROP Multivariate Control Chart Using Training Set. X axis: Index of Observations. Training data statistics: n = 100, initial estimates OKG, influence alpha 0.05, MD distribution Beta, 25 iterations. Test data statistics: test set group ID = 3, n = 50, p = 4. Horizontal lines mark the 95% Maximum (Largest MD) Limit (18.4338) and the 95% Warning (Individual MD) Limit.]
Note: The observations shown as "bigger points" in "dark blue" are from group 3. The mean
vector and the covariance matrix are calculated using only the observations from groups 1 and 2, which
were used as the "Training/Background" set for this data set.
8.2.1.3 Prediction and Tolerance Ellipsoids
1. Click on QA/QC > Multivariate > No NDs > Prediction and Tolerance
Ellipsoids.
2. The following "Select Variables" screen will appear.
o Click on the "Options" button for the options window.
[Options window: Options Outlier Method Comparison]
Select Ellipse(s): Classical, Robust
Ellipse Grouping: All Data, By Group
Classical Cutoff for Contours: Critical Alpha 0.05
Robust Cutoff for Contours: Critical Alpha 0.05
Use Default Title; Title for Graph
Classical Contour Plots: Individual [DOcut] (selected), Simultaneous [Max MD], Simultaneous/Individual
Robust Contour Plots: Individual [DOcut] (selected), Simultaneous [Max MD], Simultaneous/Individual
Label Individual Points: Observation Number, By Group Designation (selected)
Prediction and/or Tolerance Ellipsoids
Select Estimation Method(s): Sequential Classical, Huber, PROP, MVT, MCD
Select Initial Estimates: Classical, Robust (Median, 1.48MAD), OKG [Maronna Zamar] (selected), KG (Not Orthogonalized), MCD
MDs Distribution: Beta (selected), Chisquare
Select Number of Iterations: 10 [Max = 50]
Huber and/or PROP Influence Function Alpha: 0.05
o Specify the required options. These options are discussed in
Section 7.2.3. The user can also draw the ellipsoids by group
when the observations come from different groups.
o Click "OK" to continue or "Cancel" to cancel the options.
o Click on "OK" to continue or "Cancel" to cancel the Ellipsoids.
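Each ellipse is a contour of constant squared MD around a mean. It is the set of points x with (x - mean)' S^{-1} (x - mean) = c, where c is the cutoff set by the critical alpha. The sketch below traces such a contour for two variables using the Cholesky factor of S. It takes the mean, covariance, and cutoff as given, so it does not reproduce Scout's classical or robust estimators. The function name is hypothetical.

```python
import math


def ellipse_points(mean, cov, cutoff, k=100):
    """k points on {x : (x - mean)' cov^{-1} (x - mean) = cutoff} for 2 variables."""
    (s11, s12), (_, s22) = cov
    # Lower-triangular Cholesky factor L with cov = L L'.
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 ** 2)
    r = math.sqrt(cutoff)
    pts = []
    for j in range(k):
        t = 2.0 * math.pi * j / k
        # u lies on a circle of radius sqrt(cutoff); x = mean + L u then has
        # squared MD u'u = cutoff.
        u1, u2 = r * math.cos(t), r * math.sin(t)
        pts.append((mean[0] + l11 * u1, mean[1] + l21 * u1 + l22 * u2))
    return pts
```

Drawing the ellipses by group repeats this with each group's own mean and covariance.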
Output example (Using All Data): The data set "FullIris.xls" was used for the
Ellipsoids. The options used are shown in the options window above. The
ellipses are drawn by group.
[Figure: Prediction Ellipsoids. X axis: sp-width; observations are marked by group (1, 2, 3).]
Output example (Using Background/Training): The data set "FullIris.xls" was used
for the Ellipsoids.
Note: The observations shown as "bigger points" in "dark blue" are from group 3. The mean
vector and the covariance matrix are calculated using only the observations from groups 1 and 2, which
were used as the "Training/Background" set for this data set.
8.2.2 With Non-detects
8.2.2.1 MDs Control Charts
1. Click on QA/QC > Multivariate > With NDs > MDs Control Charts.
2. The "Select Variables" screen (Section 3.4) will appear.
o Click on the "Options" button for the options window.
[Options window: QA/QC NDs Multivariate Control Charts Options]
Critical Alpha: 0.05
Graphs with NDs replaced by: Detection Limit (No Change) (selected), Normal ROS Estimates, Gamma ROS Estimates, Lognormal ROS Estimates, One Half (1/2) Detection Limit, Zero
MDs (KM Estimates) Distribution: Beta (selected), Chisquared
Control Limits at: Critical Value of Individual MD, Critical Value of Maximum MD
Use Default Title
OK / Cancel
o Specify the "Critical Alpha" for identifying the outliers.
Default is "0.05."
o Specify the distribution for the distances using the "MDs (KM
Estimates) Distribution" box.
o Specify the lines for control limits by using the "Control
Limits at:" option. Both options are unchecked by default.
o Click "OK" to continue or "Cancel" to cancel the options.
Click on "OK" to continue or "Cancel" to cancel the MDs Control Charts.
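The "Graphs with NDs replaced by" choices change only the values plotted for the non-detects. The sketch below covers the three simple substitution rules: detection limit (no change), one half of the detection limit, and zero. It assumes each value carries a detected/non-detect flag and that a non-detect's reported value is its detection limit. The ROS options fit a regression on order statistics and are not covered here. The function name and rule keys are hypothetical.

```python
def replace_nds(values, detected, rule="dl"):
    """Return plotting values with non-detects replaced by the chosen rule.

    values   : reported values; for a non-detect this is assumed to be its
               detection limit.
    detected : True for detected values, False for non-detects.
    rule     : "dl" (no change), "half_dl", or "zero".
    """
    factors = {"dl": 1.0, "half_dl": 0.5, "zero": 0.0}
    if rule not in factors:
        raise ValueError(f"unsupported rule: {rule!r}")
    f = factors[rule]
    return [v if det else f * v for v, det in zip(values, detected)]
```

Detected values pass through unchanged under every rule. Only the non-detect points move on the graph.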
Output example (Using All Data): The data set "FullIris.xls" was used for the control
chart. The options used were the default options.
[Figure: Kaplan Meier Multivariate Control Chart. Legend: n = 150, p = 4, non-detects shown in red, critical alpha 0.05, MD distribution Beta. The 95% Maximum (Largest MD) Limit is 19.7317.]
Output example (Using Training/Background): The data set "FullIris.xls" was used for the
control charts. The options used were the default options.
[Figure: Kaplan Meier Multivariate Control Chart Using Training Set. X axis: Index of Observations. Training data statistics: n = 100, critical alpha 0.05, MD distribution Beta. Test data statistics: test set group ID = 3, n = 50. Horizontal lines mark the 95% Maximum (Largest MD) Limit (18.4338) and the 95% Warning (Individual MD) Limit (9.7163).]
Note: The observations shown as "bigger points" in "dark blue" are from group 3. The mean
vector and the covariance matrix are calculated using only the observations from groups 1 and 2, which
were used as the "Training/Background" set for this data set. The non-detect observations are shown as
red points.
8.2.2.2 Prediction and Tolerance Ellipsoids
1. Click on QA/QC > Multivariate > With NDs > Prediction and Tolerance
Ellipsoids.
2. The following "Select Variables" screen will appear.
o Click on the "Options" button for the options window.
[Options window: Options Ellipsoids with Non-Detects]
Ellipse Grouping: All Data, By Group
Label Individual Points: Observation Number, By Group Designation (selected)
Kaplan Meier Contour Plots: Individual [DOcut], Simultaneous [Max MD], Simultaneous/Individual (selected)
Use Default Title
Simultaneous Contour Cutoff: Critical Alpha 0.05
Individual Contour Cutoff: Critical Alpha 0.05
MDs Distribution: Beta (selected), Chisquare
Graphs with NDs replaced by: Detection Limit (No Change) (selected), Normal ROS Estimates, Gamma ROS Estimates, Lognormal ROS Estimates, One Half (1/2) Detection Limit, Zero
OK / Cancel
o Specify the required options. These options include "Kaplan
Meier Contour Plots," "Critical Alpha(s)," "MDs
Distribution" for the contours, and the "Graphs with NDs
replaced by" option. The user can also draw the
ellipsoids by group when the observations come from different
groups.
o Click "OK" to continue or "Cancel" to cancel the options.
Click on "OK" to continue or "Cancel" to cancel the Ellipsoids.
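The Individual and Simultaneous contours differ only in the cutoff applied to the squared MD. For a two-variable plot with the Chisquare option, the chi-square distribution with 2 degrees of freedom has the closed-form quantile c = -2 ln(alpha), so a sketch is short. It assumes the mean and covariance (Kaplan Meier estimates in this module) are already computed. It also assumes the simultaneous cutoff uses a Bonferroni level alpha/n for the largest MD. That rule and the function name are illustrative assumptions, not Scout's documented formulas.

```python
import math


def chi2_2df_cutoffs(alpha, n):
    """Squared-MD contour cutoffs for p = 2 under a chi-square(2) reference.

    The upper-tail probability of chi-square(2) at c is exp(-c / 2),
    so the upper-alpha quantile is -2 ln(alpha).
    """
    individual = -2.0 * math.log(alpha)
    # Assumed Bonferroni adjustment over n observations for the simultaneous contour.
    simultaneous = -2.0 * math.log(alpha / n)
    return individual, simultaneous


ind, sim = chi2_2df_cutoffs(0.05, 150)
print(f"individual = {ind:.4f}, simultaneous = {sim:.4f}")
```

The simultaneous contour is always the larger of the two. With the Simultaneous/Individual choice, both contours would be drawn around each group.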
Output example (Using All Data): The data set "FullIris.xls" was used for the
Ellipsoids. The options used are shown in the options window above. The
ellipses are drawn by group.
[Figure: Kaplan Meier Prediction Ellipsoids. X axis: sp-width; observations are marked by group (1, 2, 3). Legend: 150 observations, critical alpha 0.05, MD distribution Beta, non-detects shown in red.]
Output example (Using Background/Training): The data set "FullIris.xls" was used
for the Ellipsoids.
Note: The observations shown as "bigger points" in "dark blue" are from group 3. The mean
vector and the covariance matrix are calculated using only the observations from groups 1 and 2, which
were used as the "Training/Background" set for this data set.