United States
Environmental Protection
Agency

Scout 2008 Version 1.0
User Guide
Part II

RESEARCH AND DEVELOPMENT


-------
EPA/600/R-08/038
February 2009


Scout 2008 Version 1.0

User Guide

(Second Edition, December 2008)

John Nocerino

U.S. Environmental Protection Agency
Office of Research and Development
National Exposure Research Laboratory
Environmental Sciences Division
Technology Support Center
Characterization and Monitoring Branch
944 E. Harmon Ave.
Las Vegas, NV 89119

Anita Singh, Ph.D.1
Robert Maichle1
Narain Armbya1
Ashok K. Singh, Ph.D.2

1Lockheed Martin Environmental Services
1050 E. Flamingo Road, Suite N240
Las Vegas, NV 89119

2Department of Hotel Management
University of Nevada, Las Vegas
Las Vegas, NV 89154

Although this work was reviewed by EPA and approved for publication, it may not necessarily reflect official
Agency policy. Mention of trade names and commercial products does not constitute endorsement or
recommendation for use.

U.S. Environmental Protection Agency
Office of Research and Development
Washington, DC 20460



-------
Notice

The United States Environmental Protection Agency (EPA) through its Office of
Research and Development (ORD) funded and managed the research described here. It
has been peer reviewed by the EPA and approved for publication. Mention of trade
names and commercial products does not constitute endorsement or recommendation by
the EPA for use.

The Scout 2008 software was developed by Lockheed-Martin under a contract with the
USEPA. Use of any portion of Scout 2008 that does not comply with the Scout 2008
User Guide is not recommended.

Scout 2008 contains embedded licensed software. Any modification of the Scout 2008
source code may violate the embedded licensed software agreements and is expressly
forbidden.

The Scout 2008 software provided by the USEPA was scanned with McAfee VirusScan
and is certified free of viruses.

With respect to the Scout 2008 distributed software and documentation, neither the
USEPA, nor any of their employees, assumes any legal liability or responsibility for the
accuracy, completeness, or usefulness of any information, apparatus, product, or process
disclosed. Furthermore, the Scout 2008 software and documentation are supplied "as-is"
without guarantee or warranty, expressed or implied, including without limitation, any
warranty of merchantability or fitness for a specific purpose.

iii


-------
Acronyms and Abbreviations

% NDs	Percentage of Non-detect observations

ACL	alternative concentration limit

A-D, AD	Anderson-Darling test

AM	arithmetic mean

ANOVA	Analysis of Variance

AOC	area(s) of concern

B*	Between groups matrix

BC	Box-Cox-type transformation

BCA	bias-corrected accelerated bootstrap method

BD	break down point

BDL	below detection limit

BTV	background threshold value

BW	Black and White (for printing)

CERCLA	Comprehensive Environmental Response, Compensation, and Liability Act

CL	compliance limit, confidence limits, control limits

CLT	central limit theorem

CMLE	Cohen's maximum likelihood estimate

COPC	contaminant(s) of potential concern

CV	Coefficient of Variation, cross validation

D-D	distance-distance

DA	discriminant analysis

DL	detection limit

DL/2 (t)	UCL based upon DL/2 method using Student's t-distribution cutoff value

DL/2 Estimates	estimates based upon data set with non-detects replaced by half of the respective detection limits

DQO	data quality objective

DS	discriminant scores

EA	exposure area

EDF	empirical distribution function

EM	expectation maximization

EPA	Environmental Protection Agency

EPC	exposure point concentration

FP-ROS (Land)	UCL based upon fully parametric ROS method using Land's H-statistic

v


-------
Gamma ROS (Approx.)	UCL based upon Gamma ROS method using the gamma approximate-UCL method

Gamma ROS (BCA)	UCL based upon Gamma ROS method using the bias-corrected accelerated bootstrap method

GOF, G.O.F.	goodness-of-fit

H-UCL	UCL based upon Land's H-statistic

HBK	Hawkins Bradu Kaas

HUBER	Huber estimation method

ID	identification code

IQR	interquartile range

K	Next K, Other K, Future K

KG	Kettenring Gnanadesikan

KM (%)	UCL based upon Kaplan-Meier estimates using the percentile bootstrap method

KM (Chebyshev)	UCL based upon Kaplan-Meier estimates using the Chebyshev inequality

KM (t)	UCL based upon Kaplan-Meier estimates using the Student's t-distribution cutoff value

KM (z)	UCL based upon Kaplan-Meier estimates using standard normal distribution cutoff value

K-M, KM	Kaplan-Meier

K-S, KS	Kolmogorov-Smirnov

LMS	least median squares

LN	lognormal distribution

Log-ROS Estimates	estimates based upon data set with extrapolated non-detect values obtained using robust ROS method

LPS	least percentile squares

MAD	Median Absolute Deviation

Maximum	Maximum value

MC	minimization criterion

MCD	minimum covariance determinant

MCL	maximum concentration limit

MD	Mahalanobis distance

Mean	classical average value

Median	Median value

Minimum	Minimum value

MLE	maximum likelihood estimate

MLE (t)	UCL based upon maximum likelihood estimates using Student's t-distribution cutoff value

vi


-------
MLE (Tiku)	UCL based upon maximum likelihood estimates using Tiku's method

Multi Q-Q	multiple quantile-quantile plot

MVT	multivariate trimming

MVUE	minimum variance unbiased estimate

ND	non-detect or non-detects

NERL	National Exposure Research Laboratory

NumNDs	Number of Non-detects

NumObs	Number of Observations

OKG	Orthogonalized Kettenring Gnanadesikan

OLS	ordinary least squares

ORD	Office of Research and Development

PCA	principal component analysis

PCs	principal components

PCS	principal component scores

PLs	prediction limits

PRG	preliminary remediation goals

PROP	proposed estimation method

Q-Q	quantile-quantile

RBC	risk-based cleanup

RCRA	Resource Conservation and Recovery Act

ROS	regression on order statistics

RU	remediation unit

S	substantial difference

SD, Sd, sd	standard deviation

SLs	simultaneous limits

SSL	soil screening levels

S-W, SW	Shapiro-Wilk

TLs	tolerance limits

UCL	upper confidence limit

UCL95, 95% UCL	95% upper confidence limit

UPL	upper prediction limit

UPL95, 95% UPL	95% upper prediction limit

USEPA	United States Environmental Protection Agency

UTL	upper tolerance limit

Variance	classical variance

W*	Within groups matrix

vii


-------
WiB matrix	Inverse of W* cross-product B* matrix

WMW	Wilcoxon-Mann-Whitney

WRS	Wilcoxon Rank Sum

WSR	Wilcoxon Signed Rank

Wsum	Sum of weights

Wsum2	Sum of squared weights

viii


-------
Table of Contents

Notice	iii

Acronyms and Abbreviations	v

Table of Contents	ix

Chapter 7	223

Outliers and Estimates	223

7.1	Univariate Outliers and Estimates	223

7.1.1	Dixon Test for Univariate Data	225

7.1.2	Rosner's Test for Univariate Data	226

7.1.3	MD-Based (Grubbs Test) Test for Univariate Data	228

7.1.4	Biweight Estimate for Univariate Data	229

7.2	Robust Estimation and Identification of Multiple Multivariate Outliers	231

7.2.1	Classical Outlier Testing	231

7.2.1.1	Mahalanobis' Distances	231

7.2.1.2	Multivariate Kurtosis	233

7.2.1.3	Identifying Causal Variables	235

7.2.2	Robust Outlier Testing	240

7.2.2.1	Sequential Classical	243

7.2.2.2	Huber	250

7.2.2.3	Extended MCD	259

7.2.2.4	MVT	268

7.2.2.5	PROP	278

7.2.3	Method Comparisons	289

References	299

Chapter 8	303

QA/QC	303

8.1	Univariate QA/QC	303

8.1.1	No Non-detects	303

8.1.1.1	Q-Q Plots with Limits	303

8.1.1.2	Interval Graphs	306

8.1.1.2.1	Compare Intervals	306

8.1.1.2.2	Intervals Index Plots	310

8.1.1.3	Control Charts	314

8.1.1.3.1	Using All Data	314

8.1.1.3.2	Using Training/Background	315

8.1.2	With Non-detects	318

8.1.2.1	Interval Graphs	318

8.1.2.2	Control Charts	321

8.1.2.2.1	Using All Data	321

8.1.2.2.2	Using Training/Background	322

8.2	Multivariate QA/QC	325

8.2.1 No Non-detects	325

8.2.1.1	MDs Q-Q Plots with Limits	325

8.2.1.2	MDs Control Chart	328

8.2.1.3	Prediction and Tolerance Ellipsoids	331

ix


-------
8.2.2 With Non-detects	334

8.2.2.1	MDs Control Charts	334

8.2.2.2	Prediction and Tolerance Ellipsoids	337

x


-------
Chapter 7

Outliers and Estimates

Outliers are inevitable in data sets originating from various applications. Many
graphical (Q-Q plots, box plots), classical (Dixon, Rosner, Welch, Max MD), sequential
classical (Max MD, Kurtosis), and robust estimation and outlier identification methods
(Biweight, Huber, MCD, MVE, MVT, OKG, PROP) are available in the literature. Classical
outlier tests suffer from masking effects (e.g., extreme outliers may mask intermediate
outliers). The use of robust outlier identification procedures is recommended to identify
multiple outliers, especially when dealing with multivariate (having multiple
contaminants) data sets. Several univariate and multivariate outlier identification
methods, both classical and robust (e.g., based upon Biweight, Huber, and PROP
influence functions), are available in this Scout software package.

7.1 Univariate Outliers and Estimates

For historical reasons and also for the sake of comparison, some simple classical outlier
tests are also included in the Scout software package. Specifically, the Dixon and Rosner
tests, which are often cited in environmental literature, are available in Scout.
For details, refer to the ProUCL 4.00.04 Technical Guide. Those classical tests may be used
on data sets with and without non-detect observations. For data sets with non-detects,
two options are available in Scout for outlier testing: 1) exclude the non-detects, and
2) replace the NDs by DL/2 values. Those options are used only to identify
outliers and not to compute any estimates and limits used in the decision-making process.
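
The two non-detect options described above can be sketched as follows. This is only an illustration, not Scout's code: the function name, the argument layout (a value list plus a parallel detected/non-detect flag list), and the sample numbers are all assumptions made for the example.

```python
def prepare_for_outlier_test(values, detected, option="exclude"):
    """Return the values to pass to an outlier test.

    values   -- reported results; for a non-detect (ND) this is its detection limit (DL)
    detected -- parallel list of booleans; False marks a non-detect
    option   -- "exclude" drops the NDs, "half_dl" replaces each ND by DL/2
    """
    if len(values) != len(detected):
        raise ValueError("values and detected must have the same length")
    if option == "exclude":
        return [v for v, d in zip(values, detected) if d]
    if option == "half_dl":
        return [v if d else v / 2.0 for v, d in zip(values, detected)]
    raise ValueError("option must be 'exclude' or 'half_dl'")


# Hypothetical data: the first and third results are non-detects reported at their DLs.
results = [0.5, 3.2, 1.0, 7.9, 41.0]
flags = [False, True, False, True, True]
print(prepare_for_outlier_test(results, flags, "exclude"))
print(prepare_for_outlier_test(results, flags, "half_dl"))
```

Either list would then be used only for outlier screening, in line with the caution above that these substitutions are not meant for computing estimates or limits.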

It is suggested that the classical outlier identification procedures (and also the robust
procedures to be described later) be supplemented with graphical displays such as Q-Q
plots, box-and-whisker plots (also called box plots), and interquartile range (IQR) plots
(upper quartile, Q3, and lower quartile, Q1). Those graphical displays are available in
Scout. Box plots with whiskers are sometimes used to identify univariate outliers (e.g.,
EPA 2006). Typically, a box plot gives a good indication of extreme observations (outliers)
that may be present in a data set. The statistics (lower quartile, median, upper
quartile, and IQR) used in the construction of a box plot do not get distorted by outliers.
On a box plot, observations beyond the two whiskers may be considered to be candidates
for potential outliers.
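
As a minimal sketch of the box plot screening idea, the snippet below lists observations that fall beyond the whiskers. The whisker rule used here (1.5 times the IQR beyond the quartiles) and the quartile interpolation method are common conventions assumed for illustration; the guide does not state which rule Scout applies, and the data are hypothetical.

```python
import statistics


def boxplot_outlier_candidates(data, k=1.5):
    """Return observations lying beyond Q1 - k*IQR or Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]


# Hypothetical concentrations with one clearly elevated value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(boxplot_outlier_candidates(sample))
```

Because the quartiles and the IQR are barely affected by the elevated value, the fences stay close to the bulk of the data, which is the robustness property noted above.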

On a normal Q-Q plot, observations that are well separated from the bulk (central part) of
the data typically represent potential outliers needing further investigation. Moreover,
significant and obvious jumps and breaks in a Q-Q plot (for any distribution) are
indications of the presence of more than one population. Data sets exhibiting such
behavior on Q-Q plots should be partitioned into component sub-populations before
estimating various statistics of interest (e.g., prediction intervals, confidence intervals).
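
The coordinates of a normal Q-Q plot can be computed as in the sketch below, which pairs each ordered observation with a standard normal quantile. The plotting position (i - 0.5)/n is an assumption chosen for the example (other positions are in common use), and the function name and data are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(data):
    """Return (theoretical normal quantile, ordered observation) pairs."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (std_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(ordered, start=1)
    ]


# Hypothetical data: the last ordered value sits far above the rest,
# which would appear as a jump at the upper end of the plot.
for quantile, value in normal_qq_points([4.1, 3.8, 5.0, 4.4, 4.7, 19.6]):
    print(f"{quantile:7.3f}  {value}")
```

Plotting the second coordinate against the first gives the display described above; points well separated from the central pattern are the ones to investigate.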

223


-------
Dixon's Test (Extreme Value Test).

o Used to identify statistical outliers when the sample size is less than or equal
to 25.

o Used to identify outliers or extreme values in both the left tail (Case 1) and the
right tail (Case 2) of a data distribution. In environmental data sets, extremes
found in the right tail may represent potentially contaminated site areas
needing further investigation or remediation. The extremes in the left tail may
represent ND values.

o Assumes that the data without the suspected outliers are normally distributed;
therefore, it is necessary to perform a test for normality on the data without
the suspected outliers before applying this test.

o May suffer from masking in the presence of multiple outliers. This means that
if more than one outlier is suspected, this test may fail to identify all of the
outliers. Therefore, if you decide to use the Dixon's test for multiple outliers,
apply the test to the least extreme value first.
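
A minimal sketch of the Dixon ratio statistic follows. It uses the ratio definitions (r10, r11, r21, r22) and sample-size ranges commonly tabulated for Dixon's test; whether Scout selects the ratio in exactly this way is an assumption, and the function name and data are hypothetical. The resulting statistic must still be compared with tabulated critical values, which are not reproduced here.

```python
def dixon_statistic(data, tail="upper"):
    """Dixon ratio for the most extreme value in the chosen tail (3 <= n <= 25)."""
    x = sorted(data)
    n = len(x)
    if not 3 <= n <= 25:
        raise ValueError("this sketch covers sample sizes 3 through 25 only")
    if tail == "lower":
        # Mirror the data so that the smallest value is treated as the largest.
        x = sorted(-v for v in x)
    elif tail != "upper":
        raise ValueError("tail must be 'upper' or 'lower'")
    # (gap, low_index): numerator is x[n] - x[n - gap], denominator is x[n] - x[low_index + 1]
    if n <= 7:
        gap, low_index = 1, 0   # r10
    elif n <= 10:
        gap, low_index = 1, 1   # r11
    elif n <= 13:
        gap, low_index = 2, 1   # r21
    else:
        gap, low_index = 2, 2   # r22
    denominator = x[-1] - x[low_index]
    if denominator == 0:
        raise ValueError("ratio undefined: the values used in the denominator are equal")
    return (x[-1] - x[-1 - gap]) / denominator


# Hypothetical data with one high value.
print(round(dixon_statistic([1, 2, 3, 4, 10], tail="upper"), 4))
print(round(dixon_statistic([1, 2, 3, 4, 10], tail="lower"), 4))
```

A value larger than the tabulated critical value for the sample size and significance level would flag the extreme observation as a potential outlier.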

Rosner's Test.

o Can be used to identify and detect up to 10 outliers in data sets of sizes 25 and
higher.

o Assumes that the data are normally distributed; therefore, it is necessary to
perform a test for normality before applying this test.
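
The sequence of test statistics used in Rosner's procedure can be sketched as below: at each step the observation farthest from the current mean is recorded and removed. This follows the usual generalized extreme studentized deviate formulation and is an assumption about the computation, not Scout's code; the critical values (which depend on the sample size, the step, and the significance level) are not computed here, and the data are hypothetical.

```python
import statistics


def rosner_statistics(data, k):
    """Return up to k tuples (mean, sd, suspect, test value), one per removal step."""
    remaining = list(data)
    if k < 1 or len(remaining) - k < 2:
        raise ValueError("k must be at least 1 and leave two or more observations")
    steps = []
    for _ in range(k):
        mean = statistics.mean(remaining)
        sd = statistics.stdev(remaining)  # sample standard deviation (divisor n - 1)
        if sd == 0:
            break  # all remaining values are identical; nothing left to test
        suspect = max(remaining, key=lambda v: abs(v - mean))
        steps.append((mean, sd, suspect, abs(suspect - mean) / sd))
        remaining.remove(suspect)
    return steps


# Hypothetical data with one elevated value, testing for a single suspected outlier.
for mean, sd, suspect, value in rosner_statistics([1, 2, 3, 4, 5, 20], k=1):
    print(f"mean={mean:.3f} sd={sd:.3f} suspect={suspect} test value={value:.3f}")
```

Each test value is compared with the critical value for its step; the number of outliers declared is determined by the last step at which the critical value is exceeded.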

Depending upon the selected variables and the number of observations associated with
them, either the Dixon's Test or the Rosner's Test will be performed.

Biweight Estimates.

o Based on the estimation methods of Mosteller and Tukey (1977), Kafadar
(1981) and Lax (1985).

MD-based test (Grubbs Test)

o This is the multivariate extension of the univariate test known as the Grubbs
test. It is based on the assumption of normality. The generalized distances of
the multivariate data are calculated and the observation with the distance
greater than the critical value is expunged from the data set. The test is
iterated until no outliers are detected.
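
For a single variable, the iteration described above can be sketched as follows. The distance is taken to be the squared standardized deviation from the mean, which is an assumption about how the univariate case is computed. The `critical_value` argument is a hypothetical caller-supplied function returning the cutoff for a sample of size n; in practice the cutoff comes from tables or software, and the constant used in the example is purely illustrative.

```python
import statistics


def md_based_outliers(data, critical_value):
    """Remove the most distant observation repeatedly while it exceeds the cutoff.

    critical_value -- function n -> cutoff for the largest squared distance
                      in a sample of size n (supplied by the caller)
    """
    remaining = list(data)
    outliers = []
    while len(remaining) > 2:
        mean = statistics.mean(remaining)
        variance = statistics.variance(remaining)  # divisor n - 1
        if variance == 0:
            break
        suspect = max(remaining, key=lambda v: (v - mean) ** 2)
        max_md = (suspect - mean) ** 2 / variance
        if max_md <= critical_value(len(remaining)):
            break  # largest distance is not significant, so stop
        outliers.append(suspect)
        remaining.remove(suspect)
    return outliers


# Hypothetical data and a made-up constant cutoff, for illustration only.
print(md_based_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 60], lambda n: 6.0))
```

Recomputing the mean and variance after each removal is what makes the procedure sequential; a single extreme value inflates the variance and can hide the next one until it has been removed.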

224


-------
7.1.1 Dixon Test for Univariate Data
1. Click Outliers/Estimates > Univariate > Dixon.

[Screenshot: Scout main window with the Outliers/Estimates > Univariate menu open, listing Dixon, Rosner, Biweight Estimates, and MD Based.]

2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o The default number of suspected outliers is 2. In order to use this test,
the user has to obtain an initial guess about the number of outliers that
may be present in the data set. This can be done by using graphical
displays such as a Q-Q plot. On this graphical Q-Q plot, higher
observations that are well separated from the rest of the data may be
considered to be potential or suspected outliers.

o Click on the "OK" button to continue or on the "Cancel" button to cancel
the Outliers tests.

225


-------
Output for Dixon Test.

Dixon Outlier Test for Selected Variables

User Selected Options
Date/Time of Computation	7/10/2007 3:53:44 PM
From File	D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
Full Precision	OFF
Test for Suspected Outliers using Dixon Test	1

Dixon's Outlier Test for Group2X

Number of data = 20
10% critical value: 0.401
5% critical value: 0.45
1% critical value: 0.535

1. 37.867 is a Potential Outlier (Upper Tail)

Test Statistic: 0.383

For 10% significance level, 37.867 is not an outlier.
For 5% significance level, 37.867 is not an outlier.
For 1% significance level, 37.867 is not an outlier.

2. 1.5 is a Potential Outlier (Lower Tail)

Test Statistic: 0.198

For 10% significance level, 1.5 is not an outlier.
For 5% significance level, 1.5 is not an outlier.
For 1% significance level, 1.5 is not an outlier.

7.1.2 Rosner's Test for Univariate Data

1. Click Outliers/Estimates > Univariate > Rosner.

[Screenshot: Scout main window with the Outliers/Estimates > Univariate menu open and Rosner selected.]

2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

226


-------
o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o Click on the "Options" button.

o The default number for suspected outliers is "1." In order to use this test,
the user has to obtain an initial guess about the number of outliers that
may be present in the data set. This can be done by using graphical
displays such as a Q-Q plot. On this graphical Q-Q plot, higher
observations that are well separated from the rest of the data may be
considered to be potential or suspected outliers.

o Click "OK" to continue or "Cancel" to cancel the Outliers tests.

Output for Rosner Test.

Rosner Outlier Test for Selected Variables

User Selected Options
Date/Time of Computation	7/10/2007 3:56:51 PM
From File	D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
Full Precision	OFF
Test for 'N' Suspected Outliers using Rosner	1

Rosner's Outlier Test for X

Number of data	53
Number of suspected outliers	1

Mean	sd	Potential outlier	Test value	Critical value (5%)	Critical value (1%)
5.11	4.337	12.11	1.61	3.151	3.504

For 5% Significance Level, there is no Potential Outlier

For 1% Significance Level, there is no Potential Outlier

227


-------
7.1.3 MD-Based (Grubbs Test) Test for Univariate Data
1. Click Outliers/Estimates > Univariate > MD-Based.

[Screenshot: Scout main window with the Outliers/Estimates > Univariate menu open and MD Based selected.]

2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o Click on "Options."

[Screenshot: Options dialog with "Select Critical Alpha" choices 0.010, 0.025, 0.050 (selected), 0.100, 0.150, 0.200, and 0.250, and "OK" and "Cancel" buttons.]

o Click "OK" to continue or "Cancel" to cancel the Outliers tests.

Output example: The data set "BRADU.xls" was used for the univariate MD-based
(Grubbs) test. The theoretical maximum MD at the selected critical alpha is calculated
and compared to the maximum MD obtained from the data set.

228


-------
Output for MD-Based Grubbs Test.

Univariate MD Based (Grubbs Test) Outlier Analysis

User Selected Options
Date/Time of Computation	1/7/2008 4:10:24 PM
From File	D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision	OFF

x1

No Outliers Present

Initial Conditions
Max MD	Max MD (0.05)
5.796	10.78


7.1.4 Biweight Estimate for Univariate Data

1. Click Outliers/Estimates > Univariate > Biweight.

[Screenshot: Scout main window with the Outliers/Estimates > Univariate menu open and Biweight Estimates selected.]

2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o Click on "Options."

229


-------
[Screenshot: Options dialog with fields for the Tukey Location Tuning Constant, Tukey Scale Tuning Constant, Lax/Kafadar Tuning Constant, and Maximum Number of Iterations (30), and "OK" and "Cancel" buttons.]

o Click "OK" to continue or "Cancel" to cancel the Outliers tests.

Output example: The estimates of location and scale were computed using Tukey's
bisquare function and the Kafadar biweight function.
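
To illustrate how a bisquare (biweight) function downweights extreme values, the sketch below iterates a weighted mean in which each observation's weight shrinks to zero as it moves away from the current center. It is a simplified location estimate only: the scale is held fixed at the median absolute deviation (MAD), the defaults (tuning constant 4, at most 30 iterations) simply mirror the option values shown for this menu, and the exact Tukey and Kafadar formulas used by Scout are not reproduced. The function name and data are hypothetical.

```python
import statistics


def biweight_location(data, c=4.0, max_iter=30, tol=1e-8):
    """Iterated biweight estimate of location with the scale fixed at the MAD."""
    values = list(data)
    center = statistics.median(values)
    mad = statistics.median([abs(x - center) for x in values])
    if mad == 0:
        return center  # more than half the data are identical
    for _ in range(max_iter):
        weights = []
        for x in values:
            u = (x - center) / (c * mad)
            # Bisquare weight: smooth inside |u| < 1, zero outside.
            weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
        total = sum(weights)
        if total == 0:
            break
        new_center = sum(w * x for w, x in zip(weights, values)) / total
        converged = abs(new_center - center) < tol
        center = new_center
        if converged:
            break
    return center


# Hypothetical data: the classical mean is pulled to about 19.2 by the value 100,
# while the biweight estimate stays near the bulk of the data.
sample = [1, 2, 3, 4, 5, 100]
print(statistics.mean(sample), biweight_location(sample))
```

The value 100 receives zero weight in every iteration, so the estimate settles at the center of the remaining five observations.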

Output for Biweight Estimates.

Univariate Biweight Outlier Analysis

User Selected Options
Date/Time of Computation	1/7/2008 4:14:02 PM
From File	D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision	OFF
Number of Iterations	30
Tukey Location Tuning Constant	4
Tukey Scale Tuning Constant	4
Kafadar Scale Tuning Constant	[illegible]

Robust Biweight Estimates

Variable	Obs No.	Classical Location	Classical Scale	Tukey One-Step Location	Tukey One-Step Scale	Tukey Final Location	Tukey Final Scale	Kafadar Final Location	Kafadar Final Scale
x1	75	3.207	3.653	1.56	1.377	1.513	1.313	1.621	2.709

230


-------
7.2 Robust Estimation and Identification of Multiple
Multivariate Outliers

A myriad of classical and robust outlier identification procedures are available in the
literature. Most of the procedures from about the last three decades of research in the
area of robust estimation and outlier identification have been incorporated in
Scout. For the sake of comparison and completeness, some classical methods are also
available in Scout. A list of articles covering some of those research procedures is
provided in the references.

Several formal graphical method comparison tools have been incorporated in Scout.
Specifically, in both the outlier module and the Regression module, the user can pick
several methods and Scout will produce graphical comparison displays of those methods.
Some examples illustrating those methods are included in the User Guide. Several
benchmark data sets from the literature have been used throughout this user guide.

7.2.1 Classical Outlier Testing

Due to their historical importance (Wilk 1963) and for the sake of completeness, classical
outlier methods have also been incorporated in Scout. The Classical outlier module
offers two tests for discordances: multivariate kurtosis and Mahalanobis distances
(sometimes called generalized distances). Multivariate kurtosis is also useful as a test for
deviation from normality in one or more dimensions. Both of those tests assume that the
data represent a random sample from a multivariate (p-dimensional, p > 1) normal
population.

7.2.1.1 Mahalanobis' Distances

The classical Mardia's multivariate kurtosis (Mardia 1970, 1974, and Schwager and
Margolin 1982) outlier (and multinormality) test and the MD test (Ferguson 1961a,
1961b and Barnett and Lewis 1994) have been incorporated into Scout. The generalized
distance (MD-based) test is a multivariate extension of the univariate Grubbs test (Grubbs
1950). Scout also computes robustified multivariate kurtosis, skewness, and the largest
MD. As can be seen below, outliers have a huge influence (impact) on those statistics.
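
A minimal, dependency-free sketch of the classical generalized (Mahalanobis) distance computation is given below: each squared distance is the quadratic form of an observation's deviation from the classical mean vector with the inverse of the classical covariance matrix. The helper names, the use of the n - 1 divisor, and the small data set are assumptions made for illustration; critical values for the largest distance are not computed.

```python
def mean_vector(rows):
    """Column means of a list of equal-length observation rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Classical covariance matrix with divisor n - 1."""
    n, m = len(rows), mean_vector(rows)
    p = len(m)
    return [
        [sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def squared_mahalanobis(rows):
    """Squared generalized distance of every row from the classical mean vector."""
    m, s = mean_vector(rows), covariance_matrix(rows)
    distances = []
    for r in rows:
        d = [v - mu for v, mu in zip(r, m)]
        distances.append(sum(di * zi for di, zi in zip(d, solve(s, d))))
    return distances


# Hypothetical bivariate data; the last row is far from the others.
data = [(1, 2), (2, 1), (3, 5), (4, 3), (10, 12)]
d2 = squared_mahalanobis(data)
print([round(v, 3) for v in d2], "largest at row", d2.index(max(d2)) + 1)
```

Because the mean vector and covariance matrix are themselves computed from all of the data, including any outliers, the distances of extreme observations are pulled inward, which is the source of the masking discussed in this chapter.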

1. Click Outliers/Estimates > Multivariate > Classical > Max MDs.

[Screenshot: Scout main window with the Outliers/Estimates > Multivariate > Classical menu open, listing Max MD, Kurtosis, and Causal.]

2. The "Select Variables" screen (Section 3.4) will appear.

231


-------
o Select two or more variables from the "Select Variables" screen.

o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o Click on "Options."

[Screenshot: Options dialog with "Select Critical Alpha" choices 0.010, 0.025, 0.050 (selected), 0.100, 0.150, 0.200, and 0.250, and "OK" and "Cancel" buttons.]

o Select the required "Critical Alpha" and click "OK" to continue or
"Cancel" to cancel the Outliers tests.


-------
Output for Max MDs test for outliers.

Data Set used: Bradu (from the Hawkins, Kaas, and Bradu, 1984 article).

Classical Sequential Outlier Test Based Upon Maximum Mahalanobis Distance (Max-MDs)

User Selected Options
Date/Time of Computation	3/4/2008 8:42:40 AM
From File	D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision	OFF

Number of Observations	75
Number of Variables	4

Classical Mean Vector
y	x1	x2	x3
1.279	3.207	5.597	7.231

Classical Standard Deviation Vector
y	x1	x2	x3
3.493	3.653	8.239	11.74

Classical Covariance S Matrix
y	x1	x2	x3
12.2	9.477	20.39	31.03
9.477	13.34	28.47	41.24
20.39	28.47	67.88	94.67
31.03	41.24	94.67	137.8

Determinant	1906
Log of Determinant	7.553

Eigenvalues of Classical Covariance S Matrix
Eval 1	Eval 2	Eval 3	Eval 4
0.914	1.688	5.538	223.1

Critical Alpha	0.05

Outlier Summary
Obs No	Max MD	Max MD (0.05)
14	43.7	17.43
12	27.75	17.39
11	34.26	17.34
13	56.99	17.29


Result: Observations 11, 12, 13, and 14 were identified as outliers. The classical method with a classical
start could not identify the first 10 observations as outliers.

7.2.1.2 Multivariate Kurtosis

Mardia's multivariate kurtosis is an extension of the univariate kurtosis and, thus, may
also be used as a univariate outlier test. Multivariate kurtosis is also used to test
multivariate normality.

233


-------
Those tests, as incorporated in Scout, are sequential (repeated, without using any
previously identified discordant values). The process stops when no further outliers are
found. For the multivariate kurtosis test, the discordant observation identified is that
point which has the largest generalized distance from the sample classical mean vector.
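
For reference, the multivariate kurtosis statistic referred to here is usually written as below (Mardia 1970). The notation is an assumption of this sketch: x_i are the n observation vectors in p dimensions, and S is the classical covariance matrix taken with divisor n. Whether Scout uses the divisor n or n - 1 is not stated in this guide.

```latex
b_{2,p} \;=\; \frac{1}{n} \sum_{i=1}^{n}
  \left[ (x_i - \bar{x})^{\top} S^{-1} (x_i - \bar{x}) \right]^{2},
\qquad
S \;=\; \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top}
```

Each term in the sum is a squared generalized distance raised to the second power, so a single observation with a large distance inflates the statistic sharply. For p = 1 the expression reduces to the ordinary univariate kurtosis (the fourth central moment divided by the squared variance), and under p-variate normality its value is close to p(p + 2) in large samples.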

1. Click Outliers/Estimates > Multivariate > Classical > Kurtosis.

[Screenshot: Scout main window with the Outliers/Estimates > Multivariate > Classical menu open and Kurtosis selected.]


2. The "Select Variables" screen (Section 3.4) will appear.

o Select two or more variables from the "Select Variables" screen.

o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o Click on "Options."

[Screenshot: Options dialog with "Select Critical Alpha" choices 0.010, 0.025, 0.050 (selected), 0.100, 0.150, 0.200, and 0.250, and "OK" and "Cancel" buttons.]

o Select the required "Critical Alpha" and click on "OK" to continue or
"Cancel" to cancel the Outliers tests.

234


-------
Output for Kurtosis test for outliers.
Data Set used: Bradu.

Classical Sequential Outlier Test Based Upon Multivariate Kurtosis

User Selected Options
Date/Time of Computation	3/4/2008 8:44:50 AM
From File	D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision	OFF

Number of Observations	75
Number of Variables	4

Classical Mean Vector
y	x1	x2	x3
1.279	3.207	5.597	7.231

Classical Standard Deviation Vector
y	x1	x2	x3
3.493	3.653	8.239	11.74

Classical Covariance S Matrix
y	x1	x2	x3
12.2	9.477	20.39	31.03
9.477	13.34	28.47	41.24
20.39	28.47	67.88	94.67
31.03	41.24	94.67	137.8

Determinant	1906
Log of Determinant	7.553

Eigenvalues of Classical Covariance S Matrix
Eval 1	Eval 2	Eval 3	Eval 4
0.914	1.688	5.538	223.1

Critical Alpha	0.05

Outlier Summary
Obs No	Kurtosis	Kurtosis (0.05)
14	53.97	25.2
12	38.26	25.16
11	43.57	25.1
13	59.16	25.19


Result: Once again, only the observations 11, 12, 13, and 14 were identified as outliers. The other 10
outliers could not be identified due to masking effects.

7.2.1.3 Identifying Causal Variables

Once an outlier test has been performed, the user may wish to identify the variables (if
any) which are responsible for each discordant observation. This can be done by
selecting the "Causal Variables" option from the pull-down menu. However, there are

235


-------
several other methods (e.g., Q-Q plot of individual variables, bivariate scatter plot with
tolerance ellipsoid) available in Scout that can also be used to identify variables that
might cause an observation to be an outlier. The details of this classical method can be
found in Garner et al. (1991a and 1991b). This method retests each discordant
observation with one variable excluded at a time. Thus, each discordant observation is
tested p times using all subsets of p-1 of the variables. A variable is listed as causal only
if the absence of that variable prevents rejection of the outlier. Although this procedure is
based on iterations of rigorous tests of hypothesis, the user should consider its results
only as general guidance and not as definitive proof of the cause. This method also
requires some additional research.
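
The leave-one-variable-out logic described above can be sketched as follows. The outlier test itself is passed in as a function, because the actual test (Max MD or kurtosis with its critical values) is not reproduced here. Both `causal_variables` and the stand-in rule `toy_is_outlier` are hypothetical names, and the stand-in rule (flag a row if any single variable lies more than a chosen number of standard deviations from its mean) is an assumption used only to make the example runnable.

```python
import statistics


def causal_variables(rows, index, names, is_outlier):
    """List the variables whose removal stops row `index` from being rejected.

    is_outlier -- function (rows, index) -> bool applying some outlier test
                  to the data matrix and reporting whether the row is rejected
    """
    causal = []
    for j, name in enumerate(names):
        # Re-test using the p - 1 variables that remain after dropping variable j.
        reduced = [tuple(row[:j]) + tuple(row[j + 1:]) for row in rows]
        if not is_outlier(reduced, index):
            causal.append(name)
    return causal


def toy_is_outlier(rows, index, cutoff=2.0):
    """Stand-in test for illustration: any coordinate beyond `cutoff` standard deviations."""
    for col in zip(*rows):
        sd = statistics.stdev(col)
        if sd > 0 and abs(col[index] - statistics.mean(col)) / sd > cutoff:
            return True
    return False


# Hypothetical data: the last row is unusual only in its second variable.
data = [(1.0, 10.0), (1.2, 11.0), (0.9, 10.5), (1.1, 9.5),
        (1.0, 10.2), (1.05, 10.8), (0.95, 9.9), (1.0, 30.0)]
print(causal_variables(data, 7, ["x1", "x2"], toy_is_outlier))
```

With a genuinely multivariate test, an observation can be discordant because of a combination of variables, in which case several variables, or none, may be listed; this is one reason the results are best treated as general guidance.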

1. Click Outliers/Estimates > Multivariate > Classical > Causal > Distances
or Kurtosis.

[Screenshot: Scout 2008 main window with the Outliers/Estimates > Multivariate > Classical > Causal menu open, listing Distances and Kurtosis.]

2. The "Select Variables" screen (Section 3.4) will appear.

o Select two or more variables from the "Select Variables" screen.

o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o Click on "Options."

236


-------
[Screenshot: Options dialog with "Select Critical Alpha" choices 0.010, 0.025, 0.050 (selected), 0.100, 0.150, 0.200, and 0.250, and "OK" and "Cancel" buttons.]

o Select the required "Critical Alpha" and click "OK" to continue or
"Cancel" to cancel the Causal test.


-------
Output for Causal Variables using the MD test for outliers.
Output for MD test for outliers.

Data Set used: Bradu.

Classical Causal Variable Routine Using MDs

User Selected Options
Date/Time of Computation	11/6/2008 7:47:44 AM
From File	D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision	OFF

Number of Observations	75
Number of Columns	4
Critical Alpha	0.05

Max MD (4, 0.05)	17.43
Max MD (3, 0.05)	15.55

New Data Matrix (Outlier Rows)
Obs No	y	x1	x2	x3
14	0.1	11	34	34
12	-0.4	12	23	37
11	-0.2	11	24	35
13	0.7	12	26	34

Obs No	MD Distance	P-Value
14	43.7	[illegible]
12	27.75	[illegible]
11	34.26	[illegible]
13	56.98	[illegible]

Row 14
x2 is a Causal Variable
Obs No	Observed	Predicted
14	[illegible]	24.15

Row 12
y is a Causal Variable
Obs No	Observed	Predicted
12	-0.4	11.33

Row 11
y is a Causal Variable
Obs No	Observed	Predicted
11	-0.2	10.38
x3 is a Causal Variable
Obs No	Observed	Predicted
11	35	29.58

Row 13
y is a Causal Variable
Obs No	Observed	Predicted
13	0.7	9.81
x1 is a Causal Variable
Obs No	Observed	Predicted
13	[illegible]	11.32
x2 is a Causal Variable
Obs No	Observed	Predicted
13	26	[illegible]
x3 is a Causal Variable
Obs No	Observed	Predicted
13	34	32.54

238


-------
Results: Observations 14, 12, 11, and 13 are identified as outliers, with variable "x2" in observation 14,
variable "y" in observation 12, variable "y" in observation 11, and all variables in observation 13 as
potential causal variables. The predicted value is obtained by regression, with the causal variable as the
dependent variable and the other variables as the independent variables.

Output for Causal Variables using the Kurtosis test for outliers.

Classical Causal Variable Routine Using Kurtosis

User Selected Options
Date/Time of Computation   1/16/2009 11:45:48 AM
From File                  D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision             OFF
Number of Observations     75
Number of Columns          4
Critical Alpha             0.05

Full Cutoff                25.2002
Reg Cutoff                 16.1643

New Data Matrix (Outlier Rows)
Obs No   y      x1   x2   x3
14       0.1    11   34   34
12       -0.4   12   23   37
11       -0.2   11   24   35
13       0.7    12   26   34

Obs No   Kurtosis            P Value
14       53.9678865248759    N/A
12       38.2586440611149    N/A
11       (illegible)         N/A
13       59.1622093139548    N/A

Row   Causal Variable   Observed   Predicted
14    x2                34         17.3214183128282
12    y                 -0.4       11.999899 (remaining digits illegible)
11    y                 -0.2       11.6785030135424
11    x3                35         23.8400404984235
13    y                 0.7        12.0185193167481
13    x3                34         26.7004459115258


-------
7.2.2 Robust Outlier Testing

Detection of multiple (one or more) anomalies in multivariate data sets is a complex
problem. Considerable research has been performed on this topic. Many classical and
robust outlier identification methods have been developed since the early 1970s.

Mahalanobis distances (MDs or Mds) play the key role in identifying the outlying
observations. As is well known, multiple outliers tend to influence the estimates (even
robust estimates) of the means, variances, covariances, and the MDs significantly.
Therefore, the MDs get distorted by the same observations that they are supposed to find.

In an effort to identify the best possible method(s) to identify outliers, the developers of
Scout considered several classical as well as robust outlier identification and robust
estimation methods.

Scientists dealing with multivariate data realize that there is no substitute for the
graphical display of their data. Therefore, the developers of Scout have put extra
emphasis on formal graphical displays of multivariate data sets. The Scout outlier
module offers the following methods to identify outliers in multivariate data sets.

o	Classical (using the Max MD as proposed by Wilks in 1963)

o	Sequential Classical

o	Huber

o	Extended Minimum Covariance Determinant (MCD)

o	Proposed (PROP)

o	Multivariate Trimming (MVT)

The critical values used for the Max MD statistics are obtained using the Bonferroni
inequality and the scaled beta distribution of the MDs: ((n-1)^2/n) Beta(p/2, (n-p-1)/2).
Often, because of the computational ease, an approximate chi-square distribution with p
degrees of freedom is used for the distribution of the MDs. The difference between the two
options is significant, especially when the dimensionality, p, is large (p = 4 is large
enough for a sample of size n = 100). For details, one can refer to Singh (1993). For
the sake of comparison, both distributional options have been incorporated into Scout.
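
The two distributional options can be compared numerically. The following is a minimal sketch, not Scout's implementation: it assumes the scaled beta form ((n-1)^2/n) Beta(p/2, (n-p-1)/2) for the squared MDs, applies the Bonferroni inequality by replacing alpha with alpha/n for the largest MD, and inverts each distribution by numerical integration and bisection. All function names are hypothetical. For n = 75, p = 4, and alpha = 0.05 (the BRADU example used in this chapter), the beta-based values should come out close to the 17.43 and 9.128 reported in the Scout output.

```python
import math

def simpson(f, a, b, steps=2000):
    """Composite Simpson integration of f over [a, b] (steps must be even)."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def beta_cdf(x, a, b):
    """CDF of Beta(a, b); this sketch assumes a >= 1 so the density is bounded at 0."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    norm = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    return simpson(lambda t: norm * t ** (a - 1) * (1 - t) ** (b - 1), 0.0, x)

def chi2_cdf(x, df):
    """CDF of the chi-square distribution; this sketch assumes df >= 2."""
    if x <= 0.0:
        return 0.0
    k = df / 2.0
    norm = 1.0 / (2.0 ** k * math.gamma(k))
    return simpson(lambda t: norm * t ** (k - 1) * math.exp(-t / 2.0), 0.0, x)

def quantile(cdf, prob, lo, hi):
    """Invert a continuous CDF by bisection on [lo, hi]."""
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if cdf(mid) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def md_cutoffs(n, p, alpha):
    """Individual and Bonferroni (largest MD) cutoffs for squared MDs."""
    scale = (n - 1) ** 2 / n
    a, b = p / 2.0, (n - p - 1) / 2.0

    def beta_q(prob):
        return scale * quantile(lambda x: beta_cdf(x, a, b), prob, 0.0, 1.0)

    def chi2_q(prob):
        return quantile(lambda x: chi2_cdf(x, p), prob, 0.0, 200.0)

    return {
        "beta_individual": beta_q(1 - alpha),
        "beta_max": beta_q(1 - alpha / n),      # Bonferroni: alpha / n
        "chi2_individual": chi2_q(1 - alpha),
        "chi2_max": chi2_q(1 - alpha / n),
    }

cutoffs = md_cutoffs(n=75, p=4, alpha=0.05)
for name, value in cutoffs.items():
    print(f"{name}: {value:.3f}")
```

In this sketch the chi-square cutoffs are larger than the beta cutoffs, which is the kind of difference the paragraph above describes.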

All of the outlier methods listed above (except for the classical method) are iterative. The
MCD method (Rousseeuw and Van Driessen, 1999) has been extended to accommodate
other options. For example, instead of obtaining the MCD by minimizing the
determinant of a covariance matrix based upon h = [(n+p+1)/2] observations, one can
choose other values for h. The objective is to find "real and only real outliers," and not to
identify outliers using the maximum breakdown point option. The multivariate trimming
(MVT) method (Devlin et al., 1981) is also available in Scout to identify outliers.
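
As a small worked example of the subset size h, the sketch below computes the default h = [(n+p+1)/2], reading the brackets as the integer part, and a user-specified alternative given as a fraction of the observations. The function names are hypothetical, and truncating the fractional case to an integer is an assumption; the guide does not state how Scout rounds.

```python
def default_h(n, p):
    """Default MCD subset size: integer part of (n + p + 1) / 2."""
    return (n + p + 1) // 2

def user_h(n, fraction):
    """Subset size from a user-specified fraction of non-outliers.

    Truncation toward zero is an assumption made for this sketch.
    """
    return int(n * fraction)

# BRADU example from this chapter: n = 75 observations, p = 4 variables.
print(default_h(75, 4))   # 40
print(user_h(75, 0.75))   # 56
```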

The Huber and PROP methods represent M-estimation methods based upon the Huber
(Huber, 1981) influence function and the PROP (Singh, 1993) influence function,
respectively. The iterative process reduces the influence of potential outliers at each
iteration. This is especially true for the PROP influence function. Convergence is normally
achieved in fewer than 10 iterations. A default of 20 iterations is used in the Scout
software. In those iterative procedures, initial estimates of the population mean vector
and the variance covariance matrix are required. Several initial estimate methods are
available in Scout. Specifically, classical maximum likelihood estimates (MLEs) and
various robust estimates (e.g., median, MAD or IQR; median, OKG; median, KG; and
MCD) are available as initial start estimates. The use of robust initial estimates is
recommended for improved and more resistant estimators of the population parameters at
the final iteration. The use of the PROP influence function with an initial robust start
(e.g., the OKG estimate of the covariance matrix) seems to be very effective in identifying
multivariate outliers and computing robust and resistant estimates of the mean vector and
the covariance matrix.

1.	The sequential classical method performs the classical procedure (with a classical
initial start) by iteratively removing outliers at each iteration. The MDs are
compared to the Max (MD) critical value (e.g., for α = 0.1). That is, there is a
hard rejection point (= Max (MD)) for outliers. This procedure suffers from
masking effects when multiple outliers are present. This masking effect can be
reduced if an initial robust start is chosen instead of the classical initial start. This
is illustrated in the graphical comparison section in the following.

2.	The Huber procedure uses the Huber influence function (Huber, 1981), which
assigns unit weight to the observations coming from the central part of the
distribution and reduced weight to those coming from the tails of the underlying
distribution. However, outliers always leave some influence on Huber estimates.

3.	The MCD method uses the minimum covariance determinant (MCD) method of
Rousseeuw and Leroy (1987). Its objective is to find a set of observations in the
data set whose covariance matrix has the lowest determinant. The fast algorithm
to compute the minimum covariance determinant estimator has been incorporated
into Scout.

4.	The PROP procedure uses a smooth redescending influence function and uses the
cut-off points from the distribution of the MDs. Both options, the Beta
distribution and the chi-square distribution, are available in Scout. The use of
beta distribution is recommended. Using this influence function, the extreme
outliers coming from the tails of the contaminating distribution get almost
negligible weights. This procedure provides an automatic way of dealing with the
outlying observations present in a multivariate data set. A tuning constant, "c,"
can be used to make the influence function more resistant to outliers. In most
applications, c = 1 works very well. However, when the proportion of discordant
observations increases, smaller values of c (< 1.0) are recommended. This
procedure also works very effectively in estimating the principal components
(PCs), including the low variance PCs. Probability plots of the PCs based upon
this influence function are good enough to reveal all kinds of outliers, including
those which might be inflating variances and covariances inappropriately and
those which might be violating the correlation structure imposed by the bulk of
the data.

241


-------
5. Robust procedures based upon multivariate trimming (MVT) often work well in
estimating the robust PCs and revealing discordant observations. A fixed
proportion, p (0.05, 0.1, 0.2 etc.), of the observations with the largest values of
MDs is temporarily set aside and the estimates of the mean vector and variance
covariance matrix are recomputed based on the remaining n (1 - p) observations.
This iterative procedure, which requires computations of those MDs at each step,
stops as soon as the desired degree of accuracy has been achieved.
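
The sequential classical procedure in item 1 can be sketched as follows. This is an illustration under stated assumptions, not Scout's code: the function names are hypothetical, an observation that has been removed is assumed to stay removed, and the cutoff of 6.0 is an arbitrary value chosen for the toy data (Scout uses the Max (MD) critical value instead).

```python
def mean_and_cov(rows):
    """Classical mean vector and covariance matrix (divisor n - 1)."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    return mean, cov

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def squared_md(row, mean, cov):
    """Squared Mahalanobis distance of one observation."""
    diff = [v - m for v, m in zip(row, mean)]
    return sum(d * s for d, s in zip(diff, solve(cov, diff)))

def sequential_classical(rows, cutoff, max_iter=10):
    """Hard rejection: weight 0 for any row whose squared MD exceeds cutoff."""
    weights = [1] * len(rows)
    dists = []
    for _ in range(max_iter):
        kept = [r for r, w in zip(rows, weights) if w == 1]
        mean, cov = mean_and_cov(kept)
        dists = [squared_md(r, mean, cov) for r in rows]
        # Assumption: a removed observation is never re-admitted.
        updated = [0 if (w == 0 or d > cutoff) else 1
                   for w, d in zip(weights, dists)]
        if updated == weights:
            break
        weights = updated
    return weights, dists

# Toy data: a small symmetric cluster plus one gross outlier.
cluster = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0), (1.0, 1.0),
           (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (0.0, 0.0)]
rows = cluster + [(20.0, 20.0)]
weights, dists = sequential_classical(rows, cutoff=6.0)
print(weights)
```

In the first pass the outlier inflates the classical covariance matrix, so its own squared MD is only moderately large; once it is removed, its distance from the remaining data becomes very large. This is the distortion of the MDs that the text describes.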

As mentioned before, another important aspect of robust outlier detection is obtaining an
initial estimate to start the iterative robust procedures. Scout provides several procedures
to compute the initial estimates of the mean vector and the covariance matrix:

o Classical
o Sequential Classical
o Robust Median/MAD
o OKG (Orthogonalized Kettenring-Gnanadesikan; Maronna and Zamar, 2002)
o KG (Kettenring-Gnanadesikan, 1972)
o MCD

The classical method uses the mean vector and the variance-covariance matrix as the
initial estimate.

In the outlier module, the sequential classical method computes the mean vector and the
covariance matrix iteratively by removing outliers (observations exceeding the Max
(MD)) at each iteration.

The Robust Median/MAD (Median Absolute Deviation) method uses the median vector
as the initial estimate of the location vector. For the dispersion matrix, the classical
variance covariance matrix is used, but the diagonal elements are replaced by the simple
robust estimate of the variance, given by (MAD/0.6745)^2. If the MAD is equal to (or
approximately equal to) 0 (as in Fisher's Iris data set), then an IQR-based estimate is
used instead, computed separately for each variable. This is called the IQR fix in Scout.

The OKG (Orthogonalized Kettenring-Gnanadesikan) method uses the median vector as
the initial estimate of the location vector. In practice, the KG covariance matrix may
not be positive definite (it can even yield negative eigenvalues). The KG dispersion
matrix can be orthogonalized using the procedure described by Maronna and Zamar (2002)
to obtain a positive definite dispersion matrix. This procedure is also available in Scout.
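
The Median/MAD scale estimate and the IQR fix can be sketched per variable as below. The MAD/0.6745 scaling follows the "MAD/0.6745 Vector Representing Standard Deviation" label in the Scout output; the IQR/1.35 fallback, the zero tolerance, and the function name are assumptions made for illustration, since the guide does not give the exact form of the IQR fix.

```python
import statistics

def robust_scale(values, tol=1e-12):
    """Robust standard deviation estimate for one variable.

    Uses MAD / 0.6745; if the MAD is (approximately) zero, falls back to
    IQR / 1.35 (assumed form of the IQR fix).
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad > tol:
        return mad / 0.6745
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / 1.35

# One gross value barely affects the estimate.
print(robust_scale([1, 2, 3, 4, 100]))

# More than half of the values are tied, so the MAD is 0 and the fallback is used.
print(robust_scale([0, 0, 0, 0, 0, 1, 2, 3]))
```

The squared value of this estimate would replace the corresponding diagonal element of the classical covariance matrix.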


-------
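For the MCD method, the objective is to find a subset of some specified size, h
(with h greater than n/2), of observations whose covariance matrix has the lowest
determinant.

7.2.2.1 Sequential Classical

1. Click Outliers/Estimates > Multivariate > Robust > Sequential Classical.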

[Screenshot: Scout main window showing the menu path Outliers/Estimates > Multivariate > Robust/Iterative, with the options Sequential Classical, Huber, Extended MCD, MVT, OKG Reweighted, PROP, and Method Comparison.]

2. The "Select Variables" screen will appear (Section 3.4).

o Select the variables from the screen.

o If various groups are available and the analysis is to be done for those groups,
then group variables can be selected from the drop-down list by clicking on
the arrow below the "Group by Variable" button.

o Click the "Options" button for various options.

243


-------
[Screenshot: Sequential Classical options window with "Correlation R Matrix" (Do Not Display, Display), "Intermediate Iterations" (Do Not Display, Display Every 5th, Display Every 4th, Display Every 2nd, Display All), "Select Number of Iterations" [Max = 100], and "Cutoff for Outliers" (Critical Alpha), with "OK" and "Cancel" buttons.]

o Specify if the correlation matrix of the final robust estimate is to be
displayed or not. The default option is "Do Not Display."

o Specify the number of iterations to be computed. The default is "10."

o Specify whether the intermediate iterations are to be displayed. The default
is "Do Not Display."

o Click "OK" to continue or "Cancel" to cancel the options.

o Click the "Graphics" button for the graphics options and check the three
check boxes to get the following screen.

[Screenshot: Sequential Classical graphics options window with check boxes for "Index MD Plot," "Distance-Distance MD Plot," and "QQ MD Plot"; "QQ Plot Options" with "MDs Distribution" (Beta, Chi); title fields for the three plots, each set to "Sequential Classical"; "Graphics Critical Alpha" set to 0.05; and "OK" and "Cancel" buttons.]
-------
o Specify the "Title for Index MD Plot." This is an index plot of the
robust distances obtained using the sequential outlier estimates.

o Specify the "Title for Distance-Distance Plot." This is a plot of the
classical Mahalanobis distances against the robust distances obtained using the
sequential outlier estimates.

o Specify the "Title for Q-Q MD Plot." Select the distribution required
for the "Q-Q Plot" and the "Graphics Critical Alpha" for identifying
the outliers.

Note: The "Graphics Critical Alpha" should match the "Critical Alpha" from the
Options Multivariate Robust Sequential Classical window to obtain the same
outliers. The user should type suitable titles related to the data set.

o Click "OK" to continue or "Cancel" to cancel the options.

o Click "OK" to continue or "Cancel" to cancel the sequential classical
procedure.

Output example: The data set "BRADU.xls" was used for Sequential Classical. The
outliers are removed (down weighted from 1 to 0) at the end of each iteration and the
location and scale estimates are calculated at the end of each iteration.

245


-------
Output for Sequential Outliers method.
Data Set used: Bradu.

Multivariate Robust Sequential Classical Outlier Analysis

User Selected Options
Date/Time of Computation           3/4/2008 9:04:35 AM
From File                          D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision                     OFF
Display Correlation R Matrix       Do Not Display Correlation R matrix
Number of Iterations               10
Show Intermediate Results          Do Not Display Intermediate Results
Title for Index Plot               Sequential Classical
Title for Distance-Distance Plot   Sequential Classical
Title for QQ Plot                  Sequential Classical
Graphics Critical Alpha            0.05
MDs Distribution                   Beta

Number of Observations             75
Number of Selected Variables       4
Max Squared MD (0.05)              17.43

Classical Mean Vector
y        x1       x2       x3
1.279    3.207    5.597    7.231

Classical Covariance S Matrix
y        x1       x2       x3
12.2     9.477    20.39    31.03
9.477    13.34    28.47    41.24
20.39    28.47    67.88    94.67
31.03    41.24    94.67    137.8

Determinant           1906
Log of Determinant    7.553

Eigenvalues of Classical Covariance S Matrix
Eval 1   Eval 2   Eval 3   Eval 4
0.914    1.688    5.538    223.1

3 Outliers were found using Classical Method
4 Outliers were found using Sequential Classical


-------
Output for Sequential Outliers method (continued).

Squared MDs
Obs   Classical   Sequential   Weights
1     6.026       5.682        1
2     6.729       6.588        1
3     7.209       7.197        1
4     6.321       9.265        1
5     6.339       6.751        1
6     7.036       6.755        1
7     8.235       8.706        1
8     6.714       6.788        1
9     6.445       7.52         1
10    7.346       7.215        1
11    18.61       333.7        0
12    27.74       366.4        0
13    14.79       310.2        0
14    43.7        512.8        0
15    3.386       5.109        1
16    4.783       5.689        1
17    2.004       2.496        1
18    0.751       0.764        1
19    1.408       1.934        1
20    2.556       2.954        1
21    1.281       2.173        1
22    2.578       3.282        1
23    1.216       2.757        1
24    1.321       3.455        1
25    0.658       1.394        1
26    1.413       4.544        1
27    2.492       5.615        1
28    0.765       1.829        1
29    0.364       1.577        1
30    2.793       4.472        1
31    3.395       3.219        1
32    1.772       2.704        1
33    0.967       3.073        1
34    1.475       1.797        1
35    1.58        1.622        1
36    1.008       3.666        1
37    3.368       4.751        1

(The complete output table is not shown.)


-------
Output for Sequential Outliers method (continued).

Sequential Classical Estimates

Sequential Classical Mean Vector
y        x1       x2       x3
1.348    2.739    4.406    5.666

Sequential Classical Covariance S Matrix
y        x1       x2       x3
12.8     10.63    23.09    34.89
10.63    9.938    19.57    29.69
23.09    19.57    43.69    64.82
34.89    29.69    64.82    99.08

Determinant           56.61
Log of Determinant    4.036

Classical Kurtosis    53.97
Sequential Kurtosis   8084

Results: Four (4) observations (11, 12, 13, and 14), with squared distances greater than the Max (squared
MD), were given zero (0) weights (hard rejection) and were considered to be outliers.

248


-------
Graphical Output for Sequential Outliers method (continued).

[Figure: "Sequential Classical" index plot of the sequential squared MDs against the observation index, with the 95% Maximum (Largest MD) Limit = 17.4345.]

[Figure: "Sequential Classical" distance-distance plot of the sequential squared MDs against the classical squared MDs.]

[Figure: "Sequential Classical" Q-Q plot of the sequential squared MDs against beta quantiles, with the 95% Maximum (Largest MD) Limit = 17.4345. Statistics shown on the plot: N = 75, p = 4, Slope = 33.2001, Intercept = -41.5886, Correlation Coefficient = 0.7184, Critical Correlation (0.05) = 0.9941, Kurtosis = 8,063.6558, Critical Kurtosis (0.05) = 25.2002, Skewness = 140.0560034, Critical Skewness (0.05) = 2.3990.]

Graphical Interpretation: The observations between the "Warning (Individual MD) Limit" and
"Maximum (Largest MD) Limit" lines represent borderline outliers and may require further investigation.
The Warning Limit represents the critical value from the scaled beta distribution of the MDs at a specified
confidence level (here, 0.95), and the Maximum (Largest MD) Limit represents the critical value of the
Max (MD) obtained using the Bonferroni inequality (details in Singh, 1993).
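
The reading of the two limit lines can be written as a simple rule. The sketch below is illustrative only: the function name and the labels are hypothetical, and the limits 9.1281 and 17.4345 are the values that Scout reports for the BRADU data at a critical alpha of 0.05.

```python
def classify(squared_md, warning_limit, maximum_limit):
    """Place one squared MD relative to the warning and maximum limits."""
    if squared_md > maximum_limit:
        return "outlier"
    if squared_md > warning_limit:
        return "borderline"
    return "regular"

WARNING_LIMIT = 9.1281    # 95% Warning (Individual MD) Limit
MAXIMUM_LIMIT = 17.4345   # 95% Maximum (Largest MD) Limit

# Classical squared MDs of observations 14, 13, and 18 from the output table.
for obs, d in [(14, 43.7), (13, 14.79), (18, 0.751)]:
    print(obs, classify(d, WARNING_LIMIT, MAXIMUM_LIMIT))
```

Observation 13 falls between the two limits under the classical estimates, which is why it is only flagged after the sequential procedure removes the other outliers.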

7.2.2.2 Huber

1. Click Outliers/Estimates > Multivariate > Robust > Huber.

[Screenshot: Scout main window showing the menu path Outliers/Estimates > Multivariate > Robust/Iterative, with the options Sequential Classical, Huber, Extended MCD, MVT, OKG Reweighted, PROP, and Method Comparison.]

2. The "Select Variables" screen will appear (Section 3.4).

250


-------
o Select the variables from the screen.

o If various groups are available and an analysis is to be done for those groups,
then group variables can be selected from the drop-down list by clicking on
the arrow below the "Group by Variable" button.

o Click the "Options" button for various options.

[Screenshot: "Multivariate Outlier Options" window with "Select Initial Estimates" (Classical, Sequential Classical, Robust (Median, MAD), OKG (Maronna Zamar), KG (Not Orthogonalized), MCD), "Cutoff for Outliers" (Critical Alpha, 0.05), "Correlation R Matrix" (Do Not Display, Display), "MDs Distribution" (Beta, Chi), "Select Number of Iterations" [Max = 50], "Influence Function Alpha" (0.05), and "Intermediate Iterations" (Do Not Display, Display Every 5th, Display Every 4th, Display Every 2nd, Display All), with "OK" and "Cancel" buttons.]

o Specify the "Initial Estimates" to start the Huber iterative procedure.
The default is "OKG (Maronna Zamar)."

o Specify the distribution for the Mahalanobis distances in the "MDs
Distribution." The default is "Beta."

o Specify the "Critical Alpha," the cutoff for outliers. The default is
"0.05."

o Specify the "Number of Iterations." The default is "10."

o Specify the "Influence Function Alpha" for the Huber weighting
process. The default is "0.05."

o Specify "Correlation R Matrix." The default is "Do Not Display."

o Specify "Intermediate Iterations." The default is "Do Not Display."

o Click "OK" to continue or "Cancel" to cancel the options.

o Click the "Graphs" button for various options.

251


-------
[Screenshot: Huber graphics options window with check boxes for "Index MD Plot," "Distance-Distance MD Plot," and "QQ MD Plot"; "QQ Plot Options" with "MDs Distribution" (Beta, Chi); title fields for the three plots, each set to "Huber Estimate"; "Graphics Critical Alpha" set to 0.05; and "OK" and "Cancel" buttons.]

o Specify the "Title for Index MD Plot." This is an index plot of the
robust distances obtained using the Huber estimates.

o Specify the "Title for Distance-Distance Plot." This is a plot of the
classical Mahalanobis distances against the robust distances obtained using the
Huber estimates.

o Specify the "Title for Q-Q MD Plot." Select the distribution required
for the "Q-Q Plot" and the "Graphics Critical Alpha" for identifying
the outliers.

Note: The "Graphics Critical Alpha" should match the "Critical Alpha" from the
Multivariate Outlier Options window to obtain the same outliers.

o Click "OK" to continue or "Cancel" to cancel the options.

o Click "OK" to continue or "Cancel" to cancel the Huber procedure.

Output example: The data set "BRADU.xls" was used for the Huber method. It has 75
observations and four variables. The initial estimates of location and scale for each group
were the median vector and the scale matrix obtained from the OKG method. The
outliers were found using the Huber influence function and the observations were given
weights accordingly. The weighted mean vector and the weighted covariance matrix
were then calculated.

252


-------
Output for Huber outliers method.
Data Set used: Bradu.

Huber Multivariate Outlier Analysis

User Selected Options
Date/Time of Computation           1/8/2008 1:21:20 PM
From File                          D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision                     OFF
Critical Alpha                     0.05
Influence Function Alpha           0.05
Initial Estimates                  Robust Median Vector and OKG (Maronna-Zamar) Matrix
Display Correlation R Matrix       Do Not Display Correlation R matrix
Distribution of Squared MDs        Beta Distribution
Number of Iterations               10
Show Intermediate Results          Do Not Display Intermediate Results
Title for Index Plot               Huber Estimate
Title for Distance-Distance Plot   Huber Estimate
Title for QQ Plot                  Huber Estimate
Graphics Critical Alpha            0.05
MDs Distribution                   Beta

Number of Observations             75
Number of Selected Variables       4

Critical Values
Max Squared MD (0.05)              17.43
Individual Squared MD (0.05)       9.128
Multivariate Kurtosis (0.05)       25.2
Influence Fn. Squared MD (0.05)    9.128

Classical Mean Vector
y        x1       x2       x3
1.279    3.207    5.597    7.231

Classical Covariance S Matrix
y        x1       x2       x3
12.2     9.477    20.39    31.03
9.477    13.34    28.47    41.24
20.39    28.47    67.88    94.67
31.03    41.24    94.67    137.8

Determinant    1906.2 (remaining digits illegible)


-------
Output for Huber outliers method (continued).

Eigenvalues for Classical Covariance S Matrix
Eval 1   Eval 2   Eval 3   Eval 4
0.914    1.688    5.538    223.1

Median Vector
y             x1     x2     x3
(illegible)   1.8    2.2    2.1

MAD/0.6745 Vector Representing Standard Deviation
y       x1       x2       x3
0.89    1.927    1.631    1.779

OKG Mean Vector
y        x1        x2        x3
1.258    -7.552    -6.052    2.521

Robust OKG (Maronna Zamar) Covariance S Matrix
y         x1       x2         x3
0.598     0.243    0.115      -0.231
0.243     2.856    0.108      0.241
0.115     0.108    2.175      0.00402
-0.231    0.241    0.00402    2.34

Determinant    7.84553740487618

Robust OKG Eigenvalues
Eval 1   Eval 2   Eval 3   Eval 4
0.53     2.149    2.316    2.975

Final Weighted Mean Vector
y        x1       x2      x3
1.332    2.849    4.68    6.033

Final Covariance S Matrix
y        x1       x2       x3
12.76    10.57    22.94    34.67
10.57    10.14    20.08    30.37
22.94    20.08    45       66.52
34.67    30.37    66.52    101.4

Determinant    119.506396724439


-------
Output for Huber outliers method (continued).

j Observations with Squared MDs greater than 17.43
| may be considered as potential outliers!

Observation

Number

1

10

TV

12

Tf
IT

~T5

16

17

18

"i§"

20

TV

~22~
~23
~24~
~25

26
"27"

28
"29
"30"

IV

~32

13

"34"

Final

Classical | Initial Final
Squared MD Squared MO Squared MQ Weights

i54 r

69978

6 026

6.729
7 209

761.9

6 321

6.339

7691
7G56~

7.036

8.235

701 8
74072"

G 714

6.445

692 4
7408"

7 346

TaeV"

27.74

"14 79"

7191
~69l7T

732.6
712.8"

43 7
"3.386"

4.783

2.004

0.751
17408"

2.556
" 1.281"

2.578
"17216""

1.321
1.658"

1 413

2.492

0.765

0.364

2.793
"3395"
T772"
1 967~
'T475~

907

""17944

2 228

2.761

0.303
17675"

5 709

6.635

7183

7 201

6 216

6.7

1757"

6 71

6.606

7.218

160.8

0.239

182.7
"1487V"

0 225
125

270.9
3.909~

0184

5.488

2.463

0.789
"i.9lT

1 234
17165"

1.322

2.84

1.542
"3846
2.229
2387"

1.029
1.612
T945"
7.721"
T478"
1.674
"4.46V

2.828
1.867"

3 273
"V843"

2.856
1.261"

2.699
"37599"

1.64
"i 135"

4.316
17276""
26

1.028"
T.824"

255


-------
Output for Huber outliers method (continued).

35

1 58
1 oW

0.923

1 561
1.945

1

	





36

2.086

1





37

3.368

Z047

3.842

1







38

0.889

2.826

3.046

1

	

	





39

1 935
~"1 237

3.883
17461"

2.76
1.314"

1





40

1





41

3139

1.386

3.637

1





42

3.358

2.147

3.89

1







43

4.108

2.526

6.291
4 734

1





44

2.686

2.505

1





45

1.278

3.85

1 671

1







46

2.225

1.865

3.851

1







47

4.332

~zm~

5 896
'lT054~

4.947
2.641

1

		





48

1





48

2 552
0.278

1.422

2.927

1

	





50

0 856

0.74

1





51

1 858
4.587

1.936
3.562 ~~

2.959
4898"

1





52

1
1



	

53

4 889

4.616

7.612



54

2 604

2.131

4.971

1





55

1.622

1.159

2.19

1
1

1 ~





56

1.898

1.137

3.037







57

0.773
1.988"

2.385
1 208 ~

2.402
~ 279

	





58

1





58

0.463
_3J341T

0.523

0.681

1





	

60

2.628

4.791

1



61

2.806
" 0 675"

1.957
193

3.26

1 i





62

3 022

1



	



63

1 757

2.344

2 934
1.943

1





64

1.112

1.602

1







65

1 336

1.909

3.045
1.913""

1





66

1.B96
0.445
~ 2AS2~

3.273

1





67

1.752
47222™

0.688
5^049"

1





68
69~

1

	





1.154

2.636

1 83

1





70

1 042

1.609

1.423

1







71

0.413
1 .Tl4

0114

0 381

1

	





72

0.732

1.117

1
1





73

2.207

1 311

2.576





256


-------
Output for Huber outliers method (continued).

Observation   Classical    Initial      Final        Final
Number        Squared MD   Squared MD   Squared MD   Weights
74            2.742        2.635        3.069        1
75            3.632        2.62         5.12         1

                        Classical   Initial   Final
Multivariate Kurtosis   53.97       101431    2076













Results: Four (4) observations (11, 12, 13 and 14), with squared distances greater than the squared MD
17.43, were given weights between 0 and 1 (soft rejection) and were considered to be outliers. Note that
due to masking effects, the Huber method did not identify the remaining 10 outliers present in this data set.
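
The soft rejection described above can be illustrated with a Huber-type weight function. This is a sketch under assumptions, not Scout's code: it uses the common form in which an observation gets unit weight when its squared MD is at or below the influence function cutoff, and weight d0/d otherwise, where d and d0 are the square roots of the squared MD and of the cutoff. The function name is hypothetical. With the "Influence Fn. Squared MD (0.05)" value of 9.128 from the output, a final squared MD of 270.9 gives a weight of about 0.184, which appears consistent with a weight listed in the output table.

```python
import math

def huber_weight(squared_md, squared_cutoff):
    """Huber-type weight: 1 inside the cutoff, d0 / d outside it."""
    if squared_md <= squared_cutoff:
        return 1.0
    return math.sqrt(squared_cutoff / squared_md)

CUTOFF = 9.128  # Influence Fn. Squared MD (0.05) from the Huber output

print(round(huber_weight(270.9, CUTOFF), 3))  # far outside the cutoff
print(huber_weight(5.12, CUTOFF))             # inside the cutoff: weight 1
```

Because the weight never reaches zero, even a distant observation keeps some influence on the estimates, which matches the remark that outliers always leave some influence on Huber estimates.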

[Figure: "Huber Estimate" index plot of the Huber squared MDs against the observation index, with the 95% Maximum (Largest MD) Limit = 17.4345 and the 95% Warning (Individual MD) Limit = 9.1281.]

[Figure: "Huber Estimate" Q-Q plot of the Huber squared MDs against beta quantiles, with the 95% Maximum (Largest MD) Limit = 17.4345 and the 95% Warning (Individual MD) Limit = 9.1281. Statistics shown on the plot: N = 75, p = 4, Slope = 17.0079, Intercept = -20.2021, Correlation Coefficient = 0.7323, Critical Correlation (0.05) = 0.9941, Kurtosis = 2,075.7165, Critical Kurtosis (0.05) = 25.2002, Skewness = 15,680.5992.]


-------
Graphical Interpretation: Observations (if any) between the "Warning (Individual MD) Limit" and
"Maximum (Largest MD) Limit" lines may require further investigation. Those observations have
reduced weights between 0 and 1.

7.2.2.3 Extended MCD

1. Click Outliers/Estimates > Multivariate > Robust > Extended MCD.

[Screenshot: Scout 2008 main window showing the menu path Outliers/Estimates > Multivariate > Robust/Iterative, with the options Sequential Classical, Huber, Extended MCD, MVT, PROP, and Method Comparison.]

2. The "Select Variables" screen will appear (Section 3.4).

o Select the variables from the screen.

o If various groups are available and analysis is to be done for those groups,
then group variables can be selected from the drop-down list by clicking on
the arrow below the "Group by Variable" button.

o Click the "Options" button for various options.

[Screenshot: MCD options window. "MCD Options" check boxes: Initial Subset Strategy, Adjust 'h' Value, Adjust Initial/Final Subsets, Display Minimum Determinant Data [h, p]. "Initial Subset Strategy": Size = p+1 (Default), Size = 'h', All Subsets of Size = p+1 (p < 10). "Adjust 'h' Value": (n + p + 1) / 2 (Default), User Specified % Non-Outliers (0.75). "Initial/Final Subsets": Number of Best Retained (10), Number of Elemental Sets (500). "OK" and "Cancel" buttons.]

o When all of the checkboxes are checked in the "MCD Options," the
options window looks like the one above.

259


-------
o "Initial Subset Strategy": this is used to specify the size of the initial
subsets. It can be equal to the number of variables plus 1 (p + 1) or of
size equal to h, the number of non-outliers.

o Specify the "Initial Subset Strategy." The default is "Size = p+1."

o "Adjust 'h' Value": this is used to specify the number of non-outliers.
It can be equal to (n + p + 1) / 2 or to a user-specified percentage of
the observations.

o Specify the "Adjust 'h' Value." The default is "(n + p + 1) / 2."

o "Adjust Initial/Final Subsets": this is used to specify the number of
elemental subsets of size "p+1" or "h" to be used to start the C-Step
operations of the MCD algorithm and the number of subsets with the
lowest determinant of the scatter matrix to be retained to continue
C-Steps until convergence.

o Specify the "Adjust Initial/Final Subsets." The defaults are "500"
elemental subsets and "10" best retained subsets.

o Click "OK" to continue or "Cancel" to cancel the options.
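As a rough illustration of the "Adjust 'h' Value" option, the sketch below computes h for the default rule and for a user-specified fraction. The function names are hypothetical, the truncation of the fraction to an integer is an assumption, and the breakdown formula (n - h + 1) / n is inferred only because it reproduces the 'h' Value (56) and Breakdown Value (0.2667) printed in the BRADU output; it is not taken from the Scout source.

```python
import math

def mcd_h_value(n, p, fraction=None):
    """Number of observations treated as non-outliers (hypothetical helper)."""
    if fraction is None:
        return (n + p + 1) // 2              # default rule shown in the dialog
    return int(math.floor(fraction * n))     # assumed: user fraction is truncated

def breakdown_value(n, h):
    """Assumed form; it matches the 0.2667 printed for n = 75, h = 56."""
    return (n - h + 1) / n

n, p = 75, 4                                 # BRADU: 75 observations, 4 variables
h_default = mcd_h_value(n, p)
h_user = mcd_h_value(n, p, fraction=0.75)
print(h_default, h_user, round(breakdown_value(n, h_user), 4))
```

A smaller h tolerates more outliers (a larger breakdown value) at the cost of using fewer observations for the estimates.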

Click the "Graphs" button for various options.

[Screenshot: MCD graphics options dialog.
 Checkboxes: Index MD Plot; Distance-Distance MD Plot; QQ MD Plot.
 Title for Index MD Plot: MCD Estimate.
 Title for Distance-Distance Plot: MCD Estimate.
 Title for QQ MD Plot: MCD Estimate.
 Buttons: OK, Cancel.]

o Specify the "Title for Index MD Plot." This is an index plot of the
robust distances obtained using the MCD estimates.

o Specify the "Title for Distance-Distance Plot." This is a plot of the
classical Mahalanobis distances against the robust distances obtained
using the MCD estimates.

o Specify the "Title for Q-Q MD Plot."

o Click "OK" to continue or "Cancel" to cancel the options.

° Click "OK" to continue or "Cancel" to cancel the MCD procedure.

Output example: The data set "BRADU.xls" was used for the MCD method. It has 75
observations and four variables. The MCD estimates of location and scale were obtained
using the best "h" subset. Then this scale estimate was adjusted for multi-normality.
Using this estimate, outliers were obtained and weighted (hard weighting) accordingly.
The weighted mean vector and the weighted covariance matrix were then calculated.
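The sequence described above (concentration steps to a best "h" subset, then hard weighting) can be sketched for two variables as follows. This is a simplified illustration, not Scout's implementation: it uses a single starting elemental subset instead of many random ones, omits the multinormality adjustment, and all function names and the data are hypothetical.

```python
import math

def mean_cov(points):
    """Sample mean and covariance (divisor m - 1) of a list of (x, y) pairs."""
    m = len(points)
    mx = sum(x for x, _ in points) / m
    my = sum(y for _, y in points) / m
    sxx = sum((x - mx) ** 2 for x, _ in points) / (m - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (m - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (m - 1)
    return (mx, my), (sxx, sxy, syy)

def squared_md(point, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    det = sxx * syy - sxy * sxy
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def concentrate(data, subset, h, max_steps=50):
    """Repeat C-steps: refit on the subset, then keep the h closest points."""
    subset = sorted(subset)
    for _ in range(max_steps):
        mean, cov = mean_cov([data[i] for i in subset])
        order = sorted(range(len(data)),
                       key=lambda i: squared_md(data[i], mean, cov))
        new_subset = sorted(order[:h])
        if new_subset == subset:
            break
        subset = new_subset
    return subset

def hard_weights(data, subset, cutoff):
    """Weight 0 for points whose squared MD from the subset fit exceeds cutoff."""
    mean, cov = mean_cov([data[i] for i in subset])
    return [0 if squared_md(pt, mean, cov) > cutoff else 1 for pt in data]

# Hypothetical data: eight clustered points and two gross outliers.
data = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.3), (1.1, 1.0), (0.8, 0.7),
        (1.3, 1.4), (1.0, 0.8), (0.7, 1.2), (9.0, 9.5), (10.0, 8.5)]
best = concentrate(data, subset=[0, 1, 2], h=8)  # elemental subset, size p + 1
cutoff = -2.0 * math.log(0.025)                  # chi-square 0.975 quantile, 2 DoF
weights = hard_weights(data, best, cutoff)
print(best, weights)
```

The weighted mean vector and covariance matrix would then be computed from the observations that kept a weight of 1.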

Output for MCD outliers method.

Data Set used: Bradu.

Multivariate Robust MCD Outlier Analysis

User Selected Options
Date/Time of Computation                 11/8/2008 4:52:34 PM
From File                                D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision                           OFF
Initial Elemental Subset Size            500 Random Subsets of Initial Size of 'p + 1' will be used
User Selected 'h' Value                  'h' Value of '(n * 0.75)' will be used
Selected Number of Initial Subsets       500
Number of Intermediate Subsets           500
Number of Best Retained Subsets          10
Minimum Determinant Data [h,p]           Will Not be Displayed in Output
Title for Index Plot                     MCD Estimate
Title for Distance-Distance Plot         MCD Estimate
Title for QQ Plot                        MCD Estimate
Graphics Critical Alpha                  0.05

Number of Observations                   75
Number of Variables                      4
Size of Elemental Subsets                5
Number of Initial Subsets                [illegible]
Number of Intermediate Subsets           500
'h' Value                                56
Breakdown Value                          0.2667
Chisquare with 4 DoF (0.975)             11.1510
Unsquared Chisquare with 4 DoF (0.975)   3.3393
Chisquare with 4 DoF (0.50)              3.3531

Classical Mean Vector
y         x1        x2        x3
1.279     3.207     5.597     7.231

Classical Covariance S Matrix
y         x1        x2        x3
12.2      9.477     20.39     31.03
9.477     13.34     28.47     41.24
20.39     28.47     67.88     94.67
31.03     41.24     94.67     137.8

Determinant   1906.26080388263


-------
Output for MCD outliers method (continued).

Best 'h' Subset of 56 Observations
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 44, 45, 46, 48, 49, 50, 51, 54, 55, 56, 57, 58,
59, 60, 61, 62, 63, 64, 65, 66, 67, 69, 70, 71, 72, 73, 74, 75

MCD Mean Vector
y         x1        x2        x3
-0.0893   1.554     1.861     1.707

MCD Covariance S Matrix
y         x1        x2        x3
0.303     0.153     -0.00848  -0.138
0.153     1.036     0.0987    0.167
-0.00848  0.0987    1.04      0.158
-0.138    0.167     0.158     1.084

MCD Determinant   0.282788956390238

Adjustment Factor for Multinormality   1.41188216564302

Adjusted MCD Covariance S Matrix
y         x1        x2        x3
0.428     0.215     -0.012    -0.194
0.215     1.463     0.139     0.236
-0.012    0.139     1.468     0.222
-0.194    0.236     0.222     1.53

Adjusted MCD Determinant   1.12371519856147
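The printed values suggest that the adjusted MCD covariance matrix is the MCD covariance matrix multiplied by the scalar adjustment factor, in which case the determinant scales by the factor raised to the power p. The check below works under that assumption using only the rounded values printed above; the variable names are illustrative, and how Scout derives the factor itself is not shown here.

```python
factor = 1.41188216564302        # "Adjustment Factor for Multinormality"
p = 4                            # number of variables

# MCD covariance matrix as printed (rounded to three or four digits).
mcd_cov = [
    [0.303, 0.153, -0.00848, -0.138],
    [0.153, 1.036, 0.0987, 0.167],
    [-0.00848, 0.0987, 1.04, 0.158],
    [-0.138, 0.167, 0.158, 1.084],
]
mcd_det = 0.282788956390238

# Scaling every element by the factor scales the determinant by factor ** p.
adjusted_cov = [[factor * value for value in row] for row in mcd_cov]
adjusted_det = mcd_det * factor ** p

print([round(adjusted_cov[i][i], 3) for i in range(p)])
print(adjusted_det)
```

Small differences in the last printed digit of individual elements are expected because the inputs are already rounded.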



























Obs   Unsquared MDs   Weights
1     32.13           0
2     33.3            0
3     34.66           0
4     34.76           0
5     34.71           0
6     33.2            0
7     34.19           0
8     33.13           0
9     34.18           0
10    33.79           0
11    31.17           0
12    32.15           0
13    31.63           0
14    35.3            0
15    1.932           1
16    1.927           1
17    1.651           1
18    0.699           1
19    1.227           1
20    1.794           1
21    1.656           1
22    1.854           1
23    1.831           1
24    1.546           1
25    1.838           1
26    1.672           1
27    2.048           1
28    1.176           1
29    [illegible]     1
30    1.863           1
31    1.623           1
32    1.603           1
33    [illegible]     1
34    2.036           1
35    1.642           1
36    1.491           1
37    1.796           1
38    1.811           1
39    2.122           1
40    1.074           1
41    1.773           1
42    1.663           1
43    2.331           1
44    1.925           1
45    2.003           1
46    1.688           1
47    2.904           1
48    1.617           1
49    1.84            1
50    1.241           1
51    [illegible]     1
52    2.336           1
53    3.05            1
54    1.953           1
55    1.208           1
56    1.398           1
57    1.584           1
58    1.5             1
59    1.055           1
60    1.934           1
61    1.987           1
62    [illegible]     1
63    1.641           1
64    1.677           1
65    1.885           1
66    1.736           1
67    1.246           1
68    2.339           1
69    1.568           1
70    2.123           1
71    0.912           1
72    0.815           1
73    1.761           1
74    1.639           1
75    [illegible]     1

Observations with Unsquared MDs greater than 11.15
may be considered as potential outliers!

Weighted Mean Vector
y         x1        x2        x3
-0.0738   1.538     1.78      1.687

Weighted Covariance S Matrix
y         x1        x2        x3
0.317     0.0587    0.00186   -0.105
0.0587    1.132     0.0508    0.117
0.00186   0.0508    1.152     0.141
-0.105    0.117     0.141     1.07

Weighted S Determinant   0.410144007183786








-------
Output for MCD outliers method (continued).

Squared MDs

Obs   MCD MDs       Weights   Classical MDs
1     36.51         0         6.026
2     37.66         0         6.729
3     39.4          0         7.209
4     39.48         0         6.321
5     39.42         0         6.339
6     37.81         0         7.036
7     38.75         0         8.235
8     37.53         0         6.714
9     38.77         0         6.445
10    38.22         0         7.346
11    36.94         0         18.61
12    38.24         0         27.74
13    37.4          0         14.79
14    41.37         0         43.7
15    2.137         1         3.386
16    2.298         1         4.783
17    1.968         1         2.004
18    0.794         1         0.751
19    1.336         1         1.408
20    2.198         1         2.556
21    1.988         1         1.281
22    1.929         1         2.578
23    1.943         1         1.216
24    1.776         1         1.321
25    2.064         1         0.658
26    2.042         1         1.413
27    2.286         1         2.492
28    1.268         1         0.765
29    1.269         1         0.364
30    2.111         1         2.793
31    1.723         1         3.395
32    1.919         1         1.772
33    1.633         1         0.967
34    2.35          1         1.475
35    2.026         1         1.58
36    1.845         1         1.008
37    2.114         1         3.368
38    [rows 38 and 39 are not present in the source]
40    1.322         1         1.237
41    2.046         1         3.139
42    1.967         1         3.353
43    2.471         1         4.108
44    2.192         1         2.686
45    2.103         1         1.278
46    [illegible]   1         2.225
47    2.909         1         4.332
48    1.925         1         2.184
49    2.269         1         2.552
50    1.539         1         0.278
51    1.858         1         1.858
52    2.34          1         4.587
53    3.148         1         4.889
54    2.231         1         2.604
55    1.377         1         1.622
56    1.648         1         1.898
57    1.82          1         0.773
58    1.756         1         1.988
59    1.291         1         0.463
60    [illegible]   1         3.945
61    2.244         1         2.806
62    2.254         1         0.675
63    1.923         1         1.757
64    1.986         1         1.112
65    2.082         1         1.336
66    2.112         1         1.696
67    1.362         1         0.445
68    2.483         1         2.462
69    1.758         1         1.154
70    2.173         1         1.042
71    1.133         1         0.413
72    0.937         1         1.114
73    1.723         1         2.207
74    2.032         1         2.742
75    2.23          1         3.632

Classical Kurtosis   53.97
MCD Kurtosis         9.072E+11

Results: Fourteen (14) observations (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, and 14), with squared distances
greater than the Max squared MCD MDs, were given a weight of zero (0) (hard rejection) and were
considered to be outliers. Also, note the large value of the MCD kurtosis.


-------
Output for MCD outliers method (continued).

[Figure: index plot of the MCD MDs, titled "MCD Estimate", with the Chisquare (0.975) = 11.15 line marked.]

[Figure: distance-distance plot, titled "MCD Estimate".]

[Figure: Q-Q plot of the MCD MDs against Chisquare Quantiles, titled "MCD Estimate", with the Chisquare (0.975) = 11.15 line marked. N = 75.0000, p = 4.0000, Slope = 248.3939, Intercept = -216.3694, Correlation Coefficient = 0.8421.]

Graphical Interpretation: The observations greater than "the chi-square (0.975)" line may be considered
to be potential outliers. Those observations have weights of zero (0). To be consistent with the literature,
on the graphs generated for MCD method, chi-square quantiles (and not beta quantiles) have been used.

7.2.2.4 MVT

1. Click Outlier/Estimates > Multivariate > Robust > MVT.

[Screenshot: Scout 2008 main window with the Outliers/Estimates menu open. The log panel reports that IRISOUT.DAT was imported into IRISOUT.wst.]


-------
2. The "Select Variables" screen will appear (Section 3.4).

o Select the variables from the screen.

° If various groups are available and an analysis is to be done for those groups,
then group variables can be selected from the drop-down list by clicking on
the arrow below the "Group by Variable" button.

° Click the "Options" button for various options.

[Screenshot: MVT options dialog.
 Select Initial Estimates: Classical; Sequential Classical; Robust (Median, MAD); OKG (Maronna Zamar) [selected]; KG (Not Orthogonalized); MCD.
 Cutoff for Outliers: Critical Alpha = 0.05.
 Correlation R Matrix: Do Not Display [selected]; Display.
 Select Number of Iterations: 10 [Max = 50].
 MVT Trimming Percentage: Trim Percentage = 0.1.
 Intermediate Iterations: Do Not Display [selected]; Display Every 5th; Display Every 4th; Display Every 2nd; Display All.
 Buttons: OK, Cancel.]

o Specify the "Select Initial Estimates" to start the MVT iterative
procedure. The default is "OKG (Maronna Zamar)."

o Specify the distribution for Mahalanobis distances in the "MDs
Distribution." The default is "Beta."

o Specify the "Critical Alpha," the cutoff for outliers. The default is
"0.05."

o Specify the "Select Number of Iterations." The default is "10."

o Specify the "MVT Trimming Percentage" for the trimming process.
The default is "0.05."

o Specify whether or not to display the "Correlation R Matrix." The
default is "Do Not Display."

o Specify the "Intermediate Iterations." The default is "Do Not
Display."

o Click "OK" to continue or "Cancel" to cancel the options.
° Click the "Graphs" button for various options.

[Screenshot: MVT graphics options dialog.
 Checkboxes: Index MD Plot; Distance-Distance MD Plot; QQ MD Plot.
 QQ Plot Options, MDs Distribution: Beta; Chi.
 Title for Index MD Plot: MVT Estimate.
 Title for Distance-Distance Plot: MVT Estimate.
 Title for QQ MD Plot: MVT Estimate.
 Graphics Critical Alpha: 0.05.
 Buttons: OK, Cancel.]

o Specify the "Title for Index MD Plot." This is an index plot of the
robust distances obtained using the MVT estimates.

o Specify the "Title for Distance-Distance Plot." This is a plot of the
classical Mahalanobis against the robust distances obtained using the
MVT estimates.

o Specify the "Title for Q-Q MD Plot."

o Select the distribution required for the "Q-Q Plot" and the "Graphics
Critical Alpha" for identifying the outliers.

Note: The "Graphics Critical Alpha" should match the "Critical Alpha" from the
Multivariate Outlier Options window to obtain the same outliers.

o Click "OK" to continue or "Cancel" to cancel the options.

° Click "OK" to continue or "Cancel" to cancel the MVT procedure.
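To give a feel for how a "Beta" choice of MDs distribution and a critical alpha translate into a cutoff, the sketch below assumes the textbook result that n d²/(n - 1)² follows a Beta(p/2, (n - p - 1)/2) distribution under multivariate normality. Whether Scout computes its limits exactly this way is an assumption; for n = 75, p = 4, and alpha = 0.05 the result comes out near 9.13, close to the "Individual Squared MD (0.05)" value of 9.128 in the output. All function names are hypothetical.

```python
import math

def beta_cdf(x, a, b, steps=2000):
    """Beta(a, b) CDF by Simpson's rule; intended for a >= 1 and b >= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    norm = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

    def pdf(t):
        return norm * t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    width = x / steps                      # steps is even, as Simpson requires
    total = pdf(0.0) + pdf(x)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * pdf(k * width)
    return total * width / 3.0

def beta_quantile(prob, a, b):
    """Invert the CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def individual_md_limit(n, p, alpha=0.05):
    """Assumed cutoff for a single squared MD under the Beta option."""
    scale = (n - 1) ** 2 / n
    return scale * beta_quantile(1.0 - alpha, p / 2.0, (n - p - 1) / 2.0)

limit = individual_md_limit(75, 4)
print(round(limit, 3))
```

Using the same alpha in the options window and the graphics window keeps the plotted limit lines consistent with the weights reported in the output.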



-------
Output example: The data set "BRADU.xls" was used for the MVT method. It has 75
observations and four variables. The initial estimates of location and scale for each group
were the median vector and the scale matrix obtained from the OKG method. The
outliers were found using the multivariate trimming procedure and the observations were
given weights accordingly. The weighted mean vector and the weighted covariance matrix
were then calculated.
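The trimming idea can be sketched for two variables as follows. This is a minimal illustration under stated assumptions, not Scout's code: it starts from the classical estimates instead of the OKG estimates, it trims a fixed number of observations (the trim fraction times n, truncated) at each iteration, and the function names and data are hypothetical.

```python
def mean_cov(points):
    """Sample mean and covariance (divisor m - 1) of a list of (x, y) pairs."""
    m = len(points)
    mx = sum(x for x, _ in points) / m
    my = sum(y for _, y in points) / m
    sxx = sum((x - mx) ** 2 for x, _ in points) / (m - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (m - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (m - 1)
    return (mx, my), (sxx, sxy, syy)

def squared_md(point, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    det = sxx * syy - sxy * sxy
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def mvt(data, trim_fraction=0.1, iterations=10):
    """Iteratively drop the most distant observations and re-estimate."""
    n = len(data)
    n_trim = int(trim_fraction * n)        # e.g. int(0.1 * 75) = 7
    keep = list(range(n))                  # assumed start: classical estimates
    for _ in range(iterations):
        mean, cov = mean_cov([data[i] for i in keep])
        order = sorted(range(n), key=lambda i: squared_md(data[i], mean, cov))
        new_keep = sorted(order[:n - n_trim])
        if new_keep == keep:
            break
        keep = new_keep
    mean, cov = mean_cov([data[i] for i in keep])
    kept = set(keep)
    weights = [1 if i in kept else 0 for i in range(n)]
    return mean, cov, weights

# Hypothetical data: nine clustered points and one gross outlier.
data = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.3), (1.1, 1.0), (0.8, 0.7),
        (1.3, 1.4), (1.0, 0.8), (0.7, 1.2), (1.1, 1.2), (9.0, 9.5)]
mean, cov, weights = mvt(data, trim_fraction=0.1)
print(weights)
```

Because the number of trimmed observations is fixed by the trim fraction, a 10% trim of 75 observations can remove at most 7 of them, however many outliers the data contain.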



-------
Output for MVT outliers method.
Data Set used: Bradu.

MVT Multivariate Outlier Analysis

User Selected Options
Date/Time of Computation           1/8/2008 6:25:49 PM
From File                          D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision                     OFF
Critical Alpha                     0.05
Trimming Percentage                0.1
Initial Estimates                  Robust Median Vector and OKG (Maronna-Zamar) Matrix
Display Correlation R Matrix       Do Not Display Correlation R matrix
Number of Iterations               10
Show Intermediate Results          Do Not Display Intermediate Results
Title for Index Plot               MVT Estimate
Title for Distance-Distance Plot   MVT Estimate
Title for QQ Plot                  MVT Estimate
Graphics Critical Alpha            0.05
MDs Distribution                   Beta

Number of Observations             75
Number of Selected Variables       4

Critical Values
Max Squared MD (0.05)              17.43
Individual Squared MD (0.05)       9.128
Multivariate Kurtosis (0.05)       25.2

Classical Mean Vector
y         x1        x2        x3
1.279     3.207     5.597     7.231

Classical Covariance S Matrix
y         x1        x2        x3
12.2      9.477     20.39     31.03
9.477     13.34     28.47     41.24
20.39     28.47     67.88     94.67
31.03     41.24     94.67     137.8

Determinant   1906.26080388262








-------
Output for MVT outliers method (continued).

Eigenvalues for Classical Covariance S Matrix
Eval 1    Eval 2    Eval 3    Eval 4
0.914     1.688     5.538     223.1

Median Vector
y         x1        x2        x3
0.1       1.8       2.2       2.1

MAD/0.6745 Vector Representing Standard Deviation
y         x1        x2        x3
0.89      1.927     1.631     1.779

OKG Mean Vector
y         x1        x2        x3
1.258     -7.552    -6.052    2.521

Robust OKG (Maronna Zamar) Covariance S Matrix
y         x1        x2        x3
0.598     0.243     0.115     -0.231
0.243     2.856     0.108     0.241
0.115     0.108     2.175     0.00402
-0.231    0.241     0.00402   2.34

Determinant   7.84553740487618

Robust OKG Eigenvalues
Eval 1    Eval 2    Eval 3    Eval 4
0.53      2.149     2.316     2.975

Final Weighted Mean Vector
y         x1        x2        x3
0.968     2.418     3.672     4.566

Final Covariance S Matrix
y         x1        x2        x3
9.88      8.16      17.43     26.43
8.16      7.891     14.78     22.53
17.43     14.78     32.71     48.33
26.43     22.53     48.33     74.4

Determinant   43.7076080629344






-------
Output for MVT outliers method (continued).

Observations with Squared MDs greater than 17.43
may be considered as potential outliers!

Observation   Classical    Initial      Final         Final
Number        Squared MD   Squared MD   Squared MD    Weights
1             6.026        654.4        [illegible]   1
2             6.729        699.8        9.167         1
3             7.209        761.9        9.825         1
4             6.321        769.1        13.1          0
5             6.339        765.6        9.918         1
6             7.036        701.8        9.236         1
7             8.235        740.2        11.12         0
8             6.714        692.4        8.997         1
9             6.445        740.8        10.86         0
10            7.346        719.1        9.925         1
11            18.61        691.2        [illegible]   0
12            27.74        732.6        387.7         0
13            14.79        712.8        325.5         0
14            43.7         907          527.9         0
15            3.386        1.944        4.865         1
16            4.783        2.228        5.779         1
17            2.004        2.761        [illegible]   1
18            0.751        0.303        0.692         1
19            1.408        0.675        1.821         1
20            2.556        1.234        2.838         1
21            1.281        1.165        [illegible]   1
22            2.578        1.322        3.12          1
23            1.216        2.84         [illegible]   1
24            1.321        1.542        3.501         1
25            0.658        3.846        1.335         1
26            1.413        2.229        4.579         1
27            2.492        2.387        5.45          1
28            0.765        1.029        1.737         1
29            0.364        1.612        1.561         1
30            2.793        1.945        4.399         1
31            3.395        1.721        3.116         1
32            1.772        2.478        2.552         1
33            0.967        1.674        2.962         1
34            1.475        4.461        1.718         1
35            [rows 35 through 40 are not present in the source]
41            3.139        1.386        3.732         1
42            3.353        2.147        4.224         1
43            4.108        2.526        6.397         1
44            2.686        2.505        5.298         1
45            1.278        3.85         1.564         1
46            2.225        1.865        3.955         1
47            4.332        5.896        5.06          1
48            2.184        1.054        2.574         1
49            2.552        1.422        3.11          1
50            0.278        0.856        1.747         1
51            1.858        1.936        3.817         1
52            4.587        3.562        4.813         1
53            4.889        4.616        8.281         1
54            2.604        2.131        5.445         1
55            1.622        1.159        2.062         1
56            1.898        1.137        3.033         1
57            0.773        2.385        3.679         1
58            1.988        1.208        3.251         1
59            0.463        0.523        1.106         1
60            3.945        2.628        5.978         1
61            2.806        1.957        3.985         1
62            0.675        3.98         4.866         1
63            1.757        2.344        2.945         1
64            1.112        1.602        3.478         1
65            1.336        1.909        3.353         1
66            1.696        3.273        2.272         1
67            0.445        1.752        1.194         1
68            2.462        4.222        6.71          1
69            1.154        2.636        2.238         1
70            1.042        1.609        1.476         1
71            0.413        0.114        0.331         1
72            1.114        0.732        1.016         1
73            2.207        1.311        2.601         1
74            2.742        2.635        3.807         1
75            3.632        2.62         5.231         1

                        Classical   Initial   Final
Multivariate Kurtosis   53.97       101431    8820

I

Results: Seven (7) (= 10% of 75) observations (4, 7, 9, 11, 12, 13, and 14), with squared distances greater
than the Max (squared MD), were each given a zero (0) weight (hard rejection) and were considered to be
outliers. In order to identify all of the 14 outliers, one has to use higher trimming percentages, such as
20%.


-------
Output for MVT outliers method (continued).

[Figure: index plot of the squared MDs, titled "PROP Estimate", with the 95% Maximum (Largest MD) Limit = 17.4345 line marked.]

[Figure: distance-distance plot, titled "PROP Estimate".]

[Figure: Q-Q plot of the squared MDs against Beta Quantiles, titled "PROP Estimate", with the 95% Maximum (Largest MD) Limit = 17.4345 line marked. N = 75.0000, p = 4.0000, Slope = 260.0656, Intercept = -233.1933, Correlation Coefficient = 0.8414, Critical Correlation (0.05) = 0.9941, Kurtosis = 414,687.5837, Critical Kurtosis (0.05) = 25.2002, Skewness = 96,624,191.0299, Critical Skewness (0.05) = 2.3990.]

Graphical Interpretation: As before, the observations between the "Warning (Individual MD) Limit"
and "Maximum (Largest MD) Limit" lines may also represent potential outliers.

Note: Many times in practice, depending upon the trimming percentage value, this method may assign "0"
weights to some non-outlying observations with MDs smaller than the Max (MDs); that is, it may find more
outliers than are actually present in the data set. In order to overcome this problem, at the final
iteration, Scout compares the MVT MDs with the critical value Max (MDs), and the observations with MDs
less than the critical value Max (MDs) are reassigned full unit weight. Estimates of the mean vector and the
covariance matrix are then recomputed.

-------
7.2.2.5 PROP

1. Click Outlier/Estimates > Multivariate > Robust > PROP.

[Screenshot: Scout 2008 main window with the BRADU data set open. The Outliers/Estimates menu is expanded through Multivariate, listing Classical, Huber, Extended MCD, MVT, OKG Reweighted, PROP, and Method Comparison.]

2. The "Select Variables" screen will appear (Section 3.4).

° Select the variables from the screen.

o If various groups are available and an analysis is to be done for those groups,
then group variables can be selected from the drop-down list by clicking on
the arrow below the "Group by Variable" button.

o Click the "Options" button for various options.

[Screenshot: Multivariate Outlier Options dialog.
 Select Initial Estimates: Classical; Sequential Classical; Robust (Median, MAD); OKG (Maronna Zamar) [selected]; KG (Not Orthogonalized); MCD.
 Cutoff for Outliers: Critical Alpha = 0.05.
 Select Number of Iterations: 10 [Max = 50].
 Influence Function Alpha: 0.05.
 Correlation R Matrix: Do Not Display [selected]; Display.
 Intermediate Iterations: Do Not Display [selected]; Display Every 5th; Display Every 4th; Display Every 2nd; Display All.
 Buttons: OK, Cancel.]


-------
o Specify the initial estimates listed in the "Select Initial Estimates" to
start the PROP iterative procedure. The default is "OKG (Maronna
Zamar)."

o Specify the distribution for the Mahalanobis distances in the "MDs
Distribution." The default is "Beta."

o Specify the "Critical Alpha," the cutoff for outliers. The default is
"0.05."

o Specify the "Select Number of Iterations." The default is "10."

o Specify the "Influence Function Alpha" for the Huber weighting
process. The default is "0.05."

o Specify whether or not to display the "Correlation R Matrix." The
default is "Do Not Display."

o Specify "Intermediate Iterations." The default is "Do Not Display."

o Click "OK" to continue or "Cancel" to cancel the options.

° Click the "Graphs" button for various options.

[Screenshot: PROP graphics options dialog.
 Checkboxes: Index MD Plot; Distance-Distance MD Plot; QQ MD Plot.
 QQ Plot Options, MDs Distribution: Beta [selected]; Chi.
 Title for Index MD Plot: PROP Estimate.
 Title for Distance-Distance Plot: PROP Estimate.
 Title for QQ MD Plot: PROP Estimate.
 Graphics Critical Alpha: 0.05.
 Buttons: OK, Cancel.]

o Specify the "Title for Index MD Plot." This is an index plot of the
robust distances obtained using the PROP estimates.

o Specify the "Title for Distance-Distance Plot." This is a plot of the
classical Mahalanobis against the robust distances obtained using the
PROP estimates.

o Specify the "Title for Q-Q MD Plot."

o Select the distribution required for the "Q-Q Plot Options" and the
"Graphics Critical Alpha" for identifying the outliers.

Note: The "Graphics Critical Alpha" should match the "Critical Alpha" from the
Multivariate Outlier Options window to obtain the same outliers.

o Click "OK" to continue or "Cancel" to cancel the PROP procedure.
o Click "OK" to continue or "Cancel" to cancel the computing.

Output example: The data set "BRADU.xls" was used for the PROP method. It has 75
observations and four variables. The initial estimates of location and scale for each group were
the median vector and the scale matrix obtained from the OKG method. The outliers were found
using the PROP influence function and the observations were given weights accordingly. The
weighted mean vector and the weighted covariance matrix were then calculated.
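The weights in the PROP output can be reproduced closely with the function sketched below. The functional form is an assumption chosen because it matches the printed values (for example, observation #53 with a final squared MD of 10.85 and a weight of 0.698, using the "Influence Fn Squared MD (0.05)" value of 9.128 as the cutoff); this section does not state the formula, and the function name is hypothetical.

```python
import math

def prop_weight(squared_md, squared_cutoff):
    """Assumed PROP-type weight: full weight inside the cutoff, smooth decay outside."""
    d = math.sqrt(squared_md)
    d0 = math.sqrt(squared_cutoff)
    if d <= d0:
        return 1.0
    return (d0 / d) * math.exp(-(d - d0))

cutoff = 9.128                        # Influence Fn Squared MD (0.05) from the output
for obs, md2 in [(15, 4.666), (53, 10.85), (1, 1350.0)]:
    print(obs, prop_weight(md2, cutoff))
```

Unlike the hard (0 or 1) weights used by the MCD and MVT methods, weights of this kind decrease smoothly, so a borderline observation is down-weighted while a gross outlier receives a weight that is practically zero.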



-------
Output for PROP outliers method.
Data Set used: Bradu.

PROP Multivariate Outlier Analysis

User Selected Options
Date/Time of Computation           1/8/2008 1:54:07 PM
From File                          D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision                     OFF
Critical Alpha                     0.05
Influence Function Alpha           0.05
Initial Estimates                  Robust Median Vector and OKG (Maronna-Zamar) Matrix
Display Correlation R Matrix       Do Not Display Correlation R matrix
Distribution of Squared MDs        Beta Distribution
Number of Iterations               10
Show Intermediate Results          Do Not Display Intermediate Results
Title for Index Plot               PROP Estimate
Title for Distance-Distance Plot   PROP Estimate
Title for QQ Plot                  PROP Estimate
Graphics Critical Alpha            0.05
MDs Distribution                   Beta

Number of Observations             75
Number of Selected Variables       4

Critical Values
Max Squared MD (0.05)              17.43
Individual Squared MD (0.05)       9.128
Multivariate Kurtosis (0.05)       25.2
Influence Fn Squared MD (0.05)     9.128

Classical Mean Vector
y         x1        x2        x3
1.279     3.207     5.597     7.231

Classical Covariance S Matrix
y         x1        x2        x3
12.2      9.477     20.39     31.03
9.477     13.34     28.47     41.24
20.39     28.47     67.88     94.67
31.03     41.24     94.67     137.8

Determinant   1906.26080388262


-------
Output for PROP outliers method (continued).

Eigenvalues for Classical Covariance S Matrix
Eval 1    Eval 2    Eval 3    Eval 4
0.914     1.688     5.538     223.1

Median Vector
y         x1        x2        x3
0.1       1.8       2.2       2.1

MAD/0.6745 Vector Representing Standard Deviation
y         x1        x2        x3
0.89      1.927     1.631     1.779

OKG Mean Vector
y         x1        x2        x3
1.258     -7.552    -6.052    2.521

Robust OKG (Maronna Zamar) Covariance S Matrix
y         x1        x2        x3
0.598     0.243     0.115     -0.231
0.243     2.856     0.108     0.241
0.115     0.108     2.175     0.00402
-0.231    0.241     0.00402   2.34

Determinant   7.84553740487618

Robust OKG Eigenvalues
Eval 1    Eval 2    Eval 3    Eval 4
0.53      2.149     2.316     2.975

Final Weighted Mean Vector
y         x1        x2        x3
-0.0776   1.544     1.787     1.679

Final Covariance S Matrix
y         x1        x2        x3
0.315     0.0675    0.0111    -0.117
0.0675    1.129     0.0364    0.136
0.0111    0.0364    1.146     0.161
-0.117    0.136     0.161     1.057

Determinant   0.388282776749715










-------
Output for PROP outliers method (continued).

Observations with Squared MDs greater than 17.43
may be considered as potential outliers!

Observation   Classical     Initial      Final         Final
Number        Squared MD    Squared MD   Squared MD    Weights
1             6.026         654.4        1350          1.861E-16
2             6.729         699.8        1437          5.659E-17
3             7.209         761.9        1574          9.216E-18
4             6.321         769.1        1578          8.734E-18
5             6.339         765.6        1574          9.190E-18
6             7.036         701.8        1446          4.962E-17
7             8.235         740.2        1521          1.843E-17
8             6.714         692.4        1428          6.359E-17
9             6.445         740.8        1524          1.774E-17
10            7.346         719.1        1483          3.026E-17
11            18.61         691.2        1360          1.616E-16
12            27.74         732.6        1459          4.184E-17
13            14.79         712.8        1393          1.033E-16
14            43.7          907          1699          1.893E-18
15            3.386         1.944        4.666         1
16            4.783         2.228        5.315         1
17            2.004         2.761        3.841         1
18            0.751         0.303        0.627         1
19            1.408         0.675        1.769         1
20            2.556         1.234        4.802         1
21            1.281         1.165        3.964         1
22            2.578         1.322        3.721         1
23            1.216         2.84         3.999         1
24            1.321         1.542        3.126         1
25            0.658         3.846        4.222         1
26            1.413         2.229        4.15          1
27            2.492         2.387        5.235         1
28            0.765         1.029        1.709         1
29            0.364         1.612        1.619         1
30            2.793         1.945        4.703         1
31            3.395         1.721        2.993         1
32            1.772         2.478        3.73          1
33            0.967         1.674        2.75          1
34            1.475         4.461        5.473         1
35            1.58          0.923        4.093         1
36            1.008         2.086        3.372         1
37            3.368         2.047        4.44          1
38            [illegible]   2.826        4.44          1
39            1.935         3.883        4.769         1
40            1.237         1.463        1.735         1
41            3.139         1.386        4.151         1
42            3.353         2.147        3.949         1
43            4.108         2.526        6.062         1
44            2.686         2.505        [illegible]   1
45            1.278         3.85         4.4           1
46            2.225         1.865        3.711         1
47            4.332         5.896        8.531         1
48            2.184         1.054        3.712         1
49            2.552         1.422        5.112         1
50            0.278         0.856        2.346         1
51            1.858         1.936        3.421         1
52            4.587         3.562        5.498         1
53            4.889         4.616        10.85         0.698
54            2.604         2.131        4.937         1
55            1.622         1.159        1.877         1
56            1.898         1.137        2.77          1
57            0.773         2.385        3.327         1
58            1.988         1.208        3.164         1
59            0.463         0.523        1.691         1
60            3.945         2.628        5.543         1
61            2.806         1.957        5.029         1
62            0.675         3.98         5.127         1
63            1.757         2.344        3.823         1
64            1.112         1.602        3.936         1
65            1.336         1.909        4.628         1
66            1.696         3.273        4.468         1
67            0.445         1.752        1.921         1
68            2.462         4.222        6.464         1
69            1.154         2.636        3.062         1
70            1.042         1.609        5.021         1
71            0.413         0.114        1.278         1
72            1.114         0.732        0.872         1
73            2.207         1.311        3.196         1
74            2.742         2.635        4.093         1
75            3.632         2.62         5.414         1

                        Classical   Initial   Final
Multivariate Kurtosis   53.97       101431    414688

Results: Fourteen (14) observations (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, and 14), with squared distances
greater than the Max (squared MD), were each assigned an almost zero (0) weight (soft rejection) and may
be considered to be outliers. Another observation (#53) also received a reduced (<1) weight.

Note: PROP estimates with or without the 14 outliers are in close agreement with the classical estimates
without the 14 outliers.

[Figure: index plot of the PROP squared MDs, titled "PROP Estimate", with the 95% Maximum (Largest MD) Limit = 17.4345 line marked. Horizontal axis: Index.]

-------
[Figure: distance-distance plot, titled "PROP Estimate". Horizontal axis: Classical MD.]

[Figure: Q-Q plot of the PROP squared MDs against Beta Quantiles, with the 95% Warning (Individual MD) Limit = 9.1281 and the 95% Maximum (Largest MD) Limit = 17.4345 lines marked. N = 75.0000, p = 4.0000, Slope = 260.0656, Intercept = -233.1933, Correlation Coefficient = 0.8414, Critical Correlation (0.05) = 0.9941, Kurtosis = 414,687.5887, Critical Kurtosis (0.05) = 25.2002, Skewness = 98,624,191.0299, Critical Skewness (0.05) = 2.3990.]


-------
Graphical Interpretation: the observations (e.g., #53) between the "Warning (Individual MD) Limit"
and the "Maximum (Largest MD) Limit" received a reduced (<1) weight.

Note 1: If the initial estimate of the covariance matrix is not positive definite, then a warning message (in
orange) is displayed, so that one of the other options, which yields a positive definite covariance matrix,
can be selected. This is illustrated as follows.

Data Set used: Stackloss.

Select Initial Estimates: Robust MAD/Median.





Classical Mean Vector

  Stack-Loss   Air-Flow   Temp     Acid-Conc
  17.52        60.43      21.1     86.29

Classical Covariance S Matrix

               Stack-Loss   Air-Flow   Temp     Acid-Conc
  Stack-Loss   103.5        85.76      28.15    21.79
  Air-Flow     85.76        84.06      22.66    24.57
  Temp         28.15        22.66      9.99     6.621
  Acid-Conc    21.79        24.57      6.621    28.71

  Determinant  62844.8754181548

Eigenvalues for Classical Covariance S Matrix

  Eval 1   Eval 2   Eval 3   Eval 4
  1.96     7.134    22.96    194.1

Median Vector

  Stack-Loss   Air-Flow   Temp     Acid-Conc
  15           58         20       87

MAD/0.6745 Vector Representing Standard Deviation

  Stack-Loss   Air-Flow   Temp     Acid-Conc
  5.93         5.93       2.965    4.448

Robust MAD Covariance S Matrix

               Stack-Loss   Air-Flow   Temp     Acid-Conc
  Stack-Loss   35.17        85.76      28.15    21.79
  Air-Flow     85.76        35.17      22.66    24.57
  Temp         28.15        22.66      8.792    6.621
  Acid-Conc    21.79        24.57      6.621    19.78

  Determinant  171530.124177133

Robust MAD Eigenvalues

  Eval 1   Eval 2   Eval 3   Eval 4
  -50.94   -2.115   11.32    140.6

Initial Covariance Matrix is Not Positive Definite!
Please use other options to compute the Initial Covariance Matrix.
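The positive definiteness check described in Note 1 can be sketched as follows. This is a minimal illustration only, not Scout's implementation: the function name is_positive_definite is hypothetical, and the check uses an attempted Cholesky factorization, which is one standard way to test a symmetric matrix (the test Scout actually applies is not documented here). The matrices are the Stackloss values from the output above, rounded as displayed.

```python
import math


def is_positive_definite(matrix):
    """Return True if a symmetric matrix admits a Cholesky factorization.

    Hypothetical helper for illustration; a non-positive pivot means the
    matrix is not positive definite.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


# Classical covariance matrix for Stackloss (values as displayed above).
classical_cov = [
    [103.5, 85.76, 28.15, 21.79],
    [85.76, 84.06, 22.66, 24.57],
    [28.15, 22.66, 9.99, 6.621],
    [21.79, 24.57, 6.621, 28.71],
]

# Robust MAD/Median initial estimate: squared MAD/0.6745 values on the
# diagonal, classical covariances off the diagonal (values as displayed above).
robust_mad_cov = [
    [35.17, 85.76, 28.15, 21.79],
    [85.76, 35.17, 22.66, 24.57],
    [28.15, 22.66, 8.792, 6.621],
    [21.79, 24.57, 6.621, 19.78],
]

print("classical positive definite:", is_positive_definite(classical_cov))
print("robust MAD positive definite:", is_positive_definite(robust_mad_cov))
```

Note that the robust MAD matrix has a positive determinant even though it is not positive definite, because two of its eigenvalues are negative; a determinant check alone would not catch this case.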

Note 2: If any of the elements of the MAD/0.6745 vector is less than 10, then a fix, called the IQR Fix, is
used. In such cases, the variability measure, MAD/0.6745, is replaced by IQR/1.35. This is illustrated as
follows.



-------
Classical Mean Vector

  sp-length-1   sp-width-1   pt-length-1   pt-width-1
  5.006         3.428        1.462         0.246

Classical Covariance S Matrix

                sp-length-1   sp-width-1   pt-length-1   pt-width-1
  sp-length-1   0.124         0.0992       0.0164        0.0103
  sp-width-1    0.0992        0.144        0.0117        0.0093
  pt-length-1   0.0164        0.0117       0.0302        0.00607
  pt-width-1    0.0103        0.0093       0.00607       0.0111

  Determinant   2.11308767598396E-06

Eigenvalues for Classical Covariance S Matrix

  Eval 1    Eval 2   Eval 3   Eval 4
  0.00903   0.0268   0.0369   0.236

Median Vector

  sp-length-1   sp-width-1   pt-length-1   pt-width-1
  5             3.4          1.5           0.2

MAD/0.6745 Vector Representing Standard Deviation

  sp-length-1   sp-width-1   pt-length-1   pt-width-1
  0.297         0.371        0.148         0

Adjusted by IQR/1.35 MAD/0.6745 Vector

  sp-length-1   sp-width-1   pt-length-1   pt-width-1
  0.297         0.371        0.148         0.0741

Robust MAD Covariance S Matrix

                sp-length-1   sp-width-1   pt-length-1   pt-width-1
  sp-length-1   0.0879        0.0992       0.0164        0.0103
  sp-width-1    0.0992        0.137        0.0117        0.0093
  pt-length-1   0.0164        0.0117       0.022         0.00607
  pt-width-1    0.0103        0.0093       0.00607       0.00549

  Determinant   1.27592503630283E-07
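The IQR Fix described in Note 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not Scout's code: the function name robust_scale is hypothetical, the cutoff below which the fix is applied (threshold, here 1e-10) is an assumed placeholder because the exact value is not legible in this guide, and the quartiles use the "inclusive" interpolation rule of the Python standard library, which may differ from the quartile rule Scout uses.

```python
import statistics


def robust_scale(values, threshold=1e-10):
    """Return MAD/0.6745, falling back to IQR/1.35 when that is near zero.

    `threshold` is an assumed cutoff for illustration only.
    """
    center = statistics.median(values)
    mad = statistics.median(abs(v - center) for v in values)
    scale = mad / 0.6745
    if scale < threshold:
        # More than half of the values equal the median, so the MAD is zero;
        # use the interquartile range as the variability measure instead.
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        scale = (q3 - q1) / 1.35
    return scale


# Hypothetical measurements with many ties at the median (MAD = 0).
tied_values = [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.1, 0.3, 0.4, 0.5]
print("scale with IQR fix:", robust_scale(tied_values))

# Hypothetical measurements without heavy ties (MAD/0.6745 is used directly).
spread_values = [1.0, 2.0, 3.0, 4.0, 5.0]
print("scale without fix:", robust_scale(spread_values))
```

The fix matters because a zero scale estimate on the diagonal would make the initial robust covariance matrix singular.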




-------
7.2.3 Method Comparisons

The Method Comparison module (available in the Outliers/Estimates drop-down menu) is
a formal graphical method of comparing various classical and robust outlier identification
methods incorporated in Scout. Specifically, selected classical and robust prediction and
tolerance ellipsoids (contour ellipses) are drawn on two-dimensional scatter plots of
selected variables. The main objective of this module is to compare the effectiveness of
the various outlier methods included in Scout.

Those contour plots are displayed at the same two levels as the horizontal lines (warning
limit and maximum limit) displayed on the Q-Q Plots of the MDs. The individual (Indv-
MD) contour (prediction ellipsoid) corresponds to the inner ellipsoid given by the
probability statement:

$P(d_i^2 \le d_{\mathrm{ind}}^2) \le (1 - \alpha); \quad i = 1, 2, \ldots, n.$

The Simultaneous (Max-MD) outer contour ellipsoid corresponds to a tolerance ellipsoid
given by the probability statement:

$P(d_i^2 \le d_{\max}^2) \le (1 - \alpha); \quad i = 1, 2, \ldots, n.$

For details, refer to Singh (1993), and Singh and Nocerino (1995). The plots based upon
the classical MDs accommodate outliers as a part of the same population and often fail to
identify all of the outliers present in the data set. The outlying observations are more
prominent on the contour plots obtained using robustified distances and estimates.
Observations falling outside the outer ellipse (tolerance ellipsoid) are outliers, whereas
the observations lying between the inner (prediction ellipsoid) and the outer ellipses may
also represent potential outliers.
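The way an observation is read off a contour plot can be sketched in code. The example below is illustrative only: the function names squared_md and classify are hypothetical, the location and covariance estimates are made-up inputs, and the two limits (6.0 and 12.0) are placeholder values rather than critical values computed by Scout. In practice the estimates would come from a classical or robust method and the limits from the selected "Critical Alpha."

```python
def squared_md(point, center, cov):
    """Squared Mahalanobis distance of a 2-D point from `center` under `cov`."""
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Quadratic form with the inverse of the 2x2 matrix written out explicitly.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def classify(md2, individual_limit, maximum_limit):
    """Place an observation relative to the inner and outer ellipses."""
    if md2 > maximum_limit:
        return "outlier"            # outside the outer (tolerance) ellipse
    if md2 > individual_limit:
        return "potential outlier"  # between the inner and outer ellipses
    return "regular"                # inside the inner (prediction) ellipse


# Hypothetical estimates and limits, for illustration only.
center = (0.0, 0.0)
cov = [[1.0, 0.0], [0.0, 1.0]]
individual_limit = 6.0
maximum_limit = 12.0

for point in [(1.0, 1.0), (3.0, 0.0), (4.0, 0.0)]:
    md2 = squared_md(point, center, cov)
    print(point, md2, classify(md2, individual_limit, maximum_limit))
```

Because the ellipses are level sets of the squared distance, comparing the distance with the two limits is equivalent to checking on the plot whether the point falls inside the inner ellipse, between the two ellipses, or outside the outer ellipse.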

If the data set is categorized by a group column, then the contour ellipses (prediction or
tolerance ellipsoids) can be drawn separately for each of the groups included in the data
set. The plots shown here are obtained using some well-known data sets.

o Click Outliers/Estimates > Multivariate > Robust/Iterative > Method Comparison.

[Screenshot: Scout Outliers/Estimates menu showing the Robust/Iterative submenu with the entries Sequential Classical, Huber, Extended MCD, MVT, OKG Reweighted, PROP, and Method Comparison.]


-------
o The following variable selection screen appears.

[Screenshot: "Select Variables to Graph" window. The Variables list (Name, ID, Count) contains Count (0, 75), y (1, 75), x1 (2, 75), x2 (3, 75), and x3 (4, 75). The window provides the "Select Y Axis Variable," "Select X Axis Variable," and "Select Group Variable" controls, and the Options, OK, and Cancel buttons.]



Specify the variable for Y-axis under "Select Y Axis Variable."

Specify the variable for X-axis under "Select X Axis Variable."

Specify the group variable in the "Select Group Variable" drop-down if a
group variable is present in the data set.

Click on "Options" to get the following window.	

[Screenshot: "OptionsGraphs_EDA_ScatterPlot" window. Select Ellipse(s): Classical, Robust. Ellipse Grouping: All Data, By Group. Classical Contour Plots: Individual [d0cut], Simultaneous [Max MD], Simultaneous/Individual. Classical Cutoff, Critical Alpha: 0.05. Robust Contour Plots: Individual [d0cut], Simultaneous [Max MD], Simultaneous/Individual. Select Estimation Method(s): Sequential Classical, Huber, PROP, MVT, MCD. Robust Cutoff, Critical Alpha: 0.05. Label Individual Points: Observation Number, By Group Designation. Title: Scatter/Contour Plot. Buttons: OK, Cancel.]


-------
Click the "Simultaneous/Individual" radio button in the "Classical Contour Plots" box.
Click "OK" to continue or "Cancel" to cancel the options window.

Data Set used: Bradu. Both classical prediction and simultaneous ellipsoids are drawn.

• Open the options window and uncheck the "Classical" option in the "Select
Ellipse(s)" box.

• Click the "Simultaneous/Individual" radio button in the "Robust Contour
Plots" box.

• Specify any of the estimation methods from the "Select Estimation Method(s)"
box; e.g., PROP.

• Specify the preferred "Robust Cutoff Critical Alpha," "Select Initial
Estimates," "Select Number of Iterations," "MDs Distribution" and
"Influence Alpha." Click "OK" to continue for the graph.



-------
[Screenshot: "OptionsGraphs_EDA_ScatterPlot" window with only the Robust ellipse selected. Ellipse Grouping: All Data, By Group. Robust Cutoff, Critical Alpha: 0.05. Robust Contour Plots: Individual [d0cut], Simultaneous [Max MD], Simultaneous/Individual. Label Individual Points: Observation Number, By Group Designation. Select Estimation Method(s): Sequential Classical, Huber, PROP, MVT, MCD. Select Initial Estimates: Classical, Robust (Median, MAD), OKG (Maronna Zamar), KG (Not Orthogonalized), MCD. MDs Distribution: Beta, Chisquare. Select Number of Iterations: 10 (Max = 50). Huber and/or PROP, Influence Alpha: 0.05. Title: Scatter/Contour Plot.]

Data Set used: Bradu. Both inner and outer ellipsoids are drawn using the PROP method.



-------
o Comparing various robust methods.

[Screenshot: "OptionsGraphs_EDA_ScatterPlot" window with the Robust ellipse and all estimation methods (Sequential Classical, Huber, PROP, MVT, MCD) selected. Ellipse Grouping: All Data, By Group. Robust Cutoff, Critical Alpha: 0.05. Robust Contour Plots: Individual [d0cut], Simultaneous [Max MD], Simultaneous/Individual. Label Individual Points: Observation Number, By Group Designation. Select Initial Estimates: Classical, Robust (Median, MAD), OKG (Maronna Zamar), KG (Not Orthogonalized), MCD. MDs Distribution: Beta, Chisquare. Select Number of Iterations: 10 (Max = 50). Iterative Classical, Critical Alpha: 0.05. Huber and/or PROP, Influence Alpha: 0.05. Multivariate Trimming, Trim Percentage: 0.1. Title: Scatter/Contour Plot.]


-------
Data Set used: Bradu

[Figure: Scatter/Contour Plot for the Bradu data; x-axis: x1; legend entries include Sequential Classical, PROP, and MVT.]

Sequential Classical (with robust OKG initial start); the PROP and MCD ellipses overlap each other.
Note: The "Select Initial Estimates," "MDs Distribution," "Number of Iterations," and "Robust Cutoff"
settings remain the same for the sequential classical, Huber, PROP and MVT methods.



-------
Data Set used: Star Cluster.

Note: In this example, all of the methods (including the Huber method), except for the classical method,
identified the four main outliers present in the data set.



-------
Data Set used: Stackloss.

Example with Group variable in the variable selection screen.

[Screenshot: "OptionsGraphs_EDA_ScatterPlot" window set up for a grouped plot. Select Ellipse(s): Classical, Robust. Ellipse Grouping: All Data, By Group. Robust Cutoff, Critical Alpha: 0.05. Robust Contour Plots: Individual [d0cut], Simultaneous [Max MD], Simultaneous/Individual. Label Individual Points: Observation Number, By Group Designation. Select Estimation Method(s): Sequential Classical, Huber, PROP, MVT, MCD. Select Initial Estimates: Classical, Robust (Median, MAD), OKG (Maronna Zamar), KG (Not Orthogonalized), MCD. MDs Distribution: Beta, Chisquare. Select Number of Iterations: 10 (Max = 50). Iterative Classical, Critical Alpha: 0.05. Huber and/or PROP, Influence Alpha: 0.05. Title: Scatter/Contour Plot.]


-------
o Check "By Group" in the "Ellipse Grouping" box and uncheck "All
Data."

o Click on the "By Group Designation" radio button in the "Label
Individual Points" box.

o Specify the preferred contour plots and the estimation methods.

o Click "OK" to continue or "Cancel" to cancel the options.

Data Set used: Fulliris (Fisher 1936 data set with 3 species).

Note: The user may select all of the options available in the options window, but doing so produces a
busy, cluttered graph with overlapping ellipses that is difficult to interpret. The user should select only
the options that are useful for the comparison at hand.



-------
[Screenshot: "OptionsGraphs_EDA_ScatterPlot" window with all options selected. Select Ellipse(s): Classical, Robust. Ellipse Grouping: All Data, By Group. Classical Cutoff, Critical Alpha: 0.05. Robust Cutoff, Critical Alpha: 0.05. Classical Contour Plots and Robust Contour Plots: Individual [d0cut], Simultaneous [Max MD], Simultaneous/Individual. Label Individual Points: Observation Number, By Group Designation. Select Estimation Method(s): Sequential Classical, Huber, PROP, MVT, MCD. Select Initial Estimates: Classical, Robust (Median, MAD), OKG (Maronna Zamar), KG (Not Orthogonalized), MCD. MDs Distribution: Beta, Chisquare. Select Number of Iterations (Max = 50). Iterative Classical, Critical Alpha: 0.05. Huber and/or PROP, Influence Alpha: 0.05. Multivariate Trimming, Trim Percentage: 0.1. Title: Scatter/Contour Plot.]

Data Set used: Fulliris (Fisher 1936 data set with 3 species).

[Figure: Scatter/Contour Plot for the Fulliris data; x-axis: sp-width; points labeled by group (1, 2, 3); legend: Classical Individual, Sequential Classical, Huber, PROP, MVT, MCD.]


-------
References

Alqallaf, F.A., Konis, K.P., Martin, R.D., and Zamar, R.H. (2002). "Scalable Robust
Covariance and Correlation Estimates for Data Mining." In Proceedings of the
Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, ACM, Edmonton.

Barnett, V. and Lewis, T. (1978). Outliers in Statistical Data, New York, J. Wiley and
Sons.

Bernholt, T., and Fischer, P. (2004). "The Complexity of Computing the MCD-
Estimator," Theoretical Computer Science, 326, 383-398.

Devlin, S.J., Gnanadesikan, R., and Kettenring, J.R. (1975). "Robust Estimation and
Outlier Detection with Correlation Coefficients," Biometrika, 62, 531-545.

Devlin, S.J., Gnanadesikan, R., and Kettenring, J.R. (1981). "Robust Estimation of
Dispersion Matrices and Principal Components," Journal of the American Statistical
Association, 76, 354-362.

Dixon, W.J. (1953). "Processing Data for Outliers." Biometrics 9: 74-89.

Fung, W. (1993). "Unmasking Outliers and Leverage Points: a Confirmation," Journal of
the American Statistical Association, 88, 515-519.

Ferguson, T.S. (1961a). "On the Rejection of Outliers," Proc., Fourth Berkeley
Symposium. Math. Statist., 1, 253-287.

Ferguson, T.S. (1961b). "Rules of Rejection of Outliers," Rev., Inst. Int. de Statist, 3,
29-43.

Garner, F.C., Stapanian, M.A., and Fitzgerald, K.E. (1991a). "Finding Causes of
Outliers in Multivariate Environmental Data," Journal of Chemometrics, 5, 241-248.

Garner, F.C., Stapanian, M.A., Fitzgerald, K.E., Flatman, G.T., and Englund, E.J.
(1991b). "Properties of Two Multivariate Outlier Tests," Communications in
Statistics, 20, 667-687.

Grubbs, F.E. (1950). "Sample Criteria for Testing Outlying Observations," Ann. Math.
Statist., 21, 27-58.

Gnanadesikan, R., and Kettenring, J.R. (1972). "Robust Estimates, Residuals, and
Outlier Detection with Multi-response Data," Biometrics, 28, 81-124.



-------
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., and Stahel, W.A. (1986). Robust
Statistics: The Approach Based on Influence Functions, John Wiley & Sons, New
York.

Hawkins, D.M., Bradu, D., and Kass, G.V. (1984). "Location of Several Outliers in
Multiple Regression Data Using Elemental Sets," Technometrics, 26, 197-208.

Huber, P.J. (1981). Robust Statistics, John Wiley and Sons, NY.

Hubert, M., Rousseeuw, P.J., and Vanden Branden, K. (2005). "ROBPCA: A New
Approach to Robust Principal Component Analysis," Technometrics, 47, 64-79.

Kafadar, K. (1982). "A Biweight Approach to the One-Sample Problem," Journal of the
American Statistical Association, 77, 416-424.

Lax, D.A. (1985). "Robust Estimators of Scale: Finite Sample Performance in Long-
Tailed Symmetric Distributions," Journal of the American Statistical Association, 80,
736-741.

Mardia, K.V. (1970). "Measures of Multivariate Skewness and Kurtosis in Testing
Normality and Robustness Studies," Biometrika, 57, 519-530.

Mardia, K.V. (1974). "Applications of Some Measures of Multivariate Skewness and
Kurtosis in Testing Normality and Robustness Studies," Sankhya, 36, 115-128.

Maronna, R.A., and Zamar, R.H. (2002). "Robust Estimates of Location and Dispersion
for High-Dimensional Data sets," Technometrics, 44, 307-317.

Mehrotra, D.V. (1995). "Robust Elementwise Estimation of a Dispersion Matrix,"
Biometrics, 51, 1344-1351.

Mosteller, F., and Tukey, J.W. (1977). Data Analysis and Regression, Addison-Wesley
Reading, MA.

Pena, D., and Prieto, F.J. (2001). "Multivariate Outlier Detection and Robust
Covariance Matrix Estimation," Technometrics, 286-299.

ProUCL 4.00.04. (2009). "ProUCL Version 4.00.04 User Guide." The software
ProUCL 4.00.04 can be downloaded from the web site at:
http://www.epa.gov/esd/tsc/software.htm.

ProUCL 4.00.04. (2009). "ProUCL Version 4.00.04 Technical Guide." The software
ProUCL 4.00.04 can be downloaded from the web site at:
http://www.epa.gov/esd/tsc/software.htm.



-------
Rocke, D.M., and Woodruff, D.L. (1996). "Identification of Outliers in Multivariate
Data," Journal of the American Statistical Association, 91, 1047-1061.

Rocke, D.M., and Woodruff, D.L. (1997). "Robust Estimation of Multivariate Location
and Shape," Journal of Statistical Planning and Inference, 57, pp. 245-255.

Rousseeuw, P.J., and van Zomeren, B.C. (1990). "Unmasking Multivariate Outliers and
Leverage Points," Journal of the American Statistical Association, 85, 633-651.

Rosner, B. (1975). "On Detection of Many Outliers," Technometrics, 17, 221-227.

Rousseeuw, P.J., and Leroy, A.M. (1987). Robust Regression and Outlier Detection,
John Wiley and Sons, NY.

Rousseeuw, P.J., and Van Driessen, K. (1999). "A Fast Algorithm for the Minimum
Covariance Determinant Estimator," Technometrics, 41, 212-223.

Schwager, S.J., and Margolin, B.H. (1982). "Detection of Multivariate Normal Outliers,"
Ann. Statist., 10, 943-954.

Scout. (2002). A Data Analysis Program, Technology Support Project, USEPA, NERL-
LV, Las Vegas, Nevada.

Scout. (2008). Technical Guide under preparation.

Singh, A. (1993). Omnibus Robust Procedures for Assessment of Multivariate Normality
and Detection of Multivariate Outliers, In Multivariate Environmental Statistics, Patil
G.P. and Rao, C.R., Editors, pp. 445-488, Elsevier Science Publishers.

Singh, A., and Nocerino, J.M. (1995). Robust Procedures for the Identification of
Multiple Outliers, Handbook of Environmental Chemistry, Statistical Methods, Vol.
2.G, pp. 229-277, Springer Verlag, Germany.

Sinha, B.K. (1984). "Detection of Multivariate Outliers in Elliptically Symmetric
Distributions," Ann. Statist., 12, 1558-1565.

Stefanski, L.A., and Boos, D.D. (2002). "The Calculus of M-estimators," The American
Statistician, 56, 29-38.

Tukey, J.W. (1977). Exploratory Data Analysis, Addison-Wesley Publishing Company,
Reading, MA.

Wilks, S.S. (1963). "Multivariate Statistical Outliers," Sankhya, 25, 407-426.



-------
Chapter 8
QA/QC

Issues related to the reliability of data are often grouped under the general heading of
"quality assurance and quality control" (QA/QC), a description that captures the idea that
data quality can not only be documented but can also be controlled through appropriate
practices and procedures. Even with the most stringent and costly controls, data will
never be perfect: errors are inevitable as samples are collected, prepared and analyzed.

One goal of QA/QC is to quantify these errors so that subsequent statistical analysis and
interpretation can take them into account. A second goal is to monitor the errors so that
spurious or biased data can be recognized and, if possible, corrected. A third goal is to
provide information that can be used to improve sampling practices and analytical
procedures so that the impact of errors can be minimized. Scout offers QA/QC methods
for data with and without non-detects. Kaplan-Meier (KM) estimates of mean and
standard deviation are used for data with non-detects.

Scout also allows the user to test the behavior of "Site/Test" data against
"Background/Training" data. In this module the statistics and estimates are computed
using the "Background/Training" data and then graphs and the charts are produced for
the whole data set which is inclusive of "Background/Training" data and "Site/Test" data.
The important requirement for this module is that there should be a column which
indicates the various groups which can be considered as the "Site/Test" data.
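The "Background/Training" versus "Site/Test" idea can be sketched as follows. This is a simplified illustration, not Scout's procedure: the function names, the group labels "bg" and "site," the data values, and the use of classical mean plus or minus 3 standard deviations as the limits are all assumptions made for the example. Scout offers several interval types and robust estimators in place of this simple rule.

```python
import statistics


def limits_from_background(records, site_group, factor=3.0):
    """Compute limits from every record that is NOT in the site/test group."""
    background = [value for group, value in records if group != site_group]
    mean = statistics.mean(background)
    sd = statistics.stdev(background)
    return mean - factor * sd, mean + factor * sd


def flag_site_values(records, site_group, factor=3.0):
    """Return the site/test records that fall outside the background limits."""
    lower, upper = limits_from_background(records, site_group, factor)
    return [
        (group, value)
        for group, value in records
        if group == site_group and not (lower <= value <= upper)
    ]


# Hypothetical (group, value) records; the group column marks the site data.
records = [
    ("bg", 10.0), ("bg", 11.0), ("bg", 9.0), ("bg", 10.5), ("bg", 9.5),
    ("site", 10.2), ("site", 25.0),
]

print(limits_from_background(records, "site"))
print(flag_site_values(records, "site"))
```

The key point is that the site observations never influence the estimates: the limits are fixed from the background data first and only then applied to the whole data set.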

8.1 Univariate QA/QC

Scout offers several univariate procedures to achieve the goals specified above. They
include Q-Q Plots with Limits, Interval Graphs and Control Charts. Classical and robust
methods have been incorporated in this module.

8.1.1 No Non-detects

8.1.1.1 Q-Q Plots with Limits

1. Click on QA/QC > Univariate > No NDs > Q-Q Plots with Limits.

2. The "Select Variables" screen (Section 3.4) will appear.



-------
• Click on the "Options" button for the options window.

[Screenshot: QA/QC univariate Q-Q plot options window. Select Methods: Classical, PROP, Huber, MVT. Select Quantiles: Raw Data, Standardized Data. Influence Alpha: 0.025. MDs Distribution: Beta, Chisquared. # Iterations: 20. Initial Estimate: Mean/Stdv, Median/1.48MAD. Critical Alpha for Limits: 0.010, 0.025, 0.050, 0.100, 0.150, 0.200, 0.250. Display Lines: Regression Line, at Critical Value of Individual MD, at Critical Value of Max MD. Use Default Title. Buttons: OK, Cancel.]



o Specify the method for computing the quantiles in "Select
Methods." The default method is "PROP."

o The robust methods need various input parameters like
"Influence Alpha" or "Trimming Percentage," "Initial
Estimates," "MDs Distribution," and "# Iterations."

o Specify the "Critical Alpha for Limits" for identifying the
outliers. Default is "0.05."

o Specify the quantiles for the X-axis using the "Select
Quantiles" option and options for displaying the regression
lines.

o Click "OK" to continue or "Cancel" to cancel the options.



-------
• Click on "OK" to continue or "Cancel" to cancel the Q-Q Plots with
limits.

Output example: The data set "Bradu.xls" was used for the Q-Q Plot. The options used
were the default options.

[Figure: PROP Q-Q Plot of y With Limits. N = 75; Influence Alpha = 0.025; Mean = -0.066154; Sd = 0.5560117; Slope = 2.625; Intercept = 1.279; Correlation, R = 0.741. Upper Simultaneous Limit = 1.7298; Upper Individual Limit = 1.0111; Lower Individual Limit = -1.1434; Lower Simultaneous Limit = -1.8621. x-axis: Theoretical Quantiles (Standardized Data).]



Note: The observations outside the simultaneous limits are considered as outliers.
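The construction of a Q-Q plot with limits can be sketched as follows. This is an illustration under stated assumptions, not Scout's algorithm: the function names are hypothetical, the plotting position (i - 0.5)/n is one common convention and may differ from the one Scout uses, and the sample values are made up. The limits are the simultaneous limits displayed in the Bradu output above.

```python
from statistics import NormalDist


def qq_points(values):
    """Pair each ordered observation with a standard normal quantile."""
    n = len(values)
    ordered = sorted(values)
    normal = NormalDist()
    # Plotting position (i - 0.5) / n is an assumption; other conventions exist.
    return [
        (normal.inv_cdf((i - 0.5) / n), y)
        for i, y in enumerate(ordered, start=1)
    ]


def outside_limits(values, lower, upper):
    """Return the observations that fall outside the given limits."""
    return [y for y in values if y < lower or y > upper]


# Simultaneous limits as displayed in the PROP Q-Q plot of y (Bradu data).
LOWER_SIMULTANEOUS = -1.8621
UPPER_SIMULTANEOUS = 1.7298

# Hypothetical observations, for illustration only.
sample = [0.1, -2.5, 9.0, 1.0, -0.4]

for quantile, y in qq_points(sample):
    print(f"{quantile:8.4f}  {y:6.2f}")
print("outliers:", outside_limits(sample, LOWER_SIMULTANEOUS, UPPER_SIMULTANEOUS))
```

The horizontal limit lines on the plot correspond to the comparison made in outside_limits; the theoretical quantiles only determine where each point sits along the X-axis.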



-------
8.1.1.2 Interval Graphs

8.1.1.2.1 Compare Intervals

1. Click on QA/QC > Univariate > No NDs > Interval Graphs > Prediction,
Tolerance, Confidence, Simultaneous or Individual > Compare Intervals.

[Screenshot: Scout 2008 QA/QC menu showing Univariate > No NDs > Interval Graphs, with the Prediction, Tolerance, Confidence, Simultaneous, and Individual submenus; the Prediction submenu lists "Compare Prediction Intervals" and "Prediction Interval Index Plot."]

2. The "Select Variables" screen (Section 3.4) will appear.

o Click on the "Options" button for the options window.

[Screenshot: Prediction method comparison QA/QC options window. Select Methods: Classical, PROP, Huber, Tukey Biweight, Lax/Kafadar Biweight, MVT. Confidence Level: 0.95. Title for Graph: Prediction Interval Comparison. PROP Method Options: # Iterations (10), Initial Estimate (Mean/Stdv or Median/1.48MAD), Influence Alpha (0.025), MDs Distribution (Beta or Chisquared). Huber Method Options: # Iterations (10), Initial Estimate, Influence Alpha (0.025). Tukey Biweight Method Options: # Iterations (maximum 25), Initial Estimate, Tuning Constants (Location 4, Scale 6). Lax/Kafadar Biweight Method Options: # Iterations (maximum 25), Initial Estimate, Tuning Constants (Location 4, Scale 6). MVT Method Options: # Iterations, Initial Estimate, Trimming %, MDs Distribution (Beta or Chisquared). Buttons: OK, Cancel.]

o Select the methods to compare in "Select Methods" box. By
default, all methods are selected.

o Specify the various input parameters for the selected methods.

o Click "OK" to continue or "Cancel" to cancel the options.



-------
• Click on "OK" to continue or "Cancel" to cancel the intervals
comparison.

Output example: The data set "Bradu.xls" was used for the Interval Comparison. The
options used were the default options.

Output for the Prediction Interval Comparison.

[Figure: Prediction Interval Comparison; intervals for y, with the median and mean marked for each method; n = 75.]


-------
Output for the Tolerance Interval Comparison.

[Figure: Tolerance Interval Comparison; intervals for y.]

Output for the Confidence Interval Comparison.

[Figure: Confidence Interval Comparison; intervals for y.]


-------
Output for the Simultaneous Interval Comparison.

[Figure: Simultaneous Interval Comparison; intervals for y by method (Classical, PROP, Huber, Tukey, Lax/Kafadar, MVT), with the median and mean marked for each method; n = 75.]

Output for the Individual Interval Comparison.


-------
8.1.1.2.2 Intervals Index Plots

1. Click on QA/QC > Univariate > No NDs > Interval Graphs > Prediction,
Tolerance, Confidence, Simultaneous or Individual > Intervals Index Plots.

[Screenshot: Scout 2008 QA/QC menu path to the Intervals Index Plots options.]

2. The "Select Variables" screen (Section 3.4) will appear.

o Click on the "Options" button for the options window.

[Screenshot: QA/QC Tolerance Interval Index Plot options window. Select Method: Classical, PROP, Huber, Tukey Biweight, Lax Kafadar Biweight, MVT. Confidence Level: 0.95. Coverage: 0.9. Influence Alpha: 0.025. Initial Estimate: Mean/Stdv, Median/1.48MAD. # Iterations: 25 (Maximum). MDs Distribution: Beta, Chisquared. Use Default Title. Buttons: OK, Cancel.]

o Select one of the methods for the interval in "Select Methods"
box. By default, "PROP" is selected.

o Specify the various input parameters for the selected method.

o Click "OK" to continue or "Cancel" to cancel the options.



-------
• Click on "OK" to continue or "Cancel" to cancel the intervals
comparison.

Output example: The data set "Bradu.xls" was used for the Interval Index Plots. The
options used were the default options.

Output for the Prediction Interval Index Plot.

[Figure: PROP Prediction Interval (Next 1) for y. 95% Prediction Limits: Lower = -1.185427, Upper = 1.0531190; Number Obs = 75; PROP Mean = -0.066154; SD = 0.5560117. x-axis: Index of Observations.]


-------
Output for the Tolerance Interval Index Plot.



[Figure: PROP Tolerance Interval for y. 95% Tolerance Limits with 90% Coverage: Lower = -1.146212, Upper = 1.0139043; Number Obs = 75; PROP Mean = -0.066154; SD = 0.5560117. x-axis: Index of Observations.]



Output for the Confidence Interval Index Plot.



[Figure: PROP Confidence Interval for y. 95% Confidence Limits: Lower = -0.203927, Upper = 0.0716191; Number Obs = 75; PROP Mean = -0.066154; SD = 0.5560117. x-axis: Index of Observations.]


-------
Output for the Simultaneous Interval Index Plot.

[Figure: PROP Simultaneous Interval for y. 95% Simultaneous Limits: Lower = -1.862105, Upper = 1.7297375; Number Obs = 75; PROP Mean = -0.066154; SD = 0.5560117.]

Output for the Individual Interval Index Plot.

[Figure: PROP Individual Interval for y, showing the 95% individual limits.]
-------
8.1.1.3 Control Charts
8.1.1.3.1 Using All Data

1. Click on QA/QC > Univariate > No NDs > Control Charts > Using All
Data.

[Screenshot: Scout 2008 QA/QC menu showing Univariate > No NDs > Control Charts, with the entries "Using All Data" and "Using Training/Background."]

2. The "Select Variables" screen (Section 3.4) will appear.

o Click on the "Options" button for the options window.

[Screenshot: Univariate Control Chart Options window. Select Method: Classical, PROP, Huber, Tukey Biweight, Lax Kafadar Biweight, MVT. Confidence Level: 0.95. Coverage: 0.9. # Iterations: 25 (Maximum). Prediction: Next K. Influence Alpha: 0.025. Select Intervals: Prediction Intervals, Tolerance Intervals, Simultaneous Intervals, Individual Intervals, Min/Max, Sigma Limits. Initial Estimate: Mean/Stdv, Median/1.48MAD. MDs Distribution: Beta, Chisquared. Use Default Title. Sigma Factor: 1.5. Buttons: OK, Cancel.]



o Select one of the methods for the interval in "Select Methods"
box. By default, "PROP" is selected.

o Specify the various intervals in the "Select Intervals" box.



-------
o Specify the various options for the selected method and
intervals.

o Click "OK" to continue or "Cancel" to cancel the options.
• Click on "OK" to continue or "Cancel" to cancel the control charts.

Output example: The data set "Stackloss.xls" was used for the Control Charts. The
options used were the default options.



[Figure: PROP Control Limits for Stack-Loss. Number Obs = 21; PROP Mean = 13.354037; SD = 4.2301554. 95% Simultaneous Limits: Lower = 2.2449926, Upper = 24.463082. 95% Prediction Limits: Lower = 4.1380623, Upper = 22.570012. Sigma Limits (Sigma Factor = 1.5): Lower Limit = 7.0088041, Upper Limit = 19.699270. Minimum = 7.0000000; Maximum = 42.000000. x-axis: Index of Observations.]
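The sigma limits shown on this control chart are consistent with the usual rule, mean plus or minus the sigma factor times the standard deviation. The short check below uses the PROP mean, SD, and sigma factor displayed on the chart; the function name sigma_limits is hypothetical, and the formula is inferred from the displayed numbers rather than taken from Scout's source code.

```python
def sigma_limits(mean, sd, factor):
    """Return (lower, upper) limits at mean -/+ factor * sd."""
    return mean - factor * sd, mean + factor * sd


# Values displayed on the PROP control chart for Stack-Loss.
PROP_MEAN = 13.354037
PROP_SD = 4.2301554
SIGMA_FACTOR = 1.5

lower, upper = sigma_limits(PROP_MEAN, PROP_SD, SIGMA_FACTOR)
print(f"Lower Limit = {lower:.7f}")
print(f"Upper Limit = {upper:.6f}")
```

Up to rounding of the displayed mean and SD, the result agrees with the Lower Limit (7.0088041) and Upper Limit (19.699270) shown on the chart.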

8.1.1.3.2 Using Training/Background

1. Click on QA/QC > Univariate > No NDs > Control Charts >
Using Training/Background.

[Screenshot: Scout 2008 QA/QC menu showing Univariate > No NDs > Control Charts, with the entries "Using All Data" and "Using Training/Background."]

2. The "Select Variables" screen will appear.



-------
[Screenshot: "Select Specific Site and Variables" window. The Variables list (Name, ID, Count) contains count (0, 150), sp-length (1, 150), sp-width (2, 150), pt-length (3, 150), and pt-width (4, 150), next to a Selected list. The "Select Group Column and Input Test/Site" box contains the "Select Group ID Column" drop-down and the "Input Specific Test/Site from Group ID" field. Buttons: Options, Cancel.]

o Select the variable of interest.

o Select the group variable using the "Select Group ID Column" drop-
down bar.

o Input the group name/number of the variable which is considered as the
test set in the "Input Specific Test/Site from Group ID" box.

o Click on the "Options" button for the options window.



-------
[Screenshot: Univariate Control Chart Options window. Select Method: Classical, PROP, Huber, Tukey Biweight, Lax Kafadar Biweight, MVT. Select Intervals: Prediction Intervals, Tolerance Intervals, Simultaneous Intervals, Individual Intervals, Min/Max, Sigma Limits. Use Default Title. Confidence Level: 0.95. Coverage: 0.9. # Iterations: 25 (Maximum). Prediction: Next K. Influence Alpha: 0.025. Initial Estimate: Mean/Stdv, Median/1.48MAD. MDs Distribution: Beta, Chisquared. Sigma Factor. Buttons: OK, Cancel.]

o Select one of the methods for the interval in "Select Methods"
box. By default, "PROP" is selected.

o Specify the various intervals in the "Select Intervals" box.

o Specify the various options for the selected method and
intervals.

o Click "OK" to continue or "Cancel" to cancel the options.

• Click on "OK" to continue or "Cancel" to cancel the control charts.

Output example: The data set "Fulliris.xls" was used for the Control Charts. The
options used were the default options.

-------
[Figure: "Classical Control Limits for sp-length Using Training Set" (x-axis: Index of Observations). Statistics shown on the chart:]

Test Set Group ID = 3
Obs in Test Set = 50
Obs in Training Set = 100
Classical Stats using Training Set
Mean = 5.4710000
SD = 0.6416383
95% Simultaneous Limits: Lower = 3.299, Upper = 7.643
95% Prediction Limits: Lower = 4.191, Upper = 6.751
95% Tolerance Limits with 90% Coverage: Lower = 4.269, Upper = 6.673
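
The prediction limits printed on this chart can be approximated from the training-set summary statistics alone. The sketch below assumes the textbook normal-theory prediction interval for one future observation; this section does not state the formula Scout uses, so the function name, the formula and the tabulated t quantile are all assumptions made for illustration.

```python
import math


def classical_prediction_limits(mean, sd, n, t_crit):
    """Two-sided prediction limits for one future observation.

    Assumed form (textbook, not confirmed as Scout's own):
        mean +/- t_crit * sd * sqrt(1 + 1/n)
    where t_crit is the upper (1 - alpha/2) Student's t quantile
    with n - 1 degrees of freedom.
    """
    half_width = t_crit * sd * math.sqrt(1.0 + 1.0 / n)
    return mean - half_width, mean + half_width


# Training-set statistics printed on the chart (groups 1 and 2, n = 100).
# 1.9842 is the tabulated t quantile for a 95% two-sided interval with
# 99 degrees of freedom; it is looked up here, not computed.
T_CRIT_95_DF99 = 1.9842
lower, upper = classical_prediction_limits(5.471, 0.6416383, 100, T_CRIT_95_DF99)
print(f"lower={lower:.3f} upper={upper:.3f}")
```

Under these assumptions the result agrees with the chart's 95% prediction limits (4.191 and 6.751) to within about 0.001. Observations from the test group that fall outside such limits are the ones a control chart of this kind is meant to expose.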

Note: The observations in "dark blue" and "bigger point marks" are from group 3. The intervals are
calculated using only the observations from groups 1 and 2, which were used as the
"Training/Background" set for this data set.

8.1.2 With Non-detects
8.1.2.1 Interval Graphs

1. Click on QA/QC > Univariate > With NDs > Interval Graphs
> Prediction, Tolerance, Confidence, Simultaneous or Individual.

[Screenshot: QA/QC > Univariate > With NDs > Interval Graphs submenu, listing "Prediction Interval Index Plot", "Tolerance Interval Index Plot", "Confidence Interval Index Plot", "Simultaneous Interval Index Plots" and "Individual Interval Index Plots".]

2. The "Select Variables" screen (Section 3.4) will appear.

• Click on the "Options" button for the options window.

-------
[Screenshot: "Options QA/QC Prediction Interval Index Plot" dialog. "Graphs with NDs replaced by": Detection Limit (No Change) (selected), Normal ROS Estimates, Gamma ROS Estimates, Lognormal ROS Estimates, One Half (1/2) Detection Limit, Zero. Other fields: "Confidence Level" (0.95), "Future K", "Use Default Title".]

o In the "Graphs with NDs replaced by" box, select the value used
to replace the non-detects on the graph. The default is
"Detection Limit."

o Specify the various input parameters for the selected method.

o Click "OK" to continue or "Cancel" to cancel the options.

• Click on "OK" to continue or "Cancel" to cancel the intervals
comparison.

Output example: The data set "FullIris.xls" was used for the Interval Index Plots. The
options used were the default options.

-------
Output for the Prediction Interval Index Plot with NDs.

[Figure: "Classical Prediction Interval (Next 1) for sp-length" (x-axis: Index of Observations). Statistics shown on the plot:]

Kaplan Meier Stats
Number Obs = 150
Number NDs = 6
NDs = 1/2 Detection Limit in Red
Mean = 5.845
SD = 0.822
95% Prediction Limits: Lower = 4.216, Upper = 7.474

Note: The non-detect observations are replaced by "One Half (1/2) Detection Limit" indicated by the red
points.
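
The simple substitution choices in the "Graphs with NDs replaced by" box can be pictured as follows. This is a minimal sketch, not Scout's code: the function name, argument layout and method labels are hypothetical, and the three ROS options are left out because this section does not describe how they are computed.

```python
def replace_nondetects(values, detected, method="detection_limit"):
    """Return plotting values with non-detects substituted.

    values   : reported results; for a non-detect, its detection limit
    detected : parallel list of booleans (True = detected result)
    method   : "detection_limit" (no change), "half_detection_limit"
               or "zero" (hypothetical labels for three dialog options)
    """
    factors = {
        "detection_limit": 1.0,
        "half_detection_limit": 0.5,
        "zero": 0.0,
    }
    if method not in factors:
        raise ValueError(f"unsupported method: {method}")
    if len(values) != len(detected):
        raise ValueError("values and detected must have the same length")
    factor = factors[method]
    # Detected results are never altered; only non-detects are scaled.
    return [v if d else v * factor for v, d in zip(values, detected)]


# Hypothetical example: the second result is a non-detect reported at
# a detection limit of 4.0.
plotted = replace_nondetects([5.1, 4.0, 4.7], [True, False, True],
                             method="half_detection_limit")
print(plotted)
```

Only the plotted positions of the non-detects change with this choice; the detected results are passed through untouched.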



-------
8.1.2.2 Control Charts
8.1.2.2.1 Using All Data

1. Click on QA/QC > Univariate > With NDs > Control Charts > Using All
Data.

[Screenshot: QA/QC > Univariate > With NDs > Control Charts submenu, showing "Using All Data" and "Using Training/Background".]

2. The "Select Variables" screen (Section 3.4) will appear.

• Click on the "Options" button for the options window.

[Screenshot: "QA/QC NDs Univariate Control Chart Options" dialog. "Select KM Intervals": Prediction Intervals, Tolerance Intervals, Simultaneous Intervals, Individual Intervals, Min/Max, Sigma Limits. Other fields: "Confidence Level" (0.95), "Future K" (1), "Coverage" (0.9), "Graphics Alpha", "Use Default Title". "Graphs with NDs replaced by": Detection Limit (No Change) (selected), Normal ROS Estimates, Gamma ROS Estimates, Lognormal ROS Estimates, One Half (1/2) Detection Limit, Zero.]

o Select the intervals to be displayed on the control chart from
the "Select KM Intervals" box.

o Specify the various parameters for the selected intervals.

o In the "Graphs with NDs replaced by" box, select the value used
to replace the non-detects on the graph. The default is
"Detection Limit."

o Click "OK" to continue or "Cancel" to cancel the options.

• Click on "OK" to continue or "Cancel" to cancel the control charts.



-------
Output example: The data set "FullIRIS.xls" was used for the Control Charts. The
options used were the default options.

[Figure: "Kaplan Meier Control Limits for sp-length" (x-axis: Index of Observations). Statistics shown on the chart:]

Number Obs = 150
Number NDs = 6
NDs = Detection Limit in Red
Kaplan Meier Stats
Mean = 5.8453333
SD = 0.8216720
95% Simultaneous Limits: Lower = 2.956, Upper = 8.735
95% Prediction Limits: Lower = 4.216, Upper = 7.474
95% Tolerance Limits with 90% Coverage: Lower = 4.345, Upper = 7.345

Note: The non-detect observations are replaced by "Detection Limit," indicated by the red points.
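
The tolerance limits reported in this output can be approximated from the Kaplan Meier mean and SD with a standard normal-theory tolerance factor. The sketch below is an assumption-laden illustration, not Scout's documented method: it uses Howe's approximation for the two-sided factor K and the Wilson-Hilferty approximation for the chi-square quantile, and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_quantile_wh(prob, df):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def tolerance_factor_howe(n, coverage, confidence):
    """Howe's approximate two-sided normal tolerance factor K."""
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    # Lower-tail chi-square quantile with n - 1 degrees of freedom.
    chi2 = chi2_quantile_wh(1.0 - confidence, n - 1)
    return z * math.sqrt((n - 1) * (1.0 + 1.0 / n) / chi2)


def tolerance_limits(mean, sd, n, coverage=0.90, confidence=0.95):
    """Return (lower, upper) = mean -/+ K * sd."""
    k = tolerance_factor_howe(n, coverage, confidence)
    return mean - k * sd, mean + k * sd


# Kaplan Meier statistics reported in the output example (n = 150).
lower, upper = tolerance_limits(5.8453333, 0.8216720, 150)
print(f"lower={lower:.3f} upper={upper:.3f}")
```

With these approximations the limits land within a few thousandths of the 4.345 and 7.345 shown for 95% confidence and 90% coverage, which suggests, but does not prove, that a factor of this type is applied to the Kaplan Meier estimates.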

8.1.2.2.2 Using Training/Background

1. Click on QA/QC > Univariate > With NDs > Control Charts > Using
Training/Background.



[Screenshot: QA/QC > Univariate > With NDs > Control Charts submenu in Scout 2008.]



2. The "Select Variables" screen will appear.

-------
[Screenshot: "Select Specific Site and Variables" dialog. The "Variables" list shows count, sp-length, sp-width, pt-length and pt-width (150 observations each). The dialog also contains the "Selected" list, the "Options" button, the "Select Group ID Column" drop-down, the "Input Specific Test/Site from Group ID" box, and the "OK" and "Cancel" buttons.]



• Select the variable of interest.

• Select the group variable using the "Select Group ID Column" drop-down
bar.

• In the "Input Specific Test/Site from Group ID" box, enter the name or
number of the group that is to be treated as the test set.

• Click on the "Options" button for the options window.

[Screenshot: "QA/QC NDs Univariate Control Chart Options" dialog. "Select KM Intervals": Prediction Intervals, Tolerance Intervals, Simultaneous Intervals, Individual Intervals, Min/Max, Sigma Limits. Other fields: "Confidence Level" (0.95), "Future K", "Coverage" (0.9), "Graphics Alpha", "Use Default Title". "Graphs with NDs replaced by": Detection Limit (No Change), Normal ROS Estimates, Gamma ROS Estimates, Lognormal ROS Estimates, One Half (1/2) Detection Limit, Zero.]


-------
o Select the intervals to be displayed on the control chart from
the "Select KM Intervals" box.

o Specify the various parameters for the selected intervals.

o In the "Graphs with NDs replaced by" box, select the value used
to replace the non-detects on the graph. The default is
"Detection Limit."

o Click "OK" to continue or "Cancel" to cancel the options.

• Click on "OK" to continue or "Cancel" to cancel the control charts.

Output example: The data set "FullIris.xls" was used for the Control Charts. The
options used were the default options.



[Figure: "Kaplan Meier Control Limits for sp-length Using Training Set" (x-axis: Index of Observations). Statistics shown on the chart:]

Obs in Training Set = 100
NDs in Training Set = 6
NDs = Detection Limit in Red
Kaplan Meier Stats using Training Set
Test Set Group ID = 3
Obs in Test Set = 50
Mean = 5.4740000
SD = 0.6331856
95% Simultaneous Limits: Lower = 3.331, Upper = 7.617
95% Prediction Limits: Lower = 4.211, Upper = 6.737
95% Tolerance Limits with 90% Coverage: Lower = 4.288, Upper = 6.66



Note: The observations in "dark blue" and "bigger points" are from group 3. The intervals are calculated
using only the observations from groups 1 and 2, which were used as the "Training/Background" set for
this data set. The "red points" indicate non-detects at the detection limit.
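
The training/background idea in this example amounts to two steps: split the observations by group ID, then compare the test-group observations with limits derived from the remaining groups. The sketch below illustrates only that bookkeeping; the function names and the small data set are hypothetical, and the limits are simply the 95% prediction limits reported in the output example rather than values computed here.

```python
def split_by_group(values, groups, test_group):
    """Split observations into (training, test) lists by group ID."""
    training = [v for v, g in zip(values, groups) if g != test_group]
    test = [v for v, g in zip(values, groups) if g == test_group]
    return training, test


def flag_outside(values, lower, upper):
    """Return the values that fall outside the interval [lower, upper]."""
    return [v for v in values if v < lower or v > upper]


# Hypothetical mini data set: a measurement column and a group ID column.
values = [5.1, 4.9, 6.4, 5.5, 7.7, 6.3, 7.9]
groups = [1, 1, 2, 2, 3, 3, 3]

# Group 3 plays the role of the test set; groups 1 and 2 are the
# training/background set.
training, test = split_by_group(values, groups, test_group=3)

# 95% prediction limits as reported for the training set in the output example.
outside = flag_outside(test, lower=4.211, upper=6.737)
print(outside)
```

Keeping the test group out of the limit calculation is the point of the design: unusual test observations cannot inflate the limits they are being judged against.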



-------
8.2 Multivariate QA/QC

Several classical and robust multivariate procedures are available in the QA/QC module
of Scout. For data sets with non-detects, the multivariate module uses the Kaplan Meier
estimates. The multivariate QA/QC module includes Q-Q plots of Mahalanobis distances
(MDs) with limits, MDs control charts, and prediction and tolerance ellipsoids. The
robust methods included in this module are explained in Chapter 7.

8.2.1 No Non-detects

8.2.1.1 MDs Q-Q Plots with Limits

1. Click on QA/QC > Multivariate > No NDs > MDs Q-Q Plots with Limits.



-------
2. The "Select Variables" screen (Section 3.4) will appear.

• Click on the "Options" button for the options window.

[Screenshot: "Multivariate Options" dialog. "Select Method": Classical, Sequential Classical, Huber, MVT, PROP (selected), MCD. "Select Initial Estimates": Classical, Sequential Classical, Robust (Median, 1.48MAD), OKG (Maronna Zamar) (selected), KG (Not Orthogonalized), MCD. Other fields: "Critical Alpha" (0.05), "# Iterations" (25), "Influence Alpha" (0.05), "MDs Distribution" (Beta or Chisquared), "Control Limits at:" (Critical Value of Individual MD, Critical Value of Maximum MD), "Use Default Title", "Title for Chart" (Q-Q Plot of MDs with Limits).]

o Specify the method for computing the quantiles in "Select
Methods." The default method is "PROP."

o The robust methods need various input parameters, such as
"Influence Alpha" or "Trimming Percentage," "Initial
Estimates," "MDs Distribution" and "# Iterations."

o Specify the "Critical Alpha for Limits" for identifying the
outliers. The default is "0.05."

o Specify the lines for control limits by using the "Control
Limits at:" option. Both options are unchecked by default.

o Click "OK" to continue or "Cancel" to cancel the options.

• Click on "OK" to continue or "Cancel" to cancel the MDs Q-Q Plots with
limits.



-------
Output example (Using All Data): The data set "Stackloss.xls" was used for the Q-Q
Plot. The options used were the default options.

[Figure: "PROP Q-Q Plot of MDs with Limits" (y-axis: Squared MDs; x-axis: Scaled Beta Quantiles). The legend reports the slope, the intercept and the critical correlation, kurtosis and skewness values at the 0.05 level; the maximum (largest MD) and individual MD limits are drawn as horizontal lines.]

Note: The observations above the maximum limit line are considered as outliers.

Output example (Using Training/Background): The data set "FullIris.xls" was used
for the Q-Q Plot. The options used were the default options.

Note: The observations in "dark blue" and "bigger points" are from group 3. The estimates of the mean
vector and the covariance matrix are calculated using only the observations from groups 1 and 2, which
were used as the "Training/Background" set for this data set.



-------
8.2.1.2 MDs Control Chart

1. Click on QA/QC > Multivariate > No NDs > MDs Control Chart.
-------
o Specify the lines for control limits by using the "Control
Limits at:" option. Both options are unchecked by default.

o Click "OK" to continue or "Cancel" to cancel the options.
• Click on "OK" to continue or "Cancel" to cancel the MDs Control Charts.

Output example (Using All Data): The data set "Stackloss.xls" was used for the MDs
Control Charts. The options used were the default options.





[Figure: "PROP Multivariate Control Chart" (y-axis: Squared MDs; x-axis: Index of Observations), with the 95% Maximum (Largest MD) Limit at 11.89 and the 95% Warning (Individual MD) Limit drawn as horizontal lines.]

Note: The observations above the maximum limit line are considered as outliers.
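
The quantity plotted on an MDs control chart is the squared Mahalanobis distance of each observation from the center of the data. The sketch below shows the classical version for two variables only, using the sample mean vector and sample covariance matrix; it does not reproduce the PROP weighting or Scout's limit calculation, and the function names, data and limit value are hypothetical.

```python
def squared_mahalanobis_2d(points):
    """Classical squared Mahalanobis distances for bivariate data.

    Uses the sample mean vector and the sample covariance matrix
    (divisor n - 1). No robust weighting is applied.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # (dx, dy) * inverse(S) * (dx, dy)^T written out for the 2x2 case.
        distances.append(
            (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        )
    return distances


def above_limit(distances, limit):
    """1-based indices of observations whose squared MD exceeds the limit."""
    return [i for i, d in enumerate(distances, start=1) if d > limit]


# Hypothetical bivariate observations; the last one lies away from the rest.
points = [(5.1, 3.5), (4.9, 3.0), (4.7, 3.2), (4.6, 3.1), (5.0, 3.6), (7.9, 2.0)]
d2 = squared_mahalanobis_2d(points)
flagged = above_limit(d2, limit=4.0)  # 4.0 is an arbitrary illustrative limit
```

A useful sanity check on any implementation of this kind is that the classical squared distances sum to (n - 1) times the number of variables.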



-------
Output example (Using Training/Background): The data set "FullIris.xls" was used
for the MDs Control Charts. The options used were the default options.





[Figure: "PROP Multivariate Control Chart Using Training Set" (x-axis: Index of Observations), with the 95% Maximum (Largest MD) Limit at 18.4338 and the 95% Warning (Individual MD) Limit drawn as horizontal lines. Statistics shown on the chart:]

Training Data Statistics
n = 100
p = 4
Initial Estimates: OKG
Influence Alpha: 0.0500
MD Distribution: Beta
Number Iterations: 25

Test Data Statistics
Test Set Group ID = 3
n = 50
p = 4

Note: The observations in "dark blue" and "bigger points" are from group 3. The estimates of the mean
vector and the covariance matrix are calculated using only the observations from groups 1 and 2, which
were used as the "Training/Background" set for this data set.



-------
8.2.1.3 Prediction and Tolerance Ellipsoids

1. Click on QA/QC > Multivariate > No NDs > Prediction and Tolerance
Ellipsoids.

[Screenshot: QA/QC > Multivariate menu, with the entries "MDs Q-Q Plot with Limits" and "MDs Control Chart" visible.]

2. The following "Select Variables" screen will appear.

[Screenshot: "Select Variables to Graph" dialog, with the "Variables" list, the "Select Y Axis Variable", "Select X Axis Variable" and "Select Group Variable" boxes, and the "Options", "OK" and "Cancel" buttons.]

• Click on the "Options" button for the options window.

-------
[Screenshot: Prediction and/or Tolerance Ellipsoids options dialog. "Select Ellipse(s)": Classical, Robust. "Ellipse Grouping": All Data, By Group. "Classical Cutoff for Contours" and "Robust Cutoff for Contours": Critical Alpha (0.05). "Classical Contour Plots" and "Robust Contour Plots": Individual [D0cut], Simultaneous [Max MD], Simultaneous/Individual. "Label Individual Points": Observation Number, By Group Designation. "Select Estimation Method(s)": Sequential Classical, Huber, PROP, MVT, MCD. "Select Initial Estimates": Classical, Robust (Median, 1.48MAD), OKG (Maronna Zamar), KG (Not Orthogonalized), MCD. Other fields: "MDs Distribution" (Beta or Chisquare), "Select Number of Iterations" (10, Max = 50), "Influence Function Alpha" for Huber and/or PROP (0.05), "Use Default Title", "Title for Graph".]

o Specify the required options. These options are discussed in
Section 7.2.3. The user has an option for drawing the ellipsoids
by groups if the observations are from different groups.

o Click "OK" to continue or "Cancel" to cancel the options.

• Click on "OK" to continue or "Cancel" to cancel the Ellipsoids.



-------

Output example (Using All Data): The data set "Fulliris.xls" was used for the
Ellipsoids. The options used are shown in the options window screenshot above. The
ellipses are drawn by group.

[Figure: "Prediction Ellipsoids" (x-axis: sp-width), with ellipses drawn separately for groups 1, 2 and 3.]

Output example (Using Background/Training): The data set "Fulliris.xls" was used
for the Ellipsoids.



-------
Note: The observations in "dark blue" and "bigger points" are from group 3. The estimates of the mean
vector and the covariance matrix are calculated using only the observations from groups 1 and 2, which
were used as the "Training/Background" set for this data set.

8.2.2 With Non-detects

8.2.2.1 MDs Control Charts

1. Click on QA/QC > Multivariate > With NDs > MDs Control Charts.

[Screenshot: QA/QC > Multivariate > With NDs menu, with the entries "Prediction and Tolerance Ellipsoids" and "Using Training/Background" visible.]

2. The "Select Variables" screen (Section 3.4) will appear.

• Click on the "Options" button for the options window.

[Screenshot: "QA/QC NDs Multivariate Control Charts Options" dialog. Fields: "Critical Alpha" (0.05); "Graphs with NDs replaced": Detection Limit (No Change) (selected), Normal ROS Estimates, Gamma ROS Estimates, Lognormal ROS Estimates, One Half (1/2) Detection Limit, Zero; "MDs (KM Estimates) Distribution" (Beta or Chisquared); "Control Limits at": Critical Value of Individual MD, Critical Value of Maximum MD; "Use Default Title".]



o Specify the "Critical Alpha" for identifying the outliers.
Default is "0.05."

o Specify the distribution for the distances using "MDs (KM
Estimates) Distribution" box.

o Specify the lines for control limits by using the "Control
Limits at:" option. Both options are unchecked by default.

o Click "OK" to continue or "Cancel" to cancel the options.



-------
• Click on "OK" to continue or "Cancel" to cancel the MDs Control Charts.

Output example (Using All Data): The data set "FullIris.xls" was used for the control
chart. The options used were the default options.

[Figure: "Kaplan Meier Multivariate Control Chart", with the 95% Maximum (Largest MD) Limit at 19.7317. Statistics shown on the chart:]

n = 150
p = 4
Critical Alpha: 0.0500
MD Distribution: Beta


-------
Output example (Using Training/Background): The data set "FullIris.xls" was used
for the control charts. The options used were the default options.

[Figure: "Kaplan Meier Multivariate Control Chart Using Training Set" (x-axis: Index of Observations), with the 95% Maximum (Largest MD) Limit at 18.4338 and the 95% Warning (Individual MD) Limit drawn as horizontal lines. Statistics shown on the chart:]

Training Data Statistics
n = 100
p = 4
Critical Alpha: 0.0500
MD Distribution: Beta

Test Data Statistics
Test Set Group ID = 3
n = 50
p = 4



Note: The observations in "dark blue" and "bigger points" are from group 3. The estimates of the mean
vector and the covariance matrix are calculated using only the observations from groups 1 and 2, which
were used as the "Training/Background" set for this data set. The non-detect observations are shown as
"red point marks."



-------
8.2.2.2 Prediction and Tolerance Ellipsoids

1. Click on QA/QC > Multivariate > With NDs > Prediction and Tolerance
Ellipsoids.

[Screenshot: QA/QC > Multivariate menu, with the entries "MDs Q-Q Plot with Limits" and "MDs Control Chart" visible.]

2. The following "Select Variables" screen will appear.

[Screenshot: "Select Variables to Graph" dialog, with the "Variables" list, the "Select Y Axis Variable", "Select X Axis Variable" and "Select Group Variable" boxes, and the "Options", "OK" and "Cancel" buttons.]



• Click on the "Options" button for the options window.

[Screenshot: "Options Ellipsoids with Non-Detects" dialog. "Ellipse Grouping": All Data, By Group. "Label Individual Points": Observation Number, By Group Designation. "Kaplan Meier Contour Plots": Individual [D0cut], Simultaneous [Max MD], Simultaneous/Individual. "Simultaneous Contour Cutoff" and "Individual Contour Cutoff": Critical Alpha (0.05). "MDs Distribution": Beta or Chisquare. "Graphs with NDs replaced by": Detection Limit (No Change) (selected), Normal ROS Estimates, Gamma ROS Estimates, Lognormal ROS Estimates, One Half (1/2) Detection Limit, Zero. Also "Use Default Title".]


-------
o Specify the required options. These options include "Kaplan
Meier Contour Plots," "Critical Alpha(s)," "MDs
Distribution" for the contours and the "Graphs with NDs
replaced by" option. The user has an option for drawing the
ellipsoids by groups if the observations are from different
groups.

o Click "OK" to continue or "Cancel" to cancel the options.

• Click on "OK" to continue or "Cancel" to cancel the Ellipsoids.



-------
Output example (Using All Data): The data set "Fulliris.xls" was used for the
Ellipsoids. The options used are shown in the options window screenshot above. The
ellipses are drawn by group.

[Figure: "Kaplan Meier Prediction Ellipsoids" (x-axis: sp-width), with ellipses drawn separately for groups 1, 2 and 3. The legend reports 150 observations, Critical Alpha 0.05, MD Distribution Beta, and NDs = Detection Limit in Red.]


-------
Output example (Using Background/Training): The data set "Fulliris.xls" was used
for the Ellipsoids.

Note: The observations in "dark blue" and "bigger points" are from group 3. The estimates of the mean
vector and the covariance matrix are calculated using only the observations from groups 1 and 2, which
were used as the "Training/Background" set for this data set.



-------