United States
Environmental Protection
Agency

Scout 2008 Version 1.0
User Guide
Part I

[Cover figure: Scout 2008 screenshot showing the navigation panel, the log panel, and a plot titled "PROP PC Scores for Group 1"]

RESEARCH AND DEVELOPMENT


-------
US EPA
Headquarters and Chemical Libraries
EPA West Bldg Room 3340
Mailcode 3404T
1301 Constitution Ave NW
Washington DC 20004
202-566-0556

EPA/600/R-08/038
February 2009
www.epa.gov

Scout 2008 Version 1.0

User Guide

(Second Edition, December 2008)

John Nocerino

U.S. Environmental Protection Agency
Office of Research and Development
National Exposure Research Laboratory
Environmental Sciences Division
Technology Support Center
Characterization and Monitoring Branch
944 E. Harmon Ave.
Las Vegas, NV 89119

Anita Singh, Ph.D.1

Robert Maichle1
Narain Armbya1
Ashok K. Singh, Ph.D.2

1 Lockheed Martin Environmental Services
1050 E. Flamingo Road, Suite N240
Las Vegas, NV 89119

2 Department of Hotel Management
University of Nevada, Las Vegas
Las Vegas, NV 89154


Although this work was reviewed by EPA and approved for publication, it may not necessarily reflect official
Agency policy. Mention of trade names and commercial products does not constitute endorsement or
recommendation for use.

U.S. Environmental Protection Agency
Office of Research and Development
Washington, DC 20460

7663cmb09


-------
Notice

The United States Environmental Protection Agency (EPA) through its Office of
Research and Development (ORD) funded and managed the research described here. It
has been peer reviewed by the EPA and approved for publication. Mention of trade
names and commercial products does not constitute endorsement or recommendation by
the EPA for use.

The Scout 2008 software was developed by Lockheed-Martin under a contract with the
USEPA. Use of any portion of Scout 2008 that does not comply with the Scout 2008
User Guide is not recommended.

Scout 2008 contains embedded licensed software. Any modification of the Scout 2008
source code may violate the embedded licensed software agreements and is expressly
forbidden.

The Scout 2008 software provided by the USEPA was scanned with McAfee VirusScan
and is certified free of viruses.

With respect to the Scout 2008 distributed software and documentation, neither the
USEPA, nor any of their employees, assumes any legal liability or responsibility for the
accuracy, completeness, or usefulness of any information, apparatus, product, or process
disclosed. Furthermore, the Scout 2008 software and documentation are supplied "as-
is" without guarantee or warranty, expressed or implied, including without limitation, any
warranty of merchantability or fitness for a specific purpose.



-------
Executive Summary

The Scout 2008 version 1.00.01 software package provides a wide variety of classical
and robust statistical methods that are not typically available in other commercial
software packages. A major part of Scout deals with classical, robust, and resistant
univariate and multivariate outlier identification, and robust estimation methods that have
been available in the statistical literature over the last three decades. Outliers in a data set
represent those observations which do not follow the pattern displayed by the majority
(bulk) of the data. It should be pointed out that all of the outlier identification methods
are meant to identify outliers in a data set typically representing a single population.
Outlier identification methods are not meant to be used on clustered data sets
representing mixture data sets, especially when more than two clusters may be present in
the data set. On data sets having several clusters, other methods such as cluster analysis
and principal component analysis may be used.

Several robust estimation and outlier identification methods that have been incorporated
into Scout 2008 include: the iterative classical method, the iterative influence function
(e.g., Biweight, Huber, PROP)-based M-estimates method, the multivariate trimming
(MVT) method, the least median-of-squared residuals (LMS) regression method, and the
minimum covariance determinant (MCD) method. Some initial choices for the iterative
estimation of location and scale are also available in Scout 2008, including the
orthogonalized Kettenring and Gnanadesikan (OKG) method; the median, median
absolute deviation (MAD), or interquartile range (IQR)-based methods; and the MCD
method. Scout offers classical and robust methods to estimate: the multivariate location
and scale, classical and robust intervals, classical and robust prediction and tolerance
ellipsoids, multiple linear regression parameters, principal components (PCs), and
discriminant (Fisher, linear, and quadratic) functions (DFs). The discriminant analysis
module of Scout can perform cross validation using several methods, including leave-
one-out (LOO), split samples, M-fold validation, and bootstrap methods. For both
univariate and multivariate data sets, Scout also has a QA/QC module that can be used to
compare a test (e.g., polluted site, new drug) data set with a training (e.g., reference,
background, placebo) data set.

Below detection limit (BDL) observations or non-detect (ND) data are inevitable in many
environmental and chemometrics applications. Scout has several univariate graphical
(e.g., box plots, index plots, multiple quantile-quantile (Q-Q) plots) and inferential
methods that can be used on full uncensored data sets and also on left-censored data sets
with below detection limit (DL) observations. Specifically, Scout can be used to:
compute and graph various interval estimates, perform typical univariate goodness-of-fit
(GOF) tests, and perform single and two-sample hypothesis tests on uncensored data sets
and left-censored data sets with NDs potentially consisting of multiple detection limits.
For univariate data sets with NDs, statistical inference methods (e.g., intervals and
hypothesis testing) available in Scout 2008 include simple substitution methods (0, DL/2,
and DL), regression on order statistics (ROS) methods, and the Kaplan-Meier (KM)
method. For multivariate data sets with ND observations, Scout can compute the mean
vector, covariance matrix, prediction and tolerance ellipsoids, and principal components
using the Kaplan-Meier method. For such data sets, Scout can also
generate Q-Q plots of Mahalanobis distances (MDs) and prediction and tolerance
ellipsoids.

In Scout 2008, emphasis is given to graphical displays of multivariate data sets. Most of
the classical and robust methods in Scout are supplemented with formal multivariate
classical and robust graphical displays, including the quantile-quantile (Q-Q) plots of the
Mahalanobis distances (MDs); control-chart-type index plots of the MDs; distance-
distance (D-D) plots; Q-Q plot and index plot of residuals; residual versus leverage
distance plots; residual versus residual (R-R) and Y versus Y-hat plots; Q-Q plots of PCs;
scatter plots of raw data, PC scores, and DF scores with prediction or tolerance ellipsoids
superimposed on the respective scatter plots. Those graphical displays can be formalized
by drawing appropriate limits at the critical values of the MDs and Max-MD obtained
using the exact scaled beta distribution of the MDs or an approximate chi-square
distribution of the MDs. Some graphical method-comparison tools are also available
in Scout so that one can graphically compare the performance (e.g., in terms of
identifying appropriate outliers and producing best regression fits) of those methods.
Specifically, Scout can be used to display multiple D-D plots and R-R plots, multiple
linear regression fits, and tolerance ellipsoids or prediction ellipsoids for the various
outlier identification methods on the same graph. On these graphs, all observations can be
labeled simultaneously or individually by using a mouse. For grouped data, observations
can also be labeled by group ID; and group assignment of selected observations can be
changed and saved interactively using the computer monitor and mouse.

Scout 2008 also offers GOF test statistics to assess multivariate normality. Several GOF
test statistics, including the multivariate kurtosis, the skewness, and the correlation
coefficient between the ordered MDs and the scaled beta (or chi-square) distribution
quantiles, are displayed on a Q-Q plot of the MDs. The associated critical values of those
GOF test statistics (obtained via extensive simulation experiments) are also displayed on
the graphical displays of the Q-Q plots of the MDs. Some approximate multinormality
GOF test statistics (e.g., standardized kurtosis, omnibus test) and their p-values are also
displayed on a Q-Q plot of MDs.

Two standalone software packages, ProUCL 4.00.04 and ParallAX, have also been
incorporated into Scout 2008. ProUCL 4.00.04 is a statistical software package
developed to address environmental applications, whereas the ParallAX software offers
graphical and classification tools to analyze multivariate data using the parallel
coordinates.



-------
Acronyms and Abbreviations

% NDs	Percentage of Non-detect observations

ACL	alternative concentration limit

A-D, AD	Anderson-Darling test

AM	arithmetic mean

ANOVA	Analysis of Variance

AOC	area(s) of concern

B*	Between groups matrix

BC	Box-Cox-type transformation

BCA	bias-corrected accelerated bootstrap method

BD	break down point

BDL	below detection limit

BTV	background threshold value

BW	Black and White (for printing)

CERCLA	Comprehensive Environmental Response, Compensation, and Liability Act

CL	compliance limit, confidence limits, control limits

CLT	central limit theorem

CMLE	Cohen's maximum likelihood estimate

COPC	contaminant(s) of potential concern

CV	Coefficient of Variation, cross validation

D-D	distance-distance

DA	discriminant analysis

DL	detection limit

DL/2 (t)	UCL based upon DL/2 method using Student's t-distribution cutoff value

DL/2 Estimates	estimates based upon data set with non-detects replaced by half of the respective detection limits

DQO	data quality objective

DS	discriminant scores

EA	exposure area

EDF	empirical distribution function

EM	expectation maximization

EPA	Environmental Protection Agency

EPC	exposure point concentration

FP-ROS (Land)	UCL based upon fully parametric ROS method using Land's H-statistic

Gamma ROS (Approx.)	UCL based upon Gamma ROS method using the gamma approximate-UCL method

Gamma ROS (BCA)	UCL based upon Gamma ROS method using the bias-corrected accelerated bootstrap method
GOF, G.O.F.	goodness-of-fit

H-UCL	UCL based upon Land's H-statistic

HBK	Hawkins Bradu Kass

HUBER	Huber estimation method

ID	identification code

IQR	interquartile range

K	Next K, Other K, Future K

KG	Kettenring Gnanadesikan

KM (%)	UCL based upon Kaplan-Meier estimates using the percentile bootstrap method

KM (Chebyshev)	UCL based upon Kaplan-Meier estimates using the Chebyshev inequality

KM (t)	UCL based upon Kaplan-Meier estimates using the Student's t-distribution cutoff value

KM (z)	UCL based upon Kaplan-Meier estimates using standard normal distribution cutoff value

K-M, KM	Kaplan-Meier

K-S, KS	Kolmogorov-Smirnov

LMS	least median squares

LN	lognormal distribution

Log-ROS Estimates	estimates based upon data set with extrapolated non-detect values obtained using robust ROS method

LPS	least percentile squares

MAD	Median Absolute Deviation

Maximum	Maximum value

MC	minimization criterion

MCD	minimum covariance determinant

MCL	maximum concentration limit

MD	Mahalanobis distance

Mean	classical average value

Median	Median value

Minimum	Minimum value

MLE	maximum likelihood estimate

MLE (t)	UCL based upon maximum likelihood estimates using Student's t-distribution cutoff value

MLE (Tiku)	UCL based upon maximum likelihood estimates using Tiku's method

Multi Q-Q	multiple quantile-quantile plot

MVT	multivariate trimming

MVUE	minimum variance unbiased estimate

ND	non-detect or non-detects

NERL	National Exposure Research Laboratory

NumNDs	Number of Non-detects

NumObs	Number of Observations

OKG	Orthogonalized Kettenring Gnanadesikan

OLS	ordinary least squares

ORD	Office of Research and Development

PCA	principal component analysis

PCs	principal components

PCS	principal component scores

PLs	Prediction limits

PRG	preliminary remediation goals

PROP	proposed estimation method

Q-Q	quantile-quantile

RBC	risk-based cleanup

RCRA	Resource Conservation and Recovery Act

ROS	Regression on order statistics

RU	remediation unit

S	substantial difference

SD, Sd, sd	standard deviation

SLs	simultaneous limits

SSL	soil screening levels

S-W, SW	Shapiro-Wilk

TLs	tolerance limits

UCL	upper confidence limit

UCL95, 95% UCL	95% upper confidence limit

UPL	upper prediction limit

UPL95, 95% UPL	95% upper prediction limit

USEPA	United States Environmental Protection Agency

UTL	upper tolerance limit

Variance	classical variance

W*	Within groups matrix

WiB matrix	Inverse of W* cross-product B* matrix

WMW	Wilcoxon-Mann-Whitney

WRS	Wilcoxon Rank Sum

WSR	Wilcoxon Signed Rank

Wsum	Sum of weights

Wsum2	Sum of squared weights



-------
Acknowledgements

We wish to express our gratitude and thanks to our colleagues who helped to develop
past versions of Scout and to all of the many people who reviewed, tested, and gave
helpful suggestions for the development of Scout. We wish to especially acknowledge:
Nadine Adkins, Girdhar Agarwal, Anastasia Arteyeva, Chad Cross, Rohan Dalpatadu,
Marion Edison, Tim Ehli, Evan Englund, Peter Filzmoser, Kirk Fitzgerald, George
Flatman, Forest Garner, Robert Gerlach, Edward Gilroy, Colin Greensill, Anwar Hossain,
Kuen Huang-Farmer, Mia Hubert, Alfred Inselberg, Barry Lavine, Maliha Nash, Ramon
Olivero, John Palasota, Bruce Rhoads, Brian Schumacher, Cliff Spiegelman, Teruo
Sugihara, Martin Stapanian, Valeri Tsarev, Asokan Mulayath Variyath, Suresh
Veluchamy, Sabine Verboven, INDUS Corporation, and Computer Sciences Corporation.



-------
Software Used to Develop Scout 2008

Scout 2008 (Scout) has been developed in the Microsoft .NET Framework using the C#
programming language to run under the Microsoft Windows XP operating systems. As
such, to properly run Scout, the computer using the program must have the .NET
Framework pre-installed. The downloadable .NET files can be found at one of the
following two Web sites:

o http://msdn2.microsoft.com/en-us/netframework/default.aspx
  Note: Download .NET version 1.1

o http://www.microsoft.com/downloads/details.aspx?FamilyId=262D25E3-F589-4842-8157-034D1E7CF3A3&displaylang=en

The Scout source code uses the following embedded licensed software:

Chart FX 6.2 (for graphics), http://www.softwarefx.com

Quinn-Curtis QCChart 3D Charting Tools for .Net (for graphics),
http://www.quinn-curtis.com

NMath (for mathematical and statistical libraries), http://www.centerspace.net/
FarPoint (for spreadsheet applications), http://www.fpoint.com/



-------
Table of Contents

Notice	iii

Executive Summary	v

Table of Contents	xv

Chapter 1	1

Introduction	 1

1.1	Methods to Handle Data Sets with Below Detection Limit Observations	 1

1.2	Goodness-of-Fit Test Statistics to Test Multinormality of a Data Set	2

1.3	Robust Methods in Scout	3

1.3.1	Robust Intervals	3

1.3.2	Coverage or Cutoff Levels (Factors) Used by Outlier Identification Methods	4

1.3.3	Critical or Cutoff Outlier Alpha Used in Graphical Displays	5

1.3.4	Break Down Point	6

1.3.4.1 Break Down Point of an Estimation Method	6

1.3.5	Initial Estimation Methods Available in Scout 2008	7

1.3.6	Least Median of Squares (LMS) Regression Method	8

1.3.7	MCD Method (Extended MCD Method)	10

1.3.8	PROP Influence Function	11

1.4	Outliers/Estimates Module	 12

1.4.1	Coverage and Influence Function Levels in Robust Outlier Identification Methods 13

1.4.2	Outlier Determination Critical Alpha	13

1.5	QA/QC Module	14

1.6	Regression Module	15

1.6.1 Robust Regression Based Upon M-Estimation and Generalized M-Estimation	 15

1.7	Principal Component Analysis (PCA) and Discriminant Analysis (DA)	16

1.8	Output Generated by Scout 2008	17

1.9	Installing and Using Scout	18

1.9.1	Minimum Hardware Requirements	 18

1.9.2	Software Requirements	 18

1.9.3	Installation Instructions	 18

1.9.4	Getting Started	19

Chapter 2	21

Working with Data, Graphical Output, and Non-Graphical Output	21

2.1	Creating a New Spreadsheet (Data Set)	21

2.2	Open an Existing Spreadsheet (Data Set)	21

2.3	Input File Format	22

2.4	Number Precision	22

2.5	Entering and Changing a Header Name	23

2.6	Editing	24

2.7	Handling Non-detect Observations			25

2.8	Handling Missing Values	26

2.9	Saving Files	27

2.10	Printing Non-Graphical Outputs	27

2.11	Working with Graphs	28

2.11.1	Graphics Toolbar	29

2.11.2	Drop-Down Menu Graphics Tools	31

2.11.3	3D Graphics Chart Rotation Control Button	35

References	37

Chapter 3	39

Select Variables Screens	39

3.1	Data Drop-Down Menu	39

3.1.1	Transform (No NDs)	39

3.1.2	Impute: Transform Two Columns to a Column (NDs)	40

3.1.3	Copy	42

3.2	Graphing and Statistical Analysis of Univariate Data	42

3.2.1	Graphs by Groups	45

3.2.2	Select Variables Screen for Two-Sample Hypothesis Testing	46

3.2.2.1	Without Group Variable	46

3.2.2.2	With Group Variable	46

3.3	Regression Menu	48

3.4	Multivariate Outliers and PCA Menu	49

3.5	Multivariate Discriminant Analysis Menu	51

Chapter 4	53

Data	53

4.1	Copy	53

4.2	Generate	55

4.2.1	Univariate	55

4.2.2	Multivariate	58

4.3	Impute (NDs)	60

4.4	Missing	62

4.5	Transform (No NDs)	64

4.6	Expand Data	66

4.7	Benford's Analysis	69

References	71

Chapter 5	73

Graphs	73

5.1	Univariate Graphs	73

5.1.1	Box Plots	74

5.1.2	Histograms	77

5.1.2.1	No NDs	77

5.1.2.2	With NDs	79

5.1.3	Q-Q Plots	81

5.1.3.1	No NDs	81

5.1.3.2	With NDs	83

5.2	Scatter Plots	85

5.2.1	2D Scatter Plots	85

5.2.2	3D Scatter Plots	87

Chapter 6	91

Goodness-of-Fit and Descriptive Statistics	91

6.1	Descriptive Statistics of Univariate Data	91

6.1.1	Descriptive (Summary) Statistics for Data Sets with No Non-detects	91

6.1.2	Descriptive (Summary) Statistics for Data Sets with Non-detects	94

6.1.3	Descriptive Statistics for Multivariate Data	96

6.2	Goodness-of-Fit (GOF)	 101

6.2.1 Univariate GOF	 101

6.2.1.1	GOF Tests for Data Sets with No NDs	102

6.2.1.1.1	GOF Tests for Normal and Lognormal Distribution	 102

6.2.1.1.2	GOF Tests for Gamma Distribution	 105

6.2.1.1.3	GOF Statistics	 107

6.2.1.2	GOF Tests for Data Sets With NDs	 110

6.2.1.2.1	GOF Tests Using Exclude NDs for Normal and Lognormal Distribution 110

6.2.1.2.2	GOF Tests Using Exclude NDs for Gamma Distribution	 113

6.2.1.2.3	GOF Tests Using Log-ROS Estimates for Normal and Lognormal Distribution	 116

6.2.1.2.4	GOF Tests Using Log-ROS Estimates for Gamma Distribution	 119

6.2.1.2.5	GOF Tests Using DL/2 Estimates for Normal or Lognormal Distribution	 122

6.2.1.2.6	GOF Tests Using DL/2 Estimates for Gamma Distribution	 125

6.2.1.2.7	GOF Statistics	128

6.2.2	Multivariate GOF	 131

6.3	Hypothesis Testing	 133

6.3.1.1	Single Sample Hypothesis Tests for Data Sets with No Non-detects	 133

6.3.1.1.1	Single Sample t-Test	 133

6.3.1.1.2	Single Sample Proportion Test	 135

6.3.1.1.3	Single Sample Sign Test	 137

6.3.1.1.4	Single Sample Wilcoxon Signed Rank Test	 139

6.3.1.2	Single Sample Hypothesis Tests for Data Sets With Non-detects	 141

6.3.1.2.1	Single Sample Proportion Test	 141

6.3.1.2.2	Single Sample Sign Test	 144

6.3.1.2.3	Single Sample Wilcoxon Signed Rank Test	 146

6.3.2.1	Two-Sample Hypothesis Tests for Data Sets With No Non-detects	 148

6.3.2.1.1	Two-Sample t-Test	 148

6.3.2.1.2	Two-Sample Wilcoxon Mann Whitney Test	 150

6.3.2.1.3	Two-Sample Quantile Test	 152

6.3.2.2	Two-Sample Hypothesis Tests for Data Sets With Non-detects	154

6.3.2.2.1 Two-Sample Wilcoxon Mann Whitney Test	 154

6.3.2.2.2	Two-Sample Gehan Test	 157

6.3.2.2.3	Two-Sample Quantile Test	 160

6.4	Classical Intervals	 161

6.4.1	Upper (Right Sided) Limits	 162

6.4.1.1	Upper (Right Sided) Confidence Limits (UCLs)	 162

6.4.1.1.1	No NDs	 162

6.4.1.1.2	With NDs	 165

6.4.1.2	Upper Prediction Limits (UPL) / Upper Tolerance Limits (UTL)	 168

6.4.1.2.1 No NDs	168

6.4.1.2.2	With NDs	 171

6.4.2	Classical Confidence Intervals	 175

6.4.2.1	Without Non-detects	 175

6.4.2.2	With Non-detects	179

6.4.3	Classical Tolerance Intervals	 184

6.4.3.1	Without Non-detects	 184

6.4.3.2	With Non-detects	187

6.4.4 Classical Prediction Intervals	 192

6.4.4.1	Without Non-detects	192

6.4.4.2	With Non-detects	196

6.5 Robust Intervals	200

6.5.1	Robust Confidence Intervals	201

6.5.2	Robust Simultaneous Intervals	204

6.5.3	Robust Prediction Intervals	208

6.5.4	Robust Tolerance Intervals	211

6.5.5	Intervals Comparison	215

6.5.6	Group Analysis	218

References	221



-------
Chapter 1
Introduction

This chapter briefly summarizes statistical methods incorporated in Scout, which are not
readily available in commercial and freeware software packages. Therefore, only those
modules of Scout consisting of such methods are briefly discussed in this chapter. Please
note that at the time of writing this Scout 2008 User Guide, resources were not available
for producing a Scout 2008 Technical Guide, which would discuss the theory used in the
Scout 2008 software in much more detail. A technical guide is planned. However, in the
meantime, for theoretical inquiries, please consult the Bibliography given at the end of
this user guide.

1.1 Methods to Handle Data Sets with Below Detection Limit
Observations

The "Data" module of Scout offers several univariate imputation methods (e.g., via
regression on order statistics (ROS) for normal, lognormal, and gamma distributions)
and substitution methods (e.g., replacing non-detects (NDs) by 0, DL, DL/2, or uniform
random variables) that can be used to estimate or extrapolate the non-detect
observations in a left-censored data set, including data sets with multiple
detection limits (DLs). Whenever applicable, the transformation and
imputation methods in the Data module can also be used on data sets consisting of multiple
groups (e.g., to perform a z-transform or log ROS (LROS)). One may use the transformation
module on a multivariate data set with NDs before using a multivariate method (e.g.,
Regression, PCA, and DA) on that data set. It should be noted that for multivariate data
sets with NDs, Scout can estimate mean vector and covariance matrix using the Kaplan-
Meier (1958) method which does not require the imputation of NDs before using
statistical methods such as principal component analysis (PCA). Some basic tools to
estimate missing observations and bivariate transformation operations are also available
in this Data module. The Stats/GOF module of Scout offers several parametric and
nonparametric (including Kaplan-Meier, regression on order statistics (ROS), and
bootstrap methods) univariate statistical methods that can be used on left-censored data
sets with non-detect observations potentially having multiple detection limits. For both
uncensored and left-censored data sets, Scout can compute a variety of parametric and
nonparametric interval estimates, including: the confidence interval for the mean,
prediction intervals, and tolerance intervals. The Stats/GOF module also has univariate
goodness-of-fit (GOF) tests for normal, lognormal, and gamma distributions for
uncensored and left-censored data sets. However, it should be noted that it is not easy to
verify distributional assumptions for censored data sets consisting of multiple detection
limits (DLs). Therefore, nonparametric methods are preferable on such left-censored
data sets. Some single and two-sample hypothesis tests (e.g., Wilcoxon Rank Sum Test,
Gehan Test) for uncensored and left-censored data sets potentially having single or
multiple DLs are also available in Scout. The details of methods to compute statistics
based upon left-censored data sets can be found in Singh and Nocerino (2001), Helsel
(2005), Singh, Maichle, and Lee (2006), and ProUCL 4.00.04 Technical Guide (2007).
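
The simple substitution methods mentioned above (replacing NDs by 0, DL/2, or DL) can be sketched as follows. This is only an illustration, not Scout's implementation: the function name `substitute_nds`, the data layout (a value column plus a detected/not-detected flag), and the sample values are all assumptions made for this example.

```python
from statistics import mean


def substitute_nds(values, detected, rule="DL/2"):
    """Replace non-detects by 0, DL/2, or DL (hypothetical helper).

    values[i] is the measured result when detected[i] is True;
    otherwise it is the detection limit reported for that sample.
    """
    factors = {"0": 0.0, "DL/2": 0.5, "DL": 1.0}
    if rule not in factors:
        raise ValueError(f"unknown substitution rule: {rule}")
    factor = factors[rule]
    return [v if d else factor * v for v, d in zip(values, detected)]


# Made-up left-censored sample with two detection limits (1.0 and 2.0).
values = [1.0, 3.2, 2.0, 5.1, 4.4, 1.0]
detected = [False, True, False, True, True, False]

# The mean after each substitution rule; the estimate depends on the rule chosen.
means = {
    rule: mean(substitute_nds(values, detected, rule))
    for rule in ("0", "DL/2", "DL")
}
```

The spread between the three means shows how sensitive a summary statistic can be to the substitution rule, which is one reason the ROS and Kaplan-Meier methods are offered as alternatives.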

1.2 Goodness-of-Fit Test Statistics to Test Multinormality of a
Data Set

It is not easy to verify multivariate normality of a data set. Multivariate normality tests
such as multivariate kurtosis (MK) and skewness (e.g., Mardia (1970, 1974), Mardia and
Kanazawa (1983)) are very sensitive to even small changes in the values of observations
of a data set. As a result, it is very hard not to reject the hypothesis of multinormality of a
data set. Therefore, it is desirable also to use graphical quantile-quantile (Q-Q) plots
(e.g., Singh (1993), Koziol (1993) and Fang and Zhu (1997)) of Mahalanobis distances
(MDs) to assess the approximate multinormality of a data set. Singh (1993) proposed to
use a correlation-type goodness-of-fit (GOF) test to assess approximate multivariate
normality of a data set. Scout 2008 can compute classical and robust (e.g., based upon
iterative M-estimation method, MVT and MCD methods) estimates of multivariate
kurtosis and skewness. Scout 2008 can also generate classical and robust Q-Q plots of
MDs based upon quantiles of scaled beta distribution and approximate chi-square
distribution.
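
As a rough illustration of the quantities discussed above, the sketch below computes classical squared Mahalanobis distances and Mardia's multivariate kurtosis for a small bivariate data set. It is a minimal sketch under stated assumptions: it is restricted to p = 2 so that the covariance matrix can be inverted explicitly, the function names and data are invented for this example, and it does not reproduce Scout's robust estimates or its simulated critical values.

```python
def mahalanobis_sq_2d(points):
    """Squared classical Mahalanobis distances for bivariate data (p = 2)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix entries (divisor n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    d2 = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form using the explicit inverse of the 2 x 2 covariance matrix.
        d2.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return d2


def mardia_kurtosis(d2):
    """Mardia's multivariate kurtosis from squared MDs.

    Mardia's statistic uses the covariance matrix with divisor n,
    so each squared distance is rescaled by n / (n - 1).
    """
    n = len(d2)
    return sum((n / (n - 1) * d) ** 2 for d in d2) / n


# Made-up data: seven points along a trend plus one discordant observation.
points = [(2.1, 3.9), (2.5, 4.8), (3.0, 6.1), (3.4, 6.8),
          (3.9, 8.2), (4.2, 8.1), (4.8, 9.7), (9.0, 2.0)]

d2 = mahalanobis_sq_2d(points)
ordered_d2 = sorted(d2)  # ordered MDs, as plotted on a Q-Q plot
mk = mardia_kurtosis(d2)
```

On a Q-Q plot, `ordered_d2` would be paired with theoretical quantiles of the scaled beta (or chi-square) distribution; computing those quantiles needs a statistics library and is left out of this sketch.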

Extensive simulated critical values of the multivariate GOF test statistics including
multivariate kurtosis (MK), multivariate skewness (MS), correlation coefficients between
ordered MDs and quantiles of scaled beta (or chi-square) distribution have been generated.
The GOF Q-Q plot of MDs is formalized by displaying exact test statistics: MS, MK, and
correlation coefficient and their simulated critical values for a specified level of
significance, α. Approximate MS (with small sample adjustment), standardized
approximate MK, and approximate omnibus multinormality test and their associated p-
values are also displayed on these Q-Q graphs. It should be pointed out that there are
significant differences between the exact simulated critical values of multivariate kurtosis
and skewness, and their approximate critical values as described in the literature. Also,
the performance of these approximations (e.g., chi-square distribution for MS and normal
distribution for standardized kurtosis) is not well established, especially when the
dimension, p, becomes larger than 5. These discrepancies can be seen by looking at the
various exact and approximate GOF test statistics displayed on the Q-Q plot of MDs.

This issue is under further investigation. A linear pattern displayed by data pairs,
(theoretical quantiles from the distribution of MDs and ordered observed MDs) on the Q-
Q plot of MDs suggests (cautiously) approximate multinormality of the data set. Since
Q-Q plots of MDs are very sensitive to even minor changes in observations and mild
outliers, other measures such as Q-Q plot and scatter plot of principal components (also
available in Scout) may also be used to assess approximate multinormality (cautiously) of
a multivariate data set.



-------
1.3 Robust Methods in Scout

Several options in various modules of Scout (e.g., Robust intervals, Outlier/Estimates,
QA/QC, Regression, Method Comparison, PCA, and discriminant analysis) offer robust
statistical methods described in the following sections.

1.3.1 Robust Intervals

In addition to classical methods, the Stats/GOF module of Scout has univariate methods
to compute robust estimates of location and scale, and robust interval estimates. At
present, robust methods are available for uncensored data sets without non-detect
observations. The univariate iterative robust estimation methods in Scout 2008 include:
Tukey's Bisquare (1975) and Kafadar's version of Tukey's Biweight (1982) influence
functions, Huber (1981) and PROP (Singh, 1993) influence functions, and the trimming
method. Two choices of initial estimates, (classical mean and sd) and (median, 1.48 MAD or
IQR/1.345), are available for all iterative univariate estimation methods included in
Scout. The robust interval module can be used to compute robust confidence intervals of
the mean, robust prediction intervals for k (≥ 1) observations, tolerance intervals, and
robust simultaneous (with critical value from the distribution of Max (MDs)) and
individual (with critical value from the distribution of MDs) intervals. The details of the
robust interval estimates can be found in Kafadar (1982), Hoaglin, Mosteller, and
Tukey (1983), Singh and Nocerino (1995, 1997), and Horn, Pesce and Copeland (1998).
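
To make the idea of an iterative robust location estimate concrete, the sketch below starts from the (median, 1.48 MAD) initial estimates and iterates Huber-type weights with the scale held fixed. This is a generic textbook scheme, not Scout's algorithm: Scout parameterizes its influence functions by an alpha level, whereas the tuning constant `c = 1.345`, the function names, and the data here are assumptions for illustration only.

```python
from statistics import mean, median


def initial_estimates(x):
    """Robust starting values: median and 1.48 * MAD."""
    med = median(x)
    mad = median([abs(v - med) for v in x])
    return med, 1.48 * mad


def huber_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Iteratively reweighted Huber M-estimate of location (scale held fixed)."""
    mu, scale = initial_estimates(x)
    if scale == 0:
        return mu  # more than half of the data are identical
    for _ in range(max_iter):
        weights = []
        for v in x:
            u = abs(v - mu) / scale
            # Huber weight: full weight inside [-c, c], down-weighted outside.
            weights.append(1.0 if u <= c else c / u)
        new_mu = sum(w * v for w, v in zip(weights, x)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Made-up sample with one gross outlier.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 25.0]
classical = mean(sample)        # pulled toward the outlier
robust = huber_location(sample)  # stays near the bulk of the data
```

Interval estimates built around `robust` and a robust scale would be much narrower than intervals built around the classical mean and sd for this sample, which is the kind of contrast the interval comparison graphs are meant to show.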

The robust interval option provides graphical comparison of the various robust and
classical interval estimation methods. Depending upon the selected options and methods,
some relevant robust statistics such as the mean, standard deviation (sd), influence function
alpha (α), trimming percentage (%), and location and scale tuning constants (TCs) are also
displayed on these interval method comparison graphs. This option also provides
classical and robust control-chart-type interval index plots exhibiting the associated limits
for the selected variable. On a single classical or robust (e.g., using Biweight influence
function) interval plot (showing all individual data points), one can draw more than one
set of intervals including: individual interval, prediction interval, tolerance interval, and
simultaneous interval. Specifically, on this control-chart-type interval plot, if the Huber
option is used, all interval estimates will be computed using the same Huber influence
function. These kinds of interval graphs can be quite useful in Quality Assurance/Quality
Control (QA/QC) applications including industrial, manufacturing, clinical trials,
medical, pharmaceutical, and environmental. The Group Analysis option of the Robust Interval
option can be used to formally compare interval estimates of a characteristic of interest
for various groups (e.g., lead concentrations in various areas of a polluted site, arsenic
concentrations in monitoring wells, effectiveness of two or more drugs) under study.

Standard terminology, such as coverage (e.g., half samples, h value) and cutoff (influence
function α, critical α, trimming percentage) levels used by the robust methods to identify
outliers, as incorporated in Scout 2008, is described next.



-------
1.3.2 Coverage or Cutoff Levels (Factors) Used by Outlier Identification
Methods

Most robust methods available in the literature use either a coverage factor, h (e.g., half
samples, h = [(n+p+1)/2] for MCD; best subset of size (p+1) or of size h = [(n+p+1)/2]
for LMS), or a critical level, α (e.g., the influence function α for the PROP and Huber influence
functions, location and scale tuning constants for the Biweight function, trimming
percentage α% for the multivariate trimming (MVT) method), to identify outliers in a p-
dimensional data set of size n. There is a close relationship between the coverage or
critical cutoff and the break down (BD) point of an estimate. Specifically, for the MCD
and LMS methods, higher values of h may yield MCD and LMS estimates with lower BD
points; for influence function-based M-estimation methods (e.g., PROP and Huber),
higher values of the influence function α may yield estimates with higher BD points;
and for the MVT method, higher values of the trimming percentage tend to yield estimates with
higher BD points.

It should be noted that the success of a robust method in identifying outliers depends
upon the coverage or cutoff levels used and the behavior of the influence function. In
practice, the smooth redescending influence functions, such as the PROP influence, will
perform better than nondecreasing influence functions such as the Huber influence
function (e.g., Hampel et al. (1986)). In addition to coverage and critical cutoff levels,
initial robust starts in the iterative process of obtaining robust estimates also play an
important role in achieving high break down estimates.

For each of the robust methods incorporated in Scout, the user can pick a suitable
coverage, h, or cutoff level, α. It is suggested that the user try more than one coverage
or cutoff factor for the selected method. For example, for the standard MCD method
(also known as very robust MCD) with h = [(n+p+1)/2], the BD is roughly equal to 50%.
The use of the very robust MCD method with this coverage, h, tends to find more outliers
than actually are present in the data set. Even though it is desirable to use robust methods
with high BD points, those robust methods should be efficient enough not to identify
inliers (and good leverage points) as outliers (and regression outliers). This issue can be
addressed by choosing higher coverage (e.g., 75% coverage) levels. Using Scout 2008,
one can perform the MCD and LMS methods for user selected coverage levels.

Since the number of outliers present in a data set is not known in advance, it is desirable
to use more than one value of the coverage or cutoff level on the same data set. In order
to get some idea about the number of outliers present in a data set, the use of graphical
displays is recommended before using the outlier identification methods available in
Scout (e.g., Huber, MCD, MVT, or PROP) or in any other software package. There is no
substitute for graphical displays of multivariate data sets. The graphical displays offer
additional information about the patterns and outliers present in a data set. This kind of
information cannot be obtained by looking at the statistics computed by the various
statistical procedures. Moreover, most computed statistics (e.g., mean vector, covariance
matrix, MDs, kurtosis) get distorted by the presence of outliers. The use of graphical
displays such as scatter plots of raw data, scatter plots of principal components (PCs),
normal quantile-quantile (Q-Q) plot of dependent variable (to identify regression
outliers), and Q-Q plot of Mahalanobis distances (MDs) of explanatory variables (to
identify leverage points) is helpful to get some idea about the number (or percentage, k) of
outliers that may be present in the data set. The multivariate graphs listed above are also
useful to verify if the identified outliers based upon outlier test statistics (e.g., MDs, MS,
weights) indeed represent outliers. This step helps the user to pick an appropriate value of
h (MCD) or influence function alpha (e.g., PROP), which in turn will help obtain more
reliable and accurate estimates of population parameters (e.g., location, scale, regression).

1.3.3 Critical or Cutoff Outlier Alpha Used in Graphical Displays

In Scout 2008, emphasis is given to the graphical displays of multivariate data sets.
Graphical methods in Scout 2008 include: 2-dimensional and 3-dimensional scatter plots,
Q-Q, Index, and distance-distance (D-D) plots of MDs, prediction and tolerance
ellipsoids, Q-Q plots of residuals, and scatter plots of residuals versus unsquared leverage
distances, and multiple ellipsoids or regression lines on the same graph. Graphical
displays of multiple ellipsoids or regression lines provide useful graphical comparisons of
various robust and resistant methods incorporated in Scout 2008. An attempt has been
made to formalize these graphical displays by drawing control limits, prediction and
tolerance ellipsoids based upon the critical values of the MDs (individual MDs) and
Maximum MD (Max-MD) computed using the graphical alpha or regression band alpha.
Graphical displays for the MCD and LMS methods use critical values from the chi-square
distribution at a fixed critical level of 0.025, as cited in the literature (e.g., Rousseeuw and
van Zomeren (1990)). The LMS method uses fixed cutoff values of -2.5 and +2.5 to
identify regression/residual outliers (Rousseeuw and Leroy, 1987).

For other robust (PROP, MVT, Huber), and classical and sequential classical methods,
Scout uses critical values of the MDs based upon quantiles of scaled beta (or approximate
chi-square) distribution (Singh (1993)). The critical values of MDs and Max-MDs used
on these multivariate graphs are computed for the user selected outlier critical alpha. Control
limits (or prediction and tolerance ellipsoids) at critical values (based upon the outlier
critical alpha) obtained from the distribution of MDs (prediction ellipsoid) and maximum
MD (tolerance ellipsoid) are drawn on the Q-Q plots and index plots of MDs. Critical
values of various other statistics displayed on the Q-Q plots of MDs, including MS, MK,
and correlation coefficients, are also computed for the outlier critical alpha. On scatter
plots of raw data, principal component scores, or discriminant scores, prediction ellipsoids
are drawn at the critical value (computed for the outlier critical alpha) from the distribution of
MDs, and tolerance ellipsoids are drawn at the critical value from the distribution of the
maximum MD (Max-MD). Observations lying outside the outer ellipsoid (tolerance)
represent potential outliers, and observations lying between the inner (prediction) and
outer (tolerance) ellipsoids may be considered borderline outliers.
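
The inner/outer ellipsoid rule described above can be sketched in a few lines. This is only an illustration: the function name and the numeric critical values are hypothetical, and in Scout the two critical values would come from the distributions of the MDs and of the Max-MD for the selected outlier critical alpha.

```python
def classify_by_distance(md, prediction_cv, tolerance_cv):
    """Label one Mahalanobis distance against two critical values.

    prediction_cv: critical value of the individual MD (inner ellipsoid).
    tolerance_cv:  critical value of the Max-MD (outer ellipsoid).
    Both are assumed to be supplied by the caller.
    """
    if prediction_cv > tolerance_cv:
        raise ValueError("prediction critical value should not exceed the tolerance one")
    if md > tolerance_cv:
        return "potential outlier"
    if md > prediction_cv:
        return "borderline"
    return "inlier"


# Hypothetical distances and critical values, for illustration only.
distances = [0.8, 2.9, 4.6]
labels = [classify_by_distance(d, prediction_cv=2.5, tolerance_cv=3.8) for d in distances]
print(labels)
```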

In regression applications, graphical displays of Q-Q plot or index plot of residuals with
control limits drawn at the critical values (associated with the selected regression outlier α) of
unsquared residual distances (for LMS, these are hard lines drawn at -2.5 and 2.5) are
used to determine regression outliers. A semi-formal residual versus unsquared leverage
distance plot (Singh and Nocerino (1995)) is also available in Scout to identify regression
outliers (uses regression outlier alpha) and inconsistent (bad) leverage outliers (uses
leverage outlier alpha). In most of the graphical displays listed above, Scout 2008
collects and uses user selected critical levels to compute appropriate critical values of the
statistics used (e.g., critical values of MDs, critical value of Max MD, critical values for
leverage Mahalanobis distances and unsquared regression distances) to generate the
graphical displays.

1.3.4 Break Down Point

The break down (BD) point (Hampel (1974, 1975), Huber (1981),
Maronna, Martin, and Yohai (2006), Hubert, Rousseeuw, and van Aelst (2007)) of an
estimate is briefly described as follows.

1.3.4.1 Break Down Point of an Estimation Method

A great deal of emphasis is placed on break down (BD) point of robust outlier
identification and estimation methods. The performance of various robust methods
(estimates) is evaluated in terms of their BD points (e.g., Hubert, Rousseeuw, and van
Aelst (2007)). Robust methods roughly having BD point of about 50% are preferred and
often are called "very" robust methods (e.g., Rousseeuw and van Zomeren (1990),
Hubert, Rousseeuw, and van Aelst (2007)). It is also noted that the "very" robust
estimation methods are inefficient as they often tend to find more outliers than actually
are present in a data set (e.g., Maronna, Martin, and Yohai (2006)). The LMS
(Rousseeuw (1984), Rousseeuw and Leroy (1987)) and the MCD (Rousseeuw and van
Driessen (1999)) methods treat all outliers (e.g., extreme and borderline outliers) equally
by assigning the same "zero" weight (hard rejection of outliers). Therefore, it is desirable
to use influence function (Hampel (1974, 1985), Huber (1981))-based robust methods
possessing soft and smooth rejection of outliers. The PROP influence function (e.g.,
Singh (1993)) is a redescending smooth influence function. It is noted that iteratively
obtained robust M-estimates based upon the PROP influence function (e.g., with initial
robust starts) assign reduced-to-negligible weights, respectively, to intermediate and
extreme observations; observations coming from the central part of data are assigned full
unit weights. Furthermore, the robust estimates based upon the PROP influence function
are in close agreement with the classical estimates obtained using the data set without the
outliers (Singh and Nocerino (1995)).

The BD point of a method (or of estimates obtained using that method) represents that
fraction of observations which can be altered (e.g., can be made very large) arbitrarily
without affecting (influencing, distorting, changing drastically) the values of the
estimates. That is, the BD of a method (e.g., LMS) represents that fraction of outlying
observations that can be tolerated by the estimates (e.g., LMS estimates) obtained using
that method without distorting (breaking) the estimates. Obviously, the BD point of a
classical estimate (e.g., arithmetic mean, OLS regression estimates) is "zero," as even a
single arbitrarily selected large value can completely distort (change the estimate without
bounds) that classical estimate. It is also noted that the sample median of a data set (and
similarly the median of squared residuals) has a BD point of 50%, as the median of a data set
does not break down even when nearly 50% of the data values are altered arbitrarily.
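
A small numerical check makes the contrast between the mean and the median concrete. The data values below are hypothetical; the sketch only assumes Python's standard `statistics` module.

```python
import statistics

clean = [4.1, 4.8, 5.0, 5.2, 5.3, 5.6, 6.0]    # hypothetical measurements
one_bad = clean[:-1] + [1.0e6]                  # one value made arbitrarily large
three_bad = clean[:4] + [1.0e6, 2.0e6, 3.0e6]   # 3 of 7 values (about 43%) altered

for label, data in [("clean", clean), ("1 altered", one_bad), ("3 altered", three_bad)]:
    # The mean follows the altered values; the median stays among the unaltered ones.
    print(label, statistics.mean(data), statistics.median(data))
```

With these values a single altered observation is enough to carry the mean far away from the bulk of the data, while the median stays within the range of the unaltered values as long as fewer than half of the observations are changed.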

The break down points of the LMS and MCD methods are known to be about 50%. Details
about LMS and MCD estimates and their break down points are discussed in
Sections 1.3.6 and 1.3.7, respectively. Both the LMS regression and the MCD estimation
methods are based upon extensive searches of elemental subsets (Hawkins, Bradu, and Kaas
(1984), Hawkins (1993)) of size (p+1). Other variations of the initial subset size, such as
subsets of size [(n+p+1)/2], may also be used. Some of these choices for sizes of the initial subsets
searched have been incorporated in the Scout software. In Scout, the MCD method is
labeled as the Extended MCD method. It is also known that the theoretical break down
point of M-estimates (Maronna, 1976) of p-dimensional multivariate location and scale is
no more than 1/(p+1). However, it should be noted that the practical BD of an iteratively
obtained robust M-estimate (generalized likelihood estimate) based upon a smooth
redescending function such as the PROP (Singh, 1993) influence function can be much
higher than 1/(p+1). The break down point of iteratively obtained robust and resistant
estimates increases with each iteration (as outlying observations are iteratively assigned
reduced weights) until the convergence of M-estimates is achieved. Typically,
convergence is achieved in fewer than 10-15 iterations. More details can be found in
Section 1.3.8. Scout generates intermediate results for all intermediate iterations for users
to review. It should be noted that higher break down points of iteratively obtained robust
estimates (e.g., Huber and PROP) are achieved by using higher values of the influence
function alpha, α (or of the trimming percentage for the MVT method), used to identify outliers.
It is observed that a robust method based upon the PROP influence function assigns reduced
to negligible weights to intermediate and extreme outliers. This is especially true when
an initial robust start (e.g., based upon the OKG (Devlin, Gnanadesikan, and Kettenring
(1975), Maronna and Zamar (2002)) method) is used in the iterative process of obtaining
M-estimates.
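
The iterative reweighting idea can be illustrated with a minimal univariate sketch. It uses the textbook Huber weight function, not the PROP influence function (whose form is not given in this section), and it is not Scout's implementation. The function name, the tuning constant c = 1.345, the median/MAD start, and the data are all assumptions made for illustration.

```python
import statistics


def huber_location(data, c=1.345, tol=1e-8, max_iter=50):
    """Iteratively reweighted location estimate with Huber weights.

    Robust start: the median, with the scale held fixed at MAD/0.6745.
    Returns the estimate, the weights used in the last iteration,
    and the number of iterations performed.
    """
    center = statistics.median(data)
    scale = statistics.median([abs(x - center) for x in data]) / 0.6745
    if scale == 0:
        raise ValueError("MAD is zero; another scale estimate is needed")
    for iteration in range(1, max_iter + 1):
        weights = []
        for x in data:
            r = abs(x - center) / scale
            # Full weight in the central part, reduced weight c/r beyond c.
            weights.append(1.0 if r <= c else c / r)
        new_center = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        shift = abs(new_center - center)
        center = new_center
        if shift < tol:
            break
    return center, weights, iteration


data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 9.7, 25.0]  # hypothetical, one gross outlier
center, weights, iterations = huber_location(data)
print(round(center, 3), [round(w, 3) for w in weights], iterations)
```

Because the Huber function is nondecreasing, the outlying value keeps a small but nonzero weight; a redescending function would push that weight closer to zero.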

1.3.5 Initial Estimation Methods Available in Scout 2008

Several initial start robust estimates to compute iteratively obtained M-estimates are
available in Scout. It is well known that classical methods have a zero BD point, and
they suffer from severe masking effects. This means that the presence of some of the
outliers (e.g., extreme outliers) may mask the presence of some other outliers (e.g.,
intermediate outliers). Even robust outlier identification and estimation methods suffer
from masking effects. In order to overcome and reduce the masking effects, robust initial
start estimates are used in the iterative process of obtaining robust estimates. Initial start
robust estimates as incorporated in Scout can be used with all iterative estimation
methods (including sequential classical method) available in Scout.

The initial start estimates as incorporated in Scout include: 1) the classical mean vector
and classical scale matrix; 2) the median vector and MAD/0.6745 (or IQR/1.35)-based
covariance matrix with off diagonal elements obtained from the classical covariance
matrix; 3) the median vector and covariance matrix obtained using the Kettenring and
Gnanadesikan (KG) method (1975); and 4) the median vector and orthogonalized KG
(OKG) covariance matrix as proposed by Maronna and Zamar (2002). Here,
MAD/0.6745 represents the MAD-based standard deviation of a variable, and the
IQR/1.35 represents the IQR standard deviation of a variable. In practice, often the MAD
of a variable becomes zero, even when the variance of that variable is not zero (e.g., well
known Iris data of size 50). In such cases, an IQR fix is applied, and the IQR/1.35 is
used as a robust estimate of the standard deviation for that variable.
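
The MAD-based estimate with the IQR fallback can be sketched as follows. The function name and the data are hypothetical, and the quartile convention (the default of `statistics.quantiles`) is an assumption; Scout may compute quartiles differently.

```python
import statistics


def robust_sd(values):
    """MAD/0.6745, falling back to IQR/1.35 when the MAD is zero."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad > 0:
        return mad / 0.6745
    # IQR fix: more than half of the values equal the median, so the MAD is 0.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / 1.35


# Hypothetical variable with many tied values: the MAD is 0 but the variance is not.
tied = [0.2] * 6 + [0.1, 0.3, 0.4, 0.6]
print(statistics.variance(tied) > 0, robust_sd(tied))
```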

It is noted that the OKG estimate as an initial estimate works very well with most
iterative estimation methods, including PROP, Huber, and MVT. It is also noted that the
use of the OKG method as an initial start estimate also improves the performance (in
terms of identification of outliers) of the iterative sequential classical method. However,
the computation of the OKG mean vector, as suggested and described in Maronna and
Zamar (2002), and Maronna, Martin, and Yohai (2006), does not yield good results, and
is therefore not included in Scout. The developers of Scout 2008 are currently working on
how to compute a more reliable estimate of the mean vector based upon the OKG method.

1.3.6 Least Median of Squares (LMS) Regression Method

In the LMS regression method, the objective is to find an elemental subset of size (p+1)
that minimizes the median of squared residuals (Rousseeuw (1984)). The minimization
criterion for the LMS regression is the median of squared residuals. This objective is
achieved by searching for elemental subsets of size (p+1), where p = number of explanatory
variables. The elemental subset that minimizes the median of squared residuals is called
the "best" elemental subset. It should be noted that more than one elemental subset can
yield the same minimum value of the criterion (median of squared residuals). The use of
different LMS subsets (best subsets) may result in different LMS regression estimates.

Depending upon the dimension and size of the data set, the process of searching for the
best (global) elemental subset of size (p+1) can be time-consuming. Therefore, in
addition to an exhaustive search for all elemental subsets, some quick (1,500 subsets),
extensive (3,000 subsets), and user specified search strategies have been incorporated in
Scout. As mentioned before, the best subset (minimizing the objective function) of size
(p+1) may not be unique, even when the search is exhaustive. Therefore, the LMS
regression parameter estimates may not be unique.
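
For a single explanatory variable (p = 1), the exhaustive elemental-subset search can be sketched directly. This is an illustrative sketch, not Scout's algorithm: the function name and data are hypothetical, the criterion is the plain sample median of squared residuals, and ties are resolved by keeping the first best subset found.

```python
from itertools import combinations
import statistics


def lms_line(x, y):
    """Exhaustive elemental-subset search for simple regression (p = 1).

    Each elemental subset has p + 1 = 2 points and defines an exact-fit line.
    Returns (criterion, intercept, slope, subset) for the line with the
    smallest median of squared residuals.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # no exact-fit line of the form y = a + b*x
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        squared = [(yk - intercept - slope * xk) ** 2 for xk, yk in zip(x, y)]
        criterion = statistics.median(squared)
        if best is None or criterion < best[0]:
            best = (criterion, intercept, slope, (i, j))
    return best


# Hypothetical data: seven points on y = 1 + 2x and three outlying points.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3, 5, 7, 9, 11, 13, 15, 40, 45, 50]
criterion, intercept, slope, subset = lms_line(x, y)
print(criterion, intercept, slope, subset)
```

In this example many pairs of points attain the same minimum criterion value, which illustrates why the best subset need not be unique. The number of elemental subsets grows quickly with n and p, which is the motivation for limited search strategies.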

Since the median of squared residuals is being minimized, the BD of LMS regression
estimates is roughly 50%. The LMS estimates can tolerate about 50% arbitrarily large
values (outliers) before the regression estimates break down or get severely distorted by
the presence of those outliers. Since the LMS method has a BD point of roughly 50%, the
LMS method tends to identify about 50% of the observations as outliers (both regression as
well as leverage outliers). It is observed that, in practice, the LMS method identifies
some of the inliers (non-outliers for obtaining a regression model) as outliers. That is, the
LMS method may find more outliers than actually are present in the data set. This is the
reason that the LMS method is known as an inefficient robust method (Maronna, Martin,
and Yohai (2006)). To some extent, this problem is overcome by using re-weighted least
squares regression, assigning zero weights to observations with LMS absolute residuals
greater than 2.5 (Rousseeuw and Leroy (1987), Rousseeuw and van Zomeren (1990)).
However, it is noted that even after performing this extra step of re-weighted least squares
regression, the LMS method tends to identify some of the non-outliers as outliers.
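
The re-weighting step can be sketched for simple regression as follows. The sketch assumes that the LMS residuals have already been standardized by a robust scale estimate (that step is not shown), and the function name, data, and residual values are hypothetical.

```python
def reweighted_least_squares(x, y, standardized_residuals, cutoff=2.5):
    """Ordinary least squares on observations with |standardized residual| <= cutoff.

    Observations beyond the cutoff receive weight 0; all others receive weight 1.
    Returns (intercept, slope, weights).
    """
    weights = [0.0 if abs(r) > cutoff else 1.0 for r in standardized_residuals]
    kept = [(xi, yi) for xi, yi, w in zip(x, y, weights) if w == 1.0]
    if len(kept) < 2:
        raise ValueError("fewer than two observations keep a nonzero weight")
    mean_x = sum(xi for xi, _ in kept) / len(kept)
    mean_y = sum(yi for _, yi in kept) / len(kept)
    sxx = sum((xi - mean_x) ** 2 for xi, _ in kept)
    if sxx == 0:
        raise ValueError("retained x values are all identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in kept)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, weights


# Hypothetical data; the last observation has a large standardized LMS residual.
x = [1, 2, 3, 4, 5, 6]
y = [3.1, 4.9, 7.0, 9.1, 11.0, 30.0]
lms_std_residuals = [0.3, -0.4, 0.1, 0.5, -0.2, 38.0]
intercept, slope, weights = reweighted_least_squares(x, y, lms_std_residuals)
print(intercept, slope, weights)
```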

It is also noted that, even though the LMS method identifies most of the leverage points
that may be present in a data set, it fails to distinguish between the good and bad leverage
points. As a result, the resulting regression model may not be very useful. This issue is
illustrated in this user guide by using the LMS method on the Hawkins, Bradu, and Kaas
(HBK, 1984) data set. This HBK data set has 75 observations and 3 explanatory
variables. In the literature, the leverage points are defined as those observations that are
outliers in the space of x-variables (3-dimensional here). The good leverage points
enhance the regression model (with higher coefficient of determination, lower scale
estimate, and lower standard errors of estimates of regression parameters), and bad
leverage points are outliers in both the x-space and the y-direction of the dependent variable. The
detailed definition (with graphical displays) of regression outliers, good and bad leverage
points can be found in Rousseeuw and Leroy (1987), Rousseeuw and van Zomeren
(1990), and Singh and Nocerino (1995). Following the definition of regression outliers,
good (consistent) and bad (inconsistent) leverage points, in the HBK data set, there are 4 (11,
12, 13, and 14) bad leverage points (and regression outliers) and 10 good leverage points,
as the inclusion of the 10 good points (1 through 10) enhances the regression model. The
LMS regression method identifies observations 1 through 10 as bad leverage points,
contradicting the definition of good leverage points as described and graphically
illustrated in Rousseeuw and Leroy (1987). Without the first 10 observations, there is no
regression model, and the problem reduces to simply an outlier identification problem.
Several methods in Scout 2008, such as the PROP method with an OKG start and the
MCD method, identify the first 14 observations as outliers in both the 3-dimensional
(without y-variable) and 4-dimensional (with y-variable) spaces.

Alternatively, instead of minimizing the median of squared residuals, one can minimize
some percentile (e.g., 75th percentile, or 90th percentile) of squared residuals. This
method is labeled as the least percentile of squares (LPS) regression method in the
regression module of Scout software package. The problem of not distinguishing
between the good and bad leverage points may be addressed by using the LPS regression
(see example in Scout User Guide). Depending upon the number of bad leverage points
and regression outliers present in the data set, one may want to use the LMS or the LPS
method on the same data set to obtain the appropriate robust fit. Obviously, the LPS
regression estimates obtained by minimizing the kth (k > 50%) percentile of squared
residuals will have a lower break down point than the LMS estimates. For example, the
BD of LPS regression estimates obtained by minimizing the 75th (k = 75%) percentile of
squared residuals is (n-[n*0.75]-p+2)/n, where p is the number of regression variables,
and [x] represents the largest integer contained in x.
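
The break down formula quoted above is easy to evaluate for a given data set. The sketch below is only a direct transcription of (n-[n*k]-p+2)/n; the function name is hypothetical, and n = 75, p = 3 are used as example dimensions (those of the HBK data set).

```python
import math


def lps_breakdown(n, p, k=0.5):
    """BD of LPS regression estimates: (n - [n*k] - p + 2) / n.

    [x] is the largest integer contained in x; k = 0.5 corresponds to LMS.
    Note that n*k is evaluated in floating point before taking the floor.
    """
    if not 0.5 <= k < 1:
        raise ValueError("k is expected to satisfy 0.5 <= k < 1")
    pos = math.floor(n * k)
    return (n - pos - p + 2) / n


for k in (0.5, 0.75, 0.9):
    print(k, round(lps_breakdown(75, 3, k), 4))
```

For n = 75 and p = 3, the value drops from about 0.49 at k = 0.5 to 0.24 at k = 0.75, in line with the statement that a higher percentile gives a lower break down point.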

In order to perform the LPS regression, one needs to have some idea about the value of k,
the percentage of outliers (bad leverage points and regression outliers) that may be
present in the data set. One may want to perform the LPS regression for a few values of
k, including k = 0.5. As mentioned before, since the number of outliers (both regression
and leverage) is not known in advance, it is suggested to use graphical displays, such as
scatter plots of the raw data, scatter plots of the principal components (PCs), a normal
quantile-quantile (Q-Q) plot of dependent variable (to identify regression outliers), and a
Q-Q plot of Mahalanobis distances (MDs) of explanatory variables (to identify leverage
points) to get some idea about the number (or percentage, k) of outliers that may be
present in the data set. Based upon the outlier information thus obtained, one may
perform an appropriate LMS/LPS regression on the data set. Graphical displays are also
useful to perform confirmatory analyses; that is, multivariate graphs in Scout can be used
to verify if identified outliers (e.g., based upon MDs and weights) indeed represent
outlying and aberrant observations. The BD points for the LMS (k ~ 0.5) and the least
percentile of squared residuals (LPS, k > 0.5) regression methods as incorporated in Scout
are summarized in the following table. Note that LMS is labeled as LPS when k > 0.5. In
the following, the fraction k is given by 0.5 ≤ k < 1.

Minimizing Squared Residual         BD
Pos = [n/2], k = 0.5                (n-Pos)/n
Pos = [(n+1)/2]                     (n-Pos)/n
Pos = [(n+p+1)/2]                   (n-Pos)/n
Pos = [n*k], k > 0.5 (LPS)          (n-Pos)/n

Minimizing Squared Residual         BD
Pos = [n/2], k = 0.5                (n-Pos-p+2)/n
Pos = [(n+1)/2]                     (n-Pos-p+2)/n
Pos = [(n+p+1)/2]                   (n-Pos-p+2)/n
Pos = [n*k], k > 0.5 (LPS)          (n-Pos-p+2)/n

Here [x] = greatest integer contained in x, and k represents a fraction: 0.5
-------
not unique. It should be noted that different search options may result in different MCD
estimates.

The BD point of MCD estimates is given by the fraction (n-h+1)/n. It is noted that there
is a direct relation between the coverage value, h, and the BD point of the MCD
estimates. Higher values of h yield estimates with a lower BD point. The use of the
default value of coverage, h, roughly identifies the optimal (about 50%) number of
outliers. In practice, the MCD method identifies some of the inliers as outliers. As a
result, the MCD method is often said to be an inefficient method (Maronna, Martin, and
Yohai (2006)). Just like the LMS method, re-weighted estimates of location and scale are
obtained by assigning "zero" weights to observations with robust MDs exceeding an
approximate chi-square value (0.975) with p degrees of freedom. In practice, it is
observed that even after performing this extra step, some of the non-outlying
observations are assigned a "zero" weight.

Scout offers some additional options to identify an appropriate number of outliers using the
MCD method. Instead of finding a "best" subset of size h = [(n+p+1)/2], one may find
a "best" subset of size h = [n*k], where k represents some fraction > 0.5. For example,
for k = 0.75, the objective will be to find a subset with the minimum determinant of the
covariance matrix based upon the best subset consisting of roughly 75% (= [n*0.75]) of
the observations. The BD of such MCD estimates will be roughly equal to 25% (= (n-h+1)/n).
The MCD method in Scout is called the Extended MCD method. In order to use
this option to appropriately compute the coverage, it is desirable to use graphical displays
(or other robust methods) to gain some information about the number of outliers present
in the data set. It should be noted that the MCD estimates based upon a "best" subset
consisting of a higher (> 50%) percentage of data may suffer from masking effects,
especially when the data set consists of clustered data. Since all of these options are
available in Scout 2008, the user is encouraged to confirm these statements and
observations on data sets from their applications.
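
The relation between the coverage, h, and the break down point (n-h+1)/n can be tabulated with a short sketch. The function names are hypothetical, and n = 75, p = 3 are example dimensions only.

```python
import math


def mcd_coverage(n, p, k=None):
    """Coverage h: the default [(n+p+1)/2], or [n*k] for a user selected fraction k."""
    if k is None:
        return (n + p + 1) // 2
    return math.floor(n * k)


def mcd_breakdown(n, h):
    """BD point of MCD estimates for coverage h: (n - h + 1) / n."""
    return (n - h + 1) / n


n, p = 75, 3
for k in (None, 0.75, 0.9):
    h = mcd_coverage(n, p, k)
    print(k, h, round(mcd_breakdown(n, h), 4))
```

With these dimensions, the default coverage gives h = 39 and a BD of about 0.49, while k = 0.75 gives h = 56 and a BD of about 0.27, that is, roughly 25%.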

1.3.8 PROP Influence Function

The PROP influence function (Singh, 1993) represents a smooth redescending influence
function assigning full weights to observations coming from the central part of data, and
reduced (instead of zero weights) to negligible weights to intermediate and extreme
outliers, respectively. The details of this method can be found in Singh (1993, 1996), and
Singh and Nocerino (1995, 1997). Even though the theoretical BD of M-estimation methods
is not greater than 1/(p+1), it is noted that the practical BD of an iteratively obtained
robust M-estimate (generalized likelihood estimate) based upon the PROP (Singh, 1993)
influence function can be much higher than 1/(p+1). The break down point of robust
estimates based upon the PROP influence function increases with each iteration. By
definition of the PROP influence function, the iterative process identifies multiple
outliers smoothly and effectively by reducing the influence of outliers successively in
various iterations. This is especially true when an initial robust start based upon the OKG
(Devlin, Gnanadesikan, and Kettenring (1975), Maronna and Zamar (2002)) method is
used in the iterative process of obtaining M-estimates.

In order to identify potential outliers present in a data set, the PROP function uses an
influence function alpha (α) value. Since the number of outliers present in a data set is not
known in advance, it is desirable to use more than one value of the influence function α
on the same data set. As mentioned before, the use of graphical displays is also
recommended along with the methods available in Scout (e.g., Huber, MCD, MVT, or PROP) to get
some idea about the number (or percentage, k) of outliers that may be present in the data set, and
also to confirm that identified outliers do represent outlying observations. Information
gathered from the graphical displays can be used to determine an appropriate critical or
influence function alpha, a (0
-------
1.4.1 Coverage and Influence Function Levels in Robust Outlier
Identification Methods

It should be pointed out that the success of a robust method in identifying multiple
outliers depends upon the coverage (e.g., h in the MCD method) or cutoff levels (e.g.,
influence function alpha in the PROP M-estimation method) and the behavior of the
influence function (nondecreasing, redescending, smooth redescending) used to identify
those outliers. As an illustration, the MCD method uses half samples of size h,
where the coverage factor, h, is typically given by h = [(n+p+1)/2]; M-estimation methods
based upon the PROP and Huber influence functions use a critical or influence function
cutoff level, α; and the MVT method uses a trimming percentage, α%, to identify outliers in
p-dimensional data sets of size n. In addition to coverage and critical cutoff levels, initial
robust start estimates in the iterative process (e.g., M-estimation) of obtaining robust
estimates also play an important role in achieving high break down estimates. It should
be noted that there is a direct relationship between the coverage or influence cutoff and
the break down (BD) point of an estimate. Specifically, for the MCD method (and also the
LMS regression method), higher values of h yield MCD estimates with lower BD
points; for influence function-based M-estimation methods (e.g., PROP influence
function), higher values of the influence function α yield estimates with higher BD points;
and for the MVT method, higher values of the trimming percentage tend to yield estimates with
higher BD points.

As a rule of thumb, for appropriate identification of outliers, n should be at least 5p; this
is especially true when the dimension p > 5. From a theoretical point of view, Scout can
compute various robust statistics and estimates for values of n > (p+2). However, as is well
known, the results (estimates, graphs, and outliers) obtained using such small,
high-dimensional (curse of dimensionality) data sets may not always be reliable and
defensible.

1.4.2 Outlier Determination Critical Alpha

In addition to coverage or influence function cutoff levels, all of the outlier methods use a
critical level (outlier critical alpha) which is used to determine outliers. Critical values of
various test statistics used in all graphical (e.g., Q-Q and index plots, ellipsoids) and
outlier identification methods (e.g., MDs, Max-MDs, kurtosis, skewness) are computed
using this critical alpha. For example, the MCD method uses a default chi-square (with p
degrees of freedom = df) cutoff alpha level = 0.025 for determination of outliers.
Observations with MCD MDs exceeding the chi-square (0.975) cutoff with p df may
represent potential outliers. Similarly, other multivariate outlier methods in Scout,
including classical, sequential classical, and M-estimation methods (PROP, Huber), use
an outlier alpha (user selected) that is used to compute critical values of the test statistics
(individual MD, or Max MD) used to determine outliers. Classical and robustified MDs
exceeding those critical values may represent potential outliers requiring further
investigation.
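
The chi-square cutoff can be approximated without a statistics package. The sketch below uses the Wilson-Hilferty approximation to the chi-square quantile, which is a swapped-in technique chosen to keep the example self-contained; it is not how Scout computes its critical values. It also assumes that the cutoff is compared with squared distances. The function names and distance values are hypothetical.

```python
from statistics import NormalDist


def chi_square_quantile_wh(prob, df):
    """Wilson-Hilferty approximation to the chi-square quantile with df degrees of freedom."""
    z = NormalDist().inv_cdf(prob)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * a ** 0.5) ** 3


def flag_outliers(squared_mds, p, alpha=0.025):
    """Flag squared distances above the approximate (1 - alpha) chi-square cutoff."""
    cutoff = chi_square_quantile_wh(1.0 - alpha, p)
    return [d2 > cutoff for d2 in squared_mds]


# Hypothetical squared robust distances for a 3-dimensional data set.
print(round(chi_square_quantile_wh(0.975, 3), 3))
print(flag_outliers([1.2, 8.0, 15.3], p=3))
```

For p = 3 the approximation gives a cutoff close to the tabulated 0.975 chi-square value of about 9.35; an exact quantile function would be preferable in practice.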

1.5 QA/QC Module

This module provides univariate and multivariate classical as well as robust methods that
can be used in quality assurance and quality control (QA/QC) applications. All classical
and robust options and methods available in the univariate Interval module (under
Stats/GOF) and the Outliers/Estimates module are available in the QA/QC module. Specifically,
the QA/QC module has univariate control-chart-type interval graphs; multivariate
control-chart-type index plots; and prediction and tolerance ellipsoids. These graphs can be
generated using all observations in a data set or just using observations in a specified
training (e.g., background data, placebo) subset data set. These graphs can be used to
compare test (site, project, new drug) data with control limits (e.g., prediction, tolerance,
simultaneous limits) computed based upon some training (background, reference,
controlled) data set. Specifically, this module can be used to compare training
(background, reference, upgradient wells) and test (polluted site, groundwater monitoring
wells, dredged sediments) data sets. Enough observations from the training data set
should be made available to compute defensible control limits and ellipsoids.

The training and test data option is especially useful to determine if observations from
one test group (e.g., polluted site, test group, new treatment) can be considered as coming
from the training group (e.g., reference group, background, training group, placebo),
perhaps with known well-established acceptable behavior of the concentrations of the
contaminants of potential concern (COPCs). For such graphical displays, relevant
statistics and limits are computed using the training (controlled, background, reference,
placebo) data set, and all points in the training and test data sets are plotted on those
graphical displays. Test data points (site observations) lying outside the limits (e.g.,
tolerance and simultaneous limits) may represent out-of-control observations; that is, they
may represent observations not belonging to the controlled population represented by the
training data set.
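
The training/test comparison can be sketched for the univariate classical case. The limits below have the generic form mean ± multiplier × sd; the multiplier is a placeholder supplied by the caller, since the factor appropriate for a prediction, tolerance, or simultaneous limit is not derived here. The function names and data are hypothetical.

```python
import statistics


def control_limits(training, multiplier):
    """Classical limits, mean +/- multiplier * sd, computed from training data only."""
    center = statistics.mean(training)
    sd = statistics.stdev(training)
    return center - multiplier * sd, center + multiplier * sd


def out_of_control(test, limits):
    """Return the test observations lying outside the training-based limits."""
    lower, upper = limits
    return [x for x in test if x < lower or x > upper]


# Hypothetical background (training) and site (test) concentrations.
background = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 9.7]
site = [10.3, 11.6, 9.9, 8.7]
limits = control_limits(background, multiplier=3.0)
print(limits, out_of_control(site, limits))
```

Only the training observations enter the computation of the limits; the test observations are compared against them afterwards, which mirrors the description above.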

Classical methods included in the QA/QC module can handle univariate and multivariate
data sets with non-detect (ND) observations. For univariate data sets with NDs, the estimates of
all relevant statistics (mean, sd, standard error of the mean, upper and lower limits) are
computed using the Kaplan-Meier (KM) (1958) method. The individual ND data points
displayed on the interval graphs are shown (in red color) based upon the user selected
option (e.g., replaced by DL, DL/2, and ROS estimates). The KM method is also used to
compute relevant multivariate statistics (e.g., mean vector, covariance matrix, prediction
and tolerance ellipsoids) based upon the training data set. Those KM statistics are used to
generate univariate or multivariate control-chart-type graphs. All data (raw or
processed), including the imputed data (for NDs) from both training and test data sets, are
plotted on those control-chart-type graphs. Processed data may represent Mahalanobis
distances (used in control-chart-type index plots) or principal component scores (used in
prediction or tolerance ellipsoids). It should be noted that for uncensored data sets,
classical estimates of location and scale should be in agreement with the respective KM
estimates.



-------
1.6 Regression Module

Scout can perform classical and robust multiple linear regression using several methods
available in the literature. Specifically, Scout can perform least median of squares (LMS)
regression as well as least percentile of squares (LPS) regression, as described earlier in
this chapter. Scout can also perform robust regression based upon the M-estimation procedure
for the MVT method and the Huber, Biweight, and PROP influence functions. This module
generates several formalized graphical displays, including Q-Q plots and index plots of
residuals with appropriate limits drawn at the critical values of residual unsquared
Mahalanobis distances (univariate); scatter plots of residuals versus unsquared leverage
distances (Singh and Nocerino (1995)); residual versus residual (R-R) plots; Y versus Y-hat
plots; and Y versus standardized residuals plots. It should be pointed out that residuals are
not standardized when the scale estimate (standard deviation of residuals) is very small,
such as less than 1e-10. The graphical displays included in Scout are useful for identifying
regression outliers and for distinguishing between good (consistent) and bad (inconsistent)
leverage points. For most of the graphical displays
listed above, Scout 2008 collects and uses user-selected critical levels to compute
appropriate critical values of the statistics plotted (e.g., critical values of MDs, critical
value of Max MD) in the graphical displays. Scout also generates confidence and prediction
bands around fitted regression models, including classical linear, quadratic, and cubic
models and robust linear models. For the sake of completeness, in addition to robust
regression methods, Scout also performs regression diagnostics.
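
The rule about very small scale estimates can be illustrated with a short Python sketch. It
is an assumption-laden illustration, not Scout's code: the function name
standardize_residuals is hypothetical, and the sample standard deviation stands in for
whichever classical or robust scale estimate Scout uses.

```python
from statistics import stdev

SCALE_FLOOR = 1e-10  # threshold quoted in the text above


def standardize_residuals(residuals):
    """Divide residuals by their scale estimate unless the scale is near zero.

    The sample standard deviation is used here as the scale estimate; this is
    an assumption made for illustration.
    """
    scale = stdev(residuals)
    if scale < SCALE_FLOOR:
        # Dividing by a near-zero scale would only amplify rounding error,
        # so the residuals are returned unstandardized.
        return list(residuals)
    return [r / scale for r in residuals]
```

A perfect fit, where every residual is zero, is the typical case in which the guard applies.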

1.6.1 Robust Regression Based Upon M-Estimation and Generalized M-
Estimation

Scout can perform robust regression with or without the leverage option. If the leverage
option is not used, then the iterative M-estimation procedure is applied directly to the
residuals; when the leverage option is used, the generalized M-estimation method is used. In
the generalized M-estimation method, leverage points (outliers in the X-space of explanatory
variables) are identified first, and the weights thus obtained are used in the first iteration
to identify regression outliers (e.g., Singh and Nocerino, 1995). Typically, in practice, not
all leverage points are regression outliers. It is observed that the generalized M-estimation
regression method (e.g., PROP influence function) works quite effectively in identifying
regression outliers and in distinguishing between good and bad leverage points. The user
may want to use both options (leverage and no leverage), supplemented with graphical
displays, on a given data set and compare the relevant regression statistics (e.g.,
coefficients of determination, residual scale estimates, standard errors of the estimates of
regression coefficients) thus obtained to determine the best multiple linear model fit.
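
The iterative M-estimation idea (without the leverage option) can be sketched as iteratively
reweighted least squares for a single explanatory variable. This is a minimal sketch under
stated assumptions, not Scout's algorithm: the function names, the Huber tuning constant
c = 1.345, and the MAD-based scale estimate are choices made here for illustration.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def huber_line_fit(x, y, c=1.345, iterations=50):
    """Huber M-estimation by iteratively reweighted least squares."""
    w = [1.0] * len(x)  # the first pass is the classical (OLS) fit
    for _ in range(iterations):
        intercept, slope = weighted_line_fit(x, y, w)
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        scale = 1.4826 * median(abs(r) for r in resid)  # MAD-based scale
        if scale < 1e-10:  # essentially exact fit; stop reweighting
            break
        # Huber weights: 1 inside the cutoff, shrinking with |residual| outside.
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
    return intercept, slope, w


x = [float(i) for i in range(10)]
y = [2.0 + 3.0 * xi for xi in x]
y[4] += 30.0  # one regression outlier in the middle of the X range
intercept, slope, weights = huber_line_fit(x, y)
```

In this made-up example the outlying observation ends up with a weight near zero while the
remaining observations keep full weight, which is the behavior the weights displayed on
Scout's output sheets are meant to reveal. The generalized (leverage) version would start
from weights based on distances in the X-space rather than from equal weights.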



-------
1.7 Principal Component Analysis (PCA) and Discriminant
Analysis (DA)

Scout 2008 can perform classical as well as robust principal component and discriminant
analyses. The details of robust PCA and DA based upon the MVT method, the PROP
and the Huber (Huber, 1981, Gnanadesikan and Kettenring, 1981) influence functions are
given in Singh and Nocerino (1995). Additional details about robust PCA and robust
discriminant analyses can be found in Campbell (1972), Hubert and Driessen (2002),
Hubert and Engelen (2006), Hubert, Rousseeuw and Branden (2005), and Todorov and
Pires (2007).

For uncensored data sets (data sets without non-detect observations), Scout can perform
classical PCA and robust PCA based upon M-estimation methods (e.g., PROP, Huber, MVT) and
the MCD method. PCA can be performed using either the covariance matrix or the correlation
matrix. For high-dimensional data sets, PCA is often used as a dimension reduction technique,
where subsequent statistical analyses are performed on a number, k (<= p), of PCs that is much
smaller than the number, p, of original variables.

o It is noted that PCA performed using the covariance matrix is more informative,
especially when PCA is to be used as a dimension reduction technique.

o Q-Q plots and scatter plots of PC scores obtained using the covariance matrix
may be used to identify potential outliers. Significant jumps and turns in a Q-Q plot
of PCs suggest the presence of multiple populations in the data set.

o Q-Q plots and scatter plots of PC scores based upon the correlation matrix may be
used (cautiously) to assess approximate multinormality.

Based upon the PC statistics and scores thus obtained, this module generates Scree and
Horn plots of the eigenvalues, scatter plots of PC scores, and normal Q-Q plots of PC
scores. One can store PC scores in the same or a different worksheet for future analyses.
Typically, the first few PCs explain most
of the variation that might be present in a data set. The Q-Q plots and
scatter plots of the first few PCs can be used to identify variance-inflating outliers and/or
to identify the presence of mixture data sets. One can draw prediction and tolerance
ellipsoids on scatter plots of PC scores.
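
The statement that the first few PCs explain most of the variation can be made concrete
with a small Python sketch. It assumes the eigenvalues of the covariance (or correlation)
matrix are already available, for example from a stored output sheet; the function name
components_needed and the 90% target are hypothetical choices, not Scout settings.

```python
def components_needed(eigenvalues, target=0.90):
    """Smallest k such that the k largest eigenvalues explain >= target of the variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value  # cumulative variance explained by the first k PCs
        if running / total >= target:
            return k
    return len(ordered)


# Made-up eigenvalues for p = 5 variables: cumulative proportions are
# 0.5, 0.75, 0.875, 0.9375, 1.0, so four PCs reach the 90% target.
print(components_needed([4.0, 2.0, 1.0, 0.5, 0.5]))  # 4
```

A Scree plot shows the same information graphically: the eigenvalues are plotted in
decreasing order and one looks for the point where the curve levels off.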

For multivariate data sets with NDs, not much guidance is available in the statistical
literature on how to perform PCA. This topic is still under investigation. Scout 2008 can
be used to perform PCA based upon the Kaplan-Meier (1958) method (still being
investigated). Using the KM covariance (correlation) matrix, one can generate Scree and
Horn plots. For exploratory purposes, one can also impute PC scores based upon the KM
covariance matrix. However, in order to compute the load matrix and PC scores, one needs
to replace ND observations with some imputed values. Scout offers several choices for
computing such PC scores. These methods include substitution methods (0, DL/2, and
DL, uniform random generation of NDs) and regression on order statistics (ROS)
methods. It should be noted that for exploratory purposes, one may want to use the Data
module of Scout to impute non-detect observations before using the PCA module. This step
will yield a full data set without any ND observations (NDs replaced by
imputed/substituted values). One can then use any of the classical and robust PCA
methods available in Scout.
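
The simple substitution methods mentioned above (0, DL/2, and DL) can be sketched as
follows. The function name substitute_nondetects is hypothetical, and the sketch assumes
the two-column layout in which non-detects are stored as their detection limits with a 0/1
detection status; ROS and random-generation methods are not shown.

```python
def substitute_nondetects(values, detected, method="DL/2"):
    """Replace non-detects (status 0) by 0, DL/2, or DL; keep detects (status 1).

    values   -- detected values, with non-detects entered as their detection limits
    detected -- 1 for a detected value, 0 for a non-detect
    """
    factors = {"0": 0.0, "DL/2": 0.5, "DL": 1.0}
    factor = factors[method]  # raises KeyError for an unsupported method
    return [v if d == 1 else v * factor for v, d in zip(values, detected)]


arsenic = [4.5, 5.6, 4.3]
d_arsenic = [0, 1, 0]
print(substitute_nondetects(arsenic, d_arsenic))  # [2.25, 5.6, 2.15]
```

The result is a full data set with no ND observations, to which any classical or robust
method can then be applied for exploratory purposes.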

Scout 2008 can be used to perform classical and robust (based upon the MVT, PROP, and
Huber influence functions) Fisher linear discriminant analysis (FDA), as well as linear and
quadratic discriminant analyses. The classical and robust DA methods are supplemented
with graphical displays. The available graphical displays include Scree plots of
eigenvalues and scatter plots of discriminant scores (for Fisher Discriminant Analysis) and
of the original variables used to perform the discriminant analysis. On scatter plots of
discriminant scores, Scout can draw prediction and/or tolerance ellipsoids. As with all other
graphical displays with group assignment options, on scatter plots of discriminant scores, one
can interactively reclassify an observation from one group into another group using the
"Change Group" and "Save Changes" options. This option can be quite useful for properly
classifying borderline observations. It should be noted that based upon the discriminant
functions (classical or robust), Scout can be used to plot and classify observations with
unknown (or new) group memberships into one of the groups used in deriving those discriminant
functions.

Several cross validation (CV) methods for DA are also available in Scout 2008. The CV
methods in Scout 2008 include: leave-one-out (Lachenbruch and Mickey (1968)), split
samples (training and test sets), M-fold CV and bootstrap methods (e.g., Davison and
Hall (1992), Bradley and Efron (1997)). In order to use the CV methods properly, the
user should make sure that enough data are available in each of the various groups
included in the data set.
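
The way leave-one-out and M-fold cross validation partition a data set can be sketched in
Python. This is an illustration only: the function name m_fold_indices is hypothetical, and
the folds are formed by simple interleaving without the random shuffling that an actual CV
run would normally use.

```python
def m_fold_indices(n, m):
    """Yield (training_indices, test_indices) pairs for M-fold cross validation.

    Observation i is assigned to fold i % m. Leave-one-out CV is the special
    case m == n, where every fold holds a single observation.
    """
    folds = [list(range(i, n, m)) for i in range(m)]
    for held_out in folds:
        held = set(held_out)
        training = [i for i in range(n) if i not in held]
        yield training, held_out


for training, test in m_fold_indices(6, 3):
    print(training, test)
# [1, 2, 4, 5] [0, 3]
# [0, 2, 3, 5] [1, 4]
# [0, 1, 3, 4] [2, 5]
```

If a group contains only a few observations, some training sets may contain too few (or no)
members of that group, which is why each group needs enough data before CV is used.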

1.8 Output Generated by Scout 2008

All modules of Scout generate graphical output displays (*.gst files), Excel-type
spreadsheets (*.ost files), or both. The
"ost" output file generated by Scout can be saved as an Excel file, and a "gst" graphical
display can be copied into a Word or WordPerfect file. All of the relevant information,
statistics, and classical and robust estimates of parameters of interest are displayed on
those output sheets. Specifically, all classical estimates, initial robust estimates, final
robust estimates, and associated weights are displayed on the output sheet generated by Scout.
The user can also save intermediate results in a separate spreadsheet by choosing the
Intermediate Iterations option. In addition to graphs, most graphical displays also exhibit
relevant estimates, test statistics, and associated critical levels and p-values.



-------
1.9 Installing and Using Scout

1.9.1	Minimum Hardware Requirements

o Intel Pentium 1.0 GHz
o 285 MB (396 MB including Scout 2008 resources) of hard drive space
o 512 MB of memory (RAM)
o CD-ROM drive
o Windows 98 or newer. Scout was thoroughly tested on the NT-4, Windows 2000, and
Windows XP operating systems. Limited testing has been conducted on
Windows ME.

1.9.2	Software Requirements

Scout has been developed in the Microsoft .NET Framework using the C# programming
language. As such, to properly run Scout, the computer using the program must have the
.NET Framework pre-installed. The downloadable .NET files can be found at one of the
following two Web sites:

o http://msdn2.microsoft.com/en-us/netframework/default.aspx
Note: Download .NET version 1.1

o http://www.microsoft.com/downloads/details.aspx?FamilyId=262D25E3-
F589-4842-8157-034D1E7CF3A3&displaylang=en

The first Web site lists all of the downloadable .NET Framework files, while the second
Web site provides information about the specific file(s) needed to run Scout. Download
times are estimated at 57 minutes for a dial-up connection (56K), and 13 minutes on a
DSL/Cable connection (256K).

1.9.3	Installation Instructions

Scout 2008 v. 1.00.01 Installation Instructions from the CD

Open Windows Explorer and create a new directory called Scout 2008 v. 1.00.01.

Download (save) the Scout 2008 v. 1.00.01 files from the CD to the Scout 2008 v.
1.00.01 directory.

Using Windows Explorer, right click on the Scout 2008 v. 1.00.01 main directory and
make sure that the read-only attribute is off.

Using Windows Explorer, create a shortcut (optional) by right-clicking on the file,
Scout.exe (application), in the Scout directory; left click on "Send To" and left click on
"Desktop (create shortcut)" to create a shortcut icon on the desktop (optional: rename to
Scout 2008 v. 1.00.01).



-------
Using Windows Explorer, start Scout 2008 v. 1.00.01 by left double-clicking on the file,
Scout.exe (application), in the Scout directory, or by left double-clicking on the Scout
shortcut icon on the desktop, or by using the RUN command from the Start Menu to
locate and run Scout.exe.

Try to open an example file in the Scout sub-directory, Data. If the file does not open, be
sure that the read-only attribute is off (right-click on the Data sub-directory).

If the computer does not have .NET Framework 1.1 installed (either a pre-2002 Windows
operating system or a late version of Windows XP), then it will be necessary for the end
user to download it from Microsoft. A Google search for ".NET Framework 1.1" will
yield several download locations.

1.9.4 Getting Started

The functionality and the use of the methods and options available in Scout have been
illustrated using "Screen Shots" of output screens generated by Scout. Scout uses a pull-
down menu structure, similar to a typical Windows program.

The screen below appears when the program is executed.

[Screenshot: Scout 2008 main window with an empty worksheet. The menu bar lists File, Edit,
Configure, Data, Graphs, Stats/GOF, Outliers/Estimates, Regression, Multivariate EDA,
GeoStats, Programs, Window, and Help. The window also shows the Navigation Panel and the
Log Panel.]
-------
o The NAVIGATION PANEL displays the names of the data sets and all
generated outputs.

o At present, the Navigation Panel can hold at most 20 outputs. In order
to see more files (data files or generated output files), one can click on
the Window option.

o The LOG PANEL displays transactions in green, warnings in orange, and
errors in red. For example, when one attempts to run a procedure
meant for censored data sets on a full (uncensored) data set, Scout will print
a warning message in orange in this panel.

o If both panels are unnecessary, they can be turned off by clicking Configure > Panel
ON/OFF.

The use of this option gives extra space to see and print out the statistics of interest.
For example, one may want to turn off those panels when multiple variables (e.g.,
multiple Q-Q plots) are analyzed and GOF statistics and other statistics need to be
captured for all of the variables.



-------
Chapter 2

Working with Data, Graphical Output, and
Non-Graphical Output

2.1 Creating a New Spreadsheet (Data Set)

To create a new worksheet: click File > New

[Screenshot: File menu with the New, Open, Import, and Exit options.]

2.2 Open an Existing Spreadsheet (Data Set)

If your data sets are stored in the Scout data format (*.wst), Scout output format (*.ost),
Scout graphical format (*.gst), or an Excel spreadsheet (*.xls), then click File > Open.

o If your data sets are stored in the Microsoft Excel format (*.xls), the DOS-
Scout format (*.dat), or the ParallAX format (*.pax), then choose File > Import
> Excel or Old Scout or ParallAX.

[Screenshot: File > Import submenu with the options Excel (.xls) Data, Old Scout (.dat) Data,
Comma Delimited (.csv) Data, Blank or Tab Delimited (.txt) Data, and ParallAX (.pax) Data.]

o Make sure that the file that you are trying to import is not currently open.
Otherwise, the following warning message will appear in the Log Panel:

"[Information] Unable to open C:\***.xls." Check the validity of this file.
Note: *.csv files and *.txt files will be available in later versions of Scout.



-------
2.3 Input File Format

o The program can read Excel files (*.xls files), data files (*.dat files for DOS
versions of the GeoEas and Scout software packages), ParallAX files (*.pax files),
comma delimited data files (*.csv files), and tab or space delimited files (*.txt
files).

o The user can perform typical Cut, Paste, and Copy operations, as in Microsoft
Excel.

o The first row in all input data files should consist of alphanumeric (strings of
numbers and characters) variable names representing the header row. Those
header names may represent meaningful variable names such as Arsenic,
Chromium, Lead, Temperature, Weight, Group-ID, and so on.

o The Group-ID column has the labels for the groups (e.g., Background,
AOC1, AOC2, 1, 2, 3, a, b, c, Site1, Site2, and so on) that might be
present in the data set. Alphanumeric strings (e.g., Surface, Sub-
surface) can be used to label the various groups.

o The data file can have multiple variables (columns) with unequal numbers
of observations. NOTE: Some of the robust methods require all of the
variables to have an equal number of observations.

o Except for the header row and columns representing the group labels, only
numerical values should appear in all of the other columns.

o All of the alphanumeric strings and characters (e.g., blank, other
characters, and strings), and all of the other values (that do not meet the
requirements above) in the data file are treated as missing values.

o Also, a large value denoted by 1E31 (= 1 x 10^31) can be used to represent
missing data values. All of the entries with this value are excluded from the
computations. Those values are counted when missing data values are
tracked.
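
The missing-value rules listed above can be sketched as a small Python function for a cell
in a numeric (non-group) column. The function name parse_cell is hypothetical and the
sketch only mirrors the rules as stated here; it is not Scout's file reader. Group label
columns are exempt from these rules because they may legitimately hold text.

```python
import math

MISSING_CODE = 1e31  # the large value used to represent missing data


def parse_cell(text):
    """Return the numeric value of a cell, or None when it counts as missing."""
    try:
        value = float(text.strip())
    except ValueError:
        return None  # blanks, strings, and other non-numeric entries
    if not math.isfinite(value) or value == MISSING_CODE:
        return None  # the 1E31 code (and non-finite values) are treated as missing
    return value


column = ["4.5", "", "n/a", "1E31", " 9.2 "]
print([parse_cell(cell) for cell in column])  # [4.5, None, None, None, 9.2]
```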

2.4 Number Precision

° You may turn Full Precision on or off by choosing: Configure > Full Precision
On/OFF.



-------
[Screenshot: Configure menu showing the Panel On/Off option.]

o When Full Precision is left turned on, Scout displays numerical values using
an appropriate (the default) decimal digit option. When Full
Precision is turned off, all of the decimal values are rounded to the nearest thousandths
place.

o The Full Precision On option is especially useful when one is dealing with data sets
consisting of small numerical values (e.g., <1) resulting in small values of the
various estimates and test statistics. Those values may become very small with
several leading zeros (e.g., 0.00007332) after the decimal. In such situations, one
may want to use the Full Precision option to see nonzero values after the decimal.
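
The effect of rounding to the thousandths place on a very small value can be seen in a short
Python sketch. The function name display_value is hypothetical, and Python's default number
formatting merely stands in for Scout's full-precision display, whose exact format is not
described here.

```python
def display_value(value, full_precision):
    """Format a number with all significant digits or rounded to three decimals."""
    if full_precision:
        return repr(value)   # shortest text that reproduces the stored value
    return f"{value:.3f}"    # rounded to the nearest thousandths place


print(display_value(0.00007332, full_precision=False))  # 0.000
```

With Full Precision off, the example value from the text is shown as 0.000, so the nonzero
digits are no longer visible.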

2.5 Entering and Changing a Header Name

1. Highlight the column whose header name (variable name) you want to change by
clicking either the column number or the header as shown below.



[Worksheet before the change]

       0          1     2
       Arsenic
  1    4.5
  2    5.6
  3    4.3
  4    5.4
  5    9.2


2. Right-Click and then click "Header Name"



[Screenshot: right-click menu on the highlighted column, with the "Header Name" option.]





3. Change the Header Name.



-------
[Screenshot: Header Name dialog box with "Arsenic Site 1" entered in the Header Name field,
and the OK and Cancel buttons.]


4. Click the "OK" button to get the following output with the changed variable
name.



[Worksheet after the change]

       0                 1     2
       Arsenic Site 1
  1    4.5
  2    5.6
  3    4.3
  4    5.4
  5    9.2


2.6 Editing

Click on the Edit menu item to reveal the following drop-down options.

[Screenshot: Edit drop-down menu with the Cut (Ctrl-X), Copy (Ctrl-C), and Paste (Ctrl-V)
options.]


The following Edit drop-down menu options are available:

o Cut option: similar to a standard Windows Edit option, such as in Excel. It
performs standard edit functions on selected highlighted data (similar to a buffer).

o Copy option: similar to a standard Windows Edit option, such as in Excel. It
performs typical edit functions on selected highlighted data (similar to a buffer).

o Paste option: similar to a standard Windows Edit option, such as in Excel. It
performs typical edit functions of pasting the selected (highlighted) data to the
designated spreadsheet cells or area.

o Note that the Edit option can also be used to copy graphs.



-------
2.7 Handling Non-detect Observations

Scout can handle data sets with single and multiple detection limits.

For a variable with non-detect observations (e.g., arsenic), the detected values and the
numerical values of the associated detection limits (for less than values) are entered in the
appropriate column associated with that variable.

Specifically, the data for variables with non-detect values are provided in two columns.
One column consists of the detected numerical values, with less than (< DL) values
entered as the corresponding detection limits (or reporting limits), and the second column
represents their detection status, consisting of only 0 (for less than values) and 1 (for
detected values). The name of the variable representing the
detection status should consist of d_ or D_ (not case sensitive) followed by the variable
name. The detection status column, with a variable name starting with D_ (or d_), should have
only two values: 0 for non-detect values and 1 for detected observations.

For example, the header name, D_Arsenic, is used for the variable, Arsenic, having
non-detect observations. The variable D_Arsenic contains a 1 if the corresponding
Arsenic value represents a detected entry, and contains a 0 if the corresponding entry for
the variable, Arsenic, represents a non-detect.

There should not be any missing value in the non-detects column. If there exists an
observation with no indication of "0" or "1" in the non-detects column, then that
observation should be deleted if the various methods for non-detects are to be used.
Otherwise the methods for detected data (i.e., methods which do not require a non-detects
column) can be used.
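
The layout rules for detection status columns can be checked before analysis with a sketch
like the following. The function name check_detection_columns and the dictionary
representation of a worksheet are hypothetical; Scout performs its own checks, which are not
documented here.

```python
def check_detection_columns(table):
    """Return a list of problems found in the D_ (detection status) columns.

    table -- dict mapping each header name to the list of cell values in that column
    """
    problems = []
    known = {name.lower() for name in table}
    for name, cells in table.items():
        if not name.lower().startswith("d_"):
            continue  # not a detection status column
        if name[2:].lower() not in known:
            problems.append(f"{name}: no matching variable column")
        # Every entry must be 0 (non-detect) or 1 (detect); nothing may be missing.
        bad_rows = [i for i, c in enumerate(cells, start=1) if c not in (0, 1)]
        if bad_rows:
            problems.append(f"{name}: rows {bad_rows} are not 0 or 1")
    return problems


sheet = {"Arsenic": [4.5, 5.6, 4.3], "D_Arsenic": [0, 1, None]}
print(check_detection_columns(sheet))  # ['D_Arsenic: rows [3] are not 0 or 1']
```

Rows reported by such a check would have to be deleted, or completed, before the methods for
data sets with non-detects are used.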



-------
















[Screenshot: example worksheet (D:\example.wst) with the columns Arsenic, D_Arsenic, Mercury,
D_Mercury, Vanadium, Zinc, and Group. The D_Arsenic and D_Mercury columns contain only 0 and
1 values, and the Group column contains the labels Surface and Subsurface.]


2.8 Handling Missing Values

[Screenshot: Data drop-down menu, with a submenu listing the options Replace Missing with
Mean, Replace Missing with Median, and Remove Rows with Missing Data.]


Section 4.4 details how missing values are treated in Scout.



-------
2.9 Saving Files



[Screenshot: File menu with options including New, Open, Import, Close, Print, Print Preview,
and Exit.]


o The Save option allows the user to save the active window.

o The Save As option allows the user to save the active window. This option
follows typical Windows standards, and saves the active window to a file in Excel
(*.xls) format or an output sheet (*.ost) format.

2.10 Printing Non-Graphical Outputs

1. Click the output you want to copy or print in the Navigation Panel.



-------
[Screenshot: Scout window with the Huber output sheet selected in the Navigation Panel, which
also lists OLSOut.ost and the generated graph files (e.g., OLSresQQ.gst, HuberDD.gst,
HuberQQ.gst). The output sheet shows the user selected options, and the Log Panel reports the
plots that were generated.]

2. Click File > Print.

[Screenshot: File menu opened for the output sheet HuberOut.ost, with options including New,
Open, Import, Close, Save, Save As, Print Preview, and Exit.]

2.11 Working with Graphs

Advanced users are provided with two sets of tools to modify graphics displays. A
graphics tool bar is available above the graphics display, and as the user right clicks on
the desired object within the graphics display, a drop-down menu will appear. The user
can select an item from the drop-down menu list by clicking on that item. This will allow
the user to make desired modifications as available for the selected menu item. An
illustration is given below.

2.11.1 Graphics Toolbar

[Figure: graphics toolbar displayed above a Scatter Plot of Discriminant Scores.]

The user can change fonts, font sizes, vertical and horizontal axes, and select new colors
for the various features and text. All of those actions are generally used to modify the
appearance of the graphic display. The user is cautioned that those tools can be
unforgiving and may put the user in a situation where the user cannot go back to the
original display. Users may want to explore the robustness of those tools and become
more experienced in their use before actually trying to use those graphic tools on real
data sets.

Another feature in this graphics tool bar is the presence of one, two, or three drop-down
variable selection boxes, depending upon the type of graph.

o The XY Plot in Regression has only one drop-down variable selection box
for different X variables.

o The Scatter Plots in 2D Graphs, Principal Component Analysis, and
Discriminant Analysis have two drop-down variable selection boxes for
selecting different X and Y variables. The first box is for the X variable
and the second box is for the Y variable.

o Scatter Plots in 3D Graphs have three drop-down variable selection boxes
for selecting different X, Y, and Z variables.

o The user can select the required variables, and the new graph is obtained
by clicking the "Redraw" button. An example is given below.

Note: One can select variables from the graph itself, as shown in the following figure.

Graph: PROP principal components scatter plot.

Data Set used: Well-known Wood data set. All five of the X-variables were selected to derive the PCs.

Default Graph Obtained: PC1 is drawn along the X-axis and PC2 is drawn along the Y-axis.

Changing X-axis variable to PC4 and Y-axis variable to variable X2.

[Figure: Scatter Plot of PROP PCs with PC1 on the X-axis and PC2 on the Y-axis. The drop-down
variable selection box next to the ReDraw button lists PC1 through PC5 and the original
variables.]



-------
The X-axis variable is PC4 and the Y-axis variable is variable X2.

[Figure: Scatter Plot of PROP PCs with PC4 on the X-axis and x2 on the Y-axis, obtained after
selecting PC4 and x2 in the drop-down boxes and clicking ReDraw.]

2.11.2 Drop-Down Menu Graphics Tools

Those tools allow the user to move the mouse icon to a specific graphic item such as an
axis label or a display feature. The user then right clicks the mouse button and a drop-
down menu appears. This menu presents the user with available options for that
particular control or graphic object. If one is not careful and experienced, then there is a
small risk of making an unrecoverable error when using those drop-down menu graphics
tools. As a cautionary note, the user can always delete the graphics window and redraw
the graphical displays by repeating their operations from the datasheet and menu options
available in Scout. An example of a drop-down menu obtained by right clicking the
mouse button on the background area of the graphics display is given as follows. Some
of the options are: changing the color of the observations, changing the type of graph,
viewing the observation numbers (Point Labels), and editing the title of the graph.



-------
[Figure: Scatter Plot of PROP PCs with the right-click drop-down menu open. The menu options
include Toolbar, Data Editor, Legend Box, Gallery, Color, Edit title, Point Labels, Font,
Properties, and Statistical Studies.]

Scout provides a different Drop-Down Menu Graphic Tool when the observations belong to
various groups. This tool can be used to change the grouping of the
observations on the graph. To do so, move the mouse icon to the particular
observation and click the right mouse button. A menu appears. Click the
"Change Group" option. A window appears with the "Change Group Drop-Down
Box." Select the new group of the observation and click "OK" to continue or "Cancel"
to cancel the option. Once a selection has been made, move the mouse icon to that
particular observation and click the left mouse button. This will change the
observation's group assignment, and the observation will belong to the new group shown on
the graph.



-------
Graph of 2D scatter plot with groups from graphs.

Data Set used: Beetles.

Changing the left-most observation from Group 3 (red triangle) to Group 2 (green circle).



[Figure: Scatterplot (horizontal axis x2) with the right-click menu open on the left-most
observation; the menu includes the Change Group, Color, and Point Labels options.]

Change group option brings up a Change Group window, as shown below.

[Figure: Scatterplot (horizontal axis x2) with the Change Group window open.]



-------
The left-most observation from Group 3 (red triangle) now belongs to Group 2 (green circle) on the
graph.

[Figure: Scatterplot after the group change.]

To incorporate the changes in the graph to the worksheet, click the "Save Changes"
option after using the right-click button on the mouse. This saves the new grouping to
the first available column on the worksheet as "newGrp."

Observation 53 changed from Group 3 to Group 2.



        0        1      2      3
        Group    x1     x2     newGrp
  43    2        124    15     2
  44    2        120    13     2
  45    2        119    16     2
  46    2        119    14     2
  47    2        133    13     2
  48    2        121    15     2
  49    2        128    14     2
  50    2        129    14     2
  51    2        124    13     2
  52    2        129    14     2
  53    3        145     8     2
  54    3        140    11     3
  55    3        140    11     3
  56    3        131    10     3
  57    3        139    11     3
  58    3        139    10     3
  59    3        136    12     3
  60    3        129    11     3
  61    3        140    10     3
  62    3        137     9     3




-------
2.11.3 3D Graphics Chart Rotation Control Button

The axes in a 3D scatter plot can be rotated using the Chart Rotation Control button
present on the top-left corner of the 3D scatter plot.

[Figure: Chart Rotation Control button shown next to the Scatterplot title.]

When this Chart Rotation Control button is clicked, the Chart Rotation Control tool box
appears. This tool box has three scroll bars for the three axes and a fourth scroll bar for
adjusting the brightness of the graph. The scroll bars can be used to rotate any or all of
the three axes. When the "Reset" button is clicked, the graph is reset to the standard
front view. The "Cancel" button brings the graph to its default view.

[Screenshot: Chart Rotation Control tool box with scroll bars for the X-Axis, Y-Axis, Z-Axis,
and Light Level, and the Reset, OK, and Cancel buttons.]


35


-------
The angle of rotation for the three axes ranges from -120 to +1 I I degrees. The positive
sign is for rotation in the clockwise direction and the negative sign for rotation in the
counter-clockwise direction. The Light Level scroll bar ranges from 0 (black) to 391
(white, the brightest level).


-------
References

ProUCL 4.00.04. (2009). "ProUCL Version 4.00.04 User Guide." The software
ProUCL 4.00.04 can be downloaded from the web site at:

http://www.epa.gov/esd/tsc/software.htm.


-------
Chapter 3
Select Variables Screens

Scout provides a number of variable selection screens for different types of statistical
analysis. Most of them are illustrated here.

3.1 Data Drop-Down Menu

3.1.1 Transform (No NDs)

o When the user clicks Data > Transform (No NDs), the following window will
appear:

[Screenshot: Select Transform Variable window. The "Select Transform" box lists Z-Transform, Linear (ax + b), Natural Log, Log Base 10, Exp(x), Pow(x, a), Box-Cox, Ranked, Ordered, Rankit, ArcSine, and Group Items. The window also contains the "Select a Variable to Transform," "Variable to Transform," "Select Worksheet," "New Column Name," and "Select Column" boxes.]

o This screen allows the user to transform a single variable. The available
transformations are listed in the "Select Transform" box.

o A single variable is selected, and that variable appears in the "Variable to
Transform" box.

o The user selects the worksheet in which to store the transformed data using the
"New Worksheet" or the "Other Worksheets" option; a set of available columns then
appears in the "Select Column" box. The user has to specify a name for the new column.


-------
o An example of the selections made is shown below.

[Screenshot: Select Transform Variable window with example selections]



3.1.2 Impute: Transform Two Columns to a Column (NDs)

o When the user clicks Data > Impute (NDs), the window given below will appear.

o This selection screen comes up only for data sets having non-detects. If the file
does not have columns for indicating non-detects, then an error message is
displayed in the Log Panel.

o This screen allows the user to impute the non-detects of a single variable. The
available replacement methods are listed in the "Select NDs Replacement" box.

o A single variable is selected, and that variable appears in the "Variable to
Transform" box.

[Screenshot: Select Variable to Impute window. The "Select NDs Replacement" box lists Detection Limit, 1/2 Detection Limit, Zero, Normal ROS Estimates, Gamma ROS Estimates, Lognormal ROS Estimates, and Uniform. The window also contains the "New Column Name," "Select a Variable to Transform," "Select Column," "Variable to Transform," and "Select Worksheet" boxes.]

o The user selects the worksheet in which to store the imputed data using the "New
Worksheet" or the "Other Worksheets" option; a set of available columns then appears in
the "Select Column" box. The user has to specify a name for the new column.

o An example of the selections made is shown below:

[Screenshot: Select Variable to Impute window with example selections (Normal ROS Estimates and "Other Worksheets" selected)]


-------
3.1.3 Copy

o When the user clicks Data > Copy, the following window will appear:

[Screenshot: Select Variable to Copy window with the "Select a Column to Copy," "Variable to Copy," "Select Worksheet," "New Column Name," and "Select Column" boxes]



o This screen allows the user to copy a single variable to a new column.

3.2 Graphing and Statistical Analysis of Univariate Data

o Variables need to be selected to perform statistical analyses.

o When the user clicks on any drop-down menu (except the Background vs. Site
Comparison option), the following window will appear.

[Screenshot: Select Variables window listing Arsenic (Count = 20), Mercury (Count = 30), Vanadium (Count = 20), Zinc (Count = 20), and Group (Count = 20) in the "Variables" box, with the "Selected" box, the "Group by variable" drop-down list, and the OK and Cancel buttons]

o The Options button is available in certain menus. The use of this option leads
to a different pop-up window.

o Multiple variables can be processed simultaneously in Scout.

o Moreover, if the user wants to perform a statistical analysis on a variable (e.g.,
a contaminant) by a group variable, click on the arrow below "Group by
Variable" to get a drop-down list of the available variables, and select an
appropriate group variable. For example, a group variable (e.g., Site Area)
can have alphanumeric values, such as AOC1, AOC2, AOC3, and
Background. Thus, in this example, the group variable, Site Area, takes
four values: AOC1, AOC2, AOC3, and Background.

o The group variable is particularly useful when data from two or more samples
need to be compared.

o Any variable can be a group variable. However, for meaningful results, only a
variable that really represents groups (categories) should be selected
as a group variable.


-------
o The number of observations in the group variable and the number of
observations in the selected variables (to be used in a statistical procedure)
should be the same. In the example below, the variable, "Mercury," is not
selected because the number of observations for Mercury is 30; in other
words, Mercury values have not been grouped. The group variable, and each
of the selected variables, has 20 data values.

[Screenshot: Select Variables window with Arsenic, Vanadium, and Zinc (Count = 20 each) in the "Selected" box, Mercury (Count = 30) and Group (Count = 20) remaining in the "Variables" box, and the "Group by variable" drop-down list open]

Caution: Care should be taken to avoid misrepresentation and improper use of group
variables. It is recommended not to assign any missing values for the group variable.

More on Group Option

o The group option provides a powerful tool to perform various statistical tests
and methods (including graphical displays) separately for each of the groups
(samples from different populations) that may be present in a data set. For
example, the same data set may consist of samples from various groups
(populations). The graphical displays (e.g., box plots, Q-Q plots) and statistics
of interest can be computed separately for each group by using this option.

o In order to use this option, at least one variable representing the group ID
(alphanumeric characters) should be included in the data set. The various
values of that group variable represent different group categories.


-------
o Note that the number of values (representing group membership) in a group
variable should equal the number of values in the variable (e.g., Arsenic) of
interest that needs to be partitioned into various groups (e.g., monitoring
wells).

o The group column can be any qualitative group ID representing different
species, laboratories, shifts, regions, and so on. For example, in
environmental applications, data for the various groups represent data from
the various site areas (e.g., background, AOC1, AOC2, ...), or from
monitoring wells (e.g., MW1, MW2, ...).
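The grouping idea described above can be sketched in a few lines of Python. This is only an illustration, not Scout's implementation: the function name `split_by_group`, the arsenic values, and the site-area labels are all hypothetical. It partitions one variable by the labels of a group variable and enforces the rule that both must have the same number of observations.

```python
from collections import defaultdict
from statistics import mean


def split_by_group(values, groups):
    """Partition `values` by the matching labels in `groups`."""
    if len(values) != len(groups):
        raise ValueError(
            "the variable and the group variable must have the same "
            "number of observations"
        )
    partitions = defaultdict(list)
    for value, label in zip(values, groups):
        partitions[label].append(value)
    return dict(partitions)


# Hypothetical arsenic concentrations and site-area labels (illustration only).
arsenic = [1.2, 0.8, 3.4, 2.9, 5.1, 4.7]
site_area = ["Background", "Background", "AOC1", "AOC1", "AOC2", "AOC2"]

by_area = split_by_group(arsenic, site_area)
for label, data in by_area.items():
    # One summary line per group: label, number of observations, mean.
    print(label, len(data), round(mean(data), 3))
```

Any per-group statistic or graph can then be computed from the partitioned lists in place of the mean used here.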

3.2.1 Graphs by Groups

o Individual or multiple graphs (Q-Q plots, box plots, and histograms) can be
displayed on a graph by selecting the "Graphs by Groups" option.

° Individual graphs for each group (specified by the selected group variable) are
produced by selecting the "Individual Graph" option.

o Multiple graphs (e.g., side-by-side box plots, multiple Q-Q plots on the same
graph) are produced by selecting the "Group Graph" option for a variable
categorized by a group variable. Using this "Group Graph" option, multiple
graphs can be displayed for all of the sub-groups included in the Group
variable. This option is useful when data to be compared are given in the
same column and are classified by the group variable.

o Multiple graphs (e.g., side-by-side box plots, multiple Q-Q plots) for selected
variables are produced by selecting the "Group Graph" option. Using the
"Group Graph" option, multiple graphs can be displayed for all selected
variables. This option is useful when data (e.g., lead) to be compared are
given in different columns, perhaps representing different populations.

Note: It is the user's responsibility to provide an adequate amount of detected data to perform the group
operations. For example, if the user desires to produce a graphical Q-Q plot (using only detected data)
with regression lines displayed, then there should be at least two detected points (to compute slope,
intercept, sd) in the data set. Similarly, if the graphs are desired for each of the groups specified by the
group ID variable, there should be at least two detected observations in each group specified by the group
variable. Otherwise, Scout generates a warning message (in orange color) in the lower panel of the Scout screen.
Specifically, the user should make sure that a variable with non-detects and categorized by a group
variable has enough detected data in each group to perform the various methods (e.g., GOF tests,
Q-Q plots with regression lines) as incorporated in Scout.

The analyses of data categorized by a group ID variable, such as:

1)	Surface vs. Subsurface,

2)	AOC 1 vs. AOC 2,

3)	Site vs. Background, and

4)	Upgradient vs. Downgradient monitoring wells,

are quite common in many environmental applications.

3.2.2 Select Variables Screen for Two-Sample Hypothesis Testing

The variables selection screen is different for two-sample hypothesis testing when
compared to single sample hypothesis testing. The "Select Variables" screen is as
shown.

[Screenshot: Select Variables window for two-sample hypothesis testing, listing three variables (Count = 25 each), with a "Without Group Variable" panel (First Sample Set, Second Sample Set), a "With Group Variable" panel (Variable, Group Var, First Sample Set, Second Sample Set), and the Options, OK, and Cancel buttons]

3.2.2.1 Without Group Variable

o The first sample set (e.g., background concentration) and the second sample set
(e.g., site concentration) of variables (e.g., COPC) are selected.

o The "Options" button provides the various options available with the selected
test.

3.2.2.2 With Group Variable

o This option is used when data values of the variable (e.g., COPC) for the first
sample set (e.g., site) and the second sample set (e.g., background) are given in
the same column. The values are separated into different populations (groups) by
the values of an associated group variable. The group variable may represent
several populations (e.g., several AOCs, MWs). The user can compare two groups
at a time by using this option.

o When using this option, the user should select a group variable by clicking the
arrow next to the Group Var option for a drop-down list of available variables.
The user selects an appropriate (meaningful) variable representing groups, such as
Background and AOC. The user is allowed to use letters, numbers, or
alphanumeric labels for the group names. A sample variables selection screen is
shown below.



[Screenshot: Select Variables window with the "With Group Variable" option selected and Group (Count = 53) chosen as the group variable; the two sample sets are labeled Background / Ambient and Area of Concern / Site]


-------
3.3 Regression Menu

When the Regression Menu is clicked on, the following window pops up.

[Screenshot: Select Regression Variables window, listing the variables Count, y, x1, x2, and x3 (Count = 75 each), with boxes for the selected dependent variable and the selected independent variables, a Group drop-down list, and the Graphics, Options, OK, and Cancel buttons]



Both dependent and independent variables need to be selected.

The use of the "Options" button leads to a new options window. The methods on the
Regression drop-down menu have different "Options" and "Graphics" screens.
They are discussed in Chapter 8.

Grouping works in the same way as for univariate data.


-------
An example of the selected screen is shown below.

[Screenshot: Select Regression Variables window with y (ID 1, Count = 75) selected as the dependent variable and the variables with IDs 2, 3, and 4 (Count = 75 each) selected as the independent variables]

3.4 Multivariate Outliers and PCA Menu

For multivariate outliers or multivariate PCA, the following "Select Variables"
screen appears:

[Screenshot: Select Variables window listing y, x1, x2, and x3 (Count = 75 each), with the "Selected" box, the "Group by Variable" drop-down list, and the Options, Graphics, OK, and Cancel buttons]


-------
The variables that are to be considered for the analyses are selected, and the
"Options" button may be clicked to select from the various options available.
Those options are discussed in Chapters 7 and 9.

A "Graphics" button is provided for Robust/Iterative methods and Principal
Component Analysis methods as shown below. Those options are discussed in
Chapters 7 and 9.

[Screenshot: Select Variables window with the Options and Graphics buttons]


-------
3.5 Multivariate Discriminant Analysis Menu

o When the Multivariate EDA > Discriminant Analysis is clicked on, the following
window appears.

[Screenshot: Linear Discriminant Analysis window with the "Group by Variable" drop-down list, the "Variables" and "Selected Matrix Columns" boxes, the "Prior Probability" options (Equal, Estimated, User Supplied), the "Scores Storage" options (No Storage, Same Worksheet, New Worksheet), and the Options, Graphics Options, and Cancel buttons]



° There should be a group column specifying the various groups present.

° The group variable is selected from the "Group by Variable" drop-down bar.

o The various variables required for the analysis are then selected.

o If the prior probabilities are supplied by the user, then a column should exist in
the worksheet for the prior probabilities, and that column can be selected
from the "Select Group Priors Column" drop-down bar.
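The three prior-probability choices can be illustrated with a short Python sketch. The guide does not define how the "Estimated" priors are computed; the sketch assumes they are the observed group proportions, which is a common convention but an assumption here. All function names and the example labels are hypothetical.

```python
from collections import Counter


def equal_priors(groups):
    """Same prior for every distinct group label."""
    labels = sorted(set(groups))
    return {label: 1.0 / len(labels) for label in labels}


def estimated_priors(groups):
    """Assumed convention: prior = share of observations in each group."""
    counts = Counter(groups)
    total = len(groups)
    return {label: counts[label] / total for label in sorted(counts)}


def check_user_priors(priors, groups, tol=1e-9):
    """User-supplied priors need one value per group and must sum to one."""
    labels = sorted(set(groups))
    if len(priors) != len(labels):
        raise ValueError("one prior probability is needed per group")
    if abs(sum(priors) - 1.0) > tol:
        raise ValueError("prior probabilities must sum to 1")
    # Priors are matched to the group labels in sorted order.
    return dict(zip(labels, priors))


# Hypothetical group column with three groups of unequal size.
group_column = ["a"] * 2 + ["b"] * 6 + ["c"] * 2
print(equal_priors(group_column))
print(estimated_priors(group_column))
print(check_user_priors([0.2, 0.5, 0.3], group_column))
```

Matching user-supplied priors to the group labels in sorted order mirrors the Group Items transform, which lists the group names in sorted order; whether Scout pairs them in exactly this way is not stated in the guide.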



-------
° An example is illustrated below.

[Screenshot: Linear Discriminant Analysis (Classical Method) window with count (Count = 150) as the group variable; sp-length, sp-width, pt-length, and pt-width (Count = 150 each) as the selected matrix columns; and the "User Supplied" prior probability option with Priors (Count = 3) chosen in "Select Group Priors Column"]

Note: The Prior Probability box is not available for the Fisher Discriminant Analysis since equal priors
are assumed.



-------
Chapter 4
Data

Scout provides the user with an array of options to modify the given data, both without
non-detects and with non-detects. The various options include:

°	Copy: copies data from one column to another.

o	Generate: generates univariate and multivariate data.

°	Impute: generates estimated data for non-detect observations.

o	Missing: handles missing observations.

o	Transform: transforms data without non-detects using mathematical functions.

4.1 Copy

1. Click Data > Copy.

[Screenshot: Data drop-down menu with the Copy option selected]

2. The "Select Variable to Copy" screen (Section 3.1.3) will appear. Also, see the
example screens shown below.

o A single variable is selected, and that variable appears in the "Variable to
Copy" box.

o The user selects the worksheet in which to store the copied data
using the "New Worksheet" or the "Other Worksheets" option, and a set of
available columns appears in the "Select Column" box. If the "New
Worksheet" option is selected, then the data are copied onto the new
worksheet. If the "Other Worksheets" option is selected, a set of
available worksheets is displayed, and the columns available for the
selected worksheet are also displayed. The user has to specify a
name for the new column.

o Examples of the selections using "New Worksheet" and "Other
Worksheets" are shown below.

[Screenshots: Select Variable to Copy window with example selections, first using "New Worksheet" (filename NewFileName, new column CopiedColumn) and then using "Other Worksheets" (new column CopiedColumn)]


-------
4.2 Generate

The Generate option generates univariate uniform, normal, gamma and lognormal
distributed random numbers, and also multivariate normal data.

4.2.1 Univariate

1. Click Data > Generate > Univariate.

[Screenshot: Data > Generate Data > Univariate menu showing the Uniform, Normal, Gamma, and Lognormal options]



2. Random numbers from the four different distributions are generated:

° Uniform distribution: input parameters are "a" (lower limit) and "b"
(upper limit).

o Normal distribution: input parameters are "Mu" (mean) and "Sigma"
(standard deviation) of raw data.

° Gamma distribution: input parameters are "Alpha" (scale parameter) and
"Beta" (shape parameter).

o Lognormal distribution: input parameters are "Mu" (mean) and "Sigma"
(standard deviation) of the data in log-transformed space (logged data).
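For readers who want to reproduce this kind of data outside Scout, the following sketch draws from the same four distributions using Python's standard `random` module. It is not Scout's generator, and the function name `generate_univariate` is hypothetical. One caution: Python's `gammavariate(alpha, beta)` treats its first argument as the shape and its second as the scale, which may not match the labeling of "Alpha" and "Beta" in the Scout dialog.

```python
import random


def generate_univariate(distribution, n, p1, p2, seed=None):
    """Draw n random numbers; p1 and p2 are the two input parameters."""
    rng = random.Random(seed)
    samplers = {
        # a = lower limit, b = upper limit
        "uniform": lambda: rng.uniform(p1, p2),
        # mu = mean, sigma = standard deviation of the raw data
        "normal": lambda: rng.gauss(p1, p2),
        # Python's convention: first argument shape, second argument scale
        "gamma": lambda: rng.gammavariate(p1, p2),
        # mu and sigma of the logged data
        "lognormal": lambda: rng.lognormvariate(p1, p2),
    }
    if distribution not in samplers:
        raise ValueError("unknown distribution: " + distribution)
    draw = samplers[distribution]
    return [draw() for _ in range(n)]


# Twenty standard normal values, matching the dialog defaults (20, 0, 1).
normal_data = generate_univariate("normal", 20, 0.0, 1.0, seed=1)
uniform_data = generate_univariate("uniform", 20, 2.0, 5.0, seed=1)
print(len(normal_data), min(uniform_data), max(uniform_data))
```

Fixing the seed makes a run repeatable, which is convenient when the generated column is used to test other procedures.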



-------
An example for the normal distribution is illustrated.

o Click Data > Generate > Univariate > Normal.



[Screenshot: dialog for generating normal data, with the "Number of Observations" (default 20), "Mu (Mean)," "Sigma (Stdv)," and "Name of New Column" fields, the "Select Worksheet" options (New Worksheet / Other Worksheets), and the OK and Cancel buttons]

o Specify the number of observations required. The default is "20."

o Specify "Mu" (mean) and "Sigma" (standard deviation). The
defaults are "0" and "1," respectively.

o Specify the name of the new column.

o Select the worksheet into which the new data is to be generated.


-------
Click "OK" to continue or "Cancel" to cancel the Generate option.





[Screenshot: the same dialog with example entries; the new column is named RandomNumbers and the new worksheet is named NormalData]


-------
Output Screen for Univariate Normal Data.

[Screenshot: NormalData worksheet with the generated values in the RandomNumbers column]





The new worksheet has been named "NormalData," as seen in the Navigation Panel.

4.2.2 Multivariate

1. Click Data > Generate > Multivariate > Normal.

[Screenshot: Data > Generate Data > Multivariate menu showing the Normal option]

[Screenshot: Generate Multinormal window with the "Available Columns" box, the "Select Mean Vector Column" drop-down bar, the "Number of Observations" field, the "Covariance S Matrix" box, and the "Select Worksheet" options (New Worksheet / Other Worksheets)]

Note: In order to use this option, the user should make sure that there is a column for the mean vector and
p columns for the variance-covariance matrix, where p is the number of variables in the matrix.

o The mean vector is chosen from the "Select Mean Vector
Column" drop-down bar, and the columns representing the
columns of the variance-covariance matrix are chosen for the
"Covariance S Matrix."

o The selected worksheet represents the worksheet where the newly
generated data will be stored. The generated data can then be
used in various other modules of Scout or in some other software.

o If the "New Worksheet" option is selected, then a name for the
worksheet has to be specified.

o Click "OK" to continue or "Cancel" to cancel the Generate option.
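The inputs described above (a mean vector of length p and the p columns of a variance-covariance matrix) are enough to generate multivariate normal data. The sketch below shows one standard way to do this, using a Cholesky factor of the covariance matrix; the guide does not say which algorithm Scout uses, so this is an assumed technique, and the function names and the example mean and covariance values are hypothetical.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular L with L * L^T equal to a symmetric positive definite matrix."""
    p = len(matrix)
    lower = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                d = matrix[i][i] - s
                if d <= 0:
                    raise ValueError("covariance matrix is not positive definite")
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def generate_multinormal(mean, cov, n, seed=None):
    """Return n rows, each a draw from N(mean, cov), computed as mean + L z."""
    rng = random.Random(seed)
    lower = cholesky(cov)
    p = len(mean)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]  # independent standard normals
        rows.append(
            [mean[i] + sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(p)]
        )
    return rows


# Hypothetical two-variable example: mean vector and covariance matrix.
mean_vector = [10.0, 15.0]
covariance = [[4.0, 2.0], [2.0, 3.0]]
data = generate_multinormal(mean_vector, covariance, 5000, seed=42)
print(len(data), len(data[0]))
```

The positive-definiteness check is the practical reason the covariance columns must form a valid variance-covariance matrix: an arbitrary set of p columns cannot be factored this way.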

[Screenshot: Generate Multinormal window with Mean (Count = 2) selected as the mean vector column, Std Dev 1 and Std Dev 2 selected for the covariance matrix, and the "Other Worksheets" option selected]


-------
Output Screen for Multivariate Normal Data.

[Screenshot: worksheet with the Mean, Std Dev1, and Std Dev2 input columns and the generated data in the MN_0 and MN_1 columns]

4.3 Impute (NDs)

Data sets with non-detect observations are transformed using the Impute option. Various
options are available to impute (estimate or extrapolate) the non-detect observations. The
use of this option generates additional columns consisting of all of the extrapolated
non-detects and detected observations. Those columns can be appended to any of the
existing open worksheets or placed in a new worksheet.

1. Click Data > Impute (NDs).

[Screenshot: Data drop-down menu]

2. The "Select Variable to Impute" screen (see Section 3.1.2 and the screen below)
appears. The various options available are:

o Detection Limit: the non-detect observations are given the value of the
detection limit.

o 1/2 Detection Limit: the non-detect observations are given the value of
one-half of the detection limit.

o Zero: the non-detect observations are given zero values.

o Normal ROS: Regression on Order Statistics (ROS) is used to extrapolate
the non-detect observations using a normal model.

o Gamma ROS: Regression on Order Statistics (ROS) is used to extrapolate
the non-detect observations using a gamma model.

o Lognormal ROS: Regression on Order Statistics (ROS) is used to
extrapolate the non-detect observations using a lognormal model.

o Uniform: each non-detect observation is given the value of a random number
from a uniform distribution with the lower limit as zero and the upper limit as
the detection limit.

3. An example for the Normal ROS is illustrated.

o Click Data > Impute (NDs).

o In the "Select Variable To Impute" screen, the following options
are selected.

o Select the method to replace NDs ("Select NDs Replacement"),
the variable to transform, the New Column Name, and the
worksheet.

o Click "OK" to continue or "Cancel" to cancel the Impute option.
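The four substitution-type options listed in step 2 (Detection Limit, 1/2 Detection Limit, Zero, and Uniform) are simple enough to sketch directly; the ROS methods involve fitting a regression to the detected data and are not attempted here. The sketch makes two assumptions that the reader should check against their own data layout: each non-detect is stored as its detection limit, and a companion indicator holds 1 for a detected value and 0 for a non-detect. The function name and example values are hypothetical.

```python
import random


def impute_nondetects(values, detected, method, seed=None):
    """Replace non-detects; detected values are passed through unchanged.

    values   : reported values (for a non-detect, assumed to be its detection limit)
    detected : 1 for a detected observation, 0 for a non-detect (assumed coding)
    """
    if len(values) != len(detected):
        raise ValueError("values and detected must have the same length")
    rng = random.Random(seed)
    replacement = {
        "detection_limit": lambda dl: dl,
        "half_detection_limit": lambda dl: dl / 2.0,
        "zero": lambda dl: 0.0,
        "uniform": lambda dl: rng.uniform(0.0, dl),  # between zero and the limit
    }
    if method not in replacement:
        raise ValueError("unknown method: " + method)
    fill = replacement[method]
    return [v if d else fill(v) for v, d in zip(values, detected)]


# Hypothetical concentrations; the second and fourth entries are non-detects.
conc = [3.2, 1.0, 4.5, 0.5, 2.8]
flag = [1, 0, 1, 0, 1]
print(impute_nondetects(conc, flag, "half_detection_limit"))
print(impute_nondetects(conc, flag, "uniform", seed=0))
```

The result has the same length as the input, which is what allows the imputed column to sit beside the original column in a worksheet.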



-------
Output Screen for Impute using Normal ROS.

[Screenshot: worksheet showing the column imputed using Normal ROS]

4.4 Missing

Scout has three methods to handle missing observations. The first method replaces the
missing observations by the mean of the data, the second method replaces the missing
observations by the median of the data, and the third method removes the rows with
missing observations. A new column is created for the selected variable using the
selected option. This new column can be added to a new worksheet or an existing
worksheet. Note that observations given the value 1E-31 or 1E+31 are considered to be
missing.

1. Click Data > Missing > Replace Missing with Median.

[Screenshot: Data > Handle Missing Data menu showing the Replace Missing with Mean, Replace Missing with Median, and Remove Rows with Missing Data options]


-------
2. The following screen appears:

[Screenshot: variable selection window with the "Variables" and "Selected" boxes, the "Select Worksheet" options (New Worksheet / Other Worksheets), and the OK and Cancel buttons]

o Select the variable to modify ("Variables").

o Specify whether the new column should be added to a "New
Worksheet" or to existing "Other Worksheets" (under "Select
Worksheet").

o Click "OK" to continue or "Cancel" to cancel the Missing option.

Output Screen for Missing (Replace rows with the median).

[Screenshot: worksheet with the original column (Data) and the new column (m_Data)]

Row   Data          m_Data
1     3             3
2     5             5
3     0.6           0.6
4     0.8           0.8
5     4             4
6     8             8
7     9             9
8     4             4
9     (blank)       4
10    1             1
11    1             1
12    3             3
13    1.0000E+031   4
14    4             4
15    5             5
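The three missing-data options can be sketched as follows, using the Data column from the output screen of this section (one blank cell and one cell holding 1E+31). This is an illustration rather than Scout's code: the names `is_missing` and `replace_missing` are hypothetical, a blank cell is represented by `None`, and the "remove" branch drops values from a single column, whereas Scout removes whole worksheet rows.

```python
from statistics import mean, median

# Sentinel values that the guide says are treated as missing.
MISSING_CODES = (1e31, 1e-31)


def is_missing(x):
    """A blank cell (None here) or one of the sentinel codes counts as missing."""
    return x is None or x in MISSING_CODES


def replace_missing(values, method):
    """method is 'mean', 'median', or 'remove'."""
    observed = [x for x in values if not is_missing(x)]
    if method == "remove":
        return observed
    if method == "mean":
        fill = mean(observed)
    elif method == "median":
        fill = median(observed)
    else:
        raise ValueError("unknown method: " + method)
    return [fill if is_missing(x) else x for x in values]


# Data column from the example in this section; None marks the blank cell.
data = [3, 5, 0.6, 0.8, 4, 8, 9, 4, None, 1, 1, 3, 1e31, 4, 5]
m_data = replace_missing(data, "median")
print(m_data)
```

With these values the median of the 13 observed entries is 4, so both missing cells become 4, in agreement with the m_Data column of the output screen.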


-------
4.5 Transform (No NDs)

Scout offers a number of options to transform the variables without non-detects:

° z - transform: standardizes the variable; i.e., the mean of the observations is
subtracted and the result is divided by the standard deviation.

o Linear (ax + b): gives a linear transformation of x. The values of "a" and "b" are
entered by the user.

o Natural Log: gives the natural logarithm transform of the variable.

°	Log Base 10: gives the logarithm to the base 10 transform of the variable.

°	Exp(x): gives the exponential transformation of the variable.

o	Pow(x, a): gives the value of the variable "x" raised to power "a."

o Box-Cox: gives the Box-Cox transformation of the variable; i.e., $(x^a - 1)/a$. The
value of "a" is entered by the user.

o Ranked: gives the order number of the observations in the variable after sorting.

° Ordered: sorts the data in ascending order.

° Rankit: gives the expected values of ordered statistics of the standard normal
distribution corresponding to the data points in a manner determined by the order
in which the data points appear.

° Arcsine: gives the arc-sine value of the observations in the selected variable.

° Group Items: this option is used in conjunction with the Discriminant Analysis
for data sets with groups. This option outputs the group names in a sorted order
in the selected column. This option is useful when the user wants to input the
values of prior probabilities for the groups.
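Several of the transformations listed above are one-line formulas, sketched here in Python. The sketch is illustrative only and the function names are hypothetical. Two assumptions are made: the z-transform divides by the sample standard deviation (n - 1 in the denominator), which reproduces the Z_Transform values shown in the output screen of this section, and the Box-Cox transform with a = 0 is taken as the natural logarithm, the usual limiting case, which the guide does not mention.

```python
import math
from statistics import mean, stdev


def z_transform(x):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = mean(x), stdev(x)
    return [(v - m) / s for v in x]


def linear(x, a, b):
    """ax + b for user-supplied a and b."""
    return [a * v + b for v in x]


def natural_log(x):
    return [math.log(v) for v in x]


def log_base_10(x):
    return [math.log10(v) for v in x]


def power(x, a):
    """x raised to the power a."""
    return [v ** a for v in x]


def box_cox(x, a):
    """(x^a - 1) / a; a = 0 is treated as the natural log (assumed limiting case)."""
    if a == 0:
        return natural_log(x)
    return [(v ** a - 1.0) / a for v in x]


def ordered(x):
    """Data sorted in ascending order."""
    return sorted(x)


# m_Data column from the Section 4.4 example (missing values replaced by the median).
m_data = [3, 5, 0.6, 0.8, 4, 8, 9, 4, 4, 1, 1, 3, 4, 4, 5]
z = z_transform(m_data)
print(round(z[0], 6), round(z[4], 6))
```

For the first observation (3) the sketch gives about -0.310387, and for the observations equal to 4 it gives about 0.098017, matching the Z_Transform column of the output screen.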

1. Click Data > Transform (No NDs).

[Screenshot: Data drop-down menu]


-------
2. The "Select Transform Variable" screen (See also Section 3.1.1) appears.

o Specify the transform to apply ("Select Transform").

o Specify a variable to transform ("Select a Variable to
Transform").

o Specify whether the new column should be added to a "New
Worksheet" or to existing "Other Worksheets" (under "Select
Worksheet"); then, enter a name for the transformed variable
(under "New Column Name").

o Click "OK" to continue or "Cancel" to cancel the Transform option.

[Screenshot: Select Transform Variable window with Z-Transform selected and Z_Transform entered as the new column name]


-------
Output Screen for Transform (No NDs).
Selected options: Z-Transform and Ranked.

Row   Data          m_Data   Z_Transform            Ranked
1     3             3        -0.31038696593722      3
2     5             5         0.5064208391607280    4
3     0.6           0.6      -1.290556332054760     10
4     0.8           0.8      -1.20887555154496      11
5     4             4         0.098016936611754     1
6     8             8         1.73163254680765      12
7     9             9         2.14003644935663      5
8     4             4         0.098016936611754     8
9                   4         0.098016936611754     9
10    1             1        -1.12719477103517      13
11    1             1        -1.12719477103517      14
12    3             3        -0.31038696593722      2
13    1.0000E+031   4         0.098016936611754     15
14    4             4         0.098016936611754     6
15    5             5         0.5064208391607280    7
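The Z_Transform column in the example output can be reproduced with a short calculation. The Python sketch below is not Scout's code. It assumes that the Z-Transform subtracts the sample mean and divides by the sample standard deviation (n - 1 denominator), an assumption that agrees with the displayed values for the m_Data column. The function name z_transform is hypothetical.

```python
from statistics import mean, stdev

def z_transform(values):
    """Standardize values: subtract the sample mean, divide by the sample SD."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation, n - 1 denominator (assumed)
    return [(v - m) / s for v in values]

# m_Data column from the example output
m_data = [3, 5, 0.6, 0.8, 4, 8, 9, 4, 4, 1, 1, 3, 4, 4, 5]
z = z_transform(m_data)
print(round(z[0], 6), round(z[4], 6), round(z[6], 6))
```

Under this assumption the transformed column has mean 0 and sample standard deviation 1, which is a quick check to apply after running the transform on any variable.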





4.6 Expand Data

Scout can generate interaction terms from the available variables. This
part of the Scout program was developed so that the user can generate interaction terms
for regression analysis. The highest power supported by Scout is 10. However, the user is
cautioned that the maximum number of interaction terms supported by Scout is 256. If
more than 256 terms are generated, then those terms will not be displayed on the
worksheet. The user is also cautioned that generating interaction terms of high degree
takes up considerable computer resources and computing time.
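To make the idea of interaction terms concrete, the following Python sketch builds every product of a chosen number of variables and counts how many such terms exist. It is an illustration only, not Scout's implementation. It assumes that an expansion to power d produces the products of exactly d variables (with repetition), labeled A, B, C, ... as in the 2nd-power stack-loss example of this section. The function names expand and n_terms are hypothetical, and the two data rows are the first two rows of the standard stack-loss data set.

```python
from itertools import combinations_with_replacement
from math import comb, prod

def expand(columns, power):
    """Return {label: product column} for every product of `power` variables."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"   # sketch supports up to 26 variables
    names = list(columns)
    labels = dict(zip(names, letters))       # first variable -> "A", second -> "B", ...
    n_rows = len(columns[names[0]])
    expanded = {}
    for combo in combinations_with_replacement(names, power):
        label = "".join(labels[name] for name in combo)
        expanded[label] = [prod(columns[name][r] for name in combo)
                           for r in range(n_rows)]
    return expanded

def n_terms(n_vars, power):
    """Number of distinct products of exactly `power` variables out of n_vars."""
    return comb(n_vars + power - 1, power)

# First two rows of the stack-loss example
data = {"Air-Flow": [80, 80], "Temp": [27, 27], "Acid-Conc": [89, 88]}
terms = expand(data, 2)
print(list(terms))
print(n_terms(3, 2), n_terms(10, 10) > 256)
```

The count grows quickly with the number of variables and the power, which is why a cap on the number of terms is easy to reach: ten variables expanded to the 10th power would already give far more than 256 products.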



-------
1. Click Data > Expand Data.

2. The following "Select Variables for Expansion" screen appears.

[Screenshot: the "Select Variables for Expansion" dialog. Stack-Loss remains in the "Variables" list; Air-Flow, Temp, and Acid-Conc (Count = 21 each) are listed under "Variables to Expand". Other controls: "Expand Selected Variables to this Power", "Select where to place the expanded columns" ("Add columns to current worksheet" / "Place expansion in a new worksheet"), "Select Filename for new worksheet", "Copy Dependent Column to new worksheet", and "Copy Group column to new worksheet".]

o Specify the variables to expand ("Variables to Expand").

o Specify the power/degree ("Expand Selected Variables to this
Power").

o Specify where the expanded columns should be placed ("Add columns to
current worksheet" or "Place expansion in a new worksheet", under
"Select where to place the expanded columns"); for a new worksheet,
enter its name under "Select Filename for new worksheet".

o If the new worksheet option is selected, specify whether the dependent
variable used in the regression should be copied to the new worksheet.

o If the new worksheet option is selected, specify whether the group column
should be copied to the new worksheet.

o Click "OK" to continue or "Cancel" to cancel this option.



-------
Output Screen for Expand Data (worksheet "New Sheet for Data Expansion").

Row   Stack-Loss   AA      AB      AC      BB    BC      CC
1     42           6,400   2,160   7,120   729   2,403   7,921
2     37           6,400   2,160   7,040   729   2,376   7,744
3     37           5,625   1,875   6,750   625   2,250   8,100
4     28           3,844   1,488   5,394   576   2,088   7,569
5     18           3,844   1,364   5,394   484   1,914   7,569
6     18           3,844   1,426   5,394   529   2,001   7,569
7     19           3,844   1,488   5,766   576   2,232   8,649
8     20           3,844   1,488   5,766   576   2,232   8,649
9     15           3,364   1,334   5,046   529   2,001   7,569
10    14           3,364   1,044   4,640   324   1,440   6,400
11    14           3,364   1,044   5,162   324   1,602   7,921
12    13           3,364   986     5,104   289   1,496   7,744
13    11           3,364   1,044   4,756   324   1,476   6,724
14    12           3,364   1,102   5,394   361   1,767   8,649
15    8            2,500   900     4,450   324   1,602   7,921
16    7            2,500   900     4,300   324   1,548   7,396
17    8            2,500   950     3,600   361   1,368   5,184
18    8            2,500   950     3,950   361   1,501   6,241
19    9            2,500   1,000   4,000   400   1,600   6,400
20    15           3,136   1,120   4,592   400   1,640   6,724
21    15           4,900   1,400   6,370   400   1,820   8,281

Note: A second output sheet called "Expansion.ost" will be generated. This output sheet will indicate what
the variables in the column header stand for in the interaction terms.



Expansion Legend (Expansion.ost)

Date/Time of Computation   10/29/2008 12:49:41 PM
From File                  D:\Narain\WorkDatInExcel\STACKLOSS
To New Worksheet           New Sheet for Data Expansion
Expanded to the            2nd Power

Representation   Actual Variable Name
"A"              Air-Flow
"B"              Temp
"C"              Acid-Conc


-------
4.7 Benford's Analysis

Benford's law (see the separate PDF file of Appendix C for details), less commonly known as
Newcomb's law, the first digit law, the first digit phenomenon, or the leading digit
phenomenon, was discovered independently, first by Simon Newcomb (1881) and then
by Frank Benford (1938). Each noticed that the pages of books of logarithm tables
were "dirtier" at the beginning (due to use) than at the end, and inferred that some
particular first digits occur with a greater "natural" frequency.

Newcomb's form of the law is given as

$$p(d_1(i) = i) = \log_{10}\left(1 + \frac{1}{d_1(i)}\right), \qquad i = 1, 2, 3, \ldots, 9$$

And the equivalent Benford's form of the law is given as

$$p(d_1(i) = i) = \log_{10}\left(\frac{d_1(i) + 1}{d_1(i)}\right), \qquad i = 1, 2, 3, \ldots, 9$$

where $p(d_1(i) = i)$ is the probability that the first place, $j = 1$ ($j = 1, 2, 3, \ldots, n$), significant
non-zero integer digit, $d_j(i) = d_1(i)$, of a number, $N$, has a particular integer value, $i$.

Those logarithmically distributed significant digits can be calculated and summarized as

First Place Digit Integer, d1(i)   Probability of Occurrence, p(d1(i) = i)
(i = 1, 2, 3, ..., 9)
1                                  0.30103
2                                  0.17609
3                                  0.12494
4                                  0.09691
5                                  0.07918
6                                  0.06695
7                                  0.05799
8                                  0.05115
9                                  0.04576
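The tabulated probabilities follow directly from the logarithmic form of the law and can be checked numerically. The Python sketch below computes the first digit probabilities and, as an addition not spelled out in this section, the second digit probabilities using the standard generalization that sums over all possible first digits. The function names are hypothetical.

```python
from math import log10

def first_digit_prob(d):
    """Benford probability that the first significant digit equals d (1..9)."""
    return log10(1 + 1 / d)

def second_digit_prob(d):
    """Benford probability that the second significant digit equals d (0..9)."""
    # Sum over the nine possible first digits k = 1..9
    return sum(log10(1 + 1 / (10 * k + d)) for k in range(1, 10))

first = {d: first_digit_prob(d) for d in range(1, 10)}
second = {d: second_digit_prob(d) for d in range(10)}

for d, p in first.items():
    print(d, f"{p:.5f}")
```

Both sets of probabilities sum to 1, and the second digit distribution is much flatter than the first digit distribution.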

1. Click Data > Benford's Analysis.



-------
2. The "Select Variables" screen (Section 3.2) will appear.

° Select one or more variables from the "Select Variables" screen.

o If graphs have to be produced by using a group variable, then select a group
variable by clicking the arrow below the "Group by Variable" button. This
will result in a drop-down list of available variables. The user should select
an appropriate variable representing a group variable.

o Click "OK" to continue or "Cancel" to cancel Benford's analysis.

Output example: The data set "RandomData2500.xls" was used. The results of the
first digit analysis and the second digit analysis were computed.

Output for Benford's Analysis.

Benford Analysis

User Selected Options
Date/Time of Computation          1/30/2008 5:53:14 PM
From File                         D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\RandomData2500.xls
Full Precision                    OFF

MN_0

Number of Valid Observations      2500
Number of Distinct Observations   2500

Benford's First Digit Analysis

Digit      0         1         2         3         4         5         6         7         8         9
Expected   0.00000   0.30103   0.17609   0.12494   0.09691   0.07918   0.06695   0.05799   0.05115   0.04576
Actual     0.00000   0.40280   0.20040   0.07920   0.05080   0.05480   0.05600   0.05360   0.05040   0.05200

Benford's Second Digit Analysis

Digit      0         1         2         3         4         5         6         7         8         9
Expected   0.11968   0.11389   0.10882   0.10433   0.10031   0.09668   0.09337   0.09035   0.08757   0.08500
Actual     0.11760   0.11400   0.12520   0.10520   0.10640   0.09640   0.09200   0.08080   0.08920   0.07320
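The "Actual" rows in this output are observed relative frequencies of the first and second significant digits. The Python sketch below shows one way such frequencies could be tallied; it is not Scout's code, the helper names are hypothetical, and the sample values are made up for illustration. Zero values are skipped because they have no significant digit.

```python
from collections import Counter

def significant_digit(x, place=1):
    """Return the place-th significant digit of a non-zero number x."""
    mantissa = f"{abs(x):.15e}".split("e")[0]   # e.g. '7.523600000000000'
    return int(mantissa.replace(".", "")[place - 1])

def digit_frequencies(values, place=1):
    """Relative frequency of each digit 0-9 at the given significant place."""
    usable = [v for v in values if v != 0]
    counts = Counter(significant_digit(v, place) for v in usable)
    return {d: counts.get(d, 0) / len(usable) for d in range(10)}

sample = [752.36, 0.0523, 28.2, 1.9, 130.5, 17.4, 1.02, 3962.8]  # made-up values
freq = digit_frequencies(sample, place=1)
print(freq)
```

Comparing such observed frequencies against the expected Benford probabilities, digit by digit, is the comparison that the two rows of each table present.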





-------
References

F. Benford, "The Law of Anomalous Numbers." Proceedings of the American
Philosophical Society, 78, 551-572 (1938).

ProUCL 4.00.04. (2009). "ProUCL Version 4.00.04 Technical Guide." The software
ProUCL 4.00.04 can be downloaded from the web site at:
http://www.epa.gov/esd/tsc/software.htm.

ProUCL 4.00.04. (2009). "ProUCL Version 4.00.04 User Guide." The software
ProUCL 4.00.04 can be downloaded from the web site at:
http://www.epa.gov/esd/tsc/software.htm.

S. Newcomb, "Note on the Frequency of Use of the Different Digits in Natural
Numbers," American Journal of Mathematics, 4, 39-40 (1881).



-------
Chapter 5
Graphs

The Graphs option provides graphical displays for both univariate and multivariate data.




	

5.1 Univariate Graphs

Three commonly used graphical displays are available under the Univariate Graph
Option:

o Box Plots
o Histogram
o Multi-Q-Q

° The box plots and multiple Q-Q plots can be used for full data sets without non-
detects and also for data sets with non-detect values.

o Three options are available to draw Q-Q plots with non-detect (ND) observations.
Specifically, Q-Q plots are displayed only for detected values, with NDs replaced
by 1/2 detection limit (DL) values, or with NDs replaced by the respective
detection limits. The statistics displayed on a Q-Q plot (mean, sd, slope, and
intercept) are computed according to the method used. The NDs are displayed
in a smaller font and in red.

o Scout can display box plots for data sets with NDs. This kind of graph may not
be very useful if many NDs are present in the data set.

o A few choices are available to construct box plots for data sets with NDs.
For example, all non-detects below the largest detection limit (DL) and
the portion of the box plot below the largest DL are not shown on the box
plot. A horizontal line is displayed at the largest detection limit level.

o Scout constructs a box plot using all of the detected and non-detect (using
DL values) values. Scout shows the full box plot; however, a horizontal
line is displayed at the largest detection limit.



-------
o When multiple variables are selected, one can choose to: 1) produce multiple
graphs on the same display by choosing the "Group Graphs" variable option, or
2) produce "Individual Graphs" for each selected variable.

o The "Graph by Group" variable option produces side-by-side box plots, multiple
Q-Q plots, or histograms for the groups of the selected variables representing
samples obtained from multiple populations (groups). Those multiple graphs are
particularly useful to perform two (background vs. site) or more sample visual
comparisons.

o Additionally, the box plot has an optional feature which can be used to
draw lines at statistical limits (e.g., upper limits of background data set)
computed from one population on the box plot obtained using the data
from another population (e.g., a site area of concern). This type of box
plot represents a useful visual comparison of site data with background
threshold values (background upper limits).

o Up to four (4) statistics can be added to a box plot. If the user inputs a
value in the value column, then the check box in that row will get
activated. For example, the user may want to draw horizontal lines at 80th
percentile, 90th percentile, 95th percentile, or a 95% UPL on a box plot.
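As an illustration of the values a user might enter in the Label/Value rows, the Python sketch below computes a few upper percentiles from a background data set. It is only a sketch: the data are made up, the function name is hypothetical, and the percentile method (Python's "inclusive" interpolation) is an assumption that may differ from the method Scout uses.

```python
from statistics import quantiles

def percentile_lines(background, percents=(80, 90, 95)):
    """Return {label: value} pairs that could be entered as Label/Value rows."""
    # cuts[k - 1] is the k-th percentile (99 cut points for n=100)
    cuts = quantiles(background, n=100, method="inclusive")
    return {f"{p}th percentile": cuts[p - 1] for p in percents}

background = [8.4, 9.1, 10.2, 11.7, 12.3, 13.5, 15.0, 18.6, 21.4, 29.4]  # made-up values
lines = percentile_lines(background)
for label, value in lines.items():
    print(f"{label}: {value:.2f}")
```

Each computed value would then be typed into the value column next to a label of the user's choosing, up to the four rows the dialog provides.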

5.1.1 Box Plots

1. Click Graphs > Univariate > No NDs or With NDs > Box Plot.






2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

° If graphs have to be produced by using a group variable, then select a group
variable by clicking the arrow below the "Group by Variable" button. This
will result in a drop-down list of available variables. The user should select an
appropriate variable representing a group variable.



-------
o When the "Options" button is clicked, the following window appears.

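[Screenshot: the box plot options window, with "Graph by Groups" ("Individual Graphs" / "Group Graphs"), four numbered Label/Value rows with check boxes, and "Graphical Display Options" ("Color Gradient" / "For Export (BW Printers)"), followed by "OK" and "Cancel" buttons.]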

o The default option for "Graph by Groups" is "Individual Graphs."
This option will produce one graph for each selected variable. If you
want to put all the selected variables into a single graph, then select the
"Group Graphs" option. This group graphs option is used when
multiple graphs categorized by a group variable have to be produced
on the same graph.

o The default option for "Graphical Display Options" is "Color
Gradient." If you want to use and import graphs in black and white
into a document or report, then check the radio button next to "For
Export (BW Printers)."

o Click "OK" to continue or "Cancel" to cancel the options.

o Click "OK" to continue or "Cancel" to cancel the Box Plot.



-------
Box Plot Output Screen (Single Graph).

Selected options: Label (Background UPL), Value (103.85), Individual Graphs, and Color Gradient.

Box Plot Output Screen (Group Graphs).

Selected options: Group Graphs and Color Gradient.



-------
5.1.2 Histograms
5.1.2.1 No NDs

1. Click Graphs > Univariate > No NDs > Histograms.


2. The "Select Variables" screen (Section 3.2) will appear.

° Select one or more variables from the "Select Variables" screen.

° If graphs have to be produced by using a group variable, then select a group
variable by clicking the arrow below the "Group by Variable" button. This
will result in a drop-down list of available variables. The user should select
an appropriate variable representing a group variable.

° When the "Options" button is clicked, the following window appears.

[Screenshot: the histogram options window, with "Graph by Groups" ("Individual Graphs" / "Group Graphs"), "Graphical Display Options" ("Color Gradient" / "For Export (BW Printers)"), and "Select Number of Bins" (10), followed by "OK" and "Cancel" buttons.]

o The default selection for "Graph by Groups" is "Individual
Graphs." This option produces a histogram (or other graph)
separately for each selected variable. If multiple graphs or graphs by
groups are desired, then check the radio button next to "Group
Graphs."

o The default option for "Graphical Display Options" is "Color
Gradient." If you want to use and import graphs in black and white
into a document or report, then check the radio button next to "For
Export (BW Printers)."

o Specify the number of bins for the selected variable in "Select
Number of Bins" text box. The default is "10."

o Click "OK" to continue or "Cancel" to cancel the option.

• Click "OK" to continue or "Cancel" to cancel the Histogram.

Histogram Output Screen.

Selected options: Group Graphs and Color Gradient.

Histograms for X (1), X (2), X (3)



-------
5.1.2.2 With NDs

1. Click Graphs > Univariate > With NDs > Histograms.




2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

° If graphs have to be produced by using a group variable, then select a group
variable by clicking the arrow below the "Group by Variable" button. This
will result in a drop-down list of available variables. The user should select
an appropriate variable representing a group variable.

o When the "Options" button is clicked, the following window appears.

[Screenshot: the histogram options window for data with NDs, with "Display Non-Detects" ("Do not Use Non-Detects" / "Use Non-Detect Values" / "Use 1/2 Non-Detect Values"), "Graph by Groups" ("Individual Graphs" / "Group Graphs"), "Graphical Display Options" ("Color Gradient" / "For Export (BW Printers)"), and "Select Number of Bins" (10), followed by "OK" and "Cancel" buttons.]


-------
o Specify the "Use Non-detects" option. The default is "Do not Use
Non-detects."

Do not Use Non-detects: Selection of this option excludes the NDs
and uses only detected values on the associated histogram.

Use Non-detect Values: Selection of this option treats detection limits as
detected values and uses those detection limits and detected values on the
histogram.

Use 1/2 Non-detect Values: Selection of this option replaces the detection
limits with their half values, and uses half detection limits and detected
values on the histogram.

o The default selection for "Graph by Groups" is "Individual
Graphs." This option produces a histogram (or other graphs)
separately for each selected variable. If multiple graphs or graphs by
groups are desired, then check the radio button next to "Group
Graphs."

o The default option for "Graphical Display Options" is "Color
Gradient." If you want to use and import graphs in black and white
into a document or report, then check the radio button next to "For
Export (BW Printers)."

o Specify the number of bins for the selected variable in "Select
Number of Bins" text box. The default is "10."

o Click "OK" to continue or "Cancel" to cancel the option.

Click "OK" to continue or "Cancel" to cancel the Histogram.
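The three non-detect options and the bin count described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Scout's implementation: the option names ("exclude", "use_dl", "half_dl"), the function names, the 1/0 coding of detected values, and the equal-width binning rule are all hypothetical, and the data are made up.

```python
def apply_nd_option(values, detected, option="exclude"):
    """Prepare data for a histogram under one of the three non-detect options.

    values   -- reported values (the detection limit for non-detects)
    detected -- flags, 1 for a detected value and 0 for a non-detect (assumed coding)
    option   -- "exclude", "use_dl", or "half_dl" (hypothetical names)
    """
    if option == "exclude":      # Do not Use Non-detects
        return [v for v, d in zip(values, detected) if d]
    if option == "use_dl":       # Use Non-detect Values
        return list(values)
    if option == "half_dl":      # Use 1/2 Non-detect Values
        return [v if d else v / 2 for v, d in zip(values, detected)]
    raise ValueError(f"unknown option: {option}")

def bin_counts(data, n_bins=10):
    """Count observations in n_bins equal-width bins spanning min..max."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins or 1.0   # avoid zero width when all values are equal
    counts = [0] * n_bins
    for v in data:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts

values = [2.0, 5.0, 1.0, 8.0, 4.0, 10.0]   # made-up data
detected = [1, 1, 0, 1, 0, 1]
print(apply_nd_option(values, detected, "half_dl"))
print(bin_counts(apply_nd_option(values, detected, "exclude"), n_bins=4))
```

The choice of option changes both the number of values plotted and the lower end of the histogram, so histograms of the same variable under different options are not directly comparable.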


-------
Histogram Output Screen.

Selected options: Group Graphs and Color Gradient.

5.1.3 Q-Q Plots
5.1.3.1 No NDs

1. Click Graphs > Univariate > No NDs > Q-Q Plots.














2. The "Select Variables" screen (Section 3.2) will appear.

•	Select one or more variables from the "Select Variables" screen.

•	If graphs have to be produced by using a group variable, then select a group
variable by clicking the arrow below the "Group by Variable" button. This
will result in a drop-down list of available variables. The user should select
an appropriate variable representing a group variable.



-------
When the "Options" button is clicked, the following window appears.

[Screenshot: the Q-Q plot options window, with "Display Regression Lines" ("Do Not Display" / "Display Regression Lines") and "Graphical Display Options" ("Color Gradient" / "For Export (BW Printers)"), followed by "OK" and "Cancel" buttons.]



o The default option for "Display Regression Lines" is "Do Not
Display." If you want to see regression lines, then check the radio
button next to "Display Regression Lines."

o The default option for "Graphical Display Options" is "Color
Gradient." If you want to use and import graphs in black and white
into a document or report, then check the radio button next to "For
Export (BW Printers)."

o Click "OK" to continue or "Cancel" to cancel the option.

Click "OK" to continue or "Cancel" to cancel the Q-Q Plot.


-------
Q-Q Plot for No NDs Output Screen.

[Figure: "Q-Q Plot for Background". Vertical axis: Ordered Observations; horizontal axis: Theoretical Quantiles (Standard Normal). Statistics shown on the plot: N = 10, Mean = 13.5000, Sd = 9.1682, Slope = 9.5771, Intercept = 13.5000, Correlation, R = 0.9622.]

Note: For the Multi-Q-Q plot option, both for "Full" data sets and for data sets "With NDs," the values along the
horizontal axis represent quantiles of a standardized normal distribution (normal distribution with mean 0
and standard deviation 1). Quantiles for other distributions (e.g., the gamma distribution) are used when
using the Goodness-of-Fit (GOF) test option.
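The statistics printed on a normal Q-Q plot (slope, intercept, and correlation) come from pairing the ordered observations with standard normal quantiles. The Python sketch below illustrates the calculation. It is not Scout's code: the plotting positions (Blom's (i - 0.375)/(n + 0.25)) are an assumption, the function name normal_qq is hypothetical, and the data are made up.

```python
from statistics import NormalDist, mean

def normal_qq(data):
    """Return (theoretical quantiles, ordered observations, slope, intercept, r)."""
    y = sorted(data)
    n = len(y)
    # Blom plotting positions (an assumption; Scout's exact positions are not stated here)
    x = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx                 # least-squares slope of y on x
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5      # correlation between quantiles and data
    return x, y, slope, intercept, r

data = [8.4, 9.1, 10.2, 11.7, 12.3, 13.5, 15.0, 18.6, 21.4, 29.4]  # made-up values
x, y, slope, intercept, r = normal_qq(data)
print(f"slope={slope:.4f} intercept={intercept:.4f} r={r:.4f}")
```

Because the theoretical quantiles are symmetric about zero, the fitted intercept equals the sample mean, which matches the example plot where the mean and the intercept are both reported as 13.5000.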

5.1.3.2 With NDs

1. Click Graphs > Univariate > With NDs > Q-Q Plots.


2. The "Select Variables" screen (Section 3.2) will appear.

• Select one or more variables from the "Select Variables" screen.

• If graphs have to be produced by using a group variable, then select a group
variable by clicking the arrow below the "Group by Variable" button. This
will result in a drop-down list of available variables. The user should select
an appropriate variable representing a group variable.



-------
When the "Options" button is clicked, the following window appears.

[Screenshot: the Q-Q plot options window for data with NDs, with "Display Non-Detects" ("Do not Display Non-Detects" / "Display Non-Detect Values" / "Display 1/2 Non-Detect Values"), "Display Regression Lines" ("Do Not Display" / "Display Regression Lines"), and "Graphical Display Options" ("Color Gradient" / "For Export (BW Printers)"), followed by "OK" and "Cancel" buttons.]

o Specify the "Display Non-detects" option. The default is "Do not
Display Non-detects."

Do not Display Non-detects: Selection of this option excludes the NDs
and displays only detected values on the associated Q-Q Plot.

Display Non-detect Values: Selection of this option treats detection limits
as detected values and displays those detection limits and detected values
on the Q-Q Plot.

Display 1/2 Non-detect Values: Selection of this option replaces the
detection limits with their half values, and it displays half detection limits
and detected values on the Q-Q Plot.

o The default option for "Display Regression Lines" is "Do Not
Display." If you want to see regression lines, then check the radio
button next to "Display Regression Lines."

o The default option for "Graphical Display Options" is "Color
Gradient." If you want to use and import graphs in black and white
into a document or report, then check the radio button next to "For
Export (BW Printers)."


-------
o Click "OK" to continue or "Cancel" to cancel the option.
• Click "OK" to continue or "Cancel" to cancel the Q-Q Plot.

Q-Q Plot Output Screen

Selected options: Do not Display Non-detects and Color Gradient.

[Figure: "Q-Q Plot for Background". Vertical axis: Ordered Observations; horizontal axis: Theoretical Quantiles (Standard Normal). Statistics shown on the plot: Total Number = 10, Number of Non-Detects = 4, Number of Detects = 6, Mean = 12.1667, Sd = 5.6419, Slope = 10.3239, Intercept = 12.1667, Correlation, R = 0.9806.]

5.2 Scatter Plots

Two-dimensional (2D) and three-dimensional (3D) scatter plot displays are available
under the Graphs > Scatter Plots menu. The points on those graphs can be labeled by
observation number or by group if a group variable exists in the data set.

5.2.1 2D Scatter Plots

1. Click Graphs > Scatter Plots > 2D.



-------
2. The "Select Variables" screen (Section 3.2) will appear.

o Select two or more variables from the "Select Variables" screen.

o If the graphs have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The user
should select and click on an appropriate variable representing a group
variable.

o Click "OK" to continue or "Cancel" to cancel the Graphs.

2D Scatter Plot.

Data Set Used: Bradu (4 variables).


The data set Bradu has four variables. The user can choose any one of the four variables
for the X-axis and one of the remaining three for the Y-axis using the drop-down bars in
the graphics toolbar as explained in Chapter 2. The observation numbers of the various
points on the graph can be viewed by right-clicking and using the "Point
Labels" option.



-------
2D Scatter Plot.

Data Set Used: Iris (4 variables, 3 groups).



[Figure: Scatterplot; horizontal axis: sp-width.]

The user can choose any one of the four variables for the X-axis and one of the remaining
three for the Y-axis using the drop-down bars in the graphics toolbar as explained in
Chapter 2.

5.2.2 3D Scatter Plots

1. Click Graphs > Scatter Plots > 3D.



-------
2. The "Select Variables" screen (Section 3.2) will appear.

•	Select two or more variables from the "Select Variables" screen.

•	If the graphs have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The user
should select and click on an appropriate variable representing a group
variable.

•	Click "OK" to continue or "Cancel" to cancel the Graphs.

3D Scatter Plot.

Data Set used: Bradu.


The user can choose different variables for the three axes using the drop-down bars in the
graphics toolbar as explained in Chapter 2.



-------
Rotation of axes using the Chart Rotation Control.

[Screenshot: the "Chart Rotation Control" panel, with X-Axis, Y-Axis, Z-Axis, and Light Level controls.]

3D Scatter Plot using groups.

Data Set Used: Iris (4 variables, 3 groups).



[Figure: 3D Scatterplot with points labeled by group number (1, 2, 3); one axis is sp-length.]


-------
Chapter 6

Goodness-of-Fit and Descriptive Statistics

6.1 Descriptive Statistics of Univariate Data

This option is used to compute general summary statistics for any or all of the variables
in the data file. Summary statistics can be generated for full data sets without non-detect
observations, and for data sets with non-detect observations. Three menu options are
available: No NDs (Full), With NDs, and Multivariate.

o No NDs (Full) - This option computes summary statistics for any or all of the
variables in a data set without any non-detect values.

° With NDs - This option computes simple summary statistics for any or all of
the variables in a data set that also have ND observations. For variables with
ND observations, simple summary statistics are computed based upon the
detected observations only.

° Multivariate - This option computes the mean vector, the median vector, the
standard deviation vector, the covariance matrix and the correlation matrix for
the multivariate data.

6.1.1 Descriptive (Summary) Statistics for Data Sets with No Non-detects
1. Click Stats/GOF > Descriptive > No NDs.


2. The "Select Variables" screen (Section 3.2) will appear.

° Select one or more variables from the "Select Variables" screen.

° If the statistics have to be produced by using a Group variable, then select
a group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

° Click "OK" to continue or "Cancel" to cancel the Descriptive Statistics.



-------
The following summary statistics are available for the variables selected.

o	Number of Observations

o	Number of Missing Values

o	Minimum Observed Value

o	Maximum Observed Value

o	Mean = Sample Average Value

o	Q1 = 25th Percentile

o	Q2 = Median

o	Q3 = 75th Percentile

o	90th Percentile

o	95th Percentile

o	99th Percentile

o	(Sample) Standard Deviation

o	MAD = Median Absolute Deviation

o	MAD/0.6745 = Robust Estimate of Variability (Population Standard
Deviation, σ)

o	Skewness = Skewness Statistic

o	Kurtosis = Kurtosis Statistic

o	CV = Coefficient of Variation

The details of these descriptive (summary) statistics are described in the EPA
(2006) guidance.
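Several of the listed statistics are simple enough to sketch directly. The Python example below computes the mean, the sample standard deviation, the median, the MAD, the robust estimate MAD/0.6745, and the coefficient of variation. It is a sketch, not Scout's implementation: the function name robust_summary is hypothetical, the data are made up, and the constant 0.6745 is the value shown in the example output sheet.

```python
from statistics import mean, median, stdev

def robust_summary(data):
    """Return mean, SD, median, MAD, MAD/0.6745, and CV for a list of numbers."""
    m = mean(data)
    s = stdev(data)                               # sample SD, n - 1 denominator
    med = median(data)
    mad = median(abs(v - med) for v in data)      # median absolute deviation
    return {
        "mean": m,
        "sd": s,
        "median": med,
        "MAD": mad,
        "MAD/0.6745": mad / 0.6745,               # robust estimate of sigma
        "CV": s / m,                              # coefficient of variation
    }

data = [1, 2, 3, 4, 100]   # made-up values with one extreme observation
summary = robust_summary(data)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

With the extreme value of 100 in the sample, the classical standard deviation is inflated while MAD/0.6745 stays close to the spread of the bulk of the data, which is why the two estimates are reported side by side.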


-------
Output for Descriptive Statistics - No Non-detects (NDs).

Univariate Descriptive Statistics for Datasets with No NDs

Date/Time of Computation   5/28/2007 5:42:00 PM
User Selected Options
From File                  D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\IRIS.xls
Full Precision             OFF

Var 0: sp-length
Var 2: pt-length

                           Var 0:    sp-width   Var 2:    pt-width
Number of Observations     50        50         50        50
Number of Missing Values   0         0          0         0
Minimum Observed Value     4.3       2.3        1         0.1
Maximum Observed Value     5.8       4.4        1.9       0.6
Mean                       5.006     3.428      1.462     0.246
(Q1) 25% Percentile        4.8       3.15       1.4       0.2
(Q2) Median                5         3.4        1.5       0.2
(Q3) 75% Percentile        5.2       3.65       1.55      0.3
90% Percentile             5.4       3.9        1.7       0.4
95% Percentile             5.6       4.05       1.7       0.4
99% Percentile             5.75      4.3        1.9       0.55
Standard Deviation         0.352     0.379      0.174     0.105
MAD / 0.6745               0.297     0.371      0.148     0
Skewness                   0.12      0.0412     0.106     1.254
Kurtosis                   -0.253    0.955      1.022     1.719
CV                         0.0704    0.111      0.119     0.428

Note: When the variable name is too long to fit in a single cell, the variable number and its name are
printed above the results table. In the above output sheet, the variable sp-length was chosen as the first
variable and the variable pt-length was chosen as the third variable. The names of those two variables
cannot fit in individual cells of the descriptive statistics table; hence, they are named Var 0 and Var 2,
respectively, in the table.



-------
6.1.2 Descriptive (Summary) Statistics for Data Sets with Non-detects
1. Click Stats/GOF > Descriptive > With NDs.




2. The "Select Variables" screen (Section 3.2) will appear.

o Select a variable(s) from the list of variables.

o Only those variables that have non-detect values will be shown.

o If the statistics have to be produced by using a Group variable, then select
a group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o Click "OK" to continue or "Cancel" to cancel the Descriptive Statistics.

o The following summary statistics are available for the variables selected.

o	Number of Observations

o	Number of Missing Values

o	Number of Detects

o	Number of Non-detects

o	Percentage of Non-detects

o	Minimum Observed Detected Value

o	Maximum Observed Detected Value

o	Mean of Detected Values

o	Median of Detected Values

o	Standard Deviation of Detected Values

o	MAD/0.6745 of Detected Values = Robust Estimate of Variability
(standard deviation)

o	Skewness of Detected Values

o	Kurtosis of Detected Values

o	CV = Coefficient of Variation of Detected Values

o	Q1 = 25th Percentile of All Observations

o	Q2 = Median of All Observations

o	Q3 = 75th Percentile of All Observations

o	90th Percentile of All Observations

o	95th Percentile of All Observations



-------
o 99th Percentile of All Observations
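The detected-value statistics in the list above can be sketched in a few lines of Python. This is only an illustration, not Scout's implementation: the function name `nd_summary`, the dictionary keys, and the convention of passing a parallel list of detect flags are assumptions made for the example. Percentiles are omitted because the guide does not state which percentile definition Scout uses.

```python
import statistics

def nd_summary(values, detected):
    """Summarize a data set with non-detects (hypothetical helper).

    values   : reported results (for non-detects, the reported limit)
    detected : parallel list of booleans, True where the result is a detect
    """
    if len(values) != len(detected):
        raise ValueError("values and detected must have the same length")
    dets = [v for v, d in zip(values, detected) if d]
    n = len(values)
    n_nd = n - len(dets)
    med = statistics.median(dets)
    # Median absolute deviation about the median of the detected values.
    mad = statistics.median([abs(v - med) for v in dets])
    mean = statistics.fmean(dets)
    sd = statistics.stdev(dets)  # sample standard deviation (n - 1 divisor)
    return {
        "n_obs": n,
        "n_detects": len(dets),
        "n_nondetects": n_nd,
        "pct_nondetects": 100.0 * n_nd / n,
        "min_detect": min(dets),
        "max_detect": max(dets),
        "mean_detects": mean,
        "median_detects": med,
        "sd_detects": sd,
        "mad_sd": mad / 0.6745,  # robust estimate of the standard deviation
        "cv_detects": sd / mean,
    }

# Made-up example: two non-detects reported at a limit of 4.5.
summary = nd_summary(
    [4.5, 4.5, 5.0, 6.0, 7.0, 8.0, 9.0],
    [False, False, True, True, True, True, True],
)
```

As in the Scout output, every statistic except the counts and the percentage of non-detects is computed from the detected values only.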

Note: In Scout, "Descriptive Statistics" for a data set with non-detect observations represent simple
summary statistics based upon, and calculated from, the data set without using non-detect observations.
The simple "Descriptive Statistics/Univariate/With NDs" option only provides simple statistics (e.g., %
NDs, Max ND, Min ND, Mean of detected values) based upon the detected values only. Those statistics
may help a user to determine the degree of skewness (e.g., mild, moderate, or high) of the data set
consisting of detected values. Those statistics may also help the user to choose the most appropriate
method (e.g., KM (BCA) UCL or KM (t) UCL) to compute confidence, prediction, and tolerance intervals.

Output for Descriptive Statistics - With Non-detects.

Univariate Descriptive Statistics for Datasets with NDs

Date/Time of Computation   5/28/2007 5:44:23 PM
User Selected Options
From File                  D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1.xls
Full Precision             OFF

                                        X
Number of Observations                  53
Number of Missing Values                0
Number of Detects                       49
Number of Non-Detects                   4
Percentage of Non-Detects               7.547%
Minimum Observed Detect Value           3.202
Maximum Observed Detect Value           121.1
Mean of Detect values                   55.05
Median of Detect values                 31.57
Standard Deviation of Detect values     43.2
MAD / 0.6745 of Detect values           46.8
Skewness of Detect values               0.149
Kurtosis of Detect values               -1.758
CV of Detect values                     0.785
(Q1) 25% Percentile (All Obs)           9.608
(Q2) Median (All Obs)                   31.57
(Q3) 75% Percentile (All Obs)           95.73
90% Percentile (All Obs)                107.6
95% Percentile (All Obs)                112.9
99% Percentile (All Obs)                [illegible in source]


-------
6.1.3 Descriptive Statistics for Multivariate Data
1. Click Stats/GOF > Descriptive > Multivariate.

[Screenshot: Scout menu bar showing the path Stats/GOF > Descriptive > Multivariate]

2. The "Select Variables" screen (Section 3.2) will appear.

o Select a variable(s) from the list of variables.

o If the statistics have to be produced by using a Group variable, then select
a group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o Click "OK" to continue or "Cancel" to cancel the Descriptive Statistics.
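The sample output for this option reports a covariance matrix S, a correlation matrix R, their determinants, and their eigenvalues. How these quantities relate can be sketched in plain Python. The matrix below is the rounded covariance matrix printed in the IRISOUT sample output for this option; the function names are illustrative assumptions, not Scout routines, and results computed from rounded entries agree with the printed values only to about three decimals.

```python
import math

# Covariance matrix S as printed (rounded) in the IRISOUT sample output.
# Variable order: sp-length, sp-width, pt-length, pt-width.
S = [
    [1.16,   0.255,  1.477,  0.736],
    [0.255,  0.389, -0.186,  0.0255],
    [1.477, -0.186,  3.326,  1.176],
    [0.736,  0.0255, 1.176,  0.991],
]

def correlation_from_covariance(cov):
    """R[i][j] = S[i][j] / (sd_i * sd_j), where sd_i = sqrt(S[i][i])."""
    p = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]

def trace(m):
    """Sum of the diagonal; equals the sum of the eigenvalues."""
    return sum(m[i][i] for i in range(len(m)))

def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting.

    The determinant equals the product of the eigenvalues.
    """
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[pivot][k] == 0.0:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            det = -det  # a row swap flips the sign
        det *= a[k][k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
    return det

R = correlation_from_covariance(S)
sum_of_eigenvalues_S = trace(S)   # printed as 5.866 in the sample output
sum_of_eigenvalues_R = trace(R)   # number of variables, 4
det_S = determinant(S)
log_det_S = math.log(det_S)       # "Log of Determinant" is the natural log
```

The sum of the eigenvalues of a correlation matrix always equals the number of selected variables, which is why the output shows 4 for both sample data sets.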



-------
Output for Descriptive Statistics - Multivariate.

[Screenshots: Scout output sheet MultiDesc.ost, reproduced as text below. The Log Panel reports that IRISOUT.DAT was imported into IRISOUT.wst.]

Multivariate Descriptive Statistics

Date/Time of Computation   11/13/2008 3:08:34 PM
User Selected Options
From File                  C:\OLD_Drive\MyFiles\WPWIN\SCOUT\Scout 2008 10-17-08\Data\Scout v. 2.0 Data\... (path truncated in screenshot)
Full Precision             OFF

Multivariate Statistics

Number of Observations         166
Number of Selected Variables   4

                      sp-length   sp-width   pt-length   pt-width
Mean                  5.97        3.149      3.772       1.346
Median                5.8         3          4.35        1.4
Standard Deviation    1.077       0.624      1.824       0.995

Covariance S Matrix
             sp-length   sp-width   pt-length   pt-width
sp-length    1.16        0.255      1.477       0.736
sp-width     0.255       0.389      -0.186      0.0255
pt-length    1.477       -0.186     3.326       1.176
pt-width     0.736       0.0255     1.176       0.991

Determinant          0.119
Log of Determinant   -2.126

Eigenvalues of Classical Covariance S Matrix
Eval 1    Eval 2    Eval 3    Eval 4
4.604     0.756     0.426     0.0806

Sum of Eigenvalues   5.866

Classical Correlation R Matrix
             sp-length   sp-width   pt-length   pt-width
sp-length    1           0.379      0.752       0.686
sp-width     0.379       1          -0.164      0.0411
pt-length    0.752       -0.164     1           0.648
pt-width     0.686       0.0411     0.648       1

Determinant          0.0802
Log of Determinant   -2.523

Eigenvalues of Classical Correlation R Matrix
Eval 1    Eval 2    Eval 3    Eval 4
2.409     1.147     0.365     0.0795

Sum of Eigenvalues   4


-------
Multivariate Descriptive Statistics

Date/Time of Computation   3/13/2008 6:27:08 AM
User Selected Options
From File                  D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BRADU
Full Precision             OFF

Multivariate Statistics

Number of Observations         75
Number of Selected Variables   4

                      y        x1       x2       x3
Mean                  1.279    3.207    5.597    7.231
Median                0.1      1.8      2.2      2.1
Standard Deviation    3.493    3.653    8.239    11.74

Covariance S Matrix
        y        x1       x2       x3
y       12.2     9.477    20.39    31.03
x1      9.477    13.34    28.47    41.24
x2      20.39    28.47    67.88    94.67
x3      31.03    41.24    94.67    137.8

Determinant          1906
Log of Determinant   7.553

Eigenvalues of Classical Covariance S Matrix
Eval 1    Eval 2    Eval 3    Eval 4
0.914     1.688     5.538     223.1

Sum of Eigenvalues   231.3

Classical Correlation R Matrix
        y        x1       x2       x3
y       1        0.743    0.708    0.757
x1      0.743    1        0.946    0.962
x2      0.708    0.946    1        0.979
x3      0.757    0.962    0.979    1

Determinant          0.00125
Log of Determinant   -6.683

Eigenvalues of Classical Correlation R Matrix
Eval 1    Eval 2    Eval 3    Eval 4
0.0172    0.0558    0.368     3.559

Sum of Eigenvalues   4

6.2 Goodness-of-Fit (GOF)

Several goodness-of-fit (GOF) tests for univariate data (both for full data sets, i.e.,
without non-detects, and for data sets with NDs) and multivariate data are available in
Scout. In this user guide, those tests and available options have been illustrated using
screen shots generated by Scout. For more details about those tests, refer to the ProUCL
4.00.04 Technical Guide and the Scout Technical Guide (in preparation).

6.2.1 Univariate GOF

Two choices are available for the goodness-of-fit menu: No NDs (Full) and With NDs.

• No NDs (Full)

o This option is used to analyze full data sets without any non-detect
observations.

o This option tests for the normal, gamma, or lognormal distribution of the
variables selected using the Select Variables option.

o GOF Statistics: this option simply generates an output log of the GOF test
statistics and any derived conclusions about the data distributions of all
selected variables.

• With NDs

o Analyzes data sets that have both non-detected and detected values.

o Six sub-menu items listed and shown below are available for this option.

1.	Exclude NDs

2.	Normal ROS Estimates

3.	Gamma ROS Estimates

4.	Lognormal ROS Estimates

5.	DL/2 Estimates

6.	GOF Statistics

Scout handles Univariate GOF tests in the same way as ProUCL 4.00.04. More
information can be obtained from the ProUCL 4.00.04 Technical Guide and User Guide
(Chapter 8). The major upgrade in Scout over ProUCL 4.00.04 for the GOF test of
univariate data is the availability of the Shapiro-Wilk test for sample sizes greater than 50
and less than 2000 (Royston 1982).
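Two of the simpler "With NDs" sub-menu options, Exclude NDs and DL/2 Estimates, amount to a small preprocessing step applied before the GOF test is run. The sketch below illustrates that step only; the function names are hypothetical, and it assumes each non-detect is stored as its reported detection limit together with a detect flag, which may not match how Scout stores data internally.

```python
import math

def exclude_nds(values, detected):
    """Exclude NDs: keep only the detected observations."""
    return [v for v, d in zip(values, detected) if d]

def dl2_substitute(values, detected):
    """DL/2 Estimates: replace each non-detect by half its detection limit.

    Assumes a non-detect is stored as its detection limit (DL).
    """
    return [v if d else v / 2.0 for v, d in zip(values, detected)]

def log_transform(xs):
    """A lognormal GOF test is a normal GOF test applied to ln(x)."""
    return [math.log(x) for x in xs]

# Made-up example: two non-detects reported at DL = 4.5.
values = [4.5, 5.2, 6.1, 4.5]
detected = [False, True, True, False]

detects_only = exclude_nds(values, detected)    # [5.2, 6.1]
dl2_data = dl2_substitute(values, detected)     # [2.25, 5.2, 6.1, 2.25]
log_dl2_data = log_transform(dl2_data)
```

The resulting list is then tested for normality, lognormality, or a gamma fit exactly as a full data set would be.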


-------
6.2.1.1 GOF Tests for Data Sets with No NDs
6.2.1.1.1 GOF Tests for Normal and Lognormal Distribution

Click Stats/GOF > GOF > Univariate > No NDs > Normal or Lognormal.

[Screenshot: Scout menu bar showing the path Stats/GOF > GOF > Univariate > No NDs > Normal, Gamma, or Lognormal]

The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

o If graphs have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o Click "Options" for GOF options.



-------
[Screenshot: "Goodness-of-Fit (Normal, Lognormal)" options dialog with the groups Select Confidence Level (90%, 95%, 99%), Method (Shapiro Wilk, Lilliefors), Display Regression Lines (Do Not Display, Display Regression Lines), Graphs by Group (Individual Graphs, Group Graphs), and Graphical Display Options (Color Gradient, For Export (BW Printers)), plus OK and Cancel buttons]

o The default option for the "Select Confidence Level" is "95%."

o The default GOF method is "Shapiro Wilk." If the sample size is
greater than 50, the program automatically uses the "Lilliefors" test.

o The default method for "Display Regression Lines" is "Do Not
Display." If you want to see regression lines on a Q-Q plot, then
check the radio button next to Display Regression Lines.

o The default option for "Graphs by Group" is "Individual Graphs."

If you want to see the plots for all selected variables on a single graph,
then check the radio button next to Group Graphs.

Note: This option for Graphs by Group is specifically provided when the user wants to display multiple
graphs for a variable by a group variable (e.g., site AOC1, site AOC2, and background). This kind of
display represents a useful visual comparison of the values of a variable (e.g., concentrations of
COPC-Arsenic) collected from two or more groups (e.g., upgradient wells, monitoring wells, residential wells).



-------
o The default option for "Graphical Display Options" is "Color
Gradient." If you want to see the graphs in black and white to be
included in reports for later use, then check the radio button next to
For Export (BW Printers).

• Click "OK" to continue or "Cancel" to cancel the goodness-of-fit tests.
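The Q-Q plot legends shown in the outputs for this option report a slope, an intercept, and a correlation R for the regression line through the plotted points. A minimal sketch of how such quantities can be obtained follows. It assumes Blom plotting positions, (i - 0.375)/(n + 0.25), which is a common choice but is not confirmed by this guide as the one Scout uses; the function name `qq_line` is hypothetical.

```python
import math
from statistics import NormalDist, fmean

def qq_line(data, log_scale=False):
    """Least-squares line through a normal Q-Q plot (illustrative sketch).

    Returns (slope, intercept, r) for ordered data regressed on standard
    normal quantiles. With log_scale=True the data are log-transformed
    first, which corresponds to a lognormal Q-Q plot.
    """
    xs = sorted(math.log(v) for v in data) if log_scale else sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    # Assumed plotting positions (Blom); Scout's choice is not documented here.
    z = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    zbar, xbar = fmean(z), fmean(xs)
    szz = sum((a - zbar) ** 2 for a in z)
    sxx = sum((b - xbar) ** 2 for b in xs)
    szx = sum((a - zbar) * (b - xbar) for a, b in zip(z, xs))
    slope = szx / szz
    intercept = xbar - slope * zbar
    r = szx / math.sqrt(szz * sxx)
    return slope, intercept, r

# Made-up concentrations, for illustration only.
sample = [4.2, 5.1, 5.5, 6.0, 6.3, 7.9]
slope, intercept, r = qq_line(sample)
log_slope, log_intercept, log_r = qq_line(sample, log_scale=True)
```

Because symmetric plotting positions give theoretical quantiles that average to zero, the intercept of the fitted line equals the sample mean, which matches the legends in this section where Mean and Intercept show the same value.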

Output Screen for Normal Distribution (Full).

Selected options: Shapiro Wilk, Display Regression Line, and For Export (BW Printers).

[Figure: Normal Q-Q Plot for Arsenic. x-axis: Theoretical Quantiles (Standard Normal).]

Legend statistics for Arsenic:
N = 20
Mean = 5.8725
Sd = 1.2247
Slope = 1.1798
Intercept = 5.8725
Correlation, R = 0.9281
Shapiro-Wilk Test
Test Value = 0.868
Critical Val(0.05) = 0.905
Data not Normal


-------
Output Screen for Lognormal Distribution (Full).

Selected options: Shapiro Wilk, Display Regression Lines, and Color Gradient.

[Figure: Lognormal Q-Q Plot for Arsenic. x-axis: Theoretical Quantiles (Standard Normal).]

Legend statistics for Arsenic (log-transformed data):
N = 20
Mean = 1.7519
Sd = 0.1917
Slope = 0.1897
Intercept = 1.7519
Correlation, R = 0.9533
Shapiro-Wilk Test
Test Statistic = 0.932
Critical Value(0.05) = 0.905
Data appear Lognormal

6.2.1.1.2 GOF Tests for Gamma Distribution

Click Stats/GOF > GOF > Univariate > No NDs > Gamma.

[Screenshot: Scout menu bar showing the path Stats/GOF > GOF > Univariate > No NDs > Gamma]
-------
[Screenshot: "Goodness-of-Fit (Gamma)" options dialog with the groups Select Confidence Level (90%, 95%, 99%), Method (Anderson Darling, Kolmogorov Smirnov), Display Regression Lines (Do Not Display, Display Regression Lines), Graph by Groups (Individual Graphs, Group Graphs), and Graphical Display Options (Color Gradient, For Export (BW Printers)), plus OK and Cancel buttons]

o The default option for the "Confidence Level" is "95%."

o The default GOF method is "Anderson Darling."

o The default option for "Display Regression Lines" is "Do Not

Display." If you want to see regression lines on the Gamma Q-Q plot,
then check the radio button next to "Display Regression Lines."

o The default option for "Graph by Groups" is "Individual Graphs."

If you want to see the graphs for all the selected variables into a single
graph, then check the radio button next to "Group Graphs."

o The default option for "Graphical Display Options" is "Color
Gradient." If you want to see the graphs in black and white, check
the radio button next to "For Export (BW Printers)."

o Click "OK" to continue or "Cancel" to cancel the option.

o Click "OK" to continue or "Cancel" to cancel the goodness-of-fit tests.



-------
Output Screen for Gamma Distribution (Full).

Selected options: Anderson Darling, Display Regression Lines, Individual Graphs, and Color Gradient.

[Figure: Gamma Q-Q Plot for Mercury. x-axis: Theoretical Quantiles of Gamma Distribution.]

Legend statistics for Mercury:
N = 30
Mean = [illegible in source]
k star = 1.1722
Slope = 1.0468
Intercept = -0.0111
Correlation, R = [illegible in source]
Anderson-Darling Test
Test Statistic = 0.730
Critical Value(0.05) = 0.769
Data appear Gamma Distributed

6.2.1.1.3 GOF Statistics

1. Click Stats/GOF > GOF > Univariate > No NDs > Statistics.

[Screenshot: Scout menu bar showing the path Stats/GOF > GOF > Univariate > No NDs > Statistics]

2. The "Select Variables" screen (Section 3.2) will appear.

•	Select one or more variables from the "Select Variables" screen.

•	If the statistics have to be produced by using a Group variable, then select
a group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

Click "Options" for GOF options.



-------
[Screenshot: GOF Statistics options dialog with the group Select Confidence Level (90%, 95%, 99%) and OK and Cancel buttons]

o The default option for the "Confidence Level" is "95%."

o Click "OK" to continue or "Cancel" to cancel the option.

Click "OK" to continue or "Cancel" to cancel the Goodness-of-Fit
Statistics.
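The Lilliefors statistic reported by the GOF Statistics option is the largest gap between the empirical distribution function and a normal distribution whose mean and standard deviation are estimated from the data. The sketch below is illustrative only, with hypothetical function names. The critical value uses the large-sample approximation 0.886/sqrt(n) tabulated for the 0.05 level when n is greater than 30; for n = 74 it gives 0.103, the critical value shown in the sample output for this option.

```python
import math
from statistics import NormalDist, fmean, stdev

def lilliefors_statistic(data):
    """Kolmogorov-Smirnov distance to a normal with estimated parameters."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # Compare the fitted CDF with the step function just after
        # and just before each ordered observation.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def lilliefors_critical_05(n):
    """Approximate 0.05-level critical value, valid only for n > 30."""
    if n <= 30:
        raise ValueError("approximation applies to n > 30; use a table")
    return 0.886 / math.sqrt(n)

# Made-up data, for illustration only.
d_stat = lilliefors_statistic([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
crit_74 = lilliefors_critical_05(74)
```

Normality is rejected at the 0.05 level when the statistic exceeds the critical value. For a lognormal test the same computation is applied to the log-transformed data.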


-------
Output for GOF Statistics for univariate data without Non-detects.

Goodness-of-Fit Test Statistics for Full Data Sets without Non-Detects

Date/Time of Computation   [illegible in source]
User Selected Options
From File                  D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\BEETLES
Full Precision             OFF
Confidence Coefficient     0.95

x2

Raw Statistics
Number of Valid Samples                       74
Number of Distinct Samples                    9
Minimum                                       8
Maximum                                       16
Mean of Raw Data                              12.99
Standard Deviation of Raw Data                2.142
Kstar                                         32.67
Mean of Log Transformed Data                  2.549
Standard Deviation of Log Transformed Data    0.177

Normal Distribution Test Results
Shapiro Wilk Test Statistic                   0.894
Shapiro Wilk Critical (0.95) Value            0.95
Lilliefors Test Statistic                     0.195
Lilliefors Critical (0.95) Value              0.103
Data not Normal at (0.05) Significance Level

Gamma Distribution Test Results
A-D Test Statistic                            3.183
A-D Critical (0.95) Value                     0.749
K-S Test Statistic                            0.214
K-S Critical (0.95) Value                     0.103
Data not Gamma Distributed at (0.05) Significance Level

Lognormal Distribution Test Results
Shapiro Wilk Test Statistic                   0.872
Shapiro Wilk Critical (0.95) Value            0.95
Lilliefors Test Statistic                     0.225
Lilliefors Critical (0.95) Value              0.103
Data not Lognormal at (0.05) Significance Level


-------
6.2.1.2 GOF Tests for Data Sets With NDs

6.2.1.2.1 GOF Tests Using Exclude NDs for Normal and Lognormal Distribution

1. Click Stats/GOF > GOF > Univariate > With NDs > Exclude NDs >
Normal or Lognormal.

[Screenshot: Scout menu bar showing the path Stats/GOF > GOF > Univariate > With NDs > Exclude NDs]

2. The "Select Variables" screen (Chapter 3) will appear.

° Select one or more variables from the "Select Variables" screen.

° If graphs have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o Click "Options" for GOF options.



-------
[Screenshot: "Goodness-of-Fit (Normal, Lognormal)" options dialog with the groups Select Confidence Level (90%, 95%, 99%), Method (Shapiro Wilk, Lilliefors), Display Regression Lines (Do Not Display, Display Regression Lines), Graphs by Group (Individual Graphs, Group Graphs), and Graphical Display Options (Color Gradient, For Export (BW Printers)), plus OK and Cancel buttons]

o The default option for the "Confidence Level" is "95%."

o The default GOF method is "Shapiro Wilk." If the sample size is
greater than 50, the program defaults to the "Lilliefors" test.

o The default for "Display Regression Lines" is "Do Not Display."
If you want to see regression lines on the associated Q-Q plot,
check the radio button next to "Display Regression Lines."

o The default option for "Graphs by Group" is "Individual
Graphs." If you want to see the plots for all selected variables on
a single graph, check the radio button next to "Group Graphs."


-------
Note: This option for Graphs by Group is specifically useful when the user wants to display multiple
graphs for a variable by a group variable (e.g., site AOC1, site AOC2, and background). This kind of
display represents a useful visual comparison of the values of a variable (e.g., concentrations of
COPC-Arsenic) collected from two or more groups (e.g., upgradient wells, monitoring wells, and residential
wells).

o The default option for Graphical Display Option is "Color
Gradient." If you want to see the graphs in black and white,
check the radio button next to "For Export (BW Printers)."

o Click "OK" to continue or "Cancel" to cancel the option.

• Click "OK" to continue or "Cancel" to cancel the goodness-of-fit tests.

Output Screen for Normal Distribution (Exclude NDs).

Selected options: Shapiro Wilk, Display Regression Lines, Group Graphs, and For Export (BW Printers).

[Figure: Normal Q-Q Plots (Statistics using Detected Data) for Arsenic (subsurface), Arsenic (surface). x-axis: Theoretical Quantiles (Standard Normal).]

Legend statistics for Arsenic (subsurface):
Total Number of Data = 10
Number treated as ND = 1
DL = 4.5
N = 9
Percent NDs = 10%
Mean = 5.6778
Sd = 0.5911
Slope = 0.1431
Intercept = 5.6778
Correlation, R = 0.2265
Shapiro-Wilk Test
Test Statistic = 0.927
Critical Value(0.05) = 0.829
Data appear Normal

Legend statistics for Arsenic (surface):
Total Number of Data = 10
Number treated as ND = 2
DL = 4.5
N = 8
Percent NDs = 20%
Mean = 6.6313
Sd = 1.4400
Slope = 0.1952
Intercept = 6.6313
Correlation, R = 0.1262
Shapiro-Wilk Test
Test Statistic = 0.807
Critical Value(0.05) = 0.818
Data not Normal


-------
Output Result for Lognormal Distribution (Exclude NDs).

Selected options: Shapiro Wilk, Display Regression Lines, Group Graphs, and Color Gradient.

[Figure: Lognormal Q-Q Plots (Statistics using Detected Data) for Arsenic (subsurface), Arsenic (surface). x-axis: Theoretical Quantiles (Standard Normal).]

Legend statistics for Arsenic (subsurface):
Total Number of Data = 10
Number treated as ND = 1
DL = 1.5040774
N = 9
Percent NDs = 10%
Mean = 1.7318
Sd = 0.1019
Slope = 0.1059
Intercept = 1.7319
Correlation, R = 0.8726
Shapiro-Wilk Test
Test Statistic = 0.938
Critical Val(0.05) = 0.829
Data appear Lognormal

Legend statistics for Arsenic (surface):
Total Number of Data = 10
Percent NDs = 20%
[Remaining counts, mean, Sd, slope, intercept, and correlation are illegible in the source]
Shapiro-Wilk Test
Test Statistic = 0.835
Critical Val(0.05) = 0.818
Data appear Lognormal

6.2.1.2.2 GOF Tests Using Exclude NDs for Gamma Distribution

1. Click Stats/GOF > GOF > Univariate > With NDs > Exclude NDs >
Gamma.

[Screenshot: Scout menu bar showing the path Stats/GOF > GOF > Univariate > With NDs > Exclude NDs > Gamma]

2. The "Select Variables" screen (Chapter 3) will appear.

• Select one or more variables from the "Select Variables" screen.

• If graphs have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

• Click "Options" for GOF options.

[Screenshot: "Goodness-of-Fit (Gamma)" options dialog with the groups Select Confidence Level (90%, 95%, 99%), Method (Anderson Darling, Kolmogorov Smirnov), Display Regression Lines (Do Not Display, Display Regression Lines), Graph by Groups (Individual Graphs, Group Graphs), and Graphical Display Options (Color Gradient, For Export (BW Printers)), plus OK and Cancel buttons]

o The default option for the "Confidence Level" is "95%."

o The default GOF test method is "Anderson Darling."

o The default option for "Display Regression Lines" is "Do Not
Display." If you want to see regression lines on the Gamma Q-Q plot,
check the radio button next to "Display Regression Lines."


-------
o The default option for "Graph by Groups" is "Individual Graphs."
If you want to display all selected variables on a single graph, check
the radio button next to "Group Graphs."

o The default option for "Graphical Display Options" is "Color
Gradient." If you want to see the graphs in black and white, check
the radio button next to "For Export (BW Printers)."

o Click "OK" to continue or "Cancel" to cancel the option.

• Click "OK" to continue or "Cancel" to cancel the goodness-of-fit tests.

Output Screen for Gamma Distribution (Exclude NDs).

Selected options: Anderson Darling, Do Not Display, Individual Graphs, and For Export (BW Printers).

[Figure: Gamma Q-Q Plot for Mercury with NDs, Statistics using Detected Data. x-axis: Theoretical Quantiles of Gamma Distribution.]

Legend statistics for Mercury:
Total Number of Data = 30
Number treated as ND = 5
DL = 0.5
N = 25
Percent NDs = 17%
Mean = 0.3124
k star = 1.0533
Slope = 1.0294
Intercept = -0.0048
Correlation, R = 0.9590
Anderson-Darling Test
Test Statistic = 0.861
Critical Value(0.05) = 0.770
Data not Gamma Distributed


-------
6.2.1.2.3 GOF Tests Using Log-ROS Estimates for Normal and Lognormal
Distribution

1. Click Stats/GOF > GOF > Univariate > With NDs > Log-ROS Estimates
> Normal or Lognormal.

[Screenshot: Scout menu bar showing the path Stats/GOF > GOF > Univariate > With NDs > Log-ROS Estimates]

2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

o If graphs have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o Click "Options" for GOF options.



-------
[Screenshot: "Goodness-of-Fit (Normal, Lognormal)" options dialog with the groups Select Confidence Level (90%, 95%, 99%), Method (Shapiro Wilk, Lilliefors), Display Regression Lines (Do Not Display, Display Regression Lines), Graphs by Group (Individual Graphs, Group Graphs), and Graphical Display Options (Color Gradient, For Export (BW Printers)), plus OK and Cancel buttons]

o The default option for the "Confidence Level" is "95%."

o The default GOF test method is "Shapiro Wilk." If the sample size is
greater than 50, the program defaults to use the "Lilliefors" test.

o The default method for "Display Regression Lines" is "Do Not
Display." If you want to see regression lines on the normal Q-Q plot,
check the radio button next to "Display Regression Lines."



-------
o The default option for "Graphs by Group" is "Individual Graphs."
If you want to display all selected variables on a single graph, check
the radio button next to "Group Graphs."

o The default option for "Graphical Display Options" is "Color
Gradient." If you want to see the graphs in black and white, check
the radio button next to "For Export (BW Printers)."

o Click "OK" to continue or "Cancel" to cancel the option.

• Click "OK" to continue or "Cancel" to cancel the goodness-of-fit tests.
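The idea behind the Log-ROS estimates used by this option, regression on order statistics in the log scale, can be sketched for the simplest case of a single detection limit with every non-detect below every detect. This is a simplified illustration under stated assumptions, not Scout's algorithm: the function name `log_ros_single_dl` is hypothetical, Blom plotting positions are assumed, and multiple detection limits are not handled.

```python
import math
from statistics import NormalDist, fmean

def log_ros_single_dl(detects, n_nondetects):
    """Impute non-detects by regression on order statistics (log scale).

    Simplified sketch: one detection limit, all non-detects rank below
    all detects. Returns the imputed values followed by the sorted detects.
    """
    k = n_nondetects
    xs = sorted(detects)
    n = k + len(xs)
    std_normal = NormalDist()
    # Normal quantiles for ranks 1..n (assumed Blom plotting positions).
    z = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    z_nd, z_det = z[:k], z[k:]
    # Fit ln(detect) = intercept + slope * z using the detected values only.
    y = [math.log(v) for v in xs]
    zbar, ybar = fmean(z_det), fmean(y)
    slope = (sum((a - zbar) * (b - ybar) for a, b in zip(z_det, y))
             / sum((a - zbar) ** 2 for a in z_det))
    intercept = ybar - slope * zbar
    # Extrapolate the fitted line to the ranks occupied by the non-detects.
    imputed = [math.exp(intercept + slope * a) for a in z_nd]
    return imputed + xs

# Made-up example: 6 detects and 2 non-detects.
ros_data = log_ros_single_dl([5.0, 5.6, 6.1, 7.0, 8.2, 9.5], 2)
```

The completed data set (imputed values plus detects) is what the Normal or Lognormal GOF test and Q-Q plot are then applied to, which is why the output legends for this option report N equal to the total number of observations.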

Output Screen for Normal Distribution (Log-ROS Estimates).

Selected options: Shapiro Wilk, Display Regression Lines, Group Graphs, and For Export (BW Printers).

[Figure: Normal Q-Q Plots using Robust ROS Method for Arsenic, Mercury. x-axis: Theoretical Quantiles (Standard Normal).]

Legend statistics for Arsenic:
N = 20
Mean = 5.8307
Sd = 1.2799
Slope = 1.2517
Intercept = 5.8307
Correlation, R = 0.9421
Shapiro-Wilk Test
Test Value = 0.895
Critical Val(0.05) = 0.905
Data not Normal

Legend statistics for Mercury:
N = 30
Mean = 0.2767
Sd = 0.2984
Slope = 0.2643
Intercept = 0.2767
Correlation, R = 0.8618
Shapiro-Wilk Test
Test Value = 0.733
Critical Val(0.05) = 0.927
Data not Normal


-------
Output Screen for Lognormal Distribution (Log-ROS Estimates).

Selected options: Shapiro Wilk, Display Regression Lines, Group Graphs, and Color Gradient.

[Figure: Lognormal Q-Q Plot for Group, Statistics using Robust ROS Method, for Arsenic (subsurface) and Arsenic (surface). x-axis: Theoretical Quantiles.]

Legend statistics for Arsenic (subsurface):
N = 10
Mean = 1.7068
Sd = 0.1246
Slope = 0.1308
Intercept = 1.7068
Correlation, R = 0.9866
Shapiro-Wilk Test
Test Statistic = 0.931
Critical Value(0.05) = 0.842
Data appear Lognormal

Legend statistics for Arsenic (surface):
N = 10
Mean = 1.7781
Sd = 0.2678
Slope = 0.2758
Intercept = 1.7781
Correlation, R = [illegible in source]
Shapiro-Wilk Test
Test Statistic = [illegible in source]
Critical Value(0.05) = 0.842
Data appear Lognormal

6.2.1.2.4 GOF Tests Using Log-ROS Estimates for Gamma Distribution

1. Click Stats/GOF > GOF > Univariate > With NDs > Log-ROS Estimates
> Gamma.

[Screenshot: Scout menu bar showing the path Stats/GOF > GOF > Univariate > With NDs > Log-ROS Estimates > Gamma]
-------
Click "Options" for GOF options.

[Screenshot: "Goodness-of-Fit (Gamma)" options dialog with the groups Select Confidence Level (90%, 95%, 99%), Method (Anderson Darling, Kolmogorov Smirnov), Display Regression Lines (Do Not Display, Display Regression Lines), Graph by Groups (Individual Graphs, Group Graphs), and Graphical Display Options (Color Gradient, For Export (BW Printers)), plus OK and Cancel buttons]

o The default option for the "Confidence Level" is "95%."

o The default GOF test method is "Anderson Darling."

o The default method for "Display Regression Lines" is "Do Not
Display." If you want to see regression lines on the gamma Q-Q plot,
check the radio button next to "Display Regression Lines."

o The default option for "Graph by Groups" is "Individual Graphs."
If you want to put all of the selected variables into a single graph,
check the radio button next to "Group Graphs."


-------
o The default option for "Graphical Display Options" is "Color
Gradient." If you want to see the graphs in black and white, check the
radio button next to "For Export (BW Printers)."

o Click "OK" to continue or "Cancel" to cancel the options.

• Click "OK" to continue or "Cancel" to cancel the goodness-of-fit tests.

Output Screen for Gamma Distribution (Log-ROS Estimates).

Selected options: Anderson Darling, Display Regression Lines, Individual Graphs, and Color Gradient.

Gamma Q-Q Plot for Mercury
Statistics using Robust ROS Method

Mercury
N = 30
Mean = 0.2767
k star = 1.0653
Slope = 1.1071
Intercept = -0.0262
Correlation, R = 0.9579
Anderson-Darling Test
Test Statistic = 1.425
Critical Value(0.05) = 0.772
Data not Gamma Distributed



-------
6.2.1.2.5 GOF Tests Using DL/2 Estimates for Normal or Lognormal Distribution

1. Click Stats/GOF > GOF > Univariate > With NDs > DL/2 Estimates >
Normal or Lognormal.


2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

o If graphs have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o Click "Options" for GOF options.



-------
[Options window "Goodness-of-Fit (Normal, Lognormal)":
  Select Confidence Level: 90%, 95%, 99%
  Method: Shapiro Wilk (selected), Lilliefors
  Display Regression Lines: Do Not Display, Display Regression Lines (selected)
  Graphs by Group: Individual Graphs (selected), Group Graphs
  Graphical Display Options: Color Gradient (selected), For Export (BW Printers)
  Buttons: OK, Cancel]

o The default option for the "Confidence Level" is "95%."

o The default method is "Shapiro Wilk." If the sample size is greater
than 50, the program defaults to the "Lilliefors" test.

o The default method for "Display Regression Lines" is "Do Not
Display." If you want to see regression lines on the normal Q-Q plot,
check the radio button next to "Display Regression Lines."



-------
o The default option for "Graphs by Group" is "Individual Graphs."
If you want to put all of the selected variables into a single graph,
check the radio button next to "Group Graphs."

o The default option for "Graphical Display Options" is "Color
Gradient." If you want to see the graphs in black and white, check
the radio button next to "For Export (BW Printers)."

o Click "OK" to continue or "Cancel" to cancel the option.

• Click "OK" to continue or "Cancel" to cancel the goodness-of-fit tests.

Output Screen for Normal Distribution (DL/2 Estimates).

Selected options: Shapiro Wilk, Display Regression Lines, Group Graphs, and Color Gradient.



-------
Output Screen for Lognormal Distribution (DL/2 Estimates).

Selected options: Shapiro Wilk, Display Regression Lines, Individual Graphs, and For Export (BW
Printers).





Lognormal Q-Q Plot for Mercury
Statistics using DL/2 Method

Mercury
N = 30
Mean = -1.7413
Sd = 0.9845
Slope = 0.9865
Intercept = -1.7413
Correlation, R = 0.9748
Shapiro Wilk Test
Test Statistic = 0.931
Critical Value(0.05) = 0.927
Data appear Lognormal

X-axis: Theoretical Quantiles of Gamma Distribution
Legend: Mercury
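
The slope, intercept, and correlation printed on a Q-Q plot can be approximated by regressing the ordered data on theoretical quantiles. The Python sketch below is illustrative only: the function name `qq_line`, the Blom plotting positions, and the sample values are assumptions, not Scout's documented internals. Because the normal scores are symmetric about zero, the fitted intercept equals the sample mean, which is consistent with the Mean and Intercept values agreeing in the output above.

```python
import math
from statistics import NormalDist, mean

def qq_line(values):
    """Regress ordered data on standard normal quantiles.

    Returns (slope, intercept, correlation).
    """
    y = sorted(values)
    n = len(y)
    # Blom plotting positions (an assumption; Scout's choice is not stated here)
    x = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical log-transformed concentrations (not from the guide's data file)
logs = [-3.2, -2.9, -2.5, -2.1, -1.8, -1.6, -1.1, -0.7, -0.4, 0.1]
slope, intercept, r = qq_line(logs)
print(f"slope={slope:.4f} intercept={intercept:.4f} r={r:.4f}")
```

For a lognormal Q-Q plot the same routine is applied to the log-transformed data, as in the hypothetical example.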







6.2.1.2.6 GOF Tests Using DL/2 Estimates for Gamma Distribution

1. Click Stats/GOF > GOF > Univariate > With NDs > DL/2 Estimates >
Gamma.


2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

o If graphs have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o Click "Options" for GOF options.

[Options window "Goodness-of-Fit (Gamma)":
  Select Confidence Level: 90%, 95%, 99%
  Graph by Groups: Individual Graphs (selected), Group Graphs
  Graphical Display Options: Color Gradient, For Export (BW Printers)
  Method: Anderson Darling (selected), Kolmogorov Smirnov
  Display Regression Lines: Do Not Display, Display Regression Lines (selected)
  Buttons: OK, Cancel]

o The default option for the "Confidence Level" is "95%."

o The default method is "Anderson Darling."

o The default method for "Display Regression Lines" is "Do Not
Display." If you want to see regression lines on the gamma Q-Q plot,
check the radio button next to "Display Regression Lines."


-------
o The default option for "Graph by Groups" is "Individual Graphs."
If you want to put all of the selected variables into a single graph,
check the radio button next to "Group Graphs."

o The default option for "Graphical Display Options" is "Color
Gradient." If you want to see the graphs in black and white, check
the radio button next to "For Export (BW Printers)."

o Click "OK" to continue or "Cancel" to cancel the options.

• Click "OK" to continue or "Cancel" to cancel the goodness-of-fit tests.

Output Screen for Gamma Distribution (DL/2 Estimates).

Selected options: Anderson Darling, Display Regression Lines, Individual Graphs, and Color Gradient.



Gamma Q-Q Plot for Mercury
Statistics using DL/2 Substitution Method

Mercury
N = 30
Mean = 0.2829
k star = 1.0875
Slope = 1.0897
Intercept = [illegible]
Correlation, R = 0.9617
Anderson-Darling Test
Test Statistic = 1.128
Critical Value(0.05) = 0.771
Data not Gamma Distributed

X-axis: Theoretical Quantiles of Gamma Distribution
Legend: Mercury











-------
6.2.1.2.7 GOF Statistics

1. Click Stats/GOF > GOF > Univariate > With NDs > Statistics.


2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

° If the statistics have to be produced by using a Group variable, then select
a group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o Click "Options" for GOF options.

[Options window "Options GOF Stats":
  Select Confidence Level: 90%, 95% (selected), 99%
  Buttons: OK, Cancel]

o The default option for the "Confidence Level" is "95%."

o Click "OK" to continue or "Cancel" to cancel the option.

° Click "OK" to continue or "Cancel" to cancel the Goodness-of-Fit
Statistics.



-------
Output for GOF Statistics for univariate data with Non-detects.

Goodness-of-Fit Test Statistics for Data Sets with Non-Detects

Date/Time of Computation    1/25/2008 1:01:29 PM
User Selected Options
From File                   D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
Full Precision              OFF
Confidence Coefficient      0.95

Group1X

                 Obs No.   Num Miss   Num Valid   Detects   NDs   % NDs
Group1X Data     10        0          10          8         2     20.00%

                                              Number   Minimum   Maximum   Mean    Median   SD
Statistics (Non-Detects Only)                 2        4         4         4       4        0
Statistics (Detects Only)                     8        3.202     20.78     9.277   6.704    6.283
Statistics (All: NDs treated as DL value)     10       3.202     20.78     8.222   5.347    5.971
Statistics (All: NDs treated as DL/2 value)   10       2         20.78     7.822   5.347    6.334
Statistics (Normal ROS Estimated Data)        10       -2.508    20.78     7.256   5.347    7.034
Statistics (Gamma ROS Estimated Data)         10       1.421     20.78     8.027   5.405    6.182
Statistics (Lognormal ROS Estimated Data)     10       2.011     20.78     7.917   5.347    6.243

                                        K Hat   K Star   Theta Hat   Log Mean   Log Stdv   Log CV
Statistics (Detects Only)               2.674   1.938    3.469       2.029      0.673      0.332
Statistics (NDs = DL)                   2.578   1.872    3.189       1.901      0.652      0.343
Statistics (NDs = DL/2)                 1.844   1.357    4.242       1.762      0.818      0.464
Statistics (Gamma ROS Estimates)        1.995   1.463    4.024
Statistics (Lognormal ROS Estimates)                                 1.801      0.769      0.427
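
The substitution rows of this output can be checked from the detects-only summary. The Python sketch below is only an illustration that uses the rounded values printed in the example output; the function and variable names are hypothetical. It recomputes the mean when the two non-detects (both reported at a detection limit of 4) are replaced by DL or by DL/2, and the gamma scale estimate (Theta Hat) as the mean divided by K Hat.

```python
# Values read from the example output (rounded as printed)
n_detects, mean_detects = 8, 9.277
n_nds, dl = 2, 4.0           # both non-detects are reported at a detection limit of 4
k_hat_detects = 2.674

def substituted_mean(fraction_of_dl):
    """Mean of all observations when each ND is replaced by fraction_of_dl * DL."""
    total = n_detects * mean_detects + n_nds * fraction_of_dl * dl
    return total / (n_detects + n_nds)

mean_dl = substituted_mean(1.0)      # NDs treated as DL
mean_half = substituted_mean(0.5)    # NDs treated as DL/2
theta_hat = mean_detects / k_hat_detects   # gamma scale = mean / shape

print(round(mean_dl, 3), round(mean_half, 3), round(theta_hat, 3))
```

The three results agree with the printed values 8.222, 7.822, and 3.469 to within rounding.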



-------
Output for GOF Statistics for univariate data with Non-detects (continued).

Normal Distribution Test Results

                                         Test value   Crit. (0.95)   Conclusion with Alpha(0.05)
Shapiro-Wilks (Detects Only)             0.866        0.818          Data Appear Normal
Lilliefors (Detects Only)                0.253        0.313          Data Appear Normal
Shapiro-Wilks (NDs = DL)                 0.796        0.842          Data Not Normal
Lilliefors (NDs = DL)                    0.266        0.28           Data Appear Normal
Shapiro-Wilks (NDs = DL/2)               0.848        0.842          Data Appear Normal
Lilliefors (NDs = DL/2)                  0.237        0.28           Data Appear Normal
Shapiro-Wilks (Normal ROS Estimates)     0.941        0.842          Data Appear Normal
Lilliefors (Normal ROS Estimates)        0.201        0.28           Data Appear Normal

Gamma Distribution Test Results

                                         Test value   Crit. (0.95)   Conclusion with Alpha(0.05)
Anderson-Darling (Detects Only)          0.404        0.722
Kolmogorov-Smirnov (Detects Only)        0.197        0.297          Data Appear Gamma Distributed
Anderson-Darling (NDs = DL)              0.737        0.734
Kolmogorov-Smirnov (NDs = DL)            0.244        0.269          Data appear Approximate Gamma Distribution
Anderson-Darling (NDs = DL/2)            0.367        0.737
Kolmogorov-Smirnov (NDs = DL/2)          0.165        0.27           Data Appear Gamma Distributed
Anderson-Darling (Gamma ROS Estimates)   0.355        0.736
Kolmogorov-Smirnov (Gamma ROS Est.)      0.178        0.27           Data Appear Gamma Distributed

Lognormal Distribution Test Results

                                          Test value   Crit. (0.95)   Conclusion with Alpha(0.05)
Shapiro-Wilks (Detects Only)              0.932        0.818          Data Appear Lognormal
Lilliefors (Detects Only)                 0.191        0.313          Data Appear Lognormal
Shapiro-Wilks (NDs = DL)                  0.878        0.842          Data Appear Lognormal
Lilliefors (NDs = DL)                     0.226        0.28           Data Appear Lognormal
Shapiro-Wilks (NDs = DL/2)                0.94         0.842          Data Appear Lognormal
Lilliefors (NDs = DL/2)                   0.157        0.28           Data Appear Lognormal
Shapiro-Wilks (Lognormal ROS Estimates)   0.951        0.842          Data Appear Lognormal
Lilliefors (Lognormal ROS Estimates)      0.161        0.28           Data Appear Lognormal

Note: DL/2 is not a recommended method
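
The Lilliefors critical values in this output (0.313 for the 8 detects, 0.28 for all 10 observations) agree with the widely used large-sample approximation 0.886 divided by the square root of n at the 5% level. The short Python check below illustrates that agreement; whether Scout computes its critical values in exactly this way is an assumption, and the function name is hypothetical.

```python
import math

def lilliefors_crit_05(n):
    """Large-sample approximation to the 5% Lilliefors critical value."""
    return 0.886 / math.sqrt(n)

# Compare the approximation with the critical values printed in the output
for n, printed in [(8, 0.313), (10, 0.28)]:
    approx = lilliefors_crit_05(n)
    print(n, round(approx, 3), printed)
```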





-------
6.2.2 Multivariate GOF

Scout can perform a multivariate goodness-of-fit test to assess the
multinormality of a data set. Several test statistics are available in Scout,
including the correlation coefficient based upon ordered Mahalanobis distances
(MDs) versus beta distribution quantiles (and also approximate chi-square
quantiles), multivariate kurtosis, and multivariate skewness. The details of
those statistics can be found in Singh (1993) and Mardia (1970).
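
As an illustration of two of these statistics, the Python sketch below computes multivariate skewness and kurtosis following the Mardia (1970) definitions, using the maximum-likelihood covariance (divisor n). It is a minimal sketch with hypothetical function names and data; the scaling Scout applies to the values it reports is not stated here, and the beta-quantile correlation is not covered.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance(rows, mu):
    """Maximum-likelihood covariance matrix (divisor n)."""
    n, p = len(rows), len(mu)
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - mu[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += d[a] * d[b] / n
    return cov

def invert(m):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    p = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(m)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]
        for r in range(p):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[p:] for row in aug]

def mardia(rows):
    """Return (skewness b1, kurtosis b2, squared Mahalanobis distances)."""
    n = len(rows)
    mu = mean_vector(rows)
    p = len(mu)
    s_inv = invert(covariance(rows, mu))
    dev = [[r[j] - mu[j] for j in range(p)] for r in rows]

    def quad(u, v):
        # u' S^-1 v
        return sum(u[a] * s_inv[a][b] * v[b] for a in range(p) for b in range(p))

    g = [[quad(u, v) for v in dev] for u in dev]
    b1 = sum(g[i][j] ** 3 for i in range(n) for j in range(n)) / n ** 2
    b2 = sum(g[i][i] ** 2 for i in range(n)) / n
    return b1, b2, [g[i][i] for i in range(n)]

# Hypothetical bivariate data, symmetric about its mean
data = [(1.0, 2.0), (-1.0, -2.0), (2.0, 1.0), (-2.0, -1.0), (0.5, -1.5), (-0.5, 1.5)]
b1, b2, md2 = mardia(data)
print(b1, b2)
```

For data that are symmetric about their mean, as in this hypothetical example, the skewness is zero up to floating-point error, and the squared MDs average to the number of variables.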

1. Click Stats/GOF > GOF > Multivariate.


2. The "Select Variables" screen (Section 3.4) will appear.

o Select two or more variables from the "Select Variables" screen.

° If graphs have to be produced by using a Group variable, then select a group
variable by clicking the arrow below the "Group by Variable" button. This
will result in a drop-down list of available variables. The user should select
and click on an appropriate variable representing a group variable.

o Click "Options" for the multivariate GOF options.


-------
o Specify the preferred "Critical Alpha." The default is "0.05."

o Specify the distribution (scaled beta or approximate chi-square) of
the MDs used to compute the quantiles. The default is a "Beta"
distribution.

o The default option for "Display Regression Lines" is "Do Not
Display", and the default option for "Graphical Display Options"
is "Color Gradient."

o Click on "OK" to continue or "Cancel" to cancel the GOF options.

• Click on "OK" to continue or "Cancel" to cancel the GOF computations.

Output Screen for Multivariate GOF.

GOF Q-Q Plot of MDs
Multivariate GOF Statistics

N = 75
P = 4
Slope = 2.8882
Intercept = -1.7628
Skewness(0.05) = 2.3990
Skewness = 31.0467
Kurtosis(0.05) = 25.2002
Kurtosis = 53.9679
Beta Correlation Coefficient(0.05) = 0.994 (trailing digit illegible)
Beta Correlation Coefficient = 0.8738
Data set does not appear to be Multinormal

X-axis: Beta Quantiles



Note: Several test statistics (correlation coefficient, skewness, and kurtosis) are shown in the above GOF
display. Singh (1993) has outlined some of these procedures to assess multivariate normality. Critical
values for these three statistics have been computed using extensive Monte Carlo simulations. Critical
values are still being simulated at the time of publishing this document. These values will be available in
the Q-Q plots in the near future. The developers of Scout may be contacted to obtain these critical values.
They do plan to publish them in the near future.



-------
6.3 Hypothesis Testing

Scout can perform hypothesis tests on data sets with and without ND observations. When
two-sample hypothesis tests are used on data sets with NDs, Scout assumes that
samples from both of the groups have non-detect observations. This means that an ND
column (with 0 or 1 entries only) needs to be provided for the variable in each of the two
groups. This has to be done even if one of the groups has all detected entries; in this
case, the associated ND column will have all entries equal to "1." This allows the
user to compare two groups (e.g., arsenic in background vs. site samples) with one group
having NDs and the other group having all detected data.
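
The ND column convention described above can be sketched as follows. This Python fragment is only an illustration with hypothetical names and values; it pairs each observation with an indicator (1 = detected, 0 = non-detect) and fills the column with 1s for a group that has no non-detects.

```python
def with_nd_column(values, nondetect_flags=None):
    """Pair each value with an ND indicator: 1 = detected, 0 = non-detect."""
    if nondetect_flags is None:
        # A group with no non-detects still needs the column, filled with 1s
        nondetect_flags = [False] * len(values)
    return [(v, 0 if is_nd else 1) for v, is_nd in zip(values, nondetect_flags)]

# Hypothetical arsenic data: background has two NDs, site is fully detected
background = with_nd_column([0.9, 2.0, 1.4, 3.1], [True, True, False, False])
site = with_nd_column([4.5, 7.2, 20.8, 14.1])
print(background)
print(site)
```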

The hypothesis testing module of Scout is exactly the same as the one available in ProUCL
4.00.04. ProUCL 4.00.04 has been developed to address several environmental
applications. More information on those methods can be obtained from the ProUCL
4.00.04 Technical Guide and User Guide (Chapter 9).

Note: Since the hypothesis testing module of Scout is imported from ProUCL 4.00.04, most of the
terminology used (site concentration, background concentration, background threshold values, etc.) is
borrowed from various environmental applications. However, all of those tools (e.g., t-test, Gehan test)
can be used in various other applications. For example, a two-sample t-test can be used to compare the
means of the distributions of any two variables. Similarly, the Gehan test may be used to compare the
measures of central tendency of two distributions based upon data sets with below detection limit
observations.

6.3.1.1 Single Sample Hypothesis Tests for Data Sets with No Non-detects

6.3.1.1.1 Single Sample t-Test

1. Click Stats/GOF > Hypothesis Testing > Single Sample > No NDs >
t-Test.


2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

o If the statistics have to be produced by using a Group variable, then select
a group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.



-------
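o When the options button is clicked, the following window will be shown.

[Options window for the single sample t-test:
  Confidence Level: 0.95
  Substantial Difference, S (used with Test Form 2)
  Compliance Limit: 0
  Select Null Hypothesis Form:
    Mean <= Compliance Limit (Form 1)
    Mean >= Compliance Limit (Form 2)
    Mean >= Compliance Limit + S (Form 2)
    Mean = Compliance Limit (2 Sided Alternative)
  Buttons: OK, Cancel]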

o Specify the "Confidence Level." The default is "0.95."

o Specify meaningful values for "Substantial Difference, S" and the
"Compliance Limit." The default choice for S is "0."

o Select the form of Null Hypothesis. The default is Mean <=
Compliance Limit (Form 1).

o Click "OK" to continue or "Cancel" to cancel the options.

Click "OK" to continue or "Cancel" to cancel the test.



-------
Output for Single Sample t-Test (Full Data without NDs).

Sample-1

Single Sample t-Test

Raw Statistics
Number of Valid Samples            9
Number of Distinct Samples         9
Minimum                            82.39
Maximum                            113.2
Mean                               99.38
Median                             103.5
SD                                 10.41
SE of Mean                         3.463

H0: Site Mean = 100
Test Value                         -0.178
Two Sided Critical Value (0.05)    2.306
P-Value                            0.863

Conclusion with Alpha = 0.05
Do Not Reject H0. Conclude Mean = 100
P-Value > Alpha (0.05)
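
The test value in this output follows from the summary statistics. The Python sketch below is a hedged illustration, not Scout's code: the function name is hypothetical, the sample size of 9 is inferred from the number of distinct samples and the critical value, and the mean is taken as 99.38, the value implied by the printed test value and standard error. It computes t = (mean - limit) / (SD / sqrt(n)) and compares it with the printed two-sided critical value.

```python
import math

def one_sample_t(mean, sd, n, limit):
    """Return (t statistic, degrees of freedom) for H0: mean = limit."""
    se = sd / math.sqrt(n)
    return (mean - limit) / se, n - 1

# Summary values from the example output (mean inferred as described above)
t, df = one_sample_t(mean=99.38, sd=10.41, n=9, limit=100)
t_crit = 2.306  # printed two-sided 5% critical value for 8 degrees of freedom
reject = abs(t) > t_crit
print(round(t, 3), df, reject)
```

Because |t| is well below the critical value, the null hypothesis is not rejected, matching the conclusion in the output.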

6.3.1.1.2 Single Sample Proportion Test

1. Click Stats/GOF > Hypothesis Testing > Single Sample > No NDs >
Proportion.


2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

o If the statistics have to be produced by using a Group variable, then select
a group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o When the options button is clicked, the following window will be shown.

[Options window "Single Sample Proportion Test Options":
  Confidence Level: 0.95
  Proportion: 0.3
  Action/Compliance Limit: 6
  Select Null Hypothesis Form:
    P <= Proportion (Form 1) (selected)
    P >= Proportion (Form 2)
    P = Proportion (2 Sided Alternative)
  Buttons: OK, Cancel]

o Specify the "Confidence Level." The default is "0.95."

o Specify the "Proportion" level and a meaningful
"Action/Compliance Limit."

o Select the form of Null Hypothesis. The default is P <= Proportion
(Form 1).

o Click "OK" to continue or "Cancel" to cancel the options.

Click "OK" to continue or "Cancel" to cancel the test.


-------
Output for Single Sample Proportion Test (Full Data without NDs).

One-Sample Proportion Test

Raw Statistics
Number of Valid Samples              85
Number of Distinct Samples           33
Minimum                              [illegible]
Maximum                              7.676
Mean                                 5.133
Median                               5.564
SD                                   1.583
SE of Mean                           0.172
Number of Exceedances                27
Sample Proportion of Exceedances     0.318

H0: Site Proportion <= 0.3 (Form 1)
Large Sample z-Test Value            0.237
Critical Value (0.05)                1.645
P-Value                              0.406

Conclusion with Alpha = 0.05
Do Not Reject H0. Conclude Site Proportion <= 0.3
P-Value > Alpha (0.05)
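
The large-sample z value in this output can be reproduced from the exceedance count. In the Python sketch below, the sample size of 85 is inferred from 27 exceedances and a sample proportion of 0.318, and the continuity correction of 0.5 is an assumption that happens to match the printed test value; the function name is hypothetical and this is not presented as Scout's implementation.

```python
import math

def proportion_z_upper(exceedances, n, p0):
    """Continuity-corrected z statistic and upper-tail p-value for H0: P <= p0."""
    z = (exceedances - 0.5 - n * p0) / math.sqrt(n * p0 * (1 - p0))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # 1 - Phi(z)
    return z, p_value

z, p = proportion_z_upper(exceedances=27, n=85, p0=0.3)
print(round(z, 3), round(p, 3))
```

The results agree with the printed test value (0.237) and P-value (0.406) to within rounding.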

	


6.3.1.1.3 Single Sample Sign Test

1. Click Stats/GOF > Hypothesis Testing > Single Sample > No NDs > Sign
test.



-------
2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

o If the statistics have to be produced by using a Group variable, then select
a group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

° When the options button is clicked, the following window will be shown.

[Options window "Single Sample Sign Test Options":
  Confidence Level: 0.95
  Substantial Difference, S (used with Test Form 2)
  Action/Compliance Limit: 0
  Select Null Hypothesis Form:
    Median <= Compliance Limit (Form 1) (selected)
    Median >= Compliance Limit (Form 2)
    Median >= Compliance Limit + S (Form 2)
    Median = Compliance Limit (2 Sided Alternative)
  Buttons: OK, Cancel]



o Specify the "Confidence Level." The default choice is "0.95."

o Specify meaningful values for "Substantial Difference, S" and
"Action/Compliance Limit."

o Select the form of Null Hypothesis. The default is Median <=
Compliance Limit (Form 1).

o Click "OK" to continue or "Cancel" to cancel the options.

° Click "OK" to continue or "Cancel" to cancel the test.



-------
Output for Single Sample Sign Test (Full Data without NDs).

Single Sample Sign Test

Raw Statistics
Number of Valid Samples        10
Number of Distinct Samples     10
Minimum                        750
Maximum                        1161
Mean                           925.7
Median                         [illegible]
SD                             136.7
SE of Mean                     43.24
Number Above Limit             3
Number Equal Limit             0
Number Below Limit             7

H0: Site Median >= 1000 (Form 2)
Test Value                     3
Lower Critical Value (0.05)    1
P-Value                        0.172

Conclusion with Alpha = 0.05
Do Not Reject H0. Conclude Median >= 1000
P-Value > Alpha (0.05)
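
For the Form 2 null hypothesis in this example, the sign test statistic is the number of observations above the limit (3 of 10, with none equal to it). A lower-tail P-value can then be obtained from the binomial distribution with success probability 0.5. The Python sketch below illustrates the calculation under that standard formulation; the function name is hypothetical and the treatment of ties (dropped) is an assumption.

```python
from math import comb

def sign_test_lower_p(n_above, n_below):
    """P(B <= n_above) for B ~ Binomial(n_above + n_below, 0.5); ties are dropped."""
    n = n_above + n_below
    return sum(comb(n, k) for k in range(n_above + 1)) / 2 ** n

p = sign_test_lower_p(n_above=3, n_below=7)
print(round(p, 3))
```

Since this probability exceeds 0.05, the null hypothesis is not rejected, in line with the conclusion shown in the output.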







6.3.1.1.4 Single Sample Wilcoxon Signed Rank Test

1. Click Stats/GOF > Hypothesis Testing > Single Sample > No NDs >
Wilcoxon Signed Rank test.



-------
2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

° If the statistics have to be produced by using a Group variable, then select
a group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

° When the options button is clicked, the following window will be shown.

[Options window for the single sample Wilcoxon Signed Rank test:
  Confidence Level: 0.95
  Substantial Difference, S (used with Test Form 2)
  Action/Compliance Limit: 0
  Select Null Hypothesis Form:
    Mean/Median <= Compliance Limit (Form 1) (selected)
    Mean/Median >= Compliance Limit (Form 2)
    Mean/Median >= Compliance Limit + S (Form 2)
    Mean/Median = Compliance Limit (2 Sided Alternative)
  Buttons: OK, Cancel]

o Specify the "Confidence Level." The default is "0.95."

o Specify meaningful values for "Substantial Difference, S," and
"Action/Compliance Limit."

o Select the form of Null Hypothesis. The default is Mean/Median <=
Compliance Limit (Form 1).

o Click "OK" to continue or "Cancel" to cancel the option.

o Click "OK" to continue or "Cancel" to cancel the test.



-------
Output for Single Sample Wilcoxon Signed Rank Test (Full Data without NDs).

Single Sample Wilcoxon Signed Rank Test

Raw Statistics
Number of Valid Samples        10
Number of Distinct Samples     10
Minimum                        750
Maximum                        1161
Mean                           925.7
Median                         [illegible]
SD                             136.7
SE of Mean                     43.24
Number Above Limit             3
Number Equal Limit             0
Number Below Limit             7
T-plus                         11.5
T-minus                        43.5

H0: Site Median <= 1000 (Form 1)
Test Value                     11.5
Critical Value (0.05)          45
P-Value                        0.947

Conclusion with Alpha = 0.05
Do Not Reject H0. Conclude Mean/Median <= 1000
P-Value > Alpha (0.05)
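
T-plus and T-minus are the sums of the ranks of the absolute differences from the limit, taken over the positive and negative differences respectively, so they always add up to n(n + 1)/2 (55 for the 10 observations above). The Python sketch below shows the calculation on hypothetical data, since the raw values behind the example output are not listed; the function name and the handling of zero differences (dropped) are assumptions.

```python
def signed_rank_sums(values, limit):
    """Return (T_plus, T_minus) using average ranks for tied |differences|."""
    diffs = [v - limit for v in values if v != limit]  # zero differences dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return t_plus, t_minus

# Hypothetical data (the guide's raw values are not listed)
sample = [750, 820, 860, 890, 905, 940, 985, 1040, 1100, 1161]
t_plus, t_minus = signed_rank_sums(sample, limit=1000)
print(t_plus, t_minus)
```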

6.3.1.2 Single Sample Hypothesis Tests for Data Sets With Non-detects
6.3.1.2.1 Single Sample Proportion Test

1. Click Stats/GOF > Hypothesis Testing > Single Sample > With NDs >
Proportion test.



-------
2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

» If the statistics have to be produced by using a Group variable, then select
a group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

° When the options button is clicked, the following window will be shown.

[Options window "Single Sample Proportion Test Options":
  Confidence Level: 0.95
  Proportion: 0.3
  Action/Compliance Limit: 6
  Select Null Hypothesis Form:
    P <= Proportion (Form 1) (selected)
    P >= Proportion (Form 2)
    P = Proportion (2 Sided Alternative)
  Buttons: OK, Cancel]

o Specify the "Confidence Level." The default is "0.95."

o Specify meaningful values for "Proportion" and the
"Action/Compliance Limit."

o Select the form of Null Hypothesis. The default is P <= Proportion
(Form 1).

o Click "OK" to continue or "Cancel" to cancel the option.

° Click "OK" to continue or "Cancel" to cancel the test.



-------
Output for Single Sample Proportion Test (with NDs).

Arsenic

Single Sample Proportion Test

Raw Statistics
Number of Valid Samples              24
Number of Distinct Samples           10
Number of Non-Detect Data            13
Number of Detected Data              11
Percent Non-Detects                  54.17%
Minimum Non-detect                   0.9
Maximum Non-detect                   2
Minimum Detected                     0.5
Maximum Detected                     3.2
Mean of Detected Data                1.236
Median of Detected Data              0.7
SD of Detected Data                  0.965
Number of Exceedances                2
Sample Proportion of Exceedances     0.0833

Some Non-Detect Values Exceed
The User Selected Action/Compliance Limit
Unable to do Proportion Test with such parameters
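
The message in this output reflects that a non-detect reported at a detection limit above the action/compliance limit cannot be classified as an exceedance or a non-exceedance. The Python sketch below illustrates such a check on hypothetical data; the function name, the strict ">" comparison, and the use of an exception are assumptions made for the example, not Scout's documented behavior.

```python
def proportion_test_inputs(values, detected, limit):
    """Count exceedances, refusing to proceed if an ND limit exceeds `limit`.

    `detected` uses the guide's convention: 1 = detected, 0 = non-detect.
    """
    nd_limits = [v for v, d in zip(values, detected) if d == 0]
    if any(dl > limit for dl in nd_limits):
        raise ValueError("some non-detect values exceed the action/compliance limit")
    exceed = sum(1 for v, d in zip(values, detected) if d == 1 and v > limit)
    return exceed, exceed / len(values)

# Hypothetical data: the ND reported at 2.0 makes a limit of 1.5 unusable
vals = [0.5, 0.7, 0.9, 2.0, 3.2]
flags = [1, 1, 0, 0, 1]
usable = proportion_test_inputs(vals, flags, limit=2.5)
print(usable)
try:
    proportion_test_inputs(vals, flags, limit=1.5)
    blocked = False
except ValueError as err:
    blocked = True
    print("cannot test:", err)
```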



-------
6.3.1.2.2 Single Sample Sign Test

1. Click Stats/GOF > Hypothesis Testing > Single Sample > With NDs >
Sign test.

2. The "Select Variables" screen (Section 3.2) will appear.

° Select one or more variables from the "Select Variables" screen.

° If the statistics have to be produced by using a Group variable, then select
a group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

° When the options button is clicked, the following window will be shown.

[Options window "Single Sample Sign Test Options":
  Confidence Level: 0.95
  Substantial Difference, S (used with Test Form 2)
  Action/Compliance Limit: 0
  Select Null Hypothesis Form:
    Median <= Compliance Limit (Form 1) (selected)
    Median >= Compliance Limit (Form 2)
    Median >= Compliance Limit + S (Form 2)
    Median = Compliance Limit (2 Sided Alternative)
  Buttons: OK, Cancel]



-------
o Specify the "Confidence Level." The default is "0.95."

o Specify meaningful values for "Substantial Difference, S" and
"Action/Compliance Limit."

o Select the form of Null Hypothesis. The default is Median <=
Compliance Limit (Form 1).

o Click "OK" to continue or "Cancel" to cancel the option.

o Click "OK" to continue or "Cancel" to cancel the test.

Output for Single Sample Sign Test (Data with Non-detects).

Arsenic

Single Sample Sign Test

Raw Statistics
Number of Valid Samples        24
Number of Distinct Samples     10
Number of Non-Detect Data      13
Number of Detected Data        11
Percent Non-Detects            54.17%
Minimum Non-detect             0.9
Maximum Non-detect             2
Minimum Detected               0.5
Maximum Detected               3.2
Mean of Detected Data          1.236
Median of Detected Data        0.7
SD of Detected Data            0.965
Number Above Limit             0
Number Equal Limit             0
Number Below Limit             24

H0: Site Median <= 5 (Form 1)
Test Value                     [illegible]
Upper Critical Value (0.05)    [illegible]
P-Value                        [illegible]

Conclusion with Alpha = 0.05
Do Not Reject H0. Conclude Median <= 5
P-Value > Alpha (0.05)



-------
6.3.1.2.3 Single Sample Wilcoxon Signed Rank Test

1. Click Stats/GOF > Hypothesis Testing > Single Sample > With NDs >
Wilcoxon Signed Rank test.


2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

o If the statistics have to be produced by using a Group variable, then select
a group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o When the options button is clicked, the following window will be shown.

Single Sample Wilcoxon Signed Rank Test Options

  Confidence Level                                     0.95
  Substantial Difference, S (Used with Test Form 2)    0
  Action/Compliance Limit                              0

  Select Null Hypothesis Form
    (*) Mean/Median <= Compliance Limit (Form 1)
    ( ) Mean/Median >= Compliance Limit (Form 2)
    ( ) Mean/Median >= Compliance Limit + S (Form 2)
    ( ) Mean/Median = Compliance Limit (2 Sided Alternative)

  OK    Cancel


-------
o Specify the "Confidence Level." The default is "0.95."

o Specify meaningful values for "Substantial Difference, S" and
"Action/Compliance Limit."

o Select the form of Null Hypothesis. The default is Mean/Median <=
Compliance Limit (Form 1).

o Click "OK" to continue or "Cancel" to cancel the option.

o Click "OK" to continue or "Cancel" to cancel the test.

Output for Single Sample Wilcoxon Signed Rank Test (Data with Non-detects).

Arsenic

Single Sample Wilcoxon Signed Rank Test

Raw Statistics
  Number of Valid Samples        24
  Number of Distinct Samples     10
  Number of Non-Detect Data      13
  Number of Detected Data        11
  Percent Non-Detects            54.17%
  Minimum Non-detect             0.9
  Maximum Non-detect             2
  Minimum Detected               0.5
  Maximum Detected               3.2
  Mean of Detected Data          1.236
  Median of Detected Data        0.7
  SD of Detected Data            0.965
  Number Above Limit             0
  Number Equal Limit             0
  Number Below Limit             24
  T-plus                         0
  T-minus                        300

H0: Site Median <= 6 (Form 1)
  Large Sample z-Test Value      -4.293
  Critical Value (0.05)          1.645
  P-Value                        [illegible]

Conclusion with Alpha = 0.05
  Do Not Reject H0. Conclude Mean/Median <= 6
  P-Value > Alpha (0.05)

Dataset contains multiple Non-Detect values!
All Observations < 2 are treated as Non-Detects
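
For readers who want to see where a large-sample z value of this kind comes from, the following Python sketch applies the textbook normal approximation to T-plus. It is an assumption-laden illustration, not Scout's code: the function name is hypothetical and no correction for tied ranks is applied, so the result can differ slightly from the value Scout reports when many observations share a rank (as happens when non-detects are tied).

```python
from math import sqrt


def wsr_large_sample_z(t_plus, n):
    """Normal approximation for the Wilcoxon signed rank statistic T-plus.

    t_plus -- sum of the ranks of the positive differences
    n      -- number of non-zero differences
    No tie correction is applied in this sketch.
    """
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t_plus - mean) / sd


n = 24
t_plus, t_minus = 0, 300
# T-plus and T-minus always add up to n(n+1)/2.
assert t_plus + t_minus == n * (n + 1) // 2

z = wsr_large_sample_z(t_plus, n)
print(round(z, 3))
```

Under Form 1 the null hypothesis is rejected only for large positive z, so a strongly negative value leads to "Do Not Reject H0".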



-------
6.3.2.1 Two-Sample Hypothesis Tests for Data Sets With No Non-detects
6.3.2.1.1 Two-Sample t-Test

1. Click Stats/GOF > Hypothesis Testing > Two-Sample Tests > No NDs >
t-Test.

2. The "Select Variables" screen (Section 3.2.2) will appear.

o Select the variables for testing.

o When the options button is clicked, the following window will be shown.

[Options window, partially captured]

  Select Null Hypothesis Form
    ( ) Sample 2 >= Sample 1 (Form 2)
    ( ) Sample 2 >= Sample 1 + S (Form 2)
    ( ) Sample 2 = Sample 1 (2 Sided)

  OK    Cancel


-------
o Specify a useful "Substantial Difference, S" value. The default
choice is "0."

o Choose the "Confidence Level." The default choice is "95%."

o Select the form of Null Hypothesis. The default is AOC <=
Background (Form 1).

o Click on "OK" to continue or on "Cancel" to cancel the option.

o Click on "OK" to continue or on "Cancel" to cancel the test.

Output for Two-Sample t-Test (Full Data without NDs).

Raw Statistics
                                 Sample 1    Sample 2
  Number of Valid Samples        10          20
  Number of Distinct Samples     9           19
  Minimum                        3.202       1.5
  Maximum                        20.78       37.87
  Mean                           8.222       17.09
  Median                         5.347       18.79
  SD                             5.971       9.713
  SE of Mean                     1.888       2.172

Sample 1 vs Sample 2 Two-Sample t-Test

H0: Mu of Sample 2 - Mu of Sample 1 <= 0

  Method                              DF      t-Test Value   Critical t (0.050)   P-Value
  Pooled (Equal Variance)             28      2.637          1.701                0.007
  Satterthwaite (Unequal Variance)    26.6    3.083          1.703                0.002

  Pooled SD 8.688

Conclusion with Alpha = 0.050
  * Student t (Pooled) Test: Reject H0, Conclude Sample 2 > Sample 1
  * Satterthwaite Test: Reject H0, Conclude Sample 2 > Sample 1

Test of Equality of Variances

  Numerator DF    Denominator DF    F-Test Value    P-Value
  19              9                 2.646           0.137

Conclusion with Alpha = 0.05
  * Two variances appear to be equal
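
The pooled and Satterthwaite statistics in a two-sample t-test output can be approximated from the summary statistics alone. The Python sketch below uses the standard textbook formulas; the function name `two_sample_t` and its return keys are hypothetical, and because the inputs are the rounded means and SDs of the example, the results agree with the printed values only to about two decimals.

```python
from math import sqrt


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled and Satterthwaite (Welch) t statistics for H0: mu2 - mu1 <= 0."""
    diff = mean2 - mean1

    # Pooled (equal variance) version.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    t_pooled = diff / sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Satterthwaite (unequal variance) version with approximate df.
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t_welch = diff / sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

    # F ratio for the equality-of-variances check (larger variance on top).
    f_ratio = max(sd1, sd2) ** 2 / min(sd1, sd2) ** 2

    return {
        "pooled_sd": sqrt(pooled_var),
        "t_pooled": t_pooled,
        "df_pooled": n1 + n2 - 2,
        "t_welch": t_welch,
        "df_welch": df_welch,
        "f_ratio": f_ratio,
    }


result = two_sample_t(8.222, 5.971, 10, 17.09, 9.713, 20)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Critical values and P-values require the t and F distributions, which are not in the Python standard library, so they are left out of this sketch.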


-------
6.3.2.1.2 Two-Sample Wilcoxon Mann Whitney Test

1. Click Stats/GOF > Hypothesis Testing > Two-Sample Tests > No NDs >
Wilcoxon Mann Whitney test.

[Screenshot: Scout Stats/GOF menu, Hypothesis Testing > Two-Sample Tests > No NDs submenu.]

2. The "Select Variables" screen (Section 3.2.2) will appear.

o Select the variables for testing.

o When the options button is clicked, the following window will be shown.

OptionsHypothesisTest2S_Sub...

  Substantial Difference, S (Used with Test Form 2)    0

  Confidence Coefficient
    ( ) 99.9%   ( ) 99.5%   ( ) 99%   ( ) 97.5%   (*) 95%   ( ) 90%

  Select Null Hypothesis Form
    (*) Sample 2 <= Sample 1 (Form 1)
    ( ) Sample 2 >= Sample 1 (Form 2)
    ( ) Sample 2 >= Sample 1 + S (Form 2)
    ( ) Sample 2 = Sample 1 (2 Sided)

  OK    Cancel


-------
o Specify a "Substantial Difference, S" value. The default choice is
"0."

o Choose the "Confidence Level." The default choice is "95%."

o Select the form of Null Hypothesis. The default is AOC <=
Background (Form 1).

o Click on "OK" button to continue or on "Cancel" button to cancel the
selected options.

o Click on the "OK" button to continue or on the "Cancel" button to cancel
the test.

Output for Two-Sample Wilcoxon-Mann-Whitney Test (Full Data).

Sample 1 Data: X(1)
Sample 2 Data: X(2)

Raw Statistics
                                 Sample 1    Sample 2
  Number of Valid Samples        10          20
  Number of Distinct Samples     9           19
  Minimum                        3.202       1.5
  Maximum                        20.78       37.87
  Mean                           8.222       17.09
  Median                         5.347       18.79
  SD                             5.971       9.713
  SE of Mean                     1.888       2.172

Wilcoxon-Mann-Whitney (WMW) Test

H0: Mean/Median of Sample 2 <= Mean/Median of Sample 1

  Sample 2 Rank Sum W-Stat       366
  WMW Test U-Stat                156
  WMW Critical Value (0.050)     137
  Approximate P-Value            0.00731

Conclusion with Alpha = 0.05
  Reject H0, Conclude Sample 2 > Sample 1
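
The relation between the rank sum, the U statistic and an approximate P-value can be sketched as follows. This is a minimal Python illustration, assuming the usual normal approximation with a continuity correction and no tie correction; the function name is hypothetical, and Scout's own computation may differ in detail.

```python
from math import erfc, sqrt


def wmw_from_rank_sum(w2, n1, n2):
    """U statistic and one-sided normal-approximation P-value.

    w2 -- sum of the ranks of Sample 2 in the pooled data
    H0: Sample 2 <= Sample 1, so large U is evidence against H0.
    """
    u = w2 - n2 * (n2 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u - 0.5) / sd_u          # continuity-corrected
    p_value = 0.5 * erfc(z / sqrt(2))      # upper-tail normal probability
    return u, z, p_value


u, z, p_value = wmw_from_rank_sum(w2=366, n1=10, n2=20)
print(u, round(z, 3), round(p_value, 5))
```

For the example above (rank sum 366 with 10 and 20 observations), the sketch returns U = 156 and a P-value of about 0.0073.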





-------
6.3.2.1.3 Two-Sample Quantile Test

1. Click Stats/GOF > Hypothesis Testing > Two-Sample Tests > No NDs >
Quantile Test.

[Screenshot: Scout Stats/GOF menu, Two Sample Tests > No NDs submenu listing the available tests.]

2. The "Select Variables" screen (Section 3.2.2) will appear.

o Select the variables for testing.

o When the options button is clicked, the following window will be shown.

Quantile Test Options

  Select Confidence Coefficient
    ( ) 99%   ( ) 97.5%   (*) 95%   ( ) 90%

  OK    Cancel

o Choose the "Confidence Level." The default choice is "95%."

o Click on "OK" button to continue or on "Cancel" button to cancel
the option.

o Click on the "OK" button to continue or on the "Cancel" button to cancel
the test.



-------
Output for Two-Sample Quantile Test (Full Data).

Non-parametric Quantile Hypothesis Test for Full Dataset (No Non-Detects)

Date/Time of Computation     3/4/2008 6:52:32 AM

User Selected Options
  From File                  D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
  Full Precision             OFF
  Confidence Coefficient     95%
  Null Hypothesis            Sample 2 Concentration Less Than or Equal to Sample 1 Concentration (Form 1)
  Alternative Hypothesis     Sample 2 Concentration Greater Than Sample 1 Concentration

Sample 1 Data: Group1X
Sample 2 Data: Group2X

Raw Statistics
                                 Sample 1    Sample 2
  Number of Valid Samples        10          20
  Number of Distinct Samples     9           19
  Minimum                        3.202       1.5
  Maximum                        20.78       37.87
  Mean                           8.222       17.09
  Median                         5.347       18.79
  SD                             5.971       9.713
  SE of Mean                     1.888       2.172

Quantile Test

H0: Sample 2 Concentration <= Sample 1 Concentration (Form 1)

  Approximate R Value (0.045)                        14
  Approximate K Value (0.045)                        12
  Number of Sample 2 Observations in 'R' Largest     13
  Calculated Alpha                                   0.0446

Conclusion with Alpha = 0.045
  Reject H0, Conclude Sample 2 Concentration > Sample 1 Concentration
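
The "Calculated Alpha" of a quantile test can be understood as a hypergeometric tail probability: if both samples come from the same population, it is the chance that K or more of the R largest pooled observations belong to Sample 2. The Python sketch below illustrates that calculation; the function name is hypothetical, and how Scout chooses R and K for a requested confidence coefficient is not shown.

```python
from math import comb


def quantile_test_alpha(n1, n2, r, k):
    """P(at least k of the r largest pooled values come from Sample 2)
    when the two samples are exchangeable (hypergeometric upper tail)."""
    total = comb(n1 + n2, r)
    favourable = sum(
        comb(n2, i) * comb(n1, r - i)      # i from Sample 2, r - i from Sample 1
        for i in range(k, min(r, n2) + 1)
    )
    return favourable / total


alpha = quantile_test_alpha(n1=10, n2=20, r=14, k=12)
print(round(alpha, 4))

# Decision rule: reject H0 when the observed count reaches K.
observed_in_r_largest = 13
print("Reject H0" if observed_in_r_largest >= 12 else "Do Not Reject H0")
```

With R = 14 and K = 12 for samples of size 10 and 20, the tail probability is about 0.0446.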

















-------
6.3.2.2 Two-Sample Hypothesis Tests for Data Sets With Non-detects
6.3.2.2.1 Two-Sample Wilcoxon Mann Whitney Test

1. Click Stats/GOF > Hypothesis Testing > Two-Sample Tests > With NDs >
Wilcoxon Mann Whitney test.

[Screenshot: Scout Stats/GOF menu, Hypothesis Testing > Two-Sample Tests > With NDs submenu
listing Wilcoxon-Mann-Whitney, Gehan and Quantile Test.]

2. The "Select Variables" screen (Section 3.2.2) will appear.

o Select the variables for testing.

o When the options button is clicked, the following window will be shown.

OptionsHypothesisTest2S_Sub...

  Substantial Difference, S (Used with Test Form 2)    0

  Confidence Coefficient
    ( ) 99.9%   ( ) 99.5%   ( ) 99%   ( ) 97.5%   (*) 95%   ( ) 90%

  Select Null Hypothesis Form
    (*) Sample 2 <= Sample 1 (Form 1)
    ( ) Sample 2 >= Sample 1 (Form 2)
    ( ) Sample 2 >= Sample 1 + S (Form 2)
    ( ) Sample 2 = Sample 1 (2 Sided)

  OK    Cancel


-------
o Specify a meaningful "Substantial Difference, S" value. The
default choice is "0."

o Choose the "Confidence level." The default choice is "95%."

o Select the form of Null Hypothesis. The default is AOC <=
Background (Form 1).

o Click on the "OK" button to continue or on the "Cancel" button to
cancel the selected options.

o Click on "OK" button to continue or on "Cancel" button to cancel the
test.



-------
Output for Two-Sample Wilcoxon-Mann-Whitney Test (with Non-detects).

User Selected Options
  From File                      D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
  Full Precision                 OFF
  Confidence Coefficient         95%
  Substantial Difference (S)     0.000
  Selected Null Hypothesis       Sample 2 Mean/Median Less Than or Equal to Sample 1 Mean/Median (Form 1)
  Alternative Hypothesis         Sample 2 Mean/Median Greater Than Sample 1 Mean/Median

Sample 1 Data: Group1X
Sample 2 Data: Group2X

Raw Statistics
                                 Sample 1    Sample 2
  Number of Valid Samples        10          20
  Number of Non-Detect Data      2           2
  Number of Detect Data          8           18
  Minimum Non-Detect             4           1.5
  Maximum Non-Detect             4           1.5
  Percent Non detects            20.00%      10.00%
  Minimum Detected               3.202       6.316
  Maximum Detected               20.78       37.87
  Mean of Detected Data          9.277       18.83
  Median of Detected Data        6.704       19.36
  SD of Detected Data            6.283       8.582

Wilcoxon-Mann-Whitney Sample 1 vs Sample 2 Test

All observations <= 4 (Max DL) are ranked the same

Wilcoxon-Mann-Whitney (WMW) Test

H0: Mean/Median of Sample 2 <= Mean/Median of Sample 1

  Sample 2 Rank Sum W-Stat       369
  WMW Test U-Stat                159
  WMW Critical Value (0.050)     137
  Approximate P-Value            0.00503

Conclusion with Alpha = 0.05
  Reject H0, Conclude Sample 2 > Sample 1







Note: In the WMW test, all observations below the largest detection limit are considered to be NDs
(potentially including detected values), and hence they all receive the same average rank. This action may
considerably reduce the power of the WMW test, which in turn may lead to an incorrect conclusion.
All of the hypothesis testing approaches should be supplemented with graphical displays such as Q-Q plots
and box plots. When multiple detection limits are present, the use of the Gehan test is preferable.
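
The ranking rule described in the note can be sketched in Python as follows. This is an illustration only: the function names and the tiny data set are hypothetical, each observation is represented as a `(value, detected)` pair with the detection limit stored as the value of a non-detect, and observations at or below the largest detection limit are treated as tied.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1   # mean of the 1-based positions in the tie
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def wmw_rank_sum_with_nds(sample1, sample2):
    """Rank sum and U statistic of Sample 2 when everything at or below the
    largest detection limit (detected or not) is ranked as one tied group."""
    pooled = sample1 + sample2
    dls = [value for value, detected in pooled if not detected]
    max_dl = max(dls) if dls else None
    censored = [
        value if max_dl is None or value > max_dl else max_dl
        for value, _ in pooled
    ]
    ranks = average_ranks(censored)
    n2 = len(sample2)
    w2 = sum(ranks[len(sample1):])
    return w2, w2 - n2 * (n2 + 1) / 2


# Hypothetical data: (value, detected); non-detects carry their detection limit.
sample1 = [(4, False), (3.2, True), (6.7, True)]
sample2 = [(1.5, False), (6.3, True), (19.4, True)]
w2, u = wmw_rank_sum_with_nds(sample1, sample2)
print(w2, u)
```

In the hypothetical data, the detected value 3.2 falls below the largest detection limit (4) and is therefore ranked together with the two non-detects, which is exactly the loss of information the note warns about.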



-------
6.3.2.2.2 Two-Sample Gehan Test

1. Click Stats/GOF > Hypothesis Testing > Two-Sample Tests > With NDs >
Gehan test.

[Screenshot: Scout Stats/GOF menu, Hypothesis Testing > Two-Sample Tests > With NDs submenu
with the Gehan entry.]

2. The "Select Variables" screen (Section 3.2.2) will appear.
o Select the variables for testing.

o When the options button is clicked, the following window will be shown.

OptionsHypothesisTest2S_Sub...

  Substantial Difference, S (Used with Test Form 2)    0

  Confidence Coefficient
    ( ) 99.9%   ( ) 99.5%   ( ) 99%   ( ) 97.5%   (*) 95%   ( ) 90%

  Select Null Hypothesis Form
    (*) Sample 2 <= Sample 1 (Form 1)
    ( ) Sample 2 >= Sample 1 (Form 2)
    ( ) Sample 2 >= Sample 1 + S (Form 2)
    ( ) Sample 2 = Sample 1 (2 Sided)

  OK    Cancel


-------
o Specify a "Substantial Difference, S" value. The default choice is
"0."

o Choose the "Confidence Level." The default choice is "95%."

o Select the form of Null Hypothesis. The default is AOC <=
Background (Form 1).

o Click on "OK" button to continue or on "Cancel" button to cancel
selected options.

o Click on the "OK" button to continue or on the "Cancel" button to cancel
the test.


-------
Output for Two-Sample Gehan Test (with Non-detects).

Gehan Sample 1 vs Sample 2 Comparison Hypothesis Test for Data Sets with Non-Detects

Date/Time of Computation       3/4/2008 7:10:37 AM

User Selected Options
  From File                    D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
  Full Precision               OFF
  Confidence Coefficient       95%
  Substantial Difference       0.000
  Selected Null Hypothesis     Sample 2 Mean/Median Less Than or Equal to Sample 1 Mean/Median (Form 1)
  Alternative Hypothesis       Sample 2 Mean/Median Greater Than Sample 1 Mean/Median

Sample 1 Data: Group1X
Sample 2 Data: Group2X

Raw Statistics
                                 Sample 1    Sample 2
  Number of Valid Samples        10          20
  Number of Non-Detect Data      2           2
  Number of Detect Data          8           18
  Minimum Non-Detect             4           1.5
  Maximum Non-Detect             4           1.5
  Percent Non detects            20.00%      10.00%
  Minimum Detected               3.202       6.316
  Maximum Detected               20.78       37.87
  Mean of Detected Data          9.277       18.83
  Median of Detected Data        6.704       19.36
  SD of Detected Data            6.283       8.582

Sample 1 vs Sample 2 Gehan Test

H0: Mean/Median of Sample 2 <= Mean/Median of Background

  Gehan z Test Value             2.556
  Critical z (0.95)              1.645
  P-Value                        0.00529

Conclusion with Alpha = 0.05
  Reject H0. Conclude Sample 2 > Sample 1
  P-Value < alpha (0.05)
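
A Gehan-type z statistic for left-censored data can be sketched as below. The code follows the commonly described score form of the test (each observation scores the number of pooled values known to be smaller minus the number known to be larger, with a permutation variance); it is an assumption that this matches Scout's implementation, and the function names and the small data set are hypothetical. Observations are `(value, detected)` pairs, with the detection limit as the value of a non-detect.

```python
from math import erfc, sqrt


def definitely_less(a, b):
    """True when observation a is known to be smaller than observation b."""
    a_val, a_det = a
    b_val, b_det = b
    if not b_det:
        return False            # b is only known to lie below its detection limit
    if a_det:
        return a_val < b_val    # two detected values compare directly
    return a_val <= b_val       # a < DL_a <= b


def gehan_test(sample1, sample2):
    """Score statistic, z value and one-sided P-value for H0: Sample 2 <= Sample 1."""
    pooled = sample1 + sample2
    n1, n2, n = len(sample1), len(sample2), len(pooled)
    scores = []
    for obs in pooled:
        below = sum(definitely_less(other, obs) for other in pooled)
        above = sum(definitely_less(obs, other) for other in pooled)
        scores.append(below - above)
    g = sum(scores[n1:])                                   # Sample 2 scores
    variance = n1 * n2 * sum(s * s for s in scores) / (n * (n - 1))
    z = g / sqrt(variance)
    return g, z, 0.5 * erfc(z / sqrt(2))


# Hypothetical data with two non-detects.
sample1 = [(4, False), (3.2, True), (5.0, True)]
sample2 = [(1.5, False), (6.3, True), (19.4, True), (8.1, True)]
g, z, p_value = gehan_test(sample1, sample2)
print(g, round(z, 3), round(p_value, 4))
```

When there are no non-detects and no ties, this statistic reduces to the Wilcoxon-Mann-Whitney z value without continuity correction, which is a convenient check of the sketch.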











-------
6.3.2.2.3 Two-Sample Quantile Test

1. Click Stats/GOF > Hypothesis Testing > Two-Sample Tests > With NDs >
Quantile Test.

[Screenshot: Scout Stats/GOF menu with the Navigation Panel.]

2. The "Select Variables" screen (Section 3.2.2) will appear.

o Select the variables for testing.

o When the options button is clicked, the following window will be shown.

Quantile Test Options

  Select Confidence Coefficient
    ( ) 99%   ( ) 97.5%   (*) 95%   ( ) 90%

  OK    Cancel



o Choose the "Confidence Level." The default choice is "95%."

o Click on "OK" button to continue or on "Cancel" button to cancel the
option.

o Click on the "OK" button to continue or on the "Cancel" button to cancel
the test.



-------
Output for Two-Sample Quantile Test (with Non-detects).

Gehan Sample 1 vs Sample 2 Comparison Hypothesis Test for Data Sets with Non-Detects

Date/Time of Computation       3/4/2008 7:10:37 AM

User Selected Options
  From File                    D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
  Full Precision               OFF
  Confidence Coefficient       95%
  Substantial Difference       0.000
  Selected Null Hypothesis     Sample 2 Mean/Median Less Than or Equal to Sample 1 Mean/Median (Form 1)
  Alternative Hypothesis       Sample 2 Mean/Median Greater Than Sample 1 Mean/Median

Sample 1 Data: Group1X
Sample 2 Data: Group2X

Raw Statistics
                                 Sample 1    Sample 2
  Number of Valid Samples        10          20
  Number of Non-Detect Data      2           2
  Number of Detect Data          8           18
  Minimum Non-Detect             4           1.5
  Maximum Non-Detect             4           1.5
  Percent Non detects            20.00%      10.00%
  Minimum Detected               3.202       6.316
  Maximum Detected               20.78       37.87
  Mean of Detected Data          9.277       18.83
  Median of Detected Data        6.704       19.36
  SD of Detected Data            6.283       8.582

Sample 1 vs Sample 2 Gehan Test

H0: Mean/Median of Sample 2 <= Mean/Median of Background

  Gehan z Test Value             2.556
  Critical z (0.95)              1.645
  P-Value                        0.00529

Conclusion with Alpha = 0.05
  Reject H0. Conclude Sample 2 > Sample 1
  P-Value < alpha (0.05)











6.4 Classical Intervals

This section illustrates the computation of various parametric and nonparametric lower
and upper limits for confidence, prediction and tolerance intervals. The data used are
univariate and can be with or without non-detects. A detailed description of those limits
can be found in the ProUCL 4.00.04 Technical Guide.



-------
6.4.1 Upper (Right Sided) Limits

This module in Scout computes various parametric and nonparametric statistics and
upper limits that can be used as background threshold values and other not-to-exceed
values. Detailed illustrations of how those statistics are computed can be found in the
ProUCL 4.00.04 Technical Guide and User Guide (Chapter 10 and Chapter 11).

Right-sided limits can be obtained separately for data following normal, gamma,
lognormal or nonparametric distributions, using any of the four options ("Normal,"
"Gamma," "Lognormal" or "Nonparametric") from the drop-down menu. If the "All"
option in the drop-down menu is used, then the limits for all four distributions are printed
on a single output sheet. The examples for the Upper (Right Sided) limits are shown
using the "All" option.

[Screenshot: Scout Stats/GOF menu, Intervals > Upper (Right Sided) submenu.]

6.4.1.1 Upper (Right Sided) Confidence Limits (UCLs)

6.4.1.1.1 No NDs

1. Click Stats/GOF > Intervals > Upper (Right Sided) > UCLs > No NDs >
All.

[Screenshot: Scout Stats/GOF menu, Intervals > Upper (Right Sided) > UCLs > No NDs submenu.]

2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.


-------
o If the statistics have to be produced by using a Group variable, then select
a group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o When the option button is clicked, the following window will be shown.

[Options window for UCLs, data sets without NDs]

  Confidence Level                   0.95
  Number of Bootstrap Operations     2000

  OK    Cancel

o Choose the "Confidence Level." The default choice is "95%."

o Choose "Number of Bootstrap Operations." The default is "2000."

o Click on "OK" button to continue or on "Cancel" button to cancel the
option.

o Click on the "OK" button to continue or on the "Cancel" button to cancel
the UCLs.


-------
Output Screen for UCL for Data Sets with No Non-detects (All option).

General UCL Statistics for Full Data Sets

User Selected Options
  From File                          D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
  Full Precision                     OFF
  Confidence Coefficient             95%
  Number of Bootstrap Operations     2000

General Statistics
  Number of Valid Observations       53
  Number of Distinct Observations    51

Raw Statistics                            Log-transformed Statistics
  Minimum                     1.5           Minimum of Log Data    0.405
  Maximum                     121.1         Maximum of Log Data    4.797
  Mean                        51.1          Mean of Log Data       3.325
  Median                      24.56         SD of Log Data         1.298
  SD                          43.76
  Coefficient of Variation    0.857
  Skewness                    0.277

Relevant UCL Statistics

Normal Distribution Test
  Lilliefors Test Statistic                0.247
  Lilliefors Critical Value                0.122
  Data Not Normal at 5% Significance Level

Assuming Normal Distribution
  95% Student's-t UCL                      61.18
  95% UCLs (Adjusted for Skewness)
    95% Adjusted-CLT UCL                   61.24

Lognormal Distribution Test
  Lilliefors Test Statistic                0.225
  Lilliefors Critical Value                0.122
  Data Not Lognormal at 5% Significance Level

Assuming Lognormal Distribution
  95% H-UCL                                100.5
  95% Chebyshev (MVUE) UCL                 124.7
  97.5% Chebyshev (MVUE) UCL               151.5
  99% Chebyshev (MVUE) UCL                 [illegible]

Gamma Distribution Test
  k star (bias corrected)                  0.912
  Theta star                               56.04
  nu star                                  96.66
  Approximate Chi Square Value (.05)       74.98
  Adjusted Level of Significance           0.0455
  Adjusted Chi Square Value                74.45
  Anderson-Darling Test Statistic          2.591
  Anderson-Darling 5% Critical Value       0.782
  Kolmogorov-Smirnov Test Statistic        0.222
  Kolmogorov-Smirnov 5% Critical Value     0.126
  Data Not Gamma Distributed at 5% Significance Level

Assuming Gamma Distribution
  95% Approximate Gamma UCL                65.88
  95% Adjusted Gamma UCL                   66.35

Data Distribution
  Data do not follow a Discernable Distribution (0.05)

Nonparametric Statistics
  95% CLT UCL                              61
  95% Jackknife UCL                        61.18
  95% Standard Bootstrap UCL               60.9
  95% Bootstrap-t UCL                      61.13
  95% Hall's Bootstrap UCL                 61.15
  95% Percentile Bootstrap UCL             61.33
  95% BCA Bootstrap UCL                    61.03
  95% Chebyshev(Mean, Sd) UCL              77.32
  97.5% Chebyshev(Mean, Sd) UCL            88.66
  99% Chebyshev(Mean, Sd) UCL              110.9

Potential UCL to Use
  Use 97.5% Chebyshev (Mean, Sd) UCL       88.66
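
Several of the simpler UCLs can be reproduced approximately from the summary statistics with the standard formulas. The Python sketch below is illustrative only: the function names are hypothetical, the inputs are the rounded mean, SD and gamma quantities of the example (n = 53, mean 51.1, SD 43.76), and small differences from Scout's printed values are expected because Scout works with unrounded data.

```python
from math import sqrt
from statistics import NormalDist


def clt_ucl(mean, sd, n, conf=0.95):
    """Central limit theorem UCL: mean + z * sd / sqrt(n)."""
    return mean + NormalDist().inv_cdf(conf) * sd / sqrt(n)


def chebyshev_ucl(mean, sd, n, conf=0.95):
    """Chebyshev(Mean, Sd) UCL: mean + sqrt(1/alpha - 1) * sd / sqrt(n)."""
    alpha = 1 - conf
    return mean + sqrt(1 / alpha - 1) * sd / sqrt(n)


def gamma_approx_ucl(mean, nu_star, chi_square_value):
    """Approximate gamma UCL: nu_star * mean / chi-square critical value."""
    return nu_star * mean / chi_square_value


mean, sd, n = 51.1, 43.76, 53
print(round(clt_ucl(mean, sd, n), 2))
for conf in (0.95, 0.975, 0.99):
    print(conf, round(chebyshev_ucl(mean, sd, n, conf), 2))
print(round(gamma_approx_ucl(mean, 96.66, 74.98), 2))   # approximate gamma UCL
print(round(gamma_approx_ucl(mean, 96.66, 74.45), 2))   # adjusted gamma UCL
```

The Student's-t, bootstrap and H-statistic UCLs need t quantiles, resampling or Land's tables and are not covered by this sketch.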



-------
6.4.1.1.2 With NDs

1. Click Stats/GOF > Intervals > Upper (Right Sided) > UCLs > With NDs
> All.

[Screenshot: Scout Stats/GOF menu, Intervals > Upper (Right Sided) > UCLs > With NDs submenu.]

2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

o If the statistics have to be produced by using a Group variable, then select
a group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o When the option button is clicked, the following window will be shown.

[Options window for UCLs, data sets with NDs]

  Confidence Level                   0.95
  Number of Bootstrap Operations     2000

  OK    Cancel

o Choose the "Confidence Level." The default choice is "95%."

o Choose "Number of Bootstrap Operations." The default is "2000."

o Click on "OK" button to continue or on "Cancel" button to cancel the
option.



-------
o Click on the "OK" button to continue or on the "Cancel" button to cancel
the UCLs.

Output Screen for UCL for Data Sets with Non-detects (All option).

General UCL Statistics for Data Sets with Non-Detects

User Selected Options
  From File                          D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
  Full Precision                     OFF
  Confidence Coefficient             95%
  Number of Bootstrap Operations     2000

General Statistics
  Number of Valid Data               53      Number of Detected Data      49
  Number of Distinct Detected Data   49      Number of Non-Detect Data    4
                                             Percent Non-Detects          7.55%

Raw Statistics                            Log-transformed Statistics
  Minimum Detected        3.202             Minimum Detected        1.164
  Maximum Detected        121.1             Maximum Detected        4.797
  Mean of Detected        55.05             Mean of Detected        3.523
  SD of Detected          43.2              SD of Detected          1.128
  Minimum Non-Detect      1.5               Minimum Non-Detect      0.405
  Maximum Non-Detect      4                 Maximum Non-Detect      1.386

Note: Data have multiple DLs - Use of KM Method is recommended
For all methods (except KM, DL/2, and ROS Methods),
Observations < Largest ND are treated as NDs
  Number treated as Non-Detect        5
  Number treated as Detected          48
  Single DL Non-Detect Percentage     9.43%

UCL Statistics

Normal Distribution Test with Detected Values Only
  Lilliefors Test Statistic           0.802
  5% Lilliefors Critical Value        0.947
  Data Not Normal at 5% Significance Level

Lognormal Distribution Test with Detected Values Only
  Lilliefors Test Statistic           0.856
  5% Lilliefors Critical Value        0.947
  Data Not Lognormal at 5% Significance Level



-------
Output Screen for UCL for Data Sets with Non-detects (All option) (continued).

Assuming Normal Distribution
  DL/2 Substitution Method
    Mean                                 [illegible]
    SD                                   43.9
    95% DL/2 (t) UCL                     [illegible]
  Maximum Likelihood Estimate (MLE) Method
    Mean                                 48.86
    SD                                   46.77
    95% MLE (t) UCL                      59.62
    95% MLE (Tiku) UCL                   [illegible]

Assuming Lognormal Distribution
  DL/2 Substitution Method
    Mean                                 3.273
    SD                                   1.406
    95% H-Stat (DL/2) UCL                105.5
  Log ROS Method
    Mean in Log Scale                    3.34
    SD in Log Scale                      1.264
    Mean in Original Scale               [illegible]
    SD in Original Scale                 43.75
    95% Percentile Bootstrap UCL         [illegible]
    95% BCA Bootstrap UCL                60.82

Gamma Distribution Test with Detected Values Only
  k star (bias corrected)                1.111
  Theta star                             49.54
  nu star                                108.9
  A-D Test Statistic                     [illegible]
  5% A-D Critical Value                  0.775
  K-S Test Statistic                     [illegible]
  5% K-S Critical Value                  [illegible]
  Data Not Gamma Distributed at 5% Significance Level

Assuming Gamma Distribution
  Gamma ROS Statistics using Extrapolated Data
    Minimum                              1.0000E-9
    Maximum                              121.1
    Mean                                 50.9
    Median                               24.56
    SD                                   44.02
    k star                               0.302
    Theta star                           169.3
    Nu star                              32.05
    AppChi2                              20.11
    95% Gamma Approximate UCL            81.11
    95% Adjusted Gamma UCL               82.21

Data Distribution Test with Detected Values Only
  Data do not follow a Discernable Distribution (0.05)

Nonparametric Statistics
  Kaplan-Meier (KM) Method
    Mean                                 51.14
    SD                                   43.33
    SE of Mean                           6.013
    95% KM (t) UCL                       61.21
    95% KM (z) UCL                       61.03
    95% KM (jackknife) UCL               61.14
    95% KM (bootstrap t) UCL             62.07
    95% KM (BCA) UCL                     60.58
    95% KM (Percentile Bootstrap) UCL    60.92
    95% KM (Chebyshev) UCL               77.35
    97.5% KM (Chebyshev) UCL             88.69
    99% KM (Chebyshev) UCL               111

Potential UCLs to Use
  95% KM (Chebyshev) UCL                 77.35

Note: DL/2 is not a recommended method.
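
Because the output recommends the Kaplan-Meier (KM) method when multiple detection limits are present, a short sketch of a KM mean for left-censored data may help. The Python below implements a common product-limit formulation; it is an assumption that Scout handles ties and the lowest non-detects in the same way, and the function name and the small data sets are hypothetical. Observations are `(value, detected)` pairs, with the detection limit as the value of a non-detect.

```python
def km_mean_left_censored(data):
    """Kaplan-Meier (product-limit) mean for data with non-detects.

    Works downward from the largest detected value. At each distinct detected
    value y, 'at_risk' counts observations whose value (or detection limit)
    is <= y and 'events' counts detected values equal to y.
    """
    detects = sorted({value for value, detected in data if detected})
    cdf = 1.0      # estimated P(X <= y), equal to 1 at the largest detect
    mean = 0.0
    for y in reversed(detects):
        at_risk = sum(1 for value, _ in data if value <= y)
        events = sum(1 for value, detected in data if detected and value == y)
        cdf_below = cdf * (at_risk - events) / at_risk
        mean += y * (cdf - cdf_below)      # probability mass placed at y
        cdf = cdf_below
    if detects:
        # Mass left for non-detects below every detect goes to the smallest detect.
        mean += detects[0] * cdf
    return mean


# Hypothetical data: three detected values and one non-detect reported as < 3.
data = [(2, True), (4, True), (6, True), (3, False)]
print(km_mean_left_censored(data))

# With no non-detects the KM mean equals the ordinary sample mean.
full = [(1, True), (2, True), (3, True), (6, True)]
print(km_mean_left_censored(full))
```

Once a KM mean and its standard error are available, limits such as the KM (z) or KM (Chebyshev) UCL follow by adding the appropriate multiple of the standard error to the mean.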



-------
6.4.1.2 Upper Prediction Limits (UPL) / Upper Tolerance Limits (UTL)
6.4.1.2.1 No NDs

1. Click Stats/GOF > Intervals > Upper (Right Sided) > UPL/UTL > No
NDs > All.

[Screenshot: Scout Stats/GOF menu, Intervals > Upper (Right-Sided) > UPL/UTL submenu
with the No NDs and With NDs entries.]

2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

o If the statistics have to be produced by using a Group variable, then select
a group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o When the option button is clicked, the following window will be shown.

[Options window for UPL/UTL, data sets without NDs]

  Confidence Level                   0.95
  Coverage                           0.9
  Different or Future K Values       1
  Number of Bootstrap Operations     2000

  OK    Cancel


-------
o Specify the "Confidence Level"; a number in the interval [0.5, 1), 0.5
inclusive. The default choice is "0.95."

o Specify the "Coverage" level; a number in the interval (0.0, 1).
Default is "0.9."

o Specify the next "K." The default choice is "1."

o Specify the "Number of Bootstrap Operations." The default choice
is "2000."

o Click on "OK" button to continue or on "Cancel" button to cancel the
option.

o Click on "OK" button to continue or on "Cancel" button to cancel the
UPLs and UTLs.


-------
Output Screen for UPL/UTL for Data Sets with No Non-detects (All option).

General Background Statistics for Full Data Sets

User Selected Options
  From File                          D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
  Full Precision                     OFF
  Confidence Coefficient             95%
  Coverage                           90%
  Different or Future K Values       1
  Number of Bootstrap Operations     2000

General Statistics
  Total Number of Observations       53
  Number of Distinct Observations    51

Raw Statistics                            Log-Transformed Statistics
  Minimum                     1.5           Minimum             0.405
  Maximum                     121.1         Maximum             4.797
  Second Largest              116.5         Second Largest      4.758
  First Quartile              9.708         First Quartile      2.273
  Median                      24.56         Median              3.201
  Third Quartile              96.88         Third Quartile      4.573
  Mean                        51.1          Mean                3.325
  SD                          43.76         SD                  1.298
  Coefficient of Variation    0.857
  Skewness                    0.277

Background Statistics

Normal Distribution Test                  Lognormal Distribution Test
  Lilliefors Test Statistic    0.247        Lilliefors Test Statistic    0.225
  Lilliefors Critical Value    0.122        Lilliefors Critical Value    0.122
  Data Not Normal at 5% Significance Level
  Data Not Lognormal at 5% Significance Level


-------
Output Screen for UPL/UTL for Data Sets with No Non-detects (All option) (continued).

Assuming Normal Distribution              Assuming Lognormal Distribution
  95% UTL with 90% Coverage    122.4        95% UTL with 90% Coverage    229.8
  95% UPL (t)                  125.1        95% UPL (t)                  249.3
  90% Percentile (z)           107.2        90% Percentile (z)           146.6
  95% Percentile (z)           123.1        95% Percentile (z)           234.9
  99% Percentile (z)           153          99% Percentile (z)           568.9

Gamma Distribution Test
  k star                       0.912
  Theta star                   56.04
  nu star                      96.66
  A-D Test Statistic           2.591
  5% A-D Critical Value        0.782
  K-S Test Statistic           0.222
  5% K-S Critical Value        0.126
  Data Not Gamma Distributed at 5% Significance Level

Assuming Gamma Distribution
  90% Percentile               [illegible]
  95% Percentile               159.2
  99% Percentile               246.6

Data Distribution Test
  Data do not follow a Discernable Distribution (0.05)

Nonparametric Statistics
  90% Percentile                                    110
  95% Percentile                                    116.4
  99% Percentile                                    [illegible]
  95% UTL with 90% Coverage                         116.4
  95% Percentile Bootstrap UTL with 90% Coverage    114.8
  95% BCA Bootstrap UTL with 90% Coverage           114.8
  95% UPL                                           116.4
  95% Chebyshev UPL                                 243.7
  Upper Threshold Limit Based upon IQR              227.1

Note: UPL (or upper percentile for gamma distributed data) represents a preferred estimate of BTV
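
The z-based percentiles and the Chebyshev UPL can be approximated from the summary statistics with textbook formulas, as in the Python sketch below. The function names are hypothetical and the inputs are the rounded means and SDs of the example (n = 53), so results agree with Scout's output only up to rounding; the UTLs and the t-based UPL need tolerance factors or t quantiles and are not covered.

```python
from math import exp, sqrt
from statistics import NormalDist


def normal_percentile(mean, sd, p):
    """p-th percentile of a normal distribution: mean + z_p * sd."""
    return mean + NormalDist().inv_cdf(p) * sd


def lognormal_percentile(log_mean, log_sd, p):
    """p-th percentile on the original scale from log-scale mean and SD."""
    return exp(log_mean + NormalDist().inv_cdf(p) * log_sd)


def chebyshev_upl(mean, sd, n, conf=0.95):
    """Chebyshev UPL: mean + sqrt((1/alpha - 1) * (1 + 1/n)) * sd."""
    alpha = 1 - conf
    return mean + sqrt((1 / alpha - 1) * (1 + 1 / n)) * sd


mean, sd, n = 51.1, 43.76, 53
log_mean, log_sd = 3.325, 1.298

for p in (0.90, 0.95, 0.99):
    print(p,
          round(normal_percentile(mean, sd, p), 1),
          round(lognormal_percentile(log_mean, log_sd, p), 1))
print(round(chebyshev_upl(mean, sd, n), 1))
```

The lognormal percentiles are the most sensitive to rounding of the log-scale inputs, because the rounding error is magnified by the exponential.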

6.4.1.2.2 With NDs

1. Click Stats/GOF > Intervals > Upper (Right Sided) > UPL/UTL > With
NDs > All.

Scout- 4'.0j - [D:\HQroin\Scout_For._V/indovys,VScoutSo-urce\V/qrkDqllii£xcel\PqtQW:ensori-b^,-grp^s1i]J
Stats/GOF

~y File Edit Configure Data Graphs
Navigation Panel

Outbers/Estmates Regression Multivariate EDA GeoStats Pro^ams Wndow Help

2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.


-------
o If the statistics have to be produced by using a Group variable, then select
a group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o When the "Options" button is clicked, the following window will be shown.

[Options window: Confidence Level; Coverage (0.9); Different or Future K Values (1); Number of Bootstrap Operations (2000); OK; Cancel]

o Specify the "Confidence Level"; a number in the interval [0.5, 1), 0.5
inclusive. The default choice is "0.95."

o Specify the "Coverage" level; a number in the interval (0.0, 1).
Default is "0.9."

o Specify the number of different or future "K" values. The default
choice is "1."

o Specify the "Number of Bootstrap Operations." The default choice
is "2000."

o Click on "OK" button to continue or on "Cancel" button to cancel the
option.

o Click on "OK" button to continue or on "Cancel" button to cancel the
UPLs and UTLs.
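
The ranges and defaults listed above can be summarized as a small validation routine. This is only an illustrative Python sketch: the function name and the returned dictionary are hypothetical, and the requirement that K and the number of bootstrap operations be positive integers is an assumption, since the guide states only their defaults.

```python
def validate_options(confidence=0.95, coverage=0.9, k=1, n_bootstrap=2000):
    """Check UPL/UTL option values against the ranges given in the guide."""
    # Confidence Level: a number in [0.5, 1), 0.5 inclusive.
    if not 0.5 <= confidence < 1.0:
        raise ValueError("Confidence Level must be in [0.5, 1)")
    # Coverage: a number in (0.0, 1).
    if not 0.0 < coverage < 1.0:
        raise ValueError("Coverage must be in (0.0, 1)")
    # Assumption: K and the bootstrap count are positive integers.
    if int(k) != k or k < 1:
        raise ValueError("K must be a positive integer")
    if int(n_bootstrap) != n_bootstrap or n_bootstrap < 1:
        raise ValueError("Number of Bootstrap Operations must be a positive integer")
    return {
        "confidence": confidence,
        "coverage": coverage,
        "k": int(k),
        "n_bootstrap": int(n_bootstrap),
    }


print(validate_options())                 # the documented defaults
print(validate_options(confidence=0.99))  # a stricter confidence level
```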


-------
Output Screen for UPL/UTL for Data Sets With Non-detects (All option).

General Background Statistics for Data Sets with Non-Detects

User Selected Options
From File                         D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
Full Precision                    OFF
Confidence Coefficient            95%
Coverage                          90%
Different or Future K Values      1
Number of Bootstrap Operations    2000

X

General Statistics
Number of Valid Data              53
Number of Detected Data           49
Number of Distinct Detected Data  49
Number of Non-Detect Data         4
Percent Non-Detects               7.55%

                                  Raw Statistics    Log-transformed Statistics
Minimum Detected                  3.202             1.164
Maximum Detected                  121.1             4.797
Mean of Detected                  55.05             3.523
SD of Detected                    43.2              1.128
Minimum Non-Detect                1.5               0.405
Maximum Non-Detect                4                 1.386

Data with Multiple Detection Limits
Note: Data have multiple DLs - Use of KM Method is recommended
For all methods (except KM, DL/2, and ROS Methods),
Observations < Largest ND are treated as NDs

Single Detection Limit Scenario
Number treated as Non-Detect with Single DL    5
Number treated as Detected with Single DL      48
Single DL Non-Detect Percentage                9.43%

Background Statistics

                                  Normal Distribution Test       Lognormal Distribution Test
                                  with Detected Values Only      with Detected Values Only
Lilliefors Test Statistic         0.802                          0.856
5% Lilliefors Critical Value      0.947                          0.947

Data Not Normal at 5% Significance Level
Data Not Lognormal at 5% Significance Level


-------
Output Screen for UPL/UTL for Data Sets With Non-detects (All option) (continued).

Assuming Normal Distribution

DL/2 Substitution Method
Mean                              51
SD                                43.9
95% UTL 90% Coverage              122.5
95% UPL (t)                       125.2
90% Percentile (z)                107.3
95% Percentile (z)                123.2
99% Percentile (z)                153.1

Maximum Likelihood Estimate (MLE) Method
Mean                              48.86
SD                                46.77
95% UTL with 90% Coverage         125
95% UPL (t)                       127.9
90% Percentile (z)                108.8
95% Percentile (z)                125.8
99% Percentile (z)                157.7

Assuming Lognormal Distribution

DL/2 Substitution Method
Mean (Log Scale)                  3.273
SD (Log Scale)                    1.406
95% UTL 90% Coverage              260.2
95% UPL (t)                       284.1
90% Percentile (z)                159.9
95% Percentile (z)                266.5
99% Percentile (z)                694.8

Log ROS Method
Mean in Original Scale                    51.13
SD in Original Scale                      43.75
95% UTL with 90% Coverage                 220.8
95% BCA UTL with 90% Coverage             114.5
95% Bootstrap (%) UTL with 90% Coverage   114.8
95% UPL (t)                               238.9
90% Percentile (z)                        142.5
95% Percentile (z)                        225.6
99% Percentile (z)                        533.6

Gamma Distribution Test with Detected Values Only
k star (bias corrected)           1.111
Theta star                        49.54
nu star                           108.9
A-D Test Statistic                2.882
5% A-D Critical Value             0.775
K-S Test Statistic                0.236
5% K-S Critical Value             0.13
Data Not Gamma Distributed at 5% Significance Level

Data Distribution Test with Detected Values Only
Data do not follow a Discernable Distribution (0.05)

Assuming Gamma Distribution

Gamma ROS Statistics with Extrapolated Data
Mean                              50.9
Median                            24.56
SD                                44.02
k star                            0.302
Theta star                        168.3
Nu star                           32.05
95% Percentile of Chisquare (2k)  2.759
90% Percentile                    150
95% Percentile                    232.3
99% Percentile                    445.9

Nonparametric Statistics

Kaplan-Meier (KM) Method
Mean                              51.14
SD                                43.33
SE of Mean                        6.013
95% KM UTL with 90% Coverage      121.7
95% KM Chebyshev UPL              241.8
95% KM UPL (t)                    124.4
90% Percentile (z)                106.7
95% Percentile (z)                122.4
99% Percentile (z)                151.9

Note: UPL (or upper percentile for gamma distributed data) represents a preferred estimate of BTV

For an Example: KM-UPL may be used when multiple detection limits are present

Note: DL/2 is not a recommended method.
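
The "Percent Non-Detects" and "Single DL Non-Detect Percentage" entries of the output can be reproduced from the counting rule stated on the screen: under the single detection limit scenario, observations below the largest non-detect are treated as non-detects. The Python sketch below illustrates that rule. The function name and the data values are hypothetical; only the counts (53 observations, 4 non-detects with limits between 1.5 and 4, and a smallest detected value of 3.202) mirror the example.

```python
def nondetect_summary(values, detected):
    """Count non-detects as reported, and under a single detection limit.

    values   -- observed value, or the detection limit for a non-detect
    detected -- True for a detected value, False for a non-detect
    """
    n = len(values)
    nd_limits = [v for v, d in zip(values, detected) if not d]
    if not nd_limits:
        return {"n": n, "n_nd": 0, "pct_nd": 0.0,
                "n_nd_single_dl": 0, "pct_nd_single_dl": 0.0}
    largest_dl = max(nd_limits)
    # Single-DL rule: every non-detect, plus every detect below the largest DL.
    n_single = sum(1 for v, d in zip(values, detected)
                   if (not d) or v < largest_dl)
    return {
        "n": n,
        "n_nd": len(nd_limits),
        "pct_nd": 100.0 * len(nd_limits) / n,
        "n_nd_single_dl": n_single,
        "pct_nd_single_dl": 100.0 * n_single / n,
    }


# Hypothetical data: 4 non-detects, one detect (3.202) below the largest DL.
values = [1.5, 2.0, 3.0, 4.0, 3.202] + [float(v) for v in range(5, 53)]
detected = [False] * 4 + [True] * 49
summary = nondetect_summary(values, detected)
print(f"{summary['pct_nd']:.2f}% non-detects, "
      f"{summary['pct_nd_single_dl']:.2f}% with a single DL")
```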



-------
6.4.2 Classical Confidence Intervals
6.4.2.1 Without Non-detects

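The confidence intervals for data with no non-detects available in Scout are:

o Normal:

   o Student's t

     $\bar{x} \pm t_{(\alpha/2,\, n-1)} \, \frac{s}{\sqrt{n}}$

o Gamma:

   o Approximate Gamma
   o Adjusted Gamma

o Lognormal:

   o Land's H

     $LCL = \exp\left( \bar{y} + \frac{s_y^2}{2} + \frac{s_y H_{\alpha/2}}{\sqrt{n-1}} \right)$

     $UCL = \exp\left( \bar{y} + \frac{s_y^2}{2} + \frac{s_y H_{1-\alpha/2}}{\sqrt{n-1}} \right)$

   o Chebyshev MVUE

     $\hat{\mu}_{MVUE} \pm \frac{1}{\sqrt{\alpha}} \, \frac{\hat{\sigma}_{MVUE}}{\sqrt{n}}$

o Nonparametric:

   o CLT

     $\bar{x} \pm z_{(\alpha/2)} \, \frac{s}{\sqrt{n}}$

   o Jackknife

     $J(\hat{\theta}) \pm t_{(\alpha/2,\, n-1)} \, \hat{\sigma}_{J(\hat{\theta})}$

   o Standard Bootstrap

     $\hat{\theta} \pm z_{(\alpha/2)} \, \hat{\sigma}_{B}$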


-------
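   o Bootstrap t

     $LCL = \bar{x} - t_{(1-\alpha/2)} \, \frac{s}{\sqrt{n}}, \qquad UCL = \bar{x} - t_{(\alpha/2)} \, \frac{s}{\sqrt{n}}$

   o Percentile Bootstrap

     $LCL = \frac{\alpha}{2}$ percentile of $\bar{x}$

     $UCL = 1 - \frac{\alpha}{2}$ percentile of $\bar{x}$

   o Chebyshev

     $\bar{x} \pm \frac{1}{\sqrt{\alpha}} \, \frac{s}{\sqrt{n}}$

   o Modified (t)

     $\bar{x} + \frac{\hat{\mu}_3}{6 s^2 n} \pm t_{(\alpha/2,\, n-1)} \, \frac{s}{\sqrt{n}}$

   o Adjusted CLT

     $\bar{x} \pm \left[ z_{(\alpha/2)} + \frac{\hat{k}_3 \left( 1 + 2 z_{(\alpha/2)}^2 \right)}{6 \sqrt{n}} \right] \frac{s}{\sqrt{n}}$

Details of those intervals can be found in the ProUCL 4.00.04 Technical Guide.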
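
As an illustration of two of the nonparametric intervals listed above, the following Python sketch computes the CLT interval (mean plus or minus z times s over the square root of n) and the Chebyshev interval (mean plus or minus s over the square root of alpha times n) from summary statistics. It assumes two-sided intervals with alpha equal to one minus the confidence level. The function names are hypothetical and are not Scout's own routines.

```python
from math import sqrt
from statistics import NormalDist


def clt_interval(mean, sd, n, confidence=0.95):
    """Two-sided CLT interval: mean +/- z_(alpha/2) * sd / sqrt(n)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width


def chebyshev_interval(mean, sd, n, confidence=0.95):
    """Two-sided Chebyshev interval: mean +/- (1 / sqrt(alpha)) * sd / sqrt(n)."""
    alpha = 1.0 - confidence
    half_width = sd / (sqrt(alpha) * sqrt(n))
    return mean - half_width, mean + half_width


# Summary statistics from the guide's example output: mean 25.31, SD 5.023, n = 20.
for name, interval in (("CLT", clt_interval), ("Chebyshev", chebyshev_interval)):
    lower, upper = interval(25.31, 5.023, 20)
    print(f"{name}: {lower:.2f} to {upper:.2f}")
```

The Chebyshev interval is always the wider of the two, because it makes no distributional assumption beyond a finite variance.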

1. Click Stats/GOF > Intervals > Classical > Confidence Intervals > No NDs.


-------
2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o Click on "Options" for interval options.

[Options window, Confidence Intervals No NDs: Confidence Level; Number of Bootstrap Operations (2000); OK; Cancel]

o Specify the preferred "Confidence Level." The default is
"0.95."

o Specify the preferred number of bootstrap operations. The
default is "2000."

o Click "OK" to continue or "Cancel" to cancel the options.

o Click "OK" to continue or "Cancel" to cancel the computations.
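
To show what the "Number of Bootstrap Operations" option controls, here is a minimal Python sketch of a percentile bootstrap interval for the mean: the data are resampled with replacement the requested number of times, and the lower and upper limits are taken as the alpha/2 and 1 - alpha/2 percentiles of the resampled means. The function name, the seed argument, the percentile indexing rule, and the data values are assumptions made for illustration and may differ from Scout's implementation.

```python
import random
from statistics import mean


def percentile_bootstrap_ci(data, confidence=0.95, n_bootstrap=2000, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    n = len(data)
    # Mean of each resample drawn with replacement, sorted ascending.
    boot_means = sorted(mean(rng.choices(data, k=n)) for _ in range(n_bootstrap))
    alpha = 1.0 - confidence
    # Assumed indexing rule: nearest rank on the sorted bootstrap means.
    lower_idx = round((alpha / 2.0) * (n_bootstrap - 1))
    upper_idx = round((1.0 - alpha / 2.0) * (n_bootstrap - 1))
    return boot_means[lower_idx], boot_means[upper_idx]


# Hypothetical data, used for illustration only.
sample = [3.2, 9.6, 24.6, 51.1, 95.7, 107.6, 112.9, 121.1]
lcl, ucl = percentile_bootstrap_ci(sample)
print(f"95% percentile bootstrap interval: {lcl:.2f} to {ucl:.2f}")
```

A larger number of bootstrap operations makes the limits more stable from run to run at the cost of computation time.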

177


-------
Output for Classical Confidence Intervals without Non-detects.

Confidence Intervals for Datasets without Non-Detects

Date/Time of Computation          11/15/2008 12:39:56 PM

User Selected Options
From File                         (path not legible)
Full Precision                    OFF
Number of Bootstrap Operations    2000
Confidence Coefficient            0.95

Skin(x1)

Number of Valid Observations      20
Number of Distinct Observations   20

Raw Statistics
Mean                              25.31
(label not legible)               25.55
Variance                          25.23
Standard Deviation                5.023

Normal Intervals
Normal                            Lower Limit    Upper Limit
Student's t                       22.95          (not legible)

Gamma Statistics
k Star (Bias Corrected)           20.54
Theta Star                        1.232
nu Star                           821.5

Gamma Intervals
Gamma                             Lower Limit    Upper Limit
Approximate Gamma                 23.03          27.94
Adjusted Gamma                    22.82          29.21

Log-Transformed Statistics
Mean of Log Transformed Data                  3.21
Standard Deviation of Log Transformed Data    0.216
MVU Estimate of Median                        24.75
MVU Estimate of Mean                          25.34
MVU Estimate of SD                            (not legible)
MVU Estimate of Standard Error of Mean        1.232

Lognormal Intervals
Lognormal                         Lower Limit    Upper Limit
Land's H                          23.02          28.26
Chebyshev (MVUE)                  19.83          30.85

Nonparametric Intervals
Nonparametric                     Lower Limit    Upper Limit
Central Limit Theorem             23.1           27.51
Jackknife                         22.95          (not legible)
Standard Bootstrap                23.19          27.42
Bootstrap-t                       22.67          27.59
Percentile Bootstrap              23.13          (not legible)
Chebyshev                         20.28          30.33
Modified (t)                      22.93          27.63
Adjusted CLT                      23.3           27.31


-------
6.4.2.2 With Non-detects

The confidence intervals for data with non-detects available in Scout are:

o Normal:

   o Student's t
   o Normal ROS Student's t

o Gamma:

   o Gamma ROS Approximate Gamma
   o Gamma ROS Adjusted Gamma

o Lognormal:

   o Lognormal ROS Land's H
   o Lognormal ROS Chebyshev MVUE
   o Lognormal ROS % Bootstrap

o Nonparametric:

   o Kaplan-Meier (t)

   o Kaplan-Meier % Bootstrap (bootstrapping the KM means)

     $LCL = \frac{\alpha}{2}$ percentile of $\bar{x}$

     $UCL = 1 - \frac{\alpha}{2}$ percentile of $\bar{x}$

   o Kaplan-Meier BCA Bootstrap

   o Kaplan-Meier (z)


-------
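   o Kaplan-Meier Chebyshev

     $\bar{x} \pm \frac{1}{\sqrt{\alpha}} \, \frac{s}{\sqrt{n}}$

   o Winsor (t)

     $\bar{x}_w \pm t_{(\alpha/2,\, \nu-1)} \, \frac{s_w}{\sqrt{n}}$

     where $\nu = n - 2k$, $s_w = \frac{s(n-k)}{\nu - 1}$, and $\bar{x}_w$ = Winsorized mean

Details of those intervals can be found in the ProUCL 4.00.04 Technical Guide.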
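
The Kaplan-Meier (z) and Kaplan-Meier Chebyshev intervals can be sketched directly from the KM mean and its standard error. The Python sketch below assumes the two-sided forms mean plus or minus z times SEM and mean plus or minus SEM over the square root of alpha, with the KM standard error of the mean playing the role of s over the square root of n. The function names are hypothetical, and the inputs (KM mean 51.14, KM SEM 6.013) are the values printed in the guide's example output.

```python
from math import sqrt
from statistics import NormalDist


def km_z_interval(km_mean, km_sem, confidence=0.95):
    """Two-sided normal-approximation interval around the KM mean."""
    alpha = 1.0 - confidence
    half_width = NormalDist().inv_cdf(1.0 - alpha / 2.0) * km_sem
    return km_mean - half_width, km_mean + half_width


def km_chebyshev_interval(km_mean, km_sem, confidence=0.95):
    """Two-sided Chebyshev interval around the KM mean."""
    alpha = 1.0 - confidence
    half_width = km_sem / sqrt(alpha)
    return km_mean - half_width, km_mean + half_width


z_lower, z_upper = km_z_interval(51.14, 6.013)
c_lower, c_upper = km_chebyshev_interval(51.14, 6.013)
print(f"KM (z):         {z_lower:.2f} to {z_upper:.2f}")
print(f"KM (Chebyshev): {c_lower:.2f} to {c_upper:.2f}")
```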

1. Click Stats/GOF > Intervals > Classical > Confidence Intervals > With
NDs (Typical) or With NDs (Bounded).

2. The "Select Variables" screen (Section 3.2) will appear.

° Select one or more variables from the "Select Variables" screen.

° If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o Click on "Options" for interval options.



-------




[Options window: Confidence Level; Number of Bootstrap Operations (2000); OK; Cancel]

o Specify the preferred "Confidence Level." The default is
"0.95."

o Specify the preferred number of bootstrap operations. The
default is "2000."

o Click "OK" to continue or "Cancel" to cancel the options.

° Click "OK" to continue or "Cancel" to cancel the computations.

Output for Classical Confidence Intervals with Non-detects (Typical).

Confidence Intervals for Datasets with Non-Detects

Date/Time of Computation          11/21/2008 1:25:37 PM

User Selected Options
From File                         D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
Full Precision                    OFF
Number of Bootstrap Operations    2000
Confidence Coefficient            0.95

General Statistics
Number of Valid Data              53
Number of Detected Data           49
Number of Distinct Detected Data  49
Minimum Detected                  3.202
Maximum Detected                  121.1
Number of Non-Detect Data         4
Percent Non-Detects               7.55%
Minimum Non-detect                1.5
Maximum Non-detect                4
Mean of Detected Data             55.05
SD of Detected Data               43.2

Maximum Likelihood Statistics
Maximum Likelihood Estimated Mean    48.86
Maximum Likelihood Estimated Stdv    46.77

Normal Confidence Intervals
Normal                            Lower Limit    Upper Limit
MLE (t)                           35.97          61.75

Normal ROS Statistics
Mean of Normal ROS Data           48.06
Stdv of Normal ROS Data           48.36
ROS Student's t                   34.73          (not legible)


-------
Output for Classical Confidence Intervals with Non-detects (Typical) (continued).

Gamma ROS Statistics
k Star of Gamma ROS Data          0.302
Theta Star of Gamma ROS Data      168.3
Nu Star of Gamma ROS Data         32.05

Gamma Intervals
Gamma                             Lower Limit    Upper Limit
ROS Approximate Gamma             32.93          89
ROS Adjusted Gamma                43.94          62.09

Log-Transformed Statistics
Mean of Log-Transformed Detected Data    3.523
Stdv of Log-Transformed Detected Data    1.128
Mean of Lognormal ROS Data               51.13
Stdv of Lognormal ROS Data               43.75

Lognormal Confidence Intervals
Lognormal                         Lower Limit    Upper Limit
ROS Land's H                      41.91          109.5
ROS % Bootstrap                   40.11          62.98
ROS BCA Bootstrap                 39.71          63.51

Kaplan Meier Distribution Free Statistics
Kaplan Meier Mean                 51.14
Kaplan Meier Stdv                 43.33
Kaplan Meier SEM                  6.013

Nonparametric Confidence Intervals
Nonparametric                     Lower Limit    Upper Limit
Kaplan Meier (t)                  39.07          63.21
Kaplan Meier (z)                  39.35          62.92
Kaplan Meier % Bootstrap          40.1           62.95
Kaplan Meier BCA Bootstrap        40.91          63.54
Kaplan Meier Chebyshev            24.25          78.03

Winsorization Statistics
Winsor Mean                       50.72
Winsor Stdv                       42.87
Winsor (t)                        38.83          62.6


-------
Output for Classical Confidence Intervals with Non-detects (Bounded).

Bounded Confidence Intervals for Datasets with Non-Detects

Date/Time of Computation          11/15/2008 12:45:11 PM

User Selected Options
From File                         D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
Full Precision                    OFF
Number of Bounding Operations     1000
Bounding Coefficient              0.9
Number of Bootstrap Operations    2000
Confidence Coefficient            0.9

X

General Statistics                Lower Bound (LB)    Upper Bound (UB)
Mean                              50.95               51.06
Standard Deviation                43.84               43.97

Normal Confidence Limits          LB LCL    UB LCL    LB UCL    UB UCL
Student (t)                       40.83     40.97     61.06     61.14

Gamma Statistics                  Lower Bound (LB)    Upper Bound (UB)
k Star (Bias Corrected)           0.761               0.883
Theta Star                        57.8                66.87
nu Star                           80.62               93.58

Gamma Confidence Limits           LB LCL    UB LCL    LB UCL    UB UCL
Approximate Gamma                 40.04     40.77     66.11     67.4
Adjusted Gamma                    39.72     40.51     66.61     68.11

Lognormal Statistics                          Lower Bound (LB)    Upper Bound (UB)
Mean of Log-Transformed Data                  3.179               3.297
Standard Deviation of Log-Transformed Data    1.355               1.674

Lognormal Confidence Limits       LB LCL    UB LCL    LB UCL    UB UCL
Land's H                          46.22     59.14     106.6     197.8
Chebyshev (MVUE)                  -1.432    16.07     114.1     189.1

Nonparametric Confidence Limits   LB LCL    UB LCL    LB UCL    UB UCL
Central Limit Theorem             41.01     41.15     60.88     60.96
Central Limit Theorem             40.83     40.97     61.06     61.14
Standard Bootstrap                40.9      41.43     60.58     61.09
Bootstrap-t                       40.64     41.65     60.88     61.88
Percentile Bootstrap              40.76     41.69     60.42     61.36
Chebyshev                         31.84     32.01     70.04     70.1
Modified (t)                      40.87     41.02     61.1      61.18
Adjusted CLT                      40.78     40.9      61.12     61.2


-------
6.4.3 Classical Tolerance Intervals
6.4.3.1 Without Non-detects

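The tolerance intervals for data with no non-detects available in Scout are:

o Normal:

     $LTL = \bar{x} - K_{(n,\, \alpha/2,\, p)} \, s$

     $UTL = \bar{x} + K_{(n,\, \alpha/2,\, p)} \, s$

o Lognormal:

     $LTL = \exp\left( \bar{y} - K_{(n,\, \alpha/2,\, p)} \, s_y \right)$

     $UTL = \exp\left( \bar{y} + K_{(n,\, \alpha/2,\, p)} \, s_y \right)$

o Nonparametric:

   o Percentile Bootstrap

   o BCA Bootstrap

     $\hat{a} = \frac{\sum \left( \bar{x} - \bar{x}_{-i} \right)^3}{6 \left[ \sum \left( \bar{x} - \bar{x}_{-i} \right)^2 \right]^{1.5}}$

     $\hat{z}_0 = \Phi^{-1}\left( \frac{\#(\bar{x}_i < \bar{x})}{N} \right)$

     $\alpha_{2(LOWER)} = \Phi\left( \hat{z}_0 + \frac{\hat{z}_0 + z^{(\alpha/2)}}{1 - \hat{a}\left( \hat{z}_0 + z^{(\alpha/2)} \right)} \right)$

     $\alpha_{2(UPPER)} = \Phi\left( \hat{z}_0 + \frac{\hat{z}_0 + z^{(1-\alpha/2)}}{1 - \hat{a}\left( \hat{z}_0 + z^{(1-\alpha/2)} \right)} \right)$

     $LTL = x^{(\alpha_{2(LOWER)})}$

     $UTL = x^{(\alpha_{2(UPPER)})}$

   o Percentile Tolerance

Details of those intervals can be found in the ProUCL 4.00.04 Technical Guide.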
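
The normal and lognormal tolerance limits have the form mean plus or minus K times the standard deviation, applied to the raw data or to the log-transformed data respectively. The Python sketch below assumes that the tolerance factor K is supplied by the user (for example, from a table of tolerance factors); it does not compute K. The value K = 1.983 is the two-sided factor that the guide's example output reports as K2 for n = 53, 90% coverage, and 95% confidence. The function names are hypothetical.

```python
from math import exp


def normal_tolerance_limits(mean, sd, k_factor):
    """Two-sided normal tolerance limits: mean -/+ K * sd."""
    return mean - k_factor * sd, mean + k_factor * sd


def lognormal_tolerance_limits(log_mean, log_sd, k_factor):
    """Two-sided lognormal tolerance limits, back-transformed from the log scale."""
    lower, upper = normal_tolerance_limits(log_mean, log_sd, k_factor)
    return exp(lower), exp(upper)


K2 = 1.983  # assumption: tolerance factor taken from the example output

ltl, utl = normal_tolerance_limits(51.1, 43.78, K2)
print(f"Normal:    {ltl:.2f} to {utl:.1f}")

ltl, utl = lognormal_tolerance_limits(3.325, 1.299, K2)
print(f"Lognormal: {ltl:.3f} to {utl:.1f}")
```

Because the means and standard deviations are entered at the rounded precision shown on the output screen, the limits match the tabulated values only approximately.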



-------
1. Click Stats/GOF > Intervals > Classical > Tolerance Intervals > No NDs.

2. The "Select Variables" screen (Section 3.2) will appear.

° Select one or more variables from the "Select Variables" screen.

o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

° Click on "Options" for interval options.

[Options window, Tolerance Intervals No NDs: Confidence Level; Coverage (0.9); Number of Bootstrap Operations (2000); OK; Cancel]

o Specify the preferred "Confidence Level." The default is
"0.95."

o Specify the preferred coverage percentage. The default is
"0.9."

o Specify the preferred number of bootstrap operations. The
default is "2000."

o Click "OK" to continue or "Cancel" to cancel the options.

o Click "OK" to continue or "Cancel" to cancel the computations.



-------
Output for Classical Tolerance Intervals without Non-detects.



Tolerance Intervals/Limits (TLs) for Datasets Without Non-Detects

Date/Time of Computation          2/25/2008 7:51:11 AM

User Selected Options
From File                         D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
Full Precision                    OFF
Number of Bootstrap Operations    2000
Coverage                          0.9
Confidence Coefficient            0.95

X

Number of Valid Observations      53
Number of Distinct Observations   51

Raw Statistics
Mean                              51.1
Minimum                           1.5
5% Percentile                     2.606
10% Percentile                    4.071
1st Quartile                      9.608
Median                            24.56
3rd Quartile                      95.73
90% Percentile                    107.6
95% Percentile                    112.9
Maximum                           121.1
Standard Deviation                43.78
MAD / 0.6745                      30.48
IQR / 1.35                        64.57
1% Percentile (z)                 -50.75
5% Percentile (z)                 -20.91
10% Percentile (z)                -5.006
1st Quartile (z)                  21.57
ROS Median (z)                    51.1
3rd Quartile (z)                  80.64
90% Percentile (z)                107.2
95% Percentile (z)                123.1
99% Percentile (z)                153

Normal Tolerance Limits
Tolerance                         Lower Limit    Upper Limit
Normal                            -35.74         137.9

Log-Transformed Statistics
Mean of Log-Transformed Data                  3.325
Standard Deviation of Log-Transformed Data    1.299

Log-Transformed Tolerance Limits
Lognormal                         2.119          364.6

Nonparametric Tolerance Limits
% Bootstrap                       98.51          116.4
BCA Bootstrap                     97.97          114.8
% TL                              2.053          116.4


-------
6.4.3.2 With Non-detects

The tolerance intervals for data with non-detects available in Scout are:

o Normal:

   o Using MLE of mean and standard deviation

   o Using Normal ROS methods

o Lognormal ROS:

   o Using bootstrap methods based on Lognormal ROS

o Nonparametric:

   o Nonparametric KM

Details of those intervals can be found in the ProUCL 4.00.04 Technical Guide and the
Scout Technical Guide.


-------
1. Click Stats/GOF > Intervals > Classical > Tolerance Intervals > With
NDs.

2. The "Select Variables" screen (Section 3.2) will appear.

o Select one or more variables from the "Select Variables" screen.

o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

o Click on "Options" for interval options.

[Options window: Confidence Level; Coverage (0.9); Number of Bootstrap Operations (2000); OK; Cancel]

o Specify the preferred "Confidence Level." The default is
"0.95."

o Specify the preferred coverage percentage. The default is
"0.9."

o Specify the preferred number of bootstrap operations. The
default is "2000."



-------
o Click "OK" to continue or "Cancel" to cancel the options.

o Click "OK" to continue or "Cancel" to cancel the computations.

Output for Classical Tolerance Intervals with Non-detects.

Tolerance Intervals for Datasets with Non-Detects

Date/Time of Computation          2/25/2008 8:36:35 AM

User Selected Options
From File                         D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
Full Precision                    OFF
Number of Bootstrap Operations    2000
Coverage                          0.9
Confidence Coefficient            0.95

K2 represents the two-sided cutoff for tolerance intervals based upon the procedure described in Hahn and Meeker (1991)

X

Number of Valid Observations      53
Number of Distinct Observations   49
Number of Non-Detect Data         4
Number of Detected Data           49
Minimum Detected                  3.202
Maximum Detected                  121.1
Percent Non-Detects               7.55%
Minimum Non-detect                1.5
Maximum Non-detect                4

Raw Statistics
Mean of Detected Data             55.05
SD of Detected Data               43.2

Maximum Likelihood Estimates (MLEs)
MLE Mean                          48.86
1% Percentile (z)                 -59.95
5% Percentile (z)                 -28.07
10% Percentile (z)                -11.08
1st Quartile (z)                  17.31
ROS Median (z)                    48.86
3rd Quartile (z)                  80.41
90% Percentile (z)                108.8
95% Percentile (z)                125.8
99% Percentile (z)                157.7
MLE Stdv                          46.77


-------
Output for Classical Tolerance Intervals with Non-detects (continued).

K2                                1.983

Normal Tolerance Intervals
                                  Lower Limit    Upper Limit
MLE                               -43.91         141.6

Normal ROS Statistics
Minimum of ROS Data               -49.39
Maximum of ROS Data               121.1
Mean of ROS Data                  48.06
SD of ROS Data                    48.36
K2                                1.983

Nonparametric Percentiles Using ROS Data
1% ROS Percentile                 -49.39
5% ROS Percentile                 -36.93
10% ROS Percentile                3.513
1st ROS Quartile                  9.608
ROS Median                        24.26
3rd ROS Quartile                  95.73
90% ROS Percentile                107.6
95% ROS Percentile                112.9
99% ROS Percentile                118.7

Parametric Percentiles Using Normal Distribution
1% ROS Percentile (z)             -64.44
5% ROS Percentile (z)             -31.49
10% ROS Percentile (z)            -13.92
1st ROS Quartile (z)              15.44
ROS ROS Median (z)                48.06
3rd ROS Quartile (z)              80.68
90% ROS Percentile (z)            110
95% ROS Percentile (z)            127.6
99% ROS Percentile (z)            160.6

Normal ROS Tolerance Interval
                                  Lower Limit    Upper Limit
Normal                            -47.86         144


-------
Output for Classical Tolerance Intervals with Non-detects (continued).

Log-Transformed Statistics
Mean of Log-Transformed Detected Data    3.523
Stdv of Log-Transformed Detected Data    1.128
Minimum of Lognormal ROS Data            2.204
Maximum of Lognormal ROS Data            121.1
Mean of Lognormal ROS Data               51.13
Stdv of Lognormal ROS Data               43.75
K2                                       1.983

Nonparametric Percentiles Using ROS Data
1% ROS Percentile                 2.204
5% ROS Percentile                 3.041
10% ROS Percentile                4.174
1st ROS Quartile                  9.608
ROS Median                        24.26
3rd ROS Quartile                  95.73
90% ROS Percentile                107.6
95% ROS Percentile                112.9
99% ROS Percentile                118.7

Parametric Percentiles Using Lognormal Distribution
1% ROS Percentile (z)             1.493
5% ROS Percentile (z)             3.532
10% ROS Percentile (z)            5.589
1st ROS Quartile (z)              12.04
ROS ROS Median (z)                28.22
3rd ROS Quartile (z)              66.19
90% ROS Percentile (z)            142.5
95% ROS Percentile (z)            225.6
99% ROS Percentile (z)            533.6

Lognormal Tolerance Intervals
                                  Lower Limit    Upper Limit
ROS Lognormal                     2.302          346
ROS % Bootstrap                   98.51          116.4
ROS BCA Bootstrap                 97.97          116.4


-------
Output for Classical Tolerance Intervals with Non-detects (continued).

Kaplan Meier Distribution Free Statistics
Mean                              51.14
1% Percentile (z)                 -49.66
5% Percentile (z)                 -20.13
10% Percentile (z)                -4.339
1st Quartile (z)                  21.91
Median (z)                        51.14
3rd Quartile (z)                  80.36
90% Percentile (z)                106.7
95% Percentile (z)                122.4
99% Percentile (z)                151.9
Standard Deviation                43.33
Kaplan Meier SEM                  6.013
K2                                1.983

Nonparametric Tolerance Intervals
                                  Lower Limit    Upper Limit
KM Nonparametric                  -34.8          137.1















6.4.4 Classical Prediction Intervals
6.4.4.1 Without Non-detects

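The prediction intervals for data with no non-detects available in Scout are (the square
root quantity, $[(1/k) + (1/n)]^{1/2}$, in the equations below is given for k = 1 future
observation):

o Normal

     $\bar{x} \pm t_{(\alpha/2,\, n-1)} \, s \sqrt{1 + \frac{1}{n}}$

o Lognormal

     $\exp\left( \bar{y} \pm t_{(\alpha/2,\, n-1)} \, s_y \sqrt{1 + \frac{1}{n}} \right)$

o Chebyshev

     $\bar{x} \pm \frac{1}{\sqrt{\alpha}} \, s \sqrt{1 + \frac{1}{n}}$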


-------
o Nonparametric

     $LPL = x_{(m)}, \quad m = (n+1)\left( \frac{\alpha}{2} \right)$

     $UPL = x_{(m)}, \quad m = (n+1)\left( 1 - \frac{\alpha}{2} \right)$

Details of those intervals can be found in the ProUCL 4.00.04 Technical Guide and the
Scout Technical Guide.
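
The Chebyshev prediction interval and the order-statistic positions used by the nonparametric prediction limits can be sketched as follows. This Python sketch assumes the two-sided Chebyshev form, mean plus or minus s times the square root of (1/k + 1/n), divided by the square root of alpha, and the positions m = (n + 1)(alpha/2) and m = (n + 1)(1 - alpha/2). The function names are hypothetical, and how fractional positions are interpolated between order statistics is left open because the guide does not state it here.

```python
from math import sqrt


def chebyshev_prediction_interval(mean, sd, n, confidence=0.95, k=1):
    """Two-sided Chebyshev prediction interval for the mean of k future observations."""
    alpha = 1.0 - confidence
    half_width = sd * sqrt(1.0 / k + 1.0 / n) / sqrt(alpha)
    return mean - half_width, mean + half_width


def order_statistic_positions(n, confidence=0.95):
    """Positions m of the order statistics x_(m) giving the nonparametric LPL and UPL."""
    alpha = 1.0 - confidence
    return (n + 1) * (alpha / 2.0), (n + 1) * (1.0 - alpha / 2.0)


# Summary statistics from the guide's example output: mean 51.1, SD 43.78, n = 53.
lpl, upl = chebyshev_prediction_interval(51.1, 43.78, 53)
print(f"Chebyshev prediction interval: {lpl:.1f} to {upl:.1f}")

m_lower, m_upper = order_statistic_positions(53)
print(f"Order-statistic positions: {m_lower:.2f} and {m_upper:.2f}")
```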

1. Click Stats/GOF > Intervals > Classical > Prediction Intervals > No NDs.

2. The "Select Variables" screen (Section 3.2) will appear.

° Select one or more variables from the "Select Variables" screen.

o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

° Click on "Options" for interval options.

[Options window, Prediction Intervals No NDs: Confidence Level; Different or Future K Values (5); OK; Cancel]


-------
o Specify the preferred "Confidence Level." The default is
"0.95."

o Specify the number of future k values. The default is "5."

o Click "OK" to continue or "Cancel" to cancel the options.

o Click "OK" to continue or "Cancel" to cancel the computations.


-------
Output for Classical Prediction Intervals without Non-detects.

Prediction Intervals/Limits (PLs) for Datasets Without Non-Detects

User Selected Options
Date/Time of Computation          2/25/2008 9:03:29 AM
From File                         D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
Full Precision                    OFF
Number of Future K Values         5
Confidence Coefficient            0.95

X

Number of Valid Observations      53
Number of Distinct Observations   51

Raw Statistics
Minimum                           1.5
Mean                              51.1
Median                            24.56
Maximum                           121.1
Standard Deviation                43.78

Normal Prediction Intervals
Normal                            Lower Limit    Upper Limit
Student's t                       -37.58         139.8
For Next 5                        -67.06         169.3

Log-Transformed Statistics
Mean of Log-Transformed Data                  3.325
Standard Deviation of Log-Transformed Data    1.298

Lognormal                         Lower Limit    Upper Limit
Log                               2.007          385
For Next 5                        0.838          922.5

Chebyshev                         Lower Limit    Upper Limit
Chebyshev                         -146.5         248.7

Nonparametric                     Lower Limit    Upper Limit
Nonparametric                     0.394          119.5


-------
6.4.4.2 With Non-detects

The prediction intervals for data with non-detects available in Scout are:

o MLE - t
o Lognormal ROS - t
o Nonparametric

   o KM Chebyshev
   o KM - t
   o KM - z

Details of those intervals can be found in the ProUCL 4.00.04 Technical Guide and the
Scout Technical Guide.

1. Click Stats/GOF > Intervals > Classical > Prediction Intervals > With
NDs.


2. The "Select Variables" screen (Section 3.2) will appear.

° Select one or more variables from the "Select Variables" screen.

o If the results have to be produced by using a Group variable, then select a
group variable by clicking the arrow below the "Group by Variable"
button. This will result in a drop-down list of available variables. The
user should select and click on an appropriate variable representing a
group variable.

° Click on "Options" for interval options.



-------
[Options window as shown in the guide, titled "Options Tolerance Intervals With NDs": Confidence Level; Coverage (0.9); Number of Bootstrap Operations (2000); OK; Cancel]

   o Specify the preferred "Confidence Level." The default is "0.95."

   o Specify the number of future k values. The default is "5."

   o Click "OK" to continue or "Cancel" to cancel the options.

• Click "OK" to continue or "Cancel" to cancel the computations.

Output for Classical Prediction Intervals with Non-detects.

Prediction Intervals for Datasets with Non-Detects

User Selected Options
  Date/Time of Computation     2/25/2008 9:06:12 AM
  From File                    D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
  Full Precision               OFF
  Number of Future K Values    5
  Confidence Coefficient       0.95

X

General Statistics
  Number of Valid Observations        53
  Number of Distinct Observations     49
  Number of Non-Detect Data           4
  Number of Detected Data             49
  Minimum Detected                    3.202
  Maximum Detected                    121.1
  Percent Non-Detects                 7.55%
  Minimum Non-detect                  1.5
  Maximum Non-detect                  4

Raw Statistics
  Mean of Detected Data               55.05
  SD of Detected Data                 43.2

Maximum Likelihood Estimates (MLEs)
  MLE Mean                            48.86
  1% Percentile (z)                   -59.95
  5% Percentile (z)                   -28.07
  10% Percentile (z)                  -11.08
  1st Quartile (z)                    17.31
  ROS Median (z)                      48.86
  3rd Quartile (z)                    80.41
  90% Percentile (z)                  108.8
  95% Percentile (z)                  125.8
  99% Percentile (z)                  157.7
  MLE Stdv                            46.77
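
The z-based percentiles in the Maximum Likelihood Estimates block appear consistent with the usual normal-quantile relation, percentile = mean + z_p × sd. The sketch below is an illustration only, not Scout's code: the function name `normal_percentile` is hypothetical, and the inputs are the MLE mean and standard deviation as printed in the output.

```python
from statistics import NormalDist


def normal_percentile(mean, sd, p):
    """Return the p-th quantile (0 < p < 1) of a normal distribution."""
    return mean + NormalDist().inv_cdf(p) * sd


# MLE mean and standard deviation as printed in the Scout output.
mle_mean, mle_sd = 48.86, 46.77

for label, p in [("1%", 0.01), ("5%", 0.05), ("1st Quartile", 0.25),
                 ("Median", 0.50), ("3rd Quartile", 0.75), ("99%", 0.99)]:
    print(f"{label:>12}: {normal_percentile(mle_mean, mle_sd, p):8.2f}")
```

Because the inputs are rounded to four significant digits, the recomputed values can differ from the printed ones in the last digit. Negative lower percentiles, as seen here, simply reflect that a normal model is being applied to data with a large standard deviation relative to the mean.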











Output for Classical Prediction Intervals with Non-detects (continued).

Normal Prediction Intervals
                                      Lower Limit    Upper Limit
  MLE (t)                                  -45.88          143.6
  Prediction Interval for Next 5           -77.37          175.1

Normal ROS Statistics
  Minimum of ROS Data                 -49.39
  Mean of ROS Data                    48.06
  Maximum of ROS Data                 121.1
  SD of ROS Data                      48.36

Nonparametric Percentiles Using ROS Data
  1% ROS Percentile                   -49.39
  5% ROS Percentile                   -36.93
  10% ROS Percentile                  3.513
  1st ROS Quartile                    9.608
  ROS Median                          24.26
  3rd ROS Quartile                    95.73
  90% ROS Percentile                  107.6
  95% ROS Percentile                  112.9
  99% ROS Percentile                  118.7

Parametric Percentiles Using Normal Distribution
  1% ROS Percentile (z)               -64.44
  5% ROS Percentile (z)               -31.49
  10% ROS Percentile (z)              -13.92
  1st ROS Quartile (z)                15.44
  ROS Median (z)                      48.06
  3rd ROS Quartile (z)                80.68
  90% ROS Percentile (z)              110
  95% ROS Percentile (z)              127.6
  99% ROS Percentile (z)              160.6

Normal ROS Prediction Intervals
                                      Lower Limit    Upper Limit
  Normal                                   -49.89            146
  Prediction Interval for Next 5           -82.46          178.6











Output for Classical Prediction Intervals with Non-detects (continued).

Kaplan Meier Distribution Free Statistics
  Mean                                51.14
  1% Percentile (z)                   -49.66
  5% Percentile (z)                   -20.13
  10% Percentile (z)                  -4.389
  1st Quartile (z)                    21.91
  Median (z)                          51.14
  3rd Quartile (z)                    80.36
  90% Percentile (z)                  106.7
  95% Percentile (z)                  122.4
  99% Percentile (z)                  151.9
  Standard Deviation                  43.33
  Kaplan Meier SEM                    6.013

Nonparametric Prediction Intervals
                                      Lower Limit    Upper Limit
  KM Chebyshev                             -144.5          246.7
  KM (t)                                   -36.62          138.9
  KM (z)                                   -34.58          136.9
  Prediction Interval for Next 5            -65.8          168.1
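
As a rough check, the KM (z) and KM Chebyshev limits can be recomputed from the Kaplan-Meier mean and standard deviation. The sketch below assumes limits of the form mean ± c × sd × sqrt(1 + 1/n), with c the normal quantile for KM (z) and c = sqrt(1/alpha) for the Chebyshev limits. These forms are assumptions inferred from the printed numbers, not taken from the Scout source; the ProUCL and Scout Technical Guides remain the authority. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_prediction_limits(mean, sd, n, conf=0.95):
    """Two-sided normal (z) limits for a single future observation."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sd * sqrt(1 + 1 / n)
    return mean - half_width, mean + half_width


def chebyshev_prediction_limits(mean, sd, n, conf=0.95):
    """Chebyshev-type limits with multiplier sqrt(1/alpha) (assumed form)."""
    half_width = sqrt(1 / (1 - conf)) * sd * sqrt(1 + 1 / n)
    return mean - half_width, mean + half_width


# Kaplan-Meier mean, standard deviation, and sample size from the output.
km_mean, km_sd, n_obs = 51.14, 43.33, 53

print("KM (z):      ", z_prediction_limits(km_mean, km_sd, n_obs))
print("KM Chebyshev:", chebyshev_prediction_limits(km_mean, km_sd, n_obs))
```

The Chebyshev limits are much wider than the z limits because they make no distributional assumption; that is the trade-off between the two rows of the table.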







6.5 Robust Intervals

Various robust and resistant univariate intervals (confidence intervals, prediction
intervals, tolerance intervals, and simultaneous intervals) can be computed using Scout.
For details of those robust intervals, refer to Kafadar (1982) and Singh and Nocerino
(1997); the latter also discusses the performance of those intervals.
Typically, those robust procedures are iterative and require initial estimates of location
and scale. In Scout, those robust intervals can be computed using either the mean and the
standard deviation, or the median and MAD/0.6745, as the initial estimates of location and
scale. The methods available in Scout for computing the robust intervals are:

•	PROP (using the PROP influence function)

•	Huber (using the Huber influence function)

•	Tukey's Biweight as described in Tukey (1977)

•	Lax/Kafadar Biweight as described in Kafadar (1982) and Horn et al. (1988)

•	MVT (using a trimming percentage)
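
The median and MAD/0.6745 starting values mentioned above can be sketched as follows. This is an illustration, not Scout's implementation: the function name is hypothetical, and the data are the widely published stack-loss response values (Brownlee), which are assumed here to match the STACKLOSS example file used later in this section.

```python
from statistics import median


def initial_robust_estimates(values):
    """Return (median, MAD / 0.6745) as starting location and scale."""
    center = median(values)
    # MAD: median of the absolute deviations from the median.
    mad = median(abs(v - center) for v in values)
    return center, mad / 0.6745


# Stack-loss response values (assumed to match the STACKLOSS example file).
stack_loss = [42, 37, 37, 28, 18, 18, 19, 20, 15, 14, 14,
              13, 11, 12, 8, 7, 8, 8, 9, 15, 15]

location, scale = initial_robust_estimates(stack_loss)
print(f"median = {location}, MAD/0.6745 = {scale:.2f}")
```

Dividing the MAD by 0.6745 puts it on the same scale as the standard deviation for normally distributed data, which is why it can stand in for the standard deviation as a starting value.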

The performance of these intervals can also be compared using the graphics option in the
variable selection screen. If the graphics option is selected, then a plot of intervals will
be generated for all of the interval methods selected in the options window.

6.5.1 Robust Confidence Intervals

1. Click Stats/GOF > Intervals > Robust > Confidence Intervals.

[Screenshot: Scout main window with the STACKLOSS data set open and the Stats/GOF menu
expanded to Intervals > Robust > Confidence Intervals.]

2. The "Select Variables" screen (Section 3.2) will appear.

• Select one or more variables from the "Select Variables" screen.

• To produce results by group, select a group variable by clicking the arrow below the
"Group by Variable" button. A drop-down list of available variables appears; select
the variable that represents the groups.

• Click "Options" for interval options.

[Screenshot: "Robust Confidence Intervals Options" dialog. Defaults shown: 10 iterations
(a maximum of 10 for the two biweight methods) and Median/MAD initial estimates for every
method; Influence Alpha 0.05 and Beta MDs Distribution for PROP and Huber; Location and
Scale tuning constants of 4 for Tukey Biweight; Trimming % 0.1 for MVT; Confidence Level
0.95.]

   o Choose your methods and options. All of the options displayed in the above
     graphical user interface (GUI) are the default options.

   o Click "OK" to continue or "Cancel" to cancel the selected options.

• Click "Graphics" for the graphics option.

[Screenshot: "Confidence Intervals Plot" dialog containing the "Generate Robust Intervals
Plot" check box and the Intervals Plot Title field ("Robust Confidence Intervals").]

   o Click "OK" to continue or "Cancel" to cancel graphics options.

• Click "OK" to continue or "Cancel" to cancel the computations.

Output for Robust Confidence Intervals.

Robust Confidence Intervals

User Selected Options
  Date/Time of Computation       11/15/2008 11:48:55 AM
  From File                      D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\STACKLOSS
  Full Precision                 OFF
  Confidence Coefficient         0.95
  PROP Method                    Influence Function Alpha of 0.05 with MDs following Beta Distribution.
                                 PROP CLs derived using 10 Iterations and initial estimates of median/MAD.
  Huber Method                   Influence Function Alpha of 0.05 with MDs following Beta Distribution.
                                 Huber CLs derived using 10 Iterations and initial estimates of median/MAD.
  Tukey Biweight Method          Location Tuning Constant of 4 and a Scale Tuning Constant of 4.
                                 Tukey CLs derived using a Maximum of 10 Iterations and initial estimates of median/MAD.
  Lax/Kafadar Biweight Method    Tuning Constant of 4.
                                 Lax/Kafadar CLs derived using a Maximum of 10 Iterations and initial estimates of median/MAD.
  MVT Method                     Trimming Percentage of 10%.
                                 MVT CLs derived using 10 Iterations and initial estimates of median/MAD.

Stack-Loss

              Number                     Standard     MAD/
              Obs.     Mean     Median   Deviation    0.6745   SE Mean   Critical t   LCL      UCL
  Classical   21       17.52    15       10.17        5.93     2.22      2.086        12.89    22.15

                          Initial   Initial   Final    Final
  Method                  Mean      Stdv      Mean     Stdv     Wsum     SEM      Critical t   LCL      UCL
  PROP                    15        5.93      13.3     4.206    17.13    1.016    2.119        11.14    15.45
  Huber                   15        5.93      16.76    8.79     20.3     1.951    2.091        12.68    20.84
  Tukey Biweight          15        5.93      13.21    5.839    16.63    1.432    2.124        10.17    16.25
  Lax/Kafadar Biweight    15        5.93      14.57    7.571    17.42    1.814    2.116        10.74    18.41
  MVT                     15        5.93      15.21    7.413    19       1.701    2.101        11.64    18.78
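
The Classical and PROP rows of the table appear consistent with limits of the form mean ± t × SEM, where SEM = sd / sqrt(Wsum) and Wsum (the sum of the final weights) plays the role of the sample size for the robust methods. That reading is inferred from the printed values and is an assumption, not a statement of Scout's internals. In the sketch below the critical t value is taken from the table rather than computed, and the function name is hypothetical.

```python
from math import sqrt


def confidence_limits(mean, sd, weight_sum, t_crit):
    """Return (SEM, LCL, UCL) for mean +/- t * sd / sqrt(weight_sum)."""
    sem = sd / sqrt(weight_sum)
    return sem, mean - t_crit * sem, mean + t_crit * sem


# (final mean, final sd, n or Wsum, critical t) as printed in the output.
rows = {
    "Classical": (17.52, 10.17, 21, 2.086),
    "PROP": (13.3, 4.206, 17.13, 2.119),
}

for method, (mean, sd, wsum, t_crit) in rows.items():
    sem, lcl, ucl = confidence_limits(mean, sd, wsum, t_crit)
    print(f"{method:>9}: SEM = {sem:.3f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```

A Wsum well below the number of observations (17.13 against 21 for PROP) indicates that some observations were down-weighted as potential outliers, which is also why the PROP interval is centered lower and is narrower than the classical one.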

Output for Robust Confidence Intervals (continued).

[Plot: "Robust Confidence Intervals"; one interval per method (Huber and Tukey labels
legible), x-axis label "Intervals for Stack-Loss".]

6.5.2 Robust Simultaneous Intervals

1. Click Stats/GOF > Intervals > Robust > Simultaneous Intervals.

[Screenshot: Scout main window with the STACKLOSS data set open and the Stats/GOF menu
expanded to Intervals > Robust > Simultaneous Intervals.]

2. The "Select Variables" screen (Section 3.2) will appear.

• Select one or more variables from the "Select Variables" screen.

• To produce results by group, select a group variable by clicking the arrow below the
"Group by Variable" button. A drop-down list of available variables appears; select
the variable that represents the groups.

• Click "Options" for interval options.

[Screenshot: "Robust Simultaneous Intervals Options" dialog. Defaults shown: 10 iterations
(a maximum of 10 for the two biweight methods) and Median/MAD initial estimates for every
method; Influence Alpha 0.05 and Beta MDs Distribution for PROP and Huber; Location and
Scale tuning constants of 4 for Tukey Biweight; Trimming % 0.1 for MVT; Confidence Level
0.95.]

   o Specify the preferred options. All of the options displayed are defaults.

   o Click "OK" to continue or "Cancel" to cancel the options.

• Click "Graphics" for the graphics option.

[Screenshot: graphics dialog containing the "Generate Robust Intervals Plot" check box and
the Intervals Plot Title field ("Robust Simultaneous Intervals").]

   o Click "OK" to continue or "Cancel" to cancel graphics options.

• Click "OK" to continue or "Cancel" to cancel the computations.

Output for Simultaneous Intervals.

Robust Simultaneous Intervals/Limits (SLs)

User Selected Options
  Date/Time of Computation       2/25/2008 9:22:03 AM
  From File                      D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
  Full Precision                 OFF
  Confidence Coefficient         0.95
  PROP Method                    Influence Function Alpha of 0.05 with MDs following Beta Distribution.
                                 PROP SLs derived using 10 Iterations and initial estimates of median/MAD.
  Huber Method                   Influence Function Alpha of 0.05 with MDs following Beta Distribution.
                                 Huber SLs derived using 10 Iterations and initial estimates of median/MAD.
  Tukey Biweight Method          Location Tuning Constant of 4 and a Scale Tuning Constant of 4.
                                 Tukey SLs derived using a Maximum of 10 Iterations and initial estimates of median/MAD.
  Lax/Kafadar Biweight Method    Tuning Constant of 4.
                                 Lax/Kafadar SLs derived using a Maximum of 10 Iterations and initial estimates of median/MAD.
  MVT Method                     Trimming Percentage of 10%.
                                 MVT SLs derived using 10 Iterations and initial estimates of median/MAD.

D2Max represents the unsquared critical value of Max-MD (Mahalanobis Distances), computed based upon Wsum values.

X

              Number                     Standard     MAD/
              Obs.     Mean     Median   Deviation    0.6745   D2Max    LSL       USL
  Classical   53       51.1     24.56    43.78        30.48    3.151    -86.88    189.1

                          Initial    Initial   Final    Final
  Method                  Location   Scale     Mean     Stdv     Wsum     D2Max    LSL       USL
  PROP                    24.56      30.48     51.1     43.78    53       3.151    -86.88    189.1
  Huber                   24.56      30.48     51.1     43.78    53       3.151    -86.88    189.1
  Tukey Biweight          24.56      30.48     14.95    15.9     41       3.047    -33.48    63.38
  Lax/Kafadar Biweight    24.56      30.48     14.02    13.09    49.83    3.127    -26.9     54.93
  MVT                     24.56      30.48     44.44    40.48    48       3.112    -81.52    170.4
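
The simultaneous limits in the table appear consistent with mean ± D2Max × sd, using each method's final mean and standard deviation. The sketch below recomputes a few rows under that assumption; it takes D2Max from the table rather than deriving it, and the function name is hypothetical. Small last-digit differences are expected because the printed inputs are rounded.

```python
def simultaneous_limits(mean, sd, d_max):
    """Return (LSL, USL) for mean +/- d_max * sd."""
    half_width = d_max * sd
    return mean - half_width, mean + half_width


# (final mean, final sd, D2Max, printed LSL, printed USL) from the output.
rows = {
    "Tukey Biweight": (14.95, 15.9, 3.047, -33.48, 63.38),
    "Lax/Kafadar Biweight": (14.02, 13.09, 3.127, -26.9, 54.93),
    "MVT": (44.44, 40.48, 3.112, -81.52, 170.4),
}

for method, (mean, sd, d_max, lsl_out, usl_out) in rows.items():
    lsl, usl = simultaneous_limits(mean, sd, d_max)
    print(f"{method:>21}: computed ({lsl:.2f}, {usl:.2f}), "
          f"printed ({lsl_out}, {usl_out})")
```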












Output for Simultaneous Intervals (continued).

[Plot: "Robust Simultaneous Intervals"; one interval per method (Huber and Tukey labels
legible), x-axis label "Intervals for X".]

6.5.3 Robust Prediction Intervals

1. Click Stats/GOF > Intervals > Robust > Prediction Intervals.

[Screenshot: Scout main window with the STACKLOSS data set open and the Stats/GOF menu
expanded to Intervals > Robust > Prediction Intervals.]

2. The "Select Variables" screen (Section 3.2) will appear.

• Select one or more variables from the "Select Variables" screen.

• To produce results by group, select a group variable by clicking the arrow below the
"Group by Variable" button. A drop-down list of available variables appears; select
the variable that represents the groups.

• Click "Options" for interval options.

   o Specify the preferred options. All of the options displayed are defaults.

   o Click "OK" to continue or "Cancel" to cancel the options.

• Click "Graphics" for the graphics option.

[Screenshot: "Prediction Intervals Plot" dialog containing the "Generate Robust Intervals
Plot" check box and the Intervals Plot Title field ("Robust Prediction Intervals").]

   o Click "OK" to continue or "Cancel" to cancel graphics options.

• Click "OK" to continue or "Cancel" to cancel the computations.

Output for Robust Prediction Intervals.

Robust Prediction Interval

User Selected Options
  Date/Time of Computation       11/15/2008 12:13:44 PM
  From File                      D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\STACKLOSS
  Full Precision                 OFF
  Confidence Coefficient         0.95
  PROP Method                    Influence Function Alpha of 0.05 with MDs following Beta Distribution.
                                 PROP PLs derived using 10 Iterations and initial estimates of median/MAD.
  Huber Method                   Influence Function Alpha of 0.05 with MDs following Beta Distribution.
                                 Huber PLs derived using 10 Iterations and initial estimates of median/MAD.
  Tukey Biweight Method          Location Tuning Constant of 4 and a Scale Tuning Constant of 4.
                                 Tukey PLs derived using a Maximum of 10 Iterations and initial estimates of median/MAD.
  Lax/Kafadar Biweight Method    Tuning Constant of 4.
                                 Lax/Kafadar PLs derived using a Maximum of 10 Iterations and initial estimates of median/MAD.
  MVT Method                     MVT PLs derived using 10 Iterations and initial estimates of median/MAD.

Air-Flow

              Number                     Standard     MAD/
              Obs.     Mean     Median   Deviation    0.6745   SE Mean   Critical t   LPL      UPL
  Classical   21       60.43    58       9.168        5.83     2.001     2.086        40.85    80

                          Initial   Initial   Final    Final
  Method                  Mean      Stdv      Mean     Stdv     Wsum     SEM      Critical t   LPL      UPL
  PROP                    58        5.83      57.18    5.02     17.54    1.199    2.114        46.26    68.09
  Huber                   58        5.83      60.07    8.546    20.62    1.882    2.089        41.78    78.34
  Tukey Biweight          58        5.83      57.48    7.438    17.66    1.784    2.113        41.18    73.76
  Lax/Kafadar Biweight    58        5.83      59.41    4.164    14.84    1.081    2.147        50.18    68.65
  MVT                     58        5.83      58.37    6.809    18       1.562    2.101        43.63    73.04

Output for Robust Prediction Intervals (continued).

[Plot: "Robust Prediction Intervals"; one interval per method (Huber and Tukey labels
legible), x-axis label "Intervals for Stack-Loss".]

6.5.4 Robust Tolerance Intervals

1. Click Stats/GOF > Intervals > Robust > Tolerance Intervals.

[Screenshot: Scout main window with the STACKLOSS data set open and the Stats/GOF menu
expanded to Intervals > Robust > Tolerance Intervals.]

2. The "Select Variables" screen (Section 3.2) will appear.

• Select one or more variables from the "Select Variables" screen.

• To produce results by group, select a group variable by clicking the arrow below the
"Group by Variable" button. A drop-down list of available variables appears; select
the variable that represents the groups.

• Click "Options" for interval options.

[Screenshot: "Robust Tolerance Intervals Options" dialog. Defaults shown: 10 iterations
(a maximum of 10 for the two biweight methods) for every method; Influence Alpha 0.05 and
Beta MDs Distribution for PROP and Huber; Location and Scale tuning constants of 4 for
Tukey Biweight; Trimming % 0.1 for MVT; Confidence Level 0.95; and a Coverage field.]

   o Specify the preferred options. All of the options displayed are defaults.

   o Click "OK" to continue or "Cancel" to cancel the options.

• Click "Graphics" for the graphics option.

[Screenshot: "Prediction Intervals Plot" dialog containing the "Generate Robust Intervals
Plot" check box and the Intervals Plot Title field ("Robust Prediction Intervals").]

   o Click "OK" to continue or "Cancel" to cancel graphics options.

• Click "OK" to continue or "Cancel" to cancel the computations.

Output for Robust Tolerance Intervals.

Robust Tolerance Intervals/Limits (TLs)

User Selected Options
  Date/Time of Computation       2/25/2008 9:23:20 AM
  From File                      D:\Narain\Scout_For_Windows\ScoutSource\WorkDatInExcel\Data\censor-by-grps1
  Full Precision                 OFF
  Confidence Coefficient         0.95
  Coverage                       0.9
  PROP Method                    Influence Function Alpha of 0.05 with MDs following Beta Distribution.
                                 PROP TLs derived using 10 Iterations and initial estimates of median/MAD.
  Huber Method                   Influence Function Alpha of 0.05 with MDs following Beta Distribution.
                                 Huber TLs derived using 10 Iterations and initial estimates of median/MAD.
  Tukey Biweight Method          Location Tuning Constant of 4 and a Scale Tuning Constant of 4.
                                 Tukey TLs derived using a Maximum of 10 Iterations and initial estimates of median/MAD.
  Lax/Kafadar Biweight Method    Tuning Constant of 4.
                                 Lax/Kafadar TLs derived using a Maximum of 10 Iterations and initial estimates of median/MAD.
  MVT Method                     Trimming Percentage of 10%.
                                 MVT TLs derived using 10 Iterations and initial estimates of median/MAD.

K2 represents the two-sided cutoff for tolerance intervals and is computed based upon Wsum values,
following the procedure described in Hahn and Meeker (1991).

X

              Number                     Standard     MAD/
              Obs.     Mean     Median   Deviation    0.6745   k2       LTL       UTL
  Classical   53       51.1     24.56    43.78        30.48    1.983    -35.74    137.9

                          Initial    Initial   Final    Final
  Method                  Location   Scale     Mean     Stdv     Wsum     k2       LTL       UTL
  PROP                    24.56      30.48     51.1     43.78    53       1.983    -35.74    137.9
  Huber                   24.56      30.48     51.1     43.78    53       1.983    -35.74    137.9
  Tukey Biweight          24.56      30.48     14.95    15.9     41       2.045    -17.56    47.46
  Lax/Kafadar Biweight    24.56      30.48     14.02    13.09    49.83    1.997    -12.12    40.15
  MVT                     24.56      30.48     44.44    40.48    48       1.983    -35.85    124.7
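
The k2 values in the table can be approximated with Howe's closed-form factor for two-sided normal tolerance intervals, one of the approximations commonly presented alongside Hahn and Meeker's treatment. Whether Scout uses exactly this formula is an assumption; the sketch only shows that it reproduces the printed k2 for Wsum = 53 and Wsum = 41. To stay within the Python standard library, the chi-square quantile is itself approximated with the Wilson-Hilferty formula. Both function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def chi2_quantile_wh(p, df):
    """Wilson-Hilferty approximation to the chi-square p-quantile."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * sqrt(c)) ** 3


def k2_howe(n, coverage=0.90, conf=0.95):
    """Howe's approximate two-sided tolerance factor (assumed method)."""
    df = n - 1
    z = NormalDist().inv_cdf((1 + coverage) / 2)
    chi2_low = chi2_quantile_wh(1 - conf, df)  # lower-tail quantile
    return sqrt(df * (1 + 1 / n) * z * z / chi2_low)


def tolerance_limits(mean, sd, k2):
    """Return (LTL, UTL) for mean +/- k2 * sd."""
    return mean - k2 * sd, mean + k2 * sd


for label, wsum in [("Classical/PROP/Huber", 53), ("Tukey Biweight", 41)]:
    print(f"{label}: k2 = {k2_howe(wsum):.3f}")

# Tukey Biweight row: final mean 14.95, final sd 15.9, Wsum 41.
print(tolerance_limits(14.95, 15.9, k2_howe(41)))
```

The factor grows as Wsum shrinks, so a method that down-weights more observations pays for it with a larger k2, even though its robust standard deviation (and therefore the interval itself) may be much smaller.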







Output for Robust Tolerance Intervals (continued).

[Plot: "Robust Tolerance Intervals"; one interval per method (Classical, PROP, Huber, and
Tukey labels legible), y-axis running from about -50 to 140, x-axis label "Intervals for X".]

6.5.5 Intervals Comparison

1. Click Stats/GOF > Intervals > Robust > Intervals Comparison.

[Screenshot: Scout 2008 main window with the BRADU data set open and the Stats/GOF menu
expanded to Intervals > Robust > Interval Comparison.]

2. The "Select Variables" screen (Section 3.2) will appear.

• Select one or more variables from the "Select Variables" screen.

• To produce results by group, select a group variable by clicking the arrow below the
"Group by Variable" button. A drop-down list of available variables appears; select
the variable that represents the groups.

• Click "Options" for interval options. The options screens shown below are the default
options screen and the options screen for the PROP method.

[Screenshot: "OptionsIntervalsRobustGA" dialog with the default method, Classical, selected.
Select Method: Classical, PROP, Huber, Tukey Biweight, Lax Kafadar Biweight, MVT. Select
Intervals: Prediction Intervals, Tolerance Intervals, Simultaneous Intervals. Confidence
Level 0.95; Coverage 0.9; Title for Method Analysis.]

[Screenshot: the same dialog with PROP selected, which adds Initial Estimate (Mean/Stdv or
Median/MAD), MDs Distribution (Beta or Chisquared), # Iterations (maximum 10), and
Influence Alpha (0.05).]

   o Specify the preferred options.

   o Click "OK" to continue or "Cancel" to cancel the options.

• Click "OK" to continue or "Cancel" to cancel the computations.

Output for Intervals Comparison (Default Options — Classical on data set BRADU.xls).

[Plot: "Classical Intervals", plotted against Index of Observations. Legend values:]

  95% Prediction Limits                       Lower = -5.727178    Upper = 8.2845111
  95% Simultaneous Limits                     Lower = -10.18796    Upper = 12.745290
  95% Tolerance Limits with 90% Coverage      Lower = -5.418152    Upper = 7.9754854
  Classical Mean                              Mean = 1.2786687

Output for Intervals Comparison (Default Options - PROP on data set BRADU.xls).

[Plot: "Robust PROP Intervals using Median/MAD", plotted against Index of Observations.
Legend values:]

  95% Prediction Limits                       Lower = -1.185427    Upper = 1.0531190
  95% Simultaneous Limits                     Lower = -1.662105    Upper = 1.5297975
  95% Tolerance Limits with 90% Coverage      Lower = -1.146212    Upper = 1.0139043
  PROP Mean                                   Mean = -0.066154

6.5.6 Group Analysis

This option in Scout is used for comparing the intervals for each of the groups defined by
a group variable in the data.

1. Click Stats/GOF > Intervals > Robust > Group Analysis.

2. The "Select Variables" screen (Section 3.2) will appear.

• Select one or more variables from the "Select Variables" screen.

• Select the group variable by clicking the arrow below the "Group by Variable" button.
A drop-down list of available variables appears; select the variable that represents
the groups.

• Click "Options" for interval options. The options screen shown below is the options
screen for the PROP method.
[Screenshot: "Options Prediction Intervals Comparison by Group" dialog with PROP selected.
Select Method: Classical, PROP, Huber, Tukey Biweight, Lax Kafadar Biweight, MVT.
Confidence Level 0.95; Future K; Initial Estimate Median/1.48MAD; MDs Distribution Beta;
# Iterations 10 (maximum); Influence Alpha 0.025; Use Default Title.]



   o Specify the preferred input parameters for the PROP method.

   o Click "OK" to continue or "Cancel" to cancel the options.

• Click "OK" to continue or "Cancel" to cancel the computations.
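
Conceptually, the "Group by Variable" selection splits the selected variable by group label and then computes an interval for each group separately. The sketch below illustrates only the splitting and the per-group summary statistics (n, mean, sd) that feed a classical interval; it does not reproduce the PROP weighting, and the function name and data values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_group(values, groups):
    """Return {group: (n, mean, sd)} for values split by group label."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    return {g: (len(v), mean(v), stdev(v)) for g, v in buckets.items()}


# Hypothetical measurements with a hypothetical two-level group variable.
values = [4.0, 5.0, 6.0, 6.5, 7.0, 7.5]
groups = [1, 1, 1, 2, 2, 2]

for group, (n, m, s) in summarize_by_group(values, groups).items():
    print(f"Group {group}: n = {n}, mean = {m:.2f}, sd = {s:.2f}")
```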

Output for Group Analysis (PROP Options - FULLIRIS.xls).

[Plot: "Prediction Intervals by Group"; 95% Prediction Intervals for sp-length
(Future K = 1). Plot annotations:]

  Group 1: n = 50, sd = 0.35, Wsum = 49.89
  Group 2: n = 50, sd = 0.52, Wsum = 50.00

  PROP Method: Initial Estimate Median/1.48MAD; Beta MDs Distribution;
  Influence Alpha = 0.025; Number of Iterations = 10

References

Dixon, W.J., and Tukey, J.W. (1968). "Approximate Behavior of the Distribution of
Winsorized t (Trimming/Winsorization 2)," Technometrics, 10, 83-98.

Fisher, A. and Horn, P. (1994). "Robust Prediction Intervals in a Regression Setting."
Computational Statistics & Data Analysis, 17, pp. 129-140.

Giummolè, F. and Ventura, L. (2006). "Robust Prediction Limits Based on M-estimators,"
Statistics and Probability Letters, 76, 1725-1740.

Gross, A.M. (1976). "Confidence Interval Robustness with Long-Tailed Symmetric
Distributions," Journal of the American Statistical Association, 71, 409-417.

Horn, P.S., Britton, P.W, and Lewis, D.F. (1988). "On The Prediction of a Single Future
Observation from a Possibly Noisy Sample," The Statistician, 37, 165-172.

Huber, P.J. (1981). Robust Statistics, John Wiley and Sons, NY.

Kafadar, K. (1982). "A Biweight Approach to the One-Sample Problem," Journal of the
American Statistical Association, 77, 416-424.

Mardia, K.V. (1970). "Measures of Multivariate Skewness and Kurtosis with
Applications," Biometrika, 57, 519-530.

ProUCL 4.00.04. (2009). "ProUCL Version 4.00.04 User Guide." The software
ProUCL 4.00.04 can be downloaded from the web site at:
http://www.epa.gov/esd/tsc/software.htm.

ProUCL 4.00.04. (2009). "ProUCL Version 4.00.04 Technical Guide." The software
ProUCL 4.00.04 can be downloaded from the web site at:
http://www.epa.gov/esd/tsc/software.htm.

Royston, J. P. (1982). "The W test for Normality," Applied Statistics, 31, 2, 176-180.

Scout. 2002. A Data Analysis Program, Technology Support Project, USEPA, NERL-
LV, Las Vegas, Nevada.

Scout. 2008. Technical Guide under preparation.

Singh, A., and Nocerino, J.M. 1997. "Robust Intervals in Some Chemometric
Applications," Chemometrics and Intelligent Laboratory Systems, 37, pp. 55-69.

Singh, A. and Nocerino, J.M. 2002. "Robust Estimation of the Mean and Variance
Using Environmental Data Sets with Below Detection Limit Observations,"
Chemometrics and Intelligent Laboratory Systems Vol. 60, pp. 69-86.

Singh, A. 1993. Omnibus Robust Procedures for Assessment of Multivariate Normality
and Detection of Multivariate Outliers, In Multivariate Environmental Statistics, Patil,
G.P. and Rao, C.R., Editors, pp. 445-488, Elsevier Science Publishers.

Tukey, J.W. (1977). Exploratory Data Analysis, Addison-Wesley Publishing Company,
Reading, MA.

USEPA. 2006. Data Quality Assessment: Statistical Methods for Practitioners, EPA
QA/G-9S. EPA/240/B-06/003. Office of Environmental Information, Washington,
D.C. Download from: http://www.epa.gov/quality/qs-docs/g9s-final.pdf.
