United States Environmental Protection Agency
Office of Environmental Information
Washington, DC 20460

EPA/240/B-06/003
February 2006

Data Quality Assessment:
Statistical Methods for Practitioners

EPA QA/G-9S

                                    FOREWORD

       This document, Data Quality Assessment: Statistical Methods for Practitioners, provides
general guidance to organizations on assessing data quality criteria and performance
specifications.  The Environmental Protection Agency (EPA) has developed the Data Quality
Assessment (DQA) Process for project managers and planners to determine whether the type,
quantity, and quality of data needed to support Agency decision making have been achieved. This
guidance is the culmination of experiences in the design and statistical analyses of environmental
data in different Program Offices at the EPA. Many elements of prior guidance, statistics, and
scientific planning have been incorporated into this document.

       This document is one of a series of quality management guidance documents that the
EPA Quality Staff has prepared to assist users in implementing the Agency-wide Quality
System. Other related documents include:

       EPA QA/G-4        Guidance on Systematic Planning Using the Data Quality
                          Objectives Process

       EPA QA/G-5S      Guidance on Choosing a Sampling Design for Environmental Data
                          Collection

       EPA QA/G-9R      Data Quality Assessment: A Reviewer's Guide

       This document provides guidance to EPA program managers and planning teams as well
as to the general public as appropriate. It does not impose legally binding requirements and may
not apply to a particular situation based  on the  circumstances.  EPA retains the discretion to
adopt approaches on a case-by-case basis that differ from  this guidance where appropriate.

       This guidance is one of the U.S. Environmental Protection Agency Quality System Series
documents. These documents  describe the EPA policies and procedures for planning,
implementing, and assessing the effectiveness of the Quality System. These documents are
updated periodically to incorporate new topics and revisions or refinements to existing
procedures. Comments received on this, the 2006 version, will be considered for inclusion in
subsequent versions.  Please send your comments to:

             Quality Staff (2811R)
             U.S. Environmental Protection Agency
             1200 Pennsylvania Avenue, NW
             Washington, DC 20460
             Phone: (202) 564-6830
             Fax: (202) 565-2441
             E-mail: quality@epa.gov

Copies of the EPA Quality System documents  may be downloaded from the Quality Staff Home
Page: www.epa.gov/quality.

EPA QA/G-9S                               iii                               February 2006

                                       PREFACE

       Data Quality Assessment: Statistical Methods for Practitioners describes the statistical
methods used in Data Quality Assessment (DQA) in evaluating environmental data sets. DQA is
the scientific and statistical evaluation of environmental data to determine if they meet the
planning objectives of the project, and thus are of the right type, quality, and quantity to support
their intended use. This guidance applies DQA to environmental decision-making (e.g.,
compliance determinations) and also addresses DQA in environmental estimation (e.g.,
monitoring programs).

       This  document is distinctly different from other guidance documents in that it is not
intended to be read in a linear or continuous fashion.  Instead, it is intended to be used as a
"tool-box" of useful techniques in assessing the quality of data. The overall structure of the
document will enable the analyst to investigate many problems using a systematic methodology.
Each statistical technique examined in the text is demonstrated separately in the form of a series
of steps to be taken.  The technique is then illustrated with a practical example following the
steps described.

       This  document is intended for all EPA and extramural organizations that have quality
systems based on EPA policies and specifications, and that may periodically assess these quality
systems (or have them assessed by EPA) for compliance with the specifications. In addition, this
guidance may be used by other organizations that assess quality systems applied to specific
environmental programs.

       The guidance provided herein is non-mandatory and is intended to help personnel who
have minimal experience with statistical terminology to understand how a technique  works and
how it may be applied to a problem.  An explanation of DQA in plain English may be found in
the companion guidance document, Data Quality Assessment: A Reviewer's Guide (EPA
QA/G-9R)(U.S. EPA,  2006b).

                              TABLE OF CONTENTS

                                                                               Page
INTRODUCTION	1
STEP 1:  REVIEW DQOs AND THE SAMPLING DESIGN	5
         1.1  OVERVIEW AND ACTIVITIES	6
STEP 2:  CONDUCT A PRELIMINARY DATA REVIEW	9
         2.1  OVERVIEW AND ACTIVITIES	12
         2.2  STATISTICAL QUANTITIES	12
             2.2.1  Measures of Relative Standing	12
             2.2.2  Measures of Central Tendency	13
             2.2.3  Measures of Dispersion	15
             2.2.4  Measures of Association	16
                   2.2.4.1  Pearson's Correlation Coefficient	16
                   2.2.4.2  Spearman's Rank Correlation	18
                   2.2.4.3  Serial Correlation	18
         2.3  GRAPHICAL REPRESENTATIONS	20
             2.3.1  Histogram	20
              2.3.2  Stem-and-Leaf Plot	21
             2.3.3  Box-and-Whiskers Plot	23
             2.3.4  Quantile Plot and Ranked Data Plots	25
             2.3.5  Quantile-Quantile Plots and Probability Plots	26
             2.3.6  Plots for Two or More Variables	27
                   2.3.6.1  Scatterplot	29
                   2.3.6.2  Extensions of the Scatterplot	30
                   2.3.6.3  Empirical Quantile-Quantile Plot	31
             2.3.7  Plots for Temporal Data	32
                   2.3.7.1  Time Plot	33
                   2.3.7.2  Lag Plot	34
                   2.3.7.3  Plot of the Autocorrelation Function (Correlogram)	34
                   2.3.7.4  Multiple Observations in a Time Period	35
                   2.3.7.5  Four-Plot	36
             2.3.8  Plots for Spatial Data	37
                   2.3.8.1  Posting Plots	38
                   2.3.8.2  Symbol Plots and Bubble Plots	39
                   2.3.8.3  Other Spatial Graphical Representations	39
         2.4  PROBABILITY DISTRIBUTIONS	40
             2.4.1  The Normal Distribution	40
              2.4.2  The t-Distribution	41
             2.4.3  The Lognormal Distribution	41
             2.4.4  Central Limit Theorem	41
STEP 3:  SELECT THE STATISTICAL METHOD	43
         3.1  OVERVIEW AND ACTIVITIES	46
         3.2  METHODS FOR A SINGLE POPULATION	46
             3.2.1  Parametric Methods	48
                    3.2.1.1  The One-Sample t-Test and Confidence Interval or Limit	48
                   3.2.1.2  The One-Sample Tolerance Interval or Limit	48
                   3.2.1.3  Stratified Random Sampling	51
                   3.2.1.4  The Chen Test	52
                   3.2.1.5  Land's Method for Lognormally Distributed Data	57
                   3.2.1.6  The One-Sample Proportion Test and Confidence Interval	58
             3.2.2  Nonparametric Methods	60
                   3.2.2.1  The Sign Test	60
                   3.2.2.2  The Wilcoxon Signed Rank Test	61
         3.3  COMPARING TWO POPULATIONS	63
             3.3.1  Parametric Methods	65
                   3.3.1.1  Independent Samples	66
                   3.3.1.2  Paired Samples	71
             3.3.2  Nonparametric Methods	75
                   3.3.2.1  Independent Samples	75
                   3.3.2.2  Paired Samples	81
         3.4  COMPARING SEVERAL POPULATIONS SIMULTANEOUSLY	86
             3.4.1  Parametric Methods	88
                    3.4.1.1  Dunnett's Test	88
             3.4.2  Nonparametric Methods	89
                   3.4.2.1  The Fligner-Wolfe Test	89
STEP 4:  VERIFY THE ASSUMPTIONS OF THE STATISTICAL METHOD	93
         4.1  OVERVIEW AND ACTIVITIES	96
         4.2  TESTS FOR DISTRIBUTIONAL ASSUMPTIONS	96
             4.2.1  Graphical Methods	98
             4.2.2  Normal Probability Plot Tests	98
             4.2.3  Coefficient of Skewness/Coefficient of Kurtosis Tests	98
             4.2.4  Range Tests	99
                   4.2.4.1  The Studentized Range Test	99
                   4.2.4.2  Geary's Test	100
             4.2.5  Goodness-of-Fit Tests	100
                   4.2.5.1  Chi-Square Test	100
                   4.2.5.2  Tests Based on the Empirical Distribution Function	102
             4.2.6  Recommendations	102
         4.3  TESTS FOR TRENDS	102
             4.3.1  Introduction	102
             4.3.2  Regression-Based Methods for  Estimating and Testing for Trends	103
                   4.3.2.1  Estimating a Trend Using the Slope of the Regression Line	103
                   4.3.2.2  Testing for Trends Using Regression Methods	104
             4.3.3  General Trend Estimation Methods	104
                   4.3.3.1  Sen's Slope Estimator	104
                   4.3.3.2  Seasonal Kendall Slope Estimator	104
             4.3.4  Hypothesis Tests for Detecting  Trends	104
                   4.3.4.1  One Observation per Time Period for One Sampling Location 106
                   4.3.4.2 Multiple Observations per Time Period for One Sampling
                          Location	109
                   4.3.4.3 Multiple Sampling Locations with Multiple Observations	110
                   4.3.4.4 One Observation for One Station with Multiple Seasons	113
             4.3.5  A Discussion on Tests for Trends	113
             4.3.6  Testing for Trends in Sequences of Data	114
        4.4  OUTLIERS	115
             4.4.1  Background	115
             4.4.2  Selection of a Statistical Test for Outliers	116
             4.4.3  Extreme Value Test (Dixon's Test)	117
             4.4.4  Discordance Test	117
              4.4.5  Rosner's Test	119
             4.4.6  Walsh's Test	120
             4.4.7  Multivariate  Outliers	120
        4.5  TESTS FOR DISPERSIONS	122
             4.5.1  Confidence Intervals for a Single Variance	122
             4.5.2  The F-Test for the Equality of Two Variances	123
              4.5.3  Bartlett's Test for the Equality of Two or More Variances	123
             4.5.4  Levene's Test for the Equality of Two or More Variances	123
        4.6  TRANSFORMATIONS	126
             4.6.1  Types of Data Transformations	127
             4.6.2  Reasons for Transforming Data	128
        4.7  VALUES BELOW DETECTION LIMITS	130
             4.7.1  Approximately less than 15% Non-detects - Substitution Methods	131
                   4.7.1.1 Aitchison's Method	131
             4.7.2  Between Approximately 15% - 50% Non-detects	132
                   4.7.2.1 Cohen's Method	132
                   4.7.2.2  Selecting Between Aitchison's Method and Cohen's Method ..132
             4.7.3  Greater than Approximately 50% Non-detects - Test of Proportions	134
             4.7.4  Greater than Approximately 90% Non-detects	135
             4.7.5  Recommendations	135
        4.8  INDEPENDENCE	136
STEP 5: DRAW CONCLUSIONS FROM THE DATA	139
        5.1  OVERVIEW AND ACTIVITIES	141
        5.2  PERFORM THE STATISTICAL METHOD	141
        5.3  DRAW STUDY CONCLUSIONS	141
             5.3.1  Hypothesis Tests	141
             5.3.2  Confidence Intervals or Limits	142
             5.3.3  Tolerance Intervals or Limits	143
        5.4  EVALUATE PERFORMANCE OF THE SAMPLING DESIGN	143
        5.5  INTERPRET AND COMMUNICATE THE RESULTS	144
APPENDIX A:  STATISTICAL TABLES	145
APPENDIX B:  REFERENCES	183
Where applicable, a list of tables, figures, and boxes appears at the beginning of each chapter.
                                    INTRODUCTION

       Data Quality Assessment (DQA) is the scientific and statistical evaluation of
environmental data to determine if they meet the planning objectives of the project, and thus are
of the right type, quality, and quantity to support their intended use. DQA is built on a
fundamental premise: data quality is meaningful only when it relates to the intended use of the
data.  This guidance describes the technical aspects of DQA in evaluating environmental data
sets. A conceptual presentation of the DQA process is contained in Data Quality Assessment:
A Reviewer's Guide (EPA QA/G-9R) (U.S. EPA 2004).

       By using DQA, a reviewer can answer four important questions:

          1.  Can a decision (or estimate) be made with the desired level  of certainty, given the
              quality of the data?

          2.  How well did the sampling design perform?

          3.  If the same sampling design strategy is used again for a similar study, would the
              data be expected to support the same intended use with the desired level of
              certainty?

          4.  Is it likely that sufficient samples were taken to enable the reviewer to see an
              effect if it was really present?

       The first question addresses the reviewer's immediate needs.  For example, if the data are
being used for decision-making and provide evidence strongly in favor of one course of action
over another, then the decision maker can proceed knowing that the decision will be supported
by unambiguous data. However, if the data do not show sufficiently strong evidence to favor
one alternative, then the data analysis alerts the decision maker to this uncertainty. The decision
maker now is in a position to make an informed choice about how to proceed (such as collect
more or different data before making the decision, or proceed with the decision despite the
relatively high, but tolerable, chance of drawing an erroneous conclusion).

       The second question addresses how robust this sampling design is with respect to slightly
changing conditions. If the design is very sensitive to potentially disturbing influences, then
interpretation of the results may be difficult.  By addressing the second question, the reviewer
guards against the possibility of a spurious result arising from a unique set of circumstances.

       The third question addresses the problem of whether this could be considered a unique
situation where the results of this DQA apply only to this situation and cannot be
extrapolated to other situations. It also addresses the suitability of using this data collection
design again for future projects. For example, if it is intended to use a certain sampling design at
a different location from where the design was first used, it should be determined how well the
design can be expected to perform given that the outcomes and environmental conditions of this
sampling event will be different from those of the original event.  As environmental conditions

will vary from one location or one time to another, the adequacy of the sampling design should
be evaluated over a broad range of possible outcomes and conditions.

       The final question addresses the issue of whether sufficient resources were used in the
study.  For example, in an epidemiological investigation, was it likely that the effect of interest
could be reliably observed, given the limited number of samples actually obtained?

       The data life cycle comprises three steps: planning, implementation, and assessment.
During the planning phase, the Data Quality Objectives (DQO) Process (or any other systematic
planning procedure) is used to define criteria for determining the number, location, and timing of
samples to be collected in order to produce a result with the desired level of certainty.  This,
along with the sampling methods, analytical procedures, and appropriate quality assurance (QA)
and quality control (QC) procedures, is documented in the QA Project Plan. The implementation
phase consists of collecting data following the QA Project Plan specifications. At the outset of
the assessment phase, the data are validated and verified to ensure that the sampling and analysis
protocols specified in the QA Project Plan were followed, and that the measurement systems
performed in accordance with the criteria specified in the QA Project Plan. Then, DQA
completes the data life cycle by determining whether the performance and acceptance criteria
from the DQO planning objectives were achieved and by drawing conclusions from the data.

       DQA involves five steps that begin with a review of the planning documentation and end
with an answer to the question posed during the planning phase of the study. These steps
roughly parallel the actions of an environmental statistician when analyzing a set of data. The
five steps, which are described in more detail in the following chapters of this guidance, are:

          1.  Review the project objectives and sampling design.
          2.  Conduct a preliminary data review.
          3.  Select the statistical method.
          4.  Verify the assumptions of the statistical method.
          5.  Draw  conclusions from the data.

       These five steps are presented in a linear sequence, but DQA is actually an iterative
process.  For example, if the preliminary data review reveals patterns or anomalies in the data set
that are inconsistent with the project objectives, then some aspects of the study planning may
have to be reconsidered in  Step  1.  Likewise, if the underlying assumptions of the statistical
method are not supported by the data, then previous steps of the DQA may have to be revisited.

       This guidance is written for a broad audience of potential data users, data analysts, and
data generators.  Data users (such as project managers or risk assessors who are responsible for
making decisions or producing estimates regarding environmental characteristics) should find
this guidance useful for understanding and directing the technical work of others who produce
and analyze data. Data analysts should find  this guidance to be a convenient compendium of
basic assessment tools.  Data generators (such as analytical chemists or field sampling specialists
responsible for collecting and analyzing environmental samples and reporting the resulting data
values) should find this guidance useful for understanding how their work will be used  and for

providing a foundation for improving the efficiency and effectiveness of the data generation
process.

       This guidance presents background information and statistical tools for performing DQA;
a non-statistical discussion of DQA can be found in the companion document Data Quality
Assessment: A Reviewer's Guide (EPA QA/G-9R) (U.S. EPA 2004). Each chapter corresponds
to a step in the DQA and begins with an overview of the activities to be performed for that step.
Following the overviews in Chapters 1, 2, 3, and 4, specific graphical or statistical tools are
described and step-by-step procedures are provided along with examples.  Chapter 5 gives some
advice on the interpretation of statistical tests. Appendix A contains statistical tables, and
Appendix B provides references and useful publications for in-depth statistical analyses.
                                            CHAPTER 1
                  STEP 1:  REVIEW DQOs AND THE SAMPLING DESIGN
THE DATA QUALITY ASSESSMENT PROCESS
          Review DQOs and Sampling Design
           Conduct Preliminary Data Review
              Select the Statistical Method
              Verify the Assumptions
           Draw Conclusions from the Data
 REVIEW DQOs AND SAMPLING DESIGN

Purpose

Review the DQO outputs, the sampling design, and
any data collection documentation for consistency.  If
DQOs have not been developed, define the statistical
method and specify tolerable limits on decision errors.


Activities

• Review Study Objectives
• Translate Objectives into Statistical Hypotheses
• Develop Limits on Decision Errors
• Review Sampling Design


Tools

• Statements of hypotheses
• Sampling design concepts
                              Step 1: Review DQOs and Sampling Design

             Review the objectives of the study.
             •    If DQOs were developed, then review the outputs from the DQO Process.
             •    If DQOs have not been developed, then ascertain what these objectives were.
            Translate the data user's objectives into a statement of the primary statistical hypothesis.
             •    If DQOs were developed, translate them into a statement of the primary hypothesis.
             •    If DQOs have not been developed, then ascertain what hypotheses or estimates were
                 developed.
            Translate the data user's objectives into limits on Type I or Type II decision errors.
             •    If DQOs have not been developed, document the data user's probable tolerable limits on
                 decision errors, width of gray region, and estimated preliminary values.
             •    If DQOs were developed, confirm the limits on decision errors.

            Review the sampling design and note any special features or potential problems.
             •    Review the sampling design for any potentially serious deviations.
                                          CHAPTER 1

                 STEP 1: REVIEW DQOs AND THE SAMPLING DESIGN

1.1     OVERVIEW AND ACTIVITIES

        DQA begins by reviewing the key outputs from the planning phase of the data life cycle
such as the Data Quality  Objectives, the QA Project Plan, and any associated documents.  The
study objectives provide  the context for understanding the purpose of the data collection effort
and establish the qualitative and quantitative basis for assessing the quality of the data set for the
intended use.  The sampling design (documented in the QA Project Plan) provides important
information about how to interpret the data. By studying the sampling design, the analyst can
gain an understanding of the assumptions under which the design was developed, as well as the
relationship between these assumptions and the objectives.
        In the unfortunate instances when project
objectives have not been developed during the
planning phase of the study, it is necessary to
develop statements of the data user's objectives prior
to conducting the DQA. The primary purpose of
stating the data user's objectives prior to analyzing
the data is to establish appropriate criteria for
evaluating the quality of the data with respect to their
intended use. Analysts who are not familiar with the
DQO Process should refer to the Guidance for the
Data Quality Objectives Process (EPA QA/G-4)
(U.S.  EPA 2000), a book on statistical planning and
analysis, or a consulting statistician.  The seven steps
of the DQO Process are illustrated in Figure 1-1.

Figure 1-1. The DQO Process

       Step 1. State the Problem: define the problem that motivates the study; identify the
               planning team; examine budget and schedule.
       Step 2. Identify the Goal of the Study: state how environmental data will be used in
               solving the problem; identify study questions; define alternative outcomes.
       Step 3. Identify Information Inputs: identify data and information needed to answer
               study questions.
       Step 4. Define the Boundaries of the Study: specify the target population and
               characteristics of interest; define spatial and temporal limits and the scale of
               inference.
       Step 5. Develop the Analytic Approach: define the parameter of interest; specify the type
               of inference (statistical hypothesis testing, or estimation and other analytical
               approaches); develop logic for drawing conclusions from the findings.
       Step 6. Specify Performance or Acceptance Criteria: develop performance criteria for
               new data being collected, acceptance criteria for data already collected.
       Step 7. Develop the Detailed Plan for Obtaining Data: select the most resource-effective
               sampling and analysis plan that satisfies the performance or acceptance criteria.

       If the project has been framed as a hypothesis test, then the uncertainty limits can be
expressed as the data user's tolerance for committing false rejection (Type I, or false positive) or
false acceptance (Type II, or false negative) decision errors. A false rejection error occurs when
the null hypothesis is rejected when it is in fact true. A false acceptance error occurs when the
null hypothesis is not rejected when it is in fact false. Other related phrases in common use
include "level of significance," which is equal to the probability of a Type I error, and "power,"
which is equal to 1 minus the probability of a Type II error. Statistical power is really a function
that describes a "power curve" over a range of Type II errors. The characteristics of the power
curve can be of great importance in choosing the appropriate statistical test. For detailed
information on how to develop false rejection and false acceptance
decision error rates, see Chapter 6 of Guidance on Data Quality Objectives EPA QA/G-4 (U.S.
EPA 2000).
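
The idea of a power curve can be sketched numerically. The following Python fragment is a minimal illustration, not part of the EPA guidance; the sample size, standard deviation, and effect values are hypothetical, and a one-sided, one-sample z-test is used for simplicity:

```python
from statistics import NormalDist

def power_one_sided_z(effect, sigma, n, alpha=0.05):
    """Approximate power of a one-sided, one-sample z-test: the
    probability of rejecting H0 when the true mean exceeds the null
    value by `effect`.  1 - power is the Type II (false acceptance)
    error rate; `alpha` is the Type I (false rejection) rate."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)        # critical value at level alpha
    shift = effect * n ** 0.5 / sigma    # shift of the test statistic
    return z.cdf(shift - z_crit)

# A "power curve": power evaluated over a range of true effects for a
# hypothetical design with n = 20 samples and sigma = 4.
curve = [power_one_sided_z(d, sigma=4, n=20) for d in (0, 1, 2, 3)]
# At effect = 0 the power equals alpha; it rises toward 1 as the true
# effect moves away from the null.
```

Plotting such a curve for several candidate sample sizes shows how quickly the false acceptance rate falls as the true effect grows, which is what makes the power curve useful when comparing statistical tests.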

       If the project has been framed in terms of confidence intervals, then uncertainty is
expressed as a combination of two interrelated terms: the width of the confidence interval (a
smaller interval corresponds to a smaller degree of uncertainty) and the confidence level that the
true value of the parameter of interest lies within the interval (a higher confidence level
represents a smaller degree of uncertainty).
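
The trade-off between interval width and confidence level is easy to demonstrate. In the Python sketch below (the data values are hypothetical, and a normal approximation is used where a small-sample analysis would normally use a t-interval), raising the confidence level widens the interval:

```python
from statistics import NormalDist, mean, stdev

def mean_ci(data, confidence=0.95):
    """Normal-approximation confidence interval for the population mean.
    For small samples a t-based interval would be used instead."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    half = z * stdev(data) / len(data) ** 0.5       # half-width of the interval
    return mean(data) - half, mean(data) + half

data = [9.2, 10.1, 8.7, 11.0, 9.8, 10.4, 9.5, 10.9]  # hypothetical measurements
lo90, hi90 = mean_ci(data, 0.90)
lo99, hi99 = mean_ci(data, 0.99)
# The 99% interval is wider than the 90% interval: a higher confidence
# level buys more assurance of covering the true mean at the cost of a
# less precise statement about where that mean lies.
```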

       For the review of sampling design, recall that the key distinction in sampling design is
between authoritative or judgmental sampling (in which sample numbers and locations are
selected based on expert opinion) and probability sampling (in which sample numbers and
locations are selected based on randomization and each member of the target population has a
known probability of being included in the sample). Judgmental sampling should be considered
only when the objectives of the investigation are not of a statistical nature (for example, when
the objective of a study is to identify specific locations of leaks, or when the study is focused
solely on the sampling locations themselves).  Generally, conclusions drawn from authoritative
samples apply only to the individual samples; aggregation may result in severe bias due to lack
of representativeness and lead to highly erroneous conclusions. Judgmental sampling also
precludes the use of the sample for any purpose other than the original one. If judgmental
sample  data are used, great care should be taken in interpreting any statistical statements
concerning the conclusions to be drawn. Attaching a probabilistic statement to a
judgmental sample is incorrect and should be avoided, as it gives an illusion of correctness when
there is none. The further the judgmental sample is from a truly random sample, the more
questionable the conclusions.  Guidance on Choosing a Sampling Design for Environmental
Data Collection (EPA QA/G-5S) (U.S.  EPA  2002) provides extensive information on sampling
design issues and their implications for data interpretation.
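
The defining property of probability sampling, that every member of the target population has a known inclusion probability, can be illustrated with a simple random sample. This Python sketch is illustrative only; the population of grid cells and the sample size are hypothetical:

```python
import random

# Hypothetical target population: 200 grid cells over a study area.
population = [f"grid_cell_{i}" for i in range(200)]
n = 20

rng = random.Random(42)              # fixed seed so the design is reproducible
sample = rng.sample(population, n)   # simple random sample without replacement

# Under simple random sampling every unit has the same, known inclusion
# probability, n/N, which is what supports design-based statistical
# inference; a judgmental sample has no such known probability.
inclusion_prob = n / len(population)
```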

       The analyst should review the sampling design documentation with the data user's
objectives in mind.  Look for design features that support or contradict those objectives.  For
example, if the data user is interested in making a decision about the mean level of
contamination in an effluent stream over time, then composite samples may be an appropriate
sampling approach.  On the other hand, if the  data user is looking for hot spots of contamination
at a hazardous waste site, compositing should be used with caution, to avoid "averaging away"
hot spots.  Also, look for potential problems in the implementation of the sampling design.  For
example, if simple random sampling has been used, verify that each point in space (or time) had
an equal probability of being selected. Small deviations
from a sampling plan may have minimal effect on the conclusions drawn from the data set.
Significant or substantial deviations should be flagged and their potential effect carefully
considered throughout the entire DQA.  The most important point is to verify that the collected
data are consistent with what the QA Project Plan, the Sampling and Analysis Plan, and the
overall objectives of the study specified.
                                            CHAPTER 2
                   STEP 2: CONDUCT A PRELIMINARY DATA REVIEW
THE DATA QUALITY ASSESSMENT PROCESS
          Review DQOs and Sampling Design
           Conduct Preliminary Data Review
              Select the Statistical Method
               Verify the Assumptions
            Draw Conclusions from the Data
 CONDUCT PRELIMINARY DATA REVIEW

Purpose

Generate statistical quantities and graphical
representations that describe the data.  Use this
information to learn about the structure of the data
and identify any patterns or relationships.


Activities

• Review Quality Assurance  Reports
• Calculate Basic Statistical Quantities
• Graph the Data


Tools

• Statistical quantities
• Graphical representations
                               Step 2: Conduct a Preliminary Data Review

         •    Review quality assurance reports.
             •   Look for problems or anomalies in the implementation of the sample collection and analysis
                 procedures.
             •   Examine QC data for information to verify assumptions underlying the Data Quality
                 Objectives, the Sampling and Analysis Plan, and the QA Project Plans.

         •    Calculate the statistical quantities.
             •   Consider calculating appropriate percentiles (Section 2.2.1).
             •   Select measures of central tendency (Section 2.2.2) and dispersion (Section 2.2.3).
             •   If the data involve two variables, calculate the correlation coefficient (Section 2.2.4).

         •    Display the data using graphical representations.
             •   Select graphical representations (Section 2.3) that illuminate the structure of the data set and
                 highlight assumptions underlying the Data Quality Objectives, the Sampling and Analysis
                 Plan, and the QA Project Plans.
             •   Use a variety of graphical representations that examine different features of the set.
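
As a sketch of the Step 2 quantities (the concentration and depth values below are hypothetical, chosen only to illustrate the calculations), the basic statistical quantities and Pearson's correlation coefficient can be computed with Python's standard library:

```python
from statistics import mean, median, stdev, quantiles

conc = [12.1, 8.7, 15.3, 9.9, 11.4, 22.8, 10.2, 13.5, 9.1, 14.0]  # hypothetical
depth = [0.5, 0.3, 0.9, 0.4, 0.6, 1.4, 0.5, 0.8, 0.3, 0.7]        # hypothetical

# Measures of central tendency (Section 2.2.2) and dispersion (Section 2.2.3)
summary = {
    "mean": mean(conc),
    "median": median(conc),
    "std dev": stdev(conc),
    "quartiles": quantiles(conc, n=4),  # 25th, 50th, and 75th percentiles
}

# Pearson's correlation coefficient (Section 2.2.4.1) for two variables
def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5
```

Note that the mean and standard deviation are sensitive to the single high value (22.8); exposing features like this is exactly what the graphical representations of Section 2.3 are for.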
                                     List of Boxes

                                                                                 Page
Box 2-1: Directions for Calculating Percentiles with an Example	13
Box 2-2: Directions for Calculating the Measures of Central Tendency	14
Box 2-3: Example Calculations of the Measures of Central Tendency	14
Box 2-4: Directions for Calculating the Measures of Dispersion	15
Box 2-5: Example Calculations of the Measures of Dispersion	15
Box 2-6: Directions for Calculating Pearson's Correlation Coefficient with an Example	17
Box 2-7: Directions for Calculating Spearman's Correlation Coefficient with an Example	19
Box 2-8: Directions for Estimating the Serial Correlation Coefficient with an Example	19
Box 2-9: Directions for Generating a Histogram	21
Box 2-10: Example of Generating a Histogram	21
Box 2-11: Directions for Generating a Stem and Leaf Plot	22
Box 2-12: Example of Generating a Stem and Leaf Plot	23
Box 2-13: Directions for Generating a Box and Whiskers Plot	24
Box 2-14: Example of a Box and Whiskers Plot	24
Box 2-15: Directions for Generating a Quantile Plot and a Ranked Data Plot	26
Box 2-16: Example of Generating a Quantile Plot	26
Box 2-17: Directions for Constructing a Normal Probability Plot	28
Box 2-18: Example of Normal Probability Plot	28
Box 2-19: Directions for Generating a Scatterplot and an Example	29
Box 2-20: Directions for Constructing an Empirical Q-Q Plot with an Example	32
Box 2-21: Directions for Generating a Time Plot and an Example	34
Box 2-22: Directions for Constructing a Correlogram	35
Box 2-23: Example Calculations for Generating a Correlogram	36
Box 2-24: Directions for Generating a Posting Plot and a Symbol Plot with an Example	38
                                    List of Figures

                                                                                Page
Figure 2-1.  A Histogram of Concentration Frequencies	20
Figure 2-2.  A Histogram of Concentration Densities	20
Figure 2-3.  Example of a Box-Plot	23
Figure 2-4.  Example of a Quantile Plot of Right-Skewed Data	25
Figure 2-5.  Example of a Scatterplot	29
Figure 2-6.  Example of a Coded Scatterplot	30
Figure 2-7.  Example of a Parallel-Coordinate Plot	30
Figure 2-8.  Example of a Scatterplot Matrix	31
Figure 2-9.  Example of an Empirical Q-Q Plot	31
Figure 2-10. Example of a Time Plot	33
Figure 2-11. Example of a Lag Plot	34
Figure 2-12. Example of a Correlogram	35
Figure 2-13. Example of a Four-Plot	37
Figure 2-14. Example of a Posting Plot	38
Figure 2-15. Example of a Symbol Plot	39
Figure 2-16. Example of a Bubble Plot	39
Figure 2-17. Two Normal Curves, Common Mean, Different Variances	40
Figure 2-18. The Standard Normal Curve, Centered on Zero	40
Figure 2-19. Three Different Lognormal Distributions	41
                                      CHAPTER 2

                STEP 2: CONDUCT A PRELIMINARY DATA REVIEW

2.1    OVERVIEW AND ACTIVITIES

       In this step of DQA, the analyst conducts a preliminary evaluation of the data set,
calculates some basic statistical quantities, and examines the data using graphical
representations. The first activity in conducting a preliminary data review is to review any
relevant QA reports, paying particular attention to information that can be used to check
assumptions made in the DQO Process. Of great importance are apparent anomalies in recorded
data, missing values, deviations from standard operating procedures, and the use of nonstandard
data collection methodologies.

       Graphs can be used to identify patterns and trends, to quickly confirm or disprove
hypotheses, to discover new phenomena, to identify potential problems, and to suggest corrective
measures.  Since no single graphical representation will provide a complete picture of the data
set, the analyst should choose different graphical techniques to illuminate different features of
the data. Section 2.3 provides descriptions and examples of common graphical representations.

       For a more extensive discussion of the overview and activities of this step, see Data
Quality Assessment:  A Reviewer's Guide (EPA QA/G-9R) (U.S.  EPA 2004).

2.2    STATISTICAL QUANTITIES

2.2.1   Measures of Relative Standing

       Sometimes the analyst is interested in knowing the relative position of one or several
observations in relation to all of the observations. Percentiles or quantiles are measures of
relative standing that are useful for summarizing data. A percentile is the data value that is
greater than or equal to a given percentage of the data values. Stated in mathematical terms, the
pth percentile is a data value that is greater than or equal to p% of the data values and is less than
or equal to (100-p)% of the data values. Therefore, if x is the pth percentile, then p% of the values
in the data set are less than or equal to x, and (100-p)% of the values are greater than x. A
sample percentile may fall between a pair of observations. For example, the 75th percentile of a
data set of 10 observations is not uniquely defined. Therefore, there are several methods for
computing sample percentiles, the most common of which is described in Box 2-1.

       Important percentiles usually reviewed are the quartiles of the data, the 25th, 50th, and 75th
percentiles. The 50th percentile is also called the sample median (Section 2.2.2), and the 25th and
75th percentiles are used to estimate the dispersion of a data set (Section 2.2.3). Also important
for environmental data are the 90th, 95th, and 99th percentiles where a decision  maker would like
to be sure that 90%, 95%, or 99% of the contamination levels are below a fixed risk level.
                   Box 2-1: Directions for Calculating Percentiles with an Example

    Let X1, X2, ..., Xn represent the n data points. To compute the pth percentile, y(p), first rank the data from
    smallest to largest and label these points X(1), X(2), ..., X(n). The pth percentile is:

                  y(p) = (1 - f)·X(i) + f·X(i+1)

    where r = (n-1)p + 1, i = floor(r), f = r - i, and X(n+1) = X(n). Note that floor(r) means calculate r and
    then discard all decimals.

    Example: The 90th and 95th percentiles will be computed for the following 10 data points (ordered from
    smallest to largest): 4, 4, 4, 5, 5, 6, 7, 7, 8, and 10 ppb.

    For the 95th percentile, r = (10-1)×0.95 + 1 = 9.55, i = floor(9.55) = 9, and f = 9.55 - 9 = 0.55.
    Therefore, the 95th percentile is y(0.95) = (1 - 0.55)×X(9) + 0.55×X(10) = 0.45×8 + 0.55×10 = 9.1 ppb.

    For the 90th percentile, r = (10-1)×0.90 + 1 = 9.1, i = floor(9.1) = 9, and f = 9.1 - 9 = 0.1. Therefore,
    the 90th percentile is y(0.90) = 0.9×8 + 0.1×10 = 8.2 ppb.
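For readers who prefer to check the arithmetic by machine, the Box 2-1 rule can be sketched in Python (an illustration only, not part of the guidance; the function name is arbitrary):

```python
import math

def percentile(data, p):
    """Compute the pth sample percentile (0 < p < 1) by the Box 2-1 rule."""
    x = sorted(data)                   # X(1) <= X(2) <= ... <= X(n)
    n = len(x)
    r = (n - 1) * p + 1                # fractional rank
    i = math.floor(r)                  # whole part of r
    f = r - i                          # fractional part of r
    if i >= n:                         # X(n+1) is defined to be X(n)
        return x[n - 1]
    return (1 - f) * x[i - 1] + f * x[i]   # Python lists are 0-based

data = [4, 4, 4, 5, 5, 6, 7, 7, 8, 10]     # ppb, the Box 2-1 example
print(percentile(data, 0.95))   # approximately 9.1
print(percentile(data, 0.90))   # approximately 8.2
```

This reproduces the two hand calculations in the example above.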
2.2.2  Measures of Central Tendency

       Measures of central tendency characterize the center of a data set. The three most
common estimates are the mean, median, and the mode. Directions for calculating these
quantities are contained in Box 2-2; examples are provided in Box 2-3.

       The most commonly used measure of the center of a data set is the sample mean, denoted
by X̄. The sample mean can be thought of as the "center of gravity" of the data set. The sample
mean is an arithmetic average for simple sampling designs; however, for complex sampling
designs, such as stratification, the sample mean is a weighted arithmetic average.  The sample
mean is influenced by extreme values (large or small) and the treatment of non-detects (see
Section 4.7).

       The sample median is the second most popular measure of the center of the data. This
value falls directly in the middle of the ordered data set: half of the data are smaller than the
sample median and half of the data are larger than the sample median. The
median is another name for the 50th percentile (Section 2.2.1).  The median is not influenced by
extreme values and can easily be used if non-detects are present.

       Another method of measuring the center of the data is the sample mode.  The sample
mode is the value that occurs with the greatest frequency.  Since the sample mode may not exist
or be unique, it is the least commonly used measure of center.  However, the mode is useful for
qualitative data.
                  Box 2-2: Directions for Calculating the Measures of Central Tendency

     Let X1, X2, ..., Xn represent the n data points.

     Sample Mean: The sample mean, X̄, is the sum of the data points divided by the sample size, n:

                  X̄ = (X1 + X2 + ... + Xn)/n

     Sample Median: The sample median, X̃, is the center of the ordered data set. To compute the sample
     median, sort the data from smallest to largest and label these points X(1), X(2), ..., X(n). Then,

                  X̃ = [X(n/2) + X(n/2+1)]/2  if n is even,  and  X̃ = X((n+1)/2)  if n is odd.

     Sample Mode: The sample mode is the value in the sample that occurs with the greatest frequency.
     Count the number of times each value occurs; the sample mode is the value that occurs most frequently.
     The sample mode may not exist or be unique.
                   Box 2-3: Example Calculations of the Measures of Central Tendency

    The following is an example of computing the sample mean, median, and mode for the 10 data points:
    4, 4, 7, 7, 4, 10, 4, 3, 7, and 8.

    Sample mean:

                  X̄ = (4 + 4 + 7 + 7 + 4 + 10 + 4 + 3 + 7 + 8)/10 = 58/10 = 5.8

    Sample median: The ordered data are: 3, 4, 4, 4, 4, 7, 7, 7, 8, and 10. Since n = 10 is even, the sample
    median is:

                  X̃ = [X(10/2) + X(10/2+1)]/2 = [X(5) + X(6)]/2 = (4 + 7)/2 = 5.5

    Sample mode: Counting the number of times each value occurs yields:

        3 appears 1 time; 4 appears 4 times; 7 appears 3 times; 8 appears 1 time; and 10 appears 1 time.

    As the value 4 occurs most often, it is the sample mode of this data set.
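As a cross-check of Box 2-3, Python's standard statistics module reproduces the three measures (shown for illustration only, not part of the guidance):

```python
import statistics

data = [4, 4, 7, 7, 4, 10, 4, 3, 7, 8]

mean = statistics.mean(data)      # (4 + 4 + ... + 8) / 10 = 5.8
median = statistics.median(data)  # average of X(5) and X(6) since n = 10 is even
mode = statistics.mode(data)      # the most frequently occurring value

print(mean, median, mode)   # 5.8 5.5 4
```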
2.2.3   Measures of Dispersion

        Measures of central tendency are more meaningful if accompanied by a measure of the
spread of values about the center.  Measures of dispersion in a data set include the range,
variance, sample standard deviation, coefficient of variation, and the interquartile range.
Directions for computing these measures are given in Box 2-4; examples are given in Box 2-5.
                     Box 2-4: Directions for Calculating the Measures of Dispersion

    Let X1, X2, ..., Xn represent the n data points.

    Sample Range: The sample range, R, is the difference between the largest and smallest values of the
    data set, i.e., R = max(Xi) - min(Xi).

    Sample Variance: To compute the sample variance, s², compute:

                  s² = [Σ Xi² - (Σ Xi)²/n] / (n - 1)

    Sample Standard Deviation: The sample standard deviation, s, is the square root of the sample variance,
    i.e., s = √s².

    Coefficient of Variation: The coefficient of variation (CV) is the sample standard deviation divided by the
    sample mean (Section 2.2.2), i.e., CV = s/X̄. The CV is often expressed as a percentage.

    Interquartile Range: The interquartile range (IQR) is the difference between the 75th and the 25th
    percentiles, i.e., IQR = y(0.75) - y(0.25).
                     Box 2-5: Example Calculations of the Measures of Dispersion

    The directions in Box 2-4 and the following 10 data points (in ppm): 4, 5, 6, 7, 4, 10, 4, 5, 7, and 8, are
    used to calculate the measures of dispersion. Using the directions in Box 2-2, the sample mean is
    X̄ = 6 ppm.

    Sample Range: R = max(Xi) - min(Xi) = 10 - 4 = 6 ppm

    Sample Variance:

                  s² = [(4² + 5² + ... + 8²) - (4 + 5 + ... + 8)²/10] / (10 - 1) = (396 - 60²/10)/9 = 36/9 = 4 ppm²

    Sample Standard Deviation: s = √s² = √4 = 2 ppm

    Coefficient of Variation: CV = s/X̄ = 2 ppm / 6 ppm = 0.33 = 33%

    Interquartile Range: Using the directions in Box 2-1 to compute the 25th and 75th percentiles of the data
    gives y(0.25) = 4.25 ppm and y(0.75) = 7 ppm. The interquartile range (IQR) is the difference between
    these values: IQR = y(0.75) - y(0.25) = 7 - 4.25 = 2.75 ppm
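The Box 2-5 calculations can be verified with a short script (an illustrative sketch, not part of the guidance); the percentile helper implements the Box 2-1 rule, and Python's standard statistics module supplies the sample variance (divisor n - 1):

```python
import math
import statistics

def percentile(data, p):
    """pth sample percentile (0 < p < 1) by the Box 2-1 rule."""
    x = sorted(data)
    n = len(x)
    r = (n - 1) * p + 1
    i = math.floor(r)
    f = r - i
    if i >= n:
        return x[n - 1]
    return (1 - f) * x[i - 1] + f * x[i]

data = [4, 5, 6, 7, 4, 10, 4, 5, 7, 8]    # ppm, the Box 2-5 example

R = max(data) - min(data)                 # sample range
s2 = statistics.variance(data)            # sample variance (divisor n - 1)
s = math.sqrt(s2)                         # sample standard deviation
cv = s / statistics.mean(data)            # coefficient of variation
iqr = percentile(data, 0.75) - percentile(data, 0.25)

# R = 6 ppm, s2 = 4 ppm^2, s = 2 ppm, CV = 0.33, IQR = 2.75 ppm
print(R, s2, s, cv, iqr)
```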
       The easiest measure of dispersion to compute is the sample range. For small samples, the
range is easy to interpret and may adequately represent the dispersion of the data. For large
samples, the range is not very informative because it only considers extreme values and is
therefore greatly influenced by outliers.

       Generally speaking, the sample variance measures the average squared distance of data
points from the sample mean. A large sample variance implies the data are not clustered close to
the mean. A small sample variance (relative to the mean) implies most of the data are near the
mean. The sample variance is affected by extreme values and by a large number of non-detects.
The sample standard deviation is the square root of the sample variance and has the same unit of
measure as the data.

       The coefficient of variation (CV) is a measure having no units that allows the comparison
of dispersion across several sets of data.  The CV (also known as the relative standard deviation)
is often used in environmental applications because variability (when expressed as a standard
deviation) is often proportional to the mean.

       When extreme values are present, the interquartile range may be more representative of
the dispersion of the data than the standard deviation. This statistical quantity is the difference of
the 75th and 25th percentiles and, therefore, is not influenced by extreme values.

2.2.4   Measures of Association

       Data sets often include measurements of several characteristics (variables) for each
sampling point. There may be interest in understanding the relationship or level of association
between two or more of these variables.  One of the most common measures of association is the
correlation coefficient. The correlation coefficient measures the relationship between two
variables, such as a linear relationship between two sets of measurements. Note that the
correlation coefficient does not imply cause and effect.  The analyst may  say the correlation
between two variables is high and the relationship is strong, but may not say an increase or
decrease in one variable causes the other variable to increase or decrease without further
evidence and strong statistical controls.

2.2.4.1   Pearson's Correlation Coefficient

       The Pearson correlation coefficient measures the strength of the linear relationship
between two variables. A linear association implies that as one variable increases, the other
increases or decreases linearly.  Values of the correlation coefficient close to +1 (positive
correlation) imply that as one variable increases, the other increases nearly linearly. On the other
hand, a correlation coefficient close to -1 implies that as one variable increases, the other
decreases nearly linearly. Values close to 0 imply little linear correlation between the variables.
When data are truly independent, the correlation between data points is zero (note, however, that
a correlation of 0 does not necessarily imply independence). Directions and an example for
calculating Pearson's correlation coefficient are contained in Box 2-6.
          Box 2-6: Directions for Calculating Pearson's Correlation Coefficient with an Example

    Let X1, X2, ..., Xn represent one variable of the n data points and let Y1, Y2, ..., Yn represent a second
    variable of the n data points. The Pearson correlation coefficient, r, between X and Y is:

                  r = [Σ XiYi - (Σ Xi)(Σ Yi)/n] / √{[Σ Xi² - (Σ Xi)²/n][Σ Yi² - (Σ Yi)²/n]}

    Example: Consider the following data set (in ppb):

       Sample No.     1     2     3     4
       Arsenic      8.0   6.0   2.0   1.0
       Lead         8.0   7.0   7.0   6.0

    Then Σ Xi = 17, Σ Yi = 28, Σ Xi² = 105, Σ Yi² = 198, and Σ XiYi = 8·8 + 6·7 + 2·7 + 1·6 = 126, so

                  r = (126 - 17·28/4) / √{(105 - 17²/4)(198 - 28²/4)} = 7/√65.5 = 0.865

    Since r is close to 1, there is a strong linear relationship between these two contaminants.
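The Box 2-6 computational formula can be sketched in Python (an illustration only; the function name is arbitrary):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient via the computational formula in Box 2-6."""
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    sxx = sum(a * a for a in x) - sum(x) ** 2 / n
    syy = sum(b * b for b in y) - sum(y) ** 2 / n
    return sxy / math.sqrt(sxx * syy)

arsenic = [8.0, 6.0, 2.0, 1.0]   # ppb
lead = [8.0, 7.0, 7.0, 6.0]      # ppb
print(round(pearson_r(arsenic, lead), 3))   # 0.865
```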
       The correlation coefficient does not detect nonlinear relationships so it should be used
only in conjunction with a scatter plot (Section 2.3.7.2). A scatter plot can be used to determine
if the correlation coefficient is meaningful or if some measure of nonlinear relationships should
be used.  The correlation coefficient can be significantly influenced by extreme values so a
scatter plot should be used first to identify such values.

       An important property of the correlation coefficient is that it is unaffected by changes in
location of the data (adding or subtracting a constant from all of the X measurements or all the Y
measurements) or by changes in scale of the data (multiplying or dividing all X or all Y values by
a positive constant). Thus, linear transformations on the Xs and Ys do not affect the correlation
of the measurements. This is reasonable since the correlation reflects the degree to which
linearity between the X and Y measurements occurs, and the degree of linearity is unaffected by
changes in location or scale. For example, if a variable was temperature in Celsius, then the
correlation would not change if Celsius was converted to Fahrenheit.

       On the other hand, if nonlinear transformations of the X and/or Y measurements are made,
then the Pearson correlation between the transformed values will differ from the correlation of
the original measurements. For example, if X and Y, respectively, represent PCB and dioxin
concentrations in soil, and x = log(X) and y = log(Y), then the Pearson correlations between X and
Y, X and y, x and Y, and x and y will all be different, in general, since the logarithmic
transformation is a nonlinear transformation.

       Pearson's correlation may be sensitive to the presence of one or two extreme values,
especially when sample sizes are small. Such values may result in a high correlation, suggesting
a strong linear trend, when only a moderate trend is present. This may happen, for instance, if a
single (X,Y) pair has very high values for both X and Y while the remaining data values are
uncorrelated. Extreme values may also lead to low correlations between X and Y, thus tending to
mask a strong linear trend. This may happen if all the (X,Y) pairs except one (or two) tend to
cluster tightly about a straight line, and the exceptional point has a very large X value paired with
a moderate or small Y value (or vice versa). As influences of extreme values can be important, it
is wise to use a scatter plot (Section 2.3.7.2) in conjunction with a correlation coefficient.

2.2.4.2   Spearman's Rank Correlation

       An alternative to the Pearson correlation is Spearman's rank correlation coefficient. It is
calculated by first replacing each X value by its rank (i.e., 1 for the smallest X value, 2 for the
second smallest X value, etc.) and each Y value by its rank. These pairs of ranks are then treated
as the (X,Y) data and Spearman's rank correlation is calculated using the same formula as for
Pearson's correlation (Box 2-6). Directions and an example for calculating the Spearman rank
correlation coefficient are contained in Box 2-7.

       Since meaningful (i.e., monotonic increasing) transformations of the data will not alter
the ranks of the respective variables (e.g., the ranks for log(X) will be the same as the ranks for
X), Spearman's correlation will not be altered by nonlinear increasing transformations of the Xs
or the Ys. For instance, the Spearman correlation between PCB and dioxin concentrations (X and
Y) in soil will be the same as the correlation between their logarithms (x and y). This desirable
property, and the fact that Spearman's correlation is less sensitive to extreme values, makes
Spearman's correlation a good alternative or complement to Pearson's correlation coefficient.

2.2.4.3   Serial Correlation

       For a sequence of data points taken serially in time, or one-by-one in a row, the serial
correlation coefficient can be calculated by replacing the sequencing variable by the numbers 1
through n and calculating Pearson's correlation coefficient with X being the actual data values
and Y being the numbers 1 through n. For example, for a sequence of data collected at a waste
site along a straight transit line, the distances on the transit line of the data points are replaced by
the numbers 1 through n, e.g., for samples taken at 10-foot intervals, the first 10-foot sample
point = 1, the 20-foot sample point = 2, the 30-foot sample point = 3, etc. Directions for the serial
correlation coefficient, along with an example, are given in Box 2-8.
          Box 2-7: Directions for Calculating Spearman's Correlation Coefficient with an Example

     Let X1, X2, ..., Xn represent a set of ranks of the n data points of one data set and let Y1, Y2, ..., Yn
     represent a set of ranks of a second variable of the n data points. The Spearman rank correlation
     coefficient, r, between X and Y is computed with the same formula as Pearson's correlation coefficient
     (Box 2-6):

                  r = [Σ XiYi - (Σ Xi)(Σ Yi)/n] / √{[Σ Xi² - (Σ Xi)²/n][Σ Yi² - (Σ Yi)²/n]}

     Example: Concentrations of arsenic and lead are taken at 4 sample locations. The following table gives
     the data set (in ppb) ranked according to the arsenic values. Ranks are given in parentheses. Note that
     any tied values are assigned the average rank.

        Sample No.        4          3          2           1
        Arsenic         1.0 (1)    2.0 (2)    6.0 (3)     8.0 (4)
        Lead            6.0 (1)    7.0 (2.5)  7.0 (2.5)   8.0 (4)

     Using the ranks, Σ Xi = 10, Σ Yi = 10, Σ Xi² = 30, Σ Yi² = 29.5, and
     Σ XiYi = 1·1 + 2·2.5 + 3·2.5 + 4·4 = 29.5, so

                  r = (29.5 - 10·10/4) / √{(30 - 10²/4)(29.5 - 10²/4)} = 4.5/√22.5 = 0.9487

     Since r is close to 1, there is a strong linear relationship between these two contaminants when ranked.
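Box 2-7's procedure, rank each variable (averaging ranks for ties) and then apply the Pearson formula, can be sketched as follows; the helper functions are my own, not part of the guidance:

```python
import math

def average_ranks(values):
    """Replace each value by its rank (1 = smallest); ties get the average rank."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1          # 1-based position of first occurrence
        ties = ordered.count(v)               # number of tied values
        ranks.append(first + (ties - 1) / 2)  # average rank across the tied block
    return ranks

def pearson_r(x, y):
    """Pearson correlation via the computational formula in Box 2-6."""
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    sxx = sum(a * a for a in x) - sum(x) ** 2 / n
    syy = sum(b * b for b in y) - sum(y) ** 2 / n
    return sxy / math.sqrt(sxx * syy)

arsenic = [1.0, 2.0, 6.0, 8.0]   # ppb, in the Box 2-7 sample order
lead = [6.0, 7.0, 7.0, 8.0]      # ppb
r = pearson_r(average_ranks(arsenic), average_ranks(lead))
print(round(r, 4))   # 0.9487
```

Note that the tied lead values (7.0, 7.0) receive the average rank 2.5, as in the box.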
          Box 2-8: Directions for Estimating the Serial Correlation Coefficient with an Example

    Directions: Let X1, X2, ..., Xn represent the data values collected in sequence over equally spaced
    periods of time. Label the periods of time 1, 2, ..., n to match the data values. Use the directions in
    Box 2-6 to calculate the Pearson's correlation coefficient between the data X and the time periods Y.

    Example: The following are hourly readings from a discharge monitor.

       Time:         12:00  13:00  14:00  15:00  16:00  17:00  18:00  19:00  20:00  21:00  22:00  23:00  24:00
       Reading:        6.5    6.6    6.7    6.4    6.3    6.4    6.2    6.2    6.3    6.6    6.8    6.9    7.0
       Time Period:      1      2      3      4      5      6      7      8      9     10     11     12     13

    Using Box 2-6, with the readings being the X values and the time periods being the Y values, gives a
    serial correlation coefficient of 0.4318.
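As a cross-check of Box 2-8 (an illustrative sketch, not part of the guidance), the serial correlation can be computed directly with the Box 2-6 formula:

```python
import math

# Hourly readings from Box 2-8 (the X values) and time periods 1..13 (the Y values)
readings = [6.5, 6.6, 6.7, 6.4, 6.3, 6.4, 6.2, 6.2, 6.3, 6.6, 6.8, 6.9, 7.0]
periods = list(range(1, len(readings) + 1))

n = len(readings)
sxy = sum(x * y for x, y in zip(readings, periods)) - sum(readings) * sum(periods) / n
sxx = sum(x * x for x in readings) - sum(readings) ** 2 / n
syy = sum(y * y for y in periods) - sum(periods) ** 2 / n
r = sxy / math.sqrt(sxx * syy)
print(round(r, 4))   # 0.4318
```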
2.3    GRAPHICAL REPRESENTATIONS
2.3.1   Histogram

       A histogram is a visual representation of
the data collected into groups. This graphical
technique provides a visual method of identifying
the underlying distribution of the data. The data
range is divided into several bins or classes and the
data are sorted into the bins. A histogram is a bar
graph conveying the bins and the frequency of data
points in each bin.  Other forms  of the histogram
use a normalization of the bin frequencies for the
heights of the bars.  The two most common
normalizations are relative frequencies
(frequencies divided by sample size) and densities
(relative frequency divided by the bin width).
Figure 2-1 is  an example of a histogram using
frequencies and Figure 2-2 is a histogram of
densities.
                                                 Figure 2-1. A Histogram of
                                                            Concentration Frequencies
                                                 Figure 2-2. A Histogram of
                                                            Concentration Densities
       Histograms provide a visual method of assessing location, shape, and spread of the data.
Also, extreme values and multiple modes can be identified. The details of the data are lost, but
an overall picture of the data is obtained. The stem-and-leaf plot described in the next section
offers the same insights into the data as a histogram, but the data values are retained. Therefore,
stem-and-leaf plots can be more informative than histograms for smaller data sets.

       The visual impression of a histogram is sensitive to the choice of bins. A large number of
bins will increase data detail while fewer bins will increase the smoothness of the histogram. A
good starting point when choosing the number of bins is the square root of the sample size.
Another factor in choosing bins is the choice of endpoints. Using simple bin endpoints can
improve the readability of the histogram. Simple bin endpoints include multiples of 5×10^k units
for some integer k; for example, 0 to <5, 5 to <10, etc. (Figure 2-1), or 1 to <1.5, 1.5 to <2, etc.
Finally, when plotting a histogram for a continuous variable, e.g., concentration, it is necessary
to decide on an endpoint convention; that is, what to do with data points that fall on the boundary
of a bin. With discrete variables (e.g., family size), the intervals can be centered between the
values. For the family size data, the intervals can span between 1.5 and 2.5, 2.5 and 3.5, and so
on; the whole numbers that relate to family size are then centered within the bins. Directions for
generating a histogram are contained in Box 2-9 and an example is contained in Box 2-10.
                             Box 2-9: Directions for Generating a Histogram

     STEP 1:   Partition the data range into several bins. Partitioning means the bins should be connected, but
              not overlapping.  For example, 0 to <5, 5 to <10, etc.  In almost all cases, the bin widths should
              be the same. When selecting bins, first choose a number of bins.  A good rule of thumb is the
               square root of the sample size. Next, select bin endpoints that are simple (for example,
               multiples of 5×10^k for some integer k) and will produce approximately the number of desired bins.

     STEP 2:   Place each data point into the proper bin. This creates a summary of the data called a
              frequency table.  If desired, compute one or both types of normalizations of the frequencies.
              The relative frequency is the  frequency divided by the sample size. The density is the relative
              frequency divided by the bin width.

     STEP 3:   Determine the horizontal axis based on the range of the data and the vertical axis based on
               the range of the frequencies or normalized frequencies.

     STEP 4:   A histogram is a bar graph of the  frequencies or normalized frequencies. For each bin, draw a
              bar using the bin endpoints on the x-axis as the width and the  frequency or normalized
              frequency on the y-axis as the height.
                              Box 2-10: Example of Generating a Histogram

    Consider the following 22 measurements of a contaminant concentration (in ppm): 17.7, 17.4, 22.8,
    35.5, 28.6, 17.2, 19.1, <4, 7.2, <4, 15.2, 14.7, 14.9, 10.9, 12.4, 12.4, 11.6, 14.7, 10.2, 5.2, 16.5,
    and 8.9.

    STEP 1:  With 22 data points, a rough estimate of the number of bins is √22 = 4.69, or 5. Since the
             data range from 0 to 40, the suggested bins are 0 to <8, 8 to <16, etc. This is a little awkward,
             and using multiples of 5 to create simpler bins leads to 0 to <5, 5 to <10, etc. This choice
             leads to 8 bins, which is close to the suggested number, and these are the bins that will be
             used.

    STEP 2:  Column 2 of the frequency table below shows the frequency, or number of observations,
             within each bin defined in Step 1. The third column shows the relative frequencies, which are
             the frequencies divided by the sample size. The final column of the table gives the densities,
             or the relative frequencies divided by the bin widths.

                      Bin          Frequency   Relative Frequency (%)   Density (% per ppm)
                      0 - <5 ppm       2               9.09                    1.8
                      5 - <10 ppm      3              13.64                    2.7
                     10 - <15 ppm      8              36.36                    7.3
                     15 - <20 ppm      6              27.27                    5.5
                     20 - <25 ppm      1               4.55                    0.9
                     25 - <30 ppm      1               4.55                    0.9
                     30 - <35 ppm      0               0.00                    0.0
                     35 - <40 ppm      1               4.55                    0.9

    STEP 3:  The horizontal axis for the data is from 0 to 40 ppm. The vertical axis for the histogram of
             frequencies is from 0 to 10, and the vertical axis for the histogram of densities is from 0% to
             10%.

    STEP 4:  The histogram of frequencies is shown in Figure 2-1 and the histogram of densities is shown
             in Figure 2-2.
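The frequency table in Box 2-10 can be generated with a short script. This is an illustrative sketch only; placing the two <4 non-detects at a detection limit of 4 ppm is an assumption (consistent with the treatment of non-detects suggested in Section 2.3.2):

```python
# Box 2-10 data; the two values recorded as <4 are represented here by 4
# (detection-limit substitution: an assumption made for this sketch).
data = [17.7, 17.4, 22.8, 35.5, 28.6, 17.2, 19.1, 4, 7.2, 4,
        15.2, 14.7, 14.9, 10.9, 12.4, 12.4, 11.6, 14.7, 10.2,
        5.2, 16.5, 8.9]

width = 5                                      # bin width chosen in Step 1
bins = [(lo, lo + width) for lo in range(0, 40, width)]
n = len(data)

for lo, hi in bins:
    freq = sum(1 for x in data if lo <= x < hi)     # frequency
    rel = 100 * freq / n                            # relative frequency, %
    density = rel / width                           # density, % per ppm
    print(f"{lo:2d} - <{hi:2d} ppm   {freq}   {rel:5.2f}   {density:3.1f}")
```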

2.3.2   Stem-and-Leaf Plot

         The stem-and-leaf plot is used to show both the data values and visual information
about the distribution of the data. Like a histogram, a stem-and-leaf plot is a visual representation
of the data collected into groups. However, data detail is retained in a stem-and-leaf plot. As with
histograms, stem-and-leaf plots provide an understanding of the shape of the data (and the
underlying distribution), i.e., location, shape, spread, number of modes and identification of
potential outliers. The stem-and-leaf plot is also sensitive to bin selection. A stem-and-leaf plot can be
more useful than a histogram in analyzing small data sets because it not only allows a
visualization of the data distribution, but enables the data to be reconstructed.

        First, bins that divide the data range are selected in a similar manner as for a histogram;
these are the stems. Since the number of data points is typically small, the number of bins or
stems should be approximately 4 to 8. Data points are then sorted into the proper stem.  Each
observation in the stem-and-leaf plot consists of two parts: the stem and the leaf. The stem is
generally made up of the leading digit or digits of the data values while the leaf is made up of the
trailing digit or digits. The  stems are displayed on the vertical axis and the trailing digits of the
data points make up the leaves.  Changing the stem can be accomplished by increasing or
decreasing the leading digits that are used, dividing the groupings of one stem (e.g., all numbers
which start with the numeral 6 can be divided into smaller groupings), or multiplying the data by
a constant factor (e.g., multiplying the data by 10 or 100).  Non-detects can be placed at the detection
limit with the leaf indicating the observation was actually a nondetect. Directions for
constructing a stem-and-leaf plot are given in Box 2-11 and an example is contained in
Box 2-12.
                      Box 2-11: Directions for Generating a Stem and Leaf Plot

   Let X1, X2, ..., Xn represent the n data points. To develop a stem-and-leaf plot, complete the following steps:

   STEP 1: Arrange the observations in ascending order.  The ordered data are labeled (from smallest to largest)
           X(1), X(2), ..., X(n).

   STEP 2: Choose either one or more of the leading digits to be the stem values. For example, if data ranges
           from 0 to 30, then the best choice for stems would be the values of the tens column: 0, 1, 2, 3.

   STEP 3: List the stem values from smallest to largest along the vertical axis. Enter the trailing digits for each
           data point as the leaf. The leaves extend to the right of the stems. Continuing the example from
           above, the values of the ones column (and possible one decimal place) would be used as the leaf.
                        Box 2-12: Example of Generating a Stem and Leaf Plot

   Consider the following 22 samples of trifluorine (in ppm):  17.7, 17.4, 22.8, 35.5, 28.6, 17.2, 19.1, <4, 7.7, <4,
   15.2, 14.7, 14.9, 10.9, 12.4, 12.4, 11.6, 14.7, 10.2, 5.2, 16.5, and 8.9.

   STEP1: Arrange the observations in ascending order: <4, <4, 5.2, 7.7, 8.9, 10.2, 10.9, 11.6, 12.4, 12.4,
           14.7, 14.7,  14.9, 15.2,  16.5, 17.2, 17.4, 17.7, 19.1, 22.8, 28.6, 35.5.

   STEP 2: The data ranges from 0 to 40 so the best choice for the stems is the tens column.  Initially, this gives
           4 stems.  We can divide the stems in half to give stems of 0 to <5, 5 to <10, etc. This gives 8 stems.

   STEP 3: List the stem values along the vertical axis from smallest to largest.  Enter the leaf (the remaining
           digits) values in order from lowest to highest to the right of the stem.

                 0 | <4 <4
                 0 | 5.2 7.7 8.9
                 1 | 0.2 0.9 1.6 2.4 2.4 4.7 4.7 4.9
                 1 | 5.2 6.5 7.2 7.4 7.7 9.1
                 2 | 2.8
                 2 | 8.6
                 3 |
                 3 | 5.5

   Inspection of the stem-and-leaf indicates the data values are centered in the low teens with a slightly right-
   skewed distribution.  Summary statistics can be calculated to strengthen these visual conclusions.
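The construction in Box 2-11 can be sketched in a few lines (Python for illustration; as simplifying assumptions, non-detects are entered at the detection limit of 4 ppm, their "<4" flags are dropped, and empty stems such as the first "3 |" row are not printed):

```python
from collections import defaultdict

def stem_and_leaf(values, stem_width=5):
    """Group sorted values into stems of `stem_width` units; the stem label is
    the tens digit and the leaf is the value's remainder within the ten."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[int(v // stem_width) * stem_width].append(v)
    return [
        f"{lo // 10} | " + " ".join(f"{v - (lo // 10) * 10:.1f}" for v in stems[lo])
        for lo in sorted(stems)
    ]

# Box 2-12 data with the non-detects entered as 4.0
data = [17.7, 17.4, 22.8, 35.5, 28.6, 17.2, 19.1, 4.0, 7.7, 4.0,
        15.2, 14.7, 14.9, 10.9, 12.4, 12.4, 11.6, 14.7, 10.2, 5.2,
        16.5, 8.9]
plot = stem_and_leaf(data)
```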
2.3.3  Box-and-Whiskers Plot

       Box and Whisker plots, also known as box-plots, are useful in
situations where a picture of the distribution is desired, but it is not
necessary or feasible to portray all the details of the data.  A box-plot
(see Figure 2-3) displays several percentiles of the data set.  It is a
simple plot, yet provides insight into the location, shape, and spread
of the data and underlying distribution. A simple box-plot contains
only the 0th (minimum data value), 25th, 50th, 75th and 100th
(maximum data value) percentiles.  A more complex version includes
identification of the mean with a plus-sign and potential outliers
(identified by stars).  Since the box-plot is compact (essentially one-
dimensional), several can be placed onto a single graph.  This allows
a simple method to compare the locations, spreads and shapes of
several data sets or different groups within a single data set. In this
situation, the width of the box can be proportional to the sample size
of each data set.
                        Figure 2-3. Example of a Box-Plot
       A box-plot divides the data into 4 sections, each containing 25% of the data.  The length
of the central box indicates the spread of the data (the central 50%) while the length of the
whiskers shows the breadth of the tails of the distribution.  The box-plot also demonstrates the
shape of the data in the following manner. If the upper box and whisker are approximately the
same length as the lower box and whisker, then the data are distributed symmetrically. If the
upper box and whisker are longer than the lower box and whisker, then the data are right-
skewed.  If the upper box and whisker are shorter than the lower box and whisker, then the data
are left-skewed.  Directions for generating a box and whiskers plot are contained in Box 2-13,
and an example is contained in Box 2-14.

       If the mean is added to the box-plot, then recall that comparing the mean and median provides
another method of identifying the shape of the data. If the mean is approximately equal to the
median, then the data are distributed symmetrically. If the mean is greater than the  median, then
the data are right-skewed; if the mean is less than the median, then the data are left-skewed.
                      Box 2-13:  Directions for Generating a Box and Whiskers Plot

  STEP 1:    Compute the 0th (minimum value), 25th, 50th (median), 75th and 100th (maximum value) percentiles.

  STEP 2:    Plot these points either vertically or horizontally. If more than one box-plot is drawn, then the other
             axis can be used to identify the box-plots of the different groups or data sets. Draw a box around
             the 25th and 75th percentiles and add a line through the box at the 50th percentile. Optionally,
             make the width of the box proportional to the sample size.

  STEP 3:    If desired, compute the mean and indicate this value on the box-plot with a plus-sign. Also, identify
             potential outliers if desired. A potential outlier is a value at a distance greater than 1.5×IQR from the
             closest end of the box.

  STEP 4:    Draw the whiskers from each end of the box to the furthest data point that has not been identified as
             an outlier. Plot the potential outliers using asterisks.
           Box 2-14: Example of a Box and Whiskers Plot

  Consider the following 22 samples of trifluorine (in ppm) listed in order
  from smallest to largest: 4.0, 6.1, 9.8, 10.7, 10.8, 11.5, 11.6, 12.4,
  12.4, 14.6, 14.7, 14.7, 16.5, 17.0, 17.5, 20.6, 20.8, 25.7, 25.9, 26.5,
  32.0, and 35.5.

  STEP 1:  The minimum value is 4.0, y(25) = 11.5, the median is 14.7,
           y(75) = 20.8, and the maximum value is 35.5.

  STEP 2:  The simple box-plot is drawn using the summary statistics
           calculated in STEP 1.

  STEP 3:  The mean of the data is 16.9. The only potential outlier is
           35.5 because 35.5 - 20.8 = 14.7, which is greater than
           1.5×IQR = 13.95.

  STEP 4:  The lower whisker extends from 11.5 to 4.0 and the upper
           whisker extends from 20.8 to 32.0.

  Examining the box-plot, it is easy to see that the data are centered at
  15 ppm and 50% of the data lie between approximately 12 ppm and
  21 ppm. Also, since the upper box and whisker are longer than the
  lower box and whisker, the distribution of the data is right-skewed.
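The computations in Boxes 2-13 and 2-14 can be sketched as follows (Python for illustration, using the Box 2-1 interpolation convention r = (n - 1)·g + 1 for percentiles; note that the unrounded quartiles, 11.525 and 20.75, round to the 11.5 and 20.8 reported in the box):

```python
import math

def percentile(ordered, g):
    """Percentile of sorted data at fraction g (Box 2-1: r = (n-1)*g + 1)."""
    r = (len(ordered) - 1) * g + 1
    i, f = math.floor(r), r - math.floor(r)
    return ordered[i - 1] if f == 0 else (1 - f) * ordered[i - 1] + f * ordered[i]

def box_plot_stats(values):
    """Five-number summary, mean, and points beyond the 1.5*IQR fences."""
    x = sorted(values)
    q1, med, q3 = (percentile(x, g) for g in (0.25, 0.50, 0.75))
    iqr = q3 - q1
    return {
        "min": x[0], "q1": q1, "median": med, "q3": q3, "max": x[-1],
        "mean": sum(x) / len(x),
        "outliers": [v for v in x if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr],
    }

# Box 2-14 data
stats = box_plot_stats([4.0, 6.1, 9.8, 10.7, 10.8, 11.5, 11.6, 12.4, 12.4, 14.6, 14.7,
                        14.7, 16.5, 17.0, 17.5, 20.6, 20.8, 25.7, 25.9, 26.5, 32.0, 35.5])
```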
2.3.4  Quantile Plot and Ranked Data Plots

       The quantile plot and the ranked data plot are two further methods of visualizing the
location, spread and shape of the data. Both plots are displays of the ranked data and differ only
in the horizontal axis labels.  No subjective decisions such as bin size are necessary and all of the
data is plotted rather than summaries.

       A ranked data plot is a display of the data from smallest to largest at evenly spaced
intervals. A quantile plot is a graph of the ranked data versus the fraction of data points it
exceeds. Therefore, a quantile plot can be used to read the quantile information such as the
median, quartiles, and the interquartile range.  This additional information can aid in the
interpretation of the shape of the data.

       Using either the quantile plot or the ranked data plot, the spread of the data may be
visualized by examining the slope of the graph.  The closer the general slope is to 0, the lower
the variability of the data set.

       Also using either plot, the shape of the data may be determined by inspecting the tails of
the graph.  If the left and right tails have approximately the same curvature, then the data are
distributed symmetrically. If the
curvature of the right-tail is greater
than the curvature of the left-tail, then
the data are right-skewed. This is the
case of the data plotted in Figure 2-4.
If the curvature of the left-tail is
greater than the curvature of the right-
tail, then the data are left-skewed.
Finally, the degree of curvature in the
tails of either plot is proportional to
the length of the tail of the data.  In
Figure 2-4, the plot rises slowly to a
point, then the  slope increases
dramatically. This implies there is a
large concentration of small data
values and relatively few large data
values. In this  case, the left-tail of the
data is very short and the right-tail is
relatively long.
Figure 2-4. Example of a Quantile Plot of Right-Skewed Data (horizontal axis: Fraction of
            Data (f-values); the lower quartile, upper quartile, and interquartile range are marked)
       Directions for developing a quantile plot are given in Box 2-15 and an example is given
in Box 2-16.
               Box 2-15: Directions for Generating a Quantile Plot and a Ranked Data Plot

    Let X1, X2, ..., Xn represent the n data points, and X(1), X(2), ..., X(n) be the ordered data from smallest to
    largest.  Compute the fractions fi = (i - 0.5)/n for i = 1, ..., n. The quantile plot is a plot of the pairs (fi, X(i)),
    with straight lines connecting consecutive points. If desired, add vertical lines indicating the quartiles,
    median, or other quantiles of interest.

    Alternatively, a ranked data plot can be made by simply plotting the ordered X values at equally spaced
    intervals along the horizontal axis.
Box 2-16: Example of Generating a Quantile Plot

   Consider the following 10 data points (in ppm): 4, 5, 6, 7, 4, 10, 4, 5, 7, and 8. The data ordered from
   smallest to largest, X(i), are shown in the second row of the table below and the third row displays the
   values fi for each i, where fi = (i - 0.5)/n.

        i        1     2     3     4     5     6     7     8     9     10
        X(i)     4     4     4     5     5     6     7     7     8     10
        fi     0.05  0.15  0.25  0.35  0.45  0.55  0.65  0.75  0.85  0.95

   The pairs (fi, X(i)) are then plotted to yield the quantile plot (horizontal axis: Fraction of Data
   (f-values); vertical axis: ppm).
Note that the graph curves upward. This indicates that the data are skewed to the right.
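The table in Box 2-16 can be generated with a couple of lines (Python, for illustration):

```python
data = [4, 5, 6, 7, 4, 10, 4, 5, 7, 8]
x = sorted(data)                                  # the ordered X(i)
n = len(x)
f = [(i - 0.5) / n for i in range(1, n + 1)]      # f_i = (i - 0.5)/n
pairs = list(zip(f, x))                           # points of the quantile plot
```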
2.3.5  Quantile-Quantile Plots and Probability Plots

       There are two types of quantile-quantile plots or q-q plots. One is an empirical q-q plot
which involves plotting the quantiles of two data sets against each other.  This is a technique to
determine if the data sets were generated by the same underlying distribution and is discussed in
Section 2.3.6.3.  The other type of q-q plot involves graphing the quantiles of a data set against
the quantiles of a specific probability distribution. This is a technique to determine if the data set
was generated by the theoretical distribution.  The following section will focus on the most
common of these plots for environmental data, the normal probability plot (normal q-q plot).
However, the discussion holds for other probability  distributions.  The normal probability plot is
a visual method to roughly determine how well the data set is modeled by a normal distribution.
Formal tests are contained in Chapter 4, Section 2. Directions for developing a normal
probability plot are given in Box 2-17 and an example is given in Box 2-18.  A discussion of the
normal distribution is contained in Section 2.4.

       A normal probability plot is the graph of the quantiles of a data set against the quantiles
of the standard normal distribution.  This can be accomplished by using a software package,
plotting the sample quantiles against standard normal quantiles, or plotting the sample quantiles
on normal probability paper. If the graph is approximately linear (the correlation coefficient
being fairly high excluding outliers), then this is an indication that the data are normally
distributed and a formal test should be performed. If the graph is not linear, then the departures
from linearity give important information about how the data distribution deviates from a normal
distribution.

       A nonlinear normal probability plot may be used to interpret the shape and tail behavior
of the data. First, add the quartile line, the line through the first and third quartiles,  to the plot.
Next, examine the relationship of the tails of the normal probability plot to the quartile line.
       Relationship of the q-q plot              Distribution of the data in relation
       to the quartile line                      to the normal distribution
       upper-tail        lower-tail              shape              tail behavior
       above             below                   symmetric          heavy
       below             above                   symmetric          light
       above             above                   right-skewed
       below             below                   left-skewed
       A normal probability plot can also be used to identify potential outliers.  A data value (or
a few data values) much larger or much smaller will appear distant from the other values, which
will be concentrated in the center of the graph.

       As a final note, using a simple natural log transformation of the data, the normal
probability plot can be used to determine if the  data are well modeled by a lognormal
distribution. The lognormal is an important probability distribution when analyzing
environmental data where normality cannot be  assumed.

2.3.6   Plots for Two or More Variables

       Data often consist of measurements of several characteristics (variables) for each sample
point in the data set.  For example, a data set may consist of measurements of lead, mercury, and
chromium for each soil sample or may consist of daily concentration readings for several
monitoring sites. In this case, graphs may be used to compare and contrast different variables.
                      Box 2-17:  Directions for Constructing a Normal Probability Plot

       Let X1, X2, ..., Xn represent the n data points.  Many software packages will produce normal probability
       plots, and this is the recommended method.  If one of these packages is unavailable, then use the
       following procedure.

       STEP 1:  Compute the fractions fi = (i - 0.5)/n for i = 1, ..., n.

       STEP 2:  Compute the normal quantiles for the fi. These are Yi = Φ^-1(fi), where Φ^-1(·) is the inverse
                of the standard normal cumulative distribution function. The Yi can be found using Table A-1:
                locate an fi in the body of the table, then the Yi is found on the edges.  For example, if fi
                = 0.975, then Yi is 1.96.

       STEP 3:  Plot the pairs (Yi, X(i)) for i = 1, ..., n, where X(1), ..., X(n) are the ordered data.

       STEP 4:  If desired, then add the quartile line through the first and third quartiles. This acts as a guide
                to determine if the points follow a straight line, which may be fitted to the data.

       If the graph of these pairs approximately forms a straight line, then the data may be well modeled by a
       normal distribution.  Otherwise, use the table above to determine the shape and tail behavior.
                               Box 2-18: Example of Normal Probability Plot

  Consider the following 15 data points:  5, 5, 6, 6, 8, 8, 9, 10, 10, 10, 10, 10, 12, 14, and 15.

  STEP 1:  The computed fractions, fi, are: 0.0333, 0.1000, 0.1667, 0.2333, ..., 0.9667.
  STEP 2:  Working inside-out in Table A-1, the normal quantiles for the fi are: -1.83, -1.28,
           -0.97, -0.73, -0.52, -0.34, -0.17, 0.00, 0.17, 0.34, 0.52, 0.73, 0.97, 1.28, 1.83.

  STEP 3:  The data and normal quantile pairs are
           plotted in the graph below.

  STEP 4:  Using Box 2-1, the first and third
           quartiles are 7 and 10, respectively. The
           first and third quartiles of the standard
           normal distribution are -0.67 and 0.67,
           respectively.  The quartile line is the line
           through the points (-0.67, 7) and (0.67,
           10), and has been added to the plot.

  The plot looks approximately linear, but deviates
  from the quartile line in the upper-tail. However,  no
  definite conclusion should be drawn from a plot of
  15 points. A formal test of normality (Section 4.2)
  should be performed.
                     [Figure: Normal Q-Q Plot of the data with the quartile line]
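Steps 1 and 2 of Box 2-17 can be carried out without the lookup table (Python for illustration; `statistics.NormalDist().inv_cdf` plays the role of Φ^-1):

```python
from statistics import NormalDist

# Box 2-18 data, already ordered
data = sorted([5, 5, 6, 6, 8, 8, 9, 10, 10, 10, 10, 10, 12, 14, 15])
n = len(data)
f = [(i - 0.5) / n for i in range(1, n + 1)]   # STEP 1: plotting fractions
y = [NormalDist().inv_cdf(p) for p in f]       # STEP 2: standard normal quantiles
pairs = list(zip(y, data))                     # STEP 3: points of the probability plot
```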
       To compare and contrast several variables, collections of the single variable displays
described in previous sections are useful.  For example, the analyst may generate box-plots or
histograms for each variable using the same axis for all of the variables.  Separate plots for each
variable may be overlaid on one graph, such as overlaying quantile plots for each variable.
Another useful technique for comparing two variables is to place histograms (sometimes called
bi-histograms) or stem-and-leaf plots back to back. Additional plots to display two or more
variables are described in Sections 2.3.6.1 through 2.3.6.3. In many software packages, three
dimensional scatter plots can be rotated and compared in order to find hidden relationships and
anomalies.
2.3.6.1   Scatterplot

       For data sets consisting of multiple
observations per sampling point, a scatterplot is one
of the most powerful graphical tools for analyzing
the relationship between two or more variables.
Scatterplots are easy to construct for two variables
(Figure 2-5) and many software packages can
construct 3-dimensional scatterplots.  Directions for
constructing a 2-dimensional scatterplot are given in
Box 2-19 along with an  example.
Figure 2-5. Example of a Scatterplot (vertical axis: PCE (ppb); horizontal axis: Chromium VI (ppb))
Box 2-19: Directions for Generating a Scatterplot and an Example

Let X1, X2, ..., Xn represent one variable of the n data points and let Y1, Y2, ..., Yn represent a second variable of
the n data points. The paired data can be written as (Xi, Yi) for i = 1, ..., n. To construct a scatterplot, plot one
variable along the horizontal axis and the other variable along the vertical axis.

Example: PCE values are displayed on the vertical axis and Chromium VI values are displayed on the horizontal
axis of Figure 2-5.

          PCE    Chromium          PCE    Chromium          PCE    Chromium
   Obs   (ppb)   VI (ppb)   Obs   (ppb)   VI (ppb)   Obs   (ppb)   VI (ppb)
    1    14.49     3.76      9     2.23     0.77      17    4.14     2.36
    2    37.21     6.92     10     3.51     1.24      18    3.26     0.68
    3    10.78     1.05     11     6.42     3.48      19    5.22     0.65
    4    18.62     6.30     12     2.98     1.02      20    4.02     0.68
    5     7.44     1.43     13     3.04     1.15      21    6.30     1.93
    6    37.84     6.38     14    12.60     5.44      22    8.22     3.48
    7    13.59     5.07     15     3.56     2.49      23    1.32     2.73
    8     4.31     3.56     16     7.72     3.01      24    7.73     1.61
                                                      25    5.88     1.42
       A scatterplot can clearly show the relationship between two variables if the data range is
sufficiently large.  Truly linear relationships can always be identified in scatter plots, but truly
nonlinear relationships may appear linear (or some other form) if the data range is relatively
small.  Scatterplots of linearly correlated variables cluster about a straight line. As an example of
a nonlinear relationship, consider two variables where one variable is approximately equal to the
square of the other. With an adequate range in the data, a scatterplot of this data would display a
partial parabolic curve. Other important modeling relationships that may appear are exponential
or logarithmic.  Two additional uses of scatterplots are the identification of potential outliers for
a single variable or for the paired variables and the identification of clustering in the data.
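A scatterplot that clusters about a straight line corresponds to a high correlation coefficient; the linear association visible in Figure 2-5 can be quantified with a sketch like the following (Python for illustration, applying the standard Pearson formula to the Box 2-19 data):

```python
# PCE and Chromium VI observations from Box 2-19, in observation order
pce = [14.49, 37.21, 10.78, 18.62, 7.44, 37.84, 13.59, 4.31, 2.23, 3.51,
       6.42, 2.98, 3.04, 12.60, 3.56, 7.72, 4.14, 3.26, 5.22, 4.02,
       6.30, 8.22, 1.32, 7.73, 5.88]
cr6 = [3.76, 6.92, 1.05, 6.30, 1.43, 6.38, 5.07, 3.56, 0.77, 1.24,
       3.48, 1.02, 1.15, 5.44, 2.49, 3.01, 2.36, 0.68, 0.65, 0.68,
       1.93, 3.48, 2.73, 1.61, 1.42]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

r = pearson(pce, cr6)   # positive, consistent with the upward trend in the scatterplot
```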

2.3.6.2   Extensions of the Scatterplot

       It is easy to construct a 2-dimensional scatterplot by hand and many software packages
can construct a useful 3-dimensional scatterplot.  However, with more than 3 variables, it is
difficult to construct and interpret a scatterplot. Therefore, several graphical representations
have been developed that extend the idea of a scatterplot for data consisting of more than 2
variables.
       The simplest of these graphical techniques is a coded scatterplot. All possible two-way
combinations are given a code and the pairs of data are plotted on one scatterplot. The coded
scatterplot does not provide information on three-way or higher interactions between the
variables.  If the data ranges for the variables are comparable, then a single set of axes may
suffice; if the data ranges differ greatly, different scales will be required.  As an example,
consider a data set of 3 variables, A, B, and C, and assume that all of the data ranges are similar.
An analyst may choose to display the pairs (Ai, Bi) using a small circle, the pairs (Ai, Ci) using a
plus-sign, and the pairs (Bi, Ci) using a square on one scatterplot.  The completed coded
scatterplot is given in Figure 2-6.
Figure 2-6. Example of a Coded Scatterplot (legend: Chromium vs. PCE, Atrazine vs. PCE,
            Atrazine vs. Chromium IV)
Figure 2-7. Example of a Parallel-Coordinate Plot
       The parallel-coordinates method employs a scheme where coordinate axes are drawn in
parallel (instead of perpendicular).  Figure 2-7 is an example of a parallel-coordinate plot.
Consider a sample point X consisting of values X1 for measurement 1, X2 for measurement 2,
through Xp for measurement p.  A parallel-coordinate plot is constructed by placing an axis for
each of the p measurements in parallel and plotting X1 on axis 1, X2 on axis 2, and so on. The
points are then

joined with a broken line.  This graph contains all of the information of a scatterplot in addition
to information concerning 3-way and higher interactions.  However, with p measurements, one
must construct several parallel-
coordinate plots in order to display
all possible pairs.
       A scatterplot matrix is
another useful method of extending
scatterplots to higher dimensions.
In this case, a scatterplot is created
for all possible pairs of
measurements which are then
displayed in a matrix format.  This
method is easy to implement and
provides a  concise method of
displaying  the individual
scatterplots.  Interpretation proceeds
as with a simple or coded
scatterplot. As in those  cases, this
method does not provide
information about 3-way or higher
interactions.  An example  of a
scatterplot matrix is contained in
Figure 2-8.
Figure 2-8. Example of a Scatterplot Matrix (pairwise scatterplots of Chromium IV (ppb),
            Atrazine (ppb), and PCE (ppb))
2.3.6.3   Empirical Quantile-Quantile Plot

       An empirical quantile-quantile (q-q) plot is a plot of the quantiles (Section 2.2.1) of two
data sets against each other and is similar to the normal probability plot discussed in
Section 2.3.5.  This plot (Figure 2-9) is used to compare the distributions of two measurements;
for example, the distributions of lead and iron concentrations in a drinking water well.  This
technique is used to determine if the data sets were generated from the same underlying
distribution. If the graph is approximately linear, then the distributions are roughly the same. If
the graph is not linear, then the departures from linearity give important information about how
the two data distributions differ.  The interpretation is similar to that of a normal probability
plot. Directions for constructing an empirical q-q plot with an example are given in Box 2-20.

Figure 2-9. Example of an Empirical Q-Q Plot
              Box 2-20: Directions for Constructing an Empirical Q-Q Plot with an Example

    Let X1, X2, ..., Xn represent the n data points of one variable and let Y1, Y2, ..., Ym represent the m data
    points of a second variable. Let X(1), X(2), ..., X(n) and Y(1), Y(2), ..., Y(m) be the ordered data sets.

    If n = m, then an empirical q-q plot of the two variables is simply a plot of the ordered values of the
    variables, i.e., a scatterplot of the pairs (X(1), Y(1)), (X(2), Y(2)), ..., (X(n), Y(n)).

    Suppose n > m. Then the empirical quantile-quantile plot will consist of m pairs of points with the ordered
    Y values plotted against the m evenly spaced quantiles of X. Using Box 2-1, compute the quantiles that
    correspond to the fractions gj = (j - 1)/(m - 1) for j = 1, ..., m.

    Example: Consider contaminant readings from two separate drinking water wells at the same site.

             well 1: 1.32, 3.26, 3.56, 4.02, 4.14, 5.22, 6.30, 7.72, 7.73, 8.22
             well 2: 0.65, 0.68, 0.68, 1.42, 1.61, 1.93, 2.36, 2.49, 2.73, 3.01, 3.48, 5.44.

    Since the sample sizes are not equal, we need to compute the 10 evenly spaced quantiles for well 2.

             g1 = 0, so the first quantile is X(1) = 0.65.
             g2 = 1/9, so r = (n - 1)·g2 + 1 = 11·(1/9) + 1 = 20/9, i = floor(r) = 2, and f = r - i = 2/9.
                    Therefore, the second quantile is (7/9)·X(2) + (2/9)·X(3) = 0.68.

    Continuing this process for j = 3, ..., 10 yields the following 10 quantiles:

             0.650, 0.680, 1.009, 1.547, 1.894, 2.374, 2.570, 2.886, 3.376, 5.440.

    These, paired with the ordered well 1 data, are plotted in Figure 2-9.
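The quantile computation in Box 2-20 can be sketched as follows (Python for illustration, again using the Box 2-1 convention r = (n - 1)·g + 1); the rounded output reproduces the 10 quantiles listed in the box:

```python
import math

def quantile(ordered, g):
    """Quantile of sorted data at fraction g (Box 2-1: r = (n-1)*g + 1)."""
    r = (len(ordered) - 1) * g + 1
    i, f = math.floor(r), r - math.floor(r)
    return ordered[i - 1] if f == 0 else (1 - f) * ordered[i - 1] + f * ordered[i]

well2 = sorted([0.65, 0.68, 0.68, 1.42, 1.61, 1.93, 2.36, 2.49, 2.73, 3.01, 3.48, 5.44])
m = 10                                                 # size of the smaller (well 1) sample
q = [quantile(well2, (j - 1) / (m - 1)) for j in range(1, m + 1)]
```

These m quantiles would then be paired with the ordered well 1 data to form the q-q plot.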
2.3.7  Plots for Temporal Data

       Data collected over specific time intervals (e.g., monthly, biweekly, or hourly) have a
temporal component. For example, air monitoring measurements of a pollutant may be collected
once a minute or once a day; water quality monitoring measurements of a contaminant level may
be collected weekly or monthly.  An analyst examining temporal data may be interested in the
trends over time, correlation among time periods, or cyclical patterns.  Some graphical
techniques specific to temporal data are the time plot, lag plot, correlogram, and variogram.

       A data sequence collected at regular time intervals is called a time series. Time series
data analysis is beyond the scope of this guidance. It is recommended that the interested reader
consult a statistician.  The graphical representations presented in this section are recommended
for any data set that includes a temporal component regardless of the decision to perform a time
series analysis.  The graphical techniques described below will help identify temporal patterns
that need to be accounted for in any analysis of the data.

       The analyst examining temporal environmental data may be interested in seasonal  trends,
directional trends,  serial correlation, or stationarity.  Seasonal trends are patterns in the data that
repeat over time, i.e., the data rise and fall regularly over one or more time periods.  Seasonal
trends may be large scale, such as a yearly cycle where the data show the same pattern of rising
and falling from year to year, or the trends may be small scale, such as a daily cycle. Directional
trends are positive or negative trends in the data, which are of importance in environmental
applications where contaminant levels may be increasing or decreasing.  Serial correlation is a
measure of the strength of the linear relationship of successive observations.  If successive
observations are related, statistical quantities calculated without accounting for the serial
correlation may be biased. Finally, another item of interest for temporal data is stationarity.
Stationary data look the same over all time periods. Directional trends or a
change in the variability in the data imply non-stationarity.

2.3.7.1    Time Plot

       A time plot (also known as a time series plot) is simply a plot of the data over time.  This
plot makes it easy to identify lack of randomness, changes in location, change in scale, small-
scale trends, or large-scale trends over time. Small-scale trends are displayed as fluctuations
over smaller time periods. For example, ozone levels over the course of one day typically rise
until the afternoon, then decrease, and this process is repeated every day. Larger scale trends,
such as  seasonal fluctuations, appear as regular rises and  drops in the graph. For example, ozone
levels tend to be higher in the summer than in the winter  so ozone data tend to show both a daily
trend and  a seasonal trend. A time plot can also show directional trends or changing variability
over time.
       A time plot (Figure 2-10) is constructed by plotting the measurements on the vertical
axis versus the actual time of observation or the order of observation on the horizontal axis.
The points plotted may be connected by lines, but this may create an unfounded sense of
continuity.  It is important to use the actual time or order at which the observation was made.
This can create gaps in the plot, but these gaps are needed: the data that should have been
collected now appear as "missing values" and do not disturb the integrity of the plot.  Plotting
the data at equally spaced intervals when in reality there were different time periods between
observations is not advised.

Figure 2-10. Example of a Time Plot

       The scaling of the vertical axis of a time plot is of some importance. A wider scale tends
to emphasize large-scale trends, whereas a narrower scale tends to emphasize small-scale trends.
 Using the ozone example above, a wide scale would emphasize the seasonal component of the
data, whereas a smaller scale would tend to emphasize the daily fluctuations. Directions for
constructing a time plot are contained in Box 2-21 along with an example.
                    Box 2-21:  Directions for Generating a Time Plot and an Example

  LetXi, X2, ..., Xn represent n data points listed in order by time, i.e., the subscript represents the ordered time
  interval. A plot of the pairs (/, X,) is a time plot of the data.

  Example: Consider the following 50 daily observations listed in order by day:

                    10.05,  11.22, 15.90, 11.15, 10.53, 13.33, 11.81, 14.78, 10.93, 10.31,

                     7.95,10.11,10.27,14.25,  8.60,  9.18,12.20,  9.52, 7.59,10.33,

                    12.13,11.31,10.13,  7.11,  6.72,  8.97,10.11,  7.72, 9.57,  6.23,

                     7.25,   8.89,  9.14,12.34,   9.99,11.26,  5.57,  9.55,  8.91,  7.11,

                     6.04,   8.67,  5.62,  5.99,   5.78,  8.66,  5.80,  6.90,  7.70,  8.87.

  A time plot is constructed by plotting the pairs (i, Xi) where i represents the number of the day and Xi represents
  the concentration level. The plot is shown in Figure 2-10. Note the slight negative trend.
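The slight negative trend noted above can also be checked numerically. The sketch below (Python, standard library only; not part of this guidance) fits an ordinary least-squares line to the (i, Xi) pairs from the example; a negative slope is consistent with the downward drift visible in the time plot.

```python
# Data from Box 2-21: 50 daily observations, in time order.
data = [10.05, 11.22, 15.90, 11.15, 10.53, 13.33, 11.81, 14.78, 10.93, 10.31,
        7.95, 10.11, 10.27, 14.25, 8.60, 9.18, 12.20, 9.52, 7.59, 10.33,
        12.13, 11.31, 10.13, 7.11, 6.72, 8.97, 10.11, 7.72, 9.57, 6.23,
        7.25, 8.89, 9.14, 12.34, 9.99, 11.26, 5.57, 9.55, 8.91, 7.11,
        6.04, 8.67, 5.62, 5.99, 5.78, 8.66, 5.80, 6.90, 7.70, 8.87]

def trend_slope(xs):
    """Ordinary least-squares slope of xs against the time index 1..n."""
    n = len(xs)
    t_mean = (n + 1) / 2
    x_mean = sum(xs) / n
    num = sum((t - t_mean) * (x - x_mean) for t, x in enumerate(xs, start=1))
    den = sum((t - t_mean) ** 2 for t in range(1, n + 1))
    return num / den

slope = trend_slope(data)
print(f"estimated slope per day: {slope:.4f}")  # negative -> downward trend
```

This is only a screening device; a formal trend test is still needed before drawing conclusions.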
2.3.7.2   Lag Plot

       A lag plot is another method to determine if the data set or time series is random.
Nonrandom structure in the lag plot implies nonrandomness in the data. Examples of
nonrandom structure are linear patterns or elliptical patterns.  A linear pattern implies the data
contain a directional trend while an elliptical pattern implies the data contain a seasonal
component.
       If we have data points X1, X2, ..., Xn, then a lag plot is a scatterplot of the points
(Xt, Xt-k) for some integer lag k, the most common being lags 1, 2, or 3. Figure 2-11 is a
lag-1 plot for the data in the example from Box 2-21. Notice that there is a slight linear
structure, suggesting a possible directional trend in the data. See Section 2.3.7.3 for
higher lags.
                                                   Figure 2-11. Example of a Lag Plot
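The lag structure can be examined numerically as well as visually: form the lag-k pairs and compute their Pearson correlation. A minimal sketch (Python, standard library only; an illustration, not part of this guidance) using the Box 2-21 data; a clearly positive lag-1 correlation matches the linear structure seen in Figure 2-11.

```python
import math

# Data from Box 2-21: 50 daily observations, in time order.
data = [10.05, 11.22, 15.90, 11.15, 10.53, 13.33, 11.81, 14.78, 10.93, 10.31,
        7.95, 10.11, 10.27, 14.25, 8.60, 9.18, 12.20, 9.52, 7.59, 10.33,
        12.13, 11.31, 10.13, 7.11, 6.72, 8.97, 10.11, 7.72, 9.57, 6.23,
        7.25, 8.89, 9.14, 12.34, 9.99, 11.26, 5.57, 9.55, 8.91, 7.11,
        6.04, 8.67, 5.62, 5.99, 5.78, 8.66, 5.80, 6.90, 7.70, 8.87]

def lag_pairs(xs, k=1):
    """The pairs (x_t, x_{t-k}) that a lag-k plot displays."""
    return [(xs[t], xs[t - k]) for t in range(k, len(xs))]

def pearson(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

r1 = pearson(lag_pairs(data, 1))
print(f"lag-1 correlation: {r1:.3f}")  # positive -> nonrandom structure
```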
2.3.7.3   Plot of the Autocorrelation Function
          (Correlogram)

       Serial correlation is a measure of the
strength of the relationship between successive
observations. If successive observations are
related, then the data must be transformed or the
relationship must be accounted for in the data
analysis.  The correlogram is a plot that is used to display serial correlations when the data are
collected  at equally spaced time intervals.  The autocorrelation function is a summary of the
serial correlations of data. The 1st sample autocorrelation coefficient, r1, is the correlation
between points at lag 1 (points that are 1 time unit apart); r2 is the correlation between points at
lag 2; etc. A correlogram (Figure 2-12) is a plot of the sample autocorrelation coefficients, rk,
versus k.
       For a large independent data sequence of n time points, the autocorrelations are
approximately normally distributed with mean zero and variance 1/n. Therefore, to determine if
the time points are independent, first plot the approximate 95% confidence lines ±2/√n (shown
as dashed lines in Figure 2-12) on the correlogram. If any of the autocorrelations lie outside the
confidence lines, then there is evidence of serial correlation and we conclude that the time
points are not independent.

Figure 2-12. Example of a Correlogram

       In examining Figure 2-12, there are 4 time points that lie outside the 95% confidence
bounds.  These are at lags 1, 2, 12 and 24. This demonstrates strong evidence that the sequence
is serially correlated (most likely containing a positive trend along with an annual component).

       In application, the correlogram is only useful for data collected at equally spaced
intervals; for irregular intervals, a text on the geostatistical use of the variogram is
recommended. Directions for
constructing a correlogram are contained in Box 2-22; example calculations are contained in
Box 2-23.  For large sample sizes, a correlogram is tedious to construct by hand; therefore,
statistical software should be used.
                        Box 2-22:  Directions for Constructing a Correlogram

  Let X1, X2, ..., Xn represent the data points ordered by time for equally spaced time points, i.e., X1 was collected
  at time 1, X2 was collected at time 2, and so on. To construct a correlogram, first compute the sample
  autocorrelation coefficients, rk. So for k = 0, 1, ..., compute

               rk = gk / g0 ,  where  gk = (1/n) Σ (Xt − X̄)(Xt−k − X̄), summing over t = k+1, ..., n.

  Once the rk have been computed, a correlogram is the graph (k, rk) for k = 0, 1, ..., and so on. As an
  approximation, compute up to k = n/6. Also, note that r0 = 1. Finally, place horizontal lines at ±2/√n.
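The directions above are easy to code. A minimal sketch (Python, standard library only; an illustration, not part of this guidance) that computes gk and rk and is checked against the four-point example worked in Box 2-23:

```python
import math

def autocorrelations(xs, max_lag=None):
    """Sample autocorrelation coefficients r_0..r_max_lag per Box 2-22:
    g_k = (1/n) * sum over t=k+1..n of (x_t - mean)(x_{t-k} - mean), r_k = g_k / g_0."""
    n = len(xs)
    if max_lag is None:
        max_lag = n // 6 or 1  # Box 2-22 guideline: compute up to k = n/6
    mean = sum(xs) / n
    g = [sum((xs[t] - mean) * (xs[t - k] - mean) for t in range(k, n)) / n
         for k in range(max_lag + 1)]
    return [gk / g[0] for gk in g]

# Four hourly points from Box 2-23; confidence lines fall at +/- 2/sqrt(n).
xs = [4.5, 3.5, 2.5, 1.5]
r = autocorrelations(xs, max_lag=3)
bound = 2 / math.sqrt(len(xs))
print(r)      # [1.0, 0.25, -0.3, -0.45]
print(bound)  # 1.0
```

For large sample sizes this is exactly the tedious hand computation that statistical software automates.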
2.3.7.4   Multiple Observations in a Time Period

       Many times in environmental data collection, multiple samples are taken at each time
point. For example, the data collection design may specify collecting 5 samples from a drinking
well every Wednesday for three months. In this case, the time plot described in Section 2.3.7.1
may be used to display the complete data set, display the mean weekly level, display a
confidence interval for each mean, or display a confidence interval for each mean with the
individual  data values. A time plot of all the data will allow the analyst to determine if the
variability  for the different collection periods changes. A time plot of the means will allow the
analyst to determine if the means are possibly changing between the collection periods. In
addition, each collection period may be treated as a distinct variable and the methods described
in Section 2.3.6 may be applied.
                      Box 2-23: Example Calculations for Generating a Correlogram

  A correlogram will be constructed using the following four hourly data points: hour 1: 4.5 ppm, hour 2: 3.5 ppm,
  hour 3: 2.5 ppm, and hour 4: 1.5 ppm. Only four data points are used so all computations may be shown for
  illustrative purposes. Therefore, the guideline of computing no more than n/6 autocorrelation coefficients will be
  ignored. The first step to constructing a correlogram is to compute the sample mean (Box 2-2), which is 3.
  Then,

  g0 = (1/4) Σ (xt − 3)² = (1/4)[(4.5 − 3)² + (3.5 − 3)² + (2.5 − 3)² + (1.5 − 3)²] = 1.25, summing over t = 1, ..., 4,

  g1 = (1/4) Σ (xt − X̄)(xt−1 − X̄) = (1/4)[(3.5 − 3)(4.5 − 3) + (2.5 − 3)(3.5 − 3) + (1.5 − 3)(2.5 − 3)] = 0.3125,
  summing over t = 2, ..., 4.

  Similarly, g2 = −0.375 and g3 = −0.5625.

  So r1 = g1/g0 = 0.3125/1.25 = 0.25,  r2 = g2/g0 = −0.375/1.25 = −0.3,  and  r3 = g3/g0 = −0.5625/1.25 = −0.45.

  Thus, the correlogram of these data is a plot of (0, 1), (1, 0.25), (2, −0.3) and (3, −0.45) with two confidence
  lines at ±2/√n = ±1. This graph is shown below.

  [Plot of (k, rk) for k = 0 to 3 (hours), with confidence lines at +1.0 and −1.0]
2.3.7.5   Four-Plot

       A four-plot is a collection of four specific plots that provide different visual methods
for illustrating a measurement process. There are four basic assumptions that underlie all
measurement processes: the data should be random, from a fixed distribution, with a fixed
location and a fixed variance. If all of these assumptions hold, then the measurement process or
data set is considered to be stable. The fixed distribution discussed here is a normal
distribution, but any probability distribution of interest could be used. A four-plot consists of
a time plot, a lag plot, a histogram, and a normal probability plot. The data are random if the
lag plot is structureless. The data are
from a normal distribution if the histogram is symmetric and bell-shaped and the normal
probability plot is linear. If the time plot is flat and non-drifting, then the data have a fixed
location. If the time plot has a constant spread, the data have a fixed variance. Figure 2-13 is a
four-plot for the data contained in the example of Box 2-21. In this particular example, note that
the data are not quite normal (deviations from the straight line on the probability plot), do not
have a fixed location (a downward trend in the time plot), and possibly exhibit serial correlation
(the tendency of the lag plot to increase from left to right).
     Figure 2-13. Example of a Four-Plot (time plot, lag plot, histogram, and normal Q-Q plot)
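The normality panel of a four-plot can also be summarized in a single number: the correlation between the ordered data and standard normal quantiles, which is the quantity a normal probability plot displays. A minimal sketch (Python, standard library only; `statistics.NormalDist` requires Python 3.8+; an illustration, not part of this guidance) applied to the Box 2-21 data — values near 1 indicate a nearly linear probability plot, while noticeably smaller values flag the mild non-normality noted above.

```python
import math
from statistics import NormalDist

# Data from Box 2-21: 50 daily observations.
data = [10.05, 11.22, 15.90, 11.15, 10.53, 13.33, 11.81, 14.78, 10.93, 10.31,
        7.95, 10.11, 10.27, 14.25, 8.60, 9.18, 12.20, 9.52, 7.59, 10.33,
        12.13, 11.31, 10.13, 7.11, 6.72, 8.97, 10.11, 7.72, 9.57, 6.23,
        7.25, 8.89, 9.14, 12.34, 9.99, 11.26, 5.57, 9.55, 8.91, 7.11,
        6.04, 8.67, 5.62, 5.99, 5.78, 8.66, 5.80, 6.90, 7.70, 8.87]

def normal_ppcc(xs):
    """Probability-plot correlation: ordered data vs. standard normal quantiles."""
    n = len(xs)
    ys = sorted(xs)
    # Plotting positions (i - 0.5)/n mapped through the inverse normal CDF.
    qs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    my, mq = sum(ys) / n, sum(qs) / n
    num = sum((y - my) * (q - mq) for y, q in zip(ys, qs))
    den = math.sqrt(sum((y - my) ** 2 for y in ys) *
                    sum((q - mq) ** 2 for q in qs))
    return num / den

print(f"normal PPCC: {normal_ppcc(data):.3f}")
```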

2.3.8  Plots for Spatial Data

       The graphical representations of the preceding sections may be useful for exploring
spatial data. However, an analyst examining spatial data may be interested in the location of
extreme values, overall spatial trends, and the degree of continuity among neighboring locations.
 Graphical representations for spatial data include postings, symbol plots, correlograms, h-scatter
plots, and contour plots.
       The graphical representations presented in this section are recommended for all spatial
data regardless of whether or not geostatistical methods will be used to analyze the data. The
graphical representations described below will help identify spatial patterns that need to be
accounted for in the analysis of the data.

2.3.8.1   Posting Plots

       A posting plot (Figure 2-14) is a map of
data locations along with the corresponding
data values.  Data posting may reveal obvious
errors in data location and identify data values
that may be in error.  The graph of the sampling
locations gives the analyst an idea of where the
data were collected (i.e., the sampling design),
areas that may have been inaccessible, and
areas of special  interest to the  decision maker
which may have been heavily  sampled. It is
often useful to mark the highest and lowest
values of the data to see if there are any obvious
trends. If all of the highest concentrations fall
in one region of the plot, the analyst may
consider some method such as post-stratifying
the data (stratification after the data are
collected and analyzed) to account for this fact in the analysis. Directions for generating a
posting plot are contained in Box 2-24.

Figure 2-14. Example of a Posting Plot
          Box 2-24: Directions for Generating a Posting Plot and a Symbol Plot with an Example

    On a map of the site, plot the location of each sample. At each location, either indicate the value of the
    data point (a posting plot) or indicate the data value by a symbol (a symbol plot) or circle (a bubble plot).
    The radii of the circles on a bubble plot are equal to the square roots of the data values.

    Example: The spatial data displayed in the table below contains both a location (Northing and Easting)
    and a concentration level ([c]). The chosen symbols are floor(concentration/5) and range from '0' to '7'.
    The data values with corresponding symbols are:
  N    E    [c]   Sym     N    E    [c]   Sym     N    E    [c]   Sym
  15   0    4.0    0      10   10   15.2   3      5    15   12.4   2
  15   5    11.6   2      10   15   35.5   7      5    20   22.8   4
  15   10   14.9   2      10   20   14.7   2      5    25   19.1   3
  15   15   17.4   3      10   25   16.5   3      0    10   10.2   2
  15   20   17.7   3      5    0    8.9    1      0    15   5.2    1
  15   25   12.4   2      5    5    14.7   2      0    20   4.9    0
  10   0    28.6   5      5    10   10.9   2      0    25   17.2   3
  10   5    7.7    1
    The posting plot of this data is displayed in Figure 2-14, the symbol plot is displayed in Figure 2-15, and the
    bubble plot is displayed in Figure 2-16.
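The symbol and bubble assignments in Box 2-24 are mechanical and easy to automate. A minimal sketch (Python; an illustration, not part of this guidance) that maps each concentration to its bin label floor(concentration/5) and its bubble radius, checked against a few of the tabled values:

```python
import math

def symbol(concentration, bin_width=5):
    """Bin label for a symbol plot: floor(concentration / bin_width)."""
    return math.floor(concentration / bin_width)

def bubble_radius(concentration):
    """Bubble-plot radius: the square root of the data value."""
    return math.sqrt(concentration)

# A few (Northing, Easting, concentration) records from Box 2-24.
records = [(15, 0, 4.0), (10, 0, 28.6), (10, 15, 35.5), (0, 20, 4.9)]
for n, e, c in records:
    print(n, e, c, symbol(c), round(bubble_radius(c), 2))
```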
2.3.8.2   Symbol Plots and Bubble Plots

       For large amounts of data, a posting plot may be visually unappealing. In this case, a
symbol plot (Figure 2-15) or bubble plot (Figure 2-16) can be used. Rather than plotting the
actual data values, symbols or bubbles that represent the data values are displayed.  A symbol
plot breaks the data range into bins and plots a bin label corresponding to the data value.  A
bubble plot consists of circles, with radii equal to the square roots of the data values. So unlike
posting and symbol plots, a bubble plot gives a visual impression of the size of the data values.
Directions for generating a symbol or bubble plot are contained in Box 2-24.
Figure 2-15. Example of a Symbol Plot   Figure 2-16. Example of a Bubble Plot

2.3.8.3   Other Spatial Graphical Representations

       The three plots described in Sections 2.3.8.1 and 2.3.8.2 give information on the location
of extreme values and spatial trends. The graphs below provide another item of interest to the
data analyst, continuity of the spatial data. The graphical representations are not described in
detail because they are used more for preliminary geostatistical analysis.  These graphical
representations can be difficult to develop and interpret. For more information on these
representations, consult a statistician.

       An h-scatterplot is a plot of all possible pairs of data whose locations are separated by a
fixed distance in a fixed direction (indexed by h).  For example, an h-scatterplot could be based
on all the pairs whose locations are 1 meter apart in a southerly direction.  An h-scatterplot is
similar in appearance to a scatterplot (Section 2.3.6.1).  The shape of the spread of the data in an
h-scatterplot indicates the degree of continuity among data values a certain distance apart in a
particular direction. If all the plotted values fall close to a fixed line, then the data values at
locations separated by a fixed distance in a fixed direction are very similar. As data values
become less and less similar, the spread of the data around the fixed line increases. The
data analyst may construct several h-scatterplots with different distances to evaluate the change
in continuity in a fixed direction.
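Constructing the pairs behind an h-scatterplot is straightforward for gridded data. A minimal sketch (Python; an illustration using a subset of the Box 2-24 locations, not part of this guidance) that collects all pairs of values whose locations are separated by a fixed (northing, easting) offset — here 5 units in the easting direction:

```python
def h_scatter_pairs(points, dn, de):
    """Pairs of data values at locations separated by the vector (dn, de).
    `points` maps (northing, easting) -> concentration."""
    pairs = []
    for (n, e), v in points.items():
        other = points.get((n + dn, e + de))
        if other is not None:
            pairs.append((v, other))
    return pairs

# A subset of the locations and concentrations from Box 2-24.
points = {(15, 0): 4.0, (15, 5): 11.6, (15, 10): 14.9,
          (10, 0): 28.6, (10, 5): 7.7, (10, 10): 15.2}
print(h_scatter_pairs(points, 0, 5))
```

Scatterplotting these pairs for several offsets shows how continuity decays with separation distance in that direction.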
       A spatial correlogram is a plot of the correlations of the h-scatterplots. As the
h-scatterplot only displays the correlation between the pairs of data whose locations are
separated by a fixed distance in a fixed direction, it is useful to have a graphical representation of
how these correlations change for different separation distances in a fixed direction.  The spatial
correlogram is such a plot which allows the analyst to evaluate the change in continuity in a
fixed direction as a function of the distance between two points. A spatial correlogram is similar
in appearance to a temporal correlogram (Section 2.3.7.3).

       Contour plots are used to reveal overall spatial trends in the data by interpolating data
values between sample locations. Most contour procedures depend on the density of the grid
covering the sampling area (higher density grids usually provide more information than lower
densities). A contour plot gives one of the best overall pictures of the important spatial features.
 However, contouring often requires that the actual fluctuations in the data values are smoothed
so that many spatial features of the data may not be visible. The contour map should be used
with other graphical representations  of the data and requires expert judgment to interpret.

2.4     PROBABILITY DISTRIBUTIONS
2.4.1   The Normal Distribution

       Data, especially measurements, often occur in
natural patterns that can be considered to come from a
distribution of values. In most instances, the data
values will be grouped around some measure of central
tendency such as the mean or median.  The spread of
the data is called the variance (the square root of this is
called the standard deviation). A distribution with a
large variance will be more spread out than one with a
small variance (Figure 2-17). If a histogram of the data has a bell shape (a symmetric pattern
about the mean with rapidly tapering tails), then the underlying distribution is often a normal
distribution.

           Figure 2-17. Two Normal Curves, Common Mean, Different Variances
           Figure 2-18. The Standard Normal Curve, Centered on Zero

       If it is known or assumed that the underlying distribution is normal, then this is usually
written as 'distributed N(μ, σ²)' where μ is the mean and σ² is the variance. By subtracting μ
and dividing by σ, any normal distribution can be transformed to a standard normal distribution,
N(0,1). A standard normal random variable is most often denoted by Z. A plot of the standard
normal is given in Figure 2-18. It is frequently necessary to refer to the percentiles of a
standard normal and
in this guidance document, the subscript to a quoted Z-value will denote the percentile (or area
under the curve, cumulative from the left); see Figure 2-18 (showing the area 0.8413
corresponding to a Z-value of 1.00, written as Z0.8413 = 1.00). Although it is common practice
to use this subscript notation, some text tables and software programs use a different
nomenclature; the user is advised to verify the meaning of any statistic encountered.
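The Z-value notation above can be checked with any normal-CDF routine, for instance Python's standard-library `statistics.NormalDist` (a sketch; the μ, σ, and x values in the standardization example are arbitrary illustrations):

```python
from statistics import NormalDist

std_normal = NormalDist()  # the standard normal, N(0, 1)

# Area to the left of Z = 1.00 is 0.8413, i.e. Z_0.8413 = 1.00.
area = std_normal.cdf(1.00)
print(round(area, 4))  # 0.8413

# Standardizing: any N(mu, sigma^2) value maps to Z = (x - mu) / sigma.
mu, sigma = 10.0, 2.0
x = 12.0
z = (x - mu) / sigma
print(z)  # 1.0
```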

2.4.2   The t-Distribution

       The standard normal curve is used when exact information on the mean and variance is
available, but when only estimates from a sample are available, a different type of sampling
distribution applies. When only information from a random sample on the sample mean and
sample variance is known for decision-making purposes, a Student's t-distribution is appropriate.
It resembles a standard normal but is lower in the center and fatter in the tails. The degree of
fatness in the tails is a function of the degrees of freedom available, which in turn is related to
sample size. As the sample size increases, the estimates of the mean and variance improve. As a
result, the Student's t-distribution more closely resembles the standard normal distribution.

2.4.3   The Lognormal Distribution

       Another commonly used distribution in environmental work is the lognormal distribution.
The lognormal distribution is bounded on the left by 0, has a fatter right tail than the normal
distribution, and has a right-skewed shape. These characteristics are shown in Figure 2-19. The
lognormal and normal distributions are related by a simple transformation: if X is distributed
lognormally, then Y = ln(X) is distributed normally. However, log-transforming lognormal data
to perform statistical procedures requiring normal data is a practice that should be done with
care.

                                                   Figure 2-19. Three Different Lognormal
                                                                Distributions

2.4.4   Central Limit Theorem

       In many testing and estimation situations in environmental work, the focus of the
investigation centers on the mean of a population. It is rare that true normality of the
observations can be  assumed.  In most cases, the normally-based statistical tests are not overly
affected by the lack  of normality since tests are very robust and perform tolerably well unless
gross non-normality is present. In addition, many tests become increasingly tolerant of
deviations from normality as the number of observations increases. In simple terms, the
underlying distribution of the sample mean more closely resembles a normal distribution as the
number of observations increases. This occurs no matter what the underlying distribution of an
individual observation is. This phenomenon is called the Central Limit Theorem and is the basis of
many statistical procedures.
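A small simulation illustrates the Central Limit Theorem in action. The sketch below (Python, standard library only; the seed, sample size, and replication count are arbitrary choices, not values from this guidance) draws repeated samples from a right-skewed lognormal population and shows that the sample means cluster around the population mean, exp(σ²/2) for a lognormal with ln-mean 0 and ln-standard-deviation σ:

```python
import math
import random

random.seed(1)  # arbitrary seed so the sketch is reproducible

n, reps = 30, 2000
# Population: lognormal with ln-mean 0 and ln-sd 1; its mean is exp(0.5).
pop_mean = math.exp(0.5)

sample_means = [sum(random.lognormvariate(0, 1) for _ in range(n)) / n
                for _ in range(reps)]

grand_mean = sum(sample_means) / reps
print(f"population mean: {pop_mean:.3f}")
print(f"average of {reps} sample means: {grand_mean:.3f}")
```

A histogram of `sample_means` would look far more bell-shaped than the skewed population the observations came from, which is exactly the behavior the theorem describes.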
                                            CHAPTER 3
                      STEP 3: SELECT THE STATISTICAL METHOD
THE DATA QUALITY ASSESSMENT PROCESS
             Review DQOs and Sampling Design
             Conduct Preliminary Data Review
               Select the Statistical Method
                 Verify the Assumptions
             Draw Conclusions from the Data
          SELECT THE STATISTICAL METHOD

        Purpose

        Select an appropriate procedure for analyzing
        data based on the preliminary data review.


        Activities

        • Select Statistical Method
        • Identify Assumptions Underlying Test


        Tools

        • Hypothesis tests for a single population
        • Confidence  Intervals for a single population
        • Hypothesis tests for comparing two populations
        • Confidence  Intervals for comparing two populations
                                  Step 3: Select the Statistical Method

         •    Select the statistical method based on the data user's objectives and the results of the preliminary
             data review.
             •   If the problem involves comparing study results to a fixed threshold, such as a regulatory
                 standard, consider the methods in Section 3.2.
             •   If the problem involves comparing two populations, such as comparing data from two
                 different locations or processes, then consider the hypothesis tests in Section 3.3.

         •    Identify the assumptions underlying the statistical method.
             •   List the key underlying assumptions of the statistical procedure, such as distributional form,
                 dispersion, independence, etc.
             •   Note any sensitive assumptions where relatively small deviations could jeopardize the
                 validity of the results.
                                    List of Boxes

                                                                              Page
Box 3-1:  Directions for the One-Sample t-Test	49
Box 3-2:  Directions for Computing a One-Sample t Confidence Interval or Limit	49
Box 3-3:  A One-Sample t-Test Example	50
Box 3-4:  An Example of a One-Sample t Upper Confidence Limit for a Population Mean	50
Box 3-5:  Directions for Computing a One-Sample Tolerance Interval or Limit	51
Box 3-6:  An Example of a One-Sample Upper Tolerance Limit	51
Box 3-7:  Directions for a One-Sample t-Test for a Stratified Random Sample	54
Box 3-8:  An Example of a One-Sample t-Test for a Stratified Random Sample	55
Box 3-9:  Directions for the Chen Test	56
Box 3-10: Example of the Chen Test	57
Box 3-11: Directions for Computing Confidence Limits for the Population
          Mean of a Lognormal Distribution Using Land's Method	58
Box 3-12: An Example Using Land's Method	58
Box 3-13: Directions for the One-Sample Test for Proportions	59
Box 3-14: Directions for Computing a Confidence Interval for a Population Proportion	59
Box 3-15: An Example of the One-Sample Test for Proportions	60
Box 3-16: Directions for the Sign Test (One-Sample)	62
Box 3-17: An Example of the Sign Test (One-Sample)	63
Box 3-18: Directions for the Wilcoxon Signed Rank Test (One-Sample)	64
Box 3-19: An Example of the Wilcoxon Signed Rank Test (One-Sample)	65
Box 3-20: Directions for the Two-Sample t-Test (Equal Variances)	67
Box 3-21: Directions for a Two-Sample t Confidence Interval (Equal Variances)	68
Box 3-22: An Example of a Two-Sample t-Test (Equal Variances)	68
Box 3-23: Directions for the Two-Sample t-Test (Unequal Variances)	69
Box 3-24: Directions for a Two-Sample t Confidence Interval (Unequal Variances)	70
Box 3-25: An Example of the Two-Sample t-Test (Unequal Variances)	70
Box 3-26: Directions for a Two-Sample Test for Proportions	72
Box 3-27: Directions for Computing a Confidence Interval for the Difference Between
          Population Proportions	73
Box 3-28: An Example of a Two-Sample Test for Proportions	73
Box 3-29: Directions for the Paired t-Test	74
Box 3-30: Directions for Computing the Paired t Confidence Interval	74
Box 3-31: An Example of the Paired t-Test	75
Box 3-32: Directions for the Wilcoxon Rank Sum Test	77
Box 3-33: An Example of the Wilcoxon Rank Sum Test	78
Box 3-34: A Large Sample Example of the Wilcoxon Rank Sum Test	79
Box 3-35: Directions for the Quantile Test	80
Box 3-36: An Example of the Quantile Test	80
Box 3-37: Directions for the Slippage Test	81
Box 3-38: An Example of the Slippage Test	82
Box 3-39: Directions for the Sign Test (Paired Samples)	83
                                                                               Page
Box 3-40:  An Example of the Sign Test (Paired Samples)	84
Box 3-41:  Directions for the Wilcoxon Signed Rank Test (Paired Samples)	85
Box 3-42:  An Example of the Wilcoxon Signed Rank Test (Paired Samples)	86
Box 3-43:  A Large Sample Example of the Wilcoxon Signed Rank Test (Paired Samples)	87
Box 3-44:  Directions for Dunnett's Test	88
Box 3-45:  An Example of Dunnett's Test	89
Box 3-46:  Directions for the Fligner-Wolfe Test	91
Box 3-47:  An Example of the Fligner-Wolfe Test	92
                                      CHAPTER 3

                   STEP 3: SELECT THE STATISTICAL METHOD

3.1    OVERVIEW AND ACTIVITIES

       This chapter provides an overview of issues associated with selecting an appropriate
statistical method that will be used to draw conclusions from the data. There are two important
outputs from this step: (1) the chosen method, and (2) the assumptions underlying the method. If
a particular statistical procedure has been specified either in the DQO Process, the QA Project
Plan, or the particular program or study, the analyst should use the results of the preliminary data
review to determine if it is appropriate for the data collected. If a particular procedure has not
been specified, then the analyst should select one based upon the data user's objectives and the
preliminary data review.

       One division in the methods of this section is between parametric and nonparametric
hypothesis tests. Parametric tests typically concern the population mean or quantile, use the
actual data values, and assume data values follow a specific probability distribution.
Nonparametric tests typically concern the population mean or median, use data ranks, and do
not assume a specific probability distribution. Parametric tests will have more power than a
nonparametric counterpart if the assumptions are met. However, the distributional assumptions
are often strict or undesirable for the parametric tests and deviations can lead to misleading
results.

3.2    METHODS FOR A SINGLE POPULATION

       The methods of this section concern comparing a single population parameter to a
regulatory value (i.e., a fixed number) or the estimation of the population parameter. If the
regulatory or action value was estimated, then a one-sample method is not appropriate and a
two-sample test should be selected.

       An example of a one-sample  test would be to determine if 95% of all companies emitting
sulfur dioxide into the air are below a fixed discharge level. For this example, the population
parameter is a proportion and the threshold value is 95% (0.95). Comparing the mean
contaminant concentration of a hazardous site to the mean concentration of a background area
would be considered a two-sample test.

       The hypothesis tests discussed in this section may be used to determine if there is
evidence that θ < θ0, θ > θ0, or θ ≠ θ0, where θ represents the population mean, median,
proportion, or quantile, and θ0 represents the threshold value. There are also
confidence/tolerance interval procedures to estimate θ. Section 3.2.1 discusses parametric
hypothesis tests and confidence/tolerance intervals for a population mean or a population
proportion. Section 3.2.2 discusses nonparametric hypothesis tests for the population median or
population mean.
Decision Tree for Selecting the Specific Method

One-Sample
                      Test                               Population       Distributional    Section
                                                         Parameter        Assumption
   Parametric         t-Test and CI                      Mean             Normal            3.2.1.1
                      Stratified t-Test                  Mean             Normal            3.2.1.3
                      Chen Test                          Mean             Right-Skewed      3.2.1.4
                      Land's CI Method                   Mean             Lognormal         3.2.1.5
                      Test for a Proportion and CI       Proportion                         3.2.1.6
   Nonparametric      Sign Test                          Median           None              3.2.2.1
                      Wilcoxon Signed Ranks Test         Median/Mean      Symmetric         3.2.2.2

Two-Sample
   Parametric         t-Test and CI (equal variances)    Diff in Means    Normal            3.3.1.1.1
                      t-Test and CI (unequal variances)  Diff in Means    Normal            3.3.1.1.2
                      Test for Proportions and CI        Diff in Props                      3.3.1.1.3
                      Paired t-Test                      Diff in Means    Normal            3.3.1.2.1
   Nonparametric      Wilcoxon Rank Sum Test             Diff in Means    Same Variance     3.3.2.1.1
                      Quantile Test                      Right-Tail                         3.3.2.1.2
                      Slippage Test                      Right-Tail                         3.3.2.1.3
                      Sign Test                          Median           None              3.3.2.2.1
                      Wilcoxon Signed Ranks Test         Median/Mean      Symmetric         3.3.2.2.2

Multiple-Sample
   Parametric         Dunnett's Test                     Mean                               3.4.1.1
   Nonparametric      Kruskal-Wallis Test                Mean                               3.4.1.2
3.2.1   Parametric Methods

       These methods rely on knowing the specific distribution of the population or of the
statistic of interest.

3.2.1.1   The One-Sample t-Test and Confidence Interval or Limit

Purpose:  Test for a difference between a population mean and a fixed threshold or to estimate a
population mean.

Data: A simple or systematic random sample, x1, ..., xn, from the population of interest.

Assumptions:  The data are independent and come from an approximately normal distribution or
the sample size is large (n > 30).

Limitations and Robustness: One-sample t methods are robust against the population distribution
deviating moderately from normality. However, they are not robust against outliers and have
difficulty dealing with non-detects. The nonparametric methods of Section 3.2.2 are an
alternative. The substitution or adjustment procedures of Chapter 4 can be used for non-detects.

Directions for the one-sample t-test are contained in Box 3-1, with an example in Box 3-3.
Directions for the one-sample confidence interval are contained in Box 3-2, with an example in
Box 3-4.

3.2.1.2   The One-Sample Tolerance Interval or Limit

Purpose:  A tolerance interval specifies a region that contains a certain proportion of the
population with a certain confidence.  For example, "the 99% tolerance interval  for 90% of the
population is 7.5 to 9.9", is interpreted as, "I can be 99% certain that the interval 7.5 to 9.9
captures 90% of the population."

Data: A simple or systematic random sample, x1, ..., xn, from the population of interest. The
sample may or may not involve compositing.

Assumptions:  The data are independent and approximately normally distributed.

Limitations and Robustness:  Tolerance intervals are robust against the population distribution
deviating moderately from normality. However, they are not robust against outliers.

Directions for the one-sample tolerance interval are contained in Box 3-5, with an example  in
Box 3-6.
                              Box 3-1: Directions for the One-Sample t-Test

  COMPUTATIONS: Compute the mean, X̄, and standard deviation, s, of the data set.

  STEP 1. Null Hypothesis:          H0: μ = C

  STEP 2. Alternative Hypothesis:   i)   HA: μ > C (upper-tail test)
                                    ii)  HA: μ < C (lower-tail test)
                                    iii) HA: μ ≠ C (two-tail test)

  STEP 3. Test Statistic:           t0 = (X̄ − C) / (s/√n)

  STEP 4. a) Critical Value:        Use Table A-2 to find:
                                    i)   t_{n-1, 1-α}
                                    ii)  −t_{n-1, 1-α}
                                    iii) t_{n-1, 1-α/2}

  STEP 4. b) p-value:               Use Table A-2 to find:
                                    i)   P(t_{n-1} > t0)
                                    ii)  P(t_{n-1} < t0)
                                    iii) 2·P(t_{n-1} > |t0|), where |t0| is the absolute value of t0.

  STEP 5. a) Conclusion:            i)   If t0 > t_{n-1, 1-α}, then reject the null hypothesis that the true
                                         population mean is equal to the threshold C.
                                    ii)  If t0 < −t_{n-1, 1-α}, then reject the null hypothesis.
                                    iii) If |t0| > t_{n-1, 1-α/2}, then reject the null hypothesis.

  STEP 5. b) Conclusion:            If p-value < α, then reject the null hypothesis that the true population
                                    mean is equal to the threshold C.

  STEP 6. If the null hypothesis was not rejected, there is only one false acceptance error rate (β at μ1), and if
          n ≥ s²(z_{1-β} + z_{1-α'})² / (μ1 − C)² + (1/2)z²_{1-α'}, then the sample size was probably large
          enough to achieve the DQOs. The value of α' is α for a one-sided test and α/2 for a two-sided test.
                Box 3-2: Directions for Computing a One-Sample t Confidence Interval or Limit

 COMPUTATIONS: Compute the sample mean, X̄, and sample standard deviation, s, of the data set.

 A 100(1 − α)% confidence interval for μ is X̄ ± t_{n−1,1−α/2}·s/√n, where Table A-2 is used to find t_{n−1,1−α/2}.

 A 100(1 − α)% upper confidence limit (UCL) for μ is X̄ + t_{n−1,1−α}·s/√n.

 A 100(1 − α)% lower confidence limit (LCL) for μ is X̄ − t_{n−1,1−α}·s/√n.
                                  Box 3-3: A One-Sample t-Test Example

 Consider the following 9 random data values (in ppm):

                       82.39, 103.46, 104.93, 105.52, 98.37, 113.23, 86.62, 91.72, 108.21.

 These data will be used to test H0: μ ≤ 95 ppm vs. HA: μ > 95 ppm. The decision maker has specified a 5% false
 rejection error rate (α) at 95 ppm (C), and a 20% false acceptance error rate (β) at 105 ppm (μ1).

 COMPUTATIONS: The mean is X̄ = 99.38 ppm and the standard deviation is s = 10.41 ppm.

 STEP 1. Null Hypothesis:          H0: μ ≤ 95

 STEP 2. Alternative Hypothesis:   HA: μ > 95 (upper-tail test)

 STEP 3. Test Statistic:           t0 = (X̄ − C)/(s/√n) = (99.38 − 95)/(10.41/√9) = 1.26

 STEP 4. a) Critical Value:        Using Table A-2, t_{n−1,1−α} = t_{8,0.95} = 1.86

 STEP 4. b) p-value:               Using Table A-2, 0.10 < p-value < 0.15. Using statistical software,
                                   p-value = P(t_{n−1} > t0) = P(t8 > 1.26) = 0.1216.

 STEP 5. a) Conclusion:            Since 1.26 < 1.86, we fail to reject the null hypothesis that the true
                                   population mean is at most 95 ppm.

 STEP 5. b) Conclusion:            Since p-value = 0.1216 > 0.05 = significance level, we fail to reject the
                                   null hypothesis that the true population mean is at most 95 ppm.

 STEP 6. Since the null hypothesis was not rejected and there is only one false acceptance error rate, it is
         possible to use the sample size formula to determine if the error rate has been satisfied. Since

                 n = 9 ≥ s²(z_{1−α} + z_{1−β})²/(μ1 − C)² + z²_{1−α}/2
                       = 10.41²·(1.645 + 0.842)²/(95 − 105)² + 1.645²/2 = 8.06,

         the false acceptance error rate has probably been satisfied.
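The Box 3-3 calculation can be reproduced with off-the-shelf software; as a sketch, SciPy's `scipy.stats.ttest_1samp` performs the same upper-tail test on the example data:

```python
import numpy as np
from scipy import stats

# Data from Box 3-3 (ppm); threshold C = 95 ppm, upper-tail test.
x = np.array([82.39, 103.46, 104.93, 105.52, 98.37,
              113.23, 86.62, 91.72, 108.21])
C = 95.0

t0, pval = stats.ttest_1samp(x, popmean=C, alternative='greater')
crit = stats.t.ppf(0.95, df=len(x) - 1)   # t_{8,0.95}, the Table A-2 value

# t0 is about 1.26 and the p-value about 0.12, so H0 is not rejected.
print(t0, pval, crit)
```

Since t0 < crit (and p-value > 0.05), the software reproduces the "fail to reject" conclusion of Box 3-3.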
           Box 3-4: An Example of a One-Sample t Upper Confidence Limit for a Population Mean

 The effluent from a discharge point in a plating manufacturing plant was sampled 7 times over the course of 4
 days for the presence of Arsenic with the following results (in ppm): 8.1, 7.9, 7.9, 8.2, 8.2, 8.0, 7.9. A 95% upper
 confidence limit for the population mean will be computed.

 COMPUTATIONS: The sample mean is X̄ = 8.03 ppm and the sample standard deviation is s = 0.138 ppm.

 A 95% upper confidence limit for μ is X̄ + t_{n−1,1−α}·s/√n, i.e., 8.03 + 1.943·0.138/√7, or 8.131 ppm.
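The upper confidence limit of Boxes 3-2 and 3-4 takes only a few lines; this sketch uses the example data and matches the hand calculation up to rounding:

```python
import numpy as np
from scipy import stats

x = np.array([8.1, 7.9, 7.9, 8.2, 8.2, 8.0, 7.9])   # Arsenic (ppm), Box 3-4
n, alpha = len(x), 0.05

# UCL = X-bar + t_{n-1, 1-alpha} * s / sqrt(n)
ucl = x.mean() + stats.t.ppf(1 - alpha, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
print(ucl)   # about 8.13 ppm
```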
EPA QA/G-9S                                     50                                     February 2006

-------
              Box 3-5: Directions for Computing a One-Sample Tolerance Interval or Limit

 COMPUTATIONS: Compute the sample mean, X̄, and sample standard deviation, s, of the data set.

 A 100(1 − α)% tolerance interval for (1 − p)·100% of the population is X̄ ± k2·s, where

         k2 = z_{1−p/2}·√( (n² − 1) / (n·χ²_{n−1,α}) ).

 Table A-1 is used to find z_{1−p/2} and Table A-9 is used to find χ²_{n−1,α}.

 A 100(1 − α)% upper tolerance limit for (1 − p)·100% of the population is X̄ + k1·s, and
 a 100(1 − α)% lower tolerance limit for (1 − p)·100% of the population is X̄ − k1·s, where

         k1 = (z_{1−p} + √(z²_{1−p} − a·b))/a,  a = 1 − z²_{1−α}/(2(n − 1)),  and  b = z²_{1−p} − z²_{1−α}/n.
                     Box 3-6: An Example of a One-Sample Upper Tolerance Limit

 The effluent from a discharge point in a plating manufacturing plant was sampled 7 times over the course of 4
 days for the presence of Arsenic with the following results (in ppm): 8.1, 7.9, 7.9, 8.2, 8.2, 8.0, 7.9. A 95% upper
 tolerance limit for 90% of the population will be computed.

 COMPUTATIONS: The sample mean is X̄ = 8.03 ppm and the sample standard deviation is s = 0.138 ppm.
 Using z_{0.90} = 1.28 and z_{0.95} = 1.65,

         a = 1 − 1.65²/(2·6) = 0.7731,  b = 1.28² − 1.65²/7 = 1.2495,  and
         k1 = (1.28 + √(1.28² − 0.7731·1.2495))/0.7731 = 2.716.

 A 95% upper tolerance limit for 90% of the population is X̄ + k1·s, i.e., 8.03 + 2.716·0.138, or 8.405 ppm.
 So we are 95% confident that at least 90% of the population is less than 8.405 ppm.
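The one-sided tolerance factor can be computed directly from the k1 formula; this sketch uses exact normal quantiles rather than the rounded values 1.28 and 1.65 of Box 3-6, so the results differ slightly in the third decimal:

```python
import numpy as np
from scipy import stats

x = np.array([8.1, 7.9, 7.9, 8.2, 8.2, 8.0, 7.9])   # Arsenic (ppm), Box 3-6
n = len(x)
z_conf = stats.norm.ppf(0.95)   # 95% confidence
z_p = stats.norm.ppf(0.90)      # coverage of 90% of the population

# One-sided tolerance factor k1 from Box 3-5
a = 1 - z_conf**2 / (2 * (n - 1))
b = z_p**2 - z_conf**2 / n
k1 = (z_p + np.sqrt(z_p**2 - a * b)) / a

utl = x.mean() + k1 * x.std(ddof=1)   # upper tolerance limit (ppm)
print(k1, utl)
```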
3.2.1.3   Stratified Random Sampling

       This section provides a brief introduction to stratified random sampling and directions for
a one-sample t-test with stratified data.  For a more in-depth discussion of stratified random
sampling, see Guidance on Choosing a Sampling Design for Environmental Data Collection,
EPA QA/G-5S (U.S. EPA 2002).

       A stratified random sample occurs when a population is split into subpopulations (called
strata) and simple random samples are taken from the subpopulations. Reasons for
implementing stratified random sampling include administrative convenience and a gain in the
precision of the estimates if a heterogeneous population can be split into homogeneous
subpopulations.
       Suppose the cost of taking a sample is C = c0 + Σ c_h·n_h, where c0 is the overhead cost, c_h is
the cost of sampling in stratum h, and n_h is the sample size taken from stratum h.  Then the
variance of the sample mean is minimized for a given cost C, and the cost is minimized for a
specified variance, if n_h is proportional to W_h·s_h/√c_h.
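Optimum allocation takes n_h proportional to W_h·s_h/√c_h. As a minimal sketch, the stratum weights and standard deviations below reuse the Box 3-8 values, while the per-sample costs and the total sample size are hypothetical, chosen only for illustration:

```python
import numpy as np

W = np.array([0.25, 0.75])   # stratum weights (from Box 3-8)
s = np.array([18.2, 20.5])   # stratum standard deviations (ppm)
c = np.array([1.0, 4.0])     # hypothetical per-sample costs
n_total = 100                # hypothetical total sample size

# n_h proportional to W_h * s_h / sqrt(c_h)
alloc = W * s / np.sqrt(c)
n_h = np.round(n_total * alloc / alloc.sum()).astype(int)
print(n_h)
```

The cheaper, smaller stratum still receives fewer samples here because its weight and variability are lower.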
3.2.1.4   The Chen Test

Assumptions: A simple or systematic random sample, x1,...,xn, from the population of interest,
which is believed to be right-skewed. This can be verified by inspection of a histogram of the
data or by a sample skewness that is greater than one.
                Box 3-7: Directions for a One-Sample t-Test for a Stratified Random Sample

 COMPUTATIONS: Calculate the stratum weights, W_h = V_h / Σ_{h=1}^L V_h, where V_h is the surface area of
 stratum h multiplied by the depth of sampling in stratum h.  For each stratum, calculate the sample stratum
 mean and the sample stratum standard deviation,

         X̄_h = (1/n_h)·Σ_{i=1}^{n_h} x_{hi}   and   s_h = √( Σ_{i=1}^{n_h} (x_{hi} − X̄_h)² / (n_h − 1) ).

 Compute the overall sample mean and the sample variance of the mean,

         X̄_st = Σ_{h=1}^L W_h·X̄_h   and   s²_st = Σ_{h=1}^L W_h²·s_h²/n_h.

 Finally, calculate the approximate degrees of freedom (round up to the next integer),

         df = (s²_st)² / Σ_{h=1}^L [ W_h⁴·s_h⁴ / (n_h²·(n_h − 1)) ].

 STEP 1. Null Hypothesis:          H0: μ = C

 STEP 2. Alternative Hypothesis:   i)   HA: μ > C  (upper-tail test)
                                   ii)  HA: μ < C  (lower-tail test)
                                   iii) HA: μ ≠ C  (two-tail test)

 STEP 3. Test Statistic:           t0 = (X̄_st − C)/s_st

 STEP 4. a) Critical Value:        Use Table A-2 to find:
                                   i)   t_{df,1−α}
                                   ii)  −t_{df,1−α}
                                   iii) t_{df,1−α/2}

 STEP 4. b) p-value:               Use Table A-2 to find:
                                   i)   P(t_df > t0)
                                   ii)  P(t_df < t0)
                                   iii) 2·P(t_df > |t0|)

 STEP 5. a) Conclusion:            i)   If t0 > t_{df,1−α}, then reject the null hypothesis that the true
                                        population mean is equal to the threshold C.
                                   ii)  If t0 < −t_{df,1−α}, then reject the null hypothesis.
                                   iii) If |t0| > t_{df,1−α/2}, then reject the null hypothesis.

 STEP 5. b) Conclusion:            If p-value < α, then reject the null hypothesis that the true population
                                   mean is equal to the threshold C.

 STEP 6. If the null hypothesis was not rejected, there is only one false acceptance error rate (β at μ1), and if
         n ≥ s²(z_{1−α′} + z_{1−β})²/(μ1 − C)² + z²_{1−α′}/2, then the sample size was probably large enough
         to achieve the DQOs. The value of α′ is α for a one-sided test and α/2 for a two-sided test.
               Box 3-8: An Example of a One-Sample t-Test for a Stratified Random Sample

 Consider a stratified random sample consisting of two strata, where stratum 1 comprises 25% of the total site
 surface area and stratum 2 comprises the other 75%. Suppose 40 samples were collected from stratum 1, and
 60 samples were collected from stratum 2. This information will be used to test the null hypothesis that the
 overall site mean is 40 ppm versus the lower-tail alternative. The decision maker has specified a 1% false
 rejection decision limit (α) at 40 ppm and a 20% false acceptance decision error limit (β) at 35 ppm (μ1).

 COMPUTATIONS: The stratum weights are W1 = 0.25, W2 = 0.75. For stratum 1, the sample mean is 31 ppm
 and the sample standard deviation is 18.2 ppm. For stratum 2, the sample mean is 35 ppm, and the sample
 standard deviation is 20.5 ppm.

 The sample overall mean concentration is X̄_st = Σ_{h=1}^L W_h·X̄_h = 0.25·31 + 0.75·35 = 34, and the
 sample variance of the mean is

         s²_st = Σ_{h=1}^L W_h²·s_h²/n_h = 0.25²·18.2²/40 + 0.75²·20.5²/60 = 4.46.

 The approximate degrees of freedom is

         df = (s²_st)² / Σ [ W_h⁴·s_h⁴/(n_h²(n_h − 1)) ]
            = 4.46² / ( 0.25⁴·18.2⁴/(40²·39) + 0.75⁴·20.5⁴/(60²·59) ) = 73.7 → 74.

 STEP 1. Null Hypothesis:          H0: μ = 40

 STEP 2. Alternative Hypothesis:   HA: μ < 40 (lower-tail test)

 STEP 3. Test Statistic:           t0 = (X̄_st − C)/s_st = (34 − 40)/√4.46 = −2.841

 STEP 4. a) Critical Value:        Using Table A-2, −t_{df,1−α} = −t_{74,0.99} = −2.378.

 STEP 4. b) p-value:               Using Table A-2, p-value < 0.005 (the exact value is
                                   P(t_df < t0) = P(t74 < −2.841) = 0.0029).

 STEP 5. a) Conclusion:            Since test statistic = −2.841 < −2.378 = critical value, we reject the null
                                   hypothesis that the true population mean is equal to 40.

 STEP 5. b) Conclusion:            Since p-value = 0.0029 < 0.01 = significance level, we reject the null
                                   hypothesis that the true population mean is equal to 40.
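The Box 3-8 arithmetic can be checked from the stratum summary statistics alone; this sketch implements the Box 3-7 formulas directly:

```python
import numpy as np
from scipy import stats

# Summary statistics from Box 3-8 (two strata).
W = np.array([0.25, 0.75])     # stratum weights
xbar = np.array([31.0, 35.0])  # stratum sample means (ppm)
s = np.array([18.2, 20.5])     # stratum sample standard deviations (ppm)
n = np.array([40, 60])         # stratum sample sizes
C, alpha = 40.0, 0.01

x_st = np.sum(W * xbar)                  # overall sample mean
var_st = np.sum(W**2 * s**2 / n)         # sample variance of the mean
df = int(np.ceil(var_st**2 / np.sum(W**4 * s**4 / (n**2 * (n - 1)))))
t0 = (x_st - C) / np.sqrt(var_st)
crit = -stats.t.ppf(1 - alpha, df)       # lower-tail critical value
print(x_st, var_st, df, t0, crit)
```

With exact (unrounded) intermediate values the test statistic is about −2.842 and the critical value about −2.378, so the null hypothesis is rejected, as in Box 3-8.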
Limitations and Robustness: Chen's test is a generalization of the one-sample t-test. Like the t-
test, Chen's test can have some difficulty dealing with non-detects, especially if there are a large
number of them. For a moderate number of non-detects, a substitution method (e.g., one-half of
the detection limit) will suffice if the threshold level C is much larger than the detection limit;
otherwise, refer to the methods discussed in Chapter 4.
Directions for the Chen test are contained in Box 3-9, with an example in Box 3-10.
                                  Box 3-9: Directions for the Chen Test

 COMPUTATIONS: Visually check the assumption of right-skewness by inspecting a histogram. If at most 15% of
 the data points are below the detection limit (DL) and C is much larger than the DL, then replace values below the
 DL with DL/2. Compute the mean, X̄, and the standard deviation, s, of the data set. Then compute

         b = n·Σ_{i=1}^n (x_i − X̄)³ / ((n − 1)(n − 2)·s³)   (sample skewness),
         a = b/(6√n),   and   t0 = (X̄ − C)/(s/√n).

 NOTE: The sample skewness should be greater than 1 to indicate the underlying distribution is right-skewed.

 STEP 1. Null Hypothesis:          H0: μ = C

 STEP 2. Alternative Hypothesis:   i)   HA: μ > C  (upper-tail test)
                                   ii)  HA: μ < C  (lower-tail test)
                                   iii) HA: μ ≠ C  (two-tail test)

 STEP 3. Test Statistic:           T = t0 + a·(1 + 2t0²) + 4a²·(t0 + 2t0³)

 STEP 4. a) Critical Value:        Use Table A-1 to find:
                                   i)   z_{1−α}
                                   ii)  z_α
                                   iii) z_{1−α/2}

 STEP 4. b) p-value:               Use Table A-1 to find:
                                   i)   P(Z > T)
                                   ii)  P(Z < T)
                                   iii) 2·P(Z > |T|)

 STEP 5. a) Conclusion:            i)   If T > z_{1−α}, then reject the null hypothesis that the true population
                                        mean is equal to the threshold value C.
                                   ii)  If T < z_α, then reject the null hypothesis.
                                   iii) If |T| > z_{1−α/2}, then reject the null hypothesis.

 STEP 5. b) Conclusion:            If p-value < α, then reject the null hypothesis that the true population mean
                                   is equal to the threshold value C.
                                 Box 3-10: Example of the Chen Test

 Consider the following sample of contaminant concentration measurements (in ppm):

                           2.0, 2.0, 5.0, 5.2, 5.9, 6.6, 7.4, 7.4, 9.7, 9.7, 10.2, 11.5,
                          12.4, 12.7, 14.1, 15.2, 17.7, 18.9, 22.8, 28.6, 30.5, 35.5.

 We want to test the null hypothesis that the mean μ is at most 10 ppm versus the alternative that it exceeds 10
 ppm. A significance level of 0.05 is to be used.

 COMPUTATIONS: A histogram of the 22 data points indicates a right-skewed distribution. It is found that
 X̄ = 13.227 ppm and s = 9.173 ppm. Also,

         b = 22·Σ(x_i − 13.227)³/(21·20·9.173³) = 1.078,   a = 1.078/(6√22) = 0.0383,   and
         t0 = (X̄ − C)/(s/√n) = (13.227 − 10)/(9.173/√22) = 1.650.

 A skewness value of 1.078 indicates the data are right-skewed.

 STEP 1. Null Hypothesis:          H0: μ ≤ 10

 STEP 2. Alternative Hypothesis:   HA: μ > 10 (upper-tail test)

 STEP 3. Test Statistic:           T = t0 + a·(1 + 2t0²) + 4a²·(t0 + 2t0³) = 1.96

 STEP 4. a) Critical Value:        Using Table A-1, z_{1−α} = z_{0.95} = 1.645.

 STEP 4. b) p-value:               Using Table A-1, P(Z > 1.96) = 1 − 0.9750 = 0.0250.

 STEP 5. a) Conclusion:            Since the test statistic = 1.96 > 1.645 = the critical value, we reject
                                   the null hypothesis.

 STEP 5. b) Conclusion:            Since the p-value = 0.0250 < 0.05 = the significance level, we reject
                                   the null hypothesis.
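The Chen statistic of Box 3-9 is straightforward to compute; this sketch reproduces the Box 3-10 example:

```python
import numpy as np
from scipy import stats

x = np.array([2.0, 2.0, 5.0, 5.2, 5.9, 6.6, 7.4, 7.4, 9.7, 9.7, 10.2, 11.5,
              12.4, 12.7, 14.1, 15.2, 17.7, 18.9, 22.8, 28.6, 30.5, 35.5])
C = 10.0
n = len(x)
xbar, s = x.mean(), x.std(ddof=1)

# Sample skewness b, then the skewness-corrected t statistic T
b = n * np.sum((x - xbar) ** 3) / ((n - 1) * (n - 2) * s ** 3)
a = b / (6 * np.sqrt(n))
t0 = (xbar - C) / (s / np.sqrt(n))
T = t0 + a * (1 + 2 * t0 ** 2) + 4 * a ** 2 * (t0 + 2 * t0 ** 3)
crit = stats.norm.ppf(0.95)   # z_{0.95}
print(b, T, crit)
```

T exceeds the critical value, reproducing the rejection in Box 3-10.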
3.2.1.5   Land's Method for Lognormally Distributed Data

Purpose: Estimate the mean of a lognormally distributed population using confidence limits.
An alternative is to log-transform the data to make them approximately normal, use the t
confidence interval described in Box 3-2, and then transform the confidence bounds back to
the original scale. That approach produces a biased interval estimate of the mean and should be
avoided; only when a median is being estimated does it produce an unbiased estimate.

Data: A simple or systematic random sample, x1,...,xn, from the population of interest.  The
sample may or may not contain compositing.

Assumptions: The underlying distribution of the data is approximately lognormal.

Limitations and Robustness: Land's method is extremely sensitive to outliers, since the mean and
standard deviation are not robust against outliers, unless a preventive limit on the variance is used.

Directions for Land's Method are contained in Box 3-11, with an example in Box 3-12.
                 Box 3-11: Directions for Computing Confidence Limits for the Population
                        Mean of a Lognormal Distribution Using Land's Method

 COMPUTATIONS: Transform the data: y_i = ln x_i, i = 1, ..., n. Next, compute the sample mean, ȳ, and sample
 standard deviation, s_y, of the transformed data. The values for H_{1−α} and H_α come from Table A-17.

 An upper one-sided 100(1 − α)% confidence limit for the population mean is exp( ȳ + s_y²/2 + s_y·H_{1−α}/√(n−1) ).

 A lower one-sided 100(1 − α)% confidence limit for the population mean is exp( ȳ + s_y²/2 + s_y·H_α/√(n−1) ).
                             Box 3-12: An Example Using Land's Method

 A random sample of 15 concentrations from a monitoring process (assumed to be lognormal) is reported:

            8.12, 7.32, 4.82, 6.52, 7.80, 11.89, 12.94, 7.51, 18.14, 4.09, 5.70, 15.57, 6.68, 8.15, 5.56.

 Compute an upper one-sided 95% confidence limit for the population mean of the process.

 COMPUTATIONS: The log-transformed data set is:

              2.09, 1.99, 1.57, 1.87, 2.05, 2.48, 2.56, 2.02, 2.90, 1.41, 1.74, 2.75, 1.90, 2.10, 1.72.

 The sample mean and sample standard deviation of the transformed data are ȳ = 2.0767 and s_y = 0.4272.
 The value H_{0.95} = 1.9939 is found by interpolation in Table A-17.

 An upper one-sided 95% confidence limit for the population mean is

         exp( ȳ + s_y²/2 + s_y·H_{0.95}/√(n−1) ) = exp( 2.0767 + 0.4272²/2 + 0.4272·1.9939/√14 ) = 10.97.
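Land's upper limit is a one-line formula once the H factor is in hand; this sketch reuses the interpolated Table A-17 value from Box 3-12 (the H factor itself still has to come from that table or equivalent software):

```python
import numpy as np

x = np.array([8.12, 7.32, 4.82, 6.52, 7.80, 11.89, 12.94, 7.51,
              18.14, 4.09, 5.70, 15.57, 6.68, 8.15, 5.56])
y = np.log(x)                       # log-transform the data
n = len(y)
ybar, sy = y.mean(), y.std(ddof=1)

H = 1.9939                          # H_{0.95} from Table A-17, as in Box 3-12
ucl = np.exp(ybar + sy**2 / 2 + sy * H / np.sqrt(n - 1))
print(ybar, sy, ucl)                # UCL is about 10.97
```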
3.2.1.6   The One-Sample Proportion Test and Confidence Interval

Purpose: Test for a difference between a population proportion, P, and a fixed threshold (P0), or
estimate the population proportion. If P0 = 0.5, this test is equivalent to the Sign test.

Data: A simple or systematic random sample, x1,...,xn, from the population of interest.

Assumptions: The data constitute an independent random sample from the population.

Limitations and Robustness: Both nP0 and n(1 − P0) must be at least 5 to apply the normal
approximation. Otherwise, exact tests must be used and a statistician should be consulted.
Directions for the one-sample test for proportions are contained in Box 3-13, with an example in
Box 3-15.  Directions for computing a confidence interval for a proportion are contained in
Box 3-14.
                      Box 3-13: Directions for the One-Sample Test for Proportions

 COMPUTATIONS: Compute p, the sample proportion of data values that fit the desired characteristic.

 STEP 1. Null Hypothesis:          H0: P = P0

 STEP 2. Alternative Hypothesis:   i)   HA: P > P0  (upper-tail test)
                                   ii)  HA: P < P0  (lower-tail test)
                                   iii) HA: P ≠ P0  (two-tail test)

 STEP 3. Test Statistic:           z0 = (p + c − P0)/√(P0(1 − P0)/n),  where c = −0.5/n if p > P0 and
                                   c = +0.5/n if p < P0 (continuity correction).

 STEP 4. a) Critical Value:        Use Table A-1 to find:
                                   i)   z_{1−α}
                                   ii)  z_α
                                   iii) z_{1−α/2}

 STEP 4. b) p-value:               Use Table A-1 to find:
                                   i)   P(Z > z0)
                                   ii)  P(Z < z0)
                                   iii) 2·P(Z > |z0|)

 STEP 5. a) Conclusion:            i)   If z0 > z_{1−α}, then reject the null hypothesis that the true proportion
                                        is equal to the threshold value P0.
                                   ii)  If z0 < z_α, then reject H0.
                                   iii) If |z0| > z_{1−α/2}, then reject H0.

 STEP 5. b) Conclusion:            If p-value < α, then reject the null hypothesis that the true proportion is
                                   equal to the threshold value P0.

 STEP 6. If the null hypothesis was not rejected, there is only one false acceptance error rate (β at P1), and if
         n ≥ [ ( z_{1−α′}·√(P0(1 − P0)) + z_{1−β}·√(P1(1 − P1)) ) / (P1 − P0) ]², then the sample size was large
         enough to achieve the DQOs. The value of α′ is α for a one-sided test and α/2 for a two-sided test.
            Box 3-14: Directions for Computing a Confidence Interval for a Population Proportion

 COMPUTATIONS: Compute p, the sample proportion of data values that fit the desired characteristic.

 A 100(1 − α)% confidence interval for P is p ± z_{1−α/2}·√(p(1 − p)/n), where Table A-1 is used to find z_{1−α/2}.
                       Box 3-15: An Example of the One-Sample Test for Proportions

 Consider 85 concentration samples, of which 11 are greater than the clean-up standard. Test the null
 hypothesis that the population proportion of concentrations greater than the clean-up standard is 0.2 versus the
 lower-tail alternative. The decision maker has specified a 5% false rejection rate (α) for P0 = 0.2, and a false
 acceptance rate (β) of 20% for P1 = 0.15.

 COMPUTATIONS: Since both nP0 = 85·0.2 = 17 and n(1 − P0) = 85·(1 − 0.2) = 68 are at least 5, the normal
 approximation can be applied. From the data, the sample proportion is p = 11/85 = 0.1294.

 STEP 1. Null Hypothesis:          H0: P ≥ 0.20

 STEP 2. Alternative Hypothesis:   HA: P < 0.20

 STEP 3. Test Statistic:           z0 = (p + 0.5/n − P0)/√(P0(1 − P0)/n)
                                      = (0.1294 + 0.5/85 − 0.2)/√(0.2(1 − 0.2)/85) = −1.49

 STEP 4. a) Critical Value:        Using Table A-1, z_α = z_{0.05} = −1.645.

 STEP 4. b) p-value:               Using Table A-1, P(Z < −1.49) = 1 − 0.9319 = 0.0681.

 STEP 5. a) Conclusion:            Since the test statistic = −1.49 > −1.645, we fail to reject the null
                                   hypothesis that the true population proportion is at least 0.20.

 STEP 5. b) Conclusion:            Since p-value = 0.0681 > 0.05 = significance level, we fail to reject the
                                   null hypothesis that the true population proportion is at least 0.20.

 STEP 6. Since the null hypothesis was not rejected and there is only one false acceptance error rate, it is
         possible to use the sample size formula to determine if the error rate has been satisfied. Since

         n = 85 < [ ( z_{1−α}·√(P0(1 − P0)) + z_{1−β}·√(P1(1 − P1)) ) / (P1 − P0) ]²
                = [ ( 1.64·√(0.2(1 − 0.2)) + 0.84·√(0.15(1 − 0.15)) ) / (0.15 − 0.2) ]² = 365.53,

         the sample size was not large enough to achieve the DQOs. So the null hypothesis was not rejected,
         but the false acceptance error rate was not satisfied. Therefore, there is insufficient evidence that the
         proportion is less than 0.20, but this conclusion is uncertain because the sample size was too small.
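The continuity-corrected proportion test of Box 3-13 and the Box 3-14 interval can be sketched together, using the Box 3-15 counts:

```python
import numpy as np
from scipy import stats

n, k = 85, 11             # 11 of 85 samples exceed the clean-up standard
P0 = 0.20
p = k / n

# Continuity-corrected z statistic (Box 3-13) and lower-tail p-value
c = -0.5 / n if p > P0 else 0.5 / n
z0 = (p + c - P0) / np.sqrt(P0 * (1 - P0) / n)
pval = stats.norm.cdf(z0)
print(z0, pval)

# Two-sided 95% confidence interval for P (Box 3-14)
half = stats.norm.ppf(0.975) * np.sqrt(p * (1 - p) / n)
lo, hi = p - half, p + half
print(lo, hi)
```

Note that the interval contains P0 = 0.2, consistent with the failure to reject at this sample size.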
3.2.2   Nonparametric Methods

        These methods rely on the relative rankings of data values.  Knowledge of the precise
form of the population distribution is not necessary.
3.2.2.1   The Sign Test

Purpose: Test for a difference between the population median and a fixed threshold.


Data: A simple or systematic random sample, x1,...,xn, from the population of interest. The
sample may or may not contain compositing.


Assumptions: The Sign test can be used for any underlying population distribution.

Limitations and Robustness:  The Sign test has less power than the one-sample t-test or the
Wilcoxon Signed Rank test.  However, the Sign test makes no distributional assumptions, unlike
the other two tests, and it can handle non-detects if the detection limit is below the threshold.

Directions for conducting a sign test are contained in Box 3-16, with an example in Box 3-17.

3.2.2.2   The Wilcoxon Signed Rank Test

Purpose: Test for a difference between the true location (mean or median) of a population and a
fixed threshold.  If the underlying population distribution is approximately normal, then the one-
sample t-test will have more power (chance of rejecting the null hypothesis when it is false) than
the Wilcoxon Signed Rank test.  For symmetric distributions, the Wilcoxon Signed Rank test
will have more power than the Sign test.  If the sample size is small and the data are neither
approximately symmetric nor normally distributed, then the Sign test should be used.

Data: A simple or systematic random sample, x1,...,xn, from the population of interest.

Assumptions: The  data set comes from an approximately symmetric distribution.

Limitations and Robustness: For large sample sizes (n > 50), the one-sample t-test is more robust
against violations of its assumptions than the Wilcoxon Signed Rank test.  The Wilcoxon Signed
Rank test may produce misleading results if there are many tied data values. If possible,
measurements should be recorded with sufficient accuracy so that a large number of tied values
does not occur. Estimated concentrations should be reported for data below the detection limit,
even if these estimates are negative, as their relative magnitude is of importance.  If this is not
possible, substitute the value DL/2 for each value below the detection limit, provided all the data
have the same detection limit. When different detection limits are present, all data could be
censored at the highest detection limit, but this will substantially weaken the test.  A statistician
should be consulted on the potential use of Gehan ranking.

Directions for the Wilcoxon signed rank test are contained in Box 3-18, with an example in
Box 3-19.
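SciPy implements the Wilcoxon Signed Rank test directly. The data below are illustrative only (modeled on the Box 3-17 values but modified so there are no ties and no non-detects, which would otherwise require the adjustments discussed above):

```python
import numpy as np
from scipy import stats

# Illustrative concentrations (ppb); threshold C = 1000 ppb, lower-tail test.
x = np.array([974, 1044, 1093, 897, 879, 1161, 841, 824, 796, 815])
C = 1000

# Test whether the population location is below C
res = stats.wilcoxon(x - C, alternative='less')
print(res.statistic, res.pvalue)
```

With no ties and n = 10, SciPy computes an exact p-value; the statistic reported for a one-sided test is the sum of the ranks of the positive deviations.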
                            Box 3-16: Directions for the Sign Test (One-Sample)

 COMPUTATIONS: Let C be the threshold. Compute the deviations d_i = x_i − C. If any of the deviations are
 zero, delete them and correspondingly reduce the sample size. Finally, compute B, the number of times d_i > 0.

 STEP 1: Null Hypothesis:          H0: median = C

 STEP 2: Alternative Hypothesis:   i)   HA: median > C (upper-tail test)
                                   ii)  HA: median < C (lower-tail test)
                                   iii) HA: median ≠ C (two-tail test)

 STEP 3: Test Statistic:           If n ≤ 20, then the test statistic is B.
                                   If n > 20, then the test statistic is z0 = (B − n/2)/(√n/2).

 STEP 4 a): Critical Value:        If n ≤ 20, then use Table A-18 to find
                                   i)   B_upper(n, 2α)
                                   ii)  B_lower(n, 2α) − 1
                                   iii) B_lower(n, α) − 1 and B_upper(n, α)

                                   If n > 20, then use Table A-1 to find
                                   i)   z_{1−α}
                                   ii)  z_α
                                   iii) z_{1−α/2}

 STEP 4 b): p-value:               If n ≤ 20, then let p(i) = C(n, i)·(1/2)^n and the p-value is
                                   i)   Σ_{i=B}^{n} p(i)
                                   ii)  Σ_{i=0}^{B} p(i)
                                   iii) 2·min( Σ_{i=0}^{B} p(i), Σ_{i=B}^{n} p(i) )

                                   If n > 20, then use Table A-1 to find
                                   i)   P(Z > z0)
                                   ii)  P(Z < z0)
                                   iii) 2·P(Z > |z0|)

 STEP 5 a): Conclusion:            If n ≤ 20, then
                                   i)   If B ≥ B_upper(n, 2α), then reject the null hypothesis that the true
                                        population median is equal to the threshold value C.
                                   ii)  If B ≤ B_lower(n, 2α) − 1, then reject the null hypothesis.
                                   iii) If B ≥ B_upper(n, α) or B ≤ B_lower(n, α) − 1, then reject the null
                                        hypothesis.

                                   If n > 20, then
                                   i)   If z0 > z_{1−α}, then reject the null hypothesis that the true population
                                        median is equal to the threshold value C.
                                   ii)  If z0 < z_α, then reject the null hypothesis.
                                   iii) If |z0| > z_{1−α/2}, then reject the null hypothesis.

 STEP 5 b): Conclusion:            If p-value < α, then reject the null hypothesis that the true population
                                   median is equal to the threshold value C.
                        Box 3-17:  An Example of the Sign Test (One-Sample)

  The following 10 data points (in ppb) will be used to test the null hypothesis that the population median is at
  least 1000 ppb versus the lower-tail alternative. The decision maker has specified a 10% false rejection error
  rate (α) at 1000 ppb (C), and a 20% false acceptance error rate (β) at 900 ppb (μ1).

  COMPUTATIONS:  The table below displays the data values and the deviations:

      xi:   974   1044   1093    897    879   1161    839    824    796   <750 (DL)
      di:   -26     44     93   -103   -121    161   -161   -176   -204   -625

  Therefore, B = number of di > 0 = 3.

  STEP 1:  Null Hypothesis:           H0: median ≥ 1000 ppb

  STEP 2:  Alternative Hypothesis:    HA: median < 1000 ppb

  STEP 3:  Test Statistic:            Since n < 20, the test statistic is B = 3.

  STEP 4 a):  Critical Value:         Since n < 20, Table A-18 is used to find B_lower(n, 2α) − 1 = 2.

  STEP 4 b):  p-value:                Since n < 20, p-value = Σ_{i=0}^{3} (10 choose i)·(1/2)^10 = 0.1719.

  STEP 5 a):  Conclusion:             Since test statistic = 3 > 2 = critical value, we fail to reject the
                                      null hypothesis that the true median is at least 1000 ppb.

  STEP 5 b):  Conclusion:             Since p-value = 0.1719 > 0.10 = significance level, we fail to reject the
                                      null hypothesis that the true median is at least 1000 ppb.
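The exact binomial arithmetic of Box 3-17 is straightforward to check with software. The following is a minimal sketch, assuming scipy is available; for the Sign test only the sign of each deviation matters, so the nondetect simply counts as a negative deviation:

```python
from scipy.stats import binom

# Data from Box 3-17; the nondetect (<750 ppb) lies below the 1000 ppb
# threshold, so its exact value does not affect the Sign test.
data = [974, 1044, 1093, 897, 879, 1161, 839, 824, 796, 375]
C = 1000                              # threshold (H0: median >= 1000 ppb)

deviations = [v - C for v in data if v != C]   # drop zero deviations
n = len(deviations)
B = sum(d > 0 for d in deviations)    # test statistic: number of positive di

# Lower-tail exact p-value: P(Binomial(n, 1/2) <= B)
p_value = binom.cdf(B, n, 0.5)
print(B, round(p_value, 4))           # 3 0.1719
```

This reproduces the test statistic B = 3 and the exact p-value 0.1719 reported in the box.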
3.3    COMPARING TWO POPULATIONS

       The two-sample methods of this section concern the comparison of two population
parameters (means, medians, or proportions). During Step 1:  Review DQOs and Sampling
Design, the population parameters were specified.  In environmental applications, the two
populations to be compared may be a potentially contaminated area and a background area, or
concentration levels from an upgradient well and a downgradient well. Another example is
comparing site concentration levels before and after clean-up to test for significant improvement.

       The populations of interest can be one of two types, independent or paired. With
independent populations, measurements are taken separately from the two groups of sampling
units; for example, concentrations from a contaminated site and a background site. The
observations from paired populations are  correlated.  Here, two measurements are taken upon
one set of sampling units at separate instances; for example, measurements before and after
clean-up or two labs making separate measurements on a single set of samples. Parametric and
nonparametric methods are  described for  both independent and paired populations.
                   Box 3-18: Directions for the Wilcoxon Signed Rank Test (One-Sample)

   COMPUTATIONS: Let C be the threshold.  Compute the deviations di = xi − C.  If any of the deviations are
   zero, then delete them and correspondingly reduce the sample size.  Rank the absolute deviations, |di|, from
   smallest to largest.  If there are tied observations, then assign the average rank.  Let Ri be the signed rank of
   |di|, where the sign of Ri is determined by the sign of di.

   STEP 1.  Null Hypothesis:          H0: location = C

   STEP 2.  Alternative Hypothesis:   i)   HA: location > C (upper-tail test)
                                      ii)  HA: location < C (lower-tail test)
                                      iii) HA: location ≠ C (two-tail test)

   STEP 3.  Test Statistic:           If n < 20, then T+ = Σ(positive Ri), the sum of the positive signed ranks.

                                      If n > 20, then z0 = [T+ − n(n+1)/4] / √var(T+),

                                      where var(T+) = n(n+1)(2n+1)/24 − (1/48)·Σ_{j=1}^{g} tj(tj² − 1), g is the
                                      number of tied |di| groups and tj is the size of group j.

   STEP 4.  a) Critical Value:        If n < 20, then use Table A-7 to find:
                                      i) wα   ii) wα   iii) wα/2
                                      If n > 20, then use Table A-1 to find:
                                      i) z_{1−α}   ii) zα   iii) z_{1−α/2}

   STEP 4.  b) p-value:               If n < 20, then use Table A-7 to bracket the p-value.
                                      If n > 20, then use Table A-1 to find:
                                      i)   P(Z > z0)
                                      ii)  P(Z < z0)
                                      iii) 2·P(Z > |z0|)

   STEP 5.  a) Conclusion:            If n < 20, then:
                                      i)   If T+ ≥ n(n+1)/2 − wα, then reject the null hypothesis that the true
                                           population location is equal to the threshold C.
                                      ii)  If T+ ≤ wα, then reject the null hypothesis.
                                      iii) If T+ ≥ n(n+1)/2 − wα/2 or T+ ≤ wα/2, then reject the null hypothesis.

                                      If n > 20, then:
                                      i)   If z0 > z_{1−α}, then reject the null hypothesis that the true
                                           population location is equal to the threshold C.
                                      ii)  If z0 < zα, then reject the null hypothesis.
                                      iii) If |z0| > z_{1−α/2}, then reject the null hypothesis.
   STEP 5.  b) Conclusion:
                                  If p-value < a, then reject the null hypothesis that the true population
                                  location is equal to the threshold C.
   STEP 6.  If the null hypothesis was not rejected and n > 20, then the sample size necessary to achieve the
            DQOs, assuming only one false acceptance error rate (β at μ1) has been specified, is

                 n ≥ 1.16·[ (z_{1−α'} + z_{1−β})²·s² / (μ1 − C)² + (1/2)·z²_{1−α'} ]

            If the observed n meets this value, then the false acceptance error rate has probably been satisfied
            (the value of α' is α for a one-sided test and α/2 for a two-sided test).
                 Box 3-19: An Example of the Wilcoxon Signed Rank Test (One-Sample)

   The following 10 data points (in ppb) will be used to test the null hypothesis that the population median is at
   least 1000 ppb versus the lower-tail alternative. The decision maker has specified a 10% false rejection error
   rate (α) at 1000 ppb (C), and a 20% false acceptance error rate (β) at 900 ppb.

                    974, 1044, 1093, 897, 879, 1161, 839, 824, 796, <750 (detection limit)

   COMPUTATIONS:  For this example, the only option is to assign the value 375 ppb (DL/2) to the nondetect.
   The table below displays the remaining computations.

      xi:     974   1044   1093    897    879   1161    839    824    796    375
      di:     -26     44     93   -103   -121    161   -161   -176   -204   -625
      |di|:    26     44     93    103    121    161    161    176    204    625
      rank:     1      2      3      4      5    6.5    6.5      8      9     10
      Ri:      -1      2      3     -4     -5    6.5   -6.5     -8     -9    -10

   STEP 1. Null Hypothesis:           H0: median ≥ 1000 ppb

   STEP 2. Alternative Hypothesis:    HA: median < 1000 ppb

   STEP 3. Test Statistic:            Since n < 20, compute T+ = Σ(positive Ri) = 2 + 3 + 6.5 = 11.5.

   STEP 4. a)  Critical Value:        Since n < 20, Table A-7 is used to find w0.10 = 14.

   STEP 4. b)  p-value:               Since n < 20, Table A-7 is used to find that the p-value is between 0.05
                                      and 0.075.  Using software, the p-value is found to be 0.0527.

   STEP 5. a)  Conclusion:            Since test statistic = T+ = 11.5 < 14 = critical value, we reject the null
                                      hypothesis that the true population median is at least 1000.

   STEP 5. b)  Conclusion:            Since p-value = 0.0527 < 0.10 = significance level, we reject the null
                                      hypothesis that the true population median is at least 1000.

   NOTE: The Sign test failed to reject the null hypothesis for this example.  Recall that the Wilcoxon Signed
   Rank test has more power than the Sign test if the distribution of the population is symmetric.
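The ranking and tie-corrected variance of Box 3-18 can be checked numerically. The sketch below assumes numpy and scipy are available; it reproduces T+ = 11.5 for the Box 3-19 data and then applies the large-sample formula for illustration. Because n < 20 here, the normal approximation gives a slightly different p-value than the exact 0.0527 quoted in the box.

```python
import numpy as np
from scipy.stats import rankdata, norm

# Data from Box 3-19; the nondetect is replaced by DL/2 = 375 ppb.
x = np.array([974, 1044, 1093, 897, 879, 1161, 839, 824, 796, 375])
d = x - 1000.0                       # deviations from the threshold C
d = d[d != 0]                        # drop any zero deviations
n = len(d)

ranks = rankdata(np.abs(d))          # average ranks for tied |di|
t_plus = ranks[d > 0].sum()          # T+ = 2 + 3 + 6.5 = 11.5

# Large-sample statistic from Box 3-18 with the tie correction;
# groups of size 1 contribute nothing to the correction term.
_, sizes = np.unique(np.abs(d), return_counts=True)
var_tplus = n*(n + 1)*(2*n + 1)/24 - sum(s*(s**2 - 1) for s in sizes)/48
z0 = (t_plus - n*(n + 1)/4) / np.sqrt(var_tplus)
p_approx = norm.cdf(z0)              # lower-tail normal approximation
print(t_plus, round(z0, 3))          # 11.5 -1.632
```

The one tied group here (|di| = 161, size 2) reduces var(T+) from 96.25 to 96.125.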
        The hypothesis tests in this section may be used to determine if there is evidence that
θ1 − θ2 < δ0, θ1 − θ2 > δ0, or θ1 − θ2 ≠ δ0, where θ1 and θ2 represent population means, medians, or
proportions and δ0 represents the threshold value.  Also, there are confidence interval procedures
to estimate θ1 − θ2.  Section 3.3.1.1 covers parametric methods for comparing two independent
populations, while Section 3.3.1.2 covers parametric methods for comparing two paired
populations.  Section 3.3.2 describes nonparametric counterparts to the parametric methods.

3.3.1   Parametric Methods

        These methods rely on knowing the specific distribution of the populations or of the
statistics of interest.
3.3.1.1   Independent Samples

3.3.1.1.1  The Two-Sample t-Test and Confidence Interval (Equal Variances)

Purpose: Test for a difference or estimate the difference between two population means when it
can be assumed the population variances are approximately equal.

Data: A simple or systematic random sample x1, x2, ..., xm from one population, and an
independent simple or systematic random sample y1, y2, ..., yn from the second population.

Assumptions: The two populations are independent. If not, then it is possible that a paired
method  could be used. Both are approximately normally distributed or the sample sizes are large
(m and n both at least 30).  If this  is not the case, then a nonparametric procedure is an
alternative. Finally, the variances of both populations are approximately equal. If the population
variances are not equal (tests are available in Section 4.5), then use the methods of the next
section.

Limitations and Robustness: The two-sample t-test with equal variances is robust to moderate
violations of the assumption of normality, but not to large inequalities of variances. An
alternative is the parametric methods for unequal variances described in the next section. The t-
test is not robust against outliers because sample means and standard deviations are sensitive to
outliers.

Directions for the two-sample t-test with equal variances are contained in Box 3-20, with an
example in Box 3-22.  Directions for a two-sample confidence interval with equal variances are
contained in Box 3-21.

3.3.1.1.2  The Two-Sample t-Test and Confidence Interval (Unequal Variances)

Purpose: Test for a difference or estimate the difference between two population means when it
is suspected the population variances are not equal.

Data: A simple or systematic random sample x1, x2, ..., xm from one population, and an
independent simple or systematic random sample y1, y2, ..., yn from the second population.

Assumptions: The two populations are independent. If not, then it is possible that a paired
method  could be used. Both are approximately normally distributed or the sample sizes are large
(m and n both at least 30).  If this  is not the case, then a nonparametric procedure is an
alternative.

Limitations and Robustness: The two-sample t-test with unequal variances is robust to  moderate
violations of the assumption of normality.  The t-test is also not robust to outliers because sample
means and standard deviations are sensitive to outliers.
Directions for the two-sample t-test with unequal variances are contained in Box 3-23, with an
example in Box 3-25.  Directions for a two-sample confidence interval with unequal variances
are contained in Box 3-24.
                    Box 3-20: Directions for the Two-Sample t-Test (Equal Variances)

 COMPUTATIONS: Calculate the sample means, X̄ and Ȳ, and the sample variances, sX² and sY², of the two
 populations.  Also compute the pooled standard deviation estimate:

      sp = √[ ((m − 1)·sX² + (n − 1)·sY²) / (m + n − 2) ]

 STEP 1. Null Hypothesis:            H0: μX − μY = δ0

 STEP 2. Alternative Hypothesis:     i)   HA: μX − μY > δ0 (upper-tail test)
                                     ii)  HA: μX − μY < δ0 (lower-tail test)
                                     iii) HA: μX − μY ≠ δ0 (two-tail test)

 STEP 3. Test Statistic:             t0 = (X̄ − Ȳ − δ0) / (sp·√(1/m + 1/n))

 STEP 4. a) Critical Value:          Use Table A-2 to find:
                                     i) t_{m+n−2, 1−α}   ii) t_{m+n−2, 1−α}   iii) t_{m+n−2, 1−α/2}

 STEP 4. b) p-value:                 Use Table A-2 to find:
                                     i)   P(t_{m+n−2} > t0)
                                     ii)  P(t_{m+n−2} < t0)
                                     iii) 2·P(t_{m+n−2} > |t0|)

 STEP 5. a) Conclusion:              i)   If t0 > t_{m+n−2, 1−α}, then reject the null hypothesis that the true
                                          difference between population means is equal to the threshold δ0.
                                     ii)  If t0 < −t_{m+n−2, 1−α}, then reject the null hypothesis.
                                     iii) If |t0| > t_{m+n−2, 1−α/2}, then reject the null hypothesis.

 STEP 5. b) Conclusion:              If p-value < α, then reject the null hypothesis that the true difference
                                     between population means is equal to the threshold δ0.

 STEP 6. If the null hypothesis was not rejected, there is only one false acceptance error rate (β at δ1), and both
         m and n are at least

              2·sp²·(z_{1−α'} + z_{1−β})² / (δ1 − δ0)² + (1/4)·z²_{1−α'}

         then the sample sizes were probably large enough to achieve the DQOs.  The value of α' is α for a
         one-sided test and α/2 for a two-sided test.
               Box 3-21:  Directions for a Two-Sample t Confidence Interval (Equal Variances)

 COMPUTATIONS:  Calculate the sample means, X̄ and Ȳ, and the sample variances, sX² and sY², of the two
 populations.  Also compute the pooled standard deviation estimate:

      sp = √[ ((m − 1)·sX² + (n − 1)·sY²) / (m + n − 2) ]

 A 100(1 − α)% confidence interval for μX − μY is (X̄ − Ȳ) ± t_{m+n−2, 1−α/2}·sp·√(1/m + 1/n), where
 Table A-2 is used to find t_{m+n−2, 1−α/2}.
                     Box 3-22: An Example of a Two-Sample t-Test (Equal Variances)

 At a hazardous waste site, an area cleaned using an in-situ methodology (area 1) was compared with a similar, but
 relatively uncontaminated reference area (area 2). If the in-situ methodology worked, then the average contaminant
 levels at the two sites should be approximately equal.  If the methodology did not work, then area 1 should have a
 higher average than the reference area.  Suppose 7 random samples were taken from area 1, and 8 were taken
 from area 2.  Methods described in Section 4.5 were used to determine that the variances were essentially equal.
 The false rejection error rate was set at 5% and the false acceptance error rate was set at 20% (β) if the difference
 between the areas is 2.5 ppm (δ1).

 COMPUTATIONS:  The sample means and sample variances are X̄1 = 7.8, X̄2 = 6.6, s1² = 2.1, and s2² = 2.2.

 The pooled standard deviation is:  sp = √[ ((7 − 1)·2.1 + (8 − 1)·2.2) / (7 + 8 − 2) ] = 1.4676

 STEP 1.  Null Hypothesis:          H0: μ1 − μ2 = 0

 STEP 2.  Alternative Hypothesis:   HA: μ1 − μ2 > 0

 STEP 3.  Test Statistic:           t0 = (7.8 − 6.6 − 0) / (1.4676·√(1/7 + 1/8)) = 1.5799

 STEP 4.  a) Critical Value:        Using Table A-2, t_{m+n−2, 1−α} = t_{13, 0.95} = 1.771

 STEP 4.  b) p-value:               Using Table A-2, 0.05 < p-value < 0.10.  (The exact
                                    p-value = P(t_{m+n−2} > t0) = P(t13 > 1.5799) = 0.0691)

 STEP 5.  a) Conclusion:            Since test statistic = 1.5799 < 1.771 = critical value, we fail to reject the null
                                    hypothesis of no difference between population means.

 STEP 5.  b) Conclusion:            Since p-value = 0.0691 > 0.05 = significance level, we fail to reject the null
                                    hypothesis of no difference between population means.

 STEP 6.  Since the null hypothesis was not rejected and there is only one false acceptance error rate, we can
          compute the sample sizes necessary to achieve the DQOs.  Each sample size must be at least

               2·(2.1538)·(1.645 + 0.842)² / (2.5 − 0)² + (1/4)·(1.645)² = 4.94

          Since m and n are both greater than 4.94, the false acceptance error rate has probably been satisfied.
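The computations of Boxes 3-20 through 3-22 can be reproduced from the summary statistics alone. A minimal Python sketch, assuming scipy is available:

```python
import math
from scipy.stats import norm, t

# Summary statistics from Box 3-22
m, n = 7, 8
xbar1, xbar2 = 7.8, 6.6
var1, var2 = 2.1, 2.2

# Pooled standard deviation and test statistic (Box 3-20)
sp = math.sqrt(((m - 1)*var1 + (n - 1)*var2) / (m + n - 2))
t0 = (xbar1 - xbar2 - 0) / (sp * math.sqrt(1/m + 1/n))
p_value = t.sf(t0, m + n - 2)        # upper-tail p-value

# Step 6 check: minimum sample size per group (alpha' = 0.05, beta = 0.20)
n_min = (2 * sp**2 * (norm.ppf(0.95) + norm.ppf(0.80))**2 / (2.5 - 0)**2
         + norm.ppf(0.95)**2 / 4)
print(round(sp, 4), round(t0, 4))    # 1.4676 1.5799
print(round(p_value, 3), round(n_min, 2))
```

This recovers sp = 1.4676, t0 = 1.5799, a p-value of about 0.069, and the Step 6 minimum sample size of 4.94.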
                    Box 3-23:  Directions for the Two-Sample t-Test (Unequal Variances)

 COMPUTATIONS:  Calculate the sample means, X̄ and Ȳ, and the sample variances, sX² and sY², of the two
 populations.  Also, compute the degrees of freedom (Satterthwaite's approximation) for the test:

      df = (sX²/m + sY²/n)² / [ sX⁴/(m²(m − 1)) + sY⁴/(n²(n − 1)) ], rounded down to the next integer.

 STEP 1.  Null Hypothesis:           H0: μX − μY = δ0

 STEP 2.  Alternative Hypothesis:    i)   HA: μX − μY > δ0 (upper-tail test)
                                     ii)  HA: μX − μY < δ0 (lower-tail test)
                                     iii) HA: μX − μY ≠ δ0 (two-tail test)

 STEP 3.  Test Statistic:            t0 = (X̄ − Ȳ − δ0) / √(sX²/m + sY²/n)

 STEP 4.  a) Critical Value:         Use Table A-2 to find:
                                     i) t_{df, 1−α}   ii) t_{df, 1−α}   iii) t_{df, 1−α/2}

 STEP 4.  b) p-value:                Use Table A-2 to find:
                                     i)   P(t_df > t0)
                                     ii)  P(t_df < t0)
                                     iii) 2·P(t_df > |t0|)

 STEP 5.  a) Conclusion:             i)   If t0 > t_{df, 1−α}, then reject the null hypothesis that the difference
                                          between the population means is equal to δ0.
                                     ii)  If t0 < −t_{df, 1−α}, then reject H0.
                                     iii) If |t0| > t_{df, 1−α/2}, then reject H0.

 STEP 5.  b) Conclusion:             If p-value < α, then reject the null hypothesis that the difference between
                                     the population means is equal to δ0.

 STEP 6.  If the null hypothesis was not rejected, then there is no simple method to determine if the sample sizes
          were large enough to achieve the DQOs.
             Box 3-24: Directions for a Two-Sample t Confidence Interval (Unequal Variances)

 COMPUTATIONS: Calculate the sample means, X̄ and Ȳ, and the sample variances, sX² and sY², of the two
 populations.  Also, compute the degrees of freedom (Satterthwaite's approximation) for the confidence interval:

      df = (sX²/m + sY²/n)² / [ sX⁴/(m²(m − 1)) + sY⁴/(n²(n − 1)) ], rounded down to the next integer.

 A 100(1 − α)% confidence interval for μX − μY is (X̄ − Ȳ) ± t_{df, 1−α/2}·√(sX²/m + sY²/n), where Table A-2
 is used to find t_{df, 1−α/2}.
                   Box 3-25: An Example of the Two-Sample t-Test (Unequal Variances)

 At a hazardous waste site, an area cleaned using a new methodology (area 1) was compared with a similar area
 cleaned with the standard technology (area 2). If the new methodology worked, then the two sites should be
 approximately equal in average contaminant levels. If the new methodology did not work, then area 1 should
 have a higher average than the reference area. Suppose 7 random samples were taken from area 1 and 8 were
 taken from area 2.  As the contaminant concentrations in the two areas are supposedly equal, we will test no
 difference in population means versus the upper-tail alternative. Using Section 4.5, it was determined that the
 variances of the two populations were not equal, therefore using Satterthwaite's method is appropriate. The false
 rejection error rate was set at 5% and the false acceptance error rate was set at 20% (β) if the difference between
 the areas is 2.5 ppm.

 COMPUTATIONS:  The sample means and sample variances are X̄1 = 9.2, X̄2 = 6.1, s1² = 1.3, and s2² = 5.7.
 Satterthwaite's approximation of the degrees of freedom for the test is:

      df = (1.3/7 + 5.7/8)² / [ 1.3²/(7²·(7 − 1)) + 5.7²/(8²·(8 − 1)) ] = 10.3 → 10

 STEP 1. Null Hypothesis:            H0: μ1 − μ2 = 0

 STEP 2. Alternative Hypothesis:     HA: μ1 − μ2 > 0

 STEP 3. Test Statistic:             t0 = (X̄1 − X̄2 − δ0) / √(s1²/m + s2²/n) = (9.2 − 6.1 − 0) / √(1.3/7 + 5.7/8) = 3.271

 STEP 4. a) Critical Value:          Using Table A-2, t_{df, 1−α} = t_{10, 0.95} = 1.812

 STEP 4. b) p-value:                 Using Table A-2, p-value < 0.005.  (The exact
                                     p-value = P(t_df > t0) = P(t10 > 3.271) = 0.0042)

 STEP 5. a) Conclusion:              Since test statistic = 3.271 > 1.812 = critical value, we reject the null
                                     hypothesis that there is no difference between the population means.

 STEP 5. b) Conclusion:              Since p-value = 0.0042 < 0.05 = significance level, we reject the null
                                     hypothesis that there is no difference between the population means.
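Satterthwaite's approximation in Boxes 3-23 and 3-25 can be verified numerically. The sketch below (scipy assumed) rounds the degrees of freedom down as the box directs; statistical software that keeps fractional degrees of freedom will give a very slightly different p-value:

```python
import math
from scipy.stats import t

# Summary statistics from Box 3-25
m, n = 7, 8
xbar1, xbar2 = 9.2, 6.1
var1, var2 = 1.3, 5.7

se2 = var1/m + var2/n                # squared standard error of the difference
# Satterthwaite's degrees of freedom, rounded down as in Box 3-23
df = math.floor(se2**2 / (var1**2/(m**2*(m - 1)) + var2**2/(n**2*(n - 1))))

t0 = (xbar1 - xbar2 - 0) / math.sqrt(se2)
p_value = t.sf(t0, df)               # upper-tail p-value
print(df, round(t0, 3))              # 10 3.271
print(round(p_value, 4))
```

This recovers df = 10, t0 = 3.271, and a p-value of about 0.0042, matching the box.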
3.3.1.1.3  Two-Sample Test for Population Proportions

Purpose: Test for a difference between two population proportions, P1 and P2.

Data: A simple or systematic random sample, x1, ..., xm, from one population and an independent
simple or systematic random sample, y1, ..., yn, from the second population of interest.

Assumptions: The data constitutes independent simple or systematic random samples from the
populations.

Limitations and Robustness: To ensure the normal approximation is appropriate, compute mp1,
m(1 − p1), np2, and n(1 − p2), where m and n are the sample sizes and p1 and p2 are the sample
proportions.  If all of the products are at least 5, then the normal approximation may be used.
Otherwise, seek assistance from a statistician, as exact tests must be used.  Since only counts of a
binary characteristic are used rather than actual data values, the procedures are robust to outliers.

Directions for the two-sample test for proportions are contained in Box 3-26, with an example in
Box 3-28. Directions for a two-sample confidence interval for the difference between
proportions are contained in Box 3-27.

3.3.1.2   Paired Samples

       Observations from paired populations are correlated. The general set-up for this test
involves taking two measurements on one group of sampling units at separate instances; for
example, measurements before and after clean-up, or two labs making separate measurements on
a single set of objects.

3.3.1.2.1  The Paired t-Test and Confidence Interval

Purpose: Test for or estimate the difference between two paired population means.

Data: Two paired data sets x1, ..., xn and y1, ..., yn.

Assumptions: The two data sets come from approximately normal distributions or n > 30.

Limitations and Robustness: Since there is really only one sample of differences, the limitations
for the paired t-test are the same as those for the one-sample t-test. These methods are robust
against the population distribution deviating moderately from normality. However, they are not
robust against outliers. In either case, the nonparametric methods of Section 3.3.2.2 offer an
alternative.  Finally, these methods have difficulty dealing with non-detects. The substitution or
adjustment procedures of Chapter 4 are an alternative, but it is best to use a nonparametric method.

Directions for the paired t-test are given in Box 3-29, with an example in Box 3-31. Directions
for a paired t confidence interval are given in  Box 3-30.
                        Box 3-26:  Directions for a Two-Sample Test for Proportions

  COMPUTATIONS: Let pi, i = 1, 2, denote the sample proportion of data values from population i that fit the
  characteristic of interest. Also, calculate the pooled sample proportion p, which is the total number of data
  values that fit the characteristic divided by the total sample size, m + n.

  To ensure the normal approximation is appropriate, compute mp1, m(1 − p1), np2, n(1 − p2). If all of these
  values are at least 5, then the normal approximation may be used.  Otherwise, seek assistance from a statistician.

  STEP 1.  Null Hypothesis:          H0: P1 − P2 = δ0

  STEP 2.  Alternative Hypothesis:   i)   HA: P1 − P2 > δ0 (upper-tail test)
                                     ii)  HA: P1 − P2 < δ0 (lower-tail test)
                                     iii) HA: P1 − P2 ≠ δ0 (two-tail test)

  STEP 3.  Test Statistic:           z0 = (p1 − p2 − δ0) / √( p(1 − p)·(1/m + 1/n) )

  STEP 4.  a) Critical Value:        Use Table A-1 to find:
                                     i) z_{1−α}   ii) zα   iii) z_{1−α/2}

  STEP 4.  b) p-value:               Use Table A-1 to find:
                                     i)   P(Z > z0)
                                     ii)  P(Z < z0)
                                     iii) 2·P(Z > |z0|)

  STEP 5.  a) Conclusion:            i)   If z0 > z_{1−α}, then reject the null hypothesis that the difference in
                                          population proportions is equal to δ0.
                                     ii)  If z0 < zα, then reject the null hypothesis.
                                     iii) If |z0| > z_{1−α/2}, then reject the null hypothesis.

  STEP 5.  b) Conclusion:            If p-value < α, then reject the null hypothesis that the difference in
                                     population proportions is equal to δ0.

  STEP 6.  If the null hypothesis was not rejected, there is only one false acceptance error rate (β at P1 − P2), and
           both m and n are at least

                2·(z_{1−α'} + z_{1−β})²·P̄(1 − P̄) / (P2 − P1)², where P̄ = (P1 + P2)/2,

           then the sample sizes were probably large enough to achieve the DQOs.  The value of α' is α for a
           one-sided test and α/2 for a two-sided test.  If only one of m or n meets the sample size criterion,
           then use statistical software to calculate the power of the test, assuming that the true values for the
           proportions P1 and P2 are those obtained in the sample.  If the estimated power is below 1 − β, the
           false acceptance error rate has not been satisfied.
                     Box 3-27:  Directions for Computing a Confidence Interval for the
                                 Difference Between Population Proportions

 COMPUTATIONS: Let pi, i = 1, 2, denote the sample proportion of data values from population i that fit the
 characteristic of interest. Also, calculate the pooled sample proportion p, which is the total number of data values
 that fit the characteristic divided by the total sample size, m + n.

 To ensure the normal approximation is appropriate, compute mp1, m(1 − p1), np2, n(1 − p2). If all of these values
 are at least 5, then the normal approximation may be used.  Otherwise, seek assistance from a statistician, as
 exact tests must be used.

 A 100(1 − α)% confidence interval for the difference between population proportions, P1 − P2, is
 (p1 − p2) ± z_{1−α/2}·√( p(1 − p)·(1/m + 1/n) ), where Table A-1 is used to find z_{1−α/2}.
                      Box 3-28:  An Example of a Two-Sample Test for Proportions

  At a hazardous waste site, investigators must determine whether an area suspected to be contaminated with
  dioxin needs to be remediated.  The possibly contaminated area (area 1) will be compared to a reference area
  (area 2) to see if dioxin levels in area 1 are greater than dioxin levels in the reference area.  An inexpensive
  surrogate probe was used to determine if each individual sample is either "contaminated," i.e., over the health
  standard of 1 ppb, or "clean." The decision maker is willing to accept a false rejection decision error rate of 10%
  (α) and a false-negative decision error rate of 5% (β) when the difference in proportions between areas exceeds
  0.1. A team collected 92 readings from area 1 (of which 12 were contaminated) and 80 from area 2, the
  reference area (of which 10 were contaminated).

  COMPUTATIONS: The sample proportion for area 1 is p1 = 12/92 = 0.130, the sample proportion for area 2 is
  p2 = 10/80 = 0.125, and the pooled sample proportion is p = (12 + 10)/(92 + 80) = 0.128.  Since mp1 = 12,
  m(1 − p1) = 80, np2 = 10, n(1 − p2) = 70 are all at least 5, the normal approximation is appropriate.

  STEP 1. Null Hypothesis:           H0: P1 − P2 = 0

  STEP 2. Alternative Hypothesis:    HA: P1 − P2 > 0 (upper-tail test)

  STEP 3. Test Statistic:            z0 = (p1 − p2) / √( p(1 − p)·(1/m + 1/n) )
                                        = (0.130 − 0.125) / √( 0.128·(1 − 0.128)·(1/92 + 1/80) ) = 0.10

  STEP 4. a)  Critical Value:        Using Table A-1, z_{1−α} = z0.90 = 1.282

  STEP 4. b)  p-value:               Using Table A-1, P(Z > z0) = P(Z > 0.10) = 1 − 0.5398 = 0.4602

  STEP 5. a)  Conclusion:            Since test statistic = 0.10 < 1.282 = critical value, we fail to reject the null
                                     hypothesis of no difference between population proportions.

  STEP 5. b)  Conclusion:            Since p-value = 0.4602 > 0.10 = significance level, we fail to reject the
                                     null hypothesis of no difference between population proportions.

  STEP 6. Since the null hypothesis was not rejected and only one false acceptance error rate (β = 0.05 at a
          difference of P1 − P2 = 0.1) has been specified, it is possible to calculate the sample sizes that
          achieve the DQOs.  So m and n must each be at least

               2·(1.282 + 1.645)²·0.1275·(1 − 0.1275) / 0.1² = 190.6,

          since P̄ = (0.130 + 0.125)/2 = 0.1275.  As both m and n are less than 190.6, the false acceptance
          error rate has not been satisfied.  Therefore, the null hypothesis was not rejected, but the sample
          sizes were not large enough to ensure adequate power in the test.
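The z statistic and Step 6 sample size of Box 3-28 can be recomputed from the raw counts. A sketch assuming scipy; note the box rounds the proportions before computing z0, so unrounded arithmetic gives z0 ≈ 0.106 rather than exactly 0.10, and a slightly larger Step 6 minimum:

```python
import math
from scipy.stats import norm

# Counts from Box 3-28: 12 of 92 probes contaminated in area 1,
# 10 of 80 in the reference area (area 2)
m, n = 92, 80
p1, p2 = 12/92, 10/80
p_pool = (12 + 10) / (m + n)

z0 = (p1 - p2 - 0) / math.sqrt(p_pool*(1 - p_pool)*(1/m + 1/n))
p_value = norm.sf(z0)                # upper-tail p-value

# Step 6 check: minimum m and n (alpha = 0.10, beta = 0.05, difference 0.1)
p_bar = (p1 + p2) / 2
n_min = 2*(norm.ppf(0.90) + norm.ppf(0.95))**2 * p_bar*(1 - p_bar) / 0.1**2
print(round(z0, 3), round(p_value, 3), round(n_min, 1))
```

Either way the conclusion is unchanged: z0 is far below the 1.282 critical value, and both sample sizes fall well short of the roughly 191 observations per group needed.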
                                Box 3-29:  Directions for the Paired t-Test

 COMPUTATIONS:  Compute the differences di = yi − xi for i = 1, ..., n and the sample mean, D̄, and sample
                standard deviation, sD, of the differences.

 STEP 1.  Null Hypothesis:           H0: μD = δ0

 STEP 2.  Alternative Hypothesis:    i)   HA: μD > δ0 (upper-tail test)
                                     ii)  HA: μD < δ0 (lower-tail test)
                                     iii) HA: μD ≠ δ0 (two-tail test)

 STEP 3.  Test Statistic:            t0 = (D̄ − δ0) / (sD/√n)

 STEP 4.  a) Critical Value:         Use Table A-2 to find:
                                     i) t_{n−1, 1−α}   ii) t_{n−1, 1−α}   iii) t_{n−1, 1−α/2}

 STEP 4.  b) p-value:                Use Table A-2 to find:
                                     i)   P(t_{n−1} > t0)
                                     ii)  P(t_{n−1} < t0)
                                     iii) 2·P(t_{n−1} > |t0|)

 STEP 5.  a) Conclusion:             i)   If t0 > t_{n−1, 1−α}, then reject the null hypothesis that the
                                          difference between the population means is equal to δ0.
                                     ii)  If t0 < −t_{n−1, 1−α}, then reject the null hypothesis.
                                     iii) If |t0| > t_{n−1, 1−α/2}, then reject the null hypothesis.

 STEP 5.  b) Conclusion:             If p-value < α, then reject the null hypothesis that the difference
                                     between the population means is equal to δ0.
 STEP 6.  If the null hypothesis was not rejected, there is only one false acceptance error rate (/?at /*i), and
         n >
              — -^— - - ~
                                 1~"
                                     , then the sample size was probably large enough to achieve the DQOs.
          The value of a' is a for a one-sided test and  a/2  for a two-sided test.
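The steps of Box 3-29 can be sketched in Python. The data below are the nine pairs analyzed in Box 3-31, and the critical value 1.860 (t_{8,0.95}) is taken from Table A-2 rather than computed:

```python
import math
import statistics

# Paired t-test sketch (Box 3-29), lower-tail alternative HA: mu_D < 0.
x = [178, 52, 161, 245, 164, 184, 157, 308, 130]
y = [92, 67, 62, 206, 106, 126, 108, 314, 126]

d = [yi - xi for xi, yi in zip(x, y)]         # differences d_i = y_i - x_i
n = len(d)
d_bar = statistics.mean(d)                    # sample mean of the differences
s_d = statistics.stdev(d)                     # sample standard deviation

delta0 = 0                                    # hypothesized difference
t0 = (d_bar - delta0) / (s_d / math.sqrt(n))  # Step 3 test statistic

t_crit = 1.860                                # t_{8,0.95} from Table A-2
reject = t0 < -t_crit                         # Step 5a, lower-tail test
print(round(t0, 2), reject)
```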
                    Box 3-30:  Directions for Computing the Paired t Confidence Interval

 COMPUTATIONS:  Compute the differences d_i = y_i - x_i for i = 1, ..., n and the sample mean, D, and sample
                    standard deviation, s_D, of the differences.

 A 100(1 - α)% confidence interval for μ_D is D ± t_{n-1, 1-α/2} · s_D/√n, where Table A-2 is used to find t_{n-1, 1-α/2}.
                              Box 3-31:  An Example of the Paired t-Test

  Consider the following 9 pairs of data points (in ppb):

     x_i:  178   52  161  245  164  184  157  308  130
     y_i:   92   67   62  206  106  126  108  314  126

  This data will be used to test the null hypothesis that there is no difference in means versus the alternative that
  the mean of the first population is greater than the mean of the second population.  The decision maker has
  specified a 5% false rejection error rate and a 10% false acceptance error rate at a difference of 50 ppb.

  COMPUTATIONS:  The differences d_i = y_i - x_i are:

     d_i:  -86   15  -99  -39  -58  -58  -49    6   -4

  The sample mean of the differences is -41.33 and the sample standard deviation is 39.92.

  STEP 1.  Null Hypothesis:           H0:  μ_D = 0

  STEP 2.  Alternative Hypothesis:    HA:  μ_D < 0  (lower-tail test)

  STEP 3.  Test Statistic:            t0 = (D - δ0)/(s_D/√n) = (-41.33 - 0)/(39.92/√9) = -3.11

  STEP 4. a) Critical Value:          Using Table A-2, -t_{n-1, 1-α} = -t_{8, 0.95} = -1.860.

  STEP 4. b) p-value:                 Using Table A-2, 0.005 < p-value < 0.01.  (The exact
                                      p-value is P(t8 < -3.11) ≈ 0.007.)

  STEP 5. a) Conclusion:              Since test statistic = -3.11 < -1.860 = critical value, we reject the null
                                      hypothesis of no difference between the population means.

  STEP 5. b) Conclusion:              Since p-value < 0.05 = significance level, we reject the null hypothesis.
3.3.2.1.1   The Wilcoxon Rank Sum Test

Data:  A random sample x1, x2, ..., xm from one population, and an independent random
sample y1, y2, ..., yn from the second population.

Assumptions:  The validity of the random sampling and independence assumptions should be
verified by review of the procedures used to select the sampling points.  The two underlying
distributions are assumed to have approximately the same shape (variance), so that the only
difference between them is a shift in location.  A qualitative check of this assumption can be
made by comparing histograms.

Limitations and Robustness: The Wilcoxon rank sum test may produce misleading results if
there are many tied data values.  When many ties are present, their relative ranks are the same,
and this has the effect of diluting the statistical power of the Wilcoxon test. If possible, results
should be recorded with sufficient accuracy so that a large number of tied values do not occur.
Estimated concentrations should be reported for data below the detection limit, even if these
estimates are negative, as their relative magnitude to the rest of the data is of importance. If this
is not possible, substitute the value DL/2 for each value below the detection limit providing all
the data have the same detection limit. When different detection limits are present, all data could
be censored at the highest detection limit but this will substantially weaken the test. A
statistician should be consulted on the potential use of Gehan ranking.

Directions for the Wilcoxon Rank Sum test are given in Box 3-32, with an example in Box 3-33.
Directions for the large sample approximation for the Wilcoxon Rank Sum test are given in
the example in Box 3-34.
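The ranking-with-midranks computation that underlies the test can be sketched as follows; the data are the Area 1 / Area 2 measurements worked through in Box 3-33:

```python
# Wilcoxon Rank Sum sketch: pool the samples, assign midranks to ties,
# sum the ranks of sample 1, and form W0 = R1 - m(m+1)/2.
def midranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2
        i = j + 1
    return ranks

area1 = [17, 23, 26, 5, 13, 13, 12]     # first population (m = 7)
area2 = [16, 20, 5, 4, 8, 10, 7, 3]     # second population (n = 8)
m = len(area1)
r = midranks(area1 + area2)
R1 = sum(r[:m])                         # rank sum of the first sample
W0 = R1 - m * (m + 1) / 2
print(R1, W0)                           # 71.5 and 43.5, matching Box 3-33
```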

3.3.2.1.2   The Quantile Test

Purpose: Test for a shift to the right in the right-tail of population 1 versus population 2.  This
may be regarded as equivalent to detecting whether the values in the right-tail of the population 1
distribution are generally larger than the values in the right-tail of the population 2 distribution.

Data: A simple or systematic random sample, x1, x2, ..., xn, from the site population and an
independent simple or systematic random sample, y1, y2, ..., ym, from the background
population.

Assumptions:  The validity of the random sampling and independence assumptions is assured by
using proper randomization procedures, which can be verified by reviewing the procedures used
to select the sampling points.

Limitations and Robustness:  Since the Quantile test focuses on the right-tail, large outliers will
bias  results. Also, the Quantile test says nothing about the center of the two distributions.
Therefore, this test should be used in combination with a location test like the t-test (if the data
are normally distributed) or the Wilcoxon Rank Sum test.
                           Box 3-32:  Directions for the Wilcoxon Rank Sum Test

  COMPUTATIONS:  Rank the pooled data from smallest to largest, assigning the average rank to ties.  Sum the ranks
  of the first population and denote this by R1.  Then compute W0 = R1 - m(m+1)/2.

  STEP 1. Null Hypothesis:          H0:  μX - μY = 0  (no difference between population means)

  STEP 2. Alternative Hypothesis:   i)   HA:  μX - μY > 0  (upper-tail test)
                                    ii)  HA:  μX - μY < 0  (lower-tail test)
                                    iii) HA:  μX - μY ≠ 0  (two-tail test)

  STEP 3. Test Statistic:           If m and n are at most 20, then the test statistic is W0.

                                    If m and n are greater than 20, then compute

                                         z0 = (W0 - mn/2) / √var(W0),

                                    where var(W0) = mn(m+n+1)/12 - [mn / (12(m+n)(m+n-1))] · Σ_{j=1}^{g} t_j(t_j-1)(t_j+1),
                                    g is the number of tied groups, and t_j is the number of ties in the j-th group.

  STEP 4. a) Critical Value:        If m and n ≤ 20, use Table A-8 to find:    If m and n > 20, use Table A-1 to find:
                                    i)   mn - w_α                              i)   z_{1-α}
                                    ii)  w_α                                   ii)  z_α
                                    iii) mn - w_{α/2} and w_{α/2}              iii) z_{1-α/2}

  STEP 4. b) p-value:               If m and n ≤ 20, use Table A-8 to find:    If m and n > 20, use Table A-1 to find:
                                    i)   P(W ≥ W0)                             i)   P(Z > z0)
                                    ii)  P(W ≤ W0)                             ii)  P(Z < z0)
                                    iii) 2·min{P(W ≥ W0), P(W ≤ W0)}           iii) 2·P(Z > |z0|)

  STEP 5. a) Conclusion:            If m and n are at most 20, then
                                    i)   If W0 > mn - w_α, then reject the null hypothesis.
                                    ii)  If W0 < w_α, then reject the null hypothesis.
                                    iii) If W0 > mn - w_{α/2} or W0 < w_{α/2}, then reject the null hypothesis.

                                    If m and n are greater than 20, then
                                    i)   If z0 > z_{1-α}, then reject the null hypothesis of no difference
                                         between population means.
                                    ii)  If z0 < z_α, then reject the null hypothesis.
                                    iii) If |z0| > z_{1-α/2}, then reject the null hypothesis.

  STEP 5. b) Conclusion:            If p-value < significance level, then reject the null hypothesis.

  STEP 6. If the null hypothesis was not rejected, then the sample sizes necessary to achieve the DQOs should be
          computed.  If the sample sizes are large and only one false acceptance error rate (β at δ1) has been
          specified, then the false acceptance error rate has probably been satisfied if both m and n are at least

               1.16 · [2s²(z_{1-α'} + z_{1-β})² / δ1² + (1/4)z²_{1-α'}].

  NOTE:  The value of α' is α for a one-sided test and α/2 for a two-sided test.  The large sample normal
         approximation is adequate as long as min(m, n) > 10.
                           Box 3-33: An Example of the Wilcoxon Rank Sum Test

  At a hazardous waste site, an area cleaned (area 1) was compared with a relatively uncontaminated reference area
  (area 2). If the methodology worked, then the two sites should be approximately equal in average contaminant
  levels.  If the methodology did not work, then area 1 should have a higher average than the reference area. The false
  rejection error rate was set at 10% (α) and the false acceptance error rate was set at 20% (β) if the difference
  between the areas is 2.5 ppb.  Seven random samples were taken from area 1 and 8 samples were taken from area
  2:

  Area 1:  17, 23, 26, 5, 13, 13, 12
  Area 2:  16, 20, 5, 4, 8, 10, 7, 3

  COMPUTATIONS:  The ordered pooled data and their ranks are (Area 1 denoted by *):

     Pooled data:   3    4    5*    5    7    8   10   12*   13*   13*   16   17*   20   23*   26*
     Rank:          1    2   3.5*  3.5   5    6    7    8*   9.5*  9.5*  11   12*   13   14*   15*

  The sum of the ranks of area 1 is R1 = 3.5 + 8 + 9.5 + 9.5 + 12 + 14 + 15 = 71.5 and W0 = 71.5 - 7(7 + 1)/2 = 43.5.

  STEP 1. Null Hypothesis:          H0:  μX - μY = 0

  STEP 2. Alternative Hypothesis:   HA:  μX - μY > 0

  STEP 3. Test Statistic:           Since m and n are less than 20, the test statistic is W0 = 43.5.

  STEP 4. a) Critical Value:        Using Table A-8, mn - w0.10 = 56 - 16 = 40.

  STEP 4. b) p-value:               Using Table A-8, p-value < 0.10.

  STEP 5. a) Conclusion:            Since test statistic = 43.5 > 40 = critical value, we reject the null hypothesis of no
                                    difference between population means.

  STEP 5. b) Conclusion:            Since p-value < 0.10 = significance level, we reject the null hypothesis.
Directions for the Quantile test are contained in Box 3-35, with an example in Box 3-36.

3.3.2.1.3  The Slippage Test

Purpose: Test for a shift to the right in the extreme right-tail of population 1 (site) versus
population 2 (background).  This is equivalent to asking whether several of the largest values of
the site distribution are larger than the maximum value of the background distribution.

Data: A simple or systematic random sample, x1, x2, ..., xm, from the site population and an
independent simple or systematic random sample, y1, y2, ..., yn, from the background
population.
                      Box 3-34:  A Large Sample Example of the Wilcoxon Rank Sum Test

 Arsenic concentrations (in ppm) from a site are to be compared to a reference area. The null hypothesis is that the
 means of the two areas are equal versus the upper-tail alternative. The false rejection error rate was set at 5% (α) and
 the false acceptance error rate was set at 20% (β) if the difference between the areas is 2.5 ppm.

 Site concentrations (m = 22): 11.2, 11.3, 12.2, 13.2, 14.2, 15.9, 16.3, 17.1, 18.6, 19.2, 21.5,
                             22.3, 22.4, 22.7, 22.8, 23.3, 24.1, 25.8, 30.2, 30.7, 31.4, 37.1

 Background concentrations (n = 21): 6.1, 8.5, 11.1, 11.3, 12.6, 12.8, 13.6, 15.0, 15.2, 15.3, 16.1,
                                    16.2, 17.0, 17.1, 17.6, 19.2, 19.2, 19.6, 21.1, 22.2, 25.0

 COMPUTATIONS:  The ordered pooled data and their ranks are (site concentrations are denoted by *):
     Value   Rank      Value   Rank      Value   Rank      Value   Rank
      6.1      1       14.2*   12*       17.6    23        22.7*   34*
      8.5      2       15.0    13        18.6*   24*       22.8*   35*
     11.1      3       15.2    14        19.2    26        23.3*   36*
     11.2*     4*      15.3    15        19.2    26        24.1*   37*
     11.3      5.5     15.9*   16*       19.2*   26*       25.0    38
     11.3*     5.5*    16.1    17        19.6    28        25.8*   39*
     12.2*     7*      16.2    18        21.1    29        30.2*   40*
     12.6      8       16.3*   19*       21.5*   30*       30.7*   41*
     12.8      9       17.0    20        22.2    31        31.4*   42*
     13.2*    10*      17.1    21.5      22.3*   32*       37.1*   43*
     13.6     11       17.1*   21.5*     22.4*   33*
 The sum of the site ranks is

      R1 = 4 + 5.5 + 7 + 10 + 12 + 16 + 19 + 21.5 + 24 + 26 + 30 + 32 + 33 + 34 + 35 + 36 + 37 + 39 + 40 + 41 + 42 + 43 = 587

 and W0 = 587 - 22(22 + 1)/2 = 334.

 STEP 1. Null Hypothesis:         H0:  μX - μY = 0

 STEP 2. Alternative Hypothesis:  HA:  μX - μY > 0

 STEP 3. Test Statistic:          Since m and n are greater than 20, the test statistic is
                                  z0 = (W0 - mn/2) / √var(W0).  To compute var(W0), we need to determine
                                  the number of tied groups and the number of values in each group.  There
                                  are 3 tied groups, so g = 3.  The numbers of tied values in the groups are
                                  2 (at 11.3), 2 (at 17.1), and 3 (at 19.2).  Therefore,

                                  var(W0) = [22·21·(22 + 21 + 1)] / 12
                                            - [22·21 / (12·(22 + 21)·(22 + 21 - 1))] · [2·1·3 + 2·1·3 + 3·2·4]
                                          = 1694 - 0.77 = 1693.23.

                                  Finally, z0 = (334 - 22·21/2) / √1693.23 = 2.50.

 STEP 4. a) Critical Value:       Using Table A-1, z0.95 = 1.645.

 STEP 4. b) p-value:              Using Table A-1, P(Z > 2.50) = 1 - 0.9938 = 0.0062.

 STEP 5. a) Conclusion:           Since test statistic = 2.50 > 1.645 = critical value, we reject the null hypothesis of no
                                  difference between population means.

 STEP 5. b) Conclusion:           Since p-value = 0.0062 < 0.05 = significance level, we reject the null hypothesis of no
                                  difference between population means.
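The tie-corrected variance and z statistic worked out above can be reproduced directly:

```python
import math

# Large-sample Wilcoxon Rank Sum statistic with the tie correction of Box 3-32:
# var(W0) = mn(m+n+1)/12 - [mn / (12(m+n)(m+n-1))] * sum of t(t-1)(t+1).
m, n = 22, 21
W0 = 334.0
ties = [2, 2, 3]            # sizes of the tied groups (at 11.3, 17.1, 19.2)

tie_term = sum(t * (t - 1) * (t + 1) for t in ties)
var_W0 = m * n * (m + n + 1) / 12 - m * n * tie_term / (12 * (m + n) * (m + n - 1))
z0 = (W0 - m * n / 2) / math.sqrt(var_W0)
print(f"{var_W0:.2f} {z0:.2f}")   # 1693.23 2.50, matching Box 3-34
```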
                                 Box 3-35:  Directions for the Quantile Test

   COMPUTATIONS: Select a quantile, 0.5 < b1 < 1.  Rank the pooled data from smallest to largest.  Find the
   number of pooled data points larger than the b1-th quantile,

        c = m + n - floor{(m + n - 1)·b1} - 1, where floor means to calculate the value and discard all decimals.

   STEP 1. Null Hypothesis:          H0:  The right-tails of the two population distributions are the same.

   STEP 2. Alternative Hypothesis:   HA:  The right-tail of the distribution of Population 1 (site) is shifted to the
                                          right, i.e., the values in the right-tail of the site distribution are larger
                                          than the values in the right-tail of the background distribution.

   STEP 3. Test Statistic:           s = number of site samples greater than the b1-th quantile.

   STEP 4. a) Critical Value:        If m and n are both at most 20, then use Table A-19 to find q_α.

   STEP 4. b) p-value:               If m and n are both at least 15, then use Table A-1 to find the p-value
                                     P(Z > z0), where z0 = (s - μ)/σ is computed from the hypergeometric
                                     mean μ = nc/(m + n) and variance σ² = mnc(m + n - c) / [(m + n)²(m + n - 1)],
                                     with n denoting the number of site samples.

   STEP 5. a) Conclusion:            If s ≥ q_α, then reject the null hypothesis that the right-tails of the two
                                     population distributions are the same.

   STEP 5. b) Conclusion:            If p-value < α, then reject the null hypothesis that the right-tails of the two
                                     population distributions are the same.
                                 Box 3-36: An Example of the Quantile Test

  At a hazardous waste site a new, cheaper, in-situ methodology was compared against an existing methodology
  by remediating separate areas of the site using each method.  It will be assumed that the new methodology
  works as well as the old, and we will test for evidence that the new method leaves higher concentrations.  A
  Quantile test with a significance level of 0.05 will be used to make this determination based on 12 samples from
  the area remediated using the new methodology and 7 samples from the area remediated using the standard
  methodology.

   New Methodology:        7, 18, 2, 4, 6, 11, 5, 9, 10, 2, 3, 3
   Standard Methodology:   17, 8, 20, 4, 6, 5, 4

   COMPUTATIONS: The 0.75 quantile is selected.  The ranked pooled data set is:

                     2*, 2*, 3*, 3*, 4, 4, 4*, 5*, 5, 6, 6*, 7*, 8, 9*, 10*, 11*, 17, 18*, 20,

   where * denotes samples from the new methodology portion of the site.  The number of values larger than the
   0.75 quantile is:

                    c = m + n - floor{(m + n - 1)·b1} - 1 = 7 + 12 - floor{(7 + 12 - 1)·0.75} - 1 = 5
  STEP 1. Null Hypothesis:          H0:  The right-tails of the two population distributions are the same.

  STEP 2. Alternative Hypothesis:   HA:  The right-tail of the distribution of Population 1 (site) is shifted to the
                                         right, i.e., the values in the right-tail of the site distribution are larger
                                         than the values in the right-tail of the background distribution.

  STEP 3. Test Statistic:           s = number of site samples greater than the 0.75 quantile = 3

  STEP 4. a) Critical Value:        Since m and n are both less than 20, we use Table A-19 to find q0.10 = 5.

  STEP 4. b) p-value:               Since m and n are both less than 15, we cannot use the normal
                                    approximation for the p-value.

  STEP 5. a) Conclusion:            Since s = 3 < 5 = q0.10, we fail to reject the null hypothesis that the right-tails
                                    of the two population distributions are the same.
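The counts c and s from Box 3-36 can be reproduced with a short sketch (the labels "new" and "std" are this sketch's own naming, not notation from the box):

```python
import math

# Quantile test counts (Box 3-35/3-36): c is the number of pooled values
# above the b1 quantile, and s counts how many of those come from the site.
new = [7, 18, 2, 4, 6, 11, 5, 9, 10, 2, 3, 3]   # site (new methodology), m = 12
std = [17, 8, 20, 4, 6, 5, 4]                    # standard methodology, n = 7
m, n = len(new), len(std)
b1 = 0.75

c = m + n - math.floor((m + n - 1) * b1) - 1
pooled = sorted([(v, "new") for v in new] + [(v, "std") for v in std])
top_c = pooled[-c:]                              # the c largest pooled values
s = sum(1 for _, src in top_c if src == "new")
print(c, s)                                      # 5 and 3, as in Box 3-36
```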
Assumptions:  The validity of the random sampling and independence assumptions is assured by
using proper randomization procedures, which can be verified by reviewing the procedures used
to select the sampling points.

Limitations and Robustness: Since the Slippage test focuses on the right-tail, large outliers will
bias results.  Also, the Slippage test says nothing about the center of the two distributions.
Therefore, this test should be used in combination with a location test like the t-test (if the data
are normally distributed) or the Wilcoxon Rank Sum test.

Directions for the slippage test are contained in Box 3-37, with an example in Box 3-38.
                             Box 3-37:  Directions for the Slippage Test

   COMPUTATIONS: Let x1,...,xm be the site data and y1,...,yn be the background data.  Order the samples
   separately.

   STEP 1. Null Hypothesis:          H0:  The right-tails of the two population distributions are the same.

   STEP 2. Alternative Hypothesis:   HA:  The extreme right-tail of the distribution of Population 1 (site) is
                                          shifted to the right, i.e., the largest values of the site distribution are
                                          larger than the largest values of the background distribution.

   STEP 3. Test Statistic:           s = number of site values greater than the maximum background value.

   STEP 4. a) Critical Value:        Use Table A-14 to find S_α.  Note that α can be 0.01, 0.05, or 0.10.

   STEP 4. b) p-value:               p-value = 1 - [n·m! / (m + n)!] · Σ_{i=0}^{s-1} (m + n - i - 1)! / (m - i)!,

                                     where n! = n×(n-1)×(n-2)×(n-3)×...×3×2×1.

   STEP 5. a) Conclusion:            If s > S_α, then reject the null hypothesis that the right-tails of the two
                                     population distributions are the same.

   STEP 5. b) Conclusion:            If p-value < α, then reject the null hypothesis that the right-tails of the two
                                     population distributions are the same.
3.3.2.2   Paired Samples

       Recall that the observations from paired populations are correlated.  The general setting
involves taking two measurements on one group of sampling units at separate instances; for
example, measurements before and after clean-up, or two labs making separate measurements on
a single set of objects.

3.3.2.2.1   The Sign Test

Purpose: Test for a difference between the medians of two paired populations.  This test is very
similar to the one-sample version presented in Section 3.2.2.1.
                              Box 3-38: An Example of the Slippage Test

  At a hazardous waste site a new, cheaper, in-situ methodology was compared against an existing methodology
  by remediating separate areas of the site using each method.  It will be assumed that the new methodology
  works as well as the old, and we will test for evidence that the new method leaves higher concentrations.  The
  Slippage test with a significance level of 0.05 will be used to make this determination based on 12 samples
  from the area remediated using the new methodology and 7 samples from the area remediated using the
  standard methodology.

  New Methodology:        7, 18, 2, 4, 6, 11, 5, 9, 10, 2, 3, 3
  Standard Methodology:   17, 8, 20, 4, 6, 5, 4

  COMPUTATIONS: The ordered samples are:

  New Methodology:        2, 2, 3, 3, 4, 5, 6, 7, 9, 10, 11, 18
  Standard Methodology:   4, 4, 5, 6, 8, 17, 20

  STEP 1. Null Hypothesis:          H0:  The right-tails of the two population distributions are the same.

  STEP 2. Alternative Hypothesis:   HA:  The extreme right-tail of the distribution of Population 1 (site) is
                                         shifted to the right, i.e., the largest values of the site distribution are
                                         larger than the largest values of the background distribution.

  STEP 3. Test Statistic:           s = number of site values greater than the maximum background value = 0.

  STEP 4. a) Critical Value:        Using Table A-14, S0.05 = 6.

  STEP 4. b) p-value:               p-value = 1 - [n·m! / (m + n)!] · Σ_{i=0}^{s-1} (m + n - i - 1)! / (m - i)!;
                                    with s = 0 the sum is empty, so p-value = 1 - 0 = 1.

  STEP 5. a) Conclusion:            Since test statistic = 0 < 6 = S0.05, we fail to reject the null hypothesis that
                                    the right-tails of the two population distributions are the same.

  STEP 5. b) Conclusion:            Since p-value = 1 > 0.05 = significance level, we fail to reject the null
                                    hypothesis that the right-tails of the two population distributions are the
                                    same.
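A sketch reproducing the Box 3-38 computations; the p-value expression is the factorial formula of Box 3-37, written so that the sum is empty (and the p-value is 1) when s = 0:

```python
from math import factorial

# Slippage test sketch: s counts site values exceeding the maximum
# background value; the p-value follows the factorial formula of Box 3-37.
new = [7, 18, 2, 4, 6, 11, 5, 9, 10, 2, 3, 3]   # site (new methodology)
std = [17, 8, 20, 4, 6, 5, 4]                    # background (standard)
m, n = len(new), len(std)

s = sum(1 for v in new if v > max(std))
p_value = 1 - (n * factorial(m) / factorial(m + n)) * sum(
    factorial(m + n - i - 1) / factorial(m - i) for i in range(s)
)
print(s, p_value)                                # 0 and 1.0, as in Box 3-38
```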
Data:  Two paired data sets x1,...,xn and y1,...,yn selected randomly or systematically.

Assumptions:  The Sign test can be used no matter what the underlying distributions may be.

Limitations and Robustness:  The Sign test has less power than the two-sample t-test or the
Wilcoxon signed rank test.  However, unlike the other two tests, the Sign test makes no
distributional assumptions, and it can handle non-detects (if the detection limit is below the
threshold).

Directions for the paired  populations sign test are contained in Box 3-39, with an example in
Box 3-40.

3.3.2.2.2   The Wilcoxon Signed Rank Test

Purpose:  Test for a difference between the locations (means or medians) of two paired
populations.  This test is very similar to the one sample version presented in Section 3.2.2.2.

                          Box 3-39:  Directions for the Sign Test (Paired Samples)

  COMPUTATIONS:  Compute the deviations d_i = x_i - y_i.  If any of the deviations are zero, delete them and
  correspondingly reduce the sample size.  Finally, compute B = number of differences greater than zero.

  STEP 1. Null Hypothesis:          H0:  median 1 = median 2

  STEP 2. Alternative Hypothesis:   i)   HA:  median 1 > median 2  (upper-tail test)
                                    ii)  HA:  median 1 < median 2  (lower-tail test)
                                    iii) HA:  median 1 ≠ median 2  (two-tail test)

  STEP 3. Test Statistic:           If n ≤ 20, then the test statistic is B.

                                    If n > 20, then the test statistic is z0 = (B - n/2) / (√n/2).

  STEP 4. a) Critical Value:        If n ≤ 20, then use Table A-18 to find:   If n > 20, then use Table A-1 to find:
                                    i)   B_upper(n, 2α)                       i)   z_{1-α}
                                    ii)  B_lower(n, 2α) - 1                   ii)  z_α
                                    iii) B_upper(n, α) and B_lower(n, α)      iii) z_{1-α/2}

  STEP 4. b) p-value:               If n ≤ 20, then let p(i) = [n! / ((n - i)!·i!)]·(1/2)^n and find:
                                    i)   Σ_{i=B}^{n} p(i)
                                    ii)  Σ_{i=0}^{B} p(i)
                                    iii) 2·min{Σ_{i=B}^{n} p(i), Σ_{i=0}^{B} p(i)}

                                    If n > 20, then use Table A-1 to find:
                                    i)   P(Z > z0)
                                    ii)  P(Z < z0)
                                    iii) 2·P(Z > |z0|)

  STEP 5. a) Conclusion:            If n ≤ 20, then
                                    i)   If B > B_upper(n, 2α), then reject the null hypothesis that the
                                         two population medians are equal.
                                    ii)  If B < B_lower(n, 2α) - 1, then reject the null hypothesis.
                                    iii) If B > B_upper(n, α) or B < B_lower(n, α) - 1, then reject the null
                                         hypothesis.

                                    If n > 20, then
                                    i)   If z0 > z_{1-α}, then reject the null hypothesis that the two
                                         population medians are equal.
                                    ii)  If z0 < z_α, then reject the null hypothesis.
                                    iii) If |z0| > z_{1-α/2}, then reject the null hypothesis.

  STEP 5. b) Conclusion:            If p-value < α, then reject the null hypothesis that the two population
                                    medians are equal.
                       Box 3-40: An Example of the Sign Test (Paired Samples)

  Consider the following 9 pairs of data points (in ppb):

     x_i:  178   52  161  245  164  184  157  308  130
     y_i:   92   67   62  206  106  126  108  314  126

  This data will be used to test the null hypothesis that there is no difference in medians versus the alternative
  that the median of the first population is greater than the median of the second population.  The decision maker
  has specified a 5% false rejection error rate and a 10% false acceptance error rate at a difference of 50 ppb.

  COMPUTATIONS: The deviations d_i = x_i - y_i are:

     d_i:   86  -15   99   39   58   58   49   -6    4

  Therefore, B = the number of differences greater than zero = 7.

  STEP 1. Null Hypothesis:          H0:  median 1 = median 2

  STEP 2. Alternative Hypothesis:   HA:  median 1 > median 2

  STEP 3. Test Statistic:           Since n ≤ 20, the test statistic is B = 7.

  STEP 4. a) Critical Value:        Since n ≤ 20, Table A-18 is used to find B_upper(n, 2α) = 8.

  STEP 4. b) p-value:               Since n ≤ 20, p-value = Σ_{i=7}^{9} [9! / ((9 - i)!·i!)]·(1/2)^9 = 46/512 = 0.0898.

  STEP 5. a) Conclusion:            Since test statistic = 7 < 8 = critical value, we fail to reject H0.

  STEP 5. b) Conclusion:            Since p-value = 0.0898 > 0.05 = significance level, we fail to reject H0.
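The sign test count and binomial p-value of Box 3-40 can be verified in a few lines:

```python
from math import comb

# Sign test sketch (Box 3-39/3-40): B counts positive differences
# d_i = x_i - y_i, and the upper-tail p-value is a binomial(n, 1/2) tail sum.
x = [178, 52, 161, 245, 164, 184, 157, 308, 130]
y = [92, 67, 62, 206, 106, 126, 108, 314, 126]

d = [xi - yi for xi, yi in zip(x, y) if xi != yi]  # drop zero differences
n = len(d)
B = sum(1 for di in d if di > 0)

p_value = sum(comb(n, i) for i in range(B, n + 1)) / 2 ** n
print(B, round(p_value, 4))                        # 7 and 0.0898, as in Box 3-40
```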
Data:  Two paired data sets x1,...,xn and y1,...,yn selected randomly or systematically.

Assumptions:  The data sets come from approximately symmetric distributions.

Limitations and Robustness: For large sample sizes (n > 50), the paired t-test is more robust to
violations of its assumptions than the Wilcoxon signed rank test. For small sample sizes, if the
data are not approximately symmetric or normally distributed, the sign test should be used.

The Wilcoxon signed rank test may produce misleading results if there are many tied data values.
 Ties have the effect of diluting the statistical power of the Wilcoxon test. If possible, results
should be recorded with sufficient accuracy so that a large number of tied values do not occur.
Estimated concentrations should be reported for data below the detection limit, even if these
estimates are negative, as their relative magnitude to the rest of the data is of importance. If this
is not possible, substitute the value DL/2 for each value below the detection limit providing all
the data have the same detection limit. When different detection limits are present, all data could
be censored at the highest detection limit but this will substantially weaken the test.  A
statistician should be consulted on the potential use of Gehan ranking.
        Directions for the paired sample Wilcoxon signed rank test are contained in Box 3-41,
with an example in Box 3-42.  Directions for the large sample version of the test are contained in
the example in Box 3-43.
                 Box 3-41:  Directions for the Wilcoxon Signed Rank Test (Paired Samples)

  COMPUTATIONS:  Compute the deviations d_i = x_i - y_i.  If any of the deviations are zero, delete them and
  correspondingly reduce the sample size.  Rank the absolute deviations, |d_i|, from smallest to largest.  If there
  are tied observations, then assign the average rank.  Let R_i be the signed rank of |d_i|, where the sign of R_i is
  determined by the sign of d_i.

  STEP 1. Null Hypothesis:          H0:  location 1 = location 2

  STEP 2. Alternative Hypothesis:   i)   HA:  location 1 > location 2  (upper-tail test)
                                    ii)  HA:  location 1 < location 2  (lower-tail test)
                                    iii) HA:  location 1 ≠ location 2  (two-tail test)

  STEP 3. Test Statistic:           If n ≤ 20, then T+ = Σ_{R_i > 0} R_i, the sum of the positive signed ranks.

                                    If n > 20, then z0 = (T+ - n(n+1)/4) / √var(T+),

                                    where var(T+) = n(n+1)(2n+1)/24 - (1/48)·Σ_{j=1}^{g} t_j(t_j-1)(t_j+1),
                                    g is the number of tied d_i groups, and t_j is the size of group j.

  STEP 4. a) Critical Value:        If n ≤ 20, use Table A-7 to find:       If n > 20, use Table A-1 to find:
                                    i)   n(n+1)/2 - w_α                     i)   z_{1-α}
                                    ii)  w_α                                ii)  z_α
                                    iii) n(n+1)/2 - w_{α/2} and w_{α/2}     iii) z_{1-α/2}

  STEP 4. b) p-value:               If n ≤ 20, use Table A-7 to find:       If n > 20, use Table A-1 to find:
                                    i)   P(T ≥ T+)                          i)   P(Z > z0)
                                    ii)  P(T ≤ T+)                          ii)  P(Z < z0)
                                    iii) 2·min{P(T ≥ T+), P(T ≤ T+)}        iii) 2·P(Z > |z0|)

  STEP 5. a) Conclusion:            If n ≤ 20,
                                    i)   If T+ ≥ n(n+1)/2 - w_α, then reject the null hypothesis.
                                    ii)  If T+ ≤ w_α, then reject the null hypothesis.
                                    iii) If T+ ≥ n(n+1)/2 - w_{α/2} or T+ ≤ w_{α/2}, then reject the null
                                         hypothesis.

                                    If n > 20,
                                    i)   If z0 > z_{1-α}, then reject the null hypothesis.
                                    ii)  If z0 < z_α, then reject the null hypothesis.
                                    iii) If |z0| > z_{1-α/2}, then reject the null hypothesis.

  STEP 5. b) Conclusion:            If p-value < α, then reject the null hypothesis that the two population
                                    locations are equal.

  STEP 6.  If the null hypothesis was not rejected, then the sample sizes necessary to achieve the DQOs
           should be computed.  If the sample size is large, only one false acceptance error rate (β at δ1) has
           been specified, and

                n ≥ 1.16·[s_D²(z_{1-α'} + z_{1-β})² / δ1² + (1/2)z²_{1-α'}],

           then the false acceptance error rate has probably been satisfied.  The value of α' is α for a one-
           sided test and α/2 for a two-sided test.
EPA QA/G-9S
                                                 85
                                                              February 2006

                Box 3-42: An Example of the Wilcoxon Signed Rank Test (Paired Samples)

  Consider the following 9 pairs of data points (in ppb):

      xi:  178   52  161  245  164  184  157  308  130
      yi:   92   67   62  206  106  126  108  314  126

  These data will be used to test the null hypothesis that there is no difference in medians versus the alternative
  that the median of the first population is greater than the median of the second population. The decision maker
  has specified a 5% false rejection error rate and a 10% false acceptance error rate at a difference of 50 ppb.

  COMPUTATIONS: The table below displays the computations.

      di:     86  -15   99   39    58    58   49   -6    4
      |di|:   86   15   99   39    58    58   49    6    4
      rank:    8    3    9    4   6.5   6.5    5    2    1
      Ri:      8   -3    9    4   6.5   6.5    5   -2    1

  STEP 1.  Null Hypothesis:         H0: median1 = median2

  STEP 2.  Alternative Hypothesis:  HA: median1 > median2 (upper-tail test)

  STEP 3.  Test Statistic:          Since n ≤ 20, compute
                                    T+ = Σ{i: Ri > 0} Ri = 1 + 4 + 5 + 6.5 + 6.5 + 8 + 9 = 40.

  STEP 4.  a) Critical Value:       Since n ≤ 20, Table A-7 is used to find n(n+1)/2 - w0.05 = 45 - 8 = 37.

  STEP 4.  b) p-value:              Since n ≤ 20, Table A-7 is used to find the approximate p-value of
                                    0.025. Using software, the p-value is found to be 0.0195.

  STEP 5.  a) Conclusion:           Since test statistic = 40 ≥ 37 = critical value, we reject the null
                                    hypothesis of equal population medians.

  STEP 5.  b) Conclusion:           Since p-value = 0.0195 < 0.05 = significance level, we reject the null
                                    hypothesis of equal population medians.

  NOTE: The Sign test failed to reject the null hypothesis for this example. The Wilcoxon Signed Rank test has
  more power than the Sign test if the distribution of the population is symmetric.
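The test in Box 3-42 is typically run in software rather than by hand. The following is a minimal sketch, assuming SciPy is available (the library is not named in this guidance); with `alternative="greater"`, the reported statistic is T+, the sum of the positive signed ranks. Because the deviations contain ties, SciPy falls back on a normal approximation, so its p-value may differ slightly from the exact Table A-7 value.

```python
# Sketch of Box 3-42 using scipy.stats.wilcoxon (an assumed dependency).
from scipy import stats

x = [178, 52, 161, 245, 164, 184, 157, 308, 130]
y = [92, 67, 62, 206, 106, 126, 108, 314, 126]

# One-sided (upper-tail) paired test: HA is that the median of x exceeds
# the median of y. The statistic returned is T+, as defined in Box 3-41.
res = stats.wilcoxon(x, y, alternative="greater")
print(res.statistic)  # T+ = 40.0, matching Step 3
print(res.pvalue)     # roughly 0.02; the document reports 0.0195 exactly
```

Since the p-value falls below the specified 0.05 significance level, the software result agrees with the critical-value conclusion in Step 5.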
3.4    COMPARING SEVERAL POPULATIONS SIMULTANEOUSLY

       This section describes procedures for comparing several population means
simultaneously.  The comparison is made between several treatment populations and a single
control population.  For example, we can simultaneously test for a difference between the
concentrations at several sites and a single background area.

       The methods in this section compare the several populations while controlling the overall
significance level.  If individual two-sample t-tests are performed at significance level α, then
the overall significance level is higher than α, possibly much higher if there are many
experimental groups.  For example, comparing two experimental groups to a control using two
two-sample t-tests at significance level 0.05 results in an overall significance level of
1 - (1 - 0.05)(1 - 0.05) = 0.0975.  The tests in this section are more powerful in detecting differences
between the experimental groups and the control than other multiple comparison methods that
compare all possible pairs of population means, e.g., the ANOVA F-test and the Kruskal-Wallis
test.
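The inflation of the overall significance level can be sketched with a few lines of arithmetic. The helper below is illustrative only (the function name is ours, not the document's), and it assumes the k comparisons are independent, as in the two-group example above.

```python
# Overall (familywise) significance level when k independent tests are
# each run at level alpha, illustrating the 1 - (1 - 0.05)(1 - 0.05)
# computation in the text.
def overall_significance(alpha: float, k: int) -> float:
    """Probability of at least one false rejection among k independent tests."""
    return 1.0 - (1.0 - alpha) ** k

print(overall_significance(0.05, 2))   # 0.0975, as in the text
print(overall_significance(0.05, 10))  # about 0.40 for ten comparisons
```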
          Box 3-43: A Large Sample Example of the Wilcoxon Signed Rank Test (Paired Samples)

  A hazardous waste site has recently gone through remediation.  To determine if the remediation method was
  effective, 24 paired samples (before and after clean-up) will be compared.  The following data will be used to
  test the null hypothesis that there is no difference in medians versus the alternative that the median
  concentration before is greater than the median concentration after clean-up.  The decision maker has specified
  a 5% false rejection error rate and a 10% false acceptance error rate at a difference of 30 ppb.

  before: 331, 351, 259, 323, 305, 336, 196, 233, 336, 349, 352, 341, 172, 253, 275, 285, 212, 349, 301, 343, 368, 332, 374, 311
  after:  246, 270, 229, 326, 295, 238, 278, 302, 331, 264, 267, 249, 288, 272, 270, 313, 337, 284, 271, 253, 295, 271, 289, 281

  COMPUTATIONS:  The table below displays the computation of the signed ranks.

      di:      85    81    30    -3    10    98   -82   -69
      |di|:    85    81    30     3    10    98    82    69
      rank:  17.5    14     8     1     4    22    15    12
      Ri:    17.5    14     8    -1     4    22   -15   -12

      di:       5    85    85    93  -117   -19     5   -28
      |di|:     5    85    85    93   117    19     5    28
      rank:   2.5  17.5  17.5    21    23     5   2.5     6
      Ri:     2.5  17.5  17.5    21   -23    -5   2.5    -6

      di:    -125    65    30    90    73    60    85    30
      |di|:   125    65    30    90    73    60    85    30
      rank:    24    11     8    20    13    10  17.5     8
      Ri:     -24    11     8    20    13    10  17.5     8

  STEP 1.  Null Hypothesis:         H0: median1 = median2

  STEP 2.  Alternative Hypothesis:  HA: median1 > median2 (upper-tail test)

  STEP 3.  Test Statistic:          Since n > 20, compute z0 = [T+ - n(n+1)/4] / sqrt(var(T+)).  First,
                                    T+ = Σ{i: Ri > 0} Ri = 17.5 + 14 + ··· + 8 = 214.  To compute var(T+), we need
                                    to identify the number of tied groups and the number of values in each
                                    of the tied groups.  There are 3 tied groups, so g = 3.  The numbers of
                                    values in the tied groups are 2 (|di| = 5), 3 (|di| = 30), and 4 (|di| = 85).
                                    Therefore,

                                    var(T+) = 24·25·49/24 - [2(2² - 1) + 3(3² - 1) + 4(4² - 1)]/48
                                            = 1225 - 1.875 = 1223.125.

                                    Finally, z0 = [214 - 24(24 + 1)/4] / sqrt(1223.125) = 1.83.

  STEP 4.  a) Critical Value:       Since n > 20, Table A-1 is used to find z0.95 = 1.645.

  STEP 4.  b) p-value:              Since n > 20, Table A-1 is used to find P(Z > z0) = 1 - 0.9664 = 0.0336.

  STEP 5.  a) Conclusion:           Since test statistic = 1.83 > 1.645 = critical value, we reject the null
                                    hypothesis of equal population medians.

  STEP 5.  b) Conclusion:           Since p-value = 0.0336 < 0.05 = significance level, we reject the null
                                    hypothesis of equal population medians.
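The large-sample approximation in Box 3-43 can be sketched directly from the var(T+) formula in Box 3-41. The code below assumes NumPy and SciPy (neither is named in this guidance) and uses the before/after lists from the box; average ranks handle the tied |di| values.

```python
# Sketch of the large-sample normal approximation from Box 3-43,
# following the T+ and var(T+) definitions in Box 3-41.
import numpy as np
from scipy import stats

before = np.array([331, 351, 259, 323, 305, 336, 196, 233, 336, 349, 352, 341,
                   172, 253, 275, 285, 212, 349, 301, 343, 368, 332, 374, 311])
after = np.array([246, 270, 229, 326, 295, 238, 278, 302, 331, 264, 267, 249,
                  288, 272, 270, 313, 337, 284, 271, 253, 295, 271, 289, 281])

d = before - after                     # paired deviations (no zeros here)
n = len(d)
ranks = stats.rankdata(np.abs(d))      # average ranks for tied |d| values
t_plus = ranks[d > 0].sum()            # T+ = sum of the positive signed ranks

# Tie correction: sum of tj(tj^2 - 1) over the tied groups of |d|.
_, counts = np.unique(np.abs(d), return_counts=True)
tie_term = np.sum(counts * (counts**2 - 1))
var_t = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48

z0 = (t_plus - n * (n + 1) / 4) / np.sqrt(var_t)
p_value = 1 - stats.norm.cdf(z0)       # upper-tail test
print(round(z0, 2), round(p_value, 4))  # 1.83 0.0336, matching the box
```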
        The Dunnett test of Section 3.4.1.1 is a parametric test that compares several treatment
means to a control mean.  Section 3.4.2.1 presents a nonparametric alternative to the Dunnett
test.
3.4.1  Parametric Methods

       These methods rely on knowing the specific distribution of the populations.

3.4.1.1    Dunnett's Test

Purpose: Test simultaneously for a difference between several population means and the mean
of a control population.  A typical application would involve comparing different potentially
cleaned areas of a hazardous waste site to an uncontaminated reference area.

Data:  A set of k-1 independent experimental random samples, xi1,...,xin_i, i = 1,..., k-1, and an
independent random sample from the control population, xc1,...,xcn_c.

Assumptions: The Dunnett test is similar to the two-sample t-test, so the populations need to be
approximately normal or the sample sizes need to be large (> 30).  If this is not the case, then the
nonparametric Fligner-Wolfe test can be used.  However, that test simply detects a difference
between all population means.  Both tests assume that the k populations have equal variances.

Limitations and Robustness:  The Dunnett critical values (Table A-14) are for the case of equal
numbers of samples in the control and experimental groups, but are approximately correct
provided the number of samples from each investigated group is more than half but less than
double the size of the control group.  Also, Table A-14 is for one-tailed tests only.

Directions for Dunnett's test are contained in Box 3-44, with an example in Box 3-45.
                              Box 3-44:  Directions for Dunnett's Test

 COMPUTATIONS: Calculate the sample mean, x̄i, and the sample variance, si², for each of the k-1 populations
 along with x̄c and sc².  Also compute SSE = (nc - 1)sc² + Σ{i=1..k-1} (ni - 1)si².

 STEP 1. Null Hypothesis:          H0: μi = μc, for i = 1,...,k-1

 STEP 2. Alternative Hypotheses:   i)  HA: At least one μi > μc, for i = 1,...,k-1.
                                   ii) HA: At least one μi < μc, for i = 1,...,k-1.

 STEP 3. Test Statistic:           For each of the k-1 experimental populations, compute:

                                   ti = (x̄i - x̄c) / sqrt[ (SSE/(N-k)) (1/ni + 1/nc) ],

                                   where N is the total sample size.

 STEP 4. a) Critical Value:        Use Table A-15 to determine the critical value TD(α), where the degrees of
                                   freedom are N-k.

 STEP 4. b) p-value:               Too complex for this guidance.

 STEP 5. a) Conclusion:            For each i, reject the corresponding null hypothesis if ti > TD(α).

 STEP 5. b) Conclusion:            Use the critical value approach.
                                Box 3-45: An Example of Dunnett's Test

  At a hazardous work site, 6 designated areas previously identified as 'problems' have been cleaned.  In order for
  these areas to be admitted to the overall work site abatement program, these areas should be shown to be the
  same as the reference area.  The means of these areas will be compared to the mean of a reference area located
  on the site using Dunnett's test.  The null hypothesis is no difference between the means of the 'problem' areas
  and the mean of the reference area, and the alternative hypotheses are that the 'problem' means are greater than
  the reference area mean.  The significance level is 0.05.  Summary statistics for the data are given in the table
  below.

              Reference   1AK3   2BF6   3BG5   4GH2    5FF3   6GW4
     ni            7        6      5      6      7       8      7
     x̄i         10.3     11.4   12.2   10.2   11.4    11.9   12.1
     si²          2.5      2.6    3.3    3.0    3.2     2.6    2.8
     nc/ni         —      1.16    1.4   1.16      1   0.875      1
     ti            —      1.18   1.93   0.11   1.22    1.84   2.00

  Since all of the sample size ratios fall between 0.5 and 2.0, Dunnett's test may be used.

  COMPUTATIONS: The sample means and sample variances are displayed in rows 2 and 3 above.  The ti row is
  the collection of test statistics that are generated in Step 3 below.  The final computation is
  SSE = (7-1)·2.5 + (6-1)·2.6 + ··· + (7-1)·2.8 = 110.4.

  STEP 1. Null Hypothesis:          H0: μi = μc, for i = 1,...,6

  STEP 2. Alternative Hypotheses:   HA: μi > μc, for i = 1,...,6

  STEP 3. Test Statistic:           For each of the 6 'problem' areas, ti was computed.  For example,

                                    t1 = (11.4 - 10.3) / sqrt[ (110.4/(46-7)) (1/6 + 1/7) ] = 1.18.

                                    The results are displayed in the 5th row of the table.

  STEP 4. a) Critical Value:        Using Table A-15 of Appendix A with N-k = 46-7 = 39 degrees of
                                    freedom, the critical value TD(0.95) = 2.37.

  STEP 5. a) Conclusion:            Since ti < 2.37 = TD(0.95) for all i, we fail to reject the null hypothesis.
                                    We conclude that none of the 'problem' areas have contamination
                                    levels significantly higher than the reference area.  Therefore, these
                                    areas may be admitted to the work site abatement program.
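The ti statistics in Box 3-45 can be reproduced from the summary statistics alone, following the formula in Box 3-44. The sketch below hard-codes the critical value TD(0.95) = 2.37 from the document's Table A-15. Note the third statistic comes out negative because that area's mean is below the reference mean; the box table lists its magnitude.

```python
# Sketch of the Dunnett test statistics from Box 3-45, computed from the
# summary statistics (sample sizes, means, variances) given in the box.
import math

n = [7, 6, 5, 6, 7, 8, 7]            # reference area first, then the 6 areas
mean = [10.3, 11.4, 12.2, 10.2, 11.4, 11.9, 12.1]
var = [2.5, 2.6, 3.3, 3.0, 3.2, 2.6, 2.8]

N, k = sum(n), len(n)
sse = sum((ni - 1) * vi for ni, vi in zip(n, var))   # SSE = 110.4
ms = sse / (N - k)                                    # SSE / (N - k), 39 df

t = [(mean[i] - mean[0]) / math.sqrt(ms * (1 / n[i] + 1 / n[0]))
     for i in range(1, k)]
print([round(ti, 2) for ti in t])    # [1.18, 1.93, -0.11, 1.22, 1.84, 2.0]

td = 2.37                            # TD(0.95), 39 df, from Table A-15
print(all(ti < td for ti in t))      # True: fail to reject for every area
```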
3.4.2   Nonparametric Methods

        These methods rely on the relative rankings of data values.  Knowledge of the precise
form of the population distributions is not necessary.

3.4.2.1   The Fligner-Wolfe Test

Purpose:  Test simultaneously for a difference between several population locations (means or
medians) and the location of a control population.  A typical application would involve
comparing different potentially cleaned areas of a hazardous waste site to an uncontaminated
reference area.  This test is similar to the Wilcoxon Rank Sum test.

Data: A set of k-1 independent experimental simple or systematic random samples, xi1,...,xin_i,
i = 1,..., k-1, and an independent simple or systematic random sample from the control
population, xc1,...,xcn_c.  Let N* = Σ{i=1..k-1} ni and N = N* + nc.

Assumptions: All data sets come from distributions with similar shapes.

Limitations and Robustness:  The alternative hypothesis is quite restrictive, as it demands that all
of the experimental groups have population means that are at least as large as the control mean
(or at most as large as the control mean).  The test is not appropriate when some experimental
group means may be higher than the control mean and some may be lower.

Directions for the Fligner-Wolfe test are contained in Box 3-46, with an example in Box 3-47.
                               Box 3-46:  Directions for the Fligner-Wolfe Test

 COMPUTATIONS:  Rank the N pooled data points from smallest to largest, assigning the average rank to ties.
 Let rij denote the rank of data point xij.  Compute

                                   FW0 = Σ{i=1..k-1} Σ{j=1..ni} rij - N*(N* + 1)/2.

 Note that the first term of FW0 is the sum of the ranks from all experimental groups.

 STEP 1. Null Hypothesis:          H0: location_i = location_c, for i = 1,...,k-1

 STEP 2. Alternative Hypothesis:   i)  HA: location_i ≥ location_c, for i = 1,...,k-1 (at least one strict inequality)
                                   ii) HA: location_i ≤ location_c, for i = 1,...,k-1 (at least one strict inequality)

 STEP 3. Test Statistic:           If min(nc, N*) ≤ 20, then the test statistic is FW0.

                                   If min(nc, N*) > 20, then the test statistic is
                                   z0 = [FW0 - ncN*/2] / sqrt(var(FW0)), where

                                   var(FW0) = (ncN*/12) [ N + 1 - Σ{j=1..g} tj(tj² - 1) / (N(N-1)) ],

                                   g is the number of tied groups, and tj is the number of tied values in the
                                   j-th group.

 STEP 4. a) Critical Value:        If min(nc, N*) ≤ 20, then use Table A-8 to find wα.

                                   If min(nc, N*) > 20, then use Table A-1 to find:
                                        i)  z1-α
                                        ii) zα

 STEP 4. b) p-value:               If min(nc, N*) ≤ 20, then use Table A-8 to find:
                                        i)  P(wrs ≤ ncN* - FW0)
                                        ii) P(wrs ≤ FW0)
                                   If min(nc, N*) > 20, then use Table A-1 to find:
                                        i)  P(Z > z0)
                                        ii) P(Z < z0)

 STEP 5. a) Conclusion:            If min(nc, N*) ≤ 20, then
                                        i)  If FW0 ≥ ncN* - wα, then reject the null hypothesis of no difference
                                            between population locations.
                                        ii) If FW0 ≤ wα, then reject the null hypothesis.

                                   If min(nc, N*) > 20, then
                                        i)  If z0 > z1-α, then reject the null hypothesis of no difference
                                            between population locations.
                                        ii) If z0 < zα, then reject the null hypothesis.

 STEP 5. b) Conclusion:            If p-value < significance level, then reject the null hypothesis of no
                                   difference between population locations.

 NOTE:  The large sample normal approximation is adequate as long as min(nc, N*) ≥ 10.
                             Box 3-47: An Example of the Fligner-Wolfe Test

 4 contaminated ponds are to be combined with a fifth (reference) pond before further work can commence.  If the
 ponds are approximately equal, the proposed remediation method will be acceptable.  The assumption of
 normality cannot be made, but since all ponds were produced by the same waste process, they should exhibit
 similar characteristics.  The significance level is 0.05.  Data values with ranks in parentheses are given in the
 table below.

    Reference Pond    North        East         South        West
    10 (1)            15 (3)       20 (8)       12 (2)       19 (5.5)
    19 (5.5)          25 (14)      26 (15)      16 (4)       20 (8)
    20 (8)            32 (19.5)    34 (21.5)    22 (10)      28 (16)
    23 (11.5)         39 (23)      43 (24.5)    23 (11.5)    31 (18)
    24 (13)           43 (24.5)    52 (28)      34 (21.5)    49 (27)
    30 (17)
    32 (19.5)
    46 (26)

 COMPUTATIONS: Note that nc = 8 and N* = 20 since ni = 5 for i = 1,...,4.  Compute

    FW0 = Σ{i=1..4} Σ{j=1..ni} rij - N*(N* + 1)/2 = 304.5 - 210 = 94.5.

 STEP 1.  Null Hypothesis:         H0: pond_i = pond_c, for i = 1,...,4

 STEP 2.  Alternative Hypothesis:  HA: pond_i ≥ pond_c, for i = 1,...,4 (at least one strict inequality)

 STEP 3.  Test Statistic:          Since min(nc, N*) = 8 ≤ 20, the test statistic is FW0 = 94.5.

 STEP 4.  a) Critical Value:       Since min(nc, N*) ≤ 20, Table A-8 is used to find w0.05 = 47.

 STEP 4.  b) p-value:              Since min(nc, N*) ≤ 20, using Table A-8 gives a p-value of
                                   P(wrs ≤ 8·20 - 94.5) = P(wrs ≤ 65.5) > 0.10 (the exact
                                   p-value is 0.2380).

 STEP 5.  a) Conclusion:           Since min(nc, N*) ≤ 20 and FW0 = 94.5 < 113 = 8·20 - 47 = ncN* - wα, we
                                   fail to reject the null hypothesis of no difference between ponds.

 STEP 5.  b) Conclusion:           Since p-value = 0.2380 > 0.05 = significance level, we fail to reject the
                                   null hypothesis of no difference between ponds.
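The FW0 computation in Box 3-47 is a rank sum over the pooled data, so it is straightforward to sketch in code. The example below assumes NumPy and SciPy; it also uses the fact that FW0 coincides with the Mann-Whitney U statistic for the pooled experimental groups versus the control, which provides an independent check (the p-value method SciPy selects may differ from the exact Table A-8 value).

```python
# Sketch of the Fligner-Wolfe computation from Box 3-47: pool all
# observations, rank them with average ranks for ties, and subtract
# N*(N* + 1)/2 from the experimental rank sum.
import numpy as np
from scipy import stats

control = [10, 19, 20, 23, 24, 30, 32, 46]             # reference pond
experimental = [15, 25, 32, 39, 43,                    # North
                20, 26, 34, 43, 52,                    # East
                12, 16, 22, 23, 34,                    # South
                19, 20, 28, 31, 49]                    # West

n_star = len(experimental)
pooled = np.concatenate([experimental, control])
ranks = stats.rankdata(pooled)                         # average ranks for ties

fw0 = ranks[:n_star].sum() - n_star * (n_star + 1) / 2
print(fw0)   # 94.5, matching the box

# Cross-check: FW0 equals the Mann-Whitney U statistic for the pooled
# experimental sample versus the control sample.
u, p = stats.mannwhitneyu(experimental, control, alternative="greater")
print(u, round(p, 4))   # 94.5 and the upper-tail p-value
```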
                                         CHAPTER 4

       STEP 4:  VERIFY THE ASSUMPTIONS OF THE STATISTICAL METHOD

       THE DATA QUALITY ASSESSMENT PROCESS
            Review DQOs and Sampling Design
            Conduct Preliminary Data Review
            Select the Statistical Method
            Verify the Assumptions (current step)
            Draw Conclusions from the Data

       VERIFY THE ASSUMPTIONS OF THE STATISTICAL TEST
       Examine the underlying assumptions of the statistical method in light of the
       environmental data.

       Activities
       •  Determine Approach for Verifying Assumptions
       •  Perform Tests of Assumptions
       •  Determine Corrective Actions

       Tools
       •  Tests of distributional assumptions
       •  Tests for independence and trends
       •  Tests for dispersion assumptions

                         Step 4: Verify the Assumptions of the Statistical Test

        •   Determine the approach for verifying assumptions.
                Identify any strong graphical evidence from the preliminary data review.
                Review (or develop) the statistical model for the data.
                Select the methods for verifying assumptions.

        •   Perform tests of assumptions.
                Adjust for the distributional assumption if warranted.
                Perform the calculations required for the tests.

        •   If necessary, determine corrective actions.
                Determine whether data transformations will correct the problem.
                If data are missing, explore the feasibility of using theoretical justification or of
                collecting new data.
                Consider robust procedures or nonparametric hypothesis tests.
                                    List of Boxes
                                                                               Page
Box 4-1:  Directions for Studentized Range Test	99
Box 4-2:  Example of the Studentized Range Test	100
Box 4-3:  Directions for Geary's Test	101
Box 4-4:  Example of Geary's Test	101
Box 4-5:  Directions for the Test for a Correlation Coefficient and an Example	105
Box 4-6:  Upper Triangular Computations for Basic Mann-Kendall Trend Test with a Single
         Measurement at Each Time Point	106
Box 4-7:  Directions for the Mann-Kendall Trend Test for Small Sample Sizes	106
Box 4-8:  An Example of Mann-Kendall Trend Test for Small Sample Sizes	106
Box 4-9:  Directions for the Mann-Kendall Test Using a Normal Approximation	108
Box 4-10: An Example of Mann-Kendall Trend Test by Normal Approximation	109
Box 4-11: Data for Multiple Times and Multiple Stations	111
Box 4-12: Testing for Comparability of Stations and an Overall Monotonic Trend	112
Box 4-13: Directions for the Wald-Wolfowitz Runs Test	114
Box 4-14: An Example of the Wald-Wolfowitz Runs Test	115
Box 4-15: Directions for the Extreme Value Test (Dixon's Test)	118
Box 4-16: An Example of the Extreme Value Test (Dixon's Test)	118
Box 4-17: Directions for the Discordance Test	119
Box 4-18: An Example of the Discordance Test	119
Box 4-19: Directions for Rosner's Test for Outliers	120
Box 4-20: An Example of Rosner's Test for Outliers	121
Box 4-21: Directions for Walsh's Test for Large Sample  Sizes	122
Box 4-22: Directions for Constructing Confidence Intervals and  Confidence Limits for the
          Sample Variance and Standard Deviation with an Example	123
Box 4-23: Directions for an F-Test to Compare Two Population Variances with an Example 124
Box 4-24: Directions for Bartlett's Test	125
Box 4-25: An Example of Bartlett's Test	125
Box 4-26: Directions for Levene's Test	126
Box 4-27: An Example of Levene's Test	127
Box 4-28: Directions for Transforming Data and an Example	129
Box 4-29: Directions for Aitchison's Method to Adjust Means and Variances	131
Box 4-30: An Example of Aitchison's Method	132
Box 4-31: Directions for Cohen's Method	133
Box 4-32: An Example of Cohen's Method	133
Box 4-33: Example of Double Linear Interpolation	134
Box 4-34: Directions for Selecting Between Cohen's Method or Aitchison's Method	134
Box 4-35: Example of Determining Between Cohen's Method and Aitchison's Method	135
Box 4-36: Directions for the Rank von Neumann Test	137
Box 4-37: An Example of the Rank von Neumann Test	138
                                     List of Tables
                                                                                   Page
Table 4-1. Data for Examples in Section 4.2	96
Table 4-2. Tests for Normality	97
Table 4-3. Recommendations for Selecting a Statistical Test for Outliers	117
Table 4-4. Guidelines for Analyzing Data with Non-Detects	130
Table 4-5. Guidelines for Recommended Parameters for Different Coefficient of Variations
          and Censoring	136
                                      CHAPTER 4

       STEP 4: VERIFY THE ASSUMPTIONS OF THE STATISTICAL METHOD

4.1    OVERVIEW AND ACTIVITIES

       In this step, the analyst should assess the validity of the statistical method chosen in
Step 3 by examining its underlying assumptions, determine whether the data support those
assumptions, and decide whether a different statistical method should be used.

       If it is determined that one or more of the assumptions is not met, then an alternative
action is required.  Typically, this means the selection of a different statistical method. Each
method of Chapter 3 provides a detailed list of alternative methods.

       Parametric tests also have difficulty dealing with outliers and non-detects.  If either is
found in the data, then a corrective action would be to use the corresponding nonparametric
method.  In general, nonparametric methods handle outliers and non-detects better than
parametric methods. If a trend in the data is detected or the data are found to be dependent, then
the methods of Chapter 3 should not be applied.  Time series or geostatistical methods may be
required and a statistician should be consulted. For a more extensive discussion of the overview
and activities of this step, see Data Quality Assessment: A Reviewer's Guide (EPA QA/G-9R)
(U.S.EPA 2004).

4.2    TESTS FOR DISTRIBUTIONAL ASSUMPTIONS

       Many statistical tests and models are only appropriate for data that follow a particular
distribution. This section will aid in determining if a distributional assumption of a statistical test
is satisfied, in particular, the assumption of normality. Two of the most important distributions
for tests involving environmental data are the normal distribution and the lognormal distribution,
both of which are discussed in this section.  To test if the data follow a distribution other than the
normal distribution or the lognormal distribution, apply the chi-square test discussed in
Section 4.2.5 or consult a statistician.

       There are many methods available for verifying the assumption of normality ranging
from simple to complex.  This section discusses methods based on graphs, sample moments
(kurtosis and skewness), sample ranges, the Shapiro-Wilk test and closely related tests, and
goodness-of-fit tests. Discussions for the simplest tests contain step-by-step directions and
examples based on the data in Table 4-1. These tests are summarized in Table 4-2. This section
ends with a comparison of the tests to help the analyst select a test for normality.

                        Table 4-1.  Data for Examples in Section 4.2
             These are 10 observations of dust from commercial air filters in ppm

    15.63   11.00   11.75   10.45   13.18   10.37   10.54   11.55   11.01   10.23

    X̄ = 11.57        s = 1.677        n = 10
                             Table 4-2.  Tests for Normality

  Test                           Section   Sample Size   Recommended Use
  Shapiro-Wilk W Test             4.2.2     < 5000       Highly recommended.
  Filliben's Statistic            4.2.2     < 100        Highly recommended, especially when used
                                                         in conjunction with a normal probability
                                                         plot.
  Skewness and Kurtosis Tests     4.2.3     > 50         Useful for large sample sizes.
  Studentized Range Test          4.2.4     < 1000       Highly recommended (with some
                                                         conditions).
  Geary's Test                    4.2.4     > 50         Useful when tables for other tests are not
                                                         available.
  Chi-Square Test                 4.2.5     Large (a)    Useful for grouped data and when the
                                                         comparison distribution is known.
  Lilliefors Kolmogorov-          4.2.5     > 50         Useful when tables for other tests are not
  Smirnov Test                                           available.

  (a) The necessary sample size depends on the number of groups formed when implementing this test.  Each
  group should contain at least 5 observations.

       The normal distribution is one of the most common probability distributions in the
analysis of environmental data.  It can often be used to approximate other probability
distributions, and in some instances more complex distributions can be transformed to an
appropriate normal distribution.  Additionally, as the sample size becomes larger, the sample
mean has an approximate normal distribution; hence the common assumption associated with
parametric tests is that the data follow a normal distribution.

       The graph of a normally distributed random variable is bell-shaped (see Figure 4-1), with
the highest point located at the mean, which is equal to the median.  A normal curve is
symmetric about the mean; hence the part to the left of the mean is a mirror image of the part to
the right.  In environmental data, random errors occurring during the measurement process may
be normally distributed.

              Figure 4-1.  Density Plots for the Normal and Lognormal Distributions

       Environmental data commonly exhibit frequency distributions that are non-negative and
skewed with heavy or long right-tails.  Several standard parametric probability models have
these properties, including the Weibull, gamma, and lognormal distributions.  The lognormal
distribution (Figure 4-1) is a commonly used distribution for modeling environmental
contaminant data.  The advantage of this distribution is that a simple (logarithmic)
transformation will transform a lognormal distribution into a normal distribution.  Therefore, the
methods for testing for normality described in this section can be used to test for lognormality if
a logarithmic transformation has been used.  It should be noted that as the shape parameter of the
lognormal distribution approaches zero, the lognormal approaches a normal distribution.  When
the shape parameter is large, the lognormal becomes more positively skewed.
4.2.1   Graphical Methods

       Graphical methods (Section 2.3) present detailed information about data sets that may not
be apparent from a test statistic. Histograms, stem-and-leaf plots, and normal probability plots
are some graphical methods that are useful for determining whether or not data follow a normal
curve.  Both the histogram and stem-and-leaf plot of a normal distribution are bell-shaped.  The
normal probability plot of a normal distribution follows a straight line. For non-normally
distributed data, there will be large deviations in the tails or middle of a normal probability plot.

       Using a plot to decide if the data are normally distributed is a qualitative judgment.  For
extremely non-normal data, it is easy to make this determination; however, in many cases the
decision is not straightforward.  Therefore, formal test procedures are usually necessary to test
the assumption of normality.

4.2.2   Normal Probability Plot Tests

       One of the most powerful tests for normality is the Shapiro-Wilk Wtest. This test is
similar to computing a correlation between the quantiles of the standard normal distribution and
the ordered values of a data set. If the normal probability plot is approximately linear (i.e., the
data follow a normal curve), the test statistic will be relatively high. If the normal probability
plot is nonlinear, then the test statistic will be relatively low.

       The W test is recommended in several EPA guidance documents and in many statistical
texts.  Tables of critical values for sample sizes up to 50 have been developed for determining
the significance of the test statistic. However, many software packages can perform the W test
for data sets with sample sizes as large as 5000. This test is difficult to compute by hand since it
requires two different sets of tabled values and a large number of summations and
multiplications.  Therefore, directions for implementing this test are not given in this document.
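
While the W test is impractical by hand, it is routine in software. As an illustrative sketch (simulated data; scipy's shapiro routine implements the W test):

```python
import numpy as np
from scipy import stats

# Simulated measurements for illustration (not data from this guidance).
rng = np.random.default_rng(42)
sample = rng.normal(loc=10.0, scale=2.0, size=40)

w_stat, p_value = stats.shapiro(sample)
# A W statistic near 1 corresponds to an approximately linear normal
# probability plot; a small p-value would be evidence against normality.
```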

       Several tests related to the Shapiro-Wilk test have been proposed.  D'Agostino's test for
sample sizes between 50 and 1000 and Royston's test for sample sizes up to 2000 are two such
tests that approximate some of the key quantities or parameters of the W test.

       Another related test is the Filliben statistic, also called the probability plot correlation
coefficient.  This test measures the linearity of the points on the normal probability plot. Similar
to the W test, if the normal probability plot is approximately linear, then the correlation
coefficient will be relatively high. Although easier to compute than the W test, the Filliben
statistic is still difficult to compute by hand. Therefore, directions for implementing this test are
not given in this guidance.
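
As a sketch of the idea behind the probability plot correlation coefficient (simulated data; scipy's probplot routine returns the correlation coefficient of the line fitted to the normal probability plot):

```python
import numpy as np
from scipy import stats

# Simulated sample for illustration (not data from this guidance).
rng = np.random.default_rng(7)
sample = rng.normal(size=30)

# probplot returns the plot coordinates (theoretical quantiles, ordered data)
# and the least-squares line through them, including its correlation r.
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")
# r close to 1 indicates the plotted points fall near a straight line.
```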

4.2.3   Coefficient of Skewness/Coefficient of Kurtosis Tests

       The degree of symmetry (or asymmetry) displayed by a data set is measured by the
coefficient of skewness (g3). The coefficient of kurtosis, g4, measures the degree of flatness of a
probability distribution near its center. Several test methods have been proposed using these
coefficients to test for normality. One method tests for normality by adjusting the coefficients of
skewness and kurtosis to approximate a standard normal distribution for sample sizes greater
than 50.

       Two other tests based on these coefficients include a combined test based on a chi-
squared (χ²) distribution and Fisher's cumulant test. Fisher's cumulant test computes the exact
sampling distribution of g3 and g4; therefore, it is more powerful than previous methods which
assume that the distributions of the two coefficients are normal.  Fisher's cumulant test requires a
table of critical values, and these tests require a sample size of greater than 50. Tests based on
skewness and kurtosis are rarely used as they are less powerful than many alternatives.
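
One widely implemented test of this family is the D'Agostino-Pearson K² statistic, which combines the sample skewness and kurtosis and is referred to a chi-squared distribution with 2 degrees of freedom. A sketch on simulated data (scipy's normaltest implements this statistic; it is offered here as an illustration, not as a method prescribed by this guidance):

```python
import numpy as np
from scipy import stats

# Simulated sample for illustration.
rng = np.random.default_rng(3)
sample = rng.normal(size=100)

# K² combines standardized skewness and kurtosis; under H0 it is
# approximately chi-squared with 2 degrees of freedom.
k2, p = stats.normaltest(sample)
```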

4.2.4  Range Tests

       Nearly 100% of the area of a normal curve lies within ±5 standard deviations from the
mean.  Tests for normality have been developed based on this fact. Two  such tests are the
Studentized Range test and Geary's test.  Both of these tests use a ratio of an estimate of the
sample range to the sample standard deviation. Very large and very small values of the ratio
imply that the data are not well modeled by a normal distribution.

4.2.4.1    The Studentized Range Test

       This test compares the sample range to the  sample standard deviation. Tables of critical
values for sample sizes up to 1000 (Table A-3 of Appendix A) are available for determining
whether the absolute value of this ratio is significantly large.  Directions for implementing this
method are given in Box 4-1, with an example in Box 4-2.  The Studentized range test does not
perform well if the data are asymmetric or if the tails of the data are heavier than those of the
normal distribution. In addition, this test may be sensitive to extreme values. Many environmental
data sets are positively skewed (have a long tail of high values) and are similar to a lognormal
distribution.  If the data appear to be lognormally distributed, then this test should not be used.
In most cases, the Studentized range test performs as well as the Shapiro-Wilk test and is easier
to apply.
                          Box 4-1:  Directions for Studentized Range Test

 COMPUTATIONS: Using Section 2.2.3, calculate the sample range, w, and the sample standard deviation, s.

 STEP 1. Null Hypothesis:         H0: The underlying distribution of the data is a normal distribution.

 STEP 2. Alternative Hypothesis:  HA: The underlying distribution of the data is not a normal distribution.

 STEP 3. Test Statistic:          R = w/s

 STEP 4. a) Critical Value:       Use Table A-3 to find the critical values a and b.

 STEP 5. a) Conclusion:           If R falls outside the two critical values, then reject the null hypothesis that
                                  the underlying distribution is normal.
                          Box 4-2:  Example of the Studentized Range Test

  Using a significance level of 0.01, determine if the underlying distribution of the data from Table 4-1 can be
  modeled using a normal distribution.

  COMPUTATIONS: w = X(n) - X(1) = 15.63 - 10.23 = 5.40 and s = 1.677.

  STEP 1. Null Hypothesis:         H0: The underlying distribution of the data is a normal distribution.

  STEP 2. Alternative Hypothesis:  HA: The underlying distribution of the data is not a normal distribution.

  STEP 3. Test Statistic:          R = w/s = 5.40 / 1.677 = 3.22.

  STEP 4. a) Critical Value:       Using Table A-3, the critical values a and b are 2.51 and 3.88.

  STEP 5. a) Conclusion:           Since R falls between the two critical values, we fail to reject the null
                                   hypothesis that the underlying distribution is normal.
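
The ratio R = w/s is straightforward to compute. A minimal sketch (the function below is illustrative; Box 4-2 reports only the summary values w = 5.40 and s = 1.677 for the Table 4-1 data):

```python
import numpy as np

def studentized_range(data):
    """R = w/s: the sample range divided by the sample standard deviation."""
    data = np.asarray(data, dtype=float)
    w = data.max() - data.min()   # sample range
    s = data.std(ddof=1)          # sample standard deviation
    return w / s

# With Box 4-2's summary values, R = 5.40 / 1.677 = 3.22, which lies
# between the critical values 2.51 and 3.88.
```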
4.2.4.2    Geary's Test

       Geary's test compares the sum of the absolute deviations from the mean to the sum of the
squared deviations from the mean.  If the ratio is too large or too small, then the underlying
distribution of the data should not be modeled as a normal distribution. Directions for
implementing this method are given in Box 4-3 and an example is given in Box 4-4.  This test
does not perform as well as the Shapiro-Wilk test or the Studentized range test.

4.2.5  Goodness-of-Fit Tests

       Goodness-of-fit tests are used to test whether data follow a specific distribution, that is,
how well a specified distribution fits the data. In verifying assumptions of normality, one would
compare the data to a normal distribution with a specified mean and variance.

4.2.5.1    Chi-Square Test

       One classic goodness-of-fit test is the chi-square test which involves breaking the data
into groups and comparing these groups to the expected groups from the known distribution.
There are no fixed methods for selecting these groups, and this test also requires a large sample
size since at least 5 observations per group are needed to implement this test. In addition, the chi-
square test does not have the power of the Shapiro-Wilk test or some of the other tests mentioned
above. However, it is more flexible since the data can be compared to probability  distributions
other than the normal but the application of goodness-of-fit tests to non-normal data is beyond
the scope  of this guidance.
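
A sketch of the chi-square goodness-of-fit procedure on simulated data (the choice of 8 equal-probability groups below is hypothetical, since the guidance notes there is no fixed rule for forming groups):

```python
import numpy as np
from scipy import stats

# Simulated sample for illustration.
rng = np.random.default_rng(11)
sample = rng.normal(loc=0.0, scale=1.0, size=200)

# Form 8 equal-probability groups under the normal fitted to the sample.
k = 8
edges = stats.norm.ppf(np.linspace(0, 1, k + 1),
                       loc=sample.mean(), scale=sample.std(ddof=1))
edges[0], edges[-1] = sample.min() - 1.0, sample.max() + 1.0  # replace +/- infinity

observed, _ = np.histogram(sample, bins=edges)
expected = np.full(k, len(sample) / k)  # 25 per group, comfortably above 5

# ddof=2 because the mean and standard deviation were estimated from the data.
chi2, p = stats.chisquare(observed, f_exp=expected, ddof=2)
```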
                               Box 4-3: Directions for Geary's Test

 COMPUTATIONS:  Calculate the sample mean, X-bar, the sample sum of squares,
 SSS = Σ (Xi - X-bar)², and the sum of absolute deviations, SAD = Σ |Xi - X-bar|
 (both sums taken over i = 1, ..., n).

 STEP 1. Null Hypothesis:         H0: The underlying distribution of the data is a normal distribution.

 STEP 2. Alternative Hypothesis:  HA: The underlying distribution of the data is not a normal distribution.

 STEP 3. Test Statistic:          z0 = (a - 0.7979) / (0.2123 / sqrt(n)),  where  a = SAD / sqrt(n · SSS)

 STEP 4. a) Critical Value:       Use Table A-1 to find z(1-α/2).

 STEP 4. b) p-value:              Use Table A-1 to find 2·P(Z > |z0|).

 STEP 5. a) Conclusion:           If |z0| > z(1-α/2), then reject the null hypothesis that the underlying
                                  distribution is normal.

 STEP 5. b) Conclusion:           If p-value < α, then reject the null hypothesis that the underlying
                                  distribution is normal.
                                Box 4-4:  Example of Geary's Test

 Using a significance level of 0.05, determine if the underlying distribution of the data from Table 4-1 (n = 10)
 can be modeled using a normal distribution.

 COMPUTATIONS:  X-bar = 11.571, SAD = 11.694, and SSS = 25.298.

 STEP 1. Null Hypothesis:         H0: The underlying distribution of the data is a normal distribution.

 STEP 2. Alternative Hypothesis:  HA: The underlying distribution of the data is not a normal distribution.

 STEP 3. Test Statistic:          a = 11.694 / sqrt(10 · 25.298) = 0.7352, so
                                  z0 = (0.7352 - 0.7979) / (0.2123 / sqrt(10)) = -0.9339.

 STEP 4. a) Critical Value:       Using Table A-1, z0.975 = 1.96.

 STEP 4. b) p-value:              Using Table A-1, 2·P(Z > 0.9339) = 2·(1 - 0.8238) = 0.3524.

 STEP 5. a) Conclusion:           Since |z0| = 0.9339 < 1.96 = z0.975, we fail to reject the null hypothesis
                                  that the underlying distribution is normal.

 STEP 5. b) Conclusion:           Since p-value = 0.3524 > 0.05 = α, we fail to reject the null hypothesis
                                  that the underlying distribution is normal.
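
The arithmetic of Boxes 4-3 and 4-4 can be reproduced directly from the summary quantities. A sketch:

```python
import math

def geary_z(n, sad, sss):
    """z statistic for Geary's test from the summary quantities of Box 4-3."""
    a = sad / math.sqrt(n * sss)
    return (a - 0.7979) / (0.2123 / math.sqrt(n))

# Summary values from Box 4-4: n = 10, SAD = 11.694, SSS = 25.298.
z0 = geary_z(10, 11.694, 25.298)  # ≈ -0.93; |z0| < 1.96, so do not reject normality
```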
4.2.5.2   Tests Based on the Empirical Distribution Function

       Various methods have been used to measure the discrepancy between the empirical
distribution function and the theoretical cumulative distribution function (cdf).  These measures
are referred to as empirical distribution function statistics.  The best known is the Kolmogorov-
Smirnov (K-S) statistic.  The K-S approach is appropriate if the sample size exceeds 50 or if the
theoretical cumulative distribution function is a specific distribution with known parameters.  A
modification to the test, called the Lilliefors K-S test, can be used to test that the data (n > 50)
come from a normal distribution with mean and variance equal to the sample values.

       Unlike the K-S type statistics, most empirical distribution function statistics are based on
integrated or average values between the empirical distribution function and hypothesized
cumulative distribution functions.  The two most powerful are the Cramer-von Mises and
Anderson-Darling statistics.  Extensive simulations show that the Anderson-Darling empirical
distribution function statistic is as good as any test, including the Shapiro-Wilk test, when testing
for normality.  However, the Shapiro-Wilk test is applicable only for the normal distribution,
while the Anderson-Darling method applies to other distributions.
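
A sketch of both approaches on simulated data (scipy's kstest handles the fully specified case, and anderson handles the parameter-estimated case; this is an illustration, not a procedure prescribed by the guidance):

```python
import numpy as np
from scipy import stats

# Simulated sample for illustration.
rng = np.random.default_rng(5)
sample = rng.normal(loc=3.0, scale=1.5, size=60)

# K-S test against a fully specified normal (parameters known in advance):
d_stat, p_ks = stats.kstest(sample, "norm", args=(3.0, 1.5))

# Anderson-Darling test, with parameters estimated from the sample:
result = stats.anderson(sample, dist="norm")
# Compare result.statistic to result.critical_values at the significance
# levels (in percent) listed in result.significance_level.
```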

4.2.6   Recommendations

       Tests for normality with small samples have very little statistical power. Therefore, for
small sample sizes it is common for a nonparametric statistical test to be selected during Step 3 of
the DQA in order to avoid incorrectly assuming the data are normally distributed when there is
simply not enough information.

       This guidance recommends using the Shapiro-Wilk W test, wherever practicable.  The
Shapiro-Wilk W test is one of the most powerful tests for normality.  This test is difficult to
implement by hand but can be applied easily using a statistical software package.  If the Shapiro-
Wilk W test is not feasible, then this guidance recommends using either Filliben's statistic
together with inspection of the normal probability plot, or the Studentized range test.  Filliben's
statistic performs similarly to the Shapiro-Wilk test.  The Studentized range is a simple test to
perform, but it is not applicable for non-symmetric data with large tails.  If the data are not
highly skewed and the tails are not significantly heavier than those of a normal distribution, then
the Studentized range provides a simple and powerful test that can be calculated by hand.  The
Lilliefors Kolmogorov-Smirnov test is statistically powerful but is also more difficult to apply
and uses specialized tables that are not readily available.

4.3    TESTS FOR TRENDS

4.3.1   Introduction

       This  section presents statistical tools for detecting and estimating trends in environmental
data.  The detection and estimation of temporal or spatial trends are important for many
environmental studies or monitoring programs. In cases where temporal or spatial patterns are
strong, simple procedures such as time plots or linear regression over time can reveal trends. In
more complex situations, sophisticated statistical models and procedures may be needed.  For
example, the detection of trends may be complicated by the overlaying of long- and short-term
trends, cyclical effects (e.g., seasonal or weekly systematic variations), autocorrelations, or
impulses or jumps (e.g., due to interventions or procedural changes).

       The graphical representations of Section 2.3.7 are recommended as the first step to
identify possible trends. A Time Plot and/or Lag Plot is recommended for temporal data as it
may reveal long-term trends, seasonal behavior, or impulses.  A posting or bubble plot is
recommended for spatial data to reveal  spatial trends such as areas of high concentration or
inaccessible areas.

       For most of the statistical tools presented below, the focus is on monotonic long-term
trends (i.e., a trend that is exclusively increasing or decreasing), as well as other sources of
systematic variation, such as seasonality.  The investigations of trend in this section are limited
to one-dimensional domains, e.g., trends in a pollutant concentration over time. The current
edition of this  document does not address spatial trends or trends over space and time, which
may involve sophisticated geostatistical techniques such as kriging and require the assistance of
a statistician.  Section 4.3.2 discusses estimating and testing for trends using regression
techniques.  Section 4.3.3  discusses nonparametric trend estimation procedures, and Section
4.3.4 discusses hypothesis tests for detecting trends under several types of situations.

4.3.2  Regression-Based Methods for Estimating and Testing for Trends

4.3.2.1   Estimating  a Trend Using the Slope of the Regression Line

       The classic procedures for assessing linear trends involve regression.  Linear regression is
a commonly used procedure in which calculations are performed on a data set containing pairs of
observations (Xi, Yi), so as to obtain the slope and intercept of a line that best fits the data. For
temporal data, the Xi values represent time and the Yi values represent the observations. An
estimate of the magnitude of the trend can be obtained by performing a regression of the data versus
time and using the slope of the regression line as the measure of the strength of the trend.

       Regression procedures are easy  to apply.  All statistical  software packages and
spreadsheet programs  will calculate the slope and intercept of the best fitting line, as well as the
correlation coefficient r (see Section 2.2.4). However, regression entails several limitations and
assumptions. First of all, simple linear  regression (the most commonly used method) is designed
to detect linear relationships between two variables; other types of regression models are
generally needed to detect non-linear relationships such as cyclical or non-monotonic trends.
Regression is very sensitive to outliers and presents difficulties  in handling data below the
detection limit, which are commonly encountered in environmental studies. Hypothesis testing
for linear regression also relies on two key assumptions: normally distributed errors, and constant
variance.  It may be difficult or burdensome to verify these assumptions in practice, so the
accuracy of the slope estimate may be suspect. Moreover, the analyst must ensure that time plots
of the data show no cyclical patterns, outlier tests show no extreme data values, and data
validation reports indicate that nearly all the measurements were above detection limits.  Due to
these drawbacks, linear regression is not recommended as a general tool for estimating and
detecting trends, although it may be useful as an informal and quick screening tool for
identifying strong linear trends.
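
As a quick screening sketch on a hypothetical series (simulated data; scipy's linregress returns the fitted slope and intercept, the correlation coefficient, and the p-value for testing a zero slope):

```python
import numpy as np
from scipy import stats

# Hypothetical series: 10 sampling events with an underlying slope of 0.5 per event.
rng = np.random.default_rng(0)
time = np.arange(1, 11)
conc = 2.0 + 0.5 * time + rng.normal(scale=0.3, size=10)

res = stats.linregress(time, conc)
# res.slope estimates the trend magnitude; res.pvalue tests whether the
# slope (equivalently, the correlation) differs from zero.
```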

4.3.2.2   Testing for Trends Using Regression Methods

       For simple linear regression, the statistical test of whether the slope is significantly
different from zero is equivalent to testing if the correlation coefficient is significantly different
from zero.  This test assumes a linear relation between Y and X with independent normally
distributed errors and constant variance across all the X values. Non-detects and outliers may,
however, invalidate the test.

Directions for this test are given in Box 4-5 along with an example.

4.3.3   General Trend Estimation Methods

4.3.3.1   Sen's Slope Estimator

       Sen's Slope Estimate is a nonparametric alternative for estimating a slope.  This approach
involves computing slopes for all the pairs of time points and then using the median of these
slopes as an estimate of the overall slope.  As such, it is insensitive to outliers and can handle a
moderate number of values below the detection limit and missing values.  Assume that there are
n time points, and let Xi denote the data value for the ith time point. If there are no missing data,
there will be n(n-1)/2 possible pairs of time points (i, j) in which i < j.  The slope is computed for
each such pair, and Sen's estimator is the median of these n(n-1)/2 slopes.
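
The computation described above can be sketched in a few lines (illustrative function, not part of this guidance):

```python
import numpy as np

def sens_slope(y):
    """Sen's estimator: the median of all pairwise slopes (y[j]-y[i])/(j-i), i < j."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    slopes = [(y[j] - y[i]) / (j - i)
              for i in range(n) for j in range(i + 1, n)]
    return np.median(slopes)

# For a series of n = 5 values there are 5*4/2 = 10 pairwise slopes.
```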
                 Box 4-5: Directions for the Test for a Correlation Coefficient and an Example

      COMPUTATIONS: Calculate the correlation coefficient, r (Section 2.2.4).

      STEP 1. Null Hypothesis:          H0: The correlation coefficient is zero.

      STEP 2. Alternative Hypothesis:   HA: The correlation coefficient is different from zero.

      STEP 3. Test Statistic:           t0 = r / sqrt[ (1 - r²) / (n - 2) ]

      STEP 4. a) Critical Value:        Use Table A-2 to find t(n-2, 1-α/2).

      STEP 4. b) p-value:               Use Table A-2 to find 2·P(t(n-2) > |t0|).

      STEP 5. a) Conclusion:            If |t0| > t(n-2, 1-α/2), then reject the null hypothesis that the
                                        correlation coefficient is zero.

      STEP 5. b) Conclusion:            If p-value < α, then reject the null hypothesis that the correlation
                                        coefficient is zero.

      Example: Using a significance level of 0.10, determine if the following data set (in ppb) has a significant
      correlation between its two variables:

                      Sample Number      1         2         3         4
                          Arsenic       8.0       6.0       2.0       1.0
                           Lead         8.0       7.0       7.0       6.0

      COMPUTATIONS: In Section 2.2.4, the correlation coefficient r for these data was calculated to be 0.865.

      STEP 1. Null Hypothesis:          H0: The correlation coefficient is zero.

      STEP 2. Alternative Hypothesis:   HA: The correlation coefficient is different from zero.

      STEP 3. Test Statistic:           t0 = 0.865 / sqrt[ (1 - 0.865²) / (4 - 2) ] = 2.438.

      STEP 4. a) Critical Value:        Using Table A-2, t(2, 0.95) = 2.920.

      STEP 4. b) p-value:               Using Table A-2, 0.10 < p-value < 0.20 (the exact
                                        p-value = 0.1350).

      STEP 5. a) Conclusion:            Since |t0| = 2.438 < 2.920 = t(2, 0.95), we fail to reject the null
                                        hypothesis that the correlation coefficient is zero.

      STEP 5. b) Conclusion:            Since p-value = 0.1350 > 0.10 = α, we fail to reject the null
                                        hypothesis that the correlation coefficient is zero.
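
The test statistic of Box 4-5 can be verified in a few lines (illustrative sketch):

```python
import math

def corr_t_stat(r, n):
    """t statistic for testing whether a correlation coefficient is zero."""
    return r / math.sqrt((1.0 - r**2) / (n - 2))

# Arsenic/lead example: r = 0.865 with n = 4 pairs.
t0 = corr_t_stat(0.865, 4)  # ≈ 2.44; not significant at the 0.10 level
```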
4.3.4.1    One Observation per Time Period for One Sampling Location
       The Mann-Kendall test involves computing a statistic S, which is the difference between
the number of pairwise differences that are positive minus the number that are negative. If S is a
large positive value, then there is evidence of an increasing trend in the data.  If S is a large
negative value, then there is evidence of a decreasing trend in the data.  The null  hypothesis  or
baseline condition for this test is that there is no temporal trend in the data values.  The
alternative hypothesis is that of either an upward trend or a downward trend.
          Box 4-6: Upper Triangular Computations for the Basic Mann-Kendall Trend Test
                       with a Single Measurement at Each Time Point

 Original Time    t1     t2     t3     t4    . . .   t(n-1)      tn
 Measurement      X1     X2     X3     X4    . . .   X(n-1)      Xn

 X1                      Y12    Y13    Y14   . . .   Y1,n-1      Y1n       #(plusses)   #(minuses)
 X2                             Y23    Y24   . . .   Y2,n-1      Y2n       #(plusses)   #(minuses)
  .                                                     .          .            .            .
 X(n-2)                                              Yn-2,n-1    Yn-2,n    #(plusses)   #(minuses)
 X(n-1)                                                          Yn-1,n    #(plusses)   #(minuses)

                                                                  Total    #(plusses)   #(minuses)

 where Yij = sign(Xj - Xi) =  +  if Xj > Xi
                              0  if Xj = Xi
                              -  if Xj < Xi

              Box 4-7:  Directions for the Mann-Kendall Trend Test for Small Sample Sizes

  COMPUTATIONS: Create the upper triangular table of pairwise differences as described in Box 4-6.

  STEP 1. Null Hypothesis:         H0: There is no trend.

  STEP 2. Alternative Hypothesis:  i) HA: There is a downward trend.
                                   ii) HA: There is an upward trend.

  STEP 3. Test Statistic:          S = Total #(plusses) - Total #(minuses)

  STEP 4. a) Critical Value:       Use Table A-12a to find the critical value.

  STEP 4. b) p-value:              Use Table A-12b to find the p-value.

  STEP 5. a) Conclusion:           If |S| > the critical value, then reject the null hypothesis of no trend.

  STEP 5. b) Conclusion:           If p-value < α, then reject the null hypothesis of no trend.
       The computations for the basic Mann-Kendall trend test are depicted in Box 4-6.  Assign
a value of DL/2 to all non-detects.  The test statistic is the difference between the number of
strictly positive differences and the number of strictly negative differences. Differences of zero
are not included in the test statistic (and should be avoided, if possible, by recording data to
sufficient accuracy). The steps for conducting the Mann-Kendall test for small sample sizes
(n < 10) are contained in Box 4-7 and an example is contained in Box 4-8. For sample sizes of
10 or more, there is a normal approximation for the Mann-Kendall test. Directions for this
approximation are contained in Box 4-9 with an example given in Box 4-10.
                Box 4-8: An Example of the Mann-Kendall Trend Test for Small Sample Sizes

 Consider 5 measurements ordered by the time of their collection: 5, 6, 11, 8, and 10. These data will be used to
 test for an upward trend at a significance level of 0.05.

 COMPUTATIONS: A triangular table was used to construct the pairwise differences.

                Time    1     2     3     4     5      No. of     No. of
                Data    5     6     11    8     10     + Signs    - Signs
                5             +     +     +     +      4          0
                6                   +     +     +      3          0
                11                        -     -      0          2
                8                               +      1          0
                                             Total     8          2

 STEP 1. Null Hypothesis:         H0: There is no trend.

 STEP 2. Alternative Hypothesis:  HA: There is an upward trend.

 STEP 3. Test Statistic:          S = Total #(plusses) - Total #(minuses) = 8 - 2 = 6.

 STEP 4. a) Critical Value:       Using Table A-12a, the critical value is 8.

 STEP 4. b) p-value:              Using Table A-12b, the p-value is 0.117.

 STEP 5. a) Conclusion:           Since S = 6 < 8 (critical value), we fail to reject the null hypothesis of
                                  no trend.

 STEP 5. b) Conclusion:           Since p-value = 0.117 > 0.05 (significance level), we fail to reject the null
                                  hypothesis of no trend.
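
The S statistic of Boxes 4-6 through 4-8 can be sketched as (illustrative function):

```python
def mann_kendall_s(data):
    """S = #(positive pairwise differences) - #(negative), per Box 4-6."""
    s = 0
    n = len(data)
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = data[j] - data[i]
            s += (diff > 0) - (diff < 0)  # ties contribute zero
    return s

# For the Box 4-8 series 5, 6, 11, 8, 10: S = 8 - 2 = 6.
```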
               Box 4-9:  Directions for the Mann-Kendall Test Using a Normal Approximation

  COMPUTATIONS: If the sample size is 10 or more, a normal approximation to the Mann-Kendall procedure may
  be used.  Compute S as described in Box 4-7 above.

  STEP 1. Null Hypothesis:         H0: There is no trend.

  STEP 2. Alternative Hypothesis:  i) HA: There is a downward trend.
                                   ii) HA: There is an upward trend.

  STEP 3. Test Statistic:          z0 = (S - sign(S)) / sqrt(V(S)),

                                   where V(S) = [ n(n-1)(2n+5) - Σ tj(tj-1)(2tj+5) ] / 18
                                   (sum over j = 1, ..., g),

                                   g is the number of tied groups, and tj is the number of points in the jth
                                   group.  Note that sign(S) = 1 if S > 0, 0 if S = 0, and -1 if S < 0.

  STEP 4. a) Critical Value:       Use Table A-1 to find z(1-α).

  STEP 4. b) p-value:              Use Table A-1 to find P(Z > |z0|).

  STEP 5. a) Conclusion:           If |z0| > z(1-α), then reject the null hypothesis of no trend.

  STEP 5. b) Conclusion:           If p-value < α, then reject the null hypothesis of no trend.
             Box 4-10: An Example of the Mann-Kendall Trend Test by Normal Approximation

 A test for an upward trend with α = 0.05 will be based on the 11 weekly measurements shown below.

 COMPUTATIONS: Using Box 4-6, a triangular table was constructed of the pairwise differences.  "0" indicates
 a tie.

    Time    1    2    3    4    5    6    7    8    9    10   11    No. of     No. of
    Data    10   10   10   5    10   20   18   17   15   24   15    + Signs    - Signs
    10           0    0    -    0    +    +    +    +    +    +     6          1
    10                0    -    0    +    +    +    +    +    +     6          1
    10                     -    0    +    +    +    +    +    +     6          1
    5                           +    +    +    +    +    +    +     7          0
    10                               +    +    +    +    +    +     6          0
    20                                    -    -    -    +    -     1          4
    18                                         -    -    +    -     1          3
    17                                              -    +    -     1          2
    15                                                   +    0     1          0
    24                                                        -     0          1
                                                      Total         35         13

 S = (sum of + signs) - (sum of - signs) = 35 - 13 = 22.

 STEP 1. Null Hypothesis:         H0: There is no trend.

 STEP 2. Alternative Hypothesis:  HA: There is an upward trend.

 STEP 3. Test Statistic:          There are several observations tied at 10 and at 15, so the formula for tied
                                  values will be used.  In this formula, g = 2, t1 = 4 for the tied values of 10,
                                  and t2 = 2 for the tied values of 15.  Therefore,

                                  V(S) = { 11(11-1)(2·11+5) - [ 4(4-1)(2·4+5) + 2(2-1)(2·2+5) ] } / 18
                                       = 155.33

                                  z0 = (S - sign(S)) / sqrt(V(S)) = (22 - 1) / sqrt(155.33) = 1.685.

 STEP 4. a) Critical Value:       Using Table A-1, z0.95 = 1.645.

 STEP 4. b) p-value:              Using Table A-1, p-value = P(Z > 1.685) = 0.046.

 STEP 5. a) Conclusion:           Since the test statistic = 1.685 > 1.645 = critical value, we reject the null
                                  hypothesis of no trend.

 STEP 5. b) Conclusion:           Since p-value = 0.046 < 0.05 = α, we reject the null hypothesis of no trend.

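
The normal-approximation computation of Boxes 4-9 and 4-10, including the tie correction, can be sketched as (illustrative function):

```python
import math
from collections import Counter

def mann_kendall_z(data):
    """Normal approximation to the Mann-Kendall test (Box 4-9), with tie correction."""
    n = len(data)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = data[j] - data[i]
            s += (diff > 0) - (diff < 0)
    # Tie correction: tj(tj-1)(2tj+5) summed over groups of tied values.
    ties = [t for t in Counter(data).values() if t > 1]
    v = (n * (n - 1) * (2 * n + 5)
         - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18.0
    sign = (s > 0) - (s < 0)
    return (s - sign) / math.sqrt(v)

# The 11 weekly measurements of Box 4-10:
weekly = [10, 10, 10, 5, 10, 20, 18, 17, 15, 24, 15]
z0 = mann_kendall_z(weekly)  # ≈ 1.685, matching Box 4-10
```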
4.3.4.2 Multiple Observations per Time Period for One Sampling Location

       Often, more than one sample is collected for each time period. There are two ways to
deal with multiple observations per time period. One method is to compute a summary statistic,
such as the median, for each time period and to apply one of the Mann-Kendall trend tests of
Section 4.3.4.1 to the summary statistic. Therefore, instead of using the individual data points in
the triangular table, the summary statistic would be used. Then the steps given in Box 4-7 or
Box 4-9 could be applied to the summary statistics.
       An alternative approach is to consider all the multiple observations within a given time
period as being essentially taken at the same time within that period.  The S statistic is computed
as before with n being the total of all observations.  The variance of the S statistic (calculated in
STEP 3 of Box 4-9) is changed to:

       V(S) = [ n(n-1)(2n+5) - Σ tj(tj-1)(2tj+5) - Σ uk(uk-1)(2uk+5) ] / 18
       (first sum over j = 1, ..., g; second sum over k = 1, ..., h)

where g represents the number of tied groups, tj represents the number of data points in the jth
group, h is the number of time periods which contain multiple data, and uk is the sample size in
the kth time period.

       The preceding variance formula assumes that the data are not correlated. If correlation
within single time periods is suspected, it is preferable to use a summary statistic (e.g., the
median) for each time period and then apply either Box 4-7 or Box 4-9 to the summary statistics.


4.3.4.3   Multiple Sampling Locations with Multiple Observations

       The preceding methods involve a single sampling location (station). However,
environmental data often consist of sets of data collected at several sampling locations (see
Box 4-11). For example, data are often systematically collected at several fixed sites on a lake or
river,  or within a region or basin.  The data collection plan (or experimental design) must be
systematic in the sense that approximately  the same sampling times should be used at all
locations. In this situation, it is desirable to express the results by an overall regional summary
statement across all sampling locations. However, there must be consistency in behavioral
characteristics across sites over time in order for a single summary statement to be valid across
all sampling locations.  A useful plot to assess the consistency requirement is a single time plot
(Section 2.3.7.1) of the measurements from all stations where a different symbol is used to
represent each station.

       If the stations exhibit approximate trends in the same direction with comparable slopes,
then a single summary statement across stations is valid and this implies two relevant sets of
hypotheses should be investigated:

       Comparability of stations.  H0: Similar dynamics affect all K stations vs. HA:  At least
       two stations exhibit different dynamics.

       Testing for overall monotonic trend. H0*:  Contaminant levels do not change over time
       vs. HA*: There is an increasing (or  decreasing) trend consistent across all stations.

       Therefore, the analyst must first test for homogeneity of stations, and then, if
homogeneity is confirmed, test for an overall monotonic trend. The data layout is shown in
Box 4-11 and directions for the tests are contained in Box 4-12; ideally, the stations should
have equal sample sizes. However, the numbers of observations at the stations can differ
slightly because of isolated missing values, but the overall time periods spanned must be
similar. This guidance recommends that for fewer than 3 time periods, an equal number of
observations (a balanced design) is required. For 4 or more time periods, up to 1 missing value
per sampling location may be tolerated.
                      Box 4-11: Data for Multiple Times and Multiple Stations

 Let i = 1, 2, ..., n represent time, k = 1, 2, ..., K represent sampling locations, and Xik represent the measurement
 at time i for location k.  This data can be summarized in matrix form, as shown below.

                                      Stations
                             1        2       ...       K

                    1       X11      X12      ...      X1K
             Time   2       X21      X22      ...      X2K
                    .        .        .                 .
                    n       Xn1      Xn2      ...      XnK

                             S1       S2      ...      SK
                           V(S1)    V(S2)     ...     V(SK)
                             Z1       Z2      ...      ZK

         where  Sk    = Mann-Kendall statistic for station k (see STEP 3, Box 4-7),
                V(Sk) = variance for S statistic for station k (see STEP 3, Box 4-9), and
                Zk    = Sk / [V(Sk)]^(1/2).
       a.     One Observation per Time Period.  When only one measurement is taken for
each time period for each station, a generalization of the Mann-Kendall statistic can be used to
test the above hypotheses.  This procedure is described in Box 4-12.

       b.     Multiple Observations per Time Period.  If multiple measurements are taken at
some times and stations, then the previous approaches are  still applicable.  However, the
variance of the statistic Sk must be calculated using the equation for calculating V(S) given in
Section 4.3.4.2. Note that Sk is computed for each station, so n, tj, g, h, and Uk are all station-
specific.
EPA QA/G-9S
      111
                         February 2006

-------
            Box 4-12: Testing for Comparability of Stations and an Overall Monotonic Trend

Let i = 1, 2, ..., n represent time, k = 1, 2, ..., K represent sampling locations, and Xik represent the
measurement at time i for location k.  Let α represent the significance level for testing homogeneity and α*
represent the significance level for testing for an overall trend.

COMPUTATIONS:  Calculate the Mann-Kendall statistic Sk and its variance V(Sk) for each of the K stations using
the methods of Box 4-9.  Now, calculate

       Zk = Sk / [V(Sk)]^(1/2)  for k = 1, ..., K,   and   Zbar = (Z1 + ... + ZK) / K.

Test of Homogeneity

STEP 1.  Null Hypothesis:          H0: Similar dynamics affect all K stations.

STEP 2.  Alternative Hypothesis:   HA: At least two stations exhibit different dynamics.

STEP 3.  Test Statistic:           χ²h = (Z1² + ... + ZK²) - K·(Zbar)²

STEP 4.  a) Critical Value:        Use Table A-9 to find χ²(K-1, 1-α).

STEP 4.  b) p-value:               Use Table A-9 to find P(χ²(K-1) > χ²h).

STEP 5.  a) Conclusion:            If χ²h > χ²(K-1, 1-α), then reject the null hypothesis that similar dynamics
                                   affect all K stations.

STEP 5.  b) Conclusion:            If p-value < α, then reject the null hypothesis that similar dynamics affect all
                                   K stations.

If H0 is not rejected, then proceed to the test of overall trend.  Otherwise, individual α*-level Mann-Kendall tests
should be conducted using the methods presented in Section 4.3.4.1.

Test of Overall Trend

STEP 1.  Null Hypothesis:          H0*: Contaminant levels do not change over time.

STEP 2.  Alternative Hypothesis:   HA*: There is an increasing (or decreasing) trend consistently
                                   exhibited across all stations.

STEP 3.  Test Statistic:           χ²t = K·(Zbar)²

STEP 4.  a) Critical Value:        Use Table A-9 to find χ²(1, 1-α*).

STEP 4.  b) p-value:               Use Table A-9 to find P(χ²(1) > χ²t).

STEP 5.  a) Conclusion:            If χ²t > χ²(1, 1-α*), then reject the null hypothesis that contaminant levels
                                   do not change over time.

STEP 5.  b) Conclusion:            If p-value < α*, then reject the null hypothesis that contaminant levels do not
                                   change over time.
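The homogeneity and overall-trend chi-square statistics of Box 4-12 can be computed directly from the per-station Mann-Kendall results. The following Python sketch is a minimal illustration assuming no tied values (so the simple no-ties variance formula applies); the station data are hypothetical, and in practice the statistics would be compared to the Table A-9 chi-square critical values.

```python
import math

def mann_kendall_s(x):
    """Mann-Kendall statistic S: number of increasing pairs minus decreasing pairs."""
    n = len(x)
    return sum((x[j] > x[i]) - (x[j] < x[i]) for i in range(n) for j in range(i + 1, n))

def mk_variance(n):
    """Variance of S for n observations with no ties."""
    return n * (n - 1) * (2 * n + 5) / 18

def station_trend_tests(stations):
    """Return the Box 4-12 homogeneity (df = K-1) and trend (df = 1) statistics."""
    z = [mann_kendall_s(x) / math.sqrt(mk_variance(len(x))) for x in stations]
    K = len(z)
    zbar = sum(z) / K
    chi2_h = sum(zk ** 2 for zk in z) - K * zbar ** 2
    chi2_t = K * zbar ** 2
    return chi2_h, chi2_t

# Three hypothetical stations sharing the same upward drift
stations = [[1.0, 1.4, 1.9, 2.3, 3.0, 3.1],
            [0.8, 1.2, 1.5, 2.2, 2.4, 2.9],
            [1.1, 1.3, 2.0, 2.1, 2.8, 3.3]]
chi2_h, chi2_t = station_trend_tests(stations)
print(chi2_h, chi2_t)
# chi2_h is near zero (stations behave alike); chi2_t exceeds 3.84, the
# chi-square(1, 0.95) critical value, signaling an overall trend at alpha* = 0.05.
```

Because the three series move in lockstep here, the homogeneity statistic is essentially zero while the trend statistic is large, which is exactly the pattern that justifies a single regional summary statement.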
4.3.4.4   One Observation for One Station with Multiple Seasons

       Temporal data are often collected over extended periods of time.  Within the time
variable, data may exhibit periodic cycles, which are patterns in the data that repeat over time.
For example, temperature and humidity may change with the season or month, and may affect
environmental measurements. (For more information on seasonal cycles, see Section 2.3.7). In
the following discussion, the term season represents one time point in the periodic cycle, such as
a month within a year or an hour within a day.

       If seasonal cycles are anticipated, then two approaches for testing for trends are the
seasonal Kendall test and Sen's test for trends. The seasonal Kendall test may be used for large
sample sizes, and Sen's test for trends may be used for small sample  sizes. If different seasons
manifest similar slopes (rates of change) but possibly different intercepts, then the Mann-Kendall
technique of Section 4.3.4.3 is applicable, replacing time by year and replacing station by season.
       The seasonal Kendall test, which is an extension of the Mann-Kendall test, involves
calculating the Mann-Kendall test statistic, S, and its variance separately for each season. The
sum of the S's and the sum of their variances are then used to form an overall test statistic that is
assumed to be approximately normally distributed for larger sample sizes.

       For data at a single site, collected at multiple seasons within multiple years, the
techniques of Section 4.3.4.3 can be applied to test for homogeneity of time trends across
seasons. The methodology follows Boxes 4-11 and 4-12 exactly except that 'station' is replaced
by 'season' and the inferences refer to seasons.
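The seasonal Kendall computation described above can be sketched in a few lines of Python. This is a simplified illustration assuming no tied values (so the closed-form variance applies) and omitting the continuity correction some presentations apply; the quarterly series are hypothetical.

```python
import math

def mann_kendall_s(x):
    """Mann-Kendall statistic S for one season's series."""
    n = len(x)
    return sum((x[j] > x[i]) - (x[j] < x[i]) for i in range(n) for j in range(i + 1, n))

def seasonal_kendall_z(seasons):
    """Sum the per-season S statistics and variances, then standardize."""
    s_total = sum(mann_kendall_s(x) for x in seasons)
    v_total = sum(len(x) * (len(x) - 1) * (2 * len(x) + 5) / 18 for x in seasons)
    return s_total / math.sqrt(v_total)

# Hypothetical quarterly measurements over 5 years; each season trends upward
seasons = [[2.1, 2.5, 2.9, 3.4, 3.8],
           [1.0, 1.2, 1.7, 2.0, 2.6],
           [3.3, 3.5, 4.0, 4.4, 4.9],
           [0.5, 0.9, 1.1, 1.6, 2.1]]
z = seasonal_kendall_z(seasons)
print(round(z, 2))  # reject H0 of no trend at alpha = 0.05 if |z| > 1.96
```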

4.3.5   A Discussion on Tests for Trends

       This section discusses some further considerations for choosing among the many tests for
trend. Mann-Kendall type nonparametric trend tests and estimates use ordinal time (ranks) rather
than cardinal time (actual time values) and this restricts the interpretation of measured trends.
All of the Mann-Kendall trend tests presented are based on certain pairwise differences in
measurements at different time points.  The only information about these differences used in the
Mann-Kendall calculations is the sign, so these procedures can be regarded as generalizations of
the sign test.  Because information about the magnitudes of the differences is not used, statistical
power can suffer when only limited amounts of data are available.

       There are nonparametric methods based on ranks that take such magnitudes into account
and still retain the benefit of robustness to outliers.  These procedures can be thought of as
replacing the data with their ranks and then conducting parametric analyses.  They include the
Wilcoxon Rank Sum test and its many generalizations, and they are more resistant to outliers
than parametric methods.  Rank-based methods that make fuller use of the information in the
data than the Mann-Kendall methods are not as robust with respect to outliers as the signed
rank test and the Mann-Kendall tests, but they have more statistical power.  This kind of tradeoff
between power and robustness shows the need for an evaluation process leading to the selection
of the best statistical procedure for the situation.

4.3.6  Testing for Trends in Sequences of Data

       There are cases where it is desirable to see if a sequence of data (for example, readings
from a monitoring station) could be considered random variation or correlated in some way. One
test to make this determination is the Wald-Wolfowitz test.  This test can only be used if the data
are binary, i.e., there are only two potential values.  For example, the data could be 'Yes/No'
responses, or an investigation of persistent violation of a permitted limit by a pollution control
process, where a violation is coded as 1 and compliance as 0.  Directions for the Wald-
Wolfowitz test are given in Box 4-13 and an example in Box 4-14.
                        Box 4-13: Directions for the Wald-Wolfowitz Runs Test

 Consider a sequence of binary values.  Let m and n denote the number of observations for the two values, with
 n <= m, and let T denote the number of runs (maximal strings of identical values) in the sequence.

 STEP 1.  Null Hypothesis:          H0: The data sequence is random.

 STEP 2.  Alternative Hypothesis:   HA: The data sequence is not random.

 STEP 3.  Test Statistic:           If either m or n is less than 10, the test statistic is T itself.
                                    If both m and n are at least 10, compute

                                         Z0 = (T - μ) / σ,  where  μ = 1 + 2mn/(m + n)  and
                                         σ² = 2mn(2mn - m - n) / [(m + n)²(m + n - 1)].

 STEP 4.  a) Critical Value:        If either m or n is less than 10, use Table A-13 to find w(α/2) and w(1-α/2).
                                    Otherwise, find z(1-α/2) from the standard normal table.

 STEP 4.  b) p-value:               2·P(Z > |Z0|).

 STEP 5.  a) Conclusion:            If either m or n is less than 10 and if T <= w(α/2) or T >= w(1-α/2), then
                                    reject the null hypothesis that the data sequence is random.
                                    If both m and n are at least 10 and if |Z0| > z(1-α/2), then reject the null
                                    hypothesis that the data sequence is random.

 STEP 5.  b) Conclusion:            If p-value < α, then reject the null hypothesis that the data sequence is
                                    random.
                        Box 4-14: An Example of the Wald-Wolfowitz Runs Test

 The main discharge station at a chemical manufacturing plant is under a monitoring program. The permit states
 that the discharge should have a pH of 7.0 and should never be less than 5.0.  So the plant manager has decided
 to use a pH of 6.0 to indicate potential problems.  In a four-week period the following values were recorded:

         6.5   6.6   6.4   6.2    5.9   5.8   5.9  6.2   6.2    6.3   6.6  6.6   6.7   6.4
         6.2   6.3   6.2   5.8    5.9   5.8   6.1  5.9   6.0    6.2   6.3  6.2

 Since the plant manager has decided that a pH of 6.0 will indicate trouble, the data have been replaced with a
 binary indicator.  If the value is greater than 6.0, the value will be replaced by a 1; otherwise the value will be
 replaced by a 0.  So the binary sequence is:

                             11110001111111111000100111

 As there are 8 values of '0' and 18 values of '1', n = 8 and m = 18, and the number of runs is T = 7.  Test at a
 significance level of 0.10 whether the data sequence is random.

 STEP 1.  Null Hypothesis:          H0: The data sequence is random.

 STEP 2.  Alternative Hypothesis:   HA: The data sequence is non-random.

 STEP 3.  Test Statistic:           Since n is less than 10, the test statistic is T = 7.

 STEP 4.  a) Critical Value:        Since n is less than 10, Table A-13 is used to find w0.05 = 9.

 STEP 5.  a) Conclusion:            Since n is less than 10 and T = 7 <= 9 = w0.05, we reject H0.
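The run count and category totals for the pH example can be checked with a few lines of Python; the critical value w0.05 = 9 is the Table A-13 entry quoted in the worked example.

```python
from itertools import groupby

# pH readings from the four-week monitoring period; 1 indicates pH > 6.0
ph = [6.5, 6.6, 6.4, 6.2, 5.9, 5.8, 5.9, 6.2, 6.2, 6.3, 6.6, 6.6, 6.7, 6.4,
      6.2, 6.3, 6.2, 5.8, 5.9, 5.8, 6.1, 5.9, 6.0, 6.2, 6.3, 6.2]
binary = [1 if v > 6.0 else 0 for v in ph]

T = sum(1 for _ in groupby(binary))            # number of runs
n = min(binary.count(0), binary.count(1))      # smaller category count
m = max(binary.count(0), binary.count(1))      # larger category count
print(T, n, m)
# Since n = 8 < 10, compare T to the Table A-13 critical value (w0.05 = 9):
# T = 7 <= 9, so the hypothesis of a random sequence is rejected at the 0.10 level.
```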
4.4    OUTLIERS

4.4.1  Background

       Potential outliers are measurements that are extremely large or small relative to the rest of
the data and, therefore, are suspected of misrepresenting the population from which they were
collected.  Potential outliers may result from transcription errors, data-coding errors, or
measurement system problems. However, outliers may also represent true extreme values of a
distribution (for instance, hot spots) and indicate more variability in the population than was
expected.  Failure to remove true outliers and the removal of false outliers both lead to distorted
estimates of population parameters.  It is recommended that the QA Project Plan or Sampling and
Analysis Plan be reviewed for anomalies that could account for the potential outlier.

       Statistical outlier tests give the analyst probabilistic evidence that an extreme value does
not "fit" with the distribution of the remainder of the data and is therefore a statistical outlier.
These tests should only be used to identify data points that require further investigation.  The
tests alone cannot determine whether a statistical outlier should be discarded or corrected within
a data set.  This decision should be based on judgmental or scientific grounds.
       There are 5 steps involved in treating extreme values or outliers:

           1.  Identify extreme values that may be potential outliers;
           2.  Apply statistical test;
           3.  Scientifically review statistical outliers and decide on their disposition;
           4.  Conduct data analyses with and without statistical outliers; and
           5.  Document the entire process.

Potential outliers may be identified through the graphical representations of Chapter 2 (step 1
above). Graphs such as the box and whisker plot, ranked data plot, normal probability plot, and
time plot can all be used to identify observations that are much larger or smaller than the rest of
the data. If potential outliers are identified, the next step is to apply one of the statistical tests
described in the following sections. Section 4.4.2 provides recommendations on selecting a
statistical test for outliers.

       If a data point is found to be an outlier, the analyst may either:  1) correct the data point;
2) discard the data point from analysis; or 3) use the data point in all analyses. This decision
should be based on scientific reasoning in addition to the results of the statistical test. For
instance, data points containing transcription errors should be corrected, whereas data points
collected while an instrument was malfunctioning may be discarded.  Discarding an outlier from
a data set should be done with extreme caution, particularly for environmental data sets, which
often contain legitimate extreme values.  If an outlier is discarded from the data  set, all statistical
analysis of the data should be applied to both the full and truncated data set so that the effect of
discarding observations may be assessed. If scientific reasoning does not explain the outlier, it
should not be discarded from the data set.

       If any data points are found to be statistical outliers through the use of a statistical test,
this information will need to be documented along with the analysis of the data set, regardless of
whether any data points are discarded. If no data points are discarded, document the
identification of any statistical outliers by documenting the statistical test performed and the
possible scientific reasons investigated. If any data points are discarded, document each data
point, the statistical test performed, the scientific reason for discarding each data point, and the
effect on the analysis of deleting the data points.  This information is critical for effective peer
review.

4.4.2  Selection of a Statistical Test for Outliers

       There are several statistical tests for determining whether or not one or more
observations are statistical outliers. Step by step directions for implementing some of these
tests are described in Sections 4.4.3 through 4.4.6. Section 4.4.7 describes statistical tests for
multivariate outliers.

       If the data are approximately normally distributed, this guidance recommends Rosner's
test when the sample size is greater than 25 and the Extreme Value test when the sample size is
less than 25.  If only one outlier is suspected, then the Discordance test may be substituted for
either of these tests.  If the data are not normally distributed, or if the data cannot be transformed
so that the transformed data are normally distributed, then the analyst should either apply a
nonparametric test (such as Walsh's test) or consult a statistician.  A summary of these
recommendations is contained in Table 4-3.
       Table 4-3. Recommendations for Selecting a Statistical Test for Outliers

        Sample Size    Test                  Section    Assumes Normality    Multiple Outliers
        n <= 25        Extreme Value Test    4.4.3      Yes                  No/Yes
        n <= 50        Discordance Test      4.4.4      Yes                  No
        n >= 25        Rosner's Test         4.4.5      Yes                  Yes
        n >= 50        Walsh's Test          4.4.6      No                   Yes
4.4.3   Extreme Value Test (Dixon's Test)

       Dixon's Extreme Value test can be used to test for statistical outliers when the sample
size is less than or equal to 25. This test considers both extreme values that are much smaller
than the rest of the data (case 1) and extreme values that are much larger than the rest of the data
(case  2).  This test assumes that the data without the suspected outlier are normally distributed;
therefore, it is necessary to perform a test for normality on the data without the suspected outlier
before applying this test.  If the data are not normally distributed, then either transform the data,
apply a different test, or consult a statistician. Directions for the Extreme Value test are
contained in Box 4-15; an example of this test is contained in Box 4-16.

       This guidance recommends using this test when only one outlier is suspected in the data.
If more than one outlier is suspected, the Extreme Value test may lead to masking where two or
more  outliers close in value "hide" one another.  Therefore, if the analyst decides to use the
Extreme  Value test for multiple outliers, apply the test to the least extreme value first.

4.4.4   Discordance Test

       The Discordance test can be used to test if one extreme value is an outlier. The
Discordance test assumes that the data without the suspected outlier are approximately normally
distributed. Therefore, it is necessary to check for normality before applying this test.
Directions and an example of the Discordance test are contained in Box 4-17 and Box 4-18.
                       Box 4-15: Directions for the Extreme Value Test (Dixon's Test)

  Let X(1), X(2), . . . , X(n) represent the data ordered from smallest to largest.  Check that the data without the
  suspect outlier are normally distributed, using one of the methods of Section 4.2.  If normality fails, either
  transform the data or apply a different outlier test.

  STEP 1.  Null Hypothesis:          H0: There are no outliers in the data.

  STEP 2.  Alternative Hypothesis:    i) HA: X(1) is an outlier.
                                     ii) HA: X(n) is an outlier.

  STEP 3.  Test Statistic:           Compute the test statistic C:

                                  i) suspect low value                 ii) suspect high value
       3 <= n <= 7:    C = [X(2) - X(1)] / [X(n) - X(1)]      C = [X(n) - X(n-1)] / [X(n) - X(1)]
       8 <= n <= 10:   C = [X(2) - X(1)] / [X(n-1) - X(1)]    C = [X(n) - X(n-1)] / [X(n) - X(2)]
       11 <= n <= 13:  C = [X(3) - X(1)] / [X(n-1) - X(1)]    C = [X(n) - X(n-2)] / [X(n) - X(2)]
       14 <= n <= 25:  C = [X(3) - X(1)] / [X(n-2) - X(1)]    C = [X(n) - X(n-2)] / [X(n) - X(3)]

  STEP 4.  a) Critical Value:        Use Table A-4 to find dα.

  STEP 5.  a) Conclusion:            If C > dα, then reject the null hypothesis that there are no outliers in the data.
                     Box 4-16: An Example of the Extreme Value Test (Dixon's Test)

The data in order of magnitude from smallest to largest are (in ppm):

                 82.39, 86.62, 91.72, 98.37,  103.46, 104.93, 105.52, 108.21, 113.23, 150.55.

As the value 150.55 is much larger than the other values, it is suspected that this data point might be an outlier.
The Studentized Range test (Section 4.2.6) shows that there is no reason to suspect that the data are not
normally distributed.

STEP 1. Null Hypothesis:          Ho:  There are no outliers in the data.

STEP 2. Alternative Hypothesis:   HA: X(n) is an outlier.

STEP 3. Test Statistic:           Since n = 10,  C = [X(n) - X(n-1)] / [X(n) - X(2)]
                                                   = (150.55 - 113.23) / (150.55 - 86.62) = 0.584

STEP 4. a) Critical Value:        Using Table A-4, d0.05 = 0.477.

STEP 5. a) Conclusion:            Since C = 0.584 > 0.477 = d0.05, we reject the null hypothesis of no outliers in
                                  the data.
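A minimal sketch of the Box 4-16 calculation, using Dixon's statistic for 8 <= n <= 10 with a suspected high outlier:

```python
# Dixon's Extreme Value test for the n = 10 data of Box 4-16 (suspected high outlier)
data = sorted([82.39, 86.62, 91.72, 98.37, 103.46,
               104.93, 105.52, 108.21, 113.23, 150.55])

# For 8 <= n <= 10 the high-side statistic is [X(n) - X(n-1)] / [X(n) - X(2)]
C = (data[-1] - data[-2]) / (data[-1] - data[1])
print(round(C, 3))  # 0.584 > d0.05 = 0.477 (Table A-4), so 150.55 is flagged
```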
                             Box 4-17:  Directions for the Discordance Test

  LetX(i), X(2), .  . ., X(n) represent the data ordered from smallest to largest.  Check that the data without the
  suspect outlier are normally distributed, using one of the methods of Section 4.2. If normality fails, either
  transform the data or apply a different outlier test.

  COMPUTATIONS:  Compute the sample mean,  X ,  and the sample standard deviation, s.

  STEP 1. Null Hypothesis:          H0: There are no outliers in the data.

  STEP 2. Alternative Hypothesis:    i)  HA: X(i) is an outlier.
                                 ii) HA: X(n) is an outlier.

  STEP 3. Test Statistic:            i)  D = [Xbar - X(1)] / s          ii)  D = [X(n) - Xbar] / s

  STEP 4. a) Critical Value:         Use Table A-5 to find da.

  STEPS, a) Conclusion:           If D> da, then reject the null hypothesis that there are no outliers in the data.
                             Box 4-18: An Example of the Discordance Test

  The data in order of magnitude from smallest to largest are (in ppm):

                  82.39, 86.62, 91.72, 98.37, 103.46, 104.93, 105.52, 108.21, 113.23, 150.55.

  It is suspected that the data point 150.55 might be an outlier. The Studentized Range test (Section 4.2.6) shows
  that there is no reason to suspect that the data are not normally distributed.

  COMPUTATIONS: X = 104.5 ppm and s = 18.922 ppm.

  STEP 1.  Null Hypothesis:        H0: There are no outliers in the data.

  STEP 2.  Alternative Hypothesis:  HA: X(n) is an outlier.


  STEP 3.  Test Statistic:          D = [X(n) - Xbar] / s = (150.55 - 104.5) / 18.922 = 2.43

  STEP 4.  a) Critical Value:        Using Table A-5, do.os = 2.176.

  STEP 5.  a) Conclusion:          Since D = 2.43 > 2.176 = d0.05, we reject the null hypothesis of no outliers.
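The Box 4-18 calculation can be reproduced with the Python standard library:

```python
import statistics

data = [82.39, 86.62, 91.72, 98.37, 103.46,
        104.93, 105.52, 108.21, 113.23, 150.55]
xbar = statistics.mean(data)     # sample mean, 104.5 ppm
s = statistics.stdev(data)       # sample standard deviation, 18.922 ppm
D = (max(data) - xbar) / s       # high-side discordance statistic
print(round(D, 2))  # 2.43 > 2.176 (Table A-5 at the 0.05 level), so 150.55 is flagged
```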
4.4.5   Rosner's Test

        A parametric test developed by Rosner can be used to detect up to 10 outliers for sample
sizes of 25 or more.  This test assumes that the data without the suspected outliers are normally
distributed. Therefore, it is necessary to perform a test for normality before applying this test.  If
the data are not normally distributed, then either transform the data, apply a different test, or
consult a statistician.  Directions for Rosner's test are contained in Box 4-19 and an example is
contained in Box 4-20.


                           Box 4-19: Directions for Rosner's Test for Outliers

  Let X1, X2, . . . , Xn represent the data points.  By inspection, identify the maximum number of possible
  outliers, r0, where 1 <= r0 <= 10.  Check that the data are normally distributed without the suspected outlier(s)
  using one of the methods of Section 4.2.

  COMPUTATIONS:  Using the entire data set, compute the sample mean, Xbar(0), and the sample standard
  deviation, s(0).  Determine the observation farthest from Xbar(0) and label it y(0).  Delete y(0) from the data and
  compute the sample mean, Xbar(1), and the sample standard deviation, s(1).  Now determine the observation
  farthest from Xbar(1) and label it y(1).  Delete y(1) from the data and compute the sample mean, Xbar(2), and the
  sample standard deviation, s(2).  Continue this process until r0 extreme values have been eliminated.  After this
  process the analyst should have:

       [Xbar(0), s(0), y(0)], [Xbar(1), s(1), y(1)], . . . , [Xbar(r0-1), s(r0-1), y(r0-1)]

  STEP 1.  Null Hypothesis:          H0: There are no outliers in the data.

  STEP 2.  Alternative Hypothesis:   HA: There are as many as r0 outliers in the data.

  Steps 3 through 5 of this test are iterative.  First, test if there are r0 outliers.  If not, then test if there are
  r0 - 1 outliers.  Continue until it is determined either that there are a certain number of outliers or that there
  are no outliers.

  STEP 3.  Test Statistic:           Rr = |y(r-1) - Xbar(r-1)| / s(r-1), where r starts at r0 and runs down through 1.

  STEP 4.  a) Critical Value:        Use Table A-6 to find λr.

  STEP 5.  a) Conclusion:            If Rr > λr, then conclude that there are r outliers.  Otherwise, return to
                                     Step 3 to test for r - 1 outliers.
4.4.6   Walsh's Test

       A nonparametric test was developed by Walsh to detect multiple outliers in a data set.
This test requires a large sample size: n > 220 for a significance level of α = 0.05, and n > 60 for
a significance level of α = 0.10. However, since the test is nonparametric, it may be used when
the data are not normally distributed. It should be noted that this test is used infrequently with
environmental data.  Directions for the test for large sample sizes are given in Box 4-21.

4.4.7   Multivariate Outliers

        Multivariate analysis, such as factor analysis and principal components analysis, involves
the analysis of several variables simultaneously.  Outliers in multivariate analysis are then values
that are extreme in relationship to either one or more variables.  As the number of variables
increases, identifying potential outliers using graphical representations becomes more difficult.
In addition, special procedures are required to test for multivariate outliers and details of these
procedures are beyond the scope of this guidance.


                            Box 4-20: An Example of Rosner's Test for Outliers

  Consider the following 32 ordered data points (in ppm): 2.07, 40.55, 84.15, 88.41, 98.84, 100.54, 115.37, 121.19,
  122.08, 125.84,  129.47, 131.90, 149.06, 163.89, 166.77, 171.91, 178.23, 181.64, 185.47, 187.64, 193.73, 199.74,
  209.43, 213.29,  223.14, 225.12, 232.72, 233.21, 239.97, 251.12, 275.36, 395.67.

  A normal probability plot excluding the suspected outliers shows that there is no reason to suspect that the data
  are not normally distributed. In addition, this graph identified four potential outliers: 2.07, 40.55, 275.36, and
  395.67.  Rosner's test at a significance level of 0.05 will be applied to see if there are r0 = 4 or fewer outliers.

  COMPUTATIONS:  The summary statistics and suspected outliers are listed in the following table:
        i       Xbar(i)      s(i)        y(i)
        0       169.923      75.133     395.67
        1       162.640      63.872       2.07
        2       167.993      57.460      40.55
        3       172.387      53.099     275.36
  STEP 1. Null Hypothesis:         H0: There are no outliers in the data.

  STEP 2. Alternative Hypothesis:   HA: There are as many as 4 outliers in the data.
  STEP 3.  Test Statistic:          R4 = |y(3) - Xbar(3)| / s(3) = |275.36 - 172.387| / 53.099 = 1.939

  STEP 4.  a) Critical Value:       Using Table A-6, λ4 = 2.89.

  STEP 5.  a) Conclusion:           Since R4 = 1.939 < 2.89 = λ4, there are not 4 outliers.  Therefore, test if there
                                    are 3 outliers by computing

  STEP 3.  Test Statistic:          R3 = |y(2) - Xbar(2)| / s(2) = |40.55 - 167.993| / 57.460 = 2.218

  STEP 4.  a) Critical Value:       Using Table A-6, λ3 = 2.91.

  STEP 5.  a) Conclusion:           Since R3 = 2.218 < 2.91 = λ3, there are not 3 outliers.  Therefore, test if there
                                    are 2 outliers by computing

  STEP 3.  Test Statistic:          R2 = |y(1) - Xbar(1)| / s(1) = |2.07 - 162.640| / 63.872 = 2.514

  STEP 4.  a) Critical Value:       Using Table A-6, λ2 = 2.92.

  STEP 5.  a) Conclusion:           Since R2 = 2.514 < 2.92 = λ2, there are not 2 outliers.  Therefore, test if there
                                    is 1 outlier by computing

  STEP 3.  Test Statistic:          R1 = |y(0) - Xbar(0)| / s(0) = |395.67 - 169.923| / 75.133 = 3.005

  STEP 4.  a) Critical Value:       Using Table A-6, λ1 = 2.94.

  STEP 5.  a) Conclusion:           Since R1 = 3.005 > 2.94 = λ1, we conclude there is one outlier, 395.67.
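A sketch of the iterative procedure of Boxes 4-19 and 4-20 follows. The λ values below are the Table A-6 entries quoted in the worked example; a general implementation would look them up by r and n.

```python
import statistics

def rosner_stats(data, r0):
    """Successively strip the value farthest from the current mean; return
    (mean, stdev, stripped value) for each of the r0 passes."""
    values = list(data)
    out = []
    for _ in range(r0):
        m, s = statistics.mean(values), statistics.stdev(values)
        y = max(values, key=lambda v: abs(v - m))
        out.append((m, s, y))
        values.remove(y)
    return out

data = [2.07, 40.55, 84.15, 88.41, 98.84, 100.54, 115.37, 121.19, 122.08,
        125.84, 129.47, 131.90, 149.06, 163.89, 166.77, 171.91, 178.23,
        181.64, 185.47, 187.64, 193.73, 199.74, 209.43, 213.29, 223.14,
        225.12, 232.72, 233.21, 239.97, 251.12, 275.36, 395.67]
passes = rosner_stats(data, 4)
lambdas = {4: 2.89, 3: 2.91, 2: 2.92, 1: 2.94}   # Table A-6 values from Box 4-20

outliers = 0
for r in range(4, 0, -1):                # test r0, then r0 - 1, ... down to 1
    m, s, y = passes[r - 1]
    if abs(y - m) / s > lambdas[r]:
        outliers = r
        break
print(outliers)  # concludes there is 1 outlier (395.67)
```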
                    Box 4-21: Directions for Walsh's Test for Large Sample Sizes

  Let X(1), X(2), . . . , X(n) represent the ordered data.  If n < 60, do not apply this test.  If 60 < n <= 220, then
  α = 0.10.  If n > 220, then α = 0.05.

  COMPUTATIONS:  Identify the number of possible outliers, r.  Compute

       c = ceiling((2n)^(1/2)),  k = r + c,  b² = 1/α,  and  a = [1 + b·((c - b²)/(c - 1))^(1/2)] / (c - b² - 1),

  where ceiling( ) indicates rounding the value to the next largest integer.

  STEP 1.   The r smallest points are outliers (with an α level of significance) if

                               X(r) - (1 + a)X(r+1) + aX(k) < 0

  STEP 2.   The r largest points are outliers (with an α level of significance) if

                               X(n+1-r) - (1 + a)X(n-r) + aX(n+1-k) > 0

  STEP 3.   If both of the inequalities are true, then both small and large outliers are indicated.
4.5    TESTS FOR DISPERSIONS

       Many statistical tests make assumptions about the dispersion (variance) of data. This
section considers some of the most commonly used statistical tests for comparing variances.
Section 4.5.1 constructs a confidence interval for a population variance.  Section 4.5.2 provides a
test for comparing two population variances. Section 4.5.3 (Bartlett's test) and Section 4.5.4
(Levene's test) describe tests that compare two or more population variances.  The analyst
should be aware that many statistical tests only require the approximate equality of variances and
that many of the tests remain valid unless there is gross inequality in the population variances.

4.5.1   Confidence Intervals for a Single Variance

       This section discusses confidence intervals for a single variance or standard deviation.
The method described in Box 4-22 can be used to find a two-sided 100(1-α)% confidence
interval. The upper end point of a two-sided 100(1-α)% confidence interval is also a
100(1-α/2)% upper confidence limit, and the lower end point is also a 100(1-α/2)% lower
confidence limit. Since the standard deviation is the square root of the variance, a confidence
interval for the variance can be converted to a confidence interval for the standard deviation by
taking the square roots of the endpoints of the interval.  The confidence interval procedure
assumes the data are a random sample from a normally distributed population and can be highly
sensitive to outliers or to departures from normality.
                   Box 4-22: Directions for Constructing Confidence Intervals and
           Confidence Limits for the Sample Variance and Standard Deviation with an Example

  Let X1, X2, . . . , Xn represent the n data points.

  COMPUTATIONS: Calculate the sample variance s² (Section 2.2.3).

  A 100(1-α)% confidence interval for the population variance is

       ( (n-1)s² / χ²(n-1, 1-α/2),  (n-1)s² / χ²(n-1, α/2) ),

  where Table A-9 is used to find χ²(n-1, 1-α/2) and χ²(n-1, α/2).  A 100(1-α)% confidence interval for the
  population standard deviation is given by the square roots of the endpoints of the interval above.

  Example: Ten samples were analyzed for lead (in ppb): 46.4, 46.1, 45.8, 47, 46.1, 45.9, 45.8, 46.9, 45.2, 46.

  COMPUTATIONS: The sample variance is s² = 0.286.

  A 95% confidence interval for the population variance is

       ( 9(0.286) / χ²(9, 0.975),  9(0.286) / χ²(9, 0.025) ) = ( 2.574/19.02, 2.574/2.70 ) = ( 0.135, 0.953 ),

  using Table A-9.
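The interval of Box 4-22 can be reproduced in a few lines; the chi-square quantiles below are the Table A-9 values for 9 degrees of freedom.

```python
import statistics

data = [46.4, 46.1, 45.8, 47, 46.1, 45.9, 45.8, 46.9, 45.2, 46]
n = len(data)
s2 = statistics.variance(data)       # sample variance, about 0.286

# Chi-square quantiles for n - 1 = 9 degrees of freedom (Table A-9)
chi2_hi, chi2_lo = 19.02, 2.70       # 0.975 and 0.025 quantiles

lower = (n - 1) * s2 / chi2_hi       # lower endpoint of 95% CI for the variance
upper = (n - 1) * s2 / chi2_lo       # upper endpoint
print(round(lower, 3), round(upper, 3))
```

Taking square roots of the two endpoints gives the corresponding confidence interval for the standard deviation.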
         Box 4-23: Directions for an F-Test to Compare Two Population Variances with an Example

  LetX|,X2,  . .  . , Xm represent the m data points from population 1 and YI, Y2,  . .  . , Yn represent the n data
  points from population 2.

  COMPUTATIONS: Calculate the sample variances sx and sy (Section 2.2.3).


  STEP1. Null Hypothesis:         H0:  a2x = ay .


  STEP 2. Alternative Hypothesis:   HA:  crx # ay.


  STEP 3. Test Statistic:            F0 = max(Fx, Fy) = max ^-,  -^- .  If F0 = Fx, then let k = m - 1 and
                                                       l.sy   sxj
                                 q = n - 1.  If Fo = Fy, then let k = n - 1 and q = m - 1.

  STEP 4. a) Critical Value:        Use Table A-10 to find F<,,, 1.^2.

  STEP4. b) p-value:             Use Table A-10 to find 2- P(FM > F0).

  STEPS, a) Conclusion:          If F0 > F^,,, 1-0/2, then reject the null hypothesis of equal population variances.

  STEPS, a) Conclusion:          If p-value < a, then reject the null hypothesis of equal population variances.

  Example: Manganese concentrations (in ppm) were collected from 2 wells.  A 0.05-level F-test will be used to
  test if the population variances are equal.

                  well X:  50, 73, 244, 202
                  well Y:  272, 171, 32, 250, 53

  COMPUTATIONS: The sample variances are s²x = 9076 and s²y = 12125.


  STEP 1. Null Hypothesis:         H0: σ²x = σ²y.

  STEP 2. Alternative Hypothesis:  HA: σ²x ≠ σ²y.

  STEP 3. Test Statistic:          F0 = max( s²x/s²y , s²y/s²x ) = max( 9076/12125 , 12125/9076 ) = 1.336.

                                   Also, k = 5 - 1 = 4 and q = 4 - 1 = 3.

  STEP 4. a) Critical Value:       Using Table A-10, F4, 3, 0.975 = 15.1.

  STEP 4. b) p-value:              Using Table A-10, p-value = 2·P(F4, 3 > 1.336), which exceeds 0.20.
                                   (Using software, the exact p-value is 0.8454.)

  STEP 5. a) Conclusion:           Since F0 = 1.336 < 15.1 = F4, 3, 0.975, we fail to reject the null
                                   hypothesis of equal population variances.

  STEP 5. b) Conclusion:           Since p-value = 0.8454 > 0.05 = α, we fail to reject the null hypothesis
                                   of equal population variances.
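The example in Box 4-23 can be sketched in Python, with SciPy's F distribution standing in for Table A-10 (variable names here are illustrative):

```python
import numpy as np
from scipy import stats

x = np.array([50.0, 73.0, 244.0, 202.0])           # well X
y = np.array([272.0, 171.0, 32.0, 250.0, 53.0])    # well Y
s2x, s2y = np.var(x, ddof=1), np.var(y, ddof=1)

# F0 is the larger of the two variance ratios; (k, q) are its degrees of freedom
if s2x >= s2y:
    f0, k, q = s2x / s2y, len(x) - 1, len(y) - 1
else:
    f0, k, q = s2y / s2x, len(y) - 1, len(x) - 1

crit = stats.f.ppf(0.975, k, q)     # two-sided test at alpha = 0.05
p_value = 2 * stats.f.sf(f0, k, q)
```

Since F0 is well below the critical value and the p-value is large, the test fails to reject equal variances, as in the box.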
                                  Box 4-24:  Directions for Bartlett's Test

  Consider k independent random samples with a sample size of ni for the ith group and let N = n1 + ··· + nk.

  COMPUTATIONS: For each of the k groups, calculate the sample variance, s²i (Section 2.2.3). Also, compute
  the pooled variance:

        s²p = [1/(N-k)]·Σ (ni - 1)·s²i,  summing over i = 1, . . . , k.

  STEP 1. Null Hypothesis:         H0: σ²1 = ··· = σ²k (the variances are equal).

  STEP 2. Alternative Hypothesis:  HA: The variances are not equal.

  STEP 3. Test Statistic:          B0 = (N - k)·ln(s²p) - Σ (ni - 1)·ln(s²i),  summing over i = 1, . . . , k.

  STEP 4. a) Critical Value:       Use Table A-9 to find χ²k-1, 1-α.

  STEP 4. b) p-value:              Use Table A-9 to find P(χ²k-1 > B0).

  STEP 5. a) Conclusion:           If B0 > χ²k-1, 1-α, then reject the null hypothesis of equal variances.

  STEP 5. b) Conclusion:           If p-value < α, then reject the null hypothesis of equal variances.
                                 Box 4-25: An Example of Bartlett's Test
  Manganese concentrations were collected from 6 wells over a 4 month period. It is important to determine if the
  variances of the six wells are equal.  Bartlett's test at a significance level of 0.05 will be used.
  Sampling Date    Well 1    Well 2    Well 3    Well 4    Well 5    Well 6
  January 1          50                  272
  February 1         73                  171                            68
  March 1           244        46         32        34        48      991
  April 1           202        77         53      3940        54       54

  ni (N = 17)         4         2          4         2         2        3
  Xi               142.25     61.50      132      1987        51      371
  s²i             9076.25    480.5     12454   7628418        18   288349

  COMPUTATIONS: The pooled variance is s²p = [1/(17-6)]·[(4-1)·9076.25 + ··· + (3-1)·288349] = 751836.84.

  STEP 1. Null Hypothesis:         H0: σ²1 = ··· = σ²6.

  STEP 2. Alternative Hypothesis:  HA: The variances are not equal.

  STEP 3. Test Statistic:          B0 = (17 - 6)·ln(751836.84) - [(4-1)·ln(9076.25) + ··· + (3-1)·ln(288349)]
                                      = 43.15

  STEP 4. a) Critical Value:       Using Table A-9, χ²5, 0.95 = 11.07.

  STEP 4. b) p-value:              Using Table A-9, p-value = P(χ²5 > 43.15) < 0.005.
                                   (Using statistical software, the exact p-value is almost 0.)

  STEP 5. a) Conclusion:           Since B0 = 43.15 > 11.07 = χ²5, 0.95, we reject the null hypothesis of equal
                                   population variances.

  STEP 5. b) Conclusion:           Since p-value ≈ 0 < 0.05 = α, we reject the null hypothesis of equal
                                   population variances.
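The arithmetic in Box 4-25 can be reproduced directly. Note that this computes the uncorrected statistic B0 from Box 4-24; the `scipy.stats.bartlett` function applies an additional small-sample correction divisor, so its statistic differs slightly from B0 (variable names here are illustrative):

```python
import numpy as np
from scipy import stats

wells = [[50, 73, 244, 202], [46, 77], [272, 171, 32, 53],
         [34, 3940], [48, 54], [68, 991, 54]]
ni = np.array([len(w) for w in wells])                    # group sizes
s2 = np.array([np.var(w, ddof=1) for w in wells])         # group sample variances
N, k = ni.sum(), len(wells)

s2p = np.sum((ni - 1) * s2) / (N - k)                     # pooled variance
b0 = (N - k) * np.log(s2p) - np.sum((ni - 1) * np.log(s2))
crit = stats.chi2.ppf(0.95, k - 1)                        # Table A-9 value, 11.07
```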
                               Box 4-26: Directions for Levene's Test

  Consider k independent random samples with a sample size of ni for the ith group and let N = n1 + ··· + nk.

  COMPUTATIONS: For each of the k groups, calculate the group mean, Xi.  Then compute the absolute
  residuals, zij = |Xij - Xi|, where Xij represents the jth value of the ith group.  For each group, calculate the
  mean absolute residual, zi = (1/ni)·Σ zij (summing over j = 1, . . . , ni).  Next, calculate the overall mean
  absolute residual, z = (1/N)·Σ ni·zi (summing over i = 1, . . . , k).

  Finally, compute the following sums of squares for the absolute residuals:

        SSTOTAL = Σi Σj z²ij - N·z²,   SSGROUPS = Σi ni·z²i - N·z²,   and SSERROR = SSTOTAL - SSGROUPS.

  STEP 1. Null Hypothesis:         H0: σ²1 = ··· = σ²k.

  STEP 2. Alternative Hypothesis:  HA: The variances are not equal.

  STEP 3. Test Statistic:          F0 = [ SSGROUPS/(k-1) ] / [ SSERROR/(N-k) ].

  STEP 4. a) Critical Value:       Use Table A-10 to find Fk-1, N-k, 1-α.

  STEP 4. b) p-value:              Use Table A-10 to find P(Fk-1, N-k > F0).

  STEP 5. a) Conclusion:           If F0 > Fk-1, N-k, 1-α, then reject the null hypothesis of equal population
                                   variances.

  STEP 5. b) Conclusion:           If p-value < α, then reject the null hypothesis of equal population variances.
4.6    TRANSFORMATIONS

       Most statistical tests and procedures contain assumptions about the data. For example,
some common assumptions are that the data are normally distributed; variance components of a
statistical model are additive; two independent data sets have equal variance; and a data set has
no trends over time or space. If the data do not satisfy such assumptions, then the results of a
statistical procedure or test may be biased or incorrect.  Fortunately, data that do not satisfy
statistical assumptions may often be converted or transformed mathematically into a form that
allows standard statistical tests to perform adequately.

       It is not recommended to transform data for estimation purposes. Transforming,
estimating, and then transforming the estimate back to the original domain will, in general, lead
to biased estimates. A better approach to estimation here is to use a nonparametric procedure.
Box 4-27: An Example of Levene's Test
Four months of data on arsenic concentration (ppm) were collected from six wells at a Superfund site. This data
set is shown in the table below. Before analyzing this data, it is important to determine if the variances of the six
wells are equal. Levene's test at a significance level of 0.05 will be used to make this determination.
COMPUTATIONS: The table below contains the data values, the absolute residuals (labeled res), the sample
means, and the absolute residual means.

           well 1          well 2          well 3          well 4          well 5          well 6
Month   value    res    value    res    value    res    value    res    value    res    value    res
  1     22.90   6.43     2.00  13.76     2.0    27.6     7.84   3.42    24.90  11.41     0.34   1.95
  2      3.09  13.38     1.25  14.51   109.4    79.8     9.30   1.96     1.30  12.19     4.78   2.49
  3     35.70  19.23     7.80   7.96     4.5    25.1    25.90  14.64     0.75  12.74     2.85   0.56
  4      4.18  12.29    52.00  36.24     2.5    27.1     2.00   9.26    27.00  13.51     1.20   1.09

        X1 = 16.47      X2 = 15.76      X3 = 29.60      X4 = 11.26      X5 = 13.49      X6 = 2.29
        z1 = 12.83      z2 = 18.12      z3 = 39.90      z4 = 7.32       z5 = 12.46      z6 = 1.52

The overall absolute residual mean is z = (12.83 + 18.12 + 39.9 + 7.32 + 12.46 + 1.52)/6 = 15.36.

The sums of squares are: SSTOTAL = 6300.89, SSWELLS = 3522.90, and SSERROR = 2777.99.

STEP 1. Null Hypothesis:         H0: σ²1 = ··· = σ²6.

STEP 2. Alternative Hypothesis:  HA: The variances are not equal.

STEP 3. Test Statistic:          F0 = [ SSWELLS/(k-1) ] / [ SSERROR/(N-k) ] = [ 3522.9/(6-1) ] / [ 2777.99/(24-6) ]
                                    = 4.56

STEP 4. a) Critical Value:       Using Table A-10, F5, 18, 0.95 = 2.77.

STEP 4. b) p-value:              Using Table A-10, 0.001 < p-value < 0.01.
                                 (Using software, P(F5, 18 > 4.56) = 0.0073.)

STEP 5. a) Conclusion:           Since F0 = 4.56 > 2.77 = F5, 18, 0.95, we reject the null hypothesis of equal
                                 population variances.

STEP 5. b) Conclusion:           Since p-value = 0.0073 < 0.05 = α, we reject the null hypothesis of equal
                                 population variances.
4.6.1   Types of Data Transformations

       Any mathematical function that is applied to every point in a data set is called a
transformation.  Some commonly used transformations include:

       Logarithmic (Log X or Ln X): This transformation is best suited for data that are right-
skewed, which may occur when the measurement follows a lognormal distribution. The
transformation is also helpful when the variance at each level of the data is proportional to the
square of the mean of the data points at that level.  For example, if the variance of data collected
around 50 ppm is approximately 250, but the variance of data collected around 100 ppm is
approximately 1000, then a logarithmic transformation may be useful.  This situation is often
characterized by having a constant coefficient of variation (standard deviation divided by the
mean) over all possible data values.

       If some of the original values are zero or negative, it is customary to add a small quantity
to make the data value non-zero since the logarithm of zero or a negative number does not exist.
The size of the small quantity depends on the magnitude of the non-zero data. As a working
rule, a value of one-tenth the smallest non-zero value could be selected.  It does not matter
whether a natural (In) or base 10 (log) transformation is used because the two transformations are
related by the expression ln(X) = 2.303 log(X).  Directions for applying a logarithmic
transformation with an example are given in Box 4-28.

       Square Root (√X): This transformation may be used when dealing with small whole
numbers, such as bacteriological counts, or the occurrence of rare events, such as violations of a
standard over the course of a year. The underlying assumption is that the original data follow a
Poisson-like distribution, in which case the mean and variance of the data are equal. It should be
noted that the square root transformation overcorrects when very small values and zeros appear
in the original data.  In these cases, a transformation such as √(X + 1) is often used instead.
                      Box 4-28: Directions for Transforming Data and an Example

  Let X1, X2, . . . , Xn represent the n data points. To apply a transformation, simply apply the transforming
  function to each data point. When a transformation is implemented to make the data satisfy some
  statistical assumption, it will need to be verified that the transformed data satisfy this assumption.

  Example: Transforming Lognormal Data

  A logarithmic transformation is particularly useful for pollution data. Pollution data are often right-skewed,
  thus the log-transformed data will tend to be symmetric. Consider the data set shown below with 15 data
  points. A histogram of the observed data (figure omitted) shows that the data are possibly lognormally
  distributed, while a histogram of the transformed data (figure omitted) shows that the transformed data
  appear to be normally distributed.

        Observed      Transformed          Observed      Transformed
           X     ->      ln[X]                X     ->      ln[X]
          0.22   ->      -1.51               0.47   ->      -0.76
          3.48   ->       1.25               0.67   ->      -0.40
          6.67   ->       1.90               0.75   ->      -0.29
          2.53   ->       0.93               0.60   ->      -0.51
          1.11   ->       0.10               0.99   ->      -0.01
          0.33   ->      -1.11               0.90   ->      -0.11
          1.64   ->       0.50               0.26   ->      -1.35
          1.37   ->       0.31
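The equivalence of the two logarithm bases (2.303 is ln(10)) and the symmetrizing effect of the transformation on this data set can be checked numerically (a sketch; the variable names are illustrative):

```python
import numpy as np
from scipy import stats

# Observed values from Box 4-28
x = np.array([0.22, 3.48, 6.67, 2.53, 1.11, 0.33, 1.64, 1.37,
              0.47, 0.67, 0.75, 0.60, 0.99, 0.90, 0.26])
ln_x = np.log(x)

# natural log and base-10 log differ only by the constant factor ln(10) = 2.303,
# so either version of the transformation leads to equivalent analyses
same = np.allclose(ln_x, np.log(10) * np.log10(x))

# the transformation pulls in the long right tail: skewness drops toward zero
skew_before, skew_after = stats.skew(x), stats.skew(ln_x)
```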
       While transformations are useful for dealing with data that do not satisfy statistical
assumptions, they can also be used for various other purposes.  For example, transformations are
useful for consolidating data that may be spread out or that have several extreme values.  In
addition, transformations can be used to derive a linear relationship between two variables on the
newly transformed data, so that linear regression analysis can be applied.  They can also be used
to efficiently estimate quantities such as the mean and variance of a lognormal distribution.
Transformations may also make the analysis of data easier by changing the scale into one that is
more familiar or easier to work with.

       Once the data have been transformed, all statistical analysis should be performed on the
transformed data. Rarely should an attempt be made to transform the data back to the original form
because this can lead to biased estimates. For example, estimating quantities such as means,
variances, confidence limits, and regression coefficients in the transformed scale typically leads
to biased estimates when transformed back into original scale. However, it may be difficult to
understand or apply results of statistical analysis expressed in the transformed scale.  Therefore,
if the transformed data do not give noticeable benefits to the analysis, it is better to use the
original data.  There is no point in working with transformed data unless it adds value to the
analysis.

4.7    VALUES BELOW DETECTION LIMITS

       Data generated from chemical analysis may fall below the detection limit (DL) of the
analytical procedure. These measurement data are generally  described as non-detects rather
than as zero or not present and the appropriate limit of detection is usually reported. In cases
where measurement data are described as non-detects, the concentration of the chemical is
unknown, although it lies somewhere between zero and the detection limit. Data sets that include
both detected and non-detected results are called censored data in the statistical literature.

       There are a variety of ways to evaluate data that include values below the detection
limit. However, there are no general procedures that are applicable in all cases. Some general
guidelines are presented in Table 4-4. Although these guidelines are usually adequate, they
should be implemented cautiously.
                Table 4-4. Guidelines for Analyzing Data with Non-Detects

    Approximate Percentage of Non-Detects   Section   Statistical Analysis Method
    < 15%                                   4.7.1     Replace non-detects with 0, DL/2, or DL;
                                                      Cohen's Method
    15% - 50%                               4.7.2     Trimmed mean, Cohen's Method,
                                                      Winsorized mean and standard deviation
    > 50% - 90%                             4.7.3     Tests for proportions (Section 3.2.1.5)
       All of the suggested procedures for analyzing data with non-detects depend upon the
amount of data below the detection limit.  For relatively small amounts below detection limit
values, replacing the non-detects with a small number and proceeding with the usual analysis
may be satisfactory depending on the purpose of the analysis.  For moderate amounts of data
below the detection limit, a more detailed adjustment is appropriate.  In situations where
relatively large amounts of data fall below the detection limit, one may need only to consider
whether or not the chemical was detected above some level. Table 4-4 provides percentages to
assist the user in evaluating their particular situation. However, it should be recognized that
these percentages are not hard and fast rules.

       In addition, sample size influences which procedures should be used to evaluate the data.
 For example, the case where 1 sample out of 4 is not detected should be treated differently from
the case where 25  samples out of 100 are non-detects. Therefore, this guidance suggests that the
data analyst consult a statistician for the most appropriate way to evaluate data containing values
below the detection level.

4.7.1  Approximately less than 15% Non-detects - Substitution Methods

       If a small proportion of the observations are non-detects, then these  may be replaced with
a small number, usually DL/2, and the usual analysis performed. Alternative substitution values
are 0 (see Aitchison's Method below) or the detection limit.  It should be noted that Cohen's
Method (Section 4.7.2.1) will also work with small amounts of non-detects.

4.7.1.1   Aitchison's Method

       Later adjustments to the mean and variance assume that the data values really were
present but could not be recorded since they were below the detection limit. However, there are
cases where the data values are below the detection limit because they are actually zero, i.e., the
contaminant or chemical of concern is entirely absent. Such data sets typically contain a
mixture of zero values and present, but non-detected, values. Aitchison's Method simply provides
adjustment formulas for the mean and variance when 0 is substituted for the non-detects.
Directions for Aitchison's method are contained in Box 4-29 with an example in Box 4-30.
              Box 4-29: Directions for Aitchison's Method to Adjust Means and Variances

 Let X1, X2, . . . , Xm, Xm+1, . . . , Xn represent the data points where the first m values are above the detection limit
 and the remaining n-m data points are below the detection limit.

 COMPUTATIONS: Using the data above the detection limit, compute the sample mean and sample variance:

       Xd = (1/m)·Σ Xi    and    s²d = [1/(m-1)]·Σ (Xi - Xd)²,  with both sums over i = 1, . . . , m.

 Compute the adjusted sample mean and sample variance:

       X = (m/n)·Xd    and    s² = [(m-1)/(n-1)]·s²d + [m·(n-m)/(n·(n-1))]·Xd².
                           Box 4-30: An Example of Aitchison's Method

 The following data consist of 10 Methylene Chloride samples: 1.9, 1.3, <1, 2.0, 1.9, <1, <1, <1, 1.6, 1.7. There are
 6 values above the detection limit and 4 below, so m = 6 and n - m = 4. Aitchison's method will be used to estimate
 the mean and sample variance of this data.

 COMPUTATIONS: Compute the mean and variance for the 6 values above the detection limit:

       Xd = 1.733 and s²d = 0.0667.

 The adjusted sample mean and sample variance are:

       X = (6/10)·1.733 = 1.04    and    s² = [(6-1)/(10-1)]·0.0667 + [6·(10-6)/(10·(10-1))]·1.7333² = 0.8382.
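Aitchison's adjustment from Box 4-29, applied to the Box 4-30 data, can be sketched as follows (variable names are illustrative):

```python
import numpy as np

detects = np.array([1.9, 1.3, 2.0, 1.9, 1.6, 1.7])  # the m = 6 detected values
n = 10                                              # total sample size (4 non-detects)
m = len(detects)

xd = detects.mean()                                 # mean of detects, about 1.733
s2d = detects.var(ddof=1)                           # variance of detects, about 0.0667

# Aitchison's adjustment treats the non-detects as true zeros
x_adj = (m / n) * xd
s2_adj = ((m - 1) / (n - 1)) * s2d + (m * (n - m)) / (n * (n - 1)) * xd ** 2
```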
4.7.2  Between Approximately 15% - 50% Non-detects

4.7.2.1   Cohen's Method

       Cohen's method provides adjusted estimates of the sample mean and standard deviation
that account for data below the detection limit. The adjusted estimates are based on the
statistical technique of maximum likelihood estimation of the mean and variance so that the non-
detects are accounted for.  Care has to be taken when using the adjusted mean and variance in
statistical tests. If the percentage of data below the detection limit is relatively small (e.g., less
than 20%), the significance level and power of the statistical test are approximately correct. As
the proportion of data below detection increases, power declines and the true significance level
increases dramatically. This is mainly attributable to the lack of independence between the
adjusted mean and adjusted variance.  If more than 50% of the observations are not detected,
Cohen's method should not be used. In addition, this method requires that the data without the
non-detects be normally distributed and that the detection limit is always the same.  Directions
for Cohen's method are contained in Box 4-31 with an example in Box 4-32.

4.7.2.2   Selecting Between Aitchison's Method and Cohen's Method

       Cohen's underlying model is that the population contains a normal distribution, but we
cannot see the values below the censoring point. Aitchison's underlying model is that the
population consists of a proportion following a normal distribution together with a proportion of
values at zero. The difference in concepts becomes relevant depending on the types of
inferences made.  For example,  in estimating upper quantiles, the analyst may use only the
normal portion for the statistics, adjusting the quantile to account for the estimated proportion at
zero.  If a confidence interval for the mean were required, a simple substitution of zero for all data
below detection would suffice.  To determine if a data set is better adjusted by Cohen's method
or Aitchison's method, a simple graphical procedure using a Normal Probability Plot
(Section 2.3.5) can be used.  Directions for this procedure are given in Box 4-34 with an example
in Box 4-35.
                             Box 4-31:  Directions for Cohen's Method

Let X1, X2, . . . , Xm, Xm+1, . . . , Xn represent the data points where the first m values are above the detection limit
(DL) and the remaining n-m data points are below the detection limit.

COMPUTATIONS:  Using the data above the detection limit, compute the sample mean and sample variance:

       Xd = (1/m)·Σ Xi    and    s²d = [1/(m-1)]·Σ (Xi - Xd)²,  with both sums over i = 1, . . . , m.

  Compute h = (n - m)/n and γ = s²d/(Xd - DL)².  Use h, γ, and Table A-11 to determine λ.  If the exact values of h
  and γ do not appear in the table, use double linear interpolation (Box 4-33) to estimate λ.

  Estimate the corrected sample mean, X, and sample variance, s²:

       X = Xd - λ·(Xd - DL)    and    s² = s²d + λ·(Xd - DL)².
                              Box 4-32:  An Example of Cohen's Method

  Sulfate concentrations (mg/L) were measured for 24 data points with 3 values falling below the detection limit of
  1450 mg/L. The 24 values are:

                 1850, 1760, <1450, 1710, 1575, 1475, 1780, 1790, 1780, <1450, 1790, 1800,
                 <1450, 1800, 1840, 1820, 1860, 1780, 1760, 1800, 1900, 1770, 1790, 1780.

  Cohen's Method will be used to adjust the sample mean and sample variance for use in a t-test to determine if the
  mean is greater than 1600 mg/L.

  COMPUTATIONS: The sample mean and sample variance of the m = 21 values above the detection limit are

       Xd = 1771.9 and s²d = 8593.69.

  The values of h and γ are: h = (24 - 21)/24 = 0.125 and γ = 8593.69/(1771.9 - 1450)² = 0.083.  Table A-11 does
  not contain the exact entries for h and γ, so double linear interpolation was used to estimate λ = 0.149839
  (see Box 4-33).

  The adjusted sample mean and sample variance are:

       X = Xd - λ·(Xd - DL) = 1771.9 - 0.149839·(1771.9 - 1450) = 1723.67

       s² = s²d + λ·(Xd - DL)² = 8593.69 + 0.149839·(1771.9 - 1450)² = 24119.95
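A sketch of the Cohen computation for this example; the Table A-11 lookup is replaced by the interpolated value λ = 0.149839 from Box 4-33, hard-coded here (variable names are illustrative):

```python
import numpy as np

dl, n = 1450.0, 24                  # detection limit and total sample size
detects = np.array([1850, 1760, 1710, 1575, 1475, 1780, 1790, 1780, 1790,
                    1800, 1800, 1840, 1820, 1860, 1780, 1760, 1800, 1900,
                    1770, 1790, 1780], dtype=float)
m = len(detects)                    # 21 values above the DL

xd, s2d = detects.mean(), detects.var(ddof=1)
h = (n - m) / n                     # 0.125
gamma = s2d / (xd - dl) ** 2        # about 0.083

lam = 0.149839                      # lambda from Table A-11 via Box 4-33
x_adj = xd - lam * (xd - dl)
s2_adj = s2d + lam * (xd - dl) ** 2
```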
                          Box 4-33: Example of Double Linear Interpolation

    The details of the double linear interpolation are provided to assist in the use of Table A-11. The desired
    value for λ corresponds to γ = 0.083 and h = 0.125 from Box 4-32. The values from Table A-11 used for
    interpolation are:

                     h = 0.10     h = 0.15
        γ = 0.05      0.11431      0.17925
        γ = 0.10      0.11804      0.18479

    We first interpolate between columns:

        y1 = 0.11431 + [(0.125 - 0.10)/(0.15 - 0.10)]·(0.17925 - 0.11431) = 0.14678

        y2 = 0.11804 + [(0.125 - 0.10)/(0.15 - 0.10)]·(0.18479 - 0.11804) = 0.151415

    Now we interpolate between the rows:

        λ = 0.14678 + [(0.083 - 0.05)/(0.10 - 0.05)]·(0.151415 - 0.14678) = 0.149839
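The interpolation above can be wrapped in a small helper; the function name and argument order are this sketch's own, not part of the guidance:

```python
def double_interp(h, g, h1, h2, g1, g2, x11, x12, x21, x22):
    """Double linear interpolation in a 2-D table: rows indexed by g (gamma),
    columns by h; xRC is the table value at row R, column C."""
    y1 = x11 + (h - h1) / (h2 - h1) * (x12 - x11)   # interpolate along row g1
    y2 = x21 + (h - h1) / (h2 - h1) * (x22 - x21)   # interpolate along row g2
    return y1 + (g - g1) / (g2 - g1) * (y2 - y1)    # interpolate between rows

# Box 4-33 values: h = 0.125, gamma = 0.083, bracketed by the four table entries
lam = double_interp(0.125, 0.083, 0.10, 0.15, 0.05, 0.10,
                    0.11431, 0.17925, 0.11804, 0.18479)
```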
            Box 4-34: Directions for Selecting Between Cohen's Method or Aitchison's Method

 Let X1, X2, . . . , Xm, . . . , Xn represent the data points where the first m values are above the detection limit (DL)
 and the remaining n-m data points are below the DL.

 STEP 1:   Use Box 2-17 to construct a Normal Probability Plot of all the data, but only plot the values
           above the detection limit.  This is called the Censored Plot.

 STEP 2:   Use Box 2-17 to construct a Normal Probability Plot of only those values above the detection limit.
           This is called the Detects Only Plot.

 STEP 3:   If the Censored Plot is more linear than the Detects Only Plot, use Cohen's Method to estimate the
           sample mean and variance. If the Detects Only Plot is more linear than the Censored Plot, then use
           Aitchison's Method to estimate the sample mean and variance.
4.7.3   Greater than Approximately 50% Non-detects - Test of Proportions

        If more than 50% of the data are below the detection limit but at least 10% of the
observations are quantified, then the best option is a test of proportions. Thus, if the parameter
of interest is a mean, consider switching the parameter of interest to some percentile greater than
the percent of data below the detection limit.  For example, if 67% of the data are below the DL,
consider switching the parameter of interest to the 75th percentile.  Then the method described in
3.2.1.5 can be applied to test a hypothesis concerning the 75th percentile. It is important to note
that the tests of proportions may not be applicable for composite samples. In this case, the data
analyst should consult a statistician before proceeding with analysis.
          Box 4-35: Example of Determining Between Cohen's Method and Aitchison's Method

 Readings of Chlorobenzene were obtained from a monitoring well:

                        <1, <1, <1, 1.2, 1.25, 1.3, 1.45, 1.35, 1.55, 1.6, 1.85, 2.1

 STEP 1:   Using the directions in Box 2-17, the following is the Censored Plot: [plot omitted]

 STEP 2:   Using the directions in Box 2-17, the following is the Detects Only Plot: [plot omitted]

 STEP 3:   Since the Censored Plot is more linear than the Detects Only Plot, Cohen's Method should be used
           to estimate the sample mean and variance.
4.7.4  Greater than Approximately 90% Non-detects

       If very few quantified values are found, a method based on the Poisson distribution may
be used as an alternative approach. However, with a large proportion of non-detects in the data,
the data analyst should consult with a statistician before proceeding with analysis.

4.7.5  Recommendations

       If the number of sample observations is small (n < 20), Cohen's and other maximum
likelihood methods can produce biased results since it is difficult to assure that the underlying
distribution is appropriate and the solutions to the likelihood equation are statistically consistent
only if the number of samples is large.  Additionally, most methods will yield estimated
parameters with large estimation variance, which reduces the power to detect important differences
from standards or between populations. While these methods can be applied to small data sets,
the user should be cautioned that they will only be effective in detecting large departures from
the null hypothesis.

       If the degree of censoring is relatively low, reasonably good estimates of means,
variances and upper percentiles can be obtained.  However,  if the rate of censoring is very high
(greater than 50%) then little can be done statistically except to focus on some upper quantile of

the contaminant distribution, or on some proportion of measurements above a certain critical
level that is at or above the censoring limit.

       When the numerical standard is at or below one of the censoring levels and a one-sample
test is used, the most useful statistical method is to test whether the proportion of a population
that is above (below) the standard is too large, or to test whether an upper quantile of the
population distribution is above the numerical standard. Table 4-5 gives some recommendations
on which statistical parameter to use when censoring is present in data sets for different sizes of
the coefficient of variation.

           Table 4-5.  Guidelines for Recommended Parameters for Different
                         Coefficients of Variation and Censoring

    Approximate Coefficient of Variation (CV)   Recommended Parameter (censoring > 50%)
    Large:  CV > 1.5                            Upper Percentile
    Medium: 0.5 < CV < 1.5                      Upper Percentile
    Small:  CV < 0.5                            Median
       When comparing two data sets with different censoring levels (i.e., different detection
limits), it is recommended that all data be censored at the highest censoring value present and a
nonparametric test such as the Wilcoxon Rank Sum Test (Section 3.3.2.1.1) used to compare the
two data sets.  There is a corresponding loss of statistical power but to a certain extent this can be
minimized through the use of large sample sizes.

4.8    INDEPENDENCE

       When data are truly independent, the correlation between data points is by definition zero
and the selected statistical tests attain the desired decision error rates (given the appropriate
assumptions have been satisfied). When correlation exists, the effectiveness of statistical tests is
diminished.  Environmental data are particularly susceptible to correlation problems because
such data are typically collected sequentially over time or under a spatial pattern.

       If observations are positively correlated over time or space, then the effective sample size
for a test tends to be smaller than the actual sample size—i.e., each additional observation does
not provide as much 'new' information because  its value is partially determined by the value of
adjacent observations.  This smaller effective sample size means that the degrees of freedom for
the test statistic is smaller, or equivalently, the test is not as powerful as originally thought. In
addition to affecting the false acceptance error rate, applying the usual tests to correlated data
tends to result in a test whose actual significance level is larger than the nominal error rate.
       When observations are correlated, the estimate of the variance in the test statistic formula
is often understated. For example, consider the mean of a series of n temporally ordered
observations. If these observations are independent, then the variance of the mean is σ²/n, where
σ² is the variance of an individual observation.  However, if the observations are not independent
and the correlation between successive observations is ρ, then the variance of the mean is

                 var(X̄) = (σ²/n)(1 + q),  where  q = (2/n) Σ_{k=1}^{n-1} (n − k) ρ^k,

which will tend to be larger than σ²/n if the correlations are positive.  If one conducts a t-test at a
certain significance level using the usual formula for the estimated variance, then the actual
significance level can be as much as double what was expected, even for low values of ρ.
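The inflation factor (1 + q) in the variance formula above is easy to compute directly. A minimal sketch in Python (NumPy assumed; the function name var_inflation is ours, not from this guidance):

```python
import numpy as np

def var_inflation(n: int, rho: float) -> float:
    """Factor (1 + q) by which var(X-bar) = sigma^2/n is inflated when
    observations k steps apart have correlation rho**k."""
    k = np.arange(1, n)
    q = (2.0 / n) * np.sum((n - k) * rho ** k)
    return 1.0 + q

# Even modest positive correlation inflates the variance of the mean:
print(var_inflation(20, 0.0))   # 1.0 (independent data)
print(var_inflation(20, 0.3))   # > 1: each new observation adds less information
```

With ρ = 0.3 and n = 20, the variance of the mean is nearly double what the usual σ²/n formula suggests, which is why the nominal significance level of a t-test understates the actual error rate.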

       One of the most effective ways to determine statistical independence is through use of the
Rank von Neumann Test. Directions for this test are given in Box 4-36 with an example in
Box 4-37.  Compared to other tests of statistical independence, the Rank von Neumann test has
been shown to be more powerful over a wide variety of cases. It is also a reasonable test when
the data follow a normal distribution.
                        Box 4-36: Directions for the Rank von Neumann Test

 Let X1, X2, . . . , Xn represent the data values collected in sequence over equally spaced periods of time.

 COMPUTATIONS: Order the data measurements from smallest to largest and assign rank ri to measurement Xi.
 If measurements are tied, then assign the average rank.

 STEP 1. Null Hypothesis:         H0: The data are independent.

 STEP 2. Alternative Hypothesis:  HA: The data are not independent.

 STEP 3. Test Statistic:          v0 = 12 Σ_{i=2}^{n} (ri − ri−1)² / [n(n² − 1)]

 STEP 4. Critical Value:          Use Table A-16 to find vn,α.

 STEP 5. Conclusion:              If v0 < vn,α, then reject the null hypothesis that the data are independent.

 NOTE:  If the Rank von Neumann ratio test indicates significant evidence of dependence in the data, then a
 statistician should be consulted before further analysis is performed. If ranks are tied, the power of the statistical
 test is diminished.
                          Box 4-37: An Example of the Rank von Neumann Test

 The following are hourly readings from a discharge monitor.

 COMPUTATIONS:

 Time           12:00 13:00 14:00  15:00  16:00 17:00  18:00  19:00 20:00  21:00  22:00 23:00 24:00
 Reading         6.5   6.6    6.7   6.4   6.3   6.4   6.2   6.2   6.3   6.6   6.8   6.9    7.0
 Rank            7    8.5     10   5.5   3.5   5.5   1.5   1.5   3.5   8.5    11    12    13

 STEP 1.  Null Hypothesis:         H0: The data are independent.

 STEP 2.  Alternative Hypothesis:  HA: The data are not independent.

 STEP 3.  Test Statistic:          v0 = 12{(8.5 − 7)² + (10 − 8.5)² + ··· + (13 − 12)²} / [13(13² − 1)] = 0.473

 STEP 4.  Critical Value:          Using Table A-16, v13,0.05 = 1.14.

 STEP 5.  Conclusion:              Since v0 = 0.473 < 1.14 = v13,0.05, we reject the null hypothesis that the data
                                   are independent.
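The rank von Neumann statistic of Boxes 4-36 and 4-37 is straightforward to compute. A minimal Python sketch (SciPy assumed; rankdata assigns average ranks to ties, matching the directions in Box 4-36, and the function name is ours), checked against the Box 4-37 readings:

```python
import numpy as np
from scipy import stats

def rank_von_neumann(x):
    """Rank von Neumann ratio v0 = 12 * sum((r_i - r_{i-1})^2) / (n(n^2 - 1)),
    where r_i are the (mid)ranks of the time-ordered data."""
    r = stats.rankdata(x)                      # average ranks for ties
    n = len(r)
    return 12.0 * np.sum(np.diff(r) ** 2) / (n * (n ** 2 - 1))

# Hourly discharge readings from Box 4-37:
readings = [6.5, 6.6, 6.7, 6.4, 6.3, 6.4, 6.2, 6.2, 6.3, 6.6, 6.8, 6.9, 7.0]
print(round(rank_von_neumann(readings), 3))   # 0.473, below the critical value 1.14
```

Small values of v0 arise when successive ranks are close together (a trend or slow drift), which is why values below the tabulated critical point signal dependence.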
                                          CHAPTER 5
                   STEP 5: DRAW CONCLUSIONS FROM THE DATA
THE DATA QUALITY ASSESSMENT PROCESS
          Review DQOs and Sampling Design
          Conduct Preliminary Data Review
          Select the Statistical Method
          Verify the Assumptions
          Draw Conclusions from the Data

  DRAW CONCLUSIONS FROM THE DATA

Purpose

Perform the statistical procedure and interpret the
results in the context of the data user's objectives.


Activities

• Perform the Statistical Procedure
• Draw Study Conclusions
• Evaluate Performance of the Sampling Design


Tools

• Issues in hypothesis testing related to
 communicating the results of the DQA
                               Step 5: Draw Conclusions from the Data

        •   Perform the calculations for the statistical method.
                Perform the calculations and document them clearly.
                If anomalies or outliers are present in the data set, perform the calculations with
                and without the questionable data.

        •   Evaluate the results and draw conclusions.
                If the null hypothesis is rejected, then draw the conclusions and document the
                analysis.
                If the null hypothesis is not rejected, verify whether the tolerable limits on false
                acceptance decision errors have been  satisfied.  If so, draw conclusions and
                document the analysis; if not, determine corrective actions, if any.
                Interpret the results of the test or confidence interval.

        •   Evaluate the performance of the sampling design if the design is to be used again.
                Evaluate the statistical power of the design over the full range of parameter values;
                consult a statistician as necessary.
                                    List of Boxes

                                                                              Page
Box 5-1:  Checking Adequacy of Sample Size for a One-Sample t-Test for Simple
         Random Sampling	142
Box 5-2:  Example of Power Calculations for the One-Sample Test of a Single Proportion	144
                                      CHAPTER 5

                  STEP 5:  DRAW CONCLUSIONS FROM THE DATA

5.1    OVERVIEW AND ACTIVITIES

       In this final step of the DQA, the analyst performs the statistical hypothesis test or
computes the confidence interval and draws conclusions that address the data user's objectives.

5.2    PERFORM THE STATISTICAL METHOD

       The goal of this activity is to conduct the statistical hypothesis test or compute the
confidence interval procedure chosen in Chapter 3.  The calculations for the procedure should be
clearly documented and easily verifiable.  In addition, documentation of the results should be
understandable so they can be communicated effectively to those who may hold a stake in the
resulting decision.  If computer software is used to perform the calculations, then the procedures
should be adequately documented.

5.3    DRAW STUDY CONCLUSIONS

5.3.1   Hypothesis Tests

       The goal of this activity is to translate the results of the statistical hypothesis test so that
the data user may draw a conclusion from the data. The result is either:

       (a)     reject the null hypothesis, in which case there is significant evidence in favor of
              the alternative hypothesis.  The decision can be made with sufficient confidence
              and without further analysis.  This is because the statistical tests described in this
              document inherently control the false rejection error rate within the data user's
              tolerable limits when the underlying assumptions are valid.

       (b)    fail to reject the null hypothesis, in which case there is not significant evidence for
              the alternative hypothesis.  The analyst is concerned about a possible false
              acceptance error.  The most thorough procedure for verifying whether the false
              acceptance error limits have been satisfied is to compute the estimated power of
              the statistical test.

       Alternatively, the sample size required to satisfy the data user's objectives can be
calculated retrospectively using an estimate of the variance or an upper confidence limit on
variance obtained from the actual data.  If this theoretical sample size is less than or equal to the
number of samples actually taken, then the test is probably sufficiently powerful. The equations
required to perform these calculations have been provided in the instructions for many of the
hypothesis test procedures in Chapter 3. An example of this method is contained in Box 5-1, but
it is emphasized that this only gives an estimate of power, not an absolute determination.
     Box 5-1:  Checking Adequacy of Sample Size for a One-Sample t-Test for Simple Random Sampling

  In Box 3-3, the one-sample t-test was used to test the hypothesis H0: μ ≤ 95 ppm vs. HA: μ > 95 ppm. DQOs
  specified that the test should limit the false rejection error rate to 5% and the false acceptance error rate to 20% if
  the true mean was 105 ppm.

  A random sample of size n = 9 had sample mean X̄ = 99.38 ppm and sample standard deviation s = 10.41
  ppm. The null hypothesis was not rejected. Assuming that the true value of the standard deviation was equal to
  the sample estimate of 10.41 ppm, it was found that a sample size of 9 would be required.  This validated the
  sample size of 9 which had actually been used.

  In such a case it makes sense to build in some conservatism, for example, by using an upper 90% confidence
  limit for σ in the sample size calculation of Box 3-3.  Using Box 4-22, it is found that an upper 90% confidence
  limit for the true standard deviation is

                 √[(n − 1)s² / χ²(0.10, n − 1)] = √[8(10.41)² / 3.49] = 15.76,

  where χ²(0.10, 8) = 3.49 is the lower 10th percentile of the chi-square distribution with 8 degrees of freedom.

  Using this value for s in the sample size calculation of Box 3-3 leads to the sample size estimate of 17. Hence, a
  sample size of at least 17 should be used to be 90% sure of achieving the DQOs.

  Since it is generally desirable to avoid the need for additional sampling, it is advisable to conservatively estimate
  sample size in the first place. In cases where DQOs depend on a variance estimate, this conservatism is
  achieved by intentionally overestimating the variance.
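The arithmetic in Box 5-1 can be reproduced with standard distribution functions. A sketch in Python with SciPy, under our reading of the Box 4-22 confidence limit and the Box 3-3 sample-size formula n = s²(z1−α + z1−β)²/δ² + z²1−α/2:

```python
import math
from scipy import stats

# Upper 90% confidence limit on sigma: sqrt((n-1)s^2 / chi2(0.10, n-1))
n, s = 9, 10.41
ucl_sigma = math.sqrt((n - 1) * s**2 / stats.chi2.ppf(0.10, n - 1))
print(round(ucl_sigma, 2))  # 15.76

# Retrospective sample size with the conservative sigma estimate:
alpha, beta, delta = 0.05, 0.20, 10.0   # error rates; mean shift of 105 - 95 ppm
z_a, z_b = stats.norm.ppf(1 - alpha), stats.norm.ppf(1 - beta)
m = ucl_sigma**2 * (z_a + z_b)**2 / delta**2 + 0.5 * z_a**2
print(math.ceil(m))  # 17
```

Rounding up to the next whole sample gives the 17 observations quoted in the box, versus the 9 actually collected, which is why additional conservatism at the planning stage is advisable.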
5.3.2  Confidence Intervals or Limits

       A confidence interval is simply an interval estimate for the population parameter of
interest.  The interval's width is dependent upon the variance of the point estimate, the sample
size, and the confidence level. More specifically, the width is large if the variance is large, the
sample size is small, or the confidence level is large.

       The interpretation of a confidence interval makes use of probability in an intuitive sense.
When a confidence interval has been constructed using the data, there is still a chance that the
interval does not include the true value of the parameter estimated.  For example, consider this
confidence interval statement: "the 95% confidence interval for the unknown population mean is
43.5 to 48.9". It is interpreted as, "I can be 95% certain that the interval 43.5 to 48.9 captures the
unknown mean." Notice how there is a 5% chance that the interval does not capture the mean.

       The confidence level is the 'confidence' we have that the population parameter lies
within the interval.  This concept is analogous to the false  rejection error rate. The width of the
interval is related to statistical power, or the false acceptance error rate. Rather than specifying a
desired false acceptance error rate, the desired interval width can be specified.

       A confidence interval can be used to make decisions, and in some situations a test of
hypothesis can be set up as a confidence interval.  Confidence intervals are analogous to two-sided
hypothesis tests. If the threshold value lies outside of the interval, then there is evidence that the
population parameter differs from the threshold value.  In  a similar manner, confidence limits
can also be related to one-sided hypothesis tests. If the threshold value lies above (below) an

upper (lower) confidence bound, then there is evidence that the population parameter is less
(greater) than the threshold.
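As a sketch of this use of a confidence interval, the Python code below (SciPy assumed; the data values are hypothetical, chosen only for illustration) computes a two-sided 95% t interval for a mean and compares it with a threshold:

```python
import numpy as np
from scipy import stats

def mean_ci(x, confidence=0.95):
    """Two-sided t confidence interval for the population mean."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    half = stats.t.ppf(1 - (1 - confidence) / 2, n - 1) * x.std(ddof=1) / np.sqrt(n)
    return x.mean() - half, x.mean() + half

data = [46.1, 45.2, 47.8, 44.0, 46.5, 45.9, 47.1, 44.8]   # hypothetical measurements
lo, hi = mean_ci(data)
# A threshold inside (lo, hi) is consistent with the mean at the 5% level;
# a threshold outside it is evidence that the mean differs from the threshold.
print(lo < 45.0 < hi)
```

The interval here is roughly (44.9, 47.0), so a threshold of 45.0 could not be distinguished from the true mean with these data, illustrating the two-sided-test analogy.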

5.3.3  Tolerance Intervals or Limits

       A tolerance interval is an interval estimate for a certain proportion of the population. The
interval's width is dependent upon the variance of the population, the sample size, the desired
proportion of the population, and the confidence level. More specifically, the width is large if
the variance is large, the sample size is small, the proportion is large, or the confidence level is
large.

       When a tolerance interval has been constructed using the data, there is still a chance that
the interval does not include the desired proportion of the population. For example, consider this
tolerance interval statement: "the 99% tolerance interval for 90% of the population is 7.5 to 9.9".
 It is interpreted as, "I can be 99% certain that the interval 7.5 to 9.9 captures 90% of the
population."  Notice how there is a 1% chance that the interval does not capture at least the
desired proportion.

       The confidence level is the 'confidence' we have that the desired proportion of the
population lies within the interval.  This concept is analogous to the false rejection error rate.
The width of the interval is related to statistical power, or the false acceptance error rate. Rather
than specifying a desired false acceptance error rate, the desired interval width can be specified.

       A tolerance interval can be used to make decisions, and in some situations a test of
hypothesis can be set up as a tolerance interval. Tolerance intervals are analogous to two-sided
hypothesis tests. If the threshold value lies  outside of the interval, then there is  evidence that the
desired proportion of the population differs from the threshold value.  In a similar manner,
tolerance limits can also be related to one-sided hypothesis tests.  If the threshold value lies
above (below) an upper (lower) tolerance limit, then there is evidence that the desired proportion
of the population is less (greater) than the threshold.
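Tolerance intervals for normally distributed data are usually built as X̄ ± k·s with a tabulated factor k. The sketch below uses Howe's approximation for the two-sided normal tolerance factor, an assumption on our part rather than a procedure given in this guidance, to show how the width grows with the coverage proportion and the confidence level:

```python
import math
from scipy import stats

def tolerance_factor(n, coverage=0.90, confidence=0.95):
    """Approximate two-sided normal tolerance factor k (Howe's method):
    mean +/- k*s covers `coverage` of the population with the stated confidence."""
    df = n - 1
    z = stats.norm.ppf((1 + coverage) / 2)          # grows with the coverage proportion
    chi2 = stats.chi2.ppf(1 - confidence, df)       # shrinks as confidence grows
    return math.sqrt(df * (1 + 1 / n) * z**2 / chi2)

k = tolerance_factor(20, coverage=0.90, confidence=0.95)
print(round(k, 2))  # about 2.31 for n = 20
```

The factor increases if the coverage proportion is raised, the confidence level is raised, or the sample size is reduced, mirroring the width behavior described above.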

5.4    EVALUATE PERFORMANCE OF THE SAMPLING DESIGN

       If the sampling design is to be used again, either in a later phase of the current study or in
a similar study, the analyst will  be interested in evaluating the overall performance of the design.
 To evaluate the  sampling design, the analyst performs a statistical power analysis that describes
the estimated power  of the statistical test over the range of possible parameter values.  The
estimated power is computed for all parameter values under the alternative hypothesis to create a
power curve. A power analysis helps the analyst evaluate the adequacy of the sampling design
when the true parameter value lies in the vicinity of the action level. In this manner, the analyst
may determine how well a statistical test performed and compare this performance with that of
other tests. Box 5-2  illustrates power calculations for a test of a single proportion.
          Box 5-2: Example of Power Calculations for the One-Sample Test of a Single Proportion

  This box illustrates power calculations for the test of H0: P ≥ 0.20 vs. HA: P < 0.20, with a false rejection error
  rate of 5% when P = 0.20, presented in Box 3-13.  The power of the test will be calculated assuming that the true
  proportion is P1 = 0.15 and before data are available. Since nP0 and n(1 − P0) both exceed 5 for the sample size
  of n = 85, the normal approximation may be used.

  STEP 1:  Determine the general conditions for rejection of the null hypothesis.  In this case, the null hypothesis
           is rejected if the sample proportion is sufficiently smaller than P0.  Using Box 3-11, H0 is rejected if

                (p − P0) / √[P0(1 − P0)/n] ≤ −z1−α,  or equivalently  p ≤ P0 − z1−α √[P0(1 − P0)/n],

           where p is the sample proportion and z1−α is the standard normal critical value.

  STEP 2:  Determine the specific conditions for rejection of the null hypothesis if P1 is the true value of the
           proportion P. Using the equations above, rejection occurs if

                (p − P1) / √[P1(1 − P1)/n] ≤ [0.20 − 0.15 − 1.645 √(0.20 · 0.80/85)] / √(0.15 · 0.85/85) = −0.55.

  STEP 3:  Find the probability of rejection if P1 is the true proportion.  The quantity on the left-hand side of the
           above inequality is a standard normal variable.  Hence the power at P1 = 0.15 is the probability that a
           standard normal variable is less than −0.55.  Using Table A-1, this probability is approximately 0.3,
           which is fairly small.
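The power calculation of Box 5-2 can be sketched in Python (SciPy assumed; the function name is ours, and the continuity correction is omitted to match the box's arithmetic):

```python
from scipy import stats

def power_prop_lower(p0, p1, n, alpha=0.05):
    """Power of the normal-approximation test of H0: P >= p0 vs HA: P < p0
    when the true proportion is p1 (no continuity correction, as in Box 5-2)."""
    z = stats.norm.ppf(1 - alpha)
    se0 = (p0 * (1 - p0) / n) ** 0.5   # standard error under H0
    se1 = (p1 * (1 - p1) / n) ** 0.5   # standard error under the alternative
    return stats.norm.cdf((p0 - p1 - z * se0) / se1)

print(round(power_prop_lower(0.20, 0.15, 85), 2))  # about 0.29: fairly low power
```

Evaluating this function over a grid of alternative values of P traces out the power curve used in Section 5.4 to judge the adequacy of a sampling design.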
5.5     INTERPRET AND COMMUNICATE THE RESULTS

        At this point, the analyst has performed the applicable statistical procedure and has drawn
conclusions.  In many cases, the conclusions are straightforward and convincing so they lead to
an unambiguous path forward for the project. In other cases, however, it is advantageous to
consider these conclusions in a broader context in order to determine a course of action; see Data
Quality Assessment: A Reviewer's Guide (EPA QA/G-9R) (U.S. EPA 2004).
                                APPENDIX A:




                            STATISTICAL TABLES
                            List of Tables

                                                                Page
TABLE A-1.  STANDARD NORMAL DISTRIBUTION ................................ 147
TABLE A-2.  CRITICAL VALUES OF STUDENT'S t DISTRIBUTION ................. 149
TABLE A-3.  CRITICAL VALUES FOR THE STUDENTIZED RANGE TEST .............. 150
TABLE A-4.  CRITICAL VALUES FOR THE EXTREME VALUE TEST
            (DIXON'S TEST) .............................................. 151
TABLE A-5.  CRITICAL VALUES FOR DISCORDANCE TEST ........................ 152
TABLE A-6.  APPROXIMATE CRITICAL VALUES λr FOR ROSNER'S TEST ............ 153
TABLE A-7.  QUANTILES OF THE WILCOXON SIGNED RANKS TEST ................. 155
TABLE A-8.  CRITICAL VALUES FOR THE WILCOXON RANK-SUM TEST .............. 156
TABLE A-9.  PERCENTILES OF THE CHI-SQUARE DISTRIBUTION .................. 159
TABLE A-10. PERCENTILES OF THE F-DISTRIBUTION ........................... 160
TABLE A-11. VALUES OF THE PARAMETER λ FOR COHEN'S ESTIMATES
            ADJUSTING FOR NONDETECTED VALUES ............................ 164
TABLE A-12a. CRITICAL VALUES FOR THE MANN-KENDALL TEST FOR TREND ........ 165
TABLE A-12b. PROBABILITIES FOR THE SMALL-SAMPLE MANN-KENDALL
            TEST FOR TREND .............................................. 165
TABLE A-13. QUANTILES FOR THE WALD-WOLFOWITZ TEST FOR RUNS .............. 166
TABLE A-14. CRITICAL VALUES FOR THE SLIPPAGE TEST ....................... 170
TABLE A-15. DUNNETT'S TEST (ONE TAILED) ................................. 173
TABLE A-16. APPROXIMATE α-LEVEL CRITICAL POINTS FOR RANK VON
            NEUMANN RATIO TEST .......................................... 175
TABLE A-17. VALUES FOR COMPUTING A ONE-SIDED CONFIDENCE LIMIT ON
            A LOGNORMAL MEAN ............................................ 176
TABLE A-18. CRITICAL VALUES FOR THE SIGN TEST ........................... 178
TABLE A-19. CRITICAL VALUES FOR THE QUANTILE TEST ....................... 179
                 TABLE A-1. STANDARD NORMAL DISTRIBUTION
Table values are P(Z ≤ zp) = p.

  zp     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
 -0.0   .5000  .4960  .4920  .4880  .4840  .4801  .4761  .4721  .4681  .4641
 -0.1   .4602  .4562  .4522  .4483  .4443  .4404  .4364  .4325  .4286  .4247
 -0.2   .4207  .4168  .4129  .4090  .4052  .4013  .3974  .3936  .3897  .3859
 -0.3   .3821  .3783  .3745  .3707  .3669  .3632  .3594  .3557  .3520  .3483
 -0.4   .3446  .3409  .3372  .3336  .3300  .3264  .3228  .3192  .3156  .3121
 -0.5   .3085  .3050  .3015  .2981  .2946  .2912  .2877  .2843  .2810  .2776
 -0.6   .2743  .2709  .2676  .2643  .2611  .2578  .2546  .2514  .2483  .2451
 -0.7   .2420  .2389  .2358  .2327  .2296  .2266  .2236  .2206  .2177  .2148
 -0.8   .2119  .2090  .2061  .2033  .2005  .1977  .1949  .1922  .1894  .1867
 -0.9   .1841  .1814  .1788  .1762  .1736  .1711  .1685  .1660  .1635  .1611
 -1.0   .1587  .1562  .1539  .1515  .1492  .1469  .1446  .1423  .1401  .1379
 -1.1   .1357  .1335  .1314  .1292  .1271  .1251  .1230  .1210  .1190  .1170
 -1.2   .1151  .1131  .1112  .1093  .1075  .1056  .1038  .1020  .1003  .0985
 -1.3   .0968  .0951  .0934  .0918  .0901  .0885  .0869  .0853  .0838  .0823
 -1.4   .0808  .0793  .0778  .0764  .0749  .0735  .0721  .0708  .0694  .0681
 -1.5   .0668  .0655  .0643  .0630  .0618  .0606  .0594  .0582  .0571  .0559
 -1.6   .0548  .0537  .0526  .0516  .0505  .0495  .0485  .0475  .0465  .0455
 -1.7   .0446  .0436  .0427  .0418  .0409  .0401  .0392  .0384  .0375  .0367
 -1.8   .0359  .0351  .0344  .0336  .0329  .0322  .0314  .0307  .0301  .0294
 -1.9   .0287  .0281  .0274  .0268  .0262  .0256  .0250  .0244  .0239  .0233
 -2.0   .0228  .0222  .0217  .0212  .0207  .0202  .0197  .0192  .0188  .0183
 -2.1   .0179  .0174  .0170  .0166  .0162  .0158  .0154  .0150  .0146  .0143
 -2.2   .0139  .0136  .0132  .0129  .0125  .0122  .0119  .0116  .0113  .0110
 -2.3   .0107  .0104  .0102  .0099  .0096  .0094  .0091  .0089  .0087  .0084
 -2.4   .0082  .0080  .0078  .0075  .0073  .0071  .0069  .0068  .0066  .0064
 -2.5   .0062  .0060  .0059  .0057  .0055  .0054  .0052  .0051  .0049  .0048
 -2.6   .0047  .0045  .0044  .0043  .0041  .0040  .0039  .0038  .0037  .0036
 -2.7   .0035  .0034  .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026
 -2.8   .0026  .0025  .0024  .0023  .0023  .0022  .0021  .0021  .0020  .0019
 -2.9   .0019  .0018  .0018  .0017  .0016  .0016  .0015  .0015  .0014  .0014
 -3.0   .0013  .0013  .0013  .0012  .0012  .0011  .0011  .0011  .0010  .0010
 -3.1   .0010  .0009  .0009  .0009  .0008  .0008  .0008  .0008  .0007  .0007
 -3.2   .0007  .0007  .0006  .0006  .0006  .0006  .0006  .0005  .0005  .0005
 -3.3   .0005  .0005  .0005  .0004  .0004  .0004  .0004  .0004  .0004  .0003
 -3.4   .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0002
            TABLE A-1. STANDARD NORMAL DISTRIBUTION (CONT.)
Table values are P(Z ≤ zp) = p.

  zp     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
  0.0   .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
  0.1   .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
  0.2   .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
  0.3   .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
  0.4   .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
  0.5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
  0.6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
  0.7   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
  0.8   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
  0.9   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
  1.0   .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
  1.1   .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
  1.2   .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
  1.3   .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
  1.4   .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
  1.5   .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
  1.6   .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
  1.7   .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
  1.8   .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
  1.9   .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
  2.0   .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
  2.1   .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
  2.2   .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
  2.3   .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
  2.4   .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
  2.5   .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
  2.6   .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
  2.7   .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
  2.8   .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
  2.9   .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
  3.0   .9987  .9987  .9987  .9988  .9988  .9989  .9989  .9989  .9990  .9990
  3.1   .9990  .9991  .9991  .9991  .9992  .9992  .9992  .9992  .9993  .9993
  3.2   .9993  .9993  .9994  .9994  .9994  .9994  .9994  .9995  .9995  .9995
  3.3   .9995  .9995  .9995  .9996  .9996  .9996  .9996  .9996  .9996  .9997
  3.4   .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9998
          TABLE A-2. CRITICAL VALUES OF STUDENT'S t DISTRIBUTION

Degrees of                                1 − α
 Freedom   0.70   0.75   0.80   0.85   0.90   0.95   0.975    0.99   0.995
     1    0.727  1.000  1.376  1.963  3.078  6.314  12.706  31.821  63.657
     2    0.617  0.816  1.061  1.386  1.886  2.920   4.303   6.965   9.925
     3    0.584  0.765  0.978  1.250  1.638  2.353   3.182   4.541   5.841
     4    0.569  0.741  0.941  1.190  1.533  2.132   2.776   3.747   4.604
     5    0.559  0.727  0.920  1.156  1.476  2.015   2.571   3.365   4.032
     6    0.553  0.718  0.906  1.134  1.440  1.943   2.447   3.143   3.707
     7    0.549  0.711  0.896  1.119  1.415  1.895   2.365   2.998   3.499
     8    0.546  0.706  0.889  1.108  1.397  1.860   2.306   2.896   3.355
     9    0.543  0.703  0.883  1.100  1.383  1.833   2.262   2.821   3.250
    10    0.542  0.700  0.879  1.093  1.372  1.812   2.228   2.764   3.169
    11    0.540  0.697  0.876  1.088  1.363  1.796   2.201   2.718   3.106
    12    0.539  0.695  0.873  1.083  1.356  1.782   2.179   2.681   3.055
    13    0.538  0.694  0.870  1.079  1.350  1.771   2.160   2.650   3.012
    14    0.537  0.692  0.868  1.076  1.345  1.761   2.145   2.624   2.977
    15    0.536  0.691  0.866  1.074  1.341  1.753   2.131   2.602   2.947
    16    0.535  0.690  0.865  1.071  1.337  1.746   2.120   2.583   2.921
    17    0.534  0.689  0.863  1.069  1.333  1.740   2.110   2.567   2.898
    18    0.534  0.688  0.862  1.067  1.330  1.734   2.101   2.552   2.878
    19    0.533  0.688  0.861  1.066  1.328  1.729   2.093   2.539   2.861
    20    0.533  0.687  0.860  1.064  1.325  1.725   2.086   2.528   2.845
    21    0.532  0.686  0.859  1.063  1.323  1.721   2.080   2.518   2.831
    22    0.532  0.686  0.858  1.061  1.321  1.717   2.074   2.508   2.819
    23    0.532  0.685  0.858  1.060  1.319  1.714   2.069   2.500   2.807
    24    0.531  0.685  0.857  1.059  1.318  1.711   2.064   2.492   2.797
    25    0.531  0.684  0.856  1.058  1.316  1.708   2.060   2.485   2.787
    26    0.531  0.684  0.856  1.058  1.315  1.706   2.056   2.479   2.779
    27    0.531  0.684  0.855  1.057  1.314  1.703   2.052   2.473   2.771
    28    0.530  0.683  0.855  1.056  1.313  1.701   2.048   2.467   2.763
    29    0.530  0.683  0.854  1.055  1.311  1.699   2.045   2.462   2.756
    30    0.530  0.683  0.854  1.055  1.310  1.697   2.042   2.457   2.750
    40    0.529  0.681  0.851  1.050  1.303  1.684   2.021   2.423   2.704
    60    0.527  0.679  0.848  1.046  1.296  1.671   2.000   2.390   2.660
   120    0.526  0.677  0.845  1.041  1.289  1.658   1.980   2.358   2.617
    ∞     0.524  0.674  0.842  1.036  1.282  1.645   1.960   2.326   2.576

    Note: The last row of the table (∞ degrees of freedom) gives the critical values for a standard normal distribution (Z),
         e.g., t(∞, 0.95) = z(0.95) = 1.645.
     TABLE A-3. CRITICAL VALUES FOR THE STUDENTIZED RANGE TEST

                  Level of Significance α
            0.10          0.05          0.01
   n      a     b       a     b       a     b
   3    1.78  2.00    1.76  2.00    1.74  2.00
   4    2.04  2.41    1.98  2.43    1.87  2.45
   5    2.22  2.71    2.15  2.75    2.02  2.80
   6    2.37  2.95    2.28  3.01    2.15  3.10
   7    2.49  3.14    2.40  3.22    2.26  3.34
   8    2.59  3.31    2.50  3.40    2.35  3.54
   9    2.68  3.45    2.59  3.55    2.44  3.72
  10    2.76  3.57    2.67  3.69    2.51  3.88
  11    2.84  3.68    2.74  3.80    2.58  4.01
  12    2.90  3.78    2.80  3.91    2.64  4.13
  13    2.96  3.87    2.86  4.00    2.70  4.24
  14    3.02  3.95    2.92  4.09    2.75  4.34
  15    3.07  4.02    2.97  4.17    2.80  4.44
  16    3.12  4.09    3.01  4.24    2.84  4.52
  17    3.17  4.15    3.06  4.31    2.88  4.60
  18    3.21  4.21    3.10  4.37    2.92  4.67
  19    3.25  4.27    3.14  4.43    2.96  4.74
  20    3.29  4.32    3.18  4.49    2.99  4.80
  25    3.45  4.53    3.34  4.71    3.15  5.06
  30    3.59  4.70    3.47  4.89    3.27  5.26
  35    3.70  4.84    3.58  5.04    3.38  5.42
  40    3.79  4.96    3.67  5.16    3.47  5.56
  45    3.88  5.06    3.75  5.26    3.55  5.67
  50    3.95  5.14    3.83  5.35    3.62  5.77
  55    4.02  5.22    3.90  5.43    3.69  5.86
  60    4.08  5.29    3.96  5.51    3.75  5.94
  65    4.14  5.35    4.01  5.57    3.80  6.01
  70    4.19  5.41    4.06  5.63    3.85  6.07
  75    4.24  5.46    4.11  5.68    3.90  6.13
  80    4.28  5.51    4.16  5.73    3.94  6.18
  85    4.33  5.56    4.20  5.78    3.99  6.23
  90    4.36  5.60    4.24  5.82    4.02  6.27
  95    4.40  5.64    4.27  5.86    4.06  6.32
 100    4.44  5.68    4.31  5.90    4.10  6.36
 150    4.72  5.96    4.59  6.18    4.38  6.64
 200    4.90  6.15    4.78  6.39    4.59  6.84
 500    5.49  6.72    5.47  6.94    5.13  7.42
1000    5.92  7.11    5.79  7.33    5.57  7.80
        TABLE A-4. CRITICAL VALUES FOR THE EXTREME VALUE TEST
                               (DIXON'S TEST)

         Level of Significance α
   n     0.10    0.05    0.01
   3    0.886   0.941   0.988
   4    0.679   0.765   0.889
   5    0.557   0.642   0.780
   6    0.482   0.560   0.698
   7    0.434   0.507   0.637
   8    0.479   0.554   0.683
   9    0.441   0.512   0.635
  10    0.409   0.477   0.597
  11    0.517   0.576   0.679
  12    0.490   0.546   0.642
  13    0.467   0.521   0.615
  14    0.492   0.546   0.641
  15    0.472   0.525   0.616
  16    0.454   0.507   0.595
  17    0.438   0.490   0.577
  18    0.424   0.475   0.561
  19    0.412   0.462   0.547
  20    0.401   0.450   0.535
  21    0.391   0.440   0.524
  22    0.382   0.430   0.514
  23    0.374   0.421   0.505
  24    0.367   0.413   0.497
  25    0.360   0.406   0.489
            TABLE A-5. CRITICAL VALUES FOR DISCORDANCE TEST

         Level of Significance α
   n     0.01     0.05
   3    1.155    1.153
   4    1.492    1.463
   5    1.749    1.672
   6    1.944    1.822
   7    2.097    1.938
   8    2.221    2.032
   9    2.323    2.110
  10    2.410    2.176
  11    2.485    2.234
  12    2.550    2.285
  13    2.607    2.331
  14    2.659    2.371
  15    2.705    2.409
  16    2.747    2.443
  17    2.785    2.475
  18    2.821    2.504
  19    2.854    2.532
  20    2.884    2.557
  21    2.912    2.580
  22    2.939    2.603
  23    2.963    2.624
  24    2.987    2.644
  25    3.009    2.663
  26    3.029    2.681
  27    3.049    2.698
  28    3.068    2.714
  29    3.085    2.730
  30    3.103    2.745
  31    3.119    2.759
  32    3.135    2.773
  33    3.150    2.786
  34    3.164    2.799
  35    3.178    2.811
  36    3.191    2.823
  37    3.204    2.835
  38    3.216    2.846
  39    3.228    2.857
  40    3.240    2.866
  41    3.251    2.877
  42    3.261    2.887
  43    3.271    2.896
  44    3.282    2.905
  45    3.292    2.914
  46    3.302    2.923
  47    3.310    2.931
  48    3.319    2.940
  49    3.329    2.948
  50    3.336    2.956
    TABLE A-6. APPROXIMATE CRITICAL VALUES λr FOR ROSNER'S TEST

   n     α      r=1    r=2    r=3    r=4    r=5    r=10
  25   0.05    2.82   2.80   2.78   2.76   2.73   2.59
       0.01    3.14   3.11   3.09   3.06   3.03   2.85
  26   0.05    2.84   2.82   2.80   2.78   2.76   2.62
       0.01    3.16   3.14   3.11   3.09   3.06   2.89
  27   0.05    2.86   2.84   2.82   2.80   2.78   2.65
       0.01    3.18   3.16   3.14   3.11   3.09   2.93
  28   0.05    2.88   2.86   2.84   2.82   2.80   2.68
       0.01    3.20   3.18   3.16   3.14   3.11   2.97
  29   0.05    2.89   2.88   2.86   2.84   2.82   2.71
       0.01    3.22   3.20   3.18   3.16   3.14   3.00
  30   0.05    2.91   2.89   2.88   2.86   2.84   2.73
       0.01    3.24   3.22   3.20   3.18   3.16   3.03
  31   0.05    2.92   2.91   2.89   2.88   2.86   2.76
       0.01    3.25   3.24   3.22   3.20   3.18   3.06
  32   0.05    2.94   2.92   2.91   2.89   2.88   2.78
       0.01    3.27   3.25   3.24   3.22   3.20   3.09
  33   0.05    2.95   2.94   2.92   2.91   2.89   2.80
       0.01    3.29   3.27   3.25   3.24   3.22   3.11
  34   0.05    2.97   2.95   2.94   2.92   2.91   2.82
       0.01    3.30   3.29   3.27   3.25   3.24   3.14
  35   0.05    2.98   2.97   2.95   2.94   2.92   2.84
       0.01    3.32   3.30   3.29   3.27   3.25   3.16
  36   0.05    2.99   2.98   2.97   2.95   2.94   2.86
       0.01    3.33   3.32   3.30   3.29   3.27   3.18
  37   0.05    3.00   2.99   2.98   2.97   2.95   2.88
       0.01    3.34   3.33   3.32   3.30   3.29   3.20
  38   0.05    3.01   3.00   2.99   2.98   2.97   2.91
       0.01    3.36   3.34   3.33   3.32   3.30   3.22
  39   0.05    3.03   3.01   3.00   2.99   2.98   2.91
       0.01    3.37   3.36   3.34   3.33   3.32   3.24
  40   0.05    3.04   3.03   3.01   3.00   2.99   2.92
       0.01    3.38   3.37   3.36   3.34   3.33   3.25
  41   0.05    3.05   3.04   3.03   3.01   3.00   2.94
       0.01    3.39   3.38   3.37   3.36   3.34   3.27
  42   0.05    3.06   3.05   3.04   3.03   3.01   2.95
       0.01    3.40   3.39   3.38   3.37   3.36   3.29
  43   0.05    3.07   3.06   3.05   3.04   3.03   2.97
       0.01    3.41   3.40   3.39   3.38   3.37   3.30
  44   0.05    3.08   3.07   3.06   3.05   3.04   2.98
       0.01    3.43   3.41   3.40   3.39   3.38   3.32
  45   0.05    3.09   3.08   3.07   3.06   3.05   2.99
       0.01    3.44   3.43   3.41   3.40   3.39   3.33
TABLE A-6. APPROXIMATE CRITICAL VALUES λr FOR ROSNER'S TEST (CONT.)

   n     α      r=1    r=2    r=3    r=4    r=5    r=10
  46   0.05    3.09   3.09   3.08   3.07   3.06   3.00
       0.01    3.45   3.44   3.43   3.41   3.40   3.34
  47   0.05    3.10   3.09   3.09   3.08   3.07   3.01
       0.01    3.46   3.45   3.44   3.43   3.41   3.36
  48   0.05    3.11   3.10   3.09   3.09   3.08   3.03
       0.01    3.46   3.46   3.45   3.44   3.43   3.37
  49   0.05    3.12   3.11   3.10   3.09   3.09   3.04
       0.01    3.47   3.46   3.46   3.45   3.44   3.38
  50   0.05    3.13   3.12   3.11   3.10   3.09   3.05
       0.01    3.48   3.47   3.46   3.46   3.45   3.39
  60   0.05    3.20   3.19   3.19   3.18   3.17   3.14
       0.01    3.56   3.55   3.55   3.54   3.53   3.49
  70   0.05    3.26   3.25   3.25   3.24   3.24   3.21
       0.01    3.62   3.62   3.61   3.60   3.60   3.57
  80   0.05    3.31   3.30   3.30   3.29   3.29   3.26
       0.01    3.67   3.67   3.66   3.66   3.65   3.63
  90   0.05    3.35   3.34   3.34   3.34   3.33   3.31
       0.01    3.72   3.71   3.71   3.70   3.70   3.68
 100   0.05    3.38   3.38   3.38   3.37   3.37   3.35
       0.01    3.75   3.75   3.75   3.74   3.74   3.72
 150   0.05    3.52   3.51   3.51   3.51   3.51   3.50
       0.01    3.89   3.89   3.89   3.88   3.88   3.87
 200   0.05    3.61   3.60   3.60   3.60   3.60   3.59
       0.01    3.98   3.98   3.97   3.97   3.97   3.96
 250   0.05    3.67    -      -      -     3.67   3.66
       0.01    4.04    -      -      -     4.04   4.03
 300   0.05    3.72    -      -      -     3.72   3.71
       0.01    4.09    -      -      -     4.09   4.09
 350   0.05    3.77    -      -      -     3.76   3.76
       0.01    4.14    -      -      -     4.13   4.13
 400   0.05    3.80    -      -      -     3.80   3.80
       0.01    4.17    -      -      -     4.17   4.16
 450   0.05    3.84    -      -      -     3.83   3.83
       0.01    4.20    -      -      -     4.20   4.20
 500   0.05    3.86    -      -      -     3.86   3.86
       0.01    4.23    -      -      -     4.23   4.22

Note: For n ≥ 250, values are tabulated only for r = 1, 5, and 10.
       TABLE A-7. QUANTILES OF THE WILCOXON SIGNED RANKS TEST
Values in the table are such that P(T ≤ wα) is approximately equal to, but less than, α. For
example, if n = 12, then P(T ≤ 17) = 0.0461, which is slightly less than 0.05. Note: the exact
probability was computed using the statistical software package R.
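The tail probability quoted in this note can be reproduced without R. Under the null hypothesis each rank 1, ..., n enters the positive-rank sum independently with probability 1/2, so P(T ≤ w) is the number of subsets of {1, ..., n} with sum at most w, divided by 2^n. A minimal stdlib-Python sketch (the function name is illustrative, not part of this guidance):

```python
def signed_rank_cdf(n, w):
    """Exact P(T <= w) for the Wilcoxon signed-rank statistic, sample size n."""
    # counts[s] = number of subsets of the ranks processed so far with sum s
    counts = [0] * (w + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # iterate downward so each rank is counted at most once per subset
        for s in range(w, rank - 1, -1):
            counts[s] += counts[s - rank]
    return sum(counts) / 2 ** n

# The example from the note: n = 12, w = 17
p = signed_rank_cdf(12, 17)
print(round(p, 4))  # 0.0461
```

The same routine generates any entry of Table A-7 by searching for the largest w with P(T ≤ w) below the target α.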
  N    w0.005  w0.01  w0.025  w0.05  w0.075  w0.10  w0.15  w0.20
  4      -       -      -       -      0       0      1      2
  5      -       -      -       0      1       2      2      3
  6      -       -      0       2      2       3      4      5
  7      -       0      2       3      4       5      7      8
  8      0       1      3       5      7       8      9     11
  9      1       3      5       8      9      10     12     14
 10      3       5      8      10     12      14     16     18
 11      5       7     10      13     16      17     20     22
 12      7       9     13      17     19      21     24     27
 13      9      12     17      21     24      26     29     32
 14     12      15     21      25     28      31     35     38
 15     15      19     25      30     33      36     40     44
 16     19      23     29      35     39      42     47     50
 17     23      27     34      41     45      48     53     57
 18     27      32     40      47     51      55     60     65
 19     32      37     46      53     58      62     68     73
 20     37      43     52      60     65      69     76     81
     TABLE A-8. CRITICAL VALUES FOR THE WILCOXON RANK-SUM TEST
Table values are the largest x values such that P(Wrs ≤ x) ≤ α.
        TABLE A-9. PERCENTILES OF THE CHI-SQUARE DISTRIBUTION
                                χ²(1-α)
 df     0.005      0.010      0.025      0.050     0.100    0.900   0.950   0.975   0.990   0.995
  1   0.0000393  0.000157   0.000982   0.00393   0.0158    2.71    3.84    5.02    6.63    7.88
  2   0.0100     0.0201     0.0506     0.103     0.211     4.61    5.99    7.38    9.21   10.60
  3   0.072      0.115      0.216      0.352     0.584     6.25    7.81    9.35   11.34   12.84
  4   0.207      0.297      0.484      0.711     1.064     7.78    9.49   11.14   13.28   14.86
  5   0.412      0.554      0.831      1.145     1.61      9.24   11.07   12.83   15.09   16.75
  6   0.676      0.872      1.24       1.64      2.20     10.64   12.59   14.45   16.81   18.55
  7   0.989      1.24       1.69       2.17      2.83     12.02   14.07   16.01   18.48   20.28
  8   1.34       1.65       2.18       2.73      3.49     13.36   15.51   17.53   20.09   21.96
  9   1.73       2.09       2.70       3.33      4.17     14.68   16.92   19.02   21.67   23.59
 10   2.16       2.56       3.25       3.94      4.87     15.99   18.31   20.48   23.21   25.19
 11   2.60       3.05       3.82       4.57      5.58     17.28   19.68   21.92   24.73   26.76
 12   3.07       3.57       4.40       5.23      6.30     18.55   21.03   23.34   26.22   28.30
 13   3.57       4.11       5.01       5.89      7.04     19.81   22.36   24.74   27.69   29.82
 14   4.07       4.66       5.63       6.57      7.79     21.06   23.68   26.12   29.14   31.32
 15   4.60       5.23       6.26       7.26      8.55     22.31   25.00   27.49   30.58   32.80
 16   5.14       5.81       6.91       7.96      9.31     23.54   26.30   28.85   32.00   34.27
 17   5.70       6.41       7.56       8.67     10.09     24.77   27.59   30.19   33.41   35.72
 18   6.26       7.01       8.23       9.39     10.86     25.99   28.87   31.53   34.81   37.16
 19   6.84       7.63       8.91      10.12     11.65     27.20   30.14   32.85   36.19   38.58
 20   7.43       8.26       9.59      10.85     12.44     28.41   31.41   34.17   37.57   40.00
 21   8.03       8.90      10.28      11.59     13.24     29.62   32.67   35.48   38.93   41.40
 22   8.64       9.54      10.98      12.34     14.04     30.81   33.92   36.78   40.29   42.80
 23   9.26      10.20      11.69      13.09     14.85     32.01   35.17   38.08   41.64   44.18
 24   9.89      10.86      12.40      13.85     15.66     33.20   36.42   39.36   42.98   45.56
 25  10.52      11.52      13.12      14.61     16.47     34.38   37.65   40.65   44.31   46.93
 26  11.16      12.20      13.84      15.38     17.29     35.56   38.89   41.92   45.64   48.29
 27  11.81      12.88      14.57      16.15     18.11     36.74   40.11   43.19   46.96   49.64
 28  12.46      13.56      15.31      16.93     18.94     37.92   41.34   44.46   48.28   50.99
 29  13.12      14.26      16.05      17.71     19.77     39.09   42.56   45.72   49.59   52.34
 30  13.79      14.95      16.79      18.49     20.60     40.26   43.77   46.98   50.89   53.67
 40  20.71      22.16      24.43      26.51     29.05     51.81   55.76   59.34   63.69   66.77
 50  27.99      29.71      32.36      34.76     37.69     63.17   67.50   71.42   76.15   79.49
 60  35.53      37.48      40.48      43.19     46.46     74.40   79.08   83.30   88.38   91.95
 70  43.28      45.44      48.76      51.74     55.33     85.53   90.53   95.02  100.4   104.2
 80  51.17      53.54      57.15      60.39     64.28     96.58  101.9   106.6   112.3   116.3
 90  59.20      61.75      65.65      69.13     73.29    107.6   113.1   118.1   124.1   128.3
100  67.33      70.06      74.22      77.93     82.36    118.5   124.3   129.6   135.8   140.2
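For degrees of freedom not listed in Table A-9, a chi-square percentile can be approximated closely from a standard normal quantile using the Wilson-Hilferty transformation. This is an approximation, not the exact method used to generate the table; the function name below is illustrative:

```python
from statistics import NormalDist

def chi2_quantile_wh(df, p):
    """Wilson-Hilferty approximation to the pth chi-square percentile."""
    z = NormalDist().inv_cdf(p)  # standard normal quantile z_p
    return df * (1 - 2 / (9 * df) + z * (2 / (9 * df)) ** 0.5) ** 3

# df = 10, p = 0.95: the table gives 18.31; the approximation is close.
print(round(chi2_quantile_wh(10, 0.95), 2))
```

Accuracy improves as the degrees of freedom grow; for very small df (1 or 2) the tabulated values should be preferred.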
                           TABLE A-10. PERCENTILES OF THE F-DISTRIBUTION
Degrees of Freedom for Denominator
1 .90
.95
.975
.99
2 .90
.95
.975
.99
3 .90
.95
.975
.99
4 .90
.95
.975
.99
.999
5 .90
.95
.975
.99
.999
6 .90
.95
.975
.99
.999
Degrees of Freedom for Numerator
1
39.9
161
648
4052
8.53
18.5
38.5
98.5
5.54
10.1
17.4
34.1
4.54
7.71
12.2
21.2
74.1
4.06
6.61
10.0
16.3
47.2
3.78
5.99
8.81
13.7
35.5
2
49.5
200
800
5000
9.00
19.0
39.0
99.0
5.46
9.55
16.0
30.8
4.32
6.94
10.6
18.0
61.2
3.78
5.79
8.43
13.3
37.1
3.46
5.14
7.26
10.9
27.0
3
53.6
216
864
5403
9.16
19.2
39.2
99.2
5.39
9.28
15.4
29.5
4.19
6.59
9.98
16.7
56.2
3.62
5.41
7.76
12.1
33.2
3.29
4.76
6.60
9.78
23.7
4
55.8
225
900
5625
9.24
19.2
39.2
99.2
5.34
9.12
15.1
28.7
4.11
6.39
9.60
16.0
53.4
3.52
5.19
7.39
11.4
31.1
3.18
4.53
6.23
9.15
21.9
5
57.2
230
922
5764
9.29
19.3
39.3
99.3
5.31
9.01
14.9
28.2
4.05
6.26
9.36
15.5
51.7
3.45
5.05
7.15
11.0
29.8
3.11
4.39
5.99
8.75
20.8
6
58.2
234
937
5859
9.33
19.3
39.3
99.3
5.28
8.94
14.7
27.9
4.01
6.16
9.20
15.2
50.5
3.40
4.95
6.98
10.7
28.8
3.05
4.28
5.82
8.47
20.0
7
58.9
237
948
5928
9.35
19.4
39.4
99.4
5.27
8.89
14.6
27.7
3.98
6.09
9.07
15.0
49.7
3.37
4.88
6.85
10.5
28.2
3.01
4.21
5.70
8.26
19.5
8
59.4
239
957
5981
9.37
19.4
39.4
99.4
5.25
8.85
14.5
27.5
3.95
6.04
8.98
14.8
49.0
3.34
4.82
6.76
10.3
27.6
2.98
4.15
5.60
8.10
19.0
9
59.9
241
963
6022
9.38
19.4
39.4
99.4
5.24
8.81
14.5
27.3
3.94
6.00
8.90
14.7
48.5
3.32
4.77
6.68
10.2
27.2
2.96
4.10
5.52
7.98
18.7
10
60.2
242
969
6056
9.39
19.4
39.4
99.4
5.23
8.79
14.4
27.2
3.92
5.96
8.84
14.5
48.1
3.30
4.74
6.62
10.1
26.9
2.94
4.06
5.46
7.87
18.4
12
60.7
244
977
6106
9.41
19.4
39.4
99.4
5.22
8.74
14.3
27.1
3.90
5.91
8.75
14.4
47.4
3.27
4.68
6.52
9.89
26.4
2.90
4.00
5.37
7.72
18.0
15
61.2
246
985
6157
9.42
19.4
39.4
99.4
5.20
8.70
14.3
26.9
3.87
5.86
8.66
14.2
46.8
3.24
4.62
6.43
9.72
25.9
2.87
3.94
5.27
7.56
17.6
20
61.7
248
993
6209
9.44
19.4
39.4
99.4
5.18
8.66
14.2
26.7
3.84
5.80
8.56
14.0
46.1
3.21
4.56
6.33
9.55
25.4
2.84
3.87
5.17
7.40
17.1
24
62.0
249
997
6235
9.45
19.5
39.5
99.5
5.18
8.64
14.1
26.6
3.83
5.77
8.51
13.9
45.8
3.19
4.53
6.28
9.47
25.1
2.82
3.84
5.12
7.31
16.9
30
62.3
250
1001
6261
9.46
19.5
39.5
99.5
5.17
8.62
14.1
26.5
3.82
5.75
8.46
13.8
45.4
3.17
4.50
6.23
9.38
24.9
2.80
3.81
5.07
7.23
16.7
60
62.8
252
1010
6313
9.47
19.5
39.5
99.5
5.15
8.57
14.0
26.3
3.79
5.69
8.36
13.7
44.7
3.14
4.43
6.12
9.20
24.3
2.76
3.74
4.96
7.06
16.2
120
63.1
253
1014
6339
9.48
19.5
39.5
99.5
5.14
8.55
13.9
26.2
3.78
5.66
8.31
13.6
44.4
3.12
4.40
6.07
9.11
24.1
2.74
3.70
4.90
6.97
16.0
∞
63.3
254
1018
6366
9.49
19.5
39.5
99.5
5.13
8.53
13.9
26.1
3.76
5.63
8.26
13.5
44.1
3.11
4.37
6.02
9.02
23.8
2.72
3.67
4.85
6.88
15.7
                        TABLE A-10. PERCENTILES OF THE F-DISTRIBUTION (CONT.)
Degrees of Freedom for Denominator
7 .90
.95
.975
.99
.999
8 .90
.95
.975
.99
.999
9 .90
.95
.975
.99
.999
10 .90
.95
.975
.99
.999
12 .90
.95
.975
.99
.999
Degrees of Freedom for Numerator
1
3.59
5.59
8.07
12.2
29.2
3.46
5.32
7.57
11.3
25.4
3.36
5.12
7.21
10.6
22.9
3.29
4.96
6.94
10.0
21.0
3.18
4.75
6.55
9.33
18.6
2
3.26
4.74
6.54
9.55
21.7
3.11
4.46
6.06
8.65
18.5
3.01
4.26
5.71
8.02
16.4
2.92
4.10
5.46
7.56
14.9
2.81
3.89
5.10
6.93
13.0
3
3.07
4.35
5.89
8.45
18.8
2.92
4.07
5.42
7.59
15.8
2.81
3.86
5.08
6.99
13.9
2.73
3.71
4.83
6.55
12.6
2.61
3.49
4.47
5.95
10.8
4
2.96
4.12
5.52
7.85
17.2
2.81
3.84
5.05
7.01
14.4
2.69
3.63
4.72
6.42
12.6
2.61
3.48
4.47
5.99
11.3
2.48
3.26
4.12
5.41
9.63
5
2.88
3.97
5.29
7.46
16.2
2.73
3.69
4.82
6.63
13.5
2.61
3.48
4.48
6.06
11.7
2.52
3.33
4.24
5.64
10.5
2.39
3.11
3.89
5.06
8.89
6
2.83
3.87
5.12
7.19
15.5
2.67
3.58
4.65
6.37
12.9
2.55
3.37
4.32
5.80
11.1
2.46
3.22
4.07
5.39
9.93
2.33
3.00
3.73
4.82
8.38
7
2.78
3.79
4.99
6.99
15.0
2.62
3.50
4.53
6.18
12.4
2.51
3.29
4.20
5.61
10.7
2.41
3.14
3.95
5.20
9.52
2.28
2.91
3.61
4.64
8.00
8
2.75
3.73
4.90
6.84
14.6
2.59
3.44
4.43
6.03
12.0
2.47
3.23
4.10
5.47
10.4
2.38
3.07
3.85
5.06
9.20
2.24
2.85
3.51
4.50
7.71
9
2.72
3.68
4.82
6.72
14.5
2.56
3.39
4.36
5.91
11.8
2.44
3.18
4.03
5.35
10.1
2.35
3.02
3.78
4.94
8.96
2.21
2.80
3.44
4.39
7.48
10
2.70
3.64
4.76
6.62
14.1
2.54
3.35
4.30
5.81
11.5
2.42
3.14
3.96
5.26
9.89
2.32
2.98
3.72
4.85
8.75
2.19
2.75
3.37
4.30
7.29
12
2.67
3.57
4.67
6.47
13.7
2.50
3.28
4.20
5.67
11.2
2.38
3.07
3.87
5.11
9.57
2.28
2.91
3.62
4.71
8.45
2.15
2.69
3.28
4.16
7.00
15
2.63
3.51
4.57
6.31
13.3
2.46
3.22
4.10
5.52
10.8
2.34
3.01
3.77
4.96
9.24
2.24
2.84
3.52
4.56
8.13
2.10
2.62
3.18
4.01
6.71
20
2.59
3.44
4.47
6.16
12.9
2.42
3.15
4.00
5.36
10.5
2.30
2.94
3.67
4.81
8.90
2.20
2.77
3.42
4.41
7.80
2.06
2.54
3.07
3.86
6.40
24
2.58
3.41
4.42
6.07
12.7
2.40
3.12
3.95
5.28
10.3
2.28
2.90
3.61
4.73
8.72
2.18
2.74
3.37
4.33
7.64
2.04
2.51
3.02
3.78
6.25
30
2.56
3.38
4.36
5.99
12.5
2.38
3.08
3.89
5.20
10.1
2.25
2.86
3.56
4.65
8.55
2.16
2.70
3.31
4.25
7.47
2.01
2.47
2.96
3.70
6.09
60
2.51
3.30
4.25
5.82
12.1
2.34
3.01
3.78
5.03
9.73
2.21
2.79
3.45
4.48
8.19
2.11
2.62
3.20
4.08
7.12
1.96
2.38
2.85
3.54
5.76
120
2.49
3.27
4.20
5.74
11.9
2.32
2.97
3.73
4.95
9.53
2.18
2.75
3.39
4.40
8.00
2.08
2.58
3.14
4.00
6.94
1.93
2.34
2.79
3.45
5.59
∞
2.47
3.23
4.14
5.65
11.7
2.29
2.93
3.67
4.86
9.33
2.16
2.71
3.33
4.31
7.81
2.06
2.54
3.08
3.91
6.76
1.90
2.30
2.72
3.36
5.42
                        TABLE A-10. PERCENTILES OF THE F-DISTRIBUTION (CONT.)
Degrees of Freedom for Denominator
15 .90
.95
.975
.99
.999
20 .90
.95
.975
.99
.999
24 .90
.95
.975
.99
.999
30 .90
.95
.975
.99
.999
60 .90
.95
.975
.99
.999
Degrees of Freedom for Numerator
1
3.07
4.54
6.20
8.68
16.6
2.97
4.35
5.87
8.10
14.8
2.93
4.26
5.72
7.82
14.0
2.88
4.17
5.57
7.56
13.3
2.79
4.00
5.29
7.08
12.0
2
2.70
3.68
4.77
6.36
11.3
2.59
3.49
4.46
5.85
9.95
2.54
3.40
4.32
5.61
9.34
2.49
3.32
4.18
5.39
8.77
2.39
3.15
3.93
4.98
7.77
3
2.49
3.29
4.15
5.42
9.34
2.38
3.10
3.86
4.94
8.10
2.33
3.01
3.72
4.72
7.55
2.28
2.92
3.59
4.51
7.05
2.18
2.76
3.34
4.13
6.17
4
2.36
3.06
3.80
4.89
8.25
2.25
2.87
3.51
4.43
7.10
2.19
2.78
3.38
4.22
6.59
2.14
2.69
3.25
4.02
6.12
2.04
2.53
3.01
3.65
5.31
5
2.27
2.90
3.58
4.56
7.57
2.16
2.71
3.29
4.10
6.46
2.10
2.62
3.15
3.90
5.98
2.05
2.53
3.03
3.70
5.53
1.95
2.37
2.79
3.34
4.76
6
2.21
2.79
3.41
4.32
7.09
2.09
2.60
3.13
3.87
6.02
2.04
2.51
2.99
3.67
5.55
1.98
2.42
2.87
3.47
5.12
1.87
2.25
2.63
3.12
4.37
7
2.16
2.71
3.29
4.14
6.74
2.04
2.51
3.01
3.70
5.69
1.98
2.42
2.87
3.50
5.23
1.93
2.33
2.75
3.30
4.82
1.82
2.17
2.51
2.95
4.09
8
2.12
2.64
3.20
4.00
6.47
2.00
2.45
2.91
3.56
5.44
1.94
2.36
2.78
3.36
4.99
1.88
2.27
2.65
3.17
4.58
1.77
2.10
2.41
2.82
3.86
9
2.09
2.59
3.12
3.89
6.26
1.96
2.39
2.84
3.46
5.24
1.91
2.30
2.70
3.26
4.80
1.85
2.21
2.57
3.07
4.39
1.74
2.04
2.33
2.72
3.69
10
2.06
2.54
3.06
3.80
6.08
1.94
2.35
2.77
3.37
5.08
1.88
2.25
2.64
3.17
4.64
1.82
2.16
2.51
2.98
4.24
1.71
1.99
2.27
2.63
3.54
12
2.02
2.48
2.96
3.67
5.81
1.89
2.28
2.68
3.23
4.82
1.83
2.18
2.54
3.03
4.39
1.77
2.09
2.41
2.84
4.00
1.66
1.92
2.17
2.50
3.32
15
1.97
2.40
2.86
3.52
5.54
1.84
2.20
2.57
3.09
4.56
1.78
2.11
2.44
2.89
4.14
1.72
2.01
2.31
2.70
3.75
1.60
1.84
2.06
2.35
3.08
20
1.92
2.33
2.76
3.37
5.25
1.79
2.12
2.46
2.94
4.29
1.73
2.03
2.33
2.74
3.87
1.62
1.93
2.20
2.55
3.49
1.54
1.75
1.94
2.20
2.83
24
1.90
2.29
2.70
3.29
5.10
1.77
2.08
2.41
2.86
4.15
1.70
1.98
2.27
2.66
3.74
1.64
1.89
2.14
2.47
3.36
1.51
1.70
1.88
2.12
2.69
30
1.87
2.25
2.64
3.21
4.95
1.74
2.04
2.35
2.78
4.00
1.67
1.94
2.21
2.58
3.59
1.61
1.84
2.07
2.39
3.22
1.48
1.65
1.82
2.03
2.55
60
1.82
2.16
2.52
3.05
4.64
1.68
1.95
2.22
2.61
3.70
1.61
1.84
2.08
2.40
3.29
1.54
1.74
1.94
2.21
2.92
1.40
1.53
1.67
1.84
2.25
120
1.79
2.11
2.46
2.96
4.48
1.64
1.90
2.16
2.52
3.54
1.57
1.79
2.01
2.31
3.14
1.50
1.68
1.87
2.11
2.76
1.35
1.47
1.58
1.73
2.08
∞
1.76
2.07
2.40
2.87
4.31
1.61
1.84
2.09
2.42
3.38
1.53
1.73
1.94
2.21
2.97
1.46
1.62
1.79
2.01
2.59
1.29
1.39
1.48
1.60
1.89
                        TABLE A-10. PERCENTILES OF THE F-DISTRIBUTION (CONT.)
Degrees of Freedom for Denominator
120 .90
.95
.975
.99
.999
∞ .90
.95
.975
.99
.999
Degrees of Freedom for Numerator
1
2.75
3.92
5.15
6.85
11.4
2.71
3.84
5.02
6.63
10.8
2
2.35
3.07
3.80
4.79
7.32
2.30
3.00
3.69
4.61
6.91
3
2.13
2.68
3.23
3.95
5.78
2.08
2.60
3.12
3.78
5.42
4
1.99
2.45
2.89
3.48
4.95
1.94
2.37
2.79
3.32
4.62
5
1.90
2.29
2.67
3.17
4.42
1.85
2.21
2.57
3.02
4.10
6
1.82
2.18
2.52
2.96
4.04
1.77
2.10
2.41
2.80
3.74
7
1.77
2.09
2.39
2.79
3.77
1.72
2.01
2.29
2.64
3.47
8
1.72
2.02
2.30
2.66
3.55
1.67
1.94
2.19
2.51
3.27
9
1.68
1.96
2.22
2.56
3.38
1.63
1.88
2.11
2.41
3.10
10
1.65
1.91
2.16
2.47
3.24
1.60
1.83
2.05
2.32
2.96
12
1.60
1.83
2.05
2.34
3.02
1.55
1.75
1.94
2.18
2.74
15
1.55
1.75
1.95
2.19
2.78
1.49
1.67
1.83
2.04
2.51
20
1.48
1.66
1.82
2.03
2.53
1.42
1.57
1.71
1.88
2.27
24
1.45
1.61
1.76
1.95
2.40
1.38
1.52
1.64
1.79
2.13
30
1.41
1.55
1.69
1.86
2.26
1.34
1.46
1.57
1.70
1.99
60
1.32
1.43
1.53
1.66
1.95
1.24
1.32
1.39
1.47
1.66
120
1.26
1.35
1.43
1.53
1.77
1.17
1.22
1.27
1.32
1.45
∞
1.19
1.25
1.31
1.38
1.54
1.00
1.00
1.00
1.00
1.00
   TABLE A-11. VALUES OF THE PARAMETER λ FOR COHEN'S ESTIMATES
                 ADJUSTING FOR NONDETECTED VALUES
γ
.00
.05
.10
.15
.20
.25
.30
.35
.40
.45
.50
.55
.60
.65
.70
.75
.80
.85
.90
.95
1.00
h
.01
.010100
.010551
.010950
.011310
.011642
.011952
.012243
.012520
.012784
.013036
.013279
.013513
.013739
.013958
.014171
.014378
.014579
.014773
.014967
.015154
.015338
.02
.020400
.021294
.022082
.022798
.023459
.024076
.024658
.025211
.025738
.026243
.026728
.027196
.027849
.028087
.028513
.028927
.029330
.029723
.030107
.030483
.030850
.03
.030902
.032225
.033398
.034466
.035453
.036377
.037249
.038077
.038866
.039624
.040352
.041054
.041733
.042391
.043030
.043652
.044258
.044848
.045425
.045989
.046540
.04
.041583
.043350
.044902
.046318
.047829
.048858
.050018
.051120
.052173
.053182
.054153
.055089
.055995
.056874
.057726
.058556
.059364
.060153
.060923
.061676
.062413
.05
.052507
.054670
.056596
.058356
.059990
.061522
.062969
.064345
.065660
.066921
.068135
.069306
.070439
.071538
.072505
.073643
.074655
.075642
.076606
.077549
.078471
.06
.063625
.066159
.068483
.070586
.072539
.074372
.076106
.077736
.079332
.080845
.082301
.083708
.085068
.086388
.087670
.088917
.090133
.091319
.092477
.093611
.094720
.07
.074953
.077909
.080563
.083009
.085280
.087413
.089433
.091355
.093193
.094958
.096657
.098298
.099887
.10143
.10292
.10438
.10580
.10719
.10854
.10987
.11116
.08
.08649
.08983
.09285
.09563
.09822
.10065
.10295
.10515
.10725
.10926
.11121
.11308
.11490
.11666
.11837
.12004
.12167
.12325
.12480
.12632
.12780
.09
.09824
.10197
.10534
.10845
.11135
.11408
.11667
.11914
.12150
.12377
.12595
.12806
.13011
.13209
.13402
.13590
.13775
.13952
.14126
.14297
.14465
.10
.11020
.11431
.11804
.12148
.12469
.12772
.13059
.13333
.13595
.13847
.14090
.14325
.14552
.14773
.14987
.15196
.15400
.15599
.15793
.15983
.16170
.15
.17342
.17925
.18479
.18985
.19460
.19910
.20338
.20747
.21129
.21517
.21882
.22225
.22578
.22910
.23234
.23550
.23858
.24158
.24452
.24740
.25022
.20
.24268
.25033
.25741
.26405
.27031
.27626
.28193
.28737
.29250
.29765
.30253
.30725
.31184
.31630
.32065
.32489
.32903
.33307
.33703
.34091
.34471
γ
.00
.05
.10
.15
.20
.25
.30
.35
.40
.45
.50
.55
.60
.65
.70
.75
.80
.85
.90
.95
1.00
h
.25
.31862
.32793
.33662
.34480
.35255
.35993
.36700
.37379
.38033
.38665
.39276
.39876
.40447
.41008
.41555
.42090
.42612
.43122
.43622
.44112
.44592
.30
.4021
.4130
.4233
.4330
.4422
.4510
.4595
.4676
.4753
.4831
.4904
.4976
.5045
.5114
.5180
.5245
.5308
.5370
.5430
.5490
.5548
.35
.4941
.5066
.5184
.5296
.5403
.5506
.5604
.5699
.5791
.5880
.5967
.6061
.6133
.6213
.6291
.6367
.6441
.6515
.6586
.6656
.6724
.40
.5961
.6101
.6234
.6361
.6483
.6600
.6713
.6821
.6927
.7029
.7129
.7225
.7320
.7412
.7502
.7590
.7676
.7761
.7844
.7925
.8005
.45
.7096
.7252
.7400
.7542
.7673
.7810
.7937
.8060
.8179
.8295
.8408
.8517
.8625
.8729
.8832
.8932
.9031
.9127
.9222
.9314
.9406
.50
.8388
.8540
.8703
.8860
.9012
.9158
.9300
.9437
.9570
.9700
.9826
.9950
1.007
1.019
1.030
1.042
1.053
1.064
1.074
1.085
1.095
.55
.9808
.9994
1.017
1.035
1.051
1.067
1.083
1.098
1.113
1.127
1.141
1.155
1.169
1.182
1.195
1.207
1.220
1.232
1.244
1.255
1.267
.60
1.145
1.166
1.185
1.204
1.222
1.240
1.257
1.274
1.290
1.306
1.321
1.337
1.351
1.368
1.380
1.394
1.408
1.422
1.435
1.448
1.461
.65
1.336
1.358
1.379
1.400
1.419
1.439
1.457
1.475
1.494
1.511
1.528
1.545
1.561
1.577
1.593
1.608
1.624
1.639
1.653
1.668
1.683
.70
1.561
1.585
1.608
1.630
1.651
1.672
1.693
1.713
1.732
1.751
1.770
1.788
1.806
1.824
1.841
1.858
1.875
1.892
1.908
1.924
1.940
.80
2.176
2.203
2.229
2.255
2.280
2.305
2.329
2.353
2.376
2.399
2.421
2.443
2.465
2.486
2.507
2.528
2.548
2.568
2.588
2.607
2.626
.90
3.283
3.314
3.345
3.376
3.405
3.435
3.464
3.492
3.520
3.547
3.575
3.601
3.628
3.654
3.679
3.705
3.730
3.754
3.779
3.803
3.827
                TABLE A-12a: CRITICAL VALUES FOR THE
                    MANN-KENDALL TEST FOR TREND
                 Significance Level α
 Sample size n   0.20   0.10   0.05   0.01
      4            4      6      6      -
      5            6      8      8     10
      6            7      9     11     13
      7            7     11     13     17
      8            8     12     16     20
      9           10     14     18     24
     10           11     17     21     27
     11           13     19     23     31
     12           14     20     26     34
     13           16     24     28     40
     14           17     25     31     43
     15           19     29     35     47
     16           20     30     38     52
     17           22     34     42     58
     18           25     35     45     63
     19           27     39     49     67
     20           28     42     52     72
          TABLE A-12b: PROBABILITIES FOR THE SMALL-SAMPLE
                    MANN-KENDALL TEST FOR TREND
  S     n=4      n=5      n=8       n=9
  0    0.625    0.592    0.548     0.540
  2    0.375    0.408    0.452     0.460
  4    0.167    0.242    0.360     0.381
  6    0.042    0.117    0.274     0.306
  8      -      0.042    0.199     0.238
 10      -      0.0083   0.138     0.179
 12      -        -      0.089     0.130
 14      -        -      0.054     0.090
 16      -        -      0.031     0.060
 18      -        -      0.016     0.038
 20      -        -      0.0071    0.022
 22      -        -      0.0028    0.012
 24      -        -      0.00087   0.0063
 26      -        -      0.00019   0.0029
 28      -        -      0.000025  0.0012
 30      -        -        -       0.00043
 32      -        -        -       0.00012
 34      -        -        -       0.000025
 36      -        -        -       0.0000028

  S     n=6       n=7       n=10
  1    0.500     0.500     0.500
  3    0.360     0.386     0.431
  5    0.235     0.281     0.364
  7    0.136     0.191     0.300
  9    0.068     0.119     0.242
 11    0.028     0.068     0.190
 13    0.0083    0.035     0.146
 15    0.0014    0.015     0.108
 17      -       0.0054    0.078
 19      -       0.0014    0.054
 21      -       0.00020   0.036
 23      -         -       0.023
 25      -         -       0.014
 27      -         -       0.0083
 29      -         -       0.0046
 31      -         -       0.0023
 33      -         -       0.0011
 35      -         -       0.00047
 37      -         -       0.00018
 39      -         -       0.000058
 41      -         -       0.000015
 43      -         -       0.0000028
 45      -         -       0.00000028
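For these small sample sizes the Table A-12b probabilities are exact tail probabilities of the Mann-Kendall statistic S under the null hypothesis that every ordering of the n observations is equally likely, so they can be checked by direct enumeration. A brute-force sketch (the function name is illustrative; enumeration is only practical for small n):

```python
from itertools import permutations

def mann_kendall_tail(n, s0):
    """Exact P(S >= s0) under H0, by enumerating all n! orderings."""
    count = total = 0
    for perm in permutations(range(n)):
        # S = number of concordant pairs minus number of discordant pairs
        s = sum(
            (perm[j] > perm[i]) - (perm[j] < perm[i])
            for i in range(n) for j in range(i + 1, n)
        )
        count += s >= s0
        total += 1
    return count / total

# Reproduce the n = 4 column of the table:
print(mann_kendall_tail(4, 0))  # 0.625
print(round(mann_kendall_tail(4, 4), 3))  # 0.167
```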
                  TABLE A-13. QUANTILES FOR THE WALD-WOLFOWITZ TEST FOR RUNS
 n = 4
  m    W0.01   W0.05   W0.10
  4     -       -       3
  5     -       -       3
  6     -       3       4
  7     -       3       4
  8     -       3       4
  9     -       3       4
 10     -       4       5
 11     3       4       5
 12     3       4       5
 13     3       4       5
 14     3       5       6
 15     3       5       6
 16     4       5       6
 17     4       5       6
 18     4       5       6
 19     4       5       6

 n = 5
  m    W0.01   W0.05   W0.10
  5     3       4       4
  6     3       4       4
  7     3       4       5
  8     3       4       5
  9     3       4       5
 10     4       4       5
 11     4       5       6
 12     4       5       6
 13     4       5       6
 14     5       6       6
 15     5       6       7
 16     5       6       7
 17     5       6       7
 18     5       6       7
 19     5       6       7
 20     5       6       7

 n = 6
  m    W0.01   W0.05   W0.10
  6     3       4       5
  7     4       5       6
  8     4       5       6
  9     4       5       6
 10     4       6       7
 11     5       6       7
 12     5       6       7
 13     5       7       8
 14     5       7       8
 15     5       7       8
 16     6       7       8
 17     6       7       8
 18     6       7       8
 19     6       7       8
 20     6       7       8

 n = 7
  m    W0.01   W0.05   W0.10
  7
  8     4       5       6
  9     4       5       6
 10     4       6       7
 11     5       6       7
 12     5       6       7
 13     5       7       7
 14     5       7       8
 15     6       7       8
 16     6       7       8
 17     6       8       8
 18     7       8       9
 19     7       8       9
 20     7       8       9
              TABLE A-13.  QUANTILES FOR THE WALD-WOLFOWITZ TEST FOR RUNS (CONT.)
n

8












m
20
8
9
10
11
12
13
14
15
16
17
18
19
20
W0.01
4
5
5
5
6
6
6
6
6
7
7
7
7
7
W0.05
5
6
6
7
7
7
7
8
8
8
8
9
9
9
W0.10
6
6
7
7
8
8
8
8
9
9
9
10
10
10
n
m
W0.01
W0.05
W0.10
n
m
W0.01
W0.05
W0.10
n
m
W0.01
W0.05
W0.10
9











9
10
11
12
13
14
15
16
17
18
19
20
5
6
6
6
7
7
7
7
8
8
8
8
6
7
7
7
8
8
9
9
9
9
10
10
7
8
8
8
9
9
9
10
10
10
11
11
10










10
11
12
13
14
15
16
17
18
19
20
7
7
7
7
8
8
8
8
9
9
9
8
8
8
9
9
9
9
10
10
11
11
9
9
9
10
10
10
11
11
11
12
12
11









11
12
13
14
15
16
17
18
19
20
7
7
7
8
8
8
9
9
9
9
8
8
9
9
9
10
10
10
11
11
9
9
10
10
10
11
11
11
12
12
              TABLE A-13.  QUANTILES FOR THE WALD-WOLFOWITZ TEST FOR RUNS (CONT.)
n
m
W0.01
W0.05
W0.10
n
m
W0.01
W0.05
W0.10
n
m
W0.01
W0.05
W0.10
12








12
13
14
15
16
17
18
19
20
7
8
8
8
9
9
9
10
10
9
10
10
10
11
11
11
12
12
10
10
10
11
12
12
12
13
13
13







13
14
15
16
17
18
19
20
9
9
10
10
10
10
10
11
10
11
11
11
11
11
12
12
11
12
12
12
12
13
13
13
14






14
15
16
17
18
19
20
9
9
10
10
10
11
11
11
11
11
12
12
13
13
12
12
13
13
13
14
14
15





15
16
17
18
19
20
8
9
10
10
11
12
10
11
12
12
13
14
11
12
13
14
15
16
16


16
17

10
11

12
13

13
14


17


17
18

11
11

13
13

14
15


18


18
19

13
13

14
14

16
16


19

19
20
14
14
16
16
17
17

                  TABLE A-13.  QUANTILES FOR THE WALD-WOLFOWITZ TEST FOR RUNS (CONT.)
n



m
18
19
20
W0.01
11
12
13
W0.05
13
14
14
W0.10
14
15
16
n


m
19
20
W0.01
12
12
W0.05
14
14
W0.10
16
16
n

m
20
W0.01
14
W0.05
15
W0.10
16


n
m
W0.01
W0.05
W0.10

20
20
14
16
17
When n or m is greater than 20, the Wp quantile is given approximately by

     Wp = 2mn/(m + n) + 1 + zp [ 2mn(2mn - m - n) / ((m + n)^2 (m + n - 1)) ]^(1/2),

where zp is the appropriate quantile from the standard normal distribution (see Table A-1).
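The large-sample formula above is straightforward to evaluate with a standard normal quantile function. A minimal sketch (the function name is illustrative, not from this guidance):

```python
from math import sqrt
from statistics import NormalDist

def runs_quantile(m, n, p):
    """Normal approximation to the W_p quantile of the number of runs,
    intended for use when m or n exceeds 20."""
    mean = 2 * m * n / (m + n) + 1
    var = 2 * m * n * (2 * m * n - m - n) / ((m + n) ** 2 * (m + n - 1))
    return mean + NormalDist().inv_cdf(p) * sqrt(var)

# Sanity check at the edge of the table: for m = n = 20 the approximation
# gives about 15.9, near the tabulated W0.05 = 16 in Table A-13.
print(round(runs_quantile(20, 20, 0.05), 2))
```

In practice the result would be compared with the observed number of runs after rounding conservatively.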
         TABLE A-14. CRITICAL VALUES FOR THE SLIPPAGE TEST
            LEVEL OF SIGNIFICANCE (a) APPROXIMATELY 0.01

m = size of population 2 (background)
n = size of population 1 (site)

5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
5
5
5
5
4
4
4
4
4
4
4
3
3
3
3
3
3
6
6
5
5
5
5
4
4
4
4
4
4
4
4
3
3
3
7
6
6
6
5
5
5
5
4
4
4
4
4
4
4
4
4
8
7
6
6
6
5
5
5
5
5
4
4
4
4
4
4
4
9
8
7
7
6
6
6
5
5
5
5
5
4
4
4
4
4
10
8
8
7
7
6
6
6
5
5
5
5
5
5
5
4
4
11
9
8
8
7
7
6
6
6
6
5
5
5
5
5
5
5
12
9
9
8
8
7
7
6
6
6
6
5
5
5
5
5
5
13
10
9
9
8
8
7
7
6
6
6
6
6
5
5
5
5
14
11
10
9
8
8
7
7
7
7
6
6
6
6
5
5
5
15
11
10
10
9
8
8
7
7
7
7
6
6
6
6
6
5
16
12
11
10
9
9
8
8
7
7
7
7
6
6
6
6
6
17
12
11
10
10
9
9
8
8
7
7
7
7
6
6
6
6
18
13
12
11
10
10
9
9
8
8
7
7
7
7
6
6
6
19
14
12
11
11
10
9
9
8
8
8
7
7
7
7
6
6
20
14
13
12
11
10
10
9
9
8
8
8
7
7
7
7
7

      TABLE A-14. CRITICAL VALUES FOR THE SLIPPAGE TEST (CONT.)
             LEVEL OF SIGNIFICANCE (a) APPROXIMATELY 0.05

      TABLE A-14.  CRITICAL VALUES FOR THE SLIPPAGE TEST (CONT.)
            LEVEL OF SIGNIFICANCE (a) APPROXIMATELY 0.10

m = size of population 2 (background)
n = size of population 1 (site)

5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
5
3
3
3
3
3
2
2
2
2
2
2
2
2
2
2
2
6
4
3
3
3
3
3
3
2
2
2
2
2
2
2
2
2
7
4
4
3
3
3
3
3
3
3
2
2
2
2
2
2
2
8
4
4
4
4
3
3
3
3
3
3
3
3
2
2
2
2
9
5
4
4
4
4
3
3
3
3
3
3
3
3
3
2
2
10
5
5
4
4
4
4
3
3
3
3
3
3
3
3
3
3
11
6
5
5
4
4
4
4
3
3
3
3
3
3
3
3
3
12
6
5
5
5
4
4
4
4
3
3
3
3
3
3
3
3
13
6
6
5
5
4
4
4
4
4
3
3
3
3
3
3
3
14
7
6
5
5
5
4
4
4
4
4
3
3
3
3
3
3
15
7
6
6
5
5
5
4
4
4
4
4
4
3
3
3
3
16
7
7
6
6
5
5
5
4
4
4
4
4
4
3
3
3
17
8
7
6
6
5
5
5
5
4
4
4
4
4
4
3
3
18
8
7
7
6
6
5
5
5
4
4
4
4
4
4
4
3
19
9
8
7
6
6
5
5
5
5
4
4
4
4
4
4
4
20
9
8
7
7
6
6
5
5
5
5
4
4
4
4
4
4

                              TABLE A-15. DUNNETT'S TEST (ONE TAILED)
Degrees
of
Freedom
2
3
4
5
6
7
8
9
10
12
16
α
.05
.10
.05
.10
.05
.10
.05
.10
.05
.10
.05
.10
.05
.10
.05
.10
.05
.10
.05
.10
.05
.10
Total Number of Investigated Groups (k - 1)
2
3.80
2.54
2.94
2.13
2.61
1.96
2.44
1.87
2.34
1.82
2.27
1.78
2.22
1.75
2.18
1.73
2.15
1.71
2.11
1.69
2.06
1.66
3
4.34
2.92
3.28
2.41
2.88
2.20
2.68
2.09
2.56
2.02
2.48
1.98
2.42
1.94
2.37
1.92
2.34
1.90
2.29
1.87
2.23
1.83
4
4.71
3.20
3.52
2.61
3.08
2.37
2.85
2.24
2.71
2.17
2.62
2.11
2.55
2.08
2.50
2.05
2.47
2.02
2.41
1.99
2.34
1.95
5
5.08
3.40
3.70
2.76
3.22
2.50
2.98
2.36
2.83
2.27
2.73
2.22
2.66
2.17
2.60
2.14
2.56
2.12
2.50
2.08
2.43
2.04
6
5.24
3.57
3.85
2.87
3.34
2.60
3.08
2.45
2.92
2.36
2.81
2.30
2.74
2.25
2.68
2.22
2.64
2.19
2.58
2.16
2.50
2.11
7
5.43
3.71
3.97
2.97
3.44
2.68
3.16
2.53
3.00
2.43
2.89
2.37
2.81
2.32
2.75
2.28
2.70
2.26
2.64
2.22
2.56
2.17
8
5.60
3.83
4.08
3.06
3.52
2.75
3.24
2.59
3.06
2.49
2.95
2.42
2.87
2.38
2.81
2.34
2.76
2.31
2.69
2.27
2.61
2.22
9
5.75
3.94
4.17
3.13
3.59
2.82
3.30
2.65
3.12
2.54
3.00
2.47
2.92
2.42
2.86
2.39
2.81
2.35
2.74
2.31
2.65
2.26
10
5.88
4.03
4.25
3.20
3.66
2.87
3.36
2.70
3.17
2.59
3.05
2.52
2.96
2.47
2.90
2.43
2.85
2.40
2.78
2.35
2.69
2.30
12
6.11
4.19
4.39
3.31
3.77
2.97
3.45
2.78
3.26
2.67
3.13
2.59
3.04
2.54
2.97
2.50
2.92
2.46
2.84
2.42
2.75
2.36
14
6.29
4.32
4.51
3.41
3.86
3.05
3.53
2.86
3.33
2.74
3.20
2.66
3.11
2.60
3.04
2.56
2.98
2.52
2.90
2.47
2.81
2.41
16
6.45
4.44
4.61
3.49
3.94
3.11
3.60
2.92
3.48
2.79
3.26
2.71
3.16
2.65
3.09
2.61
3.03
2.57
2.95
2.52
2.85
2.46
                           TABLE A-15.  DUNNETT'S TEST (ONE TAILED) (CONT.)
Degrees
of
Freedom
20
24
30
40
50
60
70
80
90
100
120
∞
α
.05
.10
.05
.10
.05
.10
.05
.10
.05
.10
.05
.10
.05
.10
.05
.10
.05
.10
.05
.10
.05
.10
.05
.10
Total Number of Investigated Groups (k - 1)
2
2.03
1.64
2.01
1.63
1.99
1.62
1.97
1.61
1.96
1.61
1.95
1.60
1.95
1.60
1.94
1.60
1.94
1.60
1.93
1.59
1.93
1.59
1.92
1.58
3
2.19
1.81
2.17
1.80
2.15
1.79
2.13
1.77
2.11
1.77
2.10
1.76
2.10
1.76
2.10
1.76
2.09
1.76
2.08
1.75
2.08
1.75
2.06
1.73
4
2.30
1.93
2.28
1.91
2.25
1.90
2.23
1.88
2.22
1.88
2.21
1.87
2.21
1.87
2.20
1.87
2.20
1.86
2.18
1.85
2.18
1.85
2.16
1.84
5
2.39
2.01
2.36
2.00
2.34
1.98
2.31
1.96
2.29
1.96
2.28
1.95
2.28
1.95
2.28
1.95
2.27
1.94
2.27
1.93
2.26
1.93
2.23
1.92
6
2.46
2.08
2.43
2.06
2.40
2.05
2.37
2.03
2.35
2.02
2.34
2.01
2.34
2.01
2.34
2.01
2.33
2.00
2.33
1.99
2.32
1.99
2.29
1.98
7
2.51
2.14
2.48
2.12
2.45
2.10
2.42
2.08
2.41
2.07
2.40
2.06
2.40
2.06
2.39
2.06
2.39
2.06
2.38
2.05
2.37
2.05
2.34
2.03
8
2.56
2.19
2.53
2.17
2.50
2.15
2.47
2.13
2.45
2.12
2.44
2.11
2.44
2.11
2.43
2.10
2.43
2.10
2.42
2.09
2.41
2.09
2.38
2.07
9
2.60
2.23
2.57
2.21
2.54
2.19
2.51
2.17
2.49
2.16
2.48
2.15
2.48
2.15
2.47
2.15
2.47
2.14
2.46
2.14
2.45
2.13
2.42
2.11
10
2.64
2.26
2.60
2.24
2.57
2.22
2.54
2.20
2.52
2.19
2.51
2.18
2.51
2.18
2.50
2.18
2.50
2.17
2.49
2.17
2.48
2.16
2.45
2.14
12
2.70
2.33
2.66
2.30
2.63
2.28
2.60
2.26
2.58
2.25
2.57
2.24
2.56
2.24
2.55
2.23
2.55
2.23
2.54
2.22
2.53
2.22
2.50
2.20
14
2.75
2.38
2.72
2.35
2.68
2.33
2.65
2.31
2.63
2.30
2.61
2.29
2.61
2.29
2.60
2.28
2.60
2.28
2.59
2.27
2.58
2.27
2.55
2.24
16
2.80
2.42
2.76
2.40
2.72
2.37
2.69
2.35
2.67
2.34
2.65
2.33
2.65
2.33
2.64
2.32
2.63
2.31
2.63
2.31
2.62
2.31
2.58
2.28
        TABLE A-16. APPROXIMATE a-LEVEL CRITICAL POINTS FOR
                    RANK VON NEUMANN RATIO TEST
n
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
32
34
36
38
40
42
44
46
48
50
55
60
65
70
75
80
85
90
95
100
α
0.05

0.70
0.80
0.86
0.93
0.98
1.04
1.08
1.11
1.14
1.17
1.19
1.21
1.24
1.26
1.27
1.29
1.31
1.32
1.33
1.35
1.36
1.37
1.38
1.39
1.40
1.41
1.43
1.45
1.46
1.48
1.49
1.50
1.51
1.52
1.53
1.54
1.56
1.58
1.60
1.61
1.62
1.64
1.65
1.66
1.66
1.67
0.10

0.60
0.97
1.11
1.14
1.18
1.23
1.26
1.29
1.32
1.34
1.36
1.38
1.40
1.41
1.43
1.44
1.45
1.46
1.48
1.49
1.50
1.51
1.51
1.52
1.53
1.54
1.55
1.57
1.58
1.59
1.60
1.61
1.62
1.63
1.63
1.64
1.66
1.67
1.68
1.70
1.71
1.71
1.72
1.73
1.74
1.74
TABLE A-17. VALUES OF H(1-α) = H0.90 FOR COMPUTING A ONE-SIDED UPPER 90%
               CONFIDENCE LIMIT ON A LOGNORMAL MEAN


0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
0.90
1.00
1.25
1.50
1.75
2.00
2.50
3.00
3.50
4.00
4.50
5.00
6.00
7.00
8.00
9.00
10.00
n
3
1.686
1.885
2.156
2.521
2.990
3.542
4.136
4.742
5.349
5.955
7.466
8.973
10.48
11.98
14.99
18.00
21.00
24.00
27.01
30.01
36.02
42.02
48.03
54.03
60.04
5
1.438
1.522
1.627
1.755
1.907
2.084
2.284
2.503
2.736
2.980
3.617
4.276
4.944
5.619
6.979
8.346
9.717
11.09
12.47
13.84
16.60
19.35
22.11
24.87
27.63
7
1.381
1.442
1.517
1.607
1.712
1.834
1.970
2.119
2.280
2.450
2.904
3.383
3.877
4.380
5.401
6.434
7.473
8.516
9.562
10.61
12.71
14.81
16.91
19.02
21.12
10
1.349
1.396
1.453
1.523
1.604
1.696
1.800
1.914
2.036
2.167
2.518
2.896
3.289
3.693
4.518
5.359
6.208
7.062
7.919
8.779
10.50
12.23
13.96
15.70
17.43
12
1.338
1.380
1.432
1.494
1.567
1.650
1.743
1.845
1.955
2.073
2.391
2.733
3.092
3.461
4.220
4.994
5.778
6.566
7.360
8.155
9.751
11.35
12.96
14.56
16.17
15
1.328
1.365
1.411
1.467
1.532
1.606
1.690
1.781
1.880
1.985
2.271
2.581
2.907
3.244
3.938
4.650
5.370
6.097
6.829
7.563
9.037
10.52
12.00
13.48
14.97
21
1.317
1.348
1.388
1.437
1.494
1.558
1.631
1.710
1.797
1.889
2.141
2.415
2.705
3.005
3.629
4.270
4.921
5.580
6.243
6.909
8.248
9.592
10.94
12.29
13.64
31
1.308
1.335
1.370
1.412
1.462
1.519
1.583
1.654
1.731
1.812
2.036
2.282
2.543
2.814
3.380
3.964
4.559
5.161
5.769
6.379
7.607
8.842
10.08
11.32
12.56
51
1.301
1.324
1.354
1.390
1.434
1.485
1.541
1.604
1.672
1.745
1.946
2.166
2.402
2.648
3.163
3.697
4.242
4.796
5.354
5.916
7.048
8.186
9.329
10.48
11.62
101
1.295
1.314
1.339
1.371
1.409
1.454
1.504
1.560
1.621
1.686
1.866
2.066
2.279
2.503
2.974
3.463
3.965
4.474
4.989
5.508
6.555
7.607
8.665
9.725
10.79
TABLE A-17. VALUES OF Hα = H0.10 FOR COMPUTING A ONE-SIDED LOWER 10%
               CONFIDENCE LIMIT ON A LOGNORMAL MEAN


0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
0.90
1.00
1.25
1.50
1.75
2.00
2.50
3.00
3.50
4.00
4.50
5.00
6.00
7.00
8.00
9.00
10.00
n
3
-1.431
-1.350
-1.289
-1.245
-1.213
-1.190
-1.176
-1.168
-1.165
-1.166
-1.184
-1.217
-1.260
-1.310
-1.426
-1.560
-1.710
-1.871
-2.041
-2.217
-2.581
-2.955
-3.336
-3.721
-4.109
5
-1.320
-1.281
-1.252
-1.233
-1.221
-1.215
-1.215
-1.219
-1.227
-1.239
-1.280
-1.334
-1.398
-1.470
-1.634
-1.817
-2.014
-2.221
-2.435
-2.654
-3.104
-3.564
-4.030
-4.500
-4.973
7
-1.296
-1.268
-1.250
-1.239
-1.234
-1.235
-1.241
-1.251
-1.264
-1.281
-1.334
-1.400
-1.477
-1.562
-1.751
-1.960
-2.183
-2.415
-2.653
-2.897
-3.396
-3.904
-4.418
-4.937
-5.459
10
-1.285
-1.266
-1.254
-1.249
-1.250
-1.256
-1.266
-1.280
-1.298
-1.320
-1.384
-1.462
-1.551
-1.647
-1.862
-2.095
-2.341
-2.596
-2.858
-3.126
-3.671
-4.226
-4.787
-5.352
-5.920
12
-1.281
-1.266
-1.257
-1.254
-1.257
-1.266
-1.278
-1.294
-1.314
-1.337
-1.407
-1.491
-1.585
-1.688
-1.913
-2.157
-2.415
-2.681
-2.955
-3.233
-3.800
-4.377
-4.960
-5.547
-6.137
15
-1.279
-1.266
-1.260
-1.261
-1.266
-1.277
-1.292
-1.311
-1.333
-1.358
-1.434
-1.523
-1.624
-1.733
-1.971
-2.229
-2.499
-2.778
-3.064
-3.356
-3.949
-4.549
-5.159
-5.771
-6.386
21
-1.277
-1.268
-1.266
-1.270
-1.279
-1.292
-1.310
-1.332
-1.358
-1.387
-1.470
-1.568
-1.677
-1.795
-2.051
-2.326
-2.615
-2.913
-3.217
-3.525
-4.153
-4.790
-5.433
-6.080
-6.730
31
-1.277
-1.272
-1.272
-1.279
-1.291
-1.307
-1.329
-1.354
-1.383
-1.414
-1.507
-1.613
-1.732
-1.859
-2.133
-2.427
-2.733
-3.050
-3.372
-3.698
-4.363
-5.037
-5.715
-6.399
-7.085
51
-1.278
-1.275
-1.280
-1.289
-1.304
-1.324
-1.349
-1.377
-1.409
-1.445
-1.547
-1.663
-1.790
-1.928
-2.223
-2.536
-2.864
-3.200
-3.542
-3.889
-4.594
-5.307
-6.026
-6.748
-7.474
101
-1.279
-1.280
-1.287
-1.301
-1.319
-1.342
-1.370
-1.403
-1.439
-1.478
-1.589
-1.716
-1.855
-2.003
-2.321
-2.657
-3.007
-3.366
-3.731
-4.100
-4.849
-5.607
-6.370
-7.136
-7.906
TABLE A-17. VALUES OF H(1-α) = H0.95 FOR COMPUTING A ONE-SIDED UPPER 95%
               CONFIDENCE LIMIT ON A LOGNORMAL MEAN

Sy
0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
0.90
1.00
1.25
1.50
1.75
2.00
2.50
3.00
3.50
4.00
4.50
5.00
6.00
7.00
8.00
9.00
10.00
n
3
2.750
3.295
4.109
5.220
6.495
7.807
9.120
10.43
11.74
13.05
16.33
19.60
22.87
26.14
32.69
39.23
45.77
52.31
58.85
65.39
78.47
91.55
104.6
117.7
130.8
5
2.035
2.198
2.402
2.651
2.947
3.287
3.662
4.062
4.478
4.905
6.001
7.120
8.250
9.387
11.67
13.97
16.27
18.58
20.88
23.19
27.81
32.43
37.06
41.68
46.31
7
1.886
1.992
2.125
2.282
2.465
2.673
2.904
3.155
3.420
3.698
4.426
5.184
5.960
6.747
8.339
9.945
11.56
13.18
14.80
16.43
19.68
22.94
26.20
29.64
32.73
10
1.802
1.881
1.977
2.089
2.220
2.368
2.532
2.710
2.902
3.103
3.639
4.207
4.795
5.396
6.621
7.864
9.118
10.38
11.64
12.91
15.45
18.00
20.55
23.10
25.66
12
1.775
1.843
1.927
2.026
2.141
2.271
2.414
2.570
2.738
2.915
3.389
3.896
4.422
4.962
6.067
7.191
8.326
9.469
10.62
11.77
14.08
16.39
18.71
21.03
23.35
15
1.749
1.809
1.882
1.968
2.068
2.181
2.306
2.443
2.589
2.744
3.163
3.612
4.081
4.564
5.557
6.570
7.596
8.630
9.669
10.71
12.81
14.90
17.01
19.11
21.22
21
1.722
1.771
1.833
1.905
1.989
2.085
2.191
2.307
2.432
2.564
2.923
3.311
3.719
4.141
5.013
5.907
6.815
7.731
8.652
9.579
11.44
13.31
15.18
17.05
18.93
31
1.701
1.742
1.793
1.856
1.928
2.010
2.102
2.202
2.310
2.423
2.737
3.077
3.437
3.812
4.588
5.388
6.201
7.024
7.854
8.688
10.36
12.05
13.74
15.43
17.13
51
1.684
1.718
1.761
1.813
1.876
1.946
2.025
2.112
2.206
2.306
2.580
2.881
3.200
3.533
4.228
4.947
5.681
6.424
7.174
7.929
9.449
10.98
12.51
14.05
15.59
101
1.670
1.697
1.733
1.777
1.830
1.891
1.960
2.035
2.117
2.205
2.447
2.713
2.997
3.295
3.920
4.569
5.233
5.908
6.590
7.277
8.661
10.05
11.45
12.85
14.26
   TABLE A-17. VALUES OF Hα = H0.05 FOR COMPUTING A ONE-SIDED LOWER 5%
               CONFIDENCE LIMIT ON A LOGNORMAL MEAN

                                          n
  Sy       3       5       7      10      12      15      21      31      51     101
  0.10  -2.130  -1.806  -1.731  -1.690  -1.677  -1.666  -1.655  -1.648  -1.644  -1.642
  0.20  -1.949  -1.729  -1.678  -1.653  -1.646  -1.640  -1.636  -1.636  -1.637  -1.641
  0.30  -1.816  -1.669  -1.639  -1.627  -1.625  -1.625  -1.627  -1.632  -1.638  -1.648
  0.40  -1.717  -1.625  -1.611  -1.611  -1.613  -1.617  -1.625  -1.635  -1.647  -1.662
  0.50  -1.644  -1.594  -1.594  -1.603  -1.609  -1.618  -1.631  -1.646  -1.663  -1.683
  0.60  -1.589  -1.573  -1.584  -1.602  -1.612  -1.625  -1.643  -1.662  -1.685  -1.711
  0.70  -1.549  -1.560  -1.582  -1.608  -1.622  -1.638  -1.661  -1.686  -1.713  -1.744
  0.80  -1.521  -1.555  -1.586  -1.620  -1.636  -1.656  -1.685  -1.714  -1.747  -1.783
  0.90  -1.502  -1.556  -1.595  -1.637  -1.656  -1.680  -1.713  -1.747  -1.785  -1.826
  1.00  -1.490  -1.562  -1.610  -1.658  -1.681  -1.707  -1.745  -1.784  -1.827  -1.874
  1.25  -1.486  -1.596  -1.662  -1.727  -1.758  -1.793  -1.842  -1.893  -1.949  -2.012
  1.50  -1.508  -1.650  -1.733  -1.814  -1.853  -1.896  -1.958  -2.020  -2.091  -2.169
  1.75  -1.547  -1.719  -1.819  -1.916  -1.962  -2.015  -2.088  -2.164  -2.247  -2.341
  2.00  -1.598  -1.799  -1.917  -2.029  -2.083  -2.144  -2.230  -2.318  -2.416  -2.526
  2.50  -1.727  -1.986  -2.138  -2.283  -2.351  -2.430  -2.540  -2.654  -2.780  -2.921
  3.00  -1.880  -2.199  -2.384  -2.560  -2.644  -2.740  -2.874  -3.014  -3.169  -3.342
  3.50  -2.051  -2.429  -2.647  -2.855  -2.953  -3.067  -3.226  -3.391  -3.574  -3.780
  4.00  -2.237  -2.672  -2.922  -3.161  -3.275  -3.406  -3.589  -3.779  -3.990  -4.228
  4.50  -2.434  -2.924  -3.206  -3.476  -3.605  -3.753  -3.960  -4.176  -4.416  -4.685
  5.00  -2.638  -3.183  -3.497  -3.798  -3.941  -4.107  -4.338  -4.579  -4.847  -5.148
  6.00  -3.062  -3.715  -4.092  -4.455  -4.627  -4.827  -5.106  -5.397  -5.721  -6.086
  7.00  -3.499  -4.260  -4.699  -5.123  -5.325  -5.559  -5.886  -6.227  -6.608  -7.036
  8.00  -3.945  -4.812  -5.315  -5.800  -6.031  -6.300  -6.674  -7.066  -7.502  -7.992
  9.00  -4.397  -5.371  -5.936  -6.482  -6.742  -7.045  -7.468  -7.909  -8.401  -8.953
 10.00  -4.852  -5.933  -6.560  -7.168  -7.458  -7.794  -8.264  -8.755  -9.302  -9.918
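The H-values in both parts of Table A-17 are used in Land's method for a one-sided confidence limit on a lognormal mean: CL = exp(ȳ + 0.5·s_y² + s_y·H/√(n−1)), where ȳ and s_y are the mean and standard deviation of the log-transformed data. A minimal sketch of that computation (the function name is illustrative; H must still be read from the table for the given n, s_y, and confidence level):

```python
import math

def lognormal_conf_limit(logs, h):
    """One-sided confidence limit on a lognormal mean via Land's H-statistic.

    logs : the log-transformed observations
    h    : H-value from Table A-17 (H0.95 for an upper limit, H0.05 for a lower limit)
    """
    n = len(logs)
    ybar = sum(logs) / n                                  # mean of the logs
    s2 = sum((y - ybar) ** 2 for y in logs) / (n - 1)     # sample variance of the logs
    return math.exp(ybar + 0.5 * s2 + math.sqrt(s2) * h / math.sqrt(n - 1))
```

With a positive H0.95 the limit lies above exp(ȳ + 0.5·s_y²); with a negative H0.05 it lies below, which is why the second part of the table is entirely negative.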

-------
                 TABLE A-18.  CRITICAL VALUES FOR THE SIGN TEST
The table values are such that for order statistics x[lower] and x[upper],
P(x[lower] < true median < x[upper]) ≥ 1 - α.  Note that the significance levels are for
two-sided tests.  For a one-sided test, divide the significance level in half.

                                        α
  n      0.20         0.10         0.05         0.02         0.01
       lower upper  lower upper  lower upper  lower upper  lower upper
  5      1    5      1    5       -    -       -    -       -    -
  6      1    6      1    6       1    6       -    -       -    -
  7      2    6      1    7       1    7       1    7       -    -
  8      2    7      2    7       1    8       1    8       1    8
  9      3    7      2    8       2    8       1    9       1    9
 10      3    8      2    9       2    9       1   10       1   10
 11      3    9      3    9       2   10       2   10       1   11
 12      4    9      3   10       3   10       2   11       2   11
 13      4   10      4   10       3   11       2   12       2   12
 14      5   10      4   11       3   12       3   12       2   13
 15      5   11      4   12       4   12       3   13       3   13
 16      5   12      5   12       4   13       3   14       3   14
 17      6   12      5   13       5   13       4   14       3   15
 18      6   13      6   13       5   14       4   15       4   15
 19      7   13      6   14       5   15       5   15       4   16
 20      7   14      6   15       6   15       5   16       4   17
 21      8   14      7   15       6   16       5   17       5   17
 22      8   15      7   16       6   17       6   17       5   18
 23      8   16      8   16       7   17       6   18       5   19
 24      9   16      8   17       7   18       6   19       6   19
 25      9   17      8   18       8   18       7   19       6   20
 26     10   17      9   18       8   19       7   20       7   20
 27     10   18      9   19       8   20       8   20       7   21
 28     11   18     10   19       9   20       8   21       7   22
 29     11   19     10   20       9   21       8   22       8   22
 30     11   20     11   20      10   21       9   22       8   23
 31     12   20     11   21      10   22       9   23       8   24
 32     12   21     11   22      10   23       9   24       9   24
 33     13   21     12   22      11   23      10   24       9   25
 34     13   22     12   23      11   24      10   25      10   25
 35     14   22     13   23      12   24      11   25      10   26
 36     14   23     13   24      12   25      11   26      10   27
 37     15   23     14   24      13   25      11   27      11   27
 38     15   24     14   25      13   26      12   27      11   28
 39     16   24     14   26      13   27      12   28      12   28
 40     16   25     15   26      14   27      13   28      12   29
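The entries of Table A-18 can be generated directly from the Binomial(n, 1/2) distribution: the interval (x[r], x[n+1-r]) covers the median with probability 1 - 2·P(X ≤ r - 1), and the table reports the largest r for which that two-sided tail probability stays within α. A sketch of that computation (the function name is illustrative):

```python
import math

def sign_test_ranks(n, alpha):
    """Order-statistic ranks (lower, upper) with
    P(x[lower] < true median < x[upper]) >= 1 - alpha, as in Table A-18."""
    lower = 0
    for r in range(1, n // 2 + 1):
        # two-sided tail probability of Binomial(n, 1/2) if the lower rank is r
        tail = 2 * sum(math.comb(n, i) for i in range(r)) / 2 ** n
        if tail <= alpha:
            lower = r
        else:
            break
    if lower == 0:
        return None   # no interval attains this confidence (a '-' entry in the table)
    return lower, n + 1 - lower
```

For example, sign_test_ranks(10, 0.05) gives (2, 9), matching the n = 10 row of the 0.05 column.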

-------
                    TABLE A-19. CRITICAL VALUES FOR THE QUANTILE TEST
m is the number of background samples, n is the number of site samples, and c is the number of
values larger than the given quantile.  If the count is greater than or equal to the tabulated
critical value, then reject the null hypothesis of no difference at the given significance level.

[Critical values for m = 4 to 12, n = 4 to 12 at significance levels 0.10, 0.05, and 0.01 are not
legible in this copy.]
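The quantile test statistic itself is simple to form: pool the site and background measurements, take the r largest, and count how many came from the site; that count is then compared with the tabulated critical value. A minimal sketch of the counting step (names are illustrative):

```python
def site_count_in_top(background, site, r):
    """Among the r largest of the pooled background and site measurements,
    count how many came from the site (the quantile test statistic)."""
    pooled = sorted([(v, False) for v in background] + [(v, True) for v in site])
    return sum(1 for _, from_site in pooled[-r:] if from_site)
```

A large count means the upper tail of the pooled data is dominated by site measurements, which is the evidence of site contamination the test looks for.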

-------
               TABLE A-19.  CRITICAL VALUES FOR THE QUANTILE TEST (CONT.)

[Critical values for m = 13 to 20, n = 4 to 12 are not legible in this copy.]

-------
               TABLE A-19.  CRITICAL VALUES FOR THE QUANTILE TEST (CONT.)

[Critical values for m = 4 to 12, n = 13 to 20 are not legible in this copy.]

-------
               TABLE A-19.  CRITICAL VALUES FOR THE QUANTILE TEST (CONT.)

[Critical values for m = 13 to 20, n = 13 to 20 are not legible in this copy.]
-------
                                APPENDIX B:




                                REFERENCES

-------
                             APPENDIX B: REFERENCES

       This appendix provides references for the topics and procedures described in this
document. The references are divided into three groups: Primary, Basic Statistics Textbooks, and
Secondary. This classification reflects not the subject matter content but the relevance to the
intended audience of this document, the ease of understanding the statistical concepts and
methodologies, and the accessibility to the non-statistical community. Primary references are
those thought to be of particular benefit as hands-on material, as they seldom demand extensive
training in statistics; most of these references should be on an environmental statistician's
bookshelf. Users of this document are encouraged to send recommendations on additional
references to the address listed in the Foreword.

       Some sections within the chapters reference materials found in most introductory
statistics books. This document uses Walpole and Myers (1985); Freedman, Pisani, Purves, and
Adhikari (1991); Mendenhall (1987); and Dixon and Massey (1983).  Table B-1 (at the end of
this appendix) lists specific chapters in these books where topics contained in this guidance may
be found. This list could be extended much further by use of other basic textbooks; further
information is available from most introductory textbooks.

       Some important books specific to the analysis of environmental data include: Gilbert
(1987), an excellent all-round handbook with strengths in sampling, estimation, and hot-spot
detection; Gibbons (1994), a book specifically concentrating on the application of statistics to
groundwater problems, with emphasis on method detection limits, censored data, and the
detection of outliers; and Madansky (1988), a slightly more theoretical volume with important
chapters on testing for normality, transformations, and testing for independence. In addition,
Ott (1995) describes modeling, probabilistic processes, and the lognormal distribution of
contaminants, and Berthouex and Brown (1994) provide an engineering approach to problems
including estimation, experimental design, and the fitting of models. Millard and Neerchal
(2001) have excellent discussions of applied statistical methods using the software package S-
Plus.  Ginevan and Splitstone (2004) contains applied examples of many of the statistical tests
discussed in this guidance and includes some more sophisticated tests for the more statistically
minded.  Gibbons and Coleman (2001) apply sophisticated statistical theory to environmental
issues, but the text is not intended for the casual reader.

B.1    CHAPTER 1

       Chapter 1 establishes the framework of qualitative and quantitative criteria against which
the collected data will be assessed.  The most important feature of this chapter is the hypothesis
testing framework, which is described in any introductory textbook. A non-technical exposition
of hypothesis testing is also to be found in U.S. EPA (2000, 1994b), which provides guidance on
planning for environmental data collection.  An application of the DQO Process to geostatistical
error management may be found in Myers (1997).

-------
       A full discussion of sampling methods and the attendant theory is to be found in
Gilbert (1987), and a shorter discussion may be found in U.S. EPA (1989).  Cochran (1966) and
Kish (1965) also provide more advanced theoretical concepts but may require the assistance of a
statistician for full comprehension. More sophisticated sampling designs such as composite
sampling,  adaptive sampling, and ranked set sampling, will be discussed in future Agency
guidance.

B.2    CHAPTER 2

       Standard statistical quantities and graphical representations are discussed in most
introductory statistics books. In addition, Berthouex & Brown (1994) and Madansky (1988)
both contain thorough discussions on the subject.  There are also several textbooks devoted
exclusively to graphical representations, including Cleveland (1993), which may contain the
most applicable methods for environmental data, Tufte (1983), and Chambers, Cleveland,
Kleiner and Tukey (1983).

       Two EPA sources for temporal data that keep theoretical discussions to a minimum are
U.S. EPA  (1992a) and U.S. EPA (1992b).  For a more complete discussion on temporal data,
specifically time series analysis, see Box and Jenkins (1970), Wei (1990), or Ostrum (1978).
These more complete references provide both theory and practice; however, the assistance of a
statistician may be needed to adapt the methodologies for immediate use.  Theoretical
discussions of spatial data may be found in Journel and Huijbregts (1978), Cressie (1993),  and
Ripley (1981).

B.3    CHAPTER 3

       The hypothesis tests covered in this edition of the guidance are well known and
straightforward; basic statistics texts cover these subjects.  Besides basic statistics textbooks,
Berthouex & Brown (1994), Hardin and Gilbert (1993), and U.S. EPA (1989, 1994) may be
useful to the reader.  In addition, some statistics books are devoted specifically to hypothesis
testing; for example, see Lehmann (1991). These books may be too theoretical for most
practitioners, and their application to environmental situations may not be obvious.

       The statement in this document that the sign test requires approximately 1.225 times as
many observations as the Wilcoxon rank sum test to achieve a given power at a given
significance level is attributable to Lehmann (1975).

B.4    CHAPTER 4

       This chapter is essentially  a compendium of statistical tests drawn mostly from the
primary references and basic statistics textbooks. Gilbert (1987) and Madansky (1988) have an
excellent collection of techniques and U.S. EPA (1992a) contains techniques  specific to water
problems.

-------
       For Normality (Section 4.2), Madansky (1988) has an excellent discussion on tests as
does Shapiro (1986). For trend testing (Section 4.3), Gilbert (1987) has an excellent discussion
on statistical tests and U.S. EPA (1992) provides adjustments for trends and seasonality in the
calculation of descriptive statistics.

       There are several very good textbooks devoted to the treatment of outliers (Section 4.4).
Two authoritative texts are Barnett and Lewis (1978) and Hawkins (1980). Additional
information is also to be  found in Beckman and Cook (1983) and Tietjen and Moore (1972).

       Tests for dispersion (Section 4.5) are described in the basic textbooks and examples are
to be found in U.S. EPA  (1992a).  Transformation of data (Section 4.6) is a sensitive topic and
thorough discussions may be found in Gilbert (1987), and Dixon and Massey (1983). Equally
sensitive is the analysis of data where some values are recorded as non-detected (Section 4.7);
Gibbons (1994) and U.S. EPA (1992a) have relevant discussions and examples.

B.5    CHAPTER 5

       Chapter 5 discusses some of the philosophical issues related to hypothesis testing which
may help in understanding and communicating the test results.  Although there are no specific
references for this chapter, many topics (e.g., the use of p-values) are discussed in introductory
textbooks. Future editions of this guidance will be expanded by incorporating practical
experiences from the environmental community into this chapter.

B.6    LIST OF REFERENCES

B.6.1  Primary References

Berthouex, P.M., and L.C.  Brown, 1994. Statistics for Environmental Engineers.  Lewis, Boca
       Raton, FL.

Gibbons, R.D., and D.E. Coleman, 2001. Statistical Methods for Detection and Quantification
       of Environmental Contamination.  John Wiley, New York, NY.

Gilbert, R.O., 1987.  Statistical Methods for Environmental Pollution Monitoring. John Wiley,
       New York, NY.

Ginevan, M.E., and D.E. Splitstone, 2004. Statistical Tools for Environmental Quality
       Measurement.  Chapman and Hall, Boca Raton, FL.

Myers, J.C., 1997. Geostatistical Error Management, John Wiley, New York,  NY.

Ott, W.R., 1995. Environmental Statistics and Data Analysis. Lewis, Boca Raton, FL.

-------
U.S. Environmental Protection Agency, 1992a. Addendum to Interim Final Guidance
       Document on the Statistical Analysis of Ground-Water Monitoring Data at RCRA
       Facilities. EPA/530/R-93/003. Office of Solid Waste. (NTIS: PB89-151026)

U.S. Environmental Protection Agency, 2000. Guidance for the Data Quality Objectives
       Process (EPA QA/G-4). EPA/600/R-96/055. Office of Research and Development.

U.S. Environmental Protection Agency, 2002. Choosing a Sampling Design for Environmental
       Data Collection (EPA QA/G-5S).  Office of Environmental Information.

U.S. Environmental Protection Agency, 2004. Data Quality Assessment: A Reviewer's Guide
       (Final Draft) (EPA QA/G-9R).  Office of Environmental Information.

B.6.2   Basic Statistics Textbooks

Dixon, W.J., and F.J. Massey, Jr., 1983.  Introduction to Statistical Analysis (Fourth Edition).
       McGraw-Hill, New York, NY.

Freedman, D., R. Pisani, R. Purves, and A. Adhikari, 1991. Statistics. W.W. Norton & Co.,
       New York, NY.

Mendenhall, W., 1987.  Introduction to Probability and Statistics (Seventh Edition).  PWS-Kent,
       Boston, MA.

Walpole, R., and R. Myers, 1985. Probability and Statistics for Engineers and Scientists (Third
       Ed.). MacMillan, New York, NY.

B.6.3   Secondary References

Aitchison, J., 1955. On the distribution of a positive random variable having a discrete
       probability mass at the origin. Journal of the American Statistical Association 50:901-908.

Barnett, V., and T. Lewis, 1978.  Outliers in Statistical Data.  John Wiley, New York, NY.

Beckman, R.J., and R.D. Cook, 1983.  Outliers, Technometrics 25:119-149.

Box, G.E.P., and G.M. Jenkins, 1970.  Time Series Analysis, Forecasting, and Control.  Holden-
       Day, San Francisco, CA.

Chambers, J.M, W.S. Cleveland,  B. Kleiner, and P.A. Tukey, 1983.  Graphical Methods for
       Data Analysis.  Wadsworth & Brooks/Cole Publishing Co., Pacific Grove, CA.

Chen, L., 1995. Testing the mean of skewed distributions, Journal of the American Statistical
       Association 90:767-772.


-------
Cleveland, W.S., 1993.  Visualizing Data. Hobart Press, Summit, NJ.

Cochran, W.G., 1966. Sampling Techniques (Third Edition).  John Wiley, New York, NY.

Cohen, A.C., Jr.  1959.  Simplified estimators for the normal distribution when samples are
       singly censored or truncated, Technometrics 1:217-237.

Conover, W.J., 1980. Practical Nonparametric Statistics (Second Edition). John Wiley, New
       York, NY.

Cressie, N., 1993. Statistics for Spatial Data.  John Wiley, New York, NY.

D'Agostino, R.B., 1971.  An omnibus test of normality for moderate and large size samples,
       Biometrika 58:341-348.

David, H.A., H.O. Hartley, and E.S.  Pearson, 1954. The distribution of the ratio, in a single
       normal sample, of range to standard deviation, Biometrika 48:41-55.

Dixon, W.J., 1953. Processing data for outliers, Biometrics 9:74-89.

Filliben, J.J., 1975. The probability plot correlation coefficient test for normality, Technometrics
       17:111-117.

Geary, R.C., 1947. Testing for normality, Biometrika 34:209-242.

Geary, R.C., 1935. The ratio of the mean deviation to the standard deviation as a test of
       normality, Biometrika 27:310-332.

Gibbons, R. D., 1994. Statistical Methods for Groundwater Monitoring. John Wiley, New
       York, NY.

Grubbs, F.E., 1969.  Procedures for detecting outlying observations in samples, Technometrics
       11:1-21.

Hardin, J.W., and R.O. Gilbert, 1993. Comparing Statistical Tests for Detecting Soil
       Contamination Greater than Background, Report to U.S. Department of Energy,  PNL-
       8989, UC-630, Pacific Northwest Laboratory, Richland, WA.

Hawkins, D.M., 1980. Identification of Outliers.  Chapman and Hall, New York, NY.

Hochberg, Y., and A. Tamhane, 1987. Multiple Comparison Procedures.  John Wiley, New
       York, NY.

Journel, A.G., and C.J. Huijbregts, 1978. Mining Geostatistics. Academic Press, London.

Kish, L., 1965. Survey Sampling. John Wiley, New York, NY.


-------
Kleiner, B., and J.A. Hartigan, 1981.  Representing points in many dimensions by trees and
       castles, Journal of the American Statistical Association 76:260.

Lehmann, E.L., 1991.  Testing Statistical Hypotheses. Wadsworth & Brooks/Cole Publishing
       Co., Pacific Grove,  CA.

Lehmann, E.L., 1975.  Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, Inc.,
       San Francisco,  CA.

Lilliefors, H.W., 1969. Correction to the paper "On the Kolmogorov-Smirnov test for normality
       with mean and  variance unknown," Journal of the American Statistical Association
       64:1702.

Lilliefors, H.W., 1967. On the Kolmogorov-Smirnov test for normality with mean and variance
       unknown, Journal of the American Statistical Association 62:399-402.

Madansky, A., 1988. Prescriptions for Working Statisticians.  Springer-Verlag, New York, NY.

Millard, S.P. and N.K. Neerchal, 2001.  Environmental Statistics with S-Plus. CRC Press, Boca
       Raton, FL.

Ostrum, C.W., 1978.  Time Series Analysis (Second Edition). Sage University Papers  Series,
       Vol 9. Beverly Hills and London.

Ripley, B.D., 1981. Spatial Statistics. John Wiley and Sons, Somerset, NJ.

Rosner, B., 1975. On the detection of many outliers, Technometrics 17:221-227.

Royston, J.P., 1982. An  extension of Shapiro and Wilk's W test for normality to large samples,
       Applied Statistics 31:161-165.

Sen, P.K., 1968a. Estimates of the regression coefficient based on Kendall's tau, Journal of the
       American Statistical Association 63:1379-1389.

Sen, P.K., 1968b. On a class of aligned rank order tests in two-way layouts, Annals of
       Mathematical Statistics 39:1115-1124.

Shapiro, S., 1986. Volume 3: How to Test Normality and Other Distributional Assumptions.
       American Society for Quality Control, Milwaukee, WI.

Shapiro, S., and M.B.  Wilk, 1965. An analysis of variance test for normality (complete
       samples), Biometrika 52:591-611.

Stefansky, W., 1972.  Rejecting outliers in factorial designs, Technometrics 14:469-478.


-------
Stephens, M.A., 1974. EDF statistics for goodness-of-fit and some comparisons, Journal of the
       American Statistical Association 69:730-737.

Tietjen, G.L., and R.M. Moore, 1972.  Some Grubbs-type statistics for the detection of several
      outliers, Technometrics 14:583-597.

Tufte, E.R., 1983. The Visual Display of Quantitative Information.  Graphics Press, Cheshire,
       CT.

Tufte, E.R., 1990. Envisioning Information. Graphics Press, Cheshire, CT.

U.S. Environmental Protection Agency, 1989. Methods for Evaluating the Attainment of
       Cleanup Standards: Volume 1: Soils and Solid Media.  EPA/230/02-89-042. Office of
       Policy, Planning, and Evaluation. (NTIS: PB89-234959)

U.S. Environmental Protection Agency, 1992b. Methods for Evaluating the Attainment of
       Cleanup Standards: Volume 2: Ground Water. EPA/230/R-92/014.  Office of Policy,
       Planning, and Evaluation.  (NTIS: PB94-138815)

U.S. Environmental Protection Agency, 1994. Methods for Evaluating the Attainment of
       Cleanup Standards: Volume 3: Reference-Based Standards.  EPA/230/R-94-004.  Office
       of Policy, Planning, and Evaluation.  (NTIS: PB94-176831)

Walsh, J.E., 1958. Large sample nonparametric rejection of outlying observations, Annals of the
       Institute of Statistical Mathematics 10:223-232.

Walsh, J.E., 1953. Correction to "Some nonparametric tests of whether the largest observations
       of a set are too large or too small," Annals of Mathematical Statistics 24:134-135.

Walsh, J.E., 1950. Some nonparametric tests of whether the largest observations of a set are too
       large or too small, Annals of Mathematical Statistics 21:583-592.

Wang, Peter C.C., 1978.  Graphical Representation of Multivariate Data. Academic Press, New
      York, NY.

Wegman, Edward J., 1990. Hyperdimensional data analysis using parallel coordinates, Journal
      of the American Statistical Association 85: 664.

Wei, W.S., 1990. Time Series Analysis (Second Edition). Addison Wesley, Menlo Park, CA.

-------