GUIDANCE FOR
DATA QUALITY ASSESSMENT
      Pre-Publication Copy
           EPA QA/G-9

           (QA96 Version)
United States Environmental Protection Agency
       Quality Assurance Division

        Washington, DC 20460
             FINAL

          FEBRUARY 1996

                                           FOREWORD

        This is the 1996 (QA96) version of Guidance for Data Quality Assessment, EPA QA/G-9. The
Environmental Protection Agency (EPA) has developed the Data Quality Assessment (DQA) Process as an
important tool for project managers and planners to determine whether the type, quantity, and quality of data
needed to support Agency decisions have been achieved. This guidance is the culmination of experiences in
the design and statistical analyses of environmental data in different Program Offices at the EPA. Many
elements of prior guidance, statistics, and scientific planning have been incorporated into this document.

        This document provides general guidance to organizations on assessing data quality criteria and
performance specifications for decision making. This guidance assumes that an appropriate Quality System
has been established and that planning for data collection has been achieved using a scientifically-based
information collection strategy. An overview of the Agency's recommended data collection procedure, the
DQO Process, is included in this guidance in Chapter 1 and EPA QA/G-4.

        Guidance for Data Quality Assessment is distinctly different from other guidance documents; it is
not intended to be read in a linear or continuous fashion. The intent of the document is for it to be used as a
"tool-box" of useful techniques in assessing the quality of data. The overall structure of the document will
enable the analyst to investigate many different problems using a systematic methodology. The methodology
consists of five steps that should be iterated as necessary:

        (i)      Review the Data Quality Objectives
        (ii)     Conduct a Preliminary Data Review
        (iii)    Select the Statistical Test
        (iv)    Verify the Assumptions of the Test
        (v)     Draw Conclusions From the Data

        This approach closely parallels the activities of a statistician analyzing a data set for the first time.
The five-step procedure is not intended to be a definitive analysis of a project or problem, but to provide an
initial assessment of the "reasonableness" of the data that have been generated. Sophisticated statistical
analysis is often not necessary unless special or unusual circumstances have been encountered in the
generation or collection of the data or the analysis is planned in detail before the data are collected.  This
guidance is directed towards the analysis of relatively small data sets containing data that have been collected
in a relatively simple fashion.  The analysis of survey data containing large data sets or a complex sampling
scheme is best left for statistical experts.

        This document is a product of the collaborative effort of many quality management professionals
throughout the EPA and the environmental community.  It has been peer reviewed by the EPA Program
Offices, Regional Offices, and Laboratories.  Many valuable comments and suggestions have been
incorporated to make it more useful, and additional suggestions to improve its effectiveness are sought. The
Quality Assurance Division has the Agency lead for the development of statistical quality assurance
techniques and future editions of this guidance will contain some of these recent developments.

        This document is one of a series of quality management guidance documents that the EPA Quality
Assurance Division (QAD) has prepared to assist users in implementing the Agency-wide Quality System.
Other related documents currently available or planned include:
       EPA QA/G-4   Guidance for The Data Quality Objectives Process

       EPA QA/G-4D  DEFT Software for the Data Quality Objectives Process

       EPA QA/G-4R  Guidance for the Data Quality Objectives Process for Researchers (planned)

       EPA QA/G-4S  Guidance for the Data Quality Objectives Process (Superfund)

       EPA QA/G-5   Guidance for Quality Assurance Project Plans (draft)

       EPA QA/G-5S  Guidance on Sampling Plans (planned)

       EPA QA/G-6   Guidance for the Preparation of Standard Operating Procedures (SOPs) for
                     Quality-Related Documents

       EPA QA/G-9D  Data Quality Evaluation Statistical Tools (DataQUEST)

       The External Comment Draft EPA QA/G-5 should be available April 1996, the Final Version of
EPA QA/G-4S should be available July 1996, and the External Comment Draft EPA QA/G-4R and
QA/G-5S should be available October 1996.

       This document is intended to be a "living document" that will be updated annually to incorporate new
topics and revisions or refinements to existing procedures. Comments received on this 1996 version will be
considered for inclusion in subsequent versions.  In addition, user-friendly PC-based software (EPA QA/G-
9D) to supplement this guidance is being developed and should be available from QAD in August 1996.

        Please send your written comments on Guidance for Data Quality Assessment to:

              Quality Assurance Division (8724)
              Office of Research and Development
              U.S. Environmental Protection Agency
              401 M Street, SW
              Washington, DC 20460
              (202) 260-5763
              FAX (202) 260-4346
              E-mail: qad@epamail.epa.gov
                               TABLE OF CONTENTS

                                                                            Page
INTRODUCTION	 0-1
       0.1    PURPOSE AND OVERVIEW	  0-1
       0.2    DQA AND THE DATA LIFE CYCLE	   0-2
       0.3    THE 5 STEPS OF THE DQA PROCESS	 0-2
       0.4    INTENDED AUDIENCE	   0-3
       0.5    ORGANIZATION	   0-4
       0.6    SUPPLEMENTAL SOURCES	     0-4
       0.7    SCOPE AND LIMITATIONS	 0-4

STEP 1: REVIEW DQOs AND THE SAMPLING DESIGN 	1.1-1
       1.1    OVERVIEW AND ACTIVITIES	1.1-1
             1.1.1  Review Study Objectives		1.1-1
             1.1.2  Translate Objectives into Statistical Hypotheses	1.1-2
             1.1.3  Develop Limits on Decision Errors 	1.1-2
             1.1.4  Review Sampling Design	1.1-3
       1.2    DEVELOPING THE STATEMENT OF HYPOTHESES	1.2-1
       1.3    DESIGNS FOR SAMPLING ENVIRONMENTAL MEDIA 	1.3-1
             1.3.1  Authoritative Sampling	1.3-1
             1.3.2  Probability Sampling	1.3-1
                   1.3.2.1 Simple Random Sampling	-.	1.3-1
                   1.3.2.2 Sequential Random Sampling	1.3-2
                   1.3.2.3 Systematic Samples	1.3-2
                   1.3.2.4 Stratified Samples	1.3-2
                   1.3.2.5 Compositing Physical Samples	1.3-3
                   1.3.2.6 Other Sampling Designs	1.3-3

STEP 2: CONDUCT A PRELIMINARY DATA REVIEW	2.1-1
      2.1    OVERVIEW AND ACTIVITIES	2.1-1
             2.1.1  Review Quality Assurance Reports	2.1-1
             2.1.2  Calculate Basic Statistical Quantities	2.1-2
              2.1.3  Graph the Data	2.1-2
      2.2    STATISTICAL QUANTITIES	2.2-1
             2.2.1  Measures of Relative Standing  	2.2-1
             2.2.2  Measures of Central Tendency	2.2-2
             2.2.3  Measures of Dispersion	..2.2-2
              2.2.4  Measures of Association	2.2-5
       2.3    GRAPHICAL REPRESENTATIONS	2.3-1
             2.3.1  Histogram/Frequency Plots	2.3-1
             2.3.2  Stem-and-Leaf Plot	2.3-3
             2.3.3  Box and Whisker Plot	2.3-3
             2.3.4  Ranked Data Plot	2.3-6
              2.3.5  Quantile Plot	2.3-8
              2.3.6  Normal Probability Plot (Quantile-Quantile Plot)	2.3-10
             2.3.7   Plots for Two or More Variables	2.3-13
                    2.3.7.1  Plots for Individual Data Points  	2.3-13
                    2.3.7.2  Scatter Plot 	.2.3-14
                    2.3.7.3  Extensions of the Scatter Plot	2.3-15
                    2.3.7.4  Empirical Quantile-Quantile Plot	2.3-16
              2.3.8   Plots for Temporal Data	2.3-18
                     2.3.8.1  Time Plot	2.3-19
                    2.3.8.2  Plot of the Autocorrelation Function (Correlogram)  	2.3-20
             2.3.9   Plots for Spatial Data	2.3-23
                    2.3.9.1  Posting Plots	2.3-23
                     2.3.9.2  Symbol Plots	2.3-23
                    2.3.9.3  Other Spatial Graphical Representations		2.3-25

STEP 3: SELECT THE STATISTICAL TEST 	.3.1-
       3.1    OVERVIEW AND ACTIVITIES	3.1 -
             3.1.1   Select Statistical Hypothesis Test..	3.1-
             3.1.2   Identify Assumptions Underlying the Statistical Test	3.1 -
       3.2    TESTS OF HYPOTHESES ABOUT A SINGLE POPULATION 	3.2-
             3.2.1   Tests for a Mean	3.2 -
                    3.2.1.1  The One-Sample t-Test.	3.2-2
                    3.2.1.2  The Wilcoxon Signed Rank (One-Sample) Test for the Mean  	3.2-7
              3.2.2   Tests for a Proportion or Percentile	3.2-11
                    3.2.2.1  The One-Sample Proportion Test	3.2 - 11
              3.2.3   Tests for a Median  	3.2-13
       3.3    TESTS FOR COMPARING TWO POPULATIONS	3.3-1
             3.3.1   Comparing Two Means 	3.3-1
                    3.3.1.1  Student's Two-Sample t-Test (Equal Variances)	3.3-2
                    3.3.1.2 Satterthwaite's Two-Sample t-Test (Unequal Variances)  	3.3-2
             3.3.2   Comparing Two Proportions or Percentiles	3.3-7
                    3.3.2.1  Two-Sample Test for Proportions	3.3-7
              3.3.3   Nonparametric Comparisons of Two Populations	3.3-10
                     3.3.3.1  The Wilcoxon Rank Sum Test	3.3-10
                    3.3.3.2 The Quantile Test	3.3-14
             3.3.4   Comparing Two Medians 	3.3-14

STEP 4: VERIFY THE ASSUMPTIONS OF THE STATISTICAL TEST	4.1-1
       4.1    OVERVIEW AND ACTIVITIES	4.1-1
             4.1.1   Determine Approach for Verifying Assumptions  	4.1-1
             4.1.2  Perform Tests of Assumptions	4.1-2
              4.1.3  Determine Corrective Actions	4.1-2
       4.2    TESTS FOR DISTRIBUTIONAL ASSUMPTIONS   	4.2-1
              4.2.1   Graphical Methods	4.2-3
               4.2.2  Shapiro-Wilk Test for Normality (the W test)  	4.2-3
              4.2.3  Extensions of the Shapiro-Wilk Test (Filliben's Statistic) 	4.2-3
              4.2.4  Coefficient of Variation	4.2-4
              4.2.5  Coefficient of Skewness/Coefficient of Kurtosis Tests	4.2-4
             4.2.6   Range Tests	4.2-5
             4.2.7   Goodness-of-Fit Tests	4.2-7
             4.2.8   Recommendations	4.2 - 7
       4.3    TESTS FOR TRENDS	4.3-1
             4.3.1   Introduction	.			 4.3 -1
             4.3.2   Regression-Based Methods for Estimating and Testing for Trends  ..:	4.3-1
                     4.3.2.1 Estimating a Trend Using the Slope of the Regression Line	4.3-1
                    4.3.2.2 Testing for Trends Using Regression Methods  	4.3-2
             4.3.3   General Trend Estimation Methods		4.3-3
                    4.3.3.1 Sen's Slope Estimator	4.3-3
                    4.3.3.2 Seasonal Kendall Slope Estimator	4.3-3
             4.3.4   Hypothesis Tests for Detecting Trends		.'.	4.3-3
                    4.3.4.1 One Observation per Time Period for One Sampling Location	4.3-3
                     4.3.4.2 Multiple Observations per Time Period for One Sampling
                           Location	4.3-7
                    4.3.4.3 Multiple Sampling Locations with Multiple Observations  	4.3-7
                    4.3.4.4 One Observation for One Station with Multiple Seasons 	4.3-9
             4.3.5   A Discussion on Tests for Trends 	4.3-10
       4.4    OUTLIERS	4.4-1
             4.4.1   Background	4.4-1
             4.4.2   Selection of a Statistical Test	4.4-2
             4.4.3   Extreme Value Test (Dixon's Test)	4.4-2
             4.4.4   Discordance Test	;	.	4.4-4
              4.4.5   Rosner's Test	4.4-5
             4.4.6   Walsh's Tests	4.4-7
             4.4.7   Multivariate Outliers	4.4-7
       4.5    TESTS FOR DISPERSIONS	4.5-1
             4.5.1   Confidence Intervals for a Single Variance	4.5 -1
             4.5.2   The F-Test for the Equality of Two Variances		;..'— 4.5-1
              4.5.3   Bartlett's Test for the Equality of Two or More Variances	4.5-1
             4.5.4   Levene's Test for the Equality of Two or More Variances	4.5-4
       4.6    TRANSFORMATIONS	4.6-1
             4.6.1   Types of Data Transformations	4.6-1
             4.6.2   Reasons for Data Transformations	4.6 - 2
       4.7    VALUES BELOW DETECTION LIMITS	4.7-1
              4.7.1   Less than 15% Nondetects - Substitution Methods	4.7-2
              4.7.2   Between 15-50% Nondetects	4.7-2
                    4.7.2.1 Cohen's Method	-.	4.7-2
                    4.7.2.2 Trimmed Mean	4.7-4
                    4.7.2.3 Winsorized Mean and Standard Deviation			4.7-5
              4.7.3   Greater than 50% Nondetects - Test of Proportions	4.7-6

STEP 5: DRAW CONCLUSIONS FROM THE DATA	5.1-1
       5.1    OVERVIEW AND ACTIVITIES	5.1-1
             5.1.1   Perform the Statistical Hypothesis Test	5.1-1
             5.1.2   Draw Study Conclusions  	;	:	5.1-1
             5.1.3   Evaluate Performance of the Sampling Design	5.1-2

       5.2    INTERPRETING AND COMMUNICATING THE TEST RESULTS	5.2-1
              5.2.1   Interpretation of p-Values	5.2-1
             5.2.2   "Accepting" vs. "Failing to Reject" the Null Hypothesis  	5.2-1
             5.2.3   Statistical Significance vs. Practical Significance	5.2-2
             5.2.4   Impact of Bias on Test Results	;. 5.2 - 2
             5.2.5   Quantity vs. Quality of Data	-.	5.2-5
             5.2.6   "Proof of Safety" vs. "Proof of Hazard"	5.2-6
                                  LIST OF APPENDICES
                                                                                  Page
A.     STATISTICAL TABLES	A -1
B.     REFERENCES
                                    LIST OF FIGURES
Figure No.
0.2-1. DQA in the Context of the Data Life Cycle	  0-2
2.3-1. Example of a Frequency Plot	2.3-1
2.3-2. Example of a Histogram	2.3-1
2.3-3. Example of a Box and Whisker Plot	2.3-3
2.3-4. Example of a Ranked Data Plot	2.3-6
2.3-5. Example of a Quantile Plot of Skewed Data	2.3-8
2.3-6. Normal Probability Paper	2.3-12
2.3-7. Example of Graphical Representations of Multiple Variables	2.3-13
2.3-8. Example of a Scatter Plot	2.3-14
2.3-9. Example of a Coded Scatter Plot	2.3-15
2.3-10. Example of a Parallel Coordinates Plot	2.3-15
2.3-11. Example of a Matrix Scatter Plot	2.3-16
2.3-12. Example of a Time Plot Showing a Slight Downward Trend	2.3-19
2.3-13. Example of a Correlogram	2.3-20
2.3-14. Example of a Posting Plot	2.3-23
2.3-15. Example of a Symbol Plot	2.3-24
4.2-1. Graph of a Normal and Lognormal Distribution	4.2-1
5.2-1. Illustration of Unbiased versus Biased Power Curves	5.2-5
                                      LIST OF TABLES
 Table No.                                                                         Page
 1.2-1. Commonly Used Statements of Statistical Hypotheses	1.2-3
 2.3-1. Table for Calculating a Correlogram	2.3-22
 4.2-1. Data for Examples in Section 4.2	4.2-1
 4.2-2. Tests for Normality	4.2-2
 4.4-1. Recommendations for Selecting a Statistical Test for Outliers	4.4-2
 4.7-1. Guidelines for Analyzing Data with Nondetects	4.7-1
                                     INTRODUCTION
0.1    PURPOSE AND OVERVIEW

       Data Quality Assessment (DQA) is the scientific and statistical evaluation of data to determine if
data obtained from environmental data operations are of the right type, quality, and quantity to support their
intended use. This guidance demonstrates how to use DQA in evaluating environmental data sets and
illustrates how to apply some graphical and statistical tools for performing DQA. The guidance focuses
primarily on using DQA in environmental decision making; however, the tools presented for preliminary data
review and verifying statistical assumptions are useful whenever environmental data are used, regardless of
whether the data are used for decision making.

        DQA is built on a fundamental premise: data quality, as a concept, is meaningful only when it
relates to the intended use of the data. Data quality does not exist in a vacuum; one must know in what
context a data set is to be used in order to establish a relevant yardstick for judging whether or not the data set
is adequate.  By using the DQA Process, one can answer two fundamental questions:

1.      Can the decision (or estimate) be made with the desired confidence, given the quality of the data set?

2.      How well can the sampling design be expected to perform over a wide range of possible outcomes?
       If the same sampling design strategy is used again for a similar study, would the data be expected to
       support the same intended use with the desired level of confidence, particularly if the measurement
       results turned out to be higher or lower than those observed in the current study?

       The first question addresses the data user's immediate needs. For example, if the data provide
evidence strongly in favor of one course of action over another, then the decision maker can proceed knowing
that the decision will be supported by unambiguous data. If, however, the data do not show sufficiently
strong evidence to favor one alternative, then the data analysis alerts the decision maker to this uncertainty.
The decision maker now is in a position to make an informed choice about how to proceed (such as collect
more or different data before making the decision, or proceed with the decision despite the relatively high, but
acceptable, probability of drawing an erroneous conclusion).

       The second question addresses the data user's potential future needs.  For example, if investigators
decide to use a certain sampling design at a different location from where the design was first used, they
should determine how well the design can be expected to perform given that the outcomes and environmental
conditions of this sampling event will be different from those of the original event.  Because environmental
conditions will vary from one location or time to another, the adequacy of the sampling design approach
should be evaluated over a broad range of possible outcomes and conditions.

0.2    DQA AND THE DATA LIFE CYCLE

       The data life cycle (depicted in Figure 0.2-1) comprises three steps: planning, implementation, and
assessment.  During the planning phase, the Data Quality Objectives (DQO) Process (or some other
systematic planning procedure) is used to define quantitative and qualitative criteria for determining when,
where, and how many samples (measurements) to collect and a desired level of confidence. This information,
along with the sampling methods, analytical procedures, and appropriate quality assurance (QA) and quality


control (QC) procedures, are documented in the Quality Assurance Project Plan (QAPP). Data are then
collected following the QAPP specifications. DQA completes the data life cycle by providing the assessment
needed to determine if the planning objectives were achieved.  During the assessment phase, the data are
validated and verified to ensure that the sampling and analysis protocols specified in the QAPP were
followed, and that the measurement systems performed in accordance with the criteria specified in the QAPP.
DQA then proceeds using the validated data set to determine if the quality of the data is satisfactory.
    [Figure 0.2-1.  DQA in the Context of the Data Life Cycle.  The figure shows the three phases of the data
    life cycle: PLANNING (Data Quality Objectives Process and Quality Assurance Project Plan Development),
    IMPLEMENTATION (Field Data Collection and Associated Quality Assurance / Quality Control Activities),
    and ASSESSMENT (Data Validation/Verification and Data Quality Assessment), with conclusions drawn
    from the data as the output of the assessment phase.]


 0.3     THE 5 STEPS OF THE DQA PROCESS

        The DQA Process involves five steps that begin with a review of the planning documentation and
 end with an answer to the question posed during the planning phase of the study. These steps roughly
 parallel the actions of an environmental statistician when analyzing a set of data. The five steps, which are
 described in detail in the remaining chapters of this guidance, are briefly summarized as follows:

 1.      Review the Data Quality Objectives (DQOs) and Sampling Design: Review the DQO outputs to
        assure that they are still applicable. If DQOs have not been developed, specify DQOs before
        evaluating the data (e.g., for environmental decisions, define the statistical hypothesis and specify
        tolerable limits on decision errors; for estimation problems, define an acceptable confidence or
        probability interval width). Review the sampling design and data collection documentation for
        consistency with the DQOs.

 2.      Conduct a Preliminary Data Review: Review QA reports, calculate basic statistics, and generate
        graphs of the data. Use this information to learn about the structure of the data and identify patterns,
        relationships, or potential anomalies.
3.     Select the Statistical Test: Select the most appropriate procedure for summarizing and analyzing
       the data, based on the review of the DQOs, the sampling design, and the preliminary data review.
       Identify the key underlying assumptions that must hold for the statistical procedures to be valid.

4.     Verify the Assumptions of the Statistical Test: Evaluate whether the underlying assumptions hold,
       or whether departures are acceptable, given the actual data and other information about the study.

5.     Draw Conclusions from the Data: Perform the calculations required for the statistical test and
       document the inferences drawn as a result of these calculations. If the design is to be used again,
       evaluate the performance of the sampling design.

These five steps are presented in a linear sequence, but the DQA process is by its very nature iterative. For
example, if the preliminary data review reveals patterns or anomalies in the data set that are inconsistent with
the DQOs, then some aspects of the study planning may have to be reconsidered in Step 1. Likewise, if the
underlying assumptions of the statistical test are not supported by the data, then previous steps of the DQA
process may have to be revisited.  The strength of the DQA process is that it is designed to promote an
understanding of how well the data satisfy their intended use by progressing in a logical and efficient manner.

        Nevertheless, it should be emphasized that the DQA process cannot absolutely prove that one has or
has not achieved the DQOs set forth during the planning phase of a study. This situation occurs because a
decision maker can never know the true value of the item of interest.  Data collection only provides the
investigators with an estimate of this, not its true value.  Further, because analytical methods are not perfect,
they too can only provide an estimate of the true value of an environmental sample. Because investigators
make a decision based on estimated and not true values, they run the risk of making a wrong decision
(decision error) about the item of interest.
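
        As a purely illustrative sketch of how Steps 2 through 5 might look for a simple one-sample problem, the
Python fragment below computes basic statistics, checks the normality assumption with the Shapiro-Wilk test, and
runs a one-sample t-test against an action level.  The data values, the action level, and the error limit are all
hypothetical, and the scipy routines stand in for the specific procedures described in Chapters 2 through 5.

    # Minimal sketch of DQA Steps 2-5 for a hypothetical one-sample problem.
    # Assumes numpy and scipy are available; all numbers below are invented.
    import numpy as np
    from scipy import stats

    data = np.array([0.62, 0.85, 0.71, 0.90, 0.78, 1.05, 0.66, 0.81])  # hypothetical measurements
    action_level = 1.0   # assumed threshold from the DQOs (Step 1 output)
    alpha = 0.05         # assumed tolerable false positive decision error rate

    # Step 2: preliminary data review -- basic statistical quantities.
    print("n =", data.size, "mean =", data.mean(), "std dev =", data.std(ddof=1))

    # Step 4: check the normality assumption underlying the one-sample t-test
    # with the Shapiro-Wilk W test (see Chapter 4).
    w_stat, w_p = stats.shapiro(data)
    print("Shapiro-Wilk W =", round(w_stat, 3), "p =", round(w_p, 3))

    # Steps 3 and 5: one-sample t-test of H0: mean >= action level against
    # HA: mean < action level, followed by the conclusion.
    t_stat, p_value = stats.ttest_1samp(data, action_level, alternative="less")
    if p_value < alpha:
        print("Reject H0: the data support concluding the mean is below the action level.")
    else:
        print("Fail to reject H0: the data do not support that conclusion.")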

0.4    INTENDED AUDIENCE

       This guidance is written for a broad audience of potential data users, data analysts, and data
generators.  Data users (such as project managers, risk assessors, or principal investigators who are
responsible for making decisions or producing estimates regarding environmental characteristics based on
environmental data) should find this guidance useful for understanding and directing the technical work of
others who produce and analyze data. Data analysts (such as quality assurance specialists, or any technical
professional who is responsible for evaluating the quality of environmental data) should find this guidance to
be a convenient compendium of basic assessment tools.  Data generators (such as analytical chemists, field
sampling specialists, or technical support staff responsible for collecting and analyzing environmental
samples and reporting the resulting data values) should find this guidance useful for understanding how their
work will be used and for providing a foundation for improving the efficiency and effectiveness of the data
generation process.

0.5    ORGANIZATION

       This guidance presents background information and statistical tools for performing DQA.  Each
chapter corresponds to a step in the DQA Process and begins with an overview of the activities to be
performed for that step. Following the overviews in Chapters 1, 2, 3, and 4, specific graphical or statistical
tools are described and step-by-step procedures are provided along with examples.
0.6    SUPPLEMENTAL SOURCES

       Many of the graphical and statistical tools presented in this guidance are also implemented in a user-
friendly, personal computer software program called DataQUEST (Data Quality Evaluation Statistical Tools,
EPA QA/G-9D). DataQUEST simplifies the implementation of DQA by automating many of the
recommended statistical tools. DataQUEST runs on most IBM-compatible personal computers using the
DOS operating system; see the DataQUEST User's Guide for complete information on the minimum
computer requirements.

       The main references in this document are important works having application to environmental
sampling and interpretation of data; most of these references are widely available within the scientific and
environmental communities.  The remaining references are either more detailed original academic articles or
are not as readily available to analysts. Two excellent Agency references for analyzing environmental data
are Guidance on the Statistical Analysis of Ground-Water Monitoring Data (EPA 1992a), a useful
compendium of statistical methods and procedures (many of which are incorporated in this document) for the
analysis of data generated by EPA's Office of Solid Waste; and Scout: A Data Analysis Program (EPA
1993b), a software program for analyzing multivariate data that includes methods for identifying multivariate
outliers, graphing the raw data, and displaying the results of principal component analysis.

0.7     SCOPE AND LIMITATIONS

         This guidance is intended to be a convenient compendium of practical methods for the environmental
scientist and manager. It focuses on measurement data obtained through sampling and analysis of
contaminants in environmental media. Statistical nomenclature has been kept to a minimum, and there are
some areas that will require the input of an environmental statistician for complete analysis. The intent of the
document is to assist the non-statistician in the review and analysis of environmental data.

        This document represents the first edition of the DQA guidance, which will be followed by annual
updates. Readers are encouraged to send their suggestions for improvements and additions to the U.S. EPA
Quality Assurance Division.  (The address is given in the Foreword.) The annual updates will refine existing
sections, present new tools and procedures, and expand the scope of application to additional types of
environmental problems.

        This first edition is intended to cover most of the core topics of DQA for regulatory compliance
decisions that involve spatially distributed contamination.  Most of the tools will also be applicable to
sampling data from hazardous waste sites or facilities under Superfund or RCRA. Many of the tools are
generally applicable and useful for other types of problems as well.  Future editions of this guidance will
 address more thoroughly the problems and issues associated with analyzing sampling data from more
dynamic processes, such as effluent discharged to waterways and emissions dispersed in ambient air.  Future
 editions will also address other topics, such as analyzing results from designed experiments and other
 research studies, as well as environmental enforcement investigations.

        This guidance is explicitly not intended to cover certain topics that are important in some areas of
 environmental protection. For example, it does not address the important area of survey sampling involving
 the administration of interviews or questionnaires to people. This document is not intended to substitute for
 more thorough treatments of fundamental statistical concepts (as found in standard textbooks), nor is it
 intended to provide a forum for publishing original research (as found in scholarly journals).
              STEP 1: REVIEW DQOs AND THE SAMPLING DESIGN

     THE DATA QUALITY ASSESSMENT PROCESS

              Review DQOs and Sampling Design
              Conduct Preliminary Data Review
              Select the Statistical Test
              Verify the Assumptions
              Draw Conclusions From the Data

     REVIEW DQOs AND SAMPLING DESIGN

              Review the DQO outputs, the sampling design, and any data collection documentation
              for consistency.  If DQOs have not been developed, define the statistical hypothesis and
              specify tolerable limits on decision errors.

              •  Review Study Objectives
              •  Translate Objectives into Statistical Hypotheses
              •  Develop Limits on Decision Errors
              •  Review Sampling Design

                             Step 1: Review DQOs and Sampling Design

         •   Review the objectives of the study.
             •   If DQOs have not been developed, review section 1.1.1 and define these objectives.
             •   If DQOs were developed, review the outputs from the DQO Process.

         •   Translate the data user's objectives into a statement of the primary statistical hypothesis.
             •   If DQOs have not been developed, review sections 1.1.2 and 1.2 and Table 1.2-1,
                 then develop a statement of the hypothesis based on the data user's objectives.
             •   If DQOs were developed, translate them into a statement of the primary hypothesis.

         •   Translate the data user's objectives into limits on Type I or Type II decision errors.
             •   If DQOs have not been developed, review section 1.1.3 and document the data
                 user's tolerable limits on decision errors.
             •   If DQOs were developed, confirm the limits on decision errors.

         •   Review the sampling design and note any special features or potential problems.
             •   Review the sampling design for any deviations (sections 1.1.4 and 1.3).
                 STEP 1: REVIEW DQOs AND THE SAMPLING DESIGN
                                                                                Page
1.1     OVERVIEW AND ACTIVITIES	1.1-1
       1.1.1   Review Study Objectives	 1.1-1
       1.1.2   Translate Objectives into Statistical Hypotheses	 1.1 - 2
       1.1.3   Develop Limits on Decision Errors	1.1-2
       1.1.4   Review Sampling Design	1.1-3
1.2     DEVELOPING THE STATEMENT OF HYPOTHESES	 1.2-1
1.3     DESIGNS FOR SAMPLING ENVIRONMENTAL MEDIA	 1.3 -1
       1.3.1   Authoritative Sampling	1.3-1
       1.3.2   Probability Sampling	 1.3-1
             1.3.2.1 Simple Random Sampling	1.3-1
             1.3.2.2 Sequential Random Sampling	1.3-2
             1.3.2.3 Systematic Samples		1.3-2
             1.3.2.4 Stratified Samples	1.3-2
             1.3.2.5 Compositing Physical Samples	1.3-3
             1.3.2.6 Other Sampling Designs	1.3-3

                                   LIST OF TABLES
Table No.                                                                        Page
1.2-1. Commonly Used Statements of Statistical Hypotheses	1.2 - 3

                                    LIST OF BOXES
Box No.                                                                         Page
1.1-1: Example Applying the DQO Process Retrospectively	1.1-4
                          Probability Sampling Designs                 Section
                          Simple Random Sampling                       1.3.2.1
                          Sequential Random Sampling                   1.3.2.2
                          Systematic Samples                           1.3.2.3
                          Stratified Samples                           1.3.2.4
                          Compositing Physical Samples                 1.3.2.5
                          Adaptive Sampling                            1.3.2.6
                          Ranked Set Sampling                          1.3.2.6
                                         CHAPTER 1
               STEP 1: REVIEW DQOs AND THE SAMPLING DESIGN

1.1    OVERVIEW AND ACTIVITIES

       The DQA Process begins by reviewing the key outputs from the planning phase of the data life cycle:
the Data Quality Objectives (DQOs), the Quality Assurance Project Plan (QAPP), and any associated
documents. The DQOs provide the context for understanding the purpose of the data collection effort and
establish the qualitative and quantitative criteria for assessing the quality of the data set for the intended use.
The sampling design (documented in the QAPP) provides important information about how to interpret the
data. By studying the sampling design, the analyst can gain an understanding of the assumptions under which
the design was developed, as well as the relationship between these assumptions and the DQOs. By
reviewing the methods by which the samples were collected, measured, and reported, the analyst prepares for
the preliminary data review and subsequent steps of the DQA Process.

       Careful planning improves the representativeness and overall quality of a sampling design, the
effectiveness and efficiency with which the sampling and analysis plan is implemented, and the usefulness of
subsequent DQA efforts. Given the benefits of planning, the Agency has developed the DQO Process, which
is a logical, systematic planning procedure based on the scientific method.  The DQO Process emphasizes the
planning and development of a sampling design to collect the right type, quality, and quantity of data needed
to support the decision. Using both the DQO Process and the DQA Process will help to ensure that the
decisions are supported by data of adequate quality; the DQO Process does so prospectively and the DQA
Process does so retrospectively.

        When DQOs have not been developed during the planning phase of the study, it is necessary to
develop statements of the data user's objectives prior to conducting DQA. The primary purpose of stating the
data user's objectives prior to analyzing the data is to establish appropriate criteria for evaluating the quality
of the data with respect to their intended use.  Analysts who are not familiar with the DQO Process should
refer to the Guidance for the Data Quality Objectives Process, EPA QA/G-4 (1994), a book on statistical
decision making using tests of hypothesis, or consult a statistician.

        The remainder of this chapter addresses recommended activities for performing this step of DQA and
technical considerations that support these activities. The remainder of this section describes the
recommended activities, the first three of which will differ depending on whether DQOs have already been
developed for the study. Section 1.2 describes how to select the null and alternative hypotheses and section
1.3 presents a brief overview of different types of sampling designs.

1.1.1   Review Study Objectives

        In this activity, the objectives of the study are reviewed to provide context for analyzing the data. If a
planning process has been implemented before the data are collected, then this step reduces to reviewing the
documentation on the study objectives.  If no planning process was used, the data user should:

•   Develop a concise definition of the problem (DQO Process Step 1) and the decision (DQO Process Step
    2) for which the data were collected. This should provide the fundamental reason for collecting the
    environmental data and identify all potential actions that could result from the data analysis.
•   Identify if any essential information is missing (DQO Process Step 3).  If so, either collect the missing
    information before proceeding, or select a different approach to resolving the decision.

•   Specify the scale of decision making (any subpopulations of interest) and any boundaries on the study
    (DQO Process Step 4) based on the sampling design. The scale of decision making is the smallest area
    or time period to which the decision will apply. The sampling design and implementation may restrict
    how small or how large this scale of decision making can be.

1.1.2   Translate Objectives into Statistical Hypotheses

        In this activity, the data user's objectives are used to develop a precise statement of the primary1
hypotheses to be tested using environmental data. A statement of the primary statistical hypotheses includes
a null hypothesis, which is a "baseline condition" that is presumed to be true in the absence of strong
evidence to the contrary, and an alternative hypothesis, which bears the burden of proof.  In other words, the
baseline condition will be retained unless the alternative condition (the alternative hypothesis) is thought to be
true due to the preponderance of evidence. In general, such hypotheses consist of the following elements:

•   a population parameter of interest, which describes the feature of the environment that the data user is
    investigating;

•   a numerical value to which the parameter will be compared, such as a regulatory or risk-based threshold
    or a similar parameter from another place (e.g., comparison to a reference  site) or time (e.g., comparison
    to a prior time); and

•   the relation (such as "is equal to" or "is greater than") that specifies precisely how the parameter will be
    compared to the numerical value.

If DQOs were developed, the statement of hypotheses already should be documented in the outputs of Step 6
of the DQO Process. If DQOs have not been developed, then the analyst should consult with the data user to
develop hypotheses that address the data user's concerns.  Section 1.2 describes in detail how to develop the
statement of hypotheses and includes a list of commonly encountered hypotheses for environmental decisions.
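
        The three elements above can be collected in a small data structure; the following sketch (hypothetical
names, written in Python) is only meant to illustrate how a parameter of interest, a relation, and a threshold
value combine into a statement of the null and alternative hypotheses.

    # Illustrative container for the three elements of a statement of hypotheses.
    from dataclasses import dataclass

    @dataclass
    class HypothesisStatement:
        parameter: str       # population parameter of interest, e.g. "mean atrazine concentration"
        null_relation: str   # relation assumed under the baseline condition: ">=", "<=", or "="
        threshold: float     # fixed threshold value C (or difference for a two-population problem)

        def null(self) -> str:
            return f"H0: {self.parameter} {self.null_relation} {self.threshold}"

        def alternative(self) -> str:
            flipped = {">=": "<", "<=": ">", "=": "!="}[self.null_relation]
            return f"HA: {self.parameter} {flipped} {self.threshold}"

    h = HypothesisStatement("mean atrazine concentration", ">=", 3.0)  # threshold value is invented
    print(h.null())         # H0: mean atrazine concentration >= 3.0
    print(h.alternative())  # HA: mean atrazine concentration < 3.0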

1.1.3   Develop Limits on Decision Errors

        The goal of this activity is to develop numerical probability limits that express the data user's
tolerance for committing false positive (Type I) or false negative (Type II) decision errors as a result of
uncertainty in the data. A false positive error occurs when the null hypothesis is rejected when it is true. A
false negative decision error occurs when the null hypothesis is not rejected when it is false. If tolerable
decision error rates were not established prior to data collection, then the data user should:
•   Specify the gray region where the consequences of a false negative decision error are relatively minor
    (DQO Process Step 6). The gray region is bounded on one side by the threshold value and on the other
    side by that parameter value where the consequences of making a false negative decision error begin to be
    significant.  Establish this boundary by evaluating the consequences of not rejecting the null hypothesis
    when it is false and then place the edge of the gray region where these consequences are severe enough to
    set a limit on the magnitude of this false negative decision error.  The gray region is the area between this
    parameter value and the threshold value.

    1 Throughout this document, the term "primary hypotheses" refers to the statistical hypotheses that correspond to the data user's
    decision.  Other statistical hypotheses can be formulated to formally test the assumptions that underlie the specific calculations used
    to test the primary hypotheses.  See Chapter 3 for examples of assumptions underlying primary hypotheses and Chapter 4 for
    examples of how to test these underlying assumptions.

    The width of the gray region represents one important aspect of the decision maker's concern for decision
    errors.  A narrower gray region implies a desire to detect conclusively the condition when the true
    parameter value is close to the threshold value ("close" relative to the variability in the data).  When the
    true value of the parameter falls within the gray region, the decision maker may face a high probability of
    making a false negative decision error, because the data may not provide conclusive evidence for rejecting
    the null hypothesis, even though it is false (i.e., the data may be too variable to allow the decision maker
    to recognize that the baseline condition is, in fact, not true).

•   Specify tolerable limits on the probability of committing false positive and false negative decision errors
    (DQO Process Step 6) that reflect the decision maker's tolerable limits for making an incorrect decision.
    Select a possible value of the parameter; then choose a probability limit based on an evaluation of the
    seriousness of the potential consequences of making the decision error if the true parameter value is
    located at that point.  At a minimum, the decision maker should specify a false positive decision error
    limit at the threshold value (α), and a false negative decision error limit at the other edge of the gray
    region (β).

An example of the gray region and limits on the probability of committing both false positive and false
negative decision errors is contained in Box 1.1-1.

        If DQOs were developed for the study, the tolerable limits on decision errors will already have been
developed.  These values can be transferred directly as outputs for this activity.  In this case, the action level
is the threshold value; the false positive error rate at the action level is the Type I error rate or α; and the false
negative error rate at the other edge of the gray region is the Type II error rate or β.
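
        One common use of these limits, sketched below under stated assumptions, is an approximate sample size
calculation for a one-sample test of the mean: with a false positive limit α at the action level, a false negative
limit β at the other edge of the gray region, an assumed standard deviation σ, and a gray region of width Δ, a
normal-theory approximation gives n ≈ ((z(1-α) + z(1-β))·σ/Δ)².  The numerical inputs below are hypothetical.

    # Approximate sample size for a one-sample test of the mean, using the
    # normal-theory formula n = ((z_{1-alpha} + z_{1-beta}) * sigma / delta)^2.
    # All numerical inputs are hypothetical.
    import math
    from scipy.stats import norm

    alpha = 0.05    # tolerable false positive rate at the action level
    beta = 0.20     # tolerable false negative rate at the other edge of the gray region
    sigma = 0.25    # assumed standard deviation of the measurements
    delta = 0.25    # width of the gray region

    z_alpha = norm.ppf(1 - alpha)
    z_beta = norm.ppf(1 - beta)
    n = math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)
    print("approximate number of samples needed:", n)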
1.1.4   Review Sampling Design

        The goal of this activity is to familiarize the analyst with the main features of the sampling design
that was used to generate the environmental data. The overall type of sampling design and the manner in
which samples were collected and measurements were taken will place conditions and constraints on how the
data must be used and interpreted. Section 1.3 provides additional information about several different types
of sampling designs that are commonly used in environmental studies.

        Review the sampling design documentation with the data user's objectives in mind. Look for design
features that support or contradict those objectives. For example, if the data user is interested in making a
decision about the mean level of contamination in an effluent stream over time, then composite samples may
be an appropriate sampling approach.  On the other hand, if the data user is looking for hot spots of
contamination at a hazardous waste site, compositing should only be used with caution, to avoid "averaging
away" hot spots. Also, look for potential problems in the implementation of the sampling design. For
example, verify that each point in space (or time) had an equal probability of being selected for a simple
random sampling design. Small deviations from a sampling plan may have minimal effect on the conclusions
drawn from the data set.  Significant or substantial deviations should be flagged and their potential effect
carefully considered throughout the entire DQA.

                       Box 1.1-1:  Example Applying the DQO Process Retrospectively

     A waste incineration company was concerned that waste fly ash could contain hazardous levels of cadmium
     and should be disposed of in a RCRA landfill. As a result, eight composite samples, each consisting of eight
     grab samples, were taken from each load of waste. The TCLP leachate from these samples was then
     analyzed using a method specified in 40 CFR, Pt. 261, App. II.  DQOs were not developed for this problem;
     therefore, study objectives (sections 1.1.1 through 1.1.3) should be developed before the data are analyzed.

     1.1.1     Review Study Objectives

     •  Develop a concise definition of the problem - The problem is defined above.

     •  Identify if any essential information is missing - It does not appear that any essential information is missing.

     •  Specify the scale of decision making - Each waste load is sampled separately and decisions need to be
        made for each load. Therefore, the scale of decision making is an individual load.

     1.1.2    Translate Objectives into Statistical Hypotheses

     Since composite samples were taken, the parameter of interest is the mean cadmium concentration.  The
     RCRA regulatory standard for cadmium in TCLP leachate is 1.0 mg/L.  Therefore, the two hypotheses are
     "mean cadmium ≥ 1.0 mg/L" and "mean cadmium < 1.0 mg/L."

     There are two possible decision errors: 1) to decide the waste is hazardous ("mean ≥ 1.0") when it truly is
     not ("mean < 1.0"), and 2) to decide the waste is not hazardous ("mean < 1.0") when it truly is ("mean ≥ 1.0").
     The risk of deciding the fly ash is not hazardous when it truly is hazardous is more severe since potential
     consequences of this decision error include risk to human health and the environment.  Therefore, this error
     will be labeled the false positive error and the other error will be the false negative error. As a result of this
     decision, the null hypothesis will be that the waste is hazardous ("mean cadmium ≥ 1.0 mg/L") and the
     alternative hypothesis will be that the waste is not hazardous ("mean cadmium < 1.0 mg/L"). (See section 1.2
     for more information on developing the null and alternative hypotheses.)

     1.1.3    Develop Limits on Decision Errors

     •  Specify the gray region - The consequence of a false negative decision error near the action level is
     unnecessary resource expenditure.  The amount of data also influences the width of the gray region.
     Therefore, for now, a gray region was set from 0.75 to 1.0 mg/L.  This region could be revised depending on
     the power of the hypothesis test.
     •  Specify tolerable limits on the probability of committing a decision error - Consequences of a false
     positive error include risk to human health and the environment.  Another consequence for the landfill
     owners is the risk of fines and imprisonment.  Therefore, the stringent limit of 0.05 was set on the
     probability of a false positive decision error.  Consequences of a false negative error include unnecessary
     expenditures, so a limit of 0.20 was set on its probability.  This error rate could be revised based on the
     power of the hypothesis test.

     The results of this planning process are summarized in the Decision Performance Goal Diagram.
     [Decision Performance Goal Diagram: probability of deciding the waste is not hazardous plotted against the
     true mean cadmium concentration (mg/L), showing the action level at 1.0 mg/L, the gray region from 0.75 to
     1.0 mg/L, the false positive decision error limit of 0.05, and the false negative decision error limit of 0.20.]
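
        As a hedged sketch of how the decision framed in Box 1.1-1 might eventually be carried out in Step 5,
and assuming the one-sample t-test of Chapter 3 is the appropriate procedure, the Python fragment below tests
H0: mean cadmium ≥ 1.0 mg/L against HA: mean cadmium < 1.0 mg/L at α = 0.05 for a single waste load.  The
eight leachate concentrations shown are invented for illustration only.

    # Hypothetical TCLP leachate cadmium results (mg/L) for the eight composite
    # samples from one waste load; the values are invented for illustration.
    import numpy as np
    from scipy import stats

    cadmium = np.array([0.61, 0.72, 0.55, 0.68, 0.83, 0.59, 0.70, 0.64])
    standard = 1.0   # RCRA regulatory standard for cadmium in TCLP leachate (mg/L)
    alpha = 0.05     # false positive decision error limit from Box 1.1-1

    # Test H0: mean >= 1.0 mg/L (waste is hazardous) vs. HA: mean < 1.0 mg/L.
    t_stat, p_value = stats.ttest_1samp(cadmium, standard, alternative="less")
    if p_value < alpha:
        print("Reject H0: conclude this load is not hazardous for cadmium.")
    else:
        print("Fail to reject H0: continue to manage this load as hazardous waste.")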
1.2    DEVELOPING THE STATEMENT OF HYPOTHESES

        The full statement of the statistical hypotheses has two major parts: the null hypothesis (H0) and the
alternative hypothesis (HA).  In both parts, a population parameter is compared to either a fixed value (for a
one-sample test) or another population parameter (for a two-sample test).  The population parameter is a
quantitative characteristic of the population that the data user wants to estimate using the data. In other
words, the parameter describes that feature of the population that the data user will evaluate when making the
decision. Examples of parameters are the population mean and median.

        If the data user is interested in drawing inferences about only one population, then the null and
alternative hypotheses will be stated in terms that relate the true value of the parameter to some fixed
threshold value. A common example of this one-sample problem in environmental studies is when pollutant
levels in an effluent stream are compared to a regulatory limit  If the data user is interested in comparing two
populations, then the null and alternative hypotheses will be stated in terms that compare the true value of one
population parameter to the corresponding true parameter value of the other population. A common example
of this two-sample problem in environmental studies is when a potentially contaminated waste site is being
compared to a reference area using samples collected from the respective areas.  In this situation, the
hypotheses often will be stated in terms of the difference between the two parameters.

        The decision on what should constitute the null hypothesis and what should be the alternative is
sometimes difficult to ascertain.  In many cases, this problem does not arise because the null and alternative
hypotheses are determined by specific regulation.  However, when the null hypothesis is not specified by
regulation, it is necessary to make this determination. The test of hypothesis procedure prescribes that the
null hypothesis is only rejected in favor of the alternative, provided there is overwhelming evidence from the
data that the null hypothesis is false.  In other words, the null hypothesis is considered to be true unless the
data show conclusively that this is not so. Therefore, it is sometimes useful to choose the null and alternative
hypotheses in light of the consequences of possibly making an incorrect decision between the null and
alternative hypotheses. The true condition that occurs with the more severe decision error (not what would be
decided in error based on the data) should be defined as the null hypothesis.  For example, consider the two
decision errors: "decide a company does not comply with environmental regulations when it truly does" and
"deride a company does comply vit» envirnnrnental regulations when it truly does noa." If the first decision
error is copsidered the more severe decision error, then the true condition of this error, "the company does
comply with the regulations" should be defined as the null hypothesis. If the second decision error is
considered the more severe decision error, then the true condition of this error, "the company does not comply
with the regulations" should be defined as the null hypothesis.

        An alternative method for defining the null hypothesis is based on historical information.  If a large
amount of information exists suggesting that one hypothesis is extremely likely, then this hypothesis should
be defined as the alternative hypothesis.  In this case, a large amount of data may not be necessary to provide
overwhelming evidence that the other (null) hypothesis is false. For example, if the waste from an incinerator
was previously hazardous and the waste process has not changed, it may be more cost-effective to define the
alternative hypothesis as "the waste is hazardous" and the null hypothesis as "the waste is not hazardous."

        Consider a data user who wants to know whether the true mean concentration (μ) of atrazine in
ground water at a hazardous waste site is greater than a fixed threshold value C.  If the data user presumes
from prior information that the true mean concentration is at least C due possibly to some contamination
incident, then the data must provide compelling evidence to reject that presumption, and the hypotheses can
be stated as follows:

  Narrative Statement of Hypotheses                         Statement of Hypotheses Using Standard Notation

  Null Hypothesis (Baseline Condition):
  The true mean concentration of atrazine in ground        H0:  μ ≥ C;
  water is greater than or equal to the threshold
  value C; versus                                          versus

  Alternative Hypothesis:
  The true mean concentration of atrazine in ground        HA:  μ < C
  water is less than the threshold value C.
        In stating the primary hypotheses, it is convenient to use standard statistical notation, as shown
throughout this document  However, the logic underlying the hypothesis always corresponds to the decision
of interest to the data user.

        Table 1.2-1 summarizes some common types of environmental decisions and the corresponding
hypotheses.  In Table 1.2-1, the parameter is denoted using the symbol "θ," and the difference between two
parameters is denoted using "θ1 - θ2," where θ1 represents the parameter of the first population and θ2
represents the parameter of the second population.  The use of "θ" is to avoid using the terms "population
mean" or "population median" repeatedly because the structure of the hypothesis test remains the same
regardless of the population parameter. The fixed threshold value is denoted "C," and the difference between
two parameters is denoted "δ0" (it is common to see the null hypothesis defined such that δ0 = 0). If the data
user's problem does not fall into one of the categories described in Table 1.2-1, the problem and associated
hypotheses may be of a more complicated form and a statistician should be consulted.
        For the first of the six decision problems in Table 1.2-1, only estimates of θ that exceed C can cast
doubt on the null hypothesis.  This is called a one-tailed hypothesis test, because only parameter estimates on
one side of the threshold value can lead to rejection of the null hypothesis.  The second, fourth, and fifth rows
of Table 1.2-1 are also examples of one-tailed hypothesis tests.  The third and sixth rows of Table 1.2-1 are
examples of two-tailed tests, because estimates falling both below and above the null-hypothesis parameter
value can lead to rejection of the null hypothesis.  Most hypotheses connected with environmental monitoring
are one-tailed because high pollutant levels can harm humans or ecosystems.
              Table 1.2-1. Commonly Used Statements of Statistical Hypotheses

  Type of Decision                                          Null Hypothesis       Alternative Hypothesis

  Compare environmental conditions to a fixed               H0: θ ≤ C             HA: θ > C
  threshold value, such as a regulatory standard or
  acceptable risk level; presume that the true
  condition is less than the threshold value.

  Compare environmental conditions to a fixed               H0: θ ≥ C             HA: θ < C
  threshold value; presume that the true condition is
  greater than the threshold value.

  Compare environmental conditions to a fixed               H0: θ = C             HA: θ ≠ C
  threshold value; presume that the true condition is
  equal to the threshold value.

  Compare environmental conditions associated with          H0: θ1 - θ2 ≤ δ0      HA: θ1 - θ2 > δ0
  two different populations to a fixed threshold value      (or θ1 - θ2 ≤ 0)      (or θ1 - θ2 > 0)
  (δ0) such as a regulatory standard or acceptable
  risk level; presume that the true condition is less
  than the threshold value.  If it is presumed that
  conditions associated with the two populations are
  the same, the threshold value is 0.

  Compare environmental conditions associated with          H0: θ1 - θ2 ≥ δ0      HA: θ1 - θ2 < δ0
  two different populations to a fixed threshold value      (or θ1 - θ2 ≥ 0)      (or θ1 - θ2 < 0)
  (δ0); presume that the true condition is greater
  than the threshold value.  If it is presumed that
  conditions associated with the two populations are
  the same, the threshold value is 0.

  Compare environmental conditions associated with          H0: θ1 - θ2 = δ0      HA: θ1 - θ2 ≠ δ0
  two different populations to a fixed threshold value      (or θ1 - θ2 = 0)      (or θ1 - θ2 ≠ 0)
  (δ0); presume that the true condition is equal to
  the threshold value.  If it is presumed that
  conditions associated with the two populations are
  the same, the threshold value is 0.
1.3    DESIGNS FOR SAMPLING ENVIRONMENTAL MEDIA

       Sampling designs provide the basis for how a set of samples may be analyzed. Different sampling
designs require different analysis techniques and different assessment procedures. There are two primary
types of sampling designs: authoritative (judgment) sampling and probability sampling. This section
describes some of the most common sampling designs.

1.3.1   Authoritative Sampling

       With authoritative (judgment) sampling, an expert having knowledge of the site (or process)
designates where and when samples are to be taken. This type of sampling should only be considered when
the objectives of the investigation are not of a statistical nature, for example, when the objective of a study is
to identify specific locations of leaks, or when the study is focused solely on the sampling locations
themselves. Generally, conclusions drawn from authoritative samples apply only to the individual samples,
and aggregation may result in severe bias and lead to highly erroneous conclusions. Judgmental sampling
also precludes the use of the sample for any purpose other than the original one. Thus, if the data may be used
in further studies (e.g., for an estimate of variability in a later study), a probabilistic design should be used.

       When the study objectives involve estimation or decision making, some form of probability sampling
is required. As described below, this does not preclude use of the expert's knowledge of the site or process in
designing a probability-based sampling plan; however, valid statistical inferences require that the plan
incorporate some form of randomization in choosing the sampling locations or sampling times. For example,
to determine maximum SO2 emission from a boiler, the sampling plan would reasonably focus, or put most of
the weight, on periods of maximum or near-maximum boiler operation. Similarly, if a residential lot is being
evaluated for contamination, then the sampling plan can take into consideration prior knowledge of
contaminated areas, by weighting such areas more heavily in the sample selection and data analysis.

1.3.2  Probability Sampling

       Probability samples are samples in which every member of the target population (i.e., every potential
sampling unit) has a known probability of being included in the sample.  Probability samples can be of
various types, but in some way, they all make use of randomization, which allows valid probability
statements to be made  about the quality of estimates or hypothesis tests that are derived from the resultant
data.

        One common misconception of probability sampling procedures is that these procedures preclude
the use of important prior information.  Indeed, just the opposite is true.  An efficient sampling design is one
that uses all available prior information to stratify the region and set appropriate probabilities of selection.
Another common misconception is that using a probability sampling design means allowing the possibility
that the sample points will not be distributed appropriately across the region. However, if there is no prior
information regarding the areas most likely to be contaminated, a grid sampling scheme (a type of stratified
design) is usually recommended to ensure that the sampling points are dispersed across the region.

        1.3.2.1 Simple Random Sampling

        The simplest type of probability sample is the simple random sample, where every possible sampling
unit in the target population has an equal chance of being selected. Simple random samples, like the other
samples, can be either samples in time and/or space and are often appropriate at an early stage of an

investigation in which little is known about systematic variation within the site or process. All of the
sampling units should have equal volume or mass, and ideally be of the same shape if applicable. With a
simple random sample, the term "random" should not be interpreted to mean haphazard; rather, it has the
explicit meaning of equiprobable selection. Simple random samples are generally developed through use of a
random number table or through computer generation of pseudo-random numbers.
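
        As an illustrative aid only (not part of the original guidance), the following Python sketch shows one
way such a computer-generated selection might look; the unit labels, sample size, and function name are
hypothetical.

    import random

    def simple_random_sample(units, n, seed=None):
        # Draw n sampling units with equal probability, without replacement,
        # using a pseudo-random number generator.
        rng = random.Random(seed)
        return rng.sample(list(units), n)

    # Hypothetical example: 100 candidate grid cells, select 10 at random.
    cells = ["cell-%d" % i for i in range(1, 101)]
    print(simple_random_sample(cells, 10, seed=42))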

        1.3.2.2 Sequential Random Sampling

        Usually, simple random samples have a fixed sample size, but some alternative approaches are
available, such as sequential random sampling, where the sample sizes are not fixed a priori. Rather, a
statistical test is performed after each specimen's analysis (or after some minimum number have been
analyzed). This strategy could be applicable when sampling and/or analysis is quite expensive, when
information concerning sampling and/or measurement variability is lacking, when the characteristics of
interest are stable over the time frame of the sampling effort, or when the objective of the sampling effort is
to test a single specific hypothesis.

        1.3.2.3 Systematic Samples

        In the case of spatial sampling, systematic sampling involves establishing a two-dimensional (or in
some cases a three-dimensional) spatial grid and selecting a random starting location within one of the cells.
Sampling points in the other cells are located in a deterministic way relative to that starting point. In addition,
the orientation of the grid is sometimes chosen randomly, and various types of systematic samples are
possible. For example, points may be arranged in a pattern of squares (rectangular grid sampling) or a
pattern of equilateral triangles (triangular grid sampling). The result of either approach is a simple pattern of
equally spaced points at which sampling is to be performed.
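
        As an illustration only (not part of the original guidance), the following Python sketch generates
sampling locations on a rectangular grid with a random starting location, in the spirit of the description
above; the site dimensions, grid spacing, and function name are hypothetical.

    import random

    def rectangular_grid_points(x_length, y_length, spacing, seed=None):
        # Randomized systematic design: pick a random starting point inside
        # the first grid cell, then locate all other points deterministically.
        rng = random.Random(seed)
        x0 = rng.uniform(0, spacing)
        y0 = rng.uniform(0, spacing)
        points = []
        y = y0
        while y < y_length:
            x = x0
            while x < x_length:
                points.append((round(x, 1), round(y, 1)))
                x += spacing
            y += spacing
        return points

    # Hypothetical example: a 100 m by 60 m site sampled on a 20 m square grid.
    print(rectangular_grid_points(100, 60, 20, seed=1))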

        Systematic sampling designs have several advantages over random sampling and some of the other
types of probability sampling.  They are generally easier to implement, for example.  They are also preferred
when one of the objectives is to locate "hot spots" within a site or otherwise map the pattern of
concentrations over a site.  On the other hand, they should be used with caution whenever there is a
possibility of some type of cyclical pattern in the waste site or process.  Such a situation, combined with the
uniform pattern of sampling points, could very readily lead to biased results.

        1.3.2.4 Stratified Samples

       Another type of probability sample is the stratified random sample, in which the site or process is
divided into two or more nonoverlapping strata, sampling units are defined for each stratum, and separate
simple random samples are employed to select the units in each stratum. (If a systematic sample were
employed within each stratum, then the design would be referred to as a stratified systematic sample.) Strata
should be defined so that physical samples within a stratum are more similar to each other than to samples
from other strata. If so, a stratified random sample should result in more precise estimates of the overall
population parameter than those that would be obtained from a simple random sample with the same number
of sampling units.
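
       As an illustrative aid only (not part of the original guidance), a minimal Python sketch of drawing
separate simple random samples within each stratum follows; the stratum names, unit labels, and allocation
are hypothetical.

    import random

    def stratified_random_sample(strata, n_per_stratum, seed=None):
        # Independent simple random sample of units within each stratum.
        rng = random.Random(seed)
        return {name: rng.sample(units, n_per_stratum[name])
                for name, units in strata.items()}

    # Hypothetical example: two strata defined from prior knowledge of the site.
    strata = {"suspect area": ["S-%d" % i for i in range(1, 21)],
              "rest of site": ["R-%d" % i for i in range(1, 81)]}
    print(stratified_random_sample(strata,
                                   {"suspect area": 6, "rest of site": 6},
                                   seed=7))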

       Stratification is an accepted way to incorporate prior knowledge and professional judgment into a
probabilistic sampling design.  Generally, units that are "alike" or anticipated to be "alike" are placed
together in the same stratum. Units that are contiguous in space (e.g., similar depths) or time are often
grouped together into the same stratum, but characteristics other than spatial or temporal proximity can also

be employed. Media, terrain characteristics, concentration levels, previous cleanup attempts, and
confounding contaminants can also be used as the basis for creating strata.

        Advantages of stratified samples over random samples include their ability to ensure more uniform
coverage of the entire target population and, as noted above, their potential for achieving greater precision in
certain estimation problems. Even when imperfect information is used to form strata, the stratified random
sample will generally be more cost-effective than a simple random sample. A stratified design can also be
useful when there is interest in estimating or testing characteristics for subsets of the target population.
Because different sampling rates can be used in different strata, one can oversample in strata containing those
subareas of particular interest to ensure that they are represented in the sample. In general, statistical
calculations for data generated via stratified samples are more complex than for random samples, and certain
types of tests, for example, cannot be performed when stratified samples are employed. Therefore, a
statistician should be consulted when stratified sampling is used.

        1.3.2.5 Compositing Physical Samples

        When analysis costs are large relative to sampling costs, cost-effective plans can sometimes be
achieved by compositing physical samples or specimens prior to analysis, assuming that there are no safety
hazards or potential biases (for example, the loss of volatile organic compounds from a matrix) associated
with such compositing. For the same total cost, compositing in this situation would allow a larger number of
sampling units to be selected than would be the case if compositing were not used. Composite samples
reflect a physical rather than a mathematical mechanism for averaging. Therefore, compositing should
generally be avoided if population parameters other than a mean are of interest (e.g., percentiles or standard
deviations).

        Composite sampling is also useful when the analyses of composited samples are to be used in a
two-staged approach in which the composite-sample analyses are used solely as a screening mechanism to
identify if additional, separate analyses need to be performed. This situation might occur during an early
stage of a study that seeks to locate those areas that deserve increased attention due to potentially high levels
of one or more contaminants.

        1.3.2.6 Other Sampling Designs

        Adaptive sampling involves taking a sample and using the resulting information to design the next
stage of sampling.  The process may continue through several additional rounds of sampling and analysis. A
common application of adaptive sampling to environmental problems involves subdividing the region of
interest into smaller units, taking a probability sample of these units, then sampling all units that border on
any unit with a concentration level greater than some specified level C. This process is continued until all
newly sampled units are below C. The field of adaptive sampling is currently undergoing active development
and can be expected to have a significant impact on environmental sampling.

        Ranked set sampling (RSS) uses the availability of an inexpensive surrogate measurement when it is
correlated with the more expensive measurement of interest. The method exploits this correlation to obtain a
sample which is more representative of the population than would be obtained by random sampling, thereby
leading to more precise estimates of population parameters than what would be obtained by random
sampling. RSS consists of creating n groups, each of size n (for a total of n² initial samples), then ranking the
surrogate from largest to smallest within each group. One sample from each group is then selected according
to a specified procedure, and these n samples are analyzed for the more expensive measurement of interest.
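
        As an illustration only (not part of the original guidance), the following Python sketch shows one
common selection rule for RSS, in which the i-th ranked unit is kept from the i-th group; that rule, the unit
labels, and the surrogate values are assumptions of this sketch, since the guidance leaves the selection
procedure unspecified here.

    import random

    def ranked_set_sample(units, surrogate, n, seed=None):
        # Form n groups of size n (n*n initial units), rank each group by the
        # inexpensive surrogate, and keep the i-th ranked unit of the i-th
        # group for the expensive analysis (one common selection rule).
        rng = random.Random(seed)
        initial = rng.sample(list(units), n * n)
        groups = [initial[i * n:(i + 1) * n] for i in range(n)]
        return [sorted(group, key=surrogate)[i] for i, group in enumerate(groups)]

    # Hypothetical example: a field screening value serves as the surrogate.
    screening = {unit: random.Random(unit).random() for unit in range(1, 101)}
    print(ranked_set_sample(screening, screening.get, n=3, seed=3))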

               STEP 2: CONDUCT A PRELIMINARY DATA REVIEW

THE DATA QUALITY ASSESSMENT PROCESS
     Review DQOs and Sampling Design
     Conduct Preliminary Data Review
     Select the Statistical Test
     Verify the Assumptions
     Draw Conclusions From the Data

CONDUCT PRELIMINARY DATA REVIEW
     Generate statistical quantities and graphical representations that describe the data. Use this
     information to learn about the structure of the data and identify any patterns or relationships.
     o  Review Quality Assurance Reports
     o  Calculate Basic Statistical Quantities
     o  Graph the Data

                            Step 2: Conduct a Preliminary Data Review

             Review quality assurance reports.
             o   Look for problems or anomalies in the implementation of the sample collection and
                 analysis procedures.
             o   Examine QC data for information to verify assumptions underlying the Data Quality
                 Objectives, the Sampling and Analysis Plan, and the Quality Assurance Project Plans.

             Calculate the statistical quantities.
             o   Consider calculating appropriate percentiles (section 2.2.1).
             o   Select measures of central tendency (section 2.2.2) and dispersion (section 2.2.3).
             o   If the data involve two variables, calculate the correlation coefficient (section 2.2.4).

             Display the data using graphical representations.
             o   Select graphical representations (section 2.3) that illuminate the structure of the data set
                 and highlight assumptions underlying the Data Quality Objectives, the Sampling and
                 Analysis Plan, and the Quality Assurance Project Plans.
             o   Use a variety of graphical representations that examine different features of the set.
               STEP 2: CONDUCT A PRELIMINARY DATA REVIEW

  Statistical Quantities             Section    Directions      Example

  Coefficient of Variation           2.2.3      Box 2.2-4       Box 2.2-5
  Correlation Coefficient            2.2.4      Box 2.2-6       Box 2.2-6
  Interquartile Range                2.2.3      Box 2.2-4       Box 2.2-5
  Mean                               2.2.2      Box 2.2-2       Box 2.2-3
  Median                             2.2.2      Box 2.2-2       Box 2.2-3
  Mode                               2.2.2      Box 2.2-2       Box 2.2-3
  Percentiles/Quantiles              2.2.1      Box 2.2-1       Box 2.2-1
  Range                              2.2.3      Box 2.2-4       Box 2.2-5
  Standard Deviation                 2.2.3      Box 2.2-4       Box 2.2-5
  Variance                           2.2.3      Box 2.2-4       Box 2.2-5

  Graphical Representations          Section    Figure          Directions      Example

  Box and Whisker Plot               2.3.3      Figure 2.3-3    Box 2.3-5       Box 2.3-6
  Coded Scatter Plot                 2.3.7.3    Figure 2.3-9
  Contour Plots                      2.3.9.3
  Autocorrelation Function           2.3.8.2    Figure 2.3-13   Box 2.3-16      Box 2.3-17
  Empirical Quantile-Quantile Plot   2.3.7.4    Box 2.3-14      Box 2.3-14      Box 2.3-14
  Frequency Plots                    2.3.1      Figure 2.3-1    Box 2.3-1       Box 2.3-2
  h-Scatterplot                      2.3.9.3
  Histogram                          2.3.1      Figure 2.3-2    Box 2.3-1       Box 2.3-2
  Normal Probability Plot            2.3.6      Box 2.3-12      Box 2.3-11      Box 2.3-12
  Parallel Coordinate Plot           2.3.7.3    Figure 2.3-10
  Posting Plots                      2.3.9.1    Figure 2.3-14   Box 2.3-18      Box 2.3-18
  Quantile Plot                      2.3.5      Figure 2.3-5    Box 2.3-9       Box 2.3-10
  Ranked Data Plot                   2.3.4      Figure 2.3-4    Box 2.3-7       Box 2.3-8
  Scatter Plot                       2.3.7.2    Figure 2.3-8    Box 2.3-13      Box 2.3-13
  Scatter Plot Matrix                2.3.7.3    Figure 2.3-11
  Stem-and-Leaf Plot                 2.3.2      Box 2.3-4       Box 2.3-3       Box 2.3-4
  Symbol Plots                       2.3.9.2    Figure 2.3-15   Box 2.3-18      Box 2.3-18
  Time Plot                          2.3.8.1    Figure 2.3-12   Box 2.3-15      Box 2.3-15
                                        CHAPTER 2
               STEP 2: CONDUCT A PRELIMINARY DATA REVIEW

2.1     OVERVIEW AND ACTIVITIES

        In this step of the DQA Process, the analyst conducts a preliminary evaluation of the data set,
calculates some basic statistical quantities, and examines the data using graphical representations. A
preliminary data review should be performed whenever data are used, regardless of whether they are used to
support a decision, estimate a population parameter, or answer exploratory research questions. By reviewing
the data both numerically and graphically, one can learn the "structure" of the data and thereby identify
appropriate approaches and limitations for using the data. The DQA software DataQUEST (G-9D, 1996)
will perform all of these functions as well as more sophisticated statistical tests.

        There are two main elements of preliminary data review: (1) basic statistical quantities (summary
statistics); and (2) graphical representations of the data. Statistical quantities are functions of the data that
numerically describe the data set. Examples include a mean, median, percentile, range, and standard
deviation. They can be used to provide a mental picture of the data and are useful for making inferences
concerning the population from which the data were drawn. Graphical representations are used to identify
patterns and relationships within the data, confirm or disprove hypotheses, and identify potential problems.

2.1.1   Review Quality Assurance Reports
       When reviewing QA reports, particular attention should be paid to information that can be used to
check assumptions made in the Data Quality Objectives Process.  Of great importance are apparent anomalies
in recorded data, missing values, deviations from standard operating procedures, and the use of nonstandard
data collection methodologies.

2.1.2   Calculate Basic Statistical Quantities

        The goal of this activity is to summarize some basic quantitative characteristics of the data set using
common statistical quantities.  Some statistical quantities that are useful to the analyst include: number of
observations; measures of central tendency, such as a mean, median, or mode; measures of dispersion, such
as range, variance, standard deviation, coefficient of variation, or interquartile range; measures of relative
standing, such as percentiles; measures of distribution symmetry or shape; and measures of association
between two or more variables, such as correlation. These measures can then be used for description,
communication, and to test hypotheses regarding the population from which the data were drawn. Section 2.2
provides detailed descriptions and examples of these statistical quantities.

        The sample design may influence how the statistical quantities are computed. The formulas given in
this chapter are for simple random sampling, simple random sampling with composite samples, and
randomized systematic sampling. If a more complex design is used, such as a stratified design, then the
formulas may need to be adjusted.

2.1.3   Graph the Data

        The goal of this step is to identify patterns and trends in the data that might go unnoticed using
purely numerical methods.  Graphs can be used to identify these patterns and trends, to quickly confirm or
disprove hypotheses, to discover new phenomena, to identify potential problems, and to suggest corrective
measures.  In addition, some graphical representations can be used to record and store data compactly or to
convey information to others.  Graphical representations include displays of individual data points, statistical
quantities, temporal data, spatial data, and two or more variables. Since no single graphical representation
will provide a complete picture of the data set, the analyst should choose different graphical techniques to
illuminate different features of the data.  Section 2.3 provides descriptions and examples of common
graphical representations.

         At a minimum, the analyst should choose a graphical representation of the individual data points and
a graphical representation of the statistical quantities.  If the data set has a spatial or temporal component,
select graphical representations specific to temporal or spatial data in addition to those that do not.  If the data
set consists of more than one variable, treat each variable individually before developing graphical
representations for the multiple variables.  If the sampling plan or suggested analysis methods rely on any
 critical assumptions, consider whether a particular type of graph might shed light on the validity of that
 assumption. For example, if a small-sample study is strongly dependent on the assumption of normality, then
 a normal probability plot would be useful (section 2.3.6).

         The sampling design may influence what data may be included in each representation. Usually, the
graphical representations should be applied to each complete unit of randomization separately, or each unit of
randomization should be represented with a different symbol. For example, the analyst could generate box
plots for each stratum instead of generating one box plot that includes the data from all the strata.
2.2     STATISTICAL QUANTITIES

2.2.1   Measures of Relative Standing

        Sometimes the analyst is interested in knowing the relative position of one of several observations in
relation to all of the observations. Percentiles are one such measure of relative standing that may also be
useful for summarizing data. A percentile is the data value that is greater than or equal to a given percentage
of the data values.  Stated in mathematical terms, the pth percentile is the data value that is greater than or
equal to p% of the data values and is less than or equal to (100-p)% of the data values. Therefore, if x is the pth
percentile, then p% of the values in the data set are less than or equal to x, and (100-p)% of the values are
greater than or equal to x. A sample percentile may fall between a pair of observations. For example, the
75th percentile of a data set of 10 observations is not uniquely defined.  Therefore, there are several methods
for computing sample percentiles, the most common of which is described in Box 2.2-1.

        Important percentiles usually reviewed are the quartiles of the data, the 25th, 50th, and 75th
percentiles. The 50th percentile is also called the sample median (section 2.2.2), and the 25th and 75th
percentiles are used to estimate the dispersion of a data set (section 2.2.3). Also important for environmental
data are the 90th, 95th, and 99th percentiles, where a decision maker would like to be sure that 90%, 95%, or
99% of the contamination levels are below a fixed risk level.
            Box 2.2-1: Directions for Calculating the Measure of Relative Standing (Percentiles)
                                           with an Example

     Let X1, X2, ..., Xn represent the n data points. To compute the pth percentile, y(p), first list the data from
     smallest to largest and label these points X(1), X(2), ..., X(n) (so that X(1) is the smallest, X(2) is the second
     smallest, and X(n) is the largest). Let t = p/100, and multiply the sample size n by t. Divide the result into the
     integer part and the fractional part, i.e., let nt = j + g where j is the integer part and g is the fractional part.
     Then the pth percentile, y(p), is calculated by:

             If g = 0,        y(p) = ( X(j) + X(j+1) ) / 2;

             otherwise,       y(p) = X(j+1).

     Example: The 90th and 95th percentiles will be computed for the following 10 data points (ordered from
     smallest to largest): 4, 4, 4, 5, 5, 6, 7, 7, 8, and 10 ppb.

     For the 95th percentile, t = p/100 = 95/100 = .95 and nt = (10)(.95) = 9.5 = 9 + .5. Therefore, j = 9 and
     g = .5. Because g = .5 ≠ 0, y(95) = X(j+1) = X(9+1) = X(10) = 10 ppb. Therefore, 10 ppb is the 95th percentile
     of the above data.

     For the 90th percentile, t = p/100 = 90/100 = .9 and nt = (10)(.9) = 9. Therefore, j = 9 and g = 0. Since g = 0,
     y(90) = ( X(9) + X(10) ) / 2 = (8 + 10)/2 = 9 ppb.
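
        As an illustrative aid only (not part of the guidance), a minimal Python sketch of the Box 2.2-1
procedure follows; the function name is hypothetical.

    def percentile(data, p):
        # Box 2.2-1 method: order the data, let nt = j + g; if g = 0 average
        # X(j) and X(j+1), otherwise take X(j+1).
        x = sorted(data)                     # X(1) <= X(2) <= ... <= X(n)
        nt = len(x) * p / 100.0
        j = int(nt)                          # integer part
        g = nt - j                           # fractional part
        if g == 0:
            return (x[j - 1] + x[j]) / 2.0   # (X(j) + X(j+1))/2 with 1-based labels
        return x[j]                          # X(j+1)

    data = [4, 4, 4, 5, 5, 6, 7, 7, 8, 10]   # ppb, the data of Box 2.2-1
    print(percentile(data, 95))              # 10  (the 95th percentile)
    print(percentile(data, 90))              # 9.0 (the 90th percentile)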
        A quantile is similar in concept to a percentile; however, a percentile represents a percentage whereas
a quantile represents a fraction. If x is the pth percentile, then at least p% of the values in the data set lie at or
below x, and at least (100-p)% of the values lie at or above x, whereas if x is the p/100 quantile of the data,
then the fraction p/100 of the data values lie at or below x and the fraction (1 - p/100) of the data values lie at
or above x.  For example, the .95 quantile has the property that .95 of the observations lie at or below x and
.05 of the data lie at or above x. For the example in Box 2.2-1, 9 ppb would be the .90 quantile and 10 ppb
would be the .95 quantile of the data.


2.2.2   Measures of Central Tendency

       Measures of central tendency characterize the center of a sample of data points. The three most
common estimates are the mean, median, and the mode. Directions for calculating these quantities are
contained in Box 2.2-2; examples are provided in Box 2.2-3.

       The most commonly used measure of the center of a sample is the sample mean, denoted by x̄. This
estimate of the center of a sample can be thought of as the "center of gravity" of the sample.  The sample
mean is an arithmetic average for simple sampling designs; however, for complex sampling designs, such as
stratification, the sample mean is a weighted arithmetic average. The sample mean is influenced by extreme
values (large or small) and nondetects (see section 4.7).

       The sample median (X̃) is the second most popular measure of the center of the data. This value falls
directly in the middle of the data when the measurements are ranked in order from smallest to largest.  This
means that half of the data are smaller than the sample median and half of the data are larger than the sample
median. The median is another name for the 50th percentile (section 2.2.1). The median is not influenced by
extreme values and can easily be used in the case of censored data (nondetects).

       The third method of measuring the center of the data is the mode.  The sample mode is the value of
the sample that occurs with the greatest frequency. Since this value may not always exist, or if it does it may
not be unique, this value is the least commonly used.  However, the mode is useful for qualitative data.

2.2.3  Measures of Dispersion

       Measures of central tendency are more meaningful if accompanied by information on how the data
spread out from the center. Measures of dispersion in a data set include the range, variance, sample standard
deviation, coefficient of variation, and the interquartile range.  Directions for computing these measures are
given in Box 2.2-4; examples are given in Box 2.2-5.

       The easiest measure of dispersion to compute is the sample range. For small samples, the range is
easy to interpret and may adequately represent the dispersion of the data. For large samples, the range is not
very informative because it only considers (and therefore is greatly influenced by) extreme values.

        The sample variance measures the dispersion from the mean of a data set. A large sample variance
implies that there is a large spread among the data so that the data are not clustered around the mean.  A small
sample variance implies that there is little spread among the data so that most of the data are near the mean.
The sample variance is affected by extreme values and by a large number of nondetects. The sample standard
deviation is the square root of the sample variance and has the same unit of measure as the data.

        The coefficient of variation (CV) is a unitless measure that allows the comparison of dispersion
across several sets of data. The CV is often used in environmental applications because variability
(expressed as a standard deviation) is often proportional to the mean.

        When extreme values are present, the interquartile range may be more representative of the
dispersion of the data than the standard deviation. This statistical quantity does not depend on extreme
values and is therefore useful when the data include a large number of nondetects.
                 Box 2.2-2:  Directions for Calculating the Measures of Central Tendency

    Let X1, X2, ..., Xn represent the n data points.

    Sample Mean: The sample mean x̄ is the sum of all the data points divided by the total number of data
    points (n):

            x̄ = (1/n) Σ(i=1 to n) Xi

    Sample Median: The sample median (X̃) is the center of the data when the measurements are ranked in
    order from smallest to largest. To compute the sample median, list the data from smallest to largest and
    label these points X(1), X(2), ..., X(n) (so that X(1) is the smallest, X(2) is the second smallest, and X(n) is
    the largest).

            If the number of data points is odd, then X̃ = X((n+1)/2);

            If the number of data points is even, then X̃ = [ X(n/2) + X((n/2)+1) ] / 2.

    Sample Mode: The mode is the value of the sample that occurs with the greatest frequency. The mode may
    not exist, or if it does, it may not be unique. To find the mode, count the number of times each value occurs.
    The sample mode is the value that occurs most frequently.
                  Box 2.2-3: Example Calculations of the Measures of Central Tendency

    Using the directions in Box 2.2-2 and the following 10 data points (in ppm): 4, 5, 6, 7, 4, 10, 4, 5, 7, and 8,
    the following is an example of computing the sample mean, median, and mode.

    Sample mean:

         x̄ = (4 + 5 + 6 + 7 + 4 + 10 + 4 + 5 + 7 + 8) / 10 = 60/10 = 6 ppm

    Therefore, the sample mean is 6 ppm.

    Sample median: The ordered data are: 4, 4, 4, 5, 5, 6, 7, 7, 8, and 10. Since n = 10 is even, the sample
    median is

         X̃ = [ X(10/2) + X((10/2)+1) ] / 2 = [ X(5) + X(6) ] / 2 = (5 + 6)/2 = 5.5 ppm

    Thus, the sample median is 5.5 ppm.

    Sample mode:  Computing the number of times each value occurs yields:

       4 appears 3 times; 5 appears 2 times; 6 appears 1 time; 7 appears 2 times; 8 appears 1 time; and 10
       appears 1 time.

    Because the value of 4 ppm appears the most times, it is the mode of this data set.
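
        As an illustrative aid only (not part of the guidance), a short Python sketch of the Box 2.2-2
calculations follows; the function names are hypothetical.

    from collections import Counter

    def sample_mean(data):
        # Sum of the data divided by the number of points (Box 2.2-2).
        return sum(data) / len(data)

    def sample_median(data):
        # Middle value of the ordered data, or the average of the two middle
        # values when n is even (Box 2.2-2).
        x = sorted(data)
        n = len(x)
        mid = n // 2
        return x[mid] if n % 2 == 1 else (x[mid - 1] + x[mid]) / 2.0

    def sample_mode(data):
        # The value that occurs most frequently (it may not be unique).
        return Counter(data).most_common(1)[0][0]

    data = [4, 5, 6, 7, 4, 10, 4, 5, 7, 8]   # ppm, the data of Box 2.2-3
    print(sample_mean(data), sample_median(data), sample_mode(data))   # 6.0 5.5 4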
                    Box 2.2-4: Directions for Calculating the Measures of Dispersion

    Let X1, X2, ..., Xn represent the n data points.

    Sample Range: The sample range (R) is the difference between the largest value and the smallest value of
    the sample, i.e., R = maximum - minimum.

    Sample Variance: To compute the sample variance (s²), compute:

            s² = [ Σ(i=1 to n) Xi² - ( Σ(i=1 to n) Xi )²/n ] / (n - 1)

    Sample Standard Deviation: The sample standard deviation (s) is the square root of the sample variance,
    i.e., s = sqrt(s²).

    Coefficient of Variation: The coefficient of variation (CV) is the standard deviation divided by the sample
    mean (section 2.2.2), i.e., CV = s/x̄. The CV is often expressed as a percentage.

    Interquartile Range: Use the directions in section 2.2.1 to compute the 25th and 75th percentiles of the data
    (y(25) and y(75), respectively). The interquartile range (IQR) is the difference between these values, i.e.,
    IQR = y(75) - y(25).
                     Box 2.2-5: Example Calculations of the Measures of Dispersion

    In this box, the directions in Box 2.2-4 and the following 10 data points (in ppm): 4, 5, 6, 7, 4, 10, 4, 5, 7, and
    8, are used to calculate the measures of dispersion. From Box 2.2-3, x̄ = 6 ppm.

    Sample Range: R = maximum - minimum = 10 - 4 = 6 ppm

    Sample Variance:

            s² = [ (4² + 5² + ... + 8²) - (4 + 5 + ... + 8)²/10 ] / (10 - 1) = (396 - 3600/10) / 9 = 4 ppm²

    Sample Standard Deviation: s = sqrt(s²) = sqrt(4) = 2 ppm

    Coefficient of Variation: CV = s/x̄ = 2 ppm / 6 ppm = 1/3 ≈ 33%

    Interquartile Range: Using the directions in section 2.2.1 to compute the 25th and 75th percentiles of the
    data (y(25) and y(75), respectively): y(25) = X(2+1) = X(3) = 4 ppm and y(75) = X(7+1) = X(8) = 7 ppm. The
    interquartile range (IQR) is the difference between these values: IQR = y(75) - y(25) = 7 - 4 = 3 ppm.
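
        As an illustrative aid only (not part of the guidance), a short Python sketch of the Box 2.2-4
calculations follows; the function names are hypothetical, and the percentile helper reuses the Box 2.2-1
method.

    import math

    def percentile(data, p):
        # Percentile method of Box 2.2-1 (used here for the interquartile range).
        x = sorted(data)
        nt = len(x) * p / 100.0
        j = int(nt)
        g = nt - j
        return (x[j - 1] + x[j]) / 2.0 if g == 0 else x[j]

    def dispersion_measures(data):
        # The measures of Box 2.2-4: range, variance, standard deviation,
        # coefficient of variation, and interquartile range.
        n = len(data)
        mean = sum(data) / n
        variance = (sum(v * v for v in data) - sum(data) ** 2 / n) / (n - 1)
        std_dev = math.sqrt(variance)
        return {"range": max(data) - min(data), "variance": variance,
                "std_dev": std_dev, "cv": std_dev / mean,
                "iqr": percentile(data, 75) - percentile(data, 25)}

    data = [4, 5, 6, 7, 4, 10, 4, 5, 7, 8]   # ppm, the data of Box 2.2-5
    print(dispersion_measures(data))   # range 6, variance 4.0, s 2.0, CV 0.33, IQR 3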
2.2.4   Measures of Association

        Data often include measurements of several characteristics (variables) for each sample point, and
there may be interest in knowing the relationship or level of association between two or more of these
variables. One of the most common measures of association is the correlation coefficient. Directions and an
example for calculating a correlation coefficient are contained in Box 2.2-6.

        The correlation coefficient measures the linear relationship between two variables.  A linear
association implies that as one variable increases so does the other linearly, or as one variable decreases the
other increases linearly.  Values of the correlation coefficient close to +1 (positive correlation) imply that as
one variable increases so does the other; the reverse holds for values close to -1. A value of +1 implies a
perfect positive linear correlation, i.e., all the data pairs lie on a straight line with a positive slope. A value of
-1 implies perfect negative linear correlation.  Values close to 0 imply little correlation between the variables.

        The correlation coefficient does not imply cause and effect. The analyst may say that the correlation
between two variables is high and the relationship is strong, but may not say that one variable causes the
other variable to increase or decrease without further evidence and strong statistical controls. The correlation
coefficient does not detect nonlinear relationships, so it should be used only in conjunction with a scatter plot
(section 2.3.7.2). A scatter plot can be used to determine if the correlation coefficient is meaningful or if
some measure of nonlinear relationships should be used. The correlation coefficient can be significantly
changed by extreme values, so a scatter plot should be used first to identify such values.
              Box 2.2-6: Directions for Calculating the Correlation Coefficient with an Example

     Let X1, X2, ..., Xn represent one variable of the n data points and let Y1, Y2, ..., Yn represent a second
     variable of the n data points. The Pearson correlation coefficient, r, between X and Y is computed by:

            r = [ Σ XiYi - (Σ Xi)(Σ Yi)/n ] / sqrt{ [ Σ Xi² - (Σ Xi)²/n ] [ Σ Yi² - (Σ Yi)²/n ] }

     Example: Consider the following data set (in ppb): Sample 1 - arsenic (X) = 4.0, lead (Y) = 8.0; Sample 2 -
     arsenic = 3.0, lead = 7.0; Sample 3 - arsenic = 2.0, lead = 7.0; and Sample 4 - arsenic = 1.0, lead = 6.0.

            Σ Xi = 10,  Σ Yi = 28,  Σ Xi² = 30,  Σ Yi² = 198,
            Σ XiYi = (4.0)(8.0) + (3.0)(7.0) + (2.0)(7.0) + (1.0)(6.0) = 73,

     and    r = [ 73 - (10)(28)/4 ] / sqrt{ [ 30 - (10)²/4 ][ 198 - (28)²/4 ] } = 3/sqrt(10) ≈ 0.949

     Since r is close to 1, there is a strong linear relationship between these two contaminants.
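
        As an illustrative aid only (not part of the guidance), a short Python sketch of the Box 2.2-6
calculation follows; the function name is hypothetical.

    import math

    def pearson_r(x, y):
        # Pearson correlation coefficient of Box 2.2-6.
        n = len(x)
        sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
        sxx = sum(a * a for a in x) - sum(x) ** 2 / n
        syy = sum(b * b for b in y) - sum(y) ** 2 / n
        return sxy / math.sqrt(sxx * syy)

    arsenic = [4.0, 3.0, 2.0, 1.0]   # ppb, the data of Box 2.2-6
    lead    = [8.0, 7.0, 7.0, 6.0]
    print(round(pearson_r(arsenic, lead), 3))   # 0.949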
2.3     GRAPHICAL REPRESENTATIONS

2.3.1   Histogram/Frequency Plots

        Two of the oldest methods for summarizing data distributions are the frequency plot (Figure 2.3-1)
and the histogram (Figure 2.3-2). Both the histogram and the frequency plot use the same basic principles to
display the data: dividing the data range into units, counting the number of points within the units, and
displaying the data as the height or area within a bar graph.  There are slight differences between the
histogram and the frequency plot.  In the frequency plot, the relative height of the bars represents the relative
density of the data.  In a histogram, the area within the bar represents the relative density of the data. The
difference between the two plots becomes more distinct when unequal box sizes are used.
        Figure 2.3-1. Example of a Frequency Plot

        Figure 2.3-2. Example of a Histogram
                  Box 2.3-1: Directions for Generating a Histogram and a Frequency Plot

    Let X1, X2, ..., Xn represent the n data points. To develop a histogram or a frequency plot:

    STEP 1:   Select intervals that cover the range of observations. If possible, these intervals should have
              equal widths.  A rule of thumb is to have between 7 to 11 intervals.  If necessary, specify an
              endpoint convention, i.e., what to do with cases that fall on interval endpoints.

    STEP 2:   Compute the number of observations within each interval.  For a frequency plot with equal
              interval sizes, the number of observations represents the height of the boxes on the frequency plot.

    STEP 3:   Determine the horizontal axis based on the range of the data.  The vertical axis for a frequency
              plot is the number of observations.  The vertical axis of the histogram is based on percentages.

    STEP 4:   For a histogram, compute the percentage of observations within each interval by dividing the
              number of observations within each interval (Step 2) by the total number of observations.

    STEP 5:   For a histogram, select a common unit that corresponds to the x-axis. Compute the number of
              common units in each interval and divide the percentage of observations within each interval (Step
              4) by this number. This step is only necessary when the intervals (Step 1) are not of equal widths.

    STEP 6:   Using boxes, plot the intervals against the results of Step 5 for a histogram or the intervals
              against the number of observations in an interval (Step 2) for a frequency plot.
                    Box 2.3-2: Example of Generating a Histogram and a Frequency Plot

    Consider the following 22 samples of a contaminant concentration (in ppm): 17.7, 17.4, 22.8, 35.5, 28.3,
    17.2, 19.1, <4, 7.2, <4, 15.2, 14.7, 14.9, 10.9, 12.4, 12.4, 11.6, 14.7, 10.2, 5.2, 16.5, and 8.9.

    STEP 1:  This data spans 0 - 40 ppm.  Equally sized intervals of 5 ppm will be used: 0 - 5 ppm; 5 - 10 ppm;
              etc. The endpoint convention will be that values are placed in the highest interval containing the
              value.  For example, a value of 5 ppm will be placed in the interval 5 - 10 ppm instead of 0 - 5 ppm.

    STEP 2:  The table below shows the number of observations within each interval defined in Step 1.

    STEP 3:  The horizontal axis for the data is from 0 to 40 ppm. The vertical axis for the frequency plot is
              from 0 - 10 and the vertical axis for the histogram is from 0% - 10%.

    STEP 4:  There are 22 observations total, so the number of observations shown in the table below will be
              divided by 22.  The results are shown in column 3 of the table below.

    STEP 5:  A common unit for this data is 1 ppm.  In each interval there are 5 common units so the
              percentage of observations (column 3 of the table below) should be divided by 5 (column 4).

    STEP 6:  The frequency plot is shown in Figure 2.3-1 and the histogram is shown in Figure 2.3-2.

               Interval        # of Obs        % of Obs        % of Obs
                               in Interval     in Interval     per ppm
                0 -  5 ppm          2              9.09           1.8
                5 - 10 ppm          3             13.64           2.7
               10 - 15 ppm          8             36.36           7.3
               15 - 20 ppm          6             27.27           5.5
               20 - 25 ppm          1              4.55           0.9
               25 - 30 ppm          1              4.55           0.9
               30 - 35 ppm          0              0.00           0.0
               35 - 40 ppm          1              4.55           0.9
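
        As an illustrative aid only (not part of the guidance), the following Python sketch tallies the Box 2.3-2
data into equal-width intervals; the function name is hypothetical, and representing the two "<4" nondetects
as 2 ppm for tallying purposes is an assumption of this sketch.

    def frequency_table(data, low, high, width):
        # Tally observations into equal-width intervals (Box 2.3-1, Steps 1-5).
        # A value on an endpoint is placed in the higher interval, matching the
        # endpoint convention of Box 2.3-2.
        n = len(data)
        rows = []
        lo = low
        while lo < high:
            hi = lo + width
            count = sum(1 for v in data if lo <= v < hi)
            pct = 100.0 * count / n
            rows.append((lo, hi, count, round(pct, 2), round(pct / width, 1)))
            lo = hi
        return rows   # (start, end, count, % of obs, % of obs per unit)

    data = [17.7, 17.4, 22.8, 35.5, 28.3, 17.2, 19.1, 2, 7.2, 2, 15.2, 14.7,
            14.9, 10.9, 12.4, 12.4, 11.6, 14.7, 10.2, 5.2, 16.5, 8.9]
    for row in frequency_table(data, 0, 40, 5):
        print(row)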



2.3.2   Stem-and-Leaf Plot

        The stem-and-leaf plot is used to show both the numerical values themselves and information about
the distribution of the data. It is a useful method for storing data in a compact form while, at the same time,
sorting the data from smallest to largest. A stem-and-leaf plot can be more useful in analyzing data than a
histogram because it not only allows a visualization of the data distribution, but enables the data to be
reconstructed and lists the observations in the order of magnitude. However, the stem-and-leaf plot is one of
the more subjective visualization techniques because it requires the analyst to make some arbitrary choices
regarding a partitioning of the data. Therefore, this technique may require some practice or trial and error
before a useful plot can be created. As a result, the stem-and-leaf plot should only be used to develop a
picture of the data and its characteristics.  Directions for constructing a stem-and-leaf plot are given in Box
2.3-3 and an example is contained in Box 2.3-4.

        Each observation in the stem-and-leaf plot consists of two parts: the stem of the observation and the
leaf. The stem is generally made up of the leading digit of the numerical values, while the leaf is made up of
trailing digits in the order that corresponds to the order of magnitude from left to right. The stem is displayed
on the vertical axis and the data points make up the leaves. Changing the stem can be accomplished by
increasing or decreasing the digits that are used, dividing the groupings of one stem (i.e., all numbers which
start with the numeral 6 can be divided into smaller groupings), or multiplying the data by a constant factor
(i.e., multiply the data by 10 or 100). Nondetects can be placed in a single stem.

        A stem-and-leaf plot roughly displays the distribution of the data. For example, the stem-and-leaf
plot of normally distributed data is approximately bell shaped. Since the stem-and-leaf roughly displays the
distribution of the data, the plot may be used to evaluate whether the data are skewed or symmetric. The top
half of the stem-and-leaf plot will be a mirror image of the bottom half of the stem-and-leaf plot for
symmetric data.  Data that are skewed to the left will have the bulk of data in the top of the plot and less data
spread out over the bottom of the plot.
2.3.3   Box and Whisker Plot

        A box and whisker plot or box plot (Figure 2.3-3) is a schematic
diagram useful for visualizing important statistical quantities of the data. Box
plots are useful in situations where it is not necessary or feasible to portray all
the details of a distribution. Directions for generating a box and whiskers plot
are contained in Box 2.3-5, and an example is contained in Box 2.3-6.

        A box and whiskers plot is composed of a central box divided by a line
and two lines extending out from the box called whiskers. The length of the
central box indicates the spread of the bulk of the data (the central 50%) while
the length of the whiskers shows how stretched the tails of the distribution are.
The width of the box has no particular meaning; the plot can be made quite
narrow without affecting its visual impact. The sample median is displayed as a
line through the box and the sample mean is displayed using a '+' sign. Any
unusually small or large data points are displayed by a '*' on the plot. A box
and whiskers plot can be used to assess the symmetry of the data. If the
distribution is symmetrical, then the box is divided into two equal halves by the
median, the whiskers will be the same length, and the number of extreme data
points will be distributed equally on either end of the plot.
        Figure 2.3-3. Example of a Box and Whisker Plot
-------
                       Box 2.3-3: Directions for Generating a Stem and Leaf Plot

Let X1, X2, ..., Xn represent the n data points. To develop a stem-and-leaf plot, complete the following steps:

STEP 1:   Arrange the observations in ascending order. The ordered data are usually labeled (from smallest to
          largest) X(1), X(2), ..., X(n).

STEP 2:   Choose either one or more of the leading digits to be the stem values. As an example, for the value 16,
          1 could be used as the stem as it is the leading digit.

STEP 3:   List the stem values from smallest to largest at the left (along a vertical axis). Enter the leaf (the
          remaining digits) values in order from lowest to highest to the right of the stem.  Using the value 16 as
          an example, if the 1 is the stem then the 6 will be the leaf.
                         Box 2.3-4:  Example of Generating a Stem and Leaf Plot

Consider the following 22 samples of trifluorine (in ppm): 17.7, 17.4, 22.8, 35.5, 28.6, 17.2, 19.1, <4, 7.2, <4,
15.2, 14.7, 14.9, 10.9, 12.4, 12.4, 11.6, 14.7, 10.2, 5.2, 16.5, and 8.9.

STEP 1:   Arrange the observations in ascending order: <4, <4, 5.2, 7.2, 8.9, 10.2, 10.9, 11.6, 12.4, 12.4, 14.7,
          14.7, 14.9, 15.2, 16.5, 17.2, 17.4, 17.7, 19.1, 22.8, 28.6, 35.5.

STEP 2:   Choose either one or more of the leading digits to be the stem values. For the above data, using the
          first digit as the stem does not provide enough detail for analysis.  Therefore, the first digit will be used
          as a stem; however, each stem will have two rows, one for the leaves 0 - 4, the other for the leaves 5 - 9.

STEP 3:   List the stem values at the left (along a vertical axis) from smallest to largest. Enter the leaf (the
          remaining digits) values in order from lowest to highest to the right of the stem. The first digit of the data
          was used as the stem values; however, each stem value has two leaf rows.

                 0 (0, 1, 2, 3, 4)   | <4  <4
                 0 (5, 6, 7, 8, 9)   | 5.2  7.2  8.9
                 1 (0, 1, 2, 3, 4)   | 0.2  0.9  1.6  2.4  2.4  4.7  4.7  4.9
                 1 (5, 6, 7, 8, 9)   | 5.2  6.5  7.2  7.4  7.7  9.1
                 2 (0, 1, 2, 3, 4)   | 2.8
                 2 (5, 6, 7, 8, 9)   | 8.6
                 3 (0, 1, 2, 3, 4)   |
                 3 (5, 6, 7, 8, 9)   | 5.5

Note: If nondetects are present, place them first in the ordered list, using a symbol such as <L.  If multiple detection
limits were used, place the nondetects in increasing order of detection limits, using symbols such as <L1, <L2, etc.
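
        As an illustrative aid only (not part of the guidance), the following Python sketch builds a text
stem-and-leaf table like the one in Box 2.3-4; the function name is hypothetical, and the two <4 nondetects
are assumed to be handled separately as the note above describes.

    from collections import defaultdict

    def stem_and_leaf(data):
        # Stem = tens digit; each stem gets two rows, one for leaves 0-4 and
        # one for leaves 5-9, as in Box 2.3-4.
        rows = defaultdict(list)
        for value in sorted(data):
            stem = int(value) // 10
            leaf = round(value - 10 * stem, 1)        # remaining digits
            rows[(stem, 0 if leaf < 5 else 1)].append(leaf)
        for stem, half in sorted(rows):
            label = "%d (%s)" % (stem, "0-4" if half == 0 else "5-9")
            print(label.ljust(10) + "| " +
                  "  ".join(str(leaf) for leaf in rows[(stem, half)]))

    data = [17.7, 17.4, 22.8, 35.5, 28.6, 17.2, 19.1, 7.2, 15.2, 14.7, 14.9,
            10.9, 12.4, 12.4, 11.6, 14.7, 10.2, 5.2, 16.5, 8.9]
    stem_and_leaf(data)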
                     Box 2.3-5: Directions for Generating a Box and Whiskers Plot

 STEP 1:   Set the vertical scale of the plot based on the maximum and minimum values of the data set. Select a
           width for the box plot keeping in mind that the width is only a visualization tool. Label the width W; the
           horizontal scale then ranges from -½W to +½W.

 STEP 2:   Compute the upper quartile (Q(.75), the 75th percentile) and the lower quartile (Q(.25), the 25th
           percentile) using Box 2.2-1.  Compute the sample mean and median using Box 2.2-2.  Then, compute
           the interquartile range (IQR) where IQR = Q(.75) - Q(.25).

 STEP 3:   Draw a box through the points ( -½W, Q(.75) ), ( -½W, Q(.25) ), ( ½W, Q(.25) ) and ( ½W, Q(.75) ).
           Draw a line from ( -½W, Q(.5) ) to ( ½W, Q(.5) ) and mark the point (0, x̄) with a '+'.

 STEP 4:   Compute the upper end of the top whisker by finding the largest data value X less than
           Q(.75) + 1.5( Q(.75) - Q(.25) ).  Draw a line from (0, Q(.75)) to (0, X).

           Compute the lower end of the bottom whisker by finding the smallest data value Y greater than
           Q(.25) - 1.5( Q(.75) - Q(.25) ).  Draw a line from (0, Q(.25)) to (0, Y).

 STEP 5:   For all points X* > X, place an asterisk (*) at the point (0, X*).

           For all points Y* < Y, place an asterisk (*) at the point (0, Y*).
                            Box 2.3-6.  Example of a Box and Whiskers Plot

 Consider the following 22 samples of trifluorine (in ppm) listed in order from smallest to largest: 4.0, 6.1, 9.8,
 10.7, 10.8, 11.5, 11.8, 12.4, 12.4, 14.6, 14.7, 14.7, 16.5, 17, 17.5, 20.6, 20.8, 25.7, 25.9, 26.5, 32.0, and 35.5.

 STEP 1:   The data range from 4.0 to 35.5 ppm. This is the range of the vertical axis. Arbitrarily, a width of 4
           will be used for the horizontal axis.

 STEP 2:   Using the formulas in Box 2.2-2, the sample mean = 16.87 and the median = 14.70.  Using Box
           2.2-1, Q(.75) = 20.8 and Q(.25) = 11.5.  Therefore, IQR = 20.8 - 11.5 = 9.3.

 STEP 3:   In the figure, a box has been drawn through the points (-2, 20.8), (-2, 11.5), (2, 11.5), (2, 20.8).  A
           line has been drawn from (-2, 14.7) to (2, 14.7), and the point (0, 16.87) has been marked with a '+'
           sign.

 STEP 4:   Q(.75) + 1.5(9.3) = 34.75. The closest data value to this number, but less than it, is 32.0. Therefore,
           a line has been drawn in the figure from (0, 20.8) to (0, 32.0).

           Q(.25) - 1.5(9.3) = -2.45.  The closest data value to this number, but greater than it, is 4.0. Therefore,
           a line has been drawn in the figure from (0, 4.0) to (0, 11.5).

 STEP 5:   There is only 1 data value greater than 32.0, which is 35.5. Therefore, the point (0, 35.5) has been
           marked with an asterisk. There are no data values less than 4.0.
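
        As an illustrative aid only (not part of the guidance), the following Python sketch computes the
quantities that Box 2.3-5 asks the analyst to draw; the function names are hypothetical, and the percentile
helper reuses the Box 2.2-1 method.

    def percentile(data, p):
        # Percentile method of Box 2.2-1.
        x = sorted(data)
        nt = len(x) * p / 100.0
        j = int(nt)
        g = nt - j
        return (x[j - 1] + x[j]) / 2.0 if g == 0 else x[j]

    def box_plot_statistics(data):
        # The quantities drawn in a box and whiskers plot (Box 2.3-5): quartiles,
        # median, mean, IQR, whisker ends, and any points plotted as asterisks.
        q1, q3 = percentile(data, 25), percentile(data, 75)
        iqr = q3 - q1
        upper = max(v for v in data if v < q3 + 1.5 * iqr)   # top whisker end
        lower = min(v for v in data if v > q1 - 1.5 * iqr)   # bottom whisker end
        return {"mean": sum(data) / len(data), "median": percentile(data, 50),
                "q1": q1, "q3": q3, "iqr": iqr, "whiskers": (lower, upper),
                "outliers": [v for v in data if v > upper or v < lower]}

    data = [4.0, 6.1, 9.8, 10.7, 10.8, 11.5, 11.8, 12.4, 12.4, 14.6, 14.7, 14.7,
            16.5, 17, 17.5, 20.6, 20.8, 25.7, 25.9, 26.5, 32.0, 35.5]
    print(box_plot_statistics(data))   # Q(.25)=11.5, Q(.75)=20.8, IQR=9.3, outlier 35.5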
2.3.4   Ranked Data Plot

        A ranked data plot is a useful graphical representation that is easy to construct, easy to interpret, and
makes no assumptions about a model for the data. The analyst does not have to make any arbitrary choices
regarding the data to construct a ranked data plot (such as cell sizes for a histogram). In addition, a ranked
data plot displays every data point; therefore, it is a graphical representation of the data instead of a summary
of the data. Directions for developing a ranked data plot are given in Box 2.3-7 and an example is given in
Box 2.3-8.

        A ranked data plot is a plot of the data from smallest to largest at evenly spaced intervals (Figure
2.3-4). This graphical representation is very similar to the quantile plot described in section 2.3.5.  A ranked
data plot is marginally easier to generate than a quantile plot; however, a ranked data plot does not contain as
much information as a quantile plot. Both plots can be used to determine the density of the data points and
the skewness of the data; however, a quantile plot contains information on the quartiles of the data whereas a
ranked data plot does not.
                      Figure 2.3-4.  Example of a Ranked Data Plot

        A ranked data plot can be used to determine the density of the data values, i.e., if all the data values
are close to the center of the data with relatively few values in the tails or if there is a large amount of values
in one tail with the rest evenly distributed. The density of the data is displayed through the slope of the graph.
A large amount of data values has a flat slope, i.e., the graph rises slowly.  A small amount of data values has
a large slope, i.e., the graph rises quickly. Thus the analyst can determine where the data lie, either evenly
distributed or in large clusters of points. In Figure 2.3-4, the data rise slowly up to a point where the slope
increases and the graph rises relatively quickly. This means that there is a large amount of small data values
and relatively few large data values.

        A ranked data plot can be used to determine if the data are skewed or if they are symmetric. A
ranked data plot of data that are skewed to the right extends more sharply at the top, giving the graph a
convex shape. A ranked data plot of data that are skewed to the left increases sharply near the bottom, giving
the graph a concave shape. If the data are symmetric, then the top portion of the graph will stretch to the upper
right corner in the same way the bottom portion of the graph stretches to the lower left, creating an s-shape.
Figure 2.3-4 shows a ranked data plot of data that are skewed to the right.


                       Box 2.3-7: Directions for Generating a Ranked Data Plot

    Let X1, X2, ..., Xn represent the n data points. Let X(i), for i = 1 to n, be the data listed in order from
    smallest to largest so that X(1) (i = 1) is the smallest, X(2) (i = 2) is the second smallest, and X(n) (i = n)
    is the largest. To generate a ranked data plot, plot the ordered X values at equally spaced intervals along
    the horizontal axis.

                        Box 2.3-8: Example of Generating a Ranked Data Plot

    Consider the following 22 samples of trifluorine (in ppm): 17.7, 17.4, 22.8, 35.5, 28.6, 17.2, 19.1, 4.9,
    7.2, 4.0, 15.2, 14.7, 14.9, 10.9, 12.4, 12.4, 11.6, 14.7, 10.2, 5.2, 16.5, and 8.9.  The data listed in order
    from smallest to largest, X(i), along with the ordered number of the observation (i), are:

          i     X(i)            i     X(i)
          1      4.0           12     14.7
          2      4.9           13     14.9
          3      5.2           14     15.2
          4      7.2           15     16.5
          5      8.9           16     17.2
          6     10.2           17     17.4
          7     10.9           18     17.7
          8     11.6           19     19.1
          9     12.4           20     22.8
         10     12.4           21     28.6
         11     14.7           22     35.5

    A ranked data plot of these data is a plot of the pairs (i, X(i)).


2.3.5   Quantile Plot

        A quantile plot (Figure 2.3-5) is a graphical representation of the data that is easy to construct, easy
to interpret, and makes no assumptions about a model for the data. The analyst does not have to make any
arbitrary choices regarding the data to construct a quantile plot (such as cell sizes for a histogram). In
addition, a quantile plot displays every data point; therefore, it is a graphical representation of the data
instead of a summary of the data.

        A quantile plot is a graph of the quantiles (section 2.2.1) of the data. The basic quantile plot is
visually identical to a ranked data plot except its horizontal axis varies from 0.0 to 1.0, with each point
plotted according to the fraction of the points it exceeds.  This allows the addition of vertical lines indicating
the quartiles or any other quantiles of interest. Directions for developing a quantile plot are given in Box
2.3-9 and an example is given in Box 2.3-10.
                   Figure 2.3-5. Example of a Quantile Plot of Skewed Data
        A quantile plot can be used to read the quantile information such as the median, quartiles, and the
interquartile range. In addition, the plot can be used to determine the density of the data points, e.g., are all
the data values close to the center with relatively few values in the tails or are there a large amount of values
in one tail with the rest evenly distributed? The density of the data is displayed through the slope of the
graph.  A large amount of data values has a flat slope, i.e., the graph rises slowly. A small amount of data
values has a large slope, i.e., the graph rises quickly. A quantile plot can be used to determine if the data are
skewed or if they are symmetric. A quantile plot of data that are skewed to the right is steeper at the top right
than the bottom left, as in Figure 2.3-5. A quantile plot of data that are skewed to the left increases sharply
near the bottom left of the graph. If the data are symmetric, then the top portion of the graph will stretch to
the upper right corner in the same way the bottom portion of the graph stretches to the lower left, creating an
s-shape.
                           Box 2.3-9: Directions for Generating a Quantile Plot

                Let X1, X2, ..., Xn represent the n data points. To obtain a quantile plot, let X(i), for
                i = 1 to n, be the data listed in order from smallest to largest so that X(1) (i = 1) is
                the smallest, X(2) (i = 2) is the second smallest, and X(n) (i = n) is the largest. For
                each i, compute the fraction fi = (i - 0.5)/n. The quantile plot is a plot of the pairs
                (fi, X(i)), with straight lines connecting consecutive points.
                           Box 2.3-10: Example of Generating a Quantile Plot

    Consider the following 10 data points: 4 ppm, 5 ppm, 6 ppm, 7 ppm, 4 ppm, 10 ppm, 4 ppm, 5 ppm, 7 ppm,
    and 8 ppm. The data ordered from smallest to largest, X(i), are shown in the first column of the table below,
    and the ordered number for each observation, i, is shown in the second column. The third column displays the
    values fi for each i, where fi = (i - 0.5)/n.

        X(i)   i    fi          X(i)   i    fi
         4     1   0.05          6     6   0.55
         4     2   0.15          7     7   0.65
         4     3   0.25          7     8   0.75
         5     4   0.35          8     9   0.85
         5     5   0.45         10    10   0.95

    The pairs (fi, X(i)) are then plotted to yield the following quantile plot.

    [Figure: quantile plot of the pairs (fi, X(i)); horizontal axis is the fraction of data (f-values) from 0 to 1]

    Note that the graph curves upward; therefore, the data appear to be skewed to the right.
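        The quantile plot of Box 2.3-9 can be produced the same way; the sketch below (again assuming Python
with numpy and matplotlib) computes fi = (i - 0.5)/n for the 10 values of Box 2.3-10 and connects consecutive
points with straight lines.

    import numpy as np
    import matplotlib.pyplot as plt

    # 10 data points (ppm) from Box 2.3-10
    data = np.array([4, 5, 6, 7, 4, 10, 4, 5, 7, 8], dtype=float)

    n = len(data)
    x_ordered = np.sort(data)              # X(i), smallest to largest
    f = (np.arange(1, n + 1) - 0.5) / n    # f_i = (i - 0.5)/n

    plt.plot(f, x_ordered, "o-")           # straight lines connect consecutive points
    plt.xlabel("Fraction of Data (f-values)")
    plt.ylabel("Data Values (ppm)")
    plt.title("Quantile plot")
    plt.show()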
2.3.6   Normal Probability Plot (Quantile-Quantile Plot)

        There are two types of quantile-quantile plots, or q-q plots. The first type, an empirical quantile-
quantile plot (section 2.3.7.4), involves plotting the quantiles of two data variables against each other. The
second type of quantile-quantile plot, a theoretical quantile-quantile plot, involves graphing the quantiles of
a set of data against the quantiles of a specific distribution. The following discussion will focus on the most
common of these plots for environmental data, the normal probability plot (the normal q-q plot); however, the
discussion holds for other q-q plots. The normal probability plot is used to roughly determine how well the
data set is modeled by a normal distribution. Formal tests are contained in Chapter 4, section 2. Directions
for developing a normal probability plot are given in Box 2.3-11 and an example is given in Box 2.3-12.

       A normal probability plot is the graph of the quantiles of a data set against the quantiles of the
normal distribution using normal probability graph paper (Figure 2.3-6). If the graph is linear, the data may
be normally distributed.  If the graph is not linear, the departures from linearity give important information
about how the data distribution deviates from a normal distribution.

       If the graph of the normal probability plot is not linear, the graph may be used to determine the
degree of symmetry (or asymmetry) displayed by the data. If the data are skewed to the right, the graph is
convex. If the data are skewed to the left, the graph is concave. If the data in the upper tail fall above and the
data in the lower tail fall below the quartile line, the data are too slender to be well modeled by a normal
distribution, i.e., there are fewer values in the tails of the data set than what is expected from a normal
distribution. If the data in the upper tail fall below and the data in the lower tail fall above the quartile line,
then the tails of the data are too heavy to be well modeled using a normal distribution, i.e., there are more
values in the tails of the data than what is expected from a normal distribution. A normal probability plot can
be used to identify potential outliers. A data value (or a few data values) much larger or much smaller than
the rest will cause the other data values to be compressed into the middle of the graph, ruining the resolution.
                    Box 2.3-11: Directions for Constructing a Normal Probability Plot

    Let X1, X2, ..., Xn represent the n data points.

    STEP 1:  For each data value, compute the absolute frequency, AFi. The absolute frequency is the number
             of times each value occurs. For distinct values, the absolute frequency is 1. For non-distinct
             observations, count the number of times an observation occurs. For example, consider the data 1,
             2, 3, 3. The absolute frequency of value 1 is 1 and the absolute frequency of value 2 is 1. The
             absolute frequency of value 3 is 2 since 3 appears 2 times in the data set.

    STEP 2:  Compute the cumulative frequencies, CFi. The cumulative frequency is the number of data points
             that are less than or equal to Xi, i.e., CFi = AF1 + AF2 + ... + AFi. Using the data given in Step 1,
             the cumulative frequency for value 1 is 1, the cumulative frequency for value 2 is 2 (1+1), and the
             cumulative frequency for value 3 is 4 (1+1+2).

    STEP 3:  Compute Yi = 100 x CFi/(n+1) and plot the pairs (Yi, Xi) using normal probability paper (Figure
             2.3-6). If the graph of these pairs approximately forms a straight line, then the data are probably
             normally distributed. Otherwise, the data may not be normally distributed.
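        Normal probability paper is rarely at hand when data are analyzed by computer. The sketch below
(assuming Python with numpy, scipy, and matplotlib) computes the absolute frequencies, cumulative frequencies,
and Yi values of Box 2.3-11 and then plots the data against the standard normal quantiles of Yi/100, which is one
way to approximate the compression that probability paper applies to the Y axis; the normal quantile transform
used here is an illustration, not part of the directions above.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    # 15 data points from Box 2.3-12
    data = np.array([5, 5, 6, 6, 8, 8, 9, 10, 10, 10, 10, 10, 12, 14, 15], dtype=float)
    n = len(data)

    values, abs_freq = np.unique(data, return_counts=True)   # distinct values and AF_i
    cum_freq = np.cumsum(abs_freq)                            # CF_i
    y = 100.0 * cum_freq / (n + 1)                            # Y_i = 100 * CF_i/(n+1)

    for v, af, cf, yi in zip(values, abs_freq, cum_freq, y):
        print(f"X={v:5.1f}  AF={af}  CF={cf}  Y={yi:6.2f}")

    # Normal probability paper compresses the Y axis by the normal quantile function,
    # so a roughly straight (z, value) relationship suggests the data may be normal.
    z = stats.norm.ppf(y / 100.0)
    plt.plot(z, values, "o")
    plt.xlabel("Standard normal quantile of Y/100")
    plt.ylabel("Data value")
    plt.show()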
                           Box 2.3-12: Example of Normal Probability Plot

    Consider the following 15 data points: 5, 5, 6, 6, 8, 8, 9, 10, 10, 10, 10, 10, 12, 14, and 15.

    STEP 1:  Because the value 5 appears 2 times, its absolute frequency is 2. Similarly, the absolute frequency
             of 6 is 2, of 8 is 2, of 9 is 1, of 10 is 5, etc. These values are shown in the second column of the
             table below.

    STEP 2:  The cumulative frequency of the data value 8 is 6 because there are 2 values of 5, 2 values of 6,
             and 2 values of 8. The cumulative frequencies are shown in the 3rd column of the table.

    STEP 3:  The values Yi = 100 x (CFi/(n+1)) for each data point are shown in column 4 of the table below.
             A plot of these pairs (Yi, Xi) using normal probability paper is also shown below.

        Individual    Absolute         Cumulative
        Xi            Frequency AFi    Frequency CFi       Yi
         5               2                 2             12.50
         6               2                 4             25.00
         8               2                 6             37.50
         9               1                 7             43.75
        10               5                12             75.00
        12               1                13             81.25
        14               1                14             87.50
        15               1                15             93.75

    [Figure: plot of the pairs (Yi, Xi) on normal probability paper; Y on the horizontal axis (2 to 98) and the
    data values on the vertical axis (0 to 20)]
    [Blank normal probability graph paper with a probability scale running from 0.01 to 99.99]
                                                Figure 2.3-6. Normal Probability Paper





2.3.7   Plots for Two or More Variables

        Data often consist of measurements of several characteristics (variables) for each sample point in the
data set. For example, a data set may consist of measurements of weight, sex, and age for each animal in a
sample or may consist of daily temperature readings for several cities. In this case, graphs may be used to
compare and contrast different variables. For example, the analyst may wish to compare and contrast the
temperature readings for different cities, or different sample points (each containing several variables) such
as the height, weight, and sex across individuals in a study.

        To compare and contrast individual data points, some special plots have been developed to display
multiple variables. These plots are discussed in section 2.3.7.1. To compare and contrast several variables,
collections of the single variable displays described in previous sections are useful. For example, the analyst
may generate box and whisker plots or histograms for each variable using the same axis for all of the
variables. Separate plots for each variable may be overlaid on one graph, such as overlaying quantile plots
for each variable on one graph. Another useful technique for comparing two variables is to place the stem
and leaf plots back to back. In addition, some special plots have been developed to display two or more
variables. These plots are described in sections 2.3.7.2 through 2.3.7.4.

        2.3.7.1  Plots for Individual Data Points

        Since it is difficult to visualize data in more than 2 or 3 dimensions, most of the plots developed to
display multiple variables for individual data points involve representing each variable as a distinct piece of a
two-dimensional figure. Some such plots include Profiles, Glyphs, and Stars (Figure 2.3-7). These graphical
representations start with a specific symbol to represent each data point, then modify the various features of
the symbol in proportion to the magnitude of each variable. The proportion of the magnitude is determined
by letting the minimum value for each variable be of length 0, the maximum be of length 1. The remaining
values of each variable are then proportioned based on the magnitude of each value in relation to the
minimum and maximum.
    [Figure: example Profile Plot, Glyph Plot, and Star Plot]
                     Figure 2.3-7. Example of Graphical Representations of Multiple Variables
        A profile plot starts with a line segment of a fixed length. Then lines spaced an equal distance apart
 and extended perpendicular to the line segment represent each variable. A glyph plot uses a circle of fixed
 radius. From the perimeter, parallel rays whose sizes are proportional to the magnitude of the variable extend
 from the top half of the circle. A star plot starts with a point where rays spaced evenly around the circle
 represent each variable and a polygon is then drawn around the outside edge of the rays.
        2.3.7.2  Scatter Plot
        For data sets consisting of paired observations where two or more continuous variables are measured
for each sampling point, a scatter plot is one of the most powerful tools for analyzing the relationship
between two or more variables.  Scatter plots are easy to construct for two variables (Figure 2.3-8) and many
computer graphics packages can construct 3-dimensional scatter plots.  Directions for constructing a scatter
plot for two variables are given in Box 2.3-13 along with an example.

        A scatter plot clearly shows the relationship between two variables. Both potential outliers from a
single variable and potential outliers from the paired variables may be identified on this plot. A scatter plot
also displays the correlation between the two variables. Scatter plots of highly linearly correlated variables
cluster compactly around a straight line. In addition, nonlinear patterns may be obvious on a scatter plot.
For example, consider two variables where one variable is approximately equal to the square of the other.
A scatter plot of this data would display a u-shaped (parabolic) curve. Another important feature that can be
detected using a scatter plot is any clustering effect among the data.
    [Figure: scatter plot of PCE (ppb) on the vertical axis versus Chromium VI (ppb) on the horizontal axis]
                    Figure 2.3-8. Example of a Scatter Plot
              Box 2.3-13: Directions for Generating a Scatter Plot and an Example

    Let X1, X2, ..., Xn represent one variable of the n data points and let Y1, Y2, ..., Yn represent a second variable of
    the n data points. The paired data can be written as (Xi, Yi) for i = 1, ..., n. To construct a scatter plot, plot the
    first variable along the horizontal axis and the second variable along the vertical axis. It does not matter which
    variable is placed on which axis.

    Example: A scatter plot will be developed for the data below. PCE values are displayed on the vertical axis and
    Chromium VI values are displayed on the horizontal axis of Figure 2.3-8.

        PCE     Chromium          PCE     Chromium          PCE     Chromium
        (ppb)   VI (ppb)          (ppb)   VI (ppb)          (ppb)   VI (ppb)
        14.49   3.76              2.23    0.77              4.14    2.36
        37.21   6.92              3.51    1.24              3.26    0.68
        10.78   1.05              6.42    3.48              5.22    0.65
        18.62   6.30              2.98    1.02              4.02    0.68
         7.44   1.43              3.04    1.15              6.30    1.93
        37.84   6.38             12.60    5.44              8.22    3.48
        13.59   5.07              3.56    2.49              1.32    2.73
         4.31   3.56              7.72    3.01              7.73    1.61
                                                            5.88    1.42
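        A scatter plot of these data can be generated directly; the following minimal sketch (assuming Python
with matplotlib) plots the first eight PCE/Chromium VI pairs from the table above, and the remaining pairs can
be entered the same way.

    import matplotlib.pyplot as plt

    # Paired measurements from Box 2.3-13 (first eight pairs shown)
    pce      = [14.49, 37.21, 10.78, 18.62, 7.44, 37.84, 13.59, 4.31]
    chromium = [ 3.76,  6.92,  1.05,  6.30, 1.43,  6.38,  5.07, 3.56]

    plt.scatter(chromium, pce)
    plt.xlabel("Chromium VI (ppb)")
    plt.ylabel("PCE (ppb)")
    plt.title("Scatter plot")
    plt.show()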


        2.3.7.3  Extensions of the Scatter Plot
        It is easy to construct a 2-dimensional scatter plot by hand and many software packages can construct
a useful 3-dimensional scatter plot. However, with more than 3 variables, it is difficult to construct and
interpret a scatter plot. Therefore, several graphical representations have been developed that extend the idea
of a scatter plot for data consisting of 2 or more variables.

        The simplest of these graphical representations is a coded scatter plot. In this case, all possible pairs
of data are given a code and plotted on one scatter plot. For example, consider a data set of 3 variables:
variable A, variable B, and variable C. Using the first variable to designate the horizontal axis, the analyst
may choose to display the pairs (A, B) using an X, the pairs (A, C) using a Y, and the pairs (B, C) using a Z
on one scatter plot. All of the information described above for a scatter plot is also available on a coded
scatter plot. However, this method assumes that the ranges of the three variables are comparable and does not
provide information on three-way or higher interactions between the variables. An example of a coded scatter
plot is given in Figure 2.3-9.

    [Figure: coded scatter plot showing Chromium VI vs. PCE, Atrazine vs. PCE, and Atrazine vs. Chromium VI on one set of axes]
    Figure 2.3-9. Example of a Coded Scatter Plot

        A parallel coordinate plot also extends the idea of a scatter plot to higher dimensions. The parallel
coordinates method employs a scheme where coordinate axes are drawn in parallel (instead of perpendicular).
Consider a sample point X consisting of values X1 for variable 1, X2 for variable 2, and so on up to Xp for
variable p. A parallel coordinate plot is constructed by placing an axis for each of the p variables parallel to
each other and plotting X1 on axis 1, X2 on axis 2, and so on through Xp on axis p and joining these points
with a broken line. This method contains all of the information available on a scatter plot in addition to
information on 3-way and higher interactions (e.g., clustering among three variables). However, for p
variables one must construct (p+1)/2 parallel coordinate plots in order to display all possible pairs of variables.
An example of a parallel coordinate plot is given in Figure 2.3-10.

    [Figure: parallel coordinates plot with one parallel axis per variable and each sample point drawn as a broken line]
Figure 2.3-10. Example of a Parallel Coordinates Plot
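        The construction described above can be sketched in a few lines of code. The example below (assuming
Python with numpy and matplotlib, and using a small hypothetical data set) rescales each variable so that its
minimum maps to 0 and its maximum to 1, draws one parallel axis per variable, and joins each sample point's
values with a broken line.

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical data: rows are sample points, columns are the p variables.
    X = np.array([[14.49, 3.76, 2.2],
                  [37.21, 6.92, 4.1],
                  [10.78, 1.05, 1.3],
                  [18.62, 6.30, 3.0]])
    labels = ["Var 1", "Var 2", "Var 3"]

    # Rescale each variable so its minimum maps to 0 and its maximum to 1.
    mins, maxs = X.min(axis=0), X.max(axis=0)
    scaled = (X - mins) / (maxs - mins)

    axes_pos = np.arange(X.shape[1])          # one parallel axis per variable
    for row in scaled:
        plt.plot(axes_pos, row, marker="o")   # broken line joining a sample's values
    for pos in axes_pos:
        plt.axvline(pos, color="gray", linewidth=0.5)
    plt.xticks(axes_pos, labels)
    plt.ylabel("Scaled value (0 = minimum, 1 = maximum)")
    plt.title("Parallel coordinates plot")
    plt.show()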
        A scatter plot matrix is another useful method of extending scatter plots to higher dimensions. In
this case, a scatter plot is developed for all possible pairs of the variables which are then displayed in a matrix
format. This method is easy to implement and provides a concise method of displaying the individual scatter
plots. However, this method does not contain information on 3-way or higher interactions between variables.
An example of a scatter plot matrix is contained in Figure 2.3-11.
    [Figure: matrix of pairwise scatter plots of Chromium VI (ppb), Atrazine (ppb), and PCE (ppb)]
            Figure 2.3-11. Example of a Matrix Scatter Plot
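        Many statistics packages produce a scatter plot matrix directly. The sketch below (assuming Python with
pandas, numpy, and matplotlib, and using hypothetical data in place of the values plotted in Figure 2.3-11)
builds one scatter plot for every pair of variables and arranges them in a matrix.

    import numpy as np
    import pandas as pd
    from pandas.plotting import scatter_matrix
    import matplotlib.pyplot as plt

    # Hypothetical data frame with one column per variable.
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "PCE (ppb)":         rng.lognormal(1.5, 0.6, 50),
        "Chromium VI (ppb)": rng.lognormal(0.8, 0.7, 50),
        "Atrazine (ppb)":    rng.lognormal(1.0, 0.5, 50),
    })

    # One scatter plot for every pair of variables, arranged in a matrix.
    scatter_matrix(df, figsize=(6, 6), diagonal="hist")
    plt.show()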
        2.3.7.4  Empirical Quantile-Quantile Plot

        An empirical quantile-quantile (q-q) plot involves plotting the quantiles (section 2.2.1) of two data
variables against each other.  This plot is used to compare distributions of two or more variables; for
example, the analyst may wish to compare the distribution of lead and iron samples from a drinking water
well. This plot is similar in concept to the theoretical quantile-quantile plot and yields similar information in
regard to the distribution of two variables instead of the distribution of one variable in relation to a fixed
distribution. Directions for constructing an empirical q-q plot with an example are given in Box 2.3-14.

        An empirical q-q plot is the graph of the quantiles of one variable of a data set against the quantiles
of another variable of the data set. This plot is used to determine how well the distributions of the two
variables match. If the distributions are roughly the same, the graph is linear or close to linear. If the
distributions are not the same, then the graph is not linear. Even if the graph is not linear, the departures from
linearity give important information about how the two data distributions differ. For example, a q-q plot can
be used to compare the tails of the two data distributions in the same manner a normal probability plot was
used to compare the tails of the data to the tails of a normal distribution. In addition, potential outliers (from
the paired data) may be identified on this graph.
                Box 2.3-14: Directions for Constructing an Empirical Q-Q Plot with an Example

     Let X1, X2, ..., Xn represent n data points of one variable and let Y1, Y2, ..., Ym represent a second variable of m
     data points. Let X(i), for i = 1 to n, be the first variable listed in order from smallest to largest so that X(1) (i = 1)
     is the smallest, X(2) (i = 2) is the second smallest, and X(n) (i = n) is the largest. Let Y(i), for i = 1 to m, be the
     second variable listed in order from smallest to largest so that Y(1) (i = 1) is the smallest, Y(2) (i = 2) is the
     second smallest, and Y(m) (i = m) is the largest.

     If m = n: If the two variables have the same number of observations, then an empirical q-q plot of the two
     variables is simply a plot of the ordered values of the variables. Since n = m, replace m by n. A plot of the pairs
     (X(1), Y(1)), (X(2), Y(2)), ..., (X(n), Y(n)) is an empirical quantile-quantile plot.

     If n > m: If the two variables have a different number of observations, then the empirical quantile-quantile plot
     will consist of m (the smaller number) pairs. The empirical q-q plot will then be a plot of the ordered Y values
     against the interpolated X values. For i = 1, i = 2, ..., i = m, let v = (n/m)(i - 0.5) + 0.5 and separate the result
     into the integer part and the fractional part, i.e., let v = j + g where j is the integer part and g is the fractional part.
     If g = 0, plot the pair (Y(i), X(j)). Otherwise, plot the pair (Y(i), (1-g)X(j) + gX(j+1)). A plot of these pairs is an
     empirical quantile-quantile plot.

     Example: Consider two sets of contaminant readings from two separate drinking water wells at the same site.
     The data from well 1 are: 1.32, 3.26, 3.56, 4.02, 4.14, 5.22, 6.30, 7.72, 7.73, and 8.22. The data from well 2
     are: 0.65, 0.68, 0.68, 1.42, 1.61, 1.93, 2.36, 2.49, 2.73, 3.01, 3.48, and 5.44. An empirical q-q plot will be
     used to compare the distributions of these two wells. Since there are 10 observations in well 1 and 12
     observations in well 2, the case for n > m will be used. Therefore, for i = 1, 2, ..., 10, compute:

     i = 1:  v = (12/10)(1 - 0.5) + 0.5 = 1.1, so j = 1 and g = 0.1. Since g is not 0, plot (1.32, (0.9)0.65 + (0.1)0.68) = (1.32, 0.653)
     i = 2:  v = (12/10)(2 - 0.5) + 0.5 = 2.3, so j = 2 and g = 0.3. Since g is not 0, plot (3.26, (0.7)0.68 + (0.3)0.68) = (3.26, 0.68)
     i = 3:  v = (12/10)(3 - 0.5) + 0.5 = 3.5, so j = 3 and g = 0.5. Since g is not 0, plot (3.56, (0.5)0.68 + (0.5)1.42) = (3.56, 1.05)

     Continue this process for i = 4, 5, 6, 7, 8, 9, and 10 to yield the following 10 data pairs: (1.32, 0.653), (3.26,
     0.68), (3.56, 1.05), (4.02, 1.553), (4.14, 1.898), (5.22, 2.373), (6.30, 2.562), (7.72, 2.87), (7.73, 3.339), and
     (8.22, 5.244). These pairs are plotted below, along with the best fitting regression line.

     [Figure: empirical q-q plot of the quantiles of well 2 against the quantiles of well 1, with the fitted regression line]

     This graph indicates the variables behave roughly the same since there are no substantial deviations from the
     fitted line.
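        The interpolation rule in Box 2.3-14 is easy to program. The sketch below (assuming Python with numpy)
applies the n > m case to the two well data sets above and prints the 10 pairs, which should match those listed
in the example.

    import numpy as np

    well1 = np.sort([1.32, 3.26, 3.56, 4.02, 4.14, 5.22, 6.30, 7.72, 7.73, 8.22])   # m = 10 (Y)
    well2 = np.sort([0.65, 0.68, 0.68, 1.42, 1.61, 1.93, 2.36, 2.49, 2.73, 3.01,
                     3.48, 5.44])                                                    # n = 12 (X)

    n, m = len(well2), len(well1)
    pairs = []
    for i in range(1, m + 1):
        v = (n / m) * (i - 0.5) + 0.5
        j = int(v)            # integer part
        g = v - j             # fractional part
        if g == 0:
            x = well2[j - 1]                                   # X(j)
        else:
            x = (1 - g) * well2[j - 1] + g * well2[j]          # (1-g)X(j) + gX(j+1)
        pairs.append((well1[i - 1], x))                        # (Y(i), interpolated X)

    for y, x in pairs:
        print(f"({y:.2f}, {x:.3f})")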
2.3.8   Plots for Temporal Data

        Data collected over specific time intervals (e.g., monthly, biweekly, or hourly) have a temporal
component.  For example, air monitoring measurements of a pollutant may be collected once a minute or once
a day; water quality monitoring measurements of a contaminant level may be collected weekly or monthly.
An analyst examining temporal data may be interested in the trends over time, correlation among time
periods, and cyclical patterns.  Some graphical representations specific to temporal data are the time plot,
correlogram, and variogram.

        Data collected at regular time intervals are called time series. Time series data may be analyzed
using Box-Jenkins modeling and spectral analysis. Both of these methods require a large amount of data
collected at regular intervals and are beyond the scope of this guidance. It is recommended that the interested
reader consult a statistician.

        The graphical representations presented in this section are recommended for all data that have a
temporal component regardless of whether formal statistical time series analysis will be used to analyze the
data.  If the analyst uses a time series methodology, the graphical representations presented below will play
an important role in this analysis.  If the analyst decides not to use time series methodologies, the graphical
representations described below will help identify temporal patterns that need to be accounted for in the
analysis of the data.

        The analyst examining temporal environmental data may be interested in seasonal trends, directional
trends, serial correlation, and stationarity. Seasonal trends are patterns in the data that repeat over time, i.e.,
the data rise and fall regularly over one or more time periods. Seasonal trends may be large scale, such as a
yearly trend where the data show the same pattern of rising and falling over each year, or the trends may be
small scale, such as a daily trend where the data show the same pattern for each day. Directional trends are
downward or upward trends in the data, which are of importance to environmental applications where
contaminant levels may be increasing or decreasing. Serial correlation is a measure of the extent to which
successive observations are related. If successive observations are related, statistical quantities calculated
without accounting for serial correlation may be biased. Finally, another item of interest for temporal data is
stationarity (cyclical patterns). Stationary data look the same over all time periods. Directional trends and
increasing (or decreasing) variability among the data imply that the data are not stationary.

        Temporal data are sometimes used in environmental applications in conjunction with a statistical
hypothesis test to determine if contaminant levels have changed. If the hypothesis test does not account for
temporal trends or seasonal variations, the data must achieve a "steady state" before the hypothesis test may
be performed. Therefore, the data must be essentially the same for comparable periods of time both before
and after the hypothesized time of change.

        Sometimes multiple observations are taken in each time period. For example, the sampling design
may specify selecting 5 samples every Monday for 3 months. If this is the case, the time plot described in
section 2.3.8.1 may be used to display the data, display the mean weekly level, display a confidence interval
for each mean, or display a confidence interval for each mean with the individual data values. A time plot of
all the data can be used to determine if the variability for the different time periods changes. A time plot of
the means can be used to determine if the means are possibly changing between time periods.  In addition,
each time period may be treated as a distinct variable and the methods of section 2.3.7 may be applied.
        2.3.8.1  Time Plot
        One of the simplest plots to generate that provides a large amount of information is a time plot. A
time plot is a plot of the data over time. This plot makes it easy to identify large-scale and small-scale trends
over time. Small-scale trends show up on a time plot as fluctuations in smaller time periods. For example,
ozone levels over the course of one day typically rise until the afternoon, then decrease, and this process is
repeated every day. Larger scale trends, such as seasonal fluctuations, appear as regular rises and drops in
the graph. For example, ozone levels tend to be higher in the summer than in the winter, so ozone data tend to
show both a daily trend and a seasonal trend. A time plot can also show directional trends and increased
variability over time. Possible outliers may also be easily identified using a time plot.
    [Figure: time plot of data values plotted against time]
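        A time plot requires only the data values and their collection times. The sketch below (assuming Python
with numpy and matplotlib, and using a hypothetical daily series with a seasonal cycle and a slight upward trend)
illustrates the kind of plot described above.

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical daily measurements with a seasonal cycle and a slight upward trend.
    rng = np.random.default_rng(1)
    days = np.arange(365)
    values = 10 + 0.01 * days + 3 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 1, 365)

    plt.plot(days, values, marker=".", linestyle="-")
    plt.xlabel("Time (days)")
    plt.ylabel("Data value")
    plt.title("Time plot")
    plt.show()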
        2.3.8.2  Plot of the Autocorrelation Function (Correlogram)

        Serial correlation is a measure of the extent to which successive observations are related. If
successive observations are related, either the data must be transformed or this relationship must be
accounted for in the analysis of the data. The correlogram is a plot that is used to display serial correlation
when the data are collected at equally spaced time intervals. The autocorrelation function is a summary of the
serial correlations of data. The 1st autocorrelation coefficient (r1) is the correlation between points that are 1
time unit (k1) apart; the 2nd autocorrelation coefficient (r2) is the correlation between points that are 2 time
units (k2) apart; etc. A correlogram (Figure 2.3-13) is a plot of the sample autocorrelation coefficients in
which the values of k versus the values of rk are displayed. Directions for constructing a correlogram are
contained in Box 2.3-16; example calculations are contained in Box 2.3-17. For large sample sizes, a
correlogram is tedious to construct by hand; therefore, software like DataQUEST (QA/G-9D) should be used.
        The correlogram is used for modeling time series data and may be used to determine if serial
correlation is large enough to create problems in the analysis of temporal data using other methodologies
besides formal time series methodologies. A quick method for determining if serial correlation is large is to
place horizontal lines at ±2/√n on the correlogram (shown as dashed lines on Figure 2.3-13). Autocorrelation
coefficients that exceed this value require further investigation.

        In application, the correlogram is only useful for data at equally spaced intervals. To relax this
restriction, a variogram may be used instead. The variogram displays the same information as a correlogram
except that the data may be based on unequally spaced time intervals. For more information on the
construction and uses of the variogram, consult a statistician.
    [Figure: correlogram of rk versus k, with dashed horizontal lines at ±2/√n]
 Figure 2.3-13. Example of a Correlogram
                         Box 2.3-16: Directions for Constructing a Correlogram

  Let X1, X2, ..., Xn represent the data points ordered by time for equally spaced time points, i.e., X1 was collected at
  time 1, X2 was collected at time 2, and so on. To construct a correlogram, first compute the sample autocorrelation
  coefficients. So for k = 0, 1, ..., compute rk where

                  rk = gk / g0    and    gk = (sum from t = k+1 to n of Xt·Xt-k) - (n-k)·X̄²

  Once the rk have been computed, a correlogram is the graph of (k, rk) for k = 0, 1, ..., and so on. As an
  approximation, compute up to approximately k = n/6. Also, note that r0 = 1. Finally, place horizontal lines at ±2/√n.
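        The following sketch (assuming Python with numpy and matplotlib) computes gk and rk as defined above
and draws the correlogram with the ±2/√n lines; applied to the hourly readings in Box 2.3-17 below, it reproduces
r1 = 0.44, r2 = 0.06, and r3 = -0.19.

    import numpy as np
    import matplotlib.pyplot as plt

    def correlogram(x, max_k=None):
        """Sample autocorrelation coefficients rk = gk/g0 as defined in Box 2.3-16."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        xbar = x.mean()
        if max_k is None:
            max_k = n // 2
        g = [np.sum(x[k:] * x[:n - k]) - (n - k) * xbar**2 for k in range(max_k + 1)]
        return np.array(g) / g[0]

    # Hourly readings from Box 2.3-17 (ppb)
    readings = [16, 14, 12, 9, 10, 7, 7, 5]
    r = correlogram(readings, max_k=3)
    print(r)   # expected: [1.0, 0.44, 0.06, -0.19]

    k = np.arange(len(r))
    plt.stem(k, r)
    plt.axhline( 2 / np.sqrt(len(readings)), linestyle="--")
    plt.axhline(-2 / np.sqrt(len(readings)), linestyle="--")
    plt.xlabel("k")
    plt.ylabel("rk")
    plt.show()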
                     Box 2.3-17: Example Calculations for Generating a Correlogram

  Discharge readings of tin were taken at hourly intervals from a standard monitoring point at a pond that collects
  run-off water.

  Time:        8 a.m.   9 a.m.   10 a.m.   11 a.m.   12 p.m.   1 p.m.   2 p.m.   3 p.m.
  Tin (ppb):     16       14       12         9        10        7        7        5

  The mean of these readings is 10 ppb. Calculation of gk is simplified by using Table 2.3-1.

      g0 = (sum from t=1 to 8 of Xt²) - 8·X̄²          = 900 - (8)(10)² = 100
      g1 = (sum from t=2 to 8 of Xt·Xt-1) - (8-1)·X̄²  = 744 - (7)(10)² = 44
      g2 = (sum from t=3 to 8 of Xt·Xt-2) - (8-2)·X̄²  = 606 - (6)(10)² = 6
      g3 = (sum from t=4 to 8 of Xt·Xt-3) - (8-3)·X̄²  = 481 - (5)(10)² = -19

  For higher values of k, there are few readings available to compute gk, so gk is not meaningful. It follows that

      r0 = g0/g0 = 1,   r1 = 44/100 = 0.44,   r2 = 6/100 = 0.06,   r3 = -19/100 = -0.19.

  The correlogram is shown below.

  [Figure: correlogram of rk versus k (hours) for k = 0 to 3, with horizontal dashed lines at +0.707 and -0.707]

  In this case, it appears that the observations are not serially correlated because all of the correlogram points are
  within the bounds of ±2/√n (±0.707). In Figure 2.3-13, if k represents months, then the correlogram shows a yearly
  correlation between data points since the points at k=12 and k=24 are out of the bounds of ±2/√n. This correlation
  will need to be accounted for when the data are analyzed.
                                      Table 2.3-1. Table for Calculating a Correlogram

  i    Xi  Xi-1 Xi-2 Xi-3 Xi-4 Xi-5 Xi-6 Xi-7 |  Xi²  XiXi-1 XiXi-2 XiXi-3 XiXi-4 XiXi-5 XiXi-6 XiXi-7
  1    16    -    -    -    -    -    -    -  |  256      -      -      -      -      -      -      -
  2    14   16    -    -    -    -    -    -  |  196    224      -      -      -      -      -      -
  3    12   14   16    -    -    -    -    -  |  144    168    192      -      -      -      -      -
  4     9   12   14   16    -    -    -    -  |   81    108    126    144      -      -      -      -
  5    10    9   12   14   16    -    -    -  |  100     90    120    140    160      -      -      -
  6     7   10    9   12   14   16    -    -  |   49     70     63     84     98    112      -      -
  7     7    7   10    9   12   14   16    -  |   49     49     70     63     84     98    112      -
  8     5    7    7   10    9   12   14   16  |   25     35     35     50     45     60     70     80
  Total                                       |  900    744    606    481    387    270    182     80
2.3.9   Plots for Spatial Data

        The graphical representations of the preceding sections may be useful for exploring spatial data.
However, an analyst examining spatial data may be interested in the location of extreme values, overall
spatial trends, and the degree of continuity among neighboring locations. Graphical representations for
spatial data include postings, symbol plots, correlograms, h-scatter plots, and contour plots.

        The graphical representations presented in this section are recommended for all spatial data
regardless of whether or not geostatistical methods will be used to analyze the data.  The graphical
representations described below will help identify spatial patterns that need to be accounted for in the analysis
of the data. If the analyst uses geostatistical methods such as kriging to analyze the data, the graphical
representations presented below will play an important role in geostatistical analysis.

        2.3.9.1  Posting Plots

        A posting plot (Figure 2.3-14) is a map of data locations along with corresponding data values. Data
posting may reveal obvious errors in data location and identify data values that may be in error. The graph of
the sampling locations gives the analyst an idea of how the data were collected (i.e., the sampling design),
areas that may have been inaccessible, and areas of special interest to the decision maker which may have
been heavily sampled.  It is often useful to mark the highest and lowest values of the data to see if there are
any obvious trends. If all of the highest concentrations fall in one region of the plot,  the analyst may consider
some method such as post-stratifying the data (stratification after the data are collected and analyzed) to
account for this fact in the analysis.  Directions for generating a posting of the data (a posting plot) are
contained in Box 2.3-18.
    [Figure: map of the site with the measured concentration (ppm) posted at each sampling location]
                    Figure 2.3-14. Example of a Posting Plot

        2.3.9.2  Symbol Plots

        For large amounts of data, a posting plot may not be feasible and a symbol plot (Figure 2.3-15) may
be used. A symbol plot is basically the same as a posting plot of the data, except that instead of posting
individual data values, symbols are posted for ranges of the data values.  For example, the symbol '0' could
represent all concentration levels less than 100 ppm, the symbol '1' could represent all concentration levels
between 100 ppm and 200 ppm, etc. Directions for generating a symbol plot are contained in Box 2.3-18.

    [Figure: map of the site with a symbol posted at each sampling location according to the range of the data value]
                     Figure 2.3-15. Example of a Symbol Plot
                             Box 2.3-18: Directions for Generating a Posting Plot and a Symbol Plot
                                          with an Example

    On a map of the site, plot the location of each sample point.
        2.3.9.3  Other Spatial Graphical Representations

        The two plots described in sections 2.3.9.1 and 2.3.9.2 provide information on the location of
extreme values and spatial trends. The graphs below provide another item of interest to the data analyst,
continuity of the spatial data. The graphical representations are not described in detail because they are used
more for preliminary geostatistical analysis. These graphical representations can be difficult to develop and
interpret. For more information on these representations, consult a statistician.

        An h-scatter plot is a plot of all possible pairs of data whose locations are separated by a fixed
distance in a fixed direction (indexed by h). For example, an h-scatter plot could be based on all the pairs
whose locations are 1 meter apart in a southerly direction. An h-scatter plot is similar in appearance to a
scatter plot (section 2.3.7.2). The shape of the spread of the data in an h-scatter plot indicates the degree of
continuity among data values a certain distance apart in a particular direction. If all the plotted values fall close
to a fixed line, then the data values at locations separated by a fixed distance in a fixed direction are very
similar. As data values become less and less similar, the spread of the data around the fixed line increases
outward. The data analyst may construct several h-scatter plots with different distances to evaluate the
change in continuity in a fixed direction.

        A correlogram is a plot of the correlations of the h-scatter plots. Because the h-scatter plot only
displays the correlation between the pairs of data whose locations are separated by a fixed distance in a fixed
direction, it is useful to have a graphical representation of how these correlations change for different
separation distances in a fixed direction. The correlogram is such a plot which allows the analyst to evaluate
the change in continuity in a fixed direction as a function of the distance between two points. A spatial
correlogram is similar in appearance to a temporal correlogram (section 2.3.8.2). The correlogram spans
opposite directions so that the correlogram with a fixed distance of due north is identical to the correlogram
with a fixed distance of due south.

        Contour plots are used to reveal overall spatial trends in the data by interpolating data values
between sample locations.  Most contour procedures depend on the density of the grid covering the sampling
area (higher density grids usually provide more information than lower densities).  A contour plot gives one
of the best overall pictures of the important spatial features. However, contouring often requires that the
actual fluctuations in the data values are smoothed so that many spatial features of the data may not be
visible.  The contour map should be used with other graphical representations of the data and requires expert
judgement to adequately interpret the findings.
                                         CHAPTER 3
                       STEP 3: SELECT THE STATISTICAL TEST

    [Flowchart: The Data Quality Assessment Process — Review DQOs and Sampling Design; Conduct Preliminary
    Data Review; Select the Statistical Test; Verify the Assumptions; Draw Conclusions From the Data. Select the
    Statistical Test: select an appropriate procedure for analyzing data based on the preliminary data review —
    select statistical hypothesis test; identify assumptions underlying test; hypothesis tests for a single population;
    hypothesis tests for comparing two populations.]
                                  Step 3: Select the Statistical Test

        o   Select the statistical hypothesis test based on the data user's objectives and the results of the
            preliminary data review.
            o  If the problem involves comparing study results to a fixed threshold, such as a regulatory
               standard, consider the hypothesis tests in section 3.2.
            o  If the problem involves comparing two populations, such as comparing data from two different
               locations or processes, then consider the hypothesis tests in section 3.3.

        o   Identify the assumptions underlying the statistical test.
            o  List the key underlying assumptions of the statistical hypothesis test, such as distributional form,
               dispersion, independence, or others as applicable.
            o  Note any sensitive assumptions where relatively small deviations could jeopardize the validity of
               the test results.
                         STEP 3: SELECT THE STATISTICAL TEST

    Parameter                      Test                                Section    Directions              Example
    Mean                           One-Sample t-Test                   3.2.1.1    Box 3.2-1, Box 3.2-3    Box 3.2-2, Box 3.2-4
                                   Wilcoxon Signed Rank Test           3.2.1.2    Box 3.2-5, Box 3.2-7    Box 3.2-6
    Proportion/Percentile          One-Sample Proportion Test          3.2.2.1    Box 3.2-8               Box 3.2-9
    Two Means                      Two-Sample t-Test                   3.3.1.1    Box 3.3-1               Box 3.3-2
                                   Satterthwaite's Two-Sample t-Test   3.3.1.2    Box 3.3-3               Box 3.3-4
    Two Proportions/Two            Two-Sample Test for Proportions     3.3.2.1    Box 3.3-5               Box 3.3-6
    Percentiles
    Non-Parametric Comparison      Wilcoxon Rank Sum Test              3.3.3.1    Box 3.3-7, Box 3.3-9    Box 3.3-8
    of Two Populations             Quantile Test                       3.3.3.2

Box No.                                                                                         Page
3.2-1: Directions for a One-Sample t-Test for Simple and Systematic Random Samples
       with or without Compositing ............................................................ 3.2 - 3
3.2-2: An Example of a One-Sample t-Test for a Simple Random or Composite Sample ............... 3.2 - 4
3.2-3: Directions for a One-Sample t-Test for a Stratified Random Sample ....................... 3.2 - 5
3.2-4: An Example of a One-Sample t-Test for a Stratified Random Sample ........................ 3.2 - 6
3.2-5: Directions for a Wilcoxon Signed Rank Test for Simple and Systematic Random Samples ..... 3.2 - 8
3.2-6: An Example of the Wilcoxon Signed Rank Test for a Simple Random Sample .................. 3.2 - 9
3.2-7: Directions for the Large Sample Approximation to the Wilcoxon Signed Rank Test
       for Simple and Systematic Random Samples ................................................ 3.2 - 10
3.2-8: Directions for the One-Sample Test for Proportions for Simple and Systematic
       Random Samples .......................................................................... 3.2 - 12
3.2-9: An Example of the One-Sample Test for Proportions for a Simple Random Sample ............ 3.2 - 13
3.3-1: Directions for the Student's Two-Sample t-Test (Equal Variances)
       for Simple and Systematic Random Samples ................................................ 3.3 - 3
3.3-2: An Example of a Student's Two-Sample t-Test (Equal Variances) ........................... 3.3 - 4
3.3-3: Directions for Satterthwaite's t-Test (Unequal Variances)
       for Simple and Systematic Random Samples ................................................ 3.3 - 5
3.3-4: An Example of Satterthwaite's t-Test for Simple and Systematic Random Samples ........... 3.3 - 6
3.3-5: Directions for a Two-Sample Test for Proportions for Simple and Systematic Samples ...... 3.3 - 8
3.3-6: An Example of a Two-Sample Test for Proportions for Simple and Systematic Samples ....... 3.3 - 9
3.3-7: Directions for the Wilcoxon Rank Sum Test for Simple and Systematic Samples ............. 3.3 - 11
3.3-8: An Example of the Wilcoxon Rank Sum Test for Simple and Systematic Samples .............. 3.3 - 12
3.3-9: Directions for the Large Sample Approximation to the Wilcoxon Rank Sum Test
       for Simple and Systematic Random Samples ................................................ 3.3 - 13
                                         CHAPTER 3
                      STEP 3: SELECT THE STATISTICAL TEST

3.1     OVERVIEW AND ACTIVITIES

        This chapter provides information that the analyst can use in selecting an appropriate statistical
hypothesis test that will be used to draw conclusions from the data. A brief review of hypothesis testing is
contained in Chapter 1, "Developing DQOs Retrospectively." There are two important outputs from this
step: (1) the test itself, and (2) the assumptions underlying the test that determine the validity of conclusions
drawn from the test results.

        This section describes the two primary activities in this step of the DQA Process. The remaining
sections in this chapter contain statistical tests that may be useful for analyzing environmental data. In the
one-sample tests discussed in section 3.2, data from a population are compared with an absolute criterion
such as a regulatory threshold or action level. In the two-sample tests discussed in section 3.3, data from a
population are compared with data from another population (for example, an area expected to be
contaminated might be compared with a background area). For each statistical test, this chapter presents its
purpose, assumptions, limitations, robustness, and the sequence of steps required to apply the test.

        The directions for each hypothesis test given in this chapter are for simple random sampling and
systematic sampling designs, except where noted otherwise. If a more complex design is used
(such as a stratified design or a composite random sampling design) then different formulas are needed, some
of which are contained in this chapter.

3.1.1   Select Statistical Hypothesis Test

        If a particular test has been specified either in the DQO Process, the Quality Assurance Project Plan,
or by the particular program or study, the analyst should use the results of the preliminary data review to
determine if this statistical test is legitimate for the data collected. If the test is not legitimate, the analyst
should document why this particular statistical test should not be applied to the data and then select a
different test, possibly after consultation with the decision maker. If a particular test has not been specified,
the analyst should select a statistical test based on the data user's objectives, preliminary data review, and
likely viable assumptions.

3.1.2   Identify Assumptions Underlying the Statistical Test

        All statistical tests make assumptions about the data. Parametric tests assume the data have some
distributional form (e.g., the t-test assumes a normal distribution), whereas nonparametric tests do not make
this assumption (e.g., the Wilcoxon test only assumes the data are symmetric but not necessarily normal).
However, both parametric and nonparametric tests may assume that the data are statistically independent or
that there are no trends in the data. While examining the data, the analyst should always list the underlying
assumptions of the statistical hypothesis test, such as distribution, dispersion, or others as applicable.

        Another important feature of statistical tests is their sensitivities (nonrobustness) to departures from
the assumptions. A statistical procedure is called robust if its performance is not seriously affected by
moderate deviations from its underlying assumptions. The analyst should note any sensitive assumptions
where relatively small deviations could jeopardize the validity of the test results.


3.2    TESTS OF HYPOTHESES ABOUT A SINGLE POPULATION

        A one-sample test involves the comparison of a population parameter (e.g., a mean, percentile, or
variance) to a threshold value. Both the threshold value and the population parameter were specified during
Step 1: Review DQOs and Sampling Design. In a one-sample test, the threshold value is a fixed number that
does not vary. If the threshold value was estimated (and therefore contains variability), a one-sample test is
not appropriate. An example of a one-sample test would be to determine if 95% of all companies emitting
sulfur dioxide into the air are below a fixed discharge level. For this example, the population parameter is a
percentage (proportion) and the threshold value is 95% (.95). Another example is a common Superfund
problem that involves comparing the mean contaminant concentration to a risk-based standard. In this case,
the risk-based standard (which is fixed) is the threshold value and the statistical parameter is the true mean
contaminant concentration level of the site. However, comparing the mean concentration in an area to the
mean concentration of a reference area (background) would not be a one-sample test because the mean
concentration in the reference area would need to be estimated.

        The statistical tests discussed in this section may be used to determine if θ ≤ θ0 or θ > θ0, where θ
represents either the population mean, median, a percentile, or a proportion and θ0 represents the threshold
value. Section 3.2.1 discusses tests concerning the population mean, section 3.2.2 discusses tests concerning
a proportion or percentile, and section 3.2.3 discusses tests for a median.

3.2.1   Tests for a Mean

        A population mean is a measure of the center of the population distribution. It is one of the most
commonly used population parameters in statistical hypothesis testing because its distribution is well known
for large sample sizes. The hypotheses considered in this section are:

        Case 1: H0: μ ≤ C vs. HA: μ > C; and

        Case 2: H0: μ ≥ C vs. HA: μ < C.

To perform these tests, the analyst needs the null and alternative hypotheses (Case 1 or Case 2); the threshold
value C; a value μ1 > C for Case 1 or a value μ1 < C for Case 2 representing the bound of the gray region; the
false positive error rate α at C; the false negative error rate β at μ1; and any additional limits on decision
errors. It may be helpful to label any additional false positive error limits as α2 at C2, α3 at C3, etc., and to
label any additional false negative error limits as β2 at μ2, β3 at μ3, etc. For example, consider the following
decision: determine whether the mean contaminant level at a waste site is greater than 10 ppm. The null
hypothesis is H0: μ ≥ 10 ppm and the alternative hypothesis is HA: μ < 10 ppm. A gray region has been set
from 10 to 8 ppm, a false positive error rate of 5% has been set at 10 ppm, and a false negative error rate of
10% has been set at 8 ppm. Thus, C = 10 ppm, μ1 = 8 ppm, α = 0.05, and β = 0.1. If an additional false
negative error rate was set, for example, an error rate of 1% at 4 ppm, then β2 = 0.01 and μ2 = 4 ppm.
        3.2.1.1  The One-Sample t-Test

        PURPOSE

        Given a random sample of size n (or a composite sample of size n, each composite consisting of k
aliquots), the one-sample t-test can be used to test hypotheses involving the mean (μ) of the population from
which the sample was selected.

       ASSUMPTIONS AND THEIR VERIFICATION

       The primary assumptions required for validity of the one-sample t-test are that of a random sample
(independence of the data values) and that the sample mean X is approximately normally distributed.
Because the sample mean and standard deviation are very sensitive to outliers, the t-test should be preceded
by a test for outliers (see section 4.4).

       Approximate normality of the sample mean follows from approximate normality of the data values.
In addition, the Central Limit Theorem states that the sample mean of a random sample from a population
with an unknown distribution will be approximately normally distributed provided the sample size is large.
This means that although the population distribution from which the data are drawn can be distinctly different
from the normal distribution, the distribution of the sample mean can still be approximately normal when the
sample size is relatively large. Although preliminary tests for normality of the data can and should be done
for small sample sizes, the conclusion that the sample does not follow a normal distribution does not
automatically invalidate the t-test, which is robust to moderate violations of the assumption of normality for
large sample sizes.

       LIMITATIONS AND ROBUSTNESS

       The t-test is not robust to outliers because the sample mean and standard deviation are influenced
greatly by outliers.  The Wilcoxon signed rank test (see section 3.2.1.2) is more robust, but is slightly less
powerful This means that the Wilcoxon signed rank test is slightly less likely to reject the null hypothesis
when it is false than the t-test

       The t-test has difficulty dealing with less-than values, e.g., values below the detection limit,
compared with tests based on ranks or proportions. Tests based on a proportion above a given threshold
(section 3.2.2) are more valid in such a case, if the threshold is above the detection limit.  It is also possible to
substitute values for below detection-level data (e.g., one-half the detection level) or to adjust the statistical
quantities to account for nondetects (e.g., Cohen's Method for normally or lognormally distributed data). See
Chapter 4 for more information on dealing with data that are below the detection level.

        SEQUENCE OF STEPS

        Directions for a one-sample t-test for simple, systematic, and composite random samples are given
in Box 3.2-1 and an example is given in Box 3.2-2. Directions for a one-sample t-test for a stratified random
sample are given in Box 3.2-3 and an example is given in Box 3.2-4.
                             Box 3.2-1: Directions for a One-Sample t-Test
                              for Simple and Systematic Random Samples
                                      with or without Compositing

    Let X1, X2, ..., Xn represent the n data points. These could be either n individual samples or n composite
    samples consisting of k aliquots each. These are the steps for a one-sample t-test for Case 1 (H0: μ ≤ C);
    modifications for Case 2 (H0: μ ≥ C) are given in braces {}.

    STEP 1:   Calculate the sample mean X̄ (section 2.2.2) and the standard deviation s (section 2.2.3).

    STEP 2:   Use Table A-1 of Appendix A to find the critical value t(1-α) such that 100(1-α)% of the t distribution
              with n-1 degrees of freedom is below t(1-α). For example, if α = 0.05 and n = 16, then n-1 = 15
              and t(1-α) = 1.753.

    STEP 3:   Calculate the sample value t = (X̄ - C) / (s/√n).

    STEP 4:   Compare t with t(1-α).

              1) If t > t(1-α) {t < -t(1-α)}, the null hypothesis may be rejected. Go to Step 6.
              2) If t is not greater than t(1-α) {not less than -t(1-α)}, there is not enough evidence to reject the null
              hypothesis and the false negative error rate should be verified. Go to Step 5.

    STEP 5:   As the null hypothesis (H0) was not rejected, calculate either the power of the test or the sample
              size necessary to achieve the false positive and false negative error rates. To calculate the
              power, assume that the true values for the mean and standard deviation are those obtained in the
              sample and use a software package like the Decision Error Feasibility Trials (DEFT) software (EPA
              QA/G-4D, 1994) or the Data Quality Evaluation Statistical Toolbox (DataQUEST) software (EPA
              QA/G-9D, 1996) to generate the power curve of the test.

              If only one false negative error rate (β) has been specified (at μ1), it is possible to calculate the
              sample size which achieves the DQOs, assuming the true mean and standard deviation are equal
              to the values estimated from the sample, instead of calculating the power of the test. To do this,
              calculate

                  m = s²(z(1-α) + z(1-β))² / (μ1 - C)²  +  (0.5)z(1-α)²

              where z(p) is the pth percentile of the standard normal distribution (Table A-1 of Appendix A).
              Round m up to the next integer. If m ≤ n, the false negative error rate has been satisfied. If m > n,
              the false negative error rate has not been satisfied.

    STEP 6:   The results of the test may be:

                   1) the null hypothesis was rejected and it seems that the true mean is greater than C {less
                   than C};

                   2) the null hypothesis was not rejected and the false negative error rate was satisfied and it
                   seems that the true mean is less than C {greater than C}; or

                   3) the null hypothesis was not rejected and the false negative error rate was not satisfied
                   and it seems that the true mean is less than C {greater than C} but conclusions are
                   uncertain since the sample size was too small.

              Report the results of the test, the sample size, sample mean, standard deviation, t and t(1-α).

    Note: The calculations for the t-test are the same for both simple random and composite random sampling.
    The use of compositing will usually result in a smaller value of s than simple random sampling.
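        Steps 1 through 4 of Box 3.2-1 can be checked with a short script. The sketch below (assuming Python
with numpy and scipy) uses the nine values from Box 3.2-2 that follows, with Case 1, C = 95 ppm, and α = 0.05;
the critical value from scipy takes the place of Table A-1, and the false negative check of Step 5 is not automated
here.

    import numpy as np
    from scipy import stats

    # Nine sample values (ppm) from Box 3.2-2; Case 1, H0: mu <= 95 vs. HA: mu > 95.
    data = np.array([82.39, 103.43, 104.93, 105.52, 98.37,
                     113.23, 86.62, 91.72, 108.21])
    C, alpha = 95.0, 0.05

    n = len(data)
    xbar = data.mean()
    s = data.std(ddof=1)                       # Step 1: sample mean and standard deviation
    t_crit = stats.t.ppf(1 - alpha, df=n - 1)  # Step 2: critical value t(1-alpha)
    t_stat = (xbar - C) / (s / np.sqrt(n))     # Step 3: sample value t

    print(f"xbar = {xbar:.2f}, s = {s:.2f}")
    print(f"t = {t_stat:.2f}, critical value = {t_crit:.2f}")
    if t_stat > t_crit:                        # Step 4 (Case 1)
        print("Reject H0: the true mean appears to be greater than C.")
    else:
        print("Do not reject H0; verify the false negative error rate (Step 5).")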
                             Box 3.2-2: An Example of a One-Sample t-Test
                               for a Simple Random or Composite Sample

    Consider the following 9 random (or composite samples each of k aliquots) data points: 82.39 ppm, 103.43
    ppm, 104.93 ppm, 105.52 ppm, 98.37 ppm, 113.23 ppm, 86.62 ppm, 91.72 ppm, and 108.21 ppm. This
    data will be used to test the hypothesis: H0: μ ≤ 95 ppm vs. HA: μ > 95 ppm. The decision maker has
    specified a 5% false positive decision error limit (α) at 95 ppm (C), and a 20% false negative decision error
    limit (β) at 105 ppm (μ1).

    STEP 1:   In Boxes 2.3-3 and 2.3-5 of Chapter 2, it was found that X̄ = 99.38 ppm and s = 10.41 ppm.

    STEP 2:   Using Table A-1 of Appendix A, the critical value of the t distribution with 8 degrees of freedom is
              t(0.95) = 1.86.

    STEP 3:   t = (X̄ - C)/(s/√n) = (99.38 - 95)/(10.41/√9) = 1.26

    STEP 4:   Because 1.26 is not greater than 1.86, there is not enough evidence to reject the null hypothesis
              and the false negative error rate should be verified.

    STEP 5:   Because there is only one false negative error rate, it is possible to use the sample size formula to
              determine if the error rate has been satisfied. Therefore,

                  m = s²(z(1-α) + z(1-β))² / (μ1 - C)²  +  (0.5)z(1-α)²
                    = (10.41)²(1.645 + 0.842)² / (95 - 105)²  +  (0.5)(1.645)²  =  8.06, i.e., 9

              Notice that it is customary to round upwards when computing a sample size. Since m = n, the
              false negative error rate has been satisfied.

    STEP 6:   The results of the hypothesis test were that the null hypothesis was not rejected but the false
              negative error rate was satisfied. Therefore, it seems that the true mean is less than 95 ppm.
EPA QA/G-9
                                 3.2-4
QA96

-------
                         Box 3.2-3: Directions for a One-Sample t-Test
                                for a Stratified Random Sample

Let h = 1, 2, 3, ..., L represent the L strata and n_h represent the sample size of stratum h.  These steps are for
a one-sample t-test for Case 1 (H₀: μ ≤ C); modifications for Case 2 (H₀: μ ≥ C) are given in braces {}.

STEP 1:    Calculate the stratum weights (W_h) by calculating the proportion of the volume in stratum h:

                W_h = V_h / Σ_{h=1}^{L} V_h

           where V_h is the surface area of stratum h multiplied by the depth of sampling in stratum h.

STEP 2:    For each stratum, calculate the sample stratum mean

                X̄_h = Σ_{i=1}^{n_h} X_{hi} / n_h

           and the sample stratum standard error

                s_h² = Σ_{i=1}^{n_h} (X_{hi} - X̄_h)² / (n_h - 1).

STEP 3:    Calculate the overall mean and variance:

                X̄_ST = Σ_{h=1}^{L} W_h X̄_h     and     s²_ST = Σ_{h=1}^{L} W_h² s_h² / n_h.

    STEP 4:    Calculate the degrees of freedom (dof):

                dof = (s²_ST)² / Σ_{h=1}^{L} [ W_h⁴ s_h⁴ / (n_h²(n_h - 1)) ].

               Use Table A-1 of Appendix A to find the critical value t_{1-α} such that 100(1-α)% of the t distribution
               with the above degrees of freedom (rounded to the next highest integer) is below t_{1-α}.

    STEP 5:    Calculate the sample value:

                t = (X̄_ST - C) / √(s²_ST).

    STEP 6:    Compare t to t_{1-α}.  If t > t_{1-α} {t < -t_{1-α}}, the null hypothesis may be rejected.  Go to Step 8.  If t
               is not greater than t_{1-α} {not less than -t_{1-α}}, there is not enough evidence to reject the null
               hypothesis and the false negative error rate should be verified.  Go to Step 7.

    STEP 7:    If the null hypothesis was not rejected, calculate either the power of the test or the sample size
               necessary to achieve the false positive and false negative error rates (see Step 5, Box 3.2-1).

    STEP 8:    The results of the test may be:

                    1) the null hypothesis was rejected and it seems that the true mean is greater than C {less
                    than C};

                    2) the null hypothesis was not rejected and the false negative error rate was satisfied and it
                    seems that the true mean is less than C {greater than C}; or

                    3) the null hypothesis was not rejected and the false negative error rate was not satisfied
                    and it seems that the true mean is less than C {greater than C} but conclusions are
                    uncertain since the sample size was too small.

               Report the results of the test as well as the sample size, sample mean, and sample standard
               deviation for each stratum, the estimated t, the dof, and t_{1-α}.
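
    As a companion to Box 3.2-3, the sketch below assembles the stratified statistics of Steps 1 through 5 in
    code.  It is a minimal sketch, not part of the original guidance, assuming numpy and scipy are available;
    the inputs shown are those of Box 3.2-4.

        import numpy as np
        from scipy import stats

        def stratified_one_sample_t(means, sds, ns, weights, C, alpha):
            # Steps 3-5 of Box 3.2-3 from per-stratum summary statistics
            means, sds, ns, W = map(np.asarray, (means, sds, ns, weights))
            xbar_st = np.sum(W * means)                                   # overall mean
            var_st = np.sum(W**2 * sds**2 / ns)                           # overall variance
            dof = var_st**2 / np.sum(W**4 * sds**4 / (ns**2 * (ns - 1)))  # approximate dof
            t_crit = stats.t.ppf(1 - alpha, df=int(np.ceil(dof)))         # critical value
            t = (xbar_st - C) / np.sqrt(var_st)                           # sample value
            return xbar_st, var_st, dof, t, t_crit

        # Two strata as in Box 3.2-4: weights 0.1 and 0.9, C = 40 ppm, alpha = 0.01 (Case 2)
        print(stratified_one_sample_t([23, 35], [18.2, 20.5], [40, 60], [0.1, 0.9], 40, 0.01))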
EPAQA/G-9
                                            3.2-5
QA96

-------
                            Box 3.2-4: An Example of a One-Sample t-Test
                                    for a Stratified Random Sample

    Consider a stratified sample consisting of two strata where stratum 1 comprises 10% of the total site surface
    area and stratum 2 comprises the other 90%, and 40 samples were collected from stratum 1, and 60 samples
    were collected from stratum 2.  For stratum 1, the sample mean is 23 ppm and the sample standard deviation
    is 18.2 ppm.  For stratum 2, the sample mean is 35 ppm, and the sample standard deviation is 20.5 ppm.
    This information will be used to test the null hypothesis that the overall site mean is greater than or equal to 40
    ppm, i.e., H₀: μ ≥ 40 ppm (Case 2).  The decision maker has specified a 1% false positive decision error limit
    (α) at 40 ppm and a 20% false negative decision error limit (β) at 35 ppm (μ₁).

    STEP 1:    W₁ = 10/100 = 0.10, W₂ = 90/100 = 0.90.

    STEP 2:    From above, X̄₁ = 23 ppm, X̄₂ = 35 ppm, s₁ = 18.2, and s₂ = 20.5.  This information was
               developed using the equations in Step 2 of Box 3.2-3.

    STEP 3:    The estimated overall mean concentration is

                X̄_ST = Σ W_h X̄_h = (.1)(23) + (.9)(35) = 33.8 ppm,

               and the estimated overall variance is:

                s²_ST = Σ W_h² s_h² / n_h = (.1)²(18.2)²/40 + (.9)²(20.5)²/60 = 5.76.
    STEP 4:    The approximate degrees of freedom (dof) is:

                dof = (s²_ST)² / Σ [ W_h⁴ s_h⁴ / (n_h²(n_h - 1)) ]
                    = (5.76)² / [ (.1)⁴(18.2)⁴/((40)²(39)) + (.9)⁴(20.5)⁴/((60)²(59)) ] = 60.8, i.e., 61.

               Note how the degrees of freedom has been rounded up to a whole number.  Using Table A-1 of
               Appendix A, the critical value t_{1-α} of the t distribution with 61 dof is approximately 2.39.
    STEP 5:    Calculate the sample value:

                t = (X̄_ST - C) / √(s²_ST) = (33.8 - 40) / √5.76 = -2.58.

    STEP 6:    Because -2.58 < -2.39, the null hypothesis may be rejected.

    STEP 7:    Because the null hypothesis was rejected, it is concluded that the mean is probably less than 40
               ppm.  In this example there is no need to calculate the false negative rate as the null hypothesis
               was rejected and so the chance of making a false negative error is zero by definition.
EPA QA/G-9
                                           3.2-6
                                              QA96

-------
       3.2.1.2  The Wilcoxon Signed Rank (One-Sample) Test for the Mean

       PURPOSE

       Given a random sample of size n (or composite sample size n, each composite consisting of k
aliquots), the Wilcoxon signed rank test can be used to test hypotheses regarding the population mean or
median of the population from which the sample was selected.

       ASSUMPTIONS AND THEIR VERIFICATION

       The Wilcoxon signed rank test assumes that the data constitute a random sample from a symmetric
continuous population.  (Symmetric means that the underlying population frequency curve is symmetric about
its mean/median.)  Symmetry is a less stringent assumption than normality since all normal distributions are
symmetric, but some symmetric distributions are not normal.  The mean and median are equal for a
symmetric distribution, so the null hypothesis can be stated in terms of either parameter. Tests for symmetry
can be devised which are based on the chi-squared distribution, or a test for normality may be used. If the
data are not symmetric, it may be possible to transform the data so that this assumption is satisfied. See
Chapter 4 for more information on transformations and tests for symmetry.

       LIMITATIONS AND ROBUSTNESS

       Although symmetry is a weaker assumption than normality, it is nonetheless a strong assumption. If
the data are not approximately symmetric, this test should not be used.  For large sample sizes (n > 50), the
t-test is more robust to violations of its assumptions than the Wilcoxon signed rank test.  For small sample
sizes, if the data are not approximately symmetric and are not normally distributed, this guidance
recommends consulting a statistician before selecting a statistical test or changing the population parameter
to the median and applying a different statistical test (section 3.2.3).

       The Wilcoxon  signed rank test may produce misleading results if many data values are the same.
When values are the same, their relative ranks are the same, and this has the effect of diluting the statistical
power of the Wilcoxon test.  Box 3.2-5 demonstrates the correct method used to break tied ranks.  If possible,
results should be recorded with sufficient accuracy so that a large number of equal values do not occur.
Estimated concentrations should be reported for data below the detection limit, even if these estimates are
negative, as their relative magnitude to the rest of the data is of importance.

        SEQUENCE OF STEPS

        Directions for the Wilcoxon signed rank test for a simple random sample and a systematic simple
random sample are given in Box 3.2-5 and an example is given in Box 3.2-6 for sample sizes smaller than
20.  For sample sizes greater than 20, the large sample approximation to the Wilcoxon Signed Rank Test
should be used.  Directions for this test are given in Box 3.2-7.
 EPAQA/G-9                                3.2-7                                       QA96

-------
                         Box 3.2-5: Directions for the Wilcoxon Signed Rank Test
                               for Simple and Systematic Random Samples

    Let X₁, X₂, ..., Xₙ represent the n data points.  The following describes the steps for applying the Wilcoxon
    signed rank test for both Case 1 (H₀: μ ≤ C) and Case 2 (H₀: μ ≥ C) for a sample size (n) less than 20.  If the
    sample size is greater than or equal to 20, use Box 3.2-7.

    STEP 1:    If possible, assign values to any measurements below the detection limit.  If this is not possible,
               assign the value "Detection Limit divided by 2" to each value.  Then subtract C from each of the n
               observations X_i to obtain the deviations d_i = X_i - C.  If any of the deviations are zero, delete them
               and correspondingly reduce the sample size n.

    STEP 2:    Assign ranks from 1 to n based on ordering the absolute deviations |d_i| (i.e., magnitude of
               differences ignoring the sign) from smallest to largest.  The rank 1 is assigned to the smallest
               value, the rank 2 to the second smallest value, and so forth.  If there are ties, assign the average
               of the ranks which would otherwise have been assigned to the tied observations.

    STEP 3:    Calculate the signed rank for each observation.  This signed rank is equal to the rank if the
               deviation d_i is positive, or equal to the negative rank if the deviation d_i is negative.

    STEP 4:    For Case 1, calculate the sum R of the ranks with a positive sign.

               For Case 2, calculate the sum R of the ranks with a negative sign and take the absolute value of
               this sum (i.e., ignore the negative sign).
    STEP 5:    Use Table A-6 of Appendix A to find the critical value w_α.

               If R ≥ w_α, the null hypothesis may be rejected.  Go to Step 7.

               If R < w_α, there is not enough evidence to reject the null hypothesis, and the false negative error
               rate will need to be verified.  Go to Step 6.

    STEP 6:    If the null hypothesis (H₀) was not rejected, calculate either the power of the test or the sample
               size necessary to achieve the false positive and false negative error rates using a software
               package like the DEFT software (EPA QA/G-4D, 1994) or the DataQUEST software (EPA QA/G-9D,
               1996).  Or calculate

                    m = s²(z_{1-α} + z_{1-β})² / (μ₁ - C)² + (0.5)z_{1-α}²

               where z_p is the pth percentile of the standard normal distribution (Table A-1 of Appendix A).  Then
               multiply m by 1.16 to account for loss in efficiency; if 1.16m is less than or equal to n, the false
               negative error rate has been satisfied.

    STEP 7:   The results of the test may be:

               1)  the null hypothesis was rejected, and for Case 1, it seems the true mean is greater than C or
               for Case 2, it seems the true mean is less than C;

               2)  the null hypothesis was not rejected, the false negative error rate was satisfied, and for Case 1,
               it seems the true mean is less than C or for Case 2, it seems the true mean is greater than C; or

               3)  the null hypothesis was not rejected, the false negative error rate was not satisfied, and for
               Case 1, it seems the true mean is less than C or for Case 2, it seems the true mean is greater
               than C, but the conclusions are uncertain because the sample size was too small.
EPA QA/G-9
QA96

-------
                       Box 3.2-6: An Example of the Wilcoxon Signed Rank Test
                                     for a Simple Random Sample

    Consider the following 10 data points: 974 ppb, 1044 ppb, 1093 ppb, 897 ppb, 879 ppb, 1161 ppb, 839 ppb,
    824 ppb, 796 ppb, and one observation below the detection limit of 750 ppb.  This data will be used to test
    the hypothesis H₀: μ ≥ 1000 ppb vs. HA: μ < 1000 ppb (Case 2).  The decision maker has specified a 10%
    false positive decision error limit (α) at 1000 ppb (C), and a 20% false negative decision error limit (β) at
    900 ppb (μ₁).

    STEP 1:   Assign the value 375 ppb (750 divided by 2) to the data point below the detection limit.  Subtract
              C (1000) from each of the n observations X_i to obtain the deviations d_i = X_i - 1000.  This is shown
              in row 2 of the table below.

       X        974    1044    1093     897     879    1161     839     824     796     375
       d_i      -26     +44     +93    -103    -121    +161    -161    -176    -204    -625
      |d_i|      26      44      93     103     121     161     161     176     204     625
      rank        1       2       3       4       5     6.5     6.5       8       9      10
      s-rank     -1      +2      +3      -4      -5    +6.5    -6.5      -8      -9     -10

    STEP 2:   Assign ranks from 1 to n based on ordering the absolute deviations |d_i| (magnitude ignoring any
              negative sign) from smallest to largest.  The absolute deviations are listed in row 3 of the table
              above.  Note that the data have been sorted (rearranged) for clarity so that the absolute
              deviations are ordered from smallest to largest.

              The rank 1 is assigned to the smallest value, the rank 2 to the second smallest value, and so
              forth.  Observations 6 and 7 are ties; therefore, the average (6+7)/2 = 6.5 will be assigned to the
              two observations.  The ranks are shown in row 4.

    STEP 3:   Calculate the signed rank for each observation.  This signed rank is equal to the rank if the
              deviation d_i is positive, or equal to the negative rank if the deviation d_i is negative.  The signed
              ranks are shown in row 5.

    STEP 4:   Because of the form of the null hypothesis (H₀: μ ≥ 1000 ppb), R is the sum of the ranks with
              negative signs.  Since -1 + -4 + -5 + -6.5 + -8 + -9 + -10 = -43.5, R = 43.5.

    STEP 5:   Because there are only 10 data points, Table A-6 of Appendix A is used to find the critical value
              w_α where α = 0.10.  For the example, w_0.10 = 15.  Therefore, since 43.5 > 15, the null hypothesis
              may be rejected.

    STEP 6:   The null hypothesis was rejected with a 10% significance level using the Wilcoxon signed rank
              test (w_0.10 = 15).  Therefore, it would seem that the true mean is below 1000 ppb.
EPAQA/G-9
3.2-9
QA96

-------
                    Box 3.2-7: Directions for the Large Sample Approximation
                               to the Wilcoxon Signed Rank Test
                          for Simple and Systematic Random Samples

Let X₁, X₂, ..., Xₙ represent the n data points where n is greater than or equal to 20.  The following describes
the steps for applying the large sample approximation for the Wilcoxon signed rank test for both Case 1
(H₀: μ ≤ C) and Case 2 (H₀: μ ≥ C).
STEP 1:    If possible, assign values to any measurements below the detection limit.  If this is not possible,
           assign the value "Detection Limit divided by 2" to each value.  Then subtract C from each of the n
           observations X_i to obtain the deviations d_i = X_i - C.  If any of the deviations are zero, delete them
           and correspondingly reduce the sample size n.

STEP 2:    Assign ranks from 1 to n based on ordering the absolute deviations |d_i| (i.e., magnitude of
           differences ignoring the sign) from smallest to largest.  The rank 1 is assigned to the smallest
           value, the rank 2 to the second smallest value, and so forth.  If there are ties, assign the average
           of the ranks which would otherwise have been assigned to the tied observations.

STEP 3:    Calculate the signed rank for each observation.  This signed rank is equal to the rank if the
           deviation d_i is positive, or equal to the negative rank if the deviation d_i is negative.

STEP 4:    For Case 1, calculate the sum R of the ranks with a positive sign.  For Case 2, calculate the sum
           R of the ranks with a negative sign and take the absolute value of this sum (i.e., ignore the
           negative sign).  Then calculate:

                z₀ = [ R - n(n + 1)/4 ] / √[ n(n + 1)(2n + 1)/24 ]
STEP 5:    Use Table A-1 of Appendix A to find the critical value z_{1-α} such that 100(1-α)% of the normal
           distribution is below z_{1-α}.  For example, if α = 0.05, then z_{1-α} = 1.645.  If z₀ > z_{1-α}, the null
           hypothesis may be rejected.  If z₀ is not greater than z_{1-α}, there is not enough evidence to reject
           the null hypothesis; therefore, the false negative error rate will need to be verified.

STEP 6:    If the null hypothesis (H₀) was not rejected, calculate either the power of the test or the sample
           size necessary to achieve the false positive and false negative error rates using a software
           package like the DEFT software (EPA QA/G-4D, 1994) or the DataQUEST software (EPA QA/G-9D,
           1996).  Or calculate

                m = s²(z_{1-α} + z_{1-β})² / (μ₁ - C)² + (0.5)z_{1-α}²

           where z_p is the pth percentile of the standard normal distribution (Table A-1 of Appendix A).  Then
           multiply m by 1.16 to account for loss in efficiency; if 1.16m is less than or equal to n, the false
           negative error rate has been satisfied.

STEP 7:   The results of the test may be:

           1) the null hypothesis was rejected, and for Case 1, it seems the true mean is greater than C or
           for Case 2, it seems the true mean is less than C;

           2) the null hypothesis was not rejected, the false negative error rate was satisfied, and for Case
           1, it seems the true mean is less than C or for Case 2, it seems the true mean is greater than C;
           or

           3) the null hypothesis was not rejected, the false negative error rate was not satisfied, and for
           Case 1, it seems the true mean is less than C or for Case 2, it seems the true mean is greater
           than C, but the conclusions are uncertain because the sample size was too small.
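
A sketch of the large sample approximation of Box 3.2-7 is given below.  It assumes numpy and scipy and is
offered only as an illustration; the simulated data and the chosen threshold are hypothetical and not part of
the original guidance.

    import numpy as np
    from scipy import stats

    def signed_rank_large_sample(x, C, case=1, alpha=0.05):
        # Steps 1-5 of Box 3.2-7 (n of at least 20 is assumed)
        d = np.asarray(x, dtype=float) - C
        d = d[d != 0]                                    # drop zero deviations
        n = len(d)
        ranks = stats.rankdata(np.abs(d))                # average ranks for ties
        signed = np.sign(d) * ranks
        R = signed[signed > 0].sum() if case == 1 else abs(signed[signed < 0].sum())
        z0 = (R - n * (n + 1) / 4) / np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
        z_crit = stats.norm.ppf(1 - alpha)
        return z0, z_crit, z0 > z_crit                   # True means "reject H0"

    # Hypothetical example call on 30 simulated measurements
    rng = np.random.default_rng(1)
    print(signed_rank_large_sample(rng.normal(10.0, 2.0, size=30), C=9.0, case=1))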
EPAQA/G-9
                                            3.2-10
                                                                                              QA96

-------
3.2.2   Tests for a Proportion or Percentile

        This section considers hypotheses concerning population proportions and percentiles.  A population
proportion is the ratio of the number of elements of a population that have some specific characteristic to the
total number of elements.  A population percentile represents the percentage of elements of a population
having values less than some threshold C.  Thus, if C is the 95th percentile of a population, 95% of the
elements of the population have values less than C and 5% of the population have values greater than C.

        This section of the guidance covers the following hypotheses:  Case 1: H₀: P ≤ P₀ vs. HA: P > P₀
and Case 2: H₀: P ≥ P₀ vs. HA: P < P₀.
-------
                      Box 3.2-8: Directions for the One-Sample Test for Proportions
                              for Simple and Systematic Random Samples

    This box describes the steps for applying the one-sample test for proportions for Case 1 (H₀: P ≤ P₀);
    modifications for Case 2 (H₀: P ≥ P₀) are given in braces {}.

    STEP 1:    Given a random sample X₁, X₂, ..., Xₙ of measurements from the population, let p (small p) denote
               the proportion of X's that do not exceed C, i.e., p is the number (k) of sample points that are less
               than or equal to C, divided by the sample size n.

    STEP 2:    Compute np and n(1-p).  If both np and n(1-p) are greater than or equal to 5, use Steps 3 and 4.
               Otherwise, consult a statistician as analysis may be complex.
    STEP 3:    Calculate

                z = (p - 0.5/n - P₀) / √(P₀(1 - P₀)/n)  for Case 1,  or  z = (p + 0.5/n - P₀) / √(P₀(1 - P₀)/n)  for Case 2.

    STEP 4:    Use Table A-1 of Appendix A to find the critical value z_{1-α} such that 100(1-α)% of the normal
               distribution is below z_{1-α}.  For example, if α = 0.05 then z_{1-α} = 1.645.

               If z > z_{1-α} {z < -z_{1-α}}, the null hypothesis may be rejected.  Go to Step 6.

               If z is not greater than z_{1-α} {not less than -z_{1-α}}, there is not enough evidence to reject the null
               hypothesis.  Therefore, the false negative error rate will need to be verified.  Go to Step 5.

    STEP 5:    To calculate the power of the test, assume that the true values for the mean and standard deviation
               are those obtained in the sample and use a statistical software package like the DEFT software
               (EPA QA/G-4D, 1994) or the DataQUEST software (EPA QA/G-9D, 1996) to generate the power
               curve of the test.

               If only one false negative error rate (β) has been specified (at P₁), it is possible to calculate the
               sample size which achieves the DQOs.  To do this, calculate

                    m = [ z_{1-α}√(P₀(1 - P₀)) + z_{1-β}√(P₁(1 - P₁)) ]² / (P₁ - P₀)².

               If m ≤ n, the false negative error rate has been satisfied.  Otherwise, the false negative error rate has
               not been satisfied.

    STEP 6:    The results of the test may be:

               1) the null hypothesis was rejected and it seems that the proportion is greater than {less than} P₀;

               2) the null hypothesis was not rejected, the false negative error rate was satisfied, and it seems that
               the proportion is less than {greater than} P₀; or

               3) the null hypothesis was not rejected, the false negative error rate was not satisfied, and it would
               seem the proportion is less than {greater than} P₀, but the conclusions are uncertain because the
               sample size was too small.
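
    The z statistic of Box 3.2-8 (Steps 1 through 3) can be computed as in the sketch below.  This is an
    illustration only, not part of the original guidance; the function name is hypothetical, it uses only the Python
    standard library, and the call reproduces the counts of Box 3.2-9.

        from math import sqrt

        def one_sample_proportion_z(k, n, P0, case=1):
            # Steps 1-3 of Box 3.2-8; k is the number of sample points in the class being tested
            p = k / n
            if n * p < 5 or n * (1 - p) < 5:
                raise ValueError("np or n(1-p) is below 5; consult a statistician")
            correction = -0.5 / n if case == 1 else 0.5 / n   # continuity correction
            return (p + correction - P0) / sqrt(P0 * (1 - P0) / n)

        # Box 3.2-9: 11 of 85 samples exceed the clean-up standard, P0 = 0.20, Case 2
        print(one_sample_proportion_z(11, 85, 0.20, case=2))  # about -1.49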
EPA QA/G-9                                   3.2-12                                         QA96

-------
                     Box 3.2-9: An Example of the One-Sample Test for Proportions
                                    for a Simple Random Sample

    Consider 85 samples of which 11 samples have concentrations greater than the clean-up standard.  This
    data will be used to test the null hypothesis H₀: P ≥ .20 vs. HA: P < .20 (Case 2).  The decision maker has
    specified a 5% false positive rate (α) for P₀ = .2, and a false negative rate (β) of 20% for P₁ = 0.15.
    STEP 1:    From the data, the observed proportion (p) is p = 11/85 = .1294.

    STEP 2:    np = (85)(.1294) = 11 and n(1-p) = (85)(1 - .1294) = 74.  Since both np and n(1-p) are greater
               than or equal to 5, Steps 3 and 4 will be used.

    STEP 3:    Because H₀: P ≥ .20, the Case 2 formula will be used:

                z = (p + .5/n - P₀) / √(P₀(1 - P₀)/n) = (.1294 + .5/85 - .2) / √(.2(1 - .2)/85) = -1.492
    STEP 4:    Using Table A-1 of Appendix A, it was found that z_{1-α} = z_{0.95} = 1.645.  Because z is not less
               than -z_{1-α} (i.e., -1.492 > -1.645), the null hypothesis is not rejected, so Step 5 will need to be
               completed.

    STEP 5:    To determine whether the test was powerful enough, the sample size necessary to achieve the
               DQOs was calculated as follows:

                m = [ z_{0.95}√(P₀(1 - P₀)) + z_{0.80}√(P₁(1 - P₁)) ]² / (P₁ - P₀)² = 422.18

               So 423 samples are required, many more than were actually taken.

    STEP 6:    The null hypothesis was not rejected and the false negative error rate was not satisfied.
               Therefore, it would seem the proportion is greater than 0.2, but this conclusion is uncertain
               because the sample size is too small.
3.2.3   Tests for a Median

        A population median (μ̃) is another measure of the center of the population distribution.  This
population parameter is less sensitive to extreme values and nondetects than the sample mean.  Therefore,
this parameter is sometimes used instead of the mean when the data contain a large number of nondetects or
extreme values.  The hypotheses considered in this section are:

               Case 1:  H₀: μ̃ ≤ C  vs.  HA: μ̃ > C; and

               Case 2:  H₀: μ̃ ≥ C  vs.  HA: μ̃ < C,

where C represents a given threshold such as a regulatory level.

        It is worth noting that the median is the 50th percentile, so the methods described in section 3.2.2 may
be used to test hypotheses concerning the median by letting P₀ = 0.50.  In this case, the one-sample test for
proportions is also called the Sign Test for a median.  The Wilcoxon signed rank test (section 3.2.1.2) can
also be applied to a median in the same manner as it is applied to a mean.  In addition, this test is more
powerful than the Sign Test for symmetric distributions.  Therefore, the Wilcoxon signed rank test is the
preferred test for the median.
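
        Because the median is the 50th percentile, the Sign test mentioned above can be carried out by
counting the observations above the threshold and applying a binomial (proportion) test with P₀ = 0.5.  The
sketch below is illustrative only and not part of the original guidance; it assumes SciPy 1.7 or later for
scipy.stats.binomtest, and the data values are hypothetical.

    import numpy as np
    from scipy import stats

    def sign_test_for_median(x, C, alternative="greater"):
        # Sign test: under H0 the median equals C, so values above C occur with probability 0.5
        x = np.asarray(x, dtype=float)
        x = x[x != C]                            # observations exactly at C carry no information
        k = int((x > C).sum())                   # number of observations above the threshold
        return stats.binomtest(k, n=len(x), p=0.5, alternative=alternative).pvalue

    # Hypothetical data: does the median exceed a threshold of 4.0?
    print(sign_test_for_median([3.1, 4.7, 5.2, 2.8, 6.0, 5.5, 4.9], C=4.0))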
 EPAQA/G-9
                               3.2-13
QA96

-------
 3.3     TESTS FOR COMPARING TWO POPULATIONS

        A two-sample test involves the comparison of two populations or a "before and after" comparison.
 In environmental applications, the two populations to be compared may be a potentially contaminated area
 with a background area or concentration levels from an upgradient and a downgradient well.  The comparison
 of the two populations may be based on a statistical parameter that characterizes the relative location (e.g., a
 mean or median), or it may be based on a distribution-free comparison of the two population distributions.
 Tests that do not assume an underlying distribution (e.g., normal or lognormal) are called distribution-free or
 nonparametric tests. These tests are often more useful for comparing two populations than those that assume
 a specific distribution because they make less stringent assumptions. Section 3.3.1 covers tests for
 differences in the means of two populations. Section 3.3.2 covers tests for differences in the proportion or
 percentiles of two populations. Section 3.3.3 describes distribution-free comparisons of two populations.
 Section 3.3.4 describes tests for comparing two medians.

        Often, a two-sample test involves the comparison of the difference of two population parameters to a
 threshold value. For environmental applications, the threshold value is often zero, representing the case
 where the data are used to determine which of the two population parameters is greater than the other. For
 example, concentration levels from a Superfund site may be compared to a background site. Then, if the
 Superfund site levels exceed the background levels, the site requires further investigation. A two-sample test
 may also be used to compare readings from two instruments or two separate populations of people.

        If the exact same sampling locations are used for both populations, then the two samples are not
 independent. This case should be converted to a one-sample problem by applying the methods described in
 section 3.2 to the differences between the two populations at the same location. For example, one could
 compare contaminant levels from several wells after treatment to contaminant levels from the same wells
 before treatment. The methods described in section 3.2 would then be applied to the differences between the
 before and after treatment contaminant levels for each well.
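
        The paired situation described above reduces to a one-sample problem on the differences, as the
 short sketch below illustrates.  The well measurements are hypothetical and the libraries assumed are
 numpy and scipy; this is not part of the original guidance.

    import numpy as np
    from scipy import stats

    # Hypothetical contaminant levels from the same six wells before and after treatment
    before = np.array([12.1, 15.4, 9.8, 20.2, 11.5, 14.0])
    after  = np.array([10.3, 13.9, 9.1, 16.8, 11.0, 12.2])

    d = before - after                                   # differences at the same locations
    n = len(d)
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))          # one-sample t statistic (section 3.2)
    t_crit = stats.t.ppf(0.95, df=n - 1)                 # one-sided test at alpha = 0.05
    print(t, t_crit, t > t_crit)                         # True suggests treatment lowered levels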

 3.3.1   Comparing Two Means

        Let μ₁ represent the mean of population 1 and μ₂ represent the mean of population 2.  The
 hypotheses considered in this section are:

        Case 1:  H₀: μ₁ - μ₂ ≤ δ₀  vs.  HA: μ₁ - μ₂ > δ₀; and

        Case 2:  H₀: μ₁ - μ₂ ≥ δ₀  vs.  HA: μ₁ - μ₂ < δ₀.

 An example of a two-sample test for population means is comparing the mean contaminant level at a
 remediated Superfund site to a background site; in this case, δ₀ would be zero.  Another example is a Record
 of Decision for a Superfund site which specifies that the remediation technique must reduce the mean
 contaminant level by 50 ppm each year.  Here, each year would be considered a separate population and δ₀
 would be 50 ppm.

        The information required for these tests includes the null and alternative hypotheses (either Case 1 or
 Case 2); the gray region (i.e., a value δ₁ > δ₀ for Case 1 or a value δ₁ < δ₀ for Case 2 representing the bound
 of the gray region); the false positive error rate α at δ₀; the false negative error rate β at δ₁; and any additional
 limits on decision errors.  It may be helpful to label additional false positive error limits as α₂ at δ_{α2}, α₃ at
 δ_{α3}, etc., and to label additional false negative error limits as β₂ at δ_{β2}, β₃ at δ_{β3}, etc.

 EPAQA/G-9                                 3.3-1                                        QA96

-------
        3.3.1.1  Student's Two-Sample t-Test (Equal Variances)

        PURPOSE

        Student's two-sample t-test can be used to compare two population means based on the independent
random samples X₁, X₂, ..., Xₘ from the first population, and Y₁, Y₂, ..., Yₙ from the second population.
This test assumes that the variabilities (as expressed by the variance) of the two populations are approximately
equal.  If the two variances are not equal (a test is described in section 4.5), use Satterthwaite's t-test (section
3.3.1.2).

       ASSUMPTIONS AND THEIR VERIFICATION

        The principal assumption required for the two-sample t-test is that a random sample of size m (X₁,
X₂, ..., Xₘ) is drawn from population 1, and an independent random sample of size n (Y₁, Y₂, ..., Yₙ) is
drawn from population 2. Validity of the random sampling and independence assumptions should be
confirmed by reviewing the procedures used to select the sampling points.

        The second assumption required for the two-sample t-test is that the sample means X̄ (sample 1)
and Ȳ (sample 2) are approximately normally distributed.  If both m and n are large, one may make this
assumption without further verification.  For small sample sizes, approximate normality of the sample means
can be checked by testing the normality of each of the two samples.

       LIMITATIONS AND ROBUSTNESS

       The two-sample t-test with equal variances is robust to violations of the assumptions of normality
and equality of variances. However, if the investigator has tested and rejected nonnality or equality of
variances, then nonparametric procedures may be applied. The t-test is not robust to outliers because sample
means and standard deviations are sensitive to outliers.

       SEQUENCE OF STEPS

       Directions for the two-sample t-test for a simple random sample and a systematic simple random
sample are given in Box 3.3-1 and an example in Box 3.3-2.

        3.3.1.2  Satterthwaite's Two-Sample t-Test (Unequal Variances)

        Satterthwaite's t-test should be used to compare two population means when the variances of the two
populations are not equal.  It requires the same assumptions as the two-sample t-test (section 3.3.1.1) except
the assumption of equal variances.

        Directions for Satterthwaite's t-test for a simple random sample and a systematic simple random
sample are given in Box 3.3-3 and an example in Box 3.3-4.
 EPAQA/G-9                               3.3-2                                      QA96

-------
                Box 3.3-1: Directions for the Student's Two-Sample t-Test (Equal Variances)
                              for Simple and Systematic Random Samples

    This describes the steps for applying the two-sample t-test for differences between the population means
    when the two population variances are equal, for Case 1 (H₀: μ₁ - μ₂ ≤ δ₀).  Modifications for Case 2
    (H₀: μ₁ - μ₂ ≥ δ₀) are given in braces {}.

    STEP 1:   Calculate the sample mean X̄ and the sample variance s_X² for sample 1 and compute the sample
              mean Ȳ and the sample variance s_Y² for sample 2.

    STEP 2:   Use section 4.5 to determine if the variances of the two populations are equal.  If the variances of
              the two populations are not equal, use Satterthwaite's t-test (section 3.3.1.2).  Otherwise,
              compute the pooled standard deviation

                   s_E = √[ ((m - 1)s_X² + (n - 1)s_Y²) / ((m - 1) + (n - 1)) ].

    STEP 3:   Calculate

                   t = (X̄ - Ȳ - δ₀) / (s_E √(1/m + 1/n)).

              Use Table A-1 of Appendix A to find the critical value t_{1-α} such that 100(1-α)% of the t-distribution
              with (m + n - 2) degrees of freedom is below t_{1-α}.

              If t > t_{1-α} {t < -t_{1-α}}, the null hypothesis may be rejected.  Go to Step 5.

              If t is not greater than t_{1-α} {not less than -t_{1-α}}, there is not enough evidence to reject the null
              hypothesis.  Therefore, the false negative error rate will need to be verified.  Go to Step 4.

    STEP 4:   To calculate the power of the test, assume that the true values for the mean and standard
              deviation are those obtained in the sample and use a statistical software package like the DEFT
              software (EPA QA/G-4D, 1994) or the DataQUEST software (EPA QA/G-9D, 1996) to generate the
              power curve of the two-sample t-test.  If only one false negative error rate (β) has been specified
              (at δ₁), it is possible to calculate the sample size which achieves the DQOs, assuming the true
              mean and standard deviation are equal to the values estimated from the sample, instead of
              calculating the power of the test.  Calculate

                   m* = n* = 2s_E²(z_{1-α} + z_{1-β})² / (δ₁ - δ₀)² + (0.25)z_{1-α}².

              If m* ≤ m and n* ≤ n, the false negative error rate has been satisfied.  Otherwise, the false negative
              error rate has not been satisfied.

    STEP 5:   The results of the test could be:

              1) the null hypothesis was rejected, and it seems μ₁ - μ₂ > δ₀ {μ₁ - μ₂ < δ₀};

              2) the null hypothesis was not rejected, the false negative error rate was satisfied, and it seems
              μ₁ - μ₂ ≤ δ₀ {μ₁ - μ₂ ≥ δ₀}; or

              3) the null hypothesis was not rejected, the false negative error rate was not satisfied, and it
              seems μ₁ - μ₂ ≤ δ₀ {μ₁ - μ₂ ≥ δ₀}, but this conclusion is uncertain because the sample size was
              too small.
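
    Steps 1 through 3 of Box 3.3-1 can be expressed compactly in code.  The sketch below is illustrative only
    and not part of the original guidance, assuming numpy and scipy; the two data sets are hypothetical, and
    with δ₀ = 0 the statistic agrees with scipy.stats.ttest_ind(x, y, equal_var=True).

        import numpy as np
        from scipy import stats

        def pooled_two_sample_t(x, y, delta0=0.0, alpha=0.05):
            # Pooled-variance two-sample t statistic of Box 3.3-1
            x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
            m, n = len(x), len(y)
            sE = np.sqrt(((m - 1) * x.var(ddof=1) + (n - 1) * y.var(ddof=1)) / (m + n - 2))
            t = (x.mean() - y.mean() - delta0) / (sE * np.sqrt(1.0 / m + 1.0 / n))
            t_crit = stats.t.ppf(1 - alpha, df=m + n - 2)
            return t, t_crit

        # Hypothetical site (x) and reference (y) concentrations
        x = [7.2, 9.1, 8.4, 6.9, 7.7, 8.8, 7.5]
        y = [6.1, 6.8, 5.9, 7.0, 6.4, 6.6, 7.2, 6.0]
        print(pooled_two_sample_t(x, y))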
EPAQA/G-9
                                  3.3-3
QA96

-------
               Box 3.3-2: An Example of a Student's Two-Sample t-Test (Equal Variances)
                             for Simple and Systematic Random Samples

    At a hazardous waste site, area 1 (cleaned using an in-situ methodology) was compared with a similar (but
    relatively uncontaminated) reference area, area 2.  If the in-situ methodology worked, then the two sites
    should be approximately equal in average contaminant levels.  If the methodology did not work, then area 1
    should have a higher average than the reference area.  Seven random samples were taken from area 1, and
    eight were taken from area 2.  Because the contaminant concentrations in the two areas are supposedly
    equal, the null hypothesis is H₀: μ₁ - μ₂ ≤ 0 (Case 1).  The false positive error rate was set at 5% and the false
    negative error rate was set at 20% (β) if the difference between the areas is 2.5 ppb.
    STEP 1:                     Sample Mean            Sample Variance
                   Area 1         7.8 ppm                  2.1 ppm²
                   Area 2         6.6 ppm                  2.2 ppm²

    STEP 2:   Methods described in section 4.5 were used to determine that the variances were essentially
              equal.  Therefore,

                   s_E = √[ ((7 - 1)(2.1) + (8 - 1)(2.2)) / ((7 - 1) + (8 - 1)) ] = 1.4676.

    STEP 3:   t = (7.8 - 6.6 - 0) / (1.4676 √(1/7 + 1/8)) = 1.5798.

              Table A-1 of Appendix A was used to find that the critical value t_{0.95} with (7 + 8 - 2) = 13 degrees
              of freedom is 1.771.

              Because t is not greater than t_{0.95} (i.e., 1.5798 < 1.771), there is not enough evidence to reject
              the null hypothesis.  The false negative error rate will need to be verified.

    STEP 4:   Assuming the true values for the mean and standard deviation are those obtained in the sample:

                   m* = n* = 2(1.4676)²(1.645 + 0.842)² / (2.5)² + (0.25)(1.645)² = 4.938, i.e., 5.

              Because m* ≤ m (7) and n* ≤ n (8), the false negative error rate has been satisfied.

    STEP 5:   The null hypothesis was not rejected and the false negative error rate was satisfied.  Therefore, it
              seems there is no difference between the two areas and that the in-situ methodology worked as
              expected.
EPAQA/G-9
                                            3.3-4
                                                       QA96

-------
                    Box 3.3-3:  Directions for Satterthwaite's t-Test (Unequal Variances)
                               for Simple and Systematic Random Samples

    This describes the steps for applying the two-sample t-test for differences between the population means for
    Case 1 (H₀: μ₁ - μ₂ ≤ δ₀).  Modifications for Case 2 (H₀: μ₁ - μ₂ ≥ δ₀) are given in braces {}.

    STEP 1:    Calculate the sample mean X̄ and the sample variance s_X² for sample 1 and compute the sample
               mean Ȳ and the sample variance s_Y² for sample 2.

    STEP 2:    Using section 4.5, test whether the variances of the two populations are equal.  If the variances
               of the two populations are not equal, compute:

                    s_NE = √( s_X²/m + s_Y²/n ).

               If the variances of the two populations appear approximately equal, use Student's two-sample t-
               test (section 3.3.1.1, Box 3.3-1).

    STEP 3:    Calculate

                    t = (X̄ - Ȳ - δ₀) / s_NE.

               Use Table A-1 of Appendix A to find the critical value t_{1-α} such that 100(1-α)% of the t-distribution
               with f degrees of freedom is below t_{1-α}, where

                    f = ( s_X²/m + s_Y²/n )² / [ s_X⁴/(m²(m - 1)) + s_Y⁴/(n²(n - 1)) ].

               (Round f down to the nearest integer.)

               If t > t_{1-α} {t < -t_{1-α}}, the null hypothesis may be rejected.  Go to Step 5.

               If t is not greater than t_{1-α} {not less than -t_{1-α}}, there is not enough evidence to reject the null
               hypothesis and therefore, the false negative error rate will need to be verified.  Go to Step 4.

    STEP 4:    If the null hypothesis (H₀) was not rejected, calculate either the power of the test or the sample
               size necessary to achieve the false positive and false negative error rates.  To calculate the
               power of the test, assume that the true values for the mean and standard deviation are those
               obtained in the sample and use a statistical software package to generate the power curve of the
               two-sample t-test.  A simple method to check on statistical power does not exist.

    STEP 5:    The results of the test could be:

               1) the null hypothesis was rejected, and it seems μ₁ - μ₂ > δ₀ {μ₁ - μ₂ < δ₀};

               2) the null hypothesis was not rejected, the false negative error rate was satisfied, and it seems
               μ₁ - μ₂ ≤ δ₀ {μ₁ - μ₂ ≥ δ₀}; or

               3) the null hypothesis was not rejected, the false negative error rate was not satisfied, and it
               seems μ₁ - μ₂ ≤ δ₀ {μ₁ - μ₂ ≥ δ₀}, but this conclusion is uncertain because the sample size was
               too small.
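
    A code sketch of Satterthwaite's procedure (Steps 1 through 3 of Box 3.3-3) is given below.  It assumes
    numpy and scipy, is illustrative only and not part of the original guidance, and with δ₀ = 0 it matches the
    statistic returned by scipy.stats.ttest_ind(x, y, equal_var=False).

        import numpy as np
        from scipy import stats

        def satterthwaite_t(x, y, delta0=0.0, alpha=0.05):
            # Unequal-variance (Welch/Satterthwaite) two-sample t test of Box 3.3-3
            x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
            m, n = len(x), len(y)
            vx, vy = x.var(ddof=1), y.var(ddof=1)
            sNE = np.sqrt(vx / m + vy / n)
            t = (x.mean() - y.mean() - delta0) / sNE
            f = (vx / m + vy / n) ** 2 / (vx**2 / (m**2 * (m - 1)) + vy**2 / (n**2 * (n - 1)))
            t_crit = stats.t.ppf(1 - alpha, df=int(f))       # f is rounded down, as in Step 3
            return t, f, t_crit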
EPA QA/G-9
3.3-5
QA96

-------
                  Box 3.3-4: An Example of Satterthwaite's t-Test (Unequal Variances)
                             for Simple and Systematic Random Samples

    At a hazardous waste site, area 1 (cleaned using an in-situ methodology) was compared with a similar (but
    relatively uncontaminated) reference area, area 2.  If the in-situ methodology worked, then the two sites
    should be approximately equal in average contaminant levels.  If the methodology did not work, then area 1
    should have a higher average than the reference area.  Seven random samples were taken from area 1, and
    eight were taken from area 2.  Because the contaminant concentrations in the two areas are supposedly
    equal, the null hypothesis is H₀: μ₁ - μ₂ ≤ 0 (Case 1).  The false positive error rate was set at 5% and the false
    negative error rate was set at 20% (β) if the difference between the areas is 2.5 ppb.

    STEP 1:                     Sample Mean            Sample Variance
                   Area 1         9.2 ppm                  1.3 ppm²
                   Area 2         6.1 ppm                  5.7 ppm²

    STEP 2:    Using section 4.5, it was determined that the variances of the two populations were not equal,
               and therefore using Satterthwaite's method is appropriate:

                    s_NE = √( 1.3/7 + 5.7/8 ) = 0.9477.

    STEP 3:    t = (9.2 - 6.1 - 0) / 0.9477 = 3.271.

               Table A-1 was used with f degrees of freedom, where

                    f = (1.3/7 + 5.7/8)² / [ (1.3)²/(7²(7 - 1)) + (5.7)²/(8²(8 - 1)) ] = 10.307, i.e., 10 degrees of freedom

               (recall that f is rounded down to the nearest integer), to find t_{1-α} = 1.812.

               Because t > t_{1-α} (3.271 > 1.812), the null hypothesis may be rejected.

    STEP 5:    Because the null hypothesis was rejected, it would appear there is a difference between the two
               areas (area 1 being more contaminated than area 2, the reference area) and that the in-situ
               methodology has not worked as intended.
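
    The figures of Box 3.3-4 can be re-derived directly from the summary statistics, as the short sketch below
    shows (assuming scipy; it is not part of the original guidance).

        from math import sqrt
        from scipy import stats

        m, n = 7, 8                     # sample sizes for area 1 and area 2
        xbar, ybar = 9.2, 6.1           # sample means (ppm)
        vx, vy = 1.3, 5.7               # sample variances (ppm squared)

        sNE = sqrt(vx / m + vy / n)                                                 # about 0.948
        t = (xbar - ybar - 0.0) / sNE                                               # about 3.27
        f = (vx / m + vy / n) ** 2 / (vx**2 / (m**2 * (m - 1)) + vy**2 / (n**2 * (n - 1)))  # about 10.3
        t_crit = stats.t.ppf(0.95, df=int(f))                                       # about 1.81
        print(sNE, t, f, t_crit)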
EPAQA/G-9                                  3.3-6                                         QA96

-------
3.3.2   Comparing Two Proportions or Percentiles

        This section considers hypotheses concerning two population proportions (or two population
percentiles); for example, one might use these tests to compare the proportion of children with elevated blood
lead in one urban area with the proportion of children with elevated blood lead in another area.  The
population proportion is the ratio of the number of elements in a subset of the total population to the total
number of elements, where the subset has some specific characteristic that the rest of the elements do not.  A
population percentile represents the percentage of elements of a population having values less than some
threshold value C.

        Let P₁ represent the true proportion for population 1, and P₂ represent the true proportion of
population 2.  The hypotheses considered in this section are:

        Case 1:  H₀: P₁ - P₂ ≤ δ₀  vs.  HA: P₁ - P₂ > δ₀; and

        Case 2:  H₀: P₁ - P₂ ≥ δ₀  vs.  HA: P₁ - P₂ < δ₀.

An equivalent null hypothesis for Case 1, written in terms of percentiles, is H₀: the 100P₁th percentile minus
the 100P₂th percentile is C or larger, the reverse applying to Case 2.  Since any hypothesis about the
proportion below a threshold can be converted to an equivalent hypothesis about percentiles (see section
3.2.2), this guidance will only consider hypotheses concerning proportions.

       The information required for this test includes the null and alternative hypotheses (either Case 1 or
Case 2); the gray region (i.e., a value δ₁ > δ₀ for Case 1 or a value δ₁ < δ₀ for Case 2, representing the bound
of the gray region); the false positive error rate α at δ₀; the false negative error rate β at δ₁; and any additional
limits on decision errors.
        3.3.2.1  Two-Sample Test for Proportions

       PURPOSE

       The two-sample test for proportions can be used to compare two population percentiles or
proportions and is based on an independent random sample of size m (X₁, X₂, ..., Xₘ) from the first population
and an independent random sample of size n (Y₁, Y₂, ..., Yₙ) from the second population.

       ASSUMPTIONS AND THEIR VERIFICATION

       The principal assumption is that of random sampling from the two populations.

       LIMITATIONS AND ROBUSTNESS

       The two-sample test for proportions is valid (robust) for any underlying distributional shape and is
robust to outliers, providing they are not pure data errors.

       SEQUENCE OF STEPS

       Directions for a two-sample test for proportions for a simple random sample and a systematic simple
random sample are given in Box 3.3-5; an example is provided in Box 3.3-6.

EPA QA/G-9                                 3.3-7                                       QA96

-------
                      Box 3.3-5: Directions for a Two-Sample Test for Proportions
                              for Simple and Systematic Random Samples

    The following describes the steps for applying the two-sample test for proportions for Case 1 (H₀: P₁ - P₂ ≤ 0).
    Modifications for Case 2 (H₀: P₁ - P₂ ≥ 0) are given in braces {}.
    STEP 1:    Given m random samples X₁, X₂, ..., Xₘ from the first population, and n samples from the
               second population, Y₁, Y₂, ..., Yₙ, let k₁ be the number of points from sample 1 which exceed C,
               and let k₂ be the number of points from sample 2 which exceed C.  Calculate the sample
               proportions p₁ = k₁/m and p₂ = k₂/n.  Then calculate the pooled proportion p = (k₁ + k₂)/(m + n).

    STEP 2:    Compute mp₁, m(1 - p₁), np₂, n(1 - p₂).  If all of these values are greater than or equal to 5,
               continue.  Otherwise, seek assistance from a statistician as analysis is complicated.

    STEP 3:    Calculate

                    z = (p₁ - p₂) / √( p(1 - p)(1/m + 1/n) ).

               Use Table A-1 of Appendix A to find the critical value z_{1-α} such that 100(1-α)% of the normal
               distribution is below z_{1-α}.  For example, if α = 0.05 then z_{1-α} = 1.645.

               If z > z_{1-α} {z < -z_{1-α}}, the null hypothesis may be rejected.  Go to Step 5.

               If z is not greater than z_{1-α} {not less than -z_{1-α}}, there is not enough evidence to reject the null
               hypothesis.  Therefore, the false negative error rate will need to be verified.  Go to Step 4.

    STEP 4:    If the null hypothesis (H₀) was not rejected, calculate either the power of the test or the sample
               size necessary to achieve the false positive and false negative error rates.  If only one false
               negative error rate (β) has been specified at P₁ - P₂, it is possible to calculate the sample sizes
               that achieve the DQOs (assuming the proportions are equal to the values estimated from the
               sample) instead of calculating the power of the test.
-------
                      Box 3.3-6:  An Example of a Two-Sample Test for Proportions
                              for Simple and Systematic Random Samples

    At a hazardous waste site, investigators must determine whether an area suspected to be contaminated with
    dioxin needs to be remediated.  The possibly contaminated area (area 1) will be compared to a reference area
    (area 2) to see if dioxin levels in area 1 are greater than dioxin levels in the reference area.  An inexpensive
    surrogate probe was used to determine if each individual sample is either "contaminated," i.e., over the health
    standard of 1 ppb, or "clean," i.e., less than the health standard of 1 ppb.  The null hypothesis will be that the
    proportion of contaminant levels in area 1 is less than or equal to the proportion in area 2, or H₀: P₁ - P₂ ≤ 0
    (Case 1).  The decision maker is willing to accept a false positive decision error rate of 10% (α) and a false
    negative decision error rate of 5% (β) when the difference in proportions between areas exceeds 0.10.  A
    team collected 92 readings from area 1 (of which 12 were contaminated) and 80 from area 2, the reference
    area (of which 10 were contaminated).

    STEP 1:    The sample proportion for area 1 is p₁ = 12/92 = 0.130, the sample proportion for area 2 is
               p₂ = 10/80 = 0.125, and the pooled proportion is p = (12 + 10) / (92 + 80) = 0.128.

    STEP 2:    mp₁ = 12, m(1 - p₁) = 80, np₂ = 10, n(1 - p₂) = 70.  Because these values are greater than or equal
               to 5, continue to Step 3.

    STEP 3:    z = (0.130 - 0.125) / √( 0.128(1 - 0.128)(1/92 + 1/80) ) = 0.098

               Table A-1 of Appendix A was used to find the critical value z_{0.90} = 1.282.

               Because z is not greater than z_{0.90} (0.098 < 1.282), there is not enough evidence to reject the null
               hypothesis and the false negative error rate will need to be verified.  Go to Step 4.

    STEP 4:    Because the null hypothesis (H₀) was not rejected, calculate the sample size necessary to
               achieve the false positive and false negative error rates.  Because only one false negative error
               rate (β = 0.05) has been specified (at a difference of P₁ - P₂ = 0.1), it is possible to calculate the
               sample sizes that achieve the DQOs, assuming the proportions are equal to the values estimated
               from the sample:

                    m* = n* = 4(1.282 + 1.645)²(0.1275)(1 - 0.1275) / (0.1)² = 381.2, i.e., 382 samples,

               where 0.1275 = p̄ = (0.130 + 0.125)/2.

               Because both m and n are less than m*, the false negative error rate has not been satisfied.

    STEP 5:    The null hypothesis was not rejected, and the false negative error rate was not satisfied.
               Therefore, it seems that there is no difference in proportions and that the contaminant
               concentrations of the investigated area and the reference area are probably the same.  However,
               this outcome is uncertain because the sample sizes obtained were in all likelihood too small.
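
    Steps 1 through 3 of Box 3.3-5 are easily scripted, as in the sketch below.  This is illustrative only and not
    part of the original guidance (the function name is hypothetical); the call uses the counts of Box 3.3-6.
    Note that the box rounds the proportions before computing z, so the unrounded result here (about 0.11)
    differs slightly from the 0.098 shown there.

        from math import sqrt

        def two_sample_proportion_z(k1, m, k2, n):
            # Pooled two-sample z statistic for proportions (Box 3.3-5, Steps 1-3)
            p1, p2 = k1 / m, k2 / n
            p = (k1 + k2) / (m + n)                          # pooled proportion
            if min(m * p1, m * (1 - p1), n * p2, n * (1 - p2)) < 5:
                raise ValueError("normal approximation is questionable; consult a statistician")
            return (p1 - p2) / sqrt(p * (1 - p) * (1.0 / m + 1.0 / n))

        # Box 3.3-6: 12 of 92 readings contaminated in area 1, 10 of 80 in area 2
        print(two_sample_proportion_z(12, 92, 10, 80))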
EPA QA/G-9                                    3.3-9                                           QA96

-------
3.3.3   Nonparametric Comparisons of Two Populations

       In many cases, assumptions on distributional characteristics are difficult to verify or difficult to
satisfy for both populations. In this case, several distribution-free test procedures are available that compare
the shape and location of the two distributions instead of a statistical parameter (such as a mean or median).
The statistical tests described below test the null hypothesis "H₀: the distributions of population 1 and
population 2 are identical (or, the site is not more contaminated than background)" versus the alternative
hypothesis "HA: part of the distribution of population 1 is located to the right of the distribution of
population 2 (or the site is more contaminated than background)." Because of the structure of the hypothesis
tests, the labeling of populations 1 and 2 is of importance. For most environmental applications, population
1 is the area of interest (i.e., the potentially contaminated area) and population 2 is the reference area.

       There is no formal statistical parameter of interest in the hypotheses stated above. However, the
concept of false positive and false negative error rates still applies.

        3.3.3.1  The Wilcoxon Rank Sum Test

       PURPOSE

        The Wilcoxon rank sum test can be used to compare two population distributions based on m
independent random samples X₁, X₂, ..., Xₘ from the first population, and n independent random samples
Y₁, Y₂, ..., Yₙ from the second population.  When applied with the Quantile test (section 3.3.3.2), the
combined tests are most powerful for detecting true differences between two population distributions.

       ASSUMPTIONS AND THEIR VERIFICATION

       The validity of the random sampling and independence assumptions should be verified by review of
the procedures used to select the sampling points.  The two underlying distributions are assumed to have the
same shape and dispersion, so that one distribution differs by some fixed amount (or is increased by a
constant) when compared to the other distribution. For large samples, to test whether both site distributions
have approximately the same shape, one can create and compare histograms for the samples.

        LIMITATIONS AND ROBUSTNESS

        The Wilcoxon rank sum test may produce misleading results if many data values are the same.
When values are the same, their relative ranks are the same, and this has the effect of diluting the statistical
power of the Wilcoxon rank sum test.  Estimated concentrations should be reported for data below the
detection limit, even if these estimates are negative, because their relative magnitude to the rest of the data is
of importance.  An important advantage of the Wilcoxon rank sum test is its partial robustness to outliers,
because the analysis is conducted in terms of rankings of the observations.  This limits the influence of
outliers because a given data point can be no more extreme than the first or last rank.

        SEQUENCE OF STEPS

        Directions and an example for the Wilcoxon rank sum test are given in Box 3.3-7 and Box 3.3-8.
However, if a relatively large number of samples have been taken, it is more efficient in terms of statistical
power to use a large sample approximation to the Wilcoxon rank sum test (Box 3.3-9) to obtain the critical
values of W.

 EPA QA/G-9                                3.3 -10                                      QA96

-------
                           Box 3.3-7:  Directions for the Wilcoxon Rank Sum Test
                               for Simple and Systematic Random Samples

     Let X₁, X₂, ..., Xₘ represent the m data points from population 1 and Y₁, Y₂, ..., Yₙ represent the n data
     points from population 2 where both m and n are less than or equal to 10.  For the test, the null hypothesis
     will be that there is no difference between the two populations.  The alternative hypothesis will be that
     population 1 is located to the right of population 2 for Case 1 or that population 2 is located to the right of
     population 1 for Case 2.  If either m or n is larger than 10, use Box 3.3-9.

     STEP 1:    List and rank the measurements from both populations from smallest to largest, keeping track of
                which population contributed each measurement.  The rank of 1 is assigned to the smallest
                value, the rank of 2 to the second smallest value, and so forth.  If there are ties, assign the
                average of the ranks that would otherwise have been assigned to the tied observations.

     STEP 2:    For Case 1, calculate W as the sum of the ranks of the data from population 2.
                For Case 2, calculate W as the sum of the ranks of the data from population 1.

     STEP 3:    Calculate W_rs = W - n(n + 1)/2 for Case 1 or calculate W_rs = W - m(m + 1)/2 for Case 2.
     STEP 4:    Use Table A-7 of Appendix A to find the critical value w_α.

                If W_rs ≤ w_α, the null hypothesis may be rejected.  Go to Step 6.

                If W_rs > w_α, there is not enough evidence to reject the null hypothesis.  Therefore, the false
                negative error rate will need to be verified.  Go to Step 5.

     STEP 5:    If the null hypothesis (H₀) was not rejected, calculate either the power of the test or the sample
                size necessary to achieve the false positive and false negative error rates using a software
                package like the DEFT software (EPA QA/G-4D, 1994) or the DataQUEST software (EPA QA/G-9D,
                1996).  (Power calculations tend to be much more difficult for nonparametric procedures than for
                parametric procedures.)  If only one false negative error rate (β) has been specified (at δ₁), it is
                possible to calculate the sample size that achieves the DQOs, assuming the true mean and
                standard deviation are equal to the values estimated from the sample, instead of calculating the
                power of the test.  If m and n are large, calculate:

                     m* = n* = 2s²(z_{1-α} + z_{1-β})² / (δ₁ - δ₀)² + (0.25)z_{1-α}²

                where z_p is the pth percentile of the standard normal distribution (Table A-1 of Appendix A).  Then,
                multiply m* and n* by 1.16 to account for loss in efficiency, and, if 1.16m* ≤ m and 1.16n* ≤ n, the
                false negative error rate has been satisfied; if the values of m and n are otherwise, the false
                negative error rate has not been satisfied.

     STEP 6:   The results of the test could be:

               1) the null hypothesis was rejected, and it seems that population 1 is located to the right of
               population 2 for Case 1 or that population 2 is located to the right of population 1 for Case 2;

               2) the null hypothesis was not rejected, the false negative error rate was satisfied, and it seems
               there is no difference between the two populations; or

               3) the null hypothesis was not rejected, the false negative error rate was not satisfied, and it
               seems there is no difference between the two populations, but this result is uncertain because the
               sample sizes were probably too small.
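
     The rank manipulations of Steps 1 through 3 of Box 3.3-7 are shown in code below.  The sketch assumes
     numpy and scipy and is illustrative only, not part of the original guidance; for Case 1 the adjusted statistic
     W_rs coincides with the Mann-Whitney U statistic for the second sample.

        import numpy as np
        from scipy import stats

        def rank_sum_statistic(x, y, case=1):
            # x holds the data from population 1 (e.g., the site), y from population 2 (background)
            x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
            ranks = stats.rankdata(np.concatenate([x, y]))   # ties receive the average rank
            rx, ry = ranks[:len(x)], ranks[len(x):]
            if case == 1:
                W = ry.sum()                                 # Step 2, Case 1: ranks of population 2
                return W, W - len(y) * (len(y) + 1) / 2      # Step 3: W_rs
            W = rx.sum()                                     # Step 2, Case 2: ranks of population 1
            return W, W - len(x) * (len(x) + 1) / 2

        # Data of Box 3.3-8 (area 1 first); should give W = 50.5 and W_rs = 14.5
        print(rank_sum_statistic([17, 23, 26, 5, 13, 13, 8], [16, 20, 5, 4, 12, 10, 7, 3]))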
                          Box 3.3-8: An Example of the Wilcoxon Rank Sum Test
                              for Simple and Systematic Random Samples

    At a hazardous waste site, area 1 (cleaned using an in-situ methodology) was compared with a similar (but
    relatively uncontaminated) reference area, area 2.  If the in-situ methodology worked, then the two areas
    should be approximately equal in average contaminant levels.  If the methodology did not work, then area 1
    should have a higher average than the reference area.  The null hypothesis will be that there is no difference
    between the two areas.  Since area 1 was previously contaminated, the alternative hypothesis will be that
    contaminant levels in area 1 are larger (located to the right) than those in area 2 (Case 1).  The false positive
    error rate was set at 5% and the false negative error rate (β) was set at 20% if the difference between the
    areas is 2.5 ppb.  Seven random samples were taken from area 1 and eight samples were taken from area 2:
                              Area 1                      Area 2
                         17, 23, 26, 5                16, 20, 5, 4
                         13, 13, 12                   8, 10, 7, 3

    STEP 1:   The data listed and ranked by size are (Area 1 values denoted by *):

              Data (ppb):  3,  4,  5,  5*,  7,  8*,  10,  12,  13*,  13*,  16,  17*,  20,  23*,  26*
              Ranks:       1,  2, 3.5, 3.5*, 5,  6*,  7,   8,  9.5*, 9.5*, 11,  12*,  13,  14*,  15*

    STEP 2:   W = sum of the ranks from area 2 = 50.5

    STEP 3:   W_sr = 50.5 - 8(8+1)/2 = 14.5

    STEP 4:   Using Table A-7 of Appendix A, w_0.05 = 13.  Because W_sr is greater than w_0.05, do not reject the
              null hypothesis.

    STEP 5:   The null hypothesis was not rejected and it would be appropriate to calculate the probable power
              of the test.  However, because the number of samples is small, extensive computer simulations
              are required in order to estimate the power of this test.  Therefore, a statistician should be
              consulted.

    STEP 6:   The null hypothesis was not rejected.  Therefore, it is likely that there is no difference between the
              investigated area and the reference area, although the statistical power is low due to the small
              sample sizes involved.
                         Box 3.3-9: Directions for the Large Sample Approximation
                                     to the Wilcoxon Rank Sum Test
                                for Simple and Systematic Random Samples

     Let X1, X2, ..., Xm represent the m data points from population 1 and Y1, Y2, ..., Yn represent the n data
     points from population 2, where both m and n are greater than 10.  The null hypothesis will be that there is no
     difference between the two populations.  The alternative hypothesis will be that population 1 is larger than
     population 2 for Case 1 or that population 2 is larger than population 1 for Case 2.

     STEP 1:    List and rank the measurements from both populations from smallest to largest, keeping track of
                which population contributed each measurement.  The rank of 1 is assigned to the smallest
                value, the rank of 2 to the second smallest value, and so forth.  If there are ties, assign the
                average of the ranks that would otherwise have been assigned to the tied observations.

     STEP 2:    For Case 1, calculate W as the sum of the ranks of the data from population 2.
                For Case 2, calculate W as the sum of the ranks of the data from population 1.

     STEP 3:    Calculate

                     Z_sr = [W - n(n+1)/2 - mn/2] / [mn(m+n+1)/12]^(1/2)   for Case 1, or

                     Z_sr = [W - m(m+1)/2 - mn/2] / [mn(m+n+1)/12]^(1/2)   for Case 2.

     STEP 4:    Use Table A-1 of Appendix A to find the critical value z_{1-α} such that 100(1-α)% of the normal
                distribution is below z_{1-α}.

                If Z_sr > -z_{1-α}, there is not enough evidence to reject the null hypothesis and the false negative
                error rate should be verified.  Go to Step 5.

                If Z_sr ≤ -z_{1-α}, the null hypothesis may be rejected.  Go to Step 6.

     STEP 5:    If the null hypothesis (H0) was not rejected, calculate either the power of the test or the sample
                size necessary to achieve the false positive and false negative error rates using a statistical
                software package.  (Power calculations tend to be more difficult for nonparametric procedures
                than for parametric procedures.)  If only one false negative error rate (β) has been specified (at
                δ1), it is possible to calculate the sample size that achieves the DQOs, assuming the true mean
                and standard deviation are equal to the values estimated from the sample, instead of calculating
                the power of the test.  If m and n are large, calculate:

                     m* = n* = 2σ̂²(z_{1-α} + z_{1-β})²/δ1² + (0.25)(z_{1-α})²

                where z_p is the pth percentile of the standard normal distribution (Table A-1 of Appendix A).  Then,
                multiply m* and n* by 1.16 to account for a loss in efficiency.  If 1.16m* ≤ m and 1.16n* ≤ n, the
                false negative error rate has been satisfied.  Otherwise, the false negative error rate has not been
                satisfied.

     STEP 6:    The results of the test could be:

                1) the null hypothesis was rejected, and it seems that population 1 is greater than population 2
                for Case 1 or that population 2 is greater than population 1 for Case 2;

                2) the null hypothesis was not rejected, the false negative error rate was satisfied, and it seems
                there is no difference between the two populations; or

                3) the null hypothesis was not rejected, the false negative error rate was not satisfied, and it
                seems there is no difference between the two populations, but this result is uncertain because the
                sample sizes were probably too small.
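
        A minimal Python sketch of Steps 1 through 3 is given below (the function name is illustrative, and no
tie correction to the variance is applied).  It standardizes the rank sum statistic by its null mean mn/2 and
standard deviation [mn(m+n+1)/12]^(1/2), the usual large-sample approximation; the result is compared to the
critical value from Table A-1 as in Step 4.

    import numpy as np
    from scipy.stats import rankdata

    def rank_sum_z_case1(pop1, pop2):
        """Large-sample Z statistic for Case 1 (no tie correction)."""
        m, n = len(pop1), len(pop2)
        ranks = rankdata(np.concatenate([np.asarray(pop1, float),
                                         np.asarray(pop2, float)]))
        wsr = ranks[m:].sum() - n * (n + 1) / 2.0        # W - n(n+1)/2
        return (wsr - m * n / 2.0) / np.sqrt(m * n * (m + n + 1) / 12.0)

    # With alpha = 0.05 the critical value from Table A-1 is 1.645; for Case 1
    # the null hypothesis may be rejected if the returned value is <= -1.645.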
       3.3.3.2  The Quantile Test

       PURPOSE

       The Quantile test can be used to compare two populations based on the independent random samples
X1, X2, ..., Xm from the first population and Y1, Y2, ..., Yn from the second population.  When the Quantile
test and the Wilcoxon rank sum test (section 3.3.3.1) are applied together, the combined tests are the most
powerful at detecting true differences between two populations.

       ASSUMPTIONS AND THEIR VERIFICATION

       The Quantile test assumes that the data X1, X2, ..., Xm are a random sample from population 1, the
data Y1, Y2, ..., Yn are a random sample from population 2, and the two random samples are independent
of one another.  The validity of the random sampling and independence assumptions is assured by using
proper randomization procedures, either random number generators or tables of random numbers.  The
primary verification required is to review the procedures used to select the sampling points.  The two
underlying distributions are assumed to have the same underlying dispersion (variance).

       LIMITATIONS AND ROBUSTNESS

       The Quantile test is not robust to outliers. In addition, the test assumes either a systematic (e.g., a
triangular grid) or simple random sampling was employed. The Quantile test may not be used for stratified
designs.

       SEQUENCE OF STEPS

       The Quantile test is difficult to implement by hand.  Therefore, directions are not included in this
guidance.  However, the DataQUEST software (EPA G-9D, 1996) can be used to conduct this test.

3.3.4  Comparing Two Medians

       Let μ̃1 represent the median of population 1 and μ̃2 represent the median of population 2.  The
hypotheses considered in this section are:

       Case 1:  H0:  μ̃1 - μ̃2 ≤ δ0   vs.   HA:  μ̃1 - μ̃2 > δ0;  and

       Case 2:  H0:  μ̃1 - μ̃2 ≥ δ0   vs.   HA:  μ̃1 - μ̃2 < δ0.

An example of a two-sample test for the difference between two population medians is comparing the median
contaminant level at a Superfund site to the median of a background site.  In this case, δ0 would be zero.

       The median is also the 50th percentile, and, therefore, the methods described in section 3.3.2 for
percentiles and proportions may be used to test hypotheses concerning the difference between two medians by
letting P1 = P2 = 0.50.  The Wilcoxon rank sum test (section 3.3.3.1) is also recommended for comparing two
medians.  This test is more powerful than those for proportions for symmetric distributions.
      STEP 4: VERIFY THE ASSUMPTIONS OF THE STATISTICAL TEST

      THE DATA QUALITY ASSESSMENT PROCESS
           Review DQOs and Sampling Design
           Conduct Preliminary Data Review
           Select the Statistical Test
           Verify the Assumptions
           Draw Conclusions From the Data

      VERIFY THE ASSUMPTIONS OF THE STATISTICAL TEST
           Examine the underlying assumptions of the statistical hypothesis test in light of the environmental data.
         STEP 4: VERIFY THE ASSUMPTIONS OF THE STATISTICAL TEST

                                                              Section     Directions    Example
 Tests for Distributional Assumptions
     Shapiro Wilk W Test                                      4.2.2       --            --
     Filliben's Statistic                                     4.2.3       --            --
     Coefficient of Variation Test                            4.2.4       Box 4.2-1     Box 4.2-1
     Skewness and Kurtosis Tests                              4.2.5       --            --
     Studentized Range Test                                   4.2.6       Box 4.2-2     Box 4.2-2
     Geary's Test                                             4.2.6       Box 4.2-3     Box 4.2-4
     Goodness-of-Fit Tests                                    4.2.7       --            --

 Tests for Trends
     Test of a Correlation Coefficient                        4.3.2.2     Box 4.3-1     Box 4.3-1
     Mann-Kendall Test                                        4.3.4.1     Box 4.3-3     Box 4.3-4
                                                              4.3.4.2     Box 4.3-5     Box 4.3-6
     Tests for an Overall Monotonic Trend                     4.3.4.3     Box 4.3-8     --

 Tests for Outliers
     Extreme Value Test                                       4.4.3       Box 4.4-1     Box 4.4-2
     Discordance Test                                         4.4.4       Box 4.4-3     Box 4.4-4
     Rosner's Test                                            4.4.5       Box 4.4-5     Box 4.4-6
     Walsh's Test                                             4.4.6       Box 4.4-7     --

 Tests for Dispersion
     Confidence Intervals for a Variance                      4.5.1       Box 4.5-1     Box 4.5-1
     F-Test                                                   4.5.2       Box 4.5-2     Box 4.5-2
     Bartlett's Test                                          4.5.3       Box 4.5-3     Box 4.5-4
     Levene's Test                                            4.5.4       Box 4.5-5     Box 4.5-6

 Transformations
     Logarithmic, Square Root, Inverse Sine,
       Box-Cox Transformations                                4.6         Box 4.6-1     Box 4.6-1

 Data below Detection Limit
     Substitution Methods                                     4.7.1       --            --
     Cohen's Adjustment                                       4.7.2.1     Box 4.7-1     Box 4.7-2
     Trimmed Mean                                             4.7.2.2     Box 4.7-4     Box 4.7-5
     Winsorization                                            4.7.2.3     Box 4.7-6     Box 4.7-7
                                         CHAPTER 4
       STEP 4: VERIFY THE ASSUMPTIONS OF THE STATISTICAL TEST

4.1    OVERVIEW AND ACTIVITIES

       In this step, the analyst should assess the validity of the statistical test chosen in step 3 by examining
its underlying assumptions in light of the environmental data.
multiplicative fashion (or some combination). For example, if temporal or spatial autocorrelations are
believed to be present, then the model needs to identify the autocorrelation structure (see section 2.3.8).

4.1.2   Perform Tests of Assumptions

       For most statistical tests, investigators will need to assess the reasonableness of assumptions in
relation to the structure of the components making up an observation.  For example, a t-test assumes that the
components, or errors, are additive, uncorrelated, and normally distributed with homogeneous variance.
Basic assumptions that should be investigated include:

       (1)     Is it reasonable to assume that the errors (deviations from the model) are normally
               distributed?  If adequate data are available, then standard tests for normality can be
               conducted (e.g., the Shapiro-Wilk test or the Kolmogorov-Smirnov test).

       (2)     Is it reasonable to assume that errors are uncorrelated?  While it is natural to assume that
               analytical errors imbedded in measurements made on different sample units are independent,
               other errors from other sources may not be independent.  If sample units are "too close
               together," either in time or space, independence may not hold.  If the statistical test assumes
               independence and this assumption is not correct, the proposed false positive and false
               negative error rates (α and β) for the statistical test cannot be verified.

       (3)     Is it reasonable to assume that errors are additive and have a constant variability?  If
               sufficient data are available, a plot of the relevant standard deviations versus mean
               concentrations may be used to discern if variability tends to increase with concentration
               level.  If so, transformations of the data may make the additivity assumption more tenable.

       One of the most important assumptions underlying the statistical procedures described herein is that
there is no inherent bias (systematic deviation from the true value) in the data.  The general approach adopted
here is that if a long term bias is known to exist, then adjustment for this bias should be made.  If bias is
present, then the basic effect is to shift the power curves associated with a given test to the right or left,
depending on the direction of the bias.  Thus substantial distortion of the nominal Type I (false positive) and
Type II (false negative) decision error rates may occur.  In general, bias cannot be discerned by examination
of routine data; rather, appropriate and adequate QA data are needed, such as performance evaluation data.  If
one chooses not to make adjustment for bias on the basis of such data, then one should, at a minimum,
construct the estimated worst-case power curves so as to understand the potential effects of the bias.

4.1.3   Determine Corrective Actions

       Sometimes the assumptions underlying the primary statistical test will not be satisfied and some type
of corrective action will be required before proceeding.  In some cases, a transformation of the data will
correct a problem with distributional assumptions.  In other cases, the data for verifying some key assumption
may not be available, and existing information may not support a theoretical justification of the validity of the
assumption.  In this situation, it may be necessary to collect additional data to verify the assumptions.  If the
assumptions underlying a hypothesis test are not satisfied, and data transformations or other modifications do
not appear feasible, then it may be necessary to consider an alternative statistical test.  These include robust
test procedures and nonparametric procedures.  Robust test procedures involve modifying the parametric test
by using robust estimators.  For instance, as a substitute for a t-test, a trimmed mean and its associated
standard error (section 4.7.2) might be used to form a t-type statistic.

4.2    TESTS FOR DISTRIBUTIONAL ASSUMPTIONS

       Many statistical tests and models are only appropriate for data that follow a particular distribution.
This section will aid in determining if a distributional assumption of a statistical test is satisfied, in particular,
the assumption of normality. Two of the most important distributions for tests involving environmental data
are the normal distribution and the lognormal distribution, both of which are discussed in this section.  To test
if the data follow a distribution other than the normal distribution or the lognormal distribution, apply the chi-
square test discussed in section 4.2.7 or consult a statistician.

       There are many methods available for verifying the assumption of normality, ranging from simple to
complex.  This section discusses methods based on graphs, sample moments (kurtosis and skewness), sample
ranges, the Shapiro-Wilk test and closely related tests, and goodness-of-fit tests. Discussions for the simplest
tests contain step-by-step directions and examples based on the data in Table 4.2-1. These tests are
summarized in Table 4.2-2. This section ends with a comparison of the tests to help the analyst select a test
for normality.

                           Table 4.2-1.  Data for Examples in Section 4.2

               15.63     11.00     11.75     10.45     13.18
               10.37     10.54     11.55     11.01     10.23

               X̄ = 11.57          s = 1.677
       The assumption of normality is very important as it is the basis for the majority of statistical tests.
A normal, or Gaussian, distribution is one of the most common probability distributions in the analysis of
environmental data. A normal distribution is a reasonable model of the behavior of certain random
phenomena and can often be used to approximate other probability distributions.  In addition, the Central
Limit Theorem and other limit theorems state that as the sample size gets large, some of the sample summary
statistics (e.g., the sample mean) behave as if they are a normally distributed variable.  As a result, a common
assumption associated with parametric tests or statistical models is that the errors associated with data or a
model follow a normal distribution.

        The graph of a normally distributed random variable, a normal curve, is bell-shaped (see Figure
4.2-1) with the highest point located at the mean, which is equal to the median.  A normal curve is symmetric
about the mean, hence the part to the left of the mean is a mirror image of the part to the right.  In
environmental data, random errors occurring during the measurement process may be normally distributed.

                Figure 4.2-1.  Graph of a Normal and Lognormal Distribution

        Environmental data commonly exhibit frequency distributions that are non-negative and skewed with
heavy or long right tails.  Several standard parametric probability models have these properties, including the
Weibull, gamma, and lognormal distributions.  The lognormal distribution (Figure 4.2-1) is a commonly used
distribution for modeling environmental contaminant data.  The advantage to this distribution is that a simple
(logarithmic) transformation will transform a lognormal distribution into a normal distribution.  Therefore,
the methods for testing for normality described in this section can be used to test for lognormality if a
logarithmic transformation has been used.
                                  Table 4.2-2.  Tests for Normality

    Test                          Section   Sample Size   Recommended Use                      Data-QUEST
    Shapiro Wilk W Test           4.2.2     ≤ 50          Highly recommended.                  Yes
    Filliben's Statistic          4.2.3     ≤ 100         Highly recommended.                  Yes
    Coefficient of Variation      4.2.4     Any           Only use to quickly discard an       Yes
      Test                                                assumption of normality.
    Skewness and Kurtosis         4.2.5     > 50          Useful for large sample sizes.       Yes
      Tests
    Geary's Test                  4.2.6     > 50          Useful when tables for other         Yes
                                                          tests are not available.
    Studentized Range Test        4.2.6     ≤ 1000        Highly recommended (with some        Yes
                                                          conditions).
    Chi-Square Test               4.2.7     Large (a)     Useful for grouped data and when     No
                                                          the comparison distribution is
                                                          known.
    Lilliefors Kolmogorov-        4.2.7     > 50          Useful when tables for other         No
      Smirnoff Test                                       tests are not available.

    (a) The necessary sample size depends on the number of groups formed when implementing this test.
        Each group should contain at least 5 observations.
4.2.1   Graphical Methods

        Graphical methods (section 2.3) present detailed information about data sets that may not be
apparent from a test statistic.  Histograms, stem-and-leaf plots, and normal probability plots are some
graphical methods that are useful for determining whether or not data follow a normal curve.  Both the
histogram and stem-and-leaf plot of a normal distribution are bell-shaped.  The normal probability plot of a
normal distribution follows a straight line.  For non-normally distributed data, there will be large deviations in
the tails or middle of a normal probability plot.

        Using a plot to decide if the data are normally distributed involves making a subjective decision.  For
extremely non-normal data, it is easy to make this determination; however, in many cases the decision is not
straightforward.  Therefore, formal test procedures are usually necessary to test the assumption of normality.
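
        Normal probability plots of the kind described above can be produced with standard statistical
software.  The short Python sketch below (matplotlib and SciPy are assumed to be available) plots the data
from Table 4.2-1 against normal quantiles; marked curvature or large deviations in the tails would suggest
non-normality.

    import matplotlib.pyplot as plt
    from scipy import stats

    data = [15.63, 11.00, 11.75, 10.45, 13.18, 10.37, 10.54, 11.55, 11.01, 10.23]

    # probplot orders the data and pairs each value with a normal quantile;
    # an approximately straight line is consistent with a normal curve.
    stats.probplot(data, dist="norm", plot=plt)
    plt.title("Normal probability plot, Table 4.2-1 data")
    plt.show()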

4.2.2   Shapiro-Wilk Test for Normality (the W test)

        One of the most powerful tests for normality is the W test by Shapiro and Wilk.  This test is similar
to computing a correlation between the quantiles of the standard normal distribution and the ordered values of
a data set.  If the normal probability plot is approximately linear (i.e., the data follow a normal curve), the test
statistic will be relatively high.  If the normal probability plot contains significant curves, the test statistic will
be relatively low.

        The W test is recommended in several EPA guidance documents and in many statistical texts.
Tables of critical values for sample sizes up to 50 have been developed for determining the significance of the
test statistic.  However, this test is difficult to compute by hand since it requires two different sets of tabled
values and a large number of summations and multiplications. Therefore, directions for implementing this
test are not given in this document, but the test is contained in the DataQUEST software package (QA/G-9D,
1996).
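
        Although the W test is not practical by hand, it is a one-line calculation in most statistical packages.
The Python sketch below (SciPy assumed available) applies it to the Table 4.2-1 data; the reported p-value is
compared to the chosen significance level rather than to tabled critical values.

    from scipy import stats

    data = [15.63, 11.00, 11.75, 10.45, 13.18, 10.37, 10.54, 11.55, 11.01, 10.23]

    w_stat, p_value = stats.shapiro(data)   # Shapiro-Wilk W test
    if p_value < 0.05:
        print(f"W = {w_stat:.3f}, p = {p_value:.3f}: reject normality at the 5% level")
    else:
        print(f"W = {w_stat:.3f}, p = {p_value:.3f}: normality is not rejected at the 5% level")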

4.2.3   Extensions of the Shapiro-Wilk Test (Filliben's Statistic)

        Because the W test may only be used for sample sizes less than or equal to 50, several related tests
have been proposed.  D'Agostino's test for sample sizes between 50 and 1000 and Royston's test for sample
sizes up to 2000 are two such tests that approximate some of the key quantities or parameters of the W test.

        Another test related to the W test is the Filliben statistic, also called the probability plot correlation
coefficient.  This test measures the linearity of the points on the normal probability plot.  Similar to the W
test, if the normal probability plot is approximately linear (i.e., the data follow a normal curve), the
correlation coefficient will be relatively high.  If the normal probability plot contains significant curves (i.e.,
the data do not follow a normal curve), the correlation coefficient will be relatively low.  Although easier to
compute than the W test, the Filliben statistic is still difficult to compute by hand.  Therefore, directions for
implementing this test are not given in this guidance; however, it is contained in the DQA DataQUEST
software package (QA/G-9D, 1996).
4.2.4   Coefficient of Variation

        The coefficient of variation (CV) may be used to quickly determine whether or not the data follow a
normal curve by comparing the sample CV to 1.  The use of the CV is only valid for some environmental
applications if the data represent a non-negative characteristic such as contaminant concentrations.  If the CV
is greater than 1, the data should not be modeled with a normal curve.  However, this method should not be
used to conclude the opposite, i.e., do not conclude that the data can be modeled with a normal curve if the
CV is less than 1.  This test is to be used only in conjunction with other statistical tests or when graphical
representations of the data indicate extreme departures from normality.  Directions and an example of this
method are contained in Box 4.2-1.

4.2.5   Coefficient of Skewness/Coefficient of Kurtosis Tests

        The degree of symmetry (or asymmetry) displayed by a data set is measured by the coefficient of
skewness (g3).  The coefficient of kurtosis, g4, measures the degree of flatness of a probability distribution
near its center.  Several test methods have been proposed using these coefficients to test for normality.  One
method tests for normality by adjusting the coefficients of skewness and kurtosis to approximate a standard
normal distribution for sample sizes greater than 50.

        Two other tests based on these coefficients include a combined test based on a chi-squared (χ²)
distribution and Fisher's cumulant test.  Fisher's cumulant test computes the exact sampling distribution of g3
and g4; therefore, it is more powerful than previous methods which assume that the distributions of the two
coefficients are normal.  Fisher's cumulant test requires a table of critical values, and these tests require a
sample size of greater than 50.  Tests based on skewness and kurtosis are rarely used as they are difficult to
compute and less powerful than many alternatives.
                      Box 4.2-1: Directions for the Coefficient of Variation Test for
                                 Environmental Data and an Example
     Directions

     STEP 1:  Calculate the coefficient of variation (CV):  CV = s / X̄.

     STEP 2:  If CV > 1.0, conclude that the data are not normally distributed.  Otherwise, the test is inconclusive.

     Example

     The following example demonstrates using the coefficient of variation to determine if the data in Table 4.2-1
     should not be modeled using a normal curve.

     STEP 1:  Calculate the coefficient of variation (CV):  CV = s / X̄ = 1.677 / 11.571 = 0.145

     STEP 2:  Since 0.145 is not greater than 1.0, the test is inconclusive.
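
        The calculation in Box 4.2-1 is easily reproduced in code.  The Python sketch below uses the sample
standard deviation with n-1 degrees of freedom, which matches the value s = 1.677 given in Table 4.2-1.

    import statistics

    data = [15.63, 11.00, 11.75, 10.45, 13.18, 10.37, 10.54, 11.55, 11.01, 10.23]

    cv = statistics.stdev(data) / statistics.mean(data)   # s / X-bar = 1.677 / 11.57
    print(f"CV = {cv:.3f}")   # about 0.145; not greater than 1.0, so the test is inconclusive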
4.2.6   Range Tests

        Almost 100% of the area of a normal curve lies within ±5 standard deviations from the mean, and
tests for normality have been developed based on this fact.  Two such tests, which are both simple to apply,
are the studentized range test and Geary's test.  Both of these tests use a ratio of an estimate of the sample
range to the sample standard deviation.  Very large and very small values of the ratio then imply that the data
are not well modeled by a normal curve.

        a.  The studentized range test (or w/s test).  This test compares the range of the sample to the
sample standard deviation.  Tables of critical values for sample sizes up to 1000 (Table A-2 of Appendix A)
are available for determining whether the absolute value of this ratio is significantly large.  Directions for
implementing this method are given in Box 4.2-2 along with an example.  The studentized range test does not
perform well if the data are asymmetric and if the tails of the data are heavier than the normal distribution.  In
addition, this test may be sensitive to extreme values.  Unfortunately, lognormally distributed data, which are
common in environmental applications, have these characteristics.  If the data appear to be lognormally
distributed, then this test should not be used.  In most cases, the studentized range test performs as well as the
Shapiro-Wilk test and is much easier to apply.

        b.  Geary's Test.  Geary's test uses the ratio of the mean deviation of the sample to the sample
standard deviation.  This ratio is then adjusted to approximate a standard normal distribution.  Directions for
implementing this method are given in Box 4.2-3 and an example is given in Box 4.2-4.  This test does not
perform as well as the Shapiro-Wilk test or the studentized range test.  However, since Geary's test statistic is
based on the normal distribution, critical values for all possible sample sizes are available.
                           Box 4.2-2: Directions for the Studentized Range Test
                                           and an Example
     Directions

     STEP 1:    Calculate the sample range (w) and sample standard deviation (s) using section 2.2.3.

     STEP 2:    Compare  w/s = (X(n) - X(1))/s  to the critical values given in Table A-2 (labeled a and b).

                If w/s falls outside the two critical values, then the data do not follow a normal curve.

     Example

     The following example demonstrates the use of the studentized range test to determine if the data from Table
     4.2-1 can be modeled using a normal curve.

     STEP 1:   w = 15.63 - 10.23 = 5.40 and s = 1.677.

     STEP 2:   w/s = 5.40 / 1.677 = 3.22.  The critical values given in Table A-2 are 2.51 and 3.875.  Since 3.22
               falls between these values, the assumption of normality is not rejected.
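
        The w/s ratio of Box 4.2-2 can be computed as follows; this is a minimal Python sketch using the
Table 4.2-1 data, and the resulting ratio is still compared by hand to the critical values a and b in Table A-2.

    import statistics

    data = [15.63, 11.00, 11.75, 10.45, 13.18, 10.37, 10.54, 11.55, 11.01, 10.23]

    w = max(data) - min(data)        # sample range: 15.63 - 10.23 = 5.40
    s = statistics.stdev(data)       # sample standard deviation: 1.677
    print(f"w/s = {w / s:.2f}")      # 3.22, which falls between 2.51 and 3.875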
                               Box 4.2-3:  Directions for Geary's Test

    STEP 1:    Calculate the sample mean X̄, the sample sum of squares (SSS), and the sum of absolute
               deviations (SAD):

                    X̄ = (1/n) Σ Xi,     SSS = Σ Xi² - (Σ Xi)²/n,     and     SAD = Σ |Xi - X̄|

    STEP 2:    Calculate Geary's test statistic   a = SAD / [n(SSS)]^(1/2)

    STEP 3:    Test "a" for significance by computing   Z = (a - 0.7979) / (0.2123/√n).   Here 0.7979 and 0.2123
               are constants used to achieve normality.

    STEP 4:    Use Table A-1 of Appendix A to find the critical value z_{1-α} such that 100(1-α)% of the normal
               distribution is below z_{1-α}.  For example, if α = 0.05, then z_{1-α} = 1.645.  Declare "a" to be
               sufficiently small or large (i.e., conclude the data are not normally distributed) if |Z| > z_{1-α}.
                                 Box 4.2-4:  Example of Geary's Test

    The following example demonstrates the use of Geary's test to determine if the data from Table 4.2-1 can be
    modeled using a normal curve.

    STEP 1:    X̄ = (1/n) Σ Xi = 11.571,   SAD = Σ |Xi - X̄| = 11.694,   and
               SSS = Σ Xi² - (Σ Xi)²/n = 1364.178 - 1338.88 = 25.298

    STEP 2:    a = 11.694 / [10(25.298)]^(1/2) = 0.735

    STEP 3:    Z = (0.735 - 0.7979) / (0.2123/√10) = -0.934

    STEP 4:    Since |Z| is not greater than 1.64 (5% significance level), there is not enough information to
               conclude that the data do not follow a normal distribution.
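
        Geary's statistic and its normalized value Z are simple to script.  The Python sketch below (the
function name is illustrative) reproduces the quantities in Box 4.2-4 for the Table 4.2-1 data.

    import math

    def geary_test(data):
        n = len(data)
        mean = sum(data) / n
        sad = sum(abs(x - mean) for x in data)                 # sum of absolute deviations
        sss = sum(x * x for x in data) - sum(data) ** 2 / n    # sample sum of squares
        a = sad / math.sqrt(n * sss)                           # Geary's test statistic
        z = (a - 0.7979) / (0.2123 / math.sqrt(n))             # normalized statistic
        return a, z

    a, z = geary_test([15.63, 11.00, 11.75, 10.45, 13.18, 10.37, 10.54, 11.55, 11.01, 10.23])
    print(f"a = {a:.3f}, Z = {z:.3f}")   # approximately a = 0.735 and Z = -0.93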
4.2.7   Goodness-of-Fit Tests

        Goodness-of-fit tests are used to test whether data follow a specific distribution, i.e., how "good" a
specified distribution fits the data.  In verifying assumptions of normality, one would compare the data to a
normal distribution with a specified mean and variance.

        a.  Chi-square Test.  One classic goodness-of-fit test is the chi-square test, which involves breaking
the data into groups and comparing these groups to the expected groups from the known distribution.  There
are no fixed methods for selecting these groups and this test also requires a large sample size since at least 5
observations per group are required to implement this test.  In addition, the chi-square test does not have the
power of the Shapiro-Wilk test or some of the other tests mentioned above.

        b.  Kolmogorov-Smirnoff (K-S) Test and Lilliefors K-S Test.  Another goodness-of-fit test is the
Kolmogorov-Smirnoff (K-S) test, which also tests whether the data follow a specific distribution with known
parameters such as the mean and variance.  This test requires that the sample size of the data be greater than
50.  The Lilliefors K-S test may be used for testing if the data are normally distributed when the sample size
is larger than 50 and the distribution parameters are estimated from the data.  The Lilliefors K-S test is more
powerful than the chi-square test for large sample sizes and is recommended in several EPA guidance
documents.
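
        As an illustration of a goodness-of-fit calculation, the Python sketch below applies the Kolmogorov-
Smirnoff test to the Table 4.2-1 data against a normal distribution whose mean and standard deviation are
specified in advance (the hypothesized values 11.5 and 1.7 are purely illustrative).  This is the classical K-S
test with known parameters; when the parameters are estimated from the data, the Lilliefors version should be
used instead.

    from scipy import stats

    data = [15.63, 11.00, 11.75, 10.45, 13.18, 10.37, 10.54, 11.55, 11.01, 10.23]

    # Hypothesized (known) parameters -- illustrative values only.
    mu, sigma = 11.5, 1.7
    d_stat, p_value = stats.kstest(data, "norm", args=(mu, sigma))
    print(f"D = {d_stat:.3f}, p = {p_value:.3f}")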

4.2.8   Recommendations

        Analysts can perform tests for normality with samples as small as 3.  However, the tests lack
statistical power for small sample sizes.  Therefore, for small sample sizes, it is recommended that a
nonparametric statistical test (i.e., one that does not assume a distributional form of the data) be selected
during Step 3 of the DQA Process in order to avoid incorrectly assuming the data are normally distributed
when there is simply not enough information to test this assumption.

        If the sample size is less than 50, then this guidance recommends using the Shapiro-Wilk W test,
wherever practicable.  The Shapiro-Wilk W test is one of the most powerful tests for normality and it is
recommended in several EPA guidance documents as the preferred test when the sample size is less than 50.
This test is difficult to implement by hand but can be applied easily using the DQA DataQUEST software
package (QA/G-9D, 1996).  If the Shapiro-Wilk W test is not feasible, then this guidance recommends using
either Filliben's statistic or the studentized range test.  Filliben's statistic performs similarly to the Shapiro-Wilk
test.  The studentized range is a simple test to perform; however, it is not applicable for non-symmetric data
with large tails.  If the data are not highly skewed and the tails are not significantly large (compared to a
normal distribution), the studentized range provides a simple and powerful test that can be calculated by
hand.  All three of these tests are included in the DataQUEST software (QA/G-9D, 1996).

        If the sample size is greater than 50, this guidance recommends using either Filliben's statistic or
the studentized range test.  However, if critical values for these tests (for the specific sample size) are not
available, then this guidance recommends implementing either Geary's test or the Lilliefors Kolmogorov-
Smirnoff test.  Geary's test is easy to apply and uses standard normal tables similar to Table A-1 of Appendix
A, which are widely available in standard textbooks.  The Lilliefors Kolmogorov-Smirnoff test is more
statistically powerful but is also more difficult to apply and uses specialized tables not readily available.
4.3    TESTS FOR TRENDS

4.3.1   Introduction

       This section presents statistical tools for detecting and estimating trends in environmental data. The
detection and estimation of temporal or spatial trends are important for many environmental studies or
monitoring programs. In cases where temporal or spatial patterns are strong, simple procedures such as time
plots or linear regression over time can reveal trends. In more complex situations, sophisticated statistical
models and procedures may be needed. For example, the detection of trends may be complicated by the
overlaying of long- and short-term trends, cyclical effects (e.g., seasonal or weekly systematic variations),
autocorrelations, or impulses or jumps (e.g., due to interventions or procedural changes).

       The graphical representations of Chapter 2 are recommended as the first step to identify possible
trends. A plot of the data versus time is recommended for temporal data, as it may reveal long-term trends
and may also show other major types of trends, such as cycles or impulses. A posting plot is recommended
for spatial data to reveal spatial trends such as areas of high concentration or areas that were inaccessible.

        For most of the statistical tools presented below, the focus is on monotonic long-term trends (i.e., a
trend that is exclusively increasing or decreasing, but not both), as well as other sources of systematic
variation, such as seasonality.  The investigations of trend in this section are limited to one-dimensional
domains, e.g., trends in a pollutant concentration over time.  The current edition of this document does not
address spatial trends (with 2- and 3-dimensional domains) and trends over space and time (with 3- and 4-
dimensional domains), which may involve sophisticated geostatistical techniques such as kriging and require
the assistance of a statistician. Section 4.3.2 discusses estimating and testing for trends using regression
techniques.  Section 4.3.3 discusses more robust trend estimation procedures, and section 4.3.4 discusses
hypothesis tests for detecting trends under several types of situations.

4.3.2   Regression-Based Methods for Estimating and Testing for Trends

        4.3.2.1  Estimating a Trend Using the Slope of the Regression Line

        The classic procedures for assessing linear trends involve regression.  Linear regression is a
commonly used procedure in which calculations are performed on a data set containing pairs of observations
(Xi, Yi), so as to obtain the slope and intercept of a line that "best fits" the data.  For temporal trends, the Xi
values represent time and the Yi values represent the observations, such as contaminant concentrations.  An
estimate of the magnitude of trend can be obtained by performing a regression of the data versus time (or
some function of the data versus some function of time) and using the slope of the regression line as the
measure of the strength of the trend.

        Regression procedures are easy to apply; most scientific calculators will accept data entered as pairs
 and will calculate the slope and intercept of the best fitting line, as well as the correlation coefficient r (see
 section 2.2.4).  However, regression entails several limitations and assumptions. First of all, simple linear
 regression (the most commonly used method) is designed to detect linear relationships between two variables;
 other types of regression models are generally needed to detect non-linear relationships such as cyclical or
 non-monotonic trends. Regression is very sensitive to extreme values (outliers), and presents difficulties in
 handling data below the detection limit, which are commonly encountered in environmental studies.
 Regression also relies on two key assumptions: normally distributed errors, and constant variance. It may be
 difficult or burdensome to verify these assumptions in practice, so the accuracy of the slope estimate may be

suspect.  Moreover, the analyst must ensure that time plots of the data show no cyclical patterns, outlier tests
show no extreme data values, and data validation reports indicate that nearly all the measurements were
above detection limits.  Because of these drawbacks, regression is not recommended as a general tool for
estimating and detecting trends, although it may be useful as an informal, quick, and easy screening tool for
identifying strong linear trends.
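
        As a quick screening calculation of the kind described above, the slope, intercept, and correlation
coefficient can be obtained in a single call.  The Python sketch below uses illustrative time and concentration
values (they are not taken from this guidance); the fitted slope is only a rough indicator of trend, and the
regression assumptions discussed above still apply.

    from scipy import stats

    time = [1, 2, 3, 4, 5, 6, 7, 8]                  # illustrative sampling times
    conc = [2.1, 2.4, 2.3, 2.9, 3.2, 3.0, 3.6, 3.8]  # illustrative concentrations

    result = stats.linregress(time, conc)
    print(f"slope = {result.slope:.3f}, intercept = {result.intercept:.3f}, "
          f"r = {result.rvalue:.3f}, p = {result.pvalue:.3f}")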

        4.3.2.2  Testing for Trends Using Regression Methods

       The limitations and assumptions associated with estimating trends based on linear regression
methods apply also to other regression-based statistical tests for detecting trends. Nonetheless, for situations
in which regression methods can be applied appropriately, there is a solid body of literature on hypothesis
testing using the concepts of statistical linear models as a basis for inferring the existence of temporal trends.
The methodology is complex and beyond the scope of this document.

       For simple linear regression, the statistical test of whether the slope is significantly different from
zero is equivalent to testing if the correlation coefficient is significantly different from zero. Directions for
this test are given in Box 4.3-1 along with an example. This test assumes a linear relation between Y and X
with independent normally distributed errors and constant variance across all X and Y values.  Censored
values (e.g., below the detection limit) and outliers may invalidate the tests.
                      Box 4.3-1:  Directions for the Test for a Correlation Coefficient
                                            and an Example

     Directions

     STEP 1:    Calculate the correlation coefficient r (section 2.2.4).

     STEP 2:    Calculate the t-value   t = r / [(1 - r²)/(n - 2)]^(1/2)

     STEP 3:    Use Table A-1 of Appendix A to find the critical value t_{1-α/2} such that 100(1-α/2)% of the t
                distribution with n - 2 degrees of freedom is below t_{1-α/2}.  For example, if α = 0.10 and n = 17,
                then n - 2 = 15 and t_{1-α/2} = 1.753.  Conclude that the correlation is significantly different from
                zero if |t| > t_{1-α/2}.

     Example:  Consider the following data set (in ppb): for Sample 1, arsenic (X) is 4.0 and lead (Y) is 8.0; for
     Sample 2, arsenic is 3.0 and lead is 7.0; for Sample 3, arsenic is 2.0 and lead is 7.0; and for Sample 4,
     arsenic is 1.0 and lead is 6.0.

     STEP 1:    In section 2.2.4, the correlation coefficient r for this data was calculated to be 0.949.

     STEP 2:    t = 0.949 / [(1 - 0.949²)/(4 - 2)]^(1/2) = 4.26

     STEP 3:    Using Table A-1 of Appendix A, t_{1-α/2} = 2.920 for a 10% level of significance and 4 - 2 = 2
                degrees of freedom.  Since 4.26 > 2.920, there appears to be a significant correlation between the
                two variables lead and arsenic.
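
        The calculation in Box 4.3-1 can be scripted as follows; this is a minimal Python sketch (the function
name is illustrative) that computes r and t for the arsenic/lead example and looks up the two-sided critical
value from the t distribution instead of Table A-1.  Small rounding differences from the hand calculation are
possible.

    import math
    from scipy import stats

    def correlation_t_test(x, y, alpha=0.10):
        n = len(x)
        r = stats.pearsonr(x, y)[0]                   # correlation coefficient (Step 1)
        t = r / math.sqrt((1 - r ** 2) / (n - 2))     # Step 2
        t_crit = stats.t.ppf(1 - alpha / 2, n - 2)    # Step 3 critical value
        return r, t, t_crit

    arsenic = [4.0, 3.0, 2.0, 1.0]
    lead = [8.0, 7.0, 7.0, 6.0]
    r, t, t_crit = correlation_t_test(arsenic, lead)
    print(f"r = {r:.3f}, t = {t:.2f}, critical value = {t_crit:.3f}")
    # t exceeds the critical value 2.920, so the correlation is significant.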
4.3.3   General Trend Estimation Methods

        4.3.3.1  Sen's Slope Estimator

        Sen's slope estimate is a nonparametric alternative for estimating a slope.  This approach involves
computing slopes for all the pairs of ordinal time points and then using the median of these slopes as an
estimate of the overall slope.  As such, it is insensitive to outliers and can handle a moderate number of
values below the detection limit and missing values.  Assume that there are n time points (or n periods of
time), and let Xi denote the data value for the ith time point.  If there are no missing data, there will be n(n-1)/2
possible pairs of time points (i, j) in which i > j.  The slope for such a pair is called a pairwise slope, bij, and is
computed as bij = (Xi - Xj) / (i - j).  Sen's slope estimator is then the median of the n(n-1)/2 pairwise slopes.

        If there is no underlying trend, then a given Xi is as likely to be above another Xj as it is below it.
Hence, if there is no underlying trend, there would be an approximately equal number of positive and negative
slopes, and thus the median would be near zero.  Due to the number of calculations required, Sen's estimator
is rarely calculated by hand and directions are not given in this document.  However, the estimator is
contained in the DQA DataQUEST software package (QA/G-9D, 1996).
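
        Because Sen's estimator is just the median of all pairwise slopes, it is straightforward to script when
there are no missing values.  A minimal Python sketch is given below (the function name is illustrative);
handling of missing data and values below the detection limit would require additional logic.

    import statistics

    def sens_slope(x):
        """Median of the pairwise slopes (x[j] - x[i]) / (j - i) for j > i,
        assuming one observation per time period and no missing values."""
        n = len(x)
        slopes = [(x[j] - x[i]) / (j - i)
                  for i in range(n - 1) for j in range(i + 1, n)]
        return statistics.median(slopes)

    print(sens_slope([5, 6, 11, 8, 10]))   # illustrative data (the measurements of Box 4.3-4)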

        4.3.3.2  Seasonal Kendall Slope Estimator

        If the data exhibit cyclic trends, then Sen's slope estimator can be modified to account for the cycles.
For example, if data are available for each month for a number of years, 12 separate sets of slopes would be
determined (one for each month of the year); similarly, if daily observations exhibit weekly cycles, seven sets
of slopes would be determined, one for each day of the week.  In these estimates, the above pairwise slope is
calculated for each time period and the median of all of the slopes is an estimator of the slope for a long-term
trend.  This is known as the seasonal Kendall slope estimator.  Because of the number of calculations
required, this estimator is rarely calculated by hand so directions are not given in this document.  The
seasonal Kendall slope estimator is contained in the DataQUEST software package (QA/G-9D, 1996).

4.3.4   Hypothesis Tests for Detecting Trends

        Most of the trend tests treated in this section involve the Mann-Kendall test or extensions of it.  The
Mann-Kendall test does not assume any particular distributional form and accommodates trace values or
values below the detection limit by assigning them a common value.  The test can also be modified to deal
with multiple observations per time period and generalized to deal with multiple sampling locations and
seasonality.

        4.3.4.1  One Observation per Time Period for One Sampling Location

        The Mann-Kendall test involves computing a statistic S, which is the difference between the number
of pairwise slopes (as described in 4.3.3.1) that are positive minus the number that are negative.  If S is a
large positive value, then there is evidence of an increasing trend in the data.  If S is a large negative value,
then there is evidence of a decreasing trend in the data.  The null hypothesis or baseline condition for this test
is that there is no temporal trend in the data values, i.e., "H0: no trend".  The alternative condition or
hypothesis will usually be either "HA: upward trend" or "HA: downward trend."

        The basic Mann-Kendall trend test involves listing the observations in temporal order, and
 computing all differences that may be formed between measurements and earlier measurements, as depicted

 EPAQA/G-9                                 4.3 =  3                                       QA96

-------
in Box 4.3-2. The test statistic is the difference between the number of strictly positive differences and the
number of strictly negative differences. If there is an underlying upward trend, then these differences will
tend to be positive and a sufficiently large value of the test statistic will suggest the presence of an upward
trend Differences of zero are not included in the test statistic (and should be avoided, if possible, by
recording data to sufficient accuracy).  The steps for conducting the Mann-Kendall test for small sample sizes
(Le., less than 10) are contained in Box 4.3-3 and an example is contained in Box 4.3-4.

        For sample sizes greater than 10, a normal approximation to the Mann-Kendall test is quite accurate:
Directions for this approximation are contained in Box 4.3-5 and an example is given in Box 4.3-6. Tied
observations (Le., when two or more measurements are equal) degrade  the statistical power and should be
avoided, if possible, by recording the data to sufficient accuracy.
              Box 4.3-2:  "Upper Triangular" Data for Basic Mann-Kendall Trend Test
                              with a Single Measurement at Each Time Point

 Data Table

      Original Time        t1    t2    t3    t4   ...   tn-1    tn     (time from earliest to latest)
      Measurements         X1    X2    X3    X4   ...   Xn-1    Xn     (actual values recorded)

 After performing the subtractions, this table converts to:

                         t2      t3      t4     ...    tn-1      tn          # of +        # of -
                                                                              Differences   Differences
      X1                Y12     Y13     Y14     ...    Y1,n-1    Y1,n
      X2                        Y23     Y24     ...    Y2,n-1    Y2,n
      ...                                       ...
      Xn-1                                                        Yn-1,n
                                                                  Totals:     # > 0         # < 0

      where  Yij = sign(Xj - Xi) =   +   if Xj - Xi > 0
                                     0   if Xj - Xi = 0
                                     -   if Xj - Xi < 0

 NOTE:  Differences of 0 do not contribute to either total and are discarded.
              Box 4.3-3:  Directions for the Mann-Kendall Trend Test for Small Sample Sizes

 If the sample size is less than 10 and there is only one datum per time period, the Mann-Kendall Trend Test for
 small sample sizes may be used.

 STEP 1:   List the data in the order collected over time: X1, X2, ..., Xn, where Xi is the datum at time ti.  Assign a
           value of DL/2 to values reported as below the detection limit (DL).  Construct a "Data Matrix" similar to
           the top half of Box 4.3-2.

 STEP 2:   Compute the sign of all possible differences as shown in the bottom portion of Box 4.3-2.

 STEP 3:   Compute the Mann-Kendall statistic S, which is the number of positive signs minus the number of
           negative signs in the triangular table:  S = (number of + signs) - (number of - signs).

 STEP 4:   Use Table A-11 of Appendix A to determine the probability p using the sample size n and the absolute
           value of the statistic S.  For example, if n = 5 and S = 8, p = 0.042.

 STEP 5:   For testing the null hypothesis of no trend against HA (upward trend), reject H0 if S > 0 and if p < α.
           For testing the null hypothesis of no trend against HA (downward trend), reject H0 if S < 0 and if p < α.
               Box 4.3-4:  An Example of the Mann-Kendall Trend Test for Small Sample Sizes

 Consider 5 measurements ordered by the time of their collection: 5, 6, 11, 8, and 10.  These data will be used to test
 the null hypothesis, H0: no trend, versus the alternative hypothesis HA of an upward trend at an α = 0.05
 significance level.

 STEP 1:   The data listed in order by time are: 5, 6, 11, 8, 10.

 STEP 2:   A triangular table (see Box 4.3-2) was used to construct the possible differences.  The sums of the signs
           of the differences across the rows are shown in the last two columns.

              Time       1     2     3     4     5        No. of +     No. of -
              Data       5     6    11     8    10          Signs        Signs

                         5     +     +     +     +            4            0
                         6           +     +     +            3            0
                        11                 -     -            0            2
                         8                       +            1            0
                                                    Totals:    8            2

 STEP 3:   Using the table above, S = 8 - 2 = 6.

 STEP 4:   From Table A-11 of Appendix A, for n = 5 and S = 6, p = 0.117.

 STEP 5:   Since S > 0 but p = 0.117 is not less than 0.05, the null hypothesis is not rejected.  Therefore, there is not
           enough evidence to conclude that there is an increasing trend in the data.
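
        The sign counting in Steps 2 and 3 can be scripted directly.  The Python sketch below (the function
name is illustrative) reproduces S = 6 for the data in Box 4.3-4; the probability in Step 4 must still be taken
from Table A-11.

    def mann_kendall_s(x):
        """Number of positive differences minus number of negative differences
        over all pairs (i, j) with j later than i; zero differences are discarded."""
        plus = minus = 0
        for i in range(len(x) - 1):
            for j in range(i + 1, len(x)):
                if x[j] > x[i]:
                    plus += 1
                elif x[j] < x[i]:
                    minus += 1
        return plus - minus

    print(mann_kendall_s([5, 6, 11, 8, 10]))   # prints 6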
            Box 4.3-5:  Directions for the Mann-Kendall Procedure Using Normal Approximation

 If the sample size is 10 or more, a normal approximation to the Mann-Kendall procedure may be used.

 STEP 1:   Complete steps 1, 2, and 3 of Box 4.3-3.

 STEP 2:   Calculate the variance of S:   V(S) = n(n-1)(2n+5)/18.

 STEP 3:   If ties occur, let g represent the number of tied groups and wp represent the number of data points in
           the pth group.  The variance of S is then

                V(S) = (1/18) [ n(n-1)(2n+5) - Σ_{p=1..g} wp(wp-1)(2wp+5) ]

 STEP 4:   Calculate  Z = (S - 1)/[V(S)]^(1/2)  if S > 0,   Z = 0  if S = 0,   or   Z = (S + 1)/[V(S)]^(1/2)  if S < 0.

 STEP 5:   Use Table A-1 of Appendix A to find the critical value z_{1-α} such that 100(1-α)% of the normal
           distribution is below z_{1-α}.  For testing the null hypothesis of no trend against HA (upward trend),
           reject H0 if Z > z_{1-α}; for testing against HA (downward trend), reject H0 if Z < 0 and |Z| > z_{1-α}.
            Box 4.3-6:  An Example of the Mann-Kendall Trend Test by Normal Approximation

 Consider 11 measurements ordered by the time of their collection, with four observations tied at the value 10 and
 two observations tied at the value 15.  The data will be used to test the null hypothesis, H0: no trend, versus the
 alternative hypothesis HA of an upward trend at a 5% significance level.

 STEP 1:   A triangular table of the signs of all possible differences was constructed as in Box 4.3-2; the row
           totals give 35 positive signs and 13 negative signs.

 STEP 2:   S = (sum of + signs) - (sum of - signs) = 35 - 13 = 22

 STEP 3:   There are several observations tied at 10 and 15.  Thus, the formula for tied values will be used.  In this
           formula, g = 2, w1 = 4 for the tied values of 10, and w2 = 2 for the tied values of 15.

           V(S) = (1/18) [ 11(11-1)(2(11)+5) - [4(4-1)(2(4)+5) + 2(2-1)(2(2)+5)] ] = 155.33

 STEP 4:   Z = (S - 1)/[V(S)]^(1/2) = 1.605

 STEP 5:   From Table A-1 of Appendix A, z_{0.95} = 1.645.

 STEP 6:   HA (upward trend) is the alternative of interest.  Therefore, since 1.605 is not greater than 1.645, H0 is
           not rejected.  Therefore, there is not enough evidence to determine that there is an upward trend.
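
        A minimal Python sketch of the normal approximation is shown below (the function names are
illustrative).  It computes S, the tie-corrected variance of Box 4.3-5, and Z; the result is compared to the
critical value from Table A-1 (1.645 for a 5% one-sided test).

    import math
    from collections import Counter

    def mann_kendall_s(x):
        return sum((x[j] > x[i]) - (x[j] < x[i])
                   for i in range(len(x) - 1) for j in range(i + 1, len(x)))

    def mann_kendall_z(x):
        n = len(x)
        s = mann_kendall_s(x)
        ties = [c for c in Counter(x).values() if c > 1]    # sizes of the tied groups
        v = (n * (n - 1) * (2 * n + 5)
             - sum(w * (w - 1) * (2 * w + 5) for w in ties)) / 18.0
        if s > 0:
            return (s - 1) / math.sqrt(v)
        if s < 0:
            return (s + 1) / math.sqrt(v)
        return 0.0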
       4.3.4.2  Multiple Observations per Time Period for One Sampling Location

       Often, more than one sample is collected for each time period. There are two ways to deal with
multiple observations per time period. One method is to compute a summary statistic, such as the median,
for each time period and to apply one of the Mann-Kendall trend tests of section 4.3.4.1 to the summary
statistic. Therefore, instead of using the individual data points in the triangular table, the summary statistic
would be used. Then the steps given in Box 4.3-3 and 4.3-5 could be applied to the summary statistics.

        An alternative approach is to consider all the multiple observations within a given time period as
being essentially equal (i.e., tied) values within that period.  The S statistic is computed as before with n
being the total of all observations.  The variance of the S statistic (previously calculated in step 2) is changed
to:

     V(S) = (1/18) [ n(n-1)(2n+5) - Σ_{p=1..g} wp(wp-1)(2wp+5) - Σ_{q=1..h} uq(uq-1)(2uq+5) ]

              + [ Σ_{p=1..g} wp(wp-1)(wp-2) ][ Σ_{q=1..h} uq(uq-1)(uq-2) ] / [ 9n(n-1)(n-2) ]

              + [ Σ_{p=1..g} wp(wp-1) ][ Σ_{q=1..h} uq(uq-1) ] / [ 2n(n-1) ]

where g represents the number of tied groups, wp represents the number of data points in the pth group, h is
the number of time periods which contain multiple data, and uq is the sample size in the qth time period.

        The preceding variance formula assumes that the data are not correlated.  If correlation within single
time periods is suspected, it is preferable to use a summary statistic (e.g., the median) for each time period
and to then apply either Box 4.3-3 or Box 4.3-5 to the summary statistics.

        4.3.4.3  Multiple Sampling Locations with Multiple Observations

        The preceding methods involve a single sampling location (station).  However, environmental data
often consist of sets of data collected at several sampling locations (see Box 4.3-7).  For example, data are
often systematically collected at several fixed sites on a lake or river, or within a region or basin.  The data
collection plan (or experimental design) must be systematic in the sense that approximately the same
sampling times should be used at all locations.  In this situation, it is desirable to express the results by an
overall regional summary statement across all sampling locations.  However, there must be consistency in
behavioral characteristics across sites over time in order for a single summary statement to be valid across all
sampling locations.  A useful plot to assess the consistency requirement is a single time plot (section 2.3.8.1)
of the measurements from all stations where a different symbol is used to represent each station.

        If the stations exhibit approximately steady trends in the same direction (upward or downward), with
comparable slopes, then a single summary statement across stations is valid and this implies two relevant sets
of hypotheses should be investigated:

        Comparability of stations.  H0: Similar dynamics affect all K stations vs. HA: At least two stations
        exhibit different dynamics.

        Testing for overall monotonic trend.  H0*: Contaminant levels do not change over time vs.
        HA*: There is an increasing (or decreasing) trend consistently exhibited across all stations.

Therefore, the analyst must first test for homogeneity of stations, and then, if homogeneity is confirmed, test
for an overall monotonic trend.

        Ideally, the stations in Box 4.3-7 should have equal numbers of observations.  However, the numbers
of observations at the stations can differ slightly because of isolated missing values, but the overall time
periods spanned must be similar.  For fewer than 3 time periods, this guidance recommends an equal number
of observations (a balanced design); for 4 or more time periods, up to 1 missing value per sampling location
may be tolerated.

        a.  One Observation per Time Period.  When only one measurement is taken for each time period
for each station, a generalization of the Mann-Kendall statistic can be used to test the above hypotheses.  This
procedure is described in Box 4.3-8.

        b.  Multiple Observations per Time Period.  If multiple measurements are taken at some times and
stations, then the previous approaches are still applicable.  However, the variance of the statistic Sk must be
calculated using the equation for calculating V(S) given in section 4.3.4.2.  Note that Sk is computed for each
station, so n, wp, g, h, and uq are all station-specific.
                        Box 4.3-7:  Data for Multiple Times and Multiple Stations

             Let i = 1, 2, ..., n represent time, k = 1, 2, ..., K represent sampling locations, and Xik
             represent the measurement at time i for location k.  This data can be summarized in
             matrix form, as shown below.

                                               Stations
                                     1          2         ...        K

                       Time    1    X11        X12        ...        X1K
                               2    X21        X22        ...        X2K
                               .     .          .                     .
                               n    Xn1        Xn2        ...        XnK

                                    S1         S2         ...        SK
                                    V(S1)      V(S2)      ...        V(SK)
                                    Z1         Z2         ...        ZK

             where   Sk    = Mann-Kendall statistic for station k (see STEP 3, Box 4.3-3),
                     V(Sk) = variance for the S statistic for station k (see STEP 2, Box 4.3-5), and
                     Zk    = Sk/[V(Sk)]^(1/2).
            Box 4.3-8:  Testing for Comparability of Stations and an Overall Monotonic Trend

  Let i = 1, 2, ..., n represent time, k = 1, 2, ..., K represent sampling locations, and Xik represent the measurement
  at time i for location k.  Let α represent the significance level for testing homogeneity and α* represent the
  significance level for testing for an overall trend.

  STEP 1:   Calculate the Mann-Kendall statistic Sk and its variance V(Sk) for each of the K stations using the
            methods of section 4.3.4.1, Box 4.3-5.

  STEP 2:   For each of the K stations, calculate  Zk = Sk/[V(Sk)]^(1/2).

  STEP 3:   Calculate the average  Z̄ = (1/K) Σ_{k=1..K} Zk.

  STEP 4:   Calculate the homogeneity chi-square statistic  χ²h = Σ_{k=1..K} Zk² - K Z̄².

  STEP 5:   Using a chi-squared table (Table A-8 of Appendix A), find the critical value for χ² with (K-1) degrees
            of freedom at an α significance level.  For example, for a significance level of 5% and 5 degrees of
            freedom, χ²(5) = 11.07, i.e., 11.07 is the cut point which puts 5% of the probability in the upper tail of
            a chi-square variable with 5 degrees of freedom.

  STEP 6:   If χ²h ≤ χ²(K-1), there are comparable dynamics across stations at significance level α.  Go to Step 7.

            If χ²h > χ²(K-1), the stations are not homogeneous (i.e., different dynamics at different stations) at the
            significance level α.  Therefore, individual α*-level Mann-Kendall tests should be conducted at each
            station using the methods presented in section 4.3.4.1.

  STEP 7:   Using a chi-squared table (Table A-8 of Appendix A), find the critical value for χ² with 1 degree of
            freedom at an α* significance level.  If K Z̄² > χ²(1), reject H0* and conclude that there is a significant
            (upward or downward) monotonic trend across stations; otherwise, conclude that there is not enough
            evidence to determine that there is an overall trend.
-------
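        For analysts who prefer to script these calculations, the following sketch (Python with NumPy and SciPy;
the station data shown are hypothetical) computes the per-station Mann-Kendall statistics, the homogeneity
chi-square of Step 4, and the overall trend statistic of Step 7. It assumes no tied values, so the simple form of
V(S) is used; with ties, the correction terms of Box 4.3-5 would be needed.

    import numpy as np
    from scipy.stats import chi2

    def mann_kendall_s(x):
        """Mann-Kendall S: sum of signs of (later value - earlier value) over all pairs."""
        x = np.asarray(x, dtype=float)
        return sum(np.sign(x[i + 1:] - x[i]).sum() for i in range(len(x) - 1))

    def mk_variance_no_ties(n):
        """V(S) for the no-ties case; with ties, apply the corrections of Box 4.3-5."""
        return n * (n - 1) * (2 * n + 5) / 18.0

    # Hypothetical yearly measurements at K = 3 stations (one value per time period).
    stations = [[5.1, 5.6, 5.4, 6.0, 6.3, 6.8],
                [4.2, 4.9, 5.0, 5.3, 5.9, 6.1],
                [7.0, 7.2, 7.5, 7.4, 8.0, 8.3]]

    z = np.array([mann_kendall_s(x) / np.sqrt(mk_variance_no_ties(len(x))) for x in stations])
    K, z_bar = len(z), z.mean()
    chi_h = (z ** 2).sum() - K * z_bar ** 2          # homogeneity statistic, (K-1) df (Step 4)
    chi_trend = K * z_bar ** 2                       # overall trend statistic, 1 df (Step 7)

    alpha = alpha_star = 0.05
    if chi_h <= chi2.ppf(1 - alpha, K - 1):
        print("Comparable dynamics across stations (Step 6).")
        if chi_trend > chi2.ppf(1 - alpha_star, 1):
            print("Significant overall", "upward" if z_bar > 0 else "downward", "trend (Step 7).")
        else:
            print("No significant overall monotonic trend.")
    else:
        print("Stations are not homogeneous; test each station separately (section 4.3.4.1).")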
        If seasonal cycles are anticipated, then two approaches for testing for trends are the seasonal Kendall
test and Sen's test for trends. The seasonal Kendall test may be used for large sample sizes, and Sen's test for
trends may be used for small sample sizes. If different seasons manifest similar slopes (rates of change) but
possibly different intercepts, then the Mann-Kendall technique of section 4.3.4.3 is applicable, replacing time
by year and replacing station by season.

        The seasonal Kendall test, which is an extension of the Mann-Kendall test, involves calculating the
Mann-Kendall test statistic, S, and its variance separately for each "season" (e.g., month of the year, day of
the week). The sum of the S's and the sum of their variances are then used to form an overall test statistic
that is assumed to be approximately normally distributed for larger sample sizes.
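        The seasonal Kendall computation just described can be sketched as follows (Python with NumPy and
SciPy; the seasonal series are hypothetical, with one row per season and one column per year, and ties are
assumed absent so the simple variance formula applies).

    import numpy as np
    from scipy.stats import norm

    def mk_s_and_var(x):
        """Mann-Kendall S and its no-ties variance for a single season's series."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        s = sum(np.sign(x[i + 1:] - x[i]).sum() for i in range(n - 1))
        return s, n * (n - 1) * (2 * n + 5) / 18.0

    seasons = [[3.2, 3.5, 3.9, 4.1, 4.6],      # hypothetical data: each row is one season
               [5.0, 5.2, 5.1, 5.8, 6.0],      # observed across five successive years
               [2.1, 2.4, 2.2, 2.9, 3.1]]

    s_total = sum(mk_s_and_var(row)[0] for row in seasons)
    var_total = sum(mk_s_and_var(row)[1] for row in seasons)
    z = s_total / np.sqrt(var_total)           # approximately standard normal for larger samples
    print(f"seasonal Kendall Z = {z:.2f}, two-sided p = {2 * (1 - norm.cdf(abs(z))):.3f}")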

        For data at a single site, collected at multiple seasons within multiple years, the techniques of
section 4.3.4.3 can be applied to test for homogeneity of time trends across seasons. The methodology
follows Boxes 4.3-7 and 4.3-8 exactly except that "station" is replaced by "season" and the inferences refer
to seasons.

4.3.5   A Discussion on Tests for Trends

        This section discusses some further considerations for choosing among the many tests for trends. All
of the nonparametric trend tests and estimates use ordinal time (ranks) rather than cardinal time (actual time
values, such as month, day or hour) and this restricts the interpretation of measured trends. All of the Mann-
Kendall (MK) Trend Tests presented are based on certain pairwise differences in measurements at different
time points. The only information about these differences that is used in the MK calculations is their signs
(i.e., whether they are positive or negative); therefore, these tests are generalizations of the sign test.  MK calculations
are relatively easy and simply involve counting the number of cases in which Xi+1 exceeds Xi and the number
of cases in which Xi exceeds Xi+1. Information about the magnitudes of these differences is not used by MK
methods, and this can adversely affect the statistical power when only limited amounts of data are available.

        There are, however, nonparametric methods based on ranks that take such magnitudes into account
and still retain the benefit of robustness to outliers. These procedures can be thought of as replacing the data
by their ranks and then conducting parametric analyses. These include the Wilcoxon rank sum test and its
many generalizations. These methods are more resistant to outliers than parametric methods; a point can be
no more extreme than the smallest or largest value.

        Rank-based methods, which make fuller use of the information in the data than MK methods, are not
as robust with respect to outliers as the sign and MK tests. They are, however, more statistically powerful
than the sign test and MK methods, the Wilcoxon test being a case in point.  If the data are random samples
from normal distributions with equal variances, then the sign test requires approximately 1.225 times as
many observations as the Wilcoxon rank sum test to achieve a given power at a given significance level.  This
kind of tradeoff between power and robustness exemplifies the analyst's evaluation process leading to the
selection of the best statistical procedure for the current situation. Further statistical tests will be developed
in future editions of this guidance.
4.4     OUTLIERS

4.4.1    Background

        Outliers are measurements that are extremely large or small relative to the rest of the data and,
therefore, are suspected of misrepresenting the population from which they were collected.  Outliers may
result from transcription errors, data-coding errors, or measurement system problems such as instrument
breakdown. However, outliers may also represent true extreme values of a distribution (for instance, hot
spots) and indicate more variability in the population than was expected. Not removing true outliers and
removing false outliers both lead to a distortion of estimates of population parameters.

        Statistical outlier tests give the analyst probabilistic evidence that an extreme value (potential outlier)
does not "fit" with the distribution of the remainder of the data and is therefore a statistical outlier. These
tests should only be used to identify data points that require further investigation.  The tests alone cannot
determine whether a statistical outlier should be discarded or corrected within a data set; this decision should
be based on judgmental or scientific grounds.

        There are 5 steps involved in treating extreme values or outliers:

                1. Identify extreme values that may be potential outliers;
                2. Apply statistical test;
                3. Scientifically review statistical outliers and decide on their disposition;
                4. Conduct data analyses with and without statistical outliers; and
                5. Document the entire process.

Potential outliers may be identified through the graphical representations of Chapter 2 (step 1 above).
Graphs such as the box and whisker plot, ranked data plot, normal probability plot, and time plot can all be
used to identify observations that are much larger or smaller than the rest of the data. If potential outliers are
identified, the next step is to apply one of the statistical tests described in the following sections.  Section
4.4.2 provides recommendations on selecting a statistical test for outliers.

        If a data point is found to be an outlier, the analyst may either: 1) correct the data point; 2) discard
the data point from analysis; or 3) use the data point in all analyses.  This decision should be based on
scientific reasoning in addition to the results of the statistical test.  For instance, data points containing
transcription errors should be corrected, whereas data points collected while an instrument was
malfunctioning may be discarded.  One should never discard an outlier based solely on a statistical test.
Instead, the decision to discard an outlier should be based on some scientific or quality assurance basis.
Discarding an outlier from a data set should be done with extreme caution, particularly for environmental data
sets, which often contain legitimate extreme values.  If an outlier is discarded from the data set, all statistical
analysis of the data should be applied to both the full and truncated data sets so that the effect of discarding
observations may be assessed.  If scientific reasoning does not explain the outlier, it should not be discarded
from the data set.

         If any data points are found to be statistical outliers through the use of a statistical test, this
 information will need to be documented along with the analysis of the data set, regardless of whether any data
 points are discarded.  If no data points are discarded, document the identification of any "statistical" outliers
 by documenting the statistical test performed and the possible scientific reasons investigated. If any data
 points are discarded,  document each data point, the statistical test performed, the scientific reason for

discarding each data point, and the effect on the analysis of deleting the data points. This information is
critical for effective peer review.

4.4.2   Selection of a Statistical Test

        There are several statistical tests for determining whether or not one or more observations are
statistical outliers.  Step by step directions for implementing some of these tests are described in sections
4.4.3 through 4.4.6. Section 4.4.7 describes statistical tests for multivariate outliers.
     Sample Size     Test                   Section     Assumes Normality     Multiple Outliers     DataQUEST

     n ≤ 25          Extreme Value Test     4.4.3       Yes                   No/Yes                Yes
     n ≤ 50          Discordance Test       4.4.4       Yes                   No                    Yes
     n ≥ 25          Rosner's Test          4.4.5       Yes                   Yes                   Yes
     n ≥ 50          Walsh's Test           4.4.6       No                    Yes                   Yes

               Table 4.4-1. Recommendations for Selecting a Statistical Test for Outliers


        If the data are normally distributed, this guidance recommends applying Rosner's test (Box 4.4-5)
when the sample size is greater than 25 and the Extreme Value test (Box 4.4-1) when the sample size is less
than 25.  If only one outlier is suspected, then the Discordance test (Box 4.4-3) may be substituted for either
of these tests.  If the data are not normally distributed, or if the data cannot be transformed so that the
transformed data are normally distributed, then the analyst should either apply a nonparametric test (such as
Walsh's test in Box 4.4-7) or consult a statistician.

4.4.3   Extreme Value Test (Dixon's Test)

        Dixon's Extreme Value test can be used to test for statistical outliers when the sample size is less
than or equal to 25. This test considers both extreme values that are much smaller than the rest of the data
(case 1) and extreme values that are much larger than the rest of the data (case 2). This test assumes that the
data without the suspected outlier are normally distributed; therefore, it is necessary to perform a test for
normality on the data without the suspected outlier before applying this test.  If the data are not normally
distributed, either transform the data, apply a different test, or consult a statistician.  Directions for the
Extreme Value test are contained in Box 4.4-1; an example of this test is contained in Box 4.4-2. The
Extreme Value test is contained in the DQA DataQUEST software package (QA/G-9D, 1996).

        This guidance recommends using this test when only one outlier is suspected in the data. If more
than one outlier is suspected, the Extreme Value test may lead to masking where two or more outliers close in
value "hide" one another.  Therefore, if the analyst decides to use the Extreme Value test for multiple outliers,
apply the test to the least extreme value first.
                            Box 4.4-1: Directions for the Extreme Value Test
                                            (Dixon's Test)

    STEP 1:    Let X(1), X(2), ..., X(n) represent the data ordered from smallest to largest.  Check that the data
               without the suspect outlier are normally distributed, using one of the methods of section 4.2.  If
               normality fails, either transform the data or apply a different outlier test.

    STEP 2:    X(1) is a Potential Outlier (case 1): Compute the test statistic C, where

               C = (X(2) - X(1)) / (X(n) - X(1))     for 3 ≤ n ≤ 7,
               C = (X(2) - X(1)) / (X(n-1) - X(1))   for 8 ≤ n ≤ 10,
               C = (X(3) - X(1)) / (X(n-1) - X(1))   for 11 ≤ n ≤ 13,
               C = (X(3) - X(1)) / (X(n-2) - X(1))   for 14 ≤ n ≤ 25.

    STEP 3:    If C exceeds the critical value from Table A-3 of Appendix A for the specified significance level α,
               X(1) is an outlier and should be further investigated.

    STEP 4:    X(n) is a Potential Outlier (case 2): Compute the test statistic C, where

               C = (X(n) - X(n-1)) / (X(n) - X(1))   for 3 ≤ n ≤ 7,
               C = (X(n) - X(n-1)) / (X(n) - X(2))   for 8 ≤ n ≤ 10,
               C = (X(n) - X(n-2)) / (X(n) - X(2))   for 11 ≤ n ≤ 13,
               C = (X(n) - X(n-2)) / (X(n) - X(3))   for 14 ≤ n ≤ 25.

    STEP 5:    If C exceeds the critical value from Table A-3 of Appendix A for the specified significance level α,
               X(n) is an outlier and should be further investigated.


                    Box 4.4-2: An Example of the Extreme Value Test (Dixon's Test)

    STEP 5:    Since C = 0.584 > 0.477 (from Table A-3 of Appendix A with n = 10), there is evidence that X(n) is
               an outlier at a 5% significance level and should be further investigated.
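        A scripted version of the Box 4.4-1 statistic might look like the sketch below (Python; the concentrations
shown are hypothetical). The computed C would still be compared to the critical value in Table A-3 of
Appendix A.

    def dixon_extreme_value(data, suspect="max"):
        """Test statistic C of Box 4.4-1 for 3 <= n <= 25; data need not be pre-sorted."""
        x = sorted(data)
        n = len(x)
        if not 3 <= n <= 25:
            raise ValueError("Dixon's Extreme Value test applies for 3 <= n <= 25")
        if suspect == "min":                       # case 1: X(1) is the potential outlier
            if n <= 7:
                return (x[1] - x[0]) / (x[-1] - x[0])
            if n <= 10:
                return (x[1] - x[0]) / (x[-2] - x[0])
            if n <= 13:
                return (x[2] - x[0]) / (x[-2] - x[0])
            return (x[2] - x[0]) / (x[-3] - x[0])
        # case 2: X(n) is the potential outlier
        if n <= 7:
            return (x[-1] - x[-2]) / (x[-1] - x[0])
        if n <= 10:
            return (x[-1] - x[-2]) / (x[-1] - x[1])
        if n <= 13:
            return (x[-1] - x[-3]) / (x[-1] - x[1])
        return (x[-1] - x[-3]) / (x[-1] - x[2])

    values = [8.2, 8.9, 9.1, 9.4, 9.6, 9.8, 10.1, 10.3, 10.7, 15.2]   # hypothetical ppm data
    print(f"C = {dixon_extreme_value(values, suspect='max'):.3f}")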
4.4.4   Discordance Test

        The Discordance test can be used to test if one extreme value is an outlier. This test considers two
cases: 1) where the extreme value (potential outlier) is the smallest value of the data set, and 2) where the
extreme value (potential outlier) is the largest value of the data set.  The Discordance test assumes that the
data are normally distributed; therefore, it is necessary to perform a test for normality before applying this
test.  If the data are not normally distributed, either transform the data, apply a different test, or consult a
statistician.  Note that the test assumes that the data without the outlier are normally distributed; therefore,
the test for normality should be performed without the suspected outlier. Directions and an example of the
Discordance test are contained in Boxes 4.4-3 and 4.4-4, respectively. Table A-4 of Appendix A contains
critical values for this test for n ≤ 50.
                              Box 4.4-3: Directions for the Discordance Test

     STEP 1:  Let X(1), X(2), ..., X(n) represent the data ordered from smallest to largest.  Check that the data
              without the suspect outlier are normally distributed, using one of the methods of section 4.2.  If
              normality fails, either transform the data or apply a different outlier test.

     STEP 2:  Compute the sample mean, X̄ (section 2.2.2), and the sample standard deviation, s (section 2.2.3).
              If the minimum value X(1) is a suspected outlier, perform Steps 3 and 4.  If the maximum value X(n)
              is a suspected outlier, perform Steps 5 and 6.


                              Box 4.4-4: An Example of the Discordance Test

              Since D > 2.41 (from Table A-4 of Appendix A with n = 10), there is evidence that X(n) is an
              outlier at a 5% significance level and should be further investigated.
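        The Discordance statistic is commonly computed as the distance of the suspect extreme value from the
sample mean in units of the sample standard deviation; the sketch below (Python with NumPy, hypothetical
data) uses that form purely as an illustration of Steps 1 and 2 above. The result would be compared to the
critical value in Table A-4 of Appendix A.

    import numpy as np

    def discordance_statistic(data, suspect="max"):
        """Distance of the suspect extreme value from the mean, in sample-standard-deviation units."""
        x = np.asarray(data, dtype=float)
        mean, s = x.mean(), x.std(ddof=1)      # sample mean and standard deviation (sections 2.2.2-2.2.3)
        return (x.max() - mean) / s if suspect == "max" else (mean - x.min()) / s

    values = [21.9, 22.4, 22.7, 23.1, 23.4, 23.8, 24.0, 24.2, 24.5, 31.6]   # hypothetical data
    print(f"D = {discordance_statistic(values):.2f}")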
4.4.5   Rosner's Test

        A parametric test developed by Rosner can be used to detect up to 10 outliers for sample sizes of 25
or more. This test assumes that the data are normally distributed; therefore, it is necessary to perform a test
for normality before applying this test.  If the data are not normally distributed, either transform the data,
apply a different test, or consult a statistician. Note that the test assumes that the data without the outlier are
normally distributed; therefore, the test for normality may be performed without the suspected outlier.
Directions for Rosner's test are contained in Box 4.4-5 and an example is contained in Box 4.4-6. This test is
also contained in the DQA DataQUEST software package (QA/G-9D, 1996).

        Rosner's test is not as easy to apply as the preceding tests. To apply Rosner's test, first determine an
upper limit r0 on the number of outliers (r0 ≤ 10), then order the r0 extreme values from most extreme to least
extreme. Rosner's test statistic is then based on the sample mean and sample standard deviation computed
without the r = r0 extreme values. If this test statistic is greater than the critical value given in Table A-5 of
Appendix A, there are r0 outliers. Otherwise, the test is performed again without the r = r0 - 1 extreme values.
This process is repeated until either Rosner's test statistic is greater than the critical value or r = 0.
                       Box 4.4-5:  Directions for Rosner's Test for Outliers

     STEP 1:   Let X1, X2, ..., Xn represent the ordered data points. By inspection, identify the maximum
               number of possible outliers, r0. Check that the data are normally distributed, using one of the
               methods of section 4.2.

     STEP 2:   Compute the sample mean, X̄, and the sample standard deviation, s, for all the data. Label
               these values X̄(0) and s(0), respectively. Determine the observation farthest from X̄(0) and label
               this observation y(0). Delete y(0) from the data and compute the sample mean, labeled X̄(1), and
               the sample standard deviation, labeled s(1).  Then determine the observation farthest from X̄(1)
               and label this observation y(1). Delete y(1) and compute X̄(2) and s(2). Continue this process
               until the r0 most extreme values have been eliminated.

               In summary, after the above process the analyst should have
               [X̄(0), s(0), y(0)]; [X̄(1), s(1), y(1)]; ...; [X̄(r0-1), s(r0-1), y(r0-1)],
               where X̄(i) and s(i) are the sample mean and sample standard deviation of the data after the i
               most extreme values have been deleted, and y(i) is the observation farthest from X̄(i).  (Note:
               these quantities assume that the data have been renumbered after each observation is deleted.)

     STEP 3:   To test if there are r outliers in the data, compute  Rr = | y(r-1) - X̄(r-1) | / s(r-1)  and compare
               Rr to λr in Table A-5 of Appendix A.  If Rr ≥ λr, conclude that there are r outliers.

               First, test if there are r0 outliers (compare Rr0 to λr0). If not, test if there are r0 - 1 outliers
               (compare Rr0-1 to λr0-1). Continue until either Rosner's test statistic is greater than the critical
               value or r = 0.
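        The iterative bookkeeping of Box 4.4-5 can be scripted as in the sketch below (Python with NumPy; the
data are hypothetical, and the ordinary sample standard deviation of section 2.2.3 is used for s(i)). Each Rr
would still be compared to λr from Table A-5 of Appendix A, starting with r = r0.

    import numpy as np

    def rosner_statistics(data, r0):
        """Rosner statistics R_r = |y(r-1) - mean(r-1)| / s(r-1) for r = 1, ..., r0 (Box 4.4-5)."""
        x = list(map(float, data))
        triples = []                                   # (mean(i), s(i), y(i)) for i = 0, ..., r0-1
        for _ in range(r0):
            arr = np.asarray(x)
            mean, s = arr.mean(), arr.std(ddof=1)
            y = float(arr[np.argmax(np.abs(arr - mean))])   # observation farthest from the current mean
            triples.append((mean, s, y))
            x.remove(y)                                # delete it before the next pass
        return [(i + 1, abs(y - m) / s) for i, (m, s, y) in enumerate(triples)]

    values = [3.1, 3.4, 3.6, 3.8, 4.0, 4.1, 4.3, 4.4, 4.6, 4.8, 5.0, 5.1, 5.3,     # hypothetical
              5.5, 5.6, 5.8, 6.0, 6.1, 6.3, 6.5, 6.6, 6.8, 7.0, 7.2, 9.9]          # data, n = 25
    for r, R in sorted(rosner_statistics(values, r0=2), reverse=True):
        print(f"R_{r} = {R:.2f}  (compare to lambda_{r} in Table A-5 of Appendix A)")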
                             Box 4.4-6: An Example of Rosner's Test for Outliers

     STEP 1:     Consider the following 32 data points (in ppm) listed in order from smallest to largest: 2.07, 40.55,
                 84.15, 88.41, 98.84, 100.54, 115.37, 121.18, 122.08, 125.84, 129.47, 131.80, 148.08, 163.88,
                 166.77, 171.81, 178.23, 181.64, 183.73, 185.47, 187.64, 188.74, 208.43, 213.29, 223.14,
                 225.12, 232.72, 233.21, 239.87, 251.12, 275.36, and 395.67.

                 A normal probability plot of the data shows that there is no reason to suspect that the data
                 (without the suspect outliers) are not normally distributed. In addition, this graph identified four
                 potential outliers: 2.07, 40.55, 275.36, and 395.67. Therefore, Rosner's test will be applied to see
                 if there are 4 or fewer (r0 = 4) outliers.

     STEP 2:     First, the sample mean and sample standard deviation were computed for the entire data set (X̄(0)
                 and s(0)). Using subtraction, it was found that 395.67 was the farthest data point from X̄(0), so
                 y(0) = 395.67. Then 395.67 was deleted from the data and the sample mean, X̄(1), and the sample
                 standard deviation, s(1), were computed.  Using subtraction, it was found that 2.07 was the farthest
                 value from X̄(1). This value was then dropped from the data and the process was repeated again
                 on 40.55 to yield X̄(2), s(2), and y(2) and X̄(3), s(3), and y(3).  These values are summarized below.

                          i        X̄(i)        s(i)        y(i)
                          0      169.923      75.133      395.67
                          1      162.640      63.872        2.07
                          2      167.993      57.460       40.55
                          3      172.387      53.099      275.36

     STEP 3:     To apply Rosner's test, it is first necessary to test if there are 4 outliers by computing

                          R4 = | y(3) - X̄(3) | / s(3) = | 275.36 - 172.387 | / 53.099 = 1.939

                 and comparing R4 to λ4 in Table A-5 of Appendix A with n = 32. Since R4 = 1.939 ≤ λ4 = 2.89,
                 there are not 4 outliers in the data set. Therefore, it will next be tested if there are 3 outliers by
                 computing

                          R3 = | y(2) - X̄(2) | / s(2) = | 40.55 - 167.993 | / 57.460 = 2.218

                 and comparing R3 to λ3 in Table A-5 with n = 32. Since R3 = 2.218 ≤ λ3 = 2.91, there are not 3
                 outliers in the data set. Therefore, it will next be tested if there are 2 outliers by computing

                          R2 = | y(1) - X̄(1) | / s(1) = | 2.07 - 162.640 | / 63.872 = 2.514

                 and comparing R2 to λ2 in Table A-5 with n = 32. Since R2 = 2.514 ≤ λ2 = 2.92, there are not 2
                 outliers in the data set. Therefore, it will next be tested if there is 1 outlier by computing

                          R1 = | y(0) - X̄(0) | / s(0) = | 395.67 - 169.923 | / 75.133 = 3.005

                 and comparing R1 to λ1 in Table A-5 with n = 32. Since R1 = 3.005 > λ1 = 2.94, there is evidence
                 at a 5% significance level that there is 1 outlier in the data set. Therefore, observation 395.67 is a
                 statistical outlier and should be further investigated.
4.4.6   Walsh's Tests

        Two nonparametric tests were developed by Walsh to detect multiple outliers in a data set.  The first
test by Walsh requires a large sample size: n > 220 for a significance level of α = 0.05 and n > 60 for a
significance level of α = 0.10.  The second test by Walsh assumes the data are symmetric but has no
restrictions on the sample size; however, this test is difficult to compute by hand. Both of these tests may be
used whenever the data are not normally distributed. Directions for the first test by Walsh for large sample
sizes are given in Box 4.4-7. Both of Walsh's tests are contained in the DQA DataQUEST software package
(QA/G-9D, 1996).
                      Box 4.4-7: Directions for Walsh's Test for Large Sample Sizes

     STEP 1:    Let X(1), X(2), ..., X(n) represent the data ordered from smallest to largest.  If n < 60, do not
                apply this test. If 60 ≤ n ≤ 220, then α = 0.10.  If n > 220, then α = 0.05. Identify the number of
                possible outliers, r. Note that r can equal 1.

     STEP 2:    Compute  c = ⌈√(2n)⌉ (the smallest integer greater than or equal to √(2n)),  k = r + c,
                b² = 1/α,  and  a = [1 + b √((c - b²)/(c - 1))] / (c - b² - 1).

     STEP 3:    The r smallest points are outliers (at an α level of significance) if

                X(r) - (1 + a) X(r+1) + a X(k) < 0.

     STEP 4:    The r largest points are outliers (at an α level of significance) if

                X(n+1-r) - (1 + a) X(n-r) + a X(n+1-k) > 0.

     STEP 5:    If both of the inequalities are true, then both small and large outliers are indicated.
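        A scripted form of the Box 4.4-7 procedure is sketched below (Python; the data are hypothetical, and
the constant a is computed from the quantities defined in Step 2 as reconstructed above).

    import math

    def walsh_test(data, r):
        """Walsh's large-sample test (Box 4.4-7): flags the r smallest and r largest values."""
        x = sorted(data)
        n = len(x)
        if n < 60:
            raise ValueError("Walsh's large-sample test is not applied for n < 60")
        alpha = 0.10 if n <= 220 else 0.05
        c = math.ceil(math.sqrt(2 * n))
        k = r + c
        b2 = 1.0 / alpha
        a = (1 + math.sqrt(b2) * math.sqrt((c - b2) / (c - 1))) / (c - b2 - 1)
        small = x[r - 1] - (1 + a) * x[r] + a * x[k - 1] < 0          # Step 3 inequality
        large = x[n - r] - (1 + a) * x[n - r - 1] + a * x[n - k] > 0  # Step 4 inequality
        return alpha, small, large

    values = [100 + 0.5 * i for i in range(70)] + [20.0, 310.0]       # hypothetical data, n = 72
    alpha, small_out, large_out = walsh_test(values, r=1)
    print(f"alpha = {alpha}; small outlier(s): {small_out}; large outlier(s): {large_out}")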
4.4.7  Multivariate Outliers

        Multivariate analysis, such as factor analysis and principal components analysis, involves the
analysis of several variables simultaneously. Outliers in multivariate analysis are then values that are
extreme in relation to one or more variables. As the number of variables increases, identifying
potential outliers using graphical representations becomes more difficult.  In addition, special procedures are
required to test for multivariate outliers.  Details of these procedures are beyond the scope of this guidance.
However, procedures for testing for multivariate outliers are contained in the software package Scout,
developed by the EPA's Environmental Monitoring Systems Laboratory in Las Vegas, Nevada (EMSL-LV),
and in statistical textbooks on multivariate analysis.
4.5    TESTS FOR DISPERSIONS

        Many statistical tests make assumptions about the dispersion (as measured by variance) of data; this
section considers some of the most commonly used statistical tests for variance assumptions. Section 4.5.1
contains the methodology for constructing a confidence interval for a single variance estimate from a sample.
Section 4.5.2 deals with the equality of two variances, a key assumption for the validity of a two-sample
t-test.  Section 4.5.3 describes Bartlett's test and section 4.5.4 describes Levene's test.  These two tests verify
the assumption that two or more variances are equal, a requirement for a standard two-sample t-test, for
example. The analyst should be aware that many statistical tests require only the assumption of approximate
equality and that many of these tests remain valid unless gross inequality in variances is determined.

4.5.1   Confidence Intervals for a Single Variance

        This section discusses confidence intervals for a single variance or standard deviation for analysts
interested in the precision of variance estimates. This information may be necessary for performing a
sensitivity analysis of the statistical test or analysis method. The method described in Box 4.5-1 can be used
to find a two-sided 100(1-α)% confidence interval.  The upper end point of a two-sided 100(1-α)%
confidence interval is a 100(1-α/2)% upper confidence limit, and the lower end point of a two-sided
100(1-α)% confidence interval is a 100(1-α/2)% lower confidence limit. For example, the upper end point
of a 90% confidence interval is a 95% upper confidence limit and the lower end point is a 95% lower
confidence limit.  Since the standard deviation is the square root of the variance, a confidence interval for the
variance can be converted to a confidence interval for the standard deviation by taking the square roots of the
endpoints of the interval.  This confidence interval assumes that the data constitute a random sample from a
normally distributed population and can be highly sensitive to outliers and to departures from normality.

4.5.2   The F-Test for the Equality of Two Variances

        An F-test may be used to test whether the true underlying variances of two populations are equal.
Usually the F-test is employed as a preliminary test, before conducting the two-sample t-test for the equality
of two means. The assumptions underlying the F-test are that the two samples are independent random
samples from two underlying normal populations.  The F-test for equality of variances is highly sensitive to
departures from normality.  Directions for implementing an F-test with an example are given in Box 4.5-2.

4.5.3   Bartlett's Test for the Equality of Two or More Variances

        Bartlett's test is a means of testing whether two or more population variances of normal distributions
are equal.  In the case of only two variances, Bartlett's test is equivalent to the F-test.  Often in practice
unequal variances and non-normality occur together, and Bartlett's test is itself sensitive to departures from
normality. With long-tailed distributions, the test too often rejects equality (homogeneity) of the variances.

        Bartlett's test requires the calculation of the variance for each sample, then calculation of a statistic
associated with the logarithm of these variances. This statistic is compared to tables and, if it exceeds the
tabulated value, the conclusion is that the variances differ as a complete set.  It does not mean that one is
significantly different from the others, nor that one or more are larger (smaller) than the rest.  It simply
implies the variances are unequal as a group. Directions for Bartlett's test are given in Box 4.5-3 and an
example is given in Box 4.5-4.
                     Box 4.5-1: Directions for Constructing Confidence Intervals and
                    Confidence Limits for the Sample Variance and Standard Deviation
                                           with an Example

    Directions: Let X1, X2, ..., Xn represent the n data points.

    STEP 1:  Calculate the sample variance s² (section 2.2.3).

    STEP 2:  For a 100(1-α)% two-sided confidence interval, use Table A-8 of Appendix A to find the cutoffs
             L and U such that L = χ²(α/2) and U = χ²(1-α/2) with (n-1) degrees of freedom (dof), where the
             subscript denotes the probability in the upper tail.

    STEP 3:  A 100(1-α)% confidence interval for the true underlying variance is:  (n-1)s²/L  to  (n-1)s²/U.

             A 100(1-α)% confidence interval for the true standard deviation is:  √[(n-1)s²/L]  to  √[(n-1)s²/U].

    Example: Ten samples were analyzed for lead: 46.4, 46.1, 45.6, 47, 46.1, 45.9, 45.8, 46.9, 45.2, 46 ppb.

    STEP 1:  Using section 2.2.3, s² = 0.286.

    STEP 2:  Using Table A-8 of Appendix A and 9 dof, L = χ²(α/2) = χ²(.025) = 19.02 and U = χ²(1-α/2) = χ²(.975) = 2.70.

    STEP 3:  A 95% confidence interval for the variance is:  (10-1)(0.286)/19.02  to  (10-1)(0.286)/2.70, or 0.14 to 0.95.

             A 95% confidence interval for the standard deviation is:  √0.14 = 0.374  to  √0.95 = 0.975.
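        The interval in Box 4.5-1 can be computed directly with a chi-squared quantile function, as in the
sketch below (Python with NumPy and SciPy; the readings shown are hypothetical).

    import numpy as np
    from scipy.stats import chi2

    def variance_confidence_interval(data, alpha=0.05):
        """Two-sided 100(1-alpha)% confidence interval for the true variance (Box 4.5-1)."""
        x = np.asarray(data, dtype=float)
        n = len(x)
        s2 = x.var(ddof=1)                       # sample variance, section 2.2.3
        L = chi2.ppf(1 - alpha / 2, n - 1)       # cutoff with alpha/2 probability in the upper tail
        U = chi2.ppf(alpha / 2, n - 1)           # cutoff with alpha/2 probability in the lower tail
        return (n - 1) * s2 / L, (n - 1) * s2 / U

    readings = [12.1, 12.4, 11.8, 12.6, 12.0, 12.3, 11.9, 12.2]       # hypothetical ppb data
    lo, hi = variance_confidence_interval(readings)
    print(f"95% CI for the variance: ({lo:.3f}, {hi:.3f})")
    print(f"95% CI for the standard deviation: ({lo ** 0.5:.3f}, {hi ** 0.5:.3f})")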
                        Box 4.5-2: Directions for Calculating an F-Test to Compare
                                  Two Variances with an Example

    Directions: Let X1, X2, ..., Xm represent the m data points from population 1 and Y1, Y2, ..., Yn represent the
    n data points from population 2. To perform an F-test, proceed as follows.

    STEP 1:    Calculate the sample variances sX² (for the X's) and sY² (for the Y's) (section 2.2.3).

    STEP 2:    Calculate the variance ratios FX = sX²/sY² and FY = sY²/sX². Let F equal the larger of these two
               values. If F = FX, then let k = m - 1 and q = n - 1. If F = FY, then let k = n - 1 and q = m - 1.

    STEP 3:    Using Table A-9 of Appendix A of the F distribution, find the cutoff U = f(1-α/2)(k, q).  If F > U,
               conclude that the variances of the two populations are not the same.

    Example: Manganese concentrations were collected from 2 wells. The data are Well X: 50, 73, 244, and
    202 ppm; and Well Y: 272, 171, 32, 250, and 53 ppm. An F-test will be used to determine if the variances of
    the two wells are equal.

    STEP 1:    For Well X, sX² = 9076. For Well Y, sY² = 12125.

    STEP 2:    FX = sX²/sY² = 9076 / 12125 = 0.749.  FY = sY²/sX² = 12125 / 9076 = 1.336.  Since FY > FX,
               F = FY = 1.336, k = 5 - 1 = 4, and q = 4 - 1 = 3.

    STEP 3:    Using Table A-9 of Appendix A of the F distribution with α = 0.05, the cutoff U = f(.975)(4, 3) = 15.1. Since
               1.336 < 15.1, there is no evidence that the variability of the two wells is different.
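        The same F-test can be scripted as below (Python with NumPy and SciPy), here applied to the well
data of Box 4.5-2; the cutoff uses the upper α/2 tail of the F distribution, consistent with the 15.1 value in the
example.

    import numpy as np
    from scipy.stats import f

    def f_test_equal_variances(x, y, alpha=0.05):
        """Two-sided F-test of Box 4.5-2: returns (F, cutoff, unequal-variances flag)."""
        x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
        fx, fy = x.var(ddof=1) / y.var(ddof=1), y.var(ddof=1) / x.var(ddof=1)
        if fx >= fy:
            F, k, q = fx, len(x) - 1, len(y) - 1
        else:
            F, k, q = fy, len(y) - 1, len(x) - 1
        cutoff = f.ppf(1 - alpha / 2, k, q)
        return F, cutoff, F > cutoff

    well_x = [50, 73, 244, 202]                   # manganese (ppm), Box 4.5-2
    well_y = [272, 171, 32, 250, 53]
    F, cutoff, reject = f_test_equal_variances(well_x, well_y)
    print(f"F = {F:.2f}, cutoff = {cutoff:.1f}, evidence of unequal variances: {reject}")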
                                Box 4.5-3: Directions for Bartlett's Test

      Consider k groups with a sample size of ni for each group. Let N represent the total number of samples, i.e.,
      let N = n1 + n2 + ... + nk. For example, consider two wells where 4 samples have been taken from well 1 and
      3 samples have been taken from well 2. In this case, k = 2, n1 = 4, n2 = 3, and N = 4 + 3 = 7.

      STEP 1:    For each of the k groups, calculate the sample variances, si² (section 2.2.3).

      STEP 2:    Compute the pooled variance across groups:  sp² = [ Σ(i=1 to k) (ni - 1) si² ] / (N - k).

      STEP 3:    Compute the test statistic:  TS = (N - k) ln(sp²) - Σ(i=1 to k) (ni - 1) ln(si²),

                 where "ln" stands for natural logarithms.

      STEP 4:    Using a chi-squared table (Table A-8 of Appendix A), find the critical value for χ² with (k-1)
                 degrees of freedom at a predetermined significance level.  For example, for a significance level of
                 5% and 5 degrees of freedom, χ² = 11.1.  If the calculated value (TS) is greater than the
                 tabulated value, conclude that the variances are not equal at that significance level.
                                 Box 4.5-4: An Example of Bartlett's Test

  Manganese concentrations were collected from 6 wells over a 4-month period (sampling dates of January 1,
  February 1, March 1, and April 1). The data are shown in the following table. Before analyzing the data, it is
  important to determine if the variances of the six wells are equal.  Bartlett's test will be used to make this
  determination.

  STEP 1:   For each of the 6 wells, the sample means and variances were calculated. These are shown in the
            bottom rows of the table below (the ni values sum to N = 17).

                               Well 1      Well 2     Well 3      Well 4     Well 5     Well 6
            Observations       50, 73,     46, 77     272, 171,   34, 3940   48, 54     68, 991,
            (ppm)              244, 202               32, 53                            54
            ni                 4           2          4           2          2          3
            Mean               142.25      61.50      132         1987       51.00      371.00
            Variance si²       9076.37     480.49     12455       7628243    17.98      288348

  STEP 2:   sp² = [ Σ(i=1 to k) (ni - 1) si² ] / (N - k)
                = [ (4-1)(9076.37) + (2-1)(480.49) + ... + (3-1)(288348) ] / (17 - 6) = 751837.27

  STEP 3:   TS = (17-6) ln(751837.27) - [ (4-1) ln(9076.37) + (2-1) ln(480.49) + ... + (3-1) ln(288348) ] = 43.16

  STEP 4:   The critical χ² value with 6 - 1 = 5 degrees of freedom at the 5% significance level is 11.1 (from Table A-8
            of Appendix A). Since 43.16 is larger than 11.1, it is concluded that the six variances (s1², ..., s6²) are not
            homogeneous at the 5% significance level.
4.5.4   Levene's Test for the Equality of Two or More Variances

        Levene's test provides an alternative to Bartlett's test for homogeneity of variance (testing for
differences among the dispersions of several groups). Levene's test is less sensitive to departures from
normality than Bartlett's test and has greater power than Bartlett's for non-normal data.  In addition, Levene's
test has power nearly as great as Bartlett's test for normally distributed data. However, Levene's test is more
difficult to apply than Bartlett's test since it involves applying an analysis of variance (ANOVA) to the
absolute deviations from the group means. Directions and an example of Levene's test are contained in Box
4.5-5 and Box 4.5-6, respectively.
                                Box 4.5-5: Directions for Levene's Test

  Consider k groups with a sample size of ni for the ith group. Let N represent the total number of samples, i.e., let
  N = n1 + n2 + ... + nk. For example, consider two wells where 4 samples have been taken from well 1 and 3
  samples have been taken from well 2. In this case, k = 2, n1 = 4, n2 = 3, and N = 4 + 3 = 7.

  STEP 1:  For each of the k groups, calculate the group mean, X̄i (section 2.2.2), i.e., calculate
           X̄i = (1/ni) Σ(j=1 to ni) Xij, where Xij represents the jth value of the ith group.

  STEP 2:  Compute the absolute residuals  zij = | Xij - X̄i |.  For each of the k groups, calculate the mean, z̄i,
           of these residuals, i.e., calculate  z̄i = (1/ni) Σ(j=1 to ni) zij.  Also calculate the overall mean
           residual  z̄ = (1/N) Σ(i=1 to k) ni z̄i.

  STEP 3:  Compute the sums of squares for the residuals:
           SS(TOTAL)  = Σ(i=1 to k) Σ(j=1 to ni) zij²  -  N z̄²,
           SS(GROUPS) = Σ(i=1 to k) ni z̄i²  -  N z̄²,  and
           SS(ERROR)  = SS(TOTAL) - SS(GROUPS).

  STEP 4:  Compute  f = [ SS(GROUPS) / (k-1) ] / [ SS(ERROR) / (N-k) ].

  STEP 5:  Using Table A-9 of Appendix A, find the critical value of the F distribution with (k-1) and (N-k)
           degrees of freedom at the desired significance level α. If f exceeds this critical value, conclude that the
           group variances are not equal at that significance level.
                                 Box 4.5-6: An Example of Levene's Test

  Four months of data on arsenic concentration were collected from six wells at a Superfund site. This data set is
  shown in the table below. Before analyzing this data, it is important to determine if the variances of the six wells are
  equal.  Levene's test will be used to make this determination.

  STEP 1:  The group mean for each well (X̄i) is shown in the last row of the table below.

                                    Arsenic Concentration (ppm)
           Month       Well 1      Well 2      Well 3      Well 4      Well 5      Well 6
             1          22.90        2.00         2.0        7.84       24.90        0.34
             2           3.09        1.25       109.4        9.30        1.30        4.78
             3          35.70        7.80         4.5       25.90        0.75        2.85
             4           4.18       52.00         2.5        2.00       27.00        1.20
           Group
           Means      X̄1=16.47    X̄2=15.76    X̄3=29.6    X̄4=11.26    X̄5=13.49    X̄6=2.29

  STEP 2:  To compute the absolute residuals zij in each well, the value 16.47 will be subtracted from the Well 1 data,
           15.76 from the Well 2 data, 29.6 from the Well 3 data, 11.26 from the Well 4 data, 13.49 from the Well 5
           data, and 2.29 from the Well 6 data. The resulting values are shown in the following table with the new
           well means (z̄i) and the total mean z̄.

                               Residual Arsenic Concentration (ppm)
           Month       Well 1      Well 2      Well 3      Well 4      Well 5      Well 6
             1           6.43       13.76        27.6        3.42       11.41        1.95
             2          13.38       14.51        79.8        1.96       12.19        2.49
             3          19.23        7.98        25.1       14.64       12.74        0.56
             4          12.29       36.24        27.1        9.26       13.51        1.09
           Residual
           Means      z̄1=12.83    z̄2=18.12    z̄3=39.9    z̄4=7.32     z̄5=12.46    z̄6=1.52

           Total residual mean  z̄ = (1/6)(12.83 + 18.12 + 39.9 + 7.32 + 12.46 + 1.52) = 15.36

  STEP 3:  The sums of squares are: SS(TOTAL) = 6300.89, SS(GROUPS) = 3522.90, and SS(ERROR) = 2777.99.

  STEP 4:  f = [ SS(GROUPS) / (k-1) ] / [ SS(ERROR) / (N-k) ] = [ 3522.90 / (6-1) ] / [ 2777.99 / (24-6) ] = 4.57

  STEP 5:  Using Table A-9 of Appendix A, the critical value of the F distribution for 5 and 18 degrees of freedom with
           α = 0.05 is 2.77.  Since f = 4.57 exceeds 2.77, the assumption of equal variances should be rejected.
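        SciPy provides this test directly; the sketch below applies it to the arsenic data of Box 4.5-6. The option
center='mean' matches Box 4.5-5, which uses absolute deviations from the group means (SciPy's default,
center='median', is the Brown-Forsythe variant).

    from scipy.stats import levene, f

    wells = [[22.90, 3.09, 35.70, 4.18],          # arsenic (ppm) by well, from Box 4.5-6
             [2.00, 1.25, 7.80, 52.00],
             [2.0, 109.4, 4.5, 2.5],
             [7.84, 9.30, 25.90, 2.00],
             [24.90, 1.30, 0.75, 27.00],
             [0.34, 4.78, 2.85, 1.20]]

    stat, p_value = levene(*wells, center='mean')
    k, N = len(wells), sum(len(w) for w in wells)
    cutoff = f.ppf(0.95, k - 1, N - k)             # F critical value for (k-1, N-k) degrees of freedom
    print(f"Levene f = {stat:.2f}, F cutoff = {cutoff:.2f}, p = {p_value:.3f}")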
4.6    TRANSFORMATIONS

        Most statistical tests and procedures contain assumptions about the data to which they will be
applied.  For example, some common assumptions are that the data are normally distributed; variance
components of a statistical model are additive; two independent data sets have equal variance; and a data set
has no trends over time or space.  If the data do not satisfy such assumptions, then the results of a statistical
procedure or test may be biased or incorrect.  Fortunately, data that do not satisfy statistical assumptions may
often be converted or transformed mathematically into a form that allows standard statistical tests to perform
adequately.

4.6.1   Types of Data Transformations

       Any mathematical function that is applied to every point in a data set is called a transformation.
Some commonly used transformations include:

        Logarithmic (Log X or Ln X): This transformation may be used when the original measurement data
        follow a lognormal distribution or when the variance at each level of the data is proportional to the
        square of the mean of the data points at that level.  For example, if the variance of data collected
        around 50 ppm is approximately 250, but the variance of data collected around 100 ppm is
        approximately 1000, then a logarithmic transformation may be useful. This situation is often
        characterized by having a constant coefficient of variation (ratio of standard deviation to mean) over
        all possible data values.

        The logarithmic base (for example, either natural or base 10) needs to be consistent throughout the
        analysis. If some of the original values are zero, it is customary to add a small quantity to make the
        data value non-zero as the logarithm of zero does not exist.  The size of the small quantity depends
        on the magnitude of the non-zero data and the consequences of potentially erroneous inference from
        the resulting transformed data. As a working point, a value of one tenth the smallest non-zero value
        could be selected. It does not matter whether a natural (ln) or base 10 (log) transformation is used
        because the two transformations are related by the expression ln(X) = 2.303 log(X). Directions for
        applying a logarithmic transformation with an example are given in Box 4.6-1.

        Square Root (√X): This transformation may be used when dealing with small whole numbers, such
        as bacteriological counts, or the occurrence of rare events, such as violations of a standard over the
        course of a year. The underlying assumption is that the original data follow a Poisson-like
        distribution in which case the mean and variance of the data are equal.  It should be noted that the
        square root transformation overcorrects when very small values and zeros appear in the original data.
        In these cases, √(X+1) is often used as a transformation.

        Inverse Sine (Arcsine X): This transformation may be used for binomial proportions based on
       count data to achieve stability in variance. The resulting transformed data are expressed in radians
       (angular degrees).  Special tables must be used to transform the proportions into degrees.

        Box-Cox Transformations:  This transformation is a complex power transformation that takes the
        original data and raises each data observation to the power lambda (λ). A logarithmic transformation
        is a special case of the Box-Cox transformation.  The rationale is to find λ such that the transformed
        data have the best possible additive model for the variance structure, the errors are normally
        distributed, and the variance is as constant as possible over all possible concentration values. The
        Maximum Likelihood technique is used to find λ such that the residual error from fitting the
        theorized model is minimized. In practice, the exact value of λ is often rounded to a convenient value
        for ease in interpretation (for example, λ = -1.1 would be rounded to -1 as it would then have the
        interpretation of a reciprocal transform). One of the drawbacks of the Box-Cox transformation is the
        difficulty in physically interpreting the transformed data.
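        The common transformations above are one-line operations in most statistical software; the sketch
below (Python with NumPy and SciPy, using the positive-valued data of Box 4.6-1) illustrates the logarithmic,
square-root, and Box-Cox transformations, with λ estimated by maximum likelihood.

    import numpy as np
    from scipy import stats

    x = np.array([0.22, 3.48, 6.67, 2.53, 1.11, 0.33, 1.64, 1.37,
                  0.47, 0.67, 0.75, 0.60, 0.99, 0.90, 0.26])      # data of Box 4.6-1

    log_x = np.log(x)                  # natural-log transformation (all values here are > 0)
    sqrt_x = np.sqrt(x)                # square-root transformation for count-like data
    bc_x, lam = stats.boxcox(x)        # Box-Cox transformation, lambda fit by maximum likelihood

    print(f"estimated Box-Cox lambda = {lam:.2f} (often rounded to a convenient value)")
    print(f"skewness: original {stats.skew(x):.2f}, log {stats.skew(log_x):.2f}, Box-Cox {stats.skew(bc_x):.2f}")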

4.6.2   Reasons for Data Transformations

       By transforming the data, assumptions that are not satisfied in the original data can be satisfied by
the transformed data. For instance, a right-skewed distribution can be transformed to be approximately
Gaussian (normal) by using a logarithmic or square-root transformation. Then the normal-theory procedures
can be applied to the transformed data. If data are lognormally distributed, then apply procedures to the
logarithms of the data. However, selecting the correct transformation may be difficult.  If standard
transformations do not apply, it is suggested that the data user consult a statistician.

       Another important use of transformations is in the interpretation of data collected under conditions
leading to an Analysis of Variance (ANOVA). Some of the key assumptions needed for analysis (for
example, additivity of variance components) may only be satisfied if the data are transformed suitably. The
selection of a suitable transformation depends on the structure of the data collection design; however, the
interpretation of the transformed data remains an issue.

       While transformations are useful for dealing with data that do not satisfy statistical assumptions,
they can also be used for various other purposes. For example, transformations are useful for consolidating
data that may be spread out or that have several extreme values. In addition, transformations can be used to
derive a linear relationship between two variables, so that linear regression analysis can be applied.  They can
also be used to efficiently estimate quantities such as the mean and variance of a lognormal distribution.
Transformations may also make the analysis of data easier by changing the scale into one that is more
familiar or easier to work with.

        Once the data have been transformed, all statistical analysis must be performed on the transformed
data.  No attempt should be made to transform the results back to the original form, because simply
back-transforming statistics computed on the transformed scale can lead to biased estimates.
                       Box 4.6-1: Directions for Transforming Data and an Example

    Let X1, X2, ..., Xn represent the n data points. To apply a transformation, simply apply the transforming
    function to each data point.  When a transformation is implemented to make the data satisfy some statistical
    assumption, it will need to be verified that the transformed data satisfy that assumption.

    Example: Transforming Lognormal Data

    A logarithmic transformation is particularly useful for pollution data.  Pollution data are often skewed, thus the
    log-transformed data will tend to be symmetric. Consider the data set shown below with 15 data points. The
    frequency plot of this data (below) shows that the data are possibly lognormally distributed.  If any analysis
    performed with this data assumes normality, then the data may be logarithmically transformed to achieve
    normality. The transformed data are shown in column 2. A frequency plot of the transformed data (below)
    shows that the transformed data appear to be normally distributed.

             Observed X     Transformed ln(X)          Observed X     Transformed ln(X)
                0.22             -1.51                     0.47             -0.76
                3.48              1.25                     0.67             -0.40
                6.67              1.90                     0.75             -0.29
                2.53              0.93                     0.60             -0.51
                1.11              0.10                     0.99             -0.01
                0.33             -1.11                     0.90             -0.11
                1.64              0.50                     0.26             -1.35
                1.37              0.31




    [Frequency plot of the observed values and frequency plot of the transformed (ln) values.]
4.7     VALUES BELOW DETECTION LIMITS

        Data generated from chemical analysis may fall below the detection limit (DL) of the analytical
procedure. These measurement data are generally described as not detected, or nondetects (rather than as
zero or not present), and the appropriate limit of detection is usually reported.  In cases where measurement
data are described as not detected, the concentration of the chemical is unknown, although it lies somewhere
between zero and the detection limit.  Data that include both detected and non-detected results are called
censored data in the statistical literature.

        There are a variety of ways to evaluate data that include values below the detection limit.  However,
there are no general procedures that are applicable in all cases. Some general guidelines are presented in
Table 4.7-1. Although these guidelines are usually adequate, they should be implemented cautiously.
     Percentage of Nondetects     Section     Statistical Analysis Method

     < 15%                        4.7.1       Replace nondetects with DL/2, DL, or a
                                              very small number.

     15% - 50%                    4.7.2       Trimmed mean, Cohen's adjustment,
                                              Winsorized mean and standard deviation.

     > 50% - 90%                  4.7.3       Use tests for proportions (section 3.2.2).

                       Table 4.7-1. Guidelines for Analyzing Data with Nondetects

       All of the suggested procedures for analyzing data with nondetects depend on the amount of data
below the detection limit.  For relatively small amounts of data below the detection limit, replacing the nondetects
with a small number and proceeding with the usual analysis may be satisfactory. For moderate amounts of
data below the detection limit, a more detailed adjustment is appropriate. In situations where relatively large
amounts of data below the detection limit exist, one may need only to consider whether or not the chemical was
detected above some level.  The interpretation of small, moderate, and large amounts of data below
the DL is subjective. Table 4.7-1 provides percentages to assist the user in evaluating their particular
situation. However, it should be recognized that these percentages are not hard and fast rules, but should be
based on judgment.

        In addition to the percentage of samples below the detection limit, sample size influences which
procedures should be used to evaluate the data. For example, the case where 1 sample out of 4 is not detected
should be treated differently from the case where 25 samples out of 100 are not detected.  Therefore, this
guidance suggests that the data analyst consult a statistician for the most appropriate way to evaluate data
containing values below the detection level.
4.7.1    Less than 15% Nondetects - Substitution Methods
       If a small proportion of the observations are not detected, these may be replaced with a small
number, usually the detection limit divided by 2 (DL/2), and the usual analysis performed. As a guideline, if
15% or fewer of the values are not detected, replace them with the method detection limit divided by two and
proceed with the appropriate analysis using these modified values.  If simple substitution of values below the
detection limit is proposed when more than 15% of the values are reported as not detected, consider using
nonparametric methods or a test of proportions to analyze the data.  If a more accurate method is to be
considered, see Cohen's Method (section 4.7.2.1).

4.7.2  Between 15% - 50% Nondetects

       4.7.2.1  Cohen's Method

       Cohen's method provides adjusted estimates of the sample mean and standard deviation that account
for data below the detection level. The adjusted estimates are based on the statistical technique of maximum
likelihood estimation of the mean and variance, and they account for the fact that the nondetects are below the
limit of detection but may not be zero. The adjusted mean and standard deviation can then be used in
the parametric tests described in Chapter 3 (e.g., the one sample t-test of section 3.2.1.1). However, if more
than 50% of the observations are not detected, Cohen's method should not be used.  In addition, this method
requires that the data without the nondetects be normally distributed and that the detection limit is always the
same. Directions for Cohen's method are contained in Box 4.7-1; an example is given in Box 4.7-2.
                               Box 4.7-1: Directions for Cohen's Method

     Let X1, X2, ..., Xn represent the n data points with the first m values representing the data points above the
     detection limit (DL). Thus, there are (n - m) data points below the DL.

     STEP 1:    Compute the sample mean X̄d from the data above the detection limit:
                X̄d = (1/m) Σ(i=1 to m) Xi.

     STEP 2:    Compute the sample variance sd² from the data above the detection limit:
                sd² = [ Σ(i=1 to m) (Xi - X̄d)² ] / (m - 1).

     STEP 3:    Compute  h = (n - m)/n  and  γ = sd² / (X̄d - DL)².

     STEP 4:    Use h and γ in Table A-10 of Appendix A to determine λ̂. For example, if h = 0.4 and γ = 0.30,
                then λ̂ = 0.6713. If the exact values of h and γ do not appear in the table, use double linear
                interpolation (Box 4.7-3) to estimate λ̂.

     STEP 5:    Estimate the corrected sample mean, X̄, and sample variance, s², to account for the data below
                the detection limit, as follows:  X̄ = X̄d - λ̂(X̄d - DL)  and  s² = sd² + λ̂(X̄d - DL)².
                               Box 4.7-2: An Example of Cohen's Method

      Sulfate concentrations were measured for 24 data points.  The detection limit was 1,450 mg/L and 3 of the 24
      values were below the detection level. The 24 values are 1850, 1760, < 1450 (ND), 1710, 1575, 1475, 1780,
      1790, 1780, < 1450 (ND), 1790, 1800, < 1450 (ND), 1800, 1840, 1820, 1860, 1780, 1760, 1800, 1900,
      1770, 1790, 1780 mg/L.  Cohen's Method will be used to adjust the sample mean for use in a t-test to
      determine if the mean is greater than 1600 mg/L.

      STEP 1:   The sample mean of the m = 21 values above the detection level is  X̄d = 1771.9.

      STEP 2:   The sample variance of the 21 quantified values is  sd² = 8593.69.

      STEP 3:   h = (24 - 21)/24 = 0.125   and   γ = 8593.69 / (1771.9 - 1450)² = 0.083.

      STEP 4:   Table A-10 of Appendix A was used for h = 0.125 and γ = 0.083 to find the value of λ̂. Since the
                table does not contain these entries exactly, double linear interpolation was used to estimate
                λ̂ = 0.14986 (see Box 4.7-3).

      STEP 5:   The corrected sample mean and standard deviation are then estimated as follows:

                X̄ = 1771.9 - 0.14986(1771.9 - 1450) = 1723.66   and

                s² = 8593.69 + 0.14986(1771.9 - 1450)² = 24122.12.
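        The arithmetic of Steps 1 through 5, given λ̂ from Table A-10, can be scripted as in the sketch below
(Python, using the 21 quantified sulfate values and the λ̂ of Box 4.7-2); it reproduces the adjusted mean and
variance shown above.

    def cohen_adjustment(detected, dl, lam_hat):
        """Adjusted mean and variance per Box 4.7-1, given lambda-hat from Table A-10."""
        m = len(detected)
        mean_d = sum(detected) / m
        var_d = sum((v - mean_d) ** 2 for v in detected) / (m - 1)
        return mean_d - lam_hat * (mean_d - dl), var_d + lam_hat * (mean_d - dl) ** 2

    detected = [1850, 1760, 1710, 1575, 1475, 1780, 1790, 1780, 1790, 1800, 1800,
                1840, 1820, 1860, 1780, 1760, 1800, 1900, 1770, 1790, 1780]    # quantified values (mg/L)
    mean_adj, var_adj = cohen_adjustment(detected, dl=1450, lam_hat=0.14986)
    print(f"adjusted mean = {mean_adj:.2f} mg/L, adjusted variance = {var_adj:.2f}")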
                                  Box 4.7-3: Double Linear Interpolation

    The details of the double linear interpolation are provided to assist in the use of Table A-10 of Appendix A.
    The desired value of λ̂ corresponds to γ = 0.083 and h = 0.125 from Box 4.7-2, Step 3.  The values from
    Table A-10 for interpolation are:

                     γ           h = 0.10            h = 0.15
                     0.05        0.11431             0.17935
                     0.10        0.11804             0.18479

    There are 0.05 units between 0.10 and 0.15 on the h-scale and 0.025 units between 0.10 and 0.125.
    Therefore, the value of interest lies (0.025/0.05)100% = 50% of the distance along the interval between 0.10
    and 0.15.  To linearly interpolate between tabulated values on the h axis for γ = 0.05, the range between the
    values must be calculated, 0.17935 - 0.11431 = 0.06504; the value that is 50% of the distance along the
    range must be computed, 0.06504 x 0.50 = 0.03252; and then that value must be added to the lower point
    on the tabulated values, 0.11431 + 0.03252 = 0.14683.  Similarly for γ = 0.10, 0.18479 - 0.11804 = 0.06675,
    0.06675 x 0.50 = 0.033375, and 0.11804 + 0.033375 = 0.151415.

    On the γ-axis there are 0.033 units between 0.05 and 0.083 and there are 0.05 units between 0.05 and 0.10.
    The value of interest (0.083) lies (0.033/0.05 x 100)% = 66% of the distance along the interval between 0.05
    and 0.10, so 0.151415 - 0.14683 = 0.004585 and 0.004585 x 0.66 = 0.0030261. Therefore,

                                   λ̂ = 0.14683 + 0.0030261 = 0.14986.
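        The interpolation in Box 4.7-3 is ordinary bilinear interpolation and can be scripted as below (Python,
using the four tabulated values quoted in the box); it returns the λ̂ = 0.14986 used in Box 4.7-2.

    def double_linear_interpolation(h, gamma, h_lo, h_hi, g_lo, g_hi, table):
        """Bilinear interpolation between four tabulated lambda-hat values keyed by (gamma, h)."""
        frac_h = (h - h_lo) / (h_hi - h_lo)
        at_g_lo = table[(g_lo, h_lo)] + frac_h * (table[(g_lo, h_hi)] - table[(g_lo, h_lo)])
        at_g_hi = table[(g_hi, h_lo)] + frac_h * (table[(g_hi, h_hi)] - table[(g_hi, h_lo)])
        frac_g = (gamma - g_lo) / (g_hi - g_lo)
        return at_g_lo + frac_g * (at_g_hi - at_g_lo)

    # Tabulated values quoted in Box 4.7-3 (from Table A-10 of Appendix A).
    table = {(0.05, 0.10): 0.11431, (0.05, 0.15): 0.17935,
             (0.10, 0.10): 0.11804, (0.10, 0.15): 0.18479}
    lam_hat = double_linear_interpolation(0.125, 0.083, 0.10, 0.15, 0.05, 0.10, table)
    print(f"lambda-hat = {lam_hat:.5f}")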
        4.7.2.2 Trimmed Mean

        Trimming discards the data in the tails of a data set in order to develop an unbiased estimate of the
population mean. For environmental data, nondetects usually occur in the left tail of the data, so trimming the
data can be used to adjust the data set to account for nondetects when estimating a mean. Developing a
100p% trimmed mean involves trimming p% of the data in both the lower and the upper tail.  Note that p
must be between 0 and .5 since p represents the portion deleted in both the upper and the lower tail.  After np
of the largest values and np of the smallest values are trimmed, there are n(1-2p) data values remaining.
Therefore, the proportion trimmed is dependent on the total sample size (n) since a reasonable number of
samples must remain for analysis.  For approximately symmetric distributions, a 25% trimmed mean (the
midmean) is a good estimator of the population mean. However, environmental data are often skewed (non-
symmetric) and in these cases a 15% trimmed mean may be a good estimator of the population
mean. It is also possible to trim the data only to replace the nondetects. For example, if 3% of the data are
below the detection limit, a 3% trimmed mean could be used to estimate the population mean.  Directions for
developing a trimmed mean are contained in Box 4.7-4 and an example is given in Box 4.7-5. A trimmed
variance is rarely calculated and is of limited use.
                         Box 4.7-4: Directions for Developing a Trimmed Mean

     Let X1, X2, ..., Xn represent the n data points. To develop a 100p% trimmed mean (0 < p < 0.5):

     STEP 1:    Let t represent the integer part of the product np. For example, if p = .25 and n = 17,
                np = (.25)(17) = 4.25, so t = 4.

     STEP 2:    Delete the t smallest values of the data set and the t largest values of the data set.

     STEP 3:    Compute the arithmetic mean of the remaining n - 2t values:
                X̄ = [ 1/(n - 2t) ] Σ(i=t+1 to n-t) X(i).

                This value is the estimate of the population mean.
                              Box 4.7-5: An Example of the Trimmed Mean

     Sulfate concentrations were measured for 24 data points. The detection limit was 1,450 mg/L and 3 of the 24
     values were below this limit. The 24 values listed in order from smallest to largest are: < 1450 (ND), < 1450
     (ND), < 1450 (ND), 1475, 1575, 1710, 1760, 1760, 1770, 1780, 1780, 1780, 1780, 1790, 1790, 1790, 1800,
     1800, 1800, 1820, 1840, 1850, 1860, 1900 mg/L.  A 15% trimmed mean will be used to develop an estimate
     of the population mean that accounts for the 3 nondetects.

     STEP 1:  Since np = (24)(.15) = 3.6, t = 3.

     STEP 2:  The 3 smallest values of the data set and the 3 largest values of the data set were deleted. The
              new data set is: 1475, 1575, 1710, 1760, 1760, 1770, 1780, 1780, 1780, 1780, 1790, 1790,
              1790, 1800, 1800, 1800, 1820, 1840 mg/L.

     STEP 3:  Compute the arithmetic mean of the remaining n - 2t values:
              X̄ = [ 1/(24 - 6) ] (1475 + ... + 1840) = 1755.56.
              Therefore, the 15% trimmed mean is 1755.56 mg/L, which is an estimate of the population mean.
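        The trimmed mean of Boxes 4.7-4 and 4.7-5 can be computed as in the sketch below (Python with
NumPy). The nondetects are represented by the detection limit so that they sort into the lower tail and are
trimmed away; any placeholder below the smallest quantified value would give the same result here.

    import numpy as np

    def trimmed_mean(data, p):
        """100p% trimmed mean of Box 4.7-4: drop t = int(n*p) values from each tail."""
        x = np.sort(np.asarray(data, dtype=float))
        t = int(len(x) * p)
        return x[t:len(x) - t].mean()

    # Sulfate data of Box 4.7-5 (mg/L); the 3 nondetects are entered as the detection limit, 1450.
    data = [1450, 1450, 1450, 1475, 1575, 1710, 1760, 1760, 1770, 1780, 1780, 1780,
            1780, 1790, 1790, 1790, 1800, 1800, 1800, 1820, 1840, 1850, 1860, 1900]
    print(f"15% trimmed mean = {trimmed_mean(data, 0.15):.2f} mg/L")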
       4.7.2.3 Winsorized Mean and Standard Deviation

       Winsorizing replaces data in the tails of a data set with the next most extreme data value. For
environmental data, nondetects usually occur in the left tail of the data. Therefore, winsorizing can be used to
adjust the data set to account for nondetects. The mean and standard deviation can then be computed on the
new data set. Directions for winsorizing data (and revising the sample size) are contained in Box 4.7-6 and
an example is given in Box 4.7-7.
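
    Because the directions in Box 4.7-6 are cut off by the page break below, the following Python sketch
    illustrates only the replacement step of symmetric winsorization described above; the data are hypothetical,
    and the full procedure in the box also revises the sample size, as noted above.

        # Minimal sketch of symmetric winsorization: the h smallest values (here, the nondetects)
        # are replaced by the next smallest value, and the h largest values by the next largest value.
        import statistics

        def winsorize(data, h):
            """Replace the h smallest and h largest values with their nearest remaining neighbors."""
            x = sorted(data)
            n = len(x)
            if not 0 < h < n // 2:
                raise ValueError("h must be positive and less than n/2")
            low, high = x[h], x[n - h - 1]
            return [min(max(v, low), high) for v in x]

        # Hypothetical data with two nondetects entered at the detection limit of 5.
        values = [5, 5, 8, 9, 11, 12, 13, 14, 20, 42]
        w = winsorize(values, 2)
        print(w)                                    # [8, 8, 8, 9, 11, 12, 13, 14, 14, 14]
        print(statistics.mean(w), statistics.stdev(w))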
                           Box 4.7-6: Directions for Developing a Winsorized
                                    Mean and Standard Deviation

     Let X1, X2, . . . , Xn represent the n data points and m represent the number of data points above the detection
     limit (DL), and hence n-m below the DL.

     STEP 1:  List the data in order from smallest to largest, including nondetects. Label these points
              X(1), X(2), . . ., X(n) so that X(1) is the smallest.
-------
4.7.3   Greater than 50% Nondetects - Test of Proportions

        If more than 50% of the data are below the detection limit but at least 10% of the observations are
quantified, tests of proportions may be used to test hypotheses using the data. Thus, if the parameter of
interest is a mean, consider switching the parameter of interest to some percentile greater than the percent of
data below the detection limit. For example, if 67% of the data are below the DL, consider switching the
parameter of interest to the 75th percentile. Then the method described in section 3.2.2 can be applied to test the
hypothesis concerning the 75th percentile.  It is important to note that the tests of proportions may not be
applicable for composite samples.  In this case, the data analyst should consult a statistician before
proceeding with analysis.

        If very few quantified values are found, a method based on the Poisson distribution may be used as
an alternative approach. However, with a large proportion of nondetects in the data, the data analyst should
consult with a statistician before proceeding with analysis.
 EPAQA/G-9                                  4.7-6                                       QA96

-------
                                         CHAPTER 5
                 STEP 5: DRAW CONCLUSIONS FROM THE DATA

                       THE DATA QUALITY ASSESSMENT PROCESS

              Review DQOs and Sampling Design
              Conduct Preliminary Data Review
                 Select the Statistical Test
                  Verify the Assumptions
              Draw Conclusions From the Data

                          DRAW CONCLUSIONS FROM THE DATA

              Conduct the hypothesis test and interpret the results
              in the context of the data user's objectives.

              o  Perform the Statistical Hypothesis Test
              o  Draw Study Conclusions
              o  Evaluate Performance of the Sampling Design
              o  Discuss issues in hypothesis testing related to understanding
                 and communicating the test results
                             Step 5: Draw Conclusions from the Data

         o  Perform the calculations for the statistical hypothesis test.
            o   Perform the calculations and document them clearly.
            o   If anomalies or outliers are present in the data set, perform the calculations with and
                without the questionable data.

         o  Evaluate the statistical test results and draw conclusions.
            o   If the null hypothesis is rejected, then draw the conclusions and document the analysis.
            o   If the null hypothesis is not rejected, verify whether the tolerable limits on false negative
                decision errors have been satisfied. If so, draw conclusions and document the analysis; if
                not, determine corrective actions, if any.
            o   Interpret the results of the test.

         o  Evaluate the performance of the sampling design if the design is to be used again.
            o   Evaluate the statistical power of the design over the full range of parameter values;
                consult a statistician as necessary.
EPA QA/G-9
                                    FY95

-------
                    STEP 5: DRAW CONCLUSIONS FROM THE DATA


                                                                                 Page
5.1     OVERVIEW AND ACTIVITIES	.. 5.1-1
       5.1.1  Perform the Statistical Hypothesis Test		:.....	5.1-1
       5.1.2  Draw Study Conclusions	„	5.1-1
       5.1.3  Evaluate Performance of the Sampling Design	5.1-2
5.2     INTERPRETING AND COMMUNICATING THE TEST RESULTS  	5.2-1
       5.2.1  Interpretation of p-Values	5.2-1
       5.2.2  "Accepting" vs. "Failing to Reject" the Null Hypothesis	5.2-1
       5.2.3  Statistical Significance vs. Practical Significance	5.2-2
       5.2.4  Impact of Bias on Test Results	5.2-2
       5.2.5  Quantity vs. Quality of Data	:	5.2-5
       5.2.6  "Proof of Safety" vs. "Proof of Hazard"	5.2-6
                                   LIST OF FIGURES

Figure No.                                                                         Page
5.2-1. Illustration of Unbiased versus Biased Power Curves	5.2-5
                                    LIST OF BOXES

Box No.                                                                           Page
Box 5.1-1: Checking Adequacy of Sample Size for a One-Sample t-Test	5.1-3
Box 5.1-2: Example of Power Calculations for the One-Sample Test of a Single Proportion 	5.1-3
Box 5.2-1: Example of a Comparison of Two Variances which is Statistically
       but not Practically Significant	,	5.2 - 3
Box 5.2-2: Example of a Comparison of Two Biases	5.2-4
 EPAQA/G-9                                                                       FY95

-------
                                         CHAPTER 5
                 STEP 5:  DRAW CONCLUSIONS FROM THE DATA

5.1    OVERVIEW AND ACTIVITIES

       In this final step of the DQA Process, the analyst performs the statistical hypothesis test and draws
conclusions that address the data user's objectives. This step represents the culmination of the planning,
implementation, and assessment phases of the data operations. The data user's planning objectives will have
been reviewed (or developed retrospectively) and the sampling design examined in Step 1. Reports on the
implementation of the sampling scheme will have been reviewed and a preliminary picture of the sampling
results developed in Step 2. In light of the information gained in Step 2, the statistical test will have been
selected in Step 3. To ensure that the chosen statistical methods are valid, the key underlying assumptions of
the statistical test will have been verified in Step 4. Consequently, all of the activities conducted up to this
point should ensure that the calculations performed on the data set and the conclusions drawn here in Step 5
address the data user's needs in a scientifically defensible manner.  This chapter describes the main activities
that should be conducted during this step.  The actual procedures for implementing some commonly used
statistical tests are described in Step 3, Select the Statistical Test.

5.1.1  Perform the Statistical Hypothesis Test

       The goal of this activity is to conduct the statistical hypothesis test. Step-by-step directions for
several commonly used statistical tests are described in Chapter 3. The calculations for the test should be
clearly documented and easily verifiable. In addition, the documentation of the results of the test should be
understandable so that the results can be communicated effectively to those who may hold a stake in the
resulting decision. If computer software is used to perform the calculations, ensure that the procedures are
adequately documented, particularly if algorithms have been developed and coded specifically for the project.

       The analyst should always exercise best professional judgment when performing the calculations.
For instance, if outliers or anomalies are present in the data set, the calculations should be performed both
with and without the questionable data to see what effect they may have on the results.

5.1.2  Draw Study Conclusions

       The goal of this activity is to translate the results of the statistical hypothesis test so that the data
user may draw a conclusion from the data. The results of the statistical hypothesis test will be either:

    (a) reject the null hypothesis, in which case the analyst is concerned about a possible false positive
        decision error, or

    (b) fail to reject the null hypothesis, in which case the analyst is concerned about a possible false
        negative decision error.

       In case (a), the data have provided the evidence needed to reject the null hypothesis, so the decision
can be made with sufficient confidence and without further analysis. This is because the statistical test, based
on the classical hypothesis testing philosophy (which is the approach described in prior chapters), inherently
controls the false positive decision error rate within the data user's tolerable limits, provided that the
underlying assumptions of the test have been verified correctly.


EPAQA/G-9                                 5.1 = 1                                        QA96

-------
        In case (b), the data do not provide sufficient evidence to reject the null hypothesis, and the data must
be analyzed further to determine whether the data user's tolerable limits on false negative decision errors have
been satisfied. One of two possible conditions may prevail:

        (1)     The data do not support rejecting the null hypothesis and the false negative decision error
                limits were satisfied.  In this case, the conclusion is drawn in favor of the null hypothesis,
                since the probability of committing a false negative decision error is believed to be
                sufficiently small in the context of the current study (see section 5.2).

        (2)     The data do not support rejecting the null hypothesis, and the false negative decision error
                limits were not satisfied. In this case, the statistical test was not powerful enough to satisfy
                the data user's performance criteria. The data user may choose to tolerate a higher false
                negative decision error rate than previously specified and draw the conclusion in favor of the
                null hypothesis, or instead take some form of corrective action, such as obtaining additional
                data before drawing a conclusion and making a decision.

When the test fails to reject the null hypothesis, the most thorough procedure for verifying whether the false
negative decision error limits have been satisfied is to compute the estimated power of the statistical test,
using the variability observed in the data. Computing the power of the statistical test across the full range of
possible parameter values can be complicated and usually requires specialized software. Power calculations
are also necessary for evaluating the performance of a sampling design.  Thus, power calculations will be
discussed further in section 5.1.3.

        A simpler method can be used for checking the performance of the statistical test. Using an estimate
of variance obtained from the actual data or an upper 95% confidence limit on the variance, the sample size required
to satisfy the data user's objectives can be calculated retrospectively. If this theoretical sample size is less
than or equal to the number of samples actually taken, then the test is sufficiently powerful. If the required
number of samples is greater than the number actually collected, then additional samples would be required to
satisfy the data user's performance criteria for the statistical test. An example of this method is contained in
Box 5.1-1. The equations required to perform these calculations have been provided in the detailed step-by-
step instructions for each hypothesis test procedure in Chapter 3.

5.1.3   Evaluate Performance of the Sampling Design

        If the sampling design is to be used again, either in a later phase of the current study or in a similar
study, the analyst will be interested in evaluating the overall performance of the design. To evaluate the
sampling design, the analyst performs a statistical power analysis that describes the estimated power of the
statistical test over the range of possible parameter values. The power of a statistical test is the probability of
rejecting the null hypothesis when the null hypothesis is false. The estimated power is computed for all
parameter values under the alternative hypothesis to create a power curve. A power analysis helps the analyst
evaluate the adequacy of the sampling design when the true parameter value lies in the vicinity of the action
level (which may not have been the outcome of the current study). In this manner, the analyst may determine
how well a statistical test performed and compare this performance with that of other tests.

        The calculations required to perform a power analysis can be relatively complicated, depending on
the complexity of the sampling design and statistical test selected. Box 5.1-2 illustrates power calculations
for a test of a single proportion, which is one of the simpler cases. A further discussion of power curves
(performance curves) is contained in the Guidance for Data Quality Objectives (EPA QA/G-4, 1994).
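
        As a concrete illustration of a power analysis, the Python sketch below evaluates the power of a
one-sided one-sample t-test over a range of true mean values using the noncentral t distribution. The scenario
(action level, standard deviation, sample size, and significance level) is hypothetical and is not taken from an
example in this guidance.

    # Hypothetical power curve for a one-sided one-sample t-test of H0: mu <= mu0.
    from scipy import stats

    mu0, sigma, n, alpha = 100.0, 10.0, 9, 0.05      # all values hypothetical
    df = n - 1
    t_crit = stats.t.ppf(1 - alpha, df)

    for mu_true in range(100, 121, 5):
        nc = (mu_true - mu0) * n**0.5 / sigma        # noncentrality parameter
        power = stats.nct.sf(t_crit, df, nc)         # P(reject H0 | true mean = mu_true)
        print(f"true mean {mu_true}: power = {power:.2f}")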

 EPAQA/G-9                                  5.1-2                                        QA96

-------
                         Box 5.1-1: Checking Adequacy of Sample Size for a One-
                               Sample t-Test for Simple Random Sampling

 DQOs specified that the test should limit the false positive error rate to 5% and the false negative error rate to 20% if the
 true mean were 105 ppm. A random sample of size n = 9 had sample mean x̄ = 99.38 ppm and standard deviation
 s = 10.41 ppm.  The null hypothesis was not rejected. Assuming that the true value of the standard deviation was
 equal to its sample estimate 10.41 ppm, it was found that a sample size of 9 would be required, which validated the
 sample size of 9 which had actually been used.

 The distribution of the sample standard deviation is skewed with a long right tail. It follows that the chances are
 greater than 50% that the sample standard deviation will underestimate the true standard deviation. In such a case
 it makes sense to build in some conservatism, for example, by using an upper 90% confidence limit for σ in step 5 of
 Box 3.3-1. Using Boxes 4.6-1 and 4.6-2 and n - 1 = 8 degrees of freedom, it is found that U = 3.49, so that an
 upper 90% confidence limit for the true standard deviation is

                                   10.41 √(8/3.49) = 15.76.

  Using this value for σ in Step 5 of Box 3.3-1 or Box 3.3-2 leads to the sample size estimate of 17.  Hence, a sample
  size of at least 17 should be used to be 90% sure of achieving the DQOs. Since it is generally desirable to avoid the
  need for additional sampling, it is advisable to conservatively estimate sample size in the first place. In cases where
  DQOs depend on a variance estimate, this conservatism is achieved by intentionally overestimating the variance.
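
  A short Python sketch can reproduce the arithmetic in Box 5.1-1. It uses the common normal-approximation
  sample-size formula for a one-sample t-test, n ≈ s²(z(1-α) + z(1-β))²/δ² + (z(1-α))²/2, and the chi-square based
  upper confidence limit on the standard deviation. Because the statement of the action level was lost in this
  copy of the box, the minimum detectable difference δ = 10 ppm is an assumption; it is the value that
  reproduces the sample sizes of 9 and 17 quoted above.

      # Retrospective check of sample-size adequacy (Box 5.1-1); delta = 10 ppm is assumed.
      import math
      from scipy import stats

      s, n, alpha, beta, delta = 10.41, 9, 0.05, 0.20, 10.0

      def required_n(sigma, alpha, beta, delta):
          """Normal-approximation sample size for a one-sample t-test."""
          za, zb = stats.norm.ppf(1 - alpha), stats.norm.ppf(1 - beta)
          return math.ceil(sigma**2 * (za + zb)**2 / delta**2 + 0.5 * za**2)

      print(required_n(s, alpha, beta, delta))        # 9: the 9 samples taken were adequate

      # Conservative version: upper 90% confidence limit on the true standard deviation.
      u = stats.chi2.ppf(0.10, n - 1)                 # 3.49 for 8 degrees of freedom
      s_upper = s * math.sqrt((n - 1) / u)            # about 15.76 ppm
      print(round(s_upper, 2), required_n(s_upper, alpha, beta, delta))   # 15.76 and 17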
          Box 5.1-2: Example of Power Calculations for the One-Sample Test of a Single Proportion

   This box illustrates power calculations for the test of H0: P ≥ .20 vs. HA: P < .20, with a false positive error rate of 5%
   when P = .20, presented in Boxes 3.3-9 and 3.3-10.  The power of the test will be calculated assuming P1 = .15 and
   before any data are available. Since nP1 and n(1-P1) both exceed 4, the sample size is large enough for the normal
   approximation, and the test can be carried out as in steps 3 and 4 of Box 3.3-9.

   STEP 1:  Determine the general conditions for rejection of the null hypothesis. In this case, the null hypothesis is
            rejected if the sample proportion is sufficiently smaller than P0.  (Clearly, a sample proportion above P0
            cannot cast doubt on H0.)  By steps 3 and 4 of Boxes 3.3-9 and 3.3-10, H0 is rejected if

                      (p + .5/n - P0) / √(P0 Q0 / n)  <  -z(1-α).

            Here p is the sample proportion, Q0 = 1 - P0, n is the sample size, and z(1-α) is the critical value such that
            100(1-α)% of the standard normal distribution is below z(1-α). This inequality is true if

                      p + .5/n  <  P0 - z(1-α) √(P0 Q0 / n).

   STEP 2:  Determine the specific conditions for rejection of the null hypothesis if P1 (= 1 - Q1) is the true value of the
            proportion P. The same operations as are used in step 3 of Box 3.3-9 are performed on both sides of
            the above inequality. However, P0 is replaced by P1 since it is assumed that P1 is the true proportion.
            These operations make the normal approximation applicable. Hence, rejection occurs if

               (p + .5/n - P1) / √(P1 Q1 / n)  <  [.20 - .15 - 1.645 √((.20)(.80)/85)] / √((.15)(.85)/85)  =  -0.55.

   STEP 3:  Find the probability of rejection if P1 is the true proportion. By the same reasoning that led to the test in
            steps 3 and 4 of Boxes 3.3-9 and 3.3-10, the quantity on the left-hand side of the above inequality is a
            standard normal variable. Hence the power at P1 = .15 (i.e., the probability of rejection of H0 when .15 is
            the true proportion) is the probability that a standard normal variable is less than -0.55. In this case, the
            probability is approximately 0.3 (using the last line from Table A-1 of Appendix A), which is fairly small.
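
  The power calculation in Box 5.1-2 can be reproduced numerically. The Python sketch below follows the
  normal approximation described in the box with n = 85 samples (the 0.5/n continuity correction cancels out
  of the standardized rejection boundary).

      # Power of the one-sample proportion test in Box 5.1-2: H0: P >= 0.20 vs. HA: P < 0.20.
      import math
      from scipy import stats

      p0, p1, n, alpha = 0.20, 0.15, 85, 0.05
      z = stats.norm.ppf(1 - alpha)

      # Standardized rejection boundary when the true proportion is p1 (STEP 2 of the box).
      boundary = (p0 - p1 - z * math.sqrt(p0 * (1 - p0) / n)) / math.sqrt(p1 * (1 - p1) / n)
      power = stats.norm.cdf(boundary)                # STEP 3: P(standard normal < boundary)
      print(round(boundary, 2), round(power, 2))      # about -0.55 and 0.29 (roughly 0.3)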
EPA QA/G-9
                                     5.1-3
QA96

-------
5.2     INTERPRETING AND COMMUNICATING THE TEST RESULTS

        Sometimes difficulties may arise in interpreting or explaining the results of a statistical test. One
reason for such difficulties may stem from inconsistencies in terminology; another may be due to a lack of
understanding of some of the basic notions underlying hypothesis tests.  As an example, in explaining the
results to a data user, an analyst may use different terminology than that appearing in this guidance.  For
instance, rather than saying that the null hypothesis was or was not rejected, analysts may report the result of
a test by saying that their computer output shows a p-value of 0.12. What does this mean? Similar problems
of interpretation may occur when the data user attempts to understand the practical significance of the test
results or to explain the test results to others. The following paragraphs touch on some of the philosophical
issues related to hypothesis testing which may help in understanding and communicating the test results.

5.2.1   Interpretation of p-Values

        The classical approach for performing hypothesis tests is to prespecify the significance level of the
test, i.e., the Type I decision error rate α. This rate is used to define the decision rule associated with the
hypothesis test. For instance, in testing whether the population mean μ exceeds a threshold level (e.g., 100
ppm), the test statistic may depend on x̄, an estimate of μ. Obtaining an estimate x̄ that is greater than 100
ppm may occur simply by chance even if the true mean μ is less than or equal to 100; however, if x̄ is "much
larger" than 100 ppm, then there is only a small chance that the null hypothesis H0 (μ ≤ 100 ppm) is true.
Hence the decision rule might take the form "reject H0 if x̄ exceeds 100 + C", where C is a positive quantity
that depends on α (and on the variability of x̄).  If this condition is met, then the result of the statistical test is
reported as "reject H0"; otherwise, the result is reported as "do not reject H0." (See Box 3.3-2 for an example
of a t-test.)

        An alternative way of reporting the result of a statistical test is to report its p-value, which is defined
as the probability, assuming the null hypothesis to be true, of observing a test result at least as extreme as
that found in the sample. Many statistical software packages report p-values, rather than adopting the
classical approach of using a prespecified Type I error rate.  In the above example, for instance, the p-value
would be the probability of observing a sample mean as large as x̄ (or larger) if in fact the true mean was
equal to 100 ppm. Obviously, in making a decision based on the p-value, one should reject H0 when p is
small and not reject it if p is large. Thus the relationship between p-values and the classical hypothesis
testing approach is that one rejects H0 if the p-value associated with the test result is less than α. If the data
user had chosen the Type I error rate as 0.05 a priori and the analyst reported a p-value of 0.12, then the data
user would report the result as "do not reject the null hypothesis;" if the p-value had been reported as 0.03,
then that person would report the result as "reject the null hypothesis."  An advantage of reporting p-values is
that they provide a measure of the strength of evidence for or against the null hypothesis, which allows data
users to establish their own Type I error rates. The significance level can be interpreted as that p-value (α)
that divides "do not reject H0" from "reject H0."
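
        As a simple numerical illustration of the relationship between a p-value and a prespecified α, the
Python sketch below computes the one-sided p-value for a one-sample t-test of H0: μ ≤ 100 ppm; the data
summary (sample size, mean, and standard deviation) is hypothetical.

    # Hypothetical one-sample t-test of H0: mu <= 100 ppm vs. HA: mu > 100 ppm.
    from scipy import stats

    n, xbar, s, mu0 = 16, 104.0, 8.0, 100.0
    t_stat = (xbar - mu0) / (s / n**0.5)
    p_value = stats.t.sf(t_stat, df=n - 1)    # one-sided p-value
    print(round(t_stat, 2), round(p_value, 3))
    # If alpha was set to 0.05 a priori, H0 is rejected only when p_value < alpha.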

5.2.2   "Accepting" vs. "Failing to Reject" the Null Hypothesis

        As noted in the paragraphs above, the classical approach to hypothesis testing results in one of two
conclusions: "reject H0" (called a significant result) or "do not reject H0" (a nonsignificant result). In the
latter case one might be tempted to equate "do not reject H0" with "accept H0." This terminology is not
recommended, however, because of the philosophy underlying the classical testing procedure. This
philosophy places the burden of proof on the alternative hypothesis, that is, the null hypothesis is rejected
only if the evidence furnished by the data convinces us that the alternative hypothesis is the more likely state

EPAQA/G-9                                 5.2-1                                        QA96

-------
of nature. If a nonsignificant result is obtained, it provides evidence that the null hypothesis could
sufficiently account for the observed data, but it does not imply that the hypothesis is the only hypothesis that
could be supported by the data.  In other words, a highly nonsignificant result (e.g., a p-value of 0.80) may
indicate that the null hypothesis provides a reasonable model for explaining the data, but it does not
necessarily imply that the null hypothesis is true. It may, for example, simply indicate that the sample size
was not large enough to establish convincingly that the alternative hypothesis was more likely. When the
phrase "accept H0" is encountered, it must be considered as "accepted with the preceding caveats."

5.2.3   Statistical Significance vs. Practical Significance

        There is an important distinction between these two concepts. Statistical significance simply refers
to the result of the hypothesis test: Was the null hypothesis rejected? The likelihood of achieving a
statistically significant result depends on the true value of the population parameter being tested (for
example, μ), how much that value deviates from the value hypothesized under the null hypothesis (for
example, μ0), and on the sample size. This dependence on (μ - μ0) is portrayed by the power curve associated
with the test (section 5.1.3). With a steep power curve (for example, one based on a large sample size), one
may achieve a statistically significant result even when the actual difference, from a practical
standpoint, may be inconsequential.  Or, in the case of the slowly increasing power curve, one may not find a
significant result even though a "large" difference between μ and μ0 exists. Neither of these situations is
desirable: in the former case, there has been an excess of resources expended, whereas in the latter case, a
Type II error is likely to have occurred.

        But how large a difference between the parameter and the null value is of real importance?  This
relates to the concept of practical significance. Ideally, this question is asked and answered as part of the
DQO process during the planning phase of the study. Knowing the magnitude of the difference that is
regarded as being of practical significance is important during the design stage because this allows one, to the
extent that prior information permits, to determine a sampling plan of type and size that will make the
magnitude of that difference commensurate with a difference that can be detected with high probability.
From a purely statistical design perspective, this can be considered to be the main purpose of the DQO process.
With such planning, the likelihood of encountering either of the undesirable situations mentioned in the prior
paragraph can be reduced. Box 5.2-1 contains an example of a statistically significant but fairly
inconsequential difference.
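
        The two calculations in Box 5.2-1 (the F-test comparing the QC check-sample variances and the
comparison of the total standard deviations) can be reproduced with the Python sketch below, using only the
summary statistics quoted in that box.

    # Box 5.2-1: a statistically significant but practically insignificant difference in precision.
    import math
    from scipy import stats

    s1, s2, n1, n2 = 4.0, 8.0, 13, 13            # QC check-sample standard deviations (ppb)
    f_ratio = max(s1, s2)**2 / min(s1, s2)**2    # 8^2 / 4^2 = 4
    f_crit = stats.f.ppf(0.975, n2 - 1, n1 - 1)  # 5% two-sided test puts 2.5% in the upper tail
    print(f_ratio, round(f_crit, 2))             # 4.0 > 3.28, so the variances differ significantly

    # Practical significance: fold in the temporal variability of true ozone (SD of about 100 ppb).
    process_sd = 100.0
    for meas_sd in (4.0, 8.0):
        total_sd = math.sqrt(process_sd**2 + meas_sd**2)
        print(meas_sd, round(total_sd, 2))       # 100.08 and 100.32 ppb: a negligible difference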

5.2.4   Impact of Bias on Test Results

        Bias is defined as the difference between the expected value of a statistic and a population parameter.
It is relevant when the statistic of interest (e.g., a sample average x̄) is to be used as an estimate of the
parameter (e.g., the population mean μ). For example, the population parameter of interest may be the
average concentration of dioxin within the given bounds of a hazardous waste site, and the statistic might be
the sample average as obtained from a random sample of points within those bounds. The expected value of
a statistic can be interpreted as supposing one repeatedly implemented the particular sampling design a very
large number of times and calculated the statistic of interest in each case. The average of the statistics
EPAQA/G-9                                   5.2-2                                         QA96

-------
                          Box 5.2-1: Example of a Comparison of Two Variances
                            which is Statistically but not Practically Significant

  The quality control (QC) program associated with a measurement system provides important information on
  performance and also yields data which should be taken into account in some statistical analyses. The QC program
  should include QC check samples, i.e., samples of known composition and concentration which are run at regular
  frequencies. The term precision refers to the consistency of a measurement method in repeated applications under
  fixed conditions. Precision is usually equated with a standard deviation. For many purposes, the appropriate
  standard deviation is one which results from applying the system to the same sample over a long period of time.

  This example concerns two methods for measuring ozone in ambient air, an approved method and a new
  candidate method. Both methods are used once per week for three months. Based on 13
  analyses with each method of the mid-range QC check sample at 100 ppb, the null hypothesis of the equality of the
  two variances will be tested with a false positive error rate of 5% or less. (If the variances are equal, then the
  standard deviations are equal.) Method 1 had a sample mean of 80 ppb and a standard deviation of 4 ppb.
  Method 2 had a mean of 90 ppb and a standard deviation of 8 ppb. The Shapiro-Wilk test did not reject the
  assumption of normality for either method. Applying the F-test of Box 4.5-2, the F ratio is 8²/4² = 4. Using 12
  degrees of freedom for both the numerator and denominator, the F ratio must exceed 3.28 in order to reject the
  hypothesis of equal variances (Table A-9 of Appendix A). Since 4 > 3.28, the hypothesis of equal variances is
  rejected, and it is concluded that method 1 is significantly more precise than method 2.

  In an industrialized urban environment, the true ozone levels at a fixed location and time of day are known to vary
  over a period of months with a coefficient of variation of at least 100%. This means that the ratio of the standard
  deviation (SD) to the mean at a given location is at least 1. For a mean of 100 ppb, the standard deviation over time
  for true ozone values at the location would be at least 100 ppb. Relative to this degree of variability, a difference
  between measurement error standard deviations of 4 or 8 ppb is negligible. The overall variance, incorporating the
  true process variability and measurement error, is obtained by adding the individual variances. For instance, if the
  measurement error standard deviation is 8 ppb, then the total variance is (100 ppb)(100 ppb) + (8 ppb)(8 ppb).
  Taking the square root of the variance gives a corresponding total standard deviation of 100.32 ppb. For a
  measurement error standard deviation of 4 ppb, the total standard deviation would be 100.08 ppb.  From a practical
  standpoint, the difference in precision between the two methods is insignificant for the given application, despite the
  finding that there is a statistically significant difference between the variances of the two methods.
values would then be regarded as its expected value.  Let E denote the expected value of x̄ and denote the
relationship between the expected value and the parameter, μ, as E = μ + b, where b is the bias.  For instance,
if the bias occurred due to incomplete recovery of an analyte (and no adjustment is made), then
b = (R-100)μ/100, where R denotes the percent recovery.  Bias may also occur for other reasons, such as lack
of coverage of the entire target population (e.g., if only the drums within a storage site that are easily
accessible are eligible for inclusion in the sample, then inferences to the entire group of drums may be
biased).  Moreover, in cases of incomplete coverage, the magnitude and direction of the bias may be
unknown.  An example involving comparison of the biases of two measurement methods is contained in
Box 5.2-2.

        In the context of hypothesis testing, the impact of bias can be quite severe in some circumstances.
This can be illustrated by comparing the power curve of a test when bias is not present with a power curve for
the same test when bias is present.  The basic influence of bias is to shift the former "no bias" curve to the
right or left, depending on the direction of the bias.  If the bias is constant, then the second curve will be an
exact translation of the former curve; if not, there will be a change in the shape of the second curve in addition
to the translation.  If the existence of the bias is unknown, then the former power curve will be regarded as the
curve that determines the properties of the test when in fact the second curve will be the one that actually
represents the test's power. For example, in Figure 5.2-1, when the true value of the parameter is 120, the "no
bias" power is 0.72 but the true power (the biased power) is only 0.14, a substantial difference.

EPAQA/G-9                                  5.2-3                                         QA96

-------
                           BO8&2-2: Example of a Comparison of Two Biases

  This example is a continuation of the ozone measurement comparison described in Box 5.2-1. Lai x and % denote
  the sample mean and standard deviation of measurement method 1 applied to the QC check sample, and let? and
  SY denote the sample mean and standard deviation of method 2. Then x = 80 ppb, a* = 4 ppb, Y = 90 ppb and @y -
  8 ppb. The estimated biases arex-T = 80 -100 »-20 ppb for method 1, and Y-T - 00-100 = 10 ppb for method
  2, since 100 ppb ia the true value T. That is, method 1 seems to underestimate by 20 ppb, and method 2 seams to
  underestimate by 10 ppb.  Let u, and u, be the underlying mean concentrations for measurement methods 1 and 2
  applied to the QC check sample. These means correspond to the average results which would obtain by applying
  each method a large number of times to the QC check sample, over a long period of time.

  A two-sample West (Boxes 3.3-1 and 3.3-3) can be used to test for a significant Difference between these two
  biases. In this case, a two-taled test of the nui hypothesis H,:  MI -PiaO against the alternative HA: u,-u,«>0f8
  appropriate, because there is no a priori reason fin advance of data coBection) to suspect that one measurement
  method is superior to the other. (In general, hypotheses should not be tailored to data.) Note that the difference
  between the two biases is the same as the difference (u, - uj between the too underlying means of the  .
  measurement methods.  The test wi  be done to Emit the false positive error rate to 5% if the too means are equal

  STEP1:   x = 80 ppb, % = 4 ppb, Y = 90 ppb, 8y » 8 ppb.

  STEP 2:   From Box 5.2-1, it is known that the methods have significantly tifferent variances, so that
            Sattherthwaite's t-test should be used. Therefore,
                               *NE
m    n
                                                      13    13
  STEP 3:  / =


2 2
m n
2

4 S4
* -f. •
m\m - 1) «2 (w - 1)
                                                      £*£
                                                      13     13
                                                   132 12     132  12
                                 17.65.
            Rounding down to the nearest integer gives f = 17. For a too-tailed test the critical valu® ia
            « WOB = tm ~ 2.110. from Table A-1 of AppenoTx A.
  STEP 4:    f o JLlI o 80  " 9°  o -4.032
                   5W       2.48
  STEPS:   For a too-taied test, compare |t| with t^a » 2.11. Since 4.032 > 2.11, reject th® nuQ hypothess and
            conclude that there is a significant difference between the too method biases, in favor of method 2.

  This box fflustratas Q atuaton Involving too measurement methods where one method Is more precise, but eteo
  more biased, than the other.  If no adjustment for bias & made, then for many purposes, tha lass biased, more
  variable method is preferable.  However, proper bias adjustment can make both methods unbiased, so that th®
  more precise method becomes the preferred method Such adjustments can be based on QC check sample
  results, ff the QC check samples are regarded as representative of environmental samples involving suffidenfy
  similar anaiytes and matrices.
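
  The Satterthwaite calculation in Box 5.2-2 can be checked directly from the summary statistics; the Python
  sketch below reproduces the standard error, the approximate degrees of freedom, the t statistic, and the
  critical value.

      # Box 5.2-2: Satterthwaite's (unequal-variance) two-sample t-test from summary statistics.
      import math
      from scipy import stats

      xbar, sx, m = 80.0, 4.0, 13     # method 1
      ybar, sy, n = 90.0, 8.0, 13     # method 2

      vx, vy = sx**2 / m, sy**2 / n
      se = math.sqrt(vx + vy)                                   # about 2.48
      f = (vx + vy)**2 / (vx**2 / (m - 1) + vy**2 / (n - 1))    # about 17.65; round down to 17
      t_stat = (xbar - ybar) / se                               # about -4.03
      t_crit = stats.t.ppf(0.975, int(f))                       # about 2.110

      print(round(se, 2), round(f, 2), round(t_stat, 3), round(t_crit, 3))
      # |t| = 4.03 > 2.110, so the difference between the two method biases is significant.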
EPA QA/G-9
        5.2-4
QA96

-------
            [Figure: two power curves plotted against the true value of the parameter (80 to 160)]
           Figure 5.2-1. Illustration of Unbiased versus Biased Power Curves

        Since bias is not impacted by changing the sample size, while the precision of estimates and the
power of tests increases with sample size, the relative importance of bias becomes more pronounced when the
sample size increases (i.e., when one makes the power curve steeper). Similarly, if the same magnitude of
bias exists for two different sites, then the impact on testing errors will be more severe for the site having the
smaller inherent variability in the characteristic of interest (i.e., when bias represents a larger portion of total
variability).

        To minimize the effects of bias: identify and document sources of potential bias; adopt measurement
procedures (including specimen collection, handling, and analysis procedures) that minimize the potential for
bias; make a concerted effort to quantify bias whenever possible; and make appropriate compensation for
bias when possible.

5.2.5  Quantity vs. Quality of Data

        The above conclusions imply that if compensation for bias cannot be made and if statistically-
based decisions are to be made, then there will be situations in which serious consideration should be given
to using an imprecise (and perhaps relatively inexpensive) chemical method having negligible bias as
compared to using a very precise method that has even a moderate degree of bias. The tradeoff favoring the
imprecise method is especially relevant when the inherent variability in the population is very large relative to
the random measurement error.

       For example, suppose a mean concentration for a given spatial area (site) is of interest and that the
coefficient of variation (CV) characterizing the site's variability is 100%. Let method A denote an imprecise
method, with measurement-error CV of 40%, and let method B denote a highly precise method, with
measurement-error CV of 5%. The overall variability, or total variability, can essentially be regarded as the
sum of the spatial variability and the measurement variability.  These are obtained from the individual CVs in
EPA QA/G-9
           5.2-5
                   QA96

-------
the form of variances. As CV equals standard deviation divided by mean, it follows that the site standard
deviation is then the CV times the mean.  Thus, for the site, the variance is 1.00² × mean²; for method A, the
variance is 0.40² × mean²; and for method B, the variance is 0.05² × mean².  The overall variability when
using method A is then (1.00² × mean²) + (0.40² × mean²) = 1.16 × mean², and when using method B, the
variance is (1.00² × mean²) + (0.05² × mean²) = 1.0025 × mean².  It follows that the overall CV when using
each method is then (1.077 × mean) / mean ≈ 107.7% for method A, and (1.001 × mean) / mean ≈ 100.1%
for method B.

        Now consider a sample of 25 specimens from the site.  The precision of the sample mean can then be
characterized by the relative standard error (RSE) of the mean (which for the simple random sample situation
is simply the overall CV divided by the square root of the sample size). For Method A, RSE = 21.54%; for
method B, RSE = 20.02%.  Now suppose that the imprecise method (Method A) is unbiased, while the
precise method (Method B) has a 10% bias (e.g., an analyte percent recovery of 90%).  An overall measure of
error that reflects how well the sample mean estimates the site mean is the relative root mean squared error
(RRMSE):

                                    RRMSE = √[(RB)² + (RSE)²]

where RB denotes the relative bias (RB = 0 for Method A since it is unbiased and RB = ±10% for Method B
since it is biased) and RSE is as defined above. The overall error in the estimation of the population mean
(the RRMSE) would then be 21.54% for Method A and 22.38% for Method B. If the relative bias for
Method B was 15% rather than 10%, then the RRMSE for Method A would be 21.54% and the RRMSE for
Method B would be 25.02%, so the method difference is even more pronounced.  While the above illustration
is portrayed in terms of estimation of a mean based on a simple random sample, the basic concepts apply
more generally.
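
        These quantities are straightforward to recompute. The Python sketch below reproduces the overall
CVs, RSEs, and RRMSEs quoted above for the two hypothetical methods.

    # RRMSE comparison of an imprecise, unbiased method (A) and a precise, biased method (B).
    import math

    site_cv, n = 1.00, 25                                 # site CV of 100%; 25 specimens

    def rse_and_rrmse(meas_cv, rel_bias):
        overall_cv = math.sqrt(site_cv**2 + meas_cv**2)   # total CV (as a fraction)
        rse = overall_cv / math.sqrt(n)                   # relative standard error of the mean
        return rse, math.sqrt(rel_bias**2 + rse**2)       # RRMSE = sqrt(RB^2 + RSE^2)

    for label, meas_cv, rb in (("A", 0.40, 0.00), ("B", 0.05, 0.10), ("B, 15% bias", 0.05, 0.15)):
        rse, err = rse_and_rrmse(meas_cv, rb)
        print(label, f"RSE = {100*rse:.2f}%", f"RRMSE = {100*err:.2f}%")
    # Method A: RSE = RRMSE = 21.54%.  Method B: RSE = 20.02%, RRMSE = 22.38% (25.02% at 15% bias).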

        This example serves to illustrate that a method that may be considered preferable from a chemical
point of view (e.g., 85 or 90% recovery, 5% relative standard deviation [RSD]) may not perform as well in a
statistical application as a method with less bias and greater imprecision (e.g., zero bias, 40% RSD),
especially when the inherent site variability is large relative to the measurement-error RSD.

5.2.6  "Proof of Safety" vs. "Proof of Hazard"

        Because of the basic hypothesis testing philosophy, the null hypothesis is generally specified in terms
of the status quo (e.g., no change or action will take place if the null hypothesis is not rejected). Also, since the
classical approach exercises direct control over the Type I error rate, this rate is generally associated with the
error of most concern (for further discussion of this point, see section 1.2). One difficulty, therefore, may be
obtaining a consensus on which error should be of most concern. It is not unlikely that the Agency's
viewpoint in this regard will differ from the viewpoint of the regulated party. In using this philosophy, the
Agency's ideal approach is not only to set up the direction of the hypothesis in such a way that controlling the
Type I error protects the health and environment but also to set it up in a way that encourages quality (high
precision and accuracy) and minimizes expenditure of resources in situations where decisions are relatively
"easy" (e.g., all observations are far from the threshold level of interest).

        In some cases, how one formulates the hypothesis testing problem can lead to very different sampling
requirements. For instance, following remediation activities at a hazardous waste site, one may seek to
answer "Is the site clean?"  Suppose one attempts to address this question by comparing a mean level from
samples taken after the remediation with a threshold level (chosen to reflect "safety"). If the threshold level is

EPAQA/G-9                                 5.2-6                                         QA96

-------
near background levels that might have existed in the absence of the contamination, then it may be very
difficult (i.e., require enormous sample sizes) to "prove" that the site is "safe." This is because the
concentrations resulting from even a highly efficient remediation under such circumstances would not be
expected to deviate greatly from such a threshold. A better approach for dealing with this problem may be to
compare the remediated site with a reference ("uncontaminated") site, assuming that such a site can be
determined.

        To avoid excessive expense in collecting and analyzing samples for a contaminant, compromises will
sometimes be necessary. For instance, suppose that a significance level of 0.05 is to be used; however, the
affordable sample size may be expected to yield a test with power of only 0.40 at some specified parameter
value chosen to have practical significance (see section 5.2.3). One possible way that compromise may be
made in such a situation is to relax the significance level, for instance, using α = 0.10, 0.15, or 0.20. By
relaxing this false positive rate, a higher power (i.e., a lower false negative rate β) can be achieved. An
argument can be made, for example, that one should develop sampling plans and determine sample sizes in
such a way that both the Type I and Type II errors are treated simultaneously and in a balanced manner (for
example, designing to achieve α = β = 0.15) instead of using the traditional approach of fixing the Type I
error rate at 0.05 or 0.01 and letting β be determined by the sample size. This approach of treating the Type I
and Type II errors simultaneously is taken in the DQO Process and it is recommended that several different
scenarios of α and β be investigated before a decision on specific values for α and β is selected.
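
        The tradeoff described above can be quantified with a simple calculation. The Python sketch below is
a hypothetical illustration: for a fixed sample size it shows how the power (1 - β) of a one-sided test of a mean
increases as the significance level α is relaxed.

    # Hypothetical illustration: relaxing alpha raises power (lowers beta) at a fixed sample size.
    from scipy import stats

    mu0, mu1, sigma, n = 100.0, 110.0, 20.0, 12      # all values hypothetical
    se = sigma / n**0.5

    for alpha in (0.05, 0.10, 0.15, 0.20):
        z_crit = stats.norm.ppf(1 - alpha)
        power = stats.norm.sf(z_crit - (mu1 - mu0) / se)
        print(f"alpha = {alpha:.2f}: power = {power:.2f}, beta = {1 - power:.2f}")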
 EPAQA/G-9                                  5.2-7                                        QA96

-------
                      APPENDIX A




                  STATISTICAL TABLES
EPAQA/G-9                    A-1                       QA96

-------
                                 LIST OF TABLES

Table No.                                                                          Page

TABLE A-1:  CRITICAL VALUES OF STUDENT'S t DISTRIBUTION ......................... A-3
TABLE A-2:  CRITICAL VALUES FOR THE STUDENTIZED RANGE TEST ...................... A-4
TABLE A-3:  CRITICAL VALUES FOR THE EXTREME VALUE TEST (DIXON'S TEST) ........... A-5
TABLE A-4:  CRITICAL VALUES FOR DISCORDANCE TEST ................................ A-6
TABLE A-5:  APPROXIMATE CRITICAL VALUES λ FOR ROSNER'S TEST ..................... A-7
TABLE A-6:  QUANTILES OF THE WILCOXON SIGNED RANKS TEST ......................... A-9
TABLE A-7:  CRITICAL VALUES FOR THE RANK-SUM TEST - α = 0.05 .................... A-10
TABLE A-8:  PERCENTILES OF THE CHI-SQUARE DISTRIBUTION .......................... A-11
TABLE A-9:  PERCENTILES OF THE F DISTRIBUTION ................................... A-12
TABLE A-10: VALUES OF THE PARAMETER λ FOR COHEN'S ESTIMATES ..................... A-15
TABLE A-11: PROBABILITIES FOR THE SMALL-SAMPLE MANN-KENDALL TEST FOR TREND ...... A-16
EPAQA/G-9                            A-2                                 QA96

-------
                TABLE A-1: CRITICAL VALUES OF STUDENT'S t DISTRIBUTION

  Degrees of                                       1 - α
   Freedom     .70     .75     .80     .85     .90     .95     .975     .99     .995

       1      0.727   1.000   1.376   1.963   3.078   6.314   12.706  31.821  63.657
       2      0.617   0.816   1.061   1.386   1.886   2.920    4.303   6.965   9.925
       3      0.584   0.765   0.978   1.250   1.638   2.353    3.182   4.541   5.841
       4      0.569   0.741   0.941   1.190   1.533   2.132    2.776   3.747   4.604
       5      0.559   0.727   0.920   1.156   1.476   2.015    2.571   3.365   4.032
       6      0.553   0.718   0.906   1.134   1.440   1.943    2.447   3.143   3.707
       7      0.549   0.711   0.896   1.119   1.415   1.895    2.365   2.998   3.499
       8      0.546   0.706   0.889   1.108   1.397   1.860    2.306   2.896   3.355
       9      0.543   0.703   0.883   1.100   1.383   1.833    2.262   2.821   3.250
      10      0.542   0.700   0.879   1.093   1.372   1.812    2.228   2.764   3.169
      11      0.540   0.697   0.876   1.088   1.363   1.796    2.201   2.718   3.106
      12      0.539   0.695   0.873   1.083   1.356   1.782    2.179   2.681   3.055
      13      0.538   0.694   0.870   1.079   1.350   1.771    2.160   2.650   3.012
      14      0.537   0.692   0.868   1.076   1.345   1.761    2.145   2.624   2.977
      15      0.536   0.691   0.866   1.074   1.341   1.753    2.131   2.602   2.947
      16      0.535   0.690   0.865   1.071   1.337   1.746    2.120   2.583   2.921
      17      0.534   0.689   0.863   1.069   1.333   1.740    2.110   2.567   2.898
      18      0.534   0.688   0.862   1.067   1.330   1.734    2.101   2.552   2.878
      19      0.533   0.688   0.861   1.066   1.328   1.729    2.093   2.539   2.861
      20      0.533   0.687   0.860   1.064   1.325   1.725    2.086   2.528   2.845
      21      0.532   0.686   0.859   1.063   1.323   1.721    2.080   2.518   2.831
      22      0.532   0.686   0.858   1.061   1.321   1.717    2.074   2.508   2.819
      23      0.532   0.685   0.858   1.060   1.319   1.714    2.069   2.500   2.807
      24      0.531   0.685   0.857   1.059   1.318   1.711    2.064   2.492   2.797
      25      0.531   0.684   0.856   1.058   1.316   1.708    2.060   2.485   2.787
      26      0.531   0.684   0.856   1.058   1.315   1.706    2.056   2.479   2.779
      27      0.531   0.684   0.855   1.057   1.314   1.703    2.052   2.473   2.771
      28      0.530   0.683   0.855   1.056   1.313   1.701    2.048   2.467   2.763
      29      0.530   0.683   0.854   1.055   1.311   1.699    2.045   2.462   2.756
      30      0.530   0.683   0.854   1.055   1.310   1.697    2.042   2.457   2.750
      40      0.529   0.681   0.851   1.050   1.303   1.684    2.021   2.423   2.704
      60      0.527   0.679   0.848   1.046   1.296   1.671    2.000   2.390   2.660
     120      0.526   0.677   0.845   1.041   1.289   1.658    1.980   2.358   2.617
      ∞       0.524   0.674   0.842   1.036   1.282   1.645    1.960   2.326   2.576

Note: The last row of the table (∞ degrees of freedom) gives the critical values for a standard normal distribution (z),
e.g., t(.95, ∞) = z(.95) = 1.645.
EPAQA/G-9
                                           A-3
QA96

-------
          TABLE A-2: CRITICAL VALUES FOR THE STUDENTIZED RANGE TEST
a
3
• 4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
25
30
35
40
45
50
55
60
65
70
75
80
85
90
95
100
150
200
500
1000

a
1.737
1.87
2.02
2.15
2.26
2.35
2.44
2.51
2.58
2.64
2.70
2.75
2.80
2.84
2.88
2.92
2.%
2.99
3.15
3.27
3.38
3.47
3.55
3.62
3.69
3.75
3.80
3.85
3.90
3.94
3.99
4.02
4.06
4.10
4.38
4.59
5.13
5.57
0.01
b
2.000
2.445
2.803
3.095
3.338
3.543
3.720
3.875
4.012
4.134
4.244
4.34
4.44
4.52
4.60
4.67
4.74
4.80
5.06
5.26
5.42
5.56
5.67
5.77
5.86
5.94
6.01
6.07
6.13
6.18
6.23
6.27
6.32
6.36
6.64
6.84
7.42
7.80
Level of Significance α
0.05
a
1.758
1.98
2.15
2.28
2.40
2.50
2.59
2.67
2.74
2.80
2.86
2.92
2.97
3.01
3.06
3.10
3.14
3.18
3.34
3.47
3.58
3.67
3.75
3.83
3.90
3.96
4.01
4.06
4.11
4.16
4.20
4.24
4.27
4.31
4.59
4.78
5.47
5.79
b
1.999
2.429
2.753
3.012
3.222
3.399
3.552
3.685
3.80
3.91
4.00
4.09
4.17
4.24
4.31
4.37
4.43
4.49
4.71
4.89
5.04
5.16
5.26
5.35
5.43
5.51
5.57
5.63
5.68
5.73
5.78
5.82
5.86
5.90
6.18
6.39
6.94
7.33
a
1.782
2.04
2.22
2.37
2.49
2.59
2.68
2.76
2.84
2.90
2.96
3.02
3.07
3.12
3.17
3.21
3.25
3.29
3.45
3.59
3.70
3.79
3.88
3.95
4.02
4.08
4.14
4.19
4.24
4.28
4.33
4.36
4.40
4.44
4.72
4.90
5.49
5.92
0.10
b
1.997
2.409
2.712
2.949
3.143
3.308
3.449
3.57
3.68
3.78
3.87
3.95
4.02
4.09
4.15
4.21
4.27
4.32
4.53
4.70
4.84
4.96
5.06
5.14
5.22
5.29
5.35
5.41
5.46
5.51
5.56
5.60
5.64 .
5.68
5.96
6.15
6.72
7.11
EPAQA/G-9
A-4
QA96

-------
             TABLE A-3: CRITICAL VALUES FOR THE EXTREME VALUE TEST
                                 (DIXON'S TEST)

                        Level of Significance α
      n         0.10         0.05         0.01

      3        0.886        0.941        0.988
      4        0.679        0.765        0.889
      5        0.557        0.642        0.780
      6        0.482        0.560        0.698
      7        0.434        0.507        0.637
      8        0.479        0.554        0.683
      9        0.441        0.512        0.635
     10        0.409        0.477        0.597
     11        0.517        0.576        0.679
     12        0.490        0.546        0.642
     13        0.467        0.521        0.615
     14        0.492        0.546        0.641
     15        0.472        0.525        0.616
     16        0.454        0.507        0.595
     17        0.438        0.490        0.577
     18        0.424        0.475        0.561
     19        0.412        0.462        0.547
     20        0.401        0.450        0.535
     21        0.391        0.440        0.524
     22        0.382        0.430        0.514
     23        0.374        0.421        0.505
     24        0.367        0.413        0.497
     25        0.360        0.406        0.489
EPAQA/G-9
QA96

-------
              TABLE A-4: CRITICAL VALUES FOR DISCORDANCE TEST
si
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
. 24
25
26
27
28
29
30
31
32
Level of Significance &
0.01
1.155
1.492
1.749
1.944
2.097
2.221
2.323
2.410
2.485
2.550
2.607
2.659
2.705
2.747
2.785
2.821
2.854
2.884
2.912
2.939
2.963
2.987
3.009
3.029
3.049
3.068
3.085
3.103
3.119
3.135
0.05
1.153
1.463
1.672
1.822
1.938
2.032
2.110
2.176
2.234
2.285
2.331
2.371
2.409
2.443
2.475
2.504
2.532
2.557
2.580
2.603
2.624
2.644
2.663
2.681
2.698
2.714
2.730
2.745
. 2.759
2.773
S
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
Level of Significance α
0.01
3.150
3.164
3.178
3.191
3.204
3.216
3.228
3.240
3.251
3.261
3.271
3.282
3.292
3.302
3.310
3.319
3.329
3.336
0.05
2.786
2.799
2.811
2.823
2.835
2.846
2.857
2.866
2.877
2.887
2.8%
2.905
2.914
2.923
2.931
2.940
2.948
2.956
EPAQA/G-9
A-6
QA96

-------
          TABLE A-5: APPROXIMATE CRITICAL VALUES λ FOR ROSNER'S TEST

a
25





26





27





28





29 •





30





31






?
1
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
5
10
e
0.05
2.82
2.80
2.78
2.76
173
2.59
2.84
2.82
2.80
2.78
2.76
2.62
2.86
2.84
2.82
2.80
2.78
2.65
2.88
2.86
2.84
2.82
2.80
168
2.89
2.88
2.86
184
2.82
2.71
2.91
2.89
2.88
2.86
2.84
2.73
2.92
2.91
2.89
2.88
2.86
2.76
0.01
3.14
3.11
3.09
3.06
3.03
2.85
3.16
3.14
3.11
3.09
3.06
2.89
3.18
3.16
3.14
3.11
3.09
2.93
3.20
3.18
3.16
3.14
3.11
2.97
3.22
3.20
3.18
3.16
3.14
3.00
3.24
3.22
3.20
3.18
3.16
3.03
3.25
3.24
3.22
3.20
3.18
3.06

Q
32





33





34





35





36





37





38






r
1
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
5
10
I
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
5
10
a
0.05
2.94
2.92
2.91
2.89
2.88
2.78
2.95
2.94
2.92
2.91
2.89
2.80
2.97
2.95
2.94
2.92
2.91
2.82
2.98
2.97
2.95
2.94
192
2.84
2.99
2.98
2.97
2.95
194
2.86
3.00
2.99
2.98
2.97
2.95
2.88
3.01
3.00
2.99
2.98
2.97
2.91
0.01
3.27
3.25
3.24
3.22
3.20
3.09
3.29
3.27
3.25
3.24
3.22
3.11
3.30
3.29
3.27
3.25
3.24
3.14
3.32
3.30
3.29
3.27
3.25
3.16
3.33
3.32
3.30
3.29
3.27
3.18
3.34
3.33
3.32
3.30
3.29
3.20
3.36
3.34
3.33
3.32
3.30
3.22

B
39





40





41





42





43





44





45






r
1
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
5
10
e
0.05
3.03
3.01
3.00
2.99
2.98
2.91
3.04
3.03
3.01
3.00
2.99
192
3.05
3.04
3.03
3.01
3.00
2.94
3.06
3.05
3.04
3.03
3.01
2.95
3.07
3.06
3.05
3.04
3.03
2.97
3.08
3.07
3.06
3.05
3.04
2.98
3.09
3.08
3.07
3.06
3.05
2.99
0.01
3.37
3.36
3.34
3.33
3.32
3.24
3.38
3.37
3.36
3.34
3.33
3.2S
3.39
3.38
3.37
3.36
3.34
3.27
3.40
3.39
3.38
3.37
3.36
3.29
3.41
3.40
3.39
3.38
3.37
3.30
3.43
3.41
3.40
3.39
3.38
3.32
3.44
3.43
3.41
3.40
3.39
3.33
EPA QA/G-9
A-7
QA96

-------
          TABLE A-5: APPROXIMATE CRITICAL VALUES λ FOR ROSNER'S TEST

n
46




V
47





48





49





50





60






ff
I
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
S
10
e
O.OS
3.09
3.09
3.08
3.07
3.06
3.00
3.10
3.09
3.09
3.08
3.07
3.01
3.11
3.10
3.09
3.09
3.08
3.03
3.12
3.11
3.10
3.09
3.09
3.04
3.13
3.12
3.11
3.10
3.09
3.05
3.20
3.19
3.19
3.18
3.17
3.14
0.01
3.45
3.44
3.43
3.41
3.40
3.34
3.46
3.45
3.44
3.43
3.41
3.36
3.46
3.46
3.45
3.44
3.43
3.37
3.47
3.46
3.46
3.45
3.44
3.38
3.48
3.47
3.46
3.46
3.45
3.39
3.56
3.55
3.55
3.54
3.53
3.49

a
70





80





90




f
100

„



150





200






V
I
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
5
10
1
2
3
4
S
10
1
2
3
4
5
10
e
0.05
3.26
3.25
3.25
3.24
3.24
3.21
3.31
3.30
3.30
3.29
3.29
3.26
3.35
3.34
3.34
3.34
3.33
3.31
3.38
3.38
3.38
3.37
3.37
3.35
3.52
3.51
3.51
3.51
3.51
3.50
3.61
3.60
3.60
3.60
3.60
3.59
0.01
3.62
3.62
3.61
3.60
3.60
3.57
3:67
3.67
3.66
3.66
3.65
3.63
3.72
3.71
3.71
3.70
3.70
3.68
3.75
3.75
3.75
3.74
3.74
3.72
3.89
3.89
3.89
3.88
3.88
3.87
3.98
3.98
3.97
3.97
3.97
3.96

Q
250


300
350
400
450
500

V
1
5
10
1
5
10
1
5
10
1
5
10
1
5
10
1
5
10
e
0.05
3.67'
3.67
3.66
3.72
3.72
3.71
3.77
3.76
3.76
3.80
3.80
3.80
3.84
3.83
3.83
3.86
3.86
3.86
0.0!
4.04
4.04
4.03
4.09
4.09
4.09
4.14
4.13
4.13
4.17
4.17
4.16
4.20
4.20
4.20
4.23
4.23
4.22
EPAQA/G-9
A-8
QA96

-------
            TABLE A-6: QUANTILES OF THE WILCOXON SIGNED RANKS TEST
m
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
w*
0
0
0
1
2
4
6
8
10
13
16
20
24
28
33
38
44
0
1
3
4
6
9
11
14
18
22
26
31
36
42
48
54
61
1
3
4
6
9
11
15
18
22
27
32
37
43
49
56
63
70
3
4
6
9
12
15
19
23
28
33
39
45
51
58
66
74
82
EPA QA/G-9
A-9
QA96

-------
                          TABLE A-7: CRITICAL VALUES FOR THE RANK-SUM TEST - α = 0.05
Smaller
of m or ca
I
2
3
4-
5
6
7
8
9
10
11
12
13
14
IS
16
17
18
19
20
Larger of morn
3


0

















4


0
•I
















5

0
1 '
2
4















6

0
2
3
5
7














7

0
2
4
6
8
88













8

1
3
5
8
10
O
IS










•

9

1
4
6
9
12
15
18
21











10

'1
4
7
11
14
17
20
24
27










11

I
S
8
12
16
19
23
27
31
34









12

2
S
9
13
17
21
26
30
34
38
42.








13

2
6
10
IS
19
24
28
33
37
42
47
Si







14

3
7
11
16
21
26
31
36
41
46
51
56
61

-




IS

3
7
12
18
23
28
33
39
44
50
55
61
66
72





16

3
8
14
19
25
30
36
42
48
54
60
65
71
77
83




17

3
9
15
20
26
33
39
45
51
57
64
70
77
83
89
96



18

4
9
16
22
28
35
41
48
55
61
68
75
82
88
95
102
109


19
0
4
10
17
23
30
37
44
51
58
65
72
80
87
94
101
109
116
123

20
0
4
11
18
25
32
39
47
54
62
69
77
84
92
100
107
115
123
130
138
EPAQA/O-9
A-10
QA96

-------
             TABLE A-8: PERCENTILES OF THE CHI-SQUARE DISTRIBUTION
V
I
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
40
50
60
70
80
§0
100
!-e
.005
0.0*393
o.orca
0.072
0.207
0.412
0.676
0.989
1.34
1.73
2.16
2.60
3.07
3.57
4.07
4.60
5.14
5.70
6.26
6.84
7.43
8.03
8.64
9.26
9.89
10.52
11.16
11.81
12.46
13.12
13.79
20.71
27.99
35.53
43.28
51.17
59.20
67.33
.010
O.OMS7
~ 0.0201
0.115
0.297
0.554
0.872
1.24
1.65
2.09
2.56
3.05
3.57
4.11
4.66
5.23
5.81
6.41
7.01
7.63
8.26
8.90
9.54
10.20
10.84
11.52
12.20
12.88
13.56
14.26
14.95
22.16
29.71
37.48
45.44
53.54
61.75
70.06
.025
0.0J982
0.0506
0.216
0.484
0.831
1.24
1.69
2.18
2.70
3.25
3.82
4.40
5.01
5.63
6.26
6.91
7.56
8.23
8.91
9.59
10.28
10.98
11.69
12.40
13.12
13.84
14.57
15 Jl
16.05
16.79
24.43
32.36
40.48
48.76
57.15
65.65
74.22
.050
0.02393
0.103
0.352
0.711
1.145
1.64
2.17
2.73
3.33
3.94
3.57
5.23
5.89
6.57
7.26
7.96
8.67
9.39
10.12
10.85
11.59
12.34
13.09
13.85
14.61
15.38
16.15
16.93
17.71
18.49
26.51
34.76
43.19
51.74
60.39
69.13
77.93
.100
0.0158
0.211
0.584
1.064
1.61
2.20
2.83
3.49
4.17
4.87
5.58
6.30
7.04
7.79
8.55
9.31
10.09
10.86
11.65
12.44
13.24
14.04
14.85
15.66
16.47
17.29
18.11
18.94
19.77
20.60
29.05
37.69
46.46
53.33
64.28
73.29
82.36
.900
2.71
4.61
6.25
7.78
9.24
10.64
12.02
13.36
14.68
15.99
17128
18.55
19.81
21.06
22.31
23.54
24.77
25.99
27.20
28.41
29.62
30.81
32.01
33.20
34.38
35.56
36.74
37.92
39.09
40.26
51.81
63.17
74.40
85.53
96.58
107.6
118.5
.950
3.84
5.99
7.81
9.49
11,07
12.59
14.07
15.51
16.92
18.31
19.68
21.03
22.36
23.68
25.00
26.30
27.59
28.87
30.14
31.41
32.67
33.92
35.17
36.42
'37.65
38.89
40.11
41.34
42.56
43.77
55.76
67.50
79.08
90.53
101.9
113.1
124.3
.975
5.02
7.38
9.35
11.14
12.83
14.45
16.01
17.53
19.02
20.48
21.92
23.34
24.74
26.12
27.49
28.85
30.19
31.53
32.85
34.17
35.48
36.78
38.08
39.36
40.65
41.92
43.19
44.46
45.72
46.98
59.34
71.42
83.30
95.02
106.6
118.1
129.6
.990
6.63
9.21
11.34
13.28
15.09
16.81
18.48
20.09
21.67
23.21
24.73
26.22
27.69
29.14
30.58
32.00
33.41
34.81
36.19
37.57
38.93
40.29
41.64
42.98
44.31
45.64
46.96
48.28
49.59
50.89
63.69
76.15
88.38
100.4
112.3
124.1
135.8
.995
7.88
10.60'
12.84
14.86
16.75
18.55
20.28
21.%
23.59
25.19
26.76
28.30
29.82
31.32
32.80
34.27
35.72
37.16
38.58
40.00
41.40
42.80
44.18
45.56
46.93
48.29
49.64
50.99
52.34.
53.67
66.77
79.49
91.95
104.2
116.3
128.3
140.2
EPAQA/G-9
A-ll
QA96

-------
                               TABLE A-9: PERCENTILES OF THE F DISTRIBUTION
Dtgreee
for
Damn-

1 JO

53
573

2 .30
50
53
.973
.99
3 .90
.90
59
.979
.99
4 .30
.90
53
.973
.99
.999
3 .30
.90
53
.973
.99

6 .30
.90
.93
.973
.99
.999
Dogracd of FrowfasD for Nim&rstar
i .

1.00
395
161
648
4052
0.667
8.33
18.3
38.3
98.3
0.583
3.34
10.1
17.4
34.1
0.349
4.34
7.71
123
21.2
74.1
0.328
4.06
6.61
10.0
'16.3
47.2
0.315
3.78
3.99
8.81 •
22.8
33.9
2

1.30
49.3
200
300
3000
1.00
9.00
19.0
39.0
99.0
0.881
3.46
'9J3
16.0
30.8
0.828
432
6.94
10.6
18.0
613
0.799
3.78
3.79
8.43
133
37.1
0.780
3.46
9.14
736
10.9
27.0
3 .

.•J.71
93.6
216
864
3403
1.13
9.16
193
39.2
993
D.OO
939
938
13.4
29.S
0.941
4.19
6.39
9.98
16.7
36.2
0.907
3.62
3.41
7.76
12.1
33.2
0.886
3.29
4:76
&60
9.78
23.7
4

1.82
33.8
223
900
3623
1.21
9.24
19.2
393
99.2
1.06
934
9.12
19.1
28.7
1.00
4.11
639
9.60
16.0
93.4
0.963
3.32
3.19
739
11.4
31.1
0.942
3.18
4.33
6.23
9.13
21.9
3

1.89
373
230
922
3764
135
939
193
393
993
1.10
331
9.01
14.9
28.2
1.04
4.05
6.26
936
13.3
31.7
1.00
3.43
3.05
7.13
11.0
29.8
0.977
3.11
439
359
S.73
20.8
6

1.94
58.2
234
937
5839
S.28
933
193
393
993
1.13
3.28
8.94
14.7
27.9
1.06
4.01
6.16
9.20
133
30.3
1.02
3.40
4.93
6.98
10.7
28.8
1.00
3.0)
4.28
3.82
8.47
20.0
7

1.98
38.9
237
948
3928
130
933
19.4
. 39.4
99.4
1.13
337
3.89
14.6
27.7
1.08
3.98
6.09
9.07
15.0
49.7
1.04
337
4.88
6.85
10.3
28.2
1.02
3.01
431
3.70
8.26
19.3
8

2.00
39.4
239
937
5981
132
937
19.4
39.4
99.4
1.16
5.25
8.85
I4.S
27.5
1.09
355
6.04
3.98
14.8
49.0
1.05
3.34
4.82
6.76
103
27.6
1.03
2.98
4.13
3.60
8.10
19.0
9

103
39.9
241
963
6022
133
936
19.4
39.4
99.4
1.17
3.24
asi
14.3
273
1.10
3.94
6.00
8.90
14.7
48.3
1.06
3.32
4.77
6.68
10.2
27.2
1.04
256
4.10
3.32
758
18.7
10

2.04
60.2
242
969
6036
134
939
89.4
39.4
99.4
1.18
333
&79
14.4
273
1.11
3.92
3.96
8.84
14.3
48.1
1.07
3.39
4.74
6.62
10.1
26.9
1.03
254
4.06
9.46
7.87
18.4
12

2.07
60.7
244
977
6106
136
9.41
19.4
39.4
99.4
UO
9.22
8.74
143
27.1
1.13
3.90
3.91
3.73
14.4
47.4
1.09
' 3.27
4.68
6.32
9.89
26.4
1.06
2.90
4.00
337
7.72
18.0
13

2.09
613
246
983
6157
138
9.42
19.4
39.4
99.4
131
5.20
aTO
143
26.9
1.14
3.87
5.86
8.66
143
46.8
1.10
3.24
4.62
6.43
9.72
25.9
1.07
2.87
3.94
3.27
7.56
17.6
20

2.12
61.7
248
993
6209
139
9.44
19.4
39.4
99.4
1.23
5.18
8.66
14.2
26.7
1.15
3.84
5.80
8.36
14.0
46.1
Ml
331
4.36
6.33
9.33
23.4
1.08
2.84
3.87
3.17
7.40
. 17.1
24

2.13
62.0
249
997
6233
1.40
9.43
19.5
39.3
99.3
1.23
3.18
3.64
14.1
26.6
1.16
3.83
3.77
8.51
135
43.8
1.12
3.19
4.33
638
9.47
23.1
1.09
2.82
3.84
3.12
731
16.9
30

2.19
623
250
1001
6261
1.41
9.46
19.5
39.5
99.5
1.24
3.17
3.62
14.1
26.5
1.16
3.82
5.75
S.46
13.8.
45.4
1.12
3.17
4.50
6.23
938
24.9
1.10
2.80
3.81
5.07
733
16.7
60

2.17
62.8
252
1010
6313
1.43
9.47
19.3
39.3
99.3
1.23
3.13
8.57
14.0
26.3
1.18
3.79
3.69
836
13.7
44.7
1.14
3.14
4.43
6.12
9.20
24.3
1.11
2.76
3.74
4.96
7.06
16.2
120

2.18
63.1
233
1014
6339
1.43
9.48
19.5
39.3
99.3
136
3.14
8.35
13.9
263
1.18
3.78
9.66
831
13.6
44.4
1.14
3.12
4.40
6.07
9.11
24.1
1.12
2.74
3.70
4.90
6.97
16.0
CO

2.20
633
254
1018
6366
1.44
9.49
19.5
39.5
99.5
137
3.13
8.53
13.9
26.1
1.19
3.76
3.63
8.26
13.3
44.1
1.15
3.11
4.37
6.02
9.02
23.8
1.12
2.72
3.67
4.85
6.88
13.7
EPAQA/O-9
                                                  A-12
QA96

-------
                                TABLE A-9: PERCENTILES OF THE F DISTRIBUTION
Degrees
Ffoodosn
for
Ds&03n»
tnatox **
7 .30
50
.93
.973
.99
.999
8 .30
.90
.93
.973
.99
.999
9 .30
50
53
.973
39
.999
10 .30
.90
53
.973
59
.999
12 .30
.90
53
.973
59
.999
13 .30
.90
53
573
59
.999
Dagrosa of Fronton for Numsretar
1
0.306
3.39
3.39
8.07
12.2
29.2
0.499
3.46
3.32
7.37
10
23.4
0.494
336
3.12
7.21
10.6
22.9
0.490
3.29
4.96
6.94
10.0
21.0
0.4S4
3.18
4.73
6.33
933
18.6
0.478
3.07
4.34
6.20
8.68
16.6
2
.0767
3.26
4.74
6.34
9.33
21.7
0.737
3.11
0.46
6.06
0.63
• 18.3
0.749
3.01
4.36
3.71
8.02
. 16.4
0.743
2.92
4.10
' 3.46
7.36
145
0.733
2.81
3.89
3.10
6.93
13.0
0.726
2.70
3.68
4.77
636
10
3
0.871
3.07
433
3.89
0.43
18.8
0.860
2.92
4.07
3.42
7.39
13.8
0.832
2.81
3.86
3.08
6.99
135
0.843
2.73
3.71
4.83
6.33
12.6
0.833
2.61
3.49
4.47
353
10.8
0.826
2.49
3.29
4.13
3.42
9.34
4
0.926
2.96
4.12
3.32
7.83
17.2
0.913
2.01
3.84
3.03
7.01
14.4
0506
2.69
3.63
4.72
6.42
12.6
0.899
2.61
3.48
4.47
359
10
0.888
2.48
3.26
4.12
3.41
9.63
0.878
236
3.06
3.80
4J9
0.23
3
0.960
2.88
3.97
3.29
7.46
16.2
0.948
2.73
3.69
4.82
6.63
13.3
0539
2.61
3.48
4.48
6.06
11.7
0532
2.32
3.33
4.24
3.64
10.3
0.921
239
3.11
3.89
3.06
8.89
0511
2.27
2.90
3.38
4.36
7.37
6
0.983
2.83
3.87
3.12
7.19
13.3
0.971
2.67
3.38
4.63
637
12.9
0.962
2.33
337
432
3.80
11.1
0.934
2.46
3.22
4.07
339
953
0.943
233
3.00
3/73
4.82
8.38
0.933
2.21
2.79
3.41
432
7.09
7
1.00
2.78
3.79
4.99
6.99
13.0
0588
2.62
3.30
4.33
6.18
12.4
0.978
2.31
3.29
4.20
3.61
10.7
0.971
2.41
3.14
353
3.20
9.32
0.939
2.28
2.91
3.61
4.64
8.00
0.949
2.16
2.71
3.29
4.14
6.74
8
1.01
2.73
3.73
4.90
6.84
14.6
1.00
2.39
3.44
4.43
6.03
12.0
0.990
2.47
3.23
4.10
3.47
10.4
0.983
238
3.07
3.83
3.06
9.20
0.972
2.24
2.83
3.31
4.30
7.71
0.960
112
2.64
3.20
4.00
6.47
9
1.02
2.72
3.68
4.82
6.72
14.3
1.01
2.36
339
436
3.91
11.8
1.00
2.44
3.18
4.03
333
10.1
0.992
135
3.02
3.78
4.94
8.96
0.981
2.21
2.80
3.44
439
7.48
0.970
2.09
2.39
3.12
3.89
6.26
10
1.03
2.70
3.64
4.76
6.62
14.1
1.02
2.34
333
430
' 3.81
11.3
1.01
2.42
3.14
3.96
3.26
9.89
1.00
232
2.98
3.72
4.83
8.73
0.989
2.19
2.73
337
430
7.29
0.977
2.06
2.34
3.06
3.80
6.08
12
1.04
2.67
3.37
4.67
6.47
13.7
1.03
2.30
338
4.20
3.67
11.2
1.01
238
3.07
3.87
3.11
9.37
1.01
2.28
251
3.62
4.71
8.43
1.00
2.13
2.69
3.28
4.16
7.00
0.989
2.02
2.48
256
3.67
3.81
13
1.03
2.63
3.31
4.37
631
13.3
1.04
2.46
3.22
4.10
3.32
10.8
1.03
234
3.01
3.77
4.96
9.24
1.02
2.24
2.34
3.32
4.36.
8.13
1.01
2.10
2.62
3.18
4.01
6.71
1.00
157
2.40
2.86
3.32
3.34
20
1.07
2.39
3.44
4.47
6.16
12.9
1.03
2.42
3.13
4.00
336
10.3
1.04
230
254
3.67
4.81
8.90
1.03
230
2.77
3.42
4.41
7.80
1.02
2.06
2.34
3.07
3.86
6.40
1.01
152
233
2.76
337
3.23
24
1.07
2.38
3.41
4.42
6.07
12.7
1.06
2.40
3.12
3.93
3.28
103
1.03
2.28
2.90
3.61
4.73
8.72
1.04
2.18
2.74
3.37
4.33
7.64
1.03
2.04
2.31
3.02
3.78
6.23
1.02
1.90
2.29
2.70
3.29
3.10
30
1.08
2.36
3.38
436
3.99
12.3
1.07
238
3.08
3.89
3.20
10.1
1.03
125
2.86
3.36
4.63
3.33
1.03
2.16
2.70
331
4.23
7.47
1.03
2.01
2.47
2.96
3.70
6.09
1.02
1.87
2.23
2.64
3.21
4.93
60
1.09
2.31
330
4.23
3.82
12.1
1.08
234
3.01
3.78
9.03
9.73
1.07
231
2.79
3.43
4.48
8.19
1.06
2.11
162
3.20
4.08
7.12
1.03
1.96
238
2.83
3.34
5.76
1.03
1.82
2.16
2.32
3.05
4.64
120
1.10
2.49
3.27
4.20
3.74
11.9
1.08
232
2.97
3.73
4.95
9.53
1.07
2.18
175
339
4.40
8.00
1.06
2.08
2.58
3.14
4.00
6.94
1.03
1.93
234
2.79
3.45
5.39
1.04
1.79
2.11
2.46
2.96
4.48
C9
1.10
2.47
3.23
4.14
3.65
11.7
1.09
2.29
2.93
3.67
4.86
933
1.08
2.16
2.71
333
431
7.81
1.07
2.06
2.34
3.08
3.91
6.76
1.06
1.90
2.30
2.72
336
3.42
1.03
1.76
2.07
2.40
2.87
431
EPAQA/G-9
A-13
QA96

-------
                                TABLE A-$s PERCXNTILES OF THE F DISTRIBUTION
Degree
Froodosn
for
Henttnv
jnalor
20 JO
50
.93
573
.99
,999
24 .30
50
53
573
59
.999
30 .30
50
53
573
59
.999
60 .30
50
53
573
.99
.999
120.90
53
573
59
. .999
o 50
.93
.973
.99
599
Dsgrewo? Freedom fer Numerator


1

0.472
2.97
4.33
3.87
8.10
14.8
0.469
2.93
4.26
3.72
7.82
14.0
0.466
2.88
4.17
3.37
7.36
13.3
0.461
179
4.00
3.29
7.08
12.0
2.73
3.92
3.13
6.83
11.4
2.71
3.84
3.02
6.63
10.8

2

0.718
2.39
3.49
4.46
3.83
9.93
0.714
2.34
3.40
432
6.66
934
0.709
2.49
. 332
4.18
339
8.77
0.701
239
3.13
3.93
4.98
7.77
233
3.07
3.80
4.79
732
230
3.00
3.69
4.61
6.91

3

0.816
238
3.10
3.86
4.94
8.10
0.812
233
3.01
3.72
4.72
7.33
0.807
2.28
252
3.39
4.31
7.03
0.798
2.18
2.76
334
4.13
6.17
2.13
2.68
333
3.93
3.78
2.08
Z60
3.12
3.78
3.42

4

0.868
233
2.87
3.31
4.43
7.10
0.863
2.19
2.78
338
432
6.39
0.838
2.14
2.69
333
4.02
6.12
0.849
2.04
2.33
3.01
3.63
331
1.99
2.43
2.89
3.48
4.93
1.94
237
2.79
332
4.62

3

0500
2.16
2.71
3.29
4.10
6.46
0.893
2.10
2.62
3.13
3.90
3.98
0.890
2.03
2.33
3.03
3.70
3.33
0.880
153
237
2.79
334
4.76
150
2.29
2.67
3.17
4.42
1.83
2.21
2.37
3.02
4.10

6

0.922
2.09
2.60
3.13
3.87
6.02
0.917
2.04
2.31
2.99
3.67
3.33
0512
158
2.42
2.87
3.47
3.12
0.901
1.37
2.23
2.63
3.12
437
1.82
2.18
2.32
2.96
4.04
1.77
2.10
22.41
2.80
3.74

7

0.938
2.04
2.31
3.01
3.70>
3.69
0.932
1.98
2.42
2.87
3.30
3.23
0.927
1.93
233
2.73
330
4.82
0.917
1.82
2.17
2.31
2.93
4.09
1.77
2.09
239
2.79
3.77
1.72
2.01
239
2.64
3.47

8

0.930
2.00
2.43
2.91
3.36
3.44
0.944
1.94
236
2.78
336
4.99
0.939
i'.ss'
237
2.63
3.17
4.38
0528
1.77
2.10
2.41
2.82
3.86
1.72
2.02
230
2.66
3.33
1.67
1.94
2.19
131
3.27

9

0.939
1.96
2.39
2.84
3.46
3.24
0.933
1.91
230
2.70
336
4.80
0.948
1.83
231
2.37
3.07
439
0.937
1.74
2.04
233
2.72
3.69
1.68
1.96
232
2.36
338
1.63
1.88
2.11
2.41
3.10


10

0.965
1.94
233
2.77
337
3.08
0.961
1.88
233
2.64
3.17
4.64
0533
1.82
2.16
2.31
2.98
4.24
0.943
1.71
159
237
2.63
3.34
1.63
151
2.16
2.47
3.24
1.60
1.83
2.03
232
2.96

12

0.977
1.89
2.28
2.68
3.23
4.82
0.972
1.83
2.18
2.34
3.03
439
0566
1.77
2.09
2.41
- 2.84
4.00
0536
1.66
1.92
117
2.30
332
1.60
1.83
2.03
234
3.02
1.33
1.73
1.94
2.18
2.74

13

0.989
1.84
2.20
2.37
3.09
4.36
0.983
1.78
2.11
2.44
2.89
4.14
0.978
1.72
2.01
231
2.70
3.73
0.967
1.60
1.84
106
233
3.08
1.33
1.73
153
119
178
1.49
1.67
'1.83
2.04
131

20

1.00
1.79
2.12
2.46
2.94
4.29
0.994
1.73
2.03
233
2.74
3.87
0589
1.62
153
230
.133
3.49
0.978
1.34
1.73
154
120
2.83
1.48
1.66
1.82
2.03
133
1.42
1.37
1.71
1.88
237

24

1.01
1.77
2.08
2.41
2.86
4.13
1.00
1.70
158
237
2.66
3.74
0.994
1.64
1.89
114
147
336
0.983
1.31
1.70
1.88
2.12
• 2.69
1.43
1.61
1.76
1.93
2.40
138
1.32
1.64
1.79
2.13

30

1.01
1.74
2.04
133
2.78
4.00
1.01
1.67
1.94
121
138
3.39
1.00
1.61
1.84
107
239
3.22
0.989
1.48
1.63
1.82
2.03
2.33
1.41
1.33
1.69
1.86
2.26
134
1.46
1.37
1.70
1.99

. 60

,1.02
'1.68
•1.93
2.22
161
3.70
1.02
1.61
1.84
2.08
2.40
3.29
1.01
1.34
1.74
1.94
2.21
2.92
1.00
1.40
1.33
1.67
1.84
2.23
132
1.43
1.33
1.66
1.93
1.24
132
139
1.47.
1.66

120

1.03
1.64
1.90
2.16
2.32
3.34
1.02
1.37
1.79
2.01
231
3.14
1.02
1.30
1.68
1.87
111
2.76
1.01
133
1.47
1.S8
1.73
2.08
1.26
133
1.43
1.33
1.77
1.17
132
137
132
1.43

CD

1.03
1.61
1.84
2.09
2.42
3.38
1.03
1.33
1.73
1.94
2.21
2.97
1.02
1.46
1.62
1.79
101
2.39
1.01
1.29
139
1.48
1.60
1.89
1.19
1.23
1.31
138
1.34
1.00
1.00
1.00
1.00
1.00
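
As with the chi-square table, percentiles of the F distribution can be computed directly when software is available. A minimal sketch in Python, assuming the scipy library, is:

    from scipy import stats

    # .95 percentile of the F distribution with 1 numerator and 1 denominator
    # degree of freedom (approximately 161.4).
    print(round(stats.f.ppf(0.95, dfn=1, dfd=1), 1))    # prints 161.4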
          TABLE A-10: VALUES OF THE PARAMETER λ̂ FOR COHEN'S ESTIMATES
                       ADJUSTING FOR NONDETECTED VALUES

   [Table of λ̂ indexed by h, the proportion of nondetected values (columns: .01 to .10
   in steps of .01, .15 to .70 in steps of .05, .80, and .90), and by γ (rows: .00 to
   1.00 in steps of .05).]
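
Cohen's estimates are the maximum likelihood estimates for a normal sample that is singly censored below the detection limit, so the tabled parameter can also be recovered numerically by maximizing the censored-normal likelihood and applying the relation used with this table, μ̂ = x̄_d - λ̂(x̄_d - DL), where x̄_d is the mean of the detected values. The following Python sketch assumes the scipy library is available; the function name cohens_lambda and its arguments are illustrative only and are not part of this guidance:

    import numpy as np
    from scipy import stats, optimize

    def cohens_lambda(detects, m, DL):
        # detects: measured values above the detection limit DL; m: number of nondetects.
        x = np.asarray(detects, dtype=float)
        def negloglik(p):
            mu, log_sigma = p
            sigma = np.exp(log_sigma)
            ll = stats.norm.logpdf(x, mu, sigma).sum()     # detected observations
            ll += m * stats.norm.logcdf(DL, mu, sigma)     # observations censored below DL
            return -ll
        mu_hat = optimize.minimize(negloglik, x0=[x.mean(), np.log(x.std() + 1e-6)]).x[0]
        # lambda-hat from the relation mu-hat = xbar_d - lambda-hat*(xbar_d - DL)
        return (x.mean() - mu_hat) / (x.mean() - DL)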
  TABLE A-11: PROBABILITIES FOR THE SMALL-SAMPLE MANN-KENDALL TEST FOR TREND

   (Each entry is the probability that the Mann-Kendall statistic S equals or exceeds
   the tabled value of S when no trend is present, for n data points.)

    S      n = 4     n = 5     n = 8      n = 9
    0      0.625     0.592     0.548      0.540
    2      0.375     0.408     0.452      0.460
    4      0.167     0.242     0.360      0.381
    6      0.042     0.117     0.274      0.306
    8                0.042     0.199      0.238
   10                0.0083    0.138      0.179
   12                          0.089      0.130
   14                          0.054      0.090
   16                          0.031      0.060
   18                          0.016      0.038
   20                          0.0071     0.022
   22                          0.0028     0.012
   24                          0.00087    0.0063
   26                          0.00019    0.0029
   28                          0.000025   0.0012
   30                                     0.00043
   32                                     0.00012
   34                                     0.000025
   36                                     0.0000028

    S      n = 6     n = 7     n = 10
    1      0.500     0.500     0.500
    3      0.360     0.386     0.431
    5      0.235     0.281     0.364
    7      0.136     0.191     0.300
    9      0.068     0.119     0.242
   11      0.028     0.068     0.190
   13      0.0083    0.035     0.146
   15      0.0014    0.015     0.108
   17                0.0054    0.078
   19                0.0014    0.054
   21                0.00020   0.036
   23                          0.023
   25                          0.014
   27                          0.0083
   29                          0.0046
   31                          0.0023
   33                          0.0011
   35                          0.00047
   37                          0.00018
   39                          0.000058
   41                          0.000015
   43                          0.0000028
   45                          0.00000028
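
Because every ordering of n data points is equally likely when no trend is present, the exact probabilities in Table A-11 can be reproduced by enumerating all n! orderings and tallying the statistic S; enumeration is practical only for the small n covered by this table. A minimal sketch in Python (the function names are illustrative only):

    from itertools import permutations
    from math import factorial

    def mann_kendall_S(x):
        # S = (number of increasing pairs) - (number of decreasing pairs)
        return sum((xj > xi) - (xj < xi) for i, xi in enumerate(x) for xj in x[i + 1:])

    def exact_upper_tail(n, s):
        # P(S >= s) when no trend is present
        count = sum(1 for perm in permutations(range(n)) if mann_kendall_S(perm) >= s)
        return count / factorial(n)

    print(round(exact_upper_tail(4, 4), 3))    # prints 0.167, the n = 4, S = 4 entry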