The 11th EPA Conference on Statistics

February 27, 1995 - March 2, 1995


George Washington Inn and Conference Center
   Williamsburg, Virginia

-------
Welcome to the Eleventh Annual EPA Conference on Statistics
     We are delighted to welcome you to Williamsburg for the 11th
Annual EPA Conference on Statistics.  This conference has evolved
considerably from its origin one summer week in Raleigh.  The
conference now covers a wider scope of information and data
topics compared with the "purer statistic" topics of years of
old.  This year we have added some new features.  We are having a
morning of collaborative research reporting to learn the progress
of several universities working under EPA grants.  We are also
initiating an invited session to honor some success stories among
EPA statisticians.  You'll also note the addition of an informal
session to discuss the ins and outs of various statistical
software packages.  These new items along with our regular
sessions, featured speakers, tutorials, and poster session should
provide something for everyone.  I want to thank both the
Conference Planning Committee and our Arrangements Committee for
their hard work in planning and organizing such a conference.
I am sure we will uncover a few t's that haven't been crossed,
but at least we dotted the i's.  Do let us know how we can
improve any aspect of the conference.

                    Barry D. Nussbaum
                    Conference Chairman
                1995 Conference Planning Committee
Jacquelyn Ager
Ruth Allen
Barbara Parzygnat
Susan Brunenmeister
Rick Cothern
Barnes Johnson
Henry Kahn
Mel Hollander
Elizabeth Margosches
John Warren
Pat Wilkinson

                1995 Arrangements Committee
Joan Bundy
Margaret Conomos
Nicole Cortina
Pepi Lacayo
Patricia Little
Pat Wilkinson

-------
Appeared in Amstat News, January 1995, Number 216, pp. 33-34.
Section on Statistics and the
Environment

We Meet, We Discuss, We Move Forward

         Barry D. Nussbaum
        Program Chair, 1995
U.S. Environmental Protection Agency
   Happy New Year! While the year is new and young to you, many of us
have already been planning events to take place in 1995. This includes
participation in many forums, colloquia, and meetings. After all, the
marriage of statistics and the environment is not just confined to the
American Statistical Association. Using the discipline of statistics
toward solving environmental problems is indeed an "in" topic for many
conferences. The section is active in promoting the topic at every
opportunity. Frequently we suggest topics and provide speakers and
support to other organizations in a never-ending effort to pass the
word around. The more we meet and discuss, the more statistics becomes
embedded in the quest for environmental solutions.
   Following is a sampler of environmental statistics conferences and
sessions within other conferences for 1995. In many cases, contacts are
given so you can volunteer for sessions, papers, workshops, etc.
Obviously, the success of these meetings depends on our individual
involvement. In general, I've listed conferences that are not in Amstat
News. Add these to the list, and you'd never have a day in the office.
   First, a word from our sponsor. As program chair, I get to advertise
our own conference first! This is not unbiased, but rather a biased
estimate: Barry's Biased Blandishment. We are still
seeking creative ideas for papers, special contributed sessions,
roundtables, and workshops for the annual Joint Statistical Meetings in
Orlando. This will occur August 13-17 at the Walt Disney World Dolphin
and Swan, two of the better creatures for which we thank the Section on
Statistics and the Environment. You can reach me with your suggestions
at (202) 260-1493, or by fax at (202) 260-4968. E-mail also works:
NUSSBAUM.BARRY@EPAMAIL.EPA.GOV. Sorry, no collect calls and no 800
number, but our operators are standing by 24 hours a day. We look
forward to seeing you in sunny Florida.
   And now for the sampler on what 1995 has to offer for Statistics and
the Environment.
   Depending on ASA's mailing schedule, this either will be a good
conference or was a good conference. ASA's Winter meeting is in
Research Triangle Park, NC on January 6-8. Larry Cox (919-541-2648) of
EPA is organizing a session on Statistics in Environmental Science. So
either look for the proceedings or get to the nearest airport and
attend!
   On March 26-29, our sister organization, ENAR, will host its spring
meeting in Birmingham, Alabama. Two invited sessions have been arranged
by our section. Charles Davis (702.4->h-K094) is organizing "Regulatory
Statistics for Environmental Contamination," and Bimal Sinha
(410-193*2147) has put together "Statistical Methods for ... Analysis."
   Washington, D.C. will be the site for the May 17-19 meeting of the
Waste Policy Institute and Air and Waste Management Association
Conference "Chal... Then quickly travel to Merida, Mexico, for the
SPRUCE III conference on December 11-13. The keynote paper will be
delivered by both a former section chairman (Phil Ross) and a future
one (Larry Cox). For this one, you may want to spruce up on your
Spanish.
   That's only a sample; the universe is much larger. But you may
certainly infer from this sample that the section is active and the
topics are very current. We hope to see you and have you participate
at several of these important meetings in 1995. Make this a resolution
you will keep!
   ...ment is a planned session on statistical
aspects of testing for compliance with ...

-------
AGENDA

-------
                         11th Annual EPA Conference on Statistics
                              AGENDA
                       MONDAY, FEBRUARY 27
3:30-5:30      Registration and Check-in            Mt. Vernon-C

5:00-6:00      Statistical Software and the Single  Mt. Vernon-B
               Statistician

               Informal Discussion Group on
               Statistical Software

               Elizabeth Margosches and Susan
               Brunenmeister
                       TUESDAY,  FEBRUARY 28

8:30-8:40      Welcoming Remarks                    Mt. Vernon-A
               Barry Nussbaum


8:40-8:45      Introduction of Speakers
               Phil Ross

8:45-9:30      Keynote Address
               Lynn Goldman, Assistant
               Administrator for the Office of
               Prevention, Pesticides and Toxic
               Substances
9:30-10:15     Featured Speaker
               William F.  Raub, Science Advisor to
               the Administrator - Director,
               National Center for Extramural
               Research and Quality Assurance
10:15-10:30    Break

-------
10:30-11:45    Statistical Quality Assurance: Data   Mt. Vernon-A
               Quality Assessment
               John Warren
               Tom Dixon

               Environmental Monitoring: New         Wakefield
               Answers to Old Questions and
               Spatial Sampling
               George Flatman
               Evan Englund
11:45-1:15     Lunch

1:15-2:30      Tutorial: Survival Analysis           Mt. Vernon-B
               Lawrence Leemis, College of William
               and Mary

               Statistical Methods for Combining     Wakefield
               Environmental Information and
               Environmental Research at NISS
               Larry Cox and Jerry Sacks
2:30-2:45      Break
2:45-4:00      Tutorial: Publishing on the           Mt. Vernon-B
               Internet
               Chap Gleason

               Emerging Issues in Environmental      Wakefield
               Statistics-I
               Ruth Allen, Organizer

               Pesticides in the Diets of Infants
               and Children: Exposure and Risk
               Estimation Using Monte Carlo
               Simulation
               John Peter Wargo, Yale University

4:00-4:15      Break

-------

4:15-5:30      Statistical Policy Advisory           Mt. Vernon-B
               Committee
                    WEDNESDAY, MARCH 1
8:45-10:15     Collaborative Research I              Mt. Vernon-A1
               Dan Carr, George Mason University
               G.P. Patil, Penn State University

               Science and Information Management:   Wakefield
               Present and Future Perspectives
               Joe Abe and Nathan Wilkes
10:15-10:30    Break

10:30-11:45    Collaborative Research II             Mt. Vernon-A1
               Gina Papush, UMBC
               Jim Lee, Nancy Flournoy, David
               Crosby, American University

               Strategic Directions in Information   Wakefield
               Resources Management at EPA
               Mark Day

               New Sources of Environmental Data:
               Testing the Latest Aerial and
               Satellite Sensing at the Field of
               Dreams
               Liz Porter

11:45-1:30     Lunch
1:30-3:00      Geographic Visualization of           Mt. Vernon-A1
               Environmental Quality

-------
               A Case Study of Surface Water
               Conditions in the US/Mexico Border
               Area
               Judy Calem,  Nicole Cortina,  Joan
               Crawford,  Doug Freeman,  Avi
               Goldscheider, and Ron Shafer
               Lewis Summers and T. Nigo, Martin
               Marietta

               Building Environmental Data
               Management and Analytical
               Capabilities in the Great Lakes
               Region and the Baltic Republics
               Steve Goranson

1:30-3:00      Environmental Statistics in the       Wakefield
               Water Office - I
               Henry D. Kahn, Organizer

  1:30         Alternative Models for Analysis of
               Composite Environmental Samples
               Henry D. Kahn, George W. Zipf, and
               Alan Unger

  2:00         Estimates of Fish Consumption Rates
               in the United States
               Helen Jacobs, Henry D. Kahn, and
               Kathleen Stralka

  2:30         Benchmark Dose in an Acute
               Toxicity Study
               Mary A. Marion

3:00-3:15      Break

  3:15         Water Quality Based Effluent
               Limitations and the Statistical
               Properties of Low Concentration
               Measurements in Analytical
               Chemistry
               Chuck White and Henry D. Kahn

-------
3:15-5:00      Statistical Analysis of Risk and      Mt. Vernon-B
               Performance Results
               Bimal Sinha, UMBC, Organizer
               Jon Helton, Sandia National
               Laboratories
               Tim Margulies, EPA
               Ina Alterman, National Research
               Council, Discussant

               Using Relative Data Quality           Mt. Vernon-A2
               Indicators of Precision and Bias
               Don Miller, EPA, Region VII

               Environmental Statistics in the       Wakefield
               Water Office - II (Continued from
               earlier session)

               Long Island Breast Cancer Study       Mt. Vernon-A1
               Project: Environmental Statistics
               Research Issues
               G. Iris Obrams, M.D., Ph.D.
               Director, Long Island Breast Cancer
               Program; Chief, Extramural Programs
               Branch, National Cancer Institute
5:00-6:00      Poster Session                        Mt. Vernon-C
               Barbara Parzygnat, Organizer
               Terence Fitz-Simons, Co-Organizer
               David Mintz
               Jerry Akland
               Mary Marion
               David Crosby
               Nancy Flournoy
               Jim Lee
               Don Miller
               Lewis Summers

-------


                        THURSDAY, MARCH 2
8:30-9:45      Tutorial: Time Series Basics          Mt. Vernon-A
               Pepi Lacayo

               Using Epidemiological Data to         Wakefield
               Examine Statistical Models
               Elizabeth Margosches, Organizer
               Assessment of USEPA IEUBK Model
               Prediction of Elevated Blood Levels
               Karen Hogan

               Human Experiences for Judging
               Predictions From Animal Cancer
               Models
               Cheryl Siegel Scott


9:45-10:00     Break

10:00-12:15    Featured Award Winning                Mt. Vernon-A
               Presentations - Acknowledgements
               to EPA Statisticians
               Mel Hollander, Organizer

               Environmental Tobacco Smoke
               Statistics: Industry vs. EPA From
               an EPA Point of View
               Steven P.  Bayard
               Jennifer Jinot

               A Book is Born
               Wayne Ott

-------
REGISTRANTS

-------
                        REGISTRANTS

           The Eleventh Annual EPA Conference on Statistics
             The George Washington Inn and Conference Center
            Williamsburg, Virginia - February 27 - March 2, 1995
Jacquelyn J. Ager
OPPE
Phone:  (202) 260-5971
Fax:    (202) 260-4968

Gerald Akland
ORO
Phone:  (919) 541-4885
Fax:    (919) 541-1496

Derry Allen
OPPE
Phone:  (202) 260-4028
Fax:    (202) 260-0275
Ruth Allen
NCI
Phone: (301) 496-9600
Fax:   (301) 402-4279

Ina Alterman
NRC
Phone: (202) 334-2748
Fax:   (202) 334-3077

Roch Baamonde
Region 2
Phone: (212) 264-3052
Fax:   (212) 264-9695
Steven P. Bayard
ORD
Phone: (202) 260-5722
Fax:   (202) 260-3803

Jeff Beaubier
OPPTS
Phone: (202) 260-2263
Fax:   (202) 260-1279

Dorothy Bertino
ERL
Phone: (405) 436-8681
Fax:   (405) 436-8529
Martin Brossman
OW
Phone: (202) 260-7023
Fax:   (202) 260-1977

Jim Brown
OSW
Phone: (703) 308-8656
Fax:   (703) 308-8609

Susan Brunenmeister
OA
Phone: (202) 260-0246
Fax:   (202) 260-0200

Lori Brunsman
OW
Phone: (703) 305-5453
Fax:   (703) 308-2902

Joan Bundy
OPPE
Phone: (202) 260-2680
Fax:   (202) 260-4968

Richard T. Burnett
Health Canada
Phone: (613) 957-1877
Fax:   (613) 957-4546

Judy Calem
OPPE
Phone: (202) 260-3638
Fax:   (202) 260-4968

Dan Carr
George Mason Univ.
Phone: (703) 993-1671
Fax:   (703) 993-1700

Margaret Conomos
OPPE
Phone: (202) 260-3958
Fax:   (202) 260-4968

-------
 Nicole Cortina
 OPPE
 Phone:  (202)  260-0998
 Fax:    (202)  260-4968

 Rick Cothern
 OPPE
 Phone:  (202)  208-4376
 Fax:    (202)  208-4867

 Larry Cox
 AREAL
 Phone:  (919)  541-2648
 Fax:    (919)  541-7588

 John P. Creason
 EPA
 Phone:  (919)  541-2598
 Fax:    (919)  541-5394

 David S. Crosby
 American University
 Phone:  (202)  885-3127
 Fax:    (202)  885-3155

 J. Michael Davis
 ORD/OHEA
 Phone:  (919)  541-4162
 Fax:    (919)  541-0245

 Mark Day
 OARM
 Phone:  (202)  260-8672
 Fax:    (202)  260-3923

 Kim Devonald
 OPPE
 Phone:  (202)  260-4904
 Fax:    (202)  260-4903

 Thomas E. Dixon
 ORD-NCERQA
 Phone:  (202)  260-5780
 Fax:    (202)  260-4346

 Donald L. Doerfler
 HERL
 Phone:  (919)  541-7741
 Fax:    (919)  541-5394

 Evan Englund
ORD/EMSL-LV
Phone:  (702)  798-2248
Fax:    (702)  798-2107
Gloria C. Feeney
OPPTS
Phone: (703) 305-7436
Fax:   (703) 305-6309

Bernice Fisher
OPPTS
Phone: (703) 305-5959
Fax:   (703) 305-5453

Terence Fitz-Simons
OAQPS
Phone: (919) 541-0889
Fax:   (919) 541-1903

George T. Flatman
ORD/EMSL-LV
Phone: (702) 798-2528
Fax:   (702) 798-2208

Nancy Flournoy
American University
Phone:  (202) 885-3127
Fax:   (202) 885-3155

Douglas R. Freeman
OPPE
Phone: (202) 260-3378
Fax:   (202) 260-4968

Michael A. Gansecki
Region 8
Phone: (303) 293-1510
Fax:   (303) 293-1724

William V. Garetz
OPPE
Phone: (202) 260-2685
Fax:   (202) 260-4968

Jill Gendelman
OPPTS
Phone: (202) 260-0288
Fax:   (202) 260-1279

Chap Gleason
OPPE
Phone: (202) 260-9006
Fax:   (202) 260-4968

Lynn Goldman
Asst Admin. OPPTS
Phone: (202) 260-2902
Fax:   (202) 260-1577

-------
Avi Goldscheider
OPPE
Phone:  (202) 260-5136
Fax:    (202) 260-4968

Alan R. Goozner
OPPTS
Phone:  (703) 308-8147
Fax:    (703) 308-8151

Stephen Goranson
EPA, Region 5
Phone:  (312) 886-3445
Fax:    (312) 886-1515

Wilson L. Haynes
Region 4
Phone:  (404) 347-3555
Fax:    (404) 347-2130

Jon Helton
Ariz. State Univ.
Phone:  (505) 848-0693
Fax:    (505) 848-0705

Helen Hinton
EPA
Phone:  (919) 541-4618
Fax:    (919) 541-1903

Karen Hogan
OPPTS
Phone:  (202) 260-3895
Fax:    (202) 260-1279

John W. Holley
OAR
Phone:  (202) 233-9305
Fax:    (202) 233-9557

Helen Jacobs
OW
Phone:  (202) 260-5412
Fax:    (202) 260-7185

Jennifer Jinot
ORD
Phone:  (202) 260-8913
Fax:    (202) 260-3803

Henry Kahn
OW
Phone:  (202) 260-9408
Fax:    (202) 260-7185
Ela Kinowska
Environment Canada
Phone: (819) 953-8948
Fax:   (819) 953-9542

Art Koines
OPPE
Phone: (202) 260-4030
Fax:   (202) 260-0275

Mel Kollander
Temple University
Phone: (202) 973-2820
Fax:   (202) 293-3083

Herbert Lacayo
OPPE
Phone: (202) 260-2714
Fax:   (202) 260-4968

Jim Lee
American University
Phone: (202) 885-1691
Fax:   (202) 885-2494

Lawrence Leemis
Wm & Mary
Phone: (804) 221-2034
Fax:   (804) 221-2988

Eleanor Leonard
OPPE
Phone: (202) 260-9753
Fax:   (202) 260-4968

Patricia Little
OPPE
Phone: (202) 260-2679
Fax:   (202) 260-4968

Arthur Lubin
EPA
Phone: (312) 886-6226
Fax:   (312) 303-4342

Jesse Mabellos
HERL
Phone: (919) 541-3743
Fax:   (919) 541-5394

Elizabeth H. Margosches
OPPTS
Phone: (202) 260-1511
Fax:   (202) 260-1279

-------
 Tim Margulies
 OAR
 Phone:  (202)  233-9774
 Fax:    (202)  233-0981

 Mary A.  Marion
 OPPE
 Phone:  (703)  308-2854
 Fax:    (703)  308-5453

 Don Miller
 LABO/ENSV
 Phone:  (913)  551-5156
 Fax:    (913)  551-5218

 David Mintz
 OAQPS
 Phone:  (919)  541-5224
 Fax:    (919)  541-1903

 William  L. Monson
 Region 8
 Phone:  (303)  293-0981
 Fax:    (303)  293-1647

 William  C. Nelson
 ORD/AREAL
 Phone: (919)  541-3184
 Fax:    (919)  541-1486

 Barry Nussbaum
 OPPE
 Phone: (202)  260-1493
 Fax:   (202)  260-4968

 G.  Iris Obrams
 NCI
 Phone: (301)  496-9600
 Fax:   (301)  402-4279

Wayne Ott
AREAL
Phone: (919)  541-3184
Fax:   (919)  541-7588

Gina Papush
Univ. Maryland
Phone: (410)  455-3785
Fax:   (410)  455-1066

Barbara Parzygnat
OAQPS
Phone: (919)  541-5474
Fax:   (919)  541-1903
 Ganapati  P.  Patil
 Penn  State Univ.
 Phone:  (814)  865-9442
 Fax:    (814)  865-7114

 Hugh  M. Pettigrew
 OPPTS
 Phone:  (703)  305-5699
 Fax:    (703)  305-5147

 Elizabeth Porter
 OPPE
 Phone:  (202)  260-6129
 Fax:    (202)  260-4903

 William F. Raub
 OA
 Phone:  (202)  260-0486
 Fax:    (202)  260-3682

 Erika Ronca
 OAR/ORIA
 Phone:  (202)  233-9724
 Fax:    (202)  233-9555

 N. Phillip Ross
 OPPE
 Phone:  (202)  260-2680
 Fax:    (202)  260-8550

 Jerry Sacks
 HISS
 Phone:  (919)  541-6255
 Fax:    (919)  541-7102

 Judith B. Schnid
 HERL
 Phone:  (919)  541-0486
 Fax:    (919)  541-5394

 Cheryl Scott
 ORD
 Phone: (202)  260-5720
 Fax:   (202)  260-3803

 Denise Settles
 ORIA
 Phone: (202)  233-9704
 Fax:   (202)  233-9650

R. Woodrow Setzer
HERL
Phone: (919)  541-0128
Fax:   (919)  541-5394

-------
Ronald W. Shafer
OPPE
Phone: (202) 260-6966
Fax:   (202) 260-4968

Bimal Sinha
Univ. Maryland
Phone: (410) 455-2412
Fax:   (410) 455-1066

William P. Smith
OPPE
Phone: (202) 260-2697
Fax:   (202) 260-4968

Chris Solloway
OPPE
Phone: (202) 260-3008
Fax:   (202) 260-4968

Steve Stodola
Region I OQA
Phone: (617) 860-4634
Fax:   (617) 860-4397

James L.  Sutton
HERL
Phone: (919) 541-7610
Fax:   (919) 541-5394

Lewis Summers
Martin Marietta
Phone: (202) 260-9710
Fax:   (202) 260-4968

John Peter Wargo
Yale Univ.
Phone: (203) 432-5100
Fax:   (203) 432-5942

John Warren
ORD-NCERQA
Phone: (202) 260-9464
Fax:   (202) 260-4346

Chuck White
OW
Phone: (202) 260-5411
Fax:   (202) 260-7185

Nathan Wilkes
OPPE
Phone: (202) 260-4910
Fax:   (202) 260-4903
Denise Zvanovec
Region 2
Phone: (212)  264-3052
Fax:   (212)  264-9695

-------
ABSTRACTS

-------
                             ABSTRACT

      STATISTICAL QUALITY ASSURANCE:  DATA QUALITY ASSESSMENT

                  John Warren & Thomas E. Dixon
               Office of Research and Development


     Data Quality Assessment (DQA) is the scientific and
statistical evaluation of data to determine whether the data are of
the right type, quality, and quantity to support their intended use.
DQA is the conclusion of the Agency's recommended approach for data
collection: Planning (Data Quality Objectives [DQO]),
Implementation (Quality Assurance Project Plans [QAPP]), and
Assessment (DQA). It is possibly the hardest part for
non-statisticians to apply. Guidance (Data Quality Assessment, G-9)
is being developed that will assist non-statisticians in
investigating some of the statistical assumptions underlying any
data collection activity.

     Similar  to  the established  DQO  Process,  the DQA  Process
consists of iterative steps to investigate data:

               • Review DQOs and Sampling Design
               • Conduct Preliminary Data Review
               • Select the Statistical Test
               • Verify the Assumptions
               • Perform the Statistical Test

     Some of the steps require only elementary knowledge of
statistics; others can require quite extensive statistical
expertise. The Quality Assurance Management Staff offers the G-9
guidance as a tool for non-statisticians to complement the
guidance documents for DQOs and QAPPs. The guidance is not intended
to be a comprehensive handbook on statistical quality assurance, but
rather a primer that enables analysts and managers to interpret
statistical conclusions.

     The  presentation outlines the Agency's position on  data
collection activities, gives  an overview of the  contents  of G-9,
and outlines the direction of future work.
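As a rough illustration of steps 2 through 5 of the DQA process listed above, the sketch below reviews a data set, screens the normality assumption with a sample skewness check, and performs a one-sample t-test of whether the mean lies below an action level. It is a minimal sketch under assumptions not taken from G-9: the choice of test, the skewness cutoff, and the names `dqa_one_sample` and `action_level` are hypothetical, and the critical value is supplied by the analyst from a t table rather than computed.

```python
import math
import statistics


def preliminary_review(data):
    """Step 2 (preliminary data review): basic summary statistics."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "sd": statistics.stdev(data),
    }


def sample_skewness(data):
    """Moment-based sample skewness, used here only as a rough normality screen."""
    n = len(data)
    m = statistics.fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


def one_sample_t(data, action_level):
    """t statistic for H0: mean >= action_level versus H1: mean < action_level."""
    se = statistics.stdev(data) / math.sqrt(len(data))
    return (statistics.fmean(data) - action_level) / se


def dqa_one_sample(data, action_level, t_critical, skew_limit=1.0):
    """Hypothetical wrapper chaining steps 2-5 for one one-sided test.

    Step 3 (select the test) is fixed here as a one-sample t-test.
    t_critical is the one-sided critical value t(1 - alpha, n - 1) from a
    t table; skew_limit is an arbitrary illustrative cutoff.
    """
    review = preliminary_review(data)            # step 2
    skew = sample_skewness(data)                 # step 4: verify assumptions
    t = one_sample_t(data, action_level)         # step 5: perform the test
    return {
        "review": review,
        "skewness": skew,
        "normality_plausible": abs(skew) <= skew_limit,
        "t": t,
        "reject_h0": t < -t_critical,            # conclude mean < action level
    }
```

For example, `dqa_one_sample([2.0, 3.0, 4.0, 5.0, 6.0], action_level=10.0, t_critical=2.132)` uses the 95% one-sided critical value for 4 degrees of freedom; the resulting t of about -8.49 leads to rejecting the hypothesis that the mean is at or above the action level.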

-------
                    Spatial Sampling for Local Estimation:
                  One-phase, Two-phase, and N-phase Designs
                               Evan J. Englund
                      U.S. Environmental Protection Agency
In the absence of prior knowledge, there is no basis for selecting preferential sampling
locations; hence, an optimal one-phase design must necessarily involve a uniform
spatial distribution of sampling locations.  In two-phase sampling, estimates based  on
data from the first  phase are  used to concentrate the set of second phase samples  in
areas where they are most needed. In N-phase sampling each observation is a phase;
the estimates are updated after each  observation and the best location for the next
observation is selected. Algorithms for these three methods have been developed and
their relative performance is compared.
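The abstract does not spell out Englund's algorithms, so the following is only a sketch of the general contrast between one-phase and N-phase designs. It assumes a square study area, uses inverse-distance weighting (IDW) as a stand-in for whatever spatial estimator the talk uses, and scores candidate sites with a hypothetical "need" criterion that favors locations far from existing samples whose current estimate is near a decision threshold. The names `uniform_grid`, `next_location`, and `n_phase` are illustrative.

```python
import math


def uniform_grid(n, size=1.0):
    """One-phase design: an n x n grid of cell-centred locations on a square."""
    step = size / n
    return [((i + 0.5) * step, (j + 0.5) * step) for i in range(n) for j in range(n)]


def idw_estimate(point, samples, power=2.0):
    """Inverse-distance-weighted estimate at `point` from (location, value) pairs."""
    num = den = 0.0
    for loc, value in samples:
        d = math.dist(point, loc)
        if d == 0.0:
            return value  # exact hit on a sampled location
        w = d ** -power
        num += w * value
        den += w
    return num / den


def next_location(candidates, samples, threshold):
    """Choose the candidate with the largest hypothetical 'need' score:
    distance to the nearest sample, discounted when the current estimate
    is far from the decision threshold."""
    def need(p):
        d_min = min(math.dist(p, loc) for loc, _ in samples)
        return d_min / (1.0 + abs(idw_estimate(p, samples) - threshold))
    return max(candidates, key=need)


def n_phase(measure, candidates, start, threshold, budget):
    """N-phase design: measure a site, update estimates, pick the next site."""
    samples = [(p, measure(p)) for p in start]
    for _ in range(budget):
        taken = {loc for loc, _ in samples}
        remaining = [c for c in candidates if c not in taken]
        if not remaining:
            break
        p = next_location(remaining, samples, threshold)
        samples.append((p, measure(p)))
    return samples
```

For instance, `n_phase(measure, uniform_grid(10), uniform_grid(2), threshold=5.0, budget=5)` starts from a coarse 2 x 2 grid and adds five sites one at a time, where `measure` stands for a hypothetical field measurement function. A two-phase design would be the special case of choosing a whole batch of second-phase sites from the first-phase estimates at once.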

-------
    ENVIRONMENTAL MONITORING: NEW ANSWERS FOR OLD QUESTIONS


      Abstract: G.T. Flatman
      Often statistics is used and taught as if it were a dead language like Latin,
which first killed the Romans and is now killing the students. However, statistics is
alive and well and adding new methods and algorithms. In the last few years, spatial
statistics has rewritten the answers to the ubiquitous questions of (1) how to take
"representative" samples of assured quality, (2) how to optimize sampling design
(number of samples), and (3) how to make data analysis understandable to decision
makers. The cause of the change is "spatial correlation," which is a technical term for
the common-sense fact that environmental samples taken close together in space are
both apt to be high because they come from the same plume area, or both low
because they come from the same background area. Varying together is correlation.
This talk will summarize the meaning of: (1) the "correct sample" from Gy's Theory for
determining sample mass in heterogeneous media for Quality Assurance, (2) additional
sampling optimization rules: equal probability and equal spacing of samples, and (3)
data analysis to measure false positives, false negatives, and power for the decision
makers. Statistics did not kill the ancient statisticians, and it has the potential to make
the life of the environmental scientist and manager a lot easier and more productive
(accurate).
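Spatial correlation of the kind described above is commonly summarized with an empirical semivariogram: half the average squared difference between samples separated by roughly the same distance. The sketch below is a generic textbook estimator, not a method attributed to the talk; the lag binning (classes centred on multiples of `lag_width`) is an assumption. Small semivariance at short lags relative to long lags is the signature of nearby samples "varying together."

```python
import math
from collections import defaultdict


def empirical_semivariogram(locations, values, lag_width, max_lag):
    """Return {lag: gamma} with gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)).

    Pairs are assigned to lag classes centred on h = k * lag_width (k >= 1),
    i.e. a pair at distance d goes to class k = round(d / lag_width).
    """
    sq_sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(locations[i], locations[j])
            if d > max_lag:
                continue
            k = int(d / lag_width + 0.5)  # nearest lag class
            if k == 0:
                continue  # ignore (near-)coincident pairs
            sq_sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    return {k * lag_width: sq_sums[k] / (2 * counts[k]) for k in sorted(counts)}
```

On a toy transect with locations 0, 1, 2, 3 and values 1, 2, 3, 4, the estimator returns 0.5, 2.0, and 4.5 at lags 1, 2, and 3: values close together in space differ little, while distant values differ a lot.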

-------
Lawrence Leemis
Survival Analysis
Probabilistic models and statistical methods for the analysis of survival data are
presented. General analytic techniques based on the likelihood function are applied to
the exponential and Weibull distributions.  These techniques are illustrated by analyzing
several complete and right-censored data sets.
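For right-censored data, likelihood methods of the kind described above give a closed-form maximum likelihood estimate for the exponential failure rate (observed failures divided by total time on test) and reduce the Weibull fit to a one-dimensional equation in the shape parameter. The sketch below solves that equation by bisection. It is a standard textbook derivation, not the tutorial's own code, and the function names are illustrative; each observation is a positive time `t` with a flag `observed` (1 = failure seen, 0 = right-censored).

```python
import math


def exponential_mle(times, observed):
    """Rate MLE for right-censored exponential data: failures / total time on test."""
    return sum(observed) / sum(times)


def weibull_shape_score(k, times, observed):
    """Profile score for the Weibull shape k (zero at the MLE):
    sum(t^k ln t) / sum(t^k) - 1/k - mean(ln t over observed failures).
    Times are rescaled by their maximum, which leaves the equation unchanged
    but avoids overflow in t**k."""
    c = max(times)
    s = [t / c for t in times]
    w = [x ** k for x in s]
    weighted_log = sum(wi * math.log(x) for wi, x in zip(w, s)) / sum(w)
    fails = [math.log(x) for x, d in zip(s, observed) if d]
    return weighted_log - 1.0 / k - sum(fails) / len(fails)


def weibull_mle(times, observed, lo=1e-3, hi=100.0, iters=200):
    """Weibull (shape, scale) MLE for right-censored data via bisection on k.

    The score is increasing in k, so bisection brackets the unique root.
    """
    if weibull_shape_score(hi, times, observed) <= 0:
        raise ValueError("no root below hi; need at least two distinct failure times")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if weibull_shape_score(mid, times, observed) < 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    r = sum(observed)
    c = max(times)
    # Profile MLE of the scale: (sum(t^k) / r)^(1/k), computed on rescaled times.
    scale = c * (sum((t / c) ** k for t in times) / r) ** (1.0 / k)
    return k, scale
```

For the data set with times 2, 3, 5, 10 where the time 5 is censored, `exponential_mle` gives 3 failures / 20 time units = 0.15 failures per unit time.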

-------
           ENVIRONMENTAL RESEARCH AT NISS


                                Jerome Sacks
              National Institute of Statistical Sciences, PO Box 14162,
                       Research Triangle Park, NC 27709
Key Words:  Risk assessment, environmental monitoring, air pollution, meta-analysis
      Environmental research at NISS has centered on air pollution issues and on
approaches to combining studies in the risk assessment of exposure to toxic
substances. The research has featured collaborations among statisticians,
meteorologists, and toxicologists. One emphasis has been on estimating trends in
ozone adjusted for meteorology through the use of nonlinear and nonparametric
models. Data came from monitoring networks in Chicago, a Midwest rural region, and
the Gulf Coast, including Houston. Several studies have reported a link
between levels of airborne particulates and mortality. A NISS project examining this
question finds results that are inconsistent with previous findings, largely due to earlier
failures to take time of year into account. Approaches to assessing risk associated
with acute inhalation exposure to toxic chemicals form an additional project at NISS,
as does one on extrapolation methods for assessing risk from chronic exposures. Some
specifics of these projects will be reported, with a fuller description of the work on
particulates.
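The abstract mentions estimating ozone trends adjusted for meteorology. The NISS work used nonlinear and nonparametric models; as a much simpler stand-in, the sketch below fits an ordinary least squares model with an intercept, one meteorological covariate (temperature), and a time index, so the time coefficient estimates the trend after a linear adjustment for temperature. The variable names and the linear form are assumptions for illustration only; the normal equations are solved by Gaussian elimination so the block needs only the standard library.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(predictors, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + [float(v) for v in row] for row in predictors]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def adjusted_trend(ozone, temperature, year):
    """Per-year change in ozone after a linear adjustment for temperature."""
    return ols([[t, yr] for t, yr in zip(temperature, year)], ozone)[2]
```

Including temperature in the model keeps a run of unusually hot (or cool) years from being mistaken for a trend in ozone itself.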

-------
              STATISTICS AND THE ENVIRONMENTAL SCIENCES
          STATISTICAL METHODS FOR COMBINING ENVIRONMENTAL
                              INFORMATION

                    Lawrence H. Cox, Walter W. Piegorsch
Lawrence H. Cox, US EPA, MD-75, Research Triangle Park, NC 27711
Key Words: Combining Information, Data Aggregation, Meta-analysis
      An important concern in environmental studies is the need to combine
information from diverse sources that relate to a common endpoint or effect and to
combine environmental monitoring and assessment data.  Statistical techniques are
integral to analyses that combine environmental monitoring and assessment data.
These techniques are still under development, however, as modern statistical
methodologies for combining information usually require subject-specific formulations.
Herein, we discuss recent developments and opportunities for statistical research in
combining environmental information.
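One of the simplest ways to combine information on a common effect is a fixed-effect, inverse-variance weighted average, with Cochran's Q as a check on whether the sources disagree more than chance would suggest. The sketch below illustrates only that classical technique; it is not presented as the authors' method, and it assumes each source supplies an effect estimate together with its variance.

```python
import math


def fixed_effect_pool(estimates, variances):
    """Inverse-variance weighted mean, its standard error, and Cochran's Q.

    Under homogeneity, Q is approximately chi-square with len(estimates) - 1
    degrees of freedom; a large Q suggests the sources do not share one effect.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    return pooled, se, q
```

With two equally precise estimates of 1.0 and 2.0 (variance 1.0 each), the pooled estimate is 1.5 with standard error about 0.71, and Q is 0.5.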

-------
     What TRI Releases Are in My Zip? Via Internet E-mail

                                By

                         Chapman Gleason

Abstract

     This paper describes an intuitive, cost-effective Internet
e-mail interface to a major EPA publicly released database, "The
Toxics Release Inventory." The beauty of this interface is that it
returns to the requestor (anyone with an Internet mailbox) a report
telling them the name of the company, the chemicals, and the amount
of TRI releases by year in their zip code. Since the American public
knows their zip codes and/or neighboring zip codes, but not their
latitude/longitude, this interface allows the public to access data
via a simple "place-based" interface. This pilot interface is an
example of the EPA Administrator's five-year strategic plan of making
data more available electronically and empowering citizens via a
place-based (zip code) initiative for ecosystem protection and
environmental justice. To use this pilot system, send an Internet
mail message to tris@ipcl.was.epa.gov and in the body of the message
type your 5-digit zip code. The system returns a report to your
Internet mail account.
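The abstract describes the service only from the user's side. The sketch below guesses at the server-side logic in the simplest terms: pull a 5-digit zip code from the message body and summarize release records for that zip by company, chemical, and year. The record layout, the function names, and the sample values are all hypothetical; the source does not describe how the actual EPA system was implemented.

```python
import re
from collections import defaultdict

ZIP_RE = re.compile(r"\b(\d{5})\b")


def parse_zip(body):
    """Return the first 5-digit zip code found in an e-mail body, or None."""
    match = ZIP_RE.search(body)
    return match.group(1) if match else None


def release_report(records, zip_code):
    """Summarize releases for one zip code.

    records: iterable of dicts with hypothetical keys
    'zip', 'company', 'chemical', 'year', 'pounds'.
    Returns report lines sorted by company, chemical, and year.
    """
    totals = defaultdict(float)
    for rec in records:
        if rec["zip"] == zip_code:
            totals[(rec["company"], rec["chemical"], rec["year"])] += rec["pounds"]
    return [
        f"{company} | {chemical} | {year} | {pounds:,.0f} lb"
        for (company, chemical, year), pounds in sorted(totals.items())
    ]
```

A mail handler would call `parse_zip` on the incoming message, reply with an error note when it returns `None`, and otherwise mail back the lines from `release_report`.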

-------
                     Childhood Exposure to Complex Mixtures of Pesticides

                         John Wargo, Ph.D., and Richard Jackson, M.D.

                                        ABSTRACT

       Nearly 325 pesticides and 1500 inert ingredients, along with their metabolites, are permitted
by the U.S. federal government to exist as residues in the nation's food supply. The U.S.
Environmental Protection Agency (EPA) judges the health risks of single pesticide residues in food
and drinking water, rather than the mixture of pesticide residues likely to appear in the human diet.
This one-at-a-time approach to regulation may overlook toxic effects
from complex mixtures of pesticides. In this paper, we present a method for estimating
exposure to numerous pesticides, in numerous foods, that share the common toxicological effect of
inhibiting the enzyme cholinesterase (ChE). We designed a probabilistic computer model to simulate
exposure across 5 pesticides permitted as residues on 11 foods. Using actual food intake data for
2-year-olds and residue data collected by FDA, we simulated person-day exposures to the complex
mixtures. The method appears to provide a reasonable approach to estimating exposure across
compounds.
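The probabilistic model described above simulates person-day exposures to several cholinesterase-inhibiting pesticides at once. A minimal Monte Carlo sketch of that idea follows. It assumes, purely for illustration, that exposures to different pesticides can be combined by scaling each residue with a relative potency factor and summing, and that food intake, residue detection, residue level, and body weight are each drawn from user-supplied samplers; none of these distributions or names come from the paper.

```python
import random


def simulate_person_days(n_days, foods, potency, body_weight, rng):
    """Monte Carlo person-day exposures to a mixture of pesticides.

    foods: {food: (intake_sampler, [(pesticide, detect_prob, residue_sampler), ...])}
      intake_sampler(rng)  -> grams of the food eaten that day
      residue_sampler(rng) -> residue in ug/g (ppm) when detected
    potency: {pesticide: hypothetical relative potency factor vs. an index compound}
    body_weight(rng) -> body weight in kg
    Returns a list of potency-weighted exposures in ug/kg/day.
    """
    results = []
    for _ in range(n_days):
        bw = body_weight(rng)
        total = 0.0
        for intake_sampler, residues in foods.values():
            grams = intake_sampler(rng)
            for pesticide, detect_prob, residue_sampler in residues:
                if rng.random() < detect_prob:  # is this residue present today?
                    total += grams * residue_sampler(rng) * potency[pesticide]
        results.append(total / bw)
    return results
```

Passing a seeded `random.Random` makes runs reproducible; upper percentiles of the returned list (for example via `statistics.quantiles`) would then summarize the simulated distribution of combined exposure.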

-------
The Long Island Breast Cancer Study Project: Environmental Statistics Research
Issues

Ruth H. Allen, Ph.D., M.P.H., and G. Iris Obrams, M.D., Ph.D.

National Cancer Institute, Division of Cancer Etiology, Extramural Programs
Branch

Breast cancer statistics point to important opportunities for prevention. For
the last several decades, despite rapid technological advances in detection and
treatment of breast cancer, mortality rates have remained relatively unchanged, and breast
cancer incidence rates have more than doubled since the 1950s. Higher than
average breast cancer incidence rates of over 113 per 100,000 women, and patterns
of increased breast cancer in younger women on Long Island, recently received
increased public and congressional attention. Legislation passed by the U.S.
Congress in June 1993 mandated an intensive study of the etiology of breast
cancer. The study is required to use a geographic information system approach
to integrate a wide range of environmental and health statistics. This
presentation examines the emerging environmental statistics issues for the Long
Island case, including data confidentiality, accuracy and precision of exposure
and dose reconstruction for estimation of past pesticide exposure, and
environmental statistics validation for modeling.
Dr. Allen is on detail from EPA, Office of Pesticide Programs, Health Effects
Division, to the National Cancer Institute. Dr. Obrams is Chief, Extramural
Programs Branch, and Director, Long Island Breast Cancer Study Project.

-------
                     Visual Representation of Statistical Summaries

                                          By

                                     Daniel B. Carr
                           Center For Computational Statistics
                                George Mason University
                                   Fairfax, VA 22030
Abstract
This talk addresses the redesign of row-labeled plots and cumulative distribution plots. Both types
of plots are familiar. For example, row-labeled plots include labeled dot plots, bar plots, and
horizontally oriented distributional summary plots such as box plots. The redesign goal is to show
statistical summaries more effectively than traditional business graphics and to facilitate conversion
of statistical summary tables into plots. The proposed designs use perceptual grouping, sorting, and
layering of information to simplify the appearance of the graphics while incorporating more
information. The resulting row-labeled plots provide templates for re-expressing numerous
one-factor, two-factor and three-factor tables. Color linking between row-labeled plots and maps
provides a convenient way to show statistical summaries of spatial information. The new cumulative
distribution plots can also be used with maps. Color linking allows these distribution plots to serve
as legends for classed choropleth maps while providing additional distributional detail.
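The sorting and perceptual grouping behind the row-labeled plots, and the use of a cumulative distribution as a legend for a classed choropleth map, can be sketched as data preparation steps in Python. This is not Carr's S-Plus code; the group size of five, the descending sort, and the class-break convention (a value equal to a break falls in the upper class) are assumptions chosen for illustration.

```python
from bisect import bisect_right

def sort_and_group(rows, group_size=5, descending=True):
    """Sort (label, value) rows by value and cut them into small perceptual groups.

    Each returned group would be drawn as a visually separated block of rows.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=descending)
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]

def ecdf(values):
    """Sorted values with their empirical cumulative proportions i/n."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def assign_classes(values, breaks):
    """Class index (0..len(breaks)) per value; a value equal to a break goes to the upper class."""
    return [bisect_right(breaks, v) for v in values]
```

Drawing each class's segment of the empirical CDF in the same color as the corresponding map class is what lets the distribution plot double as the map legend while still showing the full distribution.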

Talk examples emphasize EPA environmental data summaries. However, the examples shown are
templates for a wide variety of applications. Other government agencies, such as BLS and NASS,
are putting the new templates to work. The S-Plus functions, script files and data for producing the
examples are publicly available via anonymous FTP.

-------
Measuring DO restoration goals by combining
monitoring station and buoy data
Nagaraj K. Neerchal, Gina Papush and Sanjoy V
Department of Mathematics and Statistics
University of Maryland Baltimore County
Baltimore, MD 21228 USA
Ronald W. Shafer
Environmental Statistics and Information Division, United
States Environmental Protection Agency, Washington, DC
20460 USA


Abstract

Dissolved oxygen (DO) is a major factor affecting the survival, distribution
and productivity of the living resources of the Chesapeake Bay. Target
DO concentrations, with limits on their duration and frequency, are an
important element in a program to restore living resources. DO restoration goals
are stated in the December 1992 DO Restoration Goals document. Semi-continuous
data obtained from telemetering can be used to verify these
goals. Since semi-continuous data (buoy data) are available only at a few
specified locations in the Bay, the question of goals verification at the
monitoring stations needs to be addressed.

    DO levels are available, taken approximately every fifteen days, at each
of the monitoring stations. Since the DO goals are generally specified in
terms of hours, we cannot verify whether or not a station site is meeting
DO restoration goals based on biweekly DO observations. To address this
problem, we developed a spectral analysis method that combines the short-term
variations of buoy data with the long-term variations of the station
data. The spectral analysis method produces a synthetic data set that can
predict the likelihood of a station site meeting dissolved oxygen restoration
goals. A synthetic data set can substitute for expensive observational
data and may be adequate for management purposes such as strategic
planning and tracking progress towards goals.
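A minimal sketch of one way to combine the two records spectrally, assuming hourly buoy data and a roughly 15-day station sampling interval: the long-term signal is interpolated from the station record, and the short-term signal is the buoy series with all frequencies below a cutoff removed by FFT. The 30-day cutoff (twice the sampling interval, the shortest period the station record can resolve) and the DO threshold in the usage note are assumptions; this is not the authors' published method.

```python
import numpy as np

def synthesize_hourly_do(station_hours, station_do, buoy_do, cutoff_hours=2 * 15 * 24):
    """Combine long-term station variation with short-term buoy variation.

    station_hours: increasing sample times (hours) of the biweekly station record
    station_do:    DO (mg/L) measured at those times
    buoy_do:       hourly DO (mg/L) from a buoy; its length sets the output grid
    cutoff_hours:  periods longer than this come from the station data (assumed 30 days)
    """
    buoy = np.asarray(buoy_do, dtype=float)
    n = buoy.size
    hours = np.arange(n)
    # Long-term component: interpolate the sparse station record onto the hourly grid.
    long_term = np.interp(hours, station_hours, station_do)
    # Short-term component: high-pass the buoy series in the frequency domain.
    spectrum = np.fft.rfft(buoy - buoy.mean())
    freqs = np.fft.rfftfreq(n, d=1.0)  # cycles per hour
    spectrum[freqs < 1.0 / cutoff_hours] = 0.0
    short_term = np.fft.irfft(spectrum, n=n)
    return long_term + short_term

def fraction_of_hours_meeting(do_series, minimum_mg_per_l):
    """Share of synthetic hours at or above a DO threshold (threshold is hypothetical)."""
    return float(np.mean(np.asarray(do_series) >= minimum_mg_per_l))
```

For example, fraction_of_hours_meeting(synthetic, 5.0) gives the share of synthetic hours at or above a hypothetical 5 mg/L target; a goal stated with a duration would instead be checked on moving averages of the synthetic series.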

-------
Nathan Wilkes
Abstract for Statistics Conference
Science and Information Management

ECOVIEW: Bringing Scientific Information to the Masses

Communication of ideas and information is essential to a functional society,
government, business, or family. Our methods of communication are changing
at an exponential rate. The purpose of the ECOVIEW project is to take a very
old method of conveying locational or directional information (i.e. maps) and
combine it with the incredible array of multi-media electronic communications
capabilities found in computers and digital technologies. The result will be the
development of a convenient tool to convey environmental and socio-economic
information geographically. The greatest challenge for this project is not the
technical issues, but the human or organizational aspects in a continuum of data
development through data analysis to information development and
production. Our success will be measured by our ability to bring together the
unique and varied communities of environmental scientists, statisticians, policy
makers, politicians, and the general public resulting in an effective
communication of the condition of the environment, improved understanding of
the effects of our human behaviors and social policies, and enhanced
opportunities to build consensus.

-------
           SCIENCE AND INFORMATION : A FUTURE PERSPECTIVE

                     Joseph Abe, Environmental Scientist
                                 Futures Staff
              Office of Strategic Planning and Environmental Data
The perspective and thinking of western culture have been significantly influenced by
the scientific and industrial revolutions.  Citizens of western culture, consciously or
unconsciously, tend to view the world from a compartmentalized, mechanistic, near-
term, human-centered and linear perspective. This worldview affects how we: relate to
each other and nature, create and run organizations, measure societal success and
progress, and develop and use information and technologies. While many wonderful
inventions, accomplishments and life improvements can be attributed to the industrial
age, continued development based on this worldview will inevitably threaten the long-term
sustainability of civilization and the global environment. Creating a sustainable
future for the Earth and its inhabitants requires a new worldview that nurtures
personal and community development, recognizes the interdependence between
humans and nature and fosters democracy, awareness and peaceful coexistence
through global communications. These and other qualities and challenges of the
emerging post-industrial worldview are described to suggest new roles for
science and information in the twenty-first century.

-------
Environment, Statistics and Public Policy....The ESP Project

A Project of the American University, Washington, DC
Principal Investigators:  David Crosby and James Lee
Funded by: U.S. Environmental Protection Agency
Research Tasks Under the ESP Project

     This grant funds five years of research on statistics and
environmental policy issues and is a collaboration between the School of
International Service (SIS) and the Mathematics and Statistics (MAS)
department of American University. Under the agreement, these two
units will research statistics and environmental issues and will attempt to
work together on some particular research questions. The project is funded
under a cooperative grant from the U.S. Environmental Protection Agency.

     The research effort is the overall Environment, Statistics and Public
Policy Project (ESP). The ESP Project addresses trade and other
environmental issues. Of particular concern to the project are the public
policy issues of importance and the manner in which public policy makers
use data, statistics and other types of information. Five tasks
begin in the first year of the ESP research grant.

Task 1:    Issues of Measurement in International Treaties: Getting
          the Numbers Correct, David Crosby and James Lee

Task 2:    Getting Information to Policy-Makers: The Trade-Environment
          Interagency Project,  James R. Lee

Task 3:    Case Studies in the Americas on Trade and the Environment
          James R. Lee

Task 4:    Satellite Intercalibration,  David S. Crosby

Task 5:    Up and Down Design, Nancy Flournoy

-------
           ENVIRONMENTAL TOBACCO SMOKE (ETS) STATISTICS -
         EPA VS. THE TOBACCO INDUSTRY FROM EPA's VIEWPOINT.
                      Steven Bayard and Jennifer Jinot
      The U.S. EPA in 1993 concluded that exposure to ETS causes approximately
3,000 lung cancer deaths in U.S. nonsmokers annually, a finding that has been
strongly attacked by the tobacco industry and its consultants, both in the scientific
literature and in the popular press (e.g., newspaper advertisements, Washington Times editorials,
Investor's Business Daily, Congressional Record), and in Federal Court (U.S. District Court for the
Middle District of North Carolina). Both parties rely at least partly on statistical analyses of epidemiology
data to support their claims. These claims will be examined by looking at what the
statistics really say. This unbiased analysis will be presented by the Project Officer and
co-author of the EPA report.

-------
ADAPTIVE DESIGNS


Nancy Flournoy: American University, Washington, DC
Recent advances in adaptive designs for dose-response problems are presented. We
focus on sequential designs that (1) center the treatment distribution around a
prespecified target quantile given a monotone response function and (2) concentrate
observations at the level yielding maximum probability of success given two opposing
response functions. These designs could be used to define toxic thresholds in terms of
outcomes instead of particle concentration. They also may provide a useful control
mechanism, even when the target levels change with time. We are interested in
identifying specific applications of these designs to environmental problems.
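One design of the first kind is a biased-coin up-and-down rule, sketched below in Python. After a toxic (positive) response the next subject receives the next lower dose; after a non-toxic response the dose moves up with probability b = target / (1 - target) and otherwise stays, which concentrates assignments near the target quantile for targets of 0.5 or less. The dose-response curve, number of dose levels, and starting level are hypothetical, and the sketch is not presented as the exact designs in the talk.

```python
import math
import random

def biased_coin_design(n_doses, tox_prob, target, n_subjects, start=0, seed=0):
    """Simulate dose-level assignments under a biased-coin up-and-down rule.

    tox_prob(level) gives the probability of a toxic response at a dose level.
    With target = 0.5 (b = 1) this reduces to the classical up-and-down rule.
    """
    if not 0.0 < target <= 0.5:
        raise ValueError("this sketch covers targets in (0, 0.5]")
    rng = random.Random(seed)
    b = target / (1.0 - target)
    level = start
    assigned = []
    for _ in range(n_subjects):
        assigned.append(level)
        if rng.random() < tox_prob(level):
            level = max(level - 1, 0)               # toxic: step down
        elif rng.random() < b:
            level = min(level + 1, n_doses - 1)     # non-toxic: step up with prob. b
    return assigned

def logistic_tox(level, midpoint=4.5, slope=1.5):
    """Hypothetical monotone dose-response curve used only for demonstration."""
    return 1.0 / (1.0 + math.exp(-slope * (level - midpoint)))
```

With logistic_tox and ten dose levels, a target of 0.5 centers assignments near level 4.5 (the median), while a target of 0.3 shifts them toward lower levels, near the 30th percentile of the response curve.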

-------
          PROVIDING INFORMATION TO DECISION MAKERS
      TO PROTECT HUMAN HEALTH AND THE ENVIRONMENT
      EPA's senior management has recently completed a strategic plan for management of the
Agency's information resources. The information resources management (IRM) strategic plan's
mission and vision were based upon the Agency's 7 operating principles as documented in the
Agency's Five Year Strategic Plan. The plan was developed under the guidance of the Agency's
Executive Steering Committee (ESC) for IRM, whose members include the Agency's Assistant
Administrators, Associate Administrators, the IG, the General Counsel, four Regional Administrators,
and five State environmental department executives. The ESC received considerable input into the
plan from an external committee of stakeholders and numerous program and IRM staff in the Agency.

      EPA's Strategic Plan for Information Resources Management establishes a far-reaching
and challenging mission and vision for all parts of the Agency. It defines several difficult key
operating principles and core implementation strategies that must be implemented to achieve the
vision. Finally, it provides for the establishment of a program to measure IRM performance.

      Change has already begun in the Agency's management of IRM as a result of this
planning effort. The Agency committed $6M to implementation efforts in FY95 and has
requested over $15M for implementation efforts in FY96. Additionally, subcommittees of the
ESC have begun to review EPA's current IRM base of $300M to ensure the Agency is
progressing toward the vision established. Most importantly, information management issues are
now on the agenda of the Agency's top management, which will lead to fundamental
improvements.
    Mark Day

-------
                                      Mark Day
      Since 1992 Mr. Day has been employed by U.S. EPA in the Office of Administration and
Resources Management. Since 1994 he has served as the manager of the IRM Planning Group.
The IRMPG is a cross divisional group of the Office of Information Resources Management
formed to deal with the IRM Planning material weakness declared in 1993.

      Prior to coming to EPA, he was employed by the Missouri Department of Natural
Resources. From 1986 to 1992 he served as the Chief Information Officer for the Environmental
Quality Division of the Department. In this capacity he oversaw development of state
environmental systems. His use of advanced techniques and innovative approaches was
recognized in a case study by Harvard's Kennedy School of Government. His work emphasized
long-range planning for efficient use of government resources as well as innovative use of
information to improve the effectiveness of environmental protection in Missouri. Mr. Day also led
the Division's long-range planning effort on environmental issues in the 21st century.

      From 1983 until 1986 Mr. Day was the Director of the Residential Energy Program,
which included the low-income weatherization program, solar bank program, and other state and
federal programs operated by the State of Missouri for the purpose of conserving energy.  He was
recognized for innovative use of technology to reduce administrative costs and improve customer
service.  He served as special assistant to the Department Director for a study of environmental
permitting processes and cycle times.

      Prior to 1983 Mr. Day worked at a local community action agency (Missouri Ozarks
Economic Opportunity Corporation) as the Director of the Residential Energy Dept where he
directed the weatherization, solar, and emergency energy programs for low-income citizens in an
8-county area of central Missouri. He completed the state's first automation of weatherization
inventory tracking and client scheduling processes. His programs were recognized for high
innovation and productivity.

      Mr. Day graduated magna cum laude in 1977, with a 3.849 GPA and a B.A. in History
and Political Science, from Southwest Baptist University, a small liberal arts college. He has
completed numerous studies in business and information resources management.

-------
New Sources of Environmental Data:  Testing Out the Latest Aerial and
Satellite Sensing at the "Field of Dreams"

Elizabeth D. Porter, Environmental Results Branch, OPPE

      The Field of Dreams is a test site that was established to study the
effectiveness of remotely sensed wetland identification techniques,
specifically in forested environments. It would have been more accurate to
call it the "forest of dreams" (a forest with no dramatic terrain relief, no
uniquely wetland-indicative vegetation, and varying but usually limited
periods of soil saturation or inundation). These "driest" of wetlands are the
toughest to identify. Their boundaries are often fuzzy gradations, difficult to
delineate even from ground observation. The objective of the project is to
identify and advance remote sensing techniques to support the inventorying
of wetland resources, primarily in support of the National Wetlands
Inventory (NWI) program administered by the US Fish and Wildlife Service
(FWS).
      "Delineate it, and monitor it; and the technology agencies shall come."
The site is (and will continue to be) probably one of the most heavily imaged
and data-rich wetland sites in the U.S. Technology developers from the
commercial sector, via the NASA commercial remote sensing program, the
Department of Energy technology program, and the intelligence community
have all offered support for trying to solve the forested wetland identification
problem. The Field of Dreams consists of ten ground test sites in transitional
wetland-upland forests in Wango Quad, Wicomico County, MD, on the
eastern shore of the Chesapeake Bay.
      The project is a result of two major interagency initiatives: the first, a
Federal Geographic Data Committee (FGDC) study on wetland mapping
programs; and the second, the review of intelligence community (IC) data by
the environmental science community. The presentation provides
background on the project and these two preceding initiatives and
explains some of the challenges relevant to structuring the experiment and
planning for the spring 1995 data collections. The project intends to identify
source and compilation techniques that can improve the accuracy of NWI
products (maps, status and trend plots). These wetlands are an environmental
problem area with major scientific, technological and policy implications.
Forested wetlands were the cover type selected for study, not only because
they are the most difficult to discern, but because they are the wetland cover
type that has experienced the greatest losses in recent years.

-------
 A GEOGRAPHIC VISUALIZATION OF ENVIRONMENTAL QUALITY: A
CASE STUDY OF FECAL COLIFORM BACTERIA IN SURFACE WATERS
                  IN THE US/MEXICO BORDER AREA

J. Calem, N. Cortina, J. Crawford, D. Freeman, A. Goldscheider, R. Shafer
                   U.S. Environmental Protection Agency

                           T. Ngo, L.  Summers
                             Martin Marietta
      Maps are excellent tools to display relational data in a spatial frame. The
US/Mexico border area, defined as the area within 100 km of either side of the
international boundary, is presented as a regional case study of environmental quality.
A series of maps and additional county-level attribute data provide insight into the
presence, sources and implications of one pollutant, fecal coliform bacteria, in
U.S. border waterways. Fecal coliform bacteria indicate fecal contamination that can carry
the agents of many human communicable diseases, such as typhoid fever, hepatitis A and
dysentery. Their presence in the environment is associated with unsanitary conditions and
human and animal waste. The discussion uses the "Pressure-State-Response-Effects" (PSR/E)
macro-framework to associate ambient concentrations of fecal coliform bacteria, land-use
pressures affecting concentrations, human exposure, and health effects. In the
future, a larger study will assess and characterize environmental quality on both sides
of the international boundary, taking into account other environmental pollutants and
media.

-------
  Building Environmental Data Management & Analysis Capabilities
           in the Great Lakes Region & Baltic Republics

                        Stephen K. Goranson
     Deputy Chief, Information Management Branch, EPA Region 5

This presentation focuses on recent experiences in two programs,
the Great Lakes and the Baltic Republics, both requiring improved
capacity for collecting and assessing data, for transforming the
data into useful information, and for disseminating that information.

The Great Lakes Program, which includes a wide variety of partners
(public, private, and binational), developed a strategic
information plan that will allow easy access to environmental
data (chemical, biological, habitat, and human health) and
provide sound environmental indicators. The initial system
concept includes (1) a repository of all Great Lakes monitoring
data conforming to established standards; (2) the ability to link
the monitoring data to diverse sets of environmental assessment
data for comprehensive analysis of ecosystem status and risk; and
(3) the ability to identify sources of other Great Lakes data and
pathways to that data, including EPA's corporate network, links to
other data outside EPA, and electronic document reporting and
retrieval.

The experiences gained during the past few years in implementing
the Great Lakes Strategy served as a useful model for subsequent
assistance to the environmental ministries of the Baltic
Republics. As part of its commitment in the cooperative
agreements with the Environmental Ministries of Estonia, Latvia,
and Lithuania, USEPA is providing technical assistance in the
management and integration of environmental monitoring data.
Specialists from the participating countries are working jointly
to assess program priorities, evaluate monitoring information
needs, and define optimum approaches for gathering, processing, and
analyzing environmental data needed to support management
decisions. USEPA is providing the basic capacity and training to
efficiently manage environmental data and administer
environmental programs.

This discussion will relate the two geographic areas in terms of
environmental assessment data needs, inventory data and models,
data attributes and their quality, data management practices and
standards, indicators, and information reporting/exchange.

-------
                  ALTERNATIVE MODELS FOR ANALYSIS
                                    OF
                  COMPOSITE ENVIRONMENTAL SAMPLES
                 BY: Henry D. Kahn, George Zipf and Alan Unger
      This presentation will consider the use of segmented and non-segmented
composite sampling models in environmental field studies. Composite sampling has
many advantages in environmental work, particularly as a cost-effective method of
obtaining estimates of the mean. Estimates of the mean and the variance from
composite samples for segmented and non-segmented data will also be
addressed. Data on field measurements of contaminants in a number of fish species
will be used to illustrate the discussion. In addition, the effectiveness of compositing
versus individual measurements will be evaluated on the fish data.
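The basic arithmetic of compositing can be sketched as follows, assuming equal-mass aliquots, ideal mixing (a composite measurement equals the average of its members), random assignment of fish to composites, and no measurement error. Under those assumptions the mean of the composite results estimates the population mean, and k times their sample variance estimates the individual-level variance. The sketch does not attempt the segmented versus non-segmented models discussed in the talk.

```python
from statistics import mean, variance

def form_composites(values, k):
    """Average consecutive groups of k equal-mass aliquots (ideal mixing assumed)."""
    if k < 1 or len(values) % k:
        raise ValueError("number of values must be a positive multiple of k")
    return [mean(values[i:i + k]) for i in range(0, len(values), k)]

def composite_estimates(values, k):
    """Estimate the population mean and individual-level variance from composites.

    Each composite of k randomly assigned units has variance sigma^2 / k, so
    k times the sample variance of the composites estimates sigma^2.
    Requires at least two composites.
    """
    composites = form_composites(values, k)
    return mean(composites), k * variance(composites), composites
```

Only len(values) / k chemical analyses are needed instead of len(values), which is the source of the cost savings; the price is fewer degrees of freedom for the variance estimate and no information about individual extremes.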

-------
     ESTIMATES OF FISH CONSUMPTION RATES  IN THE UNITED STATES

      Co-authors: Helen Jacobs, EPA
                 Henry Kahn, EPA
                 Kathleen Stralka, SAIC
      Estimates of fish consumption in the U.S. based on USDA's combined 1989,
1990, and 1991 Continuing Survey of Food Intake by Individuals (CSFII) will be presented.
Fish consumption estimates play an important role in a number of EPA programs. In
particular, exposure estimates used in determining water quality criteria and related
standards are based in part on the amount of fish consumed and contamination levels
in the fish. This presentation will provide an update on fish consumption estimates by
habitat (marine, estuarine and freshwater) and species, including the most recent CSFII
data. Estimates will be presented for the total U.S. and by geographic region.
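Estimates from a national probability survey such as CSFII are normally computed with sample weights. A minimal Python sketch of a weighted per-capita mean by habitat (or region), assuming one record per respondent per group with zeros for non-consumers; the record layout and all numbers below are hypothetical.

```python
from collections import defaultdict

def weighted_mean_by_group(records):
    """Survey-weighted mean consumption (g/day) for each group.

    records: iterable of (group, grams_per_day, sample_weight), one per respondent
    per group; including zeros for non-consumers makes the result a per-capita mean.
    """
    totals = defaultdict(float)
    weights = defaultdict(float)
    for group, grams, weight in records:
        totals[group] += weight * grams
        weights[group] += weight
    return {g: totals[g] / weights[g] for g in totals if weights[g] > 0}
```

Restricting the records to respondents who reported eating fish would instead give a consumers-only mean, which is typically much larger than the per-capita figure.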

-------
         WATER QUALITY BASED EFFLUENT LIMITATIONS
           AND THE STATISTICAL PROPERTIES OF LOW
  CONCENTRATION  MEASUREMENTS IN ANALYTICAL CHEMISTRY
                   Chuck White and Henry Kahn
     Water quality limitations for specific chemicals are sometimes set at
concentrations below EPA's current criteria for detection. This paper will
discuss the statistical properties of chemical analytical measurements at
low concentrations with regard to the concepts of detection and
quantification in analytical chemistry and the requirements of water-quality-based
effluent limitations for industrial dischargers.
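To make the detection and quantification concepts concrete, the sketch below computes a method detection limit as t times s from replicate low-level spikes, in the style of the MDL procedure in 40 CFR Part 136, Appendix B, and a quantitation level as 10 s, a common convention in analytical chemistry. The 10 s multiplier, the default of seven replicates, and the three-way labeling are assumptions for illustration, not the paper's definitions.

```python
from statistics import stdev

def method_detection_limit(replicates, t_value=3.143):
    """MDL = t * s from replicate low-level spiked samples.

    The default t_value is the one-sided 99% Student t for 6 degrees of freedom,
    i.e. seven replicates; use the matching t value for other replicate counts.
    """
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    return t_value * stdev(replicates)

def quantitation_limit(replicates, multiplier=10.0):
    """Quantitation level as a multiple of the replicate standard deviation (assumed 10 s)."""
    return multiplier * stdev(replicates)

def classify(result, mdl, loq):
    """Label a single reported concentration relative to the two limits."""
    if result < mdl:
        return "not detected"
    if result < loq:
        return "detected, below quantitation level"
    return "quantified"
```

A permit limit that falls below the MDL cannot be verified by a single measurement, which is the practical problem the abstract raises.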

-------
      TREATMENT OF UNCERTAINTY IN PERFORMANCE ASSESSMENTS
                          FOR COMPLEX SYSTEMS
                                Jon C. Helton
                          Department of Mathematics
                            Arizona State University
                            Tempe, AZ 85287-1804
                                 ABSTRACT
When viewed at a high level, performance assessments (PAs) for complex systems
involve two types of uncertainty: stochastic uncertainty, which arises from the fact that a
number of different occurrences have a real possibility of taking place, and subjective
uncertainty, which arises from a lack of knowledge about quantities required within the
computational implementation of the PA.  Stochastic uncertainty is typically
incorporated into a PA with an experimental design based on importance sampling and
leads to the final results of the PA being expressed as a complementary cumulative
distribution function (CCDF). Subjective uncertainty is usually treated with Monte Carlo
techniques and leads to a distribution of CCDFs.  This presentation discusses the use
of the Kaplan/Garrick ordered triple representation for risk in maintaining a distinction
between stochastic and subjective uncertainty in PAs for complex systems. The topics
discussed include (1) the definition of scenarios and the calculation of scenario
probabilities and consequences, (2) the separation of subjective and stochastic
uncertainties, (3) the construction of CCDFs required in comparisons with regulatory
standards (e.g., 40 CFR Part 191, Subpart B for the disposal of radioactive waste), and
(4) the performance of uncertainty and sensitivity studies. Results obtained in a
preliminary PA for the Waste Isolation Pilot Plant, an uncertainty and sensitivity
analysis of the MACCS reactor accident consequence analysis model, and the
NUREG-1150 probabilistic risk assessments are used for illustration.
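The Kaplan/Garrick representation treats risk as a set of triples (scenario, probability, consequence). A minimal Python sketch, assuming mutually exclusive scenarios: one CCDF gives P(consequence > x) for a set of x values, and repeating the calculation for each sample of the subjectively uncertain inputs yields a family of CCDFs that can be summarized pointwise. The two-scenario model is hypothetical, and the sketch omits the importance-sampling design used for stochastic uncertainty.

```python
def ccdf(triples, thresholds):
    """P(consequence > x) for each x, from (scenario, probability, consequence) triples.

    Assumes the scenarios are mutually exclusive, so their probabilities add.
    """
    return [sum(p for _, p, c in triples if c > x) for x in thresholds]

def ccdf_family(subjective_samples, build_triples, thresholds):
    """One CCDF per sampled value of the subjectively uncertain inputs."""
    return [ccdf(build_triples(v), thresholds) for v in subjective_samples]

def pointwise_mean(family):
    """Mean exceedance probability at each threshold across the CCDF family."""
    return [sum(col) / len(col) for col in zip(*family)]

def pointwise_quantile(family, q):
    """Nearest-rank quantile of the CCDF family at each threshold (e.g. q=0.9)."""
    result = []
    for col in zip(*family):
        ordered = sorted(col)
        result.append(ordered[min(len(ordered) - 1, int(q * len(ordered)))])
    return result

def two_scenario_model(p_intrusion):
    """Hypothetical example: an undisturbed scenario and an intrusion scenario."""
    return [("undisturbed", 1.0 - p_intrusion, 0.0), ("intrusion", p_intrusion, 10.0)]
```

Keeping the two loops separate (stochastic probabilities inside each CCDF, subjective samples across CCDFs) is what preserves the distinction the abstract emphasizes.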

-------
              STATISTICAL ANALYSIS OF RISK AND PA RESULTS

               Tim Margulies U.S. Environmental Protection Agency
                              Washington, DC 20460

               Bimal Sinha: University of Maryland, Baltimore County
                           Dept. of Mathematics and Statistics
                           Baltimore, Maryland 21228
 Performance and risk  analyses provide useful quantitative information to evaluate a
technology or activity and the level of safety needed to protect the environment.
Uncertain models, data, and future events and processes are explicitly considered to
generate probabilistic results.  This paper presents an overview of probabilistic
modelling approaches for estimating the likelihood of human intrusion via exploratory
drilling at the WIPP (Waste Isolation Pilot Plant) in New Mexico and several illustrative
calculations. Furthermore, a statistical approach based on hypothesis testing is
investigated to determine compliance with a probabilistic standard (such as
the "containment standard" in the radioactive waste management regulations).
Potential applications to other areas of environmental and energy regulation and
decision-making, such as reactor design modification, will also be discussed.
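One simple way to frame compliance as a hypothesis test is sketched below: assume each of n independent simulation runs either exceeds the release limit or does not, and test H0: P(exceed) >= p_limit against H1: P(exceed) < p_limit with an exact binomial lower-tail p-value evaluated at the boundary p = p_limit. The limit of 0.1, the significance level, and the independence assumption are illustrative choices, not the approach in the paper.

```python
from math import comb

def binomial_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed exactly term by term."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k + 1))

def compliance_test(exceedances, n_runs, p_limit=0.1, alpha=0.05):
    """Test H0: P(exceed) >= p_limit vs H1: P(exceed) < p_limit.

    A small lower-tail p-value is evidence that the exceedance probability is
    below the limit, i.e. evidence of compliance. Returns (p_value, reject_h0).
    """
    if not 0 <= exceedances <= n_runs:
        raise ValueError("exceedances must lie between 0 and n_runs")
    p_value = binomial_lower_tail(exceedances, n_runs, p_limit)
    return p_value, p_value < alpha
```

Placing non-compliance in the null hypothesis puts the burden of proof on demonstrating compliance; for example, 3 exceedances in 100 runs gives a p-value below 0.01, while 10 in 100 gives no evidence of compliance.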

-------
              using Relative Data Quality Indicators
                      of Precision and Bias

                       D. Miller, Region 7

     Improvement in the implementation of data quality evaluation
and in the use of data for decision making can be achieved by
addressing three problems: first, the nature of environmental
data; second, general decision-making procedures; and third,
decision-making in the specific situation where the numbers
involved are slightly larger than zero.

     Environmental data often confounds the methods of
traditional statistics, because the numerical values within a
single data set can span many orders of magnitude.  As a result,
the standard deviation of the measurement process is not constant
over the range of observed values.

     Decision-making using hypothesis testing can be confusing to
the stake-holders.  The traditional null hypothesis requires
convoluted logic.  Furthermore, environmental data can be
complex.  As a result, valid procedures for environmental
decisions provided by traditional statistics are often
restrictively narrow in scope and are often incomprehensible to
the average stake-holder.

     Traditional statistical procedures work well for two
situations:  first, the standard deviation is a constant, and
second, the coefficient of variation is a constant.  When the
true value is zero, the standard deviation is constant, and when
the true value is large,  the coefficient of variation is
constant.  Traditional statistical procedures do not work well in
the specific region between "zero" and "large".

     The three problems may be resolved with a unified approach.
First, the approach uses expressions for data quality indicators
that are applicable to the entire range of observed values, are
internally consistent, and use as data the results of widely
accepted types of QC samples. Second, environmental decisions
are made using the direct test, which is simple and
understandable. And third, the precision of the entire range of
observed values, from "zero" to "very large", is unified in the
hyperbolic model.

     This presentation will describe a consistent set of
expressions for data quality indicators,  describe decision making
using the direct test, and describe the hyperbolic model.  This
will be followed with examples of the unified approach.
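The abstract does not give the hyperbolic model's equation. One form consistent with its description (constant standard deviation near zero, constant coefficient of variation at large values) is sd(x) = sqrt(a^2 + (b*x)^2), whose graph is a hyperbola with asymptote sd = b*x. Assuming that form, the Python sketch below fits a and b by least squares of variance on x^2, using variance estimates from duplicate QC pairs. It is an illustration, not the presenter's parameterization or fitting method, and it does not implement the direct test.

```python
import math

def duplicate_pairs_to_points(pairs):
    """Each duplicate pair (x1, x2) gives a level (x1 + x2) / 2 and a variance estimate (x1 - x2)**2 / 2."""
    levels = [(x1 + x2) / 2.0 for x1, x2 in pairs]
    variances = [(x1 - x2) ** 2 / 2.0 for x1, x2 in pairs]
    return levels, variances

def fit_hyperbolic_precision(levels, variances):
    """Fit variance = a**2 + b**2 * level**2 by ordinary least squares on level**2.

    Returns (a, b): a is the standard deviation near zero, b the limiting
    coefficient of variation at large levels. Negative fitted terms are clipped to 0.
    """
    u = [x * x for x in levels]
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(variances) / n
    sxx = sum((ui - u_bar) ** 2 for ui in u)
    sxy = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, variances))
    slope = sxy / sxx
    intercept = v_bar - slope * u_bar
    return math.sqrt(max(intercept, 0.0)), math.sqrt(max(slope, 0.0))

def predicted_sd(level, a, b):
    """Standard deviation implied by the hyperbolic model at a given true level."""
    return math.sqrt(a * a + (b * level) ** 2)
```

Single duplicate pairs give very noisy variance estimates, so in practice many pairs spread over the working range would be pooled before fitting.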

-------
ASSESSMENT OF THE U.S. EPA IEUBK MODEL PREDICTION OF ELEVATED
BLOOD LEAD LEVELS.   K. A. Hogan, R.W. Elias, A.H. Marcus, P.O. White, U.S.
Environmental Protection Agency, Office of Pollution Prevention and Toxics
(Washington, DC) and Office of Health and Environmental Assessment (Research
Triangle Park, NC and Washington, DC)
      The Integrated Exposure, Uptake, and Biokinetic (IEUBK) Model for Lead in
Children, which was designed to predict the proportion of a population of children with
elevated blood lead levels (µg/dL) on a site-specific basis, was examined for its use as a
risk assessment tool for regulatory purposes. This was carried out with existing data
sets relating environmental and blood lead levels on a per-individual basis, by using
the IEUBK Model to generate blood lead predictions from the measured
environmental lead levels. These predicted blood lead levels were then compared
with the measured blood lead levels by comparing geometric mean blood lead levels and
the proportions observed or expected to have elevated blood lead levels. All studies used
for this examination had data of sufficient quality and quantity to characterize the
environmental lead levels in each residential home and yard (i.e., for each
participant: blood lead; soil, dust, water, interior and exterior paint lead; and
demographic/behavior survey data covering other aspects of lead exposures). The
model results and observed blood lead levels were reasonably concordant, and the model
predicted population proportions with elevated blood lead levels similar to those observed.
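The two comparisons described above, geometric means and proportions above a cutoff, can be sketched in Python. The predicted proportion assumes blood lead is lognormally distributed around the model's predicted geometric mean; the cutoff of 10 µg/dL and geometric standard deviation of 1.6 used in the usage note are assumed values for illustration, and no study data are reproduced.

```python
import math
from statistics import NormalDist, fmean

def geometric_mean(values):
    """exp of the mean log; values must be positive (e.g. blood lead in ug/dL)."""
    return math.exp(fmean([math.log(v) for v in values]))

def observed_fraction_above(values, cutoff):
    """Share of measured values strictly above the cutoff."""
    return sum(1 for v in values if v > cutoff) / len(values)

def lognormal_fraction_above(gm, gsd, cutoff):
    """P(value > cutoff) for a lognormal with geometric mean gm and geometric SD gsd."""
    z = math.log(cutoff / gm) / math.log(gsd)
    return 1.0 - NormalDist().cdf(z)
```

For instance, lognormal_fraction_above(predicted_gm, 1.6, 10.0) would be set beside observed_fraction_above(measured, 10.0), and geometric_mean of the predictions beside that of the measurements.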

-------
             HUMAN EXPERIENCES FOR JUDGING PREDICTIONS
                      FROM ANIMAL CANCER MODELS
Cheryl Siegel Scott, US Environmental Protection Agency, Office of Health and
                  Environmental Assessment (8602), Washington, DC
      The use of epidemiologic data is preferred for basing inferences about human
cancer risks. Only rarely are complex statistical models such as the excess relative
risk, absolute risk, and relative risk models fitted to cohort data, since such data
are often either not readily at hand or considered to be of insufficient quality. By default,
projections of human cancer risks are based on animal bioassay data. Adopting a
philosophy of choosing one set of data over another is narrow and throws away
information. This talk proposes that both animal and human data support an evaluation
of cancer risk. General methods are reviewed for using human data to gauge the
accuracy of animal-based cancer risk estimates. Examples using epidemiologic
information on exposure to formaldehyde, methylene chloride, and trichloroethylene
illustrate the principles discussed.

-------
       The Lower Rio Grande Valley Environmental Monitoring Study:
           Applying Human Exposure Science to Public Health Concerns
                         Gerald Akland
             U.S. Environmental Protection Agency
          Atmospheric Research and Exposure Assessment Laboratory
               Research Triangle Park, NC 27711
An environmental monitoring investigation was initiated in the
Lower Rio Grande Valley in response to Valley residents' concerns
about the potential link between their health and pollution.
Potential sources of contamination include industrial emissions,
agricultural pesticide use, and inadequate infrastructure, but the
scope and magnitude of the implications of the resulting pollution
for the local population have not been documented. Exposure is
often the missing link in the effort to evaluate environmental
health risks, and an understanding of human exposure is essential
for developing effective risk reduction policies. A field pilot
provided preliminary data about the levels, sources, and pathways
of actual human exposure in the Valley. Specifically, samples of
indoor and outdoor air, house dust, soil, food, drinking water,
urine, blood, and breath were collected and analysed for
metals, VOCs, PAHs, and pesticides. This poster presents the
design of the study, the concepts underlying the design, the
perceived utility and value of the approach chosen, the field
implementation methods, and the implications of the experiences
gained through this process. This project has the potential to set
a new model for environmental health research that integrates
public health concerns, exposure reduction, illness prevention, and
regulatory activities of many agencies.

-------
          Presentation  of  1993 National Air Quality Data

                           David Mintz
              U.S.  Environmental  Protection Agency
           Office  of Air Quality Planning and Standards
                Research Triangle Park,  NC  27711

Last October, EPA released its twenty-first annual report
documenting national air pollution and emissions trends. The
National Air Quality and Emissions Trends Report highlights six
pollutants for which standards have been set and tracks how well
areas are meeting those standards. I will use one of these
pollutants, PM-10, as an example to demonstrate the various ways we
present the data.

-------
      APPLICATION OF GEOGRAPHIC INFORMATION SYSTEMS TO THE
              ASSESSMENT OF FECAL COLIFORM BACTERIA IN
            SURFACE WATERS IN THE US/MEXICO BORDER AREA

                                 Lewis Summers
                                Martin Marietta

   The U.S. Environmental Protection Agency's Office of Policy, Planning and Evaluation,
Environmental Statistics and Information Division, is preparing a report characterizing
surface water quality within the U.S./Mexico border region. A computerized geographic
information system (GIS) is used to manipulate, manage, and display statistical results for
one pollutant, fecal coliform. Fecal coliform bacteria are associated with many communicable
human diseases, such as typhoid fever, hepatitis A, and dysentery. Their presence in the
environment is associated with unsanitary conditions and with human and animal waste. The GIS
can visually portray study results and also superimpose various other spatial data sets for
further analysis. In the future, a larger study will assess and characterize environmental
quality on both sides of the border, taking into account other environmental pollutants and
media.

-------
  INNOVATIVE ENVIRONMENTAL RESOURCE SAMPLING AND ASSESSMENT
-------
        d distributions take into account the observer-observed interface and provide a promising
approach for the problems of ascertainment in environmental resource assessment.

The 1995 EPA Statistics Conference presentation will discuss current research and outreach work
on environmental sampling with observational economy in progress under a Penn State-EPA
OPPE ESID Cooperative Agreement.
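Ranked set sampling, one of the observational-economy designs in the references (Patil, Sinha, and Taillie 1994), can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. The helper name is hypothetical, and the ranking is done with a caller-supplied `rank_key`, which stands in for an inexpensive judgment or covariate ranking. In each cycle, `set_size` random sets of `set_size` units are drawn and each set is ranked cheaply. Only the i-th ranked unit of the i-th set then receives the costly measurement.

```python
import random
import statistics

def ranked_set_sample(population, set_size, cycles, rank_key, rng):
    """Draw a balanced ranked set sample (hypothetical helper for illustration).

    Each cycle draws `set_size` independent random sets of `set_size` units,
    ranks every set with the inexpensive `rank_key`, and keeps only the i-th
    ranked unit of the i-th set for the costly measurement.
    """
    measured = []
    for _ in range(cycles):
        for i in range(set_size):
            candidate_set = rng.sample(population, set_size)  # simple random set
            candidate_set.sort(key=rank_key)                   # cheap ranking step
            measured.append(candidate_set[i])                  # one real measurement
    return measured

# Example: perfect ranking on a synthetic population of 100 units.
rng = random.Random(1995)
population = list(range(100))
sample = ranked_set_sample(population, set_size=3, cycles=200,
                           rank_key=lambda v: v, rng=rng)
estimate = statistics.mean(sample)  # RSS estimator of the population mean
```

When the ranking is consistent, the mean of the measured units is unbiased for the population mean. It is typically more precise than a simple random sample with the same number of measurements, which is the sense in which the design is economical.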
                                      References

Gore, S. D., and Patil, G. P. (1994). Identifying extremely large values using composite sample
data. Environmental and Ecological Statistics, 1(3) (to appear).

Gore, S. D., Patil, G. P., Sinha, A. K., and Taillie, C. (1993). Certain multivariate considerations
in ranked set sampling and composite sampling designs. In Multivariate Environmental Statistics,
G. P. Patil and C. R. Rao, eds. North Holland, Amsterdam, pp. 121-148.

Gove, J. H., Patil, G. P., Swindel, B. K., and Taillie, C. (1994). Ecological diversity and forest
management. In Handbook of Statistics, Volume 12: Environmental Statistics, G. P. Patil and C.
R. Rao, eds. North Holland, Amsterdam, pp. 409-462.

Myers, W. L., Johnson, G. D., and Patil, G. P. (1994). Rapid mobilization of spatial/temporal
information in the context of natural catastrophes. Invited paper presented at the 1994 Spring
Statistical Meetings in Cleveland, Ohio. 1994 ASA Proceedings (to appear).

Myers, W. L., and Patil, G. P. (1994). Simplicity, efficiency, and economy in forest surveys.
Zeitschrift für Forstwesen (to appear).

Myers, W. L., Patil, G. P., and Taillie, C. (1994). Comparative paradigms for biodiversity
assessment. Invited paper at the IUFRO Symposium in Chiang Mai, Thailand. To appear in the
Proceedings Volume.

Patil, G. P., Gore, S. D., and Sinha, A. K. (1993). Environmental chemistry, statistical modeling,
and observational economy. In Environmental Statistics, Assessment, and Forecasting, C. R.
Cothern and N. P. Ross, eds. Lewis Publishers/CRC Press, Boca Raton, FL, pp. 57-97.

Patil, G. P., and Rao, C. R. (eds.) (1993). Multivariate Environmental Statistics. North-Holland,
Amsterdam. 596 pp.

Patil, G. P., and Rao, C. R. (eds.) (1994). Handbook of Statistics, Volume 12: Environmental
Statistics. North-Holland, Amsterdam. 927 pp.

Patil, G. P., Sinha, A. K., and Taillie, C. (1994). Ranked set sampling. In Handbook of
Statistics, Volume 12: Environmental Statistics, G. P. Patil and C. R. Rao, eds. North Holland,
Amsterdam, pp. 167-200.

-------
Patil, G. P., and Taillie, C. (1993). Environmental sampling, observational economy, and
statistical inference with emphasis on ranked set sampling, encounter sampling, and composite
sampling. In Bull. ISI, Proceedings of 49th Session, Firenze, Italy, pp. 295-312.

Patil, G. P., Taillie, C., and Talwalker, S. (1993). Encounter sampling and modelling in
ecological and environmental studies using weighted distribution methods. In Statistics for the
Environment, V. Barnett and K. F. Turkman, eds., Wiley, New York, pp. 45-69.

Thompson, S. (1994). Factors influencing the efficiency of adaptive cluster sampling. Technical
Report 94-0301, Center for Statistical Ecology and Environmental Statistics, Department of
Statistics, Pennsylvania State University, University Park, PA.

-------
Time Series Tutor - Beta Version

Nagaraj K. Neerchal et al.


The Time Series Tutor (TST) is an ambitious and, in our experience,
novel effort to introduce data-oriented scientists to the heart
of the problems associated with time series. This tutorial is
aimed at those who want a serious but accessible introduction to
the mechanics of time series modeling. The TST is a self-contained
computer program that allows users to proceed at their own pace. The
current version to be displayed still requires considerable work.

-------
                              Table 4

                    Satellite Intercalibration

                         David S. Crosby

     One of the most important problems in the use of satellite
data for the detection of climate trends is that of satellite
intercalibration. It is well known, for example, that different
instruments in the same series of satellites can have slightly
different characteristics. The same temperature-sensing channels
on successive satellites can differ by over 2.0 degrees Celsius.
These differences can lead to inconsistencies in the time series
and can make it difficult to use satellite data to detect climate
trends. The standard method for modeling these effects
in time series is intervention analysis. For many of the satellite
data sets this may be the best or only technique available.
     However, for some of the satellite data sets there is a
significant period of overlap between two successive
satellites. We examine the use of this overlap period to
intercalibrate the two instruments. The technique uses
empirical distribution functions. It requires very large sample
sizes. It also requires that the probability distribution of the signal
of interest be the same for the two satellites, and that the
measurements from both satellites be monotone functions of the same
signal. Examples of the technique will be presented.
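The abstract does not spell out the estimator. Under its stated assumptions (the same signal distribution for both satellites and monotone instrument responses), one natural reading is quantile matching. In that reading, a reading y from the second instrument maps to F_ref^{-1}(F_other(y)), with both empirical distribution functions estimated over the overlap period. The sketch below rests on that assumption and is not presented as the author's method. `intercalibrate` is a hypothetical name, and the function takes the smallest reference value whose empirical CDF reaches the target probability.

```python
from bisect import bisect_right

def intercalibrate(reference, other, readings):
    """Map readings from the `other` instrument onto the `reference` scale by
    matching empirical distribution functions from the overlap period:
    x = F_ref^{-1}(F_other(y)). Hypothetical helper for illustration."""
    ref = sorted(reference)
    oth = sorted(other)
    n_ref, n_oth = len(ref), len(oth)
    mapped = []
    for y in readings:
        # Number of overlap readings <= y, i.e. n_oth * F_other(y).
        count = bisect_right(oth, y)
        # Smallest rank k with k / n_ref >= count / n_oth (exact integer ceiling).
        k = -(-count * n_ref // n_oth)
        k = min(max(k, 1), n_ref)  # clamp to a valid 1-based rank
        mapped.append(ref[k - 1])
    return mapped

# An instrument reading 2 degrees warm is mapped back onto the reference scale.
corrected = intercalibrate([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0], [3.0, 5.0, 6.0])
```

The empirical CDFs converge to the true ones only as the overlap sample grows, which is consistent with the abstract's requirement of very large sample sizes.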

-------
                             ABSTRACT

      STATISTICAL QUALITY ASSURANCE: DATA QUALITY ASSESSMENT

                  John Warren & Thomas E. Dixon
               Office of Research and Development


     Data Quality Assessment (DQA) is the scientific and
statistical evaluation of data to determine whether the data are of the
right type, quality, and quantity to support their intended use.
DQA is the final phase of the Agency's recommended approach for data
collection: Planning (Data Quality Objectives [DQO]),
Implementation (Quality Assurance Project Plans [QAPP]), and
Assessment (DQA). It is possibly the hardest phase for
non-statisticians to apply. Guidance (Data Quality Assessment G-9) is
being developed that will help non-statisticians investigate some
of the statistical assumptions underlying any data collection
activity.

     Similar to the established DQO Process, the DQA Process
consists of iterative steps to investigate data:

               • Review DQOs and Sampling Design
               • Conduct Preliminary Data Review
               • Select the Statistical Test
               • Verify the Assumptions
               • Perform the Statistical Test

     Some of the steps require only elementary knowledge of
statistics; others can require quite extensive statistical
expertise. The Quality Assurance Management Staff offers the G-9
Guidance as a tool for non-statisticians to complement the
guidance documents for DQOs and QAPPs. The guidance is not intended to be a
comprehensive handbook on statistical quality assurance but rather
a primer that enables analysts and managers to interpret
statistical conclusions.

     The presentation outlines the Agency's position on data
collection activities, gives an overview of the contents of G-9,
and describes the direction of future work.

-------
GUIDANCE FOR ENVIRONMENTAL
   DATA QUALITY ASSESSMENT
            DRAFT
           EPA QA/G-9
             DRAFT
   United States Environmental Protection Agency
      Quality Assurance Management Staff

          Washington, DC 20460

-------
                   The 5 Steps of the Data Quality Assessment Process
1.       Review the Data Quality Objectives and Sampling Design: Review the DQO outputs
        to assure that they are still applicable.  If DQOs have not been developed, specify
        DQOs before evaluating the data (for environmental decisions, define the statistical
        hypothesis and specify tolerable limits on decision errors; for estimation problems,
        define an acceptable confidence or probability interval width). Review the sampling
        design and data collection documentation for consistency with the DQOs.

2.       Conduct a Preliminary Data Review:  Review quality assurance  reports, calculate basic
        statistical quantities and generate graphs of the data.  Use this information to learn
        about the structure of the data and identify  patterns, relationships, or potential
        anomalies.

3.       Select the Statistical Test:  Select the most  appropriate procedure for summarizing and
        analyzing the data, based on the preliminary data review.  Identify the key underlying
        assumptions that must hold for the statistical procedures to be valid.

4.       Verify the Assumptions of the Statistical Test:  Evaluate  whether the underlying
        assumptions hold, or whether departures are acceptable,  given the actual data and other
        information about the study.

5.       Perform the Statistical Test:  Perform the calculations required for the statistical test
        and document the inferences drawn as a result of these calculations.  If the design is to
        be used again, evaluate the performance of  the sampling design.

-------
              Overview 1:  Review DQOs and Sampling Design

Translate the data user's objectives into a statement of the primary statistical hypothesis.
•   If DQOs have not been developed, review sections B.2.1 and B.2.2 and Table 1-1, then
    develop a statement of the hypothesis based on the data user's objectives.
•   If DQOs were developed, translate the DQO Process outputs into a statement of the
    primary hypothesis that corresponds to the data user's decision.

Translate the data user's objectives into tolerable limits on the probability of committing Type
I or Type II decision errors.
•   If DQOs have not been developed, review section B.2.3 and document the data user's
    tolerable limits on decision errors.
•   If DQOs were developed, confirm that the data user's tolerable limits on decision errors
    were fully specified.

Review the sampling design and note any special features or potential problems.
•   Review the applicable parts of section C corresponding to the type of sampling design
    used for this study.
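To show how tolerable limits on Type I and Type II decision errors become a design quantity, the sketch below computes an approximate sample size for a one-sample test on a mean. It assumes a normal approximation with a known standard deviation, plus the 0.5·z² small-sample correction term that EPA DQO guidance commonly pairs with this formula. Treat the formula and the name `one_sample_size` as assumptions, and check them against the guidance before relying on the result.

```python
import math
from statistics import NormalDist

def one_sample_size(sigma, delta, alpha, beta):
    """Approximate number of samples for a one-sample test on a mean.

    sigma : assumed standard deviation of the measurements
    delta : width of the gray region (distance between the action level and
            the alternative value the design must detect)
    alpha : tolerable false positive (Type I) decision error rate
    beta  : tolerable false negative (Type II) decision error rate
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    z_beta = std_normal.inv_cdf(1 - beta)
    n = sigma**2 * (z_alpha + z_beta) ** 2 / delta**2 + 0.5 * z_alpha**2
    return math.ceil(n)  # round up to a whole number of samples

# Example: sigma = 1, gray region of width 1, alpha = 0.05, beta = 0.20.
n_needed = one_sample_size(1.0, 1.0, 0.05, 0.20)
```

Tightening either error limit, or narrowing the gray region relative to sigma, increases the required n. This is the trade-off the data user's limits must settle.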

-------
          Overview 2:  Conduct Preliminary Data Review

Review quality assurance reports.
•  Look for problems or anomalies in the implementation of the sample collection and
   analysis procedures.
•  Examine QC data for information that may be useful in verifying assumptions
   underlying the Data Quality Objectives, the Sampling and Analysis Plan, and the
   Quality Assurance Project Plans.

Calculate the statistical quantities.
•  Select appropriate measures of central tendency (Box 2-2) and dispersion (Box 2-4).
•  Consider calculating appropriate percentiles (Box 2-1) and measures of
   distributional shape (Box 2-6).
•  If data involve two variables,  calculate the Pearson correlation coefficient (Box 2-7).

Display the data using graphical representations.
•  Select graphical representations from section C that illuminate the structure of the
   data set and highlight assumptions underlying the Data Quality Objectives, the
   Sampling and  Analysis Plan,  and the Quality Assurance Project Plans.
•  Use a variety of graphical representations that examine different features of the data set.
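The statistical quantities listed in this overview can be sketched with the Python standard library. The helper names are hypothetical. The quartile convention (`method="inclusive"` in `statistics.quantiles`) is an assumption and may differ from the percentile definition in Box 2-1.

```python
import math
import statistics

def summarize(data):
    """Preliminary-review statistics for one variable (illustrative helper)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "n": len(data),
        "mean": statistics.mean(data),      # central tendency
        "median": median,                   # central tendency, robust to outliers
        "std_dev": statistics.stdev(data),  # sample (n - 1) standard deviation
        "q1": q1,                           # 25th percentile
        "q3": q3,                           # 75th percentile
        "iqr": q3 - q1,                     # interquartile range (dispersion)
    }

def pearson_r(x, y):
    """Pearson correlation coefficient of paired samples x and y."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

summary = summarize([1, 2, 3, 4, 5])
```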

-------
                   Overview 3: Select the Statistical Test

Select the statistical hypothesis test based on the data user's objectives and the results of
the preliminary data review.
•   If the problem involves comparing study results to a fixed threshold, such as a
    regulatory standard, consider the hypothesis tests in section B.1.
•   If the problem involves comparing two populations, such as comparing data from two
    different locations or processes, then consider the hypothesis tests in section B.2.

Identify the assumptions underlying the statistical test.
•   List the key underlying assumptions of the statistical hypothesis test, such as
    distributional form, dispersion, independence, or others as applicable.
•   Note any sensitive assumptions where relatively small deviations could jeopardize the
    validity of the test results.
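For the fixed-threshold case, the classical one-sample t-test can be sketched as follows. The sketch makes several assumptions. It takes the hypotheses as H0: mean ≤ C versus HA: mean > C, and it treats the data as approximately normal and independent, which are the kinds of assumptions Step 4 is meant to check. The critical value is passed in from a t table such as Table A-7, because the Python standard library has no Student's t quantile function. The function names are hypothetical.

```python
import math
import statistics

def one_sample_t(data, threshold):
    """t statistic comparing the sample mean with a fixed threshold C."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample (n - 1) standard deviation
    return (xbar - threshold) / (s / math.sqrt(n))

def mean_exceeds_threshold(data, threshold, t_critical):
    """One-sided test of H0: mean <= C versus HA: mean > C.

    `t_critical` is the (1 - alpha) quantile of Student's t distribution with
    n - 1 degrees of freedom, looked up in a t table.
    """
    return one_sample_t(data, threshold) > t_critical

# Example: five measurements against a threshold of 1.0 at alpha = 0.05 (df = 4).
data = [1.0, 2.0, 3.0, 4.0, 5.0]
reject_h0 = mean_exceeds_threshold(data, 1.0, t_critical=2.132)
```

With n = 5, the one-sided 5% critical value is t(0.95, 4) = 2.132. A sensitive assumption here is normality: with strongly skewed data, a nonparametric alternative such as the Wilcoxon signed rank test may be preferable.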

-------
   Overview 4:  Verify the Assumptions of the Statistical Test

Determine approach for verifying assumptions.
• Identify any strong graphical evidence from the preliminary data review.
• Review (or develop) the statistical model for the data.
• Select the tests for verifying assumptions.

Perform tests of assumptions.
• Adjust for bias if warranted.
• Perform the calculations required for the tests selected in activity 4.1.

If necessary, determine corrective actions.
• Determine whether data transformations will correct the problem.
• If data are missing, explore the feasibility of using theoretical justification
  or collecting new data.
• Consider robust  procedures or nonparametric hypothesis tests.
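One of the normality checks named in this guidance is Filliben's statistic. It is the correlation between the ordered data and the approximate medians of the standard normal order statistics. The sketch below uses Filliben's (1975) approximation to the uniform order-statistic medians, and the function name is hypothetical. The result must still be compared with a critical value such as those in Table A-2; small values indicate departures from normality.

```python
import math
from statistics import NormalDist, mean

def filliben_statistic(data):
    """Normal probability plot correlation coefficient (Filliben's statistic).

    Correlates the ordered data with approximate medians of the standard
    normal order statistics; values well below 1 suggest non-normality.
    Meaningful only for at least 3 observations.
    """
    x = sorted(data)
    n = len(x)
    # Filliben's approximation to the medians of the uniform order statistics.
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    # Transform to standard normal order-statistic medians.
    std_normal = NormalDist()
    z = [std_normal.inv_cdf(p) for p in m]
    mx, mz = mean(x), mean(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz / math.sqrt(sxx * szz)

symmetric_r = filliben_statistic(list(range(1, 11)))
skewed_r = filliben_statistic([1, 1, 1, 1, 100])
```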

-------
                  Overview 5:  Perform the Statistical Test

Perform the calculations for the statistical hypothesis test.
•   Perform the calculations and document them clearly.
•   If anomalies or outliers are present in the data set, perform the calculations with and
    without the questionable data.

Evaluate the statistical test results and draw conclusions.
•   If the null hypothesis is rejected, then draw the conclusions and document the
    analysis.
•   If the null hypothesis is not rejected, verify whether the tolerable limits on false
    negative decision errors have been satisfied. If so, draw conclusions and document
    the analysis; if not, determine corrective actions, if any.

Evaluate the performance of the sampling  design if the design is to be used again.
•   Evaluate the statistical power of the design over the full range of parameter values;
    consult a statistician as necessary.
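The statistical power of a design over a range of parameter values can be approximated with the normal distribution. The sketch below assumes one-sided hypotheses H0: mean ≤ C versus HA: mean > C, a known standard deviation, and a z-test approximation to the t-test. The function name and the example design (C = 10, sigma = 2, n = 8) are hypothetical.

```python
import math
from statistics import NormalDist

def power_one_sided(mu, threshold, sigma, n, alpha):
    """Approximate probability of rejecting H0: mean <= threshold when the
    true mean is `mu` (z-test approximation with known sigma)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    shift = (mu - threshold) * math.sqrt(n) / sigma
    return std_normal.cdf(shift - z_alpha)

# Power curve for a hypothetical design: C = 10, sigma = 2, n = 8, alpha = 0.05.
curve = [(mu, round(power_one_sided(mu, 10.0, 2.0, 8, 0.05), 3))
         for mu in (10.0, 10.5, 11.0, 11.5, 12.0)]
```

At the threshold itself, the power equals alpha. If the power at the alternative value the data user cares about falls short of 1 - beta, the design did not meet the tolerable false negative limit.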

-------
                               TABLE OF CONTENTS

 INTRODUCTION

       Purpose and Overview
              Intended Audience
              Organization of This Guidance

       Overview of the DQA Process
              The 5 Steps of the Data Quality Assessment Process
              DQA and the Data Life Cycle

       Background
              Errors Due to Imperfect Sampling and Measurement
              Decision Errors
              Hypothesis Testing
              Uncertainty vs. Inconclusive

STEP 1: REVIEW DQOs AND THE SAMPLING DESIGN

       Overview

       A.     Activities
              Translate the data user's objectives into a statement of
                the primary statistical hypotheses
              Translate the data user's objectives into tolerable limits on the
                probability of committing Type I or Type II decision errors
              Review the sampling design and note any special features
                or potential problems

       B.     The Data Quality Objectives (DQO) Process
              B.1     Relationship Between DQOs and DQA
              B.2     Developing DQOs Retrospectively
                     B.2.1  Defining the Background of the Data Collection Retrospectively
                     B.2.2  Developing the Statement of Hypotheses
                     B.2.3  Specifying Tolerable Limits on Decision Errors

       C.     Designs for Sampling Environmental Media in
              Space and Time
              C.1     Authoritative Sampling Versus Probability Sampling
              C.2     Probability Sampling
                     C.2.1  Simple Random Sampling
                     C.2.2  Sequential Random Sampling and Double Sampling:
                              Variations on Simple Random Sampling
                     C.2.3  Systematic Samples
                     C.2.4  Stratified Samples
                     C.2.5  Other Probability Samples
                     C.2.6  Compositing and Subsampling of Specimens

-------
       D.     References

STEP 2: CONDUCTING A PRELIMINARY DATA REVIEW

       Overview
       A.     Activities
              Review quality assurance reports
              Calculate statistical quantities
              Graph the data

       B.     Statistical Quantities
              B.1    Measure of Relative Standing: Percentiles
              B.2    Measures of Central Tendency
              B.3    Measures of Dispersion
              B.4    Measures of Shape
              B.5    Measures of Association

       C.     Graphical Representations
              C.1    Stem-and-Leaf Diagram
              C.2    Histogram/Frequency Plots
              C.3    Box and Whiskers Plots
              C.4    Ranked Data Plot
              C.5    Quantile Plot
              C.6    Normal Probability Plot (Quantile-Quantile Plots)
              C.7    Plots for Temporal Data
                    C.7.1   Time Plot
                    C.7.2   Plot of the Autocorrelation Function (Correlogram)
                    C.7.3   Other Temporal Graphical Representations
                    C.7.4   Multiple Observations Per Time Period
              C.8    Plots for Spatial Data
                    C.8.1   Posting Plots
                    C.8.2   Symbol Plots
                    C.8.3   Other Spatial Graphical Representations
              C.9    Plots for Two or More Variables
                    C.9.1   Scatter Plot
                    C.9.2   Extensions of the Scatter Plot
                    C.9.3   Empirical Quantile-Quantile Plot

       D.     References

STEP 3: SELECT THE STATISTICAL TEST

       Overview

       A.     Activities

-------
        B.     Hypothesis Tests
               B.1    One-Sample Tests
                      B.1.1   Tests for a Mean
                             B.1.1.1        The One-Sample T-Test
                             B.1.1.2        The Wilcoxon Signed Rank Test
                      B.1.2   Tests for a Proportion or Percentile
                             B.1.2.1        The One-Sample Proportion Test
                      B.1.3   Tests for a Median
               B.2    Two-Sample Tests
                      B.2.1   Comparing Two Means
                             B.2.1.1        Two-Sample T-Test for Comparing
                                           Population Means
                             B.2.1.2        Wilcoxon Rank Sum Test
                      B.2.2   Comparing Two Proportions or Percentiles
                             B.2.2.1        Two-Sample Test for Proportions
                      B.2.3   Comparing Two Medians
                      B.2.4   Other Two-Sample Tests
               B.3    References

 STEP 4: VERIFY THE ASSUMPTIONS OF THE STATISTICAL TEST

        Overview

        A.      Activities
               Determine approach for verifying assumptions
               Perform tests of assumptions
               Determine corrective actions (if any)

        B.      Tests for Normality
               B.1    Background
               B.2    Graphical Methods
               B.3    Shapiro-Wilk Test for Normality (the W test)
               B.4    Extensions of the Shapiro-Wilk Test (Filliben's Statistic)
               B.5    Coefficient of Variation
               B.6    Coefficient of Skewness/Coefficient of Kurtosis Tests
               B.7    Range Tests
               B.8    Goodness-of-Fit Tests
               B.9    Recommendations
               B.10   References

       C.      Tests for Trends
               C.1     Background
               C.2     Estimating Trends
                     C.2.1   Regression
                     C.2.2   Sen's Slope Estimator
                     C.2.3   Seasonal Kendall Slope Estimator
              C.3    Tests for Trends
                     C.3.1   Mann-Kendall Test
                     C.3.2   Seasonal Kendall Test and Sen's Test

-------
              C.4    Tests for Homogeneity of Trends
              C.5    References
       D.     Outliers
              D.1    Background
              D.2    Statistical Tests for Outliers
                     D.2.1   Selection of a Statistical Test
                     D.2.2   Extreme Value Test (Dixon's Test)
                     D.2.3   Discordance Test
                     D.2.4   Walsh's Tests
                     D.2.5   Rosner's Test
                     D.2.6   Special Cases and Other Sources of Information
              D.3    References
       E.     Tests for Dispersion
              E.1     Confidence Intervals for a Single Variance
              E.2     F-Test for the Equality of Two Variances
              E.3     Bartlett's Test for the Equality of Two or More Variances
              E.4     Levene's Test for the Equality of Two or More Variances
              E.5     References
       F.     Transformations

STEP 5: PERFORM THE STATISTICAL TEST

       Overview

       Activities
              Perform the calculations for the statistical hypothesis test
              Evaluate the statistical test results and draw the study conclusions
              Evaluate the performance of the sampling design if the design is to
                be used again

APPENDIX A: STATISTICAL TABLES
                               LIST OF TOOL BOXES

Box 2-1:       Directions for Calculating the Measure of Relative Standing
              (Percentiles) with an Example
Box 2-2:       Directions for Calculating the Measures of Central Tendency
Box 2-3:       Example Calculations of the Measures of Central Tendency
Box 2-4:       Directions for Calculating the Measures of Dispersion
-------
Box 2-7:       Directions for Calculating the Correlation Coefficient
               (the Pearson Correlation Coefficient)
Box 2-8:       Example Calculations of the Correlation Coefficient
Box 2-9:       Directions for Generating a Stem and Leaf Plot
Box 2-10:      Example of Generating a Stem and Leaf Diagram
Box 2-11:      Directions for Generating a Histogram and a Frequency Plot
Box 2-12:      Example of Generating a Histogram and a Frequency Plot
Box 2-13:      Directions for Generating a Box and Whiskers Plot
Box 2-14:      Example of a Box and Whiskers Plot
Box 2-15:      Directions for Generating a Ranked Data Plot
Box 2-16:      Example of Generating a Ranked Data Plot
Box 2-17:      Directions for Generating a Quantile Plot
Box 2-18:      Example of Generating a Quantile Plot
Box 2-19:      Directions for Constructing a Normal Probability Plot
Box 2-20:      Example of Constructing a Normal Probability Plot
Box 2-21:      Directions for Generating a Time Plot and an Example
Box 2-22:      Directions for Constructing a Correlogram
Box 2-23:      Example of Generating a Correlogram
Box 2-24:      Directions for Generating a Posting Plot and an Example
Box 2-25:      Directions for Generating a Symbol Plot and an Example
Box 2-26:      Directions for Generating a Scatter Plot and an Example
Box 2-27:      Directions for Constructing an Empirical Q-Q Plot
Box 2-28:      Example of Constructing an Empirical Q-Q Plot
Box 4-1:       Directions for Filliben's Statistic (Normal Probability Plot
               Correlation Coefficient)
Box 4-2:       Example of Filliben's Statistic (Normal Probability Plot Correlation
               Coefficient)
Box 4-3:       Directions for the Coefficient of Variation Test and an Example
Box 4-4:       Directions for Coefficient of Skewness and Kurtosis Tests
Box 4-5:       Directions for Studentized Range Test
Box 4-6:       Example of Studentized Range Test
Box 4-7:       Directions for Geary's Test
Box 4-8:       Example of Geary's Test
Box 4-9:       Directions for the Extreme Value Test (Dixon's Test)
Box 4-10:      An Example of the Extreme Value Test (Dixon's Test)
Box 4-11:      Directions for the Discordance Test
Box 4-12:      An Example of the Discordance Test
Box 4-13:      Directions for Walsh's Test for Large Sample Sizes
Box 4-14:      Directions for Rosner's Test for Outliers
Box 4-15:      An Example of Rosner's Test for Outliers
Box 4-16:      Directions for Constructing Confidence Intervals and Confidence
               Limits for the Sample Variance and Sample Standard Deviation
Box 4-17:      Directions for Calculating an F-Test to Compare Two Variances
Box 4-18:      Directions for Transforming Data and an Example

-------
                                  LIST OF FIGURES
                                                                                      Page
Figure 0-1.     DQA in the Context of the Data Life Cycle     	   0-4
Figure 0-2.     Environmental Decisions and Potential Errors	  0-6
Figure 2-1.     Example of a Histogram	  2-16
Figure 2-2.     Example of a Frequency Plot	  2-16
Figure 2-3.     Example of a Box and Whiskers Plot for Symmetric Data	  2-19
Figure 2-4.     Example of a Ranked Data Plot	  2-22
Figure 2-5.     Example of a Skewed Quantile Plot	  2-24
Figure 2-6.     Example of a Normal Probability Plot	  2-27
Figure 2-7.     Example of a Time Plot	  2-32
Figure 2-8.     Example of a Correlogram	  2-33
Figure 2-9.     Example of a Posting Plot	  2-37
Figure 2-10.   Example of a Symbol Plot	  2-38
Figure 2-11.   Example of Graphical Representations of Multiple Variables	  2-41
Figure 2-12.   Example of a Scatter Plot	  2-42
Figure 2-13.   Example of a Coded Scatter Plot	  2-44
Figure 2-14.   Example of a Parallel Coordinates Plot	  2-44
Figure 2-15.   Example of a Matrix Scatter Plot	  2-45
Figure 4-1.     Graph of a Standard Normal Distribution	  4-52



                            LIST OF OVERVIEW BOXES

Overview 1:   Review DQOs and Sampling Design	  1-1
Overview 2:   Conduct Preliminary Data Review	  2-1
Overview 3:   Select the Statistical Test	  3-1
Overview 4:   Verify the Assumptions of the Statistical Test	  4-1
Overview 5:   Perform the Statistical Test	  5-1



                                   LIST OF TABLES
                                                                                      Page
Table 1-1.     Commonly Used Statements of Statistical Hypotheses	  1-10
Table 4-1.     Data for Examples	  4-6
Table 4-2.     Tests for Normality	  4-7
Table 4-3.     Summary of Recommendations for Selecting a Statistical Test for Outliers	  4-21
Table A-1.     Cumulative Standard Normal Distribution	A-2
Table A-2.     Critical Values of Filliben's Statistic	A-3
Table A-3.     Critical Values for the Studentized Range Test	A-4
Table A-4.     Critical Values for the Extreme Value Test	A-5
Table A-5.     Critical Values for the Discordance Test	A-6
Table A-6.     Approximate Critical Values for Rosner's Test	A-7
Table A-7.     Critical Values of Student's t Distribution	A-10
Table A-8.     Quantiles of the Wilcoxon Signed Ranks Test Statistic	A-11
Table A-9.     Critical Values for the Rank Sum Test	A-12
Table A-10.   Critical Values of the Chi-Square Distribution	A-13

-------
EVALUATION FORM





          ELEVENTH ANNUAL EPA CONFERENCE ON STATISTICS

                   FEBRUARY 27 - MARCH 2, 1995

1.  Overall Conference Evaluation

                                                     Very      Some      Limited
    Questions (please check one box)                 Much      Extent    Extent
    Did you broaden your EPA contacts?                 [ ]       [ ]       [ ]
    Did you update your current knowledge?             [ ]       [ ]       [ ]
    Did you find exposure to new material?             [ ]       [ ]       [ ]
    Did you gain more agency-wide perspective?         [ ]       [ ]       [ ]
    Were you able to exchange technical methods?       [ ]       [ ]       [ ]
    Were you able to discuss problems and concerns?    [ ]       [ ]       [ ]

-------
2.  Session Evaluation

                                                Highly    Fairly    Not Very
    Questions (please check one box)            Relevant  Relevant  Relevant
    Statistical Software and the Single           [ ]       [ ]       [ ]
      Statistician
    Keynote Address                               [ ]       [ ]       [ ]
    Featured Speaker                              [ ]       [ ]       [ ]
    Statistical Quality Assurance:                [ ]       [ ]       [ ]
      Data Quality Assessment
    Environmental Monitoring: New Answers to      [ ]       [ ]       [ ]
      Old Questions and Spatial Sampling
    Tutorial: Survival Analysis                   [ ]       [ ]       [ ]
    Statistical Methods for Combining             [ ]       [ ]       [ ]
      Environmental Information and
      Environmental Research at NISS
    Tutorial: Publishing on the Internet          [ ]       [ ]       [ ]
    Emerging Issues in Environmental              [ ]       [ ]       [ ]
      Statistics I
    Pesticides in the Diets of Infants and        [ ]       [ ]       [ ]
      Children: Exposure and Risk Estimation
      Using Monte Carlo Simulation
    Statistical Policy Advisory Committee         [ ]       [ ]       [ ]
    Collaborative Research I                      [ ]       [ ]       [ ]
    Science and Information Management:           [ ]       [ ]       [ ]
      Present and Future Perspectives
    Collaborative Research II                     [ ]       [ ]       [ ]
    Strategic Directions in Information           [ ]       [ ]       [ ]
      Resources Management at EPA
    New Sources of Environmental Data:            [ ]       [ ]       [ ]
      Testing the Latest Aerial and Satellite
      Sensing at the Field of Dreams

-------
2.  Session Evaluation (continued)

                                                Highly    Fairly    Not Very
    Questions (please check one box)            Relevant  Relevant  Relevant
    Geographic Visualization of                   [ ]       [ ]       [ ]
      Environmental Quality
    A Case Study of Surface Water Conditions      [ ]       [ ]       [ ]
      in the US/Mexico Border Area
    Building Environmental Data Management        [ ]       [ ]       [ ]
      and Analytical Capabilities in the
      Great Lakes Region and the Baltic Republics
    Environmental Statistics in the               [ ]       [ ]       [ ]
      Water Office I
    Alternative Models for Analysis of            [ ]       [ ]       [ ]
      Composite Environmental Samples
    Estimates of Fish Consumption Rates           [ ]       [ ]       [ ]
      in the United States
    Water Quality Based Effluent Limitations      [ ]       [ ]       [ ]
      and the Statistical Properties of Low
      Concentration Measurements in
      Analytical Chemistry
    Benchmark Dose in an Acute Toxicity Study     [ ]       [ ]       [ ]
    Statistical Analysis of Risk and              [ ]       [ ]       [ ]
      Performance Results
    Using Relative Data Quality Indicators        [ ]       [ ]       [ ]
      of Precision and Bias
    Environmental Statistics in the               [ ]       [ ]       [ ]
      Water Office II
      (continued from earlier session)
    Long Island Breast Cancer Study Project:      [ ]       [ ]       [ ]
      Environmental Statistics Research Issues
    Poster Session                                [ ]       [ ]       [ ]

-------
3.  What were the greatest strengths of the conference?  What
    aspects did you like the most?
4.  What were the greatest weaknesses of the conference?  What
    aspects and sessions did you like the least?
5.  Would you be interested in other training sessions that would
    introduce you to a new development in applied statistical
    methodology?

    Yes 	                No                    Unsure

    Suggestions for topics:
6.  Are you planning to attend next year's conference on
    statistics?

    Yes                      No 	             Unsure
7.  Other comments:

-------
NOTES

-------