The 11th EPA Conference on Statistics
February 27, 1995 - March 2, 1995
George Washington Inn and Conference Center
Williamsburg, Virginia
EPA STATISTICS
WATER QUALITY
-------
Welcome to the Eleventh Annual EPA Conference on Statistics
We are delighted to welcome you to Williamsburg for the 11th
Annual EPA Conference on Statistics. This conference has evolved
considerably from its origin one summer week in Raleigh. The
conference now covers a wider scope of information and data
topics compared with the "purer statistic" topics of years of
old. This year we have added some new features. We are having a
morning of collaborative research reporting to learn the progress
of several universities working under EPA grants. We are also
initiating an invited session to honor some success stories among
EPA statisticians. You'll also note the addition of an informal
session to discuss the ins and outs of various statistical
software packages. These new items along with our regular
sessions, featured speakers, tutorials, and poster session should
provide something for everyone. I want to thank both the
Conference Planning Committee and our Arrangements Committee for
their very hard efforts to plan and organize such a conference.
I am sure we will uncover a few t's that haven't been crossed,
but at least we dotted the i's. Do let us know how we can
improve any aspect of the conference.
Barry D. Nussbaum
Conference Chairman
1995 Conference Planning Committee
Jacquelyn Ager
Ruth Allen
Barbara Parzygnat
Susan Brunenmeister
Rick Cothern
Barnes Johnson
Henry Kahn
Mel Hollander
Elizabeth Margosches
John Warren
Pat Wilkinson
Joan Bundy
Margaret Conomos
Nicole Cortina
1995 Arrangements Committee
Pepi Lacayo
Patricia Little
Pat Wilkinson
-------
Appeared in Amstat News, January 1995, Number 216, pp. 33-34.
Section on Statistics and the
Environment
We Meet, We Discuss,
We Move Forward
Barry D. Nussbaum
Program Chair, 1995
U.S. Environmental Protection Agency
Happy New Year! While the year is
new and young to you, many of us have
already been planning events to take
place in 1995. This includes participation
in many forums, colloquia, and meetings.
After all, the marriage of statistics and
the environment is not just confined to
the American Statistical Association.
Using the discipline of statistics toward
solving environmental problems is
indeed an "In" topic for many conferences.
The section is active in promoting
the topic at every opportunity. Frequently
we suggest topics and provide speakers
and support to other organizations in a
never-ending effort to pass the word
around. The more we meet and discuss,
the more statistics becomes embedded
in the quest for environmental solutions.
Following is a sampler of environmental
statistics conferences and sessions
within other conferences for 1995. In
many cases, contacts are given so you
can volunteer for sessions, papers,
workshops, etc. Obviously, the success
of these meetings depends on our individual
involvement. In general, I've listed
conferences that are not in Amstat News.
Add these to the list, and you'd never
have a day in the office.
First, a word from our sponsor. As
program chair, I get to advertise our own
conference first! This is not unbiased,
but rather a BNBBB estimate. This is Barry
Nussbaum's Best Biased Blandishment. We are still
seeking creative ideas for papers, special
contributed sessions, roundtables, and
workshops for the annual Joint Statistical
Meeting in Orlando. This will occur
August 13-17 at the Walt Disney World
Dolphin and Swan, two of the better
creatures for which we thank the Section
on Statistics and the Environment. You
can reach me with your suggestions at
(202) 260-1493, or by fax at (202) 260-4968.
E-mail also works:
NUSSBAUM.BARRY@EPAMAIL.EPA.GOV.
Sorry, no collect calls and no 800 number,
but our operators are standing by
24 hours a day. We look forward to seeing
you in sunny Florida.
And now for the sampler on what
1995 has to offer for Statistics and the
Environment.
Depending on ASA's mailing schedule,
this either will be a good conference or
was a good conference. ASA's Winter
meeting is in Research Triangle Park, NC
on January 6-8. Larry Cox (919-541-2648)
of EPA is organizing a session on
Statistics in Environmental Science. So
either look for the proceedings or get to
the nearest airport and attend!
On March 26-29, our sister organization,
ENAR, will host its spring meeting
in Birmingham, Alabama. Two invited
sessions have been arranged by our section.
Charles Davis (702.4->h-K094) is
organizing "Regulatory Statistics for
Environmental Contamination," and Bimal
Sinha (410-193*2147) has put together
"Statistical Methods for
Analysis."
Washington, D.C. will be the site for
the May 17-19 meeting of the Waste Policy
Institute and Air and Waste Management
Association Conference "Chal-
lcii{.4y8y) Then quickly
travel to Merida, Mexico, for the
SPRUCE III conference on December 11-13.
The keynote paper will be delivered
by both a former section chairman (Phil
Ross) and a future one (Larry Cox). For
this one, you may want to spruce up on
your Spanish.
That's only a sample. The universe is
much larger. But you may certainly infer
from this sample that the section is active
and the topics are very current. We hope
to see you and have you participate at
several of these important meetings in
1995. Make this a resolution you will
keep!
mem is a planned session on statistical
aspects of testing for compliance with
-------
AGENDA
-------
11th Annual EPA Conference on Statistics
AGENDA
MONDAY, FEBRUARY 27
3:30-5:30 Registration and Check-in Mt. Vernon-C
5:00-6:00 Statistical Software and the Single Mt. Vernon-B
Statistician
Informal Discussion Group on
Statistical Software
Elizabeth Margosches and Susan
Brunenmeister
TUESDAY, FEBRUARY 28
8:30-8:40 Welcoming Remarks Mt. Vernon-A
Barry Nussbaum
8:40-8:45 Introduction of Speakers
Phil Ross
8:45-9:30 Keynote Address
Lynn Goldman, Assistant
Administrator for Office of
Prevention, Pesticides and Toxic
Substances
9:30-10:15 Featured Speaker
William F. Raub, Science Advisor to
the Administrator - Director,
National Center for Extramural
Research and Quality Assurance
10:15-10:30 Break
-------
10:30-11:45
Statistical Quality Assurance: Data
Quality Assessment
John Warren
Tom Dixon
Mt.Vernon-A
Environmental Monitoring: New
Answers to Old Questions and
Spatial Sampling
George Flatman
Evan Englund
Wakefield
11:45-1:15
Lunch
1:15-2:30 Tutorial: Survival Analysis Mt.Vernon-B
Lawrence Leemis, College of William
and Mary
Statistical Methods for Combining
Environmental Information and
Environmental Research at NISS
Larry Cox and Jerry Sacks
Wakefield
2:30-2:45 Break
2:45-4:00 Tutorial: Publishing on the
Internet
Chap Gleason
Emerging Issues in Environmental
Statistics-I
Ruth Allen, Organizer
Pesticides in the Diets of Infants
and Children: Exposure and Risk
Estimation Using Monte Carlo
Simulation
John Peter Wargo, Yale University
Mt.Vernon-B
Wakefield
4:00-4:15
Break
-------
-------
4:15-5:30 Statistical Policy Advisory
Committee
Mt.Vernon-B
WEDNESDAY, MARCH 1
8:45-10:15
Collaborative Research I
Dan Carr, George Mason University
G.P. Patil, Penn State University
Science and Information Management:
Present and Future Perspectives
Joe Abe and Nathan Wilkes
Mt.Vernon-A1
Wakefield
10:15-10:30
Break
10:30-11:45
Collaborative Research II
Gina Papush, UMBC
Jim Lee, Nancy Flournoy, David
Crosby, American University
Strategic Directions in Information
Resources Management at EPA
Mark Day
New Sources of Environmental Data:
Testing the Latest Aerial and
Satellite Sensing at the Field of
Dreams
Liz Porter
Mt.Vernon-A1
Wakefield
11:45-1:30
Lunch
1:30-3:00 Geographic Visualization of
Environmental Quality
Mt.Vernon-A1
-------
A Case Study of Surface Water
Conditions in the US/Mexico Border
Area
Judy Calem, Nicole Cortina, Joan
Crawford, Doug Freeman, Avi
Goldscheider, and Ron Shafer
Lewis Summers and T. Nigo: Martin
Marietta
Building Environmental Data
Management and Analytical
Capabilities in the Great Lakes
Region and the Baltic Republics
Steve Goranson
1:30-3:00 Environmental Statistics in the Wakefield
Water Office - I
Henry D. Kahn, Organizer
1:30 Alternative Models for Analysis of
Composite Environmental Samples
Henry D. Kahn, George W. Zipf, and
Alan Unger
2:00 Estimates of Fish Consumption Rates
in the United States
Helen Jacobs, Henry D. Kahn, and
Kathleen Stralka
2:30 Benchmark Dose in an Acute
Toxicity Study
Mary A. Marion
3:00-3:15 Break
3:15 Water Quality Based Effluent
Limitations and the Statistical
Properties of Low Concentration
Measurements in Analytical
Chemistry
Chuck White and Henry D. Kahn
-------
3:15-5:00 Statistical Analysis of Risk and Mt.Vernon-B
Performance Results
Bimal Sinha, UMBC, Organizer
Jon Helton, Sandia National
Laboratories
Tim Margulies, EPA
Ina Alterman, National Research
Council, Discussant
Using Relative Data Quality Mt.Vernon-A2
Indicators of Precision and Bias
Don Miller, EPA, Region VII
Environmental Statistics in the
Water Office - II (Continued from Wakefield
earlier session)
Long Island Breast Cancer Study
Project: Environmental Statistics Mt.Vernon-A1
Research Issues
G. Iris Obrams, M.D. Ph.D
Director, Long Island Breast Cancer
Program; Chief, Extramural Programs
Branch, National Cancer Institute
5:00-6:00 Poster Session
Barbara Parzygnat, Organizer
Terence Fitz-Simons, Co-Organizer Mt.Vernon-C
David Mintz
Jerry Akland
Mary Marion
David Crosby
Nancy Flournoy
Jim Lee
Don Miller
Lewis Summers
-------
Thursday, March 2
8:30-9:45 Tutorial: Time Series Basics Mt. Vernon-A
Pepi Lacayo
Using Epidemiological Data to Wakefield
Examine Statistical Models
Elizabeth Margosches, Organizer
Assessment of USEPA IEUBK Model
Prediction of Elevated Blood Levels
Karen Hogan
Human Experiences for Judging
Predictions From Animal Cancer
Models
Cheryl Siegel Scott
9:45-10:00 Break Mt. Vernon-A
Featured Award Winning
Presentations - Acknowledgements
to EPA Statisticians
Mel Hollander, Organizer
10:00-12:15 Environmental Tobacco Smoke
Statistics: Industry vs. EPA From
an EPA Point of View
Steven P. Bayard
Jennifer Jinot
A Book is Born
Wayne Ott
-------
REGISTRANTS
-------
The Eleventh Annual EPA Conference on Statistics
The George Washington Inn and Conference Center
Williamsburg, Virginia - February 27 - March 2, 1995
Jacquelyn J. Ager
OPPE
Phone: (202) 260-5971
Fax: (202) 260-4968
Gerald Akland
ORD
Phone: (919) 541-4885
Fax: (919) 541-1496
Derry Allen
OPPE
Phone: (202) 260-4028
Fax: (202) 260-0275
Ruth Allen
NCI
Phone: (301) 496-9600
Fax: (301) 402-4279
Ina Alterman
NRC
Phone: (202) 334-2748
Fax: (202) 334-3077
Roch Baamonde
Region 2
Phone: (212) 264-3052
Fax: (212) 264-9695
Steven P. Bayard
ORD
Phone: (202) 260-5722
Fax: (202) 260-3803
Jeff Beaubier
OPPTS
Phone: (202) 260-2263
Fax: (202) 260-1279
Dorothy Bertino
ERL
Phone: (405) 436-8681
Fax: (405) 436-8529
Martin Brossman
OW
Phone: (202) 260-7023
Fax: (202) 260-1977
Jim Brown
OSW
Phone: (703) 308-8656
Fax: (703) 308-8609
Susan Brunenmeister
OA
Phone: (202) 260-0246
Fax: (202) 260-0200
Lori Brunsman
OW
Phone: (703) 305-5453
Fax: (703) 308-2902
Joan Bundy
OPPE
Phone: (202) 260-2680
Fax: (202) 260-4968
Richard T. Burnett
Health Canada
Phone: (613) 957-1877
Fax: (613) 957-4546
Judy Calem
OPPE
Phone: (202) 260-3638
Fax: (202) 260-4968
Dan Carr
George Mason Univ.
Phone: (703) 993-1671
Fax: (703) 993-1700
Margaret Conomos
OPPE
Phone: (202) 260-3958
Fax: (202) 260-4968
-------
Nicole Cortina
OPPE
Phone: (202) 260-0998
Fax: (202) 260-4968
Rick Cothern
OPPE
Phone: (202) 208-4376
Fax: (202) 208-4867
Larry Cox
AREAL
Phone: (919) 541-2648
Fax: (919) 541-7588
John P. Creason
EPA
Phone: (919) 541-2598
Fax: (919) 541-5394
David S. Crosby
American University
Phone: (202) 885-3127
Fax: (202) 885-3155
J. Michael Davis
ORD/OHEA
Phone: (919) 541-4162
Fax: (919) 541-0245
Mark Day
OARM
Phone: (202) 260-8672
Fax: (202) 260-3923
Kim Devonald
OPPE
Phone: (202) 260-4904
Fax: (202) 260-4903
Thomas E. Dixon
ORD-NCERQA
Phone: (202) 260-5780
Fax: (202) 260-4346
Donald L. Doerfler
HERL
Phone: (919) 541-7741
Fax: (919) 541-5394
Evan Englund
ORD/EMSL-LV
Phone: (702) 798-2248
Fax: (702) 798-2107
Gloria C. Feeney
OPPTS
Phone: (703) 305-7436
Fax: (703) 305-6309
Bernice Fisher
OPPTS
Phone: (703) 305-5959
Fax: (703) 305-5453
Terence Fitz-Simons
OAQPS
Phone: (919) 541-0889
Fax: (919) 541-1903
George T. Flatman
ORD/EMSL-LV
Phone: (702) 798-2528
Fax: (702) 798-2208
Nancy Flournoy
American University
Phone: (202) 885-3127
Fax: (202) 885-3155
Douglas R. Freeman
OPPE
Phone: (202) 260-3378
Fax: (202) 260-4968
Michael A. Gansecki
Region 8
Phone: (303) 293-1510
Fax: (303) 293-1724
William V. Garetz
OPPE
Phone: (202) 260-2685
Fax: (202) 260-4968
Jill Gendelman
OPPTS
Phone: (202) 260-0288
Fax: (202) 260-1279
Chap Gleason
OPPE
Phone: (202) 260-9006
Fax: (202) 260-4968
Lynn Goldman
Asst Admin. OPPTS
Phone: (202) 260-2902
Fax: (202) 260-1577
-------
Avi Goldscheider
OPPE
Phone: (202) 260-5136
Fax: (202) 260-4968
Alan R. Goozner
OPPTS
Phone: (703) 308-8147
Fax: (703) 308-8151
Stephen Goranson
EPA, Region 5
Phone: (312) 886-3445
Fax: (312) 886-1515
Wilson L. Haynes
Region 4
Phone: (404) 347-3555
Fax: (404) 347-2130
Jon Helton
Ariz. State Univ.
Phone: (505) 848-0693
Fax: (505) 848-0705
Helen Hinton
EPA
Phone: (919) 541-4618
Fax: (919) 541-1903
Karen Hogan
OPPTS
Phone: (202) 260-3895
Fax: (202) 260-1279
John W. Holley
OAR
Phone: (202) 233-9305
Fax: (202) 233-9557
Helen Jacobs
OW
Phone: (202) 260-5412
Fax: (202) 260-7185
Jennifer Jinot
ORD
Phone: (202) 260-8913
Fax: (202) 260-3803
Henry Kahn
OW
Phone: (202) 260-9408
Fax: (202) 260-7185
Ela Kinowska
Environment Canada
Phone: (819) 953-8948
Fax: (819) 953-9542
Art Koines
OPPE
Phone: (202) 260-4030
Fax: (202) 260-0275
Mel Kollander
Temple University
Phone: (202) 973-2820
Fax: (202) 293-3083
Herbert Lacayo
OPPE
Phone: (202) 260-2714
Fax: (202) 260-4968
Jim Lee
American University
Phone: (202) 885-1691
Fax: (202) 885-2494
Lawrence Leemis
Wm & Mary
Phone: (804) 221-2034
Fax: (804) 221-2988
Eleanor Leonard
OPPE
Phone: (202) 260-9753
Fax: (202) 260-4968
Patricia Little
OPPE
Phone: (202) 260-2679
Fax: (202) 260-4968
Arthur Lubin
EPA
Phone: (312) 886-6226
Fax: (312) 303-4342
Jesse Mabellos
HERL
Phone: (919) 541-3743
Fax: (919) 541-5394
Elizabeth H. Margosches
OPPTS
Phone: (202) 260-1511
Fax: (202) 260-1279
-------
Tim Margulies
OAR
Phone: (202) 233-9774
Fax: (202) 233-0981
Mary A. Marion
OPPE
Phone: (703) 308-2854
Fax: (703) 308-5453
Don Miller
LABO/ENSV
Phone: (913) 551-5156
Fax: (913) 551-5218
David Mintz
OAQPS
Phone: (919) 541-5224
Fax: (919) 541-1903
William L. Monson
Region 8
Phone: (303) 293-0981
Fax: (303) 293-1647
William C. Nelson
ORD/AREAL
Phone: (919) 541-3184
Fax: (919) 541-1486
Barry Nussbaum
OPPE
Phone: (202) 260-1493
Fax: (202) 260-4968
G. Iris Obrams
NCI
Phone: (301) 496-9600
Fax: (301) 402-4279
Wayne Ott
AREAL
Phone: (919) 541-3184
Fax: (919) 541-7588
Gina Papush
Univ. Maryland
Phone: (410) 455-3785
Fax: (410) 455-1066
Barbara Parzygnat
OAQPS
Phone: (919) 541-5474
Fax: (919) 541-1903
Ganapati P. Patil
Penn State Univ.
Phone: (814) 865-9442
Fax: (814) 865-7114
Hugh M. Pettigrew
OPPTS
Phone: (703) 305-5699
Fax: (703) 305-5147
Elizabeth Porter
OPPE
Phone: (202) 260-6129
Fax: (202) 260-4903
William F. Raub
OA
Phone: (202) 260-0486
Fax: (202) 260-3682
Erika Ronca
OAR/ORIA
Phone: (202) 233-9724
Fax: (202) 233-9555
N. Phillip Ross
OPPE
Phone: (202) 260-2680
Fax: (202) 260-8550
Jerry Sacks
NISS
Phone: (919) 541-6255
Fax: (919) 541-7102
Judith B. Schnid
HERL
Phone: (919) 541-0486
Fax: (919) 541-5394
Cheryl Scott
ORD
Phone: (202) 260-5720
Fax: (202) 260-3803
Denise Settles
ORIA
Phone: (202) 233-9704
Fax: (202) 233-9650
R. Woodrow Setzer
HERL
Phone: (919) 541-0128
Fax: (919) 541-5394
-------
Ronald W. Shafer
OPPE
Phone: (202) 260-6966
Fax: (202) 260-4968
Bimal Sinha
Univ. Maryland
Phone: (410) 455-2412
Fax: (410) 455-1066
William P. Smith
OPPE
Phone: (202) 260-2697
Fax: (202) 260-4968
Chris Solloway
OPPE
Phone: (202) 260-3008
Fax: (202) 260-4968
Steve Stodola
Region I OQA
Phone: (617) 860-4634
Fax: (617) 860-4397
James L. Sutton
HERL
Phone: (919) 541-7610
Fax: (919) 541-5394
Lewis Summers
Martin Marietta
Phone: (202) 260-9710
Fax: (202) 260-4968
John Peter Wargo
Yale Univ.
Phone: (203) 432-5100
Fax: (203) 432-5942
John Warren
ORD-NCERQA
Phone: (202) 260-9464
Fax: (202) 260-4346
Chuck White
OW
Phone: (202) 260-5411
Fax: (202) 260-7185
Nathan Wilkes
OPPE
Phone: (202) 260-4910
Fax: (202) 260-4903
Denise Zvanovec
Region 2
Phone: (212) 264-3052
Fax: (212) 264-9695
-------
ABSTRACTS
-------
ABSTRACT
STATISTICAL QUALITY ASSURANCE: DATA QUALITY ASSESSMENT
John Warren & Thomas E. Dixon
Office of Research and Development
Data Quality Assessment (DQA) is the scientific and
statistical evaluation of data to determine if the data are of the
right type, quality, and quantity to support their intended use.
DQA is the conclusion to the Agency's recommended approach for data
collection: Planning (Data Quality Objectives [DQO]),
Implementation (Quality Assurance Project Plans [QAPP]), and
Assessment (DQA), but is possibly the hardest part for non-
statisticians to apply. Guidance (Data Quality Assessment G-9) is
being developed that will help non-statisticians investigate some
of the statistical assumptions underlying any data collection
activity.
Similar to the established DQO Process, the DQA Process
consists of iterative steps to investigate data:
Review DQOs and Sampling Design
Conduct Preliminary Data Review
Select the Statistical Test
Verify the Assumptions
Perform the Statistical Test
Some of the steps require only elementary knowledge of
statistics; others can require quite extensive statistical
expertise. The Quality Assurance Management staff offers the G-9
Guidance as a tool for non-statisticians to complement the
guidances for DQOs and QAPPs. The guidance is not intended to be a
comprehensive handbook on statistical quality assurance, but more
of a primer that enables analysts and managers to interpret
statistical conclusions.
The presentation outlines the Agency's position on data
collection activities, gives an overview of the contents of G-9,
and outlines the direction of future work.
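
As an illustration only, the five DQA steps listed in this abstract can be
sketched in Python. The function name dqa_sketch, the action-level hypothesis,
the large-sample z test, and the rule-of-thumb assumption check are all
assumptions made for this sketch; none of them is taken from the G-9 guidance.

```python
import math
import statistics


def dqa_sketch(data, action_level, alpha=0.05):
    """Illustrative walk through the five DQA steps (all choices hypothetical)."""
    # Step 1: review DQOs and sampling design.
    # Assumed DQO: show that the true mean is below action_level,
    # so H0 is "mean >= action_level", tested at significance level alpha.
    n = len(data)

    # Step 2: conduct a preliminary data review (summary statistics).
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)

    # Step 3: select the statistical test.
    # Assumed choice: a one-sided, large-sample z test on the mean.

    # Step 4: verify the assumptions (crude rule of thumb, an assumption here):
    # enough observations and a roughly symmetric sample.
    assumptions_ok = n >= 30 and abs(mean - median) <= 0.5 * sd

    # Step 5: perform the statistical test.
    z = (mean - action_level) / (sd / math.sqrt(n))
    p_value = statistics.NormalDist().cdf(z)  # lower-tail p-value
    return {
        "n": n, "mean": mean, "sd": sd,
        "assumptions_ok": assumptions_ok,
        "z": z, "p_value": p_value,
        "reject_h0": p_value < alpha,
    }


# Hypothetical measurements: 40 values between 7.8 and 8.2.
measurements = [8 + 0.1 * (i % 5 - 2) for i in range(40)]
result = dqa_sketch(measurements, action_level=10.0)
print(result["reject_h0"], result["assumptions_ok"])
```

The sketch keeps the assumption check (step 4) separate from the test itself
(step 5), so an analyst can see when a test result should not be trusted.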
-------
Spatial Sampling for Local Estimation:
One-phase, Two-phase, and N-phase Designs
Evan J. Englund
U.S. Environmental Protection Agency
In the absence of prior knowledge, there is no basis for selecting preferential sampling
locations; hence, an optimal one-phase design must necessarily involve a uniform
spatial distribution of sampling locations. In two-phase sampling, estimates based on
data from the first phase are used to concentrate the set of second phase samples in
areas where they are most needed. In N-phase sampling each observation is a phase;
the estimates are updated after each observation and the best location for the next
observation is selected. Algorithms for these three methods have been developed and
their relative performance is compared.
-------
ENVIRONMENTAL MONITORING: NEW ANSWERS FOR OLD QUESTIONS
Abstract: G.T. Flatman,
Often statistics is used and taught as if it were a dead language like Latin
which first killed the Romans and is now killing the students. However, statistics is
alive and well and adding new methods and algorithms. In the last few years, spatial
statistics has rewritten the answers to the ubiquitous questions of (1) how to take
"representative" samples of assured quality, (2) how to optimize sampling design
(number of samples), and (3) how to make data analysis understandable to decision
makers. The cause of the change is "spatial correlation," which is a technical term for
the common sense fact that environmental samples taken close together in space are
both apt to be high because they come from the same plume area or both low
because they come from the same background area. Varying together is correlation.
This talk will summarize the meaning of: (1) "correct sample" from Gy's Theory for
determining sample mass in heterogeneous media for Quality Assurance, (2) additional
sampling optimization rules: equal probability and equal spacing of samples, and (3)
data analysis to measure false positives, false negatives, and power for the decision
makers. Statistics did not kill the ancient statisticians and has the potential to make
the life of the Environmental scientist manager a lot easier and more productive
(accurate).
-------
Lawrence Leemis
Survival Analysis
Probabilistic models and statistical methods for the analysis of survival data are
presented. General analytic techniques based on the likelihood function are applied to
the exponential and Weibull distributions. These techniques are illustrated by analyzing
several complete and right-censored data sets.
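
As a minimal sketch of the likelihood-based approach for the exponential case,
the maximum likelihood estimate of the failure rate from right-censored data is
the number of observed failures divided by the total time on test. The function
name and the data below are hypothetical and are not taken from the tutorial.
For the Weibull distribution the likelihood equations generally have no closed
form and are solved numerically, which is not shown here.

```python
import math


def exponential_mle(times, observed):
    """MLE of the exponential failure rate from possibly right-censored data.

    times    -- observed time for each item (failure or censoring time)
    observed -- 1 if the failure was observed, 0 if right-censored
    """
    failures = sum(observed)
    total_time = sum(times)
    if failures == 0 or total_time <= 0:
        raise ValueError("need at least one observed failure and positive time")
    # Log-likelihood: failures * log(rate) - rate * total_time,
    # which is maximized at rate = failures / total_time.
    rate = failures / total_time
    log_lik = failures * math.log(rate) - rate * total_time
    return rate, log_lik


# Hypothetical data: three failures and one item censored at t = 5.
times = [2.0, 3.0, 5.0, 10.0]
observed = [1, 1, 0, 1]
rate, log_lik = exponential_mle(times, observed)
print(rate, 1 / rate)  # estimated rate and estimated mean lifetime
```

Censored items contribute their time at risk to the denominator but not to the
failure count, which is how the likelihood accounts for incomplete observations.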
-------
ENVIRONMENTAL RESEARCH AT NISS
Jerome Sacks
National Institute of Statistical Sciences, PO Box 14162,
Research Triangle Park, NC 27709
Key Words: Risk assessment, environmental monitoring, air pollution, meta-analysis
Environmental research at NISS has centered on air pollution issues and on
approaches to combining studies in the risk assessment of exposure to toxic
substances. The research has featured collaborations among statisticians,
meteorologists, and toxicologists. One emphasis has been on estimating trends in
ozone adjusted for meteorology through the use of nonlinear and nonparametric
models. Data came from monitoring networks in Chicago, a midwest rural region, and
the Gulf Coast, including Houston. There have been several studies reporting a link
between levels of airborne particulates and mortality. A NISS project examining this
question finds results that are inconsistent with previous findings, largely due to earlier
failures to take time of year into account. Approaches to assessing risk associated
with acute inhalation exposure to toxic chemicals form an additional project at NISS,
as does one on extrapolation methods for assessing risk from chronic exposures. Some
specifics of these projects will be reported with a fuller description of the work on
particulates.
-------
STATISTICS AND THE ENVIRONMENTAL SCIENCES
STATISTICAL METHODS FOR COMBINING ENVIRONMENTAL
INFORMATION
Lawrence H. Cox, Walter W. Piegorsch
Lawrence H. Cox, US EPA, MD-75, Research Triangle Park, NC 27711
Key Words: Combining Information, Data aggregation, Meta-analysis
An important concern in environmental studies is the need to combine
information from diverse sources that relate to a common endpoint or effect and to
combine environmental monitoring and assessment data. Statistical techniques are
integral to analyses that combine environmental monitoring and assessment data.
These techniques are still under development, however, as modern statistical
methodologies for combining information usually require subject-specific formulations.
Herein, we discuss recent developments and opportunities for statistical research in
combining environmental information.
-------
What TRI releases are in my zip? Via Internet E-mail
By
Chapman Gleason
Abstract
This paper describes an intuitive, cost-effective Internet E-
mail interface to a major EPA publicly released database, "The Toxic
Release Inventory". The beauty of this interface is that it
returns to the requestor (anyone with an Internet mailbox) a
report telling them the name of the company, the chemicals, and the
amount of TRI releases by year in their zip code. Since all of the
American public knows their zip code and/or neighboring zip codes,
but they do not know their latitude/longitude, this interface allows
the public to access data via a simple "place-based" interface.
This pilot interface is an example of the EPA Administrator's 5-year
strategic plan of making data more available electronically and
empowering citizens via a place-based (zip code) initiative for
ecosystem protection and environmental justice. To use this pilot
system, send an Internet mail message to tris@ipcl.was.epa.gov and
in the body of the message type your 5-digit zip code. The system
returns a report to your Internet mail account.
-------
Childhood Exposure to Complex Mixtures of Pesticides
John Wargo, Ph.D. and Richard Jackson, M.D.
ABSTRACT
Nearly 325 pesticides and 1500 inert ingredients, along with their metabolites, are permitted
to exist as residues in the nation's food supply by the U.S. federal government. The U.S.
Environmental Protection Agency (EPA) judges the health risks of single pesticide residues in food
and drinking water, rather than the mixture of pesticide residues likely to appear in the human diet.
This one-at-a-time approach to regulation may overlook the potential for toxic effects
from complex mixtures of pesticides. Within this paper, we present a method for estimating
exposure to numerous pesticides in numerous foods which have the common toxicological effect of
inhibiting the enzyme cholinesterase (ChE). We designed a probabilistic computer model to simulate
exposure across 5 pesticides permitted as residues on 11 foods. Using actual food intake data for
2-year-olds and residue data collected by FDA, we simulated person-day exposures to the complex
mixtures. The method appears to provide a reasonable approach to estimating exposure across
compounds.
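
A minimal sketch of a person-day Monte Carlo simulation of this general kind is
given below. It is not the authors' model: the function name, the resampling
scheme, the per-pesticide weights used to add exposures across compounds, and
every number in the inputs are assumptions made purely for illustration.

```python
import random


def simulate_person_days(intake_g, residue_ppm, weight, n_days=1000, seed=0):
    """Simulate total daily exposure (mg, weighted) across foods and pesticides.

    intake_g    -- {food: [grams eaten on sampled person-days]}
    residue_ppm -- {food: {pesticide: [measured residues in mg/kg]}}
    weight      -- {pesticide: relative weight} (hypothetical)
    """
    rng = random.Random(seed)
    exposures = []
    for _ in range(n_days):
        total_mg = 0.0
        for food, intakes in intake_g.items():
            grams = rng.choice(intakes)  # resample an empirical intake
            for pesticide, residues in residue_ppm.get(food, {}).items():
                ppm = rng.choice(residues)  # resample an empirical residue
                # kg of food eaten * mg/kg residue * relative weight
                total_mg += (grams / 1000.0) * ppm * weight[pesticide]
        exposures.append(total_mg)
    return exposures


# Entirely hypothetical inputs, for illustration only.
intake_g = {"apple": [0, 50, 120], "juice": [0, 100, 250]}
residue_ppm = {
    "apple": {"A": [0.0, 0.1, 0.5]},
    "juice": {"A": [0.0, 0.05], "B": [0.0, 0.2]},
}
weight = {"A": 1.0, "B": 0.5}

days = sorted(simulate_person_days(intake_g, residue_ppm, weight))
print("median:", days[len(days) // 2])
print("95th percentile:", days[int(0.95 * len(days))])
```

Simulating whole person-days yields a distribution of combined exposure, so
upper percentiles can be examined rather than a single average per compound.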
-------
The Long Island Breast Cancer Study Project: Environmental Statistics Research
Issues
Ruth H. Allen, Ph.D., M.P.H. and G. Iris Obrams, M.D., Ph.D.,
National Cancer Institute, Division of Cancer Etiology, Extramural Programs
Branch
Breast cancer statistics point to important opportunities for prevention. For
the last several decades, despite rapid technological advances in detection and
treatment of breast cancer, mortality rates are relatively unchanged and breast
cancer incidence rates have more than doubled since the 1950's. Higher than
average breast cancer incidence rates of over 113 per 100,000 women, and patterns
of increased breast cancer in younger women on Long Island recently received
increased public and congressional attention. Legislation passed by the U.S.
Congress in June 1993 mandated an intensive study of the etiology of breast
cancer. The study is required to use a geographic information system approach
to integrate a wide range of environmental and health statistics. This
presentation examines the emerging environmental statistics issues for the Long
Island case, including data confidentiality, accuracy and precision of exposure
and dose reconstruction for estimation of past pesticide exposure, and
environmental statistics validation for modeling.
Dr. Allen is on detail from EPA, Office of Pesticide Programs, Health Effects
Division, to the National Cancer Institute. Dr. Obrams is Chief, Extramural
Programs Branch, and Director, Long Island Breast Cancer Study Project.
-------
Visual Representation of Statistical Summaries
By
Daniel B. Carr
Center For Computational Statistics
George Mason University
Fairfax, VA 22030
Abstract
This talk addresses the redesign of row-labeled plots and cumulative distribution plots. Both types
of plots are familiar. For example, row-labeled plots include labeled dot plots, bar plots, and
horizontally-oriented distributional summary plots such as box plots. The redesign goal is to show
statistical summaries more effectively than traditional business graphics and to facilitate conversion
of statistical summary tables into plots. The proposed designs use perceptual grouping, sorting, and
layering of information to simplify the appearance of the graphics while incorporating more
information. The resulting row-labeled plots provide templates for re-expressing numerous one-
factor, two-factor and three-factor tables. Color linking between row-labeled plots and maps
provides a convenient way to show statistical summaries of spatial information. The new cumulative
distribution plots can also be used with maps. Color linking allows these distribution plots to serve
as legends for classed choropleth maps while providing additional distributional detail.
Talk examples emphasize EPA environmental data summaries. However, the examples shown are
templates for a wide variety of applications. Other government agencies, such as BLS and NASS,
are putting the new templates to work. The S-Plus functions, script files and data for producing the
examples are publicly available via anonymous.
-------
Measuring DO restoration goals by combining
monitoring station and buoy data
Nagaraj K. Neerchal, Gina Papush & Sanjoy V
Department of Mathematics and Statistics
University of Maryland Baltimore County
Baltimore, MD 21228 USA
Ronald W. Shafer
Environmental Statistics and Information Division, United
States Environmental Protection Agency, Washington, DC
20460 USA
Abstract
Dissolved oxygen (DO) is a major factor affecting the survival, distribution
and productivity of the living resources of the Chesapeake Bay. Target
DO concentrations, with limits to the duration and frequency, are an important
element in a program to restore living resources. DO restoration goals
are stated in the December 1992 DO Restoration Goals document. Semi-continuous
data obtained from telemetering can be used to verify these
goals. Since semi-continuous data (buoy data) are available only in a few
specified locations on the Bay, the question of goals verification at the
monitoring stations needs to be addressed.
DO levels are available, taken approximately every fifteen days, at each
of the monitoring stations. Since the DO goals are generally specified in
terms of hours, we cannot verify whether or not a station site is meeting
DO restoration goals based on biweekly DO observations. To address this
problem, we developed a spectral analysis method that combines the short
term variations of buoy data with the long term variations of the station
data. The spectral analysis method produces a synthetic data set that can
predict the likelihood of a station site meeting dissolved oxygen restoration
goals. Generating a synthetic data set is a substitute for expensive observational
data and may be adequate for management purposes such as strategic
planning and tracking progress towards goals.
-------
Nathan Wilkes
Abstract for Statistics Conference
Science and Information Management
ECOVIEW: Bringing Scientific Information to the Masses
Communication of ideas and information is essential to a functional society,
government, business, or family. Our methods of communication are changing
at an exponential rate. The purpose of the ECOVIEW project is to take a very
old method of conveying locational or directional information (i.e. maps) and
combine it with the incredible array of multi-media electronic communications
capabilities found in computers and digital technologies. The result will be the
development of a convenient tool to convey environmental and socio-economic
information geographically. The greatest challenge for this project is not the
technical issues, but the human or organizational aspects in a continuum of data
development through data analysis to information development and
production. Our success will be measured by our ability to bring together the
unique and varied communities of environmental scientists, statisticians, policy
makers, politicians, and the general public resulting in an effective
communication of the condition of the environment, improved understanding of
the effects of our human behaviors and social policies, and enhanced
opportunities to build consensus.
-------
SCIENCE AND INFORMATION: A FUTURE PERSPECTIVE
Joseph Abe, Environmental Scientist
Futures Staff
Office of Strategic Planning and Environmental Data
The perspective and thinking of western culture have been significantly influenced by
the scientific and industrial revolutions. Citizens of western culture, consciously or
unconsciously, tend to view the world from a compartmentalized, mechanistic, near-
term, human-centered and linear perspective. This worldview affects how we: relate to
each other and nature, create and run organizations, measure societal success and
progress, and develop and use information and technologies. While many wonderful
inventions, accomplishments and life improvements can be attributed to the industrial
age, continued development based on this worldview will inevitably threaten the long-
term sustainability of civilization and the global environment. Creating a sustainable
future for the Earth and its inhabitants requires a new worldview that nurtures
personal and community development, recognizes the interdependence between
humans and nature and fosters democracy, awareness and peaceful coexistence
through global communications. These and other qualities and challenges of the
emerging post-industrial worldview are described to suggest emerging new roles for
science and information in the twenty-first century.
-------
Environment, Statistics and Public Policy....The ESP Project
A Project of the American University, Washington, DC
Principal Investigators: David Crosby and James Lee
Funded by: U.S. Environmental Protection Agency
Research Tasks Under the ESP Project
This grant funds activities for five years in research on statistics and
environment policy issues, and is a collaboration between the School of
International Service (SIS) and the Mathematics and Statistics (MAS)
department of American University. Under the agreement, these two
units will research statistics and environmental issues and will attempt to
work together on some particular research questions. The project is funded
under a cooperative grant from the U.S. Environmental Protection Agency.
The research effort is the overall Environment, Statistics and Public
Policy Project (ESP). The ESP Project addresses trade and other
environmental issues. Of particular concern to the project are the public
policy issues of importance and the manner in which public policy makers
use data, statistics and other types of information. There are five tasks to
begin the first year of the ESP research grant.
Task 1: Issues of Measurement in International Treaties: Getting
the Numbers Correct, David Crosby and James Lee
Task 2: Getting Information to Policy-Makers: The Trade-Environment
Interagency Project, James R. Lee
Task 3: Case Studies in the Americas on Trade and the Environment
James R. Lee
Task 4: Satellite Intercalibration, David S. Crosby
Task 5: Up and Down Design, Nancy Flournoy
-------
ENVIRONMENTAL TOBACCO SMOKE (ETS) STATISTICS -
EPA VS. THE TOBACCO INDUSTRY FROM EPA's VIEWPOINT.
Steven Bayard and Jennifer Jinot
The U.S. EPA in 1993 concluded that exposure to ETS causes approximately
3,000 lung cancer deaths in U.S. nonsmokers annually, a finding which has been
strongly attacked by the tobacco industry and its consultants, both in the scientific
literature (e.g. Newspaper Advertisements, Washington Times' editorials, Investors'
Business Daily, Congressional Record), and in Federal Court (Middle District Court of
North Carolina). Both parties rely at least partly on statistical analyses of epidemiology
data to support their claims. These claims will be examined by looking at what the
statistics really say. This unbiased analysis will be presented by the Project Officer and
co-author of the EPA report.
-------
ADAPTIVE DESIGNS
Nancy Flournoy: American University, Washington, DC
Recent advances in adaptive designs for dose-response problems are presented. We
focus on sequential designs that (1) center the treatment distribution around a
prespecified target quantile given a monotone response function and (2) concentrate
observations at the level yielding maximum probability of success given two opposing
response functions. These designs could be used to define toxic thresholds in terms of
outcomes instead of particle concentration. They also may provide a useful control
mechanism, even when the target levels change with time. We are interested in
identifying specific applications of these designs to environmental problems.
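As an illustration of design type (1), the Python sketch below simulates a biased-coin up-and-down rule. The abstract does not state a specific rule, so the rule here is an assumption: after a response the dose moves down one level; after a non-response it moves up one level with probability gamma / (1 - gamma) and otherwise stays, for a target quantile gamma of at most 0.5. The function name, the dose grid, and the response curve are hypothetical.

```python
import random


def biased_coin_up_down(p_response, n_levels, gamma, n_subjects, start=0, seed=0):
    """Simulate an up-and-down design on dose levels 0 .. n_levels - 1.

    p_response(level) is the (assumed, monotone increasing) probability
    of a response at a dose level; gamma is the target quantile.
    Returns the list of dose levels assigned to successive subjects.
    """
    if not 0.0 < gamma <= 0.5:
        raise ValueError("this sketch assumes 0 < gamma <= 0.5")
    rng = random.Random(seed)
    move_up_prob = gamma / (1.0 - gamma)  # bias of the coin
    level = start
    assigned = []
    for _ in range(n_subjects):
        assigned.append(level)
        responded = rng.random() < p_response(level)
        if responded:
            # Response observed: step down, but not below the lowest level.
            level = max(level - 1, 0)
        elif rng.random() < move_up_prob:
            # No response and the coin says "up": step up, capped at the top.
            level = min(level + 1, n_levels - 1)
        # Otherwise the next subject is treated at the same level.
    return assigned


if __name__ == "__main__":
    # Hypothetical response curve rising linearly over six dose levels.
    doses = biased_coin_up_down(lambda d: 0.1 + 0.15 * d, 6, gamma=0.25,
                                n_subjects=40, seed=1)
    print(doses)
```

Under a rule of this kind the assigned doses tend to cluster near the level whose response probability is closest to gamma, which is the sense in which the treatment distribution is centered on the target quantile.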
-------
PROVIDING INFORMATION TO DECISION MAKERS
TO PROTECT HUMAN HEALTH AND THE ENVIRONMENT
EPA's senior management has recently completed a strategic plan for management of the
Agency's information resources. The IRM strategic plan's mission and vision were based upon
the Agency's 7 operating principles as documented in the Agency's Five Year Strategic Plan.
The plan was developed under the guidance of the Agency's Executive Steering Committee
(ESC) for IRM, whose members include the Agency's Assistant Administrators, Associate
Administrators, the IG, the General Counsel, four Regional Administrators, and five State
environmental department executives. The ESC received considerable input into the plan from
an external committee of stakeholders and numerous program and IRM staff in the Agency.
EPA's Strategic Plan for Information Resources Management establishes a far-reaching
and challenging mission and vision for all parts of the Agency. It defines several difficult key
operating principles and core implementation strategies that must be implemented to achieve the
vision. Finally it provides for the establishment of a program to measure IRM performance.
Change has already begun in the Agency's management of IRM as a result of this
planning effort. The Agency committed $6M to implementation efforts in FY95 and has
requested over $15M for implementation efforts in FY96. Additionally, subcommittees of the
ESC have begun to review EPA's current IRM base of $300M to ensure the Agency is
progressing toward the vision established. Most importantly, information management issues are
now on the agenda of the Agency's top management which will lead to fundamental
improvements.
Mark Day
-------
Mark Day
Since 1992 Mr. Day has been employed by U.S. EPA in the Office of Administration and
Resources Management. Since 1994 he has served as the manager of the IRM Planning Group.
The IRMPG is a cross divisional group of the Office of Information Resources Management
formed to deal with the IRM Planning material weakness declared in 1993.
Prior to coming to EPA, he was employed by the Missouri Department of Natural
Resources. From 1986-1992 he served as the Chief Information Officer for the Environmental
Quality Division of the Department. In this capacity he oversaw development of state
environmental systems. His use of advanced techniques and innovative approaches was
recognized in a case study by Harvard's Kennedy School of Government. His work emphasized
long range planning for efficient use of government resources as well as innovative use of
information to improve effectiveness of environmental protection in Missouri. Mr. Day also led
the Division's long range planning effort on environmental issues in the 21st century.
From 1983 until 1986 Mr. Day was the Director of the Residential Energy Program,
which included the low-income weatherization program, solar bank program, and other state and
federal programs operated by the State of Missouri for the purpose of conserving energy. He was
recognized for innovative use of technology to reduce administrative costs and improve customer
service. He served as special assistant to the Department Director for a study of environmental
permitting processes and cycle times.
Prior to 1983 Mr. Day worked at a local community action agency (Missouri Ozarks
Economic Opportunity Corporation) as the Director of the Residential Energy Dept where he
directed the weatherization, solar, and emergency energy programs for low-income citizens in an
8-county area of central Missouri. He completed the state's first automation of weatherization
inventory tracking and client scheduling processes. His programs were recognized for high
innovation and productivity.
Mr. Day graduated in 1977 - Magna Cum Laude with 3.849 GPA with a B.A. in History
and Political Science from Southwest Baptist University, a small liberal arts college. He has
completed numerous studies in business and information resources management.
-------
New Sources of Environmental Data: Testing Out the Latest Aerial and
Satellite Sensing at the "Field of Dreams"
Elizabeth D. Porter, Environmental Results Branch, OPPE
The Field of Dreams is a test site that was established to study the
effectiveness of remotely sensed wetland identification techniques,
specifically in forested environments. It would have been more accurate to
call it the "forest of dreams" (a forest of no dramatic terrain relief, no
uniquely wetland-indicative vegetation, and with varying but usually limited
periods of soil saturation or inundation). These "driest" of wetlands are the
toughest to identify. Their boundaries are often fuzzy gradations, difficult to
delineate even from ground observation. The objective of the project is to
identify and advance remote sensing techniques to support the inventorying
of wetland resources, primarily in support of the National Wetland
Inventory (NWI) program administered by the US Fish and Wildlife Service
(FWS).
"Delineate it, and monitor it; and the technology agencies shall come."
The site is (and will continue to be) probably one of the most heavily imaged
and data-rich wetland sites in the U.S. Technology developers from the
commercial sector, via the NASA commercial remote sensing program, the
Department of Energy technology program, and the intelligence community
have all offered support for trying to solve the forested wetland identification
problem. The Field of Dreams consists of ten ground test sites in transitional
wetland-upland forests in Wango Quad, Wicomico County, MD, on the
eastern shore of the Chesapeake Bay.
The project is a result of two major interagency initiatives: the first, a
Federal Geographic Data Committee (FGDC) study on wetland mapping
programs; and the second, the review of intelligence community (IC) data by
the environmental science community. The presentation provides
background on the project and these two preceding initiatives, as well as
explains some of the challenges relevant to structuring the experiment and
planning for the spring 1995 data collections. The project intends to identify
source and compilation techniques that can improve the accuracies for NWI
products (maps, status and trend plots). These wetlands are an environmental
problem area with major scientific, technological and policy implications.
Forested wetlands were the cover type selected for study, not only because
they are the most difficult to discern, but because they are the wetland cover
type which has experienced the greatest losses in recent years.
-------
A GEOGRAPHIC VISUALIZATION OF ENVIRONMENTAL QUALITY: A
CASE STUDY OF FECAL COLIFORM BACTERIA IN SURFACE WATERS
IN THE US/MEXICO BORDER AREA
J. Calem, N. Cortina, J. Crawford, D. Freeman, A. Goldscheider, R. Shafer
U.S. Environmental Protection Agency
T. Ngo, L. Summers
Martin Marietta
Maps are excellent tools to display relational data in a spatial frame. The
US/Mexico border area, defined as an area within 100 km of either side of the
international boundary, is presented as a regional case study of environmental quality.
A series of maps and additional county-level attribute data provide insight into the
presence, sources and implications of one pollutant, fecal coliform bacteria, in the
U.S. border waterways. Fecal coliform bacteria is an agent for many human
communicable diseases, such as typhoid fever, hepatitis A and dysentery. Its
presence in the environment is associated with unsanitary conditions and human and
animal waste. The discussion utilizes the "Pressure-State-Response-Effects" (PSR/E)
macro-framework to associate ambient concentrations of fecal coliform bacteria, land-
use pressures affecting concentrations, human exposure and health effects. In the
future, a larger study will assess and characterize environmental quality on both sides
of the international boundary, taking into account other environmental pollutants and
media.
-------
Building Environmental Data Management & Analysis Capabilities
in the Great Lakes Region & Baltic Republics
Stephen K. Goranson
Deputy Chief, Information Management Branch, EPA Region 5
This presentation focuses on recent experiences in two programs,
the Great Lakes and the Baltic Republics, both requiring improved
capacity for collecting and assessing data, for transforming the
data into useful information, and for their dissemination.
The Great Lakes Program, which includes a wide variety of partners
(public, private, and binational), developed a strategic
information plan which will allow easy access to environmental
data (chemical, biological, habitat, and human health) and
provide sound environmental indicators. The initial system
concept includes (1) a repository of all Great Lakes monitoring
data conforming to established standards; (2) the ability to link
the monitoring data to diverse sets of environmental assessment
data for comprehensive analysis of ecosystem status and risk; and
(3) the ability to identify sources of other Great Lakes data and
pathways to that data, including EPA's corporate, network links to
other data outside EPA, and electronic document reporting and
retrieval.
The experiences gained during the past few years in implementing
the Great Lakes Strategy served as a useful model for subsequent
assistance to the environmental ministries of the Baltic
Republics. As part of its commitment in the cooperative
agreements with the Environmental Ministries of Estonia, Latvia,
and Lithuania, USEPA is providing technical assistance to the
management and integration of environmental monitoring data.
Specialists from the participating countries are working jointly
to assess program priorities, evaluate monitoring information
needs, and define optimum approaches for gathering, processing, and
analyzing environmental data needed to support management
decisions. USEPA is providing the basic capacity and training to
efficiently manage environmental data and administer
environmental programs.
This discussion will relate the two geographic areas in terms of
environmental assessment-data needs, inventory data and models,
data attributes and their quality, data management practices and
standards, indicators, and information reporting/exchange.
-------
ALTERNATIVE MODELS FOR ANALYSIS
OF
COMPOSITE ENVIRONMENTAL SAMPLES
BY: Henry D. Kahn, George Zipf and Alan Unger
This presentation will consider the use of segmented and non-segmented
composite sampling models in environmental field studies. Composite sampling has
many advantages in environmental work, particularly as a cost-effective method of
obtaining estimates of the mean. Estimates of the mean and the variance from
composite samples for segmented data and non-segmented data will also be
addressed. Data on field measurements of contaminants in a number of fish species
will be used to illustrate the discussion. In addition, the effectiveness of compositing
versus individual measurements will be evaluated using the fish data.
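For readers unfamiliar with compositing, the Python sketch below shows the simplest textbook case, not the segmented or non-segmented models of the presentation. It assumes each composite is a physical average of k individual specimens of equal weight, that the specimens are independent, and that measurement error is negligible. Under those assumptions the variance among composite values is roughly the individual variance divided by k. The function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def composite_estimates(composite_values, k):
    """Estimate the population mean and the individual-level variance
    from measurements on composites of k specimens each.

    Returns (mean estimate, individual variance estimate,
             standard error of the mean estimate).
    """
    n = len(composite_values)
    if n < 2:
        raise ValueError("need at least two composites to estimate variance")
    m = mean(composite_values)
    s2_composite = variance(composite_values)  # sample variance among composites
    s2_individual = k * s2_composite           # Var(composite) is about sigma^2 / k
    std_error = sqrt(s2_composite / n)
    return m, s2_individual, std_error


if __name__ == "__main__":
    # Hypothetical contaminant concentrations for three composites of five fish.
    print(composite_estimates([2.0, 4.0, 6.0], k=5))
```

The cost advantage comes from the fact that only one laboratory analysis is needed per composite, while the standard error of the mean is the same as it would be for all individual specimens analyzed separately, under the stated assumptions.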
-------
ESTIMATES OF FISH CONSUMPTION RATES IN THE UNITED STATES
Co-authors: Helen Jacobs, EPA
Henry Kahn, EPA
Kathleen Stralka, SAIC
Estimates of fish consumption in the U.S. based on USDA's combined 1989,
1990, 1991 Continuing Survey of Food Intake by Individuals (CSFII) will be presented.
Fish consumption estimates play an important role in a number of EPA programs. In
particular, exposure estimates used in determining water quality criteria and related
standards are based in part on the amount of fish consumed and contamination levels
in the fish. This presentation will provide an update on fish consumption estimates by
habitat (marine, estuarine and freshwater) and species including the most recent CSFII
data. Estimates will be presented for the total U.S. and by geographic region.
-------
WATER QUALITY BASED EFFLUENT LIMITATIONS
AND THE STATISTICAL PROPERTIES OF LOW
CONCENTRATION MEASUREMENTS IN ANALYTICAL CHEMISTRY
Chuck White and Henry Kahn
Water quality limitations for specific chemicals are sometimes set at
concentrations below EPA's current criteria for detection. This paper will
discuss the statistical properties of chemical analytical measurements at
low concentrations with regards to the concepts of detection and
quantification in analytical chemistry and the requirements of water quality
based effluent limitations for industrial dischargers.
-------
TREATMENT OF UNCERTAINTY IN PERFORMANCE ASSESSMENTS
FOR COMPLEX SYSTEMS
Jon C. Helton
Department of Mathematics
Arizona State University
Tempe, AZ 85287-1804
ABSTRACT
When viewed at a high level, performance assessments (PAs) for complex systems
involve two types of uncertainty: stochastic uncertainty, which arises from the fact that a
number of different occurrences have a real possibility of taking place, and subjective
uncertainty, which arises from a lack of knowledge about quantities required within the
computational implementation of the PA. Stochastic uncertainty is typically
incorporated into a PA with an experimental design based on importance sampling and
leads to the final results of the PA being expressed as a complementary cumulative
distribution function (CCDF). Subjective uncertainty is usually treated with Monte Carlo
techniques and leads to a distribution of CCDFs. This presentation discusses the use
of the Kaplan/Garrick ordered triple representation for risk in maintaining a distinction
between stochastic and subjective uncertainty in PAs for complex systems. The topics
discussed include (1) the definition of scenarios and the calculation of scenario
probabilities and consequences, (2) the separation of subjective and stochastic
uncertainties, (3) the construction of CCDFs required in comparisons with regulatory
standards (e.g., 40 CFR Part 191, Subpart B for the disposal of radioactive waste), and
(4) the performance of uncertainty and sensitivity studies. Results obtained in a
preliminary PA for the Waste Isolation Pilot Plant, an uncertainty and sensitivity
analysis of the MACCS reactor accident consequence analysis model, and the
NUREG-1150 probabilistic risk assessments are used for illustration.
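The two-level structure described above can be sketched in Python as a nested simulation. This is a toy illustration under stated assumptions, not the method of the presentation: plain Monte Carlo sampling stands in for importance sampling in the inner loop, and the exponential consequence model, its parameter range, and all function names are hypothetical. Each outer draw fixes one value of an imprecisely known parameter (subjective uncertainty); the inner draws then yield one empirical CCDF (stochastic uncertainty), so the result is a family of CCDFs.

```python
import random


def empirical_ccdf(consequences, thresholds):
    """Fraction of sampled consequences exceeding each threshold."""
    n = len(consequences)
    return [sum(c > t for c in consequences) / n for t in thresholds]


def ccdf_family(n_outer, n_inner, thresholds, seed=0):
    """Return one empirical CCDF per sampled value of the uncertain parameter."""
    rng = random.Random(seed)
    family = []
    for _ in range(n_outer):
        # Subjective uncertainty: the rate parameter is not known precisely.
        rate = rng.uniform(0.5, 2.0)
        # Stochastic uncertainty: many possible futures for this fixed rate.
        outcomes = [rng.expovariate(rate) for _ in range(n_inner)]
        family.append(empirical_ccdf(outcomes, thresholds))
    return family


if __name__ == "__main__":
    curves = ccdf_family(n_outer=5, n_inner=1000, thresholds=[0.0, 0.5, 1.0, 2.0])
    for curve in curves:
        print(curve)
```

Keeping the loops separate is what preserves the distinction between the two kinds of uncertainty: the spread across curves reflects lack of knowledge, while the shape of each curve reflects randomness in what may occur.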
-------
STATISTICAL ANALYSIS OF RISK AND PA RESULTS
Tim Margulies U.S. Environmental Protection Agency
Washington, DC 20460
Bimal Sinha: University of Maryland, Baltimore County
Dept. of Mathematics and Statistics
Baltimore, Maryland 21228
Performance and risk analyses provide useful quantitative information to evaluate a
technology or activity and the level of safety needed to protect the environment.
Uncertain models, data, and future events and processes are explicitly considered to
generate probabilistic results. This paper presents an overview of probabilistic
modelling approaches for estimating the likelihood of human intrusion via exploratory
drilling at the WIPP (Waste Isolation Pilot Plant) in New Mexico and several illustrative
calculations. Furthermore, a statistical approach based on hypothesis testing is
investigated to determine compliance with a probabilistic standard (such as
the "containment standard" in the radioactive waste management regulations).
Potential applications to other areas of environmental energy regulation and decision-making, such
as reactor design modification, will also be discussed.
-------
using Relative Data Quality Indicators
of Precision and Bias
D. Miller, Region 7
Improvement in the implementation of data quality evaluation
and in the use of data for decision making can be achieved by
addressing three problems: first, the nature of environmental
data; second, the general decision-making procedures; and third,
decision-making in the specific situation where the numbers
involved are slightly larger than zero.
Environmental data often confounds the methods of
traditional statistics, because the numerical values within a
single data set can span many orders of magnitude. As a result,
the standard deviation of the measurement process is not constant
over the range of observed values.
Decision-making using hypothesis testing can be confusing to
the stake-holders. The traditional null hypothesis requires
convoluted logic. Furthermore, environmental data can be
complex. As a result, valid procedures for environmental
decisions provided by traditional statistics are often
restrictively narrow in scope and are often incomprehensible to
the average stake-holder.
Traditional statistical procedures work well for two
situations: first, the standard deviation is a constant, and
second, the coefficient of variation is a constant. When the
true value is zero, the standard deviation is constant, and when
the true value is large, the coefficient of variation is
constant. Traditional statistical procedures do not work well in
the specific region between "zero" and "large".
The three problems may be resolved with a unified approach.
First, the approach uses expressions for data quality indicators
that are applicable to the entire range of observed values, are
internally consistent, and use as data the results of widely
accepted types of QC samples. Second, environmental decisions
are made using the direct test, which is simple and
understandable. And third, the precision of the entire range of
observed values, from "zero" to "very large", is unified in the
hyperbolic model.
This presentation will describe a consistent set of
expressions for data quality indicators, describe decision making
using the direct test, and describe the hyperbolic model. This
will be followed with examples of the unified approach.
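The abstract does not give the functional form of the hyperbolic model. One form consistent with the description, offered here only as an assumption, sets the standard deviation at true value x to the square root of sd_zero squared plus (cv times x) squared, which traces a hyperbola: it equals sd_zero at zero and approaches cv times x for large x. The Python sketch below uses that assumed form; the function and parameter names are hypothetical.

```python
import math


def hyperbolic_sd(x, sd_zero, cv):
    """Assumed precision model: sd(x) = sqrt(sd_zero**2 + (cv * x)**2).

    sd_zero is the standard deviation of measurements when the true
    value is zero; cv is the limiting coefficient of variation when
    the true value is large.
    """
    return math.hypot(sd_zero, cv * x)


if __name__ == "__main__":
    # Hypothetical values: sd of 2 units at zero, 10 percent CV at high levels.
    for x in (0.0, 5.0, 20.0, 100.0, 1000.0):
        sd = hyperbolic_sd(x, sd_zero=2.0, cv=0.10)
        print(f"true value {x:8.1f}  sd {sd:8.3f}")
```

With these example values the transition between the two regimes occurs near x = sd_zero / cv = 20, which is the region where neither the constant standard deviation nor the constant coefficient of variation assumption holds.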
-------
ASSESSMENT OF THE U.S. EPA IEUBK MODEL PREDICTION OF ELEVATED
BLOOD LEAD LEVELS. K. A. Hogan, R.W. Elias, A.H. Marcus, P.O. White. U.S.
Environmental Protection Agency, Office of Pollution Prevention and Toxics
(Washington, DC) and Office of Health and Environmental Assessment (Research
Triangle Park, NC and Washington, DC)
The Integrated Exposure, Uptake, and Biokinetic (IEUBK) Model for Lead in
Children, which was designed to predict the proportion of a population of children with
elevated blood leads (µg/dL) on a site-specific basis, was examined for its use as a
risk assessment tool for regulatory purposes. This was carried out with existing data
sets relating environmental and blood lead levels on a per individual basis, by using
the IEUBK Model to generate blood lead predictions from the measured
environmental lead levels. These predicted blood lead levels were then compared
with the measured blood lead level, by comparing geometric mean blood leads and
proportions observed or expected to have elevated blood lead levels. All studies used
for this examination had data of sufficient quality and quantity to characterize the
environmental lead levels in each residential home and yard (i.e., for each
participant: blood lead; soil, dust, water, interior and exterior paint lead; and
demographic/behavior survey data covering other aspects of lead exposures). The
model results and observed blood lead levels were reasonably concordant, with
similar population proportions having elevated blood lead levels.
-------
HUMAN EXPERIENCES FOR JUDGING PREDICTIONS
FROM ANIMAL CANCER MODELS
Cheryl Siegel Scott. US Environmental Protection Agency, Office of Health and
Environmental Assessment (8602). Washington, DC
The use of epidemiologic data is preferred for basing inferences about human
cancer risks. Only rarely are complex statistical models such as the excess relative
risk, absolute risk, and relative risk models fitted to cohort data since such data either
are often not readily at hand or are considered to be of insufficient quality. By default,
projections of human cancer risks are based on animal bioassay data. Adopting a
philosophy of choosing one set of data over another is narrow and throws away
information. This talk proposes that both animal and human data support an evaluation
of cancer risk. General methods are reviewed for using human data to gauge the
accuracy of animal-based cancer risk estimates. Examples using epidemiologic
information on exposure to formaldehyde, methylene chloride, and trichloroethylene
illustrate discussed principles.
-------
The Lower Rio Grande Valley Environmental Monitoring Study:
Applying Human Exposure Science to Public Health Concerns
Gerald Akland
U.S. Environmental Protection Agency
Atmospheric Research and Exposure Assessment Laboratory
Research Triangle Park, NC 27711
An environmental monitoring investigation was initiated in the
Lower Rio Grande Valley in response to Valley residents' concerns
about the potential link between their health and pollution.
Potential sources of contamination include industrial emissions,
agricultural pesticide use, and inadequate infrastructure, but the
scope and magnitude of the implications of the resulting pollution
for the local population have not been documented. Exposure is
often the missing link in the effort to evaluate environmental
health risks, and an understanding of human exposure is essential
for developing effective risk reduction policies. A field pilot
provided preliminary data about the levels, sources, and pathways
of actual human exposure in the Valley. Specifically, samples of
indoor and outdoor air, house dust, soil, food, drinking water,
urine, blood and breath were collected and analyzed for
metals, VOCs, PAHs, and pesticides. This poster presents the
design of the study, the concepts underlying the design, the
perceived utility and value of the approach chosen, the field
implementation methods, and the implications of the experiences
gained through this process. This project has the potential to set
a new model for environmental health research which integrates
public health concerns, exposure reduction, illness prevention, and
regulatory activities of many agencies.
-------
Presentation of 1993 National Air Quality Data
David Mintz
U.S. Environmental Protection Agency
Office of Air Quality Planning and Standards
Research Triangle Park, NC 27711
Last October, EPA released its twenty-first annual report
documenting national air pollution and emissions trends. The
National Air Quality and Emissions Trends Report highlights six
pollutants for which standards have been set and tracks how well
areas are doing to meet those standards. I will use one of the
pollutants, PM-10, as an example to demonstrate various ways we
present the data.
-------
APPLICATION OF GEOGRAPHIC INFORMATION SYSTEMS TO THE
ASSESSMENT OF FECAL COLIFORM BACTERIA IN
SURFACE WATERS IN THE US/MEXICO BORDER AREA
Lewis Summers
Martin Marietta
The U.S. Environmental Protection Agency's Office of Policy, Planning and Evaluation,
Environmental Statistics and Information Division, is conducting a characterization report of
surface water quality within the U.S./Mexico border region. A computerized geographic
information system (GIS) is utilized to manipulate, manage and display statistical results for
one pollutant, fecal coliform. Fecal coliform bacteria is an agent for many human
communicable diseases, such as typhoid fever, hepatitis A and dysentery. Its presence in the
environment is associated with unsanitary conditions and human and animal waste. The GIS
can visually portray study results and also superimpose various other spatial data sets for
further analysis. In the future, a larger study will assess and characterize environmental
quality on both sides of the border, taking into account other environmental pollutants and
media.
-------
INNOVATIVE ENVIRONMENTAL RESOURCE SAMPLING AND ASSESSMENT I
-------
d distributions take into account the observer-observed interface and provide a promising
approach for the problems of ascertainment in environmental resource assessment.
The 1995 EPA Statistics Conference presentation will discuss current research and outreach work
on environmental sampling with observational economy in progress under a Penn State-EPA
OPPE ESID Cooperative Agreement.
References
Gore, S. D., and Patil, G. P. (1994). Identifying extremely large values using composite sample
data. Environmental and Ecological Statistics, 1(3) (to appear).
Gore, S. D., Patil, G. P., Sinha, A. K., and Taillie, C. (1993). Certain multivariate considerations
in ranked set sampling and composite sampling designs. In Multivariate Environmental Statistics,
G. P. Patil and C. R. Rao, eds. North-Holland, Amsterdam, pp. 121-148.
Gove, J. H., Patil, G. P., Swindel, B. F., and Taillie, C. (1994). Ecological diversity and forest
management. In Handbook of Statistics, Volume 12: Environmental Statistics, G. P. Patil and C.
R. Rao, eds. North-Holland, Amsterdam, pp. 409-462.
Myers, W. L., Johnson, G. D., and Patil, G. P. (1994). Rapid mobilization of spatial/temporal
information in the context of natural catastrophes. Invited paper presented at 1994 Spring
Statistical Meetings in Cleveland, Ohio. 1994 ASA Proceedings (to appear).
Myers, W. L., and Patil, G. P. (1994). Simplicity, efficiency, and economy in forest surveys.
Zeitschrift für Forstwesen (to appear).
Myers, W. L., Patil, G. P., and Taillie, C. (1994). Comparative paradigms for biodiversity
assessment. Invited paper at the IUFRO Symposium in Chiang-Mai, Thailand. To appear in the
Proceedings Volume.
Patil, G. P., Gore, S. D., and Sinha, A. K. (1993). Environmental chemistry, statistical modeling,
and observational economy. In Environmental Statistics, Assessment, and Forecasting, C. R.
Cothern and N. P. Ross, eds. Lewis Publishers/CRC Press, Boca Raton, FL, pp. 57-97.
Patil, G. P., and Rao, C. R. (eds). (1993). Multivariate Environmental Statistics. North-Holland,
Amsterdam. 596 pp.
Patil, G. P., and Rao, C. R. (eds). (1994). Handbook of Statistics, Volume 12: Environmental
Statistics. North-Holland, Amsterdam. 927 pp.
Patil, G. P., Sinha, A. K., and Taillie, C. (1994). Ranked set sampling. In Handbook of
Statistics, Volume 12: Environmental Statistics, G. P. Patil and C. R. Rao, eds. North-Holland,
Amsterdam, pp. 167-200.
-------
Patil, G. P., and Taillie, C. (1993). Environmental sampling, observational economy, and
statistical inference with emphasis on ranked set sampling, encounter sampling, and composite
sampling. In Bull. ISI, Proceedings of 49th Session, Firenze, Italy, pp. 295-312.
Patil, G. P., Taillie, C., and Talwalker, S. (1993). Encounter sampling and modelling in
ecological and environmental studies using weighted distribution methods. In Statistics for the
Environment, V. Barnett and K. F. Turkman, eds. Wiley, New York, pp. 45-69.
Thompson, S. (1994). Factors influencing the efficiency of adaptive cluster sampling. Technical
Report 94-0301, Center for Statistical Ecology and Environmental Statistics, Department of
Statistics, Pennsylvania State University, University Park, PA.
-------
Time Series Tutor - Beta Version
Nagaraj K. Neerchal et al.
The Time Series Tutor is an ambitious, and from our experience, a
novel effort to introduce data oriented scientists into the heart
of the problems associated with time series. This tutorial is
aimed at those who want a serious but accessible introduction into
the mechanics of time series modeling. The TST is a self contained
computer program that allows one to proceed at their own pace. The
current version to be displayed still requires considerable work.
-------
Task 4
Satellite Intercalibration
David S. Crosby
One of the most important problems in the use of satellite
data for the detection of climate trends is that of satellite
intercalibration. It is well known, for example, that different
instruments in the same series of satellites can have slightly
different characteristics. The same temperature sensing channels
on successive satellites can differ by over 2.0 degrees Celsius.
These differences can lead to inconsistencies in the time series
and can make the use of satellite data for the detection of climate
trends difficult. The standard method for modeling these effects
in time series is intervention analysis. For many of the satellite
data sets this may be the best or only technique available.
However, for some of the satellite data sets there is a
significant period of overlap between the two successive
satellites. We examine the use of this overlap period to
intercalibrate the two instruments. The technique uses the
empirical distribution functions. It requires that the sample sizes
be very large, that the probability distribution of the signal of
interest be the same for the two satellites, and that the measurements
from both satellites be monotone functions of the same signal. Examples of
the technique will be presented.
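One way to picture the use of empirical distribution functions is quantile matching: a reading from one instrument is mapped to the value at the same empirical quantile of the other instrument's overlap-period data. The Python sketch below only illustrates that idea under the assumptions stated in the abstract (same signal distribution, monotone responses). The function name `edf_calibrate` and the synthetic data are hypothetical and are not taken from the presentation.

```python
import bisect
import random

def edf_calibrate(x, sample_a, sample_b):
    """Map reading x from instrument A onto instrument B's scale by
    matching empirical quantiles of the overlap-period samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    # Empirical distribution function of A at x: fraction of readings <= x.
    p = bisect.bisect_right(a, x) / len(a)
    # Corresponding empirical quantile of B (index clamped to a valid range).
    k = min(max(int(round(p * len(b))) - 1, 0), len(b) - 1)
    return b[k]

# Synthetic overlap period (hypothetical data): both instruments observe
# the same signal distribution, but B reads 2.0 degrees warmer than A.
rng = random.Random(0)
sample_a = [rng.gauss(250.0, 5.0) for _ in range(20000)]
sample_b = [rng.gauss(250.0, 5.0) + 2.0 for _ in range(20000)]

adjusted = edf_calibrate(250.0, sample_a, sample_b)
offset = adjusted - 250.0  # estimated inter-instrument difference near 250
```

The large synthetic sample reflects the abstract's remark that the technique needs very large sample sizes: with few overlap observations the empirical quantiles, and hence the estimated offset, become noisy.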
-------
ABSTRACT
STATISTICAL QUALITY ASSURANCE: DATA QUALITY ASSESSMENT
John Warren & Thomas E. Dixon
Office of Research and Development
Data Quality Assessment (DQA) is the scientific and
statistical evaluation of data to determine if the data are of the
right type, quality, and quantity to support their intended use.
DQA is the conclusion to the Agency's recommended approach for data
collection: Planning (Data Quality Objectives [DQO]),
Implementation (Quality Assurance Project Plans [QAPP]), and
Assessment (DQA), but is possibly the hardest part for
non-statisticians to apply. Guidance (Data Quality Assessment G-9) is
being developed that will help non-statisticians investigate some
of the statistical assumptions underlying any data collection
activity.
Similar to the established DQO Process, the DQA Process
consists of iterative steps to investigate data:
Review DQOs and Sampling Design
Conduct Preliminary Data Review
Select the Statistical Test
Verify the Assumptions
Perform the Statistical Test
Some of the steps require only elementary knowledge of
statistics; others can require quite extensive statistical
expertise. The Quality Assurance Management Staff offers the G-9
Guidance as a tool for non-statisticians to complement the
guidance documents for DQOs and QAPPs. The guidance is not intended to be a
comprehensive handbook on statistical quality assurance, but more
of a primer that enables analysts and managers to interpret
statistical conclusions.
The presentation outlines the Agency's position on data
collection activities, gives an overview of the contents of G-9,
and outlines the direction of future work.
-------
GUIDANCE FOR ENVIRONMENTAL
DATA QUALITY ASSESSMENT
DRAFT
EPA QA/G-9
United States Environmental Protection Agency
Quality Assurance Management Staff
Washington, DC 20460
-------
The 5 Steps of the Data Quality Assessment Process
1. Review the Data Quality Objectives and Sampling Design: Review the DQO outputs
to assure that they are still applicable. If DQOs have not been developed, specify
DQOs before evaluating the data (for environmental decisions, define the statistical
hypothesis and specify tolerable limits on decision errors; for estimation problems,
define an acceptable confidence or probability interval width). Review the sampling
design and data collection documentation for consistency with the DQOs.
2. Conduct a Preliminary Data Review: Review quality assurance reports, calculate basic
statistical quantities and generate graphs of the data. Use this information to learn
about the structure of the data and identify patterns, relationships, or potential
anomalies.
3. Select the Statistical Test: Select the most appropriate procedure for summarizing and
analyzing the data, based on the preliminary data review. Identify the key underlying
assumptions that must hold for the statistical procedures to be valid.
4. Verify the Assumptions of the Statistical Test: Evaluate whether the underlying
assumptions hold, or whether departures are acceptable, given the actual data and other
information about the study.
5. Perform the Statistical Test: Perform the calculations required for the statistical test
and document the inferences drawn as a result of these calculations. If the design is to
be used again, evaluate the performance of the sampling design.
-------
Overview 1: Review DQOs and Sampling Design
Translate the data user's objectives into a statement of the primary statistical hypothesis.
If DQOs have not been developed, review sections B.2.1 and B.2.2 and Table 1-1, then
develop a statement of the hypothesis based on the data user's objectives.
If DQOs were developed, translate the DQO Process outputs into a statement of the
primary hypothesis that corresponds to the data user's decision.
Translate the data user's objectives into tolerable limits on the probability of committing Type
I or Type II decision errors.
If DQOs have not been developed, review section B.2.3 and document the data user's
tolerable limits on decision errors.
If DQOs were developed, confirm that the data user's tolerable limits on decision errors
were fully specified.
Review the sampling design and note any special features or potential problems.
Review the applicable parts of section C corresponding to the type of sampling design
used for this study.
-------
Overview 2: Conduct Preliminary Data Review
Review quality assurance reports.
Look for problems or anomalies in the implementation of the sample collection and
analysis procedures.
Examine QC data for information that may be useful in verifying assumptions
underlying the Data Quality Objectives, the Sampling and Analysis Plan, and the
Quality Assurance Project Plans.
Calculate the statistical quantities.
Select appropriate measures of central tendency (Box 2-2) and dispersion
(Box 2-4).
Consider calculating appropriate percentiles (Box 2-1) and measures of
distributional shape (Box 2-6).
If data involve two variables, calculate the Pearson correlation coefficient (Box 2-7).
Display the data using graphical representations.
Select graphical representations from section C that illuminate the structure of the
data set and highlight assumptions underlying the Data Quality Objectives, the
Sampling and Analysis Plan, and the Quality Assurance Project Plans.
Use a variety of graphical representations that examine different features of the set.
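The statistical quantities named in this overview can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The data values are hypothetical, and the quartile convention is the default of `statistics.quantiles`, which is an assumption and may differ from the directions given in the guidance's tool boxes.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def summarize(data):
    """Basic measures of central tendency, dispersion, and relative standing."""
    # Quartiles under the default 'exclusive' method (an assumed convention).
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.fmean(data),
        "median": statistics.median(data),
        "std_dev": statistics.stdev(data),  # sample standard deviation
        "range": max(data) - min(data),
        "iqr": q3 - q1,                     # interquartile range
    }

# Hypothetical concentrations (ppm) of two analytes at the same locations.
lead = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
arsenic = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]

summary = summarize(lead)
r = pearson_r(lead, arsenic)
```

Because the two hypothetical series are exactly linearly related, `r` comes out at 1 here; real paired measurements would normally give a value strictly between -1 and 1.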
-------
Overview 3: Select the Statistical Test
Select the statistical hypothesis test based on the data user's objectives and the results of
the preliminary data review.
If the problem involves comparing study results to a fixed threshold, such as a
regulatory standard, consider the hypothesis tests in section B.1.
If the problem involves comparing two populations, such as comparing data from two
different locations or processes, then consider the hypothesis tests in section B.2.
Identify the assumptions underlying the statistical test.
List the key underlying assumptions of the statistical hypothesis test, such as
distributional form, dispersion, independence, or others as applicable.
Note any sensitive assumptions where relatively small deviations could jeopardize the
validity of the test results.
-------
Overview 4: Verify the Assumptions of the Statistical Test
Determine approach for verifying assumptions.
Identify any strong graphical evidence from the preliminary data review.
Review (or develop) the statistical model for the data.
Select the tests for verifying assumptions.
Perform tests of assumptions.
Adjust for bias if warranted.
Perform the calculations required for the tests selected in activity 4.1.
If necessary, determine corrective actions.
Determine whether data transformations will correct the problem.
If data are missing, explore the feasibility of using theoretical justification
or collecting new data.
Consider robust procedures or nonparametric hypothesis tests.
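As one illustration of checking whether a data transformation corrects a problem, the sketch below compares a skewness coefficient before and after a log transformation. It is a hedged example only: the moment-based definition of skewness used here is an assumption (the guidance may define the coefficient differently), and the data are hypothetical.

```python
import math
import statistics

def skewness(data):
    """Moment coefficient of skewness, m3 / m2**1.5, using population moments."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed concentrations. All values must be positive
# for the log transformation to be defined.
raw = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)     # clearly positive: long right tail
skew_log = skewness(logged)  # essentially zero: symmetric after the transform
```

A skewness near zero after transformation suggests, but does not prove, that a test assuming symmetry or normality is more defensible on the transformed scale; a formal test of the assumption would still be needed.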
-------
Overview 5: Perform the Statistical Test
Perform the calculations for the statistical hypothesis test.
Perform the calculations and document them clearly.
If anomalies or outliers are present in the data set, perform the calculations with and
without the questionable data.
Evaluate the statistical test results and draw conclusions.
If the null hypothesis is rejected, then draw the conclusions and document the
analysis.
If the null hypothesis is not rejected, verify whether the tolerable limits on false
negative decision errors have been satisfied. If so, draw conclusions and document
the analysis; if not, determine corrective actions, if any.
Evaluate the performance of the sampling design if the design is to be used again.
Evaluate the statistical power of the design over the full range of parameter values;
consult a statistician as necessary.
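The advice to perform the calculations with and without questionable data can be sketched for a one-sample t-test against a fixed threshold. The example below is an assumption-laden illustration, not the guidance's own worked example: the data, the action level, and the function name `one_sample_t` are hypothetical, and the resulting statistic would still have to be compared with a critical value from a Student's t table.

```python
import math
import statistics

def one_sample_t(data, threshold):
    """Return the t statistic and degrees of freedom for H0: mean == threshold."""
    n = len(data)
    mean = statistics.fmean(data)
    std_err = statistics.stdev(data) / math.sqrt(n)
    return (mean - threshold) / std_err, n - 1

# Hypothetical measurements with one questionable high value (30.0),
# compared with a hypothetical action level of 10.0.
data = [8.0, 9.0, 10.0, 11.0, 12.0, 30.0]
action_level = 10.0

# Run the same calculation with and without the questionable value.
t_all, df_all = one_sample_t(data, action_level)
t_trim, df_trim = one_sample_t([v for v in data if v != 30.0], action_level)
```

If the two runs lead to the same decision, the questionable value matters little to the conclusion; if they disagree, that disagreement should be documented alongside the analysis.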
-------
TABLE OF CONTENTS
INTRODUCTION
Purpose and Overview
Intended Audience
Organization of this Guidance
Overview of the DQA Process
The 5 Steps of the Data Quality Assessment Process
DQA and the Data Life Cycle
Background
Errors Due to Imperfect Sampling and Measurement
Decision Errors
Hypothesis Testing
Uncertainty vs. Inconclusive
STEP 1: REVIEW DQOs AND THE SAMPLING DESIGN
Overview
A. Activities
Translate the data user's objectives into a statement of
the primary statistical hypotheses
Translate the data user's objectives into tolerable limits on the
probability of committing Type I or Type II decision errors
Review the sampling design and note any special features
or potential problems
B. The Data Quality Objectives (DQO) Process
B.1 Relationship Between DQOs and DQA
B.2 Developing DQOs Retrospectively
B.2.1 Defining the Background of the Data Collection Retrospectively
B.2.2 Developing the Statement of Hypotheses
B.2.3 Specifying Tolerable Limits on Decision Errors
C. Designs for Sampling Environmental Media in
Space and Time
C.1 Authoritative Sampling Versus Probability Sampling
C.2 Probability Sampling
C.2.1 Simple Random Sampling
C.2.2 Sequential Random Sampling and Double Sampling:
Variations on Simple Random Sampling
C.2.3 Systematic Samples
C.2.4 Stratified Samples
C.2.5 Other Probability Samples
C.2.6 Compositing and Subsampling of Specimens
-------
D. References
STEP 2: CONDUCTING A PRELIMINARY DATA REVIEW
Overview
A. Activities
Review quality assurance reports
Calculate statistical quantities
Graph the data
B. Statistical Quantities
B.1 Measure of Relative Standing (Percentiles)
B.2 Measures of Central Tendency
B.3 Measures of Dispersion
B.4 Measures of Shape
B.5 Measures of Association
C. Graphical Representations
C.1 Stem-and-Leaf Diagram
C.2 Histogram/Frequency Plots
C.3 Box and Whiskers Plots
C.4 Ranked Data Plot
C.5 Quantile Plot
C.6 Normal Probability Plot (Quantile-Quantile Plots)
C.7 Plots for Temporal Data
C.7.1 Time Plot
C.7.2 Plot of the Autocorrelation Function (Correlogram)
C.7.3 Other Temporal Graphical Representations
C.7.4 Multiple Observations Per Time Period
C.8 Plots for Spatial Data
C.8.1 Posting Plots
C.8.2 Symbol Plots
C.8.3 Other Spatial Graphical Representations
C.9 Plots for Two or More Variables
C.9.1 Scatter Plot
C.9.2 Extensions of the Scatter Plot
C.9.3 Empirical Quantile-Quantile Plot
D. References
STEP 3: SELECT THE STATISTICAL TEST
Overview
A. Activities
-------
B. Hypothesis Tests
B.1 One-Sample Tests
B.1.1 Tests for a Mean
B.1.1.1 The One-Sample T-Test
B.1.1.2 The Wilcoxon Signed Rank Test
B.1.2 Tests for a Proportion or Percentile
B.1.2.1 The One-Sample Proportion Test
B.1.3 Tests for a Median
B.2 Two-Sample Tests
B.2.1 Comparing Two Means
B.2.1.1 Two-Sample T-Test for Comparing
Population Means
B.2.1.2 Wilcoxon Rank Sum Test
B.2.2 Comparing Two Proportions or Percentiles
B.2.2.1 Two-Sample Test for Proportions
B.2.3 Comparing Two Medians
B.2.4 Other Two-Sample Tests
B.3 References
STEP 4: VERIFY THE ASSUMPTIONS OF THE STATISTICAL TEST
Overview
A. Activities
Determine approach for verifying assumptions
Perform tests of assumptions
Determine corrective actions (if any)
B. Tests for Normality
B.1 Background
B.2 Graphical Methods
B.3 Shapiro-Wilk Test for Normality (the W test)
B.4 Extensions of the Shapiro-Wilk Test (Filliben's Statistic)
B.5 Coefficient of Variation
B.6 Coefficient of Skewness/Coefficient of Kurtosis Tests
B.7 Range Tests
B.8 Goodness-of-Fit Tests
B.9 Recommendations
B.10 References
C. Tests for Trends
C.1 Background
C.2 Estimating Trends
C.2.1 Regression
C.2.2 Sen's Slope Estimator
C.2.3 Seasonal Kendall Slope Estimator
C.3 Tests for Trends
C.3.1 Mann-Kendall Test
C.3.2 Seasonal Kendall Test and Sen's Test
-------
C.4 Tests for Homogeneity of Trends
C.5 References
D. Outliers
D.1 Background
D.2 Statistical Tests for Outliers
D.2.1 Selection of a Statistical Test
D.2.2 Extreme Value Test (Dixon's Test)
D.2.3 Discordance Test
D.2.4 Walsh's Tests
D.2.5 Rosner's Test
D.2.6 Special Cases and Other Sources of Information
D.3 References
E. Tests for Dispersion
E.1 Confidence Intervals for a Single Variance
E.2 F-Test for the Equality of Two Variances
E.3 Bartlett's Test for the Equality of Two or More Variances
E.4 Levene's Test for the Equality of Two or More Variances
E.5 References
F. Transformations
STEP 5: PERFORM THE STATISTICAL TEST
Overview
Activities
Perform the calculations for the statistical hypothesis test
Evaluate the statistical test results and draw the study conclusions
Evaluate the performance of the sampling design if the design is to
be used again
APPENDIX A: STATISTICAL TABLES
LIST OF TOOL BOXES
Box 2-1: Directions for Calculating the Measure of Relative Standing
(Percentiles) with an Example
Box 2-2: Directions for Calculating the Measures of Central Tendency
Box 2-3: Example Calculations of the Measures of Central Tendency
Box 2-4: Directions for Calculating the Measures of Dispersion
-------
Box 2-7: Directions for Calculating the Correlation Coefficient
(the Pearson Correlation Coefficient)
Box 2-8: Example Calculations of the Correlation Coefficient
Box 2-9: Directions for Generating a Stem and Leaf Plot
Box 2-10: Example of Generating a Stem and Leaf Diagram
Box 2-11: Directions for Generating a Histogram and a Frequency Plot
Box 2-12: Example of Generating a Histogram and a Frequency Plot
Box 2-13: Directions for Generating a Box and Whiskers Plot
Box 2-14: Example of a Box and Whiskers Plot
Box 2-15: Directions for Generating a Ranked Data Plot
Box 2-16: Example of Generating a Ranked Data Plot
Box 2-17: Directions for Generating a Quantile Plot
Box 2-18: Example of Generating a Quantile Plot
Box 2-19: Directions for Constructing a Normal Probability Plot
Box 2-20: Example of Constructing a Normal Probability Plot
Box 2-21: Directions for Generating a Time Plot and an Example
Box 2-22: Directions for Constructing a Correlogram
Box 2-23: Example of Generating a Correlogram
Box 2-24: Directions for Generating a Posting Plot and an Example
Box 2-25: Directions for Generating a Symbol Plot and an Example
Box 2-26: Directions for Generating a Scatter Plot and an Example
Box 2-27: Directions for Constructing an Empirical Q-Q Plot
Box 2-28: Example of Constructing an Empirical Q-Q Plot
Box 4-1: Directions for Filliben's Statistic (Normal Probability Plot
Correlation Coefficient)
Box 4-2: Example of Filliben's Statistic (Normal Probability Plot Correlation
Coefficient)
Box 4-3: Directions for the Coefficient of Variation Test and an Example
Box 4-4: Directions for Coefficient of Skewness and Kurtosis Tests
Box 4-5: Directions for Studentized Range Test
Box 4-6: Example of Studentized Range Test
Box 4-7: Directions for Geary's Test
Box 4-8: Example of Geary's Test
Box 4-9: Directions for the Extreme Value Test (Dixon's Test)
Box 4-10: An Example of the Extreme Value Test (Dixon's Test)
Box 4-11: Directions for the Discordance Test
Box 4-12: An Example of the Discordance Test
Box 4-13: Directions for Walsh's Test for Large Sample Sizes
Box 4-14: Directions for Rosner's Test for Outliers
Box 4-15: An Example of Rosner's Test for Outliers
Box 4-16: Directions for Constructing Confidence Intervals and Confidence
Limits for the Sample Variance and Sample Standard Deviation
Box 4-17: Directions for Calculating an F-Test to Compare Two Variances
Box 4-18: Directions for Transforming Data and an Example
-------
LIST OF FIGURES
Figure 0-1. DQA in the Context of the Data Life Cycle
Figure 0-2. Environmental Decisions and Potential Errors
Figure 2-1. Example of a Histogram
Figure 2-2. Example of a Frequency Plot
Figure 2-3. Example of a Box and Whiskers Plot for Symmetric Data
Figure 2-4. Example of a Ranked Data Plot
Figure 2-5. Example of a Skewed Quantile Plot
Figure 2-6. Example of a Normal Probability Plot
Figure 2-7. Example of a Time Plot
Figure 2-8. Example of a Correlogram
Figure 2-9. Example of a Posting Plot
Figure 2-10. Example of a Symbol Plot
Figure 2-11. Example of Graphical Representations of Multiple Variables
Figure 2-12. Example of a Scatter Plot
Figure 2-13. Example of a Coded Scatter Plot
Figure 2-14. Example of a Parallel Coordinates Plot
Figure 2-15. Example of a Matrix Scatter Plot
Figure 4-1. Graph of a Standard Normal Distribution
LIST OF OVERVIEW BOXES
Overview 1: Review DQOs and Sampling Design
Overview 2: Conduct Preliminary Data Review
Overview 3: Select the Statistical Test
Overview 4: Verify the Assumptions of the Statistical Test
Overview 5: Perform the Statistical Test
LIST OF TABLES
Table 1-1. Commonly Used Statements of Statistical Hypotheses
Table 4-1. Data for Examples
Table 4-2. Tests for Normality
Table 4-3. Summary of Recommendations for Selecting a Statistical Test for Outliers
Table A-1. Cumulative Standard Normal Distribution
Table A-2. Critical Values of Filliben's Statistic
Table A-3. Critical Values for the Studentized Range Test
Table A-4. Critical Values for the Extreme Value Test
Table A-5. Critical Values for the Discordance Test
Table A-6. Approximate Critical Values for Rosner's Test
Table A-7. Critical Values of Student's t Distribution
Table A-8. Quantiles of the Wilcoxon Signed Ranks Test Statistic
Table A-9. Critical Values for the Rank Sum Test
Table A-10. Critical Values of the Chi-Square Distribution
-------
EVALUATION FORM
ELEVENTH ANNUAL EPA CONFERENCE ON STATISTICS
FEBRUARY 27 - MARCH 2, 1995
1. Overall Conference Evaluation
Questions (please check one box for each: Very Much, Some Extent, or Limited Extent)
Did you broaden your EPA contacts?
Did you update your current knowledge?
Did you find exposure to new material?
Did you gain more agency-wide perspective?
Were you able to exchange technical methods?
Were you able to discuss problems and concerns?
-------
2. Session Evaluation
Questions (please check one box for each: Highly Relevant, Fairly Relevant, or Not Very Relevant)
Statistical Software and the
Single Statistician
Keynote Address
Featured Speaker
Statistical Quality Assurance:
Data Quality Assessment
Environmental Monitoring: New
Answers to Old Questions and
Spatial Sampling
Tutorial: Survival Analysis
Statistical Methods for
Combining Environmental
Information and Environmental
Research at NISS
Tutorial: Publishing on the
Internet
Emerging Issues in Environmental
Statistics-I
Pesticides in the Diets of
Infants and Children: Exposure
and Risk Estimation Using Monte
Carlo Simulation
Statistical Policy Advisory
Committee
Collaborative Research I
Science and Information
Management: Present and Future
Perspectives
Collaborative Research II
Strategic Directions in
Information Resources Management
at EPA
New Sources of Environmental
Data: Testing the Latest Aerial
and Satellite Sensing at the
Field of Dreams
-------
2. Session Evaluation (continued)
Questions (please check one box for each: Highly Relevant, Fairly Relevant, or Not Very Relevant)
Geographic Visualization of
Environmental Quality
A Case Study of Surface Water
Conditions in the US/Mexico
Border Area
Building Environmental Data
Management and Analytical
Capabilities in the Great Lakes
Region and the Baltic Republics
Environmental Statistics in the
Water Office - I
Alternative Models for Analysis
of Composite Environmental
Samples
Estimates of Fish Consumption
Rates in the United States
Water Quality Based Effluent
Limitations and the Statistical
Properties of Low Concentration
Measurements in Analytical
Chemistry
Benchmark Dose in an Acute
Toxicity Study
Statistical Analysis of Risk and
Performance Results
Using Relative Data Quality
Indicators of Precision and Bias
Environmental Statistics in the
Water Office II (Continued from
earlier session)
Long Island Breast Cancer Study
Project: Environmental
Statistics Research Issues
Poster Session
-------
3. What were the greatest strengths of the conference? What
aspects did you like the most?
4. What were the greatest weaknesses of the conference? What
aspects and sessions did you like the least?
5. Would you be interested in other training sessions that would
introduce you to a new development in applied statistical
methodology?
Yes No Unsure
Suggestions for topics:
6. Are you planning to attend next year's conference on
statistics?
Yes No Unsure
7. Other comments:
-------