THE TENTH ANNUAL
EPA CONFERENCE ON
STATISTICS
MARCH 7-10, 1994
RAMADA INN NORFOLK
NORFOLK, VIRGINIA
-------
PRELIMINARY PROGRAM
TENTH ANNUAL EPA CONFERENCE ON STATISTICS
MARCH 7-10, 1994
RAMADA INN NORFOLK
345 GRANBY STREET
NORFOLK, VIRGINIA
804-622-6682
MONDAY MARCH 7, 1994
6:00-7:30 pm Social Hour and Cash Bar
TUESDAY MARCH 8, 1994
8:45-10:00 am Plenary Session—Panel Discussion
Phil Ross; OPPE, Derry Allen; OPPE, Paul
Wohlleben; OIRM
10:00-10:15 am Break
10:15-11:30 am Plenary Session
Policy and Guidance in Groundwater Monitoring
Statistics: The State of the States
State of Pennsylvania/Dept of Natural Resources
Phil Morgan and colleagues
11:30-1:00 pm Lunch
1:15-2:30 pm
-Environmental Statistics Research at the
National Institute for Statistical Sciences
Larry Cox, David Holland
-General Topics
Tim Barry, Al Goozner, Matthew Leopard
-Statistical Research Methodology
George Flatman
2:30-2:45 pm Break
-------
2:45-4:00 pm
-How Widespread is Our Influence as
Statisticians?
Elizabeth Margosches, John Schwemberger,
Karen Hogan and Margaret Conomos
-Using Uncertainty Analysis in Environmental
Decision Making
Barnes Johnson, Tim Barry
-Environmental Statistics Issues in Pesticides
Ruth Allen
6:00-7:30 Poster Session & Statistics Software
Bill Smith
WEDNESDAY MARCH 9, 1994
8:45-4:00 Demonstrations of S Plus with Environmental Data
Sets
Brand Niemann et al
8:45-10:00 Statistical Issues in Rulemaking
Henry Kahn
Statistics and the Agency's Mandatory Quality
Assurance Program
John Warren and Alfred F. Haeberer
10:00-10:15 Break
10:15-11:30 Statistical Analyses of Ozone Layer and
Composite Sampling
Bimal Sinha
Small Community Information and Data Program
Mel Hollander and Susan Brunenmeister
11:30-1:15 pm Lunch
1:15-4:30 pm Statistical Policy Advisory Committee Meeting
including a discussion of the role of SPAC,
subcommittee reports including the Ad Hoc
subcommittee of Nussbaum et al
-------
THURSDAY MARCH 10, 1994
8:45-10:00 am Plenary session
An Introduction to Internet For Statisticians
Chapman Gleason
10:00-10:15 Break
10:15-11:30 Plenary session Featured Speaker
Tom Van Zant
11:30-Noon Awards
-------
ADDITIONS TO THE PROGRAM FOR THE
TENTH ANNUAL EPA CONFERENCE ON STATISTICS
1. The Featured Speaker on Thursday morning, 10:15 to 11:30 am, will
be Professor Dan Carr from the Center For Computational
Statistics at George Mason University. The topic of his
presentation will be "Application of Graphic Design Principles:
Construction of Row Labeled Plots and Maps For Exploration and
Presentation of EPA Data".
(Unfortunately the scheduled speaker Tom Van Zant could not
come)
2. Add to the Tuesday 2:45 to 4:00 pm session on Using Uncertainty
Analysis in Environmental Decision Making the following
presentation:
Richard Gilbert—Battelle Northwest
Environmental Decision Making: What Is To Be Gained From
Quantitative Estimates of Risk Uncertainty
-------
FINAL EXAM
TENTH ANNUAL EPA CONFERENCE ON STATISTICS
MARCH 10, 1994
PLEASE ANSWER SIX OF YOUR CHOICE
1. What are the five major types of evidence that were published
recently to suggest a relationship between organochlorine
pesticides and excess human breast cancer?
2. How green is GSA?
3. What unabated area remained high in lead content
(concentration) in Denver homes subjected to lead paint
abatement?
4. What feature of a lead standard was identified as an
essential component in health-based standard setting?
5. Are states capable of adapting EPA statistical regulations (in
RCRA groundwater monitoring) and administering these
regulations?
6. What is your notion of 'background' in environmental
monitoring?
7. Does the role of statistics in EPA and the state agencies
need to expand or retrench in scope?
8. How do we communicate critical statistical information to
managers on the state level?
9. What does the C/CPAS acronym stand for?
10. Write the S Plus script for a qqnorm plot of a vector and
include a highly robust straight line fit which is not much
influenced by outliers. > qqnorm(car.gals), > qqline(car.gals)
11. From first principles derive the first two moments of the
Cauchy distribution.
12. Describe the difference between stochastic variability
and knowledge uncertainty.
-------
13. How many communities in the US have fewer than 10,000 people
based on the 1990 census?
14. Under what conditions does a systematic sample of size n
have sample variance less than or equal to a simple random sample
of size n?
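Question 10 asks for a normal quantile-quantile plot with a reference line that resists outliers. For readers without S Plus, the idea can be sketched in Python. This is an illustration under stated assumptions, not an answer key: it assumes the line is drawn through the first and third quartiles of the data and of the standard normal distribution, and the function names `qq_points` and `qq_line` are hypothetical.

```python
from statistics import NormalDist, quantiles

def qq_points(data):
    """Return (theoretical, sample) quantile pairs for a normal Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    # Plotting positions (i - 0.5) / n; other conventions exist.
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    z = [NormalDist().inv_cdf(p) for p in probs]
    return list(zip(z, xs))

def qq_line(data):
    """Slope and intercept of the line through the two quartile pairs.

    Only the quartiles enter the fit, so a few extreme values in the
    tails do not move the line.
    """
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    z1 = NormalDist().inv_cdf(0.25)
    z3 = NormalDist().inv_cdf(0.75)
    slope = (q3 - q1) / (z3 - z1)
    intercept = q1 - slope * z1
    return slope, intercept
```

Replacing the largest observation with a wild value leaves the quartiles, and therefore the fitted line, unchanged, which is the sense in which the fit is robust.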
-------
Questions for the Annual Statistics Conference
1. At what hotel was the first conference held?
2. Who was the first featured speaker and from what office?
3. At which conference did Peter, Paul, and Mary speak one
morning?
4. Which plenary speaker was snowed out of his session?
5. Which is the only hotel to host the conference twice?
(Hint: It has had two different names)
6. In which year did the bus backtrack to the Greyhound depot
after crossing the 14th Street Bridge?
7. Which conference had three DAA's participate on one panel?
8. Which assistant administrator spoke at a conference?
9. Why is the conference usually held in March? Which was the
one conference that was not in March?
10. Most of the conferences have had contractor support to set
them up. Name the two contractors who did this.
-------
CONTENTS
Combining Data and Data Uncertainty: Introduction (N.P. Ross). Uncertainty Issues of the Hanford
Environmental Dose Reconstruction Project (R.O. Gilbert and J.C. Simpson). Decreased Sampling Costs and
Improved Accuracy with Composite Sampling (S.D. Edland and G. van Belle). Environmental Chemistry,
Statistical Modeling and Observational Economy (G.P. Patil). Predictive Models of Fish Response to
Acidification: Using Bayesian Inference to Combine Laboratory and Field Measurements (W.J. Warren-Hicks and R.L.
Wolpert). A New Approach for Accommodation of Below Detection Limit Data in Trend Analysis (N.N. Nagaraj
and S.L. Brunenmeister). Spatial Statistics: Introduction (N.P. Ross). Spatial Chemostatistics (N. Cressie).
Design of the Clean Air Act Deposition Monitoring Network (D. Holland and R. Baumgardner). National Air
Quality and Emissions Trends Report (B.A. Beard and W.P. Freas). Models and Data Interpretation:
Introduction (D. Krewski). Statistical Issues for Development Toxicity Data (D. Gaylor). Measuring Carcinogenic
Potency (M.J. Goddard and D. Krewski). Environmental Pollution and Human Health: An Epidemiology
Perspective (J. Schwartz). Soil Quality as a Component of Environmental Quality (M.A. Cole). Future
Environmental Management: Introduction (J. Abe). Gauging the Future Challenges for Environmental
Management: Some Lessons from Organizations with Effective Outlook Capabilities (M. Boroush). Creating
Strategic Visions (G. Taylor). Exploring Future Environmental Risks (D. Rejeski). Geographical Information
Systems (GIS): Introduction (A. Pesachowitz). Geographical Information Systems (GIS) for Environmental
Decision Making (A. Pesachowitz). Mechanisms to Access Information about Spatial Data (E. Christian).
Environmental Statistics: Introduction (B. Nussbaum). The Quality of Environmental Databases (D.A. Marker,
S. Ryaboy, and H. Lacayo). The NOAA National Status and Trends Mussel Watch Program: National Monitoring
of Chemical Contamination in the Coastal United States (T.P. O'Connor). Some Problems of Safe Dose
Estimation (A.P. Basu, G.F. Krause, K. Sun, M. Ellersieck, and F. Mayer). Where Next? Adaptive Measurement
Site Selection for Area Remediation (H.T. David and S. Yoo). The Center for Environmental Statistics: Interim
Status and Vision of Products (B. Niemann, C. Curtis, and E. Leonard). Conclusion: Where Do We Go from
Here? (N.P. Ross and C.R. Cothern).
Catalog no. L936LBRA
December 1993, c. 400 pp., ISBN: 0-87371-936-0
Approx. U.S. $95.00/Outside U.S. $114.00
30-DAY EXAMINATION POLICY
Lewis Publishers will allow up to $300.00 in publications to be
examined on 30-day approval. Orders for books totalling over
$300.00 must be pre-paid or accompanied by a viable purchase
order. All books are returnable. Return postage is the
responsibility of the customer. Journals, software, and
databases are not available on an examination basis.
LEWIS PUBLISHERS
2000 Corporate Blvd., N.W., Boca Raton, Florida 33431
(407) 994-0555
FREE Shipping and Handling on Prepaid Orders!
3 Easy Ways to Order:
1. [illegible]
2. Mail this portion to Lewis Publishers
3. Call toll free 1-800-272-7737 Monday through Friday
(Continental U.S. only) or (407) 994-0555.
Yes, Send Me
Cothern/Ross: Environmental Statistics, Assessment, and Forecasting
Cat. no. L936LBRA...Approx. U.S. $95.00
Also of Interest:
Taylor: Statistical Techniques for Data Analysis, 1990
Cat. no. 12502Z...U.S. $79.95/Outside U.S. $96.00
-------
THE EPA ANNUAL CONFERENCE ON STATISTICS
1985 Durham, North Carolina
1986 Williamsburg, Virginia
1987 Virginia Beach, Virginia
1988 Williamsburg, Virginia
1989 Charlottesville, Virginia
1990 Williamsburg, Virginia
1991 Richmond, Virginia
1992 Philadelphia, Pennsylvania
1993 Baltimore, Maryland
1994 Norfolk, Virginia
1995 Williamsburg, Virginia
1996 Richmond, Virginia
-------
PLANNING COMMITTEE FOR THE TENTH ANNUAL EPA CONFERENCE ON
STATISTICS
Chair Rick Cothern, OPPE
Ruth Allen, OPPTS
Larry Cox, AERL
George Flatman, EMSL
Chapman Gleason, OPPE
David Holland, ORD
Barnes Johnson, OSW
Henry Kahn, OW
Mel Hollander, OA
Elizabeth Margosches, OPPTS
Phil Morgan, State of Pennsylvania
Department of Natural Resources
Brand Niemann, OPPE
Barry Nussbaum, OPPE
Bimal Sinha, University of Maryland/Baltimore Campus
Bill Smith, OPPE
Tom Van Zant, Geosphere Project
John Warren, ORD
-------
ATTENDEES AT ALL TEN OF THE ANNUAL EPA CONFERENCES ON STATISTICS
JOHN CREASON
TOM CURRAN
JIM DALEY
BILL HUNT
BARNES JOHNSON
HENRY KAHN
MEL HOLLANDER
ELIZABETH MARGOSCHES
BARRY NUSSBAUM
PHIL ROSS
BILL SMITH
JOHN WARREN
-------
REGISTRANTS
TENTH ANNUAL EPA CONFERENCE ON STATISTICS
RAMADA INN NORFOLK
MARCH 7-10, 1994
Gerald G. Akland
ORD/OMMSQA
MD-75
US EPA
Research Triangle Park, NC 27711
919-541-4885
FAX 919-541-7588
Derry Allen
Acting Director
Office of Strategic Planning and Environmental Data
MC 2161
USEPA
Washington, D.C. 20460
202-260-4028
Ruth Allen
OPPT/OPP/HED/CCB
H7509C
USEPA
Washington, D.C. 20460
703-308-2918
FAX 703-305-5147
Tim Barry
OPA/OPPE
MC 2127
202-260-2038
Peter Bloomfield
NISS
RTP, NC
Susan Brunenmeister
OA H1501
USEPA
Washington, D.C. 20460
202-260-0246
-------
Lori Brunsman
H7509C
USEPA
Washington, D.C.
703-308-2902
Jeff Beaubier
MC 7403
USEPA
Washington, D.C. 20460
202-260-2263
Judy Calem
Environmental Statistics and Information Division
OPPE MC 2163
USEPA
Washington, D.C. 20460
202-260-8638
FAX 202-260-4968
Jim Callier
Region 7
USEPA
MC RCRA/IOWA
726 Minnesota Ave
Kansas City, KS 66101
913-551-7646
FAX 913-551-7521
Dan Carr
George Mason University
William S. Cleveland
Maureen Clifford
OPP MC 8602
USEPA
Washington, D.C. 20460
703-308-2827
Jim Cogliano
ORD/OHEA
MC 8602
USEPA
Washington, D.C. 20460
202-269-3814
FAX 202-260-3803
-------
Edwin J. Coleman
Coleman/Morse Associates Ltd.
1190 River Bay Road
Annapolis, MD 21401
Margaret Conomos
Environmental Statistics and Information Division
OPPE MC 2163
USEPA
Washington, D.C. 20460
202-260-2600
FAX 202-260-4968
Rick Cothern
Environmental Statistics and Information Division
OPPE MC 2163
USEPA
Washington, D.C. 20460
202-260-2734
FAX 202-260-4968
Bill Cox
OAQPS
MD-14
Durham, NC
919-541-5563
Larry Cox
AERL
MD-75
USEPA
RTP, NC 27711
919-541-2648
FAX 919-541-7588
John Creason
ORD/HERL-RTP
USEPA
RTP, NC 27711
919-541-2598
FAX 919-541-5394
-------
Tom Curran
OAQPS/OAR
MD-14
US EPA
RTP, NC 27711
919-541-5467
FAX 919-541-2357
Jim Daley
OPPE/ORME/IPB
MC 2136
US EPA
Washington, D.C. 20460
202-260-2743
Evan Englund
EMSL
PO BOX 93478
USEPA
Las Vegas, NV 89193-3478
Gary F. Evans
MD-56
Durham, NC
919-541-3124
George Flatman
EMSL
PO Box 93478
USEPA
Las Vegas, NV 89193-3478
702-798-2628
FAX 702-798-2454
Bernice Fisher
H7509
USEPA
Washington, D.C. 20460
703-305-5959
Terrence Fitz-Simons
MD-14
Durham, NC
919-541-0889
-------
John Fritsvold
Environmental Statistics and Information Division
OPPE MC 2163
US EPA
Washington, D.C. 20460
202-260-6724
FAX 202-260-4968
Dru Francis
Hampton Roads Sanitation District
PO Box 5911
Virginia Beach, VA 23455-0911
804-460-2261
Niel Frank
OAQPS
MD-14
Durham, NC
919-541-5560
Warren Freas
OAQPS
MD-14
Durham, NC
919-541-5469
William Garetz
Environmental Statistics and Information Division
OPPE MC 2163
USEPA
Washington, D.C. 20460
202-260-2684
FAX 202-260-4968
Chapman Gleason
Environmental Statistics and Information Division
OPPE MC 2163
USEPA
Washington, D.C. 20460
202-260-9006
FAX 202-260-4968
-------
Al Goozner
OPPTS/OPP/BEAD
H7503W
USEPA
Washington, D.C. 20460
703-308-8147
FAX 703-308-8151
James Hemby
OAQPS
Helen Hinton
OAQPS
MD-14
Durham, NC
919-541-5558
Karen Hogan
MC 7403
USEPA
Washington, D.C. 20460
202-260-3895
Dave Holland
ORD/AREAL-RTP
Mail Drop MD-56
USEPA
RTP, NC 27711
919-541-3126
FAX 919-541-1486
John Hoiley
MC 6406J
USEPA
Washington, D.C. 20460
202-233-9305
Bill Hunt
OAQPS/OAR
MD-14
USEPA
RTP, NC 27711
919-541-5559
FAX 919-541-2357
-------
Barnes Johnson
MC 5305
USEPA
Washington, D.C. 20460
202-260-2791
Henry Kahn
OW/OST/EAD
MC 4303
USEPA
Washington, D.C. 20460
202-260-5406
Art Koines
MC 2161
USEPA
Washington, D.C. 20460
202-260-4030
Mel Hollander
OA H1501
USEPA
Washington, D.C. 20460
202-260-4719
Pepi Lacayo
Environmental Statistics and Information Division
OPPE MC 2163
USEPA
Washington, D.C. 20460
202-260-2714
FAX 202-260-4968
Eleanor Leonard
Environmental Statistics and Information Division
OPPE MC 2163
USEPA
Washington, D.C. 20460
202-260-9753
FAX 202-260-4968
Matthew Leopard
MC 2136
USEPA
Washington, D.C. 20460
202-260-2468
-------
Elizabeth Margosches
OPPTS
MC 7403
USEPA
Washington, D.C. 20460
202-260-1511
FAX 202-260-1283
Mary Marion
Rick Moll
Statistics Canada
National Accounts and Environment Division
R.H. Coats Building, 21st Floor
Ottawa, Ontario, Canada K1A 0T6
613-951-3741
Bob Moon
Cincinnati Technical College
FAX 513-569-1463
James Morant
Environmental Statistics and Information Division
OPPE MC 2163
USEPA
Washington, D.C. 20460
202-260-2266
FAX 202-260-4968
Phil Morgan
717-772-3609
FAX 717-787-8885
Bill Nelson
ORD/AREAL-RTP
USEPA
RTP, NC 27711
919-541-3184
FAX 919-541-1486
-------
Brand Niemann
Environmental Statistics and Information Division
OPPE MC 2163
USEPA
Washington, D.C. 20460
202-260-3726
FAX 202-260-4968
Barry Nussbaum
Environmental Statistics and Information Division
OPPE MC 2163
USEPA
Washington, D.C. 20460
202-260-1493
FAX 202-260-4968
Tony Olsen
ORD/Corvallis
USEPA
Environmental Research Laboratory
Corvallis, Oregon 97330
503-754-4790
FAX 503-754-4338
Thomas Parker
Environmental Statistics and Information Division
OPPE MC 2163
USEPA
Washington, D.C. 20460
202-260-3378
FAX 202-260-4968
Barbara Parzygnat
OAQPS
Hugh Pettigrew
OPPT/OPP/HED
H7509C
USEPA
Washington, D.C. 20460
703-305-5699
James A. Reagan
MD-78A
Durham, NC
919-541-4486
-------
Esperanza Renard
US EPA
Edison, NJ
Phil Ross
Environmental Statistics and Information Division
OPPE MC 2163
USEPA
Washington, D.C. 20460
202-260-2680
FAX 202-260-4968
Jerry Sacks
NISS
PO BOX 14162
RTP, NC 27709
919-541-7114
FAX 919-541-7102
John Schwemberger, Jr.
MC 7404
USEPA
Washington, D.C. 20460
202-260-7195
Ingrid Schulze
Environmental Statistics and Information Division
OPPE MC 2163
USEPA
Washington, D.C. 20460
202-260-3007
FAX 202-260-4968
Robert Seila
USEPA
RTP, NC
Woody Setzer
MD 55
Durham, NC
919-541-0128
Ron Shafer
Environmental Statistics and Information Division
OPPE MC 2163
USEPA
Washington, D.C. 20460
202-260-6966
FAX 202-260-4968
-------
Jack H. Shreffler
AERL
MD-75
US EPA
RTP, NC 27711
Bimal Sinha
Department of Statistics and Mathematics
University of Maryland/Baltimore Campus
410-455-2412
FAX 410-455-1066
William Smith
Environmental Statistics and Information Division
OPPE MC 2163
US EPA
Washington, D.C. 20460
202-260-9659
FAX 202-260-4968
Chris Solloway
Environmental Statistics and Information Division
OPPE MC 2163
USEPA
Washington, D.C. 20460
202-260-2697
FAX 202-260-4968
Tim Stuart
Environmental Statistics and Information Division
OPPE MC 2163
USEPA
Washington, D.C. 20460
202-260-0725
FAX 202-260-4968
David Svendsgaard
MD-55
Durham, NC
919-541-2468
Tom Van Zant
Geosphere Project
146 Entrada Drive
Santa Monica, CA 90402
310-459-4342
FAX 310-459-8299
-------
John Warren
QAMS MC 8205
USEPA
Washington, D.C. 20460
202-260-9464
FAX 202-260-4346
Patricia Wilkinson
Environmental Statistics and Information Division
OPPE MC 2163
USEPA
Washington, D.C. 20460
202-260-2680
FAX 202-260-4968
Paul Wohlleben
Acting Director, OIRM
USEPA
Washington, D.C. 20460
202-260-4465
David Zoellner
Environmental Statistics and Information Division
OPPE MC 2163
USEPA
Washington, D.C. 20460
202-260-3373
FAX 202-260-4968
-------
ABSTRACTS FOR THE TENTH ANNUAL EPA CONFERENCE ON STATISTICS
MARCH 7-10, 1994
RAMADA INN NORFOLK
345 GRANBY STREET
NORFOLK, VIRGINIA
804-622-6682
-------
TUESDAY MARCH 8, 1994
8:45-10:00 am Plenary Session—Panel Discussion
Phil Ross; OPPE, Frederick W. Allen; OPPE, Paul
Wohlleben; OIRM
-------
TUESDAY MARCH 8, 1994
10:15-11:30 am Plenary Session
Policy and Guidance in Groundwater Monitoring
Statistics: The State of the States
State of Pennsylvania/Dept of Natural Resources
Phil R. Morgan, Sr. and colleagues
An informal survey of state environmental agencies has revealed
that there is a great disparity in the role these agencies play in
meeting the mandates of the USEPA regulations on the use of
statistical methods in management and monitoring of Superfund and
RCRA sites. This review of the statistical activities of state
agencies also revealed wide variation in the extent to which states
observe statistical methodology recommended in the USEPA
regulations and guidance.
This session will examine approaches taken by toxic waste-intensive
states in Superfund and RCRA monitoring statistics, unique data
collection, analysis and proposal review problems faced by certain
states, and will suggest solutions to recurrent problems in
statistical methods for detection, evaluation and attainment
monitoring.
The session will include a discussion of these issues by state
agency personnel.
Statistical concerns to be addressed include the following:
1. Overview of RCRA Groundwater Monitoring Problem
-RCRA monitoring overview
-Major monitoring review
-Balancing statistical issues
-USEPA power analysis approach
-Considerations in applying the USEPA approach
2. The Verification Problem
3. Application of Interval Statistics to RCRA Data for Detection,
Assessment, Attainment Monitoring
-Motivation for using interval statistics
-Application of Tolerance Intervals
-Application of Prediction Intervals
-Application of Confidence Intervals
-Application of Combined Tolerance/Prediction Intervals
-The Statistical 'Performance Standard' Problem
-Lognormal Interval Statistics
-Nonparametric Interval Statistics
-------
4. Description of Example Data
-Overview
-Data: Locations and Chronology
-Using Results of Previous Analysis
5. Power Analysis Using Tolerance/Prediction Intervals
-Logical Description of How Simulation Procedures
Relate to the Waste Site
-Description of Data Generation and Simulation Procedures
-Results of Simulation
6. Attainment and Maintenance Statistics: Trend and Causality
-Attainment Sampling and Sequential Testing
-Mann-Kendall Statistics
-Keeping an Eye on Statistical Criteria
7. Statistics in Today's Regulatory Environment
-Is it possible to "keep it simple"?
-Resources Needed by State and Federal Agencies
-Resources Needed by the Regulated Community
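
The outline above lists Mann-Kendall statistics under trend testing for attainment monitoring. As a minimal sketch of that test, assuming a series with no tied values (the tie correction to the variance is omitted) and enough observations for the normal approximation to be reasonable, the statistic can be computed as follows. The function name `mann_kendall` is hypothetical and is not taken from any USEPA guidance document.

```python
from math import sqrt
from statistics import NormalDist

def mann_kendall(series):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    S counts concordant minus discordant pairs over all i < j.
    Assumes no tied values; the tie correction is omitted.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the difference
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity correction: shrink |S| by one before standardizing.
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, z, p
```

Because only the signs of pairwise differences are used, the test does not depend on the concentrations following any particular distribution.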
-------
TUESDAY MARCH 8, 1994
1:15 - 2:30 pm
-Environmental Statistics Research at the
National Institute for Statistical Sciences
David Holland and Larry Cox—ORD
Meteorological Effects in Measuring Ozone Levels and Trends
Observed ozone concentrations are used to monitor changes and
trends in the sources of ozone and its precursors. For this
purpose the influence of meteorological variables is a confounding
factor. Data from the Chicago area are explored with a variety of
methods. The key relationships are used to construct nonlinear
least squares, nonparametric and logistic regression models
relating ozone levels unaccounted for by trends in meteorology, to
"adjust" observed ozone concentrations for anomalous weather
conditions and to predict exceedances over high thresholds.
Effects of Particulate Levels on Mortality Accounting for
Meteorology
Potential relationships between mortality and levels of
particulates (in particular, PM10) are confounded by meteorological
variables. Coefficients from regression models relating these
variables are often used to measure the effect of PM10 on
mortality. Models are constructed, and applied to Chicago area
data, indicating that season and meteorology play dominant roles
and that the effect of PM10 on mortality may be insignificant.
Alternative (to regression coefficient) ways of measuring the
effect of PM10 are also used to disentangle season and meteorology
from particulate levels with similar conclusions.
Strategies for Combining Data From Multiple Studies in Risk
Assessment
These strategies accommodate both systematic and random variation
between studies while developing dose-response relationships. A
key step is to discretize the response into severity categories in
order to combine endpoints on possibly different scales. The
probability of an adverse outcome is then modeled using general
mixed model logistic regression. Modeling error is assessed and
reduced through stratification and random effects modeling. These
(and other) tools address systematic differences between studies,
such as species effects; uncontrolled sources of variation, such as
lab and investigator effects; and the combining of information from
studies with different levels of quantification. Analyses of data
from studies of tetrachloroethylene and methylisocyanate illustrate
the methods.
-------
TUESDAY MARCH 8, 1994
1:15 - 2:30 pm
-General Topics
Tim Barry, Al Goozner, Matthew Leopard
Tim Barry—OPPE
The Principle of Maximum Entropy and the Selection of
Probability Distributions.
Maximum entropy (MaxEnt) is a nonparametric method for selecting
probability distributions developed by Shannon and Jaynes which
guarantees the maximum use of the available data without going
beyond it. MaxEnt has proven to be especially useful in data-poor
situations. Recent advances have opened up the method for expanded
uses in environmental risk assessment. This presentation gives
an overview of the methodology and discusses some recent
applications and their potential uses in quantitative risk
assessment.
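
To make the MaxEnt idea concrete, here is a small sketch for one standard textbook case: a quantity restricted to a finite set of values whose mean is the only thing known. Under that single constraint the maximum-entropy probabilities are proportional to exp(-lam * x), and the multiplier lam can be found by bisection. The function names and the dice-style support are illustrative assumptions, not material from the presentation.

```python
from math import exp, log

def maxent_given_mean(support, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Maximum-entropy probabilities on `support` with a fixed mean.

    target_mean must lie strictly between min(support) and max(support).
    """
    def probs(lam):
        exponents = [-lam * x for x in support]
        top = max(exponents)  # shift exponents to avoid overflow
        weights = [exp(e - top) for e in exponents]
        total = sum(weights)
        return [w / total for w in weights]

    def mean(lam):
        return sum(p * x for p, x in zip(probs(lam), support))

    # The mean decreases as lam increases, so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) > target_mean:
            lo = mid  # mean too high: move toward larger lam
        else:
            hi = mid
    return probs((lo + hi) / 2)

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(q * log(q) for q in p if q > 0)
```

With support 1 through 6 and a target mean of 3.5 the result is the uniform distribution; any other target mean tilts the probabilities toward one end and lowers the entropy.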
Al Goozner—OPP/OPPTS
Certified/Commercial Pesticide Applicator Survey C/CPAS
EPA has embarked on a new data collection program under mandate
from the 1990 Food, Agriculture, Conservation and Trade Act (PL
101-624). The 1993 Certified/Commercial Pesticide Applicator
Survey (C/CPAS) is the initial EPA effort to collect needed
pesticide use data from these applicators in order to compile a
report to be submitted to the Congress in compliance with the new
law.
A view of how preliminary investigations were conducted to develop
a survey design and a successful OMB Information Collection Request
will be presented. Emphasized are the objectives of collecting
information with the minimum respondent burden. Reference to
Cochran's Sampling Techniques will be made. Participants will see
how the stage was set for an on-going data collection program.
Topics to be covered include:
A) The Congressional Mandate
B) Preliminary Investigations
C) Developing a Focus for the Survey
D) Methodology Development
E) ICR Preparation - Approval
-------
Matthew G. Leopard
Transborder Hazardous Waste Data Electronic Data Interchange
(EDI) Project
The Maquiladoras or "Maquilas", defined as U.S.-based industries
operating in Mexico, are an extreme case of an industry subjected
to myriad environmental reporting requirements. Due to the
transborder nature of their business, they must submit
environmental compliance reports to multiple agencies on both sides
of the border. This requires the Maquilas to engage in a laborious
and often duplicative act of transposing information from databases
onto paper forms. For them, transmitting the data electronically
could, in theory, lead to significant reductions in regulatory
burden, possibly condensing the data required on many paper forms
into a single electronic format. Converting from paper to
electronic submissions of data involves much more than a technical
fix—to maximize its value, senders and recipients must reassess
their existing data management practices and data requirements.
This fall the EPA initiated a project to electronically transmit
hazardous waste data from the Maquilas to EPA Regional, State, and
other Federal Agencies. The broad objective is to lay the
groundwork for a phased implementation of electronically-
transmitted hazardous waste compliance data to government agencies
and appropriate industries. Using the experience gained from other
EDI pilots, the EPA is adapting EDI standards (referred to as ANSI
ASC X12) used by industry to the unique requirements of the
Maquilas. The EPA has formed a working group of industry, State,
and U.S. and Mexican government representatives to orchestrate the
project. Presented at the meeting will be the status of the
project, the lessons learned thus far, and the potential value of
EDI with respect to other environmental data requirements.
-------
TUESDAY MARCH 8, 1994
1:15 - 2:30 pm
-Statistical Research Methodology
George Flatman—ORD
Environmental Monitoring: New Answers For Old Questions
Often statistics is used and taught as if it were a dead language
like Latin which first killed the Romans and is now killing the
students. However, statistics is alive and well and adding new
methods and algorithms. In the last few years, spatial statistics
has rewritten the answers to the ubiquitous questions of (1) how to
take "representative" samples of assured quality, (2) how to
optimize sampling design (number of samples), and (3) how to make
data analysis understandable to decision makers. The cause of the
change is "spatial correlation", which is a technical term for the
common sense fact that environmental samples taken close together
in space are both apt to be high because they come from the same
plume area or both low because they come from the same background
area. Varying together is correlation. This talk will summarize
the meanings of: (1) "correct sample" from Gy's Theory for
determining sample mass in heterogeneous media for Quality
Assurance, (2) additional sampling optimization rules—equal
probability and equal spacing of samples, and (3) data analysis to
measure false positives, false negatives, and power for the
decision makers. Statistics did not kill the ancient statisticians
and has the potential to make the life of the environmental
scientist or manager a lot easier and more productive (accurate).
Environmental Monitoring (Sampling From a Correlated Random
Field): How Many Samples?
George T. Flatman will present a unified sampling strategy,
discussing the need for and inter-relationships among classical
random variable sampling, Gy's Theory of sampling, and
geostatistical sampling for environmental monitoring. Evan J.
Englund will present "SAMPLAN", or how to answer "how many samples"
for one and two stage spatial sampling campaigns using spatial
(conditional) simulation.
-------
TUESDAY MARCH 8, 1994
2:45 - 4:00 pm
-How Widespread is Our Influence as
Statisticians?
Elizabeth Margosches, John Schwemberger,
Karen Hogan and Margaret Conomos
Karen Hogan—HEB/HERD/OPPTS
Health-based Standards for Lead
OPPTS has been charged with developing health-based standards for
lead in dust and soil under Title X, Section 403, of the
Residential Lead-based Paint Hazard Reduction Act of 1992. This
development is being coordinated with work going on in the Office
of Solid Waste and Emergency Response, and follows on work from the
Office of Ground Water and Drinking Water. Approaches using a
pharmacokinetic model based on clinical (human) data and
statistical models of epidemiologic studies are outlined. In
particular, issues concerning extrapolation from these constructs
to national distributions of potentially exposed children are
discussed, including the viability of assessing the effect of
standard setting at the household level.
John G. Schwemberger and Benjamin S. Lim—OPPTS, Bruce E. Buxton,
Steven W. Rust, and John G. Kinateder—Battelle, Frederick G.
Dewalt and Paul Constant, Midwest Research Institute
Does Lead Paint Abatement Work?
Lead paint abatement of a residence is an expensive and disruptive
undertaking. An opportunity to evaluate lead paint abatement arose
after lead paint abatements had been completed on approximately 50
houses in Denver, Colorado. In addition, houses with a low
incidence of lead-based paint were available to comprise a control
group. Levels of lead in the dust and soil of the abated and
control houses were measured after normal occupancy was
established. Dust samples were collected from window sills and
window channels (the channel is generally the part of the window
where the sash rests when the window is closed), from air ducts and
floors, and from interior and exterior entryways. Soil samples
were collected near entryways, near the foundation, and at the
property boundary.
Lead in dust was measured both as a percentage of collected house
dust and as an amount of lead per unit of area. The amount of lead
per unit area was dependent on the amount of dust collected from
the area. Therefore, the percentage of lead in the house dust,
usually called the lead concentration, was used as the primary
means of comparing abated houses to control houses. Lead in soil
-------
was measured as percentage of the collected soil. Hence soil lead
was also measured as a lead concentration.
The geometric means of the lead concentrations were statistically
equivalent for the abated and the control houses for a number of
sampled areas. Differences were, with one exception, attributed to
unpainted areas of the house or property (air ducts, exterior soil)
that were not abated. The lone exception was window sills. Lead
concentrations on window sills were significantly higher in abated
houses than in control houses. This result is an anomaly; mean
lead concentrations in window channels were similar for the two
types of houses.
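
The comparison above rests on geometric means of lead concentrations. As a minimal sketch, the geometric mean is the exponential of the mean log concentration, so comparing geometric means amounts to comparing means on the log scale. The data values and function names below are hypothetical, and the abstract does not say which statistical test was used to judge equivalence.

```python
from math import exp, log
from statistics import mean

def geometric_mean(values):
    """Geometric mean: exponentiate the mean of the logs (values must be > 0)."""
    return exp(mean([log(v) for v in values]))

def geometric_mean_ratio(abated, control):
    """Ratio of geometric means; a value near 1.0 means the groups agree."""
    return geometric_mean(abated) / geometric_mean(control)

# Hypothetical lead concentrations (percent of collected dust), illustration only.
abated_sills = [0.10, 0.40, 0.20]
control_sills = [0.05, 0.20, 0.10]
ratio = geometric_mean_ratio(abated_sills, control_sills)
```

A ratio well above 1.0 for a sampled area would correspond to the window sill result described above, where abated houses showed higher concentrations than control houses.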
Lead paint abatement worked in the sense that mean lead
concentrations at abated houses were similar to those at the
control houses, and non-abatement accounted for all but one of the
cases of differences. A surprising result was the level of lead
concentrations in the window channels. The geometric mean
concentrations were high for both control and abated houses. Hence
it is possible that window channels remain a source of lead, even
after lead paint abatement and even in houses with a low incidence
of lead-based paint.
This study was a follow-up of a study done by the Department of
Housing and Urban Development (HUD). HUD conducted the abatements
and examined costs, worker safety, and clean-up. The EPA study was
designed to determine if the abatements worked two years after the
abatements had been completed. It is estimated that approximately
57 million homes in the United States contain some lead-based paint
at or above the statutory definition of 1.0 milligrams per square
centimeter. The 57 million homes with lead-based paint have an
average of 580 square feet of interior surface and an average of
900 square feet of exterior surface covered with lead-based paint.
Decisions regarding whether abatement works have the potential to
affect many of the owners of these 57 million homes, who will have
to decide how to deal with hazards from lead-based paint.
Margaret Conomos, John Michael, William M. Devlin, Stephen K.
Dietz—OPPTS
Design for the Environment—General Services Administration
(GSA) Field Office Cleaning Systems Survey
The Office of Pollution Prevention and Toxics (OPPT) of EPA has
been assisting the Public Buildings Service (PBS) of GSA in an
evaluation of cleaning products. The effectiveness ratings and
perceived health effects of 13 "green" products were compared to 6
other products currently used by GSA. The "green" products are so
named because of the reduced packaging requirements and the fact
that manufacturers have said that their products are non-toxic to
the environment and users.
EPA and its contractor, Westat, helped GSA (EPA's client) develop
and administer a series of four questionnaires to 45 GSA cleaning
personnel over a period of 6 months. Quantitative data were
collected from the respondents about their experiences with the
various products. In addition, qualitative information was
collected from respondents about their experience with various
products. Methods were developed to track each respondent across
cycles of product testing while still preserving respondent
anonymity. A stagewise approach to Analysis of Variance was used
to analyze product effectiveness, and cope with missing data.
-------
TUESDAY MARCH 8, 1994
2:45 - 4:00 pm
-Using Uncertainty Analysis in Environmental
Decision Making
Barnes Johnson
Richard Gilbert—Battelle Northwest
Environmental Decision Making: What is To Be Gained From
Quantitative Estimates of Risk Uncertainty?
Quantitative risk predictions are uncertain because of incomplete
information and knowledge about data, models, model parameters, and
the true state of nature. This uncertainty may be assessed using
qualitative or quantitative (Monte Carlo simulation) methods. The
rigor with which the uncertainty of quantitative risk estimates
should be assessed depends on many factors including 1) the size of
the estimated risk, 2) the consequences of making wrong decisions
because risk uncertainty is not adequately assessed, 3) the
availability or obtainability of information needed to quantify the
uncertainty of risk estimates to the desired degree, and 4) whether
there is a need to identify key model components and parameters
that should be studied to reduce risk uncertainty and decision
errors. Quantitative uncertainty analyses are typically more
expensive and may be more difficult to conduct, understand and
interpret than qualitative analyses. Moreover, quantitative
uncertainty analyses of risk are themselves uncertain because of
uncertainties about which probability density functions are most
appropriate to model the uncertainty of risk parameter values. For
these reasons, the advantages as well as limitations of
quantitative uncertainty analyses must be clearly understood by
decision makers and stakeholders before the method is used. We may
ask "How does the decision maker benefit by having available a
quantitative estimate of risk uncertainty?", "How do we decide when
to use quantitative uncertainty analyses?", and "Is the additional
information obtained by the quantitative uncertainty analysis worth
the cost?". These questions will be addressed and illustrated in
the context of 1) making decisions about whether additional
environmental samples or information are needed by the stakeholders
for making cleanup decisions based on risk, 2) eliciting expert
judgements concerning the uncertainty of risk parameter values in
the absence of site-specific information, 3) developing
computational tools for using uncertainty analyses to design more
efficient sampling strategies at sites where risk assessments are
needed, and 4) the Data Quality Objectives process for developing
sampling designs to meet uncertainty requirements.
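A minimal sketch of a quantitative (Monte Carlo) uncertainty analysis of the kind discussed above follows. The risk model, the input distributions, and every name in the code are hypothetical choices made for illustration; none of them come from the talk. The last step is a crude way of asking which input drives the uncertainty, in the spirit of factor 4) above.

```python
import math
import random
from statistics import pvariance

def sample_inputs(rng):
    """Draw one set of uncertain inputs (all distributions are hypothetical)."""
    return {
        "conc": rng.lognormvariate(math.log(2.0), 0.8),      # mg/L in water
        "intake": rng.lognormvariate(math.log(1.4), 0.3),    # L/day
        "potency": rng.lognormvariate(math.log(1e-3), 0.5),  # risk per mg/kg-day
    }

def risk(x, body_weight=70.0):
    """Toy risk model: dose (mg/kg-day) times potency."""
    return x["conc"] * x["intake"] / body_weight * x["potency"]

def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted list, 0 < p <= 100."""
    k = max(0, math.ceil(p / 100.0 * len(sorted_values)) - 1)
    return sorted_values[k]

def simulate(n=5000, seed=1, fixed=None):
    """Monte Carlo risk sample; `fixed` optionally pins inputs to given values."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = sample_inputs(rng)
        if fixed:
            x.update(fixed)
        out.append(risk(x))
    return sorted(out)

risks = simulate()
p05, p50, p95 = (percentile(risks, p) for p in (5, 50, 95))

# Crude importance check: how much does the variance of log risk shrink
# when one input is pinned at its median?
base_var = pvariance([math.log(r) for r in risks])
pinned = simulate(fixed={"conc": 2.0})
pinned_var = pvariance([math.log(r) for r in pinned])
print(p05, p50, p95, 1 - pinned_var / base_var)
```

The spread between the 5th and 95th percentiles is the quantitative statement of risk uncertainty; whether it is worth its cost is the question the abstract raises.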
-------
Tim Barry—OPPE
Two-dimensional Monte Carlo Analysis.
In quantitative risk analysis, it is important to distinguish
between naturally varying quantities (stochastic variables) and
uncertain quantities (i.e., quantities for which our uncertainty is
due to a lack of knowledge about their true but unknown value
either in a statistical sense regarding inaccuracy or imprecision
in parameter estimates or in a scientific sense regarding missing
or ambiguous information or gaps in scientific theory). A two-
dimensional (nested) Monte Carlo analysis was conducted to assess
the uncertainty and variability of cancer risks attributable to
radon in drinking water. Exposure pathways included the inhalation
of volatilized radon gas and its daughter products and the
ingestion of waterborne radon. This presentation will focus on the
underlying theory for two dimensional Monte Carlo analysis, using
the radon case study for illustration of the methodology.
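The nesting described above can be sketched as two loops: an outer loop over uncertain quantities and an inner loop over quantities that vary between individuals. The distributions, parameter values, and names below are hypothetical placeholders and are not taken from the radon case study.

```python
import math
import random

def nested_monte_carlo(n_outer=200, n_inner=500, seed=7):
    """Return, for each outer (uncertainty) draw, the mean and 95th-percentile
    individual risk over the inner (variability) draws."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_outer):
        # Outer loop: quantities that are fixed but imperfectly known.
        unit_risk = rng.lognormvariate(math.log(1e-6), 0.6)  # risk per unit exposure
        gm_conc = rng.lognormvariate(math.log(200.0), 0.2)   # population geometric mean
        risks = []
        for _ in range(n_inner):
            # Inner loop: quantities that genuinely differ between people.
            conc = rng.lognormvariate(math.log(gm_conc), 1.0)  # water concentration
            water_use = rng.uniform(0.5, 2.0)                  # relative water use
            risks.append(unit_risk * conc * water_use)
        risks.sort()
        p95 = risks[math.ceil(0.95 * n_inner) - 1]
        results.append((sum(risks) / n_inner, p95))
    return results

results = nested_monte_carlo()
p95_values = sorted(p95 for _, p95 in results)
# Uncertainty band (5th to 95th percentile across outer draws) for the
# 95th-percentile individual: a statement about variability and uncertainty.
lo = p95_values[math.ceil(0.05 * len(p95_values)) - 1]
hi = p95_values[math.ceil(0.95 * len(p95_values)) - 1]
print(lo, hi)
```

Keeping the two loops separate is what lets the output say "the risk to the 95th-percentile person lies between lo and hi" instead of mixing both sources of spread into one distribution.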
-------
TUESDAY MARCH 8, 1994
2:45 - 4:00 pm
-Environmental Statistics Issues in Pesticides
Ruth Allen
-------
ENVIRONMENTAL STATISTICS ISSUES IN PESTICIDES
Ruth H. Allen, Ph. D., M.P.H.
This presentation is designed to highlight generic environmental
statistics issues for the management of pesticides. There are
three parts. Audience questions will be taken after each part
for up to five minutes to maximize group participation.
Part 1. Report of the Statistical Needs Assessment-Phase 1 by
Ruth H. Allen, Ph. D., M.P.H. (10 min.)
This part is an overview of the results of the first phase of the
OPP statistical needs assessment. It highlights the common
environmental statistics issues across divisions within EPA.
Brief mention is given of work in progress with a list of experts
in different subject areas. Environmental statistics problems
and solutions are summarized for Phase 2. Environmental
statistics is looked at in the context of current streamlining
activities.
Part 2. Breast Cancer and Pesticides in the Environment: A Case
Example of the Usefulness of Environmental Statistics by Ruth H.
Allen, Ph. D., Amy Rispin, Ph.D. and Victor Miller, Dip. Pharm.
(30 min)
This part highlights the important role of environmental
statistics in the assessment of the risks to human health and
wildlife. Information regarding possible mechanisms of action
from chemical agents, diet and other lifestyle related risk
factors are presented. It also reviews and summarizes the recent
hypothesis that organochlorine pesticides may be linked to
increases in the rates of human breast cancer worldwide. Vital
statistics and epidemiology findings from national and
international sources are included with special reference to
selected organochlorine compounds. Lessons learned from pesticide
management and regulatory activity over the last few decades are
also included.
Part 3. Certified/Commercial Pesticide Applicator Survey C/CPAS
Discussion of Survey Design by Al Goozner (20 min)
This part describes EPA participation in a new data collection
program under mandate from the 1990 Food, Agriculture,
Conservation and Trade Act (PL 101-624). The purpose of the
survey is to collect pesticide use information from
certified/commercial pest applicators. This presentation will
review the congressional mandate, discuss strategies to comply
with OMB information collection reporting requirements, highlight
methods development issues, and include reference to Cochran's
Sampling Techniques. Participants will see how the stage was set
for an on-going data collection program.
-------
TUESDAY MARCH 8, 1994
6:00-7:30 pm Poster Session
Bill Smith
Edwin J. Coleman; Coleman/Mores Associates Ltd.
Data: Where It Is and How to Get It
We see this book as our modest contribution to America's global
competitiveness and business productivity. We feel strongly that
the time has come for Americans to become more data literate. That
is the primary reason we have prepared this hands-on, practical
guide to useful business, environmental and energy data. This book
meets an important need. For years, we have seen professionals in
all walks of life make important business and political decisions
without using the best data available. We feel we have remedied
these problems in this one volume. Half of the book is
instructional. It is an introductory guide to data, where it is
produced, how it is prepared, the tricks to using it and the jargon
that seems to make it difficult to use. We call this section the
DATAPRIMER because it explains everything anyone needs to know to
feel comfortable with data and to use it effectively. We have made
every effort to be clear and practical, without sacrificing
accuracy.
The other half of the book contains three practical and well
indexed directories to business, environmental and energy data.
This is the DATAPHONER section of the book and it is the heart of
the volume. It contains the names and areas of specialization of
over 2,500 individuals in federal, state and local governments, and
in private firms who can answer specific questions about nearly
every business-related data issue of interest to busy
professionals.
Brand Niemann; OPPE
Environmental Statistics and Information ONLINE
The Clinton Administration's major initiative to develop the
National Information Infrastructure (NII) is prompting federal
Government agencies to explore new solutions to "ensure that the
immense reservoir of government information is available to the
public easily and equitably." Of course before government
information, especially environmental data, statistics, and
indicators, can be made more available, we need to locate it and
add value to it so our own agency can first use it. ESID's
-------
10th Annual EPA Statistics Conference
March 7-10, 1994
Norfolk, Virginia
Poster Session Presentation
March 9th, 6-7:30 p.m.
Dissolved Oxygen in the Chesapeake Bay:
Exploratory Data Analysis to Support Monitoring Frequency Decisions
Dissolved oxygen is critical to living resources in the Chesapeake
Bay surface waters. Too much of the nutrients nitrogen and
phosphorus added to the Bay subtracts oxygen and, at times, life
itself from the waters. The Chesapeake Bay Monitoring Program,
begun in 1984, is a bay-wide EPA/state cooperative effort
comprising over 165 stations. Nineteen physical, chemical, and
biological characteristics are routinely monitored 20 times a year
in the mainstem Bay and many tributaries. This "point" monitoring
data at the 49 mainstem sites (see map) along with special
continuous monitoring data at fewer sites and limited time periods
for dissolved oxygen are the focus of this EDA. In addition, the
mainstem point monitoring data which has been interpolated
horizontally and vertically within the Bay by other researchers was
also used. In this way the effects of sampling frequency (20 versus
12 per year) and spatial interpolation on dissolved oxygen
statistics could be determined, especially in relation to the new
suggested water quality "targets" for dissolved oxygen levels in
the Bay. This EDA was conducted as both a graduate class project at
George Mason University and for the Chesapeake Bay Monitoring
Subcommittee.
The EDA was structured into three parts: (1) basic explorations;
(2) additional issues; and (3) advanced explorations following the
principles of "not stopping with the first result" and asking the
five basic questions about the data (who, what, when, where, and
why). Cleveland's approach (1993) of progressing from univariate to
bivariate to multivariate data was also followed. The EDA tool is
S-PLUS running under the Microsoft Windows 3.1 operating system.
The basic explorations feature cumulative distribution functions of
rank order frequency of dissolved oxygen concentrations, "notched"
boxplots, and the simple scatterplot matrix. The advanced
explorations feature q-q plots, seasonal decomposition of time
series, advanced scatter plot matrices, coplots, and contour plots.
An effort was made to develop generic S-PLUS functions so it was
easy to change the site data and plot labels.
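The idea of a generic function applied to records sampled at different frequencies can be sketched as follows. The study used S-PLUS; this Python sketch, its synthetic data, the 5.0 mg/L target, and all function names are assumptions made for illustration only.

```python
import math
import random

def synthetic_do_series(n_per_year=20, seed=3):
    """Hypothetical one-year dissolved oxygen record (mg/L) with a summer dip."""
    rng = random.Random(seed)
    series = []
    for i in range(n_per_year):
        t = (i + 0.5) / n_per_year                       # fraction of the year
        seasonal = 7.0 + 3.0 * math.cos(2 * math.pi * (t - 0.05))
        series.append(max(0.0, seasonal + rng.gauss(0.0, 0.8)))
    return series

def ecdf(values):
    """Empirical CDF as sorted (value, cumulative fraction) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]

def fraction_below(values, target):
    """Share of observations below a water quality target."""
    return sum(v < target for v in values) / len(values)

def thin(values, keep):
    """Keep `keep` roughly evenly spaced observations from a longer record."""
    n = len(values)
    return [values[(k * n) // keep] for k in range(keep)]

full = synthetic_do_series(20)      # 20 cruises per year
reduced = thin(full, 12)            # 12 cruises per year
target = 5.0                        # hypothetical target, mg/L
print(fraction_below(full, target), fraction_below(reduced, target))
print(ecdf(reduced)[:3])
```

Comparing the two fractions (or the two empirical CDFs) for many stations is one simple way to see how much a statistic moves when sampling drops from 20 to 12 times a year.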
The results show how EDA can improve visualizations of complex
databases, support decisions on monitoring frequency and
development of environmental indicators, and provide comparisons of
results to environmental goals. In addition, the presentation shows
how the S-PLUS results can be "cut and pasted" through the Windows
Clipboard into Folio Views infobases with explanatory text and
"data stories" for electronic distribution to a broad audience.
-------
Environmental Statistics and Information ONLINE is a program of
products designed to locate and add value to environmental data and
information, using an information science approach with
state-of-the-art tools for envisioning, visualizing, structuring,
and distributing.
You are invited to see the following:
(1) the latest version of the Guide to Selected National
Environmental Statistics in the U.S. Government as a hypertext-
hyperlinked electronic book in Folio Views for both DOS and Windows
and how to access it on the Internet;
(2) the Interagency PC Global Change Data and Information
System and the Intergovernmental Master Directory of Water Quality
and Ancillary Data infobases that provide access to key reports,
statistics, and metadata and links to databases on CD-ROMs;
(3) the Source book and Syllabus of Visualizations of
Environmental Databases with S-PLUS for EPA Managers for key
environmental problem areas like global change, acid deposition,
superfund environmental equity, Chesapeake Bay water quality, and
EMAP resources monitoring; and
(4) custom CD-ROMs with agency databases, documentation, and
analyses compiled and written for the Office of Water, Chesapeake
Bay Program Office, Region III/MAHA Program, etc.
-------
Bernard Most; ManTech
Defensible Data Management for Chronic Studies
Using two-year studies investigating the toxicity of byproducts of
water purification, we display a data management flow which
emphasizes good data management techniques providing an audit trail
from "lab data" (on a PC) to analysis datasets (SAS datasets on a
mainframe). In addition, these techniques have the ability to
provide feedback to the lab workers (data collectors) which may be
beneficial in maintaining high standards of adherence to
experimental and data recording protocols.
-------
Statistical Software Demonstrations
-Minitab; Maureen McCullen
-S+; Tom Christie
-SPSS; Bill Haffey
-------
WEDNESDAY MARCH 9, 1994
8:45-4:00 Demonstrations of S Plus with Environmental Data
Sets
Brand Niemann et al
Title: Visualizing Data with William S. Cleveland and S-PLUS
Part 1: William S. Cleveland, 8:45 - 10:15 am
William S. Cleveland is a leading researcher in the analysis of
statistical data. His interests have ranged from the theoretical to
the applied, starting with an A.B. in math at Princeton, then on to
a Ph.D. in statistics at Yale, and finally a position in the
Mathematics Research Center at AT&T Bell Laboratories. Today he
concentrates on research in statistical methods, data
visualization, and visual perception. His writings, which include
three books and numerous journal articles, have enjoyed a large
audience. Many of the visualization methods that he developed and
applied in his writings are widely used throughout the scientific,
engineering, and business communities.
Dr. Cleveland's new book, Visualizing Data, is about visualization
tools and a philosophy of data analysis that stresses a penetrating
look at the structure of data. There are graphical tools such as
coplots, brushing, and banking to 45 degrees. There are fitting
tools such as loess and bisquare. The book conveys the role of
visualization in drawing conclusions from data and its
relationships to classical statistical methods.
The book is organized around applications of the tools to data sets
from scientific studies. This shows how each tool is used, and the
class of problems it solves. It also reveals the power of
visualization; for many of the applications, the tools of the book
reveal missed effects and errors in judgment in the original
analyses. And the applications convey the excitement of discovery
that visualization brings to data analysis.
Dr. Cleveland's new book, Visualizing Data (1993) and the revised
printing of his The Elements of Graphing Data (1985) are available
from Hobart Press, Lisa McKittrick, Publisher. The data sets used
in Visualizing Data are also available by electronic mail from
Hobart Press.
-------
Title: Visualizing Data with William S. Cleveland and S-PLUS (cont.)
Part 2: S-PLUS Mini-Class, 10:30 - 11:30 am
S-PLUS is a state-of-the-art, interactive computing environment
which provides both a full-featured graphical data analysis system
and an object-oriented language. The flexible S-PLUS system can be
used for exploratory data analysis, graphics, statistics, and
mathematical computing. S-PLUS can be used as an application
package or as a development environment for custom data analysis
and graphics applications. S-PLUS is the commercial version and a
superset of the original S language from AT&T Bell Laboratories
available from Stat-Sci, Inc. S and S-PLUS are at the leading edge
of statistical research and new developments usually appear sooner
than in other statistical software packages. There are both an
electronic mailing list for people using S where you can share
experiences with other S and S-PLUS users and an archive server,
StatLib, for user-contributed S functions and mailing list
discussions on the Internet.
S-PLUS is a large system, with over 1400 built-in functions and
dozens of additional functions stored in included libraries. S-PLUS
runs under the DOS, Windows 3.1 and UNIX operating systems. An S+
Interface is available and S-PLUS for ARC/INFO and S-PLUS for
Remote Sensed Data Analysis are in advanced development. Stat-Sci
recommends the product Data Junction to convert your data files to
and from dozens of popular databases, spreadsheets, and other
applications for use with S-PLUS. S-PLUS requires a 386 or 486
based machine with a math co-processor, MS Windows 3.1, DOS 3.0 or
higher, 8MB of RAM and 40MB of hard disk space.
Stat-Sci offers three formal training classes with certified
teachers, namely: Introduction, Advanced Topics, and Statistical
Models. The mini-class and syllabus provided at the Statistics
Conference are in no way a replacement for taking the formal
training and reading the documentation. The three basic lessons to
be covered in this session are: (1) Overview of S-PLUS; (2) Getting
Your Data Into S-PLUS; and (3) Learning from Selected Applications.
Additional selected applications will be shown at the Poster
Session. Selected datasets and S-PLUS functions will be made
available on diskette to attendees after the Conference.
-------
Title: Visualizing Data with William S. Cleveland and S-PLUS (cont.)
Part 3: S-PLUS Applications, 1:30 pm to whenever
Each demonstrator will have 10-20 minutes depending on the number
of demonstrators. Each demonstrator should provide (1) a problem
background, (2) S-PLUS script explanation, and (3) a computer
and/or transparency demonstration of the results. S-PLUS for
Windows 3.1 running on several 486 PCs and computer screen
projection equipment will be available for installation of files
and demonstrations at the Poster Session and presentations at this
session. A compilation of the presentation materials and files will
be made available to the participants after the conference if
desired.
Confirmed S-PLUS Application Demonstrators:
Rick Moll, Statistics Canada
Dan Carr, George Mason University
Student of Neerchal Nagaraj, University of Maryland
Esperanza Renard U.S. EPA/Edison, New Jersey
Brand Niemann, U.S. EPA/DC (if needed to fill time)
Potential S-PLUS Application Demonstrators:
Peter Broomfield, NISS/RTP
Larry Cox, U.S. EPA/RTP
Tony Olsen, U.S. EPA/Corvallis
Robert Seila, U.S. EPA/RTP
-------
S-PLUS Applications Abstracts
Dr. Rick Moll
Statistics Canada
National Accounts and Environment Division
R.H. Coats Bldg., 21st Floor
Ottawa, ONTARIO
K1A 0T6
613-951-3741
Data Visualization and Calibration of a Dynamic Forest Resource
Account Using S-PLUS
In this presentation the implementation of a simulation framework
designed to reconstruct a large scale area based forest over the
historical period 1953-1986 is described using S-PLUS. We use the
1953 forest inventory for Ontario of forest areas and volume
characterized by 180 single year age classes, 3 covertypes and 24
districts. Two productive forestland types are considered: stocked
and non-stocked forestland. Growth is represented by the inter-
temporal flow of forest area from younger to older age classes.
Harvesting, mortality, natural regeneration and planting are
represented as separate processes. Endemic losses due to pest
infestations are absorbed in aggregate volume-at-age curves which
represent the net growth process of the forest over time.
Catastrophic losses due to fire are represented as forest area
losses. Forestland inventory is updated for fire by decreasing it
according to historical fire rates. The forest is cut according to
historical softwood and hardwood production volumes by district. In
each year, the available roundwood volume of the forest for
softwood and hardwood is calculated. Then we determine how much
forest area should be harvested by calculating a covertype and age-
specific harvest ratio for volume removed. Stocked forestland is
updated for natural and artificial regeneration of both recently
harvested area and non-stocked forestland. We calculate the total
growing stock volume changes over time due to harvesting, fire, and
natural causes based on average figures of volume per area. The
model is validated by comparing model generated forestland with the
1986 inventory. In the development of this framework we have
learned some important principles for modeling large-scale systems.
First, the model structures need to be generic. That is to say, the
complexity of the model is managed by making the data operations
set-driven. This way, it is not necessary to write down separate
equations for, say, each age-class; rather, a set of age-classes
is created and a single equation is defined over the set.
This generic structure allows evolutionary changes to be made
easily. Second, a requirement for the calibration procedure is that
the model's data objects, which are multi-dimensional arrays,
should be able to be displayed interactively. S-PLUS provided a
programming environment which satisfied these criteria.
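The set-driven structure described above can be sketched as one update rule applied over every (covertype, district) key, with growth as a shift of area between age classes. This is a heavily simplified illustration and not the author's model: it works in area only, assumes burned and harvested area regenerates immediately (so there is no non-stocked forestland), and every name and number is hypothetical.

```python
def step_year(area_by_age, fire_rate, harvest_area):
    """Advance one stand table (areas indexed by age class) by one year.

    Returns (new_area_by_age, area_burned, area_harvested)."""
    # Fire: remove the same fraction from every age class.
    burned = [a * fire_rate for a in area_by_age]
    remaining = [a - b for a, b in zip(area_by_age, burned)]

    # Harvest: take area from the oldest classes first.
    to_cut = harvest_area
    for age in range(len(remaining) - 1, -1, -1):
        cut = min(remaining[age], to_cut)
        remaining[age] -= cut
        to_cut -= cut
        if to_cut <= 0:
            break
    harvested = harvest_area - to_cut

    # Growth: shift every class one year older; the last class accumulates.
    aged = [0.0] + remaining[:-1]
    aged[-1] += remaining[-1]

    # Regeneration (simplifying assumption): disturbed area re-enters at age 0.
    aged[0] += sum(burned) + harvested
    return aged, sum(burned), harvested

def simulate(forest, fire_rate, harvest_by_key, years):
    """Apply the same one-year rule to every (covertype, district) key."""
    for _ in range(years):
        forest = {
            key: step_year(areas, fire_rate, harvest_by_key.get(key, 0.0))[0]
            for key, areas in forest.items()
        }
    return forest

# Hypothetical inventory: two keys, five age classes each (area in hectares).
forest_1953 = {("softwood", "D1"): [100.0, 80.0, 60.0, 40.0, 20.0],
               ("hardwood", "D1"): [50.0, 50.0, 50.0, 50.0, 50.0]}
forest_later = simulate(forest_1953, fire_rate=0.01,
                        harvest_by_key={("softwood", "D1"): 15.0}, years=10)
print(forest_later)
```

Because the rule is written once over the set of age classes, adding a covertype or district only means adding a key to the inventory, which is the kind of evolutionary change the abstract refers to.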
-------
Dr. Dan B. Carr
Center for Computational Statistics
George Mason University
Fairfax, Virginia
From Tables To Row-Labeled Plots
This application session describes two user-written S-PLUS
functions for producing row-labeled plots: dot plots, horizontal
bar plots, and horizontal distributional summary plots such as the
boxplot. Row-plots provide graphical alternatives to much of the
information that EPA publishes in tabular form. This session
provides design guidance to facilitate the conversion of moderate
sized tables into elegant plots. Topics covered include grid
options such as white lines on a grey background, symbol options
for factor levels or distributional summaries, multiple factor
layout choices, factor level sorting for enhancing one-, two- and
three-way plots, and plot labeling to provide context. Examples
emphasize TRI and EMAP summaries.
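To make the table-to-plot conversion concrete, here is a small text-only sketch of a sorted, row-labeled dot plot. It is not one of the S-PLUS functions presented in the session; the function name, the character-based "grid", and the data are hypothetical.

```python
def row_dot_plot(table, width=40):
    """Return text lines for a row-labeled dot plot, largest value first."""
    rows = sorted(table.items(), key=lambda kv: kv[1], reverse=True)
    values = [v for _, v in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0                  # avoid dividing by zero
    label_width = max(len(label) for label, _ in rows)
    lines = []
    for label, value in rows:
        pos = round((value - lo) / span * (width - 1))
        track = ["."] * width                # light grid line for the row
        track[pos] = "o"                     # plotting symbol
        lines.append(f"{label:<{label_width}} |{''.join(track)}| {value:g}")
    return lines

# Hypothetical releases by industry group (thousand pounds), as in a small table.
releases = {"Chemicals": 820.0, "Primary metals": 410.0, "Paper": 265.0,
            "Plastics": 190.0, "Petroleum": 95.0}
for line in row_dot_plot(releases):
    print(line)
```

Sorting the rows by value, instead of alphabetically as tables often are, is what lets the eye read the ranking and the gaps between rows at a glance.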
Production of Choropleth Maps and Hexagon Mosaic Maps
This application session reviews S-PLUS command files that produce
choropleth maps and hexagon mosaic maps. The review describes
geographic data structures, data set structure, smoothing using
lowess, display of residuals and construction of legends. Examples
are similar to those published in two 1993 Statistical Computing &
Graphics newsletter articles and emphasize cancer mortality rates
and trends. However the methods can be readily adapted to other
contexts. Selected command files and postscript examples can be
obtained in advance via anonymous ftp to galaxy.gmu.edu and are
stored under /pub/submissions/eda/maps.
-------
Gina Papush
Sanjoy V
Arun Satyanarayana
Mathematics and Statistics Department
University of Maryland Baltimore County
Spatial Statistical Methods in S-PLUS
Spatial statistical methods are used in environmental data analysis
to take into account the spatial nature of data. In addition to the
usual tools of exploratory data analysis (EDA) such as scatterplots
and stem-and-leaf diagrams, some special EDA tools are used in
spatial data analysis. S-PLUS functions providing spatial summaries
such as pocket plots and spatial trend removal such as median
polish will be presented. These programs are used to analyze the
Chesapeake Bay Benthic Index data.
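For readers unfamiliar with median polish, the sketch below shows the basic idea on a small gridded table: row and column medians are swept out alternately, leaving an overall level, row effects, column effects, and residuals. It is a plain Python illustration with hypothetical data, not the S-PLUS functions presented in the session.

```python
from statistics import median

def median_polish(table, max_iter=10, tol=1e-9):
    """Decompose table[i][j] into overall + row_eff[i] + col_eff[j] + resid[i][j]
    by alternately sweeping out row medians and column medians."""
    nrow, ncol = len(table), len(table[0])
    resid = [[float(x) for x in row] for row in table]
    overall = 0.0
    row_eff = [0.0] * nrow
    col_eff = [0.0] * ncol
    for _ in range(max_iter):
        # Row sweep: move each row's median into the row effect.
        row_meds = [median(r) for r in resid]
        for i in range(nrow):
            row_eff[i] += row_meds[i]
            resid[i] = [x - row_meds[i] for x in resid[i]]
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift
        # Column sweep: move each column's median into the column effect.
        col_meds = [median(resid[i][j] for i in range(nrow)) for j in range(ncol)]
        for j in range(ncol):
            col_eff[j] += col_meds[j]
            for i in range(nrow):
                resid[i][j] -= col_meds[j]
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift
        # Stop once a full pass no longer changes anything.
        if max(abs(m) for m in row_meds + col_meds) < tol:
            break
    return overall, row_eff, col_eff, resid

# Hypothetical index values on a 3 x 4 grid (rows north to south, columns west to east).
grid = [[2.1, 2.4, 3.0, 3.3],
        [2.6, 2.9, 3.4, 9.0],   # one outlying cell
        [3.1, 3.3, 4.0, 4.2]]
overall, row_eff, col_eff, resid = median_polish(grid)
print(overall, row_eff, col_eff)
```

Because medians are used in place of means, a single outlying cell ends up mostly in its own residual and distorts the fitted row and column trends much less than a mean-based fit would.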
-------
Esperanza Renard
Superfund Technology Demonstration Division
Risk Reduction Engineering Laboratory
U.S. EPA
Edison, New Jersey 08837-3679
Use of S-PLUS for Evaluation of Test Methods
for Measuring Oil Spill Dispersant Performance
Data were obtained from the evaluation of several methods to
measure dispersant performance for use in an oil spill emergency.
The results and conclusions derived from the evaluation of these
data will be presented at the International Oil Symposium sponsored
by the ASTM in October 1994. The coauthors of this paper use
different statistical programs and approaches for treating the
data. The purpose of this initial application of S-PLUS is to
become familiar with the software package and to reevaluate the
results and conclusions presented in the paper.
-------
IF NEEDED TO FILL TIME
Brand Niemann
Environmental Statistics & Information Division
U.S. EPA, 2163
Washington, D.C. 20460
Dissolved Oxygen in the Chesapeake Bay:
Exploratory Data Analysis to Support Monitoring Frequency Decisions
Dissolved oxygen is critical to living resources in the Chesapeake
Bay surface waters. Too much of the nutrients nitrogen and
phosphorus added to the Bay subtracts oxygen and, at times, life
itself from the waters. The Chesapeake Bay Monitoring Program,
begun in 1984, is a bay-wide EPA/state cooperative effort
comprising over 165 stations. Nineteen physical, chemical, and
biological characteristics are routinely monitored 20 times a year
in the mainstem Bay and many tributaries. This "point" monitoring
data at the 49 mainstem sites (see map) along with special
continuous monitoring data at fewer sites and limited time periods
for dissolved oxygen are the focus of this exploratory data
analysis (EDA). In addition, the mainstem point monitoring data
which has been interpolated horizontally and vertically within the
Bay by other researchers was also used. In this way the effects of
sampling frequency (20 versus 12 per year) and spatial
interpolation on dissolved oxygen statistics could be determined,
especially in relation to the new suggested water quality "targets"
for dissolved oxygen levels in the Bay.
The EDA was structured into three parts: (1) basic explorations;
(2) additional issues; and (3) advanced explorations following the
principles of "not stopping with the first result" and asking the
five basic questions about the data (who, what, when, where, and
why). Cleveland's approach (1993) of progressing from univariate to
bivariate to multivariate data was also followed. The EDA tool is
S-PLUS running under the Microsoft Windows 3.1 operating system.
The basic explorations feature cumulative distribution functions of
rank order frequency of dissolved oxygen concentrations, notched
boxplots, and the simple scatterplot matrix. The advanced
explorations feature q-q plots, seasonal decomposition of time
series, advanced scatter plot matrices, coplots, and contour plots.
An effort was made to develop generic S-PLUS functions so it was
easy to change the site data and plot labels and thereby make the
EDA more interactive.
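One of the basic tools named above, the notched boxplot, can be summarized numerically. The sketch below computes the notch around the median using the commonly quoted half-width of about 1.57 times the interquartile range divided by the square root of the sample size; that constant, the quartile method, and the data are assumptions of this sketch, and S-PLUS may compute notches slightly differently.

```python
import math
from statistics import median, quantiles

def notch_interval(values, c=1.57):
    """Approximate notch around the median: median +/- c * IQR / sqrt(n)."""
    q1, _, q3 = quantiles(values, n=4)
    half_width = c * (q3 - q1) / math.sqrt(len(values))
    m = median(values)
    return (m - half_width, m + half_width)

def notches_overlap(a, b):
    """True when the two notch intervals overlap, i.e. the medians are
    not clearly different by this rough graphical rule."""
    lo_a, hi_a = notch_interval(a)
    lo_b, hi_b = notch_interval(b)
    return lo_a <= hi_b and lo_b <= hi_a

# Hypothetical dissolved oxygen values (mg/L) at one station in two seasons.
spring = [8.9, 9.4, 8.1, 9.8, 8.6, 9.1, 8.8, 9.5, 9.0, 8.4]
summer = [4.2, 5.1, 3.8, 4.9, 4.4, 5.6, 3.9, 4.7, 5.0, 4.1]
print(notch_interval(spring), notch_interval(summer))
print(notches_overlap(spring, summer))
```

Wrapping the calculation in a function that takes any list of values mirrors the generic-function approach described above: changing the site only means passing different data.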
The results show how EDA can improve visualizations of complex
databases, support decisions on monitoring frequency and
development of environmental indicators, and provide comparisons of
results to environmental goals. In addition, the presentation shows
how the S-PLUS results can be "cut and pasted" through the Windows
Clipboard into Folio Views infobases with explanatory text and
"data stories" for electronic distribution to a broad audience.
-------
Potential S-PLUS Application Demonstrators:
Peter Broomfield
National Institute of Statistical Sciences
Research Triangle Park, NC 27709-4162
Accounting for Meteorological Effects in Measuring Urban Ozone
Levels and Trends
Surface ozone levels are determined by the strengths of sources and
precursor emissions, and by the meteorological conditions. Observed
ozone concentrations are valuable indicators of possible health and
environmental impacts. However, they are also used to monitor
changes and trends in the sources of ozone and of its precursors,
and for this purpose the influence of meteorological variables is
a confounding factor. This report describes a study of ozone
concentrations and meteorology in the Chicago area. The data are
described using a variety of exploratory methods, including median
polish and principal components analysis. The key relationships
observed in these analyses are then used to construct a model
relating ozone to meteorology. The model can be used to estimate
that part of the trend in ozone levels that cannot be accounted for
by trends in meteorology, and to "adjust" observed ozone
concentrations for anomalous weather conditions. The model is
estimated by nonlinear least squares. Its goodness of fit is
assessed by the comparison with nonparametric regression results
(lowess).
-------
WEDNESDAY MARCH 9, 1994
8:45-10:00 Statistical Issues in Rulemaking
Henry Kahn
Helen Jacobs, Henry Kahn and Kathleen Stralka; OW
Estimates of Fish Consumption Rates in the U.S.
Estimates of fish consumption in the U.S. based on the 1989 and
1990 USDA Consumption Surveys for Individual Intake (CSFII) will be
presented. Fish consumption estimates play an important role in
a number of EPA problems. In particular, exposure estimates used
in determining water quality criteria and related standards are
based in part on the amount of fish consumed and contamination
levels in the fish. This presentation will provide fish
consumption estimates by habitat (marine, estuarine and freshwater)
and species. Changes in fish consumption during the past two
decades will also be discussed.
Henry Kahn; OW
Statistical Basis of Industrial Wastewater Control Regulations
Regulations that limit the amount of pollutants that may be
discharged by industrial facilities are known as effluent
guidelines regulations. These regulations are based on the
capability of treatment technology in specific industries and
contain numerical limitations on the levels of particular
pollutants that may be discharged in treated effluent. The
limitations are determined on the basis of statistical analysis of
chemical analytical data that characterize the performance of
treatment technology. This presentation provides a general
description of the statistical analysis of the data, which includes
model fitting and parameter estimation and adjustments to account
for different sampling periods and the occurrence of non-detect
measurements. Some examples from recent rulemakings are also
discussed.
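As one hedged illustration of model fitting with non-detect measurements, the sketch below fits a lognormal distribution to detected values, treats non-detects as a point mass at the detection limit, and reads off an upper percentile that could serve as a candidate limit. The choice of this mixture model, the 99th percentile, the data, and all names are assumptions made for illustration; the abstract does not specify the model used in the rulemakings.

```python
import math
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist()

def fit_delta_lognormal(measurements, detection_limit):
    """Fit a lognormal to detected values; treat the rest as a point mass.

    `measurements` uses None for a non-detect. Needs at least two distinct
    detected values so that sigma is positive."""
    detects = [math.log(x) for x in measurements if x is not None]
    delta = 1.0 - len(detects) / len(measurements)   # non-detect fraction
    return {"delta": delta, "mu": mean(detects), "sigma": stdev(detects),
            "dl": detection_limit}

def percentile(model, p):
    """p-th quantile (0 < p < 1) of the mixture: mass `delta` at the
    detection limit plus (1 - delta) times a lognormal."""
    delta, mu, sigma, dl = (model[k] for k in ("delta", "mu", "sigma", "dl"))
    # Probability that a detected value falls below the detection limit.
    below = (1.0 - delta) * STD_NORMAL.cdf((math.log(dl) - mu) / sigma)
    if p < below:
        return math.exp(mu + sigma * STD_NORMAL.inv_cdf(p / (1.0 - delta)))
    if p <= below + delta:
        return dl
    return math.exp(mu + sigma * STD_NORMAL.inv_cdf((p - delta) / (1.0 - delta)))

# Hypothetical daily effluent concentrations (mg/L); None marks a non-detect.
daily = [0.8, None, 1.9, 3.4, None, 1.2, 5.6, 2.3, 0.9, 4.1]
model = fit_delta_lognormal(daily, detection_limit=0.5)
print(percentile(model, 0.99))
```

An adjustment for different sampling periods, such as a limit on a monthly average, would need the distribution of an average of several such measurements and is not attempted here.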
-------
WEDNESDAY MARCH 9, 1994
8:45 - 10:00 am
John Warren and Alfred F. Haeberer—ORD
Statistics and the Agency's Mandatory Quality Assurance
Program
The EPA's Quality Assurance program has evolved over the last
twenty years from a relatively technical laboratory quality control
program to the present program that focuses on the management
processes needed to produce the appropriate data to support
specific applications. The three principal components of this
mandatory program are Planning, Implementation, and Assessment; it
is axiomatic that statistical design and inference have a great
impact on each element.
The extent of this impact will be discussed through an analysis of
the Requirements and Guidance documents recently issued by the
Quality Assurance Management staff, ORD, and by reference to the
ANSI/ASQC American National Standard, E4, "Quality Systems
Requirements for Environmental Data and Technology Programs".
The major themes of the presentation will center on where the
statistician can have maximum impact on improving the quality of
the Agency's data, and what will be demanded of statisticians by
environmental decision makers.
Copies of the principal Requirements and Guidance documents will be
made available to participants.
-------
WEDNESDAY MARCH 9, 1994
10:15-11:30 Statistical Analyses of Ozone Layer and
Composite Sampling
Bimal Sinha
Dulal K. Bhaumik (University of South Alabama)
The Ozone Layer
Some 50 km above the Earth's surface lies a veil called the ozone
layer. It shields the Earth from the ultraviolet radiation emitted
by the Sun. Some stable chlorine gases released by human
activities rise above the Earth and destroy the ozone layer.
Depletion of the ozone layer is a great threat to human
society. In this talk we will discuss the lethal effects of ozone
depletion and use a statistical analysis to estimate how
severe the depletion would be if the world continues producing
trace gases at today's rate.
Soma Sengupta (University of Maryland/Baltimore County)
Inference with Composite Sampling
A composite sample is a physical mixture of several grab samples.
The problem of inference regarding the mean of a population based
on composite sampling measurements is considered. A necessary and
sufficient condition is derived under which the estimate of the
mean based on the composite measurements is better than that based
on grab measurements. An approximate distribution of composite
sample measurements based on large samples is derived. An example
is used to illustrate this inference procedure.
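The comparison between composite and grab measurements can be sketched with a small simulation. This is a minimal illustration under assumptions that are not taken from the abstract: grab samples are independent and normally distributed, each composite is a perfect equal-proportion mixture of k grabs, measurement error is additive and independent, and both designs use the same number of laboratory measurements. The function names and default parameter values are hypothetical, and the sketch does not reproduce the necessary and sufficient condition derived in the talk.

```python
import random
import statistics


def theoretical_variances(n, k, sigma_pop, sigma_meas):
    """Variance of the estimated mean under the simple additive model."""
    var_grab = (sigma_pop ** 2 + sigma_meas ** 2) / n
    var_comp = (sigma_pop ** 2 / k + sigma_meas ** 2) / n
    return var_grab, var_comp


def simulate_variances(n=10, k=5, sigma_pop=2.0, sigma_meas=0.5,
                       mu=10.0, n_trials=5000, seed=0):
    """Monte Carlo variance of the mean estimate for each design."""
    rng = random.Random(seed)
    grab_means, comp_means = [], []
    for _ in range(n_trials):
        # Grab design: n measurements, each made on a single grab sample.
        grabs = [rng.gauss(mu, sigma_pop) + rng.gauss(0.0, sigma_meas)
                 for _ in range(n)]
        grab_means.append(statistics.fmean(grabs))
        # Composite design: n measurements, each made on a mixture of k grabs.
        comps = []
        for _ in range(n):
            mixture = statistics.fmean(
                [rng.gauss(mu, sigma_pop) for _ in range(k)])
            comps.append(mixture + rng.gauss(0.0, sigma_meas))
        comp_means.append(statistics.fmean(comps))
    return statistics.variance(grab_means), statistics.variance(comp_means)


if __name__ == "__main__":
    print("theory:    ", theoretical_variances(10, 5, 2.0, 0.5))
    print("simulation:", simulate_variances())
```

Under these assumptions the composite estimate has the smaller variance whenever k > 1, since mixing averages out sampling variability before the measurement is made. With imperfect mixing or unequal proportions that advantage can shrink, which is the kind of situation a formal condition would need to address.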
-------
WEDNESDAY MARCH 9, 1994
10:15 - 11:30 am
Small Community Information and Data Program
Mel Kollander and Susan Brunenmeister-OA
The Small Community Information and Data Program was established in
April 1992 in the Administrator's Office to provide an Agency focal
point for small community information. Since its founding, the
program has built an online 2-gigabyte mainframe database primarily
consisting of information from the 1992 Census of Governments and
the 1990 Census of Population and Housing. The session will
include two presentations about the Small Community Information and
Data Program. The first presentation will describe the mission,
objectives and activities of the program. The second presentation
will provide examples of available information from the mainframe
databank.
-------
THURSDAY MARCH 10, 1994
8:45-10:00 am Plenary session
Chapman Gleason—OPPE
An Introduction to Internet For Statisticians
It is the goal of the Clinton Administration to move the US out of
the Industrial Revolution and fully into the Information Age. What
the railroads and the Interstate Highway System did for the
Industrial Revolution, the National Information Initiative will do
for the Information Age. In this presentation I will define the
Internet, give an overview of the National Information Initiative,
describe the Government Information Locator System (GILS), and
describe how the Bureau of Environmental Statistics will
disseminate information on the Internet. In addition, the
following Internet tools will be defined and an example of their
use will be shown:
1) Bitnet and Listserv(ers) and Internet newsgroups,
2) FTP and Anonymous FTP,
3) Archie,
4) Veronica,
5) WAIS (public and commercial),
6) Gopher,
7) World Wide Web (WWW),
8) Xmosaic.
10:15-11:30 Plenary session Featured Speaker
Tom Van Zant
-------
EVALUATION FORM
TENTH ANNUAL EPA CONFERENCE ON STATISTICS
1. Overall Conference Evaluation
                                  Very Much   Some Extent   Limited Extent
Did you broaden your EPA
contacts?
Did you update your current
knowledge?
Did you find exposure to
new material?
Did you gain more agency-
wide perspective?
Were you able to exchange
technical methods?
Were you able to discuss
problems and concerns?
2. Session Evaluations
                                  Highly Relevant   Fairly Relevant   Not Very Relevant
Plenary Session Introduction
to Conference & Panel Discussion
Plenary Session
State of Pennsylvania
Environmental Statistics Research
at the National Institute for
Statistical Sciences
General Topics
Statistical Research Methodology
-------
                                  Highly Relevant   Fairly Relevant   Not Very Relevant
How Widespread is Our Influence
as Statisticians?
Using Uncertainty Analysis in
Environmental Decision Making
Environmental Statistics
Issues in Pesticides
Poster Session
Demonstrations of S Plus
with Environmental Data Sets
Statistical Issues in Rulemaking
Statistics and the Agency's
Mandatory Quality Assurance Program
Statistical Analyses of Ozone
Layer and Composite Sampling
Small Community Information
and Data Program
Plenary session An Introduction
to Internet For Statisticians
Plenary session
Featured Speaker Tom Van Zant
3. What were the greatest strengths of the conference? What aspects did you like
the most?
-------
4. What were the greatest weaknesses of the conference? What aspects and
sessions did you like the least?
5. Would you be interested in another mini-course that would introduce you to a
new development in applied statistical methodology?
Yes No Unsure
Suggestions for topics:
6. Are you planning to attend next year's Conference on Statistics?
7. Other comments
-------