THE TENTH ANNUAL

EPA CONFERENCE ON
      STATISTICS
          MARCH 7-10, 1994

          RAMADA INN NORFOLK
          NORFOLK, VIRGINIA

-------
                    PRELIMINARY PROGRAM

          TENTH ANNUAL EPA CONFERENCE ON STATISTICS

                     MARCH 7-10, 1994

                     RAMADA INN NORFOLK
                      345 GRANBY STREET
                      NORFOLK, VIRGINIA
                         804-622-6682


                    MONDAY  MARCH 7,  1994

6:00-7:30 pm    Social Hour and Cash Bar



                     TUESDAY   MARCH 8, 1994

8:45-10:00 am    Plenary Session: Panel Discussion
                 Phil Ross, OPPE; Derry Allen, OPPE;
                 Paul Wohlleben, OIRM



10:00-10:15 am   Break

10:15-11:30 am   Plenary Session
                 Policy and Guidance in Groundwater Monitoring
                       Statistics: The State of the States
                 State of Pennsylvania/Dept of Natural Resources
                       Phil Morgan and colleagues

11:30-1:00 pm    Lunch


 1:15-2:30 pm

                  -Environmental Statistics Research at the
                   National Institute for Statistical Sciences
                        Larry Cox, David Holland

                  -General Topics
                        Tim Barry, Al Goozner, Matthew Leopard

                  -Statistical Research Methodology
                        George Flatman


 2:30-2:45 pm    Break

-------
  2:45-4:00 pm
                   -How Widespread is Our Influence as
                    Statisticians?
                         Elizabeth Margosches, John Schwemberger,
                         Karen Hogan and Margaret Conomos

                   -Using Uncertainty Analysis in Environmental
                    Decision Making
                         Barnes Johnson, Tim Barry

                   -Environmental Statistics Issues in Pesticides
                         Ruth Allen

  6:00-7:30 pm    Poster Session & Statistics Software
                     Bill Smith

                      WEDNESDAY   MARCH 9, 1994


8:45-4:00        Demonstrations of S Plus with Environmental Data
                  Sets
                       Brand Niemann et al
8:45-10:00        Statistical Issues in Rulemaking
                       Henry Kahn

                  Statistics and the Agency's Mandatory Quality
                  Assurance Program
                       John Warren and Alfred F. Haeberer
10:00-10:15       Break

10:15-11:30       Statistical Analyses of Ozone Layer and
                  Composite Sampling
                       Bimal Sinha

                  Small Community Information and Data Program
                      Mel Hollander and Susan Brunenmeister

11:30-1:15 pm     Lunch


1:15-4:30 pm      Statistical Policy Advisory Committee Meeting
                     including a discussion of the role of SPAC,
                     subcommittee reports including the Ad Hoc
                     subcommittee of Nussbaum et al

-------
                   THURSDAY   MARCH 10, 1994

8:45-10:00 am     Plenary session
                  An Introduction to Internet For Statisticians
                       Chapman Gleason
10:00-10:15       Break

10:15-11:30       Plenary session  Featured Speaker
                      Tom Van Zant

11:30-Noon        Awards

-------
                ADDITIONS TO THE PROGRAM FOR THE

            TENTH ANNUAL EPA CONFERENCE ON STATISTICS
1.  The Featured Speaker on Thursday morning, 10:15 to 11:30 am, will
    be Professor Dan Carr from the Center For Computational
    Statistics at George Mason University.  The topic of his
    presentation will be "Application of Graphic Design Principles:
    Construction of Row Labeled Plots and Maps For Exploration and
    Presentation of EPA Data".
    (Unfortunately the scheduled speaker, Tom Van Zant, could not
    come.)
2.  Add to the Tuesday 2:45 to 4:00 pm session on Using Uncertainty
    Analysis in Environmental Decision Making the following
    presentation:

     Richard Gilbert—Battelle Northwest
     Environmental Decision Making:  What Is To Be Gained From
     Quantitative Estimates of Risk Uncertainty

-------
                            FINAL EXAM



            TENTH ANNUAL EPA CONFERENCE ON STATISTICS



                          MARCH 10, 1994


                 PLEASE ANSWER SIX OF YOUR CHOICE
1.  What are the five major types of evidence that were published
recently to suggest a relationship between organochlorine
pesticides and excess human breast cancer?

2.  How green is GSA?

3.  What unabated area remained high in lead content
(concentration) in Denver homes subjected to lead paint
abatement?

4.  What feature of a lead standard was identified as an
essential component in health-based standard setting?

5.  Are states capable of adapting EPA statistical regulations (in
RCRA groundwater monitoring) and administering these
regulations?

6.  What is your notion of 'background' in environmental
monitoring?

7.  Does the role of statistics in EPA and the state agencies
need to expand or retrench in scope?

8.  How do we communicate critical statistical information to
managers on the state level?

9.  What does the C/CPAS acronym stand for?

10.  Write the S Plus script for a qqnorm plot of a vector and
include a highly robust straight line fit which is not much
influenced by outliers.  > qqnorm(car.gals), > qqline(car.gals)

11.  From first principles derive the first two moments of the
Cauchy distribution.

12.  Describe the difference between stochastic variability
and knowledge uncertainty.

-------
13.  How many communities in the US have fewer than 10,000 people
based on the 1990 census?

14.  Under what conditions does a systematic sample of size n
have sample variance less than or equal to a simple random sample
of size n?

-------
          Questions for the Annual  Statistics  Conference


1.  At what hotel was the first conference held?

2.  Who was the first featured speaker and from what office?

3.  At which conference did Peter, Paul, and Mary speak one
morning?

4.  Which plenary speaker was snowed out of his session?

5.  Which is the only hotel to host the conference twice?
          (Hint: It has had two different names)

6.  In which year did the bus backtrack to the Greyhound depot
after crossing the 14th Street Bridge?

7.   Which conference had three DAA's participate on one panel?

8.   Which assistant administrator spoke at a conference?

9.   Why is the conference usually held in March?  Which was the
one conference that was not in March?

10.  Most of the conferences have had contractor support to set
them up.  Name the two contractors who did this.

-------
-------
        CONTENTS
        Combining Data and Data Uncertainty: Introduction (N.P. Ross). Uncertainty Issues of the Hanford
        Environmental Dose Reconstruction Project (R.O. Gilbert and J.C. Simpson). Decreased Sampling Costs and
        Improved Accuracy with Composite Sampling (S.D. Edland and G. van Belle). Environmental Chemistry,
        Statistical Modeling and Observational Economy (G.P. Patil). Predictive Models of Fish Response to
        Acidification: Using Bayesian Inference to Combine Laboratory and Field Measurements (W.J. Warren-Hicks
        and R.L. Wolpert). A New Approach for Accommodation of Below Detection Limit Data in Trend Analysis
        (N.N. Nagaraj and S.L. Brunenmeister). Spatial Statistics: Introduction (N.P. Ross). Spatial
        Chemostatistics (N. Cressie). Design of the Clean Air Act Deposition Monitoring Network (D. Holland and
        R. Baumgardner). National Air Quality and Emissions Trends Report (B.A. Beard and W.P. Freas). Models and
        Data Interpretation: Introduction (D. Krewski). Statistical Issues for Development Toxicity Data
        (D. Gaylor). Measuring Carcinogenic Potency (M.J. Goddard and D. Krewski). Environmental Pollution and
        Human Health: An Epidemiology Perspective (J. Schwartz). Soil Quality as a Component of Environmental
        Quality (M.A. Cole). Future Environmental Management: Introduction (J. Abe). Gauging the Future
        Challenges for Environmental Management: Some Lessons from Organizations with Effective Outlook
        Capabilities (M. Boroush). Creating Strategic Visions (G. Taylor). Exploring Future Environmental Risks
        (D. Rejeski). Geographical Information Systems (GIS): Introduction (A. Pesachowitz). Geographical
        Information Systems (GIS) for Environmental Decision Making (A. Pesachowitz). Mechanisms to Access
        Information about Spatial Data (E. Christian). Environmental Statistics: Introduction (B. Nussbaum).
        The Quality of Environmental Databases (D.A. Marker, S. Ryaboy, and H. Lacayo). The NOAA National
        Status and Trends Mussel Watch Program: National Monitoring of Chemical Contamination in the Coastal
        United States (T.P. O'Connor). Some Problems of Safe Dose Estimation (A.P. Basu, G.F. Krause, K. Sun,
        M. Ellersieck, and F. Mayer). Where Next? Adaptive Measurement Site Selection for Area Remediation
        (H.T. David and S. Yoo). The Center for Environmental Statistics: Interim Status and Vision of Products
        (B. Niemann, C. Curtis, and E. Leonard). Conclusion: Where Do We Go from Here? (N.P. Ross and
        C.R. Cothern).

        Catalog no. L936LBRA
        December 1993, c. 400 pp., ISBN: 0-87371-936-0
        Approx. U.S. $95.00/Outside U.S. $114.00
         30-DAY EXAMINATION POLICY
Lewis Publishers will allow up to $300.00 in publications to be
examined on 30-day approval.  Orders for books totalling over
$300.00 must be prepaid or accompanied by a viable purchase
order.  All books are returnable.  Return postage is the
responsibility of the customer.  Journals, software, and
databases are not available on an examination basis.
      LEWIS PUBLISHERS
      2000 Corporate Blvd., N.W., Boca Raton, Florida 33431
Name
Company or Institution
Street
City/State/Zip

Telephone
P.O. #
Check One: [ ] Check enclosed in the amount of $
           [ ] Credit Card:
               [ ] American Express  [ ] Mastercard

               Account number (include all digits)
Expiration date                Signature
           [ ] Please bill me
FREE Shipping and Handling on Prepaid Orders!

3 Easy Ways to Order:
   1. [illegible]
   2. Mail this portion to Lewis Publishers
   3. Call toll free 1-800-272-7737 Monday through Friday
      (Continental U.S. only) or (407) 994-0555.

Yes, Send Me
   Cothern/Ross: Environmental Statistics, Assessment, and Forecasting
   Cat. no. L936LBRA...Approx. U.S. $95.00
Also of Interest:
   Taylor: Statistical Techniques for Data Analysis, 1990
   Cat. no. 12502Z...U.S. $79.95/Outside U.S. $96.00

-------
   THE EPA ANNUAL CONFERENCE ON STATISTICS
1985   Durham, North Carolina
1986   Williamsburg, Virginia
1987   Virginia Beach, Virginia
1988   Williamsburg, Virginia
1989   Charlottesville, Virginia
1990   Williamsburg, Virginia
1991   Richmond, Virginia
1992   Philadelphia, Pennsylvania
1993   Baltimore, Maryland
1994   Norfolk, Virginia
1995   Williamsburg, Virginia
1996   Richmond, Virginia

-------
PLANNING  COMMITTEE  FOR  THE  TENTH  ANNUAL  EPA  CONFERENCE  ON
STATISTICS

Chair   Rick Cothern, OPPE

Ruth Allen, OPPTS

Larry Cox, AERL

George Flatman, EMSL

Chapman Gleason, OPPE

David Holland, ORD

Barnes Johnson, OSW

Henry Kahn, OW

Mel Hollander, OA

Elizabeth Margosches,  OPPTS

Phil Morgan, State of Pennsylvania
  Department of Natural Resources

Brand Niemann, OPPE

Barry Nussbaum, OPPE

Bimal Sinha, University of Maryland/Baltimore Campus

Bill Smith, OPPE

Tom Van Zant, Geosphere Project

John Warren, ORD

-------
 ATTENDEES  AT ALL TEN  OF THE ANNUAL EPA CONFERENCES ON STATISTICS






JOHN CREASON




TOM CURRAN




JIM DALEY




BILL HUNT




BARNES JOHNSON




HENRY KAHN




MEL HOLLANDER




ELIZABETH MARGOSCHES




BARRY NUSSBAUM




PHIL ROSS




BILL SMITH




JOHN WARREN

-------
                          REGISTRANTS

            TENTH ANNUAL EPA CONFERENCE ON STATISTICS

                       RAMADA INN NORFOLK

                       MARCH 7-10, 1994
Gerald G. Akland
ORD/OMMSQA
MD-75
US EPA
Research Triangle Park, NC  27711
919-541-4885
FAX 919-541-7588
Derry Allen
Acting Director
Office of Strategic Planning and Environmental Data
MC  2161
USEPA
Washington, D.C.  20460
202-260-4028
Ruth Allen
OPPT/OPP/HED/CCB
HC7509C
USEPA
Washington, D.C. 20460
703-308-2918
FAX  703-305-5147
Tim Barry
OPA/OPPE
MC 2127-
202-260-2038
Peter Bloomfield
NISS
RTP, NC
Susan Brunenmeister
OA  H1501
USEPA
Washington, D.C. 20460
202-260-0246

-------
Lori Brunsman
H7509C
USEPA
Washington, D.C.
703-308-2902
Jeff Beaubier
MC 7403
USEPA
Washington, D.C. 20460
202-260-2263
Judy Calem
Environmental Statistics and Information Division
OPPE    MC 2163
USEPA
Washington, D.C.  20460
202-260-8638
FAX 202-260-4968
Jim Callier
Region 7
USEPA
MC  RCRA/IOWA
726 Minnesota Ave
Kansas City, KS  66101
913-551-7646
FAX 913-551-7521

Dan Carr
George Mason University
William  S.  Cleveland
Maureen  Clifford
OPP  MC  8602
USEPA
Washington, D.C.  20460
703-308-2827
 Jim Cogliano
 ORD/OHEA
 MC 8602
 USEPA
 Washington,  D.C.  20460
 202-269-3814
 FAX 202-260-3803

-------
Edwin J. Coleman
Coleman/Morse Associates Ltd.
1190 River Bay Road
Annapolis, MD  21401
Margaret Conomos
Environmental Statistics and Information Division
OPPE    MC 2163
USEPA
Washington, D.C.  20460
202-260-2600
FAX 202-260-4968
Rick Cothern
Environmental Statistics and Information Division
OPPE    MC 2163
USEPA
Washington, D.C.  20460
202-260-2734
FAX 202-260-4968
Bill Cox
OAQPS
MD-14
Durham, NC
919-541-5563
Larry Cox
AERL
MD-75
USEPA
RTP, NC 27711
919-541-2648
FAX 919-541-7588
John Creason
ORD/HERL-RTP
USEPA
RTP, NC   27711
919-541-2598
FAX 919-541-5394

-------
Tom Curran
OAQPS/OAR
MD-14
US EPA
RTP, NC  27711
919-541-5467
FAX 919-541-2357
Jim Daley
OPPE/ORME/IPB
MC 2136
US EPA
Washington, D.C. 20460
202-260-2743
Evan Englund
EMSL
PO BOX 93478
USEPA
Las Vegas, NV  89193-3478
Gary F. Evans
MD-56
Durham, NC
919-541-3124
George Flatman
EMSL
PO Box 93478
USEPA
Las Vegas, NV  89193-3478
702-798-2628
FAX  702-798-2454
Bernice Fisher
H7509
USEPA
Washington, D.C.  20460
703-305-5959
Terrence  Fitz-Simons
MD-14
Durham, NC
919-541-0889

-------
John Fritsvold
Environmental Statistics and Information Division
OPPE    MC 2163
US EPA
Washington, D.C.  20460
202-260-6724
FAX 202-260-4968
Dru Francis
Hampton Roads Sanitation District
PO Box 5911
Virginia Beach, VA  23455-0911
804-460-2261
Neil Frank
OAQPS
MD-14
Durham, NC
919-541-5560
Warren Freas
OAQPS
MD-14
Durham, NC
919-541-5469
William Garetz
Environmental Statistics and Information Division
OPPE    MC 2163
USEPA
Washington, D.C.  20460
202-260-2684
FAX 202-260-4968
Chapman Gleason
Environmental Statistics and Information Division
OPPE    MC 2163
USEPA
Washington, D.C.  20460
202-260-9006
FAX 202-260-4968

-------
Al Goozner
OPPTS/OPP/BEAD
H7503W
USEPA
Washington, D.C. 20460
703-308-8147
FAX 703-308-8151

James Hemby
OAQPS
Helen Hinton
OAQPS
MD-14
Durham, NC
919-541-5558
Karen Hogan
MC 7403
USEPA
Washington, D.C. 20460
202-260-3895
Dave Holland
ORD/AREAL-RTP
Mail Drop MD-56
USEPA
RTP,NC 27711
919-541-3126
FAX 919-541-1486
John Hoiley
MC  6406J
USEPA
Washington,  D.C.  20460
202-233-9305
 Bill  Hunt
 OAQPS/OAR
 MD-14
 USEPA
 RTP,  NC 27711
 919-541-5559
 FAX  919-541-2357

-------
Barnes Johnson
MC 5305
USEPA
Washington, D.C. 20460
202-260-2791

Henry Kahn
OW/OST/EAD
MC  4303
USEPA
Washington, D.C. 20460
202-260-5406
Art Koines
MC 2161
USEPA
Washington, D.C. 20460
202-260-4030
Mel Hollander
OA  H1501
USEPA
Washington, D.C. 20460
202-260-4719
Pepi Lacayo
Environmental Statistics and Information Division
OPPE    MC 2163
USEPA
Washington, D.C.   20460
202-260-2714
FAX 202-260-4968
 Eleanor Leonard
 Environmental  Statistics and Information Division
 OPPE    MC 2163
 USEPA
 Washington,  D.C.   20460
 202-260-9753
 FAX 202-260-4968
 Matthew Leopard
 MC 2136
 USEPA
 Washington,  D.C.  20460
 202-260-2468

-------
Elizabeth Margosches
OPPTS
MC 7403
USEPA
Washington, D.C. 20460
202-260-1511
FAX 202-260-1283
Mary Marion
Rick Moll
Statistics Canada
National Accounts and Environment Division
R.H. Coats Building, 21st Floor
Ottawa, Ontario, Canada  K1A 0T6
613-951-3741
Bob Moon
Cincinnati Technical College
FAX 513-569-1463
James Morant
Environmental Statistics and Information Division
OPPE    MC 2163
USEPA
Washington, D.C.  20460
202-260-2266
FAX 202-260-4968
 Phil  Morgan
 717-772-3609
 FAX 717-787-8885
 Bill  Nelson
 ORD/AREAL-RTP
 USEPA
 RTP,NC 27711
 919-541-3184
 FAX 919-541-1486

-------
Brand Niemann
Environmental Statistics and Information Division
OPPE    MC 2163
USEPA
Washington, D.C.  20460
202-260-3726
FAX 202-260-4968
Barry Nussbaum
Environmental Statistics and Information Division
OPPE    MC 2163
USEPA
Washington, D.C.  20460
202-260-1493
FAX 202-260-4968
Tony Olsen
ORD/ Corvallis
USEPA
Environmental Research Laboratory
Corvallis, Oregon  97330
503-754-4790
FAX  503-754-4338
Thomas Parker
Environmental Statistics and Information Division
OPPE    MC  2163
USEPA
Washington, D.C.   20460
202-260-3378
FAX 202-260-4968
Barbara  Parzygnat
OAQPS
Hugh  Pettigrew
OPPT/OPP/HED
H7509C
USEPA
Washington, D.C.  20460
703-305-5699
 James A.  Reagan
 MD-78A
 Durham,  NC
 919-541-4486

-------
Esperanza Renard
US EPA
Edison, NJ
Phil Ross
Environmental Statistics and Information Division
OPPE    MC 2163
USEPA
Washington, D.C.  20460
202-260-2680
FAX 202-260-4968
Jerry Sacks
NISS
PO BOX 14162
RTP, NC  27709
919-541-7114
FAX 919-541-7102
John Schwemberger, Jr.
MC 7404
USEPA
Washington, D.C. 20460
202-260-7195
Ingrid Schulze
Environmental Statistics and Information Division
OPPE    MC  2163
USEPA
Washington,  D.C.   20460
202-260-3007
FAX 202-260-4968

Robert Seila
USEPA
RTP, NC
 Woody  Setzer
 MD  55
 Durham,  NC
 919-541-0128

 Ron Shafer
 Environmental Statistics  and  Information Division
 OPPE    MC 2163
 USEPA
 Washington,  D.C.   20460
 202-260-6966
 FAX 202-260-4968

-------
Jack H. Shreffler
AERL
MD-75
US EPA
RTP, NC  27711

Bimal Sinha
Department of Statistics and Mathematics
University of Maryland/Baltimore Campus
410-455-2412
FAX 410-455-1066
William Smith
Environmental Statistics and Information Division
OPPE    MC 2163
US EPA
Washington, D.C.  20460
202-260-9659
FAX 202-260-4968

Chris Solloway
Environmental Statistics and Information Division
OPPE    MC 2163
USEPA
Washington, D.C.  20460
202-260-2697
FAX 202-260-4968
Tim Stuart
Environmental Statistics and Information Division
OPPE    MC  2163
USEPA
Washington, D.C.   20460
202-260-0725
FAX 202-260-4968
David  Svendsgaard
MD-55
Durham,  NC
919-541-2468
 Tom Van  Zant
 Geosphere  Project
 146 Entrada Drive
 Santa  Monica,  CA  90402
 310-459-4342
 FAX 310-459-8299

-------
John Warren
QUAMS   MC 8205
USEPA
Washington, D.C. 20460
202-260-9464
FAX 202-260-4346
Patricia Wilkinson
Environmental Statistics and Information Division
OPPE    MC 2163
USEPA
Washington, D.C.  20460
202-260-2680
FAX 202-260-4968

Paul Wohlleben
Acting Director, OIRM
USEPA
Washington, D.C. 20460
202-260-4465
David Zoellner
Environmental Statistics and Information Division
OPPE    MC 2163
USEPA
Washington, D.C.  20460
202-260-3373
FAX 202-260-4968

-------
ABSTRACTS FOR THE TENTH ANNUAL EPA CONFERENCE ON STATISTICS
                     MARCH 7-10,  1994
                     RAMADA INN NORFOLK
                     345 GRANBY STREET
                     NORFOLK, VIRGINIA
                        804-622-6682

-------
                     TUESDAY   MARCH 8, 1994

8:45-10:00 am    Plenary Session: Panel Discussion
                 Phil Ross, OPPE; Frederick W. Allen, OPPE;
                 Paul Wohlleben, OIRM

-------
                       TUESDAY   MARCH 8,  1994

  10:15-11:30 am   Plenary Session

                   Policy and Guidance in Groundwater Monitoring
                         Statistics: The State of the States
                   State of Pennsylvania/Dept of Natural Resources
                         Phil R. Morgan, Sr. and colleagues


An informal survey of state environmental agencies has revealed
that there is a great disparity in the role these agencies play in
meeting the mandates of the USEPA regulations on the use of
statistical methods in management and monitoring of Superfund and
RCRA sites.  This review of the statistical activities of state
agencies also revealed wide variation in the extent to which states
observe statistical methodology recommended in the USEPA
regulations and guidance.

This session will examine approaches taken by toxic waste-intensive
states  in Superfund and RCRA  monitoring  statistics,  unique data
collection, analysis and proposal review problems faced  by certain
states,  and  will  suggest solutions  to  recurrent  problems  in
statistical  methods  for  detection,  evaluation  and   attainment
monitoring.

The  session will  include  a  discussion  of these issues by state
agency personnel.

Statistical concerns to be addressed include the following:

1.  Overview of RCRA Groundwater Monitoring Problem

     -RCRA monitoring  overview
     -Major monitoring review
     -Balancing statistical issues
     -USEPA power analysis approach
     -Considerations in applying  the USEPA approach

2.  The Verification Problem

3.  Application of Interval Statistics to  RCRA Data for  Detection,
    Assessment, Attainment Monitoring

     -Motivation  for using interval statistics
     -Application of Tolerance Intervals
     -Application of Prediction Intervals
     -Application of Confidence Intervals
     -Application of Combined Tolerance/Prediction Intervals
     -The Statistical 'Performance Standard' Problem
     -Lognormal  Interval Statistics
     -Nonparametric Interval Statistics

-------
4.  Description of Example Data


     -Overview
     -Data: Locations and Chronology
     -Using Results of Previous Analysis

5.  Power Analysis Using Tolerance/Prediction Intervals

     -Logical Description of How Simulation Procedures
      Relate to the Waste Site
     -Description of Data Generation and Simulation Procedures
     -Results of Simulation

6.  Attainment and Maintenance Statistics: Trend and Causality

     -Attainment Sampling and Sequential Testing
     -Mann-Kendall Statistics
     -Keeping an Eye on Statistical Criteria

7.  Statistics in Today's Regulatory Environment

     -Is it possible to "keep it simple"?
     -Resources Needed by State and Federal Agencies
     -Resources Needed by the Regulated Community
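
The Mann-Kendall statistic listed under item 6 can be illustrated with a
short sketch.  This is not taken from the session materials; it is a
minimal Python version of the standard textbook form of the test,
assuming no tied values and using the usual normal approximation.  The
function name and the quarterly concentrations are hypothetical.

```python
import math


def mann_kendall(values):
    """Return (S, Z) for the Mann-Kendall trend test.

    Assumes no tied values, so the variance of S needs no tie correction.
    Z uses the normal approximation with a continuity correction; for
    very short series, exact tables are usually preferred.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of the later-minus-earlier difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


# Hypothetical quarterly well concentrations (illustrative values only).
conc = [4.1, 4.3, 4.0, 4.6, 4.8, 5.1, 5.0, 5.4]
s, z = mann_kendall(conc)
print("S =", s, " Z =", round(z, 3))
```

A large positive Z suggests an upward trend and a large negative Z a
downward one; real monitoring data with ties or nondetects would need
the tie-corrected variance.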

-------
                       TUESDAY   MARCH 8,  1994

1:15 - 2:30 pm
                    -Environmental Statistics Research at the
                     National Institute for Statistical Sciences
                          David Holland and Larry Cox—ORD


     Meteorological Effects in Measuring Ozone Levels and Trends

Observed  ozone concentrations  are  used to  monitor  changes  and
trends  in the  sources  of ozone  and its  precursors.    For  this
purpose the influence of meteorological variables is a confounding
factor.  Data from the Chicago area are explored with a variety of
methods.   The key relationships are used  to construct nonlinear
least  squares,  nonparametric  and  logistic  regression  models
relating ozone levels unaccounted for by trends in meteorology, to
"adjust"  observed  ozone  concentrations   for  anomalous  weather
conditions and to predict exceedances over high thresholds.
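
The idea of "adjusting" observed ozone for weather can be sketched in a
few lines.  The abstract describes nonlinear, nonparametric and logistic
models; the Python sketch below deliberately substitutes a much simpler
stand-in, an ordinary least squares line on a single meteorological
variable, only to show what an adjustment means.  The function names and
the use of temperature as the covariate are assumptions, not the
authors' method.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def adjust_for_weather(ozone, temp):
    """Remove the fitted temperature effect from each ozone value.

    Each observation is shifted to what the fitted line implies it
    would have been at the average temperature.
    """
    _, slope = fit_line(temp, ozone)
    mean_temp = sum(temp) / len(temp)
    return [o - slope * (t - mean_temp) for o, t in zip(ozone, temp)]


# Hypothetical daily values (illustrative only).
temp = [20.0, 25.0, 30.0, 35.0, 28.0]
ozone = [52.0, 61.0, 74.0, 83.0, 66.0]
print(adjust_for_weather(ozone, temp))
```

Any trend remaining in the adjusted series is then less likely to be an
artifact of unusually hot or cool periods.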


     Effects  of Particulate  Levels  on Mortality  Accounting for
Meteorology

Potential   relationships  between   mortality   and   levels   of
particulates  (in particular, PM10) are confounded by meteorological
variables.    Coefficients  from regression models  relating these
variables  are  often  used to measure  the  effect  of  PM10  on
mortality.   Models are constructed, and applied  to Chicago area
data,  indicating that season  and meteorology play dominant roles
and  that the effect  of PM10  on mortality  may  be insignificant.
Alternative   (to  regression  coefficient)   ways  of  measuring the
effect of PM10 are also used to disentangle season and meteorology
from particulate levels with  similar conclusions.


     Strategies  for  Combining Data  From Multiple Studies in Risk
Assessment

These  strategies accommodate  both systematic and random variation
between  studies while developing dose-response relationships.  A
key  step  is to discretize the response into severity categories in
order  to combine  endpoints  on  possibly  different scales.   The
probability  of an adverse outcome  is  then modeled using general
mixed  model  logistic regression.  Modeling error is assessed and
reduced through stratification and random effects modeling.  These
(and other) tools address systematic differences between studies,
such as species effects; uncontrolled sources of variation, such as
lab and investigator effects; and the combining of information from
studies with different levels of quantification.  Analyses of data
from studies of tetrachloroethylene and methylisocyanate illustrate
the  methods.

-------
                       TUESDAY   MARCH 8, 1994

1:15 - 2:30 pm
                    -General Topics
                          Tim Barry, Al Goozner, Matthew Leopard
Tim Barry—OPPE

     The  Principle  of  Maximum  Entropy  and  the  Selection  of
     Probability Distributions.

Maximum entropy (MaxEnt) is a nonparametric method for selecting
probability distributions, developed by Shannon and Jaynes, which
guarantees the maximum use of the available data without going
beyond it.  MaxEnt has proven to be especially useful in data-poor
situations.  Recent advances have opened up the method for expanded
uses in environmental risk assessment.  This presentation gives
an overview of the methodology and discusses some recent
applications and their potential uses in quantitative risk
assessment.
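
As a rough illustration of the MaxEnt idea (not drawn from the talk
itself), the Python sketch below finds the maximum entropy distribution
on a small set of discrete values when only the mean is known.  Under
that single constraint the solution has the form p_i proportional to
exp(lambda * x_i), and lambda is found here by bisection.  The function
names, the search bracket of -50 to 50, and the example values are all
assumptions made for the illustration.

```python
import math


def _weights(values, lam):
    """Normalized weights proportional to exp(lam * value)."""
    shift = max(lam * v for v in values)  # guards against overflow
    w = [math.exp(lam * v - shift) for v in values]
    total = sum(w)
    return [wi / total for wi in w]


def maxent_distribution(values, target_mean, tol=1e-10):
    """Maximum entropy probabilities on `values` with a fixed mean.

    Assumes the target mean lies strictly between min(values) and
    max(values) and that the multiplier lies within [-50, 50].
    """
    if not min(values) < target_mean < max(values):
        raise ValueError("target mean must lie inside the range of values")
    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        p = _weights(values, mid)
        mean = sum(v * pi for v, pi in zip(values, p))
        if mean < target_mean:  # the mean increases with the multiplier
            lo = mid
        else:
            hi = mid
    return _weights(values, (lo + hi) / 2.0)


# Hypothetical case: six possible outcomes, known only to average 4.5.
probs = maxent_distribution([1, 2, 3, 4, 5, 6], 4.5)
print([round(p, 4) for p in probs])
```

When the stated mean equals the midpoint of the values, the result is
the uniform distribution, which matches the intuition that MaxEnt adds
no structure beyond what the data supply.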
Al Goozner—OPP/OPPTS

     Certified/Commercial Pesticide Applicator Survey (C/CPAS)

EPA has embarked on a new data collection program under mandate
from the  1990 Food, Agriculture, Conservation and  Trade Act  (PL
101-624).   The 1993  Certified/Commercial  Pesticide  Applicator
Survey  (C/CPAS)  is the  initial EPA  effort  to collect  needed
pesticide use data from these applicators  in  order to compile a
report to be  submitted to the Congress in compliance with the new
law.
A view of how preliminary investigations were conducted to develop
a survey design and a successful OMB  Information Collection Request
will be  presented.  Emphasized are the objectives  of collecting
information with  the  minimum respondent burden.    Reference  to
Cochran's Sampling Techniques will be made.  Participants will see
how the  stage was  set  for  an on-going data  collection program.
Topics to be  covered include:

     A)  The  Congressional Mandate
     B)  Preliminary Investigations
     C)  Developing a Focus for the Survey
     D)  Methodology Development
     E)  ICR  Preparation - Approval

-------
Matthew G. Leopard

     Transborder Hazardous Waste Data Electronic Data Interchange
     (EDI) Project

The Maquiladoras or  "Maquilas",  defined as U.S.-based industries
operating in Mexico, are an extreme case of an industry subjected
to  myriad  environmental reporting  requirements.    Due  to  the
transborder   nature  of   their  business,   they  must   submit
environmental compliance reports to multiple agencies on both sides
of the border.  This  requires the Maquilas to engage  in a laborious
and often duplicative act of transposing information  from databases
onto paper forms.  For them, transmitting the data electronically
could,  in theory,  lead  to  significant reductions  in regulatory
burden, possibly condensing the data required on many paper forms
into  a  single  electronic  format.    Converting from  paper  to
electronic submissions of data involves much more than a technical
fix—to maximize its value,  senders  and recipients  must reassess
their existing data  management practices and data requirements.

This fall the EPA  initiated a project to electronically transmit
hazardous waste data from the Maquilas to  EPA Regional, State, and
other  Federal  Agencies.    The  broad  objective  is  to lay  the
groundwork   for  a  phased   implementation  of  electronically-
transmitted hazardous waste compliance data to government agencies
and appropriate industries.  Using the experience gained from other
EDI pilots, the EPA is adapting EDI standards (referred to as ANSI
ASC  X12)  used  by  industry to  the   unique requirements of  the
Maquilas.   The EPA has formed a working group of industry, State,
and U.S. and Mexican government representatives  to orchestrate the
project.   Presented at the  meeting  will  be  the  status of  the
project, the  lessons learned thus far, and  the potential value of
EDI with respect to  other environmental data requirements.

-------
                       TUESDAY   MARCH 8,  1994

1:15 - 2:30 pm
                    -Statistical Research Methodology
                          George Flatman—ORD


     Environmental Monitoring: New Answers For Old Questions

Often statistics is used and taught as if it were a dead language
like Latin which  first killed the Romans and  is  now killing the
students.  However,  statistics is alive and well  and adding new
methods and algorithms.  In the last few years, spatial statistics
has rewritten the  answers to the ubiquitous questions of (1) how to
take  "representative"  samples of assured  quality,  (2)  how  to
optimize sampling design (number of samples), and  (3) how to make
data analysis understandable to decision makers.  The  cause of the
change is "spatial correlation", which is a technical  term for the
common sense fact that environmental samples taken close together
in space are both apt to be high because  they come from the same
plume area or both low because they come from the same background
area.  Varying together  is correlation.  This talk will summarize
the  meanings  of:   (1)   "correct  sample"  from Gy's   Theory  for
determining  sample   mass  in  heterogeneous  media   for  Quality
Assurance,  (2)  additional  sampling  optimization   rules—equal
probability and equal spacing of samples, and (3) data analysis to
measure  false  positives,   false  negatives,  and  power for  the
decision makers.  Statistics did not kill the ancient statisticians
and has the potential to make the life of the environmental
scientist or manager a lot easier and more productive (accurate).
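
Spatial correlation of the kind described above is often summarized with
an empirical semivariogram.  The Python sketch below is an illustration
under simplifying assumptions, not material from the talk: samples are
taken at equal spacing along a single transect, and the function name
and concentration values are hypothetical.

```python
def empirical_semivariogram(z, max_lag):
    """Semivariance by lag for equally spaced samples along a transect.

    gamma(h) = sum((z[i] - z[i + h]) ** 2) / (2 * N(h)), where N(h) is
    the number of sample pairs separated by h spacing units.
    """
    gammas = {}
    for h in range(1, max_lag + 1):
        pairs = [(z[i], z[i + h]) for i in range(len(z) - h)]
        if pairs:
            gammas[h] = sum((a - b) ** 2 for a, b in pairs) / (2.0 * len(pairs))
    return gammas


# Hypothetical transect crossing a plume (illustrative values only):
# low background, a run of high readings, then background again.
transect = [2.0, 2.3, 2.1, 5.8, 6.1, 5.9, 2.2, 2.0]
for lag, gamma in empirical_semivariogram(transect, 4).items():
    print("lag", lag, "semivariance", round(gamma, 3))
```

Small semivariance at short lags means neighboring samples carry
overlapping information, which is why sample spacing matters as much as
sample count.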
     Environmental  Monitoring  (Sampling  From a Correlated Random
Field):  How Many Samples?

George  T.  Flatman  will present  a  unified  sampling  strategy,
discussing  the need for  and  inter-relationships  among classical
random   variable   sampling,   Gy's   Theory   of   sampling,   and
geostatistical  sampling  for environmental  monitoring.    Evan J.
Englund will present "SAMPLAN",  or how to answer "how many samples"
for  one and two  stage spatial sampling  campaigns using spatial
(conditional)  simulation.

-------
                       TUESDAY   MARCH 8, 1994

2:45 - 4:00 pm
               -How Widespread is Our Influence as
                Statisticians?
                          Elizabeth Margosches, John Schwemberger,
                          Karen Hogan and Margaret Conomos

Karen Hogan—HEB/HERD/OPPTS

     Health-based Standards for Lead

OPPTS has been charged with developing health-based standards for
lead  in  dust  and  soil  under  Title  X,   Section 403,  of  the
Residential Lead-based Paint  Hazard  Reduction Act of  1992.   This
development is being coordinated with work going on in the Office
of Solid Waste and Emergency Response, and follows on work from the
Office of  Ground Water and  Drinking Water.  Approaches  using a
pharmacokinetic   model   based  on   clinical   (human)   data  and
statistical  models  of  epidemiologic studies are outlined.   In
particular, issues concerning extrapolation from these constructs
to  national distributions  of potentially  exposed children  are
discussed,  including  the  viability  of  assessing the  effect  of
standard setting at the household level.


John G. Schwemberger and Benjamin S. Lim—OPPTS,  Bruce E. Buxton,
Steven W.  Rust,  and  John  G.  Kinateder—Battelle, Frederick  G.
Dewalt and Paul Constant, Midwest Research Institute

     Does Lead Paint Abatement Work?

Lead paint abatement of a residence is an expensive and disruptive
undertaking.  An opportunity to evaluate lead paint abatement arose
after lead paint abatements had been  completed on approximately 50
houses  in Denver,  Colorado.   In  addition,  houses  with  a  low
incidence of lead-based paint were available to comprise a control
group.   Levels of lead  in the dust and  soil of  the  abated  and
control   houses  were  measured  after  normal   occupancy  was
established.   Dust  samples were collected  from  window sills  and
window channels  (the channel  is generally the part of the window
where the sash rests when the window is closed), from air ducts and
floors, and  from  interior and exterior entryways.   Soil samples
were collected near entryways, near the foundation,   and at  the
property boundary.

Lead in dust was measured both as a percentage of collected house
dust and as an amount of lead per unit of area. The amount of lead
per unit area  was dependent  on the  amount of  dust collected from
the area.   Therefore,  the percentage of lead in the  house  dust,
usually called the  lead concentration,  was used as  the primary
means of comparing abated houses to control houses.  Lead in soil

-------
was measured as percentage of the collected soil.  Hence soil lead
was also measured as a lead concentration.

The geometric means of the lead concentrations were statistically
equivalent for the abated  and the control houses for a number of
sampled areas. Differences were, with one exception, attributed to
unpainted areas of the house or property (air ducts, exterior soil)
that were not abated.  The lone exception was window sills.  Lead
concentrations on window sills were significantly higher in abated
houses than  in  control  houses.  This result  is  an anomaly;  mean
lead concentrations  in  window channels were  similar  for  the two
types of houses.
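
The comparison of geometric means described above can be illustrated
with a minimal sketch. The concentration values below are hypothetical
placeholders, not study data, and the function name geometric_mean is
an assumption made for this example. The sketch only computes the two
geometric means and their ratio; it does not reproduce the equivalence
or significance tests used in the study.

```python
import math
import statistics


def geometric_mean(values):
    """Geometric mean, computed as exp of the mean of the natural logs."""
    logs = [math.log(v) for v in values]
    return math.exp(statistics.fmean(logs))


# Hypothetical lead concentrations (percent by weight); not study data.
abated = [0.02, 0.05, 0.11, 0.03, 0.08]
control = [0.03, 0.04, 0.09, 0.02, 0.10]

gm_abated = geometric_mean(abated)
gm_control = geometric_mean(control)

# A ratio near 1 means the two groups have similar geometric means.
ratio = gm_abated / gm_control
print(f"abated GM={gm_abated:.4f} control GM={gm_control:.4f} ratio={ratio:.2f}")
```

Working on the log scale is a common choice for concentration data,
which tend to be right-skewed; the geometric mean is less sensitive
to a few very high readings than the arithmetic mean.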

Lead  paint   abatement   worked  in  the  sense  that  mean  lead
concentrations  at abated  houses  were  similar  to  those  at  the
control houses, and non-abatement accounted for all but one of the
cases of differences.  A surprising result was the level of lead
concentrations  in  the  window  channels.     The geometric  mean
concentrations were high for both control and abated houses.  Hence
it is possible that window channels remain a source of lead,  even
after lead paint abatement and even in houses  with a low incidence
of lead-based paint.

This study was  a follow-up of a  study  done by the Department of
Housing and Urban Development (HUD).  HUD  conducted the abatements
and examined costs, worker safety,  and clean-up.  The EPA study was
designed to determine if the  abatements  worked two years after the
abatements had been completed.  It is  estimated that approximately
57 million homes in the United States contain some lead-based paint
at or above the statutory definition of 1.0 milligrams per square
centimeter.   The 57  million homes  with  lead-based paint  have an
average of 580 square feet of interior  surface and an average of
900 square feet of exterior surface covered with  lead-based paint.
Decisions regarding whether abatement works have the potential to
affect many of the owners of  these 57  million  homes, who will have
to decide how to deal with hazards  from lead-based paint.

Margaret  Conomos,  John  Michael,  William M.  Devlin,  Stephen  K.
Dietz—OPPTS

     Design for  the  Environment—General  Services Administration
(GSA) Field Office Cleaning Systems Survey

The Office  of  Pollution Prevention and Toxics  (OPPT)  of  EPA has
been assisting  the Public Buildings  Service  (PBS)  of GSA  in  an
evaluation  of  cleaning products.   The effectiveness ratings and
perceived health effects of 13  "green" products were compared to 6
other products currently used by GSA.  The "green" products are so
named because of the  reduced packaging requirements  and  the fact


-------
that manufacturers have said that their products are non-toxic to
the environment and users.

EPA and its contractor, Westat, helped GSA (EPA's client) develop
and administer a series of four questionnaires to 45 GSA cleaning
personnel  over a  period  of  6 months.   Quantitative data  were
collected  from the respondents about their experiences  with the
various  products.    In  addition,  qualitative  information  was
collected  from respondents about  their experience  with various
products.  Methods were developed to track each respondent across
cycles  of  product  testing  while  still  preserving  respondent
anonymity.  A  stagewise approach to Analysis of Variance was used
to analyze product effectiveness, and cope with missing data.

-------
                       TUESDAY   MARCH 8,  1994

2:45 - 4:00 pm
                    -Using Uncertainty Analysis in Environmental
                     Decision Making
                          Barnes Johnson
Richard Gilbert—Battelle Northwest

     Environmental Decision Making: What is To Be Gained From
     Quantitative Estimates of Risk Uncertainty?

Quantitative risk predictions are uncertain because of incomplete
information and knowledge about data, models, model parameters, and
the true state of nature.  This uncertainty may be assessed using
qualitative or quantitative (Monte Carlo simulation) methods.  The
rigor with  which the uncertainty  of  quantitative  risk estimates
should be assessed depends on many factors including 1)  the size of
the estimated risk, 2) the consequences of making wrong decisions
because risk uncertainty is not adequately assessed,  3)  the
availability or obtainability of information needed to quantify the
uncertainty of risk estimates to the desired degree,  and 4) whether
there is  a  need to identify key  model components  and parameters
that should be studied  to  reduce risk uncertainty and  decision
errors.   Quantitative  uncertainty analyses  are  typically  more
expensive and  may be more  difficult to conduct,  understand and
interpret  than  qualitative  analyses.    Moreover,  quantitative
uncertainty analyses  of  risk  are themselves uncertain because of
uncertainties about which probability density functions  are most
appropriate to model  the uncertainty of risk  parameter values.   For
these  reasons,   the   advantages  as  well   as   limitations  of
quantitative uncertainty analyses must be  clearly understood by
decision makers and stakeholders  before the method is used.  We may
ask "How  does the decision maker benefit by having  available a
quantitative estimate of  risk uncertainty?", "How do we  decide when
to use quantitative uncertainty analyses?",  and "Is  the additional
information obtained by the quantitative uncertainty analysis worth
the cost?".  These questions will be addressed and  illustrated in
the  context  of  1)  making  decisions about  whether  additional
environmental samples or  information are needed by the stakeholders
for making  cleanup decisions  based on risk,  2)  eliciting expert
judgements concerning the uncertainty of risk parameter values in
the   absence   of  site-specific   information,    3)   developing
computational tools for using uncertainty analyses to design more
efficient sampling strategies at sites where risk assessments are
needed, and 4) the Data Quality Objectives process  for developing
sampling designs to meet uncertainty requirements.

-------
Tim Barry—OPPE

     Two-dimensional Monte Carlo Analysis.

In  quantitative risk  analysis,  it  is  important to  distinguish
between naturally  varying quantities (stochastic variables)  and
uncertain quantities (i.e., quantities for which our uncertainty is
due to  a  lack  of  knowledge about  their true but unknown value
either in a statistical sense regarding inaccuracy or imprecision
in parameter estimates or in a scientific sense regarding missing
or  ambiguous  information  or gaps in scientific theory).   A two-
dimensional (nested) Monte Carlo analysis was conducted to assess
the uncertainty and variability of cancer risks  attributable to
radon in drinking water.  Exposure pathways included the inhalation
of  volatilized  radon gas  and  its  daughter products  and  the
ingestion of waterborne radon.  This presentation will  focus on the
underlying theory for two dimensional Monte Carlo analysis, using
the radon case  study for illustration of the methodology.
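
The nested structure described above can be sketched as follows. This
is a minimal illustration under stated assumptions, not the analysis
presented in the talk: the distributions, their parameter values, and
the names percentile and two_dimensional_monte_carlo are all
hypothetical. The outer loop draws the uncertain quantities (fixed but
unknown values) and the inner loop draws the naturally varying
quantity across individuals, so each outer draw yields one whole
variability distribution.

```python
import math
import random


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, for 0 < q <= 1."""
    rank = max(1, math.ceil(q * len(sorted_values)))
    return sorted_values[rank - 1]


def two_dimensional_monte_carlo(n_outer=200, n_inner=500, seed=1):
    """Nested Monte Carlo: outer loop = uncertainty, inner = variability.

    All distributions and parameters are hypothetical placeholders.
    """
    rng = random.Random(seed)
    upper_tail_risks = []
    for _ in range(n_outer):
        # Outer loop: uncertain quantities, held fixed for this pass.
        unit_risk = rng.lognormvariate(-9.0, 0.5)
        median_exposure = rng.uniform(0.5, 2.0)
        # Inner loop: exposure varies from one individual to the next.
        risks = sorted(
            unit_risk * rng.lognormvariate(math.log(median_exposure), 0.8)
            for _ in range(n_inner)
        )
        # Summarize variability by the 95th percentile individual risk.
        upper_tail_risks.append(percentile(risks, 0.95))
    # The spread of that summary across outer draws expresses uncertainty.
    upper_tail_risks.sort()
    return {
        "lower": percentile(upper_tail_risks, 0.05),
        "median": percentile(upper_tail_risks, 0.50),
        "upper": percentile(upper_tail_risks, 0.95),
    }


summary = two_dimensional_monte_carlo()
print(summary)
```

The result reads as an uncertainty interval around a variability
percentile: a range for the risk to the 95th percentile individual,
rather than a single number that mixes the two sources of spread.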

-------
                       TUESDAY   MARCH 8, 1994

2:45 - 4:00 pm
                    -Environmental Statistics Issues in Pesticides
                          Ruth Allen

-------
          ENVIRONMENTAL STATISTICS ISSUES IN PESTICIDES
                  Ruth H. Allen, Ph. D.,  M.P.H.

This presentation is designed to highlight generic environmental
statistics issues for the management of pesticides. There are
three parts.  Audience questions will be taken after each part
for up to five minutes to maximize group participation.

Part 1. Report of the Statistical Needs Assessment-Phase 1 by
Ruth H. Allen, Ph. D., M.P.H. (10 min.)

This part is an overview of the results of the first phase of the
OPP statistical needs assessment. It highlights the common
environmental statistics issues across divisions within EPA.
Brief mention is given of work in progress with a list of experts
in different subject areas.  Environmental statistics problems
and solutions are summarized for Phase 2.  Environmental
statistics is looked at in the context of current streamlining
activities.

Part 2. Breast Cancer and Pesticides in the Environment:  A Case
Example of the Usefulness of Environmental Statistics by Ruth H.
Allen, Ph. D., Amy Rispin, Ph.D. and Victor Miller, Dip. Pharm.
(30 min)

This part highlights the important role of environmental
statistics in the assessment of the risks to human health and
wildlife.  Information regarding possible mechanisms of action
from chemical agents, diet and other lifestyle related risk
factors are presented. It also reviews and summarizes the recent
hypothesis that organochlorine pesticides may be linked to
increases in the rates of human breast cancer worldwide. Vital
statistics and epidemiology findings from national and
international sources are included with special reference to
selected organochlorine compounds. Lessons learned from pesticide
management and regulatory activity over the last few decades are
also included.

Part 3. Certified/Commercial Pesticide Applicator Survey C/CPAS
Discussion of Survey Design by Al Goozner (20 min)

This part describes EPA participation in a new data collection
program under mandate from the 1990 Food, Agriculture,
Conservation and Trade Act (PL 101-624).   The purpose of the
survey is to collect pesticide use information from
certified/commercial pest applicators.  This presentation will
review the congressional mandate, discuss strategies to comply
with OMB information collection reporting requirements, highlight
methods development issues, and include reference to Cochran's
Sampling Techniques.  Participants will see how the stage was set
for an on-going data collection program.

-------
                       TUESDAY   MARCH 8,  1994
   6:00-7:30 pm      Poster Session
                      Bill Smith
Edwin J. Coleman; Coleman/Mores Associates Ltd.

     Data: Where It Is and How to Get It

We see  this  book as our modest  contribution  to  America's global
competitiveness and business productivity.  We feel strongly that
the time has  come for Americans to become more  data literate.  That
is the  primary reason we have prepared  this  hands-on,  practical
guide to useful business, environmental and energy data.  This book
meets an important need.   For years, we have seen professionals in
all walks of life make important business and political decisions
without using  the best data  available.   We feel  we have remedied
these  problems  in  this  one volume.     Half  of  the  book  is
instructional.   It  is an  introductory guide to data,  where it is
produced, how it is  prepared, the tricks to using  it and the jargon
that seems to make  it difficult to use.   We call  this section the
DATAPRIMER because  it explains everything anyone  needs to know to
feel comfortable with data and to use it  effectively.  We have made
every  effort  to be  clear  and   practical,  without  sacrificing
accuracy.

The  other half  of  the  book contains  three  practical and  well
indexed directories to  business, environmental  and  energy data.
This is the DATAPHONER section of the book and it is the heart of
the volume.  It contains the names and areas of specialization of
over 2,500 individuals in federal, state  and local governments, and
in private firms who can answer  specific  questions  about nearly
every   business-related  data    issue   of   interest   to   busy
professionals.

Brand Niemann;  OPPE

     Environmental Statistics and Information ONLINE

The  Clinton  Administration's  major  initiative  to develop  the
National  Information Infrastructure  (NII)  is  prompting  federal
Government agencies  to  explore new  solutions  to "ensure that the
immense reservoir  of government information  is  available to the
public  easily  and equitably."     Of  course  before  government
information,  especially   environmental   data,   statistics,  and
indicators, can be made more available,  we  need to locate it and
add  value to  it  so  our  own agency  can first  use it.   ESID's


-------
              10th Annual EPA Statistics Conference
                         March 7-10,  1994
                        Norfolk, Virginia

                   Poster Session Presentation
                     March  9th,  6-7:30  p.m.

             Dissolved Oxygen in the Chesapeake Bay:
Exploratory Data Analysis to Support Monitoring Frequency Decisions

Dissolved oxygen is critical to  living resources in the Chesapeake
Bay  surface  waters.  Too  much  of  the nutrients  nitrogen  and
phosphorus added to the  Bay subtracts oxygen  and,  at times,  life
itself from  the waters. The Chesapeake Bay  Monitoring  Program,
begun  in  1984,  is  a  bay-wide  EPA/state  cooperative  effort
comprising over 165 stations.  Nineteen physical,  chemical,  and
biological characteristics  are routinely monitored 20 times a year
in the mainstem Bay and many tributaries.  This "point" monitoring
data  at   the  49  mainstem   sites  (see  map)   along  with  special
continuous monitoring data  at fewer sites and  limited time periods
for dissolved oxygen are the focus of this  EDA.  In addition,  the
mainstem  point  monitoring  data  which  has  been  interpolated
horizontally and vertically  within the Bay by other researchers was
also used. In  this way the effects of sampling  frequency (20 versus
12  per  year)  and  spatial  interpolation  on  dissolved  oxygen
statistics could be determined, especially in  relation to  the  new
suggested water quality  "targets"  for dissolved  oxygen levels in
the Bay.  This  EDA was conducted as both a graduate class project at
George Mason  University and for  the  Chesapeake Bay  Monitoring
Subcommittee.
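
The sampling-frequency comparison described above (20 versus 12
observations per year) can be illustrated with a minimal sketch. The
dissolved oxygen series, the target level, and the function names thin
and fraction_below are hypothetical and are not taken from the
Chesapeake Bay data or the EDA itself. The sketch thins a 20-sample
year to 12 roughly evenly spaced samples and compares the fraction of
readings that fall below a target.

```python
import math


def thin(series, keep):
    """Keep `keep` roughly evenly spaced observations from a series."""
    n = len(series)
    return [series[(k * n) // keep] for k in range(keep)]


def fraction_below(values, target):
    """Fraction of observations strictly below the target level."""
    return sum(1 for v in values if v < target) / len(values)


# Hypothetical year of 20 dissolved oxygen readings (mg/L), lowest in
# mid-year; a placeholder curve, not monitoring data.
full_year = [8.0 - 4.0 * math.sin(math.pi * k / 19) for k in range(20)]
target = 5.0  # hypothetical target level, not an actual Bay criterion

print("20 per year:", fraction_below(full_year, target))
print("12 per year:", fraction_below(thin(full_year, 12), target))
```

Repeating the comparison over many sites and years would show how
much a summary statistic shifts when the sampling frequency is
reduced.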

The EDA was structured  into three parts:  (1)  basic explorations;
(2) additional issues;  and  (3) advanced  explorations following the
principles of "not stopping with the first result" and asking the
five basic questions about  the  data (who, what,  when,  where,  and
why). Cleveland's approach  (1993) of progressing from univariate
to bivariate to multivariate  data was also followed. The  EDA tool is
S-PLUS running  under the Microsoft Windows  3.1 operating system.
The basic explorations feature cumulative distribution functions of
rank order frequency of  dissolved oxygen concentrations, "notched"
boxplots,  and  the  simple  scatterplot  matrix.   The  advanced
explorations  feature  q-q plots,  seasonal  decomposition  of  time
series, advanced scatter plot matrices, coplots, and contour plots.
An effort was made to develop generic S-PLUS  functions so it was
easy to change the site data and plot labels.

     The results show how EDA can improve visualizations of complex
databases,  support   decisions   on  monitoring   frequency   and
development of environmental indicators,  and provide comparisons of
results to environmental goals. In addition, the presentation shows
how the S-PLUS results can  be "cut and pasted" through the Windows
Clip Board  into Folio Views infobases  with explanatory  text and
"data stories" for electronic distribution to a broad audience.

-------
Environmental Statistics  and Information ONLINE is a  program of
products designed to locate and add value to environmental data and
information, using an information science approach with
state-of-the-art tools for envisioning, visualizing, structuring,
and distributing.

You are invited to see the following:

     (1)   the latest version  of  the Guide to  Selected National
Environmental Statistics  in  the U.S. Government as  a  hypertext-
hyperlinked electronic book in Folio Views for both DOS and Windows
and how to access it on the Internet;

     (2)   the Interagency PC Global Change  Data  and Information
System and the Intergovernmental Master  Directory of Water Quality
and Ancillary Data infobases that provided access to key reports,
statistics, and metadata and links to databases on CD-ROMs;

     (3)    the  Source  book  and  Syllabus  of Visualizations  of
Environmental  Databases  with  S-PLUS for  EPA  Managers for  key
environmental problem areas  like  global change,  acid deposition,
superfund environmental equity, Chesapeake Bay water quality, and
EMAP resources monitoring; and

     (4)  custom CD-ROMs with agency  databases, documentation, and
analyses compiled and written for the Office of Water,  Chesapeake
Bay Program Office, Region III/MAHA Program, etc.

-------
Bernard Most; ManTech

     Defensible Data Management for Chronic Studies


Using two-year studies investigating the toxicity of byproducts of
water  purification,  we display  a  data   management  flow  which
emphasizes good data management techniques providing an audit trail
from "lab data" (on a PC) to analysis datasets (SAS datasets on a
mainframe).   In addition, these  techniques have  the  ability to
provide feedback to the lab workers  (data collectors) which may be
beneficial   in   maintaining   high   standards  of   adherence  to
experimental and data recording protocols.

-------
Statistical Software Demonstrations
     -Minitab; Maureen McCullen
     -S+; Tom Christie
     -SPSS; Bill Haffey

-------
                       WEDNESDAY   MARCH 9, 1994
 8:45-4:00        Demonstrations of S Plus with Environmental Data
                   Sets
                        Brand Niemann et al
Title: Visualizing Data with William S. Cleveland and S-PLUS

Part 1: William S. Cleveland, 8:45 - 10:15 am

William S.  Cleveland is a leading researcher  in  the analysis of
statistical data.  His interests have ranged from the theoretical to
the applied, starting with an A.B. in math at Princeton, then on to
a  Ph.D.  in  statistics at Yale,  and finally  a  position  in  the
Mathematics Research  Center at AT&T  Bell Laboratories.   Today he
concentrates   on   research   in   statistical   methods,   data
visualization, and visual perception. His writings,  which include
three books  and  numerous journal articles,  have  enjoyed a large
audience. Many of the visualization methods that he developed and
applied in his writings are widely used throughout the scientific,
engineering, and business communities.

Dr. Cleveland's new book,  Visualizing Data,  is about visualization
tools and a philosophy of data analysis that stresses a penetrating
look at the  structure of  data. There are graphical  tools such as
coplots, brushing,  and banking to 45 degrees. There are fitting
tools such  as loess  and  bisquare. The book conveys the role of
visualization   in  drawing   conclusions   from   data   and   its
relationships to classical statistical methods.

The book is organized around applications of the tools to  data sets
from scientific studies. This shows how each tool is used, and the
class  of  problems  it solves.  It  also  reveals  the   power  of
visualization; for many of the applications,  the tools of the book
reveal  missed effects  and errors  in judgment  in  the  original
analyses. And the applications convey the excitement of discovery
that visualization brings to data analysis.

Dr. Cleveland's new book, Visualizing Data  (1993)  and the revised
printing of his The Elements of Graphing  Data (1985) are available
from Hobart Press, Lisa McKittrick,  Publisher.  The data sets used
in Visualizing  Data are  also  available  by  electronic mail  from
Hobart Press.

-------
Title: Visualizing Data with William S. Cleveland and S-PLUS(cont.)

Part 2: S-PLUS Mini-Class, 10:30 - 11:30 am

S-PLUS  is  a state-of-the-art,  interactive  computing environment
which provides both a full-featured graphical data analysis system
and an object-oriented language. The  flexible S-PLUS  system can be
used  for exploratory  data analysis,  graphics,  statistics,  and
mathematical  computing.   S-PLUS can be  used  as an application
package or  as  a  development environment  for custom data analysis
and graphics applications. S-PLUS is the commercial version and a
superset of the  original S language from  AT&T Bell Laboratories
available from Stat-Sci,  Inc.  S and  S-PLUS  are  at the leading edge
of statistical research and new developments usually appear sooner
than  in other statistical software  packages.  There are  both an
electronic  mailing list  for  people  using  S where  you  can share
experiences with  other S and  S-PLUS  users  and an archive server,
StatLib,  for  user-contributed  S  functions  and  mailing  list
discussions on the Internet.

S-PLUS  is  a large system, with  over 1400  built-in functions and
dozens of additional functions stored in included  libraries. S-PLUS
runs under  the DOS, Windows 3.1  and UNIX operating systems. An S+
Interface  is available  and  S-PLUS  for  ARC/INFO and  S-Plus for
Remote  Sensed Data Analysis are  in advanced development. Stat-Sci
recommends the product Data Junction to convert your data files to
and  from dozens  of popular  databases,  spreadsheets,  and other
applications  for use with S-PLUS.  S-PLUS  requires  a  386  or 486
based machine with a math co-processor, MS  Windows  3.1,  DOS 3.0 or
higher, 8MB of RAM and 40MB of  hard disk space.

Stat-Sci  offers  three  formal  training  classes with  certified
teachers, namely:  Introduction, Advanced Topics, and Statistical
Models.  The mini-class  and  syllabus provided at  the  Statistics
Conference  are  in  no way a   replacement  for taking  the formal
training and reading the documentation. The three basic  lessons to
be covered in this session are:  (1) Overview of S-PLUS;  (2) Getting
Your Data Into S-PLUS;  and (3) Learning from Selected Applications.
Additional  selected  applications  will  be shown  at  the  Poster
Session.  Selected  datasets  and S-PLUS  functions  will  be  made
available on diskette  to attendees after the Conference.

-------
Title: Visualizing Data with William S.  Cleveland and S-PLUS(cont.)

Part 3: S-PLUS Applications, 1:30 pm to whenever

Each demonstrator will have 10-20 minutes depending on the number
of demonstrators.  Each demonstrator  should  provide (1)  a problem
background,  (2)  S-PLUS  script explanation,  and   (3) a  computer
and/or  transparency  demonstration  of  the  results.  S-PLUS  for
Windows  3.1  running  on several  486  PCs   and  computer  screen
projection equipment  will be  available for  installation of files
and demonstrations at the Poster Session and presentations at this
session. A compilation of the presentation materials and files will
be made available  to the participants  after the  conference  if
desired.

Confirmed S-PLUS Application Demonstrators:
     Rick Moll, Statistics Canada
     Dan Carr, George Mason University
     Student of Neerchal Nagaraj, University of Maryland
     Esperanza Renard, U.S. EPA/Edison, New Jersey
     Brand Niemann, U.S. EPA/DC  (if needed to fill time)

Potential S-PLUS Application Demonstrators:
     Peter Broomfield, NISS/RTP
     Larry Cox, U.S.  EPA/RTP
     Tony Olsen, U.S.  EPA/Corvallis
     Robert Seila,  U.S.  EPA/RTP

-------
S-PLUS Applications Abstracts

Dr. Rick Moll
Statistics Canada
National Accounts and Environment Division
R.H. Coats Bldg., 21st Floor
Ottawa, ONTARIO
K1A 0T6
613-951-3741

Data Visualization  and Calibration of a  Dynamic  Forest Resource
Account Using S-PLUS

In this presentation the implementation of a simulation framework
designed to reconstruct a  large  scale  area  based  forest over the
historical period 1953-1986 is described using S-PLUS. We use the
1953  forest inventory  for Ontario  of forest  areas  and  volume
characterized by 180 single year age classes, 3 covertypes and 24
districts. Two productive forestland  types are considered: stocked
and non-stocked  forestland.  Growth is represented  by the inter-
temporal flow  of forest area from younger  to  older age classes.
Harvesting,  mortality,  natural   regeneration  and  planting  are
represented  as separate  processes.  Endemic losses due  to  pest
infestations are absorbed  in aggregate volume-at-age curves which
represent  the  net  growth  process  of   the  forest  over  time.
Catastrophic losses due to  fire are represented as  forest  area
losses. Forestland  inventory is updated for fire by decreasing it
according to historical fire rates. The forest is cut according to
historical softwood and hardwood production volumes by district. In
each  year,  the  available  roundwood  volume  of  the  forest  for
softwood and hardwood  is calculated. Then  we  determine how much
forest area should be harvested by calculating a covertype and age-
specific harvest  ratio for volume removed.  Stocked  forestland is
updated for  natural and  artificial regeneration  of  both recently
harvested area and  non-stocked forestland. We calculate the total
growing stock volume changes over time due to harvesting, fire, and
natural causes based on average figures  of  volume  per area.  The
model is validated by comparing model generated  forestland with the
1986  inventory.   In the  development  of  this  framework  we  have
learned some important  principles for modeling large-scale systems.
First, the model structures need  to be generic.  That  is to say, the
complexity of  the model  is managed by  making the data operations
set-driven. This  way,  it  is not  necessary to write  down separate
equations  for,  say, each  age-class; rather,  a  set  of  age-classes
is created and a single equation is defined over the set.
This  generic structure  allows   evolutionary changes  to  be  made
easily. Second, a requirement for the calibration procedure is that
the  model's data objects, which are multi-dimensional  arrays,
should  be  able to  be  displayed  interactively. S-PLUS provided a
programming environment which satisfied these criteria.
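
The set-driven idea described above can be illustrated with a minimal
sketch: one update is written over the whole set of age classes
instead of one equation per class. This is not the authors' S-PLUS
model. The function name advance_one_year, the uniform fire rate
across age classes, and all numbers are assumptions made for
illustration, and harvesting and volume-at-age curves are omitted.

```python
def advance_one_year(stocked_area, fire_rate, regenerated_area):
    """Advance a stocked-area vector, indexed by age class, by one year.

    Returns the updated vector and the area lost to fire.
    """
    # Fire: remove the same fraction from every age class (an assumption).
    surviving = [area * (1.0 - fire_rate) for area in stocked_area]
    burned = sum(stocked_area) - sum(surviving)
    # Growth: area flows from each age class to the next older one.
    aged = [0.0] + surviving[:-1]
    # The oldest class is open-ended, so its area stays in place.
    aged[-1] += surviving[-1]
    # Regeneration: newly stocked area enters the youngest class.
    aged[0] += regenerated_area
    return aged, burned


# Hypothetical areas for three age classes; not inventory data.
area_by_age = [10.0, 20.0, 30.0]
area_by_age, burned = advance_one_year(area_by_age, 0.1, 5.0)
print(area_by_age, burned)
```

Because the update is defined over the whole vector, changing the
number of age classes requires no change to the code, which is the
kind of evolutionary flexibility the abstract describes.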

-------

Dr. Dan B. Carr
Center for Computational Statistics
George Mason University
Fairfax, Virginia

                 From Tables  To Row-Labeled Plots

This  application   session   describes   two   user-written  S-PLUS
functions for producing  row-labeled  plots: dot plots,  horizontal
bar plots, and horizontal distributional summary plots such as the
boxplot.  Row-plots provide graphical alternatives to much of the
information  that EPA publishes  in tabular  form.    This session
provides design guidance to facilitate the conversion of moderate
sized tables into  elegant  plots.    Topics  covered  include  grid
options such as  white lines  on a grey  background,  symbol options
for  factor  levels  or distributional summaries, multiple  factor
layout choices, factor level sorting for enhancing one-, two- and
three-way plots, and  plot  labeling to  provide context.   Examples
emphasize TRI and EMAP summaries.


      Production of Choropleth Maps and Hexagon Mosaic Maps

This application session reviews  S-PLUS  command files that produce
choropleth maps  and hexagon mosaic  maps.   The  review  describes
geographic data  structures,  data  set structure,  smoothing using
lowess, display of residuals and construction  of legends.  Examples
are similar to those published in two 1993 Statistical Computing &
Graphics newsletter articles and emphasize cancer mortality rates
and trends.  However  the methods can be readily adapted to other
contexts.  Selected command  files  and  postscript examples can be
obtained  in  advance via anonymous ftp  to galaxy.gmu.edu and are
stored under /pub/submissions/eda/maps.

-------

Gina Papush
Sanjoy V
Arun Satyanarayana
Mathematics and Statistics Department
University of Maryland Baltimore County

Spatial Statistical Methods in S-PLUS

Spatial statistical methods are used in environmental data analysis
to take into account the spatial nature of data.  In addition to the
usual tools of exploratory data analysis (EDA) such as scatterplots
and  stem-and-leaf  diagrams,  some special  EDA  tools are  used in
spatial data analysis. S-PLUS functions providing spatial summaries
such  as  pocket plots  and  spatial  trend  removal  such  as median
polish will  be presented.  These  programs  are used  to analyze the
Chesapeake Bay Benthic Index data.
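
Median polish, mentioned above as a tool for spatial trend removal,
can be sketched as follows. This is a minimal Python illustration of
Tukey's median polish on a rectangular grid, not the presenters'
S-PLUS functions; the function name median_polish, the fixed number of
iterations, and the example grid are assumptions. The table is
decomposed into an overall level, row effects, column effects, and
residuals, and the residuals are what remain after the row and column
trends are removed.

```python
import statistics


def median_polish(table, n_iter=10):
    """Tukey's median polish of a rectangular table (a list of rows).

    Returns (overall, row_effects, col_effects, residuals) such that
    table[i][j] = overall + row_effects[i] + col_effects[j] + residuals[i][j].
    """
    resid = [list(row) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep each row median out of the residuals into the row effect.
        for i in range(n_rows):
            m = statistics.median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        # Move the median of the column effects into the overall level.
        m = statistics.median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep each column median out of the residuals.
        for j in range(n_cols):
            m = statistics.median([resid[i][j] for i in range(n_rows)])
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        # Move the median of the row effects into the overall level.
        m = statistics.median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff, resid


# Hypothetical gridded values with one unusual cell; not Benthic Index data.
grid = [
    [7.0, 9.0, 11.0],
    [8.0, 10.0, 12.0],
    [9.0, 11.0, 20.0],
]
overall, row_eff, col_eff, resid = median_polish(grid)
print(overall, row_eff, col_eff)
print(resid)
```

Because medians are used instead of means, a single unusual cell has
little influence on the fitted row and column trends and shows up
mainly in its own residual.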

-------

Esperanza Renard
Superfund Technology Demonstration Division
Risk Reduction Engineering Laboratory
U.S. EPA
Edison, New Jersey 08837-3679

           Use of  S-PLUS for  Evaluation of  Test Methods
          for  Measuring Oil Spill  Dispersant Performance

Data  were obtained  from the  evaluation  of  several methods  to
measure dispersant performance for use in an oil spill emergency.
The results and conclusions  derived  from the  evaluation  of these
data will be presented  at the International  Oil Symposium sponsored
by  the ASTM  in  October  1994.  The coauthors of  this paper  use
different  statistical  programs and  approaches  for  treating  the
data.  The purpose of  this  initial  application  of S-PLUS  is  to
become  familiar with the software package and to  reevaluate  the
results and conclusions presented in the paper.

-------
S-PLUS Applications Abstracts
IF NEEDED TO FILL TIME

Brand Niemann
Environmental Statistics & Information Division
U.S. EPA, 2163
Washington, D.C. 20460

             Dissolved Oxygen in the Chesapeake Bay:
Exploratory Data Analysis to Support Monitoring Frequency Decisions

Dissolved oxygen is critical to  living resources in the Chesapeake
Bay  surface  waters.  Too  much  of  the nutrients  nitrogen  and
phosphorus added to the Bay subtracts oxygen and,  at times, life
itself  from  the waters. The Chesapeake Bay Monitoring  Program,
begun  in  1984,  is  a  bay-wide  EPA/state  cooperative  effort
comprising over 165  stations.  Nineteen physical,  chemical,  and
biological characteristics are routinely monitored 20 times a year
in the mainstem Bay and many tributaries.  This "point" monitoring
data  at  the  49  mainstem  sites  (see  map) along  with  special
continuous monitoring data at fewer sites and limited time periods
for  dissolved  oxygen are  the   focus  of  this  exploratory  data
analysis (EDA).  In addition, the mainstem point monitoring data,
which have been interpolated horizontally and vertically within the
Bay by other researchers, were also used.  In this way the effects of
sampling frequency (20 versus 12 per year) and spatial
interpolation on dissolved oxygen statistics could be determined,
especially in relation to the new suggested water quality "targets"
for dissolved oxygen  levels in the Bay.

The EDA was structured  into three parts:  (1)  basic explorations;
(2) additional issues; and (3) advanced  explorations following the
principles of "not stopping with the first result" and asking the
five basic questions  about the  data (who,  what,  when,  where, and
why). Cleveland's approach (1993) of progressing from univariate
to bivariate to multivariate data was also followed. The EDA tool is
S-PLUS running  under  the  Microsoft Windows 3.1 operating system.
The basic explorations feature cumulative distribution functions of
rank order frequency  of  dissolved oxygen  concentrations, notched
boxplots,  and   the   simple  scatterplot   matrix.   The  advanced
explorations  feature  q-q  plots,  seasonal   decomposition  of time
series, advanced scatter plot matrices, coplots, and contour plots.
An effort was made to develop generic S-PLUS  functions so it was
easy to change  the site data and plot labels and thereby make the
EDA more interactive.
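
The comparison of sampling frequencies through cumulative distribution
functions might be sketched as below.  This is a hedged Python
illustration, not the generic S-PLUS functions described above: the
function names, the dissolved oxygen readings, and the 5.0 mg/L target
are all hypothetical, and the thinning rule (evenly spaced cruises) is
only one of several possible ways to mimic a 12-per-year schedule.

```python
def ecdf(values):
    """Return the sorted values and their cumulative fractions i/n."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def fraction_below(values, target):
    """Fraction of observations strictly below a target level."""
    return sum(1 for v in values if v < target) / len(values)


def thin(series, keep):
    """Keep `keep` roughly evenly spaced observations from a time-ordered series."""
    n = len(series)
    if not 0 < keep <= n:
        raise ValueError("keep must be between 1 and len(series)")
    return [series[i * n // keep] for i in range(keep)]


# Hypothetical dissolved oxygen readings (mg/L) from 20 cruises in one year.
do_20 = [9.8, 9.5, 9.1, 8.4, 7.6, 6.9, 6.1, 5.2, 4.4, 3.9,
         3.5, 3.8, 4.6, 5.3, 6.0, 6.8, 7.7, 8.5, 9.0, 9.6]
do_12 = thin(do_20, 12)   # mimic a 12-per-year schedule

TARGET = 5.0  # hypothetical target level, mg/L
print(fraction_below(do_20, TARGET), fraction_below(do_12, TARGET))
```

Passing the site data and the target as arguments, rather than fixing
them inside the functions, is what lets the same code be rerun for
another station or another target.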

The  results  show how EDA  can  improve  visualizations  of complex
databases,   support   decisions   on  monitoring   frequency  and
development of environmental indicators, and provide  comparisons of
results to environmental goals.  In addition, the presentation shows
how the S-PLUS results can be "cut and pasted" through the Windows
Clipboard into Folio Views infobases with explanatory text and
"data stories" for electronic distribution to a broad audience.

-------
Potential S-PLUS Application Demonstrators:

Peter Bloomfield
National Institute of Statistical Sciences
Research Triangle Park, NC 27709-4162

Accounting  for  Meteorological Effects  in Measuring Urban  Ozone
Levels and Trends

Surface ozone levels  are determined by the strengths of sources and
precursor emissions,  and by the meteorological conditions. Observed
ozone concentrations  are valuable indicators of possible health and
environmental  impacts.  However,  they  are  also  used to  monitor
changes and trends in the sources of ozone and of its precursors,
and for this purpose the influence of meteorological variables is
a  confounding factor. This report  describes  a  study  of  ozone
concentrations and meteorology in the Chicago area.  The  data are
described using a variety  of exploratory methods, including median
polish and  principal components  analysis.  The  key  relationships
observed  in  these analyses are  then used  to  construct  a  model
relating ozone to meteorology. The model  can be used to estimate
that part of the  trend in ozone levels that cannot be accounted for
by  trends   in   meteorology,   and  to   "adjust"  observed  ozone
concentrations for anomalous weather conditions.  The model is
estimated by nonlinear least squares.  Its goodness of fit is
assessed by comparison with nonparametric regression results
(lowess).

-------
                       WEDNESDAY   MARCH 9,  1994
 8:45-10:00        Statistical Issues in Rulemaking
                        Henry Kahn
Helen Jacobs, Henry Kahn and Kathleen Stralka; OW

     Estimates of Fish Consumption Rates in the U.S.

Estimates of  fish  consumption  in the U.S. based on  the  1989 and
1990 USDA Continuing Surveys of Food Intakes by Individuals (CSFII) will be
presented.   Fish consumption estimates play an important role in
a number of EPA problems.  In particular, exposure estimates used
in determining  water quality criteria and  related  standards are
based in  part on the  amount  of fish consumed  and  contamination
levels  in  the  fish.    This  presentation  will  provide  fish
consumption estimates by habitat (marine,  estuarine and freshwater)
and  species.   Changes  in fish consumption  during the  past two
decades will also be discussed.

Henry Kahn; OW

     Statistical Basis of Industrial Wastewater Control Regulations

     Regulations that  limit the  amount  of  pollutants that may be
discharged  by   industrial  facilities   are   known  as  effluent
guidelines  regulations.   These regulations  are based  on  the
capability  of treatment  technology in  specific  industries  and
contain  numerical   limitations   on the  levels  of  particular
pollutants  that may be  discharged in  treated  effluent.    The
limitations are determined on  the basis of statistical analysis of
chemical  analytical data  that  characterize  the  performance  of
treatment technology.  This presentation provides a general
description of the statistical analysis of the data, which includes
model fitting, parameter estimation, and adjustments to account
for different sampling periods and the occurrence of non-detect
measurements.  Some examples from recent rulemakings are also
discussed.
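
One way a limitation could be computed from performance data containing
non-detects is sketched below.  The abstract does not say which model is
used, so everything here is an assumption made for illustration: a
delta-lognormal-style mixture (a point mass at the detection limit plus
a lognormal distribution for detected values), an upper percentile as
the limitation, and hypothetical function names and data.

```python
from math import exp, log
from statistics import NormalDist, mean, stdev


def fit_mixture(detected, n_nondetect):
    """Estimate (mu, sigma, delta) for the assumed mixture.

    mu and sigma are the mean and standard deviation of the logged
    detected values; delta is the observed fraction of non-detects.
    Requires at least two detected values.
    """
    logs = [log(x) for x in detected]
    delta = n_nondetect / (len(detected) + n_nondetect)
    return mean(logs), stdev(logs), delta


def mixture_percentile(mu, sigma, delta, detection_limit, p):
    """p-th quantile of: delta * (point mass at D) + (1 - delta) * lognormal."""
    z = NormalDist()
    # Probability carried by the lognormal part below the detection limit.
    cdf_below = (1 - delta) * z.cdf((log(detection_limit) - mu) / sigma)
    if p <= cdf_below:
        # The quantile falls below the detection limit.
        return exp(mu + sigma * z.inv_cdf(p / (1 - delta)))
    if p <= cdf_below + delta:
        # The quantile sits on the point mass at the detection limit.
        return detection_limit
    return exp(mu + sigma * z.inv_cdf((p - delta) / (1 - delta)))


# Hypothetical daily effluent concentrations (mg/L) with 3 non-detects.
detected = [0.8, 1.1, 1.4, 0.9, 2.3, 1.7, 1.2, 3.1, 1.0]
DETECTION_LIMIT = 0.5   # hypothetical
mu, sigma, delta = fit_mixture(detected, n_nondetect=3)
print(mixture_percentile(mu, sigma, delta, DETECTION_LIMIT, 0.99))
```

Adjustments for different sampling periods, such as limits on monthly
averages rather than daily values, are not covered by this sketch.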

-------
                       WEDNESDAY   MARCH 9,  1994

8:45 - 10:00 am

John Warren and Alfred F. Haeberer—ORD

     Statistics  and  the Agency's  Mandatory  Quality  Assurance
Program

The  EPA's  Quality Assurance  program has evolved  over the  last
twenty years from a relatively technical laboratory quality control
program  to  the present  program that  focuses  on the  management
processes  needed to produce  the  appropriate data  to  support
specific applications.  The three principal components of this
mandatory program are Planning, Implementation, and Assessment; it
is axiomatic that statistical design and inference have a great
impact on each element.

The extent of this impact will be discussed through an analysis of
the  Requirements  and Guidance  documents  recently  issued by the
Quality Assurance Management  staff, ORD, and by reference to the
ANSI/ASQC  American   National   Standard,  E4,   "Quality  Systems
Requirements for Environmental Data and Technology  Programs".

The  major  themes of  the presentation will  center  on  where the
statistician can have maximum impact on  improving the  quality of
the Agency's data, and what will be demanded of statisticians by
environmental decision makers.

Copies of the principal Requirements and Guidance documents will be
made available to participants.

-------
                       WEDNESDAY   MARCH 9,  1994

 10:15-11:30       Statistical Analyses of Ozone Layer and
                   Composite Sampling
                        Bimal Sinha
Dulal K. Bhaumik (University of South Alabama)

     The Ozone Layer

Some 50 km above the Earth's surface lies a veil called the Ozone
Layer.  It shields the earth from the ultraviolet radiation emitted
by the Sun.  Some stable chlorine gases released by human
activities rise above the earth and eat up the ozone layer.
Depletion of the ozone layer is a great threat to human
society.  In this talk we will discuss the lethal effects of ozone
depletion and try to find out through a statistical analysis how
severe the depletion will be if the world continues producing
trace gases as it does today.

Soma Sengupta (University of Maryland/Baltimore County)

     Inference with  Composite Sampling

A composite sample is a physical mixture of several grab samples.
The problem of inference regarding the mean of a population based
on composite sampling measurements is  considered.  A necessary and
sufficient condition is derived under which the  estimate of the
mean based on the composite measurements is better than that based
on grab measurements.   An approximate  distribution of  composite
sample measurements based on large samples  is derived.  An example
is used to illustrate this inference procedure.
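
The trade-off between composite and grab measurements can be illustrated
under a deliberately simple model.  This sketch does not reproduce the
condition derived in the talk; it assumes independent grab samples with
common variance sigma2, perfectly mixed equal-volume composites, and an
independent additive measurement error with variance sigma2_e on every
laboratory analysis.  The function names are hypothetical.

```python
def var_mean_grab(sigma2, sigma2_e, m):
    """Variance of the mean of m individually analyzed grab samples."""
    return (sigma2 + sigma2_e) / m


def var_mean_composite(sigma2, sigma2_e, n, k):
    """Variance of the mean of n composites, each mixing k grab samples.

    Mixing averages out the sample-to-sample variance (sigma2 / k), but
    each composite still carries one full measurement error.
    """
    return (sigma2 / k + sigma2_e) / n


sigma2, sigma2_e = 4.0, 1.0   # hypothetical variance components

# Same laboratory effort: 5 analyses either way.
print(var_mean_composite(sigma2, sigma2_e, n=5, k=4),
      var_mean_grab(sigma2, sigma2_e, m=5))

# Same field effort: 20 grabs, analyzed as 5 composites or individually.
print(var_mean_composite(sigma2, sigma2_e, n=5, k=4),
      var_mean_grab(sigma2, sigma2_e, m=20))
```

Under these assumptions compositing gives the smaller variance when the
number of laboratory analyses is held fixed, while analyzing every grab
individually gives the smaller variance when the number of grabs is held
fixed, so which estimate is "better" depends on what is being held
constant.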

-------
                       WEDNESDAY   MARCH 9,  1994
10:15 - 11:30 am
                   Small Community Information and Data Program
                       Mel Kollander and Susan Brunenmeister-OA

The Small Community Information and Data Program was established in
April 1992 in the Administrator's Office to provide an Agency focal
point for small community information.  Since its foundation the
program has had an online 2 gigabyte mainframe data base primarily
consisting of information from the 1992 Census of Governments and
the  1990  Census  of  Population  and  Housing.    The session  will
include two presentations about the Small Community Information and
Data Program.  The first presentation will describe the mission,
objectives and activities of the  program.  The second presentation
will provide examples of available information from the mainframe
databank.

-------
                    THURSDAY   MARCH 10,  1994

 8:45-10:00 am     Plenary session

Chapman Gleason—OPPE

     An Introduction to Internet For Statisticians

It is the goal of the Clinton Administration to move the US out of
the Industrial Revolution and fully into the Information Age.  What
the railroads and the Interstate Highway System did for the
Industrial Revolution, the National Information Initiative will do
for the Information Age.  In this presentation I will define the
Internet, give an overview of the National Information Initiative,
describe the Government Information Locator System (GILS), and
describe how the Bureau of Environmental Statistics will
disseminate information on the Internet.  In addition, the
following Internet tools will  be defined  and  an example of their
use will be shown:

     1)  Bitnet and Listserv(ers)  and Internet newsgroups,
     2)  FTP and Anonymous FTP,
     3)  Archie,
     4)  Veronica,
     5)  WAIS  (public and commercial),
     6)  Gopher,
     7)  World Wide Web (WWW),
     8)  Xmosaic.
 10:15-11:30       Plenary session  Featured Speaker
                       Tom Van Zant

-------
                                EVALUATION FORM

                   TENTH ANNUAL EPA CONFERENCE ON STATISTICS
1.   Overall Conference Evaluation
                                   Very Much    Some Extent    Limited Extent
Did you broaden your EPA
contacts?

Did you update your current
knowledge?

Did you find exposure to
new material?

Did you gain more agency-
wide perspective?

Were you able to exchange
technical methods?

Were you able to discuss
problems and concerns?

2.  Session Evaluations
                                    Highly      Fairly    Not Very
                                    Relevant    Relevant  Relevant
Plenary Session Introduction
to Conference & Panel Discussion

Plenary Session
State of Pennsylvania

Environmental Statistics Research
at the National Institute for
Statistical Sciences

General Topics

Statistical Research Methodology

-------
                                    Highly      Fairly    Not Very
                                    Relevant    Relevant  Relevant
How Widespread is Our Influence
as statisticians?

Using Uncertainty Analysis in
Environmental Decision Making

Environmental Statistics
Issues in Pesticides

Poster Session

Demonstrations of S-PLUS
with Environmental Data Sets

Statistical Issues in Rulemaking

Statistics and the Agency's
Mandatory Quality Assurance Program

Statistical Analyses of Ozone
Layer and Composite Sampling

Small Community Information
and Data Program

Plenary session  An Introduction
to Internet For Statisticians

Plenary session
Featured Speaker  Tom Van Zant
3.  What were the greatest strengths of the conference?  What aspects did you like
    the most?

-------
4.  What were the greatest weaknesses of the conference?  What aspects  and
    sessions did you like the least?
5.  Would you be interested in another mini-course that would introduce you to a
    new development in applied statistical methodology?

  Yes                       No                      Unsure


Suggestions for topics:
6.  Are you planning to attend next year's Conference on Statistics?
7.  Other comments

-------