EPA
           United States Environmental Protection Agency
           Washington, DC 20460
           230-030-47
           Statistical Policy Branch

           ASA/EPA Conferences on
           Interpretation of
           Environmental Data

           IV. Compliance Sampling
           October 5-6, 1987

-------
                                     PREFACE
    This volume is a compendium of the papers and commentaries that were presented at
the fourth in a series of conferences on interpretation of environmental data conducted by
the American Statistical Association and the U.S. Environmental Protection Agency's
Statistical Policy Branch of the Office of Standards and Regulations/Office of Policy,
Planning, and Evaluation.  The ASA Committee on Statistics and the Environment
developed this series and has general responsibility for it.

    The purpose of these conferences is to provide a forum in which professionals from
the academic, private, and public sectors exchange ideas on statistical problems that
confront EPA in its charge to protect the public and the environment through regulation of
toxic exposures. They provide a unique opportunity for Agency statisticians and scientists
to interact with their counterparts in the private sector.

    The eight papers and accompanying discussions in this volume of proceedings are
about "compliance sampling" to determine how well environmental standards are  met.
These papers provide valuable  guidance in the planning of future environmental studies.
The papers address many aspects of compliance, and are intended for statisticians involved
in planning how to ascertain general levels of compliance and identify noncompliers for
special attention. Such work is inherently statistical and must be based on anticipation of
the statistical analysis to be performed so that the necessary data can be collected.  These
proceedings should help the statistician anticipate the analyses to be  performed.  In
addition, the papers discuss implications for new studies.  No general prescriptions are
offered; none may be possible.

    The emphases in these papers are quite different.  No two authors have chosen the
same aspect of compliance to examine.  This diversity suggests that a major challenge is
to consider carefully each study aspect in the planning process.  Meeting this challenge
will require a high degree of professionalism from the statistical community.

    The conference itself and these proceedings are primarily the result of the efforts of
the authors and discussants.  The discussants not only described how their views differ from
those of the authors, but provided independent ideas as well.  The coordination of the
conference and of the publication of the proceedings was  carried out  by Mary Esther
Barnes and Lee L. Decker of the ASA staff.

    The views presented in this conference are those of individual writers and should not
be construed as reflecting  the official position  of any agency or  organization.

    This fourth conference, "Compliance Sampling," was  held in October 1987.  Others
were the first conference, "Current Assessment of Combined Toxicant Effects," in May
1986, the second, "Statistical Issues in Combining Environmental Studies," in October
1986, and the third, "Sampling and Site Selection in Environmental Studies," in May 1987.
                              John C. Bailar III, Editor
               Chair, ASA Committee on Statistics and the Environment
           Department of Epidemiology and Biostatistics, McGill University
                                        and
                  Office of Disease Prevention and Health Promotion
                    U.S. Department of Health and Human Services

-------
                                  INTRODUCTION


     The  general  theme  of  the  papers and  associated  discussions is the  design  and
interpretation of environmental regulations that incorporate, from the outset, statistically
valid compliance  verification procedures.  Statistical aspects of associated  compliance
monitoring programs are considered.  Collectively the papers deal with a wide variety of
environmental concerns including various novel approaches to air emissions regulations and
monitoring, spatial sampling of soil, incorporation of potential health effects
considerations into the design of monitoring programs, and considerations in the statistical
evaluation of analytical laboratory performance.

     Several papers  consider  aspects  of determining appropriate sampling frequencies.
Allan Marcus discusses how response time frames of potential biological and health effects
due to exposures may be used to decide upon appropriate monitoring interval time frames.
He demonstrates how biokinetic modeling may be used in this regard.

     Neil  Frank  and Tom Curran discuss factors influencing required  sampling frequencies
to detect particulate levels in air.  They emphasize the need to specify compliance
monitoring  requirements  right  at the time  that the  air  quality standard  is  being
formulated.  They suggest an adaptive monitoring approach  based  on  site  specific
requirements. Those sites that are clearly well above or well below  the standard need be
sampled relatively infrequently. Those sites that straddle the standard should  be sampled
more   frequently    to   decrease   the   probabilities   of   misclassification   of
attainment/nonattainment status.

     Tom  Hammerstrom and  Ron Wyzga discuss strategies to  accommodate situations
when Allan Marcus' recommendations for determining sampling frequency  have  not been
followed,  namely when monitoring data averaging time intervals are very long relative to
exposure  periods that  may result in adverse physiological and health consequences.  For
example,  air monitoring data  may be averaged over one hour intervals but  respiratory
symptoms may be related  to  the highest five minutes of exposure during that hour. The
authors model the relationships between peak five minute average  concentration during an
hour and  the overall  one hour average concentration  under  various stochastic process
assumptions.  They  combine   monitoring  and  modeling  to  predict  short   term  peak
concentrations on the basis of observed longer term average concentrations.

     Bill  Nelson discusses statistical aspects  of  personal monitoring  and  monitoring
"microenvironments" such  as  homes and workplaces to assess total personal exposure.
Such data are very  useful for the exposure assessment portions of risk assessment.  Dr.
Nelson  compares  and contrasts personal monitoring  with the  more  traditional  area
monitoring.  The availability  of good personal exposure data would permit much greater
use of human epidemiologic data in place of animal toxicologic data in risk assessment.

     Richard Gilbert, M. Miller,  and H.  Meyer discuss statistical aspects of sampling
"frequency" determination in  the spatial sense. They consider the development  of  a soil
sampling program to estimate  levels of radioactive solid contamination.  They discuss  the
use  of  multilevel  acceptance  sampling plans to determine  the compliance status  of
individual  soil plots.   These  plans  have sufficient sensitivity to  distinguish  between
compliant/noncompliant plots  yet result in substantial sample  size economies  relative to
more naive single stage plans.

-------
     John W. Holley and Barry D. Nussbaum discuss statistical aspects of the "bubble" approach to
regulation.  The "bubble" concept specifies that average environmental standards must be
maintained  across  a  dimension such as area, time, auto fleet, or industry group.  This
dimension constitutes the "bubble." Lack of compliance in one part of the bubble may be
offset by greater than minimum compliance  in other parts. Emissions producers  have the
option to trade, sell or  purchase  emissions  "credits"  with,  from,  or to other emissions
producers in the bubble.  Alternatively, they may "bank" emissions "credits" for use in  a
future time period.   Such  an approach to regulation greatly  enhances the emissions
producers' flexibility, as a group, to configure their resources so as to most economically
comply with the overall standard.

    Soren Bisgaard and William Hunter discuss statistical aspects of the formulation of
environmental   regulations.   They emphasize  that  the regulations,  including  their
associated  compliance  monitoring  requirements, should be designed  to have satisfactory
statistical  characteristics.   One approach to this  is   to design  regulations  that have
operating characteristic curves of  desired shape.  Alternative candidate formulations can
be compared in terms of the  shapes of their associated operating characteristic  curves.
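
     As a purely illustrative sketch of this idea (it is not taken from the Bisgaard and
Hunter paper, and the sample sizes, limit, and measurement variability below are hypothetical),
the operating characteristic of a simple compliance rule can be computed directly:

    import numpy as np
    from scipy.stats import norm

    def oc_curve(true_mean, limit, n, sigma):
        # Probability of being declared in compliance under the rule
        # "violation if the mean of n measurements exceeds the limit",
        # assuming independent, normally distributed measurement error.
        return norm.cdf((limit - true_mean) / (sigma / np.sqrt(n)))

    true_means = np.linspace(0.5, 2.5, 9)       # hypothetical true emission levels
    for n in (4, 15, 30):                       # hypothetical sampling intensities
        print(f"n={n:2d}", np.round(oc_curve(true_means, 1.5, n, 0.6), 3))

Plotting such curves for alternative sample sizes or limits is one way to compare candidate
formulations of a regulation.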

    Bert Price discusses yet another  statistical  aspect  of  environmental  regulation;
evaluating   the  capabilities  of analytical   laboratories.  He  contrasts  and  compares
strategies to evaluate individual laboratories based  only on their own bias and variability
characteristics  (intralaboratory testing) with strategies that evaluate laboratories  as  a
group (interlaboratory testing).  Price's paper has commonality with that of Bisgaard  and
Hunter in that he argues that first the operating characteristic of a regulation needs to be
specified.  This  specification is  then  used to determine  the  types  and  numbers of
observations required in the associated compliance tests.

    The  eight papers in this volume  of proceedings  deal  with diverse  aspects  of  the
statistical  design  and  interpretation  of   environmental   regulations  and  associated
compliance  monitoring programs.  A unifying theme among  them  is  that the statistical
objectives and characteristics of the regulations should  be specified right at  the planning
stage  and   should  be  drivers  of  the  specific regulation   designs  rather  than  being
(in)consequential afterthoughts.


                                    Paul I.  Feder
                 Chair, ASA/EPA Conference on Compliance Sampling
                             Battelle Memorial Institute
                                           IV

-------
                            TABLE OF CONTENTS



Preface. JOHN C. BAILAR III, McGill University                                ii

Introduction. PAUL I. FEDER, Battelle Memorial Institute                        iii

Index of Authors    .                                                       vi

       I. TOXICOKINETIC AND PERSONAL EXPOSURE CONSIDERATIONS IN
           THE DESIGN AND EVALUATION OF MONITORING PROGRAMS

Time Scales: Biological, Environmental, Regulatory. ALLAN H. MARCUS,
Battelle Columbus Division                                                   1

Discussion.  RICHARD C. HERTZBERG, U.S. Environmental Protection
Agency, ECAO-Cincinnati                                                   16

Statistical Issues in Human Exposure Monitoring. WILLIAM C. NELSON,
U.S. Environmental Protection Agency, EMSL-Research Triangle Park              17

Discussion.  WILLIAM F. HUNT, JR., U. S. Environmental Protection
Agency, OAQPS-Research Triangle Park                                       39

  II. STATISTICAL DECISION AND QUALITY  CONTROL CONCEPTS IN DESIGNING
   ENVIRONMENTAL STANDARDS AND COMPLIANCE MONITORING PROGRAMS

Designing Environmental Regulations.  SOREN BISGAARD, WILLIAM G. HUNTER,
University of Wisconsin-Madison                                             41

Discussion.  W. BARNES JOHNSON, U.S. Environmental Protection Agency,
OPPE-Washington, D.C.                                                     51

Quality Control Issues in Testing Compliance with a Regulatory Standard:
Controlling Statistical Decision Error Rates.  BERTRAM PRICE, Price
Associates, Inc.                                                            54

Discussion.  GEORGE T. FLATMAN, U.S. Environmental Protection Agency,
EMSL-Las Vegas                                                           75

                III. COMPLIANCE WITH RADIATION STANDARDS

On the Design of a Sampling Plan to Verify Compliance with EPA Standards
for Radium-226 in Soil at Uranium Mill Tailings Remedial-Action Sites.
RICHARD O. GILBERT, Battelle Pacific Northwest Laboratory, MARK L.
MILLER, Roy F. Weston, Inc.; H. R. MEYER, Chem-Nuclear Systems, Inc.           77

Discussion.  JEAN CHESSON, Price Associates,  Inc.                              111

            IV.  THE BUBBLE CONCEPT APPROACH TO COMPLIANCE

Distributed Compliance: EPA and the Lead Bubble. JOHN W. HOLLEY, BARRY
D. NUSSBAUM, U.S. Environmental Protection Agency, OMS-Washington, D.C.       112

Discussion.  N. PHILIP ROSS, U.S. Environmental Protection Agency,
OPPE-Washington, D.C.                                                     121

-------
                V.  COMPLIANCE WITH AIR QUALITY STANDARDS

Variable Sampling Schedules to Determine PM10 Status.  NEIL H. FRANK,
THOMAS C. CURRAN, U. S. Environmental Protection Agency, OAQPS-
Research Triangle Park                                                      122

Discussion.  JOHN WARREN, U. S. Environmental Protection Agency, OPPE-
Washington, D.C.                                                           128

Analysis of the Relationship Between Maximum and Average in SO2 Time
Series. THOMAS S. HAMMERSTROM, Roth Associates, RONALD E. WYZGA,
Electric Power Research Institute                                             129

Discussion.  R. CLIFTON BAILEY, Health Care Financing Administration           154

Summary of Conference. JOHN C. BAILAR III, McGill University and
U.S. Public Health Service                                                    155

Appendix A: Program                                                       160

Appendix B: Conference Participants                                          162
                             INDEX OF AUTHORS
Bailar, John C	  ii,155
Bailey, R. Clifton 	  154
Bisgaard, Soren 	  41
Chesson, Jean 	  111
Curran, Thomas C	  122
Feder, Paul I	  iii
Flatman, George T	  75
Frank, Neil H	  122
Gilbert, Richard O	  77
Hammerstrom, Thomas S	  129
Hertzberg, Richard C	  16
Holley, John W	  112
Hunt, Jr., William F	  39
Hunter, William G	  41
Johnson, W. Barnes 	  51
Marcus, Allan H	   1
Meyer, H. R	  77
Miller, Mark L	   77
Nelson, William C	  17
Nussbaum, B. D	  112
Price, Bertram 	  54
Ross, N. Philip 	  121
Warren, John 	  128
Wyzga, Ronald E	  129
                                       VI

-------
           TIME SCALES: BIOLOGICAL, ENVIRONMENTAL, REGULATORY

                              Allan H. Marcus
                        Battelle Columbus Division
                              P.O. Box 13759
                     Research Triangle Park, NC  27709

1. INTRODUCTION

     E.P.A. has  estao i isnec. primar, air  duality standards  t:>  c-rptec:  *T =
general puolic aaainst the  adverse nealth effects  of   air col ".ut3"s.  src
secondary  standards  to  protect  against  other   aa/er=e   =p./ •; - j .-,- = .-, t a 1
impacts.   Compliance wi^n   these standaras   i= usuall1-'  are=cr:ce~ b- a>-
explicit sampling  protocol for  the pollutant,  with  SDSCITISO :'-'--[=:• ra 1
insui". variation in concentration   to which   the  ppoulaticn  1=  e-ocsec,
cost and  precisipn of  the sample  data.   Biological  and  heait"  effects
issues  are primary and should be kept always  in mind. Iracec'..'= '.=  sa-icli^c
schedules  for  compliance  testing  mignt allow fluctuarina  e'Pcsures cf
toxicologies! significance  to escape detection. Resources for trstinq
compliance are usual I/ going to be scarce, and  focusing :n  t.eal T  =f-"ec"s
ma/ allow  the analyst  and designer of  environmental '"egi. I a f: pr.s  ~z f:>-~
some patn between oversampling and und'ersamp 1 ing environment;! data.

     In this review I will emphasize air quality standards for lead.
Lead is a soft dense metal whose toxic effects have long been known.  In
modern times atmospheric lead has become a community problem because of
the large quantities of lead used as gasoline additives.  While the
problem was substantially reduced as a result of E.P.A.'s leaded gasoline
phasedown regulations, there are still significant quantities emitted by
smelters, battery plants, etc., and substantial residues of previous lead
emissions in surface soil and dust.  Other regulatory authorities control
lead concentrations in drinking water, in consumer products, and in the
-------
of data has been collected by the State and Local Air Monitoring Stations
(SLAMS) network.  These provide information about areas where the lead
concentration and population density are highest and monitoring for
testing compliance with standards is most critical.  In order for a SLAMS
station to be part of the National Air Monitoring Station (NAMS) network,
very specific criteria must be satisfied about sampler location in terms
of height above ground level, distance from the nearest major roadway,
and spatial scale of which the station is supposed to be representative.
The siting study must also have a sufficiently long sampling period to
exhibit typical wind speeds and directions, or a sufficiently large
number of short periods to provide an average value consistent with a
24-hour exposure (CD, 1986).

     The current averaging time for the lead primary National Ambient Air
Quality Standard (NAAQS) is a calendar quarter (3 months), and the air
lead NAAQS is a quarterly average of 1.5 ug/m3 that shall not be
exceeded.  The lead standard proposed in 1977 was based on an averaging
time of one calendar month.  The longer period has the advantage of
greater statistical stability.  However, the shorter period allows some
extra protection.  Clinical studies with adult male volunteer subjects
showed that blood lead concentration (PbB) changed to a new equilibrium
level after 2 or 3 months of exposure (Rabinowitz et al., 1973, 1976;
Griffin et al., 1975).  The shorter averaging time was also thought to
give more protection to young children (42 FR 63076) even though there
was no direct evidence then (or now!) on blood lead kinetics in children.
The risk of shorter term exposures to air lead concentrations elevated
above a quarterly-averaged standard that might go undetected was
considered in the 1978 standard decision to be minimal because (1) based
on the ambient air quality data available at that time, the possibilities
for significant, sustained excursions were considered small, and (2) it
was determined that direct inhalation of air lead is a relatively small
component of total airborne lead exposure (43 FR 46246).  The biological
reasons for reevaluating the averaging time are discussed in the next
section.

     Alternative forms of the air lead standard are now being evaluated
by E.P.A.'s Office of Air Quality Planning and Standards (OAQPS).  The
averaging time is only one of the components in setting an air lead
standard.  The "characterizing value" for testing compliance can assume a
wide variety of forms, e.g. the maximum monthly (or quarterly) average as
used in the "deterministic" form of the standards, the maximum of the
average monthly mean over a specified number of years (e.g. 3 consecutive
years), the average of the maximum monthly averages for each year within
a specified number of years, the average of the three highest months (or
quarters) within a specified number of years, etc.  Some averaging of the
extreme values certainly smoothes out the data, but also conceals extreme
high-level excursions.  Some attention has been given to the statistical
properties of the alternative characterizing values (Hunt, 1986).  The
consequences of different characterizing values for biological exposure
indices or health effects indicators have not yet been evaluated.
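
     As an illustration only (a sketch with randomly generated placeholder data, not
values from any monitoring site), several of these candidate characterizing values can be
computed from a three-year record of monthly means as follows:

    import numpy as np

    # monthly[i, j] = mean air lead in year i, month j (placeholder data)
    monthly = np.random.default_rng(0).lognormal(mean=0.0, sigma=0.5, size=(3, 12))

    max_monthly       = monthly.max()                 # "deterministic" form: max monthly average
    max_of_month_mean = monthly.mean(axis=0).max()    # max over months of the 3-year mean for that month
    mean_of_year_max  = monthly.max(axis=1).mean()    # average of each year's maximum month
    mean_of_top3      = np.sort(monthly.ravel())[-3:].mean()  # average of the 3 highest months

    print(max_monthly, max_of_month_mean, mean_of_year_max, mean_of_top3)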

     A final consideration is the sampling frequency.  The current normal
situation is a 24-hour average collected every 6th day.  The number of
samples collected also depends on the fraction of lost days: it is not

-------
uncommon for 25% of the data to be lost.  Thus one might have only 3 or 4
valid samples per month.  Hunt (1986) examined more frequent sampling
schemes: every day, every other day, every third day.  He also compared
the consequences of deterministic vs. "statistical" forms of the standard,
monthly vs. quarterly characteristic values, and 25% data loss vs. no loss.
The community air lead problem in the U.S. is now more likely to be
related to point sources than to area-wide emissions, thus the following
three scenarios for location were evaluated: (1) source oriented sites
with maximum annual quarterly averages less than 1.5 ug/m3; (2) source
oriented sites with maximum annual quarterly average greater than 1.5
ug/m3; (3) NAMS urban maximum concentration sites.  Some conclusions
suggested by his study for quarterly averaging time are:
     (1) The characterizing value with the best precision
-------
plausible explanation is that there is reduced transfer of lead to the
red blood cells at higher concentrations, whether attributed to reduced
lead-binding capacity of the erythrocytes or reduced transfer rate across
the erythrocyte membrane as lead concentrations increase.  This is
reinforced by multi-dose experiments on rats in which lead concentrations
in brain, kidney, and femur are proportional to dose, which is expected
if tissue concentrations equilibrate with plasma concentrations, not with
whole blood lead concentrations.

     Lead concentrations in peripheral tissues can be modeled by coupled
systems of ordinary differential equations.  Parameters for such systems
can be estimated by iterative nonlinear least squares methods, often with
Marquardt-type modifications to enlarge the domain of initial parameter
estimates which allow convergence to the optimal solution (Berman and
Weiss, 1978).  Data sets with observations of two or more components
often sustain indirect inferences about unobserved tissue pools.
Analyses of data in (Rabinowitz et al., 1973, 1976; Griffin et al., 1975;
DeSilva, 1981) reported in (Marcus, 1985abc; Chamberlain, 1985; CD, 1986)
show that lead is absorbed into peripheral tissues in adult humans
within a few days.  The retention of lead by tissues is much longer than
is the initial uptake.  Even soft tissues such as kidney and liver appear
to retain lead for a month or so, and the skeleton retains lead for years
or tens of years (Christoffersson et al., 1986).
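
     A minimal sketch of this kind of estimation, assuming a one-compartment model and
hypothetical blood lead observations (the times, values, and starting guesses below are
invented), is:

    import numpy as np
    from scipy.optimize import curve_fit

    def one_compartment(t, c0, uptake, k):
        # Blood lead after a step increase in intake at t = 0: baseline c0
        # plus exponential approach to a new equilibrium with rate constant k.
        return c0 + (uptake / k) * (1.0 - np.exp(-k * t))

    t_obs = np.array([0.0, 7, 14, 28, 56, 84, 112])               # days (hypothetical)
    y_obs = np.array([15.0, 18.1, 20.4, 23.0, 24.8, 25.3, 25.6])  # ug/dl (hypothetical)

    # curve_fit uses Levenberg-Marquardt for unconstrained problems
    (c0, uptake, k), _ = curve_fit(one_compartment, t_obs, y_obs, p0=[15.0, 0.5, 0.05])
    print("fitted k = %.4f per day, mean residence time = %.1f days" % (k, 1.0 / k))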

     The relevance of blood lead and tissue lead concentrations to overt
toxicity is not unambiguous.  As in any biologically variable population,
some individuals can exhibit extremely high blood lead with only mild
lead poisoning (Chamberlain and Massey, 1972).  A more direct precursor
of toxicity is the erythrocyte protoporphyrin (EP) concentration.
Elevated levels of EP show that lead has deranged the heme biosynthetic
pathway, reducing the rate of production of heme for hemoglobin.  EP is
now widely used as a screening indicator for potential toxicity.  An
example of the utility of EP is that after a brief massive exposure of a
British worker (Williams, 1984), zinc EP increased to very elevated
levels within a week of exposure even though the worker was still largely
asymptomatic.  Even though there is considerable biological variability,
EP levels in adults increase significantly within 10 to 20 days after
beginning an experimental increase of ingested lead (Stuik, 1974; Cools
et al., 1976; Schlegel and Kufner, 1978).  Thus biological effects in
adult humans occur very shortly after exposure, certainly within a month.

     While the uptake of lead and the onset of potential toxicity occur
rapidly during increased exposure, the reduction of exposure does not
cause an equally rapid reduction in either body burden or toxicity
indices.  Accumulation of mobilizable pools of lead in the skeleton and
other tissues creates an endogenous source of lead that is only slowly
eliminated.  Thus the rapid uptake of lead during periods of increased
exposure should be emphasized in setting standards for lead.

     The experimental data cited above are indeed human data, but all for
adults (almost all for males).  We are not aware of any direct studies on
lead kinetics in children.  One of the more useful sets of data involves
the uptake of lead by infants from formula and milk (Ryu et al., 1983,
1985).  Blood lead levels and lead content of food were measured at 28

-------
day intervals.  The results are negative but informative: blood lead
levels in these infants appeared to equilibrate so much faster that no
estimate of the kinetic parameters was possible.  A very rough estimate
by Duggan (1983) based on earlier input-output studies in infants
(Ziegler et al., 1978) gave a blood lead half life (= mean life * log(2))
of 4 to 6 days.  Duggan's method has many assumptions and uncertainties.
An alternative method, allometric scaling based on surface area, suggests
that if a 70 kg adult male has a blood lead mean life of 30 days, then a
7 kg infant should have a blood lead mean life of about 13 days.

     The above estimates of lead kinetics in children are not strictly
acceptable.  Children are kinetically somewhat different from adults,
with a somewhat larger volume of blood and a much smaller but rapidly
developing skeleton (especially dense cortical bone that retains most of
the adult body burden of lead).  Children also absorb lead from the
environment at a greater rate, as they have greater gastrointestinal
absorption of ingested lead and a more rapid ventilation rate than do
adults.  A biomathematical model has been developed by Harley and Kneip
(1985) and modified for use by OAQPS.  This uptake/biokinetic model is
based on lead concentrations in infant and juvenile baboons, which are
believed to constitute a valid animal model for human growth and
development.  Preliminary applications of the model are described in
(Cohen, 1986; ATSDR, 1987; Marcus et al., 1987).  The model includes
annual changes of kinetic parameters such as the transfer rates for
blood-to-bone, blood-to-liver, liver-to-gastrointestinal tract, and
growth of blood, tissue, and skeleton.  The model predicts a mean
residence time for lead in blood of 2-year-old children of about 33 days.

     Blood lead concentrations change substantially during childhood
(Rabinowitz et al., 1984).  These changes reflect the washout of in utero
lead, the exposure of the child to changing patterns of food and water
consumption, and the exposure of the toddler to leaded soil and dust in
his or her environment.  We must thus consider also the temporal
variations of exposure to environmental lead.

4. TIME SCALES OF LEAD EXPOSURE

     Air lead concentrations change very rapidly, depending on wind speed
and direction and on emissions patterns.  Biological kinetics tend to
filter out the "high-frequency" variations in environmental lead, so that
only environmental variations on the order of a few days are likely to
play much of a role.  The temporal patterns depend on averaging time and
sampling frequency, and thus will vary from one location to another
depending on the major lead sources at that site.  Figure 1 shows the
time series for the logarithm of air lead concentration (log PbA) near a
primary lead smelter in the northwestern U.S.  The data are 24-hour
concentrations sampled every third day (with a few minor slippages).  We
analysed these data using Box-Jenkins time series programs.  The temporal
structure is fairly complex, with a significant autoregressive component
at lag 9 (27 days) and significant moving average components at lags 1
and 3 (3 days and 9 days).  Time series analyses around point source
sites and general urban sites may thus be informative.
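
     In current software, a Box-Jenkins fit with this lag structure might be sketched as
follows (the series below is simulated noise standing in for the smelter data; lags are in
units of the 3-day sampling interval):

    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    log_pba = np.random.default_rng(1).normal(0.0, 0.5, size=400)  # placeholder series

    # AR term at lag 9 (27 days) and MA terms at lags 1 and 3 (3 and 9 days)
    model = SARIMAX(log_pba, order=([9], 0, [1, 3]), trend="c")
    result = model.fit(disp=False)
    print(result.params)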

-------
     Direct inhalation of atmospheric lead may be only a minor part of
lead exposure attributable to air lead.  Previously elevated air lead
levels may have deposited a substantial reservoir of lead in surface soil
and house dust in the environment; these are the primary pathways for
lead in children aged 1-5 years.  Little is known about temporal
variations in soil and house dust lead.  Preliminary results cited in
(Laxen et al., 1987) suggest that lead levels in surface dust and soil
around redecorated houses and schools can change over periods of time of
two to six months.  While lead levels in undisturbed soils can persist
for thousands of years, the turnover of lead in urban soils due to human
activities is undoubtedly much faster.

     Individuals are not stationary in their environment.  Thus, the lead
concentrations to which individuals are exposed must include both spatial
and temporal patterns of exposure.  The picture is complex, but much is
being learned from personal exposure monitoring programs.

     The amount of variation in air lead concentrations at a stationary
monitor can be extremely large.  Coefficients of variation in excess of
100% are not uncommon around point sources such as lead smelters, even
when monthly or quarterly averages are used.  This variability is far in
excess of that attributable to meteorological variation and is due to
fluctuations in the emissions process, e.g. due to variations in feed
stock, process control, or production rate.  Furthermore, the
concentration distributions are very skewed and heavy-tailed, more nearly
log-normally distributed than normal even for long averaging times.  The
stochastic properties of the process are generally unknown, although it
may be assumed that air, dust, and soil lead concentrations from all
sources of exposure, including food, water, and paint, as well as those
pathways from gasoline lead, have been declining.  With these points in
mind, we can begin to construct a quantitative characterization of a
health effects target for compliance studies.

5. HEALTH EFFECTS CHARACTERIZATION: A THEORETICAL APPROACH

     We will here briefly describe a possible approach to the problem of
choosing an averaging time that is meaningful for health effects.
Related problems such as sampling frequency then depend on the precision
with which one wishes to estimate the health effects characterization.
The basic fact is that all of the effects of interest are driven by the
environmental concentration-exposure C(t) at time t integrated over some
period of time, with an appropriate weighting factor.  As people are
exposed to diverse pollutant sources, the uptake from all pathways must
be added up.  If the health effect is an instantaneous one whose value at
time t is denoted X(t), and if the biokinetic processes are all linear
(as is assumed for the OAQPS uptake-biokinetic model) or can be reasonably
approximated by a linear model driven by C(u) at time u, then the
biokinetic model can be represented by an aftereffect weight f(t-u) after
an interval t-u.  Mathematically,

-------
     X(t) = ∫ f(t-u) C(u) du

The aftereffect function for linear compartmental models is a mixture of
exponential terms.

     The time-averaged concentration-exposure at time t, denoted Y(t), is
also a moving average of concentration C(u) at time u, with a weight
given by g(t-u) after an interval t-u.  Thus compliance will be based on
the values of the variable

     Y(s) = ∫ g(s-v) C(v) dv

The second-order moments of the health effect and compliance variables are

     cov[Y(t), Y(s)] = ∫ ∫ g(t-u) g(s-v) cov[C(u), C(v)] du dv

     cov[X(t), Y(s)] = ∫ ∫ f(t-u) g(s-v) cov[C(u), C(v)] du dv

Thus, we could formalize the problem of selecting an averaging time T by
the following mathematical problem: choose the averaging time T that
maximizes the correlation between X(t) and Y(s), for that time t at which
E[X(t)] is maximum.  That is, look for the time(s) t at which we expect
the largest adverse health effect or effect indicator (e.g. blood lead).
Then find the averaging time T such that the moving average at some other
time s is as highly correlated as possible with X(t).  Note that we do not
require that s = t.  We may also restrict the range of values of T.

EXAMPLE: ONE-COMPARTMENT BIOKINETIC MODEL, MARKOV EXPOSURE MODEL

     Suppose that the relevant biokinetic model is a simple one-
compartment model.  The aftereffect of a unit pollutant uptake is an
exponential washout (e.g. of blood lead, to a first approximation) with
time constant k,

     f(t-u) = exp(-k (t - u))      if u < t
            =  0                   if u > t

We will also assume that the concentration-exposure process C(t) is
stochastically second-order stationary with covariance function

     cov[C(u), C(v)] = var[C] exp(-a |u - v|)

-------
For an unweighted moving average over the preceding T days (g(s-v) = 1/T
for 0 < s-v < T), one finds after some algebra that:

     var[X(t)] = var[C] / [k (a + k)]

     var[Y(s)] = var[C] (2 / (aT)^2) [aT - 1 + exp(-aT)]

If t < s - T then

     cov[X(t), Y(s)] = var[C] [exp(-a(s-t-T)) - exp(-a(s-t))] / [a T (k + a)]

If t > s (for predicting from the current sampling time s to a later
effect time t), the corresponding expression also involves terms in
exp(-k(t-s)) and 1/(k - a).

     A small table of correlations between X(t) and Y(t) as a function of
the averaging time T is given in Table 2.
-------
The table suggests that the best averaging time for children or for
adults is about 1.0/k, and that much longer or much shorter averaging
times will not capture significant excursions in blood lead.  An
averaging time of 15-50 days will make Y(t) reasonably predictive of X(t)
for both adults and children.

                                  TABLE 2

 CORRELATION BETWEEN BLOOD LEAD CONCENTRATION AND AVERAGE ENVIRONMENTAL
          LEAD CONCENTRATION AS A FUNCTION OF AVERAGING TIME T

 Assumed environmental lead correlation scale a = 1/(4 days)

                                         CORRELATION
        Averaging                 CHILD                ADULT
      Time T, Days           (k = 1/15 days)      (k = 1/30 days)
            7                   0.9237
           10                   0.9538
           14                   0.9497              0.7207
           20                   0.8900              0.8020
           30                   0.7707              0.8783
           60                   0.5451              0.9141
           90                   0.4...              0.8...
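
     These correlations can also be computed numerically, without the closed-form algebra,
by discretizing the aftereffect and moving-average weights and the exposure covariance.
The sketch below is mine, not the author's code; the k values are assumptions and the
results will not reproduce Table 2 exactly.

    import numpy as np

    def correlation(k, a, T, dt=0.25):
        # corr(X(t), Y(t)) for an exponential washout with rate k driven by an
        # exposure process with covariance proportional to exp(-a|u - v|);
        # Y(t) is the unweighted moving average over the preceding T days.
        horizon = 12.0 / k + T                       # truncate the infinite history
        p = np.arange(0.0, horizon, dt)              # backward time t - u
        wx = np.exp(-k * p) * dt                     # aftereffect weight for X(t)
        wy = np.where(p < T, 1.0 / T, 0.0) * dt      # moving-average weight for Y(t)
        cov = np.exp(-a * np.abs(p[:, None] - p[None, :]))
        return (wx @ cov @ wy) / np.sqrt((wx @ cov @ wx) * (wy @ cov @ wy))

    for T in (7, 10, 14, 20, 30, 60, 90):
        child = correlation(k=1.0 / 15, a=0.25, T=T)   # assumed child washout rate
        adult = correlation(k=1.0 / 30, a=0.25, T=T)   # assumed adult washout rate
        print(f"T={T:3d}  child={child:.3f}  adult={adult:.3f}")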

     Samples collected for compliance testing have a more complicated
structure for the weight function g(t-u), namely (for h-hour samples once
every m days in an interval of T days),

     g(t-u) = m/(hT)    if u falls within one of the h-hour sampling periods
            =  0        otherwise
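
     For the 24-hour-sample-every-sixth-day schedule described earlier, this weight
function and the resulting compliance average can be sketched as follows (the hourly
concentration series is a placeholder):

    import numpy as np

    def compliance_weights(T_days=90, m_days=6, h_hours=24):
        # Hourly weights over an interval of T_days: every m_days-th day is
        # sampled for h_hours, each sampled hour receiving weight m/(h T),
        # so that the weights sum to one.
        g = np.zeros(T_days * 24)
        for day in range(0, T_days, m_days):
            g[day * 24: day * 24 + h_hours] = m_days / (h_hours * T_days)
        return g

    g = compliance_weights()
    c_hourly = np.random.default_rng(2).lognormal(0.0, 1.0, size=g.size)  # placeholder
    print("weights sum to", round(float(g.sum()), 3))
    print("compliance average Y =", round(float(g @ c_hourly), 3))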
-------
lead, volume of environmental intake (e.g. m3/d of air, L/d of water,
mg/d of leaded soil and dust, g/d of food) as well as concentration C(t).

6. TIME SCALES FOR THE EFFECTS OF OZONE ON AGRICULTURAL CROP YIELDS

     The regulation of ozone has for some time been one of E.P.A.'s most
pressing problems, a regulatory irritant as well as a lung irritant.  The
secondary standards for ozone have drawn considerable attention, due to
the knowledge that exposure to ozone may cause economically significant
damage to cash crops and forests.  The time of day of the ozone exposure,
and the day of exposure during the growing season, may strongly determine
the effects of exposure and consequently the statistics that are used to
formulate the standard.  A number of approaches to defining a
biologically relevant standard are being investigated (Lee et al.,
1987ab; Larsen et al., 1987).

     Air monitoring data have been collected in connection with the
chamber studies of the National Crop Loss and Assessment Network (NCLAN)
and related studies have been carried out at E.P.A.'s Corvallis
Environmental Research Laboratory (CERL).  The earlier NCLAN data were
based on seven hours of monitoring (0900-1600) and statistics appropriate
to that period.  More recent studies use longer sampling periods,
including 24-hour samples at CERL.  Examples of the time patterns of
exposure used at CERL are shown in Lee et al., 1987ab.  The
characterizations of the air monitoring data considered for use as
exposure statistics and compliance specifications include the following,
all based on the mean hourly ozone concentration C(h) at hour h:

     MEAN STATISTICS

          M7 = seasonal mean of C(h) for 0900-1600 hr each day

          M1 = seasonal mean of daily maximum C(h) during 24 hours

          Effective Mean = ( Σ C(h)**p / N )**(1/p)    [Note: Σ means sum]

     PEAK STATISTICS

          P7 = seasonal peak of 7-hour daily mean over 0900-1600 hrs.

          P1 = seasonal peak hourly concentration

     CUMULATIVE STATISTICS

          Total Exposure = Σ C(h)

          Total Impact = ( Σ C(h)**p )**(1/p)

          Phenologically Weighted Cumulative Impact (PWCI)

                       = ( Σ C(h)**p w(h) )**(1/p)

-------
     EXCEEDANCE STATISTICS

          HRSxx = number of hours in which C(h) >= xx

          SUMxx = total ozone concentration in hours with C(h) >= xx

and at least six other statistics characterizing episode lengths etc.
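
     For concreteness, the statistics listed above can be computed from a season of hourly
ozone values along the following lines (a sketch with simulated placeholder data; the
exponent p, the phenological weights w(h), and the exceedance threshold are assumptions):

    import numpy as np

    rng = np.random.default_rng(3)
    days, p = 120, 3.0
    c = rng.gamma(shape=2.0, scale=0.03, size=(days, 24))  # hourly ozone, ppm (placeholder)
    w = np.ones(c.size)                                    # phenological weights (assumption)

    m7 = c[:, 9:16].mean()                      # seasonal mean of 0900-1600 values
    m1 = c.max(axis=1).mean()                   # seasonal mean of daily maxima
    effective_mean = (np.mean(c ** p)) ** (1 / p)
    p7 = c[:, 9:16].mean(axis=1).max()          # seasonal peak 7-hour daily mean
    p1 = c.max()                                # seasonal peak hourly concentration
    total_exposure = c.sum()
    total_impact = (np.sum(c ** p)) ** (1 / p)
    pwci = (np.sum(w * c.ravel() ** p)) ** (1 / p)
    hrs08 = int((c >= 0.08).sum())              # HRS08: hours with C(h) >= 0.08 ppm
    sum08 = float(c[c >= 0.08].sum())           # SUM08: total ozone in those hours
    print(m7, m1, effective_mean, p7, p1, total_exposure, total_impact, pwci, hrs08, sum08)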
The statistic most frequently considered for ozone characterization is
M7.  However, the statistic that best predicted crop shoot weight of the
cuttings of alfalfa in a CERL experiment (the response was transformed to
a fraction of the controls) was the PWCI.  The values of M7 clearly
measure the damaging effect of ozone, but with a great deal of scatter
around the regression line.  The somewhat clustered values of M7 are
spread out by the statistic PWCI, which gives much higher weight to large
values of C(h) (as C(h)**p), gives weight 0.3 to those hours preceding
the previous cutting, and gives zero weight to those preceding the next
earlier cutting.  Crop loss is much better defined by the values of PWCI,
with relatively little scatter about the fitted curve of "Weibull" form.

     The ozone example suggests that biological time scales of response
are better captured by compliance statistics that give higher weight to
recent exposures, as in our lead example.  However, the time kinetics are
clearly nonlinear in ozone concentration, so that some nonlinear
mechanism of damage, repair, and metabolism must be assumed to be
operating.  The PWCI is a cumulative value and not a peak or exceedance
statistic, thus even low levels of ozone exposure appear to be causing
some damage.  The biological statistic for compliance sampling (for
alfalfa, anyway) is thus a 24-hour peak-weighted cumulative statistic.
-------
     For most chemicals of interest there is not nearly enough
information on pharmacokinetics, toxicokinetics, or temporal variability
of exposure pattern to allow these calculations to be made.  However, for
many criteria pollutants, the level of information is adequate, and
typical population levels are so close to a health effects criterion
level as to make this a serious issue.  For example, in 1976, the
criterion level for blood lead was 30 ug/dl, but the geometric mean blood
lead in urban children was about 15 ug/dl, much of which was assumed to
be "non-air" background (i.e. regulated by some other office).  Due to
the reduction of leaded gasoline during the 1970's, the mean blood lead
level for urban children had fallen to 9-10 ug/dl by 1980, and is likely
to be somewhat lower today.  However, better data on health effects
(e.g. erythrocyte protoporphyrin increases in iron-deficient children or
hearing loss and neurobehavioral problems) in children with lead burdens
now suggest a much lower health criterion level is appropriate, perhaps
10-15 ug/dl.  Thus there is still very little "margin of safety" against
random excursions of lead exposure.

     This is also true for other criteria pollutants, especially for
sensitive or vulnerable subpopulations.  For example, asthmatics may
experience sensitivity to elevated levels of sulfur dioxide or ozone,
especially when exercising.  Activity levels certainly affect the
kinetics of gaseous pollutant uptake and elimination.  Subpopulation
variations in kinetics and pharmacodynamics may be important.  Acute
exposure sampling in air or water (e.g. 1-day Health Advisories for
drinking water) should be sensitive to pharmacokinetic time scales.

     Biokinetic information on pollutant uptake and metabolism in humans
is not often available for volatile organic compounds and for most
carcinogens.  Thus large uncertainty factors for animal extrapolation and
for route of exposure variations are used to provide a conservative level
of exposure.  The methods shown here may be less useful in such
situations.  But the development of realistic biologically motivated
pharmacokinetic models for extrapolating animal data to humans may
establish a larger role for assessment of compliance testing for these
substances.

                             ACKNOWLEDGEMENTS

     I am grateful to Ms. Judy Kapadia for retyping the manuscript, and
to the reviewer for his helpful comments.
                               REFERENCES

Berman M, Weiss MF. 1978. SAAM - Simulation, Analysis, and Modeling.
Manual. U.S. Public Health Service Publ. NIH-180.

Campbell BC, Meredith PA, Moore MR, Watson WS. 1984. Kinetics of lead
following intravenous administration in man. Tox Letters 21:231-235.

CD [Criteria Document]. 1986. Air quality criteria for lead.
Environmental Criteria and Assessment Office, US Environmental Protection
Agency. EPA-600/8-83/028aF (4 volumes). Res. Tri. Pk., NC.
                               12

-------
Chamberlain AC. 1985. Prediction of response of blood lead to airborne
and dietary lead from volunteer experiments with lead isotopes. Proc Roy
Soc Lond B224:149-182.

Chamberlain MJ, Massey PMO. 1972. Mild lead poisoning with an excessively
high blood lead. Brit J Industr Med 29:458-461.

Christoffersson JO, Ahlgren L, Schutz A, Skerfving S. 1986. Decrease of
skeletal lead levels in man after end of occupational exposure. Arch
Environ Health 41:312-318.

Cohen, J. Personal communications about OAQPS staff paper, April-Nov.
1986.

Cools A, Salle HJA, Verberk MM, Zielhuis RL. 1976. Biochemical response
of male volunteers ingesting inorganic lead for 49 days. Int Arch Occup
Environ Health 38:129-139.

DeSilva PE. 1981. Determination of lead in plasma and studies on its
relationship to lead in erythrocytes. Brit J Industr Med 38:209-217.

Duggan MJ. 1983. The uptake and excretion of lead by young children. Arch
Environ Health 38:246-247.

Laxen DPH, Lindsay F, Raab GM, Hunter R, Fell GS, Fulton M. 1987. The
variability of lead in dusts within the homes of young children. In Lead
in the Home Environment, ed. E. Culbard. Science Reviews, London.

Lee EH, Tingey DT, Hogsett WE. 1987a. Selection of the best exposure-
response model using various 7-hour ozone exposure statistics. Report for
Office of Air Quality Planning and Standards, US Environ. Protection
Agency.

-------
Lee EH, Tingey DT, Hogsett WE. 1987b. Evaluation of ozone exposure
-------
Figure 1.  Time series of the logarithm of 24-hour air lead concentration
(log PbA) near a primary lead smelter in the northwestern U.S.
-------
                                                  DISCUSSION
                                              Richard C. Hertzberg
                  Environmental Criteria and Assessment Office, U.S. EPA, Cincinnati, OH 45268

                                                 Comments on
                       "Time Scales: Biological, Environmental, Regulatory," Allan H. Marcus
Summary of Presentation

    Marcus  presents  a  case  for  consideration  of
physiologic   time  scales   in  the  determination  of
compliance  sampling protocols.  The general  theme of
incorporating  physiologic time into risk assessment is
certainly scientifically supportable (e.g., NAS Workshop,
1986, "Pharmacokinetics  in  Risk  Assessment," several
authors), but  has been  previously  proposed only  for
setting  standards.   Marcus  takes  the  application one
step further by showing how improper sampling can fail
to detect exposure fluctuations that have toxicological
significance.

The Regulatory Context

    The modeling and data that Marcus presents seem
reasonable,  but key items seem to be missing, at least if
this approach is to become used by regulatory agencies.
The  examples should  show that the  refinement will
make  a practical  difference  in  the   "cost-benefit"
evaluation, and that  the required data are accessible.

    The first question  is:  does  it   matter?  Most
standards are set  with a fair degree of conservatism, so
that slight  excursions above the standard will not pose a
significant  health risk.  The first impression of Marcus'
proposal is  that it is fine tuning, when in fact it is  the
coarse  control which needs to  be  turned.   Let  us
consider the  example  of lead.   Recent research  has
suggested that significant impairment of neurological
development can  be  caused by lead concentrations much
lower than previously thought.  In fact, some scientists
have suggested that  lead toxicity may be a no-threshold
phenomenon.  If such is the case,  then EPA's approach
to setting  lead standards will change drastically,  and
Marcus' example, though not necessarily  his proposal,
will  probably not  apply.  But  even with  the  current
standard,  it is not  clear  that  results  from  Marcus'
method will not  be  lost  in the usual noise of biological
data.  For  example, consider  his figure showing  the
graphs  of data and  model fits for  11  human  subjects.
First,  these  results  may  be  irrelevant  to  the  air
pollution issue since that data  are following "ingestion"
of lead, not "inhalation."  Lead  inhalation is  in many
ways more  complicated than ingestion.  Also, using  day
30 as an example, the fitted erythrocyte protoporphyrin
levels vary dramatically across  individuals (mean=49,
s.d.=20.3, range=30-73).  I could not read the  graphs
well, but even accounting for differing  starting values,
the curve shapes  also change across individuals, so that
predictions   for   any  untested   individual  might  be
difficult.
     The  second question,  that  of data requirements,
 cannot be answered from this presentation alone.  But
 some  issues can be mentioned.  It is not clear that the
 correlations between blood lead (Table  1) and monthly
 average  lead  are  good  predictors of  the  correlation
 between   monthly  average  lead  and   neurological
 impairment. But is the correlation the best indicator of
 performance?   A  better  question,  perhaps,   is:  do
 changes in blood lead which could be allowed by using
 the  weakest  sampling  protocol actually  result  in
 significantly   increased   incidence   of   neurological
 dysfunction, when  compared  to the best  compliance
 sampling  procedure   as  determined  using  Marcus'
 scheme?   It  is  not  clear  how  much  data would  be
 required to answer  that question.

     Also, it seems that Marcus' approach  must have
 pharmacokinetic   data   on    humans.    The   data
 requirements  are  then  more severe for most  of the
 thousands  of  environmental  chemicals,  where only
 animal data are available.  The situation is even worse
 for carcinogens,  where human cancer incidence data are
 not available at the low  regulatory levels.   In fact, the
 orders-of-magnitude   uncertainty  in   the   low-dose
 extrapolation  of cancer bioassays easily  swamps the
 error due to non-optimal  compliance sampling.

     So where might  this  research  go?   Certainly  it
 should be  further   developed.   This  approach  will
 definitely be useful for acute regulatory levels,  such as
 the 1-day  Health  Advisories for drinking water, where
 internal  dose   and   toxicity   are   closely  tied  to
 pharmacokinetics.  It  will probably be more significant
 for sensitive subgroups, such as children and those with
 respiratory  disease,  where the  pharmacokinetics are
 likely to  be much different  from the norm,  and where
 the tolerance  to chemical exposure is already low. For
 those cases, scaling factors  and  uncertainty factors are
 highly inaccurate.   Most  important  is  the example
 Marcus   presents,   chemicals   where   uptake   and
 elimination  rates  are  dramatically  different.   For
 control   of  those chemicals,   using  the  "average"
 monitored   level  is   clearly  misleading,   and  some
 approach  such as  Marcus'  must be  used.  I  would
 recommend the following steps:

 •   First,  demonstrate the need. List at  least a
     few   chemicals  that  are  being  improperly
     monitored because  of  their pharmacokinetic
     properties.

 •   Then, show  us that your method works and  is
     practical.
                                                       16

-------
              Statistical  Issues  in Human Exposure Monitoring

        William  C.  Nelson,  U.S.  EPA, EMSL, Research Triangle Park


                                 ABSTRACT

      Pollutant exposure information provides a critical link in risk
 assessment  and therefore  in environmental decision making.  Traditionally,
 outdoor air monitoring stations  have been necessarily utilized to relate
 air  pollutant exposures to  groups of nearby residents.  This approach is
 limited by  (1) using only the outdoor air as an exposure surrogate when
 most  individuals spend relatively small proportions of time outdoors and
 (2)  estimating exposure of  a group rather than an individual.  More
 recently, air monitoring of non-ambient locations, termed microenvironments,
 such  as residences, offices, and shops has increased.  Such data when
 combined with time  and activity  questionnaire information can provide
 more  accurate estimates of  human exposure.  Development of portable
 personal monitors that can  be used by the individual study volunteer
 provides a  more direct method for exposure estimation.  Personal samplers
 are  available for relatively few pollutants including carbon monoxide and
 volatile organic compounds  (VOC's) such as benzene, styrene, tetrachloroethylene,
 xylene, and dichlorobenzene.  EPA has recently performed carbon monoxide
 exposure studies in Denver, Colorado and Washington, D.C. which have
 provided new information  on CO exposure for individual activities and
 various microenvironments.  VOC personal exposure studies in New Jersey
 and California have indicated that, for some hazardous chemicals,
 individuals may receive higher exposure from indoor air than from outdoor
 air.   Indoor sources include tobacco smoke, cleansers, insecticides,
 furnishings, deodorizers, and paints.  Types of exposure assessment
 included in these studies are questionnaires, outdoor, indoor,  personal,
 and biological (breath) monitoring.

     As more sophisticated exposure data become available, statistical
 design and analysis questions also increase.  These issues include survey
 sampling, questionnaire development, errors-in-variables situation, and
 estimating the relationship between the microenvironment and direct
 personal exposure.  Methodological development is needed for models which
 permit supplementing the direct personal monitoring approach with an
 activity diary which provides an opportunity for combining these data
 with microenvironment data to estimate a population exposure distribution.
 Another situation is the appropriate choice between monitoring  instruments
 of varying precision and cost.   If inter-individual  exposure variability
 is high, use of a less precise instrument of lower cost which provides  an
 opportunity for additional study subjects may be justified.   Appropriate
 choice of an exposure metric also requires more examination.   In some
 instances, total  exposure may not be as useful  as exposure above a threshold
 level.
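
     One way to examine this choice is to compare the variance of the estimated mean
exposure obtainable under a fixed budget; the costs and variance components in the sketch
below are invented purely for illustration.

    def var_of_mean(budget, cost_per_subject, sd_between, sd_instrument):
        # Variance of the estimated population mean exposure when the budget
        # buys n subjects, each measured once; between-person variability and
        # instrument error are assumed independent.
        n = int(budget // cost_per_subject)
        return (sd_between ** 2 + sd_instrument ** 2) / n

    budget = 100_000.0                                     # hypothetical study budget
    precise = var_of_mean(budget, 2000.0, sd_between=30.0, sd_instrument=5.0)
    cheap = var_of_mean(budget, 800.0, sd_between=30.0, sd_instrument=15.0)
    print("precise instrument: variance of mean =", round(precise, 2))
    print("cheaper instrument: variance of mean =", round(cheap, 2))

When between-person variability dominates, the cheaper instrument's larger sample can yield
the smaller variance, which is the tradeoff described above.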

     Because community studies  using personal  exposure and microenvironmental
measurements are expensive,  future  studies will  probably use smaller
 sample sizes but be more intensive.   However,  since such studies
 provide exposure data for individuals  rather than only for groups,  they
may not necessarily have less statistical  power.

                                      17

-------
INTRODUCTION

     Pollutant exposure information is a necessary component of the risk
assessment process.  The traditional  approach to investigating the-
relationship between pollutant level  in the environment and the concentration
available for human inhalation, absorption or ingestion, has been 1)
measurements at an outdoor fixed monitoring site or 2)  mathematical  model
estimates of pollutant concentration  from effluent emission rate information.1

     The limitations of such a preliminary exposure assessment have become
increasingly apparent.  For example,  recognition of the importance of
indoor pollutant sources, particularly considering the  large amount of
time spent indoors, and concern for estimating total personal  exposure
have led to more in-depth exposure assessments.

     One of the major problems to overcome when conducting a risk assessment
is the need to estimate population exposure.  Such estimates require
information on the availability of a  pollutant to a population group  via
one or more pathways.  In many cases, the actual concentrations encountered
are influenced by a number of parameters related to activity patterns.
Some of the more important are:  the  time spent indoors and outdoors,
commuting, occupations, recreation, food consumption, and water supply.
For specific situations the analyses  will involve one major pathway to
man (e.g. outside atmospheric levels  for ozone), but for others, such as
heavy metals or pesticides, the exposure will be derived from several
different media.

     A framework for approaching exposure assessments for air pollutants
has been described by the National Academy of Science Epidemiology of Air
Pollution Committee.2  The activities shown in Figure 1 were considered
to be necessary to conduct an in-depth exposure assessment.

     As knowledge about the components of this framework, particularly
sources and effects, has increased, the need for improved data on exposures
and doses has become more critical.  A literature review published in
1982 discussed a large number of research reports and technical papers
with schemes for calculating population exposures.3  However,  such schemes
are imperfect, relying on the limited data available from fixed air
monitoring stations and producing estimates of "potential exposures"  with
unknown accuracy.  Up until the 1980's, there were few  accurate field
data on the actual exposures of the population to important environmental
pollutants.  Very little was known about the variation  from person to
person of exposure to a given pollutant, the reason for these variations,
or the differences in the exposures of subpopulations of a city.
Furthermore, a variety of field studies undertaken in the 1970s and early
1980s showed that the concentrations  experienced by people engaged in
various activities (driving, walking  on sidewalks, shopping in stores,
working in buildings, etc.) did not correlate well with the simultaneous
readings observed at fixed air-monitoring stations.4-9   Two reviews have
summarized much of the literature on  personal exposures to environmental
pollution showing the difficulty of relating conventional outdoor monitoring
data to actual exposures of the population.10,11  No widely acceptable
methodology was available for predicting and projecting future exposures
                                    18

-------
of a population or for estimating how population exposures might change
in response to various regulatory actions.  No satisfactory exposure
framework or models existed.

TOTAL HUMAN EXPOSURE

     The total human exposure concept seeks to provide the missing
component in the full risk model:  estimates of the total exposures of
the population to environmental pollutants, with known accuracy and
precision.  Generating this new type of information requires developing
an appropriate research program and methodologies.  The methodology has
been partially developed for carbon monoxide (CO), volatile organic
compounds (VOC's) and pesticides, and additional research is needed to
solve many problems for a variety of other pollutants.

     The total human exposure concept defines the human being as the
target for exposure.  Any pollutant in a transport medium that comes into
contact with this person, either through air, water, food, or skin, is
considered to be an exposure to that pollutant at that time.

     The instantaneous exposure is expressed quantitatively as a
concentration in a particular carrier medium at a particular instant of
time, and the average exposure is the average of the concentration to the
person over some appropriate averaging time.  Some pollutants, such as
CO, can reach humans through only one carrier medium, the air route of
exposure.  Others, such as lead and chloroform, can reach humans through
two or more routes of exposure (e.g., air, food, and water).  If multiple
routes of exposure are involved, then the total human exposure approach
seeks to determine a person's exposure (concentration in each carrier
medium at a particular instant of time) through all major routes of
exposure.

     Once implemented, the total human exposure methodology seeks to
provide information, with known precision and accuracy, on the exposures
of the general public through all environmental media, regardless of
whether the pathways of exposure are air, drinking water, food, or skin
contact.  It seeks to provide reliable, quantitative data on the number
of people exposed and their levels of exposures, as well as the sources
or other contributors responsible for these exposures.  In the last few
years, a number of studies have demonstrated these new techniques.  The
findings have already had an impact on the Agency's policies and priorities.
As the methodology evolves, the research needs to be directed toward
identifying and better understanding the nation's highest priority
pollutant concerns.

     The major goals of the Total Human Exposure Program can be summarized
as follows:

          Estimate total  human exposure for each pollutant of concern

          Determine major sources of this exposure

          Estimate health risks associated with these exposures

          Determine actions to eliminate or at least reduce these risks

                                    19

-------
     The total  human exposure concept  considers  major routes  of  exposure
by which a pollutant may reach the human  target.   Then,  it  focuses  on
those particular routes which are relevant  for the pollutants of concern,
developing information on the concentrations  present  and the  movement  of
the pollutants through the exposure routes.  Activity information from
diaries maintained by respondents helps identify the microenvironments of
greatest concern, and in many cases, also helps identify likely contributing
sources.  Biological samples of body burden  may  be measured to confirm
the exposure measurements and to estimate a  later step in the risk  assessment
framework.

     In the total human exposure methodology, two complementary  conceptual
approaches, the direct and the indirect,  have been devised  for providing
the human exposure estimates needed to plan  and  set priorities for reducing
risks.

Direct Approach

     The "direct approach" consists of measurements of exposures of the
general population to pollutants of concern.12  A representative, probability-
based sample of the population is selected according to a statistical design.
Then, for the class of pollutants under study, the pollutant  concentrations
reaching the persons sampled are measured for the relevant  environmental
media.  A sufficient number of people are sampled using appropriate
statistical sampling techniques to permit inferences  to be  drawn, with
known precision, about the exposures of the larger population from which
the sample has been selected.  From statistical  analyses of subject
diaries which list activities and locations visited,  it usually is possible
to identify the likely sources, microenvironments, and human  activities
that contribute to exposures, including both traditional and  nontraditional
components.

     To characterize a population's exposures, it is necessary to monitor
a relatively large number of people and to select them in a manner that
is statistically representative of the larger population.  This approach
combines the survey design techniques of the social scientist with the
latest measurement technology of the chemist and engineer,  using both
statistical survey methodology and environmental monitoring  in a single
field survey.   It uses the new miniaturized personal  exposure monitors
(PEMs) that have become available over the last decade,13,14,15 and it
adopts the survey sampling techniques that have been used previously to
measure public  opinion and human behavior.  The U.S. EPA Office of Research
and Development  (ORD)  has recently conducted several  major field studies
using the direct approach, namely, the Total Exposure Assessment Methodology
(TEAM) Study of  VOCs,  the CO field studies in Washington, D.C. and Denver,
and the non-occupational exposure to pesticides study.  These studies
will be described later.

Indirect Approach

     Rather than measuring personal exposures directly  as  in the previous
approach, the  "indirect approach" attempts to construct the  exposure
profile mathematically  by combining information on the  times people spend
                                    20

-------
in particular locations (homes, automobiles, offices, etc.) with the
concentrations expected to occur there.  This approach requires a
mathematical model, information on human activity patterns, and statistical
information on the concentrations likely to occur in selected locations,
or "microenvironments".l^ -A microenvironment can be defined as a location
of relatively homogeneous pollutant concentration that a person occupies
for some time period.  Examples include a house, office, school, automobile,
subway or bus.  An activity pattern is a record of time spent in specific
microenvironments.

     In its simplest form the "indirect approach" seeks to compute the
integrated exposure as the sum of the individual products of the concentrations
encountered by a person in a microenvironment and the time the person
spends there.  The integrated exposure permits computing the average
exposure for any averaging period by dividing by the time duration of the
averaging period.  If the concentration within microenvironment j is
assumed to be constant during the period that person i occupies
microenvironment j, then the integrated exposure E_i for person i will
be the sum of the products of the concentration c_j in each microenvironment
and the time spent by person i in that microenvironment

             J
     E_i =   Σ  c_j t_ij ,
            j=1

where  E_i = integrated exposure of person i over the time period of interest;

       c_j = concentration experienced in microenvironment j;

      t_ij = time spent by person i in microenvironment j; and

         J = total number of microenvironments occupied by person i over
             the time period of interest.

To compute the integrated exposure E_i for person i, it obviously is
necessary to estimate both c_j and t_ij.  If T is the averaging time,
the average exposure Ē_i of person i is obtained by dividing by T; that is,
Ē_i = E_i/T, with E_i accumulated over the averaging time T.
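
As a concrete illustration of this bookkeeping, the following sketch computes
the integrated and average exposure for a single person from an assumed
activity diary and assumed microenvironment concentrations; the names and
numbers are hypothetical, not data from the studies described here.

```python
# Hypothetical microenvironment concentrations c_j (ppm CO) and diary times
# t_ij (hours) for one person; E_i = sum over j of c_j * t_ij.
concentrations = {"home": 2.0, "office": 1.5, "car": 9.0, "outdoors": 3.0}
times          = {"home": 14.0, "office": 8.0, "car": 1.0, "outdoors": 1.0}

E_i = sum(concentrations[j] * times[j] for j in times)  # integrated exposure (ppm-h)
T = sum(times.values())                                 # averaging time (h)
average_exposure = E_i / T                              # 24-h average (ppm)

print(f"E_i = {E_i:.1f} ppm-h, average over {T:.0f} h = {average_exposure:.2f} ppm")
```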

     Although the direct approach is invaluable in determining exposures
and sources of exposure for the specific population sampled, the Agency
needs to be able to extrapolate to much larger populations.  The indirect
approach attempts to measure and understand the basic relationships
between causative variables and resulting exposures, usually in particular
microenvironments, through "exposure modeling."  An exposure model takes
data collected in the field and then, in a separate and distinct activity,
predicts exposure.  The exposure model is intended to complement results
from direct studies and to extend and extrapolate these findings to other
locales and other situations.  Exposure models are not traditional
dispersion models used to predict outdoor concentrations; they are
different models designed to predict the exposure of a rather mobile
human being.  Thus, they require information on typical  activities and
time budgets of people, as well  as information on likely concentrations
in places where people spend time.
                                    21

-------
     The U.S. EPA ORD has  also conducted  several  studies  using  the  indirect
approach.  An example of a recent  exposure  model  is  the Simulation  of
Human Activities and Pollutant Exposures (SHAPE) model, which has been
designed to predict the exposures of urban populations to CO.  This model
is similar to the NAAQS Exposure Model (NEM).
SHAPE model used the CO concentrations measured in the Washington-Denver
CO study to determine the  contributions  to  exposure  from  commuting,
cooking, cigarette smoke,  and other factors.   Once a model  such as  SHAPE
is successfully validated  (by showing that  it accurately  predicts exposure
distributions measured in  a TEAM field study),  it can be  used in  a  new
city without a field study to make a valid  prediction of  that population's
exposures using that city's data on human activities, travel  habits, and
outdoor concentrations.  The  goal  of future development is  to apply the
model to other pollutants  (e.g., VOCs, household  pesticides)  making it
possible to estimate exposure frequency  distributions for the entire
country, or for major regions.

Field Studies

     The total human exposure field studies form a central part of the
U.S. EPA ORD exposure research program.   Several  studies  have demonstrated
the feasibility of using statistical procedures to choose a small
representative sample of the  population  from which it is  possible to make
inferences about the whole population.   Certain subpopulations  of importance
from the standpoint of their  unique exposure to the  pollutant under study
are "weighted" or sampled more heavily than others.   In the subsequent
data analysis phases, sampling weights  are  used to adjust for the
overrepresentation of these groups.  As  a result, it is possible  to draw
conclusions about the exposures of the  larger population  of a region with
a study that is within acceptable costs.
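
The following sketch, with entirely hypothetical numbers, shows how such
sampling weights undo the deliberate oversampling of a subpopulation when
estimating a population mean exposure.

```python
import numpy as np

# Suppose commuters make up 20% of the population but half of the sample.
rng = np.random.default_rng(0)
commuters    = rng.lognormal(mean=2.0, sigma=0.5, size=300)   # oversampled group
noncommuters = rng.lognormal(mean=1.0, sigma=0.5, size=300)

exposures = np.concatenate([commuters, noncommuters])
# Weight = population share / sample share for each group.
weights = np.concatenate([np.full(300, 0.20 / 0.50), np.full(300, 0.80 / 0.50)])

print(f"unweighted mean = {exposures.mean():.2f}, "
      f"weighted mean = {np.average(exposures, weights=weights):.2f}")
```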

     Once the sample of people has been  selected, their exposures to the
pollutant through various environmental media (air, water, food, skin)
are measured.  Some pollutants have negligible exposure  routes  through
certain media, thus simplifying the study.   Two large-scale total human
exposure field studies have been undertaken by U.S.  EPA to demonstrate
this methodology:  the TEAM study of VOCs and the Denver  - Washington  DC,
field study of CO.

     The first set of TEAM Studies (1980-84) was the most extensive
investigation of personal  exposures to  multiple pollutants and  corresponding
body burdens.  In all, more than 700 persons in 10 cities have  had  their
personal exposures to 20 toxic compounds in air and drinking water measured,
together with levels in exhaled breath  as an indicator  of blood
concentration.17-19  Because of the probability survey design used,
inferences can be made about a larger target population  in certain  areas:
128,000 persons in Elizabeth/Bayonne, NJ; 100,000 persons in the South
Bay Section of Los Angeles, CA; and 50,000 persons in Antioch/Pittsburg,
CA.
                                    22

-------
     The major findings of the TEAM Study may be summarized  as  follows:

1.   Great variability (2-3 orders of magnitude) of exposures occurs even
in small geographical  areas (such as a college campus)  monitored  on the
same day.

2.   Personal and overnight indoor exposures consistently exceed outdoor
concentrations.  At the higher exposure levels, indoor  concentrations  may
be 10-100 times the outdoor concentrations,  even in New Jersey.

3.   Drinking water and beverages in some cases are the main pathways  of
exposure to chloroform and bromodichloromethane — air  is  the  main  route
of exposure to 10 other prevalent toxic organic compounds.

4.   Breath levels are significantly correlated with previous  personal
air exposures for all  10 compounds.  On the  other hand, breath  levels  are
usually not significantly correlated with outdoor levels,  even  when the
outdoor level is measured in the person's own backyard.

5.   Activities and sources of exposure were significantly  correlated
with higher breath levels for the following  chemicals:

     benzene:  visits  to service stations, smoking, work in  chemical  and
     paint plants;
     tetrachloroethylene:  visits to dry cleaners.

6.   Although questionnaires adequate for identifying household  sources
were not part of the study, the following sources were  hypothesized:

     p-dichlorobenzene:  moth crystals, deodorizers, pesticides;
     chloroform:  hot  showers, boiling water for meals;
     styrene:  plastics, insulation, carpets;
     xylenes, ethylbenzene:  paints, gasoline.

7.   Residence near major outdoor point sources of pollution had  little
effect, if any, on personal exposure.

     The TEAM direct approach has four basic elements:

     Use of a representative probability sample of the  population under
     study

     Direct measurement of the pollutant concentrations reaching  these
     people through all media (air, food, water, skin contact)

     Direct measurement of body burden to infer dosage

     Direct recording  of each person's daily activities through  diaries

     The Denver - Washington, DC CO Exposure Study utilized  a  methodology
for measuring the frequency distribution of  CO exposures in  a  representative
sample of urban populations during 1982-83.20-22  Household  data  were
collected from over 4400 households in Washington, DC and  over  2100
                                    23

-------
households in the Denver metropolitan area.  Exposure data using personal
monitors were collected from 814 individuals  in  Washington,  DC,  and  450
individuals in Denver, together with  activity data  from a  stratified
probability sample of the residents living in each  of the  two  urban  areas.
Established survey sampling procedures were used.   The  resulting exposure
data permit statistical comparisons between population  subgroups (e.g.,
commuters vs. noncommuters, and residents with and  without gas stoves).
The data also provide evidence for judging the accuracy of exposure
estimates calculated from fixed site  monitoring  data.

     Additional efforts are underway  to use these  data  to  recognize  indoor
sources and factors which contribute  to elevated CO exposure levels  and
to validate existing exposure models.

Microenvironment Models

     Utilizing data collected in the  Washington, DC urban-scale CO Study,
two modeling and evaluation analyses  have been developed.   The first,
conducted by Duan, evaluates the use of microenvironmental
and activity pattern data in estimating a defined population's exposure to
CO.16  The second, conducted by Flachsbart, models the microenvironmental
situation of commuter rush-hour traffic (considering type  and age  of
vehicle, speed, and meteorology) and  observed CO concentrations.5   With
the assistance of a contractor, U.S.  EPA has collected  data on traffic
variables, traffic volume, types of vehicles, and  model year.   An  earlier
study measured CO in a variety of microenvironments and under a variety
of conditions.23

     The indirect method for estimating population exposure to CO  was
compared to exposures to the CO concentrations observed while people
carried personal monitors during their daily activities.  The indirect
estimate agreed with the estimate derived from personal monitoring at the low
concentration levels, say 1 ppm, but diverged from it at levels above that.  For example, at the 5 ppm
level, indirect estimates were about half the direct estimates within  the
regression model utilizing these data.  Although the results are limited,
it appears that when monitoring experts design microenvironmental  field
surveys, there is a tendency to sample more heavily in those settings
where the concentration is expected to be higher, thereby causing the indirect
method to give exaggerated levels.  The possibility of using microenvironmental
measurements and/or activity patterns from one city to extrapolate to
those of another city  is doubtful  but not yet fully evaluated.

Dosimetry Research

     The development  of reliable biological indicators of either specific
pollutant exposures or health  effects is  in its early  stages.   A limited
number of biomarkers  such  as blood levels of  lead or CO have  been recognized
and used for some time.  Breath levels of VOCs or CO have also  been
measured successfully.  However, the  use  of other biomarkers  such as
cotinine, a metabolite of nicotine, as a tracer compound of environmental
tobacco  smoke  is still in  its  experimental phase.   This also  applies to
                                    24

-------
use of the hydroxyproline-to-creatinine ratio as a measure of N02 exposure
and also to use of DNA adducts which form as a result of VOC exposure  and
have been found to be correlated with  genotoxic measures.   Dosimetry
methods development, though still very new and too often not yet ready
for field application for humans, is obviously a very promising research
area.

     Exhaled breath measurements have been used successfully in VOC and CO
exposure studies.  Since breath samples can be obtained noninvasively,
they are preferred to blood measurements whenever they can meet the
exposure research goals.  A methodology to collect expired samples on  a
Tenax adsorbent has been developed and used on several hundred TEAM study
subjects.  Major findings have included the discovery that breath levels
generally exceed outdoor levels, even in heavily industrialized petrochemical
manufacturing areas.  Significant correlations of breath levels with
personal air exposures for certain chemicals give further proof that  the
source of the high exposure is in personal activities or indoors, at  home
as well as at work.

     The basic advantages of monitoring breath rather than blood or tissues
are:

1.   Greater acceptability by volunteers.  Persons give breath samples
more readily than blood samples.  The procedure is rapid and convenient,
taking only 5-10 min. in all.

2.   Greater sensitivity.  Since volatile organic compounds often have a
high air-to-blood partition coefficient, they will have higher concentrations
in breath than in blood under equilibrium conditions.  Thus, more than
100 compounds have been detected in the breath of subjects where
simultaneously collected blood samples showed only one or two above
detectable limits.

3.   Fewer analytical problems.  Several "clean-up" steps must be completed
with blood samples, including centrifuging, extraction, etc., with each
step carrying possibility for loss or contamination of the sample.

     Measurements of CO in expired air often are used as indicators of
carboxyhemoglobin (COHb) concentrations in blood, although the precise
relationship between alveolar CO and blood COHb has not been agreed upon.

     The U.S. EPA exposure monitoring program therefore included a breath
monitoring component in its study of CO exposures in Denver and Washington,
DC.  The purpose was (1) to estimate the distribution of alveolar CO (and
therefore blood COHb) concentrations in the nonsmoking adult residents of
the two cities; and (2) to compare the alveolar CO measurements to preceding
personal CO exposures.

     The major findings of the breath monitoring program included:

1.   The percent of nonsmoking adults with alveolar CO exceeding 10 ppm
(i.e., blood COHb 2%) was 11% in Denver and 6% in Washington, DC.
                                    25

-------
2.   The correlations between breath CO and previous  8-h  CO  exposure  were
0.5 for Denver and 0.66 for Washington, DC.

3.   The correlations between personal  CO exposures  at  home  or at-work
and ambient CO at the nearest stations  averaged  0.25  at Denver and  0.19
at Washington, DC.  Thus,  the ambient data explained  little  of the
variability of CO exposure.

Sampling Protocols

     Statistical  sampling  protocols are the design  for  large-scale  total
human exposure field studies.  They describe the procedures  to be used  in
identifying respondents, choosing the sample sizes,  selecting the number
of persons to be  contacted within various subpopulations,  and other
factors.  They are essential to the total human  exposure  research program
to ensure that a  field survey will  provide the information necessary  to
meet its objectives.  Because one's activities affect one's  exposures,
another unique component of the total human exposure  research program is
the development of human activity pattern data bases.  Such data bases
provide a record  describing what people do in time  and  space.

     Whenever the objectives of a study are to make  valid  inferences  beyond
the group surveyed, a statistical survey design  is  required. For exposure
studies, the only statistically valid procedure  that  is widely accepted
for making such inferences is to select a probability sample from the
target population.  The survey designs  used in the  total  exposure field
studies have been three-stage probability-based designs, consisting of areas
defined by census tracts,  households randomly selected  within the census
tracts, and stratified sampling of screened eligible  individuals.20,24

STATISTICAL ISSUES

TEAM Design Considerations

     It appears that some  variability in the TEAM exposure data might be
due to meteorological factors such as some receptors  being downwind of  the
sources while others are not.  A more careful experimental design that
includes consideration of  these factors, including  measurement of
appropriate meteorological parameters,  may lead  to  more meaningful  data
in future studies.

     Other TEAM design considerations are:

1.   The intraperson temporal variation in VOC exposure is crucial  in
     risk assessment and should be given a high  priority  in  future  studies.

2.   Given the substantial measurement  error, the estimated  exposure
     distributions can be substantially more heterogeneous than the true
     exposure distributions.  For example, the variance of the estimated
     exposures is the sum of the variance of the true exposures and the
     variance of  the measurement errors, assuming that:  a)  measurement
     errors are homoscedastic, and b) there is no correlation between
     measurement error and true exposure.  Empirical Bayes methods are
     available for such adjustments; a simulated sketch of this decomposition
     is given after this list.
                                    26

-------
3.    The relatively high refusal  rate in the sample enrollment  is  of
     concern.   A more rigorous effort in the future to assess the  impact
     of the refusal on the generalizability  of the  sample  is desirable.
     For example, a subsample of  the accessible part of the  refusals  can
     be offered an incentive to participate, or be  offered a less  intensive
     protocol  for their participation; the data from the would-be  refusals
     can then  be compared with the "regular" participants  to assess the
     possible  magnitudes of selection bias.

4.    In future studies, the following might  be used:

     a.   use  of closed format questionnaires,
     b.   use of artificial intelligence methodology,
     c.   use  of automated instrument output.
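
The sketch below simulates the variance decomposition described in item 2,
together with a simple method-of-moments adjustment of the kind that empirical
Bayes procedures build on; all quantities are simulated, and the measurement-
error variance is assumed known.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
true_exposure = rng.lognormal(mean=3.0, sigma=0.6, size=n)  # unobserved "truth"
error = rng.normal(0.0, 10.0, size=n)                       # homoscedastic, independent
observed = true_exposure + error

# var(observed) is approximately var(true) + var(error)
print(f"var(observed) = {observed.var():.0f}; "
      f"var(true) + var(error) = {true_exposure.var() + error.var():.0f}")

# Deattenuate: subtract the (assumed known) measurement-error variance to
# estimate the variance of the true exposure distribution.
var_error = 10.0 ** 2
var_true_hat = observed.var() - var_error
print(f"estimated var(true) = {var_true_hat:.0f} vs actual {true_exposure.var():.0f}")
```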

Development of Improved Microenvironmental Monitoring Designs

     The direct method of measuring personal exposure is appealing but is
expensive and burdensome to human subjects.  Monitoring microenvironments
instead is less costly but estimates personal exposure only indirectly.  Obviously
these approaches can be used in a complementary way to answer specific
pollutant exposure questions.

     With either method, a crucial  issue is  how to  stratify  the
microenvironments into relatively homogeneous microenvironment  types
(METs).12  Usually there are many possible ways to stratify the
microenvironments into METs; thus there can be many potentially distinct
METs.  Obviously one cannot implement a stratification scheme with five
hundred METs in field studies.  It is therefore important  to develop
methods for identifying the most  informative ways to stratify the
microenvironments into METs.  For example, if we can only  afford to
distinguish two METs in a field study, is it better to distinguish indoor
and outdoor as the two METs, or is it better to distinguish awake and
sleeping as the two METs?

     Some of the more important issues which will require  additional
methodological development are:

1.    How to identify the most informative ways to stratify microenvironments
     into METs.

2.    How to optimize the number of METs, choosing between  a  larger number
     of METs and fewer microenvironments for each MET, and a smaller
     number of METs and more microenvironments for  each MET.

3.    How to allocate the number of monitored microenvironments  across
     different METs:  one should monitor more microenvironments for the
     more crucial METs  (those in which the human subjects  spend more  of
     their time) than for the less crucial METs.
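
One way to make the allocation rule in item 3 concrete is a Neyman-type rule
that allocates monitored microenvironments in proportion to the product of the
time weight and the assumed within-MET variability; the time fractions below
follow the activity figures shown later for employed persons, and the standard
deviations are hypothetical.

```python
# Allocate a fixed number of monitored microenvironments across METs in
# proportion to (fraction of person-time) x (assumed within-MET SD).
mets = {
    "indoors, home":  (0.63, 2.0),
    "indoors, work":  (0.28, 3.0),
    "in transit":     (0.06, 8.0),
    "outdoors":       (0.02, 4.0),
    "indoors, other": (0.01, 3.0),
}
total_monitors = 100
scores = {m: t * s for m, (t, s) in mets.items()}
total = sum(scores.values())
allocation = {m: round(total_monitors * sc / total) for m, sc in scores.items()}
print(allocation)   # rounding may leave the total slightly off 100
```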
                                    27

-------
Development and Validation of Improved Models  for Estimating Personal
Exposure from Microenvironmental  Monitoring Data

     Methodological  development is needed for  models  which  allow
supplementing the direct personal monitoring approach with  an activity
diary enabling these data to be combined with  indirect approach
microenvironmental data to estimate personal exposure through a regression-
like model.  The basic exposure model  which sums over microenvironments

     E_i =  Σ  c_j t_ij
            j

can be interpreted as a regression model with  the concentrations being
the parameters to be estimated.  To fully develop this approach, it is
necessary to make crucial assumptions  about independence between individuals
and between METs.  Therefore, it is very important to validate the method
empirically.
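
A minimal sketch of this regression interpretation, using simulated data:
given measured integrated exposures from personal monitors and diary times for
each person, the microenvironment concentrations are estimated by least
squares, treating the concentrations as common across people (as the
independence assumptions above require).

```python
import numpy as np

rng = np.random.default_rng(2)
true_c = np.array([2.0, 1.5, 9.0])                    # hypothetical c_j: home, office, car
n_people = 50
T = rng.uniform(0.5, 12.0, size=(n_people, 3))        # diary times t_ij (hours)
E = T @ true_c + rng.normal(0.0, 2.0, size=n_people)  # measured integrated exposures

c_hat, *_ = np.linalg.lstsq(T, E, rcond=None)         # regression estimates of c_j
print("estimated concentrations:", np.round(c_hat, 2))
```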

Errors-in-Variables Problem

     It is important to recognize an errors-in-variables situation, which
may often occur in exposure assessment:  the relationship between two
variables, Y (a health effect) and X (true personal exposure), must be
estimated when X is not observed but a surrogate of X, say Z, which is
related to X, is observed.  Such surrogates may have systematic errors as
well as zero-centered random errors.  The effects of measurement bias are
more serious in estimation situations than in hypothesis testing.
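
A simulated sketch of the classical attenuation effect when a noisy surrogate
Z replaces the true exposure X in a regression of a health measure Y on
exposure (zero-centered errors only; systematic errors would add a bias of a
different kind):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
X = rng.normal(10.0, 3.0, size=n)            # true exposure (unobserved in practice)
Z = X + rng.normal(0.0, 3.0, size=n)         # surrogate with zero-centered error
Y = 0.5 * X + rng.normal(0.0, 1.0, size=n)   # health-effect variable

# Attenuation factor var(X)/(var(X)+var(error)) = 9/18 = 0.5, so the slope on Z
# is biased toward zero by about half.
print(f"slope on X: {np.polyfit(X, Y, 1)[0]:.2f}; "
      f"slope on Z: {np.polyfit(Z, Y, 1)[0]:.2f}")
```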

Choice Between Monitoring Instruments  of Varying Precision  and Cost

     When designing monitoring programs, it is common to have available
instruments of varying quality.  Measurement devices  that are less
expensive to obtain and use are typically also less accurate and precise.
Strategies could be developed and evaluated that consider the costs of
measurement as well as the precision.  In situations of high between-
individual exposure variability, a less precise instrument of lower cost
may be preferred if it permits enough additional study subjects.
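
The trade-off can be examined with a back-of-the-envelope calculation such as
the one below, in which the costs, instrument precisions, and between-
individual variability are all hypothetical.

```python
# Compare two instruments under a fixed budget when between-individual
# variability dominates the measurement error.
budget = 10000.0
between_sd = 20.0                              # SD of exposure across individuals
instruments = {"precise": (500.0, 2.0),        # (cost per subject, instrument SD)
               "cheap":   (100.0, 10.0)}

for name, (cost, meas_sd) in instruments.items():
    n = int(budget // cost)                    # affordable number of subjects
    se_mean = ((between_sd**2 + meas_sd**2) / n) ** 0.5
    print(f"{name:7s}: n = {n:3d}, SE of estimated mean exposure = {se_mean:.2f}")
```

With these particular numbers the cheaper instrument gives the smaller standard
error, because the extra subjects matter more than the lost instrument precision.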

Development of Designs Appropriate for Assessing National Levels

     At the present time, the data available for the assessment of personal
exposure distributions are restricted to a  limited number of locales.
The generalization from existing data to a  very general population such
as the national population requires a great deal of caution.  However,  it
is conceivable that large scale studies or monitoring programs aimed at  a
nationally representative sample might  be implemented in the future.  It
would be useful to consider the design  of such studies using data presently
available.  It would also be useful to design studies of more limited
scale to be conducted in the near future as pilot studies for a possible
national study, so as to collect information that would be useful for its
design.
                                    28

-------
     An issue in the design of a national  study is  the amount  of  clustering
of the sample:  one has to decide how many locales  to use,  and how large
a sample to take for each locale.  The decision depends partly on the
fixed cost in using additional locales, and partly  on the intracluster
correlation for the locales.  For many of  the VOC's measured  in the TEAM
studies, there is far more variability within locales than between locales;
in other words, there is little intracluster correlation for the locales.
This would indicate that a national study  should be highly  clustered,
with a few locales and a large sample for  each locale.  On  the other
hand, if there is more variability between locales  than within locales,  a
national study should use many locales and a small  sample for  each locale.
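
The clustering trade-off can be explored with the usual design-effect formula,
deff = 1 + (m - 1)ρ, where m is the number of subjects per locale and ρ is the
intracluster correlation; the budget and cost figures below are hypothetical.

```python
def effective_sample_size(budget, n_locales, cost_locale, cost_subject, rho):
    n_total = int((budget - n_locales * cost_locale) / cost_subject)
    m = n_total / n_locales                    # subjects per locale
    deff = 1.0 + (m - 1.0) * rho               # variance inflation from clustering
    return n_total, n_total / deff

budget, cost_locale, cost_subject = 1_000_000, 30_000, 500
for rho in (0.001, 0.2):                       # little vs substantial intracluster correlation
    for n_locales in (5, 20):
        n, eff = effective_sample_size(budget, n_locales, cost_locale, cost_subject, rho)
        print(f"rho={rho:.3f}, locales={n_locales:2d}: n={n:4d}, effective n={eff:6.0f}")
```

With little intracluster correlation the few-locale design yields the larger
effective sample size; with substantial correlation the many-locale design
does, which is the qualitative conclusion stated above.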

     Further analysis of the existing TEAM data base can help  to  address
these issues.  For example, the TEAM sample to date can be  identified  as
a "population" from which various "samples" can be  taken.  The characteristics
of various sample types can be useful for  the design of any followup
studies as well as for a larger new study.

Evaluating Extreme Values in Exposure Monitoring

     Short term extreme values of pollutant exposure may well  be  more
important from a biological point of view  than elevated temporal  mean
values.  The study of statistical properties of extreme values from
multivariate spatio-temporally dependent data is in its infancy.   In
particular, the possibility of synergy necessitates the development of  a
theory of multivariate extreme values.  It is desirable to  develop estimates
of extreme quantiles of pollutant concentration.
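
As a univariate starting point (the multivariate, spatio-temporally dependent
case raised above is much harder), extreme quantiles can be estimated by
fitting a generalized extreme value distribution to block maxima; the data
below are simulated, not monitoring data.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(4)
hourly = rng.lognormal(mean=0.0, sigma=0.5, size=(365, 24))  # one simulated year
daily_max = hourly.max(axis=1)                               # block maxima (daily)

shape, loc, scale = genextreme.fit(daily_max)
print(f"estimated 99th percentile of the daily maximum: "
      f"{genextreme.ppf(0.99, shape, loc=loc, scale=scale):.2f}")
```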

Estimation Adjustment for Censored Monitoring Data

     One should develop low exposure level extrapolation procedures and
models, and check the sensitivity of these procedures to the models
chosen.  In some cases a substantial fraction of exposure monitoring data
is below the detection limit even though these low  exposure levels may  be
important.  The problem of extrapolating from measured to unmeasured
values thus naturally arises.  Basically this is a  problem of fitting  the
lower tail of the pollutant concentration  distribution.  Commonly used
procedures assume either that below detectable level values are actually
at the detection limit, or that they are zero, or that they are one-half
of the detection limit.

     In many monitoring situations we may  find a good fit to simple models
such as the lognormal for that part of the data which lies  above  the
detection limit.  Then the calculation of  total exposure would use a
lognormal extrapolation of the lower tail.
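
A sketch of this idea:  fit a lognormal by maximum likelihood, treating values
below the detection limit as left-censored rather than substituting zero, the
limit, or half the limit.  The data are simulated, and the detection limit is
chosen so that a substantial fraction of values is censored.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(5)
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)
dl = 0.8                                        # detection limit
observed = x[x >= dl]                           # measured values
n_censored = int(np.sum(x < dl))                # only the count is known

def neg_log_lik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    ll_obs = norm.logpdf(np.log(observed), mu, sigma) - np.log(observed)
    ll_cens = n_censored * norm.logcdf((np.log(dl) - mu) / sigma)
    return -(ll_obs.sum() + ll_cens)

fit = minimize(neg_log_lik, x0=[np.log(observed).mean(), 0.0])
print("estimated mu, sigma:", fit.x[0], float(np.exp(fit.x[1])))
```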

SUMMARY

     Personal exposure assessment is a critical link in the overall risk
assessment framework.  Recent advances in  exposure  monitoring have provided
new capabilities and additional challenges to the environmental research
team, particularly to the statistician, to improve  the current state of
                                    29

-------
information on microenvironment concentrations, activity patterns, and
particularly personal  exposure.  If these opportunities are realized,
then risk assessments  can more often use human exposure and risk data in
addition to available  animal  toxicology information.
                                    30

-------
REFERENCES

1.   Lioy, P. J., (1987)  In Depth Exposure Assessments.  JAPCA,  37,  791-
          793.

2.   Epidemiology of Air  Pollution,  National  Research  Council National
          Academy Press,  Washington, DC (1985),  1-334.

3.   Ott, W. R. (1982)  Concepts of human exposure to air pollution,
          Environ. Int.,  7, 179-196.

4.   Cortese, A. D.  and Spengler, J.D.  (1976) Ability  of fixed  monitoring
          stations to represent carbon  monoxide  exposure.  J.  Air Pollut.
          Control Assoc., 26, 1144.

5.   Flachsbart, P.  G.  and Ott, W. R. (1984)  Field Surveys of carbon
          monoxide in commercial  settings using  personal exposure monitors.
          EPA-600/4-94-019, PB-84-211291, U.S. Environmental  Protection
          Agency, Washington, DC.

6.   Wallace, L. A.  (1979) Use of personal monitor to  measure commuter
          exposure to carbon monoxide in vehicle passenger compartment.
          Paper No.  79-59.2, presented  at the 72nd Annual  Meeting of  the
          Air Pollution Control Association,  Cincinnati, OH.

7.   Ott, W. R. and Eliassen, R.  (1973) A survey technique for  determining
          the representativeness  of urban air monitoring stations with
          respect to carbon monoxide, J. Air Pollut. Control Assoc. 23,
          685-690.

8.   Ott, W. R. and Flachsbart, P. (1982) Measurement  of carbon monoxide
          concentrations  in indoor and  outdoor locations using  personal
          exposure monitors, Environ. Int. 8, 295-304.

9.   Peterson, W. B. and Allen, R. (1982) Carbon monoxide exposures to
          Los Angeles commuters,  J.  Air Pollut.  Control  Assoc.  32,  826-833.

10.  Spengler, J. D. and Soczek,  M.  L.  (1984) Evidence for improved
          ambient air quality and the need for personal  exposure research,
          Environ. Sci. Technol. 18, 268A-280A.

11.  Ott, W. R. (1985)  Total human exposure:   An emerging science focuses
          on humans as  receptors  of environmental pollution,  Environ.
          Sci. Technol. 19, 880-886.

12.  Duan, N. (1982) Models for human exposure to air pollution, Environ.
          Int. 8, 305-309.

13.  Mage, D. T. and Wallace, L.  A., eds. (1979) Proceedings  of the
          Symposium on  the Development  and Usage of Personal  Monitors  for
          Exposure and  Health Effects Studies.  EPA-600/9-79-032, PB-80-
          143-894, U.S. Environmental Protection Agency, Research Triangle
          Park, NC.
                                    31

-------
14.   Wallace,  L.  A.  (1981)  Recent  progress in developing and using personal
          monitors  to  measure  human  exposure to air pollution, Environ.
          Int. 5, 73-75.

15.   Wallace,  L.  A.  and  Ott, W.  R.  (1982) Personal monitors:  A state-of-
          the-art survey, J. Air Pollut. Control Assoc. 32, 601-610.

16.   Duan, N.  (1984) Application of  the microenvironment type approach to
          assess  human exposure  to carbon monoxide.  Rand Corp., draft
          final report submitted to  the U.S. Environmental  Protection
          Agency, Research Triangle  Park, NC.

17.   Wallace,  L.  A., Zweidinger, R., Erickson, M., Cooper,  S., Whitaker,
          D.,  and Pellizzari,  E. D.  (1982) Monitoring  individual exposure:
          Measurements of volatile organic compounds in breathing-zone
          air, drinking water, and exhaled breath, Environ.  Int. 8,  269-282.

18.   Wallace,  L., Pellizzari,  E.,  Hartwell,  T., Rosenzweig,  M., Erickson,
          M.,  Sparacino, C. and Zelon, H.  (1984) Personal exposures
          to volatile  organic  compounds:  I.  Direct measurements in
          breathing-zone air,  drinking water,  food,  and exhaled breath,
          Environ. Res. 35, 293-319.

19.   Wallace,  L., Pellizzari,  E.,  Hartwell,  T., Zelon, H.,  Sparacino,  C.,
          and Whitmore, R. (1984)  Analyses  of  exhaled  breath of 335
          urban  residents for volatile organic compounds,  in Indoor  Air,
          vol. 4:  Chemical Characterization and Personal Exposure, pp.
          15-20.  Swedish Council  for Building Research, Stockholm.

20.  Akland, G.  G., Hartwell,  T. D., Johnson,  T.R.,  and Whitmore,  R. W.
          (1985) Measuring human exposure to carbon  monoxide in Washington,
          DC,  and Denver, Colorado, during  the winter  of 1982-83,  Environ.
          Sci. Technol. 19, 911-918.

21.  Johnson,  T.  (1984) A study of  personal  exposure to carbon  monoxide
          in  Denver, Colorado.  EPA-600/4-84-015,  PB-84-146-125,
          Environmental Monitoring  Systems  Laboratory, U.S. Environmental
          Protection Agency, Research Triangle Park, NC

22.  Hartwell, T. D., Carlisle, A.  C., Michie, R.  M.,  Jr.,  Whitmore, R.
          W.,  Zelon, H. S., and Whitehurst,  D. A.  (1984) A study  of carbon
          monoxide exposure of  the  residents in Washington, DC.   Paper
          No.  121.4,  presented  at the 77th Annual  Meeting of the Air
          Pollution Control Association, San Francisco, CA.

23.  Holland,  D. M. and Mage, D.  T.  (1983)  Carbon monoxide in four  cities
          during the winter of  1981.  EPA-600/4-83-025, Environmental
          Monitoring  Systems  Laboratory, U.S. Environmental Protection
          Agency,  Research Triangle Park, NC.

 24.  Whitmore, R.  W.,  Jones,  S. M., and Rozenzeig, M.  S. (1984) Final
           sampling  report  for the study of personal CO exposure.  EPA-
           600/S4-84-034,  PB-84-181-957, Environmental  Monitoring
           Systems Laboratory,  U.S.  Environmental Protection Agency,
           Research Triangle Park, NC.

                                    32

-------
 FRAMEWORK FOR EXPOSURE ASSESSMENT (Figure 1):  outdoor emission sources and
 indoor emission sources give rise to outdoor and indoor concentrations;
 these concentrations, combined with time-activity patterns, determine total
 personal exposure, which in turn leads to internal dose, biologically
 effective dose, and, finally, health effect.

-------
 TOTAL HUMAN EXPOSURE PROGRAM GOALS:  estimate total human exposure for each
 pollutant of concern; determine major sources of this exposure; estimate
 health risks associated with these exposures; determine actions to reduce
 these risks.

-------
 PROPORTION OF TIME IN SELECTED MICROENVIRONMENTS (EMPLOYED PERSONS):
 indoors, home 63%; indoors, work 28%; in transit 6%; outdoors 2%;
 indoors, other 1%.

-------
 PROPORTION OF TIME IN SELECTED MICROENVIRONMENTS (FULL-TIME HOMEMAKERS):
 indoors, home 89%; indoors, other 5%.

-------
 MAJOR EXPOSURE SOURCES.  Outdoors:  industrial, automobile, toxic wastes,
 pesticides.  Indoors:  tobacco smoke, gas stoves, cleaners, sprays, dry
 cleaning, paints, polishes.

-------
 EXPOSURE ASSESSMENT FOR COMMUNITY STUDIES:  questionnaires, outdoor
 monitoring, indoor monitoring, personal monitoring, biological monitoring.

-------
                                        DISCUSSION
                                      William F. Hunt, Jr.
                               Chief, Monitoring and Research Branch
                                   Technical Support Division
                                Research Triangle Park, NC  27711
    William C.  Nelson's paper provides an
 excellent overview of exposure monitoring
 and associated statistical  issues.   The
 reader must keep in mind  that  the paper
 is directed at estimating  air  pollution
 in microscale  environments—in the home,
 at work,  in automobiles, etc., as well as
 in the ambient air to  which the general
 public has access.
    While   it  is  important  to  better
 understand air pollution  levels  in  each
 of these  microenvironments,  it  must  be
 clearly  understood  that  the  principal
 focus   of  the  nation's  air  pollution
 control   program    is   directed    at
 controlling ambient outdoor air pollution
 levels to which  the  general public  has
 access.   The Clean Air Act (CAA)  of  1970
 and the   CAA   of 1977  emphasized   the
 importance of  setting and  periodically
 reviewing  the   National   Ambient   Air
 Quality   Standards   (NAAQS)   for   the
 nation's   most  pervasive   ambient   air
 pollutants—particulate matter,   sulfur
 dioxide,    carbon  monoxide,    nitrogen
 dioxide,  ozone and  lead.  NAAQS(s)  were
 set to protect against both public health
 and welfare effects.
    One  of  these  pollutants,  carbon
 monoxide  (CO), is discussed extensively
 in  Dr.  Nelson's  paper.     CO  is   a
 colorless, odorless,  poisonous gas formed
 when   carbon   in  fuels is   not  burned
 completely.   Its major source is motor
 vehicle exhaust,  which contributes  more
 than    two-thirds   of   all   emissions
 nationwide.    In cities  or  areas  with
 heavy   traffic   congestion,    however,
 automobile exhaust can cause as  much  as
 95 percent of  all emissions, and carbon
 monoxide  concentrations can reach  very
 high levels.
    In Dr.  Nelson's paper,  he states  that
 the  correlations  between   personal   CO
 exposures  at home or at work and  ambient
 CO  at  the   nearest   fixed  site  air
 monitoring stations are weak.   This  does
 not  mean  from an air  pollution  control
 standpoint,   however,   that   there    is
 something  wrong with the  fixed site  CO
 monitoring  network.   As stated earlier,
 the  air  pollution  control   program  is
directed  at  controlling outdoor  ambient
 air at locations to which the public has
access.    The  microscale  CO monitoring
sites are  generally  located in areas  of
highest concentration within  metropolitan
areas at  locations to  which the  general
public has access.
   The  Federal  Motor  Vehicle   Control
Program  has  been  very  successful   in
reducing these concentrations over time.
In  fact,   CO  levels   have  dropped   32
 percent   between   1977   and  1986,   as
 measured  at  the  nation's  fixed  site
 monitoring networks.1   This  improvement
 has a corresponding benefit for people in
 office  buildings which  use the  outdoor
 ambient  air to introduce fresh air  into
 their buildings through their ventilation
 systems.    A  major benefit  occurs  for
 people who are driving  back and forth to
 work  in  their automobiles,  for new  cars
 are much less polluting  than  older cars.
 This  should  be  clearly understood  when
 trying to  interpret the major findings of
 the breath monitoring programs that  are
 described    in   Dr.   Nelson's    paper.
 Otherwise,  the reader  could  mistakenly
 conclude   that   somehow   the   Federal
 Government may be in error in using fixed
 site monitoring.  Such a conclusion would
 be  incorrect.    Further,  it  should  be
 pointed  out  that a fixed  site  network
 also  has  the  practical advantages  of
 identifying the source of the problem and
 the  amount  of  pollution  control  that
 would be needed.
   Another area of  concern  that needs  to
 be addressed  in the future  regarding  the
 breath   monitoring   program   is   the
 relationship   between  alveolar   CO   and
 blood  carboxyhemoglobin  (COHb).     Dr.
 Nelson    states   that    the    precise
 relationship   between  alveolar   CO   and
 blood  COHb  has   not  been  agreed upon.
 Given that, is there an inconsistency  in
 not   being   able  to    determine   the
 relationship   between  alveolar   CO   and
 blood COHb and  then  using  alveolar  CO
measurements   in  Washington,  D.C.   and
 Denver,  Colorado  to estimate blood COHb?
   A  final  point,  which  needs   to   be
addressed   in  the   breath  monitoring
program, is the ability to detect volatile
organic chemicals,  some  of  which  may  be
carcinogenic.  What is  the  significance
of being able to detect 100 compounds  in
 breath,  yet  only one  or two  in blood
 above the  detectable  limits?   Does  the
 body expel the other 98  compounds that
 cannot be  detected in the blood?   If  so,
 why?
 STATISTICAL ISSUES
   I   agree   with    Dr.    Nelson  that
 meteorological    factors    should    be
 incorporated  into future TEAM studies,
 through more careful experimental  design.
The statistical  issues  identified under
TEAM    design    considerations,     the
development    of    improved
microenvironmental  monitoring  designs,
errors-in-variables   problem,     choice
between monitoring instruments of  varying
precision  and cost, the  development  of
designs    appropriate    for   assessing
                                           39

-------
National   levels,   evaluating   extreme
values   in   exposure  monitoring,   and
adjusting for  censored monitoring  data
are all well thought  out  and timely.   I
strongly  agree  with  his  recommendation
that when considering multiple pollutant
species, as in the  case of  the volatile
and semi-volatile organic  chemicals,  as
well as polar compounds,  the possibility
of synergistic effects necessitates the
development of a theory of  multivariate
extreme values.
SUMMARY
   In  conclusion,   Dr.   Nelson's  paper
provides a well thought out overview of
exposure  monitoring  and  the associated
statistical  issues.    It  should  be  an
excellent reference for people interested
in  this  topic.    The  reader  should  be
aware, however, of the importance of the
nation's fixed site monitoring network in
evaluating  the   effectiveness   of  the
nation's air pollution control program.
REFERENCE
1.   National  Air  Quality  and Emissions
Trends Report, 1986.   U.S. Environmental
Protection  Agency,   Technical  Support
Division, Monitoring and Reports Branch,
Research Triangle Park,  NC  27711.
                                           40

-------
                       Designing Environmental Regulations
Søren Bisgaard and William G. Hunter*
                            Center for Quality and Productivity Improvement
                                     University of Wisconsin-Madison
                              610 Walnut Street, Madison, Wisconsin 53705
 • Public debate  on proposed environmental regulations
 often focuses almost entirely  (and naively) on the allow-
 able limit for a particular pollutant, with scant attention
 being paid to the statistical nature of environmental data
 and  to  the  operational  definition of compliance.  As  a
 consequence regulations may fail to accomplish their pur-
 pose. A unifying framework is therefore  proposed that
 interrelates assessment of risk and determination of compli-
 ance.  A  central  feature is the operating characteristic
 curve, which displays the discriminating power of a regula-
 tion.  This  framework can facilitate rational  discussion
 among scientists, policymakers, and others concerned with
 environmental regulation.
Introduction
     Over the past twenty years many new federal, state,
and local regulations have resulted from heightened con-
cern about  the damage that we humans have done to the
environment - and might do in the future.  Public debate,
unfortunately, has often focused almost exclusively on risk
assessment   and  the  allowable  limit  of a  pollutant.
Although this "limit part" of a regulation  is important, a
regulation also  includes a  "statistical part" that defines
how compliance is to be determined; even though it is typi-
cally relegated to an appendix and thus may seem unimpor-
tant, it can  have a profound effect on how the regulation
performs.
     Our purpose in this article is to  introduce  some new
ideas concerning the general  problem of designing environ-
mental regulations, and, in particular, to consider the role
of the "statistical pan" of such regulations.  As a vehicle for
illustration,   we  use  the  environmental  regulation of
ambient ozone.   Our intent  is not to provide a definitive
analysis of that  particular  problem.  Indeed, that  would
require experts  familiar with  the generation,  dispersion,
measurements,  and monitoring of ozone to analyze avail-
able data sets. Such detailed analysis  would probably lead
to the adoption  of somewhat different statistical assump-
tions  than  we  use. The  methodology described below,
however, can  accommodate  any  reasonable  statistical
assumptions for ambient ozone. Moreover, this  methodol-
ogy can be used  in the rational design of any environmental
regulation to limit exposure to any pollutant.

Ambient Ozone Standard
    For  illustrative purposes, then,  let  us consider the
ambient ozone standard (1,2). Ozone  is a reactive form of
oxygen that has serious health effects.  Concentrations from
about  0.15  parts per  million (ppm),  for example, affect

*) Deceased.
respiratory mucous membranes and other lung tissues  in
sensitive individuals as well as healthy exercising persons.
In 1971, based on the best scientific studies at the time, the
Environmental Protection Agency (EPA) promulgated a
National  Primary  and Secondary Ambient  Air  Quality
Standard ruling that "an hourly average level of 0.08 parts
per million (ppm) not to be exceeded more than 1 hour
per year."  Section 109(d) of the Clean Air Act calls for a
review every  five years of  the Primary National Ambient
Air Quality Standards.  In 1977 EPA announced that it was
reviewing  and  updating the  1971  ozone  standard.   In
preparing a new criteria document, EPA provided a number
of opportunities for external  review and comment. Two
drafts of the  document were  made  available for external
review.  EPA  received  more than  50 written  responses  to
the first draft and approximately  20 to the  second draft.
The American Petroleum Institute (API), in particular, sub-
mitted extensive comments.
     The criteria document was the subject  of two meet-
ings of the Subcommittee on Scientific Criteria for Photo-
chemical Oxidants of EPA's Science Advisory Board. At
each of these meetings, which were open to the public, crit-
ical review and new information were presented for EPA's
consideration. The Agency was petitioned by the API and
29 member companies  and  by the  City of Houston  around
the time the revision was announced.  Among other things,
the petition  requested that EPA  state the  primary and
secondary standards in such a way as to permit  reliable
assessment of compliance.  In the Federal Register  it  is
noted that
     EPA agrees that the present deterministic form of
     the oxidant standard has several limitations and
     has  made  reliable  assessment  of compliance
     difficult.  The revised ozone air quality standards
     are stated  in a statistical form that will  more
     accurately reflect the air quality problems in vari-
     ous regions of the country and  allow more reli-
     able assessment of compliance with  the  stan-
     dards. (Emphasis added)
Later, in the  beginning of  1978,  the EPA held a public
meeting to receive comments from interested parties on the
initial proposed revision of  the  standard.  Here  several
representatives from  the State and  Territorial Air Pollution
Program Administrators (STAPPA) and the Association  of
Local Air Pollution  Control Officials participated.  After
the proposal was published in the spring of 1978, EPA held
four public meetings to receive comments on  the proposed
standard revisions. In addition,  168 written comments were
received during  the formal  comment period.  The  Federal
Register summarizes the comments as follows:
     The majority  of comments received (132 out of
     168) opposed EPA's proposed standard  revision,
     favoring  either a  more  relaxed or  a  more
                                                      41

-------
     stringent  standard.  State  air  pollution control
     agencies (and STAPPA) generally  supported  a
     standard level of 0.12 ppm on the basis of their
     assessment  of an adequate  margin of  safety.
     Municipal groups generally supported a standard
     level of 0.12 ppm or higher, whereas most indus-
     trial  groups  supported a standard level of 0.15
     ppm or higher. Environmental groups generally
     encouraged EPA to retain the 0.08 ppm standard.
As reflected in this statement, almost all of the public dis-
cussion of the ambient ozone standard (not just the 168
comments summarized here) focused on  the limit part  of
the regulation. In this instance, in  common with similar
discussion of other environmental regulations, the statisti-
cal part of the regulation was largely ignored.
     The  final rule-making  made the  following  three
changes:

(1)  The primary standard was raised to 0.12 ppm.
(2)  The secondary standard was raised to 0.12 ppm.
(3)  The  definition of the point at  which the  standard is
     attained was changed to "when the expected number
     of days per calendar year with maximum hourly
     average concentration above 0.12  ppm is equal to  or
     less than one."

The Operating Characteristic Curve
     Environmental regulations have a structure similar to
that of  statistical  hypothesis tests. A regulation states how
data are to be used to decide whether a particular site is in
compliance with a specified standard, and  a hypothesis test
states how a particular set of data are to be used to decide
whether they are  in reasonable agreement with a specified
hypothesis.  Borrowing the terminology and  methodology
from hypothesis testing, we can say there  are two types  of
errors that can be made because of the stochastic nature  of
environmental data: a site that is really in compliance can
be declared out of compliance (type  I error) and vice  versa
(type II error).  Ideally the probability of committing both
types of error should be zero. In practice, however, it is not
feasible to obtain this ideal.
     In  the context of environmental regulations, an operat-
ing characteristic curve is the probability of declaring a site
to be in compliance (d.i.c.) plotted  as a function of  some
parameter θ such as the mean level of a pollutant.  This
Prob{d.i.c. | θ} can be used to determine the probabilities
of committing type I and type II errors.  As long as θ is
below the stated standard, the probability of a type I error
is 1 - Prob{d.i.c. | θ}.  When θ is above the stated
standard, Prob{d.i.c. | θ} is the probability of a type II
error. Using the operating characteristic curve for the old
and the  new regulations for ambient ozone, we can evalu-
ate them to see what was accomplished by the revision.
     The old standard stated that "an hourly average  level
of 0.08 ppm [was] not to be exceeded more than 1 hour per
year." This standard was therefore defined operationally  in
terms of the observations themselves. The new standard, on
the other hand, states that the expected number of days per
calendar year with a maximum hourly average concentra-
tion above 0.12 ppm should be less than one.  Compliance,
however, must be determined in terms of the actual  data.
not an  unobserved expected number.  How should  this
conversion be made?  In  Appendix D of the new ozone
regulation, it is stated that:
    In  general,  the average number of exceedances
    per calendar year must be less than or equal to 1.
    In  its simplest form, the number of exceedances
    at a monitoring  site would be recorded for each
    calendar year and then averaged over the past 3
    calendar years to determine if this average is less
    than or equal to 1.
Based on  the stated requirements  of compliance, we have
computed the operating characteristic functions for the old
and the  new ozone regulations. They are plotted in Figures
1 and 2. (The last sentence in the legend for Figure 1 will
be discussed below in the following section, Statistical
Concepts.)  To construct these curves, certain simplifying
assumptions were made, which are discussed in the section
entitled  "Statistical Concepts."   Before  such curves  are
used in practice,  these assumptions need to be investigated
and probably modified.
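     To make the decision rule concrete, the simplest form of the
attainment test quoted above can be written out as a short
computation. The sketch below (in Python, purely as an illustration;
it is not EPA's official procedure, which also handles missing data
and other adjustments) takes three calendar years of daily maximum
hourly averages and applies the three-year averaging rule.

    def in_compliance(daily_max_by_year, limit=0.12):
        """Simplest form of the new ozone attainment test: average the
        annual counts of days whose maximum hourly concentration exceeds
        `limit` over three calendar years; attainment requires an
        average of at most one exceedance per year."""
        assert len(daily_max_by_year) == 3
        annual_counts = [sum(1 for d in year if d > limit)
                         for year in daily_max_by_year]
        return sum(annual_counts) / 3.0 <= 1.0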
    According to  the main part of the new ozone regula-
tion,  the  interval from  0 to  1 expected number  of
exceedances of  0.12  ppm per  year  can be regarded  as
defining  "being in compliance." Suppose the decision
rule outlined above is used for a site that is operating at a
level such that the expected number of days exceeding 0.12
ppm is just below one.  In that case, as was noted by Javitz
(3), with the new ozone regulation, there is a probability of
approximately 37% in any given year  that such a site will
be declared out of compliance.  Moreover, there is approxi-
mately a  10% chance of not detecting a violation of 2
expected days per year above the 0.12 ppm limit; that is,
the standard operates  such that the probability is  10% of
not detecting occurrences when the actual value is twice its
permissible value (2 instead of 1). Some individuals may
find these probabilities (37% and 10%) to be surprisingly
and unacceptably high, as we do. Others, however, may
regard them as being reasonable or too low. In this paper,
our point is not to pursue that particular debate. Rather, it
is  simply  to argue that, before environmental regulations
are put  in place, different segments of society need to be
aware of such operating characteristics, so that informed
policy decisions can be made.  It is important to realize that
the relevant operating characteristic  curves can  be con-
structed before a regulation is promulgated.

Statistical Concepts
     Let X denote a measurement from an instrument such
that X = θ + e, where θ is the mean value of the pollutant
and e is the statistical error term with variance σ².  The
term e contains not only the error arising from an imperfect
instrument but also the fluctuations in the level of the pol-
lutant itself.  We assume that the measurement process is
well calibrated and that the mean value of e is zero.  The
parameters θ and σ² of the distribution of e are unknown
but estimates of them can be obtained from data.  A
prescription of how the data are to be collected is known as
the sampling plan. It addresses the questions of how many,
where, when, and how observations are to be collected.
Any function f(X) = f(X₁, X₂, …, Xₙ) of the observa-
tions is an estimator, for example, the average of a set  of
values or the number of observations  in a sample above a
certain limit.  The value of the function f for a given sam-
                                                        42

-------
ple is an estimate. The estimator has a distribution, which
can be determined from the distribution of the observations
and the functional form of the estimator. With the distribu-
tion of the estimator, one can answer questions of the form:
what is the probability that the estimate f = f(X) is smaller
than or equal to some critical value c? Symbolically this
probability can be written as P = Prob{f(X) ≤ c | θ}.
     If we want to have a regulation limiting the pollution
to a certain level, it is not enough to state the limit as a par-
ticular value of a parameter.  We must define compliance
operationally in terms of the observations. The condition of
compliance therefore takes the form of an estimator
f(X₁, …, Xₙ) being less than or equal to some critical
value c, that is, {f(X₁, …, Xₙ) ≤ c}.  Regarded as a func-
tion of θ, the probability Prob{f(X₁, …, Xₙ) ≤ c | θ} is the
operating characteristic function of the regulation.
     For the old standard the limit is L = 0.08 ppm. Let
I_L(X) denote an indicator function that is equal to one if
the hourly observation X is above L and zero otherwise.  A year consists of
approximately n = 365 x 12 = 4380 hours of observations
(data are only taken from 9:01 am to 9:00 pm LST). The
expected number of hours per year above the limit is then

     θ = E{ Σ_{i=1}^{4380} I_L(X_i) } = p_L x 4380,

where p_L = Prob{X > L}. The probability that a site is declared to be in compliance
(d.i.c.) is

     P_old = Prob{d.i.c. | θ}
           = Prob{ Σ_{i=1}^{4380} I_L(X_i) ≤ 1 | θ }                  (1)
           = Σ_{i=0}^{1} C(4380, i) p_L^i (1 - p_L)^{4380-i}.
This probability P_old, plotted as a function of θ, is the
operating characteristic curve for the old regulation (Figure
1). Note that if the old standard had been written in terms
of an allowable limit of one for the expected number of
exceedances above 0.08 ppm, the maximum type I error
would be 1.00 - 0.73 = 0.27. The old standard, however, is
actually written in terms of the observed number of
exceedances, so type I and type II errors, strictly speaking,
are undefined.
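     Equation (1) is easy to evaluate. The following sketch (our own
illustration, not code used for the published figures) computes P_old
as a function of θ by converting θ to the hourly exceedance
probability p_L = θ/4380 and summing the two binomial terms.

    from scipy.stats import binom

    N_HOURS = 4380  # 12 monitored hours per day x 365 days

    def p_old(theta):
        """Operating characteristic of the old standard, equation (1):
        probability that at most one of the 4380 hourly observations
        exceeds 0.08 ppm when the expected number of exceedances is theta."""
        p_L = theta / N_HOURS
        return binom.cdf(1, N_HOURS, p_L)

    # For example, p_old(1.0) is roughly 0.73, consistent with the maximum
    # type I error of about 0.27 noted in the legend of Figure 1.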
     The condition of compliance stated in the new regula-
tion is that the "expected number of days per calendar year
with daily maximum ozone concentration exceeding 0.12
ppm must be less than or equal to 1."  Let Y_j represent the
daily maximum hourly average (j = 1, …, 365).  Suppose
the random variables Y_j are independently and identically
distributed.  EPA proposed that the expected number of
days (a parameter) be estimated by a three-year moving
average of exceedances of 0.12 ppm.  A site is in compli-
ance when the moving average is less than or equal to 1.
The expected number of days above the limit of L = 0.12
ppm is then

     θ = E{ Σ_{j=1}^{365} I_L(Y_j) } = p_L x 365,

where p_L is now the probability that a daily maximum exceeds L.
     The  three-year specification  of  the  new  standard
makes it hard to compare with the previous one-year stan-
dard. If, however, one computes the conditional probability
that  the number of exceedances in the present year is less
than or equal to 0, 1,2 and 3 and multiplies that by the pro-
bability that the number of exceedances was 3, 2,  1 and 0,
respectively, for the previous two years, one then obtains a
one-year operating characteristic function.

     P_new = Prob{d.i.c. | θ} = Σ_{k=0}^{3} Prob{d.i.c. | k, θ} P(k),

where

     P(k) = Prob{ Σ_{j=1}^{2x365} I_L(Y_j) = k }
          = C(730, k) p_L^k (1 - p_L)^{730-k}

and

     Prob{d.i.c. | k, θ} = Σ_{j=0}^{3-k} C(365, j) p_L^j (1 - p_L)^{365-j},

where k = 0, 1, 2, 3.  A plot of the operating characteristic
function for the new regulation, P_new versus θ, is presented
in Figure 2.
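     Under the same i.i.d. simplification, the one-year operating
characteristic of the new standard follows directly from the
expression above. The sketch below is our illustration of that
calculation; like Figures 1 and 2, it rests on the simplifying
assumptions discussed in this section.

    from scipy.stats import binom

    def p_new(theta):
        """One-year operating characteristic of the new standard:
        probability of being declared in compliance when the expected
        number of days per year above 0.12 ppm is theta."""
        p_L = theta / 365.0            # daily exceedance probability
        p_dic = 0.0
        for k in range(4):             # exceedances carried in from the previous two years
            prob_k = binom.pmf(k, 2 * 365, p_L)
            # in compliance this year only if the three-year total stays at or below 3
            p_dic += prob_k * binom.cdf(3 - k, 365, p_L)
        return p_dic

    # Evaluating p_new over a grid of theta values traces a curve of the
    # kind shown in Figure 2.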
     Figures  1  and 2  show the operating  characteristic
curves computed as a function of (1) the expected number
of hours per year  above  0.08 ppm  for  the old  ambient
ozone regulation and (2) the expected number of days
per year  with a maximum hourly observation above 0.12
ppm for  the  new ambient ozone regulation. We  observe
that the 95 % de facto limit (the parameter value for which
the site in a given year will be declared to be in compliance
                                                          43

-------
with 95 % probability) is 0.36 hours  per year exceeding
0.08 ppm for the old standard and 0.46 days per year
exceeding 0.12 ppm for the new standard. If the expected
number of hours of exceedances of 0.08 ppm is one (and
therefore in  compliance), the probability is approximately
26% of declaring a site to be not in compliance with the old
standard. If  the expected number of days exceeding 0.12
ppm is one (and therefore in compliance), the probability is
approximately 37% of declaring a site  to be not in compli-
ance with the new standard. (We are unaware of any other
legal context in which type I errors of this magnitude
would be considered reasonable.)
     Neither curve provides sharp discrimination between
"good" and "bad" values of θ. Note that the old standard
did  not  specify  any parameter value above which non-
compliance  was  defined.  The new  standard, however,
specifies that one  expected day  is the limit, thereby creating
an inconsistency between what the regulation says and how
it operates because of the large discrepancy between  the
stated limit and the operational limit.
     The construction of Figures 1 and 2 only requires the
assumption  that  the relevant  observations  are approxi-
mately identically and independently distributed (for the
old standard, the relevant observations are  those for the
hourly ambient ozone measurements; for the  new standard,
they are the maximum hourly average  measurements of the
ambient ozone measurements each day).  The construction
does not require  knowledge of the distribution of ambient
 ozone observations.  If one has an estimate of this distribu-
 tional form, however, a direct comparison of the new  and
 old regulation is  possible in terms of the concentration of
 ambient ozone (in units, say, of ppm.)  To illustrate  this
point, suppose the random variable X_i is independently
and identically distributed according to a normal distribu-
tion with mean μ and variance σ², that is, X_i ~ N(μ, σ²).
Then the probability of one observation being above the
limit L = 0.08 is

     p_L = Prob{X_i > L} = 1 - Φ((L - μ)/σ),                  (4)

where Φ(·) is the cumulative distribution function of the stan-
dard normal distribution.  The probability that a site is
declared to be in compliance can be computed as a function
of μ by substituting p_L from (4) into (1).
     For the new regulation let X_ij represent the one-hour
average (i = 1, …, 12; j = 1, …, 365), and let
Y_j = max{X_1j, …, X_12j}.  If X_ij ~ N(μ, σ²), then

     Prob{Y_j > 0.12} = 1 - [Φ((0.12 - μ)/σ)]^12,

and by substituting this probability for p_L in the expression
for P_new, one obtains the operating characteristic function for the
new standard.
     For a fixed value of the variance σ², one can compute
the operating characteristic curves for the old and new
regulations to provide a graphical comparison of the way
these two regulations perform. Figure 3 shows these curves
for the old and new ambient ozone regulations computed as
a function of the mean hourly values when it is assumed
that σ = 0.02 ppm. We observe that the 95% de facto limit
is changed from 0.0046 ppm to 0.045 ppm. That is, it is
approximately ten times higher in the new ozone regula-
tion.
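     If the normality assumption behind equation (4) is adopted, both
operating characteristic curves can be put on the common ppm scale of
Figure 3. The sketch below is our illustration under that assumption
(σ = 0.02 ppm and 12 independent hourly values per day); it is not
the computation that produced the published figure.

    from scipy.stats import binom, norm

    SIGMA = 0.02  # assumed hourly standard deviation, ppm

    def p_old_of_mu(mu, sigma=SIGMA):
        """Old standard: at most one of 4380 hourly values above 0.08 ppm."""
        p_hourly = 1.0 - norm.cdf((0.08 - mu) / sigma)        # equation (4)
        return binom.cdf(1, 4380, p_hourly)

    def p_new_of_mu(mu, sigma=SIGMA):
        """New standard (one-year form): daily maximum of 12 hourly values."""
        p_daily = 1.0 - norm.cdf((0.12 - mu) / sigma) ** 12   # max of 12 iid normals
        return sum(binom.pmf(k, 730, p_daily) * binom.cdf(3 - k, 365, p_daily)
                   for k in range(4))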
     We have  three observations to offer with regard to the
old and new regulations for ambient ozone standards. First,
notwithstanding EPA's comment to  the contrary, the  new
ozone regulation  is  not more statistical than the previous
one;  like all environmental regulations, both the new and
old ozone regulations contain statistical parts, and, for that
reason,  both  are statistical.  Changing the specification
from one in terms of a critical value to one in terms of a
parameter does not  make  it more statistical.  It actually
introduced an inconsistency.  The old standard did not
specify any parameter value as a limit but only an opera-
tional limit in terms of the observations.  This therefore con-
stitutes the standard. The new standard, however, specifies
not only an intent in terms of what the desired limit is but
also an operational limit.  The large difference between the
intended limit and the operational limit constitutes the incon-
sistency. This inconsistency is a potential and unnecessary
source of conflict. Second, the new regulation is dependent
on the ambient ozone level for the past two years  as well as
the present year, which  means  that a sudden rise in the
ozone level might be detected more slowly.  The new regu-
lation is also more complicated.  Third, it is unwise first to
record and store every single hourly observation and then
to use only the binary observation as to whether the daily
maximum  is  above  or below 0.12  ppm.  This procedure
wastes valuable scientific information.  As a matter of pub-
lic policy, it  is  unwise to use the data in a binary  form
when they  are already measured on a continuous scale.
The estimate of the 1/365 percentile is an unreliable statis-
tic.  It is for this reason that type I and type II errors are as
high as they  are. In fact, the natural variability of this
statistic is of the  same order of magnitude as the change in
the limit which was so much in debate.

     If instead, for example, one used a procedure based on
 the  t-statistic  for control of the proportion above the  limit,
 as is commonplace in industrial  quality control procedures
 (4), one would get the operating characteristic curve plotted
 in Figure 4 (see also appendix).  For comparison, the curve
 for  the new regulation is also plotted as a function of the
 expected number of exceedances per year. With  the new
 ozone regulation, the probability  can exceed 1/3 that a par-
 ticular  site will be declared out of compliance  when it is
 actually in compliance.  The operating characteristic  curve
 for the t-test is steeper (and hence has more discriminating
 power) than that for the new  standard. The modified pro-
 cedure based  on the t-test generally reduces the probability
 that sites that are actually in compliance will be declared to
 be out of compliance. In fact, it is constructed so that there
is a 5% chance of declaring that a site is out of compliance
 when it  is actually  in compliance  in the  sense  that the
 expected exceedance number is one per year. Furthermore,
 when a violation has occurred, it is much more certain  that
                                                         44

-------
 it will  be detected  with  the  t-based procedure. In this
 respect, the t-based procedure provides more protection to
 the public.
      We do not conclude  that procedures based on the t-
 test are best. We  merely  point out  that  there are alterna-
 tives to the procedures used in the old and new ozone stan-
 dard. A basic principle is that information is lost when data
 are collected on a  continuous scale and then reduced to a
 binary form.  One of the  advantages of procedures based on
 the t-test is that they do not waste information in this way.
     The most important point to be made goes beyond the
 regulation of ambient ozone; it applies to regulation of all
 pollutants where there is a desire to  limit exposure.  With
 the aid of operating characteristic curves, informed judge-
 ments can  be made when  an environmental  regulation is
 being  developed.  In particular, operating  characteristic
 curves for  alternative forms of a regulation can be con-
 structed and compared before a final one is selected. Also,
 the robustness of a regulation to changes in  assumptions,
 such as normality and statistical independence of observa-
 tions, can  be  investigated prior to the promulgation.  Note
 that environmental  lawmaking, as it concerns the design of
 environmental regulations,  is similar to design of scientific
 experiments.  In both contexts, data should be collected in
 such a  way that clear answers will emerge to questions of
 interest, and careful forethought can ensure that this desired
 result is achieved.

 Scientific Framework
     The operating  characteristic curve is only one com-
 ponent  in a more comprehensive scientific framework that
 we would like to promote for the design of environmental
 regulations. The key elements in this process are:

 (a) Dose/risk curve
 (b) Risk/benefit analysis
 (c) Decision on maximum acceptable risk
 (d) Stochastic  nature of the pollution process
 (e) Calibration of measuring instruments
 (f) Sampling plan
 (g) Decision function
 (h) Distribution theory
 (i) Operating characteristic function

 Currently there may be some instances in which all of these
 elements are considered in some form when environmental
regulations are designed. Because the particular purposes
 and techniques are not explicitly isolated and defined, how-
 ever, the resulting regulations are not  as clear nor as effec-
 tive as they might otherwise be.
     Often the first  steps towards establishing an environ-
 mental  regulation  are (a)  to  estimate  the  relationship
 between the "dose" of a pollutant and some measure of
 health risk associated with it and (b) to carry out a formal
 or informal risk/benefit analysis.  The problems associated
 with  estimating   dose/risk  relationships   and  doing
 risk/benefit analyses are numerous and complex, and uncer-
 tainties can never be completely eliminated. As a next step
 a  political  decision is made -  based  on this  uncertain
 scientific and economic groundwork - as to the maximum
risk that is acceptable to society (c). As indicated in Figure
5,  the   maximum acceptable risk implies,  through the
dose/risk curve, the  maximum  allowable dose. The first
three elements  have  received considerable attention when
environmental regulations have been  formulated,  but the
last  six  elements have  not  received the  attention  they
deserve.
     The maximum allowable dose defines the compliance
set Θ₀ and the noncompliance set Θ₁, which is its comple-
ment.  The pollution process can be considered (d) as a sto-
chastic process or statistical time-series φ(θ; t). Fluctua-
tions in the measurements X can usefully be thought of as
arising from three sources: variation in the pollution level
itself φ, the bias b in the readings, and the measurement
error e. Thus X = φ + b + e.  Often it is assumed that φ = θ,
a fixed constant, and that variation arises only from the
measurement error e; however, all three components
φ, b, and e can vary. Ideally b = 0 and the variance of e is
small.
     Measurements will  only have scientific meaning if
there is a detailed operational description of how the meas-
urements are to be obtained and the measurement  process
is in a state of statistical control. A regulation must include
a  specification  relating to how the instruments are to be
calibrated (e). These descriptions must be an integral part
of a regulation if it is going to be meaningful. The  subject
of measurement  is deeper than is  generally recognized,
with  important implications for environmental regulation
(5, 6, 7). The pollution process and the  observed  process
as a  function of time are indicated in Figure 5.
     Logically the next question is (f) how best to obtain a
sample X = (X₁, X₂, …, Xₙ) from the pollution process.
The answer to this question will be related to the form of
the estimator f(X) and (g) the decision rule

        d(f(X)) = 0 : process in compliance
                  1 : process not in compliance.

The sample, the estimator, and the decision function are
indicated in Figure 5.  Based on knowledge about the sta-
tistical distribution of the sample (h), one can compute (i)
the operating characteristic function
P = Prob{d(f(X)) = 0 | θ} and plot the operating charac-
teristic curve P versus θ. An operating characteristic func-
tion is drawn at the bottom  of Figure 5. (In practice it
would probably be desirable to construct more than one
curve because, with different assumptions, different curves
will result).  Projected back on the dose/risk relationship
(see  Figure  5),  this  curve  shows the  probability of
encountering various risks for different values  of 9 if the
proposed environmental regulation is  enacted.  Suppose
there is a reasonable probability that the pollutant levels
occur in the range where the rate of change of the dose/risk
relationship is appreciable; then the steeper  the dose/risk
function, the  steeper the operating  characteristic curve
needs to be if the regulation is to offer adequate protection.
The promulgated regulation should be expressed in terms
of an operational definition that involves measured quanti-
ties, not parameters.  Figure 5 provides  a convenient sum-
mary of our proposed framework  for designing environ-
mental regulations.
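     When the decision function or the distributional assumptions make
a closed-form operating characteristic awkward to derive, element (i)
can still be obtained by simulation once elements (d) through (h)
have been specified. The sketch below is a generic illustration of
this idea; the lognormal pollution process, the sample size, and the
particular estimator and decision rule are assumptions chosen only
for the example.

    import numpy as np

    rng = np.random.default_rng(1)

    def oc_curve(theta_grid, f, d, n=365, reps=2000):
        """Estimate P(theta) = Prob{ d(f(X)) = 0 | theta } by simulation.
        The pollution process is taken, purely for illustration, to be
        lognormal with mean theta and coefficient of variation 0.5."""
        sigma2 = np.log(1.0 + 0.25)                  # cv = 0.5
        curve = []
        for theta in theta_grid:
            mu = np.log(theta) - sigma2 / 2.0
            in_compliance = 0
            for _ in range(reps):
                x = rng.lognormal(mu, np.sqrt(sigma2), size=n)
                in_compliance += (d(f(x)) == 0)
            curve.append(in_compliance / reps)
        return curve

    # Example decision rule: compliant (0) if the sample mean is below 0.10 ppm.
    probs = oc_curve([0.04, 0.08, 0.12], f=np.mean,
                     d=lambda stat: 0 if stat < 0.10 else 1)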
     In environmental  lawmaking, it is most  prudent to
consider  a  range  of plausible assumptions.  Operating
                                                        45

-------
characteristic curves will sometimes change with different
geographical areas to a significant degree. Although this is
an awkward  fact  when  a  legislative,  administrative, or
other body is trying to enact regulations at an international,
national, or other level,  it is better to face the problem as
honestly as possible and deal with it rather than pretending
that it does not exist.

Operating Characteristic Curve as a Goal, Not a Conse-
quence
     We suggest that operating characteristic curves be
published  whenever  an  environmental  regulation  is
promulgated that involves a pollutant the level of which is
to be controlled. When  a regulation is being developed,
operating characteristic curves for various alternative forms
of the  regulation should  be examined.  An  operating
characteristic  curve  with specified desirable properties
should be viewed as a goal, not  as something to compute
after a regulation has been promulgated.  (Nevertheless, we
note in  passing  that it would  be informative to compute
operating characteristic curves for existing  environmental
regulations.)
     In  summary, the following procedure might be feasi-
ble.  First, based on scientific and economic studies of risks
and benefits associated with exposure to a particular pollu-
tant, a political decision would be reached concerning the
compliance  set  in the  form of an interval of  the  type
0 < 9 < Q0 for a  parameter of the distribution of the pollu-
tion process. Second, criteria for desirable sampling plans,
estimators, and  operating characteristic curves would be
established.  Third, attempts would be  made to create a
sampling plan and estimators  that would meet these cri-
teria. The costs associated with different sampling plans
would be estimated. One possibility is that the desired pro-
perties of the  operating characteristic  curve might not be
achievable at a reasonable cost.  Some iteration and even-
tual compromise may be required among the stated criteria.
Finally, the promulgated regulation  would be expressed in
terms of an operational definition that involves measured
quantities, not parameters.
     Injecting parameters into  regulations, as was done in
the new ozone standard,  leads  to unnecessary questions of
interpretation and  complications in enforcement.  In  fact,
inconsistencies    (such    as    that     implied    by
Prob{f(X)
-------
ties of violations not being detected (type n errors); indus-
tries  would  know  the probabilities  of  being  accused
incorrectly  of violating standards (type I errors);  and all
parties would know the costs associated with various  pro-
posed environmental control schemes. We believe that the
operating characteristic curve is a simple, yet comprehen-
sive device for presenting and comparing different alterna-
tive  regulations  because it  brings  into the open many
relevant and sometimes subtle points. For many people it
is unsettling to realize that type I and type II errors will be
made, but it is unrealistic to develop regulations pretending
that such errors do  not occur. In fact, one of  the central
issues that should be faced in formulating effective and fair
regulations is the estimation and balancing of the probabili-
ties of such occurrences.

Acknowledgments
     This research was supported by grants SES-8018418
and DMS-8420968 from the National Science Founda-
tion. Computing was facilitated by access to the research
computer at the Department of Statistics, University of
Wisconsin, Madison.
Appendix
     The t-statistic procedure is based on the estimator
f(x) = (L - x̄)/s, where L is the limit (0.12 ppm), x̄ the sam-
ple average, and s the sample standard deviation. The deci-
sion function is

        d(f(x)) = in compliance      if f(x) > c
                  not in compliance  if f(x) ≤ c.

The critical value c is chosen so that

        Prob{ (L - x̄)/s > c | (L - μ)/σ = z₀ } = 1 - α,          (A2)

where z₀ = Φ⁻¹(1 - θ₀) and θ₀ is the fraction above the
limit we at most want to accept (here 1/365).
     The exact operating characteristic function is found
by reference to a non-central t-distribution, but for all prac-
tical purposes the following approximation is sufficient:

        Prob{ (L - x̄)/s > c } ≈ Φ( ((L - μ)/σ - c) / √(1/n + c²/(2n)) ).   (A3)

The operating characteristic function in Figure 4 is con-
structed using α = 0.05, θ₀ = 1/365 and n = 3 x 365. Substitut-
ing (A3) into (A2) yields

        Φ( (z₀ - c) / √(1/n + c²/(2n)) ) = 1 - 0.05,             (A4)

which solved for the critical value yields c = 2.6715. Refer,
for example, to (4) for more details.
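     A short computation illustrates how the approximation (A3) and the
defining condition (A4) determine the critical value and the curve in
Figure 4. The code below is our sketch; it uses the normal
approximation rather than the exact non-central t-distribution, with
α = 0.05, θ₀ = 1/365, and n = 3 x 365 as stated above.

    from math import sqrt
    from scipy.optimize import brentq
    from scipy.stats import norm

    N = 3 * 365
    THETA0 = 1.0 / 365.0
    ALPHA = 0.05

    def oc_t(theta, c, n=N):
        """Approximate Prob{ (L - xbar)/s > c } when the true fraction of
        observations above the limit is theta (equation (A3))."""
        z_theta = norm.ppf(1.0 - theta)
        return norm.cdf((z_theta - c) / sqrt(1.0 / n + c * c / (2.0 * n)))

    # Critical value from (A4): pass with probability 1 - alpha at theta = theta0.
    c = brentq(lambda cc: oc_t(THETA0, cc) - (1.0 - ALPHA), 0.0, 10.0)
    # c comes out close to the 2.67 quoted above.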
Literature Cited
(1)  National Primary and Secondary Ambient Air Quality
     Standards, Federal Register 36, 1971, pp 8186-8187.
     (This final rulemaking document is referred to in this
     article as the old ambient ozone standard.)
(2)  National Primary and Secondary Ambient Air Quality
     Standards, Federal Register 44, 1979, pp 8202-8229.
     (This final rulemaking document is referred to in this
     article as the new ambient ozone standard.) The back-
     ground material we summarize is contained in this
     comprehensive reference.
(3)  Javitz, H. J. J. Air Poll. Con. Assoc. 1980, 30, pp 58-59.
(4)  Hald, A. "Statistical Theory with Engineering Appli-
     cations"; Wiley, New York, 1952; pp 303-311.
(5)  Hunter, J. S. Science 1980, 210, pp 869-874.
(6)  Hunter, J. S. In "Appendix D", Environmental Moni-
     toring, Vol IV, National Academy of Sciences, 1977.
(7)  Eisenhart, C. In "Precision Measurements and Cali-
     bration", National Bureau of Standards Special Publi-
     cation 300, Vol. 1, 1969; pp 21-47.
(8)  Porter, W. P.; Hinsdill, R.; Fairbrother, A.; Olson, L. J.;
     Jaeger, J.; Yuill, T.; Bisgaard, S.; Hunter, W. G.;
     Nolan, K. Science 1984, 224, pp 1014-1017.
(9)  Rogers, W. H. "Handbook of Environmental Law";
     West Publishing Company, St. Paul, MN, 1977.
                                                        47

-------
Figure 1. Operating characteristic curve for the  1971 ambient ozone standard (old
standard), as a function of the expected number of hours of exceedances of 0.08 ppm
per year.  Note that if the old standard had been written in terms of an allowable limit
of one for the expected number of exceedances above 0.08 ppm, the maximum type I
error would be 1.00 - 0.73 = 0.27.

Figure 2. Operating characteristic curve for the 1979 ambient ozone standard (new
standard), as a function of the expected number of days of exceedances of 0.12 ppm
per year. Note that the maximum type I error is 1.00 - 0.63 = 0.37.

Figure 3.  Operating characteristic curves for the old and the new standards as a func-
tion of the mean value of ozone measured in parts per million when it is assumed that
ozone measurements are normally and independently distributed with σ = 0.02 ppm.

Figure 4.  Operating characteristic curves for the new ozone standard and  a t-statistic
alternative as a function of the expected number of exceedances per year.

Figure 5.  Elements of the environmental standard-setting process: Laboratory experi-
ments and/or epidemiological studies are used to assess the dose/risk relationship. A
maximum acceptable risk is determined through a political  process balancing risk and
economic factors. The maximum  acceptable risk implies a limit for the "dose" which
again implies a limit for the pollution process as a function of time. Compliance with
the standard is operationally determined based on a discrete sample X taken from a
particular site.  The decision about whether a site is in compliance is reached through
use of a statistic f and a decision function d.  Knowing the statistical nature of the pol-
lution process, the sampling plan,  and the functional form of the statistics and the
decision function, one can compute the operating characteristic function.  Projecting
the operating characteristic function back on the dose/risk relationship, one can assess
the probability of encountering various levels of undetected violation of the standard.
[Figure 1 plot: Prob(d.i.c.) versus expected number of hours above 0.08 ppm.]
                                     48

-------
[Figure plot (rotated in the original): Prob(d.i.c.), with the old de facto limit, the new
de facto limit, and the limit specified by the new standard indicated.]

-------
[Figure plot: Prob(d.i.c.) versus expected number of days above 0.12 ppm.]
                                   50

-------
                 DISCUSSION
              W. Barnes Johnson

EPA PROGRAMS AND ENVIRONMENTAL
STANDARDS
     I appreciate the general points
that Dr. Bisgaard has made regarding
the development of environmental
standards.  I agree that generally,
when standards are developed, most of
the technical emphasis is placed on
developing the magnitude of the absolute
number, which Dr. Bisgaard calls the
"limit part" of the standard.  In
contrast, frequently little work is
expended developing the sampling program
and the rules that are used to evaluate
compliance with the limit in applica-
tion, which he calls the "statistical
part" of the standard.  At EPA some
programs do a thorough and thoughtful
job of designing environmental stan-
dards.  However, other EPA programs
could benefit from Dr. Bisgaard's work
because they have focused strictly on
the magnitude of the standard and have
not considered the "statistical part" of
the standard.

     However, I insist that the ozone
standard and all of the National Ambient
Air Quality Standards fall into the
category of standards where both the
"limit part" and the "statistical part"
of the standard have been designed based
on extensive performance evaluations and
practical considerations.

     There are other EPA programs that
have also done an excellent job of
designing and evaluating the "limit
part" and the "statistical part" of
their standards.  For example, under
the Toxic Substances Control Act (TSCA)
regulations, there are procedures for
managing PCB containing wastes.  In
particular, PCB soil contamination must
be cleaned up to 50 ppm.  Guidances have
been prepared that stipulate a detailed
sampling and evaluation program and
effectively describe the procedure for
verifying when the 50 ppm limit has been
achieved.  Also under the TSCA  mandate,
clearance tests are under development
for verifying that, after the  removal
of asbestos from a building, levels are
not different from background levels.

     There are, however, many programs
at EPA that have not performed the
analysis and inquiry necessary to
design the "statistical part" of their
standards.  One example is the Maximum
Contaminant Levels (MCLs) which are
developed and used by EPA's drinking
water program.  MCLs are concentration
limits established for controlling
pollutants in drinking water supplies.
Extensive health effect, engineering,
and economic analysis is used to choose
            the MCL concentration value.   However,
            relatively little work is done to ensure
            that,  when compliance with the MCL is
            evaluated,  appropriate sampling and
            analysis methodologies are used to
            ensure a designed level of statistical
            performance.
                 Similarly,  risk-based cleanup
            standards are used in EPA's Superfund
            program as targets for how much aban-
            doned hazardous waste sites should be
            cleaned up.  These are concentration
            levels either borrowed from another pro-
            gram (e.g., an MCL)  or developed based
            on site-specific circumstances.  A great
            deal of effort has been expended on
            discussions of how protective the actual
            risk related cleanup standards should
            be; however, virtually no effort has
            been focused on the methodology that
            will be used to evaluate attainment of
            these standards.  Drinking water MCLs
            and Superfund cleanup standards could
            benefit from the approaches offered by
            Dr. Bisgaard.

            PRACTICAL ENVIRONMENTAL STANDARDS
            DESIGN:  POLITICS, POLLUTANT BEHAVIOR,
            SAMPLING AND OBJECTIVES

                 Dr. Bisgaard clearly points out
            that his use of the ozone standard is
            only for the purpose of example and
            that the message of his presentation
            applies to the development of any
            standard.  I have responded by trying
            to identify other EPA program areas
            that could benefit from the perspective
            offered by Dr. Bisgaard's approach.
            However, it is important to realize that
            the development of the "statistical
            part" of an environmental standard must
            consider the nature of the political
            situation, pollutant behavior, sampling
            constraints, and the objective of the
            standard.  Ignorance of these practical
            considerations can limit the usefulness
            of a proposed standard regardless of the
            theoretical basis.  The developers of
            the ozone  standard were quite aware of
            these contingencies and it is reflected
            in the  form of the "statistical part" of
            the ozone  standard.

                 Central Tendency Versus Extremes

                 I  must agree that a standard based
            on central tendency statistics will be
            more robust with better operating
            characteristics than a standard based on
            peak statistics.  The difficulty  is that
            EPA  is  not concerned with estimating or
            controlling the mean ozone concentra-
            tion.   Ozone is a pollutant with  acute
            health  effects and, as such, EPA's
            interest  lies  in control of the extremes
            of the  population.  Peak statistics were
                                          51

-------
the primary concern when the ozone
standard was developed.

     EPA, in the development of NAAQS's,
has tried to balance statistical per-
formance with objectives by examining
the use of other statistics that are
more robust and yet retain control of
the extremes.  For example, EPA has
suggested basing the standard on the
fourth or fifth largest value; however,
commenters maintained that EPA would
lose control of the extremes and cause
undue harm to human health.  It has also
been suggested that the peak to mean
ratio (P/M) be considered.  The problem
with this approach is that the P/M is
highly variable across the United States
because of variation in the "ozone
season."  The objective of developing a
nationally applicable regulatory frame-
work would be quite difficult if each
locale was subject to a different stan-
dard.

     Decision Errors and Power

     In addition, regardless of the
standard that is chosen, decision
errors will be highest when the true
situation at a monitoring station is at
or close to the standard.  As the true
situation becomes well above or below
the standard, certainty increases and
our decisions become less subject to
error.  Of course, it would be most
desirable to have an operating charac-
teristic function with a large distinct
step at the standard.  This operating
characteristic would have no error even
when the true situation is slightly
above or below the standard; however,
this is virtually impossible.  There-
fore, when standards are compared for
their efficacy, it is  important to
compare performance along the continuum
when the true situation is well above,
at, and well below the standard.  One
should not restrict performance evalu-
ation to the area at or immediately
adjacent to the standard, since for most
statistics the performance will be
quite low in this region.

     Dr. Bisgaard points out  from his
Figure 2 that when a site  is  in compli-
ance and at  the standard,  expecting to
exceed the standard on one day, there
is  a 37% chance that the  site may be
indicated  as exceeding the standard.
However, it  can also be shown that when
a site  is  below the standard  and
expects  to exceed the  standard  on one-
half of  a  day, there  is only  about a  6%
chance  that  the  site may  be  indicated
as  exceeding the  standard.   Conversely,
 it  can  be  pointed  out  that when the  site
 is  above the standard  and expects to
exceed  the standard on three  days, there
 is  only  a  3% chance that  the  site will
be  found to  be  in  compliance.
     Dr. Bisgaard is quite correct in
pointing out that the operating charac-
teristics of a standard based on the
mean are better than a standard based
on the largest order statistic.  How-
ever, as mentioned above, a standard
based on the mean does not satisfy the
objectives of the ozone standard.  EPA
staff have tendered proposals to
improve the operating characteristics
of the standard.  One of these involved
the development of a three-tiered
approach that would allow a site to be
judged: in attainment, not in attain-
ment, or too close to call.  The
existing structure of the attainment
program was not flexible enough to
permit this approach.

     Pollutant Behavior

     Ozone is a pollutant which exists
in the environment at a high mean ambi-
ent level of approximately one-third the
existing standard.  Effort expended
trying to drive down peak statistics
indirectly by controlling the mean would
be futile.  This  is because mean levels
can only be reduced to the background
mean which, relative to the standard,  is
high even in the  absence of air
pollution.

     Another point to consider is that
ozone behavior  is influenced by both
annual and seasonal meteorological
effects.  This  is the reason that the
newest standard is based on three years
of data.  The effect of  an extreme year
is reduced by the averaging process
associated with a three  year standard.
As mentioned above, work has also
focused on controlling the peak to mean
ratios; however,  because ozone seasons
vary radically  across the  country, this
sort of measure would be difficult to
implement.

     Dr. Bisgaard has also questioned
the  new standard  because of the use  of
the  term "expected."  This terminology
was  probably included in the wording
because of the  many  legal  and  policy
edits  that are  performed on a  draft
regulation.  It was  not  intended that
the  term "expected"  be applied in the
technical statistical use  of the term.
The  term was intended to show  that EPA
had  considered  and  reflected annual
differences  in  ozone  conditions  in the
three  year  form of  the standard.

CONCLUSIONS

     Dr. Bisgaard brings an  interesting
and  useful perspective to  the  develop-
ment of environmental standards.  The
important  idea  is that an  environmental
standard  is  more  than a  numerical limit
and  must  include  a  discussion  of the
associated  sampling approach and
                                           52

-------
decision function.  I tried to extend this central idea by adding two
primary points.  First, there are several programs within EPA that
can benefit from Dr. Bisgaard's perspective; however, the NAAQS
program is fully aware of and has considered these sampling and
decision issues in exhaustive detail.  Second, the practical issues
that influence the implementation of an environmental standard are a
primary constraint and must be understood in order to develop a
standard that offers a useful measure of compliance.
                                         53

-------
 QUALITY CONTROL ISSUES IN TESTING COMPLIANCE WITH A REGULATORY
    STANDARD:  CONTROLLING STATISTICAL DECISION ERROR RATES

                               by

                          Bertram Price
                     Price Associates,  Inc.

                         prepared under

                  EPA  Contract No. 68-02-4139
                  Research Triangle  Institute

                              for

             The Quality Assurance Management Staff
               Office of Research and Development
             U. S. Environmental Protection Agency
                    Washington, D.C.  20460
                            ABSTRACT

     Testing compliance with a regulatory standard intended to
control chemical or biological contamination is inherently a
statistical decision problem.  Measurements used in compliance
tests exhibit statistical variation resulting from random
factors that affect sampling and laboratory analysis.  Since a
variety of laboratories with potentially different performance
characteristics produce data used in compliance tests, a
regulatory agency must be concerned about uniformity in
compliance decisions.  Compliance monitoring programs must be
designed to avoid, for example, situations where a sample
analyzed by one qualified laboratory leads to a noncompliance
decision, but there is reasonable likelihood that if the same
sample were analyzed by another qualified laboratory, the
decision would be reversed.

Two general approaches to designing compliance tests are
discussed.  Both approaches have, as an objective, controlling
statistical decision error rates associated with the compliance
test.  One approach, the approach typically employed, depends
on interlaboratory quality control  (QC) data.  The alternative,
referred to as the intralaboratory approach, is based on a
protocol which leads to unique QC data requirements in each
laboratory.  An overview of the statistical issues affecting
the development and implementation of the two approaches is
presented and the approaches are compared from a regulatory
management perspective.


SECTION 1 - INTRODUCTION

     Testing compliance with a regulatory standard intended to

control chemical or biological contamination is inherently a

                             54

-------
statistical decision problem.  Measurements used in compliance
tests exhibit statistical variation resulting from random factors
affecting sampling and laboratory analysis.  Compliance decision
errors may be identified with Type I and Type II statistical
errors (i.e., false positive and false negative compliance test
results, respectively).  A regulating agency can exercise control
over the compliance testing process by establishing statistical
decision error rate objectives (i.e., error rates not to be
exceeded).  From a statistical design perspective, these error
rate objectives are used to determine the number and types of
measurements required in the compliance test.

     Bias and variability in measurement data are critical
factors in determining if a proposed compliance test satisfies
error rate objectives.  Various quality control (QC) data
collection activities lead to estimates of bias and variability.
An interlaboratory study is the standard approach to obtaining
these estimates.  (The U.S. Environmental Protection Agency
[USEPA] has employed the interlaboratory study approach
extensively to establish bias and variability criteria for test
procedures required for filing applications for National
Pollution Discharge Elimination System [NPDES] permits - 40 CFR
Part 136, Guidelines Establishing Test Procedures for the
Analysis of Pollutants Under the Clean Water Act.)   An
alternative means of estimating bias and variability that does
not require an interlaboratory study is referred to in this
report as the intralaboratory approach.  The intralaboratory
approach relies on data similar to those generated in standard
laboratory QC activities to extract the information on bias and
variability needed for controlling compliance test error rates.

     The purpose of this report is to describe and compare the
interlaboratory and intralaboratory approaches to collecting QC
data needed for bias and variability estimates which are used in
compliance tests.   Toward that end,  two statistical models,  which
                              55

-------
reflect two different attitudes toward compliance test
development, are introduced.   Model 1, which treats differences
among laboratories as random effects,  is appropriate when the
laboratory producing the measurements  in a particular situation
is not uniquely identified, but is viewed as a randomly selected
choice from among all qualified laboratories.  If Model 1 is
used, an interlaboratory study is necessary to estimate "between
laboratory" variance which is an essential component of the
compliance test.  Model 2 treats laboratory differences as fixed
effects (i.e., not random, but systematic and identified with
specific laboratories).  If Model 2 is used, bias adjustments and
estimates of variability required for  compliance tests are
prepared in each laboratory from QC data collected in the
laboratory.  Model 2 does not require  estimates of bias and
variability from interlaboratory data.

     The remainder of this report consists of five sections.
First, in Section 2, statistical models selected to represent the
data used in compliance tests are described.  In Section 3, a
statistical test used in compliance decisions is developed.  The
comparison of interlaboratory and intralaboratory approaches is
developed in two steps.  Section 4 is  included primarily for
purposes of exposition.  The types and numbers of measurements
needed for a compliance test are derived assuming that the
critical variance components - i.e., within and between
laboratories - have known values.  This section provides the
structure for comparing the interlaboratory and intralaboratory
approaches in the realistic situation  where the variance
components must be estimated.  The comparison is developed in
Section 5.  A summary and conclusions  are presented in Section 6.

SECTION 2 - STATISTICAL MODELS
     Compliance tests are often complex rules defined as
combinations of measurements that exceed a quantitative standard.
However, a simple rule - an average of measurements compared to
                              56

-------
the standard - is the basis for most tests.  This rule provides
the necessary structure for developing and evaluating the
interlaboratory and intralaboratory approaches.  Throughout the
subsequent discussion, the compliance standard is denoted by C0
and interpreted as a concentration - e.g., micrograms per liter.
Samples of the target medium are obtained, analyzed by chemical
or other appropriate methods and summarized as an average for use
in the test.  The statistical design issues are:

     o    total number of measurements required;

     o    number and type of samples required; and

     o    number of replicate analyses per sample required.

The design issues are resolved by imposing requirements on the
compliance test error rates (i.e., the Type I and Type II
statistical error rates).

     Many sources of variation potentially affect the data used
in a compliance test.  The list includes variation due to sample
selection, laboratory, day and time of analysis, analytical
instrument, analyst, and measurement error.  To simplify the
ensuing discussion, the sources have been limited to sample
selection, laboratory, and measurement error.  (Measurement error
means analytical replication error or single analyst
variability.)   This simplification,  limiting the number of
variance components considered, does not limit the generality of
subsequent results.

     The distribution of the compliance data is assumed to have
both mean and variance proportional  to the true concentration.
(This characterization has been used since many types of
environmental  measurements reflect these properties.)   The data,
after transformation to logarithms,  base e, may be described as:
                              57

-------
EQ 1      Y_ijk = μ + B_i + S_ij + e_ijk

where i = 1(1)I refers to laboratory, j = 1(1)J refers to sample,
and k = 1(1)K refers to analytical replication.  Two different
interpretations, referred to as Model 1 and Model 2, are considered
for the factors on the right side of equation 1.
     In Model 1:
          μ       - ln(C), where C is the true concentration;

          B_i     - the logarithm of recovery (i.e., the
                    proportion of the true concentration
                    recovered by the analytical method) which is
                    a laboratory-specific effect treated as
                    random with mean zero and variance σ²_B;

          S_ij    - a sample effect which is random with mean
                    zero and variance σ²_S; and

          e_ijk   - replication error which is random with mean
                    zero and variance σ²_e.

     It follows that:

          Var[Y_ijk] = σ²_B + σ²_S + σ²_e

and, denoting as Ȳ_i an average over samples and replicates,

EQ 2      Var[Ȳ_i] = σ²_B + σ²_S/J + σ²_e/J·K.
     In Model 2, B_i is interpreted as a fixed effect (i.e., B_i is
bias associated with laboratory i).  All other factors have the
same interpretation used in Model 1.  Therefore, in Model 2:
                              58

-------
          E[Ȳ_i] = μ + B_i
and
EQ 3      Var[Ȳ_i] = σ²_S/J + σ²_e/J·K.

     Differentiating between Model 1 and Model 2 has significant
practical implications for establishing an approach to compliance
testing.  These implications are developed in detail below.  For
now, it is sufficient to note that the collection of B_i's are
treated as scalar factors uniquely associated with laboratories.
If the identity of the specific laboratory conducting an analysis
is unknown because it is viewed as randomly selected from the
population of all laboratories, then B_i is treated as a random
effect.  If the laboratory conducting the analysis is known, B_i
is treated as a scalar, namely the bias of the ith laboratory.
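     A short simulation may help make the distinction between the two
models concrete.  The sketch below (our illustration only, with
arbitrary variance components and design values) generates log-scale
data according to EQ 1 and shows how the variance of a laboratory
average behaves when the laboratory effect is drawn at random
(Model 1, EQ 2) versus held fixed for a known laboratory
(Model 2, EQ 3).

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative (assumed) variance components on the log scale.
    SIGMA2_B, SIGMA2_S, SIGMA2_E = 0.04, 0.02, 0.01
    MU = np.log(10.0)      # log of the true concentration
    J, K = 4, 2            # samples per laboratory, replicates per sample

    def lab_average(bias=None):
        """One laboratory average Ybar_i under EQ 1.  If `bias` is None the
        laboratory effect is random (Model 1); otherwise it is the fixed
        bias of a known laboratory (Model 2)."""
        B = rng.normal(0.0, np.sqrt(SIGMA2_B)) if bias is None else bias
        S = rng.normal(0.0, np.sqrt(SIGMA2_S), size=J)
        E = rng.normal(0.0, np.sqrt(SIGMA2_E), size=(J, K))
        return (MU + B + S[:, None] + E).mean()

    var_model1 = np.var([lab_average() for _ in range(20000)])           # approaches EQ 2
    var_model2 = np.var([lab_average(bias=0.05) for _ in range(20000)])  # approaches EQ 3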

SECTION 3 - STATISTICAL TEST:  GENERAL FORMULATION
     The statistical test for compliance is based on an average
of measurements, Y.  Assuming that Y's are normally distributed
(recall that Y is the natural logarithm of the measurement),
noncompliance is inferred when

EQ 4      Y > T

where T and the number of measurements used in the average are
determined by specifying probabilities of various outcomes of the
test.  (For simplicity in exposition in this section,  the
subscripts i, j, and k used to describe the models in Section 2
are suppressed.  Also, σ_Y is used in place of the expressions in
EQ 2 and EQ 3 to represent the standard deviation of Y.  The more
detailed notation of EQ 2 and EQ 3 is used in the subsequent
sections where needed.)
                              59

-------
     Let p₁ and p₂ be probabilities of declaring noncompliance
when the true means are d₁·C₀ and d₂·C₀ respectively (d₁, d₂ > 0),
and let

          μ₀ = ln(C₀)

          D₁ = ln(d₁), D₂ = ln(d₂).
Requiring

EQ 5      p₁ = P[ Y > T : μ = μ₀ + D₁ ]
and
EQ 6      p₂ = P[ Y > T : μ = μ₀ + D₂ ]

leads to values of T and the number of measurements used to form
Y by solving

EQ 7      [T - (μ₀ + D₁)]/σ_Y = Z_{1-p₁}
and
EQ 8      [T - (μ₀ + D₂)]/σ_Y = Z_{1-p₂}

where Z_{1-p₁} and Z_{1-p₂} are percentile points of the standard
normal distribution.

     The solutions are:

EQ 9      T = σ_Y·Z_{1-p₁} + μ₀ + D₁

EQ 10     σ_Y = (D₂ - D₁)/(Z_{1-p₁} - Z_{1-p₂}).

     This formulation allows considerable flexibility  for
determining compliance test objectives.  Consider the  following
three special cases:

     Case (i).  When d₁ = 1, p₁ = α, d₂ is any positive number
                              60

-------
greater than 1 and p₂ = 1 - β, the formulation reduces to the
classical hypothesis testing problem H₀: μ = μ₀ versus
H₁: μ = μ₀ + D₂.  The correct number of measurements establishes
the probabilities of Type I and Type II errors at α and β
respectively.

     Case (ii).  Let d₁ = 1, d₂ be a positive number less than 1,
p₁ = 1 - β, and p₂ = α.  This formulation also reduces to the
classical hypothesis testing problem H₀: μ = μ₀ + D₂ versus
H₁: μ = μ₀.   (Note that μ₀ + D₂ < μ₀, i.e., D₂ < 0.)

     Case (iii).  Let 1 < d₁ < d₂.  Set p₁ < p₂ to large values
(e.g., .90 and .99).  This formulation imposes a high probability
of failing the compliance test when the mean is d₁ times the
standard, and a higher probability of failing when the mean is
further above the standard.

     Case  (ii) imposes a more stringent regulatory program on the
regulated community than Case (i).  In Case (i), the  regulated
community may establish  control methods to hold the average
pollution level at the standard.  In Case (ii), the pollution
level must be controlled at a concentration below  the standard  if
the specified error rates  are to be achieved.   In  Case  (iii), a
formal Type I error is not defined.  Individual members  of the
regulated community may  establish the  Type I error rate  by
setting their own pollution control level - the lower the  control
level, the lower the Type  I error rate.  In Case (iii),  the
regulated community has  another option also.  There is  a tradeoff
between the control level  and the number of measurements used in
the compliance test.  Individuals may  choose to operate  at a
level near the standard  and increase the number of measurements
used in the compliance test over the number required  to  achieve
the stated probability objectives.  The important difference
between Case (iii) and the two other cases is the responsibility
placed with the regulated community regarding false alarms (i.e.,
Type I errors).  Since false alarms affect those regulated more
than the regulator, Case (iii) may be the most equitable approach
to compliance test formulation.
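     To make the general formulation concrete, the following is a
minimal sketch, in Python, of the calculation in EQ 9 and EQ 10;
the function and argument names are mine, and SciPy is assumed to
be available.

from math import log
from scipy.stats import norm

def general_formulation(C0, d1, p1, d2, p2):
    """Return the decision threshold T (log scale) and the standard
    deviation that the average Y-bar must achieve (EQ 9 and EQ 10)."""
    D1, D2 = log(d1), log(d2)
    z1, z2 = norm.ppf(1 - p1), norm.ppf(1 - p2)   # Z(1-p1), Z(1-p2)
    sigma_Y = (D2 - D1) / (z1 - z2)               # EQ 10
    T = sigma_Y * z1 + log(C0) + D1               # EQ 9
    return T, sigma_Y

# Case (i) example: d1 = 1, p1 = alpha = 0.05, d2 = 1.5, p2 = 1 - beta = 0.90
# T, sigma_Y = general_formulation(C0=10.0, d1=1.0, p1=0.05, d2=1.5, p2=0.90)

The required number of measurements then follows by expressing σY
through the variance components, as developed in Section 4.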

SECTION 4 - SAMPLE SIZE REQUIREMENTS:  VALUES OF VARIANCE
            COMPONENTS KNOWN
     The discussion below follows the structure of Case  (i)
described above.  Based on the general formulation developed in
Section 3, the conclusions obtained also hold for Cases  (ii) and
(iii).

MODEL 1
The compliance test is a statistical test of:

          H0:  μ = μ0 = ln(C0)
versus
          H1:  μ = μ0 + D2

where C0 is the compliance standard.  Assuming the values of the
variance components are known, the test statistic is

          Z = (Ȳi - μ0)/(σ²B + σ²S/J + σ²ε/(J·K))^1/2.

     Specifying the Type I error rate to be α leads to a test
that rejects H0 if

EQ 11     Z > Z1-α

where Z1-α is the (1-α)th percentile point of the standard normal
distribution.  If the Type II error is specified to be β when the
alternative mean is μ0 + D2, then:

EQ 12     σ²B + σ²S/J + σ²ε/(J·K) = [D2/(Z1-α + Z1-β)]².
     Any combination of J and K satisfying EQ 12 will achieve the
compliance test error rate objectives.  However, unique values of
J and K may be determined by minimizing the cost of the data
collection program subject to the constraint in EQ 12.  Total
cost may be stated as:

EQ 13     TC = J·C1 + J·K·C2

where C1 is the unit cost of obtaining a sample and C2 is the
cost of one analysis.

     Using the Lagrange multiplier method to minimize EQ 13
subject to the constraint imposed by EQ 12 yields:

EQ 14     K = (σε/σS)·(C1/C2)^1/2
and
EQ 15     J = [σS·σε/(U - σ²B)]·[σS/σε + (C2/C1)^1/2]
where
          U = [D2/(Z1-α + Z1-β)]².

(If EQ 14 does not produce an integer value for K, the next
largest integer is used and J is adjusted accordingly.)

     The number of replicate analyses for each sample, K,
increases as the ratio of the sampling cost to the analysis cost
increases and the ratio of the single analyst standard deviation
to the sampling standard deviation increases.   In many
situations, the analysis cost, C2, is much larger than the
sampling cost, C1; and the sampling variance is much larger than
single analysis variability.   Under these conditions,  the number
of replicate analyses,  K,  will be 1 (i.e.,  each sample will be
analyzed only once).
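     As a minimal sketch of the Model 1 allocation (EQ 12 through
EQ 15), assuming the variance components and unit costs are known,
the following Python function computes K and J; the function name
and the rounding rule are mine.

import math

def model1_allocation(sigma_B, sigma_S, sigma_e, C1, C2, D2, z_alpha, z_beta):
    """Cost-minimizing number of samples J and replicates K under Model 1."""
    U = (D2 / (z_alpha + z_beta)) ** 2                   # right side of EQ 12
    K = (sigma_e / sigma_S) * math.sqrt(C1 / C2)         # EQ 14
    K = max(1, math.ceil(K))                             # next largest integer
    # adjust J by re-solving the EQ 12 constraint with the rounded K;
    # a solution exists only when U exceeds sigma_B squared
    J = (sigma_S**2 + sigma_e**2 / K) / (U - sigma_B**2)
    return math.ceil(J), K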
MODEL 2
     Since

          E(Ȳi) = μ + Bi

the statistic used in the compliance test must incorporate a bias
adjustment (i.e., an estimate of Bi).  This can be achieved by
analyzing standard samples prepared with a known concentration C.
(Choosing C at or near C0 minimizes the effects of potential
model specification errors.)  Let

EQ 16     bi,j,k = Yi,j,k - ln(C)

where Yi,j,k here denotes a measurement of a standard sample.
Since

          E(b̄i) = Bi

b̄i is an estimate of Bi and

          Var(b̄i) = σ²S'/J' + σ²ε/(J'·K')
where
          S'i,j   -  an effect associated with standard samples
                     which is random with mean zero and variance
                     σ²S';
          J'      -  the number of standard samples used to
                     estimate Bi; and
          K'      -  the number of analyses conducted on each
                     standard sample.

(Note that single analyst variability, σ²ε, is assumed to have
the same value for field samples and prepared samples.)

The test statistic is

EQ 17     (Ȳi - b̄i - μ0)/[σ²S/J + σ²S'/J' + σ²ε·(1/(J'·K') + 1/(J·K))]^1/2
The cost function used to allocate the samples and replicates is:

EQ 18     TC = J·C1 + J'·C3 + (J·K + J'·K')·C2

where C3 is the unit cost for preparing a standard sample.
Type I and Type II error rates - α and β - are achieved if:

EQ 19     σ²S/J + σ²S'/J' + σ²ε·(1/(J'·K') + 1/(J·K)) = U
where
          U = [D2/(Z1-α + Z1-β)]²,

as defined in the discussion of Model 1.

     Minimizing costs subject to the constraint on variance
yields

EQ 20     K = (σε/σS)·(C1/C2)^1/2,

which is identical to the solution obtained for Model 1, and

EQ 21     K' = (σε/σS')·(C3/C2)^1/2,

EQ 22     J' = (σS'/U)·[σS·(C1/C3)^1/2 + 2·σε·(C2/C3)^1/2 + σS'],
and
EQ 23     J = J'·(σS/σS')·(C3/C1)^1/2.

     The solutions for K and K1 are similar.  Each increases with
the ratio of sampling to analytical costs and the ratio  of
analytical to sampling standard deviations.
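     A corresponding sketch for the Model 2 allocation (EQ 20
through EQ 23), under the same assumptions as the Model 1 sketch
above; the names are illustrative only.

import math

def model2_allocation(sigma_S, sigma_Sp, sigma_e, C1, C2, C3, U):
    """Cost-minimizing J, K, J', K' under Model 2.  sigma_Sp is the
    standard-sample effect (sigma_S'), C3 the unit cost of a prepared
    standard, and U = [D2/(Z(1-alpha) + Z(1-beta))]^2 as in EQ 19."""
    K  = (sigma_e / sigma_S)  * math.sqrt(C1 / C2)              # EQ 20
    Kp = (sigma_e / sigma_Sp) * math.sqrt(C3 / C2)              # EQ 21
    Jp = (sigma_Sp / U) * (sigma_S * math.sqrt(C1 / C3)
                           + 2 * sigma_e * math.sqrt(C2 / C3)
                           + sigma_Sp)                          # EQ 22
    J  = Jp * (sigma_S / sigma_Sp) * math.sqrt(C3 / C1)         # EQ 23
    # as with EQ 14, non-integer values are rounded up in practice
    return math.ceil(J), max(1, math.ceil(K)), math.ceil(Jp), max(1, math.ceil(Kp))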

SECTION 5 - SAMPLE SIZE REQUIREMENTS:  VALUES OF VARIANCE
            COMPONENTS UNKNOWN
     In this section the interlaboratory and intralaboratory
approaches for obtaining estimates of the variance components
necessary to implement the designs developed in Section 4 are
described.  As in Section 4, the design objective is to control
the compliance test error rates (i.e., the Type I and Type II
error probabilities).  The discussion is simplified by
considering situations where the cost of analysis is
significantly greater than the cost of sampling, and the sample
to sample variability is at least as large as the analytical
variability:

          C2 >> C1  and  σ²S ≥ σ²ε.

Under these conditions, K = 1 (i.e., each sample is analyzed only
once).  Also, the value of K' determined from EQ 21 (i.e., the
number of replicate analyses performed on each standard sample)
will be set equal to 1, since the cost of preparing standard
samples for estimating Bi is significantly less than the cost of
analyzing those samples (i.e., C3 << C2).

     When K = K' = 1, the variances used to define the test
statistic are, for Model 1 and Model 2 respectively:

EQ 24     Var(Ȳi) = σ²B + (σ²S + σ²ε)/J

                  = σ²B + σ²ε'/J
and
EQ 25     Var(Ȳi - b̄i) = (σ²S + σ²ε)/J + (σ²S' + σ²ε)/J'

                       = σ²ε'/J + σ²ε''/J'.

(The notations σ²ε' and σ²ε'' denote the combined variances
σ²S + σ²ε and σ²S' + σ²ε, respectively.)

MODEL 1
     Under Model 1, σ²B must be estimated from interlaboratory
data.  The within-laboratory variance σ²ε' may be estimated either
using interlaboratory data or it may be estimated from the J
measurements of field samples used to form the average when the
compliance test is performed.

     As described by Youden  (1975), an interlaboratory study
involves M laboratories  (between 6 and 12 are used in practice)
which by assumption under Model 1 are randomly selected from the
collection of all laboratories intending to produce measurements
for compliance testing.  For the discussion below, let n denote
the number of samples analyzed by each laboratory.  (Youden
recommends n = 6 prepared as 3 pairs where the concentrations of
paired samples are close to each other but not identical.)

Let
          Wi,j = ln(Vi,j) - ln(Cj)

where {Vi,j: i = 1(1)M, j = 1(1)n} are the measurements produced by
the i-th laboratory on the j-th sample, and {Cj: j = 1(1)n} are the
concentration levels used in the study.   (Youden does not
recommend using logarithms, however the logarithmic
transformation is convenient and is consistent with other
assumptions in Youden's design.)  The statistical model
describing the interlaboratory study measurements is:

EQ 26     Wi,j = Bi + ε''i,j

where
          Bi      is an effect associated with the i-th laboratory
                  and treated as a random variable with mean zero
                  and variance σ²B; and
          ε''i,j  is analytical error, the sum of single analyst
                  error and an effect associated with variation
                  among standard samples, which has mean zero and
                  variance σ²ε''.

     Using standard ANOVA (analysis of variance) techniques, σ²B
may be estimated from the "within laboratory" and "between
laboratory" mean squares, Q1 and Q2:

EQ 27     Q1 = ΣΣ(Wi,j - W̄i)²/[M·(n-1)]
and
EQ 28     Q2 = n·Σ(W̄i - W̄)²/(M-1).

The estimate is:

EQ 29     s²B = (Q2 - Q1)/n

which reflects differences among the laboratories through the
quantity

EQ 30     Σ(Bi - B̄)².

Also, Q1 is an estimate of σ²ε''.
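     The EQ 27 through EQ 29 calculation can be sketched as
follows, assuming the Wi,j have been arranged in an M-by-n array;
NumPy is assumed and the function name is mine.

import numpy as np

def between_lab_variance(W):
    """One-way ANOVA estimates Q1, Q2, and s2_B from an (M, n) array of
    Wij = ln(Vij) - ln(Cj) for M laboratories and n study samples."""
    M, n = W.shape
    lab_means = W.mean(axis=1)                                   # Wi-bar
    grand_mean = W.mean()                                        # W-bar
    Q1 = ((W - lab_means[:, None]) ** 2).sum() / (M * (n - 1))   # EQ 27
    Q2 = n * ((lab_means - grand_mean) ** 2).sum() / (M - 1)     # EQ 28
    s2_B = (Q2 - Q1) / n                                         # EQ 29
    return Q1, Q2, s2_B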

     The compliance test statistic may be defined either as

EQ 31a    R = (Ȳi - μ0)/(s²B + Q1/J)^1/2
or
EQ 31b    R = (Ȳi - μ0)/(s²B + s²ε'/J)^1/2

where s²ε' is the sample variance of the J measurements,

          s²ε' = Σj(Yi,j - Ȳi)²/(J - 1),

and {Yi,j = ln(Xi,j): j = 1(1)J} are the measurements obtained
from field samples in the laboratory selected to conduct the
analyses.  (Based on the discussion at the beginning of this
section, K is always equal to 1.  Therefore, the notation
describing compliance measurements has been simplified, i.e.,
Yi,j = Yi,j,1.)  Note that Q1 estimates the average variability
over laboratories, whereas s²ε' estimates variability for the
laboratory conducting the test.  Also, Q1 is an estimate of
σ²ε'', the variability associated with the analysis of standard
samples; s²ε' is an estimate of the variability associated with
the analysis of field samples.

     The ratios in EQ 31a and EQ 31b have approximate t-distri-
butions when the null hypothesis is true.  The degrees of freedom
may be estimated by methods developed by Satterthwaite (1946).
Although it is possible to approximate the degrees of freedom and
use a percentile point of the t-distribution to define the test,
that approach is complicated.  Developing it at this point would
be an unnecessary diversion.  Instead, non-compliance will be
inferred when

EQ 32     R > Z1-α

where Z1-α is the (1 - α)th percentile point of the standard
normal distribution.  (If R has only a few degrees of freedom,
which is likely, the Type I error rate will be larger than α.
The situation may be improved by using, for example, Z1-α/2 or
some other value of Z larger than Z1-α.  If necessary, exact
values of Z could be determined using Monte Carlo methods.)

     The number of samples, J, that must be analyzed for the
compliance test is obtained by specifying that the probability of
the event in EQ 32 is equal to 1 - β when the true mean is
μ0 + D2.  The value of J may be obtained either by using
approximations based on the normal distribution or the noncentral
t-distribution, or by estimates based on a Monte Carlo simulation
of the exact distribution of R.
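     As one illustration, the normal-approximation route can be
sketched as follows.  It treats s²B and the within-laboratory
variance as known, so it will understate J somewhat relative to
the noncentral-t or Monte Carlo calculations described above; the
function name is mine.

import math
from scipy.stats import norm

def samples_needed_model1(D2, s2_B, s2_e1, alpha, beta):
    """Approximate J for the Model 1 test in EQ 32 (s2_e1 plays the
    role of the within-laboratory variance, or of Q1 in EQ 31a)."""
    z_a, z_b = norm.ppf(1 - alpha), norm.ppf(1 - beta)
    U = (D2 / (z_a + z_b)) ** 2          # required Var(Y-bar), cf. EQ 12
    if U <= s2_B:
        raise ValueError("between-laboratory variance alone exceeds the "
                         "target; no J meets the error-rate objectives")
    return math.ceil(s2_e1 / (U - s2_B))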

     If EQ 31a is used, the compliance test criterion (i.e., the
expression in EQ 32) becomes

EQ 33     GM(Xi,j) > C0·exp[Z1-α·(s²B + Q1/J)^1/2]

where GM is the geometric mean of the J compliance measurements.
The right side of the inequality is a fixed number once the
interlaboratory study is completed.  The advantage of this
approach is the simplicity realized in describing the compliance
test to the regulated community in terms of one measured
quantity, the geometric mean.  The disadvantage is using Q1
rather than the sample variance calculated from the compliance
test measurements, which is likely to be a better estimate of
variability for the particular laboratory conducting the test.
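     A small sketch of the fixed action level on the right side of
EQ 33; the numbers in the comment are hypothetical.

import math
from scipy.stats import norm

def action_level(C0, s2_B, Q1, J, alpha=0.05):
    """Level that the geometric mean of the J field measurements is
    compared against under EQ 33."""
    return C0 * math.exp(norm.ppf(1 - alpha) * math.sqrt(s2_B + Q1 / J))

# e.g. action_level(C0=10.0, s2_B=0.04, Q1=0.02, J=4) is roughly 14.2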

MODEL 2
     Under Model 2, estimates of variance from interlaboratory
study data are unnecessary.  Since the laboratory conducting the
analyses for the compliance test is uniquely identified, the
laboratory factor, Bi, is a scalar, and the variance component,
σ²B, does not enter the model.  The variance estimates needed for
the compliance test can be obtained from the measurements used to
compute Ȳi and b̄i.

The test statistic is

EQ 34     t = (Ȳi - b̄i - μ0)/(s²ε'/J + s²ε''/J')^1/2

which has an approximate t-distribution with degrees of freedom
equal to J + J' - 2 when the true mean is μ0.  (The statistic
would have an exact t-distribution if σ²ε' were equal to σ²ε''.)
Noncompliance is inferred if

EQ 35     t > t1-α.

J and J' are determined by requiring that the probability of the
event in EQ 35 be equal to 1 - β when the true mean is μ0 + D2.
This calculation can be made using the noncentral t-distribution.
Where σ²ε' = σ²ε'', the noncentrality parameter is
D2/[σε'·(1/J + 1/J')^1/2].  (Note that this formulation implies a
tradeoff between J and J' for achieving the compliance test error
rate objectives.)  If σ²ε' and σ²ε'' are not equal, the correct
value to replace t1-α in EQ 35 and the values of J and J' may be
determined using Monte Carlo methods.
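     The intralaboratory test of EQ 34 and EQ 35 can be sketched
as follows, assuming the field-sample and standard-sample
measurements are available on the natural-log scale; the variable
names are mine.

import numpy as np
from scipy.stats import t as t_dist

def model2_compliance_test(y_field, y_standard, C0, C_std, alpha=0.05):
    """Return (t statistic, critical value, noncompliance inferred?).
    y_field: ln of the J field measurements; y_standard: ln of the J'
    measurements of standards with known concentration C_std."""
    y_field, y_standard = np.asarray(y_field), np.asarray(y_standard)
    J, Jp = len(y_field), len(y_standard)
    b = y_standard - np.log(C_std)                   # bias estimates, cf. EQ 16
    t_stat = (y_field.mean() - b.mean() - np.log(C0)) / np.sqrt(
        y_field.var(ddof=1) / J + b.var(ddof=1) / Jp)            # EQ 34
    crit = t_dist.ppf(1 - alpha, df=J + Jp - 2)                  # EQ 35
    return t_stat, crit, t_stat > crit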

SECTION 6 -  DISCUSSION AND  CONCLUSIONS
     Both statistical models  considered  above  are consistent  with
reasonable approaches to compliance testing.   The two approaches,
however,  have  distinctly different data  requirements.

     Model 1, through EQ 31a, reflects "the conventional"
approach to compliance testing.  A "target value for control,"
C0, is established (e.g., either a health based standard or a
"best available control technology" standard) and then adjusted
upward to account for both analytical variability and laboratory
differences.  Using EQ 33, noncompliance is inferred when the
geometric mean of the compliance test measurements, GM(Xi,j), is
larger than C0 multiplied by a factor which combines estimates
reflecting variability between laboratories, σ²B, and analytical
variability within laboratories.  Since an estimate of σ²B is
required in the Model 1 approach, an interlaboratory study is
required also.  The role of σ²B, which reflects laboratory
differences, is to provide insurance against potentially
conflicting compliance results if one set of samples were
analyzed in two different laboratories.  Systematic laboratory
differences (i.e., laboratory bias) could lead to a decision of
noncompliance based on analyses conducted in one laboratory and a
decision of compliance based on analyses of the same samples
conducted in another laboratory.

     In practice, σ²B is replaced by s²B, an estimate obtained
from the interlaboratory study.  The variability of this estimate
also affects the compliance test error rates.  If the variance of
s²B is large, controlling the compliance test error rates becomes
complicated.  Requiring that more field samples be analyzed
(i.e., increasing J) may help.  However, increasing the amount of
interlaboratory QC data to reduce the variance of s²B directly
may be the only effective option.  Based on interlaboratory QC
data involving 6 to 12 laboratories, which is current practice,
the error in s²B as an estimate of σ²B is likely to be as large
as 100%.  If interlaboratory QC data were obtained from 30
laboratories, the estimation error still would exceed 50%.
(These results are based on a 95% confidence interval for σ²B/s²B
determined using the chi-square distribution.)  Since
interlaboratory data collection involving 12 laboratories is
expensive and time consuming, it is doubtful that a much larger
effort would be feasible or could be justified.

     Using Model 2 and the intralaboratory approach, a regulatory
agency would not attempt to control potential compliance decision
errors resulting from laboratory differences by using an estimate
of "between laboratory" variability to adjust the compliance
standard.  Instead, compliance data collected in each laboratory
would be adjusted to reflect the laboratory's unique bias and
variability characteristics.  In many situations, bias for any
specific laboratory can be estimated as precisely as needed using
QC samples.  Also, the variance of the bias estimate, which is
needed for the compliance test, can be estimated from the same
set of QC sample measurements.  An estimate of analytical
variability required for the compliance test can be estimated
from the measurements generated on field samples.  Therefore, all
information needed to develop the compliance test can be obtained
within the laboratory that produces the measurements for the
test.

     From a regulatory management perspective, both approaches
(i.e., Model 1 using interlaboratory QC data and Model 2 using
intralaboratory QC data) lead to compliance tests that satisfy
specified decision error rate objectives.  However, the
intralaboratory approach based on Model 2 appears to be the more
direct approach.  The design for producing data that satisfy
error rate objectives is laboratory specific, acknowledging
directly that laboratories not only have different bias factors,
but also may have different "within laboratory" variances.  Each
laboratory estimates a bias adjustment factor and a variance
unique to that laboratory.  Then, the number of samples required
for that specific laboratory to achieve specified error rate
objectives is determined.  As a result, each laboratory produces
unbiased compliance data.  Also, compliance test error rates are
identical for all laboratories conducting the test.  Moreover,
the data used to estimate laboratory bias and precision are
similar to the QC measurements typically recommended for every
analytical program.  In summary, the intralaboratory approach
appears, in general, to provide a greater degree of control over
compliance test error rates while using QC resources more
efficiently than the approach requiring interlaboratory QC data.
                           REFERENCES
Satterthwaite, F.E. (1946),  "An Approximate Distribution of
Estimates of Variance Components", Biometrics Bulletin, Vol. 2,
pp. 110-114.

Youden, W.J.; and Steiner, E.H. (1975), Statistical Manual of
AOAC.  Association of Official Analytical Chemists, Washington,
D.C.
                                               DISCUSSION
                                            George T. Flatman
                                   U.S. Environmental Protection Agency
   Dr. Bertram Price has something worth saying
and has said it well in his paper entitled,
"Quality Control Issues in Testing Compliance
with a Regulatory Standard:  Controlling Sta-
tistical Decision Error Rates."

   The Environmental Protection Agency is
emphasizing "Data Quality Objectives." Dr. Price
has expressed the most important of these objec-
tives in his title, "Controlling Statistical
Decision Error Rates."  The paper is timely for
EPA because it demonstrates how difficult the
statistics and the  implementation are for data
quality objectives.

   In Section 1.. .Introduction, an "interlabora-
tory study approach" is suggested for establish-
ing "bias and variability criteria."  This is
theoretically valid but may not be workable in
practice.  In contract laboratory programs,
standards are in a  much cleaner matrix (dis-
tilled water instead of leachate) and sometimes
run on cleaner instruments that have not just
run dirty specimens.  Standards or blank samples
cannot avoid special treatment by being blind
samples since they  are in a different matrix
than the field samples.  Thus, in practice, the
same matrix and analytical instruments must be
used to make "interlaboratories study" an un-
biased estimate of  the needed "bias and vari-
ability criteria."  Both the theory and the
implementation must be vigorously derived.

   In Section 2...Statistical Models the enumer-
ation of the components of variation is important
for both theory and practice.  More precise
enumeration of variance components than the
mutually exclusive  and jointly exhaustive theory
of "between and within" is needed for adequate
sampling design.  I agree with Dr. Price that
"simplification, limiting the number of variance
components, does not limit the generality of
subsequent results," but I suggest it makes
biased or aliased data collection more probable.
For example, the Superfund Interlaboratories
Studies of the Contract Labs has identified the
calibration variance of the analytical  instrument
as the largest single  component of longitudinal
laboratory (or interlaboratories) variance.
If this component of variation is not enumerated
explicitly, I suggest this component of variance
could be omitted, included once, or included
twice.  If all  the  field samples and lab repli-
cate analyses were  run between recalibrations of
the analytical  instrument, the recalibration
variance would be omitted from the variances of
the data.  If the analytical  instrument were
recalibrated in the stream of field samples and
between lab replicate analyses, the recalibration
variance would be aliased with both the sample
and lab variances,  and thus added twice into the
total  variance.   With these possible analyses
scenarios the recalibration component of variance
could be either omitted or included twice.  This
potential for error can be minimized through the rigorous
modeling of all the process sources of variation in the
components-of-variance model.  This is not a criticism of the
paper, but it is a problem for the implementation of this paper
by EPA's data quality objectives.

   Section 3...Statistical Test is very important because it
specifically states the null and alternative hypotheses with
their probability alpha of Type I error and probability beta of
Type II error.  This may appear pedantic to the harried
practitioner but, because of the importance of the decision, it
is absolutely essential to data quality objectives.  Dr. Price's
alternative hypothesis and his beta-algebra are complicated by
EPA's interpretation of the law, "no exceedence of background
values or concentration limits" (40 CFR part 264).  This requires
an interval alternative hypothesis

                    H1:  μ > μ0

rather than Dr. Price's point hypothesis

                    H1:  μ = μ0 + D.

Lawyers should be more aware of how they increase the
statistician's work.  Beta is a function, or curve, over all
positive D.

   I think it is important to mention that in environmental
testing beta is more critical than in traditional hypothesis
testing.  Classically the hypotheses are formulated so that a
Type II error is to continue with the status quo when in fact a
new fertilizer, brand of seed potato, etc., would be better.
Thus, the loss associated with the Type II error is low and its
probability of occurrence can be large (e.g., 20 percent) in
agricultural experiments.  This is not true in environmental
hypothesis testing!  The hypotheses usually make a Type II error
the misclassification of "dirty" as "clean," with a loss in
public health and environmental protection.  Thus, beta,
representing the probability of this loss in public health and
environmental protection, should be set arbitrarily low like
alpha (1% or 5%).

   Sections 4 and 5...Sample Size Requirements derive equations
for the numbers of field samples and lab replicates as a function
of cost and variances.  The formulas digitize the process for
precise decisions between the number of field samples and the
number of lab replicates.  The formulas indicate that an analysis
instrument like GCMS, because of its high incremental analysis
cost and low variance, requires few replications (K=1), but other
analysis instruments such as radiation counters may not.  These
formulas have a practical value because of the diversity of
analysis instruments and pollutants.

   Section 5...Sample Size Requirements: Values of Variance
Components Unknown details the rigors of variance-components
estimation through unknown degrees of freedom and the noncentral
t-distribution.
It might be asked whether only the sum of the variances is needed
for testing or "quality assurance" (i.e., rejection of outliers).
It is, but "quality improvement" requires the estimation of each
component of variance.  The analysis is more meaningful and
usable if the individual components have estimates.

   Section 6...Discussion and Conclusions states that the
interlaboratory QC model (random effects) and the intralaboratory
QC model (fixed effects) "lead to compliance tests that satisfy
specified decision error rate objectives."  This theoretical
position of the paper is confirmed by the empirical findings of
the Superfund Interlaboratories Comparison of the Contract
Laboratories.  That study found that within-lab variance is of
comparable magnitude to between-lab variance.  The test and model
used should correspond to whether one lab or more than one lab
performs the actual chemical analysis of the data.

   In conclusion, Dr. Bertram Price has rigorously presented the
algorithms and the problems for "Controlling Statistical Decision
Error Rates."  This paper enumerates the statistical problems in
applying hypothesis testing to real-world data.  Unfortunately,
hypothesis testing is made deceptively simple in many textbooks,
and the true complexity is discovered in practice through the
expensive consequences of a wrong decision.  The serious problems
discussed in Dr. Price's paper are needed to sober the
superficial use of "alphas, betas, and other probabilities" in
data quality objective statements.  The paper is a timely and
rigorous summary of components-of-variance modeling and
hypothesis testing.

Acknowledgments:  The discussant wishes to thank Forest Garner
and Evangelos Yfantis for their advice, review, and insight
gained from Superfund interlaboratories testing.

Notice:  Although the thoughts expressed in this discussion have
been supported by the United States Environmental Protection
Agency, they have not been subject to Agency review and therefore
do not necessarily reflect the views of the Agency, and no
official endorsement should be inferred.
ON THE DESIGN OF A SAMPLING PLAN TO VERIFY COMPLIANCE WITH EPA STANDARDS
   FOR RADIUM-226 IN SOIL AT URANIUM MILL-TAILINGS REMEDIAL-ACTION SITES
         R.O. Gilbert, Pacific Northwest Laboratory; M.L. Miller, Roy F. Weston,
                    Inc.; H.R. Meyer, Chem-Nuclear Systems, Inc.
                                1.0   INTRODUCTION

       The  United States  government  is  required under the Uranium Mill Tailings
  Radiation Control Act (U.S. Congress  Public  Law 95-604, 1978) to perform
  remedial  actions on  inactive  uranium  mill-tailings  sites that had been  federally
  supported and on properties that had  been  contaminated by the tailings.  The
  current Environmental Protection Agency (EPA) standard for 226Ra (henceforth
  denoted by Ra)  in soil  (EPA,  1983)  requires  that  remedial action must  be taken
  if  the average concentration  of Ra in surface  (0-  to 15-cm)  soil over  any
  area of 100 square  meters exceeds  the background  level by more than  5  pCi/g,
  or  if the average exceeds 15  pCi/g for subsequent  15-cm thick layers of soil
  more than 15 cm below the surface.  Since  there are many thousands of  100
  square-meter areas  that must  be evaluated, the soil  sampling plan should be
  as  economical as possible while still meeting the  intent of the regulations.
       After remedial action at a site has been conducted, the field sampling
  procedure that has been used to determine whether the EPA standard was met was
  to  first  grid the entire site  into 10-m  by 10-m plots.  Then, in each  plot,
  20  plugs  of surface soil  were  collected  and  physically mixed together  from
  which  a single 500-g  composite sample was  withdrawn  and assayed for  Ra.  If
  this measurement was  >  5 pCi/g above  background, then additional remedial
  action was required.   Recently, based on cost considerations and the study
  described in Section  2.0,  the  number  of  soil plugs  per composite sample was
  reduced from 20 to  9.
       In this paper we discuss  a verification acceptance-sampling plan that is
  being  developed to  reduce costs by  reducing  the number of composite  soil samples
  that must be analyzed for Ra.   In  Section 2.0 we report on  statistical  analyses
  of Ra measurements on soil samples collected in the windblown mill-tailings
  flood  plain  at  Shiprock,  NM.   These analyses provide guidance on the number
  and  size  of  composite soil  samples  and on the choice  of a statistical decision
  rule (test)  for the acceptance-sampling  plan discussed in Section 4.0.  In
  Section 3.0,  we discuss  the RTRAK system, which is  a  4-wheel-drive tractor
  equipped with four Sodium-Iodide (NaI) gamma-ray detectors.  The RTRAK is being
  developed  for measuring  radionuclides that indicate  the amount of Ra in surface
  soil.   Preliminary results  on  the calibration of these detectors are presented.

     2.0  PERCENT ACCURACY OF MEANS AND PROBABILITIES OF DECISION ERRORS

     In this section we statistically analyze Ra measurements of composite
soil samples collected from the windblown mill-tailings flood-plain region at
Shiprock, NM.  This is done to evaluate the impact on probabilities of false
positive and false negative decision errors resulting from reducing the number
of soil plugs per composite soil sample from 21 to 9 or 5 and from collecting
1, 2, or 3 composite samples per plot.  We also consider how these changes
affect the accuracy of estimated mean Ra concentrations.

2.1  FIELD SAMPLING DESIGN
     The Shiprock study involved collecting multiple composite soil samples
of different sizes from 10 plots in the flood-plain region after an initial
remedial action had occurred.  Five sizes of composite samples were collected;
those formed by pooling either 5, 8, 9, 16, or 21 plugs of soil.
     Figure 1 shows the windblown mill-tailings flood-plain region and the
location of ten 30-m by 30-m study areas from which composite soil samples
were collected.  Eight- and 16-plug composite samples were formed by pooling
soil plugs that were collected over the ten 30-m by 30-m areas according to
the three sampling patterns shown in the lower half of Fig. 2.  The 5-, 9-,
and 21-plug composite samples were formed by pooling soil plugs collected
from only the central 10-m by 10-m plot in each 30-m by 30-m area using the
three patterns shown in the upper half of Fig.  2.
     Up to nine composite samples of each type were formed in each of the ten
areas.  Each composite sample of a given type used the same pattern that had
been shifted slightly in location.  For example, referring to Fig. 2, the
21-plug composite sample number 1 in a given 10-m by 10-m plot was formed by
pooling soil plugs collected at the 21 positions numbered 1 in the plot.
This design allowed replicate composite samples of a given type to be collected
without altering the basic pattern that would be used in practice.
     Each soil plug was collected to a depth of 15 cm using a garden trowel.
The plugs collected for a given composite sample were placed in a bucket and
mixed vigorously by stirring and shaking.  The  composite sample analyzed for
Ra consisted of about 500 g of the mixed soil.
FIGURE 1.  Location of the Ten 30-m by 30-m Areas in the Windblown Mill-
           tailings Flood Plain Region at Shiprock, New Mexico, Within
           Which Multiple-composite Soil Samples were Collected Following
           Initial Removal of Surface Soil.  (The figure marks the 10-m by
           10-m plots where 226Ra concentrations were expected to exceed
           5 pCi/g.)
FIGURE 2.  Sampling Patterns for 5-, 8-, 9-, 16-, and 21-plug Composite
           Soil Samples Collected From Ten 30-m by 30-m Areas in the
           Windblown Mill-tailings Flood Plain at Shiprock, New Mexico.
 2.2   DESCRIPTION  OF  THE  DATA
      The  Ra  measurements  for  the  composite samples are plotted  in  Figs. 3, 4,
 and  5.  The  figures  also  give the arithmetic mean, x, the standard  deviation,
 s, and  the number of replicate composite samples, n.  We wish to determine
the extent to which the true standard deviation, σ, increases when fewer than
 21 plugs  are used to form a composite sample.  To avoid confusion,  we point
 out  that  Figs.  4  and 5 indicate that Ra measurements of most 5-, 9-, and 21-
plug samples from Areas 1, 3, and 4 are larger than measurements for the 8-
 and  16-plug  samples  from  those areas.  This is believed to have occurred
 because the  soil  in  the central 10-m by 10-m plot (from which 5-, 9-, and 21-
 plug  composite  samples were formed) had higher concentrations of Ra than the
 soil  in the  30-m  by  30-m  areas  from which the 8- and 16-plug samples were
 formed  (see  Fig.  1).
      Measurements  for Areas 8,  9,  and 10 were below 5 pCi/g (Fig. 3) and the
 standard  deviations  ranged from 0.2 to 0.8 pCi/g, with no apparent trends in
 s with  increasing  number  of plugs  per sample.  The data in Fig. 4 indicates
 that  5-plug  sample data sets may  be more skewed than those for 9- or 21-plug
 samples,  at  least  for some plots.  The measurements for Areas 1, 4, and 7 (Fig.
 5) had higher means  and were more  variable than those for the areas in Figs.
 3 and 4.  In Fig.  6  are plotted the values of s from Figs. 3,  4, and 5 to
 show more clearly  the changes  in  s that occurred as the number of plugs per
 composite sample changed.

 2.3   ESTIMATING AND MODELING CHANGES IN STANDARD DEVIATIONS
      In this section we first estimate the changes in a that occur as the
 number of plugs per composite sample decreases from 21 to a smaller number.
 Then  a model  for these changes is developed for use in later sections.
     A simple model for the ratio of standard deviations is obtained by assuming
that measurements of Ra in individual soil plugs are uncorrelated, that the
soil plugs are thoroughly mixed together before the 500-g aliquot is removed,
and that the standard deviation between soil plugs does not change as the
FIGURE 3.  226Ra Measurements (pCi/g) of 5-, 8-, 9-, 16-, and 21-plug
           Composite Soil Samples Taken from Areas 8, 9, and 10 in the
           Windblown Mill-tailings Flood Plain at Shiprock, New Mexico.
           x and s are the Arithmetic Mean and Standard Deviation of the
           n Measurements for each Data Set.
FIGURE 4.  226Ra Measurements (pCi/g) of 5-, 8-, 9-, 16-, and 21-plug
           Composite Soil Samples Taken from Areas 2, 3, 5, and 6 in the
           Windblown Mill-tailings Flood Plain at Shiprock, New Mexico.
           x and s are the Arithmetic Mean and Standard Deviation of the
           n Measurements for each Data Set.
FIGURE 5.  226Ra Measurements (pCi/g) of 5-, 8-, 9-, 16-, and 21-plug
           Composite Soil Samples Taken from Areas 1, 4, and 7 in the
           Windblown Mill-tailings Flood Plain at Shiprock, New Mexico.
           x and s are the Arithmetic Mean and Standard Deviation of the
           n Measurements for each Data Set.
FIGURE 6.  Standard Deviations of Multiple Composite Samples from Areas
           1 Through 10 at the Windblown Mill-tailings Flood Plain at
           Shiprock, New Mexico.  Mean 226Ra Concentrations for each
           Area are Given to Illustrate that Areas with Lower Average
           Concentrations tend to have Smaller and More Stable Standard
           Deviations.
sampling pattern (see Fig. 2) changes.  Under these assumptions we have the model

          σp1/σp2 = (p2/p1)^1/2                                          (1)

where σp denotes the true standard deviation of p-plug composite samples and
p1 and p2 are numbers of soil plugs per composite sample.
          
     TABLE 1.  Comparing Estimated and Predicted Ratios of Standard
               Deviations for Composite Samples Formed From Different
               Numbers of Soil Plugs.

                       Estimated Ratios+ Computed Using      Predicted** Ratios
  Ratio of Standard      Data from Areas 1 through 8           Computed Using
     Deviations*       GM      GSE++     AM      SE              Equation 1

       σ9/σ21          1.3     1.3       1.6     0.3                1.53
       σ5/σ21          1.7     1.3       2.2     0.7                2.05
       σ5/σ9           1.3     1.2       1.5     0.3                1.34
       σ8/σ16          1.4     1.3       1.7     0.5                1.41

  (GM = geometric mean, GSE = geometric standard error, AM = arithmetic
  mean, SE = standard error.)

  *    σj = true standard deviation of j-plug composite samples.

  **   Computed as (p2/p1)^1/2, where p1 and p2 are the numbers of soil
       plugs per composite sample, respectively.

  +    Areas 9 and 10 were excluded because of their very low and uniform
       226Ra measurements.

  ++   GSE = exp(s/√n), where s is the estimated standard deviation of the
       natural logarithms (n = 8).
FIGURE 7.  Least-Squares Linear Regression Lines Relating the Standard
           Deviation of Replicate Composite Samples from a Plot to the
           Estimated Mean Concentration of 226Ra for the Plot.  The
           fitted lines are s = 0.41 + 0.26x (5-plug, r = 0.71),
           s = 0.43 + 0.22x (9-plug, r = 0.76), and
           s = 0.10 + 0.23x (21-plug, r = 0.87).
      Substituting Eq. (3) in Eq. (2) gives

          σp1 = (0.10 + 0.23μ)·(p2/p1)^1/2                               (4)

which is the model used here to predict the standard deviation of p1-plug
composite samples, where p1 < 21.  The equations for 5- and 9-plug samples in
Fig. 7 were not used to predict standard deviations because of the relatively
small correlations (r) obtained for those data.

2.4  PERCENT ACCURACY OF ESTIMATED MEAN Ra CONCENTRATIONS
      Using Eq. (4) and assuming that Ra measurements of composite samples are
normally distributed, the following formula was used to estimate the percent
accuracy with which the post-remedial-action mean Ra concentration for a plot
at Shiprock would be estimated with specified confidence:

      Percent Accuracy = 100·Z·(0.10 + 0.23μ)·(p2/p1)^1/2/(μ·√n),            (5)

where Z equals 1.96 or 1.28 if 95% or 80% confidence, respectively, is required,
n is the number of p1-plug composite samples collected in the plot and averaged
together to estimate the plot mean, and μ is the true plot mean.  Eq. (5) is
based on the usual formula for estimating the number of samples required to
estimate a mean with prespecified relative accuracy and confidence; see, e.g.,
Gilbert (1987, p. 33).
      In Fig. 8 are plotted values of Eq. (5) for 80% and 95% confidence, p1 =
5, 9, and 21 plugs, n = 1 and 2 composite samples per plot, and for μ ranging
from 1 to 10 pCi/g.  To illustrate the meaning of Fig. 8, consider the plotted
value for 95% confidence, p1 = 9, n = 2, and μ = 8.  If two 9-plug samples are
collected from a 10-m by 10-m plot that has a true mean concentration of 8 pCi/g
(including background), then we can be 95% sure that the arithmetic mean of
the two measurements will fall within about 51% of the true mean.
     The curves  in Fig. 8 show that approximately doubling the number of plugs
per sample increases the percent accuracy by 20 to 25 percentage points.
Also, the increase in percent accuracy is negligible if more than  4 composite
samples are used.
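      A sketch of the Eq. (5) calculation under the Shiprock standard-
deviation model (Eqs. 3 and 4, with p2 = 21); the function name is mine
and SciPy is assumed.

import math
from scipy.stats import norm

def percent_accuracy(mu, p1, n, confidence=0.95):
    """Relative accuracy (%) of the estimated plot mean per Eq. (5).
    mu: true plot mean 226Ra (pCi/g, including background);
    p1: soil plugs per composite; n: composite samples per plot."""
    z = norm.ppf(1 - (1 - confidence) / 2)                  # 1.96 or 1.28
    sigma_p1 = (0.10 + 0.23 * mu) * math.sqrt(21.0 / p1)    # Eq. (4), p2 = 21
    return 100.0 * z * sigma_p1 / (mu * math.sqrt(n))

# e.g. percent_accuracy(mu=8.0, p1=9, n=2, confidence=0.95) -> about 51%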
FIGURE 8.  Percent Accuracies for Estimated Mean 226Ra Concentrations in
           Surface Soil for 10-m by 10-m Plots at the Shiprock, New
           Mexico Site.
     By dividing Eq. 5 when p2 = 21 and p1 < 21 by Eq. 5 when p2 = p1 = 21 we
obtain (21/p1)^1/2, which is the factor by which the percent accuracy of 21-
plug composite samples is multiplied to get the percent accuracy of p1-plug
samples.  This formula gives 1.5 and 2.0 when p1 = 9 and 5, respectively.
Notice that this factor is not study-site dependent since it does not depend
on μ or σ.

2.5  PROBABILITIES OF REMEDIAL ACTION DECISION ERRORS
     In this section the increase in remedial-action decision errors as the
number of plugs per sample declines is quantified.  These results are obtained
assuming:  (1) that Eq. 4 is an appropriate model for the variance of p1-plug
composite samples (p1 < 21), (2) the estimated Ra mean concentration for a
plot based on p1-plug composite samples withdrawn from the plot is normally
distributed, and (3) the mean Ra background concentration is known.
     The probabilities of making remedial action decision errors are computed
for three different decision rules:
     Decision Rule 1
     Take additional remedial action if x̄' + 1.645·σp1/√n (the upper 95%
confidence limit on the true plot mean) exceeds 5 pCi/g above background,
where x̄' is the estimated mean concentration (above background) for the plot
based on n p1-plug composite samples.
     Decision Rule 2
     Take additional remedial action if x̄' exceeds 5 pCi/g above background.
     Decision Rule 3
     Take additional remedial action if x̄' - 1.645·σp1/√n (the lower 95%
confidence limit on the true plot mean) exceeds 5 pCi/g above background.
     Among these three rules, Rule 1 offers the greatest protection to the
public because the probabilities of taking additional remedial action are
greater than for Rules 2 or 3.  Rule 3 will result in fewer decisions to take
remedial action than Rules 1 or 2 for plots with true mean Ra concentrations
near 5 pCi/g above background.  Hence, Rule 3 will tend to reduce costs of
remedial action.  Rule 2 is a compromise strategy in that the probabilities
of taking remedial action fall between those for Rules 1 and 3.
     Let us define β to be the probability that a statistical test will indicate
additional remedial action is needed.  When Decision Rule 1 is used, the
probability β is obtained by computing:

          Iβ = (5 - μ')·√n/[σ21·(21/p1)^1/2] - 1.645                         (6)

where 5 is the EPA limit, μ' is the true plot mean above background, σ21 is
the standard deviation of 21-plug composite samples given by Eq. (3), and p1
is the number of soil plugs used to form each of the n composite samples from
the plot.  Iβ is then referred to tables of the cumulative normal
distribution to determine β (i.e., β = 1 - Φ(Iβ)).
     For Decision Rule 2, the same procedure is used except that Eq. (6) is
computed with the constant 1.645 replaced by zero.  For Decision Rule 3, the
negative sign before 1.645 in Eq. (6) is replaced by a positive sign.
     We computed β for various values of μ' when the background Ra concentration
was assumed to be 1 pCi/g (the approximate background value for the windblown
flood plain at the Shiprock site), when n = 1, 2, or 3, and p1 = 5, 9, or 21.
The results when n = 1 are plotted in Fig. 9, and the results for one, two,
or three 9-plug composite samples are plotted in Fig. 10.
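     The Eq. (6) calculation for the three decision rules can be
sketched as follows, using the Eq. (3) model for σ21 and the assumed
1-pCi/g background described above; the function name is mine.

from math import sqrt
from scipy.stats import norm

def prob_additional_action(mu_prime, p1, n, rule, limit=5.0, background=1.0):
    """Probability that the chosen rule (1, 2, or 3) calls for additional
    remedial action when the true plot mean above background is mu_prime."""
    sigma_21 = 0.10 + 0.23 * (mu_prime + background)   # Eq. (3), total mean
    sigma_p1 = sigma_21 * sqrt(21.0 / p1)              # Eq. (4)
    shift = {1: -1.645, 2: 0.0, 3: +1.645}[rule]       # Rule 1 / 2 / 3
    I_beta = (limit - mu_prime) * sqrt(n) / sigma_p1 + shift
    return 1.0 - norm.cdf(I_beta)

# e.g. prob_additional_action(mu_prime=3.0, p1=21, n=1, rule=1) -> about 0.40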
     These figures indicate that:
 1.  Decreasing the number of plugs per composite sample increases the
     probability of incorrectly deciding additional  remedial  action is needed.
FIGURE 9.  Probabilities of Taking Additional Remedial Action in a Plot
           for Three Decision Rules When One 500-g Sample from a
           Composite Sample Composed of Either 21, 9, or 5 Soil Plugs
           from the Plot is Measured for 226Ra.
     For example, if the upper confidence limit rule is used (Rule 1),  if one
     composite sample is collected, if the true mean for the plot is 3  pCi/g
     above background, and if background is 1 pCi/g, then the probability the
     rule will indicate additional remedial action is needed increases  from
     about 0.40 to about 0.65 if a 9-plug rather than a 21-plug composite
     sample is used to estimate the plot mean (see Fig. 9).
 2.  Decreasing the number of plugs per composite sample increases the
     probability of incorrectly deciding additional  remedial action is  not
     needed.  For example, if the lower confidence limit rule is used (Rule
     3), if one composite sample is collected,  if the true plot mean is 10
     pCi/g above background,  and if background  is 1  pCi/g,  then the probability
     that Rule 3 will  correctly indicate additional  remedial action is  needed
     decreases from about 0.60 to about 0.30 if a 9-plug rather than a  21-
     plug sample is used (see Fig. 9).
 3.  Taking more than  one composite sample per  plot  reduces  the probability
     of incorrectly deciding  additional remedial  action is needed.  For the
     example in number 1 above, the probability decreases from about 0.65 to
     about 0.45 if two composite samples rather than one are collected  to
     estimate the mean (see Fig. 10).
 4.  For plots with mean concentrations near 5  pCi/g above background,  the
     probabilities of taking  additional remedial  action are  highly dependent
     on which decision rule is used.  For example, if the upper confidence
     limit rule is used (Rule 1), the  probability is greater than 0.95  that
     the test will indicate additional remedial  action is needed when the
     plot has a mean Ra concentration  greater than 5 pCi/g above background.
     But if the lower confidence limit rule (Rule 3) is used, and one 21-plug
     composite sample is collected, the probability  that the test will  indicate
     additional remedial action is needed does  not reach 0.95 until the true
     plot mean is about 20 pCi/g above background.  Rule 2 falls between these
     two extremes.  It achieves a 0.95 probability (for one  or more 21-plug
     samples)  when the true mean above background is about 9 or 10 pCi/g (see
     Fig. 9).
     The three decision rules may find application at different times in the
remedial action process. The  upper confidence limit  rule seems most appropriate
FIGURE 10.  Probabilities of Taking Additional Remedial Action in a Plot
            for Three Decision Rules if One, Two, or Three 500-g Samples
            from a Composite Sample Composed of 9 Soil Plugs are Measured
            for 226Ra.
at initial stages when it may be prudent to assume that the plot is contaminated
until proven otherwise.  The "price" of using this rule is increased remedial
action costs for plots that have true mean concentrations just under 5 pCi/g
above background.  The lower confidence limit rule is more appropriate for
plots that are strongly believed to have already been cleaned to below the
EPA limit.  Using this rule, the probability of taking additional  remedial
action is less than 0.05 when the true plot mean is 5 pCi/g above background
or less.
     The magnitude of changes in the probability of making incorrect remedial
action decisions due to changing the number of soil plugs per composite sample
from 21 to a lesser number depends on the particular statistical test used to
make the decision.  For example, suppose the decision to take additional
remedial action will be made whenever the estimated plot mean above background
is greater than the EPA limit of 5 pCi/g above background (Rule 2).  Also,
assume that the standard deviation of composite-sample Ra concentrations  is a
known constant as modeled using the Shiprock data.  Then using one or more 9-
plug rather than 21-plug composite samples increases the probability of making
decision errors (incorrectly deciding additional remedial action is or is not
needed) by no more than about 17 probability points.  These maximum increases
are over relatively narrow bands of true plot means above background; between
2.5 and 4.5 pCi/g and between 6 and 13 pCi/g.  These bands become smaller if
more than one composite sample per plot is used to estimate the plot mean.
If the plot mean is estimated using one or more 21- or 9-plug samples,  the
probability of incorrectly deciding additional  remedial action is  not needed
is small (< 0.05)  when the true plot mean above background exceeds about  15
pCi/g.
     If Rules 1 and 3 are to yield the probabilities shown in Figs. 9 and 10,
the true standard deviation for the plot must be given by Eq. (4).  At
contaminated sites where this model does not apply, special soil sampling
studies could be conducted to determine whether Eq. (4) or some other model
is applicable.   Alternatively,  if several composite samples are collected
from each plot then the standard deviation could be estimated directly for
each plot using those data.  Then upper or lower confidence limits would  be
computed using the t distribution rather than the normal distribution [see
 Exner et al.  (1985) for an application of the upper confidence limit test].
 Use of the t  distribution will generally give more decision errors,  which is
 the price paid when the standard deviation must be estimated.   If the mean
 background Ra concentration is estimated, this will  also increase the standard
 deviation and hence the probabilities of making decision errors.
     As concerns the comparison of 21-, 9-, and 5-plug samples, the increase
in probabilities of decision errors as the number of plugs per composite sample
is reduced is, on the whole, about the same as shown in Figs. 9 and 10 when
the standard deviation, σ1, was assumed known.  This conclusion is based on
probabilities of decision errors we obtained using the noncentral t distribution
and the methods in Wine (1964, pp. 254-260).  These results are shown in
Fig. 11 for the case of two composite samples per plot.
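     For readers who wish to repeat this kind of calculation, the sketch below
(ours, not the authors' code) shows how the probability of taking additional
remedial action under the one-sided 95% confidence-limit rules can be obtained
from the noncentral t distribution when the standard deviation is estimated
from the composite samples themselves; scipy's nct distribution stands in for
the tabled methods of Wine (1964), and the example values of mu and sigma are
placeholders.

    # Sketch (not the paper's code): probability that the 95% confidence-limit
    # rules call for additional remedial action when the plot mean and standard
    # deviation are estimated from m composite samples.
    from scipy.stats import nct, t

    EPA_LIMIT = 5.0  # pCi/g above background

    def p_action_t(mu, sigma, m, rule):
        """mu = true plot mean (pCi/g above background); sigma = true standard
        deviation of a single composite-sample measurement."""
        df = m - 1
        t_crit = t.ppf(0.95, df)
        delta = (mu - EPA_LIMIT) / (sigma / m ** 0.5)   # noncentrality parameter
        if rule == 1:   # act if the upper 95% confidence limit exceeds the limit
            return 1.0 - nct.cdf(-t_crit, df, delta)
        if rule == 3:   # act if the lower 95% confidence limit exceeds the limit
            return 1.0 - nct.cdf(t_crit, df, delta)
        raise ValueError("only the confidence-limit rules (1 and 3) use the t test")

    # Two composite samples per plot (df = 1), as in Fig. 11; sigma is a placeholder.
    print(round(p_action_t(mu=8.0, sigma=2.0, m=2, rule=3), 3))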

 2.6  EXPECTED NUMBER OF DECISION ERRORS
      The expected number of plots at a remediated site that are misclassified
 as needing or not needing additional  remedial  action depends on the
 probabilities of making  decision errors  and  on the frequency distribution of
 the true plot means.   Fig.  12 shows the  frequency distribution of estimated
 Ra means for  1053 plots  at  the Shiprock  floodplain site that had  undergone an
 initial  remedial  action  (removal  of soil).   Each  mean was estimated  by the
 measurement of  one 20-plug  composite sample  from  the plot.   Fig.  12  shows
 that 83 plots had estimated means that exceeded the EPA standard of 5 pCi/g
 above background (6 pCi/g including background).
      We assume for illustration purposes that the histogram in Fig. 12 is
 the distribution  of  true  plot  means.   (When  the RTRAK system becomes
 operational,  it  is expected  that,  following  remedial  action, all plots will
 have  Ra  concentrations below the  EPA  limit.  Hence,  the  distribution  in
 Fig.  12  may be a  worst case  distribution.)   Under  this  assumption we wish  to
 determine the effect of using  9  rather than  21  plugs  of  soil per composite
 sample on the expected number of plots that are misclassified.  Let ni be the
 number of plots in the ith frequency class, Q be the number of classes, and
 pi be the probability of a decision error for a plot with true mean in the
 ith class using a chosen decision rule.  Then E = Σ ni pi, summed over the
 Q classes, is the expected number of misclassified plots for the decision rule.
 [Figure 11: probability of taking additional remedial action (vertical
 axis, 0 to 1.0) versus mean 226Ra concentration (pCi/g) above background
 for a plot (horizontal axis, 0 to 20), with curves for the Upper 95%
 Confidence Rule (Rule 1), the 50% Confidence Rule (Rule 2), and the
 Lower 95% Confidence Rule (Rule 3) for 21-, 9-, and 5-plug composite
 samples; the EPA limit is marked.]

           FIGURE 11.  Probabilities of Taking Additional Remedial Action
                       in a Plot for Three Decision Rules if Two 21-, 9-,
                       or 5-Plug Composite Soil Samples are Collected and
                       the t Test is Used to Make Decisions.
 [Figure 12: histogram of the number of plots versus estimated mean 226Ra
 concentration (pCi/g, including the 1 pCi/g background), with the
 horizontal axis running from 2.0 to 16.0 pCi/g.]

       FIGURE 12.  Frequency Distribution of Estimated Mean 226Ra
                   Concentrations (pCi/g) in Surface Soil following
                   Initial Remedial Action for 1053 10-m by 10-m
                   Plots in the Windblown Mill-tailings Flood Plain
                   at Shiprock, New Mexico.
     First, we computed E for the 970 plots in the Q = 12 classes in Fig. 12
that had means less than 6 pCi/g, i.e., for plots that met the EPA standard.
Using the probabilities in Fig. 9 for Rule 2 of incorrectly deciding to take
additional remedial action, we found that E = 27.4 and 40.2 for 21- and 9-
plug samples, respectively.  Hence, the use of a single 9-plug rather than a
single 21-plug composite in each plot would result in an expected 13 more
plots undergoing unneeded additional remedial action.
     Next, we computed E for the 83 plots in Fig. 12 that had means greater
than 6 pCi/g, i.e., for plots needing additional cleanup.  Using Rule 2 and
the probabilities of incorrectly deciding no additional remedial action was
needed from Fig. 9, we found E = 12.95 and 19.5 for 21- and 9-plug samples,
respectively.  That is, about 7 more plots would not receive needed remedial
action if 9- rather than 21-plug samples were used.
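     The expected-error bookkeeping above is just a weighted sum, and it can be
repeated for other sites or decision rules with a few lines of code.  The sketch
below uses invented class counts and error probabilities purely for illustration;
the real inputs would come from a histogram like Fig. 12 and probability curves
like those in Fig. 9.

    # Sketch with made-up numbers: expected number of misclassified plots,
    # E = sum over frequency classes of (plots in class) * (error probability).
    def expected_misclassified(class_counts, error_probs):
        if len(class_counts) != len(error_probs):
            raise ValueError("one error probability is needed per frequency class")
        return sum(n * p for n, p in zip(class_counts, error_probs))

    # Hypothetical example: four classes of plots below the EPA limit and a
    # Rule 2 false-positive probability for each class (not the Shiprock values).
    counts = [400, 300, 200, 70]
    p_false_positive = [0.001, 0.01, 0.08, 0.30]
    print(expected_misclassified(counts, p_false_positive))  # expected unneeded cleanups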
     We note that the 83 plots in Fig. 12 that exceeded the EPA standard were
subsequently further remediated.

2.7.  LOGNORMAL MODEL
     The results in Sections 2.3 - 2.6 were obtained by modeling the
untransformed data under the assumption that those data were normally distributed.
We used the W statistic to test for normality and lognormality (see, e.g.,
Gilbert (1987) or Conover (1980) for descriptions of this test) of the data
in Figs. 5,  6,  and 7.   We found that 21-plug samples were more likely to be
normally distributed than the 9- or 5-plug samples, and that 9- and 5-plug
samples were more likely to be lognormally distributed than normally
distributed.   Also, the increase in the standard deviation as the mean increases
(see Fig.  7)  indicates that the lognormal  distribution may be a better model
for these data than the normal  distribution.
     In this  section we investigate the extent to which the probability results
in Section 2.5 would change if the lognormal distribution rather than the
normal  distribution was appropriate.  To do this,  the natural logarithms of
the data in Figs. 3, 4, and 5 were computed and a model was developed for
the standard deviation of the logarithms.  We found that after deleting the
data for plots 9 and 10 (the standard deviations of the logarithms (sy) for
these plots were about twice as large as for the remaining eight plots) there
was no statistically significant linear relationship between sy and the mean
of the logarithms.  This indicates that the lognormal distribution may be a
reasonable model, at least for plots with concentrations at the level of those
in plots 1 through 8.  The pooled standard deviation of the logarithms for
plots 1-8 was 0.4, 0.37, and 0.3 for 5-, 9-, and 21-plug samples, respectively.
     The probabilities of taking additional remedial action were computed for
Rule 2 for the case of one, two, or three 5-,  9-, and 21-plug samples using
these modeled standard deviations.  This was done by computing

          Z1 = (ln 5 - ln μ1) / σy

and referring Z1 to the standard normal distribution tables, where σy equalled
0.4, 0.37, and 0.3 for 5-, 9-, and 21-plug samples, respectively.
     We found that for 9-plug samples, the false-positive error probabilities
for the lognormal case differed by less than two probability points from those
for the normal case for all mean Ra concentrations less than the EPA limit.
Differences in the false-negative rates were as large as 8 probability points
for mean concentrations between 8 and 10 pCi/g above background for the case
of one 9-plug composite sample per plot.  These results, while limited in
scope,  suggest that the false-positive and false-negative error probabilities
in Section 2.5 may be somewhat too large if the lognormal distribution is
indeed a better model  for the Ra data than the normal distribution.
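     The lognormal calculation is easy to reproduce.  The sketch below (ours,
not the authors' code) evaluates the probability that Rule 2 calls for additional
remedial action for a single composite sample, using the displayed Z statistic
and the pooled log-scale standard deviations quoted above; extending it to k
composite samples by dividing σy by the square root of k is our assumption, not
something stated in the text.

    # Sketch: probability that Rule 2 (estimated mean > 5 pCi/g above background)
    # calls for additional remedial action under the lognormal model, using
    # Z1 = (ln 5 - ln mu1) / sigma_y and the standard normal distribution.
    from math import log, sqrt
    from scipy.stats import norm

    SIGMA_Y = {5: 0.4, 9: 0.37, 21: 0.3}   # pooled log-scale standard deviations

    def p_action_lognormal(mu1, plugs, k=1):
        """mu1 = true plot mean (pCi/g above background); k composite samples.
        Dividing sigma_y by sqrt(k) for k samples is an assumption of this sketch."""
        sigma_y = SIGMA_Y[plugs] / sqrt(k)
        z1 = (log(5.0) - log(mu1)) / sigma_y
        return 1.0 - norm.cdf(z1)          # P(estimated mean exceeds the limit)

    # One 9-plug composite sample, true mean 4 pCi/g above background:
    print(round(p_action_lognormal(mu1=4.0, plugs=9, k=1), 3))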
                        3.0  RTRAK AND ITS CALIBRATION

     The RTRAK is a 4-wheel-drive tractor equipped with four sodium-iodide
 (NaI) detectors, their supporting electronics, an industrial-grade IBM PC,
 and a commercial microwave auto-location system.  The detectors are
 independently mounted on the front of the tractor and can be hydraulically
 lifted and angled.  Bogey wheels support the detectors to maintain a distance
 of 12 inches from  the ground during monitoring.  Each detector has a tapered
 lead shield that restricts its field of view to about 12 inches, with overlap
 between adjacent detectors.  The RTRAK will take gamma-ray readings while
 moving at a constant speed of 1 mph.  When a reading above a prespecified
 level is encountered, red paint is sprayed on the ground to mark these "hot
 spots". The automatic microwave locator system provides x-y coordinates with
 the count data.  This will permit real-time map generation to assist in control
 of contamination excavation.  Preliminary data indicate that the RTRAK should
 be able to detect  Ra in soil at concentrations less than 5 pCi/g.  Further
 tests of the RTRAK's detection capabilities are underway.
      The proper calibration of the RTRAK detectors is important to the success
 of the remedial-action effort.  The NaI detectors detect selected radon
 daughter gamma peaks that are related to Ra.  Hence, the RTRAK detectors do
 not directly measure Ra, the radionuclide to which the EPA standard applies.
 Radon is a gas, and the rate that it escapes from the soil depends on several
 factors including soil moisture, source depth distribution, soil radon
 emanating fraction, barometric pressure, soil density, and soil composition.
 The calibration of the detectors must take these variables into account so
 that radon daughter gamma peaks can be accurately related to Ra concentrations
 under field conditions.
     A field calibration experiment near the Ambrosia Lake,  NM, mill-tailings
 pile was recently conducted as part of the effort to develop a calibration
 procedure.  In this experiment the RTRAK accumulated counts of 214Bi (bismuth)
 for approximately 2-second intervals while traveling at 1 mph.  Red paint was
 sprayed to mark the locations and distances traveled for each time interval.
 For each detector,  from 3 to 5 surface soil  samples  were collected down the
centerline of each scanned area (Fig. 13).  Then, for each of these areas,
 [Figure 13: diagram (not reproduced) showing, for each of detectors 1
 through 4, the scanned strip and the soil-plug sample locations spaced
 along it; distances are marked in feet and a north arrow is shown.]

 FIGURE 13.  Pattern of Soil-Sample Locations and RTRAK
             Detector Readings for Obtaining Data to
             Calibrate the Detectors.
these samples were mixed and a ~ 500-g aliquot was removed and sealed in a
metal can that was assayed for Ra within a few days and then again following
a 30-day waiting period to permit equilibrium to be established between Rn
and 214Bi.
     The data and the fitted least-squares linear regression line are displayed
in Fig. 14.  The data for the 4 detectors have been combined into one data
set because there were no important differences in the 4 separate regression
lines.  Also shown in Fig. 14 are the 90% confidence intervals for predicted
individual Ra measurements.  The regression line and limits in Fig. 14 were
obtained by first using ordinary least-squares regression on the In-transformed
data.  Then the equation was exponentiated and plotted in Fig. 14.  It is
expected that this calibration equation will  be adjusted on a day-by-day basis
by taking several RTRAK-detector measurements per day at the same location in
conjunction with measurements of barometric pressure and soil moisture.   This
adjustment procedure is presently being developed.
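     The construction just described--an ordinary least-squares fit to the
ln-transformed data, exponentiated back to the original scale along with limits
for predicted individual measurements--can be sketched as follows.  This is a
generic illustration under our own assumptions (hypothetical data arrays and
two-sided 90% prediction limits on the log scale); it is not the project's
calibration code, and the day-by-day adjustment procedure is not represented.

    # Generic sketch of a Fig. 14-style fit: OLS on ln(226Ra) versus detector
    # counts, then exponentiate the fitted line and 90% prediction limits.
    import numpy as np
    from scipy.stats import t

    cps = np.array([12., 18., 25., 33., 41., 52., 60.])   # hypothetical 214Bi counts (cps)
    ra = np.array([6., 9., 14., 20., 27., 38., 47.])       # hypothetical 226Ra (pCi/g)

    y = np.log(ra)
    n = len(cps)
    b1, b0 = np.polyfit(cps, y, 1)                          # slope, intercept on log scale
    resid = y - (b0 + b1 * cps)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))               # residual standard deviation
    x_bar, sxx = cps.mean(), np.sum((cps - cps.mean()) ** 2)

    def predict(x0, conf=0.90):
        """Point prediction and prediction interval for an individual 226Ra value."""
        yhat = b0 + b1 * x0
        half = t.ppf(0.5 + conf / 2, n - 2) * s * np.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
        return np.exp(yhat), (np.exp(yhat - half), np.exp(yhat + half))

    print(predict(30.0))   # estimated pCi/g and 90% limits at 30 cps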
 [Figure 14: scatter plot of 226Ra concentration (pCi/g, vertical axis,
 0 to 90) versus 214Bi counts at 609 keV (counts per second, horizontal
 axis, roughly 10 to 70), with the fitted least-squares regression line
 and 90% confidence intervals for predicted individual measurements.]

     FIGURE 14.  Least-Squares Regression Line for Estimating 226Ra
                 Concentrations (pCi/g) in Surface Soil Based on
                 RTRAK-Detector Readings of 214Bi (609 keV).
                     4.0  COMPLIANCE ACCEPTANCE SAMPLING

     As illustrated by Fig. 14, there is not a perfect one-to-one correspondence
between RTRAK detector counts for 214Bi and measurements of Ra in aliquots of
soil.  This uncertainty in the conversion of 214Bi counts to Ra concentrations,
and the fact that the EPA standard is written in terms of Ra concentrations,
suggest that soil samples should be collected in some plots and their Ra
concentrations measured in the laboratory as a further confirmation that the
EPA standard has been met.  Schilling (1978) developed a compliance acceptance-
sampling plan that is useful for this purpose.
     Schilling's procedure as applied here would be to (1) determine (count)
the total number (N) of 10-m by 10-m plots in the remediated region, (2) select
a limiting (small) fraction (PL)  of defective plots that will be allowed (if
undiscovered) to remain after remedial action has been completed, (3) select
the confidence (C) required that the fraction of defective plots that remain
after remedial action has been conducted does not exceed PL, (4) enter Table
1 in Schilling (1978) or Table 17-1 in Schilling (1982)  with D = NPL to
determine the fraction (f) of plots to be sampled, (5) select n = fN plots at
random for inspection, and (6) "reject" the lot of N plots if the inspection
indicates that one or more of the n plots do not meet the EPA standard.  (The
meaning of "reject" is discussed below.)
     In Step 6,  each of the n plots would be "inspected" by collecting three
or four 9- or 21-plug composite soil samples and using these to conduct a
statistical test to decide if the plot meets the EPA standard.  The choice of
three or four 9- or 21-plug samples is suggested by the results of our
statistical analyses in Section 2.0 in the windblown mill-tailings flood plain
region at Shiprock, NM.
     Steps 4 and 5 can be simplified by using curves  (Hawkes, 1979) that give
n at a glance for specified N, PL, and C.  Also, the Operating Characteristic
(OC) curves for this procedure (curves that give the probability of rejecting
the lot [of N plots] as a function of the true fraction of plots that exceeds
the standard) can be easily obtained using Table 2 in Schilling  (1978) or
Table 17-2 in Schilling (1982).
     To illustrate the 6-step procedure above,  suppose C = 0.90 and PL = 0.05
are chosen, and that the remediated region contains N = 1000 plots.  Then we
find from Fig. 1 in Hawkes (1979) that n = 46 plots should be inspected.  If
all 46 inspected plots are found to be non-defective, we can be 100C = 90%
confident that the true fraction of defective plots in the population of N =
1000 plots is less than 0.05, the specified value of PL.  If one or more of
the n plots fail the inspection, then our confidence is less than 0.90.
     As another example, suppose there are N = 50 plots in the remediated
region of interest.  Then, when C = 0.90 and PL = 0.05, we find that n = 30
plots should be inspected.  Small lots that correspond perhaps to subregions
of the entire remediated region may be needed if soil excavation in these
regions was difficult or more subject to error because of hilly terrain or
other reasons.
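     A convenient way to check readings from the Hawkes (1979) curves is one
common approximation for a zero-acceptance-number, lot-sensitive plan,
f ≈ 1 - (1 - C)^(1/D) with D = N·PL.  The sketch below is our own illustration
of that approximation, not Schilling's tables; it reproduces the two examples
above to within about one plot, and the published tables or curves should be
used for actual planning.

    # Sketch: approximate sample size for a zero-acceptance-number,
    # lot-sensitive compliance plan, f ~= 1 - (1 - C)**(1/D), D = N * PL.
    import math

    def plots_to_inspect(N, PL, C):
        """Approximate number of plots to inspect so that, if all pass, we are
        100*C% confident the fraction of defective plots is below PL."""
        D = N * PL                        # limiting number of defective plots
        f = 1.0 - (1.0 - C) ** (1.0 / D)  # fraction of the lot to inspect
        return math.ceil(f * N)

    print(plots_to_inspect(1000, 0.05, 0.90))   # about 46, as in the first example
    print(plots_to_inspect(50, 0.05, 0.90))     # about 31; the curves give 30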
     The action that is taken in response to "rejecting the lot" may include
collecting three or four 9- or 21-plug composite soil samples in adjacent plots
surrounding the inspected plots that exceeded the EPA standard.  The same
statistical test as used previously in the original n plots would then be
conducted in each of these plots.  If any of these plots were contaminated
above the EPA limit, they would undergo remedial action and gamma scans using
the RTRAK system, and additional adjacent plots would be sampled, and so forth.
The calibration and operation of the RTRAK NaI detectors would also need to
be double checked to be sure the detectors and the entire RTRAK system are
operating correctly.
     An assumption underlying Schilling's procedure is that no decision error
is made when inspecting any of the n plots.  However, inspection errors will
sometimes occur since "inspection", as discussed above, consists of conducting
a statistical test for each plot using only a small sample of soil from the
plot.  When inspection errors can occur, the fraction of defective plots is
artificially increased, which increases the probability of rejecting the lot.
To see this, let P denote the actual fraction of plots whose mean exceeds the
EPA limit, let P1 denote the probability of a false-positive decision on any
plot (deciding incorrectly that additional remedial action is needed), and
let P2 denote the probability of a false-negative decision (deciding incorrectly
that no additional remedial action is needed).  Then, the effective fraction
defective is Pe = P1(1-P) + P(1-P2).  For example, if P1 = P2 = P = 0.05,
then Pe = 0.05(0.95) + 0.05(0.95) = 0.095, so that the compliance sampling
plan will operate as if the true proportion of defective plots is 0.095 rather
than 0.05.  This means there will be a tendency to reject too many lots that
actually meet the C and PL specifications.
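     To see how inspection errors propagate into the chance of rejecting the lot,
the short sketch below combines the effective fraction defective Pe with a simple
binomial approximation for a zero-acceptance-number plan; the binomial step is our
simplification of the OC calculation, not Schilling's tabled values.

    # Sketch: effective fraction defective under inspection error, and a binomial
    # approximation to the probability of rejecting the lot with an n-plot sample
    # and a zero acceptance number (reject if any inspected plot is flagged).
    def effective_fraction(P, P1, P2):
        """P = true fraction defective, P1 = false-positive rate, P2 = false-negative rate."""
        return P1 * (1 - P) + P * (1 - P2)

    def p_reject(n, Pe):
        return 1.0 - (1.0 - Pe) ** n    # binomial approximation; ignores finite-lot effects

    Pe = effective_fraction(P=0.05, P1=0.05, P2=0.05)
    print(round(Pe, 3))                 # 0.095, as in the example above
    print(round(p_reject(46, Pe), 3))   # chance of rejecting a 1000-plot lot with n = 46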
     In Section 2.5 we saw, using Ra data from the Shiprock, NM, mill-tailings
site, how P1 and P2 change with the statistical test used, the true mean
concentration, the number of composite samples, and the amount of soil used
to form each composite sample.  If remedial action has been very thorough so
that mean concentrations in all plots are substantially below the EPA limit,
then the true fraction of defective plots, P, will be zero and Pe = P1 (since
P = 0) will be small.  In that case, the probability of "rejecting the lot"
using Schilling's compliance acceptance sampling plan will be small.  As
indicated above,  this  probability is given by the OC curve that may be obtained
using Table  2 in  Schilling (1978).
                                5.0  DISCUSSION

      In this paper we have illustrated some statistical techniques for
 developing more cost-effective sampling plans for verifying that 226Ra
 concentrations in surface soil meet EPA standards.  Although the focus here
 was on 226Ra in soil, these techniques can be used in other environmental
 cleanup situations.  Because of the high cost of chemical analyses for hazardous
 chemicals, it is important to determine the number  and type or size of
 environmental samples that will  give a sufficiently high probability of  making
 correct cleanup decisions at hazardous-waste sites.   Also,  it  is  clear from
 Section 2.5 above that when the  level  of contamination is  close  to the allowed
 maximum concentration limit,  the probabilities  of making correct  cleanup
 decisions depend highly on the particular statistical  test  used  to make
 decisions.   Plots of  probabilities  such  as  given  in  Figs.  9, 10,  and  11  provide
 information for evaluating which test  is  most  appropriate  for  making  remedial-
 action  decisions.
     A  topic  that is  receiving much  attention  at  the  present time  is  the use
 of  in-situ  measurements  to reduce the  number of environmental  samples  that
 must be  analyzed  for  radionuclides or  hazardous chemicals.  The RTRAK  system
 discussed  in  this  paper  is  an  example  of  what can be  achieved  in the  case of
 radionuclides  in  soil.   Some  in-situ measurement  devices may only  be  sensitive
 enough  to  determine if  and  where a contamination  problem exists.   Other devices
 may be  accurate  enough to  provide a  quantitative  assessment of contamination
 levels.   In either case, but especially for the latter case, it is  important
 to quantitatively  assess the accuracy with which  the  in-situ method can measure
 the contaminant of interest.  The regression line in  Fig. 14 illustrates  this
 concept.
     It is hoped that this paper will provide additional  stimulus for the use
of statistical methods in the design of environmental sampling  programs for
the cleanup of sites contaminated with radionuclides and/or hazardous wastes.
                               6.0  REFERENCES


Conover, W. J.  1980.  Practical  Nonparametric Statistics,  2nd ed., Wiley,
  New York.

EPA 1983.  Standard for Remedial Actions at Inactive Uranium Processing Sites;
  Final Rule (40 CFR Part 192).  Federal Register 48(3):590-604 (January
  5, 1983).

Exner,  J. H., W. D. Keffer,  R. 0. Gilbert,  and R.  R. Kinnison.  1985.  "A
  Sampling Strategy for Remedial  Action at  Hazardous Waste Sites:  Clean-up
  of Soil Contaminated by Tetrachlorodibenzo-p-Dioxin."   Hazardous  Waste and
  Hazardous Materials 2:503-521.

Hawkes, C. J.  1979.  "Curves for Sample Size Determination in Lot  Sensitive
  Sampling Plans", J. of Quality Technology 11(4):205-210.

Gilbert, R. 0. 1987.  Statistical Methods  for Environmental Pollution
  Monitoring.  Van Nostrand  Reinhold,  Inc., New York.

Schilling, E. G.  1978.  "A  Lot Sensitive  Sampling Plan  for Compliance Testing
  and Acceptance Inspection", J.  of Quality Technology 10(2):47-51.

Schilling, E. G. 1982.  Acceptance Sampling in Quality Control.  Marcel Dekker,
  Inc., New York.

Wine, R. L. 1964.  Statistics for Scientists and Engineers.  Prentice-Hall,
  Inc., Englewood Cliffs, New Jersey.
                                      DISCUSSION
                                     Jean Chesson
            Price Associates,  Inc.,  2100 M Street, NW, Washington, DC  20037
   The presentation by Richard Gilbert
provides a good illustration of several
points that have been made by earlier
speakers.  My discussion is organized
around three topics that have general
applicability to compliance testing,
namely, decision error rates, sampling
plans, and initial screening tests.

Decision Error Rates
   The EPA standard for Cleanup of Land
and Buildings Contaminated with Residual
Radioactive Materials from Inactive Uran-
ium Processing Sites (48 FR 590) reads
"Remedial actions shall be conducted so
as to provide reasonable assurance
that, ..." and then goes on to define
the requirements for concentrations of
radium-226 in the soil.  An objective way
to "provide reasonable assurance" is to
devise a procedure which maintains stati-
stical Type II error rates at an accep-
table level.  A Type II error, or false
negative, occurs when the site is decl-
ared  in compliance when in fact it does
not satisfy the standard.  The probab-
ility of a Type II error must be low
enough to satisfy EPA.  On the other
hand, the false positive (or Type I)
error rate also needs to be kept reason-
ably  low, otherwise resources will be
wasted on unnecessary remedial action.
The aim is to devise a compliance test
that  will keep Type I and II errors with-
in acceptable bounds.
   Developing a compliance test involves
three steps.  First, a plan for collect-
ing data and a rule for interpreting it
is specified.  The paper considers sever-
al sampling plans and three decision
rules for data interpretation.  Second,
the decision error rates are calculated
based on a statistical model.  In this
case, the model involves a normal distri-
bution, a linear relationship between the
variance and mean for composite samples,
and an assumption of independence between
individual soil plugs making up the comp-
osite.  The last two components of the
model are based on empirical data.
Third, the sensitivity of the estimated
error rates to changes in the model ass-
umptions should be investigated.  This is
particularly important if the same proce-
dure  is going to be applied at other
sites.  For example, if the estimated
error rates are very sensitive to the
model relating variance and mean, it will
be necessary to verify the relationship
at each site.  Conversely, if the error
rates are relatively insensitive to
changes in the relationship, the com-
pliance test could be applied with con-
fidence to other sites without additional
verification.

Sampling Plans
   The sampling plan is an integral part
of the compliance test.  The paper illus-
trates how sampling occurs at several
levels.  There is the choice of plots
within the site.  The current plan in-
volves sampling every plot.  The proposed
plan suggests sampling a subset of the
plots according to an acceptance sampling
plan.  Then there is the choice of the
number and type of samples.  One or more
samples may be collected per plot each
composed of one or more soil plugs.
Usually more than one combination will
achieve the required decision error
rates.  The optimum choice is determined
by the contribution of each type of sam-
ple to the total variance and by relative
costs.  For example, if variability bet-
ween soil plugs is high but the cost of
collecting them is low, and the measu-
rement method is precise but expensive,
it is advantageous to analyze composite
samples composed of several soil plugs.
If the measurement method is inexpensive,
it may be preferable to analyze individ-
ual samples rather than composites.

Initial Screening Tests
   The RTRAK is an interesting example of
an initial screening test.  Initial scre-
ening tests may be used by the regulated
party to determine when the site is ready
for the "real" compliance test, or they
may be an integral part of the compliance
test itself.  In either case, the objec-
tive is to save costs by quickly ident-
ifying cases that are very likely to pass
or to fail the clearance test.  For ex-
ample, if the RTRAK indicates that the
EPA standard is not being met, additional
remedial action can be taken before final
soil sampling, thereby reducing the num-
ber of times soil samples are collected
before the test is passed.  If the init-
ial screening test is incorporated in the
compliance test, i.e., if a favorable
result in the initial screening reduces
or eliminates subsequent sampling re-
quirements, then calculations of decision
error rates must take this into account.

   The "reasonable assurance" stated in
the EPA rule is provided by an assessment
of the decision error rates for the en-
tire compliance test.  The development
and evaluation of a practical and effec-
tive multi-stage compliance test is a
significant statistical challenge.
                       DISTRIBUTED COMPLIANCE:   EPA AND THE LEAD BUBBLE
                                           John W.  Holley
                                         Barry  D. Nussbaum
                    U.S. EPA  (EN-397F),  401 M St., S.W. Washington,  D.C.
   This paper discusses a particular class of strategies, "bubbles", for the
management of human exposure to environmental hazards and examines an
application of such strategies to the case of lead in gasoline.  While gasoline
is by no means the only source of environmental lead, for most of the
population it has been the dominant source for many years and is certainly the
most controllable source.  Lead is not only toxic to people, it is also toxic
to catalytic converters, which are used on vehicles to reduce emissions of such
conventional pollutants as carbon monoxide, hydrocarbons, and oxides of
nitrogen.  The twin objectives of protecting people from lead and from the
conventional emissions of vehicles with lead-disabled catalysts led to the
first Environmental Protection Agency (EPA) regulation of the substance in
gasoline in 1979.  This first regulation covered the total amount of lead
allowed in each gallon of gasoline produced by a refinery when leaded and
unleaded gasoline are considered together and averaged over a quarter.  It
also set up temporary standards at a less stringent level for small refiners.
Without thinking of it in these terms, the Agency had taken the first steps
toward recognizing the need for and implementing a "bubble" policy for lead.
The paper will present some conceptual tools for discussing bubbles and then
examine the application of this management approach to gasoline lead.

         Bubbles--General Principles

   In general, a bubble approach to environmental regulation may be thought of
as any approach that aims at ensuring that environmental exposure to some
pollutant is reduced or controlled "on the average" while accepting some
variability across emitters in the magnitude of their contribution.  "On the
average" and "emitters" are ideas that obviously require further discussion.

Purposes of bubble regulations

   Regulators may use bubbles for at least four reasons.  First, they may allow
institution of a stringent regulation that would be infeasible for each entity
to meet, yet might be feasible for an industry as a whole.  Second, bubbles
make it possible to improve the flexibility of a regulation from the standpoint
of the regulated entities and may thus lessen any negative economic impacts.
The classic plant bubble is a case in point, providing for operating
flexibility by regulating the pollution from the entire plant rather than that
from each smokestack.  Third, bubbles may improve the "fairness" of application
of the burdens associated with a regulation.  In this way regulators may
mitigate the economic impact of an action upon firms that are somehow unusually
sensitive to its provisions.  The final reason for using a bubble approach is
really derivative of the second and third.  By minimizing and more fairly
distributing the impact of a regulation, the drafter may make badly needed
controls "possible" in a politico-economic sense.  Thus the public health may
be protected by a bubble regulation in a situation where the economic impact of
a simpler regulation would make it politically impossible to achieve.

Logical elements of a bubble

   A bubble regulation always has some dimension or set of dimensions along
which compliance is distributed.  The most obvious such dimension is space, and
is illustrated again by reference to the plant/smokestack bubble.  A lack of
compliance in one location may be balanced off against greater than minimum
compliance in another location.  It is important in planning the implementation
of a bubble regulation whether the sources across which emissions are to be
averaged are part of a single legally responsible entity (as in the plant
model) or are each themselves separate corporate entities.
   Time is another dimension along which compliance may be distributed.  Almost
all of our regulations are to some degree bubbles in this sense, since the
dimension of time is always involved in our setting of compliance periods.
Time even enters into our selection of the appropriate units (as in cubic feet
per minute).  This dimension becomes most important, though, in a situation
where it is actively and intentionally manipulated in the design of the
compliance strategy so as to achieve one or more of the objectives of bubbles
that were mentioned above.
   In addition to dimension, any successful bubble approach must have some
thought given to what, for want of a better term, we may call an integrating
medium.  This medium must assure that the results of our allowing an uneven
distribution of compliance across some dimension do not also produce sharp
differences in the consequences of exposure across that same dimension.  People
in one area suffering from some kind of toxic exposure are afforded scant
comfort by knowing that in consequence of their suffering the people in another
area are not affected at all by the pollutant.  So while we are attempting to
achieve fairness in distributing the economic burdens of compliance among
polluters, we must also consider the question of equity in exposure.
   The integrating media in most bubbles are the classic air, water, soil, and
food.  Under some circumstances we may consider the human body to be an
integrating medium, as in the case of pollutants whose effects are cumulative
in the body over a lifetime.  The air may mix the emissions from stack A and
stack B so that the downwind victim experiences the average of the two.
Certain pollutants may be diffused throughout a body of water in such a way
that heavy emissions on one day may be balanced off against very light
emissions on another day with the same effect as if daily emissions had been
carefully held to an intermediate or average level.

Enforcement considerations

   Measurement and/or sampling problems may arise with distributed compliance
regulations that are rarely a problem with more conventional approaches.  An
example is a scheme for averaging automobile emissions across models or engine
families that was considered by the Agency some years ago.  Without a bubble
approach the certification process is limited to determining whether each
engine family meets a single standard.  Under a bubble approach a whole set of
issues arises around measuring the emission level of each family within some
confidence limits--questions of sample size and design and distribution shape
rear their heads.  When these vehicles are tested to verify their in-use
performance, statistical concerns again arise as we consider whether the
manufacturer should be held responsible for the point estimate of certification
emissions, the lower confidence limit (to provide maximum protection for the
environment), or the upper confidence limit (to protect the manufacturer
against unpleasant surprises that may be based upon sampling error).  These
statistical concerns clearly have sharply focussed policy and legal
implications.
   One effect of some distributed compliance schemes is to unintentionally
compromise an environmental benefit which arises out of industry quality
assurance provisions.  In the simple situation where the manufacturer must meet
a standard and face dire consequences for failing to do so, some "headroom" is
likely to be left between the actual emission level and the somewhat higher
standard.  This gap benefits the environment to the extent of the
manufacturer's intolerance of risk.  A redesign of such an existing compliance
scheme to a distributed compliance approach with payment of a monetary penalty
for each ton of pollutant over the overall standard may lead to an increase in
emissions by reducing the manufacturers' uncertainty, even though emissions
overall remain under the statutory standard.
   The enforcement of bubble regulations may cost more than would be the case
for simpler alternatives.  This is true because of the complexity of sampling
and measurement and the administrative machinery needed to carry out
enforcement.  Where the bubble regulation provides significant benefits to the
industry in the form of flexibility, but costs more to administer, the question
arises as to whether the Agency or the industry should bear the cost.  An
interesting example of the working out of these problems can be seen in a
groundbreaking regulation for heavy-duty engine emissions negotiated between
the Agency and various interested parties.  Where a small manufacturer finds
the number of tests required by the Agency to establish a family's emissions
level too burdensome, the firm may elect a sampling approach that uses fewer
tests.  The risk to the environment is held constant, leading to a higher risk
of having to pay unmerited non-compliance penalties in exchange for the smaller
sample.
   Distributed compliance systems that sounded wonderful when being discussed
in theory by policy makers and economists may contribute to the development of
ulcers by the Agency's legal fraternity.  The very complexity of these schemes
may become a major problem in court, where the violator can take pot-shots at
the reasonableness of the regulation and seek refuge in the loopholes that are
the unintended consequence of complexity.  The statistical aspects of the
design of the regulation are put to a severe test as the violator's attorneys
and consultants question the Agency's proof that statistical assumptions were
met or question the appropriateness of the methods chosen.  Where compliance is
distributed among different firms, major difficulties may arise over the fixing
of responsibility for a violation--a problem that may be unlikely to occur with
a simpler compliance scheme.

               The Case of Lead

History and background

   Lead compounds were first used in gasoline in the 1920s to boost octane.
The effects of lead on octane can be seen in the sample response curve,
Figure 1.  While this curve is different for different base gasolines, its
essential feature is a declining octane benefit per unit of lead as the total
lead concentration increases.  The nature of this curve creates an incentive
for refiners to spread the amount of lead they are allowed to use as evenly as
possible over the gallons of leaded gasoline produced.  In addition to
increasing octane rating, lead compounds provide some protection from valve
wear to older engines designed with soft valve seats.  This valve protection is
provided by relatively low concentrations of lead compared to the more than two
grams per leaded gallon (gplg) once used in leaded gasoline for octane reasons.
   As mentioned earlier, lead in gasoline was first regulated in 1979, both to
reduce lead for health reasons and to provide for availability of unleaded
gasoline.  Tougher standards for automotive emissions of carbon monoxide (CO)
and hydrocarbons (HC) led auto makers to turn to catalytic converters as
control devices.  Widely used first in 1975, these devices are very sensitive
to poisoning by lead, phosphorus, and other metallic substances.

Types of refineries

   The refining industry grew up with the automobile and is thus a relatively
old industry.  Refineries are technologically stratified by age, based upon the
level of technology when they were constructed.  The geographical development
of the industry has tended to follow concentrations of population.  Thus the
older refineries tend to be located in the East.  Newer refineries tend to be
located near emerging centers of population and more recently developed sources
of crude oil.  These newer facilities, incorporating more recent technology,
tend to be located on the West Coast.
   As one might expect, refineries also vary considerably in size.  Figure 2
shows something of the size distribution of the industry.  A substantial number
of these small refineries together produce only a small part of the total
gasoline supply.  In certain markets, these small facilities may play an
important role due to high transportation costs from areas where larger and
more efficient refineries are located.

The lead bubbles

   Quarterly averaging.  The first bubble or averaging approach used in
regulating gasoline lead emerged almost unconsciously in the process of
selecting an efficient way to monitor compliance.  Since continuous monitoring
of each refinery's output was not practical, and since requiring that each
gallon of gasoline must meet a standard was very inflexible from the industry's
standpoint, the first regulations prescribed a compliance period during which
the average concentration of lead could not exceed the standard.  The selection
of a calendar quarter represents a compromise between environmental concerns
and the industry's need for flexibility.  The dimension for this bubble, then,
is time.  The relatively high concentrations dictate a short time span in order
to protect public health.  The integrating media are the air and soil from
which lead emitted in automobile exhaust is taken into the human body.  The
environmental concerns regarding the use of the quarter are mitigated by the
fact that the gasoline distribution system tends to mix gasoline from different
producers in the marketplace, and the air and soil smooth out, over the course
of the quarter, the intensity of human exposure.

   Trading.  The second bubble occurred in a more deliberate fashion with
regulations that became effective in late 1982 and early 1983.  These
regulations shifted the basis of the standard and introduced a system of
trading in lead usage rights.  The standard was changed from one pertaining to
a refinery's pooled gasoline output (unleaded and leaded considered together)
to a standard applied strictly to leaded gasoline.  The original regulation
purposefully encouraged the increased production of unleaded gasoline as this
product was new to the market.  By 1982, unleaded gasoline had become a
permanent fixture.  The change to base the standard on leaded gasoline only was
made so that the total amount of lead in gasoline would decline with the
percentage of gasoline demand that was leaded.  Under the older pooled standard
the amount of lead per leaded gallon could increase as the percentage of leaded
declined, resulting in a slower decline in total lead use.
   Accompanied by a tightening of standards and a phaseout of special small
refinery standards, the trading system provided for an improvement in the
allocation of lead usage among refineries.  This was done by permitting
refineries which needed less lead than the standard allowed to sell their
excess to other less technologically advanced refineries.  Thus a modern
facility capable of producing leaded gasoline comfortably at 0.70 gplg could
sell the product of its leaded gallonage and the difference between that
concentration and the standard of 1.10 gplg to one or more other refineries
which found it necessary to use more than 1.10 gplg in their leaded gasoline.
Such transactions were required to occur during the compliance period in
question and could occur either within corporate boundaries or across them.
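   The trading arithmetic just described amounts to a simple formula: the rights
a refinery can offer in a quarter are its leaded gallonage times the difference
between the standard and the concentration it actually used.  The sketch below
is our illustration of that rule with invented volumes, not an Agency
calculation.

    # Sketch (invented numbers): lead usage rights available for sale in a
    # quarter under the 1.10 gplg leaded-gasoline standard.
    STANDARD_GPLG = 1.10                     # grams of lead per leaded gallon

    def tradable_rights_grams(leaded_gallons, actual_gplg):
        """Rights (grams of lead) = gallons * (standard - actual concentration)."""
        return max(0.0, leaded_gallons * (STANDARD_GPLG - actual_gplg))

    # A refinery producing 50 million leaded gallons at 0.70 gplg could offer
    # 50e6 * 0.40 = 20 million grams of lead rights to other refineries.
    print(tradable_rights_grams(50_000_000, 0.70))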
   Without changing the time dimension, trading extended the bubble or
distributed compliance system for lead into the dimension of space.  Incurring
no more transportation costs than the price of a stamp, a refinery or importer
in New Jersey could purchase the right to use lead that was not needed by a
refinery or importer in Oregon and thereby legitimize actual lead use that was
over the standard.  The integrating media were essentially the same as for
quarterly averaging, but greater reliance was placed upon the homogenizing
effects of the distribution system to avoid the development of "hot spots".

   Banking.  Responding to a mounting body of evidence on the negative health
effects of lead and to the problem of increased conventional pollutants from
lead-poisoned emission control systems, the Agency took further action on lead
in early 1985.  As shown in Figure 3, the resulting regulations reduced the
allowable lead concentration by 91% in two stages (from 1.10 gplg to 0.50 gplg
on July 1, 1985, and from 0.50 gplg to 0.10 gplg on January 1, 1986).  This
sharp tightening of the standard for lead was accompanied by a system of
banking which effectively extended the lead bubble over a much longer time span
than the calendar quarter that was previously allowed.
   Under the banking provisions a refiner was allowed to store away in a bank
account the difference between the standard and either 0.10 gplg or actual lead
usage, whichever was larger.  Such accumulation of rights was permitted during
the four quarters of calendar 1985.  The banked lead rights were to be
available for use or transfer to another refiner or importer during any future
quarter through 1987.  Thus lead rights foregone during 1985 could be used to
meet the sharply tighter 0.10 gplg standard during 1986 and 1987, after which
any remaining rights expire.  The 0.10 actual lead use limitation on rights
accumulation was intended to avoid any incentive for refiners to use less than
0.10 gplg in leaded gasoline, since this was the level believed sufficient to
protect the valves of some older engines from excessive wear.
   The Agency's predictions of probable refiner behavior when given the
flexibility of banking are shown in Figure 4, in which the concentrations from
Figure 3 are weighted by estimates of leaded gallonage.  The shaded areas
during 1985 represent the extent to which Agency economists expected refineries
to lower lead concentrations in order to bank lead rights for later use.  The
shaded areas farther to the right show the difference between the expected
concentrations and the standard during the 1986-1987 period when the banked
rights could be used to supplement the 0.10 gplg allowed under the standard.
As the figure shows, the Agency expected only partial use of banking in the
first quarter of 1985 due to the time required for refineries to revise their
planning horizons under the new regulations.  The heaviest banking was expected
to occur in the second quarter as refineries were able to take full advantage
of the regulation.  The third and fourth quarters were expected to show only
slight banking due to the 55% reduction in the standard to 0.50 gplg.
Predictions for the 1986-1987 period show declining lead use in the second year
as additional octane generation capacity was expected to come into service in
anticipation of the 0.10 standard without banking.
   This final step in extending a system of distributed compliance--a
bubble--to cover lead in gasoline completed what was started by the decision to
use quarters as compliance periods, greatly extending on a temporary basis the
time span over which refineries could demonstrate compliance.  Coupled with the
trading provisions to provide for distribution over the space dimension, the
package provided the industry with a very substantial degree of flexibility in
meeting a standard which public health needs required to be as stringent as
possible.  The banking and trading together provided for an orderly adaptation
by the more obsolete facilities, providing them with the time necessary to
install new equipment.

How well it worked

   Use of banking and trading.  From the very beginning of the trading
provisions in 1983, between one fifth and one third of the reporting facilities
found it either necessary or desirable to purchase lead rights for use in
demonstrating compliance with the regulations.  The amount of lead involved in
these transactions was at first small, amounting to about 7% of the total lead
used.  By the end of 1984 this figure had climbed to 20%.
   The trading provisions of the regulation unintentionally permitted
facilities blending alcohol into leaded gasoline to claim and sell lead rights
based upon their activity.  These facilities, frequently little more than large
service stations, generated lead rights in the amount of the product of the
1.10 standard and the number of gallons of alcohol they blended.  Both the lead
and the gallons of leaded gasoline into which the alcohol was blended had
already been reported by others.  While these alcohol blenders increased
sharply in number starting in the second quarter of 1984, their activities
generated only a small amount of lead rights.  This appearance of a new
"industry" as an unexpected consequence of the regulation should remind the
statistician or analyst that "ceteris paribus" is not always the case.  Even
with all the available information about the regulated industry to analyze, all
else will not be equal since the regulation itself will cause perturbations,
such as the new and previously nonexistent class of blender "refiners".
   The banking program provided a great deal of flexibility to the industry,
and accordingly was heavily used from its outset in the first quarter of 1985,
even though the regulations were not made final until after the end of the
quarter.  About half of the entities reporting to the Agency made deposits in
that first quarter, and the industry held the actual lead concentration to 0.70
gplg--lower than the Agency had predicted--thus banking more lead rights than
expected.  Along with the banking came a sharp increase in trading activity.
The lead rights, because they no longer expired at the end of each quarter,
were worth more and were traded in a more rational market where sellers had
more time to seek out buyers and where brokers arose to place buyers and
sellers in touch with each other.  The higher price of lead rights led to an
explosion in the number of alcohol blenders.  Major refiners' facilities, which
were previously not motivated to buy or even sell lead rights, began to bank
and trade aggressively, stocking up rights for use in the 1986-1987 transition
period at the new more stringent standard of 0.10 gplg.
   Figures 5 and 6 show the lead use outcome of banking and trading compared to
the standards and Agency predictions at the time the standards were
promulgated.  Figure 5 shows concentrations while Figure 6 introduces leaded
gallonage.  The early and vigorous banking reduced concentrations to a lower
level than expected, and substantial banking continued to occur on into the
second half of the year under a half gram standard.  Actual lead use, as
Figure 6 shows, was higher than predicted in both the second and third quarters
as a result of higher than anticipated leaded gasoline usage.  In all, 1985
ended with a net collective bank balance in excess of ten billion grams.
   The first quarter of 1986 saw lead rights leaving the bank at about the rate
that the Agency had predicted.  The second quarter caused some alarm with a
sharp drain on the bank owing to the unusually high leaded gallonage at a
substantially higher concentration, 0.40 gplg, than predicted.  As Figures 5
and 6 show, though, this early drain was partially offset by lower than
expected usage in the fourth quarter.

   The environmental effect of the regulation has been an unusually sharp and
rapid decrease in a major pollutant, one that health studies indicate may be
more dangerous at lower concentrations and to a broader segment of the human
population than used to be believed.  The banking and trading appear to have
done precisely what they were intended and expected to, trading off lead use
lower than the standard in 1985 against higher use in 1986-1987, with a total
lead use over the period about the same as if the standards had been rigidly
held to.  It may be the case that a lead reduction this severe could not have
been achieved without the distributed compliance approach that was used.  It is
certainly true that a transition to lower standards was achieved with greatly
reduced economic impact.

   Administration and entorcement.  The Banking
and trading regulations were conceived with
every intent that the Agency could keep a low
profile and let market mechanisms ao most or the
work.   While this  was achieved to a suostantiai
degree,  the need to ensure compliance involved
the Agency in processing more paperwork  than the
                                                   115

-------
draiters of the regulations anticipated.  It is
prooabiy worthwhile to examine briefly how this
Happened.
   The rlood of alcohol blenders swelling the
ranks of the reporting population was not expec-
ted.   Blenders had first come onto the scene
with  the trading provisions.  By the end of 1964
they  numbered something over a hundred, selling
smalt amounts of lead credits,  generated during
the quarter, to small and/or obsolete refineries
which were not otherwise'able to neet the 1.10
gplg  standard.  In the first quarter of 1965
well  over  200 additional blenders reported,
drawn by the prospect of either immediately
selling their lead usage rights at the sharply
higher prices that prevailed with banking or re-
taining them and speculating on t-he price.   As
the word of this opportunity spread among dis-
trioutors  and service station chains, the pop-
ulation of these "refineries" exploded, reaching
more  tnan  600 by the third quarter of 1965 and
pushing the reporting population above &00.
   The numbers themselves would not have been
such  a problem for the Agency if all of the
reports had been made correctly.  The blenders,
though, were new to this business.  They didn't
understand tne regulations, and they lacked the
accounting and legal departments which usually
nandled reporting for large refineries. Tne nost
common error made by the blenders was to attempt
to bank and immediately seil to another refiner
lead  rights that could not  legitimately be
claimed.   This frequently took the fora of
simply multiplying the alcohol  gallonage by the
standard il.lO or  0.50 gpig, depending on the
quarter),  ignoring the restriction mentioned
eariier that lead lights could be banked only on
foregone lead usage above 0.10 gplg.  By the
time  the blender filed a report and his error
was detected by the Agency's computer,  the
rignts naa already been sold to another party
ano pernaps resoid or used.  In addition to the
obvious legal  tangle caused by this, there was
the instability or the blender  population--the
party responsible for the improperly generateo
rights could not always be found.
   Tne enrorcement machinery developed oy the
Agency to  nanaie lead phasedown was shaped by
certain reasonable expectations about the re-
porting popuiation--scaIe or operations, number
or reporting entities, relative sophistication,
etc.   Trie  blenders did not fit these expec-
tations,  and the enforcement process developed
considerable congestion until some adaptation
could take place.   The computer system developed
to audit reports and especially to match up the
parties in lead rights transfers did precisely
wnat  it was designed to do and generated thick
stacks of  error output where only a few errors
hao been expected.  The further processing of
the errors hao to be done manually and required
Clerical and legal staffing at a level  that was
not anticipated.   By the time these resources
were  increased to tne appropriate levels the
oackiog 07 errors was substantial and the tne
elapsed since the filing of the original reports
maoe  sorting things out more difficult.
   A further illustration of how the crystal ball can fail is found in the difference between true refineries and the blenders in scale of operations.  True refineries deal in such large quantities of gasoline and lead that for convenience all of the report forms used thousands of gallons and kilograms of lead as units.  To report in smaller units would be to claim a degree of precision lacking in the basic information available to the refineries' accounting departments.  The effect of rounding to thousands, trivial to larger refineries, was definitely not trivial to the blenders, many of whom only blended a thousand gallons of alcohol in a quarter.  The blenders used whatever units optimized their profit with a fine disregard for the proper placement of decimal points.  Where their gallonage was, say, 1,600 gallons, they would take advantage of the rounding instructions on the form to claim credits based upon 2 units of a thousand gallons each.  If the amount was 1,400, they would report in gallons rather than thousands of gallons, often without labelling the units or putting a decimal point in the correct position.
   All of these difficulties of enforcement logistics came into being as a result of the complexity of the bubble or distributed compliance system.  With a simple set of rigid standards there would have been no blenders.  Fortunately, this was a case where the environment suffered almost no harm as a result of the unforeseen consequences of the regulations, however embarrassing the situation may have been to Agency managers.  This was probably mostly good luck, and should not be counted upon to happen routinely.

Legal Considerations

   The statistician frequently finds himself with a well-thought-out concept for a procedure only to be faced with complications in the implementation scheme.  Banking and trading proved no exception to this problem.  The idea of free trade of lead rights between parties in order to increase flexibility of each refinery's planning was too good to resist.  The government even took great pains to stay at "arm's distance" in the trading process.  Prior experience with the Department of Energy's entitlements program, in which the Federal government established formula upon formula to assure that every refinery got its "fair share," demonstrated that the Federal government was not the best "broker" in the refinery industry!  In this case the EPA was staying out of the business.
   So, what could go wrong?  Since lead rights are valuable, there is an incentive to cheat.  The value of lead rights rose from 3/4 of a penny to slightly over 4 cents per gram of lead.  Trading and banking transactions are frequently on the order of 25 to 50 million grams.  Thus the dollar amounts are in the $1 to $2 million vicinity.  Consequently, monitoring and enforcement become major issues.  Monitoring and its requirement for extra personnel and computer usage has already been discussed.  Enforcement and the legal considerations are another matter.  Prior to banking and trading, the regulations were applied on a refinery by refinery basis and enforcement was a fairly straightforward matter.  Under banking and trading the host of possible violations increased exponentially.  The types of violations included trading rights that were
improperly generated, selling the same rights twice, and banking rights for a future quarter that were in fact required for the current quarter's compliance.  Any of these transgressions, of course, may have ramifications for the buyers of such lead rights.  The situation becomes very complex from an enforcement standpoint since frequently rights are sold to an intermediary who resells them.  If the original rights were bogus, or partly bogus, who among all the recipients has good rights and who has bad ones?  These are not like counterfeit bills; they are entirely fungible, and determining if a particular right is legitimate can be a nightmare.  Since banking lasts over several time periods, bogus rights can be exchanged frequently, and tracing the source of the bad rights can be next to impossible.  Further, what action, if any, should be taken against the good faith purchaser of such lead rights?  This last question subdivides into possible different actions depending upon whether the purchaser just deposits the rights into his account or, alternatively, actually uses them before they are discovered to be bogus.  The possibilities seem endless!
   An interesting sidelight to these difficulties is that it is frequently a small refiner with small amounts of rights that causes the difficulty.  More effort is expended to chase small infractions than can be imagined, and enforcement policies designed for use with a small number of large violators prove awkward and unwieldy when dealing with a large number of small violators.  A second side effect, though no fault of the designer of the regulation, is that many refineries find themselves bankrupt in today's oil industry.  Chasing after lead rights of a bankrupt concern is generally far less than fruitful.
   Nevertheless, the system has fared remarkably well.  Over ten billion grams of lead rights were banked, roughly two years' worth, and no one is asking for government intervention to make lead rights trading run more smoothly.  However, the point to be made is that the statistician can ill afford to wash his hands of the problems involved in day-to-day implementation and enforcement of the regulations.  He must guard against being the party who suggested the program and then walked away when some aspect didn't work as planned.

Conclusions
   We have tried to provide in this paper an analytic framework for understanding the set of compliance management mechanisms loosely classified as "bubbles".  We have seen something of the attractive features of such approaches, especially from the standpoint of the economic flexibility which they may make possible, but have also seen some of the ways in which things may go otherwise than as the drafters of the regulations intended.  The lead phasedown banking and trading system was used to illustrate some of the concepts presented, even though the statistical problems in this regulation were less extensive than those with some other bubble regulations.
   Distributed compliance schemes are fascinating to economists, and they are attractive to higher Agency managers from other professional backgrounds because of their potential to blunt the resistance to needed environmental regulation and sugarcoat the regulatory pill.  The statistician must have a place in the development of these regulations; the questions of measurement, estimation, and uncertainty that are frequently involved demand it.  The proper role of the statistician is not just that of picking up the pieces after things begin to go wrong in implementation.  Neither is it to be a nit-picking nay-sayer whose business is to tell people why "you can't get there from here".  Rather the statistician's role should be an affirmative one: that of a full partner in the regulation development process.  As such, members of the profession must not only serve in the critical role of assuring a regulation's scientific integrity (and therefore its enforceability) but must also lend their creativity and special insights to the fundamental design of the regulation's compliance system, finding ways to do things where others, perhaps, cannot.
[Figure 1.  Gasoline octane enhancement from lead antiknock compounds.  Octane number versus grams of lead per gallon.]

[Figure 2.  Cumulative percentage of total gasoline production by refinery size percentile (Quarter III, 1983).  Percentage of total gasoline versus size percentile of refineries.]

[Figure 3.  Standards and predicted lead concentrations under banking and trading, in grams per leaded gallon.  Predicted concentrations from Costs and Benefits of Reducing Lead in Gasoline, Feb. 1985, p. II-53.]

[Figure 4.  Lead usage predicted with and without banking program, in billion grams of lead:  grams predicted by the standard, grams predicted by the model, and lead banked for future use.]

[Figure 5.  Predicted and actual lead concentrations under banking program, in grams per leaded gallon.  Predicted concentrations from Costs and Benefits of Reducing Lead in Gasoline, Feb. 1985, p. II-63.]

[Figure 6.  Predicted and actual lead usage with banking program, in billion grams of lead.  Predicted lead usage is the same as in Figure 4 and is based upon the Agency's predicted leaded gallonages; actual gallonage was higher than predicted.]
-------
                                      DISCUSSION
                                   N. Phillip Ross
                          US Environmental  Protection Agency
   The concept of bubbles is intrigu-
ing; an umbrella under which trades can
be made which enable regulated indus-
tries within the bubble to meet
environmental standards—standards that
they otherwise may not have been able to
satisfy.  This paper describes such a
bubble; an umbrella of time for compli-
ance with lead in gasoline standards.

   The idea has logical appeal.  Unfor-
tunately, the world in which it is
implemented is not always as logical.
There is an implicit concept of
uniformity that underlies the ideas of
trading and banking.  It's okay to have
high levels of pollutants as long as you
balance them against low levels either
at a later point in time or by purchas-
ing "credits."  Although the "average"
levels of the pollutant within the bub-
ble's boundaries may be at or below the
EPA standard, there will be many points
within the bubble where levels are well
above the standard.  From a public
health point of view, this may not be
desirable.  It eventually translates
into periods when the population at risk
will receive exposures to levels greater
than the standard.

   As pointed out by the authors, a
major advantage to use of the bubble in
the case of lead in gasoline was that
many refiners and blenders who could not
immediately meet the standard were able
to continue operations through the pur-
chase of credits.  Indeed, imposition of
the standard on many of these companies
may well have forced them out of busi-
ness.  This is not a minor concern.
Enforcement of environmental standards
is exceptionally difficult.  The
regulated industry must be willing to
cooperate through voluntary compli-
 ance.   The  bubble  approach,  even  under
 conditions  of  non-uniformity,  provides
 the  needed  incentives  to  encourage
 voluntary compliance.   Environmental
 standards which  cause  major  economic
 hardship for the regulated industry
 will be difficult  to enforce.   Federal
 enforcement resources  are minimal.
 Lack of a substantial  enforcement
 presence could result  in  greater  pol-
 lution  through noncompliance.   Even
 though  the  real  world  does not always
 conform to  the basic assumption of the
 bubble  model,  the  real world will use
 the  approach to  achieve an overall
 reduction in pollution.

   The  lead bubble was very  successful.
 As the  authors have pointed  out,  there
were problems; however, overall the
 levels  of lead in  gasoline did go down
 rapidly.  This probably would  not have
 happened under the more traditional
 approach to enforcement.

   I agree with the authors' conclusion
 that statisticians must learn  to  play
 a greater role in  developing the
 strategies  and in  "finding ways to do
 things  where others, perhaps,  cannot."
 Statistical thinking involves  the
 consideration  of uncertainty in
 decisionmaking.  All problems  cannot be
 solved  statistically;  however,  statisti-
 cal  thinking can help  solve  problems.
 Statisticians  need to  realize  that their
 roles are not  limited  to  the design or
 analysis components of a  study.  They
 have a  role to play in the process of
 regulation  development and in  the
 development of new and innovative ways
 to deal with enforcement  and compliance
 problems—ways which are  not necessarily
 based on mathematically tractable
 assumptions.
-------
                         VARIABLE  SAMPLING SCHEDULES  TO  DETERMINE  PM10  STATUS
                                  Neil  H.  Frank  and Thomas C. Curran
                U.S. Environmental  Protection  Agency,  Research Triangle Park,  NC  27711
Introduction
   In April 1971, EPA set National Ambient Air Quality Standards (NAAQS) for particulate matter (PM) and five other air pollutants - nitrogen dioxide, sulfur oxides, carbon monoxide, hydrocarbons, and photochemical oxidants.1  There are two types of NAAQS:  primary standards designed to protect human health and secondary standards designed to protect public welfare.  In recent years, the standard for hydrocarbons has been rescinded and standards for an additional pollutant, lead, have been added.  The reference method for measuring attainment of the PM standards promulgated in 1971 was the "high-volume" sampler, which collects PM up to a nominal size of 25 to 45 micrometers (um).  This measure of PM was called "Total Suspended Particulate (TSP)" and was the indicator for the 1971 PM standards.  The primary (health-related) standards set in 1971 for particulate matter (measured as TSP) were 260 ug/m3, averaged over a period of 24 hours and not to be exceeded more than once per year, and 75 ug/m3 annual geometric mean.  The secondary (welfare-related) standard set in 1971 (measured as TSP) was 150 ug/m3, averaged over a period of 24 hours and not to be exceeded more than once per year.
   The gaseous NAAQS pollutants, including carbon monoxide, nitrogen dioxide, ozone, and sulfur dioxide, are sampled with instruments which operate continuously, producing data for each hour of the year.  These data are subsequently processed into various statistical indicators necessary to judge air quality status and attainment with their respective standards.  Lead and TSP are NAAQS pollutants sampled on an intermittent basis.  For these pollutants, one integrated 24-hour measurement is typically scheduled every sixth day.  This is designed to produce measurements which are representative of every day of the week and season of the year.  This approach has been shown to be useful in producing unbiased estimates of quarterly and annual average air quality, but has various limitations regarding estimation of peak air quality values.  One shortcoming of concern was that attainment of the short-term 260 ug/m3 TSP standard could be judged using data typically collected every sixth day and there was no specified adjustment for the effect of incomplete sampling.  This was recognized as a problem in the early 1970's.  If the second highest observed TSP measurement was less than 260 ug/m3, the primary health related standard was judged as being attained.  These standards were termed "deterministic."
   Pursuant to the requirements of the 1977 amendments to the Clean Air Act, EPA has reviewed new scientific and technical data and has promulgated substantial revisions to the particulate matter standards.2,3  The review identified the need to shift the focus from larger, total particles to smaller, inhalable particles that are more damaging to human health.  The TSP indicator for particulate matter has, therefore, been replaced with a new indicator called PM10 that only includes those particles with an aerodynamic diameter smaller than or equal to a nominal 10 micrometers.  A 24-hour concentration level of 150 ug/m3 was selected to provide a wide margin of safety against exposure which is associated with increased mortality and aggravation of respiratory illness; an annual average concentration of 50 ug/m3 was selected to provide a reasonable margin of safety against long-term degradation in lung function.  The secondary standards were set at the same levels to protect against welfare effects.  The EPA review also noted that the relative protection provided by the previous short-term PM standards varied significantly with the frequency of sampling.  This was identified as a flaw in both the form of the earlier TSP standard and the associated monitoring requirements.  Following the recommendations of the EPA staff review, the interaction between the form of the standard and alternative monitoring requirements was considered in developing the recently promulgated PM standards.
Form of the New PM10 Standards
    The new standards  for particulate
matter  are  stated  in  terms of  a statis-
tical  form.   The  24-hour standards were
changed  from  a concentration level not to
be  exceeded more  than once per  year to
a  concentration  level  not  to have more
than one expected  exceedance per year.
This  form  corresponds  to  the one  promul-
gated  for  the revised ozone  standard  in
1979.4  The  annual  standards  were  changed
from  an  annual  average  concentration  not
to  be  exceeded  to  an  expected  annual
average concentration.   To  be more con-
sistent  with  pollutant  exposure,  the
annual  average statistic  was also changed
from  a  geometric  mean  to  an  arithmetic
mean.
    The attainment tests,  described  for
 the new expected  value  forms of the
 particulate matter standards, are
designed  to reduce the  effects of
year-to-year variability in pollutant
concentrations due to meteorology,
 and unusual  events.  For the new 24-hour
 PM standard,  an expected  annual
number of exceedances would be estimated
from observed data to account for the
effects of incomplete sampling following
the precedents set for the ozone stan-
dard.  With averaging of annual arithme-
tic means and estimated exceedances over
a multiple-year time period, the forms of
these standards will permit more accurate
indicators of air quality status and will
provide a more stable target for control
strategy development.
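
   One simple way to carry out such an adjustment is to inflate each quarter's observed exceedance count by the inverse of its sampling completeness and then average the annual totals over the multi-year period.  The Python sketch below illustrates that idea with invented quarterly records; the quarter-by-quarter bookkeeping is an assumption for illustration, not the regulatory formula itself.

    # Sketch of an expected annual exceedance estimate adjusted for incomplete
    # sampling: each quarter's observed count is inflated by the inverse of its
    # sampling completeness.  Illustrative only; not the regulatory formula.

    def estimated_annual_exceedances(quarters):
        """quarters: list of (observed_exceedances, days_sampled, days_in_quarter)."""
        return sum(obs * days_in_quarter / sampled
                   for obs, sampled, days_in_quarter in quarters
                   if sampled > 0)

    # Three hypothetical years of quarterly records under 1-in-6-day sampling.
    years = [
        [(0, 15, 90), (1, 15, 91), (0, 15, 92), (0, 15, 92)],
        [(0, 15, 90), (0, 15, 91), (1, 15, 92), (0, 15, 92)],
        [(0, 15, 90), (0, 15, 91), (0, 15, 92), (0, 15, 92)],
    ]

    annual = [estimated_annual_exceedances(y) for y in years]
    print(annual, sum(annual) / len(annual))   # attainment requires the average <= 1
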
   The adjustments for incomplete data
and use of multi-year time periods are
significant improvements in the inter-
pretation of the particulate matter
standards.  These changes increase the
relative importance of the 24-hour stan-
dard and play an important role in the
development of the PM^o monitoring
strategy.  They also help to alleviate
the implicit penalty under the old form that was associated with more complete data.  The review of alternative forms of the 24-hour standards identified that the ability to detect nonattainment
situations improves with increasing
sample size.  This is true for the pre-
vious "deterministic" form and the
current statistical  form.  With the
new 24-hour attainment test, however,
there is a significant increase in the
probability of failing the attainment
test with incomplete data sets.  This
sets the stage for attainment sampling
strategies.
   Figure 1 presents the probability of failing the 24-hour attainment tests for the new PM10 NAAQS over a 3-year period.  These failure probabilities were based on: (1) a constant 24-hour PM10 exceedance probability from an underlying concentration frequency distribution with a specified characteristic high value (concentration whose expected number of exceedances per year is exactly one), and (2) a binomial distribution of the number of observed exceedances as a function of sample size.  Lognormal distributions with standard geometric deviations (sgd) of 1.4 and 1.6 were chosen for this illustration to represent typical air quality situations.  The approach used in Figure 1 and throughout this paper is similar to analyses presented elsewhere.5,6,7  This facilitates examining properties of the proposed standard in terms of the relative status of a site to the standard level (e.g. 20 percent above the standard or 10 percent below the standard) and the number of sampling days per year.  It is worth noting that the percent above or below the standard is determined by the characteristic high.  This is more indicative of the percent control requirements than using the expected exceedance rates.
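
   A simplified reconstruction of this kind of calculation is sketched below.  It assumes lognormal daily concentrations with the stated standard geometric deviation, pins the characteristic high at a given fraction of the 150 ug/m3 standard, treats observed exceedances as binomial, and fails the test whenever the inflated 3-year average exceeds one exceedance per year; it is meant only to illustrate the mechanics, not to reproduce the exact curves in Figure 1.

    # Simplified reconstruction of a Figure 1-style calculation.  Assumptions:
    # lognormal daily concentrations with the given standard geometric deviation,
    # a characteristic high value set relative to the 150 ug/m3 standard, a
    # binomial count of observed exceedances, and failure whenever the inflated
    # 3-year average exceeds one exceedance per year.
    from math import log, comb
    from statistics import NormalDist

    def failure_probability(ratio_to_std, sgd, days_per_year, standard=150.0, years=3):
        z = NormalDist().inv_cdf(1 - 1 / 365)       # defines the characteristic high
        sigma = log(sgd)
        mu = log(ratio_to_std * standard) - sigma * z
        p = 1 - NormalDist().cdf((log(standard) - mu) / sigma)   # daily exceedance prob.
        n = years * days_per_year                   # total sampling days
        def fails(k):                               # inflate by 365/days_per_year, average
            return k * (365 / days_per_year) / years > 1.0
        return sum(comb(n, k) * p**k * (1 - p)**(n - k)
                   for k in range(n + 1) if fails(k))

    # Site 10 percent below the standard, sgd 1.4, sampling once in 6 days.
    print(round(failure_probability(0.90, 1.4, 61), 3))
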
   Sampling frequency was judged  to not
be an important factor in the ability to identify nonattainment situations for either the current or previous annual
standards.  This is due to the generally
unbiased nature and small  statistical
variability of the annual  mean which is
used to judge attainment with this stan-
dard.  The change to an expected annual
mean form, however, would  tend to provide
better estimates of the long-term pol-
lutant behavior and provide a more stable
indicator of attainment status.
   With the new 24-hour attainment test,
one  important consequence of increased
failure probabilities is the potential
misclassification of true attainment
areas.  In Figure 1, it can be seen
that these Type I errors are generally
higher for small sample sizes, including
those typical of previous TSP monitoring.
This error is shown to be  as high as 0.22
for  a site which is 10 percent below the
standard and has a sampling frequency of
115  days per year.
   During the review of the standards,
it was recognized that the ideal approach
to evaluate air quality status would be
to employ everyday sampling.  This would
minimize the potential misclassification
error associated with the  new PM attain-
ment tests.  From Figure 1, it can be
seen that this would produce the desir-
able results of high failure probabili-
ties for nonattainment sites and low
failure probabilities for  attainment
sites.  Unfortunately, existing PM moni-
toring technology as well  as available
monitoring resources do not make it
convenient to monitor continuously
throughout the nation.  Moreover, while
more data is better than less, it may not
be necessary in all situations.  When we
revisit Figure 1, it can be seen that
when a site is considerably above or
below the standard, small  sample sizes
can  also produce reasonably correct
results with respect to attainment/
nonattainment decisions.  Thus, in order
to balance the ideal and the practical,  a
monitoring strategy was developed which
involves variable sampling schedules to
determine PM10 status and attainment with
the new standards.
   The new strategy will permit most
locations to continue sampling once in  6
days for particulate matter.  Selected
locations will be required to operate
with systematic sampling schedules of
once in 2 days or every day.  With
the approval of the EPA Regional Office, these schedules may also vary quarterly depending on the local seasonal behavior of PM10.  Schedules of once in 3 days were not considered because of the discontinuity in failure probabilities occurring at 115 sampling days per year (95% data capture), seen in Figure 1 and discussed elsewhere.
Monitoring Strategy
   The previous  monitoring regulations
which applied  to particulate matter
specified that "at least one 24-hour
sample (is required) every 6 days except
during periods or seasons exempted by the Regional Administrator."8  The new PM10 monitoring regulations would permit monitoring agencies to continue this sampling frequency for PM10 but would require them to conduct more frequent PM10 sampling in certain areas in order to estimate air quality indicators more accurately for control strategy development and to provide more correct attainment/nonattainment determinations.9  The change in monitoring practice is largely required to overcome the deficiency of existing sampling frequency in detecting exceedances of the 24-hour standard.  The operating schedules proposed for the measurement of PM10 will consist of a short-term and long-term monitoring plan.  The short-term monitoring plan will be based on the requirements and time schedules set forth in the new PM10 Implementation Regulations for revising existing State Implementation Plans (SIPs).10  The requirements ensure that the standards will be attained and properly maintained in a timely fashion.  The long-term requirements will depend on PM10 air quality status derived from future PM10 monitoring data.  These are designed to ensure that adequate information is produced to evaluate PM10 air quality status and to ensure that the standards are attained and subsequently maintained.
   Consistent with the new reference sampling principle, available PM10 instruments only produce one integrated measurement during each 24-hour period.  Multiple instruments operating with timers, therefore, are necessary to avoid daily visits to a given location.  The new standards, however, will permit approval of alternative "equivalent" methods which include the use of continuous analyzers.  Because of the new monitoring requirements, instrument manufacturers are currently developing such analyzers.  This will alleviate the temporary burden associated with more frequent monitoring.
Short-term Monitoring Plan
   The proposed first-year monitoring requirements will be based on the requirements for revising SIPs.  Areas of the country have been classified into three groups, based upon the likelihood that they are not currently attaining the PM10 standards as well as other considerations of SIP adequacy.11  Since PM10 monitoring is in the process of being established nationwide and is quite limited, a procedure was used which estimated the probability that each area of the country would not attain the new standards using existing TSP data in combination with available PM10 data.  This is described elsewhere.12
   Areas have been classified as Group I, II or III.  Group I areas have been judged to have a high probability, p > 0.95, of not being in attainment with the new standards.  Group II areas have been judged to be too close to call, but still very likely to violate the new standards (0.20 ≤ p < 0.95).  Group III areas have been judged to be in attainment (p < 0.20).
   For Group I areas, the value of  a
first year intensified PM10 data
collection is most important.   This
is because these areas are most likely
to require a revised SIP.  Since the
24-hour standard is expected to be
controlling, the development of control
strategies will require at least 1
complete year of representative data.
Consequently, everyday sampling for a
minimum of 1 year is required  for the
worst site in these areas in order  to
confirm a probable nonattainment status,
and to determine the degree of the
problem.
   The Group II category identifies areas which may be nonattainment (but whose air quality status is essentially too close to call).  For such areas, the value of additional PM10 information is important in order to properly categorize air quality status.  For these areas, more intensified sampling is desirable.
Based on the consideration of cost, and
available monitoring resources, however,
a more practical strategy of sampling
once in 2 days at the worst site is
required for the first year of monitor-
ing.
   All remaining areas in the country
(defined in terms of p<0.20) have been
categorized Group III and judged not
likely to violate the new standards.   For
such areas, the value of collecting more than a minimum amount of PM10 data is relatively low and intensified PM10 data collection is not warranted.  Recognizing that there is still a small chance of being nonattainment, however, a minimum sampling program is still required at
these locations.  Based on considerations
of failing the 24-hour attainment
test and estimating  an annual  mean  value,
a minimum sampling frequency of once  in 6
days is required.
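
   The first-year requirements just described reduce to a small decision rule.  The sketch below maps an area's estimated probability of not attaining the standards to its group and to the minimum first-year sampling frequency at the expected maximum concentration site; the function name and return strings are illustrative only.

    # Sketch of the first-year classification described above.  Thresholds follow
    # the text: p > 0.95 Group I, 0.20 <= p < 0.95 Group II, otherwise Group III.
    def short_term_requirement(p_nonattainment):
        """Return (group, minimum first-year sampling frequency at the worst site)."""
        if p_nonattainment > 0.95:
            return "Group I", "every day"
        if p_nonattainment >= 0.20:
            return "Group II", "once in 2 days"
        return "Group III", "once in 6 days"

    for p in (0.97, 0.50, 0.05):
        print(p, *short_term_requirement(p))
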
   The short-term strategy also contains provisions for monitoring to be intensified to everyday at the site of expected maximum concentration if exceedances of the 24-hour standard are measured during the first year of monitoring.  This is intended to reduce the potential for nonattainment misclassification (type I error) with the 24-hour PM10 attainment test.  With this provision, the first observed exceedance is not adjusted for incomplete sampling and is assumed to be the only true exceedance at that location during the calendar quarter in which it occurred.  The effect on misclassification error associated with a 3-year attainment test is illustrated in Figure 2.  It can be seen that the sites most vulnerable to this error are slightly
less than the standard.  In these comparisons, for sites which are 10 percent less than the standard and are sampling once in 2 days, the type I error is reduced from 6 percent to 1 percent.  If these same sites are sampling once in 6 days, the type I error is similarly reduced from 12 percent to 0.5 percent.
 There  is, however, a corresponding
 increase in the type II error associated
 with the attainment test for true nonat-
 tainment sites also close to the stan-
 dard.  This compromise was judged to be
 appropriate in developing the new rules.
 Long-term Monitoring Plan
   The long-term monitoring plan starts with the second year of sampling.  The required sampling frequencies are based on an analysis of the ratio of measured PM10 concentrations to the controlling PM10 standard.  This determination depends upon an assessment of (1) whether the annual or 24-hour standard is controlling and, if it is the latter, (2) the magnitude of the 24-hour PM10 problem.  Both items are evaluated in terms of the air quality statistic called the design concentration.  For the annual standard, the design concentration is the expected annual mean; for the 24-hour standard, the design concentration is the characteristic high value whose expected exceedance rate is once per year.  In both cases the design concentration is the value the control strategy must be capable of reducing to the level of the standard in order to achieve attainment.  The ratio to the standard is defined in terms of the design concentrations and the standard level; the controlling standard is simply the standard which has the highest ratio.  This is a somewhat simplified definition but is adequate for present purposes.
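
   Under this simplified definition, the controlling standard falls out directly from the two design concentrations.  The short sketch below shows the comparison; the level constants are the 50 and 150 ug/m3 standards quoted earlier, and the example site is hypothetical.

    # Sketch of the simplified controlling-standard determination described above.
    def controlling_standard(expected_annual_mean, characteristic_high,
                             annual_level=50.0, daily_level=150.0):
        """Return the controlling standard and its ratio to the standard level."""
        ratios = {
            "annual": expected_annual_mean / annual_level,
            "24-hour": characteristic_high / daily_level,
        }
        name = max(ratios, key=ratios.get)
        return name, ratios[name]

    # Hypothetical site: expected annual mean 45 ug/m3, characteristic high 195 ug/m3.
    print(controlling_standard(45.0, 195.0))   # ('24-hour', 1.3), i.e. 30 percent above
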
   The long-term strategy specifies frequencies of every day, every other day, or every sixth day.  The long-term monitoring strategy is designed to optimize monitoring resources and maximize information concerning attainment status.  As with the short-term strategy, the increased sampling frequency provisions only apply to the site with expected maximum concentration in each monitoring area.
   For those areas where the annual standard is controlling, 1 in 6 day monitoring would be required; this frequency has been judged to be adequate for assessing status with respect to this standard.  For those areas where the 24-hour standard is controlling, the required minimum sampling frequency for the calendar year will vary according to the relative level of the most current maximum concentration site to the level of the standard.  In other words, the sampling requirement applies to the site which drives attainment/nonattainment status for the monitoring area.  The least frequent monitoring (1 in 6 days) would be required for those areas where the maximum concentration site is clearly above the standard (≥40 percent above) or clearly below the standard (>20 percent below).  For such sites a minimum amount of data collection would be adequate to verify correct attainment/nonattainment status.  As the area approaches the standard, the monitoring frequency for the maximum concentration site would increase so that the misclassification of correct attainment/nonattainment status can be reduced.  If the area is either 10-20 percent below or 20-40 percent above the 24-hour standard, 1 in 2 day monitoring would be required.  When the area is close to the standard, i.e. 10 percent below to 20 percent above, everyday sampling would be required in order to improve the stability of the attainment/nonattainment classification.  Figures 2 and 3 illustrate misclassification rates for a 3-year, 24-hour attainment test as a function of the relative status of a site to the standard and in terms of alternative sampling frequencies.  As with previous analyses, underlying lognormal distributions with sgd's of 1.4 and 1.6 for attainment and nonattainment sites are utilized.  For sites following the long-term incomplete sampling schedules (1 in 6 days and 1 in 2 days) misclassification rates can be maintained in or below the neighborhood of 5-10 percent.
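
   Read as a decision rule, the long-term schedule depends only on which standard is controlling and on how far the maximum concentration site sits from the 24-hour standard.  A compact sketch follows; the handling of the exact boundary values (10, 20 and 40 percent) is an assumption, since the text states ranges rather than strict inequalities.

    # Sketch of the long-term sampling schedule described above.  ratio is the
    # 24-hour design concentration divided by the standard level (1.0 = at the
    # standard); handling of the exact boundary values is assumed.
    def long_term_frequency(controlling, ratio):
        if controlling == "annual":
            return "once in 6 days"
        if ratio >= 1.40 or ratio < 0.80:      # clearly above or clearly below
            return "once in 6 days"
        if 0.90 <= ratio <= 1.20:              # close to the standard
            return "every day"
        return "once in 2 days"                # 10-20% below or 20-40% above

    for r in (1.5, 1.25, 1.05, 0.85, 0.70):
        print(r, long_term_frequency("24-hour", r))
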
 Summary
    The revisions to the  PM standards
 improve the  ability to  identify  non-
 attainment situations,  provide for more
 stable pollutant  indicators, and change
 the relative  importance  of the annual
 and 24-hour  averaging  times.  With the
 required  adjustments for incomplete
 sampling  in  the interpretation of  PM
 data, the revised  standard would correct
 for the variable protection afforded by
the current 24-hour PM standard, and it
 is expected  that the revised 24-hour
 standard  will  generally  be controlling.
    Monitoring  requirements have been
 promulgated  which  will  similarly correct
 for the deficiency  in the current
 standards.  Variable frequencies are  now
 required  in order  to reduce the uncer-
 tainty associated  with attainment/
 nonattainment classification.  This
 provides  more uniform protection by the
 standards but  at the same time conserves
 scarce monitoring  resources.  The  initial
 requirements  will  place  the most emphasis
 on areas  with the  highest estimated
 probability of violating the PM10 standards while the long-term strategy will
 allow sampling  frequency to vary accord-
 ing  to the relative status of an area
 with respect  to the standard concen-
 tration levels.
    The operational  difficulties
 associated with  implementing  the new
requirements for everyday monitoring have generated new research initiatives to develop a continuous analyzer for PM10.
Once this is available, particulate matter
can be conveniently monitored everywhere
on the same basis as the gaseous  NAAQS
pollutants.
References
1.  "National Primary and Secondary Ambient Air Quality Standards," Federal Register, 36(84):8186.  April 30, 1971.

2.  Review of the National Ambient Air Quality Standards for Particulate Matter:  Assessment of Scientific and Technical Information, OAQPS Staff Paper, U.S. Environmental Protection Agency, Research Triangle Park, N.C. 27711.  EPA-450/5-82-001.  January 1982.

3.  "Revisions to the National Ambient Air Quality Standards for Particulate Matter," Federal Register, 52(126):24634.  July 1, 1987.

4.  "Revisions to the National Ambient Air Quality Standard for Photochemical Oxidants," Federal Register, 44(28):8202.  February 8, 1979.

5.  Frank, N. H. and T. C. Curran, "Statistical Aspects of a 24-hour National Ambient Air Quality Standard for Particulate Matter," presented at the 75th APCA Annual Meeting, New Orleans, LA.  June 1982.

6.  Davidson, J. E., and P. K. Hopke, "Implications of Incomplete Sampling on a Statistical Form of the Ambient Air Quality Standard for Particulate Matter," Environmental Science and Technology, 18(8), 1984.

7.  Frank, N. H., S. F. Sleva and N. J. Berg, Jr., "Revising the National Ambient Air Quality Standards for Particulate Matter - A Selective Sampling Monitoring Strategy," presented at the 77th Annual Meeting of the Air Pollution Control Association, San Francisco, CA.  June 1984.

8.  "Ambient Air Quality Surveillance," Federal Register, 44(92):27571.  May 10, 1979.

9.  "Ambient Air Quality Surveillance for Particulate Matter," Federal Register, 52(126):24736.  July 1, 1987.

10. "Regulations for Implementing Revised Particulate Matter Standards," Federal Register, 52(126):24672.  July 1, 1987.

11. "Group I and Group II Areas," Federal Register, 52(152):29383.  August 7, 1987.

12. Pace, T. G., and N. H. Frank, "Procedures for Estimating Probability of Nonattainment of a PM10 NAAQS Using Total Suspended Particulate or Inhalable Particulate Data," U.S. Environmental Protection Agency, Research Triangle Park, N.C.  1984.
[Figure 1.  Failure probabilities for 3-year, 24-hour attainment test with constant sampling rate.  Probability of failing the test versus number of sampling days per year, for lognormal distributions with standard geometric deviations of 1.4 and 1.6, expected exceedance and once-per-year forms, sites 20% above the standard.]

[Figures 2 and 3.  Probability of attainment misclassification and probability of nonattainment misclassification for the 3-year, 24-hour attainment test.]

-------
                                       DISCUSSION
                                      John Warren
                           US Environmental Protection Agency
   The use of the statistical concept of
expectation for comparing monitoring data
with a standard is new and quite intri-
guing as it offers promise of extension
to other standards and regulations.  The
difference between existing standards and
the new statistical standards is illus-
trated by the PM-10 standards.
   Existing standards:
   o The 24-hour concentration is not to
     exceed 150 micrograms per cubic
     meter more than once per year.
   o The annual average concentration is
     not to exceed 50 micrograms per
     cubic meter.
   New standards:
   o The expected 24-hour concentration
     is not to exceed 150 micrograms per
     cubic meter more than once per year.
   o The expected annual average concen-
     tration is not to exceed 50.
   The advantages of the "expected" meth-
odology over the existing methodology
include:
   o It has been used in a similar fash-
     ion in generating the Ozone standard
     and is therefore "familiar" to the
     public.
   o It uses actual data to generate the
     results.
   o There is a reduction in year-to-year
     variability.
   o It enables the development of stable
     control strategy targets.
   The difference between the two method-
ologies would therefore appear to be
small and hence readily adaptable to
other standards.  One possible candidate
for the new methodology would seem to be
Effluent Guidelines and Standards,
Subchapter N, 40 CFR 400-471.  These reg-
ulations stem from the Clean Water Act
(1972)  and are based on the engineering
standards of Best Practicable Technology
(BPT)  or Best Available Technology (BAT).
These guidelines cover mining industries
(minerals,  iron ore,  coal etc.),  natural
products (timber, pulp and paper, leather
tanning etc.),  and the manufacturing
industries (pharmaceutical, rubber, plas-
tics,  etc.).  A typical standard within
these guidelines is the Steam Electric
Power Generating Point Source Category
(Part 423.12, Effluent Limitations Using
BPT):

       BPT Effluent Limitations

                             Maximum for     Avg. of daily values for
   Pollutant or property     any 1 day       30 consecutive days
                                             shall not exceed
   -------------------------------------------------------------------
   Total Suspended Solids    100.0 mg/l      30.0 mg/l
   Oil and Grease            20.0 mg/l       15.0 mg/l
   Copper, total             1.0 mg/l        1.0 mg/l
   Iron, total               1.0 mg/l        1.0 mg/l
   Although  there  are  small  differences
 in sampling  protocols,  comparison  with
 the new and  old  PM-10  standards would
 seem to imply that a set  of  standards
 devised on an expected basis would be
 possible; however, it  is  not to be.
   The problem lies with  the very  differ-
 ent objectives of  the  regulations,  state
 versus industry.   The  PM-10  standard
 applies to a State Implementation  Plan, a
 negotiated agreement between EPA and the
 states enforced  through the  National
 Ambient Air Quality Standards and  used
 to identify non-attainment areas.   The
 Effluent Guidelines, on the  other  hand,
apply to a specific industry and are not a
 matter of negotiation.
   The resolution  of the  regulatory
 problems will be as difficult as the
 associated statistical  problems of:
   o Assumption of lognormality of data
   o Stability of  the process over time
   o Potential autocorrelation of  data
   o Uncertainties of data quality
   o The optimal allocation  of monitoring
     systems in non-attainment areas.
   Despite these problems, it is clear
that a statistical approach, in this case
expected values based on  an  underlying
lognormal distribution, is probably the
way of the future; research  should be
encouraged in this field.   Neil Frank and
Thomas Curran have indicated a viable
approach;  where will the  next step lead?
-------
             ANALYSIS OF THE RELATIONSHIP BETWEEN MAXIMUM AND AVERAGE IN SO2 TIME SERIES
                            Thomas Hammerstrom and Ronald E. Wyzga
1.  Introduction and Motivation

Several studies have examined the physiological and symptomatic responses of individuals to various air pollutants under controlled conditions.  Exposures in these experiments are often of limited duration.  These studies demonstrate response with exposures as short as five minutes.

On  the  other  hand,  monitoring data
rarely exist  for periods  as short as
five   minutes.       Some  measurement
methods  do  not  lend  themselves  to
short  term  measurements;  for  other
methods,  5-minute   data   often  are
collected   but   are   not  saved  or
reported   because   of   the  massive
effort  that  would  be  required.  In
general,  the  shortest  time  average
reported  with  monitoring data is one
hour,  and  for  some  pollutants even
this time average is too short.

Where  monitored  data  do  not exist,
ambient  concentrations  can  be esti-
mated   by   the  use  of  atmospheric
dispersion models.  The accuracy of
these  models  degrades  as  averaging
times   decrease   and   they  require
meteorological     and     atmospheric
inputs for the  same  time  average as
predicted  by  the  model.   Thus, air
dispersion models are rarely  used for
time averages less than an hour.

There is, thus,  a fundamental mismatch
in   time   periods   between   health
response and  exposure, with responses
occurring after  only 5  or 10 minutes
of  exposure  while  exposure data are
only available for periods  of an hour
or  more.    This  paper  attempts  to
address  this  mismatch  by  examining
the relationship  between a short-term
time average (5 minutes)  and a longer
term  time  average  (60  minutes) for
one  pollutant  (S02)  for  which some
data  are  available.    Understanding
the relationship between the  two time
averages  would  allow  the estimation
of response  given  longer  term esti-
mates  of  ambient  concentration.  It
could  also  help  in  the  setting of
standards   for   long  term  averages
which would help protect  against peak
exposures.

This   paper   explores  the  type  of
inferences  that  can  be  made  about
five  minute  S02 levels, given infor-
mation on hourly  levels.    There are
three  possible  models  for    health
effects  which  motivate  these  infer-
ences:
     1.  there  is  one  effect   in an
     hour  if  any  5-minute  exposure
     level exceeds a threshold,
     2.      each   5-minute   segment
     corresponds   to  an  independent
     Bernoulli trial  with probability
     of   an   effect  equal  to  some
     increasing   function    of   the
     current 5-minute level,
     3.    each  5-minute segment is a
     Bernoulli trial  with  the proba-
     bility of  an effect depending on
     the entire recent history  of the
     S02 process.

Corresponding to  these health models,
there are three possible parameters to
estimate:
     1.     the  distribution  of  the
     maximum 5-minute  level during an
     hour,
     2.      the  distribution  of  an
     arbitrary    5-minute    reading,
     3.    the  joint  distribution of
     all twelve 5-minute readings.
All  three  distributions  are  condi-
tional     distributions,   given   the
average   of   all   twelve   5-minute
readings.      The  first  conditional
distribution  is   the   parameter  of
interest  if  one  postulates that the
dose  response   function  for  health
effect  is  an  indicator function and
only  one  health  event  per  hour is
possible; the  second is the parameter
of  interest  if   one   postulates  a
continuous dose response function with
each 5-minute  segment constituting an
independent Bernoulli trial; the third
conditional    distribution    is   of
interest  if  one  postulates that the
occurrence of a  health  effect within
an  hour  depends  continuously on the
cumulative number of 5-minute peaks.

This  paper  discusses  some approach-
es to  each of  these three estimation
problems.    Section  2  discusses why
the   problem   is   not  amenable  to
solution by routine algebra.  Sections
3  and   4  present  results  for  the
estimation of  the  maximum.   Section
3  presents  some  ad  hoc methods for
modelling  the  maximum  as  a  simple
function of  the average when both are
known and discusses how to extend these
methods  to  estimate the maximum when
it is unknown.    Section  4 discusses
the  error  characteristics  of  these
methods.   Section  5  presents  an ad
hoc method  of estimating an arbitrary
5-minute   level   from   the   hourly
average; Section 6 discusses the error characteristics of this method.  Finally, Section 7 presents an estimation of the joint distribution of all twelve 5-minute readings, derived from a specific distribution-theoretic model for the 5-minute time series, and discusses some of the difficulties involved with extending this.

 2.   Obstacles to Theoretical  Analysis

 A brief  discussion  of  why we resorted
 to  ad  hoc  methods   is  needed to begin
 with.  In theory,   given  a   model  for
 the  (unconditional)  joint  distribution
 of   the    time-series   of   5-minute
 readings,   it   is   straightforward   to
 write  down  the exact  formula  for  the
 joint  conditional   distribution of  the
 twelve  5-minute  readings,   given  the
 average.
If X = (X_1, ..., X_p) has joint density f(x) and if x̄ = (X_1 + ... + X_p)/p denotes the average, then the conditional joint density is given by equation (1):

(1)   h(x \mid \bar{x}) \;=\; \frac{f(x)\, I\{\sum_{i} x_i / p = \bar{x}\}}{\int_{S} f(w)\, dw}

where S is the simplex {x : (x_1 + ... + x_p)/p = x̄} and I is the indicator function.

The  conditional  distribution  of the
maximum  and   the conditional distribu-
tion  of  any  5-minute  reading would
follow immediately from the condition-
al  joint  distribution  of all twelve
5-minute levels.
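
Even without a closed form, these conditional distributions can be approximated by brute force once a joint model is assumed.  The sketch below does this for independent lognormal 5-minute readings, keeping only simulated hours whose average lands near the observed hourly value; the independence assumption and all parameter values are illustrative and are not supported by the data discussed later.

    # Monte Carlo illustration of the conditional distribution of the 5-minute
    # maximum given the hourly average, under an assumed joint model
    # (independent lognormal 5-minute readings).  Conceptual sketch only.
    import random

    def conditional_max_sample(hourly_avg, mu, sigma, tol=0.02, n_hours=100_000):
        """Keep the 5-minute maxima of simulated hours whose average is near hourly_avg."""
        kept = []
        for _ in range(n_hours):
            hour = [random.lognormvariate(mu, sigma) for _ in range(12)]
            if abs(sum(hour) / 12 - hourly_avg) <= tol * hourly_avg:
                kept.append(max(hour))
        return kept

    random.seed(1)
    maxima = sorted(conditional_max_sample(hourly_avg=1.0, mu=0.0, sigma=0.5))
    if maxima:
        print(len(maxima), maxima[len(maxima) // 2])   # hours kept, median conditional maximum
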

Unfortunately,   estimation   of   the
unconditional    joint   distribution,
f(x), of the  5-minute  time-series is
not  easy.      Non-parametric  density
estimation requires gigantic data sets
when   one    is   working  in  several
dimensions .

Parametric modelling also poses formidable computational problems.  If f(x; θ) is the joint density of the 5-minute levels, then the log likelihood function, based on observing only a sequence of N hourly averages x̄_1, x̄_2, ..., x̄_N, is given by equation (2):

(2)   L(\theta) \;=\; \sum_{i=1}^{N} \log \int_{S_i} f(w; \theta)\, dw

where S_i = {w : (w_1 + ... + w_p)/p = x̄_i} for i = 1, 2, ..., N.

Each term on the right hand side is the integral of a 12-dimensional density over an 11-dimensional simplex.  For most reasonable choices of a joint distribution of the 5-minute readings, these integrals can only be evaluated numerically, using Monte Carlo methods.  To find maximum likelihood estimates of θ, one must numerically evaluate L(θ) at sufficiently many values of θ to approximate the maximizing value.  θ is always at least three dimensional (location, scale, correlation) and N will be in the hundreds (or thousands), making numerical maximum likelihood estimation a nearly insurmountable task.  (Moreover, the hourly averages in the observed data must not be consecutive hours but must be far enough apart in time to be effectively independent; otherwise, the likelihood function is even more complicated.)

An additional  problem with parametric
modelling   is   the   choice  of  the
functional form of  the  joint density
f.   One can  test hypotheses that the
hourly averages  come from  one of the
commonly  used  distributions: lognor-
mal, Weibull,  or gamma.   However, if
hourly average  S02 readings are, say,
lognormal, then  5-minute averages are
not lognormal.   In general, one would
expect the hourly averages to be
closer in  shape to the normal distri-
bution than are  the  5-minute levels.
(At  least,  this  would  be  true  if
the  5-minute  levels  have  the  same
finite   variance.)      There  is  no
technique for inferring the functional
form   of   the  distribution  of  the
individual terms  in  a  sum  from the
functional form of the distribution of
the sum.

As  an   alternative   to  theoretical
modelling of  the relevant conditional
distributions, we  have  explored some
ad  hoc  empirical  methods of estima-
tion.  It is important to bear in mind
that the  objective of the exercise is
not merely to  determine  a functional
form  for   the  relationship  between
5-minute levels  and  hourly averages;
but rather  it is  to provide specific
numeric  estimates  that  can  be used
when  the  five-minute  levels are not
observed.  There are  no unknowns when
the  five-minute  levels  are known so
the  only   application   of   such  a
technique is extrapolation to situa-
tions where no data  for new parameter
estimation are available.

3.  Estimation of the Maximum

3.1  Nature of the Data

The Electric  Power Research Institute
has collected  data  relevant  to this
inference from  two different studies.
The  first  comes  from   a  group  of
stations  monitoring  a  point-source;
the second from  a  station monitoring
ambient  levels  in  a populated  area.
At these two sites, data were collect-
ed in  each 5  minute segment for long
periods  of  time,   permitting  direct
comparison of  the hourly and 5-minute
levels.   The first  data set analyzed
was   from   18  monitors  around  the
Kincaid  power  plant  in  Illinois, a
coal-fired plant  in Christian County,
111., with a single 615 foot stack and
a  generating  capacity  of 1320 mega-
watts.  The data set consists  of nine
months   of   observations   from   18
stations  around  this  plant.     SO2
readings at these stations reflect the
behavior of the plume  from the stack.
For  a  given  monitor  there are long
stretches where S02  levels  are zero,
indicating  that   the  plume  is  not
blowing  toward  the  monitor.    Such
readings  constitute  about 12% of the
hours  in  the  data  set;  these were
discarded before  any further analysis
was  done.     The   second  data  set
consists of  SO2 data  from a New York
City monitoring station  not  near any
dominant point  source.  The data were
collected between  December  15, 1981,
and March 11, 1984.

3.2  Outline of Methods Used

We explored three empirical methods of
estimating   the    maximum   5-minute
reading from  the hourly average.  All
three  methods   postulate   a  simple
parametric model  for the maximum as  a
function of the average.   The methods
differ  only  in  how estimates of the
parameters are  obtained.    The first
method   obtains  parameter  estimates
from data containing 5-minute readings
and  then  uses  these  estimates  for
other  data  sets  collected elsewhere
(and containing only 1 hour readings).
This method  is motivated by the theory
that there   is a universal law govern-
ing  the   relationship   between  the
maximum and the average of an S02 time
series, with  the  same  parameters at
all  sites.  The second method requires
expending effort  to  collect 5-minute
data  for a short period of time at the
site of interest  and  using  the data
from   this  period to obtain parameter
estimates  that  will  be   used  over
much   longer  periods when sampling is
only on the  1-hour   basis.   The third
method fits  a simple parametric model
to the  maximum  hourly  reading  in  a
12-hour  block  as   a  function of the
average  over  the   12-hour  block and
then   assumes   that  the  same  model
with   the   same    numeric  estimates
describes    the     maximum   5-minute
level  in an  hour as  a  function of  the
hourly  average.     (Daily  cycles  are
removed from  the   12-hour  block data
prior  to  estimation  by  dividing by
long-term averages over  a  fixed hour
of the clock.)  For  mnemonic purposes,
we   will  call  these  three  methods:
1.   the method of universal constants,
2. the method of  short-term monitors,
and   3.    the  method  of  change  of
time-scale.

Estimates of the  potential  errors in
the method of universal constants were
obtained   by   using   the  parameter
estimates  from  the  New York  data to
fit the  Kincaid data  and vice  versa.
Potential  errors   in  the  method  of
short-term monitors  were estimated by
dividing  both  data sets into  batches
100 hours long and  then using   each of
the   hundred  odd   resulting parameter
estimates to fit 13  randomly selected
hours.    The  hours  were  chosen  by
dividing the range  of  hourly averages
into   13  intervals  and  choosing one
hour  from  each  interval.   Potential
errors   in  the  method  of  change of
time-scale  were  obtained  by   simply
comparing  the  maxima predicted using
the estimates from  the  12-hour  blocks
in  each  data  set  with the observed
maxima in the same  data.

3.3   Parametric Models for the  Maximum

The   parametric  models  proposed here
are intended to give ad hoc approxima-
tions  to  the  maximum.  One can show
that  they cannot be the true theoreti-
cal   formulae.    Because  the  maximum
necessarily increases  as  the  average
increases,  it  is  more convenient to
work  with the ratio of the maximum to
the   average  than  with  the   maximum
itself.  Previous   authors  (Larsen et
al.,  1971)  working  on  this problem
have  used models  in  which log(ratio)
is linear  in log(average).  Therefore,
we began by fitting  such  a  model to
the   two  data  sets by ordinary least
squares.  These estimates are given in
Table 1.   As  may  readily be checked,
for both  data sets,  this model leads
to impossible values (fitted ratios
less than one) for large
values   of   the  average.    For  the
Kincaid  data,  this  occurs  at rela-
tively low values of the average.
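
As a concrete illustration of this fitting step (ours, not part of the
original analysis), a minimal numpy sketch is given below; the array names
hourly_avg and hourly_max are hypothetical stand-ins for the hourly averages
and the within-hour maxima of the 5-minute readings.

import numpy as np

def fit_log_ratio_model(hourly_avg, hourly_max):
    """OLS fit of log(max/avg) on log(avg); returns (slope, intercept, RMSE)."""
    x = np.log(hourly_avg)
    y = np.log(hourly_max / hourly_avg)           # log of the ratio
    X = np.column_stack([x, np.ones_like(x)])     # design matrix with intercept
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rmse = np.sqrt(np.mean((y - X @ coef) ** 2))
    return coef[0], coef[1], rmse

Under this model the fitted ratio drops below 1 once the average exceeds
exp(-intercept/slope), which appears to be how the last column of the first
regression panel in Table 1 was obtained.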

In fact, it is not thought that a
single universal set of constants
applies to the regression of log(ratio)
on log(average).  Rather, it is thought
that the atmospheric conditions around
the monitor can be classified into one
of seven stability classes, and it may
be more appropriate to assume that the
parameters of the regression are
constant within a given stability
class.  It is possible that the
impossible values of the fitted maximum
occur because of a Simpson's paradox in
the pooling of data from several
stability classes.
Ideally,  the  above  model  should be
fitted   separately  to  each stability
class.   Unfortunately,  there   were no
meteorological   data   available   to
 permit  such   a partition  of the data.
 It   is   possible  that   it  would  be
 worthwhile   to  obtain  such  data and
 redo  the  analysis.    The difference
 between the  Kincaid and   New York City
 sites   must  be   emphasized.    The
 sources and   variability  of pollution
 are  very  different, and  it may not be
 reasonable   to  extrapolate  from  one
 site  to  another;  two data sets from
 like  sites   should  be   considered in
 subsequent analyses.

In order to prevent the occurrence of
impossible fitted values, we fit models
in which log[log(ratio)] is a linear
function of log(average).  The ordinary
least squares (OLS) estimates of this
line for New York and Kincaid are also
given in Table 1.  Figures 1 and 2 show
the scatter plots of the maximum vs the
average; both axes have logarithmic
scales.  If the log of the ratio were
linear in the log of the average, one
would expect the vertical width of the
scatterplot to remain roughly constant
as the average varied.  Instead, the
scatterplots narrow vertically as the
average increases, as would be expected
if the iterated logarithm of the ratio
were linear in the log of the average.
For both data sets, the iterated log
model mimics the real data more
accurately than the former model: only
the iterated log model shows the
diminishing (on a log scale) spread of
the maximum with increasing values of
the hourly average.  This model is
therefore the preferable one for
estimating the maximum.

 In both  data sets, the residuals were
 slightly  negatively  skewed  with the
 skewness being  greater in the Kincaid
 data.   It seems  reasonable to assume
 that  the  residuals  in  the New York
data were approximately normal.   This
 assumption   is  harder to maintain for
 the  Kincaid  data.    Figures  3  and  4
 show    the    histograms   and   normal
 probability   plots  for   the residuals
 from these two regressions.

 The  main purpose of the analysis is to
 obtain  a  formula  for  estimating the
 conditional    distribution   of   the
 unobserved   5-minute  maxima  from the
 observed   hourly   averages.      The
 iterated log vs log  models yield the
 following    two   formulae,  given  in
 equations 2  and 3.
(2)  Prob(5-minute max ≤ x | hourly average = y) =

     F(x|y) = Φ( {loglog(x/y) + .267*log(y) + .719} / .62 )

                                       for New York

(3)  Prob(5-minute max ≤ x | hourly average = y) =

     F(x|y) = G( {loglog(x/y) + .258*log(y) + .191} )

                                       for Kincaid.

Here Φ is the standard normal
cumulative distribution function and G
is the empirical distribution function
of the residuals of the OLS regression
of loglog(ratio) on log(average).  We
recommend using G in place of treating
these residuals as normal.  G is
tabulated in Table 2; its histogram is
graphed in Figure 4.
Equations 2 and 3 do a reasonably good
job  of  modelling the observed maxima
in the two data  sets  from  which the
values of the parameter estimates were
derived.

Inverting equations 2 and 3 gives
simple formulae for the percentiles of
the conditional distribution of the
5-minute maxima.  Notice that equation
3, Table 2, and linear interpolation
permit estimation of percentiles of the
Kincaid maxima from the 5th to the
95th.  Attempts to estimate more
extreme percentiles would require
foolishly rash extrapolation.
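
For example, inverting equation 2 (and using the normal approximation rather
than the tabulated G) gives the following sketch; the function name and
inputs are illustrative only.

import numpy as np
from scipy.stats import norm

def ny_max_percentile(y, p):
    """p'th percentile of the 5-minute max given hourly average y
    (equation 2, New York).

    Setting Phi({loglog(x/y) + .267*log(y) + .719}/.62) = p and solving for x:
    x = y * exp(exp(.62*z - .267*log(y) - .719)), where z = Phi^{-1}(p).
    """
    z = norm.ppf(p)
    return y * np.exp(np.exp(0.62 * z - 0.267 * np.log(y) - 0.719))

# e.g., the conditional median of the 5-minute maximum when the hourly
# average is 50:  ny_max_percentile(50.0, 0.5)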

The  log  vs  log   models  provide  a
competing   (and   somewhat  inferior)
method  of  estimation.    They  yield
conditional   distributions   of   the
5-minute maxima given  by  equations 4
and 5.
(4)  Prob(5-minute max ≤ x | hourly average = y) =

     F(x|y) = Φ( {log(x/y) + .077*log(y) - .499} / .20 )

                                       for New York

(5)  Prob(5-minute max ≤ x | hourly average = y) =

     F(x|y) = Φ( {log(x/y) + .21*log(y) - 1.07} / .69 )

                                       for Kincaid.

In  these  regressions,  we  found   it
acceptable to use a  normal approxima-
tion  for  the  residuals  in both  New
York and Kincaid.

4.  Error Estimation

4.1  Errors in the Method of Universal
Constants.

It  is  not  feasible to use a conven-
tional method to  estimate  the  uncer-
tainty in  the maxima fitted with  this
method.  The major  difficulty is that
one is  not looking for a well-behaved
estimator but rather for  a particular
numeric value  of the estimate for use
in all data sets.  The  standard error
of  the  estimate  in  one data set is
quite misleading  as a  measure of the
error  that  would  result  from using
that  same  estimate  in  another data
set.  A further complication is the
high correlation among the
observations  used   to  generate  the
estimates.   The conventional formulae
for  the  standard  errors  will  exag-
gerate  the  amount  of information in
the  data  set  and  yield  spuriously
small standard errors.   Finally,  there
is the problem that one knows that the
model  is  theoretically incorrect and
that the  true underlying distribution
is   unknown   so   the   conventional
standard error formulae  based  on the
modeled  distribution  are necessarily
in error.  One would suspect that even
if  the  model adequately approximates
the first moment  of  the  maximum, it
approximates  the  second  moment less
well.

As an alternative method  for estimat-
ing the  uncertainty in  the method of
universal constants for all data sets,
a cross-validation method was pursued.
We used the estimated  parameters from
each of  the New York and Kincaid data
sets to estimate  the  maxima  for the
other  data  set.   For each hour, the
estimated maximum were divided  by the
actual  maximum,  the resulting ratios
were grouped into  10  bins, according
to  the  value  of the hourly average.
Within each of these bins, we computed
the  three  quartiles of the quotients
of fitted over actual maxima.  Figures
5 and  6 show these three quartiles of
the fitted over  true  ratios, plotted
against  the  midpoint  of  the hourly
averages in the bin.
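
A sketch of this summary, assuming fitted_max and actual_max hold the fitted
and observed within-hour maxima and hourly_avg the corresponding averages
(the bin edges used in the paper are not specified, so equal-count bins are
used here purely for illustration).

import numpy as np

def ratio_quartiles_by_bin(hourly_avg, fitted_max, actual_max, n_bins=10):
    """Quartiles of fitted/actual maxima within bins of the hourly average."""
    ratio = fitted_max / actual_max
    edges = np.quantile(hourly_avg, np.linspace(0, 1, n_bins + 1))
    which = np.searchsorted(edges[1:-1], hourly_avg)   # bin index, 0 .. n_bins-1
    rows = []
    for b in range(n_bins):
        r = ratio[which == b]
        if r.size:
            mid = 0.5 * (edges[b] + edges[b + 1])
            rows.append((mid, *np.percentile(r, [25, 50, 75])))  # Q1, median, Q3
    return rows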

One  should  recall  that  the Kincaid
data  reflect  the  situation  near  a
point source  while the  New York data
reflect  ambient  levels far from any
point  source.     Consequently,   this
method    of    cross-validation   may
exaggerate the  error  associated with
this   procedure.     However,  unless
additional 5-minute data are collected
and analyzed  from a  second plant and
from   a   second   population  center
station, it  is difficult to determine
how much  of the  error is  due to the
disparity  of  sites  and how much due
to the method.

The  most  striking  feature  of these
plots  is  that  the two cross-valida-
tions  are   biased  (necessarily,  in
opposite  directions).      The  higher
values  of  the  hourly  average  (the
right  half   of  the  graph)  are  of
greater interest.   For  the  New York
data, the  first quartile of the ratio
of fitted  over the  actual maximum is
greater  than  1;  i.e.  the estimated
maximum is  too high  three fourths of
the  time.    The median of the fitted
over actual ratio is, for most hourly
averages, over 1.2; i.e., the estimated
maximum is 20% too high more than half
the time.  The estimated maximum is
30-40% too high at least a quarter of
the time.  The situation at Kincaid is
essentially the mirror image of this:
for the higher values of the hourly
average, the third quartile of the
fitted over actual ratio is below .9;
i.e., estimated maxima are at least 10%
too low nearly three fourths of the
time.  They are 30-40% too low nearly
half the time and 50-60% too low at
least a quarter of the time.

The proportionate  error diminishes as
the hourly  average goes up.  This, of
course, is an artifact of using fitted
value/true  value  as  the  measure of
error.  In absolute size (µg/m³), the
errors  would   not  diminish  as  the
hourly average increases.

4.2  Errors in  the  Method  of Short-
term Monitors.

In   order   to  estimate  the  errors
associated with attempting to estimate
parameters    of   the   ratio-average
relationship  at  a   given   site  by
actually measuring 5-minute levels for
a  short  time,  each   data  set  was
divided  into  batches  100 hours long
and  OLS  estimates  were  derived for
each  batch.     There  are  125  such
batches in the New  York data  and 158
batches in the Kincaid data.
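
A sketch of the batching step (illustrative names; hours in which the
maximum equals the average must be dropped before taking the iterated
logarithm):

import numpy as np

def batch_ols_estimates(hourly_avg, hourly_max, batch_len=100):
    """Iterated-log OLS estimates (slope, intercept), one pair per batch."""
    estimates = []
    for start in range(0, len(hourly_avg) - batch_len + 1, batch_len):
        avg = hourly_avg[start:start + batch_len]
        mx = hourly_max[start:start + batch_len]
        ok = mx > avg                         # loglog is defined only for ratio > 1
        x = np.log(avg[ok])
        y = np.log(np.log(mx[ok] / avg[ok]))  # iterated log of the ratio
        X = np.column_stack([x, np.ones_like(x)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates.append(coef)                # (slope, intercept)
    return np.array(estimates)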

It is difficult to judge the potential
accuracy in estimating the maxima by
simply looking at the uncertainty in these
parameters.    In  order   to  further
clarify the errors of direct interest,
we divided  the  hours  into  13 bins,
according to  the size of their hourly
averages.  For each  OLS estimate from
a batch, we randomly selected one hour
from each of the 13 bins  and computed
the quotient of the fitted maximum to
the true maximum for  each  hour.   We
then  computed  the three quartiles of
the resulting quotients in each of the
bins.   Figures 7-10  show these three
quartiles, plotted  against the hourly
average.

In  contrast  to  the previous method,
these  estimators  are  nearly  median
unbiased. That is, the median value of
the quotient is just  about 1, corres-
ponding  to  accurate estimation.  For
hourly averages greater than 1 µg/m³
one  can  see  that  the  iterated log
models  lead to estimates of the maxima
that are within 20 to 40% of the
actual maxima at least half the time
for the Kincaid data and within 10% at
least half the time  for the  New York
data.    That  is,  the first and third
quartiles of  the  fitted  over actual
ratios  fall  at  .9  and  1.1 for New
York, at .8 and  1.2  for  Kincaid (at
least on the right half of the plots).
The log  models have  roughly the same
error  rates.    It  is    also  worth
noting that, for the Kincaid data, the
log models continue to give impossible
fitted values in many cases.
Comparing   these   results  to  those
obtained from the method  of universal
constants, one can see that the method
of  short-term  monitors  offers  some
improvement   in   accuracy  over  the
former method, where the estimates are
noticeably   biased   and   errors  of
20% in the estimated maximum occur
half  the time.  The increased accuracy
is much  more noticeable  with the New
York  data.     At  this  time  it  is
impossible to say whether a comparable
difference   in   accuracy   would  be
present  at   most  population  center
stations  and  absent  at  most  point
source stations.

4.3  Errors in the Method of Change of Time-Scale

The third method suggested was to
remove a daily cycle from the observed
hourly data and then assume that the
relationship between the peak and mean
of twelve hourly readings is the same
as that in twelve 5-minute readings.
A priori, one would expect this method
to be the least effective of the three.
The correlation of successive 5-minute
readings will be higher than that of
successive hourly averages; averages
over longer time scales should come
from distributions closer to Gaussian,
so the functional form of the
underlying distributions will not be
the same.  In fact, the parameter
estimates obtained this way are
seriously in error, as can be seen by
comparing the estimates in Table 3 with
those in Table 1.
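
A sketch of the de-cycling and blocking step, assuming hourly_avg holds the
hourly averages and hour_of_day the clock hour (0-23) of each observation;
the original treatment of missing hours and of the cycle estimate may differ.

import numpy as np

def detrend_and_block(hourly_avg, hour_of_day, block_len=12):
    """Divide out the long-term mean for each clock hour, then form 12-hour blocks."""
    cycle = np.array([hourly_avg[hour_of_day == h].mean() for h in range(24)])
    adjusted = hourly_avg / cycle[hour_of_day]
    n_blocks = len(adjusted) // block_len
    blocks = adjusted[:n_blocks * block_len].reshape(n_blocks, block_len)
    block_avg = blocks.mean(axis=1)   # plays the role of the hourly average
    block_max = blocks.max(axis=1)    # plays the role of the 5-minute maximum
    return block_avg, block_max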

Figure 11 shows plots  of quotients of
the maximum  estimated from the  1-hour
to   12-hour  relation  to   the maximum
estimated from  the actual  5-minute to
1-hour relation.    Results   from both
sets  and  both the  log vs  log and the
iterated  log vs  log model are graphed.
At high   levels, the  estimates  in New
York are too high by 10-20%; at low
levels,  they are seriously  biased  low.
In the   Kincaid  data,  estimates from
the   iterated   log  vs   log   model are
too high by 50-60%; the performance of
the   log  vs   log  model  is  even  worse.
These plots,  which  roughly  correspond
to   the   median  accuracy   using  this
method, were so  bad  that  we  did no
further investigation  for the Kincaid
data.

A  similar  procedure  was  applied to
the  New  York  data  to  predict  the
maximum for  the iterated  log and log
models,   respectively,  with  results
similar  to  those  obtained  from the
Kincaid  data.     The predictions are
biased  high;  three  fourths  of  the
time, the  fitted value  is at least 5
or 10% too high; half the time, the
fitted value is at least 10 or 20% too
high.   Somewhat surprisingly, the log
versus  log  model  performs  somewhat
better  than  the  iterated log versus
log model for this data set.

5.  Estimation of an Arbitrary 5-Minute SO2 Level

The  second  objective of the analysis
was to find  a  model  for  the condi-
tional  distribution  of  an arbitrary
5-minute SO2 level,  given  the hourly
S02 average.  As an alternative to the
theoretical calculation, the following
ad hoc method was considered.

     1.    Use  deviations of 5-minute
     S02  levels   from  their  hourly
     averages,    rather    than   the
     5-minute levels themselves.

     2.    Make deviations from dif-
     ferent hours comparable by
     dividing  them   by   a  suitable
     scaling   factor.      The  usual
     scaling  factors,   the  standard
     deviation  or  the  interquartile
     range within an  hour,  cannot be
     used  because  one wants a method
     that can be  used  when knowledge
     of variability  within an hour is
     not available.  The  scale factor
     must  depend  only  on the hourly
     average.    We  employed  a scale
     factor of the form
     exp(B *log(hourly  average) + A).
     The slope and intercept, B and A,
     were  obtained  by OLS regression
     of log(hourly  SD)  on log(hourly
     average),   in   each   data  set
     separately. In practice, it would
     be necessary to use the parameter
     estimates  from  these  two  data
     sets  in  future  data sets which
     contain only SO2 hourly averages.

     3.      Pool   all   the   scaled
     deviations  together  and  fit   a
     simple  parametric  model  to the
     resulting   empirical   distribu-
     tion.

This  three  step  method  was applied
separately  to  each  data  set.   The
estimated   conditional   distribution
function is given by equation 6.
(6)  Prob(5-minute SO2 level ≤ x | hourly average SO2 = y) =

     F(x|y) = Φ( (x - y) / exp(B*log(y) + A) ).

The numerical values of A and B are given in Table 4.
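
A sketch of this scale-factor model (illustrative names): A and B are
obtained by OLS in a data set that does contain 5-minute readings, and
equation 6 is then evaluated for data sets with only hourly averages.

import numpy as np
from scipy.stats import norm

def fit_scale_factor(hourly_avg, hourly_sd):
    """OLS of log(within-hour SD) on log(hourly average); returns (B, A)."""
    x, y = np.log(hourly_avg), np.log(hourly_sd)
    X = np.column_stack([x, np.ones_like(x)])
    (B, A), *_ = np.linalg.lstsq(X, y, rcond=None)
    return B, A

def prob_5min_below(x, y, B, A):
    """Equation 6: Prob(5-minute level <= x | hourly average = y)."""
    return norm.cdf((x - y) / np.exp(B * np.log(y) + A))
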
We found that the standard normal
distribution  worked  acceptably  well
for both  the New York and the Kincaid
data.    An  attempt  to  use  a three
parameter    gamma   distribution   to
compensate  for  some  skewness in the
scaled  deviations  did  not  lead  to
enough  improvement   to  justify  the
introduction of  the extra parameters.
One  should   note  that  there  is  a
systematic  error  in  this  procedure
that was  not present  in modelling of
the  maximum.      Given   the  serial
correlation  of successive five-minute
readings,  the readings  in  the middle
of  the   hour  will  be  more  highly
correlated  with  the  hourly  average
than will   the first or last readings.
The model  in equation 6 is  intended,
at  best,   to  predict  the value of a
5-minute  reading  selected  at random
from  one  of  the  twelve  time slots
during an hour,   not  the  value  of a
5-minute reading from a specified time
slot.

6.  Error in the Estimation of Any 5-Minute SO2 Level

There are  two types of error that one
may consider here. First, there is the
error in  using equation 6 to estimate
the  proportion  of  5-minute readings
which  exceed  a  given  level of SO2.
Second,  there is  the  error  in using
the equation  to estimate the level of
SO2 that corresponds to a given
percentile of the distribution of
5-minute readings.  If one is con-
cerned about  the frequency of exceed-
ances  of  a   threshold   for  health
effects, it is the first type of error
that is of interest.  We  will discuss
only the estimation of this first type
of error.

Cross-validation between  the two data
sets  was  used  to measure the error.
The estimated slope and intercept of
the scaling factor (the only unknown
parameters in the model) from the New
York data were combined with the
observed hourly averages from the
Kincaid data to predict the scaling
factors in the Kincaid data.  We then
divided all the
observed  deviations  from  the hourly
averages by these scaling factors.   If
the  parameter   estimates  are  good,
these  scaled   deviations  should  be
close  to  a standard normal  distribu-
tion.
 We  grouped  these  scaled  deviations
 into 16   bins,  according  to the level
 of the hourly  average.     To quantify
 how  well   the  estimates performed,  we
computed, for each of the 16 bins, the
observed proportion, P, of scaled
 deviations which  exceeded  the values
 -2,   -1,  -.5,   +.5,   +1,   +2.    This
 corresponds to  using as  thresholds the
 5'th,   15'th,  30'th, 70'th, 85'th and
95'th percentiles  of  the  5-minute
 readings,   computed  using the correct
 parameters.  Figure 14 shows the plots
of these six P's against the hourly
 average.     (The six  curves correspond
 to  the  nominal   5'th  through  95'th
 percentiles;  the  ordinate  shows the
 percentage    of     scaled    deviations
 actually  less   than   that  threshold.)
 The  whole  procedure was  then repeated,
 reversing   the   roles of  the New York
 and   Kincaid  data  sets.     Figure  15
 shows  the  plots  of the  P's from New
 York  data with Kincaid parameters.
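
A sketch of this check, assuming scaled_dev holds the scaled deviations and
hourly_avg the matching hourly averages (equal-count bins are used here for
illustration):

import numpy as np
from scipy.stats import norm

def observed_vs_nominal(scaled_dev, hourly_avg,
                        thresholds=(-2.0, -1.0, -0.5, 0.5, 1.0, 2.0),
                        n_bins=16):
    """Observed proportion of scaled deviations below each threshold, by bin."""
    edges = np.quantile(hourly_avg, np.linspace(0, 1, n_bins + 1))
    which = np.searchsorted(edges[1:-1], hourly_avg)
    nominal = norm.cdf(thresholds)        # proportions expected under the model
    observed = np.array([[(scaled_dev[which == b] <= t).mean()
                          for t in thresholds]
                         for b in range(n_bins)])
    return nominal, observed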

 It can be  seen   from   these  two  plots
 that   the   5-minute  readings  in the
 Kincaid  data  are  more dispersed  about
 their  hourly  averages  than would  be
 expected from the New York data.    At
 high values   of  the average,  a thresh-
 old  which one would  expect to  be the
 70'th  percentile   is  actually only the
 55'th  to  60'th   percentile;   what one
 would  expect  to  be  the 85'th percen-
 tile  is  actually  between the  60'th and
the 70'th percentile;  what one
 would  expect  to  be  the 95'th percen-
 tile  is  actually only about  the  70'th
 to the 80'th  percentile.    Consequent-
 ly,   if  one  were  using the New York
 data   for   parameter  estimates,  one
 would    noticeably  underestimate the
 frequency of exceedances of   a thresh-
 old.

 Necessarily,  one   finds  the opposite
 situation  when   5-minute   readings  in
 New York are  inferred  from  the  Kincaid
 data.    As  shown  in   figure   15,  a
 threshold  that   one   would expect,  on
 the basis of the  Kincaid data,   to  be
 only the  70'th percentile  of  5-minute
 readings would actually  be  nearly the
95'th percentile in New York.
Consequently, if one were using the
 Kincaid data  for parameter estimates,
 one would  noticeably overestimate the
 frequency of exceedances.

 7.  Theoretical  Modelling of  the  Joint
 Distribution of  5-Minute Levels

We  made  some  attempts    to  explore
 theoretically    motivated    paramet-
 ric  models  for  the   third  problem
 listed  in  the    introduction,  namely
estimation of  the  joint distribution
of the 5-minute levels, conditional on
the hourly average.   The most popular
choice  of  marginal  distribution for
SO2  levels,  when  averages   over  a
single length of time are observed, is
the lognormal.    We  therefore tested
the  goodness-of-fit  of the lognormal
distribution to the 5-minute sequences
at Kincaid and New York.  The 5-minute
readings at New York appeared to fit a
lognormal distribution acceptably.  (A
formal  test  would  reject  the hypo-
thesis  of  lognormality.  However, it
appears that  the  deviation  from the
lognormal is  small  enough to be of no
practical importance  even  though the
enormous  sample  size leads to formal
rejection of the model.)  The 5-minute
readings  at  Kincaid appeared notice-
ably more leptokurtic than a lognormal
distribution.     We  therefore  did no
further work with the Kincaid data.

Estimation  of  the  joint conditional
distribution  requires  three  further
assumptions.   First,  we  assume that
the  unconditional  joint distribution
of all the logs of 5-minute  levels is
multivariate   normal.     This  seems
reasonable in  light  of  the approxi-
mate  marginal  lognormality.  Second,
we  assume  that  the  autocorrelation
structure  of  the  sequence  of loga-
rithms of  the  5-minute  levels  is a
simple    serial    correlation,   the
correlation at lag i being just rho to
the i'th  power.  This is necessary to
keep the  number of  parameters in the
model  down  to  three.   In fact, the
sample correlations at lags 2 to 4 are
not too  far from the second to fourth
powers of the lag-1 correlation.
Third,  we   assume  that  the  hourly
average  observed  was  the  geometric
mean  of  the  twelve 5-minute levels,
although it was in fact the arithmetic
mean.    This assumption is explicitly
false:  the  true  geometric  mean  is
smaller than the observed average, but
the  higher  the  correlation  between
successive  5-minute  readings,  the
smaller  the  difference  between  the
arithmetic and  geometric means.  This
assumption is made in order to  get an
algebraically  tractable  problem  and
with the  hope  that  the  high serial
correlation  will  make  it  close  to
true.  With  these  three assumptions,
it  follows   that  the  logs  of  the
5-minute  levels   and   the    log  of
their  geometric   mean  come  from  a
13-dimensional   normal   distribution
with a rank 12 covariance matrix.

One   now   finds   that  the  desired
conditional   distribution    of   the
vector  of  12  log 5-minute readings,
given the   log of  the geometric mean,
is  12-dimensional normal with mean and
variance   given   by    the   standard
multivariate    regression   formulae.
Letting Zi = log of the i'th 5-minute
reading, Z = the vector of the twelve
Zi, and Zbar = the log of their
geometric mean, the mean and
variance-covariance matrix of this
conditional distribution are given by
equations 7A and 7B:

(7A)  E(Zi | Zbar) = μ + Cov(Zi, Zbar)*(Zbar - μ)/Var(Zbar)

(7B)  Var(Z | Zbar) = Var(Z) - Cov(Z, Zbar)Cov(Z, Zbar)'/Var(Zbar).

In more detail, the i'th coordinate of
the vector of covariances of the logs
of the 5-minute readings and the log of
the geometric mean is

     Cov(Zi, Zbar) = σ^2*{ρ^(i-1) + ... + ρ + 1 + ρ + ... + ρ^(12-i)}/12,

and the variance of the log of the
geometric mean is equal to

     V = σ^2*{12 + 2*[11ρ + 10ρ^2 + ... + ρ^11]}/144.
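
A sketch of equations 7A and 7B under these assumptions; mu, sigma2, and rho
are the parameters defined above and zbar the observed log geometric mean
(all names illustrative).

import numpy as np

def conditional_given_log_gmean(mu, sigma2, rho, zbar, n=12):
    """Conditional mean vector and covariance matrix of the n log 5-minute
    levels, given the log geometric mean zbar (equations 7A and 7B)."""
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    var_Z = sigma2 * rho ** lags              # Var(Z): correlation rho**|i-j|
    cov_Zi_zbar = var_Z.mean(axis=1)          # Cov(Zi, Zbar) = row means of Var(Z)
    V = var_Z.mean()                          # Var(Zbar) = grand mean of Var(Z)
    cond_mean = mu + cov_Zi_zbar * (zbar - mu) / V                 # 7A
    cond_cov = var_Z - np.outer(cov_Zi_zbar, cov_Zi_zbar) / V      # 7B
    return cond_mean, cond_cov
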
The problem of estimating the joint
distribution of the 5-minute levels,
given the hourly average, is now
reduced to the problem of estimating
the three parameters (mu, sigma, and
rho) in the above expressions when one
observes only the sequence of hourly
averages.  Because the sequence of
observed logs of geometric means is
also a multivariate normal sequence, it
is simple to estimate the mean,
variance, and covariance of this
sequence.  Specifically, the log of the
geometric mean is normal with mean
equal to mu and variance equal to V
above.  Furthermore, the logs of the
geometric means in successive hours are
bivariate normal with covariance equal
to

     C = σ^2*{ρ + 2ρ^2 + ... + 12ρ^12 + 11ρ^13 + ... + ρ^23}/144.

The (computable) maximum likelihood
estimates of the mean mu, variance V,
and covariance C of the hourly averages
uniquely determine the MLE's of the
parameters mu, sigma, and rho of the
5-minute series.
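
A moment-matching sketch of that inversion (treating the observed hourly
averages as geometric means, per the third assumption above); it assumes the
sample lag-1 correlation of the log means lies strictly between 0 and 1.

import numpy as np
from scipy.optimize import brentq

def _weights(rho, n=12):
    """wV and wC such that V = sigma^2 * wV and C = sigma^2 * wC."""
    k = np.arange(1, n)
    wV = (n + 2.0 * np.sum((n - k) * rho ** k)) / n ** 2
    k2 = np.arange(1, 2 * n)
    wC = np.sum(np.minimum(k2, 2 * n - k2) * rho ** k2) / n ** 2
    return wV, wC

def fit_mu_sigma_rho(log_hourly_means):
    """Match the mean, variance and lag-1 covariance of the log hourly means."""
    m = log_hourly_means.mean()
    v = log_hourly_means.var(ddof=1)
    c = np.cov(log_hourly_means[:-1], log_hourly_means[1:])[0, 1]
    rho = brentq(lambda r: _weights(r)[1] / _weights(r)[0] - c / v,
                 1e-6, 1 - 1e-6)
    sigma2 = v / _weights(rho)[0]
    return m, sigma2, rho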

The  estimated conditional  distribution
of the  logs  of the 5-minute levels in
New  York,  given their  hourly averages,
is shown   in  Table   5.   This distribu-
tion is 12-dimensional  normal  with the
indicated   numerical   values  for  the
vector  of   conditional  expectations of
the  logs  of the  5-minute readings,
given the  hourly average,   and  for the
variance-covariance matrix.

One  can also attempt   to  elaborate on
the   above   computation    by   making
approximate  corrections  for the fact
 that   one    actually    observes   the
 arithmetic    mean    rather    than  the
 geometric   mean.     All   of   the above
 equations   and  distributional  formula-
 tions   are   still   valid.     The  only
 problem  is  that   they  cannot be used
 for  computation if  the geometric means
 are  not observed.   We suggest  that the
 following approximations be  used when
only  the  arithmetic  means  are  ob-
 served.   First,  compute   the  first and
 second  sample  moments of the  observed
 sequence of arithmetic   means   and use
 these  values to get method of moments
 estimates    of   the  parameters  mu,
 sigma,  and  rho.  (The arithmetic means
 are  not  lognormal   so   these   are not
 maximum likelihood   estimates.)   These
 parameter    estimates    then    specify
 numerically the joint distribution of
 the    5-minute   levels,   given   the
 geometric   mean.    To complete specif-
 ication of  this distribution,  one need
 only give a numeric  estimate,  based on
 the  arithmetic  mean, of   the geometric
 mean.    A reasonable choice  is  to set
 the  estimated  sample   geometric mean
 equal   to   the  observed sample  arith-
 metic   mean  times   the   ratio  of the
 estimated expectation of the geometric
 mean to the  estimated   expectation of
 the  arithmetic  mean.

Application of the above protocol
requires only expressions, in terms of
mu, sigma, and rho, for four moments:
the expectations of the sample
arithmetic and geometric means, the
variance of the sample arithmetic mean,
and the covariance of the arithmetic
means of successive hours.  Given that
the logs of the 5-minute readings are
serially correlated normal(μ, σ^2)'s,
the expected values of the arithmetic
and geometric means are, respectively,

     EA = exp(μ + σ^2/2)    and

     EG = exp(μ + δσ^2/2),    where

     δ = {12 + 2*[11ρ + 10ρ^2 + ... + ρ^11]}/144.

The variance of the arithmetic mean is

     VA = exp(2μ + σ^2)*{12*(exp(σ^2) - 1)
            + 2*[11*(exp(σ^2*ρ) - 1) + 10*(exp(σ^2*ρ^2) - 1) + ... + (exp(σ^2*ρ^11) - 1)]}/144.

Finally, the covariance of the
arithmetic means from two consecutive
hours is equal to

     CA = exp(2μ + σ^2)*{(exp(σ^2*ρ) - 1) + 2*(exp(σ^2*ρ^2) - 1) + ...
            + 12*(exp(σ^2*ρ^12) - 1) + 11*(exp(σ^2*ρ^13) - 1) + ... + (exp(σ^2*ρ^23) - 1)}/144.
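
A sketch of these four moments, plus the ratio EG/EA used in the protocol
above to convert an observed arithmetic mean into an estimate of the
geometric mean (illustrative, taking the formulae above at face value):

import numpy as np

def mean_moments(mu, sigma2, rho, n=12):
    """EA, EG, VA, CA for the arithmetic and geometric means of n serially
    correlated lognormal 5-minute readings, and the correction ratio EG/EA."""
    k = np.arange(1, n)                                    # lags 1 .. n-1
    delta = (n + 2.0 * np.sum((n - k) * rho ** k)) / n ** 2
    EA = np.exp(mu + sigma2 / 2.0)
    EG = np.exp(mu + delta * sigma2 / 2.0)
    VA = np.exp(2 * mu + sigma2) * (
        n * (np.exp(sigma2) - 1)
        + 2.0 * np.sum((n - k) * (np.exp(sigma2 * rho ** k) - 1))
    ) / n ** 2
    k2 = np.arange(1, 2 * n)                               # lags 1 .. 2n-1
    CA = np.exp(2 * mu + sigma2) * np.sum(
        np.minimum(k2, 2 * n - k2) * (np.exp(sigma2 * rho ** k2) - 1)
    ) / n ** 2
    return EA, EG, VA, CA, EG / EA
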
It is important to note that all of
 the  above  theoretical  modelling  is
 heavily   dependent   on  the  assumed
 multivariate   lognormality   of   the
5-minute  levels.    If  the  5-minute
 levels   were    marginally   Weibull,
 Gompertz,   or  gamma  then none of the
 above   manipulations    would   work.
 Furthermore, in  new data sets it will
 not be  possible to  check for lognor-
mality  of  the  5-minute  sequence by
 examining  only the sequence  of hourly
 averages.       Thus,    the  techniques
 outlined in  this section  can only be
 applied by  either taking lognormality
 on faith or by  taking the  trouble to
 observe  enough   5-minute  levels  to
 perform at  least  a  simple  check on
 lognormality.

 8.   Conclusions

 There does not seem to be any reliable
 method for estimating the  maximum SO2
 level   within   an  hour from knowledge
 only of the time  series of  S02 hourly
averages at the same site.  The theory
 that there  is  a  simple relationship
 between   the    5-minute   and  hourly
 averages,    governed    by   the   same
 constants    at   all    sites,   is   not
 borne   out   by  the    two  data   sets
 examined.     In  fact,   the  functional
 form of the  marginal   distribution of
 5-minute levels  is not  even  the  same
 at  the  two sites.   One   must  recognize
 that  the   two sites   considered  were
 very different.    The   analysis should
 be   repeated   with  data  from similar
 sites   to   determine   the   extent   of
 extrapolation   across   sites   that   is
 possible.

 If  the  expense  is  not  prohibitive,  the
 best  results are  likely  to be  obtained
 by  taking  the  trouble   to measure  the
5-minute time series for a period of
 100  or  so  hours.    Even  this effort
cannot   promise   better   than  an even
chance  of  predicting future  maxima  to
within ±20%.  Using parameter
estimates  from  one  of   the  few sites
where 5-minute  data have been  collect-
ed or   from  the   relationship between
the  hourly  and  12-hourly averages  at
the site in question is likely to
lead    to   somewhat    less    accurate
predictions.    The  magnitude  of the
errors  associated  with  attempts  to
predict  the  proportion  of   5-minute
readings which  exceed a threshold are
comparable  to  those  experienced   in
estimating the  maximum.   If standards
are to be established  with the inten-
tion  of  limiting  the health effects
associated   with    high   short-term
exposures,  then  these  limits on the
accuracy in prediction  must  be borne
in mind in the setting of standards.

Given  the   ad  hoc   nature  of  the
parametric  models   used,   one  might
try    other    parametrizations—e.g.
estimate    the    transfer   function
between  the  time  series  of  hourly
averages  and  the   time   series  of
hourly   maxima—to   see   if  better
approximations   can    be   obtained.
Because the  iterated log model does a
fairly  good  job  of  estimating  the
maxima in  the data set from which the
parameters were  estimated and because
the marginal  distributions at the two
sites considered  are not  even of the
same form,  we think  it unlikely that
other choices  of parametrization will
lead to  much reduction  in the cross-
validation errors.

The task of estimating the conditional
distribution of  an arbitrary 5-minute
level,  given   the   hourly  average,
appears to  be equally  difficult.  It
appears that  using  ad  hoc parameter
estimates  obtained  from  one site to
predict  5-minute  levels  at  another
site leads  to biased predictions.  In
the two data  sets  compared  here, it
was  impossible   to   tell  reliably
whether  a   given   level   would  be
exceeded 5% or 30% of the time.

Estimation  of  the joint distribution
of all twelve  5-minute  levels, given
their  average,  appears feasible only
if  one   is  prepared   to  assume  a
lognormal    distribution    for   the
unconditional  distribution  of  these
readings.    There  are  data sets for
which this is  demonstrably  not true.
Thus, it  again appears  that the most
reliable  estimates  can  be  obtained
only by  observing at  least enough of
the 5-minute sequence to check lognor-
mality roughly.
            BIBLIOGRAPHY

(1)  Grandell, Jan (1984), Stochastic Models of Air Pollutant
Concentration.  Springer-Verlag, Berlin.

(2)  Johnson, Norman and Kotz, Samuel (1970), Continuous Univariate
Distributions, vol. 1.  John Wiley & Sons, New York.

(3)  Larsen, Ralph (1971), A Mathematical Model for Relating Air Quality
Measurements to Air Quality Standards.  U.S. Environmental Protection
Agency, Office of Air Programs, Research Triangle Park, North Carolina.

(4)  Legrand, Michael (1974), Statistical Studies of Urban Air
Pollution--Sulfur Dioxide and Smoke, in Statistical and Mathematical
Aspects of Pollution Problems, John Pratt, ed.  Marcel Dekker, Inc.,
New York.

(5)  Pollack, Richard I. (1975), Studies of Pollutant Concentration
Frequency Distributions.  U.S. Environmental Protection Agency, Office of
Research and Development, Publication EPA-650/4-75-004, Research Triangle
Park, North Carolina.

-------
                                   TABLE 1
                            Descriptive Statistics

            Station      Mean     S.D.   Skewness   Kurtosis   Maximum
 Hr Avg     NY          19.61       18        2.8         15       257
            Kincaid     20.78       75         47       3810      2500
 Hr Sd      NY           3.34      3.6        3.5         21        57
            Kincaid     13.71      109        109      13000      5000
 Log (Avg)  NY           2.64      .85        -.3         .3      5.55
            Kincaid      1.77      1.6         .0        -.2      7.82
 Log (SD)   NY            .84      .84         .3        -.2      4.04
            Kincaid      1.25      1.4       1.01        .12      8.52

               Regression of Log (Ratio) on Log (Average)

                                                      Ratio <1 When
 Station        Slope     Intercept     RMSE            Average >
 NY             -.077          .499      .20                  652
 Kincaid        -.210          1.07      .69                  163

              Regression of LogLog (Ratio) on Log (Average)

 Station        Slope     Intercept     RMSE          Correlation
 NY             -.267         -.719      .62                 -.34
 Kincaid        -.258         -.191     1.06                 -.36

-------
                     TABLE 2
      Distribution of Residuals at Kincaid

          Value of
      Log(log(ratio))        Percentile

           -2.03                .05
           -1.43                .10
            -.70                .25
             .23                .50
             .76                .75
            1.24                .90
            1.43                .95
                     Table 3
   Regressions from Method of Change of Time Scale

   Model          Data Set      Slope     Intercept

   Iterated Log   New York    -0.0854        -0.415
   Log            New York    -0.0528         0.716
   Iterated Log   Kincaid     -0.12           0.606
   Log            Kincaid     -0.170          2.010

-------
                         TABLE 4
       Fitted Models for Spread of 5-Minute Levels

         Regression of Log (SD) on Log (Avg)

                                         Correlation
 Station      Slope      Intercept         Squared
 NY            .687          -.972             .49
 Kincaid       .645           .114             .53

            Regression of SD on Average

                                         Correlation
 Station      Slope      Intercept         Squared
 NY            .114          1.109             .33
 Kincaid      1.197        -11.169             .67

-------
                                   TABLE 5
         Conditional Means and Variances of Log 5-Minute Levels

     [Table 5 lists, for the New York data, the vector of conditional
     expectations of the twelve log 5-minute readings given the log hourly
     geometric mean, together with the corresponding 12 x 12 conditional
     variance-covariance matrix.  Here Z = vector of logs of 5-minute
     readings and Zbar = observed value of the log hourly geometric mean.]

-------
FIGURE 1   Maximum versus hourly average: New York data (scatter plot, logarithmic scales).
FIGURE 2   Maximum versus hourly average: Kincaid data (scatter plot, logarithmic scales).
FIGURE 3   Residuals of iterated log model: New York data (histogram, boxplot, and normal probability plot).
FIGURE 4   Residuals of iterated log model: Kincaid data (histogram, boxplot, and normal probability plot).
FIGURE 5   Errors with using fixed estimates (New York data, Kincaid parameters): quartiles of fitted/actual maxima vs hourly average.
FIGURE 6   Errors with using fixed estimates (Kincaid data, New York parameters): quartiles of fitted/actual maxima vs hourly average.
FIGURE 7   Errors with short-term monitors (New York data, iterated log model).
FIGURE 8   Errors with short-term monitors (Kincaid data, iterated log model).
FIGURE 9   Errors with short-term monitors (New York data, log model).
FIGURE 10  Errors with short-term monitors (Kincaid data, log model).
FIGURE 11  Error in maximum: quotients of time-scale-based estimates to 5-minute-based estimates (both data sets, log and iterated log models).
FIGURE 12  Errors with change of time scale (New York data, iterated log model).
FIGURE 13  Errors with change of time scale (New York data, log model).
FIGURE 14  Percent of scaled deviations exceeding nominal percentiles (Kincaid data, New York parameters).
FIGURE 15  Observed percentiles of scaled deviations: graphs of modelled percentiles (New York data, Kincaid parameters).
-------
                       DISCUSSION
                   R. CLIFTON BAILEY*
           Health Care Financing Administration,
2-D-2 Meadows East, 6325 Security Blvd., Baltimore 21207
     A recent editorial suggested that there be no
new data collection until present data sets are
thoroughly analyzed. This is a tough standard. Even if
one attempted to thoroughly analyze present data
sets there would always be the possibility for more
analysis.  This is especially true when one considers
analyses based on multiple data sets - meta
analyses.

     The authors are to be commended for their
extensive data analyses. Of course some of us
remain disappointed that certain parametric and
nonparametric models were not explored because of
complexity.  In stating the reasons for not doing
certain analyses, I think the authors take a narrow
view of what is possible. The issues may be more
ones of cost, time or expected return. This in no
way undermines the value of the extensive empirical
exploration of the data undertaken by the authors.

     The authors set a task of establishing a
relationship between studies in which data are
recorded in short, 5-minute intervals and the more
common choice of hourly summaries. They are
especially interested in establishing this
relationship because they believe it is necessary to
have information on the short time records to
establish health effects.

      When the basic process is observed from
several points of view, different measurements,
such as the 5-minute and the hourly measurements,
should be expressible in terms of the common
process observed.  The perspective of a common
process being observed from different points of
view provides the framework or model to work from.
From this perspective, distinctly different
measurements or measurement processes generally
are not equally informative of the process and the
statistical properties of these measurement
processes are not the same. In analyzing the data,
 it is important to remember that the measurement
process  is part of the observation and more than
                                one quantity may be needed to describe the process.
                                 The model for the process generally will be a
                                combination of stochastic and deterministic
                                components.  An issue underlying the effort to
                                evaluate different methods of observation is that
                                precision as well as  costs differ.

                                     To deal with the basic problem, it helps to
                                have a model that consists of the underlying
                                 process to be observed and the measurements
                                used to observe the process.  An evaluation with
                                such a model may suggest alternative measurement
                                strategies. For example, the measurement strategy
                                may consist of obtaining a fixed quantity over a
                                random time  interval instead of obtaining a measure
                                over a fixed time interval.  The idea is clearly
                                suggested by the analogy with a Poisson counting
                                process. In counting statistics,  two strategies are
                                commonly used.  One uses a fixed interval and
                                obtains the count while the other specifies a count
                                and measures the time to obtain this count. These
                                strategies can be evaluated to compare costs and
                                precision for a given situation.
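
     The comparison can be made concrete with a small simulation.  The
Python sketch below is only an illustration; the event rate, the fixed
interval, and the fixed count are hypothetical choices, not values taken
from the paper under discussion.

    import numpy as np

    rng = np.random.default_rng(0)
    rate = 2.0          # hypothetical true event rate (events per unit time)
    n_reps = 20000      # Monte Carlo replications

    # Strategy 1: fix the observation interval and record the count.
    T_fixed = 50.0
    counts = rng.poisson(rate * T_fixed, size=n_reps)
    est_fixed_time = counts / T_fixed

    # Strategy 2: fix the count and record the total waiting time.
    n_fixed = 100
    waits = rng.gamma(shape=n_fixed, scale=1.0 / rate, size=n_reps)
    est_fixed_count = (n_fixed - 1) / waits     # (n-1)/T is unbiased for the rate

    for name, est in [("fixed time", est_fixed_time),
                      ("fixed count", est_fixed_count)]:
        print(f"{name:11s}: mean = {est.mean():.3f}, "
              f"relative SD = {est.std() / rate:.3%}")

Either strategy can then be costed for the monitoring situation at hand and
the cheaper one chosen for the precision required.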

                                     The main concomitant measures  explored
                                were time  of day and a meteorological factor, wind
                                direction.  These and other concomitant measures
                                need to be  part of the model.  I would like to see
                                more attention paid to concomitant factors at the
                                two sites.

                                      The authors state in their conclusions, "the
                                 theory that there is a simple relationship between
                                 the 5-minute and hourly averages, governed by the
                                 same constants for all sites, is not borne out by the
                                 two data sets examined."

                                     The conclusions and recommendations are
                                fundamentally sound.  The authors recommend
                                calibrating a model for each site.  In this way
                                differences among observed processes are properly
                                 recognized even if they are not explicitly modeled.

                                      *Disclaimer:  The opinions are those of the
                                 author and do not necessarily reflect the opinions
                                 of the Health Care Financing Administration.

-------
                                           SUMMARY OF CONFERENCE
                                                 John C. Bailar III
                            Department of Epidemiology & Biostatistics, McGill University
                                            Montreal, PQ, Canada H3A 1A2
                                                        and
                      Office of Disease Prevention & Health  Promotion, U. S. Public Health Serv.
                       Switzer Building, Room 2132, 330 C Street, S.W., Washington, D.C. 20201
   This summary  of the  conference is  intended  to
 provide some brief and integrated commentary on the
 eight  papers  and  eight  discussions presented  here
 (1-16), plus some perspective on broader issues raised
 by the papers as a group but not covered by any one of
 them.
   I will say much about unsolved problems. Of course,
 the more one  knows about a  situation, the easier it is
 to critique  specific points and point to things  that
 should be  done.  This  is good for bringing out issues,
 but it can be bad if  it  creates  an impression  that
 problems  dominate solutions.  I  do not  want  my
 comments here to be taken as a general  indictment of
 compliance  sampling,  a field that  has recently made
 much  progress and is clearly making more.

 Compliance Sampling in a Broader Context

   The focus  of  the   conference  was  compliance
 sampling;  this term  includes  both a)  the  general
 assessment  of  how   well   we  are  doing  in   the
 management of hazards and  b) the generation of data
 for individual  action  to  enforce relevant laws  and
 regulations.  My basic  view,  as a citizen  and scientist
 rather than a regulator, is   that  regulations should
 provide and should be interpreted as firm limits rather
 than  targets, though  they   are  often  abused  or
 misinterpreted as  targets.   Examples   include  the
 approaches of many states and cities to the control of
 criteria air pollutants, and  the  apparent attitude of
 parts  of private industry that penalties  for violations
 are a  business   expense, to  be  balanced  against
 production volume and  costs so as to maximize overall
 profits.   Carol Jones   (17)   has commented  on  the
 effects of penalties on the probabilities  of violations,
and at this Conference Holley (11) has discussed such
 approaches in the  context of bubbles.
   But these two  purposes of compliance sampling —
 overall assessment and enforcement —  are  broad  and
 vague. There was  very little said at  the Conference
 about  the ultimate  purposes,  or  even the penultimate
 purposes,  of these  activities.  This is  a potentially
 serious gap, because what we  do (or  should do) in
 compliance  sampling can be profoundly affected  by
 matters beyond  the  short  term  goals   of  accurate
 assessment of the  distribution  and level of  specific
 hazardous agents.  Is  our ultimate goal to  protect
 human health? If  so,  what  does that  mean for the
 design of a  program in  compliance sampling, given our
 limits on time, money, attention, and other resources?
 How are concerns about cancer to be balanced against
 concerns about (say) birth defects,  or heart  disease?
 How  are concerns  about health  in  the U.S. to  be
 balanced against  health in other countries?  How are
 we to  balance short-term protection of our own health
 against  protection  far into  the future,  even across
 generations  not yet born?  How should  we view  and
 assess the quality of outdoor (ambient)  air vs. indoor
 air (Hunt, 4)?  There are similar very broad questions
 about  direct  health effects  vs.  the  indirect health
 effects of  unemployment and poverty,  or  restricted
 choices of important consumer goods, on protection of
health.  How are such matters  to  be developed in a
context of  concern  about protection of non-health
values,  such as limiting  the  role  of government  in
controlling   private   behavior   or   in   facilitating
compensation for harm actually inflicted (perhaps  at
much lower  overall cost  to society), the  effects  of
unenforced or unenforceable directives on respect  for
the law in  other  areas,  and  many other matters? I
recognize  that  such issues  are  generally to be  dealt
with at the  highest political and social levels, but their
resolution can  have a profound  effect on  compliance
sampling,  and compliance  samplers should understand
the issues  and  express  themselves  as  knowledgable
professionals. Whether an inspector chooses to return
to a plant  that was in violation last  month or to visit a
new  plant  may  depend  on  how  much the  agency
depends on  quiet  negotiation  vs.  threats  of  legal
action.  Whether limited resources  are used to sample
for agents with acute, lethal,  and readily identifiable
toxicity or  for more common but less characteristic
and less devastating  chronic disease  may depend on
what  recourse  is available when injury is  suspected.
Intensity of  sampling (and of enforcement)  in  some
critical industry may even depend  on the state of the
industry, and the state of the economy more  generally.
  The  importance of defining the goals of compliance
sampling in the broadest way is clear. But we have not
dealt  very  well  even with defining goals at  more
technical  levels.   Suppose  that  a  well-conceived
regulation sets a maximum exposure limit of 10 ppm.
Should compliance sampling be designed to give only a
yes/no  answer,  perhaps  expressed  as  a  Bernoulli
variable, about whether  some stream, or factory,  or
city  is  in   violation?  Should  we  instead  try  to
determine the mean exposure over some defined region
of time and space?  The mean and variance, or the
tails generally?  Should  we  go only for  the  order
statistics,   especially  the  extremes  (which   will
generally  provide  a moving target  as problems are
solved and  compliance improves)?   Do  we  need the
whole  probability  distribution  of  values?   Surely  a
yes/no answer  can  lead to much nonsense,  as  it did in
some erroneous interpretations by the news media of a
recent NAS  report on drinking water, and some aspects
of the  probability  distribution  of  values  need  more
attention  than  others, but surely there is also a  point
where we  have  learned enough about that distribution,
and must  invest additional resources in the study  of
other problems.
  Gilbert  et  al. illustrate this general need for precise
goals  in their   discussion  (9)  of  sampling  soil for
radioactivity.  Was the underlying goal  to  determine
whether radiation levels at  any square inch  of surface
were  above  the standard? Was  it  to  average,  or
integrate,  over some unspecified larger area?  Was  it
to determine means and variances,  or other aspects  of
the distribution?  Here, maybe the goal was in fact  to
determine means for small areas,  but we would still
need to know more about the problem, especially about
the  small-scale variability  of  contamination,   to
determine an appropriate sampling plan.  For example,
if contamination is  nearly uniform within each area for

-------
which a mean is required,  one test  per  sampled area
may be enough.  Conversely, if there is much chance of
having one very small, very hot rock (of,  say,  10   ,
10  ,  or  10   pCi/g) one might have  to sample on a
much finer grid. The general issue here is the scope or
range for averaging  (or otherwise "smoothing") results.
Chesson (10) has also commented on  needs for relating
statistical  procedures  to  specific  problems  and
contexts.  Holley's work on the bubble (11) deals with a
kind of averaging, but this Conference as a whole has
given rather little attention to even this level of goals.
  Likewise,  there  was  little   discussion  of how
strategies for compliance sampling must  accommodate
the likelihood of legal challenge.  A probable freedom
from such challenge may well have given Gilbert  (9)
considerable  latitude  to be  complex; to use a  great
deal of peripheral  information, and to interpret EPA's
raw  standard   as   he  settled   on  the scope  and
distribution of  sampled areas, to decide  that he  could
ignore possible variation over time, and to develop  a
special sampling protocol.
  At this  point, one may begin to  wonder about the
role  of  statistics  (and statisticians) in compliance
sampling.  I believe  very strongly that  the most visible,
and apparently the most  characteristic aspects  of
statistics - modeling of random variation, algebra, and
computation  -  are only a small (though  essential) part
of  the field. Statistics is, rather, the art and science
of interpreting quantitative data that are subject to
error, and indeed, in the study of environmental
hazards, random error may account for only a tiny part
of the uncertainty.  Ross's discussion (12) brings out
clearly the real potential of statistics in the design of
bubbles, as well as the way bubbles ignore some
important distributional issues.
  I turn now  to three sets of  generic problems  in
compliance sampling: those  in policy and concept, in
unpredictable (stochastic)  influences  on the data, and
in  applications of theory.  These sets  of problems  are
broad and deep, and statistical thinking  has a large and
 critical role in each.

 Policy and Conceptual Aspects of  Compliance Sampling

   The first  set is related to  policy  and concepts.  I
 have already referred to the differences between broad
 public goals and more narrowly statistical goals,  but
 there are many intermediate questions about what it  is
 that one wants to accomplish, and what is feasible.
   Approaches  to  evaluation in many  fields fall rather
 well into three categories: evaluation of structure, of
 process,  and  of  outcome.   Each  can  be defined at
 multiple  levels, but  here it may  be most useful to
 equate structure to the chemical methods, engineering
 and  mechanical structures,  and other aspects  of the
 generation of  hazardous agent; process  to the emission
 or other release of  the hazard into  the  community, its
 transport after release,  and  exposure levels  where
 people are in fact exposed; and outcome to the human
 health endpoints (or other endpoints)  that are the more
 fundamental objects of concern.  Compliance sampling
 focuses on process  (in this context), but it is not clear
 that there has been  much hard policy  thinking about
 whether this is the best way to attain  the still rather
 fuzzy goals of the activity.
    One aspect of this matter is  the  need to  consider
 sensitive subgroups of the population.  Such subgroups
 may not always be evident (as seems likely with some
 carcinogens),  and  their existence  may not even   be
 suspected, but somehow  we must  recognize not only
 that some people get sick from exposures that do not
affect  others, but that not all persons have the same
probability of responding to some toxic agent.
  A  related point is "conservatism" in regulation, and
its reflections in compliance sampling.  Conservatism
has  several  purposes,  including  the protection  of
sensitive subgroups, and the need to provide a cushion
against random and nonrandom excursions of exposure
to higher levels.  I believe that its  main use, however,
is  to protect us against our ignorance, not against our
failures. We simply don't know what goes on within
the human body at low exposure levels of carcinogens
and  other  toxic  agents, and  choice  of the  wrong
statistical model  could lead to risk estimates that are
wrong   by  orders  of  magnitude.   Unfortunately,
underestimates of risk will tend to be far more serious
than overestimates if one works on a log scale, as is
implied  by   the  phrase  "orders  of  magnitude."
Implications of conservatism for compliance  sampling
are substantial.  It does little good to set conservative
limits   for   exposure   if   sampling,    and  hence
enforcement, do not follow.  It  is not at all  clear that
regulatory agencies have been consistently attentive  to
the  logical   link   between   conservatism  in  risk
assessment and conservatism in enforcement; indeed,
some agencies may have it backwards, and believe that
conservative exposure limits actually reduce  the  need
for compliance  sampling.  There is  scope here for a
new  study of how to  trade  off  the risks and costs  of
(say)   a higher  exposure limit plus  more rigorous
sampling to assure compliance vs. a lower  exposure
limit that is to be less vigorously enforced.
   Another  policy and conceptual issue in compliance
sampling has to do  with distributional effects.  When
dose-response curves are linear at low doses, the mean
exposure level in a population determines the expected
number of adverse events,  but it  may still matter a
great  deal  how  the  risk  is distributed  over  the
population.  For example, it is no longer acceptable  (at
least in the U.S.) to  concentrate the  risks of toxic
exposures on the  lowest economic and  social groups.
Nor  does one often hear arguments in favor of placing
a new  toxic hazard in an area already contaminated on
grounds that a  little  more would not  make  much
difference, even though this  might be rational if there
is reason to think  that the  risk is concentrated on a
small,  sensitive subpopulation  that  has  already  been
"exhausted" by prior exposures.
   Time does not permit more  than  a listing  of some
other  policy  issues  in  compliance  sampling.   How
should  ambient "natural" exposures to some agents,
such   as ozone,  be  accommodated  in protocols  for
compliance  sampling?  What  do   we  mean,  in
operational terms,  by  an  "instantaneous"  exposure?
Marcus gave a strong start to the  conference with  his
discussion  of  the need to design compliance sampling
programs in light  of  the  different  time scales  for
environmental    exposure,   biologic  response,   and
regulatory action (1),  while Hertzberg (2) has pointed
to some of the practical problems  of doing so.  How
should, or how  can,  model  uncertainty  be built into
sampling plans,  including models  of distribution and
exposure as well  as models of outcome?

Stochastic Aspects of Compliance Sampling

   Issues to this point have not depended on any aspect
of  uncertainty   in   measurement   or  on  random
variability in the substance under study.  The steps
 from  a precise  deterministic  model to  an uncertain
 stochastic model introduce new issues.  What are  the
 roles  of deterministic vs. stochastic models, and how

-------
should those roles affect compliance  sampling?  It is
perhaps understandable  that  in enforcement actions,
compliance  data  are  treated  as  free  of random
variation, but  surely this  matter needs some careful
thought.
  Another issue arises  from gaps in the data — gaps
that  are  sometimes by  design  and  sometimes  not.
There was  little attention  to  this  matter in  this
conference.  Though  every  applied  statistician  is
familiar with the problem, fewer are aware of the
theoretical  and applied approaches that have  been
worked out in  recent  years.   These  range  from
modeling  the  whole  data  set  and  using   iterative
maximum  likelihood  methods  to  estimate  missing
values (the E-M  algorithm) to the  straightforward
duplication  of  some nearby value,  which may be in
error  but  not  as far  off  as  ignoring  the  missing
observations, which  in  practice generally treats them
as if they had the mean value for that  variable ("hot
deck" methods).   Little  and Rubin (18)  provide an
introduction to this topic, and techniques analogous to
kriging, a method often used in geostatistics, may also
be useful (19).
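
  The contrast between carrying a nearby value forward and the implicit
mean fill can be sketched in a few lines.  The Python fragment below is an
illustration only; the short series is hypothetical, and the fragment is not
the E-M machinery of reference 18, merely a caricature of the nearest-value
("hot deck" style) and mean fills.

    import numpy as np

    # Hypothetical hourly series; np.nan marks the missing observations.
    x = np.array([3.1, 2.8, np.nan, 4.0, 4.4, np.nan, np.nan, 5.2, 4.9])

    # Nearest-value fill: carry the last observed value forward.
    filled_near = x.copy()
    for i in range(1, len(filled_near)):
        if np.isnan(filled_near[i]):
            filled_near[i] = filled_near[i - 1]

    # Mean fill: the implicit effect, in many analyses, of ignoring the gaps.
    filled_mean = np.where(np.isnan(x), np.nanmean(x), x)

    print("nearest-value fill:", np.round(filled_near, 2))
    print("mean fill:         ", np.round(filled_mean, 2))
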
  Unfortunately,   the   probability   distributions   of
greatest interest in compliance sampling may often be
hard to work with at a practical level.  They tend to be
"lumpy"  in  both space  and  time,  with   extreme
variability, long tails to the  right, and big coefficients
of variation.  Correlation  functions  over space  and
time  (as in kriging) are important, but may themselves
need to be estimated anew in each specific application,
with detailed attention to local circumstances.
  One practical consequence of dealing with "difficult"
distributions is  the loss of applicability of the Gaussian
distribution  (or at least loss of some confidence in its
applicability),  even  in  the  form of the central  limit
theorem.  Another is the loss of applicability  of linear
approaches, which  have  many  well-known  practical
advantages  with  both  continuous data  and  discrete
(even  non-ordered) classifications.  Nonlinear analogs
of, say, the general linear model and the loglinear or
logit   approaches   have   neither  the  theoretical
underpinnings, nor the range of packaged general-use
computer  programs,  nor the background  of use and the
familiarity of the  linear approaches.
  Given a set of data  and  a need to "average," what
kind  of  average  is  appropriate?   Some  obvious
questions  have  to do with ordinary weighted  averages,
others with moving  averages.  Still other  questions
have  to do  with  the form of the averaging function:
arithmetic,  harmonic,  geometric,  etc.  Geometric
means are sometimes used in compliance sampling, as
Wyzga has noted here (15), but they may often be quite
unsuitable precisely  because their advantage in some
other situations - that they reduce the importance of
high  outliers - obscures the values  of most  concern.
When health is at issue,  I want a mean that will attend
more to the upper tail than the lower tail. If six values
on six successive days are (for example) 1, 2, 3, 4, 6,
and 12, the geometric mean is 3.46, distinctly less than
the arithmetic mean of 4.67, but it is the 6 and 12 that
may matter most. An average that works in the opposite
direction from the geometric mean seems better, such as the root
mean square (5.92 in the example above) or root mean
cube (6.98 above).  I was glad indeed to learn recently
that   the  geometric mean  has  been   abandoned  in
measures of air particulates.
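
  The competing averages for the six-value example are easy to verify.  The
short Python check below is purely illustrative and simply reproduces the
figures quoted above.

    import numpy as np

    x = np.array([1, 2, 3, 4, 6, 12], dtype=float)   # the six daily values above

    arithmetic = x.mean()
    geometric = np.exp(np.log(x).mean())
    rms = np.sqrt((x ** 2).mean())
    root_cube = ((x ** 3).mean()) ** (1 / 3)

    print(f"arithmetic mean : {arithmetic:.2f}")     # 4.67
    print(f"geometric mean  : {geometric:.2f}")      # 3.46
    print(f"root mean square: {rms:.2f}")            # 5.92
    print(f"root mean cube  : {root_cube:.2f}")      # 6.98
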
  Many   statistical    approaches   incorporate   an
assumption  that  the variance of  an observation  is
independent of its true value. This may rarely be  the
case.  However, lack of uniformity in  variance  may
often have little consequence, and in some other cases
it can be readily dealt with (such as  by log  or  square
root transforms).  But there may be serious
consequences if, because of the nonuniformity, the
statistical methods have properties that are not
understood, or are not acceptable. For example, in the
6-value  numeric example above,  if  variances are
proportional to  the observed values, a  log  transform
may produce values of approximately  equal variance;
however, the  arithmetic  mean  of  logged  values  is
equivalent  to  the geometric  mean  of the original
values, so  that  a different approach may  be better.
Problems are even greater, of course, when it is biases
rather than random  error  that  may depend  on the
unknown true  values.  Nelson's paper here (3) is rich in
these and other statistical questions  as well as policy
questions.

Empirical Aspects of Compliance Sampling

   The  compliance sampler must  attend to  a wide
variety of  issues of direct, practical  significance that
derive from the context  in which  the  data  are to be
collected  and  used.  One  is  that   results  must  be
prepared so  as to  withstand   legal  challenge and,
sometimes, political attack.  A  practical consequence
is that much flexibility and much scope of application
of  informed judgment  are lost. There  may also  be
extra   costs  for  sample  identification,  replicate
measurement, and extra  record  keeping  that help to
validate individual values  but  reduce  resources for
other sampling that may contribute  as much to the
public health.  This  is  in  part a  consequence  of
competing  objectives within the  general  scope  of
compliance sampling.  What is  the  optimum  mix of
finding indicators of  many preventable problems and
applying gentle  persuasion to remove them vs. nailing
down a smaller number of problems and ensuring that
the data can be used in strong legal action if need be?
  A second broadly empirical issue  is the whole range
of chemical and physical limitations on the  detection
and  accurate  measurement of hazardous substances.
This is  not the  problem it  once  was — indeed, some
observers believe that increased sensitivity of methods
has led to  the opposite problem of overdetection and
overcontrol — but some substances  are still difficult
to measure at low concentrations by  methods that are
accurate, fast, and inexpensive.  Thus, measurement
remains  a  serious problem. An example is USDA's
program for assessing pesticide residues in  meat and
meat products, which is limited by high costs to about
300 samples per year for  the general  surveillance of
each major category (e.g., "beef cattle.")  Thus there is
a close link between the setting of  standards (what is
likely to be  harmful, to whom, in what degree, and
with what probability?) and the enforcing of  standards
(what violations  are  to be found, to what degree  of
precision, and with what probability?).  A standard not
enforceable because of limits on  laboratory methods is
no better, and may be worse, than no standard at all,
and  should be  a candidate for  replacement by  some
other  method  of controlling   risk   (e.g.,  process
standards,  or  engineering  controls).   Sometimes,  of
course,   deliberately  insensitive methods  can be
cultivated  and  put  to  use.  An example  is  FDA's
"sensitivity of  the method" approach to carcinogens in
foods.   Another  real  example,  though  slightly less
serious  here,  was  the  step taken  by  the  State  of
Maryland to improve its performance in  enforcing
federal  highway speed limits:  Move radar  detectors
from  the  flat straightaways to places  where many

-------
drivers slow down anyway, such as sharp curves and the
tops of hills, as other states had done long before.  The
incidence of  detection  of speed  violations  dropped
markedly, and Maryland  was suddenly in compliance
with  Federal  standards.    Creative  design   of  a
compliance  sampling plan  can produce pretty much
whatever the designer wants, and I take it that  a part
of  our  task  here  is  to  develop  approaches that
discourage,   inhibit,   and/or   expose  the  cynical
manipulation of sampling procedures.
  Sometimes, methods exist but for other reasons the
data  have not been  collected.  One example  is the
distribution  of various  foreign substances  in  human
tissues.   These include heavy  metals, pesticides, and
radioactive  decay products; none of these had been
adequately  studied   to  determine  the  probability
distribution  of body burdens in the general population.
Reasons  are   varied  and  deep,  but include  cost,
problems of storage, control of  access  to banks of
human tissues (an expendable resource), and ultimately
the problems of procuring enough of  the right kind of
material from  a fully representative  sample of people.
The need for detailed human data will surely grow with
the  growth  of new approaches  to risk assessment
(especially of carcinogens), and  compliance  sampling
may  well be  involved.  Toxicokinetics, in particular,
often  demands  human data;  mechanisms   can  be
examined in  other   species, but  human sensitivity,
human rate parameters, and human  exposure can be
determined  only by study of human circumstances and,
sometimes, human specimens.
  Compliance  sampling  is  indeed  an activity loaded
with  problems.  Overall,  there is a clear need for
substantially  more   thought  and  research  on  the
empirical issues raised by compliance sampling.  Wyzga
(15) and Bailey (16) provide fresh views of many of
these.

Overview of the Overview

  Where do we go from here?  It  is easy to call for
more and better compliance sampling, and to show how
we  could then do more and better things. That will not
get  us  far  in this  age of constrained resources.   I
believe that we need some other things first, or instead.
  First  is a broader and  deeper view of compliance
sampling.   Many  agencies  and  programs  do  such
sampling, but almost always with a narrow focus  on the
enforcement of one or  another regulation.  This view
should be broader — to include other substances, other
agencies, and other objectives  (including  research) —
and it should be deeper,  so that issues of compliance
sampling are  considered  at each  stage  from  initial
legislation  onward, and plans are integrated  with all
other   relevant   aspects   of  Agency   activities.
Compliance sampling simply must not be treated like a
poor relative — tolerated but not really welcome, and
largely  ignored until its  general shabbiness or some
genuine scandal forces a response.
  A broader view of compliance  sampling might, for
example, support Nelson's comments on extensions
from  existing data to broader groups, even to national
populations  (3). Nelson's paper as a whole is  unusually
rich in  both statistical  questions and policy questions.
While the matter seems to have received little specific
discussion,  it seems  to me that the maximum  useful
geographic  range  or  population  size for compliance
sampling, and  maybe the optimum too, is the same as
the   maximum  feasible  scope  of  specific   control
measures.  Thus,  national data may be  most critical in
drafting or  revising national laws  and regulations, but
local  data are indispensable  for  understanding  local
needs, monitoring local successes, and enforcing  local
sanctions.
  Another   aspect   of  broadening   our  view  of
compliance sampling is the need to optimize sampling
strategies for attaining specific, carefully elaborated
goals.  Thus,  there might be reason in public policy to
extend the use of weighted sampling,  with more effort
to collect samples likely to be out of compliance. This
approach  seems  to have  substantial  informal  use,
especially when inspectors have considerable latitude
to make decisions in the field, but has had less in the
way of formal attention.
  Still another aspect  is the need for empirical  study
of  the probability  distributions  that  arise  in the
samples, and the development of sampling plans and
analytic  approaches    that    accommodate    those
distributions. Should one take a "point" sample of just
the size needed for testing, or take a more distributed
sample, mix it, and test an aliquot?   Is there  a larger
role for two-stage sampling, in which the selection of
a general area for  examination  is   followed  by the
selection of sub-areas?  Or a role for two-stage
testing, in which aliquots of several samples are mixed
and   tested   for  the   presence  of  some  offending
substance, with further testing of individual  samples
only if the group result is positive?
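
  The economics of the pooled ("group") testing idea can be sketched with
the familiar expected-tests calculation.  In the illustrative Python fragment
below the pooled test is assumed to be error-free, and the prevalences of
out-of-compliance samples are hypothetical.

    # Expected number of tests per sample under two-stage ("group") testing:
    # aliquots from k samples are pooled, and individual retests are run only
    # when the pooled test is positive.
    def tests_per_sample(p, k):
        return 1.0 / k + 1.0 - (1.0 - p) ** k

    for p in (0.01, 0.05, 0.20):
        best_k = min(range(2, 51), key=lambda k: tests_per_sample(p, k))
        print(f"prevalence {p:.2f}: best pool size {best_k:2d}, "
              f"{tests_per_sample(p, best_k):.2f} tests per sample")

Pooling pays off when violations are rare; as the prevalence of violations
rises, the expected saving shrinks.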
  Perhaps the most fundamental need in developing a
more  comprehensive  view of  compliance  sampling is
for careful   consideration  of  the role of genuinely
random  sampling,   as  opposed  to  haphazard  or
subjectively selected samples  of convenience.  One of
the biggest surprises to me at  this Conference was the
lack of attention to the need  to  guarantee genuinely
random  sampling,   though   it  provides  the  only
acceptable justification for the statistical measures,
such as p-values and confidence limits, that have been
tossed about quite freely here.  As a part of this,  there
is a clear need for new approaches to the computation
of variances and other  functions of the data, which will
force demands for some kinds  of randomization in the
sampling.  Gilbert's problem in particular (9) calls  for
highly sophisticated statistical  modeling and analysis.
  Second  is  a deeper consideration of how compliance
sampling can be made  more productive than in just the
detection of violations, and how it can support  broader
Agency  and  national  objectives.   I have   already
referred to  several aspects of this,  but  some points
still require  comment. One  is the value of designing
compliance   programs  (including   sampling)  that
encourage both more  and  better  monitoring  and also
encourage  what  might be  called   supercompliance.
Response  to the findings  of  a particular  sample or
pattern of samples  may be yes-or-no, but  surely one
should  put   greater   weight   on  finding  the bigger
violations. Frank has referred to this  (13), with special
comment about the potential value of variable
frequency (and intensity) in  sampling, while  Warren
(14) has noted some practical obstacles.
  Some statistical tools do exist  to  aid in increasing
the broad utility of data  from compliance  sampling.
Bisgaard (5)  and Price  (7) have each presented reasons
for   more   careful   attention  to   the  operating
characteristics  (OCs)   of  programs   for  compliance
sampling.   OCs  might  in  fact  be  a  good  way to
communicate with  Agency  administrators and others
about the consequences of  choosing one or  another
approach  to   monitoring,   though  Johnson  (6) has
emphasized  the need for attention to the upper tail of
exposure rather than the mean.  It seems to  me that
the question of tail vs. mean  may well depend on the

-------
 health endpoint in question;  an effect  such as  cancer
 that is considered a function of lifetime exposure may
 well be approached by means, while effects that really
 depend on  short-term peaks  should be regulated in
 terms of peaks, though this may create some problems
 when both kinds of endpoints must be  managed in the
 same exposure setting.  Bisgaard and  Hunter (5)  are
 firmly on  the  right  track with their  insistence on  a
 more comprehensive view   that  integrates  sampling
 protocols, calibration of the  tools and processes, and a
 decision  function to determine  responses.  This also
 underlines  the  need for clear articulation of goals;
 otherwise,    Bisgaard's    approach    cannot   be
 implemented.  Johnson (6) also points to the need for
 adequate attention to other matters, including the
 political situation, pollutant behavior, sampling
 constraints, and the objectives of the standard.
 Flatman (8) also emphasizes the need for constant
 attention to the practicalities of solutions to real,  and
 different, problems.
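
  One hedged illustration of what an operating characteristic conveys is
given below: for a simple rule that declares a source in compliance when the
mean of n measurements falls at or below a limit, the Python sketch computes
the probability of being declared in compliance as the true mean varies.
The limit, sample size, and measurement standard deviation are hypothetical,
and the measurement error is assumed Gaussian with known sigma.

    import math

    def prob_declared_compliant(true_mean, limit=10.0, n=12, sigma=3.0):
        # P(sample mean <= limit) when measurements are N(true_mean, sigma^2)
        z = (limit - true_mean) / (sigma / math.sqrt(n))
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF

    for mu in (8.0, 9.0, 10.0, 11.0, 12.0):
        print(f"true mean {mu:4.1f}: "
              f"P(declared in compliance) = {prob_declared_compliant(mu):.3f}")

Curves of this kind, drawn for competing designs, show directly how sample
size and the placement of the decision limit trade false assurances of
compliance against false findings of violation.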
   Other  statistical  tools   of  potential   value   in
 compliance  sampling    can   be   found   in   the
 epidemiologist's  approach to diagnostic testing,  with
 an insistence  that policy decisions about  testing be
 based  on sound  data on sensitivity, specificity,  and
 positive  and   negative  predictive   values.   These
 concepts have  proved  invaluable in policy decisions
 about  medical  screening,  and  they have   similar
 potential  to sharpen decisions  about   environmental
 screening.
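
  The arithmetic behind those screening measures is brief enough to show.
In the illustrative Python fragment below, the sensitivity, specificity, and
prevalence of true violations are hypothetical; the point is that even a
quite accurate test yields a modest positive predictive value when true
violations are rare.

    def predictive_values(sens, spec, prev):
        # prev is the prevalence of true violations among the units tested
        ppv = sens * prev / (sens * prev + (1.0 - spec) * (1.0 - prev))
        npv = spec * (1.0 - prev) / (spec * (1.0 - prev) + (1.0 - sens) * prev)
        return ppv, npv

    ppv, npv = predictive_values(sens=0.95, spec=0.90, prev=0.05)
    print(f"positive predictive value = {ppv:.2f}")   # about 0.33
    print(f"negative predictive value = {npv:.3f}")   # about 0.997
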
   Third,  and my final point,  is a plea  that regulatory
 agencies explore the potential of statistical decision
 theory in  their  approaches  to compliance sampling,
 including explicit consideration  of the value of new
 information.  The emphasis this will put on such
 matters as prior distributions, objective functions, cost
 functions, and balancing of disparate endpoints — all
 of which are already major  elements in setting policy
 about  compliance  sampling —  can   only  be  good.
 Among  other benefits,  decision  theory will tend to
 direct  Agency  attention to  those points  where  the
 biggest improvements can be  made,  and  away from
 both fine-tuning of little things with  little potential
 profit and spinning wheels over big things that can't be
 settled anyway.
   This  would  again direct  attention  to  how prior
 distributions for  the  probability, location, and  degree
 of  violation are developed  and used.   Thus,  Gilbert
 samples  from plots that are  next to plots  already
 known to be in violation; the frequency of air sampling
 is tied  to  the  frequency  of past violations;  and
 experienced plant inspectors come to know where  the
 bodies  may be buried and how to look for them.
  Overall,  this  Conference was eminently successful in
 bringing  out a  broad range  of problems,  issues, and
 research needs.  It  has  also provided  some answers,
 though the  most  important products of  our work here
 will continue to unfold for years to come.  Our Chair,
 speakers, and discussants deserve much thanks for  a
job well done.
                 BIBLIOGRAPHY

1.   Marcus AH.  Time Scales:  Biological, environmental, regulatory.  This
     conference.
2.   Hertzberg RC.  Discussion of paper by Marcus.  This conference.
3.   Nelson WC.  Statistical issues in human exposure monitoring.  This
     conference.
4.   Hunt WF.  Discussion of paper by Nelson.  This conference.
5.   Bisgaard S, Hunter WG.  Designing environmental regulations.  This
     conference.
6.   Johnson WB.  Discussion of paper by Bisgaard and Hunter.  This conference.
7.   Price B.  Quality control issues in testing compliance with a regulatory
     standard:  Controlling statistical decision error rates.  This conference.
8.   Flatman GT.  Discussion of paper by Price.  This conference.
9.   Gilbert RO, Miller ML, Meyer HR.  On the design of a sampling plan to
     verify compliance with EPA standards for radium-226 in soil at uranium
     mill tailings remedial action sites.  This conference.
10.  Chesson J.  Discussion of paper by Gilbert, Miller, and Meyer.  This
     conference.
11.  Holley JW, Nussbaum BD.  Distributed compliance:  EPA and the lead bubble.
     This conference.
12.  Ross NP.  Discussion of paper by Holley and Nussbaum.  This conference.
13.  Frank NH, Curran TC.  Variable sampling schedules to determine PM10
     status.  This conference.
14.  Warren J.  Discussion of paper by Frank and Curran.  This conference.
15.  Hammerstrom TS, Wyzga RE.  Analysis of the relationship between maximum
     and average in SO2 time series.  This conference.
16.  Bailey RC.  Discussion of paper by Hammerstrom and Wyzga.  This conference.
17.  Jones CA.  Models of regulatory enforcement and compliance, with an
     application to the OSHA Asbestos Standard.  Unpublished doctoral
     dissertation, Harvard University Economics Department, 1982.
18.  Little RJA, Rubin DB.  Statistical Analysis with Missing Data.  John
     Wiley, 1987.
19.  Jernigan RW.  A Primer on Kriging.  Statistical Policy Branch, US
     Environmental Protection Agency, 1986.

-------
                            APPENDIX A:  Program
                                 Monday, October 5

INTRODUCTION

9:00 a.m.      Paul I. Feder, Conference Chairman, Battelle Columbus Division
              Dorothy G. Wellington, U.S. Environmental Protection Agency

I.   TOXICOKINETIC AND PERSONAL EXPOSURE CONSIDERATIONS IN THE DESIGN
    AND EVALUATION OF MONITORING PROGRAMS

9:10 a.m.      Time Scales:  Biological, Environmental, Regulatory
              Allan H. Marcus, Battelle Columbus Division
              DISCUSSION
              Richard C. Hertzberg, U.S. EPA, ECAO-Cincinnati

10:15 a.m.     BREAK

10:30 a.m.     Some Statistical Issues in Human Exposure Monitoring
              William C. Nelson, U.S. EPA, EMSL-Research Triangle Park
              DISCUSSION
              William F. Hunt, Jr., U.S. EPA, OAQPS-Research Triangle Park

12:00 noon     LUNCHEON

II.  STATISTICAL DECISION AND QUALITY CONTROL CONCEPTS IN DESIGNING
    ENVIRONMENTAL STANDARDS AND COMPLIANCE MONITORING PROGRAMS

1:00 p.m.      Designing Environmental Regulations
              Soren Bisgaard, University of Wisconsin-Madison
              DISCUSSION
              W. Barnes Johnson, U.S. EPA, OPPE-Washington, D.C.

2:15 p.m.      BREAK

2:30 p.m.      Quality Control Issues in Testing Compliance with a Regulatory Standard:
              Controlling Statistical Decision Error Rates
              Bertram Price, Price Associates, Inc.
              DISCUSSION
              George T. Flatman, U.S. EPA, EMSL-Las Vegas

III.  COMPLIANCE WITH RADIATION STANDARDS

3:40 p.m.      On the  Design of a Sampling Plan to Verify Compliance with EPA
              Standards for Radium-226 in Soil at Uranium Mill Tailings Remedial
              Action  Sites
              Richard O. Gilbert, Battelle  Pacific Northwest Laboratories; Mark L.
              Miller,  Roy F. Weston, Inc.; H.R. Meyer, Chem-Nuclear, Inc.
              DISCUSSION
              Jean Chesson, Price Associates, Inc.

5:00 p.m.      RECEPTION

-------
                                Tuesday, October 6

IV.  THE BUBBLE CONCEPT APPROACH TO COMPLIANCE

9:00 a.m.      Distributed Compliance—EPA and the Lead Bubble
              John W. Holley, Barry D. Nussbaum, U.S. EPA, OMS-Washington, D.C.
              DISCUSSION
              N. Philip Ross, U.S. EPA, OPPE-Washington, D.C.

10:15 a.m.     BREAK

V.  COMPLIANCE WITH AIR QUALITY STANDARDS

10:30 a.m.     Variable Sampling Schedules to Determine PM10 Status
              Neil H. Frank, Thomas C. Curran, U.S. EPA, OAQPS-Research Triangle
              Park
              DISCUSSION
              John Warren, U.S. EPA, OPPE-Washington, D.C.

12:00 noon     LUNCHEON

1:00 p.m.      The Relationship Between Peak and Longer Term Exposures to Air
              Pollution
              Ronald E. Wyzga, Electric Power Research Institute, Thomas S.
              Hammerstrom, H. Daniel Roth, Roth Associates
              DISCUSSION
              R. Clifton Bailey, U.S. EPA, OWRS-Washington, D.C.

2:15 p.m.      BREAK

SUMMARY OF CONFERENCE

2:30 p.m.      John C. Bailar III, McGill University, Department of Epidemiology and
              Biostatistics


    This Conference is the final in a series of research conferences on interpretation of
environmental data organized by the American Statistical Association and supported by a
cooperative agreement between ASA and the Office of Standards and Regulations, under
the Assistant Administrator for Policy Planning and Evaluation, U.S. Environmental
Protection Agency.
                        Conference Chairman and Organizer:
                      Paul I. Feder, Battelle Columbus Division

-------
                       APPENDIX B: Conference Participants
Ruth Allen
U.S. EPA
401 M Street, S.W., RD-680
Washington, DC  20460

Stewart J. Anderson
CIBA-GEIGY Corporation
556 Morris Avenue, SIC 249
Summit, NJ  07901

John C. Bailar III
(McGill University)
468 N Street, S.W.
Washington, DC  20024

R. Clifton Bailey
Environmental Protection Agency
401 M Street, S.W., WH-586
Washington, DC  20460

T. O. Berner
Battelle Columbus Division
2030 M Street, N.W., Suite 700
Washington, DC  20036

Soren Bisgaard
University of Wisconsin
Center for Quality & Productivity
  Improvement
Warf Building, 610 Walnut Street
Madison, WI  53705

Jill Braden
Westat, Inc.
1650 Research Boulevard
Rockville, MD 20852

Chao Chen
U.S. EPA
401 M Street, S.W., RD-689
Washington, DC  20460

Jean Chesson
Price Associates, Inc.
2100 M Street, N.W., Suite 400
Washington, DC  20037

James M. Daley
(U.S. EPA)
12206 Jennel Drive
Bristow, VA  22012
Susan Dillman
U.S. EPA
401 M Street, S.W., TS-798
Washington, DC 20460

Paul I. Feder
Battelle Columbus Division
505 King Avenue
Columbus, OH 43201

George T. Flatman
U.S. EPA, EMSL-LV
P.O. Box 93478
Las Vegas, NV 89193-3478

Paul Flyer
Westat,  Inc.
1650 Research Boulevard
Rockville, MD 20852

Ruth E.  Foster
U.S. EPA-OPPE/OSR
401 M Street, S.W.
Washington, DC 20460

Neil H. Frank
U.S. EPA, OAQPS
MD-14
Research Triangle Park, NC 27711

Richard O. Gilbert
Battelle Pacific Northwest Lab
P.O. Box 999
Richland, WA 99352

J. Hatfield
Battelle Columbus Division
2030 M Street, N.W., Suite 700
Washington, DC 20036

Richard C. Hertzberg
U.S. EPA, ECAO
Cincinnati, OH 45268

John W.  Holley
(U.S. EPA-OMS)
9700 Water Oak Drive
Fairfax, VA 22031

William  F. Hunt, Jr.
U.S. EPA, OAQPS
MD-14
Research Triangle Park, NC 27711

-------
Thomas Jacob
Viar and Company
209 Madison
Alexandria, VA  22314

Robert Jernigan
American University
Department of Mathematics
 and Statistics
Washington, DC 20016

W. Barnes Johnson
U.S. EPA, OPPE
401 M Street, S.W., PM-223
Washington, DC 20460

Herbert Lacayo
U.S. EPA
401 M Street, S.W., PM-223
Washington, DC 20460

Emanuel Landau
American Public Health Association
1015 15th Street, N.W.
Washington, DC 20005

Darlene M.  Looney
CIBA-GEIGY Corporation
556 Morris Avenue, SIC 257
Summit, NJ 07901

Allan H. Marcus
Battelle Columbus Division
P.O. Box 13758
Research Triangle Park, NC 27709-2297

Lisa E. Moore
U.S. EPA
26 W. Martin Luther King Jr. Drive
Cincinnati,  OH  45268

William C.  Nelson
U.S. EPA, EMSL
MD-56
Research Triangle Park, NC 27711

Barry D. Nussbaum
U.S. EPA
401 M Street, S.W., EN-397F
Washington, DC 20460

Harold J. Petrimoulx
Environmental Resources
 Management, Inc.
999 West Chester Pike
West Chester, PA 19382

Bertram Price
Price Associates, Inc.
2100 M Street, N.W., Suite 400
Washington, DC 20037
 Dan Reinhart
 U.S. EPA
 401 M Street, S.W., TS-798
 Washington, DC  20460

 Alan C. Rogers
 U.S. EPA
 401 M Street, S.W.
 Washington, DC  20460

 John Rogers
 Westat, Inc.
 1650 Research Boulevard
 Rockville, MD 20852

 N. Philip Ross
 U.S. EPA, OPPE
 401 M Street, S.W., PM-223
 Washington, DC  20460

 Brad Schultz
 U.S. EPA
 401 M Street, S.W., TS-798
 Washington, DC  20460

 John Schwemberger
 U.S. EPA
 401 M Street, S.W., TS-798
 Washington, DC  20460

 Paul G. Wakim
 American Petroleum Institute
 1220 L Street, N.W.
 Washington, DC  20005

 John Warren
 U.S. EPA, OPPE
 401  M Street, S.W., PM-223
 Washington, DC  20460

 Dorothy G. Wellington
 U.S. EPA
 401  M Street, S.W., PM-223
 Washington, DC  20460

 Herbert L. Wiser
 U.S. EPA
 401  M Street, S.W., ANR-443
 Washington, DC  20460

 Ronald E. Wyzga
 Electric Power Research Institute
 P.O. Box 10412
 Palo Alto, CA 94303

 Conference Coordinator:
 Mary Esther Barnes
 American Statistical Association
 1429 Duke Street
 Alexandria, VA  22314-3402


-------