ASA-EPA Conferences On Interpretation Of Environmental Data: IV Compliance Sampling, October 5-6th, 1987


x>EPA
           Agency
            Washington, DC 20460
230-030-47
           Statistical Policy Branch
ASA/EPA Conferences on
Interpretation of
Environmental Data
           IV Compliance Sampling
           October 5 -6th, 1987

-------
PREFACE
This volume is a compendium of the papers and commentaries that were presented at
the fourth in a series of conferences on interpretation of environmental data conducted by
the American Statistical Association and the U.S. Environmental Protection Agency's
Statistical Policy Branch of the Office of Standards and Regulations/Office of Policy,
Planning, and Evaluation. The ASA Committee on Statistics and the Environment
developed this series and has general responsibility for it.

The purpose of these conferences is to provide a forum in which professionals from
the academic, private, and public sectors exchange ideas on statistical problems that
confront EPA in its charge to protect the public and the environment through regulation of
toxic exposures. They provide a unique opportunity for Agency statisticians and scientists
to interact with their counterparts in the private sector.

The eight papers and accompanying discussions in this volume of proceedings are
about "compliance sampling" to determine how well environmental standards are met.
These papers provide valuable guidance in the planning of future environmental studies.
The papers address many aspects of compliance, and are intended for statisticians involved
in planning how to ascertain general levels of compliance and identify noncompliers for
special attention. Such work is inherently statistical and must be based on anticipation of
the statistical analysis to be performed so that the necessary data can be collected. These
proceedings should help the statistician anticipate the analyses to be performed. In
addition, the papers discuss implications for new studies. No general prescriptions are
offered; none may be possible.

The emphases in these papers are quite different. No two authors have chosen the
same aspect of compliance to examine. This diversity suggests that a major challenge is
to consider carefully each study aspect in the planning process. Meeting this challenge
will require a high degree of professionalism from the statistical community.

The conference itself and these proceedings are primarily the result of the efforts of
the authors and discussants. The discussants not only describe how their views differ from
those of the authors, but provided independent ideas as well. The coordination of the
conference and of the publication of the proceedings was carried out by Mary Esther
Barnes and Lee L. Decker of the ASA staff.

The views presented in this conference are those of individual writers and should not
be construed as reflecting the official position of any agency or organization.

This fourth conference, "Compliance Sampling," was held in October 1987. Others
were the first conference, "Current Assessment of Combined Toxicant Effects," in May
1986, the second , "Statistical Issues in Combining Environmental Studies," in October
1986, and the third , "Sampling and Site Selection in Environmental Studies," in May 1987.
John C. Bailar HI, Editor
Chair, ASA Committee on Statistics and the Environment
Department of Epidemiology and Biostatistics, McGill University
and
Office of Disease Prevention and Health Promotion
U.S. Department of Health and Human Services

-------
INTRODUCTION

The general theme of the papers and associated discussions is the design and
interpretation of environmental regulations that incorporate, from the outset, statistically
valid compliance verification procedures. Statistical aspects of associated compliance
monitoring programs are considered. Collectively the papers deal with a wide variety of
environmental concerns including various novel approaches to air emissions regulations and
monitoring, spatial . sampling of soil, incorporation of potential health effects
considerations into the design of monitoring programs, and considerations in the statistical
evaluation of analytical laboratory performance.

Several papers consider aspects of determining appropriate sampling frequencies.
Allan Marcus discusses how response time frames of potential biological and health effects
due to exposures may be used to decide upon appropriate monitoring interval time frames.
He demonstrates how biokinetic modeling may be used in this regard.

Neil Frank and Tom Curran discuss factors influencing required sampling frequencies
to detect paniculate levels in air. They emphasize the need to specify compliance
monitoring requirements right at the time that the air quality standard is being
formulated. They suggest an adaptive monitoring approach based on site specific
requirements. Those sites that are clearly well above or well below the standard need be
sampled relatively infrequently. Those sites that straddle the standard should be sampled
more frequently to decrease the probabilities of misclassification of
attainment/nonattainment status.

Tom Hammerstrom and Ron Wyzga discuss strategies to accommodate situations
when Allan Marcus' recommendations for determining sampling frequency have not been
followed, namely when monitoring data averaging time intervals are very long relative to
exposure periods that may result in adverse physiological and health consequences. For
example, air monitoring data may be averaged over one hour intervals but respiratory
symptoms may be related to the highest five minutes of exposure during that hour. The
authors model the relationships between peak five minute average concentration during an
hour and the overall one hour average concentration under various stochastic process
assumptions. They combine monitoring and modeling to predict short term peak
concentrations on the basis of observed longer term average concentrations.

Bill Nelson discusses statistical aspects of personal monitoring and monitoring
"microenvironments" such as homes and workplaces to assess total personal exposure.
Such data are very useful for the exposure assessment portions of risk assessment. Dr.
Nelson compares and contrasts personal monitoring with the more traditional area
monitoring. The availability of good personal exposure data would permit much greater
use of human epidemiologic data in place of animal toxicologic data in risk assessment.

Richard Gilbert, M. Miller, and H. Meyer discuss statistical aspects of sampling
"frequency" determination in the spatial sense. They consider the development of a soil
sampling program to estimate levels of radioactive solid contamination. They discuss the
use of multilevel acceptance sampling plans to determine the compliance status of
individual soil plots. These plans have sufficient sensitivity to distinguish between
compliant/noncompliant plots yet result in substantial sample size economies relative to
more naive single stage plans.

-------
regulation. The "bubble" concept specifies that average environmental standards must be
maintained across a dimension such as area, time, auto fleet, or industry group. This
dimension constitutes the "bubble." Lack of compliance in one part of the bubble may be
offset by greater than minimum compliance in other parts. Emissions producers have the
option to trade, sell or purchase emissions "credits" with, from, or to other emissions
producers in the bubble. Alternatively, they may "bank" emissions "credits" for use in a
future time period. Such an approach to regulation greatly enhances the emissions
producers' flexibility, as a group, to configure their resources so as to most economically
comply with the overall standard.

Soren Bisgaard and William Hunter discuss statistical aspects of the formulation of
environmental regulations. They emphasize that the regulations, including their
associated compliance monitoring requirements, should be designed to have satisfactory
statistical characteristics. One approach to this is to design regulations that have
operating characteristic curves of desired shape. Alternative candidate formulations can
be compared in terms of the shapes of their associated operating characteristic curves.

Bert Price discusses yet another statistical aspect of environmental regulation;
evaluating the capabilities of analytical laboratories. He contrasts and compares
strategies to evaluate individual laboratories based only on their own bias and variability
characteristics (intralaboratory testing) with strategies that evaluate laboratories as a
group (interlaboratory testing). Price's paper has commonality with that of Bisgaard and
Hunter in that he argues that first the operating characteristic of a regulation needs to be
specified. This specification is then used to determine the types and numbers of
observations required in the associated compliance tests.

The eight papers in this volume of proceedings deal with diverse aspects of the
statistical design and interpretation of environmental regulations and associated
compliance monitoring programs. A unifying theme among them is that the statistical
objectives and characteristics of the regulations should be specified right at the planning
stage and should be drivers of the specific regulation designs rather than being
(in)consequential afterthoughts.

Paul I. Feder
Chair, ASA/EPA Conference on Compliance Sampling
Battelle Memorial Institute
IV

-------
                            TABLE OF CONTENTS



Preface. JOHN C. BAILAR III, McGill University                                ii

Introduction. PAUL I. FEDER, Battelle Memorial Institute                        iii

Index of Authors    .                                                       vi

       I. TOXICOKINETIC AND PERSONAL EXPOSURE CONSIDERATIONS IN
           THE DESIGN AND EVALUATION OF MONITORING PROGRAMS

Time Scales: Biological, Environmental, Regulatory. ALLAN H. MARCUS,
Battelle Columbus Division                                                   1

Discussion.  RICHARD C. HERTZBERG, U.S. Environmental Protection
Agency, ECAO-Cincinnati                                                   16

Statistical Issues in Human Exposure Monitoring. WILLIAM C. NELSON,
U.S. Environmental Protection Agency, EMSL-Research Triangle Park              17

Discussion.  WILLIAM F. HUNT, JR., U. S. Environmental Protection
Agency, OAQPS-Research Triangle Park                                       39

  H. STATISTICAL DECISION AND QUALITY  CONTROL CONCEPTS IN DESIGNING
   ENVIRONMENTAL STANDARDS AND COMPLIANCE MONITORING PROGRAMS

Designing Environmental Regulations.  SOREN BISGAARD, WILLIAM G. HUNTER,
University of Wisconsin-Madison                                             41

Discussion.  W. BARNES JOHNSON, U.S. Environmental Protection Agency,
OPPE-Washington, D.C.                                                     51

Quality Control Issues in Testing Compliance with a Regulatory Standard:
Controlling Statistical Decision Error Rates.  BERTRAM PRICE, Price
Associates, Inc.                                                            54

Discussion.  GEORGE T. FLATMAN, U.S. Environmental Protection Agency,
EMSL-Las Vegas                                                           75

                m. COMPLIANCE WITH RADIATION STANDARDS

On the Design of a Sampling Plan to Verify Compliance with EPA Standards
for Radium-226 in Soil at Uranium Mill Tailings Remedial-Action Sites.
RICHARD O. GILBERT, Battelle Pacific Northwest Laboratory, MARK L.
MILLER, Roy F. Weston, Inc.; H. R. MEYER, Chem-Nuclear Systems, Inc.           77

Discussion.  JEAN CHESSON, Price Associates,  Inc.                              111

            IV.  THE BUBBLE CONCEPT APPROACH TO COMPLIANCE

Distributed Compliance: EPA and the Lead Bubble. JOHN W. HOLLEY, BARRY
D. NUSSBAUM, U.S. Environmental Protection Agency, QMS-Washington, D.C.       112

Discussion.  N. PHILIP ROSS, U.S. Environmental Protection Agency,
OPPE-Washington, D.C.                                                     121

-------
                V.  COMPLIANCE WITH AIR QUALITY STANDARDS

Variable Sampling Schedules to Determine PMjQ Status.  NEIL H. FRANK,
THOMAS C. CURRAN, U. S. Environmental Protection Agency, OAQPS-
Research Triangle Park                                                      122

Discussion.  JOHN WARREN, U. S. Environmental Protection Agency, OPPE-
Washington, D.C.                                                           128

Analysis of the Relationship Between Maximum and Average in S02 Time
Series. THOMAS S. HAMMERSTROM, Roth Associates, RONALD E. WYZGA,
Electric Power Research Institute                                             129

Discussion.  R. CLIFTON BAILEY, Health Care Financing Administration           154

Summary of Conference. JOHN C. BAILAR III, McGill University and
U.S. Public Health Service                                                    155

Appendix A: Program                                                       160

Appendix B: Conference Participants                                          162
                             INDEX OF AUTHORS
Bailar, John C	  ii,155
Bailey, R. Clifton 	  154
Bisgaard, Soren 	  41
Chesson, Jean 	  Ill
Curran, Thomas C	  122
Feder, Paul I	  iii
Flatman, George T	  75
Frank. Neil H	  122
Gilbert, Richard 0	  77
Hammerstrom, Thomas S	  129
Hertzberg, Richard C	  16
Holley, John W	  112
Hunt, Jr., William F	  39
Hunter, William G	  41
Johnson, W. Barnes 	  51
Marcus, Allan H	   1
Meyer, H. R	  77
Miller, Mark L	   77
Nelson, William C	  17
Nussbaum, B. D	  112
Price, Bertram 	  54
Ross, N. Philip 	  121
Warren, John 	  128
Wyzga, Ronald E	  129
                                       VI

-------
TIME SCALES: BIOLOGICAL. ENVIRONMENTAL. ?E3L'LATCF.v

Allan H. Marcus
3attelle Coiumcus Division
P.O. Box i 3759
Researcn Triangle Park, NO 2"7'0Q

1. INTRODUCTION

E.P.A. has estao i isnec. primar, air duality standards t:> c-rptec: *T =
general puolic aaainst the adverse nealth effects of air col ".ut3"s. src
secondary standards to protect against other aa/er=e =p./ •; - j .-,- = .-, t a 1
impacts. Compliance wi^n these standaras i= usuall1-' are=cr:ce~ b- a>-
explicit sampling protocol for the pollutant, with SDSCITISO :'-'--[=:• ra 1
insui". variation in concentration to which the ppoulaticn 1= e-ocsec,
cost and precisipn of the sample data. Biological and heait" effects
issues are primary and should be kept always in mind. Iracec'..'= '.= sa-icli^c
schedules for compliance testing mignt allow fluctuarina e'Pcsures cf
toxicologies! significance to escape detection. Resources for trstinq
compliance are usual I/ going to be scarce, and focusing :n t.eal T =f-"ec"s
ma/ allow the analyst and designer of environmental '"egi. I a f: pr.s ~z f:>-~
some patn between oversampling and und'ersamp 1 ing environment;! data.

In this review I will emchasize air puaiitv starda^zs ""-•" leao.
Lead is a soft dense metal whose toxic effects have long ::==•- -roi-c-(. In
modern times atmoscneric lead has become a community prcole? Because of
the large quantities of lead used as gasoline additives. while t'u?
problem was supstantlal1v reduced as a result of E.P.A.'s ieaoso gasoline
pnasedown regulations, there are still significant Quantities iTeire' = . oart-3"'
plants etc., and substantial residues of previous lead e-ii! =31 rns i .•
surface soil and dust. Other regulatory author it 1=5 rc'Ttr-i lean
concentrations in drinkinq water. in consumer orcducts. and i .'• "he '.-
-------
of data has been collected by the State and Local Air Mom I-Q -1 -.Q Stations
(SLAMS) network. These provide information aoout areas wr.ere "he lead
concentration and population density are highest and mcrircrmg for
testing compliance with standards is most critical. In order f-c i -JL-MS
station to be part of the National Air Monitoring Station :i'(AfS. ~^r.:-iar'-.
verv specific criteria must be satisfied about sampler location in terms
of height above ground level, distance from the nearest ma:or roacwav.
and scatial scale of which the station is suocosed to be rspr~=enrati.e.
The citing study must also have a sufficiently long sampling oerioo to
exhibit typical wind speeds and directions, or a sufficient, larzie
number of short periods to provide an average value consisted wim ^ = ,..-
hour exposure (CD, 1986).

The current averaging time for the lead primary National Avioient -iir
Quality Standard (NAAGS) is a calendar quarter (3 months:, and trs a: •-
lead NAAGS is a ouarterlv a/erage of 1.5 ug/m3 that snail --i~ ce
exceeded. The lead stanaard proposed in 1977 was Oased on en. a/e-'agi^g
time of one calendar month. The longer period has the ad/an^age of
greater statistical stability. Howeve^, the shorter oe^iod 3;lows some
extra protection. Clinical studies with adult male volunteer subjects
showea that blood lead concentration (PbB) changed to a lew ec-uilinrium
level after E or 3 months of exposure (Rabinowitz et al.. '9~3, >376:
G'iffin et al., 1975). The shorter averaging time was also thought tc
give more protection -tc young . ch i Idren '^2 FR 530""*?) 'e-/en though rn = r =
was no direct evidence then (or now1) on blood lead kinetics '-• c""':dre'-.
""!"he risk of shorter term e'-oosures to air lead concent r a "• i ~ •• = e'S/atsd
above a quarterly-averaged standard that might go jncet=ct=c v.e'-r
considered in the 1Q78 standard decision to be minimizes because \ " -.^ =='~
on the ambient air quality data availaole at that time, tne possibilities
for significant, sustained excursions were considered sma". ., src £• i:
was determined that direct irnalation of air leac is a " = '. 5 •-.•=;. small
component of total airborne lead exposure ^3 FR ^6c^o>. ' r/2i'i- = -'., 1C5='.
The biological reasons for reevaluating the averaging time =re d:scj = = 5(:
in the next section.

Alternative forms of the air lead stancard are .-,cw "e1 'G =.al ;-sted
bv E.P.A.'s Office of Air Quality Planning and Standards (QACPS.. The
averaging time is only one of the comoonents in setting an air lead
standard. The "characterizinc /alue" for testing compliance can assume a
wide variety of forms, e.g. the maximum monthly (or quarteriv' a.erage 55
used in the "deterministic" form of the standaros, tne maximum of the
average monthly mean over a specified numoer of vears e.g. 3 consecutive
years, the average of the maximum monthly averages for each vesr within a
specified number of years, the average of the three highest nonths (or
quarte~s) within a specified number of years etc. Some averaging of the
extreme values certainly smoothes cut the oata, but also conceals extreme
high-level excursions. Some attention has been given to the statistical
properties of the alternative characterizing values (Hunt. Iq9c). Tne
consequences of different characterizing values for biolocica1 e^cosure
indices or health effects indicators has not vet been evaluated.

A final consideration is the samoling freauency. The current normal
situation is a 2^-hour average collected every 6th day. The number of
samples collected also depends on the fraction of lost days: it is not
-------
uncommon for E57, of the data to oe lost. Thus one might have oni.- 3 cr ^
valid samples per month. Hunt (1956- examined more frequent sampling
schemes: every day. every other day, eve^v third aav. ue ai = j comoared
the consequences of deterministic vs. "statistical" form of th*? sr = r,qarc.
monthly vs. quarterly characteristic values, E57. data loss -s. vo loss.
The community air lead problem in the U.S. is now <-nor = |H'~: • -3 c-e
related to point sources than to area-wide emissions. thus rne r"r i 1 z*>\ ~c
three scenarios for location were evaluated: (i) source or:ent=q sites
with maximum annual quarterly averages less than 1.5 ug.'m2: '5; =3urc =
oriented sites with maximum annual quarterly average greats1" '•han 1.5
ug/m3; (3) MAMS urban maximum concentration sites. Some c:.~ciusicrs
suggested bv his study for Quarterly averaging time 3"=:
(i) The characterizing value witn the best pr=ci = ior< ft
-------
plausible explanation is that tnere is reduced transfe-- of lead to tre
red blooa cells at higher concentrations, wnethe^ attributed fc '-educed
lead-binding capacity of the ervthrocytes or reduced transfe-- rste acro = ^
tne erythrocvte membrane as lead concentrations increase. 7r;-= ;H
reinforced bv multi-dose experiments on rats in wnicn lead concentrations
in brain, kidney, and femur are proportional to dose, which is ejected
if tissue concentrations equilibrate witn o'lasma concentrations, not .-u!?-!
whole blood lead concentrations.

Lead concentrations in peripheral tissues can be modeled bv couoied
systems of ordinary differential equations. Parameters for suc^ systems
car be estimated bv iterative nonlinear least squares methods, of-en with
Marquardt-type modifications to enlarge the domain of initial oa'-ametc'-
estimates which allow convergence to the optimal solution (Bermar, arid
Weiss, 1978). Data sets with observations of two or more comcone^ts
often sustain indirect inferences about unobser/ed Tissue oools.
Analyses of data in (Raoinowitz et al.. 1973, 1976; Griff i- et si., i-"1;;
De Silva, 1981) reported in (Marcus. I985abc: Chamberlain. '. 353; 12.
1986) show that lead is absorbed into peripheral tissues in adult humar=
within a few davs. The retention of lead by tissues is 'Tiucn larger thar,
is the initial uptake. Even soft tissues such as kicnev arc 1: ver -sones-
to retain lead for a month or so, and the skeleton retains lead "or .ea"=
or tens of years (Christoffersson et al.. 1986).

The relevance of blood lead and tissue lead con-centra* ions to overt
toxicitv is not unambiguous. As in any biologically vari=cl= acnulstion.
sc'Tie individuals can exhibit extremel/ high biood lead .-lit", -inly ->;'.-
lead poisoning (Chamberlain and riassev. 1°72). A r.o'-e ci-~ct PY-=CU'- = C-
of toxicitv is tne ervthncyte pro topcrphyr in ; EP ' concen fat i z~ .
Elevated levels of EP show that lead nas deranged the neme *:10=-ntnetic
pathway, reducing the rate of production of neme "or he^crilcb i n. EP is
now widely used as a screening indicator for potential tcvicit\. ~*n
example of the utility of EP is that after a brieT" ••nassi-. e e-oosure of a
British worker (Williams, 198^ >, zinc EF increased to .-erv elevated
levels within a week of exposure even the worl-er v-jas still larcielv
asymptomatic. Even though there is consideracie biological /ari=p : 1 i f. ,
EP levels in adults increase significantly within 10 to EO davs after
beginning an experimental increase of ingested lead 'Stui*. 19"-+; Cools
et al., 1976; Schlegel and Kufner, 1978). _ Thus biological effects in
adult humans occur very shortly after exposure, certainly witnip 3 month.

While the uptake of lead and the onset of potential "o-'icit^ cccur
ran idly during increased exposure, the reduction of exposure does not
cause an equally rapid reduction in either body aytleri or toxicit-,
indices. Accumulation of mobilizable pools of lead in "he skeleton and
other tissues create an endogenous source of lead that is only slcwl/
eliminateo. Thus the rapid uptake of leaa during periods of increased
exposure should be emonasized in setting standards for ierd.

The experimental data cited above are indeed human cats, ou* ai! for
adults (almost all for males). We are not aware of any direct stuaies en
lead kinetics in children. One of the more useful sets of oata involves
the uptake of lead by infants from formula and milk (Ryu et al., I'S^,
1985). Blood lead levels and lead content of food were measured at ES
-------
day intervals. The results are negative but informative: Bicod lead
levels in these infants appeared to eauilibrate so much faster that n^
estimate of the kinetic parameters we»s possible. A ,=r/ .-runn efi-ate
bv Duggan (I'St) based on earlier input-output studies i- infants
(Iiegler et'al., 1978) gave a Dlood lead half life (- mean li* = * i3q<2''<
of H to 6 days. Duggan's method has many assumptions and uncertainties.
An alternative method, allometric scaling based on surf;':e ares, sudc-ests
that if a 70 kg adult male has a blooc lead mean life of 30 da,-s, t-,en =
7 kg infant should have a blood lead mean life of about 3 ua.'S.

The above estimates of lead kinetics in children are not strict!'.
acceptable. Children are kineticaliy somewhat different f--r-. adults.
with a somewnat larger volume of blood and much smaller cj: rapid.
develooing skeleton (especially dense cortical bone :na^ • etav.s most : *
the adult body burden of lead). Children also aosorp lee-J f-~m r-e
environment at a greater rate, as -J>e/ na/e greats-- das trn i - res t: ~a 1
absorption of ingested lead and a ..more rapid ventilation ^a^e then zc
adults. A b lomathemat ical model has been developed p/ Hari = '. 3'ic - ~eip
(19Bn) ana modified for use by GAQPS . This uc take/b 10" inet ic mcce'. .=
based on lead concentrations in infant and juvenile baccon^. ,-ir.Q ire
believed to constitute a valid animal model for Human grr>Jtn anc
development. Preliminary applications of the incdel ;re described r.
(Cohen, 1986; ATSDR, 1937; Marcus et al.. 1987). The mcoe! includes
annual changes of kinetic parameters such as the transfe1" -ates f~r
plood-to-bone. blood-to-liver. 1iver-to-gastrointestina1 :r=ct. and
growth of blood, tissue, and skeleton. The model oreair's ~ Tie5r,
residence time for lead in blood of c-vear-old cnildren 3= = 33.•=.

Blood lead concentrations change suostantiall dL^ina r~;lorc3c
'Raoinowitz et al., 198^). These chanaes reflect the wssnc^t j* : n jte-'d
lead, tne exposure of the cm Id to changing patte-ns of f"Cd ^nc --later
ccnsumccion, and the exposure of the toddler to leaded sc:i and dust in
his or ner environment. We must thus ccnsio=-r also the temporal
variations of exposure to environmental lead.

<*. ~IME SCALES OF LEAD EXPOSURE

Air lead concentrations cnange ver / rapid-. , depending on ,-nra speed
and direction and on emissions patterns. Biological kinetics tenc to
filter out the "high-freguency" /ariations in snvironmenral lead, so tna';
only environmental variations on the order of a few days are li(-- = I/ to
play much of a role. The temporal patterns decena on averaging time and
sampling freouency, and thus will vary from one location to another
depending on the major lead sources at that site. Figure 1 shows the
time series for the logarithm of air lead concentration '.log PbA) near a
primary lead smelter in the northwestern U.S. The data are 5^-hour
concentrations sampled every third day (with a few minor slippages':. i4e
analysed these data using Box-Jenkins time series programs. The :emcc--al
structure is fairly complex, with a significant autoregressi^e component
at lag - (£7 davs) and significant moving average components 5*- lacs 1
and 3 (3 days and 9 days). Time series analyses around point source
sites and general urban sites mav thus be informative.
-------
Direct inhalation of atmospheric lean mav be only a minor part of
lead exposure attributable to air lead. Previously elevated air lead
levels mav have deposited a substantial reservoir of lead in =..c-fac= so: 1
r - r
and house oust in the environment; these are the pri-nsr/ p~thwa/s
leao in children aged 1-5 /ears. Little is known abour temcaral
variations in soil and house dust lead. Preliminary result- 11 ceo in
'Laxen et al., 1987) suggest that lead levels in surface dust arc soil
around redecorated houses and scnools can chance over periods or" • i r,e of
two to six months. While lead levels in undisturbed soils can aersisc
for thousands of years, the turnover of lead in urban soils due to human
activities is undoubtedly much faster.

Individuals are not stationar/ in their environment. Thus, tne lead
concentrations to which individuals are exposed must include both spatial
and temporal patterns of exposure. The Picture is corncle--, out much is
being learned from personal exoosure monitoring programs.

The amount of variation in air lead conceneracions at a stationar,
monitor can be extremely large. Coefficients of variation in excess of
100'/. are not uncommon around point sources sucn as lead smelters, eve11
uin en monthly or quarterly averages are used. This var iaD : 1 i r.v is far in
excess of that attributable to meteorological /ariation and is due to
fluctuations in the emissions process e.g. oue to variations in feec
stock, process control, • or production rate. ^urtharmore, the
concentration distributions are verv skewed and heavv-tai lea . more nearly
log-normally distributed than normal even r"or long averaging times. The
stochastic properties of the orccess are generally unknown, 3it~cucn it
mav be assumed that air, dust, and soil lead concer.tr a r, i ?<•= •=< ~i.>r all
sources of exposure, including food, water, ana paint. as •.•lell as t"ose
pathways from gasoline lead, have been declining. Ui'-h these points i~-
mind. we can begin to construct a Quantitative characterization c r~ a
nealth effects target for compliance studies.

5. HEALTH EFFECTS CHARACTERIZATION: A THEORETICAL APPROACH

We will here briefly descrioe a possible approach to tne orioles o*
choosing an averaging time that is meaningful for nealth effects.
Related problems such as sampling irequencv then aeoend on the precision,
with which one wishes to estimate the healtn effects characternac13- .
The basic fact is that ail of the effects of interest are driven DV the
environmental concentration-exposure C(t) at time t integrated over some
oeriod of time, witn an appropriate weighing factor. As oeoole are
encased to diverse pollutant sources, the uptake from all pathways must
be added up. If the health effect is an instantaneous one wnose .-alue at
time t is denoted X(t>, and if the biokinetic processes are ai1 linear
(as is assumed for OAGPS uptake-biokinetic moaei) or can be reasonably
approximated by a linear model driven bv C(u) at time u. then the
biokinetic model can be represented by an aftereffect hei vi ': -->.i ; after
an interval t-u. Mathematically.
-------
f
X ( c ) = J f(t-u) C!u> du

The after effect function for linear comcartmental models is 5 ~ i ' :_••.•=• of
exponential- terms.

The t ime-ave^agec! concentrat ion-exposure at time t, denoted '• ( t : , .3
also a moving average of concentration C(u) at time u. witn 3 we i git
given by g(t-u) after an interval t-u. Thus compliance >i i 1 1 te oaseci or
the values of the variable Y> covCC u).C', vi] nu dv

f f
covCX ( t ) , Y', s ) 3 = J J f(t-u) ai=-'' covCC : u ; , Cr/ ) 1 du 2/

Thus, we could formalize the proolem of selecting an averaging t:T,e T b1-
the following mathematical praole'n: choosing the averaairg time ~ rra"
maximises the correlation between X(t? and Y(s). for that time _t _ at vihic^
ECX(tn is max imum. That is. look for the timers; t at vinicn we exoect
the largest adverse health effect or effect indicator (e.g. olooc lead).
Then find the averaging time T such the moving average ar HO Tie otner tire
= is as highly correlated as possiDle with X(t>. Mete that we dc nc r
require tnat s = t. We may also restrict the range of values of T,

EXAMPLE: ONE-COMPARTMENT BIOKINETIC MODEL, MARKOV EXPOSURE MODEL. .

Suppose that the relevant biokinetic mccel 1= a simple one-
comcartment model. The aftereffect of a unit pollutant uptake is an
exponential washout (e.g. of blood lead, to a first aoprcx imat ion) with
time constant k ,

f(t-u) = exp(-k (t - uM if u '. t

= 0 i f u > t

We will also assume that the concentrat ion-exposure crocess Cst; is
stochastically second-order stationary with covariance function

covCC(u) ,C(v) ] = varCCl exo(-a I u - •/ I >
7
-------
After some algebra, one finds that:

-.•ar[X( t! ] = varCCl ' ^ : a * ;-)

varCY(t>] = varCC] 2 -

-exo ( -a' c+T-s) ) /a( k-a) -exo t -a • s-t > ; / = • 5--k .] "

If t < s-T then

covCX( t) ,Y( s) 3 = varCCl lexo ( -a ( s-1 -T • ; -=. p ; -a ( s-t ) '• 1' ~e ' =--

If t > s (for predicting from the current =amoLing time 5 to -~ /k >. a-^^ > i k-a ; ], T

A small table of correlations between X(t) and V(t; a
-------
children or for adults is about i.o/k. and that much longer or much
shorter averaging times will not capture significant e-cursions in blood
lead. An averaging time of 15-50 oavs will ma^e v ( t : reasoned I v
predictive' cf X(t) for both acuity and cniloren.

T^BLE c

CORRELATION BETWEEN BLOOD LEAD CONCENTRATION AND AVERAGE EM1.' [ROMdEMT-L
LEAD CONCENTRATION AS A FUNCTION OF AVERAGING TIME F

Assumed environmental leaa correlation scale a = l/(4 da/=)

CORRELATION
Averaging CHILD AC'UL"
Time T, Days <• = LMS days) K °= 1'iV) :J3>/5'
7 3.9237
10 0.9538
14 0.9^97 0.~E07
20 0.8900 0.3020
30 0.770" 0.3783
60 0.5^51 0.914!
90 O.^'-MJC O.aS"^

Samples collected fcr compliance resting have a mere complicated
structure for the weignt function qir,-u:, namei, (for fi--.cur =3^cles j~c =
ever-/ m days in an internal of T days/.

qi t-u) = m/hT ift.-*-vj-l)H
-------
lead, volume of environmental intake (e.g. m-3/d of air, L'd of •.•iate",
mq/d of leaded soil and dust, g/d of food) as well as concentration C't..

6. TIME SCALES FOR THE EFFECTS OF QZCME ON AGRICULTURAL C?,GP -'lE'-DS

The regulation of ozone has for some time been one of E.F.A.'s most
Dressing proolems — a regulatory- irritant as .-jell a= ~ lung ir<-1 i-an*..
The secondary standards for ozone nave drawn considerable attsrtic", ax.e
to the knowledge that exposure to ozone may rause economical 1•
significant damage to cash crops and forests. The time of day of tr>e
ozone exposure, and the day of exposure during the growing season, mav
seriously determine the effects of exposure and consecueor\v of tre
statistics that are used to formulate the standard. A u^ticer o*
aporoaches to defining a biologically relevant stanca^a are tiei^g
investigated (Lee et al, 19B7ab; Larsen et al., 1?97>.

Air monitoring data have been collected in ccnnec •:•;-• with -r =
chamber studies of the National Crop Loss ana Assessment let.-ior'-- (MC-LAN)
ana related studies have been carried out at E.P.A.'s Ccrvaiiis
Environmental Research Laboratory (CERL). The ear lie: HCLAd data /-er =
based on seven hours of monitoring (0900-1600) and statistics ^pp-opr ia'.«
to that period. More recent studies use longer sanoiing per:ccs,
including 2^-nour samples at CERL. Examples of the time patterns c *"
exposure used at CERL are shown in Lee et al., icS7-ab. Tr.e
characterizations of the air monitoring data considered for use a~
exoosure statistics and comoliance specifications include t-ne follo'rtinz,
ail based en the mean hourly ozone concentration C\'h) at ^CLI~ -:

MEAN STATISTICS

M7 = seasonal mean of C(h) for 0900-1600 hr each da.-

ill = seasonal mean of daily maximum C(h) aurinc ~! no'jrs

Effective Mean = ( * Cih)* N> *•*! D CNote: 5 T,=3,-.s =jm]

PEAK STATISTICS

F7 = seasonal peak of 7-hour daily mean over O^OO-lado hrs.

PI = seasonal peak hourly concentration

CUMULATIVE STATISTICS

Total Exposure = t C(h>

Total Impact = < 5 C(h)+*p )**l/p

Phenologically Weighted Cumulative Impact (PWCP

= ( $ C(h)**o i-i(ii) !*-*l/p

10
-------
EXCEEDANCE STATISTICS

HP.Sxx = number of hours in .^nicn C' h : .

SUMxx = total ozone concentration X hours v/it- C^. . - •
and at least six other statistics characterizing episode lengths etc.
The statistic most freauently consiaered for ozone cr^'--.cte'": 13 t i;<
|*17. However, the statistics tnat best predict crz' shoot weigri~ zf r-u
cuttings of alfalfa in a CERL experiment was transformed re a '"'-actior
the controls. The values of M7 clearly measure the damsci :.g e'"»=c •
ozone, but with a great deal of scatter around the regression lire.
somewhat clustered values of !17 ai e soread out bv the sta'.i£r;z ='.-JCI '
oi-.es much higher weight to large values of C (h'' (as C^h'/'-c1 a-id "o
n
weight 0.3 to those preceding the previous cutting. ^ro '.-*eig'i
those preceding the next earlier cutting). Crop lcs= .= mj
de finea av the values of PWCI. with relatively little scarf:-'-
fitted curve of "Weibull" form.

Tl,e ozone example suggests that; biological time 5C3l£-s'3f -esocnsr
are better caotured by comcliance statistics that give hiader .-.eiq^1: ':;
recent exoosures, as in our lead example. However, tne t i .^l- iner : cs an-?
clearly nonlinear in ozone concentration so tt-ac some nonce mo r ." :~e ~" ± :
mecnanism of damage, repair, and netaoolisn nust ~e is = -:"ed to -5
ocerating. The °WCI is a cumulative value ana not a Dea>- 'r e r=e";nc~
statistic, thus even low levels of ozone exoosure acoesr ~o oe rs'-si-g
seme carnage. The biological statistic for comcliance same Ing ' "-r
alfalfa, anvway) is thus a E^-hour peaK-weign tea c
-------
f-or most chemicals of intere-Jt there is not neariv enpuan
information on pharmacoK ine t ics . to -. icot< inet ics , or tempers; /ar i ap i i i i; .
of exposure pattern to allow these calculations to 'je marie. Ho.-;5,5<-t ^jr
manv criteria pollutants, the level or" information is adequate ^c :ne
ratio between typical population levels so cl:se to 5 ''eairn effects
criterion level as to make this s serious issue. For example. in i°^6.
the criterion level for blood lead *as 30 uq/ai. -u r •:>"<€• geometric mean
blood lead in urban cnildren was about 15 ug/dl, of wnicn 'c "C'd'
was assumed to be "non-air" bacl-grouna (i.e. reauia-ea c-v some ?'. >e--
office). Due to the reduction of leaaed gasoline during tne ! "0 ' = . the
mean blood lead level for urban children had fallen ""o Q-10 ug/dl bv
I960, and is likely to be somewhat lower today. However. ce^te1' d^ta on
health effects (e.g. erythrocyte oro tooororivr in i'-cr = 5 === i r, iron-
deficient children or hearing loss and neurobehav icrai prcoiems' •. n
children with lead burdens now suggest a much lower ^eaitn crite-'ior
level is appropriate, pernaps 10-15 ug/dl. Thus the^e is still •/?-•.
little "margin of safety" against ranconi excursions OT i=aa exposure.

This is also true for other criteria pollutants. especially for
sensitive or vulnerable suboopu la t ions. For example, asthmatics •"!>.£<•
experience sensitivity to elevated levels of sulfur oioxice cr crone.
especially wnen exercising. Ac t i /11 . levels ceitsntlv 3f-"ect tf>e
kinetics of gaseous pollutant uptake and elimination. Sucpcpu lation
variations in kinetics and pdai maco-iynamics mav be important. Acute
exposure sampling in air or water (e.g. 1-day Health Acvisories ~or
drinking water) shoula be sensiti/e to pnarmacok inet ic t-ime scales,

Biokinetic information on pollutant uotaKe and me tabc ' ; s.r j ."> '•umans
is not often available for /oiatile organic c'jiiccu:'3= acd for nest
carcinogens. Thus large uncertaint.- -"actors for animal =•< trapo 1 = t icn arc
for route of exposure variations are used to provide a conse-'^at I /e level
of exposure. The methods shown here mavbe less useful IT =LIC^
situations. But the de/elopment of lealistic biological 1, ^etivatec
pharmacokinetic models for e«trapclating animal data to humans mav
establish a larger role for assessment of compliance test'.nc for r,nese
sucstances.

ACKMOWLEDGEi-lEilTS

I am grateful to Ms. Judy Kapadia for retyping the m^rusc-ipt. ano
to the reviewer for his helpful comments.
REFERENCES

Eernan M, Weiss MF. 1978. SAAM - 5 i.nula t ion, A,isl-.si = . ana HodeLina.
Manual. U.S. Public Health Service Fuel. NIH-180.

Campbell BC, Mereditn PA. Moore MR, Uatson US. 1°8^. hinei.cs of lead
fallowing intravenous administration in man. To* Letters 51:E31-S35.
CD [Criteria Document]. 1996. nil duality criteria for lead.
Environmental Criteria and Assessment Office, US Environmental Protection
Agency. EPA-600/8-33/OE8aF (4 volumes). Res. Tri. Pk. , IMC.
12
-------
Chamberlain AC. 1935. Frediction o~~ resncr.se of blood leec to airborne
ana dietar/ lead from volunteer evoeriments with lead i settees. =r3C Rov
Soc Lond 522^: 1^9-182.

Chamfer la in MJ , Massey PMQ . 1Q72. Hi la ieaa DO i sen ing mm = >:ce= = 1 vei v
high blood lead. Brit J Industr Med 29:^58-^61.

Chr i stot'f ersson JO, Ahiqren L, Schut: A, Ske>'f / i,-,g r . l^Bc. Decrease of
skeletal lead levels in man after ena of occupational exacsure. Arcn En;
Health 41:312-318.

Cohen, J. Personal communica 1 100= st'Cut UAGiFS staff pace''. ~Dr:i-No/.
1996.

Cools A, Salle JA, Verberk MM, lielhms PL. 1 9"=. 5 1 ocr-.-ru i - a 1 lescC'-.se of
male volunteers ingesting inorganic lead for <^Q cays. Inc ACT-. Ccc'jc
Environ Health 38:12Q-139.

DeSilva PE. 1981. Determination of lead in plasma ana =trudie= " on :t =
relationship to lead in ervthrocvtes. Brit J Industr Mea 3S:20=-E!7.

Duggan M J . 1983. The uptake and excretion of lead b -• .Q'JI:Q c;iildi=i'. Ai c-.
Environ Health 38:.2.!C

Laxen, DPH, Lindsay F, Raab GM, Hunter R, Fell GS, Fulton M . 1987. The
variability of lead in dusts within the homes of •. oung c^iidien. In
Lead In the Home Environment, ed . E. Culbard. Science 5si . i ~^= . London.

Lee EH, Tingey DT, Hogsett WE. 1987a. Selection of the best excosure-
resoonse model using various 7-hcur ozone exposure statistics. Reoort for
Office of Air Quality Planning and Standards, US Environ. Protection
Agency. 13
-------
L=e EH, Tinqev DT, Hogsett WE. l^BTb. Evaluation j-~ crone e-.c
-------
2 —
1 ~
0.25
-------
DISCUSSION
Richard C. Hertzberg
Environmental Criteria and Assessment Office, U.S. EPA, Cincinnati, OH 45268

Comments on
"Time Scales: Biological, Environmental, Regulatory," Allan H. Marcus
Summary of Presentation

Marcus presents a case for consideration of
physiologic time scales in the determination of
compliance sampling protocols. The general theme of
incorporating physiologic time into risk assessment is
certainly scientifically supportable (e.g., NAS Workshop,
1986, "Pharmacokinetics in Risk Assessment," several
authors), but has been previously proposed only for
setting standards. Marcus takes the application one
step further by showing how improper sampling can fail
to detect exposure fluctuations that have toxicological
significance.

The Regulatory Context

The modeling and data that Marcus presents seem
reasonable, but key items seem to be missing, at least if
this approach is to become used by regulatory agencies.
The examples should show that the refinement will
make a practical difference in the "cost-benefit"
evaluation, and that the required data are accessible.

The first question is: does it matter? Most
standards are set with a fair degree of conservatism, so
that slight excursions above the standard will not pose a
significant health risk. The first impression of Marcus'
proposal is that it is fine tuning, when in fact it is the
coarse control which needs to be turned. Let us
consider the example of lead. Recent research has
suggested that significant impairment of neurological
development can be caused by lead concentrations much
lower than previously thought. In fact, some scientists
have suggested that lead toxicity may be a no-threshold
phenomenon. If such is the case, then EPA's approach
to setting lead standards will change drastically, and
Marcus' example, though not necessarily his proposal,
will probably not apply. But even with the current
standard, it is not clear that results from Marcus'
method will not be lost in the usual noise of biological
data. For example, consider his figure showing the
graphs of data and model fits for 11 human subjects.
First, these results may be irrelevant to the air
pollution issue since that data are following "ingestion"
of lead, not "inhalation." Lead inhalation is in many
ways more complicated than ingestion. Also, using day
30 as an example, the fitted erythrocyte protoporphyrin
levels vary dramatically across individuals (mean=49,
s.d.=20.3, range=30-73). I could not read the graphs
well, but even accounting for differing starting values,
the curve shapes also change across individuals, so that
predictions for any untested individual might be
difficult.
The second question, that of data requirements,
.cannot be answered from this presentation alone. But
some issues can be mentioned. It is not clear that the
correlations between blood lead (Table 1) and monthly
average lead are good predictors of the correlation
between monthly average lead and neurological
impairment. But is the correlation the best indicator of
performance? A better question, perhaps, is: do
changes in blood lead which could be allowed by using
the weakest sampling protocol actually result in
significantly increased incidence of neurological
dysfunction, when compared to the best compliance
sampling procedure as determined using Marcus'
scheme? It is not clear how much data would be
required to answer that question.

Also, it seems that Marcus' approach must have
pharmacokinetic data on humans. The data
requirements are then more severe for most of the
thousands of environmental chemicals, where only
animal data are available. The situation is even worse
for carcinogens, where human cancer incidence data are
not available at the low regulatory levels. In fact, the
orders-of-magnitude uncertainty in the low-dose
extrapolation of cancer bioassays easily swamps the
error due to non-optimal compliance sampling.

So where might this research go? Certainly it
should be further developed. This approach will
definitely be useful for acute regulatory levels, such as
the 1-day Health Advisories for drinking water, where
internal dose and toxicity are closely tied to
pharmacokinetics. It will probably be more significant
for sensitive subgroups, such as children and those with
respiratory disease, where the pharmacokinetics are
likely to be much different from the norm, and where
the tolerance to chemical exposure is already low. For
those cases, scaling factors and uncertainty factors are
highly inaccurate. Most important is the example
Marcus presents, chemicals where uptake and
elimination rates are dramatically different. For
control of those chemicals, using the "average"
monitored level is clearly misleading, and some
approach such as Marcus' must be used. I would
recommend the following steps:

• First, demonstrate the need. List at least a
few chemicals that are being improperly
monitored because of their pharmacokinetic
properties.

• Then, show us that your method works and is
practical.
16
-------
Statistical Issues in Human Exposure Monitoring

William C. Nelson, U.S. EPA, EMSL, Research Triangle Park

ABSTRACT

Pollutant exposure information provides a critical link in risk
assessment and therefore in environmental decision making. Traditionally,
outdoor air monitoring stations have been necessarily utilized to relate
air pollutant exposures to groups of nearby residents. This approach is
limited by (1) using only the outdoor air as an exposure surrogate when
most individuals spend relatively small proportions of time outdoors and
(2) estimating exposure of a group rather than an individual. More
recently, air monitoring of non-amoient locations, termed microenvironments,
such as residences, offices, and shops has increased. Such data when
combined with time and activity questionnaire information can provide
more accurate estimates of human exposure. Development of portable
personal monitors that can be used by the individual study volunteer
provides a more direct method for exposure estimation. Personal samplers
are available for relatively few pollutants including carbon monoxide and
volatile organic compounds (VOC's) such as benzene, styrene, tetrachloroethylene,
xylene, and dichlorobenzene. EPA has recently performed carbon monoxide
exposure studies in Denver, Colorado and Washington, D.C. which have
provided new information on CO exposure for individual activities and
various microenvironments. VOC personal exposure studies in New Jersey
and California have indicated that, for some hazardous chemicals,
individuals may receive higher exposure from indoor air than from outdoor
air. Indoor sources include tobacco smoke, cleansers, insecticides,
furnishings, deodorizers, and paints. Types of exposure assessment
included in these studies are questionnaires, outdoor, indoor, personal,
and biological (breath) monitoring.

As more sophisticated exposure data become available, statistical
design and analysis questions also increase. These issues include survey
sampling, questionnaire development, errors-in-variables situation, and
estimating the relationship between the microenvironment and direct
personal exposure, (Methodological development is needed for models wnich
permit supplementing the direct personal monitoring approach with an
activity diary which provides an opportunity for combining these data
with microenvironment data to estimate a population exposure distribution.
Another situation is the appropriate choice between monitoring instruments
of varying precision and cost. If inter-individual exposure variability
is high, use of a less precise instrument of lower cost which provides an
opportunity for additional study subjects may be justified. Appropriate
choice of an exposure metric also requires more examination. In some
instances, total exposure may not be as useful as exposure above a threshold
level.

Because community studies using personal exposure and microenvironmental
measurements are expensive, future studies will probably use smaller
sample sizes but be more intensive. However, since such studies
provide exposure data for individuals rather than only for groups, they
may not necessarily have less statistical power.

17
-------
INTRODUCTION

Pollutant exposure information is a necessary component of the risk
assessment process. The traditional approach to investigating the-
relationship between pollutant level in the environment and the concentration
available for human inhalation, absorption or ingestion, has been 1)
measurements at an outdoor fixed monitoring site or 2) mathematical model
estimates of pollutant concentration from effluent emission rate information.1

The limitations of such a preliminary exposure assessment have become
increasingly apparent. For example, recognition of the importance of
indoor pollutant sources, particularly considering the large amount of
time spent indoors, and concern for estimating total personal exposure
have lead to more in-depth exposure assessments.

One of the major problems to overcome when conducting a risk assessment
is the need to estimate population exposure. Such estimates require
information on the availability of a pollutant to a population group via
one or more pathways. In many cases, the actual concentrations encountered
are influenced by a number of parameters related to activity patterns.
Some of the more important are: the time spent indoors and outdoors,
commuting, occupations, recreation, food consumption, and water supply.
For specific situations the analyses will involve one major pathway to
man (e.g. outside atmospheric levels for ozone), but for others, such as
heavy metals or pesticides, the exposure will be derived from several
different media.

A framework for approaching exposure assessments for air pollutants
has been described by the National Academy of Science Epidemiology of Air
Pollution Committee.2 The activities shown in Figure 1 were considered
to be necessary to conduct an in-depth exposure assessment.

As knowledge about the components of this framework, particularly
sources and effects, has increased, the need for improved data on exposures
and doses has become more critical. A literature review published in
1982 discussed a large number of research reports and technical papers
with schemes for calculating population exposures.3 However, such schemes
are imperfect, relying on the limited data available from fixed air
monitoring stations and producing estimates of "potential exposures" with
unknown accuracy. Up until the 1980's, there were few accurate field
data on the actual exposures of the population to important environmental
pollutants. Very little was known about the variation from person to
person of exposure to a given pollutant, the reason for these variations,
or the differences in the exposures of subpopulatiohs of a city.
Furthermore, a variety of field studies undertaken in the 1970s and early
1980s showed that the concentrations experienced by people engaged in
various activities (driving, walking on sidewalks, shopping in stores,
working in buildings, etc.) did not correlate well with the simultaneous
readings observed at fixed air-monitoring stations.4-9 Two reviews have
summarized much of the literature on personal exposures to environmental
pollution showing the difficulty of relating conventional outdoor monitoring
data to actual exposures of the population.i°»H No widely acceptable
methodology was available for predicting and projecting future exposures
18
-------
of a population or for estimating how population exposures might change
in response to various regulatory actions. No satisfactory exposure
framework or models existed.

TOTAL HUMAN EXPOSURE

The total human exposure concept seeks to provide the missing
component in the full risk model: estimates of the total exposures of
the population to environmental pollutants, with known accuracy and
precision. Generating this new type of information requires developing
an appropriate research program and methodologies. The methodology has
been partially developed for carbon monoxide (CO), volatile organic
compounds (VOC's) and pesticides, and additional research is needed to
solve many problems for a variety of other pollutants.

The total human exposure concept defines the human being as the
target for exposure. Any pollutant in a transport medium that comes into
contact with this person, either through air, water, food, or skin, is
considered to be an exposure to that pollutant at that time.

The instantaneous exposure is expressed quantitatively as a
concentration in a particular carrier medium at a particular instant of
time, and the average exposure is the average of the concentration to the
person over some appropriate averaging time. Some pollutants, such as
CO, can reach humans through only one carrier medium, the air route of
exposure. Others, such as lead and chloroform, can reach humans through
two or more routes of exposure (e.g., air,.food, and water). If multiple
routes of exposure are involved, then the total human exposure approach
seeks to determine a person's exposure (concentration in each carrier
medium at a particular instant of time) through all major routes of
exposure.

Once implemented, the total human exposure methodology seeks to
provide information, with known precision and accuracy, on the exposures
of the general public through all environmental media, regardless of
whether the pathways of exposure are air, drinking water, food, or skin
contact. It seeks to provide reliable, quantitative data on the number
of people exposed and their levels of exposures, as well as the sources
or other contributors responsible for these exposures. In the last few
years, a number of studies have demonstrated these new techniques. The
findings have already had an impact on the Agency's policies and priorties,
As the methodology evolves, the research needs to be directed toward
identifying and better understanding the nation's highest priority
pollutant concerns.

The major goals of the Total Human Exposure Program can be summarized
as follows:

Estimate total human exposure for each pollutant of concern

Determine major sources of this exposure

Estimate health risks associated with these exposures

Determine actions to eliminate or at least reduce these risks

19
-------
The total human exposure concept considers major routes of exposure
by which a pollutant may reach the human target. Then, it focuses on
those particular routes which are relevant for the pollutants of concern,
developing information on the concentrations present and the movement of
the pollutants through the,exposure routes. Activity information from
diaries maintained by respondents helps identify the microenvironments of
greatest concern-, and in many cases, also helps identify likely contributing
sources. Biological samples of body burden may be measured to confirm
the exposure measurements and to estimate a later step in the risk assessment
framework.

In the total human exposure methodology, two complementary conceptual
approaches, the direct and the indirect, have been devised for providing
the human exposure estimates needed to plan and set priorities for reducing
risks.

Direct Approach

The "direct approach" consists of measurements of exposures of the
general population to pollutants of concern.12 A representative probability
based sample of the population is selected based on statistical design.
Then, for the class of pollutants under study, the pollutant concentrations
reaching the persons sampled are measured for the relevant environmental
media. A sufficient number of people are sampled using appropriate
statistical sampling techniques to permit inferences to be drawn, with
known precision, about the exposures of the larger population from which
the sample has been selected. From statistical analyses of subject
diaries which list activities and locations visited, it usually is possible
to identify the likely sources, microenvironments, and human activities
that contribute to exposures, including both traditional and nontraditional
components.

To characterize a population's exposures, it is necessary to monitor
a relatively large number of people and to select them in a manner that
is statistically representative of the larger population. This approach
combines the survey design techniques of the social scientist with the
latest measurement technology of the chemist and engineer, using both
statistical survey methodology and environmental monitoring in a single
field survey. It uses the new miniaturized personal exposure monitors
(PEMs) that have become available over the last decade, 13,14,15 ancj ^
adopts the survey sampling techniques that have been used previously to
measure public opinion and human behavior. The U.S. EPA Office of Research
and Development (ORD) has recently conducted several major field studies
using the direct approach, namely, the Total Exposure Assessment Methodology
(TEAM) Study of VOCs, the CO field studies in Washington, D.C. and Denver,
and the non-occupational exposure to pesticides study. These studies
will be described later.

Indirect Approach

Rather than measuring personal exposures directly as in the previous
approach, the "indirect approach" attempts to construct the exposure
profile mathematically by combining information on the times people spend
20
-------
in particular locations (homes, automobiles, offices, etc.) with the
concentrations expected to occur there. This approach requires a
mathematical model, information on human activity patterns, and statistical
information on the concentrations likely to occur in selected locations,
or "microenvironments".l^ -A microenvironment can be defined as a location
of relatively homogeneous pollutant concentration that a person occupies
for some time period. Examples include a house, office, school, automobile,
subway or bus. An activity pattern is a record of time spent in specific
mi croenvi ronments.

In its simplest form the "indirect approach" seeks to compute the
integrated exposure as the sum of the individual products of the concentrations
encountered by a person in a microenvironment and the time the person
spends there. The integrated exposure permits computing the average
exposure for any averaging period by dividing the time duration of the
averaging period. If the concentration within microenvironment j is
assumed to be constant during the period that person i occupies
microenvi ronment j, then the integrated exposure E-j for the person i will
be the sum of the product of the concentration cj in each microenvironment
and the time spent by person i in that microenvironment

J
E1 - I Cjt1jf
j = 1

where E-j = integrated exposure of person i over the time period of interest;

Cj = concentrations experienced in microenvironment j;

t-jj = time spent by person i in microenvi ronment j; and

J = total number of microenvironments occupied by person i over
the time period of interest.

To compute the integrated exposure E^ for person i, it obviously is
necessary to estimate_both Cj and t-jj. If T is the averaging time,
the average exposure E-j of person i is obtained by dividing by T; that is
E-J = E-j/T, where E-j is summed over time T.

Although the direct approach is invaluable in determining exposures
and sources of exposure for the specific population sampled, the Agency
needs to be able to extrapolate to much larger populations. The indirect
approach attempts to measure and understand the basic relationships
variables and resulting exposures, usually in particular
through "exposure modeling." An exposure model takes
the field, and then, in a separate and distinct activity,
The exposure model is intended to complement results
studies and to extend and extrapolate these findings to other
other situations. Exposure models are not traditional
used to predict
between causative
mi croenvi ronments
data collected in
predicts exposure
from di rect
locales and
dispersion models
outdoor concentrations; they are
different models designed to predict the exposure of a rather mobile
human being. Thus, they require information on typical activities and
time budgets of people, as well as information on likely concentrations
in places where people spend time.
21
-------
The U.S. EPA ORD has also conducted several studies using the indirect
approach. An example of a recent exposure model is the Simulation of
Human Activities ad Pollutant Exposures (SHAPE) model, which has been
designed to make predictions of exposures to population to CO in Urban
areas. This model is similar to the NAAQS Exposure Model (NEM). The
SHAPE model used the CO concentrations measured in the Washington-Denver
CO study to determine the contributions to exposure from commuting,
cooking, cigarette smoke, and other factors. Once a model such as SHAPE
is successfully validated (by showing that it accurately predicts exposure
distributions measured in a TEAM field study), it can be used in a new
city without a field study to make a valid prediction of that population's
exposures using that city's data on human activities, travel habits, and
outdoor concentrations. The goal of future development is to apply the
model to other pollutants (e.g., VOCs, household pesticides) making it
possible to estimate exposure frequency distributions for the entire
country, or for major regions.

Field Studies

The total human exposure field studies from a central part of the
U.S. EPA ORD exposure research program. Several studies have demonstrated
the feasibility of using statistical procedures to choose a small
representative sample of the population from which it is possible to make
inferences about the whole population. Certain subpopulations of importance
from the standpoint of their unique exposure to the pollutant under study
are "weighted" or sampled more heavily than others. In the subsequent
data analysis phases, sampling weights are used to adjust for the
overrepresentation of these groups. As a result, it is possible to draw
conclusions about the exposures of the larger population of a region with
a study that is within acceptable costs.

Once the sample of people has been selected, their exposures to the
pollutant through various environmental media (air, water, food, skin) '
are measured. Some pollutants have negligible exposure routes through
certain media, thus simplifying the study. Two large-scale total human
exposure field studies have been undertaken by U.S. EPA to demonstrate
this methodology: the TEAM study of VOCs and the Denver - Washington DC,
field study of CO.

The first set of TEAM Studies (1980-84) were the most extensive
investigation of personal exposures to multiple pollutants and corresponding
body burdens. In all, more than 700 persons in 10 cities have had their
personal exposures to 20 toxic compounds in air and-drinking water measured,
together with levels in exhaled breath as an indicator of blood
concentration.17'19 Because of the probability survey design used,
inferences can be made about a larger target population in certain areas:
128,000 persons in Elizabeth/Bayonne, NJ; 100,000 persons in the South
Bay Section of Los Angeles, CA; and 50,000 persons in Antioch/Pittsburg,
CA.
22
-------
The major findings of the TEAM Study may be summarized as follows:

1. Great variability (2-3 orders of magnitude) of exposures occur even
in small geographical areas (such as a college campus) monitored on the
same day.

2. Personal and overnight indoor exposures consistently outweigh outdoor
concentrations. At the higher exposure levels, indoor concentrations may
be 10-100 times the outdoor concentrations, even in New Jersey.

3. Drinking water and beverages in some cases are the main pathways of
exposure to chloroform and bromodichloromethane — air is the main route
of exposure to 10 other prevalent toxic organic compounds.

4. Breath levels are significantly correlated with previous personal
air exposures for all 10 compounds. On the other hand, breath levels are
usually not significantly correlated with outdoor levels, even when the
outdoor level is measured in the person's own backyard.

5. Activities and sources of exposure were significantly correlated
with higher breath levels for the following chemicals:

benzene: visits to service stations, smoking, work in chemical and
paint plants;
tetrachloroethylene: visits to dry cleaners.

6. Although questionnaires adequate for identifying household sources
were not part of the study, the following sources were hypothesized:

p-dichlorobenzene: moth crystals, deodorizers, pesticides;
chloroform: hot showers, boiling water for meals;
styrene: plastics, insulation, carpets;
xylenes; ethylbenzene: paints, gasoline.

7. Residence near major outdoor point sources of pollution had little
effect, if any, on personal exposure.

The TEAM direct approach has four basic elements:

Use of a representative probability sample of the population under
study

Direct measurement of the pollutant concentrations reaching these
people through all media (air, food, water, skin contact)

Direct measurement of body burden to infer dosage

Direct recording of each person's daily activities through diaries

The Denver - Washington, DC CO Exposure Study utilized a methodology
for measuring the frequency distribution of CO exposures in a representative
sample of urban populations during 1982-83.20-22 Household data were
collected from over 4400 households in Washington, DC and over 2100
23
-------
households in the Denver metropolitan areas. Exposure data using personal
monitors were collected from 814 individuals in Washington, DC, and 450
individuals in Denver, together with activity data from a stratified
probability sample of the residents living in each of the two urban areas.
Established survey sampling procedures were used. The resulting exposure
data permit statistical comparisons between population subgroups (e.g.,
commuters vs. noncommuters, and residents with and without gas stoves).
The data also provide evidence for judging the accuracy of exposure
estimates calculated from fixed site monitoring data.

Additional efforts are underway to use these data to recognize indoor
sources and factors which contribute to elevated CO exposure levels and
to validate existing exposure models.

Microenvironment Models

Utilizing data collected in the Washington, DC urban-scale CO Study,
two modeling and evaluation analyses have been developed. The first,
conducted by Duan, is for the purpose of evaluating the use of microenvironmental
and activity pattern data in estimating a defined population's exposure to
CO.16 The second, conducted by Flachsbart, is to model the microenvironmental
situation of commuter rush-hour traffic (considering type and age of
vehicle, speed, and meteorology) and observed CO concentrations.5 With
the assistance of a contractor, U.S. EPA has collected data on traffic
variables, traffic volume, types of vehicles, and model year. An earlier
study measured CO in a variety of microenvironments and under a variety
of conditions.23

The indirect method for estimating population exposure to CO was
compared to exposures to the CO concentrations observed while people
carried personal monitors during their daily activities. The indirect
estimate derived from personal monitoring at the low concentration levels,
say 1 ppm but higher at levels above that. For example, at the 5 ppm
level, indirect estimates were about half the direct estimates within the
regression model utilizing these data. Although the results are limited,
it appears that when monitoring experts design microenvironmental field
surveys, there is a tendency to sample more heavily in those settings
where the concentration is expected to be higher, thereby causing exaggerated
levels of the indirect method. The possibility of using microenvironmental
measurements and/or activity patterns from one city to extrapolate to
those of another city is doubtful but not yet fully evaluated.

Dosimetry Research

The development of reliable biological indicators of either specific
pollutant exposures or health effects is in its early stages. A limited
number of biomarkers such as blood levels of lead or CO have been recognized
and used for some time. Breath levels of VOCs or CO have also been
measured successfully. However, the use of other biomarkers such as
cotinine, a metabolite of nicotine, for a tracer compound of environmental
tobacco smoke is still in its experimental phase. This also applies to
24
-------
use of the hydroxyproline-to-creatinine ratio as a measure of N02 exposure
and also to use of DNA adducts which form as a result of VOC exposure and
have been found to be correlated with genotoxic measures. Dosimetry
methods development, though still very new and too often not yet peady
for field application for humans, is obviously a very promising research
area.

Exhaled breath measurements have been used successfuly in VOC and CO
exposure studies. Since breath samples can be obtained noninvasively,
they are preferred to blood measurements whenever they can meet the
exposure research goals. A methodology to collect expired samples on a
Tenax adsorbent has been developed and used on several hundred TEAM study
subjects. Major findings have included the discovery that breath levels
generally exceed outdoor levels, even in heavily industrialized'petrochemical
manufacturing areas. Significant correlations of breath levels with
personal air exposures for certain chemicals give further proof that the
source of the high exposure is in personal activities or indoors, at home
as well as at work.

The basic advantages of monitoring breath rather than blood or tissues
are:

1. Greater acceptability by volunteers. Persons give breath samples
more readily than blood samples. The procedure is rapid and convenient,
taking only 5-10 min. in all.

2. Greater sensitivity. . Since volatile organic compounds often have a
high air-to-blood partition coefficient, they will have higher concentrations
in breath than in blood under equilibrium conditions. Thus, more than
100 compounds have been detected in the breath of subjects where
simultaneously collected blood samples showed only one or two above
detectable limits.

3. Fewer analytical problems. Several "clean-up" steps must be completed
with blood samples, including centrifuging, extraction, etc., with each
step carrying possibility for loss or contamination of the sample.

Measurements of CO in expired air often are used as indicators of
carboxyhemoglobin (COHb) concentrations in blood, although the precise
relationship between alveolar CO and blood COHb has not been agreed upon.

The U.S. EPA exposure monitoring program therefore included a breath
monitoring component in its study of CO exposures in Denver and Washington,
DC. The purpose was (1) to estimate the distribution of alveolar CO (and
therefore blood COHb) concentrations in the nonsmoking adult residents of
the two cities; and (2) to compare the alveolar CO measurements to preceding
personal CO exposures.

The major findings of the breath monitoring program included:

1. The percent of nonsmoking adults with alveolar CO exceeding 10 ppm
(i.e., blood COHb 2%) was 11% in Denver and 6% in Washington, DC.
25
-------
2. The correlations between breath CO and previous 8-h CO exposure were
0.5 for Denver and 0.66 for Washington, DC.

3. The correlations between personal CO exposures at home or at-work
and ambient CO at the nearest stations averaged 0.25 at Denver and 0.19
at Washington, DC. Thus, the ambient data explained little of the
variability of CO exposure.

Sampling Protocols

Statistical sampling protocols are the design for large-scale total
human exposure field studies. They describe the procedures to be used in
identifying respondents, choosing the sample sizes, selecting the number
of persons to be contacted within various subpopulations, and other
factors. They are essential to the total human exposure research program
to ensure that a field survey will provide the information necessary to
meet its objectives. Because one's activities affect one's exposures,
another unique component of the total human exposure research program is
the development of human activity pattern*data bases. Such data bases
provide a record describing what people do in time and space.

Whenever the objectives of a study are to make valid inferences beyond
the group surveyed, a statistical survey design is required. For exposure
studies, the only statistically valid procedure that is widely accepted
for making such inferences is to select a probability sample from the
target population. The survey designs used in the total exposure field
studies have been-three-stage probability-based, which consist of areas
defined by census tracts, households randomly selected within the census
tracts, and stratified sampling of screened eligible individuals.20,24

STATISTICAL ISSUES

TEAM Design Considerations

It appears that some variability in the TEAM exposure data might be
due to meteorological factors such as some receptors being downwind of the
sources while others are not. A more careful experimental design that
includes consideration of these factors, including measurement of
appropriate meteorological parameters, may lead to more meaningful data
in future studies.

Other TEAM design considerations are:

1. The intraperson temporal variation in VOC exposure is crucial in
risk assessment and should be given a high priority in future studies.

2. Given the substantial measurement error, the estimated exposure
distributions can be substantially more heterogeneous than the true
exposure distributions. For example, the variance of the estimated
exposures is the sum of the variance of the true exposures and the
variance of the measurement errors, assuming that: a) measurement
errors are homoscedastic, and b) there is no correlation between
measurement error and true exposure. Empirical Bayes methods are
available for such adjustments.
26
-------
3. The relatively high refusal rate in the sample enrollment is of
concern. A more rigorous effort in the future to assess the impact
of the refusal on the generalizability of the sample is desirable.
For example, a subsample of the accessible part of the refusals can
be offered an incentive to participate, or be offered a less intensive
protocol for their participation; the data from the would-be refusals
can then be compared with the "regular" participants to assess the
possible magnitudes of selection bias.

4. In future studies, the following might be used:

a. use of closed format questionnaires,
b. use of artifical intelligence methodology,
c. use of automated instrument output.

Development of Improved Microenvironmental Monitoring Designs

The direct method of personal exposure is appealing but is expensive
and burdensome to human subjects. Monitoring microenvironments instead
is less costly but estimtes personal exposure only indirectly. Obviously
these approaches can be used in a complementary way to answer specific
pollutant exposure questions.

With either method, a crucial issue is how to stratify the
microenvironments into relatively homogeneous microenvironment types
(METS).12 Usually there are many possible ways to stratify the
microenvironments into METs, thus there can be many potentially distinct
METs. Obviously one cannot implement a stratification scheme with five
hundred METs in field studies. It is therefore important to develop
methods for identifying the most informative ways to stratify the
microenvironments into METs. For example, if we can only afford to
distinguish two METs in a field study, is it better to distinguish indoor
and outdoor as the two METs', or is it better to distinguish awake and
sleeping as the two METs?

Some of the more important issues which will require additional
methodological development are:

1. How to identify the most informative ways to stratify microenvironments
into METs.

2. How to optimize the number of METs, choosing between a larger number
of METs and fewer microenvironments for each MET, and a smaller
number of METs and more microenvironments for each MET.

3. How to allocate the number of monitored microenvironments across
different METs: one should monitor more microenvironments for the
more crucial METs (those in which the human subjects spend more of
their time) than the less crucial METs.
27
-------
Development and Validation of Improved Models for Estimating Personal
Exposure from Microenvironmental Monitoring Data

Methodological development is needed for models which allow
supplementing the direct personal monitoring approach with an activity
diary enabling these data to be combined with indirect approach
microenvironmental data to estimate personal exposure through a regression-
like model. The basic exposure model which sums over microenvironments

Ei • I Cjtij
j

can be interpreted as a regression model with the concentrations being
the parameters to be estimated. To fully develop this approach, it is
necessary to make crucial assumptions about independence between individuals
and between METs. Therefore, it is very important to validate the method
empirically.

Errors-in-Vari'ables Problem

It is important to recognize an errors-in-variables situation which.
may often occur in exposure assessment. In estimating the relationship
between two variables, Y (a health effect) and X (true personal exposure),
when X is not observed but a surrogate of X, say Z, which is related to X
is observed. Such variables may have systematic errors as well as zero-
centered random errors. The effects of the measurement bias are more
serious in estimation situations than for hypothesis testing.

Choice Between Monitoring Instruments of Varying Precision and Cost

When designing monitoring programs, it is common to have available
instruments of varying quality. Measurement devices that are less
expensive to obtain and use are typically also less accurate and precise.
Strategies could be developed and evaluated that consider the costs of
measurement as well as the precision. In situations of high between-
individual exposure variability, a less precise instrument of lower cost
may be preferred if it permits an opportunity for enough additional study
subjects.

Development of Designs Appropriate for Assessing National Levels

At the present time, the data available for the assessment of personal
exposure distributions are restricted to a limited number of locales.
The generalization from existing data to a very general population such
as the national population requires a great deal of caution. However, it
is conceivable that large scale studies or monitoring programs aimed at a
nationally representative sample might be implemented in the future. It
would be useful to consider the design of such studies using data presently
available. It would also be useful to design studies of more limited
scales to be conducted in the near future as pilot studies for a possible
national study, so as to collect information which might be useful for
the design of a national study.
28
-------
An issue in the design of a national study is the amount of clustering
of the sample: one has to decide how many locales to use, and how large
a sample to take for each locale. The decision depends partly on the
fixed cost in using additional locales, and partly on the intracluster
correlation for the locales. For many of the VOC's measured in the TEAM
studies, there is far more variability within locales than between locales,
in other words, there is little intracluster correlation for the locales.
This would indicate that a national study should be highly clustered,
with a few locales and a large sample for each locale. On the other
hand, if there is more variability between locales than within locales, a
national study should use many locales and a small sample for each locale.

Further analysis of the existing TEAM data base can help to address
these issues. For example, the TEAM sample to date can be identified as
a "population" from which various "samples" can be taken. The characteristics
of various sample types can be useful for the design of any followup
studies as well as for a larger new study.

Evaluating Extreme Values in Exposure Monitoring

Short term extreme values of pollutant exposure may well be more
important from a biological point of view than elevated temporal mean
values. The study of statistical properties of extreme values from
multivariate spatio-temporally dependent data is in its infancy. In
particular, the possibility of synergy necessitates the development of a
theory of multivariate extreme values. It is desirable to develop estimates
of extreme quantiles of pollutant concentration.

Estimation Adjustment for Censored Monitoring Data

One should develop low exposure level extrapolation procedures and
models, and check the sensitivity of these procedures to the models
chosen. In some cases a substantial fraction of exposure monitoring data
is below the detection limit even though these low exposure levels may be
important. The problem of extrapolating from measured to unmeasured
values thus naturally arises. Basically this is a problem of fitting the
lower tail of the pollutant concentration distribution. Commonly used
procedures assume either that below detectable level values are actually
at the detection limit, or that they are zero, or that they are one-half
of the detection limit.

In many monitoring situations we may find a good fit to simple models
such as the lognormal for that part of the data which lies above the
detection limit. Then the calculation of total exposure would use a
lognormal extrapolation of the lower tail.

SUMMARY

Personal exposure assessment is a critical link in the overall risk
assessment framework. Recent advances in exposure monitoring have provided
new capabilities and additional challenges to the environmental research
team, particularly to the statistician, to improve the current state of
29
-------
information on microenvironment concentrations, activity patterns, and
particularly personal exposure. If these opportunities are realized,
then risk assessments can more often use human exposure and risk data in
addition to available animal toxicology information.
30
-------
REFERENCES

1. Lioy, P. J., (1987) In Depth Exposure Assessments. JAPCA, 37, 791-
793.

2. Epidemiology of Air Pollution, National Research Council National
Academy Press, Washington, DC (1985), 1-334.

3. Ott, W. R. (1982) Concepts of human exposure to air pollution,
Environ. Int., 7, 179-196.

4. Cortese, A. D. and Spengler, J.D. (1976) Ability of fixed monitoring
stations to represent carbon monoxide exposure. J. Air Pollut.
Control Assoc., 26, 1144.

5. Flachsbart, P. G. and Ott, W. R. (1984) Field Surveys of carbon
monoxide in commercial settings using personal exposure monitors.
EPA-600/4-94-019, PB-84-211291, U.S. Environmental Protection
Agency, Washington, DC.

6. Wallace, L. A. (1979) Use of personal monitor to measure commuter
exposure to carbon monoxide in vehicle passenger compartment.
Paper No. 79-59.2, presented at the 72nd Annual Meeting of the
Air Pollution Control Association, Cincinnati, OH.

7. Ott, W. R. and Eliassen, R. (1973) A survey technique for determining
the representativeness of urban air monitoring stations with
respect to carbon monoxide, J. Air. Pollut. Control Assoc. 23,
685-690.

8. Ott, W. R. and Flachsbart, P. (1982) Measurement of carbon monoxide
concentrations in indoor and outdoor locations using personal
exposure monitors, Environ. Int. 8, 295-304.

9. Peterson, W. B. and Allen, R. (1982) Carbon monoxide exposures to
Los Angeles commuters, J. Air Pollut. Control Assoc. 32, 826-833.

10. Spengler, J. D. and Soczek, M. L. (1984) Evidence for improved
ambient air quality and the need for personal exposure research,
Environ. Sci. Techno!. 18, 268-80A.

11. Ott, W. R. (1985) Total human exposure: An emerging science focuses
on humans as receptors of environmental pollution, Environ.
Sci. Techno!. 19, 880-886.

12. Duan, N (1982) Models for human exposure to air pollutant, Environ.
Int. 8, 305-309.

13. Mage, D. T. and Wallace, L. A., eds. (1979) Proceedings of the
Symposium on the Development and Usage of Personal Monitors for
Exposure and Health Effects Studies. EPA-600/9-79-032, PB-80-
143-894, U.S. Environmental Protection Agency, Research Triangle
Park, NC.
31
-------
14. Wallace, L. A. (1981) Recent progress in developing and using personal
monitors to measure human exposure to air pollution, Environ.
Int. 5, 73-75.

15. Wallace, L. A. and Ott, W. R. (1982) Personal monitors: A state-of-
the-art survey, J. Air Pollut. Control Associ. 32, 601-610.

16. Duan, N. (1984) Application of the microenvironment type approach to
assess human exposure to carbon monoxide. Rand Corp., draft
final report submitted to the U.S. Environmental Protection
Agency, Research Triangle Park, NC.

17. Wallace, L. A., Zweidinger, R., Erickson, M., Cooper, S., Whitaker,
D., and Pellizzari, E. D. (1982) Monitoring individual exposure:
Measurements of volatile organic compounds in breathing-zone
air, drinking water, and exhaled breath, Environ. Int. 8, 269-282.

18. Wallace, L., Pellizzari, E., Hartwell, T., Rosenzweig, M., Erickson,
M., Sparacino, C. and Zelon, H. (1984) Personal exposures
to volatile organic compounds: I. Direct measurements in
breathing-zone air, drinking water, food, and exhaled breath,
Environ. Res. 35, 293-319.

19. Wallace, L., Pellizzari, E., Hartwell, T., Zelon, H., Sparacino, C.,
and Whitmore, R. (1984) Analyses of exhaled breath of 335
urban residents for volatile organic compounds, in Indoor Air,
vol. 4: Chemical Characterization'and Personal Exposure, pp.
15-20. Swedish Council for Building Research, Stockholm.

20. Akland, G. G., Hartwell, T. D., Johnson, T.R., and Whitmore, R. W.
(1985) Measuring human exposure to carbon monoxide in Washington,
DC, and Denver, Colorado, during the winter of 1982-83, Environ.
Sci. Technol. 19, 911-918.

21. Johnson, T. (1984) A study of personal exposure to carbon monoxide
in Denver, Colorado. EPA-600/4-84-015, PB-84-146-125,
Environmental Monitoring Systems Laboratory, U.S. Environmental
Protection Agency, Research Triangle Park, NC

22. Hartwell, T. D., Carlisle, A. C., Michie, R. M., Jr., Whitmore, R.
W., Zelon, H. S., and Whitehurst, D. A. (1984) A study of carbon
monoxide exposure of the residents in Washington, DC. Paper
No. 121.4, presented at the 77th Annual Meeting of the Air
Pollution Control Association, San Francisco, CA.

23. Holland, D. M. and Mage, D. T. (1983) Carbon monoxide in four cities
during the winter of 1981. EPA-600/4-83-025, Environmental
Monitoring Systems Laboratory, U.S. Environmental Protection
Agency, Research Triangle Park, NC.

24. Whitmore, R. W., Jones, S. M., and Rozenzeig, M. S. (1984) Final
sampling report for the study of personal CO exposure. EPA-
600/S4-84-034, PB-84-181-957, Environmental Monitoring
Systems Laboratory, U.S. Environmental Protection Agency,
Research Triangle Park, NC.

32
-------
FRAMEWORK FOR EXPOSURE ASSESSMENT
Outdoor
Emission
Sources
Outdoor
Concentrations
Time-activity
patterns
Total
Personal
Exposure
Internal Dose
i
Biologically Effective
Dose
i
Health Effect
Indoor
Emission
Sources
I
Indoor
Concentrations
Time-activity
patterns
-------
TOTAL HUMAN EXPOSURE PROGRAM
GOALS:
Estimate total human exposure for each
pollutant of concern
Determine major sources of this exposure
Estimate health risks associated with
these exposures

Determine actions to reduce these risks
-------
PROPORTION OF TIME IN SELECTED MICROENVIRONMENT

EMPLOYED PERSONS
INDOORS. WORK—28%
OJ
(J\
OUTDOORS—2%
IN TRANSIT—6%
INDOORS, OTHER—1%
INDOORS, HOME—63%
-------
PROPORTION OF TIME IN SELECTED MICROENVIRONMENTS
FULL-TIME HOMEMAKERS
CO
ON
.A*
INDOORS, OTHER—5%
INDOORS, HOME~89%
-------
MAJOR EXPOSURE SOURCES
Outdoors
Indoors
Ul
-4
Industrial
Automobile
Toxic wastes
Pesticides
Tobacco smoke
Gas stoves
Cleaners
Sprays
Dry Cleaning
Paints
Polishes
-------
EXPOSURE ASSESSMENT FOR

COMMUNITY STUDIES
OJ
00
Questionnaires
Outdoor monitoring
Indoor monitoring
Personal monitoring
Biological monitoring
-------
DISCUSSION
William F. Hunt, Jr.
Chief, Monitoring and Research Branch
Technical Support Division
Research Triangle Park, NC 27711
William C. Nelson's paper provides an
excellent overview of exposure monitoring
and associated statistical issues. The
reader must keep in mind that the paper
is directed at estimating air pollution
in microscale environments—in the home,
at work, in automobiles, etc., as well as
in the ambient air to which the general
public has access.
While it is important to better
understand air pollution levels in each
of these microenvironments, it must be
clearly understood that the principal
focus of the nation's air pollution
control program is directed at
controlling ambient outdoor air pollution
levels to which the general public has
access. The Clean Air Act (CAA) of 1970
and the CAA of 1977 emphasized the
importance of setting and periodically
reviewing the National Ambient Air
Quality Standards (NAAQS) for the
nation's most pervasive ambient air
pollutants—particulate matter, sulfur
dioxide, carbon monoxide, nitrogen
dioxide, ozone and lead. NAAQS(s) were
set to protect against both public health
and welfare effects.
One of. these pollutants, carbon
monoxide (CO), is discussed extensively
in Dr. Nelson's paper. CO is a
colorless, odorless, poisonous gas formed
when carbon in fuels is not burned
completely. Its major source is motor
vehicle exhaust, which contributes more
than two-thirds of all emissions
nationwide. In cities or areas with
heavy traffic congestion, however,
automobile exhaust can cause as much as
95 percent of all emissions, and carbon
monoxide concentrations can reach very
high levels.
In Dr. Nelson's paper, he states that
the correlations between personal CO
exposures at home or at work and ambient
CO at the nearest fixed site air
monitoring stations are weak. This does
not mean from an air pollution control
standpoint, however, that there is
something wrong with the fixed site CO
monitoring network. As stated earlier,
the air pollution control program is
directed at controlling outdoor ambient
air at locations to which the public has
access. The microscale CO monitoring
sites are generally located in areas of
highest concentration within metropolitan
areas at locations to which the general
public has access.
The Federal Motor Vehicle Control
Program has been very successful in
reducing these concentrations over time.
In fact, CO levels have dropped 32
percent between 1977 and 1986, as
measured at the nation's fixed site
monitoring networks." This improvement
has a corresponding benefit for people in
office buildings which use the outdoor
ambient air to introduce fresh air into
their buildings through their ventilation
systems. A major benefit occurs for
people who are driving back and forth to
work in their automobiles, for new cars
are much less polluting than older cars.
This should be clearly understood when
trying to interpret the major findings of
the breath monitoring programs that are
described in Dr. Nelson's paper.
Otherwise, the reader could mistakenly
conclude that somehow the Federal
Government may be in error in using fixed
site monitoring. Such a conclusion would
be incorrect. Further, it should be
pointed out that a fixed site network
also has the practical advantages of
identifying the source of the problem and
the amount of pollution control that
would be needed.
Another area of concern that needs to
be addressed in the future regarding the
breath monitoring program is the
relationship between alveolar CO and
blood carboxyhemoglobin (COHb). Dr.
Nelson states that the precise
relationship between alveolar CO and
blood COHb has not been agreed upon.
Given that, is there an inconsistency in
not being able to determine the
relationship between alveolar CO and
blood COHb and then using alveolar CO
measurements in Washington, D.C. and
Denver, Colorado to estimate blood COHb?
A final point, which needs to be
addressed in the breath monitoring
program,is the ability to detect volatile
organic chemicals, some of which may be
carcinogenic. What is the significance
of being able to detect 100 compounds in
breath, yet only one or two in blood
above the detectable limits? Does the
body expel the other 98 compounds that
cannot be detected in the blood? If so,
why?
STATISTICAL ISSUES
I agree with Dr. Nelson that
meteorological factors should be
incorporated into future TEAM studies,
through more careful experimental design.
The statistical issues identified under
TEAM design considerations, the
development of improved
raicroenvironmental monitoring designs,
errors-in-variables problem, choice
between monitoring instruments of varying
precision and cost, the development of
designs appropriate for assessing
39
-------
National levels, evaluating extreme
values in exposure monitoring, and
adjusting for censored monitoring data
are all well thought out and timely. I
strongly agree with his recommendation
that when considering multiple pollutant
species, as in the case of the volatile
and semi-volatile organic chemicals, as
well as polar compounds, the possibility
of synergistic effects necessitates the
development of a theory of multivariate
extreme values.
SUMMARY
In conclusion, Dr. Nelson's paper
provides a well thought out overview of
exposure monitoring and the associated
statistical issues. It should be an
excellent reference for people interested
in this topic. The reader should be
aware, however, of the importance of the
nation's fixed site monitoring network in
evaluating the effectiveness of the
nation's air pollution control program.
REFERENCE
1. National Air Quality and Emissions
Trends Report, 1986. U.S. Environmental
Protection Agency, Technical Support
Division, Monitoring and Reports Branch,
Research Triangle Park, NC 27711.
40
-------
Designing Environmental Regulations
Stfren Bisgaard and William G. Hunter*
Center for Quality and Productivity Improvement
University of Wisconsin-Madison
610 Walnut Street, Madison, Wisconsin 53705
• Public debate on proposed environmental regulations
often focuses almost entirely (and naively) on the allow-
able limit for a particular pollutant, with scant attention
being paid to the statistical nature of environmental data
and to the operational definition of compliance. As a
consequence regulations may fail to accomplish their pur-
pose. A unifying framework is therefore proposed that
interrelates assessment of risk and determination of compli-
ance. A central feature is the operating characteristic
curve, which displays the discriminating power of a regula-
tion. This framework can facilitate rational discussion
among scientists, policymakers, and others concerned with
environmental regulation.
Introduction
Over the past twenty years many new federal, state,
and local regulations have resulted from heightened con-
cern about the damage that we humans have done to the
environment - and might do in the future. Public debate,
unfortunately, has often focused almost exclusively on risk
assessment and the allowable limit of a pollutant.
Although this "limit part" of a regulation is important, a
regulation also includes a "statistical pan" that defines
how compliance is to be determined; even though it is typi-
cally relegated to an appendix and thus may seem unimpor-
tant, it can have a profound effect on how the regulation
performs.
Our purpose in this article is to introduce some new
ideas concerning the general problem of designing environ-
mental regulations, and, in particular, to consider the role
of the "statistical pan" of such regulations. As a vehicle for
illustration, we use the environmental regulation of
ambient ozone. Our intent is not to provide a definitive
analysis of that particular problem. Indeed, that would
require experts familiar with the generation, dispersion,
measurements, and monitoring of ozone to analyze avail-
able data sets. Such detailed analysis would probably lead
to the adoption of somewhat different statistical assump-
tions than we use. The methodology described below,
however, can accommodate any reasonable statistical
assumptions for ambient ozone. Moreover, this methodol-
ogy can be used in the rational design of any environmental
regulation to limit exposure to any pollutant.

Ambient Ozone Standard
For illustrative purposes, then, let us consider the
ambient ozone standard (1,2). Ozone is a reactive form of
oxygen that has serious health effects. Concentrations from
about 0.15 parts per million (ppm), for example, affect

*) Deceased.
respiratory mucous membranes and other lung tissues in
sensitive individuals as well as healthy exercising persons.
In 1971, based on the best scientific studies at the time, the
Environmental Protection Agency (EPA) promulgated a
National Primary and Secondary Ambient Air Quality
Standard ruling that "an hourly average level of 0.08 parts
per million (ppm) not to be" exceeded more than 1 hour
per year." Section 109(d) of the Clean Air Act calls for a
review every five years of the Primary National Ambient
Air Quality Standards. In 1977 EPA announced that it was
reviewing and updating the 1971 ozone standard. In
preparing a new criteria document, EPA provided a number
of opportunities for external review and comment. Two
drafts of the document were made available for external
review. EPA received more than 50 written responses to
the first draft and approximately 20 to the second draft.
The American Petroleum Institute (API), in particular, sub-
mined extensive comments.
The criteria document was the subject of two meet-
ings of the Subcommittee on Scientific Criteria for Photo-
chemical Oxidants of EPA's Science Advisory Board. At
each of these meetings, which were open to the public, crit-
ical review and new information were presented for EPA's
consideration. The Agency was petitioned by the API and
29 member companies and by the City of Houston around
the time the revision was announced. Among other things.
the petition requested that EPA state the primary and
secondary standards in such a way as to permit reliable
assessment of compliance. In the Federal Register it is
noted that
EPA agrees that the present deterministic form of
the oxidant standard has several limitations and
has made reliable assessment of compliance
difficult. The revised ozone air quality standards
are stated in a statistical form that will more
accurately reflect the air quality problems in vari-
ous regions of the country and allow more reli-
able assessment of compliance with the stan-
dards. (Emphasis added)
Later, in the beginning of 1978, the EPA held a public
meeting to receive-comments from interested parties on the
initial proposed revision of the standard. Here several
representatives from the State and Territorial Air Pollution
Program Administrators (STAPPA) and the Association of
Local Air Pollution Control Officials participated. After
the proposal was published in the spring of 1978, EPA held
four public meetings to receive comments on the proposed
standard revisions. In addition, 168 written comments were
received during the formal comment period. The Federal
Register summarizes the comments as follows:
The majority of comments received (132 out of
168) opposed EPA's proposed standard revision,
favoring either a more relaxed or a more
41
-------
stringent standard. State air pollution control
agencies (and STAPPA) generally supported a
standard level of 0.12 ppm on the basis of their
assessment of an adequate margin of safety.
Municipal groups generally supported a standard
level of 0.12 ppm or higher, whereas most indus-
trial groups supported a standard level of 0.15
ppm or higher. Environmental groups generally
encouraged EPA to retain the 0.08 ppm standard.
As reflected in this statement, almost all of the public dis-
cussion of the ambient ozone standard (not just the 168
comments summarized here) focused on the limit part of
the regulation. In this instance, in common with similar
discussion of other environmental regulations, the statisti-
cal pan of the regulation was largely ignored.
The final rule-making made the following three
changes:

(1) The primary standard was raised to 0.12 ppm.
(2) The secondary standard was raised to 0.12 ppm.
(3) The definition of the point at which the standard is
attained was changed to "when the expected number
of days per calendar year" with maximum hourly
average concentration above 0.12 ppm is equal to or
less than one."

The Operating Characteristic Curve
Environmental regulations have a structure similar to
that of statistical hypothesis tests. A regulation states how
data are to be used to decide whether a particular site is in
compliance with a specified standard, and a hypothesis test
states how a particular set of data are to be used to decide
whether they are in reasonable agreement with a specified
hypothesis. Borrowing the terminology and methodology
from hypothesis testing, we can say there are two types of
errors that can be made because of the stochastic nature of
environmental data: a site that is really in compliance can
be declared out of compliance (type I error) and vice versa
(type II error). Ideally the probability of committing both
types of error should be zero. In practice, however, it is not
feasible to obtain this ideal.
In the context of environmental regulations, an operat-
ing characteristic curve is the probability of declaring a site
to be in compliance (d.i.c.) plotted as a function of some
parameter 9 such as the mean level of a pollutant. This
Probfd.i.c. I 9J can be used to determine the probabilities
of committing type I and type n errors. As long as 9 is
below the stated standard, the probability of a type I error
is 1 -Prob{d.i.c. I 6}. When 6 is above the stated
standard, Prob (d.i.c. I 9J is the probability of a type II
error. Using the operating characteristic curve for the old
and the new regulations for ambient ozone, we can evalu-
ate them to see what was accomplished by the revision.
The old standard stated that "an hourly average level
of 0.08 ppm [was] not to be exceeded more than 1 hour per
year." This standard was therefore defined operationally in
terms of the observations themselves. The new standard, on
the other hand, states that the expected number of days per
calendar year with a maximum hourly average concentra-
tion above 0.12 ppm should be less than one. Compliance,
however, must be determined in terms of the actual data.
not an unobserved expected number. How should this
conversion be made? In Appendix D of the new ozone
regulation, it is stated that:
In general, the average number of exceedances
per calendar year must be less than or equal to 1.
In its simplest form, the number of exceedances
at a monitoring site would be recorded for each
calendar year and then averaged over the past 3
calendar years to determine if this average is less
than or equal to 1.
Based on the stated requirements of compliance, we have
computed the operating characteristic functions for the old
and the new ozone regulations. They are plotted in Figures
1 and 2. (The last sentence in the legend for Figure 1 will
be discussed below in the following section, Statistical
Analysis.) To construct these curves, certain simplifying
assumptions were made, which are discussed in the section
entitled "Statistical Concepts." Before such curves are
used in practice, these assumptions need to be investigated
and probably modified.
According to the main part of the new ozone regula-
tion, the interval from 0 to 1 expected number of
exceedances of 0.12 ppm per year can be regarded as
defining "being in compliance." Suppose the decision
rule outlined above is used for a site that is operating at a
level such that the expected number of days exceeding 0.12
ppm is just below one. In that case, as was noted by Javitz
(3), with the new ozone regulation, there is a probability of
approximately 37% in any given year that such a site will
be declared out of compliance. Moreover, there is approxi-
mately a 10% chance of not detecting a violation of 2
expected days per year above the 0.12 ppm limit; that is,
the standard operates such that the probability is 10% of
not detecting occurrences when the actual value is twice its
pennissable value (2 instead of 1). Some individuals may
find these probabilities (37% and 10%) to be surprisingly
and unacceptably high, as we do. Others, however, may
regard them as being reasonable or too low. In this paper.
our point is not to pursue that particular debate. Rather, it
is simply to argue that, before environmental regulations
are put in place, different segments of society need to be
aware of such operating characteristics, so that informed
policy decisions can be made. It is important to realize that
the relevant operating characteristic curves can be con-
structed before a regulation is promulgated.

Statistical Concepts
Let X denote a measurement from an instrument such
that X = 9 + e, where 9 is the mean value of the pollutant
and e is the statistical error term with variance cr . The
term e contains not only the error arising from an imperfect
instrument but also the fluctuations in the level of the pol-
lutant itself. We assume that the measurement process is
well calibrated and that the mean value of e is zero. The
parameters 9 and O"2 of the distribution of e are unknown
but estimates of them can be obtained from data. A
prescription of how the data are to be collected is known as
the sampling plan. It addresses the questions of how many.
where, when, and how observations are to be collected.
Any function f (X) =/(Xi,X2,.. - ,Xn) of the observa-
tions is an estimator, for example, the average of a set of
values or the number of observations in a sample above a
certain limit. The value of the function / for a given sam-
42
-------
pie is an estimate. The estimator has a distribution, which
can be detennined from the distribution of the observations
and the functional form of the estimator. With the distribu-
tion of the estimator, one can answer questions of the form:
what is the probability that the estimate f = f(X) is smaller
than or equal to some critical value c? Symbolically this
probability can be written as P = Probff (X_)< c I 6/.
If we want to have a regulation limiting the pollution
to a certain level, it is not enough to state the limit as a par-
ticular value of a parameter. We must define compliance
operationally in terms of the observations. The condition of
compliance therefore takes the form of an estimator
f(X\,... ,Xn) being less than or equal to some critical
value c, that is, { f (X i,... ,Xn)< c J. Regarded as a func-
tion of 6, the probability Prob{f(Xi,... ,Xn)L and zero otherwise. A year consists of
approximately n = 365 x 12 = 4380 hours of observations
(data are only taken from 9:01 am to 9:00 pm LST). The
expected number of hours per year above the limit is then
4380
Q = E{ £//.(*,) =i;=W,x 4380.
1=1
The probability that a site is declared to be in compliance
(d.i.c.) is
PM =Prob{d.i.c. I Q}=Prob\ _
1'='
(1)
1=0
This probability P0id, plotted as a function of 9, is the
operating characteristic curve for the old regulation (Figure
1). Note that (/the old standard had been written in terms
of an allowable limit of one for the expected number of
exceedances above 0.08 ppm, the maximum type I error
would be 1.00 - 0.73 = 0.27. The old standard, however, is
actually written in terms of the observed number of
exceedances so type I and type II errors, strictly speaking,
are undefined.
The condition of compliance stated in the new regula-
tion is that the "expected number of days per calendar year
with daily maximum ozone" concentration exceeding 0.12
ppm must be less than or equal to 1." Let 7, represent the
daily maximum hourly average 0=1,... ,365). Suppose
the random variables Yj are independently and identically
distributed. EPA proposed that the expected number of
days (a parameter) be estimated by a three-year moving
average of exceedances of 0.12 ppm. A site is in compli-
ance when the moving average is less than or equal to 1.
The expected number of days above the limit of L = 0.12
ppm is then
365
The three-year specification of the new standard
makes it hard to compare with the previous one-year stan-
dard. If, however, one computes the conditional probability
that the number of exceedances in the present year is less
than or equal to 0, 1,2 and 3 and multiplies that by the pro-
bability that the number of exceedances was 3, 2, 1 and 0,
respectively, for the previous two years, one then obtains a
one-year operating characteristic function.

P™ = Prob{ d.i.c. I 0 } = JT Prob f d.i.c \ k,6 }P(k)
k=0
where
f2x365 }
P(k)=Prob\ £ I(Yj) = b =
and
Prob {d.i.c.
3-k
y=0
where k=0,1,2,3. A plot of the operating characteristic
function for the new regulation, Pnew versus 9, is presented
in Figure 2.
Figures 1 and 2 show the operating characteristic
curves computed as a function of (1) the expected number
of hours per year above 0.08 ppm for the old ambient
ozone regulation and (2) the expected number of of days
per year with a maximum hourly observation above 0.12
ppm for the new ambient ozone regulation. We observe
that the 95 % de facto limit (the parameter value for which
the site in a given year will be declared to be in compliance
43
-------
with 95 % probability) is 0.36 hours per year exceeding
0.08 ppm for the old standard and 0.46 days per year
exceeding 0.12 ppm for the new standard. If the expected
number of hours of exceedances of 0.08 ppm is one (and
therefore in compliance), the probability is approximately
26% of declaring a site to be not in compliance with the old
standard. If the expected number of days exceeding 0.12
ppm is one (and therefore in compliance), the probability is
approximately 37% of declaring a site to be not in compli-
ance with the new standard. (We are unaware of any other
legal context in which type I errors of this magnitude
would be considered reasonable.) Note that the parameter
value for which the site in a given year will be declared to
be in compliance with 95% probability is 0.36 hours per
year exceeding 0.08 ppm for the old standard and 0.46 days
per year exceeding 0.12 ppm for the new standard.
Neither curve provides sharp discrimination between
"good" and "bad" values of 0. Note that the old standard
did not specify any parameter value above which non-
compliance was defined. The new standard, however,
specifies that one expected day is the limit, thereby creating
an inconsistency between what the regulation says and how
it operates because of the large discrepancy between the
stated limit and the operational limit.
The construction of Figures 1 and 2 only requires the
assumption that the relevant observations are approxi-
mately identically and independently distributed (for the
old standard, the relevant observations are those for the
hourly ambient ozone measurements; for the new standard,
they are the maximum hourly average measurements of the
ambient ozone measurements each day). The construction
does not require knowledge of the distribution of ambient
ozone observations. If one has an estimate of this distribu-
tional form, however, a direct comparison of the new and
old regulation is possible in terms of the concentration of
ambient ozone (in units, say, of ppm.) To illustrate this
point, suppose the random variable Xt is independently
and identically distributed according to a normal distribu-
tion with mean (j. and variance a2, that is, X,-Af(|a,o2).
Then the probability of one observation being above the
limit L =0.08 is
(4)
where () is the cumulative density function of the stan-
dard normal distribution. The probability that a site is
declared to be in compliance can be computed as a function
of )i by substituting pL from (4) into (1).
For the new regulation let Xl} represent the one-hour
average, O'=l 12;y=l 365), and
Y, = max£X1;,... ,X^}). If Xl}-N(]L, a2) , then YrHQ.l2} = \-
0.12-
12
one obtains the operating characteristic function for the
new standard.
For a fixed value of the variance a*, one can compute
the operating characteristic curves for the old and new
regulations to provide a graphical comparison of the way
these two regulations perform. Figure 3 shows these curves
for the old and new ambient ozone regulations computed as
a function of the mean hourly values when it is assumed
that a = 0.02 ppm. We observe that the 95% de facto limit
is changed from 0.0046 ppm to 0.045 ppm. That is. it is
approximately ten times higher in the new ozone regula-
tion.
We have three observations to offer with regard to the
old and new regulations for ambient ozone standards. First,
notwithstanding EPA's comment to the contrary, the new
ozone regulation is not more statistical than the previous
one; like all environmental regulations, both the new and
old ozone regulations contain statistical parts, and, for that
reason, both are statistical. Changing the specification
from one in terms of a critical value to one in terms of a
parameter does not make it more statistical. It actually
introduced an inconsistency. The old standard did not
specify any parameter value as a limit but only an opera-
tional limit in terms of the parameters. This therefore con-
stitutes the standard. The new standard, however, specifies
not only an intent in terms of what the desired limit is but
also an operational limit. The large difference between the
interned limit and the operational limit constitute the incon-
sistency. This inconsistency is a potential and unnecessary
source of conflict. Second, the new regulation is dependent
on the ambient ozone level for the past two years as well as
the present year, which means that a sudden rise in the
ozone level might be detected more slowly. The new regu-
lation is also more complicated. Third, it is unwise first to
record and store every single hourly observation and then
to use only the binary observation as to whether the daily
maximum is above or below 0.12 ppm. This procedure
wastes valuable scientific information. As a matter of pub-
lic policy, it is unwise to use the data in a binary form
when they are already measured on a continuous scale.
The estimate of the 1/365 percentile is an unreliable statis-
tic. It is for this reason that type I and type H errors are as
high as they are. In fact, the natural variability of this
statistic is of the same order of magnitude as the change in
the limit which was so much in debate.

If instead, for example, one used a procedure based on
the t-statistic for control of the proportion above the limit,
as is commonplace in industrial quality control procedures
(4), one would get the operating characteristic curve plotted
in Figure 4 (see also appendix). For comparison, the curve
for the new regulation is also plotted as a function of the
expected number of exceedances per year. With the new
ozone regulation, the probability can exceed 1/3 that a par-
ticular site will be declared out of compliance when it is
actually in compliance. The operating characteristic curve
for the t-test is steeper (and hence has more discriminating
power) than that for the new standard. The modified pro-
cedure based on the t-test generally reduces the probability
that sites that are actually in compliance will be declared to
be out of compliance. In fact, it is constructed so that there
is 5% chance of declaring that a site is out of compliance
when it is actually in compliance in the sense that the
expected exceedance number is one per year. Furthermore,
when a violation has occurred, it is much more certain that
44
-------
it will be detected with the t-based procedure. In this
respect, the t-based procedure provides more protection to
the public.
We do not conclude that procedures based on the t-
test are best. We merely point out that there are alterna-
tives to the procedures used in the old and new ozone stan-
dard. A basic principle is that information is lost when data
are collected on a continuous scale and then reduced to a
binary form. One of the advantages of procedures based on
the t-test is that they do not waste information in this way.
The most important point to be made goes beyond the
regulation of ambient ozone; it applies to regulation of all
pollutants where there is a desire to limit exposure. With
the aid of operating characteristic curves, informed judge-
ments can be made when an environmental regulation is
being developed. In particular, operating characteristic
curves for alternative forms of a regulation can be con-
structed and compared before a final one is selected. Also,
the robustness of a regulation to changes in assumptions,
such as normality and statistical independence of observa-
tions, can be investigated prior to the promulgation. Note
that environmental lawmaking, as it concerns the design of
environmental regulations, is similar to design of scientific
experiments. In both contexts, data should be collected in
such a way that clear answers will emerge to questions of
interest, and careful forethought can ensure that this desired
result is achieved.

Scientific Framework
The operating characteristic curve is only one com-
ponent in a more comprehensive scientific framework that
we would like to promote for the design of environmental
regulations. The key elements in this process are:

(a) Dose/risk curve
(b) Risk/benefit analysis
(c) Decision on maximum acceptable risk
(d) Stochastic nature of the pollution process
(e) Calibration of measuring instruments
(f) Sampling plan
(g) Decision function
(h) Distribution theory
(i) Operating characteristic function

Currently there may be some instances in which all of these
elements are considered in some form when environmental
regulations arc designed. Because the particular purposes
and techniques are not explicitly isolated and defined, how-
ever, the resulting regulations are not as clear nor as effec-
tive as they might otherwise be.
Often the first steps towards establishing an environ-
mental regulation are (a) to estimate the relationship
between the "dose" of a pollutant and some measure of
health risk associated with it and (b) to carry out a formal
or informal risk/benefit analysis. The problems associated
with estimating dose/risk relationships and doing
risk/benefit analyses are numerous and complex, and uncer-
tainties can never be completely eliminated. As a next step
a political decision is made - based on this uncertain
scientific and economic groundwork - as to the maximum
risk that is acceptable to society (c). As indicated in Figure
5, the maximum acceptable risk implies, through the
dose/risk curve, the maximum allowable dose. The first
three elements have received considerable attention when
environmental regulations have been formulated, but the
last six elements have not received the attention they
deserve.
The maximum allowable dose defines the compliance
set &0 and the noncompliance set 0; , which is its comple-
ment. The pollution process can be considered (d) as a sto-
chastic process or statistical time-series 0(6; r). Fluctua-
tions in the measurements X can usefully be thought of as
arising from three sources: variation in the pollution level
itself <)>, the bias b in the readings, and the measurement
error e. Thus X = <}> + b + e. Often it is assumed that 4> = 9,
a fixed constant and that variation arises only from the
measurement error e; however, all three components
, b, and e can vary. Ideally b=0 and the variance of e is
small.
Measurements will only have scientific meaning if
there is a detailed operational description of how the meas-
urements are to be obtained and the measurement process
is in a state of statistical control. A regulation must include
a specification relating to how the instruments are to be
calibrated (e). These descriptions must be an integral pan
of a regulation if it is going to be meaningful. The subject
of measurement is deeper than is generally recognized,
with important implications for environmental regulation
(5, 6, 7). The pollution process and the observed process
as a function of time are indicated in Figure 5.
Logically the next question is (f) how best to obtain a
sample X^ = (Xl,X2,... ,Xn) from the pollution process.
The answer to this question will be related to the form of
the estimator/ (X) and (g) the decision rule
d(f(X))=<
0 : process in compliance
1 : process not in compliance
The sample, the estimator, and the decision function are
indicated in Figure 5. Based on knowledge about the sta-
tistical distribution of the sample (h), one can compute (i)
the operating characteristic function
P =Prob{d(f(X)) = Q I 9; and plot the operating charac-
teristic curve P versus 9. An operating characteristic func-
tion is drawn at the bottom of Figure 5. (In practice it
would probably be desirable to construct more than one
curve because, with different assumptions, different curves
will result). Projected back on the dose/risk relationship
(see Figure 5), this curve shows the probability of
encountering various risks for different values of 9 if the
proposed environmental regulation is enacted. Suppose
there is a reasonable probability that the pollutant levels
occur in the range where the rate of change of the dose/risk
relationship is appreciable; then the steeper the dose/risk
function, the steeper the operating characteristic curve
needs to be if the regulation is to offer adequate protection.
The promulgated regulation should be expressed in terms
of an operational definition that involves measured quanti-
ties, not parameters. Figure 5 provides a convenient sum-
mary of our proposed framework for designing environ-
mental regulations.
In environmental lawmaking, it is most prudent to
consider a range of plausible assumptions. Operating
45
-------
characteristic curves will sometimes change with different
geographical areas to a significant degree. Although this is
an awkward fact when a legislative, administrative, or
other body is trying to enact regulations at an international,
national, or other level, it is better to face the problem as
honestly as possible and deal with it rather than pretending
that it does not exist.

Operating Characteristic Curve as a Goal, Not a Conse-
quence
We suggest that operating characteristic curves be
published whenever an environmental regulation is
promulgated that involves a pollutant the level of which is
to be controlled. When a regulation is being developed,
operating characteristic curves for various alternative forms
of the regulation should be examined. An operating
characteristic curve with specified desirable properties
should be viewed as a goal, not as something to compute
after a regulation has been promulgated. (Nevertheless, we
note in passing that it would be informative to compute
operating characteristic curves for existing environmental
regulations.)
In summary, the following procedure might be feasi-
ble. First, based on scientific and economic studies of risks
and benefits associated with exposure to a particular pollu-
tant, a political decision would be reached concerning the
compliance set in the form of an interval of the type
0 < 9 < Q0 for a parameter of the distribution of the pollu-
tion process. Second, criteria for desirable sampling plans,
estimators, and operating characteristic curves would be
established. Third, attempts would be made to create a
sampling plan and estimators that would meet these cri-
teria. The costs associated with different sampling plans
would be estimated. One possibility is that the desired pro-
perties of the operating characteristic curve might not be
achievable at a reasonable cost. Some iteration and even-
tual compromise may be required among the stated criteria.
Finally, the promulgated regulation would be expressed in
terms of an operational definition that involves measured
quantities, not parameters.
Injecting parameters into regulations, as was done in
the new ozone standard, leads to unnecessary questions of
interpretation and complications in enforcement. In fact,
inconsistencies (such as that implied by
/V0&f/(X)
-------
ties of violations not being detected (type n errors); indus-
tries would know the probabilities of being accused
incorrectly of violating standards (type I errors); and all
parties would know the costs associated with various pro-
posed environmental control schemes. We believe that the
operating characteristic curve is a simple, yet comprehen-
sive device for presenting and comparing different alterna-
tive regulations because it brings into the open many
relevant and sometimes subtle points. For many people it
is unsettling to realize that type I and type II errors will be
made, but it is unrealistic to develop regulations pretending
that such errors do not occur. In fact, one of the central
issues that should be faced in formulating effective and fair
regulations is the estimation and balancing of the probabili-
ties of such occurrences.

Acknowledgments
This research was supported by grants SES - 8018418
and DMS - 8420968 from the National Science Founda-
tion, Computing was facilitated by access to the research
computer at the Department of Statistics, University of
Wisconsin, Madison.
Appendix
The t-statistic procedure is based on the estimator
/ (x) = (L-x)/s where L is the limit (0.12 ppm), x the sam-
ple average, and s the sample standard deviation. The deci-
sion function is
f / (x) > c : in compliance
d(f&=-\f(x) c
(A2)
where ZQ = ~I(l-9o) and 9o is the fraction above the
limit we at most want to accept (here 1/365).
The exact operating characteristic function is found
by reference to a non-central t-distribution, but for all prac-
tical purposes the following approximation is sufficient:

L-x
Pro,
>c» =
(A3)
The operating characteristic function in Figure 4 is con-
structed using a=0.05, 9o=l/365 and n=3x365. Substitut-
ing (A3) into (A2) yields
= 1 - 0.05
(A4)
Literature Cited
(1) National Primary and Secondary Ambient Air Quality
Standards, Federal Register 36, 1971 pp 8186-8187.
(This final rulemaking document is referred to in this
article as the old ambient ozone standard.)
(2) National Primary and Secondary Ambient Air Quality
Standards, Federal Register 44, 1979 pp 8202-8229.
(This final rulemaking document is referred to in this
article as the new ambient ozone standard.) The back-
ground material we summarize is contained in this
comprehensive reference.
(3) Javitz, H. J. /. Air Poll. Con. Assoc. 1980 30, pp 58-
59.
(4) Hald, A. "Statistical Theory with Engineering Appli-
cations"; Wiley, New York, 1952; pp 303-311.
(5) Hunter, J. S. Science 210,1980 pp 869-874;
(6) Hunter, J. S. In "Appendix D", Environmental Moni-
toring, Vol IV, National Academy of Sciences 1977,
(7) Eisenhart, C. In "Precision Measurements and Cali-
bration", National Bureau of Standards Special Publi-
cation 300 Vol. 1, 1969; pp 21-47.
(8) Porter, W. P.; Hinsdill, R.; Fairbrother, A.; Olson, L.
J.; Jaeger, J.; Yuill, T.; Bisgaard, S.; Hunter, W. G :
K. Nolan, K. Science 1984,224, pp 1014-1017.
(9) Rogers, W. H. "Handbook of Environmental Law",
West Publishing Company, 1977, St. Paul, MN.
which solved for the critical value yields c = 2.6715. Refer
for example to (4) for more details.
47
-------
Figure 1. Operating characteristic curve for the 1971 ambient ozone standard (old
standard), as a function of the expected number of hours of exceedances of 0.08 ppm
per year. Note that if the old standard had been written in terms of an allowable limit
of one for the expected number of exceedances above 0.08 ppm, the maximum type I
error would be 1.00 - 0.73 = 0.27.

Figure 2. Operating characteristic curve for the 1979 ambient ozone standard (new
standard), as a function of the expected number of days of exceedances of 0.12 ppm
per year.. Note that the maximum type I error is 1.00 - 0.63 = 0.37.

Figure 3. Operating characteristic curves for the old and the new standards as a func-
tion of the mean value of ozone measured in parts per million when it is assumed that
ozone measurements are normally and independently distributed with cr = 0.02 ppm.

Figure 4. Operating characteristic curves for the new ozone standard and a t-statistic
alternative as a function of the expected number of exceedances per year.

Figure 5. Elements of the environmental standard-setting process: Laboratory experi-
ments and/or epidemiological studies are used to assess the dose/risk relationship. A
maximum acceptable risk is determined through a political process balancing risk and
economic factors. The maximum acceptable risk implies a limit for the "dose" which
again implies a limit for the pollution process as a function of time. Compliance with
the standard is operationally determined based on a discrete sample * taken from a
particular site. The decision about whether a site is in compliance is reached through
use of a statistic / and a decision function d. Knowing the statistical nature of the pol-
lution process, the sampling plan, and the functional form of the statistics and the
decision function, one can compute the operating characteristic function. Projecting
the operating characteristic function back on the dose/risk relationship, one can assess
the probability of encountering various levels of undetected violation of the standard.
03
O
(£>
^ O
U
o
ol
f\J
d
p
o
123456

expected number of hours above 0.08 ppm
48
-------
0.0
0.2
Prob( d.i.c.)

0.4 0.6
0.8
p
b
p
b
~1 - 1

old de facto limit
~r
1.0

-71
CD
CD_

o p
o p.
N W
O

T3 P
q b
o
en
o
o>
new de facto limit
(D
X
XJ
(D
Q.
C

CT
CD
a>
cr
o
CD
p

ro
TD
•o
3
Prob( d.i.c.)

0.4 0.6
0.8
1.0
limit specified by new standard
-------
u
•d
CO
o
ID
d
••r
0
OJ
d
O
CJ
0.0 0.5 1.0 1.5 2.0 2.5

expected number of days above 0.12 ppm
3.0
50
-------
EPA PROGRAMS AND ENVIRONMENTAL
STANDARDS
I appreciate the general points
that Dr. Bisgaard has made regarding
the development of environmental
standards. I agree that generally,
when standards are developed, most of
the technical emphasis is placed on
developing the magnitude of the absolute
number, which Dr. Bisgaard calls the
"limit part" of the standard. In
contrast, frequently little work is
expended developing the sampling program
and the rules that are used to evaluate
compliance with the limit in applica-
tion, which he calls the "statistical
part" of the standard. At EPA some
programs do a thorough and thoughtful
job of designing environmental stan-
dards. However, other EPA programs
could benefit from Dr. Bisgaard's work
because they have focused strictly on
the magnitude of the standard and have
not considered the "statistical part" of
the standard.

However, I insist that the ozone
standard and all of the National Ambient
Air Quality Standards fall into the
category of standards where both the
"limit part" and the "statistical part"
of the standard have been designed based
on extensive performance evaluations and.
practical considerations.

There are other EPA programs that
have also done an excellent job of
designing and evaluating the "limit
part" and the "statistical part" of
their standards. For example, under
the Toxic Substances Control Act (TSCA)
regulations, there are procedures for
managing PCB containing wastes. In
particular, PCB soil contamination must
be cleaned up to 50 ppm. Guidances have
been prepared that stipulate a detailed
sampling and evaluation program and
effectively describe the procedure for
verifying when the 50 ppm limit has been
achieved. Also under the TSCA mandate,
clearance tests are under development
for verifying that, after the removal
of asbestos from a building, levels are
not different from background levels.

There are, however, many programs
at EPA that have not performed the
analysis and inquiry necessary to
design the "statistical part" of their
standards. One example is the Maximum
Contaminant Levels (MCLs) which are
developed and used by EPA'S drinking
water program. MCLs are concentration
limits established for controlling
pollutants in drinking water supplies.
Extensive health effect, engineering,
and economic analysis is used to choose
DISCUSSION
W. Barnes Johnson
the MCL concentration value. However,
relatively little work is done to ensure
that, when compliance with the MCL is
evaluated, appropriate sampling and
analysis methodologies are used to
ensure a designed level of statistical
performance.
Similarly, risk-based cleanup
standards are used in EPA's Superfund
program as targets for how much aban-
doned hazardous waste sites should be
cleaned up. These are concentration
levels either borrowed from another pro-
gram (e.g., an MCL) or developed based
on site-specific circumstances. A great
deal of effort has been expended on
discussions of how protective the actual
risk related cleanup standards should
be; however, virtually no effort has
been focused on the methodology that
will be used to evaluate attainment of
these standards. Drinking water MCLs
and Superfund cleanup standards could
benefit from the approaches offered by
Dr. Bisgaard.

PRACTICAL ENVIRONMENTAL STANDARDS
DESIGN: POLITICS, POLLUTANT BEHAVIOR,
SAMPLING AND OBJECTIVES

Dr. Bisgaard clearly points out
that his use of the ozone standard is
only for the purpose of example and
that the message of his presentation
applies to the development of any
standard. I have responded by trying
to identify other EPA program areas
that could benefit from the perspective
offered by Dr. Bisgaard's approach.
However, it is important to realize that
the development of the "statistical
part" of an environmental standard must
consider the nature of the political
situation, pollutant behavior, sampling
constraints, and the objective of the
standard. Ignorance of these practical
considerations can limit the usefulness
of a proposed standard regardless of the
theoretical basis. The developers of
the ozone standard were quite aware of
these contingencies and it is reflected
in the form of the "statistical part" of
the ozone standard.

Central Tendency Versus Extremes

I must agree that a standard based
on central tendency statistics will be
more robust with better operating
characteristics than a standard based on
peak statistics. The difficulty is that
EPA is not concerned with estimating or
controlling the mean ozone concentra-
tion. Ozone is a pollutant with acute
health effects and, as such, EPA's
interest lies in control of the extremes
of the population. Peak statistics were
51
-------
the primary concern when the ozone
standard was developed.

EPA, in the development of NAAQS's,
has tried to balance statistical per-
formance with objectives by examining
the use of other statistics that are
more robust and yet retain control of
the extremes. For example, EPA has
suggested basing the standard on the
fourth or fifth largest value; however,
commenters maintained that EPA would
lose control of the extremes and cause
undo harm to human health. It has also
been suggested that the peak to mean
ratio (P/M) be considered. The problem
with this approach is that the P/M is
highly variable across the United States
because of variation in the "ozone
season." The objective of developing a
nationally applicable regulatory frame-
work would be quite difficult if each
locale was subject to a different stan-
dard.

Decision Errors and Power

In addition, regardless of the
standard that is chosen, decision
errors will be highest when the true
situation at a monitoring station is at
or close to the standard. As the true
situation becomes well above or below
the standard, certainty increases and
our decisions become less subject to
error. Of course, it would be -most
desirable to have an operating charac-
teristic function with a large distinct
step at the standard. This operating
characteristic would have no error even
when the true situation is slightly
above or below the standard; however,
this is virtually impossible. There-
fore, when standards are compared for
their efficacy, it is important to
compare performance along the continuum
when the true situation is well above,
at, and well below the standard. One
should not restrict performance evalu-
ation to the area at or immmediately
adjacent to the standard, for most
statistics the performance will be
quite low in this region.

Dr. Bisgaard points out from his
Figure 2 that when a site is in compli-
ance and at the standard, expecting to
exceed the standard on one day, there
is a 37% chance that the site may be
indicated as exceeding the standard.
However, it can also be shown that when
a site is below the standard and
expects to exceed the standard on one-
half of a day, there is only about a 6%
chance that the site may be indicated
as exceeding the standard. Conversely,
it can be pointed out that when the site
is above the standard and expects to
exceed the standard on three days, there
is only a 3% chance that the site will
be found to be in compliance.
Dr. Bisgaard is quite correct in
pointing out that the operating charac-
teristics of a standard based on the
mean are better than a standard based
on the largest order statistic. How-
ever, as mentioned above, a standard
based on the mean does not satisfy the
objectives of the ozone standard. EPA
staff have tendered proposals to
improve the operating characteristics
of the standard. One of these involved
the development of a three-tiered
approach that would allow a site to be
judged: in attainment, not in attain-
ment, or too close to call. The
existing structure of the attainment
program was not flexible enough to
permit this approach.

Pollutant Behavior

Ozone is a pollutant which exists
in the environment at a high mean ambi-
ent level of approximately one-third the
existing standard. Effort expended
trying to drive down peak statistics
indirectly by controlling the mean would
be futile. This is because mean levels
can only be reduced to the background
mean which, relative to the standard, is
high even in the absence of air
pollution.

Another point to consider is that
ozone behavior is influenced by both
annual .and seasonal meteorological
effects. This is the reason that the
newest standard is based on three years
of data. The effect of an extreme year
is reduced by the averaging process
associated with a three year standard.
As mentioned above, work has also
focused on controlling the peak to mean
ratios; however, because ozone seasons
vary radically across the country, this
sort of measure would be difficult to
implement.

Dr. Bisgaard has also questioned
the new standard because of the use of
the term "expected." This terminology
was probably included in the wording
because of the many legal and policy
edits that are performed on a draft
regulation. It was not intended that
the term "expected" be applied in the
technical statistical use of the term.
The term was intended to show that EPA
had considered and reflected annual
differences in ozone conditions in the
three year form of the standard.

CONCLUSIONS

Dr. Bisgaard brings an interesting
and useful perspective to the develop-
ment of environmental standards. The
important idea is that an environmental
standard is more than a numerical limit
and must include a discussion of the
associated sampling approach and
52
-------
decision function. I tried to extend issues in exhaustive detail. Second,
this central idea by adding two primary the practical issues that influence the
points. First, there are several pro- implementation of an environmental
grams within EPA that can benefit from standard are a primary constraint and
Dr. Bisgaard's perspective; however, the must be understood in order to develop a
NAAQS program is fully aware of and has standard that offers a useful measure of
considered these sampling and decision compliance.
53
-------
QUALITY CONTROL ISSUES IN TESTING COMPLIANCE WITH A REGULATORY
STANDARD: CONTROLLING STATISTICAL DECISION ERROR RATES

Bertram Price
Price Associates, Inc.

prepared under

EPA Contract No. 68-02-4139
Research Triangle Institute

for

The Quality Assurance Management Staff
Office of Research and Development
U. S. Environmental Protection Agency
Washington, D.C. 20460
ABSTRACT

Testing compliance with a regulatory standard intended to
control chemical or biological contamination is inherently a
statistical decision problem. Measurements used in compliance
tests exhibit statistical variation resulting from random
factors that affect sampling and laboratory analysis. Since a
variety of laboratories with potentially different performance
characteristics produce data used in compliance tests, a
regulatory agency must be concerned about uniformity in
compliance decisions. Compliance monitoring programs must be
designed to avoid, for example, situations where a sample
analyzed by one qualified laboratory leads to a noncompliance
decision, but there is reasonable likelihood that if the same
sample were analyzed by another qualified laboratory, the
decision would be reversed.

Two general approaches to designing compliance tests are
discussed. Both approaches have, as an objective, controlling
statistical decision error rates associated with the compliance
test. One approach, the approach typically employed, depends
on interlaboratory quality control (QC) data. The alternative,
referred to as the intralaboratory approach, is based on a
protocol which leads to unique QC data requirements in each
laboratory. An overview of the statistical issues affecting
the development and implementation of the two approaches is
presented and the approaches are compared from a regulatory
management perspective.

SECTION 1 - INTRODUCTION

Testing compliance with a regulatory standard intended to

control chemical or biological contamination is inherently a

54
-------
statistical decision problem. Measurements used in compliance
tests exhibit statistical variation resulting from random factors
affecting sampling and laboratory analysis. Compliance decision
errors may be identified with Type I and Type II statistical
errors (i.e., false positive and false negative compliance test
results, respectively). A regulating agency can exercise control
over the compliance testing process by establishing statistical
decision error rate objectives (i.e., error rates not to be
exceeded). From a statistical design perspective, these error
rate objectives are used to determine the number and types of
measurements required in the compliance test.

Bias and variability in measurement data are critical
factors in determining if a proposed compliance test satisfies
error rate objectives. Various quality control (QC) data
collection activities lead to estimates of bias and variability.
An interlaboratory study is the standard approach to obtaining
these estimates. (The U.S. Environmental Protection Agency
[USEPA] has employed the interlaboratory study approach
extensively to establish bias and variability criteria for test
procedures required for filing applications for National
Pollution Discharge Elimination System [NPDES] permits - 40 CFR
Part 136, Guidelines Establishing Test Procedures for the
Analysis of Pollutants Under the Clean Water Act.) An
alternative means of estimating bias and variability that does
not require an interlaboratory study is referred to in this
report as the intralaboratory approach. The intralaboratory
approach relies on data similar to those generated in standard
laboratory QC activities to extract the information on bias and
variability needed for controlling compliance test error rates.

The purpose of this report is to describe and compare the
interlaboratory and intralaboratory approaches to collecting QC
data needed for bias and variability estimates which are used in
compliance tests. Toward that end, two statistical models, which
55
-------
reflect two different attitudes toward compliance test
development, are introduced. Model 1, which treats differences
among laboratories as random effects, is appropriate when the
laboratory producing the measurements in a particular situation
is not uniquely identified, but is viewed as a randomly selected
choice from among all qualified laboratories. If Model 1 is
used, an interlaboratory study is necessary to estimate "between
laboratory" variance which is an essential component of the
compliance test. Model 2 treats laboratory differences as fixed
effects (i.e., not random, but systematic and identified with
specific laboratories). If Model 2 is used, bias adjustments and
estimates of variability required for compliance tests are
prepared in each laboratory from QC data collected in the
laboratory. Model 2 does not require estimates of bias and
variability from interlaboratory data.

The remainder of this report consists of five sections.
First, in Section 2, statistical models selected to represent the
data used in compliance tests are described. In Section 3, a
statistical test used in compliance decisions is developed. The
comparison of interlaboratory and intralaboratory approaches is
developed in two steps. Section 4 is included primarily for
purposes of exposition. The types and numbers of measurements
needed for a compliance test are derived assuming that the
critical variance components - i.e., within and between
laboratories - have known values. This section provides the
structure for comparing the interlaboratory and intralaboratory
approaches in the realistic situation where the variance
components must be estimated. The comparison is developed in
Section 5. A summary and conclusions are presented in Section 6.

SECTION 2 - STATISTICAL MODELS
Compliance tests are often complex rules defined as
combinations of measurements that exceed a quantitative standard.
However, a simple rule - an average of measurements compared to
56
-------
the standard - is the basis for most tests. This rule provides
the necessary structure for developing and evaluating the
interlaboratory and intralaboratory approaches. Throughout the
subsequent discussion, the compliance standard is denoted by C0
and interpreted as a concentration - e.g., micrograms per liter.
Samples of the target medium are obtained, analyzed by chemical
or other appropriate methods and summarized as an average for use
in the test. The statistical design issues are:

o total number of measurements required;

o number and type of samples required; and

o number of replicate analyses per sample required.

The design issues are resolved by imposing requirements on the
compliance test error rates (i.e., the Type I and Type II
statistical error rates).

Many sources of variation potentially affect the data used
in a compliance test. The list includes variation due to sanple
selection, laboratory, day and time of analysis, analytical
instrument, analyst, and measurement error. To simplify the
ensuing discussion, the sources have been limited to sample
selection, laboratory, and measurement error. (Measurement error-
means analytical replication error or single analyst
variability.) This simplification, limiting the number of
variance components considered, does not limit the generality of
subsequent results.

The distribution of the compliance data is assumed to have
both mean and variance proportional to the true concentration.
(This characterization has been used since many types of
environmental measurements reflect these properties.) The data,
after transformation to logarithms, base e, may be described as:
57
-------
1 Yi,j,k = M + Bi + Sifj + eifjfk

where i = 1(1)1 refers to laboratory, j = 1(1) J refers to sample
and k = 1(1)K refers to analytical replication. Two different
interpretations referred to as Model 1 and Model 2 are considere
for the factors on the right side of equation 1.
In Model 1:
\i - ln(C), where C is the true concentration;

B^ - the logarithm of recovery (i.e., the
proportion of the true concentration
recovered by the analytical method) which is
a laboratory specific effect treated as
random with mean zero and variance cr2B;

Sj_ j - a sample effect which is random with mean
zero and variance C72s/* and

€i j,k " replication error which is random with mean
zero and variance CT26.

It follows that:
ae
and denoting as Yj_ an average over samples and replicates,

EQ 2 Var[Y.i_] = <72B + ff2s/J + 02e/J-K.
In Model 2, BJ_ is interpreted as a fixed effect (i.e., Bj_ is
bias associated with laboratory i) . All other factors have the
same interpretation used in Model 1. Therefore, in Model 2:
58
-------
+ B±
and
EQ 3 Var[Yi] = cr2s/J + a2e/J'K

Differentiating between Model 1 and Model 2 has significant
practical implications for establishing an approach to compliance
testing. These implications are developed in detail below. For
now, it is sufficient to note that the collection of Bj_'s are
treated as sealer factors uniquely associated with laboratories.
If the identity of the specific laboratory conducting an analysis
is unknown because it is viewed as randomly selected from the
population of all laboratories, then Bj_ is treated as a random
effect. If the laboratory conducting the analysis is known, Bj_
is treated as a sealer, namely the bias of the ith laboratory.

SECTION 3 - STATISTICAL TEST: GENERAL FORMULATION
The statistical test for compliance is based on an average
of measurements, Y. Assuming that Y's are normally distributed
(recall that Y is the natural logarithm of the measurement),
noncompliance is inferred when

EQ 4 Y > T

where T and the number of measurements used in the average are
determined by specifying probabilities of various outcomes of the
test. (For simplicity in exposition in this section, the
subscripts i, j, and k used to describe the models in Section 2
are suppressed. Also, aY is used in place of the expressions in
EQ 2 and EQ 3 to represent the standard deviation of Y. The more
detailed notation of EQ 2 and EQ 3 is used in the subsequent
sections where needed.)
59
-------
Let P! and p2 be probabilities of declaring noncompliance
when the true means are d]/Co and d2'CQ respectively (dlrd2 > 0),
and let

Mo = ln(C0)

D! = ln(d!), D2 = In(d2).
Requiring

EQ 5 px = P[ Y > T: M = Mo + Dl ]
and
EQ 6 p2 = P[ Y > T: M = Mo + D2 1

leads to values of T and the number of measurements used to forr
Y by solving

EQ 7 [(T - Mo + D1)]/aY = Zi.p-L
and
EQ 8 [(T - MO + D2)]/(7y = Zl-p2

where Z^.p^ and Z^_p2 are percentile points of the standard
normal distribution.

The solutions are:

EQ 9 T = C7y-Zl_pl + MO + Dl

EQ 10 ay = (D2 - QI)/(ZI-PI ~ Zi_p2).

This formulation allows considerable flexibility for
determining compliance test objectives. Consider the following
three special cases:

Case (i) . When d]_ = 1, p^_ = a, d2 is any positive number
60
-------
greater than 1 and p2 = 1 - /3, the formulation reduces to the
classical hypothesis testing problem Ho: M = MO versus
H]_: fj. = ^o + D2 • T^e correct number of measurements establishes
the probabilities of Type I and Type II errors at a and /3
respectively.

Case (ii) . Let d]_ = 1, £2 be a positive number less than 1,
P! = 1 - /3, and p2 = a. This formulation also reduces to the
classical hypothesis testing problem Ho: M = MO + D2 versus
H2_: M = MO- (Note that ju0 + D2 < MO* i.e., D2 < 0.)

Case (iii) . Let 1 < d-^ < d2. Set p^_ < p2 to large values
(e.g., .90 and .99). This formulation imposes a high probability
of failing the compliance test when the mean is D]_ times the
standard, and a higher probability of failing when the mean is
further above the standard.

Case (ii) imposes a more stringent regulatory program on the
regulated community than Case (i). In Case (i), the regulated
community may establish control methods to hold the average
pollution level at the standard. In Case (ii), the pollution
level must be controlled at a concentration below the standard if
the specified error rates are to be achieved. In Case (iii), a
formal Type I error is not defined. Individual members of the
regulated community may establish the Type I error rate by
setting their own pollution control level - the lower the control
level, the lower the Type I error rate. In Case (iii), the
regulated community has another option also. There is a tradeoff
between the control level and the number of measurements used in
the compliance test. Individuals may choose to operate at a
level near the standard and increase the number of measurements
used in the compliance test over the number required to achieve
the stated probability objectives. The important difference
between Case (iii) and the two other cases is the responsibility-
placed with the regulated community regarding false alarms (i.e.,
61
-------
Type I errors) . Since false alarms affect those regulated more
than the regulator, Case (iii) may be the most equitable approach
to compliance test formulation.

SECTION 4 - SAMPLE SIZE REQUIREMENTS: VALUES OF VARIANCE
COMPONENTS KNOWN
The discussion below follows the structure of Case (i)
described above. Based on the general formulation developed in
Section 3, the conclusions obtained also hold for Cases (ii) and
(iii).

MODEL 1
The compliance test is a statistical test of:

H0: ju = Mo = In (C0)
versus
+ D
where C0 is the compliance standard. Assuming the values 'of the
variance components are known, the test statistic is

Z = (Yi - M0)/(tf2B + cr2S/J

Specifying the Type I error rate to be a leads to a test
that rejects HQ if

EQ 11 Z > Zi_a

where Z^_a is the (l-a)th percentile point of the standard normal
distribution. If the Type II error is specified to be /? when the
alternative mean is /no + 02, then:

EQ 12 a2B + (72S/J + a2e/J-K =
62
-------
Any combination of J and K satisfying EQ 12 will achieve the
compliance test error rate objectives. However, unique values of
J and K may be determined by minimizing the cost of the data
collection program subject to the constraint in EQ 12. Total
cost may be stated as:

EQ 13 TC = J-G! + J'K-C2

where Cl is the unit cost of obtaining a sample and C2 is the
cost of one analysis.

Using the LaGrange Multiplier method to minimize EQ 13
subject to the constraint imposed by EQ 12 yields:

EQ 14 K = (ae/as)-(C1/C2)1/2
and
EQ 15 J = [as-ae/(U-a2B)]-[as/ae + (C2/C1)1/2]
where
U = [D2/(Z!_a + Z^)]2.
•
(If EQ 14 does not produce an integer value for K, the next
largest integer is used and J is adjusted accordingly.)

The number of replicate analyses for each sample, K,
increases as the ratio of the sampling cost to the analysis cost
increases and the ratio of the single analyst standard deviation
to the sampling standard deviation increases. In many
situations, the analysis cost, C2, is much larger than the
sampling cost, GI; and the sampling variance is much larger than
single analysis variability. Under these conditions, the number
of replicate analyses, K, will be 1 (i.e., each sample will be
analyzed only once).
63
-------
MODEL 2
Since

E(Yi) = M + Bi

the statistic used in the compliance test must incorporate a bias
adjustment (i.e., an estimate of B^) . This can be achieved by
analyzing standard samples prepared with a known concentration c.
(Choosing C at or near CQ minimizes the effects of potential
model specification errors.) Let
16 *>i,,k = Yi,j,k - lnc * Bi
Since
E(bi) = Bi

b^ is an estimate of B^ and

Var(bi) = a2Si/J' + a2e/J"K'
where
S ' i j - an effect associated with standard samples
which is random with mean zero and variance
a2Si ;
J1 - the number of standard samples used to
estimate B-[ ; and
K1 - the number of analyses conducted on each
standard sample.

(Note that single analyst variability, cr2€, is assumed to have
the same value for field samples and prepared samples.)

The test statistic is

EQ 17 (Yi-bi-)u0)/[a2s/J + a2s,/J' + a2e/(l/J"K' + 1/J-K)]V2
64
-------
The cost function used to allocate the samples and replicates is:

EQ 18 TC = J-G! + J'-C3 + (J-K + J''K')'C2

where C3 is the unit cost for preparing a standard sample.
Type I and Type II error rates - a and /3 - are achieved if:

EQ 19 o2s/J + a2Si/J! + a2e(l/J'-K' + 1/J'K) = U
where
U = [D2/(Z1_a + Z^)]2,

as defined in the discussion of Model 1.

Minimizing costs subject to the constraint on variance
yields

EQ 20 K = (ae/as)-(C1/C2)1/2,

which is identical to the solution obtained for Model 1, and

EQ 21 K' = (ae/as,)-(C3/C2)1/2/

EQ 22 J- = (as,/U)-[as-(C1/C3)1/2, + 2-ae(C2/C3)L/2 + ag,i,
and
EQ 23 J = J"(as/c7S,)'(C3/C1)1/2.

The solutions for K and K1 are similar. Each increases with
the ratio of sampling to analytical costs and the ratio of
analytical to sampling standard deviations.

SECTION 5 - SAMPLE SIZE REQUIREMENTS: VALUES OF VARIANCE
COMPONENTS UNKNOWN
In this section the interlaboratory and intralaboratory
approaches for obtaining estimates of the variance components
necessary to implement the designs developed in Section 4 are
65
-------
described. As in Section 4, the design objective is to control
the compliance test error rates (i.e., the Type I and Type II
error probabilities). The discussion is simplified by
considering situations where the cost of analysis is signifi-
cantly greater than the cost of sampling, and the sample to
sample variability is at least as large as the analytical
variability:

C2 >:> cl an<^ a^S > Cf2e*

Under these conditions, K = 1 (i.e., each sample is analyzed onl;
once). Also, the value of K1 determined from EQ 21 (i.e., the
number of replicate analyses performed on each standard sample)',
will be set equal to 1 since the cost of preparing standard
samples for estimating Bj_ is significantly less than the cost of
analyzing those samples (i.e., C^ « ^2) •

When K = K1 = 1, the variances used to define the test
statistic are, for Model 1 and Model 2 respectively:

EQ 24 Var(Yj_) = CT2B + (a2s + cr2e)/J

= a2B + a2€,/J
and
EQ 25 Var(Yj_ - bjj = (a2s + a2e)/J + (a2s. + a2e)/J'

= CT2ei/J + a2eii/J1.

(The notations
-------
using interlaboratory data or it may be estimated from the J
measurements of field samples used to form the average when the
compliance test is performed.

As described by Youden (1975), an interlaboratory study
involves M laboratories (between 6 and 12 are used in practice)
which by assumption under Model 1 are randomly selected from the
collection of all laboratories intending to produce measurements
for compliance testing. For the discussion below, let n denote
the number of samples analyzed by each laboratory. (Youden
recommends n = 6 prepared as 3 pairs where the concentrations of
paired samples are close to each other but not identical.)

Let
w^
where {Vj_^j: i=l(l)M; j=l(l)n) are the measurements produced by
the i-th laboratory on the j-th sample, and {Cj: j=l(l)n} are the
concentration levels used in the study. (Youden does not
recommend using logarithms, however the logarithmic
transformation is convenient and is consistent with other
assumptions in Youden's design.) The statistical model
describing the interlaboratory study measurements is:

EQ 26 Wifj = Bi + e"!^

where
BJ[ is an effect associated with the i-th laboratory
and treated as a random variable with mean zero
and variance cr2B; and
67
-------
e''j_^j is analytical error, the sum of single analyst
error and an effect associated with variation
among standard samples, which has mean zero and
variance cr2e • i .

Using standard ANOVA (analysis of variance) techniques, a2B
may be estimated from the "within laboratory" and "between
laboratory" mean squares, Q^ and Q2 :

EQ 27 Q-L = 2(Wi;j - Wi)2/M-(n-l)
and
EQ 28 Q2 = n'Z(Wi - W)2/(M-1).

The estimate is:

EQ 29 s2B = (Q2 - Qi)/n

which reflects differences among the laboratories through the
quantity

EQ 30 Z(Bj_ - B) 2.

Also, QI is an estimate of a2eii.

The compliance test statistic may be defined either as
EQ 3 la R = (Yi - M0)/(SB +
or
2
1/2
EQ 31b R = (Yi - M0)/(SB + S

where s2ei is the sample variance of the J measurements,
se
68
-------
and (Y-j^j = InCXi^j), j = i(i)J) are the measurements obtained
from field samples in the laboratory selected to conduct the
analyses. (Based on the discussion at the beginning of this
section, K is always equal to I. Therefore, the notation
describing compliance measurements has been simplified, i.e.,
Yi,j ~ Yi,j,l)- Note that Q-^ estimates the average variability
over laboratories, whereas s2ei estimates variability for the
laboratory conducting the test. Also, QI is an estimate of
a2eii, the variability associated with the analysis of standard
samples; s2ei is an estimate of the variability associated with
the analysis of field samples.

The ratios in EQ 31a and EQ 31b have approximate t-distri-
butions when the null hypothesis is true. The degrees of freedc:
may be estimated by methods developed by Satterthwaite (1946).
Although it is possible to approximate the degrees of freedom anc
use a percentile point of the t-distribution to define the test,
that approach is complicated. Develop it at this point would be
an unnecessary diversion. Instead, non-compliance will be
inferred when

EQ 32 R > Zx_a

where Z^-a is the (1 - a)th percentile point of the standard
normal distribution. (If R has only a few degrees of freedom,
which is likely, the Type I error rate will be larger than a.
The situation may be improved by using, for example, Z1_a/2 °r
some other value of Z larger than Z1_a. If necessary, exact
values of Z could be determined using Monte Carlo methods.)

The number of samples, J, that must be analyzed for the
compliance test is obtained by specifying that the expression in
EQ 32 is equal to 1-/3 when the true mean is MO + D2 • Tne value
of J may be obtained either by using approximations based on the
normal distribution, the noncentral t-distribution, or by
69
-------
estimates based on a Monte Carlo simulation of the exact
distribution of R.

If EQ 31a is used, the compliance test criterion (i.e., the
expression in EQ 32) becomes
EQ 33 GM(Xi/:J) > CQ-expCZ^a • (sB +

where GM is the geometric mean of the J compliance measurements.
The right side of the inequality is a fixed number once the
interlaboratory study is completed. The advantage of this
approach is the simplicity realized in describing the compliance
test to the regulated community 'in terms of one measured
quantity, the geometric mean. The disadvantage is using Q]_
rather than the sample variance calculated from the compliance
test measurements which is likely to be a better estimate of
variability for the particular laboratory conducting the test.

MODEL 2
Under Model 2, estimates of variance from interlaboratory
study data are unnecessary. Since the laboratory conducting the
analyses for the compliance test is uniquely identified, the
laboratory factor, Bj_, is a sealer, and the variance component,
cr2B/ does not enter the model. The variance estimates needed for
the compliance test can be obtained from the measurements used tc
compute YJ_ and bj_.

The test statistic is

EQ 34 t = (Y-j. - bi - M0)/(s2e./J + s2e, '/J')1/2

which has an approximate t-distribution with degrees of freedom
equal to J + J1 - 2 when the true mean is /i0- (The statistic
would have an exact t-distribution if a2ei were equal to cr2eii.)
Noncompliance is inferred if
70
-------
EQ 35 t > tx_a.

J and J1 are determined by requiring that the probability of the
expression in EQ 35 be equal to 1 - /3 when the true mean is
MO + D2• T^i3 calculation can be made using the noncentral t-
distribution. Where cr2ei = a2eii/ the noncentrality parameter is
D2/[c72€'(1/J + 1/J')]. (Note that this formulation implies a
tradeoff between J and J1 for achieving the compliance test error
rate objectives.) If cr2ei and cr2eii are not equal, the correct
value to replace t;L_a in EQ 35 and values of J and J1 may be
determined using Monte Carlo methods.

SECTION 6 - DISCUSSION AND CONCLUSIONS
Both statistical models considered above are consistent with
reasonable approaches to compliance testing. The two approaches,
however, have distinctly different data requirements.

Model 1, through EQ 32a, reflects "the conventional"
approach to compliance testing. A "target value for control,"
Co, is established (e.g., either a health based standard or a
"best available control technology" standard) and then adjusted
upward to account for both analytical variability and laboratory
differences. Using EQ 33, noncompliance is inferred when the
geometric mean of the compliance test measurements, GM(Xj_ j), is
larger than Co multiplied by a factor which combines estimates
reflecting variability between laboratories, a2B, and analytical
variability within laboratories. Since an estimate of cr2B is
required in the Model 1 approach, an interlaboratory study is
required also. The role of cr2B, which reflects laboratory
differences, is to provide insurance against potentially
conflicting compliance results if one set of samples were
analyzed in two different laboratories. Systematic laboratory
differences (i.e., laboratory bias) could lead to a decision of
noncompliance based on analyses conducted in one laboratory and a
71
-------
decision of compliance based on analyses of the same samples
conducted in another laboratory.

In practice, a2B is replaced by s2B, an estimate obtained
from the interlaboratory study. The variability of this estimate
also affects the compliance test error rates. If the variance of
s2B is large, controlling the compliance test error rates becomes
complicated. Requiring that more field samples be analyzed
(i.e., increasing J) may help. However, increasing the amount of
interlaboratory QC data to reduce the variance of s2B directly
may be the only effective option. Based on interlaboratory QC
data involving 6 to 12 laboratories, which is current practice,
the error in s2B as an estimate of a2B is likely to be as large
as 100%. If interlaboratory QC data were obtained from 30
laboratories, the estimation error still would exceed 50%. "
(These results are based on a 95% confidence interval for CT2B/s2B
determined using the chi-square distribution.) Since
interlaboratory data collection involving 12 laboratories is
expensive and time consuming, it is doubtful if a much larger
effort would be feasible or could be justified.

Using Model 2 and the intralaboratory approach, a regulatory
agency would not attempt to control potential compliance decision
errors resulting from laboratory differences by using an estimate
of "between laboratory" variability to adjust the compliance
standard. Instead, compliance data collected in each laboratory
would be adjusted to reflect the laboratory's unique bias and
variability characteristics. In many situations, bias for any
specific laboratory can be estimated as precisely as needed using
QC samples. Also, the variance of the bias estimate, which is
needed for the compliance test, can be estimated from the same
set of QC sample measurements. An estimate of analytical
variability required for the compliance test can be estimated
from the measurements generated on field samples. Therefore, all
information needed to develop the compliance test can be obtained
72
-------
within the laboratory that produces the measurements for the
test.

From a regulatory management perspective, both approaches
(i.e., Model 1 using interlaboratory QC data and Model 2 using
intralaboratory QC data) lead to compliance tests that satisfy
specified decision error rate objectives. However, the
intralaboratory approach based on Model 2 appears to be the more
direct approach. The design for producing data that satisfy
error rate objectives is laboratory specific, acknowledging
directly that laboratories not only have different bias factors,
but also may have different "within laboratory" variances. Each
laboratory estimates a bias adjustment factor and a variance
unique to that laboratory. Then, the number of samples required
for that specific laboratory to achieve specified error rate
objectives is determined. As a result, each laboratory produces
unbiased compliance data. Also, compliance test error rates are
identical for all laboratories conducting the test. Moreover,
the data used to estimate laboratory bias and precision are
similar to the QC measurements typically recommended for every
analytical program. In summary, the intralaboratory approach
appears, in general, to provide a greater degree of control over
compliance test error rates while using QC resources more
efficiently than the approach requiring interlaboratory QC data.
73
-------
REFERENCES
Satterthwaite, F.E. (1946), "An Approximate Distribution of
Estimates of Variance Components", Biometrics Bulletin, Vol. 2,
pp. 110-114.

Youden, W.J.; and Steiner, E.H. (1975), Statistical Manual of
AOAC. Association of Official Analytical Chemists, Washington,
D.C.
74
-------
DISCUSSION
George T. Flatman
U.S. Environmental Protection Agency
Dr. Bertram Price has something worth saying
and has said it well in his paper entitled,
"Quality Control Issues in Testing Compliance
with a Regulatory Standard: Controlling Sta-
tistical Decision Error Rates."

The Environmental Protection Agency is
emphasizing "Data Quality Objectives." Dr. Price
has expressed the most important of these objec-
tives in his title, "Controlling Statistical
Decision Error Rates." The paper is timely for
EPA because it demonstrates how difficult the
statistics and the implementation are for data
quality objectives.

In Section 1.. .Introduction, an "interlabora-
tory study approach" is suggested for establish-
ing "bias and variability criteria." This is
theoretically valid but may not be workable in
practice. In contract laboratory programs,
standards are in a much cleaner matrix (dis-
tilled water instead of leachate) and sometimes
run on cleaner instruments that have not just
run dirty specimens. Standards or blank samples
cannot avoid special treatment by being blind
samples since they are in a different matrix
than the field samples. Thus, in practice, the
same matrix and analytical instruments must be
used to make "interlaboratories study" an un-
biased estimate of the needed "bias and vari-
ability criteria." Both the theory and the
implementation must be vigorously derived.

In Section 2...Statistical Models the enumer-
ation of the components of variation is important
for both theory and practice. More precise
enumeration of variance components than the
mutually exclusive and jointly exhaustive theory
of "between and within" is needed for adequate
sampling design. I agree with Dr. Price that
"simplification, limiting the number of variance
components, does not limit the generality of
subsequent results," but I suggest it makes
biased or aliased data collection more probable.
For example, the Superfund Interlaboratories
Studies of the Contract Labs has identified the
calibration variance of the analytical instrument
as the largest single component of longitudinal
laboratory (or interlaboratories) variance.
If this component of variation is not enumerated
explicitly, I suggest this component of variance
could be omitted, included once, or included
twice. If all the field samples and lab repli-
cate analyses were run between recalibrations of
the analytical instrument, the recalibration
variance would be omitted from the variances of
the data. If the analytical instrument were
recalibrated in the stream of field samples and
between lab replicate analyses, the recalibration
variance would be aliased with both the sample
and lab variances, and thus added twice into the
total variance. With these possible analyses
scenarios the recalibration component of variance
could be either omitted or included twice. This
potential for error can be minimized through the
vigorous modeling of all the process sources of
variation in the components of variance model.
This is not a criticism of the paper out it is a
problem for the implementation of this paper by
EPA's data quality objectives.

Section 3...Statistical Test is very important
because it specifically states the null and
alternative hypotheses with their probability
alpna of type I error and probability beta of
type II error. This may appear pedantic to the
harried practitioner, but due to the importance
of the decision is absolutely essential to data
quality objectives. Dr. Price's alternative
hypothesis and his beta-algebra is complicated
by EPA's interpretation of the law, "no exceed-
ence of background values or concentration
limits" (40 CFR part 264). This requires an
interval alternative hypothesis

Hi : u > Mo

rather than Dr. Price's point hypothesis

H-J : u = U0 + D.

Lawyers should be more aware of how they increase
the statistician's work. Beta is a function or
curve over all positive D.

I think it is important to mention in any
environmental testing that beta is more critical
or important than in historical hypotheses test-
ing. Classically the hypotheses are formulated
so that a type II error is to continue with the
status quo when in fact a new fertilizer, brand
of seed potato, etc., would be better. Thus, the
loss associated with the type II error is low and
its probability of occurrence can be large (e.g.,
20 percent) in agricultural experiments. This is
not true in environmental hypotheses testing!
The hypotheses usually make a type II error the
misclassification of "dirty" as "clean" with a
loss in public health and environmental protec-
tion. Thus, beta representing the probability of
this loss in public health and environmental
protection should be set arbitrarily low like
alpha (1% or 52) .

Sections 4 and 5...Sample Size Reouirements
derive equations for numbers of field samples
and lab replicates as a function of cost and
variances. The formulas digitize the process
for precise decis'ions between number of field
samples and number of lab replicates. The for-
mulas indicate that an analysis instrument like
GCMS, because of its high incremental analysis
cost and low variance requires few replications
(K=l), but other analysis instruments such as
radiation counters may not. These formulas have
a practical value because of the diversity of
analysis instruments and pollutants.

Section 5...Sample Size Requirements: Values
of Variance Components Unknown detail the rigors
of variance components estimation through unknown
degrees of freedom and non-central t-distribution.
75
-------
It might be asked, is not only the sum of var-
iances needed for testing or "quality assurance"
(i.e., rejection of outliers). This is true, but
"quality improvement" requires the estimation of
each component of variance. The analysis is more
meaningful and usable if the individual compo-
nents have an estimate.

Section 6...Discussion and Conclusions state
that interlaboratories QC model(variable effects)
and intralaboratory QC model (fixed effects)
"lead to compliance tests that satisfy specified
decision error rate objectives." This theoreti-
cal position of the paper is confirmed by the
empirical findings of the Superfund Interlabora-
tories Comparison of the Contract Laboratories.
This study found that within-lab variance is of
corresponding magnitude to between-lab variance.
The appropriate test and model should be used
that correspond to the use of one lab or more
than one lab in the actual chemical analysis of
the data.

In conclusion, Dr. Bertram Price has rigor-
ously presented the algorithms and the problems
for "Controllina Statistical Decision Error
Rates." This paper enumerates the statistical
problems in applying hypothesis testing to real
world data. Unfortunately, hypotheses testing is
made deceptively simple in many textbooks and'the
true complexity is discovered in practice through
the expensive consequences of a wrong decision.
The serious problems discussed in Or. Price's
paper are needed to sober the superficial use of
"alphas, betas, and other probabilities" in data
quality objective statements. The paper is a
timely and vigorous summary of components of vari-
ance modeling and hypotheses testing.

Acknowledgments: The discussant wishes to thank
l-orest Garner and Evangelos Yfantis for their
advice, review, and insight gained from Super-
fund interlaboratories testing.

Notice: Although the thoughts expressed in this
discussion have been supported by the United
States Environmental Protection Agency, they have
not been subject to Agency review and therefore
do not necessarily reflect the views of the
Agency and no official endorsement shculd.be
inferred.
76
-------
ON THE DESIGN OF A SAMPLING PLAN TO VERIFY COMPLIANCE WITH EPA STANDARDS
FOR RADIUM-226 IN SOIL AT URANIUM MILL-TAILINGS REMEDIAL-ACTION SITES
R.O. Gilbert, Pacific Northwest Laboratory; M.L. Miller, Roy F. Weston,
Inc.; H.R. Meyer, Chem-Nuclear Systems, Inc.
1.0 INTRODUCTION

The United States government is required under the Uranium Mill Tailings
Radiation Control Act (U.S. Congress Public Law 95-604, 1978) to perform
remedial actions on inactive uranium mill-tailings sites that had been federally
supported and on properties that had been contaminated by the tailings. The
poc
current Environmental Protection Agency (EPA) standard for Ra (henceforth
denoted by Ra) in soil (EPA, 1983) requires that remedial action must be taken
if the average concentration of Ra in surface (0- to 15-cm) soil over any
area of 100 square meters exceeds the background level by more than 5 pCi/g,
or if the average exceeds 15 pCi/g for subsequent 15-cm thick layers of soil
more than 15 cm below the surface. Since there are many thousands of 100
square-meter areas that must be evaluated, the soil sampling plan should be
as economical as possible while still meeting the intent of the regulations.
After remedial action at a site has }>een conducted, the field sampling
procedure that has been used to determine whether the E.PA standard was'met was
to first grid the entire site into 10-m by 10-m plots. Then, in each plot,
20 plugs of surface soil were collected and physically mixed together from
which a single 500-g composite sample was withdrawn and assayed for Ra. If
this measurement was > 5 pCi/g above background, then additional remedial
action was required. Recently, based on cost considerations and the study
described in Section 2.0, the number of soil plugs per composite sample was
reduced from 20 to 9.
In this paper we discuss a verification acceptance-sampling plan that is
being developed to reduce costs by reducing the number of composite soil samples
that must be analyzed for Ra. In Section 2.0 we report on statistical analyses
of Ra measurements on soil samples collected in the windblown mi 11-tailings
flood plain at Shiprock, NM. These analyses provide guidance on the number
and size of composite soil samples and on the choice of a statistical decision
rule (test) for the acceptance-sampling plan discussed in Section 4.0. In
Section 3.0, we discuss the RTRAK system, which is a 4-wheel-drive tractor
equipped with four Sodium-Iodide (Nal) gamma-ray detectors. The RTRAK is being
developed for measuring radionuclides that indicate the amount of Ra in surface
soil. Preliminary results on the calibration of these detectors are presented.

77
-------
2.0 PERCENT ACCURACY OF MEANS AND PROBABILITIES OF DECISION ERRORS

In this section we statistically analyze Ra measurements of composite
soil samples collected from the windblown mi 11-tailings flood-plain region at
Shiprock, NM. This is done to evaluate the impact on probabilities of false
positive and false negative decision errors resulting from reducing the number
of soil plugs per composite soil sample from 21 to 9 or 5 and from collecting
1, 2, or 3 composite samples per plot. We also consider how these changes
affect the accuracy of estimated mean Ra concentrations.

2.1 FIELD SAMPLING DESIGN
The Shiprock study involved collecting multiple composite soil samples
of different sizes from 10 plots in the flood-plain region after an initial
remedial action had occurred. Five sizes of composite samples were collected;
those formed by pooling either 5, 8, 9, 16, or 21 plugs of soil.
Figure 1 shows the windblown mill-tailings flood-plain region and the
location of ten 30-m by 30-m study areas from which composite soil samples.
were collected. Eight- and 16-plug composite samples were formed by pooling
soil plugs that were collected over the ten 30-m by 30-m areas according to
the three sampling patterns shown in the lower half of Fig. 2. The 5-, 9-,
and 21-plug composite samples were formed by pooling soil plugs collected
from only the central 10-m by 10-m plot in each 30-m by 30-m area using the
three patterns shown in the upper half of Fig. 2.
Up to nine composite samples of each type were formed in each of the ten
areas. Each composite sample of a given type used the same pattern that had
been shifted slightly in location. For example, referring to Fig. 2, the
21-plug composite sample number 1 in a given 10-m by 10-m plot was formed by
pooling soil plugs collected at the 21 positions numbered 1 in the plot.
This design allowed replicate composite samples of a given type to be collected
without altering the basic pattern that would be used in practice.
Each soil plug was collected to a depth of 15 cm using a garden trowel.
The plugs collected for a given composite sample were placed in a bucket and
mixed vigorously by stirring and shaking. The composite sample analyzed for
Ra consisted of about 500 g of the mixed soil.
78
-------
10-m by 10-m Plots Where 226Ra Concentrations
Were Expected to Exceed 5 pCi/g
FIGURE 1. Location of the Ten 30-m by 30-m Areas in the Windblown Mill-
tailings Flood Plain Region at Shiprock, New Mexico," Within
Which 'Multiple-composite Soil Samples were Collected Following
Initial Removal of Surface Soil.
79
-------
-10 m
21-Plug Composites
9-Plug Composites
T
1.8 m
i
i
10 m
m
m
16-Plug Composites
8-Plug Composites
1. 3, 5, 7 and 9
5-Plug Composites
Positions Where
Soil Cores Were
Taken
m
8-Plug Composites
2, 4, 6 and 8
FIGURE 2. Sampling Patterns for 5-, 8-, 9-, 16-, and 21-plug
.Composite Soil Samples Collected From Ten 30-m by
.. 30-m Areas in the Windblown Mill-tailings Flood Plain
"•v-:': at Shiprock, New Mexico.
80
-------
2.2 DESCRIPTION OF THE DATA
The Ra measurements for the composite samples are plotted in Figs. 3, 4,
and 5. The figures also give the arithmetic mean, x, the standard deviation,
s, and the number of replicate composite samples, n. We wish to determine
the extent to which the true standard deviation, <7, increases when fewer than
21 plugs are used to form a composite sample. To avoid confusion, we point
out that Figs. 4 and 5 indicate that Ra measurements of most 5-, 9-, and 21-
plug samples from Areas- 1, 3, and 4 are larger than measurements for the 8-
and 16-plug samples from those areas. This is believed to have occurred
because the soil in the central 10-m by 10-m plot (from which 5-, 9-, and 21-
plug composite samples were formed) had higher concentrations of Ra than the
soil in the 30-m by 30-m areas from which the 8- and 16-plug samples were
formed (see Fig. 1).
Measurements for Areas 8, 9, and 10 were below 5 pCi/g (Fig. 3) and the
standard deviations ranged from 0.2 to 0.8 pCi/g, with no apparent trends in
s with increasing number of plugs per sample. The data in Fig. 4 indicates
that 5-plug sample data sets may be more skewed than those for 9- or 21-plug
samples, at least for some plots. The measurements for Areas 1, 4, and 7 (Fig.
5) had higher means and were more variable than those for the areas in Figs.
3 and 4. In Fig. 6 are plotted the values of s from Figs. 3, 4, and 5 to
show more clearly the changes in s that occurred as the number of plugs per
composite sample changed.

2.3 ESTIMATING AND MODELING CHANGES IN STANDARD DEVIATIONS
In this section we first estimate the changes in a that occur as the
number of plugs per composite sample decreases from 21 to a smaller number.
Then a model for these changes is developed for use in later sections.
A simple model for the ratio of standard deviations is obtained by assuming
that measurements of Ra in individual soil plugs are uncorrelated, than the
soil plugs are thoroughly mixed together before the 500-g aliquot is removed,
and that the standard deviation between soil plugs does not change as the
81
-------
n
X
.s
«• f\
10
CD "
0
Q. _
>. cr
cu 5
QC
CO
a
n
99999
2.7 2.2 2.5 2.2 2.6
0.7 0.7 0.7 0.6 0.7

yy A ^— — //^
A jfa vv vv y\
8 | 1 | 1
.1 1 ' I 1 L_
99889
0.7 1.4 0.6 1.6 1.5
0.3 0.3 0.4 0.4 0.5

| A | |
1,1.1
89999
0.6 1.4 0.8 1.6 0.8
0.4 0.5 0.2 0.8 0.3

A
fcr
& 1 A i 1
1 g p ( a
5 8 9 16 21 5 8 9 16 21 5 8 9 16 21
Number of Soil Plugs per Composite Sample
~8 _9 10
Area Number
FIGURE 3. Ra Measurements (pCi/g) of 5-, 8-, 9-, 16-, and 21-plug
Composite Soil Samples Taken from Areas 8, 9, and 10 in
the Windblown Mill-tail ings Flood Plain at Shiprock, New
Mexico, x and s are the Arithmetic Mean and Standard
••'."Deviation of the n Measurements for each Data Set.
82
-------
in
ix
IS
'*>ft'
20
i
15
! 05
O
! °- 10
< CD
:cc
CO
c
O
n
45756
2.2 2.2 2.3 1.4 2.4
0.6 1.8 1.2 0.4 0.6

* 1
i i i ' i
99999
6.0 1.9 5.3 1.5 4.7
1.3 0.6 1.2 0.5 0.5

| *
|~A T 1
I ! 1 i 1
99999
5.2 1.9 2.6 1.9 2.5
4.7 0.8 2.1 0.8 0.7
A

A
A
A
. A
A
^ 1 & ^ &
I I ! 1 1
99999
3.8 1.8 1.0 1.7 2.0
1.3 0.4 0.4 0.4 0.8

•
A
1 £
i i i i i
1 5 8 9 16 21 5 8 9 16 21 5 8 9 16 21 5 8 9 16 21
Number of Soil Plugs per Composite Sample

2356
' Area Number
FIGURE 4. 225Ra Measurements (pCi/g) of 5-, 8-, 9-, 16-, and 21-
plug Composite Soil Samples Taken from Areas 2, 3, 5,
and 6 in the Windblown Mi 11-tail ings Flood Plain at
Shiprock, New Mexico, x and s are the Arithmetic
Mean and Standard Deviation of the n Measurements for
each Data Set.
83
-------
h
X
s
OCX

20
en
^ 15

Q.
*
CU
DC
§ 10

95959
10.2 5.6 9.9 4.5 9.0
1.82.1 1.3 0.9 3.1
-

A
A
^ A
$ A-
i
g o
* A A • ^
A.
A A
^ , A

ZS
A

III!!

55555
10.5 3.3 8.5 4.7 9.5
2.9 1.4 3.4 2.3 1.7

^J
A
A
A
A A
A A
A A A A
A

^
A 6

A
A
I I I I !

98 989
13.1 7.9 10.6 7.5 8.0
4.3 1.5 3.1 0.8 1.9
" • " •
A

A A
A

A A '
A
A A
fi A
A A A A

2 A
^ ^ ^ s
^
/\
A

III!!

5 8 9 16 21 5 8 9 16 21 5 8 9 16 21,
~~ Number of Soil Plugs per Composite Sample
4 7
1
Area Number
FIGURE 5.
226Ra Measurements (pCi/g) of 5-, 8-, 9-16- and 21-plug Composite Soil
Samples Taken from Areas 1, 4, and 7 in the Windblown Mill-tailings rlood
Plain at Shiorock, New Mexico, x and s are the Arithmetic Mean and
Standard Deviation of the n Measurements for each Data Set.
84
-------
Mean
1 Area 226Ra,
5 _ Number pCi/g
i CT
\
"o
C.
o
ra

••a
L_
ra
5

7
;6
.3

'8
2
10
9
3

11
no

2
.5

3
2
1
1
: Mean
i Area :22eRa,
Number ; pci/g
'1
:2
7
4

1 5
: s
3
10
6
Q
2
8
4
2
2
1
2
1
21
16
Number of Plugs per Composite Sample
FIGURE 6. Standard Deviations of Multiple Composite Samples from Areas
1 Through 10 at the Windblown^ill-tailings Flood Plain at
Shiprock, New Mexico. Mean DRa Concentrations for each
Area are Given to Illustrate that Areas with Lower Average
Concentrations tend to have Smaller and More Stable Standard
Deviations.
85
-------
sampling pattern (see Fig. 2) changes. Under these assumptions we have the
model

-------
TABLE 1. Comparing Estimated and Predicted Ratios of Standard
Deviations for Composite Samples Formed From Different
Numbers of Soil Plugs.

+ Predicted** Ratios
Ratio of Standard Estimated Ratios Computed Using Computed Using
Deviations Data from Areas 1 through 8 Equation 1
Geometric
Standard
Geometric Error Arithmetic Standard
Mean (GM) (GSE) Mean (AM) Error (SE)
cTq/(7«<* 1.3 1.3 1.6
<75/a21 1.7 1.3 2.2
a5/crg 1.3 1.2 . 1.5
a8/a15 1.4 1.3 1.7
* a. = true standard deviation of j-plug composite
1/2
** r nmn 1 1 +• o H ac (n 1 n \ ' uikior»a n anr) n a VQ -Una t
0.3
0.7
0.3
0.5
samples.
1.53
2.05
1.34
1.41

of soil plugs per composite sample, respectively.
ooc
+ Areas 9 and 10 were excluded because of their very low and uniform Ra

measurements.

++ GSE = exp (Sp/v/O where s,, is the estimated standard deviation of the

natural logarithms (n =8).
87
-------
1 ra
\
'o
,2 4
I 3
OJ
O
•a
h-
« O
•o -i
a
1 -
A5-Piug Composite Sample
O 9-Plug Composite Sample
+ 21 -Plug Composite Sample
A
;s = 0.43 + 0.22x(9-Plug)
r = 0.76
s = 0.41 -f, 0.26x(5-Plug)
r= 0.71
s = 0.10 -f 0.23x(21-Plug)
r = 0.87
4 5 6 7 8 9 10

I Arithmetric Mean, x (pCi/g)
11 112 13 14
FIGURE /. Least-Squares Linear Regression Lines Relating the
Standard Deviation of Replicate Composite Samples
from a Plot to the Estimated Mean Concentration of
"DRa for the Plot.
88
-------
Substituting Eq. (3) in Eq. (2) gives

ap = (0.10 + 0.23/i)(p2/p1)1/2 (4)

which is the model used here to predict the standard deviation of p.-plug
composite samples, where p. < 21. The equations for 5- and 9-plug samples in
Fig. 7 were not used to predict standard deviations because of the relatively
small correlations (r) obtained for those data.

2.4 PERCENT ACCURACY OF ESTIMATED MEAN Ra CONCENTRATIONS
Using Eq. (4) and assuming that Ra measurements of composite samples are
normally distributed, the following formula was used to estimate the percent-
accuracy with which the post-remedial-action mean Ra concentration for a plot
at Shiprock would be estimated with specified confidence:
Percent Accuracy = 100 Z (0.10+0.23/i) (p2/p1)1/2/(/iV/n), (5)

where Z equals 1.96 or 1.28 if 95% or 80% confidence, respectively, is required,
n is the number of p,-plug composite samples collected in the plot and averaged
together to estimate the plot mean, and \i is the true plot mean. Eq. (5) is
based on the usual formula for estimating the number of samples required to
estimate a mean with prespecified relative accuracy and confidence; see, e.g.,
Gilbert (1987, p. 33).
In Fig. 8 are plotted values of Eq. (5) for 80% and 95% confidence, p. =
5, 9, and 21 plugs, n = 1 and 2 composite samples per plot, and for ji ranging
from 1 to 10 pCi/g. To illustrate the meaning of Fig. 8, consider the plotted
value for 95% confidence, p^ 9, n = 2, and ft = 8. If two 9-plug samples are
from a 10-m by 10-m plot that has a true mean concentration of 8 pCi/g
(including background), then we can be 95% sure that.the arithmetic mean of
the two measurements will fall within about 51% of the true mean.
The curves in Fig. 8 show that approximately doubling the number of plugs
per sample increases the percent accuracy by 20 to 25 percentage points.
Also, the increase in percent accuracy is negligible if more than 4 composite
samples are used.
89
-------
140
120 - \
100
o
5
3
u
u
c
93
U
k.
0!
Q.
80
60
40
20
95%
Confidence
80%
Confidence
Number of Soil Plugs
per Composite Sample
Number of
Composite Samples
1
2
8100
226.
8 10
= True Mean Ra Concentration (pCi/g)
Including Background
?26
FIGURE 8. Percent Accuracies for Estimated Mean Ra
Concentrations in Surface Soil for 10-m by
10-m Plots at the Shiprock, New Mexico Site.
90
-------
By dividing Eq. 5 when p« = 21 and p^ < 21 by Eq. 5 when p2 = p1 = 21 we
obtain (21/p,)1 , which is the factor by which the percent accuracy of 21-
plug composite samples is multiplied to get the percent accuracy of p^-plug
samples. This formula gives 1.5 and 2.0 when Pj = 9 and 5, respectively.
Notice that this factor is not study-site dependent since it does not depend
on fj, or a.

2.5 PROBABILITIES OF REMEDIAL ACTION DECISION ERRORS
In this section the increase in remedial-action decision errors as the
number of plugs per sample declines is quantified. These results are obtained
assuming: (1) that Eq. 4 is an appropriate model for the variance of p.-plug
composite samples (p. < 21), (2) the estimated Ra mean concentration for a
plot based on p,-plug composite samples withdrawn from the plot is normally
distributed, and (3) the mean Ra background concentration is known.
The probabilities of making remedial action decision errors are computed
for three different decision rules:
Decision Rule 1
Take additional .remedial action if x1 + 1.645 a_ /v/n (the upper-95%
Pi
confidence limit on the true plot mean) exceeds 5 pCi/g above background,
where x' is the estimated mean concentration (above background) for the plot
based on n p. - plug composite samples.
Decision Rule 2
Take additional remedial action if x1 exceeds 5 pCi/g above background.
Decision Rule 3
Take additional remedial action if x1 - 1.645 a /v/n (the lower 95%
Pi
confidence limit on the true plot mean) exceeds 5 pC.i/g above background.
Among these three rules, Rule 1 offers the greatest protection to the
public because the probabilities of taking additional remedial action are
greater than for rules 2 or 3. Rule 3 will result in fewer decisions to take
remedial action than rules 1 or 2 for plots with true mean Ra concentrations
near 5 pCi/g above background. Hence, Rule 3 will tend to reduce costs of
91
-------
remedial action. Rule 2 is a compromise strategy in that the probabilities
of taking remedial action fall between those for Rules 1 and 3.
Let us define ft to be the probability that a statistical test will indicate
additional remedial action is needed. When Decision Rule 1 is used, the
probability ft is obtained by computing:
.
"21
where 5 is the EPA limit, /*' is the true plot mean above background, a~, is
the standard deviation of 21-plug composite samples given by Eq. (3), p, is
the number of soil plugs used to form each of the n composite samples from
distribution. I0 is then referred to tables of the cumulative normal
P
distribution to determine ft.
For Decision Rule 2, the same procedure is used except that Eq. (6) is
computed with the constant 1.645 replaced by zero. For Decision Rule 3, the
negative sign before 1.645 in Eq. (6) is replaced by a positive sign.
We computed ft for various' values of ft' when the background Ra concentration
was assumed to be 1 pCi/g (the approximate background value for the windblown
flood plain at the Shiprock site) when n = 1, 2, or 3, and p, = 5, 9, or 21.
The results when n = 1 are plotted in Fig. 9, and the results for one, two,
or three 9-plug composite samples are plotted in Fig. 10.
These figures indicate that:
1. Decreasing the number of plugs per composite sample increases the
probability of incorrectly deciding additional remedial action is needed.
92
-------
Confidence
Lower 95%
Confidence
Rule
(Rule 3)
if 50%
Confidence /
EPA Limit
I ! I I
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
22£
True Mean Ra Concentration (pCi/g) Including Background for a Plot
0 1 2 3 4 5 6 78 9 10 11 12 13 14 15 16 17 18

True Mean 22SRa Concentration (pCi/g) Above Background for a Plot
FIGURE 9. Probabilities of Taking Additional Remedial Action in a
Plot for Three Decision Rules When One 500-g Sample from
a Composite Sample Composed of Eithe^Sl, 9, or 5 Soil
Plugs from the Plot is Measured for Ra.
93
-------
For example, if the upper confidence limit rule is used (Rule 1), if one
composite sample is collected, if the true mean for the plot is 3 pCi/g
above background, and if background is 1 pCi/g, then the probability the
rule will indicate additional remedial action is needed increases from
about 0.40 to about 0.65 if a 9-plug rather than a 21-plug composite
sample is used to estimate the plot mean (see Fig. 9).
2. Decreasing the number of plugs per composite sample increases the
probability of incorrectly deciding additional remedial action is not
needed. For example, if the lower confidence limit rule is used (Rule
3), if one composite sample is collected, if the true plot mean is 10
pCi/g above background, and if background is 1 pCi/g, then the probability
that Rule 3 will correctly indicate additional remedial action is needed
decreases from about 0.60 to about 0.30 if a 9-plug rather than a 21-
plug sample is used (see Fig. 9).
3. Taking more than one composite sample per plot reduces the probability
of incorrectly deciding additional remedial action is needed. For the
example in number 1 above, the probability decreases from about 0.65 to
about 0.45 if two composite samples rather than one are collected to
estimate the mean (see Fig. 10).
4. For plots with mean concentrations near 5 pCi/g above background, the
probabilities of taking additional remedial action are highly dependent
on which decision rule is used. For example, if the upper confidence
limit rule is used (Rule 1), the probability is greater than 0.95 that
the test will indicate additional remedial action is needed when the
plot has a mean Ra concentration greater than 5 pCi/g above background.
But if the lower confidence limit rule (Rule 3) is used, and one 21-plug
composite sample is collected, the probability that the test will indicate
additional remedial action is needed does not reach 0.95 until the true
plot mean is about 20 pCi/g above background. Rule 2 falls between these
two extremes. It achieves a 0.95 probability (for one or more 21-plug
samples) when the true mean above background is about 9 or 10 pCi/g (see
Fig. 9).
The three decision rules may find application at different times in the
remedial action process. The upper confidence limit rule seems most appropriate
94
-------
1.0
Upper 95%
Confidence ,
Rule (Rule 1) /
Lower 95%
Confidence
Rule (Rule 3)
if/ Confidence
'// Rule
'
Number of
9-Plug Samples
. 1
2
3
2 4 6 8 10 12 14 16

True Mean Ra Concentration (pCi/g) Including Background for a Plot

0_ 2 4 6 8 10 12 14 16
22£
True Mean Ra Concentration (pCi/g) Above Background for a Plot
FIGURE 10. Probabilities of Taking Additional Remedial Action in a
Plot for Three Decision Rules if One, Two, or Three
500-g Samples from a Composite Sample Composed of 9 Soil
Plugs are Measured for Ra.
95
-------
at initial stages when it may be prudent to assume that the plot is contaminated
until proven otherwise. The "price" of using this rule is increased remedial
action costs for plots that have true mean concentrations just under 5 pCi/g
above background. The lower confidence limit rule is more appropriate for
plots that are strongly believed to have already been cleaned to below the
EPA limit. Using this rule, the probability of taking additional remedial
action is less than 0.05 when the true plot mean is 5 pCi/g above background
or less.
The magnitude of changes in the probability of making incorrect remedial
action decisions due to changing the number of soil plugs per composite sample
from 21 to a lesser number depends on the particular statistical test used to
make the decision. For example, suppose the decision to take additional
remedial action will be made whenever the estimated plot mean above background
is greater than the EPA limit of 5 pCi/g above background (Rule 2). Also,
assume that the standard deviation of composite-sample Ra concentrations is a
known constant as modeled using the Shiprock data. Then using one or more 9-
plug rather than 21-plug composite samples increases the probability of making
decision errors (incorrectly deciding additional remedial action is or is not
needed) by no more than about 17 probability points. These maximum increases
are over relatively narrow bands of true plot means above background; between
2.5 and 4.5 pCi/g and between 6 and 13 pCi/g. These bands become smaller if
more than one composite sample per plot is used to estimate the plot mean.
If the plot mean is estimated using one or more 21- or 9-plug samples, the
probability of incorrectly deciding additional remedial action is not needed
is small (< 0.05) when the true plot mean above background exceeds about 15
pCi/g.
If Rules 1 and 3 are to yield and probabilities shown in Figs. 9 and 10
the true standard deviation for the plot must be given by Eq. (4). At
contaminated sites where this model does not apply, special soil sampling
studies could be conducted to determine whether Eq. (4) or some other model
is applicable. Alternatively, if several composite samples are collected
from each plot then the standard deviation could be estimated directly for
each plot using those data. Then upper or lower confidence limits would be
computed using the t distribution rather than the normal distribution [see
96
-------
Exner et al. (1985) for an application of the upper confidence limit test].
Use of the t distribution will generally give more decision errors, which is
the price paid when the standard deviation must be estimated. If the mean
background Ra concentration is estimated, this will also increase the standard
deviation and hence the probabilities of making decision errors.
As concerns the comparison of 21-, 9-, and 5-plug samples, the increase
in probabilities of decision errors as the number of plugs per composite sample
is reduced is, on the whole, about the same as shown in Figs. 9 and 10 when
the standard deviation, a , was assumed known. This conclusion is based on
1
probabilities of decision errors we obtained using the noncentral t distribution
and the methods in Wine (1964), pp. 254-260). These results are shown in
Fig. 11 for the case of two composite samples per plot.

2.6 EXPECTED NUMBER OF DECISION ERRORS
The expected number of plots at a remediated site that are misclassified
as needing or not needing additional remedial action depends on the
probabilities of making decision errors and on the frequency distribution of
the true plot means. Fig. 12 shows the frequency distribution of estimated
Ra means for 1053 plots at the Shiprock floodplain site that had undergone an
initial remedial action (removal of soil). Each mean was estimated by the
measurement of one 20-plug composite sample from the plot. Fig. 12 shows
that 83 plots had estimated means that exceeded the EPA standard of 1 pCi/g
above background (6 pCi/g).
We assume for illustration purposes that the histogram in Fig. 12'is
the distribution of true plot means. (When the RTRAK system becomes
operational, it is expected that, following remedial action, all plots will
have Ra concentrations below the EPA limit. Hence, the distribution in
Fig. 12 may be a worst case distribution.) Under this assumption we wish to
determine the effect of using 9 rather than 21 plugs of soil per composite
sample on the expected number of plots that are misclassified. Let ni be the
number of plots in the ith frequency class, Q be the number of classes, and
Pi be the probability of a decision error for a plot with true mean in the
ith class using a chosen decision rule. Then E =£ nipj is the expected
number of misclassified plots for the decision rule.
97
-------
CO
f*
o
o
„ o>
o Z
: C3 CO
1 o —

,11
— :VM
.+- O
«•<
05 M>
I- ra
2'-5
- 0
•= cc
re
O
i_
C_
!1.0r
0.8 -
Upper 95%
Rule .'//
(Rule 1)
50% Confidence
Rule (Rule 2)
21 Plugs
9 Plugs
5 Plugs
EPA Limit
Lower 95
Confidence
Rule (Rule 3)
0
10 12 14 16 18 20
226,
Mean Ra Concentration (pCi/g) Above Background
for a Plot
FIGURE 11. Probabilities of Taking Additional Remedial Action
in a Plot for Three Decision Rules if Two 21-, 9-,
or 5-Plug Composite Soil Samples are Collected and
the t Test is Used to Make Decisions.
98
-------
200--
150--
C/3
*-*
_o
0.
OJ
3
z
100-
2.0 4.0 6.0 8.0 10.0 12.0 14.0 16.0

Ra (pCi/g) (Including Background of 1 pCi/g)
FIGURE 12, Frequency Distribution of Estimated Mean ^"Ra
Concentrations (pCi/g) in Surface Soil following
Initial Remedial Action for 1053 10-m by 10-m
Plots in the Windblown Mill-tailings Flood Plain
at Shiprock, New Mexico.
99
-------
First, we computed E for the 970 plots in the Q = 12 classes in Fig. 12
that had means less than 6 pCi/g, i.e., for plots that met the EPA standard.
Using the probabilities in Fig. 9 for Rule 2 of incorrectly deciding to take
additional remedial action, we found that E = 27.4 and 40.2 for 21- and 9-
plug samples, respectively. Hence, the use of a single 9-plug rather than a
single 21-plug composite in each plot would result in an expected 13 more
plots undergoing unneeded additional remedial action.
Next, we computed E for the 83 plots in Fig. 12 that had means greater
than 6 pCi/g, i.e., for plots needing additional cleanup. Using Rule 2 and
the probabilities of incorrectly deciding no additional remedial action was
needed from Fig. 9, we found E = 12.95 and 19.5 for 21- and 9-plug samples,
respectively. That is, about 7 more plots would not receive needed remedial
action if 9- rather than 21-plug samples were used.
We note that the 83 plots in Fig. 12 that exceeded the EPA standard were
subsequently further remediated:

2.7. LOGNORMAL MODEL
The results in Sections 2.3 - 2.6 were obtained by modeling the
untransformed data under the assumption those data were normally distributed.
We used the W statistic to test for normality and lognormality (see, e.g.
Gilbert (1987) or Conover (1980) for descriptions of this test) of the data
in Figs. 5, 6, and 7. We found that 21-plug samples were more likely to be
normally distributed than the 9- or 5-plug samples, and that 9- and 5-plug
samples were more likely to be lognormally distributed than normally
distributed. Also, the increase in the standard deviation as the mean increases
(see Fig. 7) indicates that the lognormal distribution may be a better model
for these data than the normal distribution.
In this section we investigate the extent to which the probability results
in Section 2.5 would change if the lognormal distribution rather than the
normal distribution was appropriate. To do this, the natural logarithms of
the data in Figs. 3, 4, and 5, were computed and a model was developed for
the standard deviation of the logarithms. We found that after deleting the
data for plots 9 and 10 (the standard deviation of the logarithms (sy) for
these plots were about twice as large as for the remaining eight plots) there
100
-------
was no statistically significant linear relationship between Sy and the mean
of the logarithms. This indicates that the lognormal distribution may be a
reasonable model, at least for plots with concentrations at the level of those
in plots 1 through 8. The pooled standard deviation of the logarithms for
plots 1-8 was 0.4, 0.37, and 0.3 for 5-, 9-, and 21-plug samples, respectively,
The probabilities of taking additional remedial action were computed for
Rule 2 for the case of one, two, or three 5-, 9-, and 21-plug samples using
these modeled standard deviations. This was done by computing

Z^= (In 5 - In /t1)/ ay

and referring Z^ to the standard normal distribution tables, where ay equalled
0.4, 0.37, and 0.3 for 5-, 9-, and 21-plug samples, respectively.
We found that for 9-plug samples, the false-positive error probabilities
for the lognormal case differed by less than two probability points from those
for the normal case for all mean Ra concentrations less than the EPA limit.
Differences in the false-negative rates were as large as 8 probability points
for mean concentrations between 8 and 10 pCi/g above background for the case
of one 9-plug composite sample per plot. These results, while limited in
scope, suggest that the false-positive and false-negative error probabilities
in Section 2.5 may be somewhat too large if the lognormal distribution is
indeed a better model for the Ra data than the normal distribution.
101
-------
3.0 RTRAK AND ITS CALIBRATION

The RTRAK is a 4-wheel-drive tractor equipped with four Sodium-Iodide
(Nal) detectors, their supporting electronics, an industrial-grade IBM PC,
and a commercial microwave auto-location system. The detectors are
independently mounted on the front of the tractor and can be hydraulically
lifted and angled. Bogey wheels support the detectors to maintain a distance
of 12 inches from the ground during monitoring. Each detector has a tapered
lead shield that restricts its field of view to about 12 inches, with overlap
between adjacent detectors. The RTRAK will take gamma-ray readings while
moving at a constant speed of 1 mph. When a reading above a prespecified
level is encountered, red paint is sprayed on the ground to mark these "hot
spots". The automatic microwave locator system provides x-y coordinates with
the count data. This will permit real-time map generation to assist in control
of contamination excavation. Preliminary data indicate that the RTRAK should
be able to detect Ra in soil at concentrations less than 5 pCi/g. Further
tests of the RTRAK1s detection capabilities are underway.
The proper calibration of the RTRAK detectors' is important to the success
of the remedial-action effort. The Na(I) detectors detect selected radon
daughter gamma peaks that are related to Ra. Hence, the RTRAK detectors do
not directly measure Ra, the radionuclide to which the EPA standard applies.
Radon is a gas, and the rate that it escapes from the soil depends on several
factors including soil moisture, source depth distribution, soil randon
emanating fraction, barometric pressure, soil density, and soil composition.
The calibration of the detectors must take these variables into account so
that randon daughter gamma peaks can be accurately related to Ra concentrations
under field conditions.
A field calibration experiment near the Ambrosia Lake, NM, mill-tailings
pile was recently conducted as part of the effort to develop a calibration
procedure. In this experiment the RTRAK accumulated counts of Bi (Bismuth)
for approximately 2-second intervals while traveling at 1 mph. Red paint was
sprayed to mark the locations and distances traveled for each time interval.
For each detector, from 3 to 5 surface soil samples were collected down the
center!ine of each scanned area (Fig. 13). Then, for each of these areas,
102
-------
Location
ector 1 2
! 4
1
2
8 F
3
4
i
eet
r
000
0 0 O
o o o
000

O 0 0
o o o
O O O
o o o
OUll
Plugs,
• • •
N -
o o o
000
0 O O
0 O O
FIGURE 13. Pattern of.Soil-Sample Locations and RTRAK
Detector Readings for Obtaining Data to
Calibrate the Detectors.
103
-------
these samples were mixed and a ~ 500-g aliquot was removed and sealed in a
metal can that was assayed for Ra within a few days and then again following
a 30-day waiting period to permit equilibrium to be established between Rn
and 214Bi.
The data and the fitted least-squares linear regression line are displayed
in Fig. 14. The data for the 4 detectors have been combined into one data
set because there were no important differences in the 4 separate regression
lines. Also shown in Fi-g. 14 are the 90% confidence intervals for predicted
Ra individual measurements. The regression line and limits in Fig. 14 were
obtained by first using ordinary least-squares regression on the In-transformed
data. Then the equation was exponentiated and plotted in Fig. 14. It is
expected that this calibration equation will be adjusted on a day-by-day basis
by taking several RTRAK-detector measurements per day at the same location in
conjunction with measurements of barometric pressure and soil moisture. This
adjustment procedure is presently being developed.
104
-------
90
80 -

^
•9
CO
5:
J-N-*
~\
\j
70 '
60 -

50 -
-
40 -

226Ra

n
R2

30 1
20 ^
10
0 1
I '

10
90% Confidence
Intervals
20
30 40 5

609 KeV, cps
60
70
FIGURE 14. Least-sauares Regression Line for Estimating " Ra
Concentrations (pCi/g) in Surface Soil Based on
RTRAK-Detector Readings of Bi-214 (609 Kev).
105
-------
4.0 COMPLIANCE ACCEPTANCE SAMPLING

As illustrated by Fig. 14, there is not a perfect one-to-one correspondence
214
between RTRAK detector counts for Bi and measurements of Ra in aliquots of
214
soil. This uncertainty in the conversion of Bi counts to Ra concentrations,
and the fact that the EPA standard is written in terms of Ra concentrations,
suggests that soil samples should be collected in some plots and their Ra
concentrations measured in the laboratory as a further confirmation that the
EPA standard has been met. Schilling (1978) developed a compliance acceptance-
sampling plan that is useful for this purpose.
Schilling's procedure as applied here would be to (1) determine (count)
the total number (N) of 10-m by 10-m plots in the remediated region, (2) select
a limiting (small) fraction (PL) of defective plots that will be allowed (if
undiscovered) to remain after remedial action has been completed, (3) select
the confidence (C) required that the fraction of defective plots that remain
after remedial action has been conducted does not exceed PL, (4) enter Table
1 in Schilling (1978) or Table 17-1 in Schilling (1982) with D = NPL to
determine the fraction (f) of plots to be sampled, (5) select n = fN plots at
random for inspection, and (6) "reject" the lot of N plots if the inspection
indicates one or more of the n plots does not meet the EPA standard. (The
meaning of "reject" is discussed below.)
In Step 6, each of the n plots would be "inspected" by collecting three
or four 9- or 21-plug composite soil samples and using these to conduct a
statistical test to decide if the plot meets the EPA standard. The choice of
three or four 9- or 21-plug samples is suggested by the results of our
statistical analyses in Section 2.0 in the windblown mill-tailings flood plain
region at Shiprock, NM.
Steps 4 and 5 can be simplified by using curves (Hawkes, 1979) that give
n at a glance for specified N, PL, and C. Also, the Operating Characteristic
(OC) curves for this procedure (curves that give the probability of rejecting
the lot [of N plots] as a function of the true fraction of plots that exceeds
the standard) can be easily obtained using Table 2 in Schilling (1978) or
Table 17-2 in Schilling (1982).
106
-------
To illustrate the 6-step procedure above, suppose C = 0.90 and PL = 0.05
are chosen, and that the remediated region contains N = 1000 plots. Then we
find from Fig. 1 in Hawkes (1979) that n = 46 plots should be inspected. If
all 46 inspected plots are found to be non-defective, we can be 100C = 90%
confident that the true fraction of defective plots in the population of N =
1000 plots is less than 0.05, the specified value of PL- If one or more of
the n plots fail the inspection, then our confidence is less than 0.90.
As another example; suppose there are N = 50 plots in the remediated
region of interest. Then, when C = 0.90 and PL = 0.05, we find that n = 30
plots should be inspected. Small lots that correspond perhaps to subregions
of the entire remediated region may be needed if soil excavation in these
regions was difficult or more subject to error because of hilly terrain or
other reasons.
The action that is taken in response to "rejecting the lot" may include
collecting three or four 9- or 21-plug composite soil samples in adjacent plots
surrounding the inspected plots that exceeded the EPA standard. The same
statistical test as used previously in the original n plots would then be
conducted in each of these plots. If any of these plots were contaminated
above the EPA limit, they would undergo remedial action and gamma scans using
the RTRAK system, and additional adjacent plots would be sampled, and so forth.
The calibration and operation of the RTRAK Nal detectors would also need to
be double checked to be sure the detectors and entire RTRAK system is operating
correctly.
An assumption underlying Schilling's procedure is that no decision error
is made when inspecting any of the n plots. However, inspection errors will
sometimes occur since "inspection", as discussed above, consists of conducting
a statistical test for each plot using only a small sample of soil from the
plot. When inspection errors can occur, the fraction of defective plots is
artificially increased, which increases the probability of rejecting the lot.
To see this, let P denote the actual fraction of plots whose mean exceeds the
EPA limit, let PI denote the probability of a false-positive decision on any
plot (deciding incorrectly that additional remedial action is needed), and
let ?2 denote the probability of a false-negative decision (deciding incorrectly
that no additional remedial action is needed). Then, the effective fraction
107
-------
defective is Pe = Pl(l-P) + P(l-Pz)- For example, if PI = P£ = P = 0.05,
then Pe = 0.05(0.95) + 0.05(0.95) = 0.095 so that the compliance sampling
plan will operate as if the true proportion of defective plots is 0.095 rather
than 0.05. This means there will be a tendency to reject too many lots that
actually meet the C and PL specifications.
In Section 2.5 we saw, using Ra data from the Shiprock, NM, mill-tailings
site, how P, and P« change with the statistical test used, the true mean
concentration, the number of composite samples, and the amount of soil used
to form each composite sample. If remedial action has been very thorough so
that mean concentrations in all plots are substantially below the EPA limit,
then the true fraction of defective plots, P, will be zero and P = P. (since
C J.
P = 0) will be small. In that case, the probability of "rejecting the lot"
using Shillings' compliance acceptance sampling plan will be small. As
indicated above, this probability is given by the OC curve that may be obtained
using Table 2 in Schilling (1978).
108
-------
5.0 DISCUSSION

In this paper we have illustrated some statistical techniques for
developing more cost-effective sampling plans for verifying that 225Ra
concentrations in surface soil meet EPA standards. Although the focus here
po c
was on Ra in soil, these techniques can be used in other environmental
cleanup situations. Because of the high cost of chemical analyses for hazardous
chemicals, it is important to determine the number and type or size of
environmental samples that will give a sufficiently high probability of making
correct cleanup decisions at hazardous-waste sites. Also, it is clear from
Section 2.5 above that when the level of contamination is close to the allowed
maximum concentration limit, the probabilities of making correct cleanup
decisions depend highly on the particular statistical test used to make
decisions. Plots of probabilities such as given in Figs. 9, 10, and 11 provide
information for evaluating which test is most appropriate for making remedial-
action decisions.
A topic that is receiving much attention at the present time is the use
of in-situ measurements to reduce the number of environmental samples that
must be analyzed for radionuclides or hazardous chemicals. The RTRAK system
discussed in this paper is an example of what can be achieved in the case of
radionuclides in soil. Some in-situ measurement devices may only be sensitive
enough to determine if and where a contamination problem exists. Other devices
may be accurate enough to provide a quantitative assessment of contamination
levels. In either case, but especially for the latter case, it is important
to quantitatively assess the accuracy with which the in-situ method can measure
the contaminant of interest. The regression line in Fig. 14 illustrates this
concept.
It is hoped that this paper will provide additional stimulus for the use
of statistical methods in the design of environmental sampling programs for
the cleanup of sites contaminated with radionuclides and/or hazardous-waste.
109
-------
6.0 REFERENCES

Conover, W. J. 1980. Practical Nonparametric Statistics, 2nd ed., Wiley,
New York.

EPA 1983. Standard for Remedial Actions at Inactive Uranium Processing Sites;
Final Rule (40 CFR Part 19 2). Federal Register 48 (3):590-604-(January
5, 1983).

Exner, J. H., W. D. Keffer, R. 0. Gilbert, and R. R. Kinnison. 1985. "A
Sampling Strategy for Remedial Action at Hazardous Waste Sites: Clean-up
of Soil Contaminated by Tetrachlorodibenzo-p-Dioxin." Hazardous Waste and
Hazardous Materials 2:503-521.

Hawkes, C. J. 1979. "Curves for Sample Size Determination in Lot Sensitive
Sampling Plans", J. of Quality Technology 11(4):205-210.

Gilbert, R. 0. 1987. Statistical Methods for Environmental Pollution
Monitoring. Van Nostrand Reinhold, Inc., New York.

Schilling, E. G. 1978. "A Lot Sensitive Sampling Plan for Compliance Testing
and Acceptance Inspection", J. of Quality Technology 10(2):47-51.

Schilling, E. G. 1982. Acceptance Sampling in Quality Control. Marcel Dekker,
Inc., New York.

Wine, R. L. 1964. Statistics for Scientists and Engineers. Prentice-Hall,
Inc., Englewood Cliffs, New Jersey.
no
-------
DISCUSSION
Jean Chesson
Price Associates, Inc., 2100 M Street, NW, Washington, DC 20037
The presentation by Richard Gilbert
provides a good illustration of several
points that have been made by earlier
speakers. My discussion is organized
around three topics that have general
applicability to compliance testing,
namely, decision error rates, sampling
plans, and initial screening tests.

Decision Error Rates
The EPA standard for Cleanup of Land
and Buildings Contaminated with Residual
Radioactive Materials from Inactive Uran-
ium Processing Sites (48 FR 590) reads
"Remedial actions shall be conducted so
as to provide reasonable assurance
that, " and then goes on to define
the requirements for concentrations of •
radium-226 in the soil. An objective way
to "provide reasonable assurance" is to
devise a procedure which maintains stati-
stical Type II error rates at an accep-
table level. A Type II error, or false
negative, occurs when the site is decl-
ared in compliance when in fact it does
not satisfy the standard. The probab-
ility of a Type II error must be low
enough to satisfy EPA. On the other
hand, the false positive (or Type I)
error rate also needs to be kept reason-
ably low, otherwise resources will be
wasted on unnecessary remedial action.
The aim is to devise a compliance test
that will keep Type I and II errors with-
in acceptable bounds.
Developing a compliance test involves
three steps. First, a plan for collect-
ing data and a rule for interpreting it
is specified. The paper considers sever-
al sampling plans and three decision
rules for data interpretation. Second,
the decision error rates are calculated
based on a statistical model. In this
case, the model involves a normal distri-
bution, a linear relationship between the
variance and mean for composite samples,
and an assumption of independence between
individual soil plugs making up the comp-
osite. The last two components of the
model are based on empirical data.
Third, the sensitivity of the estimated
error rates to changes in the model ass-
umptions should be investigated. This is
particularly important if the same proce-
dure is going to be applied at other
sites. For example, if the estimated
error rates are very sensitive to the
model relating variance and mean, it will
be necessary to verify the relationship
at each site. Conversely, if the error
rates are relatively insensitive to
changes in the relationship, the com-
pliance test could be applied with con-
fidence to other sites without additional
verification.

Sampling Plans
The sampling plan is an integral part
of the compliance test. The paper illus-
trates how sampling occurs at several
levels. There is the choice of plots
within the site. The current plan in-
volves sampling every plot. The proposed
plan suggests sampling a subset of the
plots according to an acceptance sampling
plan. Then there is the choice of the
number and type of samples. One or more
samples may be collected per plot each
composed of one or more soil plugs.
Usually more than one combination will
achieve the required decision error
rates. The optimum choice is determined
by the contribution of each type of sam-
ple to the total variance and by relative
costs. For example, if variability bet-
ween soil plugs is high but the cost of
collecting them is low, and the measu-
rement method is precise but expensive,
it is advantageous to analyze composite
samples composed of several soil plugs.
If the measurement method is inexpensive,
it may be preferable to analyze individ-
ual samples rather than composites.

Initial Screening Tests
The RTRAK is an interesting example of
an initial screening test. Initial scre-
ening tests may be used by the regulated
party to determine when the site is ready
for the "real" compliance test, or they
may be an integral part of the compliance
test itself. In either case, the objec-
tive is to save costs by quickly ident-
ifying cases that are very likely to pass
or to fail the clearance test. For ex-
ample, if the RTRAK indicates that the
EPA standard is not being met, additional
remedial action can be taken before final
soil sampling, thereby reducing the num-
ber of times soil samples are collected
before the test is passed. If the init-
ial screening test is incorporated in the
compliance test, i.e., if a favorable
result in the initial screening reduces
or eliminates subsequent sampling re-
quirements, then calculations of decision
error rates must take this into account.

The "reasonable assurance" stated in
the EPA rule is provided by an assessment
of the decision error rates for the en-
tire compliance test. The development
and evaluation of a practical and effec-
tive multi-stage compliance test is a
significant statistical challenge.
Ill
-------
DISTRIBUTED COMPLIANCE: EPA AND THE LEAD BUBBLE
John W. Holley
Barry D. Nussbaum
U.S. EPA (EN-397F), 401 M St., S.W. Washington, D.C.
This paper discusses a particular class 01
strategies, "DUDbies", tor tne management or
numan exposure to environmental hazaras ana
examines an application or sucn strategies to
tne case or ieao in gasoline. Wnile gasoline
is by no means the oniy source 01 environmental
iead, lor most ol the population it has been
tne dominant source ror many years ana is
certainly the most controllable source. Leaa
is not oniy toxic to people, it is also toxic
to catalytic converters which are used on vehi-
cles to reduce emissions of such conventional
poiiutants as carbon monoxide, hydrocarbons,
ana oxides ot nitrogen. The twin objectives or
protecting people iron lead ana iron the con-
ventional emissions ol vehicles with lead-
disaoieo catalysts led to the nrst Enviro-
nmental Protection Agency (EPA; regulation or
the substance in gasoiine in 1979.- This tirst
regulation covered the totai amount ot lead
aiiouea in eacn gailon or gasoline produced by
& retinery wnen leaded and unieaded gasoiine
are considered together and averaged over a
quarter. it aiso set up temporary stanQaros at
a iess stringent level tor smail retiners.
without tninKing or it in these terms, the
Agency had taxen tne first steps toward recog-
nizing the neeo ior ana implementing a "bubble"
poiicy tor lead. The paper will present some
conceptual toois tor discussing buboies ana
then examine tne application 01 tnis management
approach to gasoiine iead.

Bubbies--GeneraI Principles

in general, a ouoole approach to environ-
mental regulation may oe thought 01 as any
approach that aims at ensuring that environ-
mental exposure to some poiiutant is reduced or
controlled "on the average" whiie accepting
some variability across emitters in the nagni-
tuoe or tneir contribution. "Un tne average"
ana "emitters" are iaeas that oDviousiy require
rurther discussion.

rurooses or buobie regulations

Regulators may use buooies ror at least tour
reasons. First, they may allow institution ot a
stringent regulation that would oe inteasibie
tor each entity to meet, yet might be feasible
tor an inaustry as a whole. Second, bubbles
make it possible to improve the flexibility ot
a regulation trom the standpoint or tne regu-
lateo entities and may thus lessen any negative
economic impacts. Tne classic plant bubble is
a case in point, providing tor operating tiexi-
biiity oy regulating the pollution from the
entire piant rather than tnat from each smoke-
stacK. Truro, DUDDIBS may improve tne
"raiiness" ot application ot tne burdens asso-
ciated witn a regulation. In IMS way regulators
may mitigate the economic impact ot an action
upon rirms that are somehow unusually sensitive
to its provisions. The final reason tor using
a ouooie approach is really derivative ot the
second and third. By minimizing and more
fairly distributing the impact ot a regulation,
the drafter may make badly needed controls
"possible" in a politico-economic sense. Thus
the public health may be protected by a buooie
regulation in a situation where the economic
impact ot a simpler regulation would make it
politically impossible to achieve.

Logical elements ot a buoble

A bubbie regulation always has some
dimension or set ot dimensions along which
compliance is distributed. The most ODVIOUS
such dimension is space, ana is illustrated
again by reference to the pi ant/smokestack
bubble. A lack of compliance in one location
may be balanced otr against greater than mini-
mum compliance in another location. It is
important in planning the implementation ot a
ouoble regulation whether sources across whicn
emissions are to be averagea are part ot a
single legally responsible entity ias in the
plant model) or are eacn themselves separate
corporate entities.
Time is another dimension aiong which com-
pliance may be distributed. Almost ail ot our
regulations are to some degree buooies in this
sense, since tne dimension or time is always
involved in our setting ot compliance periods.
Time even enters into our selection ot the
appropriate units las in cubic feet per
ninutej. This dimension becomes most impor-
tant, though, in a situation where it is
actively and intentionally manipulated in the
design ot tne compliance strategy so as to
achieve one or more ot the objectives or
bubbles that were mentioned above.
In addition to dimension, any successtui
ouobie approach must have some thought given to
wnat, tor want ot a oetter terra, we may can an
integrating medium. This medium must assure
that the results ol our allowing an uneven
distribution ot compliance across some dimen-
sion does not also result in sharp differences
in the cdnsequences ot exposure across that
same dimension. People in one area suffering
from some Kind ot toxic exposure are artoroeo
scant comtort by knowing that in consequence or
their suttering tne people in another area are
not attected at all by the pollutant. So whiie
we are attempting to achieve fairness in dis-
tributing the economic burdens or compliance
among polluters, we must also consider the
question of equity in exposure.
The integrating media in most oubbies are
the classic air, water, soil, and tood. Unaer
some circumstances we may consider the human
body to be an integrating medium, as in tne
case or pollutants whose etfects are cumulative
in the body over a lifetime. The air may mix
the emissions trom stack A and stack B so that
the downwind victim experiences the average or
tne two. Certain pollutants may be oittusea
throughout a body ot water in such a way tnat
heavy emissions on one day may be oaianceo ort
112
-------
aga;nst very light emissions on another day
with tne sarae ertect as it aaily emissions hao
Deen carefully held to an intermediate or
average i eve I .

Ent01 cement considerations

Measurement and/or sampling prooiems may
arise with distributed compliance regulations
tnat are rarely a problem with more conventio-
nal approaches. An example is a scheme tor
averaging automobile emissions across nodeis or
engine lamilies that was considered by the
Agency some years ago. Jitnout a cuobie
approach the certification process is limited
to determining whether each engine tamity meets
a single standard. Under a oubble- approach a
wnoie set ot issues arises around measuring the
emission level ot each tamily within some con-
tiaence iimits--questions 01 sample size and
design and distribution shape rear their heads.
When these vehicles are tested to verily tneir
in-use performance, statistical concerns again
arise as we consider whether the manuracturer
should be held responsible ror the point esti-
mate ot certification emissions, trie lower
conridence limit ito provide maximum protection
ror tne environment, or the upper confidence
limit >to protect the manufacturer against
unp.easant surprises that may be cased upon
sampling error;. These statistical concerns
cieariy nave snarpiy tocussea policy and tegai
imp Iications.
une enect of some distributed compliance
schemes is to unintentionally compromise an
environmental oenelit which arises out 01 in-
dustry quality assurance provisions. In the
simple situation where the manufacturer uust
meet a standard and face dire consequences tcr
railing to do so, some "headroom" is likely to
Cjt iert Between the actual emission level and
tne somewhat higher standard. This gap oenetits
trie environment to tne extent of tne manufac-
turer's intolerance ot risk. A redesign or such
an existing compliance scheme to a distributed
compliance approach with payment of a monetary
penalty for each ton ot pollutant over the
overall stanaaro may lead to an increase in
emissions oy reducing tne manufacturers' uncer-
tainty, even though emissions overall remain
uncer the statutory standard.
The enforcement or buoDie regulations may
cost more than would oe tne case tor simpler
alternatives. This is true because or the com-
plexity ot sampling and measurement and tne
administrative machinery needed to carry out
enforcement. where the buoole regulation pro-
vides significant oenetits to the industry in
tne torm ot flexibility, out costs more to ad-
minister, tne question arises as to whether tne
Agency or the industry should Dear tne cost. An
interesting example ot the working out ot these
problems can oe seen in a groundoreaning regu-
lation tor heavy-duty engine emissions negotia-
ted between the Agency ano various interested
parties. where a small manufacturer finds the
number or tests required by the Agency to estao-
iisn a family's emissions level too Durdensome,
tne firm may eiect a sampling approach tnat uses
rewe: tests. Tne nsK to the environment is
neia constant, leading tc higher risk of having
to pay unmerited non-compilance penalties in ex-
change tor the smaller sample.
DistriDuted compliance systems that sounded
wonderful when oeing discussed in theory oy
policy maxers and economists may contribute to
the development of ulcers oy the Agency's legai
fraternity. The very complexity ot these
schemes may become a major proolem in court,
where the violator can ta*e pot-snots at the
reasonableness of tne regulation and seeK refuge
in the loopholes tnat are the unintended con-
sequence ot complexity. The statistical aspects
or the design of the regulation are put to a
severe test as tne violator's attorneys and
consultants question the Agency's proof that
statistical assumptions were met or question tne
appropriateness ot the methods chosen. Where
compliance is distributed among different firms,
major difficulties may arise over the fixing or
responsibility for a vioiation--a prooiem tnat
may be unlikely to occur with a simpier com-
pllance scheme.
me case ot i eac

history ano background

Lead compounds were first used in gasoiine in
the 1920s to boost octane. Tne effects ot lead
on octane can be seen in the sample response
curve, Figure 1. While this curve is dirterent
for different base gasolines, its essential
feature is a declining octane oenent per unit
or lead as the total lead concentration in-
creases. The nature of this curve creates an
incentive for refiners to spread the amount or
lead they are allowed to use as evenly as possi-
cie over the gallons of leaded gaso1 me-pro-
duced. In addition to increasing octane rating,
lead compounds provide some protection trom
valve wear to older engines designed with soft
vaive seats. This vaive protection is proviceo
by relatively tow concentrations 01 leao com-
pared to tne more than two grams per leaoeo
gallon (gpig> once used in leaded gasoiine for
octane reasons.
As mentioned earlier, lead in gasoiine was
first regulated in Ia7& ooth to reduce lead ror
health reasons and to provide ror avalia&iiity
01 unieaded gasoline. Tougher standards tor
automotive emissions of caroon monoxide tCuJ ano
hydrocarbons iHC; led auto maxers to turn to
catalytic converters as control devices. wideiy
used rirst in 1»75, these devices are very sen-
sitive to poisoning oy lead, phosphorus, ana
other metallic substances.

Types of refineries

Tne refining industry grew up with the auto-
mobile and is thus a relatively old industry.
Refineries are technologically stratified Dy age
based upon tne levei ot technology wnen they
were constructed. Tne geographical development
ot the industry has tenoeo to fol low concen-
trations or population. Thus the older reri-
neries tend to be located in the East. Newer
refineries teno to be located near emerging
centers ot population and more recently oevei-
opeo sources ot crude 01 ; . These newer laciii-
113
-------
ties, incorporating more recent technology, teno
to oe locatea on tne Uest Coast.
As one mignt expect, refineries aiso vary
considerably in size. Figure 2 snows something
or tne size Distribution ot tne inaustry. A
substantial number ot these snail rerinenes
togetner proauce oniy a small part 01 tne totai
gasoiine supply. In certain markets, these
smaii facilities may play an important role aue
to nign transportation costs from areas ynere
yarger ana more efficient refineries are locatea.

Tne lead buboies

Quarterly averaging. The rirst bubble or
averaging approach usea in regulating gasoiine
ieaa emergea a most unconscious ly- in the process
ot selecting an eriicient way to monitor con-
pi lance, since continuous monitoring of each
refinery's output was not practical, and since
requiring that each gallon or gasoline must meet
a stanoara was very inflexible from the indus-
try's standpoint, the first regulations pre-
scrioeo a compliance period during which tne
average concentration or Ieaa couia not exceed
tne standard. Tne selection or a caienaar quai -
ter represents a compromise between environmen-
tai concerns ano tne industry's neec tor tiexi-
oiiity. The dimension tor this bubble, then, is
time. Tne relatively hign concentrations dic-
tate & snort time span in oraer to protect
pub,;c health. Tne integrating media are tne
air and soii from vruch lead emitted in automo-
bile exhaust is taken into the human oody. Tne
environmental concerns regaraing the use or the
quarter are mitigated oy the fact that the gaso-
line distribution system tends to mix gasoiine
from aitterent producers in the marketplace, ana
the air ano soil smooth out, over the course
quarter, tne intensity of human exposure.

Tracing. The second buboie occurred in a
more deliberate fashion with regulations that
oecame effective in iate 1962 and eariy 1963.
These regulations shitted tne basis of the stan-
aara and introduced a system ot trading in .ead
usage rights. The stanaaro was cnanged from one
pertaining to a refinery's pooied gasoiine out-
put tunieaaea ana leaded considered together/ to
a standard applied strictly to ieaaea gasoiine.
Tne original regulation purposefully encouraged
tne increaseo production or unleaded gasoline as
this product was new to the market. By 196^..
unieaaea gasoiine had become a permanent tix-
ture. The change to oase the standard on ieaaea
gasoiine oniy was made so tnat the totai amount
ot ieaa in gasoiine would aeciine with trie per-
centage of gasoline demand that was leaaed.
under tne older pooiea standard the amount of
ieaa per leaded gaiion could increase as the
percentage of leaded aeclined, resulting in a
Siower decline in totai lead use.
Accompanied by a tightening of standards ana
a phasecut or special small refinery stanaaras,
tne tracing system provioea tor an improvement
ir. tne ai.ocatior. ot Ieaa usage among refine-
ries. Tnis was aone oy permitting refineries
wnich neeaea iess ieaa than the standard aliowea
to seii tneir excess to other iess technologi-
cally aovanced refineries. Thus n modern raci-
iity capaoie ot producing Ieaaea gasoline com-
tortaniy at 0.70 gplg could seii the proouct ot
its leaaed gailonage and tne dirierence between
that concentration and the standard or 1.10 gpig
to one or more other refineries which found it
necessary to use more than 1.10 gpig in tneir
leaaed gasoiine. Sucn transactions were required
to occur during the compliance period in ques-
tion ana could occur either within corporate
bounaaries or across them.
Without changing the time dimension, trading
extended the oubole or distributed compliance
system for lead into the dimension ot space.
Incurring no more transportation costs than tne
price of a stamp, a refinery or importer in New
jersey could purchase the right to use ieaa tnat
was not needed by a refinery or importer in
Oregon and thereby legitimize actual lead use
tnat was over the standard. The integrating
media were essentially the same as for quarterly
averaging, but greater reliance was placed upon
the homogenizing effects ot the distribution
system to avoid tne development of "hot spots".

Banking. Responding to a mounting body ot
evidence on tne negative health etrects ot leac
and to the problem or increasea conventional
pollutants from Iead-poisonea emission contro,
systems, tne Agency took further action on ieaa
in eariy 1965. As shown in Figure 3, the re-
suiting regulations reduced the allowable ieaa
concentration by 91% in two stages (from 1.10
gpig to 0.50 gplg on July 1, 1965, and from 0.50
gpig to O.lo gpig on January 1, 1966;. Tnis
sharp tightening ot the standard tor lead was
accompanied by a system of banking whicn effect-
ively extended tne lead bubble over a much
longer time span than the calendar quarter that
was previously allowed.
Under the banking provisions a refiner was
allowed to store away in a bank account the
ditrerence oetween the standard ana either u. 10
gpig or actual Ieaa usage, whichever was larger.
Sucn accumulation of rights was permittea curing
the lour quarters ot calendar 1985. The oankea
lead rights were to be available for use or
transfer to another refiner or importer curing
any future quarter through 1567. Tnus ieaa
rights toregone During 1965 coula be used to
meet the sharply tignter 0.10 gpig stanaard
during 1966 and 1967 after which any remaining
rights expire. The 0.10 actual lead use limi-
tation on rights accumulation was intended to
avoid any incentive for refiners to use iess
than u.10 gplg in leaded gasoline, since this
was the level believed sufficient to protect tne
vaives of some oiaer engines from excessive
wear.
The Agency's predictions ot probaole refiner
behavior when given tne flexibility ot banking
are shown in Figure «, in wnich the concentra-
tions from Figure 3 are weighted by estimates
of leadeo gaiionage. The shaoeo areas during
196S represent the extent to which Agency econo-
mists expected reiineries- to lower lead concen-
trations in oraer to oanx lead rights tor later
use. The snaoed areas tartner to tne right show
tne aitterence between the expected concentra-
tions and the stanaard during the 1966-1967
period when the bankea rights could be usea to
supplement the O.lo gpig allowed under the stan-
aara. As the tigure snows, the Agency expectea
114
-------
only partial use or banking in the tirst quarter
or 1985 aue to the time required for refineries
to revise their planning horizons unaer the new
regulations. The heaviest banking was expected
to occur in the second quarter as refineries
were aDle to take full advantage of the regula-
tion. The third and fourth quarters were ex-
pected to show only slight oanking due to the
55* reduction in the standard to 0.50 gpig.
Predictions for the 1966-1967 period snow de-
ciining lead use in the second year as addi-
tional octane generation capacity was expected
to corae into service in anticipation of the 0.10
standard without banking.
This rinai step in extending a system or
distributed compl iance--a bubble — to cover lead
in gasoline completed what was started by the
decision to use quarters as compliance periods,
greatly extending on a temporary basis the time
span over which rerineries could demonstrate
compliance. Coupled with the trading provisions
to provide lor distriDution over the space di-
mension, the package provided the industry with
a very suostantial degree of flexibility in
meeting a standard which public health neeos
required to oe as stringent as possible. The
oanking ano trading together provided tor an
orderly adaptation by the more obsolete facili-
ties, providing them with the time necessary to
instaii new equipment.

now well it worxed

Use or banking and trading. From the very
oeginning 01 tne trading provisions in 196o,
oetween one tirth ana one third or the reporting
facilities found it either necessary or desira-
ble to purchase lead rights for use in demon-
strating compliance with the regulations. Tne
amount or lead involved in these transactions
was at first small, amounting to about 7* or tne
totai lead used. By the end of 196« this figure
had ciimoed to 20*.
The trading provisions of the regulation
unintentionally permitted facilities blending
aiconoi into leaded gasoline to claim and sell
leao rights based updn their activity. These
facilities, frequently little more than large
service stations, generated leao rights in tne
amount of the product of the 1.10 standard ana
tne nuaioer or gallons of alcohol they blended.
Both the lead and the gallons or leaaed gasoline
into which tne aicohol was bienaed had aireaay
oeen reported by others. While these alcohol
Djenoers increased sharply in numoer starting in
the second quarter or 1984, their activities
generated only a small amount or lead rights.
This appearance 01 a new "industry" as an unex-
pected consequence of tne regulation should
remind the statistician or analyst that "ceteris
parious" is not always the case. Even with all
the avaiia&le information aoout tne regulated
industry to analyze, all else will not be equal
since the regulation itself wi1i cause pertur-
oations, such as tne new and previously non-
existent ciass or blender "refiners".
Tne oanking program provided a great deal or
tiexioility to the industry, and accordingly was
neaviiy used from its outset in the first
quarter or 1965, even though the regulations
were not made rinal until after tne end of the
quarter. About halt of the entities reporting
to the Agency made deposits in that first
quarter, and tne industry held the actual lead
concentration to 0.70 gpig--lower than tne
Agency had predicted--thus banking more lead
rights than expected. Along with the oanking
came a sharp increase in trading activity. The
lead rights, because they no longer expired at
the end or each quarter, were worth more and
were traded in a more rational market where
sellers had more time to seek out buyers and
where brokers arose to place buyers and sellers
in touch with each other. The higher price of
Ieaa rights led to an explosion in the nuaoer or
alcohol blenders. Major refiners' facilities,
which were previously not motivated to buy or
even sell lead rights, began to bank and traae
aggressively, stocking up rights for use in the
1966-1967 transition period at the new more
stringent standard of 0.10 gpig.
Figures 5 and 6 show the leao use outcome of
banking and trading compared to the standards
and Agency predictions at the time the standards
were promulgated. Figure 5 shows concentrations
wh_ile figure 6 introduces leaded gallonage. The
early and vigorous banking reduced concentra-
tions to a lower level than expected, and sub-
stantial oanking continued to occur on into tne
second half of the year under a naIr gram stan-
dard. Actual lead use, as figure 6 shows, was
higher than predicted in both the secona and
third quarters as a result of higher than anti-
cipated leaded gasoline usage. In all, 1965
endea with a net collective bank oaiance in
excess or ten billion grams.
The first quarter of 1986 saw lead rights
leaving the bank at about the rate that the
Agency had predicted.- The second quarter caused
some alarm with a sharp drain on the bank oying
to tne unusually nigh leaded galionage at a
substantially higher concentration, u.*»u gpig,
than predicted. As Figures 5 and 6 show,
though, this ear ly drain was partially otfset oy
lower than expected usage in tne rourtn quarter.

The environmental effect of the regulation
has been an unusually sharp and rapid decrease
in a major pollutant, one that health studies
indicate may be more aangerous at lower con-
centrations and to a broader segment or tne
numan population than used to oe Delieved. The
oanking and tracing appear to have done pre-
cisely what they were intended and expected to,
trading orr lead use lower than the standard in
1985 against higher use in 1966-1967 with a
total lead use over the period about the same as
if the standards had been rigidly held to. It
may be the case that a lead reduction this
severe could not have been achieved without the
distributed compliance approach that was useo.
It is certainly true that a transition to lower
standards was achieved with greatly reduced
economic impact.

Administration and entorcement. The Banking
and trading regulations were conceived with
every intent that the Agency could keep a low
profile and let market mechanisms ao most or the
work. While this was achieved to a suostantiai
degree, the need to ensure compliance involved
the Agency in processing more paperwork than the
115
-------
draiters of the regulations anticipated. It is
prooabiy worthwhile to examine briefly how this
Happened.
The rlood of alcohol blenders swelling the
ranks of the reporting population was not expec-
ted. Blenders had first come onto the scene
with the trading provisions. By the end of 1964
they numbered something over a hundred, selling
smalt amounts of lead credits, generated during
the quarter, to small and/or obsolete refineries
which were not otherwise'able to neet the 1.10
gplg standard. In the first quarter of 1965
well over 200 additional blenders reported,
drawn by the prospect of either immediately
selling their lead usage rights at the sharply
higher prices that prevailed with banking or re-
taining them and speculating on t-he price. As
the word of this opportunity spread among dis-
trioutors and service station chains, the pop-
ulation of these "refineries" exploded, reaching
more tnan 600 by the third quarter of 1965 and
pushing the reporting population above &00.
The numbers themselves would not have been
such a problem for the Agency if all of the
reports had been made correctly. The blenders,
though, were new to this business. They didn't
understand tne regulations, and they lacked the
accounting and legal departments which usually
nandled reporting for large refineries. Tne nost
common error made by the blenders was to attempt
to bank and immediately seil to another refiner
lead rights that could not legitimately be
claimed. This frequently took the fora of
simply multiplying the alcohol gallonage by the
standard il.lO or 0.50 gpig, depending on the
quarter), ignoring the restriction mentioned
eariier that lead lights could be banked only on
foregone lead usage above 0.10 gplg. By the
time the blender filed a report and his error
was detected by the Agency's computer, the
rignts naa already been sold to another party
ano pernaps resoid or used. In addition to the
obvious legal tangle caused by this, there was
the instability or the blender population--the
party responsible for the improperly generateo
rights could not always be found.
Tne enrorcement machinery developed oy the
Agency to nanaie lead phasedown was shaped by
certain reasonable expectations about the re-
porting popuiation--scaIe or operations, number
or reporting entities, relative sophistication,
etc. Trie blenders did not fit these expec-
tations, and the enforcement process developed
considerable congestion until some adaptation
could take place. The computer system developed
to audit reports and especially to match up the
parties in lead rights transfers did precisely
wnat it was designed to do and generated thick
stacks of error output where only a few errors
hao been expected. The further processing of
the errors hao to be done manually and required
Clerical and legal staffing at a level that was
not anticipated. By the time these resources
were increased to tne appropriate levels the
oackiog 07 errors was substantial and the tne
elapsed since the filing of the original reports
maoe sorting things out more difficult.
A further illustration of how the crystal
bail can rail is found in the dirrerence between
true relineries ana the blenders in scale of
operations. True refineries deal in such large
quantities of gasoline and lead that for con-
venience all ol the report forms used thousands
of gallons and kilograms of lead as units. To
report in smaller units would be to claim a
degree of precision lacking in the basic inior-
mation available to the refineries' accounting
departments. The effect of rounding to thou-
sands, trivial to larger refineries, was defi-
nitely not trivial to the blenders, many or whoro
only blended a thousand gallons ot alcohoi in a
quarter. The blenders used whatever units
optimized their profit with a fine disregard for
the proper placement ot decimal points. where
their gallonage was, say, 1,600 gallons, they
would take advantage ot the rounding instruc-
tions on the form to claim credits based upon 2
units of a thousand gallons each. If the amount
was 1,400, they would report in gallons rather
tnan thousands of gallons, often without
labelling the units or putting a decimal point
in the correct position.
All ot these difficulties of enforcement
logistics came into being as a result ot the
complexity ot the bubble or distributed com-
pliance system. With a simple set of rigid
standards there would have been no blenders.
Fortunately, this was a case where the environ-
ment suffered almost no harm as a result 01 tne
unforeseen consequences ot tne regulations,
however embarrassing the situation may nave been
to Agency managers. This was probably mostly
good luck, and should not be counted upon to
happen routinely.

Legai Considerations

The statistician frequently tinds himseii
witn a weii-thought-out concept for a procedure
only to be raced with complications in tne
implementation scheme. Banking and trading
proved no exception to this problem. The idea
ot free trade of lead rignts between parties in
order to increase flexibility of each refinery's
planning was too good to resist. Tne government
even took great pains to stay at "arms distance"
in tne trading process. Prior experience with
the Department of Energy's entitlements program,
in which the Federal government establisneo
tormula upon formula to assure that every reri-
nery got its "fair share" demonstrated that the
Fedeial government was not the best "broxer in
the refinery industry! In this case the EPA was
staying out ot the business.
So, what could go wrong? Since lead rignts
are valuable, there is an incentive to cneat.
The value ot lead rights rose from 3/4 of a
penny to slightly over 4 cents per gram or lead.
Trading and banking transactions are frequently
in the order ot 25 to 50 million grams. Thus
the dollar amounts are in the $1 to *2 million
dollar vicinity. Consequently, monitoring ana
entorcement become major issues. Monitoring and
its requirement tor extra personnel and computer
usage has already been discussed. Entorcement
and the legal considerations are another matter.
Prior to banking ana trading, the regulations
were applied on a retinery by refinery oasis and
entorcement was a fairly straightforward matter.
Under banking and trading the host or possible
violations increased exponentially. The types
ot violations included trading rights that were
116
-------
improperly generated, selling the same rights
twice, ana banking rignts lor a ruture quarter
mat were in tact required for trie current
quarter's compliance. Any or these trans-
gressions, of course, may have ramifications lot
the cuyers ol such lead rights. The situation
Decomes very complex trom an entorcesent stand-
point since frequently rignts are soid to an
intermeaiary wno resells them. Ir tne original
rignts were bogus, or partly Bogus, wno among
ail tne recipients has good rights and who has
bad ones: These are not like counterteit bills;
they are entirely fungible, and determining it a
particular right is legitimate can be a night-
mare. Since banking lasts over several time
periods, bogus rights can be exchanged irequent-
ly, and tracing the source of the -bad rights can
be next to impossible. Further, what action, it
any, should oe taken against the gooo faith
purchaser or such lead rights? This last ques-
tion subdivides into possioie different actions
depending upon whether the purchaser just de-
posits the rights into his account or, alterna-
tively, actuary uses them berore they are
discovered to be bogus. The possibilities seem
end less '
An interesting sidelight to these difficul-
ties is that it is frequently a smacl refiner
with smail amounts of rights that causes the
difficulty. More eftort is expenoeo to chase
snail infractions than can be imagined, and
enforcement policies designed for use witn a
sma:i number or large violators prove awkward
ana unwieidy when dealing with a large number of
smail violators. A second side effect, though
no fault or the designer of the regulation, is
that many refineries find thenseives bankrupt in
today's oil industry. Chasing after lead rights
of a oannrupt concern is generally tar iess than
IruitIui.
Nevertne1 ess, the system has rared remarkaoiy
weii. uver ten billion grams of iead rights
were banned, roughiy two year's worth, and no
one is asking tor government intervention to
maxe ieaa rights trading run more smootniy.
nowever, tne point to be raaoe is that tne sta-
tistician can iii-atrord to wash his hanas or
the problems involved in day-to-day implemen-
tation anc enforcement or the requlations. Me
must guard against being the party who suggesteo
the program and then waixea away wnen some as-
pect didn't worx as planned.

Conclusions
We have tried to provioe in tnis paper an
analytic framework tor understanding the set or
compliance management mechanisms looseiy classi-
fied as "bubbles". we nave seen something ot
the attractive features ot such approaches,
especially trom the standpoint ot the economic
flexibility which they may make possible, but
nave also seen some ot the ways in wrncn things
may go otherwise than as the drafters or the
regulations intended. The lead phasedown
banking and trading system was used to illus-
trate some ot the concepts presented, even
though the statistical problems in this regula-
tion were iess extensive than those with some
other bubble regulations.
Distributed compliance schemes are fasci-
nating to economists, and they are attractive to
higher Agency managers from other professional
backgrounds because of their potential to biunt
the resistance to needed environmental regula-
tion and sugarcoat the regulatory piil. The
statistician must nave a piace in the develop-
ment of these regulations — the questions or
measurement, estimation, and uncertainty triat
a/e frequently involved demand it. The proper
roie of the statistician is not oust that ot
picking up the pieces after things oegin to go
wrong in implementation. Neither is it to be a
nit-picking nay-sayer whose business is to teli
people why "you can't get there from here".
hather the statistician's role should be an
affirmative one — that ot a full partner in tne
regulation development process. As such,
members or the profession must not only serve in
the critical role of assuring a regulation's
scientific integrity (and therefore its enrorce-
ability* but must also lend their creativity ana
special insights to the fundamental design or
the regulation's compliance system, finding ways
to do things where others, perhaps, cannot.
117
-------
Figure 1
GasolinQ octane enhancement from lead
antiknock compounds
Octane number
100 r
95 I-
90
0.5 1.0 1.5 2.0 2.5
Grams of lead per gallon
3.0
Figure 2
Cumulative percentage of total gasoline
production by refinery size percent!le*
Percentage of total gasoline
100 r
80 -
Size percentile of refineries
•Lua-tar III. ]9B3
118
-------
Figure 3
Standards and predicted* lead concentrations
under banking and trading
Grams par leaded gallon
Standard in offset
Predicted by model
12341234123412341234
•Costs and Benefits of Roducjng Load
in Gasoline. Feb.. 1985. p. 11-53.
Figure 4
Lead usage predicted
with and without banking program
Billion grams of lead st«to-* lonk.ns
tOTHIng no-it u O.JO SU. u
r
t-
10 -
Grams predicted
by trie standard
Grams predicted
by the model
Load banned
for future use
12341234123412341234
Looo«c gollenojs for 18BS »id Jot»r
• 1th th« o*cuHptlon o^ BOX reduction in Blcfucllng. EorJlcr
or-* frof p. 11-63 of tnc abovi docu>wnt.
tahan fro« Co»t* ami B«n«fltt of Raduelng Load in
er* actual. Pr«dlct«d cancttn
. F»b. . 19E5.
trationc
119
-------
Figure 5
Predicted* and actual lead concentrations
under banking program
Grams per leaded gallon
•Costs and Benefits of Raaucing Lead
in Gasoline. Feb.. 1965. p. 11-63.
Predicted by model
Actual
Figure 6
Predicted and actual lead usage
with banking program*
Billion arams of lead
steiBIa u Bvfclng •«•
N0 trertir^
Crams predicted
by the model
Actua] ]ead u
12341234123412341234
•Predicted lead ucag* it the mama of In flyjre 4 and it based upon the Agancy'c
predlrted leaded gollonoos. Actua] aollonoge •« htarier than predicted.
120
-------
DISCUSSION
N. Phillip Ross
US Environmental Protection Agency
The concept of bubbles is intrigu-
ing; an umbrella under which trades can
be made which enable regulated indus-
tries within the bubble to meet
environmental standards—standards that
they otherwise may not have been able to
satisfy. This paper describes such a
bubble; an umbrella of time for compli-
ance with lead in gasoline standards.

The idea has logical appeal. Unfor-
tunately, the world in which it is
implemented is not always as logical.
There is an implicit concept of
uniformity that underlies the ideas of
trading and banking. It's okay to have
high levels of pollutants as long as you
balance them against low levels either
at a later point in time or by purchas-
ing "credits." Although the "average"
levels of the pollutant within the bub-
ble's boundaries may be at or below the
EPA standard, there will be many points
within the bubble where levels are well
above the standard. From a public
health point of view, this may not be
desirable. It eventually translates
into periods when the population at risk
will receive exposures to levels greater
than the standard.

As pointed out by the authors, a
major advantage to- use of the bubble in
the case of lead in gasoline was that
many refiners and blenders who could not
immediately meet the standard were able
to continue operations through the pur-
chase of credits. Indeed, imposition of
the standard on many of these companies
may well have forced them out of busi-
ness. This is not a minor concern.
Enforcement of environmental standards
is exceptionally difficult. The
regulated industry must be willing to
cooperate through voluntary compli-
ance. The bubble approach, even under
conditions of non-uniformity, provides
the needed incentives to encourage
voluntary compliance. Environmental
standards which cause major economic
hardship for the regulated industry
will be difficult to enforce. Federal
enforcement resources are minimal.
Lack of a substantial enforcement
presence could result in greater pol-
lution through noncompliance. Even
though the real world does not always
conform to the basic assumption of the
bubble model, the real world will use
the approach to achieve an overall
reductions in pollution.

The lead bubble was very successful.
As the authors have pointed out, there
.were problems; however, overall the
levels of lead in gasoline did go down
rapidly. This probably would not have
happened under the more traditional
approach to enforcement.

I agree with the author's conclusion
that statisticians must learn to play
a greater role in developing the
strategies and in "finding ways to do
things where others, perhaps, cannot."
Statistical thinking involves the
consideration of uncertainty in
decisionmaking. All problems cannot be
solved statistically; however, statisti-
cal thinking can help solve problems.
Statisticians need to realize that their
roles are not limited to the design or
analysis components of a study. They
have a role to play in the process of
regulation development and in the
development of new an innovative ways
to deal with enforcement and compliance
problems—ways which are not necessarily
based on mathematically tractable
assumptions.
121
-------
VARIABLE SAMPLING SCHEDULES TO DETERMINE PM10 STATUS
Neil H. Frank and Thomas C. Curran
U.S. Environmental Protection Agency, Research Triangle Park, NC 27711
Introduction
In April 1971, EPA set National
Ambient Air Quality Standards (NAAQS)
for participate matter (PM) and five
other air pollutants - nitrogen dioxide,
sulfur oxioes, carbon dioxide, hydrocar-
bons, and photochemical oxidants.1
There are two types of NAAQS: primary
standards designed to protect human
health and secondary standards designed
to protect public welfare. In recent
years, the standard for hydrocarbons has
been rescinded and standards for an
additional pollutant, lead, have been
added. The reference method for measur-
i ny attainment of the PM standards pro-
mulgated in 1971 was the "high-volune"
sampler, which collects PM up to a nominal
size of 25 to 45 micrometers (urn). This
measure of PM was called "Total Suspended
Particulate (TSP)" and was the indicator
for the 1971 PM standards. The primary
(health-related) standards set in 1971
for particulate matter (measured as TSP)
were 2faU ug/m3, averaged over a period
of 24-hours and not to be exceeded more
tnan once per year, and 75 ug/rr3 annual
geometric mean. The secondary (welfare-
related) standard set in 1971 (measured
as TSP) was 150 ug/m3, averaged over a
period of 24 hours and not to be exceeded
more than once per year.
The gaseous NAAQS pollutants including
carbon monoxide, nitrogen dioxide, ozone,
and sulfur dioxide, are sampled with
instruments which operate continuously,
producing data for each hour of the year.
This data is subsequently processed into
various statistical indicators necessary
to judye air quality status and attain-
ment witn their respective standards.
Lead anc TSP are NAAQS pollutants sampled
on an intermittent basis. For these
pollutants, one integrated 24-hour mea-
surement is typically scheduled every
sixth day. This is designed to produce
measurements which are representative of
every day of the week and season of the
year. This approach has been shown to be
useful in producing unbiased estimates of
quarterly and annual average air quality,
out nas various limitations regarding
estimation of peak air quality values.
One shortcoming of concern was that
attainment of the short-term 260 ug/m3
TSP standard could be judged using data
typically collected every sixth day and
there was no specified adjustment for the
effect of incomplete sampling. This was
recognized as a problem in the early
197u's. If the second highest observed
TSP measurement was less than 260 ug/m3,
the primary health related standard was
juaged as being attained. These stan-
dards were termed "deterministic."
Pursuant to the requirements of the
1977 amendments to the Clean Air Act, EPA
has reviewed new scientific and technical
data and has promulgated substantial
revisions to the particulate matter
standards.2>3 The review identified the
need to focus from larger, total parti-
cles to smaller, inhalable particles that
are more damaging to human health. The
TSP indicator for particulate matter has
therefore, been replaced with a new
indicator called PM^o that only includes
those particles with an aerodynamic
diameter smaller than or equal to a
nominal 10 micrometers. A 24-hour
concentration of 150 ug/rn^ levels was
selected to provide a wide margin
of safety against exposure which is asso-
ciated with increased mortality and
aggravation of respiratory illness; an
annual average concentration of 5U ug/m3
was selected to provide a reasonable
margin of safety against long-term degra-
dation in lung function. The secondary
standards were set at the same levels to
protect against welfare effects. The
EPA review also noted that tne relative
protection provided by the previous short-
term PM standards varied significantly
with the frequency of sampling. This
was identified as a flaw in both the
form of the earlier TSP standard and the
associated monitoring requiremants.
Following the recommendations of the EPA
staff review, the interaction between
the form of the standard and alternative
monitoring requirements was considered
in developing the recently promulgated PM
standards.
Form of the New PMm Standards
The new standards for particulate
matter are stated in terms of a statis-
tical form. The 24-hour standards were
changed from a concentration level not to
be exceeded more than once per year to
a concentration level not to have more
than one expected exceedance per year.
This form corresponds to the one promul-
gated for the revised ozone standard in
1979.4 The annual standards were changed
from an annual average concentration not
to be exceeded to an expected annual
average concentration. To be more con-
sistent with pollutant exposure, the
annual average statistic was also changed
from a geometric mean to an arithmetic
mean.
The attainment tests, described for
the new expected value forms of the
particulate matter standards, are
designed to reduce the effects of
year-to-year variability in pollutant
concentrations due to meteorology,
and unusual events. For the new 24-hour
PM standard, an expected annual
122
-------
number of exceedances would be estimated
from observed data to account for the
effects of incomplete sampling following
the precedents set for the ozone stan-
dard. With averaging of annual arithme-
tic means ana estimated exceedances over
a multiple-year time period, the forms of
these standards will permit more accurate
indicators of air quality status and will
provide a more stable target for control
strategy development.
The adjustments for incomplete data
and use of multi-year time periods are
significant improvements in the inter-
pretation of the particulate matter
standards. These changes increase the
relative importance of the 24-hour stan-
dard and play an important role in the
development of the PM^o monitoring
strategy. They also help to alleviate
tne implicit penalty under the old form
that was associated with more complete
data. The review of alternative forms of
the 24-hour standards identified that
the ability to detect nonattainuent
situations improves with increasing
sample size. This is true for the pre-
vious "deterministic" form and the
current statistical form. With the
new 24-hour attainment test, however,
there is a significant increase in the
probability of failing the attainment
test with incomplete data sets. This
sets the stage for attainment sampling
strategies.
Figure 1 presents the probability of
falling the 24-hour "attainment tests for
the new PM^Q NAAQS over a 3-year period.
These failure probabilities were based on:
(1) a constant 24-hour PM^g exceedance
probability from an underlying concentra-
tion frequency distribution with a speci-
fied characteristic high value (concen-
tration whose expected number of exceed-
ances per year is exactly one), and (2)
a binomial distribution of the number of
observed exceedances as a function of
sample size. Lognormal distributions
with standard geometric deviations (sgd)
of 1.4 and 1.6 were chosen for this
illustration to represent typical air
quality situations. The approach used
in Figure 1 and throughout this paper
are similar to analyses presented else-
where. 5,6,7 This facilitates examining
properties of the proposed standard in
terms of the relative status of a site to
tne standard level (e.g. 20 percent above
the standard or 10 percent below the
standard) and the number of sampling days
per year. It is worth noting that the
percent above or below the standard is
determined by the characteristic high.
This is more indicative of the percent
control requirements than using the
expected exceedance rates.
Sampling frequency was judged to not
oe an important factor in the ability to
identify nonattainment situations for
eitner the current or previous annual
standards. This is due to the generally
unbiased nature and small statistical
variability of the annual mean which is
used to judge attainment with this stan-
dard. The change to an expected annual
mean form, however, would tend to provide
better estimates of the long-term pol-
lutant behavior and provide a more stable
indicator of attainment status.
With the new 24-hour attainment test,
one important consequence of increased
failure probabilities is the potential
misclassification of true attainment
areas. In Figure 1, it can be seen
that these Type I errors are generally
higher for small sample sizes, including
those typical of previous TSP monitoring.
This error is shown to be as high as 0.22
for a site which is 10 percent below the
standard and has a sampling frequency of
115 days per year.
During the review of the standards,
it was recognized that the ideal approach
to evaluate air quality status would be
to employ everyday sampling. This would
minimize the potential misclassi fication
error associated with the new PM attain-
ment tests. From Figure 1, it can be
seen that this would produce the desir-
able results of high failure probabili-
ties for nonattainment sites and low
failure probabilities for attainment
sites. Unfortunately, existing PM moni-
toring technology as well as available
monitoring resources do not make it
convenient to monitor continuously
throughout the nation. Moreover, while
more data is better than less, it may not
be necessary in all situations. When we
revisit Figure 1, it can be seen that
when a site is considerably above or
below the standard, small sample sizes
can also produce reasonably correct
results with respect to attainment/
nonattainment decisions. Thus, in order
to balance the ideal and the practical, a
monitoring strategy was developed which
involves variable sampling schedules to
determine PM^y status and attainment with
the new standards.
The new strategy will permit most
locations to continue sampling once in 6
days for particulate matter. Selected
locations will be required to operate
with systematic sampling schedules of
once in 2 days or every day. With
approval of EPA Regional Office these
schedules may also vary quarterly depend-
i ny on the local seasonal behavior of
PM^U. Schedules of once in 3 days were
not considered because of the discon-
tinuity in failure probabilities occuring
at 115 sampling days per year (95% data
capture), seen in Figure 1 and discussed
el sewhere.^i 1
Monitoring Strategy
The previous monitoring regulations
which applied to particulate matter
specified that "at least one 24-hour
sample (is required) every 6 days except
123
-------
during periods or seasons exempted by
the Regional Administrator."8 The new
PM^j monitoring regulations would permit
monitoring agencies to continue this sampling
frequency for PM^0 but would require them
to conduct more frequent PM^g sampling in
certain areas in order to estimate air
quality indicators more accurately for
control strategy development and to
provide more correct attainment/nonat-
tainment determinations.9 The change in
monitoring practice is largely required
to overcome the deficiency of existing
sampling frequency in detecting exceed-
ances of the 24-hour standard. The
operating schedules proposed for the
measurement of PM^g will consist of a
short-term and long-term monitoring plan.
The short-term monitoring plan will be
based on the requirements and time
schedules set forth in the new PM^g
Implementation Regulations for revising
existing State Implementation Plans
(SIPs).^a The requirements ensure that the
standards will be attained and properly
maintained in a timely fashion. The
long-term requirements will depend on
PM}Q air quality status derived from future
PMig monitoring data. These are designed
to ensure that adequate information is
produced to evaluate PMjg air quality
status and to ensure that the standards
are attained and subsequently maintained.
Consistent with the new reference
sampling principle, available PM^j
instruments only produce one integrated
measurement during each 24-hour"period.
Multiple instruments operating with
timers, therefore are necessary to avoid
daily visits to a given location. The
new standards, however, will permit
approval of alternative "equivalent"
methods which include the use of contin-
uous analyzers. 'Because of the new
monitoring requirements, instrument
manufacturers are currently developing
such analyzers. This will alleviate the
temporary burden associated with more
frequent monitoring.
Short-term Monitoring Plan
The proposed first-year monitoning
requirements will be based on the
requirements for revising SIPs.
Areas of the country have been clas-
sified into three groups, based upon the
likelihood that they are not currently
attaining the PM^g standards as well as
other considerations of SIP adequacy.ll
Since PM^y monitoring is in the process
of being established nationwide and
is quite limited, a procedure was used
which estimated the probability that each
area of the country would not attain the
new standards using existing TSP data in
combination with available PMig data.
Tnis is described elsewhere.12
Areas have been classified as Group I,
I! or III. Group I areas have been
judged to have a high probability,
p > 0.95, of not being in attainment with
the new standards. Group II areas have
been judged to be too close to call, but
still very likely to violate the new
standards (0.20 _< p <0.95). Group III
areas have been judged to be in attain-
ment (p <0.2U).
For Group I areas, the value of a
first year intensified PMjg data
collection is most important. This
is because these areas are most likely
to require a revised SIP. Since the
24-hour standard is expected to be
controlling, the development of control
strategies will require at least 1
complete year of representative data.
Consequently, everyday sampling for a
minimum of 1 year is required for the
worst site in these areas in order to
confirm a probable nonattainment status,
and to determine the degree of the
problem.
The Group II category identifies
areas which may be nonattainment (out
whose air quality status is essentially
too close to call.) For such areas, the
value of additional PM^g information is
important in order to properly categorize
air quality status. For these areas,
more intensified sampling is desirable.
Based on the consideration of cost, and
available monitoring resources, however,
a more practical strategy of sampling
once in 2 days at the worst site is
required for the first year of monitor-
ing.
All remaining areas in the country
(defined in terms of p<0.20) have been
categorized Group III and judged not
likely to violate the new standards. For
such areas, the value of collecting more
than a minimum amount of PMm data is
relatively low and intensified PM^g data
collection is not warranted. Recognizing
tnat there is still a small chance of
being nonattainment, however, a minimum
sampling program is still required at
these locations. Based on considerations
of failing the 24-hour attainment
test and estimating an annual mean value,
a minimum sampling frequency of once in 6
days is required.
The short-term strategy also contains
previsions for monitoring to be inten-
sified to everyday at the site of ex-
pected maximum concentration if exceed-
dances of the 24-hour standard are
measured during the first year of moni-
toring. This is intended to reduce the
potential for nonattai nment mi sclas-
sification (type I error) with the 24-hour
PMm attainment test. With this provision,
the first observed exceedance is not
adjusted for incomplete sampling and is
assumed to be the only true exceedance at
that location during the calendar quarter
in which it occurred. The effect on
misclassification error associated with a
3-year attainment test is illustrated in
Figure 2. It can be seen that the sites
most vulnerable to this error are slightly
124
-------
less than the standard. In these com-
parisons, for sites whicn are 1U percent
less than the standard and are sampling
once in 2 days, the type I error is
reduced from 6 percent to 1 percent. If
tnese same sites are sampling once in 6
days, the type I error is similarly
reduced from 12 percent to 0.5 percent.
There is, however, a corresponding
increase in the type II error associated
with the attainment test for true nonat-
tainment sites also close to the stan-
dard. This compromise was judged to be
appropriate in developing the new rules.
Long-term Monitoring Plan
Tne long-term monitoring plan starts
with the second year of sampling.' The
required sampling frequencies are based
on an analysis of the ratio of measured
PM^Q concentrations to the controlling
PMig standard. This determination depends
upon an assessment of (1) whether the
annual or 24-hour standard is controlling
and, if it is the latter, (2) the
magnitude of the 24-hour PM^o problem.
Both items are evaluated in terms of the
air quality statistic called the design
concentration. For the annual standard,
tne design concentration is the expected
annual mean; for the 24-hour standard,
tie design concentration is the
characteristic high value whose
expected exceedance rate is once per
year. In both cases the design
concentration is the value the control
strategy must be capable of reducing to
tne level of the standard in order to
achieve attainment. The ratio to the
standard is defined in terms of the
design concentrations and the standard
level; tne controlling standard is simply
the standard which has the highest ratio.
Trns is a somewhat simplified definition
but is adequate for present purposes.
Tne long-term strategy specifies
frequencies of every day, every other
aay, or every sixtn day. Tne long-term
monitoring strategy is designed to
optimize monitoring resources and
maximize information concerning attain-
ment status. As with tne short-term
strategy, the increased sampling fre-
quency provisions only apply to the
site with expected maximum concentra-
tion in each monitoring area.
For tnose areas where the annual
standard is controlling, 1 in 6 day
monitoring would be required; this
frequency has been judged to be adequate
for assessing status with respect to
tnis standard. For those areas where the
24-hour standard is controlling, the
required minimum sampling frequency for
the calendar year will vary according to
the relative level of the most current
maximum concentration site to the level
of the standard. In other words, the
sampling requirement applies to the site
wrnch drives attainment/nonattainment
status for the monitoring area. The
least frequent monitoring (1 i n 6 days)
would be required for those areas where
the maximum concentration site is clearly
above the standard (>_40 percent aoove) or
clearly below the standard (>20 percent
below). For such sites a minimun amount
of data collection would be adequate to
verify correct attainment/nonattainment
status. As the area approaches the
.standard, the monitoring frequency for
the maximum concentration site would
increase so that the misclassification
of correct attainment/nonattainment
status can be reduced. If the area is
either 10-20 percent below or 20-40
percent above the 24-hour standard, 1 in
2 day mom'toriny would be required. When
the area is close to the standard, i.e.
10 percent below to 20 percent above,
everyday sampling would be required in
order to improve the stability of tne
attainment/nonattai nment classi fication.
Figures 2 and 3 illustrate mi sclassi fi -
cation rates for a 3-year, 24-hour
attainment test as a function of the
relative status of a site to the standard
and in terms of alternative sampling
frequencies. As witn previous analyses,
underlying lognormal distributions with
sgd's of 1.4 and 1.6 for attainment and
nonattainment sites are utilized.
For sites following the long-term
incomplete sampling schedules (1 in 6
days and 1 in 2 days) misclassification
rates can be maintained in or below the
neighborhood of 5-10 percent.
Summary
The revisions to the PM standards
improve the ability to identify non-
attainment situations, provide for more
stable pollutant indicators, and change
the relative importance of the annual
and 24-hour averaging times. With the
required adjustments for incomplete
sampling in the interpretation of PM
data, the revised standard would correct
for the variable protection afforded by
tne current 24-hour PM standard, and it
is expected that the revised 24-hour
standard will generally be controlling.
Monitoring requirements have been
promulgated which will similarly correct
for the deficiency in the current
standards. Variable frequencies are now
required in order to reduce the uncer-
tainty associated with attainment/
nonattainment classification. This
provides more uniform protection by the
standards but at the same time conserves
scarce monitoring resources. The initial
requirements will place the most emphasis
on areas with the highest estimated
probability of violating the PM]_.j stand-
ards wmle the long-term strategy will
allow sampling frequency to vary accord-
ing to the relative status of an area
with respect to the standard concen-
tration levels.
The operational difficulties
associated with implementing the new
125
-------
requirements for everyday monitoring has
generated new research initiatives to
develop a continuous analyzer for PM^o-
Once this is available, particulate matter
can be conveniently monitored everywhere
on the same basis as the gaseous NAAQS
pollutants.
References
1. "National Primary and Secondary Ambient Air
Quality Standards," Federal Register,
36(84):8186. April 30, 1971.

2. Review of the National Ambient Air Quality
Standards for Particulate Matter: Assess-
ment of Scientific and Technical Information,
OAQPS Staff PaperJU. S. Environmental
Protection Agency, Research Triangle Park,
N.C. 27711. EPA-450/5-82-001. January
1982.

3. "Revisions to the National Ambient Air
Quality Standards for Particulate Matter,"
Federal Register, 52(126):24634. July 1 ,

4. "Revisions to the National Ambient Air
Quality Standard for Photochemical Oxidants,"
Federal Register, 44(28):8202. February 8, 1979.

5. Frank, N. H. and T. C. Curran, "Statistical
Aspects of a 24-hour National Ambient Air
Quality Standard for Particulate Matter,"
presented at the 75th APCA Annual Meeting,
New Orleans, LA. June 1982.
Davidson, J. E., and P. K. Hopke, "Implica-
tions of Incomplete Sampling on a Statis-
tical Form of the Ambient Air Quality
Standard for Particulate Matter,"
Environmental Science and Technology, 18(8),
1984.

Frank, N. H., S. F. SI ev a and N. J. Berg, Jr,
"Revising the National Ambient Air Quality
Standards for Particulate Matter - A
Selective Sampling Monitoring Strategy,"
presented at the 77tn Annual Meeting of
the Air Pollution Control Association,
San Francisco, CA., June 1984.

"Ambient Air Quality Surveillance,"
Federal Register. 44(92):27571, May 10,
9. "Ambient Air Quality Surveillance for
Particulate Matter," Federal Register,
52(12b):24736. July 1 , 1987. -

10. "Regulations for Implementing Revised
Particulate Matter Standards," Federal
Register. 52 ( 126 ):24672. July 1, 1987
11
12.
Group I and Group II Areas" Federal
Register, 52(152):29383. August 7, 1987.

Pace, T. G. , and N. K. Frank, "Procedures
for Estimating Probability of Nonattainment
of a PMiQ NAAQS Using Total Suspended
Particulate or Inhalable Particulate Data,"
U. S. Environmental Protection Agency,
Research Triangle Park, N.C. 1984.
:IGURE 1. FAILURE PROBABILITIES FOR 3-YEAR, 24-HOUR ATTAINMENT TEST
WITH CONSTANT SAMPLING RATE
en
?. cc
20X ABOVE STANDARD
O
ru
« o
O I

LOBNORMAL DISTRIBUTIONS:
STANDARD GEOMETRIC DEVIATION
1.6
EXPECTED EXCEEDANCE FORM
ONCE PER YEAR FORM
LOGNOHMAL DISTRIBUTIONS.
STANDARD GEOMETRIC DEVIATION -1.4
122 163 244 305
NUMBER OF SAMPLING DAYS PER YEAR
355
126
-------
NJ
-J
Probability Of Nonattainment Misclassification
0.00 0.05 0.10 0.15 0.20 0.25 0.30
o

5-n
-H ^
> CD
2 ^
n 2
g§
ts
en Jr!
tn r~
n^;
2^
5Q
3D
m
Probability Of Attainment Misclassification
0.00 0.05 0.10 0.15 0.20 0.25 0.30
j I
-------
DISCUSSION
John Warren
US Environmental Protection Agency
The use of the statistical concept of
expectation for comparing monitoring data
with a standard is new and quite intri-
guing as it offers promise of extension
to other standards and regulations. The
difference between existing standards and
the new statistical standards is illus-
trated by the PM-10 standards.
Existing standards:
o The 24-hour concentration is not to
exceed 150 micrograms per cubic
meter more than once per year.
o The annual average concentration is
not to exceed 50 micrograms per
cubic meter.
New standards:
o The expected 24-hour concentration
is not to exceed 150 micrograms per
cubic meter more than once per year.
o The expected annual average concen-
tration is not to exceed 50.
The advantages of the "expected" meth-
odology over the existing methodology
include:
o It has been used in a similar fash-
ion in generating the Ozone standard
and therefore "familiar" to the
public.
o It uses actual data to generate the
results.
o There is a reduction in year-to-ye'ar
variability.
o It enables the development of stable
control strategy targets.
The difference between the two method-
ologies would therefore appear to be
small and hence readily adaptable to
other standards. One possible candidate
for the new methodology would seem to be
Effluent Guidelines and Standards,
Subchapter N, 40 CFR 400-471. These reg-
ulations stem from the Clean Water Act
(1972) and are based on the engineering
standards of Best Practicable Technology
(BPT) or Best Available Technology (BAT).
These guidelines cover mining industries
(minerals, iron ore, coal etc.), natural
products (timber, pulp and paper, leather
tanning etc.), and the manufacturing
industries (pharmaceutical, rubber, plas-
tics, etc.). A typical standard within
these guidelines is the Steam Electric
Power Generating Point Source Category
(Part 423.12, Effluent Limitations Using
BPT) :
BPT Effluent Limitations
Avgs. of Daily
Maximum values for 30
for any consecutive
Pollutant 1 day days shall
or Property not exceed
Although there are small differences
in sampling protocols, comparison with
the new and old PM-10 standards would
seem to imply that a set of standards
devised on an expected basis would be
possible; however, it is not to be.
The problem lies with the very differ-
ent objectives of the regulations, state
versus industry. The PM-10 standard
applies to a State Implementation Plan, a
negotiated agreement between EPA and the
states enforced through the National
Ambient Air Quality Standards and used
to identify non-attainment areas. The
Effluent Guidelines, on the other hand,
apply to a specific industry and is not a
matter of negotiation.
The resolution of the regulatory
problems will be as difficult as the
associated statistical problems of:
o Assumption of lognormality of data
o Stability of the process over time
o Potential autocorrelation of data
o Uncertainties of data quality
o The optimal allocation of monitoring
systems in non-attainment areas.
Despite these problems, it is clear
that a statistical approach, in this case
expected values based on an underlying
lognormal distribution, is probably the
way of the future; research should be
encouraged in this field. Neil Frank and
Thomas Curran have indicated a viable
approach; where will the next step lead?
Total Suspended
Solids
Oil and Grease. .
Copper, total...
Iron, total
100.0 mg/1
20.0 mg/1
1.0 mg/1
1.0 ma/1
30.0 mg/1
15.0 mg/1
1.0 mg/1
1.0 ma/1
128
-------
ANALYSIS OF THE RELATIOHSHIP BZTWKEH MAXIMUM AMD AVERAGE IH S02 TIME SERIES
Thomas Hammerstrom and Ronald E. Wyzga
1. Introduction and Motivation

Several studies have examined the
physiological and 'symptomatic re-
sponses of individuals to various air
pollutants under controlled condi-
tions. Exposures in these experiments
are often of limited duration. These
studies demonstrate response with
exposures as short as five nrmutes.

On the other hand, monitoring data
rarely exist for periods as short as
five minutes. Some measurement
methods do not lend themselves to
short term measurements; for other
methods, 5-minute data often are
collected but are not saved or
reported because of the massive
effort that would be required. In
general, the shortest time average
reported with monitoring data is one
hour, and for some pollutants even
this time average is too short.

Where monitored data do not exist,
ambient concentrations can be esti-
mated by the use of atmospheric
dispersion models.' The accuracy of
these models degrades as averaging
times decrease and they require
meteorological and atmospheric
inputs for the same time average as
predicted by the model. Thus, air
dispersion models are rarely used for
time averages less than an hour.

There is, thus, a fundamental mismatch
in time periods between health
response and exposure, with responses
occurring after only 5 or 10 minutes
of exposure while exposure data are
only available for periods of an hour
or more. This paper attempts to
address this mismatch by examining
the relationship between a short-term
time average (5 minutes) and a longer
term time average (60 minutes) for
one pollutant (S02) for which some
data are available. Understanding
the relationship between the two time
averages would allow the estimation
of response given longer term esti-
mates of ambient concentration. It
could also help in the setting of
standards for long term averages
which would help protect against peak
exposures.

This paper explores the type of
inferences that can be made about
five minute S02 levels, given infor-
mation on hourly levels. There are
three possible models for health
effects which motivate these infer-
ences :
1. there is one effect in an
hour if any 5-minute exposure
level exceeds a threshold,
2. each 5-minute segment
corresponds to an independent
Bernoulli trial with probability
of an effect equal to some
increasing function of the
current 5-minute level,
3. each 5-minute segment is a
Bernoulli trial with the proba-
bility of an effect depending on
the entire recent history of the
S02 process.

Corresponding to these health models,
there are three possible parameters to
estimate:
1. the distribution of the
maximum 5-minute level during an
hour,
2. the distribution of an
arbitrary 5-minute reading,
3. the joint distribution of
all twelve 5-minute readings.
All three distributions are condi-
tional distributions, given the
average of all twelve 5-minute
readings. The first conditional
distribution is the parameter of
interest if one postulates that the
dose response function for health
effect is an indicator function and
only one health event per hour is
possible; the second is the parameter
of interest if one postulates a
continuous dose response function with
each 5-minute segment constituting an
independent Bernoulli trial; the third
conditional distribution is of
interest if one postulates that the
occurrence of a health effect within
an hour depends continuously on the
cumulative number of 5-minute peaks.

This paper discusses some approach-
es to each of these three estimation
problems. Section 2 discusses why
the problem is not amenable to
solution by routine algebra. Sections
3 and 4 present results for the
estimation of the maximum. Section
3 presents some ad hoc methods for
modelling the maximum as a simple
function of the average when both are
known and discuss how to extend these
methods to estimate the maximum when
it is unknown. Section 4 discusses
the error characteristics of these
methods. Section 5 presents an ad
hoc method of estimating an arbitrary
5-minute level from the hourly
129
-------
average; Section 6 discusses the
error characteristics of this method.
Finally, Section 7 presents an
estimation of the joint distribution
of all twelve 5-minute readings,
derived from a specific distribution—
theoretic model for the 5-minute time
series and discuss some of the
difficulties involved with extending
this.

2. Obstacles to Theoretical Analysis

A brief discussion of why we resorted
to ad hoc methods is needed to begin
with. In theory, given a model for
the (unconditional) joint distribution
of the time-series of 5-minute
readings, it is straightforward to
write down the exact formula for the
joint conditional distribution of the
twelve 5-minute readings, given the
average.
If ^ = (Xi , .
f(x) ana if
. . , Xp ) has joint density
~
=
then the conditional joint density is
given by equation 1 .
(1) h(x,;
j f(x) ax
- ~\- -^/
= f(x)
=~x)
: ?x< /p = ~x}
where S is the simplex
and I is, the indicator function.

The conditional distribution of the
maximum and the conditional distribu-
tion of any 5-minute reading would
follow immediately from the condition-
al joint distribution of all twelve
5-minute levels.

Unfortunately, estimation of the
unconditional joint distribution,
f(x), of the 5-minute time-series is
not easy. Non-parametric density
estimation requires gigantic data sets
when one is working in several
dimensions .

Parametric modelling also poses
formidable computational problems. If
f(x;9) is the joint density of the
5-minute levels, then the log likeli-
hood function, based on observing only
_a_ sequence of N hourly averages, x"i ,
xz , . . . , XN , is gi ven by
(2)
(9)
/V f

= ~2_ log )
. <~
f (w;6) dw
Here S, = (w: J Wj /p = x, } for i=1,
2, . . . N. ~" •

Each term on the right hand side is
the integral of a 1 2-dimensional
density over an 1 1-dimensional
simplex. For most reasonable choices
of a joint distribution of the
5-minute readings, these integrals can
only oe evaluated numerically, using
Monte Carlo methods. To find maximum
likelihood estimates of 9, one
must numerically evaluate Lx at
sufficiently many values of 9 to
approximate the maximizing value. 9
is always at least three dimensional
(location, scale, correlation) and N
will be in the hundreds (or
thousands), making numerical maximum
likelihood estimation a nearly
insurmountable task. (Moreover, the
hourly averages in the observed must
not be consecutive hours but must be
far enough apart in time to be
effectively independent; otherwise,
the likelihood function is even more
complicated.)

An additional problem with parametric
modelling is the choice of the
functional form of the joint density
f. One can test hypotheses that the
hourly averages come from one of the
commonly used distributions: lognor-
mal, Weibull, or gamma. However, if
hourly average S02 readings are, say,
lognormal, then 5-minute averages are
not lognormal. In general, one would
expect the hourly averages to tje
closer in shape to the normal distri-
bution than are the 5-minute levels.
(At least, this would be true if
the 5-minute levels have the same
finite variance.) There is no
technique for inferring the functional
form of the distribution of the
individual terms in a sum from the
functional form of the distribution of
the sum.

As an alternative to theoretical
modelling of the relevant conditional
distributions, we have explored some
ad hoc empirical methods of estima-
tion. It is important to bear in mind
that the objective of the exercise is
not merely to determine a functional
form for the relationship between
5-minute levels and hourly averages;
but rather it is to provide specific
numeric estimates that can be used
when the five-minute levels are not
observed. There are no unknowns when
the five-minute levels are known so
the only application of such a
technique is- extrapolation to situa-
tions where no data for new parameter
estimation are available.

3. Estimation of the Maximum

3.1 Nature of the Data

The Electric Power Research Institute
has collected data relevant to this
inference from two different studies.
The first comes from a group of
stations monitoring a point-source;
the second from a station monitoring
ambient levels in a populated area.
At these two sites, data were collect-
130
-------
ed in each 5 minute segment for long
periods of time, permitting direct
comparison of the hourly and 5-minute
levels. The first data set analyzed
was from 18 monitors around the
Kincaid power plant in Illinois, a
coal-fired plant in Christian County,
111., with a single 615 foot stack and
a generating capacity of 1320 mega-
watts. The data set consists of nine
months of observations from 18
stations around this plant. SO2
readings at these stations reflect the
behavior of the plume from the stack.
For a given monitor there are long
stretches where S02 levels are zero,
indicating that the plume is not
blowing toward the monitor. Such
readings constitute about 12% of the
hours in the data set; these were
discarded before any further analysis
was done. The second data set
consists of SO2 data from a New York
City monitoring station not near any
dominant point source. The data were
collected between December 15, 1981,
and March 11, 1984.

3.2 Outline of Methods Used

We explored three empirical methods of
estimating the maximum 5-minute
reading from the hourly average. All
three methods postulate a simple
parametric model for the maximum as a
function of the average. The methods
differ only in how estimates of the
parameters are obtained. The first
method obtains parameter estimates
from data containing 5-minute readings
and then uses these estimates for
other data sets collected elsewhere
(and containing only 1 hour readings).
This method is motivated by the theory
that there is a universal law govern-
ing the relationship between the
maximum and the average of an S02 time
series, with the same parameters at
all sites. The second method requires
expending effort to collect 5-minute
data for a short period of time at the
site of interest and using the data
from this period to obtain parameter
estimates that will be used over
much longer periods when sampling is
only on the 1-hour basis. The third
method fits a simple parametric model
to the maximum hourly reading in a
12-hour block as a function of the
average over the 12-hour block and
then assumes that the same model
with the same numeric estimates
describes the maximum 5-minute
level in an hour as a function of the
hourly average. (Daily cycles are
removed from the 12-hour block data
prior to estimation by dividing by
long-term averages over a fixed hour
of the clock.) For mnemonic purposes,
we will call these three methods:
1. the method of universal constants,
2. the method of short-term monitors,
and 3. the method of change of
time-scale.

Estimates of the potential errors in
the method of universal constants were
obtained by using the parameter
estimates from the New York data to
fit the Kincaid data and vice versa.
Potential errors in the method of
'short-term monitors were estimated by
dividing both data sets into batches
100 hours long and then using each of
the hundred odd resulting parameter
estimates to fit 13 randomly selected
hours. The hours were chosen by
dividing the range of hourly averages
into 13 intervals and choosing one
hour from each interval. Potential
errors in the method of change of
time-scale were obtained by simply
comparing the maxima predicted using
the estimates from the 12-hour blocks
in each data set with the observed
maxima in the same data.

3.3 Parametric Models for the Maximum

The parametric models proposed here
are intended to give ad hoc approxima-
tions to the maximum. One can show
that they cannot be the true theoreti-
cal formulae. Because the maximum
necessarily increases as the average
increases, it is more convenient to
work with the rati.o of the maximum to
the average than with the maximum
itself. Previous authors (Larsen et
al . , 1971) working on this problem
have used models in which log(ratio)
is linear in log(average). Therefore,
we began by fitting such a model to
the two data sets by ordinary least
squares. These estimates are given in
Table 1. As may readily be checked,
for both data sets, this model leads
to impossible values, fitted ratios
which are less than one, for large
values of the average. For the
Kincaid data, this occurs at rela-
tively low values of the average.

In fact, it is not thought that a
single universal set of constants
applies to the regression of log
(ratio) on log(average). Rather, it
is thought ' that the atmospheric
conditions around the monitor are
classified into one of seven stability
classes; and it may be more appro-
priate to assume the parameters
of the regression are constant within
a given a stability class. It is
possible that the impossible values of
the fitted maximum occur because of a
Simpson's paradox in the pooling of
data from several stability classes.
Ideally, the above model should be
fitted separately to each stability
class. Unfortunately, there were no
meteorological data available to
131
-------
permit such a partition of the data.
It is possible that it would be
worthwhile to obtain such data and
redo the analysis. The difference
between the Kincaid and New York City
sites must be emphasized. The
sources and variability of pollution
are very different, and it may not be
reasonable to extrapolate from one
site to another; two data sets from
like sites should be considered in
subsequent analyses.
In order to prevent the occurrence of
impossible fitted values, we fit
models in which the log[log(ratio)] is
a linear function of the log(-average).
The ordinary least square (OLS)
estimates (for New York and Kincaid)
of this line are also given in Table
1. Figures 1 and 2 show the scatter
plots of the maximum vs the average.
Both axes have logarithmic scales. If
the log of the ratio were linear in
the log of the average, one would
expect that the vertical width
of the scatterplot would remain
roughly constant as the average
varied. Instead, it appears that the
scatterplots narrow vertically as the
average increases, as would be
expected if the iterated logarithm of
the ratio were linear in the log of
the average. For both data sets, it
appears that the iterated log log
model more accurately mimics the
real data than the only former model
shows the diminishing (on log scale)
spread of the maximum with increasing
values- of the hourly average.
This model is the preferable one to
estimate the maximum.

In both data sets, the residuals were
slightly negatively skewed with the
skewness being greater in the Kincaid
data. It seems reasonable to assume
that the residuals in the New York
data were aoproximately normal. This
assumption is harder to maintain for
the Kincaid data. Figures 3 and 4
show the histograms and normal
probability plots for the residuals
from these two regressions.

The main purpose of the analysis is to
obtain a formula for estimating the
conditional distribution of the
unobserved 5-minute maxima from the
observed hourly averages. The
iterated log vs log models yield the
following two formulae, given in
equations 2 and 3.
(2) Probt 5-mi nute max<_ x | hourly
average = y) =

F(xly) =
i£ ( {loglog(x/y)+. 267*1 og(y)+.719}/. 62 )

for New York
(3) Prob(5-minute max! * ! hourly
average = y) =

F(xly) =
G({loglog(x/y)+.258*log(y )+.191} )

for Kincaid.

Here S? is the normal cumulative
distribution and G is the empirical
distribution function of the residuals
of the OLS regression of loglog ratio
on log average. We recommend using G
in place of treating these residuals
as normal. G is tabulated in table 2;
its histogram is graphed in figure 4.
Equations 2 and 3 do a reasonably good
job of modelling the observed maxima
in the two data sets from which the
values of the parameter estimates were
derived.

Inverting equations 2 and 3 gives
simple formulae for the percentiles of
the conditional distribution of the
5-minute maxima. Notice that equation
3, table 2, and linear interpol-
ation permit estimation of percent-
lies of the Kincaid maxima from the
5'th to the 95'th. Attempts to
estimate more extreme percent!les
would require foolishly rash extrapol-
ation.

The log vs log models provide a
competing (and somewhat inferior)
method of -estimation. They yield
conditional distributions of the
5-minute maxima given by equations 4
and 5.
(4) Prob(5-minute max!
average = y) =
hour!y
F(x!y) =
{log(x/y)+.077*log(y)-.499}/.2
(5) Prob( 5-mi nute max<.
average = y) =
for New York

x | hourly
_ F(x|y) =
c£( {log(x/y)+.21*1og(y)-1.07}/.69 )

for Kincaid.

In these regressions, we found it
acceptable to use a normal approxima-
tion for the residuals in both New
York and Kincaid.
4. Error Estimation

4.1 Errors in the Method of Universal
Constants.

It is not feasible to use a conven-
tional method to estimate the uncer-
tainty in the maxima fitted with this
132
-------
method. The major difficulty is that
one is not looking for a well-behaved
estimator but rather for a particular
numeric value of the estimate for use
in all data sets. The standard error
of the estimate in one data set is
quite misleading as a measure of the
error that would result from using
that same estimate in another data
set. A further exacerbation results
from the high correlation between the
observations used to generate the
estimates. The conventional formulae
for the standard errors will exag-
gerate the amount of information in
the data set and yield spuriously
small standard errors. Finally, there
is the problem that one knows that the
model is theoretically incorrect and
that the true underlying distribution
is unknown so the conventional
standard error formulae based on the
modeled distribution are necessarily
in error. One would suspect that even
if the model adequately approximates
the first moment of the maximum, it
approximates the second moment less
wel 1 .

AS an alternative method for estimat-
ing the uncertainty in the method of
universal constants for all data sets,
a cross-validation method was pursued.
We used the estimated parameters from
each of the New York and Kincaid data
sets to estimate the maxima for the
other data set. For each hour, the
estimated maximum were divided by the
actual maximum, the resulting ratios
were grouped into 10 bins, according
to the value of the hourly average.
Within each of these bins, we computed
the three quartiles of the quotients
of fitted over actual maxima. Figures
5 and 6 show these three quartiles of
the fitted over true ratios, plotted
against the midpoint of the hourly
averages in the bin.

One should recall that the Kincaid
data reflect the situation near a
point source while the New York data
reflects ambient levels far from any
point source. Consequently, this
method of cross-validation may
exaggerate the error associated with
this procedure. However, unless
additional 5-minute data are collected
anq analyzed from a second plant and
from a second population center
station, it is difficult to determine
how much of the error is due to the
disparity of sites and how much due
to the metnod.

The most striking feature of these
plots is that the two cross-valida-
tions are biased (necessarily, in
opposite directions). The higher
values of the hourly average (the
right half of the graph) are of
greater interest. For the New York
data, the first quartile of the ratio
of fitted over the actual maximum is
greater than 1; i.e. the estimated
maximum is too high three fourths of
the time. The median of the fitted
over actual ratio is, for most hourly
averages over 1.2; i.e. the estimated
maximum is 20* too high more than half
the time. The estimated maximum is
30-40* too high at least a quarter of
the time. The situation at Kincaid is
essentially the mirror image of this:
for the higher values of the hourly
average, the third quartile of the
fitted over actual ratio is below .9;
i.e. estimated maxima are at least 10*
too low nearly three fourths of the
time. They are 30-40* too low nearly
half the time; are 50-60* too low at
least a quarter of the time.

The proportionate error diminishes as
the hourly average goes up. This, of
course, is an artifact of using fitted
value/true value as the measure of
error. In absolute size (ug/m"3), the
errors would not diminish as the
hourly average increases.

4.2 Errors in the Method of Short-
term Monitors.

In order to estimate the errors
associated with attempting to estimate
parameters of the ratio-average
relationship at a given site by
actually measuring 5-minute levels for
a short time, each data set was
divided into batches 100 hours long
and OLS estimates were derived for
each batch. There are 125 such
batches in the New York data and 158
batches in the Kincaid data.

It is difficult to judge the potential
in estimating the maxima by simply
looking at the uncertainty in these
parameters. In order to further
clarify the errors of direct interest,
we divided the hours into 13 bins,
according to the size of their hourly
averages. For each OLS estimate from
a batch, we randomly selected one hour
from each of the 13 bins and computed
the quotient -of the fitted maximum to
the true maximum for each hour. We
then computed the three quartiles of
the resulting quotients in each of the
bins. Figures 7-10 show these three
quartiles, plotted against the hourly
average.

In contrast to the previous method,
these estimators are nearly median
unbiased. That is, the median value of
the quotient is just about 1, corres-
ponding to accurate estimation. For
hourly averages greater than 1 ug/m"3
one can see that the iterated log
models lead to estimates of the maxima
133
-------
that are within 20 to 40* of the
actual maxima at least half the time
for the Kincaid data and within 10* at
least half the time for the New York
data. That is, the first and third
quartiles of the fitted over actual
ratios fall at .9 and 1.1 for New
York, at .8 and 1.2 for Kincaid (at
least on the right half of the plots).
The log models have roughly the same
error rates. It is also worth
noting that, for the Kincaid data, the
log models continue to give impossible
fitted values in many cases.
Comparing these results to those
obtained from the method of universal
constants, one can see that "the method
of short-term monitors offers some
improvement in accuracy over the
former method, where the estimates are
noticeably biased and errors of
20* in the estimated maximum occur
half the time. The increased accuracy
is much more noticeable with the New
York data. At this time it is
impossible to say whether a comparable
difference in accuracy would be
present at most population center
stations and absent at most point
source stations.

4.3 Errors in the Method of Time-
Scale

The third method suggested was -to
remove a daily cycle from the observed
hourly data and then assume that the
relationship between peak and mean of
twelve hourly readings is the same as
the that in twelve 5-minute readings.
A priori, one would expect that this
method to be the least effective of
the three. The correlation of
successive 5-minute readings will be
higher than that of successive hourly
averages; averages over longer time
scales should come from distributions
closer to Gaussian so the functional
form of the unqerlying distributions
will not be the same. In fact, the
parameter estimates obtained this way
are seriously in error, as can be seen
by comparing the estimates in Table 3
with those in Table 1.

Figure 11 shows plots of quotients of
the maximum estimated from the 1-hour
to 12-hour relation to the maximum
estimated from the actual 5-minute to
1-hour relation. Results from both
sets and both the log vs log and the
iterated log vs log model are graphed.
At high levels, the estimates in New
York are too high by 10-20*; at low
levels, they are seriously biased low.
In the Kincaid data, estimates from
the iterated log vs log model are
too high by 50-60*; the performance of
the log vs log model is even worse.
These plots, which roughly correspond
to the median accuracy using this
method, were so bad that we did no
further investigation for the Kincaid
data.

A similar procedure was applied to
the New York data to predict the
maximum for the iterated log and log
models, respectively, with results
similar to those obtained from the
Kincaid data. The predictions are
biased high; three fourths of the
time, the fitted value is at least 5
or 10* too high; half the time, the
fitted value is at least 10 or 20* too
high. Somewhat surprisingly, the log
versus log model performs somewhat
better than the iterated log versus
log model for this data set.
5. Estimation
5-Minute S02 Level
of an Arbitrary
The second objective of the analysis
was to find a model for the condi-
tional distribution of an arbitrary
5-minute SO2 level, given the hourly
S02 average. As an alternative to the
theoretical calculation, the following
ad hoc method was considered.

1. Use deviations of 5-minute
S02 levels from their hourly
averages, rather than the
5-minute levels themselves.

2. Make deviations from dif-
ferent . hours comparable by
dividing them by a suitable
scaling factor. The usual
scaling factors, the standard
deviation or the interquartile
range within an hour, cannot be
used because one wants a method
that can be used when knowledge
of variability within an hour is
not available. The scale factor
must depend only on the hourly
average. We employed a scale
factor of the form
exp(B *log(hourly average) + A).
The slope and intercept, B and A,
were obtained by OLS regression
of log(hourly SD) on log(hourly
average), in each data set
separately. In practice, it would
be necessary to use the parameter
estimates from these two data
sets in future data sets which
contain only SO2 hourly averages.

3. Pool all the scaled
deviations together and fit a
simple parametric model to the
resulting empirical distribu-
tion.

This three step method was applied
separately to each data set. The
estimated conditional distribution
function is given by equation 6.
134
-------
(6) Prob( 5-mi nute S02 level <. x|
hourly average S02 = y) =

__ F(x| y) =
( (x-y)/exp(B*ln(y) + A) ).
The numerical values of A
given in table 4.
and B are
We found that the -standard normal
distribution worked acceptably well
for both the New York and the Kincaid
data. An attempt to use a three
parameter gamma distribution to
compensate for some skewness in the
scaled deviations did not. lead to
enough improvement to justify the
introduction of the extra parameters.
One should note that there is a
systematic error in this procedure
that was not present in modelling of
the maximum. Given the serial
correlation of successive five-minute
readings, the readings in the middle
of the hour will be more highly
correlated with the hourly average
than will the first or last readings.
The model in equation 6. is intended,
at best, to predict the value of a
5-minute reading selected at random
from one of the twelve time slots
during an hour, not the value of a
5-minute reading from a specified time
slot.
6. Error in the Estimation of Any
5-Minute S02 Level

There are two types of error that one
may consider here. First, there is the
error in using equation 6 to estimate
the proportion of 5-minute readings
which exceed a given level of SO2.
Second, there is the error in using
the equation to estimate the level of
S02 that corresponds to a given
percent!le of the distribution of
5-minute readings. If one is'con-
cerned about the frequency of exceed-
ances of a threshold for health
effects, it is the first type of error
that is of interest. We will discuss
only the estimation of this first type
of error.

Cross-validation between the two data
sets was used to measure the error.
The estimated slope and intercept of
the scaling factor (the only unknown
parameters in the model) from the New
York data and the observed hourly
averages from the Kincaid data
to predict the scaling factors in the
Kincaid data. We then divided all the
observed deviations from the hourly
averages by these scaling factors. If
the parameter estimates are good,
these scaled deviations should be
close to a standard normal distribu-
tion.
We grouped these scaled deviations
into 16 bins, according to the level
of the hourly average. To quantify
how well the estimates performed, we
computed, for each of the 16 bins the
observed proportion, p", of scaled
deviations which exceeded the values
-2, -1, -.5, +.5, +1, +2. This
corresponds to using as thresholds the
5'th, 15'th, 30'th, 70'th, 85'th and
95'th percent!les of the 5-minute
readings, computed using the correct
parameters. Figure 14 shows the plots
of these five P's against the hourly
average. (The six curves correspond
to the nominal 5'th through 95'th
percentiles; the ordinate shows the
percentage of scaled deviations
actually less than that threshold.)
The whole procedure was then repeated,
reversing the roles of the New York
and Kincaid data sets. Figure 15
shows the plots of the P's from New
York data with Kincaid parameters.

It can be seen from these two plots
that the 5-minute readings in the
Kincaid data are more dispersed about
their hourly averages than would be
expected from the New York data. At
high values of the average, a thresh-
old which one would expect to be the
70'th percentile is actually only the
55'th to 60'th percentile; what one
would expect to be the 85'th percen-
tile is actually between the 60'th and
the 70'th percent!le; what one
would expect to be the 95'th percen-
tile is actually only about the 70'th
to the 80'th percentile. Consequent-
ly, if one were using the New York
data for parameter estimates, one
would noticeably underestimate the
frequency of exceedances of a thresh-
old.

Necessarily, one finds the opposite
situation when 5-minute readings in
New York are inferred from the Kincaid
data. As shown in figure 15, a
threshold that one would expect, on
the basis of the Kincaid data, to be
only the 70'th percentile of 5-minute
readings would actually be nearly the
95'th percent!le in New York.
Consequent!y,"if one were using the
Kincaid data for parameter estimates,
one would noticeably overestimate the
frequency of exceedances.

7. Theoretical Modelling of the Joint
Distribution of 5-Minute Levels

We made some attempts to explore
theoretically motivated paramet-
ric models for the third problem
listed in the introduction, namely
estimation of the joint distribution
of the 5-nmnute levels, conditional on
the hourly average. The most popular
choice of marginal distribution for
135
-------
SO2 levels, when averages over a
single length of time are observed, is
the lognormal. We therefore tested
the goodness-of-fit of the lognormal
distribution to the 5-rmnute sequences
at Kincaid and New York. The 5-minute
readings at New York appeared to fit a
lognormal distribution acceptably. (A
formal test would reject the hypo-
thesis of lognormality. However, it
appears that the deviation from the
lognormal is small enough to be of no
practical importance even though the
enormous sample size leads to formal
rejection of the model.) The 5-minute
readings at Kincaid appeared notice-
ably more leptokurtic than a' lognormal
distribution. We therefore did no
further work with the Kincaid data.

Estimation of the joint conditional
distribution requires three further
assumptions. First, we assume that
the unconditional joint distribution
of all the logs of 5-minute levels is
multivariate normal. This seems
reasonable in light of the approxi-
mate marginal lognormality. Second,
we assume that the autocorrelation
structure of the sequence of loga-
rithms of the 5-minute levels is a
simple serial correlation, the
correlation at lag i being just rho to
the i'th power. This is necessary to
keep the number of parameters in the
model down to three. In fact, the
sample correlations at lags 2 to 4 are
not too far from the second to fourth
powers of the lag *1 correlation.
Third, we assume that the hourly
average observed was the geometric
mean of the twelve 5-minute levels,
although it was in fact the arithmetic
mean. This assumption is explicitly
false: the true geometric mean is
smaller than the observed average, but
the higher the correlation between
successive 5-rmnute readings, the
smaller the difference between the
arithmetic and geometric means. This
assumption is made in order to get an
algebraically tractable problem and
with the hope that the high serial
correlation will make it close to
true. With these three assumptions,
it follows that the logs of the
5-minute levels and the log of
their geometric mean come from a
13-dimensional normal distribution
with a rank 12 covariance matrix.

One now finds that the desired
conditional distribution of the
vector of 12 log 5-minute readings,
given the log of the geometric mean,
is 12-dimensional normal with mean and
variance given by the standard
multivariate regression formulae.
Letting Zi = log of the i ' th 5-rmnute
reading, we have that the mean and
variance-covarlance matrix of this
conditional distribution are given by
equations 7A and B:
Cov(Zi ,Z)*(Z - // )/Var(Z
(7A)
(78) Var(z;z) =
Var(Z) - "Cov(Z,Z)Cov(Z,Z)'/Var(Z).
S\J ^J ~^s

In more detail, the i ' th coordinate of
the vector of covariances of the logs
of the 5-minute readings and the log
of the geometric mean, Cov(Zi ,Z), is
equal to
... + £ '2-' }/ 12

and the variance of the log of the
geometric mean is equal to

V= CT 2 * {12
The problem of estimating the joint
distribution of the 5-rmnute levels,
given the hourly average, is now
reduced to the problem of estimating
the three parameters (mu, Sigma, and
rho) in the above expressions, when
one observes only the sequence of
hourly averages. Because the sequence
of observed logs of geometric means is
also a multivariate normal sequence,
it is simple to estimate the mean,
variance, and covariance of this
sequence. Specifically, the log of
the geometric mean is normal with mean
equal to mu, with variance equal to V
above. Furthermore, the logs of the
geometric means in successive hours
are bivanate normal with covariance
equal to
C =
+ 11
2 * {

> 1 3 4
+ 2
2 2
12 £
£ 23 } / 144.
The (computable) maximum likelihood
estimates of the mean mu, variance V,
and covanance C of the hourly
averages uniquely determine the MLE's
of the parameters mu, sigma, and rho
of the 5-minute series.

The estimated conditional distribution
of the logs of the 5-minute levels in
New York, given their hourly averages,
is shown in Table 5. This distribu-
tion is 12-dimensional normal with the
indicated numerical values for the
vector of conditional expectations of
the logs of the 5-rmnute readings,
given the hourly average, and for the
variance-covariance matrix.

One can also attempt to elaborate on
the above computation by making
approximate corrections for the fact
136
-------
that one actually observes the
arithmetic mean rather than the
geometric mean. All of the above
equations and distributional formula-
tions are still valid. The only
problem is that they cannot be used
for computation if the geometric means
are not observed. We suggest that the
following approximations be used when
only the arithmetic, means are ob-
served. First, compute the first and
second sample moments of the observed
sequence of arithmetic means and use
these values to get method of moments
estimates of the parameters mu,
sigma, and rho. (The arithmetic means
are not lognormal so these are not
maximum likelihood estimates.) These
parameter estimates then specify
numerically the joint distribution of
the 5-minute levels, given the
geometric mean. To complete specif-
ication of this distribution, one need
only give a numeric estimate, based on
the arithmetic mean, of the geometric
mean. A reasonable choice is to set
the estimated sample geometric mean
equal to the observed sample arith-
metic mean times the ratio of the
estimated expectation of the geometric
mean to the estimated expectation of
the arithmetic mean.

Application of the above protocol
requires only expressions, in terms of
mu, sigma, and rho, for four moments:
the expectations of the sample
arithmetic and geometric means, the
variance of the sample arithmetic
mean, and ' the covanance of the
arithmetic means of successive hours.
Given that the logs of the 5-minute
readings are serially correlated
normal( u, o~ 2)'s, the expected
values of the arithmetic and geometric
means are, respectively,

EA = exp(^j + cr2 /2) and

EG = expuj + 6 ~2/2 ) where

6 = { 12 + 2*[ 1 1 £ + 10 Cl 2 + . . . +

p i 1 ] } / 1 44 .

The variance of the arithmetic mean is

VA =
jj+ 0-2 }*{12 + 2*[1 1 (exp('jr z^)-i ) +

I0(exp(7 2 9 2 )-l ) +

. . . + (exp(.-r 2 g i i )~1 ) ] J/144.

Finally, the covanance of the
arithmetic means from two consec-
utive hours is equal to
1 1 (exp(;r 2

I0(exp(0" 2
(exp(
5/144,
C« = exp(
(exp(tr
. It is important to note that all of
the above theoretical modelling is
heavily dependent on the assumed
multivariate lognormality of the
5-mmute levels. If the 5-mmute
levels were marginally Weibull,
Gompertz, or gamma then none of the
above manipulations would work.
Furthermore, in new data sets it will
not be possible to check for lognor-
mal ity of the 5-minute sequence by
examining only the sequence of hourly
averages. Thus, the techniques
outlined in this section can only be
applied by either taking lognormality
on faith or by taking the trouble to
observe enough 5-minute levels to
perform at least a simple check on
lognormality.

8. Conclusions

There does not seem to be any reliable
method for estimating the maximum SO2
level within an hour from knowledge
only of the time series of S02 hourly
averages at the' same site. The theory
that there is a simple relationship
between the 5-minute and hourly
averages, governed by the same
constants at all sites, is not
borne out by the two data sets
examined. In fact, the functional
form of the marginal distribution of
5-minute levels is not even the same
at the two sites. One must recognize
that the two sites considered were
very different. The analysis should
be repeated with data from similar
sites to determine the extent of
extrapolation across sites that is
possible.

If the expense is not prohibitive, the
best results are likely to be obtained
by taking the trouble to measure the
5-rmnute time- series for a period of
100 or so hours. Even this effort
cannot promise better than an even
chance of predicting future maxima to
within ± 20*. Using parameter
estimates from one of the few sites
where 5-minute data have been collect-
ed or from the relationship between
the hourly and 12-hourly averages at
the site in question are likely to
lead to somewhat less accurate
predictions. The magnitude of the
errors associated with attempts to
predict the proportion of 5-minute
readings which exceed a threshold are
comparable to those experienced in
137
-------
estimating the maximum. If standards
are to be established with the inten-
tion of limiting the health effects
associated with high short-term
exposures, then these limits on the
accuracy in prediction must be borne
in mind in the setting of standards.

Given the ad hoc nature of the
parametric models used, one might
try other paranietri zations—e.g.
estimate the transfer function
between the time series of hourly
averages and the time series of
hourly maxima—to see if better
approximations can be obtained.
Because the iterated log model does a
fairly good job of estimating the
maxima in the data set from which the
parameters were estimated and because
the marginal distributions at the two
sites considered are not even of the
same form, we think it unlikely that
other choices of parametrization will
lead to much reduction in the cross-
validation errors.

The task of estimating the conditional
distribution of an arbitrary 5-minute
level, given the hourly average,
appears to be equally difficult. It
appears that using ad hoc parameter
estimates obtained from one site to
predict 5-minute levels at another
site leads to biased predictions. In
the two data sets compared here, it
was- impossible to tell reliably
whether a given level would be
exceeded 5* or 30% of the time.

Estimation of the joint distribution
of all twelve 5-minute levels, given
their average, appears feasible only
if one is prepared to assume a
lognormal distribution for the
unconditional distribution of these
readings. There are data sets for
which this is demonstrably not true.
Thus, it again appears that the most
reliable estimates can be obtained
only by observing at least enough of
the 5-minute sequence to check lognor-
mality roughly.
BIBLIOGRAPHY

(1) Grande!1, Jan (1984), Stochastic
Models of Air Pollutant Concentration.
Spnngei—Verlag, Berlin

(2) Johnson, Norman and Kotz, Samuel
(1970), Continuous Univariate Distri-
butions, vol. 1. John Wiley & Sons,
New York

(3) Larsen, Ralph (1971), A Mathe-
matical Model for Relating Air
Quality Measurements to Air Quality
Standards. U.S. Environmental Protec-
tion Agency, Office of Air Programs,
Research Triangle Park, North Carolina

(4) Legrand, Michael (1974), Statis-
tical Studies of Urban Air Pollution—
-Sulfur Dioxide and Smoke, in Statis-
tical and Mathematical Aspects of
Pollution Problems.
Marcel Dekker, Inc,
John Pratt
New York
ed,
(5) Pollack,
Studies of
Richard
Pollutant
I. (1975),
Concentration
Frequency Distributions. U.S. Environ-
mental Protection Agency, Office of
Research and Development, Publication
EPA-650/4-75-004, Research Triangle
Park, North Carolina
138
-------
TABLE 1
Descriptive Statistics
Station
Mean
S.D.
Skewness
Kurtosis
Hr Avg

Hr Sd

Log (Avg)

Log (SD)

NY
Kincaid
NY
Kincaid
NY
Kincaid
NY
Kincaid
19.61
20.78
3.34
13-71
2.64
1.77
.84
1.25
18
75
3.6
109
.85
1.6
.84
1.4
2.8
47
3.5
109
-.3
.0
.3
1.01
15
3810
21
13000
.3
-.2
-.2
.12
257
2500
57
5000
5.55
7.82
4.04
8.52
Regression of Log (Ratio) on Log (Average)
Station
Slope
Intercept
RMSE
Regression of LogLog (Ratio) on Log (Average)
Station
Slope
Intercept
RMSE
Ratio <1 When
Average >
NY
Kincaid
-.077
-.210
.499
1.07
.20
.69
652
163
Correlation
NY
Kincaid
-.267
-.258
-.719
-.191
.62
1.06
-.34
-.36
139
-------
TABLE 2
Distribution of Residuals at Kincaid
Value of
Log(log(ratio))
Percent
-2.03
-1.43
-.70
.23
.76
1 .24

1.43
.05
. 1 0
.25
.50
.75
.90
.95
Table 3
Regressions from Method of Change of Time Scale
Model
Data Set
Slope
Intercept
Iterated New York -0.0854
Log -0.0528
-0.415
0.716
Iterated
Log
Kincaid
-0.12
-0.170
0.606
2.010
140
-------
TABLE 4
Fitted Models for Spread of 5-Minute Levels
Regression of Log (SD) on Log (AVG
Station
Slope
Intercept
Correlation
Squared
NI
Kincaid
.687
.645
-.972
.114
.49
.53
Regression of SD on Average
Station
Slope
Intercept
Correlation
Squared
NY
Kincaid
.114
1 .1 97
1.109
-11.169
.33
.67
141
-------
TABLE 5
CONDITIONAL MEANS AND VARIANCES OF LOG 5-MINUTE LEVELS
zbarO
:zb-ar)
2.6-1
2.6.1
2.61
2.6H
2.&H
2.61
2.b1
2.f,1
2.61
2.6H
2. 61
O . O5&
O.CH1
O.G27
0.015
O.OO3
-O.OO?
-O.015
-O.O22
-O.02?
-O.032
-O.O3H
-0.035
» (1.98-1 >
» 1 . 002 '«
» 1 .00? »
» 1.012 «
» 1 . 0 1 3 »
» 1 . 0 1 3 »
t 1.012 *
* l.OO? *
» 1.O02 '••
* O . 'H9H *
4O.98H *
O . OH 1
O . OH :i
G.G2:j
O.Olt
O.OOH
-O.OO 6
-O.OH
-O.021
-O.H2O
-O.O3O
-O.O33
-O.I 13-1
1 1 z-h-'ir -2. t
C?t«.:ir-2.E
t i^zti-Eir--2 .(
Kzhar-2-f
i i. zh.tir-2.(
Kzh

O.015
O.Olf.
O.OI 9
O - O2T.
0.012
0 . 002
-O.OO&
-O.OI 3
-O.OliJ
-O.O23
-0.026
-O.O2?
•

O.OO 3
O . OOH
O.O07
0 . i..i 1 2
0.01:3
1 1 . l lOO
-O.O01
-O.OG0
-O.OI 3
-O.O1Q
-O . O? 1
-0 . O22

-O.OO?
- o . due
-O.OOIi
O.O02
O . OOt:
O.O15
o . one.
- O . CIO 1
-o.oot.
-CI.011
-O.O1-I
-0.015

-O.Ol'J
-O.OlH
-O.OI 1
-O.MOIj
-O.001
O.OO6
0.0 15
o.noa
O.i to 2
-O.OO 3
-O.MO6
-G.I JO 7

-0.022
-G.O21
-o.o m
-O.OI J
-O.OOO
-O.O01
O.I 100
O . 0 ] M
O . 0 1 2
O.OO?
O.OOH
O.OO 3

-O.O2?
-O.O2&
-O.O23
-0.018
-O.O 13
-0.0Gb
O.OO2
0.012
O.02 5
O.O1-3
o.oie.
O.O 15

-O.032
-0.03O
-0.023
-O.023
-0.01O
-0.011
-O.OO 3
0.00?
0.019
O.032
O.O28
0.027

-O.03H
-O.O33
-O.O 30
-O.02C
-O.O21
-O.01H
-O.OO6
O.OOH
O.O 16.
O.028
0.013
0.01 1

-0.035
-U.03H
-(1. 032
-0.02?
-O.022
-O.i 115
-O.OO?
o.o on
O.O 15
O.02?
O.OH1
O.056
Hero Z = v*clor" of • logs of 5-ninut.S1 rs-^dings

= obsc-rved valuo of log hourly gc-onc-lric nc>an
-------
FIGURE 1
MAXIMUM VERSUS HOURLY AVERAGE: NEW YORK DATA
MAXIMUM

400 |
90
20 »
A
AAA
CAA
AA
ABACD
A CBCA
A AAA ADEED
A AAA AA BBA BDMIRB
A AA DCCGERMKC
A A BCACBCHMJTZTB
AC A ADDEDFEIQYZZR
A ABCADEDDLPXZZZYC
ABAA ABBDDCEEMYZZZZD
A BADFIFIIOQZZZZZUA
AAABDDFUIPSWZZZZZB
AA BBCDHGDHRUZZZZZZO
ABBDDESSYZZZZZO
AA ECIEFMIZTZZZZZZC
AAAAAACC GHFOSZZZZZZZV
AAAABABCCGHVMZZZZZZZA
A A BADDKIIRXZZZZZZZD
A CBBAEJIIISZZZZZZZZH
BCDFKMKUZZZZZZV
A BBBDDGJRYZZ2ZZZZB
B D AAADJPYZZZZZZZZB
B ABBCDJKZZZZZZB
A AACCUZZZZZZUC
A B ABBCBDRXZZZZZZZUA
A ADBGFIZUZZZZZA
A AABAADJMHZZZZZZA
AA
4.5|
A AAA BECISXZZZZZZO
AB A BFBFQGZZZZZZI

A A DAOJRYZZZZZn

AAA BOIFMOXURRCE

AA FBKJFKQFHBE
1 »
A A A A A BD AAK

0.17 0.55
20.0
0,05
1.8 6.0

HOURLY AVERAGE
67.0
221 .0
-------
FIGURE 2
MAXIMUM VERSUS HOURLY AVERAGE: KINCAID DATA
MAXIMUM
8000
400
20
* AB A
AA A BA A AAA
A I C BD DBBAA
A A B ACAD AADCCB A A
A DA ABBBCDABAGCDFCBA
A AAABCD CDACCCDBEEFJABCB
A ABGCDCOCFHEEDGMDFOFGBFA
B A CACBCFEECOJDHFGEEFDBADA
AAA CBBDAECACCFGGDKLEKGIEDDB
AAACBACCCDEEMGIHJILJSOKJDGCCCA
AAACDBBECBEFGlGFJHGFEtlGIJIGBB
B ABDCCCFDGICHFHQII1FJUHHHFAEA
BAB EBABIDBIMLDmiKGIIOMMKlFHKEE
BDDDFCBCEEEKGIIJMIIQIIIIMOHLJOLFA
ADBCDACAIGIIIPEFLKPOIPGlOnMISPIIHB
BEBEFCHGIIimFIIJIKKGMMIinKZIUZUZROA
IFFCGDFEKHHHJFIflQOKPOKQKUXZVHRHE
EFDGHMGFKLDKMGSHIMSSVUZUqqZIZZlD
CEADBA EDEDECIIAir.EIFGDMMIlHUPJJ
UliqCEirCUUFESIIUlJORSYRZOZZYZVOO
H DABCCOBJECBAHIICFDMEKGJKIQOKD
NZ OHIIIFHIMGlKIIPIIIPOQUaUZZZZZZLD

z
z z
0.05 0

z
z
z
.22

z
i
z
z

z
N
T
Z
Z

Z
J
J
z
z
z

GC
EJ
KG
RO
ZZ
ZZ

FEEBCFIHIGKGGHLQFQOVYZZZZI
IBHGIIJOrOPHMQSXZQZZZZZZ
MGHSJOUJHQVYZZZZ2ZZZZ
NZIRTRZXZZZZZZZZZ
ZZZZZZZZZZZZ
zzzz
1.0 4.5

20.0 90.0 403.0
HOURLY AVERAGE
-------
FIGURE 3
RESIDUALS OF ITERATED LOG MODEL: NEW YORK DATA
MISSING VALUE
COUNT
% COUNT/NOBS

BAR CHART ft BOXPLOT
2.25+x 1
.xx 91 o
.xxxxxxxxx 687
.xxxxxxxxxxxxxxxxxxxxxxxxx 2035
. XKKKKKXXXXKXKXXXXXXKXXXKXKXXKXXXXXKXX 3027 X + X
. XXXXXXXXXXXXXXXXXXKXKXXXXXXXXXXXXKKXKXXXXXXXXXXK 3915 + +
.xxxxxxxxxxxxxxxxxxxxxxxx 1893
.XXXXKKXX 592
.xx 124 0
.x 44 0
.x 130
-3.25+x 4 x
+ + + + •»• + + + +
X MAY REPRESENT UP TO 82 COUNTS
36
0.29
2.25 +
NORMAL PROBABILITY PLOT
xxxxxxxxxx
XXXXXXXX+
XXXXXXXX+
xxxxxxxxxx
XXXXXXXf
+XXXXXXXX
XXX
X
X
•3.25+x
+ -f + + + + + + + 1 +
-2 -1 +0 +1 +2
-------
FIGURE 1|

RESIDUALS OF ITERATED LOG MODEL: KINCAID DATA

MISSING VALUE
COUNT
% COUNT/NOBS

BAR CHART I BOXPLOT
2.25+x n
.xxxxxxxxxx 565
.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 1868
. XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXKXXXXXXXXXXKKXXX 2891 + +
.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 1852 X—+--X
.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 1782 I I
-0.75+xxxxxxxxxxxxxxxxxxxxxxxxxxxx 1677 + +
.xxxxxxxxxxxxxxxxxxxx 1164
.xxxxxxx 380
.xxxxxxxxx 505
.xxxx 214 0
.XKX 123 0
-3.75+x 4 0
+ + + + + + + + + —
X MAY REPRESENT UP TO 61 COUNTS
2741
17.37

NORMAL PROBABILITY PLOT
2.25+ 4
+ + + X X X X X X X X
xxxxxxxxxx
XXXXXX+
XXXXX++
XXX + +
-0.75+ xxxx*
+XXXXK
•f-f XXXX
++++XXXX
++XXXXX
XXX
3.75+x
+ + ^. + + + 4. + + + +
-2 -1 +0 +1 +2
-------
FIGURE 5
I
1
o
w
U
S
*X

h.
•3
w
LJ
ERRORS WITH USING FIXED ESTIMATES
(NEW YORK DATA, KINCAID PARAMETERS)
0.0
Hsur
First quartos
Medians
Third
FIGURE 6
ERRORS WITH USING FIXED ESTIMATES
(KINCAID DATA, NEW YORK PARAMETERS)
1.2
0.9 -

o.a -

0.7 -

0.6 -

0.3 -

0.4 -

0.3 -

0.2 -

0.1
L
i
10
First Quortilcs
i
100
M«dlcns
1000
• Third Quortil«s

147
-------
FIGURE 7

ERRORS WITH SHORT-TERM MONITORS
(NEW YORK DATA, ITERATED LOG MODEL)
£
X
2
_c

I
ui
1.* -

i.j -

1.2 -

1.1 -

1
0.9 -
0.8 -
0.7 -
0.6 -
0.5
1 st Quortil*
0 3 32

Hour Av«rag«
+• Median 3rd Quortil«

FIGURE 8

ERRORS WITH SHORT-TERM MONITORS
(KINCAID DATA, ITERATED LOG MODEL)
316
o
w
UJ
1 st Quartiie
-------
FIGURE 9
1.5
c
3
£
"*
O
kl
1.2 -

1.1 -

0.9 -

0.3 -

0.7 -

0.6 -

0.5 -
1st Quartil*
o

UJ
ERRORS WITH SHORT-TERM MONITORS
(NEW YORK DATA, LOG MODEL)
Hour Av«rog«
32
3rd Quarti!«
316
FIGURE 10
ERRORS WITH SHORT-TERM MONITORS
(KINCAID DATA, LOG MODEL)
316
Hour Av«rog«
1st Quartile
3rd Quartii*
149
-------
Error In Maximum
?
p a
io -•
J
o -
f
X <>
III
~
a
a
04 -
04

I II
Is)
J L
!•»
CO Lit

_J L
04 04
k> V
W
i-3
SG
O
ro *q

j^cj C^ *^3
M EC M
> O

MO »

O -»

M
to
o
>
t-
M
-------
FIGURE 12
E
a
"x
0
UJ
1.7 -
1.6 -
1.5 -
1.4 -
1.3 -
1.2 -
1.1 -
1 -
0.9 -
0.8 -
0.7 -
0.6 -
0.5 -
0.4 -
0.3 -
0.2 -
0.1 -
0

1 5t Quartil«
E
°x
I
_c
^
o
u
ERRORS WITH CHANGE OF TIME SCALE
(NEW YORK DATA, ITERATED LOG MODEL)
i
3
32
316
Hour Av«rag«
+ M«dten 3rd Quartil*
FIGURE 13
ERRORS WITH CHANGE OF TIME SCALE
(NEW YORK DATA, LOG MODEL)
1st Quartliw
316
Median
151
-------
S" '»
PERCENT OF SCALED DEVIATIONS
EXCEEDING NOMINAL PERCENTILES
x
i!
o —
r
i *
S ?r
o
o
tz)
GO
M
2S O M
O M O

M >T3 *O
t? £C t1!

o o
> O M
H Tl 2

- 2 M
O t-
2 t> M
M PI 00 G

f O PI
K td Tl
o a -»
» co -«^
!^ ""d o

is so t-
> o M
SJ M t>
> a
S H o
ra H M

M M M
» M a>
O
25
GO
-------
FIGURE 15

OBSERVED PERCENTILES OF SCALED DEVIATIONS:

GRAPHS OF MODELLED PERCENTILES

(NEW YORK DATA, KINCAID PARAMETERS)
eo
0) Cd
55 -J
O M
M H
H Z
•< Cd
>-( O
> cc
u u
a a.
< M
O S
eo o
z
b.
O C3
Z
H M
z a
U [d
U Cd
IB O
td X
a. u
0.3
a S'th p«rc«ntll«
& 70'tti p«rc»ntil«
0.6
1.6
4.0

Hour
10.0
•*• IS'th p«roent!l«
x 85'th p«ro«jntll
-------
DISCUSSION
R. CLIFTON BAILEY*
Health Care Financing Administration,
2-D-2 Meadows East, 6325 Security Blvd., Baltimore,
2 1 207
A recent editorial suggested that there be no
new data collection until present data sets are
thoroughly analyzed. This a tough standard. Even if
one attempted to thoroughly analyze present data
sets there would always be the possibility for more
analysis. This is especially true when one considers
analyses based on multiple data sets - meta
analyses.

The authors are to be commended for their
extensive data analyses. Of course some of us
remain disappointed that certain parametric and
nonparametrlc models were not explored because of
complexity. In stating the reasons for not doing
certain analyses, I think the authors take a narrow
view of what is possible. The issues may be more
ones of cost, time or expected return. This in no
way undermines the value of the extensive empirical
exploration of the data undertaken by the authors.

The authors set a task of establishing a
relationship between studies in which data are
recorded in short,-5-minute, intervals and the more
common choice of hourly summaries. They are
especially interested in establishing this
relationship because they believe It is necessary to
have information on the short time records to
establish health effects.

When the basic process is observed from
several points of view—different measurements,
such as the 5-mtnute and the hourly measurements,
should be expressible in terms of the common
process observed. The perspective of a common
process being observed from different points of
view provides the framework or model to work from
From this perspective, distinctly different
measurements or measurement processes generally
are not equally informative of the process and the
statistical properties of these measurement
processes are not the same. In analyzing the data,
it is important to remember that the measurement
process is part of the observation and more than
one quantity may be needed to describe the process.
The model for the process generally will be a
combination of stochastic and deterministic
components. An issue underlying the effort to
evaluate different methods of observation is that
precision as well as costs differ.

To deal with the basic problem, it helps to
have a model that consists of the underlying
process to be observed and the the measurements
used to observe the process. An evaluation with
such a model may suggest alternative measurement
strategies. For example, the measurement strategy
may consist of obtaining a fixed quantity over a
random time interval instead of obtaining a measure
over a fixed time interval. The idea is clearly
suggested by the analogy with a Poisson counting
process. In counting statistics, two strategies are
commonly used. One uses a fixed interval and
obtains the count while the other specifies a count
and measures the time to obtain this count. These
strategies can be evaluated to compare costs and
precision for a given situation.

The main concomitant measures explored
were time of day and a meteorological factor, wind
direction. These and other concomitant measures
need to be part of the model. I would like to see
more attention paid to concomitant factors at the
two sites.

The authors state In their conclusions," the
theory that there is a simple relationship between
the 5-minute and hourly averages, governed by the
same constants for all sites, Is not borne out by the
two data sets examined."

The conclusions and recommendations are
fundamentally sound. The authors recommend
calibrating a model for each site. In this way
differences among observed processes are properly
recognized even if they are not explicitly modeled.

"Disclaimer
The opinions are those of the author and do not
necessarily reflect the opinions of the Health Care
Financing Administration.
154
-------
SUMMARY OF CONFERENCE
John C. Bailar III
Department of Epidemiology & Biostatistics, McGill University
Montreal, PO Canada H3A 1A2
and
Office of Disease Prevention & Health Promotion, U. S. Public Health Serv.
Switzer Building, Room 2132, 330 C Street, S.W., Washington, D.C. 20201
This summary of the conference is intended to
provide some brief and integrated commentary on the
eight papers and eight discussions presented here
(1-16), plus some perspective on broader issues raised
by the papers as a group but not covered by any one of
them.
I will say much about unsolved problems. Of course,
the more one knows about a situation, the easier it is
to critique specific points and point to things that
should be done. This is good for bringing out issues,
but it can be bad if it creates an impression that
problems dominate solutions. I do not want my
comments here to be taken as a general indictment of
compliance sampling, a field that has recently made
much progress and is clearly making more.

Compliance Sampling in a Broader Context

The focus of the conference was compliance
sampling; this term includes both a) the general
assessment of how well we are doing in the
management of hazards and b) the generation of data
for individual action to enforce relevant laws and
regulations. My basic view, as a citizen and scientist
rather than a regulator, is that regulations should
provide and should be interpreted as firm limits rather
than targets, though they are often abused or
misinterpreted as targets. Examples include the
approaches of many states and cities to the control of
criteria air pollutants, and the apparent attitude of
parts of private industry that penalties for violations
are a business expense, to be balanced against
production volume and costs so as to maximize overall
profits. Carol Jones (17) has commented on the
effects of penalties on the probabilities of violations,
•and at this Conference Holley (11) has discussed such
approaches in the context of bubbles.
But these two purposes of compliance sampling —
overall assessment and enforcement — are broad and
vague. There was very little said at the Conference
about the ultimate purposes, or even the penultimate
purposes, of these activities. This is a potentially
serious gap, because what we do (or should do) in
compliance sampling can be profoundly affected by
matters beyond the short term goals of accurate
assessment of the distribution and level of specific
hazardous agents. Is our ultimate goal to protect
human health? If so, what does that mean for the
design of a program in compliance sampling, given our
limits on time, money, attention, and other resources?
How are concerns about cancer to be balanced against
concerns about (say) birth defects, or heart disease?
How are concerns about health in the U.S. to be
balanced against health in other countries? How are
we to balance short-term protection of our own health
against protection far into the future, even across
generations not yet born? How should we view and
assess the quality of outdoor (ambient) air vs. indoor
air (Hunt, 4)? There are similar very broad questions
about direct health effects vs. the indirect health
effects of unemployment and poverty, or restricted
choices of important consumer goods, on protection of
health. How are such matters to be developed in a
context of concern about protection of non-health
values, such as limiting the role of government in
controlling private behavior or in facilitating
compensation for harm actually inflicted (perhaps at
much lower overall cost to society), the effects of
unenforced or unenforceable directives on respect for
the law in other areas, and many other matters? I
recognize that such issues are generally to be dealt
with at the highest political and social levels, but their
resolution can have a profound effect on compliance
sampling, and compliance samplers should understand
the issues and express themselves as knowledgable
professionals. Whether an inspector chooses to return
to a plant that was in violation last month or to visit a
new plant may depend on how much the agency
depends on quiet negotiation vs. threats of legal
action. Whether limited resources are used to sample
for agents with acute, lethal, and readily identifiable
toxicity or for more common but less characteristic
and less devastating chronic disease may depend on
what recourse is available when injury is suspected.
Intensity of sampling (and of enforcement) in some
critical industry may even depend on the state of the
industry, and the state of the economy more generally.
The importance of defining the goals of compliance
sampling in the broadest way is clear. But we have not
dealt very well even with defining goals at more
technical levels. Suppose that a well-conceived
regulation sets a maximum exposure limit of 10 ppm.
Should compliance sampling be designed to give only a
yes/no answer, perhaps expressed as a Bernoulli
variable, about whether some stream, or factory, or
city is in violation? Should we instead try to
determine the mean exposure over some defined region
of time and space? The mean and variance, or the
tails generally? Should we go only for the order
statistics, especially the extremes (which will
generally provide a moving target as problems are
solved and compliance improves)? Do we need the
whole probability distribution of values? Surely a
yes/no answer can lead to much nonsense, as it did in
some erroneous interpretations by the news media of a
recent NAS report on drinking water, and some aspects
of the probability distribution of values need more
attention than others, but surely there is also a point
where we have learned enough about that distribution,
and must invest additional resources in the study of
other problems.
Gilbert et al. illustrate this general need for precise
goals in their discussion (9) of sampling soil for
radioactivity. Was the underlying goal to determine
whether radiation levels at any square inch of surface
were above the standard? Was it to average, or
integrate, over some unspecified larger area? Was it
to determine means and variances, or other aspects of
the distribution? Here, maybe the goal was in fact to
determine means for small areas, but we would still
need to know more about the problem, especially about
the small-scale variability of contamination, to
determine an appropriate sampling plan. For example,
if contamination is nearly uniform within each area for
155
-------
which a mean is required, one test per sampled area
may be enough. Conversely, if there is much chance of
having one very small, very hot rock (of, say, 10 ,
10 , or 10 pCi/g) one might have to sample on a
much finer grid. The general issue here is the scope or
range for averaging (or otherwise "smoothing") results.
Chesson (10) has also commented on needs for relating
statistical procedures to specific problems and
contexts. Holley's work on the bubble (11) deals with a
kind of averaging, but this Conference as a whole has
given rather little attention to even this level of goals.
Likewise, there was little discussion of how
strategies for compliance sampling must accommodate
the likelihood of legal challenge. A probable freedom
from such challenge may well have given Gilbert (9)
considerable latitude to be complex; to use a great
deal of peripheral information, and to interpret EPA's
raw standard as he settled on the scope and
distribution of sampled areas, to decide that he could
ignore possible variation over time, and to develop a
special sampling protocol.
At this point, one may begin to wonder about the
role of statistics (and statisticians) in compliance
sampling. I believe very strongly that the most visible,
and apparently the most characteristic aspects of
statistics - modeling of random variation, algebra, and
computation - are only a small (though essential) part
of the field. Statistics is, rather, the art and science
of interpreting quantitative data that are subject" to
error, and indeed, in the study of environmental
hazards, random error may account for only a tiny part
of the uncertainty. Ross discussion (12) brings out
clearly the real potential of statistics in the design of
bubbles as well as the way bubbles ignore some
important distributional issues. • •
I turn now to three sets of generic problems in
compliance sampling: those in policy and concept, in
unpredictable (stochastic) influences on the data, and
in applications of theory. These sets of problems are
broad and deep, and statistical thinking has a large and
critical role in each.

Policy and Conceptual Aspects of Compliance Sampling

The first set is related to policy and concepts. I
have already referred to the differences between broad
public goals and more narrowly statistical goals, but
there are many intermediate questions about what it is
that one wants to accomplish, and what is feasible.
Approaches to evaluation in many fields fall rather
well into three categories: evaluation of structure, of
process, and of outcome. Each can be defined at
multiple levels, but here it may be most useful to
equate structure to the chemical methods, engineering
and mechanical structures, and other aspects of the
generation of hazardous agent; process to the emission
or other release of the hazard into the community, its
transport after release, and exposure levels where
people are in fact exposed; and outcome to the human
health endpoints (or other endpoints) that are the more
fundamental objects of concern. Compliance sampling
focuses on process (in this context), but it is not clear
that there has been much hard policy thinking about
whether this is the best way to attain the still rather
fuzzy goals of the activity.
One aspect of this matter is the need to consider
sensitive subgroups of the population. Such subgroups
may not always be evident (as seems likely with some
carcinogens), and their existence may not even be
suspected, but somehow we must recognize not only
that some people get sick from exposures that do not
affect others, but that not all persons have the same
probability of responding to some toxic agent.
A related point is "conservatism" in regulation, and
its reflections in compliance sampling. Conservatism
has several purposes, including the protection of
sensitive subgroups, and the need to provide a cushion
against random and nonrandom excursions of exposure
to higher levels. I believe that its main use, however,
is to protect us against our ignorance, not against our
failures. We simply don't know what goes on within
the human body at low exposure levels of carcinogens
and other toxic agents, and choice of the wrong
statistical model could lead to risk estimates that are
wrong by orders of magnitude. Unfortunately,
underestimates of risk will tend to be far more serious
than overestimates if one works on a log scale, as is
implied by the phrase "orders of magnitude."
Implications of conservatism for compliance sampling
are substantial. It does little good to set conservative
limits for exposure if sampling, and hence
enforcement, do not follow. It is not at all clear that
regulatory agencies have been consistently attentive to
the logical link between conservatism in risk
assessment and conservatism in enforcement; indeed,
some agencies may have it backwards, and believe that
conservative exposure limits actually reduce the need
for compliance sampling. There is scope here for a
new study of how to trade off the risks and costs of
(say) a higher exposure limit plus more rigorous
sampling to assure compliance vs. a lower exposure
limit that is to be less vigorously enforced.
Another policy and conceptual issue in compliance
sampling has to do with distributional effects. When
dose-response curves are linear at low doses, the mean
exposure level in a population determines the expected
number of adverse events, but it may still matter a
great deal how the risk is distributed over the
population. For example, it is no longer acceptable (at
least in the U.S.) to concentrate the risks of toxic
exposures on the lowest economic and social groups.
Nor does one often hear arguments in favor of placing
a new toxic hazard in an area already contaminated on
grounds that a little more would not make much
difference, even though this might be rational if there
is reason to think that the risk is concentrated on a
small, sensitive subpopulation that has already been
"exhausted" by prior exposures.
Time does not permit more than a listing of some
other policy issues in compliance sampling. How
should ambient "natural" exposures to some agents,
such as ozone, be accommodated in protocols for
compliance sampling? What do we mean, in
operational terms, by an "instantaneous" exposure?
Marcus gave a strong start to the conference with his
discussion of the need to design compliance sampling
programs in light of the different time scales for
environmental exposure, biologic response, and
regulatory action (1), while Hertzberg (2) has pointed
to some of the practical problems of doing so. How
should, or how can, model uncertainty be built into
sampling plans, including models of distribution and
exposure as well as models of outcome?

Stochastic Aspects of Compliance Sampling

Issues to this point have not depended on any aspect
of uncertainty in measurement or on random
variability in the substance understudy. The steps
from a precise deterministic model to an uncertain
stochastic model introduce new issues. What are the
roles of deterministic vs. stochastic models, and how
156
-------
should those roles affect compliance sampling? It is
perhaps understandable that in enforcement actions,
compliance data are treated as free of random
variation, but surely this matter needs some careful
thought.
Another issue arises from gaps in the data — gaps
that are sometimes by design and sometimes not.
There was little attention to this matter in this
conference. Though every applied statistician is
familiar with the problem,, fewer are aware of the
theoretical and applied approaches that have been
worked out in recent years. These range from
modeling the whole data set and using iterative
maximum likelihood methods to estimate missing
values (the E-M algorithm) to the straightforward
duplication of some nearby value, which may be in
error but not as far off as ignoring the missing
observations, which in practice generally treats them
as if they had the mean value for that variable ("hot
deck" methods). Little and Rubin (18) provide an
introduction to this topic, and techniques analogous to
kriging, a method often used in geostatistics, may also
be useful (19).
Unfortunately, the probability distributions of
greatest interest in compliance sampling may often be
hard to work with at a practical level. They tend to be
"lumpy" in both space and time, with extreme
variability, long tails to the right, and big coefficients
of variation. Correlation functions over space and
time (as in kriging) are important, but may themselves
need to be estimated anew in each specific application,
with detailed attention to local circumstances.
One practical consequence of dealing with "difficult"
distributions is the loss of applicability of the Gaussian
distribution (or at least loss of some confidence in its
applicability), even in the form of the central limit
theorem. Another is the loss of applicability of linear
approaches, which have many well-known practical
advantages with both continuous data and discrete
(even non-ordered) classifications. Nonlinear analogs
of, say, the general linear model and the loglinear or
logit approaches have neither the theoretical
underpinnings, nor the range of packaged general-use
computer programs, nor the background of use and the
familiarity of the linear approaches.
Given a set of data and a need to "average," what
kind of average is appropriate? Some obvious
questions have to do with ordinary weighted averages,
others with moving averages. Still other questions
have to do with the form of the averaging function:
arithmetic, harmonic, geometric, etc. Geometric
means are sometimes used in compliance sampling, as
Wyzga has noted here (15), but they may often be quite
unsuitable precisely because their advantage in some
other situations - that they reduce the importance of
high outliers - obscures the values of most concern.
When health is at issue, I want a mean that will attend
more to the upper tail than the lower tail. If six values
on six successive days are (for example) 1,2, 3, 4, 6,
and 12, the geometric mean is 3.46, distinctly less than
the arithmetic mean of 4.67, but it is the 6 and 12 that
may matter most. An average that works opposite to
the geometric mean seems better, such as the root
mean square (5.92 in the example above) or root mean
cube (6.99 above). I was glad indeed to learn recently
that the geometric mean has been abandoned in
measures of air particulates.
Many statistical approaches incorporate an
assumption that the variance of an observation is
independent of its true value. This may rarely be the
case. However, lack of uniformity in variance may
often have little consequence, and in some other cases
it can be readily dealt with (such as by log or square
root transforms). But there may be serious
consequences if the nonuniformity or the statistical
methods have statistical properties that are not
understood, or are not acceptable. For example, in the
6-value numeric example above, if variances are
proportional to the observed values, a log transform
may produce values of approximately equal variance;
however, the arithmetic mean of logged values is
equivalent to the geometric mean of the original
values, so that a different approach may be better.
Problems are even greater, of course, when it is biases
rather than random error that may depend on the
unknown true values. Nelson's paper here (3) is rich in
these and other statistical questions as well as policy
questions.

Empirical Aspects of Compliance Sampling

The compliance sampler must attend to a wide
variety of issues of direct, practical significance that
derive from the context in which the data are to be
collected and used. One is that results must be
prepared so as to withstand legal challenge and,
sometimes, political attack. A practical consequence
is that much flexibility and much scope of application
of informed judgment are lost. There may also be
extra costs for sample identification, replicate
measurement, and extra record keeping that help to
validate individual values but reduce resources for
other sampling that may contribute as much to the
public health. This is in part a consequence of
competing objectives within the general scope of
compliance sampling. What is the optimum mix of
finding indicators of many preventable problems and
applying gentle persuasion to remove them vs. nailing
down a smaller number of problems and ensuring that
the data can be used in strong legal action if need be?
A second broadly empirical issue is the whole range
of chemical and physical limitations on the detection
and accurate measurement of hazardous substances.
This is not the problem it once was — indeed, some
observers believe that increased sensitivity of methods
has led to the opposite problem of overdetection and
overcontrol — but some substances are still difficult
to measure at low concentrations by methods that are
accurate, fast, and inexpensive. Thus, measurement
remains a serious problem. An example is USDA's
program for assessing pesticide residues in meat and
meat products, which is limited by high costs to about
300 samples per year for the general surveillance of
each major category (e.g., "beef cattle.") Thus there is
a close link between the setting of standards (what is
likely to be harmful, to whom, in what degree, and
with what probability?) and the enforcing of standards
(what violations are to be found, to what degree of
precision, and with what probability?). A standard not
enforceable because of limits on laboratory methods is
no better, and may be worse, than no standard at all,
and should be a candidate for replacement by some
other method of controlling risk (e.g., process
standards, or engineering controls). Sometimes, of
course, deliberately insensitive methods can be
cultivated and put to use. An example is FDA's
"sensitivity of the method" approach to carcinogens in
foods. Another real example, though slightly less
serious here, was the step taken by the State of
Maryland to improve its performance in enforcing
federal highway speed limits: Move radar detectors
from the flat straightaways to places where many
157
-------
drivers slow down anyway, such as sharp curves and the
tops of hills, as other states had done long before. The
incidence of detection of speed violations dropped
markedly, and Maryland was suddenly in compliance
with Federal standards. Creative design of a
compliance sampling plan can produce pretty much
whatever the designer wants, and I take it that a part
of our task here is to develop approaches that
discourage, inhibit, and/or expose the cynical
manipulation of sampling procedures.
Sometimes, methods exist but for other reasons the
data have not been collected. One example is the
distribution of various foreign substances in human
tissues. These include heavy metals, pesticides, and
radioactive decay products; none of these had been
adequately studied to determine the probability
distribution of body burdens in the general population.
Reasons are varied and deep, but include cost,
problems of storage, control of access to banks of
human tissues (an expendable resource), and ultimately
the problems of procuring enough of the right kind of
material from a fully representative sample of people.
The need for detailed human data will surely grow with
the growth of new approaches to risk assessment
(especially of carcinogens), and compliance sampling
may well be involved. Toxicokinetics, in particular,
often demands human data; mechanisms can be
examined in other species, but human sensitivity,
human rate parameters, and human exposure can be
determined only by study of human circumstances and,
sometimes, human specimens.
Compliance sampling is indeed an activity loaded
with problems. Overall, there is a clear need for
substantially more thought and research on the
empirical issues raised by compliance sampling. Wyzga
(15) and Bailey (16) provides a fresh view of many of
these.

Overview of the Overview

Where do we go from here? It is easy to call for
more and better compliance sampling, and to show how
we could then do more and better things. That will not
get us far in this age of constrained resources. I
believe that we need some other things first, or instead.
First is a broader and deeper view of compliance
sampling. Many agencies and programs do such
sampling, but almost always with a narrow focus on the
enforcement of one or another regulation. This view
•should be broader — to include other substances, other
agencies, and other objectives (including research) —
and it should be deeper, so that issues of compliance
sampling are considered at each stage from initial
legislation onward, and plans are integrated with all
other relevant aspects of Agency activities.
Compliance sampling simply must not be treated like a
poor relative — tolerated but not really welcome, and
largely ignored until its general shabbiness or some
genuine scandal forces a response.
A broader view of compliance sampling might, for
example, support Nelson's comments on extensions
from existing data to broader groups, even to national
populations (3). Nelson's paper as a whole is unusually
rich in both statistical questions and policy questions.
While the matter seems to have received little specific
discussion, it seems to me that the maximum useful
geographic range or population size for compliance
sampling, and maybe the optimum too, is the same as
the maximum feasible scope of specific control
measures. Thus, national data may be most critical in
drafting or revising national laws and regulations, but
local data are indispensable for understanding local
needs, monitoring local successes, and enforcing local
sanctions.
Another aspect of broadening our view of
compliance sampling is the need to optimize sampling
strategies for attaining specific, carefully elaborated
goals. Thus, there might be reason in public policy to
extend the use of weighted sampling, with more effort
to collect samples likely to be out of compliance. This
approach seems to have substantial informal use,
especially when inspectors have considerable latitude
to make decisions in the field, but has had less in the
way of formal attention.
Still another aspect is the need for empirical study
of the probability distributions that arise in the
samples, and the development of sampling plans and
analytic approaches that accommodate those
distributions. Should one take a "point" sample of just
the size needed for testing, or take a more distributed
sample, mix it, and test an aliquot? Is there a larger
role for two-stage sampling, in which the selection of
a general area for examination is followed by the
selection of sub- areas? Or a role for two-stage
testing, in which aliquots of several samples are mixed
and tested for the presence of some offending
substance, with further testing of individual samples
only if the group result is positive?
Perhaps the most fundamental need in developing a
more comprehensive view of compliance sampling is
for careful consideration of the role of genuinely
random sampling, as opposed to haphazard or
subjectively selected samples of convenience. One of
the biggest surprises to me at this Conference was the
lack of attention to the need to guarantee genuinely
random sampling, though it provides the only
acceptable justification for the statistical measures,
such as p-values and confidence limits, that have been
tossed about quite freely here. As a part of this, there
is a clear need for new approaches to the computation
of variances and other functions of the data, which will
force demands for some kinds of randomization in the
sampling. Gilbert's problem in particular (9) calls for
highly sophisticated statistical modeling and analysis.
Second is a deeper consideration of how compliance
sampling can be made more productive than in just the
detection of violations, and how it can support broader
Agency and national objectives. I have already
referred to several aspects of this, but some points
still require comment. One is the value of designing
compliance programs (including sampling) that
encourage both more and better monitoring and also
encourage what might be called supercompliance.
Response to the findings of a particular sample or
pattern of samples may be yes-or-no, but surely one
should put greater weight on finding the bigger
violations. Frank has referred to this (13), with special
comment about' the potential value of variable
frequency (and intensity) in sampling, while Warren
(14) has noted some practical obstacles.
Some statistical tools do exist to aid in increasing
the broad utility of data from compliance sampling.
Bisgaard (5) and Price (7) have each presented reasons
for more careful attention to the operating
characteristics (OCs) of programs for compliance
sampling. OCs might in fact be a good way to
communicate with Agency administrators and others
about the consequences of choosing one or another
approach to monitoring, though Johnson (6) has
emphasized the need for attention to the upper tail of
exposure rather than the mean. It seems to me that
the question of tail vs. mean may well depend on the
158
-------
health endpoint in question; an effect such as cancer
that is considered a function of lifetime exposure may
well be approached by means, while effects that really
depend on short-term peaks should be regulated in
terms of peaks, though this may create some problems
when both kinds of endpoints must be managed in the
same exposure setting. Bisgaard and Hunter (5) are
firmly on the right track with their insistence on a
more comprehensive view that integrates sampling
protocols, calibration of the tools and processes, and a
decision function to determine responses. This also
underlines the need for clear articulation of goals;
otherwise, Bisgaard's approach cannot be
implemented. Johnson (6) also points to the need for
adequate attention to other matters, too, including the
political situation, pollutant behavior, sampling
constraints, and the objectives of the standard,
Flatman (8) also emphasizes the need for constant
attention to the practicalities of solutions to real, and
different, problems.
Other statistical tools of potential value in
compliance sampling can be found in the
epidemiologist's approach to diagnostic testing, with
an insistence that policy decisions about testing be
based on sound data on sensitivity, specificity, and
positive and negative predictive values. These
concepts have proved invaluable in policy decisions
about medical screening, and they have similar
potential to sharpen decisions about environmental
screening.
Third, and my final point, is a plea that regulatory
agencies explore the potential of statistical decision
theory in their approaches to compliance sampling,
including explicit consideration of the value of new
information. The -emphasis this will put on such
matters as prior distributions, objective functions, cost
functions, and balancing of disparate endpoints — all
of which are already major elements in setting policy
about compliance sampling — can only be good.
Among other benefits, decision theory will tend to
direct Agency attention to those points where the
biggest improvements can be made, and away from
both fine-tuning of little things with little potential
profit and spinning wheels over big things that can't be
settled anyway.
This would again direct attention to how prior
distributions for the probability, location, and degree
of violation are developed and used. Thus, Gilbert
samples from plots that are next to plots already
known to be in violation; the frequency of air sampling
is tied to the frequency of past violations; and
experienced plant inspectors come to know where the
bodies may be buried and how to look for them.
Overall, this Conference was eminently successful in
bringing out a broad range of problems, issues, and
research needs. It has also provided some answers,
though the most important products of our work here
will continue to unfold for years to come. Our Chair,
speakers, and discussants deserve much thanks for a
job well done.
BIBLIOGRAPHY

1. Marcus AH. Time Scales: Biological,
environmental, regulatory. This conference.
2. Hertzberg RC. Discussion of paper by Marcus.
This conference.
3. Nelson WC. Statistical issues in human exposure
monitoring. This conference.
4. Hunt WF. Discussion of paper by Nelson. This
conference.
5. Bisgaard S, Hunter WG. Designing environmental
regulations. This conference.
6. Johnson WB. Discussion of paper by Bisgaard and
Hunter. This conference.
7. Price B. Quality control issues in testing
compliance with a regulatory standard:
Controlling statistical decision error rates. This
conference.
8. Flatman GT. Discussion of paper by Price. This
conference.
9. Gilbert RO, Miller ML, Meyer HR. On the design
of a sampling plan to verify compliance with EPA
standards foe radium-226 in soil at uranium mill
tailings remedial action sites. This conference.
10. Chesson J. Discussion of paper by Gilbert, Miller,
and Meyer. This conference.
11. Holley JW, Nussbaum BD. Distributed
compliance: EPA and the lead bubble. This
conference.
12. Ross NP. Discussion pf paper by Holley and
Nussbaum. This conference.
13. Frank NH, Curran TC. Variable sampling
schedules to determine PM]Q status. This
conference.
14. Warren J. Discussion of paper by Frank and
Curran. This conference.
15. Hamraerstrom TS, Wyzga RE. Analysis of the
relationship between maximum and average in S02
time series. This conference.
16. Bailey RC. Discussion of paper by Hammerstrom
and Wyzga. This conference.
17. Jones CA. Models of Regulatory enforcement and
compliance, with an application to the OSHA
Asbestos Standard. Harvard University Economics
Department, Unpublished doctoral dissertation,
1982.
18. Little RJA, Rubin DB. Statistical Analysis with
Missing Data. John Wiley, 1987.
19. Jernigan RW. A Primer on Kriging. Statistical
Policy Branch, US Environmental Protection
Agency, 1986.
159
-------
APPENDIX A: Program
Monday. October 5

INTRODUCTION

9:00 a.m. Paul I. Feder, Conference Chairman, Battelle Columbus Division
Dorothy G. Wellington, U.S. Environmental Protection Agency

I. TOXICOKINETIC AND PERSONAL EXPOSURE CONSIDERATIONS IN THE DESIGN
AND EVALUATION OF MONITORING PROGRAMS

9:10 a.m. Time Scales: Biological, Environmental, Regulatory
Allan H. Marcus, Battelle Columbus Division
DISCUSSION
Richard C. Hertzberg, U.S. EPA, ECAO-Cincinnati

10:15 a.m. BREAK

10:30 a.m. Some Statistical Issues in Human Exposure Monitoring
William C. Nelson, U.S. EPA, EMSL-Research Triangle Park
DISCUSSION
William F. Hunt, Jr., U.S. EPA, OAQPS-Research Triangle Park

12:00 noon LUNCHEON

H. STATISTICAL DECISION AND QUALITY CONTROL CONCEPTS IN DESIGNING
ENVIRONMENTAL STANDARDS AND COMPLIANCE MONITORING PROGRAMS

1:00 p.m. Designing Environmental Regulations
Soren Bisgaard, University of Wisconsin-Madison
DISCUSSION
W. Barnes Johnson, U.S. EPA, OPPE-Washington, D.C.

2:15 p.m. BREAK

2:30 p.m. Quality Control Issues in Testing Compliance with a Regulatory Standard:
Controlling Statistical Decision Error Rates
Bertram Price, Price Associates, Inc.
DISCUSSION
George T. Flatman, U.S. EPA, EMSL-Las Vegas

m. COMPLIANCE WITH RADIATION STANDARDS

3:40 p.m. On the Design of a Sampling Plan to Verify Compliance with EPA
Standards for Radium-226 in Soil at Uranium Mill Tailings Remedial
Action Sites
Richard O. Gilbert, Battelle Pacific Northwest Laboratories; Mark L.
Miller, Roy F. Weston, Inc.; H.R. Meyer, Chem-Nuclear, Inc.
DISCUSSION
Jean Chesson, Price Associates, Inc.

5:00 p.m. RECEPTION 160
(see over)
-------
Tuesday. October 6

IV. THE BUBBLE CONCEPT APPROACH TO COMPLIANCE

9:00 a.m. Distributed Compliance—EPA and the Lead Bubble
John W. Holley, Barry D. Nussbaum, U.S. EPA, OMS-Washington, D.C.
DISCUSSION
N. Philip Ross, U.S. EPA, OPPE-Washington, D.C.

10:15 a.m. BREAK

V. COMPLIANCE WITH AIR QUALITY STANDARDS

10:30 a.m. Variable Sampling Schedules to Determine PMio Status
Neil H. Frank, Thomas C. Curran, U.S. EPA, OAQPS-Research Triangle
Park
DISCUSSION
John Warren, U.S. EPA, OPPE-Washington, D.C.

12:00 noon LUNCHEON

1:00 p.m. The Relationship Between Peak and Longer Term Exposures to Air
Pollution
Ronald E. Wyzga, Electric Power Research Institute, Thomas S.
Hammerstrom, H. Daniel Roth, Roth Associates
DISCUSSION
R. Clifton Bailey, U.S. EPA, OWRS-Washington, D.C.

2:15 p.m. BREAK

SUMMARY OF CONFERENCE

2:30 p.m. John C. Bailar III, McGill University, Department of Epidemiology and
Biostatistics

This Conference is the final in a series of research conferences on interpretation of
environmental data organized by the American Statistical Association and supported by a
cooperative agreement between ASA and the Office of Standards and Regulations, under
the Assistant Administrator for Policy Planning and Evaluation, U.S. Environmental
Protection Agency.
Conference Chairman and Organizer:
Paul I. Feder, Battelle Columbus Division
!fg|!
161
-------
APPENDIX B: Conference Participants
Ruth Allen
U.S. EPA
401 M Street, S.W., RD-680
Washington, DC 20460

Stewart J. Anderson
CIBA-GEIGY Corporation
556 Morris Avenue, SIC 249
Summit, NJ 07901

John C. Bailar ffl
(McGill University)
468 N Street, S.W.
Washington, DC 20024

R. Clifton Bailey
Environmental Protection Agency
401 M Street, S.W., WH-586
Washington, DC 20460

T. O. Berner
Battelle Columbus Division
2030 M Street, N.W., Suite 700
Washington, DC 20036

Soren Bisgaard
University of Wisconsin
Center for Quality & Productivity
Improvement
Warf Building, 610 Walnut Street
Madison, WI 53705

Jill Braden
Westat, Inc.
1650 Research Boulevard
Rockville, MD 20852

Chao Chen
U.S. EPA
401 M Street, S.W., RD-689
Washington, DC 20460

Jean Chesson
Price Associates, Inc.
2100 M Street, N.W., Suite 400
Washington, DC 20037

James M. Daley
(U.S. EPA)
12206 Jennel Drive
Bristow, VA 22012
Susan Dill man
U.S. EPA
401 M Street, S.W., TS-798
Washington, DC 20460

Paul I. Feder
Battelle Columbus Division
505 King Avenue
Columbus, OH 43201

George T. Flatman
U.S. EPA, EMSL-LV
P.O. Box 93478
Las Vegas, NV 89193-3478

Paul Flyer
Westat, Inc.
1650 Research Boulevard
Rockville, MD 20852

Ruth E. Foster
U.S. EPA-OPPE/OSR
401 M Street, S.W.
Washington, DC 20460

Neil H. Frank
U.S. EPA, OAQPS
MD-14
Research Triangle Park, NC 27711

Richard O. Gilbert
Battelle Pacific Northwest Lab
P.O. Box 999
Richland, WA 99352

J. Hatfield
Battelle Columbus Division
2030 M Street, N.W., Suite 700
Washington, DC 20036

Richard C. Hertzberg
U.S. EPA, ECAO
Cincinatti, OH 45268

John W. Holley
(U.S. EPA-OMS)
9700 Water Oak Drive
Fairfax, VA 22031

William F. Hunt, Jr.
U.S. EPA, OAQPS
MD-14
Research Triangle Park, NC 27711
162
-------
Thomas Jacob
Viar and Company
209 Madison
Alexandria, VA 22314

Robert Jernigan
American University
Department of Mathematics
and Statistics
Washington, DC 20016

W. Barnes Johnson
U.S. EPA, OPPE
401 M Street, S.W., PM-223
Washington, DC 20460

Herbert Lacayo
U.S. EPA
401 M Street, S.W., PM-223
Washington, DC 20460

Emanuel Landau
American Public Health Association
1015 15th Street, N.W.
Washington, DC 20005

Darlene M. Looney
CIBA-GEIGY Corporation
556 Morris Avenue, SIC 257
Summit, NJ 07901

Allan H. Marcus
Battelle Columbus Division
P.O. Box 13758
Research Triangle Park, NC 27709-2297

Lisa E. Moore
U.S. EPA
26 W. Martin Luther King Jr. Drive
Cincinnati, OH 45268

William C. Nelson
U.S. EPA, EMSL
MD-56
Research Triangle Park, NC 27711

Barry D. Nussbaum
U.S. EPA
401 M Street, S.W., EN-397F
Washington, DC 20460

Harold J. Petrimoulx
Environmental Resources
Management, Inc.
999 West Chester Pike
West Chester, PA 19382

Bertram Price
Price Associates, Inc.
2100 M Street, N.W., Suite 400
Washington, DC 20037
Dan Reinhart
U.S. EPA
401 M Street, S.W., TS-798
Washington, DC 20460

Alan C. Rogers
U.S. EPA
401 M Street, S.W.
Washington, DC 20460

John Rogers
Westat, Inc.
1650 Research Boulevard
Rockville, MD 20852

N. Philip Ross
U.S. EPA, OPPE
401 M Street, S.W., PM-223
Washington, DC 20460

Brad Schultz
U.S. EPA
401 M Street, S.W., TS-798
Washington, DC 20460

John Schwemberger
U.S. EPA
401 M Street, S.W., TS-798
Washington, DC 20460

Paul G. Wakim
American Petroleum Institute
1220 L Street, N.W.
Washington, DC 20005

John Warren
U.S. EPA, OPPE
401 M Street, S.W., PM-223
Washington, DC 20460

Dorothy G. Wellington
U.S. EPA
401 M Street, S.W., PM-223
Washington, DC 20460

Herbert L. Wiser
U.S. EPA
401 M Street, S.W., ANR-443
Washington, DC 20460

Ronald W. Wyzga
Electric Power Research Institute
P.O. Box 10412
Palo Alto, CA 94303

Conference Coordinator:
Mary Esther Barnes
American Statistical Association
1429 Duke Street
Alexandria, VA 22314-3402

163
-------