EPA
United States Environmental Protection Agency
Washington, DC 20460
230-030-47
Statistical Policy Branch
ASA/EPA Conferences on
Interpretation of
Environmental Data
IV. Compliance Sampling
October 5-6, 1987
-------
PREFACE
This volume is a compendium of the papers and commentaries that were presented at
the fourth in a series of conferences on interpretation of environmental data conducted by
the American Statistical Association and the U.S. Environmental Protection Agency's
Statistical Policy Branch of the Office of Standards and Regulations/Office of Policy,
Planning, and Evaluation. The ASA Committee on Statistics and the Environment
developed this series and has general responsibility for it.
The purpose of these conferences is to provide a forum in which professionals from
the academic, private, and public sectors exchange ideas on statistical problems that
confront EPA in its charge to protect the public and the environment through regulation of
toxic exposures. They provide a unique opportunity for Agency statisticians and scientists
to interact with their counterparts in the private sector.
The eight papers and accompanying discussions in this volume of proceedings are
about "compliance sampling" to determine how well environmental standards are met.
These papers provide valuable guidance in the planning of future environmental studies.
The papers address many aspects of compliance, and are intended for statisticians involved
in planning how to ascertain general levels of compliance and identify noncompliers for
special attention. Such work is inherently statistical and must be based on anticipation of
the statistical analysis to be performed so that the necessary data can be collected. These
proceedings should help the statistician anticipate the analyses to be performed. In
addition, the papers discuss implications for new studies. No general prescriptions are
offered; none may be possible.
The emphases in these papers are quite different. No two authors have chosen the
same aspect of compliance to examine. This diversity suggests that a major challenge is
to consider carefully each study aspect in the planning process. Meeting this challenge
will require a high degree of professionalism from the statistical community.
The conference itself and these proceedings are primarily the result of the efforts of
the authors and discussants. The discussants not only described how their views differ from
those of the authors, but provided independent ideas as well. The coordination of the
conference and of the publication of the proceedings was carried out by Mary Esther
Barnes and Lee L. Decker of the ASA staff.
The views presented in this conference are those of individual writers and should not
be construed as reflecting the official position of any agency or organization.
This fourth conference, "Compliance Sampling," was held in October 1987. Others
were the first conference, "Current Assessment of Combined Toxicant Effects," in May
1986, the second, "Statistical Issues in Combining Environmental Studies," in October
1986, and the third, "Sampling and Site Selection in Environmental Studies," in May 1987.
John C. Bailar III, Editor
Chair, ASA Committee on Statistics and the Environment
Department of Epidemiology and Biostatistics, McGill University
and
Office of Disease Prevention and Health Promotion
U.S. Department of Health and Human Services
-------
INTRODUCTION
The general theme of the papers and associated discussions is the design and
interpretation of environmental regulations that incorporate, from the outset, statistically
valid compliance verification procedures. Statistical aspects of associated compliance
monitoring programs are considered. Collectively the papers deal with a wide variety of
environmental concerns including various novel approaches to air emissions regulations and
monitoring, spatial sampling of soil, incorporation of potential health effects
considerations into the design of monitoring programs, and considerations in the statistical
evaluation of analytical laboratory performance.
Several papers consider aspects of determining appropriate sampling frequencies.
Allan Marcus discusses how response time frames of potential biological and health effects
due to exposures may be used to decide upon appropriate monitoring interval time frames.
He demonstrates how biokinetic modeling may be used in this regard.
Neil Frank and Tom Curran discuss factors influencing required sampling frequencies
to detect particulate levels in air. They emphasize the need to specify compliance
monitoring requirements right at the time that the air quality standard is being
formulated. They suggest an adaptive monitoring approach based on site specific
requirements. Those sites that are clearly well above or well below the standard need be
sampled relatively infrequently. Those sites that straddle the standard should be sampled
more frequently to decrease the probabilities of misclassification of
attainment/nonattainment status.
Tom Hammerstrom and Ron Wyzga discuss strategies to accommodate situations
when Allan Marcus' recommendations for determining sampling frequency have not been
followed, namely when monitoring data averaging time intervals are very long relative to
exposure periods that may result in adverse physiological and health consequences. For
example, air monitoring data may be averaged over one hour intervals but respiratory
symptoms may be related to the highest five minutes of exposure during that hour. The
authors model the relationships between peak five minute average concentration during an
hour and the overall one hour average concentration under various stochastic process
assumptions. They combine monitoring and modeling to predict short term peak
concentrations on the basis of observed longer term average concentrations.
Bill Nelson discusses statistical aspects of personal monitoring and monitoring
"microenvironments" such as homes and workplaces to assess total personal exposure.
Such data are very useful for the exposure assessment portions of risk assessment. Dr.
Nelson compares and contrasts personal monitoring with the more traditional area
monitoring. The availability of good personal exposure data would permit much greater
use of human epidemiologic data in place of animal toxicologic data in risk assessment.
Richard Gilbert, M. Miller, and H. Meyer discuss statistical aspects of sampling
"frequency" determination in the spatial sense. They consider the development of a soil
sampling program to estimate levels of radioactive solid contamination. They discuss the
use of multilevel acceptance sampling plans to determine the compliance status of
individual soil plots. These plans have sufficient sensitivity to distinguish between
compliant/noncompliant plots yet result in substantial sample size economies relative to
more naive single stage plans.
-------
John Holley and Barry Nussbaum discuss the "bubble" concept approach to emissions
regulation. The "bubble" concept specifies that average environmental standards must be
maintained across a dimension such as area, time, auto fleet, or industry group. This
dimension constitutes the "bubble." Lack of compliance in one part of the bubble may be
offset by greater than minimum compliance in other parts. Emissions producers have the
option to trade, sell or purchase emissions "credits" with, from, or to other emissions
producers in the bubble. Alternatively, they may "bank" emissions "credits" for use in a
future time period. Such an approach to regulation greatly enhances the emissions
producers' flexibility, as a group, to configure their resources so as to most economically
comply with the overall standard.
Soren Bisgaard and William Hunter discuss statistical aspects of the formulation of
environmental regulations. They emphasize that the regulations, including their
associated compliance monitoring requirements, should be designed to have satisfactory
statistical characteristics. One approach to this is to design regulations that have
operating characteristic curves of desired shape. Alternative candidate formulations can
be compared in terms of the shapes of their associated operating characteristic curves.
Bert Price discusses yet another statistical aspect of environmental regulation:
evaluating the capabilities of analytical laboratories. He contrasts and compares
strategies to evaluate individual laboratories based only on their own bias and variability
characteristics (intralaboratory testing) with strategies that evaluate laboratories as a
group (interlaboratory testing). Price's paper has commonality with that of Bisgaard and
Hunter in that he argues that first the operating characteristic of a regulation needs to be
specified. This specification is then used to determine the types and numbers of
observations required in the associated compliance tests.
The eight papers in this volume of proceedings deal with diverse aspects of the
statistical design and interpretation of environmental regulations and associated
compliance monitoring programs. A unifying theme among them is that the statistical
objectives and characteristics of the regulations should be specified right at the planning
stage and should be drivers of the specific regulation designs rather than being
inconsequential afterthoughts.
Paul I. Feder
Chair, ASA/EPA Conference on Compliance Sampling
Battelle Memorial Institute
-------
TABLE OF CONTENTS
Preface. JOHN C. BAILAR III, McGill University ii
Introduction. PAUL I. FEDER, Battelle Memorial Institute iii
Index of Authors . vi
I. TOXICOKINETIC AND PERSONAL EXPOSURE CONSIDERATIONS IN
THE DESIGN AND EVALUATION OF MONITORING PROGRAMS
Time Scales: Biological, Environmental, Regulatory. ALLAN H. MARCUS,
Battelle Columbus Division 1
Discussion. RICHARD C. HERTZBERG, U.S. Environmental Protection
Agency, ECAO-Cincinnati 16
Statistical Issues in Human Exposure Monitoring. WILLIAM C. NELSON,
U.S. Environmental Protection Agency, EMSL-Research Triangle Park 17
Discussion. WILLIAM F. HUNT, JR., U. S. Environmental Protection
Agency, OAQPS-Research Triangle Park 39
II. STATISTICAL DECISION AND QUALITY CONTROL CONCEPTS IN DESIGNING
ENVIRONMENTAL STANDARDS AND COMPLIANCE MONITORING PROGRAMS
Designing Environmental Regulations. SOREN BISGAARD, WILLIAM G. HUNTER,
University of Wisconsin-Madison 41
Discussion. W. BARNES JOHNSON, U.S. Environmental Protection Agency,
OPPE-Washington, D.C. 51
Quality Control Issues in Testing Compliance with a Regulatory Standard:
Controlling Statistical Decision Error Rates. BERTRAM PRICE, Price
Associates, Inc. 54
Discussion. GEORGE T. FLATMAN, U.S. Environmental Protection Agency,
EMSL-Las Vegas 75
III. COMPLIANCE WITH RADIATION STANDARDS
On the Design of a Sampling Plan to Verify Compliance with EPA Standards
for Radium-226 in Soil at Uranium Mill Tailings Remedial-Action Sites.
RICHARD O. GILBERT, Battelle Pacific Northwest Laboratory, MARK L.
MILLER, Roy F. Weston, Inc.; H. R. MEYER, Chem-Nuclear Systems, Inc. 77
Discussion. JEAN CHESSON, Price Associates, Inc. 111
IV. THE BUBBLE CONCEPT APPROACH TO COMPLIANCE
Distributed Compliance: EPA and the Lead Bubble. JOHN W. HOLLEY, BARRY
D. NUSSBAUM, U.S. Environmental Protection Agency, OMS-Washington, D.C. 112
Discussion. N. PHILIP ROSS, U.S. Environmental Protection Agency,
OPPE-Washington, D.C. 121
-------
V. COMPLIANCE WITH AIR QUALITY STANDARDS
Variable Sampling Schedules to Determine PM10 Status. NEIL H. FRANK,
THOMAS C. CURRAN, U. S. Environmental Protection Agency, OAQPS-
Research Triangle Park 122
Discussion. JOHN WARREN, U. S. Environmental Protection Agency, OPPE-
Washington, D.C. 128
Analysis of the Relationship Between Maximum and Average in SO2 Time
Series. THOMAS S. HAMMERSTROM, Roth Associates, RONALD E. WYZGA,
Electric Power Research Institute 129
Discussion. R. CLIFTON BAILEY, Health Care Financing Administration 154
Summary of Conference. JOHN C. BAILAR III, McGill University and
U.S. Public Health Service 155
Appendix A: Program 160
Appendix B: Conference Participants 162
INDEX OF AUTHORS
Bailar, John C ii,155
Bailey, R. Clifton 154
Bisgaard, Soren 41
Chesson, Jean 111
Curran, Thomas C 122
Feder, Paul I iii
Flatman, George T 75
Frank, Neil H 122
Gilbert, Richard O 77
Hammerstrom, Thomas S 129
Hertzberg, Richard C 16
Holley, John W 112
Hunt, Jr., William F 39
Hunter, William G 41
Johnson, W. Barnes 51
Marcus, Allan H 1
Meyer, H. R 77
Miller, Mark L 77
Nelson, William C 17
Nussbaum, B. D 112
Price, Bertram 54
Ross, N. Philip 121
Warren, John 128
Wyzga, Ronald E 129
-------
TIME SCALES: BIOLOGICAL, ENVIRONMENTAL, REGULATORY
Allan H. Marcus
Battelle Columbus Division
P.O. Box 13759
Research Triangle Park, NC 27709
1. INTRODUCTION
E.P.A. has established primary air quality standards to protect the
general public against the adverse health effects of air pollutants, and
secondary standards to protect against other adverse environmental
impacts. Compliance with these standards is usually prescribed by an
explicit sampling protocol for the pollutant, with specified sampling
frequency and averaging time reflecting the variation in concentration to
which the population is exposed and the cost and precision of the sample
data. Biological and health effects issues are primary and should be kept
always in mind. Inadequate sampling schedules for compliance testing might
allow fluctuating exposures of toxicological significance to escape
detection. Resources for testing compliance are usually going to be
scarce, and focusing on health effects may allow the analyst and designer
of environmental regulations to find some path between oversampling and
undersampling environmental data.
In this review I will emphasize air quality standards for lead.
Lead is a soft dense metal whose toxic effects have long been known. In
modern times atmospheric lead has become a community problem because of
the large quantities of lead used as gasoline additives. While the
problem was substantially reduced as a result of E.P.A.'s leaded gasoline
phasedown regulations, there are still significant quantities emitted by
smelters, battery plants, etc., and substantial residues of previous lead
emissions in surface soil and dust. Other regulatory authorities control
lead concentrations in drinking water, in consumer products, and in the
workplace.
-------
of data has been collected by the State and Local Air Monitoring Stations
(SLAMS) network. These provide information about areas where the lead
concentration and population density are highest and monitoring for
testing compliance with standards is most critical. In order for a SLAMS
station to be part of the National Air Monitoring Station (NAMS) network,
very specific criteria must be satisfied about sampler location in terms
of height above ground level, distance from the nearest major roadway,
and spatial scale of which the station is supposed to be representative.
The siting study must also have a sufficiently long sampling period to
exhibit typical wind speeds and directions, or a sufficiently large
number of short periods to provide an average value consistent with 24-
hour exposure (CD, 1986).
The current averaging time for the lead primary National Ambient Air
Quality Standard (NAAQS) is a calendar quarter (3 months), and the air
lead NAAQS is a quarterly average of 1.5 ug/m3 that shall not be
exceeded. The lead standard proposed in 1977 was based on an averaging
time of one calendar month. The longer period has the advantage of
greater statistical stability. However, the shorter period allows some
extra protection. Clinical studies with adult male volunteer subjects
showed that blood lead concentration (PbB) changed to a new equilibrium
level after 2 or 3 months of exposure (Rabinowitz et al., 1973, 1976;
Griffin et al., 1975). The shorter averaging time was also thought to
give more protection to young children (42 FR 63076), even though there
was no direct evidence then (or now!) on blood lead kinetics in children.
The risk of shorter term exposures to air lead concentrations elevated
above a quarterly-averaged standard that might go undetected was
considered in the 1978 standard decision to be minimized because, based
on the ambient air quality data available at that time, the possibilities
for significant, sustained excursions were considered small, and it
was determined that direct inhalation of air lead is a relatively small
component of total airborne lead exposure (43 FR 46246).
The biological reasons for reevaluating the averaging time are discussed
in the next section.
Alternative forms of the air lead standard are now being evaluated
by E.P.A.'s Office of Air Quality Planning and Standards (OAQPS). The
averaging time is only one of the components in setting an air lead
standard. The "characterizing value" for testing compliance can assume a
wide variety of forms, e.g. the maximum monthly (or quarterly) average as
used in the "deterministic" form of the standard, the maximum of the
average monthly mean over a specified number of years (e.g. 3 consecutive
years), the average of the maximum monthly averages for each year within a
specified number of years, the average of the three highest months (or
quarters) within a specified number of years, etc. Some averaging of the
extreme values certainly smoothes out the data, but also conceals extreme
high-level excursions. Some attention has been given to the statistical
properties of the alternative characterizing values (Hunt, 1986). The
consequences of different characterizing values for biological exposure
indices or health effects indicators have not yet been evaluated.
A final consideration is the sampling frequency. The current normal
situation is a 24-hour average collected every 6th day. The number of
samples collected also depends on the fraction of lost days; it is not
-------
uncommon for 25% of the data to be lost. Thus one might have only 3 or 4
valid samples per month. Hunt (1986) examined more frequent sampling
schemes: every day, every other day, every third day. He also compared
the consequences of deterministic vs. "statistical" forms of the standard,
monthly vs. quarterly characterizing values, and 25% data loss vs. no loss.
The community air lead problem in the U.S. is now more likely to be
related to point sources than to area-wide emissions, thus the following
three scenarios for location were evaluated: (1) source oriented sites
with maximum annual quarterly averages less than 1.5 ug/m3; (2) source
oriented sites with maximum annual quarterly average greater than 1.5
ug/m3; (3) NAMS urban maximum concentration sites. Some conclusions
suggested by his study for quarterly averaging time are:
(1) The characterizing value with the best precision
-------
plausible explanation is that there is reduced transfer of lead to the
red blood cells at higher concentrations, whether attributed to reduced
lead-binding capacity of the erythrocytes or reduced transfer rate across
the erythrocyte membrane as lead concentrations increase. This is
reinforced by multi-dose experiments on rats in which lead concentrations
in brain, kidney, and femur are proportional to dose, which is expected
if tissue concentrations equilibrate with plasma concentrations, not with
whole blood lead concentrations.
Lead concentrations in peripheral tissues can be modeled by coupled
systems of ordinary differential equations. Parameters for such systems
can be estimated by iterative nonlinear least squares methods, often with
Marquardt-type modifications to enlarge the domain of initial parameter
estimates which allow convergence to the optimal solution (Berman and
Weiss, 1978). Data sets with observations of two or more components
often sustain indirect inferences about unobserved tissue pools.
Analyses of data in (Rabinowitz et al., 1973, 1976; Griffin et al., 1975;
DeSilva, 1981) reported in (Marcus, 1985abc; Chamberlain, 1985; CD,
1986) show that lead is absorbed into peripheral tissues in adult humans
within a few days. The retention of lead by tissues is much longer than
is the initial uptake. Even soft tissues such as kidney and liver appear
to retain lead for a month or so, and the skeleton retains lead for years
or tens of years (Christoffersson et al., 1986).
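The kind of iterative fit described above can be sketched with standard tools. The following is a minimal illustration only (not the SAAM program cited): it assumes a hypothetical two-pool linear model with first-order exchange, generates synthetic blood observations, and estimates the rate constants by Levenberg-Marquardt nonlinear least squares.

```python
# Minimal sketch of fitting a linear compartmental model by iterative
# nonlinear least squares (Levenberg-Marquardt).  The model structure,
# parameter values, and synthetic "observations" are illustrative only.
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import least_squares

def two_pool(y, t, k01, k12, k21):
    """Blood (y[0]) and one peripheral tissue pool (y[1]) with first-order exchange."""
    blood, tissue = y
    dblood = -(k01 + k21) * blood + k12 * tissue
    dtissue = k21 * blood - k12 * tissue
    return [dblood, dtissue]

def predict_blood(params, t, y0=(1.0, 0.0)):
    k01, k12, k21 = params
    t_full = np.concatenate(([0.0], t))          # y0 is the state at time 0
    sol = odeint(two_pool, y0, t_full, args=(k01, k12, k21))
    return sol[1:, 0]                            # only the blood pool is observed

# Synthetic data: simulate "true" kinetics and add measurement noise.
t_obs = np.array([1, 2, 4, 7, 14, 21, 30, 45, 60], dtype=float)   # days
true = (0.10, 0.02, 0.05)                                          # per day
rng = np.random.default_rng(0)
y_obs = predict_blood(true, t_obs) * (1 + 0.05 * rng.standard_normal(t_obs.size))

def residuals(params):
    return predict_blood(params, t_obs) - y_obs

fit = least_squares(residuals, x0=(0.2, 0.05, 0.1), method="lm")
print("estimated rate constants (per day):", fit.x)
```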
The relevance of blood lead and tissue lead concentrations to overt
toxicity is not unambiguous. As in any biologically variable population,
some individuals can exhibit extremely high blood lead with only mild
lead poisoning (Chamberlain and Massey, 1972). A more direct precursor
of toxicity is the erythrocyte protoporphyrin (EP) concentration.
Elevated levels of EP show that lead has deranged the heme biosynthetic
pathway, reducing the rate of production of heme for hemoglobin. EP is
now widely used as a screening indicator for potential toxicity. An
example of the utility of EP is that after a brief massive exposure of a
British worker (Williams, 1984), zinc EP increased to very elevated
levels within a week of exposure even though the worker was still largely
asymptomatic. Even though there is considerable biological variability,
EP levels in adults increase significantly within 10 to 20 days after
beginning an experimental increase of ingested lead (Stuik, 1974; Cools
et al., 1976; Schlegel and Kufner, 1978). Thus biological effects in
adult humans occur very shortly after exposure, certainly within a month.
While the uptake of lead and the onset of potential toxicity occur
rapidly during increased exposure, the reduction of exposure does not
cause an equally rapid reduction in either body burden or toxicity
indices. Accumulation of mobilizable pools of lead in the skeleton and
other tissues creates an endogenous source of lead that is only slowly
eliminated. Thus the rapid uptake of lead during periods of increased
exposure should be emphasized in setting standards for lead.
The experimental data cited above are indeed human data, but all for
adults (almost all for males). We are not aware of any direct studies of
lead kinetics in children. One of the more useful sets of data involves
the uptake of lead by infants from formula and milk (Ryu et al., 1983,
1985). Blood lead levels and lead content of food were measured at 28-
-------
day intervals. The results are negative but informative: blood lead
levels in these infants appeared to equilibrate so much faster that no
estimate of the kinetic parameters was possible. A very rough estimate
by Duggan (1983), based on earlier input-output studies in infants
(Ziegler et al., 1978), gave a blood lead half life (= mean life x log 2)
of 4 to 6 days. Duggan's method has many assumptions and uncertainties.
An alternative method, allometric scaling based on surface area, suggests
that if a 70 kg adult male has a blood lead mean life of 30 days, then a
7 kg infant should have a blood lead mean life of about 3 days.
The above estimates of lead kinetics in children are not strictly
acceptable. Children are kinetically somewhat different from adults,
with a somewhat larger volume of blood and a much smaller but rapidly
developing skeleton (especially dense cortical bone, which retains most of
the adult body burden of lead). Children also absorb lead from the
environment at a greater rate, as they have greater gastrointestinal
absorption of ingested lead and a more rapid ventilation rate than do
adults.
adults. A b lomathemat ical model has been developed p/ Hari = '. 3'ic - ~eip
(19Bn) ana modified for use by GAQPS . This uc take/b 10" inet ic mcce'. .=
based on lead concentrations in infant and juvenile baccon^. ,-ir.Q ire
believed to constitute a valid animal model for Human grr>Jtn anc
development. Preliminary applications of the incdel ;re described r.
(Cohen, 1986; ATSDR, 1937; Marcus et al.. 1987). The mcoe! includes
annual changes of kinetic parameters such as the transfe1" -ates f~r
plood-to-bone. blood-to-liver. 1iver-to-gastrointestina1 :r=ct. and
growth of blood, tissue, and skeleton. The model oreair's ~ Tie5r,
residence time for lead in blood of c-vear-old cnildren 3= = 33.•=.
Blood lead concentrations change substantially during childhood
(Rabinowitz et al., 1984). These changes reflect the washout of in utero
lead, the exposure of the child to changing patterns of food and water
consumption, and the exposure of the toddler to leaded soil and dust in
his or her environment. We must thus also consider the temporal
variations of exposure to environmental lead.
4. TIME SCALES OF LEAD EXPOSURE
Air lead concentrations change very rapidly, depending on wind speed
and direction and on emissions patterns. Biological kinetics tend to
filter out the "high-frequency" variations in environmental lead, so that
only environmental variations on the order of a few days are likely to
play much of a role. The temporal patterns depend on averaging time and
sampling frequency, and thus will vary from one location to another
depending on the major lead sources at that site. Figure 1 shows the
time series for the logarithm of air lead concentration (log PbA) near a
primary lead smelter in the northwestern U.S. The data are 24-hour
concentrations sampled every third day (with a few minor slippages). We
analysed these data using Box-Jenkins time series programs. The temporal
structure is fairly complex, with a significant autoregressive component
at lag 9 (27 days) and significant moving average components at lags 1
and 3 (3 days and 9 days). Time series analyses around point source
sites and general urban sites may thus be informative.
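A Box-Jenkins fit of the kind mentioned can be sketched as follows, assuming the statsmodels library is available; the file and column names are placeholders, and the subset ARMA lags simply mirror the structure reported above.

```python
# Illustrative Box-Jenkins style fit to a log air-lead series sampled every
# third day.  The AR term at lag 9 (27 days) and MA terms at lags 1 and 3
# mirror the structure described in the text; the data source is hypothetical.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

pba = pd.read_csv("smelter_pb24h.csv")["pb_ugm3"]   # hypothetical file and column
log_pba = np.log(pba)

model = ARIMA(log_pba, order=([9], 0, [1, 3]), trend="c")
result = model.fit()
print(result.summary())
```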
-------
Direct inhalation of atmospheric lead may be only a minor part of
lead exposure attributable to air lead. Previously elevated air lead
levels may have deposited a substantial reservoir of lead in surface soil
and house dust in the environment; these are the primary pathways of
lead in children aged 1-5 years. Little is known about temporal
variations in soil and house dust lead. Preliminary results cited in
(Laxen et al., 1987) suggest that lead levels in surface dust and soil
around redecorated houses and schools can change over periods of time of
two to six months. While lead levels in undisturbed soils can persist
for thousands of years, the turnover of lead in urban soils due to human
activities is undoubtedly much faster.
Individuals are not stationary in their environment. Thus, the lead
concentrations to which individuals are exposed must include both spatial
and temporal patterns of exposure. The picture is complex, but much is
being learned from personal exposure monitoring programs.
The amount of variation in air lead concentrations at a stationary
monitor can be extremely large. Coefficients of variation in excess of
100% are not uncommon around point sources such as lead smelters, even
when monthly or quarterly averages are used. This variability is far in
excess of that attributable to meteorological variation and is due to
fluctuations in the emissions process, e.g. due to variations in feed
stock, process control, or production rate. Furthermore, the
concentration distributions are very skewed and heavy-tailed, more nearly
log-normally distributed than normal even for long averaging times. The
stochastic properties of the process are generally unknown, although it
may be assumed that air, dust, and soil lead concentrations from all
sources of exposure, including food, water, and paint, as well as those
pathways from gasoline lead, have been declining. With these points in
mind, we can begin to construct a quantitative characterization of a
health effects target for compliance studies.
5. HEALTH EFFECTS CHARACTERIZATION: A THEORETICAL APPROACH
We will here briefly describe a possible approach to the problem of
choosing an averaging time that is meaningful for health effects.
Related problems such as sampling frequency then depend on the precision
with which one wishes to estimate the health effects characterization.
The basic fact is that all of the effects of interest are driven by the
environmental concentration-exposure C(t) at time t integrated over some
period of time, with an appropriate weighting factor. As people are
exposed to diverse pollutant sources, the uptake from all pathways must
be added up. If the health effect is an instantaneous one whose value at
time t is denoted X(t), and if the biokinetic processes are all linear
(as is assumed for the OAQPS uptake-biokinetic model) or can be reasonably
approximated by a linear model driven by C(u) at time u, then the
biokinetic model can be represented by an aftereffect weight f(t-u) after
an interval t-u. Mathematically,
-------
X(t) = ∫ f(t-u) C(u) du

The aftereffect function for linear compartmental models is a mixture of
exponential terms.
The time-averaged concentration-exposure at time t, denoted Y(t), is
also a moving average of concentration C(u) at time u, with a weight
given by g(t-u) after an interval t-u. Thus compliance will be based on
the values of the variable Y(t) = ∫ g(t-u) C(u) du. The second-order
properties of X and Y follow from the covariance of the concentration
process:

cov[Y(t), Y(s)] = ∫∫ g(t-u) g(s-v) cov[C(u), C(v)] du dv

cov[X(t), Y(s)] = ∫∫ f(t-u) g(s-v) cov[C(u), C(v)] du dv

Thus, we could formalize the problem of selecting an averaging time T by
the following mathematical problem: choose the averaging time T that
maximizes the correlation between X(t) and Y(s), for that time t at which
E[X(t)] is maximum. That is, look for the time(s) t at which we expect
the largest adverse health effect or effect indicator (e.g. blood lead).
Then find the averaging time T such that the moving average at some other
time s is as highly correlated as possible with X(t). Note that we do not
require that s = t. We may also restrict the range of values of T.
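The quantities in this formalization are straightforward to compute from any equally spaced concentration record. The sketch below uses a simulated first-order autocorrelated exposure series (all parameter values are illustrative assumptions) and forms X(t) by exponential filtering and Y(t) as a trailing T-day average.

```python
# Sketch of the X(t) / Y(t) construction: X is the exponentially weighted
# biokinetic response, Y the trailing T-day average used for compliance.
# The exposure series is simulated purely to illustrate the definitions.
import numpy as np

rng = np.random.default_rng(1)
n_days, a, k, T = 2000, 1/4, 1/8, 20      # a: exposure corr. scale, k: washout rate

# First-order autocorrelated ("Markov") exposure series C(t) with mean 1.
c = np.empty(n_days)
c[0] = 1.0
phi = np.exp(-a)
for t in range(1, n_days):
    c[t] = 1.0 + phi * (c[t-1] - 1.0) + np.sqrt(1 - phi**2) * 0.5 * rng.standard_normal()

# X(t): discrete analogue of the integral of exp(-k(t-u)) C(u) du
x = np.empty(n_days)
x[0] = c[0] / k
for t in range(1, n_days):
    x[t] = np.exp(-k) * x[t-1] + c[t]

# Y(t): trailing T-day moving average of C
y = np.convolve(c, np.ones(T) / T, mode="valid")   # y[i] averages c[i .. i+T-1]
x_aligned = x[T-1:]                                 # align X with the window ending at t

print("corr[X(t), Y(t)] for T =", T, ":",
      np.corrcoef(x_aligned, y)[0, 1].round(3))
```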
EXAMPLE: ONE-COMPARTMENT BIOKINETIC MODEL, MARKOV EXPOSURE MODEL
Suppose that the relevant biokinetic model is a simple one-
compartment model. The aftereffect of a unit pollutant uptake is an
exponential washout (e.g. of blood lead, to a first approximation) with
time constant k,

f(t-u) = exp(-k(t - u))   if u <= t
       = 0                if u > t

We will also assume that the concentration-exposure process C(t) is
stochastically second-order stationary with covariance function

cov[C(u), C(v)] = var[C] exp(-a |u - v|)
-------
After some algebra, one finds that:

var[X(t)] = var[C] / (k(a+k))

var[Y(t)] = 2 var[C] [aT - 1 + exp(-aT)] / (aT)^2

If t < s-T then

cov[X(t), Y(s)] = var[C] [exp(-a(s-t-T)) - exp(-a(s-t))] / (aT(a+k))

If t > s (for predicting from the current sampling time s to a later
time t), the corresponding expression involves additional terms in
exp(-k(t-s)), exp(-a(t+T-s)), 1/(a(k-a)), and 1/(a(a+k)).
A small table of correlations between X(t) and Y(t) as a function of the
averaging time T is given in Table 2. The table suggests that the optimal
averaging time for
-------
children or for adults is about 1.5/k, and that much longer or much
shorter averaging times will not capture significant excursions in blood
lead. An averaging time of 15-50 days will make Y(t) reasonably
predictive of X(t) for both adults and children.
TABLE 2
CORRELATION BETWEEN BLOOD LEAD CONCENTRATION AND AVERAGE ENVIRONMENTAL
LEAD CONCENTRATION AS A FUNCTION OF AVERAGING TIME T
Assumed environmental lead correlation scale a = 1/(4 days)

                                 CORRELATION
Averaging              CHILD                  ADULT
Time T, Days      (k = 1/(8 days))       (k = 1/(40 days))
      7               0.9287
     10               0.9588
     14               0.9497                 0.7207
     20               0.8900                 0.8020
     30               0.7707                 0.8783
     60               0.5451                 0.9141
     90               0.4402                 0.8579
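Under the one-compartment and exponential-covariance assumptions of this example, the correlation between X(t) and the concurrent T-day average Y(t) has a closed form, so a table like Table 2 can be regenerated directly. The following sketch uses the parameter values reconstructed for Table 2 (a = 1/4 per day; k = 1/8 per day for the child and 1/40 per day for the adult), which should be treated as assumptions.

```python
# Correlation between the biokinetic response X(t) and the concurrent T-day
# average exposure Y(t), under the one-compartment washout model and the
# exponential exposure covariance assumed above.
import numpy as np

def corr_xy(T, k, a=0.25):
    """corr[X(t), Y(t)] for averaging time T (days), washout rate k, exposure scale a."""
    ekT, eaT = np.exp(-k * T), np.exp(-a * T)
    cov = (2 * (1 - ekT) / k
           - (1 - ekT) / (k + a)
           - (eaT - ekT) / (k - a)) / (a * T)
    var_x = 1.0 / (k * (k + a))
    var_y = 2 * (a * T - 1 + eaT) / (a * T) ** 2
    return cov / np.sqrt(var_x * var_y)

for T in (7, 10, 14, 20, 30, 60, 90):
    print(f"T={T:3d}  child {corr_xy(T, k=1/8):.4f}   adult {corr_xy(T, k=1/40):.4f}")
```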
Samples collected for compliance testing have a more complicated
structure for the weight function g(t-u), namely (for h-hour samples
collected once every m days in an interval of T days),

g(t-u) = m/(hT)   if t-u falls within one of the sampled h-hour periods
       = 0        otherwise
-------
lead, volume of environmental intake (e.g. m3/d of air, L/d of water,
mg/d of leaded soil and dust, g/d of food) as well as concentration C(t).
6. TIME SCALES FOR THE EFFECTS OF OZONE ON AGRICULTURAL CROP YIELDS
The regulation of ozone has for some time been one of E.P.A.'s most
pressing problems, a regulatory irritant as well as a lung irritant.
The secondary standards for ozone have drawn considerable attention due
to the knowledge that exposure to ozone may cause economically
significant damage to cash crops and forests. The time of day of the
ozone exposure, and the day of exposure during the growing season, may
seriously determine the effects of exposure and consequently the
statistics that are used to formulate the standard. A number of
approaches to defining a biologically relevant standard are being
investigated (Lee et al., 1987ab; Larsen et al., 1987).
Air monitoring data have been collected in connection with the
chamber studies of the National Crop Loss and Assessment Network (NCLAN)
and related studies have been carried out at E.P.A.'s Corvallis
Environmental Research Laboratory (CERL). The earlier NCLAN data were
based on seven hours of monitoring (0900-1600) and statistics appropriate
to that period. More recent studies use longer sampling periods,
including 24-hour samples at CERL. Examples of the time patterns of
exposure used at CERL are shown in Lee et al., 1987ab. The
characterizations of the air monitoring data considered for use as
exposure statistics and compliance specifications include the following,
all based on the mean hourly ozone concentration C(h) at hour h:
MEAN STATISTICS
M7 = seasonal mean of C(h) for 0900-1600 hr each day
M1 = seasonal mean of daily maximum C(h) during 24 hours
Effective Mean = ( $ C(h)**p / N )**(1/p)   [Note: $ means sum]

PEAK STATISTICS
P7 = seasonal peak of 7-hour daily mean over 0900-1600 hrs.
P1 = seasonal peak hourly concentration

CUMULATIVE STATISTICS
Total Exposure = $ C(h)
Total Impact = ( $ C(h)**p )**(1/p)
Phenologically Weighted Cumulative Impact (PWCI)
    = ( $ C(h)**p w(h) )**(1/p)
-------
EXCEEDANCE STATISTICS
HRSxx = number of hours in which C(h) > xx
SUMxx = total ozone concentration in hours with C(h) > xx

and at least six other statistics characterizing episode lengths etc.
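Several of the listed characterizations are simple functions of an hourly record and can be sketched as follows; the exponent p, the phenological weights w(h), and the exceedance threshold are placeholders, since their NCLAN/CERL values are not reproduced here.

```python
# Illustrative computation of some of the listed ozone exposure statistics
# from a season of hourly concentrations C(h) (ppm).  The power p, the
# weights w, and the threshold are placeholders, not the NCLAN/CERL values.
import numpy as np

hours_per_day = 24
c = np.abs(np.random.default_rng(2).normal(0.04, 0.02, 120 * hours_per_day))  # fake season
daily = c.reshape(-1, hours_per_day)
day7 = daily[:, 9:16]                      # 0900-1600 window (7 hours)

p = 3.0                                    # placeholder exponent
w = np.ones_like(c)                        # placeholder phenological weights

M7 = day7.mean()                           # seasonal mean of 0900-1600 C(h)
M1 = daily.max(axis=1).mean()              # seasonal mean of daily 24-h maxima
P7 = day7.mean(axis=1).max()               # seasonal peak of 7-hour daily mean
P1 = c.max()                               # seasonal peak hourly concentration
total_exposure = c.sum()
total_impact = (np.sum(c ** p)) ** (1 / p)
pwci = (np.sum(w * c ** p)) ** (1 / p)
HRS08 = int((c > 0.08).sum())              # hours exceeding 0.08 ppm
SUM08 = c[c > 0.08].sum()                  # total concentration in those hours

print(M7, M1, P7, P1, total_exposure, total_impact, pwci, HRS08, SUM08)
```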
The statistic most frequently considered for ozone characterization is
M7. However, the statistic that best predicts dry shoot weight of the
cuttings of alfalfa in a CERL experiment (expressed as a fraction of the
controls) is the PWCI. The values of M7 clearly measure the damaging effect
of ozone, but with a great deal of scatter around the regression line. The
somewhat clustered values of M7 are spread out by the statistic PWCI, which
gives much higher weight to large values of C(h) (as C(h)**p) and to recent
exposures (weight 0.3 to those preceding the previous cutting, and zero
weight to those preceding the next earlier cutting). Crop loss is much
better defined by the values of PWCI, with relatively little scatter around a
fitted curve of "Weibull" form.
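A fit of the "Weibull"-form exposure-response curve mentioned above can be sketched as follows; the parameterization (relative yield = exp(-(x/sigma)**lambda)) is one common choice, and the data arrays are purely hypothetical.

```python
# Sketch of a Weibull-form exposure-response fit of relative yield against
# an exposure index such as PWCI.  The arrays below are placeholders only.
import numpy as np
from scipy.optimize import curve_fit

def weibull_yield(x, sigma, lam):
    return np.exp(-(x / sigma) ** lam)

pwci_index = np.array([10., 20., 40., 60., 80., 100.])        # hypothetical exposure index
rel_yield  = np.array([0.98, 0.93, 0.80, 0.62, 0.45, 0.30])   # yield as fraction of control

params, _ = curve_fit(weibull_yield, pwci_index, rel_yield, p0=(60.0, 1.5))
print("sigma, lambda =", params)
```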
The ozone example suggests that biological time scales of response
are better captured by compliance statistics that give higher weight to
recent exposures, as in our lead example. However, the time kinetics are
clearly nonlinear in ozone concentration, so that some nonlinear model of
the mechanism of damage, repair, and metabolism must be assumed to be
operating. The PWCI is a cumulative value and not a peak or exceedance
statistic, thus even low levels of ozone exposure appear to be causing
some damage. The biological statistic for compliance sampling (for
alfalfa, anyway) is thus a 24-hour peak-weighted cumulative
For most chemicals of interest there is not nearly enough
information on pharmacokinetics, toxicokinetics, or temporal variability
of exposure pattern to allow these calculations to be made. However, for
many criteria pollutants, the level of information is adequate and
typical population levels are so close to a health effects
criterion level as to make this a serious issue. For example, in 1976
the criterion level for blood lead was 30 ug/dl, but the geometric mean
blood lead in urban children was about 15 ug/dl, of which 12 ug/dl
was assumed to be "non-air" background (i.e. regulated by some other
office). Due to the reduction of leaded gasoline during the 1970's, the
mean blood lead level for urban children had fallen to 9-10 ug/dl by
1980, and is likely to be somewhat lower today. However, better data on
health effects (e.g. erythrocyte protoporphyrin increases in iron-
deficient children, or hearing loss and neurobehavioral problems in
children with lead burdens) now suggest a much lower health criterion
level is appropriate, perhaps 10-15 ug/dl. Thus there is still very
little "margin of safety" against random excursions of lead exposure.
This is also true for other criteria pollutants, especially for
sensitive or vulnerable subpopulations. For example, asthmatics may
experience sensitivity to elevated levels of sulfur dioxide or ozone,
especially when exercising. Activity levels certainly affect the
kinetics of gaseous pollutant uptake and elimination. Subpopulation
variations in kinetics and pharmacodynamics may be important. Acute
exposure sampling in air or water (e.g. 1-day Health Advisories for
drinking water) should be sensitive to pharmacokinetic time scales.
Biokinetic information on pollutant uptake and metabolism in humans
is not often available for volatile organic compounds and for most
carcinogens. Thus large uncertainty factors for animal extrapolation and
for route of exposure variations are used to provide a conservative level
of exposure. The methods shown here may be less useful in such
situations. But the development of realistic, biologically motivated
pharmacokinetic models for extrapolating animal data to humans may
establish a larger role for assessment of compliance testing for these
substances.
ACKNOWLEDGEMENTS
I am grateful to Ms. Judy Kapadia for retyping the manuscript, and
to the reviewer for his helpful comments.
REFERENCES
Berman M, Weiss MF. 1978. SAAM - Simulation, Analysis, and Modeling
Manual. U.S. Public Health Service Publ. NIH-180.

Campbell BC, Meredith PA, Moore MR, Watson WS. 1984. Kinetics of lead
following intravenous administration in man. Tox Letters 21:231-235.

CD [Criteria Document]. 1986. Air quality criteria for lead.
Environmental Criteria and Assessment Office, US Environmental Protection
Agency. EPA-600/8-83/028aF (4 volumes). Res. Tri. Pk., NC.
-------
Chamberlain AC. 1985. Prediction of response of blood lead to airborne
and dietary lead from volunteer experiments with lead isotopes. Proc Roy
Soc Lond B224:149-182.

Chamberlain MJ, Massey PMO. 1972. Mild lead poisoning with an excessively
high blood lead. Brit J Industr Med 29:458-461.

Christoffersson JO, Ahlgren L, Schutz A, Skerfving S. 1986. Decrease of
skeletal lead levels in man after end of occupational exposure. Arch Environ
Health 41:312-318.

Cohen, J. Personal communications about OAQPS staff paper. April-Nov.
1986.

Cools A, Salle HJA, Verberk MM, Zielhuis RL. 1976. Biochemical response of
male volunteers ingesting inorganic lead for 49 days. Int Arch Occup
Environ Health 38:129-139.

DeSilva PE. 1981. Determination of lead in plasma and studies on its
relationship to lead in erythrocytes. Brit J Industr Med 38:209-217.

Duggan MJ. 1983. The uptake and excretion of lead by young children. Arch
Environ Health 38:246-247.

Laxen DPH, Lindsay F, Raab GM, Hunter R, Fell GS, Fulton M. 1987. The
variability of lead in dusts within the homes of young children. In
Lead in the Home Environment, ed. E. Culbard. Science Reviews, London.

Lee EH, Tingey DT, Hogsett WE. 1987a. Selection of the best exposure-
response model using various 7-hour ozone exposure statistics. Report for
Office of Air Quality Planning and Standards, US Environ. Protection
Agency.
-------
Lee EH, Tingey DT, Hogsett WE. 1987b. Evaluation of ozone exposure
-------
[Figure 1. Time series of the logarithm of 24-hour air lead concentration
(log PbA) near a primary lead smelter in the northwestern U.S., sampled
every third day.]
-------
DISCUSSION
Richard C. Hertzberg
Environmental Criteria and Assessment Office, U.S. EPA, Cincinnati, OH 45268
Comments on
"Time Scales: Biological, Environmental, Regulatory," Allan H. Marcus
Summary of Presentation
Marcus presents a case for consideration of
physiologic time scales in the determination of
compliance sampling protocols. The general theme of
incorporating physiologic time into risk assessment is
certainly scientifically supportable (e.g., NAS Workshop,
1986, "Pharmacokinetics in Risk Assessment," several
authors), but has been previously proposed only for
setting standards. Marcus takes the application one
step further by showing how improper sampling can fail
to detect exposure fluctuations that have toxicological
significance.
The Regulatory Context
The modeling and data that Marcus presents seem
reasonable, but key items seem to be missing, at least if
this approach is to become used by regulatory agencies.
The examples should show that the refinement will
make a practical difference in the "cost-benefit"
evaluation, and that the required data are accessible.
The first question is: does it matter? Most
standards are set with a fair degree of conservatism, so
that slight excursions above the standard will not pose a
significant health risk. The first impression of Marcus'
proposal is that it is fine tuning, when in fact it is the
coarse control which needs to be turned. Let us
consider the example of lead. Recent research has
suggested that significant impairment of neurological
development can be caused by lead concentrations much
lower than previously thought. In fact, some scientists
have suggested that lead toxicity may be a no-threshold
phenomenon. If such is the case, then EPA's approach
to setting lead standards will change drastically, and
Marcus' example, though not necessarily his proposal,
will probably not apply. But even with the current
standard, it is not clear that results from Marcus'
method will not be lost in the usual noise of biological
data. For example, consider his figure showing the
graphs of data and model fits for 11 human subjects.
First, these results may be irrelevant to the air
pollution issue since the data follow "ingestion"
of lead, not "inhalation." Lead inhalation is in many
ways more complicated than ingestion. Also, using day
30 as an example, the fitted erythrocyte protoporphyrin
levels vary dramatically across individuals (mean=49,
s.d.=20.3, range=30-73). I could not read the graphs
well, but even accounting for differing starting values,
the curve shapes also change across individuals, so that
predictions for any untested individual might be
difficult.
The second question, that of data requirements,
cannot be answered from this presentation alone. But
some issues can be mentioned. It is not clear that the
correlations between blood lead (Table 1) and monthly
average lead are good predictors of the correlation
between monthly average lead and neurological
impairment. But is the correlation the best indicator of
performance? A better question, perhaps, is: do
changes in blood lead which could be allowed by using
the weakest sampling protocol actually result in
significantly increased incidence of neurological
dysfunction, when compared to the best compliance
sampling procedure as determined using Marcus'
scheme? It is not clear how much data would be
required to answer that question.
Also, it seems that Marcus' approach must have
pharmacokinetic data on humans. The data
requirements are then more severe for most of the
thousands of environmental chemicals, where only
animal data are available. The situation is even worse
for carcinogens, where human cancer incidence data are
not available at the low regulatory levels. In fact, the
orders-of-magnitude uncertainty in the low-dose
extrapolation of cancer bioassays easily swamps the
error due to non-optimal compliance sampling.
So where might this research go? Certainly it
should be further developed. This approach will
definitely be useful for acute regulatory levels, such as
the 1-day Health Advisories for drinking water, where
internal dose and toxicity are closely tied to
pharmacokinetics. It will probably be more significant
for sensitive subgroups, such as children and those with
respiratory disease, where the pharmacokinetics are
likely to be much different from the norm, and where
the tolerance to chemical exposure is already low. For
those cases, scaling factors and uncertainty factors are
highly inaccurate. Most important is the example
Marcus presents, chemicals where uptake and
elimination rates are dramatically different. For
control of those chemicals, using the "average"
monitored level is clearly misleading, and some
approach such as Marcus' must be used. I would
recommend the following steps:
• First, demonstrate the need. List at least a
few chemicals that are being improperly
monitored because of their pharmacokinetic
properties.
• Then, show us that your method works and is
practical.
-------
Statistical Issues in Human Exposure Monitoring
William C. Nelson, U.S. EPA, EMSL, Research Triangle Park
ABSTRACT
Pollutant exposure information provides a critical link in risk
assessment and therefore in environmental decision making. Traditionally,
outdoor air monitoring stations have been necessarily utilized to relate
air pollutant exposures to groups of nearby residents. This approach is
limited by (1) using only the outdoor air as an exposure surrogate when
most individuals spend relatively small proportions of time outdoors and
(2) estimating exposure of a group rather than an individual. More
recently, air monitoring of non-ambient locations, termed microenvironments,
such as residences, offices, and shops has increased. Such data when
combined with time and activity questionnaire information can provide
more accurate estimates of human exposure. Development of portable
personal monitors that can be used by the individual study volunteer
provides a more direct method for exposure estimation. Personal samplers
are available for relatively few pollutants including carbon monoxide and
volatile organic compounds (VOC's) such as benzene, styrene, tetrachloroethylene,
xylene, and dichlorobenzene. EPA has recently performed carbon monoxide
exposure studies in Denver, Colorado and Washington, D.C. which have
provided new information on CO exposure for individual activities and
various microenvironments. VOC personal exposure studies in New Jersey
and California have indicated that, for some hazardous chemicals,
individuals may receive higher exposure from indoor air than from outdoor
air. Indoor sources include tobacco smoke, cleansers, insecticides,
furnishings, deodorizers, and paints. Types of exposure assessment
included in these studies are questionnaires, outdoor, indoor, personal,
and biological (breath) monitoring.
As more sophisticated exposure data become available, statistical
design and analysis questions also increase. These issues include survey
sampling, questionnaire development, errors-in-variables situation, and
estimating the relationship between the microenvironment and direct
personal exposure. Methodological development is needed for models which
permit supplementing the direct personal monitoring approach with an
activity diary which provides an opportunity for combining these data
with microenvironment data to estimate a population exposure distribution.
Another situation is the appropriate choice between monitoring instruments
of varying precision and cost. If inter-individual exposure variability
is high, use of a less precise instrument of lower cost which provides an
opportunity for additional study subjects may be justified. Appropriate
choice of an exposure metric also requires more examination. In some
instances, total exposure may not be as useful as exposure above a threshold
level.
Because community studies using personal exposure and microenvironmental
measurements are expensive, future studies will probably use smaller
sample sizes but be more intensive. However, since such studies
provide exposure data for individuals rather than only for groups, they
may not necessarily have less statistical power.
-------
INTRODUCTION
Pollutant exposure information is a necessary component of the risk
assessment process. The traditional approach to investigating the
relationship between pollutant level in the environment and the concentration
available for human inhalation, absorption or ingestion, has been 1)
measurements at an outdoor fixed monitoring site or 2) mathematical model
estimates of pollutant concentration from effluent emission rate information.1
The limitations of such a preliminary exposure assessment have become
increasingly apparent. For example, recognition of the importance of
indoor pollutant sources, particularly considering the large amount of
time spent indoors, and concern for estimating total personal exposure
have led to more in-depth exposure assessments.
One of the major problems to overcome when conducting a risk assessment
is the need to estimate population exposure. Such estimates require
information on the availability of a pollutant to a population group via
one or more pathways. In many cases, the actual concentrations encountered
are influenced by a number of parameters related to activity patterns.
Some of the more important are: the time spent indoors and outdoors,
commuting, occupations, recreation, food consumption, and water supply.
For specific situations the analyses will involve one major pathway to
man (e.g. outside atmospheric levels for ozone), but for others, such as
heavy metals or pesticides, the exposure will be derived from several
different media.
A framework for approaching exposure assessments for air pollutants
has been described by the National Academy of Science Epidemiology of Air
Pollution Committee.2 The activities shown in Figure 1 were considered
to be necessary to conduct an in-depth exposure assessment.
As knowledge about the components of this framework, particularly
sources and effects, has increased, the need for improved data on exposures
and doses has become more critical. A literature review published in
1982 discussed a large number of research reports and technical papers
with schemes for calculating population exposures.3 However, such schemes
are imperfect, relying on the limited data available from fixed air
monitoring stations and producing estimates of "potential exposures" with
unknown accuracy. Up until the 1980's, there were few accurate field
data on the actual exposures of the population to important environmental
pollutants. Very little was known about the variation from person to
person of exposure to a given pollutant, the reason for these variations,
or the differences in the exposures of subpopulations of a city.
Furthermore, a variety of field studies undertaken in the 1970s and early
1980s showed that the concentrations experienced by people engaged in
various activities (driving, walking on sidewalks, shopping in stores,
working in buildings, etc.) did not correlate well with the simultaneous
readings observed at fixed air-monitoring stations.4-9 Two reviews have
summarized much of the literature on personal exposures to environmental
pollution showing the difficulty of relating conventional outdoor monitoring
data to actual exposures of the population.10,11 No widely acceptable
methodology was available for predicting and projecting future exposures
-------
of a population or for estimating how population exposures might change
in response to various regulatory actions. No satisfactory exposure
framework or models existed.
TOTAL HUMAN EXPOSURE
The total human exposure concept seeks to provide the missing
component in the full risk model: estimates of the total exposures of
the population to environmental pollutants, with known accuracy and
precision. Generating this new type of information requires developing
an appropriate research program and methodologies. The methodology has
been partially developed for carbon monoxide (CO), volatile organic
compounds (VOC's) and pesticides, and additional research is needed to
solve many problems for a variety of other pollutants.
The total human exposure concept defines the human being as the
target for exposure. Any pollutant in a transport medium that comes into
contact with this person, either through air, water, food, or skin, is
considered to be an exposure to that pollutant at that time.
The instantaneous exposure is expressed quantitatively as a
concentration in a particular carrier medium at a particular instant of
time, and the average exposure is the average of the concentration to the
person over some appropriate averaging time. Some pollutants, such as
CO, can reach humans through only one carrier medium, the air route of
exposure. Others, such as lead and chloroform, can reach humans through
two or more routes of exposure (e.g., air, food, and water). If multiple
routes of exposure are involved, then the total human exposure approach
seeks to determine a person's exposure (concentration in each carrier
medium at a particular instant of time) through all major routes of
exposure.
Once implemented, the total human exposure methodology seeks to
provide information, with known precision and accuracy, on the exposures
of the general public through all environmental media, regardless of
whether the pathways of exposure are air, drinking water, food, or skin
contact. It seeks to provide reliable, quantitative data on the number
of people exposed and their levels of exposures, as well as the sources
or other contributors responsible for these exposures. In the last few
years, a number of studies have demonstrated these new techniques. The
findings have already had an impact on the Agency's policies and priorities.
As the methodology evolves, the research needs to be directed toward
identifying and better understanding the nation's highest priority
pollutant concerns.
The major goals of the Total Human Exposure Program can be summarized
as follows:
Estimate total human exposure for each pollutant of concern
Determine major sources of this exposure
Estimate health risks associated with these exposures
Determine actions to eliminate or at least reduce these risks
-------
The total human exposure concept considers major routes of exposure
by which a pollutant may reach the human target. Then, it focuses on
those particular routes which are relevant for the pollutants of concern,
developing information on the concentrations present and the movement of
the pollutants through the exposure routes. Activity information from
diaries maintained by respondents helps identify the microenvironments of
greatest concern, and in many cases, also helps identify likely contributing
sources. Biological samples of body burden may be measured to confirm
the exposure measurements and to estimate a later step in the risk assessment
framework.
In the total human exposure methodology, two complementary conceptual
approaches, the direct and the indirect, have been devised for providing
the human exposure estimates needed to plan and set priorities for reducing
risks.
Direct Approach
The "direct approach" consists of measurements of exposures of the
general population to pollutants of concern.12 A representative probability
based sample of the population is selected based on statistical design.
Then, for the class of pollutants under study, the pollutant concentrations
reaching the persons sampled are measured for the relevant environmental
media. A sufficient number of people are sampled using appropriate
statistical sampling techniques to permit inferences to be drawn, with
known precision, about the exposures of the larger population from which
the sample has been selected. From statistical analyses of subject
diaries which list activities and locations visited, it usually is possible
to identify the likely sources, microenvironments, and human activities
that contribute to exposures, including both traditional and nontraditional
components.
To characterize a population's exposures, it is necessary to monitor
a relatively large number of people and to select them in a manner that
is statistically representative of the larger population. This approach
combines the survey design techniques of the social scientist with the
latest measurement technology of the chemist and engineer, using both
statistical survey methodology and environmental monitoring in a single
field survey. It uses the new miniaturized personal exposure monitors
(PEMs) that have become available over the last decade,13,14,15 and it
adopts the survey sampling techniques that have been used previously to
measure public opinion and human behavior. The U.S. EPA Office of Research
and Development (ORD) has recently conducted several major field studies
using the direct approach, namely, the Total Exposure Assessment Methodology
(TEAM) Study of VOCs, the CO field studies in Washington, D.C. and Denver,
and the non-occupational exposure to pesticides study. These studies
will be described later.
Indirect Approach
Rather than measuring personal exposures directly as in the previous
approach, the "indirect approach" attempts to construct the exposure
profile mathematically by combining information on the times people spend
in particular locations (homes, automobiles, offices, etc.) with the
concentrations expected to occur there. This approach requires a
mathematical model, information on human activity patterns, and statistical
information on the concentrations likely to occur in selected locations,
or "microenvironments".l^ -A microenvironment can be defined as a location
of relatively homogeneous pollutant concentration that a person occupies
for some time period. Examples include a house, office, school, automobile,
subway or bus. An activity pattern is a record of time spent in specific
microenvironments.
In its simplest form the "indirect approach" seeks to compute the
integrated exposure as the sum of the individual products of the concentrations
encountered by a person in a microenvironment and the time the person
spends there. The integrated exposure permits computing the average
exposure for any averaging period by dividing by the time duration of the
averaging period. If the concentration within microenvironment j is
assumed to be constant during the period that person i occupies
microenvironment j, then the integrated exposure E_i for person i will
be the sum of the products of the concentration c_j in each microenvironment
and the time spent by person i in that microenvironment:

    E_i = sum from j = 1 to J of c_j * t_ij,

where E_i = integrated exposure of person i over the time period of interest;
c_j = concentration experienced in microenvironment j;
t_ij = time spent by person i in microenvironment j; and
J = total number of microenvironments occupied by person i over
the time period of interest.
To compute the integrated exposure E_i for person i, it obviously is
necessary to estimate both c_j and t_ij. If T is the averaging time, the
average exposure of person i is obtained by dividing the integrated
exposure by T, that is, E_i/T.
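As a small numerical illustration of this bookkeeping (the microenvironments, concentrations, and times below are invented, not taken from any study), the integrated and average exposure for one person can be computed as follows:

    # Integrated exposure E_i = sum_j c_j * t_ij and the average exposure E_i / T
    # for a single person; all values are hypothetical.

    def integrated_exposure(concentrations, times):
        """Sum of concentration x time over the microenvironments occupied."""
        return sum(c * t for c, t in zip(concentrations, times))

    # Hypothetical microenvironments: home, office, in transit (CO in ppm, time in hours)
    c = [2.0, 1.5, 9.0]
    t = [14.0, 8.0, 2.0]

    E_i = integrated_exposure(c, t)   # ppm-hours over the 24-hour period
    T = sum(t)                        # averaging time in hours
    print(f"integrated exposure = {E_i:.1f} ppm-h, average exposure = {E_i / T:.2f} ppm")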
Although the direct approach is invaluable in determining exposures
and sources of exposure for the specific population sampled, the Agency
needs to be able to extrapolate to much larger populations. The indirect
approach attempts to measure and understand the basic relationships
between causative variables and resulting exposures, usually in particular
microenvironments, through "exposure modeling." An exposure model takes
data collected in the field, and then, in a separate and distinct activity,
predicts exposure. The exposure model is intended to complement results
from direct studies and to extend and extrapolate these findings to other
locales and other situations. Exposure models are not traditional
dispersion models used to predict outdoor concentrations; they are
different models designed to predict the exposure of a rather mobile
human being. Thus, they require information on typical activities and
time budgets of people, as well as information on likely concentrations
in places where people spend time.
The U.S. EPA ORD has also conducted several studies using the indirect
approach. An example of a recent exposure model is the Simulation of
Human Activities and Pollutant Exposures (SHAPE) model, which has been
designed to predict a population's exposures to CO in urban
areas. This model is similar to the NAAQS Exposure Model (NEM). The
SHAPE model used the CO concentrations measured in the Washington-Denver
CO study to determine the contributions to exposure from commuting,
cooking, cigarette smoke, and other factors. Once a model such as SHAPE
is successfully validated (by showing that it accurately predicts exposure
distributions measured in a TEAM field study), it can be used in a new
city without a field study to make a valid prediction of that population's
exposures using that city's data on human activities, travel habits, and
outdoor concentrations. The goal of future development is to apply the
model to other pollutants (e.g., VOCs, household pesticides) making it
possible to estimate exposure frequency distributions for the entire
country, or for major regions.
Field Studies
The total human exposure field studies form a central part of the
U.S. EPA ORD exposure research program. Several studies have demonstrated
the feasibility of using statistical procedures to choose a small
representative sample of the population from which it is possible to make
inferences about the whole population. Certain subpopulations of importance
from the standpoint of their unique exposure to the pollutant under study
are "weighted" or sampled more heavily than others. In the subsequent
data analysis phases, sampling weights are used to adjust for the
overrepresentation of these groups. As a result, it is possible to draw
conclusions about the exposures of the larger population of a region with
a study that is within acceptable costs.
Once the sample of people has been selected, their exposures to the
pollutant through various environmental media (air, water, food, skin)
are measured. Some pollutants have negligible exposure routes through
certain media, thus simplifying the study. Two large-scale total human
exposure field studies have been undertaken by U.S. EPA to demonstrate
this methodology: the TEAM study of VOCs and the Denver - Washington DC,
field study of CO.
The first set of TEAM Studies (1980-84) was the most extensive
investigation of personal exposures to multiple pollutants and corresponding
body burdens. In all, more than 700 persons in 10 cities have had their
personal exposures to 20 toxic compounds in air and drinking water measured,
together with levels in exhaled breath as an indicator of blood
concentration.17-19 Because of the probability survey design used,
inferences can be made about a larger target population in certain areas:
128,000 persons in Elizabeth/Bayonne, NJ; 100,000 persons in the South
Bay Section of Los Angeles, CA; and 50,000 persons in Antioch/Pittsburg,
CA.
The major findings of the TEAM Study may be summarized as follows:
1. Great variability (2-3 orders of magnitude) of exposures occurs even
in small geographical areas (such as a college campus) monitored on the
same day.
2. Personal and overnight indoor exposures consistently outweigh outdoor
concentrations. At the higher exposure levels, indoor concentrations may
be 10-100 times the outdoor concentrations, even in New Jersey.
3. Drinking water and beverages in some cases are the main pathways of
exposure to chloroform and bromodichloromethane — air is the main route
of exposure to 10 other prevalent toxic organic compounds.
4. Breath levels are significantly correlated with previous personal
air exposures for all 10 compounds. On the other hand, breath levels are
usually not significantly correlated with outdoor levels, even when the
outdoor level is measured in the person's own backyard.
5. Activities and sources of exposure were significantly correlated
with higher breath levels for the following chemicals:
benzene: visits to service stations, smoking, work in chemical and
paint plants;
tetrachloroethylene: visits to dry cleaners.
6. Although questionnaires adequate for identifying household sources
were not part of the study, the following sources were hypothesized:
p-dichlorobenzene: moth crystals, deodorizers, pesticides;
chloroform: hot showers, boiling water for meals;
styrene: plastics, insulation, carpets;
xylenes, ethylbenzene: paints, gasoline.
7. Residence near major outdoor point sources of pollution had little
effect, if any, on personal exposure.
The TEAM direct approach has four basic elements:
Use of a representative probability sample of the population under
study
Direct measurement of the pollutant concentrations reaching these
people through all media (air, food, water, skin contact)
Direct measurement of body burden to infer dosage
Direct recording of each person's daily activities through diaries
The Denver - Washington, DC CO Exposure Study utilized a methodology
for measuring the frequency distribution of CO exposures in a representative
sample of urban populations during 1982-83.20-22 Household data were
collected from over 4400 households in Washington, DC and over 2100
households in the Denver metropolitan area. Exposure data using personal
monitors were collected from 814 individuals in Washington, DC, and 450
individuals in Denver, together with activity data from a stratified
probability sample of the residents living in each of the two urban areas.
Established survey sampling procedures were used. The resulting exposure
data permit statistical comparisons between population subgroups (e.g.,
commuters vs. noncommuters, and residents with and without gas stoves).
The data also provide evidence for judging the accuracy of exposure
estimates calculated from fixed site monitoring data.
Additional efforts are underway to use these data to recognize indoor
sources and factors which contribute to elevated CO exposure levels and
to validate existing exposure models.
Microenvironment Models
Utilizing data collected in the Washington, DC urban-scale CO Study,
two modeling and evaluation analyses have been developed. The first,
conducted by Duan, is for the purpose of evaluating the use of microenvironmental
and activity pattern data in estimating a defined population's exposure to
CO.16 The second, conducted by Flachsbart, is to model the microenvironmental
situation of commuter rush-hour traffic (considering type and age of
vehicle, speed, and meteorology) and observed CO concentrations.5 With
the assistance of a contractor, U.S. EPA has collected data on traffic
variables, traffic volume, types of vehicles, and model year. An earlier
study measured CO in a variety of microenvironments and under a variety
of conditions.23
The indirect method for estimating population exposure to CO was
compared to exposures to the CO concentrations observed while people
carried personal monitors during their daily activities. The indirect
estimate was similar to the estimate derived from personal monitoring at
low concentration levels, say 1 ppm, but differed at levels above that.
For example, at the 5 ppm level, indirect estimates were about half the
direct estimates within the regression model utilizing these data. Although the results are limited,
it appears that when monitoring experts design microenvironmental field
surveys, there is a tendency to sample more heavily in those settings
where the concentration is expected to be higher, thereby causing exaggerated
levels of the indirect method. The possibility of using microenvironmental
measurements and/or activity patterns from one city to extrapolate to
those of another city is doubtful but not yet fully evaluated.
Dosimetry Research
The development of reliable biological indicators of either specific
pollutant exposures or health effects is in its early stages. A limited
number of biomarkers such as blood levels of lead or CO have been recognized
and used for some time. Breath levels of VOCs or CO have also been
measured successfully. However, the use of other biomarkers such as
cotinine, a metabolite of nicotine, as a tracer compound for environmental
tobacco smoke is still in its experimental phase. This also applies to
use of the hydroxyproline-to-creatinine ratio as a measure of NO2 exposure
and also to use of DNA adducts which form as a result of VOC exposure and
have been found to be correlated with genotoxic measures. Dosimetry
methods development, though still very new and too often not yet ready
for field application for humans, is obviously a very promising research
area.
Exhaled breath measurements have been used successfully in VOC and CO
exposure studies. Since breath samples can be obtained noninvasively,
they are preferred to blood measurements whenever they can meet the
exposure research goals. A methodology to collect expired samples on a
Tenax adsorbent has been developed and used on several hundred TEAM study
subjects. Major findings have included the discovery that breath levels
generally exceed outdoor levels, even in heavily industrialized petrochemical
manufacturing areas. Significant correlations of breath levels with
personal air exposures for certain chemicals give further proof that the
source of the high exposure is in personal activities or indoors, at home
as well as at work.
The basic advantages of monitoring breath rather than blood or tissues
are:
1. Greater acceptability by volunteers. Persons give breath samples
more readily than blood samples. The procedure is rapid and convenient,
taking only 5-10 min. in all.
2. Greater sensitivity. Since volatile organic compounds often have a
high air-to-blood partition coefficient, they will have higher concentrations
in breath than in blood under equilibrium conditions. Thus, more than
100 compounds have been detected in the breath of subjects where
simultaneously collected blood samples showed only one or two above
detectable limits.
3. Fewer analytical problems. Several "clean-up" steps must be completed
with blood samples, including centrifuging, extraction, etc., with each
step carrying possibility for loss or contamination of the sample.
Measurements of CO in expired air often are used as indicators of
carboxyhemoglobin (COHb) concentrations in blood, although the precise
relationship between alveolar CO and blood COHb has not been agreed upon.
The U.S. EPA exposure monitoring program therefore included a breath
monitoring component in its study of CO exposures in Denver and Washington,
DC. The purpose was (1) to estimate the distribution of alveolar CO (and
therefore blood COHb) concentrations in the nonsmoking adult residents of
the two cities; and (2) to compare the alveolar CO measurements to preceding
personal CO exposures.
The major findings of the breath monitoring program included:
1. The percent of nonsmoking adults with alveolar CO exceeding 10 ppm
(i.e., blood COHb above 2%) was 11% in Denver and 6% in Washington, DC.
2. The correlations between breath CO and previous 8-h CO exposure were
0.5 for Denver and 0.66 for Washington, DC.
3. The correlations between personal CO exposures at home or at-work
and ambient CO at the nearest stations averaged 0.25 at Denver and 0.19
at Washington, DC. Thus, the ambient data explained little of the
variability of CO exposure.
Sampling Protocols
Statistical sampling protocols provide the design for large-scale total
human exposure field studies. They describe the procedures to be used in
identifying respondents, choosing the sample sizes, selecting the number
of persons to be contacted within various subpopulations, and other
factors. They are essential to the total human exposure research program
to ensure that a field survey will provide the information necessary to
meet its objectives. Because one's activities affect one's exposures,
another unique component of the total human exposure research program is
the development of human activity pattern data bases. Such data bases
provide a record describing what people do in time and space.
Whenever the objectives of a study are to make valid inferences beyond
the group surveyed, a statistical survey design is required. For exposure
studies, the only statistically valid procedure that is widely accepted
for making such inferences is to select a probability sample from the
target population. The survey designs used in the total exposure field
studies have been three-stage probability-based designs, which consist of areas
defined by census tracts, households randomly selected within the census
tracts, and stratified sampling of screened eligible individuals.20,24
STATISTICAL ISSUES
TEAM Design Considerations
It appears that some variability in the TEAM exposure data might be
due to meteorological factors such as some receptors being downwind of the
sources while others are not. A more careful experimental design that
includes consideration of these factors, including measurement of
appropriate meteorological parameters, may lead to more meaningful data
in future studies.
Other TEAM design considerations are:
1. The intraperson temporal variation in VOC exposure is crucial in
risk assessment and should be given a high priority in future studies.
2. Given the substantial measurement error, the estimated exposure
distributions can be substantially more heterogeneous than the true
exposure distributions. For example, the variance of the estimated
exposures is the sum of the variance of the true exposures and the
variance of the measurement errors, assuming that: a) measurement
errors are homoscedastic, and b) there is no correlation between
measurement error and true exposure. Empirical Bayes methods are
available for such adjustments (a brief numerical sketch follows this list).
3. The relatively high refusal rate in the sample enrollment is of
concern. A more rigorous effort in the future to assess the impact
of the refusal on the generalizability of the sample is desirable.
For example, a subsample of the accessible part of the refusals can
be offered an incentive to participate, or be offered a less intensive
protocol for their participation; the data from the would-be refusals
can then be compared with the "regular" participants to assess the
possible magnitudes of selection bias.
4. In future studies, the following might be used:
a. use of closed format questionnaires,
b. use of artificial intelligence methodology,
c. use of automated instrument output.
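The following is a minimal numerical sketch of the variance decomposition in item 2 and of a simple shrinkage adjustment in the empirical Bayes spirit. It assumes simulated data, a known and homoscedastic measurement-error variance, and no correlation between error and true exposure; it is an illustration of the idea, not the adjustment used in any TEAM analysis.

    # Variance of measured exposures = variance of true exposures + measurement-error
    # variance (under the stated assumptions), and a simple shrinkage adjustment.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    true_exposure = rng.lognormal(mean=1.0, sigma=0.5, size=n)   # hypothetical true exposures
    error_sd = 1.0                                               # assumed known
    measured = true_exposure + rng.normal(0.0, error_sd, size=n)

    var_measured = measured.var(ddof=1)
    var_true_est = max(var_measured - error_sd**2, 0.0)

    # Shrink each measurement toward the overall mean in proportion to the share of
    # observed variance attributable to true exposure (an empirical-Bayes-style adjustment).
    shrink = var_true_est / var_measured
    adjusted = measured.mean() + shrink * (measured - measured.mean())

    print(f"variance: measured {var_measured:.2f}, estimated true {var_true_est:.2f}, "
          f"adjusted exposures {adjusted.var(ddof=1):.2f}")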
Development of Improved Microenvironmental Monitoring Designs
The direct method of personal exposure monitoring is appealing but is expensive
and burdensome to human subjects. Monitoring microenvironments instead
is less costly but estimates personal exposure only indirectly. Obviously
these approaches can be used in a complementary way to answer specific
pollutant exposure questions.
With either method, a crucial issue is how to stratify the
microenvironments into relatively homogeneous microenvironment types
(METs).12 Usually there are many possible ways to stratify the
microenvironments into METs; thus there can be many potentially distinct
METs. Obviously one cannot implement a stratification scheme with five
hundred METs in field studies. It is therefore important to develop
methods for identifying the most informative ways to stratify the
microenvironments into METs. For example, if we can only afford to
distinguish two METs in a field study, is it better to distinguish indoor
and outdoor as the two METs, or is it better to distinguish awake and
sleeping as the two METs?
Some of the more important issues which will require additional
methodological development are:
1. How to identify the most informative ways to stratify microenvironments
into METs.
2. How to optimize the number of METs, choosing between a larger number
of METs and fewer microenvironments for each MET, and a smaller
number of METs and more microenvironments for each MET.
3. How to allocate the number of monitored microenvironments across
different METs: one should monitor more microenvironments for the
more crucial METs (those in which the human subjects spend more of
their time) than the less crucial METs (a small allocation sketch follows below).
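As one concrete, hypothetical way of thinking about issue 3, the sketch below spreads a fixed number of monitored microenvironments across METs in proportion to time-weighted concentration variability (essentially a Neyman-type allocation). The METs, time fractions, and standard deviations are invented for illustration; the paper does not prescribe this rule.

    # Allocate a fixed monitoring budget across METs, weighting each MET by the product
    # of the fraction of person-time spent there and the anticipated concentration SD.
    import numpy as np

    mets = ["indoors-home", "indoors-work", "in-transit", "outdoors"]
    time_fraction = np.array([0.63, 0.28, 0.06, 0.02])   # hypothetical person-time shares
    conc_sd = np.array([3.0, 2.0, 8.0, 1.5])             # hypothetical concentration SDs

    budget = 200                                          # total microenvironments we can monitor
    weights = time_fraction * conc_sd
    n_per_met = np.round(budget * weights / weights.sum()).astype(int)

    for met, n in zip(mets, n_per_met):
        print(f"{met:13s}: monitor {n:3d} microenvironments")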
Development and Validation of Improved Models for Estimating Personal
Exposure from Microenvironmental Monitoring Data
Methodological development is needed for models which allow
supplementing the direct personal monitoring approach with an activity
diary, enabling these data to be combined with indirect-approach
microenvironmental data to estimate personal exposure through a regression-
like model. The basic exposure model which sums over microenvironments
    E_i = sum over j of c_j * t_ij
can be interpreted as a regression model with the concentrations being
the parameters to be estimated. To fully develop this approach, it is
necessary to make crucial assumptions about independence between individuals
and between METs. Therefore, it is very important to validate the method
empirically.
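Under those independence assumptions, and treating the MET concentrations as common across people, the model can be fitted by ordinary least squares. The sketch below does this on simulated diary and exposure data; the numbers are invented and the fit is only an illustration of the regression interpretation, not a validated procedure.

    # Regression interpretation: E_i = sum_j c_j * t_ij, with the MET concentrations c_j
    # as the unknown coefficients and the diary times t_ij as the regressors.
    import numpy as np

    rng = np.random.default_rng(1)
    n_people = 60
    true_c = np.array([2.0, 1.0, 6.0])                       # hypothetical MET concentrations (ppm)

    t = rng.dirichlet([5.0, 3.0, 1.0], size=n_people) * 24   # hours per MET, summing to 24
    E = t @ true_c + rng.normal(0.0, 2.0, size=n_people)     # measured integrated exposures (ppm-h)

    c_hat, *_ = np.linalg.lstsq(t, E, rcond=None)            # no intercept: zero time means zero exposure
    print("estimated MET concentrations:", np.round(c_hat, 2))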
Errors-in-Variables Problem
It is important to recognize an errors-in-variables situation, which
may often occur in exposure assessment: estimating the relationship
between two variables, Y (a health effect) and X (true personal exposure),
when X is not observed but a surrogate of X, say Z, which is related to X,
is observed. Such surrogates may have systematic errors as well as zero-
centered random errors. The effects of the measurement bias are more
serious in estimation situations than for hypothesis testing.
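A small simulated example (all numbers invented) makes the point concrete: with only zero-centered random error in the surrogate, the estimated slope is attenuated toward zero; a systematic error would shift it further.

    # Attenuation when a health effect Y is regressed on a noisy surrogate Z instead of
    # the true exposure X.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 5000
    X = rng.normal(10.0, 2.0, n)                  # true personal exposure (unobserved in practice)
    Y = 1.0 + 0.5 * X + rng.normal(0.0, 1.0, n)   # health effect with true slope 0.5
    Z = X + rng.normal(0.0, 2.0, n)               # surrogate with zero-centered measurement error

    slope_x = np.polyfit(X, Y, 1)[0]              # close to the true slope, 0.5
    slope_z = np.polyfit(Z, Y, 1)[0]              # attenuated by var(X)/(var(X) + var(error)) = 0.5

    print(f"slope using X: {slope_x:.2f}; slope using surrogate Z: {slope_z:.2f}")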
Choice Between Monitoring Instruments of Varying Precision and Cost
When designing monitoring programs, it is common to have available
instruments of varying quality. Measurement devices that are less
expensive to obtain and use are typically also less accurate and precise.
Strategies could be developed and evaluated that consider the costs of
measurement as well as the precision. In situations of high between-
individual exposure variability, a less precise instrument of lower cost
may be preferred if it permits an opportunity for enough additional study
subjects.
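The trade-off can be sketched with a simple budget calculation: for estimating a population mean exposure, the variance of the estimate is roughly (between-person variance + instrument variance) divided by the number of subjects the budget allows. The costs and variances below are hypothetical.

    # Compare a cheap, imprecise instrument with an expensive, precise one under a fixed budget.
    between_person_var = 4.0     # hypothetical variability of true exposures across people
    budget = 10000.0

    instruments = {
        "cheap/imprecise":   {"cost_per_subject": 50.0,  "instrument_var": 2.0},
        "expensive/precise": {"cost_per_subject": 400.0, "instrument_var": 0.1},
    }

    for name, spec in instruments.items():
        n = int(budget // spec["cost_per_subject"])                  # subjects affordable
        var_of_mean = (between_person_var + spec["instrument_var"]) / n
        print(f"{name:18s}: n = {n:3d}, variance of estimated mean = {var_of_mean:.4f}")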
Development of Designs Appropriate for Assessing National Levels
At the present time, the data available for the assessment of personal
exposure distributions are restricted to a limited number of locales.
The generalization from existing data to a very general population such
as the national population requires a great deal of caution. However, it
is conceivable that large scale studies or monitoring programs aimed at a
nationally representative sample might be implemented in the future. It
would be useful to consider the design of such studies using data presently
available. It would also be useful to design studies of more limited
scales to be conducted in the near future as pilot studies for a possible
national study, so as to collect information which might be useful for
the design of a national study.
An issue in the design of a national study is the amount of clustering
of the sample: one has to decide how many locales to use, and how large
a sample to take for each locale. The decision depends partly on the
fixed cost in using additional locales, and partly on the intracluster
correlation for the locales. For many of the VOCs measured in the TEAM
studies, there is far more variability within locales than between locales;
in other words, there is little intracluster correlation for the locales.
This would indicate that a national study should be highly clustered,
with a few locales and a large sample for each locale. On the other
hand, if there is more variability between locales than within locales, a
national study should use many locales and a small sample for each locale.
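The usual way to quantify this trade-off is through the design effect, DEFF = 1 + (m - 1)rho, where m is the sample size per locale and rho the intracluster correlation; the effective sample size is the total sample divided by DEFF. The sketch below, with invented values of rho, shows why a highly clustered design is tolerable only when rho is small.

    # Effective sample size of a clustered national design for two hypothetical
    # intracluster correlations and two degrees of clustering.
    def effective_sample_size(n_total, cluster_size, rho):
        deff = 1.0 + (cluster_size - 1.0) * rho       # design effect for equal-size clusters
        return n_total / deff

    n_total = 1200
    for rho in (0.001, 0.30):                         # little vs substantial intracluster correlation
        for n_locales in (4, 40):
            m = n_total // n_locales                  # subjects per locale
            eff = effective_sample_size(n_total, m, rho)
            print(f"rho = {rho:5.3f}, locales = {n_locales:2d}, per locale = {m:3d}, "
                  f"effective n = {eff:7.1f}")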
Further analysis of the existing TEAM data base can help to address
these issues. For example, the TEAM sample to date can be identified as
a "population" from which various "samples" can be taken. The characteristics
of various sample types can be useful for the design of any followup
studies as well as for a larger new study.
Evaluating Extreme Values in Exposure Monitoring
Short term extreme values of pollutant exposure may well be more
important from a biological point of view than elevated temporal mean
values. The study of statistical properties of extreme values from
multivariate spatio-temporally dependent data is in its infancy. In
particular, the possibility of synergy necessitates the development of a
theory of multivariate extreme values. It is desirable to develop estimates
of extreme quantiles of pollutant concentration.
Estimation Adjustment for Censored Monitoring Data
One should develop low exposure level extrapolation procedures and
models, and check the sensitivity of these procedures to the models
chosen. In some cases a substantial fraction of exposure monitoring data
is below the detection limit even though these low exposure levels may be
important. The problem of extrapolating from measured to unmeasured
values thus naturally arises. Basically this is a problem of fitting the
lower tail of the pollutant concentration distribution. Commonly used
procedures assume either that below detectable level values are actually
at the detection limit, or that they are zero, or that they are one-half
of the detection limit.
In many monitoring situations we may find a good fit to simple models
such as the lognormal for that part of the data which lies above the
detection limit. Then the calculation of total exposure would use a
lognormal extrapolation of the lower tail.
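A minimal sketch of such a fit is given below: a lognormal is fitted to left-censored data by maximum likelihood (detected values contribute the density, censored values the probability of falling below the detection limit), and the resulting mean is compared with the common DL/2 substitution. The data are simulated and the detection limit is invented.

    # Maximum-likelihood fit of a lognormal to left-censored monitoring data.
    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(3)
    true_mu, true_sigma, DL = 0.0, 1.0, 0.8          # log-scale parameters and detection limit
    x = rng.lognormal(true_mu, true_sigma, size=400)
    detected = x[x >= DL]
    n_censored = int((x < DL).sum())

    def neg_loglik(params):
        mu, log_sigma = params
        sigma = np.exp(log_sigma)
        ll_detected = norm.logpdf(np.log(detected), mu, sigma) - np.log(detected)
        ll_censored = n_censored * norm.logcdf((np.log(DL) - mu) / sigma)
        return -(ll_detected.sum() + ll_censored)

    fit = minimize(neg_loglik, x0=[np.log(detected).mean(), 0.0])
    mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])

    mean_mle = np.exp(mu_hat + sigma_hat**2 / 2)               # lognormal mean from the fit
    mean_sub = np.where(x >= DL, x, DL / 2).mean()             # DL/2 substitution for comparison
    print(f"MLE mean {mean_mle:.2f}; DL/2-substitution mean {mean_sub:.2f}; "
          f"true mean {np.exp(true_mu + true_sigma**2 / 2):.2f}")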
SUMMARY
Personal exposure assessment is a critical link in the overall risk
assessment framework. Recent advances in exposure monitoring have provided
new capabilities and additional challenges to the environmental research
team, particularly to the statistician, to improve the current state of
information on microenvironment concentrations, activity patterns, and
particularly personal exposure. If these opportunities are realized,
then risk assessments can more often use human exposure and risk data in
addition to available animal toxicology information.
-------
REFERENCES
1. Lioy, P. J., (1987) In Depth Exposure Assessments. JAPCA, 37, 791-
793.
2. Epidemiology of Air Pollution, National Research Council, National
Academy Press, Washington, DC (1985), 1-334.
3. Ott, W. R. (1982) Concepts of human exposure to air pollution,
Environ. Int., 7, 179-196.
4. Cortese, A. D. and Spengler, J.D. (1976) Ability of fixed monitoring
stations to represent carbon monoxide exposure. J. Air Pollut.
Control Assoc., 26, 1144.
5. Flachsbart, P. G. and Ott, W. R. (1984) Field Surveys of carbon
monoxide in commercial settings using personal exposure monitors.
EPA-600/4-94-019, PB-84-211291, U.S. Environmental Protection
Agency, Washington, DC.
6. Wallace, L. A. (1979) Use of personal monitor to measure commuter
exposure to carbon monoxide in vehicle passenger compartment.
Paper No. 79-59.2, presented at the 72nd Annual Meeting of the
Air Pollution Control Association, Cincinnati, OH.
7. Ott, W. R. and Eliassen, R. (1973) A survey technique for determining
the representativeness of urban air monitoring stations with
respect to carbon monoxide, J. Air. Pollut. Control Assoc. 23,
685-690.
8. Ott, W. R. and Flachsbart, P. (1982) Measurement of carbon monoxide
concentrations in indoor and outdoor locations using personal
exposure monitors, Environ. Int. 8, 295-304.
9. Peterson, W. B. and Allen, R. (1982) Carbon monoxide exposures to
Los Angeles commuters, J. Air Pollut. Control Assoc. 32, 826-833.
10. Spengler, J. D. and Soczek, M. L. (1984) Evidence for improved
ambient air quality and the need for personal exposure research,
Environ. Sci. Technol. 18, 268-280A.
11. Ott, W. R. (1985) Total human exposure: An emerging science focuses
on humans as receptors of environmental pollution, Environ.
Sci. Technol. 19, 880-886.
12. Duan, N. (1982) Models for human exposure to air pollutant, Environ.
Int. 8, 305-309.
13. Mage, D. T. and Wallace, L. A., eds. (1979) Proceedings of the
Symposium on the Development and Usage of Personal Monitors for
Exposure and Health Effects Studies. EPA-600/9-79-032, PB-80-
143-894, U.S. Environmental Protection Agency, Research Triangle
Park, NC.
14. Wallace, L. A. (1981) Recent progress in developing and using personal
monitors to measure human exposure to air pollution, Environ.
Int. 5, 73-75.
15. Wallace, L. A. and Ott, W. R. (1982) Personal monitors: A state-of-
the-art survey, J. Air Pollut. Control Assoc. 32, 601-610.
16. Duan, N. (1984) Application of the microenvironment type approach to
assess human exposure to carbon monoxide. Rand Corp., draft
final report submitted to the U.S. Environmental Protection
Agency, Research Triangle Park, NC.
17. Wallace, L. A., Zweidinger, R., Erickson, M., Cooper, S., Whitaker,
D., and Pellizzari, E. D. (1982) Monitoring individual exposure:
Measurements of volatile organic compounds in breathing-zone
air, drinking water, and exhaled breath, Environ. Int. 8, 269-282.
18. Wallace, L., Pellizzari, E., Hartwell, T., Rosenzweig, M., Erickson,
M., Sparacino, C. and Zelon, H. (1984) Personal exposures
to volatile organic compounds: I. Direct measurements in
breathing-zone air, drinking water, food, and exhaled breath,
Environ. Res. 35, 293-319.
19. Wallace, L., Pellizzari, E., Hartwell, T., Zelon, H., Sparacino, C.,
and Whitmore, R. (1984) Analyses of exhaled breath of 335
urban residents for volatile organic compounds, in Indoor Air,
vol. 4: Chemical Characterization and Personal Exposure, pp.
15-20. Swedish Council for Building Research, Stockholm.
20. Akland, G. G., Hartwell, T. D., Johnson, T.R., and Whitmore, R. W.
(1985) Measuring human exposure to carbon monoxide in Washington,
DC, and Denver, Colorado, during the winter of 1982-83, Environ.
Sci. Technol. 19, 911-918.
21. Johnson, T. (1984) A study of personal exposure to carbon monoxide
in Denver, Colorado. EPA-600/4-84-015, PB-84-146-125,
Environmental Monitoring Systems Laboratory, U.S. Environmental
Protection Agency, Research Triangle Park, NC
22. Hartwell, T. D., Carlisle, A. C., Michie, R. M., Jr., Whitmore, R.
W., Zelon, H. S., and Whitehurst, D. A. (1984) A study of carbon
monoxide exposure of the residents in Washington, DC. Paper
No. 121.4, presented at the 77th Annual Meeting of the Air
Pollution Control Association, San Francisco, CA.
23. Holland, D. M. and Mage, D. T. (1983) Carbon monoxide in four cities
during the winter of 1981. EPA-600/4-83-025, Environmental
Monitoring Systems Laboratory, U.S. Environmental Protection
Agency, Research Triangle Park, NC.
24. Whitmore, R. W., Jones, S. M., and Rosenzweig, M. S. (1984) Final
sampling report for the study of personal CO exposure. EPA-
600/S4-84-034, PB-84-181-957, Environmental Monitoring
Systems Laboratory, U.S. Environmental Protection Agency,
Research Triangle Park, NC.
-------
FRAMEWORK FOR EXPOSURE ASSESSMENT
[Figure: flowchart. Outdoor emission sources lead to outdoor concentrations and
indoor emission sources lead to indoor concentrations; each, combined with
time-activity patterns, contributes to total personal exposure, which leads in
turn to internal dose, biologically effective dose, and health effect.]
-------
TOTAL HUMAN EXPOSURE PROGRAM
GOALS:
Estimate total human exposure for each
pollutant of concern
Determine major sources of this exposure
Estimate health risks associated with
these exposures
Determine actions to reduce these risks
-------
PROPORTION OF TIME IN SELECTED MICROENVIRONMENTS
EMPLOYED PERSONS
[Figure: pie chart. Indoors, home: 63%; indoors, work: 28%; in transit: 6%;
outdoors: 2%; indoors, other: 1%.]
-------
PROPORTION OF TIME IN SELECTED MICROENVIRONMENTS
FULL-TIME HOMEMAKERS
[Figure: pie chart. Indoors, home: 89%; indoors, other: 5%.]
-------
MAJOR EXPOSURE SOURCES
[Figure: two columns, outdoors and indoors, listing: industrial, automobile,
toxic wastes, pesticides, tobacco smoke, gas stoves, cleaners, sprays,
dry cleaning, paints, polishes.]
-------
EXPOSURE ASSESSMENT FOR
COMMUNITY STUDIES
Questionnaires
Outdoor monitoring
Indoor monitoring
Personal monitoring
Biological monitoring
-------
DISCUSSION
William F. Hunt, Jr.
Chief, Monitoring and Research Branch
Technical Support Division
Research Triangle Park, NC 27711
William C. Nelson's paper provides an
excellent overview of exposure monitoring
and associated statistical issues. The
reader must keep in mind that the paper
is directed at estimating air pollution
in microscale environments—in the home,
at work, in automobiles, etc., as well as
in the ambient air to which the general
public has access.
While it is important to better
understand air pollution levels in each
of these microenvironments, it must be
clearly understood that the principal
focus of the nation's air pollution
control program is directed at
controlling ambient outdoor air pollution
levels to which the general public has
access. The Clean Air Act (CAA) of 1970
and the CAA of 1977 emphasized the
importance of setting and periodically
reviewing the National Ambient Air
Quality Standards (NAAQS) for the
nation's most pervasive ambient air
pollutants—particulate matter, sulfur
dioxide, carbon monoxide, nitrogen
dioxide, ozone and lead. NAAQS(s) were
set to protect against both public health
and welfare effects.
One of these pollutants, carbon
monoxide (CO), is discussed extensively
in Dr. Nelson's paper. CO is a
colorless, odorless, poisonous gas formed
when carbon in fuels is not burned
completely. Its major source is motor
vehicle exhaust, which contributes more
than two-thirds of all emissions
nationwide. In cities or areas with
heavy traffic congestion, however,
automobile exhaust can cause as much as
95 percent of all emissions, and carbon
monoxide concentrations can reach very
high levels.
In Dr. Nelson's paper, he states that
the correlations between personal CO
exposures at home or at work and ambient
CO at the nearest fixed site air
monitoring stations are weak. This does
not mean from an air pollution control
standpoint, however, that there is
something wrong with the fixed site CO
monitoring network. As stated earlier,
the air pollution control program is
directed at controlling outdoor ambient
air at locations to which the public has
access. The microscale CO monitoring
sites are generally located in areas of
highest concentration within metropolitan
areas at locations to which the general
public has access.
The Federal Motor Vehicle Control
Program has been very successful in
reducing these concentrations over time.
In fact, CO levels have dropped 32
percent between 1977 and 1986, as
measured at the nation's fixed site
monitoring networks." This improvement
has a corresponding benefit for people in
office buildings which use the outdoor
ambient air to introduce fresh air into
their buildings through their ventilation
systems. A major benefit occurs for
people who are driving back and forth to
work in their automobiles, for new cars
are much less polluting than older cars.
This should be clearly understood when
trying to interpret the major findings of
the breath monitoring programs that are
described in Dr. Nelson's paper.
Otherwise, the reader could mistakenly
conclude that somehow the Federal
Government may be in error in using fixed
site monitoring. Such a conclusion would
be incorrect. Further, it should be
pointed out that a fixed site network
also has the practical advantages of
identifying the source of the problem and
the amount of pollution control that
would be needed.
Another area of concern that needs to
be addressed in the future regarding the
breath monitoring program is the
relationship between alveolar CO and
blood carboxyhemoglobin (COHb). Dr.
Nelson states that the precise
relationship between alveolar CO and
blood COHb has not been agreed upon.
Given that, is there an inconsistency in
not being able to determine the
relationship between alveolar CO and
blood COHb and then using alveolar CO
measurements in Washington, D.C. and
Denver, Colorado to estimate blood COHb?
A final point, which needs to be
addressed in the breath monitoring
program, is the ability to detect volatile
organic chemicals, some of which may be
carcinogenic. What is the significance
of being able to detect 100 compounds in
breath, yet only one or two in blood
above the detectable limits? Does the
body expel the other 98 compounds that
cannot be detected in the blood? If so,
why?
STATISTICAL ISSUES
I agree with Dr. Nelson that
meteorological factors should be
incorporated into future TEAM studies,
through more careful experimental design.
The statistical issues identified under
TEAM design considerations, the
development of improved
microenvironmental monitoring designs,
errors-in-variables problem, choice
between monitoring instruments of varying
precision and cost, the development of
designs appropriate for assessing
National levels, evaluating extreme
values in exposure monitoring, and
adjusting for censored monitoring data
are all well thought out and timely. I
strongly agree with his recommendation
that when considering multiple pollutant
species, as in the case of the volatile
and semi-volatile organic chemicals, as
well as polar compounds, the possibility
of synergistic effects necessitates the
development of a theory of multivariate
extreme values.
SUMMARY
In conclusion, Dr. Nelson's paper
provides a well thought out overview of
exposure monitoring and the associated
statistical issues. It should be an
excellent reference for people interested
in this topic. The reader should be
aware, however, of the importance of the
nation's fixed site monitoring network in
evaluating the effectiveness of the
nation's air pollution control program.
REFERENCE
1. National Air Quality and Emissions
Trends Report, 1986. U.S. Environmental
Protection Agency, Technical Support
Division, Monitoring and Reports Branch,
Research Triangle Park, NC 27711.
-------
Designing Environmental Regulations
Søren Bisgaard and William G. Hunter*
Center for Quality and Productivity Improvement
University of Wisconsin-Madison
610 Walnut Street, Madison, Wisconsin 53705
Public debate on proposed environmental regulations
often focuses almost entirely (and naively) on the allow-
able limit for a particular pollutant, with scant attention
being paid to the statistical nature of environmental data
and to the operational definition of compliance. As a
consequence regulations may fail to accomplish their pur-
pose. A unifying framework is therefore proposed that
interrelates assessment of risk and determination of compli-
ance. A central feature is the operating characteristic
curve, which displays the discriminating power of a regula-
tion. This framework can facilitate rational discussion
among scientists, policymakers, and others concerned with
environmental regulation.
Introduction
Over the past twenty years many new federal, state,
and local regulations have resulted from heightened con-
cern about the damage that we humans have done to the
environment - and might do in the future. Public debate,
unfortunately, has often focused almost exclusively on risk
assessment and the allowable limit of a pollutant.
Although this "limit part" of a regulation is important, a
regulation also includes a "statistical part" that defines
how compliance is to be determined; even though it is typi-
cally relegated to an appendix and thus may seem unimpor-
tant, it can have a profound effect on how the regulation
performs.
Our purpose in this article is to introduce some new
ideas concerning the general problem of designing environ-
mental regulations, and, in particular, to consider the role
of the "statistical pan" of such regulations. As a vehicle for
illustration, we use the environmental regulation of
ambient ozone. Our intent is not to provide a definitive
analysis of that particular problem. Indeed, that would
require experts familiar with the generation, dispersion,
measurements, and monitoring of ozone to analyze avail-
able data sets. Such detailed analysis would probably lead
to the adoption of somewhat different statistical assump-
tions than we use. The methodology described below,
however, can accommodate any reasonable statistical
assumptions for ambient ozone. Moreover, this methodol-
ogy can be used in the rational design of any environmental
regulation to limit exposure to any pollutant.
Ambient Ozone Standard
For illustrative purposes, then, let us consider the
ambient ozone standard (1,2). Ozone is a reactive form of
oxygen that has serious health effects. Concentrations from
about 0.15 parts per million (ppm), for example, affect
*) Deceased.
respiratory mucous membranes and other lung tissues in
sensitive individuals as well as healthy exercising persons.
In 1971, based on the best scientific studies at the time, the
Environmental Protection Agency (EPA) promulgated a
National Primary and Secondary Ambient Air Quality
Standard ruling that "an hourly average level of 0.08 parts
per million (ppm) not to be" exceeded more than 1 hour
per year." Section 109(d) of the Clean Air Act calls for a
review every five years of the Primary National Ambient
Air Quality Standards. In 1977 EPA announced that it was
reviewing and updating the 1971 ozone standard. In
preparing a new criteria document, EPA provided a number
of opportunities for external review and comment. Two
drafts of the document were made available for external
review. EPA received more than 50 written responses to
the first draft and approximately 20 to the second draft.
The American Petroleum Institute (API), in particular, sub-
mitted extensive comments.
The criteria document was the subject of two meet-
ings of the Subcommittee on Scientific Criteria for Photo-
chemical Oxidants of EPA's Science Advisory Board. At
each of these meetings, which were open to the public, crit-
ical review and new information were presented for EPA's
consideration. The Agency was petitioned by the API and
29 member companies and by the City of Houston around
the time the revision was announced. Among other things,
the petition requested that EPA state the primary and
secondary standards in such a way as to permit reliable
assessment of compliance. In the Federal Register it is
noted that
EPA agrees that the present deterministic form of
the oxidant standard has several limitations and
has made reliable assessment of compliance
difficult. The revised ozone air quality standards
are stated in a statistical form that will more
accurately reflect the air quality problems in vari-
ous regions of the country and allow more reli-
able assessment of compliance with the stan-
dards. (Emphasis added)
Later, in the beginning of 1978, the EPA held a public
meeting to receive comments from interested parties on the
initial proposed revision of the standard. Here several
representatives from the State and Territorial Air Pollution
Program Administrators (STAPPA) and the Association of
Local Air Pollution Control Officials participated. After
the proposal was published in the spring of 1978, EPA held
four public meetings to receive comments on the proposed
standard revisions. In addition, 168 written comments were
received during the formal comment period. The Federal
Register summarizes the comments as follows:
The majority of comments received (132 out of
168) opposed EPA's proposed standard revision,
favoring either a more relaxed or a more
stringent standard. State air pollution control
agencies (and STAPPA) generally supported a
standard level of 0.12 ppm on the basis of their
assessment of an adequate margin of safety.
Municipal groups generally supported a standard
level of 0.12 ppm or higher, whereas most indus-
trial groups supported a standard level of 0.15
ppm or higher. Environmental groups generally
encouraged EPA to retain the 0.08 ppm standard.
As reflected in this statement, almost all of the public dis-
cussion of the ambient ozone standard (not just the 168
comments summarized here) focused on the limit part of
the regulation. In this instance, in common with similar
discussion of other environmental regulations, the statisti-
cal part of the regulation was largely ignored.
The final rule-making made the following three
changes:
(1) The primary standard was raised to 0.12 ppm.
(2) The secondary standard was raised to 0.12 ppm.
(3) The definition of the point at which the standard is
attained was changed to "when the expected number
of days per calendar year" with maximum hourly
average concentration above 0.12 ppm is equal to or
less than one."
The Operating Characteristic Curve
Environmental regulations have a structure similar to
that of statistical hypothesis tests. A regulation states how
data are to be used to decide whether a particular site is in
compliance with a specified standard, and a hypothesis test
states how a particular set of data are to be used to decide
whether they are in reasonable agreement with a specified
hypothesis. Borrowing the terminology and methodology
from hypothesis testing, we can say there are two types of
errors that can be made because of the stochastic nature of
environmental data: a site that is really in compliance can
be declared out of compliance (type I error) and vice versa
(type II error). Ideally the probability of committing both
types of error should be zero. In practice, however, it is not
feasible to obtain this ideal.
In the context of environmental regulations, an operat-
ing characteristic curve is the probability of declaring a site
to be in compliance (d.i.c.) plotted as a function of some
parameter θ, such as the mean level of a pollutant. This
Prob{ d.i.c. | θ } can be used to determine the probabilities
of committing type I and type II errors. As long as θ is
below the stated standard, the probability of a type I error
is 1 - Prob{ d.i.c. | θ }. When θ is above the stated
standard, Prob{ d.i.c. | θ } is the probability of a type II
error. Using the operating characteristic curve for the old
and the new regulations for ambient ozone, we can evalu-
ate them to see what was accomplished by the revision.
The old standard stated that "an hourly average level
of 0.08 ppm [was] not to be exceeded more than 1 hour per
year." This standard was therefore defined operationally in
terms of the observations themselves. The new standard, on
the other hand, states that the expected number of days per
calendar year with a maximum hourly average concentra-
tion above 0.12 ppm should be less than one. Compliance,
however, must be determined in terms of the actual data,
not an unobserved expected number. How should this
conversion be made? In Appendix D of the new ozone
regulation, it is stated that:
In general, the average number of exceedances
per calendar year must be less than or equal to 1.
In its simplest form, the number of exceedances
at a monitoring site would be recorded for each
calendar year and then averaged over the past 3
calendar years to determine if this average is less
than or equal to 1.
Based on the stated requirements of compliance, we have
computed the operating characteristic functions for the old
and the new ozone regulations. They are plotted in Figures
1 and 2. (The last sentence in the legend for Figure 1 will
be discussed below in the following section, Statistical
Concepts.) To construct these curves, certain simplifying
assumptions were made, which are discussed in the section
entitled "Statistical Concepts." Before such curves are
used in practice, these assumptions need to be investigated
and probably modified.
According to the main part of the new ozone regula-
tion, the interval from 0 to 1 expected number of
exceedances of 0.12 ppm per year can be regarded as
defining "being in compliance." Suppose the decision
rule outlined above is used for a site that is operating at a
level such that the expected number of days exceeding 0.12
ppm is just below one. In that case, as was noted by Javitz
(3), with the new ozone regulation, there is a probability of
approximately 37% in any given year that such a site will
be declared out of compliance. Moreover, there is approxi-
mately a 10% chance of not detecting a violation of 2
expected days per year above the 0.12 ppm limit; that is,
the standard operates such that the probability is 10% of
not detecting occurrences when the actual value is twice its
permissible value (2 instead of 1). Some individuals may
find these probabilities (37% and 10%) to be surprisingly
and unacceptably high, as we do. Others, however, may
regard them as being reasonable or too low. In this paper,
our point is not to pursue that particular debate. Rather, it
is simply to argue that, before environmental regulations
are put in place, different segments of society need to be
aware of such operating characteristics, so that informed
policy decisions can be made. It is important to realize that
the relevant operating characteristic curves can be con-
structed before a regulation is promulgated.
Statistical Concepts
Let X denote a measurement from an instrument such
that X = θ + e, where θ is the mean value of the pollutant
and e is the statistical error term with variance σ². The
term e contains not only the error arising from an imperfect
instrument but also the fluctuations in the level of the pol-
lutant itself. We assume that the measurement process is
well calibrated and that the mean value of e is zero. The
parameters θ and σ² of the distribution of e are unknown
but estimates of them can be obtained from data. A
prescription of how the data are to be collected is known as
the sampling plan. It addresses the questions of how many,
where, when, and how observations are to be collected.
Any function f(X) = f(X_1, X_2, ..., X_n) of the observations is an
estimator, for example, the average of a set of values or the number of
observations in a sample above a certain limit. The value of the function
f for a given sample is an estimate. The estimator has a distribution,
which can be determined from the distribution of the observations and the
functional form of the estimator. With the distribution of the estimator,
one can answer questions of the form: what is the probability that the
estimate f = f(X) is smaller than or equal to some critical value c?
Symbolically this probability can be written as P = Prob{ f(X) ≤ c | θ }.
If we want to have a regulation limiting the pollution to a certain
level, it is not enough to state the limit as a particular value of a
parameter. We must define compliance operationally in terms of the
observations. The condition of compliance therefore takes the form of an
estimator f(X_1, ..., X_n) being less than or equal to some critical value
c, that is, { f(X_1, ..., X_n) ≤ c }. Regarded as a function of θ, the
probability Prob{ f(X_1, ..., X_n) ≤ c | θ } is the operating
characteristic function of the compliance rule. For the old standard, let
I_L(X_i) be the indicator that equals one if the hourly value X_i exceeds
the limit L = 0.08 ppm and zero otherwise. A year consists of
approximately n = 365 x 12 = 4380 hours of observations
(data are only taken from 9:01 am to 9:00 pm LST). The
expected number of hours per year above the limit is then

    θ = E{ sum_{i=1}^{4380} I_L(X_i) } = p_L × 4380,

where p_L = Prob{ X_i > L }. The probability that a site is
declared to be in compliance (d.i.c.) is

    P_old = Prob{ d.i.c. | θ }
          = Prob{ sum_{i=1}^{4380} I_L(X_i) ≤ 1 }
          = sum_{x=0}^{1} (4380 choose x) p_L^x (1 - p_L)^(4380 - x).    (1)
This probability P_old, plotted as a function of θ, is the
operating characteristic curve for the old regulation (Figure
1). Note that if the old standard had been written in terms
of an allowable limit of one for the expected number of
exceedances above 0.08 ppm, the maximum type I error
would be 1.00 - 0.73 = 0.27. The old standard, however, is
actually written in terms of the observed number of
exceedances so type I and type II errors, strictly speaking,
are undefined.
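Under the stated assumption of independent, identically distributed hourly values, the count of exceedances is binomial and equation (1) is straightforward to evaluate. The short sketch below is only a numerical check of the curve in Figure 1; for example, it reproduces a compliance probability of about 0.73 at one expected exceedance hour per year.

    # Operating characteristic of the old standard: P{at most 1 observed exceedance in
    # 4380 hourly values} as a function of the expected number of exceedance hours.
    from scipy.stats import binom

    N_HOURS = 4380

    def prob_in_compliance_old(expected_exceedance_hours):
        p_L = expected_exceedance_hours / N_HOURS       # per-hour exceedance probability
        return binom.cdf(1, N_HOURS, p_L)

    for theta in (0.36, 1.0, 2.0):
        print(f"expected exceedance hours = {theta:4.2f}: "
              f"P(d.i.c.) = {prob_in_compliance_old(theta):.3f}")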
The condition of compliance stated in the new regula-
tion is that the "expected number of days per calendar year
with daily maximum ozone concentration exceeding 0.12
ppm must be less than or equal to 1." Let Y_j represent the
daily maximum hourly average (j = 1, ..., 365). Suppose
the random variables Y_j are independently and identically
distributed. EPA proposed that the expected number of
days (a parameter) be estimated by a three-year moving
average of exceedances of 0.12 ppm. A site is in compli-
ance when the moving average is less than or equal to 1.
The expected number of days above the limit of L = 0.12
ppm is then

    θ = E{ sum_{j=1}^{365} I_L(Y_j) } = q_L × 365,

where q_L = Prob{ Y_j > L }.
The three-year specification of the new standard
makes it hard to compare with the previous one-year stan-
dard. If, however, one computes the conditional probability
that the number of exceedances in the present year is less
than or equal to 0, 1,2 and 3 and multiplies that by the pro-
bability that the number of exceedances was 3, 2, 1 and 0,
respectively, for the previous two years, one then obtains a
one-year operating characteristic function.
    P_new = Prob{ d.i.c. | θ } = sum_{k=0}^{3} Prob{ d.i.c. | k, θ } P(k),

where

    P(k) = Prob{ sum_{j=1}^{2×365} I_L(Y_j) = k }

and

    Prob{ d.i.c. | k, θ } = Prob{ sum_{j=1}^{365} I_L(Y_j) ≤ 3 - k },

where k = 0, 1, 2, 3. A plot of the operating characteristic
function for the new regulation, P_new versus θ, is presented
in Figure 2.
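Under the same iid assumption for the daily maxima, the one-year function can be evaluated directly; the sketch below is such a check. It gives a compliance probability near 0.95 at 0.46 expected exceedance days and about 0.65 at one expected day (i.e., roughly a 35% chance of being declared out of compliance, close to the approximately 37% quoted above); exact values depend on the assumptions.

    # One-year operating characteristic of the new standard: compliance requires the
    # three-year total of exceedance days to be at most 3, conditioning on the k days
    # carried over from the previous two years.
    from scipy.stats import binom

    def prob_in_compliance_new(expected_exceedance_days):
        q_L = expected_exceedance_days / 365.0
        total = 0.0
        for k in range(4):                                   # exceedances in the previous two years
            p_k = binom.pmf(k, 2 * 365, q_L)
            p_this_year_ok = binom.cdf(3 - k, 365, q_L)      # at most 3 - k exceedances this year
            total += p_k * p_this_year_ok
        return total

    for theta in (0.46, 1.0, 2.0):
        print(f"expected exceedance days = {theta:4.2f}: "
              f"P(d.i.c.) = {prob_in_compliance_new(theta):.3f}")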
Figures 1 and 2 show the operating characteristic
curves computed as a function of (1) the expected number
of hours per year above 0.08 ppm for the old ambient
ozone regulation and (2) the expected number of days
per year with a maximum hourly observation above 0.12
ppm for the new ambient ozone regulation. We observe
that the 95 % de facto limit (the parameter value for which
the site in a given year will be declared to be in compliance
with 95 % probability) is 0.36 hours per year exceeding
0.08 ppm for the old standard and 0.46 days per year
exceeding 0.12 ppm for the new standard. If the expected
number of hours of exceedances of 0.08 ppm is one (and
therefore in compliance), the probability is approximately
26% of declaring a site to be not in compliance with the old
standard. If the expected number of days exceeding 0.12
ppm is one (and therefore in compliance), the probability is
approximately 37% of declaring a site to be not in compli-
ance with the new standard. (We are unaware of any other
legal context in which type I errors of this magnitude
would be considered reasonable.) Note that the parameter
value for which the site in a given year will be declared to
be in compliance with 95% probability is 0.36 hours per
year exceeding 0.08 ppm for the old standard and 0.46 days
per year exceeding 0.12 ppm for the new standard.
Neither curve provides sharp discrimination between
"good" and "bad" values of 0. Note that the old standard
did not specify any parameter value above which non-
compliance was defined. The new standard, however,
specifies that one expected day is the limit, thereby creating
an inconsistency between what the regulation says and how
it operates because of the large discrepancy between the
stated limit and the operational limit.
The construction of Figures 1 and 2 only requires the
assumption that the relevant observations are approxi-
mately identically and independently distributed (for the
old standard, the relevant observations are those for the
hourly ambient ozone measurements; for the new standard,
they are the maximum hourly average measurements of the
ambient ozone measurements each day). The construction
does not require knowledge of the distribution of ambient
ozone observations. If one has an estimate of this distribu-
tional form, however, a direct comparison of the new and
old regulation is possible in terms of the concentration of
ambient ozone (in units, say, of ppm). To illustrate this
point, suppose the random variable X_i is independently
and identically distributed according to a normal distribution
with mean μ and variance σ², that is, X_i ~ N(μ, σ²).
Then the probability of one observation being above the
limit L = 0.08 is

    p_L = Prob{ X_i > 0.08 } = 1 - Φ( (0.08 - μ)/σ ),    (4)

where Φ(·) is the cumulative distribution function of the
standard normal distribution. The probability that a site is
declared to be in compliance can be computed as a function
of μ by substituting p_L from (4) into (1).
For the new regulation let X_ij represent the one-hour
average (i = 1, ..., 12; j = 1, ..., 365), and
Y_j = max{ X_1j, ..., X_12j }. If X_ij ~ N(μ, σ²), then

    q_L = Prob{ Y_j > 0.12 } = 1 - [ Φ( (0.12 - μ)/σ ) ]^12,

and substituting q_L into the expression for P_new
one obtains the operating characteristic function for the
new standard.
For a fixed value of the variance σ², one can compute
the operating characteristic curves for the old and new
regulations to provide a graphical comparison of the way
these two regulations perform. Figure 3 shows these curves
for the old and new ambient ozone regulations computed as
a function of the mean hourly values when it is assumed
that σ = 0.02 ppm. We observe that the 95% de facto limit
is changed from 0.0046 ppm to 0.045 ppm. That is, it is
approximately ten times higher in the new ozone regula-
tion.
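The sketch below evaluates both operating characteristic functions as functions of the mean hourly level μ under the same assumptions (iid normal hourly values, σ = 0.02 ppm, and independence of the 12 hourly values within a day for the daily maximum). It is an illustration of the link through equation (4), not a reproduction of Figure 3.

    # Operating characteristic curves as a function of the mean hourly ozone level mu,
    # assuming X ~ N(mu, sigma^2) with sigma = 0.02 ppm.
    from scipy.stats import binom, norm

    SIGMA = 0.02

    def p_exceed_hour(mu, limit=0.08):
        return 1.0 - norm.cdf((limit - mu) / SIGMA)            # equation (4)

    def q_exceed_daily_max(mu, limit=0.12):
        return 1.0 - norm.cdf((limit - mu) / SIGMA) ** 12      # max of 12 independent hourly values

    def oc_old(mu):
        return binom.cdf(1, 4380, p_exceed_hour(mu))

    def oc_new(mu):
        q = q_exceed_daily_max(mu)
        return sum(binom.pmf(k, 730, q) * binom.cdf(3 - k, 365, q) for k in range(4))

    for mu in (0.0046, 0.045, 0.08):
        print(f"mu = {mu:.4f} ppm: P(d.i.c.) old = {oc_old(mu):.3f}, new = {oc_new(mu):.3f}")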
We have three observations to offer with regard to the
old and new regulations for ambient ozone standards. First,
notwithstanding EPA's comment to the contrary, the new
ozone regulation is not more statistical than the previous
one; like all environmental regulations, both the new and
old ozone regulations contain statistical parts, and, for that
reason, both are statistical. Changing the specification
from one in terms of a critical value to one in terms of a
parameter does not make it more statistical. It actually
introduced an inconsistency. The old standard did not
specify any parameter value as a limit but only an opera-
tional limit in terms of the observations. This operational
limit therefore constitutes the standard. The new standard, however, specifies
not only an intent in terms of what the desired limit is but
also an operational limit. The large difference between the
intended limit and the operational limit constitutes the incon-
sistency. This inconsistency is a potential and unnecessary
source of conflict. Second, the new regulation is dependent
on the ambient ozone level for the past two years as well as
the present year, which means that a sudden rise in the
ozone level might be detected more slowly. The new regu-
lation is also more complicated. Third, it is unwise first to
record and store every single hourly observation and then
to use only the binary observation as to whether the daily
maximum is above or below 0.12 ppm. This procedure
wastes valuable scientific information. As a matter of pub-
lic policy, it is unwise to use the data in a binary form
when they are already measured on a continuous scale.
The estimate of the upper 1/365 quantile is an unreliable statis-
tic. It is for this reason that type I and type II errors are as
high as they are. In fact, the natural variability of this
statistic is of the same order of magnitude as the change in
the limit which was so much in debate.
If instead, for example, one used a procedure based on
the t-statistic for control of the proportion above the limit,
as is commonplace in industrial quality control procedures
(4), one would get the operating characteristic curve plotted
in Figure 4 (see also appendix). For comparison, the curve
for the new regulation is also plotted as a function of the
expected number of exceedances per year. With the new
ozone regulation, the probability can exceed 1/3 that a par-
ticular site will be declared out of compliance when it is
actually in compliance. The operating characteristic curve
for the t-test is steeper (and hence has more discriminating
power) than that for the new standard. The modified pro-
cedure based on the t-test generally reduces the probability
that sites that are actually in compliance will be declared to
be out of compliance. In fact, it is constructed so that there
is 5% chance of declaring that a site is out of compliance
when it is actually in compliance in the sense that the
expected exceedance number is one per year. Furthermore,
when a violation has occurred, it is much more certain that
it will be detected with the t-based procedure. In this
respect, the t-based procedure provides more protection to
the public.
We do not conclude that procedures based on the t-
test are best. We merely point out that there are alterna-
tives to the procedures used in the old and new ozone stan-
dard. A basic principle is that information is lost when data
are collected on a continuous scale and then reduced to a
binary form. One of the advantages of procedures based on
the t-test is that they do not waste information in this way.
The most important point to be made goes beyond the
regulation of ambient ozone; it applies to regulation of all
pollutants where there is a desire to limit exposure. With
the aid of operating characteristic curves, informed judge-
ments can be made when an environmental regulation is
being developed. In particular, operating characteristic
curves for alternative forms of a regulation can be con-
structed and compared before a final one is selected. Also,
the robustness of a regulation to changes in assumptions,
such as normality and statistical independence of observa-
tions, can be investigated prior to the promulgation. Note
that environmental lawmaking, as it concerns the design of
environmental regulations, is similar to design of scientific
experiments. In both contexts, data should be collected in
such a way that clear answers will emerge to questions of
interest, and careful forethought can ensure that this desired
result is achieved.
Scientific Framework
The operating characteristic curve is only one com-
ponent in a more comprehensive scientific framework that
we would like to promote for the design of environmental
regulations. The key elements in this process are:
(a) Dose/risk curve
(b) Risk/benefit analysis
(c) Decision on maximum acceptable risk
(d) Stochastic nature of the pollution process
(e) Calibration of measuring instruments
(f) Sampling plan
(g) Decision function
(h) Distribution theory
(i) Operating characteristic function
Currently there may be some instances in which all of these
elements are considered in some form when environmental
regulations are designed. Because the particular purposes
and techniques are not explicitly isolated and defined, how-
ever, the resulting regulations are neither as clear nor as effec-
tive as they might otherwise be.
Often the first steps towards establishing an environ-
mental regulation are (a) to estimate the relationship
between the "dose" of a pollutant and some measure of
health risk associated with it and (b) to carry out a formal
or informal risk/benefit analysis. The problems associated
with estimating dose/risk relationships and doing
risk/benefit analyses are numerous and complex, and uncer-
tainties can never be completely eliminated. As a next step
a political decision is made - based on this uncertain
scientific and economic groundwork - as to the maximum
risk that is acceptable to society (c). As indicated in Figure
5, the maximum acceptable risk implies, through the
dose/risk curve, the maximum allowable dose. The first
three elements have received considerable attention when
environmental regulations have been formulated, but the
last six elements have not received the attention they
deserve.
The maximum allowable dose defines the compliance
set Θ0 and the noncompliance set Θ1, which is its comple-
ment. The pollution process can be considered (d) as a sto-
chastic process or statistical time series φ(θ; t). Fluctua-
tions in the measurements X can usefully be thought of as
arising from three sources: variation in the pollution level
itself φ, the bias b in the readings, and the measurement
error e. Thus X = φ + b + e. Often it is assumed that φ = θ,
a fixed constant, and that variation arises only from the
measurement error e; however, all three components
φ, b, and e can vary. Ideally b = 0 and the variance of e is
small.
Measurements will only have scientific meaning if
there is a detailed operational description of how the meas-
urements are to be obtained and the measurement process
is in a state of statistical control. A regulation must include
a specification relating to how the instruments are to be
calibrated (e). These descriptions must be an integral part
of a regulation if it is going to be meaningful. The subject
of measurement is deeper than is generally recognized,
with important implications for environmental regulation
(5, 6, 7). The pollution process and the observed process
as a function of time are indicated in Figure 5.
Logically the next question is (f) how best to obtain a
sample X = (X1, X2, ..., Xn) from the pollution process.
The answer to this question will be related to the form of
the estimator f(X) and (g) the decision rule

    d(f(X)) = 0 : process in compliance
              1 : process not in compliance.
The sample, the estimator, and the decision function are
indicated in Figure 5. Based on knowledge about the sta-
tistical distribution of the sample (h), one can compute (i)
the operating characteristic function
P = Prob{d(f(X)) = 0 | θ} and plot the operating charac-
teristic curve P versus θ. An operating characteristic func-
tion is drawn at the bottom of Figure 5. (In practice it
would probably be desirable to construct more than one
curve because, with different assumptions, different curves
will result). Projected back on the dose/risk relationship
(see Figure 5), this curve shows the probability of
encountering various risks for different values of θ if the
proposed environmental regulation is enacted. Suppose
there is a reasonable probability that the pollutant levels
occur in the range where the rate of change of the dose/risk
relationship is appreciable; then the steeper the dose/risk
function, the steeper the operating characteristic curve
needs to be if the regulation is to offer adequate protection.
The promulgated regulation should be expressed in terms
of an operational definition that involves measured quanti-
ties, not parameters. Figure 5 provides a convenient sum-
mary of our proposed framework for designing environ-
mental regulations.
In environmental lawmaking, it is most prudent to
consider a range of plausible assumptions. Operating
characteristic curves will sometimes change with different
geographical areas to a significant degree. Although this is
an awkward fact when a legislative, administrative, or
other body is trying to enact regulations at an international,
national, or other level, it is better to face the problem as
honestly as possible and deal with it rather than pretending
that it does not exist.
Operating Characteristic Curve as a Goal, Not a Conse-
quence
We suggest that operating characteristic curves be
published whenever an environmental regulation is
promulgated that involves a pollutant the level of which is
to be controlled. When a regulation is being developed,
operating characteristic curves for various alternative forms
of the regulation should be examined. An operating
characteristic curve with specified desirable properties
should be viewed as a goal, not as something to compute
after a regulation has been promulgated. (Nevertheless, we
note in passing that it would be informative to compute
operating characteristic curves for existing environmental
regulations.)
In summary, the following procedure might be feasi-
ble. First, based on scientific and economic studies of risks
and benefits associated with exposure to a particular pollu-
tant, a political decision would be reached concerning the
compliance set in the form of an interval of the type
0 < θ < θ0 for a parameter of the distribution of the pollu-
tion process. Second, criteria for desirable sampling plans,
estimators, and operating characteristic curves would be
established. Third, attempts would be made to create a
sampling plan and estimators that would meet these cri-
teria. The costs associated with different sampling plans
would be estimated. One possibility is that the desired pro-
perties of the operating characteristic curve might not be
achievable at a reasonable cost. Some iteration and even-
tual compromise may be required among the stated criteria.
Finally, the promulgated regulation would be expressed in
terms of an operational definition that involves measured
quantities, not parameters.
Injecting parameters into regulations, as was done in
the new ozone standard, leads to unnecessary questions of
interpretation and complications in enforcement. In fact,
inconsistencies (such as that implied by
Prob{d(f(X)) = 0 | θ}
-------
ties of violations not being detected (type II errors); indus-
tries would know the probabilities of being accused
incorrectly of violating standards (type I errors); and all
parties would know the costs associated with various pro-
posed environmental control schemes. We believe that the
operating characteristic curve is a simple, yet comprehen-
sive device for presenting and comparing different alterna-
tive regulations because it brings into the open many
relevant and sometimes subtle points. For many people it
is unsettling to realize that type I and type II errors will be
made, but it is unrealistic to develop regulations pretending
that such errors do not occur. In fact, one of the central
issues that should be faced in formulating effective and fair
regulations is the estimation and balancing of the probabili-
ties of such occurrences.
Acknowledgments
This research was supported by grants SES - 8018418
and DMS - 8420968 from the National Science Founda-
tion. Computing was facilitated by access to the research
computer at the Department of Statistics, University of
Wisconsin, Madison.
Appendix
The t-statistic procedure is based on the estimator
f(x̄) = (L − x̄)/s, where L is the limit (0.12 ppm), x̄ the sam-
ple average, and s the sample standard deviation. The deci-
sion function is

    d(f(x̄)) = 0 (in compliance) if f(x̄) ≥ c,
              1 (not in compliance) if f(x̄) < c.

The critical value c is determined from the requirement

(A2)   Prob{(L − x̄)/s ≥ c} = 1 − α  when the fraction above the limit equals θ0,

where z0 = Φ⁻¹(1 − θ0) and θ0 is the fraction above the
limit we at most want to accept (here 1/365).
The exact operating characteristic function is found
by reference to a non-central t-distribution, but for all prac-
tical purposes the following approximation is sufficient:

(A3)   Prob{(L − x̄)/s > c} ≈ Φ(√n (z_θ − c)/√(1 + c²/2)),

where z_θ = Φ⁻¹(1 − θ) and θ is the true fraction above the limit.
The operating characteristic function in Figure 4 is con-
structed using α = 0.05, θ0 = 1/365 and n = 3×365. Substitut-
ing (A3) into (A2) yields

(A4)   Φ(√n (z0 − c)/√(1 + c²/2)) = 1 − 0.05,

which solved for the critical value yields c = 2.6715. Refer,
for example, to (4) for more details.
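The appendix calculation can be reproduced in a few lines of code; the sketch below assumes the approximation (A3) as written above, solves (A4) for c, and then traces the operating characteristic curve of Figure 4.

    # Sketch of the appendix calculation; all names are illustrative.
    from scipy.stats import norm
    from scipy.optimize import brentq
    import numpy as np

    n = 3 * 365          # sample size used in the appendix
    alpha = 0.05
    theta0 = 1.0 / 365   # largest acceptable fraction above the limit
    z0 = norm.ppf(1.0 - theta0)

    def prob_f_exceeds_c(theta, c):
        """Approximation (A3): P{(L - xbar)/s > c} when a fraction theta exceeds L."""
        z_theta = norm.ppf(1.0 - theta)
        return norm.cdf(np.sqrt(n) * (z_theta - c) / np.sqrt(1.0 + c * c / 2.0))

    # (A4): choose c so that a site exactly at theta0 passes with probability 1 - alpha
    c = brentq(lambda cc: prob_f_exceeds_c(theta0, cc) - (1.0 - alpha), 0.0, 5.0)
    print(f"critical value c = {c:.4f}")   # about 2.67, matching the appendix

    # operating characteristic versus expected exceedance days per year (365 * theta)
    for exp_days in (0.5, 1.0, 2.0, 3.0):
        theta = exp_days / 365.0
        print(f"expected days = {exp_days:.1f}  "
              f"P(declared in compliance) = {prob_f_exceeds_c(theta, c):.3f}")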
Literature Cited
(1) National Primary and Secondary Ambient Air Quality
Standards, Federal Register 36, 1971 pp 8186-8187.
(This final rulemaking document is referred to in this
article as the old ambient ozone standard.)
(2) National Primary and Secondary Ambient Air Quality
Standards, Federal Register 44, 1979 pp 8202-8229.
(This final rulemaking document is referred to in this
article as the new ambient ozone standard.) The back-
ground material we summarize is contained in this
comprehensive reference.
(3) Javitz, H. J. J. Air Poll. Con. Assoc. 1980, 30, pp 58-59.
(4) Hald, A. "Statistical Theory with Engineering Appli-
cations"; Wiley, New York, 1952; pp 303-311.
(5) Hunter, J. S. Science 1980, 210, pp 869-874.
(6) Hunter, J. S. In "Appendix D", Environmental Moni-
toring, Vol IV, National Academy of Sciences, 1977.
(7) Eisenhart, C. In "Precision Measurements and Cali-
bration", National Bureau of Standards Special Publi-
cation 300 Vol. 1, 1969; pp 21-47.
(8) Porter, W. P.; Hinsdill, R.; Fairbrother, A.; Olson, L.
J.; Jaeger, J.; Yuill, T.; Bisgaard, S.; Hunter, W. G.;
Nolan, K. Science 1984, 224, pp 1014-1017.
(9) Rogers, W. H. "Handbook of Environmental Law",
West Publishing Company, 1977, St. Paul, MN.
47
-------
Figure 1. Operating characteristic curve for the 1971 ambient ozone standard (old
standard), as a function of the expected number of hours of exceedances of 0.08 ppm
per year. Note that if the old standard had been written in terms of an allowable limit
of one for the expected number of exceedances above 0.08 ppm, the maximum type I
error would be 1.00 - 0.73 = 0.27.
Figure 2. Operating characteristic curve for the 1979 ambient ozone standard (new
standard), as a function of the expected number of days of exceedances of 0.12 ppm
per year. Note that the maximum type I error is 1.00 - 0.63 = 0.37.
Figure 3. Operating characteristic curves for the old and the new standards as a func-
tion of the mean value of ozone measured in parts per million when it is assumed that
ozone measurements are normally and independently distributed with σ = 0.02 ppm.
Figure 4. Operating characteristic curves for the new ozone standard and a t-statistic
alternative as a function of the expected number of exceedances per year.
Figure 5. Elements of the environmental standard-setting process: Laboratory experi-
ments and/or epidemiological studies are used to assess the dose/risk relationship. A
maximum acceptable risk is determined through a political process balancing risk and
economic factors. The maximum acceptable risk implies a limit for the "dose" which
again implies a limit for the pollution process as a function of time. Compliance with
the standard is operationally determined based on a discrete sample X taken from a
particular site. The decision about whether a site is in compliance is reached through
use of a statistic f and a decision function d. Knowing the statistical nature of the pol-
lution process, the sampling plan, and the functional form of the statistics and the
decision function, one can compute the operating characteristic function. Projecting
the operating characteristic function back on the dose/risk relationship, one can assess
the probability of encountering various levels of undetected violation of the standard.
[Figures 1 through 4 appear here as plots of Prob(d.i.c.): against the expected number
of hours above 0.08 ppm (Figure 1), against the expected number of days above 0.12 ppm
(Figures 2 and 4, with the limit specified by the new standard marked), and against the
mean ozone level (Figure 3, with the old and new de facto limits marked).]
-------
EPA PROGRAMS AND ENVIRONMENTAL
STANDARDS
I appreciate the general points
that Dr. Bisgaard has made regarding
the development of environmental
standards. I agree that generally,
when standards are developed, most of
the technical emphasis is placed on
developing the magnitude of the absolute
number, which Dr. Bisgaard calls the
"limit part" of the standard. In
contrast, frequently little work is
expended developing the sampling program
and the rules that are used to evaluate
compliance with the limit in applica-
tion, which he calls the "statistical
part" of the standard. At EPA some
programs do a thorough and thoughtful
job of designing environmental stan-
dards. However, other EPA programs
could benefit from Dr. Bisgaard's work
because they have focused strictly on
the magnitude of the standard and have
not considered the "statistical part" of
the standard.
However, I insist that the ozone
standard and all of the National Ambient
Air Quality Standards fall into the
category of standards where both the
"limit part" and the "statistical part"
of the standard have been designed based
on extensive performance evaluations and
practical considerations.
There are other EPA programs that
have also done an excellent job of
designing and evaluating the "limit
part" and the "statistical part" of
their standards. For example, under
the Toxic Substances Control Act (TSCA)
regulations, there are procedures for
managing PCB containing wastes. In
particular, PCB soil contamination must
be cleaned up to 50 ppm. Guidances have
been prepared that stipulate a detailed
sampling and evaluation program and
effectively describe the procedure for
verifying when the 50 ppm limit has been
achieved. Also under the TSCA mandate,
clearance tests are under development
for verifying that, after the removal
of asbestos from a building, levels are
not different from background levels.
There are, however, many programs
at EPA that have not performed the
analysis and inquiry necessary to
design the "statistical part" of their
standards. One example is the Maximum
Contaminant Levels (MCLs) which are
developed and used by EPA'S drinking
water program. MCLs are concentration
limits established for controlling
pollutants in drinking water supplies.
Extensive health effect, engineering,
and economic analysis is used to choose
DISCUSSION
W. Barnes Johnson
the MCL concentration value. However,
relatively little work is done to ensure
that, when compliance with the MCL is
evaluated, appropriate sampling and
analysis methodologies are used to
ensure a designed level of statistical
performance.
Similarly, risk-based cleanup
standards are used in EPA's Superfund
program as targets for how much aban-
doned hazardous waste sites should be
cleaned up. These are concentration
levels either borrowed from another pro-
gram (e.g., an MCL) or developed based
on site-specific circumstances. A great
deal of effort has been expended on
discussions of how protective the actual
risk related cleanup standards should
be; however, virtually no effort has
been focused on the methodology that
will be used to evaluate attainment of
these standards. Drinking water MCLs
and Superfund cleanup standards could
benefit from the approaches offered by
Dr. Bisgaard.
PRACTICAL ENVIRONMENTAL STANDARDS
DESIGN: POLITICS, POLLUTANT BEHAVIOR,
SAMPLING AND OBJECTIVES
Dr. Bisgaard clearly points out
that his use of the ozone standard is
only for the purpose of example and
that the message of his presentation
applies to the development of any
standard. I have responded by trying
to identify other EPA program areas
that could benefit from the perspective
offered by Dr. Bisgaard's approach.
However, it is important to realize that
the development of the "statistical
part" of an environmental standard must
consider the nature of the political
situation, pollutant behavior, sampling
constraints, and the objective of the
standard. Ignorance of these practical
considerations can limit the usefulness
of a proposed standard regardless of the
theoretical basis. The developers of
the ozone standard were quite aware of
these contingencies, and this is reflected
in the form of the "statistical part" of
the ozone standard.
Central Tendency Versus Extremes
I must agree that a standard based
on central tendency statistics will be
more robust with better operating
characteristics than a standard based on
peak statistics. The difficulty is that
EPA is not concerned with estimating or
controlling the mean ozone concentra-
tion. Ozone is a pollutant with acute
health effects and, as such, EPA's
interest lies in control of the extremes
of the population. Peak statistics were
the primary concern when the ozone
standard was developed.
EPA, in the development of NAAQS's,
has tried to balance statistical per-
formance with objectives by examining
the use of other statistics that are
more robust and yet retain control of
the extremes. For example, EPA has
suggested basing the standard on the
fourth or fifth largest value; however,
commenters maintained that EPA would
lose control of the extremes and cause
undue harm to human health. It has also
been suggested that the peak to mean
ratio (P/M) be considered. The problem
with this approach is that the P/M is
highly variable across the United States
because of variation in the "ozone
season." The objective of developing a
nationally applicable regulatory frame-
work would be quite difficult if each
locale was subject to a different stan-
dard.
Decision Errors and Power
In addition, regardless of the
standard that is chosen, decision
errors will be highest when the true
situation at a monitoring station is at
or close to the standard. As the true
situation becomes well above or below
the standard, certainty increases and
our decisions become less subject to
error. Of course, it would be most
desirable to have an operating charac-
teristic function with a large distinct
step at the standard. This operating
characteristic would have no error even
when the true situation is slightly
above or below the standard; however,
this is virtually impossible. There-
fore, when standards are compared for
their efficacy, it is important to
compare performance along the continuum
when the true situation is well above,
at, and well below the standard. One
should not restrict performance evalu-
ation to the area at or immediately
adjacent to the standard, since for most
statistics the performance will be
quite low in this region.
Dr. Bisgaard points out from his
Figure 2 that when a site is in compli-
ance and at the standard, expecting to
exceed the standard on one day, there
is a 37% chance that the site may be
indicated as exceeding the standard.
However, it can also be shown that when
a site is below the standard and
expects to exceed the standard on one-
half of a day, there is only about a 6%
chance that the site may be indicated
as exceeding the standard. Conversely,
it can be pointed out that when the site
is above the standard and expects to
exceed the standard on three days, there
is only a 3% chance that the site will
be found to be in compliance.
Dr. Bisgaard is quite correct in
pointing out that the operating charac-
teristics of a standard based on the
mean are better than a standard based
on the largest order statistic. How-
ever, as mentioned above, a standard
based on the mean does not satisfy the
objectives of the ozone standard. EPA
staff have tendered proposals to
improve the operating characteristics
of the standard. One of these involved
the development of a three-tiered
approach that would allow a site to be
judged: in attainment, not in attain-
ment, or too close to call. The
existing structure of the attainment
program was not flexible enough to
permit this approach.
Pollutant Behavior
Ozone is a pollutant which exists
in the environment at a high mean ambi-
ent level of approximately one-third the
existing standard. Effort expended
trying to drive down peak statistics
indirectly by controlling the mean would
be futile. This is because mean levels
can only be reduced to the background
mean which, relative to the standard, is
high even in the absence of air
pollution.
Another point to consider is that
ozone behavior is influenced by both
annual and seasonal meteorological
effects. This is the reason that the
newest standard is based on three years
of data. The effect of an extreme year
is reduced by the averaging process
associated with a three year standard.
As mentioned above, work has also
focused on controlling the peak to mean
ratios; however, because ozone seasons
vary radically across the country, this
sort of measure would be difficult to
implement.
Dr. Bisgaard has also questioned
the new standard because of the use of
the term "expected." This terminology
was probably included in the wording
because of the many legal and policy
edits that are performed on a draft
regulation. It was not intended that
the term "expected" be applied in the
technical statistical use of the term.
The term was intended to show that EPA
had considered and reflected annual
differences in ozone conditions in the
three year form of the standard.
CONCLUSIONS
Dr. Bisgaard brings an interesting
and useful perspective to the develop-
ment of environmental standards. The
important idea is that an environmental
standard is more than a numerical limit
and must include a discussion of the
associated sampling approach and
decision function. I tried to extend this central idea by adding two
primary points. First, there are several programs within EPA that can
benefit from Dr. Bisgaard's perspective; however, the NAAQS program is
fully aware of and has considered these sampling and decision issues in
exhaustive detail. Second, the practical issues that influence the
implementation of an environmental standard are a primary constraint and
must be understood in order to develop a standard that offers a useful
measure of compliance.
53
-------
QUALITY CONTROL ISSUES IN TESTING COMPLIANCE WITH A REGULATORY
STANDARD: CONTROLLING STATISTICAL DECISION ERROR RATES
by
Bertram Price
Price Associates, Inc.
prepared under
EPA Contract No. 68-02-4139
Research Triangle Institute
for
The Quality Assurance Management Staff
Office of Research and Development
U. S. Environmental Protection Agency
Washington, D.C. 20460
ABSTRACT
Testing compliance with a regulatory standard intended to
control chemical or biological contamination is inherently a
statistical decision problem. Measurements used in compliance
tests exhibit statistical variation resulting from random
factors that affect sampling and laboratory analysis. Since a
variety of laboratories with potentially different performance
characteristics produce data used in compliance tests, a
regulatory agency must be concerned about uniformity in
compliance decisions. Compliance monitoring programs must be
designed to avoid, for example, situations where a sample
analyzed by one qualified laboratory leads to a noncompliance
decision, but there is reasonable likelihood that if the same
sample were analyzed by another qualified laboratory, the
decision would be reversed.
Two general approaches to designing compliance tests are
discussed. Both approaches have, as an objective, controlling
statistical decision error rates associated with the compliance
test. One approach, the approach typically employed, depends
on interlaboratory quality control (QC) data. The alternative,
referred to as the intralaboratory approach, is based on a
protocol which leads to unique QC data requirements in each
laboratory. An overview of the statistical issues affecting
the development and implementation of the two approaches is
presented and the approaches are compared from a regulatory
management perspective.
SECTION 1 - INTRODUCTION
Testing compliance with a regulatory standard intended to
control chemical or biological contamination is inherently a
statistical decision problem. Measurements used in compliance
tests exhibit statistical variation resulting from random factors
affecting sampling and laboratory analysis. Compliance decision
errors may be identified with Type I and Type II statistical
errors (i.e., false positive and false negative compliance test
results, respectively). A regulating agency can exercise control
over the compliance testing process by establishing statistical
decision error rate objectives (i.e., error rates not to be
exceeded). From a statistical design perspective, these error
rate objectives are used to determine the number and types of
measurements required in the compliance test.
Bias and variability in measurement data are critical
factors in determining if a proposed compliance test satisfies
error rate objectives. Various quality control (QC) data
collection activities lead to estimates of bias and variability.
An interlaboratory study is the standard approach to obtaining
these estimates. (The U.S. Environmental Protection Agency
[USEPA] has employed the interlaboratory study approach
extensively to establish bias and variability criteria for test
procedures required for filing applications for National
Pollution Discharge Elimination System [NPDES] permits - 40 CFR
Part 136, Guidelines Establishing Test Procedures for the
Analysis of Pollutants Under the Clean Water Act.) An
alternative means of estimating bias and variability that does
not require an interlaboratory study is referred to in this
report as the intralaboratory approach. The intralaboratory
approach relies on data similar to those generated in standard
laboratory QC activities to extract the information on bias and
variability needed for controlling compliance test error rates.
The purpose of this report is to describe and compare the
interlaboratory and intralaboratory approaches to collecting QC
data needed for bias and variability estimates which are used in
compliance tests. Toward that end, two statistical models, which
reflect two different attitudes toward compliance test
development, are introduced. Model 1, which treats differences
among laboratories as random effects, is appropriate when the
laboratory producing the measurements in a particular situation
is not uniquely identified, but is viewed as a randomly selected
choice from among all qualified laboratories. If Model 1 is
used, an interlaboratory study is necessary to estimate "between
laboratory" variance which is an essential component of the
compliance test. Model 2 treats laboratory differences as fixed
effects (i.e., not random, but systematic and identified with
specific laboratories). If Model 2 is used, bias adjustments and
estimates of variability required for compliance tests are
prepared in each laboratory from QC data collected in the
laboratory. Model 2 does not require estimates of bias and
variability from interlaboratory data.
The remainder of this report consists of five sections.
First, in Section 2, statistical models selected to represent the
data used in compliance tests are described. In Section 3, a
statistical test used in compliance decisions is developed. The
comparison of interlaboratory and intralaboratory approaches is
developed in two steps. Section 4 is included primarily for
purposes of exposition. The types and numbers of measurements
needed for a compliance test are derived assuming that the
critical variance components - i.e., within and between
laboratories - have known values. This section provides the
structure for comparing the interlaboratory and intralaboratory
approaches in the realistic situation where the variance
components must be estimated. The comparison is developed in
Section 5. A summary and conclusions are presented in Section 6.
SECTION 2 - STATISTICAL MODELS
Compliance tests are often complex rules defined as
combinations of measurements that exceed a quantitative standard.
However, a simple rule - an average of measurements compared to
the standard - is the basis for most tests. This rule provides
the necessary structure for developing and evaluating the
interlaboratory and intralaboratory approaches. Throughout the
subsequent discussion, the compliance standard is denoted by C0
and interpreted as a concentration - e.g., micrograms per liter.
Samples of the target medium are obtained, analyzed by chemical
or other appropriate methods and summarized as an average for use
in the test. The statistical design issues are:
o total number of measurements required;
o number and type of samples required; and
o number of replicate analyses per sample required.
The design issues are resolved by imposing requirements on the
compliance test error rates (i.e., the Type I and Type II
statistical error rates).
Many sources of variation potentially affect the data used
in a compliance test. The list includes variation due to sample
selection, laboratory, day and time of analysis, analytical
instrument, analyst, and measurement error. To simplify the
ensuing discussion, the sources have been limited to sample
selection, laboratory, and measurement error. (Measurement error
means analytical replication error or single analyst
variability.) This simplification, limiting the number of
variance components considered, does not limit the generality of
subsequent results.
The distribution of the compliance data is assumed to have
both mean and variance proportional to the true concentration.
(This characterization has been used since many types of
environmental measurements reflect these properties.) The data,
after transformation to logarithms, base e, may be described as:
EQ 1    Y_ijk = μ + B_i + S_ij + e_ijk

where i = 1(1)I refers to laboratory, j = 1(1)J refers to sample,
and k = 1(1)K refers to analytical replication. Two different
interpretations, referred to as Model 1 and Model 2, are considered
for the factors on the right side of equation 1.

In Model 1:

    μ     - ln(C), where C is the true concentration;
    B_i   - the logarithm of recovery (i.e., the proportion of the
            true concentration recovered by the analytical method),
            which is a laboratory-specific effect treated as random
            with mean zero and variance σ²_B;
    S_ij  - a sample effect which is random with mean zero and
            variance σ²_S; and
    e_ijk - replication error which is random with mean zero and
            variance σ²_e.

It follows that, denoting by Ȳ_i an average over samples and
replicates,

EQ 2    Var[Ȳ_i] = σ²_B + σ²_S/J + σ²_e/(J·K).
In Model 2, B_i is interpreted as a fixed effect (i.e., B_i is
bias associated with laboratory i). All other factors have the
same interpretation used in Model 1. Therefore, in Model 2:

    E(Ȳ_i) = μ + B_i

and

EQ 3    Var[Ȳ_i] = σ²_S/J + σ²_e/(J·K).

Differentiating between Model 1 and Model 2 has significant
practical implications for establishing an approach to compliance
testing. These implications are developed in detail below. For
now, it is sufficient to note that the B_i's are treated as scalar
factors uniquely associated with laboratories. If the identity of
the specific laboratory conducting an analysis is unknown because
it is viewed as randomly selected from the population of all
laboratories, then B_i is treated as a random effect. If the
laboratory conducting the analysis is known, B_i is treated as a
scalar, namely the bias of the ith laboratory.
SECTION 3 - STATISTICAL TEST: GENERAL FORMULATION
The statistical test for compliance is based on an average
of measurements, Ȳ. Assuming that the Y's are normally distributed
(recall that Y is the natural logarithm of the measurement),
noncompliance is inferred when

EQ 4    Ȳ > T

where T and the number of measurements used in the average are
determined by specifying probabilities of various outcomes of the
test. (For simplicity in exposition in this section, the
subscripts i, j, and k used to describe the models in Section 2
are suppressed. Also, σ_Y is used in place of the expressions in
EQ 2 and EQ 3 to represent the standard deviation of Ȳ. The more
detailed notation of EQ 2 and EQ 3 is used in the subsequent
sections where needed.)
59
-------
Let p1 and p2 be probabilities of declaring noncompliance
when the true means are d1·C0 and d2·C0 respectively (d1, d2 > 0),
and let

    μ0 = ln(C0)
    D1 = ln(d1),  D2 = ln(d2).

Requiring

EQ 5    p1 = P[ Ȳ > T : μ = μ0 + D1 ]

and

EQ 6    p2 = P[ Ȳ > T : μ = μ0 + D2 ]

leads to values of T and the number of measurements used to form
Ȳ by solving

EQ 7    (T − μ0 − D1)/σ_Y = Z_{1−p1}

and

EQ 8    (T − μ0 − D2)/σ_Y = Z_{1−p2}

where Z_{1−p1} and Z_{1−p2} are percentile points of the standard
normal distribution.

The solutions are:

EQ 9     T = σ_Y·Z_{1−p1} + μ0 + D1

EQ 10    σ_Y = (D2 − D1)/(Z_{1−p1} − Z_{1−p2}).
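As a worked illustration of EQ 9 and EQ 10 (with made-up numbers, not values from this report), the sketch below converts two design points into the threshold T and the standard deviation of Ȳ that the sampling plan must deliver.

    # Sketch of EQ 9 and EQ 10 with illustrative design points.
    from math import exp, log
    from scipy.stats import norm

    C0 = 10.0            # compliance standard, e.g., micrograms per liter (illustrative)
    d1, p1 = 1.0, 0.05   # at the standard, declare noncompliance with probability 0.05
    d2, p2 = 2.0, 0.95   # at twice the standard, declare noncompliance with probability 0.95

    mu0 = log(C0)
    D1, D2 = log(d1), log(d2)
    z1, z2 = norm.ppf(1.0 - p1), norm.ppf(1.0 - p2)

    sigma_Y = (D2 - D1) / (z1 - z2)     # EQ 10
    T = sigma_Y * z1 + mu0 + D1         # EQ 9

    print(f"required sd of Ybar = {sigma_Y:.4f}, threshold T = {T:.4f} (log scale)")
    print(f"threshold on the concentration scale = {exp(T):.2f}")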
This formulation allows considerable flexibility for
determining compliance test objectives. Consider the following
three special cases:
Case (i). When d1 = 1, p1 = α, d2 is any positive number
greater than 1, and p2 = 1 − β, the formulation reduces to the
classical hypothesis testing problem H0: μ = μ0 versus
H1: μ = μ0 + D2. The correct number of measurements establishes
the probabilities of Type I and Type II errors at α and β
respectively.

Case (ii). Let d1 = 1, d2 be a positive number less than 1,
p1 = 1 − β, and p2 = α. This formulation also reduces to the
classical hypothesis testing problem H0: μ = μ0 + D2 versus
H1: μ = μ0. (Note that μ0 + D2 < μ0, i.e., D2 < 0.)

Case (iii). Let 1 < d1 < d2. Set p1 < p2 to large values
(e.g., .90 and .99). This formulation imposes a high probability
of failing the compliance test when the mean concentration is d1 times the
standard, and a higher probability of failing when the mean is
further above the standard.
Case (ii) imposes a more stringent regulatory program on the
regulated community than Case (i). In Case (i), the regulated
community may establish control methods to hold the average
pollution level at the standard. In Case (ii), the pollution
level must be controlled at a concentration below the standard if
the specified error rates are to be achieved. In Case (iii), a
formal Type I error is not defined. Individual members of the
regulated community may establish the Type I error rate by
setting their own pollution control level - the lower the control
level, the lower the Type I error rate. In Case (iii), the
regulated community has another option also. There is a tradeoff
between the control level and the number of measurements used in
the compliance test. Individuals may choose to operate at a
level near the standard and increase the number of measurements
used in the compliance test over the number required to achieve
the stated probability objectives. The important difference
between Case (iii) and the two other cases is the responsibility
placed with the regulated community regarding false alarms (i.e.,
Type I errors). Since false alarms affect those regulated more
than the regulator, Case (iii) may be the most equitable approach
to compliance test formulation.
SECTION 4 - SAMPLE SIZE REQUIREMENTS: VALUES OF VARIANCE
COMPONENTS KNOWN
The discussion below follows the structure of Case (i)
described above. Based on the general formulation developed in
Section 3, the conclusions obtained also hold for Cases (ii) and
(iii).
MODEL 1
The compliance test is a statistical test of:

    H0: μ = μ0 = ln(C0)

versus

    H1: μ = μ0 + D2

where C0 is the compliance standard. Assuming the values of the
variance components are known, the test statistic is

    Z = (Ȳ_i − μ0)/(σ²_B + σ²_S/J + σ²_e/(J·K))^1/2.

Specifying the Type I error rate to be α leads to a test
that rejects H0 if

EQ 11    Z > Z_{1−α}

where Z_{1−α} is the (1−α)th percentile point of the standard normal
distribution. If the Type II error is specified to be β when the
alternative mean is μ0 + D2, then:

EQ 12    σ²_B + σ²_S/J + σ²_e/(J·K) = [D2/(Z_{1−α} + Z_{1−β})]²
62
-------
Any combination of J and K satisfying EQ 12 will achieve the
compliance test error rate objectives. However, unique values of
J and K may be determined by minimizing the cost of the data
collection program subject to the constraint in EQ 12. Total
cost may be stated as:
EQ 13    TC = J·C1 + J·K·C2

where C1 is the unit cost of obtaining a sample and C2 is the
cost of one analysis.

Using the Lagrange multiplier method to minimize EQ 13
subject to the constraint imposed by EQ 12 yields:

EQ 14    K = (σ_e/σ_S)·(C1/C2)^1/2

and

EQ 15    J = [σ_S·σ_e/(U − σ²_B)]·[σ_S/σ_e + (C2/C1)^1/2]

where

    U = [D2/(Z_{1−α} + Z_{1−β})]².
(If EQ 14 does not produce an integer value for K, the next
largest integer is used and J is adjusted accordingly.)
The number of replicate analyses for each sample, K,
increases as the ratio of the sampling cost to the analysis cost
increases and the ratio of the single analyst standard deviation
to the sampling standard deviation increases. In many
situations, the analysis cost, C2, is much larger than the
sampling cost, C1; and the sampling variance is much larger than
single analysis variability. Under these conditions, the number
of replicate analyses, K, will be 1 (i.e., each sample will be
analyzed only once).
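The allocation in EQ 14 and EQ 15 is easy to compute once the variance components, costs, and error-rate objectives are fixed. The sketch below uses illustrative inputs only (not values from this report); the rounding of K and the corresponding adjustment of J follow the parenthetical note above.

    # Sketch of the Model 1 sample-size allocation (EQ 14, EQ 15); inputs are illustrative.
    from math import ceil, log, sqrt
    from scipy.stats import norm

    sigma_B, sigma_S, sigma_e = 0.10, 0.20, 0.10   # between-lab, sampling, analyst sd (log scale)
    C1, C2 = 50.0, 200.0                           # cost of one sample, cost of one analysis
    alpha, beta = 0.05, 0.10
    D2 = log(1.5)                                  # alternative mean: 1.5 times the standard

    U = (D2 / (norm.ppf(1 - alpha) + norm.ppf(1 - beta))) ** 2    # right side of EQ 12

    K_raw = (sigma_e / sigma_S) * sqrt(C1 / C2)                                           # EQ 14
    J_raw = (sigma_S * sigma_e / (U - sigma_B**2)) * (sigma_S / sigma_e + sqrt(C2 / C1))  # EQ 15

    # round K up to an integer and adjust J so the EQ 12 constraint still holds
    K = max(1, ceil(K_raw))
    J = ceil((sigma_S**2 + sigma_e**2 / K) / (U - sigma_B**2))
    print(f"EQ 14/15 give K = {K_raw:.2f}, J = {J_raw:.1f}; rounded plan: K = {K}, J = {J}")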
63
-------
MODEL 2
Since

    E(Ȳ_i) = μ + B_i,

the statistic used in the compliance test must incorporate a bias
adjustment (i.e., an estimate of B_i). This can be achieved by
analyzing standard samples prepared with a known concentration C.
(Choosing C at or near C0 minimizes the effects of potential
model specification errors.) Let

EQ 16    b_ijk = Y_ijk − ln(C).

Since E(b_ijk) = B_i, the average b̄_i is an estimate of B_i and

    Var(b̄_i) = σ²_S'/J' + σ²_e/(J'·K')

where

    S'_ij - an effect associated with standard samples which is
            random with mean zero and variance σ²_S';
    J'    - the number of standard samples used to estimate B_i; and
    K'    - the number of analyses conducted on each standard sample.

(Note that single-analyst variability, σ²_e, is assumed to have
the same value for field samples and prepared samples.)

The test statistic is

EQ 17    (Ȳ_i − b̄_i − μ0)/[σ²_S/J + σ²_S'/J' + σ²_e·(1/(J'·K') + 1/(J·K))]^1/2.
64
-------
The cost function used to allocate the samples and replicates is:

EQ 18    TC = J·C1 + J'·C3 + (J·K + J'·K')·C2

where C3 is the unit cost for preparing a standard sample.

Type I and Type II error rates - α and β - are achieved if:

EQ 19    σ²_S/J + σ²_S'/J' + σ²_e·(1/(J'·K') + 1/(J·K)) = U

where

    U = [D2/(Z_{1−α} + Z_{1−β})]²,

as defined in the discussion of Model 1.

Minimizing costs subject to the constraint on variance
yields

EQ 20    K = (σ_e/σ_S)·(C1/C2)^1/2,

which is identical to the solution obtained for Model 1, and

EQ 21    K' = (σ_e/σ_S')·(C3/C2)^1/2,

EQ 22    J' = (σ_S'/U)·[σ_S·(C1/C3)^1/2 + 2·σ_e·(C2/C3)^1/2 + σ_S'],

and

EQ 23    J = J'·(σ_S/σ_S')·(C3/C1)^1/2.
The solutions for K and K' are similar. Each increases with
the ratio of sampling to analytical costs and the ratio of
analytical to sampling standard deviations.
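A parallel sketch for the Model 2 allocation in EQ 20 - EQ 23, again with illustrative (not report-supplied) variance components and costs:

    # Sketch of the Model 2 allocation (EQ 20 - EQ 23); inputs are illustrative.
    from math import log, sqrt
    from scipy.stats import norm

    sigma_S, sigma_Sp, sigma_e = 0.20, 0.15, 0.10   # field-sample, standard-sample, analyst sd
    C1, C2, C3 = 50.0, 200.0, 20.0                  # sampling, analysis, standard-preparation costs
    alpha, beta = 0.05, 0.10
    D2 = log(1.5)
    U = (D2 / (norm.ppf(1 - alpha) + norm.ppf(1 - beta))) ** 2

    K  = (sigma_e / sigma_S)  * sqrt(C1 / C2)                         # EQ 20
    Kp = (sigma_e / sigma_Sp) * sqrt(C3 / C2)                         # EQ 21
    Jp = (sigma_Sp / U) * (sigma_S * sqrt(C1 / C3)
                           + 2 * sigma_e * sqrt(C2 / C3) + sigma_Sp)  # EQ 22
    J  = Jp * (sigma_S / sigma_Sp) * sqrt(C3 / C1)                    # EQ 23
    print(f"K = {K:.2f}  K' = {Kp:.2f}  J' = {Jp:.1f}  J = {J:.1f} (before rounding)")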
SECTION 5 - SAMPLE SIZE REQUIREMENTS: VALUES OF VARIANCE
COMPONENTS UNKNOWN
In this section the interlaboratory and intralaboratory
approaches for obtaining estimates of the variance components
necessary to implement the designs developed in Section 4 are
described. As in Section 4, the design objective is to control
the compliance test error rates (i.e., the Type I and Type II
error probabilities). The discussion is simplified by
considering situations where the cost of analysis is signifi-
cantly greater than the cost of sampling, and the sample to
sample variability is at least as large as the analytical
variability:
    C2 >> C1  and  σ²_S ≥ σ²_e.

Under these conditions, K = 1 (i.e., each sample is analyzed only
once). Also, the value of K' determined from EQ 21 (i.e., the
number of replicate analyses performed on each standard sample)
will be set equal to 1 since the cost of preparing standard
samples for estimating B_i is significantly less than the cost of
analyzing those samples (i.e., C3 << C2).

When K = K' = 1, the variances used to define the test
statistic are, for Model 1 and Model 2 respectively:

EQ 24    Var(Ȳ_i) = σ²_B + (σ²_S + σ²_e)/J
                  = σ²_B + σ²_e'/J

and

EQ 25    Var(Ȳ_i − b̄_i) = (σ²_S + σ²_e)/J + (σ²_S' + σ²_e)/J'
                        = σ²_e'/J + σ²_e''/J'.

(The notations σ²_e' and σ²_e'' denote σ²_S + σ²_e and σ²_S' + σ²_e,
respectively.)

MODEL 1

Under Model 1, the between-laboratory variance σ²_B must be
estimated from interlaboratory data; the within-laboratory
variance σ²_e' may be estimated using interlaboratory data or it
may be estimated from the J measurements of field samples used to
form the average when the compliance test is performed.
As described by Youden (1975), an interlaboratory study
involves M laboratories (between 6 and 12 are used in practice)
which by assumption under Model 1 are randomly selected from the
collection of all laboratories intending to produce measurements
for compliance testing. For the discussion below, let n denote
the number of samples analyzed by each laboratory. (Youden
recommends n = 6 prepared as 3 pairs where the concentrations of
paired samples are close to each other but not identical.)
Let

    W_ij = ln(V_ij) − ln(C_j)

where {V_ij: i = 1(1)M; j = 1(1)n} are the measurements produced by
the i-th laboratory on the j-th sample, and {C_j: j = 1(1)n} are the
concentration levels used in the study. (Youden does not
recommend using logarithms; however, the logarithmic
transformation is convenient and is consistent with other
assumptions in Youden's design.) The statistical model
describing the interlaboratory study measurements is:

EQ 26    W_ij = B_i + e''_ij

where

    B_i    is an effect associated with the i-th laboratory and
           treated as a random variable with mean zero and
           variance σ²_B; and
    e''_ij is analytical error, the sum of single-analyst error
           and an effect associated with variation among standard
           samples, which has mean zero and variance σ²_e''.
Using standard ANOVA (analysis of variance) techniques, σ²_B
may be estimated from the "within laboratory" and "between
laboratory" mean squares, Q1 and Q2:

EQ 27    Q1 = ΣΣ(W_ij − W̄_i)²/[M·(n−1)]

and

EQ 28    Q2 = n·Σ(W̄_i − W̄)²/(M−1).

The estimate is:

EQ 29    s²_B = (Q2 − Q1)/n,

which reflects differences among the laboratories through the
quantity

EQ 30    Σ(B_i − B̄)².

Also, Q1 is an estimate of σ²_e''.
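The ANOVA estimates in EQ 27 - EQ 29 can be illustrated with simulated interlaboratory data; the study layout and variance values below are invented for the illustration and are not taken from any actual study.

    # Sketch of EQ 27 - EQ 29 using simulated Youden-style interlaboratory data.
    import numpy as np

    rng = np.random.default_rng(1)
    M, n = 8, 6                        # laboratories and standard samples per laboratory
    sigma_B, sigma_epp = 0.10, 0.05    # true between-lab sd and analytical sd (log scale)

    B = rng.normal(0.0, sigma_B, size=M)                      # laboratory effects
    W = B[:, None] + rng.normal(0.0, sigma_epp, size=(M, n))  # EQ 26: W_ij = B_i + e''_ij

    W_i = W.mean(axis=1)
    Q1 = ((W - W_i[:, None]) ** 2).sum() / (M * (n - 1))      # EQ 27: within-lab mean square
    Q2 = n * ((W_i - W.mean()) ** 2).sum() / (M - 1)          # EQ 28: between-lab mean square
    s2_B = (Q2 - Q1) / n                                      # EQ 29

    print(f"Q1 = {Q1:.4f}  (estimates sigma_e''^2 = {sigma_epp**2:.4f})")
    print(f"s2_B = {s2_B:.4f}  (estimates sigma_B^2 = {sigma_B**2:.4f})")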
The compliance test statistic may be defined either as

EQ 31a    R = (Ȳ_i − μ0)/(s²_B + Q1/J)^1/2

or

EQ 31b    R = (Ȳ_i − μ0)/(s²_B + s²_e'/J)^1/2

where s²_e' is the sample variance of the J measurements,

    s²_e' = Σ_j (Y_ij − Ȳ_i)²/(J − 1),

and {Y_ij = ln(X_ij), j = 1(1)J} are the measurements obtained
from field samples in the laboratory selected to conduct the
analyses. (Based on the discussion at the beginning of this
section, K is always equal to 1. Therefore, the notation
describing compliance measurements has been simplified, i.e.,
Y_ij ≡ Y_ij1.) Note that Q1 estimates the average variability
over laboratories, whereas s²_e' estimates variability for the
laboratory conducting the test. Also, Q1 is an estimate of
σ²_e'', the variability associated with the analysis of standard
samples; s²_e' is an estimate of the variability associated with
the analysis of field samples.
The ratios in EQ 31a and EQ 31b have approximate t-distri-
butions when the null hypothesis is true. The degrees of freedom
may be estimated by methods developed by Satterthwaite (1946).
Although it is possible to approximate the degrees of freedom and
use a percentile point of the t-distribution to define the test,
that approach is complicated. Developing it at this point would be
an unnecessary diversion. Instead, non-compliance will be
inferred when

EQ 32    R > Z_{1−α}

where Z_{1−α} is the (1 − α)th percentile point of the standard
normal distribution. (If R has only a few degrees of freedom,
which is likely, the Type I error rate will be larger than α.
The situation may be improved by using, for example, Z_{1−α/2} or
some other value of Z larger than Z_{1−α}. If necessary, exact
values of Z could be determined using Monte Carlo methods.)
The number of samples, J, that must be analyzed for the
compliance test is obtained by specifying that the probability of
the event in EQ 32 is equal to 1 − β when the true mean is
μ0 + D2. The value of J may be obtained either by using
approximations based on the normal distribution or the noncentral
t-distribution, or by estimates based on a Monte Carlo simulation
of the exact distribution of R.
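A sketch of the Monte Carlo calibration mentioned above: simulate the null distribution of R as defined in EQ 31b and compare its upper percentile with the normal value Z_{1−α}. All sample sizes and variance components below are illustrative.

    # Sketch of a Monte Carlo study of the null distribution of R (EQ 31b).
    import numpy as np

    rng = np.random.default_rng(2)
    M, n, J = 8, 6, 12                                # interlab study size; field samples per test
    sigma_B, sigma_epp, sigma_ep = 0.10, 0.05, 0.08   # sigma_B, sigma_e'', sigma_e'
    alpha, reps = 0.05, 20000

    R = np.empty(reps)
    for r in range(reps):
        # interlaboratory study -> s2_B via EQ 27 - EQ 29
        W = rng.normal(0, sigma_B, M)[:, None] + rng.normal(0, sigma_epp, (M, n))
        Q1 = ((W - W.mean(axis=1, keepdims=True)) ** 2).sum() / (M * (n - 1))
        Q2 = n * ((W.mean(axis=1) - W.mean()) ** 2).sum() / (M - 1)
        s2_B = max((Q2 - Q1) / n, 0.0)
        # compliance measurements under the null (true mean at the standard, mu0 = 0)
        Y = rng.normal(0, sigma_B) + rng.normal(0, sigma_ep, J)
        R[r] = Y.mean() / np.sqrt(s2_B + Y.var(ddof=1) / J)   # EQ 31b with mu0 = 0
    print(f"Monte Carlo (1 - alpha) point of R: {np.quantile(R, 1 - alpha):.3f} "
          f"versus normal value 1.645")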
If EQ 31a is used, the compliance test criterion (i.e., the
expression in EQ 32) becomes

EQ 33    GM(X_ij) > C0·exp[Z_{1−α}·(s²_B + Q1/J)^1/2]

where GM is the geometric mean of the J compliance measurements.
The right side of the inequality is a fixed number once the
interlaboratory study is completed. The advantage of this
approach is the simplicity realized in describing the compliance
test to the regulated community in terms of one measured
quantity, the geometric mean. The disadvantage is using Q1
rather than the sample variance calculated from the compliance
test measurements, which is likely to be a better estimate of
variability for the particular laboratory conducting the test.
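The operational simplicity of EQ 33 can be seen in a short sketch: once the interlaboratory study has produced s²_B and Q1, the adjusted limit on the right side is a single fixed number, and the test reduces to comparing the geometric mean of the field measurements with it. The numbers below are hypothetical.

    # Sketch of the EQ 33 criterion with hypothetical interlab estimates and field data.
    import numpy as np
    from scipy.stats import norm, gmean

    C0 = 10.0                         # compliance standard (concentration units)
    alpha = 0.05
    s2_B, Q1, J = 0.008, 0.006, 12    # interlaboratory estimates and number of field samples

    adjusted_limit = C0 * np.exp(norm.ppf(1 - alpha) * np.sqrt(s2_B + Q1 / J))  # EQ 33 right side
    X = np.array([9.2, 10.5, 11.1, 9.8, 10.9, 12.3, 10.1, 9.5, 11.7, 10.4, 9.9, 11.2])
    print(f"adjusted limit = {adjusted_limit:.2f}, GM = {gmean(X):.2f}, "
          f"noncompliant: {gmean(X) > adjusted_limit}")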
MODEL 2
Under Model 2, estimates of variance from interlaboratory
study data are unnecessary. Since the laboratory conducting the
analyses for the compliance test is uniquely identified, the
laboratory factor, B_i, is a scalar, and the variance component,
σ²_B, does not enter the model. The variance estimates needed for
the compliance test can be obtained from the measurements used to
compute Ȳ_i and b̄_i.

The test statistic is

EQ 34    t = (Ȳ_i − b̄_i − μ0)/(s²_e'/J + s²_e''/J')^1/2

which has an approximate t-distribution with degrees of freedom
equal to J + J' − 2 when the true mean is μ0. (The statistic
would have an exact t-distribution if σ²_e' were equal to σ²_e''.)
Noncompliance is inferred if

EQ 35    t > t_{1−α}.

J and J' are determined by requiring that the probability of the
expression in EQ 35 be equal to 1 − β when the true mean is
μ0 + D2. This calculation can be made using the noncentral t-
distribution. When σ²_e' = σ²_e'', the noncentrality parameter is
D2/[σ_e'·(1/J + 1/J')^1/2]. (Note that this formulation implies a
tradeoff between J and J' for achieving the compliance test error
rate objectives.) If σ²_e' and σ²_e'' are not equal, the correct
value to replace t_{1−α} in EQ 35 and the values of J and J' may be
determined using Monte Carlo methods.
SECTION 6 - DISCUSSION AND CONCLUSIONS
Both statistical models considered above are consistent with
reasonable approaches to compliance testing. The two approaches,
however, have distinctly different data requirements.
Model 1, through EQ 31a, reflects "the conventional"
approach to compliance testing. A "target value for control,"
C0, is established (e.g., either a health-based standard or a
"best available control technology" standard) and then adjusted
upward to account for both analytical variability and laboratory
differences. Using EQ 33, noncompliance is inferred when the
geometric mean of the compliance test measurements, GM(X_ij), is
larger than C0 multiplied by a factor which combines estimates
reflecting variability between laboratories, σ²_B, and analytical
variability within laboratories. Since an estimate of σ²_B is
required in the Model 1 approach, an interlaboratory study is
required also. The role of σ²_B, which reflects laboratory
differences, is to provide insurance against potentially
conflicting compliance results if one set of samples were
analyzed in two different laboratories. Systematic laboratory
differences (i.e., laboratory bias) could lead to a decision of
noncompliance based on analyses conducted in one laboratory and a
decision of compliance based on analyses of the same samples
conducted in another laboratory.
In practice, σ²_B is replaced by s²_B, an estimate obtained
from the interlaboratory study. The variability of this estimate
also affects the compliance test error rates. If the variance of
s²_B is large, controlling the compliance test error rates becomes
complicated. Requiring that more field samples be analyzed
(i.e., increasing J) may help. However, increasing the amount of
interlaboratory QC data to reduce the variance of s²_B directly
may be the only effective option. Based on interlaboratory QC
data involving 6 to 12 laboratories, which is current practice,
the error in s²_B as an estimate of σ²_B is likely to be as large
as 100%. If interlaboratory QC data were obtained from 30
laboratories, the estimation error still would exceed 50%.
(These results are based on a 95% confidence interval for σ²_B/s²_B
determined using the chi-square distribution.)
interlaboratory data collection involving 12 laboratories is
expensive and time consuming, it is doubtful if a much larger
effort would be feasible or could be justified.
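The parenthetical claim can be checked with a short chi-square calculation. The sketch below treats (M−1)·s²_B/σ²_B as approximately chi-square with M−1 degrees of freedom, which ignores the Q1 term in EQ 29, so the result is only indicative.

    # Sketch of the 95% interval for sigma2_B / s2_B as a function of the number of labs.
    from scipy.stats import chi2

    for M in (6, 12, 30):
        df = M - 1
        lower = df / chi2.ppf(0.975, df)   # lower bound of sigma2_B / s2_B
        upper = df / chi2.ppf(0.025, df)   # upper bound of sigma2_B / s2_B
        print(f"M = {M:2d}: 95% interval for sigma2_B/s2_B is ({lower:.2f}, {upper:.2f})")

With 6 to 12 laboratories the interval spans several-fold errors, and even with 30 laboratories the upper bound remains well above 1.5, consistent with the statement above.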
Using Model 2 and the intralaboratory approach, a regulatory
agency would not attempt to control potential compliance decision
errors resulting from laboratory differences by using an estimate
of "between laboratory" variability to adjust the compliance
standard. Instead, compliance data collected in each laboratory
would be adjusted to reflect the laboratory's unique bias and
variability characteristics. In many situations, bias for any
specific laboratory can be estimated as precisely as needed using
QC samples. Also, the variance of the bias estimate, which is
needed for the compliance test, can be estimated from the same
set of QC sample measurements. An estimate of analytical
variability required for the compliance test can be estimated
from the measurements generated on field samples. Therefore, all
information needed to develop the compliance test can be obtained
within the laboratory that produces the measurements for the
test.
From a regulatory management perspective, both approaches
(i.e., Model 1 using interlaboratory QC data and Model 2 using
intralaboratory QC data) lead to compliance tests that satisfy
specified decision error rate objectives. However, the
intralaboratory approach based on Model 2 appears to be the more
direct approach. The design for producing data that satisfy
error rate objectives is laboratory specific, acknowledging
directly that laboratories not only have different bias factors,
but also may have different "within laboratory" variances. Each
laboratory estimates a bias adjustment factor and a variance
unique to that laboratory. Then, the number of samples required
for that specific laboratory to achieve specified error rate
objectives is determined. As a result, each laboratory produces
unbiased compliance data. Also, compliance test error rates are
identical for all laboratories conducting the test. Moreover,
the data used to estimate laboratory bias and precision are
similar to the QC measurements typically recommended for every
analytical program. In summary, the intralaboratory approach
appears, in general, to provide a greater degree of control over
compliance test error rates while using QC resources more
efficiently than the approach requiring interlaboratory QC data.
73
-------
REFERENCES
Satterthwaite, F.E. (1946), "An Approximate Distribution of
Estimates of Variance Components", Biometrics Bulletin, Vol. 2,
pp. 110-114.
Youden, W.J.; and Steiner, E.H. (1975), Statistical Manual of
AOAC. Association of Official Analytical Chemists, Washington,
D.C.
74
-------
DISCUSSION
George T. Flatman
U.S. Environmental Protection Agency
Dr. Bertram Price has something worth saying
and has said it well in his paper entitled,
"Quality Control Issues in Testing Compliance
with a Regulatory Standard: Controlling Sta-
tistical Decision Error Rates."
The Environmental Protection Agency is
emphasizing "Data Quality Objectives." Dr. Price
has expressed the most important of these objec-
tives in his title, "Controlling Statistical
Decision Error Rates." The paper is timely for
EPA because it demonstrates how difficult the
statistics and the implementation are for data
quality objectives.
In Section 1...Introduction, an "interlabora-
tory study approach" is suggested for establish-
ing "bias and variability criteria." This is
theoretically valid but may not be workable in
practice. In contract laboratory programs,
standards are in a much cleaner matrix (dis-
tilled water instead of leachate) and sometimes
run on cleaner instruments that have not just
run dirty specimens. Standards or blank samples
cannot avoid special treatment by being blind
samples since they are in a different matrix
than the field samples. Thus, in practice, the
same matrix and analytical instruments must be
used to make "interlaboratories study" an un-
biased estimate of the needed "bias and vari-
ability criteria." Both the theory and the
implementation must be vigorously derived.
In Section 2...Statistical Models the enumer-
ation of the components of variation is important
for both theory and practice. More precise
enumeration of variance components than the
mutually exclusive and jointly exhaustive theory
of "between and within" is needed for adequate
sampling design. I agree with Dr. Price that
"simplification, limiting the number of variance
components, does not limit the generality of
subsequent results," but I suggest it makes
biased or aliased data collection more probable.
For example, the Superfund Interlaboratories
Studies of the Contract Labs has identified the
calibration variance of the analytical instrument
as the largest single component of longitudinal
laboratory (or interlaboratories) variance.
If this component of variation is not enumerated
explicitly, I suggest this component of variance
could be omitted, included once, or included
twice. If all the field samples and lab repli-
cate analyses were run between recalibrations of
the analytical instrument, the recalibration
variance would be omitted from the variances of
the data. If the analytical instrument were
recalibrated in the stream of field samples and
between lab replicate analyses, the recalibration
variance would be aliased with both the sample
and lab variances, and thus added twice into the
total variance. With these possible analyses
scenarios the recalibration component of variance
could be either omitted or included twice. This
potential for error can be minimized through the
vigorous modeling of all the process sources of
variation in the components of variance model.
This is not a criticism of the paper, but it is a
problem for the implementation of this paper by
EPA's data quality objectives.
Section 3...Statistical Test is very important
because it specifically states the null and
alternative hypotheses with their probability
alpha of type I error and probability beta of
type II error. This may appear pedantic to the
harried practitioner, but given the importance
of the decision it is absolutely essential to data
quality objectives. Dr. Price's alternative
hypothesis and his beta-algebra are complicated
by EPA's interpretation of the law, "no exceed-
ence of background values or concentration
limits" (40 CFR part 264). This requires an
interval alternative hypothesis
     H1 : μ > μ0
rather than Dr. Price's point hypothesis
     H1 : μ = μ0 + D.
Lawyers should be more aware of how they increase
the statistician's work. Beta is a function or
curve over all positive D.
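The point can be made concrete with a short calculation (an editorial sketch; the one-sided z-test, the symbols sigma and n, and the default alpha are illustrative assumptions, not the paper's notation):

    # With the interval alternative H1: mu > mu0, the type II error rate is a
    # curve over D = mu - mu0, not a single number.  A one-sided z-test on n
    # measurements with standard deviation sigma is assumed.
    import math
    from scipy.stats import norm

    def beta_curve(D_values, sigma, n, alpha=0.01):
        """Type II error probability at each exceedance D for a one-sided z-test."""
        z_a = norm.ppf(1.0 - alpha)
        return [norm.cdf(z_a - D * math.sqrt(n) / sigma) for D in D_values]

    # Example: beta_curve([0.5, 1.0, 2.0], sigma=1.0, n=9) shows beta falling
    # toward zero as the true exceedance D grows.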
I think it is important to mention in any
environmental testing that beta is more critical
or important than in historical hypotheses test-
ing. Classically the hypotheses are formulated
so that a type II error is to continue with the
status quo when in fact a new fertilizer, brand
of seed potato, etc., would be better. Thus, the
loss associated with the type II error is low and
its probability of occurrence can be large (e.g.,
20 percent) in agricultural experiments. This is
not true in environmental hypotheses testing!
The hypotheses usually make a type II error the
misclassification of "dirty" as "clean" with a
loss in public health and environmental protec-
tion. Thus, beta representing the probability of
this loss in public health and environmental
protection should be set arbitrarily low like
alpha (1% or 5%).
Sections 4 and 5...Sample Size Requirements
derive equations for numbers of field samples
and lab replicates as a function of cost and
variances. The formulas digitize the process
for precise decisions between number of field
samples and number of lab replicates. The for-
mulas indicate that an analysis instrument like
GCMS, because of its high incremental analysis
cost and low variance, requires few replications
(K=1), but other analysis instruments such as
radiation counters may not. These formulas have
a practical value because of the diversity of
analysis instruments and pollutants.
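The trade-off can be sketched with the familiar two-stage allocation result (an editorial illustration; the cost and variance symbols are hypothetical, and the formula is the textbook optimum, not necessarily the exact expression of Sections 4 and 5):

    # Choose the number of laboratory replicates K per field sample by balancing
    # incremental analysis cost against analytical variance.
    import math

    def optimal_replicates(var_lab, var_field, cost_lab, cost_field):
        """K minimizing the variance of the overall mean for a fixed budget in a
        two-stage design: var_field is field-to-field variance, var_lab is
        analytical (within-laboratory) variance, cost_field the cost of one
        field sample, and cost_lab the incremental cost of one lab analysis."""
        return max(1, round(math.sqrt((cost_field / cost_lab) * (var_lab / var_field))))

    # A costly, precise instrument (large cost_lab, small var_lab) drives K
    # toward 1, consistent with the GCMS example; cheaper, noisier counting
    # methods push K higher.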
Section 5...Sample Size Requirements: Values
of Variance Components Unknown details the rigors
of variance components estimation through unknown
degrees of freedom and non-central t-distribution.
-------
It might be asked whether only the sum of the
variances is needed for testing or "quality
assurance" (i.e., rejection of outliers). This is true, but
"quality improvement" requires the estimation of
each component of variance. The analysis is more
meaningful and usable if the individual compo-
nents have an estimate.
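A minimal sketch of what "quality improvement" asks for, using the standard one-way method-of-moments estimators (the grouping and the balanced layout are illustrative assumptions of this note):

    # Separate "within" and "between" variance components from replicate
    # measurements; groups might be laboratories, field samples, or calibration
    # periods.  Equal group sizes are assumed for simplicity.
    def variance_components(groups):
        """groups: list of equal-sized lists of replicate measurements.
        Returns (within-group variance, between-group variance component)."""
        k, n = len(groups), len(groups[0])
        means = [sum(g) / n for g in groups]
        grand = sum(means) / k
        msw = sum(sum((x - m) ** 2 for x in g)
                  for g, m in zip(groups, means)) / (k * (n - 1))
        msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
        return msw, max(0.0, (msb - msw) / n)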
Section 6...Discussion and Conclusions states
that the interlaboratory QC model (variable effects)
and the intralaboratory QC model (fixed effects)
"lead to compliance tests that satisfy specified
decision error rate objectives." This theoreti-
cal position of the paper is confirmed by the
empirical findings of the Superfund Interlabora-
tories Comparison of the Contract Laboratories.
This study found that within-lab variance is of
comparable magnitude to between-lab variance.
The test and model used should correspond to
whether one lab or more than one lab performs
the actual chemical analysis of the data.
In conclusion, Dr. Bertram Price has rigor-
ously presented the algorithms and the problems
for "Controllina Statistical Decision Error
Rates." This paper enumerates the statistical
problems in applying hypothesis testing to real
world data. Unfortunately, hypotheses testing is
made deceptively simple in many textbooks, and the
true complexity is discovered in practice through
the expensive consequences of a wrong decision.
The serious problems discussed in Dr. Price's
paper are needed to sober the superficial use of
"alphas, betas, and other probabilities" in data
quality objective statements. The paper is a
timely and vigorous summary of components of vari-
ance modeling and hypotheses testing.
Acknowledgments: The discussant wishes to thank
Forest Garner and Evangelos Yfantis for their
advice, review, and insight gained from Super-
fund interlaboratories testing.
Notice: Although the thoughts expressed in this
discussion have been supported by the United
States Environmental Protection Agency, they have
not been subject to Agency review and therefore
do not necessarily reflect the views of the
Agency and no official endorsement should be
inferred.
-------
ON THE DESIGN OF A SAMPLING PLAN TO VERIFY COMPLIANCE WITH EPA STANDARDS
FOR RADIUM-226 IN SOIL AT URANIUM MILL-TAILINGS REMEDIAL-ACTION SITES
R.O. Gilbert, Pacific Northwest Laboratory; M.L. Miller, Roy F. Weston,
Inc.; H.R. Meyer, Chem-Nuclear Systems, Inc.
1.0 INTRODUCTION
The United States government is required under the Uranium Mill Tailings
Radiation Control Act (U.S. Congress Public Law 95-604, 1978) to perform
remedial actions on inactive uranium mill-tailings sites that had been federally
supported and on properties that had been contaminated by the tailings. The
current Environmental Protection Agency (EPA) standard for 226Ra (henceforth
denoted by Ra) in soil (EPA, 1983) requires that remedial action must be taken
if the average concentration of Ra in surface (0- to 15-cm) soil over any
area of 100 square meters exceeds the background level by more than 5 pCi/g,
or if the average exceeds 15 pCi/g for subsequent 15-cm thick layers of soil
more than 15 cm below the surface. Since there are many thousands of 100
square-meter areas that must be evaluated, the soil sampling plan should be
as economical as possible while still meeting the intent of the regulations.
After remedial action at a site has been conducted, the field sampling
procedure that has been used to determine whether the EPA standard was met was
to first grid the entire site into 10-m by 10-m plots. Then, in each plot,
20 plugs of surface soil were collected and physically mixed together from
which a single 500-g composite sample was withdrawn and assayed for Ra. If
this measurement was > 5 pCi/g above background, then additional remedial
action was required. Recently, based on cost considerations and the study
described in Section 2.0, the number of soil plugs per composite sample was
reduced from 20 to 9.
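Stated as a rule, the plot-level decision is simply the following (an editorial restatement; the function name and arguments are illustrative, not part of the remedial-action procedure itself):

    def plot_needs_remediation(composite_ra, background_ra, limit=5.0):
        """True if the composite 226Ra measurement for a 10-m by 10-m surface
        plot exceeds background by more than `limit` pCi/g."""
        return (composite_ra - background_ra) > limit

    # Example: plot_needs_remediation(7.8, 1.1) is True, so additional remedial
    # action would be required for that plot.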
In this paper we discuss a verification acceptance-sampling plan that is
being developed to reduce costs by reducing the number of composite soil samples
that must be analyzed for Ra. In Section 2.0 we report on statistical analyses
of Ra measurements on soil samples collected in the windblown mill-tailings
flood plain at Shiprock, NM. These analyses provide guidance on the number
and size of composite soil samples and on the choice of a statistical decision
rule (test) for the acceptance-sampling plan discussed in Section 4.0. In
Section 3.0, we discuss the RTRAK system, which is a 4-wheel-drive tractor
equipped with four Sodium-Iodide (NaI) gamma-ray detectors. The RTRAK is being
developed for measuring radionuclides that indicate the amount of Ra in surface
soil. Preliminary results on the calibration of these detectors are presented.
-------
2.0 PERCENT ACCURACY OF MEANS AND PROBABILITIES OF DECISION ERRORS
In this section we statistically analyze Ra measurements of composite
soil samples collected from the windblown mill-tailings flood-plain region at
Shiprock, NM. This is done to evaluate the impact on probabilities of false
positive and false negative decision errors resulting from reducing the number
of soil plugs per composite soil sample from 21 to 9 or 5 and from collecting
1, 2, or 3 composite samples per plot. We also consider how these changes
affect the accuracy of estimated mean Ra concentrations.
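The kind of calculation involved can be sketched by simulation (an editorial illustration with placeholder values, not the Shiprock data; normality of the composite measurements is an assumption of this sketch):

    # How a larger composite-sample standard deviation, from pooling fewer soil
    # plugs, raises the chance of a false negative at the 5 pCi/g-above-
    # background criterion.
    import random

    def false_negative_rate(true_excess, sigma_composite, n_composites=1,
                            limit=5.0, trials=100000, seed=1):
        """Probability that the mean of n composite measurements falls at or
        below the limit when the true excess over background is `true_excess`
        pCi/g."""
        rng = random.Random(seed)
        misses = 0
        for _ in range(trials):
            mean = sum(rng.gauss(true_excess, sigma_composite)
                       for _ in range(n_composites)) / n_composites
            misses += mean <= limit
        return misses / trials

    # Comparing, say, sigma_composite = 0.7 (21 plugs) with 1.1 (9 plugs) at a
    # true excess of 6 pCi/g shows the false negative rate rising as
    # compositing is reduced.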
2.1 FIELD SAMPLING DESIGN
The Shiprock study involved collecting multiple composite soil samples
of different sizes from 10 plots in the flood-plain region after an initial
remedial action had occurred. Five sizes of composite samples were collected;
those formed by pooling either 5, 8, 9, 16, or 21 plugs of soil.
Figure 1 shows the windblown mill-tailings flood-plain region and the
location of ten 30-m by 30-m study areas from which composite soil samples
were collected. Eight- and 16-plug composite samples were formed by pooling
soil plugs that were collected over the ten 30-m by 30-m areas according to
the three sampling patterns shown in the lower half of Fig. 2. The 5-, 9-,
and 21-plug composite samples were formed by pooling soil plugs collected
from only the central 10-m by 10-m plot in each 30-m by 30-m area using the
three patterns shown in the upper half of Fig. 2.
Up to nine composite samples of each type were formed in each of the ten
areas. Each composite sample of a given type used the same pattern that had
been shifted slightly in location. For example, referring to Fig. 2, the
21-plug composite sample number 1 in a given 10-m by 10-m plot was formed by
pooling soil plugs collected at the 21 positions numbered 1 in the plot.
This design allowed replicate composite samples of a given type to be collected
without altering the basic pattern that would be used in practice.
Each soil plug was collected to a depth of 15 cm using a garden trowel.
The plugs collected for a given composite sample were placed in a bucket and
mixed vigorously by stirring and shaking. The composite sample analyzed for
Ra consisted of about 500 g of the mixed soil.
-------
[Figure 1 not reproduced; the map legend marks the 10-m by 10-m plots where
226Ra concentrations were expected to exceed 5 pCi/g.]
FIGURE 1. Location of the Ten 30-m by 30-m Areas in the Windblown Mill-
tailings Flood Plain Region at Shiprock, New Mexico, Within Which
Multiple-composite Soil Samples were Collected Following Initial
Removal of Surface Soil.
-------
[Figure 2 not reproduced; panels show the positions where soil cores were
taken for the 21-, 9-, and 5-plug composites within the central 10-m by 10-m
plot and for the 16- and 8-plug composites across the 30-m by 30-m area.]
FIGURE 2. Sampling Patterns for 5-, 8-, 9-, 16-, and 21-plug Composite Soil
Samples Collected From Ten 30-m by 30-m Areas in the Windblown
Mill-tailings Flood Plain at Shiprock, New Mexico.
-------
2.2 DESCRIPTION OF THE DATA
The Ra measurements for the composite samples are plotted in Figs. 3, 4,
and 5. The figures also give the arithmetic mean, x, the standard deviation,
s, and the number of replicate composite samples, n. We wish to determine
the extent to which the true standard deviation, σ, increases when fewer than
21 plugs are used to form a composite sample. To avoid confusion, we point
out that Figs. 4 and 5 indicate that Ra measurements of most 5-, 9-, and 21-
plug samples from Areas 1, 3, and 4 are larger than measurements for the 8-
and 16-plug samples from those areas. This is believed to have occurred
because the soil in the central 10-m by 10-m plot (from which 5-, 9-, and 21-
plug composite samples were formed) had higher concentrations of Ra than the
soil in the 30-m by 30-m areas from which the 8- and 16-plug samples were
formed (see Fig. 1).
Measurements for Areas 8, 9, and 10 were below 5 pCi/g (Fig. 3) and the
standard deviations ranged from 0.2 to 0.8 pCi/g, with no apparent trends in
s with increasing number of plugs per sample. The data in Fig. 4 indicates
that 5-plug sample data sets may be more skewed than those for 9- or 21-plug
samples, at least for some plots. The measurements for Areas 1, 4, and 7 (Fig.
5) had higher means and were more variable than those for the areas in Figs.
3 and 4. In Fig. 6 are plotted the values of s from Figs. 3, 4, and 5 to
show more clearly the changes in s that occurred as the number of plugs per
composite sample changed.
2.3 ESTIMATING AND MODELING CHANGES IN STANDARD DEVIATIONS
In this section we first estimate the changes in σ that occur as the
number of plugs per composite sample decreases from 21 to a smaller number.
Then a model for these changes is developed for use in later sections.
A simple model for the ratio of standard deviations is obtained by assuming
that measurements of Ra in individual soil plugs are uncorrelated, that the
soil plugs are thoroughly mixed together before the 500-g aliquot is removed,
and that the standard deviation between soil plugs does not change as the
-------
[Figure 3 not reproduced; individual measurements, with n, x, and s listed for
each data set, are plotted against the number of soil plugs per composite
sample (5, 8, 9, 16, 21) for Areas 8, 9, and 10.]
FIGURE 3. 226Ra Measurements (pCi/g) of 5-, 8-, 9-, 16-, and 21-plug
Composite Soil Samples Taken from Areas 8, 9, and 10 in the
Windblown Mill-tailings Flood Plain at Shiprock, New Mexico.
x and s are the Arithmetic Mean and Standard Deviation of the
n Measurements for each Data Set.
-------
[Figure 4 not reproduced; individual measurements, with n, x, and s listed for
each data set, are plotted against the number of soil plugs per composite
sample (5, 8, 9, 16, 21) for Areas 2, 3, 5, and 6.]
FIGURE 4. 226Ra Measurements (pCi/g) of 5-, 8-, 9-, 16-, and 21-plug
Composite Soil Samples Taken from Areas 2, 3, 5, and 6 in the
Windblown Mill-tailings Flood Plain at Shiprock, New Mexico.
x and s are the Arithmetic Mean and Standard Deviation of the
n Measurements for each Data Set.
-------
[Figure 5 not reproduced; individual measurements, with n, x, and s listed for
each data set, are plotted against the number of soil plugs per composite
sample (5, 8, 9, 16, 21) for Areas 1, 4, and 7.]
FIGURE 5. 226Ra Measurements (pCi/g) of 5-, 8-, 9-, 16-, and 21-plug
Composite Soil Samples Taken from Areas 1, 4, and 7 in the
Windblown Mill-tailings Flood Plain at Shiprock, New Mexico.
x and s are the Arithmetic Mean and Standard Deviation of the
n Measurements for each Data Set.
-------
[Figure 6 not reproduced; the standard deviation s (pCi/g) is plotted against
the number of plugs per composite sample (5, 8, 9, 16, 21) for each area, with
each curve labeled by the area's mean 226Ra concentration.]
FIGURE 6. Standard Deviations of Multiple Composite Samples from Areas
1 Through 10 at the Windblown Mill-tailings Flood Plain at
Shiprock, New Mexico. Mean 226Ra Concentrations for each Area
are Given to Illustrate that Areas with Lower Average
Concentrations tend to have Smaller and More Stable Standard
Deviations.
-------
sampling pattern (see Fig. 2) changes. Under these assumptions we have the
model
     σ_m / σ_21 = (21/m)^(1/2),

where σ_m denotes the true standard deviation of Ra measurements of m-plug
composite samples.
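As a numerical illustration of the form of this model (an editorial sketch; the 21-plug standard deviation used below is a placeholder value, not a fitted result from the Shiprock data):

    import math

    def predicted_sigma(sigma_21, m):
        """Standard deviation predicted for an m-plug composite sample, given
        the standard deviation sigma_21 observed for 21-plug composites, under
        the uncorrelated, well-mixed, constant-plug-variance assumptions
        stated above."""
        return sigma_21 * math.sqrt(21.0 / m)

    # Example: predicted_sigma(0.7, 9) is about 1.07 pCi/g and
    # predicted_sigma(0.7, 5) about 1.43 pCi/g.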