Ambient Ozone and Human Health An Epidemiological Analysis, Volume 3


United States       Office of Air Quality       EPA-450/5-85-005c
Environmental Protection   Planning and Standards      August 1985
Agency         Research Triangle Park NC 27711
Air
Ambient Ozone And
Human Health:
An  Epidemiological
Analysis

Volume III

-------
                      AMBIENT OZONE AND HUMAN HEALTH:

                        AN EPIDEMIOLOGICAL ANALYSIS
                     Paul R. Portney and John Mullahy
                         Resources for the Future
                            1616 P Street, N.W.
                          Washington, D.C.  20036
                                Volume III
                               Final Report
                                 June 1985
Submitted to the Economic Analysis Branch, Office of Air Quality Planning
and Standards, Environmental Protection Agency, Research Triangle Park,
North Carolina 27711, under contract number 68-02-3583.

-------
                                 DISCLAIMER
     This report has been reviewed by the Office of Air Quality Planning
and Standards, U. S. Environmental Protection Agency, and approved for
publication as received from Resources for the Future.  The analysis and
conclusions presented in this report are those of the authors and should
not be interpreted as necessarily reflecting the official policies of
the U. S. Environmental Protection Agency.

-------
                             TABLE OF CONTENTS
                                                                 Page
CHAPTER 1.  INTRODUCTION                                          1-1
CHAPTER 2.  ECONOMETRIC ESTIMATION OF HEALTH STATUS MODELS

   2.1   Introduction                                             2-1
   2.2   Some Problems with Least-Squares Estimation of Health
          Status Models                                           2-3
   2.3   Tobit Health Outcome Models                              2-7
   2.4   Cragg-class Health Outcome Models                        2-11
   2.5   Truncated-Normal Estimation                              2-15
   2.6   Heckman's Approach:  Sample Selection                    2-19
   2.7   Tobin, Cragg and Heckman:  A Digression                  2-21
   2.8   Poisson-distributed Health Outcome Measures              2-30
   2.9   Geometric-distributed Health Outcome Measures            2-33
   2.10  Multinomial-distributed Health Outcome Measures          2-35
   2.11  Estimation of Grouped Data Models Under the
          Normality Assumption                                    2-38
   2.12  Summary and Conclusions                                  2-40

CHAPTER 3.  AIR POLLUTION MONITORS AND INDIVIDUAL EXPOSURE        3-1

CHAPTER 4.  URBAN AIR QUALITY AND ACUTE RESPIRATORY ILLNESS

   4.1   Introduction                                             4-1
   4.2   Framework for the Analysis                               4-3
   4.3   Model Specification                                      4-8
   4.4   Empirical Results                                        4-11
   4.5   Policy Implications                                      4-20
         Appendix                                                 4-28

CHAPTER 5.  CONSTRUCTING A LIFETIME SMOKING PROFILE USING THE
            1979 HEALTH INTERVIEW SURVEY                          5-1

CHAPTER 6.  CIGARETTE SMOKING, AIR POLLUTION, AND RESPIRATORY
            ILLNESS:  AN ANALYSIS

   6.1   Introduction                                             6-1
   6.2   Smoking, Pollution, and Acute Illness                    6-2
   6.3   Data and Estimation Strategy                             6-5
   6.4   Estimates of Model Parameters and Relative Risks         6-14
         Appendix                                                 6-23
CHAPTER 7.  CHRONIC RESPIRATORY DISEASE                           7-1

-------
                                                                  Page

CHAPTER 8.  ADDITIONAL SENSITIVITY ANALYSES

   8.1   The Effects of Precipitation on Acute Health Status      8-1
   8.2   Sample Size, Model Specification, and Parameter
           Estimate Sensitivity                                   8-2
   8.3   Poisson Regression Analysis of Volume I
           Models (48), (49), and (50)                            8-4
   8.4   Sensitivity to Aggregation Across Smoking and Chronic
           Illness Status                                         8-6

-------
                               Chapter  1
                             INTRODUCTION
     Volume I of this report presents a great many results from our basic




analysis of ozone and acute and  chronic illness.  As indicated in Volume I,




we had to  make  a number of decisions along the way in the early stages of




our research.   One of the most important  concerned the tradeoff between the




breadth of our  analysis  as  opposed  to  the possible in-depth investigation




of a relatively small number of hypotheses.   In other words, should we use




fairly  standard   statistical   techniques   to  investigate  dose-response




relationships for a broad range of possible illnesses, using a variety of




explanatory variables, and in a number of different population groups?  Or




should  we  winnow  out  a relatively few "promising"  relationships  using




preliminary tests, and  then  allocate time and  computing  resources  to the



application of more powerful  statistical  techniques to these relationships?




     With some  exceptions we  adopted the former approach.   Because of our




unique  and   comprehensive   data  on  air  pollution  concentrations  and



individuals' health and socioeconomic status, we elected to test a wide




variety of  hypotheses  about  the  possible relationships  between  ozone and




other air  pollutants  on the one hand,  and a  variety  of  acute  and chronic




illnesses  on  the  other.   In  addition, we  examined separately  several







different degrees of  severity  for the acute  illnesses  we examined and we



also conducted separate analyses  for  adults  and for children (aged 17 and



below).  Of course,  as we point out in Volume I, we did conduct additional



sensitivity analyses where our  preliminary  research  suggested statistically



significant  associations  between ozone  and  the  dependent  variable  in



question.  Nevertheless,  the general approach was a  "broad brush" one.



     Since  completing the  work  reported  in Volumes  I  and  II,   we  have



received  many helpful  comments  on  and   constructive  criticisms  of  the



approach  we took in our analyses.   Many of  these  comments  came  in  an



EPA-sponsored public Peer Review Meeting held in Raleigh (N.C.) on April 3,



1984.  There experts from the epidemiological, clinical, biostatistical,



and economic  communities  presented us with a  number of useful suggestions



for further work.   In addition, we have received many useful comments from



our EPA  project  officers  and from our colleagues at RFF  and elsewhere who



have  read  with  interest our  original  work.    Finally, we  have given



considerable thought ourselves  to  ways in which  the  original analysis might



be extended or improved.



     Thus, over the past  year we  have tried to  conduct additional  analyses



that address some of the most important questions arising out of our original



work.  Volume III below presents  the results of some of that work.  We say



"some"  because  we  are  continuing  to conduct  additional epidemiological



analyses  as  time and resources permit,  at least some  of  which  may not be



complete  until  after this  report has been submitted.    In one  sense,  in



fact,  we hope to never  be  "done" with  our work even  though  this report



completes our analysis for EPA.






     In one way or  another,  each  of  the following chapters is designed to



address one  or more  of  the  questions  raised in  our  earlier work.   For



instance,  Chapter  2 is purely methodological.   It presents  a  variety of



different   estimation  techniques   that  may  be   appropriate   when  the



assumptions that lie  behind  ordinary  least squares (OLS)  are violated as



several careful readers of  the studies in Volume I suggested they might be.



There we consider the sorts of problems that arise with OLS in the special



context of  health  effects  estimation.   Among the alternatives  to  OLS we



consider    are   Tobit    estimation,    Cragg-type    "hurdles"    models,



sample-selection and count-data models, multinomial logit  approaches, and



grouped  dependent   variable   techniques.    This   chapter   is  a  long  and



technical  one, we  realize.    However,  we feel  it is necessary to  set the



stage for  the  empirical  work presented in  later  chapters;  it should also



prove  useful  to anyone  about to embark for  the first time on  his own



estimation  of   air pollution (or  other  environmentally-related)  health



effects.



     Chapter 3  is much shorter and simpler.  It addresses a common reaction



to our  original  analysis.    Remember  that  in Volume I, the  air  pollution



readings we  assign to each  individual  are those measured at  the  monitor



nearest his or her home, provided that the monitor in question is no more



than twenty miles away (sometimes less).  We continue to believe this is



preferable to the alternative most commonly suggested in the literature:
matching each individual in an SMSA to the air pollution



concentrations  averaged  over all the  monitors in  the  SMSA,  or within  a



subset of  it.  However, because most individuals  do travel  about  within an







area, it is possible that the area-wide averaging approach might better



characterize the exposures of at least some individuals.  If so, these



averaged  concentrations  would   be   the  appropriate  ones   to  use  in



epidemiological analyses.   Hence,  it  is  of  interest to  know how closely



correlated  are  the readings  at  the  monitor(s)  nearest  the  individuals'



dwellings with the average of  all  the monitors within a given radius of the



dwelling.  This analysis is  undertaken  in  Chapter 3.



     That exercise in turn forms the basis for some sensitivity analysis we



conduct  in Chapter 4 of the  effect  on our  findings of different rules about



matching air  pollution concentrations  to individuals.  Chapter  4 extends



and  improves  upon our original work in  a number of other  ways,  as  well.



For  instance,  building on the methodology presented in Chapter  2 of this



volume,  in Chapter  4  we investigate the determinants  of  acute respiratory



disease using Poisson regression instead of the OLS and logit techniques



employed in Volume I.  For reasons presented in Chapters 2 and 4, we



consider this to be a  significant improvement on our earlier analysis.  In



addition, Chapter 4 presents a more sophisticated analysis of  the possible



non-linearities  that   may  characterize  the  dose-response  relationship



linking   acute  respiratory   disease   to   ambient   ozone   and  sulfate



concentrations.     Not   only   do  we   consider   spline-type  functional



relationships, but  we also  allow for  a  variety  of  non-linearities  within



the (already non-linear) Poisson approach.  This, too, sheds additional



light  on  the  analysis  in Volume  I.    Finally,  we  believe  that  the



elasticity-of-response calculations contained in the last  part  of Chapter 4



are  a useful  way  to view the possible effects  of changes  in ambient ozone






concentrations on human  health.   This suggests how  our  findings  might be



used in applied policy analysis  if it were desired to do so.



     One of the respects  in which  the analysis in Volume I could clearly be



improved concerns the  measures  of cigarette smoking we  employed.   Recall



that in most of the models estimated, we used NCIGS, a continuous measure



of  daily  cigarette  consumption, or SMOKY1NO, a  dummy  variable indicating



whether or not  an individual  is a never- or former  smoker  as opposed to a



current smoker.  We  also occasionally used  an  additional  dummy variable,



FORMER, to distinguish between those who  do  not smoke now but once did from



those who never smoked.



     However,  even this additional treatment  resulted in our finding a less



pronounced relationship between smoking and  ill health than  we might have



expected (although we  hasten  to point  out  that  even our  crude measures of



smoking  were  often   positively  and   significantly  associated  with  ill



health).  One reason  for  this  was  our inability in the Volume I analyses to



make  use  of  all  the data  provided  in the  HIS  Smoking  Supplement  on



individuals'  lifetime smoking histories. Thus, one of our  purposes  in the



analyses we have conducted since April 1984 was to develop a measure of



lifetime  smoking behavior  and employ it  in our  analyses.   Chapter  5



presents the approach we took in  doing so.   While the  HIS  smoking data do



not enable us to specify an exact profile of respondents' lifetime smoking



habits,  they  do permit  the  construction  of several  plausible profiles.



These are discussed in some detail in Chapter 5.  Among other things, that







chapter discusses the differing weights that might  be  given to cigarettes



smoked years ago compared to recent  cigarette consumption.



     Chapter 6 presents  the results  of additional empirical analysis of the



relationship  between  air  pollution   (ozone   and  sulfates)   and  acute



respiratory disease. It  extends the analysis in Chapter  4 of this volume,



and  all  the  work  in Volume  I,  in several important  ways.    First,  the



analysis  in Chapter  6   incorporates  the  more sophisticated measures  of



individual  smoking.   For  instance, in  addition to  NCIGS,  a  measure  of



current smoking  habit,  the  analysis also  includes the  variable  PACKS,  a



proxy  for  lifetime  cigarette  consumption.    The  analysis  in Chapter  6



extends our earlier research  in another suggested  direction.   That is,  it



models the  individuals'   health  outcomes as  a multinomial logit process  in



which, on  any given day during the two-week recall period, an individual



could  report  no restriction  of activity  at  all,  a minor  restriction  in



activity attributable to respiratory illness (with no bed confinement),  or



what  we  refer to as  a   "severe" respiratory restriction—i.e.,  one which



requires confinement to bed for at least half the day.  For reasons spelled



out  in Chapter  6,  we feel this  is another  productive  way to model  the



possible  relationship  between ambient air quality and  acute  respiratory



disease.   (The chapter   also  contains  a very  brief  discussion of ordered



logit as an estimation approach.)



     Chapter  6  is  intended to  accomplish  one  additional objective.   The



comments  on our  work in Volume I  often  expressed surprise that cigarette



smoking  did  not  completely   "swamp"   ambient  ozone  pollution  in  its



contribution to acute (and chronic)  illness—even though we  generally found






a positive and significant association between smoking and illness.   Thus,



one purpose of the analysis in Chapter 6 is to explore in somewhat greater



detail the  relative  risks posed  by  cigarette smoking and  air  pollution.



This is important since considerable public resources are currently devoted



to  reducing both.   While far  from  being  comprehensive on  the  subject,



Chapter 6 does explore these relative  risks in some detail.



     As  the  preceding pages  suggest,  most  of  the emphasis in  Volume  III



falls  on the  possible relationship  between  ambient ozone (and  sulfate)



concentrations and  acute  respiratory health.    This  reflects  the  heavy



emphasis  given acute  health  effects  in  our  earlier work as  described in



Volume I.  However, we did devote some attention in Volume I to possible



relationships  between  long-term  exposures   to  air  pollution  and  the



prevalence of chronic  respiratory and  other kinds of  disease.



     Chapter 7 below  presents  the results of some preliminary reanalysis of



those findings, specifically those dealing with chronic respiratory disease.



The analysis  below  extends our  original  work in several  important  ways.



First, we restrict our attention  in Chapter 7 to a group of individuals who



at  the  time of  the  1979  HIS had lived in their present location for at



least ten years.  This is a more "residentially stable" group than that



analyzed in Volume I, an important consideration in the epidemiological



investigation of  chronic illness.  In  addition, the individuals analyzed in



Chapter 7 are divided into two distinct  groups depending on whether or not



they received a special supplement  (or  "probe") on  respiratory  disease as



part of the 1979 HIS.  Because the reported incidence of chronic



respiratory disease  varies by a  factor  of six  between  those  who  received







the probe and those who did not, we felt the two groups should be analyzed



separately rather than pooled as  in our  original  analysis.   Finally,  this



reanalysis includes some model specifications in which ozone is measured by



the ambient concentration averaged over all the monitors within ten or twenty



miles of each resident's dwelling.



     Finally, in Chapter 8 we report our responses to a variety of comments



or queries on Volumes I and II.  None of these required the preparation of



a separate chapter, but each was important enough to merit consideration.



     One final note about Volume III.  Several of the chapters have been



written to serve more than one  purpose.   For instance,  a slightly revised



version of Chapter 4 will  be appearing in the Journal of Urban Economics in



1986 under the title "Urban Air Quality and Acute Respiratory Illness."



Similarly, the material in Chapter 6 formed the basis of a paper presented



at the 1984 annual meetings of the American Economic Association in Dallas.



While  we  have  modified  them   for  incorporation  into  Volume  III,  some



material—particularly the brief descriptions of  the HIS and air pollution



data bases—will occasionally appear repetitive.

-------
                                Chapter 2



              ECONOMETRIC ESTIMATION  OF HEALTH STATUS MODELS








2.1  INTRODUCTION



     In the last decade or so, estimation of microeconomic models of



individual behavior using large individual- or household-level data sets



has  flourished and  proven  an  important  advance  in   applied economics.



Details typically masked in  aggregate  time-series  data analysis are often



available in individual  cross-sectional data, thus enabling the testing of



hypotheses about responses of individuals to  changes in  constraints.



     In such micro datasets  one is prone  to find measures that economists



would characterize either as  corner-solution realizations  of instantaneous



optimizing decisions or as discrete representations of  such decisions.  An



example of the former  case would  be where  one has data  on  the number of



hours an individual worked in the market over a given year, and for some



subset of individuals no market hours were worked.



     An instance  of  the  latter case is  where data are available only on



whether or not  an  individual  had  purchased  some  consumer durable over the



previous twelve months, but  not on the  amount of the expenditure.  Assuming



such statistical models to be the  objectives  of estimation, then the former



is an example  of  what  have  come to be known as limited dependent variable



(LDV) models,  while the  latter  is a member  of  the class of qualitative



dependent variable (QDV) models.   Tobin's pioneering 1957  paper on durables



demand is the forerunner of  LDV estimation in economics.   Using data on 735



households,  Tobin modeled the ratio of durables expenditures to disposable



income;  for 183 of these spending units,  no durables were  purchased during

the time period of interest and a "corner solution" had to be treated.  As



is well  known,  the  solution to this problem was the  genesis  of  the Tobit



estimator,  which will  be  discussed below.  Note that if Tobin only had data



on whether or not there was some durable purchased rather than on the



actual amount, a QDV model  (such as  binary probit or logit) would have been



the appropriate approach.



     In  this  chapter  we discuss the theory  and practice of  econometric



estimation of LDV and QDV models as they pertain to health status measures



such   as   respiratory-related   restricted   activity   days,    or   the



presence-absence of a chronic  respiratory  condition.   It  is  seen that,



owing  to the  nature  of the available  micro  data,  standard  econometric



techniques  such  as  ordinary  least  squares  (OLS)   will  typically  be



inappropriate  tools  for the  analysis  of  the  relationships between air



pollution and human health.  The available data on health status measures,



rather, are generally of a nature best described as qualitative or limited



dependent  variables.   This being  the  case,  more  complicated  estimation



techniques are in general required in order to obtain consistent estimates



of the parameters governing the health status outcomes.  Maximum likelihood



is the estimation method most commonly used  in such analysis.



     The treatment  here  is necessarily  brief.   However,  several excellent



surveys are available  for the reader who  wishes more detailed treatments of



the topics to be discussed  below. The 1981  and 1984 surveys by Amemiya are



excellent overviews of  qualitative  and  limited dependent variable models,



respectively, and the 1983  monograph by Maddala provides  broad coverage in



both  these  areas.    The  often-cited  1981  volume  edited  by Manski  and



McFadden is also an excellent  survey of  topics  in qualitative and limited



dependent variable estimation.

     Some definitional preliminaries are appropriate here.  First, standard



practice  is followed,  with  random   variables  represented  in  upper-case



notation, their  realizations  in lower-case.   Second,  the  terms "censored



distribution" and "truncated  distribution"  will  be  used with considerable



frequency below.  The introduction to chapter 6 of Maddala (1983) provides



a good heuristic explanation of  censoring and truncation as they pertain to



the normal econometric  model.



     The plan for the  remainder  of  this  chapter  is as follows.   First, we



briefly  assess  problems associated  with  least squares estimation  of  air



pollution-health status models.  Then we turn to a discussion of some



techniques  that might  be   considered more or less  appropriate for  the



estimation  problems attendant   to  estimation  of  health  status  models.



Following  this  we   turn  to  a  discussion  of  prediction  based  on  the



estimation of the various models.  A  summary concludes the chapter.








2.2 SOME PROBLEMS WITH  LEAST-SQUARES  ESTIMATION OF HEALTH STATUS MODELS



     As  mentioned   above,   this  chapter  surveys   various  econometric



techniques  for  estimating   health   outcome models.    As  will  be  seen



throughout,   these  techniques  are   generally  such  that  iterative  (and



sometimes costly)  maximum  likelihood methods  are required in order  to



obtain consistent and efficient  estimates of the models' parameters.  Since



sound econometric policy  analysis depends  at  least in part on obtaining



consistent,   if  not   efficient,  parameter  estimates,  the question  is then



begged:   why is  it   necessary to utilize  such complicated  and  expensive



methods  when simple  and inexpensive least-squares algorithms  abound?  In a



nutshell, the answer is that least-squares  estimates of models of the genre



we are considering will generally be  biased and inconsistent.  The purpose

of this section is to briefly demonstrate why this is so.  To this end, a
brief exposition of the fundamentals of the basic linear econometric model
is presented, the requirements for consistent estimation of the parameters
are explained, and why at least some of these requirements are unlikely to
be met in the health status models to be considered is discussed.  The
exposition of the linear model and its properties follows that of Schmidt
(1976), which is among the most lucid in published texts.

     Of fundamental concern is consistent and, if possible, efficient
estimation of the parameter vector $\beta$ in the case where random
variables $Y_i$ are distributed in some manner with finite mean $\mu_i$ and
finite variance $\sigma^2$.  Specifying $\mu_i = X_i\beta$ makes the problem
nontrivial, with $X_i$ a $1 \times k$ vector of independent variables which
will in general include measures of air pollution and other covariates, and
$\beta$ a $k \times 1$ vector of unknown parameters to be estimated.  Given
these assumptions, we can write

$$Y_i = X_i\beta + \varepsilon_i, \qquad (1)$$

where, because $E(Y_i) = X_i\beta$ and $\mathrm{Var}(Y_i) = \sigma^2$,
$\varepsilon_i$ has mean zero and variance $\sigma^2$.  The unobserved
realizations of $\varepsilon_i$ correspond to the observed realizations of
$Y_i$, $y_i$.  It is assumed that there exist T independent observations on
$(y_i, X_i)$.

     The model described satisfies full ideal conditions (Schmidt, p. 2)
when

     i) X is a nonstochastic matrix of rank k
-------
2-5
matrix of independent variables, y will henceforth denote the $T \times 1$
vector of the realizations $y_i$.

     It can be demonstrated that, with or without the assumption of
normality for $\varepsilon$, the OLS estimator of $\beta$,
$\hat\beta = (X'X)^{-1}X'y$, is consistent:

     i) $\hat\beta = (X'X)^{-1}X'y$
                   $= (X'X)^{-1}X'(X\beta + \varepsilon)$
                   $= \beta + (X'X)^{-1}X'\varepsilon$

        $E(\hat\beta) = \beta + (X'X)^{-1}X'E(\varepsilon)$
                      $= \beta + 0$
                      $= \beta$, so that $\hat\beta$ is unbiased for $\beta$;

    ii) The covariance matrix of $\hat\beta$ is $\sigma^2(X'X)^{-1}$, so
        that, with all limits taken for $T \to \infty$,

        $\lim \sigma^2(X'X)^{-1} = \sigma^2 \lim (X'X)^{-1}$
                                 $= \sigma^2 \lim (X'X/T)^{-1}T^{-1}$
                                 $= \sigma^2 \lim (Q^{-1}T^{-1})$
                                 $= 0$,

        because from above Q is finite nonsingular so that its inverse
        exists and is finite, and $\sigma^2$ is finite;

   iii) Therefore, since $\hat\beta$ is unbiased and its covariance matrix
        vanishes in the limit, then $\hat\beta$ is consistent.
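
As an informal numerical illustration of the consistency argument (not part of the original analysis), the sketch below computes the OLS estimator for a one-regressor model with an intercept and shows the estimates settling near the true parameters as the number of observations grows. The function names, the parameter values, and the uniform (deliberately non-normal) error distribution are all assumptions chosen for the example.

```python
import random


def ols_simple(x, y):
    """OLS for y = b0 + b1*x, i.e. (X'X)^{-1} X'y with X = [1, x]."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def simulate(T, b0=1.0, b1=2.0, seed=0):
    """Draw T observations with hypothetical true parameters and refit."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(T)]
    # Errors are uniform on [-3, 3]: mean zero, finite variance, not normal.
    y = [b0 + b1 * xi + rng.uniform(-3.0, 3.0) for xi in x]
    return ols_simple(x, y)


if __name__ == "__main__":
    for T in (50, 500, 50000):
        b0_hat, b1_hat = simulate(T)
        print(T, round(b0_hat, 3), round(b1_hat, 3))
```

The regressor is redrawn for each T here purely for convenience; the argument in the text treats X as nonstochastic, which this sketch mimics only loosely.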

Because of its computational ease, least squares is obviously an

appealing tool for model estimation. The analyst must assess whether any

or all of the above conditions fail to characterize the data or model under

consideration to see if least squares maintains its consistency properties.

Should least squares prove inconsistent, alternative, and generally more

costly, methods of estimation must be utilized in order to obtain

consistent estimates of $\beta$.
As discussed in detail below, a very general characterization of

quantitative health outcomes measures is that they are data bounded from

below by zero, i.e. data realized only in nonnegative quantities. Of

specific concern here are measures like "amount of time spent ill." Such

measures are generally modeled econometrically as the censored or truncated
counterparts of normally-distributed latent random variables $Y_i^*$ having
$E(Y_i^*) = X_i\beta$, $\mathrm{Var}(Y_i^*) = \sigma^2$.  However, if the
realizations of $Y_i^*$ are censored from below at zero, we have

$$E(Y_i) = \Phi_i X_i\beta + \sigma\phi_i, \qquad (2)$$

where $\phi_i$ and $\Phi_i$ are the standard normal density and distribution
functions evaluated at $(X_i\beta/\sigma)$.  In the truncated case, where
$\Pr(y_i > 0) = 1$,

$$E(Y_i \mid y_i > 0) = X_i\beta + \sigma\phi_i/\Phi_i. \qquad (3)$$

     When defined in terms of these expectations, the problems inherent in
least squares estimation become apparent.  Since
$E(\sigma\phi_i/\Phi_i) \neq 0$, then $E(\varepsilon_i) \neq 0$ when
$\varepsilon_i$ is defined as the difference between either $E(Y_i)$ or
$E(Y_i \mid y_i > 0)$ and $X_i\beta$ in (2).  Thus least squares regression
of y on X will yield inconsistent estimates of $\beta$, given that the null
error expectation assumption has been violated.  Heckman (1976) is a good
general discussion of such problems.
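
A small simulation may make the point concrete. The sketch below is illustrative only: the parameter values, sample size, and function names are assumptions, not drawn from the report. It evaluates the censored and truncated expectations for a normal latent variable and then fits least squares both to the censored sample and to the positive observations alone. In both cases the fitted slope should fall well short of the latent slope used to generate the data.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: pdf is phi, cdf is Phi


def censored_mean(xb, sigma):
    """E(Y) for Y = max(0, Y*), Y* ~ N(xb, sigma^2): Phi*xb + sigma*phi."""
    z = xb / sigma
    return STD_NORMAL.cdf(z) * xb + sigma * STD_NORMAL.pdf(z)


def truncated_mean(xb, sigma):
    """E(Y | Y > 0) for the same latent variable: xb + sigma*phi/Phi."""
    z = xb / sigma
    return xb + sigma * STD_NORMAL.pdf(z) / STD_NORMAL.cdf(z)


def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, b0=-1.0, b1=1.0, sigma=1.0, seed=1):
    """Return OLS slopes from the censored sample and from positives only.

    The latent slope b1 = 1.0 is a hypothetical value for the example.
    """
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 2.0) for _ in range(n)]
    latent = [b0 + b1 * xi + rng.gauss(0.0, sigma) for xi in x]
    y = [max(0.0, v) for v in latent]  # censoring from below at zero
    slope_all = ols_slope(x, y)
    positives = [(xi, yi) for xi, yi in zip(x, y) if yi > 0.0]
    slope_pos = ols_slope([p[0] for p in positives],
                          [p[1] for p in positives])
    return slope_all, slope_pos


if __name__ == "__main__":
    print(simulate())
```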

Not all measures of interest in our analysis are cast in terms of

normally-distributed, partially-observed random variables. In the other
cases we shall investigate, there are yet different characteristics of the

data or the assumed statistical distributions that render least squares

inappropriate given the objective of consistent parameter estimation. For

example, least-squares estimation strategy is generally completely

inappropriate when outcomes are qualitative since no objective function of

interest can be cast in terms of linear expectations functions like those

above. We now turn to an assessment of various approaches to the

estimation of health status models.

2.3 TOBIT HEALTH OUTCOME MODELS

A logical starting point is the basic Tobit model. The nature of

several of the health status measures of interest in the micro data sets

being analyzed in this study is such that Tobit estimation would seem, at
least at first blush, to be a sensible approach. (See Ostro (1983) for an

application of Tobit to a similar problem.)

Tobit estimation has been utilized in a variety of areas in applied

microeconomics, ranging from labor supply (see the excellent survey by

Killingsworth (1983)), to health economics (Ostro, (1983)), to commodity

demands or expenditures (Tobin (1957), Pitt (1983)), and many others (see

Amemiya (1984) for an extensive bibliography). The basic idea underlying

Tobit estimation is that one posits the existence of (latent) random
variables $Y_i^*$ that are independently, normally distributed (NID)
$(X_i\beta, \sigma^2)$.  In many interpretations of the Tobit model, the
$Y_i^*$ are stochastic indicators of intensity of desire for undertaking
some activity.  Owing to the nature of the activity, however, some
realizations of the $Y_i^*$ are censored while for the others, the
intensities are mapped directly into actual undertakings of the activity.
Some threshold, in effect, is crossed such that the activities are actually
undertaken.  For example, the fundamental idea behind Tobin's seminal paper
is that the $Y_i^*$ represent intensities of desire to purchase consumer
durables.  When certain (assumed known) thresholds are crossed, these
intensities become actual purchases.  In most applied areas, the thresholds
are zero, so that the mappings from intensities into undertaken activities
can be looked at as occurring when the realizations of the $Y_i^*$ occur in
the interior of commodity space.  Otherwise, corner solutions obtain (for
one discussion of estimation in the Kuhn-Tucker/corner-solution/Tobit
context, see Wales and Woodland (1983)).

Assuming, then, that the thresholds are known and constant across

individuals, the basic Tobit model can be described by (4):

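\[ Y_i^* \sim \mathrm{NID}(X_i\beta, \sigma^2), \qquad y_i = \max(C, y_i^*). \tag{4} \]

Setting $C = 0$ gives the model we shall discuss in the sequel. Letting $\Omega_0$

signify the index set for observations for which $\max(0, y_i^*) = 0$, and $\Omega_1$ be

the index set for observations for which $\max(0, y_i^*) > 0$, then the

likelihood function for the Tobit model described here is

\[ L = \prod_{i \in \Omega_0} (1 - \Phi_i) \prod_{i \in \Omega_1} \frac{1}{\sigma}\, \phi_i, \tag{5} \]

where $\Phi_i = \Phi(X_i\beta/\sigma)$ is the standard normal distribution function and $\phi_i = \phi((y_i - X_i\beta)/\sigma)$ is the standard normal density. In log form (5) is

\[ \ell = \sum_{i \in \Omega_0} \ln(1 - \Phi_i) - |\Omega_1| \ln\sigma - \frac{1}{2\sigma^2} \sum_{i \in \Omega_1} (y_i - X_i\beta)^2, \tag{6} \]

where $|\cdot|$ denotes cardinality and where terms not involving $(\beta, \sigma)$ are

dropped.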
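
As an illustration of (6), the log-likelihood can be evaluated directly. The following is a minimal sketch, not the estimation code used in this study; the function name `tobit_loglik` and the toy data are hypothetical, and censoring at zero is assumed.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: supplies Phi (cdf) and phi (pdf)

def tobit_loglik(beta, sigma, xs, ys):
    """Log-likelihood (6) for a Tobit model censored at zero.

    xs: list of covariate rows X_i (lists of floats); ys: observed y_i >= 0.
    Constants that do not involve (beta, sigma) are dropped, as in the text.
    """
    total = 0.0
    for x_row, y in zip(xs, ys):
        index = sum(b * x for b, x in zip(beta, x_row))  # X_i * beta
        if y <= 0.0:
            # i in Omega_0: contributes ln(1 - Phi(X_i beta / sigma))
            total += math.log(1.0 - STD_NORMAL.cdf(index / sigma))
        else:
            # i in Omega_1: contributes -ln(sigma) - 0.5 * ((y - X_i beta) / sigma)^2
            total += -math.log(sigma) - 0.5 * ((y - index) / sigma) ** 2
    return total

# Hypothetical toy data: an intercept plus one regressor.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [0.0, 0.0, 1.5, 2.5]
print(tobit_loglik([-1.0, 1.0], 1.0, xs, ys))
```

In practice this function would be handed to a numerical optimizer; the sketch only evaluates it at a trial parameter value.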

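The first-order conditions for maximizing $\ell$ are the $(k + 1)$ equations

\[ \frac{\partial \ell}{\partial \beta} = \sum_{i \in \Omega_0} \left(-\frac{\lambda_i}{\sigma}\right) X_i' + \sum_{i \in \Omega_1} \frac{(y_i - X_i\beta)\, X_i'}{\sigma^2} = 0 \]

\[ \frac{\partial \ell}{\partial \sigma^2} = \frac{1}{2} \sum_{i \in \Omega_0} \frac{\lambda_i (X_i\beta)}{\sigma^3} + \frac{1}{2} \sum_{i \in \Omega_1} \frac{(y_i - X_i\beta)^2 - \sigma^2}{\sigma^4} = 0, \tag{7} \]

where $\lambda_i = \phi(X_i\beta/\sigma)/(1 - \Phi_i)$. Using terms in these equations, the method of

Berndt-Hall-Hall-Hausman (1974), among others, can be used for optimization,
and statistical inference is based on the asymptotic t-tests generated by
utilizing $[\sum_{i=1}^{N} \ell_i \ell_i']^{-1}$ as the estimate of $\mathrm{cov}(\hat\beta)$ ($\ell_i$ is the i-th term of
$[(\partial\ell/\partial\beta)', \partial\ell/\partial\sigma^2]'$).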
Several characteristics of the Tobit model are noteworthy. First, as

Amemiya (1984) points out, the likelihood function (5) can be rewritten as
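\[ L = \Big[ \prod_{i \in \Omega_0} (1 - \Phi_i) \prod_{i \in \Omega_1} \Phi_i \Big] \Big[ \prod_{i \in \Omega_1} \frac{\phi_i}{\Phi_i\, \sigma} \Big]. \tag{8} \]
Written in this form, the likelihood function of the Tobit model can be

viewed as the product of the likelihood functions of a probit model with

parameter vector $\alpha = (\beta/\sigma)$ (first brackets) and a truncated-at-zero normal

distribution with parameters $(\beta, \sigma)$ and $E(Y_i \mid Y_i > 0) = X_i\beta + \sigma\, \phi(X_i\beta/\sigma)/\Phi_i$ (second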

brackets). As such, separate maximization subject to the restrictions that

the probit parameter vector be a positive scalar multiple (specifically

1/a) of the parameter vector of the truncated normal model yields the Tobit

model. The probit component can, of course, be viewed as the model of

whether or not the threshold is crossed, while the truncated normal

component models the conditional phenomenon of the magnitude of the

activity given that the activity is undertaken.

It is certainly reasonable to consider the possibility that the

parameter restrictions described in the preceding paragraph are in fact

invalid. This would indicate, therefore, that the model of threshold
crossing is not as intimately related to the conditional model of the

magnitude of the undertaken activity as is implied by the Tobit model. In

the context of health outcomes, this could mean that the phenomenon of

whether some illness occurs is governed by a set of parameters different

than that determining the amount, duration, or severity of the illness,

given that some illness occurs. We discuss such issues in greater detail

later in the Chapter.

Another characteristic of the Tobit model that merits discussion is

the fact that the parameters estimated under the assumptions of the Tobit

model are in general nonrobust to departures from many of the underlying

assumptions. That is, violation in the data of some of the properties

implied when the likelihood function is written in the form (5) will lead

to inconsistent estimates of the parameters $(\beta, \sigma)$. This phenomenon, which

is not uncommon in many types of models that are estimated by means of

maximum likelihood, stands in contrast to more familiar formulations such

as OLS and nonlinear least squares where, in spite of a variety of

departures from the assumed ideal structure of the error terms, one can

still obtain consistent estimates of the structural parameters.

Two of the most often discussed violations that bode dire consequences

for Tobit parameter estimates are violations of the NID assumption: first,

that the error variances are nonconstant across observations, and second,

that the error structure, though perhaps homoscedastic, is nonnormal. Note

that normal, homoscedastic errors are implied when writing the likelihood

function in the form (5). The results of several studies, summarized by

Amemiya (1984), suggest that under either type of departure, the maximum

likelihood Tobit parameter estimates are inconsistent.
2.4 CRAGG-CLASS HEALTH OUTCOME MODELS

In a 1971 paper, Cragg proposed a set of models for situations that

can be depicted as follows. An economic agent makes two (simultaneous)

decisions. A dichotomous decision is made about whether or not to engage

in some activity. Conditional on an affirmative for this decision, a

decision is made regarding how much of the activity to pursue. The

activities can be construed in the broadest of terms: expenditures,

quantities demanded or supplied, or the amount of time spent in ill health.

Such models have come to be known as hurdles models, that is, conditional

on some hurdle being crossed, a decision is made about some magnitude of

interest. Although these processes might in some cases seem logically to

be ordered in a temporal manner, the statistical properties of the model

abstract from any temporal considerations, the quantity decision being

described in terms of conditional densities.

Cragg proposed several models. However, because of the nature of the

present study, only two members of this set will concern us here, these

being the formulations wherein the quantity or second-stage decision is

defined only on the positive reals. This is in obvious reference to ideas

like "given that an individual had some illness, how much time was spent

ill." Although Cragg's other formulations are also interesting, their

discussion is omitted for economy of space.

For notational ease, we will assume that the same vector of

independent variables, $X_i$, influences both the first- and second-stage

decisions. This is a completely innocuous assumption, however, as elements

of parameter vectors can be restricted equal to zero to accommodate more

general cases. Regardless of the specification of the second-stage or

conditional decision, the first-stage is described by a binary
probit model, i.e. the existence of latent random variables
$Y_{i1}^* \sim N(X_i\beta_1, \sigma_1^2)$ is posited. Only the signs of the realizations are

recorded, however, and are codified according to
\[ y_{i1} = \begin{cases} 1, & y_{i1}^* \ge 0 \\ 0, & y_{i1}^* < 0. \end{cases} \]
Because of this codification scheme, there is no information about the
scale of the random variables $Y_{i1}^*$ (i.e. the mappings of $y_{i1}^*$ into $y_{i1}$ are
unaffected by transformations of $Y_{i1}^*$ of the form $\theta Y_{i1}^*$ for $\theta > 0$).

Therefore, some normalization is required, the most common being $\sigma_1 = 1$.

This formulation gives rise to Cragg's formulation of the hurdle-crossing

model, where, with obvious change from Cragg's notation, we specify
\[ \Pr(y_{i1} = 1) = \Phi(X_i\beta_1), \qquad \Pr(y_{i1} = 0) = \Phi(-X_i\beta_1), \tag{10} \]

where $\Phi$ is the standard normal distribution function (Cragg uses $C(\cdot)$ for

$\Phi(\cdot)$).

For strictly positive second-stage quantity realizations, Cragg

proposes two alternative formulations. Both are based on the specification

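of the conditional densities for random variables $Y_{i2}$ given that the

activity is in fact undertaken.

The first formulation is one where the conditional density for the

realizations of the $Y_{i2}$ is truncated-normal, with the truncation point at

zero. Thus we have

\[ g(y_{i2} \mid y_{i1} = 1) = \begin{cases} \dfrac{1}{\sigma}\, \phi\!\left(\dfrac{y_{i2} - X_i\beta_2}{\sigma}\right) \Big/ \Phi\!\left(\dfrac{X_i\beta_2}{\sigma}\right), & y_{i2} > 0 \\ 0, & \text{else,} \end{cases} \tag{11} \]

where $\phi$ and $\Phi$ are as defined earlier. With obvious notational change from

Cragg's article, the (unconditional) likelihood of the positive

realizations can be written as

\[ f(y_{i2}) = g(y_{i2} \mid y_{i1} = 1)\, \Pr(y_{i1} = 1) = \frac{1}{\sigma}\, \phi\!\left(\frac{y_{i2} - X_i\beta_2}{\sigma}\right) \frac{\Phi(X_i\beta_1)}{\Phi(X_i\beta_2/\sigma)} \tag{12} \]

for $y_{i2} > 0$. Therefore, the likelihood function of Cragg's first model is

\[ L = \prod_{i \in \Omega_0} \Phi(-X_i\beta_1) \prod_{i \in \Omega_1} \frac{1}{\sigma}\, \phi\!\left(\frac{y_{i2} - X_i\beta_2}{\sigma}\right) \frac{\Phi(X_i\beta_1)}{\Phi(X_i\beta_2/\sigma)}. \tag{13} \]

In log form,

\[ \ell = \sum_{i \in \Omega_0} \ln\Phi(-X_i\beta_1) + \sum_{i \in \Omega_1} \left[ \ln\phi\!\left(\frac{y_{i2} - X_i\beta_2}{\sigma}\right) + \ln\Phi(X_i\beta_1) - \ln\sigma - \ln\Phi\!\left(\frac{X_i\beta_2}{\sigma}\right) \right]. \tag{14} \]

In the form (14), it is straightforward to see that maximization of $\ell$ is fully

equivalent to the two-stage maximization problem:

1) Probit estimation of the parameter vector $\beta_1$ via maximization of

\[ \sum_{i \in \Omega_0} \ln\Phi(-X_i\beta_1) + \sum_{i \in \Omega_1} \ln\Phi(X_i\beta_1); \tag{15} \]

2) Truncated-normal estimation of the parameters $(\beta_2, \sigma)$ via maximization

of

\[ \sum_{i \in \Omega_1} \left[ \ln\phi\!\left(\frac{y_{i2} - X_i\beta_2}{\sigma}\right) - \ln\sigma - \ln\Phi\!\left(\frac{X_i\beta_2}{\sigma}\right) \right]. \tag{16} \]

Because of the complexity of the log likelihood (14), estimation in this

two-stage fashion is likely to be somewhat easier than attempting to maximize

(14) with respect to the $(2k+1)$ parameters $(\beta_1, \beta_2, \sigma)$.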
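
The separability of the log-likelihood into a probit piece and a truncated-normal piece can be checked numerically. The sketch below is illustrative only: the function names and the toy data are hypothetical, and the normal-density constant is retained so that the comparison with the Tobit log-likelihood is exact. It also confirms that imposing the restriction that the probit vector equal the truncated-normal vector divided by $\sigma$ reproduces the Tobit value, as noted in the Tobit discussion above.

```python
import math
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution

def dot(coefs, row):
    return sum(c * x for c, x in zip(coefs, row))

def probit_part(beta1, xs, ys):
    """First-stage (probit) piece: depends on beta1 only."""
    total = 0.0
    for row, y in zip(xs, ys):
        p = N01.cdf(dot(beta1, row))
        total += math.log(p) if y > 0.0 else math.log(1.0 - p)
    return total

def truncated_part(beta2, sigma, xs, ys):
    """Second-stage (truncated-normal) piece: uses positive y only."""
    total = 0.0
    for row, y in zip(xs, ys):
        if y > 0.0:
            idx = dot(beta2, row)
            total += (math.log(N01.pdf((y - idx) / sigma)) - math.log(sigma)
                      - math.log(N01.cdf(idx / sigma)))
    return total

def cragg1_loglik(beta1, beta2, sigma, xs, ys):
    """Log-likelihood of Cragg's first model as the sum of the two pieces."""
    return probit_part(beta1, xs, ys) + truncated_part(beta2, sigma, xs, ys)

def tobit_loglik(beta, sigma, xs, ys):
    """Tobit log-likelihood, keeping the normal-density constant."""
    total = 0.0
    for row, y in zip(xs, ys):
        idx = dot(beta, row)
        if y > 0.0:
            total += math.log(N01.pdf((y - idx) / sigma)) - math.log(sigma)
        else:
            total += math.log(1.0 - N01.cdf(idx / sigma))
    return total

# Hypothetical toy data: an intercept plus one regressor.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [0.0, 0.0, 1.5, 2.5]
beta2, sigma = [-0.5, 0.8], 1.2
beta1_restricted = [b / sigma for b in beta2]  # Tobit restriction: beta1 = beta2 / sigma
print(cragg1_loglik(beta1_restricted, beta2, sigma, xs, ys))
print(tobit_loglik(beta2, sigma, xs, ys))
```

Because the two pieces share no parameters, each can be maximized on its own, which is the practical content of the two-stage argument.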

Cragg's second formulation again depends on the probit first-stage model,

but the conditional density of the positive realizations is respecified.

Instead of assuming that the conditional density of the positive realizations

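of $Y_{i2}$ is truncated-normal, the model is now formulated such that the

logarithms of the $y_{i2}$ are normal, i.e. conditional on $y_{i1} = 1$, $\log(y_{i2}) \sim N(X_i\beta_2, \sigma^2)$. The conditional density for the $i \in \Omega_1$ is

\[ h(y_{i2} \mid y_{i1} = 1) = (y_{i2}\sigma)^{-1}\, \phi\!\left(\frac{\log(y_{i2}) - X_i\beta_2}{\sigma}\right), \tag{17} \]

where the term $(y_{i2})^{-1}$ is the Jacobian of the transformation from $y_{i2}$ to

$\log(y_{i2})$. Therefore, the likelihood for the $i \in \Omega_1$, which is Cragg's equation

(11), is

\[ f(y_{i2}) = h(y_{i2} \mid y_{i1} = 1)\, \Pr(y_{i1} = 1) = (y_{i2}\sigma)^{-1}\, \phi\!\left(\frac{\log(y_{i2}) - X_i\beta_2}{\sigma}\right) \Phi(X_i\beta_1). \tag{18} \]

The likelihood function for the entire sample is

\[ L = \prod_{i \in \Omega_0} \Phi(-X_i\beta_1) \prod_{i \in \Omega_1} (y_{i2}\sigma)^{-1}\, \phi\!\left(\frac{\log(y_{i2}) - X_i\beta_2}{\sigma}\right) \Phi(X_i\beta_1). \tag{19} \]

In log form,

\[ \ell = \sum_{i \in \Omega_0} \ln\Phi(-X_i\beta_1) + \sum_{i \in \Omega_1} \left[ \ln\Phi(X_i\beta_1) + \ln\phi\!\left(\frac{\log(y_{i2}) - X_i\beta_2}{\sigma}\right) - \ln y_{i2} - \ln\sigma \right]. \tag{20} \]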

As in Cragg's first model, the second model can be estimated in two stages:
1) Probit estimation of $\beta_1$ as above;

2) OLS estimation of $(\beta_2, \sigma)$ using the log transform of the $y_{i2}$ as

dependent variables and $X_i$ as the independent variables. This is

perhaps surprising, but results because the terms in (20) involving

$(\beta_2, \sigma)$ are identical to those of the likelihood function of the

familiar normal linear model.
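
The second stage of this procedure amounts to a regression of the logged positive outcomes on the covariates. A minimal sketch for the special case of an intercept and a single regressor follows; the function name and data are hypothetical, and the closed-form simple-regression formulas are used only to keep the example self-contained.

```python
import math

def ols_log_positive(xs, ys):
    """Second-stage estimates for the lognormal model with one regressor.

    Regresses log(y_i) on an intercept and x_i using only observations with
    y_i > 0. Returns (intercept, slope, sigma), where sigma is the maximum
    likelihood estimate (residual sum of squares divided by n, not n - 2).
    """
    pairs = [(x, math.log(y)) for x, y in zip(xs, ys) if y > 0.0]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_ly = sum(ly for _, ly in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (ly - mean_ly) for x, ly in pairs)
    slope = sxy / sxx
    intercept = mean_ly - slope * mean_x
    rss = sum((ly - intercept - slope * x) ** 2 for x, ly in pairs)
    return intercept, slope, math.sqrt(rss / n)

# Hypothetical data: zeros are ignored in the second stage.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, math.exp(0.5), 0.0, math.exp(1.5), math.exp(2.0)]
print(ols_log_positive(xs, ys))
```

The zero observations enter only the first-stage probit; they carry no information about the second-stage parameters.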

Because of the simplicity of this two-stage approach, estimation in such a

framework is obviously appealing. Duan et al. (1983) have proposed the

second Cragg model to estimate medical expenditures: individuals either have or

do not have medical expenses, and given that they have medical expenses, the

conditional density of the expenditures is lognormal, $\log(Y_{i2}) \sim N(X_i\beta_2, \sigma^2)$ ...

... truncated-from-below distribution where the point of truncation is constant

across observations and is assumed to be zero. The results easily generalize,

however, and for a discussion of the statistical properties of the truncated

normal distribution in the most general case, the reader is referred to Johnson

and Kotz (1970, pp. 81-87).

It should be noted that interest in the truncated normal should not be

confined to the role it plays in the Cragg model. The distribution is indeed

useful in many empirical situations. Hurd (1979) notes that

(e)stimation based on only positive y's comes about very

naturally in a number of kinds of studies. For example, in many

labor supply studies one of the right-hand variables, the wage

rate, is only observed when the left-hand variable, labor

supply, is positive. Imputing the unobserved wage rates causes

a number of complications that can be avoided by discarding

those observations for which labor supply is zero. Another

example is a demand study where the price is not known unless a

purchase is made. (Hurd, 1979, p. 248).

For our purposes, the likelihood function of the truncated normal can

be constructed as follows. We assume the existence of $T_0 + T_1$ realizations

of random variables $Y_i \sim \mathrm{NID}(X_i\beta, \sigma^2)$. However, for whatever reasons, only

the positive realizations of the $Y_i$ are used in the analysis, these assumed

to number $T_1$. Given these assumptions, the likelihood function is

\[ L = \prod_{i=1}^{T_1} \frac{\phi_i}{\sigma\, \Phi_i}, \tag{21} \]

where $\phi_i$ is the standard normal density evaluated at $((y_i - X_i\beta)/\sigma)$, and $\Phi_i$

is the standard normal distribution function evaluated at $(X_i\beta/\sigma)$ which

serves as the normalizing factor of the truncated density. The

log-likelihood function (suppressing terms not depending on $(\beta, \sigma)$) is

\[ \ell = \sum_{i=1}^{T_1} \left[ -.5\left((y_i - X_i\beta)/\sigma\right)^2 - \log\sigma - \log\Phi\!\left(\frac{X_i\beta}{\sigma}\right) \right]. \tag{22} \]

Estimation is by means of maximum likelihood. The first-order conditions

for a maximum of $\ell$ are

\[ \frac{\partial \ell}{\partial \beta} = \sum_{i=1}^{T_1} \left[ -\frac{1}{\sigma}\, \frac{\phi(X_i\beta/\sigma)}{\Phi_i} + \frac{y_i - X_i\beta}{\sigma^2} \right] X_i' = 0 \]
Olsen's method relies on a method of moments technique whereby the

moments (specifically the mean and variance) of the empirical incomplete

distribution, that of the positive $y_i$, are related to the moments of the

complete distribution via formulae developed by Pearson and Lee (1908).

Extending the Pearson-Lee methodology to the multiple regression case,

Olsen demonstrates that the least squares slope coefficients differ from

the true slope coefficients by a common factor, and he presents in tabular

form the multiplicative correction factors needed to transform the OLS

estimates of the slope, intercept, and standard error parameters (based on

data from the incomplete distribution) to the corresponding complete

distribution estimates. In practice, we have fitted polynomial functions

of the third degree to Olsen's tabled data so that the transformations are

facilitated.

Olsen also presents the multipliers for transforming the

(mean/standard error) ratio estimated by OLS on the incomplete distribution

to the corresponding ratio of the complete distribution, $(\mu/\sigma)$. Olsen

notes that $\Phi(\mu/\ldots$
2.6 HECKMAN'S APPROACH: SAMPLE SELECTION

A very popular technique for estimating models with limited dependent

variables is the sample selection model, attributable largely to

Heckman (1976, 1979). The model has a number of applications (see

Heckman's 1976 article in particular), and is quite easy to estimate.

Because it is so well-known, we will only provide a sketch of the details.

The following section, which contrasts and compares the Tobit, Cragg, and

Heckman models, sheds some more light on subtleties of Heckman's

formulation.

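Heckman considers the following two-equation model:
\[ Y_{i1} = X_i\beta_1 + \varepsilon_{i1} \tag{24} \]
\[ Y_{i2}^* = X_i\beta_2 + \varepsilon_{i2}. \tag{25} \]
It is assumed that $\varepsilon_{i1}$ and $\varepsilon_{i2}$ are distributed joint normal, with marginal
densities $N(0, \sigma_1^2)$ and $N(0, \sigma_2^2)$ and covariance $\sigma_{12}$. Define the indicator
\[ y_{i2} = \begin{cases} 1, & y_{i2}^* > 0 \\ 0, & y_{i2}^* \le 0. \end{cases} \tag{26} \]
In Heckman's model, the realizations $y_{i1}$ are available to the analyst only
when $y_{i2}^* > 0$, i.e. when $y_{i2} = 1$.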
A concrete example is where (24) is a model determining market wage

rate (or log(wage rate)) by a linear function of $X_i$ and random error and

where (25) is a model determining hours of labor supplied in the market.

It is assumed that either hours of labor supplied or a discrete binary

indicator of whether or not any hours were supplied is available for all

observations. However, because market wage rates are only observed for

individuals for whom the market wage rate exceeds the reservation wage at

zero hours, data on the $y_{i1}$ are available only when $y_{i2}^* > 0$ ($y_{i2} = 1$).

Heckman then considers the expectation $E(Y_{i1} \mid y_{i2} = 1)$, which can be

written as
\[ E(Y_{i1} \mid y_{i2} = 1) = X_i\beta_1 + E(\varepsilon_{i1} \mid y_{i2} = 1). \tag{27} \]
If one considers least-squares estimation of (27), the question is: Do

there obtain consistent estimates of $\beta_1$ when $y_{i1}$ is regressed on $X_i$ for

those i for whom $y_{i2} = 1$? Basically the issue is whether the expectation

$E(\varepsilon_{i1} \mid y_{i2} = 1)$ is null. In general, and thus at the core of the sample

selection bias problem, the answer is "no". Based on well-known formulae,

it holds that
\[ E(\varepsilon_{i1} \mid y_{i2} = 1) = \frac{\sigma_{12}}{\sigma_2}\, \frac{\phi_i}{1 - \Phi_i}, \tag{28} \]

where $\phi_i$ and $\Phi_i$ are the standard normal density and distribution function evaluated at $-X_i\beta_2/\sigma_2$. Since $\phi_i$, $(1 - \Phi_i)$, and $\sigma_2$ are all positive, then whenever $\sigma_{12} \neq 0$ least

squares estimation of (27) will be based on an expectations function with

nonnull disturbance expectation, and will therefore yield inconsistent

estimates of $\beta_1$.

Heckman's suggested procedure in this situation is as follows.

Estimate on the entire sample a probit model for the discrete indicator

representation of the model (25). This yields a consistent estimate of the

parameter vector $(\beta_2/\sigma_2)$ from which consistent estimates of $\lambda_i = \phi_i/(1 - \Phi_i)$

are constructed. Form the $T \times (k+1)$ matrix $Z = [X \mid \Lambda]$, where $\Lambda$ is a $T \times 1$

vector with typical element $\lambda_i$, and regress $y_{i1}$ on $[x_i, \lambda_i]$. This procedure

yields consistent estimates of the parameters $\beta_1$ and $(\sigma_{12}/\sigma_2)$, having

effectively solved the omitted variables problem by using a consistent

estimate of $E(\varepsilon_{i1} \mid y_{i2} = 1)$ as a regressor.
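
The constructed regressor in this procedure is the inverse Mills ratio evaluated at the fitted probit index. The sketch below, with hypothetical function names and a hypothetical index value, computes that ratio and compares it with a numerically integrated conditional mean of a standard normal error given selection. It illustrates the formula only and is not a full two-step estimator.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution

def inverse_mills(index):
    """phi(Z) / (1 - Phi(Z)) with Z = -index, where index = X_i beta_2 / sigma_2."""
    z = -index
    return N01.pdf(z) / (1.0 - N01.cdf(z))

def selected_mean_numeric(index, upper=10.0, steps=100000):
    """Trapezoid-rule value of E(e | e > -index) for a standard normal e."""
    lower = -index
    width = (upper - lower) / steps
    numerator = 0.0
    denominator = 0.0
    for k in range(steps + 1):
        e = lower + k * width
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end-point weights
        density = N01.pdf(e)
        numerator += weight * e * density * width
        denominator += weight * density * width
    return numerator / denominator

# Hypothetical probit index for one observation.
index = 0.3
print(inverse_mills(index), selected_mean_numeric(index))
```

Multiplying this ratio by the covariance-to-standard-deviation term gives the nonnull disturbance expectation that the second-stage regression must absorb.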

In the context of health outcomes models, one could define $y_{i2}^*$ as some

latent index of the propensity to be ill. Given that this index is greater

than some threshold level, illness results, its magnitude determined by the

realization $y_{i1}$. The translation of the latent illness model into

Heckman's framework is not straightforward, however. For those individuals

not reporting illness over the sample interval, we observe zero time spent

ill rather than not observing the amount. It is therefore difficult to

interpret the meaning of the realized, but unobserved, $y_{i1}$ for the healthy.

We turn in the next section to a more detailed analysis of such subtleties.

2.7 TOBIN, CRAGG, AND HECKMAN: A DIGRESSION

As there are some similarities between and among the models described

above and identified for expositional parsimony as the models of Tobin,

Cragg, and Heckman, it is appropriate to summarize their similarities and

differences and in so doing to elucidate the circumstances in which each

model is more or less appropriate. (The Cragg model discussed here

is the first Cragg model (probit/truncated-normal), as that version is

most similar to the others discussed here.)

First to note is that the Tobit model results as a restricted version

of both the Cragg and the Heckman models. The reason for this is purely

mechanical, however, and should not be taken to imply that the Cragg and

Heckman models are in general identical. As we will see below, these

models are structurally quite different.

To see that the Cragg model reduces to the Tobit, the Cragg

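log-likelihood function can be written (following Lin and Schmidt (LS)

(1984)) as

\[ \ell = \sum_{i \in \Omega_0} \ln\Phi(-X_i\beta_1) + \sum_{i \in \Omega_1} \left[ \ln\Phi(X_i\beta_1) - \ln\Phi\!\left(\frac{X_i\beta_2}{\sigma}\right) - \frac{1}{2}\ln(2\pi) - \ln\sigma - \frac{1}{2\sigma^2}(y_{i2} - X_i\beta_2)^2 \right]. \tag{29} \]

Under the restriction $\beta_1 = \beta_2/\sigma$, the terms $\ln\Phi(X_i\beta_1)$ and $\ln\Phi(X_i\beta_2/\sigma)$ cancel and (29) becomes the Tobit log-likelihood. The following

excerpt from LS provides a particularly cogent summary description of the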

appropriateness of the restricted (Tobit) versus the unrestricted versions

of the Cragg model:

(I)n the Tobit model any variable which increases the probability
of a non-zero value must also increase the mean of the positive
values; a positive element of $\beta$ means that an increase in the
corresponding variable (element of $X_i$) increases both $\Pr(y_i > 0)$
and $E(y_i \mid y_i > 0)$. This is not always reasonable. As an example,
consider a hypothetical sample of buildings, and suppose that we
wish to analyze the dependent variable "loss due to fire," during
some time period. Since this is often zero but otherwise
positive, the Tobit model might be an obvious choice. However,
it is not hard to imagine that newer (and more valuable)
buildings might be less likely to have fires, but might have
greater average losses when a fire did occur. The Tobit model
can not accommodate this possibility.

Another problem with the Tobit model is that it links the shape
of the distribution of the positive observations and the
probability of a positive observation. For rare events (like
fires), the shape of the distribution of the positive
observations would have to resemble the extreme upper tail of a
normal, which would imply a continuous and faster than
exponential decline in density as one moved away from zero.
Conversely, when zero occurs less than half of the time, the
Tobit model necessarily implies a non-zero mode for the non-zero
observations.

Cragg's model avoids both of the above problems with the Tobit
model. A reasonably strong case can be made for it as a general
alternative to the Tobit model, for analysis of data sets to
which Tobit is typically applied—namely, data sets in which zero
is a common (and meaningful) value of the dependent variable, and
the non-zero observations are all positive. The distribution of
such a dependent variable is characterized by the probability
that it equals zero and by the (conditional) distribution of the
positive observations, both of which Cragg's model parameterizes
in a general way. (LS, pp. 174-175)
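
The first point in the excerpt can be seen numerically: under the Tobit model both the probability of a positive outcome and the mean of the positive outcomes rise with the index. The sketch below uses hypothetical function names and index values, with the scale parameter set to one for convenience.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution

def tobit_prob_positive(index, sigma):
    """Pr(y > 0) = Phi(X beta / sigma) under the Tobit model."""
    return N01.cdf(index / sigma)

def tobit_mean_positive(index, sigma):
    """E(y | y > 0) = X beta + sigma * phi(X beta / sigma) / Phi(X beta / sigma)."""
    z = index / sigma
    return index + sigma * N01.pdf(z) / N01.cdf(z)

sigma = 1.0
for index in (-2.0, -1.0, 0.0, 1.0, 2.0):  # hypothetical values of X_i * beta
    print(index, tobit_prob_positive(index, sigma), tobit_mean_positive(index, sigma))
```

A variable that lowers the chance of a fire while raising the expected loss given a fire would require the two columns to move in opposite directions, which a single index cannot deliver.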

Turning now to Heckman's formulation, his two-equation model is seen

to reduce to the Tobit model as follows. Recall that the model can be

written (with notational changes obvious) as
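\[ Y_{i1} = X_i\beta_1 + \varepsilon_{i1}, \qquad Y_{i2}^* = X_i\beta_2 + \varepsilon_{i2}. \tag{30} \]
$Y_{i2}^*$ is a latent variable, however, and only a discrete (0,1) sign indicator

of its realization, $y_{i2}$, is available; $y_{i1}$ is observed only when $y_{i2} = 1$.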

Letting $\beta_1 = \beta_2$ and $\varepsilon_{i1} = \varepsilon_{i2}$ (i.e. the error structure is univariate rather

than bivariate), then the Heckman model is the standard Tobit model. The

logic is that when these restrictions are imposed in the Heckman

two-equation model, the remaining single equation plays both the censoring

and the determination-of-intensity roles. Since the censoring occurs as a

result of a non-positive realization of the random variable $Y_{i2}^*$, the Tobit

requirement that the quantity or intensity realization be confined to the

nonnegative orthant is automatically satisfied when the restriction $y_{i1} = y_{i2}^*$

(i.e. $\beta_1 = \beta_2$, $\varepsilon_{i1} = \varepsilon_{i2}$) is imposed. In general, however, the Heckman

two-equation framework is not specifically designed to model situations

where realizations of the dependent variable of interest are necessarily

nonnegative and are recorded for all individuals/observations, and where

$\Pr(y_{i1} = 0) > 0$. Heckman's formulation has $y_{i1} \neq 0$ except on a set of measure

zero. We turn now to an explanation of the fundamental differences between

the Heckman two-equation formulation and the two versions of interest of

the Cragg model.

The two-equation Heckman model describes two phenomena, $Y_{i1}$ and $Y_{i2}^*$,

that are marginally distributed, respectively, as $\mathrm{NID}(X_i\beta_1, \sigma_1^2)$ and

$\mathrm{NID}(X_i\beta_2, \sigma_2^2)$ ($\sigma_2$ is usually restricted to equal 1 for normalization when only the

sign of $y_{i2}^*$ is observed). The joint distribution is bivariate

$\mathrm{NID}(X_i\beta_1, X_i\beta_2, \sigma_1^2, \sigma_2^2, \rho)$, where $\rho$ is the correlation of $(\varepsilon_{i1}, \varepsilon_{i2})$, $\rho = \sigma_{12}/(\sigma_1\sigma_2)$,

which is in general nonzero. The important point is that these marginal

and joint distributions are unconditional. That is, for all i, there exist
realizations $(y_{i1}, y_{i2}^*)$ although the realizations $y_{i1}$ for some i will be

unavailable to the researcher. Casting the problem concretely in the area

where Heckman's model has been most fruitfully applied, labor economics,

sheds further light on the subtleties of his model. Here we define

$y_{i1} = \log(W_i)$ and $y_{i2}^* = \log(H_i + 1)$, where $W_i$ is wage earned in market work and

$H_i$ is hours of market work. Thus, $y_{i2}^*$ is positive only if market hours are

positive. It is posited that the expected values of both $Y_{i1}$ and $Y_{i2}^*$ are

linear functions of personal characteristics and other variables so that

the two-equation model results. However, because we only observe the

market wage for those individuals actually participating in market work

(those for whom $H_i > 0$), some subset of observations will not have data on

the $y_{i1}$. There is a market wage determined for nonparticipants; whether or

not such individuals have knowledge of their market wages is immaterial.

The relevant analytical fact is that such data are unavailable to the

researcher.

In this labor supply framework, it is apparent why the estimation

techniques developed for the two-equation Heckman model and discussed

earlier in this chapter have such appeal. The more immediate concern, of

course, is whether such techniques are in fact appropriate to the

estimation requirements of the present analysis. In a nutshell, Heckman's

model is one where there are two equations of interest, both holding for

all i unconditionally, and where (except when restricted so as to be

identical to a Tobit model) the probability of observing realizations of

the dependent variable equal to zero is zero. Does such a formulation capture

the essence of the "corner solution" problems of the health status outcomes

phenomenon?

It seems rather artificial to cast such phenomena in this

framework. It is not generally the case with the generation of health

outcomes data that one can posit the existence of some latent variable such

that data for the illness measure(s) of interest are only available given a

positive realization of the latent variable. Rather, the processes of

interest here are represented more typically by data that indicate the

realizations of illness outcomes for all individuals, even though these

realizations are quite frequently on the boundary of the "consumption" set.

In sketching some of the differences between the Heckman two-equation

formulation and the Cragg models with particular reference to data sets

where the zero or corner solution outcomes are meaningful and where nonzero

outcomes are strictly positive, LS observe that in such cases the Heckman

model's assumptions are not particularly representative of the situation

because in the Heckman formulation:
... the observed values of $y_{i1}$ need not be positive, in the sense

that the model implies a non-zero probability of observed $y_{i1} < 0$;

and the unobserved $y_{i1}$ are literally unobserved, rather than

observed as equal to zero. The first of these problems can be

circumvented, for example, by measuring $y_{i1}$ in logarithms, ... and

the second problem is in any case fundamental. (LS, p. 175).
We turn now to a discussion of how the Cragg models differ in substance

from the Heckman two-equation setup and argue that the Cragg formulations

are relatively more suited than Heckman's model to the nature of a subset

of our estimation requirements.

Although like the Heckman formulation in being a "two-model"

specification, the fundamental point of departure for the Cragg technique
is that one of the two models is formulated in terms of conditional

expectations. The conditions on which the expectations are taken are, as

described above, the outcomes of unconditional models, which are generally

stated as binary representations of latent random variables. Thus, in the

context of health measures, there is an unconditional model defined for all

individuals determining the binary outcome (illness, no illness).

Conditional on an "illness" outcome, the quantity or duration of illness is

determined either by a lognormal or truncated-normal model. The

unconditional likelihood for a representative ill individual is then

density(illness duration given some illness)*Pr(some illness), (31)

which is equation (12) as specified earlier. There is no density of the

quantity of illness defined for the healthy, unlike Heckman's formulation that

defines such a density for all individuals.

Deaton and Irish (DI) (1984), in an independent line of investigation,

have purportedly cast Cragg's first model in a two-equation Heckman

formulation. They indicate that a positive observation on the quantity measure

of interest is made when, in the notation used earlier, both $Y_{i1}$ and $Y_{i2}^*$ are

realized as positive, else a zero or a nonparticipation results. In two cases,

DI specify
(32)
Cast thusly, the Cragg model can be viewed as a Heckman two-equation model, but

with a restriction imposed that is absent in Heckman's formulations. That is, DI

seem to have ignored one aspect of the Cragg model that is key in
differentiating it from Heckman's specification, viz. that $y_{i2}^* > 0$ is both a

necessary and sufficient condition for a positive realization of $y_{i1}$ to

result. That is, $\Pr(y_{i1} > 0 \mid y_{i2}^* > 0) = 1$, $\Pr(y_{i1} = 0 \mid y_{i2}^* \le 0) = 1$. When, and only

when the first hurdle is traversed is there a positive amount of the activity

undertaken. So DI's statement that positive realizations of both variables

determine whether $y_{i1}$ is observed positive is somewhat misleading in that a

positive realization of either suffices to assume the positivity of the other.

Neither of Cragg's specifications, then, is really in the spirit of the model

proposed by Heckman except, of course, when both the Cragg model and the

Heckman two-equation formulation are restricted such that the Tobit

specification results.

Owing to the subtleties of the arguments, it is likely that the above

discussion has provided somewhat less than a total clarification of all the

relevant issues. Some of these shortcomings are due to the fact that even

central participants in the academic debates appear still unconvinced about the

nature of the differences among the estimation techniques. For example, as

noted earlier Duan and coauthors (1983) have used the Cragg estimation

technique to model individuals' medical expenditures. The expenditure

decision, in the spirit of Cragg's specification, is statistically modeled as

two separate processes. Model one determines the binary outcome of whether or

not any expenditures will occur, and model two determines the amount of

expenditure (positive by definition) that results conditional on there being

some expenditure. In this paper, Duan and coauthors assert that the covariance

between the error terms of the two models is irrelevant insofar as construction

of the likelihood function is concerned.

Recently, however, Hay and Olsen (1984) have questioned the Duan and

coauthors method, stating that this approach "requires some fairly unusual
assumptions on the model joint error distribution and functional form (p.

279)." Moreover, Hay and Olsen go on to claim that the Duan and coauthors

formulation "can be interpreted as being nested in the more general sample

selection models (p. 279)." Duan and coauthors respond that Hay and Olsen "are

incorrect in claiming that our models are nested within the sample selection

model," and that "the conditional specification in the multi-part (i.e., Duan

and coauthors) model is preferable to the unconditional specification in the

selection model for modeling actual (v. potential) outcomes (p. 283)."

As we argued earlier, the sample selection or Heckman approach is

particularly fruitful when analyzing phenomena such as labor market

participation. Quoting Duan and coauthors:

For certain empirical problems such as labor force

participation, the primary goal might be to predict the

potential outcome instead of the actual outcome; therefore, an

unconditional specification such as the sample selection models

might be preferable. For the present application, however, the

goal is to predict the actual expense, not the potential

expense; therefore, the unconditional equation... is of no

direct interest, and the preference for the unconditional

specification in the other empirical problems does not apply to

the present application, (p. 286).

In any event, this discussion demonstrates that there still exists

some confusion on these points in the published literature. We have

attempted to be as thorough as time and space permit in hope of emphasizing

one extremely important message. That is, it is essential that the
researcher be intimately familiar with the behavioral and statistical

structure of the models of interest in order to avoid being swallowed by

the slippery quicksand we have described. The nature of health status

measures as conditional or unconditional and the interpretation of any

latent variables in the model must be quite clear before the correct

estimation technique can be selected. When, and only when, such issues are

in order is it possible to make sense of the estimates obtained and their

relevance to benefit estimation.

It seems that the logic of the health status outcome measures of

interest in this study is better captured in terms of Cragg's

specifications than in the Heckman two-equation model, although this

question is obviously still open to informed debate. The specification of

the magnitude-of-illness model as a conditional model is, however,

intuitively plausible, and Cragg's formulations provide a natural vehicle

for translating such intuitive plausibility into an econometric framework.

However, it so happens that the assumption of normally, or at least

continuously, distributed random variables, which characterizes the above

models, is not necessarily appropriate insofar as count measures like

"days" or "times" are concerned. To a discussion of some alternative

estimation techniques that might be used in such situations we now turn.

2.8 POISSON-DISTRIBUTED HEALTH OUTCOME MEASURES

In modeling event counts (non-negative integer data) over some time

interval (t, t+dt), the Poisson distribution is commonly used. Here, a

random variable $Y_i$ follows the probability law

\[ \Pr(Y_i = y) = \begin{cases} \exp(-\lambda_i)\, \lambda_i^{y} / y!, & y \in \{0, 1, 2, \ldots\} \\ 0, & \text{else} \end{cases} \tag{33} \]
with $\lambda_i > 0$.

It happens that there exist health outcome data of interest that are

recorded as nonnegative integers, most obviously as counts of days of

activity restriction. For any individual, such measures can, over a time

interval (t, t+dt), say one two-week period, assume only integer values in

{0,1,2,...,14}. Because of the paucity of observations likely to be found

at the upper (14 day) limit, we ignore the fact that these measures obey

upper bounds and concentrate instead on the complications presented by the

large number of individuals who in a typical random sample of the

population report zero days of restricted activity.

Analogous to the familiar normal distribution where for econometric

work one typically specifies $\mu_i = X_i\beta$, the $\lambda_i$ parameter of the Poisson

distribution can be reparameterized to admit the influence of

covariates. Since for all $i$, $\lambda_i > 0$, a straightforward approach is to

assume $\lambda_i = \exp(X_i\beta)$ and to estimate $\beta$ by maximum likelihood (see Hausman,

Hall, Griliches (1984), Hausman, Ostro, Wise (1983), Portney and Mullahy

(1985)). This is the approach adopted here for modeling the restricted

activity day outcomes.

One drawback of the Poisson model is the restriction that $E(Y_i)$

$= \mathrm{Var}(Y_i)$. Should this restriction not in fact characterize the data, the

maximum likelihood estimates of the covariance matrix of $\hat\beta$ based on minus

the inverse of the estimated Hessian will be inconsistent and t-tests based

thereon would be misleading. Hausman, Ostro, and Wise circumvent this

restriction by allowing for an overdispersion parameter. A different

approach is used here, using an estimator of the covariance matrix that is

robust against departures from the mean=variance restriction; this

procedure is described below.

Given T independent observations, the log-likelihood function of the

Poisson health outcome model can be written as
$$\ell = \sum_i \left[ -\exp(X_i\beta) + y_i X_i\beta \right] + C, \tag{34}$$

where $\exp(X_i\beta) = \lambda_i$, $y_i$ is the observed count of illness days, and $C$ does

not depend on $\beta$. It is obvious that $\ell$ is concave in $\beta$. The first-order

conditions for the maximization of $\ell$ are

$$\partial\ell/\partial\beta = \sum_i \left[ -\exp(X_i\beta)X_i' + y_i X_i' \right] = 0 \tag{35}$$

with the maximum guaranteed by the condition

$$\partial^2\ell/\partial\beta\,\partial\beta' = \sum_i -(X_i'X_i)\exp(X_i\beta) \tag{36}$$

negative definite.

The maximum likelihood estimates of $\beta$ obtained by maximizing (34) are

consistent, but the estimate of the covariance matrix of $\hat\beta_{ML}$, using

$[-\partial^2\ell/\partial\beta\,\partial\beta']^{-1}$ evaluated at $\hat\beta_{ML}$, will be inconsistent if the data are not in

fact generated by the specified Poisson distribution.

This is most easily seen as follows. Note that the model can be

equivalently cast as a nonlinear least squares regression, the i-th

observation being
$$y_i = \exp(X_i\beta) + \varepsilon_i \tag{37}$$

with $E(\varepsilon_i) = 0$. Clearly, $\mathrm{var}(\varepsilon_i) = \mathrm{var}(Y_i) = \exp(X_i\beta)$, so that the $\varepsilon_i$ are

heteroscedastic. If nonlinear weighted least squares is used with the

weights $\exp(-X_i\hat\beta)$ formed using consistent estimates of $\beta$, and if the data

are in fact Poisson as specified, the maximum likelihood consistent

estimates of $\beta$ and $\mathrm{cov}(\hat\beta)$ will obtain. (The consistency of $\hat\beta_{ML}$ for $\beta$ does

not depend on the weighting scheme.) However, if the data are not

Poisson-distributed, the estimate of $\mathrm{cov}(\hat\beta)$ obtained in this manner will be

inconsistent and asymptotic t-tests based thereon will be misleading. The

case is fully analogous to the estimation of the heteroscedastic linear

model which yields inconsistent covariance estimates (and, therefore,

t-statistics) if the heteroscedastic nature of the error structure is

either ignored or incorrectly specified.
Royall (1984) has demonstrated a method whereby estimates of $\mathrm{cov}(\hat\beta)$

robust against misspecification of the underlying distribution of the data

can be obtained for various distributions, including the Poisson, when

$[-\partial^2\ell/\partial\beta\,\partial\beta']^{-1}$ evaluated at $\hat\beta_{ML}$ fails to yield a consistent estimate of

$\mathrm{cov}(\hat\beta)$. Denoting $I(\beta)$ as $[-\partial^2\ell/\partial\beta\,\partial\beta']$, Royall's suggestion is to estimate

$\mathrm{cov}(\hat\beta)$ as

$$I(\beta)^{-1}\left[\sum_i (\partial\ell_i/\partial\beta)(\partial\ell_i/\partial\beta)'\right] I(\beta)^{-1} \tag{38}$$

where $\ell_i$ is the i-th observation's contribution to the log-likelihood

function and where all relevant evaluations in (38) are at $\hat\beta_{ML}$. This will

be the approach adopted in empirical implementation of the Poisson model in

the present study.
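
The estimator and the two covariance estimates discussed in this section can be sketched numerically as follows. This is an illustrative sketch only, not the code used in the study: the function names, the Newton-Raphson settings, and the simulated gamma-mixed data are all assumptions introduced here.

```python
import numpy as np

def fit_poisson(X, y, tol=1e-10, max_iter=100):
    """Maximize the Poisson log-likelihood (34) by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        lam = np.exp(X @ beta)                   # lambda_i = exp(X_i beta)
        score = X.T @ (y - lam)                  # first-order conditions (35)
        hessian = -(X * lam[:, None]).T @ X      # second derivatives (36)
        step = np.linalg.solve(hessian, score)
        beta = beta - step                       # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return beta

def poisson_covariances(X, y, beta):
    """Return the Hessian-based and the robust (38) covariance estimates."""
    lam = np.exp(X @ beta)
    info_inv = np.linalg.inv((X * lam[:, None]).T @ X)   # I(beta)^{-1}
    scores = X * (y - lam)[:, None]              # row i is d l_i / d beta'
    robust = info_inv @ (scores.T @ scores) @ info_inv
    return info_inv, robust

# Hypothetical overdispersed counts (assumption): a gamma-mixed Poisson,
# so the conditional mean is still exp(X beta) but Var(Y) > E(Y).
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([0.5, 0.3])
mix = rng.gamma(shape=1.0, scale=1.0, size=n)    # mean-one heterogeneity
y = rng.poisson(np.exp(X @ true_beta) * mix)

beta_hat = fit_poisson(X, y)
naive_cov, robust_cov = poisson_covariances(X, y, beta_hat)
print("beta_hat:", beta_hat)
print("Hessian-based s.e.:", np.sqrt(np.diag(naive_cov)))
print("robust s.e.:       ", np.sqrt(np.diag(robust_cov)))
```

With data of this kind the point estimates should remain close to the true coefficients, while the Hessian-based standard errors would be expected to understate the sampling variability relative to the robust ones.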
2.9 GEOMETRIC-DISTRIBUTED HEALTH OUTCOME MEASURES

One alternative to the Poisson model for the modeling of count data is

the geometric distribution. Though seemingly not as often used by
econometricians as the Poisson, the geometric is a logical choice should an

alternative to the Poisson be desired. Furthermore, the basic geometric

specification does not suffer from the mean=variance restriction that is

implied in the basic Poisson model. As will be seen below, the variance of

a geometric-distributed discrete random variable is greater than its mean,

although the fact that the variance depends on the mean limits somewhat the

flexibility of the distribution.

Our description of the properties of the geometric distribution

follows that of Johnson and Kotz (1969). First, it should be noted that

the geometric is a special case of the negative binomial. Discussion is

confined here to the geometric because it is computationally far more

straightforward than is the general negative binomial. The geometric

distribution is defined as follows:

$$\Pr(X = k) = \begin{cases} P^{k}(1+P)^{-(k+1)}, & k = 0,1,2,\ldots \\ 0, & \text{else} \end{cases} \tag{39}$$

with $P > 0$. It holds that $E(X) = P$ and $\mathrm{Var}(X) = P(1+P)$. As in the

econometric specification of the Poisson model considered earlier, one

allows the $P$ to vary across observations as $P_i$, and again $P_i = \exp(X_i\beta)$ is

a sensible parameterization due to the required positivity of the $P_i$.

Given this, the likelihood function for T independent observations can

be written as
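$$L = \prod_{i=1}^{T} \exp(k_i X_i\beta)\,(1 + \exp(X_i\beta))^{-(k_i+1)} \tag{40}$$

with log-likelihood

$$\ell = \sum_{i=1}^{T} \left[ k_i X_i\beta - (k_i+1)\log(1 + \exp(X_i\beta)) \right] \tag{41}$$

where $k_i$ is the observed count for the i-th observation. The ML estimate $\hat\beta$

satisfies

$$\sum_{i=1}^{T} \left[ k_i - (k_i+1)\exp(X_i\beta)/(1 + \exp(X_i\beta)) \right] X_i' = 0 \tag{42}$$

The Hessian is

$$H = \partial^2\ell/\partial\beta\,\partial\beta' = \sum_{i=1}^{T} -(k_i+1)\left[\exp(X_i\beta)/(1 + \exp(X_i\beta))^2\right] X_i'X_i, \tag{43}$$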

which is seen by inspection to be negative definite. Because it is a

fairly uncluttered expression, estimation and inference can proceed using

$-H$ as an estimate of the information matrix and $(-H)^{-1}$ as an estimate of

the covariance matrix. Unfortunately, much like the Poisson specification,

the covariance estimate thus obtained is not robust to departures from the

data being in fact geometric. However, the methods proposed by Royall

(1984) and described for the Poisson model can be used for the geometric

distribution also. As the development is identical, the details are

omitted for economy of space.
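
A minimal numerical sketch of the geometric model follows, using the score (42) and Hessian (43) in a Newton-Raphson iteration. The function names and the simulated data are hypothetical and are introduced only for illustration. The simulation relies on the fact that the distribution in (39) is the number of failures before the first success when the success probability is $1/(1+P)$.

```python
import numpy as np

def score_and_hessian(X, k, beta):
    """Score (42) and Hessian (43) of the geometric log-likelihood (41)."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))    # exp(X_i b) / (1 + exp(X_i b))
    score = X.T @ (k - (k + 1) * p)
    w = (k + 1) * p * (1.0 - p)              # (k_i + 1) exp(.) / (1 + exp(.))^2
    hessian = -(X * w[:, None]).T @ X
    return score, hessian

def fit_geometric(X, k, tol=1e-10, max_iter=100):
    """Newton-Raphson for the geometric model with P_i = exp(X_i beta)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        score, hessian = score_and_hessian(X, k, beta)
        step = np.linalg.solve(hessian, score)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical data (assumption): NumPy's geometric sampler counts trials
# (support 1, 2, ...), so subtracting one gives the failure count in (39).
rng = np.random.default_rng(1)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([0.5, 0.3])
P = np.exp(X @ true_beta)
k = rng.geometric(1.0 / (1.0 + P)) - 1

beta_hat = fit_geometric(X, k)
final_score, H = score_and_hessian(X, k, beta_hat)
cov_hat = np.linalg.inv(-H)                  # (-H)^{-1}, valid if data are geometric
print("beta_hat:", beta_hat)
print("s.e.:", np.sqrt(np.diag(cov_hat)))
```

The sample mean of the simulated counts can be compared with the mean of the $P_i$ as a quick check on the moment result $E(X) = P$.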
2.10 MULTINOMIAL-DISTRIBUTED HEALTH OUTCOME MEASURES

One type of micro data of particular interest in health econometrics

is of the following nature. We observe over the course of some fixed time

period (say one two-week period) the number of times (say days) that an

individual's health status is characterized by (k-1) mutually exclusive

illness outcome measures and, therefore, the number of days on which no

illness resulted, which can be viewed as the k-th activity. To be
concrete, the two-week illness profile for some individual who has in

his/her illness "possibility set" two illnesses (minor restricted activity

day (=M), and severe restricted activity day (=S)), and healthy days (=H)

(=14-M-S) might look like

H = 11

M = 2

S = 1

Given observations on such health outcome profiles, it is appropriate

to view the data characterizing individuals' health status as realizations

of multinomial random variables (see Morey (1981) for a related

discussion). Recalling from discrete statistical theory, the multinomial

distribution of a random variable $Y$ with parameters $(T; P_1, \ldots, P_k)$ can be

written

$$\Pr(Y = y) = T! \prod_{j=1}^{k} \left( P_j^{t_j} / t_j! \right), \tag{44}$$

where $T$ is the number of trials (here days), the $t_j$ are the number of

occurrences of the j-th outcome, and $P_j$ are the probabilities that the j-th

outcome will occur on a single trial. To extend the statistical model to

the health status measures, we consider each daily outcome as one trial

from a multinomial distribution with individual-specific parameter vector

for the m-th individual $(T_m; P_{1m}, \ldots, P_{km})$. Assuming $T_m = T_{m'} = T$ for all

$m$, $m'$, we henceforth drop the subscripts on the $T$ parameters. The profile

for two weeks, then, is the 14 (by assumption independent) daily trials for

each individual. The econometric objective is the estimation of the $P_{jm}$,

i.e. estimation of the probabilities of realizing one of the k possible

outcomes on a given day.

For computational simplicity, we proceed as follows. A logistic

distribution for the daily outcome probabilities is assumed. Thus, the

probability that the outcome is Z on any trial is
$$P_{Zm} = \exp(X_m\beta_Z) \Big/ \sum_{j\in\Omega} \exp(X_m\beta_j) \tag{45}$$

for $Z \in \Omega = \{M, S, H\}$. The logistic distribution assures that for all $m$

the multinomial requirement $(\sum_{j\in\Omega} P_{jm} = 1)$ is met.

Since the probabilities (45) are unique only up to a difference in

parameter vectors $(\beta_j - \beta_{j'})$, some normalization is required. The

normalization most convenient and easily interpreted is $\beta_H = 0$, so that $\beta_M$

and $\beta_S$ are interpreted as differences between the respective illness

parameter vectors and the no-illness parameter vector.

The objective, then, is estimation of the parameter vectors $\beta_M$ and $\beta_S$.
This is, of course, fully analogous to the widely-used multinomial logit

model where a single outcome from a set of mutually exclusive outcomes is

considered. In fact, that case is merely a special case of the present

exposition for which $T_m = 1$ for all $m$.

Estimation is by means of maximum likelihood. Assuming the existence

of N independent profile draws from the population, the likelihood of the

data as a function of the parameters is

$$L(\beta) = \prod_{m=1}^{N} \Pr(Y_m = y_m) = \prod_{m=1}^{N} T! \prod_{j\in\Omega} \left( P_{jm}^{t_{jm}} / t_{jm}! \right) \tag{46}$$

where the $P_{jm}$ are as defined in (45) and where $\Omega$ is the illness-type index

set. In log form,

$$\ell(\beta) = \sum_{m=1}^{N} \sum_{j\in\Omega} t_{jm} \log P_{jm} + C, \tag{47}$$

where $C$ is a constant not depending on $\beta$. Given the assumed logistic

probabilities, we have

$$\ell(\beta) = \sum_{m=1}^{N} \sum_{j\in\Omega} t_{jm} \left[ X_m\beta_j - \log\Big( \sum_{k\in\Omega} \exp(X_m\beta_k) \Big) \right] + C. \tag{48}$$

Maximizing (48) can be accomplished with only a slight modification of most

existing (single-trial) multinomial logit programs.
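
The log-likelihood (48) and its gradient can be sketched as below, under the normalization $\beta_H = 0$. This is an illustrative sketch with hypothetical function names and simulated 14-day profiles; it is not the program used in the study. The only change relative to a single-trial multinomial logit is that each individual's outcome indicator is replaced by the vector of day counts $t_{jm}$.

```python
import numpy as np

def outcome_probs(X, B):
    """Logistic probabilities (45) with beta_H = 0.

    X is (N, p); B is (p, 2) with columns beta_M and beta_S.
    Returns an (N, 3) array with columns ordered M, S, H.
    """
    xb = np.column_stack([X @ B, np.zeros(X.shape[0])])
    xb = xb - xb.max(axis=1, keepdims=True)      # guard against overflow
    e = np.exp(xb)
    return e / e.sum(axis=1, keepdims=True)

def loglik(B, X, t):
    """Log-likelihood (48) without the constant C; t is (N, 3) day counts."""
    return np.sum(t * np.log(outcome_probs(X, B)))

def score(B, X, t):
    """Gradient of (48) with respect to (beta_M, beta_S), shape (p, 2)."""
    P = outcome_probs(X, B)
    T = t.sum(axis=1, keepdims=True)             # trials per individual
    return X.T @ (t - T * P)[:, :2]

# Hypothetical two-week profiles (assumption): T = 14 independent daily trials.
rng = np.random.default_rng(2)
N, T = 500, 14
X = np.column_stack([np.ones(N), rng.normal(size=N)])
B_true = np.array([[-2.0, -3.0],
                   [0.4, 0.6]])                  # columns: beta_M, beta_S
P_true = outcome_probs(X, B_true)
t = np.array([rng.multinomial(T, p) for p in P_true])

print("log-likelihood at true parameters:", loglik(B_true, X, t))
print("log-likelihood at zero parameters:", loglik(np.zeros((2, 2)), X, t))
```

Passing `loglik` and `score` to a general-purpose optimizer would then yield the maximum likelihood estimates; with $T = 1$ the same functions reduce to the ordinary multinomial logit case.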
2.11 ESTIMATION OF GROUPED DATA MODELS UNDER THE NORMALITY ASSUMPTION

There are often institutional or other constraints in the sampling or

data-recording processes that have the effect of generating inexact data

for research purposes. A common case is the situation where continuous

measures of interest, such as the amount of time spent in ill health, are

cast in the recorded micro data as grouped or interval data. We discussed

above strategies that might be considered when the outcomes are recorded as

"number of days" or "number of times," i.e. where the data can be viewed as

realizations of discrete statistical processes rather than as

discrete/integer codings of fundamentally continuous processes. In this

section we concern ourselves with the situations where the underlying

processes are best viewed as continuous phenomena but where the vagaries of

either the sampling or data-coding procedures are such that only a finite

number of intervals in which the continuous measure is defined are

determined and the only data available to the analyst are indicators of the

interval bounds in which the (unknown) continuous measure is realized. For

example, the latent continuous measure might be "time spent ill over some

time interval y (say t)," but owing to whatever reasons, all one knows is

whether $t = 0$, $t \in$ (0, 4 days], $t \in$ (4 days, 8 days], or $t \in$ (8 days, 365 days) (for

y=one year). The purpose of this section is to present an estimating

technique designed to handle such situations.

The method is based on the work of Rosett and Nelson (RN) (1975), who

developed what is known as the two-limit probit estimation technique, and

of Stewart (1983), who generalized the RN method to account for

multi-interval data. We will, therefore, refer to the model expounded here

as the RNS method. Here is posited the existence of normally-distributed

random variables $Y_i^* \sim NID(X_i\beta, \sigma^2)$. The realizations $y_i^*$ are unobserved,

however. Available is the knowledge that the realization $y_i^*$ is an element

of some proper subset of $R$. More formally, partition $R$ into $P$ ($\geq 2$) subsets

$J_p$, such that $\cup_p J_p = R$, $J_p \cap J_q =$
-------
2-40
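$$\partial\ell/\partial\sigma = \sum_{i=1}^{T} \left( \theta_{p-1,i} - \theta_{p,i} \right) \Big/ \left[ \sigma \left( \Phi_{p,i} - \Phi_{p-1,i} \right) \right] = 0,$$

where $\theta_{p,i} = ((A_p - X_i\beta)/\sigma)\,\phi((A_p - X_i\beta)/\sigma)$, and $\phi(c)$ is the standard normal

density $(2\pi)^{-.5} \exp(-.5c^2)$. (Note that when $P = 2$, i.e. when the model is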

binary probit, a parameter normalization is required. Typically a=1 is

used. This, of course, reduces the number of first order conditions from

(m+1) to m, where m is the dimensionality of $\beta$.) Stewart has shown how

iterative least squares can be used to obtain the ML estimate. The reader

is referred to his work for the details.
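
As a hedged illustration of the grouped-data approach, the sketch below assumes the log-likelihood takes the standard grouped-normal form, the sum over observations of $\log[\Phi((A_p - X_i\beta)/\sigma) - \Phi((A_{p-1} - X_i\beta)/\sigma)]$, where $A_{p-1}$ and $A_p$ bound the interval in which observation i falls. That form, the function names, the cut points, and the simulated data are all assumptions made here; the code evaluates the log-likelihood and the derivative with respect to $\sigma$ written in terms of $z\,\phi(z)$.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density (2 pi)^(-.5) exp(-.5 z^2)."""
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

def grouped_loglik(beta, sigma, X, lower, upper):
    """Assumed grouped-normal log-likelihood for interval-coded outcomes."""
    xb = X @ beta
    prob = norm_cdf((upper - xb) / sigma) - norm_cdf((lower - xb) / sigma)
    return np.sum(np.log(prob))

def dloglik_dsigma(beta, sigma, X, lower, upper):
    """Derivative of the log-likelihood with respect to sigma."""
    xb = X @ beta
    z_hi, z_lo = (upper - xb) / sigma, (lower - xb) / sigma

    def theta(z):
        # z * phi(z), taken to be 0 at an infinite interval bound
        finite = np.isfinite(z)
        zs = np.where(finite, z, 0.0)
        return np.where(finite, zs * norm_pdf(zs), 0.0)

    denom = sigma * (norm_cdf(z_hi) - norm_cdf(z_lo))
    return np.sum((theta(z_lo) - theta(z_hi)) / denom)

# Hypothetical interval coding (assumption): cut points at 0, 4 and 8 days.
rng = np.random.default_rng(3)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, sigma_true = np.array([3.0, 1.0]), 3.0
y_star = X @ beta_true + sigma_true * rng.normal(size=n)   # latent, unobserved
cuts = np.array([-np.inf, 0.0, 4.0, 8.0, np.inf])
idx = np.searchsorted(cuts[1:-1], y_star)                  # interval index 0..3
lower, upper = cuts[idx], cuts[idx + 1]

print("log-likelihood:", grouped_loglik(beta_true, sigma_true, X, lower, upper))
print("d loglik / d sigma:", dloglik_dsigma(beta_true, sigma_true, X, lower, upper))
```

Only `lower` and `upper` enter the likelihood; the latent `y_star` is generated solely to build the simulated interval codes.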

2.12 SUMMARY AND CONCLUSIONS

This brief survey has attempted to present an overview of several

approaches to econometric estimation of air pollution - health outcomes

models in situations where the distributions of health outcome data are such

that methods other than linear ordinary least squares are likely to be

required in order to obtain consistent parameter estimates. The data used

in this study are in all instances of this "nonstandard" nature. In

particular, the analysis to follow concentrates on three of the types of

data described in the preceding discussion: count data,

multinomial-distributed data, and discrete indicator, or (0,1), data of the

probit sort. The following chapters discuss in some detail estimation of

such models, and implement some of the estimation techniques presented in

the analysis of this chapter.

The scope of the present analysis precludes consideration of several

interesting research issues that must be placed on the menu of future

research. First, the matter of severity of chronic illnesses is left
untreated. It clearly is plausible that not only the presence or absence

of, for example, chronic respiratory illness is related to air pollution

exposures, but also that the severity of the illness - defined by some

metric of severity — is responsive to pollution exposures as well as other

covariates. A second interesting issue that merits analysis in the future

is the possibility that some subset of the explanatory variables used to

explain health outcomes is correlated with heterogeneous

individual-specific components of the unobserved equation error terms. In

the present context, it might be argued that covariates such as cigarette

consumption, income, labor market status, and even air pollution exposures

(on this last point, see Rosenzweig and Wolpin (1984)) are possibly

correlated with unobservable errors. When such heterogeneous unobservables

are present and — the crux of the problem — are correlated with observed

explanatory variables, parameter estimates obtained without explicit

recognition and control for this nonzero correlation will in general be

inconsistent. Some instrumental variable technique will likely be required

in order to obtain consistent parameter estimates under such circumstances.
REFERENCES

Amemiya, T. 1981. "Qualitative Response Models: A Survey," Journal of

Economic Literature, vol. 19, pp. 1483-1536.

Amemiya, T. 1983. "Nonlinear Regression Models," in Z. Griliches and M. D.

Intriligator, eds., Handbook of Econometrics, vol. 1, (Amsterdam:

North-Holland).

Amemiya, T. 1984. "Tobit Models: A Survey," Journal of Econometrics, vol. 24,

pp. 3-61.

Berndt, E. R., B. H. Hall, R. E. Hall, and J. A. Hausman. 1974.

"Estimation and Inference in Nonlinear Structural Models," Annals of

Economic and Social Measurement, vol. 3, pp. 653-665.

Breusch, T. and A. R. Pagan. 1980. "The Lagrange Multiplier Test and Its

Application to Model Specification in Econometrics," Review of Economic

Studies, vol. 47, pp. 239-253.

Cox, D. R. and D. V. Hinkley. 1974. Theoretical Statistics (London:

Chapman and Hall).

Cragg, J. G. 1971. "Some Statistical Models for Limited Dependent

Variables with Application to the Demand for Durable Goods,"

Econometrica, vol. 39, pp. 829-844.

Duan, N., W. G. Manning, C. M. Morris and J. P. Newhouse. 1983. "A

Comparison of Alternative Models for the Demand for Medical Care,"

Journal of Business and Economic Statistics, vol. 1, pp. 115-126.

Duan, N., W. G. Manning, C. M. Morris and J. P. Newhouse. 1984. "Choosing Between the

Sample-Selection Model and the Multipart Model," Journal of Business

and Economic Statistics, vol. 2, pp. 283-289.

Dudley, L. and C. Montmarquette. 1976. "A Model of the Supply of

Bilateral Foreign Aid," American Economic Review, vol. 66, pp. 132-142.
Hausman, J. A. 1978. "Specification Tests in Econometrics," Econometrica,

vol. 46, pp. 1251-1271.

Hausman, J. A., B. Hall and Z. Griliches. 1984. "Econometric Methods for Count

Data with an Application to the Patents-R&D Relationship," Econometrica,

vol. 52, pp. 909-938.

Hausman, J. A., B. Ostro and D. Wise. 1984. "Air Pollution and Lost Work," NBER

working paper 1263, January.

Hay, J. W. and R. J. Olsen. 1984. "Let Them Eat Cake: A Note on

Comparing Alternative Models of the Demand for Medical Care," Journal of

Business and Economic Statistics, vol. 2, pp. 279-282.

Heckman, J. 1976. "The Common Structure of Statistical Models of

Truncation, Sample Selection and Limited Dependent Variables and a

Simple Estimator for Such Models," Annals of Economic and Social

Measurement, vol. 5, pp. 475-492.

Heckman, J. 1979. "Sample Selection Bias as a Specification Error,"

Econometrica, vol. 47, pp. 153-161.

Hurd, M. 1979. "Estimation in Truncated Samples When There is

Heteroscedasticity," Journal of Econometrics, vol. 11, pp. 247-258.

Johnson, N. L. and S. Kotz. 1969. Distributions in Statistics; Discrete

Distributions (New York: Wiley).

Johnson, N. L. and S. Kotz. 1970. Distributions in Statistics; Continuous

Univariate Distributions - I (New York: Wiley).

Kendall, M. G. and A. Stuart. 1973. Advanced Theory of Statistics, vol. 3

(London: Griffin).

Killingsworth, M. R. 1983. Labor Supply (Cambridge: Cambridge University

Press).

Lin, T.-F. and P. Schmidt. 1984. "A Test of the Tobit specification
Against an Alternative Suggested by Cragg," Review of Economics and

Statistics, vol. 66, pp. 174-177.

Maddala, G. S. 1977. Econometrics (New York: McGraw-Hill).

Maddala, G. S. 1983. Limited-Dependent and Qualitative Variables in Econometrics

(Cambridge: Cambridge University Press).

Manski, C. F. and D. McFadden. 1981. Structural Analysis of Discrete Data

with Econometric Applications (Cambridge, Mass: MIT Press).

Morey, E. R. 1981. "The Demand for Site-Specific Recreational Activities:

A Characteristics Approach," Journal of Environmental Economics and

Management, vol. 8, pp. 345-371.

Nelson, F. D. 1981. "A Test for Misspecification in the Censored-Normal

Model," Econometrica, vol. 49, pp. 1317-1329.

Olsen, R. 1980. "Approximating a Truncated Normal Regression with the

Method of Moments," Econometrica, vol. 48, pp. 1099-1106.

Ostro, B. 1983. "The Effects of Air Pollution on Work Loss and

Morbidity," Journal of Environmental Economics and Management, vol. 10,

pp. 371-382.

Pearson, K. and A. Lee. 1908. "Generalized Probable Error in Multiple

Normal Correlations," Biometrika, vol. 6, pp. 59-68.

Pitt, M. 1983. "Food Preferences and Nutrition in Rural Bangladesh,"

Review of Economics and Statistics, vol. 65, pp. 105-114.

Portney, P. R. and J. Mullahy. 1985. "Urban Air Quality and Acute

Respiratory Illness," Journal of Urban Economics, forthcoming.

Rao, C. R. 1965. Linear Statistical Inference and Its Applications, (New

York: Wiley).

Rosenzweig, M. R. and K. I. Wolpin. 1984. "Migration Selectivity and the

Effects of Public Programs," University of Minnesota, Economic
Development Center, Bulletin

Rosett, R. N. and F. D. Nelson. 1975. "Estimation of a Two-Limit Probit

Regression Model," Econometrica, vol. 43, pp. 141-146.

Royall, R. 1984. "Robust Inference Using Maximum Likelihood Estimators,"

Johns Hopkins University, Department of Biostatistics Working Paper.

Schmidt, P. 1976. Econometrics (New York: Marcel Dekker).

Smith, M. and G. Maddala. 1983. "Multiple Model Testing for Non-Nested

Heteroscedastic Censored Regression Models," Journal of Econometrics,

vol. 21, pp. 71-81.

Stapleton, D. and D. Young. 1984. "Censored Normal Regression with

Measurement Error on the Dependent Variable," Econometrica, vol. 52,

pp. 737-760.

Stewart, M. B. 1983. "On Least Squares Estimation when the Dependent

Variable is Grouped," Review of Economic Studies, vol. 50, pp.

737-753.

Tobin, J. 1958. "Estimation of Relationships for Limited Dependent

Variables," Econometrica, vol. 26, pp. 24-36.

Wales, T. and A. Woodland. 1983. "Estimation of Consumer Demand Systems

with Binding Non-Negativity Constraints," Journal of Econometrics,

vol. 21, pp. 263-285.

White, H. 1982. "Maximum Likelihood Estimation of Misspecified Models,"

Econometrica, vol. 50, pp. 1-25.

White, H. 1983. "Corrigendum," Econometrica, vol. 51, p. 513.
-------
Chapter 3

AIR POLLUTION MONITORS AND INDIVIDUAL EXPOSURES

The models estimated in Volume I typically utilized as measures of

an individual's exposure the pollutant-specific readings from the

monitor nearest the centroid of the respondent's census tract for which

the data were available. In most cases, screening criteria were

established so that it was necessary both for a monitor to have recorded

at least some minimal number of hourly readings during the two-week

period and for the monitor to be located not further than some

prescribed distance (20 miles; 10 miles) from the residents' census

tract centroids.

It is possible that the nearest-monitor readings we utilized are

not representative of the pollution "profile" of the metropolitan area

in which each respondent lives. If some average of the readings from a

number of nearby monitors better characterizes the ambient

concentrations facing the individuals in question, then the consistency

of results obtained using nearest-monitor readings must be called into

question. (This abstracts, of course, from the larger question of the

ability of ambient monitors at all to measure the exposure of

individuals.)

The purpose of this very brief chapter is to compare and assess the pollution

profiles constructed using the nearest-monitor readings versus those

that result when the average readings from a number of monitors are

used. The extent to which the two constructs are correlated indicates

the sensitivity of our results to the use of nearest-monitor readings to

characterize exposure.

The procedure is as follows. For each of the six pollutants used

in our study (ozone, sulfates, TSP, NO2, CO, and SO2), we utilized the

data for the 14,441 adults in the main sample and constructed the

nearest-monitor measures used in the main study. These were designated

as XXNR01, where XX is the specific pollutant (O3, S4, SP, N2, CO, S2). In

our study, recall, these measures were subjected to a

miles-from-census-tract-centroid cutoff of 5, 10, or, most often, 20

miles; the specific distance will be obvious from the context. (No

minimal hours standard is used here.)

For these same individuals, we then constructed two averaged

measures for each pollutant. The two measures constructed were the

simple average over all the available readings from monitors within 10

and then within 20 miles of the census tract centroids. These measures

were designated XXAVGYY, where XX was as defined above and YY was either

10 or 20. Thus, N2AVG20 is the average of all nitrogen dioxide monitors

within 20 miles of the census tract centroid.

Then, given these measures, we calculated for each pollutant the

correlation between the nearest-monitor reading and the area-average

reading at both the 10 and 20 mile cutoff values. We also calculated the

number of monitors used to construct the two area-averages.

The results are presented in the tables that follow. In each case,

"r" is the simple correlation coefficient between the nearest-monitor

reading and the 10 or 20 mile averaged readings.
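
The construction just described can be sketched as follows. This is a hypothetical illustration, not the procedure actually coded for the study: the function name, the array layout (one row per respondent, one column per candidate monitor), and the simulated distances and readings are assumptions.

```python
import numpy as np

def exposure_measures(dist, reading, cutoff):
    """Nearest-monitor and area-average readings within a distance cutoff.

    dist, reading: float arrays of shape (n_people, n_monitors); a NaN
    reading marks a monitor with no data. Returns (nearest, average, count),
    with NaN in nearest/average where no monitor qualifies.
    """
    usable = (dist <= cutoff) & ~np.isnan(reading)
    count = usable.sum(axis=1)
    has_any = count > 0
    # nearest usable monitor: unusable monitors are pushed to infinite distance
    nearest_idx = np.where(usable, dist, np.inf).argmin(axis=1)
    nearest = reading[np.arange(dist.shape[0]), nearest_idx]
    nearest = np.where(has_any, nearest, np.nan)
    total = np.where(usable, reading, 0.0).sum(axis=1)
    average = np.full(dist.shape[0], np.nan)
    average[has_any] = total[has_any] / count[has_any]
    return nearest, average, count

# Simulated example (assumption): 1,000 respondents, 10 candidate monitors.
rng = np.random.default_rng(4)
n_people, n_monitors = 1000, 10
dist = rng.uniform(0.0, 40.0, size=(n_people, n_monitors))
area_level = rng.gamma(4.0, 0.012, size=(n_people, 1))       # shared local level
reading = area_level + rng.normal(0.0, 0.01, size=(n_people, n_monitors))

nearest, average, count = exposure_measures(dist, reading, cutoff=20.0)
valid = count > 0
r = np.corrcoef(nearest[valid], average[valid])[0, 1]
freq = np.bincount(count[valid])         # f(n): respondents with n monitors
print("r =", round(r, 3), " N =", int(valid.sum()))
print("f(n) for n = 0..%d:" % (len(freq) - 1), freq)
```

The frequency vector corresponds to the f(n) columns in the tables below, and N is the number of respondents with at least one qualifying monitor.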
OZONE

10 mile:  r = .965,  N = 8,823

                 Mean       Max       Min
   O3NR01       .0454      .251         0
   O3AVG10      .0460      .236         0

   Number of Monitor Readings in Area-Average:

       n    f(n)        n    f(n)
       1    3832        6     244
       2    2303        7     262
       3     985        8     179
       4     553        9     112
       5     302       10      51

20 mile:  r = .931,  N = 11,241

                 Mean       Max       Min
   O3NR01       .0450      .251         0
   O3AVG20      .0461      .225      .003

   Number of Monitor Readings in Area-Average:

       n    f(n)        n    f(n)
       1    2665        6     679
       2    2463        7     646
       3    1447        8     550
       4     954        9     631
       5     847       10     359

SULFATES

10 mile:  r = .952,  N = 5,249

                 Mean       Max       Min
   S4NR01      10.528    52.136         0
   S4AVG10     10.544    52.136         0

   Number of Monitor Readings in Area-Average:

       n    f(n)        n    f(n)
       1    2693        6     134
       2    1134        7     109
       3     401        8     101
       4     308        9     116
       5     247       10       6

20 mile:  r = .912,  N = 7,512

                 Mean       Max       Min
   S4NR01      10.590    52.136         0
   S4AVG20     10.523    52.136     1.586

   Number of Monitor Readings in Area-Average:

       n    f(n)        n    f(n)
       1    2595        6     526
       2    1230        7     559
       3     823        8     329
       4     614        9     250
       5     559       10      27

TSP

10 mile:  r = .859,  N = 12,598

                 Mean       Max       Min
   SPNR01      70.478   284.004     9.996
   SPAVG10     72.021   253.^28    15.092

   Number of Monitor Readings in Area-Average:

       n    f(n)        n    f(n)
       1    1851        6    1039
       2    1283        7     975
       3    1275        8    1207
       4     979        9    1455
       5    1012       10    1522

20 mile:  r = .818,  N = 13,772

                 Mean       Max       Min
   SPNR01      70.128   284.004     9.996
   SPAVG20     71.948   272.244    15.092

   Number of Monitor Readings in Area-Average:

       n    f(n)        n    f(n)
       1     822        6    1272
       2     895        7    1373
       3     702        8    1429
       4     797        9    2330
       5    1084       10    3068

NO2

10 mile:  r = .951,  N = 6,393

                 Mean       Max       Min
   N2NR01     117.857   435.316         0
   N2AVG10    117.323   435.316         0

   Number of Monitor Readings in Area-Average:

       n    f(n)        n    f(n)
       1    3195        6      96
       2    1593        7      36
       3     775        8       1
       4     485        9       0
       5     212       10       0

20 mile:  r = .923,  N = 8,452

                 Mean       Max       Min
   N2NR01     112.913   435.316         0
   N2AVG20    111.646   375.928         0

   Number of Monitor Readings in Area-Average:

       n    f(n)        n    f(n)
       1    3004        6     668
       2    1890        7     692
       3     747        8     398
       4     442        9     168
       5     417       10      26

CO

10 mile:  r = .887,  N = 8,921

                 Mean       Max       Min
   CONR01       3.306    26.583         0
   COAVG10      3.937    26.583         0

   Number of Monitor Readings in Area-Average:

       n    f(n)        n    f(n)
       1    3638        6     288
       2    2087        7     280
       3     946        8     235
       4     676        9     146
       5     510       10     115

20 mile:  r = .838,  N = 10,939

                 Mean       Max       Min
   CONR01       3.717    26.583         0
   COAVG20      3.808    25.111         0

   Number of Monitor Readings in Area-Average:

       n    f(n)        n    f(n)
       1    2344        6     480
       2    2536        7     298
       3    1130        8     553
       4    1215        9     838
       5     984       10     561

SO2

10 mile:  r = .857,  N = 8,842

                 Mean       Max       Min
   S2NR01      68.591   760.088         0
   S2AVG10     69.375   568.988         0

   Number of Monitor Readings in Area-Average:

       n    f(n)        n    f(n)
       1    2976        6     356
       2    1855        7     182
       3    1280        8     206
       4     966        9     192
       5     770       10      59

20 mile:  r = .819,  N = 10,784

                 Mean       Max       Min
   S2NR01      66.222   760.088         0
   S2AVG20     67.050   568.988         0

   Number of Monitor Readings in Area-Average:

       n    f(n)        n    f(n)
       1    2414        6     919
       2    1733        7     726
       3    1204        8     841
       4    1129        9     541
       5    1069       10     208


The results are quite reassuring about the use of nearest-monitor

data to proxy individual exposure. The correlation coefficients between

the nearest-monitor reading and the average of all monitors within 10

miles range from .965 for ozone (a highly dispersed pollutant) to .86

for TSP and SO2 (more localized pollutants). The 20-mile correlations

follow similar relationships, but are of course somewhat lower than the

10-mile correlations due to the decreased weight of the nearest-monitor

reading in calculating the 20-mile averages. What is particularly

encouraging is that no correlation coefficient is below 0.8, leading us

to suspect that the use of the nearest-monitor reading would be unlikely

to impart any systematic biases vis-a-vis use of area-averaged readings.

In the following chapter, we make use of the area-averaged readings to

test this suspicion.
-------
Chapter 4

URBAN AIR QUALITY AND ACUTE RESPIRATORY ILLNESS

4.1 Introduction

Over the past fifteen years, economists interested in the benefits of

air pollution control have concerned themselves with more than just the

appropriate valuation of health gains and losses. In addition, some have

explored in epidemiological analyses the actual physical relationships

between air pollution and health itself using statistical techniques common

in the social and natural sciences. Most of these studies have used

aggregate data at the city or SMSA level to test for the effects of

prolonged exposures to air pollution on the mortality rates across the

units of observation. The studies of Lave and Seskin [8], Crocker, et al.

[2], Mendelsohn and Orcutt [12], Chappie and Lave [1], and Lipfert [10] are

among the best examples.

Relatively less attention has been given in this literature to the

relationship between air pollution and sickness (or morbidity). This is

unfortunate because morbidity is observed much more frequently than

mortality and may be of greater economic significance than premature death.

When researchers have examined possible links between air pollution and

morbidity, they have generally been forced through lack of data to do so in

the absence of information about individuals' socioeconomic and other

characteristics—even though these characteristics may have an important

effect on health status.

Volume I presents our recently completed comprehensive investigation

of the effects of ozone (ground-level rather than stratospheric) and other

air pollutants on individuals' acute and chronic health status. Unlike

many previous studies, this work is based on a large and relatively

detailed individual data base, allowing controls for certain important

socioeconomic and demographic characteristics in addition to the

meteorological measures sometimes included in earlier studies using either

aggregate or less detailed individual data. This chapter presents some of

the major findings concerning the effects of urban air quality on acute

respiratory disease using an estimation technique not employed in Volume I.

Chapter 7 reports some new findings on air pollution and chronic illness.

Of particular concern here is the sensitivity of the findings to the

measures of air quality used. As suggested above, most previous analyses

of the health effects associated with air pollution have characterized

individual exposures using some measure of air quality averaged over most

or all of the monitors in the urban areas where the individuals live.

However, many persons may get most or all of their ambient exposure

proximate to the monitors nearest their homes. As part of our larger study

in Volume I, therefore, each individual in the sample was matched to the

nearest ten air pollution monitors for each of eight different air

pollutants so as to use close-to-home pollution readings to characterize

exposure. Because this was very resource-intensive, it is important to

illustrate the difference such an approach may make when estimating

dose-response relationships. Additional sensitivity analyses in this

chapter explore interactive effects as well as possible thresholds and

non-linearities in the relationship between air pollution and acute health

status.

In Section II we briefly describe the data used in our analysis and

the independent variables we include. In Section III, we discuss the

estimating techniques used to explore possible links between air pollution

and acute respiratory disease. In Section IV we present our empirical

findings and in Section V we draw some cautious inferences from them for

applied welfare calculations.

4.2 Framework for the Analysis

The individual data underlying both our larger study as well as the

present chapter come from the 1979 Health Interview Survey (HIS)—a

nationwide sample of approximately 110,000 individuals conducted during the

course of each year by the National Center for Health Statistics. All

acute illness experienced during the two-week period prior to the date of

each interview was to be reported by each respondent or the family member

responding for him or her. Manifestations of these illnesses were

classified in three types—bed disability days (the most serious of the

three categories), work or school loss days, and what might best be thought

of as minor restricted activity days. The latter are days on which the

respondent was neither bed-ridden nor forced to miss work or school but did

suffer from an acute impairment sufficient to cause him or her to restrict

activity in some noticeable way. The dependent variable in the subsequent

analysis is total restricted activity days, the total number of days during

the two-week period on which any of these three types of acute illness
-------
4-4

occurred. Finally, all acute (and chronic) health information elicited in

the survey was coded by cause, using the International Classification of

Disease. Attention is limited in this chapter to total restricted activity

days due to respiratory disease since this is the type of acute impairment

most likely to result from exposure to air pollution.

The socioeconomic data elicited from each respondent in the Health

Interview Survey include, among many other individual and

household-specific characteristics, information on age, race, sex, income,

and education. In addition, several supplements to the 1979 survey made it

particularly useful for epidemiological purposes. Specifically, the 1979

HIS contained a supplemental questionnaire asked of one-third of all the

adults interviewed (26,271 of a total of 79,743 adults) which provided

detailed data on lifetime smoking history, including the tar and nicotine

content of the brands most commonly smoked. Smoking data are obviously of

great importance if one is interested in exploring the determinants of

respiratory and other types of disease. The 1979 HIS also included a

supplement (again to one-third of all adults surveyed) designed to provide

detailed information on residential histories. This is not important for

our present purposes but will play a major role in our analysis in Chapter

7 of the determinants of chronic respiratory and other types of disease.

All air pollution data come from the Environmental Protection Agency's

SAROAD system. For our analysis of the relationship between air pollution

and acute morbidity in the larger study, all air quality data were specific

to the two-week recall period for which individual health data were

available. This is also the case here, save for sensitivity analyses
-------
4-5

conducted using annual average data as a proxy for air quality during the

two-week period. As indicated above, most of the analysis below

characterizes individuals' exposures to air pollution using data from the

air pollution monitors nearest their residences. No individuals are

included in the final sample if the nearest monitor for any pollutant is

more than ten miles away. The average distance to the nearest monitor is

slightly more than four miles. In addition to the air pollution data,

meteorological data were added from the monitoring network of the National

Oceanic and Atmospheric Administration. Included are observations on

temperature and precipitation during the two-week recall period.
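
The three exposure proxies described above (nearest monitor, and averages over all monitors within 10 and 20 miles) can be sketched as follows. This is an illustrative reconstruction only, not the matching algorithm actually used in the study: the function names, the use of great-circle distance, and the input layout are all assumptions made for the example.

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def exposure_proxies(centroid, monitors, max_nearest_miles=10.0):
    """Build hypothetical exposure proxies for one respondent.

    centroid: (lat, lon) of the respondent's census tract centroid.
    monitors: list of (lat, lon, reading) tuples for one pollutant.
    Returns None when the nearest monitor is farther than max_nearest_miles,
    mirroring the ten-mile exclusion rule described in the text.
    """
    dists = sorted(
        (haversine_miles(centroid[0], centroid[1], la, lo), reading)
        for la, lo, reading in monitors
    )
    if not dists or dists[0][0] > max_nearest_miles:
        return None

    def avg_within(radius):
        # Average reading over every monitor inside the given radius.
        vals = [reading for d, reading in dists if d <= radius]
        return sum(vals) / len(vals)

    return {
        "nearest": dists[0][1],     # analogue of OZNEAR / S4NEAR
        "avg10": avg_within(10.0),  # analogue of OZAV10 / S4AV10
        "avg20": avg_within(20.0),  # analogue of OZAV20 / S4AV20
    }
```

Because a respondent is kept only when the nearest monitor lies within ten miles, both radius averages in this sketch always include at least that one monitor.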

The overall sample from which the subsample used here is drawn

consists of 14,441 individuals aged seventeen and above for whom both

smoking data and at least some air pollution data were available. The

models estimated below are based on a smaller subsample, however, since

complete data are required for each of the air pollutants and other

independent variables.

The analysis of acute respiratory disease includes air pollution data

during the two-week recall period for ozone, a gaseous pollutant that is

the primary constituent of smog, as well as sulfates, perhaps the most

harmful of the airborne particles. It is worth noting that the computer

algorithm used to match individuals to the ten nearest ozone and sulfate

monitors could only be used for monitors within SMSAs. Thus, the

estimation sample consists entirely of city and suburban residents from

around the United States. Table 4-1 lists the independent variables used

in the analysis of acute respiratory disease and their sample means.
-------
4-6
Table 4-1. Variable Definitions and Sample Means

Variable Name   Description                                   Sample Mean

OZNEAR          Average daily maximum one-hour ozone              0.042
                reading during two-week recall period
                at monitor nearest the centroid of
                respondent's census tract of residence
                (in parts per million)

S4NEAR          Average 24-hour sulfate concentration            10.876
                during two weeks at nearest monitor
                (see above) (in micrograms per
                cubic meter)

OZAV10          Average daily maximum one-hour                    0.043
                ozone reading during two weeks
                averaged over all monitors within
                a ten mile radius of respondent's
                census tract centroid

S4AV10          Average 24-hour sulfate concentration            10.890
                during two weeks averaged as in OZAV10

OZAV20          Same as OZAV10 but averaged over all              0.044
                monitors within 20 mile radius

S4AV20          Same as S4AV10 but averaged over all             10.700
                monitors within 20 mile radius

OZANNR          Average daily maximum one-hour ozone              0.042
                concentration over entire calendar
                year 1979 as measured at the nearest
                monitor

S4ANNR          Average 24-hour sulfate concentration            10.752
                over calendar year 1979 as measured at
                the nearest monitor

OZAN10          Same as OZANNR but averaged over all              0.043
                monitors within ten mile radius

S4AN10          Same as S4ANNR but averaged over all             10.709
                monitors within 10 miles

OZAN20          Same as OZAN10 but averaged over all              0.044
                monitors within 20 miles
-------
4-7
Table 4-1 (cont'd). Variable Definitions and Sample Means

Variable Name   Description                                   Sample Mean

S4AN20          Same as S4AN10 but averaged over                 10.588
                all monitors within 20 miles

WHITE           Dummy variable: 1 if white,                       0.852
                0 otherwise

MALE            Dummy variable: 1 if male,                        0.436
                0 if female

INCOME          Annual household income                          17,152
                in dollars

AGE             Age in years                                      42.30

CIGS            Number of cigarettes smoked per day                7.58

FORMER          Dummy variable: 1 if respondent                    0.20
                formerly smoked regularly but does
                not presently, 0 if not

SCHLYR          Years of education completed                      11.73

CHRONIC         Dummy variable: 1 if respondent                    0.17
                has any limitation in activity due
                to chronic illness, 0 otherwise

MAXTMP          Average daily maximum temperature                 64.02
                during two-week period

RAIN            Average daily rainfall during                      0.12
                two-week period

RRAD            Number of respiratory-related restricted          0.162
                activity days during two-week recall
                period
-------
4-8

4.3. Model Specification

For reasons of economy and computational simplicity, most of the

models in Volume I were estimated using ordinary least squares and logit

techniques (where the dependent variable was, respectively, either the

number of days of a particular kind of impairment during the two-week

recall period or a dichotomous indicator of an individual having at least

one day of that kind of impairment during the period). As Chapter 2 points

out, however, estimation techniques like OLS are not ideally suited to the

nature of our measures of acute health status. Recall that that

measure is the number of respiratory-related restricted activity days

during the two-week recall period (RRADs). Clearly this measure is bounded

by zero and fourteen and because of survey protocol can assume only integer

values in {0, 1, 2, ..., 14}. The frequency distribution of RRADs for the

sample of 3,347 adults is presented in Table 4-2. Because of the small

number of observations at the upper (14 day) limit, the implications of

this upper bound for estimation strategy are ignored in the following

analysis; we concentrate instead on the complications arising from the

overwhelmingly large number of individuals reporting zero RRADs.

A standard approach in such circumstances is to use the Tobit or

censored normal estimator where one observes T independent observations on

$y_t$ which are the realizations of random variables $Y_t^*$ subject to the

censoring rule $y_t = \max(0, y_t^*)$, $Y_t^* \sim N(X_t\beta, \sigma^2)$. Estimates

obtained using the Tobit model are generally inconsistent when the

underlying data are not distributed as censored normal with
-------
4-9

Table 4-2. RRAD Frequency Distribution

RRAD       OBS         %

  0       3227     96.42
  1         25      0.75
  2         28      0.84
  3         23      0.69
  4          9      0.27
  5          7      0.21
  6          2      0.06
  7          3      0.09
  8          3      0.09
 10          3      0.09
 11          1      0.03
 12          1      0.03
 14         15      0.45
-------
4-10

independent, identically distributed errors. Estimating a Tobit model of

RRADs using the two-week average pollution data from the nearest monitor

and the other independent variables in Table 4-1 above, some tests for its

appropriateness were conducted and strong evidence of misspecification was

found. While this might be attributable to omitted variables or other

factors unrelated to departures from the usual assumptions about the error

distribution in the Tobit model, a different statistical approach is

utilized here.

In modeling event counts (non-negative integer data) over a time

interval (t,t+dt), the Poisson distribution is commonly used. Here,

discrete random variates $Y_t$ follow the probability law:

(1)  $\Pr(Y_t = y_t) = \exp(-\lambda_t)\,\lambda_t^{y_t} / y_t!$,  $y_t = 0, 1, 2, \ldots$

     $= 0$, else

with $E(Y_t) = \mathrm{Var}(Y_t) = \lambda_t$. Given the nonnegative integer nature of the

RRAD measure, such a probability law has obvious appeal for estimation.

Analogous to the normal distribution where for econometric work one

typically specifies $E(Y_t) = \mu_t = X_t\beta$, the parameter of the Poisson

distribution can be reparameterized to admit the influence of covariates.

Since for all t, $\lambda_t > 0$, a straightforward approach is to assume

$\lambda_t = \exp(X_t\beta)$ and to estimate $\beta$ by maximum likelihood (see Hausman, Hall,

Griliches [5], Hausman, Ostro, Wise [4]). This is the approach adopted

here for modeling the RRAD outcomes.
-------
4-11
A drawback of the Poisson specification is the restriction that

$E(Y_t) = \mathrm{Var}(Y_t)$. Should this restriction not characterize the data, the maximum

likelihood estimates of the covariance matrix of $\hat\beta$ will be inconsistent and

asymptotic t-tests based thereon would be misleading. Hausman, Ostro, and

Wise circumvent this restriction by allowing for an overdispersion

parameter. We take a different approach here, using an estimator of the

covariance matrix that is more robust against departures from the

restriction that the mean be equal to the variance. Details of this

procedure are presented in the appendix.
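
As a rough illustration of why the equal mean and variance restriction matters here, the short sketch below computes the raw (unconditional) mean and variance of RRADs directly from the frequencies reported in Table 4-2. It is only a descriptive check: the Poisson restriction applies conditional on the covariates, so the ratio computed here is suggestive rather than a formal test, and the function name is an assumption made for the example.

```python
# RRAD frequency distribution transcribed from Table 4-2: {days: respondents}
rrad_counts = {0: 3227, 1: 25, 2: 28, 3: 23, 4: 9, 5: 7, 6: 2,
               7: 3, 8: 3, 10: 3, 11: 1, 12: 1, 14: 15}

def mean_and_variance(freq):
    """Return (n, mean, variance) of a count variable given its frequency table."""
    n = sum(freq.values())
    mean = sum(k * c for k, c in freq.items()) / n
    var = sum(c * (k - mean) ** 2 for k, c in freq.items()) / n
    return n, mean, var

n, mean, var = mean_and_variance(rrad_counts)
dispersion = var / mean  # equals 1 under an unconditional Poisson law

print(f"n = {n}, mean = {mean:.3f}, variance = {var:.3f}, ratio = {dispersion:.1f}")
```

The computed mean agrees with the RRAD sample mean of 0.162 in Table 4-1, while the raw variance is several times larger than the mean, which is the kind of departure the robust covariance estimator is meant to guard against.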

Given the assumptions on the parameterization of the $\lambda_t$, the

log-likelihood function to be maximized is:

(2)  $\ell = \sum_t [-\exp(X_t\beta) + y_t X_t\beta] + c$,

where $X_t$ is the vector of independent variables as described in Table 4-1, $y_t$

is the observed RRAD count for individual t, and c does not depend on $\beta$.

The ML estimate of $\beta$ satisfies:

(3)  $\partial\ell/\partial\beta = \sum_t (-\exp(X_t\beta) + y_t) X_t' = 0$.

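
The first-order condition (3) has no closed-form solution, so it is solved numerically. A minimal sketch using Newton-Raphson iterations is given below. It is not the authors' program: the function name, the starting values, and the assumption that the first column of X is a constant are choices made for this illustration.

```python
import numpy as np

def fit_poisson(X, y, tol=1e-10, max_iter=100):
    """Maximize the Poisson log-likelihood (2) by Newton-Raphson.

    X: (T, k) array of covariates; the first column is assumed to be a constant.
    y: (T,) array of non-negative integer counts.
    Returns the estimate of beta that sets the score in (3) to zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    # Start the intercept at log(mean count) so early steps stay well behaved.
    beta[0] = np.log(max(y.mean(), 1e-8))
    for _ in range(max_iter):
        lam = np.exp(X @ beta)               # lambda_t = exp(X_t beta)
        score = X.T @ (y - lam)              # left-hand side of (3)
        hessian = -(X * lam[:, None]).T @ X  # second derivative of (2)
        step = np.linalg.solve(hessian, score)
        beta = beta - step                   # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Because the log-likelihood is concave in beta, the iterations typically converge in a handful of steps from reasonable starting values.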
4.4. Empirical Results

Table 4-3 presents the results of our basic model and the variants

designed to test the sensitivity of the results to assumptions about
-------
4-12

individual exposures to ambient air pollution. In specifications (3.1) -

(3.3) each individual's count of respiratory restricted activity days is

hypothesized to be related to ambient air quality during the individual's

two-week recall period. In (3.1) exposures are proxied by readings from

the one ozone and one sulfate monitor nearest each individual's residence;

in specifications (3.2) and (3.3), readings are averaged, respectively,

over all monitors within 10 and 20 miles of each respondent's residence.

Specifications (3.4) - (3.6) use annual 1979 average air pollution readings

as a proxy for air pollution exposure during each recall period. As in

(3.1) - (3.3), equation (3.4) uses the annual average at the nearest

monitor to proxy individual exposure while (3.5) and (3.6) use the average

of the annual averages of all monitors within 10 and 20 miles respectively.

Table 4-3 indicates that of the non-pollution variables, race, income

and temperature are related consistently across models to RRADs in a

statistically significant way—with whites, those with lower incomes, and

those exposed to colder temperatures all experiencing relatively more acute

respiratory illness during the two-week recall period. Because those

reporting the presence of a chronic illness would be expected to experience

more restrictions in activity during any two-week period, a dummy variable

identifying such individuals was included. As expected, this dummy

variable was positively and significantly related to the number of RRADs.

Finally, while always of the expected sign, the number of cigarettes smoked
-------
4-13
Table 4-3. Model Estimates: Sensitivity to Air Pollution Measurement
(Dependent variable is RRADs during two-week recall period)

Independent                              Model
Variable           3.1       3.2       3.3       3.4       3.5       3.6

OZNEAR           6.883
                (1.97)
OZAV10                     6.614
                          (1.91)
OZAV20                               9.324
                                    (2.41)
OZANNR                                        17.603

OZAN10                                                  19.449
                                                        (2.88)
OZAN20                                                            17.473
                                                                  (2.12)
S4NEAR          -0.005
                (0.22)
S4AV10                   -0.0210
                          (0.67)
S4AV20                              -0.046
                                     (1.4)
S4ANNR                                       -0.0175
                                              (0.41)
S4AN10                                                 -0.0558
                                                        (1.34)
S4AN20                                                           -0.0765
                                                                  (1.87)
WHITE            1.261     1.258     1.249     1.165     1.163     1.188
                (2.87)    (2.86)    (2.85)    (2.65)    (2.65)    (2.72)
-------
4-14
Table 4-3 (cont'd.) Model Estimates: Sensitivity to Air Pollution
Measurement (Dependent variable is RRADs during two-week
recall period)

Independent                              Model
Variable           3.1       3.2       3.3       3.4       3.5       3.6

MALE            -0.054    -0.058    -0.064    -0.062    -0.065    -0.055
                (0.19)    (0.21)    (0.23)    (0.22)    (0.23)    (0.20)
INCOME       -0.000035 -0.000035 -0.000035 -0.000035 -0.000034 -0.000033
                (2.31)    (2.30)    (2.28)    (2.27)    (2.22)    (2.19)
AGE            0.00031   0.00050   0.00086   0.00076    0.0013    0.0013
                (0.05)    (0.08)    (0.14)    (0.13)    (0.23)    (0.21)
CIGS             0.015     0.015     0.016     0.016     0.016     0.016
                (1.53)    (1.56)    (1.62)    (1.71)    (1.70)    (1.72)
FORMER           0.312     0.319     0.323     0.340     0.344     0.328
                (0.89)    (0.91)    (0.92)    (0.98)    (0.99)    (0.94)
SCHLYR          0.0067    0.0066    0.0067  0.000062    0.0035    0.0010
                (0.17)    (0.17)    (0.17)    (0.02)    (0.09)    (0.03)
CHRONIC          0.776     0.769     0.760     0.707     0.071     0.720
                (2.45)    (2.42)    (2.39)    (2.18)    (2.19)    (2.24)
MAXTMP          -0.019    -0.013    -0.021    -0.016    -0.017    -0.013
                (2.45)    (2.54)    (2.70)    (2.18)    (2.38)    (2.48)
RAIN             1.629     1.735     1.952     1.801     1.992     2.049
                (1.07)    (1.13)    (1.28)    (1.12)    (1.30)    (1.34)
INTERCEPT       -2.127    -1.993    -1.780    -2.559    -2.257    -1.950
                (2.06)    (1.92)    (1.67)    (2.26)    (1.93)    (1.66)
N                3,347     3,347     3,347     3,347     3,347     3,347
L               -741.5    -740.9    -732.0    -710.0    -707.0    -712.0

L = Log likelihood

(Asymptotic normal statistics for $H_0$: $\beta = 0$ in parentheses)
-------
4-15

per day and the dummy variable indicating that the respondent is a former

smoker were not significant at conventional levels, a somewhat surprising

finding given the concentration on respiratory disease.

The main focus of our analysis is the relationship between acute

respiratory disease (as measured by RRADs) and urban air quality. As Table

4-3 indicates, in only one of the six specifications is the hypothesis of

no relationship between ozone and RRADs not rejected at the 95%

level or better. This finding is fully consistent with the analysis in Volume I

where we used different samples, estimating techniques, and combinations of

independent variables—including monitored readings for as many as five

separate air pollutants. There, positive and significant associations

between ozone and RRADs in adults were frequent although not uniform.

The statistical significance of the ozone coefficients is not altered

appreciably by using monitored readings averaged over 10 or 20 miles rather

than readings at the nearest monitor. This is intuitively plausible since

ozone tends to be a diffuse (as opposed to a "hot-spot") pollutant. To the

extent they are generalizable, our findings suggest that city or SMSA-wide

average readings may be preferable to nearest-monitor readings to

characterize individual exposure to ozone in view of the resources required

to obtain the latter.

Using air pollution data averaged over the entire year during which

the health interview took place—models (3.4) - (3.6)—results in larger

estimated coefficients and higher asymptotic t-ratios for the ozone

variable than when air quality data contemporaneous to the recall period

are used. The importance of this finding should be discounted, we
-------
4-16

believe. So long as one is concerned with the possible relationships

between urban air quality and day-to-day variations in acute morbidity, the

correct measure of pollution must be one which is coincident with, or

slightly precedes, the period during which health status is being observed.

To illustrate, consider an individual interviewed for the HIS on January

15, 1979. Clearly, using 1979 annual average air pollution readings for

ozone and sulfates to help explain RRADs between January 1-14 brings into

play 50 weeks of data which could have no effect whatsoever on health

during the recall period. For this reason, the use of contemporaneous (or

"real time") air pollution data should be considered the conceptually

correct approach when analyzing acute respiratory disease.

Based on the findings in Table 4-3 we cannot reject the hypothesis of

no relationship between ambient sulfate concentrations and RRADs during the

two-week recall period. It should be noted, however, that sulfates and

other particulates are generally monitored only every six days. Thus, any

two-week period will contain at most three 24-hour sulfate measurements and

this may affect the findings. (Ozone, on the other hand, is monitored

continuously and is measured in specifications (3.1) - (3.6) by the average

daily maximum one-hour reading—measured during the recall period or

annually depending on the equation.) Note also that the coefficient on

sulfates is more sensitive to the choice of exposure proxy. This is

because concentrations of sulfates and other particulates exhibit greater

variation within an area than does ozone. (It should be noted here that

the sample correlation between OZNEAR and S4NEAR is 0.108. We conducted
-------
4-17

tests for possible degradation of parameter estimates due to collinearity

but found no evidence thereof.)

Prior clinical and epidemiological analyses suggest the possible

importance of interactive or synergistic effects of certain air pollutants

(see Hazucha and Bates [7] and Graves and Krumm [3], for instance).

Accordingly, the existence of such an effect between ozone and sulfates

(OZXS4) is tested. The results are presented in specification (4.1) in

Table 4-4, and do not support the hypothesis that such effects are

important. In (4.2) another hypothesized interactive effect is tested, that

between ozone and average maximum temperature (OZXTEMP) during the recall

period. Again, no evidence of such an effect is found. These results are

consistent with the more extensive analysis of interactive effects in

Volume I.

So-called threshold effects or other types of non-linearities in the

relationship between ozone and RRADs are potentially important and are

tested for here. To see whether the relationship with RRADs differs

between lower and higher concentrations, the sample was twice divided into

two separate regimes, once with the dividing point being 0.05 ppm.

Separate coefficients were estimated on the ozone variable in the lower and

higher regimes. In this specification ozone is positively and

significantly associated with the expected number of RRADs in regimes both

above and below 0.05 ppm. A casual inspection of the coefficients in (4.3)

could convey the impression that a marginal change in ozone will have a

larger impact on RRADs at lower than at higher concentrations. In fact,

this is not the case. When the first derivatives of the estimating
-------
4-18
Table 4-4. Model Estimates: Alternative Specifications (Dependent variable is
RRADs during two-week recall period)

Independent                         Model
Variable             4.1       4.2       4.3       4.4       4.5

OZNEAR             7.410    70.659
                  (1.24)    (1.77)
OZH75                                  9.554
                                      (2.71)
OZL05                                 22.505
                                      (2.11)
(OZNEAR)^2                                       1.343
                                                (0.07)
(OZNEAR)^1/2                                               4.926
                                                          (2.45)
S4NEAR            -0.003    -0.003   -0.0023   -0.0017   -0.0074
                  (0.09)    (0.12)    (0.10)    (0.07)    (0.31)
OZXS4             -0.047
                  (0.09)
OZXTEMP                     -0.874
                            (1.65)
WHITE              1.262     1.235     1.259     1.290     1.239
                  (2.37)    (2.80)    (2.86)    (2.93)    (2.83)
MALE              -0.054    -0.053    -0.049    -0.043    -0.060
                  (0.19)    (0.19)    (0.18)    (0.15)    (0.21)
INCOME         -0.000035 -0.000036 -0.000036 -0.000035 -0.000036
                  (2.31)    (2.37)    (2.38)    (2.32)    (2.32)
AGE              0.00031   0.00067 -0.000024   0.00025   0.00034
                  (0.05)    (0.11)    (0.04)    (0.04)    (0.06)
-------
4-19
Table 4-4 (cont'd.) Model Estimates: Alternative Specifications (Dependent
variable is RRADs during two-week recall period)

Independent                         Model
Variable             4.1       4.2       4.3       4.4       4.5

CIGS               0.015     0.015     0.015     0.014     0.151
                  (1.53)    (1.52)    (1.52)    (1.49)    (1.55)
FORMER             0.312     0.318     0.318     0.303     0.319
                  (0.89)    (0.90)    (0.90)    (0.87)    (0.91)
SCHLYR            0.0067    0.0036    0.0051    0.0063    0.0071
                  (0.17)    (0.09)    (0.13)    (0.16)    (0.18)
CHRONIC            0.776     0.779     0.773     0.773     0.781
                  (2.44)    (2.49)    (2.44)    (2.43)    (2.47)
MAXTMP            -0.019    0.0059    -0.019    -0.013    -0.023
                  (2.44)    (0.32)    (2.50)    (1.85)    (2.92)
RAIN               1.632     1.763     1.626     1.366     1.776
                  (1.07)    (1.12)    (1.07)    (0.87)    (1.17)
INTERCEPT         -2.152    -3.827    -2.489    -2.225    -2.498
                  (1.92)    (2.29)    (2.26)    (2.14)    (2.46)
N                  3,347     3,347     3,347     3,347     3,347
L                -2049.4   -2031.2   -2039.2   -2054.3   -2043.1

(Asymptotic normal statistics for $H_0$: $\beta = 0$ in parentheses)
-------
4-20

equation are evaluated at the appropriate ozone concentration for each of

the individuals in the low and high regimes and the resulting values then

averaged, the estimated first derivative is nearly twice as large in the

high as in the low regime.

Although the Poisson expectation function $E(RRAD_t) = \exp(X_t\beta)$ is

non-linear, it does imply that the elasticity of $E(RRAD_t)$ with respect to

ozone is linear. To allow for greater flexibility, models (4.4) and (4.5)

are estimated using, respectively, the square and the square root of the

ozone concentration during the recall period at the nearest monitor. In

other words, the specification is:

(4)  $E(RRAD_t) = \exp(Z_t\gamma + \alpha\,(OZNEAR_t)^{\delta})$

where $Z_t$ is the vector of independent variables other than ozone and

$\delta \in \{0.5, 2.0\}$. When $\delta = 0.5$, $\alpha$ is positive and statistically significant;

when $\delta = 2.0$, $\alpha$ is positive but not significant. In fact, note that (4.5)

has a higher model likelihood than specification (3.1), which is simply

equation (4) with $\delta = 1.0$, thus indicating that non-linearities in the ozone

specification are important.

4.5. Policy Implications and Conclusions

Ozone is one of six air pollutants for which the Environmental

Protection Agency has established maximum permissible ambient

concentrations. The controversy surrounding revision of the ozone standard

in 1978 (see White [20]), coupled with recent emphasis on cost-benefit
-------
4-21

analysis in government regulation (see Smith [17]), make it worthwhile to

illustrate the changes in acute respiratory health that might be associated

with changed ozone levels. We use a subset of the results presented above

to make such an illustrative calculation. The discussion here is confined

to specifications where ozone is measured by the average daily one-hour

maximum during the two weeks at the monitor nearest the respondent's

residence.

One way to assess possible pollution-related changes in acute health

status is to calculate the elasticity of E(RRAD) with respect to ozone and

evaluate the predicted total change in expected RRADs for the individuals

in the sample resulting from some hypothetical change in ozone

concentrations. Log-differentiating (4), it follows that:

(5)  $(\partial E(RRAD_t)/\partial OZNEAR_t)(OZNEAR_t / E(RRAD_t)) = \delta\alpha\,(OZNEAR_t)^{\delta}$.

Note that for $\delta < 1$, the curvature of the expectation function (as

determined by $\partial^2 E(RRAD_t)/\partial OZNEAR_t^2$) cannot be determined without reference

to the data for the t-th observation. It can be seen from (5) that in the

nonlinear cases where $\delta = 0.5$ or 2.0, as in specifications (4.4) and

(4.5), evaluating the elasticity at the sample mean of OZNEAR will yield a

different estimate than that given by evaluating (5) for all t and then

averaging the elasticities. The results of both approaches are presented

in the top panel of Table 4-5.
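
The elasticity formula in (5) can be checked against the published coefficients. The sketch below evaluates it at the sample mean of OZNEAR from Table 4-1, using the ozone coefficients from specifications (3.1), (4.4) and (4.5); small differences from the tabulated values are to be expected because the mean is rounded to three decimals. The helper names and the three illustrative readings in the usage note are assumptions for the example, not study data.

```python
def ozone_elasticity(alpha, delta, oznear):
    """Elasticity of E(RRAD) with respect to ozone, as in equation (5)."""
    return delta * alpha * oznear ** delta

def mean_of_elasticities(alpha, delta, readings):
    """Average of equation (5) evaluated reading by reading."""
    return sum(ozone_elasticity(alpha, delta, z) for z in readings) / len(readings)

MEAN_OZNEAR = 0.042  # sample mean of OZNEAR from Table 4-1 (ppm)

# (alpha, delta) pairs taken from Tables 4-3 and 4-4
SPECS = {
    "3.1": (6.883, 1.0),  # linear in ozone
    "4.4": (1.343, 2.0),  # squared ozone
    "4.5": (4.926, 0.5),  # square root of ozone
}

at_mean = {name: ozone_elasticity(a, d, MEAN_OZNEAR) for name, (a, d) in SPECS.items()}

for name, value in at_mean.items():
    print(f"specification ({name}): elasticity at mean = {value:.4f}")
```

With any set of readings whose average is 0.042, for instance the hypothetical values 0.02, 0.04 and 0.066, `mean_of_elasticities` reproduces the at-mean figure exactly when delta is 1.0, falls below it when delta is 0.5, and exceeds it when delta is 2.0. This is the same ordering as in the two columns of the upper panel of Table 4-5.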

The upper panel of the table indicates that the estimated elasticities

are quite sensitive to the value of $\delta$. When $\delta$ is 0.5 or
-------
4-22

1.0, the resulting elasticities are of the same order of magnitude, with

the former roughly twice the latter. However, when $\delta = 2.0$, the estimated

elasticity is almost two orders of magnitude smaller than the others. Note

that these results hold irrespective of the method of elasticity

calculation.

In the lower panel of Table 4-5 are presented the elasticities for the

model (4.3) in which ozone was permitted to have different coefficients in

low and high regimes. Recall that in this case $\delta = 1.0$, so that within each

regime the ozone elasticities are linear in ozone. Therefore, both methods

used above to calculate elasticities will yield the same result. However,

there are two relevant elasticity measures, one prevailing for observations

with ozone measures below the split and one for those above. Because of

the second-derivative properties noted above, reference to the parameter

estimates alone is insufficient to compare low- and high-regime

elasticities. In fact, it happens that the elasticity estimates for the

low-ozone and high-ozone regimes are virtually identical, 0.65 and 0.66,

respectively.
-------
4-23
Table 4-5. Elasticity Estimates for Alternative Specifications

Whole Sample

$\delta$     Evaluated at Mean     Mean of Individual
             of OZNEAR             Elasticities

0.5          0.506                 0.485
1.0          0.290                 0.290
2.0          0.0048                0.006

Split Sample (w/ $\delta$ = 1.0)

Split = 0.05 ppm
   low regime      0.645
   high regime     0.655

Split = 0.075 ppm
   low regime      1.061
   high regime     0.209
-------
4-24
Table 4-6. Estimated Changes in RRADs Due to 10 Percent Reduction in
Ozone Concentration

             Average Individual Reduction     Annual Decrease in RRADs:
$\delta$     each two weeks, (S1-S2)/n        Urban Adult Population*

0.5          -.00776                          22.19 x 10^6
1.0          -.00442                          12.64 x 10^6
2.0          -.000083                          0.24 x 10^6

*Calculated by multiplying the two-week individual change in column 2 by 26
to convert to annual changes and then by 110 million, the urban adult
population of the United States.
-------
4-25

The elasticity estimates can be used to estimate one type of health

improvement that might accompany reduced ozone concentrations. Using the

estimates $\hat\beta = (\hat\gamma, \hat\alpha)$ from the specifications (3.1), (4.4), and (4.5),

(5) is evaluated at $(Z_t, OZNEAR_t)$ for all t in the estimation sample and

the sum $S1 = \sum_t \exp(Z_t\hat\gamma + \hat\alpha\,(OZNEAR_t)^{\delta})$ is calculated for each of the three

alternative specifications. This yields an estimate of the prevailing

count of RRADs in the sample of 3,347 adults given prevailing levels of the

independent variables including ozone. To evaluate the effect of a change,

we first assume that some hypothetical policy measure reduces by 10 percent

the two-week average daily maximum ozone concentration, $OZNEAR_t$, faced by

each individual and then calculate the sum

$S2 = \sum_t \exp(Z_t\hat\gamma + \hat\alpha\,(0.9 \cdot OZNEAR_t)^{\delta})$.

For each of the three specifications, the average (S1-S2)/3347 is

calculated, thus giving an estimate of a typical individual's change in

two-week RRADs given a ten percent decrease in ozone concentrations.

Assuming an adult SMSA population of 110 million, and extrapolating the

two-week decrease in RRADs to an annual figure, we obtain for each

specification an estimate of the total annual decrease in

respiratory-related restricted activity days associated with a hypothetical

ten percent ozone reduction. The results are presented in Table 4-6.

It is here that the implications of the different specifications can

most forcefully be seen. At the two extremes are the $\delta = 0.5$ and $\delta = 2.0$

formulations of the model. In the former case, the ten percent reduction

evokes a total annual change of more than 22 million RRADs while in the

latter case the change is less than a quarter million RRADs.
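
The extrapolation from an individual two-week change to a national annual total is simple arithmetic and can be reproduced from the figures in Table 4-6. In the sketch below, the per-person changes are taken from that table and the population of 110 million from the text; the function names, and the representation of $Z_t\hat\gamma$ as a precomputed list, are assumptions made for the illustration.

```python
import math

def predicted_total(index_without_ozone, oznear, alpha, delta, scale=1.0):
    """Sum over individuals of exp(Z_t*gamma + alpha*(scale*OZNEAR_t)**delta).

    index_without_ozone: precomputed values of Z_t*gamma, one per individual.
    scale=1.0 gives S1; scale=0.9 gives S2 for a 10 percent ozone reduction.
    """
    return sum(
        math.exp(z + alpha * (scale * o) ** delta)
        for z, o in zip(index_without_ozone, oznear)
    )

def annual_change(per_person_two_week, population=110e6, periods_per_year=26):
    """Scale a two-week per-person change in RRADs to an annual population total."""
    return per_person_two_week * periods_per_year * population

# Average individual two-week changes from Table 4-6, keyed by delta
TABLE_4_6 = {0.5: -0.00776, 1.0: -0.00442, 2.0: -0.000083}

annual = {delta: annual_change(change) for delta, change in TABLE_4_6.items()}

for delta, total in annual.items():
    print(f"delta = {delta}: annual change = {total / 1e6:.2f} million RRADs")
```

The totals obtained this way match the third column of Table 4-6 when the population is taken to be 110 million.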
-------
4-26

The final step in benefit estimation involves the assignment of dollar

values to these hypothetical improvements in health. Valuing reduced RRADs

is not easy, particularly since that measure embodies a range of

impairments from minor restrictions in activity to bed disability days.

However, based on separate analysis of adults' work loss and bed disability

in Volume I—wherein we found no significant associations with ambient

ozone concentrations and the more severe types of restrictions — we

presume that the effects predicted in Table 4-6 are minor restrictions in

activity.

Ideally, these minor RRADs should be valued using changes in

individuals' expenditure functions which reflect both labor-leisure

tradeoffs as well as the possibility of defending against pollution-related

illness (see Harrington and Portney [7], for instance). In practice,

alternative approaches are typically required. Using contingent valuation

methods, for example, Loehman et al. [11] recently elicited individuals'

reported willingness to pay to avoid one day of various kinds of

respiratory impairments. The values ranged from $2.31 for a day of minor

coughing and sneezing to about $11.00 to avoid a day of severe shortness of

breath. Since the latter impairment is likely to be associated with a work

loss and/or bed disability day, the former value is probably more

appropriate for a minor RRAD. Because of the many uncertainties in

arriving at such estimates, however, we assume that a minor RRAD could be

valued at as much as $20. If each of a predicted 22 million fewer RRADs

is valued at $20, annual benefits to the adult urban population of the

U.S. would be $0.44 billion. If RRADs were as few as 250,000 (as predicted
-------
4-27

in the third row of Table 4-6), and each was valued at $2.31, the

corresponding total would be but $0.58 million.
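
The two bounding benefit figures quoted above follow from multiplying a count of avoided RRADs by a unit value. A minimal check, with a hypothetical helper name, is:

```python
def annual_benefit(rrads_avoided, value_per_rrad):
    """Dollar value of avoided restricted activity days."""
    return rrads_avoided * value_per_rrad

# Upper case: 22 million RRADs valued at $20 each
high = annual_benefit(22e6, 20.00)

# Lower case: 250,000 RRADs valued at $2.31 each
low = annual_benefit(250_000, 2.31)

print(f"high: ${high / 1e9:.2f} billion, low: ${low / 1e6:.2f} million")
```

The two cases differ by a factor of roughly 760, which combines the spread in predicted RRADs across functional forms with the spread in unit values.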

It is important to note that reduced ozone concentrations may result

in other beneficial effects besides possible reductions in acute

respiratory illness. These include improved visibility, reduced damages to

forests, ornamental plantings, and agricultural output, as well as other

welfare-enhancing changes. All these would have to be considered (and

valued, where appropriate) in any comparison of the costs and benefits of

ozone control.

Even when attention is confined to acute respiratory illness, however,

the uncertainties in estimating benefits are substantial. Both here and in

Volume I, predicted changes in RRADs proved somewhat sensitive to the

choice and measurement of independent variables and, in Volume I at least,

the size of the sample over which the parameters were estimated. Even when

these are held constant, Table 4-6 demonstrates that predicted changes in

RRADs are also sensitive to the assumed form of the exposure-response

function (by two orders of magnitude). Moreover, this difference is based

on a comparison of point estimates without regard to confidence intervals

constructed about them. These uncertainties, coupled with sometimes

conflicting findings from other epidemiological or clinical studies, must

make one cautious in using studies like this in policymaking.
-------
4-28

APPENDIX

As described in Section 4.3, the log-likelihood function of the RRAD

models can be written as
(A1)  $\ell = \sum_t [-\exp(X_t\beta) + y_t X_t\beta] + c$,

where $\exp(X_t\beta) = \lambda_t$. It is easy to show that $\ell$ is concave in $\beta$ so long as

its inverse Hessian exists. As mentioned in Section 4.3, the maximum

likelihood estimates of $\beta$ obtained by maximizing (A1) are consistent, but

the estimate of the covariance matrix of $\hat\beta_{ML}$ using minus the inverse of the

Hessian evaluated at $\hat\beta_{ML}$ will tend to be inconsistent if the data are not

in fact generated by the specified Poisson distribution.

This is easily seen as follows. Note that the model can be

equivalently cast as a nonlinear least squares regression, the t-th

observation being
(A2)  $y_t = E(Y_t) + u_t$

with $E(u_t) = 0$. Clearly, $\mathrm{Var}(u_t) = \mathrm{Var}(Y_t) = \exp(X_t\beta)$ so that the $u_t$ are

heteroscedastic. If nonlinear weighted least squares is used with the

weights $\exp(-X_t\beta)$ formed using consistent estimates of $\beta$, and if the data

are in fact Poisson-distributed as maintained, the maximum likelihood

consistent estimates of $\beta$ and $\mathrm{Cov}(\hat\beta)$ will obtain. (The consistency of $\hat\beta_{ML}$

for $\beta$ does not depend on the weighting scheme.) However, if the data are
-------
4-29

not Poisson-distributed, the estimate of Cov(S) obtained in this manner

will be inconsistent and t-tests based thereon will be misleading. The

case is fully analogous to the estimation of the heteroscedastic linear

model which yields inconsistent covariance estimates (and, therefore,

t-statistics) if the heteroscedastic nature of the error structure is

either ignored or incorrectly specified.

White [18] and Royall [16] have demonstrated a method whereby
estimates of $\mathrm{Cov}(\hat\beta)$ robust against misspecification of the underlying
distribution of the data can be obtained when $[-\partial^2 \ell/\partial\beta\,\partial\beta']^{-1}$ evaluated at
$\hat\beta_{ML}$ fails to yield a consistent estimate of $\mathrm{Cov}(\hat\beta)$. Denoting $I(\beta)$ as
$[-\partial^2 \ell/\partial\beta\,\partial\beta']$, their suggestion is to estimate $\mathrm{Cov}(\hat\beta)$ as

(A3) $I(\beta)^{-1} \left[ \sum_t (\partial \ell_t/\partial\beta)(\partial \ell_t/\partial\beta') \right] I(\beta)^{-1}$

where $\ell_t$ is the t-th observation's contribution to the log-likelihood
function and where all relevant evaluations in (A3) are at $\hat\beta_{ML}$. This is
the method used in constructing the confidence intervals for the parameter
estimates of Section 4.4. In these cases, the standard errors of the
parameter estimates obtained using $I(\hat\beta)^{-1}$ as the estimate of $\mathrm{Cov}(\hat\beta)$ are
found to be about two to three times smaller than those obtained using this
alternative method. As noted by White [19], the alternative approach (i.e.
using (A3)) will typically lead to conservative inferences (i.e. "too
large" estimates of $\mathrm{Cov}(\hat\beta)$) in instances where $X_t$ is nonstochastic and
varies across t, as is the case here.
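
The contrast between the two covariance estimates can be illustrated with a small simulation. The sketch below is not the authors' estimation code: it fits the Poisson log-likelihood (A1) by Newton's method on simulated, deliberately overdispersed counts, then compares standard errors from minus the inverse Hessian with those from the robust form (A3). The function names, the simulated design, and the gamma mixing used to create overdispersion are all assumptions made for illustration.

```python
import numpy as np

def fit_poisson(X, y, iters=50, tol=1e-10):
    """Maximize sum_t [-exp(x_t b) + y_t x_t b] by Newton's method.

    Assumes the first column of X is a constant, so the intercept can be
    started at log(mean(y)).
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(iters):
        lam = np.exp(X @ beta)
        grad = X.T @ (y - lam)                 # score vector
        info = X.T @ (X * lam[:, None])        # minus the Hessian
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def covariances(X, y, beta):
    """Return (inverse-Hessian covariance, robust covariance as in (A3))."""
    lam = np.exp(X @ beta)
    info = X.T @ (X * lam[:, None])
    scores = X * (y - lam)[:, None]            # per-observation score rows
    outer = scores.T @ scores                  # sum of score outer products
    info_inv = np.linalg.inv(info)
    return info_inv, info_inv @ outer @ info_inv

# Hypothetical data: the conditional mean is exp(X b), but the counts are
# gamma-mixed Poisson, so the Poisson variance assumption fails.
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.2, 0.5])
mu = np.exp(X @ beta_true)
y = rng.poisson(mu * rng.gamma(shape=0.5, scale=2.0, size=n))

beta_hat = fit_poisson(X, y)
cov_hess, cov_robust = covariances(X, y, beta_hat)
se_hess = np.sqrt(np.diag(cov_hess))
se_robust = np.sqrt(np.diag(cov_robust))
print("estimates:", beta_hat)
print("SE (inverse Hessian):", se_hess)
print("SE (robust):", se_robust)
```

In this simulated setting the point estimates remain close to the values used to generate the data, while the inverse-Hessian standard errors understate the sampling variability relative to the robust ones.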
-------
Footnotes

1
Specifically, the Tobit specification error tests of Nelson [13] and

Lin and Schmidt [9] were used. Nelson's is a Hausman test while that of

Lin and Schmidt is a Lagrange multiplier test. Under the null hypothesis

of no misspecification, both test statistics are distributed asymptotically

central $\chi^2(k)$, where $k$ is the dimensionality of $\beta$. For the specification

described above, both statistics indicate rejection of the no

misspecification hypothesis at better than the 98% level.

2
These confidence intervals are constructed using the approach

discussed in the appendix, which should give conservative asymptotic

t-statistics. Confidence intervals based on minus the inverse Hessian of

the Poisson log-likelihood function, on the other hand, are much tighter,

but are almost certainly misleading (inconsistent), given the data used.

These results are available from the authors on request.

The substantial discrepancy between the magnitudes of the estimates

of the two-week and annual ozone coefficients results, loosely speaking,

from the fact that—while the sample means of the two measures are

virtually identical—the sample variances of the two-week measures are much

larger than those of the annual counterparts in conjunction with the fact

that the expectation $E(\mathrm{RRAD}_t)$ is the convex function $\exp(X_t\beta)$.

For comparison's sake, model (3-1) was also cast as a geometric
distribution. Here, $\Pr(Y_t = y) = P^y/(1+P)^{y+1}$ for $y = 0, 1, 2, \ldots$, $E(Y_t) = P$,
$\mathrm{Var}(Y_t) = P(1+P)$, and for purposes of econometric estimation $E(Y_t) =
\exp(X_t\beta)$ and $\mathrm{Var}(Y_t) = \exp(2X_t\beta) + \exp(X_t\beta)$ are specified. As expected,
the estimated variances of $\hat\beta$ were somewhat larger than those obtained using

the uncorrected variance version of the Poisson specification while the

estimates themselves were quite similar. However, like the Poisson, the

maximum likelihood variance estimates based on minus the inverse of the

Hessian evaluated at $\hat\beta_{ML}$ are not generally consistent if the data are not

distributed according to the postulated geometric distribution. Thus,

while larger than the estimated variances of the uncorrected Poisson

specification, the ML estimates of the geometric parameter variances were

still substantially smaller than those obtained using the alternative

approach.

5 is, of course, a parameter to be estimated rather than a given

constant. The ML algorithm used to obtain the Poisson parameter estimates,

however, did not permit estimation of such additional nonlinearities.

Kopp, Raymond, William Vaughan, Michael Hazilla and Richard Carson,

"Implications of Environmental Policy for U.S. Agriculture: The Case of

Ambient Ozone Standards," Resources for the Future working paper, January

5, 1984.
-------
4-32

References
[1] Chappie, Michael and Lester Lave, "The Health Effects of Air Pollution:
A Reanalysis," J. Urban Econ., vol. 12 (1982) pp. 346-76.

[2] Crocker, Thomas, et al., "Methods Development for Assessing Air
Pollution Control Benefits," Vol. 1, EPA Document EPA-600/5-79-001a
(1979).

[3] Graves, Philip and Ronald Krumm, "Morbidity and Pollution: Model
Specification Analysis for Time-Series Data on Hospital Admissions,"
J. Environ. Econ. Manage., vol. 9 (1982) pp. 311-327.

[4] Hausman, Jerry, Bart Ostro, and David Wise, "Air Pollution and Lost
Work," NBER working paper no. 1263, January 1984.

[5] Hausman, Jerry, Bronwyn Hall, and Zvi Griliches, "Econometric Models
for Count Data with an Application to the Patents-R&D Relationship,"
Econometrica, vol. 52 (1984) pp. 909-938.

[6] Harrington, Winston and Paul R. Portney, "Valuing the Benefits of
Health and Safety Regulation in the Presence of Defensive
Expenditures," RFF Quality of the Environment working paper
no. QE84-09, September 1984.

[7] Hazucha, Michael and David Bates, "Combined Effects of Ozone and Sulfur
Dioxide on Human Pulmonary Function," Nature, vol. 257 (1975) pp.
50-51.

[8] Lave, Lester and Eugene Seskin, Air Pollution and Human Health
(Baltimore, Md.: Johns Hopkins University Press, 1977).

[9] Lin, Tsai-Fen and Peter Schmidt, "A Test of the Tobit Specification
Against an Alternative Suggested by Cragg," Review of Economics and
Statistics, vol. 66 (1984) pp. 174-177.

[10] Lipfert, Frederick, "Air Pollution and Mortality: Specification
Searches Using SMSA-Based Data," J. Environ. Econ. Manage., vol. 11
(1984) pp. 208-243.

[11] Loehman, Edna et al., "Distributional Analysis of Regional Benefits and
Costs of Air Quality Control," J. Environ. Econ. Manage., vol. 6
(1979) pp. 222-243.

[12] Mendelsohn, Robert and Guy Orcutt, "An Empirical Analysis of Air
Pollution Dose Response Curves," J. Environ. Econ. Manage., vol. 6
(1979) pp. 85-106.

[13] Nelson, Forrest, "A Test for Misspecification in the Censored Normal
Model," Econometrica, vol. 49 (1981) pp. 1317-1330.

[14] Ostro, Bart, "The Effects of Air Pollution on Work Loss and Morbidity,"
J. Environ. Econ. Manage., vol. 10 (1983) pp. 371-382.

[15] Portney, Paul and John Mullahy, "Ambient Ozone and Human Health: An
Epidemiological Analysis," report prepared for Economic Analysis
Branch, Office of Air Quality Planning and Standards, USEPA under
contract no. 68-02-3583, September 1983.

[16] Royall, Richard, "Robust Inference Using Maximum Likelihood
Estimators," Johns Hopkins University, Department of Biostatistics
Working Paper 549, 1984.

[17] Smith, V.K. (ed.), Environmental Policy Under Reagan's Executive Order
(Chapel-Hill, N.C.: UNC Press, 1984).

[18] White, Halbert, "Maximum Likelihood Estimation of Misspecified Models,"
Econometrica, vol. 50 (1982) pp. 1-25.

[19] , "Corrigendum," Econometrica, vol. 51 (1983) p. 513.

[20] White, Lawrence, Reforming Regulation: Processes and Problems
(Englewood Cliffs, N.J. : Prentice-Hall, Inc., 1981).
-------
Chapter 5

CONSTRUCTING A LIFETIME SMOKING PROFILE
USING THE 1979 HEALTH INTERVIEW SURVEY
We noted above that individuals' amassed "stocks" of

cigarettes consumed over a lifetime are potentially

significant influences on respiratory illness. Yet the

models estimated in Volume I all made use of a much more

crude measure of smoking behavior. An important issue

here, then, is the construction of a more sophisticated

measure given available data. One theoretically plausible

construct is $K(T) = \int_S \exp(-r(T-t))\,C(t)\,dt$, where $S = [\underline{T}, T]$, $\underline{T}$

is time started smoking, T is present time, C(t) is

instantaneous cigarette consumption at t, and r is a decay

or depreciation rate. The empirical representation of

K(T) is not straightforward, however, even given the

information available in the smoking supplement to the

1979 HIS.
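
To make the construct concrete, the sketch below approximates K(T) numerically for a hypothetical smoker whose consumption path is fully known. It is an illustration only: the function name, the trapezoid rule, the choice of years as the time unit and packs per year as the consumption unit, and the 0.05 decay rate are all assumptions, not values taken from the report.

```python
import numpy as np

def stock_numeric(consumption, t_start, t_now, r, n=10001):
    """Approximate K(T) = integral over [t_start, t_now] of
    exp(-r (T - t)) C(t) dt with the trapezoid rule on n grid points."""
    t = np.linspace(t_start, t_now, n)
    f = np.exp(-r * (t_now - t)) * consumption(t)
    dt = (t_now - t_start) / (n - 1)
    return float(dt * (f.sum() - 0.5 * (f[0] + f[-1])))

def constant_smoker(t):
    """Hypothetical path: one pack per day, i.e. 365 packs per year."""
    return np.full_like(t, 365.0)

# Ten years of smoking, with and without depreciation of the stock.
k_plain = stock_numeric(constant_smoker, 0.0, 10.0, r=0.0)
k_decay = stock_numeric(constant_smoker, 0.0, 10.0, r=0.05)
print(k_plain, k_decay)
```

With r = 0 the measure reduces to a simple pack count; a positive r gives less weight to cigarettes smoked further in the past.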

This is so for several reasons. First, an

individual's entire lifetime smoking profile $\{C(t)\}_{\underline{T}}^{T}$ is

never given in the data. This is so even if C(t) is

couched in discrete time as $\{C_t\}$ with reasonably

high-frequency (e.g. one month or even one year)

realizations. At best the profile can be approximated by

the use of subsidiary information. Second, the above

formulation is quite simple, one of an infinite number of

reasonable proxies for the "true" relationship. Third,

while K(T) as defined above is in principle capable of

describing the effects of cigarette tar-nicotine content

and cigarette length, it seems that amending the

formulation to account for such influences would add

little to the analysis given the nature of the data.

The dataset used to construct the measures is of

course the 1979 HIS smoking supplement. This survey gives

a reasonably detailed picture of individuals' smoking

status at the time of the survey in addition to

information on past attempts to quit, age at which regular

cigarette smoking began, number of cigarettes smoked per

day at the time of peak consumption, and other attributes.

Yet most of the data in the smoking survey is of little

use insofar as construction of a "packyear" or stock

measure is concerned. (A check on several other datasets

containing information on smoking behavior reveals similar

or even more severe weaknesses.) Ignoring minor points

and the complications presented by problems such as faulty

recall, the most serious problems are the following.

Although data are given on peak daily cigarette

consumption, no information is available on when the peak

occurred (unless it coincides with present consumption

levels, C(T)) nor on the duration of consumption at that

peak rate. Second, information on quits (number of

attempts; duration of time off) is insufficient to

construct for either current or former smokers a

reasonable profile of the time intervals over which C(t)

was zero. Quit duration information is available only as

the interval from time last smoked to T for former smokers

and for the length of the single most recent quit (if any)

for current smokers. Some detail is provided for current

smokers on numbers of serious quit attempts, but what

constitutes a "serious attempt" is analytically

problematical, a subjective assessment surely varying

across individuals. No information on age started smoking

is given for the subsample of occasional smokers.

Finally, it should be noted that even the use of an

obvious stock proxy measure like $C(T-\delta)$ with, for example,

$\delta$ equal to one year, is precluded by data availability. It

is possible to determine neither consumption levels of one

year (or six months, or one month) ago nor, in many

instances, even the sign of $C(T-\delta)$.

Yet there is some information that permits the

construction of a reasonably interesting, albeit rough,

proxy measure for the lifetime smoking profile K if one is

willing to make certain assumptions. Since age started

smoking is unavailable for the occasional smokers, this

subsample (about two percent) will henceforth be excluded

from the analysis. By assumption, K = 0 for all never

smokers. Thus, the proxy must be constructed for the

subsamples of former and current smokers. The data are

such that separate treatment of these two subsamples is

required. In both instances, however, several plausible

temporal smoking profiles can be created. In the absence

of any prior information on which profile best captures an

individual's true consumption path, the only sensible

solution is to consider several different specifications

in the empirical analysis and assess ex post the

sensitivity of the results to the specification used.

For both former and current smokers, the construction

of the K measures relies on a major assumption about the

influence of quits on the temporal consumption profile.

That is, the profile is "forgetful" of quits: once an

individual resumes smoking after having quit, consumption

over the quit interval is treated as if there had been no

quit at all. For example, Figure 5-1 depicts the manner

in which this forgetfulness operates, with true

consumption C*(t) shown as a solid curve and proxy

consumption shown as a dashed curve:

[Figure 5-1. Hypothetical Smoking Profile. Axes: C(t) against t.]

Such an approach has the unfortunate implication that, to

use an extreme example, the proxy profile of an individual

who quit smoking twenty years ago and resumed yesterday is

drastically different from that of an individual with an

otherwise identical smoking history who had not resumed

smoking. Until better microdata on individuals' smoking

histories become available, such drawbacks are inevitable.

For former smokers, the variables used to construct

the stock proxy are time started smoking ($\underline{T}$); number of

cigarettes smoked per day at peak consumption (NCIGP); and

time last smoked regularly ($\bar{T}$). There are three plausible

profiles that can be constructed using this information;

these can best be described graphically.

The first profile for former smokers, shown in Figure

5-2, assumes that peak consumption occurs at the midpoint

($\equiv T^*$), and that consumption rises linearly from zero to this

peak and then falls linearly back to zero (C(t) is

henceforth shown in solid lines):

[Figure 5-2. Smoking Profile: Former Smokers I (F-I). Axes: C(t) against t, with NCIGP marked on the vertical axis.]

The second profile for former smokers, shown in

Figure 5-3, is based on the assumption that peak

consumption is attained immediately at $\underline{T}$ and continues at

that rate until $\bar{T}$:

[Figure 5-3. Smoking Profile: Former Smokers II (F-II). Axes: C(t) against t, with NCIGP marked on the vertical axis.]

The third former smoker profile, shown in Figure 5-4,

assumes that from $\underline{T}$ consumption increases linearly to

NCIGP, which occurs at $\bar{T}$, then falls instantly to zero:

[Figure 5-4. Smoking Profile: Former Smokers III (F-III). Axes: C(t) against t, with NCIGP marked on the vertical axis.]

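The three former-smoker profiles can be written down as piecewise-linear functions of time. The following is a minimal sketch under stated assumptions, not the authors' code: the function name `former_profile`, its argument names, and the use of `numpy.interp` are illustrative choices, and T* is taken to be the midpoint between the time started smoking and the time last smoked regularly, as the text describes for profile F-I.

```python
import numpy as np

def former_profile(kind, t_start, t_last, ncigp):
    """Return a function C(t) for a former smoker.

    kind    : "F-I", "F-II" or "F-III"
    t_start : time started smoking
    t_last  : time last smoked regularly
    ncigp   : cigarettes per day at peak consumption
    """
    t_mid = 0.5 * (t_start + t_last)
    if kind == "F-I":      # linear rise to the peak at the midpoint, linear fall to zero
        knots_t, knots_c = [t_start, t_mid, t_last], [0.0, ncigp, 0.0]
    elif kind == "F-II":   # peak reached immediately and held until quitting
        knots_t, knots_c = [t_start, t_last], [ncigp, ncigp]
    elif kind == "F-III":  # linear rise to the peak, reached at the time of quitting
        knots_t, knots_c = [t_start, t_last], [0.0, ncigp]
    else:
        raise ValueError("unknown profile: " + str(kind))

    def c(t):
        t = np.asarray(t, dtype=float)
        inside = (t >= t_start) & (t <= t_last)
        # Consumption is zero before starting and after quitting.
        return np.where(inside, np.interp(t, knots_t, knots_c), 0.0)

    return c

# Hypothetical individual: started at age 20, quit at 40, peak of 30 per day.
f1 = former_profile("F-I", 20.0, 40.0, 30.0)
f2 = former_profile("F-II", 20.0, 40.0, 30.0)
f3 = former_profile("F-III", 20.0, 40.0, 30.0)
print(float(f1(30.0)), float(f2(25.0)), float(f3(40.0)))
```

Representing each profile by its knots keeps the integrands piecewise linear, which is what makes the stock measure tractable in closed form.
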
The construction of the profiles for the current

smokers uses T, NCIGP, and NCIG. Five profiles seem

sensible: three for current smokers for whom NCIG=NCIGP

and two for those where NCIGP exceeds NCIG.

The first profile, in Figure 5-5, is analogous to

that in 5-4: consumption increases linearly from $\underline{T}$ to

NCIGP, which coincides with NCIG at T:

[Figure 5-5. Smoking Profile: Current Smokers I (C-I). Axes: C(t) against t, with NCIGP marked on the vertical axis.]

The profile in Figure 5-6 assumes that peak

consumption first occurs at T*, then continues at that

rate to T (T* is defined for current smokers as $(T-\underline{T})/2$):

[Figure 5-6. Smoking Profile: Current Smokers II (C-II). Axes: C(t) against t, with NCIGP marked on the vertical axis.]

The third construct for the NCIG=NCIGP group,

illustrated in Figure 5-7, assumes that NCIGP is attained

immediately at $\underline{T}$ and continues at that rate to T:

[Figure 5-7. Smoking Profile: Current Smokers III (C-III). Axes: C(t) against t, with NCIGP marked on the vertical axis.]

The profile in Figure 5-8 is the first shown for the

subsample reporting NCIG less than NCIGP. Here it is

assumed that consumption increases linearly to NCIGP which

occurs at T* and then decreases linearly to NCIG at T:

[Figure 5-8. Smoking Profile: Current Smokers IV (C-IV). Axes: C(t) against t, with NCIGP and NCIG marked on the vertical axis.]

Finally, the profile shown in Figure 5-9 assumes that

NCIGP is attained immediately at $\underline{T}$ and declines linearly

to NCIG at T:

[Figure 5-9. Smoking Profile: Current Smokers V (C-V). Axes: C(t) against t, with NCIGP and NCIG marked on the vertical axis.]
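
The five current-smoker profiles can be summarized as sets of (time, consumption) knots joined by straight lines. The sketch below is illustrative only. The function name `current_knots` is hypothetical, T* is assumed to be the midpoint between the time started smoking and the present, and the linear rise from zero before T* in profile C-II is an assumption, since the text specifies only what happens from T* onward.

```python
import numpy as np

def current_knots(kind, t_start, t_now, ncig, ncigp):
    """Return (times, rates) describing profile C-I ... C-V as straight-line knots."""
    t_mid = 0.5 * (t_start + t_now)   # assumed definition of T*
    table = {
        "C-I":   ([t_start, t_now], [0.0, ncigp]),
        "C-II":  ([t_start, t_mid, t_now], [0.0, ncigp, ncigp]),  # rise before T* is assumed
        "C-III": ([t_start, t_now], [ncigp, ncigp]),
        "C-IV":  ([t_start, t_mid, t_now], [0.0, ncigp, ncig]),
        "C-V":   ([t_start, t_now], [ncigp, ncig]),
    }
    if kind not in table:
        raise ValueError("unknown profile: " + str(kind))
    # Profiles C-I to C-III apply when current and peak consumption coincide,
    # C-IV and C-V when current consumption is below the peak.
    if kind in ("C-I", "C-II", "C-III") and ncig != ncigp:
        raise ValueError(kind + " requires NCIG = NCIGP")
    if kind in ("C-IV", "C-V") and not ncig < ncigp:
        raise ValueError(kind + " requires NCIG < NCIGP")
    return table[kind]

def consumption_at(t, knots):
    """Evaluate the piecewise-linear profile at time t (inside the smoking interval)."""
    times, rates = knots
    return float(np.interp(t, times, rates))

# Hypothetical current smoker: started at 20, now 50, smokes 10 per day, peak 30.
k4 = current_knots("C-IV", 20.0, 50.0, ncig=10.0, ncigp=30.0)
k5 = current_knots("C-V", 20.0, 50.0, ncig=10.0, ncigp=30.0)
print(consumption_at(35.0, k4), consumption_at(50.0, k4), consumption_at(20.0, k5))
```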

Given these specifications, it is seen that all of
the integrals to be evaluated have linear or
piecewise-linear integrands. That is, on the interval
$[a,c]$ (where $a \equiv \underline{T}$ and $c \equiv \bar{T}$ or $T$), the integrands are either
of the form $\delta(t)(\alpha + \beta t)$, $t \in [a,c]$, or $\delta(t)(\alpha_1 + \beta_1 t)$, $t \in [a,b]$,
$\delta(t)(\alpha_2 + \beta_2 t)$, $t \in (b,c]$ for $b = T^*$. Specifically,

$$K(T) = \sum_{j=1}^{2} \int_{\Omega_j} \exp(-r(T-t))\,(\alpha_j + \beta_j t)\,dt,$$

where $\Omega_1 = [a,b]$ and $\Omega_2 = (b,c]$. Straightforward integration
by parts gives the solution as

$K(T) = \sum_{j=1}^{2} \exp(-rT)\,[\,r\,(\,\ldots$ [remainder of expression illegible in source]

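Because each integrand is an exponential weight times a linear function, each segment has an elementary antiderivative. The sketch below works one out independently; it is not a reconstruction of the authors' own expression, and the function names and example values are hypothetical. For r not equal to zero it uses the fact that exp(-r(T-t)) ((alpha + beta t)/r - beta/r^2) differentiates to exp(-r(T-t)) (alpha + beta t).

```python
import math

def segment_stock(alpha, beta, a, b, t_now, r):
    """Integral over [a, b] of exp(-r (t_now - t)) (alpha + beta t) dt."""
    if r == 0.0:
        return alpha * (b - a) + 0.5 * beta * (b * b - a * a)

    def antiderivative(t):
        return math.exp(-r * (t_now - t)) * ((alpha + beta * t) / r - beta / r ** 2)

    return antiderivative(b) - antiderivative(a)

def stock_from_knots(times, rates, t_now, r):
    """Sum the segment integrals of a piecewise-linear profile given by its knots."""
    total = 0.0
    for i in range(len(times) - 1):
        a, b = times[i], times[i + 1]
        beta = (rates[i + 1] - rates[i]) / (b - a)   # slope on this segment
        alpha = rates[i] - beta * a                  # intercept on this segment
        total += segment_stock(alpha, beta, a, b, t_now, r)
    return total

# Hypothetical triangular profile: rise from 0 to 20 over ten years, then fall to 0.
times, rates = [0.0, 10.0, 20.0], [0.0, 20.0, 0.0]
stocks = {r: stock_from_knots(times, rates, 20.0, r) for r in (0.0, 0.02, 0.05, 0.10)}
for r, k in stocks.items():
    print(r, round(k, 3))
```

Evaluating the same profile at several assumed rates gives a quick view of how strongly the stock measure depends on r.
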

The final point is the determination of r. Use of

decay or discount rates is often essential in applied

econometrics. Yet, in most instances there is no way to

know the "correct" rate, so that in discounting future

streams or depreciating accumulated stocks, the strategy

typically adopted is to posit some rate or set of rates

and conduct analysis as if the rate is known. This

approach has been used in a wide spectrum of applications,

generally with little discussion or justification for the

rate chosen (although some studies helpfully demonstrate

the sensitivity of results to the assumed rates). Such an

approach will be used here.

Given the above assumptions on consumption profiles

and decay rates, the K proxy measures can be derived using

the relevant data in the estimation sample. However, an

obvious drawback is that with three possible consumption

profiles for NCIG=NCIGP current smokers, two for

NCIG
-------
5-1 4

The combinations to be used are (for former,

NCIG=NCIGP current, and NCIG
-------
5-15

by exposure to ambient air pollution, current cigarette

consumption, and other covariates, an individual's

previous cigarette smoking predisposes him or her to an

increased risk of respiratory illness. Using a subset of

the proxy profiles described above enables us to test for

the presence and extent of such effects.
-------
Chapter 6

CIGARETTE SMOKING, AIR POLLUTION, AND RESPIRATORY ILLNESS:

AN ANALYSIS OF RELATIVE RISKS

6.1 Introduction

The relative risks associated with cigarette smoking

and ambient air pollution are difficult to assess. First,

individuals' health status is largely subjective and often

difficult to measure.1 In addition, lung physiology is

complex as well as heterogeneous in a population of

individuals, hindering both the identification and

measurement of all potential determinants of respiratory

illness. Moreover, there is little theoretical guidance

as to the likely form of any functional relationship

between risk exposure and illness response. Finally, data

on exposure to risks are often not all one might like them

to be. It is thus apparent why one expert on quantitative

risk assessment was moved to comment:

Quantitative risk assessment is not a panacea.
A primary limitation is that such an assessment
is concerned only with what can be measured and
quantified.2

In spite of these problems, relative risk assessments

must be undertaken for smoking and air pollution. Both

have been the subject of much discussion and study in the

health and environmental policy communities, and in the

popular press as well. Moreover, as discussed below,

considerable resources are being devoted to understanding

and reducing both risks. It is essential, therefore, that

the risks attributable to smoking and air pollution be

assessed simultaneously within a single coherent

framework; otherwise, risks attributed to one may in fact

be due to the other, thus biasing any estimates of the

health risk of but one of the variables.

The plan in this chapter is as follows. Section 2

discusses in greater detail the problem of acute

respiratory illness and some possible links between it and

smoking and air pollution. Section 3 describes the

dataset used in the empirical analysis, explains the

health measures utilized, and sketches the estimation

strategy. Finally, Section 4 presents empirical results,

derives the estimates of the relative risks of interest,

and briefly suggests new directions for future research in

this area.

6.2 Smoking, Pollution, and Acute Illness

The association between cigarette smoking and several

major chronic illnesses is well known. The 1982, 1983,

and 1984 reports of the U.S. Surgeon General detail and

publicize, respectively, the relationships between

cigarette smoking and cancer, cardiovascular disease, and

chronic obstructive lung disease. For these diseases, the

indictment of cigarette smoking is strong: although data

can obviously never demonstrate causality (as the Tobacco

Institute is wont to remind), the correlative evidence is

overwhelming.

Less widely publicized are associations between

cigarette smoking and less severe illnesses. There is

evidence, however, that suggests the existence of such

linkages. Chapters Three and Six of the 1979 Surgeon

General's report summarize much of the existing research

in this area. There it is reported that relative to

nonsmokers, current smokers have more frequent respiratory

tract infections and a greater prevalence of cough;

symptoms like cough and sputum production tend to increase

with the number of cigarettes smoked. Moreover, the 1979

report finds that "...people who had ever smoked...had a

higher incidence of acute illnesses than did people who

had never smoked (p. 3-6)." Smokers report approximately

45% more illness-related work loss days than do never

smokers.

Owing to the magnitude and severity of the illnesses

associated with cigarette smoking, considerable attention

and public resources have been devoted to the study of and

remedies for such illnesses. While one is hard-pressed to

estimate the value of the resources spent in such

activities, it is safe to venture that the value is

enormous.
-------
6-4

Other public policies have been put in place to

protect individuals' respiratory health. For example,

several federal agencies are involved in the protection

against and compensation for damages from pneumoconiosis

(black lung disease), while exposure to respirable

hazardous substances in the workplace -- like the cotton

dust which causes byssinosis — comes under the regulatory

purview of OSHA.

Of more immediate interest here, however, is the

widespread concern that ambient air pollution may be

detrimental to individuals' respiratory health. Many

clinical and epidemiological studies have tested for

possible relationships between ambient air pollution and

both morbidity and mortality. The cornerstone of federal

air pollution policy in the U.S., the Clean Air Act,

places primary emphasis on the protection of public health

from air pollution, insisting that air quality standards

be set to provide "an adequate margin of safety...to

protect the public health." Of the possible air

pollution-related illnesses, it is of course respiratory

illness that is of utmost concern. Regulatory mandates

pursuant to the Clean Air Act are not inexpensive:

according to the most recent estimates, annual costs of

complying with the Act are approximately $25 billion.

This sum is sure to grow as older sources of pollution are

retired and newer ones -- which must meet stricter

emissions standards — are built to replace them. Whether

such expenditures achieve desired ends efficiently (if at

all) is a question on which we hope to shed some light.

Our task in this chapter is to assess the relative

contributions of cigarette smoking and air pollution to an

individual's risk of suffering from acute respiratory

impairments of varying severity. We concentrate on this

category because none but the most extraordinary exposures

to air pollution can be expected to rival direct (or

perhaps even passive) smoking as a cause of the more

serious illnesses like cancer, cardiovascular disease, or

chronic lung disease. It seems to us plausible to

hypothesize that if typical levels of ambient air

pollution in the U.S. are to influence individuals' health

in any manner, then acute respiratory illness must be a

primary area of suspicion. Because it can be quite

expensive to control air pollution, it is important to

assess how the benefits of doing so compare to those

associated with policies oriented towards smoking

cessation. We suggest that an analysis of the relative

contributions of cigarette smoking and ambient air

pollution to acute respiratory illness is one way to

approach this important assessment.3

6.3 Data and Estimation Strategy

The individual data used here are from the 1979

Health Interview Survey (HIS), a national sample of

approximately 110,000 individuals conducted over the

course of each year by the National Center for Health

Statistics. In this regard, the analysis in this chapter

is similar to that in Volume I and in Chapter 4 in the

present volume. The socioeconomic data elicited from each

respondent in the HIS includes information on age, race,

sex, income, education, as well as other individual- and

household-specific characteristics. In addition, the

supplemental survey on smoking behavior administered in

the 1979 HIS makes it particularly useful for present

purposes. This supplemental questionnaire was asked of

one-third of the approximately 78,000 adults (17+ years)

interviewed, and provides detailed data on lifetime

smoking history and present smoking behavior.

Restrictions in activity due to any illness

experienced during the two-week period prior to the date

of each interview are reported by the interviewee or

another household member responding for the interviewee.

Manifestations of illnesses are classified in three types:

bed disability days, work or school loss days, and what

might best be thought of as minor restricted activity

days. The latter are days on which the subject was

neither bedridden nor forced to miss work or school, but

on which the individual did suffer from an impairment

sufficient to cause a perceptible restriction on usual

activity. The information on health impairments elicited

in the survey is coded by cause according to the

International Classification of Disease. As discussed

earlier, attention is limited in this chapter to those

restrictions in activity due to respiratory illness.

All air pollution data come from the U.S.

Environmental Protection Agency's SAROAD system. The air

quality data used here are measured over the two-week

recall period for which the individual acute health data

are available. The received opinion of respiratory

physiologists suggests that not all airborne pollutants

are equally important in influencing respiratory health.

Accordingly, the analysis of acute respiratory illness

here uses air pollution data for ozone (OZONE), a gaseous

pollutant that is the primary constituent of smog, and

sulfates (SULFATE), perhaps the most harmful of airborne

particulate matter. The subsequent analysis characterizes

individuals' exposures to air pollution using data from

the air pollution monitors nearest the center of the

census tract in which the individual resides. No

individual for whom the nearest monitor is more than ten

miles away is included in the estimation sample, with the

sample average distance from centroid to monitor being

slightly more than four miles. For more details on the

air pollution data used in this analysis, consult Volume

I.
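
The monitor-assignment rule described above (nearest monitor to the tract centroid, dropped if more than ten miles away) can be sketched as follows. This is an illustration under assumptions rather than the procedure actually used: the report does not say how distances were computed, so great-circle (haversine) distance is assumed here, and the function names, record layout, and coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def nearest_monitor(centroid, monitors, cutoff_miles=10.0):
    """Return (monitor id, distance) for the nearest monitor, or None if the
    nearest one lies beyond the cutoff (the individual is then excluded)."""
    best = None
    for m in monitors:
        d = haversine_miles(centroid[0], centroid[1], m["lat"], m["lon"])
        if best is None or d < best[1]:
            best = (m["id"], d)
    if best is None or best[1] > cutoff_miles:
        return None
    return best

# Hypothetical tract centroid and monitor locations.
monitors = [
    {"id": "A", "lat": 38.95, "lon": -77.0},
    {"id": "B", "lat": 39.50, "lon": -77.0},
]
print(nearest_monitor((38.9, -77.0), monitors))
print(nearest_monitor((38.9, -77.0), monitors[1:]))
```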
-------
6-8

We include two measures to control for cigarette

smoking. The first is the individual's daily consumption

of cigarettes at the time of the interview (NCIG). (The

HIS unfortunately contains no information on cigar or pipe

smoking.) Since the consumption data are self-reported,

some caution must be exercised in light of Warner's

underreporting hypothesis [5]; however, we make no attempt

here to correct for this possible errors-in-variables

problem. (Interestingly, in light of the mounting

evidence on the harms of passive smoking, attributing zero

as the number of cigarettes smoked per day by a

"nonsmoker" represents perhaps an understatement of the

daily dosage of cigarettes.)

Both medical evidence and common sense suggest that

the rate of current cigarette consumption alone is an

insufficient characterization of an individual's

smoking-related risk of respiratory illness (see, for

example, Chapter Six of the 1979 Surgeon General's

report). A more appropriate characterization of these

risks incorporates the influences of both current as well

as past cigarette consumption. Accordingly, the influence

on the likelihood of current acute respiratory illness of

lifetime cigarette consumption is measured by the variable

PACKS, a proxy for the number of cigarette packs that a

given individual has "amassed" over his or her lifetime.

PACKS can be viewed as a stock or state variable equal to

the integral over an individual's lifetime cigarette pack

consumption profile {C(t)}. (See Chapter 5 for a

discussion of the creation of the K(t) measures.) The

measure defined in Chapter 5 and converted into pack units

is selected from the set of candidates to serve as the

packyear proxy in the present analysis.

Table 6-1 provides a summary description of the air

pollution and smoking measures, as well as the other

independent variables used, and Table 6-2 depicts their

sample means, minima, and maxima.

Among the measures of respiratory illness available

in the HIS, the number of restricted activity days due to

respiratory illness during the two-week recall period

(RRAD) is a logical choice for use in the present

analysis. However, one drawback to its use as a measure

of health status is that it is a somewhat aggregated

concept. Any day reported as a bed disability or work

loss day, when due to respiratory illness, is counted as a

RRAD, as are days when individuals are hampered in minor

ways from performing usual activities without confinement

to bed or work loss. It is possible, however, that the

determinants of minor restrictions differ -- in kind or in

magnitude -- from the determinants of severe limitations.

The HIS data do not enable a complete disaggregation

of these different types of respiratory restrictions. For
-------
Table 6-1

Variable Definitions

Variable Name   Description

OZONE           Average daily maximum one-hour ozone reading during
                two-week recall period at monitor nearest the centroid
                of respondent's census tract of residence, subject to
                ten-mile distance cutoff (in parts per million)

SULFATE         Average 24-hour sulfate concentration during two-week
                period at monitor nearest the centroid of respondent's
                census tract of residence, subject to ten-mile distance
                cutoff (in µg/m³)

NCIG            Number of cigarettes smoked per day

PACKS           Proxy for lifetime cigarette consumption, in packs
                (see text or [5] for detailed description)

TEMP            Average daily maximum temperature during two-week
                recall period (in degrees F)

RAIN            Average daily precipitation during two-week recall
                period (in inches)

AGE             Age, in years

EDUC            Number of years of schooling

INCOME          Annual family income, in 1979 dollars

CHRONLIM        Equals 1 if respondent reports a persistent limitation
                in activity due to a chronic ailment, equals 0 otherwise

MALE            Equals 1 if male, equals 0 if female

WHITE           Equals 1 if white, equals 0 if black

BLUECOL         Equals 1 if respondent reports usual activity is
                working and usual employment is blue collar, equals 0
                otherwise

WHITECOL        Equals 1 if respondent reports usual activity is
                working and usual employment is white collar, equals 0
                otherwise

INSCHOOL        Equals 1 if respondent reports usual activity is going
                to school, equals 0 otherwise
-------
Table 6-2

Sample Summary of Independent Variables (n=3073)

Variable      Mean       Minimum    Maximum

OZONE         .0426      0          .2136
SULFATE       10.87      .784       52.14
NCIG          7.454      0          98
PACKS         3299.1     0          44174.1
TEMP          63.92      11.14      106.36
RAIN          .114       0          .637
AGE           42.83      17         96
EDUC          11.73      0          13
INCOME        17095      500        30000
CHRONLIM      .173       0          1
MALE          .433       0          1
WHITE         .857       0          1
BLUECOL       .232       0          1
WHITECOL      .288       0          1
INSCHOOL      .079       0          1
-------
6-1 0

example, due to some peculiarities in data collection, it

is not possible in all instances to disentangle work loss

from bed restricted days (see Volume I for a detailed

discussion of these problems). However, the data do

permit a unique disaggregation of RRADs into two

qualitatively distinct types: days restricted in activity

due to respiratory illness but not confined to bed (minor

RRAD, or RADM), and days restricted in activity due to

respiratory illness with bed confinement (severe RRAD, or

RADS). Thus, it is possible to determine for each

individual the number of RADM, RADS, and nonrestricted

days (NRD) occurring during the two-week recall period.

In the analysis to follow, we consider two separate

definitions of RADM and RADS: first, those related to

respiratory illness classified by NCHS as either chronic

or acute (RADM-CA, RADS-CA), and, second, those related

only to respiratory illness classified as acute (RADM-A,

RADS-A).4 The sample frequencies are presented in Table

6-3.

The nature of these health status measures is such

that several peculiar characteristics must be treated

simultaneously in the estimation procedure. First, the

measure best suited for the analysis is multivariate in

nature: during any two-week period, individuals can report

minor or severe restrictions in activity due to

respiratory illness, or can report no respiratory
-------
Table 6-3

Sample Frequency Distribution of RRAD Measures (n=3073)

Number of Days   RADM-CA   RADM-A   RADS-CA   RADS-A

 0               3013      3027     3007      3014
 1                 16        14       19        18
 2                 14        12       18        15
 3                  8         6       10        10
 4                  6         5        5         5
 5                  2         2        4         4
 6                  1         1        1         1
 7                  2         1        3         2
 8                  1         1        3         3
 9                  1         1        0         0
10                  1         1        0         0
11                  0         0        0         0
12                  1         1        0         0
13                  0         0        0         0
14                  7         1        3         1
-------
6-1 1

impairment. Second, outcomes are mutually exclusive. On

a day where an individual reports a RADM, neither a RADS

nor a NRD can be reported; similar exclusivity holds for

RADS and NRD. Third, for all individuals, each of RADM,

RADS, and NRD is constrained to take integer values in

{0, 1, ..., 14}, with the sum RADM+RADS+NRD equal to

fourteen. Finally, because of the protocol of the HIS, it

is not possible to determine on what days during the

two-week recall period a given individual reported the

RADM, RADS, or NRD; only the number of each type of

outcome is known. While it seems sensible to suppose that

RRADs would be contiguous rather than disparate during any

particular time interval, the data used here do not permit

such a conjecture to be verified.

Following the discussion in Chapter 2 of this volume,

the estimation strategy is to view each day during the

two-week recall period as a trial on which one and only

one of the three possible outcomes can occur. For each

individual, then, there are fourteen trials. Because any

one individual's covariates are invariant across the

fourteen trials and, as noted above, because it is

impossible to ascertain which health outcomes occurred on

which days (except, of course, in the polar case where the

same outcome occurs on all fourteen days), it is plausible

for estimation purposes to assume independence both across

trials for an individual and across individuals. (In the
-------
6-1 2

estimation subsample used, it happens that at most one

individual per household is included. Thus, contagion

effects -- which might otherwise vitiate the assumption of

independence across individuals -- can be ignored.)

The preceding paragraphs describe a model that can

be appropriately cast in terms of a multinomial

distribution with $k=3$ possible outcomes; $n_t = n_\tau = n = 14$

independent trials for all $t,\tau$; and probability vector

$(\pi_{Mt}, \pi_{St}, \pi_{Nt})$ (M=RADM, S=RADS, N=NRD) such that

$\pi_{Mt} + \pi_{St} + \pi_{Nt} = 1$. The number of successes or incidences of

each type is $n_{qt}$ for $q=M,S,N$, and $n_{Mt} + n_{St} + n_{Nt} = 14$ for all

$t$. Thus, denoting the multinomial (vector) random

variable for the t-th individual as $Y_t$,

    $\Pr(Y_t = y_t) = n! \prod_{q \in Q} [(\pi_{qt})^{n_{qt}} / n_{qt}!]$,    (1)

where $Q=\{M,S,N\}$ and $n=14$. A logit specification for the

$\pi_{qt}$ is assumed:

    $\pi_{qt} = \exp(X_t \beta_q) / (\sum_{r \in Q} \exp(X_t \beta_r))$,    (2)

for $q=M,S,N$. The parameter vectors $\beta_q$ are unique only up

to a difference, so that some normalization is necessary;

$\beta_N = 0$ is used here. Details on estimation are presented in

the appendix.
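
As an illustration of equations (1) and (2), the following Python sketch evaluates the logit outcome probabilities and the fourteen-trial multinomial log-probability for a single individual. The function names, the covariate vector, and the coefficient values are hypothetical and chosen only for demonstration; the functional forms are the only part taken from the text.

```python
import math

def logit_probs(x, beta_m, beta_s):
    """Outcome probabilities (pi_M, pi_S, pi_N) from equation (2), with beta_N = 0."""
    idx_m = sum(xi * bi for xi, bi in zip(x, beta_m))
    idx_s = sum(xi * bi for xi, bi in zip(x, beta_s))
    denom = math.exp(idx_m) + math.exp(idx_s) + 1.0  # exp(0) = 1 for the NRD outcome
    return math.exp(idx_m) / denom, math.exp(idx_s) / denom, 1.0 / denom

def log_multinomial_pmf(counts, probs):
    """Log of equation (1) for counts (n_M, n_S, n_N) that sum to the number of trials."""
    n = sum(counts)
    out = math.lgamma(n + 1)  # log n!
    for c, p in zip(counts, probs):
        out -= math.lgamma(c + 1)  # minus log n_q!
        if c > 0:
            out += c * math.log(p)
    return out

# Hypothetical individual: intercept plus one covariate; coefficients are made up.
x = [1.0, 0.05]
probs = logit_probs(x, beta_m=[-5.0, 4.0], beta_s=[-4.5, 2.0])
log_p = log_multinomial_pmf((1, 0, 13), probs)  # one RADM, no RADS, thirteen NRD
```

Because the three probabilities share one denominator, they sum to one by construction, which is the adding-up condition stated above.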

A basic and more popular version of the model
-------
6-13

described above is the ordered logit model described in

Chapter 2, in which it is assumed that there exists some

mechanism that orders the outcome probabilities according

to a particular latent measure (illness severity, for

example). The typical assumption is that the coefficients

$\beta_q^* = (\beta_{q2}, \ldots, \beta_{qK})$ are invariant across the outcomes $q$

(except for the outcome whose parameter vector remains

normalized to zero), with the ordering characterized by

outcome-specific intercept terms $\beta_{q1}$ ranked by strict

inequality across outcomes, the ranking signifying the

ordering "more severe than." For purposes of comparison,

therefore, we also present estimates of a multiple-trial

version of an ordered logit model. It happens that this

is a parameter-restricted version of the multinomial model

specified above, with (K-1) restrictions of the form $\beta_q^* = \beta^*$

on the likelihood function (A.2) implied by the ordered

logit likelihood function. It is thus possible to test in

a straightforward manner whether these restrictions are

valid insofar as the model and data sample used here are

concerned. The test is a standard likelihood ratio test,

with the test statistic computed as $LR = -2(\ell_R - \ell_U)$; $\ell_R$ is

the maximized likelihood function value for the ordered

logit specification and $\ell_U$ is the corresponding value for
-------
6-1 4

the multinomial model (A.2). Under the null hypothesis

that the (K-1) restrictions are valid, LR is distributed

asymptotically as central $\chi^2$ with (K-1) degrees of

freedom.
6.4 Estimates of Model Parameters and Relative Risks

The estimates of the model using the chronic and

acute RRAD measures of respiratory illness are presented

in Table 6-4. Insofar as the parameter estimates

associated with the independent variables other than

smoking or pollution are concerned, it is seen that most

are statistically significant in at least one of the

RADM-CA or RADS-CA estimated parameter vectors, with

generally plausible signs in most instances. The

parameter estimate associated with the current level of

cigarette smoking (NCIG) is statistically important in $\beta_S$,

but is insignificant in $\beta_M$. Lifetime cigarette

consumption (PACKS) plays an opposite role: its associated

parameter estimates are positive and significant in the $\beta_M$

vector, but statistically indistinguishable from zero in

$\beta_S$. SULFATE appears to be an insignificant contributor to

either RADM-CA or RADS-CA. OZONE, conversely, has an

associated parameter estimate in $\beta_M$ that is positive and

statistically significant, although the ozone coefficient

in $\beta_S$ is statistically unimportant. This finding is

consistent with those in Volume I. There, using more
-------
Table 6-4

Model Estimates: Chronic and Acute RRADs with
Linear Risk Factor Influence

Variable      $\beta_{M-CA}$         $\beta_{S-CA}$

INTERCEPT     -6.34     (9.8)        -3.64     (7.2)
OZONE          7.03     (2.5)         1.77     (.45)
SULFATE        .0061    (.54)        -.012     (.84)
NCIG          -.0034    (.61)         .026     (4.8)
PACKS          .36E-4   (3.1)         .53E-5   (.33)
TEMP          -.014     (3.4)        -.025     (5.2)
RAIN           0.85     (1.3)         0.45     (.62)
AGE           -.0084    (1.9)         .31E-3   (.061)
EDUC          -.0082    (.39)        -.058     (2.4)
INCOME        -.32E-4   (4.1)        -.63E-4   (7.0)
CHRONLIM       0.99     (6.7)         0.46     (2.6)
MALE           0.28     (2.1)         .079     (.53)
WHITE          2.59     (5.1)         0.78     (3.2)
BLUECOL       -1.76     (5.9)        -0.17     (.75)
WHITECOL      -0.18     (1.0)         0.83     (4.3)
INSCHOOL      -0.42     (1.4)         0.90     (3.1)

Log(L) = -2734.55

Note: Asymptotic normal scores for $H_0: \beta_{qk} = 0$ in parentheses
-------
6-15

"primitive" OLS and logit techniques, we found positive

and often significant associations between ozone and minor

illnesses among adults, but no pattern of associations

when we examined either work loss or bed disability days.

Thus, the findings in this chapter provide some

corroborative evidence using a more sophisticated and

appropriate statistical approach.

Similar estimates obtain in the model of the

acute-only respiratory ailments RADM-A and RADS-A,

presented in Table 6-5. Most notable is that the

individual parameter significance levels tend to be

somewhat lower than those estimated in the chronic-acute

model of Table 6-4, although the qualitative

interpretation is in most instances unchanged. Of

particular import is that the coefficient estimates

associated with OZONE and PACKS in $\beta_M$ are no longer

significant at the 95% level.5

In Chapter 4 we found that various nonlinear

transformations of the ozone measure lead to differing

conclusions about the significance of the relationship

between ozone and respiratory health. There, remember,

the transformation (OZONE)' proved most significant. As

such nonlinearities are potentially important in the

present analysis as well, we also consider simple

transformations of OZONE, NCIG, and PACKS of the form

OZONE^$\lambda_1$, NCIG^$\lambda_2$, and PACKS^$\lambda_3$ for $\lambda_1, \lambda_2, \lambda_3 > 0$. (We ignore
-------
Table 6-5

Model Estimates: Acute-only RRADs with
Linear Risk Factor Influence

Variable      $\beta_{M-A}$          $\beta_{S-A}$

INTERCEPT     -6.67     (8.9)        -4.66     (8.1)
OZONE          5.09     (1.3)        -2.39     (.49)
SULFATE        .0031    (.59)         .0031    (.22)
NCIG          -.0088    (1.1)         .023     (3.5)
PACKS          3.1E-4   (1.8)         1.4E-5   (.072)
TEMP          -.0069    (1.2)        -.024     (4.4)
RAIN           0.46     (.53)         0.54     (.68)
AGE            .0053    (.95)         .0057    (.98)
EDUC          -.0088    (.31)        -.039     (1.4)
INCOME         .28E-4   (2.8)         .41E-4   (4.4)
CHRONLIM      -1.10     (3.6)         0.13     (.58)
MALE          -0.15     (.79)        -0.30     (1.3)
WHITE          1.82     (3.6)         0.65     (2.4)
BLUECOL       -1.25     (3.6)         0.55     (2.1)
WHITECOL       0.13     (.59)         1.35     (5.9)
INSCHOOL       0.28     (.82)         1.57     (5.0)

Log(L) = -2048.53

Note: Asymptotic normal scores for $H_0: \beta_{qk} = 0$ in parentheses
-------
6-1 6

transformations of SULFATE because of its generally

insignificant contributions as witnessed in Tables 6-4 and

6-5.) The software used for estimation does not enable

maximum likelihood estimation of the $\lambda_j$; instead, a grid

search approach is used, where the search is over

$(\lambda_1, \lambda_2, \lambda_3) \in \Lambda \times \Lambda \times \Lambda$, and $\Lambda = \{0.5, 1.0, 1.5, 2.0\}$. Of the

sixty-four possible $(\lambda_1, \lambda_2, \lambda_3)$ triples, that which

maximizes the conditional (on $\lambda_1, \lambda_2, \lambda_3$) likelihood function

with respect to $(\beta_M, \beta_S)$ is selected as the (pseudo) MLE.
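
The grid search just described can be sketched in Python as follows. This is a minimal illustration, not the SAS routine used for the report: `max_loglik` is a hypothetical caller-supplied function standing in for the inner maximization over the coefficient vectors, and the toy objective below is peaked at an arbitrary triple purely so that the example runs.

```python
import itertools

GRID = (0.5, 1.0, 1.5, 2.0)  # candidate exponents for each transformed risk factor

def grid_search(max_loglik):
    """Return (best_triple, best_value) over all exponent triples on the grid.

    max_loglik(l1, l2, l3) is assumed to maximize the log-likelihood over the
    coefficient vectors for fixed exponents and to return the maximized value.
    """
    best_triple, best_value = None, float("-inf")
    for triple in itertools.product(GRID, repeat=3):
        value = max_loglik(*triple)
        if value > best_value:
            best_triple, best_value = triple, value
    return best_triple, best_value

def toy_max_loglik(l1, l2, l3):
    """Stand-in for the inner maximization; peaked at an arbitrary point."""
    return -((l1 - 1.5) ** 2 + (l2 - 0.5) ** 2 + (l3 - 2.0) ** 2)

best, value = grid_search(toy_max_loglik)
n_triples = len(list(itertools.product(GRID, repeat=3)))  # 4 * 4 * 4 candidates
```

Since the exponents are chosen from a coarse grid rather than estimated jointly with the coefficients, the selected triple is a pseudo-MLE in the sense used in the text.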

The estimates of the RADM-CA and RADS-CA model using

the nonlinear transformations are presented in Table 6-6.

The pseudo-MLEs of the $\lambda_k$ are $\lambda_1 = 0.5$, $\lambda_2 = 1.5$, and $\lambda_3 = 1.0$,

with a likelihood ratio test indicating that these

transformations are jointly significant, at greater than

the 95% level. The overall qualitative findings are

unchanged; however, the parameter estimates associated

with the transformed risk factors are more finely resolved

than those presented in Table 6-4. Similar statements can

be made about the acute-only model, whose estimates are

presented in Table 6-7. Again, the pseudo-MLEs for the $\lambda_k$

are 0.5, 1.5, and 1.0 for the OZONE, NCIG, and PACKS

transformations, respectively. The likelihood ratio test

of the joint significance of the transformations is

significant only at slightly above the 90% level; since

the $\lambda_k$ are not true MLEs, however, such an LR test is

somewhat misleading, and is biased in favor of accepting
-------
Table 6-6

Model Estimates: Chronic and Acute RRADs with
Nonlinear Risk Factor Influences

Variable            $\beta_{M-CA}$         $\beta_{S-CA}$

INTERCEPT           -6.75     (10.5)       -3.64     (7.2)
OZONE^$\lambda_1$    5.10     (3.5)         2.86     (1.7)
SULFATE              .0039    (.35)        -.014     (1.0)
NCIG^$\lambda_2$    -.25E-3   (.32)         .0034    (5.3)
PACKS^$\lambda_3$    .35E-4   (3.0)         .81E-5   (.53)
TEMP                -.020     (4.2)        -.030     (5.9)
RAIN                 1.01     (1.5)         0.57     (.78)
AGE                 -.0080    (1.9)        -.94E-3   (.19)
EDUC                -.0071    (.33)        -.059     (2.4)
INCOME              -.32E-4   (4.1)        -.64E-4   (7.1)
CHRONLIM             0.99     (6.7)         0.46     (2.9)
MALE                 0.28     (2.0)         .064     (.43)
WHITE                2.56     (5.0)         0.75     (3.0)
BLUECOL             -1.77     (6.0)        -0.15     (.65)
WHITECOL            -0.19     (1.1)         0.86     (4.4)
INSCHOOL            -0.41     (1.3)         0.86     (3.0)

Log(L) = -2729.13

Note: Asymptotic normal scores for $H_0: \beta_{qk} = 0$ in parentheses
$\lambda_k$ from grid search: $\lambda_1 = 0.5$; $\lambda_2 = 1.5$; $\lambda_3 = 1.0$.
-------
Table 6-7

Model Estimates: Acute-only RRADs with
Nonlinear Risk Factor Influence

Variable            $\beta_{M-A}$          $\beta_{S-A}$

INTERCEPT           -7.09     (9.5)        -4.59     (7.9)
OZONE^$\lambda_1$    4.89     (2.5)         1.35     (.71)
SULFATE              .0052    (.37)         .44E-3   (.032)
NCIG^$\lambda_2$    -.33E-3   (.26)         .0034    (4.1)
PACKS^$\lambda_3$    .23E-4   (1.3)        -.88E-6   (.046)
TEMP                -.013     (2.1)        -.029     (5.2)
RAIN                 0.68     (.78)         0.69     (.87)
AGE                  .0066    (1.2)         .0052    (.91)
EDUC                -.0076    (.27)        -.040     (1.4)
INCOME              -.28E-4   (2.8)        -.42E-4   (4.5)
CHRONLIM            -1.09     (3.6)         0.13     (.58)
MALE                -0.14     (.74)        -0.31     (1.9)
WHITE                1.79     (3.5)         0.62     (2.3)
BLUECOL             -1.27     (3.7)         0.56     (2.2)
WHITECOL             0.12     (.54)         1.36     (6.0)
INSCHOOL             0.30     (.89)         1.55     (4.9)

Log(L) = -2045.12

Note: Asymptotic normal scores for $H_0: \beta_{qk} = 0$ in parentheses
$\lambda_k$ from grid search: $\lambda_1 = 0.5$; $\lambda_2 = 1.5$; $\lambda_3 = 1.0$.
-------
6-17

the null that transformations are not important.

The estimates of the ordered logit models are

presented in Table 6-8. The $\lambda_k$ transformations suggested

by the multinomial models of both Table 6-6 and Table 6-7

are used here. In the model of RADM-CA and RADS-CA

(column 1), the parameter estimates for (OZONE)^0.5, TEMP,

INCOME, CHRONLIM, WHITE, BLUECOL, (NCIG)^1.5 and PACKS are

all significant at greater than the 99% level. A perhaps

peculiar result is that the estimate of the RADM intercept

exceeds that for RADS, thus calling into question the

validity of the ordered logit specification. Indeed, the

$\chi^2_{(15)}$-distributed likelihood ratio test statistic of the

restrictions on the multinomial model that are implied by

the ordered specification has a value of 89.58, suggesting

that the ordered specification can be rejected with

considerable confidence in favor of the unrestricted

multinomial model. Similar results obtain for the model

of RADM-A and RADS-A (column 2): estimated parameters

associated with TEMP, INCOME, WHITE, WHITECOL, INSCHOOL,

and (NCIG)^1.5 are significant at above the 99% critical

level, while the asymptotic t-statistics associated with

(OZONE)^0.5 and CHRONLIM parameters exceed the 95% level.

In this instance, the $\chi^2$ test statistic for the ordered

logit model restrictions has a value of 68.58, again

suggesting that the ordered specification be rejected in

favor of the general multinomial model.
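
The arithmetic behind the acute-only test can be reproduced with a few lines of Python. The sketch below is illustrative only: the function name is hypothetical, the two log-likelihood values are those reported for the acute-only models in Tables 6-7 and 6-8, and the chi-square critical values for 15 degrees of freedom are standard tabulated constants rather than figures from the report.

```python
def likelihood_ratio(loglik_restricted, loglik_unrestricted):
    """LR = -2 * (restricted log-likelihood - unrestricted log-likelihood)."""
    return -2.0 * (loglik_restricted - loglik_unrestricted)

# Tabulated upper critical values of the chi-square distribution with 15 d.f.
CHI2_15_CRIT_95 = 24.996
CHI2_15_CRIT_99 = 30.578

# Acute-only models: ordered logit (restricted) versus multinomial (unrestricted).
lr_acute = likelihood_ratio(-2079.41, -2045.12)
reject_at_99 = lr_acute > CHI2_15_CRIT_99
```

The statistic exceeds the 99 percent critical value by a wide margin, which is why the ordered specification is rejected with considerable confidence.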
-------
Table 6-8

Model Estimates: Ordered Logit, Nonlinear Risk Factor Influence

Variable             Chronic-Acute          Acute-only

INTERCEPT-RADM       -4.28     (11.6)       -4.82     (11.0)
INTERCEPT-RADS       -5.07     (13.7)       -5.45     (12.3)
OZONE^$\lambda_1$     4.17     (3.8)         3.15     (2.3)
SULFATE              -.0033    (.37)         .0033    (.35)
NCIG^$\lambda_2$      .0017    (3.4)         .0020    (3.0)
PACKS^$\lambda_3$     .24E-4   (2.6)         .93E-4   (.71)
TEMP                 -.024     (7.1)        -.022     (5.3)
RAIN                  0.76     (1.5)         0.57     (1.1)
AGE                  -.0045    (1.4)         .0063    (1.6)
EDUC                 -.029     (1.8)        -.024     (1.2)
INCOME               -.46E-4   (7.9)        -.35E-4   (5.2)
CHRONLIM              0.75     (6.7)        -0.41     (2.3)
MALE                  0.17     (1.7)        -0.22     (1.8)
WHITE                 1.38     (6.3)         0.99     (4.2)
BLUECOL              -0.93     (5.3)        -0.26     (1.3)
WHITECOL              0.28     (2.1)         0.71     (4.6)
INSCHOOL              0.21     (1.0)         0.92     (4.0)

Log(L)               -2773.84               -2079.41

Note: Asymptotic normal scores for $H_0: \beta_{qk} = 0$ in parentheses.
$\lambda_k$ from grid search: $\lambda_1 = 0.5$; $\lambda_2 = 1.5$; $\lambda_3 = 1.0$.
-------
6-1 8

On the basis of these results, we elect to use the

estimates of the nonlinear risk factor multinomial models

presented in Tables 6-6 and 6-7 as the foundation of the

relative risk estimates. In the multinomial model,

translations from the qualitative outcomes to quantitative

estimates of relative risks are fairly straightforward.

One obvious strategy for evaluating the relative risks of

smoking and air pollution would be to assess and compare

the estimated elasticities of the daily outcome-specific

probabilities with respect to the pollution and smoking

control variables. Using the incidence probabilities

defined in (2), and allowing for the cases where the

control variables are subject to the nonlinear

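transformations $h(x_{kt}) = (x_{kt})^{\lambda_k}$, the elasticity formula is:

    $\partial \ln \pi_{qt} / \partial \ln x_{kt} = \lambda_k (x_{kt})^{\lambda_k} [\beta_{qk} - \sum_{r \in Q} \beta_{rk} \exp(\tilde{X}_t \beta_r) / (\sum_{s \in Q} \exp(\tilde{X}_t \beta_s))]$,    (3)

which simplifies when $\lambda_k = 1$. (Here, $\tilde{X}_t$ denotes the

transformed $X_t$ vector.)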

While the elasticity comparison approach provides

perhaps the most straightforward method for assessing the

relative risks of interest, the nature of the data used

here renders it somewhat uninformative. In brief, the

problem is that 64% of the sample are classified as
-------
6-19

current nonsmokers, while 44% are never smokers. It is

seen by inspection of equation (3) that for these large

subsamples the estimated NCIG and PACKS elasticities are

zero.

We therefore adopt in lieu of elasticity comparison

an approach that considers the discrete changes from the

baseline or prevailing daily incidence probabilities

attributable to a variety of discrete changes in the

control variables from their prevailing sample values.

This strategy has at least two advantages. First, it

circumvents the non- or never-smoker problem. Second, the

magnitudes of the hypothetical discrete changes in the

control variables are set to mimic potentially interesting

policy measures.

The strategy is as follows. First, for each of the

four incidence outcomes (RADM-CA, RADM-A, RADS-CA,

RADS-A), a baseline mean probability is calculated using

the estimated models in Tables 6-6 and 6-7. This mean

probability -- which is simply the sample average of the

$\pi_{qt}$ -- is denoted $\bar{\pi}_q$. The second step is to perturb the

control variable of interest in each $X_t$ by the specified

amount and reevaluate each individual's incidence

probability using the perturbed $X_t$. The sample average of

these new $\pi_{qt}$ is denoted $\tilde{\pi}_q$. Finally, for each illness

measure and each control perturbation considered, the

difference $\tilde{\pi}_q - \bar{\pi}_q$ is calculated. The results are
-------
6-20

presented in Table 6-9.
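
The three-step calculation described above can be sketched in Python as follows. This is an illustrative outline under stated assumptions: the function names, the three-person sample, and the coefficient values are hypothetical, and each person's other covariates are collapsed into a fixed contribution to the two logit indexes. Note that the first hypothetical person has a zero level of the risk factor, the case in which the elasticity approach is uninformative.

```python
import math

def probs(index_m, index_s):
    """(pi_M, pi_S) from the logit form, with the NRD index normalized to zero."""
    em, es = math.exp(index_m), math.exp(index_s)
    denom = em + es + 1.0
    return em / denom, es / denom

def mean_change(sample, coef_m, coef_s, lam, delta):
    """Average change in (pi_M, pi_S) when the risk factor x rises by delta.

    Each sample row is (x, rest_m, rest_s): the risk factor level and the
    contribution of all other covariates to the two logit indexes.
    The risk factor enters each index as coef * x ** lam.
    """
    base_m = base_s = new_m = new_s = 0.0
    for x, rest_m, rest_s in sample:
        pm, ps = probs(rest_m + coef_m * x ** lam, rest_s + coef_s * x ** lam)
        qm, qs = probs(rest_m + coef_m * (x + delta) ** lam,
                       rest_s + coef_s * (x + delta) ** lam)
        base_m += pm
        base_s += ps
        new_m += qm
        new_s += qs
    n = len(sample)
    return (new_m - base_m) / n, (new_s - base_s) / n

# Hypothetical sample: a nonsmoker, a half-pack smoker, and a pack-a-day smoker.
sample = [(0.0, -5.0, -4.5), (10.0, -5.2, -4.4), (20.0, -4.8, -4.6)]
d_m, d_s = mean_change(sample, coef_m=0.0, coef_s=0.003, lam=1.5, delta=10.0)
```

With a positive coefficient in the severe-outcome index only, the severe-outcome probability rises and the minor-outcome probability falls slightly, since all outcomes share one denominator.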

As Table 6-9 indicates, depending upon the specific

model of interest, changes in either or both measures of

smoking as well as ambient ozone concentrations can affect

the likelihood of an individual's reporting a minor or

severe respiratory impairment. For instance, in the

RADM-CA model a 5 percent increase from the sample mean in

the average daily maximum 1-hour ozone concentration

increases the estimated risk of a minor respiratory

impairment on any given day by an average of 1.76*10^-4. A

comparable increase in risk is predicted to result from an

individual's having smoked slightly more than an

incremental one pack per day for two years (since one pack

per day for one year adds 0.75*10^-4 to the risk of a

RADM-CA). The same model reveals that a ten percent

increase in the average daily maximum ozone concentration

is about equivalent to an increase of an extra one pack a

day smoked for five years in terms of incremental risk

(3.50*10^-4 and 3.85*10^-4, respectively).

We can also compare the incremental risks of air

pollution with those associated with current cigarette

consumption. For instance, from the RADS-CA model, a five

percent increase in ambient ozone concentrations increases

the baseline risk of a severe acute respiratory illness by

0.89*10^-4. This is about one-twelfth the effect of an

individual's currently smoking an additional half pack of
-------
Table 6-9

Estimated Mean Changes from
Baseline Probabilities $\bar{\pi}_q$ (x10,000)

                                 RADM-CA   RADS-CA   RADM-A   RADS-A

Baseline $\bar{\pi}_q$            60.202    50.671    35.331   40.910

Hypothetical
Control Change:

OZONE_t + .05*mean(OZONE)          1.76      0.89      0.98     0.35
OZONE_t + .10*mean(OZONE)          3.50      1.75      1.95     0.68

NCIG_t + 5                          --       4.47       --      3.46
NCIG_t + 10                         --      10.80       --      3.40

PACKS_t + 365                      0.75      0.14       .29      --
PACKS_t + 1825                     3.85      0.71       .49      --

Notes: "--" signifies negative predicted change
mean(OZONE) signifies the sample mean concentration of
OZONE_t (.0426)
-------
6-21

cigarettes per day. Comparable calculations could be made

in the other models as well (although we prefer not to go

into detail here since the significance levels on the

variables of interest are not sufficiently high to warrant

large confidence in the estimated risk changes).

For purposes of public policy, it would be desirable

to go beyond the estimation of relative risks to consider

the cost and efficacy of "control" measures. This would

permit at least crude cost-effectiveness comparisons to be

made. Some estimates are available regarding ozone

reductions. White [6] reports that when the National

Ambient Air Quality Standard for ozone was reviewed in

1978, the marginal cost of meeting a standard of 0.12 ppm

as opposed to one of 0.14 ppm was approximately $2.0

billion. Although the form of that standard (second

highest hourly reading at a monitor) differs from the

measurement of ozone in this study (average daily maximum

one-hour reading during a two-week period), a link between

the two could be made. This would permit an estimate of

the costs per unit of predicted ozone risk reduction,

holding other possibly beneficial effects of ozone

reduction (agricultural productivity increases, for

example) constant.

In principle, estimates could be assembled on the

costs of reducing cigarette consumption (for an excellent

discussion of the nature of such costs, see [1]). Using
-------
6-22

such data, and the results presented above, estimates of

cost per unit of reduced risk from smoking could be

derived and compared with those resulting from pollution

control. Finally, if appropriate allowances were made for

the qualitatively different nature of the two risks -- the

differing degrees of voluntarism, for instance -- it would

be possible to draw inferences about potentially efficient

resource allocation.
-------
6-23

APPENDIX

Given an independent sample of T observations, the

likelihood function is
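
    $L = \prod_{t=1}^{T} \{ n! \prod_{r \in Q} [(\pi_{rt})^{n_{rt}} / n_{rt}!] \}$.    (A.1)

In logs,

    $\ell = \sum_{t=1}^{T} \sum_{r \in Q} n_{rt} [X_t \beta_r - \log(\sum_{s \in Q} \exp(X_t \beta_s))] + c$,    (A.2)

where $c$ does not depend on $\beta = (\beta_M, \beta_S)$. $\ell$ is concave in $\beta$,
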

thus assuring convergence.

A Newton-Raphson algorithm programmed in SAS's PROC

MATRIX is used for estimation. Except for the adjustment

for the multiple-trial nature of the data, the vector of

first derivatives and matrix of second partials of $\ell$ with

respect to $\beta$ are identical to those of the more familiar

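single-trial multinomial logit model. Thus,

    $\partial \ell / \partial \beta_q = \sum_{t=1}^{T} (n_{qt} - n \pi_{qt}) X_t'$

and
-------
6-24

    $\partial^2 \ell / \partial \beta_q \partial \beta_p' = -n \sum_{t=1}^{T} \pi_{qt} (\delta_{qp} - \pi_{pt}) X_t' X_t$,

where $q,p \in \{M,S\}$, and $\delta_{qp}$ is the Kronecker delta. The

information matrix estimate is

    $n \sum_{t=1}^{T} \pi_{qt} (\delta_{qp} - \pi_{pt}) X_t' X_t$    (block $q,p$)

evaluated at $\hat{\beta}$; its inverse serves to estimate Cov($\hat{\beta}$).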
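
As a numerical check on the first derivatives used by a Newton-Raphson routine of this kind, the Python sketch below compares an analytic score with central finite differences of the log-likelihood (A.2). It assumes the standard multiple-trial multinomial logit score, the sum over individuals of (n_qt - 14 * pi_qt) * X_t; the function names, the three-person data set, and the coefficient values are hypothetical.

```python
import math

N_TRIALS = 14

def probs(x, beta):
    """beta = [beta_M, beta_S]; returns [pi_M, pi_S, pi_N] with beta_N = 0."""
    e = [math.exp(sum(a * b for a, b in zip(x, bq))) for bq in beta]
    denom = sum(e) + 1.0
    return [e[0] / denom, e[1] / denom, 1.0 / denom]

def loglik(data, beta):
    """Log-likelihood (A.2) without the constant c; rows are (x, (n_M, n_S, n_N))."""
    total = 0.0
    for x, counts in data:
        p = probs(x, beta)
        total += sum(c * math.log(pq) for c, pq in zip(counts, p))
    return total

def score(data, beta):
    """Analytic gradient: sum over t of (n_qt - n * pi_qt) * x_t for q = M, S."""
    k = len(beta[0])
    g = [[0.0] * k for _ in range(2)]
    for x, counts in data:
        p = probs(x, beta)
        for q in range(2):
            for j in range(k):
                g[q][j] += (counts[q] - N_TRIALS * p[q]) * x[j]
    return g

def numeric_score(data, beta, h=1e-6):
    """Central finite-difference approximation to the gradient of loglik."""
    g = [[0.0] * len(beta[0]) for _ in range(2)]
    for q in range(2):
        for j in range(len(beta[0])):
            up = [row[:] for row in beta]
            dn = [row[:] for row in beta]
            up[q][j] += h
            dn[q][j] -= h
            g[q][j] = (loglik(data, up) - loglik(data, dn)) / (2.0 * h)
    return g

# Hypothetical data: intercept plus one covariate, counts sum to 14 per person.
data = [([1.0, 0.2], (1, 0, 13)), ([1.0, -0.5], (0, 2, 12)), ([1.0, 1.0], (0, 0, 14))]
beta = [[-3.0, 0.5], [-2.5, -0.3]]
analytic = score(data, beta)
numeric = numeric_score(data, beta)
```

Agreement between the two gradients supports the claim that only the trial count, not the functional form, distinguishes the multiple-trial derivatives from the single-trial case.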
-------
6-25
Notes

1See Manning et al. [4].

2Lave [2], p.2.

3It is obvious that the "target" groups in a smoking

cessation or mitigation policy differ from those

in a policy designed to reduce ambient concentrations

of air pollution. One might argue that

a critical difference is that smokers assume their

risks voluntarily whereas exposure to ambient air

pollution is largely involuntary; policy measures,

it is argued, should be more concerned with those

risks assumed involuntarily, these being more in the

nature of classic economic externalities. However,

the recently mounting evidence on the health

consequences of passive smoking suggests that the

target groups in smoking mitigation policies might

well extend beyond the population of voluntary

smokers. To the extent that passive smoking is

involuntary -- in the sense that the costs

associated therewith have not been capitalized by

market forces -- then the distinction between the
-------
6-26

air pollution and smoking policy target groups tends

to blur.

4All illnesses reported in the HIS are coded as either

chronic or acute. Regardless of the interval between

incidence and time of survey, some illnesses are --

by definition — coded as chronic due to their

intrinsically chronic nature (e.g. emphysema, lung

cancer, most cardiovascular problems). Moreover,

illnesses that might otherwise be classified as

acute are classified as chronic if the interval

between their incidence and the time of the interview

exceeds three months. Thus, an acute illness,

according to the NCHS codification scheme, is an

illness that is typically construed as acute and

that has had a duration of less than three months

at the time of the interview.

5It is admittedly troublesome that the signs of the

estimated coefficients for either NCIG or PACKS are

negative -- although not statistically distinguish-

able from zero -- in some of the specifications. We

suspect that this phenomenon is attributable largely

to collinearity between the two measures; indeed,

their sample correlation is 0.55.
-------
6-27

On a priori grounds, as argued earlier, both

should be included in a model of respiratory illness.

However, if collinearity is severe, their separate

influences become difficult to identify. To explore

further this possibility, we estimated two alternative

versions of the multinomial model for both

specifications (CA,A) of the RRAD measures, one in

which NCIG, but not PACKS, is included, and one in

which PACKS, but not NCIG, is included. The results

largely corroborate the collinearity hypothesis: in

all cases, the estimates of the parameters associated

with the single included smoking measure are positive

for both the RADM and RADS probabilities.
-------
6-28

REFERENCES

[1] Atkinson, A.B. and T.W. Meade. "Methods and

Preliminary Findings in Assessing the Economic and

Health Services Consequences of Smoking, with

Particular Reference to Lung Cancer," Journal of the

Royal Statistical Society A 137, pp. 297-312, 1974.

[2] Lave, Lester B. Quantitative Risk Assessment in

Regulation. Washington: Brookings, 1982.

[3] Maddala, G.S. Limited-Dependent and Qualitative

Variables in Econometrics. Cambridge: Cambridge

University Press, 1983.

[4] Manning, W., J. Newhouse, and J. Ware. "The Status of

Health in Demand Estimation; or, Beyond Excellent,

Good, Fair, Poor," in V. Fuchs, ed. Economic Aspects

of Health. Chicago: University of Chicago Press for

NBER, 1982.

[5] Warner, Kenneth E. "Possible Increases in the

Underreporting of Cigarette Consumption," Journal of

the American Statistical Association 73, pp. 314-318,

1978.

[6] White, Lawrence. Reforming Regulation: Processes and

Problems. Englewood Cliffs, NJ: Prentice-Hall, 1981.
-------
Chapter 7

CHRONIC RESPIRATORY DISEASE

In the initial analysis in Volume I of ozone and chronic respiratory

disease (CRD), several regressions were estimated over what was referred to

as a "residentially stable" group of individuals. That is, the

observations were restricted to those individuals who had been living in

the same place for five years at the time they were interviewed in the 1979

HIS. The purpose of this restriction was to reduce the chances that

someone who had lived in another location for a long time would be matched

up to air pollution exposures at his or her new location, thus confounding

our analysis of CRD. Our findings in Volume I (see especially p. 4-71)

suggested that concentrating on the residentially stable made a difference

in the conclusions one draws from such analysis.

However, the five year residency requirement we imposed in that

analysis is itself rather weak. Accordingly, in analysis conducted since

the completion of Volumes I and II, we have reexamined the incidence of

CRD—and its possible link to air pollution—using a group of individuals

who had lived for at least ten years at the location they reported in the

1979 HIS. While this does not eliminate the possibility of spurious

correlation, it lessens it when compared to the five-year residency

restriction imposed earlier. These results are reported here.

These results are responsive in other ways to comments and suggestions

on our earlier work. For instance, in response to puzzlement over the
-------
7-2

relatively weak performance of the smoking variables in explaining CRD in

the earlier work, we included in the reanalysis the variable PACKYRS.' This

measure, described in detail in Chapter 5, proxies individuals' lifetime

smoking habits. It is included along with NCIGS, a measure of current

smoking activity. Also, we have purged the list of regressors of many

which had little or no explanatory power in the original analysis. In this

respect, the models estimated below are akin to the "lean" model in the

original analysis (see equation (29), p. 4-77 of Volume I). Finally, in

the analysis here we have included an additional measure of long-term air

pollution concentrations, one which takes data from just one year (1979)

but includes annual average readings for all monitors within 20 miles of

the respondents' census tract centroids. These are denoted as OZ79AV,

S479AV, and SP79AV for ozone, sulfates, and total suspended particulates,

respectively.

The analysis of CRD below differs from that in Volume I in one other

important respect. Here we have run separate regressions for those

individuals who received the "probe" questions concerning respiratory

illness and for those who did not. (Recall that in addition to the main

questionnaire, all respondents in the HIS were given one of six different

probes inquiring in detail about six specific disease categories. Thus,

one-sixth of the respondents were asked whether they had any of a number of

specific respiratory diseases; the other five-sixths of the sample was

probed (one-sixth each) about cardiovascular, genito-urinary,

musculoskeletal, digestive, and nervous system disorders.) Even those

individuals not receiving the respiratory probe could report the presence
-------
7-3

of CRD in open-ended questions earlier in the survey. However, those who

had a condition like asthma, and who forgot to volunteer that information

in the open-ended questions, would have the chance to report it if they

received the respiratory probe (where asthma is listed). They would not

have this opportunity if they received, say, the cardiovascular probe.

Because of this difference, it is of course possible that the reported

incidence of CRD might differ between the two groups. When we separated

the two groups, this is precisely what was found. The sample below

consists of 2,743 individuals who had lived in the same dwelling for at

least ten years at the time of the 1979 HIS.' In addition these were

individuals for whom complete data were available on the dependent and

independent variables of interest. Of the 2,743 individuals, 460 had

received the respiratory probe questionnaire while the remaining 2,283 had

been administered one of the other five probes. Of the 460 receiving the

respiratory probe, 67 (or 15 percent) reported the presence of a chronic

respiratory condition. Of the 2,283 not receiving the respiratory probe,

only 74 (or 3 percent) reported such a condition. Since the assignment of

the six probes was random, this suggests reporting differences that merit

separate investigation. This we do below.

The results of our limited reanalysis of the determinants of CRD are

presented below.' All models are estimated using logit techniques.

Equations (1) - (5) pertain to those receiving the respiratory probe while

(6) - (10) include only individuals not receiving that probe. In equation

(1), exposures to air pollution are characterized by the annual average

daily one-hour maximum ozone concentration at the nearest monitor (OZ79NR) ,
-------
7-4
Table 7-1 Regression Results
[Tabular entries for equations (1) - (10) are not legible in the source.]
-------
7-5
Table 7-1 (cont'd.) Regression Results
VARIABLE    DESCRIPTION

OZ79NR      Average daily maximum one-hour ozone concentration in
            1979 at monitor nearest individual's residence (in
            parts-per-million)

OZ79AV      Same as above but averaged over all monitors within 20
            miles of residence (in ppm)

OZMULT      Average hourly reading over all monitors within 20 miles
            and averaged over the period 1974-79 wherever data were
            available

S479NR      Average 24-hour reading for sulfates for 1979 at nearest
            monitor (in micrograms per cubic meter)

S479AV      Same as above but averaged over all monitors within 20
            miles

SP79NR      Average 24-hour reading for total suspended particulates
            for 1979 at nearest monitor (in µg/m³)

SP79AV      Same as above but averaged over all monitors within 20
            miles

SPMULT      Average 24-hour reading for all monitors within 20 miles
            and averaged over 1974-79 wherever data were available

RACE        Dummy variable (=1 if white, =0 if other)

SEX         Dummy variable (=1 if male, =0 otherwise (female,
            ambiguous, etc.))

INCOME      1979 household income in dollars

EDUCATION   Years of school completed

AGE         In years

AGE²        Square of above

PACKYRS     Lifetime cigarette consumption

CIGS/DAY    Number of cigarettes per day currently smoked
-------
7-6

and by the annual daily average sulfate concentration, again at the nearest

monitor (S479NR). (Recall that annual averages are used in explaining

chronic illness rather than the concentrations during the two-week recall

period. The latter are the appropriate measures in analyses of acute

illness like those in Chapters 4 and 6 above.)

According to equation (1), annual average ozone concentrations are

positively and significantly associated with the likelihood of reporting

CRD in the probe group. Neither sulfates nor any of the other independent

variables are related to CRD in a statistically significant way, including

the more sophisticated smoking variable PACKYRS. In equation (2), sulfates

are replaced by total suspended particulate matter (also measured at the

nearest monitor) with virtually no change in the results. In equation (3)

both ozone and particulates are averaged over all the monitors within

twenty miles of the respondent's home. This reduces both the magnitude and

the significance of the estimated ozone effect. The

particulate estimate changes sign (it is expected to be positive) but is

still far from being significant. The size and significance of the

coefficient estimates on the other regressors are unaffected by this change

in the characterization of exposure. Equation (4) replicates (3) but with

sulfates substituted for total suspended particulates. The results are

virtually identical to those in (3) with none of the regressors being

significantly associated with the likelihood of CRD.

In equation (5), ozone and particulates are measured by the multiyear

(1974-1979) annual average concentration (see Volume I, Chapter 2,

especially p. 2-37). This change makes a substantial difference in the
-------
7-7

size and significance of the estimated ozone effect. In addition, the

parameter estimate associated with particulates increases substantially in

significance, although it is still well below conventionally accepted

levels (t = 1.96 connotes significance at the 5 percent level). As in

equations (1) - (4), none of the other regressors, including either smoking

variable, is significantly associated with CRD.
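
For readers unfamiliar with the estimation technique, the following sketch shows how a logit model with one regressor can be fit by Newton-Raphson. It is a minimal illustration under stated assumptions, not the software used in the study: the data are hypothetical (ozone expressed in parts per hundred million, CRD as a 0/1 indicator), and `fit_logit` is a name introduced here.

```python
import math

def fit_logit(x, y, iters=25):
    """Fit P(y=1) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson; return (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p               # score with respect to the intercept
            g1 += (yi - p) * xi        # score with respect to the slope
            h00 += w                   # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det   # Newton step: information^-1 times score
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical data, not taken from the HIS sample.
ozone_pphm = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7]
crd = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

a_hat, b_hat = fit_logit(ozone_pphm, crd)
print(f"intercept = {a_hat:.3f}, ozone coefficient = {b_hat:.3f}")
```

A positive slope estimate corresponds to the kind of positive association between ozone and the likelihood of reporting CRD described for equation (1); significance would additionally require the standard error, which is the square root of the relevant diagonal element of the inverse information matrix.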

Equations (6) - (10) perform the same set of regressions as (1) - (5).

The difference is that the sample in the former consists of 2,283

individuals, none of whom received the respiratory probe as part of the

1979 HIS. Each of these individuals had the opportunity to report the

presence of a chronic respiratory disease in the open-ended part of the HIS

(and 74 did so), but they were not shown a list of CRDs and asked whether

they had any of them. As indicated above, only 3 percent of this group

reported CRD, as compared with 15 percent of the sample used in equations

(1) - (5).

The findings in (6) - (10) provide an interesting contrast to the

earlier ones. The ozone variable is never estimated to be significantly

associated with CRD in (6) - (10). However, the total suspended

particulates coefficient estimate is uniformly more significant in this

latter set of regressions. In fact, in equation (10) TSP is positively and

significantly (at the 5 percent level) associated with CRD. Sulfates

performed as weakly as in the earlier runs.

Of equal interest is the performance of other independent variables in

(6) - (10). For instance, income is negatively and significantly

associated with CRD in all five regressions. All other things equal,
-------
7-8

individuals having higher incomes are relatively less inclined to report

CRD. In addition, both cigarette smoking variables are significant.

PACKYRS, the measure of accumulated smoking history, is positively related

to the likelihood of CRD as one would expect. The sign of NCIGS is

negative, however, suggesting that current smokers are less likely to

experience CRD. One explanation for this seemingly counterintuitive

finding is that individuals who believe they have or have been diagnosed as

having CRD have in all likelihood quit smoking. If so, one would expect to

find only those free of CRD among individuals currently smoking.

Because our findings are quite sensitive to the choice of the "probe"

or "non-probe" samples, some discussion is required. It is our opinion that

the "probe" sample—that is, those who received questions about particular

respiratory diseases—is more likely to reflect accurately the incidence of

CRD in the United States. In fact, the National Center for Health

Statistics uses the results from the six different probes to make its

estimates of specific disease prevalence in the United States. On the

other hand, one must admit the possibility that at least some individuals

are motivated by the probe to report having some diseases of which they

have heard but for which they never received a professional diagnosis.

Concerning the poor performance of even the more sophisticated smoking

measures in (1) - (5), we intend to do additional work. One direction for

this work will be the disaggregation of the set of CRDs into

disease-specific analyses. For instance, it might be the case that smoking

(or other of the independent variables, for that matter) is related to the

incidence of emphysema but not to asthma or chronic sinusitis. By
-------
7-9

aggregating these different forms of CRD in the present analysis, we may be

obscuring disease-specific associations. This may also shed some light on

the role of ozone and other air pollutants in CRD.
-------
Chapter 8

ADDITIONAL SENSITIVITY ANALYSES

This chapter summarizes the results of some additional sensitivity

analyses conducted pursuant to a variety of comments and suggestions

received during the peer review phase of the project.

8.1 The Effects of Precipitation on Acute Health Status

It was suggested by several peer reviewers that the use of two-week

daily average precipitation (AVPRECIP) as a covariate in the acute health

status models was perhaps an inappropriate characterization of the threat

to health posed by precipitation. Rather, it was argued, a superior

characterization would account not only for the mean effects of

precipitation (as captured by AVPRECIP), but also for the variance

effects, i.e. the number of days during the two-week period on which

rainfall occurred. The hypothesis is that the same total amount of

precipitation during a two-week period (= 14*AVPRECIP) poses a different

risk to health (respiratory health, in particular) when spread out evenly

over the two-week period than when concentrated over a one or two day

span.

Our data enable the examination of such effects. The idea is to

construct a measure of precipitation that captures both the mean and the

variance effects. The measure we created to assess this question

(RAINDAY) is formulated as AVPRECIP divided by the average number of days

during the two-week period on which any precipitation occurred at all

(AVRAINYN). Thus, the measure can be construed as the average amount of
-------
8-2

precipitation occurring on the days when any precipitation occurred at

all. The measure is positively related to the mean effects, but

negatively related to the variance effects.
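
The construction of the measure can be sketched as follows. This is an illustration under assumptions rather than the code used in the study: AVRAINYN is taken here to be the share of days in the two-week period with any precipitation, the handling of completely dry periods is a convention chosen for the example, and the daily totals are hypothetical.

```python
def rain_measures(daily_precip):
    """Return (AVPRECIP, AVRAINYN, RAINDAY) for one two-week record of daily totals."""
    n = len(daily_precip)
    avprecip = sum(daily_precip) / n
    # Assumed definition: fraction of days on which any precipitation occurred.
    avrainyn = sum(1 for p in daily_precip if p > 0) / n
    # Assumed convention: RAINDAY is set to zero when no rain fell at all.
    rainday = avprecip / avrainyn if avrainyn > 0 else 0.0
    return avprecip, avrainyn, rainday

# Two hypothetical periods with the same total precipitation (1.4 units).
even = [0.1] * 14                    # light rain every day
burst = [0.7, 0.7] + [0.0] * 12      # all rain concentrated in two days

for label, record in (("even", even), ("burst", burst)):
    avprecip, avrainyn, rainday = rain_measures(record)
    print(f"{label}: AVPRECIP={avprecip:.3f} AVRAINYN={avrainyn:.3f} RAINDAY={rainday:.3f}")
```

Both records share the same AVPRECIP, but RAINDAY is larger for the concentrated record, which is the sense in which the measure rises with the mean and falls with the number of wet days.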

In order to assess the possible effects of substituting this new

measure as an explanatory variable, we examined the sample correlation of

the three measures:

            AVPRECIP   AVRAINYN   RAINDAY

AVPRECIP      1.000      0.533     0.790

AVRAINYN                 1.000     0.038

RAINDAY                            1.000

The extraordinarily high correlation between AVPRECIP and RAINDAY has led

us to conclude that the substitution of the latter measure for the former

in our acute health models would probably have little material influence

on the results. Thus, while the question of the appropriate

characterization of weather stress in statistical models of illness risks

is certainly an interesting one that merits additional study, it seems

reasonable to suggest that such additional effort in the present analysis

would probably not lead to additional clarification of the air pollution -

health effects relationships of primary interest.
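
A correlation matrix of this kind can be computed with a few lines of code. The sketch below uses hypothetical values for five respondents, not the study data, and the helper name `pearson` is introduced only for illustration.

```python
import math

def pearson(u, v):
    """Sample (Pearson) correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

# Hypothetical two-week summaries (not taken from the study data).
avprecip = [0.02, 0.10, 0.05, 0.30, 0.12]
avrainyn = [2 / 14, 5 / 14, 7 / 14, 6 / 14, 3 / 14]
rainday = [p / r for p, r in zip(avprecip, avrainyn)]

names = ["AVPRECIP", "AVRAINYN", "RAINDAY"]
series = [avprecip, avrainyn, rainday]
for name, row in zip(names, series):
    print(name, [round(pearson(row, col), 3) for col in series])
```

When two candidate regressors are as highly correlated as AVPRECIP and RAINDAY are reported to be, swapping one for the other would usually be expected to leave the other coefficient estimates largely unchanged.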

8.2 Sample Size, Model Specification, and Parameter Estimate Sensitivity

In many of the models estimated in Volume I, the point estimates of

the relationships between air pollution and illness varied across

specifications depending on what set of regressors was used. The addition

or deletion of regressors not only implied respecifications of the null
-------
8-3

hypotheses under test, but also typically necessitated different sample

sizes on which the estimation was performed. In most cases, the varying

sample sizes were attributable to the fact that the data availability for

the various air pollution measures differed by pollutant, so that when

different sets of pollution measures were tested, the sample sizes varied

accordingly.

Sample selection considerations aside, the effects of using these

different sample sizes should be manifested only in the efficiency

properties of the estimators. However, inferences about the relationship

between air pollution and illness outcomes depend on the sample and

specification used. Thus, an understanding of the cause of the variance

in parameter estimates seems essential.

We concentrate our analysis of this phenomenon on models (49) and

(50) estimated in Volume I. Here, it is noteworthy that the addition of

the covariates N2NR01 and CONR01 in (50) to the set of pollutants included

in specification (49) (i.e., O3NR01, S4NR01, and SPNR01) has at least two

important implications. First, the estimated coefficient associated with

O3NR01 falls from 1.87 to 1.41, and the associated t-statistic drops from

2.32 to 1.46. Second, owing to the relative paucity of CO data, the

estimation subsample in (50) is about 25 percent smaller than that used in

(49): 3,703 versus 4,899, respectively. Thus, one is necessarily led to

the question: Is the change in the estimated ozone coefficient and its

significance level due to the inclusion of the additional covariates, to

the smaller subsample, or to both?

To investigate this important question, we reestimated equation (49)

using the same sample of 3,703 on which specification (50) was estimated.

The results of this exercise are reported in Table 8-1 below. There it is
-------
Table 8-1

DEP VARIABLE: TRADRSP

SOURCE       DF    SUM OF SQUARES    MEAN SQUARE    F VALUE    PROB>F
MODEL        18         97.427923       5.412662      3.757    0.0001
ERROR      3684          5307.067       1.440572
C TOTAL    3702          5404.495

ROOT MSE     1.200239    R-SQUARE    0.0180
DEP MEAN     0.183365    ADJ R-SQ    0.0132
C.V.          654.563

                      PARAMETER        STANDARD      T FOR H0:
VARIABLE     DF        ESTIMATE           ERROR    PARAMETER=0    PROB > |T|
INTERCEP      1        0.767031        0.450348          1.703        0.0886
O3NR01        1     [illegible]        0.924612          1.920        0.0550
S4NR01        1     -0.00287839     0.004161826         -0.692        0.4892
SPNR01        1     -0.00032749    0.0006888529         -0.475        0.6345
RACEW1B0      1        0.105050        0.061582          1.706        0.0881
SEXM1F0       1        0.044511        0.044609          0.998        0.3184
MARY1N0       1        0.012869        0.046157          0.279        0.7804
INCOMCON      1    -.0000062991    .00000239667         -2.628        0.0086
FAT           1       -0.775180        0.316011         -2.453        0.0142
FATSQ         1        0.156010        0.061056          2.555        0.0107
AGE           1        0.010623     0.006666158          1.594        0.1111
AGESQ         1    -0.000111508    0.0000702869         -1.586        0.1127
SMOKY1N0      1        0.046446        0.042435          1.095        0.2738
EDCOMCON      1    -0.000489087     0.006831445         -0.072        0.9429
CHRLMDUM      1        0.256219        0.056659          4.522        0.0001
DMAXTEMP      1     -0.00179559     0.002027094         -0.886        0.3758
[illegible]   1     -0.00312125     0.001211361         -2.577        0.0100
AVPRECIP      1       -0.020593        0.217254         -0.095        0.9245
HUMIDRF       1     0.003097651     0.002611947          1.186        0.2357

[The O3NR01 estimate is illegible in the source; the text reports it as 1.77. The labels SEXM1F0 and EDCOMCON follow the ordering in Table 8-3. One of the two row labels after CHRLMDUM is illegible, so the placement of DMAXTEMP is uncertain.]
-------
8-4

seen that the estimated coefficient associated with O3NR01 is 1.77 with a

t-statistic of 1.92. This coefficient estimate is relatively close to the

1.87 estimated on the larger subsample, well within one-half of the

standard error of either estimate. It seems then that the difference

between the 1.77 value in the reestimated model (49) and the 1.41 value

estimated in model (50) should be largely attributed to the inclusion of

the two additional pollution covariates. That such a change results is of

little surprise given that the partial correlation between O3NR01 and

N2NR01 is large (0.281). In the presence of such high correlation, one

would expect that the separate influences of O3 and NO2 would be more

difficult to identify than would be the case if the two measures were

orthogonal. The results of this exercise are somewhat reassuring given

that samples of varying sizes were used in estimation throughout Volume I.
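
The comparison of the two estimates of model (49) can be made concrete with the figures quoted in the text. The sketch below is illustrative only; it backs out each standard error from the identity t = coefficient / standard error, so the results are rounded approximations rather than values taken from the printouts.

```python
# (coefficient, t-statistic) for O3NR01 as quoted in the text.
fits = {
    "model (49), n = 4,899": (1.87, 2.32),
    "model (49), n = 3,703": (1.77, 1.92),
    "model (50), n = 3,703": (1.41, 1.46),
}

# Standard error implied by each pair, since t = coefficient / standard error.
implied_se = {name: coef / t for name, (coef, t) in fits.items()}

# Gap between the two estimates of model (49), expressed in standard-error units.
gap = fits["model (49), n = 4,899"][0] - fits["model (49), n = 3,703"][0]
for name in ("model (49), n = 4,899", "model (49), n = 3,703"):
    se = implied_se[name]
    print(f"{name}: implied SE = {se:.3f}, gap = {gap / se:.2f} SE")
```

On either implied standard error the gap is a small fraction of one standard error, whereas the drop from 1.77 to 1.41 after adding N2NR01 and CONR01 is several times larger.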

In summary, on the basis of this (admittedly small-scale) exercise,

it seems fair to say that the effects of using different estimation

samples were indeed largely restricted to efficiency effects, and that the

dispersion of the estimates of the air pollution - illness relationship

should be attributed not to the different samples used, but rather, as

would be hoped, to the different specifications tested.
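
A rough check on the "efficiency only" interpretation is to ask whether the standard error of the ozone coefficient grows about as much as the loss of observations alone would imply. The sketch below assumes that the standard error scales with the inverse square root of the sample size, which holds only approximately and only if the distribution of the regressors is similar in the two samples.

```python
import math

se_full = 1.87 / 2.32          # implied SE of the ozone coefficient in (49), n = 4,899
n_full, n_sub = 4899, 3703

# If only efficiency changes, the SE should grow roughly like sqrt(n_full / n_sub).
se_predicted = se_full * math.sqrt(n_full / n_sub)
se_reported = 0.924612         # O3NR01 standard error printed in Table 8-1

print(f"predicted {se_predicted:.3f}, reported {se_reported:.3f}")
```

The predicted and reported values differ by well under one percent, which is in line with the conclusion drawn in the text.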

8.3 Poisson Regression Analysis of Volume I Models (48), (49), and (50)

In the later phases of our research, we have largely turned our

attention to estimation techniques which we believe better treat the

nature of our dependent variables than the methods utilized in the

large-scale analyses presented in Volume I. Insofar as the

restricted-activity-day measures of illness are concerned, the Poisson

regression technique (described in detail in Chapter 4) has been a
-------
8-5

preferred estimation method. While the large part of our analysis using

this methodology is presented in Chapter 4, it has been proposed by some

of our reviewers that for purposes of comparison we reestimate using

Poisson methods some of the specifications that were estimated by OLS in

Volume I. Three such reestimations are presented here.

We elect to concentrate this effort on the total respiratory

restricted activity days (TRADRSP) models whose OLS estimates were

presented as models (48)-(50) in Volume I. Recall that these models were

formulated on three different assumptions about which air pollutants

should be included as explanatory variables. Models (48)-(50) specified

the set of air pollution regressors as, respectively, {O3NR01, S4NR01},

{O3NR01, S4NR01, SPNR01}, and {O3NR01, S4NR01, SPNR01, N2NR01, CONR01}.

Due to availability of pollution data, the samples on which these

specifications were estimated had varying numbers of observations;

respectively, these were 4,906 (197); 4,899 (197); and 3,703 (154), where

the figures in parentheses are the number of observations having positive

TRADRSP realizations.

The results of the Poisson reestimations are presented in Tables 8-2

through 8-4. There it is seen that inferences drawn in Volume I about the

relationship between ozone and respiratory-related restricted activity

days are largely corroborated by the reanalysis. Specifically, in all

three specifications, the coefficient estimate associated with O3NR01 is

positive, and statistically different from zero. (Recall from Chapter 4,

however, that these significance levels are perhaps overstated. The

robust covariance estimation techniques used in Chapter 4 are not used in

this reanalysis, so that some caution should be exercised in interpreting

significance levels. However, recall also that the parameter estimates
-------
Table 8-2

*** NUMBER OF OBSERVATIONS ***

N OBS    N POS    N ZERO
 4906      197      4709

[The parameter estimates for this table are not legible in the source.]
-------
Table 8-3

*** NUMBER OF OBSERVATIONS ***

N OBS    N POS    N ZERO
 4899      197      4702

*** PARAMETER ESTIMATES ***

VARIABLE          BETA HAT         STD ERR        T STAT
INT               -1.95552     [illegible]      -3.06347
O3NR01             10.2733         1.44988        7.0856
S4NR01         -0.00626601      0.00728557     -0.860058
SPNR01         -0.00225262      0.00126152      -1.78563
RACEW1B0          0.669553        0.131209       5.10296
SEXM1F0           0.175727       0.0762988       2.30315
MARY1N0        [illegible]       0.0786125        -1.066
INCOMCON       -.000028456     0.000004451      -6.39313
FAT                -1.7037        0.340436      -5.00447
FATSQ             0.325624       0.0581044       5.60412
AGE              0.0544776       0.0120978        4.5031
AGESQ          -.000602245     0.000127149      -4.73653
SMOKY1N0          0.207066       0.0725259       2.85506
EDCOMCON        0.00911445       0.0118421      0.769665
CHRLMDUM           1.03795       0.0791518       13.1134
DMAXTEMP       -0.00682027      0.00302268      -2.25636
[illegible]     -0.0161773      0.00199425      -8.11196
AVPRECIP          0.702162        0.359582       1.95272
HUMIDRF          0.0148392      0.00460774        3.2205

[Entries marked [illegible] cannot be read in the source. Partly garbled digits have been restored where the relation T STAT = BETA HAT / STD ERR determines them. Labels for the last four rows follow the ordering in Table 8-1.]
-------
Table 8-4

*** NUMBER OF OBSERVATIONS ***

N OBS    N POS    N ZERO
 3703      154      3549

*** PARAMETER ESTIMATES ***

VARIABLE          BETA HAT         STD ERR        T STAT
INT            [illegible]        0.691047       -2.1826
O3NR01             8.45847     [illegible]       5.12837

[Entries for the remaining variables (S4NR01, SPNR01, RACEW1B0, SEXM1F0, MARY1N0, INCOMCON, FAT, SMOKY1N0, EDCOMCON, HUMIDRF, N2NR01, CONR01, and others) cannot be matched to rows in the source.]
-------
8-6

themselves should be consistent.) In addition to the ozone relationship,

the other estimated relationships are largely in line with those reported

in the original Volume I specifications estimated by OLS.

The upshot of this analysis, then, is that the inferences suggested

in Volume I seem substantiated, and while the magnitudes of the estimated

responses do differ (as would be expected with different estimation

techniques), the direction and general magnitudes of the estimates are

quite comparable.

8.4 Sensitivity to Aggregation Across Smoking and Chronic Illness Status

A common econometric problem occurs when disparate structures are

mistakenly assumed to be identical. When empirical analysis proceeds by

aggregating the disparate structures and estimating as if they were

identical, it will generally be the case that none of the structures will

be estimated consistently. It has been suggested that insofar as the

health outcome models estimated in Volume I are concerned, such

aggregation bias poses a potential problem when the structures of the

health outcome models are assumed to be the same across either smoking

status or chronic illness categories.
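
The aggregation problem can be seen in a deliberately simple example. The sketch below uses ordinary least squares on hypothetical data for two groups whose responses to exposure differ; it is not drawn from the study data, and `ols_slope` is a helper introduced for illustration.

```python
def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical exposure levels and illness outcomes for two groups.
x = [1.0, 2.0, 3.0, 4.0]
y_group_a = [1.0, 1.0, 1.0, 1.0]      # no response to exposure (slope 0)
y_group_b = [2.0, 4.0, 6.0, 8.0]      # strong response (slope 2)

slope_a = ols_slope(x, y_group_a)
slope_b = ols_slope(x, y_group_b)
slope_pooled = ols_slope(x + x, y_group_a + y_group_b)
print(slope_a, slope_b, slope_pooled)   # 0.0 2.0 1.0
```

The pooled slope describes neither group: it overstates the response of the first and understates that of the second, which is the concern raised about pooling across smoking status or chronic illness categories.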

In the present section, we undertake a reanalysis of some of the

specifications estimated in Volume I, considering the possibility that

individuals' illness responses to covariates are different depending on

whether they are never, former, or current smokers, and on whether they

are or are not plagued by a chronic respiratory condition.

The first analysis — that of differential responsiveness across

smoking status — uses Poisson regression analysis of the TRADRSP

dependent variable. The sample sizes used for the groups of never,
-------
8-7

former, and current smokers are, respectively, 1,439 (47); 565 (26); and

1,243 (47), where again the number of positive TRADRSP realizations is

given in parentheses. The set of air pollution regressors is limited in

this exercise to ozone and sulfates.

The results of this analysis are presented in Tables 8-5 through 8-7

in which both the Poisson ML covariance estimates and those obtained using

the robust methods discussed in Chapter 4 of this volume are presented.

These results reveal an interesting pattern of the relationship between

ozone and TRADRSP. While Table 8-5 shows the estimated relationship

between ozone and TRADRSP to be negative (though statistically

indistinguishable from zero) for never smokers, entirely different, and

somewhat surprising, inferences are drawn about the relationship between

ozone and acute respiratory illness for the groups of former and current

smokers. In Tables 8-6 and 8-7 it is seen that the estimated ozone effect

for both these groups is positive and statistically significant at

conventional levels even when the robust estimates of the parameter

standard errors are used. The magnitude of the response appears to be

largest for the group of former smokers, although the physiological

underpinnings of this phenomenon are not obvious. While we have not

tested statistically for whether the structures of the models for the

three groups are the same (using, e.g., a likelihood ratio test), the

results suggest that a reasonable conjecture is that such tests would

reject the hypothesis of homogeneity.
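
The gap between the Poisson ML standard errors and the robust ones can be illustrated with the simplest possible Poisson model, one containing only an intercept. The sketch below is an assumption-laden illustration: it uses the usual sandwich (Huber-White) form, which may differ in detail from the robust methods of Chapter 4, and the counts are hypothetical.

```python
import math

def poisson_intercept_ses(counts):
    """ML and sandwich standard errors for beta in the model E[y] = exp(beta)."""
    n = len(counts)
    mean = sum(counts) / n
    beta = math.log(mean)                            # ML estimate of the intercept
    info = n * mean                                  # information: sum of fitted means
    meat = sum((y - mean) ** 2 for y in counts)      # sum of squared score contributions
    se_ml = math.sqrt(1.0 / info)
    se_robust = math.sqrt(meat) / info               # sqrt(info^-1 * meat * info^-1)
    return beta, se_ml, se_robust

# Hypothetical two-week counts of restricted-activity days: mostly zeros, a few long spells.
days = [0] * 18 + [3, 14]
beta, se_ml, se_robust = poisson_intercept_ses(days)
print(f"beta = {beta:.3f}, ML SE = {se_ml:.3f}, robust SE = {se_robust:.3f}")
```

When the counts are more dispersed than the Poisson assumption allows, as with data dominated by zeros and a few large values, the robust standard error exceeds the ML one, so t-statistics based on the ML covariance matrix tend to overstate significance.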

In the second analysis, we use OLS to assess the possibility that the

structures of the TRADRSP models differ depending on whether an individual

has a chronic respiratory illness. The analysis is somewhat hampered

because only a small number of individuals (364) in this estimation sample
-------
Table 8-5

*** NUMBER OF OBSERVATIONS ***

N OBS    N POS    N ZERO
 1439       47      1392

*** PARAMETER ESTIMATES ***

VARIABLE          BETA HAT         STD ERR         T STAT
INT               -2.11645        0.561939       -3.76634
O3NR01            -1.72325         4.12275      -0.417984
S4NR01          -0.0921191        0.018894       -4.87558
RACEW1B0            1.4318        0.327568        4.37101
SEXM1F0          0.0116624        0.162797      0.0716374
INCOMCON       -.000043179     .0000086888       -4.96956
AGE            -0.00137636      0.00382972       -0.35939
EDCOMCON          0.011561       0.0210916       0.548135
CHRLMDUM           1.03309        0.171375        6.02825
AVMAXTMP       -9.8738E-06      0.00524898    -0.00188108
AVPRECIP          0.811043        0.720308        1.12597

**** PARM. ESTS. (ROBUST VARIANCE ESTIMATES) ****

                                    ROBUST
VARIABLE          BETA HAT         STD ERR         T STAT
INT               -2.11645         1.94121       -1.09027
O3NR01            -1.72325          6.3755      -0.270292
S4NR01          -0.0921191       0.0417871       -2.20449
RACEW1B0            1.4318        0.518293        2.76253
SEXM1F0          0.0116624        0.462579      0.0252116
INCOMCON       -.000043179     .0000256016       -1.68659
AGE            -0.00137636       0.0085561      -0.160863
EDCOMCON          0.011561       0.0603543       0.191552
CHRLMDUM           1.03309        0.511635        2.01919
AVMAXTMP       -9.8738E-06       0.0133843    -.000737712
AVPRECIP          0.811043         1.50329        0.53951
-------
Table 8-6

*** NUMBER OF OBSERVATIONS ***

N OBS    N POS    N ZERO
  665       26       639

*** PARAMETER ESTIMATES ***

VARIABLE          BETA HAT         STD ERR         T STAT
INT               -29.1589          400480     -.00007281
O3NR01             16.6943         3.50114        4.76824
S4NR01           0.0321111       0.0103115    [illegible]
RACEW1B0           29.4917          400480    .0000736409
SEXM1F0          -0.643355        0.186764       -3.44474
INCOMCON       .0000026724     .0000115801       0.230776
AGE             0.00746927      0.00629732         1.1861
EDCOMCON        -0.0547761       0.0333252       -1.64368
CHRLMDUM          0.462081          0.2063        2.23986
AVMAXTMP        -0.0388972      0.00613555       -6.33965
AVPRECIP            -2.342         1.17849       -1.98729

**** PARM. ESTS. (ROBUST VARIANCE ESTIMATES) ****

                                    ROBUST
VARIABLE          BETA HAT         STD ERR         T STAT
INT               -29.1589         1.70479       -17.1041
O3NR01             16.6943         6.96203         2.3979
S4NR01           0.0321111       0.0250393        1.28243
RACEW1B0           29.4917        0.410894        71.7745
SEXM1F0          -0.643355        0.499653        -1.2876
INCOMCON       .0000026724     .0000240709       0.111023
AGE             0.00746927       0.0133358       0.560093
EDCOMCON        -0.0547761       0.0810222      -0.676062
CHRLMDUM          0.462081        0.659609       0.700538
AVMAXTMP        -0.0388972       0.0133654       -2.91028
AVPRECIP            -2.342         2.59748      -0.901644

[The entry marked [illegible] cannot be read in the source. Partly garbled digits have been restored where the relation T STAT = BETA HAT / STD ERR determines them.]
-------
Table 8-7

*** NUMBER OF OBSERVATIONS ***

N OBS    N POS    N ZERO
 1243       47      1196

*** PARAMETER ESTIMATES ***

VARIABLE          BETA HAT         STD ERR         T STAT
INT               -2.45614        0.486641       -5.04713
O3NR01             9.16519         3.29159        2.78443
S4NR01          -0.0037685       0.0133045      -0.283251
RACEW1B0          0.896108        0.249973        3.58481
SEXM1F0           0.317721        0.139093        2.28424
INCOMCON       -.000046108     0.000008284       -5.56598
AGE            -.000042728      0.00468037    -0.00912914
NCIGSDYN          0.015498      0.00432861        3.58035
EDCOMCON          0.049784       0.0265047        1.87831
CHRLMDUM          0.670802        0.165151        4.06175
AVMAXTMP        -0.0247194      0.00446028       -5.54213
AVPRECIP           4.33699        0.614482        7.05796

**** PARM. ESTS. (ROBUST VARIANCE ESTIMATES) ****

                                    ROBUST
VARIABLE          BETA HAT         STD ERR         T STAT
INT               -2.45614         1.61305       -1.52267
O3NR01             9.16519         4.26883        2.14701
S4NR01          -0.0037685       0.0218712      -0.172304
RACEW1B0          0.896108        0.637155        1.40642
SEXM1F0           0.317721        0.462594       0.686826
INCOMCON       -.000046108     .0000262119       -1.75907
AGE            -.000042728       0.0105212     -0.0040611
NCIGSDYN          0.015498       0.0157747       0.982459
EDCOMCON          0.049784       0.0669773       0.743297
CHRLMDUM          0.670802        0.513254        1.30696
AVMAXTMP        -0.0247194       0.0118029       -2.09435
AVPRECIP           4.33699         2.69697        1.60809

[Row labels missing in the source follow the ordering in Tables 8-5 and 8-6. Partly garbled digits have been restored where the relation T STAT = BETA HAT / STD ERR determines them.]
-------
8-8

report chronic respiratory conditions. The results are presented in

Tables 8-8 and 8-9, where it is seen that the estimated magnitudes of the

ozone effects are dramatically different in the two instances. Note

carefully, however, that the means of the dependent variable for the two

samples differ by an order of magnitude (0.11 for the sample having no

chronic respiratory illness, 1.04 for the sample reporting some chronic

respiratory illness). On the basis of this phenomenon, it appears that

homogeneity of the two groups can be rejected without any additional

analysis solely on the ground that the outcomes are far too disparate to

believe that the expected values could possibly be the same. For example,

a simple t-test of homogeneity of means would surely reject the null

hypothesis in this instance.
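
The t-test mentioned here can be approximated from the summary statistics printed in Tables 8-8 and 8-9. The sketch below is an illustration under assumptions: it uses the unequal-variance (Welch) form of the statistic, recovers each sample variance from the corrected total sum of squares, and takes the sample size of the group without chronic respiratory illness to be the total degrees of freedom plus one.

```python
import math

# Summary statistics read from Tables 8-8 and 8-9.
mean_no, n_no = 0.106030, 11 + 4515 + 1     # no chronic respiratory illness
ss_total_no = 2941.105                      # corrected total sum of squares, Table 8-8
mean_crd, n_crd = 1.038462, 364             # chronic respiratory illness
ss_total_crd = 229.693 + 3319.766           # model + error sums of squares, Table 8-9

var_no = ss_total_no / (n_no - 1)           # sample variance of TRADRSP, first group
var_crd = ss_total_crd / (n_crd - 1)        # sample variance of TRADRSP, second group

t_stat = (mean_crd - mean_no) / math.sqrt(var_crd / n_crd + var_no / n_no)
print(f"t = {t_stat:.2f}")
```

The resulting statistic is far above 1.96, consistent with the statement that equality of the two means would be rejected.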
-------
Table 8-8

DEP VARIABLE: TRADRSP

SOURCE       DF    SUM OF SQUARES    MEAN SQUARE    F VALUE    PROB>F
MODEL        11         12.553244       1.141204      1.759    0.0551
ERROR      4515          2928.552       0.648627
C TOTAL    4526          2941.105

ROOT MSE     0.805374    R-SQUARE    0.0043
DEP MEAN     0.106030    ADJ R-SQ    0.0018
C.V.         759.5683

                      PARAMETER        STANDARD      T FOR H0:
VARIABLE     DF        ESTIMATE           ERROR    PARAMETER=0    PROB > |T|
INTERCEP      1        0.278607        0.085010          3.277        0.0011
O3NR01        1        0.564490        0.564345          1.000        0.3172
S4NR01        1     -0.00182198     0.002222703         -0.820        0.4124
RACEW1B0      1        0.042292        0.037615          1.124        0.2609
SEXM1F0       1       -0.030077        0.024943         -1.206        0.2279
INCOMCON      1    -.0000022889    .00000138313         -1.655        0.0980
AGE           1     -.000475199    0.0007031875         -0.676        0.4992
NCIGSDYN      1     0.001486407     0.001009186          1.473        0.1409
EDCOMCON      1    -0.000753311     0.004108444         -0.183        0.8545
AVMAXTMP      1     -0.00252467     0.000838691         -3.010        0.0026
AVPRECIP      1        0.082526        0.122702          0.673        0.5013
FORMER        1        0.044112        0.032747          1.347        0.1780

[The labels SEXM1F0 and AVPRECIP are missing in the source and follow the ordering in Tables 8-7 and 8-9.]
-------
Table 8-9

DEP VARIABLE: TRADRSP

SOURCE       DF    SUM OF SQUARES    MEAN SQUARE    F VALUE    PROB>F
MODEL        11           229.693      20.881194      2.214    0.0139
ERROR       352          3319.766       9.431160
C TOTAL     363          3549.462

ROOT MSE     3.071019    R-SQUARE    0.0647
DEP MEAN     1.038462    ADJ R-SQ    0.0355
C.V.         295.7278

                      PARAMETER        STANDARD      T FOR H0:
VARIABLE     DF        ESTIMATE           ERROR    PARAMETER=0    PROB > |T|
INTERCEP      1        1.256410        1.264699          0.993        0.3212
O3NR01        1        9.363780        8.108099          1.199        0.2469
S4NR01        1       -0.036888        0.030704         -1.201        0.2304
RACEW1B0      1        0.707993        0.499220          1.418        0.1570
SEXM1F0       1        0.923020        0.351846          2.623        0.0091
INCOMCON      1    -.0000263253    .00001916124         -1.374        0.1704
AGE           1        0.011920      0.00989562          1.205        0.2292
NCIGSDYN      1        0.016938        0.013916          1.224        0.2219
EDCOMCON      1      -0.0037767        0.060918         -0.062        0.9506
AVMAXTMP      1       -0.026317        0.011300         -2.329        0.0204
AVPRECIP      1        2.926834        1.699772          1.722        0.0860
FORMER        1       -0.464974        0.439989         -1.067        0.2869

[The labels SEXM1F0, AVMAXTMP, and FORMER are missing in the source and follow the ordering in Table 8-8. Several digits in the source printout are doubtful; the O3NR01, NCIGSDYN, and FORMER rows are reproduced as printed even though their estimates, standard errors, and t-statistics do not agree exactly.]
-------
TECHNICAL REPORT DATA
(Please read Instructions on the reverse before completing)

1. REPORT NO.: EPA-450/5-85-005c
2.
3. RECIPIENT'S ACCESSION NO.:
4. TITLE AND SUBTITLE: Ambient Ozone and Human Health: An Epidemiological Analysis, Volume III
5. REPORT DATE: 1 June 1985 (Date of Preparation)
6. PERFORMING ORGANIZATION CODE:
7. AUTHOR(S): Paul R. Portney and John Mullahy
8. PERFORMING ORGANIZATION REPORT NO.:
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Resources for the Future, 1616 P Street N.W., Washington, DC 20036
10. PROGRAM ELEMENT NO.: 12A2A
11. CONTRACT/GRANT NO.: 68-02-3583
12. SPONSORING AGENCY NAME AND ADDRESS: U.S. Environmental Protection Agency, Office of Air Quality Planning and Standards (MD-12), Research Triangle Park, NC 27711
13. TYPE OF REPORT AND PERIOD COVERED: Final Report
14. SPONSORING AGENCY CODE: OAQPS
15. SUPPLEMENTARY NOTES: Project Officer: Thomas G. Walton
16. ABSTRACT: This report is the third volume of an analysis of the relationship between ozone and human health benefits.
17. KEY WORDS AND DOCUMENT ANALYSIS
    a. DESCRIPTORS: Benefit Analysis; Air Pollution, O3; Epidemiology
    b. IDENTIFIERS/OPEN ENDED TERMS:
    c. COSATI Field/Group:
18. DISTRIBUTION STATEMENT: Release Unlimited
19. SECURITY CLASS (This Report): Unclassified
20. SECURITY CLASS (This page): Unclassified
21. NO. OF PAGES: 226
22. PRICE:

EPA Form 2220-1 (Rev. 4-77) PREVIOUS EDITION IS OBSOLETE
-------