United States       Office of Air Quality       EPA-450/5-85-005c
Environmental Protection   Planning and Standards      August 1985
Agency         Research Triangle Park NC 27711
Air
Ambient Ozone And
Human Health:
An  Epidemiological
Analysis

Volume III

-------
                      AMBIENT OZONE AND HUMAN HEALTH:

                        AN EPIDEMIOLOGICAL ANALYSIS
                     Paul R. Portney and John Mullahy
                         Resources for the Future
                            1616 P Street, N.W.
                          Washington, D.C.  20036
                                Volume III
                               Final Report
                                 June 1985
Submitted to the Economic Analysis Branch, Office of Air Quality Planning
and Standards, Environmental Protection Agency, Research Triangle Park,
North Carolina 27711, under contract number 68-02-3583.

-------
                                 DISCLAIMER
     This report has been reviewed by the Office of Air Quality Planning



and Standards, U. S. Environmental  Protection  Agency, and approved for



publication as received from Resources for the Future.   The analysis and



conclusions presented in this report are those of the authors and should



not  be interpreted  as necessarily reflecting  the official  policies of



the U. S. Environmental Protection Agency.

-------
                             TABLE OF CONTENTS
                                                                 Page
CHAPTER 1.  INTRODUCTION                                          1-1
CHAPTER 2.  ECONOMETRIC ESTIMATION OF HEALTH STATUS MODELS

   2.1   Introduction                                             2-1
   2.2   Some Problems with Least-Squares Estimation of Health
          Status Models                                           2-3
   2.3   Tobit Health Outcome Models                              2-7
   2.4   Cragg-class Health Outcome Models                        2-11
   2.5   Truncated-Normal Estimation                              2-15
   2.6   Heckman's Approach:  Sample Selection                    2-19
   2.7   Tobin, Cragg and Heckman:  A Digression                  2-21
   2.8   Poisson-distributed Health Outcome Measures              2-30
   2.9   Geometric-distributed Health Outcome Measures            2-33
   2.10  Multinomial-distributed Health Outcome Measures          2-35
   2.11  Estimation of Grouped Data Models Under the
          Normality Assumption                                    2-38
   2.12  Summary and Conclusions                                  2-40

CHAPTER 3.  AIR POLLUTION MONITORS AND INDIVIDUAL EXPOSURE        3-1

CHAPTER 4.  URBAN AIR QUALITY AND ACUTE RESPIRATORY ILLNESS

   4.1   Introduction                                             4-1
   4.2   Framework for the Analysis                               4-3
   4.3   Model Specification                                      4-8
   4.4   Empirical Results                                        4-11
   4.5   Policy Implications                                      4-20
         Appendix                                                 4-28

CHAPTER 5.  CONSTRUCTING A LIFETIME SMOKING PROFILE USING THE
            1979 HEALTH INTERVIEW SURVEY                          5-1

CHAPTER 6.  CIGARETTE SMOKING, AIR POLLUTION, AND RESPIRATORY
            ILLNESS:  AN ANALYSIS

   6.1   Introduction                                             6-1
   6.2   Smoking, Pollution, and Acute Illness                    6-2
   6.3   Data and Estimation Strategy                             6-5
   6.4   Estimates of Model Parameters and Relative Risks         6-14
         Appendix                                                 6-23
CHAPTER 7.  CHRONIC RESPIRATORY DISEASE                           7-1

-------
                                                                  Page

CHAPTER 8.  ADDITIONAL SENSITIVITY ANALYSES

   8.1   The Effects of Precipitation on Acute Health Status      8-1
   8.2   Sample Size, Model Specification, and Parameter
           Estimate Sensitivity                                   8-2
   8.3   Poisson Regression Analysis of Volume I
           Models (48), (49), and (50)                            8-4
   8.4   Sensitivity to Aggregation Across Smoking and Chronic
           Illness Status                                         8-6

-------
                               Chapter  1
                             INTRODUCTION
     Volume I of this report presents a great many results from our basic




analysis of ozone and acute and  chronic illness.  As indicated in Volume I,




we had to  make  a number of decisions along the way in the early stages of




our research.   One of the most important  concerned the tradeoff between the




breadth of our  analysis  as  opposed  to  the possible in-depth investigation




of a relatively small number of hypotheses.   In other words, should we use




fairly  standard   statistical   techniques   to  investigate  dose-response




relationships for a broad range of possible illnesses, using a variety of




explanatory variables, and in a number of different population groups?  Or




should  we  winnow  out  a relatively few "promising"  relationships  using




preliminary tests, and  then  allocate time and  computing  resources  to the



application of more powerful  statistical  techniques to these relationships?




     With some  exceptions we  adopted the former approach.   Because of our




unique  and   comprehensive   data  on  air  pollution  concentrations  and



individuals' health and socioeconomic status, we elected to test a wide




variety of  hypotheses  about  the  possible relationships  between  ozone and




other air  pollutants  on the one hand,  and a  variety  of  acute  and chronic




illnesses  on  the  other.   In  addition, we  examined separately  several

-------
                                  1-2






different degrees of  severity  for the acute  illnesses  we examined and we



also conducted separate analyses  for  adults  and for children (aged 17 and



below).  Of course,  as we point out in Volume I, we did conduct additional



sensitivity analyses where our  preliminary  research  suggested statistically



significant  associations  between ozone  and  the  dependent  variable  in



question.  Nevertheless,  the general approach was a  "broad brush" one.



     Since  completing the  work  reported  in Volumes  I  and  II,   we  have



received  many helpful  comments  on  and   constructive  criticisms  of  the



approach  we took in our analyses.   Many of  these  comments  came  in  an



EPA-sponsored public Peer Review Meeting held in Raleigh (N.C.) on April 3,



1984.  There, experts from the epidemiological, clinical, biostatistical,



and economic  communities  presented us with a  number of useful suggestions



for further work.   In addition, we have received many useful comments from



our EPA  project  officers  and from our colleagues at RFF  and elsewhere who



have  read  with  interest our  original  work.    Finally, we  have given



considerable thought ourselves  to  ways in which  the  original analysis might



be extended or improved.



     Thus, over the past  year we  have tried to  conduct additional  analyses



that address some of the most important questions arising out of our original



work.  Volume III below presents  the results of some of that work.  We say



"some"  because  we  are  continuing  to conduct  additional epidemiological



analyses  as  time and resources permit,  at least some  of  which  may not be



complete  until  after this  report has been submitted.    In one  sense,  in



fact,  we hope to never  be  "done" with  our work even  though  this report



completes our analysis for EPA.

-------
                                  1-3





     In one way or  another,  each  of  the following chapters is designed to



address one  or more  of  the  questions  raised in  our  earlier work.   For



instance,  Chapter  2 is purely methodological.   It presents  a  variety of



different   estimation  techniques   that  may  be   appropriate   when  the



assumptions that lie  behind  ordinary  least squares (OLS)  are violated as



several careful readers of  the studies in Volume I suggested they might be.



There we consider the sorts of problems that arise with OLS in the special



context of  health  effects  estimation.   Among the alternatives  to  OLS we



consider    are   Tobit    estimation,    Cragg-type    "hurdles"    models,



sample-selection and count-data models, multinomial logit  approaches, and



grouped  dependent   variable   techniques.    This   chapter   is  a  long  and



technical  one, we  realize.    However,  we feel  it is necessary to  set the



stage for  the  empirical  work presented in  later  chapters;  it should also



prove  useful  to anyone  about to embark for  the first time on  his own



estimation  of   air pollution (or  other  environmentally-related)  health



effects.



     Chapter 3  is much shorter and simpler.  It addresses a common reaction



to our  original  analysis.    Remember  that  in Volume I, the  air  pollution



readings we  assign to each  individual  are those measured at  the  monitor



nearest his or her home, provided that the monitor in question is no more



than twenty miles away (sometimes less).  We continue to believe this is



preferable to the most common alternative approach suggested in the



literature: matching each individual in an SMSA to the air pollution



concentrations  averaged  over all the  monitors in  the  SMSA,  or within  a



subset of  it.  However, because most individuals  do travel  about  within an

-------
                                  1-4






area, it is possible that the area-wide averaging approach might better



characterize the exposures of at least some individuals.  If so, these



averaged  concentrations  would   be   the  appropriate  ones   to  use  in



epidemiological analyses.   Hence,  it  is  of  interest to  know how closely



correlated  are  the readings  at  the  monitor(s)  nearest  the  individuals'



dwellings with the average of  all  the monitors within a given radius of the



dwelling.  This analysis is  undertaken  in  Chapter 3.
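
As a rough illustration of the two matching rules being compared, the following Python sketch assigns an individual either the reading at the nearest monitor (subject to a distance cutoff) or the average reading over all monitors within a given radius.  The monitor coordinates, the readings, the planar distance calculation, and the function names are all hypothetical; none of them is taken from the air pollution data bases used in this report.

```python
import math

def distance(a, b):
    """Straight-line distance between two (x, y) points, in miles (planar grid assumed)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def nearest_monitor_reading(home, monitors, max_miles=20.0):
    """Reading at the monitor closest to `home`, or None if none lies within max_miles."""
    best = min(monitors, key=lambda m: distance(home, m["loc"]), default=None)
    if best is None or distance(home, best["loc"]) > max_miles:
        return None
    return best["reading"]

def radius_average_reading(home, monitors, radius_miles=20.0):
    """Mean reading over all monitors within radius_miles of `home`, or None if there are none."""
    nearby = [m["reading"] for m in monitors if distance(home, m["loc"]) <= radius_miles]
    return sum(nearby) / len(nearby) if nearby else None

# Hypothetical monitors: locations in miles on a local grid, ozone readings in ppm.
monitors = [
    {"loc": (0.0, 3.0), "reading": 0.040},
    {"loc": (8.0, 6.0), "reading": 0.060},
    {"loc": (30.0, 0.0), "reading": 0.100},
]
home = (0.0, 0.0)

nearest = nearest_monitor_reading(home, monitors)    # uses only the monitor 3 miles away
averaged = radius_average_reading(home, monitors)    # averages the two monitors within 20 miles
```

In this made-up case the two rules give different exposure values for the same dwelling, which is the kind of discrepancy a correlation analysis of the two measures would quantify.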



     That exercise in turn forms the basis for some sensitivity analysis we



conduct  in Chapter 4 of the  effect  on our  findings of different rules about



matching air  pollution concentrations  to individuals.  Chapter  4 extends



and  improves  upon our original work in  a number of other  ways,  as  well.



For  instance,  building on the methodology presented in Chapter  2 of this



volume,  in Chapter  4  we investigate the determinants  of  acute respiratory



disease using Poisson regression instead of the OLS and logit techniques



employed in Volume I.  For reasons presented in Chapters 2 and 4, we



consider this to be a  significant improvement on our earlier analysis.  In



addition, Chapter 4 presents a more sophisticated analysis of  the possible



non-linearities  that   may  characterize  the  dose-response  relationship



linking   acute  respiratory   disease   to   ambient   ozone   and  sulfate



concentrations.     Not   only   do  we   consider   spline-type  functional



relationships, but  we also  allow for  a  variety  of  non-linearities  within



the (already non-linear) Poisson approach.  This, too, sheds additional



light  on  the  analysis  in Volume  I.    Finally,  we  believe  that  the



elasticity-of-response calculations contained in the last  part  of Chapter 4



are  a useful  way  to view the possible effects  of changes  in ambient ozone

-------
                                  1-5





concentrations on human  health.   This suggests how  our  findings  might be



used in applied policy analysis  if it were desired to do so.



     One of the respects  in which  the analysis in Volume I could clearly be



improved concerns the  measures  of cigarette smoking we  employed.   Recall



that in most of the models estimated, we used NCIGS, a continuous measure



of  daily  cigarette  consumption, or SMOKY1NO, a  dummy  variable indicating



whether or not  an individual  is a never- or former  smoker  as opposed to a



current smoker.  We  also occasionally used  an  additional  dummy variable,



FORMER, to distinguish between those who  do  not smoke now but once did from



those who never smoked.



     However,  even this additional treatment  resulted in our finding a less



pronounced relationship between smoking and  ill health than  we might have



expected (although we  hasten  to point  out  that  even our  crude measures of



smoking  were  often   positively  and   significantly  associated  with  ill



health).  One reason  for  this  was  our inability in the Volume I analyses to



make  use  of  all  the data  provided  in the  HIS  Smoking  Supplement  on



individuals'  lifetime smoking histories. Thus, one of our  purposes  in the



analyses we have conducted since April 1984 was to develop a measure of



lifetime  smoking behavior  and employ it  in our  analyses.   Chapter  5



presents the approach we took in  doing so.   While the  HIS  smoking data do



not enable us to specify an exact profile of respondents' lifetime smoking



habits,  they  do permit  the  construction  of several  plausible profiles.



These are discussed in some detail in Chapter 5.  Among other things, that

-------
                                  1-6






chapter discusses the differing weights that might  be  given to cigarettes



smoked years ago compared to recent  cigarette consumption.



     Chapter 6 presents  the results  of additional empirical analysis of the



relationship  between  air  pollution   (ozone   and  sulfates)   and  acute



respiratory disease. It  extends the analysis in Chapter  4 of this volume,



and  all  the  work  in Volume  I,  in several important  ways.    First,  the



analysis  in Chapter  6   incorporates  the  more sophisticated measures  of



individual  smoking.   For  instance, in  addition to  NCIGS,  a  measure  of



current smoking  habit,  the  analysis also  includes the  variable  PACKS,  a



proxy  for  lifetime  cigarette  consumption.    The  analysis  in Chapter  6



extends our earlier research  in another suggested  direction.   That is,  it



models the  individuals'   health  outcomes as  a multinomial logit process  in



which, on  any given day during the two-week recall period, an individual



could  report  no restriction  of activity  at  all,  a minor  restriction  in



activity attributable to respiratory illness (with no bed confinement),  or



what  we  refer to as  a   "severe" respiratory restriction—i.e.,  one which



requires confinement to bed for at least half the day.  For reasons spelled



out  in Chapter  6,  we feel this  is another  productive  way to model  the



possible  relationship  between ambient air quality and  acute  respiratory



disease.   (The chapter   also  contains  a very  brief  discussion of ordered



logit as an estimation approach.)



     Chapter  6  is  intended to  accomplish  one  additional objective.   The



comments  on our  work in Volume I  often  expressed surprise that cigarette



smoking  did  not  completely   "swamp"   ambient  ozone  pollution  in  its



contribution to acute (and chronic)  illness—even though we  generally found

-------
                                  1-7





a positive and significant association between smoking and illness.   Thus,



one purpose of the analysis in Chapter 6 is to explore in somewhat greater



detail the  relative  risks posed  by  cigarette smoking and  air  pollution.



This is important since considerable public resources are currently devoted



to  reducing both.   While far  from  being  comprehensive on  the  subject,



Chapter 6 does explore these relative  risks in some detail.



     As  the  preceding pages  suggest,  most  of  the emphasis in  Volume  III



falls  on the  possible relationship  between  ambient ozone (and  sulfate)



concentrations and  acute  respiratory health.    This  reflects  the  heavy



emphasis  given acute  health  effects  in  our  earlier work as  described in



Volume I.  However, we did devote some attention in Volume I to possible



relationships  between  long-term  exposures   to  air  pollution  and  the



prevalence of chronic  respiratory and  other kinds of  disease.



     Chapter 7 below  presents  the results of some preliminary reanalysis of



those findings, specifically those dealing with chronic respiratory disease.



The analysis  below  extends our  original  work in several  important  ways.



First, we restrict our attention  in Chapter 7 to a group of individuals who



at  the  time of  the  1979  HIS had lived in their present location for at



least ten years.  This is a more "residentially stable" group than that



analyzed in Volume I, an important consideration in the epidemiological



investigation of  chronic illness.  In  addition, the individuals analyzed in



Chapter 7 are divided into two distinct  groups depending on whether or not



they received a special supplement  (or  "probe") on  respiratory  disease as



part of the 1979 HIS.  Because the reported incidence of chronic



respiratory disease  varies by a  factor  of six  between  those  who  received

-------
                                  1-8






the probe and those who did not, we felt the two groups should be analyzed



separately rather than pooled as  in our  original  analysis.   Finally,  this



reanalysis includes some model specifications in which ozone is measured by



the ambient concentration averaged over all the monitors within ten or twenty



miles of each resident's dwelling.



     Finally, in Chapter 8 we report our responses to a variety of comments



or queries on Volumes I and II.  None of these required the preparation of



a separate chapter, but each was important enough to merit consideration.



     One final note about Volume III.  Several of the chapters have been



written to serve more than one  purpose.   For instance,  a slightly revised



version of Chapter 4 will  be appearing in the Journal of Urban Economics in



1986 under the title "Urban Air Quality and Acute Respiratory Illness."



Similarly, the material in Chapter 6 formed the basis of a paper presented



at the 1984 annual meetings of the American Economic Association in Dallas.



While  we  have  modified  them   for  incorporation  into  Volume  III,  some



material—particularly the brief descriptions of  the HIS and air pollution



data bases—will occasionally appear repetitive.

-------
                                Chapter 2



              ECONOMETRIC ESTIMATION  OF HEALTH STATUS MODELS








2.1  INTRODUCTION



     In the last decade or so, estimation of microeconomic models of



individual behavior using large individual- or household-level data sets



has  flourished and  proven  an  important  advance  in   applied economics.



Details typically masked in  aggregate  time-series  data analysis are often



available in individual  cross-sectional data, thus enabling the testing of



hypotheses about responses of individuals to  changes in  constraints.



     In such micro datasets  one is prone  to find measures that economists



would characterize either as  corner-solution realizations  of instantaneous



optimizing decisions or as discrete representations of  such decisions.  An



example of the former  case would  be where  one has data  on  the number of



hours an individual worked in the market over a given year, and for some



subset of individuals no market hours were worked.



     An instance  of  the  latter case is  where data are available only on



whether or not  an  individual  had  purchased  some  consumer durable over the



previous twelve months, but  not on the  amount of the expenditure.  Assuming



such statistical models to be the  objectives  of estimation, then the former



is an example  of  what  have  come to be known as limited dependent variable



(LDV) models,  while the  latter  is a member  of  the class of qualitative



dependent variable (QDV) models.   Tobin's pioneering 1957  paper on durables



demand is the forerunner of  LDV estimation in economics.   Using data on 735



households,  Tobin modeled the ratio of durables expenditures to disposable



income;  for 183 of these spending units,  no durables were  purchased during

-------
                                   2-2
the time period of interest and a "corner solution" had to be treated.  As



is well  known,  the  solution to this problem was the  genesis  of  the Tobit



estimator,  which will  be  discussed below.  Note that if Tobin only had data



on whether or not there was some durable purchased rather than on the



actual amount, a QDV model  (such as  binary probit or logit) would have been



the appropriate approach.
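
The distinction between the two kinds of data can be made concrete with a small Python sketch.  It assumes, purely for illustration, a latent-variable representation of the kind developed later in this chapter; the latent values and function names are hypothetical.

```python
def observe_ldv(latent):
    """Limited dependent variable: amounts are observed, but censored from below at zero."""
    return [max(0.0, v) for v in latent]

def observe_qdv(latent):
    """Qualitative dependent variable: only whether a positive amount occurred is observed."""
    return [1 if v > 0.0 else 0 for v in latent]

# Hypothetical latent values (e.g., desired durables spending).
latent = [-1.2, 0.0, 0.7, 2.5]

ldv = observe_ldv(latent)   # the kind of data for which a Tobit-type model is a candidate
qdv = observe_qdv(latent)   # the kind of data for which binary probit or logit is a candidate
```

The same underlying values thus lead to different observed data, and hence to different estimators, depending on what the survey actually records.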



     In  this  chapter  we discuss the theory  and practice of  econometric



estimation of LDV and QDV models as they pertain to health status measures



such   as   respiratory-related   restricted   activity   days,    or   the



presence-absence of a chronic  respiratory  condition.   It  is  seen that,



owing  to the  nature  of the available  micro  data,  standard  econometric



techniques  such  as  ordinary  least  squares  (OLS)   will  typically  be



inappropriate  tools  for the  analysis  of  the  relationships between air



pollution and human health.  The available data on health status measures,



rather, are generally of a nature best described as qualitative or limited



dependent  variables.   This being  the  case,  more  complicated  estimation



techniques are in general required in order to obtain consistent estimates



of the parameters governing the health status outcomes.  Maximum likelihood



is the estimation method most commonly used  in such analysis.



     The treatment  here  is necessarily  brief.   However,  several excellent



surveys are available  for the reader who  wishes more detailed treatments of



the topics to be discussed  below. The 1981  and 1984 surveys by Amemiya are



excellent overviews of  qualitative  and  limited dependent variable models,



respectively, and the 1983  monograph by Maddala provides  broad coverage in



both  these  areas.    The  often-cited  1981  volume  edited  by Manski  and



McFadden is also an excellent  survey of  topics  in qualitative and limited



dependent variable estimation.

-------
                                   2-3
     Some definitional preliminaries are appropriate here.  First, standard



practice  is followed,  with  random   variables  represented  in  upper-case



notation, their  realizations  in lower-case.   Second,  the  terms "censored



distribution" and "truncated  distribution"  will  be  used with considerable



frequency below.  The introduction to chapter 6 of Maddala (1983) provides



a good heuristic explanation of  censoring and truncation as they pertain to



the normal econometric  model.



     The plan for the  remainder  of  this  chapter  is as follows.   First, we



briefly  assess  problems associated  with  least squares estimation  of  air



pollution-health status models.  Then we turn to a discussion of some



techniques  that might  be   considered more or less  appropriate for  the



estimation  problems attendant   to  estimation  of  health  status  models.



Following  this  we   turn  to  a  discussion  of  prediction  based  on  the



estimation of the various models.  A  summary concludes the chapter.








2.2 SOME PROBLEMS WITH  LEAST-SQUARES  ESTIMATION OF HEALTH STATUS MODELS



     As  mentioned   above,   this  chapter  surveys   various  econometric



techniques  for  estimating   health   outcome models.    As  will  be  seen



throughout,   these  techniques  are   generally  such  that  iterative  (and



sometimes costly)  maximum  likelihood methods  are required in order  to



obtain consistent and efficient  estimates of the models' parameters.  Since



sound econometric policy  analysis depends  at  least in part on obtaining



consistent,   if  not   efficient,  parameter  estimates,  the question  is then



begged:   why is  it   necessary to utilize  such complicated  and  expensive



methods  when simple  and inexpensive least-squares algorithms  abound?  In a



nutshell, the answer is that least-squares  estimates of models of the genre



we are considering will generally be  biased and inconsistent.  The purpose

-------
                                    2-4
of this section is  to  briefly demonstrate why this is so.   To this end, a
brief exposition of the fundamentals of the basic linear econometric model
is presented, the requirements for consistent estimation of the parameters
are explained, and why at least some of these requirements are unlikely to
be met  in the health  status  models to  be considered is  discussed.   The
exposition of the linear model  and its properties follows that of Schmidt
(1976), which is among the most lucid in published texts.
     Of fundamental concern is consistent and, if possible, efficient
estimation of the parameter vector $\beta$ in the case where random variables $Y_i$
are distributed in some manner with finite mean $\mu_i$ and finite variance $\sigma^2$.
Specifying $\mu_i = X_i\beta$ makes the problem nontrivial, with $X_i$ a $1 \times k$ vector of
independent variables which will in general include measures of air
pollution and other covariates, and $\beta$ a $k \times 1$ vector of unknown parameters to
be estimated.  Given these assumptions, we can write

     $Y_i = X_i\beta + \varepsilon_i$,                                          (1)

where, because $E(Y_i) = X_i\beta$ and $\mathrm{Var}(Y_i) = \sigma^2$, $\varepsilon_i$ has mean zero and variance
$\sigma^2$.  The unobserved realizations of $\varepsilon_i$ correspond to the observed
realizations of $Y_i$, $y_i$.  It is assumed that there exist $T$ independent
observations on $(y_i, X_i)$.
     The model  described satisfies full  ideal  conditions (Schmidt,  p. 2)
when
     i)  X is a nonstochastic matrix of  rank k
-------
                                    2-5
         matrix of independent variables, y will henceforth denote the $T \times 1$



         vector of the realizations $y_i$.



     It can be demonstrated that, with or without the assumption of
normality for $\varepsilon$, the OLS estimator of $\beta$, $\hat\beta = (X'X)^{-1}X'y$, is consistent:

     i)  $\hat\beta = (X'X)^{-1}X'y$
                 $= (X'X)^{-1}X'(X\beta + \varepsilon)$
                 $= \beta + (X'X)^{-1}X'\varepsilon$
         $E(\hat\beta) = \beta + (X'X)^{-1}X'E(\varepsilon)$
                 $= \beta + 0$
                 $= \beta$, so that $\hat\beta$ is unbiased for $\beta$;

     ii) The covariance matrix of $\hat\beta$ is $\sigma^2(X'X)^{-1}$, so that, with all limits
         taken for $T \to \infty$,
         $\lim \sigma^2(X'X)^{-1} = \sigma^2 \lim (X'X)^{-1}$
                       $= \sigma^2 \lim (X'X/T)^{-1}T^{-1}$
                       $= \sigma^2 \lim (Q^{-1}T^{-1})$
                       $= 0$,
         because from above $Q$ is finite nonsingular so that its inverse
         exists and is finite, and $\sigma^2$ is finite;

    iii) Therefore, since $\hat\beta$ is unbiased and its covariance matrix vanishes
         in the limit, then $\hat\beta$ is consistent.
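
The consistency argument can be illustrated numerically.  The Python sketch below is only an illustration under assumed values: a single regressor plus an intercept, true coefficients of 1.0 and 0.5, normally distributed errors with standard deviation 2.0, and a fixed random seed.  None of these values comes from the report's data.

```python
import random

def ols_simple(x, y):
    """Closed-form OLS for y = b0 + b1*x + e; returns (b0_hat, b1_hat)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def simulate(T, beta0=1.0, beta1=0.5, sigma=2.0, seed=0):
    """Draw T observations from the linear model and return the OLS estimates."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(T)]
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
    return ols_simple(x, y)

# Estimates for increasing sample sizes; sampling error shrinks as T grows.
estimates = {T: simulate(T) for T in (50, 500, 20000)}
```

With the zero-mean error assumption intact, the estimates for the largest sample should lie close to the assumed true values, in line with steps i) through iii).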



     Because  of  its   computational  ease,  least squares  is  obviously  an



appealing tool  for model  estimation.   The  analyst must assess whether any



or all of the above conditions  fail  to characterize the data or model under



consideration to see  if least squares  maintains its consistency properties.



Should least squares  prove inconsistent, alternative,  and generally more



costly,  methods of  estimation  must  be  utilized   in   order  to  obtain



consistent estimates of $\beta$.

-------
                                    2-6
     As discussed in detail below, a very general characterization of
quantitative health outcomes measures is that they are data bounded from
below by zero, i.e., data realized only in nonnegative quantities.  Of
specific concern here are measures like "amount of time spent ill."  Such
measures are generally modeled econometrically as the censored or truncated
counterparts of normally-distributed latent random variables $Y_i^*$ having
$E(Y_i^*) = X_i\beta$, $\mathrm{Var}(Y_i^*) = \sigma^2$.  However, if the realizations of $Y_i^*$ are
censored from below at zero, we have

     $E(Y_i) = \Phi_i X_i\beta + \sigma\phi_i$,                                 (2)

where $\phi_i$ and $\Phi_i$ are the standard normal density and distribution functions
evaluated at $(X_i\beta/\sigma)$.  In the truncated case, where $\Pr(y_i > 0) = 1$,

     $E(Y_i \mid y_i > 0) = X_i\beta + \sigma\phi_i/\Phi_i$.                    (3)

     When defined in terms of these expectations, the problems inherent in
least squares estimation become apparent.  Since $E(\sigma\phi_i/\Phi_i) \neq 0$, then
$E(\varepsilon_i) \neq 0$ when $\varepsilon_i$ is defined as the difference between either $E(Y_i)$ or
$E(Y_i \mid y_i > 0)$ and $X_i\beta$ in (2).  Thus least squares regression of y on X will
yield inconsistent estimates of $\beta$, given that the null error expectation
assumption has been violated.  Heckman (1976) is a good general discussion
of such problems.
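
The size of the departure from the linear mean can be checked directly.  The following Python sketch evaluates the standard expressions for the mean of a normal variable censored at zero and for the mean of a normal variable truncated at zero, and compares them with simulated averages.  The index value of 0.5, the unit standard deviation, the number of draws, and the seed are assumptions chosen only for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def censored_mean(xb, sigma):
    """E(Y) for Y = max(0, Y*), with Y* ~ N(xb, sigma^2): Phi*xb + sigma*phi."""
    z = xb / sigma
    return STD_NORMAL.cdf(z) * xb + sigma * STD_NORMAL.pdf(z)

def truncated_mean(xb, sigma):
    """E(Y* | Y* > 0), with Y* ~ N(xb, sigma^2): xb + sigma*phi/Phi."""
    z = xb / sigma
    return xb + sigma * STD_NORMAL.pdf(z) / STD_NORMAL.cdf(z)

def monte_carlo(xb, sigma, draws=200_000, seed=1):
    """Simulated censored mean and truncated mean from the same latent draws."""
    rng = random.Random(seed)
    latent = [rng.gauss(xb, sigma) for _ in range(draws)]
    censored = [max(0.0, v) for v in latent]
    positive = [v for v in latent if v > 0.0]
    return sum(censored) / draws, sum(positive) / len(positive)

xb, sigma = 0.5, 1.0  # assumed values of the linear index and error scale
analytic = (censored_mean(xb, sigma), truncated_mean(xb, sigma))
simulated = monte_carlo(xb, sigma)
```

Under these assumed values both means exceed the linear index of 0.5, so a regression that treats the index as the conditional mean is fitting the wrong function.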


     Not all  measures  of  interest in  our analysis are  cast  in terms of


normally-distributed, partially-observed random variables.  In the other

-------
                                   2-7
cases we shall investigate, there are yet different characteristics of the


data or  the assumed  statistical  distributions that  render  least squares


inappropriate given the objective of consistent parameter estimation.  For


example,  least-squares   estimation  strategy  is   generally  completely


inappropriate when outcomes are qualitative since no objective function of


interest can be cast in terms of linear expectations functions like those


above.     We now   turn  to  an  assessment  of  various  approaches  to  the


estimation of health status models.





2.3 TOBIT HEALTH OUTCOME  MODELS


     A  logical  starting  point  is  the basic Tobit  model.    The  nature of


several of  the  health status  measures of  interest  in the  micro data sets


being analyzed in this study is such that Tobit estimation would seem, at


least at first blush, to be a sensible approach.  (See Ostro (1983) for an


application of Tobit to a similar  problem.)


     Tobit estimation has  been utilized in a  variety of areas in applied


microeconomics,  ranging  from   labor  supply (see  the excellent  survey by


Killingsworth (1983)), to health economics (Ostro (1983)), to commodity


demands or expenditures  (Tobin  (1957),  Pitt  (1983)),  and many others (see


Amemiya (1984)  for  an extensive bibliography).   The basic idea underlying


Tobit estimation is that one posits the existence of (latent) random


variables $Y_i^*$ that are independently, normally distributed (NID) $(X_i\beta, \sigma^2)$.  In


many interpretations of the Tobit model, the $Y_i^*$ are stochastic indicators


of intensity of desire for undertaking some activity.  Owing to the nature


of the activity, however, some realizations of the $Y_i^*$ are censored while


for  the  others,   the   intensities  are  mapped   directly   into  actual


undertakings of the activity.    Some  threshold,  in  effect,  is crossed such

that the activities are actually undertaken.  For example, the fundamental

idea behind Tobin's seminal paper is that the $Y_i^*$ represent intensities of


desire  to  purchase  consumer  durables.    When  certain  (assumed  known)


thresholds are crossed,  these  intensities  become  actual purchases.  In most


applied  areas,  the  thresholds   are   zero,   so   that  the  mappings  from


intensities into undertaken activities  can  be looked at  as occurring when

the realizations of the $Y_i^*$ occur in the interior of commodity space.
Otherwise, corner solutions obtain (for one discussion of estimation in the
Kuhn-Tucker/corner-solution/Tobit context, see Wales and Woodland (1983)).


     Assuming,  then,  that  the  thresholds  are  known and constant  across


individuals, the basic Tobit model can be  described by (4):


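     $$Y_i^* \sim \mathrm{NID}(X_i\beta, \sigma^2)$$
     $$y_i = \max(C, y_i^*). \qquad (4)$$

Setting $C = 0$ gives the model we shall discuss in the sequel.  Letting $\Omega_0$
signify the index set for observations for which $\max(0, y_i^*) = 0$, and $\Omega_1$ be
the index set for observations for which $\max(0, y_i^*) > 0$, then the
likelihood function for the Tobit model described here is

     $$L = \prod_{i\in\Omega_0}\left[1 - \Phi\!\left(\frac{X_i\beta}{\sigma}\right)\right] \prod_{i\in\Omega_1} \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - X_i\beta}{\sigma}\right). \qquad (5)$$

In log form (5) is

     $$\ell = \sum_{i\in\Omega_0}\ln(1-\Phi_i) - |\Omega_1|\ln\sigma + \sum_{i\in\Omega_1}\ln\phi_i, \qquad (6)$$

where $\Phi_i = \Phi(X_i\beta/\sigma)$ and $\phi_i = \phi((y_i - X_i\beta)/\sigma)$ as in (5), where
$|\cdot|$ denotes cardinality, and where terms not involving $(\beta, \sigma)$ are
dropped.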
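
The zero-threshold Tobit log-likelihood can be sketched in a few lines of Python.  This is an illustration only, not the estimation code used in this study: the function name `tobit_loglik`, the list-of-rows data layout, and the sample values are assumptions made for the example, and the additive constant is retained rather than dropped.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: .pdf is phi, .cdf is Phi


def tobit_loglik(beta, sigma, X, y):
    """Zero-threshold Tobit log-likelihood, additive constant included.

    X is a list of regressor rows; y holds the outcomes (0 when censored).
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        xb = sum(b * x for b, x in zip(beta, x_i))
        if y_i > 0.0:
            # uncensored: ln[(1/sigma) * phi((y_i - x_i'beta) / sigma)]
            total += math.log(STD_NORMAL.pdf((y_i - xb) / sigma)) - math.log(sigma)
        else:
            # censored: ln[1 - Phi(x_i'beta / sigma)] = ln Phi(-x_i'beta / sigma)
            total += math.log(STD_NORMAL.cdf(-xb / sigma))
    return total


# Hypothetical data: an intercept and one regressor, two censored outcomes.
X = [[1.0, 0.2], [1.0, 1.5], [1.0, -0.7], [1.0, 2.1]]
y = [0.0, 1.8, 0.0, 2.4]
print(tobit_loglik([0.5, 1.0], 1.0, X, y))
```

In practice this function would be handed to a numerical optimizer; none is shown here.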


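     The first-order conditions for maximizing $\ell$ are the $(k + 1)$ equations

     $$\frac{\partial \ell}{\partial \beta} = \sum_{i\in\Omega_0}\left(-\frac{\lambda_i}{\sigma}\right)X_i' + \sum_{i\in\Omega_1}\frac{(y_i - X_i\beta)\,X_i'}{\sigma^2} = 0$$
                                                                   (7)
     $$\frac{\partial \ell}{\partial \sigma} = \sum_{i\in\Omega_0}\frac{\lambda_i\,(X_i\beta)}{\sigma^2} + \sum_{i\in\Omega_1}\frac{(y_i - X_i\beta)^2 - \sigma^2}{\sigma^3} = 0,$$

where $\lambda_i = \phi(X_i\beta/\sigma)/(1-\Phi(X_i\beta/\sigma))$.  Using terms in these equations, the method of
Berndt-Hall-Hall-Hausman (1974), among others, can be used for optimization,
and statistical inference is based on the asymptotic t-tests generated by
utilizing $\left[\sum_{i=1}^{N}\ell_i\ell_i'\right]^{-1}$ as the estimate of $\mathrm{cov}(\hat\beta, \hat\sigma)$ ($\ell_i$ is the i-th term of
$[(\partial\ell/\partial\beta)', \partial\ell/\partial\sigma]'$).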
     Several characteristics of the Tobit model are noteworthy.  First, as

Amemiya (1984) points out, the likelihood function (5) can be rewritten as

     $$L = \left[\prod_{i\in\Omega_0}(1-\Phi_i)\prod_{i\in\Omega_1}\Phi_i\right]\left[\prod_{i\in\Omega_1}\frac{\phi_i}{\Phi_i\,\sigma}\right], \qquad (8)$$

with $\Phi_i = \Phi(X_i\beta/\sigma)$ and $\phi_i = \phi((y_i - X_i\beta)/\sigma)$.
Written in this form, the likelihood function of the Tobit model can be
viewed as the product of the likelihood functions of a probit model with
parameter vector $\alpha = (\beta/\sigma)$ (first brackets) and a truncated-at-zero normal
distribution with parameters $(\beta, \sigma)$ and $E(Y_i) = X_i\beta + \sigma\,\phi(X_i\beta/\sigma)/\Phi(X_i\beta/\sigma)$ (second
brackets).  As such, separate maximization subject to the restrictions that
the probit parameter vector be a positive scalar multiple (specifically
$1/\sigma$) of the parameter vector of the truncated normal model yields the Tobit

model.   The probit  component can,  of course,  be  viewed  as  the model of

whether  or  not the  threshold   is   crossed,  while  the  truncated  normal

component  models  the conditional   phenomenon  of  the magnitude  of  the

activity given that the activity  is  undertaken.

     It  is  certainly reasonable  to  consider  the  possibility that  the

parameter restrictions described in the preceding paragraph are in fact

invalid.   This  would indicate,   therefore,  that the  model  of threshold

crossing  is  not  as  intimately  related  to  the  conditional model  of  the



magnitude of the undertaken activity as is implied by the Tobit model.  In



the  context  of health  outcomes,  this could mean that  the phenomenon of



whether some illness  occurs  is governed by  a  set of parameters different



than that  determining the amount,  duration,  or severity of  the illness,



given that some illness occurs.  We discuss such issues in greater detail
later in the chapter.



     Another characteristic of  the Tobit model  that merits discussion is



the fact  that  the  parameters  estimated under the assumptions of the Tobit



model are in general nonrobust to departures from many of the underlying
assumptions.  That is, violation in the data of some of the properties
implied when the likelihood function is written in the form (5) will lead
to inconsistent estimates of the parameters $(\beta, \sigma)$.  This phenomenon, which



is not  uncommon in many types of  models  that are  estimated  by means of



maximum likelihood, stands  in contrast to more familiar formulations such



as OLS and nonlinear least squares where, in spite of a variety of



departures from  the  assumed  ideal  structure of  the  error  terms,  one can



still obtain consistent estimates of the  structural  parameters.



     Two of the most often discussed violations that  bode dire  consequences



for Tobit parameter estimates are violations of the NID assumption:  first,



that the  error variances are  nonconstant across observations,  and second,



that the error structure, though perhaps  homoscedastic,  is nonnormal.  Note



that normal, homoscedastic errors are implied when writing the likelihood



function  in  the  form (5).  The results  of several  studies, summarized by



Amemiya  (1984),  suggest that  under  either type  of  departure,  the maximum



likelihood Tobit parameter estimates are  inconsistent.

2.4 CRAGG-CLASS HEALTH OUTCOME MODELS



     In a  1971  paper,  Cragg proposed a set  of  models for situations that



can be  depicted as follows.   An economic agent  makes  two  (simultaneous)



decisions.  A  dichotomous  decision  is made  about  whether or not to engage



in  some  activity.   Conditional  on an affirmative  for  this  decision,  a



decision  is  made  regarding how  much  of  the  activity  to  pursue.   The



activities can be  construed  in  the broadest  of  terms:   expenditures,



quantities demanded or supplied,  or  the  amount of  time spent  in ill health.



Such models have come to be known as hurdle models; that is, conditional
on some hurdle being crossed, a decision is made about some magnitude of



interest.   Although these  processes  might in some cases seem logically to



be ordered in  a temporal manner, the statistical properties of  the model



abstract  from   any  temporal considerations,  the  quantity  decision being



described in terms of  conditional  densities.



     Cragg proposed several models.   However, because of the nature of the



present study,  only two members  of  this  set will concern  us  here, these



being the  formulations wherein  the quantity or  second-stage  decision is



defined only on the positive reals.   This is in obvious reference to ideas



like "given that  an individual had  some illness, how much  time  was spent



ill."    Although  Cragg's  other  formulations  are also  interesting,  their



discussion is  omitted  for economy of  space.



     For  notational  ease,  we  will  assume  that  the  same  vector  of



independent variables, $X_i$, influences both the first- and second-stage



decisions.  This is a  completely  innocuous assumption, however, as elements



of parameter vectors  can  be restricted equal to  zero to accommodate more



general  cases.    Regardless  of the   specification of the second-stage  or



conditional decision, the first-stage is described by a binary
probit model, i.e. the existence of latent random variables
$Y_{i1}^* \sim N(X_i\beta_1, \sigma_1^2)$ is posited.  Only the signs of the realizations are
recorded, however, and are codified according to

     $$y_{i1} = \begin{cases} 1, & y_{i1}^* \ge 0 \\ 0, & y_{i1}^* < 0. \end{cases} \qquad (9)$$

Because of this codification scheme, there is no information about the
scale of the random variables $Y_{i1}^*$ (i.e. the mappings of $y_{i1}^*$ into $y_{i1}$ are
unaffected by transformations of $Y_{i1}^*$ of the form $\theta Y_{i1}^*$ for $\theta > 0$).
Therefore, some normalization is required, the most common being $\sigma_1 = 1$.
This formulation gives rise to Cragg's formulation of the hurdle-crossing
model, where, with obvious change from Cragg's notation, we specify

     $$\Pr(y_{i1} = 1) = \Phi(X_i\beta_1) \qquad (10)$$
     $$\Pr(y_{i1} = 0) = \Phi(-X_i\beta_1),$$

where $\Phi$ is the standard normal distribution function (Cragg uses $C(\cdot)$ for
$\Phi(\cdot)$).

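     For strictly positive second-stage quantity realizations, Cragg
proposes two alternative formulations.  Both are based on the specification
of the conditional densities for random variables $Y_{i2}$ given that the
activity is in fact undertaken.

     The first formulation is one where the conditional density for the
realizations of the $Y_{i2}$ is truncated-normal, with the truncation point at
zero.  Thus we have

     $$f(y_{i2} \mid y_{i1} = 1) = \begin{cases} \dfrac{1}{\sigma}\,\phi\!\left(\dfrac{y_{i2} - X_i\beta_2}{\sigma}\right) \Big/ \Phi\!\left(\dfrac{X_i\beta_2}{\sigma}\right), & y_{i2} > 0 \\ 0, & \text{else}, \end{cases} \qquad (11)$$

where $\phi$ and $\Phi$ are as defined earlier.  With obvious notational change from
Cragg's article, the (unconditional) likelihood of the positive
realizations can be written as

     $$f(y_{i2}) = f(y_{i2} \mid y_{i1} = 1)\,\Pr(y_{i1} = 1) = \frac{1}{\sigma}\,\phi\!\left(\frac{y_{i2} - X_i\beta_2}{\sigma}\right)\frac{\Phi(X_i\beta_1)}{\Phi(X_i\beta_2/\sigma)}, \quad y_{i2} > 0. \qquad (12)$$

Therefore, the likelihood function of Cragg's first model is

     $$L = \prod_{i\in\Omega_0}\Phi(-X_i\beta_1)\prod_{i\in\Omega_1}\frac{1}{\sigma}\,\phi\!\left(\frac{y_{i2} - X_i\beta_2}{\sigma}\right)\frac{\Phi(X_i\beta_1)}{\Phi(X_i\beta_2/\sigma)}. \qquad (13)$$

In log form,

     $$\ell = \sum_{i\in\Omega_0}\ln\Phi(-X_i\beta_1) + \sum_{i\in\Omega_1}\left[\ln\phi\!\left(\frac{y_{i2} - X_i\beta_2}{\sigma}\right) + \ln\Phi(X_i\beta_1) - \ln\sigma - \ln\Phi\!\left(\frac{X_i\beta_2}{\sigma}\right)\right]. \qquad (14)$$

In the form (14), it is straightforward to see that maximization of $\ell$ is fully
equivalent to the two-stage maximization problem:

     1) Probit estimation of the parameter vector $\beta_1$ via maximization of

        $$\sum_{i\in\Omega_0}\ln\Phi(-X_i\beta_1) + \sum_{i\in\Omega_1}\ln\Phi(X_i\beta_1); \qquad (15)$$

     2) Truncated-normal estimation of the parameters $(\beta_2, \sigma)$ via maximization
        of

        $$\sum_{i\in\Omega_1}\left[\ln\phi\!\left(\frac{y_{i2} - X_i\beta_2}{\sigma}\right) - \ln\sigma - \ln\Phi\!\left(\frac{X_i\beta_2}{\sigma}\right)\right]. \qquad (16)$$

Because of the complexity of the log likelihood (14), estimation in this
two-stage fashion is likely to be somewhat easier than attempting to maximize
(14) with respect to the $(2k+1)$ parameters $(\beta_1, \beta_2, \sigma)$.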
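
The separability of the log likelihood of Cragg's first model into a probit part and a truncated-normal part can be sketched as follows.  The function names and the sample data are hypothetical, chosen only for illustration.  The function returns the two parts separately: the first depends only on the first-stage parameters, the second only on the second-stage parameters and the scale.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: .pdf is phi, .cdf is Phi


def dot(coef, row):
    """Inner product of a coefficient list and a regressor row."""
    return sum(c * x for c, x in zip(coef, row))


def cragg_tn_loglik(beta1, beta2, sigma, X, y):
    """Return (probit part, truncated-normal part) of the log likelihood."""
    probit_part = 0.0
    trunc_part = 0.0
    for x_i, y_i in zip(X, y):
        xb1 = dot(beta1, x_i)
        if y_i > 0.0:
            xb2 = dot(beta2, x_i)
            z = (y_i - xb2) / sigma
            # hurdle crossed: ln Phi(x_i'beta1)
            probit_part += math.log(STD_NORMAL.cdf(xb1))
            # magnitude given crossing: truncated-at-zero normal density
            trunc_part += (math.log(STD_NORMAL.pdf(z)) - math.log(sigma)
                           - math.log(STD_NORMAL.cdf(xb2 / sigma)))
        else:
            # hurdle not crossed: ln Phi(-x_i'beta1)
            probit_part += math.log(STD_NORMAL.cdf(-xb1))
    return probit_part, trunc_part


# Hypothetical data: an intercept and one regressor, two zero outcomes.
X = [[1.0, 0.2], [1.0, 1.5], [1.0, -0.7], [1.0, 2.1]]
y = [0.0, 1.8, 0.0, 2.4]
beta2, sigma = [0.5, 1.0], 2.0
beta1_restricted = [b / sigma for b in beta2]  # the Tobit restriction
probit_part, trunc_part = cragg_tn_loglik(beta1_restricted, beta2, sigma, X, y)
print(probit_part, trunc_part, probit_part + trunc_part)
```

When the first-stage vector is restricted to equal the second-stage vector divided by the scale, as in the last lines, the two parts sum to the Tobit log likelihood (constant included).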

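     Cragg's second formulation again depends on the probit first-stage model,
but the conditional density of the positive realizations is respecified.
Instead of assuming that the conditional density of the positive realizations
of $Y_{i2}$ is truncated-normal, the model is now formulated such that the
logarithms of the $y_{i2}$ are normal, i.e. conditional on $y_{i2} > 0$, $\log(y_{i2}) \sim
N(X_i\beta_2, \sigma^2)$.  The conditional density for the $i \in \Omega_1$ is

     $$f(y_{i2} \mid y_{i1} = 1) = (y_{i2}\,\sigma)^{-1}\,\phi\!\left(\frac{\log(y_{i2}) - X_i\beta_2}{\sigma}\right), \qquad (17)$$

where the term $(y_{i2})^{-1}$ is the Jacobian of the transformation from $y_{i2}$ to
$\log(y_{i2})$.  Therefore, the likelihood for the $i \in \Omega_1$, which is Cragg's equation
(11), is

     $$f(y_{i2}) = f(y_{i2} \mid y_{i1} = 1)\,\Pr(y_{i1} = 1) = (y_{i2}\,\sigma)^{-1}\,\phi\!\left(\frac{\log(y_{i2}) - X_i\beta_2}{\sigma}\right)\Phi(X_i\beta_1). \qquad (18)$$

The likelihood function for the entire sample is

     $$L = \prod_{i\in\Omega_0}\Phi(-X_i\beta_1)\prod_{i\in\Omega_1}(y_{i2}\,\sigma)^{-1}\,\phi\!\left(\frac{\log(y_{i2}) - X_i\beta_2}{\sigma}\right)\Phi(X_i\beta_1). \qquad (19)$$

In log form,

     $$\ell = \sum_{i\in\Omega_0}\ln\Phi(-X_i\beta_1) + \sum_{i\in\Omega_1}\left[\ln\Phi(X_i\beta_1) + \ln\phi\!\left(\frac{\log(y_{i2}) - X_i\beta_2}{\sigma}\right) - \ln y_{i2} - \ln\sigma\right]. \qquad (20)$$
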

As in Cragg's first model, the second model can be estimated in two stages:

     1)  Probit estimation of $\beta_1$ as above;

     2)  OLS estimation of $(\beta_2, \sigma)$ using the log transform of the $y_{i2}$ as
         dependent variables and $X_i$ as the independent variables.  This is
         perhaps surprising, but results because the terms in (20) involving
         $(\beta_2, \sigma)$ are identical to those of the likelihood function of the
         familiar normal linear model.
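
The second stage of this procedure is an ordinary regression of the logged positive outcomes on the regressors.  A minimal sketch follows, restricted for brevity to an intercept and a single regressor; the function name `lognormal_second_stage` and the data are hypothetical.  The scale estimate divides the residual sum of squares by n, which is the maximum-likelihood convention rather than the degrees-of-freedom-corrected one.

```python
import math


def lognormal_second_stage(x, y):
    """OLS of log(y) on an intercept and one regressor, positive y only.

    Returns (intercept, slope, sigma), where sigma is the maximum-likelihood
    estimate: sqrt(residual sum of squares / n), not divided by n - 2.
    """
    pairs = [(x_i, math.log(y_i)) for x_i, y_i in zip(x, y) if y_i > 0.0]
    n = len(pairs)
    mean_x = sum(p[0] for p in pairs) / n
    mean_z = sum(p[1] for p in pairs) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in pairs)
    sxz = sum((p[0] - mean_x) * (p[1] - mean_z) for p in pairs)
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    rss = sum((p[1] - intercept - slope * p[0]) ** 2 for p in pairs)
    return intercept, slope, math.sqrt(rss / n)


# Hypothetical data: zeros are the "no illness" cases and are skipped.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 1.2, 0.0, 3.5, 4.1, 9.8]
print(lognormal_second_stage(x, y))
```

The zero outcomes enter only the first-stage probit, which is not shown here.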


     Because of the simplicity of  this  two-stage approach, estimation in such a


framework is obviously appealing.  Duan et al. (1983) have proposed the
second Cragg model to estimate medical expenditures: individuals either have or
do not have medical expenses, and given that they have medical expenses, the
conditional density of the expenditures is lognormal, $\log(Y_{i2}) \sim N(X_i\beta_2, \sigma^2)$.
-------
                                  2-16
truncated-from-below distribution where the point of truncation is constant



across observations and is assumed to be zero.   The results easily generalize,



however, and for  a discussion of the statistical  properties  of  the truncated



normal distribution in the most  general case, the reader is referred to Johnson



and Kotz (1970,  pp. 81-87).



     It should  be  noted  that interest  in  the  truncated normal  should  not  be



confined to the role  it plays  in the Cragg model.   The distribution is indeed



useful in many empirical situations.  Hurd  (1979) notes that







     (e)stimation based on only positive y's comes about very



     naturally in a number of kinds  of studies.  For example, in many



     labor supply  studies one of the right-hand variables, the wage



     rate,  is  only  observed  when  the  left-hand variable,  labor



     supply, is positive.  Imputing the unobserved wage rates causes



     a  number  of  complications  that can  be avoided  by discarding



     those  observations for which  labor  supply  is zero.   Another



     example is a  demand study where the price is  not known unless a



     purchase is made.  (Hurd, 1979, p. 248).







     For our purposes,  the likelihood function of  the truncated normal can



be constructed as follows.  We assume the existence of $T_0 + T_1$ realizations
of random variables $Y_i \sim \mathrm{NID}(X_i\beta, \sigma^2)$.  However, for whatever reasons, only
the positive realizations of the $Y_i$ are used in the analysis, these assumed
to number $T_1$.  Given these assumptions, the likelihood function is

     $$L = \prod_{i=1}^{T_1}\frac{\phi_i}{\sigma\,\Phi_i} \qquad (21)$$

where $\phi_i$ is the standard normal density evaluated at $((y_i - X_i\beta)/\sigma)$, and $\Phi_i$
is the standard normal distribution function evaluated at $(X_i\beta/\sigma)$ which
serves as the normalizing factor of the truncated density.  The
log-likelihood function (suppressing terms not depending on $(\beta, \sigma)$) is

     $$\ell = \sum_{i=1}^{T_1}\left[-.5\left(\frac{y_i - X_i\beta}{\sigma}\right)^2 - \log\sigma - \log\Phi\!\left(\frac{X_i\beta}{\sigma}\right)\right] \qquad (22)$$

Estimation is by means of maximum likelihood.  The first-order conditions
for a maximum of $\ell$ are

     $$\frac{\partial\ell}{\partial\beta} = \sum_{i=1}^{T_1}\left[-\frac{1}{\sigma}\,\frac{\phi(X_i\beta/\sigma)}{\Phi(X_i\beta/\sigma)} + \frac{y_i - X_i\beta}{\sigma^2}\right]X_i' = 0$$
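
The truncated-normal log likelihood and its derivative with respect to the slope parameters can be written out directly.  The sketch below is illustrative only: the function names and data are hypothetical, and the derivative shown was obtained by differentiating the log likelihood as stated above, so it can be checked against finite differences.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: .pdf is phi, .cdf is Phi


def dot(coef, row):
    """Inner product of a coefficient list and a regressor row."""
    return sum(c * x for c, x in zip(coef, row))


def truncnorm_loglik(beta, sigma, X, y):
    """Truncated-at-zero normal log likelihood, constants dropped."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        xb = dot(beta, x_i)
        z = (y_i - xb) / sigma
        total += -0.5 * z * z - math.log(sigma) - math.log(STD_NORMAL.cdf(xb / sigma))
    return total


def truncnorm_score_beta(beta, sigma, X, y):
    """Analytic derivative of truncnorm_loglik with respect to beta."""
    grad = [0.0] * len(beta)
    for x_i, y_i in zip(X, y):
        xb = dot(beta, x_i)
        mills = STD_NORMAL.pdf(xb / sigma) / STD_NORMAL.cdf(xb / sigma)
        weight = (y_i - xb) / sigma ** 2 - mills / sigma
        for j, x_ij in enumerate(x_i):
            grad[j] += weight * x_ij
    return grad


# Hypothetical positive-only sample: an intercept and one regressor.
X = [[1.0, 0.2], [1.0, 1.5], [1.0, 2.1]]
y = [0.9, 1.8, 2.4]
beta, sigma = [0.5, 1.0], 1.3
print(truncnorm_loglik(beta, sigma, X, y))
print(truncnorm_score_beta(beta, sigma, X, y))
```

A maximum-likelihood routine would drive this score (together with the corresponding derivative for the scale parameter, not shown) to zero.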
-------
                                  2-13
     Olsen's method relies  on a method  of moments technique  whereby the



moments  (specifically the  mean and variance) of  the  empirical incomplete



distribution, that of the positive $y_i$, are related to the moments of the



complete distribution via  formulae developed  by  Pearson and  Lee (1908).



Extending  the  Pearson-Lee  methodology  to  the  multiple regression  case,



Olsen  demonstrates  that  the least squares  slope  coefficients  differ from



the true slope coefficients by a common factor, and he presents in tabular



form  the multiplicative correction  factors needed  to transform  the OLS



estimates of the slope,  intercept,  and standard error parameters (based on



data  from   the   incomplete  distribution)  to  the corresponding  complete



distribution estimates.  In practice, we have fitted polynomial functions



of the third degree to Olsen's tabled data so that the transformations are



facilitated.



     Olsen    also   presents   the   multipliers   for   transforming   the



(mean/standard error)  ratio estimated  by  OLS on the incomplete  distribution



to the corresponding ratio of the complete distribution, $(\mu/\sigma)$.  Olsen



notes that $(u/
-------
                                   2-19
2.6 HECKMAN'S APPROACH:  SAMPLE SELECTION

     A very popular technique for estimating models with limited dependent
variables is the sample selection model, attributable largely to

Heckman  (1976,   1979).    The  model  has  a  number  of  applications  (see

Heckman's  1976  article  in  particular), and  is  quite  easy  to estimate.

Because it is so well-known, we will  only provide a sketch of the details.

The following section, which contrasts  and  compares  the  Tobit,  Cragg,  and

Heckman models, sheds some more light on subtleties of Heckman's

formulation.

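     Heckman considers the following two-equation model:

     $$Y_{i1} = X_i\beta_1 + \varepsilon_{i1} \qquad (24)$$
     $$Y_{i2}^* = X_i\beta_2 + \varepsilon_{i2} \qquad (25)$$

It is assumed that $\varepsilon_{i1}$ and $\varepsilon_{i2}$ are distributed joint normal, with marginal
densities $N(0, \sigma_1^2)$ and $N(0, \sigma_2^2)$ and covariance $\sigma_{12}$.  The associated
binary indicator is

     $$y_{i2} = \begin{cases} 1, & y_{i2}^* \ge 0 \\ 0, & y_{i2}^* < 0. \end{cases} \qquad (26)$$

In Heckman's model, the realizations $y_{i1}$ are available to the analyst only
when $y_{i2}^* > 0$, i.e. when $y_{i2} = 1$.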

     A concrete example  is  where (24)  is a  model  determining market wage


rate (or log(wage rate)) by a linear function of $X_i$ and random error and


where  (25)  is  a model determining hours  of  labor  supplied in the market.


It  is assumed  that  either  hours of labor  supplied or  a  discrete binary


indicator of whether  or  not any hours were  supplied is available for all


observations.    However,  because market  wage rates are  only  observed for


individuals for whom  the  market  wage rate exceeds  the reservation wage at

zero hours, data on the $y_{i1}$ are available only when $y_{i2}^* > 0$ ($y_{i2} = 1$).


     Heckman then considers the expectation $E(Y_{i1} \mid y_{i2} = 1)$, which can be
written as

     $$E(Y_{i1} \mid y_{i2} = 1) = X_i\beta_1 + E(\varepsilon_{i1} \mid y_{i2} = 1). \qquad (27)$$

If one considers least-squares estimation of (27), the question is:  Do
there obtain consistent estimates of $\beta_1$ when $y_{i1}$ is regressed on $X_i$ for
those i for whom $y_{i2} = 1$?  Basically the issue is whether the expectation
$E(\varepsilon_{i1} \mid y_{i2} = 1)$ is null.  In general, and thus at the core of the sample
selection bias problem, the answer is "no".  Based on well-known formulae,
it holds that

     $$E(\varepsilon_{i1} \mid y_{i2} = 1) = \frac{\sigma_{12}}{\sigma_2}\,\frac{\phi_i}{1-\Phi_i}, \qquad (28)$$

where $\phi_i$ and $\Phi_i$ are the standard normal density and distribution function
evaluated at $(-X_i\beta_2/\sigma_2)$.  Whenever $\sigma_{12}$ is nonzero, since
$\phi_i$, $(1-\Phi_i)$, and $\sigma_2$ are all positive, least
squares estimation of (27) will be based on an expectations function with
nonnull disturbance expectation, and will therefore yield inconsistent
estimates of $\beta_1$.


     Heckman's  suggested  procedure  in   this  situation  is  as  follows.


Estimate on  the  entire  sample  a probit model for  the discrete indicator


representation of the model (25).   This yields a consistent estimate of the


parameter vector $(\beta_2/\sigma_2)$ from which consistent estimates of $\lambda_i = \phi_i/(1-\Phi_i)$
are constructed.  Form the $T \times (k+1)$ matrix $Z = [X \mid \Lambda]$, where $\Lambda$ is a $T \times 1$
vector with typical element $\lambda_i$, and regress $y_{i1}$ on $[x_i, \lambda_i]$.  This procedure
yields consistent estimates of the parameters $\beta_1$ and $(\sigma_{12}/\sigma_2)$, having
effectively solved the omitted variables problem by using a consistent
estimate of $E(\varepsilon_{i1} \mid y_{i2} = 1)$ as a regressor.
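
The correction term that this procedure adds as a regressor can be checked by simulation.  The sketch below is an illustration under stated assumptions, not Heckman's code: the function names, the seed, and the parameter values are hypothetical.  It draws jointly normal errors with a chosen covariance, keeps the draws for which the selection equation is positive, and compares the average first-equation error among them with the analytic selection term.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: .pdf is phi, .cdf is Phi


def selection_term(index2, sigma12, sigma2):
    """(sigma12 / sigma2) * phi(z) / Phi(z), with z = index2 / sigma2.

    By symmetry of the normal, phi(z) / Phi(z) = phi(-z) / (1 - Phi(-z)).
    """
    z = index2 / sigma2
    return (sigma12 / sigma2) * STD_NORMAL.pdf(z) / STD_NORMAL.cdf(z)


def simulated_selected_mean(index2, sigma1, sigma2, sigma12, n=100_000, seed=0):
    """Monte Carlo mean of e1 over the draws with index2 + e2 > 0."""
    rng = random.Random(seed)
    load = sigma12 / sigma2                         # loading of e1 on the e2 shock
    resid_sd = math.sqrt(sigma1 ** 2 - load ** 2)   # sd of e1 unrelated to e2
    total, count = 0.0, 0
    for _ in range(n):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        e2 = sigma2 * u
        e1 = load * u + resid_sd * v                # cov(e1, e2) = sigma12
        if index2 + e2 > 0.0:
            total += e1
            count += 1
    return total / count


# Hypothetical values: selection index 0.3, unit variances, covariance 0.5.
print(selection_term(0.3, 0.5, 1.0))
print(simulated_selected_mean(0.3, 1.0, 1.0, 0.5))
```

With a nonzero covariance the simulated mean is not zero, which is the source of the inconsistency of least squares on the selected sample.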

     In the context of health outcomes models, one could define $y_{i2}^*$ as some


latent index of the propensity to be ill.  Given that  this index is greater


than some threshold level,  illness results, its magnitude determined by the


realization $y_{i1}$.  The translation of the latent illness model into


Heckman's framework is not  straightforward, however.   For those individuals


not reporting illness over the sample interval, we observe zero time spent


ill  rather  than  not  observing  the amount.   It is  therefore  difficult to


interpret the meaning of the realized, but unobserved, $y_{i1}$ for the healthy.


We turn in the next section to a more detailed analysis of such subtleties.




2.7 TOBIN, CRAGG, AND HECKMAN:  A DIGRESSION


     As there are some similarities between and  among the models described


above  and  identified for  expositional parsimony as the models  of  Tobin,


Cragg, and Heckman,  it  is  appropriate to  summarize  their  similarities  and


differences  and  in so doing to  elucidate the circumstances  in  which each


model is more or less appropriate.  (The discussion of Cragg's model here
concerns the first Cragg model (probit/truncated-normal), as that version is
most similar to the others discussed here.)

     First to note is that the Tobit model results as a restricted version

of  both  the  Cragg and the Heckman models.   The reason for  this is purely

mechanical, however, and should  not  be taken to  imply that the Cragg and

Heckman  models  are in  general  identical.   As we  will  see  below,  these

models are structurally quite  different.

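     To see that the Cragg model reduces to the Tobit, the Cragg
log-likelihood function can be written (following Lin and Schmidt (LS)
(1984)) as

     $$\ell = \sum_{i\in\Omega_0}\ln\Phi(-X_i\beta_1) + \sum_{i\in\Omega_1}\left[\ln\Phi(X_i\beta_1) - \ln\Phi\!\left(\frac{X_i\beta_2}{\sigma}\right) - \frac{1}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y_{i2} - X_i\beta_2)^2\right]. \qquad (29)$$

Under the restriction $\beta_1 = \beta_2/\sigma$, the terms $\ln\Phi(X_i\beta_1)$ and $\ln\Phi(X_i\beta_2/\sigma)$
cancel, and (29) is the Tobit log-likelihood.  The following
excerpt from LS provides a particularly cogent summary description of the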

appropriateness of the restricted (Tobit) versus the unrestricted versions

of the Cragg model:


     (I)n the Tobit model any variable which increases  the probability
     of a non-zero value must also  increase  the mean of the positive
     values; a positive element of $\beta$ means that an increase in the
     corresponding variable (element of $X_i$) increases both $\Pr(y_i > 0)$
     and $E(y_i \mid y_i > 0)$.  This is not always reasonable.  As an example,
     consider a hypothetical sample of buildings, and suppose that we
     wish to analyze  the  dependent variable "loss due to fire," during
     some  time period.    Since this  is  often  zero but  otherwise
     positive,  the Tobit model  might be  an obvious choice.   However,
     it is not hard to imagine that newer (and more valuable)
     buildings  might  be  less  likely to have  fires,   but might  have
     greater average losses  when  a  fire did occur.   The Tobit model
     can not accommodate  this possibility.

     Another problem with the Tobit model is that it links the shape
     of  the  distribution  of  the  positive  observations  and  the
     probability of  a positive observation.   For rare  events (like
     fires),  the   shape   of   the   distribution  of   the   positive
     observations would have to resemble the extreme upper tail of a
     normal,  which  would   imply  a  continuous  and  faster  than
     exponential  decline  in  density as  one moved  away from zero.
     Conversely,  when  zero  occurs  less  than half of the  time,  the
     Tobit model necessarily implies a non-zero mode for the non-zero
     observations.

     Cragg's model avoids both of  the above  problems  with  the Tobit
     model.  A reasonably strong case can be made for  it as a general
     alternative to  the  Tobit  model,  for analysis  of data sets  to
     which Tobit is typically applied—namely, data sets in which zero
     is a common (and meaningful) value of the dependent variable, and
     the non-zero observations are all positive.  The  distribution of
     such  a  dependent  variable is  characterized  by  the probability
     that it equals zero and by the  (conditional) distribution of the
     positive observations, both of which Cragg's model parameterizes
     in a general way.  (LS, pp. 174-175)
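
The first point in the excerpt can be illustrated numerically.  In the sketch below (the function name and the index values are hypothetical), raising the Tobit index raises both the probability of a positive outcome and the mean of the positive outcomes; the model has no way of moving them in opposite directions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: .pdf is phi, .cdf is Phi


def tobit_implied_moments(index, sigma):
    """Return (Pr(y > 0), E(y | y > 0)) implied by a Tobit model.

    `index` plays the role of the linear index X*beta for one observation.
    """
    z = index / sigma
    prob_positive = STD_NORMAL.cdf(z)
    mean_positive = index + sigma * STD_NORMAL.pdf(z) / prob_positive
    return prob_positive, mean_positive


for index in (-1.0, 0.0, 1.0, 2.0):
    prob, mean = tobit_implied_moments(index, sigma=1.0)
    print(f"index={index:+.1f}  Pr(y>0)={prob:.3f}  E(y|y>0)={mean:.3f}")
```

In the fire-loss example, newer buildings would need a lower index to have fewer fires and a higher index to have larger losses, which a single index cannot deliver.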

     Turning now to  Heckman's  formulation, his  two-equation model  is seen

to reduce  to the  Tobit model  as  follows.   Recall that the model  can be

written (with notational changes obvious) as

     $$Y_{i1} = X_i\beta_1 + \varepsilon_{i1} \qquad (30)$$
     $$Y_{i2}^* = X_i\beta_2 + \varepsilon_{i2}.$$

$Y_{i2}^*$ is a latent variable, however, and only a discrete (0,1) sign indicator
of its realization, $y_{i2}$, is available; $y_{i1}$ is observed only when $y_{i2} = 1$.
Letting $\beta_1 = \beta_2$ and $\varepsilon_{i1} = \varepsilon_{i2}$ (i.e. the error structure is univariate rather


than bivariate), then the  Heckman model  is  the  standard  Tobit model.   The


logic  is  that  when  these  restrictions  are  imposed   in  the  Heckman


two-equation model, the remaining single equation plays  both the censoring


and the determination-of-intensity roles.  Since the censoring occurs  as a

result of a non-positive realization of the random variable $Y_{i2}^*$, the Tobit
requirement that the quantity or intensity realization be confined to the
nonnegative orthant is automatically satisfied when the restriction $y_{i1} = y_{i2}^*$
(i.e. $\beta_1 = \beta_2$, $\varepsilon_{i1} = \varepsilon_{i2}$) is imposed.  In general, however, the Heckman

two-equation framework  is  not specifically  designed  to model  situations


where realizations  of  the dependent variable of  interest  are necessarily


nonnegative and  are recorded for all  individuals/observations, and where


$\Pr(y_{i1} = 0) > 0$.  Heckman's formulation has $y_{i1} \neq 0$ except on a set of measure


zero.  We turn now to an explanation of the fundamental differences between


the Heckman two-equation formulation and the two versions of interest of


the Cragg model.

     The two-equation Heckman model describes two phenomena, $Y_{i1}$ and $Y_{i2}^*$,
that are marginally distributed, respectively, as $\mathrm{NID}(X_i\beta_1, \sigma_1^2)$ and
$\mathrm{NID}(X_i\beta_2, \sigma_2^2)$ ($\sigma_2$ is usually restricted $= 1$ for normalization when only the
sign of $y_{i2}^*$ is observed).  The joint distribution is bivariate
$\mathrm{NID}(X_i\beta_1, X_i\beta_2, \sigma_1^2, \sigma_2^2, \rho)$, where $\rho$ is the correlation of $(\varepsilon_{i1}, \varepsilon_{i2})$, $(\sigma_{12}/\sigma_1\sigma_2)$,


which is  in  general nonzero.  The  important  point  is that these marginal


and joint distributions are unconditional.  That is, for all i,  there  exist

realizations $(y_{i1}, y_{i2}^*)$ although the realizations $y_{i1}$ for some i will be
unavailable to the researcher.  Casting the problem concretely in the area
where Heckman's model has been most fruitfully applied, labor economics,
sheds further light on the subtleties of his model.  Here we define
$y_{i1} = \log(W_i)$ and $y_{i2}^* = \log(H_i + 1)$, where $W_i$ is wage earned in market work and
$H_i$ is hours of market work.  Thus, $y_{i2}^*$ is positive only if market hours are
positive.  It is posited that the expected values of both $Y_{i1}$ and $Y_{i2}^*$ are


linear functions  of  personal characteristics and other  variables  so that


the two-equation model results.  However, because we only observe the


market wage for  those  individuals actually participating  in  market work


(those for whom $H_i > 0$), some subset of observations will not have data on
the $y_{i1}$.  There is a market wage determined for nonparticipants; whether or


not such  individuals have  knowledge of  their market  wages  is immaterial.


The  relevant  analytical fact is  that  such data  are unavailable  to  the


researcher.


     In  this  labor supply framework,  it  is  apparent why  the estimation


techniques developed for the two-equation Heckman model and discussed


earlier in  this  chapter  have  such  appeal.   The more immediate concern, of


course,   is  whether such techniques  are  in  fact  appropriate  to  the


estimation requirements of the present analysis.   In a nutshell, Heckman's


model is  one  where there are two  equations  of  interest,  both holding for


all  i  unconditionally,  and   where (except when  restricted so  as  to  be


identical to a  Tobit model)  the probability of  observing  realizations of


the dependent variable equal to zero is zero.  Does such a formulation capture


the essence of  the "corner solution"  problems of the health status outcomes


phenomenon?


     It seems rather artificial to cast such phenomena in this
framework.  It is not generally the case with the generation of health



outcomes data that one  can  posit  the  existence of some latent variable such



that data for the illness measure(s) of interest are only available given a



positive realization of  the latent  variable.    Rather,  the  processes  of



interest here  are represented more  typically  by  data that  indicate  the



realizations of  illness  outcomes for  all individuals, even  though  these



realizations are quite  frequently on  the boundary of the "consumption" set.



In  sketching some  of  the  differences between  the Heckman  two-equation



formulation  and the  Cragg models with  particular  reference to  data sets



where the zero or  corner solution outcomes are meaningful and where nonzero



outcomes are strictly positive,  LS observe  that  in such cases the Heckman



model's  assumptions  are not particularly representative of  the situation



because in the Heckman  formulation:
     -- the observed values of $y_{i1}$ need not be positive, in the sense
     that the model implies a non-zero probability of observed $y_{i1} < 0$;
     and the unobserved $y_{i1}$ are literally unobserved, rather than
     observed as equal to zero.  The first of these problems can be
     circumvented, for example, by measuring $y_{i1}$ in logarithms, ... and
     the second problem is in any case fundamental.  (LS p. 175).

We  turn  now to a  discussion of  how the Cragg models  differ  in substance



from the Heckman two-equation setup and argue  that the Cragg formulations



are relatively more suited than Heckman's model to the nature of a subset



of our estimation requirements.



     Although  like  the  Heckman   formulation  in  being  a  "two-model"



specification, the fundamental point  of departure for  the Cragg technique

is  that  one  of  the  two  models  is  formulated  in  terms  of  conditional


expectations.    The conditions  on which  the expectations are taken are, as


described above, the outcomes of unconditional models, which are generally


stated as binary representations of latent random variables.  Thus, in the


context of health measures,  there is an  unconditional model defined for all


individuals  determining   the   binary  outcome   (illness,   no   illness).


Conditional on an "illness"  outcome, the quantity  or duration of illness is


determined  either  by  a   lognormal   or   truncated-normal   model.     The


unconditional  likelihood for a  representative  ill  individual is then


     density(illness  duration given some illness)*Pr(some illness), (31)


which  is equation (12)  as  specified earlier.   There is  no  density of the


quantity of illness defined for the healthy, unlike Heckman's formulation that


defines such a density for  all  individuals.
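
A minimal Python sketch of likelihood (31) may make the two-part structure concrete. It assumes, purely for illustration, a probit form for Pr(some illness) and the lognormal variant for duration; the names `hurdle_loglik`, `index_ill`, `mu`, and `sigma` are hypothetical and do not come from the report.

```python
import math


def norm_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_logpdf(y, mu, sigma):
    """Log density of a lognormal variate y > 0 (log-mean mu, log-sd sigma)."""
    z = (math.log(y) - mu) / sigma
    return -math.log(y * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z


def hurdle_loglik(y, index_ill, mu, sigma):
    """Log-likelihood contribution of one individual in a two-part model.

    index_ill : probit index, so Pr(some illness) = Phi(index_ill)  (assumed form)
    mu, sigma : lognormal parameters of illness duration, defined for the ill only
    """
    p_ill = norm_cdf(index_ill)
    if y == 0:
        # Healthy individual: only the binary outcome contributes.
        return math.log(1.0 - p_ill)
    # Ill individual: density(duration | some illness) * Pr(some illness), as in (31).
    return lognormal_logpdf(y, mu, sigma) + math.log(p_ill)
```

Note that no duration density is ever evaluated for a healthy individual, which is the point of contrast with the two-equation selection setup.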


     Deaton and Irish (DI)   (1984), in  an independent  line  of  investigation,


have   purportedly  cast   Cragg's  first  model   in   a  two-equation  Heckman


formulation.  They indicate that a positive observation on the quantity measure

of interest is made when, in the notation used earlier, both $y_{i1}^{*}$ and $y_{i2}^{*}$ are


realized as positive,  else  a zero or a nonparticipation results.  In two cases,


DI specify
                                                                   (32)
     7
Cast thusly,  the Cragg model  can be  viewed as a Heckman two-equation model, but


with a restriction imposed that is absent in Heckman's formulations. That is, DI


seem  to  have  ignored  one  aspect  of  the  Cragg  model  that  is  key  in

differentiating it from Heckman's specification, viz. that $y_{i2}^{*} > 0$ is both a

necessary and sufficient condition for a positive realization of $y_{i1}$ to

result. That is, $\Pr(y_{i1} > 0 \mid y_{i2}^{*} > 0) = 1$, $\Pr(y_{i1} = 0 \mid y_{i2}^{*} < 0) = 1$. When, and only



when the first hurdle  is  traversed is  there a positive amount of the activity



undertaken. So DI's statement that positive realizations of both variables

determine whether $y_{i1}$ is observed positive is somewhat misleading in that a



positive realization of either suffices to assume the positivity of the other.



Neither of Cragg's specifications, then,  is  really  in  the spirit of the model



proposed  by  Heckman except,  of  course,  when  both  the  Cragg model and the



Heckman   two-equation  formulation   are  restricted   such  that   the   Tobit



specification results.



     Owing to  the  subtleties of  the  arguments,  it  is likely that the above



discussion has provided  somewhat less  than  a total clarification  of  all the



relevant  issues.   Some of  these shortcomings are due  to the fact that even



central participants in the academic debates appear still  unconvinced about the



nature  of  the  differences among  the  estimation techniques.   For example,  as



noted  earlier  Duan  and  coauthors  (1983)   have  used the  Cragg  estimation



technique  to  model   individuals'  medical  expenditures.    The   expenditure



decision, in the spirit of  Cragg's specification,  is statistically modeled as



two separate processes.  Model one determines the binary  outcome of whether or



not  any  expenditures  will  occur,  and model  two determines  the amount  of



expenditure  (positive  by  definition)  that results conditional on there being



some expenditure.   In this paper,  Duan  and coauthors  assert that the covariance



between the error terms of the two models  is  irrelevant insofar as construction



of the likelihood function is concerned.



     Recently,  however,  Hay  and  Olsen  (1984)  have questioned  the Duan and



coauthors' method, stating that this approach "requires some fairly unusual

assumptions on the model joint error distribution and functional form (p.



279)."   Moreover,  Hay and  Olsen  go on  to claim that the  Duan and coauthors



formulation "can  be interpreted  as  being nested  in the more  general sample



selection models (p. 279)."   Duan  and coauthors respond that Hay and Olsen "are



incorrect in  claiming  that  our models are nested  within  the sample selection



model," and that "the  conditional  specification  in  the multi-part (i.e.,  Duan



and coauthors)  model  is  preferable to the unconditional  specification in  the



selection model  for modeling actual (v. potential) outcomes  (p. 283)."



     As  we argued earlier,  the  sample  selection  or   Heckman  approach  is



particularly  fruitful   when  analyzing   phenomena   such  as   labor  market



participation.  Quoting Duan and coauthors:








     For   certain  empirical   problems   such   as   labor   force



     participation,  the  primary  goal  might  be  to  predict  the



     potential outcome instead  of  the  actual outcome; therefore, an



     unconditional specification such as the sample selection models



     might be preferable.  For the present application,  however, the



     goal  is  to   predict  the  actual  expense,  not the potential



     expense;  therefore,  the  unconditional  equation...  is of  no



     direct  interest,  and   the  preference  for  the  unconditional



     specification in  the other empirical problems does  not  apply to



     the present application. (p. 286).








     In  any event, this  discussion  demonstrates that there  still  exists



some  confusion  on  these points  in the  published  literature.   We  have



attempted to be  as thorough  as  time and space permit  in hope of emphasizing



one  extremely  important  message.   That  is,  it  is essential  that  the

researcher  be  intimately  familiar  with the  behavioral  and  statistical

structure of the  models  of interest in  order  to avoid being swallowed by

the  slippery  quicksand we  have  described.   The nature of  health status

measures  as  conditional  or  unconditional  and the interpretation  of  any

latent  variables  in  the  model  must  be quite  clear  before  the  correct

estimation technique can  be selected.   When, and  only when, such issues are

in order is it possible to make sense of the estimates obtained and their

relevance to benefit estimation.

     It  seems  that  the  logic of  the health  status   outcome  measures of

interest in this study is better captured in terms of Cragg's

specifications than in the Heckman two-equation model, although this

question is obviously still open to informed debate.   The specification of

the  magnitude-of-illness  model  as   a  conditional  model  is,  however,

intuitively plausible, and Cragg's  formulations  provide a natural vehicle

for  translating such intuitive plausibility into an econometric framework.

However,  it so  happens  that the  assumption of normally,  or at least

continuously,  distributed  random variables,  which characterizes the above

models,  is  not  necessarily  appropriate insofar  as  count  measures  like

"days"  or "times"  are concerned.    To  a discussion  of  some alternative

estimation techniques that  might  be used  in such  situations we  now turn.



2.8  POISSON-DISTRIBUTED HEALTH OUTCOME MEASURES

     In  modeling  event counts (non-negative integer data)  over some time

interval  (t,  t+dt), the Poisson distribution  is  commonly used.   Here, a

random variable $Y_i$ follows the probability law

     $\Pr(Y_i = y) = \exp(-\lambda_i)\lambda_i^{y}/y!$,  $y \in \{0,1,2,...\}$            (33)
                  $= 0$, else

with $\lambda_i > 0$.


     It happens that  there  exist  health outcome data of interest that are


recorded  as  nonnegative  integers,  most  obviously as  counts of  days  of


activity restriction.   For  any  individual,  such measures can, over a time


interval (t, t+dt), say one two-week period, assume only integer values in


{0,1,2,...,14}.   Because of the paucity of observations likely to be found


at the  upper  (14  day) limit, we  ignore the  fact that these measures obey


upper bounds and concentrate instead on the complications presented by the


large  number  of   individuals  who  in  a  typical  random   sample  of  the


population report  zero days of  restricted activity.


     Analogous to  the familiar normal  distribution where  for econometric


work one typically specifies $\mu_i = X_i\beta$, the $\lambda_i$ parameter of the Poisson

distribution can be reparameterized to admit the influence of

covariates. Since for all i, $\lambda_i > 0$, a straightforward approach is to

assume $\lambda_i = \exp(X_i\beta)$ and to estimate $\beta$ by maximum likelihood (see Hausman,

Hall, Griliches (1984), Hausman, Ostro, Wise (1983), Portney and Mullahy

(1985)). This is the approach adopted here for modeling the restricted

activity day outcomes.


     One drawback of the Poisson model is the restriction that $E(Y_i) = \mathrm{Var}(Y_i)$.

Should this restriction not in fact characterize the data, the

maximum likelihood estimates of the covariance matrix of $\hat\beta$ based on minus

the inverse of the estimated Hessian will be inconsistent and t-tests based

thereon would be misleading. Hausman, Ostro, and Wise circumvent this

restriction by allowing for an overdispersion parameter. A different

approach is used here, using an estimator of the covariance matrix that is

robust against departures from the mean=variance restriction; this

procedure is described below.



     Given T  independent observations,  the log-likelihood function of  the



Poisson health outcome model  can be written as
     $\ell = \sum_i [-\exp(X_i\beta) + y_i X_i\beta] + C$,                         (34)

where $\exp(X_i\beta) = \lambda_i$, $y_i$ is the observed count of illness days, and C does

not depend on $\beta$. It is obvious that $\ell$ is concave in $\beta$. The first-order

conditions for the maximization of $\ell$ are

     $\partial\ell/\partial\beta = \sum_i [-\exp(X_i\beta)X_i' + y_i X_i'] = 0$             (35)

with the maximum guaranteed by the condition

     $\partial^2\ell/\partial\beta\partial\beta' = \sum_i -(X_i'X_i)\exp(X_i\beta)$                   (36)




negative definite.
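
As an illustration only, the first-order conditions (35) and Hessian (36) can be used directly in a Newton-Raphson iteration. The sketch below is restricted to two covariates (for example an intercept and one regressor) so that the 2x2 system can be solved by hand; the function name `fit_poisson` and the tiny data set in the usage note are hypothetical.

```python
import math


def fit_poisson(X, y, iters=50, tol=1e-10):
    """Newton-Raphson for Poisson regression with lambda_i = exp(x_i . beta).

    X : list of rows, each a list of exactly two covariates
    y : list of non-negative integer counts
    """
    beta = [0.0, 0.0]
    for _ in range(iters):
        g = [0.0, 0.0]                    # gradient, equation (35)
        h = [[0.0, 0.0], [0.0, 0.0]]      # minus the Hessian, equation (36)
        for x, yi in zip(X, y):
            lam = math.exp(x[0] * beta[0] + x[1] * beta[1])
            for a in range(2):
                g[a] += (yi - lam) * x[a]
                for b in range(2):
                    h[a][b] += lam * x[a] * x[b]
        # Newton step: solve (-H) * step = g for the 2x2 case.
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        step = [(h[1][1] * g[0] - h[0][1] * g[1]) / det,
                (-h[1][0] * g[0] + h[0][0] * g[1]) / det]
        beta = [beta[0] + step[0], beta[1] + step[1]]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

With an intercept and a single 0/1 regressor, the fitted means reproduce the two group averages: for `X = [[1, 0], [1, 0], [1, 1], [1, 1]]` and `y = [1, 3, 4, 8]`, the estimates approach log 2 and log 3.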



     The maximum likelihood estimates of $\beta$ obtained by maximizing (34) are

consistent, but the estimate of the covariance matrix of $\hat\beta_{ML}$, using

$[-\partial^2\ell/\partial\beta\partial\beta']^{-1}$ evaluated at $\hat\beta_{ML}$, will be inconsistent if the data are not in

fact generated by the specified Poisson distribution.



     This  is  most easily  seen as  follows.   Note that the  model  can  be



equivalently  cast as  a  nonlinear  least  squares  regression,  the  i-th



observation being
     $y_i = \exp(X_i\beta) + \varepsilon_i$                                         (37)

with $E(\varepsilon_i) = 0$. Clearly, $\mathrm{var}(\varepsilon_i) = \mathrm{var}(Y_i) = \exp(X_i\beta)$, so that the $\varepsilon_i$ are

heteroscedastic. If nonlinear weighted least squares is used with the

weights $\exp(-X_i\beta)$ formed using consistent estimates of $\beta$, and if the data

are in fact Poisson as specified, the maximum likelihood consistent

estimates of $\beta$ and $\mathrm{cov}(\hat\beta)$ will obtain. (The consistency of $\hat\beta_{ML}$ for $\beta$ does

not depend on the weighting scheme.) However, if the data are not

Poisson-distributed, the estimate of $\mathrm{cov}(\hat\beta)$ obtained in this manner will be

inconsistent and asymptotic t-tests based thereon will be misleading.   The

case is  fully analogous  to  the estimation of  the heteroscedastic  linear

model  which  yields   inconsistent   covariance  estimates  (and,   therefore,

t-statistics)  if the  heteroscedastic nature of  the error  structure  is

either ignored or incorrectly specified.

     Royall (1984) has demonstrated a method whereby estimates of $\mathrm{cov}(\hat\beta)$

robust against misspecification of the underlying distribution of the data

can be obtained for various distributions, including the Poisson, when

$[-\partial^2\ell/\partial\beta\partial\beta']^{-1}$ evaluated at $\hat\beta_{ML}$ fails to yield a consistent estimate of

$\mathrm{cov}(\hat\beta)$. Denoting $I(\beta)$ as $[-\partial^2\ell/\partial\beta\partial\beta']$, Royall's suggestion is to estimate

$\mathrm{cov}(\hat\beta)$ as

     $I(\beta)^{-1} [\sum_i (\partial\ell_i/\partial\beta)(\partial\ell_i/\partial\beta)'] I(\beta)^{-1}$                (38)

where $\ell_i$ is the i-th observation's contribution to the log-likelihood

function and where all relevant evaluations in (38) are at $\hat\beta_{ML}$. This will

be the approach adopted in the empirical implementation of the Poisson model in

the present study.
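
To see how the robust estimate can differ from the usual one, consider the simplest possible case, sketched here under the assumption of an intercept-only Poisson model (a simplification not used in the report). The ML estimate is then the log of the sample mean, and both covariance estimates are scalars. The function name is hypothetical.

```python
import math


def poisson_intercept_covariances(y):
    """ML and robust variance estimates for an intercept-only Poisson model."""
    T = len(y)
    ybar = sum(y) / T
    beta_hat = math.log(ybar)          # solves sum_i (y_i - exp(beta)) = 0
    lam = math.exp(beta_hat)
    info = T * lam                     # I(beta) = -d2l/dbeta2 = sum_i exp(beta)
    # Sum of squared per-observation scores, sum_i (dl_i/dbeta)^2.
    scores_sq = sum((yi - lam) ** 2 for yi in y)
    var_ml = 1.0 / info                # inverse-Hessian estimate
    var_robust = scores_sq / info ** 2  # sandwich: I^{-1} [sum scores^2] I^{-1}
    return beta_hat, var_ml, var_robust
```

In this special case the ratio of the robust to the ML variance equals the sample variance of the counts divided by their mean, so the two agree only when the mean=variance restriction holds in the sample.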
2.9 GEOMETRIC-DISTRIBUTED HEALTH OUTCOME MEASURES

     One alternative to the  Poisson model for  the modeling  of  count data  is

the  geometric  distribution.    Though  seemingly  not  as   often used   by

econometricians as the Poisson,  the  geometric  is  a logical  choice should an

alternative to  the  Poisson  be desired.   Furthermore,  the basic geometric

specification does  not suffer from the  mean=variance  restriction that is

implied in the basic Poisson model.  As will be seen below, the variance of

a geometric-distributed discrete random variable is greater than its mean,

although the fact that the variance  depends on the mean limits somewhat the

flexibility of the distribution.

     Our  description  of  the  properties  of  the  geometric  distribution

follows that  of  Johnson  and Kotz (1969).  First,  it should be noted that

the geometric  is  a special   case of the  negative binomial.  Discussion is

confined  here to  the geometric because it  is  computationally  far more

straightforward  than  is  the general  negative  binomial.    The geometric

distribution is defined as follows:



     $\Pr(X = k) = P^{k}(1+P)^{-(k+1)}$,  k = 0,1,2,...                     (39)

                $= 0$, else

with P > 0. It holds that E(X) = P and Var(X) = P(1+P). As in the

econometric specification of the Poisson model considered earlier, one

allows the P to vary across observations as $P_i$, and again $P_i = \exp(X_i\beta)$ is

a sensible parameterization due to the required positivity of the $P_i$.
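
The stated moments can be checked numerically. The short sketch below sums the probability law (39) over a long but finite range of k; the truncation point `kmax` and the function names are illustrative choices.

```python
def geometric_pmf(k, p):
    """Pr(X = k) = p**k * (1 + p)**-(k + 1), written to avoid overflow for large k."""
    return (p / (1.0 + p)) ** k / (1.0 + p)


def moments(p, kmax=500):
    """Total probability, mean, and variance of (39), truncated at kmax terms."""
    probs = [geometric_pmf(k, p) for k in range(kmax)]
    total = sum(probs)
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return total, mean, var
```

For P = 1.5 this returns values very close to 1, 1.5, and 3.75, matching E(X) = P and Var(X) = P(1+P).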

     Given this, the likelihood  function  for T independent  observations can

be written as

     $L = \prod_{i=1}^{T} \exp(k_i X_i\beta)(1 + \exp(X_i\beta))^{-(k_i+1)}$                  (40)

with log-likelihood

     $\ell = \sum_{i=1}^{T} [k_i X_i\beta - (k_i+1)\log(1 + \exp(X_i\beta))]$                  (41)

where $k_i$ is the observed count for the i-th observation.  The ML estimate $\hat\beta$

satisfies

     $\partial\ell/\partial\beta = \sum_{i=1}^{T} [k_i - (k_i+1)\exp(X_i\beta)/(1 + \exp(X_i\beta))]X_i' = 0.$     (42)

The Hessian is

     $H = \partial^2\ell/\partial\beta\partial\beta' = \sum_{i=1}^{T} -(k_i+1)[\exp(X_i\beta)/(1 + \exp(X_i\beta))^2]X_i'X_i$,   (43)
                   1-1    1            x              111

which  is  seen by  inspection to  be negative definite.   Because  it  is a

fairly uncluttered expression, estimation and inference can proceed using

-H as an estimate of the information matrix and $(-H)^{-1}$ as an estimate of

the covariance matrix.   Unfortunately, much  like the Poisson  specification,

the covariance estimate thus  obtained is not robust to departures from the

data  being  in fact  geometric.    However,  the methods  proposed  by Royall

(1984) and  described for the Poisson model can be used for the geometric

distribution  also.    As the  development is  identical,  the  details  are

omitted for economy of  space.
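
For concreteness, the score (42) and Hessian (43) can be coded as a Newton-Raphson iteration. The sketch below assumes a single coefficient (one covariate, no separate intercept) to keep the algebra scalar; `fit_geometric` is a hypothetical name.

```python
import math


def fit_geometric(x, k, iters=100, tol=1e-12):
    """Newton-Raphson for a one-coefficient geometric model, P_i = exp(x_i * beta).

    Returns the estimate and (-H)^{-1} evaluated at the last iterate.
    """
    beta = 0.0
    hess = -1.0
    for _ in range(iters):
        grad = 0.0
        hess = 0.0
        for xi, ki in zip(x, k):
            p = math.exp(xi * beta)
            grad += (ki - (ki + 1) * p / (1.0 + p)) * xi        # equation (42)
            hess += -(ki + 1) * p / (1.0 + p) ** 2 * xi * xi    # equation (43)
        step = -grad / hess
        beta += step
        if abs(step) < tol:
            break
    return beta, -1.0 / hess
```

When every x_i equals one, the first-order condition reduces to P equal to the sample mean of the counts, so the estimate is the log of that mean; this gives a simple check on the code.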
2.10 MULTINOMIAL-DISTRIBUTED  HEALTH  OUTCOME MEASURES

     One type of micro  data  of  particular interest in health econometrics

is of the following nature.  We observe over the course of some fixed time

period  (say  one two-week period) the  number  of times (say  days)  that an

individual's health  status  is  characterized  by (k-1) mutually exclusive

illness outcome measures  and,  therefore, the  number of  days  on  which no

illness  resulted,  which  can  be viewed  as  the  k-th  activity.     To  be

concrete,  the  two-week  illness  profile  for  some  individual  who  has  in

his/her illness "possibility set" two illnesses (minor restricted activity

day (=M),  and severe restricted  activity  day  (=S)), and healthy days (=H)

(=14-M-S)  might look like

     H = 11

     M =  2

     S =  1

     Given observations on such health outcome profiles, it is appropriate

to view the data characterizing individuals' health status as realizations

of  multinomial   random   variables   (see  Morey   (1981)   for   a  related

discussion).   Recalling  from  discrete statistical theory,  the multinomial

distribution of a random variable Y with parameters $(T; P_1,..., P_k)$ can be

written

     $\Pr(Y = y) = T! \prod_{j=1}^{k} (P_j^{t_j}/t_j!)$,                              (44)

where T is the number of trials (here days), the $t_j$ are the number of

occurrences of the j-th outcome, and $P_j$ are the probabilities that the j-th

outcome will occur on a single trial.  To extend the statistical model to

the health status measures, we consider each daily outcome as one trial

from a multinomial distribution with individual-specific parameter vector

for the m-th individual $(T_m; P_{1m},..., P_{km})$.  Assuming $T_m = T_{m'} = T$ for all

m, m', we henceforth drop the subscripts on the T parameters.  The profile

for two weeks, then, is the 14 (by assumption independent) daily trials for

each individual.  The econometric objective is the estimation of the $P_{jm}$,

i.e. estimation of the probabilities of realizing one of the k possible

outcomes on a given day.

     For  computational  simplicity,  we  proceed  as  follows.    A  logistic

distribution for  the daily outcome  probabilities  is assumed.   Thus,  the

probability that the outcome is Z on any trial  is
     $P_{Zm} = \exp(X_m\beta_Z) / \sum_{j\in\Omega} \exp(X_m\beta_j)$                          (45)

for $Z \in \Omega = \{M,S,H\}$.  The logistic distribution assures that for all m

the multinomial requirement ($\sum_{j\in\Omega} P_{jm} = 1$) is met.

     Since the probabilities (45) are unique only up to a difference in

parameter vectors ($\beta_j - \beta_k$), some normalization is required.  The

normalization most convenient and easily interpreted is $\beta_H = 0$, so that $\beta_M$

and $\beta_S$ are interpreted as differences between the respective illness

parameter vectors and the no-illness parameter vector.
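
The probabilities (45) under the normalization just described can be sketched in a few lines of Python. The function name `daily_probs` and the example coefficient values in the usage note are hypothetical.

```python
import math


def daily_probs(x, beta_M, beta_S):
    """Daily outcome probabilities for M, S, and H with beta_H normalized to zero.

    x, beta_M, beta_S : equal-length lists of covariates and coefficients
    """
    index = {
        "M": sum(a * b for a, b in zip(x, beta_M)),
        "S": sum(a * b for a, b in zip(x, beta_S)),
        "H": 0.0,   # normalization: the no-illness parameter vector is zero
    }
    denom = sum(math.exp(v) for v in index.values())
    return {z: math.exp(v) / denom for z, v in index.items()}
```

For example, with a single covariate x = [1.0], beta_M = [log 2], and beta_S = [0.0], the probabilities of M, S, and H are 0.5, 0.25, and 0.25, and they sum to one by construction.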

     The objective, then, is estimation of the parameter vectors $\beta_M$ and $\beta_S$.

This is, of course, fully analogous to the widely-used multinomial logit

model where a single outcome from a set of mutually exclusive outcomes is

considered.  In fact, that case is merely a special case of the present

exposition for which $T_m = 1$ for all m.

     Estimation is by means of maximum  likelihood.  Assuming the  existence

of N independent  profile draws  from the population,  the  likelihood of the

data as a function of the parameters is

     $L(\beta) = \prod_{m=1}^{N} \Pr(Y_m = y_m) = \prod_{m=1}^{N} T! \prod_{j\in\Omega} (P_{jm}^{t_{jm}}/t_{jm}!)$     (46)

where the $P_{jm}$ are as defined in (45) and where $\Omega$ is the illness-type index

set.  In log form,

     $\ell(\beta) = \sum_{m=1}^{N} \sum_{j\in\Omega} t_{jm} \log P_{jm} + C$,                       (47)

where C is a constant not depending on $\beta$.  Given the assumed logistic

probabilities, we have

     $\ell(\beta) = \sum_{m=1}^{N} \sum_{j\in\Omega} t_{jm} [X_m\beta_j - \log(\sum_{k\in\Omega} \exp(X_m\beta_k))] + C.$   (48)

Maximizing (48) can be accomplished with  only  a slight modification of most

existing (single-trial) multinomial logit programs.
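
The only change relative to a single-trial program is that each individual's log-probabilities are weighted by the day counts. A hedged sketch of the objective function, omitting the constant C and using hypothetical names, follows.

```python
import math


def profile_loglik(X, counts, beta_M, beta_S):
    """Log-likelihood (48) without the constant C, with beta_H fixed at zero.

    X      : list of covariate rows, one per individual
    counts : list of dicts giving each individual's day counts for "M", "S", "H"
    """
    total = 0.0
    for x, t in zip(X, counts):
        index = {
            "M": sum(a * b for a, b in zip(x, beta_M)),
            "S": sum(a * b for a, b in zip(x, beta_S)),
            "H": 0.0,
        }
        log_denom = math.log(sum(math.exp(v) for v in index.values()))
        # Each outcome's log-probability is weighted by its number of days.
        total += sum(t[j] * (index[j] - log_denom) for j in index)
    return total
```

For the illustrative profile given earlier (H = 11, M = 2, S = 1) and all coefficients set to zero, each daily probability is one third and the function returns 14 log(1/3).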
2.11  ESTIMATION OF GROUPED DATA MODELS UNDER  THE NORMALITY ASSUMPTION

     There are often institutional or other constraints in the sampling or

data-recording processes  that  have the effect of  generating inexact data

for research  purposes.    A  common case is the  situation where continuous

measures of interest, such  as  the amount  of  time spent in ill health, are

cast in the recorded micro data as grouped or interval data.  We discussed

above strategies that might be  considered when the outcomes are recorded as

"number of days" or "number of  times," i.e. where the data can be viewed as

realizations   of   discrete   statistical    processes   rather   than   as

discrete/integer codings  of  fundamentally  continuous processes.   In this

section  we concern  ourselves   with  the situations where  the  underlying

processes are best viewed as continuous phenomena but where the vagaries of

either the sampling or  data-coding procedures  are  such that only a finite

number  of  intervals  in which  the  continuous  measure is defined  are

determined and the only  data available to the analyst are indicators of the

interval bounds in which the (unknown) continuous measure is  realized.  For

example, the  latent continuous  measure might be  "time  spent ill over some

time interval y (say t)," but owing to whatever reasons, all one knows is

whether t = 0, $t \in$ (0, 4 days], $t \in$ (4 days, 8 days], or $t \in$ (8 days, 365 days) (for



y=one  year).   The  purpose of  this section  is to  present  an  estimating



technique designed to  handle such situations.



     The method is based on the work of Rosett  and  Nelson  (RN)  (1975), who



developed what is  known  as  the  two-limit probit estimation technique, and



of  Stewart   (1983),  who   generalized   the  RN  method  to  account  for



multi-interval data.   We will, therefore,  refer to the  model  expounded  here



as the RNS method.  Here is posited the existence of normally-distributed

random variables $Y_i^{*} \sim NID(X_i\beta, \sigma^2)$.  The realizations $y_i^{*}$ are unobserved,

however.  Available is the knowledge that the realization $y_i^{*}$ is an element

of some proper subset of R.  More formally, partition R into P ($\geq 2$) subsets

$J_p$, such that $\bigcup_{p=1}^{P} J_p = R$, $J_p \cap J_q$
-------
                                   2-40
             T

     3i/3o = Z (6  _..>/.•>  ~ 8 ,.,/a( ,..  -  Q,   ,H.J  - 0,





where $\theta_{p,i} = ((A_p - X_i\beta)/\sigma)\,\phi((A_p - X_i\beta)/\sigma)$, and $\phi(c)$ is the standard normal

density $(2\pi)^{-.5}\exp(-.5c^2)$.  (Note that when P = 2, i.e. when the model is

binary probit, a parameter normalization is required.  Typically $\sigma = 1$ is

used. This, of course, reduces the number of first order conditions from

(m+1) to m, where m is the dimensionality of $\beta$.)  Stewart has shown how



iterative least squares can be used  to  obtain  the ML  estimate.   The reader



is referred to his work for the details.
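
As a rough illustration of the objective being maximized, the sketch below evaluates a grouped-data log-likelihood in which each observation contributes the normal probability of the interval known to contain the latent realization. This is a generic interval-probability form assumed for illustration, not a transcription of the report's equations; the names and the interval representation are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def grouped_loglik(intervals, means, sigma):
    """Log-likelihood when only the interval containing each latent y* is known.

    intervals : list of (lower, upper) bounds; use -math.inf / math.inf for open ends
    means     : list of X_i * beta values, one per observation
    sigma     : standard deviation of the latent normal variable
    """
    total = 0.0
    for (lower, upper), mu in zip(intervals, means):
        prob = norm_cdf((upper - mu) / sigma) - norm_cdf((lower - mu) / sigma)
        total += math.log(prob)
    return total
```

With two intervals split at zero, this reduces to the binary probit log-likelihood, consistent with the P = 2 remark above.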







2.12 SUMMARY AND CONCLUSIONS



     This brief  survey has  attempted  to  present  an  overview  of several



approaches  to econometric estimation  of air  pollution  - health outcomes



models in situations where the distributions of health outcome data are such



that methods other than  linear ordinary  least squares are  likely to  be



required  in order to obtain  consistent  parameter estimates.   The data  used



in  this   study  are in  all  instances  of  this "nonstandard" nature.    In



particular,  the  analysis  to follow  concentrates on three of the types  of



data described in the preceding discussion: count data,



multinomial-distributed data, and discrete indicator,  or  (0,1),  data of the



probit sort.   The following chapters discuss  in some  detail  estimation  of



such models,  and implement some of   the estimation  techniques presented  in



the analysis  of this chapter.



     The  scope  of the  present analysis precludes consideration  of several



interesting  research  issues  that  must be  placed  on the  menu of future



research.    First,  the matter  of  severity  of chronic  illnesses is  left

untreated.  It clearly  is  plausible that not only the presence or absence



of, for  example,  chronic  respiratory illness is  related  to air pollution



exposures, but  also that  the  severity of  the  illness -  defined  by some



metric of severity — is responsive to  pollution exposures  as well as other



covariates.  A second interesting issue that merits analysis in the future



is the  possibility  that some subset of  the explanatory  variables  used to



explain    health    outcomes    is    correlated    with    heterogeneous



individual-specific components  of the unobserved equation error terms.  In



the present context, it might  be argued  that covariates  such as cigarette



consumption,  income, labor market status, and even air pollution exposures



(on  this  last  point,  see  Rosenzweig and  Wolpin  (1984))  are  possibly



correlated with unobservable errors.  When such heterogeneous unobservables



are present and — the crux of the problem — are correlated with observed



explanatory  variables,   parameter  estimates  obtained  without  explicit



recognition and control for this nonzero correlation will in general be

inconsistent.  Some instrumental variable technique will likely be required



in order to obtain consistent parameter estimates under such circumstances.

                                REFERENCES



Amemiya, T.   1981.   "Qualitative  Response  Models:   A Survey," Journal  of



   Economic Literature,  vol.  19,  pp.  1483-1536.



	.   1983.    "Nonlinear Regression Models,"  in  Z.  Griliches and M.  D.



   Intriligator,  eds.,   Handbook  of  Econometrics,   vol.   1,   (Amsterdam:



   North-Holland).



	.  1984.  "Tobit Models:   A Survey,"  Journal  of Econometrics,  vol.  24,



   pp. 3-61.



Berndt, E. R., B. H. Hall, R. E. Hall, and J. A. Hausman.  1974.



   "Estimation  and  Inference  in  Nonlinear  Structural Models," Annals  of



   Economic and Social Measurement, vol.  3,  pp.  653-665.



Breusch, T. and A. R. Pagan.   1980.   "The Lagrange Multiplier  Test and Its



   Application to Model Specification in  Econometrics," Review of  Economic



   Studies, vol. 47,  pp. 239-253.



Cox,  D.  R.  and D.   V.  Hinkley.   1974.    Theoretical Statistics  (London:



   Chapman and Hall).



Cragg,  J.   G.    1971.   "Some Statistical Models for  Limited Dependent



   Variables   with   Application   to   the  Demand   for   Durable   Goods,"



   Econometrica, vol. 39,  pp.  829-844.



Duan, N., W. G. Manning, C. M. Morris and J. P. Newhouse.  1983.  "A



   Comparison  of Alternative  Models  for  the  Demand for  Medical  Care,"



   Journal of Business and Economic Statistics,  vol.  1, pp.  115-126.



	,  	,   	,   and   	.      1984.     "Choosing   Between   the



   Sample-Selection Model  and the Multipart Model," Journal  of  Business



   and Economic Statistics,  vol. 2, pp.  283-289.



Dudley,  L.  and C.   Montmarquette.   1976.    "A  Model  of   the Supply  of



   Bilateral Foreign Aid,"  American Economic Review,  vol.  66, pp.  132-142.

Hausman, J. A.  1978.  "Specification Tests in Econometrics," Econometrica,



   vol. 46,  pp.  1251-1271.



	,   B. Hall  and Z.  Griliches.    1984.   "Econometric  Methods for Count



   Data with an Application to the  Patents-R&D  Relationship," Econometrica,



   vol. 52,  pp.  909-938.



	,   B. Ostro and D. Wise.  1984.   "Air  Pollution and Lost Work," NBER



   working paper  1263,  January.



Hay,  J.  W.   and  R.  J.  Olsen.   1984.   "Let  Them  Eat Cake:   A  Note on



   Comparing Alternative Models of  the Demand for Medical Care," Journal of



   Business  and Economic Statistics,  vol.  2,  pp. 279-282.



Heckman,  J.    1976.    "The  Common  Structure of  Statistical Models  of



   Truncation, Sample Selection and Limited Dependent Variables and a



   Simple Estimator for   Such  Models,"  Annals  of  Economic   and  Social



   Measurement, vol. 5,  pp. 475-492.



	.  1979.  "Sample Selection Bias as a Specification Error,"

   Econometrica, vol. 47, pp. 153-161.



Hurd,  M.    1979.    "Estimation  in  Truncated  Samples  When  There  is



   Heteroscedasticity," Journal of Econometrics, vol. 11, pp. 247-258.



Johnson, N.  L. and  S. Kotz.   1969.   Distributions in Statistics;   Discrete



   Distributions  (New York:  Wiley).



	  and  	.     1970.    Distributions in  Statistics;    Continuous



   Univariate Distributions - I (New York: Wiley).



Kendall, M.  G.  and A.  Stuart.  1973.   Advanced  Theory  of  Statistics,  vol. 3



   (London:   Griffin).



Killingsworth,  M. R.  1983.  Labor  Supply  (Cambridge:   Cambridge University



   Press).



Lin,  T.-F.  and  P.  Schmidt.    1984.    "A  Test  of the Tobit specification

   Against  an  Alternative Suggested  by  Cragg,"  Review  of  Economics and



   Statistics,  vol. 66,  pp.  174-177.



Maddala, G. S.   1977.   Econometrics (New York:  McGraw-Hill).



	.   1983.  Limited-Dependent and Qualitative Variables in  Econometrics



   (Cambridge:   Cambridge University Press).



Manski, C. F. and D. McFadden.   1981.   Structural Analysis of Discrete Data



   with Econometric Applications (Cambridge, Mass:  MIT  Press).



Morey,  E.  R.  1981.  "The Demand for Site-Specific Recreational Activities:



   A  Characteristics  Approach,"  Journal  of  Environmental  Economics and



   Management,  vol. 8,  pp. 345-371.



Nelson, F. D.   1981.   "A Test for Misspecification in the Censored-Normal



   Model," Econometrica,  vol.  49,  pp.  1317-1329.



Olsen, R.  1980.  "Approximating a Truncated Normal Regression with the



   Method of Moments,"  Econometrica,  vol. 48, pp. 1099-1106.



Ostro,  B.    1983.    "The  Effects  of  Air  Pollution  on  Work  Loss and



   Morbidity," Journal of Environmental  Economics and Management, vol. 10,



   pp.  371-382.



Pearson,  K.  and A.  Lee.   1908.   "Generalized Probable Error in Multiple



   Normal Correlations," Biometrika, vol. 6, pp. 59-68.



Pitt,  M.    1983.    "Food Preferences and  Nutrition  in Rural  Bangladesh,"



   Review of Economics  and Statistics,  vol. 65, pp. 105-114.



Portney,  P.  R.  and J.  Mullahy.   1985.   "Urban Air Quality  and  Acute



   Respiratory Illness," Journal of Urban Economics, forthcoming.



Rao, C. R.   1965.   Linear Statistical  Inference and Its Applications, (New



   York:  Wiley).



Rosenzweig,  M.  R.  and  K. I. Wolpin.  1984. "Migration  Selectivity and the



   Effects   of   Public   Programs,"   University   of   Minnesota,  Economic

-------
                                   2-45
   Development Center,  Bulletin




Rosett, R. N.  and  F.  D.  Nelson.  1975.  "Estimation of a  Two-Limit  Probit




   Regression Model,"  Econometrica,  vol.  43,  pp.  141-146.




Royall, R.  1984.  "Robust Inference Using Maximum Likelihood Estimators,"




   Johns Hopkins University, Department of  Biostatistics Working  Paper.




Schmidt, P.  1976.   Econometrics (New York:   Marcel  Dekker).




Smith, M. and G. Maddala.  1983.  "Multiple Model Testing for Non-Nested




   Heteroscedastic Censored Regression Models,"  Journal  of Econometrics,




   vol. 21, pp. 71-81.




Stapleton, D.  and  D.  Young.    1984.    "Censored  Normal  Regression with




   Measurement Error on  the Dependent Variable,"   Econometrica, vol. 52,




   pp. 737-760.




Stewart, M. B.  1983.  "On Least Squares Estimation when the Dependent




   Variable is Grouped," Review of Economic Studies, vol. 50, pp.




   737-753.



Tobin,  J.    1957.   "Estimation  of Relationships  for  Limited Dependent




   Variables," Econometrica, vol. 26, pp. 24-36.




Wales, T.  and  A. Woodland.   1983.  "Estimation of  Consumer Demand Systems




   with Binding Non-Negativity Constraints," Journal of Econometrics,



   vol. 21, pp. 263-285.



White, H.  1982.   "Maximum  Likelihood  Estimation of Misspecified Models,"




   Econometrica, vol. 50, pp. 1-25.



White, H.   1983.   "Corrigendum," Econometrica, vol.  51, p. 513.

-------
                              Chapter 3




            AIR POLLUTION  MONITORS  AND INDIVIDUAL EXPOSURES








     The models estimated in Volume I typically utilized as measures of




an  individual's  exposure  the   pollutant-specific  readings  from  the




monitor nearest the centroid of  the respondent's census tract for which




the  data were  available.    In  most  cases,  screening criteria  were




established so that it  was necessary  both for a monitor to have recorded




at least some minimal number of hourly readings during the two-week




period  and  for  the  monitor  to  be  located   not  further  than  some




prescribed distance (20 miles; 10 miles) from the residents' census




tract centroids.



     It is  possible  that the nearest-monitor readings  we  utilized are




not representative of  the pollution  "profile"  of  the  metropolitan area




in which each respondent lives.   If some average of the readings from a




number of nearby monitors better characterizes the ambient




concentrations facing the individuals in question,  then the consistency




of  results  obtained using nearest-monitor readings must be called into



question.   (This  abstracts, of course,  from  the larger question of the




ability  of   ambient  monitors  at   all  to  measure  the  exposure  of




individuals.)




     The purpose of this very brief chapter is to assess the pollution




profiles constructed  using the  nearest-monitor readings  versus  those




that  result  when  the  average readings from a number  of  monitors are




used.   The  extent  to which the  two constructs  are correlated indicates



the sensitivity of our  results to the use of nearest-monitor readings to

-------
                                  3-2



characterize exposure.



     The procedure is as follows.   For each of the six pollutants used



in our study (ozone, sulfates, TSP, NO2, CO, and SO2) we utilized the



data  for  the  14,441  adults  in  the  main  sample  and  constructed the



nearest-monitor  measures used in the main study.  These were designated



as XXNR01, where XX is the specific pollutant (O3, S4, SP, N2, CO, S2).  In



our   study,    recall,    these   measures    were    subjected    to   a



miles-from-census-tract-centroid cutoff of  5,  10,  or,  most often,  20



miles; the specific distance will be obvious from the context.   (No



minimal hours standard is used here.)



     For  these   same  individuals,  we  then  constructed  two  averaged



measures for  each pollutant.   The two measures constructed  were the



simple  average over all  the  available readings from monitors within 10



and then within 20 miles of the census  tract  centroids.   These measures



were designated XXAVGYY, where XX was as defined above and YY was either



10 or 20.  Thus,  N2AVG20 is the  average  of all  nitrogen  dioxide monitors



within 20  miles  of the census  tract  centroid.



     Then, given  these  measures,  we  calculated for  each pollutant the



correlation  between  the nearest-monitor  reading and the area-average



reading at both the 10 and 20 mile cutoff values.  We also calculated the



number of monitors used to construct the two area-averages.



     The results are presented in the tables that follow.   In each case,



"r" is  the simple  correlation  coefficient between the nearest-monitored



reading and the 10 or 20 mile averaged readings.
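
The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run for the report: the function names, the toy monitor locations and readings, and the use of planar coordinates already expressed in miles are all hypothetical, and the limit of ten matched monitors per respondent is not modeled.

```python
import math

def distance_miles(a, b):
    """Planar distance between two (x, y) points already expressed in miles."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def exposure_measures(centroid, monitors, cutoff_miles):
    """Return (nearest_reading, area_average, n_monitors) for one respondent.

    monitors is a list of ((x, y), reading) pairs. Returns None when no
    monitor lies within cutoff_miles of the census tract centroid.
    """
    in_range = [(distance_miles(centroid, loc), reading)
                for loc, reading in monitors
                if distance_miles(centroid, loc) <= cutoff_miles]
    if not in_range:
        return None
    nearest_reading = min(in_range)[1]      # reading at the closest monitor
    readings = [reading for _, reading in in_range]
    return nearest_reading, sum(readings) / len(readings), len(readings)

def pearson_r(xs, ys):
    """Simple correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical monitors: ((x, y) in miles, two-week reading in ppm).
monitors = [((0.0, 1.0), 0.050), ((3.0, 4.0), 0.040),
            ((0.0, 15.0), 0.030), ((30.0, 0.0), 0.090)]
# Hypothetical census tract centroids for four respondents.
centroids = [(0.0, 0.0), (2.0, 3.0), (25.0, 0.0), (0.0, 12.0)]

pairs = [exposure_measures(c, monitors, 20) for c in centroids]
pairs = [p for p in pairs if p is not None]   # drop respondents with no monitor
nearest = [p[0] for p in pairs]               # analogue of XXNR01
averaged = [p[1] for p in pairs]              # analogue of XXAVG20
print("r =", pearson_r(nearest, averaged))
```

Rerunning the last few lines with a cutoff of 10 instead of 20 gives the analogue of the 10-mile comparison.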

-------
                             3-3
                                  OZONE

10 mile:   r = .965,   N = 8,323

     Number of Monitor Readings in Area-Average:

          n      f(n)          n      f(n)
          1      3832          6       244
          2      2303          7       262
          3       985          8       179
          4       553          9       112
          5       302         10        51

                    Mean       Max       Min
     O3NR01        .0454      .251         0
     O3AVG10       .0460      .236         0

20 mile:   r = .931,   N = 11,241

     Number of Monitor Readings in Area-Average:

          n      f(n)          n      f(n)
          1      2665          6       679
          2      2463          7       646
          3      1447          8       550
          4       954          9       631
          5       847         10       359

                    Mean       Max       Min
     O3NR01        .0450      .251         0
     O3AVG20       .0461      .225      .003
-------
3-4
                                 SULFATES

10 mile:   r = .952,   N = 5,249

     Number of Monitor Readings in Area-Average:

          n      f(n)          n      f(n)
          1      2693          6       134
          2      1134          7       109
          3       401          8       101
          4       308          9       116
          5       247         10         6

                    Mean       Max       Min
     S4NR01       10.528    52.136         0
     S4AVG10      10.544    52.136         0

20 mile:   r = .912,   N = 7,512

     Number of Monitor Readings in Area-Average:

          n      f(n)          n      f(n)
          1      2595          6       526
          2      1230          7       559
          3       823          8       329
          4       614          9       250
          5       559         10        27

                    Mean       Max       Min
     S4NR01       10.590    52.136         0
     S4AVG20      10.523    52.136     1.586
-------
                             3-5
                                   TSP

10 mile:   r = .859,   N = 12,598

     Number of Monitor Readings in Area-Average:

          n      f(n)          n      f(n)
          1      1851          6      1039
          2      1283          7       975
          3      1275          8      1207
          4       979          9      1455
          5      1012         10      1522

                    Mean       Max       Min
     SPNR01       70.478   284.004     9.996
     SPAVG10      72.021   253.?28    15.092

20 mile:   r = .818,   N = 13,772

     Number of Monitor Readings in Area-Average:

          n      f(n)          n      f(n)
          1       822          6      1272
          2       895          7      1373
          3       702          8      1429
          4       797          9      2330
          5      1084         10      3068

                    Mean       Max       Min
     SPNR01       70.128   284.004     9.996
     SPAVG20      71.948   272.244    15.092
-------
                             3-6
                                   NO2

10 mile:   r = .951,   N = 6,393

     Number of Monitor Readings in Area-Average:

          n      f(n)          n      f(n)
          1      3195          6        96
          2      1593          7        36
          3       775          8         1
          4       485          9         0
          5       212         10         0

                    Mean       Max       Min
     N2NR01      117.857   435.316         0
     N2AVG10     117.323   435.316         0

20 mile:   r = .923,   N = 8,452

     Number of Monitor Readings in Area-Average:

          n      f(n)          n      f(n)
          1      3004          6       668
          2      1890          7       692
          3       747          8       398
          4       442          9       168
          5       417         10        26

                    Mean       Max       Min
     N2NR01      112.913   435.316         0
     N2AVG20     111.646   375.928         0
-------
                             3-7
                                   CO

10 mile:   r = .887,   N = 8,921

     Number of Monitor Readings in Area-Average:

          n      f(n)          n      f(n)
          1      3638          6       288
          2      2087          7       280
          3       946          8       235
          4       676          9       146
          5       510         10       115

                    Mean       Max       Min
     CONR01        3.306    26.583         0
     COAVG10       3.937    26.583         0

20 mile:   r = .838,   N = 10,939

     Number of Monitor Readings in Area-Average:

          n      f(n)          n      f(n)
          1      2344          6       480
          2      2536          7       298
          3      1130          8       553
          4      1215          9       838
          5       984         10       561

                    Mean       Max       Min
     CONR01        3.717    26.583         0
     COAVG20       3.808    25.111         0
-------
                             3-8
                                   SO2

10 mile:   r = .857,   N = 8,842

     Number of Monitor Readings in Area-Average:

          n      f(n)          n      f(n)
          1      2976          6       356
          2      1855          7       182
          3      1280          8       206
          4       966          9       192
          5       770         10        59

                    Mean       Max       Min
     S2NR01       68.591   760.088         0
     S2AVG10      69.375   568.988         0

20 mile:   r = .819,   N = 10,784

     Number of Monitor Readings in Area-Average:

          n      f(n)          n      f(n)
          1      2414          6       919
          2      1733          7       726
          3      1204          8       841
          4      1129          9       541
          5      1069         10       208

                    Mean       Max       Min
     S2NR01       66.222   760.088         0
     S2AVG20      67.050   568.988         0
-------
                                  3-9



     The results are  quite  reassuring  about  the use of nearest-monitor



data to proxy individual exposure.   The correlation coefficients between



the nearest-monitor reading and  the average of all  monitors  within 10



miles range  from  .965 for ozone  (a highly  dispersed pollutant)  to .86



for TSP and SO2 (more localized pollutants).   The 20-mile correlations



follow similar relationships,  but are  of course somewhat lower than the



10-mile correlations due to the decreased weight of the nearest-monitor



reading  in   calculating  the 20-mile  averages.   What  is particularly



encouraging is that no correlation coefficient is below 0.8,  leading us



to suspect that  the use of the  nearest-monitor  reading would be unlikely



to impart any systematic  biases vis-a-vis use of area-averaged readings.



In the following chapter, we make  use  of  the  area-averaged readings to



test this suspicion.

-------
                                 Chapter  4






              URBAN AIR QUALITY AND ACUTE RESPIRATORY ILLNESS






4.1   Introduction




     Over the past fifteen years, economists interested in the benefits of




air pollution  control  have concerned  themselves  with  more than  just the




appropriate valuation of health  gains  and  losses.   In addition, some have




explored  in epidemiological  analyses  the  actual  physical  relationships




between air pollution and health itself using statistical techniques common




in  the  social  and natural  sciences.    Most  of  these studies  have  used




aggregate  data at  the  city  or SMSA  level to  test  for  the  effects  of




prolonged  exposures  to  air  pollution  on  the  mortality rates  across the




units of observation.  The studies of  Lave  and Seskin  [8], Crocker, et al.




[2], Mendelsohn and Orcutt  [12],  Chappie  and Lave [1],  and Lipfert [10] are




among the best examples.



     Relatively less  attention has been given  in this literature  to the




relationship between air pollution and sickness (or morbidity).   This  is



unfortunate  because  morbidity  is observed  much  more  frequently  than




mortality and may be of greater economic  significance than premature death.




When researchers  have examined  possible links between air  pollution and



morbidity, they have generally been forced  through lack of data to do so in




the absence of information about individuals' socioeconomic and other




characteristics—even though  these characteristics  may have  an important




effect on health status.

-------
                                    4-2






     Volume I  presents  our  recently completed comprehensive investigation



of the effects of ozone (ground-level rather than stratospheric) and other



air pollutants  on individuals'  acute and  chronic health status.   Unlike



many  previous   studies,  this  work  is  based  on a  large  and  relatively



detailed  individual  data  base,  allowing  controls  for  certain important



socioeconomic   and   demographic  characteristics   in  addition   to  the



meteorological measures sometimes included in earlier studies using either



aggregate or less detailed individual data.  This chapter presents some of



the major  findings  concerning the  effects of urban  air  quality  on acute



respiratory disease  using  an estimation  technique not  employed in Volume I.



Chapter 7 reports some new findings  on air  pollution and chronic illness.



     Of particular concern here is  the  sensitivity  of the findings to the



measures of air  quality used.   As suggested above,  most previous analyses



of  the health  effects associated  with air  pollution  have characterized



individual exposures using some measure of air quality averaged over most



or  all of  the  monitors in  the urban  areas  where  the  individuals live.



However,  many  persons  may  get most  or  all of  their  ambient  exposure



proximate to the monitors  nearest  their  homes.  As part of our larger study



in Volume  I,  therefore,  each individual in the  sample  was  matched to the



nearest  ten  air pollution monitors  for  each  of   eight   different  air



pollutants so  as to use  close-to-home  pollution readings to characterize



exposure.   Because  this was  very resource-intensive, it is important to



illustrate  the  difference such  an  approach  may  make  when  estimating



dose-response relationships.    Additional sensitivity analyses in this



chapter  explore interactive  effects  as well  as possible  thresholds  and

-------
                                    4-3





non-linearities in the relationship between air pollution and acute health



status.



     In Section II  we briefly describe the data  used  in our analysis and



the  independent  variables  we include.   In Section  III,  we  discuss the



estimating techniques used to explore possible links between air pollution



and  acute  respiratory disease.   In  Section  IV we present  our empirical



findings and  in Section  V we draw  some  cautious  inferences  from them for



applied welfare calculations.








4.2   Framework for the Analysis



     The individual  data  underlying both our  larger study as  well as the



present  chapter   come from   the  1979  Health   Interview Survey  (HIS)—a



nationwide sample  of approximately 110,000  individuals conducted during the



course  of  each year  by  the National Center for  Health Statistics.   All



acute illness experienced during the two-week period prior to the date of



each interview was  to be  reported by each respondent  or the family member



responding  for him  or  her.    Manifestations of  these  illnesses  were



classified into three types: bed disability days (the most serious of the



three categories), work or school loss days, and what might best be thought



of  as  minor  restricted activity days.   The latter  are  days  on which the



respondent was neither bed-ridden nor  forced to miss work or school but did



suffer from an acute  impairment sufficient to  cause him or her to restrict



activity in some noticeable  way.   The dependent variable in the subsequent



analysis is total restricted activity days, the total number of days during



the  two-week  period on which  any  of these three types of  acute illness

-------
                                    4-4






occurred.  Finally, all acute (and chronic) health information elicited in



the survey was  coded by cause, using  the  International  Classification of



Disease.   Attention is limited in  this  chapter to total restricted activity



days due to respiratory disease since this is the type of acute impairment



most likely to result from  exposure  to  air  pollution.



     The socioeconomic  data elicited  from each respondent in  the Health



Interview Survey include, among many other individual and



household-specific characteristics,  information on age,  race,  sex, income,



and education.  In addition,  several supplements to the 1979 survey made it



particularly useful  for  epidemiological  purposes.   Specifically,  the 1979



HIS contained a  supplemental questionnaire asked of one-third  of  all the



adults interviewed  (26,271   of  a  total  of  79,743  adults)  which  provided



detailed data on  lifetime  smoking history,  including  the tar  and nicotine



content of the brands most commonly smoked.  Smoking data are obviously of



great  importance  if  one  is interested  in exploring the  determinants  of



respiratory and  other types  of disease.   The 1979  HIS also  included  a



supplement (again to one-third of all adults surveyed) designed to provide



detailed information on residential  histories.  This  is  not important for



our present purposes but will play a major role in our analysis in Chapter



7 of the determinants of chronic respiratory and other types of  disease.



     All air pollution data come from the Environmental Protection Agency's



SAROAD system.  For  our analysis of  the  relationship between air pollution



and acute morbidity in the larger  study,  all air quality  data were specific



to  the  two-week  recall  period  for which  individual   health data were



available.    This  is also  the case here,  save for  sensitivity  analyses

-------
                                    4-5






conducted using annual average  data  as  a proxy for air quality during the



two-week  period.    As  indicated  above,  most  of  the  analysis  below



characterizes individuals' exposures to  air  pollution using  data from the



air  pollution monitors  nearest  their   residences.    No  individuals  are



included in  the  final  sample if  the nearest monitor  for any pollutant is



more than ten miles  away.   The average  distance to the nearest monitor is



slightly more  than four miles.   In addition to the air  pollution data,



meteorological data were added from the monitoring network of the National



Oceanic  and  Atmospheric  Administration.    Included  are  observations  on



temperature and precipitation during  the  two-week recall period.



     The  overall  sample  from  which the  subsample  used here is  drawn



consists  of  14,441  individuals  aged seventeen  and  above for  whom  both



smoking  data and at least  some air pollution data were  available.   The



models  estimated  below  are based on a  smaller  subsample, however,  since



complete  data are  required  for  each  of  the air  pollutants  and  other



independent variables.



     The analysis of acute respiratory disease includes air pollution data



during  the two-week  recall  period for ozone,  a  gaseous  pollutant  that is



the primary constituent of smog, as well as sulfates, perhaps the most



harmful  of the  airborne  particles.   It  is  worth noting  that the computer



algorithm used to  match  individuals  to the  ten  nearest  ozone and sulfate



monitors  could  only be  used  for  monitors  within  SMSAs.    Thus,   the



estimation sample consists  entirely of city and suburban residents from



around  the United States.   Table  4-1  lists the independent variables  used



in the analysis of acute  respiratory  disease  and their sample means.

-------
                                    4-6
Table 4-1.   Variable Definitions and Sample Means

Variable Name   Description                                  Sample Mean

OZNEAR          Average daily maximum one-hour ozone               0.042
                reading during two-week recall period
                at monitor nearest the centroid of
                respondent's census tract of residence
                (in parts per million)

S4NEAR          Average 24-hour sulfate concentration             10.876
                during two weeks at nearest monitor
                (see above) (in micrograms per
                cubic meter)

OZAV10          Average daily maximum one-hour                     0.043
                ozone reading during two weeks
                averaged over all monitors within
                a ten mile radius of respondent's
                census tract centroid

S4AV10          Average 24-hour sulfate concentration             10.890
                during two weeks averaged as in OZAV10

OZAV20          Same as OZAV10 but averaged over all               0.044
                monitors within 20 mile radius

S4AV20          Same as S4AV10 but averaged over all              10.700
                monitors within 20 mile radius

OZANNR          Average daily maximum one-hour ozone               0.042
                concentration over entire calendar
                year 1979 as measured at the nearest
                monitor

S4ANNR          Average 24-hour sulfate concentration             10.752
                over calendar year 1979 as measured at
                the nearest monitor

OZAN10          Same as OZANNR but averaged over all               0.043
                monitors within ten mile radius

S4AN10          Same as S4ANNR but averaged over all              10.709
                monitors within 10 miles

OZAN20          Same as OZAN10 but averaged over all               0.044
                monitors within 20 miles

-------
                                    4-7
Table 4-1  (cont'd).   Variable Definitions and Sample Means

Variable Name   Description                                  Sample Mean

S4AN20          Same as S4AN10 but averaged over                  10.588
                all monitors within 20 miles

WHITE           Dummy variable: 1 if white,                        0.852
                0 otherwise

MALE            Dummy variable: 1 if male,                         0.436
                0 if female

INCOME          Annual household income                           17,152
                in dollars

AGE             Age in years                                       42.30

CIGS            Number of cigarettes smoked per day                 7.58

FORMER          Dummy variable: 1 if respondent                     0.20
                formerly smoked regularly but does
                not presently, 0 if not

SCHLYR          Years of education completed                       11.73

CHRONIC         Dummy variable: 1 if respondent                     0.17
                has any limitation in activity due
                to chronic illness, 0 otherwise

MAXTMP          Average daily maximum temperature                  64.02
                during two-week period

RAIN            Average daily rainfall during                       0.12
                two-week period

RRAD            Number of respiratory-related restricted           0.162
                activity days during two-week recall
                period


-------
                                    4-8






4.3  Model Specification



     For  reasons  of  economy and  computational  simplicity,  most  of the



models in Volume  I  were estimated using  ordinary least squares and  logit



techniques  (where the  dependent variable  was,   respectively,  either the



number  of  days  of  a  particular kind  of impairment  during  the two-week



recall period or  a  dichotomous  indicator  of an individual having at  least



one day of that kind of impairment  during  the  period).   As Chapter 2 points



out, however, estimation techniques like OLS are not ideally suited to the



nature of our measure of acute health status.  Recall that this



measure  is   the   number  of  respiratory-related   restricted  activity  days



during the two-week  recall  period (RRADs).  Clearly this measure  is bounded



by zero and fourteen and because of  survey protocol can assume only integer



values in {0,1,2,...,14}.   The frequency distribution of RRADs for the



sample of 3,347 adults is presented in Table 4-2.   Because of the small



number of observations at the  upper  (14  day)  limit,  the implications of



this  upper  bound for  estimation  strategy are  ignored in  the following



analysis; we concentrate  instead  on  the complications  arising from the



overwhelmingly large number of  individuals reporting zero  RRADs.



     A standard approach in such circumstances is to use the Tobit or
censored normal estimator, where one observes T independent observations on
$y_t$ which are the realizations of random variables $Y_t^*$ subject to the
censoring rule $y_t = \max(0, y_t^*)$, $Y_t^* \sim N(X_t\beta, \sigma^2)$.
However, estimates
obtained using the Tobit model are generally inconsistent when the
underlying data are not distributed as censored normal with

-------
                                    4-9





Table 4-2.  RRAD Frequency Distribution

          RRAD        # OBS          %

             0         3227      96.42
             1           25       0.75
             2           28       0.84
             3           23       0.69
             4            9       0.27
             5            7       0.21
             6            2       0.06
             7            3       0.09
             8            3       0.09
            10            3       0.09
            11            1       0.03
            12            1       0.03
            14           15       0.45

-------
                                   4-10





independent, identically distributed errors.   Estimating a Tobit model of



RRADs using the  two-week average pollution data  from  the nearest monitor



and the other independent variables in Table 4-1  above,  some tests for its



appropriateness were conducted and strong evidence of misspecification was



found.   While this  might  be  attributable  to omitted  variables  or other



factors unrelated to departures from the usual assumptions about the error



distribution  in   the  Tobit  model,  a  different  statistical  approach is



utilized here.



     In  modeling event  counts  (non-negative  integer  data)  over  a  time



interval  (t,t+dt),  the  Poisson  distribution  is commonly  used.   Here,



discrete random variates $Y_t$ follow the probability law:

(1)     $\Pr(Y_t = y_t) = \begin{cases} e^{-\lambda_t}\lambda_t^{y_t}/y_t!, & y_t = 0, 1, 2, \ldots \\ 0, & \text{else} \end{cases}$

with $E(Y_t) = \mathrm{Var}(Y_t) = \lambda_t$.  Given the nonnegative integer nature of the
RRAD measure, such a probability law has obvious appeal for estimation.
Analogous to the normal distribution, where for econometric work one
typically specifies $E(Y_t) = \mu_t = X_t\beta$, the parameter of the Poisson
distribution can be reparameterized to admit the influence of covariates.
Since for all t, $\lambda_t > 0$, a straightforward approach is to assume
$\lambda_t = \exp(X_t\beta)$ and to estimate $\beta$ by maximum likelihood (see Hausman, Hall,
Griliches [5], Hausman, Ostro, Wise [4]).   This is the approach adopted
here for modeling the RRAD outcomes.

-------
                                   4-11
     A drawback of the Poisson specification is the restriction that
$E(Y_t) = \mathrm{Var}(Y_t)$.  Should this restriction not characterize the data, the maximum
likelihood estimates of the covariance matrix of $\beta$ will be inconsistent and



asymptotic t-tests based thereon would be misleading.   Hausman,  Ostro,  and



Wise  circumvent  this  restriction   by  allowing for  an  overdispersion



parameter.  We  take a different  approach here,  using an  estimator of  the



covariance  matrix  that  is  more  robust   against   departures   from  the



restriction that  the  mean  be  equal   to the  variance.    Details of  this



procedure are presented in the appendix.
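
As a rough check on why the equal mean and variance restriction is a concern here, the short Python sketch below computes the raw mean and variance of RRADs from the frequency distribution in Table 4-2. It describes only the marginal distribution; it does not condition on the covariates, so it is suggestive rather than a formal test of the restriction.

```python
# RRAD frequency distribution as reported in Table 4-2: {days: observations}
rrad_counts = {0: 3227, 1: 25, 2: 28, 3: 23, 4: 9, 5: 7, 6: 2,
               7: 3, 8: 3, 10: 3, 11: 1, 12: 1, 14: 15}

n = sum(rrad_counts.values())
mean = sum(days * freq for days, freq in rrad_counts.items()) / n
variance = sum(freq * (days - mean) ** 2
               for days, freq in rrad_counts.items()) / n

print(f"N = {n}, mean = {mean:.3f}, variance = {variance:.3f}")
print(f"variance / mean = {variance / mean:.1f}")
```

The mean reproduces the 0.162 sample mean shown for RRAD in Table 4-1, while the unconditional variance comes out at roughly eight times the mean.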



     Given the assumptions on the parameterization of the $\lambda_t$, the
log-likelihood function to be maximized is:

(2)     $\ell = \sum_t \left[ -\exp(X_t\beta) + y_t X_t\beta \right] + c,$

where $X_t$ is the vector of independent variables as described in Table 4-1, $y_t$
is the observed RRAD count for individual t, and c does not depend on $\beta$.
The ML estimate of $\beta$ satisfies:

(3)     $\partial\ell/\partial\beta = \sum_t \left( -\exp(X_t\beta) + y_t \right) X_t' = 0.$

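
Equations (2) and (3) can be solved numerically by Newton-Raphson, as in the Python sketch below. It is a minimal illustration on simulated data, not the code used for this report. The function names are hypothetical, and the robust covariance shown is the familiar sandwich form associated with White (1982); whether the appendix procedure takes exactly this form is an assumption made here.

```python
import numpy as np

def fit_poisson(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson for the Poisson log-likelihood with lambda_t = exp(X_t b)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        lam = np.exp(X @ beta)
        score = X.T @ (y - lam)          # left-hand side of equation (3)
        hessian = -(X.T * lam) @ X       # matrix of second derivatives of (2)
        step = np.linalg.solve(hessian, score)
        beta = beta - step               # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return beta

def covariance_estimates(X, y, beta):
    """Return (conventional ML covariance, sandwich covariance) for beta."""
    lam = np.exp(X @ beta)
    A = (X.T * lam) @ X                  # minus the Hessian
    B = (X.T * (y - lam) ** 2) @ X       # sum of outer products of the scores
    A_inv = np.linalg.inv(A)
    return A_inv, A_inv @ B @ A_inv

# Simulated data only; none of these numbers come from the report.
rng = np.random.default_rng(0)
n_obs = 2000
X = np.column_stack([np.ones(n_obs), rng.normal(size=n_obs)])
true_beta = np.array([-1.0, 0.5])
y = rng.poisson(np.exp(X @ true_beta))

beta_hat = fit_poisson(X, y)
cov_ml, cov_robust = covariance_estimates(X, y, beta_hat)
print("estimates:        ", beta_hat)
print("ML std. errors:   ", np.sqrt(np.diag(cov_ml)))
print("robust std. errors:", np.sqrt(np.diag(cov_robust)))
```

When the data really are Poisson, as in this simulation, the two sets of standard errors should be close; they diverge when the variance differs from the mean.
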
4.4   Empirical Results



     Table 4-3  presents  the results  of  our  basic  model  and the  variants


designed  to  test  the sensitivity  of  the results to  assumptions  about

-------
                                   4-12






individual exposures to ambient air  pollution.   In specifications (3.1) -



(3-3) each  individual's  count of  respiratory restricted  activity days is



hypothesized to be related to ambient  air  quality during the individual's



two-week recall period.   In   (3.1) exposures  are  proxied by readings from



the one ozone and one sulfate monitor nearest each individual's residence;



in specifications  (3-2)  and  (3.3).  readings are averaged,  respectively,



over all  monitors  within  10  and  20  miles  of each respondent's residence.



Specifications (3.4)  -  (3-6)  use  annual  1979  average air pollution readings



as a proxy  for air  pollution  exposure  during each recall period.   As in



(3«D ~  (3»3)»  equation  (3-4)  uses  the  annual  average a.t  the nearest



monitor  to proxy individual exposure while (3-5) and (3.6) use the average



of the annual averages  of  all  monitors  within 10 and 20 miles respectively.



     Table 4-3 indicates that of  the non-pollution variables, race, income



and  temperature  are  related  consistently  across models to  RRADs  in a



statistically significant way—with  whites,  those with lower incomes, and



those exposed to  colder temperatures  all experiencing relatively more acute



respiratory  illness  during  the   two-week  recall  period.   Because those



reporting the presence of  a chronic illness would  be expected to experience



more restrictions in activity during any two-week period, a dummy variable



identifying  such  individuals was   included.    As  expected,  this  dummy



variable was  positively and significantly related to  the number of RRADs.



Finally, while always of the  expected sign, the number  of  cigarettes smoked

-------
                                  4-13
Table 4-3.   Model Estimates:  Sensitivity to Air Pollution Measurement
         (Dependent variable is RRADs during two-week recall period)

Independent                               Model
Variable            3.1        3.2        3.3        3.4        3.5        3.6

OZNEAR            6.883
                 (1.97)
OZAV10                       6.614
                            (1.91)
OZAV20                                  9.324
                                       (2.41)
OZANNR                                            17.603
                                                     (?)
OZAN10                                                       19.449
                                                             (2.88)
OZAN20                                                                  17.473
                                                                        (2.12)
S4NEAR           -0.005
                 (0.22)
S4AV10                     -0.0210
                            (0.67)
S4AV20                                 -0.046
                                        (1.4)
S4ANNR                                           -0.0175
                                                  (0.41)
S4AN10                                                      -0.0558
                                                             (1.34)
S4AN20                                                                 -0.0765
                                                                        (1.87)
WHITE             1.261      1.258      1.249      1.165      1.163      1.188
                 (2.87)     (2.86)     (2.85)     (2.65)     (2.65)     (2.72)

-------
                                   4-14
Table 4-3 (cont'd.)   Model Estimates:  Sensitivity to Air Pollution
                   Measurement  (Dependent variable is RRADs during two-week
                   recall period)

Independent                               Model
Variable            3.1        3.2        3.3        3.4        3.5        3.6

MALE             -0.054     -0.058     -0.064     -0.062     -0.065     -0.055
                 (0.19)     (0.21)     (0.23)     (0.22)     (0.23)     (0.20)
INCOME        -0.000035  -0.000035  -0.000035  -0.000035  -0.000034  -0.000033
                 (2.31)     (2.30)     (2.28)     (2.27)     (2.22)     (2.19)
AGE             0.00031    0.00050    0.00086    0.00076     0.0013     0.0013
                 (0.05)     (0.08)     (0.14)     (0.13)     (0.23)     (0.21)
CIGS              0.015      0.015      0.016      0.016      0.016      0.016
                 (1.53)     (1.56)     (1.62)     (1.71)     (1.70)     (1.72)
FORMER            0.312      0.319      0.323      0.340      0.344      0.328
                 (0.89)     (0.91)     (0.92)     (0.98)     (0.99)     (0.94)
SCHLYR           0.0067     0.0066     0.0067   0.000062     0.0035     0.0010
                 (0.17)     (0.17)     (0.17)     (0.02)     (0.09)     (0.03)
CHRONIC           0.776      0.769      0.760      0.707      0.071      0.720
                 (2.45)     (2.42)     (2.39)     (2.18)     (2.19)     (2.24)
MAXTMP           -0.019     -0.013     -0.021     -0.016     -0.017     -0.013
                 (2.45)     (2.54)     (2.70)     (2.18)     (2.38)     (2.48)
RAIN              1.629      1.735      1.952      1.801      1.992      2.049
                 (1.07)     (1.13)     (1.28)     (1.12)     (1.30)     (1.34)
INTERCEPT        -2.127     -1.993     -1.780     -2.559     -2.257     -1.950
                 (2.06)     (1.92)     (1.67)     (2.26)     (1.93)     (1.66)

N                 3,347      3,347      3,347      3,347      3,347      3,347
L*               -741.5     -740.9     -732.0     -710.0     -707.0     -712.0

L*  =  Log likelihood

(Asymptotic normal statistics for $H_0$: $\beta = 0$ in parentheses)

-------
                                   4-15






per day and the  dummy  variable indicating that the respondent is a former



smoker were not  significant  at conventional  levels,  a somewhat surprising



finding given the concentration on respiratory  disease.



     The main  focus of  our  analysis  is  the  relationship  between acute



respiratory disease (as measured by RRADs)  and  urban  air quality.  As Table



4-3 indicates, in  only  one  of the six specifications is the hypothesis of



no relationship between ozone and RRADs not rejected at the 95 percent (or higher)



level.    This finding is fully consistent with the analysis  in  Volume I



where we used different samples, estimating techniques, and  combinations of



independent variables—including  monitored readings  for  as many  as  five



separate air pollutants.  There, positive and significant associations
between ozone and RRADs in adults were frequent although not uniform.



     The statistical significance of the ozone coefficients is not altered



appreciably by using monitored readings averaged over 10 or 20 miles rather



than readings at the nearest monitor.  This is intuitively  plausible since



ozone tends to be a diffuse  (as opposed to  a "hot-spot") pollutant.  To the



extent they are generalizable, our findings suggest that city or SMSA-wide



average  readings  may   be   preferable   to  nearest-monitor  readings  to



characterize individual exposure to ozone in view  of  the resources required



to obtain the latter.



     Using air pollution  data averaged  over the  entire year during which



the health  interview took place—models  (3.4) -  (3.6)—results  in larger



estimated  coefficients  and  higher asymptotic   t-ratios  for  the  ozone



variable than when air  quality data contemporaneous to the recall period



are  used.     The  importance  of  this finding should  be  discounted,  we

-------
                                   4-16



 believe.    So  long  as  one  is  concerned with  the possible  relationships


 between urban air quality and day-to-day variations in acute morbidity,  the


 correct  measure of  pollution must  be one  which is  coincident  with, or


 slightly  precedes, the period during which health status is  being  observed.


 To  illustrate,  consider  an  individual  interviewed for the  HIS on  January


 15,  1979.   Clearly, using  1979  annual average air pollution readings  for


 ozone  and sulfates  to help explain RRADs between January 1-14 brings into

play 50  weeks  of  data  which could  have no  effect  whatsoever on  health


during the recall period.   For this reason, the use of contemporaneous  (or


"real  time")  air  pollution data  should  be  considered  the conceptually


correct approach when analyzing acute respiratory  disease.


     Based on the findings in Table 4-3 we cannot reject the hypothesis of


no relationship  between ambient sulfate concentrations  and RRADs during the


two-week recall  period.   It should  be noted, however,  that sulfates and


other particulates are generally monitored only every six days.  Thus, any


two-week period  will  contain at most three 24-hour sulfate measurements and


this may  affect the findings.   (Ozone, on  the other hand,  is monitored


continuously and is measured in specifications (3.1) - (3.6) by the average


daily  maximum   one-hour  reading—measured  during  the  recall  period  or


annually depending on  the equation.)    Note also that the  coefficient on


sulfates  is more  sensitive to  the  choice  of  exposure  proxy.   This is


because concentrations of  sulfates  and other particulates exhibit greater

variation within an  area  than does  ozone.    (It should be noted here that


the sample correlation between OZNEAR and S4NEAR is 0.108.  We conducted

-------
                                   4-17






tests for  possible  degradation  of  parameter  estimates  due to collinearity



but found no evidence thereof.)



     Prior  clinical  and  epidemiological  analyses  suggest   the  possible



importance of interactive or synergistic effects of certain air pollutants



(see  Hazucha and   Bates  [7]  and  Graves  and Krumm  [3], for  instance).



Accordingly, the existence  of such an  effect between ozone  and sulfates



(OZXS4)   is  tested.    The  results  are presented  in  specification (4.1)  in



Table  4-4,   and do not  support   the  hypothesis  that  such  effects  are



important. In (4.2)  another  hypothesized  interactive effect is tested, that



between  ozone and average maximum  temperature (OZXTEMP)  during  the recall



period.   Again, no  evidence of such an effect is found.   These results are



consistent  with the  more  extensive  analysis  of  interactive effects  in



Volume I.



     So-called threshold effects or other  types  of non-linearities in the



relationship between  ozone and  RRADs are potentially important  and are



tested  for  here.    To  see whether  the relationship  with RRADs  differs



between lower and higher concentrations, the sample was twice divided into
two separate regimes, once with the dividing point being 0.05 ppm and once
with it being 0.075 ppm.



Separate coefficients were estimated on the ozone variable in the lower and



higher   regimes.      In   this  specification  ozone   is   positively  and



significantly associated with  the  expected  number of RRADs in regimes both



above and below  0.05 ppm.   A casual inspection of the coefficients in (4.3)



could convey the  impression that a marginal  change in ozone will  have a



larger impact on RRADs  at lower than at higher  concentrations.   In fact,



this  is not  the case.    When the  first   derivatives of the  estimating

-------
                                       4-18
Table 4-4.  Model Estimates:   Alternative Specifications  (Dependent variable is
            RRADs during two-week recall  period)

Independent                                 Model
Variable                 4.1         4.2         4.3         4.4         4.5

OZNEAR                 7.410      70.659
                      (1.24)      (1.77)
OZH75                                          9.554
                                              (2.71)
OZL05                                         22.505
                                              (2.11)
(OZNEAR)^2                                                 1.343
                                                          (0.07)
(OZNEAR)^(1/2)                                                         4.926
                                                                      (2.45)
S4NEAR                -0.003      -0.003     -0.0023     -0.0017     -0.0074
                      (0.09)      (0.12)      (0.10)      (0.07)      (0.31)
OZXS4                 -0.047
                      (0.09)
OZXTEMP                           -0.874
                                  (1.65)
WHITE                  1.262       1.235       1.259       1.290       1.239
                      (2.37)      (2.80)      (2.86)      (2.93)      (2.83)
MALE                  -0.054      -0.053      -0.049      -0.043      -0.060
                      (0.19)      (0.19)      (0.18)      (0.15)      (0.21)
INCOME             -0.000035   -0.000036   -0.000036   -0.000035   -0.000036
                      (2.31)      (2.37)      (2.38)      (2.32)      (2.32)
AGE                  0.00031     0.00067   -0.000024     0.00025     0.00034
                      (0.05)      (0.11)      (0.04)      (0.04)      (0.06)

-------
                                       4-19
Table 4-4 (Cont'd.)   Model Estimates:   Alternative  Specifications  (Dependent
                   variable is RRADs during two-week  recall  period)

Independent                                 Model
Variable                 4.1         4.2         4.3         4.4         4.5

CIGS                   0.015       0.015       0.015       0.014       0.151
                      (1.53)      (1.52)      (1.52)      (1.49)      (1.55)
FORMER                 0.312       0.318       0.318       0.303       0.319
                      (0.89)      (0.90)      (0.90)      (0.87)      (0.91)
SCHLYR                0.0067      0.0036      0.0051      0.0063      0.0071
                      (0.17)      (0.09)      (0.13)      (0.16)      (0.18)
CHRONIC                0.776       0.779       0.773       0.773       0.781
                      (2.44)      (2.49)      (2.44)      (2.43)      (2.47)
MAXTMP                -0.019      0.0059      -0.019      -0.013      -0.023
                      (2.44)      (0.32)      (2.50)      (1.85)      (2.92)
RAIN                   1.632       1.763       1.626       1.366       1.776
                      (1.07)      (1.12)      (1.07)      (0.87)      (1.17)
INTERCEPT             -2.152      -3.827      -2.489      -2.225      -2.498
                      (1.92)      (2.29)      (2.26)      (2.14)      (2.46)
N                      3,347       3,347       3,347       3,347       3,347
L                    -2049.4     -2031.2     -2039.2     -2054.3     -2043.1

(Asymptotic normal statistics for H0: $\beta = 0$ in parentheses)

-------
                                   4-20





equation are evaluated at the  appropriate  ozone  concentration  for each of



the individuals in the low  and high regimes and the resulting values then



averaged,  the estimated first  derivative  is nearly twice  as  large in the



high as in the low regime.



     Although the Poisson expectation function $E(RRAD_t) = \exp(X_t\beta)$ is
non-linear, it does imply that the elasticity of $E(RRAD_t)$ with respect to


ozone is linear.  To allow for  greater flexibility, models (4.4) and (4.5)



are estimated using, respectively,  the square and the  square  root of the



ozone concentration during  the  recall  period at the  nearest  monitor.   In



other words, the specification  is:







(4)   $E(RRAD_t) = \exp(Z_t\gamma + \alpha\,(OZNEAR_t)^{\delta})$

where $Z_t$ is the vector of independent variables other than ozone and
$\delta \in \{0.5, 2.0\}$.  When $\delta$ = 0.5, $\alpha$ is positive and statistically significant;
when $\delta$ = 2.0, $\alpha$ is positive but not significant.  In fact, note that (4.5)
has a higher model likelihood than specification (3.1), which is simply
equation (4) with $\delta$ = 1.0, thus indicating that non-linearities in the ozone



specification are important.







4.5.  Policy Implications  and Conclusions



     Ozone  is  one  of  six  air  pollutants  for  which  the  Environmental



Protection    Agency    has    established   maximum   permissible   ambient



concentrations.   The  controversy surrounding revision  of the ozone  standard



in  1978 (see White  [20]),  coupled  with  recent emphasis  on   cost-benefit

-------
                                   4-21





analysis in government regulation  (see Smith  [17]),  make  it worthwhile to



illustrate the changes in acute respiratory health that might be associated



with changed ozone levels.  We use a subset of  the results presented above



to make such an illustrative calculation.  The discussion here is confined



to specifications  where  ozone is  measured  by the average  daily one-hour



maximum  during the  two  weeks  at  the  monitor  nearest  the  respondent's



residence.



     One way  to assess  possible pollution-related  changes in  acute health



status is to calculate the elasticity of E(RRAD) with respect to ozone and



evaluate the predicted total  change  in  expected RRADs for the individuals



in   the   sample  resulting   from   some  hypothetical  change   in  ozone



concentrations. Log-differentiating  (4), it follows that:







(5)   $(\partial E(RRAD_t)/\partial OZNEAR_t)\,(OZNEAR_t/E(RRAD_t)) = \delta\alpha\,(OZNEAR_t)^{\delta}$.
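
As a check, the elasticity in (5) and the curvature of the expectation function can be derived directly from (4) by the chain rule. The sketch below uses only the symbols already defined in (4); the shorthand $g_t$ is introduced here for convenience and is not part of the report's notation.

```latex
\begin{align*}
g_t &\equiv Z_t\gamma + \alpha\,(OZNEAR_t)^{\delta},
  \qquad E(RRAD_t) = \exp(g_t) \\[4pt]
\frac{\partial E(RRAD_t)}{\partial OZNEAR_t}
  &= \alpha\delta\,(OZNEAR_t)^{\delta-1}\,E(RRAD_t) \\[4pt]
\frac{\partial E(RRAD_t)}{\partial OZNEAR_t}\cdot\frac{OZNEAR_t}{E(RRAD_t)}
  &= \delta\alpha\,(OZNEAR_t)^{\delta} \\[4pt]
\frac{\partial^2 E(RRAD_t)}{\partial OZNEAR_t^{2}}
  &= \Bigl[\alpha^{2}\delta^{2}\,(OZNEAR_t)^{2\delta-2}
     + \alpha\delta(\delta-1)\,(OZNEAR_t)^{\delta-2}\Bigr]\,E(RRAD_t)
\end{align*}
```

With $\alpha > 0$ the first bracketed term is positive, while the second is negative whenever $\delta < 1$, so in that case the sign of the second derivative depends on the ozone level of the individual observation.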




     Note that for $\delta < 1$, the curvature of the expectation function (as
determined by $\partial^2 E(RRAD_t)/\partial OZNEAR_t^2$) cannot be determined without reference
to the data for the t-th observation.  It can be seen from (5) that in the
nonlinear cases where $\delta$ = 0.5 or 2.0, as in specifications (4.5) and
(4.4) respectively, evaluating the elasticity at the sample mean of $OZNEAR_t$ will yield a


different estimate  than  that given  by  evaluating  (5)  for all  t  and then



averaging the  elasticities.   The results of  both approaches  are presented



in the top panel of Table 4-5.
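
The difference between the two ways of computing the elasticity can be illustrated with a short Python sketch. The ozone coefficients are those reported for specifications (4.4) and (4.5) in Table 4-4; the ozone readings are hypothetical values chosen only for illustration and are not the report's data, and the function names are likewise illustrative.

```python
from statistics import mean

# Ozone coefficients (alpha) reported in Table 4-4, keyed by delta.
ALPHA = {
    0.5: 4.926,  # specification (4.5): square root of OZNEAR
    2.0: 1.343,  # specification (4.4): square of OZNEAR
}

# HYPOTHETICAL two-week ozone readings in ppm (assumption, not HIS data).
ozone_ppm = [0.020, 0.030, 0.040, 0.050, 0.070]


def elasticity(alpha, delta, ozone):
    """Equation (5): elasticity of E(RRAD) with respect to ozone."""
    return delta * alpha * ozone ** delta


def elasticity_at_mean(alpha, delta, readings):
    """Evaluate (5) once, at the sample mean of the ozone readings."""
    return elasticity(alpha, delta, mean(readings))


def mean_elasticity(alpha, delta, readings):
    """Evaluate (5) for every observation, then average the results."""
    return mean([elasticity(alpha, delta, o) for o in readings])


for delta, alpha in ALPHA.items():
    at_mean = elasticity_at_mean(alpha, delta, ozone_ppm)
    averaged = mean_elasticity(alpha, delta, ozone_ppm)
    print(f"delta={delta}: at mean={at_mean:.4f}, mean of individual={averaged:.4f}")
```

Because (5) is concave in ozone when $\delta < 1$ and convex when $\delta > 1$, the averaged elasticity falls below the at-mean value in the first case and above it in the second; the two coincide only when $\delta = 1$.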



     The upper panel of  the table indicates that the estimated elasticities



are quite sensitive to the value of $\delta$.  When $\delta$ = 0.5 and $\delta$ =
-------
                                   4-22

1.0,  the resulting  elasticities  are of the same  order  of  magnitude,  with



the former roughly twice the latter.  However, when $\delta$ = 2.0, the estimated
elasticity is almost two orders of magnitude smaller than the others.  Note
that these results hold irrespective of the method of elasticity



calculation.



     In the lower panel  of Table 4-5  are presented the elasticities for the



model (4.3) in which ozone was permitted to have different coefficients in



low and high regimes.   Recall that in this case $\delta$ = 1.0, so that within each



regime the ozone elasticities are linear in ozone.  Therefore, both methods



used above to calculate elasticities will yield the same result.  However,



there are two relevant elasticity measures, one prevailing for observations



with ozone measures below the split and one for  those  above.   Because of



the  second-derivative properties noted above, reference to  the parameter



estimates  alone  is   insufficient   to  compare  low-  and  high-regime



elasticities.   In fact, it  happens  that  the  elasticity estimates for the



low-ozone  and  high-ozone  regimes are virtually  identical, 0.65  and 0.66,



respectively.

-------
                                   4-23
Table 4-5.  Elasticity Estimates for Alternative Specifications

                               Whole Sample

   $\delta$         Evaluated at Mean         Mean of Individual
                       of OZNEAR                 Elasticities

   0.5                  0.506                      0.485
   1.0                  0.290                      0.290
   2.0                  0.0048                     0.006

                     Split Sample (w/ $\delta$ = 1.0)

   Split =                                Elasticity

   0.05 ppm
     low regime                             0.645
     high regime                            0.655

   0.075 ppm
     low regime                             1.061
     high regime                            0.209

-------
                                   4-24
Table 4-6.   Estimated Changes in RRADs Due to 10 percent Reduction in
             Ozone Concentration

              Average Individual Reduction     Annual Decrease in RRADs:
   $\delta$      each two weeks, (S1-S2)/n        Urban Adult Population*

   0.5               -.00776                      22.19 x 10^6
   1.0               -.00442                      12.64 x 10^6
   2.0               -.000083                      0.24 x 10^6

*Calculated by multiplying the two-week individual change in column 2 by 26
to convert to annual changes and then by 110 million, the urban adult
population of the United States.

-------
                                   4-25


The elasticity estimates can be used to estimate one type of health
improvement that might accompany reduced ozone concentrations.  Using the
estimates $\hat\beta = (\hat\gamma, \hat\alpha)$ from the specifications (3.1), (4.4), and (4.5),
(4) is evaluated at $(Z_t, OZNEAR_t)$ for all t in the estimation sample and
the sum $S1 = \sum_t \exp(Z_t\hat\gamma + \hat\alpha\,(OZNEAR_t)^{\delta})$ is calculated for each of the three
alternative specifications.  This yields an estimate of the prevailing
count of RRADs in the sample of 3,347 adults given prevailing levels of the
independent variables including ozone.  To evaluate the effect of a change,
we first assume that some hypothetical policy measure reduces by 10 percent
the two-week average daily maximum ozone concentration, $OZNEAR_t$, faced by
each individual and then calculate the sum
$S2 = \sum_t \exp(Z_t\hat\gamma + \hat\alpha\,(0.9 \cdot OZNEAR_t)^{\delta})$.

     For  each of  the  three  specifications,  the average  (S1-S2)/3347 is

calculated, thus  giving an estimate  of  a  typical  individual's  change in

two-week  RRADs  given  a  ten  percent  decrease  in  ozone  concentrations.

Assuming  an adult  SMSA population of  110  million, and extrapolating the

two-week  decrease  in  RRADs  to  an   annual  figure,  we obtain for  each

specification   an    estimate   of   the   total   annual    decrease   in

respiratory-related restricted activity days associated  with a  hypothetical

ten percent ozone reduction.   The  results  are presented  in Table 4-6.
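
The two steps just described, forming (S1-S2)/n and scaling it to an annual national figure, can be sketched in Python as follows. The function names are illustrative and not taken from the report; the annualization uses the 26 two-week periods and the 110 million adult SMSA population stated in the text, applied to the magnitudes of the per-person two-week changes reported in Table 4-6.

```python
import math


def expected_rrads(index, ozone, alpha, delta):
    """Equation (4): exp(Z*gamma + alpha * OZNEAR**delta) for one individual.

    `index` stands for the fitted value Z_t * gamma-hat (non-ozone terms).
    """
    return math.exp(index + alpha * ozone ** delta)


def average_reduction(indices, ozones, alpha, delta, cut=0.10):
    """(S1 - S2) / n for a proportional ozone reduction of size `cut`."""
    s1 = sum(expected_rrads(z, o, alpha, delta) for z, o in zip(indices, ozones))
    s2 = sum(expected_rrads(z, (1.0 - cut) * o, alpha, delta)
             for z, o in zip(indices, ozones))
    return (s1 - s2) / len(ozones)


def annual_decrease(two_week_change, population=110e6, periods_per_year=26):
    """Scale a per-person two-week change to an annual population total."""
    return abs(two_week_change) * periods_per_year * population


# Magnitudes of the per-person two-week changes from Table 4-6, keyed by delta.
table_4_6 = {0.5: 0.00776, 1.0: 0.00442, 2.0: 0.000083}

for delta, change in table_4_6.items():
    print(f"delta={delta}: {annual_decrease(change) / 1e6:.2f} million RRADs per year")
```

Applying `average_reduction` requires individual-level fitted indices and ozone readings, which are not reproduced in the report; any values passed to it for experimentation would be hypothetical.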

     It is here that  the  implications  of  the different specifications can

most forcefully be seen.  At the two extremes are the $\delta$ = 0.5 and $\delta$ = 2.0

formulations  of  the model.   In the former  case, the ten percent reduction

evokes  a  total  annual change of  more  than 22 million  RRADs while  in the

latter case the  change is less than a  quarter million RRADs.

-------
                                   4-26






     The final step in benefit  estimation involves the assignment of dollar



values to these hypothetical  improvements in health.  Valuing reduced RRADs



is  not  easy,  particularly  since  that  measure  embodies  a  range  of



impairments from  minor restrictions in  activity to bed  disability days.



However, based on separate  analysis of adults' work loss and bed disability



in  Volume  I—wherein  we  found  no significant  associations  with ambient



ozone  concentrations  and  the  more  severe types  of  restrictions —  we



presume that the  effects predicted in Table  4-6  are minor restrictions in



activity.



     Ideally,   these  minor  RRADs  should  be  valued  using  changes  in



individuals'   expenditure  functions  which   reflect   both  labor-leisure



tradeoffs as well as the possibility of defending against pollution-related



illness (see Harrington and Portney [6], for instance).  In practice,
alternative approaches are typically required.  Using contingent valuation
methods, for example, Loehman et al. [11] recently elicited individuals'



reported  willingness  to   pay   to   avoid   one  day  of  various  kinds  of



respiratory impairments.   The  values ranged from $2.31  for a day of minor



coughing and sneezing to about  $11.00 to avoid a  day of severe shortness of



breath.  Since the latter impairment is likely to be associated with a work



loss  and/or  bed  disability  day,  the  former  value  is  probably  more



appropriate for  a minor  RRAD.    Because of  the  many uncertainties  in



arriving at such  estimates,  however, we  assume  that a minor RRAD could be



valued at as much as $20.  If each of a predicted 22 million fewer RRADs
is valued at $20, annual benefits to the adult urban population of the
U.S. would be $0.44 billion.  If the reduction were as small as 250,000 RRADs (as predicted

-------
                                  4-27





in  the  third  row  of  Table  4-6),  and  each was  valued  at $2.31,  the



corresponding total  would be but  $0.58 million.
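
The two bounding benefit figures follow from simple multiplication, which the short Python sketch below reproduces. The function name is illustrative; the inputs are the rounded RRAD counts and per-day values quoted in the text.

```python
def annual_benefit(rrads_avoided, value_per_rrad):
    """Dollar value of avoided RRADs at a constant value per day."""
    return rrads_avoided * value_per_rrad


# Upper illustration: 22 million RRADs avoided, each valued at $20.
upper = annual_benefit(22e6, 20.00)

# Lower illustration: 250,000 RRADs avoided, each valued at $2.31.
lower = annual_benefit(250_000, 2.31)

print(f"upper: ${upper / 1e9:.2f} billion per year")
print(f"lower: ${lower / 1e6:.2f} million per year")
```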



     It is important  to  note  that reduced ozone  concentrations may result



in  other   beneficial  effects   besides   possible  reductions  in  acute



respiratory illness.   These include  improved visibility, reduced damages to



forests, ornamental plantings, and  agricultural  output,   as  well  as other



welfare-enhancing changes.   All these  would have  to be  considered (and



valued,  where appropriate) in any comparison  of  the costs and benefits of



ozone control.



     Even when attention is confined to acute  respiratory illness, however,



the uncertainties in estimating benefits are substantial.  Both here and in



Volume  I,  predicted  changes  in RRADs  proved somewhat  sensitive  to  the



choice and measurement of independent variables  and, in Volume I at least,



the size of the sample over which the parameters were estimated.  Even when



these are held constant,  Table 4-6  demonstrates  that  predicted changes in



RRADs are  also  sensitive to the  assumed form  of  the  exposure-response



function (by two orders of magnitude).   Moreover, this difference is based



on a  comparison  of  point  estimates  without regard to confidence intervals



constructed  about   them.    These  uncertainties,  coupled with  sometimes



conflicting findings  from  other  epidemiological  or clinical  studies, must



make one cautious in using studies like this in policymaking.

-------
                                   4-28





APPENDIX



     As described in Section  4.3,  the log-likelihood function of the  RRAD



models can be written as
(A1)     $\ell = \sum_t \left[-\exp(X_t\beta) + y_t X_t\beta\right] + c$,

where $\exp(X_t\beta) = \lambda_t$.  It is easy to show that $\ell$ is concave in $\beta$ so long as


its  inverse  Hessian  exists.   As mentioned  in Section  4.3,  the  maximum



likelihood estimates of $\beta$ obtained by maximizing (A1) are consistent, but
the estimate of the covariance matrix of $\hat\beta_{ML}$ using minus the inverse of the
Hessian evaluated at $\hat\beta_{ML}$ will tend to be inconsistent if the data are not


in fact generated by the specified Poisson distribution.



     This  is  easily  seen  as follows.    Note  that  the  model  can   be



equivalently  cast  as   a  nonlinear  least  squares  regression,   the t-th



observation being
(A2)     $y_t = E(Y_t) + u_t$

with $E(u_t) = 0$.  Clearly, $Var(u_t) = Var(Y_t) = \exp(X_t\beta)$ so that the $u_t$ are
heteroscedastic.  If nonlinear weighted least squares is used with the
weights $\exp(-X_t\beta)$ formed using consistent estimates of $\beta$, and if the data
are in fact Poisson-distributed as maintained, the maximum likelihood
consistent estimates of $\beta$ and $Cov(\hat\beta)$ will obtain.  (The consistency of $\hat\beta_{ML}$
for $\beta$ does not depend on the weighting scheme.)  However, if the data are

-------
                                   4-29


not  Poisson-distributed,  the estimate  of  $Cov(\hat\beta)$ obtained  in this manner

will be  inconsistent  and t-tests  based thereon will  be  misleading.    The

case is  fully analogous  to  the estimation of  the  heteroscedastic linear

model  which  yields   inconsistent  covariance  estimates  (and,  therefore,

t-statistics)  if  the  heteroscedastic  nature  of  the error  structure is

either ignored or incorrectly specified.

     White [18] and Royall [16] have demonstrated a method whereby
estimates of $Cov(\hat\beta)$ robust against misspecification of the underlying
distribution of the data can be obtained when $[-\partial^2 \ell/\partial\beta\,\partial\beta']^{-1}$ evaluated at
$\hat\beta_{ML}$ fails to yield a consistent estimate of $Cov(\hat\beta)$.  Denoting $I(\beta)$ as
$[-\partial^2 \ell/\partial\beta\,\partial\beta']$, their suggestion is to estimate $Cov(\hat\beta)$ as

(A3)     $I(\beta)^{-1}\left[\sum_t (\partial \ell_t/\partial\beta)(\partial \ell_t/\partial\beta')\right] I(\beta)^{-1}$

where $\ell_t$ is the t-th observation's contribution to the log-likelihood
function and where all relevant evaluations in (A3) are at $\hat\beta_{ML}$.  This is
the method used in constructing the confidence intervals for the parameter
estimates of Section 4.4.  In these cases, the standard errors of the
parameter estimates obtained using $I(\hat\beta)^{-1}$ as the estimate of $Cov(\hat\beta)$ are
found to be about two to three times smaller than those obtained using this
alternative method.  As noted by White [19], the alternative approach (i.e.,
using (A3)) will typically lead to conservative inferences (i.e., "too
large" estimates of $Cov(\hat\beta)$) in instances where $X_t$ is nonstochastic and
varies across t, as is the case here.
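
The contrast between the two covariance estimators can be illustrated with a small self-contained Python sketch. It is not the authors' estimation code: it fits a two-parameter Poisson regression (intercept and one regressor) by Newton's method to SIMULATED overdispersed counts, then compares standard errors from the inverse of minus the Hessian with those from an outer-product ("sandwich") correction of the kind proposed by White [18]. All data, parameter values and function names are assumptions made for the illustration.

```python
import math
import random


def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(n=2000, seed=0):
    """Counts with extra-Poisson variation (gamma multiplier with mean 1)."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        xt = rng.random()
        lam = math.exp(0.2 + 0.5 * xt) * rng.gammavariate(0.5, 2.0)
        x.append(xt)
        y.append(poisson_draw(rng, lam))
    return x, y


def score_info_opg(b, x, y):
    """Score, minus-Hessian, and sum of outer products of per-observation scores."""
    g = [0.0, 0.0]
    info = [[0.0, 0.0], [0.0, 0.0]]
    opg = [[0.0, 0.0], [0.0, 0.0]]
    for xt, yt in zip(x, y):
        row = (1.0, xt)
        mu = math.exp(b[0] + b[1] * xt)
        resid = yt - mu
        for i in range(2):
            g[i] += resid * row[i]
            for j in range(2):
                info[i][j] += mu * row[i] * row[j]
                opg[i][j] += resid * resid * row[i] * row[j]
    return g, info, opg


def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def fit(x, y, iters=100, tol=1e-10):
    """Newton-Raphson on the Poisson log-likelihood, started at the log mean."""
    b = [math.log(sum(y) / len(y)), 0.0]
    for _ in range(iters):
        g, info, _ = score_info_opg(b, x, y)
        iv = inv2(info)
        step = [iv[i][0] * g[0] + iv[i][1] * g[1] for i in range(2)]
        b = [b[i] + step[i] for i in range(2)]
        if max(abs(s) for s in step) < tol:
            break
    return b


x, y = simulate()
b_hat = fit(x, y)
score, info, opg = score_info_opg(b_hat, x, y)
naive_cov = inv2(info)                                     # inverse of minus the Hessian
robust_cov = matmul2(matmul2(naive_cov, opg), naive_cov)   # sandwich form
naive_se = [math.sqrt(naive_cov[i][i]) for i in range(2)]
robust_se = [math.sqrt(robust_cov[i][i]) for i in range(2)]
print("naive SEs: ", naive_se)
print("robust SEs:", robust_se)
```

Because the simulated counts are more dispersed than a Poisson variable with the same mean, the uncorrected standard errors come out smaller than the robust ones, which is the qualitative pattern described above.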

-------
                                 Footnotes






     1
      Specifically,  the Tobit specification error tests of Nelson [13] and



Lin and Schmidt  [9]  were used.  Nelson's is  a  Hausman test while that of



Lin and Schmidt  is a Lagrange  multiplier test.   Under the null hypothesis



of no misspecification, both test statistics are distributed asymptotically
central $\chi^2(k)$, where $k$ is the dimensionality of $\beta$.  For the specification
described above, both statistics indicate rejection of the no
misspecification hypothesis at better than the 98% level.


     2
      These  confidence  intervals   are  constructed   using  the  approach



discussed  in  the  appendix,  which  should  give  conservative  asymptotic



t-statistics.  Confidence  intervals  based on  minus the inverse Hessian of



the Poisson log-likelihood function,  on  the other  hand, are much tighter,



but are  almost certainly misleading  (inconsistent), given  the  data  used.



These results are available from the  authors on  request.


     3

      The substantial  discrepancy  between  the magnitudes of the estimates



of  the  two-week  and  annual  ozone  coefficients  results, loosely speaking,



from  the  fact  that—while  the  sample means   of  the  two measures are



virtually identical—the sample variances of the two-week measures are much



larger than those of the annual counterparts  in conjunction with the fact



that the expectation $E(RRAD_t)$ is the convex function $\exp(X_t\beta)$.

     4

      For comparison's sake, model (3.1) was also cast as a geometric
distribution.  Here, $\Pr(Y_t = y) = P^y/(1+P)^{y+1}$ for y = 0,1,2,..., $E(Y_t) = P$,
$Var(Y_t) = P(1+P)$, and for purposes of econometric estimation
$E(Y_t) = \exp(X_t\beta)$ and $Var(Y_t) = \exp(2X_t\beta) + \exp(X_t\beta)$ are specified.  As expected,

-------
                                   4-31



the estimated variances of $\hat\beta$ were somewhat  larger  than those obtained using



the  uncorrected  variance version  of  the  Poisson  specification while the



estimates themselves were quite similar.  However,  like  the  Poisson, the



maximum likelihood  variance estimates based  on minus  the inverse  of the



Hessian evaluated at $\hat\beta_{ML}$ are not generally consistent if the data are not


distributed  according  to the  postulated  geometric distribution.   Thus,



while  larger  than  the  estimated  variances  of  the  uncorrected Poisson



specification, the ML  estimates of the geometric parameter variances were



still  substantially smaller than  those  obtained  using  the  alternative



approach.



         $\delta$ is, of course, a parameter to be estimated rather than a given
constant.  The ML algorithm used to obtain the Poisson parameter estimates,



however, did not permit estimation of  such  additional nonlinearities.


     5

      Kopp,   Raymond, William Vaughan,  Michael Hazilla and Richard Carson,



"Implications of  Environmental  Policy for  U.S.  Agriculture:    The Case of



Ambient Ozone  Standards," Resources for  the  Future working paper, January



5, 1984.

-------
                                    4-32


                                 References
 [1] Chappie,  Michael  and Lester  Lave,  "The  Health Effects  of Air Pollution:
      A Reanalysis," J. Urban Econ.,  vol.  12 (1982) pp.  346-76.

 [2] Crocker,  Thomas,  et al.,  "Methods  Development for  Assessing Air
      Pollution Control Benefits,"  Vol.  1, EPA Document  EPA-600/5-79-001a
      (1979).

 [3] Graves,  Philip and Ronald Krumm,  "Morbidity and  Pollution:  Model
      Specification Analysis for  Time-Series Data on  Hospital Admissions,"
      J.  Environ.  Econ. Manage.,  vol.  9  (1982) pp. 311-327.

 [4] Hausman,  Jerry,  Bart Ostro,  and  David Wise, "Air Pollution and Lost
      Work,"  NBER  working paper no. 1263,  January 1984.

 [5] Hausman,  Jerry, Bronwyn Hall,  and Zvi Griliches, "Econometric Models
      for Count Data with an Application to  the Patents-R&D Relationship,"
      Econometrica, vol. 52  (1984)  pp.  909-938.

 [6] Harrington, Winston and Paul R.  Portney, "Valuing the  Benefits of
      Health and Safety Regulation in the Presence of Defensive
      Expenditures," RFF Quality  of the Environment working paper
      no. QE84-09,  September 1984.

 [7] Hazucha,  Michael  and David Bates,  "Combined Effects of Ozone and Sulfur
      Dioxide on Human Pulmonary  Function,"  Nature, vol.  257 (1975) pp.
      50-51.

 [8] Lave, Lester  and  Eugene Seskin,  Air Pollution and Human Health
      (Baltimore,  Md.:  Johns Hopkins University Press,  1977).

 [9] Lin, Tsai-Fen and Peter Schmidt,  "A Test of the  Tobit Specification
      Against an Alternative Suggested by Cragg," Review of Economics and
      Statistics,  vol. 66 (1984)  pp.  174-177.

[10] Lipfert,  Frederick, "Air Pollution and  Mortality:   Specification
      Searches Using  SMSA-Based Data,"  J.  Environ. Econ.  Manage., vol.  11
      (1984)  pp. 208-243.

[11] Loehman,  Edna et al., "Distributional Analysis of Regional Benefits and
      Costs of Air Quality Control,"  J.  Environ. Econ. Manage., vol. 6
      (1979)  pp. 222-243.

[12] Mendelsohn, Robert and Guy Orcutt,  "An  Empirical Analysis of Air
      Pollution Dose  Response Curves,"  J.  Environ. Econ.  Manage., vol.  6
      (1979)  pp. 85-106.

-------
                                    4-33
[13] Nelson,  Forrest,  "A Test for Misspecification  in the Censored Normal
      Model," Econometrica,  vol.  49  (1981) pp.  1317-1330.

[14] Ostro,  Bart,  "The Effects of Air Pollution on  Work Loss and Morbidity,"
      J.  Environ.  Econ.  Manage.,  vol.  10  (1983) pp. 371-382.

[15] Portney, Paul and John Mullahy,  "Ambient Ozone and Human Health:  An
      Epidemiological  Analysis,"  report prepared for Economic Analysis
      Branch, Office of Air Quality  Planning and Standards, USEPA under
      contract no. 68-02-3583,  September  1983.

[16] Royall,  Richard,  "Robust Inference Using Maximum Likelihood
      Estimators," Johns Hopkins  University, Department of Biostatistics
      Working Paper 549, 1984.

[17] Smith,  V.K.  (ed.),  Environmental Policy Under  Reagan's Executive Order
      (Chapel Hill, N.C.:   UNC Press,  1984).

[18] White,  Halbert, "Maximum Likelihood  Estimation of Misspecified Models,"
      Econometrica, vol. 50  (1982) pp. 1-25.

[19] 	 , "Corrigendum,"  Econometrica, vol. 51 (1983) p. 513.

[20] White,  Lawrence,  Reforming Regulation:  Processes and Problems
      (Englewood Cliffs, N.J.:  Prentice-Hall,  Inc., 1981).

-------
                        Chapter 5

         CONSTRUCTING A LIFETIME SMOKING PROFILE
         USING THE  1979 HEALTH INTERVIEW SURVEY
     We noted above  that  individuals'  amassed "stocks"  of

cigarettes  consumed  over   a   lifetime   are  potentially

significant influences  on respiratory  illness.    Yet  the

models estimated  in  Volume  I all made use of  a  much more

crude  measure  of  smoking  behavior.   An important  issue

here,  then,  is   the  construction of a more  sophisticated

measure given  available data.  One theoretically  plausible

construct is $K(T) = \int_S \exp(-r(T-t))\,C(t)\,dt$, where $S = [\underline{T}, T]$, $\underline{T}$
is  time   started  smoking,   T  is  present time,  C(t)  is

instantaneous  cigarette consumption at t, and r is a decay

or  depreciation  rate.    The  empirical  representation  of

K(T)  is   not  straightforward,   however,  even  given  the

information available  in  the   smoking  supplement  to  the

1979 HIS.
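
A discrete-time analogue of K(T) can be sketched in Python as follows. This is an illustration of the construct only, under the assumption that a complete per-period consumption profile is available (which, as discussed below, the HIS data do not provide); the function name, the example profile, and the decay rate are hypothetical.

```python
import math


def smoking_stock(consumption, r):
    """Depreciated stock of cigarettes consumed.

    consumption: per-period consumption C_t, ordered from the period in
                 which smoking started to the present period T (last item).
    r:           decay (depreciation) rate per period.
    Returns the sum over t of exp(-r * (T - t)) * C_t.
    """
    last = len(consumption) - 1
    return sum(math.exp(-r * (last - t)) * c for t, c in enumerate(consumption))


# HYPOTHETICAL profile: packs per year over ten years, with a two-year quit.
profile = [200, 300, 365, 365, 0, 0, 365, 365, 300, 250]

print(smoking_stock(profile, r=0.0))   # no decay: the simple cumulative total
print(smoking_stock(profile, r=0.10))  # with decay: earlier smoking counts for less
```

With r = 0 the measure reduces to cumulative consumption (a "packyear"-type total), while a positive r gives less weight to consumption further in the past.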

     This   is   so  for   several   reasons.     First,   an
individual's entire lifetime smoking profile $\{C(t)\}_{t=\underline{T}}^{T}$ is
never  given in   the  data.   This is  so  even  if C(t)  is
couched in discrete time as $\{C_t\}$ with reasonably
high-frequency   (e.g.   one  month   or  even  one   year)

realizations.   At  best  the  profile  can be approximated  by

the  use   of  subsidiary  information.    Second, the  above

formulation is quite simple, one  of  an infinite  number  of

reasonable  proxies  for the  "true"  relationship.   Third,

-------
                           5-2



while K(T)  as  defined  above  is  in  principle capable  of



describing the  effects  of cigarette tar-nicotine  content



and   cigarette   length,   it  seems   that  amending   the



formulation  to  account   for  such  influences  would  add



little to the analysis given the nature  of the data.



     The  dataset  used  to  construct  the  measures  is  of



course the 1979 HIS  smoking supplement.   This  survey  gives



a  reasonably  detailed  picture  of  individuals'   smoking



status  at   the   time  of  the  survey   in   addition   to



information on past  attempts to  quit,  age at which  regular



cigarette smoking began,  number of cigarettes smoked  per



day at the time of peak consumption,  and other attributes.



Yet most  of  the data in  the  smoking survey  is of little



use  insofar   as  construction  of  a  "packyear"  or  stock



measure is concerned.   (A check  on several other  datasets



containing information on smoking behavior reveals  similar



or  even  more severe  weaknesses.)   Ignoring  minor points



and the complications presented  by problems such  as faulty



recall,  the most serious problems are the following.



     Although  data   are  given   on  peak  daily  cigarette



consumption,  no information is  available on when  the peak



occurred (unless it coincides with present consumption



levels,  C(T))  nor on the duration of consumption  at  that



peak  rate.    Second,  information on   quits  (number  of



attempts;  duration   of   time   off)   is  insufficient   to



construct  for   either   current   or   former   smokers   a



reasonable profile  of the time   intervals  over which  C(t)




was zero.  Quit  duration  information  is  available  only as



the interval  from time last smoked to  T for former  smokers



and for the length of the  single most  recent quit (if any)



for current smokers.   Some  detail  is  provided  for  current



smokers  on numbers  of serious  quit  attempts,  but  what



constitutes   a    "serious    attempt"    is   analytically



problematical, a subjective assessment surely varying



across individuals.  No information on age started  smoking



is  given  for   the  subsample  of   occasional   smokers.



Finally,   it   should  be  noted  that  even  the  use  of  an



obvious stock proxy measure like $C(T-\delta)$ with, for example,



$\delta$ equal to one year, is precluded by data availability.  It



is possible to determine neither consumption levels of one



year  (or  six months,  or  one  month)  ago nor,  in  many



instances, even the sign of $C(T-\delta)$.



     Yet   there  is  some   information  that  permits  the



construction  of  a reasonably  interesting, albeit  rough,



proxy measure for the lifetime smoking profile  K  if one is



willing  to  make  certain  assumptions.   Since  age  started



smoking  is  unavailable for  the occasional smokers,  this



subsample  (about two  percent) will henceforth  be excluded



from  the  analysis.    By  assumption,  K = 0  for  all  never



smokers.    Thus,  the proxy  must  be  constructed for  the



subsamples of  former and  current  smokers.   The data are



such  that  separate treatment of  these two subsamples  is



required.   In both instances,  however,  several  plausible



temporal  smoking profiles  can be  created.   In  the  absence





of any prior information on which profile best captures an



individual's  true   consumption   path,  the  only  sensible



solution is  to  consider several  different  specifications



in  the  empirical   analysis   and  assess  ex   post   the



sensitivity of the  results  to  the specification used.



     For both former and current smokers, the  construction



of the  K  measures  relies on  a  major  assumption about  the



influence  of  quits on  the  temporal  consumption  profile.



That  is,   the  profile   is  "forgetful" of  quits:   once  an



individual resumes  smoking  after  having  quit, consumption



over the quit interval  is  treated  as  if  there had been no



quit at  all.   For  example, Figure  5-1 depicts  the manner



in   which   this   forgetfulness    operates,   with   true



consumption  C*(t)   shown  as   a  solid  curve  and  proxy



consumption shown as a dashed  curve:

                       Figure 5-1:
               Hypothetical Smoking Profile
[Figure not reproduced: consumption C(t) on the vertical axis against time t on the horizontal axis.]

Such an approach has  the  unfortunate  implication  that,  to

use an extreme example,  the proxy profile of  an individual

who quit smoking twenty  years ago and  resumed yesterday  is

drastically different than  that  of  an individual with  an

otherwise  identical  smoking history  who  had  not  resumed

smoking.   Until  better  microdata on  individuals'  smoking

histories become available, such drawbacks are inevitable.

     For former  smokers,  the variables used  to  construct

the stock proxy are time started smoking ($\underline{T}$); number of

cigarettes smoked per day at peak consumption (NCIGP); and

time last smoked regularly ($\bar{T}$).  There are three plausible

profiles that  can  be constructed using  this  information;

these can best be described graphically.

     The first profile for former smokers, shown  in  Figure

5-2, assumes that peak consumption occurs at the midpoint

of the smoking interval (denoted T*), and that consumption rises

linearly from zero to this peak and then falls linearly back to zero (C(t) is

henceforth shown in solid lines):
                        Figure 5-2
         Smoking Profile:  Former Smokers I (F-I)
[Figure not reproduced: C(t) on the vertical axis, with NCIGP marked; t on the horizontal axis.]

     The  second  profile   for  former  smokers,  shown  in

Figure  5-3,   is   based   on  the  assumption  that   peak

consumption is attained immediately at $\underline{T}$ and continues at

that rate until $\bar{T}$:

                        Figure 5-3
        Smoking Profile:  Former Smokers II (F-II)
[Figure not reproduced: C(t) on the vertical axis, with NCIGP marked; t on the horizontal axis.]
     The third former smoker profile,  shown in Figure  5-4,

assumes that from $\underline{T}$ consumption increases linearly to

NCIGP, which occurs at $\bar{T}$, then falls instantly to zero:

                        Figure 5-4
       Smoking Profile:  Former Smokers III (F-III)
[Figure not reproduced: C(t) on the vertical axis, with NCIGP marked; t on the horizontal axis.]
     The construction of the profiles for the current

smokers uses $\underline{T}$, NCIGP, and NCIG.  Five profiles seem

sensible: three for current smokers for whom NCIG=NCIGP

and two for those for whom NCIGP exceeds NCIG.

     The  first  profile,   in  Figure  5-5,  is  analogous  to

that in Figure 5-4: consumption increases linearly from $\underline{T}$ to

NCIGP which coincides with NCIG at T:

                        Figure 5-5
         Smoking Profile:  Current Smokers I (C-I)
[Figure not reproduced: C(t) on the vertical axis, with NCIGP marked; t on the horizontal axis.]

     The  profile   in  Figure   5-6   assumes   that   peak

consumption first occurs at T*, then continues at that

rate to T (T* is defined for current smokers as $(T-\underline{T})/2$):

                        Figure 5-6
        Smoking Profile:  Current Smokers II (C-II)
[Figure not reproduced: C(t) on the vertical axis, with NCIGP marked; t on the horizontal axis.]

     The third construct for the NCIG=NCIGP group,

illustrated in Figure 5-7, assumes that NCIGP is attained

immediately at $\underline{T}$ and continues at that rate to T:
                        Figure 5-7
       Smoking Profile:  Current Smokers III (C-III)
[Figure not reproduced: C(t) on the vertical axis, with NCIGP marked; t on the horizontal axis.]

     The profile in Figure 5-8 is the first shown for the

subsample  reporting  NCIG  less  than  NCIGP.    Here it  is

assumed that consumption increases linearly to NCIGP which

occurs at T* and then  decreases  linearly to NCIG at T:

                        Figure 5-8
        Smoking Profile:  Current Smokers IV (C-IV)
[Figure not reproduced: C(t) on the vertical axis, with NCIGP and NCIG marked; t on the horizontal axis.]
     Finally, the profile shown in Figure 5-9 assumes that

NCIGP is attained immediately at $\underline{T}$ and declines linearly

to NCIG at  T:

                        Figure 5-9
         Smoking Profile:  Current Smokers V (C-V)
[Figure not reproduced: C(t) on the vertical axis, with NCIGP and NCIG marked; t on the horizontal axis.]
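
     The eight profiles described above can each be written as one or two linear pieces.  The following Python sketch encodes them in that form.  It is illustrative only: the function names, the tuple layout (lo, hi, alpha, beta), the treatment of T* as the midpoint of the smoking interval, and the linear rise from zero before T* in profile C-II are assumptions made for the example, since the original figures are not reproduced here.

```python
def segment(t0, c0, t1, c1):
    """Linear piece through (t0, c0) and (t1, c1), returned as
    (lo, hi, alpha, beta) so that C(t) = alpha + beta * t on [lo, hi]."""
    beta = (c1 - c0) / (t1 - t0)
    return (t0, t1, c0 - beta * t0, beta)


def profile(kind, start, end, ncigp, ncig=0.0):
    """Piecewise-linear consumption profile.

    start -- time started smoking
    end   -- time last smoked (former smokers) or present time (current)
    ncigp -- cigarettes per day at peak; ncig -- cigarettes per day now
    """
    mid = start + (end - start) / 2.0   # assumed location of T*
    knots = {
        "F-I":   [(start, 0.0, mid, ncigp), (mid, ncigp, end, 0.0)],
        "F-II":  [(start, ncigp, end, ncigp)],
        "F-III": [(start, 0.0, end, ncigp)],
        "C-I":   [(start, 0.0, end, ncigp)],
        "C-II":  [(start, 0.0, mid, ncigp), (mid, ncigp, end, ncigp)],
        "C-III": [(start, ncigp, end, ncigp)],
        "C-IV":  [(start, 0.0, mid, ncigp), (mid, ncigp, end, ncig)],
        "C-V":   [(start, ncigp, end, ncig)],
    }
    return [segment(*k) for k in knots[kind]]


def consumption(segments, t):
    """Evaluate C(t); zero outside the smoking interval."""
    for lo, hi, alpha, beta in segments:
        if lo <= t <= hi:
            return alpha + beta * t
    return 0.0


# Hypothetical former smoker: started at age 20, quit at 40, peak of 30 per day.
former_i = profile("F-I", 20.0, 40.0, 30.0)
# Hypothetical current smoker: started at 18, now 38, peak 40, currently 20.
current_v = profile("C-V", 18.0, 38.0, 40.0, ncig=20.0)
```

Storing each piece as an intercept and a slope keeps the profiles in the same (alpha, beta) form as the integrands discussed next.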

     Given these specifications, it is seen that all of

the integrals to be evaluated have linear or

piecewise-linear integrands.  That is, on the interval

[a,c] (where $a=\underline{T}$ and $c=\bar{T}$ or $T$), the integrands are either

of the form $\delta(t)(\alpha+\beta t)$, $t\in[a,c]$, or $\delta(t)(\alpha_1+\beta_1 t)$, $t\in[a,b]$,

$\delta(t)(\alpha_2+\beta_2 t)$, $t\in(b,c]$, for b = T*.  Specifically,

     $K(T) = \sum_{j=1}^{2} \int_{\Omega_j} \exp(-r(T-t))\,(\alpha_j+\beta_j t)\,dt$,

where $\Omega_1=[a,b]$ and $\Omega_2=(b,c]$.  Straightforward integration

by parts gives the solution as

     $K(T) = \sum_{j=1}^{2} \exp(-rT)\,[\,r \ldots (\ldots\,]^{\sup(\Omega_j)} \ldots$
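
     Because each integrand is an exponential times a linear function, the integral over any one piece has an elementary antiderivative.  The following Python sketch evaluates it in closed form.  It is a sketch under stated assumptions rather than a transcription of the report's own expression: the function names, the tuple layout (lo, hi, alpha, beta), and the numerical values in the example are hypothetical.

```python
import math


def decayed_linear_integral(lo, hi, alpha, beta, r, t_now):
    """Integral over [lo, hi] of exp(-r (t_now - t)) * (alpha + beta * t) dt."""
    if r == 0.0:
        # No decay: plain area under the line.
        return alpha * (hi - lo) + 0.5 * beta * (hi ** 2 - lo ** 2)

    def antiderivative(t):
        # Differentiating this expression with respect to t returns the
        # integrand exp(-r (t_now - t)) * (alpha + beta * t).
        return math.exp(-r * (t_now - t)) * ((alpha + beta * t) / r - beta / r ** 2)

    return antiderivative(hi) - antiderivative(lo)


def stock(segments, r, t_now):
    """K(T) for a piecewise-linear profile: sum of the per-piece integrals."""
    return sum(decayed_linear_integral(lo, hi, a, b, r, t_now)
               for lo, hi, a, b in segments)


# Hypothetical triangular profile: rises from 0 at t=20 to 30 at t=30,
# then falls back to 0 at t=40; evaluated at present time T=50.
triangle = [(20.0, 30.0, -60.0, 3.0), (30.0, 40.0, 120.0, -3.0)]
k_undiscounted = stock(triangle, r=0.0, t_now=50.0)
k_discounted = stock(triangle, r=0.05, t_now=50.0)
```

Evaluating the same profile under several values of r is then a matter of repeated calls to stock, which is convenient for examining the sensitivity of results to the assumed decay rate.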



     The final  point  is the determination  of  r.   Use  of



decay  or  discount  rates  is  often  essential  in  applied



econometrics.   Yet,  in  most instances there is  no  way  to



know the "correct" rate, so that in discounting future



streams or  depreciating accumulated stocks, the  strategy



typically adopted  is  to posit  some  rate or set  of  rates



and  conduct  analysis  as  if  the  rate  is   known.    This



approach has been used in a wide spectrum of applications,



generally with  little discussion or  justification for  the



rate  chosen  (although  some studies  helpfully  demonstrate



the sensitivity of results  to the  assumed rates).   Such  an



approach will be used here.



     Given  the  above  assumptions  on  consumption  profiles



and decay rates, the K proxy measures can be derived using



the relevant  data  in  the estimation sample.   However,  an



obvious drawback  is  that with  three  possible  consumption



profiles  for  NCIG=NCIGP   current   smokers,   two   for



NCIG



     The  combinations   to   be  used  are   (for   former,



NCIG=NCIGP current, and  NCIG



by  exposure  to ambient  air  pollution,  current  cigarette



consumption,    and   other   covariates,   an   individual's



previous cigarette smoking predisposes him or her to an



increased risk of respiratory  illness.   Using  a  subset  of



the proxy profiles  described above  enables us  to  test  for



the presence  and extent of such effects.

-------
                        Chapter 6

CIGARETTE SMOKING, AIR POLLUTION, AND RESPIRATORY ILLNESS:

              AN ANALYSIS OF RELATIVE RISKS



6.1 Introduction

     The relative risks  associated  with cigarette smoking

and ambient air pollution are difficult to assess.  First,

individuals' health status is largely subjective and often

difficult  to  measure.1    In addition,  lung  physiology  is

complex  as  well   as   heterogeneous   in  a  population  of

individuals,   hindering   both   the   identification   and

measurement of  all  potential  determinants  of  respiratory

illness.   Moreover,  there  is  little  theoretical guidance

as  to  the  likely  form  of  any functional  relationship

between risk exposure and illness response.  Finally,  data

on exposure to risks are often not all one might like  them

to be.  It is thus apparent why one expert on quantitative

risk assessment was moved to comment:



     Quantitative  risk assessment  is  not  a  panacea.
     A primary limitation  is  that  such an assessment
     is concerned only with what can  be measured and
     quantified.2

     In spite of these problems, relative risk assessments

must  be  undertaken  for  smoking  and air  pollution.   Both

have been the  subject of  much  discussion and  study in the

health  and  environmental policy  communities,  and  in  the




popular  press  as  well.   Moreover,  as  discussed  below,



considerable resources  are being  devoted  to understanding



and reducing both risks.  It  is essential,  therefore,  that



the  risks  attributable to  smoking and  air pollution  be



assessed   simultaneously    within   a   single   coherent



framework;  otherwise, risks attributed to  one  may in  fact



be  due  to  the other,  thus biasing  any  estimates  of  the



health risk of but one of  the  variables.



     The plan  in this  chapter  is as follows.    Section 2



discusses   in   greater  detail   the   problem    of   acute



respiratory illness and some  possible links between it and



smoking  and  air  pollution.    Section  3  describes  the



dataset  used  in  the   empirical   analysis,  explains  the



health  measures   utilized,  and   sketches  the   estimation



strategy.  Finally, Section 4 presents empirical results,



derives  the  estimates  of  the  relative risks of  interest,



and briefly suggests new directions for  future  research in



this area.








6.2 Smoking, Pollution, and Acute Illness



     The association between  cigarette smoking  and several



major  chronic  illnesses is well  known.    The  1982,  1983,



and  1984  reports of the U.S.  Surgeon General  detail  and



publicize,    respectively,    the   relationships   between



cigarette smoking  and  cancer,  cardiovascular disease,  and



chronic obstructive lung disease.   For these diseases, the




indictment of  cigarette  smoking is strong:  although  data



can obviously never demonstrate  causality  (as  the  Tobacco



Institute is wont  to  remind),  the  correlative  evidence  is



overwhelming.



     Less  widely  publicized   are   associations   between



cigarette smoking  and less  severe illnesses.   There  is



evidence, however,  that  suggests  the  existence  of  such



linkages.   Chapters  Three and  Six  of  the 1979  Surgeon



General's report summarize much of the  existing  research



in  this   area.   There  it  is  reported  that relative  to



nonsmokers,  current smokers have more frequent  respiratory



tract  infections  and  a  greater   prevalence   of   cough;



symptoms like cough and sputum production tend  to  increase



with the number of cigarettes smoked.  Moreover, the 1979



report finds that  "...people  who had ever  smoked...had  a



higher incidence of  acute  illnesses  than  did  people who



had never smoked (p. 3-6)."  Smokers report approximately



45% more illness-related work loss days than do never



smokers.



     Owing to the  magnitude and  severity of  the illnesses



associated with  cigarette  smoking, considerable attention



and public resources  have been devoted to the study  of and



remedies  for such illnesses.  While one  is  hard-pressed  to



estimate   the  value  of   the   resources  spent  in   such



activities,   it   is safe  to  venture  that   the  value  is



enormous.




     Other  public  policies  have  been  put  in  place  to



protect  individuals'  respiratory  health.    For  example,



several  federal  agencies  are  involved in the  protection



against  and  compensation  for damages  from  pneumoconiosis



(black   lung   disease),   while  exposure  to   respirable



hazardous substances in the workplace -- like the cotton



dust which causes byssinosis -- comes under the regulatory



purview of OSHA.



     Of  more  immediate  interest  here,  however,  is  the



widespread  concern  that   ambient  air  pollution  may  be



detrimental  to  individuals'   respiratory  health.    Many



clinical  and  epidemiological  studies  have  tested  for



possible  relationships  between  ambient air  pollution  and



both morbidity and mortality.   The  cornerstone of  federal



air  pollution policy  in   the U.S.,  the Clean  Air  Act,



places primary emphasis  on the protection of  public health



from  air  pollution,  insisting that  air  quality  standards



be  set  to   provide  "an  adequate  margin  of  safety...to



protect   the   public  health."     Of  the   possible  air



pollution-related  illnesses,  it  is  of course  respiratory



illness  that  is  of  utmost  concern.    Regulatory  mandates



pursuant to the Clean Air Act are not inexpensive:



according  to the  most  recent estimates, annual  costs  of



complying  with  the  Act  are  approximately  $25  billion.



This sum  is sure to grow as older sources of  pollution are



retired  and  newer  ones   --  which  must  meet  stricter




emissions standards -- are built to replace them.  Whether



such expenditures  achieve  desired  ends  efficiently (if  at



all) is a question on which we hope to shed some light.



     Our task  in this chapter  is  to assess  the  relative



contributions of cigarette smoking and air pollution to an



individual's  risk  of  suffering  from  acute  respiratory



impairments of  varying severity.   We concentrate  on  this



category because none but the most extraordinary exposures



to  air  pollution  can be  expected  to  rival  direct  (or



perhaps  even  passive)  smoking as  a  cause  of  the  more



serious  illnesses  like cancer,  cardiovascular  disease,  or



chronic  lung  disease.     It   seems   to  us  plausible  to



hypothesize  that   if   typical  levels   of   ambient   air



pollution in the U.S. are to influence individuals' health



in  any  manner,  then acute  respiratory  illness must  be  a



primary  area  of  suspicion.    Because  it  can  be  quite



expensive  to  control air  pollution,  it  is  important  to



assess  how the  benefits  of   doing  so  compare  to  those



associated   with   policies    oriented   towards   smoking



cessation.    We  suggest   that  an analysis   of  the  relative



contributions   of   cigarette   smoking  and   ambient   air



pollution  to acute  respiratory  illness   is  one  way  to



approach this important  assessment.3








6.3 Data and Estimation  Strategy



     The  individual  data  used  here are  from  the  1979




Health  Interview  Survey  (HIS),   a  national   sample  of



approximately  110,000   individuals   conducted   over   the



course  of  each  year  by  the National  Center  for  Health



Statistics.  In  this  regard,  the  analysis  in  this  chapter



is similar  to  that in  Volume I  and  in  Chapter  4  in  the



present volume.   The socioeconomic data elicited  from  each



respondent in the  HIS includes  information on age, race,



sex,   income, education,  as  well  as other  individual-  and



household-specific  characteristics.     In   addition,   the



supplemental survey  on  smoking  behavior  administered  in



the 1979 HIS makes it particularly useful for present



purposes.    This supplemental questionnaire  was  asked  of



one-third of the approximately 78,000 adults (17+ years)



interviewed, and provides detailed data on lifetime



smoking history and present smoking behavior.



     Restrictions   in   activity   due   to  any   illness



experienced during the  two-week period prior to  the  date



of  each  interview  are  reported  by   the  interviewee  or



another household  member responding for the  interviewee.



Manifestations  of illnesses are  classified  in three types:



bed  disability  days,  work or school  loss  days,  and  what



might  best be  thought  of  as  minor   restricted  activity



days.   The  latter  are  days  on which the  subject  was



neither bedridden  nor forced to  miss  work or  school,  but



on  which   the  individual  did  suffer  from an  impairment



sufficient  to  cause  a  perceptible  restriction  on  usual




activity.  The  information  on health impairments  elicited



in  the  survey   is   coded   by   cause   according   to  the



International  Classification of  Disease.    As  discussed



earlier, attention  is limited  in this  chapter to  those



restrictions in activity  due to respiratory illness.



     All   air    pollution    data   come  from   the   U.S.



Environmental Protection Agency's SAROAD system.   The air



quality  data used  here   are measured  over  the  two-week



recall  period for  which  the  individual acute  health data



are  available.     The  received  opinion  of   respiratory



physiologists suggests  that  not  all  airborne  pollutants



are equally  important  in influencing  respiratory  health.



Accordingly,  the analysis  of  acute respiratory  illness



here uses air  pollution  data for  ozone (OZONE),  a  gaseous



pollutant that  is the  primary constituent  of smog,  and



sulfates (SULFATE),  perhaps  the most  harmful  of  airborne



particulate matter.   The  subsequent  analysis  characterizes



individuals'  exposures  to air  pollution  using data  from



the  air pollution  monitors  nearest  the  center  of  the



census  tract   in  which  the  individual  resides.     No



individual  for  whom  the  nearest monitor is more than ten



miles away is included in the estimation sample, with the



sample  average  distance   from centroid  to monitor  being



slightly more  than  four  miles.   For more  details on the



air pollution data used  in  this analysis,  consult Volume



I.
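
     The matching rule just described (nearest monitor to the tract centroid, subject to a ten-mile cutoff) can be sketched as follows in Python.  The sketch is illustrative only: the function names, the use of great-circle (haversine) distance, the Earth radius constant, and the monitor identifiers and coordinates are assumptions made for the example; the report does not state how distances were computed.

```python
import math

EARTH_RADIUS_MILES = 3958.8   # mean Earth radius, assumed for the sketch


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_monitor(centroid, monitors, cutoff_miles=10.0):
    """Return (monitor_id, distance) for the closest monitor, or None when
    the closest monitor lies more than cutoff_miles from the centroid."""
    candidates = ((haversine_miles(centroid[0], centroid[1], lat, lon), mid)
                  for mid, (lat, lon) in monitors.items())
    best = min(candidates, default=None)
    if best is None or best[0] > cutoff_miles:
        return None
    return best[1], best[0]


# Hypothetical monitors and tract centroids (latitude, longitude in degrees).
monitors = {"A": (38.90, -77.04), "B": (39.29, -76.61)}
matched = nearest_monitor((38.95, -77.00), monitors)
unmatched = nearest_monitor((40.50, -75.00), monitors)
```

An individual whose centroid returns None would be dropped from the estimation sample under the ten-mile rule.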




     We  include   two  measures  to  control  for  cigarette



smoking.  The first  is  the  individual's  daily consumption



of cigarettes at  the  time of the interview  (NCIG).   (The



HIS unfortunately contains no information on cigar or pipe



smoking.)   Since  the consumption data are  self-reported,



some  caution must  be   exercised  in  light  of  Warner's



underreporting hypothesis [5]; however,  we make no attempt



here  to  correct   for  this  possible  errors-in-variables



problem.     (Interestingly,   in   light   of   the  mounting



evidence on the  harms of passive smoking, attributing zero



as  the   number   of   cigarettes  smoked  per   day   by   a



"nonsmoker"  represents  perhaps  an  understatement  of  the



daily dosage of  cigarettes.)



     Both medical  evidence  and common sense  suggest that



the  rate of  current cigarette  consumption  alone   is  an



insufficient   characterization   of    an    individual's



smoking-related risk of respiratory illness (see, for



example,  Chapter  Six   of   the  1979   Surgeon  General's



report).   A  more appropriate  characterization  of  these



risks incorporates the  influences of both current as well



as past cigarette consumption.  Accordingly, the influence



on the likelihood of   current acute  respiratory illness  of



lifetime cigarette consumption is measured by the variable



PACKS,  a  proxy  for the  number  of  cigarette  packs  that  a



given individual  has  "amassed"  over his  or  her lifetime.



PACKS can be  viewed as  a stock  or state  variable equal  to




the integral over  an  individual's  lifetime cigarette pack



consumption profile {C(t)}.  (See Chapter 5 for a



discussion of the creation of the K(T) measures.)  The



measure defined in Chapter 5  and converted into  pack units



is  selected from  the  set of  candidates  to serve  as  the



packyear proxy in the present analysis.



     Table  6-1 provides  a summary description of  the  air



pollution  and  smoking  measures,  as  well as  the  other



independent  variables  used,  and  Table 6-2 depicts  their



sample means, minima, and maxima.



     Among  the measures  of  respiratory  illness  available



in the HIS, the number of restricted activity days due to



respiratory  illness  during   the  two-week  recall  period



(RRAD)  is  a  logical  choice  for  use  in  the  present



analysis.   However,  one  drawback to its  use  as  a  measure



of  health  status  is  that  it  is  a somewhat  aggregated



concept.   Any day  reported  as a  bed  disability  or  work



loss day, when due to respiratory illness, is  counted as a



RRAD,  as  are  days  when individuals  are  hampered  in minor



ways from  performing  usual activities  without  confinement



to bed or work loss.  It is possible, however, that the



determinants of minor restrictions differ



-- in kind or in magnitude -- from the



determinants of severe limitations.



     The HIS data  do  not  enable  a complete disaggregation



of these different types  of respiratory restrictions.   For

-------
                          Table 6-1

                    Variable Definitions

Variable Name   Description

OZONE           Average daily maximum one-hour ozone reading during
                two-week recall period at monitor nearest the centroid
                of respondent's census tract of residence, subject to
                ten-mile distance cutoff (in parts per million)

SULFATE         Average 24-hour sulfate concentration during two-week
                period at monitor nearest the centroid of respondent's
                census tract of residence, subject to ten-mile
                distance cutoff (in µg/m^3)

NCIG            Number of cigarettes smoked per day

PACKS           Proxy for lifetime cigarette consumption, in packs
                (see text or [5] for detailed description)

TEMP            Average daily maximum temperature during two-week
                recall period (in degrees F)

RAIN            Average daily precipitation during two-week recall
                period (in inches)

AGE             Age, in years

EDUC            Number of years of schooling

INCOME          Annual family income, in 1979 dollars

CHRONLIM        Equals 1 if respondent reports a persistent limitation
                in activity due to a chronic ailment, equals 0
                otherwise

MALE            Equals 1 if male, equals 0 if female

WHITE           Equals 1 if white, equals 0 if black

BLUECOL         Equals 1 if respondent reports usual activity is
                working and usual employment is blue collar, equals 0
                otherwise

WHITECOL        Equals 1 if respondent reports usual activity is
                working and usual employment is white collar, equals 0
                otherwise

INSCHOOL        Equals 1 if respondent reports usual activity is
                going to school, equals 0 otherwise

-------
                    Table 6-2

Sample Summary of Independent Variables (n=3073)

Variable        Mean        Minimum     Maximum

OZONE           .0426       0           .2136
SULFATE         10.87       .784        52.14
NCIG            7.454       0           98
PACKS           3299.1      0           44174.1
TEMP            63.92       11.14       106.36
RAIN            .114        0           .637
AGE             42.83       17          96
EDUC            11.73       0           13
INCOME          17095       500         30000
CHRONLIM        .173        0           1
MALE            .433        0           1
WHITE           .857        0           1
BLUECOL         .232        0           1
WHITECOL        .288        0           1
INSCHOOL        .079        0           1




example, due to some  peculiarities  in  data  collection,  it



is not  possible in all  instances  to disentangle  work  loss



from  bed  restricted  days  (see  Volume I  for  a  detailed



discussion  of   these  problems).    However,  the  data  do



permit  a   unique  disaggregation   of   RRADs    into   two



qualitatively distinct types:  days  restricted  in activity



due to  respiratory illness but  not  confined  to  bed  (minor



RRAD,  or  RADM),  and  days  restricted  in  activity due  to



respiratory illness with bed  confinement  (severe RRAD,  or



RADS).  Thus, it is possible to determine for each



individual  the  number  of  RADM,  RADS,  and  nonrestricted



days  (NRD)  occurring  during  the  two-week recall  period.



In  the   analysis   to  follow,   we   consider  two  separate



definitions  of  RADM  and  RADS:  first,  those  related  to



respiratory illness classified by NCHS as  either  chronic



or  acute  (RADM-CA,  RADS-CA),  and,  second,   those  related



only  to respiratory  illness  classified as  acute (RADM-A,



RADS-A).4  The sample frequencies are presented in Table



6-3.



     The  nature of  these  health  status  measures is  such



that  several   peculiar   characteristics  must  be  treated



simultaneously  in the estimation procedure.    First,  the



measure best  suited for the analysis is multivariate  in



nature:  during  any two-week period,  individuals can report



minor   or  severe  restrictions    in   activity  due   to



respiratory   illness,   or    can   report   no   respiratory

-------
                        Table 6-3

Sample Frequency Distribution of RRAD Measures (n=3073)

Number of Days    RADM-CA    RADM-A    RADS-CA    RADS-A

      0            3013       3027      3007       3014
      1              16         14        19         18
      2              14         12        18         15
      3               8          6        10         10
      4               6          5         5          5
      5               2          2         4          4
      6               1          1         1          1
      7               2          1         3          2
      8               1          1         3          3
      9               1          1         0          0
     10               1          1         0          0
     11               0          0         0          0
     12               1          1         0          0
     13               0          0         0          0
     14               7          1         3          1




impairment.   Second, outcomes are mutually  exclusive.   On



a day where  an  individual  reports a RADM,  neither  a  RADS



nor  a NRD  can  be reported; similar  exclusivity  holds  for



RADS and NRD.   Third,  for  all individuals, each  of  RADM,



RADS,  and  NRD   is  constrained  to take  integer  values  in



{0,1,...,14}, with the sum RADM+RADS+NRD equal to



fourteen.   Finally,  because of  the protocol  of the HIS,  it



is  not  possible  to determine  on  what   days  during  the



two-week  recall  period  a  given  individual  reported  the



RADM,   RADS,  or  NRD;  only the  number  of  each  type  of



outcome is  known.  While it seems sensible to  suppose that



RRADs  would be  contiguous rather  than disparate during any



particular  time interval, the  data used here do not permit



such a conjecture to be verified.



     Following the discussion  in  Chapter  2 of  this volume,



the  estimation  strategy is to  view  each day during  the



two-week  recall  period as  a trial  on which  one  and  only



one  of the  three possible outcomes  can  occur.   For  each



individual,  then, there  are fourteen trials.   Because any



one   individual's  covariates   are  invariant   across  the



fourteen   trials  and,   as   noted  above,   because  it  is



impossible  to ascertain  which health outcomes occurred on



which days  (except,  of  course,  in the polar case where the



same outcome occurs  on  all  fourteen  days), it  is plausible



for  estimation purposes to  assume independence both across



trials for an individual and across  individuals.   (In the


estimation  subsample  used,  it  happens  that  at  most  one

individual  per  household  is included.    Thus, contagion

effects -- which might otherwise vitiate the assumption of

independence across individuals -- can be ignored.)

     The  preceeding  paragraphs  describe a  model that  can

be   appropriately   cast   in   terms   of   a   multinomial

distribution with $k=3$ possible outcomes; $n_t = n_\tau = n = 14$
independent trials for all $t,\tau$; and probability vector
$(\pi_{Mt}, \pi_{St}, \pi_{Nt})$ (M=RADM, S=RADS, N=NRD) such that
$\pi_{Mt} + \pi_{St} + \pi_{Nt} = 1$.  The number of successes or incidences of
each type is $n_{qt}$ for $q=M,S,N$, and $n_{Mt} + n_{St} + n_{Nt} = 14$ for all
$t$.  Thus, denoting the multinomial (vector) random
variable for the t-th individual as $Y_t$,

     $$\Pr(Y_t = y_t) = n! \prod_{q \in Q} \left[ (\pi_{qt})^{n_{qt}} \right] / n_{qt}!,   \quad (1)$$

where $Q=\{M,S,N\}$ and $n=14$.  A logit specification for the
$\pi_{qt}$ is assumed:

     $$\pi_{qt} = \exp(X_t \beta_q) \Big/ \Big( \sum_{r \in Q} \exp(X_t \beta_r) \Big),   \quad (2)$$

for $q=M,S,N$.  The parameter vectors $\beta_q$ are unique only up
to a difference, so that some normalization is necessary;
$\beta_N = 0$ is used here.  Details on estimation are presented in
the appendix.
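Equations (1) and (2) can be sketched in a few lines of code. The example below is only an illustration: the function names, the two-element covariate vector, and the coefficient values are made up for the sketch and are not taken from the report.

```python
import math

def logit_probs(x, beta_m, beta_s):
    """Return (pi_M, pi_S, pi_N) from equation (2) with beta_N normalized to zero."""
    score_m = sum(xi * bi for xi, bi in zip(x, beta_m))
    score_s = sum(xi * bi for xi, bi in zip(x, beta_s))
    # exp(0) = 1 is the contribution of the normalized outcome N
    denom = math.exp(score_m) + math.exp(score_s) + 1.0
    return math.exp(score_m) / denom, math.exp(score_s) / denom, 1.0 / denom

def log_pmf(counts, probs):
    """Log of equation (1): multinomial probability of the day counts (n_M, n_S, n_N)."""
    n = sum(counts)
    value = math.lgamma(n + 1)                      # log n!
    for n_q, pi_q in zip(counts, probs):
        value += n_q * math.log(pi_q) - math.lgamma(n_q + 1)
    return value

# Hypothetical covariates (intercept, one regressor) and coefficients.
x = [1.0, 0.04]
probs = logit_probs(x, beta_m=[-5.0, 7.0], beta_s=[-4.0, 2.0])
print(sum(probs))                                   # the three probabilities sum to one
print(math.exp(log_pmf((1, 0, 13), probs)))         # one minor day in fourteen
```

Because the same probability vector applies to all fourteen days, only the counts of each outcome matter, which is why the recall data need not say which outcome fell on which day.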

     A  basic  and  more  popular   version  of  the  model

-------
                           6-13



described above  is the  ordered  logit model  described  in



Chapter 2,  in  which  it is assumed that  there exists  some



mechanism that orders  the  outcome  probabilities  according



to  a  particular  latent  measure  (illness  severity,  for



example).   The typical assumption is  that the coefficients



$\beta_q = (\beta_{q2},\ldots,\beta_{qK})$ are invariant across the outcomes $q$
(except for the outcome whose parameter vector remains
normalized to zero), with the ordering characterized by
outcome-specific intercept terms, such that
$\beta_{q1} < \beta_{r1}$ for $r > q$, with ">" signifying the



ordering  "more severe  than."   For  purposes  of comparison,



therefore, we also present estimates of a multiple-trial



version of  an  ordered logit model.   It  happens  that  this



is a parameter-restricted version of  the multinomial model



specified above, with (K-1) restrictions of the form $\beta_{Sk} = \beta_{Mk}$, $k=2,\ldots,K$,



on  the likelihood function  (A.2)  implied  by  the  ordered



logit likelihood function.  It is thus possible to test in



a  straightforward  manner  whether  these  restrictions  are



valid  insofar  as  the model and  data  sample used here are



concerned.   The  test  is  a standard likelihood ratio test,



with the test statistic computed as $LR = -2(\ell_0 - \ell_A)$; $\ell_0$ is
the maximized log-likelihood function value for the ordered
logit specification and $\ell_A$ is the corresponding value for

-------
                           6-1 4



the multinomial model (A.2).  Under the null hypothesis
that the (K-1) restrictions are valid, LR is distributed
asymptotically as central $\chi^2$ with (K-1) degrees of



freedom.
6.4 Estimates of Model Parameters and Relative Risks



     The estimates of the model using the chronic and



acute RRAD  measures  of respiratory  illness are  presented



in  Table   6-4.     Insofar   as   the   parameter   estimates



associated  with  the  independent  variables   other  than



smoking or  pollution  are  concerned,  it is  seen  that  most



are  statistically significant  in  at  least  one  of  the



RADM-CA  or  RADS-CA   estimated  parameter  vectors,  with



generally   plausible   signs   in  most  instances.     The



parameter  estimate  associated with  the current level  of



cigarette smoking (NCIG) is statistically important in $\beta_S$
but is insignificant in $\beta_M$.  Lifetime cigarette
consumption (PACKS) plays an opposite role: its associated
parameter estimates are positive and significant in the $\beta_M$
vector, but statistically indistinguishable from zero in
$\beta_S$.  SULFATE appears to be an insignificant contributor to
either RADM-CA or RADS-CA.  OZONE, conversely, has an
associated parameter estimate in $\beta_M$ that is positive and
statistically significant, although the ozone coefficient
in $\beta_S$ is statistically unimportant.  This finding is



consistent  with  those  in Volume  I.   There, using  more

-------
                       Table 6-4
   Model Estimates: Chronic and Acute RRADs with
           Linear Risk Factor Influence

Variable         β_M-CA               β_S-CA
INTERCEPT        -6.34    (9.8)       -3.64    (7.2)
OZONE             7.03    (2.5)        1.77    (.45)
SULFATE           .0061   (.54)       -.012    (.84)
NCIG             -.0034   (.61)        .026    (4.8)
PACKS             .36E-4  (3.1)        .53E-5  (.33)
TEMP             -.014    (3.4)       -.025    (5.2)
RAIN              0.85    (1.3)        0.45    (.62)
AGE              -.0084   (1.9)        .31E-3  (.061)
EDUC             -.0082   (.39)       -.058    (2.4)
INCOME           -.32E-4  (4.1)       -.63E-4  (7.0)
CHRONLIM          0.99    (6.7)        0.46    (2.6)
MALE              0.28    (2.1)        .079    (.53)
WHITE             2.59    (5.1)        0.78    (3.2)
BLUECOL          -1.76    (5.9)       -0.17    (.75)
WHITECOL         -0.18    (1.0)        0.83    (4.3)
INSCHOOL         -0.42    (1.4)        0.90    (3.1)

Log(L) = -2734.55

Note:  Asymptotic normal scores for H0: β_k = 0 in parentheses.

-------
                           6-15



"primitive" OLS  and logit  techniques,  we found  positive



and often significant associations between ozone and minor



illnesses  among  adults,  but   no  pattern of  associations



when we  examined either  work  loss  or  bed disability days.



Thus,   the   findings   in  this   chapter  provide   some



corroborative  evidence  using a  more  sophisticated  and



appropriate statistical approach.



     Similar  estimates   obtain   in  the  model   of   the



acute-only respiratory ailments RADM-A and RADS-A,



presented  in  Table  6-5.    Most  notable   is  that  the



individual  parameter  significance   levels  tend   to  be



somewhat  lower  than those estimated  in  the  chronic-acute



model    of    Table    6-4,    although    the    qualitative



interpretation is in most instances unchanged.  Of
particular import is that the coefficient estimates
associated with OZONE and PACKS in $\beta_M$ are no longer
significant at the 95% level.⁵



     In   Chapter  4  we  found  that   various   nonlinear



transformations  of   the  ozone measure  lead  to  differing



conclusions  about   the  significance  of   the  relationship



between  ozone  and   respiratory health.    There,  remember,



the transformation  (OZONE)'   proved most significant.   As



such  nonlinearities  are  potentially  important  in  the



present   analysis   as   well,   we   also   consider   simple



transformations  of   OZONE,  NCIG,   and PACKS  of  the  form



$OZONE^{\lambda_1}$, $NCIG^{\lambda_2}$, and $PACKS^{\lambda_3}$ for $\lambda_1,\lambda_2,\lambda_3>0$.    (We ignore

-------
                       Table 6-5
     Model Estimates: Acute-only RRADs with
           Linear Risk Factor Influence

Variable         β_M-A                β_S-A
INTERCEPT        -6.67    (8.9)       -4.66    (8.1)
OZONE             5.09    (1.3)       -2.39    (.49)
SULFATE           .0031   (.59)        .0031   (.22)
NCIG             -.0088   (1.1)        .023    (3.5)
PACKS             3.1E-4  (1.8)        1.4E-5  (.072)
TEMP             -.0069   (1.2)       -.024    (4.4)
RAIN              0.46    (.53)        0.54    (.68)
AGE               .0053   (.95)        .0057   (.98)
EDUC             -.0088   (.31)       -.039    (1.4)
INCOME            .28E-4  (2.8)        .41E-4  (4.4)
CHRONLIM         -1.10    (3.6)        0.13    (.58)
MALE             -0.15    (.79)       -0.30    (1.3)
WHITE             1.82    (3.6)        0.65    (2.4)
BLUECOL          -1.25    (3.6)        0.55    (2.1)
WHITECOL          0.13    (.59)        1.35    (5.9)
INSCHOOL          0.28    (.82)        1.57    (5.0)

Log(L) = -2048.53

Note:  Asymptotic normal scores for H0: β_k = 0 in parentheses.

-------
                           6-1 6




transformations of SULFATE because of its generally




insignificant contributions as witnessed in Tables 6-4 and



6-5.)   The  software used  for  estimation does  not  enable




maximum likelihood estimation of the $\lambda_j$; instead, a grid
search approach is used, where the search is over
$(\lambda_1,\lambda_2,\lambda_3) \in \Lambda \times \Lambda \times \Lambda$, and $\Lambda = \{0.5, 1.0, 1.5, 2.0\}$.  Of the
sixty-four possible $(\lambda_1,\lambda_2,\lambda_3)$ triples, that which
maximizes the conditional (on the $\lambda_k$) likelihood function
with respect to $(\beta_M,\beta_S)$ is selected as the (pseudo) MLE.
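The grid search just described can be sketched as follows. This is a minimal illustration, not the SAS routine used for the report: `profile_loglik` is a hypothetical callable assumed to return the log-likelihood maximized over the coefficient vectors for fixed exponents, and the toy objective exists only to exercise the search.

```python
from itertools import product

LAMBDA_GRID = (0.5, 1.0, 1.5, 2.0)

def grid_search(profile_loglik, grid=LAMBDA_GRID):
    """Return the exponent triple with the highest profile log-likelihood.

    profile_loglik(l1, l2, l3) is assumed to return the log-likelihood
    maximized over (beta_M, beta_S) with the exponents held fixed.
    """
    triples = list(product(grid, repeat=3))          # 4**3 = 64 candidates
    best = max(triples, key=lambda lam: profile_loglik(*lam))
    return best, len(triples)

# Stand-in objective with a known peak; a real run would refit the model each time.
def toy_profile_loglik(l1, l2, l3):
    return -((l1 - 0.5) ** 2 + (l2 - 1.5) ** 2 + (l3 - 1.0) ** 2)

best, n_candidates = grid_search(toy_profile_loglik)
print(best, n_candidates)
```

Since the exponents are restricted to grid points rather than estimated freely, the selected triple is a pseudo-MLE, which is why likelihood ratio tests on the transformations should be read with some caution.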



     The estimates  of  the  RADM-CA  and  RADS-CA model using




the nonlinear transformations  are  presented  in  Table  6-6.




The pseudo-MLEs of the $\lambda_k$ are $\lambda_1=0.5$, $\lambda_2=1.5$, and $\lambda_3=1.0$,


with  a  likelihood  ratio   test   indicating  that   these




transformations  are jointly significant,  at  greater  than




the  95%  level.    The  overall  qualitative  findings  are




unchanged;   however, the   parameter  estimates  associated




with the transformed risk  factors are more finely resolved



than those presented in Table 6-4.   Similar statements can




be made  about  the  acute-only  model, whose  estimates  are



presented in Table 6-7.  Again, the pseudo-MLEs for the $\lambda_k$


are  0.5,  1.5,  and  1.0  for  the  OZONE,  NCIG,  and  PACKS



transformations,  respectively.  The  likelihood  ratio  test



of  the  joint  significance  of  the  transformations  is




significant only at slightly above the 90% level; since
the $\lambda_k$ are not true MLEs, however, such an LR test is




somewhat misleading, and  is  biased  in  favor  of  accepting

-------
                       Table 6-6
   Model Estimates: Chronic and Acute RRADs with
         Nonlinear Risk Factor Influences

Variable         β_M-CA               β_S-CA
INTERCEPT        -6.75    (10.5)      -3.64    (7.2)
OZONE^λ1          5.10    (3.5)        2.86    (1.7)
SULFATE           .0039   (.35)       -.014    (1.0)
NCIG^λ2          -.25E-3  (.32)        .0034   (5.3)
PACKS^λ3          .35E-4  (3.0)        .81E-5  (.53)
TEMP             -.020    (4.2)       -.030    (5.9)
RAIN              1.01    (1.5)        0.57    (.78)
AGE              -.0080   (1.9)       -.94E-3  (.19)
EDUC             -.0071   (.33)       -.059    (2.4)
INCOME           -.32E-4  (4.1)       -.64E-4  (7.1)
CHRONLIM          0.99    (6.7)        0.46    (2.9)
MALE              0.28    (2.0)        .064    (.43)
WHITE             2.56    (5.0)        0.75    (3.0)
BLUECOL          -1.77    (6.0)       -0.15    (.65)
WHITECOL         -0.19    (1.1)        0.86    (4.4)
INSCHOOL         -0.41    (1.3)        0.86    (3.0)

Log(L) = -2729.13

Note:  Asymptotic normal scores for H0: β_k = 0 in parentheses.
       λ_k from grid search: λ1=0.5; λ2=1.5; λ3=1.0.

-------
                        Table 6-7
        Model Estimates: Acute-only RRADs with
           Nonlinear Risk Factor Influence

Variable         β_M-A                β_S-A
INTERCEPT        -7.09    (9.5)       -4.59    (7.9)
OZONE^λ1          4.89    (2.5)        1.35    (.71)
SULFATE           .0052   (.37)        .44E-3  (.032)
NCIG^λ2          -.33E-3  (.26)        .0034   (4.1)
PACKS^λ3          .23E-4  (1.3)       -.88E-6  (.046)
TEMP             -.013    (2.1)       -.029    (5.2)
RAIN              0.68    (.78)        0.69    (.87)
AGE               .0066   (1.2)        .0052   (.91)
EDUC             -.0076   (.27)       -.040    (1.4)
INCOME           -.28E-4  (2.8)       -.42E-4  (4.5)
CHRONLIM         -1.09    (3.6)        0.13    (.58)
MALE             -0.14    (.74)       -0.31    (1.9)
WHITE             1.79    (3.5)        0.62    (2.3)
BLUECOL          -1.27    (3.7)        0.56    (2.2)
WHITECOL          0.12    (.54)        1.36    (6.0)
INSCHOOL          0.30    (.89)        1.55    (4.9)

Log(L) = -2045.12

Note:  Asymptotic normal scores for H0: β_k = 0 in parentheses.
       λ_k from grid search: λ1=0.5; λ2=1.5; λ3=1.0.

-------
                           6-17


the null that transformations are not important.


     The estimates of the ordered logit models are
presented in Table 6-8.  The $\lambda_k$ transformations suggested
by the multinomial models of both Table 6-6 and Table 6-7
are used here.  In the model of RADM-CA and RADS-CA
(column 1), the parameter estimates for $(OZONE)^{.5}$, TEMP,
INCOME, CHRONLIM, WHITE, BLUECOL, $(NCIG)^{1.5}$ and PACKS are
all significant at greater than the 99% level.  A perhaps
peculiar result is that the estimate of the RADM intercept
exceeds that for RADS, thus calling into question the
validity of the ordered logit specification.  Indeed, the
$\chi^2_{(15)}$-distributed likelihood ratio test statistic of the
restrictions on the multinomial model that are implied by
the ordered specification has a value of 89.58, suggesting
that the ordered specification can be rejected with
considerable confidence in favor of the unrestricted
multinomial model.  Similar results obtain for the model
of RADM-A and RADS-A (column 2): estimated parameters
associated with TEMP, INCOME, WHITE, WHITECOL, INSCHOOL,
and $(NCIG)^{1.5}$ are significant at above the 99% critical
level, while the asymptotic t-statistics associated with
the $(OZONE)^{.5}$ and CHRONLIM parameters exceed the 95% level.
In this instance, the $\chi^2$ test statistic for the ordered


logit  model  restrictions  has  a  value of  68.58,  again


suggesting  that  the  ordered specification be  rejected in


favor  of the general multinomial model.
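The acute-only test statistic can be reproduced from the log-likelihood values printed in Tables 6-7 and 6-8. The sketch below assumes those two printed values are read correctly; the function name and the hard-coded 1 percent critical value for 15 degrees of freedom (taken from standard chi-square tables) are additions for illustration.

```python
def likelihood_ratio(loglik_restricted, loglik_unrestricted):
    """LR = -2 * (restricted log-likelihood - unrestricted log-likelihood)."""
    return -2.0 * (loglik_restricted - loglik_unrestricted)

# Acute-only models: ordered logit (Table 6-8) against multinomial (Table 6-7).
lr_acute = likelihood_ratio(-2079.41, -2045.12)

# 1 percent critical value of chi-square with 15 degrees of freedom (standard tables).
CHI2_15_CRIT_01 = 30.58

print(round(lr_acute, 2), lr_acute > CHI2_15_CRIT_01)
```

The statistic is more than twice the critical value, which is the sense in which the ordered specification is rejected with considerable confidence.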

-------
                           Table 6-8
Model Estimates: Ordered Logit, Nonlinear Risk Factor Influence

Variable           Chronic-Acute         Acute-only
INTERCEPT-RADM     -4.28    (11.6)       -4.82    (11.0)
INTERCEPT-RADS     -5.07    (13.7)       -5.45    (12.3)
OZONE^λ1            4.17    (3.8)         3.15    (2.3)
SULFATE            -.0033   (.37)         .0033   (.35)
NCIG^λ2             .0017   (3.4)         .0020   (3.0)
PACKS^λ3            .24E-4  (2.6)         .93E-4  (.71)
TEMP               -.024    (7.1)        -.022    (5.3)
RAIN                0.76    (1.5)         0.57    (1.1)
AGE                -.0045   (1.4)         .0063   (1.6)
EDUC               -.029    (1.8)        -.024    (1.2)
INCOME             -.46E-4  (7.9)        -.35E-4  (5.2)
CHRONLIM            0.75    (6.7)        -0.41    (2.3)
MALE                0.17    (1.7)        -0.22    (1.8)
WHITE               1.38    (6.3)         0.99    (4.2)
BLUECOL            -0.93    (5.3)        -0.26    (1.3)
WHITECOL            0.28    (2.1)         0.71    (4.6)
INSCHOOL            0.21    (1.0)         0.92    (4.0)

Log(L)             -2773.84              -2079.41

Note: Asymptotic normal scores for H0: β_k = 0 in parentheses.
      λ_k from grid search: λ1=0.5; λ2=1.5; λ3=1.0.

-------
                           6-1 8



     On the  basis  of these  results,  we elect to  use  the



estimates  of the nonlinear  risk  factor  multinomial models



presented in Tables 6-6 and 6-7 as the foundation of the



relative  risk  estimates.     In  the   multinomial   model,



translations from the qualitative outcomes  to quantitative



estimates  of  relative risks  are fairly  straightforward.



One obvious strategy  for evaluating the relative  risks of



smoking and air  pollution  would be to  assess  and  compare



the estimated  elasticities  of  the daily  outcome-specific



probabilities  with  respect  to  the  pollution  and  smoking



control  variables.    Using the  incidence  probabilities



defined in   (2),  and allowing for  the cases where  the



control   variables    are   subject   to   the   nonlinear



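transformations $h(x_{tk}) = (x_{tk})^{\lambda_k}$, the elasticity formula is:

     $$\frac{\partial \ln \pi_{qt}}{\partial \ln x_{tk}} = \lambda_k (x_{tk})^{\lambda_k} \Big( \beta_{qk} - \sum_{r \in Q} \beta_{rk} \exp(\tilde X_t \beta_r) \Big/ \sum_{s \in Q} \exp(\tilde X_t \beta_s) \Big),   \quad (3)$$

which simplifies when $\lambda_k = 1$.  (Here, $\tilde X_t$ denotes the
transformed $X_t$ vector.)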


     While  the  elasticity  comparison  approach  provides




perhaps the most  straightforward  method for assessing the




relative  risks  of interest, the  nature of  the  data used




here  renders  it  somewhat  uninformative.   In  brief,  the



problem is that 64% of the sample are classified as

-------
                           6-19

current nonsmokers, while 44% are never smokers.   It is

seen by  inspection of equation  (3)  that for  these large

subsamples the  estimated NCIG and  PACKS  elasticities  are

zero.
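The zero-elasticity problem is easy to see numerically. The sketch below implements the elasticity of a logit incidence probability with respect to a power-transformed covariate, as implied by equation (2); the function names, the coefficient values, and the two-column covariate layout are hypothetical and chosen only for illustration.

```python
import math

def probs(x_tilde, betas):
    """Equation (2): betas maps each outcome to its coefficient list (N is all zeros)."""
    scores = {q: sum(x * b for x, b in zip(x_tilde, beta)) for q, beta in betas.items()}
    denom = sum(math.exp(s) for s in scores.values())
    return {q: math.exp(s) / denom for q, s in scores.items()}

def elasticity(q, k, x_raw, lambdas, betas):
    """d log(pi_q) / d log(x_k) when x_k enters the index as x_k ** lambda_k."""
    x_tilde = [x ** lam for x, lam in zip(x_raw, lambdas)]
    pi = probs(x_tilde, betas)
    weighted = sum(pi[r] * betas[r][k] for r in betas)   # probability-weighted mean coefficient
    return lambdas[k] * x_tilde[k] * (betas[q][k] - weighted)

# Made-up coefficients: column 0 is the intercept, column 1 is cigarettes per day.
betas = {"M": [-5.0, 0.003], "S": [-4.0, 0.02], "N": [0.0, 0.0]}
lambdas = [1.0, 1.5]

smoker = elasticity("S", 1, [1.0, 20.0], lambdas, betas)
nonsmoker = elasticity("S", 1, [1.0, 0.0], lambdas, betas)
print(smoker, nonsmoker)
```

For anyone whose smoking variable is zero the leading factor is zero, so the elasticity vanishes regardless of the coefficients.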

     We therefore  adopt  in lieu  of  elasticity comparison

an approach  that  considers the discrete  changes  from  the

baseline or prevailing daily incidence probabilities

attributable  to  a  variety  of  discrete  changes   in  the

control  variables from  their  prevailing sample  values.

This strategy has  at least  two  advantages.   First,  it

circumvents  the non- or never-smoker problem.  Second,  the

magnitudes  of the  hypothetical   discrete  changes  in  the

control variables are set to mimic potentially interesting

policy  measures.

     The strategy  is as  follows.   First, for  each  of  the

four   incidence    outcomes   (RADM-CA,   RADM-A,   RADS-CA,

RADS-A), a  baseline  mean probability  is  calculated using

the estimated models in Tables 6-6 and 6-7.  This mean
probability -- which is simply the sample average of the
$\pi_{qt}$ -- is denoted $\bar\pi_q$.  The second step is to perturb the
control variable of interest in each $X_t$ by the specified
amount and reevaluate each individual's incidence
probability using the perturbed $X_t$.  The sample average of
these new $\pi_{qt}$ is denoted $\tilde\pi_q$.  Finally, for each illness
measure and each control perturbation considered, the
difference $\tilde\pi_q - \bar\pi_q$ is calculated.  The results are

-------
                           6-20



presented in Table 6-9.
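The three steps can be sketched as below. This is a minimal illustration under stated assumptions: the three-row sample, the coefficient values, and the helper names are invented for the example, and only the overall procedure (baseline mean, perturbed mean, difference) follows the text.

```python
import math

def incidence_prob(x_raw, lambdas, betas, q):
    """pi_q from equation (2), with each covariate raised to its exponent first."""
    x_tilde = [x ** lam for x, lam in zip(x_raw, lambdas)]
    scores = {r: sum(x * b for x, b in zip(x_tilde, beta)) for r, beta in betas.items()}
    denom = 1.0 + sum(math.exp(s) for s in scores.values())   # 1.0 is the normalized outcome N
    return math.exp(scores[q]) / denom

def mean_change(sample, lambdas, betas, q, perturb):
    """Return (baseline mean, perturbed mean minus baseline mean) for outcome q."""
    baseline = sum(incidence_prob(x, lambdas, betas, q) for x in sample) / len(sample)
    perturbed = sum(incidence_prob(perturb(x), lambdas, betas, q) for x in sample) / len(sample)
    return baseline, perturbed - baseline

# Made-up rows: [intercept, ozone (ppm), cigarettes per day]; two of three are nonsmokers.
sample = [[1.0, 0.03, 0.0], [1.0, 0.05, 0.0], [1.0, 0.04, 20.0]]
lambdas = [1.0, 0.5, 1.5]
betas = {"M": [-6.0, 5.0, 0.0], "S": [-4.0, 3.0, 0.003]}      # illustrative values only

def add_ten_cigarettes(x):
    return [x[0], x[1], x[2] + 10.0]

baseline, change = mean_change(sample, lambdas, betas, "S", add_ten_cigarettes)
print(baseline * 1e4, change * 1e4)    # scaled by 10,000 as in Table 6-9
```

Unlike an elasticity, the additive perturbation produces a nonzero change for nonsmokers as well, which is the first advantage the text claims for this approach.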



     As Table  6-9  indicates,  depending upon  the  specific



model of  interest,  changes  in either or  both measures  of



smoking as well as ambient ozone  concentrations can affect



the  likelihood  of  an  individual's  reporting  a minor  or



severe  respiratory  impairment.     For  instance,  in  the



RADM-CA model a 5 percent increase from  the sample mean in



the  average  daily  maximum  1-hour  ozone  concentration



increases  the  estimated  risk  of  a  minor  respiratory


impairment on any given day by an average of 1.76 x 10^-4.   A



comparable increase in risk is predicted to result from an



individual's   having  smoked   slightly   more  than   an



incremental one pack per day for  two years (since one pack


per day for one year adds 0.75 x 10^-4 to the risk of a



RADM-CA).    The  same  model  reveals  that  a   ten  percent



increase in  the  average  daily maximum ozone concentration



is about equivalent  to an increase  of an  extra one pack a



day  smoked  for  five years  in terms of  incremental  risk



(3.50 x 10^-4 and 3.85 x 10^-4, respectively).
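The equivalences quoted in this paragraph follow from simple ratios of the RADM-CA risk changes given in the text. The variable names below are illustrative; the four numbers are the ones quoted above.

```python
# RADM-CA risk changes quoted in the text (probability units, not x10,000).
risk_ozone_5pct = 1.76e-4      # 5 percent increase in mean ozone
risk_ozone_10pct = 3.50e-4     # 10 percent increase in mean ozone
risk_one_pack_year = 0.75e-4   # one pack per day for one year
risk_five_pack_years = 3.85e-4 # one pack per day for five years

# Years of one-pack-a-day smoking matching a 5 percent ozone increase.
pack_years_equivalent = risk_ozone_5pct / risk_one_pack_year

# Ratio of the 10 percent ozone effect to the five-pack-year effect.
ten_pct_vs_five_years = risk_ozone_10pct / risk_five_pack_years

print(pack_years_equivalent, ten_pct_vs_five_years)
```

The first ratio is a little above two, matching "slightly more than ... two years," and the second is a little below one, matching "about equivalent."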



     We  can  also  compare  the incremental  risks   of  air



pollution  with  those  associated  with  current  cigarette



consumption.  For instance,  from  the RADS-CA model, a five



percent increase in ambient ozone concentrations increases



the baseline risk of a severe acute respiratory illness by

0.89 x 10^-4.   This is about one-twelfth the effect of an



individual's  currently smoking an  additional  half pack of

-------
                          Table 6-9

                Estimated Mean Changes from
            Baseline Probabilities π̄_q (x10,000)

                              RADM-CA  RADS-CA  RADM-A  RADS-A
Baseline π̄_q                   60.202   50.671  35.331  40.910

Hypothetical
Control Change:

OZONE_t + .05*mean(OZONE)        1.76     0.89    0.98    0.35
OZONE_t + .10*mean(OZONE)        3.50     1.75    1.95    0.68
NCIG_t + 5                        --      4.47     --     3.46
NCIG_t + 10                       --     10.80     --     3.40
PACKS_t + 365                    0.75     0.14    0.29     --
PACKS_t + 1825                   3.85     0.71    1.49     --

Notes: "--" signifies negative predicted change.
       mean(OZONE) signifies the sample mean concentration of
          OZONE_t (.0426).

-------
                           6-21



cigarettes per day.   Comparable  calculations  could be  made



in the other models as well  (although we  prefer  not  to go



into  detail  here  since   the  significance levels on  the



variables of interest are not sufficiently high to warrant



large confidence in the estimated risk  changes).



     For purposes of  public  policy,  it  would be  desirable



to go beyond the  estimation  of  relative  risks  to consider



the cost  and  efficacy of "control"  measures.   This  would



permit at least crude cost-effectiveness comparisons  to be



made.    Some  estimates   are  available  regarding  ozone



reductions.   White  [6]   reports  that  when  the  National



Ambient  Air  Quality  Standard  for ozone  was  reviewed  in



1978, the marginal  cost  of meeting a standard  of 0.12 ppm



as  opposed to  one  of   0.14  ppm  was  approximately  $2.0



billion.    Although  the  form  of that  standard  (second



highest  hourly reading  at  a  monitor)  differs   from  the



measurement of  ozone  in  this  study (average  daily maximum



one-hour reading during a two-week period), a link between



the two  could  be  made.  This would  permit an  estimate of



the  costs  per  unit  of  predicted  ozone risk  reduction,



holding   other   possibly  beneficial  effects   of   ozone



reduction   (agricultural  productivity   increases,    for



example) constant.



     In  principle,  estimates  could  be  assembled on  the



costs of  reducing  cigarette  consumption (for an excellent



discussion of  the  nature of  such costs,  see  [1]).   Using

-------
                           6-22




such data, and  the  results presented above,  estimates  of




cost  per   unit  of  reduced  risk  from  smoking  could  be




derived and  compared  with those resulting  from  pollution




control.   Finally,  if  appropriate allowances were made for




the qualitatively different nature of the  two risks -- the




differing degrees of voluntarism, for instance -- it would




be possible to draw inferences about potentially  efficient




resource  allocation.

-------
                           6-23

APPENDIX



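     Given an independent sample of T observations, the
likelihood function is

     $$\prod_{t=1}^{T} \Big\{ n! \prod_{r \in Q} \big[ (\pi_{rt})^{n_{rt}} \big] / n_{rt}! \Big\}.   \quad (A.1)$$

In logs,

     $$\ell = \sum_{t=1}^{T} \sum_{r \in Q} \Big\{ n_{rt} \Big[ X_t \beta_r - \log\Big( \sum_{s \in Q} \exp(X_t \beta_s) \Big) \Big] \Big\} + c,   \quad (A.2)$$

where c does not depend on $\beta = (\beta_M, \beta_S)$.  $\ell$ is concave in $\beta$,
thus assuring convergence.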

     A Newton-Raphson algorithm programmed in SAS's PROC
MATRIX is used for estimation.  Except for the adjustment
for the multiple-trial nature of the data, the vector of
first derivatives and matrix of second partials of $\ell$ with
respect to $\beta$ are identical to those of the more familiar

single-trial multinomial logit model.  Thus,
and

-------
                           6-24




where $q,p \in \{M,S\}$, and $\delta_{qp}$ is the Kronecker delta.    The




information matrix estimate is
evaluated at $\hat\beta$; its inverse serves to estimate $\mathrm{Cov}(\hat\beta)$.
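For reference, the derivatives of a multiple-trial multinomial logit log-likelihood of the form (A.2) take the standard textbook shape shown below, under the normalization $\beta_N = 0$ and with $n = 14$ trials per individual. This is a derivation from (A.2) offered as a sketch, not a transcription of the authors' own displays; the information matrix estimate is then the negative of the stacked second-derivative blocks.

```latex
\frac{\partial \ell}{\partial \beta_q}
  = \sum_{t=1}^{T} \left( n_{qt} - n\,\pi_{qt} \right) X_t' ,
\qquad
\frac{\partial^2 \ell}{\partial \beta_q \,\partial \beta_p'}
  = -\,n \sum_{t=1}^{T} \pi_{qt}\left( \delta_{qp} - \pi_{pt} \right) X_t' X_t ,
\qquad q,p \in \{M,S\}.
```

The first expression uses the fact that the outcome counts for each individual sum to $n$; setting $n = 1$ recovers the single-trial case.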

-------
                           6-25
Notes








1 See Manning et al. [4].








2 Lave [2], p. 2.








3 It is obvious that the "target" groups in a smoking



   cessation or  mitigation  policy differ from those



   in a policy designed to  reduce ambient concentrations



   of air pollution.   One might argue that



   a critical  difference is that  smokers assume  their



   risks voluntarily whereas exposure to ambient air



   pollution is largely involuntary; policy measures,
   it is argued, should be more concerned with those



   risks assumed involuntarily, these being more in  the



   nature of classic economic externalities.  However,



   the recently mounting evidence on the health



   consequences  of passive  smoking suggests that the



   target groups in smoking mitigation policies  might



   well extend beyond the population of  voluntary



   smokers.  To the extent  that passive  smoking  is



   involuntary -- in the sense that the costs



   associated therewith have not  been capitalized by



   market forces — then the distinction between the

-------
                           6-26



   air  pollution  and  smoking  policy  target  groups  tends



   to blur.








4 All illnesses reported in the HIS are coded as either



   chronic  or acute.   Regardless  of  the interval between



   incidence and  time of  survey,  some illnesses  are --



   by  definition  — coded as  chronic due to their



   intrinsically  chronic  nature  (e.g. emphysema, lung



   cancer,  most cardiovascular problems).   Moreover,



   illnesses that might  otherwise be classified  as



   acute  are classified  as  chronic if the interval



   between  their  incidence  and the time of  the interview



   exceeds  three  months.  Thus,  an acute illness,



   according to the NCHS  codification scheme, is an



   illness  that is typically  construed  as acute  and



   that has had a duration  of less than three months



   at  the time  of the interview.








5 It is admittedly troublesome that the signs of the
   estimated coefficients for either NCIG or PACKS are



   negative --  although  not statistically distinguish-



   able from zero --  in  some  of  the  specifications.  We



   suspect  that this  phenomenon  is attributable  largely



   to  collinearity between  the two measures; indeed,



   their  sample correlation is 0.55.

-------
                        6-27



On a priori grounds, as argued earlier,  both



should be included in a model of respiratory illness.



However, if collinearity is severe, their separate



influences become difficult to identify.  To explore



further this possibility, we estimated two alternative



versions of the multinomial model for both



specifications (CA,A) of the RRAD measures, one in



which NCIG, but not PACKS, is included, and one in
which PACKS, but not NCIG, is included.   The results



largely corroborate the collinearity hypothesis: in



all cases, the estimates of the parameters associated



with the single included smoking measure are positive



for both the RADM and RADS probabilities.

-------
                           6-28




REFERENCES








[1]   Atkinson,   A.B.   and   T.W.   Meade.    "Methods   and




     Preliminary Findings  in  Assessing  the Economic  and




     Health   Services   Consequences   of    Smoking,   with




     Particular Reference to Lung  Cancer,"  Journal of the




     Royal Statistical Society  A 137,  pp.  297-312,  1974.




[2]  Lave, Lester B.  Quantitative Risk Assessment in




     Regulation.  Washington:  Brookings,  1982.




[3]   Maddala,   G.S.   Limited-Dependent  and  Qualitative




     Variables  in   Econometrics.    Cambridge:   Cambridge




     University Press,  1983.




[4] Manning, W., J.  Newhouse,  and  J.  Ware.  "The Status of




     Health  in  Demand  Estimation;  or, Beyond  Excellent,




     Good, Fair, Poor,"  in  V.  Fuchs,  ed.  Economic  Aspects




     of Health.  Chicago: University  of Chicago Press for




     NBER, 1982.




[5]   Warner,   Kenneth   E.   "Possible  Increases   in   the




     Underreporting of  Cigarette Consumption,"  Journal of




     the American Statistical Association 73, pp. 314-318,
     1978.




[6] White,  Lawrence.  Reforming Regulation:  Processes  and




     Problems. Englewood Cliffs, NJ: Prentice-Hall, 1981.

-------
                              Chapter 7






                      CHRONIC RESPIRATORY DISEASE






     In the initial analysis in Volume  I  of  ozone and chronic respiratory




disease (CRD),  several regressions were  estimated over what was referred to




as a "residentially stable" group of individuals.    That is, the




observations were restricted to those  individuals who had  been  living in




the same place  for  five years at the time they were interviewed in the  1979




HIS.    The  purpose  of  this  restriction was  to  reduce  the  chances  that




someone who had lived in another location for a long time would be matched




up to air pollution exposures at his or her new location, thus confounding




our analysis of  CRD.   Our findings in  Volume I  (see  especially  p.  4-71)



suggested that  concentrating on the residentially stable made a difference




in the conclusions  one draws from such analysis.




     However,   the  five  year  residency requirement  we  imposed  in  that




analysis is itself rather weak.   Accordingly,  in analysis conducted since




the completion of  Volumes  I and  II,  we have reexamined  the  incidence of




CRD—and its possible link to air  pollution—using a  group of individuals



who had lived for  at  least  ten  years  at the  location  they reported in the




1979  HIS.    While  this  does not  eliminate  the  possibility of  spurious




correlation, it  lessens  it  when  compared   to  the  five-year  residency



restriction imposed earlier.  These results are reported here.




     These results  are responsive in other ways to comments and suggestions




on our  earlier work.    For  instance,  in response to  puzzlement  over  the

-------
                                  7-2






relatively weak performance of  the  smoking  variables  in explaining CRD in



the earlier work,  we included  in the reanalysis  the variable PACKYRS.'  This



measure, described  in  detail  in  Chapter  5,  proxies  individuals'  lifetime



smoking habits.   It is  included along with NCIGS,  a  measure  of current



smoking activity.   Also,  we  have  purged the  list of  regressors  of many



which had little or no  explanatory power in  the  original analysis.  In this



respect, the  models estimated  below  are  akin  to the  "lean"  model  in the



original analysis  (see equation (29), p.  4-77  of  Volume  I).   Finally, in



the analysis  here we have  included an additional measure of long-term air



pollution concentrations,  one which takes data from  just  one year  (1979)



but includes  annual average  readings  for all monitors within 20 miles of



the respondents' census tract centroids.   These are denoted as OZ79AV,
S479AV, and SP79AV for ozone, sulfates, and total suspended particulates,



respectively.



     The analysis of CRD below  differs from that in Volume I in one other



important  respect.    Here  we  have  run  separate  regressions  for  those



individuals  who  received  the   "probe"  questions  concerning respiratory



illness and for those  who  did not.   (Recall that  in  addition to the main



questionnaire, all  respondents  in the HIS were  given  one of  six different



probes  inquiring  in detail about six specific  disease categories.   Thus,



one-sixth of the respondents were asked whether they had any of a number of



specific  respiratory  diseases;   the other  five-sixths of the  sample was



probed (one-sixth each) about cardiovascular, genito-urinary,
musculoskeletal, digestive, and nervous system disorders.)   Even those



individuals not  receiving  the  respiratory probe could report the presence

-------
                                 7-3





of CRD in open-ended  questions  earlier  in  the survey.   However, those who



had a condition like  asthma,  and who  forgot to volunteer that information



in the open-ended questions,  would have the  chance to report  it  if they



received the respiratory  probe  (where asthma is listed).   They would not



have this opportunity if  they  received,  say, the cardiovascular probe.



     Because of this difference, it  is of course possible that the reported



incidence of CRD  might  differ between  the  two  groups.   When we separated



the  two  groups,  this is  precisely  what   was  found.    The  sample  below



consists  of  2,743  individuals who  had  lived  in the same  dwelling for  at



least ten  years at  the  time of  the 1979  HIS.'    In addition  these were



individuals   for  whom complete  data were available  on the  dependent  and



independent   variables  of  interest.   Of  the 2,743  individuals, 460  had



received the respiratory probe questionnaire while the remaining 2,283 had



been administered one of the  other  five probes.   Of the 460 receiving the



respiratory   probe,  67  (or  15  percent)  reported the  presence of a chronic



respiratory  condition.  Of the  2,283  not receiving the respiratory probe,



only 74  (or  3 percent) reported such a  condition.  Since the assignment of



the six probes was  random, this suggests reporting differences  that  merit



separate investigation.   This  we do  below.



     The results of our limited reanalysis  of  the  determinants  of CRD are



presented  below.'     All  models  are  estimated  using  logit  techniques.



Equations (1) - (5) pertain to those receiving the respiratory probe  while



(6) - (10)  include only individuals not  receiving that probe.  In equation



(1),  exposures  to  air pollution are  characterized by the  annual  average



daily one-hour  maximum ozone concentration at the nearest monitor (OZ79NR) ,

-------
                                               7-4
Table 7-1  Regression Results
[Table body not legible in the source scan.]


-------
                                  7-5
Table 7-1 (cont'd.)  Regression Results
                         VARIABLE DESCRIPTION
OZ79NR         Average daily maximum one-hour  ozone  concentration in
               1979 at monitor nearest individual's  residence  (in
               parts-per-million)

OZ79AV         Same as above but averaged over all monitors  within 20
               miles of residence  (in ppm)

OZMULT         Average hourly reading over all monitors  within 20 miles
               and averaged over the period 1974-79  wherever data were
               available

S479NR         Average 24-hour reading for sulfates for 1979 at nearest
               monitor (in micrograms per cubic meter)

S479AV         Same as above but averaged over all monitors  within 20
               miles

SP79NR         Average 24-hour reading for total suspended particulates
               for 1979 at nearest monitor (in ug/m3)

SP79AV         Same as above but averaged over all monitors  within 20
               miles

SPMULT         Average 24-hour reading for all monitors  within 20 miles
               and averaged over 1974-79  wherever data were  available

RACE           Dummy variable (=1 if white, =0 if other)

SEX            Dummy variable (=1 if male, =0 otherwise (female,
               ambiguous, etc.))

INCOME         1979 household income in dollars

EDUCATION      Years of school completed

AGE            In years
AGE^2          Square of above

PACKYRS        Lifetime cigarette consumption

CIGS/DAY       Number of cigarettes per day currently smoked

-------
                                  7-6






and by the annual daily average  sulfate  concentration, again at the nearest



monitor  (S479NR).   (Recall  that  annual  averages  are used  in explaining



chronic  illness  rather  than  the concentrations during the two-week recall



period.   The  latter  are  the appropriate measures  in analyses  of  acute



illness like those in Chapters 4 and 6 above.)



     According to  equation  (1),  annual  average ozone  concentrations  are



positively and significantly associated with  the  likelihood of reporting



CRD in the probe group.  Neither  sulfates nor  any  of  the other independent



variables are related to CRD  in a statistically significant way, including



the more sophisticated smoking variable PACKYRS.  In equation (2), sulfates



are replaced  by  total suspended  particulate matter  (also  measured at  the



nearest monitor)  with virtually no change in the results.  In equation (3)



both  ozone  and  particulates are  averaged  over all  the  monitors within



twenty miles of the respondent's home.  This reduces both the magnitude and



the significance of the estimated ozone effect.  The



particulate estimate  changes  sign (it is expected to be  positive) but is



still  far  from  being  significant.    The  size and  significance of  the



coefficient estimates on the other regressors are unaffected by this change



in the characterization of exposure.  Equation  (4)  replicates (3) but with



sulfates substituted for total suspended particulates.  The results are



virtually  identical  to  those in  (3)  with none  of   the  regressors  being



significantly associated with the  likelihood  of CRD.



     In equation (5), ozone and particulates are measured by the multiyear



(1974-1979)  annual   average  concentration  (see  Volume  I,  Chapter  2,



especially  p.  2-37).    This  change makes a  substantial  difference in the

-------
                                  7-7





size  and  significance of  the  estimated ozone  effect.    In  addition,  the



parameter estimate associated with particulates increases substantially in



significance,   although  it  is  still  well  below  conventionally  accepted



levels  (t  =  1.96 connotes  significance  at the  5  percent level).   As in



equations (1)  - (4),  none of the other regressors,  including either smoking



variable, is significantly associated with CRD.



     Equations (6)  -  (10)  perform the same set of regressions as (1) - (5).



The  difference  is   that   the  sample  in  the  former   consists  of  2,283



individuals,  none of  whom  received the  respiratory probe as  part  of  the



1979 HIS.  Each of these individuals had the opportunity to report the



presence of a chronic respiratory disease in the open-ended part of the HIS



(and 74 did so), but  they were not  shown a list of CRDs and asked whether



they had  any  of  them.   As indicated  above, only  3  percent  of this group



reported CRD, as compared  with  15 percent  of  the sample used in equations



(1) - (5).



     The  findings  in (6)  - (10) provide  an  interesting contrast  to  the



earlier ones.   The  ozone  variable is never estimated  to be significantly



associated with CRD in (6) - (10).  However, the total suspended



particulates  coefficient  estimate is uniformly  more significant  in  this



latter set of regressions.  In fact, in equation (10) TSP is positively and



significantly (at the 5 percent level) associated with CRD.  Sulfates



performed as  weakly as in the earlier runs.



     Of equal  interest is  the performance of other  independent variables in



(6)  -  (10).     For   instance,   income   is  negatively  and  significantly



associated with  CRD  in  all five regressions.    All other things  equal,

-------
                                 7-8






individuals having higher  incomes  are relatively less  inclined  to report



CRD.  In addition, both cigarette smoking variables are significant.



PACKYRS, the measure of accumulated smoking history,  is positively related



to  the  likelihood of  CRD  as  one  would  expect.   The sign  of  NCIGS  is



negative,   however,  suggesting  that current  smokers  are  less likely  to



experience CRD.  One explanation for this seemingly counterintuitive



finding is that individuals who  believe they have or have been diagnosed as



having CRD have in all  likelihood quit smoking.  If so, one would expect to



find only those free of CRD among individuals  currently smoking.



     Because our findings  are quite sensitive to the  choice of the "probe"



or "non-probe" samples, some discussion is required.  It is our opinion that



the "probe" sample—that is, those who received questions about particular



respiratory diseases—is more likely to reflect accurately the incidence of



CRD  in the  United  States.    In  fact,   the  National  Center for  Health



Statistics  uses  the results  from  the  six different  probes  to make  its



estimates   of  specific  disease  prevalence  in the  United  States.    On  the



other hand, one must admit the  possibility  that  at least some individuals



are motivated  by  the  probe  to  report  having some diseases of which they



have heard but for which they never  received a professional diagnosis.



     Concerning the poor performance of even the more sophisticated smoking



measures in (1) -  (5),  we intend  to do additional work. One direction for



this  work  will   be  the  disaggregation   of   the  set   of CRDs  into



disease-specific analyses.  For instance, it might be the case that smoking



(or other  of  the independent variables, for that matter) is related to the



incidence  of  emphysema but  not   to  asthma  or  chronic  sinusitis.    By

-------
                                 7-9






aggregating these  different forms of CRD in the present analysis,  we may be



obscuring disease-specific  associations.   This may  also  shed some light  on



the role of ozone  and  other air pollutants in CRD.

-------
                                Chapter  8



                     ADDITIONAL SENSITIVITY  ANALYSES








     This chapter summarizes the results of some additional sensitivity



analyses  conducted pursuant  to  a  variety  of  comments  and  suggestions



received during the peer review phase of the project.








8.1 The Effects of Precipitation on Acute Health Status



     It was suggested by  several  peer reviewers that the use  of  two-week



daily average precipitation (AVPRECIP) as a covariate in the acute health



status models was perhaps an inappropriate  characterization of the threat



to  health posed  by precipitation.   Rather, it  was argued,  a  superior



characterization  would   account  not  only  for   the  mean   effects  of



precipitation  (as  captured by   AVPRECIP),  but  also  for   the   variance



effects,  i.e.  the  number of  days  during  the  two-week period  on which



rainfall  occurred.    The hypothesis  is  that the same  total  amount of



precipitation during a two-week period (= 14*AVPRECIP) poses a different



risk to health (respiratory health, in particular) when spread out evenly



over  the  two-week period than when  concentrated over  a one  or two day



span.



     Our  data  enable the examination of such  effects.   The  idea  is to



construct a measure of  precipitation  that captures  both the mean and the



variance  effects.   The  measure we  created  to assess   this   question



(RAINDAY) is formulated as AVPRECIP divided by the average number of days



during  the  two-week period  on which any precipitation occurred at all



(AVRAINYN).   Thus,  the  measure can  be construed as the average amount of

-------
                                   8-2




precipitation occurring  on the  days  when any  precipitation  occurred at




all.    The  measure  is  positively  related  to  the  mean effects,  but




negatively related to the variance effects.
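
The construction can be illustrated with a short Python sketch. It assumes that AVRAINYN is the mean of a daily rain/no-rain indicator over the two-week period (our reading of the definition above), and it returns zero for RAINDAY when no rain fell, a convention adopted here only for illustration. The function name and the two rainfall patterns are hypothetical.

```python
def rain_measures(daily_precip):
    """Return (AVPRECIP, AVRAINYN, RAINDAY) for one two-week recall period."""
    n_days = len(daily_precip)
    avprecip = sum(daily_precip) / n_days                       # mean daily amount
    avrainyn = sum(1 for p in daily_precip if p > 0) / n_days   # share of rain days
    # Assumed convention: RAINDAY is zero when no precipitation occurred.
    rainday = avprecip / avrainyn if avrainyn > 0 else 0.0
    return avprecip, avrainyn, rainday

# Same two-week total, spread evenly versus concentrated in two days.
even = [0.1] * 14
burst = [0.7, 0.7] + [0.0] * 12

av_even, yn_even, rd_even = rain_measures(even)
av_burst, yn_burst, rd_burst = rain_measures(burst)
print(av_even, yn_even, rd_even)
print(av_burst, yn_burst, rd_burst)
```

Both patterns share the same AVPRECIP, but RAINDAY is larger for the concentrated pattern, which is the distinction the reviewers asked the measure to capture.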




     In order  to assess  the  possible  effects  of  substituting  this new




measure as an explanatory variable, we examined the sample correlation of




the three measures:








                 AVPRECIP      AVRAINYN       RAINDAY

     AVPRECIP     1.000         0.533          0.790

     AVRAINYN                   1.000          0.038

     RAINDAY                                   1.000








The extraordinarily high correlation between AVPRECIP and RAINDAY has led




us to conclude that the substitution of the latter measure for the former




in our acute health models would probably have little material influence




on the results.  Thus, while the question of the appropriate




characterization of weather stress in statistical models of illness risks



is certainly  an interesting one  that  merits additional  study,  it seems




reasonable to suggest that such additional effort in the present analysis



would probably not lead to additional clarification of the air pollution -



health effects relationships of primary  interest.








8.2 Sample Size, Model Specification, and  Parameter Estimate Sensitivity




     In many  of  the models  estimated  in Volume I, the point estimates of




the  relationships  between  air   pollution   and  illness  varied  across




specifications depending on what set of  regressors was used.  The addition




or deletion  of  regressors not only implied  respecifications  of  the null

-------
                                   8-3



hypotheses under  test,  but also  typically  necessitated different sample



sizes on which the  estimation  was performed.   In most cases, the varying



sample sizes were attributable to the fact that the data availability for



the  various  air  pollution measures  differed  by pollutant,  so  that when



different sets of pollution measures were tested, the sample sizes varied



accordingly.



     Sample  selection  considerations aside,  the  effects of  using  these



different  sample sizes  should   be  manifested  only  in the  efficiency



properties of the estimators.  However, inferences about the relationship



between  air  pollution  and illness  outcomes  depend on  the sample  and



specification used.   Thus, an  understanding  of the cause of the variance



in parameter estimates  seems  essential.



     We  concentrate  our  analysis of  this  phenomenon on models  (49)  and



(50)  estimated in Volume I.   Here, it is noteworthy that the addition of



the covariates N2NR01 and CONR01  in (50) to the set of  pollutants included



in specification (49) (i.e., 03NR01, S4NR01, and SPNR01) has at least two



important implications.  First, the estimated  coefficient associated with



03NR01  falls from 1.87 to  1.41, and the associated t-statistic drops from



2.32 to 1.46.  Second, owing to the relative paucity of CO data, the



estimation subsample in (50) is about 25 percent smaller than that used in



(49): 3,703 versus 4,899, respectively.  Thus, one is necessarily led to



the  question: Is the change in  the estimated ozone  coefficient  and its



significance level due to  the  inclusion of  the additional  covariates,  to



the smaller subsample, or to both?



     To investigate  this important question, we reestimated equation (49)



using the same sample of 3,703 on which specification  (50)  was estimated.



The results of this  exercise  are reported in Table 8-1  below.  There it is

-------
                               Table 8-1

DEP VARIABLE: TRADRSP

                        SUM OF         MEAN
SOURCE      DF         SQUARES       SQUARE      F VALUE     PROB>F

MODEL       18       97.427923     5.412662        3.757     0.0001
ERROR     3684        5307.067     1.440572
C TOTAL   3702        5404.495

    ROOT MSE      1.200239     R-SQUARE     0.0180
    DEP MEAN      0.183365     ADJ R-SQ     0.0132
    C.V.           654.563

                      PARAMETER        STANDARD     T FOR H0:
VARIABLE   DF          ESTIMATE           ERROR   PARAMETER=0   PROB > |T|

INTERCEP    1          0.767031        0.450348         1.703       0.0886
03NR01      1              1.77        0.924612         1.920       0.0550
S4NR01      1       -0.00287839     0.004161826        -0.692       0.4892
SPNR01      1       -0.00032749    0.0006888529        -0.475       0.6345
RACEW1B0    1          0.105050        0.061582         1.706       0.0881
SEXM1F0     1          0.044511        0.044609         0.998       0.3184
MARY1N0     1          0.012869        0.046157         0.279       0.7804
INCOMCON    1      -.0000062991    .00000239667        -2.628       0.0086
FAT         1         -0.775180        0.316011        -2.453       0.0142
FATSQ       1          0.156010        0.061056         2.555       0.0107
AGE         1          0.010623     0.006666158         1.594       0.1111
AGESQ       1      -0.000111508    0.0000702869        -1.586       0.1127
SMOKY1N0    1          0.046446        0.042435         1.095       0.2738
EDCOMCON    1      -0.000489087     0.006831445        -0.072       0.9429
CHRLMDUM    1          0.256219        0.056659         4.522       0.0001
(16)        1       -0.00179559     0.002027094        -0.886       0.3758
(17)        1       -0.00312125     0.001211361        -2.577       0.0100
(18)        1         -0.020593        0.217254        -0.095       0.9245
(19)        1       0.003097651     0.002611947         1.186       0.2357

Note: The 03NR01 estimate is not legible in this copy; the value shown is
the one quoted in the text.  Only three labels (DMAXTEMP, AVPRECIP, HUMIDRF)
are legible for rows (16) - (19), and their alignment is uncertain.

-------
                                   8-4



seen that the estimated coefficient associated with 03NR01 is 1.77 with a



t-statistic of 1.92.   This coefficient  estimate is relatively close to the



1.87 estimated  on the  larger subsample —  well within  one-half  of the



standard deviations of either  estimate.  It seems then that the difference



between the 1.77 value in the reestimated model (49) and the 1.41 value



estimated in model  (50) should be  largely  attributed to the inclusion of



the two additional pollution  covariates.  That such a change results is of



little  surprise given  that  the partial  correlation between  03NR01  and



N2NR01   is  large (0.281).   In the  presence of  such  high correlation, one



would expect that the separate influences of O3 and NO2 would be more



difficult  to  identify than would  be the case  if  the  two  measures  were



orthogonal.   The  results  of  this  exercise are  somewhat reassuring given



that samples of varying sizes  were  used in estimation throughout Volume I.
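
The arithmetic behind this comparison can be laid out explicitly. The sketch below uses only the coefficients and t-statistics quoted in the text; the standard errors are not quoted for all three runs, so they are backed out here as coefficient divided by t-statistic, which is an assumption of the illustration. Variable names are hypothetical.

```python
# (coefficient, t-statistic) for the ozone variable, as quoted in the text.
estimates = {
    "(49), n=4,899": (1.87, 2.32),
    "(49), n=3,703": (1.77, 1.92),
    "(50), n=3,703": (1.41, 1.46),
}

# Implied standard errors: coefficient / t-statistic.
implied_se = {name: coef / t for name, (coef, t) in estimates.items()}

# Shift due to the smaller sample, holding the specification fixed.
sample_shift = estimates["(49), n=4,899"][0] - estimates["(49), n=3,703"][0]
# Shift due to adding N2NR01 and CONR01, holding the sample fixed.
spec_shift = estimates["(49), n=3,703"][0] - estimates["(50), n=3,703"][0]

for name, se in implied_se.items():
    print(f"{name}: implied s.e. = {se:.3f}")
print(f"sample shift = {sample_shift:.2f}, specification shift = {spec_shift:.2f}")
print(f"sample shift in s.e. units = {sample_shift / implied_se['(49), n=4,899']:.2f}")
```

On these figures the change attributable to the smaller sample is a small fraction of a standard error, while the change attributable to the added pollutants is several times larger.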



     In summary,  on  the basis of this  (admittedly small-scale) exercise,



it  seems  fair  to  say that  the effects  of  using  different  estimation



samples were indeed largely restricted  to efficiency effects, and that the



dispersion of  the estimates  of  the  air pollution - illness relationship



should  be  attributed not to  the different samples used, but  rather,  as



would be hoped,  to the different  specifications tested.








8.3 Poisson Regression Analysis of  Volume I Models (48), (49), and (50)



     In the  later  phases of our  research,  we  have largely  turned our



attention  to estimation  techniques  which  we  believe  better treat  the



nature  of   our  dependent  variables  than  the  methods  utilized  in  the



large-scale   analyses   presented  in   Volume   I.      Insofar  as   the



restricted-activity-day measures of  illness  are concerned,  the  Poisson



regression  technique  (described  in   detail  in  Chapter  4)  has  been  a

-------
                                   8-5



preferred estimation method.  While the  large  part  of our analysis using



this methodology is presented in Chapter  4,  it has  been proposed by some



of  our  reviewers  that  for  purposes  of  comparison  we  reestimate  using



Poisson methods some of  the  specifications  that  were estimated by OLS in



Volume I.   Three such  reestimations are  presented here.



     We  elect  to  concentrate  this  effort   on  the  total  respiratory



restricted  activity  days   (TRADRSP)  models   whose   OLS  estimates  were



presented as models (48)-(50) in Volume I.  Recall that these models were



formulated  on three  different  assumptions about  which air  pollutants



should be  included as  explanatory variables.   Models (48)-(50) specified



the set  of air pollution regressors  as,  respectively,  {03NR01,  S4NR01},



{03NR01, S4NR01, SPNR01}, and {03NR01, S4NR01, SPNR01, N2NR01, CONR01}.



Due  to  availability  of  pollution  data,  the  samples   on  which  these



specifications  were  estimated  had  varying  numbers  of  observations;



respectively,  these were 4,906 (197);  4,899 (197);  and 3,703 (154), where



the figures in parentheses are the number of observations having positive



TRADRSP realizations.



     The results of the Poisson reestimations are presented in Tables 8-2



through 8-4.  There it is seen that inferences  drawn  in Volume I about the



relationship  between  ozone  and respiratory-related  restricted  activity



days are largely corroborated by the reanalysis.  Specifically, in all



three specifications,  the  coefficient  estimate associated with 03NR01 is



positive, and statistically different  from zero.  (Recall from Chapter 4,



however,  that these  significance  levels  are  perhaps  overstated.   The



robust covariance estimation techniques used in Chapter 4 are not used in



this reanalysis, so that some caution should be exercised in interpreting



significance  levels.   However,  recall  also that  the parameter estimates

-------
                               Table 8-2

     *** NUMBER OF OBSERVATIONS ***

     N OBS       N POS      N ZERO

      4906         197        4709

     *** PARAMETER ESTIMATES ***

[Parameter estimates are not legible in this copy.]
-------
                               Table 8-3

     *** NUMBER OF OBSERVATIONS ***

     N OBS       N POS      N ZERO

      4899         197        4702

     *** PARAMETER ESTIMATES ***

VARIABLE         BETA HAT        STD ERR        T STAT

INT              -1.95552       0.638335      -3.06347
03NR01            10.2733        1.44988        7.0856
S4NR01        -0.00626601     0.00728557     -0.860058
SPNR01        -0.00225262     0.00126152      -1.78563
RACEW1B0         0.669553       0.131209       5.10296
SEXM1F0          0.175727      0.0762988       2.30315
MARY1N0       (illegible)      0.0786125        -1.066
INCOMCON      -.000028456    0.000004451      -6.39313
FAT               -1.7037       0.340436      -5.00447
FATSQ            0.325624      0.0581044       5.60412
AGE             0.0544776      0.0120978        4.5031
AGESQ         -.000602247    0.000127149      -4.73653
SMOKY1N0         0.247066      0.0725259       2.85506
EDCOMCON       0.00911445      0.0118421      0.769665
CHRLMDUM          1.03795      0.0791518       13.1134
(16)          -0.00682027     0.00302260      -2.25636
(17)           -0.0161773     0.00199425      -6.11196
(18)             0.702162       0.359582       1.95272
(19)            0.0148392     0.00460774        3.2205

Note: Apart from DMAXTEMP, the labels for rows (16) - (19) are not legible
in this copy.

-------
                               Table 8-4

     *** NUMBER OF OBSERVATIONS ***

     N OBS       N POS      N ZERO

      3703         154        3549

     *** PARAMETER ESTIMATES ***

VARIABLE         BETA HAT        STD ERR        T STAT

INT              -1.50828       0.691047       -2.1826
03NR01            8.45847    (illegible)       5.12837

[The remaining rows (S4NR01, SPNR01, RACEW1B0, SEXM1F0, MARY1N0, INCOMCON,
FAT, ..., HUMIDRF, N2NR01, CONR01) cannot be aligned reliably in this copy.]

-------
                                   8-6



themselves should be consistent.)  In addition to the ozone relationship,



the other estimated relationships are largely in line with those reported



in the original Volume I  specifications  estimated by OLS.
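
For readers who wish to see the mechanics, the following Python sketch fits a Poisson regression by Newton-Raphson and reports both the maximum-likelihood covariance matrix and a sandwich-type robust covariance matrix. It is a minimal illustration on synthetic data, not the estimation code used for Tables 8-2 through 8-4. The function name `fit_poisson` is hypothetical, and the sandwich form is a common robust estimator that we assume, without confirmation from this chapter, resembles the approach described in Chapter 4.

```python
import numpy as np

def fit_poisson(X, y, n_iter=50, tol=1e-10):
    """Poisson ML with E[y|x] = exp(x'b); column 0 of X is assumed to be the constant."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                    # start the intercept at log(mean of y)
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)                    # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])            # information matrix (negative Hessian)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    A_inv = np.linalg.inv(X.T @ (X * mu[:, None]))
    B = X.T @ (X * ((y - mu) ** 2)[:, None])      # outer product of the scores
    cov_ml = A_inv                                # valid if mean equals variance
    cov_robust = A_inv @ B @ A_inv                # sandwich form (assumed, see lead-in)
    return beta, cov_ml, cov_robust

# Synthetic example: a mostly-zero count outcome and one regressor.
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-2.0, 0.5])
y = rng.poisson(np.exp(X @ true_beta))

beta_hat, cov_ml, cov_robust = fit_poisson(X, y)
print("beta:", beta_hat)
print("ML t-stats:    ", beta_hat / np.sqrt(np.diag(cov_ml)))
print("robust t-stats:", beta_hat / np.sqrt(np.diag(cov_robust)))
```

The point estimates are identical under either covariance matrix; only the standard errors, and hence the t-statistics, differ. This is why the significance levels in the tables, but not the coefficients themselves, call for caution.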



     The upshot of  this  analysis,  then,  is that the inferences suggested



in Volume I seem substantiated, and while the magnitudes of the estimated



responses  do   differ  (as  would  be  expected  with different  estimation



techniques), the  direction and  general  magnitudes of  the  estimates are



quite comparable.








8.4 Sensitivity to Aggregation Across  Smoking and Chronic Illness Status



     A  common  econometric problem occurs  when disparate  structures are



mistakenly assumed  to be  identical.   When  empirical  analysis proceeds by



aggregating the  disparate  structures  and estimating  as  if  they were



identical, it will  generally be the case that none of the structures will



be estimated  consistently.    It has  been  suggested that  insofar  as the



health  outcome  models   estimated  in   Volume   I   are  concerned,  such



aggregation bias  poses  a   potential  problem when the  structures  of the



health  outcome  models are assumed to be the same across  either smoking



status or chronic illness categories.



     In the present section,  we  undertake a  reanalysis of some  of the



specifications  estimated  in Volume  I,  considering the  possibility that



individuals' illness  responses to covariates are  different depending on



whether they are  never,  former, or current smokers,  and on whether they



are or are not plagued by a chronic respiratory  condition.



     The  first analysis  —  that  of differential  responsiveness   across



smoking  status  —  uses  Poisson regression   analysis  of  the  TRADRSP



dependent  variable.   The  sample sizes used  for  the  groups  of   never,

-------
                                   8-7



former, and current smokers are, respectively, 1,439 (47); 665 (26); and



1,243 (47), where again the numbers of positive TRADRSP realizations are



given  in parentheses.   The  set  of air pollution regressors is limited in



this exercise to ozone and sulfates.



     The results of this analysis are presented in Tables 8-5 through 8-7



in which both the Poisson ML covariance  estimates and those obtained using



the robust methods  discussed  in Chapter 4 of  this volume are presented.



These  results  reveal  an interesting pattern of the relationship between



ozone  and  TRADRSP.   While Table 8-5  shows  the  estimated relationship



between  ozone   and  TRADRSP   to  be  negative   (though  statistically



indistinguishable from  zero)  for never smokers,  entirely different,  and



somewhat surprising,  inferences  are  drawn  about the relationship between



ozone  and acute  respiratory illness for the  groups of former and current



smokers.  In Tables 8-6 and 8-7 it is seen that the estimated ozone effect



for  both  these groups  is  positive  and  statistically significant  at



conventional  levels  even  when  the  robust  estimates  of  the  parameter



standard errors  are used.  The  magnitude  of the  response  appears  to be



largest  for  the group  of  former  smokers,  although the  physiological



underpinnings of  this  phenomenon are  not  obvious.   While we  have  not



tested  statistically  for whether the  structures of  the models  for  the



three  groups  are the  same  (using,  e.g.,  a  likelihood  ratio  test),  the



results  suggest  that  a  reasonable  conjecture  is  that  such  tests  would



reject the hypothesis  of homogeneity.
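
The contrast among the three groups can be summarized by recomputing the ozone t-ratios from the coefficient and standard-error entries of Tables 8-5 through 8-7. The Python sketch below is illustrative only; the dictionary layout and names are hypothetical, and 1.96 is the conventional two-sided 5 percent critical value.

```python
# Ozone (03NR01) entries: (BETA HAT, ML STD ERR, ROBUST STD ERR).
ozone = {
    "never":   (-1.72325, 4.12275, 6.3755),    # Table 8-5
    "former":  (16.6943, 3.50114, 6.96203),    # Table 8-6
    "current": (9.16519, 3.29159, 4.26883),    # Table 8-7
}

CRITICAL = 1.96  # two-sided 5 percent level

results = {}
for group, (beta, se_ml, se_robust) in ozone.items():
    t_ml = beta / se_ml
    t_robust = beta / se_robust
    results[group] = (t_ml, t_robust, abs(t_robust) > CRITICAL)
    print(f"{group:8s} t(ML) = {t_ml:6.2f}   t(robust) = {t_robust:6.2f}   "
          f"significant with robust s.e.: {abs(t_robust) > CRITICAL}")
```

The robust standard errors are considerably larger than the ML ones in each group, yet the former-smoker and current-smoker ozone coefficients remain above the 5 percent threshold, while the never-smoker coefficient does not approach it under either set of standard errors.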



     In the second analysis, we  use OLS  to assess the possibility that the



structures of the TRADRSP models  differ  depending on whether an individual



has  a  chronic respiratory  illness.   The  analysis is somewhat  hampered



because only a small number  of individuals (364) in  this  estimation sample

-------
                               Table 8-5

     *** NUMBER OF OBSERVATIONS ***

     N OBS       N POS      N ZERO

      1439          47        1392

     *** PARAMETER ESTIMATES ***

VARIABLE         BETA HAT        STD ERR        T STAT

INT              -2.11645       0.561939      -3.76634
03NR01           -1.72325        4.12275     -0.417984
S4NR01         -0.0921191       0.018894      -4.87558
RACEW1B0           1.4318       0.327568       4.37101
SEXM1F0         0.0116624       0.162797     0.0716374
INCOMCON      -.000043179    .0000086888      -4.96956
AGE           -0.00137636     0.00382972      -0.35939
EDCOMCON         0.011561      0.0210916      0.548135
CHRLMDUM          1.03309       0.171375       6.02825
AVMAXTMP      -9.8738E-06     0.00524898   -0.00188108
AVPRECIP         0.811043       0.720308       1.12597

     **** PARM. ESTS. (ROBUST VARIANCE ESTIMATES) ****

ROBUST           BETA HAT        STD ERR        T STAT

INT              -2.11645        1.94121      -1.09027
03NR01           -1.72325         6.3755     -0.270292
S4NR01         -0.0921191      0.0417871      -2.20449
RACEW1B0           1.4318       0.518293       2.76253
SEXM1F0         0.0116624       0.462579     0.0252116
INCOMCON      -.000043179    .0000256016      -1.68659
AGE           -0.00137636      0.0085561     -0.160863
EDCOMCON         0.011561      0.0603543      0.191552
CHRLMDUM          1.03309       0.511635       2.01919
AVMAXTMP      -9.8738E-06      0.0133843  -0.000737712
AVPRECIP         0.811043        1.50329       0.53951

-------
                               Table 8-6

     *** NUMBER OF OBSERVATIONS ***

     N OBS       N POS      N ZERO

       665          26         639

     *** PARAMETER ESTIMATES ***

VARIABLE         BETA HAT        STD ERR        T STAT

INT              -29.1589         400480   -0.00007281
03NR01            16.6943        3.50114       4.76824
S4NR01          0.0321111      0.0103115   (illegible)
RACEW1B0          29.4917         400480   .0000736409
SEXM1F0         -0.643355       0.196764      -3.44474
INCOMCON      .0000026724    .0000115801      0.230776
AGE            0.00746927     0.00629732        1.1861
EDCOMCON       -0.0547761      0.0333252      -1.64368
CHRLMDUM         0.462081         0.2063       2.23986
AVMAXTMP       -0.0388972     0.00613555      -6.33965
AVPRECIP           -2.342        1.17849      -1.98729

     **** PARM. ESTS. (ROBUST VARIANCE ESTIMATES) ****

ROBUST           BETA HAT        STD ERR        T STAT

INT              -29.1589        1.70479      -17.1041
03NR01            16.6943        6.96203        2.3979
S4NR01          0.0321111      0.0250393       1.28243
RACEW1B0          29.4917       0.410894       71.7745
SEXM1F0         -0.643355       0.494653       -1.2876
INCOMCON      .0000026724    .0000240709      0.111023
AGE            0.00746927      0.0133358      0.560093
EDCOMCON       -0.0547761      0.0810222     -0.676062
CHRLMDUM         0.462081       0.659609      0.700538
AVMAXTMP       -0.0388972      0.0133654      -2.91028
AVPRECIP           -2.342        2.59748     -0.901644

-------
                               Table 8-7

     *** NUMBER OF OBSERVATIONS ***

     N OBS       N POS      N ZERO

      1243          47        1196

     *** PARAMETER ESTIMATES ***

VARIABLE         BETA HAT        STD ERR        T STAT

INT              -2.45614       0.446641      -5.04713
03NR01            9.16519        3.29159       2.78443
S4NR01         -0.0037685      0.0133045     -0.283251
RACEW1B0         0.896108       0.249973       3.58481
SEXM1F0          0.317721       0.139093       2.28424
INCOMCON      -.000046108    0.000008284      -5.56598
AGE           -.000042728     0.00468037   -0.00912914
NCIGSDYN         0.015498     0.00432861       3.58035
EDCOMCON         0.049784      0.0265047       1.87831
CHRLMDUM         0.670802       0.165151       4.06175
AVMAXTMP       -0.0247194     0.00446028      -5.54213
AVPRECIP          4.33699       0.614482       7.05796

     **** PARM. ESTS. (ROBUST VARIANCE ESTIMATES) ****

ROBUST           BETA HAT        STD ERR        T STAT

INT              -2.45614        1.61305      -1.52267
03NR01            9.16519        4.26883       2.14701
S4NR01         -0.0037685      0.0218712     -0.172304
RACEW1B0         0.896108       0.647155       1.40642
SEXM1F0          0.317721       0.462594      0.686026
INCOMCON      -.000046108    .0000262119      -1.75907
AGE           -.000042728      0.0105212    -0.0040611
NCIGSDYN         0.015498      0.0157747      0.982459
EDCOMCON         0.049784      0.0669773      0.743297
CHRLMDUM         0.670802       0.513254       1.30696
AVMAXTMP       -0.0247194      0.0118029      -2.09435
AVPRECIP          4.33699        2.69697       1.60809

-------
                                   8-8



report chronic respiratory conditions.  The results are presented in
Tables 8-8 and 8-9, where it is seen that the estimated magnitudes of the
ozone effects are dramatically different in the two instances.  Note
carefully, however, that the means of the dependent variable for the two
samples differ by an order of magnitude (0.11 for the sample having no
chronic respiratory illness, 1.04 for the sample reporting some chronic
respiratory illness).  On the basis of this phenomenon, it appears that
homogeneity of the two groups can be rejected without any additional
analysis, solely on the ground that the outcomes are far too disparate to
believe that the expected values could possibly be the same.  For example,
a simple t-test of homogeneity of means would surely reject the null
hypothesis in this instance.
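
The t-test mentioned above can be sketched from the summary statistics
printed in Tables 8-8 and 8-9.  The sketch below is illustrative only and is
not a calculation reported by the authors.  It assumes that the dependent
means (0.106030 and 1.038462), the model and error mean squares, and the
degrees of freedom (11 and 4515 for the first sample, 11 and 352 for the
second) are read correctly from those printouts.  It uses the
unequal-variance (Welch) form of the test, which is a choice made here; the
helper names are hypothetical.

```python
import math


def sample_variance(model_ms, model_df, error_ms, error_df):
    """Variance of the dependent variable, rebuilt from an ANOVA table.

    Total sum of squares = model SS + error SS, divided by total df (n - 1).
    """
    total_ss = model_ms * model_df + error_ms * error_df
    return total_ss / (model_df + error_df)


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch t statistic and Satterthwaite degrees of freedom."""
    se1, se2 = var1 / n1, var2 / n2
    t = (mean2 - mean1) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Assumed inputs, as read from the Table 8-8 and Table 8-9 printouts.
var_no_chronic = sample_variance(1.141204, 11, 0.648627, 4515)
var_chronic = sample_variance(20.881194, 11, 9.431160, 352)
n_no_chronic = 11 + 4515 + 1   # total df + 1
n_chronic = 11 + 352 + 1

t_stat, df = welch_t(0.106030, var_no_chronic, n_no_chronic,
                     1.038462, var_chronic, n_chronic)
print(f"t = {t_stat:.2f} on about {df:.0f} degrees of freedom")
```

Under these assumptions the statistic is roughly 5.7, far beyond
conventional critical values, which is consistent with the statement that
equality of the two means would be rejected.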

-------
                               Table 8-8

DEP VARIABLE: TRADRSP

                     SUM OF          MEAN
SOURCE      DF      SQUARES        SQUARE     F VALUE     PROB>F

MODEL       11    12.553244      1.141204       1.759     0.0551
ERROR     4515     2928.552      0.648627
C TOTAL   4526     2941.105

ROOT MSE    0.805374     R-SQUARE    0.0043
DEP MEAN    0.106030     ADJ R-SQ    0.0018
C.V.        759.5683

                      PARAMETER       STANDARD    T FOR H0:
VARIABLE      DF       ESTIMATE          ERROR  PARAMETER=0   PROB > |T|

INTERCEP       1       0.278607       0.085010        3.277       0.0011
O3NR01         1       0.564490       0.564345        1.000       0.3172
S4NR01         1    -0.00182198    0.002222703       -0.820       0.4124
RACEW1B0       1       0.042292       0.037615        1.124       0.2609
[illegible]    1      -0.030077       0.024943       -1.206       0.2279
INCOMCON       1   -.0000022889   .00000138313       -1.655       0.0980
AGE            1   -0.000475199   0.0007031875       -0.676       0.4992
[illegible]    1    0.001486407    0.001009186        1.473       0.1409
[illegible]    1   -0.000753311    0.004108444       -0.183       0.8545
[illegible]    1    -0.00252467    0.000838691       -3.010       0.0026
[illegible]    1       0.082526       0.122702        0.673       0.5013
[illegible]    1       0.044112       0.032747        1.347       0.1780

Labels legible in the source for the last five rows, in order: NC18SDYN,
EDCOMCON, AVMAXTMP, and one further label that is itself unreadable.

-------
                               Table 8-9

DEP VARIABLE: TRADRSP

                     SUM OF          MEAN
SOURCE      DF      SQUARES        SQUARE     F VALUE     PROB>F

MODEL       11      229.693     20.881194       2.214     0.0139
ERROR      352     3319.766      9.431160
C TOTAL    363     3549.462

ROOT MSE    3.071019     R-SQUARE    0.0647
DEP MEAN    1.038462     ADJ R-SQ    0.0355
C.V.        295.7278
                      PARAMETER       STANDARD    T FOR H0:
VARIABLE      DF       ESTIMATE          ERROR  PARAMETER=0   PROB > |T|

INTERCEP       1       1.256410       1.264699        0.993       0.3212
O3NR01         1       9.363780       8.108099        1.159       0.2469
S4NR01         1      -0.036888       0.030704       -1.201       0.2304
RACEW1B0       1       0.707993       0.499220        1.418       0.1570
[illegible]    1       0.923020       0.351846        2.623       0.0091
INCOMCON       1   -.0000263253   .00001916124       -1.374       0.1704
AGE            1       0.011920     0.00989562        1.205       0.2292
[illegible]    1       0.016938       0.013916        1.224       0.2219
[illegible]    1     -0.0037767       0.060918       -0.062       0.9506
[illegible]    1      -0.026317       0.011300       -2.329       0.0204
[illegible]    1       2.926834       1.699772        1.722       0.0860
[illegible]    1      -0.464974       0.439989       -1.067       0.2869

Labels legible in the source for the last five rows, in order: NC18SDYN,
EDCOMCON, AVPRECIP.

-------
TECHNICAL REPORT DATA
(Please read Instructions on the reverse before completing)

 1. REPORT NO.
    EPA-450/5-85-005c
 2.
 3. RECIPIENT'S ACCESSION NO.
 4. TITLE AND SUBTITLE
    Ambient Ozone and Human Health: An Epidemiological Analysis
    Volume III
 5. REPORT DATE
    June 1985 (Date of Preparation)
 6. PERFORMING ORGANIZATION CODE
 7. AUTHOR(S)
    Paul R. Portney and John Mullahy
 8. PERFORMING ORGANIZATION REPORT NO.
 9. PERFORMING ORGANIZATION NAME AND ADDRESS
    Resources for the Future
    1616 P Street N.W.
    Washington, DC 20036
10. PROGRAM ELEMENT NO.
    12A2A
11. CONTRACT/GRANT NO.
    68-02-3583
12. SPONSORING AGENCY NAME AND ADDRESS
    U.S. Environmental Protection Agency
    Office of Air Quality Planning and Standards (MD-12)
    Research Triangle Park, NC 27711
13. TYPE OF REPORT AND PERIOD COVERED
    Final Report
14. SPONSORING AGENCY CODE
    OAQPS
15. SUPPLEMENTARY NOTES
    Project Officer: Thomas G. Walton
16. ABSTRACT
    This report is the third volume of an analysis of the relationship
    between ozone and human health benefits.
17. KEY WORDS AND DOCUMENT ANALYSIS
    (a. DESCRIPTORS; b. IDENTIFIERS/OPEN ENDED TERMS; c. COSATI Field/Group)
    Benefit Analysis
    Air Pollution, O3
    Epidemiology
18. DISTRIBUTION STATEMENT
    Release Unlimited
19. SECURITY CLASS (This Report)
    Unclassified
20. SECURITY CLASS (This page)
    Unclassified
21. NO. OF PAGES
    226
22. PRICE

EPA Form 2220-1 (Rev. 4-77)    PREVIOUS EDITION IS OBSOLETE

-------
                                                        INSTRUCTIONS

   1.   REPORT NUMBER
       Insert the EPA report number as it appears on the cover of the publication.

   2.   LEAVE BLANK

   3.   RECIPIENTS ACCESSION NUMBER
       Reserved for use by each report recipient.

   4.   TITLE AND SUBTITLE
       Title should indicate clearly and briefly the subject coverage of the report, and be displayed prominently.  Set subtitle, if used, in smaller
       type or otherwise subordinate it to main title. When a report is prepared in more than one volume, repeat the primary title, add volume
       number and include subtitle for the specific title.

   5.   REPORT DATE
       Each report shall carry a date indicating at least month and year. Indicate the basis on which it was selected (e.g., date of issue, date of
       approval, date of preparation, etc.).

   6.   PERFORMING ORGANIZATION CODE
       Leave blank.

   7.   AUTHOR(S)
       Give name(s) in conventional order (John R. Doe, J. Robert Doe, etc.).  List author's affiliation if it differs from the performing organi-
       zation.

   8.   PERFORMING ORGANIZATION REPORT  NUMBER
       Insert if performing organization wishes to assign this number.

   9.   PERFORMING ORGANIZATION NAME AND ADDRESS
       Give name, street, city, state, and ZIP code. List no more than two levels of an organizational hierarchy.

   10.  PROGRAM ELEMENT NUMBER
       Use the program element number under which the report was prepared. Subordinate numbers may be included in parentheses.

   11.  CONTRACT/GRANT NUMBER
       Insert contract or grant number under which  report was prepared.

   12.  SPONSORING AGENCY NAME AND ADDRESS
       Include ZIP code.

   13.  TYPE OF REPORT AND PERIOD COVERED
       Indicate interim, final, etc., and if applicable, dates covered.

   14.  SPONSORING AGENCY CODE
       Insert appropriate code.

   15.  SUPPLEMENTARY NOTES
       Enter information not included elsewhere but useful, such as:  Prepared in cooperation with, Translation of, Presented at conference of,
       To be published in, Supersedes, Supplements, etc.

   16.  ABSTRACT
       Include a brief (200 words or less) factual summary of the most significant information contained in the report. If the report Contains a
       significant bibliography or literature survey, mention it here.

   17.  KEY WORDS AND DOCUMENT ANALYSIS
       (a) DESCRIPTORS - Select from the Thesaurus of Engineering and Scientific Terms the proper authorized terms that  identify the major
       concept of the research and are  sufficiently specific and precise to be used as index entries for cataloging.

       (b) IDENTIFIERS AND OPEN-ENDED TERMS - Use identifiers for project names, code names, equipment designators, etc. Use open-
       ended terms written in descriptor form for those subjects for which no descriptor exists.

       (c) COSATI FIELD GROUP - Field and group assignments are to be taken from the 1965 COSATI Subject Category List.  Since the ma-
       jority of documents are multidisciplinary in nature, the Primary Field/Group assignment(s) will be specific discipline, area of human
       endeavor, or type of physical object. The application(s) will be cross-referenced with secondary Field/Group assignments that will follow
       the primary posting(s).

   18.  DISTRIBUTION STATEMENT
       Denote releasability to the public or limitation for reasons other than security, for example "Release Unlimited."  Cite any availability to
       the public, with address and price.

   19. & 20. SECURITY CLASSIFICATION
       DO NOT submit classified reports to the National Technical Information Service.

   21.  NUMBER OF PAGES
       Insert the total number of pages, including this one and unnumbered pages, but exclude distribution list, if any.

   22.  PRICE
       Insert the price set by the National Technical Information Service or the Government Printing Office, if known.
EPA Form 2220-1  (Rev. 4-77) (Reverse)

-------