United States Environmental Protection Agency
Office of Air Quality Planning and Standards
Research Triangle Park NC 27711
EPA-450/5-85-005c
August 1985
Air
Ambient Ozone And
Human Health:
An Epidemiological
Analysis
Volume III
-------
AMBIENT OZONE AND HUMAN HEALTH:
AN EPIDEMIOLOGICAL ANALYSIS
Paul R. Portney and John Mullahy
Resources for the Future
1616 P Street, N.W.
Washington, D.C. 20036
Volume III
Final Report
June 1985
Submitted to the Economic Analysis Branch, Office of Air Quality Planning
and Standards, Environmental Protection Agency, Research Triangle Park,
North Carolina 27711, under contract number 68-02-3583.
-------
DISCLAIMER
This report has been reviewed by the Office of Air Quality Planning
and Standards, U. S. Environmental Protection Agency, and approved for
publication as received from Resources for the Future. The analysis and
conclusions presented in this report are those of the authors and should
not be interpreted as necessarily reflecting the official policies of
the U. S. Environmental Protection Agency.
-------
TABLE OF CONTENTS
Page
CHAPTER 1. INTRODUCTION 1-1
CHAPTER 2. ECONOMETRIC ESTIMATION OF HEALTH STATUS MODELS
2.1 Introduction 2-1
2.2 Some Problems with Least-Squares Estimation of Health
Status Models 2-3
2.3 Tobit Health Outcome Models 2-7
2.4 Cragg-class Health Outcome Models 2-11
2.5 Truncated-Normal Estimation 2-15
2.6 Heckman's Approach: Sample Selection 2-19
2.7 Tobin, Cragg and Heckman: A Digression 2-21
2.8 Poisson-distributed Health Outcome Measures 2-30
2.9 Geometric-distributed Health Outcome Measures 2-33
2.10 Multinomial-distributed Health Outcome Measures 2-35
2.11 Estimation of Grouped Data Models Under the
Normality Assumption 2-38
2.12 Summary and Conclusions 2-40
CHAPTER 3. AIR POLLUTION MONITORS AND INDIVIDUAL EXPOSURE 3-1
CHAPTER 4. URBAN AIR QUALITY AND ACUTE RESPIRATORY ILLNESS
4.1 Introduction 4-1
4.2 Framework for the Analysis 4-3
4.3 Model Specification 4-8
4.4 Empirical Results 4-11
4.5 Policy Implications 4-20
Appendix 4-28
CHAPTER 5. CONSTRUCTING A LIFETIME SMOKING PROFILE USING THE
1979 HEALTH INTERVIEW SURVEY 5-1
CHAPTER 6. CIGARETTE SMOKING, AIR POLLUTION, AND RESPIRATORY
ILLNESS: AN ANALYSIS
6.1 Introduction 6-1
6.2 Smoking, Pollution, and Acute Illness 6-2
6.3 Data and Estimation Strategy 6-5
6.4 Estimates of Model Parameters and Relative Risks 6-14
Appendix 6-23
CHAPTER 7. CHRONIC RESPIRATORY DISEASE 7-1
CHAPTER 8. ADDITIONAL SENSITIVITY ANALYSES
8.1 The Effects of Precipitation on Acute Health Status 8-1
8.2 Sample Size, Model Specification, and Parameter
Estimate Sensitivity 8-2
8.3 Poisson Regression Analysis of Volume I
Models (48), (49), and (50) 8-4
8.4 Sensitivity to Aggregation Across Smoking and Chronic
Illness Status 8-6
-------
Chapter 1
INTRODUCTION
Volume I of this report presents a great many results from our basic
analysis of ozone and acute and chronic illness. As indicated in Volume I,
we had to make a number of decisions along the way in the early stages of
our research. One of the most important concerned the tradeoff between the
breadth of our analysis as opposed to the possible in-depth investigation
of a relatively small number of hypotheses. In other words, should we use
fairly standard statistical techniques to investigate dose-response
relationships for a broad range of possible illnesses, using a variety of
explanatory variables, and in a number of different population groups? Or
should we winnow out a relatively few "promising" relationships using
preliminary tests, and then allocate time and computing resources to the
application of more powerful statistical techniques to these relationships?
With some exceptions we adopted the former approach. Because of our
unique and comprehensive data on air pollution concentrations and
individuals' health and socioeconomic status, we elected to test a wide
variety of hypotheses about the possible relationships between ozone and
other air pollutants on the one hand, and a variety of acute and chronic
illnesses on the other. In addition, we examined separately several
different degrees of severity for the acute illnesses we examined and we
also conducted separate analyses for adults and for children (aged 17 and
below). Of course, as we point out in Volume I, we did conduct additional
sensitivity analyses where our preliminary research suggested statistically
significant associations between ozone and the dependent variable in
question. Nevertheless, the general approach was a "broad brush" one.
Since completing the work reported in Volumes I and II, we have
received many helpful comments on and constructive criticisms of the
approach we took in our analyses. Many of these comments came in an
EPA-sponsored public Peer Review Meeting held in Raleigh (N.C.) on April 3,
1984. There experts from the epidemiological, clinical, biostatistical,
and economic communities presented us with a number of useful suggestions
for further work. In addition, we have received many useful comments from
our EPA project officers and from our colleagues at RFF and elsewhere who
have read with interest our original work. Finally, we have given
considerable thought ourselves to ways in which the original analysis might
be extended or improved.
Thus, over the past year we have tried to conduct additional analyses
that address some of the most important questions arising out of our original
work. Volume III below presents the results of some of that work. We say
"some" because we are continuing to conduct additional epidemiological
analyses as time and resources permit, at least some of which may not be
complete until after this report has been submitted. In one sense, in
fact, we hope to never be "done" with our work even though this report
completes our analysis for EPA.
In one way or another, each of the following chapters is designed to
address one or more of the questions raised in our earlier work. For
instance, Chapter 2 is purely methodological. It presents a variety of
different estimation techniques that may be appropriate when the
assumptions that lie behind ordinary least squares (OLS) are violated as
several careful readers of the studies in Volume I suggested they might be.
There we consider the sorts of problems that arise with OLS in the special
context of health effects estimation. Among the alternatives to OLS we
consider are Tobit estimation, Cragg-type "hurdles" models,
sample-selection and count-data models, multinomial logit approaches, and
grouped dependent variable techniques. This chapter is a long and
technical one, we realize. However, we feel it is necessary to set the
stage for the empirical work presented in later chapters; it should also
prove useful to anyone about to embark for the first time on his own
estimation of air pollution (or other environmentally-related) health
effects.
Chapter 3 is much shorter and simpler. It addresses a common reaction
to our original analysis. Remember that in Volume I, the air pollution
readings we assign to each individual are those measured at the monitor
nearest his or her home, provided that the monitor in question is no more
than twenty miles away (sometimes less). We continue to believe this is
preferable to the most common alternative to this approach suggested in
the literature: matching each individual in an SMSA to the air pollution
concentrations averaged over all the monitors in the SMSA, or within a
subset of it. However, because most individuals do travel about within an
area, it is possible that the area-wide averaging approach might better
characterize the exposures of at least some individuals. If so, these
averaged concentrations would be the appropriate ones to use in
epidemiological analyses. Hence, it is of interest to know how closely
correlated are the readings at the monitor(s) nearest the individuals'
dwellings with the average of all the monitors within a given radius of the
dwelling. This analysis is undertaken in Chapter 3.
That exercise in turn forms the basis for some sensitivity analysis we
conduct in Chapter 4 of the effect on our findings of different rules about
matching air pollution concentrations to individuals. Chapter 4 extends
and improves upon our original work in a number of other ways, as well.
For instance, building on the methodology presented in Chapter 2 of this
volume, in Chapter 4 we investigate the determinants of acute respiratory
disease using Poisson regression instead of the OLS and logit techniques
employed in Volume I. For reasons presented in Chapters 2 and 4, we
consider this to be a significant improvement on our earlier analysis. In
addition, Chapter 4 presents a more sophisticated analysis of the possible
non-linearities that may characterize the dose-response relationship
linking acute respiratory disease to ambient ozone and sulfate
concentrations. Not only do we consider spline-type functional
relationships, but we also allow for a variety of non-linearities within
the (already non-linear) Poisson approach. This, too, sheds additional
light on the analysis in Volume I. Finally, we believe that the
elasticity-of-response calculations contained in the last part of Chapter 4
are a useful way to view the possible effects of changes in ambient ozone
concentrations on human health. This suggests how our findings might be
used in applied policy analysis if it were desired to do so.
One of the respects in which the analysis in Volume I could clearly be
improved concerns the measures of cigarette smoking we employed. Recall
that in most of the models estimated, we used NCIGS, a continuous measure
of daily cigarette consumption, or SMOKY1NO, a dummy variable indicating
whether or not an individual is a never- or former smoker as opposed to a
current smoker. We also occasionally used an additional dummy variable,
FORMER, to distinguish between those who do not smoke now but once did from
those who never smoked.
However, even this additional treatment resulted in our finding a less
pronounced relationship between smoking and ill health than we might have
expected (although we hasten to point out that even our crude measures of
smoking were often positively and significantly associated with ill
health). One reason for this was our inability in the Volume I analyses to
make use of all the data provided in the HIS Smoking Supplement on
individuals' lifetime smoking histories. Thus, one of our purposes in the
analyses we have conducted since April 1984 was to develop a measure of
lifetime smoking behavior and employ it in our analyses. Chapter 5
presents the approach we took in doing so. While the HIS smoking data do
not enable us to specify an exact profile of respondents' lifetime smoking
habits, they do permit the construction of several plausible profiles.
These are discussed in some detail in Chapter 5. Among other things, that
chapter discusses the differing weights that might be given to cigarettes
smoked years ago compared to recent cigarette consumption.
Chapter 6 presents the results of additional empirical analysis of the
relationship between air pollution (ozone and sulfates) and acute
respiratory disease. It extends the analysis in Chapter 4 of this volume,
and all the work in Volume I, in several important ways. First, the
analysis in Chapter 6 incorporates the more sophisticated measures of
individual smoking. For instance, in addition to NCIGS, a measure of
current smoking habit, the analysis also includes the variable PACKS, a
proxy for lifetime cigarette consumption. The analysis in Chapter 6
extends our earlier research in another suggested direction. That is, it
models the individuals' health outcomes as a multinomial logit process in
which, on any given day during the two-week recall period, an individual
could report no restriction of activity at all, a minor restriction in
activity attributable to respiratory illness (with no bed confinement), or
what we refer to as a "severe" respiratory restriction, i.e., one which
requires confinement to bed for at least half the day. For reasons spelled
out in Chapter 6, we feel this is another productive way to model the
possible relationship between ambient air quality and acute respiratory
disease. (The chapter also contains a very brief discussion of ordered
logit as an estimation approach.)
Chapter 6 is intended to accomplish one additional objective. The
comments on our work in Volume I often expressed surprise that cigarette
smoking did not completely "swamp" ambient ozone pollution in its
contribution to acute (and chronic) illness, even though we generally found
a positive and significant association between smoking and illness. Thus,
one purpose of the analysis in Chapter 6 is to explore in somewhat greater
detail the relative risks posed by cigarette smoking and air pollution.
This is important since considerable public resources are currently devoted
to reducing both. While far from being comprehensive on the subject,
Chapter 6 does explore these relative risks in some detail.
As the preceding pages suggest, most of the emphasis in Volume III
falls on the possible relationship between ambient ozone (and sulfate)
concentrations and acute respiratory health. This reflects the heavy
emphasis given acute health effects in our earlier work as described in
Volume I. However, we did devote some attention in Volume I to possible
relationships between long-term exposures to air pollution and the
prevalence of chronic respiratory and other kinds of disease.
Chapter 7 below presents the results of some preliminary reanalysis of
those findings, specifically those dealing with chronic respiratory disease.
The analysis below extends our original work in several important ways.
First, we restrict our attention in Chapter 7 to a group of individuals who
at the time of the 1979 HIS had lived in their present location for at
least ten years. This is a more "residentially stable" group than that
analyzed in Volume I, an important consideration in the epidemiological
investigation of chronic illness. In addition, the individuals analyzed in
Chapter 7 are divided into two distinct groups depending on whether or not
they received a special supplement (or "probe") on respiratory disease as
part of the 1979 HIS. Because the reported incidence of chronic
respiratory disease varies by a factor of six between those who received
the probe and those who did not, we felt the two groups should be analyzed
separately rather than pooled as in our original analysis. Finally, this
reanalysis includes some model specifications in which ozone is measured by
the ambient concentration averaged over all the monitors within ten or twenty
miles of each resident's dwelling.
Finally, in Chapter 8 we report our responses to a variety of comments
or queries on Volumes I and II. None of these required the preparation of
a separate chapter, but each was important enough to merit consideration.
One final note about Volume III. Several of the chapters have been
written to serve more than one purpose. For instance, a slightly revised
version of Chapter 4 will be appearing in the Journal of Urban Economics in
1986 under the title "Urban Air Quality and Acute Respiratory Illness."
Similarly, the material in Chapter 6 formed the basis of a paper presented
at the 1984 annual meetings of the American Economic Association in Dallas.
While we have modified them for incorporation into Volume III, some
material, particularly the brief descriptions of the HIS and air pollution
data bases, will occasionally appear repetitive.
-------
Chapter 2
ECONOMETRIC ESTIMATION OF HEALTH STATUS MODELS
2.1 INTRODUCTION
In the last decade or so, estimation of microeconomic models of
individual behavior using large individual- or household-level data sets
has flourished and proven an important advance in applied economics.
Details typically masked in aggregate time-series data analysis are often
available in individual cross-sectional data, thus enabling the testing of
hypotheses about responses of individuals to changes in constraints.
In such micro datasets one is prone to find measures that economists
would characterize either as corner-solution realizations of instantaneous
optimizing decisions or as discrete representations of such decisions. An
example of the former case would be where one has data on the number of
hours an individual worked in the market over a given year, and for some
subset of individuals no market hours were worked.
An instance of the latter case is where data are available only on
whether or not an individual had purchased some consumer durable over the
previous twelve months, but not on the amount of the expenditure. Assuming
such statistical models to be the objectives of estimation, then the former
is an example of what have come to be known as limited dependent variable
(LDV) models, while the latter is a member of the class of qualitative
dependent variable (QDV) models. Tobin's pioneering 1957 paper on durables
demand is the forerunner of LDV estimation in economics. Using data on 735
households, Tobin modeled the ratio of durables expenditures to disposable
income; for 183 of these spending units, no durables were purchased during
the time period of interest and a "corner solution" had to be treated. As
is well known, the solution to this problem was the genesis of the Tobit
estimator, which will be discussed below. Note that if Tobin only had data
on whether or not there was some durable purchased rather than on the
actual amount, a QDV model (such as binary probit or logit) would have been
the appropriate approach.
In this chapter we discuss the theory and practice of econometric
estimation of LDV and QDV models as they pertain to health status measures
such as respiratory-related restricted activity days, or the
presence-absence of a chronic respiratory condition. It is seen that,
owing to the nature of the available micro data, standard econometric
techniques such as ordinary least squares (OLS) will typically be
inappropriate tools for the analysis of the relationships between air
pollution and human health. The available data on health status measures,
rather, are generally of a nature best described as qualitative or limited
dependent variables. This being the case, more complicated estimation
techniques are in general required in order to obtain consistent estimates
of the parameters governing the health status outcomes. Maximum likelihood
is the estimation method most commonly used in such analysis.
The treatment here is necessarily brief. However, several excellent
surveys are available for the reader who wishes more detailed treatments of
the topics to be discussed below. The 1981 and 1984 surveys by Amemiya are
excellent overviews of qualitative and limited dependent variable models,
respectively, and the 1983 monograph by Maddala provides broad coverage in
both these areas. The often-cited 1981 volume edited by Manski and
McFadden is also an excellent survey of topics in qualitative and limited
dependent variable estimation.
Some definitional preliminaries are appropriate here. First, standard
practice is followed, with random variables represented in upper-case
notation, their realizations in lower-case. Second, the terms "censored
distribution" and "truncated distribution" will be used with considerable
frequency below. The introduction to chapter 6 of Maddala (1983) provides
a good heuristic explanation of censoring and truncation as they pertain to
the normal econometric model.
The plan for the remainder of this chapter is as follows. First, we
briefly assess problems associated with least squares estimation of air
pollution - health status models. Then we turn to a discussion of some
techniques that might be considered more or less appropriate for the
estimation problems attendant to estimation of health status models.
Following this we turn to a discussion of prediction based on the
estimation of the various models. A summary concludes the chapter.
2.2 SOME PROBLEMS WITH LEAST-SQUARES ESTIMATION OF HEALTH STATUS MODELS
As mentioned above, this chapter surveys various econometric
techniques for estimating health outcome models. As will be seen
throughout, these techniques are generally such that iterative (and
sometimes costly) maximum likelihood methods are required in order to
obtain consistent and efficient estimates of the models' parameters. Since
sound econometric policy analysis depends at least in part on obtaining
consistent, if not efficient, parameter estimates, the question is then
begged: why is it necessary to utilize such complicated and expensive
methods when simple and inexpensive least-squares algorithms abound? In a
nutshell, the answer is that least-squares estimates of models of the genre
we are considering will generally be biased and inconsistent. The purpose
of this section is to briefly demonstrate why this is so. To this end, a
brief exposition of the fundamentals of the basic linear econometric model
is presented, the requirements for consistent estimation of the parameters
are explained, and why at least some of these requirements are unlikely to
be met in the health status models to be considered is discussed. The
exposition of the linear model and its properties follows that of Schmidt
(1976), which is among the most lucid in published texts.
Of fundamental concern is consistent and, if possible, efficient
estimation of the parameter vector $\beta$ in the case where random variables $Y_i$
are distributed in some manner with finite mean $\mu_i$ and finite variance $\sigma^2$.
Specifying $\mu_i = X_i\beta$ makes the problem nontrivial, with $X_i$ a $1 \times k$ vector of
independent variables which will in general include measures of air
pollution and other covariates, and $\beta$ a $k \times 1$ vector of unknown parameters to
be estimated. Given these assumptions, we can write
$Y_i = X_i\beta + \varepsilon_i$, (1)
where, because $E(Y_i) = X_i\beta$ and $\mathrm{Var}(Y_i) = \sigma^2$, $\varepsilon_i$ has mean zero and variance
$\sigma^2$. The unobserved realizations of $\varepsilon_i$ correspond to the observed
realizations of $Y_i$, $y_i$. It is assumed that there exist T independent
observations on $(y_i, X_i)$.
The model described satisfies full ideal conditions (Schmidt, p. 2)
when
i) X is a nonstochastic matrix of rank k
-------
2-5
matrix of independent variables, y will henceforth denote the $T \times 1$
vector of the realizations $y_i$.
It can be demonstrated that, with or without the assumption of
normality for $\varepsilon$, the OLS estimator of $\beta$, $\hat\beta = (X'X)^{-1}X'y$, is consistent:
i) $\hat\beta = (X'X)^{-1}X'y$
$= (X'X)^{-1}X'(X\beta + \varepsilon)$
$= \beta + (X'X)^{-1}X'\varepsilon$
$E(\hat\beta) = \beta + (X'X)^{-1}X'E(\varepsilon)$
$= \beta + 0$
$= \beta$, so that $\hat\beta$ is unbiased for $\beta$;
ii) The covariance matrix of $\hat\beta$ is $\sigma^2(X'X)^{-1}$, so that, with all limits
taken for $T \to \infty$,
$\lim \sigma^2(X'X)^{-1} = \sigma^2 \lim (X'X)^{-1}$
$= \sigma^2 \lim (X'X/T)^{-1} T^{-1}$
$= \sigma^2 \lim (Q^{-1} T^{-1})$
$= 0$,
because from above Q is finite nonsingular so that its inverse
exists and is finite, and $\sigma^2$ is finite;
iii) Therefore, since $\hat\beta$ is unbiased and its covariance matrix vanishes
in the limit, then $\hat\beta$ is consistent.
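The consistency argument above can be illustrated with a small simulation. The sketch below is not part of the original analysis: the fixed grid of regressor values, the true coefficients, the uniform (deliberately non-normal) errors, and the function names `ols_slope` and `simulate_slopes` are all assumptions made for the example. It shows the sampling variance of the OLS slope shrinking as T grows while the estimates stay centered on the true value.

```python
import random
from statistics import fmean, pvariance

def ols_slope(x, y):
    """Slope from a least-squares regression of y on x with an intercept."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_slopes(T, reps, beta0=2.0, beta1=1.0, seed=0):
    """Return `reps` OLS slope estimates, each from a sample of size T.

    X is nonstochastic (a repeating grid of ten values). Errors are
    uniform with mean zero and variance one, so they are not normal.
    """
    rng = random.Random(seed)
    x = [(i % 10) - 4.5 for i in range(T)]
    half_width = 3 ** 0.5  # Uniform(-sqrt(3), sqrt(3)) has variance 1
    slopes = []
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.uniform(-half_width, half_width)
             for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes

# Mean and variance of the slope estimates for increasing sample sizes.
results = {}
for T in (50, 500, 5000):
    s = simulate_slopes(T, reps=100)
    results[T] = (fmean(s), pvariance(s))
    print(T, round(results[T][0], 4), round(results[T][1], 6))
```

Under these assumed settings the variance column should fall roughly in proportion to 1/T, which is the behavior step ii) describes.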
Because of its computational ease, least squares is obviously an
appealing tool for model estimation. The analyst must assess whether any
or all of the above conditions fail to characterize the data or model under
consideration to see if least squares maintains its consistency properties.
Should least squares prove inconsistent, alternative, and generally more
costly, methods of estimation must be utilized in order to obtain
consistent estimates of $\beta$.
As discussed in detail below, a very general characterization of
quantitative health outcomes measures is that they are data bounded from
below by zero, i.e. data realized only in nonnegative quantities. Of
specific concern here are measures like "amount of time spent ill." Such
measures are generally modeled econometrically as the censored or truncated
counterparts of normally-distributed latent random variables $Y_i^*$ having
$E(Y_i^*) = X_i\beta$, $\mathrm{Var}(Y_i^*) = \sigma^2$. However, if the realizations of $Y_i^*$ are censored
from below at zero, we have
$E(Y_i) = \Phi_i X_i\beta + \sigma\phi_i$, (2)
where $\phi_i$ and $\Phi_i$ are the standard normal density and distribution functions
evaluated at $(X_i\beta/\sigma)$. In the truncated case, where $\Pr(y_i > 0) = 1$,
$E(Y_i \mid y_i > 0) = X_i\beta + \sigma\phi_i/\Phi_i$. (3)
When defined in terms of these expectations, the problems inherent in
least squares estimation become apparent. Since $E(\sigma\phi_i/\Phi_i) \neq 0$, then $E(\varepsilon_i) \neq 0$
when $\varepsilon_i$ is defined as the difference between either $E(Y_i)$ or $E(Y_i \mid y_i > 0)$
and $X_i\beta$ in (2). Thus least squares regression of y on X will yield
inconsistent estimates of $\beta$, given that the null error expectation
assumption has been violated. Heckman (1976) is a good general discussion
of such problems.
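The inconsistency of least squares under censoring can be seen in a short simulation. The following is a minimal sketch under assumed settings (a single normally distributed regressor, unit error variance, censoring at zero); none of the parameter values or function names come from the report. It compares the OLS slope fitted to the latent outcome with the slope fitted to the censored outcome, and checks the censored-mean expression $\Phi\mu + \sigma\phi$ against a simulated average.

```python
import random
from statistics import NormalDist, fmean

def ols_slope(x, y):
    """Slope from a least-squares regression of y on x with an intercept."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def censored_mean(mu, sigma):
    """E(Y) for Y = max(0, Y*), Y* ~ N(mu, sigma^2): Phi*mu + sigma*phi."""
    z = NormalDist()
    return z.cdf(mu / sigma) * mu + sigma * z.pdf(mu / sigma)

rng = random.Random(0)
n, beta0, beta1, sigma = 20000, 0.0, 1.0, 1.0   # assumed true values

x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_star = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]  # latent
y = [max(0.0, v) for v in y_star]                                  # censored

slope_latent = ols_slope(x, y_star)   # close to the true slope
slope_censored = ols_slope(x, y)      # attenuated toward zero

# Check the censored-mean formula at a fixed index value mu = 0.5.
draws = [max(0.0, 0.5 + rng.gauss(0.0, sigma)) for _ in range(n)]
sim_mean = fmean(draws)
formula_mean = censored_mean(0.5, sigma)

print("slope on latent y*: ", round(slope_latent, 3))
print("slope on censored y:", round(slope_censored, 3))
print("censored mean, simulated vs formula:",
      round(sim_mean, 3), round(formula_mean, 3))
```

With roughly half the observations censored in this design, the slope fitted to the censored data falls well short of the true value even in a large sample, which is the sense in which the estimator is inconsistent.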
Not all measures of interest in our analysis are cast in terms of
normally-distributed, partially-observed random variables. In the other
cases we shall investigate, there are yet different characteristics of the
data or the assumed statistical distributions that render least squares
inappropriate given the objective of consistent parameter estimation. For
example, least-squares estimation strategy is generally completely
inappropriate when outcomes are qualitative since no objective function of
interest can be cast in terms of linear expectations functions like those
above. We now turn to an assessment of various approaches to the
estimation of health status models.
2.3 TOBIT HEALTH OUTCOME MODELS
A logical starting point is the basic Tobit model. The nature of
several of the health status measures of interest in the micro data sets
being analyzed in this study is such that Tobit estimation would seem, at
least at first blush, to be a sensible approach. (See Ostro (1983) for an
application of Tobit to a similar problem.)
Tobit estimation has been utilized in a variety of areas in applied
microeconomics, ranging from labor supply (see the excellent survey by
Killingsworth (1983)), to health economics (Ostro, (1983)), to commodity
demands or expenditures (Tobin (1957), Pitt (1983)), and many others (see
Amemiya (1984) for an extensive bibliography). The basic idea underlying
Tobit estimation is that one posits the existence of (latent) random
variables $Y_i^*$ that are independently, normally distributed (NID) $(X_i\beta, \sigma^2)$. In
many interpretations of the Tobit model, the $Y_i^*$ are stochastic indicators
of intensity of desire for undertaking some activity. Owing to the nature
of the activity, however, some realizations of the $Y_i^*$ are censored while
for the others, the intensities are mapped directly into actual
undertakings of the activity. Some threshold, in effect, is crossed such
that the activities are actually undertaken. For example, the fundamental
idea behind Tobin's seminal paper is that the $Y_i^*$ represent intensities of
desire to purchase consumer durables. When certain (assumed known)
thresholds are crossed, these intensities become actual purchases. In most
applied areas, the thresholds are zero, so that the mappings from
intensities into undertaken activities can be looked at as occurring when
the realizations of the $Y_i^*$ occur in the interior of commodity space.
Otherwise, corner solutions obtain (for one discussion of estimation in the
Kuhn-Tucker/corner-solution/Tobit context, see Wales and Woodland (1983)).
Assuming, then, that the thresholds are known and constant across
individuals, the basic Tobit model can be described by (4):
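$Y_i^* \sim NID(X_i\beta, \sigma^2)$
$y_i = \max(C, y_i^*)$. (4)
Setting C = 0 gives the model we shall discuss in the sequel. Letting $\Omega_0$
signify the index set for observations for which $\max(0, y_i^*) = 0$, and $\Omega_1$ be
the index set for observations for which $\max(0, y_i^*) > 0$, then the
likelihood function for the Tobit model described here is
$L = \prod_{i \in \Omega_0} [1 - \Phi(X_i\beta/\sigma)] \prod_{i \in \Omega_1} \sigma^{-1} \phi((y_i - X_i\beta)/\sigma)$. (5)
In log form (5) is
$\ell = \sum_{i \in \Omega_0} \ln(1 - \Phi_i) - |\Omega_1| \ln\sigma - \frac{1}{2\sigma^2} \sum_{i \in \Omega_1} (y_i - X_i\beta)^2$, (6)
where $|\cdot|$ denotes cardinality and where terms not involving $(\beta, \sigma)$ are
dropped.
The first-order conditions for maximizing $\ell$ are the (k + 1) equations
$\partial\ell/\partial\beta = \sum_{i \in \Omega_0} (-\lambda_i/\sigma) X_i' + \sum_{i \in \Omega_1} (y_i - X_i\beta) X_i'/\sigma^2 = 0$
$\partial\ell/\partial\sigma = \sum_{i \in \Omega_0} \lambda_i (X_i\beta)/\sigma^2 + \sum_{i \in \Omega_1} ((y_i - X_i\beta)^2 - \sigma^2)/\sigma^3 = 0$, (7)
where $\lambda_i = \phi_i/(1 - \Phi_i)$. Using terms in these equations, the method of
Berndt-Hall-Hall-Hausman (1974), among others, can be used for optimization,
and statistical inference is based on the asymptotic t-tests generated by
utilizing $[\sum_{i=1}^{N} \ell_i \ell_i']^{-1}$ as the estimate of $\mathrm{cov}(\hat\beta)$ ($\ell_i$ is the i-th term of
$[(\partial\ell/\partial\beta)', \partial\ell/\partial\sigma]'$).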
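The Tobit log-likelihood and its first-order conditions can be written out in code as a check on the algebra. The sketch below is illustrative only and is not the estimation code used in the study: the simulated data, the parameter values, and the names `tobit_loglik` and `tobit_score` are assumptions. It evaluates the log-likelihood with censoring at zero (constants dropped), computes the analytic derivatives with respect to $\beta$ and $\sigma$, and compares them with finite-difference derivatives.

```python
import math
import random
from statistics import NormalDist

Z = NormalDist()  # standard normal density and distribution functions

def tobit_loglik(beta, sigma, X, y):
    """Tobit log-likelihood with censoring at zero, constants dropped."""
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, xi))
        if yi <= 0.0:
            # ln(1 - Phi(xb/sigma)), using 1 - Phi(z) = Phi(-z)
            ll += math.log(Z.cdf(-xb / sigma))
        else:
            ll += -math.log(sigma) - 0.5 * ((yi - xb) / sigma) ** 2
    return ll

def tobit_score(beta, sigma, X, y):
    """Analytic derivatives of the log-likelihood w.r.t. beta and sigma."""
    g_beta = [0.0] * len(beta)
    g_sigma = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, xi))
        if yi <= 0.0:
            lam = Z.pdf(xb / sigma) / Z.cdf(-xb / sigma)  # phi / (1 - Phi)
            for j, v in enumerate(xi):
                g_beta[j] -= lam * v / sigma
            g_sigma += lam * xb / sigma ** 2
        else:
            r = yi - xb
            for j, v in enumerate(xi):
                g_beta[j] += r * v / sigma ** 2
            g_sigma += (r * r - sigma ** 2) / sigma ** 3
    return g_beta, g_sigma

# Hypothetical simulated data: an intercept plus one regressor.
rng = random.Random(1)
beta_true, sigma_true = [0.5, 1.0], 1.5
X = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(500)]
y = [max(0.0, sum(b * v for b, v in zip(beta_true, xi))
         + rng.gauss(0.0, sigma_true)) for xi in X]

# Compare the analytic score with central finite differences.
beta0, sigma0, h = [0.3, 0.8], 1.2, 1e-5
g_beta, g_sigma = tobit_score(beta0, sigma0, X, y)
num_beta = []
for j in range(len(beta0)):
    up = list(beta0); up[j] += h
    dn = list(beta0); dn[j] -= h
    num_beta.append((tobit_loglik(up, sigma0, X, y)
                     - tobit_loglik(dn, sigma0, X, y)) / (2 * h))
num_sigma = (tobit_loglik(beta0, sigma0 + h, X, y)
             - tobit_loglik(beta0, sigma0 - h, X, y)) / (2 * h)
max_gap = max([abs(a - b) for a, b in zip(g_beta, num_beta)]
              + [abs(g_sigma - num_sigma)])
print("largest analytic-vs-numeric gap:", max_gap)
```

A gradient-based routine such as the Berndt-Hall-Hall-Hausman method would use the per-observation terms accumulated inside `tobit_score`; the sketch stops at verifying the derivatives and does not carry out the optimization.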
Several characteristics of the Tobit model are noteworthy. First, as
Amemiya (1984) points out, the likelihood function (5) can be rewritten as
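$L = [\prod_{i \in \Omega_0} (1 - \Phi_i) \prod_{i \in \Omega_1} \Phi_i] [\prod_{i \in \Omega_1} \sigma^{-1} \phi((y_i - X_i\beta)/\sigma)/\Phi_i]$. (8)
Written in this form, the likelihood function of the Tobit model can be
viewed as the product of the likelihood functions of a probit model with
parameter vector $\alpha = (\beta/\sigma)$ (first brackets) and a truncated-at-zero normal
distribution with parameters $(\beta, \sigma)$ and $E(Y_i \mid y_i > 0) = X_i\beta + \sigma\phi_i/\Phi_i$ (second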
brackets). As such, separate maximization subject to the restrictions that
the probit parameter vector be a positive scalar multiple (specifically
1/a) of the parameter vector of the truncated normal model yields the Tobit
model. The probit component can, of course, be viewed as the model of
whether or not the threshold is crossed, while the truncated normal
component models the conditional phenomenon of the magnitude of the
activity given that the activity is undertaken.
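The factorization of the Tobit likelihood into a probit part and a truncated-normal part can be confirmed numerically. The following is a sketch under assumed data and parameter values; the function names `tobit_ll`, `probit_ll`, and `truncated_ll` are hypothetical. It evaluates the three log-likelihoods at the same $(\beta, \sigma)$, with the probit index set to $\beta/\sigma$, and checks that the Tobit value equals the sum of the other two.

```python
import math
import random
from statistics import NormalDist

Z = NormalDist()  # standard normal

def index(beta, xi):
    """Linear index X_i * beta."""
    return sum(b * v for b, v in zip(beta, xi))

def tobit_ll(beta, sigma, X, y):
    """Full Tobit log-likelihood, censoring at zero."""
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = index(beta, xi)
        if yi <= 0.0:
            ll += math.log(Z.cdf(-xb / sigma))
        else:
            ll += math.log(Z.pdf((yi - xb) / sigma) / sigma)
    return ll

def probit_ll(alpha, X, y):
    """Probit log-likelihood for the event y_i > 0 with index X_i * alpha."""
    ll = 0.0
    for xi, yi in zip(X, y):
        xa = index(alpha, xi)
        ll += math.log(Z.cdf(xa)) if yi > 0.0 else math.log(Z.cdf(-xa))
    return ll

def truncated_ll(beta, sigma, X, y):
    """Log-likelihood of the positive y_i under a normal truncated at zero."""
    ll = 0.0
    for xi, yi in zip(X, y):
        if yi > 0.0:
            xb = index(beta, xi)
            ll += (math.log(Z.pdf((yi - xb) / sigma) / sigma)
                   - math.log(Z.cdf(xb / sigma)))
    return ll

# Hypothetical simulated data: an intercept plus one regressor.
rng = random.Random(2)
X = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(300)]
y = [max(0.0, 0.5 + xi[1] + rng.gauss(0.0, 1.0)) for xi in X]

beta, sigma = [0.4, 0.9], 1.3          # arbitrary evaluation point
alpha = [b / sigma for b in beta]      # the restriction tying the two parts

total = tobit_ll(beta, sigma, X, y)
part_probit = probit_ll(alpha, X, y)
part_trunc = truncated_ll(beta, sigma, X, y)
print(total, part_probit + part_trunc)
```

Relaxing the restriction `alpha = beta / sigma`, so that the two parts have unrelated parameters, is what distinguishes the more general two-part formulations from the Tobit model.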
It is certainly reasonable to consider the possibility that the
parameter restrictions described in the preceding paragraph are in fact
invalid. This would indicate, therefore, that the model of threshold
crossing is not as intimately related to the conditional model of the
magnitude of the undertaken activity as is implied by the Tobit model. In
the context of health outcomes, this could mean that the phenomenon of
whether some illness occurs is governed by a set of parameters different
than that determining the amount, duration, or severity of the illness,
given that some illness occurs. We discuss such issues in greater detail
later in the chapter.
Another characteristic of the Tobit model that merits discussion is
the fact that the parameters estimated under the assumptions of the Tobit
model are in general nonrobust to departures from many of the underlying
assumptions. That is, violation in the data of some of the properties
implied when the likelihood function is written in the form (5) will lead
to inconsistent estimates of the parameters $(\beta, \sigma)$. This phenomenon, which
is not uncommon in many types of models that are estimated by means of
maximum likelihood, stands in contrast to more familiar formulations such
as OLS and nonlinear least squares where, in spite of a variety of
departures from the assumed ideal structure of the error terms, one can
still obtain consistent estimates of the structural parameters.
Two of the most often discussed violations that bode dire consequences
for Tobit parameter estimates are violations of the NID assumption: first,
that the error variances are nonconstant across observations, and second,
that the error structure, though perhaps homoscedastic, is nonnormal. Note
that normal, homoscedastic errors are implied when writing the likelihood
function in the form (5). The results of several studies, summarized by
Amemiya (1984), suggest that under either type of departure, the maximum
likelihood Tobit parameter estimates are inconsistent.
2.4 CRAGG-CLASS HEALTH OUTCOME MODELS
In a 1971 paper, Cragg proposed a set of models for situations that
can be depicted as follows. An economic agent makes two (simultaneous)
decisions. A dichotomous decision is made about whether or not to engage
in some activity. Conditional on an affirmative for this decision, a
decision is made regarding how much of the activity to pursue. The
activities can be construed in the broadest of terms: expenditures,
quantities demanded or supplied, or the amount of time spent in ill health.
Such models have come to be known as hurdles models; that is, conditional
on some hurdle being crossed, a decision is made about some magnitude of
interest. Although these processes might in some cases seem logically to
be ordered in a temporal manner, the statistical properties of the model
abstract from any temporal considerations, the quantity decision being
described in terms of conditional densities.
Cragg proposed several models. However, because of the nature of the
present study, only two members of this set will concern us here, these
being the formulations wherein the quantity or second-stage decision is
defined only on the positive reals. This is in obvious reference to ideas
like "given that an individual had some illness, how much time was spent
ill." Although Cragg's other formulations are also interesting, their
discussion is omitted for economy of space.
For notational ease, we will assume that the same vector of
independent variables, $X_i$, influences both the first- and second-stage
decisions. This is a completely innocuous assumption, however, as elements
of parameter vectors can be restricted equal to zero to accommodate more
general cases. Regardless of the specification of the second-stage or
conditional decision, the first stage is described by a binary
probit model, i.e. the existence of latent random variables
$Y^*_{i1} \sim N(X_i\beta_1, \sigma_1^2)$ is posited. Only the signs of the realizations are
recorded, however, and are codified according to
$$
y_{i1} = \begin{cases} 1, & y^*_{i1} > 0 \\ 0, & y^*_{i1} \le 0. \end{cases}
$$
Because of this codification scheme, there is no information about the
scale of the random variables $Y^*_{i1}$ (i.e. the mappings of $y^*_{i1}$ into $y_{i1}$ are
unaffected by transformations of $Y^*_{i1}$ of the form $\theta Y^*_{i1}$ for $\theta > 0$).
Therefore, some normalization is required, the most common being $\sigma_1 = 1$.
This formulation gives rise to Cragg's formulation of the hurdle-crossing
model, where, with obvious change from Cragg's notation, we specify
$$
\Pr(y_{i1} = 1) = \Phi(X_i\beta_1), \qquad \Pr(y_{i1} = 0) = \Phi(-X_i\beta_1), \tag{10}
$$
where $\Phi$ is the standard normal distribution function (Cragg uses $C(\cdot)$ for
$\Phi(\cdot)$).
For strictly positive second-stage quantity realizations, Cragg
proposes two alternative formulations. Both are based on the specification
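of the conditional densities for random variables $Y_{i2}$ given that the
activity is in fact undertaken.
The first formulation is one where the conditional density for the
realizations of the $Y_{i2}$ is truncated-normal, with the truncation point at
zero. Thus we have
$$
f(y_{i2} \mid y_{i1} = 1) = \frac{1}{\sigma}\,\phi\!\left(\frac{y_{i2} - X_i\beta_2}{\sigma}\right) \Big/ \Phi\!\left(\frac{X_i\beta_2}{\sigma}\right), \quad y_{i2} > 0 \tag{11}
$$
and $f(y_{i2} \mid y_{i1} = 1) = 0$ otherwise,
where $\phi$ and $\Phi$ are as defined earlier. With obvious notational change from
Cragg's article, the (unconditional) likelihood of the positive
realizations can be written as
$$
f(y_{i2}) = f(y_{i2} \mid y_{i1} = 1)\,\Pr(y_{i1} = 1) = \frac{1}{\sigma}\,\phi\!\left(\frac{y_{i2} - X_i\beta_2}{\sigma}\right) \Phi(X_i\beta_1) \Big/ \Phi\!\left(\frac{X_i\beta_2}{\sigma}\right) \tag{12}
$$
for $y_{i2} > 0$. Therefore, the likelihood function of Cragg's first model is
$$
L = \prod_{i \in \Omega_0} \Phi(-X_i\beta_1) \prod_{i \in \Omega_1} \frac{1}{\sigma}\,\phi\!\left(\frac{y_{i2} - X_i\beta_2}{\sigma}\right) \Phi(X_i\beta_1) \Big/ \Phi\!\left(\frac{X_i\beta_2}{\sigma}\right), \tag{13}
$$
where $\Omega_0$ and $\Omega_1$ denote the sets of observations with $y_{i1} = 0$ and
$y_{i1} = 1$, respectively. In log form,
$$
\ell = \sum_{i \in \Omega_0} \ln\Phi(-X_i\beta_1) + \sum_{i \in \Omega_1} \left[ \ln\phi\!\left(\frac{y_{i2} - X_i\beta_2}{\sigma}\right) + \ln\Phi(X_i\beta_1) - \ln\sigma - \ln\Phi\!\left(\frac{X_i\beta_2}{\sigma}\right) \right]. \tag{14}
$$
In the form (14), it is straightforward to see that maximization of $\ell$ is fully
equivalent to the two-stage maximization problem:
1) Probit estimation of the parameter vector $\beta_1$ via maximization of
$$
\sum_{i \in \Omega_0} \ln\Phi(-X_i\beta_1) + \sum_{i \in \Omega_1} \ln\Phi(X_i\beta_1); \tag{15}
$$
2) Truncated-normal estimation of the parameters $(\beta_2, \sigma)$ via maximization
of
$$
\sum_{i \in \Omega_1} \left[ \ln\phi\!\left(\frac{y_{i2} - X_i\beta_2}{\sigma}\right) - \ln\sigma - \ln\Phi\!\left(\frac{X_i\beta_2}{\sigma}\right) \right]. \tag{16}
$$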
Because of the complexity of the log likelihood (14), estimation in this
two-stage fashion is likely to be somewhat easier than attempting to maximize
(14) with respect to the $(2k+1)$ parameters $(\beta_1, \beta_2, \sigma)$.
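The separability claimed for (14) can be checked numerically. The sketch below is illustrative only: the data pairs, the parameter values, and all function names are hypothetical and are not taken from the study, and a single regressor plus an intercept stands in for the vector of independent variables. It evaluates the log-likelihood of Cragg's first model observation by observation and compares it with the sum of a probit part and a truncated-normal part.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_loglik(b1, data):
    """First stage: hurdle crossed (y > 0) versus not crossed (y == 0)."""
    total = 0.0
    for x, y in data:
        index = b1[0] + b1[1] * x
        total += math.log(norm_cdf(index if y > 0 else -index))
    return total

def truncnorm_loglik(b2, sigma, data):
    """Second stage: truncated-at-zero normal density, positive y only."""
    total = 0.0
    for x, y in data:
        if y > 0:
            mean = b2[0] + b2[1] * x
            total += (math.log(norm_pdf((y - mean) / sigma))
                      - math.log(sigma)
                      - math.log(norm_cdf(mean / sigma)))
    return total

def cragg1_loglik(b1, b2, sigma, data):
    """Full log-likelihood of the first Cragg model, one observation at a time."""
    total = 0.0
    for x, y in data:
        index = b1[0] + b1[1] * x
        if y > 0:
            mean = b2[0] + b2[1] * x
            density = norm_pdf((y - mean) / sigma) / (sigma * norm_cdf(mean / sigma))
            total += math.log(density * norm_cdf(index))
        else:
            total += math.log(norm_cdf(-index))
    return total

# Hypothetical (x, y) pairs; y is zero when the hurdle is not crossed.
data = [(0.2, 0.0), (1.1, 2.5), (0.7, 0.0), (1.9, 4.0), (1.4, 1.2), (0.1, 0.0)]
b1, b2, sigma = (-0.5, 0.8), (0.3, 1.1), 1.5

full = cragg1_loglik(b1, b2, sigma, data)
split = probit_loglik(b1, data) + truncnorm_loglik(b2, sigma, data)
print(full, split)
```

Because the two parts share no parameters, each can be maximized on its own, which is what makes the two-stage approach equivalent to joint maximization.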
Cragg's second formulation again depends on the probit first-stage model,
but the conditional density of the positive realizations is respecified.
Instead of assuming that the conditional density of the positive realizations
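of $Y_{i2}$ is truncated-normal, the model is now formulated such that the
logarithms of the $y_{i2}$ are normal, i.e. conditional on $y_{i1} = 1$,
$\log(y_{i2}) \sim N(X_i\beta_2, \sigma^2)$. The conditional density for the $i \in \Omega_1$
(the observations with $y_{i1} = 1$) is
$$
f(y_{i2} \mid y_{i1} = 1) = (y_{i2}\sigma)^{-1}\,\phi\!\left(\frac{\log(y_{i2}) - X_i\beta_2}{\sigma}\right), \tag{17}
$$
where the term $(y_{i2})^{-1}$ is the Jacobian of the transformation from $y_{i2}$ to
$\log(y_{i2})$. Therefore, the likelihood for the $i \in \Omega_1$, which is Cragg's equation
(11), is
$$
f(y_{i2}) = f(y_{i2} \mid y_{i1} = 1)\,\Pr(y_{i1} = 1) = (y_{i2}\sigma)^{-1}\,\phi\!\left(\frac{\log(y_{i2}) - X_i\beta_2}{\sigma}\right) \Phi(X_i\beta_1). \tag{18}
$$
The likelihood function for the entire sample is
$$
L = \prod_{i \in \Omega_0} \Phi(-X_i\beta_1) \prod_{i \in \Omega_1} (y_{i2}\sigma)^{-1}\,\phi\!\left(\frac{\log(y_{i2}) - X_i\beta_2}{\sigma}\right) \Phi(X_i\beta_1). \tag{19}
$$
In log form,
$$
\ell = \sum_{i \in \Omega_0} \ln\Phi(-X_i\beta_1) + \sum_{i \in \Omega_1} \left[ \ln\Phi(X_i\beta_1) + \ln\phi\!\left(\frac{\log(y_{i2}) - X_i\beta_2}{\sigma}\right) - \ln y_{i2} - \ln\sigma \right]. \tag{20}
$$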
As in Cragg's first model, the second model can be estimated in two stages:
1) Probit estimation of $\beta_1$ as above;
2) OLS estimation of $(\beta_2, \sigma)$ using the log transform of the $y_{i2}$ as
dependent variables and $X_i$ as the independent variables. This is
perhaps surprising, but results because the terms in (20) involving
$(\beta_2, \sigma)$ are identical to those of the likelihood function of the
familiar normal linear model.
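The second stage can be sketched as follows. This is a minimal illustration under stated assumptions rather than the estimation code used in the study: the data are hypothetical, there is a single regressor plus an intercept, and the function names are ours. The sketch runs OLS of log(y) on x over the positive observations and confirms that the second-stage terms of (20) are no larger at nearby parameter values.

```python
import math

def ols_on_logs(data):
    """OLS of log(y) on (1, x) over positive y; returns (intercept, slope, sigma)."""
    pos = [(x, math.log(y)) for x, y in data if y > 0]
    n = len(pos)
    xbar = sum(x for x, _ in pos) / n
    lbar = sum(l for _, l in pos) / n
    sxx = sum((x - xbar) ** 2 for x, _ in pos)
    sxy = sum((x - xbar) * (l - lbar) for x, l in pos)
    slope = sxy / sxx
    intercept = lbar - slope * xbar
    rss = sum((l - intercept - slope * x) ** 2 for x, l in pos)
    # Maximum likelihood uses the divisor n rather than n - k.
    return intercept, slope, math.sqrt(rss / n)

def stage2_loglik(intercept, slope, sigma, data):
    """Terms of (20) that involve (beta_2, sigma), summed over positive y."""
    total = 0.0
    for x, y in data:
        if y > 0:
            z = (math.log(y) - intercept - slope * x) / sigma
            total += (-0.5 * math.log(2.0 * math.pi) - 0.5 * z * z
                      - math.log(y) - math.log(sigma))
    return total

# Hypothetical (x, y) pairs; the zeros belong to the probit stage only.
data = [(0.2, 0.0), (1.1, 2.5), (1.9, 4.0), (0.7, 0.0),
        (1.4, 1.2), (0.5, 0.8), (2.3, 6.0)]

b0, b1, s = ols_on_logs(data)
best = stage2_loglik(b0, b1, s, data)
print(b0, b1, s, best)
```

The term in log(y) does not involve the parameters, so it shifts the level of the log-likelihood without affecting the maximizer.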
Because of the simplicity of this two-stage approach, estimation in such a
framework is obviously appealing. Duan et al. (1983) have proposed the
second Cragg model to estimate medical expenditures: individuals either have or
do not have medical expenses, and given that they have medical expenses, the
conditional density of the expenditures is lognormal, $\log(Y_{i2}) \sim N(X_i\beta_2, \sigma^2)$.
2.5 TRUNCATED-NORMAL ESTIMATION
truncated-from-below distribution where the point of truncation is constant
across observations and is assumed to be zero. The results easily generalize,
however, and for a discussion of the statistical properties of the truncated
normal distribution in the most general case, the reader is referred to Johnson
and Kotz (1970, pp. 81-87).
It should be noted that interest in the truncated normal should not be
confined to the role it plays in the Cragg model. The distribution is indeed
useful in many empirical situations. Hurd (1979) notes that
(e)stimation based on only positive y's comes about very
naturally in a number of kinds of studies. For example, in many
labor supply studies one of the right-hand variables, the wage
rate, is only observed when the left-hand variable, labor
supply, is positive. Imputing the unobserved wage rates causes
a number of complications that can be avoided by discarding
those observations for which labor supply is zero. Another
example is a demand study where the price is not known unless a
purchase is made. (Hurd, 1979, p. 248).
For our purposes, the likelihood function of the truncated normal can
be constructed as follows. We assume the existence of a sample of realizations
of random variables $Y_i \sim \mathrm{NID}(X_i\beta, \sigma^2)$. However, for whatever reasons, only
the positive realizations of the $Y_i$ are used in the analysis, these assumed
to number $T_1$. Given these assumptions, the likelihood function is
$$
L = \prod_{i=1}^{T_1} \frac{\phi_i}{\sigma\,\Phi_i}, \tag{21}
$$
where $\phi_i$ is the standard normal density evaluated at $((y_i - X_i\beta)/\sigma)$, and $\Phi_i$
is the standard normal distribution function evaluated at $(X_i\beta/\sigma)$ which
serves as the normalizing factor of the truncated density. The
log-likelihood function (suppressing terms not depending on $(\beta, \sigma)$) is
$$
\ell = \sum_{i=1}^{T_1} \left[ -.5\left((y_i - X_i\beta)/\sigma\right)^2 - \log\sigma - \log\Phi\!\left(\frac{X_i\beta}{\sigma}\right) \right]. \tag{22}
$$
Estimation is by means of maximum likelihood. The first-order conditions
for a maximum of $\ell$ are
$$
\frac{\partial\ell}{\partial\beta} = \sum_{i=1}^{T_1} \frac{1}{\sigma}\left[ \frac{y_i - X_i\beta}{\sigma} - \frac{\phi(X_i\beta/\sigma)}{\Phi(X_i\beta/\sigma)} \right] X_i' = 0
$$
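As a check on the derivative of (22) with respect to the slope parameters, the sketch below compares an analytic score with a central finite difference. It is a hedged illustration: the data and parameter values are hypothetical, one regressor plus an intercept is assumed, and the score expression is obtained by differentiating (22) term by term, giving (1/sigma) * [(y - X*beta)/sigma - phi(X*beta/sigma)/Phi(X*beta/sigma)] * X for each observation.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def loglik(b0, b1, sigma, data):
    """Truncated-normal log-likelihood (22), constants suppressed."""
    total = 0.0
    for x, y in data:
        mean = b0 + b1 * x
        total += (-0.5 * ((y - mean) / sigma) ** 2
                  - math.log(sigma)
                  - math.log(norm_cdf(mean / sigma)))
    return total

def score_beta(b0, b1, sigma, data):
    """Analytic derivative of (22) with respect to (b0, b1)."""
    g0 = g1 = 0.0
    for x, y in data:
        mean = b0 + b1 * x
        ratio = norm_pdf(mean / sigma) / norm_cdf(mean / sigma)
        common = ((y - mean) / sigma - ratio) / sigma
        g0 += common          # derivative with respect to the intercept
        g1 += common * x      # derivative with respect to the slope
    return g0, g1

# Hypothetical positive observations (x, y).
data = [(1.1, 2.5), (1.9, 4.0), (1.4, 1.2), (0.5, 0.8)]
b0, b1, sigma, h = 0.3, 1.1, 1.5, 1e-6

g0, g1 = score_beta(b0, b1, sigma, data)
num0 = (loglik(b0 + h, b1, sigma, data) - loglik(b0 - h, b1, sigma, data)) / (2 * h)
num1 = (loglik(b0, b1 + h, sigma, data) - loglik(b0, b1 - h, sigma, data)) / (2 * h)
print(g0, num0, g1, num1)
```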
Olsen's method relies on a method of moments technique whereby the
moments (specifically the mean and variance) of the empirical incomplete
distribution, that of the positive $y_i$, are related to the moments of the
complete distribution via formulae developed by Pearson and Lee (1908).
Extending the Pearson-Lee methodology to the multiple regression case,
Olsen demonstrates that the least squares slope coefficients differ from
the true slope coefficients by a common factor, and he presents in tabular
form the multiplicative correction factors needed to transform the OLS
estimates of the slope, intercept, and standard error parameters (based on
data from the incomplete distribution) to the corresponding complete
distribution estimates. In practice, we have fitted polynomial functions
of the third degree to Olsen's tabled data so that the transformations are
facilitated.
Olsen also presents the multipliers for transforming the
(mean/standard error) ratio estimated by OLS on the incomplete distribution
to the corresponding ratio of the complete distribution, $(\mu/\sigma)$. Olsen
notes that $\Phi(\mu/\sigma)$
2.6 HECKMAN'S APPROACH: SAMPLE SELECTION
A very popular technique for estimating limited dependent variable
models is the sample selection model, attributable largely to
Heckman (1976, 1979). The model has a number of applications (see
Heckman's 1976 article in particular), and is quite easy to estimate.
Because it is so well-known, we will only provide a sketch of the details.
The following section, which contrasts and compares the Tobit, Cragg, and
Heckman models, sheds some more light on subtleties of Heckman's
formulation.
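Heckman considers the following two-equation model:
$$
y_{i1} = X_i\beta_1 + \varepsilon_{i1} \tag{24}
$$
$$
y^*_{i2} = X_i\beta_2 + \varepsilon_{i2} \tag{25}
$$
It is assumed that $\varepsilon_{i1}$ and $\varepsilon_{i2}$ are distributed joint normal, with marginal
densities $N(0, \sigma_1^2)$ and $N(0, \sigma_2^2)$ and covariance $\sigma_{12}$. Only the sign of
$y^*_{i2}$ is recorded, through the indicator
$$
y_{i2} = \begin{cases} 1, & y^*_{i2} > 0 \\ 0, & y^*_{i2} \le 0. \end{cases} \tag{26}
$$
In Heckman's model, the realizations $y_{i1}$ are available to the analyst only
when $y^*_{i2} > 0$, i.e. when $y_{i2} = 1$.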
A concrete example is where (24) is a model determining market wage
rate (or log(wage rate)) by a linear function of $X_i$ and random error and
where (25) is a model determining hours of labor supplied in the market.
It is assumed that either hours of labor supplied or a discrete binary
indicator of whether or not any hours were supplied is available for all
observations. However, because market wage rates are only observed for
individuals for whom the market wage rate exceeds the reservation wage at
zero hours, data on the $y_{i1}$ are available only when $y^*_{i2} > 0$ ($y_{i2} = 1$).
Heckman then considers the expectation $E(Y_{i1} \mid y_{i2} = 1)$, which can be
written as
$$
E(Y_{i1} \mid y_{i2} = 1) = X_i\beta_1 + E(\varepsilon_{i1} \mid y_{i2} = 1). \tag{27}
$$
If one considers least-squares estimation of (27), the question is: Do
there obtain consistent estimates of $\beta_1$ when $y_{i1}$ is regressed on $X_i$ for
those i for whom $y_{i2} = 1$? Basically the issue is whether the expectation
$E(\varepsilon_{i1} \mid y_{i2} = 1)$ is null. In general, and thus at the core of the sample
selection bias problem, the answer is "no". Based on well-known formulae,
it holds that
$$
E(\varepsilon_{i1} \mid y_{i2} = 1) = \frac{\sigma_{12}}{\sigma_2}\,\frac{\phi_i}{1 - \Phi_i}, \tag{28}
$$
where $\phi_i$ and $\Phi_i$ are the standard normal density and distribution
functions evaluated at $-X_i\beta_2/\sigma_2$. Since $\phi_i$, $(1 - \Phi_i)$, and $\sigma_2$ are all
positive, then whenever $\sigma_{12} \neq 0$ least
squares estimation of (27) will be based on an expectations function with
nonnull disturbance expectation, and will therefore yield inconsistent
estimates of $\beta_1$.
Heckman's suggested procedure in this situation is as follows.
Estimate on the entire sample a probit model for the discrete indicator
representation of the model (25). This yields a consistent estimate of the
parameter vector $(\beta_2/\sigma_2)$ from which consistent estimates of $\lambda_i = \phi_i/(1 - \Phi_i)$
are constructed. Form the $T \times (k+1)$ matrix $Z = [X \mid \Lambda]$, where $\Lambda$ is a $T \times 1$
vector with typical element $\lambda_i$, and regress $y_{i1}$ on $[X_i, \lambda_i]$. This procedure
yields consistent estimates of the parameters $\beta_1$ and $(\sigma_{12}/\sigma_2)$, having
effectively solved the omitted variables problem by using a consistent
estimate of $E(\varepsilon_{i1} \mid y_{i2} = 1)$ as a regressor.
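The selection term used as the extra regressor can be illustrated with a short simulation. The sketch below is not Heckman's estimator itself; it only computes the ratio phi/(1 - Phi) at the point -X*beta_2/sigma_2 and compares the implied conditional mean of the first-equation error with a Monte Carlo average. The index value, the covariance, the unit error variances, the seed, and the function names are all assumptions made for the illustration.

```python
import math
import random

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills(index2, sigma2=1.0):
    """lambda = phi(z) / (1 - Phi(z)) evaluated at z = -index2 / sigma2."""
    z = -index2 / sigma2
    return norm_pdf(z) / (1.0 - norm_cdf(z))

def simulated_selected_mean(index2, sigma12, n=200_000, seed=12345):
    """Monte Carlo mean of e1 over draws with index2 + e2 > 0.

    Both error variances are set to one, so sigma12 is also the correlation.
    """
    rng = random.Random(seed)
    rho = sigma12
    total, count = 0.0, 0
    for _ in range(n):
        e2 = rng.gauss(0.0, 1.0)
        e1 = rho * e2 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if index2 + e2 > 0.0:      # the observation is selected
            total += e1
            count += 1
    return total / count

index2, sigma12 = 0.5, 0.6          # hypothetical values
theory = sigma12 * inverse_mills(index2)
simulated = simulated_selected_mean(index2, sigma12)
print(theory, simulated)
```

With a nonzero covariance the selected-sample mean of the error is visibly different from zero, which is the omitted-variable problem that the added regressor addresses.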
In the context of health outcomes models, one could define $y^*_{i2}$ as some
latent index of the propensity to be ill. Given that this index is greater
than some threshold level, illness results, its magnitude determined by the
realization $y_{i1}$. The translation of the latent illness model into
Heckman's framework is not straightforward, however. For those individuals
not reporting illness over the sample interval, we observe zero time spent
ill rather than not observing the amount. It is therefore difficult to
interpret the meaning of the realized, but unobserved, $y_{i1}$ for the healthy.
We turn in the next section to a more detailed analysis of such subtleties.
2.7 TOBIN, CRAGG, AND HECKMAN: A DIGRESSION
As there are some similarities between and among the models described
above and identified for expositional parsimony as the models of Tobin,
Cragg, and Heckman, it is appropriate to summarize their similarities and
differences and in so doing to elucidate the circumstances in which each
model is more or less appropriate. (The discussion of Cragg's model here
concerns the first Cragg model (probit/truncated-normal), as that version is
most similar to the others discussed here.)
First to note is that the Tobit model results as a restricted version
of both the Cragg and the Heckman models. The reason for this is purely
mechanical, however, and should not be taken to imply that the Cragg and
Heckman models are in general identical. As we will see below, these
models are structurally quite different.
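To see that the Cragg model reduces to the Tobit, the Cragg
log-likelihood function can be written (following Lin and Schmidt (LS)
(1984)) as
$$
\ell = \sum_{i \in \Omega_0} \ln\Phi(-X_i\beta_1) + \sum_{i \in \Omega_1} \left[ \ln\Phi(X_i\beta_1) - \ln\Phi\!\left(\frac{X_i\beta_2}{\sigma}\right) - (1/2)\ln(2\pi) - \ln\sigma - \frac{(y_{i2} - X_i\beta_2)^2}{2\sigma^2} \right]. \tag{29}
$$
Under the restriction $\beta_1 = \beta_2/\sigma$, the two $\ln\Phi$ terms in the second sum
cancel and (29) becomes the Tobit log-likelihood. The following
excerpt from LS provides a particularly cogent summary description of the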
appropriateness of the restricted (Tobit) versus the unrestricted versions
of the Cragg model:
(I)n the Tobit model any variable which increases the probability
of a non-zero value must also increase the mean of the positive
values; a positive element of $\beta$ means that an increase in the
corresponding variable (element of $X_i$) increases both $\Pr(y_i > 0)$
and $E(y_i \mid y_i > 0)$. This is not always reasonable. As an example,
consider a hypothetical sample of buildings, and suppose that we
wish to analyze the dependent variable "loss due to fire," during
some time period. Since this is often zero but otherwise
positive, the Tobit model might be an obvious choice. However,
it is not hard to imagine that newer (and more valuable)
buildings might be less likely to have fires, but might have
greater average losses when a fire did occur. The Tobit model
can not accommodate this possibility.
Another problem with the Tobit model is that it links the shape
of the distribution of the positive observations and the
probability of a positive observation. For rare events (like
fires), the shape of the distribution of the positive
observations would have to resemble the extreme upper tail of a
normal, which would imply a continuous and faster than
exponential decline in density as one moved away from zero.
Conversely, when zero occurs less than half of the time, the
Tobit model necessarily implies a non-zero mode for the non-zero
observations.
Cragg's model avoids both of the above problems with the Tobit
model. A reasonably strong case can be made for it as a general
alternative to the Tobit model, for analysis of data sets to
which Tobit is typically applied, namely, data sets in which zero
is a common (and meaningful) value of the dependent variable, and
the non-zero observations are all positive. The distribution of
such a dependent variable is characterized by the probability
that it equals zero and by the (conditional) distribution of the
positive observations, both of which Cragg's model parameterizes
in a general way. (LS, pp. 174-175)
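The first point in the excerpt can be made concrete with a few lines of arithmetic. Under the Tobit model, with mu standing for the index X*beta, the probability of a positive outcome is Phi(mu/sigma) and the mean of the positive outcomes is the standard truncated-normal mean mu + sigma*phi(mu/sigma)/Phi(mu/sigma). The sketch below, whose grid of index values, value of sigma, and function names are all hypothetical, shows both quantities rising together as mu rises.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_prob_positive(mu, sigma):
    """Pr(y > 0) when the latent variable is N(mu, sigma^2)."""
    return norm_cdf(mu / sigma)

def tobit_mean_positive(mu, sigma):
    """E(y | y > 0): mean of a normal truncated from below at zero."""
    z = mu / sigma
    return mu + sigma * norm_pdf(z) / norm_cdf(z)

sigma = 1.5
indices = [-2.0, -1.0, 0.0, 1.0, 2.0]      # hypothetical values of X * beta
probs = [tobit_prob_positive(mu, sigma) for mu in indices]
means = [tobit_mean_positive(mu, sigma) for mu in indices]
for mu, p, m in zip(indices, probs, means):
    print(mu, round(p, 4), round(m, 4))
```

A variable that lowers the probability of a fire while raising the expected loss given a fire would need the two columns to move in opposite directions, which a single index cannot deliver.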
Turning now to Heckman's formulation, his two-equation model is seen
to reduce to the Tobit model as follows. Recall that the model can be
written (with notational changes obvious) as
$$
y_{i1} = X_i\beta_1 + \varepsilon_{i1}, \qquad y^*_{i2} = X_i\beta_2 + \varepsilon_{i2}. \tag{30}
$$
$Y^*_{i2}$ is a latent variable, however, and only a discrete (0,1) sign indicator
of its realization, $y_{i2}$, is available; $y_{i1}$ is observed only when $y_{i2} = 1$.
Letting $\beta_1 = \beta_2$ and $\varepsilon_{i1} = \varepsilon_{i2}$ (i.e. the error structure is univariate rather
than bivariate), then the Heckman model is the standard Tobit model. The
logic is that when these restrictions are imposed in the Heckman
two-equation model, the remaining single equation plays both the censoring
and the determination-of-intensity roles. Since the censoring occurs as a
result of a non-positive realization of the random variable $Y^*_{i2}$, the Tobit
requirement that the quantity or intensity realization be confined to the
nonnegative orthant is automatically satisfied when the restriction $y_{i1} = y^*_{i2}$
(i.e. $\beta_1 = \beta_2$, $\varepsilon_{i1} = \varepsilon_{i2}$) is imposed. In general, however, the Heckman
two-equation framework is not specifically designed to model situations
where realizations of the dependent variable of interest are necessarily
nonnegative and are recorded for all individuals/observations, and where
$\Pr(y_{i1} = 0) > 0$. Heckman's formulation has $y_{i1} \neq 0$ except on a set of measure
zero. We turn now to an explanation of the fundamental differences between
the Heckman two-equation formulation and the two versions of interest of
the Cragg model.
The two-equation Heckman model describes two phenomena, $Y_{i1}$ and $Y^*_{i2}$,
that are marginally distributed, respectively, as $\mathrm{NID}(X_i\beta_1, \sigma_1^2)$ and
$\mathrm{NID}(X_i\beta_2, \sigma_2^2)$ ($\sigma_2$ is usually restricted to equal 1 for normalization when only the
sign of $y^*_{i2}$ is observed). The joint distribution is bivariate
$\mathrm{NID}(X_i\beta_1, X_i\beta_2, \sigma_1^2, \sigma_2^2, \rho)$, where $\rho$ is the correlation of
$(\varepsilon_{i1}, \varepsilon_{i2})$, $\sigma_{12}/(\sigma_1\sigma_2)$,
which is in general nonzero. The important point is that these marginal
and joint distributions are unconditional. That is, for all i, there exist
realizations $(y_{i1}, y^*_{i2})$ although the realizations $y_{i1}$ for some i will be
unavailable to the researcher. Casting the problem concretely in the area
where Heckman's model has been most fruitfully applied, labor economics,
sheds further light on the subtleties of his model. Here we define
$y_{i1} = \log(W_i)$ and $y^*_{i2} = \log(H_i + 1)$, where $W_i$ is wage earned in market work and
$H_i$ is hours of market work. Thus, $y^*_{i2}$ is positive only if market hours are
positive. It is posited that the expected values of both $Y_{i1}$ and $Y^*_{i2}$ are
linear functions of personal characteristics and other variables so that
the two-equation model results. However, because we only observe the
market wage for those individuals actually participating in market work
(those for whom $H_i > 0$), some subset of observations will not have data on
the $y_{i1}$. There is a market wage determined for nonparticipants; whether or
not such individuals have knowledge of their market wages is immaterial.
The relevant analytical fact is that such data are unavailable to the
researcher.
In this labor supply framework, it is apparent why the estimation
techniques developed for the two-equation Heckman model and discussed
earlier in this chapter have such appeal. The more immediate concern, of
course, is whether such techniques are in fact appropriate to the
estimation requirements of the present analysis. In a nutshell, Heckman's
model is one where there are two equations of interest, both holding for
all i unconditionally, and where (except when restricted so as to be
identical to a Tobit model) the probability of observing realizations of
the dependent variable equal to zero is zero. Does such a formulation capture
the essence of the "corner solution" problems of the health status outcomes
phenomenon?
It seems rather artificial to cast such phenomena in this
framework. It is not generally the case with the generation of health
outcomes data that one can posit the existence of some latent variable such
that data for the illness measure(s) of interest are only available given a
positive realization of the latent variable. Rather, the processes of
interest here are represented more typically by data that indicate the
realizations of illness outcomes for all individuals, even though these
realizations are quite frequently on the boundary of the "consumption" set.
In sketching some of the differences between the Heckman two-equation
formulation and the Cragg models with particular reference to data sets
where the zero or corner solution outcomes are meaningful and where nonzero
outcomes are strictly positive, LS observe that in such cases the Heckman
model's assumptions are not particularly representative of the situation
because in the Heckman formulation:
the observed values of $y_{i1}$ need not be positive, in the sense
that the model implies a non-zero probability of observed $y_{i1} < 0$;
and the unobserved $y_{i1}$ are literally unobserved, rather than
observed as equal to zero. The first of these problems can be
circumvented, for example, by measuring $y_{i1}$ in logarithms, ... and
the second problem is in any case fundamental. (LS p. 175).
We turn now to a discussion of how the Cragg models differ in substance
from the Heckman two-equation setup and argue that the Cragg formulations
are relatively more suited than Heckman's model to the nature of a subset
of our estimation requirements.
Although like the Heckman formulation in being a "two-model"
specification, the fundamental point of departure for the Cragg technique
is that one of the two models is formulated in terms of conditional
expectations. The conditions on which the expectations are taken are, as
described above, the outcomes of unconditional models, which are generally
stated as binary representations of latent random variables. Thus, in the
context of health measures, there is an unconditional model defined for all
individuals determining the binary outcome (illness, no illness).
Conditional on an "illness" outcome, the quantity or duration of illness is
determined either by a lognormal or truncated-normal model. The
unconditional likelihood for a representative ill individual is then
density(illness duration given some illness)*Pr(some illness), (31)
which is equation (12) as specified earlier. There is no density of the
quantity of illness defined for the healthy, unlike Heckman's formulation that
defines such a density for all individuals.
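The conditional structure just described can be sketched as a data-generating process. The code below is a hedged illustration, assuming the probit/truncated-normal version of the model with independent draws for the hurdle and the amount; the index values, the standard deviation, the seed, and the function names are hypothetical. An individual who does not cross the hurdle is recorded as a genuine zero, and an amount is drawn only for those who do cross it.

```python
import math
import random

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def draw_hurdle_outcome(index1, mean2, sigma, rng):
    """One draw: probit hurdle first, then a strictly positive amount if crossed."""
    if index1 + rng.gauss(0.0, 1.0) <= 0.0:
        return 0.0                     # hurdle not crossed: an observed zero
    while True:                        # rejection sampling from N(mean2, sigma^2) on y > 0
        y = rng.gauss(mean2, sigma)
        if y > 0.0:
            return y

rng = random.Random(2024)
index1, mean2, sigma = 0.3, 1.0, 2.0   # hypothetical values
draws = [draw_hurdle_outcome(index1, mean2, sigma, rng) for _ in range(100_000)]
share_positive = sum(1 for y in draws if y > 0.0) / len(draws)
print(share_positive, norm_cdf(index1))
```

The share of positive outcomes is governed by the first-stage index alone, while the size of the positive outcomes is governed by the second-stage parameters alone.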
Deaton and Irish (DI) (1984), in an independent line of investigation,
have purportedly cast Cragg's first model in a two-equation Heckman
formulation. They indicate that a positive observation on the quantity measure
of interest is made when, in the notation used earlier, both $Y_{i1}$ and $Y^*_{i2}$ are
realized as positive, else a zero or a nonparticipation results. In two cases,
DI specify
(32)
Cast thusly, the Cragg model can be viewed as a Heckman two-equation model, but
with a restriction imposed that is absent in Heckman's formulations. That is, DI
seem to have ignored one aspect of the Cragg model that is key in
differentiating it from Heckman's specification, viz. that $y^*_{i2} > 0$ is both a
necessary and sufficient condition for a positive realization of $y_{i1}$ to
result. That is, $\Pr(y_{i1} > 0 \mid y^*_{i2} > 0) = 1$, $\Pr(y_{i1} = 0 \mid y^*_{i2} \le 0) = 1$. When, and only
when, the first hurdle is traversed is there a positive amount of the activity
undertaken. So DI's statement that positive realizations of both variables
determine whether $y_{i1}$ is observed positive is somewhat misleading in that a
positive realization of either suffices to assure the positivity of the other.
Neither of Cragg's specifications, then, is really in the spirit of the model
proposed by Heckman except, of course, when both the Cragg model and the
Heckman two-equation formulation are restricted such that the Tobit
specification results.
Owing to the subtleties of the arguments, it is likely that the above
discussion has provided somewhat less than a total clarification of all the
relevant issues. Some of these shortcomings are due to the fact that even
central participants in the academic debates appear still unconvinced about the
nature of the differences among the estimation techniques. For example, as
noted earlier Duan and coauthors (1983) have used the Cragg estimation
technique to model individuals' medical expenditures. The expenditure
decision, in the spirit of Cragg's specification, is statistically modeled as
two separate processes. Model one determines the binary outcome of whether or
not any expenditures will occur, and model two determines the amount of
expenditure (positive by definition) that results conditional on there being
some expenditure. In this paper, Duan and coauthors assert that the covariance
between the error terms of the two models is irrelevant insofar as construction
of the likelihood function is concerned.
Recently, however, Hay and Olsen (1984) have questioned the Duan and
coauthors method, stating that this approach "requires some fairly unusual
assumptions on the model joint error distribution and functional form (p.
279)." Moreover, Hay and Olsen go on to claim that the Duan and coauthors
formulation "can be interpreted as being nested in the more general sample
selection models (p. 279)." Duan and coauthors respond that Hay and Olsen "are
incorrect in claiming that our models are nested within the sample selection
model," and that "the conditional specification in the multi-part (i.e., Duan
and coauthors) model is preferable to the unconditional specification in the
selection model for modeling actual (v. potential) outcomes (p. 283)."
As we argued earlier, the sample selection or Heckman approach is
particularly fruitful when analyzing phenomena such as labor market
participation. Quoting Duan and coauthors:
For certain empirical problems such as labor force
participation, the primary goal might be to predict the
potential outcome instead of the actual outcome; therefore, an
unconditional specification such as the sample selection models
might be preferable. For the present application, however, the
goal is to predict the actual expense, not the potential
expense; therefore, the unconditional equation... is of no
direct interest, and the preference for the unconditional
specification in the other empirical problems does not apply to
the present application. (p. 286).
In any event, this discussion demonstrates that there still exists
some confusion on these points in the published literature. We have
attempted to be as thorough as time and space permit in hope of emphasizing
one extremely important message. That is, it is essential that the
researcher be intimately familiar with the behavioral and statistical
structure of the models of interest in order to avoid being swallowed by
the slippery quicksand we have described. The nature of health status
measures as conditional or unconditional and the interpretation of any
latent variables in the model must be quite clear before the correct
estimation technique can be selected. When, and only when, such issues are
in order is it possible to make sense of the estimates obtained and their
relevance to benefit estimation.
It seems that the logic of the health status outcome measures of
interest in this study is better captured in terms of Cragg's
specifications than in the Heckman two-equation model, although this
question is obviously still open to informed debate. The specification of
the magnitude-of-illness model as a conditional model is, however,
intuitively plausible, and Cragg's formulations provide a natural vehicle
for translating such intuitive plausibility into an econometric framework.
However, it so happens that the assumption of normally, or at least
continuously, distributed random variables, which characterizes the above
models, is not necessarily appropriate insofar as count measures like
"days" or "times" are concerned. To a discussion of some alternative
estimation techniques that might be used in such situations we now turn.
2.8 POISSON-DISTRIBUTED HEALTH OUTCOME MEASURES
In modeling event counts (non-negative integer data) over some time
interval (t, t+dt), the Poisson distribution is commonly used. Here, a
random variable $Y_i$ follows the probability law
$$
\Pr(Y_i = y) = \exp(-\lambda_i)\,\lambda_i^{y} / y!, \quad y \in \{0, 1, 2, \ldots\} \tag{33}
$$
and $\Pr(Y_i = y) = 0$ otherwise,
with $\lambda_i > 0$.
It happens that there exist health outcome data of interest that are
recorded as nonnegative integers, most obviously as counts of days of
activity restriction. For any individual, such measures can, over a time
interval (t, t+dt), say one two-week period, assume only integer values in
{0,1,2,...,14}. Because of the paucity of observations likely to be found
at the upper (14 day) limit, we ignore the fact that these measures obey
upper bounds and concentrate instead on the complications presented by the
large number of individuals who in a typical random sample of the
population report zero days of restricted activity.
Analogous to the familiar normal distribution where for econometric
work one typically specifies $\mu_i = X_i\beta$, the $\lambda_i$ parameter of the Poisson
distribution can be reparameterized to admit the influence of
covariates. Since for all i, $\lambda_i > 0$, a straightforward approach is to
assume $\lambda_i = \exp(X_i\beta)$ and to estimate $\beta$ by maximum likelihood (see Hausman,
Hall, Griliches (1984), Hausman, Ostro, Wise (1983), Portney and Mullahy
(1985)). This is the approach adopted here for modeling the restricted
activity day outcomes.
One drawback of the Poisson model is the restriction that
$E(Y_i) = \mathrm{Var}(Y_i)$. Should this restriction not in fact characterize the data, the
maximum likelihood estimates of the covariance matrix of $\hat\beta$ based on minus
the inverse of the estimated Hessian will be inconsistent and t-tests based
thereon would be misleading. Hausman, Ostro, and Wise circumvent this
restriction by allowing for an overdispersion parameter. A different
approach is used here, using an estimator of the covariance matrix that is
robust against departures from the mean=variance restriction; this
procedure is described below.
Given T independent observations, the log-likelihood function of the
Poisson health outcome model can be written as
$$
\ell = \sum_i \left[ -\exp(X_i\beta) + y_i X_i\beta \right] + C, \tag{34}
$$
where $\exp(X_i\beta) = \lambda_i$, $y_i$ is the observed count of illness days, and C does
not depend on $\beta$. It is obvious that $\ell$ is concave in $\beta$. The first-order
conditions for the maximization of $\ell$ are
$$
\partial\ell/\partial\beta = \sum_i \left[ -\exp(X_i\beta)\,X_i' + y_i X_i' \right] = 0 \tag{35}
$$
with the maximum guaranteed by the condition
$$
\partial^2\ell/\partial\beta\,\partial\beta' = \sum_i -(X_i' X_i)\exp(X_i\beta) \tag{36}
$$
negative definite.
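Because the log-likelihood is concave, Newton-Raphson iteration on the score (35) and Hessian (36) is a natural way to compute the estimates. The sketch below is a minimal illustration under stated assumptions, not the routine used in this study: it handles one regressor plus an intercept, writes out the 2x2 inverse by hand, starts from the intercept-only solution, and uses hypothetical (x, count) pairs.

```python
import math

def poisson_score_hessian(b0, b1, data):
    """Score and Hessian of the Poisson log-likelihood with lambda = exp(b0 + b1*x)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in data:
        lam = math.exp(b0 + b1 * x)
        g0 += y - lam
        g1 += (y - lam) * x
        h00 -= lam
        h01 -= lam * x
        h11 -= lam * x * x
    return (g0, g1), (h00, h01, h11)

def poisson_newton(data, iterations=25):
    """Newton-Raphson iterations; requires at least one positive count."""
    mean_y = sum(y for _, y in data) / len(data)
    b0, b1 = math.log(mean_y), 0.0          # intercept-only starting point
    for _ in range(iterations):
        (g0, g1), (h00, h01, h11) = poisson_score_hessian(b0, b1, data)
        det = h00 * h11 - h01 * h01
        # beta_new = beta - H^{-1} g, with the inverse of the 2x2 Hessian written out
        b0 -= (h11 * g0 - h01 * g1) / det
        b1 -= (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: (covariate, count of restricted-activity days).
data = [(0.0, 0), (0.5, 1), (1.0, 0), (1.5, 2), (2.0, 3), (2.5, 2), (3.0, 5), (3.5, 4)]

b0_hat, b1_hat = poisson_newton(data)
score, hessian = poisson_score_hessian(b0_hat, b1_hat, data)
print(b0_hat, b1_hat, score)
```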
The maximum likelihood estimates of $\beta$ obtained by maximizing (34) are
consistent, but the estimate of the covariance matrix of $\hat\beta_{ML}$, using
$[-\partial^2\ell/\partial\beta\,\partial\beta']^{-1}$ evaluated at $\hat\beta_{ML}$, will be inconsistent if the data are not in
fact generated by the specified Poisson distribution.
This is most easily seen as follows. Note that the model can be
equivalently cast as a nonlinear least squares regression, the i-th
observation being
$$
y_i = \exp(X_i\beta) + \varepsilon_i \tag{37}
$$
with $E(\varepsilon_i) = 0$. Clearly, $\mathrm{var}(\varepsilon_i) = \mathrm{var}(Y_i) = \exp(X_i\beta)$, so that the $\varepsilon_i$ are
heteroscedastic. If nonlinear weighted least squares is used with the
weights $\exp(-X_i\beta)$ formed using consistent estimates of $\beta$, and if the data
are in fact Poisson as specified, the maximum likelihood consistent
estimates of $\beta$ and $\mathrm{cov}(\hat\beta)$ will obtain. (The consistency of $\hat\beta_{ML}$ for $\beta$ does
not depend on the weighting scheme.) However, if the data are not
Poisson-distributed, the estimate of $\mathrm{cov}(\hat\beta)$ obtained in this manner will be
inconsistent and asymptotic t-tests based thereon will be misleading. The
case is fully analogous to the estimation of the heteroscedastic linear
model which yields inconsistent covariance estimates (and, therefore,
t-statistics) if the heteroscedastic nature of the error structure is
either ignored or incorrectly specified.
Royall (1984) has demonstrated a method whereby estimates of $\mathrm{cov}(\hat\beta)$
robust against misspecification of the underlying distribution of the data
can be obtained for various distributions, including the Poisson, when
$[-\partial^2\ell/\partial\beta\,\partial\beta']^{-1}$ evaluated at $\hat\beta_{ML}$ fails to yield a consistent estimate of
$\mathrm{cov}(\hat\beta)$. Denoting $I(\beta)$ as $[-\partial^2\ell/\partial\beta\,\partial\beta']$, Royall's suggestion is to estimate
$\mathrm{cov}(\hat\beta)$ as
$$
I(\beta)^{-1} \left[ \sum_i (\partial\ell_i/\partial\beta)(\partial\ell_i/\partial\beta)' \right] I(\beta)^{-1} \tag{38}
$$
where $\ell_i$ is the i-th observation's contribution to the log-likelihood
function and where all relevant evaluations in (38) are at $\hat\beta_{ML}$. This will
be the approach adopted in empirical implementation of the Poisson model in
the present study.
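The difference between the two covariance estimates is easiest to see in the intercept-only case, where everything has a closed form. In the sketch below, which is illustrative only and uses hypothetical overdispersed counts and our own function name, the Poisson estimate is the log of the sample mean, the information is the sum of the fitted means, and the sum of squared scores is the sum of squared deviations from the sample mean.

```python
import math

def poisson_intercept_variances(counts):
    """Intercept-only Poisson model: estimate, Hessian-based and robust variances."""
    n = len(counts)
    ybar = sum(counts) / n
    beta_hat = math.log(ybar)                        # solves sum(y_i - exp(beta)) = 0
    info = n * ybar                                  # minus the Hessian at beta_hat
    outer = sum((y - ybar) ** 2 for y in counts)     # sum of squared scores at beta_hat
    naive = 1.0 / info                               # inverse information
    robust = outer / info ** 2                       # inverse information on both sides
    return beta_hat, naive, robust

# Hypothetical two-week counts with many zeros and a few long spells.
counts = [0, 0, 0, 0, 0, 0, 1, 0, 14, 0, 0, 3, 0, 0, 7, 0, 0, 0, 2, 0]

beta_hat, naive, robust = poisson_intercept_variances(counts)
print(beta_hat, naive, robust)
```

When the sample variance of the counts exceeds the sample mean, as it does here, the robust variance exceeds the Hessian-based one, so t-statistics built on the latter would be overstated.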
2.9 GEOMETRIC-DISTRIBUTED HEALTH OUTCOME MEASURES
One alternative to the Poisson model for the modeling of count data is
the geometric distribution. Though seemingly not as often used by
econometricians as the Poisson, the geometric is a logical choice should an
alternative to the Poisson be desired. Furthermore, the basic geometric
specification does not suffer from the mean=variance restriction that is
implied in the basic Poisson model. As will be seen below, the variance of
a geometric-distributed discrete random variable is greater than its mean,
although the fact that the variance depends on the mean limits somewhat the
flexibility of the distribution.
Our description of the properties of the geometric distribution
follows that of Johnson and Kotz (1969). First, it should be noted that
the geometric is a special case of the negative binomial. Discussion is
confined here to the geometric because it is computationally far more
straightforward than is the general negative binomial. The geometric
distribution is defined as follows:
$$\Pr(X=k) = \begin{cases} P^k (1+P)^{-(k+1)}, & k = 0,1,2,\ldots \\ 0, & \text{else} \end{cases} \qquad (39)$$
with P>0. It holds that E(X) = P and Var(X) = P(1+P). As in the
econometric specification of the Poisson model considered earlier, one
allows the P to vary across observations as $P_i$, and again $P_i = \exp(X_i\beta)$ is
a sensible parameterization due to the required positivity of the $P_i$.
Given this, the likelihood function for T independent observations can
be written as
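$$L = \prod_{i=1}^{T} \exp(k_i X_i\beta)\,(1 + \exp(X_i\beta))^{-(k_i+1)} \qquad (40)$$
with log-likelihood
$$l = \sum_{i=1}^{T} \big[ k_i X_i\beta - (k_i+1) \log(1 + \exp(X_i\beta)) \big] \qquad (41)$$
where $k_i$ is the observed count for the i-th observation. The ML estimate $\hat\beta$
satisfies
$$\partial l/\partial\beta = \sum_{i=1}^{T} \big[ k_i - (k_i+1) \exp(X_i\hat\beta)/(1 + \exp(X_i\hat\beta)) \big] X_i' = 0 \qquad (42)$$
The Hessian is
$$H = \partial^2 l/\partial\beta\,\partial\beta' = \sum_{i=1}^{T} -(k_i+1) \big[ \exp(X_i\beta)/(1 + \exp(X_i\beta))^2 \big] X_i' X_i, \qquad (43)$$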
which is seen by inspection to be negative definite. Because it is a
fairly uncluttered expression, estimation and inference can proceed using
-H as an estimate of the information matrix and $(-H)^{-1}$ as an estimate of
the covariance matrix. Unfortunately, much like the Poisson specification,
the covariance estimate thus obtained is not robust to departures from the
data being in fact geometric. However, the methods proposed by Royall
(1984) and described for the Poisson model can be used for the geometric
distribution also. As the development is identical, the details are
omitted for economy of space.
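A minimal Python sketch of the geometric model may help fix ideas. It codes the log-likelihood (41), the score in (42), and the Hessian (43), and solves (42) by Newton-Raphson. The function names and the simulated data are assumptions introduced here for illustration, not the routines used in the study.

```python
import numpy as np

def geometric_loglik(beta, X, k):
    xb = X @ beta
    return np.sum(k * xb - (k + 1) * np.logaddexp(0.0, xb))   # equation (41)

def geometric_score(beta, X, k):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # exp(xb) / (1 + exp(xb))
    return X.T @ (k - (k + 1) * p)          # left-hand side of (42)

def geometric_hessian(beta, X, k):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = (k + 1) * p * (1.0 - p)             # (k+1) exp(xb) / (1 + exp(xb))^2
    return -(X.T @ (X * w[:, None]))        # equation (43)

def geometric_ml(X, k, max_iter=100, tol=1e-10):
    """Newton-Raphson solution of the first-order conditions (42)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        step = np.linalg.solve(-geometric_hessian(beta, X, k),
                               geometric_score(beta, X, k))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated, purely illustrative data (hypothetical, not from the study).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
P = np.exp(X @ np.array([-0.5, 0.3]))
# numpy counts trials up to and including the first success, so subtract 1
# to obtain a count on {0, 1, 2, ...} with mean P.
k = rng.geometric(1.0 / (1.0 + P)) - 1
beta_hat = geometric_ml(X, k)
cov_hat = np.linalg.inv(-geometric_hessian(beta_hat, X, k))   # (-H)^{-1}
```

Because every weight (k_i + 1) exp(X_i β)/(1 + exp(X_i β))^2 is strictly positive, the Hessian is negative definite whenever X has full column rank, which is why plain Newton steps suffice in this sketch.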
2.10 MULTINOMIAL-DISTRIBUTED HEALTH OUTCOME MEASURES
One type of micro data of particular interest in health econometrics
is of the following nature. We observe over the course of some fixed time
period (say one two-week period) the number of times (say days) that an
individual's health status is characterized by (k-1) mutually exclusive
illness outcome measures and, therefore, the number of days on which no
illness resulted, which can be viewed as the k-th activity. To be
concrete, the two-week illness profile for some individual who has in
his/her illness "possibility set" two illnesses (minor restricted activity
day (=M), and severe restricted activity day (=S)), and healthy days (=H)
(=14-M-S) might look like
H = 11
M = 2
S = 1
Given observations on such health outcome profiles, it is appropriate
to view the data characterizing individuals' health status as realizations
of multinomial random variables (see Morey (1981) for a related
discussion). Recalling from discrete statistical theory, the multinomial
distribution of a random variable Y with parameters $(T; P_1, \ldots, P_k)$ can be
written
$$\Pr(Y = y) = T! \prod_{j=1}^{k} (P_j^{t_j}/t_j!), \qquad (44)$$
where T is the number of trials (here days), the $t_j$ are the number of
occurrences of the j-th outcome, and $P_j$ are the probabilities that the j-th
outcome will occur on a single trial. To extend the statistical model to
the health status measures, we consider each daily outcome as one trial
from a multinomial distribution with individual-specific parameter vector
for the m-th individual $(T_m; P_{1m}, \ldots, P_{km})$. Assuming $T_m = T_{m'} = T$ for all
m, m', we henceforth drop the subscripts on the T parameters. The profile
for two weeks, then, is the 14 (by assumption independent) daily trials for
each individual. The econometric objective is the estimation of the $P_{jm}$,
i.e. estimation of the probabilities of realizing one of the k possible
outcomes on a given day.
For computational simplicity, we proceed as follows. A logistic
distribution for the daily outcome probabilities is assumed. Thus, the
probability that the outcome is Z on any trial is
$$P_{Zm} = \exp(X_m\beta_Z) \Big/ \sum_{j\in\Omega} \exp(X_m\beta_j) \qquad (45)$$
for $Z \in \Omega = \{M,S,H\}$. The logistic distribution assures that for all m
the multinomial requirement ($\sum_{j\in\Omega} P_{jm} = 1$) is met.
Since the probabilities (45) are unique only up to a difference in
parameter vectors ($\beta_Z - \beta_{Z'}$), some normalization is required. The
normalization most convenient and easily interpreted is $\beta_H = 0$, so that $\beta_M$
and $\beta_S$ are interpreted as differences between the respective illness
parameter vectors and the no-illness parameter vector.
The objective, then, is estimation of the parameter vectors $\beta_M$ and $\beta_S$.
This is, of course, fully analogous to the widely-used multinomial logit
model where a single outcome from a set of mutually exclusive outcomes is
considered. In fact, that case is merely a special case of the present
exposition for which $T_m = 1$ for all m.
Estimation is by means of maximum likelihood. Assuming the existence
of N independent profile draws from the population, the likelihood of the
data as a function of the parameters is
$$L(\beta) = \prod_{m=1}^{N} \Pr(Y_m = y_m) = \prod_{m=1}^{N} T! \prod_{j\in\Omega} (P_{jm}^{t_{jm}}/t_{jm}!) \qquad (46)$$
where the $P_{jm}$ are as defined in (45) and where $\Omega$ is the illness-type index
set. In log form,
$$l(\beta) = \sum_{m=1}^{N} \sum_{j\in\Omega} t_{jm} \log P_{jm} + C, \qquad (47)$$
where C is a constant not depending on $\beta$. Given the assumed logistic
probabilities, we have
$$l(\beta) = \sum_{m=1}^{N} \sum_{j\in\Omega} t_{jm} \Big[ X_m\beta_j - \log\Big( \sum_{k\in\Omega} \exp(X_m\beta_k) \Big) \Big] + C. \qquad (48)$$
Maximizing (48) can be accomplished with only a slight modification of most
existing (single-trial) multinomial logit programs.
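The modification is small, as the following Python sketch of (48) suggests: the only change from the single-trial case is that each log-probability is weighted by the day count t_jm. The function name, the column ordering (M, S, H), and the one-person example are assumptions adopted here for illustration.

```python
import numpy as np

def multinomial_logit_loglik(beta, X, t):
    """Log-likelihood (48), omitting the constant C.

    beta : (p, 2) array holding beta_M and beta_S; beta_H is normalized to 0.
    X    : (N, p) array of covariates.
    t    : (N, 3) array of day counts ordered (M, S, H); each row sums to T.
    """
    index = np.column_stack([X @ beta, np.zeros(X.shape[0])])  # X_m beta_j
    log_denom = np.logaddexp.reduce(index, axis=1)   # log sum_k exp(X_m beta_k)
    log_p = index - log_denom[:, None]               # log P_jm from (45)
    return np.sum(t * log_p)

# One individual with the two-week profile used in the text: M=2, S=1, H=11.
X = np.array([[1.0]])
t = np.array([[2, 1, 11]])

# With all parameters zero, each daily outcome has probability 1/3.
ll_equal = multinomial_logit_loglik(np.zeros((1, 2)), X, t)

# Parameters chosen so the fitted probabilities equal the observed shares.
beta_shares = np.array([[np.log(2 / 11), np.log(1 / 11)]])
ll_shares = multinomial_logit_loglik(beta_shares, X, t)
```

Setting every row of t to a single 1 and two 0s recovers the ordinary single-trial multinomial logit log-likelihood.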
2.11 ESTIMATION OF GROUPED DATA MODELS UNDER THE NORMALITY ASSUMPTION
There are often institutional or other constraints in the sampling or
data-recording processes that have the effect of generating inexact data
for research purposes. A common case is the situation where continuous
measures of interest, such as the amount of time spent in ill health, are
cast in the recorded micro data as grouped or interval data. We discussed
above strategies that might be considered when the outcomes are recorded as
"number of days" or "number of times," i.e. where the data can be viewed as
realizations of discrete statistical processes rather than as
discrete/integer codings of fundamentally continuous processes. In this
section we concern ourselves with the situations where the underlying
processes are best viewed as continuous phenomena but where the vagaries of
either the sampling or data-coding procedures are such that only a finite
number of intervals in which the continuous measure is defined are
determined and the only data available to the analyst are indicators of the
interval bounds in which the (unknown) continuous measure is realized. For
example, the latent continuous measure might be "time spent ill over some
time interval y (say t)," but owing to whatever reasons, all one knows is
whether t=0, t∈(0, 4 days], t∈(4 days, 8 days], or t∈(8 days, 365 days) (for
y=one year). The purpose of this section is to present an estimating
technique designed to handle such situations.
The method is based on the work of Rosett and Nelson (RN) (1975), who
developed what is known as the two-limit probit estimation technique, and
of Stewart (1983), who generalized the RN method to account for
multi-interval data. We will, therefore, refer to the model expounded here
as the RNS method. Here is posited the existence of normally-distributed
random variables $Y_i^* \sim NID(X_i\beta, \sigma^2)$. The realizations $y_i^*$ are unobserved,
however. Available is the knowledge that the realization $y_i^*$ is an element
of some proper subset of R. More formally, partition R into P ($\geq 2$) subsets
$J_p$, such that $\bigcup_{p=1}^{P} J_p = R$, $J_p \cap J_q = \emptyset$ for $p \neq q$.
$$\partial l/\partial\sigma = \sum_{i=1}^{T} (\theta_{p-1,i} - \theta_{p,i}) \big/ \sigma(\Phi_{p,i} - \Phi_{p-1,i}) = 0,$$
where $\theta_{p,i} = ((A_p - X_i\beta)/\sigma)\,\phi((A_p - X_i\beta)/\sigma)$, and $\phi(c)$ is the standard normal
density $(2\pi)^{-.5} \exp(-.5c^2)$. (Note that when P = 2, i.e. when the model is
binary probit, a parameter normalization is required. Typically a=1 is
used. This, of course, reduces the number of first order conditions from
(m+1) to m, where m is the dimensionality of $\beta$.) Stewart has shown how
iterative least squares can be used to obtain the ML estimate. The reader
is referred to his work for the details.
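For readers who prefer to see the objective function directly, the grouped-data log-likelihood under normality can be sketched in Python as below. This is a generic interval-probability formulation written for illustration; the function names, the use of infinite bounds for open-ended intervals, and the example cut points are assumptions, and the sketch does not reproduce Stewart's iterative least squares algorithm.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF; accepts +/- infinity for open-ended intervals."""
    z = np.asarray(z, dtype=float)
    return 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))

def interval_probs(beta, sigma, X, lower, upper):
    """Probability that each latent y* ~ N(X beta, sigma^2) lies in (lower, upper]."""
    xb = X @ beta
    return norm_cdf((upper - xb) / sigma) - norm_cdf((lower - xb) / sigma)

def grouped_loglik(beta, sigma, X, lower, upper):
    """Sum over observations of the log interval probabilities."""
    return np.sum(np.log(interval_probs(beta, sigma, X, lower, upper)))

# Hypothetical example: three observations with the same covariates, each
# recorded in a different interval of a partition of the real line.
X = np.ones((3, 1))
beta = np.array([5.0])
lower = np.array([-np.inf, 4.0, 8.0])
upper = np.array([4.0, 8.0, np.inf])
probs = interval_probs(beta, 2.0, X, lower, upper)
ll = grouped_loglik(beta, 2.0, X, lower, upper)
```

A general-purpose numerical optimizer applied to this function over (β, σ) is one alternative route to the ML estimate.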
2.12 SUMMARY AND CONCLUSIONS
This brief survey has attempted to present an overview of several
approaches to econometric estimation of air pollution - health outcomes
models in situations where the distributions of health outcome data are such
that methods other than linear ordinary least squares are likely to be
required in order to obtain consistent parameter estimates. The data used
in this study are in all instances of this "nonstandard" nature. In
particular, the analysis to follow concentrates on three of the types of
data described in the preceding discussion: count data,
multinomial-distributed data, and discrete indicator, or (0,1), data of the
probit sort. The following chapters discuss in some detail estimation of
such models, and implement some of the estimation techniques presented in
the analysis of this chapter.
The scope of the present analysis precludes consideration of several
interesting research issues that must be placed on the menu of future
research. First, the matter of severity of chronic illnesses is left
untreated. It clearly is plausible that not only the presence or absence
of, for example, chronic respiratory illness is related to air pollution
exposures, but also that the severity of the illness - defined by some
metric of severity - is responsive to pollution exposures as well as other
covariates. A second interesting issue that merits analysis in the future
is the possibility that some subset of the explanatory variables used to
explain health outcomes is correlated with heterogeneous
individual-specific components of the unobserved equation error terms. In
the present context, it might be argued that covariates such as cigarette
consumption, income, labor market status, and even air pollution exposures
(on this last point, see Rosenzweig and Wolpin (1984)) are possibly
correlated with unobservable errors. When such heterogeneous unobservables
are present and (the crux of the problem) are correlated with observed
explanatory variables, parameter estimates obtained without explicit
recognition of and control for this nonzero correlation will in general be
inconsistent. Some instrumental variable technique will likely be required
in order to obtain consistent parameter estimates under such circumstances.
REFERENCES
Amemiya, T. 1981. "Qualitative Response Models: A Survey," Journal of
Economic Literature, vol. 19, pp. 1483-1536.
_____. 1983. "Nonlinear Regression Models," in Z. Griliches and M. D.
Intriligator, eds., Handbook of Econometrics, vol. 1, (Amsterdam:
North-Holland).
_____. 1984. "Tobit Models: A Survey," Journal of Econometrics, vol. 24,
pp. 3-61.
Berndt, E. R., B. H. Hall, R. E. Hall, and J. A. Hausman. 1974.
"Estimation and Inference in Nonlinear Structural Models," Annals of
Economic and Social Measurement, vol. 3, pp. 653-665.
Breusch, T. and A. R. Pagan. 1980. "The Lagrange Multiplier Test and Its
Application to Model Specification in Econometrics," Review of Economic
Studies, vol. 47, pp. 239-253.
Cox, D. R. and D. V. Hinkley. 1974. Theoretical Statistics (London:
Chapman and Hall).
Cragg, J. G. 1971. "Some Statistical Models for Limited Dependent
Variables with Application to the Demand for Durable Goods,"
Econometrica, vol. 39, pp. 829-844.
Duan, N., W. G. Manning, C. M. Morris and J. P. Newhouse. 1983. "A
Comparison of Alternative Models for the Demand for Medical Care,"
Journal of Business and Economic Statistics, vol. 1, pp. 115-126.
_____, _____, _____, and _____. 1984. "Choosing Between the
Sample-Selection Model and the Multipart Model," Journal of Business
and Economic Statistics, vol. 2, pp. 283-289.
Dudley, L. and C. Montmarquette. 1976. "A Model of the Supply of
Bilateral Foreign Aid," American Economic Review, vol. 66, pp. 132-142.
Hausman, J. A. 1978. "Specification Tests in Econometrics," Econometrica,
vol. 46, pp. 1251-1271.
_____, B. Hall and Z. Griliches. 1984. "Econometric Methods for Count
Data with an Application to the Patents-R&D Relationship," Econometrica,
vol. 52, pp. 909-938.
_____, B. Ostro and D. Wise. 1984. "Air Pollution and Lost Work," NBER
working paper 1263, January.
Hay, J. W. and R. J. Olsen. 1984. "Let Them Eat Cake: A Note on
Comparing Alternative Models of the Demand for Medical Care," Journal of
Business and Economic Statistics, vol. 2, pp. 279-282.
Heckman, J. 1976. "The Common Structure of Statistical Models of
Truncation, Sample Selection and Limited Dependent Variables and a
Simple Estimator for Such Models," Annals of Economic and Social
Measurement, vol. 5, pp. 475-492.
_____. 1979. "Sample Selection Bias as a Specification Error,"
Econometrica, vol. 47, pp. 153-161.
Hurd, M. 1979. "Estimation in Truncated Samples When There is
Heteroscedasticity," Journal of Econometrics, vol. 11, pp. 247-258.
Johnson, N. L. and S. Kotz. 1969. Distributions in Statistics: Discrete
Distributions (New York: Wiley).
_____ and _____. 1970. Distributions in Statistics: Continuous
Univariate Distributions - I (New York: Wiley).
Kendall, M. G. and A. Stuart. 1973. Advanced Theory of Statistics, vol. 3
(London: Griffin).
Killingsworth, M. R. 1983. Labor Supply (Cambridge: Cambridge University
Press).
Lin, T.-F. and P. Schmidt. 1984. "A Test of the Tobit Specification
Against an Alternative Suggested by Cragg," Review of Economics and
Statistics, vol. 66, pp. 174-177.
Maddala, G. S. 1977. Econometrics (New York: McGraw-Hill).
_____. 1983. Limited-Dependent and Qualitative Variables in Econometrics
(Cambridge: Cambridge University Press).
Manski, C. F. and D. McFadden. 1981. Structural Analysis of Discrete Data
with Econometric Applications (Cambridge, Mass: MIT Press).
Morey, E. R. 1981. "The Demand for Site-Specific Recreational Activities:
A Characteristics Approach," Journal of Environmental Economics and
Management, vol. 8, pp. 345-371.
Nelson, F. D. 1981. "A Test for Misspecification in the Censored-Normal
Model," Econometrica, vol. 49, pp. 1317-1329.
Olsen, R. 1980. "Approximating a Truncated Normal Regression with the
Method of Moments," Econometrica, vol. 48, pp. 1099-1106.
Ostro, B. 1983. "The Effects of Air Pollution on Work Loss and
Morbidity," Journal of Environmental Economics and Management, vol. 10,
pp. 371-382.
Pearson, K. and A. Lee. 1908. "Generalized Probable Error in Multiple
Normal Correlations," Biometrika, vol. 6, pp. 59-68.
Pitt, M. 1983. "Food Preferences and Nutrition in Rural Bangladesh,"
Review of Economics and Statistics, vol. 65, pp. 105-114.
Portney, P. R. and J. Mullahy. 1985. "Urban Air Quality and Acute
Respiratory Illness," Journal of Urban Economics, forthcoming.
Rao, C. R. 1965. Linear Statistical Inference and Its Applications, (New
York: Wiley).
Rosenzweig, M. R. and K. I. Wolpin. 1984. "Migration Selectivity and the
Effects of Public Programs," University of Minnesota, Economic
Development Center, Bulletin
Rosett, R. N. and F. D. Nelson. 1975. "Estimation of a Two-Limit Probit
Regression Model," Econometrica, vol. 43, pp. 141-146.
Royall, R. 1984. "Robust Inference Using Maximum Likelihood Estimators,"
Johns Hopkins University, Department of Biostatistics Working Paper.
Schmidt, P. 1976. Econometrics (New York: Marcel Dekker).
Smith, M. and G. Maddala. 1983. "Multiple Model Testing for Non-Nested
Heteroscedastic Censored Regression Models," Journal of Econometrics,
vol. 21, pp. 71-81.
Stapleton, D. and D. Young. 1984. "Censored Normal Regression with
Measurement Error on the Dependent Variable," Econometrica, vol. 52,
pp. 737-760.
Stewart, M. B. 1983. "On Least Squares Estimation when the Dependent
Variable is Grouped," Review of Economic Studies, vol. 50, pp.
737-753.
Tobin, J. 1957. "Estimation of Relationships for Limited Dependent
Variables," Econometrica, vol. 26, pp. 24-36.
Wales, T. and A. Woodland. 1983. "Estimation of Consumer Demand Systems
with Binding Non-Negativity Constraints," Journal of Econometrics,
vol. 21, pp. 263-285.
White, H. 1982. "Maximum Likelihood Estimation of Misspecified Models,"
Econometrica, vol. 50, pp. 1-25.
White, H. 1983. "Corrigendum," Econometrica, vol. 51, p. 513.
-------
Chapter 3
AIR POLLUTION MONITORS AND INDIVIDUAL EXPOSURES
The models estimated in Volume I typically utilized as measures of
an individual's exposure the pollutant-specific readings from the
monitor nearest the centroid of the respondent's census tract for which
the data were available. In most cases, screening criteria were
established so that it was necessary both for a monitor to have recorded
at least some minimal number of hourly readings during the two-week
period and for the monitor to be located not further than some
prescribed distance (20 miles; 10 miles) from the residents' census
tract centroids.
It is possible that the nearest-monitor readings we utilized are
not representative of the pollution "profile" of the metropolitan area
in which each respondent lives. If some average of the readings from a
number of nearby monitors better characterizes the ambient
concentrations facing the individuals in question, then the consistency
of results obtained using nearest-monitor readings must be called into
question. (This abstracts, of course, from the larger question of the
ability of ambient monitors at all to measure the exposure of
individuals.)
The purpose of this very brief chapter is to compare and assess the pollution
profiles constructed using the nearest-monitor readings versus those
that result when the average readings from a number of monitors are
used. The extent to which the two constructs are correlated indicates
the sensitivity of our results to the use of nearest-monitor readings to
characterize exposure.
The procedure is as follows. For each of the six pollutants used
in our study - ozone, sulfates, TSP, NO2, CO, and SO2 - we utilized the
data for the 14,441 adults in the main sample and constructed the
nearest-monitor measures used in the main study. These were designated
as XXNR01, where XX is the specific pollutant (O3, S4, SP, N2, CO, S2). In
our study, recall, these measures were subjected to a
miles-from-census-tract-centroid cutoff of 5, 10, or, most often, 20
miles; the specific distance will be obvious from the context. (No
minimal hours standard is used here.)
For these same individuals, we then constructed two averaged
measures for each pollutant. The two measures constructed were the
simple average over all the available readings from monitors within 10
and then within 20 miles of the census tract centroids. These measures
were designated XXAVGYY, where XX was as defined above and YY was either
10 or 20. Thus, N2AVG20 is the average of all nitrogen dioxide monitors
within 20 miles of the census tract centroid.
Then, given these measures, we calculated for each pollutant the
correlation between the nearest-monitor reading and the area-average
reading at both the 10 and 20 mile cutoff values. We also calculated the
number of monitors used to construct the two area-averages.
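The construction just described can be sketched in a few lines of Python. The function names and the monitor distances and readings in the example are hypothetical, chosen only to show how a nearest-monitor reading, an area-average reading, and the count of contributing monitors might be computed for one individual and then correlated across individuals.

```python
import numpy as np

def exposure_measures(distances, readings, cutoff):
    """Nearest-monitor reading, area-average reading, and number of
    monitors within `cutoff` miles of a census tract centroid."""
    distances = np.asarray(distances, dtype=float)
    readings = np.asarray(readings, dtype=float)
    inside = distances <= cutoff
    if not inside.any():
        return np.nan, np.nan, 0          # no monitor satisfies the cutoff
    nearest = readings[inside][np.argmin(distances[inside])]
    return nearest, readings[inside].mean(), int(inside.sum())

def correlation(a, b):
    """Simple correlation coefficient over pairs with both values present."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ok = ~(np.isnan(a) | np.isnan(b))
    return np.corrcoef(a[ok], b[ok])[0, 1]

# Hypothetical individual with four ozone monitors (miles, ppm).
miles = [3, 8, 15, 25]
ppm = [0.04, 0.06, 0.05, 0.10]
nr10, avg10, n10 = exposure_measures(miles, ppm, cutoff=10)
nr20, avg20, n20 = exposure_measures(miles, ppm, cutoff=20)
```

Applying `exposure_measures` to every individual and passing the resulting nearest-monitor and area-average columns to `correlation` gives a statistic of the kind reported as "r" in the tables.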
The results are presented in the tables that follow. In each case,
"r" is the simple correlation coefficient between the nearest-monitor
reading and the 10 or 20 mile averaged readings.
OZONE

10 mile: r = .965, N = 8,823

              Mean       Max       Min
O3NR01       .0454      .251      0
O3AVG10      .0460      .236      0

Number of Monitor Readings in Area-Average:

    n    f(n)        n    f(n)
    1    3832        6     244
    2    2303        7     262
    3     985        8     179
    4     553        9     112
    5     302       10      51

20 mile: r = .931, N = 11,241

              Mean       Max       Min
O3NR01       .0450      .251      0
O3AVG20      .0461      .225      .003

Number of Monitor Readings in Area-Average:

    n    f(n)        n    f(n)
    1    2665        6     679
    2    2463        7     646
    3    1447        8     550
    4     954        9     631
    5     847       10     359

SULFATES

10 mile: r = .952, N = 5,249

              Mean       Max       Min
S4NR01      10.528    52.136      0
S4AVG10     10.544    52.136      0

Number of Monitor Readings in Area-Average:

    n    f(n)        n    f(n)
    1    2693        6     134
    2    1134        7     109
    3     401        8     101
    4     308        9     116
    5     247       10       6

20 mile: r = .912, N = 7,512

              Mean       Max       Min
S4NR01      10.590    52.136      0
S4AVG20     10.523    52.136      1.586

Number of Monitor Readings in Area-Average:

    n    f(n)        n    f(n)
    1    2595        6     526
    2    1230        7     559
    3     823        8     329
    4     614        9     250
    5     559       10      27

TSP

10 mile: r = .859, N = 12,598

              Mean       Max        Min
SPNR01      70.478   284.004      9.996
SPAVG10     72.021   253.^28     15.092

Number of Monitor Readings in Area-Average:

    n    f(n)        n    f(n)
    1    1851        6    1039
    2    1283        7     975
    3    1275        8    1207
    4     979        9    1455
    5    1012       10    1522

20 mile: r = .818, N = 13,772

              Mean       Max        Min
SPNR01      70.128   284.004      9.996
SPAVG20     71.948   272.244     15.092

Number of Monitor Readings in Area-Average:

    n    f(n)        n    f(n)
    1     822        6    1272
    2     895        7    1373
    3     702        8    1429
    4     797        9    2330
    5    1084       10    3068

NO2

10 mile: r = .951, N = 6,393

              Mean       Max       Min
N2NR01     117.857   435.316      0
N2AVG10    117.323   435.316      0

Number of Monitor Readings in Area-Average:

    n    f(n)        n    f(n)
    1    3195        6      96
    2    1593        7      36
    3     775        8       1
    4     485        9       0
    5     212       10       0

20 mile: r = .923, N = 8,452

              Mean       Max       Min
N2NR01     112.913   435.316      0
N2AVG20    111.646   375.928      0

Number of Monitor Readings in Area-Average:

    n    f(n)        n    f(n)
    1    3004        6     668
    2    1890        7     692
    3     747        8     398
    4     442        9     168
    5     417       10      26

CO

10 mile: r = .887, N = 8,921

              Mean       Max       Min
CONR01       3.306    26.583      0
COAVG10      3.937    26.583      0

Number of Monitor Readings in Area-Average:

    n    f(n)        n    f(n)
    1    3638        6     288
    2    2087        7     280
    3     946        8     235
    4     676        9     146
    5     510       10     115

20 mile: r = .838, N = 10,939

              Mean       Max       Min
CONR01       3.717    26.583      0
COAVG20      3.808    25.111      0

Number of Monitor Readings in Area-Average:

    n    f(n)        n    f(n)
    1    2344        6     480
    2    2536        7     298
    3    1130        8     553
    4    1215        9     838
    5     984       10     561

SO2

10 mile: r = .857, N = 8,842

              Mean       Max       Min
S2NR01      68.591   760.088      0
S2AVG10     69.375   568.988      0

Number of Monitor Readings in Area-Average:

    n    f(n)        n    f(n)
    1    2976        6     356
    2    1855        7     182
    3    1280        8     206
    4     966        9     192
    5     770       10      59

20 mile: r = .819, N = 10,784

              Mean       Max       Min
S2NR01      66.222   760.088      0
S2AVG20     67.050   568.988      0

Number of Monitor Readings in Area-Average:

    n    f(n)        n    f(n)
    1    2414        6     919
    2    1733        7     726
    3    1204        8     841
    4    1129        9     541
    5    1069       10     208

The results are quite reassuring about the use of nearest-monitor
data to proxy individual exposure. The correlation coefficients between
the nearest-monitor reading and the average of all monitors within 10
miles range from .965 for ozone (a highly dispersed pollutant) to .86
for TSP and SO2 (more localized pollutants). The 20-mile correlations
follow similar relationships, but are of course somewhat lower than the
10-mile correlations due to the decreased weight of the nearest-monitor
reading in calculating the 20-mile averages. What is particularly
encouraging is that no correlation coefficient is below 0.8, leading us
to suspect that the use of the nearest-monitor reading would be unlikely
to impart any systematic biases vis-a-vis use of area-averaged readings.
In the following chapter, we make use of the area-averaged readings to
test this suspicion.
-------
Chapter 4
URBAN AIR QUALITY AND ACUTE RESPIRATORY ILLNESS
4.1 Introduction
Over the past fifteen years, economists interested in the benefits of
air pollution control have concerned themselves with more than just the
appropriate valuation of health gains and losses. In addition, some have
explored in epidemiological analyses the actual physical relationships
between air pollution and health itself using statistical techniques common
in the social and natural sciences. Most of these studies have used
aggregate data at the city or SMSA level to test for the effects of
prolonged exposures to air pollution on the mortality rates across the
units of observation. The studies of Lave and Seskin [8], Crocker, et al.
[2], Mendelsohn and Orcutt [12], Chappie and Lave [1], and Lipfert [10] are
among the best examples.
Relatively less attention has been given in this literature to the
relationship between air pollution and sickness (or morbidity). This is
unfortunate because morbidity is observed much more frequently than
mortality and may be of greater economic significance than premature death.
When researchers have examined possible links between air pollution and
morbidity, they have generally been forced through lack of data to do so in
the absence of information about individuals' socioeconomic and other
characteristics - even though these characteristics may have an important
effect on health status.
Volume I presents our recently completed comprehensive investigation
of the effects of ozone (ground-level rather than stratospheric) and other
air pollutants on individuals' acute and chronic health status. Unlike
many previous studies, this work is based on a large and relatively
detailed individual data base, allowing controls for certain important
socioeconomic and demographic characteristics in addition to the
meteorological measures sometimes included in earlier studies using either
aggregate or less detailed individual data. This chapter presents some of
the major findings concerning the effects of urban air quality on acute
respiratory disease using an estimation technique not employed in Volume I.
Chapter 7 reports some new findings on air pollution and chronic illness.
Of particular concern here is the sensitivity of the findings to the
measures of air quality used. As suggested above, most previous analyses
of the health effects associated with air pollution have characterized
individual exposures using some measure of air quality averaged over most
or all of the monitors in the urban areas where the individuals live.
However, many persons may get most or all of their ambient exposure
proximate to the monitors nearest their homes. As part of our larger study
in Volume I, therefore, each individual in the sample was matched to the
nearest ten air pollution monitors for each of eight different air
pollutants so as to use close-to-home pollution readings to characterize
exposure. Because this was very resource-intensive, it is important to
illustrate the difference such an approach may make when estimating
dose-response relationships. Additional sensitivity analyses in this
chapter explore interactive effects as well as possible thresholds and
non-linearities in the relationship between air pollution and acute health
status.
In Section 4.2 we briefly describe the data used in our analysis and
the independent variables we include. In Section 4.3, we discuss the
estimating techniques used to explore possible links between air pollution
and acute respiratory disease. In Section 4.4 we present our empirical
findings and in Section 4.5 we draw some cautious inferences from them for
applied welfare calculations.
4.2 Framework for the Analysis
The individual data underlying both our larger study as well as the
present chapter come from the 1979 Health Interview Survey (HIS) - a
nationwide sample of approximately 110,000 individuals conducted during the
course of each year by the National Center for Health Statistics. All
acute illness experienced during the two-week period prior to the date of
each interview was to be reported by each respondent or the family member
responding for him or her. Manifestations of these illnesses were
classified in three types - bed disability days (the most serious of the
three categories), work or school loss days, and what might best be thought
of as minor restricted activity days. The latter are days on which the
respondent was neither bed-ridden nor forced to miss work or school but did
suffer from an acute impairment sufficient to cause him or her to restrict
activity in some noticeable way. The dependent variable in the subsequent
analysis is total restricted activity days - the total number of days during
the two-week period on which any of these three types of acute illness
occurred. Finally, all acute (and chronic) health information elicited in
the survey was coded by cause, using the International Classification of
Disease. Attention is limited in this chapter to total restricted activity
days due to respiratory disease since this is the type of acute impairment
most likely to result from exposure to air pollution.
The socioeconomic data elicited from each respondent in the Health
Interview Survey includes, among many other individual and
household-specific characteristics, information on age, race, sex, income,
and education. In addition, several supplements to the 1979 survey made it
particularly useful for epidemiological purposes. Specifically, the 1979
HIS contained a supplemental questionnaire asked of one-third of all the
adults interviewed (26,271 of a total of 79,743 adults) which provided
detailed data on lifetime smoking history, including the tar and nicotine
content of the brands most commonly smoked. Smoking data are obviously of
great importance if one is interested in exploring the determinants of
respiratory and other types of disease. The 1979 HIS also included a
supplement (again to one-third of all adults surveyed) designed to provide
detailed information on residential histories. This is not important for
our present purposes but will play a major role in our analysis in Chapter
7 of the determinants of chronic respiratory and other types of disease.
All air pollution data come from the Environmental Protection Agency's
SAROAD system. For our analysis of the relationship between air pollution
and acute morbidity in the larger study, all air quality data were specific
to the two-week recall period for which individual health data were
available. This is also the case here, save for sensitivity analyses
conducted using annual average data as a proxy for air quality during the
two-week period. As indicated above, most of the analysis below
characterizes individuals' exposures to air pollution using data from the
air pollution monitors nearest their residences. No individuals are
included in the final sample if the nearest monitor for any pollutant is
more than ten miles away. The average distance to the nearest monitor is
slightly more than four miles. In addition to the air pollution data,
meteorological data were added from the monitoring network of the National
Oceanic and Atmospheric Administration. Included are observations on
temperature and precipitation during the two-week recall period.
The overall sample from which the subsample used here is drawn
consists of 14,441 individuals aged seventeen and above for whom both
smoking data and at least some air pollution data were available. The
models estimated below are based on a smaller subsample, however, since
complete data are required for each of the air pollutants and other
independent variables.
The analysis of acute respiratory disease includes air pollution data
during the two-week recall period for ozone, a gaseous pollutant that is
the primary constituent of smog, as well as sulfates, perhaps the most
harmful of the airborne particles. It is worth noting that the computer
algorithm used to match individuals to the ten nearest ozone and sulfate
monitors could only be used for monitors within SMSAs. Thus, the
estimation sample consists entirely of city and suburban residents from
around the United States. Table 4-1 lists the independent variables used
in the analysis of acute respiratory disease and their sample means.

Table 4-1. Variable Definitions and Sample Means

Variable Name   Description                                   Sample Mean

OZNEAR          Average daily maximum one-hour ozone               0.042
                reading during two-week recall period
                at monitor nearest the centroid of
                respondent's census tract of residence
                (in parts per million)

S4NEAR          Average 24-hour sulfate concentration             10.876
                during two weeks at nearest monitor
                (see above) (in micrograms per
                cubic meter)

OZAV10          Average daily maximum one-hour                     0.043
                ozone reading during two weeks
                averaged over all monitors within
                a ten mile radius of respondent's
                census tract centroid

S4AV10          Average 24-hour sulfate concentration             10.890
                during two weeks averaged as in OZAV10

OZAV20          Same as OZAV10 but averaged over all               0.044
                monitors within 20 mile radius

S4AV20          Same as S4AV10 but averaged over all              10.700
                monitors within 20 mile radius

OZANNR          Average daily maximum one-hour ozone               0.042
                concentration over entire calendar
                year 1979 as measured at the nearest
                monitor

S4ANNR          Average 24-hour sulfate concentration             10.752
                over calendar year 1979 as measured at
                the nearest monitor

OZAN10          Same as OZANNR but averaged over all               0.043
                monitors within ten mile radius

S4AN10          Same as S4ANNR but averaged over all              10.709
                monitors within 10 miles

OZAN20          Same as OZAN10 but averaged over all               0.044
                monitors within 20 miles

S4AN20          Same as S4AN10 but averaged over                  10.588
                all monitors within 20 miles

WHITE           Dummy variable: 1 if white,                        0.852
                0 otherwise

MALE            Dummy variable: 1 if male,                         0.436
                0 if female

INCOME          Annual household income                           17,152
                in dollars

AGE             Age in years                                       42.30

CIGS            Number of cigarettes smoked per day                 7.58

FORMER          Dummy variable: 1 if respondent                     0.20
                formerly smoked regularly but does
                not presently, 0 if not

SCHLYR          Years of education completed                       11.73

CHRONIC         Dummy variable: 1 if respondent                     0.17
                has any limitation in activity due
                to chronic illness, 0 otherwise

MAXTMP          Average daily maximum temperature                  64.02
                during two-week period

RAIN            Average daily rainfall during                       0.12
                two-week period

RRAD            Number of respiratory-related restricted           0.162
                activity days during two-week recall
                period

4.3 Model Specification
For reasons of economy and computational simplicity, most of the
models in Volume I were estimated using ordinary least squares and logit
techniques (where the dependent variable was, respectively, either the
number of days of a particular kind of impairment during the two-week
recall period or a dichotomous indicator of an individual having at least
one day of that kind of impairment during the period). As Chapter 2 points
out, however, estimation techniques like OLS are not ideally suited to the
nature of our measure of acute health status. Recall that this
measure is the number of respiratory-related restricted activity days
during the two-week recall period (RRADs). Clearly this measure is bounded
by zero and fourteen and, because of survey protocol, can assume only integer
values in {0, 1, 2, ..., 14}. The frequency distribution of RRADs for the
sample of 3,347 adults is presented in Table 4-2. Because of the small
number of observations at the upper (14 day) limit, the implications of
this upper bound for estimation strategy are ignored in the following
analysis; we concentrate instead on the complications arising from the
overwhelmingly large number of individuals reporting zero RRADs.
A standard approach in such circumstances is to use the Tobit or
censored normal estimator, where one observes T independent observations on
$y_t$ which are the realizations of random variables $Y_t^*$ subject to the
censoring rule $y_t = \max(0, y_t^*)$, $Y_t^* \sim N(X_t\beta, \sigma^2)$.
Estimates obtained using the Tobit model are generally inconsistent when the
underlying data are not distributed as censored normal with
independent, identically distributed errors. A Tobit model of RRADs was
estimated using the two-week average pollution data from the nearest monitor
and the other independent variables in Table 4-1 above; tests for its
appropriateness were then conducted and strong evidence of misspecification
was found. While this might be attributable to omitted variables or other
factors unrelated to departures from the usual assumptions about the error
distribution in the Tobit model, a different statistical approach is
utilized here.

Table 4-2. RRAD Frequency Distribution

RRAD    Observations    Percent
  0         3227         96.42
  1           25          0.75
  2           28          0.84
  3           23          0.69
  4            9          0.27
  5            7          0.21
  6            2          0.06
  7            3          0.09
  8            3          0.09
 10            3          0.09
 11            1          0.03
 12            1          0.03
 14           15          0.45

In modeling event counts (non-negative integer data) over a time
interval (t, t+dt), the Poisson distribution is commonly used. Here,
discrete random variates $Y_t$ follow the probability law:

(1)  $\Pr(Y_t = y) = \exp(-\lambda_t)\,\lambda_t^{y} / y!$,   $y = 0, 1, 2, \ldots$
                   $= 0$,   else

with $E(Y_t) = \mathrm{Var}(Y_t) = \lambda_t$. Given the nonnegative integer
nature of the RRAD measure, such a probability law has obvious appeal for
estimation. Analogous to the normal distribution where for econometric work
one typically specifies $E(Y_t) = \mu_t = X_t\beta$, the parameter of the
Poisson distribution can be reparameterized to admit the influence of
covariates. Since $\lambda_t > 0$ for all t, a straightforward approach is to
assume $\lambda_t = \exp(X_t\beta)$ and to estimate $\beta$ by maximum
likelihood (see Hausman, Hall, Griliches [5], Hausman, Ostro, Wise [4]).
This is the approach adopted here for modeling the RRAD outcomes.
A drawback of the Poisson specification is the restriction that
$E(Y_t) = \mathrm{Var}(Y_t)$. Should this restriction not characterize the
data, the maximum likelihood estimates of the covariance matrix of
$\hat\beta$ will be inconsistent and asymptotic t-tests based thereon would
be misleading. Hausman, Ostro, and Wise circumvent this restriction by
allowing for an overdispersion parameter. We take a different approach here,
using an estimator of the covariance matrix that is more robust against
departures from the restriction that the mean be equal to the variance.
Details of this procedure are presented in the appendix.
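As a rough illustration of why the equal mean and variance restriction deserves scrutiny, the sketch below computes the unconditional sample mean and variance of RRADs from the frequencies in Table 4-2. This is only suggestive: the Poisson restriction applies to the conditional distribution given the covariates, which cannot be examined from the frequency table alone. The function name is illustrative and not part of the original analysis.

```python
# RRAD frequency distribution transcribed from Table 4-2: {days: respondents}.
RRAD_COUNTS = {0: 3227, 1: 25, 2: 28, 3: 23, 4: 9, 5: 7, 6: 2,
               7: 3, 8: 3, 10: 3, 11: 1, 12: 1, 14: 15}


def mean_and_variance(freq):
    """Return (n, mean, variance) of a count variable from its frequency table."""
    n = sum(freq.values())
    mean = sum(days * k for days, k in freq.items()) / n
    var = sum(k * (days - mean) ** 2 for days, k in freq.items()) / n
    return n, mean, var


n, mean, var = mean_and_variance(RRAD_COUNTS)
print(f"n = {n}, mean = {mean:.3f}, variance = {var:.3f}, "
      f"variance/mean = {var / mean:.1f}")
```

The unconditional variance comes out several times larger than the mean (the mean itself matches the 0.162 reported for RRAD in Table 4-1), which is consistent with the concern about relying on the uncorrected Poisson covariance matrix.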
Given the assumptions on the parameterization of the $\lambda_t$, the
log-likelihood function to be maximized is:

(2)  $\ell = \sum_t [-\exp(X_t\beta) + y_t X_t\beta] + c$,

where $X_t$ is the vector of independent variables as described in Table 4-1,
$y_t$ is the observed RRAD count for individual t, and c does not depend on
$\beta$. The ML estimate of $\beta$ satisfies:

(3)  $\partial\ell/\partial\beta = \sum_t (-\exp(X_t\beta) + y_t) X_t' = 0$.
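The first-order condition (3) can be solved numerically. The following is a minimal sketch, not the algorithm actually used for the estimates in this chapter: it applies Newton-Raphson to a two-parameter model (an intercept and one regressor) on a small made-up data set. The data, function names, and starting values are all assumptions chosen for illustration.

```python
import math


def poisson_loglik(beta, X, y):
    """Log-likelihood (2) without the constant c."""
    total = 0.0
    for x_t, y_t in zip(X, y):
        xb = sum(b * x for b, x in zip(beta, x_t))
        total += -math.exp(xb) + y_t * xb
    return total


def poisson_score(beta, X, y):
    """Score vector from (3): sum over t of (y_t - exp(x_t'beta)) * x_t."""
    g = [0.0] * len(beta)
    for x_t, y_t in zip(X, y):
        lam = math.exp(sum(b * x for b, x in zip(beta, x_t)))
        for j, x in enumerate(x_t):
            g[j] += (y_t - lam) * x
    return g


def fit_poisson_2param(X, y, iters=50, tol=1e-10):
    """Newton-Raphson for a model with exactly two columns in X."""
    beta = [math.log(sum(y) / len(y)), 0.0]  # start at the intercept-only fit
    for _ in range(iters):
        g = poisson_score(beta, X, y)
        # Minus the Hessian: sum over t of lambda_t * x_t x_t' (2 x 2, symmetric).
        a = b = d = 0.0
        for x_t in X:
            lam = math.exp(beta[0] * x_t[0] + beta[1] * x_t[1])
            a += lam * x_t[0] * x_t[0]
            b += lam * x_t[0] * x_t[1]
            d += lam * x_t[1] * x_t[1]
        det = a * d - b * b
        step = [(d * g[0] - b * g[1]) / det, (a * g[1] - b * g[0]) / det]
        beta = [beta[0] + step[0], beta[1] + step[1]]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical data: a constant and one regressor taking values 0..3.
X = [[1, 0], [1, 0], [1, 1], [1, 1], [1, 2], [1, 2], [1, 3], [1, 3]]
y = [0, 1, 0, 2, 1, 3, 2, 5]

beta_hat = fit_poisson_2param(X, y)
print("beta_hat =", beta_hat)
print("score at beta_hat =", poisson_score(beta_hat, X, y))
```

Because the log-likelihood is concave in $\beta$, Newton-Raphson started from the intercept-only fit typically converges in a handful of iterations on well-behaved data; at the solution the score is zero up to rounding error.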
4.4. Empirical Results
Table 4-3 presents the results of our basic model and the variants
designed to test the sensitivity of the results to assumptions about
individual exposures to ambient air pollution. In specifications (3.1) -
(3.3) each individual's count of respiratory restricted activity days is
hypothesized to be related to ambient air quality during the individual's
two-week recall period. In (3.1) exposures are proxied by readings from
the one ozone and one sulfate monitor nearest each individual's residence;
in specifications (3.2) and (3.3), readings are averaged, respectively,
over all monitors within 10 and 20 miles of each respondent's residence.
Specifications (3.4) - (3.6) use annual 1979 average air pollution readings
as a proxy for air pollution exposure during each recall period. As in
(3.1) - (3.3), equation (3.4) uses the annual average at the nearest
monitor to proxy individual exposure while (3.5) and (3.6) use the average
of the annual averages of all monitors within 10 and 20 miles respectively.
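The three exposure proxies described above (nearest monitor, and averages over all monitors within 10 or 20 miles) can be sketched as follows. This is an illustration under stated assumptions: it uses great-circle distances from the census tract centroid, which may differ from the distance measure in the matching algorithm actually used, and the coordinates and readings are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in miles; inputs in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def exposure_proxies(centroid, monitors, radius_miles):
    """Return (nearest-monitor reading, mean reading within radius or None).

    centroid: (lat, lon); monitors: list of (lat, lon, reading).
    """
    dists = [(miles_between(centroid[0], centroid[1], lat, lon), reading)
             for lat, lon, reading in monitors]
    nearest = min(dists, key=lambda pair: pair[0])[1]
    inside = [reading for dist, reading in dists if dist <= radius_miles]
    average = sum(inside) / len(inside) if inside else None
    return nearest, average


# Hypothetical tract centroid and ozone monitors (readings in ppm).
centroid = (40.0, -75.0)
monitors = [(40.05, -75.0, 0.040),   # roughly 3.5 miles away
            (40.20, -75.0, 0.050),   # roughly 14 miles away
            (41.00, -75.0, 0.080)]   # roughly 69 miles away

nearest, avg10 = exposure_proxies(centroid, monitors, 10)
_, avg20 = exposure_proxies(centroid, monitors, 20)
print(nearest, avg10, avg20)
```

In this toy case the nearest-monitor and 10-mile proxies coincide because only one monitor lies within 10 miles, while the 20-mile proxy averages two monitors.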
Table 4-3 indicates that of the non-pollution variables, race, income
and temperature are related consistently across models to RRADs in a
statistically significant way, with whites, those with lower incomes, and
those exposed to colder temperatures all experiencing relatively more acute
respiratory illness during the two-week recall period. Because those
reporting the presence of a chronic illness would be expected to experience
more restrictions in activity during any two-week period, a dummy variable
identifying such individuals was included. As expected, this dummy
variable was positively and significantly related to the number of RRADs.
Finally, while always of the expected sign, the number of cigarettes smoked
per day and the dummy variable indicating that the respondent is a former
smoker were not significant at conventional levels, a somewhat surprising
finding given the concentration on respiratory disease.

Table 4-3. Model Estimates: Sensitivity to Air Pollution Measurement
(Dependent variable is RRADs during two-week recall period)

Independent                               Model
Variable           3.1        3.2        3.3        3.4        3.5        3.6

OZNEAR           6.883
                (1.97)
OZAV10                      6.614
                           (1.91)
OZAV20                                 9.324
                                      (2.41)
OZANNR                                           17.603

OZAN10                                                      19.449
                                                            (2.88)
OZAN20                                                                 17.473
                                                                       (2.12)
S4NEAR          -0.005
                (0.22)
S4AV10                    -0.0210
                           (0.67)
S4AV20                                -0.046
                                       (1.4)
S4ANNR                                          -0.0175
                                                 (0.41)
S4AN10                                                     -0.0558
                                                            (1.34)
S4AN20                                                                -0.0765
                                                                       (1.87)
WHITE            1.261      1.258      1.249      1.165      1.163      1.188
                (2.87)     (2.86)     (2.85)     (2.65)     (2.65)     (2.72)
MALE            -0.054     -0.058     -0.064     -0.062     -0.065     -0.055
                (0.19)     (0.21)     (0.23)     (0.22)     (0.23)     (0.20)
INCOME       -0.000035  -0.000035  -0.000035  -0.000035  -0.000034  -0.000033
                (2.31)     (2.30)     (2.28)     (2.27)     (2.22)     (2.19)
AGE            0.00031    0.00050    0.00086    0.00076     0.0013     0.0013
                (0.05)     (0.08)     (0.14)     (0.13)     (0.23)     (0.21)
CIGS             0.015      0.015      0.016      0.016      0.016      0.016
                (1.53)     (1.56)     (1.62)     (1.71)     (1.70)     (1.72)
FORMER           0.312      0.319      0.323      0.340      0.344      0.328
                (0.89)     (0.91)     (0.92)     (0.98)     (0.99)     (0.94)
SCHLYR          0.0067     0.0066     0.0067   0.000062     0.0035     0.0010
                (0.17)     (0.17)     (0.17)     (0.02)     (0.09)     (0.03)
CHRONIC          0.776      0.769      0.760      0.707      0.071      0.720
                (2.45)     (2.42)     (2.39)     (2.18)     (2.19)     (2.24)
MAXTMP          -0.019     -0.013     -0.021     -0.016     -0.017     -0.013
                (2.45)     (2.54)     (2.70)     (2.18)     (2.38)     (2.48)
RAIN             1.629      1.735      1.952      1.801      1.992      2.049
                (1.07)     (1.13)     (1.28)     (1.12)     (1.30)     (1.34)
INTERCEPT       -2.127     -1.993     -1.780     -2.559     -2.257     -1.950
                (2.06)     (1.92)     (1.67)     (2.26)     (1.93)     (1.66)
N                3,347      3,347      3,347      3,347      3,347      3,347
L               -741.5     -740.9     -732.0     -710.0     -707.0     -712.0

L = Log likelihood
(Asymptotic normal statistics for $H_0$: $\beta = 0$ in parentheses)

The main focus of our analysis is the relationship between acute
respiratory disease (as measured by RRADs) and urban air quality. As Table
4-3 indicates, in only one of the six specifications do we fail to reject
the hypothesis of no relationship between ozone and RRADs at the 95%
level or better. This finding is fully consistent with the analysis in
Volume I where we used different samples, estimating techniques, and
combinations of independent variables, including monitored readings for as
many as five separate air pollutants. There, positive and significant
associations between ozone and RRADs in adults were frequent although not
uniform.
The statistical significance of the ozone coefficients is not altered
appreciably by using monitored readings averaged over 10 or 20 miles rather
than readings at the nearest monitor. This is intuitively plausible since
ozone tends to be a diffuse (as opposed to a "hot-spot") pollutant. To the
extent they are generalizable, our findings suggest that city or SMSA-wide
average readings may be preferable to nearest-monitor readings to
characterize individual exposure to ozone in view of the resources required
to obtain the latter.
Using air pollution data averaged over the entire year during which
the health interview took place, as in models (3.4) - (3.6), results in
larger estimated coefficients and higher asymptotic t-ratios for the ozone
variable than when air quality data contemporaneous to the recall period
are used. The importance of this finding should be discounted, we
believe. So long as one is concerned with the possible relationships
between urban air quality and day-to-day variations in acute morbidity, the
correct measure of pollution must be one which is coincident with, or
slightly precedes, the period during which health status is being observed.
To illustrate, consider an individual interviewed for the HIS on January
15, 1979. Clearly, using 1979 annual average air pollution readings for
ozone and sulfates to help explain RRADs between January 1-14 brings into
play 50 weeks of data which could have no effect whatsoever on health
during the recall period. For this reason, the use of contemporaneous (or
"real time") air pollution data should be considered the conceptually
correct approach when analyzing acute respiratory disease.
Based on the findings in Table 4-3 we cannot reject the hypothesis of
no relationship between ambient sulfate concentrations and RRADs during the
two-week recall period. It should be noted, however, that sulfates and
other particulates are generally monitored only every six days. Thus, any
two-week period will contain at most three 24-hour sulfate measurements and
this may affect the findings. (Ozone, on the other hand, is monitored
continuously and is measured in specifications (3.1) - (3.6) by the average
daily maximum one-hour reading, measured during the recall period or
annually depending on the equation.) Note also that the coefficient on
sulfates is more sensitive to the choice of exposure proxy. This is
because concentrations of sulfates and other particulates exhibit greater
variation within an area than does ozone. (It should be noted here that
the sample correlation between OZNEAR and S4NEAR is 0.108. We conducted
tests for possible degradation of parameter estimates due to collinearity
but found no evidence thereof.)
Prior clinical and epidemiological analyses suggest the possible
importance of interactive or synergistic effects of certain air pollutants
(see Hazucha and Bates [7] and Graves and Krumm [3], for instance).
Accordingly, the existence of such an effect between ozone and sulfates
(OZXS4) is tested. The results are presented in specification (4.1) in
Table 4-4, and do not support the hypothesis that such effects are
important. In (4.2) another hypothesized interactive effect is tested, that
between ozone and average maximum temperature (OZXTEMP) during the recall
period. Again, no evidence of such an effect is found. These results are
consistent with the more extensive analysis of interactive effects in
Volume I.
So-called threshold effects or other types of non-linearities in the
relationship between ozone and RRADs are potentially important and are
tested for here. To see whether the relationship with RRADs differs
between lower and higher concentrations, the sample was twice divided into
two separate regimes, once with the dividing point being 0.05 ppm and once
0.075 ppm. Separate coefficients were estimated on the ozone variable in the
lower and higher regimes. In the specification with the 0.05 ppm split,
model (4.3), ozone is positively and significantly associated with the
expected number of RRADs in regimes both above and below 0.05 ppm. A casual
inspection of the coefficients in (4.3) could convey the impression that a
marginal change in ozone will have a larger impact on RRADs at lower than at
higher concentrations. In fact, this is not the case. When the first
derivatives of the estimating equation are evaluated at the appropriate
ozone concentration for each of the individuals in the low and high regimes
and the resulting values then averaged, the estimated first derivative is
nearly twice as large in the high as in the low regime.

Table 4-4. Model Estimates: Alternative Specifications
(Dependent variable is RRADs during two-week recall period)

Independent                            Model
Variable             4.1        4.2        4.3        4.4        4.5

OZNEAR             7.410     70.659
                  (1.24)     (1.77)
OZH75                                    9.554
                                        (2.71)
OZL05                                   22.505
                                        (2.11)
(OZNEAR)^2                                          1.343
                                                   (0.07)
(OZNEAR)^1/2                                                   4.926
                                                              (2.45)
S4NEAR            -0.003     -0.003    -0.0023    -0.0017    -0.0074
                  (0.09)     (0.12)     (0.10)     (0.07)     (0.31)
OZXS4             -0.047
                  (0.09)
OZXTEMP                      -0.874
                             (1.65)
WHITE              1.262      1.235      1.259      1.290      1.239
                  (2.37)     (2.80)     (2.86)     (2.93)     (2.83)
MALE              -0.054     -0.053     -0.049     -0.043     -0.060
                  (0.19)     (0.19)     (0.18)     (0.15)     (0.21)
INCOME         -0.000035  -0.000036  -0.000036  -0.000035  -0.000036
                  (2.31)     (2.37)     (2.38)     (2.32)     (2.32)
AGE              0.00031    0.00067  -0.000024    0.00025    0.00034
                  (0.05)     (0.11)     (0.04)     (0.04)     (0.06)
CIGS               0.015      0.015      0.015      0.014      0.151
                  (1.53)     (1.52)     (1.52)     (1.49)     (1.55)
FORMER             0.312      0.318      0.318      0.303      0.319
                  (0.89)     (0.90)     (0.90)     (0.87)     (0.91)
SCHLYR            0.0067     0.0036     0.0051     0.0063     0.0071
                  (0.17)     (0.09)     (0.13)     (0.16)     (0.18)
CHRONIC            0.776      0.779      0.773      0.773      0.781
                  (2.44)     (2.49)     (2.44)     (2.43)     (2.47)
MAXTMP            -0.019     0.0059     -0.019     -0.013     -0.023
                  (2.44)     (0.32)     (2.50)     (1.85)     (2.92)
RAIN               1.632      1.763      1.626      1.366      1.776
                  (1.07)     (1.12)     (1.07)     (0.87)     (1.17)
INTERCEPT         -2.152     -3.827     -2.489     -2.225     -2.498
                  (1.92)     (2.29)     (2.26)     (2.14)     (2.46)
N                  3,347      3,347      3,347      3,347      3,347
L                -2049.4    -2031.2    -2039.2    -2054.3    -2043.1

(Asymptotic normal statistics for $H_0$: $\beta = 0$ in parentheses)

Although the Poisson expectation function $E(RRAD_t) = \exp(X_t\beta)$ is
non-linear, it does imply that the elasticity of $E(RRAD_t)$ with respect to
ozone is linear in ozone. To allow for greater flexibility, models (4.4) and
(4.5) are estimated using, respectively, the square and the square root of
the ozone concentration during the recall period at the nearest monitor. In
other words, the specification is:

(4)  $E(RRAD_t) = \exp(Z_t\gamma + \alpha (OZNEAR_t)^{\delta})$

where $Z_t$ is the vector of independent variables other than ozone and
$\delta \in \{0.5, 2.0\}$. When $\delta = 0.5$, $\alpha$ is positive and
statistically significant; when $\delta = 2.0$, $\alpha$ is positive but not
significant. In fact, note that (4.5) has a higher model likelihood than
specification (3.1), which is simply equation (4) with $\delta = 1.0$, thus
indicating that non-linearities in the ozone specification are important.
4.5. Policy Implications and Conclusions
Ozone is one of six air pollutants for which the Environmental
Protection Agency has established maximum permissible ambient
concentrations. The controversy surrounding revision of the ozone standard
in 1978 (see White [20]), coupled with recent emphasis on cost-benefit
analysis in government regulation (see Smith [17]), make it worthwhile to
illustrate the changes in acute respiratory health that might be associated
with changed ozone levels. We use a subset of the results presented above
to make such an illustrative calculation. The discussion here is confined
to specifications where ozone is measured by the average daily one-hour
maximum during the two weeks at the monitor nearest the respondent's
residence.
One way to assess possible pollution-related changes in acute health
status is to calculate the elasticity of E(RRAD) with respect to ozone and
evaluate the predicted total change in expected RRADs for the individuals
in the sample resulting from some hypothetical change in ozone
concentrations. Log-differentiating (4), it follows that:

(5)  $(\partial E(RRAD_t)/\partial OZNEAR_t)\,(OZNEAR_t / E(RRAD_t)) = \delta\alpha (OZNEAR_t)^{\delta}$.

Note that for $\delta < 1$, the curvature of the expectation function (as
determined by $\partial^2 E(RRAD_t)/\partial OZNEAR_t^2$) cannot be
determined without reference to the data for the t-th observation. It can be
seen from (5) that in the nonlinear cases where $\delta = 0.5$ or 2.0, as in
specifications (4.4) and (4.5), evaluating the elasticity at the sample mean
of OZNEAR will yield a different estimate than that given by evaluating (5)
for all t and then averaging the elasticities. The results of both
approaches are presented in the top panel of Table 4-5.
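The elasticity in (5) is simple enough to check by hand. The sketch below evaluates it at the sample mean of OZNEAR (0.042, from Table 4-1) using the ozone coefficients reported for specifications (4.5), (3.1), and (4.4), and then contrasts this with the mean of individual elasticities over a short list of ozone readings. Those readings are hypothetical, chosen only to have mean 0.042, so the second set of numbers illustrates the direction of the difference rather than reproducing Table 4-5.

```python
def ozone_elasticity(delta, alpha, ozone):
    """Equation (5): elasticity of E(RRAD) with respect to ozone."""
    return delta * alpha * ozone ** delta


MEAN_OZNEAR = 0.042                              # sample mean, Table 4-1
ALPHA = {0.5: 4.926, 1.0: 6.883, 2.0: 1.343}     # models (4.5), (3.1), (4.4)

# Elasticity evaluated at the sample mean of OZNEAR.
at_mean = {d: ozone_elasticity(d, a, MEAN_OZNEAR) for d, a in ALPHA.items()}

# Hypothetical individual readings (ppm) whose average is 0.042.
hypothetical_oznear = [0.020, 0.030, 0.040, 0.050, 0.070]
mean_of_individual = {
    d: sum(ozone_elasticity(d, a, oz) for oz in hypothetical_oznear)
       / len(hypothetical_oznear)
    for d, a in ALPHA.items()
}

for d in ALPHA:
    print(f"delta = {d}: at mean = {at_mean[d]:.4f}, "
          f"mean of individual = {mean_of_individual[d]:.4f}")
```

The values at the mean agree with the first column of Table 4-5 to within rounding. With $\delta = 1.0$ the two methods coincide; with $\delta = 0.5$ the mean of individual elasticities falls below the value at the mean, and with $\delta = 2.0$ it lies above, which is the pattern the table reports.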
The upper panel of the table indicates that the estimated elasticities
are quite sensitive to the value of $\delta$. When $\delta = 0.5$ and
$\delta = 1.0$, the resulting elasticities are of the same order of
magnitude, with the former roughly twice the latter. However, when
$\delta = 2.0$, the estimated elasticity is almost two orders of magnitude
smaller than the others. Note that these results hold irrespective of the
method of elasticity calculation.
In the lower panel of Table 4-5 are presented the elasticities for the
model (4.3) in which ozone was permitted to have different coefficients in
low and high regimes. Recall that in this case $\delta = 1.0$, so that within each
regime the ozone elasticities are linear in ozone. Therefore, both methods
used above to calculate elasticities will yield the same result. However,
there are two relevant elasticity measures, one prevailing for observations
with ozone measures below the split and one for those above. Because of
the second-derivative properties noted above, reference to the parameter
estimates alone is insufficient to compare low- and high-regime
elasticities. In fact, it happens that the elasticity estimates for the
low-ozone and high-ozone regimes are virtually identical, 0.65 and 0.66,
respectively.

Table 4-5. Elasticity Estimates for Alternative Specifications

Whole Sample

  $\delta$    Evaluated at Mean    Mean of Individual
              of OZNEAR            Elasticities
  0.5         0.506                0.485
  1.0         0.290                0.290
  2.0         0.0048               0.006

Split Sample (with $\delta = 1.0$)

  Split        Regime          Elasticity
  0.05 ppm     low regime      0.645
               high regime     0.655
  0.075 ppm    low regime      1.061
               high regime     0.209

Table 4-6. Estimated Changes in RRADs Due to 10 percent Reduction in
Ozone Concentration

  $\delta$    Average Individual Reduction    Annual Decrease in RRADs:
              each two weeks, (S1-S2)/n       Urban Adult Population*
  0.5         -.00776                         22.19 x 10^6
  1.0         -.00442                         12.64 x 10^6
  2.0         -.000083                         0.24 x 10^6

* Calculated by multiplying the two-week individual change in column 2 by 26
to convert to annual changes and then by 110 million, the urban adult
population of the United States.

The elasticity estimates can be used to estimate one type of health
improvement that might accompany reduced ozone concentrations. Using the
estimates $\hat\beta = (\hat\gamma, \hat\alpha)$ from the specifications
(3.1), (4.4), and (4.5), (4) is evaluated at $(Z_t, OZNEAR_t)$ for all t in
the estimation sample and the sum
$S1 = \sum_t \exp(Z_t\hat\gamma + \hat\alpha (OZNEAR_t)^{\delta})$ is
calculated for each of the three alternative specifications. This yields an
estimate of the prevailing count of RRADs in the sample of 3,347 adults
given prevailing levels of the independent variables including ozone. To
evaluate the effect of a change, we first assume that some hypothetical
policy measure reduces by 10 percent the two-week average daily maximum
ozone concentration, $OZNEAR_t$, faced by each individual and then calculate
the sum $S2 = \sum_t \exp(Z_t\hat\gamma + \hat\alpha (0.9 \times OZNEAR_t)^{\delta})$.
For each of the three specifications, the average (S1-S2)/3347 is
calculated, thus giving an estimate of a typical individual's change in
two-week RRADs given a ten percent decrease in ozone concentrations.
Assuming an adult SMSA population of 110 million, and extrapolating the
two-week decrease in RRADs to an annual figure, we obtain for each
specification an estimate of the total annual decrease in
respiratory-related restricted activity days associated with a hypothetical
ten percent ozone reduction. The results are presented in Table 4-6.
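The calculation just described can be sketched in a few lines. The first two functions compute the average per-person change (S1-S2)/n for a sample; the two-person sample passed to them is hypothetical, since the individual-level data are not reproduced in this report. The extrapolation function then applies the 26 two-week periods and the assumed 110 million urban adults to the per-person magnitudes reported in Table 4-6. All function names are illustrative.

```python
import math


def expected_rrads(z_gamma, alpha, delta, ozone, scale=1.0):
    """Equation (4) with ozone optionally rescaled (scale=0.9 is a 10% cut)."""
    return math.exp(z_gamma + alpha * (scale * ozone) ** delta)


def average_reduction(sample, alpha, delta, cut=0.10):
    """(S1 - S2)/n for a list of (Z_t * gamma, OZNEAR_t) pairs."""
    s1 = sum(expected_rrads(zg, alpha, delta, oz) for zg, oz in sample)
    s2 = sum(expected_rrads(zg, alpha, delta, oz, 1.0 - cut) for zg, oz in sample)
    return (s1 - s2) / len(sample)


def annual_decrease(per_person_two_week, population=110e6, periods=26):
    """Scale a two-week per-person change to an annual population total."""
    return per_person_two_week * periods * population


# Hypothetical sample: (index Z_t * gamma, ozone in ppm) for two individuals.
toy_sample = [(-2.0, 0.03), (-1.8, 0.05)]
toy_change = average_reduction(toy_sample, alpha=6.883, delta=1.0)

# Per-person magnitudes from Table 4-6, keyed by delta.
table_4_6 = {0.5: 0.00776, 1.0: 0.00442, 2.0: 0.000083}
annual = {d: annual_decrease(v) for d, v in table_4_6.items()}
for d, total in annual.items():
    print(f"delta = {d}: about {total / 1e6:.2f} million RRADs per year")
```

The annual totals reproduce the last column of Table 4-6 only when the population is taken to be 110 million, the figure used in the text.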
It is here that the implications of the different specifications can
most forcefully be seen. At the two extremes are the $\delta = 0.5$ and
$\delta = 2.0$ formulations of the model. In the former case, the ten
percent reduction evokes a total annual change of more than 22 million RRADs
while in the latter case the change is less than a quarter million RRADs.
The final step in benefit estimation involves the assignment of dollar
values to these hypothetical improvements in health. Valuing reduced RRADs
is not easy, particularly since that measure embodies a range of
impairments from minor restrictions in activity to bed disability days.
However, based on separate analysis of adults' work loss and bed disability
in Volume I, wherein we found no significant associations between ambient
ozone concentrations and the more severe types of restrictions, we
presume that the effects predicted in Table 4-6 are minor restrictions in
activity.
Ideally, these minor RRADs should be valued using changes in
individuals' expenditure functions which reflect both labor-leisure
tradeoffs as well as the possibility of defending against pollution-related
illness (see Harrington and Portney [6], for instance). In practice,
alternative approaches are typically required. Using contingent valuation
methods, for example, Loehman et al. [11] recently elicited individuals'
reported willingness to pay to avoid one day of various kinds of
respiratory impairments. The values ranged from $2.31 for a day of minor
coughing and sneezing to about $11.00 to avoid a day of severe shortness of
breath. Since the latter impairment is likely to be associated with a work
loss and/or bed disability day, the former value is probably more
appropriate for a minor RRAD. Because of the many uncertainties in
arriving at such estimates, however, we assume that a minor RRAD could be
valued at as much as $20. If each of a predicted 22 million fewer RRADs
is valued at $20, annual benefits to the adult urban population of the
U.S. would be $0.44 billion. If RRADs were as few as 250,000 (as predicted
in the third row of Table 4-6), and each was valued at $2.31, the
corresponding total would be but $0.58 million.
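The two bounding benefit figures follow from simple multiplication, sketched here with the rounded RRAD counts and unit values quoted in the text. The function name is illustrative.

```python
def annual_benefit(rrads_avoided, dollars_per_rrad):
    """Annual dollar benefit: RRADs avoided times the value placed on each."""
    return rrads_avoided * dollars_per_rrad


# Upper bound: 22 million RRADs avoided, each valued at $20.
upper = annual_benefit(22_000_000, 20.00)

# Lower bound: 250,000 RRADs avoided, each valued at $2.31.
lower = annual_benefit(250_000, 2.31)

print(f"upper bound: ${upper / 1e9:.2f} billion")
print(f"lower bound: ${lower / 1e6:.2f} million")
```

The spread between the two bounds, roughly three orders of magnitude, reflects both the choice of $\delta$ and the choice of unit value.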
It is important to note that reduced ozone concentrations may result
in other beneficial effects besides possible reductions in acute
respiratory illness. These include improved visibility, reduced damages to
forests, ornamental plantings, and agricultural output, as well as other
welfare-enhancing changes. All these would have to be considered (and
valued, where appropriate) in any comparison of the costs and benefits of
ozone control.
Even when attention is confined to acute respiratory illness, however,
the uncertainties in estimating benefits are substantial. Both here and in
Volume I, predicted changes in RRADs proved somewhat sensitive to the
choice and measurement of independent variables and, in Volume I at least,
the size of the sample over which the parameters were estimated. Even when
these are held constant, Table 4-6 demonstrates that predicted changes in
RRADs are also sensitive to the assumed form of the exposure-response
function (by two orders of magnitude). Moreover, this difference is based
on a comparison of point estimates without regard to confidence intervals
constructed about them. These uncertainties, coupled with sometimes
conflicting findings from other epidemiological or clinical studies, must
make one cautious in using studies like this in policymaking.
APPENDIX
As described in Section 4.3, the log-likelihood function of the RRAD
models can be written as

(A1)  $\ell = \sum_t [-\exp(X_t\beta) + y_t X_t\beta] + c$,

where $\exp(X_t\beta) = \lambda_t$. It is easy to show that $\ell$ is
concave in $\beta$ so long as its inverse Hessian exists. As mentioned in
Section 4.3, the maximum likelihood estimates of $\beta$ obtained by
maximizing (A1) are consistent, but the estimate of the covariance matrix of
$\hat\beta_{ML}$ using minus the inverse of the Hessian evaluated at
$\hat\beta_{ML}$ will tend to be inconsistent if the data are not in fact
generated by the specified Poisson distribution.
This is easily seen as follows. Note that the model can be
equivalently cast as a nonlinear least squares regression, the t-th
observation being

(A2)  $y_t = E(Y_t) + u_t$

with $E(u_t) = 0$. Clearly, $\mathrm{Var}(u_t) = \mathrm{Var}(Y_t) = \exp(X_t\beta)$
so that the $u_t$ are heteroscedastic. If nonlinear weighted least squares
is used with the weights $\exp(-X_t\beta)$ formed using consistent estimates
of $\beta$, and if the data are in fact Poisson-distributed as maintained,
the maximum likelihood consistent estimates of $\beta$ and
$\mathrm{Cov}(\hat\beta)$ will obtain. (The consistency of $\hat\beta_{ML}$
for $\beta$ does not depend on the weighting scheme.) However, if the data
are not Poisson-distributed, the estimate of $\mathrm{Cov}(\hat\beta)$
obtained in this manner will be inconsistent and t-tests based thereon will
be misleading. The case is fully analogous to the estimation of the
heteroscedastic linear model which yields inconsistent covariance estimates
(and, therefore, t-statistics) if the heteroscedastic nature of the error
structure is either ignored or incorrectly specified.
White [18] and Royall [16] have demonstrated a method whereby
estimates of $\mathrm{Cov}(\hat\beta)$ robust against misspecification of
the underlying distribution of the data can be obtained when
$[-\partial^2\ell/\partial\beta\,\partial\beta']^{-1}$ evaluated at
$\hat\beta_{ML}$ fails to yield a consistent estimate of
$\mathrm{Cov}(\hat\beta)$. Denoting $I(\beta)$ as
$[-\partial^2\ell/\partial\beta\,\partial\beta']$, their suggestion is to
estimate $\mathrm{Cov}(\hat\beta)$ as

(A3)  $I(\beta)^{-1} \left[ \sum_t (\partial\ell_t/\partial\beta)(\partial\ell_t/\partial\beta') \right] I(\beta)^{-1}$

where $\ell_t$ is the t-th observation's contribution to the log-likelihood
function and where all relevant evaluations in (A3) are at $\hat\beta_{ML}$.
This is the method used in constructing the confidence intervals for the
parameter estimates of Section 4.4. In these cases, the standard errors of
the parameter estimates obtained using $I(\hat\beta)^{-1}$ as the estimate
of $\mathrm{Cov}(\hat\beta)$ are found to be about two to three times
smaller than those obtained using this alternative method. As noted by White
[19], the alternative approach (i.e. using (A3)) will typically lead to
conservative inferences (i.e. "too large" estimates of
$\mathrm{Cov}(\hat\beta)$) in instances where $X_t$ is nonstochastic and
varies across t, as is the case here.
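To make the comparison concrete, the sketch below computes both standard errors for the simplest possible case: a Poisson model with an intercept only, fitted to the RRAD frequencies of Table 4-2. This is an assumption made for illustration and is not one of the specifications estimated in this chapter. With a single parameter, $I(\beta)$ and the outer-product term in (A3) are scalars, so no matrix algebra is needed.

```python
import math

# RRAD frequency distribution transcribed from Table 4-2: {days: respondents}.
RRAD_COUNTS = {0: 3227, 1: 25, 2: 28, 3: 23, 4: 9, 5: 7, 6: 2,
               7: 3, 8: 3, 10: 3, 11: 1, 12: 1, 14: 15}


def intercept_only_standard_errors(freq):
    """Return (inverse-Hessian s.e., robust s.e. from (A3)) for the intercept."""
    n = sum(freq.values())
    lam = sum(y * k for y, k in freq.items()) / n  # ML estimate of exp(beta)
    info = n * lam                                 # I(beta) = sum_t lambda_t
    # Outer product of scores: sum_t (y_t - lambda)^2, since x_t = 1 for all t.
    outer = sum(k * (y - lam) ** 2 for y, k in freq.items())
    naive = math.sqrt(1.0 / info)
    robust = math.sqrt(outer / info ** 2)
    return naive, robust


naive_se, robust_se = intercept_only_standard_errors(RRAD_COUNTS)
print(f"inverse-Hessian s.e. = {naive_se:.4f}")
print(f"robust (A3) s.e.     = {robust_se:.4f}")
print(f"ratio                = {robust_se / naive_se:.2f}")
```

Even in this stripped-down case the robust standard error is between two and three times the inverse-Hessian one, in line with the magnitudes reported above for the full models.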
-------
Footnotes
1
Specifically, the Tobit specification error tests of Nelson [13] and
Lin and Schmidt [9] were used. Nelson's is a Hausman test while that of
Lin and Schmidt is a Lagrange multiplier test. Under the null hypothesis
of no misspecification, both test statistics are distributed asymptotically
central $\chi^2(k)$, where $k$ is the dimensionality of $\beta$. For the
specification described above, both statistics indicate rejection of the no
misspecification hypothesis at better than the 98% level.
2
These confidence intervals are constructed using the approach
discussed in the appendix, which should give conservative asymptotic
t-statistics. Confidence intervals based on minus the inverse Hessian of
the Poisson log-likelihood function, on the other hand, are much tighter,
but are almost certainly misleading (inconsistent), given the data used.
These results are available from the authors on request.
3
The substantial discrepancy between the magnitudes of the estimates
of the two-week and annual ozone coefficients results, loosely speaking,
from the fact that, while the sample means of the two measures are
virtually identical, the sample variances of the two-week measures are much
larger than those of the annual counterparts, in conjunction with the fact
that the expectation $E(RRAD_t)$ is the convex function $\exp(X_t\beta)$.
v U
4
For comparison's sake, model (3.1) was also cast as a geometric
distribution. Here, $\Pr(Y_t = y) = p^{y}/(1+p)^{y+1}$ for $y = 0, 1, 2, \ldots$,
$E(Y_t) = p$, $\mathrm{Var}(Y_t) = p(1+p)$, and for purposes of econometric
estimation $E(Y_t) = \exp(X_t\beta)$ and
$\mathrm{Var}(Y_t) = \exp(2X_t\beta) + \exp(X_t\beta)$ are specified. As
expected, the estimated variances of $\hat\beta$ were somewhat larger than
those obtained using the uncorrected variance version of the Poisson
specification while the estimates themselves were quite similar. However,
like the Poisson, the maximum likelihood variance estimates based on minus
the inverse of the Hessian evaluated at $\hat\beta_{ML}$ are not generally
consistent if the data are not distributed according to the postulated
geometric distribution. Thus, while larger than the estimated variances of
the uncorrected Poisson specification, the ML estimates of the geometric
parameter variances were still substantially smaller than those obtained
using the alternative approach.

$\delta$ is, of course, a parameter to be estimated rather than a given
constant. The ML algorithm used to obtain the Poisson parameter estimates,
however, did not permit estimation of such additional nonlinearities.
5
Kopp, Raymond, William Vaughan, Michael Hazilla and Richard Carson,
"Implications of Environmental Policy for U.S. Agriculture: The Case of
Ambient Ozone Standards," Resources for the Future working paper, January
5, 1984.
-------
4-32
References
[1] Chappie, Michael and Lester Lave, "The Health Effects of Air Pollution:
A Reanalysis," J._ Urban Econ., vol. 12 (1982) pp. 346-76.
[2] Crocker, Thomas, et. alv "Methods Development for Assessing Air
Pollution Control Benefits," Vol. 1, EPA Document EPA-600/5-79-001 a
(1979).
[3] Graves, Philip and Ronald Krumm, "Morbidity and Pollution: Model
Specification Analysis for Time-Series Data on Hospital Admissions,"
J. Environ. Econ. Manage., vol. 9 (1982) pp. 311-327.
[4] Hausman, Jerry, Bart Ostro, and David Wise, "Air Pollution and Lost
Work," NBER working paper no. 1263, January 1984.
[5] Hausman, Jerry, Bronwyn Hall, and Zvi Griliches, "Econometric Models
for Count Data with an Application to the Patents-R&D Relationship,"
Sconometrica, vol. 52 (1984) pp. 909-938.
[6] Harrington, Winston and Paul R. Portney, "Valuing the Benefits of
Health and Safety Regulation in the Presence of Defensive
Expenditures," RFF Quality of the Environment working paper
no. QE84-09, September 1984.
[7] Hazucha, Michael and David Bates, "Combined Effects of Ozone and Sulfur
Dioxide on Human Pulmonary Function," Nature, vol. 257 (1975) pp.
50-51.
[8] Lave, Lester and Eugene Seskin, Air Pollution and Human Health
(Baltimore, Md.: Johns Hopkins University Press, 1977).
[9] Lin, Tsai-Fen and Peter Schmidt, "A Test of the Tobit Specification
Against an Alternative Suggested by Cragg," Review of Economics and
Statistics, vol. 66 (1984) pp. 174-177.
[10] Lipfert, Frederick, "Air Pollution and Mortality: Specification
Searches Using SMSA-Based Data," J. Environ. Econ. Manage., vol. 11
(1984) pp. 208-243.
[11] Loehman, Edna et al., "Distributional Analysis of Regional Benefits and
Costs of Air Quality Control," J. Environ. Econ. Manage., vol. 6
(1979) pp. 222-243.
[12] Mendelsohn, Robert and Guy Orcutt, "An Empirical Analysis of Air
Pollution Dose Response Curves," J. Environ. Econ. Manage., vol. 6
(1979) pp. 85-106.
[13] Nelson, Forrest, "A Test for Misspecification in the Censored Normal
Model," Econometrica, vol. 49 (1981) pp. 1317-1330.
[14] Ostro, Bart, "The Effects of Air Pollution on Work Loss and Morbidity,"
J. Environ. Econ. Manage., vol. 10 (1983) pp. 371-382.
[15] Portney, Paul and John Mullahy, "Ambient Ozone and Human Health: An
Epidemiological Analysis," report prepared for Economic Analysis
Branch, Office of Air Quality Planning and Standards, USEPA under
contract no. 68-02-3583, September 1983.
[16] Royall, Richard, "Robust Inference Using Maximum Likelihood
Estimators," Johns Hopkins University, Department of Biostatistics
Working Paper 549, 1984.
[17] Smith, V.K. (ed.), Environmental Policy Under Reagan's Executive Order
(Chapel Hill, N.C.: UNC Press, 1984).
[18] White, Halbert, "Maximum Likelihood Estimation of Misspecified Models,"
Econometrica, vol. 50 (1982) pp. 1-25.
[19] White, Halbert, "Corrigendum," Econometrica, vol. 51 (1983) p. 513.
[20] White, Lawrence, Reforming Regulation: Processes and Problems
(Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1981).
-------
Chapter 5
CONSTRUCTING A LIFETIME SMOKING PROFILE
USING THE 1979 HEALTH INTERVIEW SURVEY
We noted above that individuals' amassed "stocks" of
cigarettes consumed over a lifetime are potentially
significant influences on respiratory illness. Yet the
models estimated in Volume I all made use of a much more
crude measure of smoking behavior. An important issue
here, then, is the construction of a more sophisticated
measure given available data. One theoretically plausible
construct is $K(T) = \int_{\Omega} \exp(-r(T-t))\,C(t)\,dt$, where $\Omega = [\underline{T}, T]$, $\underline{T}$
is time started smoking, $T$ is present time, $C(t)$ is
instantaneous cigarette consumption at $t$, and $r$ is a decay
or depreciation rate. The empirical representation of
K(T) is not straightforward, however, even given the
information available in the smoking supplement to the
1979 HIS.
This is so for several reasons. First, an
individual's entire lifetime smoking profile $\{C(t)\}_{\underline{T}}^{T}$ is
never given in the data. This is so even if $C(t)$ is
couched in discrete time as $\{C_t\}_{t \in \Omega}$ with reasonably
high-frequency (e.g. one month or even one year)
realizations. At best the profile can be approximated by
the use of subsidiary information. Second, the above
formulation is quite simple, one of an infinite number of
reasonable proxies for the "true" relationship. Third,
while K(T) as defined above is in principle capable of
describing the effects of cigarette tar-nicotine content
and cigarette length, it seems that amending the
formulation to account for such influences would add
little to the analysis given the nature of the data.
The dataset used to construct the measures is of
course the 1979 HIS smoking supplement. This survey gives
a reasonably detailed picture of individuals' smoking
status at the time of the survey in addition to
information on past attempts to quit, age at which regular
cigarette smoking began, number of cigarettes smoked per
day at the time of peak consumption, and other attributes.
Yet most of the data in the smoking survey is of little
use insofar as construction of a "packyear" or stock
measure is concerned. (A check on several other datasets
containing information on smoking behavior reveals similar
or even more severe weaknesses.) Ignoring minor points
and the complications presented by problems such as faulty
recall, the most serious problems are the following.
Although data are given on peak daily cigarette
consumption, no information is available on when the peak
occurred (unless it coincides with present consumption
levels, C(T)) nor on the duration of consumption at that
peak rate. Second, information on quits (number of
attempts; duration of time off) is insufficient to
construct for either current or former smokers a
reasonable profile of the time intervals over which C(t)
was zero. Quit duration information is available only as
the interval from time last smoked to T for former smokers
and for the length of the single most recent quit (if any)
for current smokers. Some detail is provided for current
smokers on numbers of serious quit attempts, but what
constitutes a "serious attempt" is analytically
problematical, a subjective assessment surely varying
across individuals. No information on age started smoking
is given for the subsample of occasional smokers.
Finally, it should be noted that even the use of an
obvious stock proxy measure like $C(T-\delta)$ with, for example,
$\delta$ equal to one year, is precluded by data availability. It
is possible to determine neither consumption levels of one
year (or six months, or one month) ago nor, in many
instances, even the sign of $C(T-\delta)$.
Yet there is some information that permits the
construction of a reasonably interesting, albeit rough,
proxy measure for the lifetime smoking profile K if one is
willing to make certain assumptions. Since age started
smoking is unavailable for the occasional smokers, this
subsample (about two percent) will henceforth be excluded
from the analysis. By assumption, K = 0 for all never
smokers. Thus, the proxy must be constructed for the
subsamples of former and current smokers. The data are
such that separate treatment of these two subsamples is
required. In both instances, however, several plausible
temporal smoking profiles can be created. In the absence
of any prior information on which profile best captures an
individual's true consumption path, the only sensible
solution is to consider several different specifications
in the empirical analysis and assess ex post the
sensitivity of the results to the specification used.
For both former and current smokers, the construction
of the K measures relies on a major assumption about the
influence of quits on the temporal consumption profile.
That is, the profile is "forgetful" of quits: once an
individual resumes smoking after having quit, consumption
over the quit interval is treated as if there had been no
quit at all. For example, Figure 5-1 depicts the manner
in which this forgetfulness operates, with true
consumption C*(t) shown as a solid curve and proxy
consumption shown as a dashed curve:
Figure 5-1:
Hypothetical Smoking Profile
[Figure: consumption C(t) plotted against time t.]
Such an approach has the unfortunate implication that, to
use an extreme example, the proxy profile of an individual
who quit smoking twenty years ago and resumed yesterday is
drastically different than that of an individual with an
otherwise identical smoking history who had not resumed
smoking. Until better microdata on individuals' smoking
histories become available, such drawbacks are inevitable.
For former smokers, the variables used to construct
the stock proxy are time started smoking ($\underline{T}$); number of
cigarettes smoked per day at peak consumption (NCIGP); and
time last smoked regularly ($\tilde{T}$). There are three plausible
profiles that can be constructed using this information;
these can best be described graphically.
The first profile for former smokers, shown in Figure
5-2, assumes that peak consumption occurs at the midpoint
($\equiv T^*$), and that consumption rises and falls
linearly from and to zero from this peak ($C(t)$ is
henceforth shown in solid lines):
Figure 5-2
Smoking Profile: Former Smokers I (F-I)
[Figure: C(t) plotted against time t, with NCIGP and NCIG marked on the vertical axis.]
The second profile for former smokers, shown in
Figure 5-3, is based on the assumption that peak
consumption is attained immediately at $\underline{T}$ and continues at
that rate until $\tilde{T}$:
Figure 5-3
Smoking Profile: Former Smokers II (F-II)
[Figure: C(t) plotted against time t, with NCIGP and NCIG marked on the vertical axis.]
The third former smoker profile, shown in Figure 5-4,
assumes that from $\underline{T}$ consumption increases linearly to
NCIGP which occurs at $\tilde{T}$, then falls instantly to zero:
Figure 5-4
Smoking Profile: Former Smokers III (F-III)
[Figure: C(t) plotted against time t, with NCIGP and NCIG marked on the vertical axis.]
The construction of the profiles for the current
smokers uses $\underline{T}$, NCIGP, and NCIG. Five profiles seem
sensible: three for current smokers for whom NCIG=NCIGP
and two for those where NCIGP exceeds NCIG.
The first profile, in Figure 5-5, is analogous to
that in 5-4: consumption increases linearly from $\underline{T}$ to
NCIGP which coincides with NCIG at $T$:
Figure 5-5
Smoking Profile: Current Smokers I (C-I)
[Figure: C(t) plotted against time t, with NCIGP marked on the vertical axis.]
The profile in Figure 5-6 assumes that peak
consumption first occurs at $T^*$, then continues at that
rate to $T$ ($T^*$ is defined for current smokers as $(T-\underline{T})/2$):
Figure 5-6
Smoking Profile: Current Smokers II (C-II)
[Figure: C(t) plotted against time t, with NCIGP marked on the vertical axis.]
The third construct for the NCIG=NCIGP group,
illustrated in Figure 5-7, assumes that NCIGP is attained
immediately at $\underline{T}$ and continues at that rate to $T$:
Figure 5-7
Smoking Profile: Current Smokers III (C-III)
[Figure: C(t) plotted against time t, with NCIGP marked on the vertical axis.]
The profile in Figure 5-8 is the first shown for the
subsample reporting NCIG less than NCIGP. Here it is
assumed that consumption increases linearly to NCIGP which
occurs at $T^*$ and then decreases linearly to NCIG at $T$:
Figure 5-8
Smoking Profile: Current Smokers IV (C-IV)
[Figure: C(t) plotted against time t, with NCIGP and NCIG marked on the vertical axis.]
Finally, the profile shown in Figure 5-9 assumes that
NCIGP is attained immediately at $\underline{T}$ and declines linearly
to NCIG at $T$:
Figure 5-9
Smoking Profile: Current Smokers V (C-V)
[Figure: C(t) plotted against time t, with NCIGP and NCIG marked on the vertical axis.]
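The eight profiles above differ only in where their vertices sit, so each can be sketched as a short list of (time, cigarettes per day) points joined by straight lines. The Python sketch below is illustrative only and is not the authors' code: the function names are hypothetical, T* is taken to be the midpoint of the smoking interval, and profile C-II is assumed to rise linearly from zero to NCIGP at T* (the text states only that the peak is first reached there).

```python
def profile_vertices(kind, t_start, t_end, ncigp, ncig=0.0):
    """Return the (time, cigarettes/day) vertices of a piecewise-linear profile.

    t_start is the time smoking began; t_end is the time last smoked (former
    smokers) or the present (current smokers). ncig is used only by C-IV, C-V.
    """
    t_mid = 0.5 * (t_start + t_end)  # assumed reading of T*: interval midpoint
    shapes = {
        "F-I":   [(t_start, 0.0), (t_mid, ncigp), (t_end, 0.0)],
        "F-II":  [(t_start, ncigp), (t_end, ncigp)],
        "F-III": [(t_start, 0.0), (t_end, ncigp)],
        "C-I":   [(t_start, 0.0), (t_end, ncigp)],
        "C-II":  [(t_start, 0.0), (t_mid, ncigp), (t_end, ncigp)],  # assumed rise
        "C-III": [(t_start, ncigp), (t_end, ncigp)],
        "C-IV":  [(t_start, 0.0), (t_mid, ncigp), (t_end, ncig)],
        "C-V":   [(t_start, ncigp), (t_end, ncig)],
    }
    return shapes[kind]


def consumption(vertices, t):
    """Linearly interpolate C(t); consumption is zero outside the interval."""
    if t < vertices[0][0] or t > vertices[-1][0]:
        return 0.0
    for (t0, c0), (t1, c1) in zip(vertices, vertices[1:]):
        if t0 <= t <= t1:
            return c0 + (c1 - c0) * (t - t0) / (t1 - t0)
    return 0.0


# Hypothetical current smoker: started at age 20, now 40, peak 30/day, now 20/day.
example = profile_vertices("C-IV", 20.0, 40.0, ncigp=30.0, ncig=20.0)
```

Under these assumptions the hypothetical C-IV smoker reaches the peak of 30 cigarettes per day at age 30 and is back down to 20 per day at age 40.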
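Given these specifications, it is seen that all of
the integrals to be evaluated have linear or
piecewise-linear integrands. That is, on the interval
$[a, c]$ (where $a = \underline{T}$ and $c = \tilde{T}$ or $T$), the integrands are either
of the form $\delta(t)(\alpha + \beta t)$, $t \in [a, c]$, or $\delta(t)(\alpha_1 + \beta_1 t)$, $t \in [a, b]$,
$\delta(t)(\alpha_2 + \beta_2 t)$, $t \in (b, c]$, for $b = T^*$, where $\delta(t) = \exp(-r(T-t))$. Specifically,

$$K(T) = \sum_{j=1}^{2} \int_{\Omega_j} \exp(-r(T-t))\,(\alpha_j + \beta_j t)\,dt,$$

where $\Omega_1 = [a, b]$ and $\Omega_2 = (b, c]$. Straightforward integration
by parts gives the solution as

$$K(T) = \sum_{j=1}^{2} \exp(-rT) \left[ r^{-1} \left( \alpha_j + \beta_j t - \beta_j r^{-1} \right) \exp(rt) \right]_{t = \inf(\Omega_j)}^{t = \sup(\Omega_j)}.$$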
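A minimal numerical check of the integration step, assuming consumption is measured in cigarettes per day and time in years. On a segment where C(t) = alpha + beta*t, the integrand exp(-r(T-t))(alpha + beta*t) has antiderivative exp(-r(T-t))[(alpha + beta*t)/r - beta/r^2], so the stock is a sum of endpoint differences. The sketch compares that expression with a midpoint-rule approximation; the function names and the example smoker are hypothetical, and r must be strictly positive.

```python
import math


def decayed_segment(alpha, beta, lo, hi, r, T):
    """Closed-form integral of exp(-r(T-t)) * (alpha + beta*t) over [lo, hi]."""
    def antiderivative(t):
        return math.exp(-r * (T - t)) * ((alpha + beta * t) / r - beta / r ** 2)
    return antiderivative(hi) - antiderivative(lo)


def stock(vertices, r, T):
    """Decayed stock K(T) for a piecewise-linear profile given by its vertices."""
    if r <= 0:
        raise ValueError("this closed form requires r > 0")
    total = 0.0
    for (t0, c0), (t1, c1) in zip(vertices, vertices[1:]):
        beta = (c1 - c0) / (t1 - t0)   # slope of the segment
        alpha = c0 - beta * t0         # intercept of the segment
        total += decayed_segment(alpha, beta, t0, t1, r, T)
    return total


def stock_numeric(vertices, r, T, steps=2000):
    """Midpoint-rule approximation of the same integral, used only as a check."""
    total = 0.0
    for (t0, c0), (t1, c1) in zip(vertices, vertices[1:]):
        h = (t1 - t0) / steps
        for i in range(steps):
            t = t0 + (i + 0.5) * h
            c = c0 + (c1 - c0) * (t - t0) / (t1 - t0)
            total += math.exp(-r * (T - t)) * c * h
    return total


# Hypothetical smoker: rises to 30/day at the midpoint, then falls to 20/day now.
profile = [(20.0, 0.0), (30.0, 30.0), (40.0, 20.0)]
k_exact = stock(profile, r=0.05, T=40.0)
k_check = stock_numeric(profile, r=0.05, T=40.0)
```

Looping `stock` over several assumed rates (for example 0.02, 0.05 and 0.10) is one simple way to see how sensitive the proxy is to the choice of r.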
The final point is the determination of r. Use of
decay or discount rates is often essential in applied
econometrics. Yet, in most instances there is no way to
know the "correct" rate, so that in discounting future
streams or depreciating accumulated stocks, the strategy
typically adopted is to posit some rate or set of rates
and conduct analysis as if the rate is known. This
approach has been used in a wide spectrum of applications,
generally with little discussion or justification for the
rate chosen (although some studies helpfully demonstrate
the sensitivity of results to the assumed rates). Such an
approach will be used here.
Given the above assumptions on consumption profiles
and decay rates, the K proxy measures can be derived using
the relevant data in the estimation sample. However, an
obvious drawback is that with three possible consumption
profiles for NCIG=NCIGP current smokers, two for
NCIG
-------
5-1 4
The combinations to be used are (for former,
NCIG=NCIGP current, and NCIG
-------
5-15
by exposure to ambient air pollution, current cigarette
consumption, and other covariates, an individual's
previous cigarette smoking predisposes him or her to an
increased risk of respiratory illness. Using a subset of
the proxy profiles described above enables us to test for
the presence and extent of such effects.
-------
Chapter 6
CIGARETTE SMOKING, AIR POLLUTION, AND RESPIRATORY ILLNESS:
AN ANALYSIS OF RELATIVE RISKS
6.1 Introduction
The relative risks associated with cigarette smoking
and ambient air pollution are difficult to assess. First,
individuals' health status is largely subjective and often
difficult to measure.1 In addition, lung physiology is
complex as well as heterogeneous in a population of
individuals, hindering both the identification and
measurement of all potential determinants of respiratory
illness. Moreover, there is little theoretical guidance
as to the likely form of any functional relationship
between risk exposure and illness response. Finally, data
on exposure to risks are often not all one might like them
to be. It is thus apparent why one expert on quantitative
risk assessment was moved to comment:
Quantitative risk assessment is not a panacea.
A primary limitation is that such an assessment
is concerned only with what can be measured and
quantified.2
In spite of these problems, relative risk assessments
must be undertaken for smoking and air pollution. Both
have been the subject of much discussion and study in the
health and environmental policy communities, and in the
popular press as well. Moreover, as discussed below,
considerable resources are being devoted to understanding
and reducing both risks. It is essential, therefore, that
the risks attributable to smoking and air pollution be
assessed simultaneously within a single coherent
framework; otherwise, risks attributed to one may in fact
be due to the other, thus biasing any estimates of the
health risk of but one of the variables.
The plan in this chapter is as follows. Section 2
discusses in greater detail the problem of acute
respiratory illness and some possible links between it and
smoking and air pollution. Section 3 describes the
dataset used in the empirical analysis, explains the
health measures utilized, and sketches the estimation
strategy. Finally, Section 4 presents empirical results,
derives the estimates of the relative risks of interest,
and briefly suggests new directions for future research in
this area.
6.2 Smoking, Pollution, and Acute Illness
The association between cigarette smoking and several
major chronic illnesses is well known. The 1982, 1983,
and 1984 reports of the U.S. Surgeon General detail and
publicize, respectively, the relationships between
cigarette smoking and cancer, cardiovascular disease, and
chronic obstructive lung disease. For these diseases, the
indictment of cigarette smoking is strong: although data
can obviously never demonstrate causality (as the Tobacco
Institute is wont to remind), the correlative evidence is
overwhelming.
Less widely publicized are associations between
cigarette smoking and less severe illnesses. There is
evidence, however, that suggests the existence of such
linkages. Chapters Three and Six of the 1979 Surgeon
General's report summarize much of the existing research
in this area. There it is reported that relative to
nonsmokers, current smokers have more frequent respiratory
tract infections and a greater prevalence of cough;
symptoms like cough and sputum production tend to increase
with the number of cigarettes smoked. Moreover, the 1979
report finds that "...people who had ever smoked...had a
higher incidence of acute illnesses than did people who
had never smoked (p. 3-6)." Smokers report approximately
45% more illness-related work loss days than do never
smokers.
Owing to the magnitude and severity of the illnesses
associated with cigarette smoking, considerable attention
and public resources have been devoted to the study of and
remedies for such illnesses. While one is hard-pressed to
estimate the value of the resources spent in such
activities, it is safe to venture that the value is
enormous.
Other public policies have been put in place to
protect individuals' respiratory health. For example,
several federal agencies are involved in the protection
against and compensation for damages from pneumoconiosis
(black lung disease), while exposure to respirable
hazardous substances in the workplace -- like the cotton
dust which causes byssinosis -- comes under the regulatory
purview of OSHA.
Of more immediate interest here, however, is the
widespread concern that ambient air pollution may be
detrimental to individuals' respiratory health. Many
clinical and epidemiological studies have tested for
possible relationships between ambient air pollution and
both morbidity and mortality. The cornerstone of federal
air pollution policy in the U.S., the Clean Air Act,
places primary emphasis on the protection of public health
from air pollution, insisting that air quality standards
be set to provide "an adequate margin of safety...to
protect the public health." Of the possible air
pollution-related illnesses, it is of course respiratory
illness that is of utmost concern. Regulatory mandates
pursuant to the Clean Air Act are not inexpensive:
according to the most recent estimates, annual costs of
complying with the Act are approximately $25 billion.
This sum is sure to grow as older sources of pollution are
retired and newer ones -- which must meet stricter
emissions standards -- are built to replace them. Whether
such expenditures achieve desired ends efficiently (if at
all) is a question on which we hope to shed some light.
Our task in this chapter is to assess the relative
contributions of cigarette smoking and air pollution to an
individual's risk of suffering from acute respiratory
impairments of varying severity. We concentrate on this
category because none but the most extraordinary exposures
to air pollution can be expected to rival direct (or
perhaps even passive) smoking as a cause of the more
serious illnesses like cancer, cardiovascular disease, or
chronic lung disease. It seems to us plausible to
hypothesize that if typical levels of ambient air
pollution in the U.S. are to influence individuals' health
in any manner, then acute respiratory illness must be a
primary area of suspicion. Because it can be quite
expensive to control air pollution, it is important to
assess how the benefits of doing so compare to those
associated with policies oriented towards smoking
cessation. We suggest that an analysis of the relative
contributions of cigarette smoking and ambient air
pollution to acute respiratory illness is one way to
approach this important assessment.3
6.3 Data and Estimation Strategy
The individual data used here are from the 1979
Health Interview Survey (HIS), a national sample of
approximately 110,000 individuals conducted over the
course of each year by the National Center for Health
Statistics. In this regard, the analysis in this chapter
is similar to that in Volume I and in Chapter 4 in the
present volume. The socioeconomic data elicited from each
respondent in the HIS includes information on age, race,
sex, income, education, as well as other individual- and
household-specific characteristics. In addition, the
supplemental survey on smoking behavior administered in
the 1979 HIS makes it particularly useful for present
purposes. This supplemental questionnaire was asked of
one-third of the approximately 78,000 adults (17+ years)
interviewed, and provides detailed data on lifetime
smoking history and present smoking behavior.
Restrictions in activity due to any illness
experienced during the two-week period prior to the date
of each interview are reported by the interviewee or
another household member responding for the interviewee.
Manifestations of illnesses are classified in three types:
bed disability days, work or school loss days, and what
might best be thought of as minor restricted activity
days. The latter are days on which the subject was
neither bedridden nor forced to miss work or school, but
on which the individual did suffer from an impairment
sufficient to cause a perceptible restriction on usual
activity. The information on health impairments elicited
in the survey is coded by cause according to the
International Classification of Disease. As discussed
earlier, attention is limited in this chapter to those
restrictions in activity due to respiratory illness.
All air pollution data come from the U.S.
Environmental Protection Agency's SAROAD system. The air
quality data used here are measured over the two-week
recall period for which the individual acute health data
are available. The received opinion of respiratory
physiologists suggests that not all airborne pollutants
are equally important in influencing respiratory health.
Accordingly, the analysis of acute respiratory illness
here uses air pollution data for ozone (OZONE), a gaseous
pollutant that is the primary constituent of smog, and
sulfates (SULFATE), perhaps the most harmful of airborne
particulate matter. The subsequent analysis characterizes
individuals' exposures to air pollution using data from
the air pollution monitors nearest the center of the
census tract in which the individual resides. No
individual for whom the nearest monitor is more than ten
miles away is included in the estimation sample, with the
sample average distance from centroid to monitor being
slightly more than four miles. For more details on the
air pollution data used in this analysis, consult Volume
I.
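The matching rule just described (nearest monitor to the tract centroid, subject to a ten-mile cutoff) can be sketched as follows. This is an illustration, not the procedure actually used to build the estimation sample: great-circle (haversine) distance is an assumption, since the text does not say how distance was measured, and the coordinates and monitor identifiers below are made up.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def assign_monitor(centroid, monitors, cutoff_miles=10.0):
    """Return (monitor_id, distance) for the nearest monitor, or None if it
    lies beyond the cutoff (the individual is then dropped from the sample)."""
    best_id, best_dist = None, float("inf")
    for monitor_id, (lat, lon) in monitors.items():
        dist = haversine_miles(centroid[0], centroid[1], lat, lon)
        if dist < best_dist:
            best_id, best_dist = monitor_id, dist
    if best_id is None or best_dist > cutoff_miles:
        return None
    return best_id, best_dist


# Hypothetical monitors and census-tract centroids.
monitors = {"A": (38.95, -77.03), "B": (39.30, -76.61)}
match = assign_monitor((38.90, -77.03), monitors)
no_match = assign_monitor((40.50, -75.00), monitors)
```

Here the first centroid is matched to monitor "A" a few miles away, while the second has no monitor within ten miles and would be excluded.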
We include two measures to control for cigarette
smoking. The first is the individual's daily consumption
of cigarettes at the time of the interview (NCIG). (The
HIS unfortunately contains no information on cigar or pipe
smoking.) Since the consumption data are self-reported,
some caution must be exercised in light of Warner's
underreporting hypothesis [5]; however, we make no attempt
here to correct for this possible errors-in-variables
problem. (Interestingly, in light of the mounting
evidence on the harms of passive smoking, attributing zero
as the number of cigarettes smoked per day by a
"nonsmoker" represents perhaps an understatement of the
daily dosage of cigarettes.)
Both medical evidence and common sense suggest that
the rate of current cigarette consumption alone is an
insufficient characterization of an individual's
smoking-related risk of respiratory illness (see, for
example, Chapter Six of the 1979 Surgeon General's
report). A more appropriate characterization of these
risks incorporates the influences of both current as well
as past cigarette consumption. Accordingly, the influence
on the likelihood of current acute respiratory illness of
lifetime cigarette consumption is measured by the variable
PACKS, a proxy for the number of cigarette packs that a
given individual has "amassed" over his or her lifetime.
PACKS can be viewed as a stock or state variable equal to
the integral over an individual's lifetime cigarette pack
consumption profile {C(t)}. (See Chapter 5 for a
discussion of the creation of the K(t) measures.) The
measure defined in Chapter 5 and converted into pack units
is selected from the set of candidates to serve as the
packyear proxy in the present analysis.
Table 6-1 provides a summary description of the air
pollution and smoking measures, as well as the other
independent variables used, and Table 6-2 depicts their
sample means, minima, and maxima.
Among the measures of respiratory illness available
in the HIS, the number of restricted activity days due to
respiratory illness during the two-week recall period
(RRAD) is a logical choice for use in the present
analysis. However, one drawback to its use as a measure
of health status is that it is a somewhat aggregated
concept. Any day reported as a bed disability or work
loss day, when due to respiratory illness, is counted as a
RRAD, as are days when individuals are hampered in minor
ways from performing usual activities without confinement
to bed or work loss. It is possible, however, that the
determinants of minor restrictions differ -- in kind or
in magnitude -- from the
determinants of severe limitations.
The HIS data do not enable a complete disaggregation
of these different types of respiratory restrictions. For
-------
Table 6-1
Variable Definitions

Variable Name   Description

OZONE           Average daily maximum one-hour ozone reading during
                two-week recall period at monitor nearest the centroid
                of respondent's census tract of residence, subject to
                ten-mile distance cutoff (in parts per million)
SULFATE         Average 24-hour sulfate concentration during two-week
                period at monitor nearest the centroid of respondent's
                census tract of residence, subject to ten-mile distance
                cutoff (in µg/m³)
NCIG            Number of cigarettes smoked per day
PACKS           Proxy for lifetime cigarette consumption, in packs
                (see text or [5] for detailed description)
TEMP            Average daily maximum temperature during two-week
                recall period (in degrees F)
RAIN            Average daily precipitation during two-week recall
                period (in inches)
AGE             Age, in years
EDUC            Number of years of schooling
INCOME          Annual family income, in 1979 dollars
CHRONLIM        Equals 1 if respondent reports a persistent limitation
                in activity due to a chronic ailment, equals 0 otherwise
MALE            Equals 1 if male, equals 0 if female
WHITE           Equals 1 if white, equals 0 if black
BLUECOL         Equals 1 if respondent reports usual activity is working
                and usual employment is blue collar, equals 0 otherwise
WHITECOL        Equals 1 if respondent reports usual activity is working
                and usual employment is white collar, equals 0 otherwise
INSCHOOL        Equals 1 if respondent reports usual activity is going
                to school, equals 0 otherwise
-------
Table 6-2
Sample Summary of Independent Variables (n=3073)

Variable        Mean      Minimum    Maximum

OZONE           .0426     0          .2136
SULFATE         10.87     .784       52.14
NCIG            7.454     0          98
PACKS           3299.1    0          44174.1
TEMP            63.92     11.14      106.36
RAIN            .114      0          .637
AGE             42.83     17         96
EDUC            11.73     0          13
INCOME          17095     500        30000
CHRONLIM        .173      0          1
MALE            .433      0          1
WHITE           .857      0          1
BLUECOL         .232      0          1
WHITECOL        .288      0          1
INSCHOOL        .079      0          1
-------
example, due to some peculiarities in data collection, it
is not possible in all instances to disentangle work loss
from bed restricted days (see Volume I for a detailed
discussion of these problems). However, the data do
permit a unique disaggregation of RRADs into two
qualitatively distinct types: days restricted in activity
due to respiratory illness but not confined to bed (minor
RRAD, or RADM), and days restricted in activity due to
respiratory illness with bed confinement (severe RRAD, or
RADS). Thus, it is possible to determine for each
individual the number of RADM, RADS, and nonrestricted
days (NRD) occurring during the two-week recall period.
In the analysis to follow, we consider two separate
definitions of RADM and RADS: first, those related to
respiratory illness classified by NCHS as either chronic
or acute (RADM-CA, RADS-CA), and, second, those related
only to respiratory illness classified as acute (RADM-A,
RADS-A).4 The sample frequencies are presented in Table
6-3.
The nature of these health status measures is such
that several peculiar characteristics must be treated
simultaneously in the estimation procedure. First, the
measure best suited for the analysis is multivariate in
nature: during any two-week period, individuals can report
minor or severe restrictions in activity due to
respiratory illness, or can report no respiratory
-------
Table 6-3
Sample Frequency Distribution of RRAD Measures (n=3073)

Number of Days   RADM-CA   RADM-A   RADS-CA   RADS-A

 0               3013      3027     3007      3014
 1                 16        14       19        18
 2                 14        12       18        15
 3                  8         6       10        10
 4                  6         5        5         5
 5                  2         2        4         4
 6                  1         1        1         1
 7                  2         1        3         2
 8                  1         1        3         3
 9                  1         1        0         0
10                  1         1        0         0
11                  0         0        0         0
12                  1         1        0         0
13                  0         0        0         0
14                  7         1        3         1
-------
impairment. Second, outcomes are mutually exclusive. On
a day where an individual reports a RADM, neither a RADS
nor a NRD can be reported; similar exclusivity holds for
RADS and NRD. Third, for all individuals, each of RADM,
RADS, and NRD is constrained to take integer values in
{0, 1, ..., 14}, with the sum RADM+RADS+NRD equal to
fourteen. Finally, because of the protocol of the HIS, it
is not possible to determine on what days during the
two-week recall period a given individual reported the
RADM, RADS, or NRD; only the number of each type of
outcome is known. While it seems sensible to suppose that
RRADs would be contiguous rather than disparate during any
particular time interval, the data used here do not permit
such a conjecture to be verified.
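As a quick arithmetic check on the counts in Table 6-3, the sketch below confirms that each column sums to the sample size of 3073 and converts the frequencies into total restricted days and a crude per-person-day rate. The variable and function names are hypothetical; the counts are transcribed from Table 6-3.

```python
DAYS = list(range(15))  # 0, 1, ..., 14 restricted days in the recall period

# Number of individuals reporting each day count, transcribed from Table 6-3.
FREQ = {
    "RADM-CA": [3013, 16, 14, 8, 6, 2, 1, 2, 1, 1, 1, 0, 1, 0, 7],
    "RADM-A":  [3027, 14, 12, 6, 5, 2, 1, 1, 1, 1, 1, 0, 1, 0, 1],
    "RADS-CA": [3007, 19, 18, 10, 5, 4, 1, 3, 3, 0, 0, 0, 0, 0, 3],
    "RADS-A":  [3014, 18, 15, 10, 5, 4, 1, 2, 3, 0, 0, 0, 0, 0, 1],
}


def summarize(counts, recall_days=14):
    """Return (individuals, total restricted days, share of person-days restricted)."""
    n = sum(counts)
    total_days = sum(d * c for d, c in zip(DAYS, counts))
    return n, total_days, total_days / (recall_days * n)


summary = {name: summarize(counts) for name, counts in FREQ.items()}
```

The very small shares of restricted person-days that result illustrate how heavily the outcome distribution is concentrated at zero.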
Following the discussion in Chapter 2 of this volume,
the estimation strategy is to view each day during the
two-week recall period as a trial on which one and only
one of the three possible outcomes can occur. For each
individual, then, there are fourteen trials. Because any
one individual's covariates are invariant across the
fourteen trials and, as noted above, because it is
impossible to ascertain which health outcomes occurred on
which days (except, of course, in the polar case where the
same outcome occurs on all fourteen days), it is plausible
for estimation purposes to assume independence both across
trials for an individual and across individuals. (In the
estimation subsample used, it happens that at most one
individual per household is included. Thus, contagion
effects -- which might otherwise vitiate the assumption of
independence across individuals -- can be ignored.)
The preceding paragraphs describe a model that can
be appropriately cast in terms of a multinomial
distribution with $k = 3$ possible outcomes; $n_t = n_\tau = n = 14$
independent trials for all $t, \tau$; and probability vector
$(\pi_{Mt}, \pi_{St}, \pi_{Nt})$ (M = RADM, S = RADS, N = NRD) such that
$\pi_{Mt} + \pi_{St} + \pi_{Nt} = 1$. The number of successes or incidences of
each type is $n_{qt}$ for $q = M, S, N$, and $n_{Mt} + n_{St} + n_{Nt} = 14$ for all
$t$. Thus, denoting the multinomial (vector) random
variable for the t-th individual as $Y_t$,

$$\Pr(Y_t = y_t) = n! \prod_{q \in Q} \left[ (\pi_{qt})^{n_{qt}} / n_{qt}! \right], \qquad (1)$$

where $Q = \{M, S, N\}$ and $n = 14$. A logit specification for the
$\pi_{qt}$ is assumed:

$$\pi_{qt} = \exp(X_t \beta_q) \Big/ \Big( \sum_{r \in Q} \exp(X_t \beta_r) \Big), \qquad (2)$$

for $q = M, S, N$. The parameter vectors $\beta_q$ are unique only up
to a difference, so that some normalization is necessary;
$\beta_N = 0$ is used here. Details on estimation are presented in
the appendix.
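Equations (1) and (2) translate directly into a log-likelihood. The sketch below is a minimal illustration rather than the estimation routine described in the appendix: the function names and the tiny two-covariate sample are hypothetical, and the parameter vector for the nonrestricted-day outcome is fixed at zero as in the text.

```python
import math

N_DAYS = 14  # trials per individual (days in the recall period)


def outcome_probs(x, beta_m, beta_s):
    """Equation (2): (pi_M, pi_S, pi_N) with beta_N normalized to zero."""
    eta_m = sum(xi * bi for xi, bi in zip(x, beta_m))
    eta_s = sum(xi * bi for xi, bi in zip(x, beta_s))
    shift = max(eta_m, eta_s, 0.0)  # subtract the maximum for numerical stability
    w_m, w_s, w_n = math.exp(eta_m - shift), math.exp(eta_s - shift), math.exp(-shift)
    total = w_m + w_s + w_n
    return w_m / total, w_s / total, w_n / total


def log_pmf(n_m, n_s, probs, n_days=N_DAYS):
    """Log of equation (1) for one individual with n_m minor and n_s severe days."""
    counts = (n_m, n_s, n_days - n_m - n_s)
    if min(counts) < 0:
        raise ValueError("day counts must be non-negative and sum to n_days")
    log_coef = math.lgamma(n_days + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coef + sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)


def sample_loglik(sample, beta_m, beta_s):
    """Sum of individual log-probabilities, assuming independence across people."""
    return sum(log_pmf(n_m, n_s, outcome_probs(x, beta_m, beta_s))
               for x, n_m, n_s in sample)


# Hypothetical records: (covariates with leading constant, RADM days, RADS days).
sample = [([1.0, 0.04], 0, 0), ([1.0, 0.09], 2, 0), ([1.0, 0.05], 0, 3)]
value = sample_loglik(sample, beta_m=[-5.0, 7.0], beta_s=[-5.5, 2.0])
```

Maximizing `sample_loglik` over the two free parameter vectors would yield the multinomial logit estimates; any general-purpose optimizer could be used for that step.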
A basic and more popular version of the model
described above is the ordered logit model described in
Chapter 2, in which it is assumed that there exists some
mechanism that orders the outcome probabilities according
to a particular latent measure (illness severity, for
example). The typical assumption is that the coefficients
$\beta_q^* = (\beta_{q2}, \ldots, \beta_{qK})$ are invariant across the outcomes $q$
(except for the outcome whose parameter vector remains
normalized to zero), with the ordering characterized by
outcome-specific intercept terms, such that
8 01<3 ." signifying the
ordering "more severe than." For purposes of comparison,
therefore, we also present estimates of a multiple-trial
version of an ordered logit model. It happens that this
is a parameter-restricted version of the multinomial model
specified above, with $(K-1)$ restrictions of the form $\beta_M^* = \beta_S^*$
on the likelihood function (A.2) implied by the ordered
logit likelihood function. It is thus possible to test in
a straightforward manner whether these restrictions are
valid insofar as the model and data sample used here are
concerned. The test is a standard likelihood ratio test,
with the test statistic computed as $LR = -2(\ell_0 - \ell_A)$; $\ell_0$ is
the maximized log-likelihood function value for the ordered
logit specification and $\ell_A$ is the corresponding value for
the multinomial model (A.2). Under the null hypothesis
that the $(K-1)$ restrictions are valid, LR is distributed
asymptotically as central $\chi^2$ with $(K-1)$ degrees of
freedom.
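The mechanics of this likelihood ratio test can be sketched with the standard library alone. The chi-square tail probability below uses the closed-form series that holds for integer degrees of freedom; the function names are hypothetical, and the log-likelihood values in the example are invented for illustration and are not results from this report.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a central chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    for j in range((df - 1) // 2):
        total += math.exp(-half) * half ** (j + 0.5) / math.gamma(j + 1.5)
    return total


def likelihood_ratio_test(loglik_restricted, loglik_unrestricted, n_restrictions):
    """Return (LR statistic, asymptotic p-value) for nested models."""
    lr = -2.0 * (loglik_restricted - loglik_unrestricted)
    return lr, chi2_sf(lr, n_restrictions)


# Invented values: ordered logit (restricted) versus multinomial (unrestricted).
lr_stat, p_value = likelihood_ratio_test(-2760.0, -2735.0, n_restrictions=15)
```

With these invented numbers the statistic is 50 on 15 degrees of freedom, which would lead to rejection of the restrictions at conventional levels.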
6.4 Estimates of Model Parameters and Relative Risks
The estimates of the model using the chronic and
acute RRAD measures of respiratory illness are presented
in Table 6-4. Insofar as the parameter estimates
associated with the independent variables other than
smoking or pollution are concerned, it is seen that most
are statistically significant in at least one of the
RADM-CA or RADS-CA estimated parameter vectors, with
generally plausible signs in most instances. The
parameter estimate associated with the current level of
cigarette smoking (NCIG) is statistically important in fJ,,
M
but is insignificant in &. Lifetime cigarette
consumption (PACKS) plays an opposite role: its associated
parameter estimates are positive and significant in the 3M
vector, but statistically indistinguishable from zero in
Bg. SULFATE appears to be an insignificant contributor to
either RADM-CA or RADS-CA. OZONE, conversely, has an
associated parameter estimate in 8,, that is positive and
statistically significant, although the ozone coefficient
in 3S is statistically unimportant. This finding is
consistent with those in Volume I. There, using more
-------
Table 6-4
Model Estimates: Chronic and Acute RRADs with
Linear Risk Factor Influence

Variable      $\beta_{M-CA}$          $\beta_{S-CA}$
INTERCEPT     -6.34     (9.8)       -3.64     (7.2)
OZONE          7.03     (2.5)        1.77     (.45)
SULFATE        .0061    (.54)       -.012     (.84)
NCIG          -.0034    (.61)        .026     (4.8)
PACKS          .36E-4   (3.1)        .53E-5   (.33)
TEMP          -.014     (3.4)       -.025     (5.2)
RAIN           0.85     (1.3)        0.45     (.62)
AGE           -.0084    (1.9)        .31E-3   (.061)
EDUC          -.0082    (.39)       -.058     (2.4)
INCOME        -.32E-4   (4.1)       -.63E-4   (7.0)
CHRONLIM       0.99     (6.7)        0.46     (2.6)
MALE           0.28     (2.1)        .079     (.53)
WHITE          2.59     (5.1)        0.78     (3.2)
BLUECOL       -1.76     (5.9)       -0.17     (.75)
WHITECOL      -0.18     (1.0)        0.83     (4.3)
INSCHOOL      -0.42     (1.4)        0.90     (3.1)

Log(L) = -273[?].55

Note: Asymptotic normal scores for $H_0\colon \beta_k = 0$ in parentheses.
-------
6-15
"primitive" OLS and logit techniques, we found positive
and often significant associations between ozone and minor
illnesses among adults, but no pattern of associations
when we examined either work loss or bed disability days.
Thus, the findings in this chapter provide some
corroborative evidence using a more sophisticated and
appropriate statistical approach.
Similar estimates obtain in the model of the
acute-only respiratory ailments RADM-A and RADS-A,
presented in Table 6-5. Most notable is that the
individual parameter significance levels tend to be
somewhat lower than those estimated in the chronic-acute
model of Table 6-4, although the qualitative
interpretation is in most instances unchanged. Of
particular import is that the coefficient estimates
associated with OZONE and PACKS in $\beta_M$ are no longer
significant at the 95% level (see note 5).
In Chapter 4 we found that various nonlinear
transformations of the ozone measure lead to differing
conclusions about the significance of the relationship
between ozone and respiratory health. There, remember,
the transformation (OZONE)' proved most significant. As
such nonlinearities are potentially important in the
present analysis as well, we also consider simple
transformations of OZONE, NCIG, and PACKS of the form
$\mathrm{OZONE}^{\lambda_1}$, $\mathrm{NCIG}^{\lambda_2}$, and $\mathrm{PACKS}^{\lambda_3}$ for $\lambda_1, \lambda_2, \lambda_3 > 0$. (We ignore
-------
Table 6-5
Model Estimates: Acute-only RRADs with
Linear Risk Factor Influence

Variable      $\beta_{M-A}$           $\beta_{S-A}$
INTERCEPT     -6.67     (8.9)       -4.66     (8.1)
OZONE          5.09     (1.3)       -2.39     (.49)
SULFATE        .0031    (.59)        .0031    (.22)
NCIG          -.0088    (1.1)        .023     (3.5)
PACKS          .31E-4   (1.8)        .14E-5   (.072)
TEMP          -.0069    (1.2)       -.024     (4.4)
RAIN           0.46     (.53)        0.54     (.68)
AGE            .0053    (.95)        .0057    (.98)
EDUC          -.0088    (.31)       -.039     (1.4)
INCOME         .28E-4   (2.8)        .41E-4   (4.4)
CHRONLIM      -1.10     (3.6)        0.13     (.58)
MALE          -0.15     (.79)       -0.30     (1.3)
WHITE          1.82     (3.6)        0.65     (2.4)
BLUECOL       -1.25     (3.6)        0.55     (2.1)
WHITECOL       0.13     (.59)        1.35     (5.9)
INSCHOOL       0.28     (.82)        1.57     (5.0)

Log(L) = -2048.53

Note: Asymptotic normal scores for $H_0\colon \beta_k = 0$ in parentheses.
-------
6-1 6
transformations of SULFATE because of its generally
insignificant contributions as witnessed in Tables 6-4 and
6-5.) The software used for estimation does not enable
maximum likelihood estimation of the $\lambda_j$; instead, a grid
search approach is used, where the search is over
$(\lambda_1, \lambda_2, \lambda_3) \in \Lambda \times \Lambda \times \Lambda$, with
$\Lambda = \{0.5, 1.0, 1.5, 2.0\}$. For each of the sixty-four
possible $(\lambda_1, \lambda_2, \lambda_3)$ triples, the likelihood
function conditional on that triple is maximized with
respect to $(\beta_M, \beta_S)$; the triple yielding the largest
maximized value is selected as the (pseudo) MLE.
The estimates of the RADM-CA and RADS-CA model using
the nonlinear transformations are presented in Table 6-6.
The pseudo-MLEs of the $\lambda_k$ are $\lambda_1 = 0.5$, $\lambda_2 = 1.5$, and
$\lambda_3 = 1.0$, with a likelihood ratio test indicating that these
transformations are jointly significant at greater than
the 95% level. The overall qualitative findings are
unchanged; however, the parameter estimates associated
with the transformed risk factors are more finely resolved
than those presented in Table 6-4. Similar statements can
be made about the acute-only model, whose estimates are
presented in Table 6-7. Again, the pseudo-MLEs for the
$\lambda_k$ are 0.5, 1.5, and 1.0 for the OZONE, NCIG, and PACKS
transformations, respectively. The likelihood ratio test
of the joint significance of the transformations is
significant only at slightly above the 90% level; since
the $\lambda_k$ are not true MLEs, however, such an LR test is
somewhat misleading, and is biased in favor of accepting
-------
Table 6-6
Model Estimates: Chronic and Acute RRADs with
Nonlinear Risk Factor Influences

Variable                  $\beta_{M-CA}$          $\beta_{S-CA}$
INTERCEPT                 -6.75     (10.5)      -3.64     (7.2)
$\mathrm{OZONE}^{\lambda_1}$       5.10     (3.5)        2.86     (1.7)
SULFATE                    .0039    (.35)       -.014     (1.0)
$\mathrm{NCIG}^{\lambda_2}$       -.25E-3   (.32)        .0034    (5.3)
$\mathrm{PACKS}^{\lambda_3}$       .35E-4   (3.0)        .81E-5   (.53)
TEMP                      -.020     (4.2)       -.030     (5.9)
RAIN                       1.01     (1.5)        0.57     (.78)
AGE                       -.0080    (1.9)       -.94E-3   (.19)
EDUC                      -.0071    (.33)       -.059     (2.4)
INCOME                    -.32E-4   (4.1)       -.64E-4   (7.1)
CHRONLIM                   0.99     (6.7)        0.46     (2.9)
MALE                       0.28     (2.0)        .064     (.43)
WHITE                      2.56     (5.0)        0.75     (3.0)
BLUECOL                   -1.77     (6.0)       -0.15     (.65)
WHITECOL                  -0.19     (1.1)        0.86     (4.4)
INSCHOOL                  -0.41     (1.3)        0.86     (3.0)

Log(L) = -2729.13

Note: Asymptotic normal scores for $H_0\colon \beta_k = 0$ in parentheses.
$\lambda_k$ from grid search: $\lambda_1 = 0.5$; $\lambda_2 = 1.5$; $\lambda_3 = 1.0$.
-------
Table 6-7
Model Estimates: Acute-only RRADs with
Nonlinear Risk Factor Influence

Variable                  $\beta_{M-A}$           $\beta_{S-A}$
INTERCEPT                 -7.09     (9.5)       -4.59     (7.9)
$\mathrm{OZONE}^{\lambda_1}$       4.89     (2.5)        1.35     (.71)
SULFATE                    .0052    (.37)        .44E-3   (.032)
$\mathrm{NCIG}^{\lambda_2}$       -.33E-3   (.26)        .0034    (4.1)
$\mathrm{PACKS}^{\lambda_3}$       .23E-4   (1.3)       -.88E-6   (.046)
TEMP                      -.013     (2.1)       -.029     (5.2)
RAIN                       0.68     (.78)        0.69     (.87)
AGE                        .0066    (1.2)        .0052    (.91)
EDUC                      -.0076    (.27)       -.040     (1.4)
INCOME                    -.28E-4   (2.8)       -.42E-4   (4.5)
CHRONLIM                  -1.09     (3.6)        0.13     (.58)
MALE                      -0.14     (.74)       -0.31     (1.9)
WHITE                      1.79     (3.5)        0.62     (2.3)
BLUECOL                   -1.27     (3.7)        0.56     (2.2)
WHITECOL                   0.12     (.54)        1.36     (6.0)
INSCHOOL                   0.30     (.89)        1.55     (4.9)

Log(L) = -2045.12

Note: Asymptotic normal scores for $H_0\colon \beta_k = 0$ in parentheses.
$\lambda_k$ from grid search: $\lambda_1 = 0.5$; $\lambda_2 = 1.5$; $\lambda_3 = 1.0$.
-------
6-17
the null that transformations are not important.
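The grid search over the sixty-four exponent triples can be sketched in a few lines of Python. This is only an illustration of the search logic, not the SAS program used for the estimates: `max_loglik` is a hypothetical callable standing in for a full refit of the multinomial model at a given triple, and `toy_loglik` is a made-up concave surface used solely so the sketch runs on its own.

```python
from itertools import product

GRID = (0.5, 1.0, 1.5, 2.0)  # candidate exponents, as listed in the text


def grid_search(max_loglik, grid=GRID, n_params=3):
    """Return (best_triple, best_loglik) over all len(grid)**n_params triples.

    max_loglik is a hypothetical callable: given a tuple of exponents, it
    refits the model on the transformed regressors and returns the maximized
    conditional log-likelihood.
    """
    best_lams, best_ll = None, float("-inf")
    for lams in product(grid, repeat=n_params):
        ll = max_loglik(lams)
        if ll > best_ll:
            best_lams, best_ll = lams, ll
    return best_lams, best_ll


def toy_loglik(lams):
    """Stand-in for a real model fit: a concave surface peaking at (0.5, 1.5, 1.0)."""
    target = (0.5, 1.5, 1.0)
    return -sum((lam - t) ** 2 for lam, t in zip(lams, target))


best, ll = grid_search(toy_loglik)
print(best)
```

Because the exponents are chosen from a coarse grid rather than estimated jointly with the coefficients, the selected triple is a pseudo-MLE, which is why the text treats the associated likelihood ratio test with caution.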
The estimates of the ordered logit models are
presented in Table 6-8. The $\lambda_k$ transformations suggested
by the multinomial models of both Table 6-6 and Table 6-7
are used here. In the model of RADM-CA and RADS-CA
(column 1), the parameter estimates for $(\mathrm{OZONE})^{0.5}$, TEMP,
INCOME, CHRONLIM, WHITE, BLUECOL, $(\mathrm{NCIG})^{1.5}$ and PACKS are
all significant at greater than the 99% level. A perhaps
peculiar result is that the estimate of the RADM intercept
exceeds that for RADS, thus calling into question the
validity of the ordered logit specification. Indeed, the
$\chi^2_{(15)}$-distributed likelihood ratio test statistic of the
restrictions on the multinomial model that are implied by
the ordered specification has a value of 89.58, suggesting
that the ordered specification can be rejected with
considerable confidence in favor of the unrestricted
multinomial model. Similar results obtain for the model
of RADM-A and RADS-A (column 2): estimated parameters
associated with TEMP, INCOME, WHITE, WHITECOL, INSCHOOL,
and $(\mathrm{NCIG})^{1.5}$ are significant at above the 99% critical
level, while the asymptotic t-statistics associated with
the $(\mathrm{OZONE})^{0.5}$ and CHRONLIM parameters exceed the 95% level.
In this instance, the $\chi^2$ test statistic for the ordered
logit model restrictions has a value of 68.58, again
suggesting that the ordered specification be rejected in
favor of the general multinomial model.
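As a check on the arithmetic, the acute-only statistic can be reproduced from the reported log-likelihoods. The sketch below is illustrative only; it assumes the Table 6-7 value is negative (its sign is not legible in the source), takes 15 degrees of freedom as the number of slope restrictions, and uses a hand-written chi-square tail function (finite-sum forms for integer degrees of freedom) to stay within the Python standard library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a central chi-square variate, integer df >= 1."""
    if x <= 0:
        return 1.0
    chi = math.sqrt(x)
    z = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)  # standard normal pdf at chi
    if df % 2 == 1:
        # odd df: 2*Q(chi) + 2*Z(chi) * sum_r chi^(2r-1) / (1*3*...*(2r-1))
        total, term = 0.0, 1.0 / chi
        for r in range(1, (df - 1) // 2 + 1):
            term *= x / (2 * r - 1)
            total += term
        return math.erfc(chi / math.sqrt(2.0)) + 2.0 * z * total
    # even df: exp(-x/2) * [1 + sum_r x^r / (2*4*...*(2r))]
    total, term = 1.0, 1.0
    for r in range(1, (df - 2) // 2 + 1):
        term *= x / (2 * r)
        total += term
    return math.sqrt(2.0 * math.pi) * z * total


def lr_test(loglik_restricted, loglik_unrestricted, df):
    """Likelihood ratio statistic and its asymptotic p-value."""
    lr = -2.0 * (loglik_restricted - loglik_unrestricted)
    return lr, chi2_sf(lr, df)


# Ordered logit (Table 6-8, acute-only) against multinomial logit (Table 6-7).
lr, p = lr_test(-2079.41, -2045.12, df=15)
print(round(lr, 2), p)
```

The restricted model always has the lower log-likelihood, so the statistic is non-negative; a small p-value indicates that the ordered logit restrictions are rejected.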
-------
Table 6-8
Model Estimates: Ordered Logit, Nonlinear Risk Factor Influence

Variable                  Chronic-Acute         Acute-only
INTERCEPT-RADM            -4.28     (11.6)      -4.82     (11.0)
INTERCEPT-RADS            -5.07     (13.7)      -5.45     (12.3)
$\mathrm{OZONE}^{\lambda_1}$       4.17     (3.8)        3.15     (2.3)
SULFATE                   -.0033    (.37)        .0033    (.35)
$\mathrm{NCIG}^{\lambda_2}$        .0017    (3.4)        .0020    (3.0)
$\mathrm{PACKS}^{\lambda_3}$       .24E-4   (2.6)        .93E-4   (.71)
TEMP                      -.024     (7.1)       -.022     (5.3)
RAIN                       0.76     (1.5)        0.57     (1.1)
AGE                       -.0045    (1.4)        .0063    (1.6)
EDUC                      -.029     (1.8)       -.024     (1.2)
INCOME                    -.46E-4   (7.9)       -.35E-4   (5.2)
CHRONLIM                   0.75     (6.7)       -0.41     (2.3)
MALE                       0.17     (1.7)       -0.22     (1.8)
WHITE                      1.38     (6.3)        0.99     (4.2)
BLUECOL                   -0.93     (5.3)       -0.26     (1.3)
WHITECOL                   0.28     (2.1)        0.71     (4.6)
INSCHOOL                   0.21     (1.0)        0.92     (4.0)

Log(L)                    -2773.84              -2079.41

Note: Asymptotic normal scores for $H_0\colon \beta_k = 0$ in parentheses.
$\lambda_k$ from grid search: $\lambda_1 = 0.5$; $\lambda_2 = 1.5$; $\lambda_3 = 1.0$.
-------
6-1 8
On the basis of these results, we elect to use the
estimates of the nonlinear risk factor multinomial models
presented in Tables 6-6 and 6-7 as the foundation of the
relative risk estimates. In the multinomial model,
translations from the qualitative outcomes to quantitative
estimates of relative risks are fairly straightforward.
One obvious strategy for evaluating the relative risks of
smoking and air pollution would be to assess and compare
the estimated elasticities of the daily outcome-specific
probabilities with respect to the pollution and smoking
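control variables. Using the incidence probabilities
defined in (2), and allowing for the cases where the
control variables are subject to the nonlinear
transformations $h(x_{kt}) = (x_{kt})^{\lambda_k}$, the elasticity formula is:

$$\frac{\partial \pi_{qt}}{\partial x_{kt}} \frac{x_{kt}}{\pi_{qt}} = \lambda_k (x_{kt})^{\lambda_k} \Big[ \beta_{qk} - \sum_{r \in \Omega} \beta_{rk} \exp(\tilde{X}_t \beta_r) \Big/ \sum_{s \in \Omega} \exp(\tilde{X}_t \beta_s) \Big], \qquad (3)$$

which simplifies when $\lambda_k = 1$. (Here, $\tilde{X}_t$ denotes the
transformed $X_t$ vector.)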
While the elasticity comparison approach provides
perhaps the most straightforward method for assessing the
relative risks of interest, the nature of the data used
here renders it somewhat uninformative. In brief, the
problem is that 64% of the sample are classified as
current nonsmokers, while 44% are never smokers. It is
seen by inspection of equation (3) that for these large
subsamples the estimated NCIG and PACKS elasticities are
zero.
We therefore adopt in lieu of elasticity comparison
an approach that considers the discrete changes from the
baseline or prevailing daily incidence probabilities
attributable to a variety of discrete changes in the
control variables from their prevailing sample values.
This strategy has at least two advantages. First, it
circumvents the non- or never-smoker problem. Second, the
magnitudes of the hypothetical discrete changes in the
control variables are set to mimic potentially interesting
policy measures.
The strategy is as follows. First, for each of the
four incidence outcomes (RADM-CA, RADM-A, RADS-CA,
RADS-A), a baseline mean probability is calculated using
the estimated models in Tables 6-6 and 6-7. This mean
probability, which is simply the sample average of the
$\pi_{qt}$, is denoted $\bar{\pi}_q$. The second step is to perturb the
control variable of interest in each $X_t$ by the specified
amount and reevaluate each individual's incidence
probability using the perturbed $X_t$. The sample average of
these new $\pi_{qt}$ is denoted $\bar{\pi}_q^{*}$. Finally, for each illness
measure and each control perturbation considered, the
difference $\bar{\pi}_q^{*} - \bar{\pi}_q$ is calculated. The results are
presented in Table 6-9.
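The three steps just described (baseline average, perturbed average, difference) can be sketched as follows. The data rows and coefficient values below are hypothetical and chosen only so the example runs; they are not the estimates in Tables 6-6 and 6-7, and the helper names are illustrative.

```python
import math


def outcome_probs(x, betas):
    """Multinomial logit probabilities for one individual.

    x     : list of (already transformed) regressors, including the constant
    betas : dict mapping outcome -> coefficient list; the reference outcome is
            implicit, with its coefficient vector normalized to zero
    """
    scores = {q: math.exp(sum(b * v for b, v in zip(beta, x)))
              for q, beta in betas.items()}
    denom = 1.0 + sum(scores.values())  # the 1.0 is exp(0) for the reference outcome
    return {q: s / denom for q, s in scores.items()}


def mean_prob(rows, betas, outcome, transform):
    """Sample average of the individual probabilities for one outcome."""
    return sum(outcome_probs(transform(r), betas)[outcome] for r in rows) / len(rows)


def mean_change(rows, betas, outcome, transform, perturb):
    """Average perturbed probability minus average baseline probability."""
    base = mean_prob(rows, betas, outcome, transform)
    new = mean_prob([perturb(r) for r in rows], betas, outcome, transform)
    return new - base


# Hypothetical sample and coefficients, for illustration only.
rows = [{"ozone": 0.03, "ncig": 0.0},
        {"ozone": 0.05, "ncig": 20.0},
        {"ozone": 0.04, "ncig": 10.0}]
betas = {"M": [-5.0, 4.0, 0.0], "S": [-5.0, 2.0, 0.003]}


def transform(r):
    # constant, OZONE**0.5, NCIG**1.5 (the exponents selected by the grid search)
    return [1.0, r["ozone"] ** 0.5, r["ncig"] ** 1.5]


ozone_mean = sum(r["ozone"] for r in rows) / len(rows)


def raise_ozone(r):
    # add 5 percent of the sample mean ozone concentration to each observation
    return {**r, "ozone": r["ozone"] + 0.05 * ozone_mean}


delta = mean_change(rows, betas, "M", transform, raise_ozone)
print(delta * 10_000)  # change per 10,000, the scaling used in Table 6-9
```

Because the perturbation is applied to every individual before the probabilities are averaged, current nonsmokers and never smokers contribute nonzero changes, which is the advantage over the elasticity comparison noted above.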
As Table 6-9 indicates, depending upon the specific
model of interest, changes in either or both measures of
smoking as well as ambient ozone concentrations can affect
the likelihood of an individual's reporting a minor or
severe respiratory impairment. For instance, in the
RADM-CA model a 5 percent increase from the sample mean in
the average daily maximum 1-hour ozone concentration
increases the estimated risk of a minor respiratory
impairment on any given day by an average of $1.76 \times 10^{-4}$. A
comparable increase in risk is predicted to result from an
individual's having smoked slightly more than an
incremental one pack per day for two years (since one pack
per day for one year adds $0.75 \times 10^{-4}$ to the risk of a
RADM-CA). The same model reveals that a ten percent
increase in the average daily maximum ozone concentration
is about equivalent to an increase of an extra one pack a
day smoked for five years in terms of incremental risk
($3.50 \times 10^{-4}$ and $3.85 \times 10^{-4}$, respectively).
We can also compare the incremental risks of air
pollution with those associated with current cigarette
consumption. For instance, from the RADS-CA model, a five
percent increase in ambient ozone concentrations increases
the baseline risk of a severe acute respiratory illness by
$0.89 \times 10^{-4}$. This is about one-twelfth the effect of an
individual's currently smoking an additional half pack of
-------
Table 6-9
Estimated Mean Changes from
Baseline Probabilities $\bar{\pi}$ (x10,000)

                                        RADM-CA   RADS-CA   RADM-A   RADS-A
Baseline $\bar{\pi}$                     60.202    50.671   35.331   40.910

Hypothetical Control Change:
$\mathrm{OZONE}_t + .05 \cdot \overline{\mathrm{OZONE}}$      1.76      0.89     0.98     0.35
$\mathrm{OZONE}_t + .10 \cdot \overline{\mathrm{OZONE}}$      3.50      1.75     1.95     0.68
$\mathrm{NCIG}_t + 5$                      --       4.47      --      3.46
$\mathrm{NCIG}_t + 10$                     --      10.80      --      8.40
$\mathrm{PACKS}_t + 365$                  0.75      0.14     0.29      --
$\mathrm{PACKS}_t + 1825$                 3.85      0.71     1.49      --

Notes: "--" signifies negative predicted change.
$\overline{\mathrm{OZONE}}$ signifies the sample mean concentration of
$\mathrm{OZONE}_t$ (.0426).
-------
6-21
cigarettes per day. Comparable calculations could be made
in the other models as well (although we prefer not to go
into detail here since the significance levels on the
variables of interest are not sufficiently high to warrant
large confidence in the estimated risk changes).
For purposes of public policy, it would be desirable
to go beyond the estimation of relative risks to consider
the cost and efficacy of "control" measures. This would
permit at least crude cost-effectiveness comparisons to be
made. Some estimates are available regarding ozone
reductions. White [6] reports that when the National
Ambient Air Quality Standard for ozone was reviewed in
1978, the marginal cost of meeting a standard of 0.12 ppm
as opposed to one of 0.14 ppm was approximately $2.0
billion. Although the form of that standard (second
highest hourly reading at a monitor) differs from the
measurement of ozone in this study (average daily maximum
one-hour reading during a two-week period), a link between
the two could be made. This would permit an estimate of
the costs per unit of predicted ozone risk reduction,
holding other possibly beneficial effects of ozone
reduction (agricultural productivity increases, for
example) constant.
In principle, estimates could be assembled on the
costs of reducing cigarette consumption (for an excellent
discussion of the nature of such costs, see [1]). Using
-------
6-22
such data, and the results presented above, estimates of
cost per unit of reduced risk from smoking could be
derived and compared with those resulting from pollution
control. Finally, if appropriate allowances were made for
the qualitatively different nature of the two risks -- the
differing degrees of voluntarism, for instance -- it would
be possible to draw inferences about potentially efficient
resource allocation.
-------
6-23
APPENDIX
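Given an independent sample of T observations, the
likelihood function is

$$L = \prod_{t=1}^{T} \Big\{ n_t! \prod_{r \in \Omega} \big[ (\pi_{rt})^{n_{rt}} / n_{rt}! \big] \Big\}. \qquad (A.1)$$

In logs,

$$\ell = \sum_{t=1}^{T} \Big\{ \sum_{r \in \Omega} n_{rt} \Big[ X_t \beta_r - \log\Big( \sum_{s \in \Omega} \exp(X_t \beta_s) \Big) \Big] \Big\} + c, \qquad (A.2)$$

where c does not depend on $\beta = (\beta_M, \beta_S)$. $\ell$ is concave in $\beta$,
thus assuring convergence.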
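A Newton-Raphson algorithm programmed in SAS's PROC
MATRIX is used for estimation. Except for the adjustment
for the multiple-trial nature of the data, the vector of
first derivatives and matrix of second partials of $\ell$ with
respect to $\beta$ are identical to those of the more familiar
single-trial multinomial logit model. Thus, with
$n_t = \sum_{r \in \Omega} n_{rt}$,

$$\frac{\partial \ell}{\partial \beta_q} = \sum_{t=1}^{T} (n_{qt} - n_t \pi_{qt}) X_t'$$

and

$$\frac{\partial^2 \ell}{\partial \beta_q \, \partial \beta_p'} = -\sum_{t=1}^{T} n_t \pi_{qt} (\delta_{qp} - \pi_{pt}) X_t' X_t,$$

where $q, p \in \{M, S\}$, and $\delta_{qp}$ is the Kronecker delta. The
information matrix estimate is

$$-\frac{\partial^2 \ell}{\partial \beta \, \partial \beta'}$$

evaluated at $\hat{\beta}$; its inverse serves to estimate $\mathrm{Cov}(\hat{\beta})$.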
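For readers without access to the original SAS program, the Newton-Raphson iteration for the multiple-trial multinomial logit can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: plain lists and a small Gaussian-elimination helper are used to avoid outside dependencies, the reference outcome is stored last in each count tuple, and the function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(X, counts, n_outcomes=2, iters=25, tol=1e-10):
    """Newton-Raphson for a multiple-trial multinomial logit.

    X      : list of regressor lists X_t (length K each)
    counts : list of tuples (n_1t, ..., n_Jt, n_ref_t); the last entry is the
             reference outcome, whose coefficients are normalized to zero
    Returns the stacked estimate [beta_1, ..., beta_J], each block of length K.
    """
    K, J = len(X[0]), n_outcomes
    beta = [0.0] * (J * K)
    for _ in range(iters):
        grad = [0.0] * (J * K)
        info = [[0.0] * (J * K) for _ in range(J * K)]  # minus the Hessian
        for x, n in zip(X, counts):
            n_t = sum(n)
            e = [math.exp(sum(beta[q * K + k] * x[k] for k in range(K)))
                 for q in range(J)]
            denom = 1.0 + sum(e)
            pi = [v / denom for v in e]
            for q in range(J):
                for k in range(K):
                    grad[q * K + k] += (n[q] - n_t * pi[q]) * x[k]
                    for p in range(J):
                        w = n_t * pi[q] * ((q == p) - pi[p])
                        for j in range(K):
                            info[q * K + k][p * K + j] += w * x[k] * x[j]
        step = solve(info, grad)  # Newton step: inverse information times gradient
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Intercept-only check: the estimates should equal the log odds of the pooled
# counts relative to the reference outcome.
X = [[1.0], [1.0], [1.0]]
counts = [(1, 0, 4), (1, 1, 2), (0, 0, 1)]
print(fit(X, counts))
```

With only a constant in the model, the maximum likelihood estimates are the log ratios of the pooled outcome counts to the pooled reference count, which gives a simple way to confirm that the gradient and information matrix are coded consistently with the expressions above.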
-------
6-25
Notes
1. See Manning et al. [4].
2. Lave [2], p. 2.
3. It is obvious that the "target" groups in a smoking
cessation or mitigation policy differ from those
in a policy designed to reduce ambient concentrations
of air pollution. One might argue that
a critical difference is that smokers assume their
risks voluntarily whereas exposure to ambient air
pollution is largely involuntary; policy measures,
it is argued, should be more concerned with those
risks assumed involuntarily, these being more in the
nature of classic economic externalities. However,
the recently mounting evidence on the health
consequences of passive smoking suggests that the
target groups in smoking mitigation policies might
well extend beyond the population of voluntary
smokers. To the extent that passive smoking is
involuntary, in the sense that the costs
associated therewith have not been capitalized by
market forces, the distinction between the
air pollution and smoking policy target groups tends
to blur.
4. All illnesses reported in the HIS are coded as either
chronic or acute. Regardless of the interval between
incidence and time of survey, some illnesses are --
by definition -- coded as chronic due to their
intrinsically chronic nature (e.g. emphysema, lung
cancer, most cardiovascular problems). Moreover,
illnesses that might otherwise be classified as
acute are classified as chronic if the interval
between their incidence and the time of the interview
exceeds three months. Thus, an acute illness,
according to the NCHS codification scheme, is an
illness that is typically construed as acute and
that has had a duration of less than three months
at the time of the interview.
5. It is admittedly troublesome that the signs of the
estimated coefficients for either NCIG or PACKS are
negative -- although not statistically distinguishable
from zero -- in some of the specifications. We
suspect that this phenomenon is attributable largely
to collinearity between the two measures; indeed,
their sample correlation is 0.55.
On a priori grounds, as argued earlier, both
should be included in a model of respiratory illness.
However, if collinearity is severe, their separate
influences become difficult to identify. To explore
this possibility further, we estimated two alternative
versions of the multinomial model for both
specifications (CA, A) of the RRAD measures, one in
which NCIG, but not PACKS, is included, and one in
which PACKS, but not NCIG, is included. The results
largely corroborate the collinearity hypothesis: in
all cases, the estimates of the parameters associated
with the single included smoking measure are positive
for both the RADM and RADS probabilities.
-------
6-28
REFERENCES
[1] Atkinson, A.B. and T.W. Meade. "Methods and
Preliminary Findings in Assessing the Economic and
Health Services Consequences of Smoking, with
Particular Reference to Lung Cancer," Journal of the
Royal Statistical Society A 137, pp. 297-312, 1974.
[2] Lave, Lester B. Quantitative Risk Assessment in
Regulation. Washington: Brookings, 1982.
[3] Maddala, G.S. Limited-Dependent and Qualitative
Variables in Econometrics. Cambridge: Cambridge
University Press, 1983.
[4] Manning, W., J. Newhouse, and J. Ware. "The Status of
Health in Demand Estimation; or, Beyond Excellent,
Good, Fair, Poor," in V. Fuchs, ed. Economic Aspects
of Health. Chicago: University of Chicago Press for
NBER, 1982.
[5] Warner, Kenneth E. "Possible Increases in the
Underreporting of Cigarette Consumption," Journal of
the American Statistical Association 73, pp. 314-318,
1978.
[6] White, Lawrence. Reforming Regulation: Processes and
Problems. Englewood Cliffs, NJ: Prentice-Hall, 1981.
-------
Chapter 7
CHRONIC RESPIRATORY DISEASE
In the initial analysis in Volume I of ozone and chronic respiratory
disease (CRD), several regressions were estimated over what was referred to
as a "residentially stable" group of individuals. That is, the
observations were restricted to those individuals who had been living in
the same place for five years at the time they were interviewed in the 1979
HIS. The purpose of this restriction was to reduce the chances that
someone who had lived in another location for a long time would be matched
up to air pollution exposures at his or her new location, thus confounding
our analysis of CRD. Our findings in Volume I (see especially p. 4-71)
suggested that concentrating on the residentially stable made a difference
in the conclusions one draws from such analysis.
However, the five year residency requirement we imposed in that
analysis is itself rather weak. Accordingly, in analysis conducted since
the completion of Volumes I and II, we have reexamined the incidence of
CRD -- and its possible link to air pollution -- using a group of individuals
who had lived for at least ten years at the location they reported in the
1979 HIS. While this does not eliminate the possibility of spurious
correlation, it lessens it when compared to the five-year residency
restriction imposed earlier. These results are reported here.
These results are responsive in other ways to comments and suggestions
on our earlier work. For instance, in response to puzzlement over the
-------
7-2
relatively weak performance of the smoking variables in explaining CRD in
the earlier work, we included in the reanalysis the variable PACKYRS. This
measure, described in detail in Chapter 5, proxies individuals' lifetime
smoking habits. It is included along with NCIGS, a measure of current
smoking activity. Also, we have purged the list of regressors of many
which had little or no explanatory power in the original analysis. In this
respect, the models estimated below are akin to the "lean" model in the
original analysis (see equation (29), p. 4-77 of Volume I). Finally, in
the analysis here we have included an additional measure of long-term air
pollution concentrations, one which takes data from just one year (1979)
but includes annual average readings for all monitors within 20 miles of
the respondents' census tract centroids. These are denoted as OZ79AV,
S479AV, and SP79AV for ozone, sulfates, and total suspended particulates,
respectively.
The analysis of CRD below differs from that in Volume I in one other
important respect. Here we have run separate regressions for those
individuals who received the "probe" questions concerning respiratory
illness and for those who did not. (Recall that in addition to the main
questionnaire, all respondents in the HIS were given one of six different
probes inquiring in detail about six specific disease categories. Thus,
one-sixth of the respondents were asked whether they had any of a number of
specific respiratory diseases; the other five-sixths of the sample was
probed (one-sixth each) about cardiovascular, genito-urinary,
musculoskeletal, digestive, and nervous system disorders.) Even those
individuals not receiving the respiratory probe could report the presence
-------
7-3
of CRD in open-ended questions earlier in the survey. However, those who
had a condition like asthma, and who forgot to volunteer that information
in the open-ended questions, would have the chance to report it if they
received the respiratory probe (where asthma is listed). They would not
have this opportunity if they received, say, the cardiovascular probe.
Because of this difference, it is of course possible that the reported
incidence of CRD might differ between the two groups. When we separated
the two groups, this is precisely what was found. The sample below
consists of 2,743 individuals who had lived in the same dwelling for at
least ten years at the time of the 1979 HIS. In addition, these were
individuals for whom complete data were available on the dependent and
independent variables of interest. Of the 2,743 individuals, 460 had
received the respiratory probe questionnaire while the remaining 2,283 had
been administered one of the other five probes. Of the 460 receiving the
respiratory probe, 67 (or 15 percent) reported the presence of a chronic
respiratory condition. Of the 2,283 not receiving the respiratory probe,
only 74 (or 3 percent) reported such a condition. Since the assignment of
the six probes was random, this suggests reporting differences that merit
separate investigation. This we do below.
The results of our limited reanalysis of the determinants of CRD are
presented below. All models are estimated using logit techniques.
Equations (1) - (5) pertain to those receiving the respiratory probe while
(6) - (10) include only individuals not receiving that probe. In equation
(1), exposures to air pollution are characterized by the annual average
daily one-hour maximum ozone concentration at the nearest monitor (OZ79NR),
-------
7-4
Table 7-1 Regression Results
[Coefficient estimates for equations (1) - (10) are illegible in the source scan.]
-------
7-5
Table 7-1 (cont'd.) Regression Results

VARIABLE     DESCRIPTION
OZ79NR       Average daily maximum one-hour ozone concentration in
             1979 at monitor nearest individual's residence (in
             parts per million)
OZ79AV       Same as above but averaged over all monitors within 20
             miles of residence (in ppm)
OZMULT       Average hourly reading over all monitors within 20 miles
             and averaged over the period 1974-79 wherever data were
             available
S479NR       Average 24-hour reading for sulfates for 1979 at nearest
             monitor (in micrograms per cubic meter)
S479AV       Same as above but averaged over all monitors within 20
             miles
SP79NR       Average 24-hour reading for total suspended particulates
             for 1979 at nearest monitor (in µg/m³)
SP79AV       Same as above but averaged over all monitors within 20
             miles
SPMULT       Average 24-hour reading for all monitors within 20 miles
             and averaged over 1974-79 wherever data were available
RACE         Dummy variable (=1 if white, =0 if other)
SEX          Dummy variable (=1 if male, =0 otherwise (female,
             ambiguous, etc.))
INCOME       1979 household income in dollars
EDUCATION    Years of school completed
AGE          In years
AGE²         Square of above
PACKYRS      Lifetime cigarette consumption
CIGS/DAY     Number of cigarettes per day currently smoked
-------
7-6
and by the annual daily average sulfate concentration, again at the nearest
monitor (S479NR). (Recall that annual averages are used in explaining
chronic illness rather than the concentrations during the two-week recall
period. The latter are the appropriate measures in analyses of acute
illness like those in Chapters 4 and 6 above.)
According to equation (1), annual average ozone concentrations are
positively and significantly associated with the likelihood of reporting
CRD in the probe group. Neither sulfates nor any of the other independent
variables are related to CRD in a statistically significant way, including
the more sophisticated smoking variable PACKYRS. In equation (2), sulfates
are replaced by total suspended particulate matter (also measured at the
nearest monitor) with virtually no change in the results. In equation (3)
both ozone and particulates are averaged over all the monitors within
twenty miles of the respondent's home. This reduces both the magnitude and
the significance of the estimated ozone effect. The
particulate estimate changes sign (it is expected to be positive) but is
still far from being significant. The size and significance of the
coefficient estimates on the other regressors are unaffected by this change
in the characterization of exposure. Equation (4) replicates (3) but with
sulfates substituted for total suspended particulates. The results are
virtually identical to those in (3) with none of the regressors being
significantly associated with the likelihood of CRD.
In equation (5), ozone and particulates are measured by the multiyear
(1974-1979) annual average concentration (see Volume I, Chapter 2,
especially p. 2-37). This change makes a substantial difference in the
-------
7-7
size and significance of the estimated ozone effect. In addition, the
parameter estimate associated with particulates increases substantially in
significance, although it is still well below conventionally accepted
levels (t = 1.96 connotes significance at the 5 percent level). As in
equations (1) - (4), none of the other regressors, including either smoking
variable, is significantly associated with CRD.
Equations (6) - (10) perform the same set of regressions as (1) - (5).
The difference is that the sample in the former consists of 2,283
individuals, none of whom received the respiratory probe as part of the
1979 HIS. Each of these individuals had the opportunity to report the
presence of a chronic respiratory disease in the open-ended part of the HIS
(and 74 did so), but they were not shown a list of CRDs and asked whether
they had any of them. As indicated above, only 3 percent of this group
reported CRD, as compared with 15 percent of the sample used in equations
(1) - (5).
The findings in (6) - (10) provide an interesting contrast to the
earlier ones. The ozone variable is never estimated to be significantly
associated with CRD in (6) - (10). However, the total suspended
particulates coefficient estimate is uniformly more significant in this
latter set of regressions. In fact, in equation (10) TSP is positively and
significantly (at the 5 percent level) associated with CRD. Sulfates
performed as weakly as in the earlier runs.
Of equal interest is the performance of other independent variables in
(6) - (10). For instance, income is negatively and significantly
associated with CRD in all five regressions. All other things equal,
-------
7-8
individuals having higher incomes are relatively less inclined to report
CRD. In addition, both cigarette smoking variables are significant.
PACKYRS, the measure of accumulated smoking history, is positively related
to the likelihood of CRD as one would expect. The sign of NCIGS is
negative, however, suggesting that current smokers are less likely to
experience CRD. One explanation for this seemingly counterintuitive
finding is that individuals who believe they have or have been diagnosed as
having CRD have in all likelihood quit smoking. If so, one would expect to
find only those free of CRD among individuals currently smoking.
Because our findings are quite sensitive to the choice of the "probe"
or "non-probe" samples, some discussion is required. It is our opinion that
the "probe" sample -- that is, those who received questions about particular
respiratory diseases -- is more likely to reflect accurately the incidence of
CRD in the United States. In fact, the National Center for Health
Statistics uses the results from the six different probes to make its
estimates of specific disease prevalence in the United States. On the
other hand, one must admit the possibility that at least some individuals
are motivated by the probe to report having some diseases of which they
have heard but for which they never received a professional diagnosis.
Concerning the poor performance of even the more sophisticated smoking
measures in (1) - (5), we intend to do additional work. One direction for
this work will be the disaggregation of the set of CRDs into
disease-specific analyses. For instance, it might be the case that smoking
(or other of the independent variables, for that matter) is related to the
incidence of emphysema but not to asthma or chronic sinusitis. By
-------
7-9
aggregating these different forms of CRD in the present analysis, we may be
obscuring disease-specific associations. This may also shed some light on
the role of ozone and other air pollutants in CRD.
-------
Chapter 8
ADDITIONAL SENSITIVITY ANALYSES
This chapter summarizes the results of some additional sensitivity
analyses conducted pursuant to a variety of comments and suggestions
received during the peer review phase of the project.
8.1 The Effects of Precipitation on Acute Health Status
It was suggested by several peer reviewers that the use of two-week
daily average precipitation (AVPRECIP) as a covariate in the acute health
status models was perhaps an inappropriate characterization of the threat
to health posed by precipitation. Rather, it was argued, a superior
characterization would account not only for the mean effects of
precipitation (as captured by AVPRECIP), but also for the variance
effects, i.e. the number of days during the two-week period on which
rainfall occurred. The hypothesis is that the same total amount of
precipitation during a two-week period (= 14*AVPRECIP) poses a different
risk to health (respiratory health, in particular) when spread out evenly
over the two-week period than when concentrated over a one- or two-day
span.
Our data enable the examination of such effects. The idea is to
construct a measure of precipitation that captures both the mean and the
variance effects. The measure we created to assess this question
(RAINDAY) is formulated as AVPRECIP divided by the average number of days
during the two-week period on which any precipitation occurred at all
(AVRAINYN). Thus, the measure can be construed as the average amount of
precipitation occurring on the days when any precipitation occurred at
all. The measure is positively related to the mean effects, but
negatively related to the variance effects.
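As an illustration only, the construction of the three precipitation measures can be sketched as follows. The sketch assumes that AVRAINYN is the fraction of days in the two-week window with any precipitation, and that RAINDAY is set to zero when no rain falls at all; the function name and the daily figures are hypothetical and do not come from the study data.

```python
def precipitation_measures(daily_precip):
    """Return (AVPRECIP, AVRAINYN, RAINDAY) for one two-week window.

    Assumptions (not from the report): AVRAINYN is the share of days with
    any precipitation, and RAINDAY is 0.0 when no precipitation occurred.
    """
    n_days = len(daily_precip)
    avprecip = sum(daily_precip) / n_days
    avrainyn = sum(1 for p in daily_precip if p > 0) / n_days
    rainday = avprecip / avrainyn if avrainyn > 0 else 0.0
    return avprecip, avrainyn, rainday


# Two hypothetical windows with the same total precipitation (1.4 units):
# one spread evenly over 14 days, one concentrated in two days.
even_window = [0.1] * 14
burst_window = [0.7, 0.7] + [0.0] * 12

even = precipitation_measures(even_window)
burst = precipitation_measures(burst_window)
```

Under these assumptions the two windows share the same AVPRECIP, while RAINDAY is larger for the concentrated window, which is the mean-versus-variance distinction the measure is meant to capture.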
In order to assess the possible effects of substituting this new
measure as an explanatory variable, we examined the sample correlation of
the three measures:
            AVPRECIP   AVRAINYN   RAINDAY
AVPRECIP       1.000      0.533     0.790
AVRAINYN                  1.000     0.038
RAINDAY                             1.000
The high correlation (0.790) between AVPRECIP and RAINDAY has led
us to conclude that the substitution of the latter measure for the former
in our acute health models would probably have little material influence
on the results. Thus, while the question of the appropriate
characterization of weather stress in statistical models of illness risks
is certainly an interesting one that merits additional study, it seems
reasonable to suggest that such additional effort in the present analysis
would probably not lead to additional clarification of the air pollution -
health effects relationships of primary interest.
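A sample correlation of the kind tabulated above can be computed as in the following minimal sketch. The five observations are hypothetical and serve only to show the calculation; they will not reproduce the reported correlations.

```python
import math


def pearson(x, y):
    """Sample (Pearson) correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for five respondents (not study data).
avprecip = [0.05, 0.10, 0.02, 0.20, 0.12]
avrainyn = [0.50, 0.40, 0.10, 0.60, 0.30]
rainday = [p / f for p, f in zip(avprecip, avrainyn)]

names = ["AVPRECIP", "AVRAINYN", "RAINDAY"]
series = [avprecip, avrainyn, rainday]
corr = {(a, b): pearson(x, y)
        for a, x in zip(names, series)
        for b, y in zip(names, series)}
```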
8.2 Sample Size, Model Specification, and Parameter Estimate Sensitivity
In many of the models estimated in Volume I, the point estimates of
the relationships between air pollution and illness varied across
specifications depending on what set of regressors was used. The addition
or deletion of regressors not only implied respecifications of the null
hypotheses under test, but also typically necessitated different sample
sizes on which the estimation was performed. In most cases, the varying
sample sizes were attributable to the fact that the data availability for
the various air pollution measures differed by pollutant, so that when
different sets of pollution measures were tested, the sample sizes varied
accordingly.
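The way in which the estimation sample shrinks as pollutants are added can be illustrated with a simple complete-case filter. This is a sketch under the assumption that an observation enters a specification only if every pollutant in that specification is available for it; the three records and the helper name are hypothetical.

```python
def estimation_sample(records, pollutants):
    """Keep only records with a non-missing value for every listed pollutant."""
    return [r for r in records
            if all(r.get(p) is not None for p in pollutants)]


# Hypothetical records; None marks a pollutant with no monitor reading.
records = [
    {"id": 1, "03NR01": 0.04, "S4NR01": 9.1, "SPNR01": 61.0,
     "N2NR01": 0.03, "CONR01": 1.2},
    {"id": 2, "03NR01": 0.05, "S4NR01": 8.4, "SPNR01": 70.0,
     "N2NR01": 0.02, "CONR01": None},
    {"id": 3, "03NR01": 0.03, "S4NR01": 7.7, "SPNR01": None,
     "N2NR01": None, "CONR01": None},
]

spec_49 = ["03NR01", "S4NR01", "SPNR01"]
spec_50 = spec_49 + ["N2NR01", "CONR01"]

sample_49 = estimation_sample(records, spec_49)
sample_50 = estimation_sample(records, spec_50)
```

Because the pollutant list of the larger specification contains that of the smaller one, the sample for the larger specification is always a subset of the sample for the smaller one.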
Sample selection considerations aside, the effects of using these
different sample sizes should be manifested only in the efficiency
properties of the estimators. However, inferences about the relationship
between air pollution and illness outcomes depend on the sample and
specification used. Thus, an understanding of the cause of the variance
in parameter estimates seems essential.
We concentrate our analysis of this phenomenon on models (49) and
(50) estimated in Volume I. Here, it is noteworthy that the addition of
the covariates N2NR01 and CONR01 in (50) to the set of pollutants included
in specification (49) (i.e., 03NR01, S4NR01, and SPNR01) has at least two
important implications. First, the estimated coefficient associated with
03NR01 falls from 1.87 to 1.41, and the associated t-statistic drops from
2.32 to 1.46. Second, owing to the relative paucity of CO data, the
estimation subsample in (50) is about 25 percent smaller than that used in
(49): 3,703 versus 4,899, respectively. Thus, one is necessarily led to
the question: Is the change in the estimated ozone coefficient and its
significance level due to the inclusion of the additional covariates, to
the smaller subsample, or to both?
To investigate this important question, we reestimated equation (49)
using the same sample of 3,703 on which specification (50) was estimated.
The results of this exercise are reported in Table 8-1 below. There it is
-------
Table 8-1

DEP VARIABLE: TRADRSP

                      SUM OF        MEAN
SOURCE      DF       SQUARES      SQUARE    F VALUE    PROB>F
MODEL       18     97.427923    5.412662      3.757    0.0001
ERROR     3684      5307.067    1.440572
C TOTAL   3702      5404.495

ROOT MSE    1.200239    R-SQUARE    0.0180
DEP MEAN    0.183365    ADJ R-SQ    0.0132
C.V.         654.563

                      PARAMETER       STANDARD    T FOR H0:
VARIABLE      DF       ESTIMATE          ERROR  PARAMETER=0  PROB > |T|
INTERCEP       1       0.767031       0.450348        1.703      0.0886
03NR01         1    [illegible]       0.924612        1.920      0.0550
S4NR01         1     0.00287839    0.004161826        0.692      0.4892
SPNR01         1     0.00032749   0.0006888529        0.475      0.6??5
RACEW1B0       1       0.105050       0.061582        1.706      0.0881
[SEXM1F0]      1       0.044511       0.044609        0.998      0.3184
MARY1N0        1       0.012?69       0.046157        0.279      0.7804
INCOMCON       1    .0000062991   .00000239667        2.628      0.0086
FAT            1      -0.775180       0.316011       -2.453      0.0142
FATSQ          1       0.156010       0.061056        2.555      0.0107
AGE            1       0.010623    0.006666158        1.594      0.1111
AGESQ          1    0.000111508   0.0000702869        1.586      0.1127
SMOKY1N0       1       0.046446       0.042435        1.095      0.2738
[EDCOMCON]     1    0.000489087    0.006831445        0.072      0.9429
CHRLMDUM       1       0.256219       0.056659        4.522      0.0001
[illegible]*   1     0.00179559    0.002027094        0.886      0.3758
[illegible]*   1     0.00312125    0.001211361        2.577      0.0100
AVPRECIP       1      -0.020593       0.217254       -0.095      0.9245
HUMIDRF        1    0.003097651    0.002611947        1.186      0.2357

Note: "?" marks a character that is illegible in the source. Names in
brackets are illegible in the source and are taken from the variable list
of Table 8-3, which reports the same specification.
* One of these two rows is DMAXTEMP; the other name is illegible.
-------
seen that the estimated coefficient associated with 03NR01 is 1.77 with a
t-statistic of 1.92. This coefficient estimate is relatively close to the
1.87 estimated on the larger subsample, well within one-half of the
standard error of either estimate. It seems then that the difference
between the 1.77 value in the reestimated model (49) and the 1.41 value
estimated in model (50) should be largely attributed to the inclusion of
the two additional pollution covariates. That such a change results is of
little surprise given that the partial correlation between 03NR01 and
N2NR01 is large (0.281). In the presence of such high correlation, one
would expect that the separate influences of O3 and NO2 would be more
difficult to identify than would be the case if the two measures were
orthogonal. The results of this exercise are somewhat reassuring given
that samples of varying sizes were used in estimation throughout Volume I.
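The comparison of the two ozone estimates can be checked with simple arithmetic. The sketch below recovers each standard error as the coefficient divided by its t-statistic, using only the figures quoted in the text; the variable names are illustrative.

```python
# Ozone coefficient and t-statistic for model (49), as quoted in the text.
coef_full, t_full = 1.87, 2.32   # larger sample (4,899 observations)
coef_sub, t_sub = 1.77, 1.92     # reestimated on the 3,703-observation sample

# Standard error implied by each pair (coefficient / t-statistic).
se_full = coef_full / t_full
se_sub = coef_sub / t_sub

# Gap between the two estimates, compared with half of the smaller
# implied standard error.
gap = abs(coef_full - coef_sub)
within_half_se = gap < 0.5 * min(se_full, se_sub)

# For comparison, the gap attributable to adding N2NR01 and CONR01.
gap_specification = abs(coef_sub - 1.41)
```

On these figures the gap attributable to the change of sample is about 0.10, while the gap attributable to the change of specification is about 0.36.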
In summary, on the basis of this (admittedly small-scale) exercise,
it seems fair to say that the effects of using different estimation
samples were indeed largely restricted to efficiency effects, and that the
dispersion of the estimates of the air pollution - illness relationship
should be attributed not to the different samples used, but rather, as
would be hoped, to the different specifications tested.
8.3 Poisson Regression Analysis of Volume I Models (48), (49), and (50)
In the later phases of our research, we have largely turned our
attention to estimation techniques which we believe better treat the
nature of our dependent variables than the methods utilized in the
large-scale analyses presented in Volume I. Insofar as the
restricted-activity-day measures of illness are concerned, the Poisson
regression technique (described in detail in Chapter 4) has been a
preferred estimation method. While the larger part of our analysis using
this methodology is presented in Chapter 4, it has been proposed by some
of our reviewers that for purposes of comparison we reestimate using
Poisson methods some of the specifications that were estimated by OLS in
Volume I. Three such reestimations are presented here.
We elect to concentrate this effort on the total respiratory
restricted activity days (TRADRSP) models whose OLS estimates were
presented as models (48)-(50) in Volume I. Recall that these models were
formulated on three different assumptions about which air pollutants
should be included as explanatory variables. Models (48)-(50) specified
the set of air pollution regressors as, respectively, {03NR01, S4NR01},
{03NR01, S4NR01, SPNR01}, and {03NR01, S4NR01, SPNR01, N2NR01, CONR01}.
Due to availability of pollution data, the samples on which these
specifications were estimated had varying numbers of observations;
respectively, these were 4,906 (197); 4,899 (197); and 3,703 (154), where
the figures in parentheses are the number of observations having positive
TRADRSP realizations.
The results of the Poisson reestimations are presented in Tables 8-2
through 8-4. There it is seen that inferences drawn in Volume I about the
relationship between ozone and respiratory-related restricted activity
days are largely corroborated by the reanalysis. Specifically, in all
three specifications, the coefficient estimate associated with 03NR01 is
positive, and statistically different from zero. (Recall from Chapter 4,
however, that these significance levels are perhaps overstated. The
robust covariance estimation techniques used in Chapter 4 are not used in
this reanalysis, so that some caution should be exercised in interpreting
significance levels. However, recall also that the parameter estimates
-------
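Table 8-2

*** NUMBER OF OBSERVATIONS ***
N OBS    N POS    N ZERO
 4906      197      4709

*** PARAMETER ESTIMATES ***
VARIABLE       BETA HAT       STD ERR      T STAT
INT
03NR01

(The remaining entries of this table are illegible in the source; the only
partially legible value, in the BETA HAT column, reads 0.97?1?7.)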
-------
Table 8-3

*** NUMBER OF OBSERVATIONS ***
N OBS    N POS    N ZERO
 4899      197      4702

*** PARAMETER ESTIMATES ***
VARIABLE         BETA HAT        STD ERR      T STAT
INT              -1.95552      0.63180?5    -3.06347
03NR01            10.2733        1.44988      7.0856
S4NR01        -0.00626601     0.00728557   -0.860058
SPNR01        -0.00225262     0.00126152    -1.78563
RACEW1B0         0.669553       0.131209     5.10296
SEXM1F0          0.175727      0.0762988     2.30315
MARY1N0       [illegible]      0.0786125      -1.066
INCOMCON      -.000028456    0.000004451    -6.39313
FAT               -1.7037       0.340436    -5.00447
FATSQ            0.325624      0.0581044     5.60412
AGE             0.0544776      0.0120978      4.5031
AGESQ         -.00060224?    0.000127149    -4.73653
SMOKY1N0         0.247066      0.0725259     2.85506
EDCOMCON       0.00911445      0.0118421    0.769665
CHRLMDUM          1.03795      0.0791518     13.1134
[illegible]   -0.00682027     0.00302260    -2.25636
[illegible]    -0.0161773     0.00199425    -8.11196
[illegible]      0.702162       0.359582     1.95272
[illegible]     0.0148392     0.00460774      3.2205

Note: "?" marks a character that is illegible in the source. The names of
the last four variables are illegible in the source, apart from DMAXTEMP,
whose row cannot be determined.
-------
Table 8-4

*** NUMBER OF OBSERVATIONS ***
N OBS    N POS    N ZERO
 3703      154      3549

*** PARAMETER ESTIMATES ***
(The source does not preserve the row alignment of this table. The legible
entries of each column are listed in the order in which they appear; "?"
marks an illegible character.)

VARIABLE:  INT, 03NR01, S4NR01, SPNR01, RACEW1B0, SEXM1F0, MARY1N0,
           INCOMCON, FAT, HUMIDRF, N2NR01, CONR01, SMOKY1N0, EDCOMCON

BETA HAT:  1.50828, 8.45847, 0.1?061, .000039?1, 0.058130?, .000596945

STD ERR:   0.691047, 0.00904154, 0.00146432, 0.0947231, 0.0881084,
           .0000051072, 0.357296, 0.0595565, 0.0134132, 0.00679773,
           0.961681, 0.0113383, 0.6813664, 0.0?736?, 0.01139?,
           0.00261315, 0.0254361, 0.003335??, 0.0022185, 0.432224,
           0.00518343

T STAT:    2.1826, 5.12837, -2.7466, 4.83992, 1.72194, 1.29456, 7.66391,
           4.87693, 5.80728, 4.33381, 4.31483, 3.27755, 0.533707,
           11.0007, 3.39926, 0.42321, 0.0403237, [illegible]
-------
themselves should be consistent.) In addition to the ozone relationship,
the other estimated relationships are largely in line with those reported
in the original Volume I specifications estimated by OLS.
The upshot of this analysis, then, is that the inferences suggested
in Volume I seem substantiated, and while the magnitudes of the estimated
responses do differ (as would be expected with different estimation
techniques), the direction and general magnitudes of the estimates are
quite comparable.
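For readers unfamiliar with the technique, the following is a minimal sketch of Poisson regression by maximum likelihood with an intercept and a single covariate, fitted by Newton-Raphson. It is not the estimation code used in this study: the function name, the fixed iteration count, and the six data points are illustrative assumptions, and the standard error returned is the ordinary ML one, not the robust estimate discussed in Chapter 4.

```python
import math


def fit_poisson(x, y, iterations=25):
    """ML fit of E[y | x] = exp(a + b*x) by Newton-Raphson.

    Returns (a, b, se_b), where se_b is the ML standard error of b taken
    from the inverse of the information matrix at the final estimates.
    """
    a, b = math.log(sum(y) / len(y)), 0.0   # start from the constant model
    for _ in range(iterations):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score vector of the Poisson log-likelihood.
        g_a = sum(yi - mi for yi, mi in zip(y, mu))
        g_b = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian).
        h_aa = sum(mu)
        h_ab = sum(mi * xi for xi, mi in zip(x, mu))
        h_bb = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system explicitly.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    # Recompute the information matrix at the final estimates.
    mu = [math.exp(a + b * xi) for xi in x]
    h_aa = sum(mu)
    h_ab = sum(mi * xi for xi, mi in zip(x, mu))
    h_bb = sum(mi * xi * xi for xi, mi in zip(x, mu))
    se_b = math.sqrt(h_aa / (h_aa * h_bb - h_ab * h_ab))
    return a, b, se_b


# Hypothetical counts: mean 1 when x = 0 and mean 4 when x = 1.
x = [0, 0, 0, 1, 1, 1]
y = [0, 1, 2, 2, 4, 6]
a_hat, b_hat, se_b = fit_poisson(x, y)
```

With a single binary covariate the fitted means reproduce the two group means, so the slope estimate equals the log of their ratio, which makes the example easy to verify by hand.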
8.4 Sensitivity to Aggregation Across Smoking and Chronic Illness Status
A common econometric problem occurs when disparate structures are
mistakenly assumed to be identical. When empirical analysis proceeds by
aggregating the disparate structures and estimating as if they were
identical, it will generally be the case that none of the structures will
be estimated consistently. It has been suggested that insofar as the
health outcome models estimated in Volume I are concerned, such
aggregation bias poses a potential problem when the structures of the
health outcome models are assumed to be the same across either smoking
status or chronic illness categories.
In the present section, we undertake a reanalysis of some of the
specifications estimated in Volume I, considering the possibility that
individuals' illness responses to covariates are different depending on
whether they are never, former, or current smokers, and on whether they
are or are not plagued by a chronic respiratory condition.
The first analysis, that of differential responsiveness across
smoking status, uses Poisson regression analysis of the TRADRSP
dependent variable. The sample sizes used for the groups of never,
former, and current smokers are, respectively, 1,439 (47); 565 (26); and
1,243 (47), where again the numbers of positive TRADRSP realizations are
given in parentheses. The set of air pollution regressors is limited in
this exercise to ozone and sulfates.
The results of this analysis are presented in Tables 8-5 through 8-7
in which both the Poisson ML covariance estimates and those obtained using
the robust methods discussed in Chapter 4 of this volume are presented.
These results reveal an interesting pattern of the relationship between
ozone and TRADRSP. While Table 8-5 shows the estimated relationship
between ozone and TRADRSP to be negative (though statistically
indistinguishable from zero) for never smokers, entirely different, and
somewhat surprising, inferences are drawn about the relationship between
ozone and acute respiratory illness for the groups of former and current
smokers. In Tables 8-6 and 8-7 it is seen that the estimated ozone effect
for both these groups is positive and statistically significant at
conventional levels even when the robust estimates of the parameter
standard errors are used. The magnitude of the response appears to be
largest for the group of former smokers, although the physiological
underpinnings of this phenomenon are not obvious. While we have not
tested statistically for whether the structures of the models for the
three groups are the same (using, e.g., a likelihood ratio test), the
results suggest that a reasonable conjecture is that such tests would
reject the hypothesis of homogeneity.
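The likelihood ratio test mentioned above, which was not carried out in this study, could be sketched as follows. The log-likelihood values and the degrees of freedom are hypothetical placeholders chosen only to show the mechanics; the closed-form tail probability used here is valid for even degrees of freedom only.

```python
import math


def lr_statistic(loglik_pooled, logliks_by_group):
    """LR statistic: twice the gain in log-likelihood from separate fits."""
    return 2.0 * (sum(logliks_by_group) - loglik_pooled)


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical maximized log-likelihoods (NOT results from this report):
# one pooled model versus separate never, former, and current smoker models.
loglik_pooled = -1250.0
logliks_by_group = [-480.0, -230.0, -520.0]

# Assumed restriction count: (3 groups - 1) x 11 common parameters.
df = 2 * 11

lr = lr_statistic(loglik_pooled, logliks_by_group)
p_value = chi2_sf_even(lr, df)
```

Homogeneity would be rejected at a chosen significance level when the p-value falls below that level. In practice the three models would first need a common parameter set, since the current smoker specification contains a cigarette consumption variable that the others do not.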
In the second analysis, we use OLS to assess the possibility that the
structures of the TRADRSP models differ depending on whether an individual
has a chronic respiratory illness. The analysis is somewhat hampered
because only a small number of individuals (364) in this estimation sample
-------
Table 8-5

*** NUMBER OF OBSERVATIONS ***
N OBS    N POS    N ZERO
 1439       47      1392

*** PARAMETER ESTIMATES ***
VARIABLE         BETA HAT        STD ERR       T STAT
INT               2.1164?       0.561939      3.76634
03NR01           -1.72325        4.12275    -0.417984
S4NR01         -0.0921191       0.018894     -4.87558
RACEW1B0           1.4318       0.327568      4.37101
SEXM1F0         0.0116624       0.162797    0.0716374
INCOMCON      -.000043179    .0000086888     -4.96956
AGE            0.00137636     0.00382972      0.35939
EDCOMCON         0.011561      0.0210916     0.548135
CHRLMDUM          1.03309       0.171375      6.02825
AVMAXTMP       9.8738E-06     0.00524898   0.00188108
AVPRECIP         0.811043       0.720308      1.12597

*** PARM. ESTS. (ROBUST VARIANCE ESTIMATES) ***
                                  ROBUST
VARIABLE         BETA HAT        STD ERR       T STAT
INT               2.1164?        1.94121      1.09027
03NR01           -1.72325         6.3755    -0.270292
S4NR01         -0.0921191      0.0417871     -2.20449
RACEW1B0           1.4318       0.518293      2.76253
SEXM1F0         0.0116624       0.462579    0.0252116
INCOMCON      -.000043179    .0000256016     -1.68659
AGE            0.00137636      0.0085561     0.160863
EDCOMCON         0.011561      0.0603543     0.191552
CHRLMDUM          1.03309       0.511635      2.01919
AVMAXTMP       9.8738E-06      0.0133843  0.000737712
AVPRECIP         0.811043        1.50329      0.53951

Note: "?" marks a character that is illegible in the source.
-------
Table 8-6

*** NUMBER OF OBSERVATIONS ***
N OBS    N POS    N ZERO
  665       26       639

*** PARAMETER ESTIMATES ***
VARIABLE         BETA HAT        STD ERR       T STAT
INT               29.1589         400480   0.00007281
03NR01            16.6943        3.50114      4.76824
S4NR01          0.0321111      0.0103115  [illegible]
RACEW1B0          29.491?         400480  .0000736409
SEXM1F0          0.643355       0.196764      3.44474
INCOMCON      .0000026724    .0000115801     0.230776
AGE            0.00746927     0.00629732       1.1861
EDCOMCON       -0.0547761      0.0333252     -1.64368
CHRLMDUM         0.462081         0.2063      2.23986
AVMAXTMP        0.0388972     0.00613555      6.33965
AVPRECIP           -2.342        1.17849     -1.98729

*** PARM. ESTS. (ROBUST VARIANCE ESTIMATES) ***
                                  ROBUST
VARIABLE         BETA HAT        STD ERR       T STAT
INT               29.1589        1.70479      17.1041
03NR01            16.6943        6.96203       2.3979
S4NR01          0.0321111      0.0250393      1.28243
RACEW1B0          29.491?       0.4108?4      71.7745
SEXM1F0          0.643355       0.494653       1.2876
INCOMCON      .0000026724    .0000240709     0.111023
AGE            0.00746927      0.0133358     0.560093
EDCOMCON       -0.0547761      0.0810222    -0.676062
CHRLMDUM         0.462081       0.659609     0.700538
AVMAXTMP        0.0388972      0.0133654      2.91028
AVPRECIP           -2.342        2.59748    -0.901644

Note: "?" marks a character that is illegible in the source.
-------
Table 8-7

*** NUMBER OF OBSERVATIONS ***
N OBS    N POS    N ZERO
 1243       47      1196

*** PARAMETER ESTIMATES ***
VARIABLE         BETA HAT        STD ERR       T STAT
INT               2.45614       0.446641      5.04713
03NR01            9.16519        3.29159      2.78443
S4NR01         -0.0037685      0.0133045    -0.283251
RACEW1B0         0.896108       0.249973      3.58481
[SEXM1F0]        0.317721       0.139093      2.28424
INCOMCON       .000046108     .000008284      5.56598
AGE           -.00004272?     0.00468037  -0.00912914
NCIGSDYN         0.015498     0.00432861      3.58035
EDCOMCON         0.049784      0.0265047      1.87831
[CHRLMDUM]       0.670802       0.165151      4.06175
AVMAXTMP        0.0247194     0.00446028      5.54213
AVPRECIP          4.33699       0.614482      7.05796

*** PARM. ESTS. (ROBUST VARIANCE ESTIMATES) ***
                                  ROBUST
VARIABLE         BETA HAT        STD ERR       T STAT
INT               2.45614        1.61305      1.52267
03NR01            9.16519        4.26883      2.14701
S4NR01         -0.0037685      0.0218712    -0.172304
RACEW1B0         0.896108       0.647155      1.40642
[SEXM1F0]        0.317721       0.462594     0.686026
INCOMCON       .000046108    .0000262119      1.75907
AGE           -.00004272?      0.0105212   -0.0040611
NCIGSDYN         0.015498      0.0157747     0.982459
EDCOMCON         0.049784      0.0669773     0.743297
[CHRLMDUM]       0.670802       0.513254      1.30696
AVMAXTMP        0.0247194      0.0118029      2.0943?
AVPRECIP          4.33699        2.69697      1.60809

Note: "?" marks a character that is illegible in the source. Names in
brackets are illegible in the source and are taken from the variable lists
of Tables 8-5 and 8-6.
-------
report chronic respiratory conditions. The results are presented in
Tables 8-8 and 8-9, where it is seen that the estimated magnitudes of the
ozone effects are dramatically different in the two instances. Note
carefully, however, that the means of the dependent variable for the two
samples differ by an order of magnitude (0.11 for the sample having no
chronic respiratory illness, 1.04 for the sample reporting some chronic
respiratory illness). On the basis of this phenomenon, it appears that
homogeneity of the two groups can be rejected without any additional
analysis solely on the ground that the outcomes are far too disparate to
believe that the expected values could possibly be the same. For example,
a simple t-test of homogeneity of means would surely reject the null
hypothesis in this instance.
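A rough version of such a test can be worked out from the summary statistics printed in Tables 8-8 and 8-9. The sketch below uses an unequal-variance (Welch) t-statistic, which is a choice made here and not a method named in the report. Where the scanned tables are ambiguous, the inputs are resolved by assumption so that model and error degrees of freedom and sums of squares add up to the totals.

```python
import math

# Summary statistics read from Tables 8-8 and 8-9. Assumption: where the
# scan is ambiguous, totals are taken as the sum of model and error terms.
mean_no_crd, ss_total_no_crd, df_total_no_crd = 0.106030, 2941.105, 4526
mean_crd, ss_total_crd, df_total_crd = 1.038462, 3549.462, 363


def welch_t(mean_a, ss_a, df_a, mean_b, ss_b, df_b):
    """Unequal-variance t-statistic for a difference in means.

    Each sample variance is the corrected total sum of squares divided by
    its degrees of freedom; each sample size is the degrees of freedom + 1.
    """
    var_a, n_a = ss_a / df_a, df_a + 1
    var_b, n_b = ss_b / df_b, df_b + 1
    return (mean_b - mean_a) / math.sqrt(var_a / n_a + var_b / n_b)


t_stat = welch_t(mean_no_crd, ss_total_no_crd, df_total_no_crd,
                 mean_crd, ss_total_crd, df_total_crd)
```

On these inputs the statistic is well above conventional critical values, in line with the conclusion stated in the text.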
-------
Table 8-8

DEP VARIABLE: TRADRSP

                      SUM OF        MEAN
SOURCE      DF       SQUARES      SQUARE    F VALUE    PROB>F
MODEL       11     12.553244    1.141204      1.759    0.0551
ERROR     4515      2928.552    0.648627
C TOTAL   4526      2941.105

ROOT MSE    0.805374    R-SQUARE    0.0043
DEP MEAN    0.106030    ADJ R-SQ    0.0018
C.V.        759.5683

                      PARAMETER       STANDARD    T FOR H0:
VARIABLE      DF       ESTIMATE          ERROR  PARAMETER=0  PROB > |T|
INTERCEP       1       0.278607       0.085010        3.277      0.0011
03NR01         1       0.564490       0.564345        1.000      0.3172
S4NR01         1    -0.00182198    0.002222703       -0.820      0.4124
RACEW1B0       1       0.042292       0.037615        1.124      0.2609
[SEXM1F0]      1      -0.030077       0.024943       -1.206      0.2279
INCOMCON       1    .0000022889   .00000138313        1.655      0.0980
AGE            1     .000475199   0.0007031875        0.676      0.4992
NCIGSDYN       1    0.001486407    0.001009186        1.473      0.1409
EDCOMCON       1   -0.000753311    0.004108444       -0.183      0.8545
AVMAXTMP       1    -0.00252467    0.000838691       -3.010      0.0026
[AVPRECIP]     1       0.082526       0.122702        0.673      0.5013
FORMER?        1       0.044112       0.032747        1.347      0.1780

Note: "?" marks a character that is illegible in the source. Names in
brackets are illegible in the source and are inferred from the variable
lists of Tables 8-5 through 8-7 and 8-9.
-------
Table 8-9

DEP VARIABLE: TRADRSP

                      SUM OF        MEAN
SOURCE      DF       SQUARES      SQUARE    F VALUE    PROB>F
MODEL       11       229.693   20.881194      2.214    0.0139
ERROR      352      3319.766    9.431160
C TOTAL    363      3549.462

ROOT MSE    3.071019    R-SQUARE    0.0647
DEP MEAN    1.038462    ADJ R-SQ    0.0355
C.V.        295.7278

                      PARAMETER       STANDARD    T FOR H0:
VARIABLE      DF       ESTIMATE          ERROR  PARAMETER=0  PROB > |T|
INTERCEP       1       1.296410       1.264699        0.993      0.3212
03NR01         1       9.363780       8.108099        1.199      0.2469
S4NR01         1       ?.036888       0.030704        1.201      0.2304
RACEW1B0       1       0.707993       0.499220        1.418      0.1970
[SEXM1F0]      1       0.923020       0.391846        2.623      0.0091
INCOMCON       1   -.0000263253   .00001916124       -1.374      0.1704
AGE            1       0.011920     0.00989562        1.209      0.2292
NCIGSDYN       1       0.016938       0.013916        1.224      0.2219
EDCOMCON       1      0.0037767       0.060918        0.062      0.9506
[AVMAXTMP]     1       0.026317       0.011300        2.329      0.0204
AVPRECIP       1       2.926834       1.699772        1.722      0.0860
[FORMER?]      1       0.464974       0.439989        1.067      0.2869

Note: "?" marks a character that is illegible in the source. Names in
brackets are illegible in the source and are inferred from the variable
list of Table 8-8. Several digits in this table are uncertain in the
source, where the digits 5 and 9 are frequently confused.
-------
TECHNICAL REPORT DATA
(Please read Instructions on the reverse before completing)
1. REPORT NO.: EPA-450/5-85-005C
2. (blank)
3. RECIPIENT'S ACCESSION NO.:
4. TITLE AND SUBTITLE: Ambient Ozone and Human Health: An Epidemiological
   Analysis, Volume III
5. REPORT DATE: 1 June 1985 (Date of Preparation)
6. PERFORMING ORGANIZATION CODE:
7. AUTHOR(S): Paul R. Portney and John Mullahy
8. PERFORMING ORGANIZATION REPORT NO.:
9. PERFORMING ORGANIZATION NAME AND ADDRESS:
   Resources for the Future
   1616 P Street, N.W.
   Washington, DC 20036
10. PROGRAM ELEMENT NO.: 12A2A
11. CONTRACT/GRANT NO.: 68-02-3583
12. SPONSORING AGENCY NAME AND ADDRESS:
    U.S. Environmental Protection Agency
    Office of Air Quality Planning and Standards (MD-12)
    Research Triangle Park, NC 27711
13. TYPE OF REPORT AND PERIOD COVERED: Final Report
14. SPONSORING AGENCY CODE: OAQPS
15. SUPPLEMENTARY NOTES: Project Officer: Thomas G. Walton
16. ABSTRACT: This report is the third volume of an analysis of the
    relationship between ozone and human health benefits.
17. KEY WORDS AND DOCUMENT ANALYSIS: Benefit Analysis; Air Pollution, O3;
    Epidemiology
18. DISTRIBUTION STATEMENT: Release Unlimited
19. SECURITY CLASS (This Report): Unclassified
20. SECURITY CLASS (This page): Unclassified
21. NO. OF PAGES: 226
22. PRICE:
EPA Form 2220-1 (Rev. 4-77) PREVIOUS EDITION IS OBSOLETE