MORBIDITY, AIR POLLUTION AND HEALTH STATISTICS
Bart D. Ostro
and
Robert C. Anderson
Economic Analysis Division, PM-220
Office of Policy Analysis
*U.S. Environmental Protection Agency
401 M Street, S. W.
Washington, D.C. 20460
(202) 382-2790
Presented at the Joint Statistical Meetings of the American
Statistical Association and Biometric Society
Detroit, Michigan
August 12, 1981
*This paper represents the work and views of the authors. It does
not necessarily represent the policy or position of the U.S.
Environmental Protection Agency.
-------
The Clean Air Act of 1970 committed the public and private
sectors to billions of dollars in expenditures for pollution con-
trol. The authority provided by this Act, including the regulation
of stationary and mobile sources of pollution, is scheduled for
review by Congress this year. Though air pollution is known to
affect agricultural output, materials, and visibility, the major
thrust of the Act and related regulations issued by the U.S.
Environmental Protection Agency is to reduce pollution that causes
adverse impacts on human health. The Clean Air Act specifies that
national ambient air quality standards should be set to protect the
health of sensitive groups with an adequate margin of safety.
A new need to estimate actual health effects, and not just
thresholds, for various air pollutants now exists. Executive Order
12291 issued by President Reagan in February 1981 requires consid-
eration of the potential benefits and costs of all major regula-
tions. It is expected that the revision of primary air standards
over the next two years will be interpreted as major regulations
subject to the requirements of the order.
Recent research has tested empirically the hypothesis that air
pollution affects the incidence of human mortality and morbidity.
This research has played a major role in the establishment of pri-
mary air standards in the United States.
Although potential mortality effects obviously concern policy
makers, morbidity effects should be of equal concern to the
researcher. First, there may be a good deal of acute illness
resulting from air pollution that never results in death. Second,
morbidity is probably a more sensitive indicator of
pollution effects because of the immediacy of the effect. Third,
it requires a smaller sample size to obtain statistical signifi-
cance since sickness occurs more frequently than death. Fourth,
the measurement of chronic and more severe morbidity can serve as
a verification of the estimated mortality effects.
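As a rough illustration of the third point, the sketch below applies a standard two-proportion sample-size approximation. The baseline rates (0.1 percent for death, 10 percent for illness) and the 10 percent relative increase are hypothetical values chosen only to show the contrast; they are not estimates from this paper.

```python
from statistics import NormalDist

def n_per_group(p0, rel_increase, alpha=0.05, power=0.80):
    """Approximate sample size per group needed to detect a relative
    increase in an event rate with a two-sided two-proportion z-test."""
    p1 = p0 * (1.0 + rel_increase)
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(power)
    variance = p0 * (1.0 - p0) + p1 * (1.0 - p1)
    return (z_a + z_b) ** 2 * variance / (p1 - p0) ** 2

# Hypothetical baseline rates, chosen only to show the contrast.
n_death = n_per_group(0.001, 0.10)    # rare outcome
n_illness = n_per_group(0.10, 0.10)   # common outcome
```

Under these assumed rates, the rare outcome requires on the order of a hundred times as many subjects as the common one.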
This paper reviews some of the research techniques that have
been used to estimate the health effects of air pollution. It
highlights some of the reported results and suggests some of the
problems inherent in this type of statistical analysis. Finally,
it provides some new evidence of the morbidity effects from air
pollution based on some recently developed data.
STATISTICAL TECHNIQUES
Four principal approaches have been used to assess the impact
of air pollution on human health. The studies may be broadly
characterized as those involving animals and those involving
humans. Extensive use has been made of animal studies, especially
with respect to mechanisms of damage to pulmonary macrophages.
Animal studies have the great virtues of permitting measurement of
long-term responses to relatively low levels of pollution and of
permitting direct examination of damaged tissue. However, they
have the fundamental problem of extrapolation to man. With
this link being imprecise at best, animal studies can play only a
minor role in the estimation of pollution control benefits.
Researchers have conducted at least three types of human
studies - chamber experiments, statistical analyses of occupational
exposure, and epidemiologic studies of general populations.
Although chamber studies on healthy and diseased subjects can
reveal the levels at which various acute effects are observed in
humans, they have other problems such as the ethics of research
and the difficulty of ascertaining chronic effects. Chamber
studies can have great value in identifying thresholds for various
subgroups of the population, but they can play only a limited role
in assessing economic benefits because of the virtual omission of
consideration of chronic effects. They typically measure changes
in metabolism or organ function, as opposed to changes in human
activity. Further, because of the heterogeneous mixture of suspended
particles found in the environment, comparisons between
chamber studies of laboratory-produced particles and actual human
exposure are extremely difficult.
Studies of occupational exposures face problems of measuring
actual exposures (particularly a problem for chronic effects
resulting from past exposure) and the typical lack of compatibility
between the mix and concentrations of occupational pollutants and
those experienced by general populations. Moreover, there is substantial
evidence of selection by industry and self-selection by
workers, so that exposed industrial groups are not representative
of the population. Therefore, occupational data will have limited
relevance to the estimation of benefits of control of population
exposures to the air pollutants of concern to policy makers.
Because of the problems with the aforementioned approaches,
assessments of general population benefits from controlling air
pollution must rely on epidemiologic methods in general popula-
tions to detect chronic effects of long-term exposures to low
levels of pollution. The epidemiologic approach has the great
virtue of being able to estimate the response to the full range of
conditions to which humans are actually exposed. Its main drawbacks
are the difficulty of controlling all potentially confounding
variables and the fact that it can at best establish only correlation,
not causality. Microepidemiologic studies using data on
individuals are preferable to studies that use data on the means
of large groups exposed to different levels of pollution because
data on individuals span a far larger range of variation than do
data on, say, metropolitan area averages. For example, cigarette
smoking is a major contributor to adverse health states, probably
much more important than air pollution parameters. The use
of citywide averages may obscure statistically significant dose-response
relationships that exist among the nonsmoking subset of
the population. Despite the preferability of data on individuals,
such data are often unavailable, and researchers have usually
relied upon average responses and average characteristics for population
subgroups as the basis for analysis.
Statutory obligations (in the Clean Air Act of 1970) forced
regulators to set air pollution standards to provide a margin of
safety in protecting sensitive groups. This statutory directive
undoubtedly shifted research interests toward defining threshold
levels for sensitive groups. For this purpose, chamber studies
were indispensable. Interest in population epidemiologic
approaches, however, continued principally because they offered the
only method for possibly evaluating the incremental impacts on
the general population of alternative air standards.
RESULTS OF PREVIOUS STUDIES
In the 1960s and into the 1970s, several epidemiologic
studies and chamber experiments were reported. They concerned
the health consequences of both short-term and chronic exposures
to particulates and sulfur oxides (including sulfate).* A number
of the earlier epidemiologic studies of particulates and SO2
focused on severe air pollution episodes such as those in London,
New York City, and Donora, Pennsylvania. In London, 4,000 excess
deaths were attributed to one severe episode in 1952. Morbidity
effects from acute exposures, measured in terms of hospital emergency
room visits, doctor visits, and industrial sickness records,
also have been demonstrated.
Long-term health effects of chronic exposures have also been
documented. In this brief summary, we focus on
two types of studies: epidemiological studies of mortality and
morbidity, and approaches attempting to measure the increased risk
of long-term exposure.
Beginning with the pioneering work of Lave and Seskin, a
number of studies have used regression analysis to measure the
long-term effects of sulfur oxides and particulates on
mortality.1/ Lave and Seskin attempted to explain several
*We have concentrated on SO2, SO4, and particulates in this analysis
because of the relative wealth of epidemiologic evidence
linking them to chronic health effects in humans (as contrasted
to CO, NOx, or ozone).
1/Lave, Lester, and Eugene Seskin, "Air Pollution and Human
Health," Science, Vol. 169, August 21, 1970, pp. 722-733, and
Lave and Seskin, Air Pollution and Human Health (Baltimore:
Johns Hopkins University Press, 1977).
disease-specific mortality rates using cross-sectional data from
177 standard metropolitan statistical areas. Their work has been
viewed with some degree of caution because of such concerns as:
(1) the estimation bias resulting from omitted explanatory variables;
(2) the omission of personal factors, such as age, sex, and
cigarette consumption; (3) failure to control for in- and out-migration;
(4) crude measurement of exposure; and (5) failure to
fully consider alternative functional forms. Despite these shortcomings,
the basic results suggesting an association between air
pollution and mortality have held up over time under careful
scrutiny by subsequent researchers.
Further studies by others have attempted, with varying
degrees of success, to correct one or more of these deficiencies.
Mendelsohn and Orcutt2/ controlled for migration, Crocker
et al.3/ for medical inputs and diet, and Lipfert4/ for other
socioeconomic variables. Crocker et al. and Gregor5/ also corrected
for possible simultaneity with physician services. These
2/Mendelsohn, Robert, and Guy Orcutt, "An Empirical Analysis of
Air Pollution Dose-Response Curves," Journal of Environmental
Economics and Management, Vol. 6, June 1979.
3/Crocker, T.D., W.D. Schulze, S. Ben-David, and A.V. Kneese,
Methods Development for Assessing Air Pollution Control
Benefits, U.S. Environmental Protection Agency, February 1979.
4/Lipfert, Frederick W., "Statistical Studies of Mortality and
Air Pollution: Multiple Regression Analyses Stratified by Age
Group," mimeo, 1979.
5/Gregor, John J., Intra-Urban Mortality and Air Quality: An
Economic Analysis of the Costs of Pollution Induced Mortality,
Environmental Protection Agency, Corvallis, Oregon, 1977.
studies have obtained a range of estimates of .01 to about .2 for
the elasticity of mortality with respect to air pollution. As a
result, Freeman,6/ in his synthesis of the literature, chose .05 as
the best point estimate of the elasticity. There is still much
concern, however, about the legitimacy of these studies as evidence
of the chronic effects of air pollution.
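To make the elasticity figures concrete: an elasticity is the percentage change in mortality per one percent change in pollution, so under a constant-elasticity approximation the two percentage changes are simply proportional. The sketch below applies the range and point estimate quoted above to a hypothetical 10 percent reduction in pollution; the 10 percent figure is an assumption for illustration only.

```python
def pct_change_in_mortality(elasticity, pct_change_in_pollution):
    """Constant-elasticity approximation: %dM = elasticity * %dP."""
    return elasticity * pct_change_in_pollution

# Low end, Freeman's point estimate, and high end of the reported range,
# applied to a hypothetical 10 percent reduction in pollution.
effects = {e: pct_change_in_mortality(e, -10.0) for e in (0.01, 0.05, 0.2)}
```

With the point estimate of .05, a 10 percent reduction in pollution corresponds to a 0.5 percent reduction in mortality.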
Fewer studies have used multivariate techniques to estimate
morbidity effects for air pollutants. A few that should be noted
include Crocker et al., who used the Michigan Survey Research
Center interview data; Graves and Krumm,7/ who examined data on
emergency room visits in Cook County, Illinois; Seskin,8/ who
studied unscheduled visits to health clinics in Washington, D.C.;
and Liu and Yu,9/ who used a novel two-stage approach to deal with
multicollinearity. These studies have typically found associations
between particulates and/or sulfur oxides and morbidity
measures. Many of these studies, however, have suffered from
serious methodological shortcomings or data deficiencies.
6/Freeman, A. Myrick, "The Benefits of Air and Water Pollution
Control: A Review and Synthesis of Recent Estimates," for
The Council on Environmental Quality, 1979.
7/Graves, Philip E., and Ronald J. Krumm, "Pollution and Hospital
Admissions: Evidence from Time Series in Chicago," ERC Research
Report, 78-laf 1979.
8/Seskin, Eugene P., "An Analysis of Some Short Term Health
Effects of Air Pollution in the Washington, D.C. Metropolitan
Area," Journal of Urban Economics, Vol. 63, July 1979.
9/Liu, Ben-Chieh, and Eden S. Yu, Physical and Economic Damage
Functions for Air Pollutants by Receptor, U.S. Environmental
Protection Agency, Corvallis, Oregon, 1976.
Because of the difficulties of measuring actual exposures and
also of controlling for diet, smoking, and other personal charac-
teristics in the epidemiological approach, a number of researchers
have resorted to expensive case control studies in which individ-
uals are monitored over relatively long periods for effects such as
cough, sputum, and respiratory disease. These studies have shown
demonstrable correlations between the frequency of symptoms or dis-
eases of the respiratory tract and air pollution levels.
For example, Lunn et al.10/ found a significant relationship
between respiratory illness and air pollution among children living
in different parts of Sheffield, England. Rudnik11/ documented a
relationship between respiratory illness and more polluted cities
in Poland. In studies of adults, both Ferris12/ and Bouhuys
et al.13/ recorded an association between higher levels of TSP and
increases in the rates of respiratory disease symptoms.
Epidemiologic evidence has played a central role in the
establishment of primary air standards in the United States.
10/Lunn, J.E., J. Knowelden, and A.J. Handyside, "Patterns of
Respiratory Illness in Sheffield Infant Schoolchildren,"
British Journal Prev. Soc. Med., Vol. 21, 1967.
11/Rudnik, J., "Epidemiological Study on Long-Term Effects on
Health of Air Pollution," Probl. Med. Wieku Rozwojowego,
Vol. 7a (Suppl. 1), 1978.
12/Ferris, B.G. Jr., I.T.T. Higgins, M.W. Higgins, and J.M. Peters,
"Chronic Non-specific Respiratory Disease in Berlin, New
Hampshire, 1961-1967. A follow-up study," Am. Rev. Resp.
Dis., Vol. 107, 1973.
13/Bouhuys, A., G.J. Beck, and J.B. Schoenberg, "Do Present Levels
of Air Pollution Outdoors Affect Respiratory Health," Nature,
Vol. 276, 1978.
Because of the continuing controversy over chronic morbidity
effects at exposure levels near or below the present U.S. stand-
ard, better data and improved model specification will be neces-
sary if epidemiologic evidence is to resolve the issue. In this
spirit, we have obtained access to much better data than has here-
tofore been analyzed.
PROBLEMS WITH ESTIMATING HEALTH EFFECTS
Some of the statistical problems of an epidemiological
approach to estimating the morbidity effects of air pollution are
common to almost all areas of statistical inquiry. Others are
more specifically related to uncertainty in the measurement of air
pollution and health. The statistical problems can be generalized
into three different areas: questions of proper functional form,
data and measurement problems, and specification problems and
uncertainties.
Functional Forms
Most epidemiological research on the health effects of air
pollution has assumed a linear dose-response relationship. The
additive linear functional form implies that each marginal
improvement in air quality results in a constant improvement in
health. In addition, it posits that there are no interactive
effects among pollutants or between pollution and other variables,
such as weather conditions.
Unfortunately, there is little theoretical or empirical jus-
tification for this functional form. Most clinical research has
generated an S-shaped (or logistic) dose-response relationship.
However, the assumption of linearity may be an acceptable approxi-
mation of the true form over a certain range of air pollution
values. For large changes in air pollution, the linear approxima-
tion will likely be a less accurate estimate of the health effects
than some nonlinear specification.
There are two other potential problems with the linear form.
First, it can predict negative values for the dependent variable,
even if the dependent variable is always observed to be non-
negative. Second, it structurally assumes that the explanatory
variables will have a similar effect over the entire range of the
dependent variable.
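The behavior of the linear approximation can be sketched numerically. The example below compares a logistic dose-response curve with its tangent-line approximation at one pollution level. The midpoint, scale, and pollution levels are hypothetical and were chosen only to make the pattern visible; they are not estimates from any study cited here.

```python
import math

def logistic(x, midpoint=100.0, scale=20.0):
    """S-shaped dose-response: probability of illness at pollution level x."""
    return 1.0 / (1.0 + math.exp(-(x - midpoint) / scale))

def linear_approx(x, x0=100.0, midpoint=100.0, scale=20.0):
    """First-order (tangent-line) approximation of the logistic curve at x0."""
    p0 = logistic(x0, midpoint, scale)
    slope = p0 * (1.0 - p0) / scale
    return p0 + slope * (x - x0)

small_error = abs(logistic(105.0) - linear_approx(105.0))  # small change
large_error = abs(logistic(20.0) - linear_approx(20.0))    # large change
negative_prediction = linear_approx(20.0)                  # falls below zero
```

Near the point of approximation the two curves nearly coincide, while for a large change the linear form both misses the true response by a wide margin and predicts a negative probability.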
If one is attempting to estimate the probability of death or
illness from air pollution and wishes to use a nonlinear func-
tional form, a number of probabilistic models are available,
including logit, probit, and Tobit. Each carries its own assump-
tions about the shape of the dose-response function and about the
error term. With the uncertainty intrinsic to an area of inquiry
such as air pollution and health, it is extremely important that
alternative functional forms be tested to compare the goodness-of-fit,
compatibility, and predictive results.
Data and Measurement Problems
The second major statistical problem germane to the study of
air pollution and health is that of availability and accurate
measurement of the necessary data, especially of air pollution
exposure. There are three major concerns here: which pollutant
to measure, the relationship of ambient levels to actual exposure,
and the time structure of pollutant exposure.
Measurements of ambient air pollution are obtained primarily
through Environmental Protection Agency monitors sited throughout
the country. The measurement techniques have improved dramati-
cally over the last two decades and are becoming more accurate and
specific. For example, EPA is moving towards measuring and
setting standards for inhalable particulates (those less than 15
microns) which are now believed to be more harmful to the respira-
tory system than total suspended particulates. Among the other
pollutants, however, there is still a question as to which are the
most important precursors of health effects. Only further clin-
ical study will reduce uncertainty in this area.
Even the most accurate measurement of pollution at the moni-
toring site may not represent the measure of actual pollution
exposure, however. First, there can be significant spatial varia-
tion of the pollutant and the potential receptor around the source
of measurement. Second, individuals working in other areas or in
closed environments will receive different exposures for at least
part of the day. Third, actual exposure will vary according to the
time spent inside, the degree of insulation and ventilation, and
the prevalent pollutant in the area. For example, carbon monoxide
easily penetrates all structures, while large-order particulates
and reactive pollutants, such as sulfur dioxide and ozone, do not.
Researchers usually make the simplifying assumption that, on aver-
age, the monitored air quality level is somewhat representative of
exposure. Random measurement errors of air pollution exposure
should lead to an estimate of the air pollution effect that is
biased towards zero.
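The bias toward zero can be illustrated with a small simulation. The sketch below assumes classical measurement error, that is, error that is random and independent of true exposure. All variances and the "true" effect are hypothetical values chosen for illustration.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)
n, true_effect = 20000, 2.0
exposure = [rng.gauss(0.0, 1.0) for _ in range(n)]       # true exposure
health = [true_effect * e + rng.gauss(0.0, 1.0) for e in exposure]
monitor = [e + rng.gauss(0.0, 1.0) for e in exposure]    # noisy proxy

slope_true = ols_slope(exposure, health)    # regression on true exposure
slope_proxy = ols_slope(monitor, health)    # regression on monitored level
# Under classical errors-in-variables, the expected attenuation factor is
# var(exposure) / (var(exposure) + var(error)) = 1 / (1 + 1) = 0.5.
```

With equal variances for exposure and measurement error, the slope estimated from the monitored level is roughly half the slope estimated from true exposure.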
Another question relating to the use of ambient levels as a
proxy for exposure is that of which statistical measure to use.
The mean, maximum, and minimum pollution levels all have been used
in the past. Each suggests a different kind of relationship
between air pollution and health. Satisfactory answers to this
question would help the policy makers decide if it is chronic
doses above some minimum level or acute doses at high levels that
generate serious health effects.
Finally, there is a question of the time lag of health
effects caused by air pollution. Health effects may well be
related to current levels of pollution, or they may be a result of
cumulative exposure over a number of years. If the latter is the
case, the use of current levels may lead to a biased estimate of
the pollution effect.14/
The choice of the health measure also presents a problem for
morbidity research. Although there are many sources of data on
illness and hospital visits, few have the standardization and
sample size necessary for a cross-sectional analysis. Thus, most
of the morbidity studies have been either time series analyses for
a given city, studies of emergency room utilization, or simple
two-city or city-rural comparisons using analysis of variance. In
addition, some surveys have attempted to link overt effects
- e.g., eye stinging, sneezing, coughing, and breathing - with
14/See Daniel M. Violette, "Estimating the Human Health Benefits
of Improved Air Quality," prepared for the National Commission
on Air Quality Benefits Estimation Panel, January 1980, pp.
189-193.
recorded levels of air pollution. Recently, some other data
bases, which include questions about health care utilization and
health status, have been used. These include the Michigan Survey
Panel Data and the National Center for Health Statistics Health
Interview Survey (HIS).
The HIS has many possibly useful indicators of health status.
For acute illness, it measures restricted activity days, work loss
days, school loss days, bed days, and hospital days. Restricted
activity days (RAD) is the inclusive term for all the ways one can
react to acute illness. It is officially defined in the HIS as a
day in which "a person cuts down on his usual activities for the
whole of that day because of an illness or injury.... It does not
imply complete inactivity, but it does imply only the minimum of
usual activities." In addition, the HIS reports the health condi-
tion or diagnosis that is believed responsible for each RAD.
The variable measuring work loss days is based on the
response to the survey question asking how many days in the last
two weeks did illness or injury prevent one from working.
Obviously, the amount of pain or discomfort tolerated by an indi-
vidual before missing work is a very subjective decision and may
have little to do with any objective measure of illness. In addi-
tion, reported or actual WLDs may be affected by other unmeasured
factors, such as response to the survey or attitude toward work.
Part of the decision to miss work, however, will be based on
socioeconomic and job-related factors that can be measured or
approximated empirically. The statistician can only assume that
there is an underlying distribution that determines the threshold
of health effects. For each chronic illness, the HIS records the
duration of limitation, the degree of limitation, and the diagnosis.
The measurement of other, potentially confounding variables
is also important to the study of air pollution and morbidity,
especially since the "true" causative variables to describe mor-
bidity are unknown. Omission of variables that explain the vari-
ation in the dependent variable can lead to serious estimation
problems.
Much of the previous research on health effects has used
aggregate data to proxy socioeconomic variables. For example, in
their mortality study, Lave and Seskin use such variables as the
percentage of population 65 or older, the percentage of the popu-
lation who are nonwhite, and the percentage with income below the
poverty level. Individual data represent a distinct improvement
and allow the researcher to disaggregate the analysis and discern
the variation in the pollution effect across categories, such as
age, race, and sex.
A number of other variables may vary collinearly with air
pollution and may also affect health status. The most frequently
cited factors include occupational exposure, smoking,
migration, indoor pollution, diet, exercise, risk attitude,
weather, and "urbanness."15/ Again, some of these confounding
15/For a detailed description of the problems generated by these
factors and attempts to reconcile them in mortality studies,
see Richard Wilson et al., Health Effects of Fossil Fuel
Burning (Cambridge: Ballinger Press, 1980), pp. 191-214.
effects can be eliminated through using individual data, if
available. By stratifying the sample one can explicitly account
for the effects of occupation, smoking, indoor exposure, and vari-
ous geographic factors.
For some factors, such as diet, exercise, and attitude towards
health care, direct measurement through survey will probably not be
economically feasible. However, many of these influences can be
proxied by socioeconomic surrogates. A statistically significant
pollution effect can be generated artificially only if these fac-
tors vary with air pollution and not with the socioeconomic
proxies.
Specification Problems
Even if acceptable data on pollution exposure, health status,
and their potentially confounding factors are available, improper
specification of an estimated equation can seriously bias the
coefficients. Three different specification problems may be rele-
vant to this area of research: multicollinearity, omitted vari-
ables, and simultaneity.
Since the "true" model of health status is far from certain,
one can only make reasonable guesses about the variables that
should be included in a regression equation explaining illness.
A trade-off is involved. As explanatory variables are added,
multicollinearity may become a problem; specifically, variables
that vary with air pollution may be included so that the estimated
effect of pollution becomes confounded. To limit the number of
explanatory variables, however, is to open up the possibility of
omitted variable bias.
Multicollinearity can exist, and usually does, among air pol-
lution variables. Particulates, sulfur dioxide and sulfates are
all generated from fossil fuel combustion by stationary sources.
On the other hand, hydrocarbons, carbon monoxide, nitrogen oxides,
and ozone are primarily the result of fuel combustion from mobile
sources. Multicollinearity can also arise because of the rela-
tionship between air pollution and the other explanatory variables
including socioeconomic and urbanization variables. To the extent
that these factors vary systematically (e.g., both air pollution and
urbanization may increase as we move from the southwestern to the
northeastern United States), discerning the independent influences
of air pollution will be difficult.
Another potentially serious specification error occurs when a
nonrandom explanatory variable, correlated with air pollution, is
omitted from the estimated equation. The included independent
variables then take on explanatory "noise" from both the excluded
variable and the error term and will have biased estimated coeffi-
cients. The degree of the bias will be proportional to (1) the
collinearity between the excluded and air pollution variables, and
(2) the importance of the omitted variable in explaining the depen-
dent variable.
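The two determinants of the bias can be written as a product: the bias in the pollution coefficient equals the omitted variable's own coefficient times the slope from regressing the omitted variable on pollution. The sketch below checks this identity on a small constructed data set. The data and the "true" coefficients are hypothetical, and the outcome is built without an error term so that the identity holds exactly.

```python
def mean(v):
    return sum(v) / len(v)

def slope(x, y):
    """Simple-regression slope of y on x."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

# Hypothetical data: the omitted factor tends to rise with pollution.
pollution = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
omitted = [0.5, 1.5, 1.0, 2.5, 2.0, 3.5]
b_pollution, b_omitted = 0.3, 1.2           # assumed "true" coefficients
illness = [b_pollution * p + b_omitted * z
           for p, z in zip(pollution, omitted)]

short_slope = slope(pollution, illness)      # regression that omits z
delta = slope(pollution, omitted)            # collinearity of z with pollution
predicted = b_pollution + b_omitted * delta  # true effect plus bias term
```

Here the regression that omits the second variable overstates the pollution effect by exactly `b_omitted * delta`; the bias vanishes if either factor is zero.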
A final specification problem is that of simultaneity. This
would occur if, for example, the explanatory variable "physicians
per capita" is used to explain the variation in health status. If
health status in turn influences the locational decision of physi-
cians, the estimated coefficients will be biased and inconsistent.
A technique such as two-stage least squares could be used to
reduce this problem.
PRELIMINARY RESULTS
The data set that comes closest to meeting many of the needs
outlined above is the annual Health Interview Survey (HIS) con-
ducted by the National Center for Health Statistics. This is a
scientific survey of 50,000 households comprising roughly 120,000
people. Besides basic demographic and economic characteristics of
the respondents, the survey includes data on acute and chronic
illness (identified by diagnosis), disability days for those in
and out of the labor force, work and school loss days due to ill-
ness, measures of health care utilization, height and weight,
family income, occupation and industry of employment, and indi-
vidual cigarette consumption. The availability of the latter
makes the data set superior to many others and facilitates the
separation of health effects from cigarette smoking versus air
pollution.
For a preliminary assessment of the effects of air pollution
on morbidity, a data base was created that provides detailed
information about the individuals and their health status, the
levels of several pollutants to which they are exposed, their cli-
mate, and the area where they live. Thus, the HIS results for
1976 were merged with 1976 EPA data on ambient levels of particulates
(TSP), sulfur dioxide (SO2), and sulfates (SO4); National
Oceanic and Atmospheric Administration data on wind, temperature,
and precipitation; and Census Bureau data on density and other
urban characteristics. For this analysis, 120 cities, most of
medium size (population of 100,000-600,000), were preselected to
reduce the intracity variation of the air pollution measures.
The initial work focused on determining the contribution of
air pollution to acute illness in adults. The sample of all male
nonsmokers was used to estimate the variation in work loss days
(WLD). This group was chosen for a number of reasons. First, the
sample size of males is greater than that of females. Second,
with nonsmokers the air pollution effects cannot be attributed to
the impact of cigarette smoking.* Also, cigarette smoking may be
determined simultaneously by variables that are used to explain
health status. If smoking were included as an explanatory vari-
able, it would necessitate a slightly more complex and less easily
interpreted model. Third, males tend to have fewer family and
child-rearing responsibilities outside of work. Therefore, there
is less of a possibility of the occurrence of work loss days not
related to health. Thus, work loss may be a more accurate indicator
of illness for males than for females. Finally, measuring
work loss is more conducive to a monetary evaluation of losses.
The dependent variable was hypothesized to be a function of
levels of ambient air pollution, various demographic and socioeconomic
variables, the existence of chronic disease, climate conditions,
and measures of "urbanness."
*There still remains the possibility that nonsmokers living with
a smoker will be affected by the smoke. This possibility will be
considered in subsequent work.
Basically, two pollution variables, total suspended particulates
(TSP) and sulfates, were used. They were selected because of
the preponderance of clinical evidence previously mentioned con-
cerning their health effects and because their measurement tends
to be acceptably consistent. The correlation coefficient of these
two variables was .18.
In the past, concern has been expressed about the choice of
the measure of pollution exposure for a city. For the purposes of
this study, the SAROAD system, EPA's aerometric data bank, was
used. For many cities in the sample, there was only one
population-oriented monitor. For cities with more than one
population-oriented monitor, a weighted average of the monitors,
based on the number of observations, was calculated. The TSP and
sulfate measurements were based on recordings from hi-vol 24-hr
gravimetric samplers and hi-vol colorimetric samplers, respec-
tively.
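The city-level exposure measure for multi-monitor cities can be sketched as follows. This assumes that "weighted by the number of observations" means each monitor's mean concentration is weighted by its count of valid readings; the function name and the sample readings are hypothetical.

```python
def city_average(monitors):
    """Observation-weighted mean concentration across a city's monitors.

    monitors: list of (mean_concentration, number_of_observations) pairs.
    """
    total_obs = sum(n for _, n in monitors)
    if total_obs == 0:
        raise ValueError("no observations for this city")
    return sum(c * n for c, n in monitors) / total_obs

# Hypothetical TSP readings (micrograms per cubic meter) for one city
# with three population-oriented monitors.
tsp = city_average([(80.0, 60), (65.0, 30), (95.0, 10)])

# A city with a single monitor simply takes that monitor's mean.
single = city_average([(70.0, 12)])
```

Weighting by observation count keeps a monitor with only a handful of readings from carrying the same influence as one that reported throughout the year.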
A number of demographic and socioeconomic variables --
including age, race, family income, family size, physicians per
100,000 people, blue- or white-collar worker, and whether or not
the individual was married and currently living with spouse --
were all employed to explain the variation in WLD. These vari-
ables were believed to be important factors in measuring the
degree of and response to pollution exposure and the ability to
partake in preventive care, including direct physician access,
housing and sanitary conditions, diet, exercise, and occupational
exposure. Data limitations preclude a determination of the
degree to which diet and exercise, for example, may affect health.
It is believed, however, that the included independent variables
are ample proxies for the measurement of access to, and use of,
preventive care while at the same time independent enough to pre-
clude problems with multicollinearity.
The existence of chronic disease (a binary variable) will
probably play an important role in determining the frequency of
work loss or activity restriction and was included in the estima-
tion. The climatic conditions faced by individuals, such as pre-
cipitation and average temperature or number of degree days, were
considered because of their potential effect on WLD. Finally,
population density was included as a measure of the general urban
structure.
Multiple regression was selected as the appropriate statisti-
cal tool because of its ability to control for many factors in the
analysis. A major uncertainty in the estimation, however, was the
exact form that the statistical model should take. A special
problem exists in that the dependent variable is truncated at
zero, and that a large percentage of the health status observa-
tions (between 70 and 95 percent) are zero. For this reason,
three different models were tested. Each has different charac-
teristics and assumptions about the structural nature of the
explanatory variables, and each generates a different shape for
the dose-response relationship.
First, the ordinary least squares (OLS) method was used.
Although cheaper to run and computationally simpler, this tech-
nique ignores the zero truncation and can possibly predict nega-
tive values for WLD. In addition, it has the implicit structural
assumption that the same factors that cause the existence of any
work loss day (the movement from zero to one or more) also explain
the particular number of WLDs, given that at least one WLD has
actually occurred. One advantage to this technique is that
linearity makes extrapolation easier. The estimated equation took
the following form:
(1) $W_1 = b_0 + b_1 D + b_2 A + b_3 C + b_4 M + b_5 U + u = bX + u$
where $W_1$ = Number of work loss days
$b_i$ = Estimated coefficients
D = Demographic and socioeconomic characteristics
A = Air pollution measures
C = Chronic condition
M = Meteorologic variables
U = Urban structure variables
b = Vector of the coefficients
X = Vector of the above independent variables
The partial derivative of work loss days with respect to the air
pollution variable is:
(2) $\partial W_1 / \partial A = b_2$
An alternative technique was to use the Tobit model. This
technique constrains the dependent variable to be non-negative
but still implies the structural assumption described above. An
additional problem is that the shape of the resulting dose-
response curve will have positive first and second derivatives
(convex from below), which is contrary to the generally accepted
shape of the curve.
The stochastic model underlying the Tobit estimation is:
(3) $W_2 = bX + u$ if $bX + u > 0$
    $W_2 = 0$ if $bX + u \le 0$
with $u \sim N(0, \sigma^2)$
where $W_2$ = proportion of all work days that are lost days.
The model assumes that there is an underlying stochastic index
I = Xb + u that is observed only when it is positive.
Following Tobin, the expected value of W2 in the model is:
(4) EW2 = bX F(Z) +
Following McDonald and Moffitt,17/ the relationship between
the expected value of all the observations, W2, the expected value
of those values above zero, W2*, and F(Z) is:
(6) $E W_2 = F(Z)\, E W_2^*$
The partial derivative of the expected value of all
observations expressed in (6) with respect to air pollution is:
(7) $\partial E W_2 / \partial A = F(Z)\,(\partial E W_2^* / \partial A) + E W_2^*\,(\partial F(Z) / \partial A)$
or the change in W2 for those observations above zero weighted by
the probability of being above the limit plus the change in the
probability of being above zero weighted by the expected value of
W2 if above zero. With estimates of b and $\sigma$, both of the
terms on the right-hand side can be calculated.
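The two terms of Eq. (7) can be computed as in the minimal sketch below. It assumes the standard textbook Tobit expressions: Z = bX/sigma, F and f the standard normal distribution and density functions, and E(W2*) = bX + sigma f(Z)/F(Z). These forms, the function names, and the evaluation point are assumptions made for illustration and may differ in detail from the computations actually performed for this study.

```python
import math


def norm_pdf(z):
    """Standard normal density f(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function F(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_decomposition(xb, sigma, b_a):
    """Split the Tobit marginal effect of A into the two terms of Eq. (7).

    xb    : index bX at the evaluation point (assumed known)
    sigma : standard deviation of the error term u
    b_a   : Tobit coefficient on the air pollution variable A
    Returns (intensity_term, participation_term).
    """
    z = xb / sigma
    big_f = norm_cdf(z)
    small_f = norm_pdf(z)
    ratio = small_f / big_f
    e_star = xb + sigma * ratio                   # E(W2*), mean given W2 > 0
    d_e_star = b_a * (1.0 - z * ratio - ratio ** 2)  # dE(W2*)/dA
    d_big_f = small_f * b_a / sigma               # dF(Z)/dA
    intensity_term = big_f * d_e_star             # first term of Eq. (7)
    participation_term = e_star * d_big_f         # second term of Eq. (7)
    return intensity_term, participation_term


# Hypothetical evaluation point; only b_a is taken from Table 1, column (B).
intensity, participation = tobit_decomposition(xb=-0.5, sigma=1.0, b_a=0.00228)
```

Under these assumptions the two terms always sum to b_a times F(Z), which provides a quick check on any implementation.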
The third technique used was the logit-linear model. In this
case, a logit model was first used to determine the probability of
a person's having at least one WLD in the survey period. In the
second stage, OLS was used to determine whether air pollution
influences the number of WLDs, given that a person has had at
least one. This method has the advantage of consistency with the
statistical characteristics of the data. First, it truncates the
dependent variable at zero (and one) by turning the frequency into
a probability. Second, it enables the use of different structural
forms to explain the probability of a WLD episode (one or more)
and the number of WLDs. Third, the estimated equation will assume
the form of the logistic curve, the functional form that is
believed to be typical of many dose-response relationships.
17/McDonald, John F., and Robert A. Moffitt, "The Uses of Tobit
Analysis," Review of Economics and Statistics, Vol. 62, No. 2,
May 1980.
The estimated equation of the logit model is:
(8) $\log [W_3 / (1 - W_3)] = Xb$
where $W_3$ is the probability that WLD > 0 in the two-week survey
period. The left-hand side of (8) is simply the log of the odds
of a work loss day. The change in $W_3$ due to a change in A is:
(9) $\partial W_3 / \partial A = b_2\, W_3 (1 - W_3)$
The equation can also be expressed in terms of probability:
(10) $W_3 = (1 + e^{-Xb})^{-1}$
The expected number of work loss days is the product of the
probability of a nonzero WLD and the expected number of WLDs
given that at least one has occurred:
(11) $E(W) = W_3 \cdot E(W_1 \mid W_1 > 0)$
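Equations (10) and (11) combine into a two-part prediction, sketched below. The function names and the input values are hypothetical; in practice the index Xb would come from the fitted logit and the conditional mean from the second-stage OLS.

```python
import math


def logistic_probability(xb):
    """Eq. (10): probability of at least one WLD given the index Xb."""
    return 1.0 / (1.0 + math.exp(-xb))


def expected_wld(xb_logit, conditional_mean):
    """Eq. (11): P(WLD > 0) times the expected WLDs given at least one."""
    return logistic_probability(xb_logit) * conditional_mean


# Hypothetical inputs, chosen only to show the mechanics.
p_episode = logistic_probability(-1.5)
expected_days = expected_wld(-1.5, conditional_mean=3.0)
```

Keeping the two stages as separate functions mirrors the point made in the text: the factors driving the probability of an episode need not be the ones driving its length.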
The regression results for the three models using the sample
of all male nonsmokers, age 18-65, are presented in Table 1. The
results of the three estimates are generally consistent with
prior expectations. In all three models, particulates are shown
to be related in a positive and significant way to work loss days.*
The mean level of sulfates does not appear to affect WLDs. This
result was confirmed when each of the pollution variables was run
separately in the regression.
*The value of the particulate variable ranges from 43 to 150.
Subsequent analysis suggested that the particulate coefficient
was significantly different from zero when TSP was as low as
65 to 70 micrograms per cubic meter. The current annual standard
is 75 micrograms per cubic meter.
There may be a number of explanations for this result.
First, the techniques for measuring sulfates are not believed to
be very accurate. The errors in measurement may lead to serious
underestimation of the coefficient. Second, the particulate
measure may be proxying a number of variables; it measures coarse
and inhalable particles as well as sulfate and nitrate particles.
Third, there may be estimation problems resulting from collinear
or omitted variables.
Using equations (2), (7), and (9), the partial effects of air
pollution can be calculated. The results indicate that the OLS,
Tobit, and logit models predict that a one-unit change in TSP
will, at the mean, change the probability of a work loss day in
the two-week period by .00177, .00118, and .0013, respectively.
For the OLS model, the work loss-particulate elasticity, measured
at the mean, is 0.57.
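The logit figure and the elasticity can be illustrated with the short sketch below. The coefficients are those reported in Table 1; the mean probability and the sample means of TSP and WLD are not reported in the text, so the values used here are hypothetical placeholders.

```python
def logit_marginal_effect(b_a, p):
    """Eq. (9): change in the episode probability per unit change in A."""
    return b_a * p * (1.0 - p)


def elasticity_at_mean(b_a, mean_a, mean_w):
    """Linear-model elasticity: percent change in W per percent change in A."""
    return b_a * mean_a / mean_w


B_LOGIT = 0.00614  # PMEN coefficient, Table 1, column (C)
B_OLS = 0.00177    # PMEN coefficient, Table 1, column (A)

# Assumed mean probability of an episode (hypothetical; chosen to fall
# within the 70 to 95 percent range of zero observations noted earlier).
p_assumed = 0.30
logit_effect = logit_marginal_effect(B_LOGIT, p_assumed)

# Hypothetical sample means, for illustration only.
ols_elasticity = elasticity_at_mean(B_OLS, mean_a=80.0, mean_w=0.25)
```

With an assumed episode probability of 0.30, Eq. (9) gives a value close to the .0013 reported for the logit model. The elasticity depends entirely on the assumed means.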
The models also show that chronic illness is associated with
more WLDs. The OLS model has age and average temperature related
positively to WLD and blue-collar employment related negatively to
WLD. A comparison of this model with the logit-OLS model reveals
some interesting distinctions.
Estimation (C) suggests that air pollution, measured by par-
ticulate levels, will affect the probability of a WLD episode.
However, estimation (D) suggests that air pollution does not influ-
ence the number of days lost, given an episode has occurred.
TABLE 1
The Estimation for WLD for Male NonSmokers (N=4473)

              (A)         (B)         (C)         (D)
              OLS         TOBIT       LOGIT       OLS(W>1)
CONSTANT      -.29        -1.2        -3.97       -.61
PMEN          .00177b     .00228b     .00614b     .0032
SULF          -.0083      -.013       -.033       -.0055
AGE           .004a       -.0003      -.0076c     .0746a
CHRON         .22a        .16a        .35a        1.25
RACE          -.02        -.027       -.04        -.36
MARR          -.03        -.0012      .37a        -.9a
INC           -.0045      -.0014      -.0023      .0041
TEMP          .0063b      .0014       .005        .088a
PRECIP        -.0013      .0024       .009        -.04c
DENS          .004        .0069       .003        -.027
BLUE          -.06c       -.017       .157c       -1.33a
F             4.07a       --          --          6.38a
Chi-square    --          113.2a      25.8a       --
R2            .01         --          --          .19

a = Significance at 1% level
b = Significance at 5% level
c = Significance at 10% level
PMEN = annual arithmetic mean of particulates (micrograms/
cubic meter)
SULF = annual arithmetic mean of sulfates (micrograms/cubic
meter)
AGE = age
CHRON = number of chronic conditions
RACE = 1 if nonwhite
0 if white
MARR = 1 if married and living with spouse
0 if unmarried or married and not living with spouse
INC = family income (thousands)
TEMP = annual mean temperature
PRECIP = annual precipitation
DENS = population density (thousands)
BLUE = 1 if blue-collar worker
0 if not
Further evidence of this result is obtained by applying Eq(7) to
the Tobit estimates. The result, after taking the partial deriva-
tive, indicates that the first term in the right-hand side of
Eq(7) -- the change in WLD for those observations above zero -- is
small (.0000175) relative to the second term -- the change in the
probability of being above zero (.0001). Thus, the total effect
of air pollution on WLD is driven more by adding to the probabil-
ity of an episode than by affecting the actual number of WLDs.
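The relative size of the two terms can be checked with simple arithmetic, as sketched below. The two term values are those quoted above; the conversion to a two-week period assumes ten working days in the period, which is an assumption introduced here and not stated in the text.

```python
# Terms of Eq. (7) as reported in the text for the Tobit estimates.
intensity_term = 0.0000175     # change in WLD for observations above zero
participation_term = 0.0001    # change in the probability of being above zero

total_effect = intensity_term + participation_term
participation_share = participation_term / total_effect

# Assumption: W2 is a proportion of work days, and the two-week survey
# period contains ten working days.
WORK_DAYS_IN_PERIOD = 10
effect_over_period = total_effect * WORK_DAYS_IN_PERIOD
```

On these figures roughly 85 percent of the total effect comes from the probability term. Under the ten-day assumption the rescaled total is close to the Tobit effect of .00118 reported earlier.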
The estimated equations (C) and (D) also show that being
married and working in a blue-collar job increase the probability
of a work loss episode but have a negative effect on the number of
days lost. Age has the reverse effect: it slightly decreases the
probability of an episode but has a strong positive influence on
the number of days lost. The latter result is confirmed by the
linear estimate (A).
The sensitivity of the variables was further tested by con-
sidering various other subsamples and specifications. For
example, the model was estimated for those aged 45-65 and for
those with chronic conditions. In each, the magnitude of the
estimated air pollution coefficient increased and remained
significant.
In addition, other weather and urban variables were
substituted into the regression with no appreciable change in the
estimates. The results of these statistical tests appear to confirm
the hypothesized association between air pollution and morbidity.
CONCLUSION
The pending revision of U.S. primary air standards and the
analytic requirements of Executive Order 12291 will force regula-
tors to examine closely the data showing possible human health
effects from air pollution. Four principal approaches have been
used to assess these health effects: animal experiments, chamber
studies on humans, statistical analyses of occupational exposures,
and epidemiologic studies in general populations. Of the four
techniques, the epidemiologic approach has the great virtues of
including the full range of exposures to air pollution, the vari-
ous combinations of air pollutants to which humans are exposed,
and other possibly synergistic and antagonistic parameters such as
smoking and medical care. Of course, the very complexity of these
interactions necessitates that great care be given to model speci-
fication and the inclusion of all relevant factors.
A number of studies have investigated the relationship
between air pollution and human morbidity and mortality using the
epidemiologic approach. Sulfur oxides and particulates have been
linked to both morbidity and mortality effects in studies of pol-
lution episodes as well as from long-term exposure to lower levels
of pollution. Critics have identified several shortcomings that
plague many of these studies, including (1) omitted variables such
as diet and cigarette consumption, (2) poor control for migration,
(3) crude measurement of exposure, (4) failure to fully consider
alternative functional forms and possible simultaneous relation-
ships, and (5) use of city average data rather than data on indi-
viduals within cities.
This study uses a data set on individuals, the Health
Interview Survey conducted by the National Center for Health
Statistics, to examine further the relationship between air
pollution and various measures of morbidity. The scientific survey of
50,000 households in the HIS includes data on demographic charac-
teristics, acute and chronic illness, disability days for those
in and out of the labor force, work and school loss days due to
illness, measures of health care utilization, family income,
occupation, and cigarette smoking. These data were merged with EPA
data on ambient levels of particulates, sulfur dioxide, and
sulfates; NOAA weather data; and Census Bureau data on density and
other urban characteristics.
Three regression specifications were used: logit, Tobit, and
ordinary least squares. The resulting estimates were generally
consistent with prior expectations. In all three models, using a
sample of male nonsmokers, particulates were shown to be related
in a positive and significant way to work loss days. Various
tests of the sensitivity of the results using subsamples of the
data and alternative specifications all appear to confirm the
hypothesized association between air pollution and morbidity.