EPA
United States
Environmental Protection
Agency
Health Effects Research
Laboratory
Research Triangle Park NC 27711
EPA-600/1-78-051
June 1978
Research and Development
Population at Risk to Various Air Pollution Exposures:
Data Base "POPATRISK"
-------
RESEARCH REPORTING SERIES
Research reports of the Office of Research and Development, U.S. Environmental
Protection Agency, have been grouped into nine series. These nine broad
categories were established to facilitate further development and application of
environmental technology. Elimination of traditional grouping was consciously
planned to foster technology transfer and a maximum interface in related fields.
The nine series are:
1. Environmental Health Effects Research
2. Environmental Protection Technology
3. Ecological Research
4. Environmental Monitoring
5. Socioeconomic Environmental Studies
6. Scientific and Technical Assessment Reports (STAR)
7. Interagency Energy-Environment Research and Development
8. "Special" Reports
9. Miscellaneous Reports
This report has been assigned to the ENVIRONMENTAL HEALTH EFFECTS
RESEARCH series. This series describes projects and studies relating to the
tolerances of man for unhealthful substances or conditions. This work is generally
assessed from a medical viewpoint, including physiological or psychological
studies. In addition to toxicology and other medical specialities, study areas
include biomedical instrumentation and health research techniques utilizing
animals, but always with intended application to human health measures.
This document is available to the public through the National Technical
Information Service, Springfield, Virginia 22161.
-------
EPA-600/1-78-051
June 1978
POPULATION AT RISK TO VARIOUS AIR POLLUTION EXPOSURES:
DATA BASE "POPATRISK"
by
Sandor J. Freedman
Elsa Lewis-Heise
Joseph D. Wilson
Albert V. Hardy
System Sciences, Inc.
Chapel Hill, North Carolina 27514
Contract No. 68-02-2269
Project Officer
William C. Nelson
Statistics and Data Management Office
Health Effects Research Laboratory
Research Triangle Park, North Carolina 27711
Health Effects Research Laboratory
Office of Research and Development
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
-------
DISCLAIMER
This report has been reviewed by the Health Effects Research Laboratory,
U.S. Environmental Protection Agency, and approved for publication. Approval
does not signify that the contents necessarily reflect the views and policies
of the U.S. Environmental Protection Agency, nor does mention of trade names
or commercial products constitute endorsement or recommendation for use.
-------
FOREWORD
The many benefits of our modern, developing, industrial society
are accompanied by certain hazards. Careful assessment of the relative
risk of existing and new man-made environmental hazards is necessary
for the establishment of sound regulatory policy. These regulations
serve to enhance the quality of our environment in order to promote the
public health and welfare and the productive capacity of our Nation's
population.
The Health Effects Research Laboratory, Research Triangle Park,
conducts a coordinated environmental health research program in toxicology,
epidemiology, and clinical studies using human volunteer subjects.
These studies address problems in air pollution, non-ionizing
radiation, environmental carcinogenesis and the toxicology of pesticides
as well as other chemical pollutants. The Laboratory participates in
the development and revision of air quality criteria documents on
pollutants for which national ambient air quality standards exist or
are proposed, provides the data for registration of new pesticides or
proposed suspension of those already in use, conducts research on
hazardous and toxic materials, and is primarily responsible for providing
the health basis for non-ionizing radiation standards. Direct support
to the regulatory function of the Agency is provided in the form of
expert testimony and preparation of affidavits as well as expert advice
to the Administrator to assure the adequacy of health care and surveillance
of persons having suffered imminent and substantial endangerment of
their health.
The data base described in this report has been developed to provide
the capability to examine easily and quickly the available county-level
information on air quality; population, including socioeconomic,
demographic, and migration factors; climatology; and mortality. The
original information has been collected by various government agencies but
has not previously been combined into a single data file. Use of this
file will permit more accurate estimates of the populations exposed to
various air pollutant levels and a better assessment of the geographic
variation in the relationship between air quality and mortality.
F. G. Hueter, Ph. D.
Acting Director,
Health Effects Research Laboratory
-------
ABSTRACT
The work reported herein was undertaken to provide the EPA with a user-
oriented data base containing recent county-based information, for all
counties in the contiguous United States, on population demographics,
population mobility, climatology, emissions, air quality, and age-adjusted
death rates.
The completed data base, called "POPATRISK," contains approximately
27.5 million characters and is in SYSTEM 2000, Version 2.80 format, facilitating
access with minimal user computer training. Population demographics
are as of the 1970 Census; population mobility is described spanning the
years 1965 to 1970 for 6 sex-race categories in 7 age groupings for both
"in" and "out" migrants; climatology information contains county summaries
of temperature, precipitation, and hours of sunshine; county point and area
source emission estimates, based on the NEDS-USER file, are provided for 5
criteria pollutants (TSP, SOx, NOx, CO, and ozone); air quality information is
based on 1974 data contained in SAROAD; and age-adjusted death rates were
computed for the combined years 1969, 1970, and 1971 for 4 sex-race categories
in 50 groupings of ICDA categories (8th revision).
Sample applications of the data base are provided herein. A detailed
manual documenting the county identification codes to be used in retrieving
data has been provided under separate cover. Also included in the geocoding
manual is a cross-reference table showing the relationships among POPATRISK
and the SAROAD, FIPS, Census and NCHS geocoding schemes.
This report was submitted in fulfillment of Contract No. 68-02-2269 by
System Sciences, Inc. under the sponsorship of the U.S. Environmental
Protection Agency. This report covers a period from October 29, 1975, to
December 31, 1977, and work was completed as of December 31, 1977.
-------
CONTENTS
Abstract ii
Figures iv
Tables v
Acknowledgements vii
1. Introduction 1
2. POPATRISK Data Base Features 3
System 2000 Version 2.80 Features 3
POPATRISK Data Base Structure 4
Components and Sources of Data 9
Variations in Geocoding Schemes Among Sources 24
3. Use of Data Base "POPATRISK" 26
4. Sample Retrievals Using POPATRISK 29
5. Data Base Development Procedures 44
Considerations Leading to Choice of Data Items 44
History of Development 46
Methodologies for Processing and Loading 47
Use of Removable Disk Pack and Archiving Procedures 55
Appendix A: Resolution of Geocoding Discrepancies for Massachusetts NEDS and SAROAD Data 69
Appendix B: Documentation of Data Tapes and Runstreams 75
Appendix C: Other Studies 85
-------
FIGURES
Number Page
1 POPATRISK hierarchical structure 8
2 Overview of POPATRISK data sources and development procedures, Phase 1 51
3 Overview of POPATRISK data sources and development procedures, Phase 2 58
C-1 1973 Monitoring Station Sample Distribution 92
C-2 Fit of present model with all valid 1973 SAROAD population based monitoring 115
C-3 "Inherent" variability of SAROAD data for intracounty population based monitoring 116
C-4 Comparison of fitted normal curve for predicted air quality and "inherent" variability 117
C-5 Variation of R² with data set size 118
-------
TABLES
Number Page
1 POPATRISK Data Base Definition in System 2000 Format 5
2 POPATRISK Data Base Component Description 10
3 Aggregations of Mortality Categories for POPATRISK Age-Adjusted Deaths 21
4 Cluster Descriptions 23
5 Illustrative Retrieval of NEDS Emissions Data for Massachusetts in tons/year 30
6 Illustrative Retrieval of SAROAD Site and Monitoring Data for Durham County, NC 32
7 Illustrative Retrieval of Counties with Highest Calculated Mean SO2 Monitoring (µg/m³) 35
8 Illustrative Retrieval of Counties with Highest Age-Adjusted Auto Fatalities for White Males 37
9 Illustrative Retrieval of Counties with Highest White Male Death Rate from Bronchitis 40
10 Illustrative Retrieval of Counties with Highest White Female Age-Adjusted Death Rate from Breast Cancer 42
11 Illustrative Retrieval of Counties with Highest White Death Rate from Malignant Neoplasm Cluster 1 43
12 Unallocated In-Migration Totals for Selected Counties 48
13 State Unallocated In-Migration Totals for All Races 49
14 Assignment of Massachusetts SAROAD Monitoring to Census Counties 53
15 Independent Cities Recognized and Coded in Data Base 55
16 Assignment of Virginia Independent Cities to POPATRISK Counties 56
A-1 NEDS Points Apportioned into Census Counties by using "Dummy" UTM Coordinates 72
A-2 NEDS Points Apportioned into Census Counties by Tracing Their Name and Address 73
A-3 Total Emissions Summed by Type and by County 74
A-4 Massachusetts Area Emissions Totals by POPATRISK Counties 75
-------
TABLES (continued)
Number Page
C-1 Listing of Monitoring Methods Included In and Excluded from Survey 90
C-2 Percentage of States' Population in Counties with One or More Monitoring Stations 99
C-3 Cumulative Frequency Tables for Valid 1973 SAROAD Population Based Monitoring of TSP 110
C-4 Maximum Correlation Values, R, for a Sampling of Different Models Investigated 111
C-5 Sample Comparison between Maximum Correlation Coefficients R and Those Obtained from Model 112
C-6 Sample Comparison between Chosen Model and Other "Simple" Models 113
C-7 Effect on Regression Model of Removing 5 "Outlier" Points from 58 Point Data Set 114
C-8-a County Emission Data and Land Use Characteristics for North Carolina 125
C-8-b Descriptive Statistics for Model 126
C-9 Vegetation Surface Area Weighting Factors 127
C-10 Spearman Correlations: Annual Average SO2 Concentrations 128
C-11 Multiple Regression Models: Annual Average SO2 Concentrations 129
-------
ACKNOWLEDGEMENTS
The continuing support and technical direction of the EPA Project Officer,
Dr. William C. Nelson, are gratefully acknowledged. Thanks are due also to
Dr. Victor A. Hasselblad of HERL for many helpful conversations.
John Van Bruggen of HERL provided information on the ICDA aggregations in
current use at HERL.
Paul Comely, M.D., Ph.D., of System Sciences, Inc., was involved in the
interpretation of the cluster analysis and was most helpful also in
interpreting some fine points of ICDA coding. Christopher Gordon of System Sciences,
Inc. assisted in developing the geocoding scheme for the independent cities
in Virginia and also in resolution of other geocoding problems.
Dr. I-Li Huang and Ms. Susan Alston, formerly of System Sciences, Inc.,
were responsible for some work on the data base in the initial phases of
this contract.
The authors are pleased to acknowledge also the assistance of two
consultants. Dr. Richard Kopec of the Geography Department, University of
North Carolina at Chapel Hill, was responsible for the reallocation of NEDS
point emission sources to the Massachusetts FIPS counties. Mr. John Richards
of PEDCo Environmental Specialists, Chapel Hill, North Carolina, carried
out the study on SO2 ground deposition parameters.
Finally, thanks are due to Ms. Signe Wetrogen of the Population Division,
Bureau of Census, for her continued interest in the in-out migration data
and for providing a special compilation of New York City migration data.
-------
SECTION 1
INTRODUCTION
Compilation of primary source material of nationwide data on population
demography and mobility, climatology, emissions of pollutants, air
quality, and deaths by ICDA cause is the responsibility of a number of
different government agencies. The data have heretofore been available
mostly in specific formats that tend to be unique to each data type. The
investigator of potential relationships among these variables has, therefore,
often been required to reformat, condense, aggregate, and/or disaggregate
data that may have conflicting schemes of identifying geographical areas,
population age groups, and the like.
The main thrust of the work conducted under this project was to provide
the EPA with a coherent source of recent data for these variables in which
the user, with a minimum of computer training, can retrieve information
pertinent to his investigations with maximum speed and flexibility. The
user can define the format within the wide capabilities of SYSTEM 2000,
Version 2.80.
All compilations of data require some choice of what items should be
included, and this work is no exception. This choice was guided by the
timeliness of the data, compatibility among data sets, reasonable data base
size and expected retrieval costs, and anticipated usefulness to the EPA as
defined through many discussions between the contractor staff and the EPA
project officer.
The sections to follow provide a detailed description of the data base
structure and historical development. In particular, Section 2 documents
the unique features of the POPATRISK data base, including a discussion of the
meaning, units, and primary sources of each individual data item. Geocoding
unique to this data base, which was necessitated by the variance in geocoding
schemes used by the different primary sources, is summarized in this report
and detailed under separate cover in the report "POPATRISK Geocoding Manual."
Section 3 gives the reader an overview of techniques for performing retrievals,
although the discussion is by no means meant to be complete. Reference
is made to the MRI SYSTEM 2000 Reference Manual for more detailed discussion
of the use of a SYSTEM 2000, Version 2.80 data base. Section 4
illustrates some sample results using POPATRISK and gives a brief indication of
potential uses of the data base. Development procedures used in this work
are documented in Section 5. Appendix A contains a discussion of methodologies
used to resolve geocoding discrepancies for Massachusetts. Appendix B
contains detailed documentation on tapes and runstreams provided to EPA with
this report. Appendix C describes other studies performed in this work.
In particular, a mapped survey of monitoring activity for the criteria
pollutants in 1973 and a report of studies correlating ambient TSP levels
with county emission estimates are presented.
-------
SECTION 2
POPATRISK DATA BASE FEATURES
SYSTEM 2000, VERSION 2.80 FEATURES
The POPATRISK data base design and contents are a result of analysis
and evaluation of available data sources and their potential usefulness to
investigators in population-at-risk studies. The SYSTEM 2000 data base
management system was selected as the most versatile and cost-effective
medium for storage and retrieval of POPATRISK data.
SYSTEM 2000 is a general-purpose data base management system which
operates on the UNIVAC 1100 series computers as well as IBM and CDC computers.
It provides the user with a comprehensive set of data base management
capabilities for developing and utilizing information systems tailored to
the specific requirements of the user.
SYSTEM 2000, Version 2.80 offers a full range of capabilities for
modification of a data base definition, data base update, and quick-response
retrievals. Specifically, a SYSTEM 2000 data base design is flexible, in
that additions can be made to the data base definition (structure) at any
time and many changes to the existing structure are easily accomplished.
The Immediate Access feature offers an English-like, user-oriented language
that can be used with minimal computer training to update or retrieve data
in the data base. Along with the Report Writer feature, Immediate Access
provides the user the capability to retrieve and print data quickly and in the
desired format. Update features enable the user to ADD, CHANGE, or REMOVE
data at any level, from the entire logical entry down to the individual
component level.
A further extension of SYSTEM 2000 retrieval and update features, which
was used extensively in development of POPATRISK, is the Procedural Language
Interface feature. It enables the user to manipulate data in a data base
through interfacing with COBOL, FORTRAN, or assembly language programs.
This feature is particularly useful both in processing and loading large
amounts of data and in performing complicated retrievals from logically
unrelated sets of data.
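The Procedural Language Interface is driven from COBOL, FORTRAN, or assembly language programs, none of which are reproduced in this report. Purely as an illustration of a retrieval from logically unrelated sets of data, the Python sketch below combines two disjoint branches of one county's data. The first branch is the NEDS EMISSIONS repeating group (C100). The second is the MONITORING DATA group (C300), nested under SAROAD MONITORING SITES (C200). Component numbers follow Table 1. The dictionary layout, the function name, the pollutant codes, and the sample values are all hypothetical, and this is not the SYSTEM 2000 interface.

```python
from typing import Any, Dict, List

# Hypothetical in-memory layout of one county's logical entry, keyed by the
# component numbers of Table 1 (C3 = ST-COUNTY ID, C100 = NEDS EMISSIONS, etc.).
CountyEntry = Dict[str, Any]


def emissions_and_monitoring(county: CountyEntry, pollutant: int) -> Dict[str, Any]:
    """Pair NEDS emissions with site monitoring means for one pollutant code.

    The two inputs sit on disjoint branches of the tree, so a host-language
    program walks each branch separately and merges the results.
    """
    # Branch 1: NEDS EMISSIONS (C100); sum COUNTY TOTAL (C102) for the pollutant.
    neds_total = sum(
        rec["C102"] for rec in county.get("C100", []) if rec["C101"] == pollutant
    )
    # Branch 2: SAROAD MONITORING SITES (C200) -> MONITORING DATA (C300);
    # collect GEOMETRIC MEAN (C305) for records whose POLLUTANT (C301) matches.
    site_means: List[float] = [
        obs["C305"]
        for site in county.get("C200", [])
        for obs in site.get("C300", [])
        if obs["C301"] == pollutant
    ]
    return {
        "county": county["C3"],
        "neds_county_total": neds_total,
        "site_geometric_means": site_means,
    }


if __name__ == "__main__":
    example = {  # hypothetical codes and values, for illustration only
        "C3": "XX0000",
        "C100": [{"C101": 1, "C102": 1250.0}, {"C101": 2, "C102": 300.0}],
        "C200": [
            {"C300": [{"C301": 1, "C305": 61.5}, {"C301": 2, "C305": 20.0}]},
            {"C300": [{"C301": 1, "C305": 48.0}]},
        ],
    }
    print(emissions_and_monitoring(example, pollutant=1))
```

Walking each branch independently and merging only at the county level reflects the tree structure described in the following subsection. No single hierarchical path leads from a NEDS data set to a monitoring data set.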
A detailed discussion of the SYSTEM 2000 data base management system,
its capabilities and command usage may be found in the SYSTEM 2000
Reference Manual, the Version 2.80 Newsletter, and Procedural Language
Interface (UM-2).
POPATRISK DATA BASE STRUCTURE
Table 1 shows the POPATRISK data base definition in SYSTEM 2000
format. As an aid to understanding the data base structure, Figure 1
illustrates the hierarchical relationships of the data within
the data base. It is important to note that the data base is described in
a county-based format, since the smallest geographical entity defined in
this work is the county. A county coding convention was adopted that
adheres as closely as possible to the current list of NADB SAROAD codes,
to provide a system most adaptable to the data base and to the user. Some
differences do exist, however, between the POPATRISK geocoding scheme and
the SAROAD scheme; these are documented later in this section. Data in the
data base are stored in groupings referred to as logical entries. One logical
entry contains all information pertaining to a particular county; thus there is
one logical entry of data for each county represented in the data base.
Within a logical entry, data are organized in a tree structure into repeating
groups of either related or disjoint data sets. Each repeating group
represents a different type of data, and components within the group define
in detail the data sets contained therein.
In Figure 1, each Level 1 repeating group is a descendant of the
Level 0 (Entry Level) data, and the Level 2 repeating group is a descendant
of a Level 1 repeating group. All data sets connected by a line follow a
branch of the tree structure and are considered related data sets; however,
they are disjoint from data sets connected to different lines (branches) from
the Entry Level. For example, MONITORING DATA repeating group C300 is a
-------
TABLE 1. POPATRISK DATA BASE DEFINITION IN SYSTEM 2000 FORMAT
SYSTEM RELEASE NUMBER 2.80B
DATA BASE NAME is POPATRISK
DEFINITION NUMBER 11
DATA BASE CYCLE 25M7Q
1* STATE ID (NAME XX)
2* COUNTY ID (NAME XXXX)
3* ST-COUNTY ID (NAME X(6))
10* TOTAL POPULATION (INTEGER NUMBER 9(8))
11* PERCENT NONWHITE (DECIMAL NUMBER 99.9)
12* PERCENT BLACK (DECIMAL NUMBER 99.9)
13* PERCENT OVER 64 (DECIMAL NUMBER 99.9)
14* PERCENT FEMALE (DECIMAL NUMBER 99.9)
15* POP PER SQ MI (DECIMAL NUMBER 9(5).9)
16* PERCENT URBAN (DECIMAL NUMBER 99.9)
17* % EMP OTHER CNTY (NON-KEY DECIMAL NUMBER 99.9)
18* INCOME >10K/<5K (NON-KEY DECIMAL NUMBER 99.99)
19* INCOME >15K/<10K (NON-KEY DECIMAL NUMBER 99.99)
20* INCOME >15K/<5K (NON-KEY DECIMAL NUMBER 99.99)
21* % USE PUB TRANS (NON-KEY DECIMAL NUMBER 99.9)
22* ALCOHOL SALE >18 (DECIMAL NUMBER 999.9)
23* GAS SALES/CAPITA (DECIMAL NUMBER 999.9)
30* MEAN TEMP JAN (NON-KEY DECIMAL NUMBER 99.9)
31* MEAN TEMP JUL (NON-KEY DECIMAL NUMBER 99.9)
32* MEAN ANNUAL TEMP (DECIMAL NUMBER 99.9)
33* MEAN PRECIP JAN (NON-KEY DECIMAL NUMBER 99.99)
34* MEAN PRECIP JUL (NON-KEY DECIMAL NUMBER 99.99)
35* MEAN ANNUAL PREC (DECIMAL NUMBER 99.99)
36* HOURS SUN JAN (NON-KEY INTEGER NUMBER 999)
37* HOURS SUN JUL (NON-KEY INTEGER NUMBER 999)
38* REL HUMIDITY JAN (NON-KEY DECIMAL NUMBER .99)
39* REL HUMIDITY JUL (NON-KEY DECIMAL NUMBER .99)
40* CNTY ELEVATION (INTEGER NUMBER 9(5))
41* CNTY LATITUDE (NON-KEY INTEGER NUMBER 9999)
42* % WMALE IN MIG (DECIMAL NUMBER 99.9)
43* % WMALE OUT MIG (DECIMAL NUMBER 99.9)
44* % WFEM IN MIG (DECIMAL NUMBER 99.9)
45* % WFEM OUT MIG (DECIMAL NUMBER 99.9)
46* % NWMALE IN MIG (DECIMAL NUMBER 99.9)
47* % NWMALE OUT MIG (DECIMAL NUMBER 99.9)
48* % NWFEM IN MIG (DECIMAL NUMBER 99.9)
49* % NWFEM OUT MIG (DECIMAL NUMBER 99.9)
100* NEDS EMISSIONS (RG)
101* NEDS POLLUTANT (INTEGER NUMBER 9(5) IN 100)
102* COUNTY TOTAL (NON-KEY DECIMAL NUMBER 9(8).999 IN 100)
103* POINT SOURCES (NON-KEY DECIMAL NUMBER 9(8).999 IN 100)
(continued)
-------
TABLE 1 (continued)
200* SAROAD MONITORING SITES (RG)
206* CITY POPULATION (NON-KEY INTEGER NUMBER 9(8) IN 200)
207* UTM ZONE (NAME XX IN 200)
208* EASTING (NAME X(8) IN 200)
209* NORTHING (NAME X(7) IN 200)
210* ADDRESS (NON-KEY NAME X(11) IN 200)
211* TYPE (NAME XX IN 200)
212* ELEV ABOVE GND (INTEGER NUMBER 999 IN 200)
213* ELEV ABOVE SEA (INTEGER NUMBER 9999 IN 200)
300* MONITORING DATA (RG IN 200)
301* POLLUTANT (INTEGER NUMBER 9(5) IN 300)
302* METHOD (NAME XX IN 300)
303* INTERVAL (NAME X IN 300)
304* # OBSERVATIONS (NON-KEY INTEGER NUMBER 9(5) IN 300)
305* GEOMETRIC MEAN (DECIMAL NUMBER 9(5).99 IN 300)
306* STD DEVIATION (NON-KEY DECIMAL NUMBER 999.99 IN 300)
307* 70TH PERCENTILE (NON-KEY DECIMAL NUMBER 9(5).99 IN 300)
308* 90TH PERCENTILE (NON-KEY DECIMAL NUMBER 9(5).99 IN 300)
309* 99TH PERCENTILE (NON-KEY DECIMAL NUMBER 9(5).99 IN 300)
310* HIGHEST VALUE (DECIMAL NUMBER 9(5).99 IN 300)
311* 2ND HIGHEST VAL (DECIMAL NUMBER 9(5).99 IN 300)
312* LOWEST VALUE (NON-KEY DECIMAL NUMBER 9(5).99 IN 300)
400* AIR QUALITY (RG)
401* AQ POLLUTANT (INTEGER NUMBER 9(5) IN 400)
402* MEAN POP SITES (DECIMAL NUMBER 9(5).99 IN 400)
403* MEAN SRCE SITES (DECIMAL NUMBER 9(5).99 IN 400)
404* MEAN BKGND SITES (DECIMAL NUMBER 9(5).99 IN 400)
405* MEAN TSP PREDICT (DECIMAL NUMBER 9(5).99 IN 400)
600* PERCENT IN-OUT MIGRATION (RG)
601* AGE GROUP (INTEGER NUMBER 99 IN 600)
602* % MALE IN TOT (DECIMAL NUMBER 99.99 IN 600)
603* % MALE IN ALC (NON-KEY DECIMAL NUMBER 99.99 IN 600)
604* % MALE OUT TOT (DECIMAL NUMBER 99.99 IN 600)
605* % MALE OUT ALC (NON-KEY DECIMAL NUMBER 99.99 IN 600)
606* % FEM IN TOT (DECIMAL NUMBER 99.99 IN 600)
607* % FEM IN ALC (NON-KEY DECIMAL NUMBER 99.99 IN 600)
608* % FEM OUT TOT (DECIMAL NUMBER 99.99 IN 600)
609* % FEM OUT ALC (NON-KEY DECIMAL NUMBER 99.99 IN 600)
610* % WMALE IN TOT (DECIMAL NUMBER 99.99 IN 600)
611* % WMALE IN ALC (NON-KEY DECIMAL NUMBER 99.99 IN 600)
612* % WMALE OUT TOT (DECIMAL NUMBER 99.99 IN 600)
613* % WMALE OUT ALC (NON-KEY DECIMAL NUMBER 99.99 IN 600)
614* % WFEM IN TOT (DECIMAL NUMBER 99.99 IN 600)
615* % WFEM IN ALC (NON-KEY DECIMAL NUMBER 99.99 IN 600)
616* % WFEM OUT TOT (DECIMAL NUMBER 99.99 IN 600)
617* % WFEM OUT ALC (NON-KEY DECIMAL NUMBER 99.99 IN 600)
(continued)
-------
TABLE 1 (continued)
618* % NWMALE IN TOT (DECIMAL NUMBER 99.99 IN 600)
619* % NWMALE IN ALC (NON-KEY DECIMAL NUMBER 99.99 IN 600)
620* % NWMALE OUT TOT (DECIMAL NUMBER 99.99 IN 600)
621* % NWMALE OUT ALC (NON-KEY DECIMAL NUMBER 99.99 IN 600)
622* % NWFEM IN TOT (DECIMAL NUMBER 99.99 IN 600)
623* % NWFEM IN ALC (NON-KEY DECIMAL NUMBER 99.99 IN 600)
624* % NWFEM OUT TOT (DECIMAL NUMBER 99.99 IN 600)
625* % NWFEM OUT ALC (NON-KEY DECIMAL NUMBER 99.99 IN 600)
700* AGE-ADJUSTED DEATH RATES (RG)
701* CAUSE OF DEATH (INTEGER NUMBER 99 IN 700)
702* ADJ TOTAL D-RATE (DECIMAL NUMBER 9999.99 IN 700)
703* ADJ WMALE D-RATE (DECIMAL NUMBER 9999.99 IN 700)
704* ADJ WFEM D-RATE (DECIMAL NUMBER 9999.99 IN 700)
705* ADJ NWMAL D-RATE (DECIMAL NUMBER 9999.99 IN 700)
706* ADJ NWFEM D-RATE (DECIMAL NUMBER 9999.99 IN 700)
800* AGE-SPECIFIC POPULATION (RG)
801* AGE GROUPING (INTEGER NUMBER 99 IN 800)
802* TOTAL POP (NON-KEY INTEGER NUMBER 9(8) IN 800)
803* FEMALE (NON-KEY INTEGER NUMBER 9(8) IN 800)
804* WHITE (NON-KEY INTEGER NUMBER 9(8) IN 800)
805* WHITE FEMALE (NON-KEY INTEGER NUMBER 9(8) IN 800)
806* NONWHITE MALE (NON-KEY INTEGER NUMBER 9(8) IN 800)
807* TOTAL URBAN POP (NON-KEY INTEGER NUMBER 9(8) IN 800)
808* URBAN FEMALE (NON-KEY INTEGER NUMBER 9(8) IN 800)
809* URBAN WHITE (NON-KEY INTEGER NUMBER 9(8) IN 800)
810* URB WHITE FEMALE (NON-KEY INTEGER NUMBER 9(8) IN 800)
User-defined functions:
1001* AREA SOURCES (DECIMAL FUNCTION ((C102-C103)))
1002* MALE (INTEGER FUNCTION ((C802-C803)))
1003* NONWHITE (INTEGER FUNCTION ((C802-C804)))
1004* WHITE MALE (INTEGER FUNCTION ((C804-C805)))
1005* NONWHITE FEMALE (INTEGER FUNCTION ((C803-C805)))
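The five user-defined functions at the end of Table 1 hold no stored data of their own. Each one is evaluated by subtracting one stored component from another. The minimal Python sketch below reproduces the same arithmetic so the definitions can be checked by hand. It is an illustration, not SYSTEM 2000 syntax. The class name, the function name, and the consistency check against stored component C806 are hypothetical additions for the sketch.

```python
from dataclasses import dataclass


@dataclass
class AgeSpecificPopulation:
    """Stored counts from one data set of repeating group C800 (illustrative)."""
    total_pop: int       # C802 TOTAL POP
    female: int          # C803 FEMALE
    white: int           # C804 WHITE
    white_female: int    # C805 WHITE FEMALE
    nonwhite_male: int   # C806 NONWHITE MALE

    @property
    def male(self) -> int:             # 1002* MALE = C802 - C803
        return self.total_pop - self.female

    @property
    def nonwhite(self) -> int:         # 1003* NONWHITE = C802 - C804
        return self.total_pop - self.white

    @property
    def white_male(self) -> int:       # 1004* WHITE MALE = C804 - C805
        return self.white - self.white_female

    @property
    def nonwhite_female(self) -> int:  # 1005* NONWHITE FEMALE = C803 - C805
        return self.female - self.white_female

    def is_consistent(self) -> bool:
        """Hypothetical check: stored C806 should equal NONWHITE - NONWHITE FEMALE."""
        return self.nonwhite_male == self.nonwhite - self.nonwhite_female


def area_sources(county_total: float, point_sources: float) -> float:
    """1001* AREA SOURCES = C102 COUNTY TOTAL - C103 POINT SOURCES."""
    return county_total - point_sources
```

Consider a data set with 1,000 persons, 520 females, 800 whites, 410 white females, and 90 nonwhite males. It yields 480 males, 200 nonwhites, 390 white males, and 110 nonwhite females, and it passes the consistency check.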
-------
Figure 1. POPATRISK hierarchical structure [diagram not legible in the source scan]
-------
descendant of C200 SAROAD MONITORING SITES; thus entry level data 1*
through 41*, repeating group C200, and repeating group C300 are related and
belong to the same family tree. Logically, this means that for a given county (C2)
there are a number of data sets in C200 containing information on each
monitoring site in the county, and for each monitoring site data set there
are a number of data sets in C300 containing information on each pollutant
monitored at that site.
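The family-tree relationship just described can be pictured as nested records. The Python sketch below is an illustrative model only, not a description of how SYSTEM 2000 stores data. A logical entry (Level 0) holds a NEDS branch (C100) and a monitoring-site branch (C200), and each site holds its MONITORING DATA sets (C300). Walking the monitoring branch yields the related data sets of one family tree; the NEDS data sets, being on a different branch, are never reached along that path. The class names, the function name, and the small subset of components shown are choices made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class MonitoringData:            # Level 2: repeating group C300
    pollutant: int               # C301 POLLUTANT
    geometric_mean: float        # C305 GEOMETRIC MEAN


@dataclass
class MonitoringSite:            # Level 1: repeating group C200
    address: str                 # C210 ADDRESS
    data: List[MonitoringData] = field(default_factory=list)


@dataclass
class NedsEmission:              # Level 1: repeating group C100 (a separate branch)
    pollutant: int               # C101 NEDS POLLUTANT
    county_total: float          # C102 COUNTY TOTAL


@dataclass
class CountyEntry:               # Level 0: one logical entry per county
    state_id: str                # C1 STATE ID
    county_id: str               # C2 COUNTY ID
    total_population: int        # C10 TOTAL POPULATION
    neds: List[NedsEmission] = field(default_factory=list)
    sites: List[MonitoringSite] = field(default_factory=list)


def monitoring_family(entry: CountyEntry) -> Iterator[Tuple[str, str, str, int, float]]:
    """Flatten one family tree: entry level -> C200 site -> C300 data set."""
    for site in entry.sites:
        for rec in site.data:
            yield (entry.state_id, entry.county_id, site.address,
                   rec.pollutant, rec.geometric_mean)
```

A site with no monitoring data sets contributes no rows. This matches the idea that C300 data sets exist only beneath a particular C200 site.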
COMPONENTS AND SOURCES OF DATA
Table 2 is a quick-reference description of each component included
in POPATRISK. Each component number is listed in numerical data base
definition order along with the component name, description, units, picture
in COBOL notation, and original source of the data item. Certain components
in the data base definition (Table 1) are described as 'NON-KEY.' All
unlabelled components are 'KEY' except those labelled RG (repeating group).
Key components in Table 2 are flagged with a + symbol to the left of the
component number. It is anticipated that these KEY items will be frequently
needed as selection criteria for other data in the data base. A component
must be KEY to be used in a retrieval WHERE clause. For example, because
C305 is a KEY component, one could easily obtain the number of stations
reporting a yearly geometric mean greater than a specified value for any
pollutant, for any set of counties. Because of the significant cost of loading
and maintaining KEY'ed components, the KEY option is used only for those
components considered useful as selection criteria.
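The report does not describe how SYSTEM 2000 indexes KEY components internally. As a rough analogy only, the Python sketch below keeps the C305 GEOMETRIC MEAN values for each pollutant in sorted order. This lets it answer the WHERE-clause example in the paragraph above with a binary search rather than a scan of every data set. That example asks for stations whose yearly geometric mean exceeds a given value, for a chosen pollutant and set of counties. A NON-KEY component would correspond to having no such sorted list. All names, the record layout, and the sample values are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (county code, site address, C301 pollutant code, C305 geometric mean)
Record = Tuple[str, str, int, float]
# pollutant -> (sorted geometric means, rows sorted in the same order)
KeyIndex = Dict[int, Tuple[List[float], List[Tuple[float, str, str]]]]


def build_key_index(records: Iterable[Record]) -> KeyIndex:
    """Group C305 values by pollutant and sort them, mimicking a KEY'ed component."""
    grouped: Dict[int, List[Tuple[float, str, str]]] = defaultdict(list)
    for county, site, pollutant, gmean in records:
        grouped[pollutant].append((gmean, county, site))
    index: KeyIndex = {}
    for pollutant, rows in grouped.items():
        rows.sort()  # sorts by geometric mean first
        index[pollutant] = ([row[0] for row in rows], rows)
    return index


def count_stations_above(index: KeyIndex, pollutant: int,
                         threshold: float, counties: Set[str]) -> int:
    """Count distinct stations in `counties` whose geometric mean exceeds `threshold`."""
    means, rows = index.get(pollutant, ([], []))
    start = bisect.bisect_right(means, threshold)  # first mean strictly above threshold
    return len({(county, site) for _, county, site in rows[start:] if county in counties})
```

Several C300 data sets from the same site (for example, different methods or averaging intervals) are counted once, because the question asks for the number of stations rather than the number of data sets.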
State and county codes (components C1 through C3) conform to the SAROAD
county assignment protocol except for isolated cases, which are discussed
later in this section and also in the 'POPATRISK Geocoding Manual.' General
county description data (components C10 through C41) were extracted from
a master file created by SSI. Population figures are based on the 1970
Census and, with the exception of TOTAL POPULATION, were extracted from a
Bureau of the Census 1972 County and City Data Book computer file containing
data obtained from representative samples of population rather than a complete
count. To maintain consistency as much as possible and to prevent possible
confusion, TOTAL POPULATION C10 figures were changed to reflect the same complete
-------
;z;
o
M
H
Q.
KM
a
o
M
Q
H
53
O
1
O
CJ
H
W
|C
8*3
g°
Sg
*
*
W Pt
d °
t> w
rJ
•< ^-
fH Ed
3 S
*S
H§
OMPONEN
SCRIPTI
U W
Q
SI*
•5 v
f i
S a!
S w
|i
§
1
CO
S
PS
• r-»
t— ( ^D
ggg
O fl 1
S3 Q
i P
_J Q.
8s81
3 -rl s C-4 ~ -
en i-i P: i-i a M 4-1 1-.
Ov OS OS OS ,-s OS OSOs
^ X OS OS OS OS OS in OS OSOS
^^
OS
x-s j- 0
rH CJ 00 O
303 * cd B >s O
S o >s s y * •
B t-~ Q. 01 rOBOos
00 ctl u os O 4J -O M 3r-lvO
B £ O r-4 CJ -H i-H 3 O «• Os
•B IM " .S O 0 rH
0 B 0> 0 "5 rH f 00 -3 H «j -
O -H t) U M > B rl Of JC O
•So UH .3^ 5 co w -aug
QOO OB «> C04JO
- 4-> BOJlO
2ei)3 BC rl sf^*J_ C04J<0-
r4AO CO SOBBB CO
CO .H fi • -H B 030 C 01 C
XB C<"04J 4J Orll-IO-^ -r4MCO
•OCOg OJ-CO CO 1-IOJ4JOW M^
0)r-l 4JOIO J2 4J>«) « « U
TlOelaJ 0)"r4 4J COOr-jBrH A'4I
1-1 H -O "r-l i-l 3 -rl 3 fS B CO
•H O O 01
TJPL^OJ CDrH3 O CXOOOIO r-HuO)
OO-S-SrQP- i-l O-rlttrHQ. (XBrH
SCXtO 03 -4J O.4-1 s 9 OJ
B^0-rl3-"r-! r-lr-ICCltJ J3B
OI-Ht)!>,.r4COcoe 3 cd 3 4J 0) B- B4JO
•o ojf<4J>B3a. 4JO,on 3cnoi-io
o-aB4Jolcdo)oo oO4Jo3<33Ti>B
OOJi-IBi~4 OU(X»4JCX 3OC04J iH
(Dnn>3CO 00 CMC?* BcdCD
t» o>sO-rt>
o -H >-,T-I >, i-i 'w id 4-1 u u 4J cd a. <-> J3 W>,rH<»-l OJ B M-l-OlHU-lOl
*4cog,-ig 3CXOJ3rOO>rlO O 01 O iH
COcUO^'OCdO "H B M-lr-l
, covi BOIBBBcdBU-iB B
B o>UrH33r-l OJB 0) OJ OJr-l 01 01 0) O CO
• PM P-c PH Pj rWcd (U CC4-I
M
IT)
Q r-t IV rQ rQOPMCO^ JBtH
H4 H O HA
ZC^!cij CJCJCJO O2EJO
§ H § W SjwSoS SB
U CO H P-. CL. (1, (L, pj PH »-« M
CN (o o i-4 CN co -3- in vo t— oo
S i-l ^rHrHrMr-l r-lr-l
-1- + + + + + + + +
OS
o\
Os
OS
§
o .
-Os
in MD
•H OS
«• r-4
|o
^ 0
4J O
rl C3
01 i-l
4-1 lO-
CO
01 c
&£
4-1
0 CO
U 41
B r-4
•H
0)
-c B
4J 0
•H CJ
* B
•r4
0)
OJ JS
^ 4J
fH -rl
•a?
03 CD
CM 0)
•rl
U-l »-4
O i-l
0§
•H IM
3S
x
5
rH
V
H
•H
A
Z
M
O\
•H
Os
Os
OS
OS
0
O
0
in os
r-l SO
to- Os
I-l
B
J8o-
4J O
O
rl -
CD in
4J >
SB
S>,a
4J
I CO
O 09
U 0)
C rH
•H
4J Q
•H 0
s a
1-4
CD
01 Jl
•H iJ
3*
§ CO
>H 01
T4
1-1 r-4
°^
O CO
•H U-l
32
«
V
tx
•n
r-4
A
Z
l-l
O
CM
Os
Os
OS
B
O
•H
4J
cd
4-1
rl
0
O.
0)
B
ca
rl
4J
u
»4
rH
,0
R
00
OJ
00
3
4J
CO
X
4J
B
o
•H
4-1
3
O-j!
O M
CD
i-l CO
•a • oi
«l B
CD 60 -rl
2 « 3
o u-i pa
4-1 O
CD U-l
CO O
rl rl
O O CO
3 01 3
0" X CO
•rl B
rH CO OJ
rH O
rl
O U CN
u-i 0) r-.
> o,
3 Or" .
rH r-> B •--
ed 4J O co
00 B rl K
3 u-i co
rH O rH
03 U Cd rH
4J 4J O
O U-l cd Q
H O T3 s-'
CO
£
U
S3
CO
rJ
O
OS
OS
OS
•-H
cd
4J
o
4J
U-l
o
OJ
3
•H
cd
>
OJ .
J2 *-\
U CD
xy
JO rH
-0-3
OJ Q
3
a, meas
tations
4J CO
T-l
Q. 0)
cd B
O 1-4
rH
r< O
0) 00
a. cd
00
00
OJ >»
rH JQ
0)
CO 00
01
00 rH
o) cd
O CO
•<
s
I
CO
u
3
CO
s
cr>
CN
-1-
Q)
•0
•H
>
O
K
0.
0
4-1
00 •
00 00
OJ r-l
•H CO
rl «
O i-l
!-,
CO 4J
r4 W
01 V4
4^
CJ 1 U B
U o
&M SO
r-l n
00 00
T4 01 C
H i-l
4-1 Cd -O
B cd
o> no)
B 4) J3
0 B
fCd 4J
g rH
3
U 4-1 03
SIM
rH
B B rH
9 O 3
(5 UU4
+ *
4J
0
U-l
B
O
•H
4J
CL
•H
14
O
03
01
•a
•o
rH
OJ
•H
U-l
g
8
•o
rl
«!
•3
S
4J
cn
«
*
of sources.
B
O
-H
4-1
CO
4-1
B
g
O
o
T3
U
01
X
4J
t-l
3
U-l
M
O
U-l
-------
tl
-------
1;
8,
o
•o
3
d
•H
iJ
g
o
«NI
W
,J
I
COMPONENT
DESCRIPTION
g
u
en
en
oox
V-'X
^ -gJSria 5g
09 • qj 4J S O S*
B « >
euxu
OCU--H
-HS^ncOeO
UOrHO™
O.OBO,,
-H 3«n
MCU C/vc °
0 43 r^ ...
CS U U 1 -S
CU •HCN'3
T3B<;--Tl
-
X
U
•H U
U U
QOed-S
M-rtw
UNO
4I3IU
eo O 01 o •
-- ft * * S
ox°
00
B h
Point source
(Tons/yr)
> z
> >-l
o
ts
o
l
O -I M
CM <>l CN
CNCMfl
12
-------
W <
O p
o.
s
•o
01
3
M
9
z
I
«
ON ON
OV O\
• •
OV /—,
ON ON ON
~
<
•°
e
o
T3
01
I
iJ
C
8
es
W
nJ
3
ON
DESC
oo
-
o««ocs
CU^CM o
ticMvO
Qt)
o
M
3
M
a
o<
CT.
ai h
at o eo
60
a
B
• •>.
•» op
M
3
-0
T>
S
41
2
°
3
-oeio
a
-0-H
01 M
>3
41
•o
3
M
M
o
o.
OOOO
e «
-H4l-
0)
M
>o
Co
ENT
*
a
O
a,
3S
CN
0
OOOO
r'lce<"i
13
-------
co o.
9 nl
•o
a
MO
CO 01
en a.
CO
TJ
0)
3
G
T*
t-t
a
o
o
CM
i
acM
30
O
o m
Cn
30
O
ooooo
I-HC
J3rt-H
oa-H
•OfO a
Wl C 6 H 0> U>
O'ooj'^o'a
iwO OOiwOdl
o«3- oj3
01 x-^^* 01 U
oua) cm
cdnU'cgmoi
4l o S r-^ a) o M
0S.S ap-S
^ H h h £
U3r-o jJcr
4J0«j
0)
ij
W
H
d
0.73
3 ii
On)
neoM-i
oo
3
UU
^s «
e«
« -o
^r-l
o
lO-O
c
«
C
-Hiu
CO.
S
«J B
cJ o)
•H 4J
i> B
4-1 J3
u-l W i JJ
\D *0 M fl
O 4J
O C8
•o
§
M -H CO 3
U UJ M W TJ
I-M t—I VI I*
C O B H C CO 41
~r{ Q.-H < M -H O
h 3 O -a-
41 hi
B O. 000
33 f>
C O 01 ^
V< X J
01 01
0-i O.
55
H
H
^
3
O
5§§Sgl
ww
h43^1
8
S
S3SS5
S
14
-------
[Table of data base components, continued (includes death-rate components); not legible in the source scan.]
15
-------
[Table of data base components, continued (columns: component number, component name, component description, data type); not legible in the source scan.]
16
-------
count as is used in the age-specific population repeating group. Income
ratios are based on 1969 data and were also taken from the 1972 County and
City Data Book file.
Figures for alcohol sales and gasoline sales were taken from the 1970
Census of Retail Trade. Data for alcohol sales are available only when 500
or more retail establishments exist in the county; as a result, data are
available for roughly 60 to 70 percent of the counties. The problem is
further complicated by disclosure limitations in some of the remaining
counties. Using the data that do exist, SSI developed an index that gives
a rough indication of alcohol sales per person 18 years of age and over for
unlisted counties.
Climate data were originally obtained from the Environmental Data
Service of the National Oceanic and Atmospheric Administration. The data
are means for the years 1941-70. Where there is more than
one weather station, data are recorded for the station closest to the
county's population center. The same procedure was used to record the
county's elevation and latitude. Since not all counties have a primary
weather station, National Weather Service climate regions were used to extra-
polate data from other similar climatic areas.
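The report does not say how the closest station was determined. The following is a minimal sketch. It assumes the population center and each station are given as latitude and longitude in decimal degrees, and it uses great-circle (haversine) distance. The station records, field names, and function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

def closest_station(pop_center, stations):
    """Return the station record nearest the county population center.

    pop_center: (lat, lon); stations: list of dicts with 'lat' and 'lon' keys
    (hypothetical layout). Returns None when the county has no station.
    """
    if not stations:
        return None
    lat0, lon0 = pop_center
    return min(stations, key=lambda s: great_circle_km(lat0, lon0, s["lat"], s["lon"]))
```

A `None` result corresponds to the case described above, in which a county has no primary station and a climate-region substitute is needed.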
County migration totals (components C42 through C49) are totals for in
and out migration summed across the age groups in the migration repeating
group. Detailed information on the in-out migration data will be provided
later in this section.
The NEDS EMISSIONS repeating group data (components C101 through C103)
were extracted from data present in the NADB NEDS-USER file in 1976 for the
five criteria pollutants: Total Suspended Particulates (11101), Carbon
Monoxide (42101), Sulfur Dioxide (42401), Nitrogen Dioxide (42602), and
Hydrocarbon (43101). The numbers in parentheses refer to standard NADB
pollutant codes. The repeating group consists of data sets containing
total county emissions and total point source emissions for each of the
pollutants. A user-defined function is provided for calculation of area
source emissions by subtracting the point source emissions from total
emissions.
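The area-source function is a subtraction for each pollutant. Below is a minimal Python sketch of that arithmetic; it is not the S2K user-defined function itself. It assumes a county's total and point-source values have already been retrieved into a dictionary keyed by NADB pollutant code. The names are illustrative.

```python
# NADB codes for the five criteria pollutants in the NEDS EMISSIONS group.
NEDS_POLLUTANTS = {
    11101: "Total Suspended Particulates",
    42101: "Carbon Monoxide",
    42401: "Sulfur Dioxide",
    42602: "Nitrogen Dioxide",
    43101: "Hydrocarbon",
}

def area_source_emissions(total_tons_per_yr, point_tons_per_yr):
    """Area-source emissions = total county emissions - point-source emissions."""
    return total_tons_per_yr - point_tons_per_yr

def area_sources_for_county(data_sets):
    """data_sets: {pollutant_code: (total, point)} for one county.

    Returns {pollutant_code: area_source} for the five NEDS pollutants only.
    """
    return {code: area_source_emissions(total, point)
            for code, (total, point) in data_sets.items()
            if code in NEDS_POLLUTANTS}
```

Computing the value on request rather than storing it follows the storage-saving rationale given later for the '1000' series functions.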
17
-------
The SAROAD MONITORING SITES data (components C201 through C312) were
selected from 1974 data in three SAROAD files: the Yearly Summary File
(NADB-YRSUM-D), the Frequency File (NADB-YRFRQ-D), and the Site Description File
(NADB-STE-INVO). The SAROAD MONITORING SITES repeating group contains data
sets describing each monitoring site, its location, type, and purpose. The
MONITORING DATA repeating group consists of data sets containing SAROAD
monitoring data as descendants of the reporting site descriptions.
Monitoring data are provided for the following pollutants: TSP (11101),
Nitrate (12306), Sulfate (12403), Carbon Monoxide (42101), Total Sulfur (42269),
Sulfur Dioxide (42401), Nitrogen Dioxide (42602), and Ozone (44201). The
numbers in parentheses refer to standard NADB pollutant codes. The sampling
interval codes in the data base are the standard NADB codes, except that
data with interval codes X, Y, Z, and C were removed from the data base
because their validity was questionable.
As requested by the EPA Project Officer, the yearly means in the data base
are geometric means. Although some current National Ambient Air Quality
Standards set limits on annual arithmetic means, the arithmetic means may be
estimated from the geometric means and geometric standard deviations.
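The report does not give the estimation formula. If the observations are assumed to be lognormally distributed, the arithmetic mean follows from the geometric mean GM and geometric standard deviation GSD as AM = GM * exp(0.5 * (ln GSD)^2). That assumption is common for air quality data, but the report does not state it. A minimal sketch:

```python
import math

def arithmetic_mean_from_geometric(gm, gsd):
    """Estimate the arithmetic mean from GM and GSD.

    Assumes lognormal data: AM = GM * exp(0.5 * (ln GSD)**2).
    """
    if gm <= 0 or gsd < 1:
        raise ValueError("need GM > 0 and GSD >= 1")
    sigma = math.log(gsd)  # standard deviation of ln(x)
    return gm * math.exp(0.5 * sigma ** 2)
```

Take the first TSP record in Table 6 (geometric mean 55.29, geometric standard deviation 1.96). For that record, the sketch gives an estimated arithmetic mean of about 69.3. The estimate is only as good as the lognormal assumption.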
Lists and descriptions of the codes used in the data base to describe
monitoring sites and data (e.g., site type, sampling method, sampling
interval) can be found in the sources referenced in Table 2.
The AIR QUALITY DATA repeating group data (components C401 through C405)
represent air quality values calculated for each pollutant monitored in
the county. Each value is the arithmetic mean of the site geometric means for
a monitored pollutant, computed for each of three monitoring purposes,
(1) population monitoring, (2) source monitoring, and (3) background
monitoring, where the sampling interval code is 1 (1 hour) or 7 (24 hour).
An additional component provides a calculated predicted mean for TSP where
TSP and NOx emissions (tons/year) fall within the following limits:
    1414.375 < TSP < 230031.803
    NOx < 432778.195
These limits correspond to the limits of the variable set used in the
regression analysis discussed in Appendix C.
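The averaging described above can be sketched as a grouping step. The sketch makes two assumptions, and the text does not confirm either of them. First, the interval restriction (codes 1 and 7) applies to all three purposes. Second, the purposes are coded 1, 2, and 3 in the order listed. The record layout and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Assumptions: interval codes 1 (1 hour) and 7 (24 hour) are kept for every
# purpose; purposes are coded 1 = population, 2 = source, 3 = background.
KEPT_INTERVALS = {1, 7}

def county_air_quality(site_records):
    """Arithmetic mean of site geometric means per (pollutant, purpose).

    site_records: iterable of dicts with keys 'pollutant', 'purpose',
    'interval', and 'geo_mean' (one dict per site and pollutant).
    """
    groups = defaultdict(list)
    for rec in site_records:
        if rec["interval"] in KEPT_INTERVALS:
            groups[(rec["pollutant"], rec["purpose"])].append(rec["geo_mean"])
    return {key: mean(values) for key, values in groups.items()}
```

As input, take the three Durham County TSP geometric means from Table 6 (55.29, 55.69, and 33.14, all purpose 1 and interval 7). The sketch returns about 48.04 for them. This is the sketch's own arithmetic, not a value read from the data base.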
18
-------
The PERCENT IN-OUT MIGRATION repeating group data (components C601 through
C625) represent calculated migration rates for the county over the span of
years 1965-70. Documentation on the in-out migration file, as received from
the Bureau of the Census, is provided in Appendix B. It provides a
discussion of the file content, allocation procedures, and comparison with
published data. Briefly, the data in the file are based on question 19
of the 1970 Census 15 percent sample household questionnaire. That question
asked persons who reported living in a different house in 1965 to report the
state, county, and city of their 1965 residence. The Bureau of the Census
developed an allocation procedure to assign a 1965 residence to persons who
indicated they had lived in a different house but did not report an address.
Data in the file are given as both 'unallocated' and 'allocated' figures.
For the data base, it was decided that the data would be best represented
as Total (allocated + unallocated) and Allocated. The race categories in the
file are white, black, and total all races. To maintain consistency
throughout the data base, the race categories used in the data base are
white, non-white, and total all races. The seven age groups used in the
migration file roughly correspond to aggregations of the 15 age groups used
for the age-specific population data discussed later in this section. These
seven age groups are retained in the data base.
The in-out migration repeating group in the data base consists of seven
data sets, one data set for each of the age groups in the migration file.
For each data set there are figures for each of the race-sex groups in terms
of Total and Allocated. Since the Total figures are considered to be the
most useful, those components are KEY'ed.
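The conversion from the Census file's categories to the data base's Total and Allocated figures can be sketched as below. The same sketch covers the change of race categories to white, non-white, and total. The report does not say how the non-white figures were obtained. This sketch assumes non-white = total all races minus white, which also captures races other than black. Dictionary keys and the function name are illustrative.

```python
def to_data_base_groups(counts):
    """counts: {(race, kind): persons} with race in {'white', 'black', 'total'}
    and kind in {'unallocated', 'allocated'} (hypothetical layout).

    Returns {race: {'Total': ..., 'Allocated': ...}} for white, total, non-white.
    """
    out = {}
    for race in ("white", "total"):
        allocated = counts[(race, "allocated")]
        out[race] = {"Total": counts[(race, "unallocated")] + allocated,
                     "Allocated": allocated}
    # Assumption: non-white is derived as all races minus white.
    out["non-white"] = {k: out["total"][k] - out["white"][k]
                        for k in ("Total", "Allocated")}
    return out
```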
The AGE-ADJUSTED DEATH RATES repeating group (components C701 through
C706) contains direct-method age-adjusted mortality rates for the county over
the three-year period 1969 to 1971. Three tapes of raw mortality data,
one tape for each year, were obtained from the National Center for Health
Statistics. Each tape contains about two million records, one record per
death, and includes all deaths for the year. The cause of death contained
in each record is coded using a four-digit code of the Eighth Revision of the
International Classification of Diseases. Constraints imposed by data base
19
-------
size limitations, together with the need for sufficient numbers of deaths
over the three-year period, required some aggregation of mortality
categories at the three-digit level. With the approval of the EPA Project
Officer, a decision was made to use aggregations of mortality categories
corresponding to those already in use at HERL.
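The direct method weights each age-specific rate by the share of a fixed standard population in that age group. The report does not name the standard population. It also does not state how the three years of deaths were combined with the 1970 population. This sketch assumes deaths are pooled over 1969-71 and divided by three times the census population, which gives an average annual rate. All names are illustrative.

```python
def direct_age_adjusted_rate(deaths, population, std_population, years=3, per=100_000):
    """Direct-method age-adjusted death rate (average annual, per `per` persons).

    deaths:         deaths in each age group, pooled over `years` years
    population:     county population in each age group (one census year)
    std_population: standard population in each age group (hypothetical input;
                    the report does not name the standard used)
    """
    if not (len(deaths) == len(population) == len(std_population)):
        raise ValueError("deaths, population and standard must share age groups")
    total_std = sum(std_population)
    adjusted = 0.0
    for d, p, s in zip(deaths, population, std_population):
        if p > 0:  # an age group with no residents contributes nothing
            age_specific = d / (p * years)               # annual age-specific rate
            adjusted += age_specific * (s / total_std)   # weight by standard share
    return adjusted * per
```

Every county is weighted by the same standard population. As a result, rates for counties with very different age structures can be compared directly.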
Two supplemental malignant neoplasm cluster groups were gleaned from
the publication 'A Sequential Space-Time Cluster Analysis of Cancer Mortality
in the United States: Etiologic Implications,' Fred Burbank, Am. J. of
Epidemiology 95, 393 (1972). Further discussion of the selection methodology for
these clusters is provided in Section 5.
Table 3 describes the 50 aggregations of mortality categories used in
the POPATRISK data base. Table 4 gives a detailed description of the two
cluster groups. The POPATRISK code associated with each cause grouping in
Table 3 is the code by which the grouping is referenced in the data base.
The repeating group consists of one data set per cause grouping, containing
county age-adjusted rates for the total population and four race-sex groups.
The AGE-SPECIFIC POPULATION repeating group (components C801 through
C810) contains age-specific population data extracted from 1970 Census tapes.
Population counts on the Census tapes are based on a 100% count and were
provided separately for rural and urban areas, broken down by race-sex group
and age. For loading into the data base, the data were aggregated into 15 age
categories, one data set per category, and expressed as Total and Urban
figures for each race-sex group. A 16th data set gives totals across the 15
age groups. Race-sex group data not represented in the repeating group are
provided by user-defined functions.
The '1000' series components are user-defined functions which may be
changed or augmented at any time without affecting the integrity of the
data base. They are mathematical expressions intended to provide useful
information needed on a less frequent basis while conserving storage space
in the data base.
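The report does not list the user-defined functions it supplies for population data. One plausible example is deriving a rural count from the stored Total and Urban values, since a Census total is urban plus rural. The 16th data set is a straightforward sum over the age data sets. In the sketch below, Python stands in for the S2K function syntax, which is not reproduced here. The names and record layout are hypothetical.

```python
AGE_CATEGORIES = 15  # age data sets in the AGE-SPECIFIC POPULATION group

def rural(total, urban):
    """Derived on demand rather than stored: Census total = urban + rural."""
    return total - urban

def all_ages(age_data_sets):
    """The 16th data set: each race-sex count summed over the 15 age data sets."""
    if len(age_data_sets) != AGE_CATEGORIES:
        raise ValueError(f"expected {AGE_CATEGORIES} age data sets")
    keys = age_data_sets[0].keys()
    return {k: sum(ds[k] for ds in age_data_sets) for k in keys}
```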
20
-------
[Table 3: the 50 aggregations of mortality categories used in POPATRISK, with the POPATRISK code for each cause grouping; not legible in the source scan.]
21
-------
[Table 3, continued; not legible in the source scan.]
22
-------
TABLE 4. CLUSTER DESCRIPTIONS
                       ICD      Rates      Rates
                       Code     Male*      Female*
Cluster 1:
  Breast               174       0.24      20.89
  Bladder              188       4.89       1.90
  Kidney               187       3.25       1.81
  Rectum               154       4.71       3.44
  Stomach              151      12.93       6.64
  Large Intestine      153      12.29      13.67
  Small Intestine      152        ?          ?
Cluster 2:
  Tongue               141        ?          ?
  Naso Pharynx         147       0.33       0.10
  Trachea, Bronchus,   162,     29.59       5.07
    and Lung           163
  Larynx               161       2.0        0.20
* Mortality rates cited are 50th percentile values taken from Atlas
of Cancer Mortality for U.S. Counties: 1950-1969, U.S. Department
of Health, Education and Welfare. DHEW Publication No. (NIH)75-780.
23
-------
VARIATIONS IN GEOCODING SCHEMES AMONG SOURCES
The compilation of several different types of data originating from a
variety of independent sources to create POPATRISK resulted in several
discrepancies in county code assignment schemes. Resolving these
discrepancies took a significant portion of the data processing effort. The
geocoding problems and their impact on the POPATRISK coding scheme are
outlined briefly here; a detailed discussion of the resolution methodology is
provided in Section 5. Full documentation of the POPATRISK geocoding scheme,
including a cross-reference table to the various other county codes, is
provided under separate cover in the POPATRISK Geocoding Manual.
The state of Massachusetts presented a special problem when using SAROAD
county assignments in the POPATRISK data base because the county assignments
do not match those of any of the other data sources. SAROAD recognizes 13
counties while NEDS has 6 counties and Census recognizes 14 counties. With
few exceptions, Census county assignments adhere to the Federal Information
Processing Standards (FIPS) coding scheme. Since Census uses a finer
geographical breakdown than NEDS and SAROAD and is the most widely used, it was
decided to adopt the Census protocol for Massachusetts. All data not
conforming to Census protocol, namely NEDS and SAROAD, required development of a
reallocation procedure to reassign data to the proper counties. Massachusetts
county codes to be used in the data base were arbitrarily chosen to be the
Census county codes with a leading zero to fulfill the four-digit code
requirement.
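The Massachusetts rule amounts to zero-padding. The following is a minimal sketch. It assumes the Census county code is the usual three-digit FIPS county code. The function name is hypothetical.

```python
def massachusetts_county_code(census_county_code):
    """Four-digit POPATRISK code for a Massachusetts county: the three-digit
    Census (FIPS) county code with a leading zero."""
    code = int(census_county_code)
    if not 0 < code < 1000:
        raise ValueError("Census county codes have at most three digits")
    return "0" + f"{code:03d}"
```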
Throughout development of the data base, Virginia remained a source of
problems in fitting its data to a county-based coding scheme. The
independent cities were not compatible with the county-based data base, so
data for each city had to be reassigned to its original county and
incorporated into the data for that county entry. It was necessary to
"create" two new counties in Virginia for
those independent cities for which no county now exists, namely those areas
formerly occupied by Warwick County and Elizabeth City, and also by Norfolk
24
-------
and Princess Anne counties. These two new counties were arbitrarily assigned
county codes of 0972 and 2144, respectively, to maintain consistency in the
numerical and alphabetical order of counties.
Another problem with independent cities occurred for the cities of
Baltimore, Maryland and St. Louis, Missouri. Both FIPS and SAROAD list
counties alphabetically, but while SAROAD lists Baltimore before Baltimore
County, FIPS lists them in the opposite order. A similar problem existed with
St. Louis, Missouri. Yellowstone, Montana also appeared as an independent
city and the data were reassigned to Park County, Montana.
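Reassigning an independent city's data to its county is a keyed aggregation. The following minimal sketch uses the Yellowstone-to-Park County case from the text. The Virginia mappings are not reproduced because they are not listed here. The tuple keys and function name are hypothetical.

```python
from collections import defaultdict

# Place whose data must be folded into a county -> receiving county.
# Only the Montana case is taken from the text.
REASSIGN_TO = {
    ("MT", "YELLOWSTONE"): ("MT", "PARK"),
}

def reassign(records):
    """Sum additive values (counts, tons/yr) by receiving county.

    records: iterable of (state, place, value). Places not in REASSIGN_TO
    are treated as counties already. Rates and means must not be summed
    this way; they have to be recomputed from the combined counts.
    """
    totals = defaultdict(float)
    for state, place, value in records:
        totals[REASSIGN_TO.get((state, place), (state, place))] += value
    return dict(totals)
```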
The five New York City boroughs presented a problem in that some data
sources provided data only for New York City as a whole with no further
breakdown into boroughs as needed for POPATRISK. Climate and geographical
data were two such examples. In these cases, the data provided for New York
City were duplicated for each of the five boroughs. NCHS mortality data were
also available only for New York City as a whole. In this case, a decision
was made to calculate age-adjusted death rates using total New York City data
along with the total population figures for all five boroughs, yielding rates
for New York City as a whole. These rates were then duplicated for each of
the five New York City boroughs.
25
-------
SECTION 3
USE OF DATA BASE "POPATRISK"
The POPATRISK data base resides on a private removable disk pack on
the Univac 1110 at Research Triangle Park. It is accessible in either
demand or batch processing mode and can be used by non-programmers with a
basic knowledge of SYSTEM 2000 Natural Language. As part of SYSTEM 2000's
security system, a user is required to give a password for access to the
data base. Multiple passwords may be available to the user, but these
passwords should have only retrieval authorities. Passwords and their
authorities can be assigned only by the Data Base Administrator
using his/her master password. Normally the Data Base Administrator maintains
exclusive update authority to the data base.
Once in possession of a valid password, a user may access the data base
using the following commands:
@XQT N*NS2K.280
USER, 'Password':
DATA BASE NAME IS POPATRISK:
S2K Natural Language Commands
EXIT:
As previously mentioned, it is not within the scope of this report to
provide instruction in the use of SYSTEM 2000 Natural Language. However,
there are some points to discuss concerning various retrieval and maintenance
techniques tailored specifically to the POPATRISK data base.
Since the data base is county based, the most convenient and least
expensive method of qualifying data is by use of C3 ST-COUNTY ID in a
WHERE clause to qualify data sets for a particular county or span of
26
-------
counties, e.g. WHERE C3 EQ 343000. Component C3 is the only component in the
data base that is unique for each logical entry.
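The layout of C3 is not spelled out here, but the examples suggest a two-digit state code followed by the four-digit county code. They are C3 EQ 343000, and C3 EQ 341160 for Durham County, whose SAROAD area code is 1160 in Table 6. Under that assumption, a WHERE clause can be generated as follows. The function names are hypothetical.

```python
def st_county_id(state_code, county_code):
    """Assumed C3 layout: two-digit state code followed by four-digit county code."""
    if not (0 < state_code < 100 and 0 <= county_code < 10000):
        raise ValueError("state code must fit in 2 digits, county code in 4")
    return state_code * 10000 + county_code

def where_county(state_code, county_code):
    """Text of an S2K WHERE clause qualifying one county by C3."""
    return f"WHERE C3 EQ {st_county_id(state_code, county_code)}"

print(where_county(34, 1160))  # WHERE C3 EQ 341160
```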
The first component of each repeating group contains data which act
as identifiers for each data set in the repeating group and are KEY'ed
for use as selection criteria in WHERE clauses. For example, C101 NEDS
POLLUTANT contains the five-digit criteria pollutant code KEY'ed for easy
access to emissions data when used in the qualification of data, e.g., WHERE
C101 EQ 11101. Each component name in the POPATRISK definition was
chosen to be 16 characters or fewer to allow convenient default printout
of the full component name in LIST output.
The POPATRISK data base is structured such that all repeating groups,
with the exception of SAROAD MONITORING SITES and MONITORING DATA, are disjoint.
Disjoint data sets are logically unrelated, as shown in the data base structure
in Figure 1. For retrieval purposes, it is most desirable to design a
data base with repeating groups that are logically related whenever possible.
However, each set of POPATRISK data, for the most part, came from a source
for which there was no basis for a combination with other data types. For
example, in-out migration age groups do not correspond with the age-specific
population age groups, and NEDS emissions pollutant codes do not necessarily
correspond with SAROAD monitoring pollutants. Therefore, a single simple S2K
Natural Language command (e.g., PRINT, LIST) can retrieve only entry-level
data together with the data along any one branch of the tree structure in
Figure 1.
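The single-branch rule can be made concrete with a small model of the repeating-group tree. The parent links below are inferred from Section 2 and the paragraph above, where only MONITORING DATA sits below another group. Figure 1 remains the authoritative structure. The function names are hypothetical.

```python
# Parent of each repeating group (None = directly under the county entry),
# inferred from the text; Figure 1 is the authoritative structure.
PARENT = {
    "NEDS EMISSIONS": None,
    "SAROAD MONITORING SITES": None,
    "MONITORING DATA": "SAROAD MONITORING SITES",
    "AIR QUALITY DATA": None,
    "PERCENT IN-OUT MIGRATION": None,
    "AGE-ADJUSTED DEATH RATES": None,
    "AGE-SPECIFIC POPULATION": None,
}

def path_to_entry(group):
    """Repeating groups from `group` up to (but excluding) the entry level."""
    path = []
    while group is not None:
        path.append(group)
        group = PARENT[group]
    return path

def on_one_branch(groups):
    """True if all requested groups lie on one branch, so that a single
    simple PRINT or LIST can retrieve them with entry-level data."""
    wanted = set(groups)
    if not wanted:
        return True  # entry-level data only
    return any(wanted <= set(path_to_entry(g)) for g in wanted)
```

For example, SAROAD MONITORING SITES together with MONITORING DATA passes this check. NEDS EMISSIONS together with AGE-SPECIFIC POPULATION does not, so those two groups need separate retrievals.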
Disjoint data sets may be retrieved using Natural Language in one of
two ways. The user may use "BY clause" processing to normalize the retrieval
process to a level of data that is common to families of all disjoint data
sets to be retrieved. However, because of the nature of the POPATRISK data
base, use of this retrieval method has not produced useful results. The other
method of disjoint data set retrieval is to make retrievals without WHERE
clauses. Because it would qualify the entire data base, this method
27
-------
is not recommended. Therefore, using available Natural Language capabilities,
separate retrievals should be made for sets of unrelated data. If further
combining or processing of data is needed, each retrieval output may be written
to a report file for input to a special utility program.
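Combining separate retrievals amounts to a join on the county identifier C3. The following is a minimal sketch of such a utility program. It assumes each retrieval output has already been converted to comma-separated lines with a header row. S2K report output is not in this form, so the conversion step is hypothetical, as are the column names. The values are made up.

```python
import csv
import io

def read_report(text, key="C3"):
    """Parse one converted retrieval into {county_id: row}."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}

def merge_reports(*reports):
    """Join separate retrievals on the county ID, keeping counties found in all."""
    common = set.intersection(*(set(r) for r in reports))
    merged = {}
    for county in sorted(common):
        row = {}
        for report in reports:
            row.update(report[county])
        merged[county] = row
    return merged

emissions = read_report("C3,EMISSIONS\n341160,12.5\n343000,7.0\n")  # made-up values
rates = read_report("C3,RATE\n341160,48.0\n")                         # made-up values
print(merge_reports(emissions, rates))
# {'341160': {'C3': '341160', 'EMISSIONS': '12.5', 'RATE': '48.0'}}
```

An inner join is used so that a county missing from any retrieval is dropped rather than padded with blanks. That is one reasonable choice among several.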
Also available to the user is the Report Writer feature of SYSTEM 2000.
Although the feature requires learning a different set of retrieval techniques
and commands, it provides the capability for preparing several reports within
one run formatted specifically to the user's specifications. The merits of
using Report Writer would depend entirely on its adaptability to future
retrieval requirements.
Although it was not within the scope of this contract effort, an alterna-
tive solution to the limitations on retrievals of disjoint data sets dis-
cussed above is to develop either special purpose or generalized Procedural
Language Interface programs to extract requested data from the data base. PLI
offers the capability of establishing multiple positions within a data base
entry to gain full access to all data sets and to structure output in a format
most useful to the user. A PLI retrieval program would accept input parameter
cards giving the type of data requested, specific items desired, selection
criteria, and possibly data to be used in any calculations. Once the user's
retrieval needs are assessed and PLI retrieval capability implemented,
POPATRISK output could be provided in a timely and efficient manner and in
a format tailored to the user's needs.
Another aspect of active data base use and maintenance that is not
presently contemplated is updating of data. If, at some point, a decision
is made to update existing data in the data base, PLI load programs could be
developed to process new data and modify existing data and/or load new data
sets or repeating groups. Any changes to the existing data base definition
would in most cases require a reload of the data base.
28
-------
SECTION 4
SAMPLE RETRIEVALS USING POPATRISK
In this section are presented a few sample retrievals intended primarily
to demonstrate simple investigative capabilities of POPATRISK.
Table 5 illustrates one of the simplest types of retrievals available,
a listing of all entries in a county for specific data items. In particular,
the NEDS emission data, apportioned to all Massachusetts FIPS counties, are
listed. One special feature should be noted. The S2K Natural Language
retrieval command which generated the listing of Table 5 calls for the item
"*C1001*" which is a user-defined function that generates the data for area
source emissions.
The result of another simple command is presented in Table 6. Here
data on SAROAD monitoring for Durham County, N. C. are requested. There
are three monitoring sites in the county: two in downtown Durham and one
near a reservoir some distance to the north. Although the two downtown
sites carry the same site code and are located at the same address, their
UTM coordinates place them some nine kilometers apart, an evident error.
Unfortunately, coordinate errors of this type do occasionally occur
in the SAROAD data base.
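The nine-kilometer discrepancy can be checked directly from the coordinates in Table 6. Within a single UTM zone, easting and northing are planar coordinates in meters, so ordinary Euclidean distance is adequate over a few kilometers. The sketch below uses a record layout and a 1 km flagging threshold that are illustrative choices, not part of SAROAD.

```python
import math
from itertools import combinations

def utm_distance_m(e1, n1, e2, n2):
    """Planar distance in meters between two points in the same UTM zone."""
    return math.hypot(e2 - e1, n2 - n1)

def flag_inconsistent_sites(sites, max_km=1.0):
    """Pairs of records sharing area code, site code, and UTM zone whose
    coordinates are more than max_km apart (threshold is illustrative)."""
    flagged = []
    for a, b in combinations(sites, 2):
        if (a["area"], a["site"], a["zone"]) == (b["area"], b["site"], b["zone"]):
            km = utm_distance_m(a["e"], a["n"], b["e"], b["n"]) / 1000
            if km > max_km:
                flagged.append((a["agency"], b["agency"], round(km, 1)))
    return flagged

# Durham County sites from Table 6 (UTM zone 17).
durham = [
    {"area": 1160, "site": 1, "agency": "G", "zone": 17, "e": 690551, "n": 3975943},
    {"area": 1160, "site": 1, "agency": "P", "zone": 17, "e": 689403, "n": 3984931},
    {"area": 1160, "site": 2, "agency": "G", "zone": 17, "e": 694973, "n": 4002678},
]
print(flag_inconsistent_sites(durham))  # [('G', 'P', 9.1)]
```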
Table 7 lists a sample of counties in descending order of county air
quality for sulfur dioxide, averaged over sites in the county whose purpose
was to monitor population exposure. The average for state-county code
192140, Orleans Parish, Louisiana, is unreasonably high and is most likely the
result of a keypunch error in the SAROAD data base. The potential user is
cautioned that occasional errors do occur in SAROAD, so unusual data should
be investigated thoroughly before significant conclusions are drawn. The SO2
air quality does not appear to be closely correlated with any of the demographic
or socio-economic data items chosen for display.
29
-------
[Table 5: NEDS emissions (total, point source, and area source via user-defined function *C1001*) for all Massachusetts FIPS counties; not legible in the source scan.]
30
-------
[Table 5, continued; not legible in the source scan.]
-------
TABLE 6. ILLUSTRATIVE RETRIEVAL OF SAROAD SITE AND MONITORING DATA
FOR DURHAM COUNTY, NC
PRINT/REPEAT SUPPRESS,NAME/C200 WHERE C301 EQ 11101
AND C3 EQ 341160:
AREA CODE* 1160
SITE CODE* 1
AGENCY* G
PURPOSE* 1
CITY* DURHAM
CITY POPULATION* 95438
UTM ZONE* 17
EASTING* 690551
NORTHING* 3975943
ADDRESS* HEALTH DEPT 300 EAST MAIN ST
TYPE* 13
ELEV ABOVE GND* 8
ELEV ABOVE SEA* 405
POLLUTANT* 11101
METHOD* 91
INTERVAL* 7
# OBSERVATIONS* 42
GEOMETRIC MEAN* 55.29
STD DEVIATION* 1.96
70TH PERCENTILE* 86.00
90TH PERCENTILE* 115.00
99TH PERCENTILE* 123.00
HIGHEST VALUE* 123.00
2ND HIGHEST VAL* 122.00
LOWEST VALUE* 6.00
POLLUTANT* 42401
METHOD* 91
INTERVAL* 7
# OBSERVATIONS* 45
GEOMETRIC MEAN* 6.55
STD DEVIATION* 1.50
70TH PERCENTILE* 8.00
90TH PERCENTILE* 14.60
99TH PERCENTILE* 17.00
HIGHEST VALUE* 17.00
2ND HIGHEST VAL* 16.00
LOWEST VALUE* 5.00
POLLUTANT* 42602
METHOD* 94
INTERVAL* 7
# OBSERVATIONS* 45
GEOMETRIC MEAN* 21.44
STD DEVIATION* 2.02
70TH PERCENTILE* 40.00
90TH PERCENTILE* 50.00
99TH PERCENTILE* 71.00
HIGHEST VALUE* 71.00
2ND HIGHEST VAL* 69.00
LOWEST VALUE* 10.00
(continued)
32
-------
TABLE 6 (continued)
AREA CODE* 1160
SITE CODE* 1
AGENCY* P
PURPOSE* 1
CITY* DURHAM
CITY POPULATION* 95438
UTM ZONE* 17
EASTING* 689403
NORTHING* 3984931
ADDRESS* HEALTH DEPT 300 E MAIN ST
TYPE* 13
ELEV ABOVE GND* 50
ELEV ABOVE SEA* 405
POLLUTANT* 11101
METHOD* 91
INTERVAL* 7
# OBSERVATIONS* 6
GEOMETRIC MEAN* 55.69
STD DEVIATION* 1.21
70TH PERCENTILE* 66.79
90TH PERCENTILE* 69.49
99TH PERCENTILE* 69.49
HIGHEST VALUE* 69.49
2ND HIGHEST VAL* 66.79
LOWEST VALUE* 43.39
POLLUTANT* 42401
METHOD* 91
INTERVAL* 7
# OBSERVATIONS* 5
GEOMETRIC MEAN* 15.35
STD DEVIATION* 1.70
70TH PERCENTILE* 22.19
90TH PERCENTILE* 24.89
99TH PERCENTILE* 24.89
HIGHEST VALUE* 24.89
2ND HIGHEST VAL* 22.19
LOWEST VALUE* 7.09
POLLUTANT* 42602
METHOD* 94
INTERVAL* 7
# OBSERVATIONS* 6
GEOMETRIC MEAN* 45.29
STD DEVIATION* 1.10
70TH PERCENTILE* 47.89
90TH PERCENTILE* 49.79
99TH PERCENTILE* 49.79
HIGHEST VALUE* 49.79
2ND HIGHEST VAL* 47.89
LOWEST VALUE* 37.69
(continued)
33
-------
TABLE 6 (continued)
AREA CODE* 1160
SITE CODE* 2
AGENCY* G
PURPOSE* 1
CITY* DURHAM
CITY POPULATION* 95438
UTM ZONE* 17
EASTING* 694973
NORTHING* 4002678
ADDRESS* LAKE MICHIE
ELEV ABOVE SEA* 403
POLLUTANT* 11101
METHOD* 91
INTERVAL* 7
# OBSERVATIONS* 24
GEOMETRIC MEAN* 33.14
STD DEVIATION* 1.67
70TH PERCENTILE* 38.00
90TH PERCENTILE* 59.00
99TH PERCENTILE* 85.00
HIGHEST VALUE* 85.00
2ND HIGHEST VAL* 59.00
LOWEST VALUE* 6.00
POLLUTANT* 42401
METHOD* 91
INTERVAL* 7
# OBSERVATIONS* 25
GEOMETRIC MEAN* 5.21
STD DEVIATION* 1.19
70TH PERCENTILE* 5.00
90TH PERCENTILE* 5.00
99TH PERCENTILE* 12.00
HIGHEST VALUE* 12.00
2ND HIGHEST VAL* 6.00
LOWEST VALUE* 5.00
POLLUTANT* 42602
METHOD* 94
INTERVAL* 7
# OBSERVATIONS* 25
GEOMETRIC MEAN* 13.06
STD DEVIATION* 1.33
70TH PERCENTILE* 16.00
90TH PERCENTILE* 17.00
99TH PERCENTILE* 28.00
HIGHEST VALUE* 28.00
2ND HIGHEST VAL* 21.00
LOWEST VALUE* 10.00
34
-------
[Table 7: sample of counties in descending order of sulfur dioxide air quality (population-exposure sites), with selected demographic and socio-economic data items; not legible in the source scan.]
35
-------
The data in Table 8 were retrieved to investigate a possible relationship
between county alcohol sales and the age-adjusted white male death rate for
deaths attributable to motor-vehicle accidents. To place the listed values of
alcohol sales per capita (population over 18) in perspective, a feature of
System 2000, namely its capability to produce nationwide averages for data
items, was used in a separate retrieval; it gave a nationwide average of
$64.40 for alcohol sales per capita over 18. Inspection of the data in
Table 8 shows that the counties reporting high death rates did not have
unusually high per capita alcohol sales, but that a disproportionate share
of high-death-rate counties were rural. Data on gas sales per capita for
State County Code 453330, Loving County, Texas, appear unreasonable and may
be due to confidentiality restrictions in the reporting of gas sales.
In Table 9 the age-adjusted death rates for white males for bronchitis
are compared to those for influenza and pneumonia for counties having high
death rates from bronchitis. A large proportion of the counties retrieved
have small total populations and generally high migration rates. While
the data suggest that there may be some relationship between the incidence
of disease and population mobility, no conclusions should be drawn in this
regard, nor with regard to possible association between the two disease
categories, until more counties (especially urban counties) are included.
The data base provides the capability of pursuing implications of high
migration rates in counties with high (or low) death rates. This might be a
logical next step, since high migration rates in age groups not normally
associated with the disease category under investigation would not, in and
of themselves, make a county unsuitable for drawing conclusions about
possible associations.
The final retrievals in Tables 10 and 11 compare the geographical
distribution of age-adjusted death rates for the white female population for
breast cancer and for a cluster which includes breast cancer as a significant
component. Eleven of the first 37 counties occur in both lists, but the data
suggest a substantially different geographical distribution of the two
variables. Similar retrievals using different variables could serve to
illustrate the suitability of the cluster technique for investigating
associations between air quality and disease.
36
-------
[Rotated table; contents not recoverable from the scanned text]
37
-------
[Rotated table (continued); contents not recoverable from the scanned text]
38
-------
[Rotated table (continued); contents not recoverable from the scanned text]
39
-------
[Rotated table with a column headed INFLUENZA and a rotated footnote; contents not recoverable from the scanned text]
40
-------
[Rotated table (continued); contents not recoverable from the scanned text]
41
-------
[Rotated table; contents not recoverable from the scanned text]
42
-------
[Rotated table; contents not recoverable from the scanned text]
•
43
-------
SECTION 5
DATA BASE DEVELOPMENT PROCEDURES
CONSIDERATIONS LEADING TO CHOICE OF DATA ITEMS
Choices of specific data items to be included in POPATRISK were made
in close consultation with the EPA Project Officer. Considerations of
utility to EPA were primary, but the limitations imposed by data base size,
and the consequent development and retrieval times, played a significant role
in the decision process. In this section, some rationale is given for the
specific choices made.
Entry Level Data Items
The county level is the smallest geographic entity for which deaths
and population mobility data are available. In addition, area emission
estimates provided by the NEDS file are available only on a county basis.
Even if these data sources were not restricted to the county level, daily
population mobility, such as commuting to work, strongly suggests that there
is little to be gained, in studies of possible relationships between air
quality and disease, from a geographic subdivision finer than the county
unless data on the individuals themselves are available. For these reasons,
the county was chosen as the geographical entity for POPATRISK. Once this
decision was made, data were included on population density, racial makeup,
measures of socio-economic status, climatology, and population mobility,
both with regard to work/residence and to in- and out-migration.
Monitoring Data
It was decided to preserve the SAROAD monitoring data intact and in their
entirety. The inclusion of UTM coordinates, for example, makes feasible the
44
-------
development of air quality estimates on a finer than county level for
special user purposes. The geometric mean and standard deviation allow the
arithmetic mean to be computed readily if the user wishes; this computation
could even be provided as a user-defined function if experience shows that
the statistic is in much demand.
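If the monitoring values are approximately lognormal, and if the STD DEVIATION field is the geometric standard deviation (both are assumptions; this report does not state them), the arithmetic mean follows from the standard lognormal identity AM = GM * exp((ln GSD)^2 / 2). A minimal sketch; the function name is hypothetical:

```python
import math


def arithmetic_mean_from_geometric(gm: float, gsd: float) -> float:
    """Arithmetic mean of a lognormal variable.

    Assumes the data are lognormal and that `gsd` is the geometric
    standard deviation (dimensionless, >= 1).
    """
    if gm <= 0 or gsd < 1:
        raise ValueError("expected gm > 0 and gsd >= 1")
    sigma = math.log(gsd)  # standard deviation of ln(x)
    return gm * math.exp(0.5 * sigma ** 2)


# Durham (Lake Michie) record from Table 6: pollutant 11101,
# geometric mean 33.14, standard deviation 1.67.
am = arithmetic_mean_from_geometric(33.14, 1.67)
print(f"{am:.2f}")  # about 37.80
```

Because the geometric standard deviation enters through the square of its logarithm, the gap between the two means widens quickly at more variable sites.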
Air Quality
A number of schemes of varying complexity were considered as candidates
for the estimation of county air quality. In the end, the simplest, i.e.,
an arithmetic mean of geometric means for all stations by purpose type, was
chosen because no other seemed to offer distinct advantages. As mentioned
above, both intra-area and interarea population mobility and the coarseness
of mortality statistics were considered to limit the usefulness of a finer
geographical breakdown. It was also felt that, by preserving the SAROAD
monitoring data intact, POPATRISK would enable individual investigators to
use the data to construct more complex air quality assessments meeting
particular analytical needs with maximum flexibility.
In-out Migration
The Census tape from which the POPATRISK in-out migration data were
derived was specially compiled by Census for this work and represents, to
our knowledge, the first use of these data at the county level. It was felt
especially important to include these data so that individual assessment
could be made of the stability of a county with regard to population and
consequently so that possible artifacts in the relationship between air
quality and mortality could be flagged and, perhaps, explained.
Mortality
The direct method of age-adjustment was chosen to correspond to the
practice most widely used for making county mortality statistics comparable
from county to county. Limitations on data base size precluded more than
about 50 ICDA categories, so it was decided to choose the 48 categories
corresponding to aggregations of ICDA's already in use at HERL. Following
a published attempt in the literature to provide the most rational basis for
45
-------
aggregating ICDA's into meaningful groups, it was decided to include two other
aggregations on an experimental basis. These aggregations were derived
from the results of a cluster analysis reported by F. Burbank (A Sequential
Space-time Cluster Analysis of Cancer Mortality in the United States:
Etiologic Implications, Am. J. Epid. 95, 383-417, 1972) in which ICDA cancer
mortalities were analyzed on the basis of geographical similarity among tumor
types, separately for the white male and white female population. For this
work, it was decided to select groupings of ICDA's which were quite similar
for the white male and white female population. With this selection criterion
only two clusters seemed meaningful and were used for the remaining two
mortality categories. The utility of this approach remains a subject for
future investigation.
HISTORY OF DEVELOPMENT
The POPATRISK data base development and loading procedure was modularized:
as data from each selected source became available, methodologies for
processing and loading the data, along with resolution of geocoding
discrepancies, were developed and implemented.
Initial verification of raw data to be used in the data base involved
for the most part close inspection of the data and extensive correspondence
with those persons responsible for providing the data. Raw data verification
processes were performed on Census and mortality data during processing
procedures and will be discussed later in this section.
A more extensive verification process was performed on the in-out
migration data obtained from the Bureau of the Census. This verification
was carried out after consultation with the Bureau of the Census, because
the use of these data in POPATRISK represented their first access outside
the Bureau. Detailed documentation on the migration file is provided in Appendix B.
To verify the data on the in-out migration file, 33 counties which are
also state economic areas were selected and data for these counties were
compared with published Census data. The number of unallocated out-migrants
by sex and by age group in these counties matched exactly the data in Table
2 of the PC(2)-2E Migration Between State Economic Areas Census report.
46
-------
The numbers of unallocated in-migrants on the file were also checked
for these counties. Since comparisons with published reports could not be
made by sex or by age group, in-migrant totals on the file were summed for
both sexes and all ages. These unallocated in-migrant totals on the file
were compared with totals obtained by adding the "abroad in 1965" data in
Table 1 of the PC(2)-2E Migration Between State Economic Areas to the sum
of the male and female "population 5 years old and over" data in Table 2
of the PC(2)-2E publication. Exact matches could not be made with published
data on in-migration because of differing allocation procedures used in
creating the migration file, but the differences in the data totals of
each county selected were less than 1%. Table 12 gives the results of this
comparison.
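The county comparison behind Table 12 reduces to a percentage difference per county. A minimal sketch, assuming the difference is taken relative to the published Census total (the report does not say which total is the base); the four rows are copied from Table 12, and the names are illustrative:

```python
# Excerpt of Table 12: county -> (total from 1970 migration tape,
#                                 total from Census publication)
TABLE_12_EXCERPT = {
    "Pima, Ariz.": (95501, 95371),
    "Hartford, Conn.": (95197, 94403),
    "Aroostook, Maine": (13273, 13154),
    "Kanawha, W. Va.": (22999, 22861),
}


def percent_difference(tape_total: int, published_total: int) -> float:
    """Absolute difference as a percentage of the published total."""
    return 100.0 * abs(tape_total - published_total) / published_total


diffs = {county: percent_difference(tape, pub)
         for county, (tape, pub) in TABLE_12_EXCERPT.items()}
for county, d in sorted(diffs.items(), key=lambda item: item[1], reverse=True):
    print(f"{county:18s} {d:.2f}%")
print("all under 1%:", all(d < 1.0 for d in diffs.values()))
```

Of these four rows, Aroostook shows the largest gap, at about 0.90%.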
State totals on the tape were also verified for the contiguous
United States and the District of Columbia by aggregating totals for all
ages and both sexes. These totals corresponded exactly with the unallocated
out-migrant totals determined from Table 4 of PC(2)-2E by the following
formula: "total persons living in state in 1965" - "total persons still
living in state in 1970."
As with the in-migrant county totals, unallocated in-migrant state totals
on our file differed slightly (less than 1%) from published data. Data from
Table 1 of PC(2)-2E under the general heading "different state economic
area" and the specific heading "different state" were added to "abroad in
1965" data from Table 44 of PC(2)-2B, Mobility for States and the Nation.
The resulting data totals compared closely with the sum of male and female
in-migrant totals for all ages on our tape. This comparison is shown in
Table 13.
METHODOLOGIES FOR PROCESSING AND LOADING
During the entire procedure of processing and loading the POPATRISK
data base, extreme care was taken to assure proper maintenance of the quality
and integrity of the data from their verified sources, through the processing
stage, and into the data base. This involved a thorough investigation of
the data to isolate all anomalies and geocoding discrepancies in the data
47
-------
TABLE 12. UNALLOCATED IN-MIGRATION TOTALS (ALL RACES, AGES, BOTH SEXES) FOR
SELECTED COUNTIES WHERE COUNTY=STATE ECONOMIC AREA (SEA)
STATE           COUNTY         FIPS CODE  SEA   TOTAL 1*   TOTAL 2**
Arizona         Pima           04 019     B       95501      95371
Arkansas        Pulaski        05 119     A       52092      52129
California      Ventura        06 111     7      114038     113820
Colorado        El Paso        08 041     B       94668      94312
Connecticut     Hartford       09 003     C       95197      94403
Delaware        New Castle     10 003     A       52501      52338
Georgia         Bibb           13 021     F       16985      16988
Illinois        Boone          17 007     2        6545       6568
Indiana         St. Joseph     18 141     B       29369      29379
Iowa            Woodbury       19 193     A       14571      14584
Kansas          Sedgwick       20 173     A       62292      62222
Kentucky        Fayette        21 067     E       38991      38985
Louisiana       Calcasieu      22 019     D       18725      18722
Maine           Aroostook      23 003     1       13273      13154
Massachusetts   Worcester      25 027     B       61123      60888
Michigan        Kent           26 081     B       49305      49343
Mississippi     Hinds          28 049     A       31879      31881
Missouri        Greene         29 077     C       33160      33181
Nevada          Clark          32 003     A       73634      73572
New Hampshire   Hillsborough   33 011     A       35751      35796
New Jersey      Atlantic       34 001     E       21378      21395
New Mexico      Bernalillo     35 001     A       68623      68495
New York        Broome         36 007     E       25542      25542
North Carolina  Wake           37 183     E       51031      51005
Ohio            Allen          39 003     0       12341      12330
Oklahoma        Creek          40 037     C        9506       9506
Oregon          Lane           41 039     B       50693      50676
Pennsylvania    Blair          42 013     F        9868       9868
South Carolina  Aiken          45 003     B       13024      12997
Tennessee       Davidson       47 037     B       66472      66346
Utah            Salt Lake      49 035     A       57067      56803
Washington      Spokane        53 063     D       58094      58011
West Virginia   Kanawha        54 039     C       22999      22861

*  Total from 1970 migration tape
** Total from Census publication
48
-------
TABLE 13. STATE UNALLOCATED IN-MIGRATION TOTALS FOR
ALL RACES, AGES, BOTH SEXES
STATE                  FIPS   TOTAL FROM 1970   TOTAL FROM CENSUS
                       CODE   MIGRATION TAPE    PUBLICATION
Alabama                01          253274            253445
Arizona                04          367799            368192
Arkansas               05          183763            183918
California             06         2164948           2171140
Colorado               08          393295            393764
Connecticut            09          310808            311115
Delaware               10           74058             74175
District of Columbia   11          100313            100436
Florida                12         1344272           1345594
Georgia                13          474952            475353
Idaho                  16          100797            100926
Illinois               17          797673            798727
Indiana                18          387630            388114
Iowa                   19          189704            190120
Kansas                 20          261540            261819
Kentucky               21          239756            239915
Louisiana              22          248315            248684
Maine                  23           82386             82527
Maryland               24          513130            513893
Massachusetts          25          450030            450842
Michigan               26          564476            565340
Minnesota              27          269816            270656
Mississippi            28          168462            168549
Missouri               29          410022            410605
Montana                30           76863             76972
Nebraska               31          138803            138998
Nevada                 32          119133            119224
New Hampshire          33          106157            106285
New Jersey             34          723874            724639
New Mexico             35          144420            144617
New York               36         1053286           1054876
North Carolina         37          415305            415688
North Dakota           38           59260             59322
Ohio                   39          681849            683030
Oklahoma               40          303150            303425
Oregon                 41          273894            274395
Pennsylvania           42          582853            584037
Rhode Island           44          103951            104311
South Carolina         45          246710            246959
South Dakota           46           56105             56177
Tennessee              47          326387            326679
Texas                  48         1028483           1029826
Utah                   49          118723            118865
Vermont                50           56911             56940
Virginia               51          641962            643407
Washington             53          512079            513084
West Virginia          54          114621            114722
Wisconsin              55          270043            270534
Wyoming                56           52575             52611
49
-------
as well as verification that all such problems were resolved as accurately
as possible. The effort was accomplished, for the most part, through use
of county code cross-walk files developed to match record identifiers with
data base entries. These files were merged to produce the county cross-
reference table included in the POPATRISK Geocoding Manual.
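A cross-walk of this kind maps each source-file county identifier to a data base county, summing where several source units fold into one county. A minimal sketch; the function name is hypothetical, the three code pairs are Virginia city-to-county assignments from Table 16, and the values and the unmatched code "9999" are invented for illustration:

```python
from collections import defaultdict


def apply_crosswalk(records, crosswalk):
    """Re-key (source_code, value) records to POPATRISK county codes.

    Several source codes may map to one county (e.g., an independent
    city folded into its county), so values are summed. Codes missing
    from the cross-walk are returned separately for manual review.
    """
    totals = defaultdict(float)
    unmatched = []
    for code, value in records:
        target = crosswalk.get(code)
        if target is None:
            unmatched.append(code)
        else:
            totals[target] += value
    return dict(totals), unmatched


# City SAROAD code -> POPATRISK county code (Table 16); values hypothetical.
crosswalk = {"0080": "0200", "1040": "1060", "1080": "1060"}
records = [("0080", 120.0), ("1040", 40.0), ("1080", 15.0), ("9999", 7.0)]
totals, unmatched = apply_crosswalk(records, crosswalk)
print(totals)     # {'0200': 120.0, '1060': 55.0}
print(unmatched)  # ['9999']
```

Returning unmatched codes rather than dropping them mirrors the report's emphasis on isolating every geocoding discrepancy before loading.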
Every opportunity was taken to spot-check data during processing and
loading to ensure not only accuracy of processing but also that data were
loaded into the proper county entry and in the correct location. The
cohesive nature of the data base entries provided opportunity for an on-
going process of verifying already loaded data as new data were being
processed and loaded.
Figure 2 provides an overview of the first phase of the POPATRISK
data base development process. The loading processes shown in Figure 2
were fairly mechanical and involved moderate amounts of data. For these
data sets, the loading procedure involved extracting data from the input
files, structuring the data into S2K load string format with the proper
associated county code, and then performing Natural Language Queue Access
loading. All loading processes involved incremental loads in groups of county
repeating groups to ensure successful execution of load jobs within the
constraints of run time and file sizes.
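The report does not give the S2K load string format or the actual job limits, so the sketch below only illustrates the incremental-load idea: consecutive counties are grouped into batches whose combined size stays under a hypothetical per-job limit. Function name, sizes, and limit are all assumptions:

```python
def batch_counties(county_codes, record_sizes, max_batch_size):
    """Group counties, in order, into incremental load batches.

    record_sizes: {county code: size of that county's load data}
    max_batch_size: hypothetical per-job limit in the same units.
    """
    batches, current, current_size = [], [], 0
    for code in county_codes:
        size = record_sizes[code]
        if size > max_batch_size:
            raise ValueError(f"county {code} alone exceeds the batch limit")
        if current and current_size + size > max_batch_size:
            batches.append(current)
            current, current_size = [], 0
        current.append(code)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical county codes and sizes.
print(batch_counties(["0001", "0003", "0005", "0007"],
                     {"0001": 3, "0003": 4, "0005": 2, "0007": 5}, 7))
# [['0001', '0003'], ['0005', '0007']]
```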
Population demographics and climate data items in the entry level were
taken straight from the SSI-developed tape and loaded county by county.
Matching of counties in the data base presented no problems with these data,
since discrepancies had been resolved in previous work by SSI. However, as
mentioned in Section 2, entry level data for New York City were provided in
one record for New York City as a whole. Income ratios and climate data
were considered the same for the five boroughs and were duplicated for each
borough entry. Other demographic data for each borough were extracted from
the 1972 County and City Data Book.
The NEDS emissions and SAROAD monitoring data were also simply
extracted from the appropriate NADB files and loaded into the data base
county by county. Users should be aware that some monitoring data in
50
-------
[Figure 2. Overview of the first phase of POPATRISK data base development; figure not recoverable from the scan]
51
-------
SAROAD are of questionable quality. This applies for the most part to a
few of the reported values for observed maxima. It was not possible, within
time and budget constraints, to edit SAROAD data for the purposes of this
contract. However, it is our judgment that the yearly averages abstracted
for use in POPATRISK can be used with considerable confidence. It should
also be noted that although running-average monitoring data, identified
by sampling intervals X, Y, or Z, were initially loaded into the data
base, the quality and usefulness of these data were found to be
questionable and the data were subsequently removed.
Since NEDS recognizes only 6 counties in Massachusetts, SAROAD
recognizes 13, and the POPATRISK data base contains 14 Census counties,
resolution of the discrepancies required a complete reapportionment of
both sets of data to the respective data base counties.
As discussed in Section 2, the Census coding scheme was selected for
Massachusetts because of its finer geographic breakdown and widespread use.
The four-digit county codes used are the three-digit FIPS codes with leading
zeros added to create a four-digit POPATRISK code. Table 14 lists each
SAROAD city and county and POPATRISK code to which it was assigned. The
NEDS area emissions were aggregated for the state and then disaggregated
into 14 Census counties using 1970 Census figures, gasoline sales, etc.,
on a basis which conforms largely to the guidelines in the EPA document
'Guide for Compiling a Comprehensive Emission Inventory,' APTD-1135. There
are approximately 2,500 point sources in Massachusetts. They were assigned
to appropriate Census counties by a computerized procedure involving
geographic location by UTM coordinates. A detailed discussion of the
procedures used to assign NEDS point source data to Massachusetts counties
is provided in Appendix A.
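Two mechanical steps above can be sketched: padding a three-digit FIPS county code to the four-digit POPATRISK code, and splitting a state total among counties in proportion to a surrogate such as 1970 population. This is a simplified illustration with a single surrogate; the actual category-by-category choices in APTD-1135 are not reproduced here, and the totals and weights are hypothetical (Middlesex is FIPS 017 and Suffolk 025, per Table 14):

```python
def popatrisk_county_code(fips_county: int) -> str:
    """Three-digit FIPS county code -> four-digit POPATRISK code."""
    return f"{fips_county:04d}"


def disaggregate(state_total: float, surrogate: dict) -> dict:
    """Split a state total among counties in proportion to a surrogate."""
    total = sum(surrogate.values())
    if total <= 0:
        raise ValueError("surrogate must have a positive total")
    return {county: state_total * w / total for county, w in surrogate.items()}


# Hypothetical state area-source total (tons/year) and county weights.
shares = disaggregate(1000.0, {popatrisk_county_code(17): 3.0,
                               popatrisk_county_code(25): 1.0})
print(shares)  # {'0017': 750.0, '0025': 250.0}
```

Proportional splitting preserves the state total exactly, so the reapportioned counties still sum to the NEDS state figure.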
SAROAD monitoring data for Massachusetts were simply reassigned to
their appropriate data base counties by identifying the city name of all
SAROAD monitoring sites and then locating the data base county in which
the city is located.
52
-------
TABLE 14. ASSIGNMENT OF MASSACHUSETTS SAROAD MONITORING TO CENSUS COUNTIES
(all state 22)

SAROAD City Name   Census County Name   POPATRISK County Code
Adams              Berkshire            0003
Amherst            Hampshire            0015
Athol              Worcester            0027
Attleboro          Bristol              0005
Ayer               Middlesex            0017
Belchertown        Hampshire            0015
Boston             Suffolk              0025
Brookline          Norfolk              0021
Cambridge          Middlesex            0017
Chicopee           Hampden              0013
Fall River         Bristol              0005
Falmouth           Barnstable           0001
Fitchburg          Worcester            0027
Framingham         Middlesex            0017
Greenfield         Franklin             0011
Haverhill          Essex                0009
Holyoke            Hampden              0013
Lawrence           Essex                0009
Lee                Berkshire            0003
Lowell             Middlesex            0017
Lynn               Essex                0009
Marblehead         Essex                0009
Maynard            Middlesex            0017
Medford            Middlesex            0017
Needham            Norfolk              0021
New Bedford        Bristol              0005
Newburyport        Essex                0009
North Adams        Berkshire            0003
Northfield         Franklin             0011
Norwood            Norfolk              0021
Peabody            Essex                0009
Pittsfield         Berkshire            0003
Plymouth           Plymouth             0023
Quincy             Norfolk              0021
Revere             Suffolk              0025
Springfield        Hampden              0013
Waltham            Middlesex            0017
Warren             Worcester            0027
Woburn             Middlesex            0017
Worcester          Worcester            0027
53
-------
Since POPATRISK is a county based data base, data for independent cities
without county status required reassignment to their original counties.
Only three independent cities are universally given county status, and they
therefore maintain that status in the data base: Baltimore, Maryland;
St. Louis, Missouri; and Carson City, Nevada. They are listed in Table 15.
Independent cities in Virginia were assigned to their original counties as
shown in Table 16. As mentioned in Section 2, it was necessary to create
two data base counties for those independent cities for which no county now
exists, namely Elizabeth City-Warwick and Norfolk-Princess Anne, with
arbitrarily assigned codes 0972 and 2144, respectively. Entry level data
for the new counties were obtained by aggregating the data for the
independent cities they contain.
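When entry level data for several independent cities are combined into one new county, count items can simply be added, but ratio items need weighting. A minimal sketch, assuming population weighting for ratio items such as income ratios (the report does not state how ratios were combined); field names and values are hypothetical toy numbers:

```python
def aggregate_cities(cities):
    """Combine entry-level data for independent cities into one county.

    Counts are summed; the ratio item is population-weighted (an
    assumption about how such items would be combined).
    """
    population = sum(c["population"] for c in cities)
    if population == 0:
        raise ValueError("cannot weight by a zero total population")
    income_ratio = sum(c["population"] * c["income_ratio"]
                       for c in cities) / population
    return {"population": population, "income_ratio": income_ratio}


# Hypothetical values for the two cities in Elizabeth City-Warwick (0972).
county_0972 = aggregate_cities([
    {"name": "Hampton", "population": 100, "income_ratio": 1.0},
    {"name": "Newport News", "population": 300, "income_ratio": 2.0},
])
print(county_0972)  # {'population': 400, 'income_ratio': 1.75}
```

A plain average of the two ratios (1.5) would overweight the smaller city, which is why the sketch weights by population.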
The NEDS emission data for Virginia independent cities were added to
the NEDS data for the county to which they were assigned. Independent city
SAROAD site and monitoring data were simply retrieved and loaded into the
county entry to which they were assigned.
To calculate monitoring air quality data, the geometric means for each
pollutant were retrieved from the data base for sites whose purpose is
(1) population monitoring, (2) source monitoring, or (3) background
monitoring, and whose sampling interval is 1 (1 hour) or 7 (24 hours).
Arithmetic averages of the geometric means were then calculated for each
pollutant for each of the three site types.
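The county air quality estimate is an arithmetic mean of station geometric means, grouped by pollutant and site purpose. A minimal sketch, assuming SAROAD interval codes "1" (1 hour) and "7" (24 hours) as in the Table 6 records, and treating running-average codes X, Y, Z as excluded; purpose codes are kept as opaque keys, and the second and third site records are hypothetical:

```python
from collections import defaultdict
from statistics import mean

# Sampling-interval codes retained (assumed SAROAD coding, as seen in Table 6).
ELIGIBLE_INTERVALS = {"1", "7"}


def county_air_quality(site_records):
    """site_records: iterable of (pollutant, purpose, interval, geometric_mean)
    for one county. Returns {(pollutant, purpose): mean of geometric means}."""
    groups = defaultdict(list)
    for pollutant, purpose, interval, gm in site_records:
        if interval in ELIGIBLE_INTERVALS:
            groups[(pollutant, purpose)].append(gm)
    return {key: mean(values) for key, values in groups.items()}


records = [
    ("11101", "1", "7", 33.14),  # Table 6, Durham (Lake Michie)
    ("11101", "1", "7", 41.00),  # hypothetical second site
    ("11101", "1", "X", 90.00),  # hypothetical running average: ignored
]
result = county_air_quality(records)
print(result)
```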
A predicted mean of emissions for TSP was calculated by SSI on the
basis of a regression model described in Appendix C. A card deck of NEDS
data had to be used because the regression model was developed using a
version of the NEDS emissions inventory that is no longer accessible
through NADB files. The predicted mean for TSP was calculated for counties
where TSP and NOx emissions (tons/year) fall within the following limits:

$$1414.375 \le \mathrm{TSP} \le 230031.803, \qquad \mathrm{NO_x} < 432778.195$$
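The regression itself is described in Appendix C and is not reproduced here; the sketch only encodes the eligibility limits quoted above. Whether the NOx bound is strict cannot be confirmed from the scan, so treating it as strict is an assumption; the function name is hypothetical:

```python
# Emission limits (tons/year) quoted in the text for the Appendix C regression.
TSP_MIN, TSP_MAX = 1414.375, 230031.803
NOX_MAX = 432778.195


def qualifies_for_predicted_tsp(tsp_tons: float, nox_tons: float) -> bool:
    """True if a county's NEDS TSP and NOx emissions fall in the stated range."""
    return TSP_MIN <= tsp_tons <= TSP_MAX and nox_tons < NOX_MAX


print(qualifies_for_predicted_tsp(5000.0, 20000.0))  # True
print(qualifies_for_predicted_tsp(1000.0, 20000.0))  # False: TSP below range
```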
54
-------
TABLE 15. INDEPENDENT CITIES RECOGNIZED AND CODED IN DATA BASE
City                   POPATRISK State-County Code
Baltimore, Maryland    (not legible in scan)
St. Louis, Missouri    26 4280
Carson City, Nevada    29 0040
55
-------
TABLE 16. ASSIGNMENT OF VIRGINIA INDEPENDENT CITIES TO POPATRISK COUNTIES
(all state 48)

                   SAROAD
City Name          Code     County Name              POPATRISK County Code
Alexandria         0080     Arlington                0200
Bedford            0320     Bedford                  0340
Bristol            0480     Washington               3300
Buena Vista        0560     Rockbridge               2740
Lexington          1740     Rockbridge               2740
Charlottesville    0680     Albemarle                0060
Clifton Forge      0780     Alleghany                0100
Covington          0840     Alleghany                0100
Colonial Heights   0820     Chesterfield             0720
Danville           0920     Pittsylvania             2380
Emporia            0980     Greensville              1400
Fairfax            1040     Fairfax                  1060
Falls Church       1080     Fairfax                  1060
Franklin           1180     Southampton              2940
Fredericksburg     1240     Spotsylvania             3000
Galax              1280     Grayson                  1360
Harrisonburg       1480     Rockingham               2760
Hopewell           1560     Prince George            2500
Lynchburg          1840     Campbell                 0580
Martinsville       1940     Henry                    1520
Norton             2240     Wise                     3420
Petersburg         2360     Dinwiddie                0960
Radford            2600     Montgomery               2020
Richmond           2660     Henrico                  1500
Roanoke            2700     Roanoke                  2720
Salem              2800     Roanoke                  2720
South Boston       2920     Halifax                  1420
Staunton           3060     Augusta                  0260
Waynesboro         3320     Augusta                  0260
Suffolk            3080     Nansemond                2060
Williamsburg       3360     James City               1600
Winchester         3380     Frederick                1220
Hampton            1440     Elizabeth City-Warwick   0972
Newport News       2120     Elizabeth City-Warwick   0972
Chesapeake         0710     Norfolk-Princess Anne    2144
Norfolk            2140     Norfolk-Princess Anne    2144
Portsmouth         2440     Norfolk-Princess Anne    2144
Virginia Beach     3240     Norfolk-Princess Anne    2144
56
-------
A formatted input file of SAROAD data selected from the data base was
read simultaneously with the NEDS input card deck. The county identifiers on
each file were matched and calculations performed. Results were placed in
S2K load strings and subsequently loaded into the data base.
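Reading two files "simultaneously" and matching county identifiers is the classic sequential match-merge. A minimal sketch, assuming both inputs are sorted ascending by county identifier and hold at most one record per county; the function name and sample records are hypothetical:

```python
def match_merge(left, right):
    """Join two iterables of (county_id, record), each sorted by county_id,
    in one sequential pass. Yields (county_id, left_record, right_record)
    for ids present in both; ids found in only one input are skipped."""
    left, right = iter(left), iter(right)
    a, b = next(left, None), next(right, None)
    while a is not None and b is not None:
        if a[0] == b[0]:
            yield a[0], a[1], b[1]
            a, b = next(left, None), next(right, None)
        elif a[0] < b[0]:
            a = next(left, None)   # county only in the first file
        else:
            b = next(right, None)  # county only in the second file


saroad = [("01", "saroad-01"), ("03", "saroad-03"), ("05", "saroad-05")]
neds = [("03", "neds-03"), ("04", "neds-04"), ("05", "neds-05")]
print(list(match_merge(saroad, neds)))
# [('03', 'saroad-03', 'neds-03'), ('05', 'saroad-05', 'neds-05')]
```

Because each file is read once in order, the approach suits tape or card input, where random access is expensive.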
In those cases for which there were no monitoring data for a county,
but the TSP and NOx values qualified the county for calculation of mean
predicted TSP, a single data set for TSP exists containing the mean
predicted TSP.
The air quality repeating group provides the user readily accessible
estimates for assessing the general exposure of county population to the
various pollutants.
Figure 3 provides an overview of the second phase of the data base
development process. Processing and loading of data at this stage involved
numerous calculations and handling of much larger quantities of data.
Therefore, S2K Procedural Language Interface COBOL programs were used to
interface with the data base, retrieve data base data needed for calcu-
lations, and load the data by means of PLI optimized loading.
The Census in-out migration tape was read and the FIPS county code for
each record on the tape was written to a file to create a cross-walk file
for matching migration counties with data base counties. Although there
are state total records on the original tape, these data were not needed
for POPATRISK and were disregarded. Also disregarded on the tape were
migration records for Alaska and Hawaii.
Geocoding problems with the migration data were resolved in a manner
similar to that used for previous data types. Baltimore and St. Louis
independent cities and counties were reordered to match the SAROAD scheme;
data for Yellowstone Park, Montana, were added into the record for Park
57
-------
[Figure 3. Overview of the second phase of POPATRISK data base development; figure not recoverable from the scan]
58
-------
County, Montana, and data for all Virginia independent cities were added
into the records for their corresponding counties. There was no record on
the migration tape for Loving County, Texas, POPATRISK code 45 3330. Since
the population of that county was 164, its data would have been suppressed
under Census confidentiality requirements.
When all geocoding problems were resolved, a PLI program was written
to process the migration tape using the FIPS-POPATRISK cross-walk file
and load the data county by county. Processing of migration data involved
retrieval of the 16th data set (totals) of the age-specific Census data.
These race-sex totals were used to convert migration data to percentages
of the total population for a particular race-sex group. For example, the
number of white male in-migrants in the 5-14 age group was divided by the
total number of white males in the county and multiplied by 100 to give the
percentage of all white males in the county who were both in-migrants and
between ages 5 and 14. Migration percentages were calculated for both 'Total'
(Allocated + Unallocated) and 'Allocated' data. As data sets were processed
and loaded, total county migration figures were tallied and loaded into
the entry level components for the county.
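The percentage calculation is a ratio to the race-sex group total, computed once for 'Total' (allocated plus unallocated) and once for 'Allocated' migrants. A minimal sketch; the function names, the zero-total convention, and the counts are hypothetical:

```python
def percent_of_group(count: float, group_total: float) -> float:
    """Count as a percentage of a race-sex group's county total."""
    if group_total <= 0:
        return 0.0  # hypothetical convention for an empty group
    return 100.0 * count / group_total


def migration_percentages(allocated: int, unallocated: int, group_total: int) -> dict:
    """'Total' (allocated + unallocated) and 'Allocated' percentages."""
    return {
        "total": percent_of_group(allocated + unallocated, group_total),
        "allocated": percent_of_group(allocated, group_total),
    }


# Hypothetical county: 30 allocated and 120 unallocated white male
# in-migrants aged 5-14, out of 10,000 white males.
print(migration_percentages(30, 120, 10_000))  # {'total': 1.5, 'allocated': 0.3}
```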
The New York City boroughs presented a unique problem with in-out
migration data. As the Census documentation in Appendix B explains, persons
reporting in 1970 that their residence in 1965 was merely a different
house in New York City or an incomplete response of "New York City - no
borough specified" were not included in either the in- or out-migrant tallies
for individual boroughs. An extra summary record for New York City as a
whole is provided that excludes those persons with the above responses.
The documentation also describes another extra record on the tape that
contains out-migrant counts for those persons with a 1965 residence of "New
York City - no borough specified" and 1970 residence outside New York City.
However, this record was not found on the tape.
59
-------
In order to complete the data for New York City boroughs, SSI contacted
the Bureau of the Census to discuss the possibility of retrieving data
not included on the tape and incorporating those data into the data base.
The Census Bureau was very helpful in providing hard copy tabulations of
the needed data. Specifically, tabulations provided show counts of persons
reporting 1970 residence of one of the New York City boroughs and 1965
residence of 'New York City' - borough not specified by sex, race, and age
groups; also supplied were counts of persons reporting 1970 residence out-
side New York City and 1965 residence 'New York City' - no borough speci-
fied by race, sex, and age group.
A procedure was developed to distribute the in-migrants who gave incomplete
responses (no borough specified) for their 1965 residence among the five
boroughs, in proportion to the in-migrants giving complete responses. The
procedure resulted in estimated counts of 'incomplete response' in-migrants
from each of the five boroughs, which, when summed over the 1970 residence
boroughs, provided out-migrant counts from each of the boroughs. Both in-
and out-migrant figures were considered 'allocated' and were added to their
respective borough records.
'Incomplete response' out-migrants from New York City were distributed
to each of the five boroughs according to the 1970 borough population.
These figures were also considered 'allocated' and were added to their
appropriate boroughs.
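One reading of the in-migrant procedure: for each 1970 borough, its 'no borough specified' in-migrants are split among 1965 boroughs in proportion to that borough's complete-response in-migrants from each origin, and the estimated flows are then summed by origin to give out-migrants. The sketch below follows that interpretation, which is an assumption, with hypothetical counts for three boroughs:

```python
from collections import defaultdict


def distribute_incomplete(incomplete_by_dest, complete_flows):
    """Apportion 'no borough specified' in-migrants among 1965 boroughs.

    incomplete_by_dest: {1970 borough: in-migrants whose 1965 borough is unknown}
    complete_flows: {(1965 borough, 1970 borough): complete-response in-migrants}
    Returns (estimated flows {(origin, dest): count},
             estimated out-migrants {origin: count summed over destinations}).
    """
    flows = {}
    out_migrants = defaultdict(float)
    for dest, count in incomplete_by_dest.items():
        weights = {o: n for (o, d), n in complete_flows.items() if d == dest}
        total = sum(weights.values())
        if total == 0:
            continue  # nothing to apportion against; these counts are left out
        for origin, n in weights.items():
            share = count * n / total
            flows[(origin, dest)] = share
            out_migrants[origin] += share
    return flows, dict(out_migrants)


# Hypothetical counts for three of the five boroughs.
complete = {("Bronx", "Queens"): 30, ("Brooklyn", "Queens"): 70,
            ("Queens", "Bronx"): 50}
flows, out = distribute_incomplete({"Queens": 10, "Bronx": 4}, complete)
print(out)  # {'Bronx': 3.0, 'Brooklyn': 7.0, 'Queens': 4.0}
```

The out-of-city step works the same way with a single weight per borough, the 1970 borough population.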
The procedure for processing the NCHS mortality data and calculating
age-adjusted rates was as follows:
To ease the problem of working with such large numbers of records on
the raw tapes, each tape was divided into five separate mass storage files.
Since the raw data were sorted by state and county, three mass storage
files, one from each tape, could be read and manipulated simultaneously
resulting in a merge of all deaths over the three-year period for each county
into a worktape record. Counts of deaths were stored in the worktape
record in a three dimensional age(15), race-sex(4), ICDA(50) array. The
15 age groups used for mortality data are identical to those used in the
age-specific population data. Five runs were made, using three mass storage
raw data files at a time, to create five separate tapes of mortality
counts, one record per county. These five work tapes were then saved to
be combined into one file spanning two tapes for delivery to EPA along with
this final report. Documentation on the work tapes is provided with the
tapes as well as in Appendix B of this report.
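Because each yearly file is sorted by state and county, the three files can
be merged in a single pass, as described above. The Python sketch below
shows that idea with `heapq.merge` and `itertools.groupby`; it assumes each
death has already been decoded into a hypothetical `(county, age, race_sex,
icda)` tuple of 0-based indices, which is not the actual NCHS record layout.

```python
import heapq
from itertools import groupby

N_AGE, N_RACE_SEX, N_ICDA = 15, 4, 50


def county_worktape_records(*yearly_files):
    """Merge sorted yearly death files into one count array per county.

    Each file is an iterable of (county, age, race_sex, icda) tuples, one per
    death, sorted by county code; indices are 0-based (hypothetical layout).
    Yields (county, counts) with counts[age][race_sex][icda] summed over files.
    """
    merged = heapq.merge(*yearly_files, key=lambda death: death[0])
    for county, deaths in groupby(merged, key=lambda death: death[0]):
        # A fresh age(15) x race-sex(4) x ICDA(50) array for this county.
        counts = [[[0] * N_ICDA for _ in range(N_RACE_SEX)] for _ in range(N_AGE)]
        for _county, age, race_sex, icda in deaths:
            counts[age][race_sex][icda] += 1
        yield county, counts
```

Only one county's array is held in memory at a time, which mirrors the
reason the original processing read the sorted files side by side.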
Direct method age-adjusting was accomplished as follows.
Let the indices i, j, k be defined as:
  i   ICDA cause category (1-50)
  j   Age group (1-15)
  k   Race-sex group (WM, WF, NWM, NWF)
and let:
  $D_{ijk}$ = total number of deaths in the county for the years 1969, 1970,
          and 1971 in ICDA cause category i, age group j, and race-sex
          group k, taken from the work tape record
  $R_{jk}$ = number of people in age group j and race-sex group k in the
          county (i.e., the number at risk)
  $S_{ijk} = D_{ijk}/R_{jk}$ = age-specific death rate for ICDA cause i in
          age group j and race-sex group k
  $P_j$ = number of people in age group j in the standard U.S. population,
          base year 1970
  $P = \sum_j P_j$ = total U.S. population in 1970
  $A_{ik}$ = age-adjusted death rate for ICDA cause i in race-sex group k
          (per 100,000)
$A_{ik}$ can be written as:
$$A_{ik} = \sum_{j=1}^{15} S_{ijk}\,\frac{P_j}{P}$$
or, substituting for $S_{ijk}$:
$$A_{ik} = \sum_{j=1}^{15} \frac{D_{ijk}}{R_{jk}}\,\frac{P_j}{P}$$
The actual calculation of the age-adjusted rate per 100,000 over the
3-year period is performed as follows:
$$\bar{D}_{ijk} = \frac{D_{ijk}}{3} = \text{average number of deaths per year over the 3-year period}$$
$$A_{ik} = \sum_{j=1}^{15} \frac{\bar{D}_{ijk}}{R_{jk}}\,\frac{P_j}{P} \times 100{,}000$$
The variable $R_{jk}$ is easily retrieved from the AGE-SPECIFIC POPULATION
repeating group. Since the number of deaths, $D_{ijk}$, is divided by three,
the death rates represent the average yearly age-adjusted death rate per
100,000 population over the three-year period from 1969 to 1971.
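The final formula can be checked with a short Python function for one cause
i and one race-sex group k. This is a sketch rather than the original PLI
COBOL program; the argument names are hypothetical, and cells with no
population at risk contribute 0, following the rule the report adopts for
such cells.

```python
def age_adjusted_rate(deaths, at_risk, standard_pop, years=3, per=100_000):
    """Direct-method age-adjusted death rate A_ik per `per` population.

    deaths[j]       -- D_ijk: deaths in age group j summed over `years` years
    at_risk[j]      -- R_jk:  county population in age group j (same race-sex group)
    standard_pop[j] -- P_j:   standard population in age group j
    """
    if not (len(deaths) == len(at_risk) == len(standard_pop)):
        raise ValueError("all inputs need one entry per age group")
    total_standard = sum(standard_pop)  # P
    rate = 0.0
    for d, r, p in zip(deaths, at_risk, standard_pop):
        if r == 0:
            continue  # no one at risk: age-specific rate taken as 0
        average_deaths = d / years  # D-bar_ijk
        rate += (average_deaths / r) * (p / total_standard)
    return rate * per
```

Passing the 1940 standard million as `standard_pop` instead of the 1970 U.S.
population, provided it is grouped into the same 15 age groups, gives rates
on the basis NCHS customarily uses.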
Verification of the mortality data and processing methodology began
with the mortality count worktape records, before any calculations or
loading. Three test files were created for testing the worktape creation
software. When the software was verified correct, the worktapes were
created and extensively spot-checked against published NCHS data. NCHS
publishes, for each year, total deaths in each ICDA category for each
county. Adding the worktape deaths across age groups, and the published
deaths across the three years, therefore provided a check on the total
death counts on the worktapes.
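The spot check amounts to comparing two sums for each ICDA category. A
minimal Python sketch, assuming a worktape record decoded into a
hypothetical nested list `counts[age][race_sex][icda]` and the published
county totals transcribed as one list per year:

```python
def worktape_total_mismatches(counts, published_by_year):
    """Return (icda, worktape_total, published_total) for each disagreeing category.

    counts[age][race_sex][icda] -- three-year death counts from one worktape record
    published_by_year           -- one list per year of the county's published
                                   total deaths for each ICDA category
    """
    n_icda = len(published_by_year[0])
    mismatches = []
    for icda in range(n_icda):
        # Sum over every age group and race-sex group on the worktape.
        worktape_total = sum(cells[icda] for age in counts for cells in age)
        # Sum the published yearly totals over 1969, 1970 and 1971.
        published_total = sum(year[icda] for year in published_by_year)
        if worktape_total != published_total:
            mismatches.append((icda, worktape_total, published_total))
    return mismatches
```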
Geocoding problems with the NCHS coding scheme were handled in a
similar manner to the in-out migration data. Mortality worktape records
containing counts for Virginia independent cities were combined with the
worktape records for their respective counties. With a few exceptions,
the worktape delivered to EPA contains all geocoding corrections
giving a county by county match-up of worktape records with data base
counties. Mortality data were available only for New York City as a whole
with no breakdown for the five boroughs. Therefore, only one record exists
on the worktape for the five New York City boroughs.
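Folding an independent city's record into its county is a cell-by-cell
addition of counts. In this Python sketch the geocodes and the
city-to-county mapping are hypothetical placeholders, not the actual
POPATRISK cross-reference:

```python
def combine_into_counties(records, city_to_county):
    """Merge independent-city records into their counties.

    records:        {geocode: flat list of count cells}
    city_to_county: {city geocode: county geocode} (hypothetical mapping)
    Returns records keyed by data base county; codes absent from the mapping
    are kept as they are.
    """
    combined = {}
    for code, cells in records.items():
        target = city_to_county.get(code, code)
        if target in combined:
            combined[target] = [a + b for a, b in zip(combined[target], cells)]
        else:
            combined[target] = list(cells)
    return combined
```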
A PLI COBOL program was developed to extract death counts from the
worktape, retrieve county Census data, calculate age-adjusted rates, and
load the rates into the data base using PLI optimized loading. Upon
completion of testing of the mortality calculations and loading process,
and trial loading of data for several counties, it became apparent that
processing time and charges for loading the data in their present format
would be prohibitive. Version 2.80 of SYSTEM 2000 offers the capability to
create and remove indices at any time, in effect changing components from
KEY to NON-KEY and vice versa. Therefore, all the KEY'ed components in the
mortality repeating group were changed to NON-KEY during the loading
process. This action eliminated the excessive processing time and expense
of creating and maintaining KEY item pointers in the data base files.
A decision was made to process and load the mortality data for New
York City into each of the five New York City boroughs, using the total
population of the five boroughs for the calculations. This resulted in
identical data for each borough, representing age-adjusted death rates
for New York City as a whole.
At the present time, there are no published data directly comparable
to the direct method age-adjusted death rates in the data base. Although
NCHS did provide unpublished age-adjusted data for 1969-71, NCHS customarily
uses the 1940 standard million for its calculations. Therefore, to verify
the data, extensive hand calculations were done for randomly selected
counties. In addition, calculations were done using the 1940 standard
million to arrive at results comparable to the NCHS data. The calculated
data corresponded extremely well with the NCHS data; the slight variances
in results could be explained by slight differences in the distribution of
1970 county population data across race groups.
It should be noted that in some cases calculated age-adjusted death
rates may be misleading, specifically where there are few or no people in
a race-sex-age group in a county. This situation occurs, for example, in
counties with few, if any, non-white persons. In calculating age-adjusted
death rates, there was no satisfactory method of selecting a threshold, or
cut-off point, for the number of deaths per population below which
calculations would not be done. Therefore, where a few deaths occur in a
very small population, the age-adjusted death rate may be extremely high.
The user may avoid problems in interpreting the data by checking the
population data to get a sense of the distribution across race-sex-age
groups.
There were isolated cases in which one or more deaths occurred in a
county for a race-sex-age group for which there were no people. This
problem may be due to the inherently imprecise nature of Census data when
working with small numbers or to inaccuracies in the NCHS tapes. In these
cases, the age-specific rate for that race-sex-age group was set to 0.
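The report sets no cut-off, so any screen is the user's choice. As one
hedged example, the Python helper below lists the race-sex-age cells a user
might want to look at before trusting a rate: cells with a small population
(the `min_population` default of 50 is an arbitrary illustration, not a
POPATRISK value) and cells with deaths but no population, whose age-specific
rate was set to 0.

```python
def cells_to_review(deaths, at_risk, min_population=50):
    """Flag age groups whose age-specific rates deserve a second look.

    deaths[j], at_risk[j] -- death count and population for age group j within
                             one race-sex group of one county
    Returns a list of (age_group_index, population, deaths, reason).
    """
    flagged = []
    for j, (d, r) in enumerate(zip(deaths, at_risk)):
        if r == 0 and d > 0:
            flagged.append((j, r, d, "deaths with no population; rate set to 0"))
        elif 0 < r < min_population and d > 0:
            flagged.append((j, r, d, "few people at risk; rate may be extreme"))
    return flagged
```

For instance, one death in three years among 10 people gives that cell an
age-specific rate of 1/3/10, or about 3,333 per 100,000.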
When all loading and verification of the mortality data were completed,
all components in the mortality repeating group were converted back to
KEY'ed components. The data base was then reorganized using the REORGANIZE
ALL command to re-structure the data base tables and pointers to maximize
efficiency of retrievals.
The 1970 age-specific population data were taken from raw Census tapes
and aggregated into 15 age groups and 8 race-sex categories (4 total
categories and 4 urban categories). It was assumed that since age-specific
Census data would be accessed rather infrequently, those race-sex categories
defined in the data base would be the minimum necessary for derivation of
all categories. For example, total non-white can be derived by subtracting
total white from the total population. User-defined functions are provided
to derive those components not explicitly defined.
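A minimal sketch of such a derivation, in Python. The stored component
names used here (`total_male`, `total_female`, `white_male`,
`white_female`) are hypothetical stand-ins; this section does not list the
exact eight stored categories.

```python
def derive_components(stored):
    """Derive race-sex totals that are not stored explicitly from four stored counts."""
    total = stored["total_male"] + stored["total_female"]
    white = stored["white_male"] + stored["white_female"]
    return {
        "total": total,
        "white": white,
        "non_white": total - white,  # total non-white = total - total white
        "non_white_male": stored["total_male"] - stored["white_male"],
        "non_white_female": stored["total_female"] - stored["white_female"],
    }
```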
The Census data were aggregated, formatted for entry into the data
base, and then placed in mass storage files. A PLI program was used to
load the data by means of the PLI optimized loading. During loading,
another component for non-white males was calculated and loaded to further
simplify calculations of components not explicitly defined for use in
in-out migration and mortality calculations. To further enhance the
Census data repeating group, figures were totalled across the 15 age groups
to create a 16th data set giving county totals for each component. These
totals were loaded as data with AGE GROUPING code 0.
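The county totals loaded under AGE GROUPING code 0 are column sums over the
15 age groups. Sketched in Python, assuming a hypothetical dictionary keyed
by age-grouping code (1-15) whose values map component names to counts:

```python
def add_county_totals(by_age_group):
    """Return a copy with a 16th entry, age-grouping code 0, holding county totals."""
    totals = {}
    for code in range(1, 16):
        for component, count in by_age_group[code].items():
            totals[component] = totals.get(component, 0) + count
    result = dict(by_age_group)
    result[0] = totals
    return result
```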
Geocoding problems with independent cities for Census data were handled
as has been previously discussed. The Census county codes are shown in
the cross-reference file of the POPATRISK Geocoding Manual except for the
District of Columbia. There were no data for the District of Columbia on
the Census tape, and therefore, the data had to be manually extracted from
Census publication PC(V2)-10 General Population Characteristics and loaded
separately.
Verification of the Census data has been an on-going process of spot
cross-checking against various sources. The primary source used for
verification was Census publication PC(V2) General Population Character-
istics giving detailed population figures for each county. Loaded Census
data were checked against PC(V2) and then cross-checked against the county
entry level data and the Bureau of Census publication 1972 County and
City Data Book (CCDB) to assure data loaded into each county were the
correct data for that county.
In some cases, minor discrepancies were found between the total
age-specific population shown in the Census repeating group and the total
population shown in component C10. The CCDB explains that these
discrepancies occur because the figures shown in the main body of the book
are only representative counts. Complete counts appear in Appendices A and
B of the CCDB and correspond exactly with those loaded into the data base
Census repeating group. To resolve the problem and eliminate confusion over
the figures, the C10 TOTAL POPULATION figures were changed to show complete
counts.
Conversion to SYSTEM 2000 Version 2.80
Loading of the data base was initiated using SYSTEM 2000 Version 2.65.
The data base was partially loaded when the National Computer Center
installed SYSTEM 2000 Version 2.80. Normally, new releases of the system
are fairly transparent to data base development and require minimal effort
to adapt to changes. However, Version 2.80 contains major changes to the
basic file structure, namely a decrease in page size from 504 words to 448
words. This change, in effect, reduces input/output charges for accessing
and updating the data base. A reload of the data base was necessary to gain
the full benefit of the Version 2.80 features.
Reloading also provided an opportunity to make desired modifications to
the data base design to provide for more efficient retrievals and space in
the data base for newly obtained in-out migration and mortality data.
Redesign and conversion to Version 2.80 involved unloading the
existing data base and reloading the data into a newly designed data base
created under Version 2.80. The conversion went smoothly, and the benefits
of the redesign and conversion far outweighed the cost of the effort.
USE OF REMOVABLE DISK PACK AND ARCHIVING PROCEDURES
As data base development proceeded and the size of the data base files
began to increase substantially, the cost of maintaining the files on
mass storage became prohibitive. The use of magnetic tapes as the primary
storage medium was also undesirable because of the high I/O costs involved
in transferring large files between tape and mass storage.
In light of the significant increase in file sizes expected with the
addition of in-out migration and mortality data to the data base, the most
cost-effective solution to the data base storage problem was a private
removable disk pack. Therefore the POPATRISK data base was transferred to
removable disk pack to minimize monthly charges for disk space utilization
and to provide for relatively easy access for frequent use.
There are a few drawbacks to use of the private disk pack which should
be mentioned here. The first is the inconvenience of having to have the
disk pack mounted before accessing the data base. Experience has shown,
however, that this is only a minor inconvenience and usually involves
only a 3-5 minute wait. The other drawback is that files on the disk pack
are not protected from accidental loss by the Secure Processor. Archiving
of files residing on a disk pack is left completely to the user.
Utilization of a dependable backup and restore system is vital to
successful creation of a SYSTEM 2000 data base. SYSTEM 2000 flags a data
base as damaged whenever an update job, either demand or batch, terminates
abnormally. Because the data base is saved before each update job, the
previously saved version can be restored and operations resumed. Since this
error is generated whenever an update run exceeds its 'MAX TIME' or is
stopped by any EXEC 8 system interrupt, it became a fairly common problem
with POPATRISK and was resolved routinely.
A POPATRISK backup system was utilized during development of the data
base both to guard against accidental loss and to provide sufficient
archiving of data base updates. Similar to backup systems used by the
National Air Data Branch of EPA, n generations of the data base are archived
on tape, one generation per backup tape. The tapes are cycled each time
the data base is saved so that only the last n archival versions of the
data base are retained. An archival restore runstream was used in
conjunction with the backup system to provide access to previous versions
of the data base. Both backup and restore runstreams are fully documented
in Appendix B.
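The tape cycling can be modelled as a fixed pool of n tapes reused
oldest-first. This Python class is a hypothetical model of that scheme, not
the runstreams documented in Appendix B; tape names and version labels are
placeholders.

```python
from collections import deque


class BackupRotation:
    """Hypothetical model of keeping the last n data base generations on n tapes."""

    def __init__(self, tapes):
        self.unused = deque(tapes)   # tapes that have never held a generation
        self.generations = deque()   # (tape, version) pairs, oldest first

    def save(self, version):
        """Archive `version`, reusing the tape that holds the oldest generation."""
        if self.unused:
            tape = self.unused.popleft()
        else:
            tape, _oldest = self.generations.popleft()
        self.generations.append((tape, version))
        return tape

    def restore(self, generations_back=0):
        """Return (tape, version) for the newest save (0) or an older retained one."""
        if generations_back >= len(self.generations):
            raise IndexError("that generation is no longer retained")
        return self.generations[-1 - generations_back]
```

With three tapes, a fourth save overwrites the tape holding the first
generation, so at most three versions can ever be restored.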
A UNIVAC EXEC 8 system error encountered during data base development
should be mentioned here because of its impact on the data base and the
substantial cost in man hours and computer time required for recovery.
National Computer Center personnel determined that a system error caused
damage to one or more of the data base files, resulting in a shift in data
within the data base. Since the data base was not flagged as damaged, the
error went undetected throughout a significant portion of data base loading
and archiving. Recovery from the damage required recovering appropriate
data files and reloading all data loaded since the last correct version of
the data base. This problem of unflagged damage to data is a data processor's
nightmare and is mentioned here to emphasize the point that although safe-
guards are built into a data management system, they do not always work.
Experience with POPATRISK has shown the importance of continuous monitoring
of existing data in the data base as well as extensive verification of data
presently being loaded.
The POPATRISK data base was loaded using both S2K Natural Language and
Procedural Language Interface (PLI) COBOL programming. During the initial
stages of data base development, when moderate amounts of data were being
loaded and data base files were small, Natural Language Queue Access
loading performed satisfactorily. As the volume of incoming data grew
substantially and the data base files expanded, increases in computer I/O
charges made Natural Language processing prohibitive. Considering the need
for a more efficient method of loading large quantities of data, and the
necessity for interaction with data base data during processing and
loading, it was decided that the benefits of PLI programs would far
outweigh the costs of their development.
Procedural Language Interface optimized loading provided a much more
cost-effective method of loading and allowed full access to existing data
base data, a capability most useful in calculating migration and
age-adjusted mortality data. However, even with the increased efficiency
of loading, by the time mortality data were being loaded, the data base
files had expanded such that running times and I/O charges for mortality
load jobs were becoming exorbitant.
The major factor contributing to the increased I/O charges was the
number of data sets to be loaded, all components of which were KEY'ed.
A decision was made to make the components NON-KEY during loading, a new
option available under SYSTEM 2000 Version 2.80. The action in fact
decreased mortality data load times by at least a factor of 10. Changing
the components back to KEY upon completion of loading was accomplished in
2 hours of computer time. The net results of the loading procedure were
substantial savings in time and computer charges. The REORGANIZE ALL
command was issued at various times during loading and upon completion of
data base development to reorder all the data base tables whose entries
were scattered as a result of incremental load and update operations.
APPENDIX A
RESOLUTION OF GEOCODING DISCREPANCIES FOR
MASSACHUSETTS NEDS AND SAROAD DATA
Overview of Work Done on Massachusetts NEDS
Massachusetts NEDS point sources were assigned to appropriate Census
counties by a computerized procedure involving geographic location by UTM
coordinates. Area sources were apportioned by first aggregating the entire
state's area emissions and then disaggregating into Census counties following
the EPA publication, "Guide for Compiling a Comprehensive Emission Inventory."
Massachusetts SAROAD monitoring stations were assigned by identifying the
city name of all SAROAD monitoring sites and then locating the Census county
in which the city was located.
Detailed Procedures Used in Treating NEDS Point Sources
To obtain Census county emissions totals for NEDS point sources in
Massachusetts, the points were apportioned from NEDS counties into Census
counties. SSI staff accessed the AEROS NADB*NEDS-USER file to retrieve
the following data for approximately 2500 Massachusetts points.
STATE
COUNTY
AQCR
PLANT ID NUMBER
CITY
UTM ZONE
YEAR OF RECORD
ESTABLISHMENT NAME AND ADDRESS
POINT-ID
UTM-HORIZONTAL
UTM-VERTICAL
EMISSION ESTIMATES OF PARTICULATES
EMISSION ESTIMATES OF SO2
EMISSION ESTIMATES OF NOX
EMISSION ESTIMATES OF HC
EMISSION ESTIMATES OF CO
These data were supplied to an SSI consultant, Dr. Richard J. Kopec,
who identified Census counties for each point source according to UTM
coordinates of the point. For the several points in which UTM coordinates
were missing in the NEDS-USER file, SSI staff searched for another point
in the file with the same city code as the point with incomplete information.
In most cases, the UTM coordinate of this new point was used to locate
the original point in the correct Census county.
For example, the NEDS point
STATE 22
COUNTY 1291
AQCR 119
PLANT ID NUMBER 0510
CITY 1700
UTM ZONE 19
YEAR OF RECORD 72
ESTABLISHMENT NAME AND ADDRESS POLAROID, 1 UPLAND RD,
POINT-ID 03
UTM-HORIZONTAL
UTM-VERTICAL
EMISSION ESTIMATES OF PARTICULATES 0000000
EMISSION ESTIMATES OF SO2 0000000
EMISSION ESTIMATES OF NOX 0000000
EMISSION ESTIMATES OF HC 0000008
EMISSION ESTIMATES OF CO 0000000
has no UTM coordinates (UTM-HORIZONTAL, UTM-VERTICAL). It has a city code
of 1700. The following point, which also has a city code of 1700 and has
UTM coordinates in its point record, was located in the NEDS-USER file.
STATE 22
COUNTY 1291
AQCR 119
PLANT ID NUMBER 5732
CITY 1700
UTM ZONE 19
YEAR OF RECORD 72
ESTABLISHMENT NAME AND ADDRESS RICHARD A KLEIN CO.349 LENOX,NORWOC
POINT-ID 01
UTM-HORIZONTAL 3181
UTM-VERTICAL 46723
EMISSION ESTIMATES OF PARTICULATES 0000000
EMISSION ESTIMATES OF SO2 0000000
EMISSION ESTIMATES OF NOX 0000000
EMISSION ESTIMATES OF HC 0000022
EMISSION ESTIMATES OF CO 0000000
The UTM coordinates of this source were substituted in the point record of
the original source. These "dummy" UTM coordinates were used to place the
original source in the correct Census county. Table A-1 lists the point
sources with missing UTM coordinates and includes the "dummy" coordinates
used to identify each point's Census county. Although in many cases
identical multiple entries appear in the NEDS file (e.g., because of
multiple stacks at a given site), only one point is listed for each plant.
(Points around the city of Boston were carefully checked when this method
of Census county identification was used, because the city lies in more
than one county.)
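The substitution rule can be summarised in a few lines of Python. The
sketch assumes each point is a dictionary with hypothetical keys `city`,
`utm_h`, and `utm_v` (None when missing); it borrows coordinates from the
first complete point with the same city code and leaves points without a
donor for manual tracing by name and address. It does not reproduce the
extra checking the report applied around Boston.

```python
def fill_missing_utm(points):
    """Return copies of `points` with 'dummy' UTM coordinates filled in where possible."""
    # First complete point seen for each city code acts as the donor.
    donors = {}
    for p in points:
        if p["utm_h"] is not None and p["utm_v"] is not None:
            donors.setdefault(p["city"], (p["utm_h"], p["utm_v"]))
    filled = []
    for p in points:
        q = dict(p)  # leave the caller's records unchanged
        if q["utm_h"] is None or q["utm_v"] is None:
            donor = donors.get(q["city"])
            if donor is not None:
                q["utm_h"], q["utm_v"] = donor
                q["dummy_utm"] = True
        filled.append(q)
    return filled
```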
In a few instances it was not possible to locate other points in the
file with the same city code as the incomplete point source. For example,
no points in the Lower Pioneer Valley Region (City Code 6866) contained
UTM coordinates in the point records, so there was no "dummy" UTM coordinate
available. Also, several point records contained UTM coordinates that
misplaced the point source by locating it outside of the state.
In these cases the name and address of each point was traced to place
the point in the correct Census county. Table A-2 contains record descriptions
of point sources which were located in Census counties by this method.
Again, only one set of values is listed for each plant.
After identifying Census counties for all the Massachusetts points,
county totals were calculated for each of the five emission categories.
Table A-3, "Total Emissions Summed by Type and by County," illustrates the
results of this summation and the data that were loaded into the POPATRISK
data base.
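Once every point carries a Census county, the county totals of Table A-3
are grouped sums. A Python sketch, with hypothetical field names for the
county and the five emission estimates:

```python
from collections import defaultdict

POLLUTANTS = ("PART", "SO2", "NOX", "HC", "CO")


def county_emission_totals(points):
    """Sum each pollutant's emission estimates over the points in each county."""
    # Each new county starts with a fresh dict of zeros for the five pollutants.
    totals = defaultdict(lambda: dict.fromkeys(POLLUTANTS, 0))
    for p in points:
        for pollutant in POLLUTANTS:
            totals[p["county"]][pollutant] += p[pollutant]
    return dict(totals)
```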
Detailed Results for NEDS Area Sources
Table A-4 gives the results of apportioning state total emissions
among 14 Census counties.
[Table A-1. Massachusetts NEDS point sources with missing UTM coordinates
and the "dummy" UTM coordinates used to assign each to a Census county.
Columns include STATE, UTM ZONE, YEAR OF RECORD, and EMISSION ESTIMATES OF
PARTICULATES, SO2, NOX, HC, and CO.]
[Table A-2. Massachusetts NEDS point sources located in Census counties by
tracing plant name and address. Columns include PLANT ID, CITY, UTM ZONE,
YEAR OF RECORD, AQCR, UTM coordinates, and EMISSION ESTIMATES OF
PARTICULATES, SO2, NOX, HC, and CO.]
Table A-3
TOTAL EMISSIONS SUMMED BY TYPE AND BY COUNTY

COUNTY  NAME           PART      SO2      NOX       HC       CO
     1  Barnstable     1383    31682     8699      590     1202
     3  Berkshire      1363    10727     2017     1178     2192
     5  Bristol        6726   109956    36436    23459     2858
     7  Dukes            44        2       15       30      227
     9  Essex          2344    15274     3859     7246     1128
    11  Franklin        500     1297      328      731     1409
    13  Hampden       27003    55770    22517    18852     3433
    15  Hampshire      9260    22639     4915     8407     1633
    17  Middlesex      5176    12562     2860    28724     3248
    19  Nantucket        86        5       32      230      459
    21  Norfolk        1080     3905      939    16457      750
    23  Plymouth        173       11       64      928      922
    25  Suffolk        3325     7324     1658    38277      122
    27  Worcester     14437    19081     9228     6071    17810
Table A-4
MASSACHUSETTS AREA EMISSIONS TOTALS BY POPATRISK COUNTIES
(All State 22)

POPATRISK
COUNTY       TSP         SO2         NOX           CO
0001     1,136.5     1,827.2     6,558.1     56,345.0
0003     1,806.6     3,747.5     9,068.3     67,896.0
0005     4,711.1    11,095.3    21,000.7    138,511.3
0007        70.2       102.4       426.0      3,785.2
0009     6,741.9    14,958.0    31,680.4    223,638.6
0011       580.3     1,280.0     2,689.4     18,994.2
0013     4,989.0    10,882.6    23,954.8    172,158.6
0015     1,104.6     2,574.9     5,046.8     34,182.5
0017    13,872.5    30,154.6    66,601.1    486,865.6
0019        47.8        93.6       242.1      1,931.7
0021     3,718.8    12,400.9    27,461.9    198,855.1
0023     3,127.9     7,097.2    14,495.6    100,580.0
0025     6,520.5    14,569.4    29,334.6    204,053.0
0027     5,582.6    15,299.7    29,877.5    201,280.0
APPENDIX B
DOCUMENTATION OF DATA TAPES AND RUNSTREAMS
BUREAU OF THE CENSUS IN-OUT MIGRATION TAPE FILE DESCRIPTION
1. File Title: Migration by counties, 1970
2. Technical Characteristics
a. Tape type and density
9 track, 1600 BPI, EBCDIC, odd parity, standard IBM OS labels.
b. Record/field size. Fixed length
1464 character records, 7320 character blocks. Data records
contain 120 characters for geography and 168 8-character
data cells.
3. File Size: 3192 records
4. File Sequence: FIPS state by FIPS county.
5. File contains one record for each state and one record for each county.
State records contain "000" for the county code and a record identifier
of "A". County records contain a valid county code and a blank record
identifier. There is a special record for New York City as a whole in
addition to the individual counties in New York City. This record is
identified as state 36, county 000, and record identifier "B".
There is also an additional out-migrant record in New York coded 36995
which contains the tabulation of only records with a response of
"New York City—no borough given" for 1965 and a residence code in 1970
outside of New York City. This out-migrant record, however, should not
be used with much confidence.
RECORD LAYOUT
Matrix is:
Migration (2), Race (3), Sex (2), Age (7), Allocation (2). (168 cells).
Geography Characters
State 1-2
County 3-5
Identifier 6
Unused 7 - 119
Dollar Sign 120
Data 121 - 1464
In Migration 121 - 792
Total All Races 121 - 344
Male 121 - 232
5-14 years
Allocated 121 - 128
Not Allocated 129 - 136
15-19 years
Allocated 137 - 144
Not Allocated 145 - 152
20-24 years
Allocated 153 - 160
Not Allocated 161 - 168
25-29 years
Allocated 169 - 176
Not Allocated 177 - 184
30-34 years
Allocated 185 - 192
Not Allocated 193 - 200
45-64 years
Allocated 201 - 208
Not Allocated 209 - 216
65 years and over
Allocated 217 - 224
Not Allocated 225 - 232
Female 233 - 344
5-14 years
Allocated 233 - 240
Not Allocated 241 - 248
15-19 years
White 345 - 568
Male 345 - 456
Female 457 - 568
•
•
Black
Male 569 - 680
•
•
Female 681 - 792
Out Migration
Total all races
Male 793 - 904
Female 905 - 1016
White
Male 1017 - 1128
•
•
•
Female 1129 - 1240
•
•
Black
Male 1241 - 1352
•
•
•
Female 1353 - 1464
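Since the 168 cells follow the matrix order stated above (migration, race,
sex, age, allocation, with the last varying fastest), the position of any
cell can be computed instead of looked up. A Python sketch, assuming the
EBCDIC record has already been decoded to a text string and that each cell
holds a right-justified integer; the function names are hypothetical.

```python
CELL_WIDTH = 8
DATA_START = 121  # 1-based position of the first data cell (after the "$" in 120)


def migration_cell_position(migration, race, sex, age, allocation):
    """Return the 1-based (start, end) character positions of one data cell.

    migration:  0 = in-migration, 1 = out-migration
    race:       0 = total all races, 1 = white, 2 = black
    sex:        0 = male, 1 = female
    age:        0-6, in layout order (0 = 5-14 years, 6 = 65 years and over)
    allocation: 0 = allocated, 1 = not allocated
    """
    index = (((migration * 3 + race) * 2 + sex) * 7 + age) * 2 + allocation
    start = DATA_START + CELL_WIDTH * index
    return start, start + CELL_WIDTH - 1


def read_migration_cell(record, **dims):
    """Read one cell from a decoded 1464-character record as an integer."""
    start, end = migration_cell_position(**dims)
    return int(record[start - 1:end])  # 1-based inclusive range -> Python slice


def read_geography(record):
    """Split out the state code, county code and record identifier."""
    return {"state": record[0:2], "county": record[2:5], "identifier": record[5]}
```

A state record is recognised by identifier "A", and the New York City
summary record by state "36", county "000", and identifier "B".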
In-migrants in this file also include persons who reported a foreign country
as their residence in 1965; however, the out-migrant category excludes
persons who had reported their residence in that county in 1965 but were
living overseas in 1970. For counties that do not have any persons reporting
a foreign country as their residence in 1965, the count of unallocated
in-migrants will equal the number shown in Table 119 of the PC(1)-C report.
6. Considerations Regarding File Content
Allocation Method:
The data presented in this file are based on question 19 of the 1970 Census
15 percent household sample questionnaire. It requested those persons
who reported living in a different house in 1965 to report the state,
county and city, town or village (if applicable) for their residence in
1965. Persons who indicated that they lived in a different house but
did not report its address were classified as "moved, residence not
reported."
An allocation procedure was developed for such nonresponses that
assigned a 1965 state and county of residence based on the respondent's
age, sex, race, college status, and military status. Such allocations are
by design valid only to the county level of geography.
To allow comparability with published reports, both allocated and
unallocated migrant counts are shown for each county or Census county
division in the United States.
Suppression
Data on in-migrants were suppressed for those counties that had fewer
than 34 weighted cases. Specifically, these counties are:
1) Angoon County, Alaska (St. 02, County 030)
2) Hinsdale County, Colorado (St. 08, County 055)
3) Yellowstone National Park County, Montana (St. 30, County 113)
Data fields for these counties will contain zeroes.
Comparison with Published Data
In theory, the unallocated counts of migrants into any county developed
from this file should equal the number of persons living in a different
county plus those abroad in 1965. (See Tables 119, 125, or 145 of the
1970 Census State reports.)
However, there will be in some counties fewer unallocated "different
county in-migrants" than are shown in the published reports. This stems
from the allocation procedure which altered the records for persons who
were in the military in either 1965 or 1970 and who reported "at sea in
1965" by substituting a county of residence in 1965. However, if the
individual was not in the military either in 1965 or 1970, his "at sea"
response was not changed.
Some counties may also have more unallocated "different county in-migrants"
than are shown in the published reports. This stems from the different
interpretations of a complete report for persons reporting a foreign
country as their residence in 1965.
The number of unallocated out-migrants shown on this file for a given
county may be compared to Table 2 of the PC(2)-2E Migration Between
State Economic Areas report provided that the given county is also a
state economic area.
Migration Between New York City Boroughs
A special code was assigned to those persons living in New York City in
1970 whose response for 1965 residence merely indicated "different house,
New York City." Persons assigned this code were not included in either
the in- or out-migrant tallies for individual boroughs.
Because of this unique situation, a special record is provided for New
York City as a whole. (This record is identified as State 36, County 000,
and Record Identifier B.) For this summary record, in-migrants exclude
those persons who reported their 1965 residence as either a New York City
borough or "place of residence - New York City - no borough given."
In-and out-migrants for specific boroughs include only those persons with
either allocated or complete responses to residence five years ago.
Persons giving "New York City-no borough given" were not included as in-
or out-migrants from boroughs.
MORTALITY WORKTAPE FILE DESCRIPTION-AGGREGATION OF 1969-71 MORTALITY
DATA INTO 50 ICDA CATEGORIES
1. File Title: Mortality Counts for 50 ICDAs
2. Technical Characteristics
a. Tape type and density
9 track, 1600 BPI, standard UNIVAC labeled tape
b. Record/field size. Fixed length
24005 character records, 24005 character blocks. Data
records contain 5 characters for state and county and
3000 8 character data cells.
3. File size: 3067 records
4. File sequence. File contains one record for each NCHS county,
one record provided for New York City area as a whole with no
borough breakdown. Records for Virginia independent cities have
been combined with their original county records. Each record
contains combined mortality counts for 1969-1971 broken out into
50 categories (48 HERL categories + 2 approved neoplasm clusters) ,
15 age groups, and 4 race-sex categories.
RECORD LAYOUT
Matrix is: ICDA (50) × Age (15) × Race-Sex (4) = 3000 cells.
Field                                         Characters
Geography
   State                                      1-2
   County                                     3-5
Data                                          6-24005
   ICDA - 1
      0-4 Age Group                           6-37
         White Male                           6-13
         White Female                         14-21
         Non-white Male                       22-29
         Non-white Female                     30-37
      5-9 Age Group                           38-69
         White Male                           38-45
         White Female                         46-53
         Non-white Male                       54-61
         Non-white Female                     62-69
      10-14 Age Group                         70-101
      15-19 Age Group                         102-133
      20-24 Age Group                         134-165
      25-29 Age Group                         166-197
      30-34 Age Group                         198-229
      35-39 Age Group                         230-261
      40-44 Age Group                         262-293
      45-49 Age Group                         294-325
      50-54 Age Group                         326-357
      55-59 Age Group                         358-389
      60-64 Age Group                         390-421
      65-69 Age Group                         422-453
      70+ Age Group                           454-485
      (Each age group contains White Male, White Female, Non-white
      Male, and Non-white Female cells of 8 characters each, in that
      order.)
   ICDA - 2 through ICDA - 50
      0-4 Age Group through 70+ Age Group     486-24005
      (Same age-group and race-sex order as ICDA - 1; 480 characters
      per ICDA category.)
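As an illustration of the layout above, the following Python sketch computes the character positions of any cell and splits a record into its 3000 counts. It assumes that cells are stored ICDA first, then age group, then race-sex group, exactly as tabulated, and that each 8-character cell holds a decimal count possibly padded with leading blanks or zeros; the function names are hypothetical and do not come from the original tape documentation.

```python
# Sketch: decode one 24005-character record of the mortality worktape.
# ASSUMPTIONS: chars 1-2 state, 3-5 county, then 3000 cells of 8 characters
# ordered ICDA (50) -> age group (15) -> race-sex (4); every cell holds a
# decimal count, possibly padded with leading blanks or zeros.

N_ICDA, N_AGE, N_RACE_SEX = 50, 15, 4
CELL_WIDTH = 8
GEO_WIDTH = 5  # state (2) + county (3)
RECORD_LENGTH = GEO_WIDTH + N_ICDA * N_AGE * N_RACE_SEX * CELL_WIDTH  # 24005


def cell_columns(icda, age, race_sex):
    """Return the 1-based (first, last) character positions of one cell.

    icda is 1..50, age is 1..15 (1 = 0-4, 15 = 70+), race_sex is 1..4
    (White Male, White Female, Non-white Male, Non-white Female).
    """
    index = ((icda - 1) * N_AGE + (age - 1)) * N_RACE_SEX + (race_sex - 1)
    first = GEO_WIDTH + index * CELL_WIDTH + 1
    return first, first + CELL_WIDTH - 1


def decode_record(record):
    """Split a record into (state, county, {(icda, age, race_sex): count})."""
    if len(record) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} characters, got {len(record)}")
    counts = {}
    for icda in range(1, N_ICDA + 1):
        for age in range(1, N_AGE + 1):
            for race_sex in range(1, N_RACE_SEX + 1):
                first, last = cell_columns(icda, age, race_sex)
                # Python slices are 0-based and end-exclusive.
                counts[(icda, age, race_sex)] = int(record[first - 1:last])
    return record[0:2], record[2:5], counts
```

For example, cell_columns(1, 2, 1) returns (38, 45), the White Male cell of the 5-9 age group for ICDA - 1, and cell_columns(50, 15, 4) returns (23998, 24005), the last cell of the record.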
POPATRISK DATA BASE BACKUP SYSTEM
A data base backup system was used during development of the POPATRISK
data base to provide adequate archiving of updates during loading. Backup
procedures were used to protect against computer system failures as well as
data processing errors. The runstream documented below causes cycling of
a specified number of backup tapes so that the current version of the data
base is copied to the tape containing the oldest version.
@RUN
@SYM PRINT$,,'SITEID'
@ASG,T T.,F40,DSD087
@REMARK* . GET CYCLED BACKUP TAPE AND ASSIGN
@ED,R APPAR*POPTAPES.
OFF DSPLIT
SPLIT TAPE N N
@ADD,P TAPE
@REMARK* . COPY EACH DATA BASE FILE TO TAPE
@COPY,GM 1POPATRISK--.,POPSAVE.
@COPY,GM 2POPATRISK--.,POPSAVE.
@COPY,GM 3POPATRISK--.,POPSAVE.
@COPY,GM 4POPATRISK--.,POPSAVE.
@COPY,GM 5POPATRISK--.,POPSAVE.
@COPY,GM 6POPATRISK--.,POPSAVE.
@COPY,GM 7POPATRISK--.,POPSAVE.
@PRT,F 1POPATRISK--.
@PRT,F 2POPATRISK--.
@PRT,F 3POPATRISK--.
@PRT,F 4POPATRISK--.
@PRT,F 5POPATRISK--.
@PRT,F 6POPATRISK--.
@PRT,F 7POPATRISK--.
@ED,U APPAR*POPTAPES.
MOVE N
LNPRINT!
@FREE APPAR*POPSAVE.
@FIN
@RUN
Data base backup requires just over five minutes of SUP time; the run
card should therefore specify 10 minutes of run time.
@SYM PRINT$,,'SITEID'
This command routes output to the user's remote terminal. Omitting the
command causes printing at the RTP on-site printer.
@ASG,T T.,F40,DSD087
A temporary file is assigned on the disk pack to ensure that the pack is
mounted before the files are accessed.
@ED,R APPAR*POPTAPES.
OFF DSPLIT
SPLIT TAPE N N
The UNIVAC editor is invoked to access a file containing a list of
backup tape assignment commands. The SPLIT command copies the Nth tape
assignment command to the temporary element TAPE.
@ADD,P TAPE
The temporary element TAPE is added to the runstream, causing the
selected tape to be assigned.
@COPY,GM 1POPATRISK--.,POPSAVE.
@COPY,GM 2POPATRISK--.,POPSAVE.
@COPY,GM 3POPATRISK--.,POPSAVE.
@COPY,GM 4POPATRISK--.,POPSAVE.
@COPY,GM 5POPATRISK--.,POPSAVE.
@COPY,GM 6POPATRISK--.,POPSAVE.
@COPY,GM 7POPATRISK--.,POPSAVE.
Each of the seven SYSTEM 2000 data base files is copied to tape in
COPY,G format.
@PRT,F 1POPATRISK--.
@PRT,F 2POPATRISK--.
@PRT,F 3POPATRISK--.
@PRT,F 4POPATRISK--.
@PRT,F 5POPATRISK--.
@PRT,F 6POPATRISK--.
@PRT,F 7POPATRISK--.
The Master File Directory entry for each file currently on the disk
pack is printed.
@ED,U APPAR*POPTAPES.
MOVE N
LNPRINT!
The tape file is edited to move the tape command just used to the
top of the tape list, thus cycling the backup tapes.
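The effect of the SPLIT and MOVE edits can be modeled in a few lines of Python. This is a hypothetical sketch, not part of the original runstream: it assumes APPAR*POPTAPES holds one tape-assignment command per line with the most recently used tape at the top, so that line N always names the tape carrying the oldest backup. The tape labels are invented for illustration.

```python
# Hypothetical model of the backup-tape rotation kept in APPAR*POPTAPES.
# ASSUMPTION: one tape-assignment command per line, most recently used
# first, so the last (Nth) line names the tape holding the oldest backup.


def next_backup_tape(tape_list):
    """Return (tape to overwrite, tape list for the next run)."""
    if not tape_list:
        raise ValueError("no backup tapes defined")
    oldest = tape_list[-1]               # SPLIT TAPE N N: take line N
    rotated = [oldest] + tape_list[:-1]  # MOVE N: move that line to the top
    return oldest, rotated


tapes = ["TAPE-C", "TAPE-B", "TAPE-A"]  # hypothetical labels
for _ in range(4):
    used, tapes = next_backup_tape(tapes)
    print(used)
```

With three tapes, successive runs write to TAPE-A, TAPE-B, TAPE-C, and then TAPE-A again, so the two previous versions of the data base are always preserved on the other tapes.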
@FREE APPAR*POPSAVE.
@FIN
The backup tape is released and the run terminated.
The runstream documented below is used in conjunction with the backup
runstream discussed above to copy a previous version of the data base
from tape back to the disk pack.
@SYM PRINT$,,'SITEID'
@ASG,T T.,F40,DSD087
@ASG,T POPSAVE.,16N,TAPE #
@DELETE,C 1POPATRISK--
@ASG,C 1POPATRISK--,///45000,DSD087
@COPY,G POPSAVE.,1POPATRISK--.
@DELETE,C 2POPATRISK--
@ASG,C 2POPATRISK--,///45000,DSD087
@COPY,G POPSAVE.,2POPATRISK--.
@DELETE,C 3POPATRISK--
@ASG,C 3POPATRISK--,///45000,DSD087
@COPY,G POPSAVE.,3POPATRISK--.
@DELETE,C 4POPATRISK--
@ASG,C 4POPATRISK--,///45000,DSD087
@COPY,G POPSAVE.,4POPATRISK--.
@DELETE,C 5POPATRISK--
@ASG,C 5POPATRISK--,///45000,DSD087
@COPY,G POPSAVE.,5POPATRISK--.
@DELETE,C 6POPATRISK--
@ASG,C 6POPATRISK--,///45000,DSD087
@COPY,G POPSAVE.,6POPATRISK--.
@DELETE,C 7POPATRISK--
@ASG,C 7POPATRISK--,F40/0/POS/999,DSD087
@COPY,G POPSAVE.,7POPATRISK--.
@FREE 1POPATRISK--
@FREE 2POPATRISK--
@FREE 3POPATRISK--
@FREE 4POPATRISK--
@FREE 5POPATRISK--
@FREE 6POPATRISK--
@FREE 7POPATRISK--
@PRT,F 1POPATRISK--.
@PRT,F 2POPATRISK--.
@PRT,F 3POPATRISK--.
@PRT,F 4POPATRISK--.
@PRT,F 5POPATRISK--.
@PRT,F 6POPATRISK--.
@PRT,F 7POPATRISK--.
@FREE POPSAVE.
@RUN
The run card should specify 10 minutes of run time.
@SYM PRINT$,,'SITEID'
@ASG,T T.,F40,DSD087
Same as in the backup runstream.
@ASG,T POPSAVE.,16N,TAPE #
Assign the tape containing the version of the data base to be restored.
@DELETE,C 1POPATRISK--
@ASG,C 1POPATRISK--,///45000,DSD087
@COPY,G POPSAVE.,1POPATRISK--.
@DELETE,C 2POPATRISK--
@ASG,C 2POPATRISK--,///45000,DSD087
@COPY,G POPSAVE.,2POPATRISK--.
@DELETE,C 3POPATRISK--
@ASG,C 3POPATRISK--,///45000,DSD087
@COPY,G POPSAVE.,3POPATRISK--.
@DELETE,C 4POPATRISK--
@ASG,C 4POPATRISK--,///45000,DSD087
@COPY,G POPSAVE.,4POPATRISK--.
@DELETE,C 5POPATRISK--
@ASG,C 5POPATRISK--,///45000,DSD087
@COPY,G POPSAVE.,5POPATRISK--.
@DELETE,C 6POPATRISK--
@ASG,C 6POPATRISK--,///45000,DSD087
@COPY,G POPSAVE.,6POPATRISK--.
@DELETE,C 7POPATRISK--
@ASG,C 7POPATRISK--,F40/0/POS/999,DSD087
@COPY,G POPSAVE.,7POPATRISK--.
Each of the seven disk pack files is deleted and reassigned on the disk
pack, and its contents are copied from tape to the disk pack.
@FREE 1POPATRISK--
@FREE 2POPATRISK--
@FREE 3POPATRISK--
@FREE 4POPATRISK--
@FREE 5POPATRISK--
@FREE 6POPATRISK--
@FREE 7POPATRISK--
Each disk pack file is freed so that the Master File Directory can be
updated with current file information.
@PRT,F 1POPATRISK--.
APPENDIX C
OTHER STUDIES
SURVEY OF 1973 MONITORING ACTIVITY
This section documents an initial survey of monitoring stations active
in 1973, by county, in the contiguous United States for seven major air
pollutants: TSP, SO2, NO2, CO, ozone, suspended nitrates, and suspended
sulfates. Levels of monitoring activity were tabulated for all counties
in the 48 contiguous states (Alaska and Hawaii were not considered).
In addition, seven maps were produced, one for each pollutant, on U.S.
Department of Commerce county maps of the United States (stock number
0301-1895); on each map, the number of monitoring stations is written
within the county concerned.
This survey was undertaken in order to:
1) Determine the geographical distribution of air monitoring
activity for the seven pollutants of interest in an easily
comprehended pictorial form.
2) Determine the percentages of U.S. population on a state and
national basis residing in counties for which the air quality
is monitored by numbers of stations falling into the following
categories: at least 1, at least 2, at least 3, at least 5,
at least 10.
3) Provide a sound basis for judgment on the extent to which
methods for augmenting the existing air quality data should
be investigated.
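The population percentages in objective 2 reduce to a simple population-weighted count. The Python sketch below assumes each county is represented by its population and the number of stations monitoring the pollutant; the county figures are hypothetical, and rounding to whole percent is an assumption about how the tables were prepared.

```python
# Sketch of the Table C-2 tabulation for one state and one pollutant.
# ASSUMPTION: each county is (population, stations monitoring the pollutant);
# the figures below are hypothetical.

THRESHOLDS = (1, 2, 3, 5, 10)


def percent_population_covered(counties, thresholds=THRESHOLDS):
    """Map each threshold k to the rounded percentage of the population
    living in counties with at least k monitoring stations."""
    counties = list(counties)  # accept any iterable; it is traversed twice
    total = sum(pop for pop, _ in counties)
    result = {}
    for k in thresholds:
        covered = sum(pop for pop, stations in counties if stations >= k)
        result[k] = round(100 * covered / total)
    return result


state = [(500_000, 12), (200_000, 3), (100_000, 1), (200_000, 0)]
print(percent_population_covered(state))
```

Presumably the "Total US" figures correspond to applying the same calculation to all counties in the 48 states together, rather than to averaging the state percentages.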
Methodology
Sources of Data—
The primary source of data relating to air quality monitoring sites
was the publication, Directory of Air Quality Monitoring Sites Active in
1973 (Directory), EPA-450/2-75-006. 1973 was chosen as the year investi-
gated because of the ready availability of this siting information.
The Directory lists siting information, pollutant monitored, method used,
and other data on a state but not strictly a county basis. In order to
obtain an accurate count of monitoring sites by county it was necessary
to keep a running tabulation while scanning by hand through the states
listed. A computer program to perform this task was considered but
rejected as being much more costly to develop and implement.
Population data for the year 1972 were considered sufficiently
accurate and were obtained from the 1972 County and City Data Book. A
listing (SAROAD file name NADB-PARMFL) of methods, by pollutant, in use for
monitoring purposes was obtained from Mr. A. A. Slaymaker, Chief, Data
Processing Section, National Air Data Branch. Another publication found
useful in quickly identifying the locations and names of counties was the
Office of Air Programs' Federal Air Quality Control Programs, AP-102.
Tabulation of Monitoring Methods Included in and Excluded from the Survey—
A number of stations listed in the Directory used methods that
were deemed unacceptable or unsatisfactory for the purposes of this survey.
For example, methods yielding results not directly relatable to air quality
standards were deemed unsatisfactory. Although a count of all methods in
use was kept, the results reported are only for those methods considered
to yield reliable air quality data. In Table C-1 we tabulate, for each
pollutant surveyed, the methods included in the count and alongside
(where applicable) the methods excluded. In no case would inclusion of
unacceptable and/or unsatisfactory methods in the count materially alter
the results of the survey. We are pleased to acknowledge the assistance of
Mr. L. J. Purdue of the Environmental Monitoring and Support Laboratory,
Research Center, RTP, in making the technical judgments of the reliability
and applicability of monitoring methods.
TABLE C-1. LISTING OF MONITORING METHODS
INCLUDED IN AND EXCLUDED FROM SURVEY
Pollutant
Methods Included
Methods Excluded
SO2
flame photometric
West-Gaeke colorimetric
conductimetric
pulsed fluorescent
pararosaniline-sulfuric acid
total sulfur flame photometric
instrumental coulometric
polarographic
Hydrogen Peroxide NaOH titration
Sequential-conductimetric
Hydrogen Peroxide
NO2
Colorimetric-Lyshkow (mod.)
coulometric
chemiluminescence
Sodium Arsenite - frit
NASN Sodium Arsenite - orifice
TEA method - frit
TGS method - frit
TGS method - orifice
Griess-Saltzman
polarographic
TEA method - orifice
Sulfates
Colorimetric
turbidimetric
methyl thymol blue (ASTM)
barium sulfate (ASTM)
Nitrates
2,4-Xylenol
reduction-diazo coupling
specific ion electrode
phenol-disulfonic acid
ultraviolet spectrophotometric
Ozone
chemiluminescence
ultraviolet DASIBI CORP
total oxidant - 0.2 (NO + NO )
total oxidant colorimetric neutral KI
TSP
Hi-Vol
Membrane Sampler
millipore filter
cassette
sticky paper
soil index tape sampler
bucket gravimetric
nephelometer
smoke shade
cyclone
CO
nondispersive infrared
gas chromatographic
dual isotope fluorescence
detection tube
catalytic combustion-thermal detector
The Case of Massachusetts—
Massachusetts presented some special problems in performing this
survey. As of the date of this report, no EPA guidelines exist for a
clear-cut separation of Massachusetts geography into counties. Many areas
are specified as townships. The 1970 Census, however, divides Massachusetts
into 14 counties and it was decided to use this division as the basis for
deriving siting information. The Directory specified the AQCR within
which monitoring stations were located. By determining the exact location
of each monitoring station within an AQCR we were able to determine the
corresponding Census county. Census data then provided the necessary
population figures.
Results
Maps—
Figure C-1 presents seven maps of selected areas in the U.S., one
for each of the seven pollutants of interest. On each map, the number
written within a county's borders indicates how many stations are
monitoring that pollutant. No numbers are shown for counties with
no monitoring stations. Complete county maps of the U.S. are being
provided to EPA under separate cover.
Tables—
Table C-2 presents seven tabulations, one for each of the seven air
pollutants of interest, in which the results of this survey are summarized
on a state basis. For each of the 48 states studied, the entries show the
percentage of state population residing in counties (within the state)
having at least 1, at least 2, at least 3, at least 5, and at least 10
stations monitoring that particular pollutant. Corresponding totals for
the 48 states combined are given in the last row.
Figure C-1. 1973 TSP Monitoring Station Sample Distribution
Figure C-1. 1973 SO2 Monitoring Station Sample Distribution
Figure C-1. 1973 NO2 Monitoring Station Sample Distribution
Figure C-1. 1973 CO Monitoring Station Sample Distribution
Figure C-1. 1973 Ozone Monitoring Station Sample Distribution
Figure C-1. 1973 Sulfates Monitoring Station Sample Distribution
Figure C-1. 1973 Nitrates Monitoring Station Sample Distribution
TABLE C-2. PERCENTAGE OF STATES' POPULATION IN COUNTIES
WITH ONE OR MORE MONITORING STATIONS
1973 TSP
          Percentage of population in counties with at least N stations
State        N=1   N=2   N=3   N=5  N=10
AL            77    52    48    43    25
AZ            97    90    84    75    55
AR            61    36    27     3     0
CA            93    82    71    35    35
CO            93    87    73    27     0
CT            92    92    89    78    78
DE           100   100    70    70    70
FL            81    71    62    51    27
GA            56    37    37    17    17
ID            58    58    23    16     0
IL            84    77    71    65    54
IN            66    47    45    42    33
IA            53    35    22    16    10
KS            77    43    41    41     0
KY            33    18    14    11     0
LA            43    31     0     0     0
ME            74    56    43    32     0
MD            81    72    67    63    53
MA            90    80    79    71    36
MI            85    78    69    66    47
MN            69    58    54    49    44
MS            54    22    16    16     0
MO            71    63    54    54    48
MT            60    42    32    27     0
NB            67    45    38    38    11
NV            98    86    83    81    81
NH           100    97    76    16     0
NJ            84    78    77    49     8
NM            82    66    56    51     0
NY           100    94    90    83    48
NC            70    47    39    29    11
ND            59    38    12     0     0
OH            69    61    58    51    42
OK            92    61    45    36    21
OR            94    74    49    45    27
PA            81    78    67    56    30
RI           100    95    80    70    61
SC            85    51    49    37    25
SD            31    23    14     0     0
TN            80    68    53    47    36
TX            71    60    55    55    52
UT            68    55    43    43     0
VT            62    12     0     0     0
VA            82    63    44    27    17
WA            95    77    61    54    34
WV            46    46    34    17    13
WI            73    60    49    44    28
WY            67    30     0     0     0
Total US      79    66    58    49    34
TABLE C-2. PERCENTAGE OF STATES' POPULATIONS IN COUNTIES
WITH ONE OR MORE MONITORING STATIONS
1973 SO2
State
AL
AZ
AR
CA
CO
CT
DE
FL
GA
ID
IL
IN
TABLE C-2. PERCENTAGE OF STATES' POPULATION IN COUNTIES
WITH ONE OR MORE MONITORING STATIONS
1973 NO2
          Percentage of population in counties with at least N stations
State        N=1   N=2   N=3   N=5  N=10
AL            35    26    21    21     0
AZ            79    75     0     0     0
AR            17    15     0     0     0
CA            96    82    77    60    35
CO            23    23    23    23     0
CT           100    93    90    82    26
DE            85    70     0     0     0
FL            47    34    27    19     0
GA            21    21    13    13    13
ID             0     0     0     0     0
IL            53    53    49    49    49
IN            56    42    27    11    11
IA            16    13    10    10     0
KS            43    31     7     0     0
KY            72    52    44    38    23
LA            30    30     0     0     0
ME            23     4     0     0     0
MD            74    64    58    56    16
MA            52     0     0     0     0
MI            48    45    40     0     0
MN            52    47    44    38    25
MS            14    14     0     0     0
MO            37    37    34    34    20
MT             2     0     0     0     0
NB            38    11     0     0     0
NV            81     0     0     0     0
NH             5     0     0     0     0
NJ            50    49    35     0     0
NM            36    31     0     0     0
NY            48    24    15     0     0
NC            24    20     3     3     0
ND             0     0     0     0     0
OH            55    54    23     9     0
OK            37    21    21     0     0
OR            27    27     0     0     0
PA            50    20    17    17    17
RI            95    70    70    61    61
SC            79    47    45    38     0
SD             1     1     1     1     0
TN            39    30     0     0     0
TX            61    51    21    16     0
UT            43     0     0     0     0
VT             0     0     0     0     0
VA            19     0     0     0     0
WA            54    34     0     0     0
WV            17     0     0     0     0
WI            44     7     0     0     0
WY            21     0     0     0     0
Total US      51    38    27    18    10
TABLE C-2. PERCENTAGE OF STATES' POPULATION IN COUNTIES
WITH ONE OR MORE MONITORING STATIONS
1973 CO
          Percentage of population in counties with at least N stations
State        N=1   N=2   N=3   N=5  N=10
AL            10     0     0     0     0
AZ            75    75     0     0     0
AR             0     0     0     0     0
CA            93    75    61    38    35
CO            42    23    23    23     0
CT            78    26     0     0     0
DE             0     0     0     0     0
FL            24    19     0     0     0
GA            13    13     0     0     0
ID             0     0     0     0     0
IL            52    49    49    49     0
IN            15     0     0     0     0
IA            16     0     0     0     0
KS            46    24     0     0     0
KY             5     1     1     0     0
LA            16     0     0     0     0
ME             0     0     0     0     0
MD            66    61    44    23    23
MA            70    46    38     0     0
MI            39    30    30     0     0
MN             6     0     0     0     0
MS             0     0     0     0     0
MO            50    34    34    34     0
MT             0     0     0     0     0
NB            38     0     0     0     0
NV            81     0     0     0     0
NH            30    30     0     0     0
NJ            94    44    22     0     0
NM            43    43    43    36     5
NY            31    21     8     8     0
NC             3     0     0     0     0
ND             0     0     0     0     0
OH            36    27    16     0     0
OK            36    21    21     0     0
OR            37    27     0     0     0
PA            30    30    17     0     0
RI            61    61    61    61     0
SC             3     0     0     0     0
SD             0     0     0     0     0
TN            33    30    18     0     0
TX            42     0     0     0     0
UT            68    43     0     0     0
VT            22     0     0     0     0
VA            32    10    10     0     0
WA            64    42    42     0     0
WV            13     0     0     0     0
WI            18    24    24     0     0
WY             0     0     0     0     0
Total US      41    27    20    11     6
TABLE C-2. PERCENTAGE OF STATES' POPULATIONS IN COUNTIES
WITH ONE OR MORE MONITORING STATIONS
1973 OZONE
          Percentage of population in counties with at least N stations
State        N=1   N=2   N=3   N=5  N=10
AL             0     0     0     0     0
AZ            75     0     0     0     0
AR             0     0     0     0     0
CA            96    83    73    50    35
CO            42    23    23    23     0
CT            82    26    26    26     0
DE             0     0     0     0     0
FL            41    27    19     0     0
GA            13     0     0     0     0
ID             0     0     0     0     0
IL            51    49    49     0     0
IN            26    15     0     0     0
IA            16     0     0     0     0
KS            33    24     0     0     0
KY            33    27     0     0     0
LA            28    16     0     0     0
ME             0     0     0     0     0
MD            46    46    29     0     0
MA            59    38    25     0     0
MI            30     0     0     0     0
MN            38     0     0     0     0
MS            15    14     0     0     0
MO            37    20    20    20    20
MT             0     0     0     0     0
NB            26     0     0     0     0
NV            56    56     0     0     0
NH            35     0     0     0     0
NJ            39    15     0     0     0
NM            31    31    31    31     0
NY            31     9     0     0     0
NC            10     7     0     0     0
ND             0     0     0     0     0
OH            46    31    14     6     0
OK            36    21     0     0     0
OR            34    27     0     0     0
PA            17    17    17    17     0
RI            61    61     0     0     0
SC             9     0     0     0     0
SD             0     0     0     0     0
TN            35    33    18     0     0
TX            22    14     0     0     0
UT            68    43     0     0     0
VT             0     0     0     0     0
VA            23     5     5     0     0
WA            34    34    34    34     0
WV             0     0     0     0     0
WI            24    23     0     0     0
WY             0     0     0     0     0
Total US      37    25    16     8     4
TABLE C-2. PERCENTAGE OF STATES' POPULATIONS IN COUNTIES
WITH ONE OR MORE MONITORING STATIONS
1973 SULFATES
          Percentage of population in counties with at least N stations
State        N=1   N=2   N=3   N=5  N=10
AL             0     0     0     0     0
AZ            97    86    78    75    55
AR             0     0     0     0     0
CA             0     0     0     0     0
CO             0     0     0     0     0
CT            90    86    56    56    51
DE             0     0     0     0     0
FL             0     0     0     0     0
GA            60    19     8     0     0
ID             0     0     0     0     0
IL             0     0     0     0     0
IN            42    19    12    12     0
IA             0     0     0     0     0
KS             0     0     0     0     0
KY            24    12     5     5     2
LA             0     0     0     0     0
ME             0     0     0     0     0
MD            55    45    43    23     8
MA             0     0     0     0     0
MI             0     0     0     0     0
MN            69    59    56    49    44
MS             0     0     0     0     0
MO            21    17    14    14     0
MT             8     8     8     0     0
NB             0     0     0     0     0
NV            25    25    25    25    25
NH             0     0     0     0     0
NJ             0     0     0     0     0
NM             0     0     0     0     0
NY            44    11     4     4     4
NC             0     0     0     0     0
ND            59    59    31    12     0
OH             9     9     9     9     9
OK             0     0     0     0     0
OR             0     0     0     0     0
PA             0     0     0     0     0
RI             0     0     0     0     0
SC             0     0     0     0     0
SD             0     0     0     0     0
TN             7     3     0     0     0
TX            17    17    17    17    17
UT             0     0     0     0     0
VT             0     0     0     0     0
VA             0     0     0     0     0
WA             0     0     0     0     0
WV             0     0     0     0     0
WI             0     0     0     0     0
WY             0     0     0     0     0
Total US      14     8     6     5     4
TABLE C-2. PERCENTAGE OF STATES' POPULATIONS IN COUNTIES
WITH ONE OR MORE MONITORING STATIONS
1973 NITRATES
          Percentage of population in counties with at least N stations
State        N=1   N=2   N=3   N=5  N=10
AL             9     0     0     0     0
AZ            97    86    81    75    55
AR             0     0     0     0     0
CA             0     0     0     0     0
CO             0     0     0     0     0
CT             0     0     0     0     0
DE             0     0     0     0     0
FL             0     0     0     0     0
GA             0     0     0     0     0
ID             0     0     0     0     0
IL             0     0     0     0     0
IN             0     0     0     0     0
IA             0     0     0     0     0
KS             0     0     0     0     0
KY             0     0     0     0     0
LA             0     0     0     0     0
ME             0     0     0     0     0
MD             0     0     0     0     0
MA             0     0     0     0     0
MI             0     0     0     0     0
MN             0     0     0     0     0
MS             0     0     0     0     0
MO             0     0     0     0     0
MT             0     0     0     0     0
NB             0     0     0     0     0
NV             0     0     0     0     0
NH             0     0     0     0     0
NJ             0     0     0     0     0
NM             0     0     0     0     0
NY            44    11     4     4     4
NC             0     0     0     0     0
ND            59    31    12     0     0
OH             0     0     0     0     0
OK             0     0     0     0     0
OR             0     0     0     0     0
PA             0     0     0     0     0
RI             0     0     0     0     0
SC             0     0     0     0     0
SD             0     0     0     0     0
TN             0     0     0     0     0
TX             0     0     0     0     0
UT             0     0     0     0     0
VT             0     0     0     0     0
VA             0     0     0     0     0
WA             0     0     0     0     0
WV             0     0     0     0     0
WI             0     0     0     0     0
WY             0     0     0     0     0
Total US       5     2     1     1     9
CORRELATION AND REGRESSION STUDIES: TSP-NEDS EMISSIONS
BACKGROUND
Results of the survey of monitoring activity show that 79% of the
U.S. population (1970 Census data) resided in counties having at least
one station monitoring TSP in 1973. Even if all stations reported valid
data, more than 42 million Americans lived in counties that were not
being monitored for TSP. Instrumental techniques for monitoring this
pollutant are well established and stable, so the ambient monitoring
data can be expected to be reliable. It was therefore decided to
undertake a study of possible correlations, on a county basis, between
TSP air quality and various emissions and meteorological parameters.
The results of these studies were used to augment the air quality data
for counties that had no TSP monitoring but whose TSP and NOx emissions
lay within the range covered by the regression analysis.
METHODOLOGY AND RESULTS
A file of mean annual ambient air quality for all stations reporting
valid TSP data to SAROAD in 1973 was assembled. Although exploratory
studies were made using all such data, the final work reported here was
performed using only data from stations sited to provide ambient air quality
measurements for the resident population. Table C-3 lists the numbers of
counties in various categories defined by the level of monitoring activity
in each county. Thus the entry '114' under category 'N≥4' means that 114
counties had 4 or more stations reporting valid yearly SAROAD readings for
TSP. The 'N≥1' category contains the entire data set.
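Because the N categories are cumulative, a county with seven valid stations is counted in every category from N≥1 through N≥7. A minimal Python sketch of that tally, using hypothetical station counts rather than the SAROAD file:

```python
# Sketch of how the nested N categories of Table C-3 are formed.
# The station counts per county below are hypothetical.


def cumulative_county_counts(stations_per_county, thresholds):
    """Number of counties with at least k valid stations, for each k."""
    return {k: sum(1 for n in stations_per_county if n >= k) for k in thresholds}


stations = [1, 1, 2, 4, 7, 10, 12]
print(cumulative_county_counts(stations, (10, 7, 4, 2, 1)))
```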
The division of valid stations into these categories was made for the
following reasons:
1. It was intended to develop a regression model relating mean annual
TSP ambient air quality to NEDS emission values independently for data in
a number of these categories. The detailed forms of the models thus
obtained could be compared and the model giving the best 'overall' fit
selected. It is, of course, essential that the functional form of the
regression model not change as more or fewer stations, selected on
some rational basis, are included in the data set.
2. The model selected as a result of the above considerations must
then be tested against a known data set. This data set could come only
from the valid SAROAD readings. The procedure followed was to derive the
model from a comparison of results in the N≥4, N≥6, N≥7, and N≥10 subgroups
and then to compare the predicted and measured values for the entire
605-point data set.
3. It is of some interest to determine the maximum correlation
coefficient obtainable as the N categories are scanned, since this result
bears on how representative a given station's monitoring is of
emissions in the county.
For each county reporting some valid yearly data to SAROAD, both the
geometric and arithmetic means of the annual averages for all stations
reporting valid data within the county were computed. Since air pollution
data tend to be lognormally distributed, the geometric means of the annual
averages will also be lognormally distributed if one assumes that the
ambient air quality annual averages at stations within a county are
independently distributed. Although this is not likely to be a good
assumption, it was felt important to study correlations of the geometric
means as well as the arithmetic means because of the simplicity of the
resulting distribution. In fact, the results of the study showed no
significant difference in the structure of the models resulting from the
use of either of these alternative averaging procedures. The magnitudes of
the correlation coefficients were likewise similar.
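The two county averages described above can be sketched in Python as follows. The station values are hypothetical annual averages (assumed to be in µg/m³), the function name is illustrative only, and the report's own analyses were run in SAS rather than in code like this.

```python
import math
from collections import defaultdict


def county_means(station_annual_averages):
    """Return {county: (arithmetic mean, geometric mean, station count)}.

    station_annual_averages: iterable of (county_id, annual average TSP),
    one pair per station with valid data; values must be positive.
    """
    by_county = defaultdict(list)
    for county, value in station_annual_averages:
        by_county[county].append(value)
    result = {}
    for county, values in by_county.items():
        n = len(values)
        arithmetic = sum(values) / n
        # Geometric mean computed via logs: exp(mean of ln values).
        geometric = math.exp(sum(math.log(v) for v in values) / n)
        result[county] = (arithmetic, geometric, n)
    return result


data = [("A", 50.0), ("A", 200.0), ("B", 80.0)]  # hypothetical stations
for county, (am, gm, n) in sorted(county_means(data).items()):
    print(county, round(am, 1), round(gm, 1), n)
```

For county A, the arithmetic mean of 50 and 200 is 125 while the geometric mean is 100, showing how the geometric mean damps the influence of a single high-reading station.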
Standard correlation and regression analyses were run for a variety
of empirical models using the SAS (Statistical Analysis System) available
on EPA's UNIVAC 1110. After experience with numerous trials the following
requirements for an acceptable model were adopted:
1. A correlation coefficient, R, of 0.7 or greater,
2. Parsimony and simplicity of the resulting model,
3. Stability of model parameters with respect to variation of
the N category of the data set.
Table C-4 shows the maximum correlation values obtained for the four N
categories used, as a function of the number of parameters in the
regression model. The possible parameters in the model are listed at the
top of the table, and the parameter sets that maximized R are listed in
the table proper. Note that 'TSP', the NEDS total TSP county emission
value, makes the major contribution to explaining the data variance in all
cases. Significant contributions are also made by high powers of the NOx
and HC NEDS county total emissions. It should also be noted that the
maximum value of R obtainable with this parameter set decreases
monotonically as the N threshold is lowered, that is, as counties with
fewer stations are added to the data set.
The model chosen as giving the best 'overall fit', as a result of many
trials and comparison of maximum and near-maximum values of R, was a linear
regression model having NEDS county total TSP, (NOx)^4, and (NOx)^5 as the
independent variables.
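The chosen model can be fitted by ordinary least squares. The Python sketch below is not the SAS procedure used in the study; it solves the normal equations directly and computes the multiple correlation coefficient R. The emission values are hypothetical and are assumed to be rescaled (for example, to units of 10^4 tons per year) so that (NOx)^4 and (NOx)^5 stay numerically manageable.

```python
import math


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def design_row(tsp_emis, nox_emis):
    """Intercept, TSP emissions, (NOx)^4, (NOx)^5: the chosen model's terms."""
    return [1.0, tsp_emis, nox_emis ** 4, nox_emis ** 5]


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def multiple_r(rows, y, coef):
    """Multiple correlation coefficient R = sqrt(1 - SS_res / SS_tot)."""
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_tot = sum((v - y_bar) ** 2 for v in y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return math.sqrt(max(0.0, 1.0 - ss_res / ss_tot))


# Hypothetical, noise-free check in scaled emission units.
tsp = [2, 7, 1, 5, 3, 8, 4, 6]
nox = [0.5, 0.8, 1.0, 1.2, 1.5, 1.7, 2.0, 2.2]
rows = [design_row(t, n) for t, n in zip(tsp, nox)]
y = [40 + 3 * t + 2 * n ** 4 - 0.5 * n ** 5 for t, n in zip(tsp, nox)]
coef = fit_ols(rows, y)
print([round(c, 6) for c in coef], round(multiple_r(rows, y, coef), 6))
```

Because the synthetic air quality values are generated exactly from the model, the fit recovers the coefficients 40, 3, 2, and -0.5 and gives R = 1; with real SAROAD and NEDS data, R is well below 1 (compare Table C-5).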
Table C-5 compares the maximum correlation coefficients from Table C-4 for
a three-parameter model with the R values obtained from the chosen model
for the four N categories used. Except for the N≥10 category, differences
in R are in the third decimal place.
Table C-6 gives a similar comparison, for the N≥7 category, between the
chosen model and a sampling of other 'simple' models. Here again the
differences are small.
While it is our judgement that the chosen model gives the best overall
fit to the data in the N≥4, N≥6, N≥7, and N≥10 categories, the differences
in fit between it and alternative models are small enough that it does not
seem appropriate to attempt a theoretical justification of the resulting
functional form. Rather, the model should be viewed as a purely empirical
means of calculating TSP air quality in counties for which data are
missing or inadequate. In this regard, it should be noted that an attempt
was made to impose the theoretically justifiable functional form TSP/U
(where U is the county mean annual wind speed) on the regression model,
with disappointing results.
While the model should be viewed as purely empirical, it is nevertheless
important to ensure that the choice of high powers of the NOx NEDS county
total emissions is not the result of a few outlying data points.
Accordingly, 5 points of the 58-point data set for the N≥7 category were
chosen for removal from the data set. The choice was made as follows:
1. A single-parameter model using only NEDS county total TSP emissions
was used to predict air quality in the 58 counties of interest.
The 5 counties giving the worst agreement with measured averages were then
removed from the data set, and the resulting 53-point data set was
reanalyzed. Table C-7 gives the result of this analysis by comparing the
1st and 2nd best values of R for the two data sets, along with the
resulting variable sets. It can be seen that the chosen model holds up
quite well, with differences in R mainly in the third decimal place.
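The screening step can be sketched in Python as follows. The report does not say how "worst agreement" was measured; this hypothetical version ranks points by the absolute residual from the single-parameter fit, and a ratio of measured to predicted values would be an equally plausible choice. The data are invented for illustration.

```python
def simple_fit(x, y):
    """Least-squares intercept a and slope b for y = a + b * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def drop_worst_agreement(x, y, n_drop=5):
    """Fit y on x alone and drop the n_drop points with the largest absolute
    residuals. Returns (kept x, kept y, sorted indices of dropped points)."""
    a, b = simple_fit(x, y)
    ranked = sorted(range(len(x)), key=lambda i: abs(y[i] - (a + b * x[i])), reverse=True)
    dropped = sorted(ranked[:n_drop])
    dropped_set = set(dropped)
    kept = [i for i in range(len(x)) if i not in dropped_set]
    return [x[i] for i in kept], [y[i] for i in kept], dropped


# Hypothetical data: an exact line y = 10 + 2x with two disturbed points.
x = list(range(10))
y = [10.0 + 2.0 * xi for xi in x]
y[3] += 30.0
y[7] -= 25.0
x_kept, y_kept, dropped = drop_worst_agreement(x, y, n_drop=2)
print(dropped)  # [3, 7]
```

In the study the reduced data set was then refitted with the multi-parameter models of Table C-7; here a refit of the kept points simply recovers the underlying line.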
Having chosen the model, we calculated the TSP ambient air quality for
all 605 counties for which valid data were available, using regression
coefficients derived from the N≥7 category data. The ratios of measured to
calculated values were then computed and the results plotted on the
frequency graph of Figure C-2. In this figure, the computed ratios appear as
the discontinuous curves labelled 'data', while the smooth curve is a fitted
normal distribution. It can be seen from this figure that the empirical
model predicts the TSP yearly average ambient air quality within about ±37%
for a 67% confidence interval. The population variance, $\sigma_1^2$, and
its associated 95% confidence interval are $\sigma_1^2 = 0.076 \pm 0.009$.
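The report does not state how the confidence interval on the variance was obtained. One possibility, shown in the Python sketch below, is the large-sample normal approximation Var(s^2) ≈ 2σ^4/(n - 1); with s^2 = 0.076 and n = 605 it gives a half-width of about 0.0086, which rounds to the ±0.009 quoted. The measured and calculated values in the example are hypothetical, and the approximation is only reasonable for large n such as 605.

```python
import math
import statistics


def ci_half_width(s2, n, z=1.96):
    """Approximate 95% half-width for a sample variance s2 from n points,
    using the large-sample normal approximation Var(s^2) ~= 2 s^4 / (n - 1)."""
    return z * s2 * math.sqrt(2.0 / (n - 1))


def variance_with_ci(ratios, z=1.96):
    """Sample variance of the ratios and its approximate 95% half-width."""
    s2 = statistics.variance(ratios)
    return s2, ci_half_width(s2, len(ratios), z)


print(round(ci_half_width(0.076, 605), 3))  # 0.009

measured = [60.0, 90.0, 45.0, 120.0]     # hypothetical SAROAD annual averages
calculated = [50.0, 100.0, 50.0, 100.0]  # hypothetical model predictions
ratios = [m / c for m, c in zip(measured, calculated)]
s2, half_width = variance_with_ci(ratios)
```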
Part of the variance of the prediction is attributable to an 'inherent'
variability in the data, which results from the fact that measured air
quality varies among monitoring stations within a given county. Figure C-3
gives a measure of this variability. The ratios of individual station data
within a county to the geometric mean for all stations within that county
were computed and plotted on this figure. The variance of the resulting
fitted normal curve and its associated 95% confidence interval are
$\sigma_2^2 = 0.056 \pm 0.004$.
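The ratios plotted in Figure C-3 can be formed as in the following Python sketch, which divides each station's annual average by the geometric mean of all stations in its county. The station values are hypothetical. Note that a county with a single station always contributes a ratio of 1; the report does not say whether such counties were excluded.

```python
import math
import statistics


def ratios_to_county_geometric_mean(stations_by_county):
    """For every station, its annual average divided by the geometric mean
    of all stations in the same county (all values must be positive)."""
    ratios = []
    for values in stations_by_county.values():
        gm = math.exp(sum(math.log(v) for v in values) / len(values))
        ratios.extend(v / gm for v in values)
    return ratios


stations = {"A": [50.0, 200.0], "B": [60.0, 60.0, 60.0]}  # hypothetical
r = ratios_to_county_geometric_mean(stations)
print([round(v, 3) for v in r])
print(round(statistics.variance(r), 3))
```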
Figure C-4 shows the two fitted normal curves from Figures C-2 and C-3
plotted to the same scale and superimposed. Comparison of $\sigma_1^2$ and
$\sigma_2^2$ indicates that not much improvement in the predictive value of
the regression model can be obtained. We are, however, currently
investigating the use of $\exp(a\,\mathrm{NO}_x)$ as a possible parameter,
primarily because of the simplicity of its functional form.
Figure C-5 shows the variation of the square of the correlation
coefficient, $R^2$, for the chosen model with the number of points included
in the correlation data set. The numbers on the curve indicate the
corresponding N category. Provided that errors in NEDS emission data show
no systematic correlation with N category (which seems a reasonable
assumption), the monotonic decrease of $R^2$ as the N threshold is lowered
(and more points are included) indicates that representative
population-based monitoring requires something in the neighborhood of 7 or
more stations per county with present siting strategies. Thus, for example,
if one were constrained to a fixed number of monitoring stations nationwide
and/or wished to reduce the amount of monitoring with little or no
reduction in the overall quality of monitoring data, the results presented
here strongly suggest a strategy of eliminating monitoring in most or all
counties having only a few stations monitoring population ambients and
concentrating instead on placing at least seven stations in a smaller
number of counties representative of a wide range of NEDS emissions of TSP
and NOx nationwide.
In terms of the primary aim of this contract, i.e., to develop a data
base including air quality measures for all counties in the contiguous
United States, the regression model presented in this report would seem
quite adequate for filling in some missing TSP data.
TABLE C-3. CUMULATIVE FREQUENCY TABLES FOR VALID 1973
SAROAD POPULATION BASED MONITORING OF TSP
Category N: Number of Stations in County Reporting Valid Population
Based TSP Data to SAROAD in 1973

Category                  N≥10  N≥9  N≥8  N≥7  N≥6  N≥5  N≥4  N≥3  N≥2  N≥1
Total number of counties
in given N category
(number of points in
given data set)             27   33   42   58   71   87  114  170  266  605
TABLE C-4. MAXIMUM CORRELATION VALUES, R, FOR A SAMPLING
OF DIFFERENT MODELS INVESTIGATED
Possible emission variables in model: (TSP)^i; (SO2)^j; (NOx)^k; (HC)^l; (CO)^m,
where i, j, k, l, m = 0, 1, 2, 3, 4, 5

N Category  Number of   Variables in Model Which Maximize R      Correlation
            Parameters                                           Coefficient, R
N≥4         1           TSP                                      .43
            2           TSP, (NOx)^5                             .46
            3           TSP, (NOx)^4, (NOx)^5                    .49
            4           TSP, (NOx)^?, (NOx)^?, CO                .51
N≥6         1           TSP                                      .50
            2           TSP, (HC)^5                              .55
            3           TSP, (NOx)^2, (HC)^5                     .59
            4           TSP, (NOx)^?, (HC)^5, CO                 .61
N≥7         1           TSP                                      .60
            2           TSP, (HC)^5                              .66
            3           TSP, (NOx)^2, (HC)^5                     .71
            4           TSP, (SO2)^2, (NOx)^2, (HC)^3            .73
N≥10        1           TSP                                      .67
            2           (TSP)^2, (NOx)^5                         .78
            3           (TSP)^4, SO2, (NOx)^5                    .84
            4           (TSP)^5, SO2, (NOx)^4, (NOx)^5           .84
TABLE C-5. SAMPLE COMPARISON BETWEEN MAXIMUM CORRELATION
COEFFICIENTS R AND THOSE OBTAINED FROM MODEL CHOSEN
AS GIVING BEST OVERALL FIT (THREE PARAMETER THEORY)
Chosen model: TSP; (NOx)^4; (NOx)^5

N Category  Variables Which Maximize R   Maximum R  Number of Points  R for Model
                                                    in Data Set       Chosen
N≥4         TSP, (NOx)^4, (NOx)^5        .49        114               .49
N≥6         TSP, (NOx)^2, (HC)^5         .59        71                .59
N≥7         TSP, (NOx)^2, (HC)^5         .71        58                .71
N≥10        (TSP)^4, SO2, (NOx)^5        .84        27                .77
TABLE C-6. SAMPLE COMPARISON BETWEEN CHOSEN MODEL AND
OTHER 'SIMPLE' MODELS (N>7)

             Variables                     R
3 parameter  TSP, (NOx)^4, (NOx)^5         .71   (chosen model)
             TSP, (NOx)^3, (NOx)^5         .70
             TSP, (NOx)^3, (NOx)^4         .70
             TSP, (NOx)^2, e^(NOx)         .70
2 parameter  TSP, (NOx)^5                  .64
             TSP, (NOx)^4                  .63
             TSP, e^(NOx)                  .63
TABLE C-7. EFFECT ON REGRESSION MODEL OF REMOVING 5 'OUTLIER' POINTS
FROM 58-POINT DATA SET BASED ON SINGLE PARAMETER THEORY

53-Point Data Set
Number of    1st & 2nd best            R
parameters   variable set
1            TSP                       .626
1            NOx                       .628
2            TSP, (NOx)^4              .653
2            TSP, (NOx)^5              .653
3            TSP, (NOx)^2, (NOx)^3     .654
3            TSP, (NOx)^4, (NOx)^5     .653

Original 58-Point Data Set
Number of    1st & 2nd best            R
parameters   variable set
1            TSP                       .598
1            (NOx)^2                   .396
2            TSP, (NOx)^5              .642
2            TSP, (NOx)^4              .643
3            TSP, (NOx)^4, (NOx)^5     .707
3            TSP, (NOx)^3, (NOx)^5     .705
FIG. C-2. Fit of Present Model with All Valid 1973 SAROAD
Population-Based Monitoring
[Figure: frequency distribution of the ratio of TSP measured (SAROAD) to
TSP calculated (present model), on an axis from 0.4 to 2.5, showing the
data, a fitted normal curve, the sample mean, and 1σ and 2σ (sample) limits.]
FIG. C-3. 'Inherent' Variability of SAROAD Data for Intracounty
Population-Based Monitoring
[Figure: frequency distribution of the ratio of TSP measured (SAROAD) for an
individual station to the geometric mean for all stations in the county, on
an axis from 0.4 to 2.5, showing the data, a fitted normal curve, and 1σ
limits.]
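Figure C-3 expresses each station's TSP value as a ratio to the geometric mean of all stations in its county. A minimal Python sketch of that ratio, and of a normal curve fitted to it, follows. It assumes the normal curve is fitted to the natural logarithm of the ratio (the 0.4 to 2.5 axis suggests a logarithmic scale, but this is an assumption), and the function names are hypothetical.

```python
import math
from statistics import fmean, stdev


def station_ratios(county_stations):
    """Ratio of each station's TSP to the geometric mean of all stations in its county.

    county_stations maps a county ID to a list of station TSP values (all > 0).
    Counties with a single station are skipped, since the ratio would always be 1.
    """
    ratios = []
    for values in county_stations.values():
        if len(values) < 2:
            continue
        geo_mean = math.exp(fmean(math.log(v) for v in values))
        ratios.extend(v / geo_mean for v in values)
    return ratios


def fit_normal_to_log(ratios):
    """Mean and sample standard deviation of ln(ratio): a normal curve on a log-ratio axis.

    The 1-sigma band on the ratio axis is then exp(mu - sigma) to exp(mu + sigma).
    """
    logs = [math.log(r) for r in ratios]
    return fmean(logs), stdev(logs)


ratios = station_ratios({"A": [50.0, 200.0], "B": [80.0], "C": [90.0, 100.0, 110.0]})
mu, sigma = fit_normal_to_log(ratios)
# mu is zero up to rounding, because each ratio is taken about its own county's geometric mean
```

The same fitting function could be applied to the ratio of measured to calculated TSP plotted in Fig. C-2, which is one way to compare the spread of the model residuals with the "inherent" intracounty spread.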
FIG. C-4. Comparison of Fitted Normal Curve for Predicted Air
Quality and 'Inherent' Variability
[Figure: fitted curves plotted against the ratio; not reproduced.]
CORRELATION AND REGRESSION STUDIES: SO2 - GROUND DEPOSITION
A rigorous determination of population-at-risk with respect to
air pollution exposure requires a prohibitive level of air monitoring to
account for local variations. Many counties lack the financial resources
and technical expertise to operate a sufficient number of monitoring
stations. Therefore, the population-at-risk study inevitably involves some
extrapolation of a limited air quality data set, and the errors introduced
by this extrapolation limit the accuracy of the risk estimates. Selected
factors which could introduce local variations in air quality are
discussed below, followed by an attempt to incorporate one of the dominant
factors into the statistical models used to predict air pollution exposure.
POLLUTANT REACTIVITY AND ATMOSPHERIC "CLEANSING" PROCESSES
The analyses discussed above consider only the SO2 emission density and
the general source characterization (point and area). While these are
probably the dominant factors influencing local annual average SO2 levels,
a variety of additional factors can affect observed air quality. A partial
list is as follows:
- Ground Deposition Processes
- Regional Transport (Inter-county dispersion)
- Atmospheric reactivity (smog chemistry) of SO2
- SO2 Washout and Rainout
- Plume reactivity of SO2
- Type of atmospheric monitors
- Monitoring site characteristics
- Monitor operational characteristics
- Micrometeorological factors
The last four factors have a potentially large impact on observed
air quality; however, evaluating them requires a time-consuming,
comprehensive study of local conditions, which is beyond the scope of the
present project. Therefore, attempts to improve the prediction capabilities
of the models are limited to consideration of atmospheric reactivity and
pollutant removal processes.
SO2 reactivity in large point source plumes has been observed to
range from 2 to 15 percent per hour, depending on meteorological conditions
and measuring technique. On an assumed average basis this does not
appear to be a major factor, particularly since area sources of SO2 are
dominant in most regions. Area sources emit SO2 at heights subject to
surface air flow disturbances and therefore produce less well-defined
plumes.
Washout and rainout together constitute one of the two major atmospheric
removal processes; however, precipitation conditions in a given county are
relatively similar to those in adjacent counties of the same general
region, so these processes are unlikely to introduce much local variation.
Atmospheric reactivity, regional transport and ground deposition
processes appear to be the dominant factors potentially introducing local
variation in SO2 levels. The relative importance of these has been
evaluated by Richards and Gerstle (Reference 1) using a Continuous Stirred
Tank Reactor (CSTR) model. This approach combines CSTR equations for a
first-order reaction process with NEDS emission data and SAROAD ambient SO2
data for a 10-state northwestern region. Comparison of the predicted SO2
annual average concentrations at various decay rates with observed
concentrations is used to identify the prevailing SO2 decay rate and the
SO2 oxidation rate. By dividing the 10-state region into four approximately
equal subregions, it was possible to roughly calculate the extent of
regional transport.
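The CSTR approach can be illustrated with the textbook steady-state balance for a single well-mixed box with first-order loss. The sketch below is not the Richards and Gerstle formulation: it assumes one box of volume V, a ventilation residence time tau, clean inflowing air, and a single lumped first-order decay constant k, and all names and numbers are hypothetical.

```python
def cstr_concentration(emission_g_per_h, volume_m3, residence_h, k_per_h):
    """Steady-state concentration (ug/m3) in a well-mixed box (CSTR).

    Mass balance with clean inflow and first-order loss:
        0 = E/V - C/tau - k*C   =>   C = (E/V) * tau / (1 + k*tau)
    """
    source_ug_m3_per_h = emission_g_per_h * 1e6 / volume_m3  # grams -> micrograms
    return source_ug_m3_per_h * residence_h / (1.0 + k_per_h * residence_h)


def best_decay_rate(observed_ug_m3, emission_g_per_h, volume_m3, residence_h, candidate_k):
    """Return the candidate k (1/h) whose predicted concentration is closest to the observation."""
    return min(candidate_k,
               key=lambda k: abs(cstr_concentration(emission_g_per_h, volume_m3,
                                                    residence_h, k) - observed_ug_m3))


# Hypothetical box: E/V = 1 ug/m3 per hour, tau = 10 h.
print(cstr_concentration(1000.0, 1e9, 10.0, 0.0))                     # 10.0
print(best_decay_rate(5.0, 1000.0, 1e9, 10.0, [0.0, 0.05, 0.1, 0.2]))  # 0.1
```

With no decay the hypothetical box holds 10 µg/m3; a decay constant of 0.1 per hour halves that to 5 µg/m3, so an observed 5 µg/m3 would point to k = 0.1 per hour. Matching predictions to observations over a range of decay rates in this way is the general idea behind the comparison described above.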
The results of the CSTR calculations suggest that atmospheric reactivity
is relatively insignificant compared with ground deposition and other
"cleansing processes." Most SO2 appears to be removed before it can be
oxidized in the atmosphere. Because of the rapid rate of SO2 removal
(half-life of less than 12 hours), regional transport of SO2 also appears
to be less important than ground deposition. Based on these rough
analyses, the remainder of this section is restricted to ground deposition
processes.
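The half-life quoted above converts directly into a first-order removal rate. The short Python sketch below assumes simple first-order kinetics (k = ln 2 / t_half); the helper names are hypothetical. It also includes the standard result for competing first-order pathways, in which each pathway removes a share k_i / Σk of the SO2.

```python
import math


def rate_constant(half_life_h):
    """First-order rate constant k (1/h) from a half-life: k = ln 2 / t_half."""
    return math.log(2.0) / half_life_h


def fraction_remaining(k_per_h, hours):
    """Fraction of the initial SO2 still airborne after `hours` of first-order removal."""
    return math.exp(-k_per_h * hours)


def pathway_share(k_pathway, all_k):
    """Share of SO2 ultimately removed by one of several competing first-order pathways."""
    return k_pathway / sum(all_k)


k = rate_constant(12.0)
print(round(k, 4))                            # 0.0578
print(round(fraction_remaining(k, 24.0), 2))  # 0.25
```

A 12-hour half-life corresponds to k of about 0.058 per hour, so at most about 25 percent of the SO2 would remain after 24 hours of transport, in line with the conclusion that regional transport is of secondary importance.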
GROUND DEPOSITION
Ground deposition processes have gained recognition in the last
3 to 5 years as a major sink for SO2 and other atmospheric pollutants.
Studies done in England in regions dominated by grass fields or wheat
crops indicate that 40 to 60 percent of emitted SO2 could ultimately
be removed by ground deposition processes (References 2, 3, 4). Recent
work in the RAPS Study Region (Metropolitan St. Louis) has also indicated
strong ground deposition. While these processes have not been adequately
studied, it is logical to assume that differences in ground deposition
characteristics are responsible for some of the observed variation in air
quality.
There are at least three distinct mass transfer mechanisms collectively
described as ground deposition: 1) absorption in water droplets and
layers, 2) adsorption on soil and vegetation surfaces, and 3) physiological
uptake in vegetation.
1. The absorption process is a potentially reversible operation
which is independent of plant characteristics. Absorption occurs
only during periods when water droplets are on the leaf surfaces.
This would primarily be restricted to nighttime, when the temperature
drops below the dew point, or to periods of rain.
The absorption may or may not be irreversible, depending on the
fate of the droplets. It is logical to assume that pollutants
absorbed in dew are released upon evaporation of the droplets;
SO2 collected during the night could be released by this process
during the warm late morning periods.
There are several factors which probably inhibit mass transfer to
the water droplets. These droplets are present only during periods
generally characterized by poor mixing; therefore, the diffusional
boundary layer in the gas phase is at a maximum. Furthermore,
the droplets are stagnant; therefore, their capacity for SO2 is limited
by the drop in pH which results from absorption of SO2 and CO2. This
pH limit is particularly important for rain droplets, which have
an initial pH of approximately 3.5 to 4.5.
Absorption processes could be controlled either by the gas phase
diffusional resistance or by the gas-liquid solubility equilibrium.
This depends primarily on the presence of alkaline dust
on vegetation surfaces, the pH of rain in the region of interest
and the quantity of water on the surfaces. If the process is diffusion
controlled, the rate is proportional to the 1.5 power of the
absolute temperature and to the 0.5 power of windspeed. The
solubility, however, decreases with increasing temperature.
2. Adsorption is a reversible process which is directly proportional
to the available surface area. Water vapor and CO2 probably
compete with SO2 and other pollutants for the active sites;
therefore, such gases influence the surface area relationship.
Adsorption involves a gas phase diffusion step which limits the
overall rate of mass transfer. According to standard mass
transfer equations, this process should be directly proportional
to the 1.5 to 2 power of the absolute temperature and to the 0.5 power
of the average windspeed. There should be only a weak dependence
on the type of plants; however, the total available surface area
would be important.
3. The physiological uptake process is inherently irreversible and
highly dependent on plant characteristics. The process is more
complicated with respect to mass transfer and is potentially more
important than the surface adsorption/absorption mechanisms. The
pollutant gases must traverse a diffusional boundary layer and a
relatively stagnant stomatal cavity. Upon contact with a cell,
the molecule is adsorbed on the cell wall prior to diffusion
through the cell membrane. It is not presently known which
diffusional mass transfer operation controls the uptake rate. The
overall mechanism should be directly proportional to temperature
within the normal tolerance limits of the plant.
The opportunity for physiological uptake is limited to daylight
hours during which photosynthesis is occurring. Certain defensive
mechanisms could reduce this uptake, since both SO2 and CO2 could
eventually stimulate closure of the stomata. Water stress would
also keep stomata closed to reduce transpiration losses. Physiological
uptake is directly proportional to the leaf surface area.
Unlike the surface removal processes, physiological uptake is
favored during the warm months.
Physiological uptake introduces a seasonal and diurnal cycle which
conceivably influences dose-rate. While this is not of direct interest in
the present study, which involves annual average data, the dose-rate of SO2
could be two to four times lower in daytime periods in warm months than
in nighttime or winter periods.
ANALYSIS OF GROUND DEPOSITION PARAMETERS
Ground deposition processes conceivably exert a strong influence on
annual average SO2 concentrations. Therefore, an attempt has been made
to incorporate ground deposition parameters into the statistical model
used to predict SO2 exposure in those counties lacking adequate monitoring
equipment. The mass transfer analyses discussed earlier indicate that the
most useful parameters include the following:
- vegetation surface area
- vegetation type
- temperature to the 1.5 or 2 power
- windspeed to the 0.5 power
The latter two factors introduce a seasonal and diurnal cycle not of direct
interest to the present study. The vegetation factor, however, should be
useful in approximating the influence of ground deposition processes.
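The temperature and windspeed scaling relations listed above can be compared numerically. The sketch below assumes a diffusion-controlled step whose rate varies as T^a u^0.5, with a = 1.5 or 2 as discussed; the reference temperature and windspeed are hypothetical values used only to express the result as a ratio.

```python
def relative_deposition_rate(temp_k, wind_m_s, temp_ref_k=293.15, wind_ref_m_s=3.0,
                             temp_exponent=1.5):
    """Rate of a diffusion-controlled deposition step relative to a reference state,
    assuming rate ~ T**temp_exponent * u**0.5 (T absolute temperature in kelvin, u windspeed)."""
    return (temp_k / temp_ref_k) ** temp_exponent * (wind_m_s / wind_ref_m_s) ** 0.5


print(round(relative_deposition_rate(303.0, 3.0, temp_ref_k=263.0), 2))                   # 1.24
print(round(relative_deposition_rate(303.0, 3.0, temp_ref_k=263.0, temp_exponent=2), 2))  # 1.33
print(relative_deposition_rate(293.15, 12.0))                                             # 2.0
```

Under these assumptions, warming from 263 K to 303 K raises the rate by only about 24 percent (about 33 percent with an exponent of 2), while a fourfold increase in windspeed doubles it.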
A limited analysis has been performed using the Statistical Analysis
System (SAS) computational package in order to evaluate the impact of the
vegetation parameters on the predictive capabilities of the models relating
ambient SO2 to the emission inventory. The scope was limited to North
Carolina counties due to the availability of land use/vegetation
distribution data. These data included a county-by-county tabulation of
acreage devoted to crop land, pastures, forests and "other" uses in 1958
and 1967. The similarity of the 1958 and 1967 data sets indicated that land
use is relatively stable and therefore probably representative of 1975
conditions. The SO2 ambient concentration data for 1975 were obtained from
the SAROAD data files. Several sites reporting extreme SO2 levels were
excluded as not representative of general conditions in North Carolina.
Furthermore, only sites located in residential areas were considered. The
final data set and descriptive statistics are presented in Table C-8. It is
apparent that the range of reported SO2 levels is very small compared with
most industrial regions. This will complicate subsequent analyses.
Analyses were done using both Spearman Correlation Tests and Multiple
Regression Models. The former were used as a screening procedure to
indicate possible linear or non-linear relationships. The multiple
regression models generally included various emission parameters and land
use parameters. Two sets of weighting factors were used to approximate
the expected influence of vegetation surface area. These factors, shown
in Table C-9, represent minimum and maximum conditions.
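The Spearman screening test is the ordinary correlation coefficient computed on ranks rather than on raw values, which is why it can reveal monotonic but non-linear relationships. A minimal pure-Python sketch follows, assuming tied values receive the average of their ranks (the usual convention); it is not the SAS implementation and omits the significance test that yields the Prob values.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run of tied values
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

For example, `spearman(area_emissions, ambient_so2)` on two equal-length county lists would give a coefficient comparable in kind to the "Area source SO2 emissions" entry of Table C-10.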
TABLE C-8-a. COUNTY EMISSION DATA AND LAND USE CHARACTERISTICS
FOR NORTH CAROLINA

                              Ambient
County -SO2 emissions, T/yr--     SO2 ----------Land use, acres-----------
    ID   Total   Point   Area   µg/m3 Crop land  Pastures   Other   Forest
    80     348      32    316    6.04     36146     22884    4395    94363
   480   22598   21008   1590    5.02     36000     41985    9685   251800
   500    1008     115    893    7.41     30166     13656   19567   199110
   520    2972    1778   1194    6.01     60948     28524    8387    99487
   540     995     237    758    5.82     25971     22565   10854   175150
   660    1486     144   1342    6.54     65547     20540   16616   123675
   720   23230   22717    513    6.04     54010     29813    9916   343600
   780     520     402    118    5.18     36312      2200    7231    64657
   840    1043      48    995    5.15    101213     35932    8425   127500
   940    3525    2928    597    5.18     71972      4772   18448   287057
   960    2885    1113   1772    7.46     93068     12566   12816   224528
  1140    1023     665    368    5.07    166617      9534   10574   325820
  1300    1306     772    534    7.59    136907     13178   10237   152700
  1480    5509    2485   3024    6.72     69234     20489   21000   115037
  1560   93756   91856   1900    5.05     37944     18465   11273   111636
  1780    6999    3129   3870   15.46    101666     30235   17025   192300
  1840    1224     764    460    6.25    117719     11617   12546   232400
  1860   10448    9812    636    5.26     19641     40544    8367   141402
  1940     261      14    247    ?.03     56647      3015   10031   157727
  2060    1857     746   1111    ?.71    108155     63247    9526   166372
  2120    1042     216    826    ?.62    211024     11500   16500   249200
  2280     552     159    393    ?.60     30700      7057    4100   111292
  2360     567     143    424    5.57     64999     17558    8149    91411
  2480    2760    1793    967    5.02     13169     10399    7396   177516
  2580    3467     222   3245    9.59     61507     24000   16155   156100
  2680     569     101    468    6.35     51478     11278   13871   351561
  2880   38902   36717   2185    6.44      6330       906   10032    67880
  2980    1179     297    882    5.11     58391      1750   23362   300716
  3000    1804    1282    522    6.27     49025     25116   10828   154500
  3080     598     319    279    5.08     54089      3000    4889    78228
  3180    1254     548    706    5.35    158692     13622   22049   191637
  3260    1507     364   1143    5.43    111759     21218   17893   296562
  3320     514      59    455    6.34     49056     12591   10204   217554
  3380    6318    5458    860    6.17    224569     17957   11762   317300
  3420   18455   17394   1061    5.59     83382     15217   15222   233898
  3460   26252   25008   1244    8.23     93202     35938   12801   150420
  3500   13563   12307   1256    4.99     64288     18000   12521   248121
  3600     913     617    296    6.84     50529      7293    6091    99477
  3900    2039    1460    579    6.17     82260     35035    7948   113252
  3980     244      14    230    5.04      7128      4975    2150    84203
  4080     726     102    624    5.97    136684     50960   15837   185200
  4120     797     303    494    6.57     44118      5955    6929    85578
  4160    3332    1151   2181    8.66    107941     26277   24679   322521
  4280   20315   19447    868    5.25    146664      9415   21623   159382
  4360     956     261    695    4.99    107642      2637    5219   110000
  4500     603      49    554    4.99     14884     22109    4192   120705

? Leading digit illegible in source.
TABLE C-8-b. DESCRIPTIVE STATISTICS FOR MODEL

Variable                          Number of      Mean  Minimum  Maximum  Standard
                                  observations                           deviation
Ambient SO2 (µg/m3)                  46          6.30     4.99    15.46     1.763
Total SO2 emissions (10^3 T/yr)      46          7.22     0.24    93.76    15.56
Point source SO2 emissions           46          6.23     0.01    91.86    15.33
  (10^3 T/yr)
Area source SO2 emissions            46          0.99     0.12     3.87     0.81
  (10^3 T/yr)
Crop land (10^3 acres)               46         76.29     6.33   224.57    50.63
Pastures (10^3 acres)                46         18.64     0.91    63.25    13.70
"Other" land (10^3 acres)            46         11.94     2.15    24.68     5.57
Forests (10^3 acres)                 46        179.58    64.66   351.56    81.77
TABLE C-9. VEGETATION SURFACE AREA WEIGHTING FACTORS
Land use category Weighting factors
Set A Set B
Pasture 1 1
Crop land 2 5
Other 3 25
Forest 4 100
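One way to apply these weighting factors is to form a weighted vegetation area for each county and divide the emission term by it, as the "emissions/Set A" and "emissions/Set B" rows of Table C-10 suggest. The exact definition used in the analysis is not stated, so the division below is an assumption and the function names are hypothetical; the acreages in the example are those listed for county ID 80 in Table C-8-a.

```python
# Table C-9 weighting factors (relative vegetation surface area per acre).
WEIGHTS = {
    "A": {"pasture": 1, "crop": 2, "other": 3, "forest": 4},
    "B": {"pasture": 1, "crop": 5, "other": 25, "forest": 100},
}


def weighted_vegetation_area(acres, weight_set):
    """Sum of land-use acreages, each multiplied by its Table C-9 weighting factor."""
    weights = WEIGHTS[weight_set]
    return sum(weights[category] * acres[category] for category in weights)


def emissions_per_weighted_area(area_tons, acres, weight_set, point_tons=0.0, point_share=0.0):
    """Hypothetical normalization: (area + point_share * point emissions) / weighted area.

    point_share=0.1 mimics the "area source + 10% point source" rows of Table C-10.
    """
    return (area_tons + point_share * point_tons) / weighted_vegetation_area(acres, weight_set)


county_80 = {"crop": 36146, "pasture": 22884, "other": 4395, "forest": 94363}
print(weighted_vegetation_area(county_80, "A"))  # 485813
print(weighted_vegetation_area(county_80, "B"))  # 9749789
```

Because Set B weights forest 100 times as heavily as pasture, it ranks heavily forested counties very differently from Set A, which is presumably why the two sets were taken to bracket minimum and maximum conditions.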
The results of the statistical tests are presented in Tables C-10
and C-11. Correlations with a level of confidence greater than 90 percent
are marked with an asterisk. It is apparent that area source SO2
emissions are the dominant influence on ambient SO2 levels. This
observation is consistent with most emission impact studies.
The land use parameters did not improve the predictive capability
of models using area source SO2 emissions data. Note that the Spearman
coefficients of all the land-use-weighted emission parameters (0.184 to
0.252) were no higher than that of area source emissions alone (0.253).
Similar results are apparent in the linear and multiple regression models.
A multiple regression model using powers and the logarithm of the area
source emissions increases R² from 0.439 to 0.717 without any land use
information. By comparison, the land use parameters give little or no
improvement, which does not warrant the considerable effort involved
in compiling the land use (vegetation surface area) parameter.
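The power-series comparison in Table C-11 can be outlined with ordinary least squares. The sketch assumes powers 1, 2 and 3 plus the logarithm of area source emissions (some exponents in Table C-11 are illegible, so this term set is an assumption) and uses hypothetical function names. Because the combined model contains every single-term model as a special case, its R² can never be lower than theirs.

```python
import numpy as np


def r_squared(columns, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())


def area_source_models(area_tons, so2_ug_m3):
    """R^2 for single power/log terms of area source emissions and for their combination."""
    a = np.asarray(area_tons, dtype=float) / 1000.0  # 10^3 T/yr, as in Table C-8-b
    y = np.asarray(so2_ug_m3, dtype=float)
    models = {
        "linear": [a],
        "squared": [a ** 2],
        "cubed": [a ** 3],
        "log": [np.log(a)],
        "combined": [a, a ** 2, a ** 3, np.log(a)],
    }
    return {name: r_squared(cols, y) for name, cols in models.items()}


# First eight counties of Table C-8-a (area source T/yr, ambient SO2 in ug/m3).
area = [316, 1590, 893, 1194, 758, 1342, 513, 118]
so2 = [6.04, 5.02, 7.41, 6.01, 5.82, 6.54, 6.04, 5.18]
fits = area_source_models(area, so2)
```

With only eight points and five fitted coefficients this example is purely illustrative. It also shows why a higher R² from added power terms should be weighed against the number of parameters used.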
While the consideration of vegetation characteristics should improve
models relating SO2 emissions to ambient SO2, the techniques evaluated in
this section were unsuccessful. This may be due to the small range of
ambient SO2 concentrations or to a too simplistic approach to the ground
deposition process.
TABLE C-10. SPEARMAN CORRELATIONS: ANNUAL AVERAGE SO2 CONCENTRATION
AND VARIOUS EMISSION AND LAND USE PARAMETERS

Parameter                                   Spearman      Prob > F
                                            coefficient
Total SO2 emissions                         0.131         0.61
Point source SO2 emissions                  0.064         0.67
Area source SO2 emissions                   0.253         0.09
Total SO2 emissions/county area             0.096         0.53
Area SO2 emissions/county area              0.149         0.67
Crop land                                   0.160         0.28
Pasture land                                0.259         0.08
"Other" land                                0.269         0.07
Forest                                      0.016         0.91
Area source emissions/Set A                 0.252         0.09
Area source emissions/Set B                 0.233         0.11
Area source + 10% point source              0.187         0.21
  emissions/Set A
Area source + 10% point source              0.184         0.22
  emissions/Set B

Set A and Set B denote the two sets of weighting factors for
land use categories described in Table C-9.
TABLE C-11. MULTIPLE REGRESSION MODELS: ANNUAL AVERAGE SO2
CONCENTRATION AS DEPENDENT VARIABLE

Independent variables                   Correlation     Prob > F
                                        coefficient
                                        (R²)
Total SO2 emissions                     0.004           0.67
Point source SO2 emissions              0.010           0.50
Area source SO2 emissions               0.439           0.0001
Crop land                               0.0090          0.53
Pasture land                            0.0610          0.10
"Other" land                            0.0582          0.11
Forests                                 0.0004          0.88
Total emissions/area                    0.0019          0.76
Area source emissions/area              0.1200          0.02
(Area source emissions)^?               0.339           0.0001(a)
(Area source emissions)^2               0.581           0.0001(a)
(Area source emissions)^3               0.652           0.0001(a)
Log (area source emissions)             0.234           0.0007(a)
Area source terms combined (total)      0.717           0.0001
  Area sources                          -               0.0001
  (Area sources)^2                      -               0.0001
  (Area sources)^?                      -               0.0151
  (Area sources)^?                      -               0.4365
  Log (area sources)                    -               0.0257

? Exponent illegible in source.
REFERENCES
1. Richards, J. R., and Richard Gerstle. Stationary Source Control Aspects
of Ambient Sulfates: A Data Base Assessment. Office of Research
and Development, U.S. Environmental Protection Agency, Research
Triangle Park, N.C. 157 pages. February 1976.
2. Garland, J. A. Deposition of Gaseous Sulphur Dioxide to the Ground.
In: Atmospheric Environment. Great Britain. 8:75-79, July 1973.
3. Shepherd, J. G. Measurements of the Direct Deposition of Sulphur
Dioxide Onto Grass and Water by the Profile Method. In: Atmospheric
Environment. Great Britain. 8:69-74, July 1973.
4. Garland, J. A., W. S. Clough, and D. Fowler. Deposition of Sulphur
Dioxide on Grass. In: Nature. 242:256-257, March 23, 1973.
5. Wilson, W. Personal Communication, February 1975.
6. North Carolina Conservation Needs Inventory. N.C. Department of
Agriculture. December 1971.
TECHNICAL REPORT DATA
(Please read Instructions on the reverse before completing)
1. REPORT NO.
EPA-600/1-78-051
3. RECIPIENT'S ACCESSION-NO.
4. TITLE AND SUBTITLE
Population at Risk to Various Air Pollution Exposures:
Data Base "Popatrisk"
5. REPORT DATE
June 1978
6. PERFORMING ORGANIZATION CODE
7. AUTHOR(S)
Sandor J. Freedman
Joseph D. Wilson
Elsa Lewis-Heise
Albert V. Hardy
8. PERFORMING ORGANIZATION REPORT NO.
9. PERFORMING ORGANIZATION NAME AND ADDRESS
System Sciences, Inc.
P.O. Box 2345
Chapel Hill, North Carolina 27514
10. PROGRAM ELEMENT NO.
1AA601
11. CONTRACT/GRANT NO.
Contract No. 68-02-2269
12. SPONSORING AGENCY NAME AND ADDRESS
Health Effects Research Laboratory RTP,NC
Office of Research and Development
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
13. TYPE OF REPORT AND PERIOD COVERED
Final report covering Oct. 1975 - 1977
14. SPONSORING AGENCY CODE
EPA 600/1
15. SUPPLEMENTARY NOTES
16. ABSTRACT
The work reported herein was undertaken to provide the EPA with a user-
oriented data base containing recent county-based information, for all
counties in the contiguous United States, on population demographics,
population mobility, climatology, emissions, air quality, and age-adjusted
death rates.
The completed data base, called "POPATRISK," contains approximately
27.5 million characters and is in SYSTEM 2000, Version 2.80 format, facili-
tating access with minimal user computer training. Population demographics
are as of the 1970 Census; population mobility is described spanning the
years 1965 to 1970 for 6 sex-race categories in 7 age groupings for both
"in" and "out" migrants; climatology information contains county summaries
of temperature, precipitation and hours of sunshine; county point and area
source emission estimates are provided for 5 criteria pollutants--TSP, SO2,
N02, CO, and Ozone--based on the NEDS-USER file; air quality information is
based on 1974 data contained in SAROAD; age-adjusted death rates were computed
for the combined years 1969, 1970, and 1971 for 4 sex-race categories in
50 groupings of ICDA categories (8th revision).
17. KEY WORDS AND DOCUMENT ANALYSIS
a. DESCRIPTORS    b. IDENTIFIERS/OPEN ENDED TERMS    c. COSATI Field/Group
data file
air quality
demography
population
mortality
Population at risk
05A
09B
06F
18. DISTRIBUTION STATEMENT
Release to Public
19. SECURITY CLASS (This Report)
Unclassified
21. NO. OF PAGES
140
20. SECURITY CLASS (This page)
Unclassified
22. PRICE
EPA Form 2220-1 (9-73)