EPA
             United States
             Environmental Protection
             Agency
             Health Effects Research     EPA-600/1-78-051
             Laboratory         June 1978
             Research Triangle Park NC 27711
             Research and Development
Population  at Risk
to Various Air
Pollution
Exposures :
Data Base "Popatrisk"

-------
                RESEARCH REPORTING SERIES

Research reports of the Office of Research and Development, U.S. Environmental
Protection Agency, have been grouped into nine series. These nine broad cate-
gories were established to facilitate further development and application of en-
vironmental technology.  Elimination of traditional grouping  was consciously
planned to foster technology transfer and a maximum interface in related fields.
The nine series are:

      1.  Environmental  Health Effects Research
      2.  Environmental  Protection Technology
      3.  Ecological Research
      4.  Environmental  Monitoring
      5.  Socioeconomic Environmental Studies
      6.  Scientific and Technical Assessment Reports (STAR)
      7.  Interagency Energy-Environment Research and Development
      8.  "Special" Reports
      9.  Miscellaneous Reports
This report has been assigned to the ENVIRONMENTAL HEALTH EFFECTS RE-
SEARCH series. This series describes projects and studies relating to the toler-
ances of man for unhealthful substances or conditions. This work is generally
assessed from a medical viewpoint, including physiological or psychological
studies. In addition to toxicology and other medical specialities, study areas in-
clude biomedical  instrumentation and health research techniques utilizing ani-
mals — but always with  intended application to human health measures.
 This document is available to the public through the National Technical Informa-
 tion Service, Springfield, Virginia  22161.

-------
                                               EPA-600/1-78-051
                                               June  1978
POPULATION AT RISK TO VARIOUS AIR POLLUTION EXPOSURES:

                DATA BASE "POPATRISK"
                          by

     Sandor J. Freedman          Elsa Lewis-Heise
     Joseph D. Wilson            Albert V. Hardy

                 System Sciences, Inc.
          Chapel Hill, North Carolina  27514
               Contract No. 68-02-2269
                    Project Officer
                   William C. Nelson
         Statistics and Data Management Office
          Health Effects Research Laboratory
     Research Triangle Park, North Carolina  27711
          Health Effects Research Laboratory
          Office of Research and Development
         U.S. Environmental Protection Agency
     Research Triangle Park, North Carolina  27711

-------
                                DISCLAIMER
     This report has been reviewed by the Health Effects Research Laboratory,
U.S. Environmental Protection Agency, and approved for publication.   Approval
does not signify that the contents necessarily reflect the views and policies
of the U.S. Environmental Protection Agency, nor does mention of trade names
or commercial products constitute endorsement or recommendation for  use.

-------
                                FOREWORD
     The many benefits of our modern, developing, industrial society
are accompanied by certain hazards.  Careful assessment of the relative
risk of existing and new man-made environmental hazards is necessary
for the establishment of sound regulatory policy.  These regulations
serve to enhance the quality of our environment in order to promote the
public health and welfare and the productive capacity of our Nation's
population.

     The Health Effects Research Laboratory, Research Triangle Park,
conducts a coordinated environmental health research program in toxicology,
epidemiology, and clinical studies using human volunteer subjects.
These studies address problems in air pollution, non-ionizing
radiation, environmental carcinogenesis and the toxicology of pesticides
as well as other chemical pollutants.  The Laboratory participates in
the development and revision of air quality criteria documents on
pollutants for which national ambient air quality standards exist or
are proposed, provides the data for registration of new pesticides or
proposed suspension of those already in use, conducts research on
hazardous and toxic materials, and is primarily responsible for providing
the health basis for non-ionizing radiation standards.  Direct support
to the regulatory function of the Agency is provided in the form of
expert testimony and preparation of affidavits as well as expert advice
to the Administrator to assure the adequacy of health care and surveillance
of persons having suffered imminent and substantial endangerment of
their health.

     The data base described in this report has been developed to provide
the capability to examine easily and quickly the available county-level
information on air quality; population, including socioeconomic,
demographic, and migration factors; climatology; and mortality.  The
original information has been collected by various government agencies but
has not previously been combined into a single data file.  Use of this
file will permit more accurate estimates of the populations exposed to
various air pollutant levels and a better assessment of the geographic
variability between air quality and mortality.
                                   F. G. Hueter, Ph. D.
                                     Acting Director,
                           Health Effects Research Laboratory

-------
                                  ABSTRACT

     The work reported herein was undertaken to provide the EPA with a user-
oriented data base containing recent county-based information, for all
counties in the contiguous United States, on population demographics,
population mobility, climatology, emissions, air quality, and age-adjusted
death rates.

     The completed data base, called "POPATRISK," contains approximately
27.5 million characters and is in SYSTEM 2000,  Version 2.80 format,  facili-
tating access with minimal user computer training.   Population demographics
are as of the 1970 Census; population mobility is described spanning the
years 1965 to 1970 for 6 sex-race categories in 7 age groupings for both
"in" and "out" migrants; climatology information contains county summaries
of temperature, precipitation and hours of sunshine; county point and area
source emission estimates are provided for 5 criteria pollutants (TSP, SOx,
NOx, CO, and ozone) based on the NEDS-USER file; air quality information is
based on 1974 data contained in SAROAD; age-adjusted death rates were computed
for the combined years 1969, 1970, and 1971 for 4 sex-race categories in
50 groupings of ICDA categories (8th revision).

     Sample applications of the data base are provided herein.  A detailed
manual documenting the county identification codes  to be used in retrieving
data has been provided under separate cover.  Also  included in the geocoding
manual is a cross-reference table showing the relationships among POPATRISK
and the SAROAD, FIPS, Census and NCHS geocoding schemes.

     This report was submitted in fulfillment of Contract No. 68-02-2269 by
System Sciences, Inc. under the sponsorship of the U.S. Environmental
Protection Agency.  This report covers a period from October 29, 1975, to
December 31, 1977, and work was completed as of  December 31,  1977.

-------
                             CONTENTS


Abstract	    ii
Figures	    iv
Tables 	     v
Acknowledgements	   vii

   1.  Introduction	     1
   2.  POPATRISK Data Base Features	     3
            System 2000 Version 2.80 Features  	     3
            POPATRISK Data Base Structure  	     4
            Components and Sources of Data	     9
            Variations in Geocoding Schemes Among
              Sources	    24
   3.  Use of Data Base "POPATRISK"	    26
   4.  Sample Retrievals Using POPATRISK 	    29
   5.  Data Base Development Procedures	    44
            Considerations Leading to Choice of
              Data Items	    44
            History of Development	    46
            Methodologies for Processing and Loading ....    47
            Use of Removable Disk Pack and
              Archiving Procedures	 .    55
       Appendix A:  Resolution of Geocoding
                    Discrepancies for Massachusetts
                    NEDS and SAROAD Data	    69
       Appendix B:  Documentation of Data Tapes and
                    Runstreams	    75
       Appendix C:  Other Studies  	    8S

-------
                                  FIGURES

Number                                                                 Page

  1    POPATRISK hierarchical structure 	       8

  2    Overview of POPATRISK data sources and development
         procedures, Phase 1  	      51

  3    Overview of POPATRISK data sources and development
         procedures, Phase 2  	      58

 C-1   1973 Monitoring Station Sample Distribution  	      92

 C-2   Fit of present model with all valid 1973 SAROAD
         population based monitoring  	     115

 C-3   "Inherent" variability of SAROAD data for intracounty
         population based monitoring  	     116

 C-4   Comparison of fitted normal curve for predicted air
         quality and "inherent" variability 	     117
 C-5   Variation of R^2 with data set size	     118

-------
                                  TABLES
Number                                                                  Page
  1    POPATRISK Data Base Definition in System 2000 Format	     5
  2    POPATRISK Data Base Component Description  	    10
  3    Aggregations of Mortality Categories for POPATRISK
         Age-Adjusted Deaths  	    21
  4    Cluster Descriptions 	    23
  5    Illustrative Retrieval of NEDS Emissions Data for
         Massachusetts in tons/year 	    30
  6    Illustrative Retrieval of SAROAD Site and Monitoring
         Data for Durham County, NC	    32
  7    Illustrative Retrieval of Counties with Highest Calculated
         Mean SO2 Monitoring (µg/m3)  	    35
  8    Illustrative Retrieval of Counties with Highest Age-Adjusted
         Auto Fatalities for White Males  	    37
  9    Illustrative Retrieval of Counties with Highest White Male
         Death Rate from Bronchitis	    40
 10    Illustrative Retrieval of Counties with Highest White
         Female Age-Adjusted Death Rate from Breast Cancer  	    42
 11    Illustrative Retrieval of Counties with Highest White
         Death Rate from Malignant Neoplasm Cluster 1	    43
 12    Unallocated In-Migration Totals for Selected Counties  ....    48
 13    State Unallocated In-Migration Totals for All Races  	    49
 14    Assignment of Massachusetts SAROAD Monitoring to
         Census Counties  	    53
 15    Independent Cities Recognized and Coded in Data Base	    55
 16    Assignment of Virginia Independent Cities to POPATRISK
         Counties	    56
A-1    NEDS Points Apportioned into Census Counties
         by using "Dummy" UTM Coordinates 	    72
A-2    NEDS Points Apportioned into Census Counties
         by Tracing Their Name and Address	    73
A-3    Total Emissions Summed by Type and by County	    74
A-4    Massachusetts Area Emissions Totals by POPATRISK Counties  . .    75

-------
                                TABLES (continued)
Number                                                                  Page
C-1    Listing of Monitoring Methods Included In and
         Excluded from Survey	    90
C-2    Percentage of States' Population in Counties with
         One or More Monitoring Stations	    99
C-3    Cumulative Frequency Tables for Valid 1973 SAROAD
         Population Based Monitoring of TSP 	   110
C-4    Maximum Correlation Values, R, for a Sampling of
         Different Models Investigated  	   111
C-5    Sample Comparison between Maximum Correlation Coefficients R
         and Those obtained from Model	   112
C-6    Sample Comparison between Chosen Model and Other
         "Simple" Models	   113
C-7    Effect on Regression Model of Removing 5 "Outlier"
         Points from 58 Point Data Set	   114
C-8-a  County Emission Data and Land Use Characteristics
         for North Carolina	   125
C-8-b  Descriptive Statistics for Model 	   126
C-9    Vegetation Surface Area Weighting Factors  	   127
C-10   Spearman Correlations:  Annual Average SO2 Concentrations  .  .   128
C-11   Multiple Regression Models:  Annual Average SO2
         Concentrations 	   129

-------
                             ACKNOWLEDGEMENTS

     The continuing support and technical direction of the EPA Project Officer,
Dr. William C. Nelson, are gratefully acknowledged.  Thanks are due also to
Dr. Victor A. Hasselblad of HERL for many helpful conversations.
John Van Bruggen of HERL provided information on the ICDA aggregations in
current use at HERL.

     Paul Cornely, M.D., Ph.D., of System Sciences, Inc., was involved in the
interpretation of the cluster analysis and was most helpful also in inter-
preting some fine points of ICDA coding.  Christopher Gordon of System Sciences,
Inc. assisted in developing the geocoding scheme for the independent cities
in Virginia and also in resolution of other geocoding problems.

     Dr. I-Li Huang and Ms. Susan Alston, formerly of System Sciences, Inc.,
were responsible for some work on the data base in the initial phases of
this contract.

     The authors are pleased to acknowledge also the assistance of two
consultants.  Dr. Richard Kopec of the Geography Department, University of
North Carolina at Chapel Hill, was responsible for the reallocation of NEDS
point emission sources to the Massachusetts FIPS counties.  Mr. John Richards
of PEDCo Environmental Specialists, Chapel Hill, North Carolina, carried
out the study on SO2 ground deposition parameters.

     Finally, thanks are due to Ms. Signe Wetrogen of the Population Division,
Bureau of Census, for her continued interest in the in-out migration data
and for providing a special compilation of New York City migration data.

-------
                                  SECTION 1
                                INTRODUCTION

     Compilation of primary source material of nationwide data on popula-
 tion demography and mobility, climatology,  emissions of pollutants, air
 quality and deaths by ICDA cause is the responsibility of a number of
 different government agencies.  The data have heretofore been available
 mostly in formats that tend to be unique to each data type.  The
 investigator of potential relationships among those variables has, therefore,
 often been required to reformat, condense, aggregate and/or disaggregate
 data which may have conflicting schemes of identifying geographical areas,
 population age groups and the like.

     The main thrust of the work conducted under this project was to provide
 the EPA with a coherent source of recent data for these variables in which
 the user, with a minimum of computer training, can retrieve information
 pertinent to his investigations with maximum speed and flexibility.   The
 user can define the format within the wide capabilities of SYSTEM 2000,
Version 2.80.

     All compilations of data require some choice of what items should be
 included, and this work is no exception.   The choice was guided by
 considerations of time, compatibility among data sets, reasonableness
 of data base size and expected retrieval costs, and anticipated usefulness
 to the EPA, as defined through many discussions between the contractor
 staff and the EPA project officer.

     The sections to follow provide  a detailed description of the data base
 structure and historical development.   In particular, Section 2 documents
 the unique features of the POPATRISK data base including  a discussion of the
meaning, units, and primary sources  of each individual data item.  Geocoding
 unique to this data base, which was  necessitated by the variance in geocoding

-------
schemes used by the different primary sources, is  summarized in this report
and detailed under separate cover in the report "POPATRISK Geocoding Manual."
Section 3 gives the reader an overview of techniques for performing retrie-
vals although the discussion is by no means meant  to be complete.   Reference
is made to the MRI SYSTEM 2000 Reference Manual for more detailed discus-
sion of the use of a SYSTEM 2000,  Version 2.80 data base.   Section 4
illustrates some sample results using POPATRISK and is a brief indication of
potential uses of the data base.   Development procedures used in this work
are documented in Section 5.  Appendix A contains  a discussion of methodologies
used to resolve geocoding discrepancies for Massachusetts.   Appendix B
contains detailed documentation on tapes and runstreams provided EPA with
this report.  Appendix C describes other studies performed  in this work.
In particular, a mapped survey of monitoring activity for the criteria
pollutants in 1973 and a report of studies correlating ambient TSP levels
with county emission estimates are presented.

-------
                                   SECTION 2
                         POPATRISK DATA BASE  FEATURES
 SYSTEM 2000,  VERSION 2.80 FEATURES
     The POPATRISK data base design and contents are a result of analysis
 and evaluation of available data  sources and their potential usefulness to
 investigators in population-at-risk studies.  The SYSTEM 2000 data base
 management system was selected as  the most versatile and cost-effective
 medium for storage and retrieval  of POPATRISK data.

     SYSTEM 2000 is  a general-purpose data base management system which
 operates on the UNIVAC 1100 series computers as well as IBM and CDC computers.
 It provides the user with a comprehensive set of data base management
 capabilities for developing and utilizing information systems tailored to
 the specific requirements of the  user.

     SYSTEM 2000, Version 2.80 offers a full range of capabilities for modi-
 fication of a data base definition, data base update, and quick-response
 retrievals.  Specifically, a SYSTEM 2000 data base design is flexible, in
 that additions can be made to the data base definition (structure) at any
 time and many changes to the existing structure are easily accomplished.
 The Immediate Access feature offers an English-like user-oriented language
which  can be used with minimal computer training to update or retrieve data
 in the data base.  Along with the Report Writer feature, Immediate Access
 provides the user capability to retrieve and print data quickly and in the
desired format.   Update features enable the user to ADD, CHANGE, or REMOVE
 data at any level from the entire logical entry down to the individual com-
ponent level.

     A further extension of SYSTEM 2000 retrieval and update features which
was used extensively in development of POPATRISK is the Procedural Language
Interface feature.   It enables the user to manipulate data in a data base

-------
through interfacing with COBOL, FORTRAN, or assembly language programs.
This  feature is particularly useful both in processing and loading large
amounts of data and in performing complicated retrievals from logically
unrelated sets of data.

      A detailed discussion of the SYSTEM 2000 data base management system,
its capabilities and command usage may be found in the SYSTEM 2000
Reference Manual, the Version 2.80 Newsletter, and Procedural Language
Interface (UM-2).

POPATRISK DATA BASE STRUCTURE

      Table 1 shows the POPATRISK data base definition in SYSTEM 2000
format.  As an aid to understanding the data base structure, Figure 1 is
provided to illustrate the hierarchical relationships of the data within
the data base.   It is important to note that the data base is described in
a county based format since the smallest geographical entity defined in
this work is the county.   It was decided to adopt a county coding convention
adhering as closely as possible to the current list of NADB SAROAD codes to
provide a system most adaptable to the data base and to the user.   Some
differences do exist, however, between the POPATRISK geocoding scheme and
the SAROAD scheme.  These are documented below.   Data in the data base are
stored in groupings referred to as logical entries.  One logical entry
contains all information pertaining to a particular county; thus there is
one logical entry of data for each county represented in the data base.
Within a logical entry, data are organized in tree structure into  repeating
groups of either related or disjoint data sets.   Each repeating group
represents a different type of data and components within the group define
in detail the data sets contained therein.

      In Figure 1, each Level 1 repeating group is a descendant of the
Level 0 or Entry Level data and the Level 2 repeating group is a descendant
of a Level 1 repeating group.  All data sets connected by a line follow a
branch of the tree structure and are considered related data sets; however,
they are disjoint from data sets connected to different lines (branches) from
the Entry Level.  For example, MONITORING DATA repeating group C300 is a

-------
       TABLE 1. POPATRISK DATA BASE DEFINITION IN SYSTEM 2000 FORMAT
SYSTEM RELEASE NUMBER   2.80B
DATA BASE NAME IS POPATRISK
DEFINITION NUMBER 11
DATA BASE CYCLE   25M7Q
      1*  STATE ID (NAME XX)
      2*  COUNTY ID (NAME XXXX)
      3*  ST-COUNTY ID (NAME X(6))
     10*  TOTAL POPULATION  (INTEGER NUMBER 9(8))
     11*  PERCENT NONWHITE  (DECIMAL NUMBER 99.9)
     12*  PERCENT BLACK (DECIMAL NUMBER 99.9)
     13*  PERCENT OVER 64 (DECIMAL NUMBER 99.9)
     14*  PERCENT FEMALE (DECIMAL NUMBER 99.9)
     15*  POP PER SQ MI (DECIMAL NUMBER 9(5).9)
     16*  PERCENT URBAN (DECIMAL NUMBER 99.9)
     17*  % EMP OTHER CNTY  (NON-KEY  DECIMAL NUMBER  99.9)
     18*  INCOME >10K/<5K (NON-KEY  DECIMAL NUMBER 99.99)
     19*  INCOME >15K/<10K  (NON-KEY  DECIMAL NUMBER  99.99)
     20*  INCOME >15K  (DECIMAL NUMBER 999.9)
     23*  GAS SALES/CAPITA  (DECIMAL NUMBER 999.9)
     30*  MEAN TEMP JAN (NON-KEY  DECIMAL NUMBER 99.9)
     31*  MEAN TEMP JUL (NON-KEY  DECIMAL NUMBER 99.9)
     32*  MEAN ANNUAL TEMP  (DECIMAL NUMBER 99.9)
     33*  MEAN PRECIP JAN (NON-KEY  DECIMAL NUMBER 99.99)
     34*  MEAN PRECIP JUL (NON-KEY  DECIMAL NUMBER 99.99)
     35*  MEAN ANNUAL PREC  (DECIMAL NUMBER 99.99)
     36*  HOURS SUN JAN (NON-KEY  INTEGER NUMBER 999)
     37*  HOURS SUN JUL (NON-KEY  INTEGER NUMBER 999)
     38*  REL HUMIDITY JAN  (NON-KEY  DECIMAL NUMBER  .99)
     39*  REL HUMIDITY JUL  (NON-KEY  DECIMAL NUMBER  .99)
     40*  CNTY ELEVATION (INTEGER NUMBER 9(5))
     41*  CNTY LATITUDE (NON-KEY  INTEGER NUMBER 9999)
     42*  % WMALE IN MIG (DECIMAL NUMBER 99.9)
     43*  % WMALE OUT MIG (DECIMAL NUMBER 99.9)
     44*  % WFEM IN MIG (DECIMAL NUMBER 99.9)
     45*  % WFEM OUT MIG (DECIMAL NUMBER 99.9)
     46*  % NWMALE IN MIG (DECIMAL NUMBER 99.9)
     47*  % NWMALE OUT MIG  (DECIMAL NUMBER 99.9)
     48*  % NWFEM IN MIG (DECIMAL NUMBER 99.9)
     49*  % NWFEM OUT MIG (DECIMAL NUMBER 99.9)
    100*  NEDS EMISSIONS (RG)
      101*  NEDS POLLUTANT  (INTEGER NUMBER 9(5) IN 100)
      102*  COUNTY TOTAL (NON-KEY  DECIMAL NUMBER 9(8).999  IN  100)
      103*  POINT SOURCES (NON-KEY  DECIMAL NUMBER 9(8).999  IN  100)
                                                      (continued)

-------
                      TABLE 1 (continued)

200*  SAROAD MONITORING SITES (RG)
  206*  CITY POPULATION (NON-KEY  INTEGER NUMBER 9(8) IN 200)
  207*  UTM ZONE (NAME XX IN 200)
  208*  EASTING (NAME X(8) IN 200)
  209*  NORTHING (NAME X(7) IN 200)
  210*  ADDRESS (NON-KEY  NAME X(11) IN 200)
  211*  TYPE (NAME XX IN 200)
  212*  ELEV ABOVE GND (INTEGER NUMBER 999 IN 200)
  213*  ELEV ABOVE SEA (INTEGER NUMBER 9999 IN 200)
  300*  MONITORING DATA (RG IN 200)
    301*  POLLUTANT (INTEGER NUMBER 9(5)  IN 300)
    302*  METHOD (NAME XX IN 300)
    303*  INTERVAL (NAME X IN 300)
    304*  # OBSERVATIONS (NON-KEY  INTEGER NUMBER 9(5) IN 300)
    305*  GEOMETRIC MEAN (DECIMAL NUMBER  9(5).99 IN 300)
    306*  STD DEVIATION (NON-KEY  DECIMAL NUMBER 999.99 IN 300)
    307*  70TH PERCENTILE (NON-KEY  DECIMAL NUMBER 9(5).99 IN 300)
    308*  90TH PERCENTILE (NON-KEY  DECIMAL NUMBER 9(5).99 IN 300)
    309*  99TH PERCENTILE (NON-KEY  DECIMAL NUMBER 9(5).99 IN 300)
    310*  HIGHEST VALUE (DECIMAL NUMBER 9(5).99 IN 300)
    311*  2ND HIGHEST VAL (DECIMAL NUMBER 9(5).99 IN 300)
    312*  LOWEST VALUE (NON-KEY  DECIMAL  NUMBER 9(5).99 IN 300)
400*  AIR QUALITY (RG)
  401*  AQ POLLUTANT (INTEGER NUMBER 9(5) IN 400)
  402*  MEAN POP SITES (DECIMAL NUMBER 9(5).99 IN 400)
  403*  MEAN SRCE SITES (DECIMAL NUMBER 9(5).99 IN 400)
  404*  MEAN BKGND SITES (DECIMAL NUMBER  9(5).99 IN 400)
  405*  MEAN TSP PREDICT (DECIMAL NUMBER  9(5).99 IN 400)
600*  PERCENT IN-OUT MIGRATION (RG)
  601*  AGE GROUP (INTEGER NUMBER 99 IN 600)
  602*  % MALE IN TOT (DECIMAL NUMBER 99.99 IN 600)
  603*  % MALE IN ALC (NON-KEY  DECIMAL NUMBER 99.99 IN 600)
  604*  % MALE OUT TOT (DECIMAL NUMBER 99.99 IN 600)
  605*  % MALE OUT ALC (NON-KEY  DECIMAL  NUMBER 99.99 IN 600)
  606*  % FEM IN TOT (DECIMAL NUMBER 99.99 IN 600)
  607*  % FEM IN ALC (NON-KEY  DECIMAL NUMBER 99.99 IN 600)
  608*  % FEM OUT TOT (DECIMAL NUMBER 99.99 IN 600)
  609*  % FEM OUT ALC (NON-KEY  DECIMAL NUMBER 99.99 IN 600)
  610*  % WMALE IN TOT (DECIMAL NUMBER 99.99 IN 600)
  611*  % WMALE IN ALC (NON-KEY  DECIMAL  NUMBER 99.99 IN 600)
  612*  % WMALE OUT TOT (DECIMAL NUMBER 99.99 IN 600)
  613*  % WMALE OUT ALC (NON-KEY  DECIMAL NUMBER 99.99 IN 600)
  614*  % WFEM IN TOT (DECIMAL NUMBER 99.99 IN 600)
  615*  % WFEM IN ALC (NON-KEY  DECIMAL NUMBER 99.99 IN 600)
  616*  % WFEM OUT TOT (DECIMAL NUMBER 99.99 IN 600)
  617*  % WFEM OUT ALC (NON-KEY  DECIMAL  NUMBER 99.99 IN 600)

                                                     (continued)

-------
                         TABLE 1 (continued)
  618*  % NWMALE IN TOT (DECIMAL NUMBER 99.99  IN  600)
  619*  % NWMALE IN ALC (NON-KEY  DECIMAL NUMBER  99.99  IN  600)
  620*  % NWMALE OUT TOT (DECIMAL NUMBER 99.99  IN  600)
  621*  % NWMALE OUT ALC (NON-KEY  DECIMAL NUMBER  99.99  IN 600)
  622*  % NWFEM IN TOT (DECIMAL NUMBER 99.99  IN 600)
  623*  % NWFEM IN ALC (NON-KEY  DECIMAL NUMBER 99.99  IN  600)
  624*  % NWFEM OUT TOT (DECIMAL NUMBER 99.99  IN  600)
  625*  % NWFEM OUT ALC (NON-KEY  DECIMAL NUMBER  99.99  IN  600)
700*  AGE-ADJUSTED DEATH RATES (RG)
  701*  CAUSE OF DEATH (INTEGER NUMBER 99 IN 700)
  702*  ADJ TOTAL D-RATE (DECIMAL NUMBER 9999.99  IN 700)
  703*  ADJ WMALE D-RATE (DECIMAL NUMBER 9999.99  IN 700)
  704*  ADJ WFEM D-RATE (DECIMAL NUMBER 9999.99 IN 700)
  705*  ADJ NWMAL D-RATE (DECIMAL NUMBER 9999.99  IN 700)
  706*  ADJ NWFEM D-RATE (DECIMAL NUMBER 9999.99  IN 700)
800*  AGE-SPECIFIC POPULATION (RG)
  801*  AGE GROUPING (INTEGER NUMBER 99 IN 800)
  802*  TOTAL POP (NON-KEY   INTEGER NUMBER 9(8) IN 800)
  803*  FEMALE (NON-KEY  INTEGER NUMBER 9(8)  IN 800)
  804*  WHITE (NON-KEY  INTEGER NUMBER 9(8)  IN 800)
  805*  WHITE FEMALE (NON-KEY  INTEGER NUMBER 9(8) IN 800)
  806*  NONWHITE MALE  (NON-KEY  INTEGER NUMBER 9(8) IN 800)
  807*  TOTAL URBAN POP (NON-KEY  INTEGER NUMBER 9(8) IN 800)
  808*  URBAN FEMALE (NON-KEY  INTEGER NUMBER 9(8) IN 800)
  809*  URBAN WHITE (NON-KEY  INTEGER NUMBER  9(8)   IN  800)
  810*  URB WHITE  FEMALE  (NON-KEY  INTEGER NUMBER 9(8) IN  800)
 User-defined functions:


 1001*  AREA SOURCES  (DECIMAL FUNCTION ((C102-C103)))
 1002*  MALE (INTEGER FUNCTION ((C802-C803)))
 1003*  NONWHITE (INTEGER FUNCTION ((C802-C804)))
 1004*  WHITE MALE  (INTEGER FUNCTION  ((C804-C805)))
 1005*  NONWHITE FEMALE (INTEGER  FUNCTION ((C803-C805)))
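
      The user-defined functions that close Table 1 are simple differences
of stored components; AREA SOURCES, for example, is COUNTY TOTAL (C102)
less POINT SOURCES (C103).  The Python sketch below is offered only as an
illustration of that arithmetic.  It is not SYSTEM 2000 syntax, and the
function names and all numeric values are hypothetical.

```python
def area_sources(county_total, point_sources):
    """C1001 AREA SOURCES = C102 COUNTY TOTAL - C103 POINT SOURCES."""
    return county_total - point_sources


def derived_population(group):
    """Apply functions C1002 to C1005 to one AGE-SPECIFIC POPULATION data set.

    `group` maps component labels (C802, C803, C804, C805) to counts.
    """
    return {
        "MALE": group["C802"] - group["C803"],             # total - female
        "NONWHITE": group["C802"] - group["C804"],         # total - white
        "WHITE MALE": group["C804"] - group["C805"],       # white - white female
        "NONWHITE FEMALE": group["C803"] - group["C805"],  # female - white female
    }


# Hypothetical figures, for illustration only.
emissions = area_sources(1500.0, 1200.25)
people = derived_population(
    {"C802": 1000, "C803": 520, "C804": 800, "C805": 410}
)
```

Because the derived items are computed from stored components at retrieval
time, they presumably need not be loaded or maintained separately.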

-------
[Figure 1.  POPATRISK hierarchical structure (diagram not reproduced).]

-------
descendant of C200 SAROAD MONITORING SITES and thus, entry level data 1*
through 41*, repeating group C200, and repeating group C300 are related and
belong to the same family tree.  Logically, this means that for a county C2
there are a number of data sets in C200 containing information on each
monitoring site for the county, and for each monitoring site data set, there
are a number of data sets in C300 containing information on each pollutant
monitored by each site.
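
     As an illustration only, the family tree just described can be
mimicked outside SYSTEM 2000 with nested records.  The Python sketch below
is an analogy under stated assumptions, not the SYSTEM 2000 storage format:
the class names, the selection of fields, and every value shown are
hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MonitoringData:          # stands in for repeating group C300 (Level 2)
    pollutant: int             # C301, pollutant code
    geometric_mean: float      # C305


@dataclass
class MonitoringSite:          # stands in for repeating group C200 (Level 1)
    utm_zone: str              # C207
    site_type: str             # C211
    monitoring: List[MonitoringData] = field(default_factory=list)


@dataclass
class CountyEntry:             # one logical entry (Level 0, Entry Level)
    state_id: str              # C1
    county_id: str             # C2
    total_population: int      # C10
    sites: List[MonitoringSite] = field(default_factory=list)


def pollutant_records(entry):
    """Walk one branch of the tree: county -> site -> pollutant record."""
    return [
        (entry.county_id, site.site_type, rec.pollutant, rec.geometric_mean)
        for site in entry.sites
        for rec in site.monitoring
    ]


# Hypothetical values, for illustration only.
county = CountyEntry("34", "1160", 132681, sites=[
    MonitoringSite("17", "01", [MonitoringData(11101, 62.5),
                                MonitoringData(42401, 14.0)]),
    MonitoringSite("17", "02", [MonitoringData(11101, 48.0)]),
])
rows = pollutant_records(county)
```

Each row returned carries the county, the site, and the pollutant record
together, which is the sense in which the three levels belong to one family.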

COMPONENTS AND SOURCES OF DATA

     Table 2 is a quick-reference description of each component included
in POPATRISK.  Each component number is listed in numerical data base
definition order along with the component name, description, units, picture
in COBOL notation, and original source of the data item.  Certain components
in the database definition (Table 1)  are described as 'NON-KEY.'   All
unlabelled components are 'KEY' except those labelled RG (repeating group).
Key components in Table 2 are flagged with a + symbol to the left of the
component number.   It is anticipated that these KEY items will be frequently
needed as selection criteria for other data in the data base.   A component
must be KEY to be used in a retrieval WHERE clause.  For example, for KEY
component C305, one could easily obtain the number of stations reporting a
yearly geometric mean greater than a specified value for any pollutant, for
any set of counties.   Because of the significant cost of loading and main-
taining KEY'ed components, the KEY option is used only for those components
considered useful as selection criteria.
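
     The trade-off described above can be pictured with a small sketch.
The Python code below is a loose analogy, assuming that a KEY component
behaves like a sorted index which is built once and then searched; it does
not describe how SYSTEM 2000 actually implements keys, and the records,
codes, and function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical (county, site, pollutant code, geometric mean) records.
records = [
    ("0001", "A", 1, 55.0),
    ("0001", "B", 1, 82.5),
    ("0002", "A", 1, 91.0),
    ("0002", "A", 2, 12.0),
    ("0003", "A", 1, 40.0),
]


def build_key_index(rows, column):
    """Sorted (value, row position) pairs: a stand-in for a KEY index."""
    return sorted((row[column], pos) for pos, row in enumerate(rows))


def rows_greater_than(index, threshold):
    """Row positions whose indexed value exceeds the threshold."""
    start = bisect_right(index, (threshold, float("inf")))
    return [pos for _, pos in index[start:]]


# Built once; this is the loading and maintenance cost of a KEY component.
mean_index = build_key_index(records, 3)


def count_stations(pollutant, threshold, counties):
    """Count stations in `counties` whose mean for `pollutant` exceeds `threshold`."""
    hits = rows_greater_than(mean_index, threshold)
    return sum(
        1 for pos in hits
        if records[pos][2] == pollutant and records[pos][0] in counties
    )


n = count_stations(1, 50.0, {"0001", "0002"})
```

The index narrows the search to the qualifying rows before the remaining
conditions are checked, at the price of building and storing the index.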

     State and county codes (components C1 through C3) conform to the SAROAD
county assignment protocol except for isolated cases which are discussed
later in this section and also in the 'POPATRISK Geocoding Manual.'  General
county description data (components C10 through C41) were extracted from
a master file created by SSI.   Population figures are based on the 1970
Census and, with the exception of TOTAL POPULATION, were extracted from a
Bureau of the Census 1972 County and City Data Book computer file containing
data obtained from representative samples of population rather than a complete
count.   To maintain consistency as much as possible and to prevent possible
confusion, TOTAL POPULATION C10 figures were changed to reflect the same complete
count as is used in the age-specific population repeating group.  Income
ratios are based on 1969 data and were also taken from the 1972 County and
City Data Book file.

[TABLE 2.  Quick-reference description of POPATRISK components (component
number, component name, description, data type, and source).  The multi-page
table is illegible in this copy.]

      Figures for alcohol sales and gasoline sales were taken from the 1970
 Census of Retail Trade.  Data for alcohol sales are available only when 500
 or more retail establishments exist in the county.  Therefore, data are
 available for roughly 60 to 70 percent of the counties.  The problem is
 further complicated by disclosure limitations in some of the remaining
 counties.  Using the data that do exist, SSI developed an index for providing
 a rough indication of alcohol sales per person 18 years of age and over for
 unlisted counties.

      Climate data were originally obtained from the Environmental Data
 Service of the National Oceanic and Atmospheric Administration.   The data
 are in terms of means  based  on the years  1941-70.   Where  there is  more than
 one weather station, data are recorded  for the  station closest to  the
 county's population center.   The same  procedure was used  to  record the
 county's elevation  and latitude.   Since not all counties  have a primary
 weather station,  National Weather Service climate regions were used to extra-
 polate data from other similar climatic areas.

      County migration totals (components C42 through C49) are totals for
 in- and out-migration summed across the age groups in the migration repeating
 group.  Detailed information on the in-out migration data is provided
 later in this section.

      The NEDS  EMISSIONS  repeating group data (components  C101 through  C103)
 were extracted from data present  in  the NADB NEDS-USER file  in 1976  for the
 five criteria pollutants:  Total  Suspended Particulates(11101), Carbon
 Monoxide(42101),  Sulfur  Dioxide(42401), Nitrogen  Dioxide(42602),  and
 Hydrocarbon(43101).   The  numbers  in parentheses  refer  to  standard  NADB
 pollutant  codes.  The  repeating group  consists  of data sets  containing
 total  county  emissions  and total point  source emissions for  each of  the
 pollutants.  A user-defined  function is provided  for calculation of  area
 source  emissions  by subtracting  the point source emissions from total
emissions.
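
The area source calculation amounts to a single subtraction per pollutant. A minimal sketch follows; the function name and the emission figures are hypothetical, and only the pollutant codes come from the text above.

```python
def area_source_emissions(total_tons, point_source_tons):
    """Area source emissions = total county emissions - point source emissions."""
    return total_tons - point_source_tons


# (total, point source) in tons/year, keyed by NADB pollutant code.
# The figures are invented for illustration.
county_emissions = {
    "11101": (1200.0, 450.0),   # Total Suspended Particulates
    "42401": (300.0, 300.0),    # Sulfur Dioxide
}

area_by_pollutant = {
    code: area_source_emissions(total, point)
    for code, (total, point) in county_emissions.items()
}
```

Deriving the area figure on demand avoids storing a third value for every pollutant.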

-------
     The SAROAD MONITORING SITES data (components C201 through C312) were
selected from 1974 data in three SAROAD files: the Yearly Summary File
(NADB-YRSUM-D), the Frequency File (NADB-YRFRQ-D), and the Site Description
File (NADB-STE-INVO).  The SAROAD MONITORING SITES repeating group contains
data sets describing each monitoring site, its location, type, and purpose.
The MONITORING DATA repeating group consists of data sets containing SAROAD
monitoring data as descendants of the reporting site descriptions.

     Monitoring data are provided for the following pollutants:  TSP(11101),
Nitrate(12306), Sulfate(12403), Carbon Monoxide(42101), Total Sulfur(42269),
Sulfur Dioxide(42401), Nitrogen Dioxide(42602), and Ozone(44201).  The
numbers in parentheses refer to standard NADB pollutant codes.  Sampling
intervals provided in the data base are identical to standard NADB codes,
except that codes X, Y, Z, and C were expunged from the data base because
the data were of questionable validity.

     As requested by the EPA Project Officer, the yearly means in the data base
are geometric means.  Although some present National Air Quality Standards
set a limit on annual arithmetic means, these means may be estimated
from a knowledge of the geometric means and geometric standard deviations.
A list and description of codes used in the data base to describe monitoring
sites and data, e.g., site type, sampling method, and sampling interval, may
be found in the appropriate sources as referenced by Table 2.
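
One common way to make this estimate, assuming the measurements are approximately lognormally distributed, is AM = GM x exp(0.5 x (ln GSD)^2). The sketch below applies that relation; the lognormal assumption and the function name are introduced here for illustration and are not stated in this report.

```python
import math


def arithmetic_mean_from_geometric(geo_mean, geo_std_dev):
    """Estimate the arithmetic mean from the geometric mean (GM) and the
    geometric standard deviation (GSD), assuming a lognormal distribution:
    AM = GM * exp(0.5 * ln(GSD)**2)."""
    if geo_mean <= 0 or geo_std_dev < 1:
        raise ValueError("GM must be positive and GSD must be at least 1")
    return geo_mean * math.exp(0.5 * math.log(geo_std_dev) ** 2)


# Illustrative values only.
estimate = arithmetic_mean_from_geometric(60.0, 1.5)
```

Under this relation the estimated arithmetic mean is never below the geometric mean, and the two coincide when the GSD equals 1.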

     The AIR QUALITY DATA repeating group data (components C401 through C405)
represent calculations of air quality data for each pollutant monitored in
the county.  Data are in terms of the arithmetic mean of geometric means for
each monitored pollutant for each of three monitoring purposes: (1) population
monitoring, (2) source monitoring, and (3) background monitoring, where the
sampling interval code is 1 (1 hour) or 7 (24 hour).  An additional component
provides a calculated predicted mean for TSP where TSP and NOx emissions
(tons/year) fall within the following limits:¹

                1414.375 < TSP < 230031.803
                           NOx < 432778.195

¹ These limits correspond to the limits of the variable set used in the
  regression analysis discussed in Appendix C.
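
The two calculations described above can be sketched as follows. The numeric limits and the sampling interval codes are taken from the text; the record layout, field names, and function names are hypothetical, and the regression itself (Appendix C) is not reproduced.

```python
from statistics import mean

# Emission limits (tons/year) quoted in the text.
TSP_MIN, TSP_MAX = 1414.375, 230031.803
NOX_MAX = 432778.195

# Sampling interval codes used in the calculation: 1 (1 hour) and 7 (24 hour).
ELIGIBLE_INTERVALS = {"1", "7"}


def mean_of_geometric_means(site_records, purpose):
    """Arithmetic mean of yearly geometric means for one monitoring purpose,
    using only records with an eligible sampling interval code.
    Returns None when no record qualifies."""
    values = [r["geo_mean"] for r in site_records
              if r["purpose"] == purpose and r["interval"] in ELIGIBLE_INTERVALS]
    return mean(values) if values else None


def within_regression_limits(tsp_tons, nox_tons):
    """True when county emissions fall inside the quoted limits."""
    return TSP_MIN < tsp_tons < TSP_MAX and nox_tons < NOX_MAX


# Illustrative records for one pollutant in one county.
site_records = [
    {"purpose": "population", "interval": "7", "geo_mean": 60.0},
    {"purpose": "population", "interval": "7", "geo_mean": 80.0},
    {"purpose": "population", "interval": "3", "geo_mean": 200.0},  # excluded
    {"purpose": "source", "interval": "1", "geo_mean": 95.0},
]
```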

-------
     The PERCENT  IN-OUT MIGRATION  repeating  group  data  (components C601 through
 C625)  represent   calculated migration  rates  for  the  county  over  the span of
 years  1965-70.  Documentation on the in-out  migration file, as received from
 the Bureau of the Census, is provided  in Appendix B.  It provides  a
 discussion of the file content, allocation procedures, and  comparison with
 published data.  Briefly, the data in  the file are based on question 19
 of the 1970  Census 15 percent sample household questionnaire.  It
 requested those persons who reported living in a different house in  1965  to
 report the state, county, and city for their residence in 1965.  An  alloca-
 tion procedure was developed by the Bureau of the Census to provide  a
 residence assignment procedure for those persons indicating they lived in a
 different house but did not report an address.  Data in the file are in
 terms of both 'unallocated' and 'allocated' figures.  It was decided that
 for use in the data base, the data would be best represented in terms of
 Total (allocated + unallocated) and Allocated.  The race categories on the
 data file are white, black, and total all races.  To maintain consistency
 throughout the data base, the race categories used in the data base are
 white, non-white, and total all races.  The 7 age groups used in the
 migration file roughly correspond to aggregations of the 15 age groups for
 age-specific population data to be discussed later.  These seven age groups
 are used in the data base.

     The in-out migration repeating group in the data base consists of seven
data sets,  one data set for each of the age groups in the migration file.
For each data set there are figures for each of the race-sex groups in terms
of Total and Allocated.   Since the Total figures  are considered to be the
most useful,  those components are KEY'ed.
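
The relationship between the file's figures and the Total and Allocated figures kept in the data base can be sketched as below. The dictionary layout and the counts are hypothetical; the sketch assumes simple person counts per age group and race-sex group.

```python
def total_and_allocated(allocated, unallocated):
    """Return the (Total, Allocated) pair kept in the data base, where
    Total = allocated + unallocated."""
    return allocated + unallocated, allocated


def sum_across_age_groups(age_group_totals):
    """County-level total obtained by summing one race-sex group's Total
    figure across the age groups."""
    return sum(age_group_totals)


# Illustrative in-migration counts for one race-sex group, seven age groups,
# given as (allocated, unallocated).  Values are invented for the sketch.
file_counts = [(10, 90), (5, 45), (20, 180), (8, 72), (4, 36), (2, 18), (1, 9)]

totals = [total_and_allocated(a, u)[0] for a, u in file_counts]
county_in_migration = sum_across_age_groups(totals)
```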

     The AGE-ADJUSTED DEATH RATES repeating group (components C701 through
 C706) contains direct-method age-adjusted mortality rates for the county over
a three-year period from 1969 to 1971.   Three tapes of raw mortality data,
one tape for each year,  were obtained from the National Center For Health
Statistics.   Each tape contains about two million records,  one record per
death,  and includes all deaths for the year.   The cause of death  contained
in each record is coded using a four-digit code of the Eighth Revision of the
International Classification of Diseases.  Considerations of constraints

-------
imposed by data base size limitations and obtaining significant numbers of
deaths over a three-year period required some aggregation of mortality
categories at the three-digit level.  With the approval of the EPA Project
Officer, a decision was made to use aggregations of mortality categories
corresponding to those already in use at HERL.

     Two supplemental malignant neoplasm cluster groups were gleaned from
the publication 'A Sequential Space-Time Cluster Analysis of Cancer Mortality
in the United States:  Etiologic Implications,' Fred Burbank, Am. J. of
Epid. 95, 393 (1972).  Further discussion of the selection methodology for
these clusters is provided in Section 5.

     Table 3 describes  the 50 aggregations  of  mortality categories used  in
the POPATRISK data base.   Table 4 gives  a detailed  description  of  the two
cluster groups.   The POPATRISK code associated with each cause  grouping  in
Table 3 is the code by  which the grouping is referenced in the  data base.
The repeating group consists of one data set per cause grouping, containing
county age-adjusted rates  for the total  population  and  four race-sex groups.
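
For readers unfamiliar with the direct method, the sketch below shows the general calculation: each age-specific death rate is weighted by that age group's share of a standard population. The choice of standard population, the person-year denominators, and the per-100,000 scaling are assumptions of this sketch; this section does not specify them.

```python
def direct_age_adjusted_rate(deaths, person_years, standard_pop, per=100_000):
    """Direct-method age-adjusted rate.

    deaths[i]       -- deaths in age group i over the period
    person_years[i] -- population at risk in age group i over the period
    standard_pop[i] -- standard population count in age group i
    """
    if not (len(deaths) == len(person_years) == len(standard_pop)):
        raise ValueError("all inputs must cover the same age groups")
    std_total = sum(standard_pop)
    adjusted = sum(
        (d / p) * (s / std_total)          # age-specific rate x standard weight
        for d, p, s in zip(deaths, person_years, standard_pop)
    )
    return adjusted * per


# Illustrative two-age-group example; values are invented.
rate = direct_age_adjusted_rate(
    deaths=[10, 40],
    person_years=[10_000, 20_000],
    standard_pop=[600, 400],
)
```

In this example the age-specific rates are 0.001 and 0.002 with weights 0.6 and 0.4, giving about 140 deaths per 100,000.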

     The AGE-SPECIFIC POPULATION repeating group (components C801 through
C810) contains age-specific population data extracted from 1970 Census tapes.
Population counts on the Census tapes are based on a 100% count and were
provided for rural and urban areas, broken down by race-sex group and age.  For
loading into the data base, the data were aggregated into 15 age categories,
one data set per category, and expressed in terms of Total and Urban race-
sex groups.  A 16th data set is provided as totals across the 15 age groups.
Race-sex group data not represented in the repeating group are provided
as user-defined functions.
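
The 16th data set and a derived quantity of the kind a user-defined function might supply can be sketched as follows. The field names are hypothetical, and the rural figure (Total minus Urban) is an assumed example of a derived value, not a function documented in this section.

```python
def totals_across_age_groups(age_sets):
    """Build the 16th data set: field-by-field totals over the age data sets."""
    fields = age_sets[0].keys()
    return {f: sum(s[f] for s in age_sets) for f in fields}


def rural_count(total, urban):
    """Assumed derived value: rural population = total - urban."""
    return total - urban


# Three illustrative age data sets (the data base holds 15); values invented.
age_sets = [
    {"total_white_male": 100, "urban_white_male": 60},
    {"total_white_male": 200, "urban_white_male": 150},
    {"total_white_male": 50, "urban_white_male": 10},
]

all_ages = totals_across_age_groups(age_sets)
rural_white_male = rural_count(all_ages["total_white_male"],
                               all_ages["urban_white_male"])
```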

     The '1000'  series  components are user-defined functions which may be
changed or augmented at any time without affecting the integrity of the
data base.  They are mathematical expressions intended to provide useful
information needed on a less frequent basis while conserving storage space
in the data base.

-------
[TABLE 3.  Aggregations of mortality categories used in the POPATRISK data
base (POPATRISK code and cause-of-death grouping).  The table is illegible
in this copy.]
d *o
CU
4-1
U-l i-l
0 2
cu
CO U-l
CO O
CU
CO TO
T> rl
O
CJ UH
•H
d rl
o cu
& W
CJ O


cu cu
co co
CO to
CU CO
co co
•H iH
t3 T3

4J CU
SfH
10
A d
0)
CO 4-1
•H CU
(0 CX
C &*i
cu ,c
4J
cu cu
P rd
>-, 4J
w o
4-1
CO
CO
?s
rl
o
4-1
CO rH
•rl 3
co cj
O rl
M -H
CO CJ
r-l
CJ U-l
CO O
O
iH CO
* CO
cu co
tJ $
Cfl CO
•rl
i-H T3
Cfl
CO CO
d rS
CU 4-1
o o
%
cu
rl
rl
0)
O
"d
CO
CO
4-1
rl
CX
CU
TJ
cu
•H
U-l
•H
O
CU
CX
CQ
§

X) -H
d CO
Cfl o
^
O CO
•H rH
d CJ
| W
CJ
                                                                        21

-------
[Rotated table page (continuation): the text is not recoverable from the
scan.  The page appears to list disease categories; legible fragments include
"...enteritis, and colitis", "...ecystitis, and cholangitis", "...ication of
pregnancy", "...rly infancy" and "...and ill-defined conditions".]

-------
                    TABLE 4.  CLUSTER DESCRIPTIONS

Cluster 1:          ICD            Rates Male *            Rates Female *
Breast              174                0.24                    20.89
Bladder             188                4.89                     1.90
Kidney              187                3.25                     1.81
Rectum              154                4.71                     3.44
Stomach             151               12.93                     6.64
Large Intestine     153               12.29                    13.67
Small Intestine     152                  ?                        ?


Cluster 2:
Tongue              141                  ?                        ?
Nasopharynx         147                0.33                     0.10
Trachea, Bronchus   162               29.59                     5.07
and Lung            163
Larynx              161                2.0                      0.20

*  Mortality rates cited are 50th percentile values taken from Atlas
   of Cancer Mortality for U.S. Counties:  1950-1969, U.S. Department
   of Health, Education and Welfare.  DHEW Publication No. (NIH) 75-780.

-------
VARIATIONS IN GEOCODING SCHEMES AMONG SOURCES

     The compilation of several different types of data from a variety of
independent sources to create POPATRISK resulted in several discrepancies
in county code assignment schemes.  Resolution of these discrepancies
accounted for a significant portion of the data processing effort.  A brief
description of the geocoding problems and their impact on the POPATRISK
coding scheme is given here; a detailed discussion of the resolution
methodology is provided in Section 5.  Full documentation of the POPATRISK
geocoding scheme, including a cross-reference table to the various other
county codes, is provided under separate cover in the POPATRISK Geocoding
Manual.

     The state of Massachusetts presented a special problem when using SAROAD
county assignments in the POPATRISK data base because those assignments do
not match those of any of the other data sources.  SAROAD recognizes 13
counties, NEDS recognizes 6, and Census recognizes 14.  With few exceptions,
Census county assignments adhere to the Federal Information Processing
Standards (FIPS) coding scheme.  Since Census uses a finer geographical
breakdown than NEDS and SAROAD and is the most widely used, it was decided
to adopt the Census protocol for Massachusetts.  All data not conforming to
the Census protocol, namely the NEDS and SAROAD data, required development
of a reallocation procedure to reassign data to the proper counties.  The
Massachusetts county codes used in the data base were arbitrarily chosen to
be the Census county codes with a leading zero added to fulfill the
four-digit code requirement.
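
The leading-zero convention described above can be illustrated with a short
sketch.  It assumes that the Census (FIPS) county code is the usual
three-digit number; the function name is hypothetical and is not part of
POPATRISK or of any of the source systems.

```python
def popatrisk_ma_county_code(census_county_code: int) -> str:
    """Pad a three-digit Census county code to the four-digit POPATRISK form.

    Assumption: Census (FIPS) county codes lie in the range 1-999.
    """
    if not 0 < census_county_code < 1000:
        raise ValueError("expected a three-digit Census county code")
    # Zero-fill on the left to a fixed width of four characters.
    return f"{census_county_code:04d}"


print(popatrisk_ma_county_code(1))   # 0001
print(popatrisk_ma_county_code(27))  # 0027
```

Returning a string rather than an integer keeps the leading zero visible
when the code is printed or compared with other four-digit codes.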

     Throughout development of the data base, Virginia continued to be a
source of problems because data for Virginia had to be molded to conform to
a county based coding scheme.  The independent cities were not compatible
with the county based data base, so data for each city had to be reassigned
to its original county and incorporated into the data for that county entry.
It was necessary to "create" two new counties in Virginia for those
independent cities for which no county now exists, namely the areas formerly
occupied by Warwick County and Elizabeth City, and by Norfolk and Princess
Anne counties.  These two new counties were arbitrarily assigned county
codes of 0972 and 2144, respectively, to maintain consistency in the
numerical and alphabetical order of counties.

     Another problem with independent cities occurred for Baltimore,
Maryland and St. Louis, Missouri.  Both FIPS and SAROAD list counties
alphabetically, but while SAROAD lists Baltimore before Baltimore County,
FIPS lists them in the opposite order.  A similar problem existed with
St. Louis, Missouri.  Yellowstone, Montana also appeared as an independent
city, and its data were reassigned to Park County, Montana.

     The five New York City boroughs presented a problem in that some data
sources provided data only for New York City as a whole with no further
breakdown into boroughs as needed for POPATRISK.  Climate and geographical
data were two such examples.  In these cases, the data provided for New York
City were duplicated for each of the five boroughs.   NCHS mortality data were
also available only for New York City as a  whole.  In this case, a decision
was made to calculate age-adjusted death rates using total New York City data
along with the total population figures for all five boroughs, yielding rates
for New York City as a whole.  These rates  were then duplicated for each of
the five New York City boroughs.
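
The New York City procedure can be sketched as follows.  The sketch assumes
direct age standardization, which this section does not specify, and all
argument names and figures are hypothetical; only the pooling of the five
borough populations and the duplication of the citywide rate are taken from
the text above.

```python
# The five borough counties that make up New York City.
BOROUGH_COUNTIES = ["Bronx", "Kings", "New York", "Queens", "Richmond"]


def citywide_adjusted_rate(city_deaths, borough_pops, std_weights):
    """Age-adjusted rate from citywide deaths and pooled borough populations.

    city_deaths  -- deaths per age group for the city as a whole
    borough_pops -- dict: borough -> population per age group
    std_weights  -- assumed standard-population shares (summing to 1)
    """
    n_groups = len(city_deaths)
    # Pool the five borough populations within each age group.
    city_pop = [sum(borough_pops[b][i] for b in BOROUGH_COUNTIES)
                for i in range(n_groups)]
    # Weight each age-specific rate by the standard population share.
    return sum(w * d / p for w, d, p in zip(std_weights, city_deaths, city_pop))


def duplicate_for_boroughs(rate):
    """Assign the same citywide rate to every borough entry."""
    return {b: rate for b in BOROUGH_COUNTIES}


# Hypothetical two-age-group example.
pops = {b: [1000, 400] for b in BOROUGH_COUNTIES}
rate = citywide_adjusted_rate([10, 40], pops, [0.5, 0.5])
print(duplicate_for_boroughs(rate))
```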

-------
                                   SECTION 3
                         USE OF DATA BASE "POPATRISK"

     The POPATRISK data base resides on a private removable disk pack on
the Univac 1110 at Research Triangle Park.  It is accessible in either
demand or batch processing mode and can be used by non-programmers with a
basic knowledge of SYSTEM 2000 Natural Language.  As part of SYSTEM 2000's
security system, a user is required to give a password for access to the
data base.  Multiple passwords may be available to the user, but these
passwords should have only retrieval authorities.  Passwords and their
authorities can be assigned only by the Data Base Administrator using
his/her master password.  Normally the Data Base Administrator maintains
exclusive update authority to the data base.

     Once in possession of a valid password, a user may access the data base
using the following commands:

               @XQT  N*NS2K.280
               USER, 'Password':
               DATA BASE NAME IS POPATRISK:
                  S2K Natural Language Commands
               EXIT:

     As previously mentioned, it is not within the scope of this report to
provide instruction in the use of SYSTEM 2000 Natural Language.  However,
there are some points to discuss concerning various retrieval and maintenance
techniques tailored specifically to the POPATRISK data base.

     Since the data base is county based, the most convenient and least
expensive method of qualifying data is to use C3 ST-COUNTY ID in a WHERE
clause to select data sets for a particular county or span of counties,
e.g., WHERE C3 EQ 343000.  Component C3 is the only component in the data
base that is unique for each logical entry.
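
A small helper can build the C3 qualifier.  The composition of the
six-digit value (a two-digit state code followed by the four-digit county
code) is an inference from examples in this report, such as C3 EQ 341160
appearing together with AREA CODE 1160 in Table 6; it is not stated as a
rule here.  The function names are hypothetical.

```python
def st_county_id(state_code: int, county_code: int) -> int:
    """Combine a two-digit state code and a four-digit county code.

    Assumption: C3 is laid out as SSCCCC, as the examples in the
    report suggest.
    """
    if not (0 <= state_code <= 99 and 0 <= county_code <= 9999):
        raise ValueError("state code must fit 2 digits, county code 4 digits")
    return state_code * 10000 + county_code


def where_c3(st_county: int) -> str:
    """Return the WHERE clause text that qualifies a single county."""
    return f"WHERE C3 EQ {st_county:06d}"


print(where_c3(st_county_id(34, 1160)))  # WHERE C3 EQ 341160
```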

     The  first component of each repeating group contains data which act
 as identifiers for each data set in the repeating group and are KEY'ed
 for use as selection criteria in WHERE clauses.  For example, C101 NEDS
 POLLUTANT contains the five-digit criteria pollutant code KEY'ed for easy
 access to emissions data when used in the qualification of data, e.g., WHERE
 C101 EQ 11101.  Each component name in the POPATRISK definition was
 chosen to be 16 characters or less to allow convenient default printout
 of the full component name in LIST Output.

     The POPATRISK data base is structured such that all repeating groups,
with the exception of SAROAD MONITORING SITES and MONITORING DATA, are
disjoint.  Disjoint data sets are logically unrelated, as shown in the data
base structure in Figure 1.  For retrieval purposes, it is most desirable to
design a data base with repeating groups that are logically related whenever
possible.  However, each set of POPATRISK data, for the most part, came from
a source for which there was no basis for combination with other data types.
For example, in-out migration age groups do not correspond with age-specific
age groups, and NEDS emissions pollutant codes do not necessarily correspond
with SAROAD monitoring pollutants.  Therefore, the data available for
retrieval by means of simple S2K Natural Language commands, e.g., PRINT and
LIST, are entry level data and data along any one branch of the tree
structure in Figure 1.

     Disjoint data sets may be retrieved using Natural Language in one of
two ways.  The user may use "BY clause" processing to normalize the retrieval
process to a level of data that is common to the families of all disjoint
data sets to be retrieved.  However, because of the nature of the POPATRISK
data base, this retrieval method has not produced useful results.  The other
method of disjoint data set retrieval is to make retrievals without WHERE
clauses.  Since this would qualify the entire data base, it is not
recommended.  Therefore, using available Natural Language capabilities,
separate retrievals should be made for sets of unrelated data.  If further
combining or processing of data is needed, each retrieval output may be
written to a report file for input to a special utility program.
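
A utility program of the kind mentioned above might combine two separate
retrieval outputs by county.  The sketch below is only illustrative: the
report file layout (one "ST-COUNTY-ID value" pair per line) and all sample
figures are assumptions, since the report does not describe the file
format.

```python
def parse_report(lines):
    """Parse 'ST-COUNTY-ID value' lines into a dict; skip anything else."""
    records = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            try:
                records[int(parts[0])] = float(parts[1])
            except ValueError:
                continue  # second field was not numeric
    return records


def combine(first, second):
    """Pair the values for counties present in both retrievals."""
    shared = sorted(first.keys() & second.keys())
    return [(county, first[county], second[county]) for county in shared]


# Hypothetical outputs from two separate retrievals.
emissions = parse_report(["341160 1200.5", "343000 310.0", "header line"])
monitoring = parse_report(["341160 55.29", "192140 80.1"])
print(combine(emissions, monitoring))  # [(341160, 1200.5, 55.29)]
```

Keying both outputs on the state-county identifier keeps the combination
step independent of the order in which the retrievals were listed.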

     Also available to the user is the Report Writer feature of SYSTEM 2000.
Although the feature requires learning a different set of retrieval techniques
and commands, it provides the capability for preparing several reports within
one run formatted specifically to the user's specifications.  The merits of
using Report Writer would be dependent entirely on its adaptability to future
retrieval requirements.

     Although it was not within the scope of this contract effort, an alterna-
tive solution to the limitations on retrievals of disjoint data sets dis-
cussed above is to develop either special purpose or generalized Procedural
Language Interface programs to extract requested data from the data base.  PLI
offers the capability of establishing multiple positions within a data base
entry to gain full access to all data sets and to structure output in a format
most useful to  the user.  A PLI retrieval program would accept input parameter
cards giving the type of data requested,  specific items desired, selection
criteria, and possibly data to be used in any calculations.  Once the user's
retrieval needs are assessed and PLI retrieval capability implemented,
POPATRISK output could be provided in a timely and efficient manner and in
a format tailored to the user's needs.

     Another aspect of active data base use and maintenance that is not
presently contemplated is updating of data.  If, at some point, a decision
is made to update existing data in the data base, PLI load programs could be
developed to process new data and modify existing data and/or load new data
sets or repeating groups.  Any changes to the existing data base definition
would in most cases require a reload of the data base.

-------
                                  SECTION 4
                      SAMPLE RETRIEVALS USING POPATRISK

     In this section are presented a few sample retrievals intended primarily
to demonstrate simple investigative capabilities of POPATRISK.

     Table 5 illustrates one of the simplest types of retrievals available,
a listing of all entries in a county for specific data items.  In particular,
the NEDS emission data, apportioned to all Massachusetts FIPS counties, are
listed.  One special feature should be noted.  The S2K Natural Language
retrieval command which generated the listing of Table 5 calls for the item
"*C1001*", which is a user-defined function that generates the data for area
source emissions.

     The result of another simple command is presented in Table 6.  Here
data on SAROAD monitoring for Durham County, N. C. are requested.  There
are three monitoring sites in the county: two in downtown Durham and one
near a reservoir some distance to the north.  Although the two downtown
sites carry the same site code and are located at the same address, their
UTM coordinates erroneously indicate actual locations some nine kilometers
apart.  Unfortunately, coordinate errors of this type do occasionally occur
in the SAROAD data base.
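
The separation between the two downtown sites can be checked from the UTM
coordinates printed in Table 6.  Both sites are in UTM zone 17, so a plane
distance is a reasonable approximation; the function name is hypothetical.

```python
import math


def utm_distance_m(easting1, northing1, easting2, northing2):
    """Plane distance in meters between two points in the same UTM zone."""
    return math.hypot(easting2 - easting1, northing2 - northing1)


# Easting and northing of the two site code 1 entries listed in Table 6.
d = utm_distance_m(690551, 3975943, 689403, 3984931)
print(f"{d / 1000:.1f} km")  # 9.1 km
```

A check of this kind could be applied to any pair of sites that share an
address, as a simple screen for coordinate errors.
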

     Table 7 lists a sample of counties in descending order of county air
quality for Sulfur Dioxide, averaged over the sites in each county whose
purpose was to monitor population exposure.  The average for state-county
code 192140, Orleans County, Louisiana, is unreasonably high and, most
likely, the result of a keypunch error in the SAROAD data base.  The
potential user is cautioned that occasional errors do occur in SAROAD, so
unusual data should be investigated thoroughly before significant
conclusions are drawn.  The SO2 air quality does not appear to be closely
correlated with any of the demographic or socio-economic data items chosen
for display.

-------
[Table 5.  Listing of NEDS emission data apportioned to all Massachusetts
FIPS counties (two pages): the tabular output is not recoverable from the
scan.]

-------
TABLE 6.   ILLUSTRATIVE RETRIEVAL OF SAROAD SITE AND MONITORING DATA

                    FOR DURHAM COUNTY, NC

  PRINT/REPEAT SUPPRESS,NAME/C200  WHERE  C301  EQ 11101
  AND  C3 EQ 341160:
     AREA CODE* 1160
     SITE CODE*   1
     AGENCY* G
     PURPOSE*  1
     CITY* DURHAM
     CITY POPULATION*    95438
     UTM ZONE* 17
     EASTING* 690551
     NORTHING* 3975943
     ADDRESS* HEALTH DEPT 300 EAST MAIN ST
     TYPE* 13
     ELEV ABOVE GND*   8
     ELEV ABOVE SEA*  405

       POLLUTANT* 11101
       METHOD* 91
       INTERVAL* 7
       # OBSERVATIONS*     42
       GEOMETRIC MEAN*    55.29
       STD DEVIATION*    1.96
       70TH PERCENTILE*     86.00
       90TH PERCENTILE*    115.00
       99TH PERCENTILE*    123.00
       HIGHEST VALUE*   123.00
       2ND HIGHEST VAL*    122.00
       LOWEST VALUE*     6.00

       POLLUTANT* 42401
       METHOD* 91
       INTERVAL* 7
       # OBSERVATIONS*     45
       GEOMETRIC MEAN*     6.55
       STD DEVIATION*    1.50
       70TH PERCENTILE*      8.00
       90TH PERCENTILE*     14.60
       99TH PERCENTILE*     17.00
       HIGHEST VALUE*    17.00
       2ND HIGHEST VAL*     16.00
       LOWEST VALUE*     5.00

       POLLUTANT* 42602
       METHOD* 94
       INTERVAL* 7
       # OBSERVATIONS*     45
       GEOMETRIC MEAN*    21.44
       STD DEVIATION*    2.02
       70TH PERCENTILE*     40.00
       90TH PERCENTILE*     50.00
       99TH PERCENTILE*     71.00
       HIGHEST VALUE*    71.00
       2ND HIGHEST VAL*     69.00
       LOWEST VALUE*    10.00

                                                   (continued)

-------
       TABLE 6 (continued)
AREA CODE* 1160
SITE CODE*   1
AGENCY* P
PURPOSE*  1
CITY* DURHAM
CITY POPULATION*    95438
UTM ZONE* 17
EASTING* 689403
NORTHING* 3984931
ADDRESS* HEALTH DEPT 300 E  MAIN  ST
TYPE* 13
ELEV ABOVE GND*  50
ELEV ABOVE SEA*  405

  POLLUTANT* 11101
  METHOD* 91
  INTERVAL* 7
  # OBSERVATIONS*      6
  GEOMETRIC MEAN*    55.69
  STD DEVIATION*    1.21
  70TH PERCENTILE*     66.79
  90TH PERCENTILE*     69.49
  99TH PERCENTILE*     69.49
  HIGHEST VALUE*    69.49
  2ND HIGHEST VAL*     66.79
  LOWEST VALUE*     43.39

  POLLUTANT* 42401
  METHOD* 91
  INTERVAL* 7
  # OBSERVATIONS*      5
  GEOMETRIC MEAN*    15.35
  STD DEVIATION*    1.70
  70TH PERCENTILE*     22.19
  90TH PERCENTILE*     24.89
  99TH PERCENTILE*     24.89
  HIGHEST VALUE*    24.89
  2ND HIGHEST VAL*     22.19
  LOWEST VALUE*     7.09

  POLLUTANT* 42602
  METHOD* 94
  INTERVAL* 7
  # OBSERVATIONS*      6
  GEOMETRIC MEAN*    45.29
  STD DEVIATION*    1.10
  70TH PERCENTILE*     47.89
  90TH PERCENTILE*     49.79
  99TH PERCENTILE*     49.79
  HIGHEST VALUE*    49.79
  2ND HIGHEST VAL*     47.89
  LOWEST VALUE*     37.69

                                 (continued)

-------
      TABLE 6 (continued)


AREA CODE* 1160
SITE CODE*   2
AGENCY* G
PURPOSE*  1
CITY* DURHAM
CITY POPULATION*     95438
UTM ZONE* 17
EASTING* 694973
NORTHING* 4002678
ADDRESS* LAKE MICHIE
ELEV ABOVE SEA*  403

  POLLUTANT* 11101
  METHOD* 91
  INTERVAL* 7
  # OBSERVATIONS*     24
  GEOMETRIC MEAN*     33.14
  STD DEVIATION*    1.67
  70TH PERCENTILE*     38.00
  90TH PERCENTILE*     59.00
  99TH PERCENTILE*     85.00
  HIGHEST VALUE*     85.00
  2ND HIGHEST VAL*     59.00
  LOWEST VALUE*      6.00

  POLLUTANT* 42401
  METHOD* 91
  INTERVAL* 7
  # OBSERVATIONS*     25
  GEOMETRIC MEAN*      5.21
  STD DEVIATION*    1.19
  70TH PERCENTILE*      5.00
  90TH PERCENTILE*      5.00
  99TH PERCENTILE*     12.00
  HIGHEST VALUE*     12.00
  2ND HIGHEST VAL*      6.00
  LOWEST VALUE*      5.00

  POLLUTANT* 42602
  METHOD* 94
  INTERVAL* 7
  # OBSERVATIONS*     25
  GEOMETRIC MEAN*     13.06
  STD DEVIATION*    1.33
  70TH PERCENTILE*     16.00
  90TH PERCENTILE*     17.00
  99TH PERCENTILE*     28.00
  HIGHEST VALUE*     28.00
  2ND HIGHEST VAL*     21.00
  LOWEST VALUE*     10.00

-------
[Table 7.  Sample of counties in descending order of county Sulfur Dioxide
air quality, with selected demographic and socio-economic data items: the
tabular output is not recoverable from the scan.]

-------
     The data in Table 8 were retrieved in order to investigate a possible
relationship between county alcohol sales and the age-adjusted white male
death rate for deaths attributable to motor-vehicle accidents.  To place the
numbers listed for alcohol sales per capita (population over 18) in
perspective, a feature of SYSTEM 2000, namely the capability to produce
nationwide averages for data items, was used in a separate retrieval to
obtain the value of $64.40 for the nationwide average of alcohol sales per
capita for the population over 18.  Inspection of the data in Table 8
reveals that the counties reporting high death rates did not have unusually
high per capita alcohol sales, but that a disproportionate share of the high
death rate counties were rural counties.  Data on gas sales per capita for
state-county code 453330, Loving County, Texas, appear unreasonable and may
be due to confidentiality restrictions in the reporting of gas sales.

     In Table 9 the age-adjusted death rates for white males for bronchitis
are compared to those for influenza and pneumonia for counties having high
death rates from bronchitis.  A large proportion of the counties retrieved
have small total populations and generally high migration rates.  While
the data suggest that there may be some relationship between the incidence
of disease and population mobility, no conclusions should be drawn in this
regard, nor with regard to possible association between the two disease
categories, until more counties (especially urban counties) are included.
The data base provides the capability of pursuing implications of high
migration rates in counties with high (or low) death rates.  This might
be a logical next step since high migration rates for age groups not nor-
mally associated with the particular disease category under investigation
would not, in and of themselves, indicate a county was unsuitable for
drawing conclusions about possible associations.

     The final retrievals in Tables 10 and 11 compare the geographical
distribution of age-adjusted death rates for the white female population for
breast cancer and for a cluster which includes breast cancer as a significant
component.  Eleven of the first 37 counties occur in both lists, but the data
suggest a significantly different geographical distribution of the two
variables.  Similar retrievals of data using different variables could serve
to illustrate the suitability of the cluster technique for investigating air
quality-disease association.
                                     36

-------
[Table 8: retrieval listing not legible in this copy.]
                                                       37

-------
[Table 8 (continued): retrieval listing not legible in this copy.]
                                                           38

-------
[Table 8 (continued): retrieval listing not legible in this copy.]
                                                          39

-------

[Table 9: retrieval listing not legible in this copy.]
                      40

-------
[Table 9 (continued): retrieval listing not legible in this copy.]
                                                                               41

-------
[Table 10: retrieval listing not legible in this copy.]
                                                                                     42

-------
[Table 11: retrieval listing not legible in this copy.]
43

-------
                                 SECTION 5
                      DATA BASE DEVELOPMENT PROCEDURES

CONSIDERATIONS LEADING TO CHOICE OF DATA ITEMS

     Choices of specific data items to be included in POPATRISK were made
in close consultation with the EPA Project Officer.   Considerations of
utility to EPA were primary but the limitations imposed by data base size
and consequent development and retrieval times played a significant role
in the decision process.  In this section, some rationale is given for the
specific choices made.

Entry Level Data  Items

     The county level is the smallest geographic entity for which deaths
and population mobility data are available.   In addition, area emission
estimates provided by the NEDS file are available only on a county basis.
Even if these data sources were not restricted to the county level, daily
population mobility, such as commuting to work, strongly suggests that
there is not much to be gained from studying possible relationships
between air quality and disease in a finer geographic subdivision than
the county unless data on individuals themselves are presented.  For these
reasons, the county was chosen as the geographical entity for POPATRISK.
Once this decision was made, it was decided to include data on population
density, racial makeup, measures of socio-economic status, climatology, and
population mobility both with regard to work/residence and to in- and out-
migration.

Monitoring Data

     It was decided to preserve in its entirety the integrity of monitoring
data in SAROAD.  The inclusion of UTM coordinates, for example, makes feasible the
                                  44

-------
development of air quality estimates on a finer than county level for
special user purposes.  The geometric mean and standard deviation enable a
ready computation of the arithmetic mean, if the user wishes; this computation
might even be accomplished by a user-defined function if experience shows
this statistical measure is in much demand.
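
A minimal sketch of such a computation follows.  It assumes that the sampled
concentrations are approximately lognormally distributed and that the standard
deviation carried with each geometric mean is the geometric standard deviation;
both are assumptions made for illustration, and the function name is
hypothetical.  Under those assumptions the arithmetic mean is
GM * exp(0.5 * (ln GSD)^2).

```python
import math


def arithmetic_mean_from_geometric(gm: float, gsd: float) -> float:
    """Estimate the arithmetic mean of lognormally distributed data.

    gm  -- geometric mean (must be positive)
    gsd -- geometric standard deviation (must be at least 1)
    """
    if gm <= 0 or gsd < 1:
        raise ValueError("gm must be positive and gsd must be at least 1")
    # For a lognormal variable, ln(gm) is the mean of the logs and
    # ln(gsd) is the standard deviation of the logs.
    return gm * math.exp(0.5 * math.log(gsd) ** 2)


if __name__ == "__main__":
    # Illustrative values only, not taken from SAROAD.
    print(round(arithmetic_mean_from_geometric(50.0, 2.0), 2))
```

When the geometric standard deviation equals 1 (no spread), the estimate
reduces to the geometric mean itself.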

 Air Quality

     A number of schemes of varying complexity were considered as candidates
for the estimation of county air quality.  In the end, the simplest, i.e.,
an arithmetic mean of geometric means for all stations by purpose type, was
chosen because no other seemed to offer distinct advantages.  As mentioned
above, both intraarea and interarea population mobility and the coarseness of
mortality statistics were considered to limit the usefulness of a finer
geographical breakdown.  It was felt also that in preserving the SAROAD
monitoring data intact, POPATRISK would enable individual investigators to
utilize the data for constructing more complex air quality assessments to meet
particular analytical needs with maximum flexibility.

 In-out Migration

     The Census tape from which POPATRISK data on in-out migration were
derived was specially compiled by Census for this work and represents, to
our knowledge, the first use of these data at the county level.  It was felt
especially important to include these data so that individual assessment
could be made of the stability of a county with regard to population and,
consequently, so that possible artifacts in the relationship between air
quality and mortality could be flagged and, perhaps, explained.

Mortality

     The direct method of age-adjustment was  chosen to correspond to the
practice most widely used for making  county mortality statistics comparable
 from county to county.   Limitations on data base  size precluded  more than
about 50 ICDA categories so that it was  decided to choose the 48 categories
 corresponding to aggregations of ICDA's  already in use at HERL.   Following
a published attempt  in the literature  to provide  the most rational  basis for
                                   45

-------
aggregating ICDA's into meaningful groups, it was decided to include two other
aggregations on an experimental basis.  These aggregations were derived
from the results of a cluster analysis reported by F. Burbank (A Sequential
Space-time Cluster Analysis of Cancer Mortality in the United States:
Etiologic Implications, Am. J. Epid. 95, 383-417, 1972) in which ICDA cancer
mortalities were analyzed on the basis of geographical similarity among tumor
types, separately for the white male and white female population.  For this
work, it was decided to select groupings of ICDA's which were quite similar
for the white male and white female population.  With this selection criterion
only two clusters seemed meaningful and were used for the remaining two
mortality categories.  The utility of this approach remains a subject for
future investigation.
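
The direct method of age-adjustment named at the start of this discussion can
be sketched as follows.  Each age-specific death rate in the county is weighted
by the share of a standard population in that age group, and the weighted rates
are summed.  The age groups, counts, and standard population below are
hypothetical values chosen only to make the arithmetic visible; they are not
the figures used in POPATRISK.

```python
def direct_age_adjusted_rate(deaths, population, standard_population,
                             per=100_000):
    """Directly age-adjusted death rate.

    deaths, population  -- per-age-group counts for the county
    standard_population -- per-age-group counts for the standard population
    per                 -- rate base (deaths per `per` persons)
    """
    if not (len(deaths) == len(population) == len(standard_population)):
        raise ValueError("all inputs must cover the same age groups")
    std_total = sum(standard_population)
    adjusted = 0.0
    for d, n, s in zip(deaths, population, standard_population):
        age_specific_rate = d / n      # county rate in this age group
        weight = s / std_total         # standard population share
        adjusted += age_specific_rate * weight
    return adjusted * per


if __name__ == "__main__":
    # Hypothetical county with three age groups.
    deaths = [2, 10, 40]
    population = [10_000, 8_000, 2_000]
    standard = [5_000, 3_000, 2_000]
    print(direct_age_adjusted_rate(deaths, population, standard))
```

In this hypothetical case the crude rate is 260 per 100,000, while the adjusted
rate is higher because the standard population gives more weight to the oldest
group than the county's own age structure does.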

HISTORY OF DEVELOPMENT

     The POPATRISK data base development and loading procedure was modular-
ized in that, as data from each selected source became available, method-
ologies for processing and loading the data, along with resolution of
geocoding discrepancies, were developed and implemented.

     Initial verification of raw data to be used in the data base involved
for the most part close inspection of the data and extensive correspondence
with those persons responsible for providing the data.  Raw data verification
processes were  performed on Census and mortality data during processing
procedures and  will be discussed later in this section.

     A more extensive verification process was performed on the in-out
migration data obtained from the Bureau of the Census.  The verification
effort was done after consultation with the Bureau of the Census because its
use in POPATRISK represented the first access to this data outside the
Bureau.   Detailed documentation on the migration file is provided in Appendix B.

     To verify  the data on the in-out migration file, 33 counties which are
also state economic areas were selected and data for these counties were
compared with published Census data.   The number of unallocated out-migrants
by sex and by age group in these counties matched exactly the data in Table
2 of the PC(2)-2E Migration Between State Economic Areas Census report.
                                   46

-------
      The numbers of unallocated in-migrants on the file were also checked
 for  these  counties.  Since comparisons with published reports could not be
 made by sex or by age group, in-migrant totals on the file were summed for
 both sexes and all ages.  These unallocated in-migrant totals on the file
 were  compared with totals obtained by adding the "abroad in 1965" data in
 Table 1 of the PC(2)-2E Migration Between State Economic Areas to the sum
 of the male and female "population 5 years old and over" data in Table 2
 of the PC(2)-2E publication.   Exact matches could not be made with published
 data on in-migration because of differing allocation procedures used in
 creating the migration file, but the differences in the data totals of
 each county selected were less than 1%.   Table 12 gives the results of this
 comparison.
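
The comparison can be reproduced with a short check such as the one below.
The rows are a few entries transcribed from Table 12; expressing the difference
as a percentage of the published total is an assumption about how the "less
than 1%" figure was computed, and the function names are hypothetical.

```python
# (county, total from 1970 migration tape, total from Census publication)
# A few rows transcribed from Table 12.
ROWS = [
    ("Pima, Arizona", 95501, 95371),
    ("Hartford, Connecticut", 95197, 94403),
    ("Aroostook, Maine", 13273, 13154),
    ("Broome, New York", 25542, 25542),
    ("Kanawha, West Virginia", 22999, 22861),
]


def percent_difference(tape_total: int, published_total: int) -> float:
    """Absolute difference as a percentage of the published total."""
    return abs(tape_total - published_total) / published_total * 100.0


def check_rows(rows, tolerance_pct: float = 1.0):
    """Return (county, percent difference, within tolerance) for each row."""
    results = []
    for name, tape_total, published_total in rows:
        pct = percent_difference(tape_total, published_total)
        results.append((name, pct, pct < tolerance_pct))
    return results


if __name__ == "__main__":
    for name, pct, ok in check_rows(ROWS):
        print(f"{name:25s} {pct:6.3f}%  {'ok' if ok else 'CHECK'}")
```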

     State totals on the tape were also verified for the contiguous
 United States and the District of Columbia by aggregating totals for all
 ages and both sexes.   These totals corresponded exactly with the unallocated
 out-migrant totals determined from Table 4 of PC(2)-2E by the following
 formula:  "total persons living in state in 1965" - "total persons still
 living in state in 1970."

     As with the in-migrant county totals,  unallocated in-migrant state totals
 on our file differed slightly (less than 1%)  from published data.   Data from
Table 1 of PC(2)-2E under the general heading "different state economic
 area" and the specific heading "different  state"  were added to "abroad in
 1965" data from Table 44 of PC(2)-2B, Mobility for States and the Nation.
The resulting data totals compared closely with the sum of male and female
 in-migrant totals for all ages on our tape.   This  comparison is shown in
 Table 13.

METHODOLOGIES FOR PROCESSING AND LOADING

     During the entire procedure of processing and loading the POPATRISK
data base,  extreme  care was taken to assure proper maintenance of the quality
 and integrity of the  data from their verified sources,  through the  processing
stage, and into the data base.   This involved a thorough investigation  of
the data to isolate all anomalies  and geocoding discrepancies in the data
                                  47

-------
TABLE 12.   UNALLOCATED IN-MIGRATION TOTALS (ALL RACES, AGES, BOTH SEXES)  FOR
            SELECTED COUNTIES WHERE COUNTY=STATE ECONOMIC AREA (SEA)

STATE           COUNTY        FIPS CODE  SEA  TOTAL 1*   TOTAL 2**

Arizona         Pima          04 019     B       95501       95371
Arkansas        Pulaski       05 119     A       52092       52129
California      Ventura       06 111     7      114038      113820
Colorado        El Paso       08 041     B       94668       94312
Connecticut     Hartford      09 003     C       95197       94403
Delaware        New Castle    10 003     A       52501       52338
Georgia         Bibb          13 021     F       16985       16988
Illinois        Boone         17 007     2        6545        6568
Indiana         St. Joseph    18 141     B       29369       29379
Iowa            Woodbury      19 193     A       14571       14584
Kansas          Sedgwick      20 173     A       62292       62222
Kentucky        Fayette       21 067     E       38991       38985
Louisiana       Calcasieu     22 019     D       18725       18722
Maine           Aroostook     23 003     1       13273       13154
Massachusetts   Worcester     25 027     B       61123       60888
Michigan        Kent          26 081     B       49305       49343
Mississippi     Hinds         28 049     A       31879       31881
Missouri        Greene        29 077     C       33160       33181
Nevada          Clark         32 003     A       73634       73572
New Hampshire   Hillsborough  33 011     A       35751       35796
New Jersey      Atlantic      34 001     E       21378       21395
New Mexico      Bernalillo    35 001     A       68623       68495
New York        Broome        36 007     E       25542       25542
North Carolina  Wake          37 183     E       51031       51005
Ohio            Allen         39 003     0       12341       12330
Oklahoma        Creek         40 037     C        9506        9506
Oregon          Lane          41 039     B       50693       50676
Pennsylvania    Blair         42 013     F        9868        9868
South Carolina  Aiken         45 003     B       13024       12997
Tennessee       Davidson      47 037     B       66472       66346
Utah            Salt Lake     49 035     A       57067       56803
Washington      Spokane       53 063     D       58094       58011
West Virginia   Kanawha       54 039     C       22999       22861

  * Total from 1970 migration tape
 ** Total from Census publication
                                     48

-------
       TABLE 13.  STATE UNALLOCATED IN-MIGRATION TOTALS FOR
                  ALL RACES, AGES, BOTH SEXES

STATE                   FIPS CODE   TOTAL FROM 1970    TOTAL FROM CENSUS
                                    MIGRATION TAPE     PUBLICATION

Alabama                 01                 253274             253445
Arizona                 04                 367799             368192
Arkansas                05                 183763             183918
California              06                2164948            2171140
Colorado                08                 393295             393764
Connecticut             09                 310808             311115
Delaware                10                  74058              74175
District of Columbia    11                 100313             100436
Florida                 12                1344272            1345594
Georgia                 13                 474952             475353
Idaho                   16                 100797             100926
Illinois                17                 797673             798727
Indiana                 18                 387630             388114
Iowa                    19                 189704             190120
Kansas                  20                 261540             261819
Kentucky                21                 239756             239915
Louisiana               22                 248315             248684
Maine                   23                  82386              82527
Maryland                24                 513130             513893
Massachusetts           25                 450030             450842
Michigan                26                 564476             565340
Minnesota               27                 269816             270656
Mississippi             28                 168462             168549
Missouri                29                 410022             410605
Montana                 30                  76863              76972
Nebraska                31                 138803             138998
Nevada                  32                 119133             119224
New Hampshire           33                 106157             106285
New Jersey              34                 723874             724639
New Mexico              35                 144420             144617
New York                36                1053286            1054876
North Carolina          37                 415305             415688
North Dakota            38                  59260              59322
Ohio                    39                 681849             683030
Oklahoma                40                 303150             303425
Oregon                  41                 273894             274395
Pennsylvania            42                 582853             584037
Rhode Island            44                 103951             104311
South Carolina          45                 246710             246959
South Dakota            46                  56105              56177
Tennessee               47                 326387             326679
Texas                   48                1028483            1029826
Utah                    49                 118723             118865
Vermont                 50                  56911              56940
Virginia                51                 641962             643407
Washington              53                 512079             513084
West Virginia           54                 114621             114722
Wisconsin               55                 270043             270534
Wyoming                 56                  52575              52611
                                49

-------
as well as verification that all such problems were resolved as accurately
as possible.  The effort was accomplished, for the most part, through use
of county code cross-walk files developed to match record identifiers with
data base entries.  These files were merged to produce the county cross-
reference table included in the POPATRISK Geocoding Manual.

     Every opportunity was taken to spot-check data during processing and
loading to ensure not only accuracy of processing but also that data were
loaded into the proper county entry and in the correct location.   The
cohesive nature of the data base entries provided opportunity for an on-
going process of verifying already loaded data as new data were being
processed and loaded.

     Figure 2 provides an overview of the first phase of the POPATRISK
data base development process.   The loading processes shown in Figure 2
were fairly mechanical in nature and involved relatively moderate amounts
of data.   For these sets of data the loading procedure involved extracting
data from the input files and structuring the data into S2K load string
format with the proper associated county code and then performing Natural
Language Queue Access  loading.   All loading processes involved incremental
loads in groups of county repeating groups to assure successful execution
of load jobs within the constraints of run time and file sizes.

     Population demographics and climate data items in the entry level were
taken straight from the SSI-developed tape and loaded county by county.
Matching of counties in the data base presented no problems with this data
since discrepancies had been resolved in previous work by SSI.   However, as
mentioned in Section 2, entry level data for New York City were  provided in
one record for New York City as a whole.  Income ratios and climate data
were considered the same for the five boroughs and were duplicated for each
borough entry.  Other demographic data for each borough were extracted from
the 1972 County and City Data Book.

     The NEDS emissions and SAROAD monitoring data were also simply
extracted from the appropriate NADB files and loaded into the data base
county by county.  Users should be aware that some monitoring data in
                                   50

-------
[Figure 2: overview of the first phase of the POPATRISK data base development
process; graphic not legible in this copy.]
                                                                   51

-------
SAROAD are of questionable quality.  This applies for the most part to a
few of the reported values for observed maxima.  It was not possible, within
time and budget constraints, to edit SAROAD data for the purposes of this
contract.  However, it is our judgment that the yearly averages abstracted
for use in POPATRISK can be used with considerable confidence.  It should
also be noted that although running-average monitoring data, identified
by sampling intervals X, Y, or Z, were initially loaded into the data
base, the quality and usefulness of these data were found to be
questionable and the data were subsequently removed.

     Since NEDS recognizes only 6  counties in Massachusetts,  SAROAD
recognizes 13, and the POPATRISK data base contains  14  Census counties,
resolution of the discrepancies required a complete  reapportionment of
both sets of data to the respective data base counties.

     As discussed in Section 2, the Census coding scheme was  selected for
Massachusetts because of its finer geographic breakdown and widespread use.
The four-digit county codes used are the three-digit FIPS codes with a leading
zero added to create a four-digit POPATRISK code.  Table 14 lists each
SAROAD city and the county and POPATRISK code to which it was assigned.  The
NEDS area emissions were aggregated for the state and then disaggregated
into 14 Census counties using 1970 Census figures, gasoline sales, etc.,
on a basis which conforms largely to the guidelines in the EPA document
'Guide for Compiling a Comprehensive Emission Inventory,' APTD-1135.  There
are approximately 2,500 point sources in Massachusetts.  They were assigned
to appropriate Census counties by  a computerized procedure involving
geographic location by UTM coordinates.   A detailed  discussion of the
procedures used to assign NEDS point source data to  Massachusetts counties
is provided in Appendix A.

     SAROAD monitoring data for Massachusetts were simply reassigned to
their appropriate data base counties by identifying  the city  name of all
SAROAD monitoring sites and then locating the data base county in which
the city is located.
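
The reassignment amounts to a lookup from SAROAD city name to Census county,
followed by padding the three-digit FIPS county code to the four-digit
POPATRISK form.  The sketch below uses a few rows transcribed from Table 14;
the dictionary and function names are hypothetical and do not describe the
programs actually used.

```python
# SAROAD city name -> (Census county name, three-digit FIPS county code).
# A few rows transcribed from Table 14.
CITY_TO_COUNTY = {
    "Adams": ("Berkshire", "003"),
    "Boston": ("Suffolk", "025"),
    "Fall River": ("Bristol", "005"),
    "Springfield": ("Hampden", "013"),
}


def popatrisk_county_code(fips_county: str) -> str:
    """Pad a three-digit FIPS county code to four digits with a leading zero."""
    return fips_county.zfill(4)


def assign_site(city_name: str):
    """Return (Census county name, POPATRISK county code) for a SAROAD city."""
    county, fips_county = CITY_TO_COUNTY[city_name]
    return county, popatrisk_county_code(fips_county)


if __name__ == "__main__":
    print(assign_site("Boston"))
```

A city missing from the lookup raises a KeyError, which makes unassigned
monitoring sites visible rather than silently dropping them.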
                                   52

-------
TABLE 14.   ASSIGNMENT OF MASSACHUSETTS SAROAD MONITORING TO CENSUS COUNTIES
                             (all state 22)

SAROAD City Name      Census County Name      POPATRISK County Code

Adams                 Berkshire               0003
Amherst               Hampshire               0015
Athol                 Worcester               0027
Attleboro             Bristol                 0005
Ayer                  Middlesex               0017
Belchertown           Hampshire               0015
Boston                Suffolk                 0025
Brookline             Norfolk                 0021
Cambridge             Middlesex               0017
Chicopee              Hampden                 0013
Fall River            Bristol                 0005
Falmouth              Barnstable              0001
Fitchburg             Worcester               0027
Framingham            Middlesex               0017
Greenfield            Franklin                0011
Haverhill             Essex                   0009
Holyoke               Hampden                 0013
Lawrence              Essex                   0009
Lee                   Berkshire               0003
Lowell                Middlesex               0017
Lynn                  Essex                   0009
Marblehead            Essex                   0009
Maynard               Middlesex               0017
Medford               Middlesex               0017
Needham               Norfolk                 0021
New Bedford           Bristol                 0005
Newburyport           Essex                   0009
North Adams           Berkshire               0003
Northfield            Franklin                0011
Norwood               Norfolk                 0021
Peabody               Essex                   0009
Pittsfield            Berkshire               0003
Plymouth              Plymouth                0023
Quincy                Norfolk                 0021
Revere                Suffolk                 0025
Springfield           Hampden                 0013
Waltham               Middlesex               0017
Warren                Worcester               0027
Woburn                Middlesex               0017
Worcester             Worcester               0027
                                  53

-------
     Since POPATRISK is a county-based data base, data for independent cities
without county status required reassignment to their original counties.
Only three independent cities are universally given county status and there-
fore maintain that status in the data base.  They are Baltimore, Maryland;
St. Louis, Missouri; and Carson City, Nevada, and are listed in Table 15.
Independent cities in Virginia were assigned to their original counties as
shown in Table 16.  As mentioned in Section 2, it was necessary to create
two data base counties for those independent cities for which no county now
exists, namely Elizabeth City - Warwick and Norfolk - Princess Anne, with
arbitrarily assigned codes 0972 and 2144 respectively.   Entry level data
for the new counties were obtained by aggregating data for independent
cities contained therein.

     The NEDS emission data for Virginia independent cities were added to
the NEDS data for the county to which they were assigned.   Independent city
SAROAD site and monitoring data were simply retrieved and loaded into the
county entry to which they were assigned.

     To calculate monitoring air quality data, the geometric means for each
pollutant were retrieved from the data base for sites whose purpose is
(1) population monitoring, (2) source monitoring, or (3) background
monitoring, and where the sampling interval is 1 (1 hour) or 1 (24 hours).
Arithmetic averages of the geometric means were then calculated for each
pollutant for each of the three site types.
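
     A minimal sketch of this averaging step is given below.  The record
layout (pollutant, site type, geometric mean) and the sample values are
assumptions made for illustration; they do not reproduce the SAROAD format.

```python
from collections import defaultdict
from statistics import mean


def average_geometric_means(records):
    """Arithmetic average of site geometric means, per pollutant and site type.

    `records` is an iterable of (pollutant, site_type, geometric_mean)
    tuples, one per monitoring site (hypothetical layout).
    """
    groups = defaultdict(list)
    for pollutant, site_type, gmean in records:
        groups[(pollutant, site_type)].append(gmean)
    return {key: mean(values) for key, values in groups.items()}


# Made-up site values for one county.
sites = [
    ("TSP", "population", 60.0),
    ("TSP", "population", 80.0),
    ("TSP", "source", 100.0),
]
county_air_quality = average_geometric_means(sites)
```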

     A predicted mean of emissions for TSP was calculated by SSI on the
basis of a regression model described in Appendix C.  A card deck of NEDS
data had to be used because the regression model was developed using a
version of the NEDS emissions inventory which is no longer accessible
through NADB files.  The predicted mean for TSP was calculated for
counties where TSP and NOx emissions (tons/year) fall within the following
limits:

                    1414.375 ≤ TSP ≤ 230031.803
                               NOx < 432778.195
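
     The qualification test can be expressed as a short check.  The sketch
below reads the limits as inclusive bounds on TSP and a strict upper bound
on NOx; that reading is an assumption where the printed comparison symbols
are unclear, and the function name is hypothetical.

```python
# Limits in tons/year, as listed above.
TSP_MIN = 1414.375
TSP_MAX = 230031.803
NOX_MAX = 432778.195


def qualifies_for_predicted_tsp(tsp, nox):
    """True if county emissions fall inside the range of the regression model.

    Assumes inclusive TSP bounds and a strict NOx bound.
    """
    return TSP_MIN <= tsp <= TSP_MAX and nox < NOX_MAX
```

     A county with TSP emissions of 5000 tons/year and NOx emissions of
10000 tons/year would qualify under this reading; one with TSP emissions of
1000 tons/year would not.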

-------
TABLE 15.  INDEPENDENT CITIES RECOGNIZED AND CODED IN DATA BASE

                                    POPATRISK State-County Code
Baltimore, Maryland                              26      4280
St. Louis, Missouri                              29      0040
Carson City, Nevada

-------
TABLE 16.  ASSIGNMENT OF VIRGINIA INDEPENDENT CITIES TO POPATRISK COUNTIES
                               (all state 48)

                     SAROAD                               POPATRISK
City Name             Code     County Name                County Code

Alexandria            0080     Arlington                     0200
Bedford               0320     Bedford                       0340
Bristol               0480     Washington                    3300
Buena Vista           0560     Rockbridge                    2740
Lexington             1740     Rockbridge                    2740
Charlottesville       0680     Albemarle                     0060
Clifton Forge         0780     Alleghany                     0100
Covington             0840     Alleghany                     0100
Colonial Heights      0820     Chesterfield                  0720
Danville              0920     Pittsylvania                  2380
Emporia               0980     Greensville                   1400
Fairfax               1040     Fairfax                       1060
Falls Church          1080     Fairfax                       1060
Franklin              1180     Southampton                   2940
Fredericksburg        1240     Spotsylvania                  3000
Galax                 1280     Grayson                       1360
Harrisonburg          1480     Rockingham                    2760
Hopewell              1560     Prince George                 2500
Lynchburg             1840     Campbell                      0580
Martinsville          1940     Henry                         1520
Norton                2240     Wise                          3420
Petersburg            2360     Dinwiddie                     0960
Radford               2600     Montgomery                    2020
Richmond              2660     Henrico                       1500
Roanoke               2700     Roanoke                       2720
Salem                 2800     Roanoke                       2720
South Boston          2920     Halifax                       1420
Staunton              3060     Augusta                       0260
Waynesboro            3320     Augusta                       0260
Suffolk               3080     Nansemond                     2060
Williamsburg          3360     James City                    1600
Winchester            3380     Frederick                     1220
Hampton               1440     Elizabeth City-Warwick        0972
Newport News          2120     Elizabeth City-Warwick        0972
Chesapeake            0710     Norfolk-Princess Anne         2144
Norfolk               2140     Norfolk-Princess Anne         2144
Portsmouth            2440     Norfolk-Princess Anne         2144
Virginia Beach        3240     Norfolk-Princess Anne         2144

-------
     A formatted input file of SAROAD data selected from the data base was
read simultaneously with the NEDS input card deck.  The county identifiers on
each file were matched and the calculations performed.  Results were placed in
S2K load strings and subsequently loaded into the data base.
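
     Reading two files side by side and matching county identifiers can be
sketched as a sequential merge.  The sketch assumes that both inputs are
sorted by county identifier and hold one record per county; the report does
not state the file ordering, and the record layout shown is hypothetical.

```python
def match_by_county(saroad, neds):
    """Pair records from two county-sorted lists that share a county id.

    Each input is a list of (county_id, value) tuples sorted by county_id
    with no repeated ids (assumed). Returns (county_id, saroad_value,
    neds_value) for counties present in both inputs.
    """
    matched = []
    i = j = 0
    while i < len(saroad) and j < len(neds):
        s_id, s_val = saroad[i]
        n_id, n_val = neds[j]
        if s_id == n_id:
            matched.append((s_id, s_val, n_val))
            i += 1
            j += 1
        elif s_id < n_id:
            i += 1  # county has monitoring data but no emissions record
        else:
            j += 1  # county has an emissions record but no monitoring data
    return matched


# Made-up records keyed by county code.
saroad_file = [("0200", 61.0), ("1060", 75.0), ("2144", 88.0)]
neds_file = [("0100", 1500.0), ("1060", 9000.0), ("2144", 12000.0)]
pairs = match_by_county(saroad_file, neds_file)
```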

     In those cases for which there were no monitoring data for a county,
but the TSP and NOx values qualified the county for calculation of mean
predicted TSP, a single data set for TSP exists containing the mean
predicted TSP.

     The air quality repeating group provides the user readily accessible
estimates for assessing the general exposure of county population to the
various pollutants.

     Figure 3 provides an overview of the second phase of the data base
development process.   Processing and loading of data at this stage involved
numerous calculations and handling of much larger quantities of data.
Therefore, S2K Procedural Language Interface COBOL programs were used to
interface with the data base,  retrieve  data base data needed for calcu-
lations,  and load the data by  means of PLI optimized loading.

     The Census in-out migration tape was read and the FIPS county code for
each record on the tape was written to a file to create a cross-walk file
for matching migration counties  with data base counties.   Although there
are state total records on the original tape, these data were  not needed
for POPATRISK and were disregarded.   Also disregarded on the tape were
migration records for Alaska and Hawaii.

     Geocoding problems with  the migration data were  resolved in a manner
 similar  to  that  used  for previous data types.  Baltimore and St. Louis
 independent  cities and  counties were reordered to match the SAROAD scheme;
 data for Yellowstone Park, Montana were added into the record for Park

-------
[Figure 3.  Second phase of the data base development process.]

-------
County, Montana, and data for all Virginia independent cities were added
into records for their corresponding counties.   There was no record on
the migration tape for Loving,  Texas,  POPATRISK code 45 3330.  Since the
population of that county was 164, the data would have been suppressed by
Census confidentiality requirements.

     When all geocoding problems were resolved, a PLI program was written
to process the migration tape using the FIPS-POPATRISK cross-walk file
and load the data county by county.  Processing of migration data involved
retrieval of the 16th data set (totals) of the age-specific Census data.
These race-sex totals were used to convert migration data to percentages
of the total population for a particular race-sex group.  For example, the
number of white male in-migrants for age group 5-14 was divided by total
white males in the county and multiplied by 100 to give the percentage of
the total population of white males that were both in-migrants and between
ages 5 and 14.  Migration percentages were calculated for both 'Total'
(Allocated + Unallocated) and 'Allocated' data.  As data sets were processed
and loaded, total county migration figures were tallied and loaded into
the entry level components for the county.
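
     The percentage conversion amounts to the following calculation.  The
function name and the handling of an empty race-sex group (returning zero)
are assumptions made for this sketch; the figures in the example are
invented.

```python
def migration_percent(migrants, race_sex_total):
    """Migrants in one age group as a percentage of the county's race-sex total.

    `race_sex_total` is the all-ages county total for the race-sex group
    (the 16th data set of the age-specific Census data). Returns 0.0 when
    the group is empty, which is an assumption of this sketch.
    """
    if race_sex_total == 0:
        return 0.0
    return 100.0 * migrants / race_sex_total


# 250 white male in-migrants aged 5-14 in a county with 5000 white males.
pct = migration_percent(250, 5000)
```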

     The New York City boroughs presented a unique problem with in-out
migration data.  As the Census documentation in Appendix B explains, persons
reporting in 1970 that their residence in 1965 was merely a different
house in New York City, or giving an incomplete response of "New York City - no
borough specified", were not included in either the in- or out-migrant tallies
for individual boroughs.  An extra summary record for New York City as a
whole is provided that excludes those persons with the above responses.
The documentation also describes another extra record on the tape that
contains out-migrant counts for those persons with a 1965 residence of "New
York City - no borough specified" and a 1970 residence outside New York City.
However, this record was not found on the tape.

-------
     In order to complete the data for New York City boroughs, SSI contacted
the Bureau of the Census to discuss the possibility of retrieving data
not included on the tape and incorporating those data into the data base.
The Census Bureau was very helpful in providing hard copy tabulations of
the needed data.  Specifically, the tabulations provided show counts of persons
reporting a 1970 residence in one of the New York City boroughs and a 1965
residence of 'New York City - no borough specified', by sex, race, and age
group; also supplied were counts of persons reporting a 1970 residence
outside New York City and a 1965 residence of 'New York City - no borough
specified', by race, sex, and age group.

     A procedure was developed to distribute the count of in-migrants who
gave incomplete responses (no borough specified) for 1965 residence among
the five boroughs, in proportion to the in-migrants giving complete
responses.  The procedure resulted in estimated
counts of 'Incomplete response' in-migrants from each of the five boroughs,
which, when summed over the 1970 residence boroughs, provided out-migrant
counts from each of the boroughs.  Both in- and out-migrant figures were
considered 'allocated' and were added to their respective borough records.

     'Incomplete response' out-migrants from New York City were distributed
to each of the five boroughs according to the 1970 borough population.
These figures were also considered 'allocated' and were added to their
appropriate boroughs.
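
     Both distributions described above are proportional allocations and
can be sketched with one helper.  The function name and all counts below
are hypothetical, and the sketch leaves the shares as fractional values
because the report does not say how they were rounded to whole persons.

```python
def allocate_proportionally(total, weights):
    """Split `total` across the keys of `weights` in proportion to each weight.

    For in-migrants the weights would be the complete-response counts by
    1965 borough; for out-migrants they would be 1970 borough populations.
    """
    weight_sum = sum(weights.values())
    if weight_sum == 0:
        return {key: 0.0 for key in weights}
    return {key: total * w / weight_sum for key, w in weights.items()}


# Made-up weights for the five boroughs and 100 incomplete responses.
weights = {"Bronx": 10, "Brooklyn": 30, "Manhattan": 20,
           "Queens": 30, "Staten Island": 10}
shares = allocate_proportionally(100, weights)
```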

     The procedure for processing the NCHS mortality data and calculating
age-adjusted rates was as follows:

     To ease the problem of working with such large numbers of records on
the raw tapes,  each tape was divided into five separate mass storage files.
Since the raw data were sorted by state and county, three mass storage
files,  one from each tape, could be read and manipulated simultaneously
resulting in a merge of all deaths over the three-year period for each county
into a worktape record.  Counts of deaths were stored in the worktape
record in a three-dimensional age(15), race-sex(4), ICDA(50) array.   The
15 age groups used for mortality data are identical to those used in the

-------
age-specific population data.   Five runs were made, using three mass storage
raw data files at a time, to create five separate tapes of mortality
counts, one record per county.  These five work tapes were then saved to
be combined into one file spanning two tapes for delivery to EPA along with
this final report.   Documentation on the work tapes is provided with the
tapes as well as in Appendix B of this report.
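
     The merging of three years of death records into one count array per
county can be sketched as follows.  The record layout (county, age group,
race-sex group, ICDA category, all zero-based indices) is an assumption for
illustration and is not the NCHS tape format.

```python
from collections import defaultdict

N_AGE, N_RACE_SEX, N_ICDA = 15, 4, 50


def new_count_array():
    """A zero-filled age(15) x race-sex(4) x ICDA(50) array of death counts."""
    return [[[0] * N_ICDA for _ in range(N_RACE_SEX)] for _ in range(N_AGE)]


def tally_deaths(yearly_files):
    """Merge death records from several yearly files into per-county counts.

    Each file is an iterable of (county, age, race_sex, icda) tuples with
    zero-based group indices (hypothetical layout).
    """
    counties = defaultdict(new_count_array)
    for records in yearly_files:
        for county, age, race_sex, icda in records:
            counties[county][age][race_sex][icda] += 1
    return counties


# Made-up records for 1969, 1970, and 1971.
files = [
    [("0200", 0, 0, 5), ("0200", 0, 0, 5)],
    [("0200", 14, 3, 49)],
    [("0200", 0, 0, 5), ("1060", 2, 1, 0)],
]
counts = tally_deaths(files)
```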

     Direct method  age-adjusting was accomplished as follows:

     Let the indices i, j, k be defined as:
          i      ICDA cause category (1-50)
          j      Age Group (1-15)
          k      Race-sex Group (WM, WF, NWM, NWF)
     and let:
          $D_{ijk}$ = total number dead in county for years 1969, 1970, 1971
                  in ICDA i cause category, age group j, and race-sex
                  group k, taken from the work tape record.

          $R_{jk}$ = number of people in age group j and race-sex group k in
                  county (i.e., number at risk)

          $S_{ijk} = \dfrac{D_{ijk}}{R_{jk}}$ = age-specific death rate in ICDA i cause
                  in age group j and race-sex group k

          $P_j$ = number of people in age group j in the standard U.S.
                  population in base year 1970.

          $P = \sum_j P_j$ = total U.S. population in 1970

          $A_{ik}$ = age-adjusted death rate in ICDA i cause in race-sex group
                  k (per 100,000)

$A_{ik}$ can be written as:

          $$A_{ik} = \sum_{j=1}^{15} S_{ijk} \, \frac{P_j}{P}$$

-------
     with a substitution for $S_{ijk}$:

          $$A_{ik} = \sum_{j=1}^{15} \frac{D_{ijk}}{R_{jk}} \, \frac{P_j}{P}$$

The actual calculation of the age-adjusted rate per 100,000 over the 3 year
period is performed as follows:

          $$\bar{D}_{ijk} = \frac{D_{ijk}}{3}$$ = average number of deaths over 3 year period

          $$A_{ik} = \sum_{j=1}^{15} \frac{\bar{D}_{ijk}}{R_{jk}} \cdot \frac{P_j}{P} \cdot 100{,}000$$

     The variable $R_{jk}$ is easily retrieved from the AGE-SPECIFIC POPULATION
repeating group.  Since the number dead, $D_{ijk}$, is divided by three, the
death rates represent the average yearly age-adjusted death rate per
100,000 population over the three-year period from 1969 to 1971.
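
     The calculation for one ICDA category and one race-sex group can be
sketched as below.  This is an illustration of the direct method as defined
above, not the PLI COBOL program; the function name is hypothetical, the
example uses two age groups instead of 15, and all figures are invented.
Age groups with no population contribute a rate of 0, as is done in the
data base.

```python
def age_adjusted_rate(deaths_3yr, at_risk, standard_pop):
    """Direct-method age-adjusted death rate per 100,000 per year.

    deaths_3yr[j]   deaths over the three years in age group j (D_ijk)
    at_risk[j]      county population in age group j (R_jk)
    standard_pop[j] standard U.S. population in age group j (P_j)
    """
    total_standard = sum(standard_pop)  # P
    rate = 0.0
    for deaths, population, p_j in zip(deaths_3yr, at_risk, standard_pop):
        if population == 0:
            continue  # age-specific rate taken as 0 for an empty group
        average_deaths = deaths / 3  # yearly average over 1969-1971
        rate += (average_deaths / population) * (p_j / total_standard)
    return rate * 100_000


# Toy example with two age groups; the result is approximately 100.
example_rate = age_adjusted_rate([3, 6], [1000, 2000], [50, 50])
```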

     Verification of the mortality data and processing methodology began
with the mortality count worktape records prior to any calculations and
loading.  Three test files were created for testing the worktape creation
software.  When the software was verified correct, the worktapes were created
and extensively spot-checked against NCHS
publications.  Data are published for each year, showing total deaths for
each ICDA category for each county.  Adding deaths across age groups and
across the three years provided verification of total death counts on the
worktapes.

     Geocoding problems with the NCHS coding scheme were handled in a
similar manner to the in-out migration data.   Mortality worktape records
containing counts for Virginia independent cities were combined with the
worktape records for their respective counties.  With a few exceptions,
the worktape delivered to EPA contains all geocoding corrections
giving a county by county match-up of worktape records with data base
counties.  Mortality data were available only for New York City as a whole
with no breakdown for the five boroughs.   Therefore, only one record exists

-------
 on the worktape  for  the  five New York  City boroughs.

     A PLI COBOL program was developed to extract death counts from the
worktape, retrieve county Census data, calculate age-adjusted rates, and
load the rates into the data base, using PLI optimized load.  Upon
completion of testing of the mortality calculations and loading process and trial
loading of data for several counties, it became apparent that processing
time and charges for loading the data in their present format would be
prohibitive.  Version 2.80 of SYSTEM 2000 offers the capability for creation
and removal of indices at any time, in effect changing components from
KEY to NON-KEY, and vice versa.  Therefore, all the KEY'ed components in
the mortality repeating group were changed to NON-KEY during the loading
process.  This action eliminated the excessive processing time and expense
of creation and maintenance of KEY item pointers in the data base files.

     A decision was made to process and load the mortality data for New
York City into each of the five New York City boroughs using the total
population of the five boroughs for calculations.   This resulted in identical
data for each of the boroughs which represent age-adjusted death rates
 for the whole of New York City.

     At the present time, there are no published data directly comparable
to the direct method age-adjusted death rates present in the data base.
Although NCHS did provide unpublished age-adjusted data for 1969-71, they
customarily use the 1940 standard million for calculations.   Therefore,  to
verify the  data,  extensive hand calculations  were  done for randomly selected
counties.   In addition,  calculations were done using the 1940 standard
million to  arrive at results comparable to NCHS data.   Calculated data
corresponded extremely well with the NCHS data.  Any slight variances in
results could be explained by the slight  differences in the distribution of
1970 county population data across  race groups.

     It should be noted  that in some cases,  calculated age-adjusted death
rates may be misleading, specifically in  cases where there are few or no
people in a race-sex-age group in a county.   The situation occurs, for
example,  in counties where there are few, if any,  non-white persons.  In

-------
calculation of age-adjusted death rates, there was no satisfactory method
of selecting a threshold, or cut-off point, for number of deaths per
population below which calculations would not be done.  Therefore, in
those cases where a few deaths occur in a very small population, the
age-adjusted death rate may be extremely high.  The user may avoid problems
with interpretation of the data by examining the population data to obtain a
sense of the distribution across race-sex-age groups.

     There were isolated cases in which one or more deaths occurred in a
 county for a race-sex-age group for which there were no people.   This
problem may be due to the inherently imprecise nature of Census  data when
working with small numbers or to inaccuracies in the NCHS tapes.   In these
cases, the age-specific rate for that race-sex-age group was set to 0.

     When all loading and verification of the mortality data were completed,
all components in the mortality repeating group were converted back to
KEY'ed components.  The data base was then  reorganized using the REORGANIZE
ALL command to re-structure the data base tables and pointers to maximize
efficiency of retrievals.

     The 1970 age-specific population data  were taken from raw Census tapes
and aggregated into 15 age groups and 8 race-sex categories (4 total
categories and 4 urban categories).   It was assumed that since age-specific
Census data would be accessed rather infrequently, those race-sex categories
defined in the data base would be the minimum necessary for derivation of
all categories.   For example, total non-white can be derived by  subtracting
total white from the total population.   User-defined functions are provided
to derive those components not explicitly defined.
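
     The derivation in the example above is a subtraction applied age
group by age group.  The sketch below is illustrative only: it does not
reproduce the S2K user-defined functions, and the function name and the
population figures are hypothetical.

```python
def derive_total_nonwhite(total_population, total_white):
    """Total non-white population per age group, as total minus total white.

    Both arguments are lists with one count per age group.
    """
    if len(total_population) != len(total_white):
        raise ValueError("inputs must cover the same age groups")
    return [total - white for total, white in zip(total_population, total_white)]


# Made-up counts for three age groups.
nonwhite = derive_total_nonwhite([1200, 900, 400], [1000, 850, 390])
```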

     The Census  data were aggregated, formatted for entry into the data
base, and then placed in mass storage files.   A PLI program was  used to
load the data by means of the PLI optimized loading.   During loading,
another component for non-white males was calculated and loaded to further
simplify calculations of components not explicitly defined for use in
in-out migration and mortality calculations.   To further enhance the
Census data repeating group, figures were totalled across the 15 age groups

-------
 to create  a 16th  data  set  giving  county  totals  for  each  component.   These
 totals were loaded  as  data with AGE GROUPING  code 0.
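
     The creation of the totals data set can be sketched as follows.  The
dictionary layout and the component names are assumptions made for this
illustration; only the use of AGE GROUPING code 0 for the totals comes from
the text above.

```python
TOTALS_CODE = 0  # AGE GROUPING code under which county totals are stored


def add_county_totals(data_sets):
    """Return the age-specific data sets plus a totals data set under code 0.

    `data_sets` maps an age-group code (1-15) to a dict of
    component name -> count (hypothetical layout).
    """
    totals = {}
    for components in data_sets.values():
        for name, count in components.items():
            totals[name] = totals.get(name, 0) + count
    result = dict(data_sets)
    result[TOTALS_CODE] = totals
    return result


# Made-up counts for two of the age groups.
census = {
    1: {"TOTAL MALE": 100, "TOTAL FEMALE": 110},
    2: {"TOTAL MALE": 90, "TOTAL FEMALE": 95},
}
with_totals = add_county_totals(census)
```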

     Geocoding problems with independent cities for Census data were handled
 as  has been previously discussed.  The Census county codes are shown in
 the cross-reference file of the POPATRISK Geocoding Manual except for the
 District of Columbia.  There were no data for the District of Columbia on
 the Census tape, and therefore, the data had to be  manually extracted from
 Census publication PC(V2)-10 General Population Characteristics and loaded
 separately.

     Verification of the Census data has been an on-going process of spot
 cross-checking against various sources.  The primary source used for
 verification was Census publication PC(V2)  General Population Character-
 istics giving detailed population figures for each  county.  Loaded Census
 data were  checked against PC(V2)  and then cross-checked against the county
entry level data and the Bureau of Census publication 1972 County and
 City Data Book (CCDB) to assure data loaded into each county were the
 correct data for that county.

     In some cases, minor discrepancies were found between
the total age-specific population as shown in the Census repeating group
and the total population as shown in component C10.  These discrepancies
are explained in the CCDB to occur because figures shown in the main body
of the book are only representative counts.  Complete counts appear in
Appendices A and B of the CCDB and correspond exactly with those loaded
into the data base Census repeating group.  In order to resolve the
problem and to eliminate confusion over the figures, C10 TOTAL POPULATION
figures were changed to show complete counts.

Conversion to SYSTEM 2000  Version 2.80

     Loading of the data base  was  initiated using SYSTEM  2000 Version 2.65.
The data base was partially loaded when the National Computer Center
installed a new version of SYSTEM 2000, Version 2.80.   Normally,  new releases
of  the system are fairly transparent to data base development and require

-------
a minimal effort to adapt to changes.  However, Version 2.80 contains
major changes to the basic file structure, namely decreasing page size
from 504 words to 448 words.  This change, in effect, reduces input/
output charges for accessing and updating the data base.  A reload of the
data base was necessary to gain optimal benefit from Version 2.80 features.
Reloading also provided an opportunity to make desired modifications to
the data base design to provide for more efficient retrievals and space in
the data base for newly obtained in-out migration and mortality data.

     Redesign and conversion to Version 2.80  involved unloading the
existing data base and reloading the data into a newly designed data base
created under Version 2.80.   The conversion went smoothly, and the benefits
of the redesign and conversion far outweighed the cost of the effort.

USE OF REMOVABLE DISK PACK AND ARCHIVING PROCEDURES

     As data base development proceeded and the size of the data base files
began to increase substantially, the cost of maintaining the files on
mass storage became prohibitive.  The use of magnetic tapes as the primary
storage medium was also undesirable because of the high I/O costs involved
in transferring large files between tape and mass storage.

     In light of the significant increase in file sizes expected with the
addition of in-out migration and mortality data to the data base, the most
cost-effective solution to the data base storage problem was a private
removable disk pack.   Therefore the POPATRISK data base was transferred to
removable disk pack to minimize monthly charges for disk space utilization
and to provide for relatively easy access for frequent use.

     There are a few drawbacks to use of the private disk pack which should
be mentioned here.   The first is the inconvenience of having to have the
disk pack mounted before accessing the data base.   Experience has shown,
however, that this is only  a  minor inconvenience and usually involves
only a 3-5 minute wait.  The other drawback is that files on the disk pack
are not protected from accidental loss by the Secure Processor.   Archiving
of files residing on a disk pack is left completely to the user.

-------
     Utilization of a dependable backup and restore system is vital to
successful creation of a SYSTEM 2000 data base.  SYSTEM 2000 flags a
data base as damaged whenever an update job, either demand or batch, terminates
abnormally.  If the data base was saved prior to the update job, the previously
updated version may be restored and operations resumed.
Since this error is generated by an update run 'MAX TIME' or any EXEC 8
system interrupt, it became a fairly common problem with POPATRISK and was
resolved routinely.

     A POPATRISK backup system was utilized during development of the data
base to both insure against accidental loss and to provide sufficient
 archiving of data base updates.   Similar to backup systems used by the
National Air Data Branch of EPA, n generations of the data base are archived
 on tape, one generation per backup tape.   The tapes are cycled each time
 the data base is saved so that only the last n archival versions of the
 data base are retained.   An archival restore runstream was used in conjunc-
 tion with the backup system to provide access to previous  versions of the
 data base.   Both backup and restore runstreams are fully documented in
Appendix B.
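
     The cycling of n backup generations can be sketched as a fixed-length
queue.  This illustrates the retention rule only and is not the runstream
documented in Appendix B; the class name, method names, and version labels
are hypothetical.

```python
from collections import deque


class BackupCycle:
    """Keep only the last n archival generations of the data base."""

    def __init__(self, n):
        # A deque with maxlen drops the oldest generation automatically.
        self.generations = deque(maxlen=n)

    def save(self, label):
        """Archive a new generation, cycling out the oldest if n are held."""
        self.generations.append(label)

    def restore(self, generations_back=0):
        """Return the label of the generation to restore (0 = most recent)."""
        return self.generations[-1 - generations_back]


cycle = BackupCycle(n=3)
for version in ["v1", "v2", "v3", "v4"]:
    cycle.save(version)
```

     After four saves with n = 3, the oldest generation has been cycled
out and the three most recent remain available for restore.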

     A UNIVAC EXEC 8 system error encountered during data  base development
 should be mentioned here because of its impact on the data base and the
substantial cost in man hours and computer time required for recovery.
National Computer Center personnel determined that a system error caused
damage to one or more of the data base files, resulting in a shift in data
within the data base.   Since the data base was not flagged as damaged, the
error went undetected throughout a significant portion of  data base loading
and archiving.   Recovery from the damage  required recovering appropriate
data files and reloading all data loaded  since the last correct version of
 the data base.   This problem of  unflagged damage to data is a data processor's
nightmare and is mentioned here  to emphasize the point that although safe-
guards are built into  a data management system, they do not always work.
Experience with POPATRISK has shown the importance of continuous monitoring
of existing data in the data base as well as extensive verification of data
presently being loaded.

-------
     The POPATRISK data base was loaded by both S2K Natural Language and
Procedural Language COBOL Interface programming.  During the initial
stages of data base development, when moderate amounts of data were being
loaded and data base files were small, Natural Language Queue Access
loading performed satisfactorily.  As the volume of incoming data
grew substantially and the data base files expanded, increases in computer
I/O charges made Natural Language processing prohibitive.  Considering
the need for a more efficient method of loading large quantities of data
and the necessity for interaction with data base data during processing
and loading, it was decided that the benefits of PLI programs would far
outweigh the costs of their development.

     Procedural Language Interface optimized loading provided a much more
cost-effective method of loading and allowed full access to existing data
base data, a capability most useful in calculation of migration and age-
adjusted mortality data.  However, even with the increased efficiency of
loading, by the time mortality data were being loaded, the data base files
had expanded such that running times and I/O charges for mortality load
jobs were growing exorbitantly.

     The major factor contributing to the increased I/O charges was the
number of data sets to be loaded, all components of which were  KEY'ed.
A decision was made to make the components  NON-KEY during loading, a new
option available under SYSTEM 2000 Version  2.80.  The action in fact
decreased mortality data load times by at least a factor of 10.  Changing
the components back to KEY upon completion  of loading was accomplished in
2 hours of computer time.  The net results  of the loading procedure were
substantial savings in time and computer charges.   The REORGANIZE ALL
command was issued at various times during  loading and upon completion of
data base development to reorder all the data base tables whose entries
were scattered as a result of incremental load and update operations.

-------
                                 APPENDIX A

                 RESOLUTION OF GEOCODING DISCREPANCIES FOR
                     MASSACHUSETTS NEDS AND SAROAD DATA

Overview of Work Done on Massachusetts NEDS

     Massachusetts NEDS point sources were assigned to appropriate Census
counties by a computerized procedure involving geographic location by UTM
coordinates.  Area sources were apportioned by first aggregating the entire
state's area emissions and then disaggregating into Census counties following
the EPA publication, "Guide for Compiling a Comprehensive Emission Inventory."
Massachusetts SAROAD monitoring stations were assigned by identifying the
city name of all SAROAD monitoring sites and then locating the Census county
in which the city was located.

Detailed Procedures Used in Treating NEDS Point Sources

     To obtain Census county emissions totals for NEDS point sources in
Massachusetts, the points were apportioned from NEDS counties into Census
counties.  SSI staff accessed the AEROS NADB*NEDS-USER file to retrieve
the following data for approximately 2500 Massachusetts points:


          STATE
          COUNTY
          AQCR
          PLANT ID NUMBER
          CITY
          UTM ZONE
          YEAR OF RECORD
          ESTABLISHMENT NAME AND ADDRESS
          POINT-ID
          UTM-HORIZONTAL
          UTM-VERTICAL
          EMISSION ESTIMATES OF PARTICULATES
          EMISSION ESTIMATES OF S0?
          EMISSION ESTIMATES OF NOx
          EMISSION ESTIMATES OF HC
          EMISSION ESTIMATES OF CO

-------
     These data were supplied to an SSI consultant, Dr. Richard J. Kopec,
who identified Census counties for each point source according to the UTM
coordinates of the point.  For the several points for which UTM coordinates
were missing in the NEDS-USER file, SSI staff searched for another point
in the file with the same city code as the point with incomplete information.
In most cases, the UTM coordinates of this new point were used to locate
the original point in the correct Census county.
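
     The search for a substitute point can be sketched as a lookup by city
code.  The dictionary field names below are hypothetical; the city code,
plant numbers, and coordinates are those of the two Massachusetts points
discussed in this appendix.  Since the substitution was used only in most
cases, a point for which no donor is found is left unresolved.

```python
def fill_missing_utm(points):
    """Borrow UTM coordinates from another point with the same city code.

    Each point is a dict with keys "plant", "city", "utm_h", "utm_v"
    (hypothetical field names); missing coordinates are None.
    Returns new dicts; borrowed coordinates are flagged.
    """
    donors = {}
    for p in points:
        if p["utm_h"] is not None and p["utm_v"] is not None:
            donors.setdefault(p["city"], (p["utm_h"], p["utm_v"]))

    filled = []
    for p in points:
        q = dict(p)
        q["utm_borrowed"] = False
        if q["utm_h"] is None or q["utm_v"] is None:
            donor = donors.get(q["city"])
            if donor is not None:
                q["utm_h"], q["utm_v"] = donor
                q["utm_borrowed"] = True
        filled.append(q)
    return filled


points = [
    {"plant": "0510", "city": "1700", "utm_h": None, "utm_v": None},
    {"plant": "5732", "city": "1700", "utm_h": 3181, "utm_v": 46723},
]
resolved = fill_missing_utm(points)
```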


     For example, the NEDS point

          STATE                                   22
          COUNTY                                  1291
          AQCR                                    119
          PLANT ID NUMBER                         0510
          CITY                                    1700
          UTM ZONE                                19
          YEAR OF RECORD                          72
          ESTABLISHMENT NAME AND ADDRESS          POLAROID, 1 UPLAND RD,
          POINT-ID                                03
          UTM-HORIZONTAL
          UTM-VERTICAL
          EMISSION ESTIMATES OF PARTICULATES      0000000
          EMISSION ESTIMATES OF SO2               0000000
          EMISSION ESTIMATES OF NOx               0000000
          EMISSION ESTIMATES OF HC                0000008
          EMISSION ESTIMATES OF CO                0000000

has no UTM coordinates (UTM-HORIZONTAL,  UTM-VERTICAL).  It has a city code

of 1700.  The following point,  also with a city code of 1700,  and with UTM
coordinates present in the point record  was located  in the NEDS-USER FILE.


          STATE                                   22
          COUNTY                                  1291
          AQCR                                    119
          PLANT ID NUMBER                          5732
          CITY                                    1700
          UTM ZONE                                19
          YEAR OF RECORD                          72
          ESTABLISHMENT NAME AND ADDRESS          RICHARD A KLEIN CO.349 LENOX,NORWOC
          POINT-ID                                01
          UTM-HORIZONTAL                          3181
          UTM-VERTICAL                            46723
          EMISSION ESTIMATES OF PARTICULATES      0000000
          EMISSION ESTIMATES OF SO2               0000000
          EMISSION ESTIMATES OF NOx               0000000
          EMISSION ESTIMATES OF HC                0000022
          EMISSION ESTIMATES OF CO                0000000



-------
The UTM coordinates of this source were substituted in the point record of
the original source.  These "dummy" UTM coordinates were used to place the
original source in the correct Census county.  Table A-1 lists
the point sources with missing UTM coordinates and includes the "dummy"
coordinate used to identify the point's Census county.  Although in many
cases identical multiple entries appear in the NEDS file due, e.g., to
multiple stacks at a given site, only one point is listed below for each
plant.  (Points around the city of Boston were carefully checked when using
this method of Census county identification because the city extends across
more than one county.)

     In a few instances it was not possible to locate other points in the
file with the same city code as the incomplete point source.   For example,
no points in the Lower Pioneer Valley Region (City Code 6866) contained
UTM coordinates in the point records,  so there was no "dummy" UTM coordinate
available.   Also,  several point records contained UTM coordinates that
misplaced the point source by locating it outside of the state.

     In these cases the name and address of each  point was  traced to place
the point in the correct Census county.  Table A-2 contains record descriptions
of point sources which were located in Census counties by this method.
Again, only one set of values is listed for each plant.
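
     The substitution of "dummy" UTM coordinates described above can be
sketched as follows.  This is a minimal illustration, not the procedure SSI
staff actually ran: the record field names are hypothetical, the third record
(plant 9999) is invented to show the case where no donor point exists, and the
rule of taking the first complete point found for a city code is an assumption.

```python
# Hypothetical, simplified NEDS point records; field names are illustrative.
points = [
    {"plant_id": "0510", "city": "1700", "utm_h": None, "utm_v": None},
    {"plant_id": "5732", "city": "1700", "utm_h": 3181, "utm_v": 46723},
    # Invented record: no other point with this city code has coordinates.
    {"plant_id": "9999", "city": "6866", "utm_h": None, "utm_v": None},
]


def fill_dummy_utm(records):
    """Borrow UTM coordinates from another point with the same city code.

    Returns (filled_records, unresolved_plant_ids).  Unresolved points are
    the ones that would have to be traced by name and address instead.
    """
    # The first complete record seen for each city code acts as the donor.
    donors = {}
    for rec in records:
        if rec["utm_h"] is not None and rec["utm_v"] is not None:
            donors.setdefault(rec["city"], (rec["utm_h"], rec["utm_v"]))

    filled, unresolved = [], []
    for rec in records:
        rec = dict(rec)  # copy so the input records are left untouched
        if rec["utm_h"] is None or rec["utm_v"] is None:
            donor = donors.get(rec["city"])
            if donor is None:
                unresolved.append(rec["plant_id"])
            else:
                rec["utm_h"], rec["utm_v"] = donor
                rec["dummy_utm"] = True  # flag the borrowed coordinates
        filled.append(rec)
    return filled, unresolved


filled, unresolved = fill_dummy_utm(points)
```

Points left in `unresolved` correspond to the cases that were placed by tracing
the name and address of the establishment.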

     After identifying Census counties for all the Massachusetts points,
county totals were calculated for each of the five emission categories.
Table A-3, "Total Emissions Summed by Type and by County," illustrates the
results of this summation and the data that were loaded into the POPATRISK
data base.
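
     The county summation can be illustrated with a short sketch.  The point
records and their values below are invented for illustration; only the five
emission categories (PART, SO2, NOX, HC, CO) are taken from Table A-3.

```python
from collections import defaultdict

POLLUTANTS = ("PART", "SO2", "NOX", "HC", "CO")

# Hypothetical point records, each already assigned a Census county code.
points = [
    {"county": 21, "PART": 0, "SO2": 0, "NOX": 0, "HC": 8, "CO": 0},
    {"county": 21, "PART": 0, "SO2": 0, "NOX": 0, "HC": 22, "CO": 0},
    {"county": 25, "PART": 5, "SO2": 1, "NOX": 2, "HC": 0, "CO": 3},
]


def county_totals(records):
    """Sum each emission category over all points in each Census county."""
    totals = defaultdict(lambda: dict.fromkeys(POLLUTANTS, 0))
    for rec in records:
        for pollutant in POLLUTANTS:
            totals[rec["county"]][pollutant] += rec[pollutant]
    return dict(totals)


totals = county_totals(points)
```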

Detailed Results for NEDS Area Sources

     Table A-4 gives the results of apportioning state total emissions
among 14 Census counties.
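
     The disaggregation step can be pictured as a proportional split of a
state total.  The sketch below is only an illustration: the apportioning
factors actually prescribed by the EPA guide are not reproduced in this
appendix, so the weights (and the totals) used here are hypothetical.

```python
def apportion(state_total, weights):
    """Split a state total among counties in proportion to the given weights."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    return {county: state_total * w / weight_sum for county, w in weights.items()}


# Hypothetical weights for two counties (for example, a population measure).
shares = apportion(1000.0, {1: 1.0, 3: 3.0})
```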

-------
                               Table A-1

     NEDS point sources with missing UTM coordinates and the "dummy" UTM
coordinates used to identify each point's Census county.  (The table is
printed sideways in the original and its entries are not legible.  Column
headings are the NEDS point record fields, STATE through EMISSION ESTIMATES
OF CO.)

-------
                               Table A-2

     Record descriptions of NEDS point sources placed in Census counties by
tracing the name and address of the point.  (The table is printed sideways
in the original and its entries are not legible.  Column headings are the
NEDS point record fields, including PLANT ID, CITY, UTM ZONE, YEAR OF RECORD,
and the five EMISSION ESTIMATES fields.)
-------
                              Table A-3

            TOTAL EMISSIONS SUMMED BY TYPE AND BY COUNTY

                                     VARIABLE SUM
COUNTY   NAME             PART        SO2        NOX         HC         CO

   1     Barnstable       1383      31682       8699        590       1202
   3     Berkshire        1363      10727       2017       1178       2192
   5     Bristol          6726     109956      36436      23459       2858
   7     Dukes              44          2         15         30        227
   9     Essex            2344      15274       3859       7246       1128
  11     Franklin          500       1297        328        731       1409
  13     Hampden         27003      55770      22517      18852       3433
  15     Hampshire        9260      22639       4915       8407       1633
  17     Middlesex        5176      12562       2860      28724       3248
  19     Nantucket          86          5         32        230        459
  21     Norfolk          1080       3905        939      16457        750
  23     Plymouth          173         11         64        928        922
  25     Suffolk          3325       7324       1658      38277        122
  27     Worcester       14437      19081       9228       6071      17810

-------
                              Table A-4

       MASSACHUSETTS AREA EMISSIONS TOTALS BY POPATRISK COUNTIES
                            (All State 22)

POPATRISK
COUNTY             TSP           SO2           NOX            CO

 0001          1,136.5       1,827.2       6,558.1      56,345.0
 0003          1,806.6       3,747.5       9,068.3      67,896.0
 0005          4,711.1      11,095.3      21,000.7     138,511.3
 0007             70.2         102.4         426.0       3,785.2
 0009          6,741.9      14,958.0      31,680.4     223,638.6
 0011            580.3       1,280.0       2,689.4      18,994.2
 0013          4,989.0      10,882.6      23,954.8     172,158.6
 0015          1,104.6       2,574.9       5,046.8      34,182.5
 0017         13,872.5      30,154.6      66,601.1     486,865.6
 0019             47.8          93.6         242.1       1,931.7
 0021          3,718.8      12,400.9      27,461.9     198,855.1
 0023          3,127.9       7,097.2      14,495.6     100,580.0
 0025          6,520.5      14,569.4      29,334.6     204,053.0
 0027          5,582.6      15,299.7      29,877.5     201,280.0

-------
                                   APPENDIX B
                   DOCUMENTATION OF DATA TAPES AND RUNSTREAMS


BUREAU OF THE CENSUS IN-OUT MIGRATION TAPE FILE DESCRIPTION


1.   File Title:   Migration by counties,  1970

2.   Technical Characteristics

    a.  Tape type and density
        9 track, 1600 BPI, EBCDIC,  odd parity, standard IBM OS labels.

    b.  Record/field size.  Fixed length
        1464 character records,  7320 character blocks.   Data records
        contain 120 characters for  geography and 168 8-character
        data cells.
3.   File Size:  3192 records

4.   File Sequence:  FIPS  state by FIPS county.
5.   File contains one record for each state and one record for each county.
    State records contain a "000" for county code and a record identifier
    of "A."  County records contain a valid county code and a blank record
    identifier.   There is a special record for New York City as a whole in
    addition to the individual counties in New York City.  This record is
    identified as state 36, county 000 and record identifier "B."
    There is also an additional  out-migrant  record in New York coded 36995
    which contains the tabulation of only records with a response of
    "New York City—no borough given" for 1965 and a residence code in 1970
    outside of New York City.  This out-migrant record,  however,  should not
    be used with much confidence.

                                RECORD LAYOUT

Matrix is:

Migration (2), Race (3),  Sex (2), Age (7), Allocation (2).  (168 cells).

    Geography                                        Characters

    State                                              1-2
    County                                             3-5
    Identifier                                         6
    Unused                                             7 - 119
    Dollar Sign                                        120

-------
    Data                                               121 - 1464

    In Migration                                       121 - 792

      Total All Races                                  121 - 344

        Male                                           121 - 232

          5-14 years
            Allocated                                  121 - 128
            Not Allocated                              129 - 136

          15-19 years
            Allocated                                  137 - 144
            Not Allocated                              145 - 152

          20-24 years
            Allocated                                  153 - 160
            Not Allocated                              161 - 168

          25-29 years
            Allocated                                  169 - 176
            Not Allocated                              177 - 184

          30-34 years
            Allocated                                  185 - 192
            Not Allocated                              193 - 200

          45-64 years
            Allocated                                  201 - 208
            Not Allocated                              209 - 216

          65 years and over
            Allocated                                  217 - 224
            Not Allocated                              225 - 232

        Female                                         233 - 344

          5-14 years
            Allocated                                  233 - 240
            Not Allocated                              241 - 248

          15-19 years
            •
            •

      White                                            345 - 568

        Male                                           345 - 456
        Female                                         457 - 568
            •
            •

      Black

        Male                                           569 - 680
            •
            •

        Female                                         681 - 792

-------
    Out  Migration
      Total all  races
      Male                                        793 -  904
      Female                                       905  -  1016

    White

      Male                                        1017  -  1128
        •
        •
        •

      Female                                      1129  -  1240
        •
        •

    Black

      Male                                        1241  -  1352
        •
        •
        •

      Female                                      1353  -  1464
In-migrants in this file also include persons who reported a foreign country as
their residence in 1965; however, the out-migrant category excludes persons who
had reported their residence in that county in 1965 but were living overseas
in 1970.  For counties that do not have any persons reporting a foreign
country as their residence in 1965, the count of unallocated in-migrants will
equal the number shown in Table 119 of the PC(1)-C report.
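
The character positions in the record layout follow directly from the matrix
order Migration (2), Race (3), Sex (2), Age (7), Allocation (2), with
allocation varying fastest.  The sketch below computes the positions of any
cell and reads its value.  It assumes the record has already been read from
tape and decoded from EBCDIC into a 1464-character string, and that each
8-character cell holds digits padded with blanks or zeros; the function and
label names are illustrative.

```python
RECORD_LENGTH = 1464      # characters per record
GEOGRAPHY_LENGTH = 120    # characters 1-120 hold geography
CELL_WIDTH = 8            # characters per data cell

MIGRATION = ("in", "out")
RACE = ("total", "white", "black")
SEX = ("male", "female")
N_AGE_GROUPS = 7
ALLOCATION = ("allocated", "not_allocated")


def cell_columns(migration, race, sex, age_index, allocation):
    """Return the 1-based (first, last) character positions of one data cell."""
    if not 0 <= age_index < N_AGE_GROUPS:
        raise ValueError("age_index must be between 0 and 6")
    index = MIGRATION.index(migration)
    index = index * len(RACE) + RACE.index(race)
    index = index * len(SEX) + SEX.index(sex)
    index = index * N_AGE_GROUPS + age_index
    index = index * len(ALLOCATION) + ALLOCATION.index(allocation)
    first = GEOGRAPHY_LENGTH + 1 + index * CELL_WIDTH
    return first, first + CELL_WIDTH - 1


def read_cell(record, migration, race, sex, age_index, allocation):
    """Read one count from a decoded record (blank cells are treated as 0)."""
    if len(record) != RECORD_LENGTH:
        raise ValueError("expected a 1464-character record")
    first, last = cell_columns(migration, race, sex, age_index, allocation)
    text = record[first - 1:last].strip()
    return int(text) if text else 0


# Synthetic record: state, county, blank identifier, unused filler, dollar
# sign, then 168 cells numbered 0..167 so each cell's value is its index.
record = "25" + "017" + " " + " " * 113 + "$" + "".join(
    f"{n:08d}" for n in range(168)
)
```

For example, `cell_columns("in", "white", "female", 0, "allocated")` gives the
start of the White Female block at character 457, in agreement with the layout.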

6.  Considerations Regarding File Content
    Allocation Method:
    The data presented in  this file are based on question 19 of the 1970 Census
    15 percent household sample questionnaire.  It requested those persons
    who reported living in a different house in 1965 to report the state,
    county and city,  town  or village  (if applicable) for their residence in
    1965.  Persons who indicated that  they lived in a different house but
    did not report its address were classified as "moved, residence not
    reported."

    An allocation procedure was developed for such nonresponses which
    assigned a 1965 state and county of residence based on the respondent's
    age, sex, race, college status and military status.  Such allocations are
    by design valid only to the county level of geography.

    To allow comparability with published reports, both allocated and
    unallocated migrant counts are shown for each county or  Census county
    division in the United States.

-------
Suppression

Data on in-migrants were suppressed for those counties that had fewer
than 34 weighted cases.  Specifically, these counties are:

  1)  Angoon County, Alaska (St. 02, County 030)
  2)  Hinsdale County, Colorado (St. 08, County 055)
  3)  Yellowstone National Park County, Montana (St. 30, County 113)

Data fields for these counties will contain zeroes.

Comparison with Published Data

In theory, the unallocated counts of migrants into any county developed
from this file should equal the number of persons living in a different
county plus those abroad in 1965.  (See Table 119, 125, or 145 of the
1970 Census State reports.)

However,  there will be in some counties fewer unallocated "different
county in-migrants"  than are shown in the published reports.  This stems
from the allocation procedure which altered the records for persons who
were in the military in either 1965 or 1970 and who reported "at sea in
1965" by substituting a county of residence in 1965.  However, if the
individual was not in the military either in 1965 or 1970, his "at sea"
response was not changed.

Some counties may also have more unallocated "different county in-migrants"
than are shown in the published reports.  This stems from the different
interpretations of a complete report for persons reporting a foreign
country as their residence in 1965.

The number of unallocated out-migrants shown on this file for a given
county may be compared to Table 2 of the PC(2)-2E Migration Between
State Economic Areas report provided that the given county is also a
state economic area.

Migration Between New York City Boroughs

A special code was assigned to those persons living in New York City in
1970 whose response for 1965 residence merely indicated "different house-
New York City."  Persons assigned this code were not included in either
the in- or out-migrant tallies for individual boroughs.

Because of this unique situation, a special record is provided for New
York City as a whole.  (This record is identified as State 36, County 000
and Record Identifier B.)  For this summary record, in-migrants exclude
those persons who reported their residence in 1965 as a New York City
borough and those persons coded as "place of residence-New York City-no
borough given."

In-and out-migrants  for specific boroughs include only those persons with
either allocated or  complete responses to residence five years ago.
Persons giving "New York City-no borough given" were not included as in-
or out-migrants from boroughs.

-------
MORTALITY WORKTAPE FILE DESCRIPTION-AGGREGATION OF 1969-71 MORTALITY
DATA INTO 50 ICDA CATEGORIES


1.   File Title:  Mortality Counts for 50 ICDA's

2.   Technical Characteristics
     a.   Tape type and density
          9 track, 1600 BPI,  standard UNIVAC labeled tape
     b.   Record/field size.  Fixed  length

          24005 character records, 24005 character blocks.  Data
          records contain 5 characters for state and county and
          3000 8 character data cells.

3.   File size:  3067 records

4.   File sequence.  File contains one record for each NCHS county,
     one record provided for New York City area as a whole with no
     borough breakdown.  Records for Virginia independent cities have
     been combined with their original county records.  Each record
     contains combined mortality counts for 1969-1971 broken out into
     50 categories (48 HERL categories + 2 approved neoplasm clusters),
     15 age groups, and 4 race-sex categories.
                              RECORD LAYOUT

Matrix is:

ICD # (50), Age (15), Sex Race (4),  (3000 cells).

            Geography                           Characters

State                                             1-2

County                                            3-5

Data                                              6 -24005

ICDA - 1

0-4 Age Group

    White Male                                    6-13
    White Female                                 14 - 21
    Non-white Male                               22 - 29
    Non-white Female                             30-37

-------
            Geography                           Characters

5-9 Age Group

    White Male                                   38 - 45
    White Female                                 46 - 53
    Non-white Male                               54 - 61
    Non-white Female                             62 - 69

10-14 Age Group                                  70 - 101

    White Male
    White Female
    Non-white Male
    Non-white Female

15-19 Age Group                                 102 - 133

    White Male
    White Female
    Non-white Male
    Non-white Female

20-24 Age Group                                 134 - 165

    White Male
    White Female
    Non-white Male
    Non-white Female

25-29 Age Group                                 166 - 197

    White Male
    White Female
    Non-white Male
    Non-white Female

30-34 Age Group                                 198 - 229

    White Male
    White Female
    Non-white Male
    Non-white Female

35-39 Age Group                                 230 - 261

    White Male
    White Female
    Non-white Male
    Non-white Female

40-44 Age Group                                 262 - 293

    White Male
    White Female
    Non-white Male
    Non-white Female

-------
            Geography                           Characters

45-49 Age Group                                 294 - 325

    White Male
    White Female
    Non-white Male
    Non-white Female

50-54 Age Group                                 326 - 357

    White Male
    White Female
    Non-white Male
    Non-white Female

55-59 Age Group                                 358 - 389

    White Male
    White Female
    Non-white Male
    Non-white Female

60-64 Age Group                                 390 - 421

    White Male
    White Female
    Non-white Male
    Non-white Female

65-69 Age Group                                 422 - 453

    White Male
    White Female
    Non-white Male
    Non-white Female

70 > Age Group                                  454 - 485

    White Male
    White Female
    Non-white Male
    Non-white Female

ICDA - 2 through ICDA - 50

0-4 Age Group through 70 > Age Group             486 - 24005

    White Male
    White Female
    Non-white Male
    Non-white Female
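
The positions in this layout follow from the matrix order ICDA (50), Age (15),
Race-Sex (4), with the race-sex category varying fastest and each cell
occupying 8 characters after the 5-character state and county code.  The
sketch below computes the positions of any cell; the function name and the
category labels are illustrative, not part of the tape documentation.

```python
GEOGRAPHY_LENGTH = 5   # characters 1-5 hold state and county
CELL_WIDTH = 8         # characters per data cell
N_ICDA = 50

AGE_GROUPS = (
    "0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39",
    "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70+",
)
RACE_SEX = ("white_male", "white_female", "nonwhite_male", "nonwhite_female")


def mortality_cell_columns(icda, age_group, race_sex):
    """Return the 1-based (first, last) character positions of one cell.

    icda is the category number, 1 through 50.
    """
    if not 1 <= icda <= N_ICDA:
        raise ValueError("icda must be between 1 and 50")
    index = (icda - 1) * len(AGE_GROUPS) + AGE_GROUPS.index(age_group)
    index = index * len(RACE_SEX) + RACE_SEX.index(race_sex)
    first = GEOGRAPHY_LENGTH + 1 + index * CELL_WIDTH
    return first, first + CELL_WIDTH - 1


first_cell = mortality_cell_columns(1, "0-4", "white_male")
last_cell = mortality_cell_columns(50, "70+", "nonwhite_female")
```

The last cell ends at character 24005, which matches the stated record length
(5 + 3000 x 8).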

-------
 POPATRISK DATA BASE BACKUP SYSTEM


     A data base backup system was used during development of the  POPATRISK

 data base to provide adequate archiving of updates during loading.  Backup

 procedures were used to protect against computer system failures as well as

 data processing errors.  The runstream documented below causes cycling of

 a specified number of backup tapes so that the current version of  the data

 base is copied to the tape containing the oldest version.


              @RUN
              @SYM PRINT$,,'SITEID'
              @ASG,T T.,F40,DSD087
              @REMARK* . GET CYCLED BACKUP TAPE AND ASSIGN
              @ED,R APPAR*POPTAPES.
              OFF DSPLIT
              SPLIT TAPE N N
              @ADD,P TAPE
              @REMARK* . COPY EACH DATA BASE FILE TO TAPE
              @COPY,GM 1POPATRISK--.,POPSAVE.
              @COPY,GM 2POPATRISK--.,POPSAVE.
              @COPY,GM 3POPATRISK--.,POPSAVE.
              @COPY,GM 4POPATRISK--.,POPSAVE.
              @COPY,GM 5POPATRISK--.,POPSAVE.
              @COPY,GM 6POPATRISK--.,POPSAVE.
              @COPY,GM 7POPATRISK--.,POPSAVE.
              @PRT,F 1POPATRISK--.
              @PRT,F 2POPATRISK--.
              @PRT,F 3POPATRISK--.
              @PRT,F 4POPATRISK--.
              @PRT,F 5POPATRISK--.
              @PRT,F 6POPATRISK--.
              @PRT,F 7POPATRISK--.
              @ED,U APPAR*POPTAPES.
              MOVE N
              LNPRINT!
              @FREE APPAR*POPSAVE
              @FIN


@RUN

     Data base backup requires  just over  five minutes SUP time.  Therefore,

 the run card should specify  10  minutes  run time.

-------
@SYM PRINT$,,'SITEID'

     This command is used to route output to the user's remote terminal.

Elimination of the command causes printing at the RTP on-site printer.

@ASG,T T.,F40,DSD087

     A temporary file is assigned on the disk pack to ensure that the pack is

mounted prior to accessing files.


@ED,R APPAR*POPTAPES.
OFF  DSPLIT
SPLIT TAPE N  N


     The UNIVAC editor  is invoked to access a file containing a list of

backup tape assignment  commands.  The SPLIT command copies the Nth tape

assignment command to the temporary element TAPE.

@ADD,P TAPE

     Temporary element  TAPE is added into the runstream causing assignment

of the selected tape.
             @COPY,GM 1POPATRISK--.,POPSAVE.
             @COPY,GM 2POPATRISK--.,POPSAVE.
             @COPY,GM 3POPATRISK--.,POPSAVE.
             @COPY,GM 4POPATRISK--.,POPSAVE.
             @COPY,GM 5POPATRISK--.,POPSAVE.
             @COPY,GM 6POPATRISK--.,POPSAVE.
             @COPY,GM 7POPATRISK--.,POPSAVE.

     Each of the seven SYSTEM 2000 data base files is copied to tape in

COPY,G format.

             @PRT,F 1POPATRISK--.
             @PRT,F 2POPATRISK--.
             @PRT,F 3POPATRISK--.
             @PRT,F 4POPATRISK--.
             @PRT,F 5POPATRISK--.
             @PRT,F 6POPATRISK--.
             @PRT,F 7POPATRISK--.

-------
     The Master File Directory entry for each file currently on the disk

pack is printed.

     @ED,U APPAR*POPTAPES.
     MOVE N
     LNPRINT!


     The tape  file is edited to move the tape command just used to the

top of the tape list, thus cycling the backup tapes.
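
     The cycling scheme can be modeled in a few lines.  The sketch below is
an interpretation under stated assumptions, not a transcription of the editor
commands: it assumes the tape list holds N entries with the most recently
used tape first, so that the Nth (last) entry is the tape holding the oldest
version.  The tape labels are hypothetical.

```python
def cycle_backup_tapes(tape_list):
    """Select the last (oldest) tape and move it to the top of the list.

    Returns (selected_tape, new_tape_list); the input list is not modified.
    """
    if not tape_list:
        raise ValueError("the tape list is empty")
    selected = tape_list[-1]
    return selected, [selected] + tape_list[:-1]


# Hypothetical tape labels, most recently used first.
tapes = ["TAPE-A", "TAPE-B", "TAPE-C"]
used, tapes_after_one_run = cycle_backup_tapes(tapes)
```

After N runs every tape has been overwritten once and the list returns to its
starting order, so at any time the N most recent versions are retained.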

    @FREE APPAR*POPSAVE
    @FIN

The backup tape is released and the run terminated.


     The runstream documented below is used in conjunction with the backup

runstream discussed above, and is used to copy a previous version of the

data base from tape to disk pack.
           @SYM PRINT$,,'SITEID'
           @ASG,T T.,F40,DSD087
           @ASG,T POPSAVE.,16N,TAPE #
           @DELETE,C 1POPATRISK--
           @ASG,C 1POPATRISK--,///45000,DSD087
           @COPY,G POPSAVE.,1POPATRISK--.
           @DELETE,C 2POPATRISK--
           @ASG,C 2POPATRISK--,///45000,DSD087
           @COPY,G POPSAVE.,2POPATRISK--.
           @DELETE,C 3POPATRISK--
           @ASG,C 3POPATRISK--,///45000,DSD087
           @COPY,G POPSAVE.,3POPATRISK--.
           @DELETE,C 4POPATRISK--
           @ASG,C 4POPATRISK--,///45000,DSD087
           @COPY,G POPSAVE.,4POPATRISK--.
           @DELETE,C 5POPATRISK--
           @ASG,C 5POPATRISK--,///45000,DSD087
           @COPY,G POPSAVE.,5POPATRISK--.
           @DELETE,C 6POPATRISK--
           @ASG,C 6POPATRISK--,///45000,DSD087
           @COPY,G POPSAVE.,6POPATRISK--.
           @DELETE,C 7POPATRISK--
           @ASG,C 7POPATRISK--,F40/0/POS/999,DSD087
           @COPY,G POPSAVE.,7POPATRISK--.
           @FREE 1POPATRISK--
           @FREE 2POPATRISK--
           @FREE 3POPATRISK--
           @FREE 4POPATRISK--
           @FREE 5POPATRISK--
           @FREE 6POPATRISK--
           @FREE 7POPATRISK--
           @PRT,F 1POPATRISK--
           @PRT,F 2POPATRISK--
           @PRT,F 3POPATRISK--
           @PRT,F 4POPATRISK--
           @PRT,F 5POPATRISK--
           @PRT,F 6POPATRISK--
           @PRT,F 7POPATRISK--
           @FREE POPSAVE.

       @RUN

Run card should contain 10 minutes run time.

       @SYM PRINT$,,'SITEID'
       @ASG,T T.,F40,DSD087

Same as in backup runstream.

       @ASG,T POPSAVE.,16N,TAPE #

Assign tape containing  version  of data base to be restored.

       @DELETE,C 1POPATRISK--
       @ASG,C 1POPATRISK--,///45000,DSD087
       @COPY,G POPSAVE.,1POPATRISK--.
       @DELETE,C 2POPATRISK--
       @ASG,C 2POPATRISK--,///45000,DSD087
       @COPY,G POPSAVE.,2POPATRISK--.
       @DELETE,C 3POPATRISK--
       @ASG,C 3POPATRISK--,///45000,DSD087
       @COPY,G POPSAVE.,3POPATRISK--.
       @DELETE,C 4POPATRISK--
       @ASG,C 4POPATRISK--,///45000,DSD087
       @COPY,G POPSAVE.,4POPATRISK--.
       @DELETE,C 5POPATRISK--
       @ASG,C 5POPATRISK--,///45000,DSD087
       @COPY,G POPSAVE.,5POPATRISK--.
       @DELETE,C 6POPATRISK--
       @ASG,C 6POPATRISK--,///45000,DSD087
       @COPY,G POPSAVE.,6POPATRISK--.
       @DELETE,C 7POPATRISK--
       @ASG,C 7POPATRISK--,F40/0/POS/999,DSD087
       @COPY,G POPSAVE.,7POPATRISK--.


-------
     Each of the seven disk pack files is deleted and reassigned on the disk
pack, and the file is then copied from tape to disk pack.


            @FREE 1POPATRISK--
            @FREE 2POPATRISK--
            @FREE 3POPATRISK--
            @FREE 4POPATRISK--
            @FREE 5POPATRISK--
            @FREE 6POPATRISK--
            @FREE 7POPATRISK--

     Free each disk pack file  so the Master File Directory can be updated
with current file information.


           @PRT,F 1POPATRISK--.
           
-------
                               APPENDIX C

                              OTHER STUDIES

SURVEY OF 1973 MONITORING ACTIVITY

     This section documents an initial survey made of monitoring stations
active in 1973 by county in the contiguous United States for seven major
air pollutants:  TSP, SO2, NO2, CO, Ozone, Suspended Nitrates and Suspended
Sulfates.  Tabulations of levels of monitoring activity were made for all
counties in the 48 contiguous states (Alaska and Hawaii were not considered).
In addition, seven maps were produced, one for each pollutant considered,
on which the number of monitoring stations in each county is written within
that county on U.S. Department of Commerce maps of the United States, stock
number 0301-1895.

     This survey was undertaken in order to:

     1)   Determine the geographical distribution of air monitoring
          activity for the seven pollutants of interest in an easily
          comprehended pictorial form.
     2)   Determine the percentages of U.S. population on a state and
          national basis residing in counties for which the air quality
          is monitored by numbers of stations falling into the following
          categories:  at least 1, at least 2, at least 3, at least 5,
          at least 10.
     3)   Provide a sound basis for judgment on the extent to which
          methods for augmenting the existing air quality data should
          be investigated.
                                  88

-------
 Methodology

 Sources of Data—

     The primary source of data relating to air quality monitoring sites
was the publication, Directory of Air Quality Monitoring Sites Active in
 1973 (Directory), EPA-450/2-75-006.  1973 was chosen as the year investi-
 gated because of the ready availability of this siting information.
The Directory lists siting information, pollutant monitored, method used,
and other data on a state but not strictly a county basis.   In order to
obtain an accurate count of monitoring sites by county it was necessary
to keep a running tabulation while scanning by hand through the states
listed.  A computer program to perform this task was considered but
rejected as being much more costly to develop and implement.

     Population data for the year 1972 were considered sufficiently
accurate and were obtained from the 1972 County and City Data Book.  A
listing (SAROAD file name NADB-PARMFL) of methods, by pollutant,  in use for
monitoring purposes was obtained from Mr. A. A. Slaymaker,  Chief,  Data
Processing Section, National Air Data Branch.   Another publication found
useful in identifying quickly the location and names of counties was
Office of Air Programs' Federal Air Quality Control Programs, AP-102.

Tabulation of Monitoring Methods Included in and Excluded from the Survey—

     A number of stations listed in the Directory used methods which
were deemed unacceptable or unsatisfactory for the purposes of this survey.
For example,  methods yielding results not directly relatable to air quality
standards were deemed unsatisfactory.   Although a count of  all methods in
use was kept, the results reported are only for those methods considered
to yield reliable air quality data.   In Table  C-l we tabulate, for each
pollutant surveyed, a list of methods included in the count and alongside
(if applicable)  a list of methods  excluded.   In no case would inclusion of
                                   89

-------
                TABLE C-1.  LISTING OF MONITORING METHODS

                  INCLUDED  IN  AND  EXCLUDED  FROM SURVEY
Pollutant
     Methods Included
    Methods Excluded
   SO2
flame photometric
West—Gaeke colorimetric
conductimetric
pulsed-fluorescent
pararosaniline-sulfuric acid
total sulfur flame
  photometric
instrumental coulometric
polarographic
Hydrogen Peroxide NaOH
  titration
Sequential-conductimetric
Hydrogen Peroxide
   NO2
Colorimetric-Lyshkow (mod.)
coulometric
chemiluminescence
Sodium Arsenite - frit
NASN Sodium Arsenite -
  orifice
TEA method - frit
TGS method - frit
TGS method - orifice
Griess-Saltzman
polarographic
TEA Method-orifice
Sulfates
Colorimetric
turbidimetric
methyl thymol blue (ASTM)
barium sulfate (ASTM)
Nitrates
2-4 Xylenol
reduction-diazo coupling
specific ion electrode
phenol-disulfonic acid
ultraviolet-spectrophoto-
  metric
 Ozone
chemiluminescence
ultraviolet DASIBI CORP
total oxidant - 0.2
  (NO + NO )

total oxidant colorimetric
   neutral KI
  TSP
Hi-Vol
Membrane Sampler
millipore filter
cassette
sticky paper
soil index tape sampler
bucket gravimetric
nephelometer
smoke shade
cyclone
  CO
nondispersive infra-red
gas chromatographic
dual isotope fluorescence

detection tube
catalytic comb-thermal
  detector

-------
 unacceptable and/or unsatisfactory methods in the count materially alter
 the results of the survey.  We are pleased to acknowledge the assistance of
 Mr. L. J. Purdue of the Environmental Monitoring and Support Lab, Research
 Center, RTP, in the technical judgment of the reliability and
 applicability of monitoring methods.

 The Case of Massachusetts—

     Massachusetts  presented some  special problems in performing this
 survey.  As of  the  date  of this report, no  EPA guidelines exist for a
 clear cut  separation of  Massachusetts  geography into counties.  Many areas
 are  specified as  townships.   The 1970  Census, however, divides Massachusetts
 into 14  counties  and it was  decided to use  this division as the basis for
 deriving siting information.   The Directory specified the AQCR within
which monitoring  stations were located.  By determining the exact  location
 of each  monitoring station within an AQCR we were able to determine the
 corresponding Census county.   Census data then provided the necessary
population  figures.

Results

Maps—
     Figure C-1 presents seven maps of selected areas in the U.S., one
 for each of the seven pollutants of interest.  Each map presents numbers
 written within each county's borders indicating how many stations are
 monitoring a particular pollutant.  No numbers are shown for counties with
 no monitoring stations.  Complete U.S. maps by county are being provided
 to EPA under separate cover.

Tables—

     Table C-2 presents tabulations, one for each of the seven air
pollutants of interest, in which the results of this survey are summarized
on a state basis.  For each of the 48 states studied, the entries show the
percentage of state population residing in counties (within the state)
having at least 1, at least 2, at least 3, at least 5, and at least 10
stations monitoring that particular pollutant.  Finally, the corresponding
percentages for the 48 states taken together are given.
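
The tabulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the survey itself was tallied by hand, and the record layout (state, county population, station count), the function name `coverage_by_state`, and the sample figures are all hypothetical.

```python
# Hypothetical county records: (state, county population, number of stations
# monitoring one pollutant).  None of these figures come from the survey.
THRESHOLDS = (1, 2, 3, 5, 10)


def coverage_by_state(counties, thresholds=THRESHOLDS):
    """Return {state: {k: percent of state population living in counties
    with at least k stations}} for each threshold k."""
    totals = {}
    covered = {}
    for state, population, stations in counties:
        totals[state] = totals.get(state, 0) + population
        row = covered.setdefault(state, {k: 0 for k in thresholds})
        for k in thresholds:
            if stations >= k:
                row[k] += population
    return {
        state: {k: round(100.0 * covered[state][k] / totals[state])
                for k in thresholds}
        for state in totals
    }


counties = [
    ("XX", 500_000, 12),  # made-up county with 12 stations
    ("XX", 300_000, 3),   # made-up county with 3 stations
    ("XX", 200_000, 0),   # made-up county with no stations
]
result = coverage_by_state(counties)
print(result["XX"])
```

Because the categories are cumulative ("at least k"), each state's percentages can only stay level or fall from left to right across a row.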
                                   91

-------
Figure C-1.  1973 TSP Monitoring
   Station Sample Distribution
              92

-------
Figure C-1.  1973 SO2 Monitoring
  Station Sample Distribution
               93

-------
Figure C-1.  1973 NO2 Monitoring
  Station Sample Distribution

              94

-------
Figure C-1.  1973 CO Monitoring Station
             Sample Distribution
               95

-------
Figure C-1.  1973 Ozone Monitoring
    Station Sample Distribution
               96

-------
Figure C-1.  1973 Sulfates Monitoring
    Station Sample Distribution
                     97

-------
Figure C-1.  1973 Nitrates Monitoring
      Station Sample Distribution
                  98

-------
TABLE C-2.   PERCENTAGE OF STATES'  POPULATION IN COUNTIES
                WITH ONE OR MORE MONITORING STATIONS

                             1973 TSP

State       at least  at least  at least  at least  at least
                1         2         3         5        10
AL             77        52        48        43        25
AZ             97        90        84        75        55
AR             61        36        27         3         0
CA             93        82        71        35        35
CO             93        87        73        27         0
CT             92        92        89        78        78
DE            100       100        70        70        70
FL             81        71        62        51        27
GA             56        37        37        17        17
ID             58        58        23        16         0
IL             84        77        71        65        54
IN             66        47        45        42        33
IA             53        35        22        16        10
KS             77        43        41        41         0
KY             33        18        14        11         0
LA             43        31         0         0         0
ME             74        56        43        32         0
MD             81        72        67        63        53
MA             90        80        79        71        36
MI             85        78        69        66        47
MN             69        58        54        49        44
MS             54        22        16        16         0
MO             71        63        54        54        48
MT             60        42        32        27         0
NB             67        45        38        38        11
NV             98        86        83        81        81
NH            100        97        76        16         0
NJ             84        78        77        49         8
NM             82        66        56        51         0
NY            100        94        90        83        48
NC             70        47        39        29        11
ND             59        38        12         0         0
OH             69        61        58        51        42
OK             92        61        45        36        21
OR             94        74        49        45        27
PA             81        78        67        56        30
RI            100        95        80        70        61
SC             85        51        49        37        25
SD             31        23        14         0         0
TN             80        68        53        47        36
TX             71        60        55        55        52
UT             68        55        43        43         0
VT             62        12         0         0         0
VA             82        63        44        27        17
WA             95        77        61        54        34
WV             46        46        34        17        13
WI             73        60        49        44        28
WY             67        30         0         0         0
Total US       79        66        58        49        34
                          99

-------
TABLE C-2.   PERCENTAGE OF STATES'  POPULATIONS IN COUNTIES
                WITH ONE OR MORE MONITORING STATIONS

                              1973 SO2
State
AL
AZ
AR
CA
CO
CT
DE
FL
GA
ID
IL
IN

-------
TABLE C-2.  PERCENTAGE OF STATES' POPULATION IN COUNTIES
                WITH ONE OR MORE MONITORING STATIONS

                              1973 NO2

State       at least  at least  at least  at least  at least
                1         2         3         5        10
AL             35        26        21        21         0
AZ             79        75         0         0         0
AR             17        15         0         0         0
CA             96        82        77        60        35
CO             23        23        23        23         0
CT            100        93        90        82        26
DE             85        70         0         0         0
FL             47        34        27        19         0
GA             21        21        13        13        13
ID              0         0         0         0         0
IL             53        53        49        49        49
IN             56        42        27        11        11
IA             16        13        10        10         0
KS             43        31         7         0         0
KY             72        52        44        38        23
LA             30        30         0         0         0
ME             23         4         0         0         0
MD             74        64        58        56        16
MA             52         0         0         0         0
MI             48        45        40         0         0
MN             52        47        44        38        25
MS             14        14         0         0         0
MO             37        37        34        34        20
MT              2         0         0         0         0
NB             38        11         0         0         0
NV             81         0         0         0         0
NH              5         0         0         0         0
NJ             50        49        35         0         0
NM             36        31         0         0         0
NY             48        24        15         0         0
NC             24        20         3         3         0
ND              0         0         0         0         0
OH             55        54        23         9         0
OK             37        21        21         0         0
OR             27        27         0         0         0
PA             50        20        17        17        17
RI             95        70        70        61        61
SC             79        47        45        38         0
SD              1         1         1         1         0
TN             39        30         0         0         0
TX             61        51        21        16         0
UT             43         0         0         0         0
VT              0         0         0         0         0
VA             19         0         0         0         0
WA             54        34         0         0         0
WV             17         0         0         0         0
WI             44         7         0         0         0
WY             21         0         0         0         0
Total US       51        38        27        18        10
                            101

-------
TABLE C-2.  PERCENTAGE OF STATES' POPULATION IN COUNTIES
                WITH ONE OR MORE MONITORING STATIONS

                              1973 CO

State       at least  at least  at least  at least  at least
                1         2         3         5        10
AL             10         0         0         0         0
AZ             75        75         0         0         0
AR              0         0         0         0         0
CA             93        75        61        38        35
CO             42        23        23        23         0
CT             78        26         0         0         0
DE              0         0         0         0         0
FL             24        19         0         0         0
GA             13        13         0         0         0
ID              0         0         0         0         0
IL             52        49        49        49         0
IN             15         0         0         0         0
IA             16         0         0         0         0
KS             46        24         0         0         0
KY              5         1         1         0         0
LA             16         0         0         0         0
ME              0         0         0         0         0
MD             66        61        44        23        23
MA             70        46        38         0         0
MI             39        30        30         0         0
MN              6         0         0         0         0
MS              0         0         0         0         0
MO             50        34        34        34         0
MT              0         0         0         0         0
NB             38         0         0         0         0
NV             81         0         0         0         0
NH             30        30         0         0         0
NJ             94        44        22         0         0
NM             43        43        43        36         5
NY             31        21         8         8         0
NC              3         0         0         0         0
ND              0         0         0         0         0
OH             36        27        16         0         0
OK             36        21        21         0         0
OR             37        27         0         0         0
PA             30        30        17         0         0
RI             61        61        61        61         0
SC              3         0         0         0         0
SD              0         0         0         0         0
TN             33        30        18         0         0
TX             42         0         0         0         0
UT             68        43         0         0         0
VT             22         0         0         0         0
VA             32        10        10         0         0
WA             64        42        42         0         0
WV             13         0         0         0         0
WI             18        24        24         0         0
WY              0         0         0         0         0
Total US       41        27        20        11         6
                          102

-------
TABLE C-2.   PERCENTAGE OF STATES'  POPULATIONS IN COUNTIES
                WITH ONE OR MORE MONITORING STATIONS

                        1973 OZONE

State       at least  at least  at least  at least  at least
                1         2         3         5        10
AL              0         0         0         0         0
AZ             75         0         0         0         0
AR              0         0         0         0         0
CA             96        83        73        50        35
CO             42        23        23        23         0
CT             82        26        26        26         0
DE              0         0         0         0         0
FL             41        27        19         0         0
GA             13         0         0         0         0
ID              0         0         0         0         0
IL             51        49        49         0         0
IN             26        15         0         0         0
IA             16         0         0         0         0
KS             33        24         0         0         0
KY             33        27         0         0         0
LA             28        16         0         0         0
ME              0         0         0         0         0
MD             46        46        29         0         0
MA             59        38        25         0         0
MI             30         0         0         0         0
MN             38         0         0         0         0
MS             15        14         0         0         0
MO             37        20        20        20        20
MT              0         0         0         0         0
NB             26         0         0         0         0
NV             56        56         0         0         0
NH             35         0         0         0         0
NJ             39        15         0         0         0
NM             31        31        31        31         0
NY             31         9         0         0         0
NC             10         7         0         0         0
ND              0         0         0         0         0
OH             46        31        14         6         0
OK             36        21         0         0         0
OR             34        27         0         0         0
PA             17        17        17        17         0
RI             61        61         0         0         0
SC              9         0         0         0         0
SD              0         0         0         0         0
TN             35        33        18         0         0
TX             22        14         0         0         0
UT             68        43         0         0         0
VT              0         0         0         0         0
VA             23         5         5         0         0
WA             34        34        34        34         0
WV              0         0         0         0         0
WI             24        23         0         0         0
WY              0         0         0         0         0
Total US       37        25        16         8         4

                          103

-------
TABLE C-2.  PERCENTAGE OF STATES'  POPULATIONS IN COUNTIES
                WITH ONE OR MORE MONITORING STATIONS

                        1973 SULFATES

State       at least  at least  at least  at least  at least
                1         2         3         5        10
AL              0         0         0         0         0
AZ             97        86        78        75        55
AR              0         0         0         0         0
CA              0         0         0         0         0
CO              0         0         0         0         0
CT             90        86        56        56        51
DE              0         0         0         0         0
FL              0         0         0         0         0
GA             60        19         8         0         0
ID              0         0         0         0         0
IL              0         0         0         0         0
IN             42        19        12        12         0
IA              0         0         0         0         0
KS              0         0         0         0         0
KY             24        12         5         5         2
LA              0         0         0         0         0
ME              0         0         0         0         0
MD             55        45        43        23         8
MA              0         0         0         0         0
MI              0         0         0         0         0
MN             69        59        56        49        44
MS              0         0         0         0         0
MO             21        17        14        14         0
MT              8         8         8         0         0
NB              0         0         0         0         0
NV             25        25        25        25        25
NH              0         0         0         0         0
NJ              0         0         0         0         0
NM              0         0         0         0         0
NY             44        11         4         4         4
NC              0         0         0         0         0
ND             59        59        31        12         0
OH              9         9         9         9         9
OK              0         0         0         0         0
OR              0         0         0         0         0
PA              0         0         0         0         0
RI              0         0         0         0         0
SC              0         0         0         0         0
SD              0         0         0         0         0
TN              7         3         0         0         0
TX             17        17        17        17        17
UT              0         0         0         0         0
VT              0         0         0         0         0
VA              0         0         0         0         0
WA              0         0         0         0         0
WV              0         0         0         0         0
WI              0         0         0         0         0
WY              0         0         0         0         0
Total US       14         8         6         5         4
                           104

-------
TABLE C-2.   PERCENTAGE OF STATES'  POPULATIONS IN COUNTIES
                WITH ONE OR MORE MONITORING STATIONS

                    1973 NITRATES

State       at least  at least  at least  at least  at least
                1         2         3         5        10
AL              9         0         0         0         0
AZ             97        86        81        75        55
AR              0         0         0         0         0
CA              0         0         0         0         0
CO              0         0         0         0         0
CT              0         0         0         0         0
DE              0         0         0         0         0
FL              0         0         0         0         0
GA              0         0         0         0         0
ID              0         0         0         0         0
IL              0         0         0         0         0
IN              0         0         0         0         0
IA              0         0         0         0         0
KS              0         0         0         0         0
KY              0         0         0         0         0
LA              0         0         0         0         0
ME              0         0         0         0         0
MD              0         0         0         0         0
MA              0         0         0         0         0
MI              0         0         0         0         0
MN              0         0         0         0         0
MS              0         0         0         0         0
MO              0         0         0         0         0
MT              0         0         0         0         0
NB              0         0         0         0         0
NV              0         0         0         0         0
NH              0         0         0         0         0
NJ              0         0         0         0         0
NM              0         0         0         0         0
NY             44        11         4         4         4
NC              0         0         0         0         0
ND             59        31        12         0         0
OH              0         0         0         0         0
OK              0         0         0         0         0
OR              0         0         0         0         0
PA              0         0         0         0         0
RI              0         0         0         0         0
SC              0         0         0         0         0
SD              0         0         0         0         0
TN              0         0         0         0         0
TX              0         0         0         0         0
UT              0         0         0         0         0
VT              0         0         0         0         0
VA              0         0         0         0         0
WA              0         0         0         0         0
WV              0         0         0         0         0
WI              0         0         0         0         0
WY              0         0         0         0         0
Total US        5         2         1         1         9
                          105

-------
 CORRELATION AND REGRESSION STUDIES:  TSP-NEDS EMISSIONS

 BACKGROUND

     Results of the survey of monitoring activity show that 79% of the
U.S. population (1970 Census data) resided in counties having at least
one station monitoring TSP in 1973.  Even if all stations reported valid
data, there were nevertheless more than 42 million Americans whose
resident counties were not being monitored for TSP.  Instrumental
techniques for monitoring this pollutant are well established and stable,
so one would expect the ambient monitoring data to be reliable.  It
was therefore decided to undertake a study to investigate possible
correlations between TSP and various emissions and meteorological parameters
on a county basis.  The results of these studies were used to augment the
air quality data for counties with no monitoring for TSP but which had
emissions for TSP and NOx lying within the range of the regression analysis.

METHODOLOGY AND RESULTS
     A file of mean annual ambient air quality for all stations reporting
valid TSP data to SAROAD in 1973 was assembled.   Although exploratory
studies were made using all such data,  the final work reported here was
performed using only data from stations sited to provide ambient air quality
measurements for the resident population.  Table C-3 lists the numbers of
counties in various categories referring to level of monitoring activity
in those individual counties.  Thus the entry '114' under category 'N≥4'
means that a total of 114 counties had 4 or more stations reporting valid
yearly SAROAD readings for TSP.  The 'N≥1' category contains the entire
data set.
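
The cumulative categories of Table C-3 can be illustrated with a short Python sketch. The function name `cumulative_counts` and the station counts below are made up for the example; they are not the SAROAD figures.

```python
def cumulative_counts(stations_per_county, max_n=10):
    """For n = 1..max_n, count the counties having at least n stations
    that reported valid data (the 'N >= n' categories)."""
    return {
        n: sum(1 for s in stations_per_county if s >= n)
        for n in range(1, max_n + 1)
    }


# Made-up numbers of valid stations in seven hypothetical counties.
stations = [1, 1, 2, 4, 4, 7, 12]
table = cumulative_counts(stations)
for n in sorted(table, reverse=True):
    print(f"N>={n}: {table[n]} counties")
```

Each category contains every category to its left in the table, which is why the counts grow steadily toward the 'at least 1' column.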

     The division of valid stations into these categories was made for the
following reasons:
     1.   It was intended to  develop a regression model relating mean annual
TSP ambient air quality to NEDS emission values independently for data in
a number of these categories.  The detailed forms of the models thus
obtained could be compared and the model giving the best 'overall' fit
selected.  It is, of course, essential that the functional form of the
                               106

-------
 regression model does not change as more or fewer stations, selected on
 some rational basis, are included in the data set.

     2.   The model selected as a result of the above considerations must
 then be tested against a known data set.  This data set could come only
 from the valid SAROAD readings.  The procedure followed was to derive the
 model from a comparison of results in the N≥4, N≥6, N≥7, N≥10 subgroups
 and then to compare the predicted and measured values for the entire 605
 point data set.

     3.   It is of some interest to determine the maximum correlation co-
 efficient obtainable as the N categories are scanned since this result has
 implications for determining how representative a given station's monitoring
 is of emissions in the county.

     For each county reporting some valid yearly data to SAROAD, both the
 geometric and arithmetic means of the annual average for all stations reporting
 valid data within the county were computed.  Since air pollution data tend
 to be lognormally distributed, the geometric means of the annual averages
 will also be lognormally distributed if one assumes that the ambient air
 quality annual averages at stations within a county are independently
 distributed.  Although this is not likely to be a good assumption, it was
 felt important to study correlations of the geometric means as well as the
 arithmetic means because of the simplicity of the resulting distribution.
 Actually, the results of the study showed no significant difference in the
 structure of the models resulting from the use of either of these alternative
 averaging procedures.  The magnitudes of the correlation coefficients were
 likewise similar.
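
As a small illustration of the two averaging procedures, the following Python sketch computes both county-level means from station annual averages. The helper name `county_means` and the two station values are hypothetical.

```python
from statistics import fmean, geometric_mean


def county_means(station_annual_averages):
    """Return (arithmetic mean, geometric mean) of the annual averages
    reported by the stations within one county."""
    return fmean(station_annual_averages), geometric_mean(station_annual_averages)


# Made-up annual average TSP values for two stations in one county.
arithmetic, geometric = county_means([50.0, 80.0])
print(round(arithmetic, 2), round(geometric, 2))
```

For positive data the geometric mean never exceeds the arithmetic mean, and the two move closer together as the station values within a county become more alike.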

     Standard correlation and regression analyses were run for a variety
 of empirical models using the SAS (Statistical Analysis System) available
 on EPA's UNIVAC 1110.  After experience with numerous trials the following
requirements for an acceptable model were adopted:
     1.   A correlation coefficient, R, of 0.7 or greater,
     2.   Parsimony and simplicity of the resulting model,
     3.   Stability of model parameters with respect to variation of
          the N category of the data set.

                                107

-------
Table C-4 shows the maximum correlation values obtained for the four N
categories used as a function of the number of parameters used in the
regression model.  The possible parameters in the model are listed at the
top of the table, and the parameter sets which maximized R are
listed in the table proper.  Note that 'TSP', which is the NEDS total TSP
county emission value, makes the major contribution to explaining the data
variance in all cases.  Significant contributions are made also by high
powers of NOx and HC NEDS county total emissions.  It should also be
noted that the maximum value for R obtainable with this parameter set
decreases monotonically as the N threshold is lowered, i.e., as counties
with fewer stations are admitted to the data set.

     The model chosen as giving the best 'overall fit' as a result of many
trials and comparison of maximum and near maximum values for R was a linear
regression model having NEDS county total TSP, (NOx)^4, and (NOx)^5 as the
independent variables.
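
A least-squares fit of a model of this form can be sketched as below. This is not the SAS procedure used in the study: the intercept term, the helper names (`design_row`, `solve`, `fit`), the coefficient values, and the scaled emission figures are all assumptions made for illustration.

```python
def design_row(tsp, nox):
    """Predictors for one county: intercept (assumed), TSP, NOx^4, NOx^5."""
    return [1.0, tsp, nox ** 4, nox ** 5]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit(emissions, air_quality):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    rows = [design_row(tsp, nox) for tsp, nox in emissions]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, air_quality)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic, scaled emissions (TSP, NOx) and noise-free air quality values
# generated from made-up coefficients, so the fit should recover them.
true = [40.0, 2.0, 5.0, -3.0]
emissions = [(1.0, 0.2), (2.0, 1.4), (3.0, 0.6), (4.0, 1.0),
             (5.0, 0.4), (6.0, 1.2), (7.0, 0.8), (8.0, 1.6)]
air_quality = [sum(b * v for b, v in zip(true, design_row(t, n)))
               for t, n in emissions]
coefficients = fit(emissions, air_quality)
print([round(c, 4) for c in coefficients])
```

The emissions are scaled to order one in this sketch because fourth and fifth powers of large raw totals would make the normal equations badly conditioned; a production fit would more likely use a QR-based routine.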

     Table C-5 gives a comparison between the maximum correlation
coefficients from Table C-4 for a three parameter model and the R values
obtained from the chosen theory for the four N categories used.  Except for
the N≥10 category, differences in R are in the third decimal place.

     Table C-6 gives a similar comparison, for the N≥7 category, between the
chosen model and a sampling of other 'simple' models.  Here again the
differences are small.

     While it is our judgment that the model chosen gives the best overall
fit to the data in the N≥4, N≥6, N≥7 and N≥10 categories, the differences in
fit between it and alternative models are small enough that it does not
seem appropriate to attempt a theoretical justification of the
resulting functional form.  Rather, the model should be viewed as a purely
empirical means of calculating TSP air quality in counties for which data
are missing or inadequate.  In this regard, it should be noted that an
attempt was made to impose the theoretically justifiable functional
form TSP/U (where U is the county mean annual wind speed) on the regression
model, with disappointing results.
                                108

-------
     While the model should be viewed as purely empirical, it is
nevertheless important to ensure that the choice of high powers of the NOx
NEDS county total emissions is not the result of a few outlying data
points.  Accordingly, 5 points of the 58 point data set for the N≥7
category were chosen for removal from the data set.  The choice was made
as follows:

     1.   A single parameter model using only NEDS county total TSP
emissions was used to predict air quality in the 58 counties of interest.
The 5 counties giving the worst agreement with measured averages were then
removed from the data set and the resulting 53 point data set was
reanalyzed.  Table C-7 gives the result of this analysis by showing a
comparison between the 1st and 2nd best values for R for the two data sets
along with the resulting variable sets.  It can be seen that the model chosen
holds up quite well, differences in R being mainly in the third decimal place.
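
The outlier screening step can be sketched as follows. The report does not say how "worst agreement" was measured; this sketch assumes the absolute difference between measured and predicted values, and the function names and data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def drop_worst(x, y, k):
    """Fit the single parameter model, then remove the k points whose
    measured values agree worst with the prediction (assumed criterion:
    largest absolute residual)."""
    slope, intercept = fit_line(x, y)
    errors = [abs(yi - (slope * xi + intercept)) for xi, yi in zip(x, y)]
    keep = sorted(range(len(x)), key=lambda i: errors[i])[: len(x) - k]
    keep.sort()
    return [x[i] for i in keep], [y[i] for i in keep]


# Made-up emissions (x) and measured air quality (y); the last point is an
# obvious outlier from the otherwise exact line y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 30.0]
kept_x, kept_y = drop_worst(x, y, k=1)
slope, intercept = fit_line(kept_x, kept_y)
print(kept_x, round(slope, 3), round(intercept, 3))
```

After the reduced data set is obtained, the full multi-parameter search would be rerun on it, as the comparison in Table C-7 does for the 53 point set.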

     Having chosen the model, we calculated the TSP ambient air quality for
all 605 counties for which valid data were available using regression
coefficients derived from the N≥7 category data.  The ratios of measured to
calculated values were then computed and the results plotted on the
frequency graph, Figure C-2.  In this figure, the computed ratios appear as
the discontinuous curve labelled 'data' while the smooth curve is a fitted
normal distribution.  It can be seen from this figure that the empirical
theory developed predicts the TSP yearly average ambient air quality within
about ±37% for a 67% confidence interval.  The population variance σ²
and its associated 95% confidence interval are σ² = .076 ± .009.
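
The ratio statistics behind a plot such as Figure C-2 can be sketched in Python as below. The function name `ratio_summary` and the four measured and calculated values are invented for illustration; the sketch uses the sample variance and does not reproduce the normal-curve fitting or the confidence interval on the variance.

```python
from statistics import fmean, variance


def ratio_summary(measured, calculated):
    """Return (mean, sample variance, fraction within one standard
    deviation of the mean) for the ratios measured / calculated."""
    ratios = [m / c for m, c in zip(measured, calculated)]
    mean = fmean(ratios)
    var = variance(ratios)  # sample variance, divisor n - 1
    sd = var ** 0.5
    within = sum(1 for r in ratios if abs(r - mean) <= sd) / len(ratios)
    return mean, var, within


# Made-up measured (SAROAD-style) and model-calculated county values.
measured = [50.0, 60.0, 70.0, 80.0]
calculated = [50.0, 50.0, 70.0, 100.0]
mean, var, within = ratio_summary(measured, calculated)
print(round(mean, 3), round(var, 4), within)
```

A ratio distribution centered near 1 with a small variance indicates that the model neither systematically over-predicts nor under-predicts the measured county averages.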

     Part of the variance of the prediction is attributable to an 'inherent'
variability in the data which results from the fact that within a given
county measured air quality at monitoring stations itself varies.  Figure C-3
gives a measure of this variability.  The ratios of individual station data
within a county to the geometric mean for all stations within that county were
computed and plotted on this figure.  The variance of the resulting fitted
normal curve and its associated 95% confidence interval are σ² = .056 ± .004.
                                109

-------
Figure C-4 shows the two fitted normal curves from Figures C-2 and C-3 plotted
to the same scale and superimposed.  Comparison of the two variances (.076 for
the prediction and .056 for the inherent variability) indicates
that not much improvement in the predictive value of the regression model
can be obtained.  We are, however, currently investigating the use of
exp(a NOx) as a possible parameter primarily because of the simplicity of its
functional form.

     Figure C-5 shows the variation of the square of the correlation
coefficient, R², for the chosen model with the number of points included in
the correlation data set.  The numbers on the curve indicate the corresponding
N category.  Provided that errors in NEDS emission data show no systematic
correlation with N category (which seems a reasonable assumption), the
monotonic decrease of R² as the N threshold is lowered indicates that
representativeness of population based monitoring requires something in the
neighborhood of 7 or more stations per county with present siting strategies.
Thus, for example, if one were constrained to a fixed number of monitoring
stations nationwide and/or if one were interested in reducing the amount of
monitoring with little or no reduction in the overall quality of monitoring
data, the results presented here are highly suggestive of a strategy which
would consist of eliminating monitoring in most or all counties having only a
few stations monitoring population ambients and instead concentrating one's
effort to place at least seven stations in a smaller number of counties
representative of a wide range of NEDS emissions for TSP and NOx nationwide.

     In terms of the primary aim of  this contract, i.e., to develop a data
base including air quality measures  for all counties in the contiguous
United States,  the regression model  presented in this report would seem
quite adequate for filling in some missing TSP data.

            TABLE C-3.  CUMULATIVE FREQUENCY TABLES FOR VALID 1973
                        SAROAD POPULATION BASED MONITORING OF TSP

Category N:  Number of Stations in County Reporting Valid Population
             Based TSP Data to SAROAD in 1973

                      N≥10  N≥9  N≥8  N≥7  N≥6  N≥5  N≥4  N≥3  N≥2  N≥1
Total Number of
Stations in Given       27   33   42   58   71   87  114  170  266  605
'N' category
(number of points
in given data set)

                                  110

-------
        TABLE C-4.   MAXIMUM CORRELATION VALUES,  R, FOR A SAMPLING
                    OF DIFFERENT MODELS INVESTIGATED

Possible Emission Variables in Model: (TSP)^i; (SO2)^j; (NOx)^k; (HC)^l; (CO)^m
     where i, j, k, l, m = 0, 1, 2, 3, 4, 5

             Number of
             Parameters                                        Resulting Correlation
N Category    Needed      Variables in model which maximize R  Coefficient R
N≥4             1         TSP                                       .43
                2         TSP, (NOx)^5                              .46
                3         TSP, (NOx)^4, (NOx)^5                     .49
                4         TSP, (NOx)^?, (NOx)^?, CO                 .51
N≥6             1         TSP                                       .50
                2         TSP, (HC)^5                               .55
                3         TSP, (NOx)^2, (HC)^5                      .59
                4         TSP, (NOx)^?, (HC)^?, CO                  .61
N≥7             1         TSP                                       .60
                2         TSP, (HC)^5                               .66
                3         TSP, (NOx)^2, (HC)^5                      .71
                4         TSP, (SO2)^2, (NOx)^2, (HC)^3             .73
N≥10            1         TSP                                       .67
                2         (TSP)^2, (NOx)^5                          .78
                3         (TSP)^4, SO2, (NOx)^5                     .84
                4         (TSP)^5, SO2, (NOx)^4, (NOx)^5            .84

?  exponent not legible
                                  111

-------
TABLE C-5.  SAMPLE COMPARISON BETWEEN MAXIMUM CORRELATION
            COEFFICIENTS R AND THOSE OBTAINED FROM MODEL CHOSEN
            AS GIVING BEST OVERALL FIT (THREE PARAMETER THEORY)

            Chosen model:  TSP; (NOx)^4; (NOx)^5

             Variables which            Maximum   Number of points   R for Model
N Category   maximize R                    R      in data set        Chosen
N≥4          TSP, (NOx)^4, (NOx)^5        .49          114              .49
N≥6          TSP, (NOx)^2, (HC)^5         .59           71              .59
N≥7          TSP, (NOx)^2, (HC)^5         .71           58              .71
N≥10         (TSP)^4, SO2, (NOx)^5        .84           27              .77
                         112

-------
TABLE C-6.  SAMPLE COMPARISON BETWEEN CHOSEN MODEL AND
            OTHER 'SIMPLE' MODELS (N≥7)

               Variables                      R
3 parameter    TSP, (NOx)^4, (NOx)^5         .71    Chosen model
               TSP, (NOx)^3, (NOx)^5         .70
               TSP, (NOx)^3, (NOx)^4         .70
               TSP, (NOx)^2, exp(NOx)        .70
2 parameter    TSP, (NOx)^5                  .64
               TSP, (NOx)^4                  .63
               TSP, exp(NOx)                 .63
                       113

-------
TABLE C-7.   EFFECT ON REGRESSION MODEL OF REMOVING 5 'OUTLIER' POINTS
            FROM 58 POINT DATA SET BASED ON SINGLE PARAMETER THEORY

53 Point Data Set
Number of     1st & 2nd Best
Parameters    Variable Set                   R
    1         TSP                          .626
    1         NOx                          .628
    2         TSP, (NOx)^4                 .653
    2         TSP, (NOx)^5                 .653
    3         TSP, (NOx)^2, (NOx)^3        .654
    3         TSP, (NOx)^4, (NOx)^5        .653

Original 58 Point Data Set
Number of     1st & 2nd Best
Parameters    Variable Set                   R
    1         TSP                          .598
    1         (NOx)^2                      .396
    2         TSP, (NOx)^5                 .642
    2         TSP, (NOx)^4                 .643
    3         TSP, (NOx)^4, (NOx)^5        .707
    3         TSP, (NOx)^3, (NOx)^5        .705
                                  114

-------
[Figure: frequency plot of the ratio TSP measured (SAROAD) / TSP calculated
(present model); curves labelled DATA (605 points total in data set) and
FITTED NORMAL CURVE, with the sample mean and the 1σ and 2σ (sample) limits
marked; ratio axis running from .4 to 2.5]

    FIG. C-2.  Fit of Present Model with all Valid 1973 SAROAD
               Population Based Monitoring
                                 115

-------
[Figure: frequency plot of the ratio TSP measured (SAROAD) for individual
station / geometric mean for all stations in county; curves labelled DATA
and FITTED NORMAL CURVE, with the 1σ limit marked; ratio axis running from
.4 to 2.5]

   FIG. C-3.  'Inherent' Variability of SAROAD Data for Intracounty
              Population Based Monitoring
                                      116

-------
[Figure: the fitted normal curves of Figures C-2 and C-3 superimposed;
horizontal axis: Ratio]

FIG. C-4.  Comparison of Fitted Normal Curve for Predicted Air
           Quality and 'Inherent' Variability
                                117

-------
[Figure C-5: plot of R² for the chosen model against the number of points in
the correlation data set, with points on the curve labelled by N category]
                                                                                                                      c
                                                                                                                      •H
                                                                                                                      o
                                                                                                                      (X,

                                                                                                                      14-1
                                                                                                                      o

                                                                                                                      1-1
                                                                                                                      01

                                                                                                                      I

                                                                                                                      ts

                                                                                                                      111
                                                                                                                      p
                                                                                                                                                         o
                                                                                                                                                         o
                                                                                                                                                         CM
                                                                                                                                                         o
                                                                                                                                                         o
                                                                                             -  zl
                                                                  118

-------
 CORRELATION AND REGRESSION STUDIES:  SO2 - GROUND DEPOSITION

      A rigorous determination of population-at-risk with respect to
 air pollution exposure requires a prohibitive level of air monitoring to
 account for local variations.  Many counties lack the financial resources
 and technical expertise to operate a sufficient number of monitoring
 stations.  Therefore, the population-at-risk study inevitably involves
 some extrapolation from a limited air quality data set.  The errors
 introduced by the extrapolation limit the accuracy of the risk estimates.
 Selected factors which could conceivably introduce local variations in
 air quality are discussed below, followed by an attempt to incorporate
 one of the dominant factors into the statistical models used to predict
 air pollution exposure.

 POLLUTANT REACTIVITY AND ATMOSPHERIC "CLEANSING" PROCESSES

      Analyses discussed above consider only the SO2 emission density and
 the general source characterization (point and area).  While these are
 probably the dominant factors influencing the local SO2 levels on an
 annual average, there are a variety of additional factors which can affect
 observed air quality.  A partial list is as follows:

      -  Ground Deposition Processes
      -  Regional Transport (Inter-county dispersion)
      -  Atmospheric reactivity (smog chemistry) of SO2
      -  SO2 Washout and Rainout
      -  Plume reactivity of SO2
      -  Type of atmospheric monitors
      -  Monitoring site characteristics
      -  Monitor operational characteristics
      -  Micrometeorological factors

The latter group of factors has a potentially large impact on observed
air quality; however,  evaluation requires a time-consuming,  comprehensive
study of local conditions  which is beyond the scope of  the present project.
Therefore, attempts to improve the prediction capabilities of the models
are limited to consideration of atmospheric reactivity and pollutant
removal processes.
     SO2 reactivity in large point source plumes has been observed to
range from 2 to 15 percent per hour depending on meteorological conditions
and measuring technique.  On an assumed average basis this does not
appear to be a major factor, particularly since area sources of SO2 are
dominant in most regions.  Area sources emit SO2 at levels subject to
surface air flow disturbances and therefore produce less well-defined
plumes.

     Washout and rainout together are one of the two major atmospheric
removal processes; however, conditions in a given county are relatively
similar to those in adjacent counties of the same general region.

     Atmospheric reactivity, regional transport and ground deposition
processes appear to be the dominant factors potentially introducing local
variation in SO2 levels.  The relative importance of these has been
evaluated by Richards and Gerstle (1) using a Continuous Stirred Tank
Reactor (CSTR) model.  This approach involves CSTR equations for a first
order reaction process in conjunction with NEDS emission data and SAROAD
ambient SO2 data in a 10 state northwestern region.  Comparisons of the
predicted SO2 annual average concentration at various decay rates with
observed concentrations are used to identify the prevailing SO2 decay rate
and the SO2 oxidation rate.  By dividing the 10 state west coast region
into four approximately equal regions, it was possible to roughly
calculate the extent of regional transport.

     The results of the CSTR calculations suggest that atmospheric reactivity
is relatively insignificant compared with ground deposition and other
"cleansing processes."  Most SO2 appears to be removed prior to atmospheric
oxidation.  Due to the rapid rate of SO2 removal (half-life of less than
12 hours), regional transport of SO2 also appears to be less important
than ground deposition.  Based on these rough analyses, the scope of the
remainder of this section is restricted to ground deposition processes.
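
     The CSTR equations of Reference 1 are not reproduced in this report.
As a rough illustration only, the Python sketch below uses the textbook
steady-state balance for a well-mixed box with first order loss,
C = E / (Q + kV), and converts a half-life to a first order rate constant.
The function names and the box volume, ventilation rate and emission rate
are hypothetical values chosen for the example; they are not data from
this study or from Reference 1.

```python
import math

def rate_constant_from_half_life(half_life_hr):
    """First order rate constant (1/hr) for a given half-life in hours."""
    return math.log(2) / half_life_hr

def cstr_steady_state(emission_g_per_hr, flow_m3_per_hr, volume_m3, k_per_hr):
    """Steady-state concentration (g/m3) in a well-mixed box.

    Mass balance: 0 = E - Q*C - k*V*C, so C = E / (Q + k*V).
    """
    return emission_g_per_hr / (flow_m3_per_hr + k_per_hr * volume_m3)

# A 12-hour half-life corresponds to roughly 5.8 percent per hour.
k = rate_constant_from_half_life(12.0)
print(f"k = {k:.4f} per hour")

# Hypothetical box: these three numbers are illustrative assumptions only.
E, Q, V = 1.0e6, 1.0e9, 1.0e10
no_decay = cstr_steady_state(E, Q, V, 0.0)
with_decay = cstr_steady_state(E, Q, V, k)
print(f"concentration ratio (decay / no decay) = {with_decay / no_decay:.3f}")
```

A half-life shorter than 12 hours gives a larger rate constant and
therefore a lower steady-state concentration for the same emissions.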

-------
GROUND DEPOSITION

     Ground deposition processes have gained recognition in the last
3 to 5 years as a major sink for SO2 and other atmospheric pollutants.
Studies done in England in regions dominated by grass fields or wheat
crops indicate that 40 to 60 percent of emitted SO2 could ultimately
be removed by one of the ground deposition processes (2,3,4).  Recent work
in the RAPS Study Region (Metropolitan St. Louis) has also indicated strong
ground deposition (5).  While these processes have not been adequately
studied, it is logical to assume that differential ground deposition
characteristics are responsible for some of the observed variation in air
quality.

     There are at least three distinct mass transfer mechanisms
collectively described as ground deposition:  1) adsorption on soil and
vegetation surfaces, 2) absorption in water droplets and layers, and
3) physiological uptake in vegetation.

     1.   The absorption process is a potentially reversible operation
         which is independent of plant characteristics.  Absorption occurs
         only during periods when water droplets are on the leaf surfaces.
         This would primarily be restricted to night times when the
         temperature drops below the dew point or to periods of rain.
         The absorption may or may not be reversible depending on the
         fate of the droplets.  It is logical to assume that pollutants
         absorbed in dew are released upon evaporation of the droplets.
         SO2 collected during the night could be released during the warm
         late morning periods by this process.

         There are several factors which probably inhibit mass transfer to
         the water droplets.  These droplets are present only during periods
         generally characterized by poor mixing; therefore, the diffusional
         boundary layer in the gas phase is at a maximum.  Furthermore,
         the droplets are stagnant and therefore have a limited capacity
         for SO2 due to the drop in pH which results from absorption of
         SO2 and CO2.  This pH limit is particularly important for rain
         droplets, which have an initial pH of approximately 3.5 to 4.5.

    Absorption processes could be controlled either by the gas phase
    diffusional resistance or by the gas-liquid solubility equilibrium.
    This depends primarily on the presence of alkaline dust on
    vegetation surfaces, the pH of rain in the region of interest
    and the quantity of water on the surfaces.  If it is diffusion
    controlled, the rate is proportional to the 1½ power of the
    absolute temperature and to the ½ power of windspeed.  The
    solubility step, however, is inversely proportional to temperature.

2.  Adsorption is a reversible process which is directly proportional
    to the available surface area.  Water vapor and CO2 probably
    compete with SO2 and other pollutants for the active sites;
    therefore, such gases influence the surface area relationship.
    Adsorption involves a gas phase diffusion step which limits the
    overall rate of mass transfer.  According to standard mass
    transfer equations, this process should be directly proportional
    to the 1½ power of the absolute temperature and to the ½ power
    of the average windspeed.  There should be only a weak dependence
    on the type of plants; however, the total available surface area
    would be important.

3.  The physiological uptake process is inherently irreversible and
    highly dependent on plant characteristics.   The process is more
    complicated with respect to mass transfer and is potentially more
    important than the surface adsorption/absorption mechanisms.  The
    pollutant gases must traverse a diffusional boundary layer and a
    relatively stagnant stomatal cavity.   Upon  contact with a cell,
    the molecule is adsorbed on the cell wall prior to diffusion
    through the cell membrane.  It is not presently known which
    diffusional mass transfer operation controls the uptake rate.  The
    overall mechanism should be directly proportional to temperature
    within the normal tolerance limits of the plant.

    The opportunity for physiological uptake is limited to daylight
    hours during which photosynthesis is occurring.  Certain defensive
    mechanisms could reduce this uptake since both SO2 and CO2 could
    eventually stimulate closure of the stomata.  Water stress would
    also keep stomata closed to reduce transpiration losses.
    Physiological uptake is directly proportional to the leaf surface
    area.  Unlike the surface removal processes, physiological uptake is
    favored during the warm months.

     Physiological uptake introduces a seasonal and diurnal cycle which
conceivably influences dose-rate.  While this is not of direct interest in
the present study, which involves annual average data, the dose-rate of SO2
could be two to four times lower in daytime periods in warm months compared
with night time or winter periods.

ANALYSIS OF GROUND DEPOSITION PARAMETERS

     Ground deposition processes conceivably exert a strong influence on
annual average SO2 concentrations.  Therefore, an attempt has been made
to incorporate ground deposition parameters into the statistical model
used to predict SO2 exposure in those counties lacking adequate monitoring
equipment.  The mass transfer analyses discussed earlier indicate that the
most useful parameters include the following:

     -  vegetation surface area
     -  vegetation type
     -  temperature to 1½ or 2 power
     -  windspeed to ½ power

The latter two factors introduce a  seasonal and diurnal cycle  not of direct
interest to the present study.   The vegetation factor,  however, should be
useful in approximating the  influence of ground deposition processes.
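
     The size of the temperature and windspeed effects can be gauged with
a short Python sketch.  It assumes a diffusion-controlled rate that scales
with the 1.5 power of absolute temperature and the 0.5 power of windspeed;
the function name and the reference conditions (288 K, 3 m/s) are
hypothetical choices made for the example, not values from this study.

```python
def relative_deposition_rate(temp_k, wind_m_s, ref_temp_k=288.0,
                             ref_wind_m_s=3.0, temp_exp=1.5, wind_exp=0.5):
    """Rate relative to reference conditions, assuming
    rate ~ T**temp_exp * u**wind_exp (diffusion-controlled case)."""
    return ((temp_k / ref_temp_k) ** temp_exp
            * (wind_m_s / ref_wind_m_s) ** wind_exp)

# Doubling windspeed at constant temperature raises the rate by sqrt(2).
print(round(relative_deposition_rate(288.0, 6.0), 3))

# A 20 K warming at constant windspeed changes the rate by about 11 percent.
print(round(relative_deposition_rate(308.0, 3.0), 3))
```

Under these assumed exponents the windspeed term varies more over a
typical day than the temperature term, since absolute temperature changes
by only a few percent.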

     A limited analysis has been performed using the Statistical Analysis
System (SAS) computational package in order to evaluate the impact of the
vegetation parameters on the predictive capabilities of the ambient
SO2-emission inventory models.  The scope was limited to North Carolina
counties due to the availability of land use/vegetation distribution data.
These data included a county by county tabulation of acreage devoted to
crop land, pastures, forests and "other" uses in 1958 and 1967.  The
similarity of the 1958 and 1967 data sets indicated that land use is
relatively stable and therefore probably representative of 1975
conditions.  The SO2 ambient concentration data for 1975 were obtained from
the SAROAD data files.  Several sites reporting extreme SO2 levels were
not representative of general conditions in North Carolina.  Furthermore,
only sites located in residential areas were considered.  The final data
set and descriptive statistics are presented in Table C-8.  It is apparent
that there is a very small range in reported SO2 levels compared with
most industrial regions.  This will complicate subsequent analyses.

     Analyses were done using both Spearman Correlation Tests and Multiple
Regression Models.  The former were used as a screening procedure to
indicate possible linear or non-linear relationships.   The multiple
regression models generally included various emission parameters and land
use parameters.   Two sets of weighting factors were used to approximate
the expected influence  of vegetation surface area.   These factors, shown
in Table C-9, represent minimum and maximum conditions.
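
     The screening step can be illustrated with a small Python version of
the Spearman rank correlation.  The study itself used SAS; this sketch is
an independent reimplementation (ranks with ties averaged, followed by a
Pearson correlation of the ranks), and the function names are hypothetical.
The six data pairs are the first six rows of Table C-8-a (area source
emissions and ambient SO2), used only to exercise the code; the resulting
coefficient is not a finding of the report.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# First six counties of Table C-8-a (illustrative subset only).
area_emissions = [316, 1590, 893, 1194, 758, 1342]   # T/yr
ambient_so2 = [6.04, 5.02, 7.41, 6.01, 5.82, 6.54]   # ug/m3
print(round(spearman(area_emissions, ambient_so2), 3))
```

Because the coefficient depends only on ranks, it responds to any
monotonic relationship, which is why it suits a screening role ahead of
the regression models.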

-------
  TABLE C-8-a.   COUNTY EMISSION DATA AND LAND USE CHARACTERISTICS
                         FOR NORTH CAROLINA

County    SO2 emissions, T/yr       SO2              Land use, acres
    ID   Total   Point    Area   µg/m3  Crop land  Pastures    Other    Forest

    80     348      32     316    6.04      36146     22884     4395     94363
   480   22598   21008    1590    5.02      36000     41985     9685    251800
   500    1008     115     893    7.41      30166     13656    19567    199110
   520    2972    1778    1194    6.01      60948     28524     8387     99487
   540     995     237     758    5.82      25971     22565    10854    175150
   660    1486     144    1342    6.54      65547     20540    16616    123675
   720   23230   22717     513    6.04      54010     29813     9916    343600
   780     520     402     118    5.18      36312      2200     7231     64657
   840    1043      48     995    5.15     101213     35932     8425    127500
   940    3525    2928     597    5.18      71972      4772    18448    287057
   960    2885    1113    1772    7.46      93068     12566    12816    224528
  1140    1023     665     368    5.07     166617      9534    10574    325820
  1300    1306     772     534    7.59     136907     13178    10237    152700
  1480    5509    2485    3024    6.72      69234     20489    21000    115037
  1560   93756   91856    1900    5.05      37944     18465    11273    111636
  1780    6999    3129    3870   15.46     101666     30235    17025    192300
  1840    1224     764     460    6.25     117719     11617    12546    232400
  1860   10448    9812     636    5.26      19641     40544     8367    141402
  1940     261      14     247    ?.03      56647      3015    10031    157727
  2060    1857     746    1111    ?.71     108155     63247     9526    166372
  2120    1042     216     826    ?.62     211024     11500    16500    249200
  2280     552     159     393    ?.60      30700      7057     4100    111292
  2360     567     143     424    5.57      64999     17558     8149     91411
  2480    2760    1793     967    5.02      13169     10399     7396    177516
  2580    3467     222    3245    9.59      61507     24000    16155    156100
  2680     569     101     468    6.35      51478     11278    13871    351561
  2880   38902   36717    2185    6.44       6330       906    10032     67880
  2980    1179     297     882    5.11      58391      1750    23362    300716
  3000    1804    1282     522    6.27      49025     25116    10828    154500
  3080     598     319     279    5.08      54089      3000     4889     78228
  3180    1254     548     706    5.35     158692     13622    22049    191637
  3260    1507     364    1143    5.43     111759     21218    17893    296562
  3320     514      59     455    6.34      49056     12591    10204    217554
  3380    6318    5458     860    6.17     224569     17957    11762    317300
  3420   18455   17394    1061    5.59      83382     15217    15222    233898
  3460   26252   25008    1244    8.23      93202     35938    12801    150420
  3500   13563   12307    1256    4.99      64288     18000    12521    248121
  3600     913     617     296    6.84      50529      7293     6091     99477
  3900    2039    1460     579    6.17      82260     35035     7948    113252
  3980     244      14     230    5.04       7128      4975     2150     84203
  4080     726     102     624    5.97     136684     50960    15837    185200
  4120     797     303     494    6.57      44118      5955     6929     85578
  4160    3332    1151    2181    8.66     107941     26277    24679    322521
  4280   20315   19447     868    5.25     146664      9415    21623    159382
  4360     956     261     695    4.99     107642      2637     5219    110000
  4500     603      49     554    4.99      14884     22109     4192    120705

-------
                TABLE C-8-b.  DESCRIPTIVE STATISTICS FOR MODEL

                          Number of                                  Standard
Variable                observations     Mean   Minimum   Maximum   deviation

Ambient SO2                  46          6.30      4.99     15.46      1.763
(µg/m3)

Total SO2                    46          7.22      0.24     93.76     15.56
emissions
(10^3 T/yr)

Point source                 46          6.23      0.01     91.86     15.33
SO2 emissions
(10^3 T/yr)

Area source                  46          0.99      0.12      3.87      0.81
SO2 emissions
(10^3 T/yr)

Crop land                    46         76.29      6.33    224.57     50.63
(10^3 acres)

Pastures                     46         18.64      0.91     63.25     13.70
(10^3 acres)

"Other" land                 46         11.94      2.15     24.68      5.57
(10^3 acres)

Forests                      46        179.58     64.66    351.56     81.77
(10^3 acres)

-------
            TABLE  C-9.  VEGETATION SURFACE AREA WEIGHTING FACTORS

Land use  category                                      Weighting factors
                                                       Set A       Set B
Pasture                                                  1            1
Crop land                                                2            5
Other                                                    3           25
Forest                                                   4          100
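
     The report does not spell out how the weighting factors are combined
with the acreage data.  The Python sketch below shows one plausible
reading, offered as an assumption: a weighted vegetation surface index
equal to the sum of weight times acreage over the four land use
categories, with area source emissions then divided by that index.  The
names WEIGHTS and weighted_vegetation_area are hypothetical.  The acreage
and emission figures are those listed for County ID 80 in Table C-8-a.

```python
# Weighting factors from Table C-9.
WEIGHTS = {
    "A": {"pasture": 1, "crop": 2, "other": 3, "forest": 4},
    "B": {"pasture": 1, "crop": 5, "other": 25, "forest": 100},
}

def weighted_vegetation_area(land_use_acres, weight_set):
    """Sum of weight * acreage over the land use categories (assumed form)."""
    weights = WEIGHTS[weight_set]
    return sum(weights[name] * land_use_acres[name] for name in weights)

# County ID 80 from Table C-8-a.
county_80 = {"crop": 36146, "pasture": 22884, "other": 4395, "forest": 94363}
area_source_t_per_yr = 316

for set_name in ("A", "B"):
    index = weighted_vegetation_area(county_80, set_name)
    # Emissions per unit of weighted vegetation area (assumed definition).
    print(set_name, index, area_source_t_per_yr / index)
```

Set B makes the index almost entirely a function of forest acreage,
whereas Set A keeps all four categories within a factor of four of each
other, which is consistent with the two sets bracketing minimum and
maximum conditions.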

     The results of the statistical tests are presented in Tables C-10
and C-11.  Correlations with a level of confidence greater than 90 percent
are marked with an asterisk.  It is apparent that the area source SO2
emissions are the dominant influence on ambient SO2 levels.  This
observation is consistent with most emission impact studies.

     The land use parameters did not improve the predictive capability
of models using area source SO2 emissions data.  Note that all Spearman
coefficients of land use parameters were equal to or lower than that of the
comparable model based solely on area source emissions.  Similar results
are apparent in the linear and multiple regression models.  A multiple
regression model using appropriate powers of the area source emissions
yields a modest increase in the correlation coefficient (0.44 improved to
0.71).  However, the slight improvement does not warrant the considerable
effort involved in compiling the land use (vegetation surface area)
parameter.
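
     The effect of transforming the emission variable can be illustrated
with a one-variable least-squares fit in Python.  This is a sketch under
stated assumptions, not the SAS procedure used in the study: it fits
ambient SO2 against a single power of area source emissions and reports
R2, which for a one-variable line equals the squared Pearson correlation.
The function name r_squared is hypothetical.  The six data pairs are the
rows for County IDs 1300 through 1860 in Table C-8-a, so the values
printed will not match the 46-county results in Table C-11.

```python
def r_squared(x, y):
    """Coefficient of determination for a one-variable least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# County IDs 1300 to 1860 from Table C-8-a (illustrative subset only).
area_source = [534, 3024, 1900, 3870, 460, 636]       # T/yr
ambient_so2 = [7.59, 6.72, 5.05, 15.46, 6.25, 5.26]   # ug/m3

for power in (1, 2, 3):
    transformed = [a ** power for a in area_source]
    print(power, round(r_squared(transformed, ambient_so2), 3))
```

With so few points a single high value dominates the fit, which is the
same sensitivity that a narrow range of ambient concentrations introduces
in the full data set.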

     While the consideration of vegetation characteristics should improve
the SO2 emission/ambient SO2 models, the techniques evaluated in this
section were unsuccessful.  It is felt that this could be due to the small
range of ambient SO2 concentrations or to a too simplistic approach to the
ground deposition process.

-------
                   TABLE C-10.   SPEARMAN CORRELATIONS:
           ANNUAL AVERAGE SO2 CONCENTRATION AND VARIOUS EMISSION
                          AND LAND USE PARAMETERS

                                                 Spearman
Parameter                                       coefficient    Prob > F

Total SO2 emissions                                0.131         0.61
Point source SO2 emissions                         0.064         0.67
Area source SO2 emissions                          0.253         0.09
Total SO2 emissions/county area                    0.096         0.53
Area SO2 emissions/county area                     0.149         0.67
Crop land                                          0.160         0.28
Pasture land                                       0.259         0.08
"Other" land                                       0.269         0.07
Forest                                             0.016         0.91
Area source emissions/Set A (a)                    0.252         0.09
Area source emissions/Set B                        0.233         0.11
Area source + 10% point source
  emissions/Set A                                  0.187         0.21
Area source + 10% point source
  emissions/Set B                                  0.184         0.22

  (a) Set A and Set B denote the two sets of weighting factors for
      land use categories as described in Table C-9.

-------
                 TABLE C-11.  MULTIPLE REGRESSION MODELS:
        ANNUAL AVERAGE SO2 CONCENTRATION AS DEPENDENT VARIABLE

                                    Correlation
                                    coefficient
Independent variables                   (R2)         Prob > F

Total SO2 emissions                    0.004         0.67
Point source SO2 emissions             0.010         0.50
Area source SO2 emissions              0.439         0.0001
Crop land                              0.0090        0.53
Pasture land                           0.0610        0.10
"Other" land                           0.0582        0.11
Forests                                0.0004        0.88
Total emissions/area                   0.0019        0.76
Area source emissions/area             0.1200        0.02
(Area source emissions)^?              0.339         0.0001a
(Area source emissions)^2              0.581         0.0001a
(Area source emissions)^3              0.652         0.0001a
Log (area source emissions)            0.234         0.0007a
Area source - Total                    0.717         0.0001
  Area sources                           -           0.0001
  (Area sources)^2                       -           0.0001
  (Area sources)^?                       -           0.0151
  (Area sources)^?                       -           0.4365
  Log (area sources)                     -           0.0257

-------
                                REFERENCES

1.   Richards, J. R. and Richard Gerstle.  Stationary Source Control Aspects
     of Ambient Sulfates:  A Data Base Assessment.  Office of Research
     and Development, U.S. Environmental Protection Agency, Research
     Triangle Park, N.C.  157 pages.  February 1976.

2.   Garland, J. A.  Deposition of Gaseous Sulphur Dioxide to the Ground.
     In:  Atmospheric Environment.  Great Britain.  8:75-79, July 1973.

3.   Shepherd, J. G.  Measurements of the Direct Deposition of Sulphur
     Dioxide Onto Grass and Water by the Profile Method.  In:  Atmospheric
     Environment.  Great Britain.  8:69-74, July 1973.

4.   Garland, J. A., W. S. Clough, and D. Fowler.  Deposition of Sulphur
     Dioxide on Grass.  In:  Nature.  242:256-257, March 23, 1973.

5.   Wilson, W.  Personal Communication, February 1975.

6.   North Carolina Conservation Needs Inventory.  N.C. Department of
     Agriculture.  December 1971.

-------
                                   TECHNICAL REPORT DATA

  1. REPORT NO.
     EPA-600/1-78-051
  3. RECIPIENT'S ACCESSION NO.
  4. TITLE AND SUBTITLE
     Population at Risk to Various Air Pollution Exposures:
     Data Base "Popatrisk"
  5. REPORT DATE
     June 1978
  6. PERFORMING ORGANIZATION CODE
  7. AUTHOR(S)
     Sandor J. Freedman, Joseph D. Wilson, Elsa Lewis-Heise,
     Albert V. Hardy
  8. PERFORMING ORGANIZATION REPORT NO.
  9. PERFORMING ORGANIZATION NAME AND ADDRESS
     System Sciences, Inc.
     P. O. Box 2345
     Chapel Hill, North Carolina  27514
 10. PROGRAM ELEMENT NO.
     1AA601
 11. CONTRACT/GRANT NO.
     Contract No. 68-02-2269
 12. SPONSORING AGENCY NAME AND ADDRESS
     Health Effects Research Laboratory, RTP, NC
     Office of Research and Development
     U.S. Environmental Protection Agency
     Research Triangle Park, North Carolina  27711
 13. TYPE OF REPORT AND PERIOD COVERED
     Final report covering Oct. 1975 to 1977
 14. SPONSORING AGENCY CODE
     EPA 600/11
 15. SUPPLEMENTARY NOTES
 16. ABSTRACT
            The work reported herein was undertaken to provide the EPA with a
       user-oriented data base containing recent county-based information, for
       all counties in the contiguous United States, on population demographics,
       population mobility, climatology, emissions, air quality, and age-adjusted
       death rates.

            The completed data base, called "POPATRISK," contains approximately
       27.5 million characters and is in SYSTEM 2000, Version 2.80 format,
       facilitating access with minimal user computer training.  Population
       demographics are as of the 1970 Census; population mobility is described
       spanning the years 1965 to 1970 for 6 sex-race categories in 7 age
       groupings for both "in" and "out" migrants; climatology information
       contains county summaries of temperature, precipitation and hours of
       sunshine; county point and area source emission estimates are provided
       for 5 criteria pollutants (TSP, SO2, NO2, CO, and Ozone) based on the
       NEDS-USER file; air quality information is based on 1974 data contained
       in SAROAD; age-adjusted death rates were computed for the combined years
       1969, 1970, and 1971 for 4 sex-race categories in 50 groupings of ICDA
       categories (8th revision).
 17. KEY WORDS AND DOCUMENT ANALYSIS
     a. DESCRIPTORS:  data file, air quality, demography, population,
        mortality
     b. IDENTIFIERS/OPEN ENDED TERMS:  Population at risk
     c. COSATI Field/Group:  05A, 09B, 06F
 18. DISTRIBUTION STATEMENT
     Release to Public
 19. SECURITY CLASS (This Report)
     Unclassified
 20. SECURITY CLASS (This page)
     Unclassified
 21. NO. OF PAGES
     140
 22. PRICE
EPA Form 2220-1 (9-73)

-------