EPA-600/5-77-008a
May 1977
Socioeconomic Environmental Studies Series

CLASSIFICATION OF AMERICAN CITIES FOR CASE STUDY ANALYSIS: VOLUME I. Summary Report

Office of Monitoring and Technical Support
Office of Research and Development
U.S. Environmental Protection Agency
Washington, D.C. 20460

RESEARCH REPORTING SERIES

Research reports of the Office of Research and Development, U.S. Environmental Protection Agency, have been grouped into nine series. These nine broad categories were established to facilitate further development and application of environmental technology. Elimination of traditional grouping was consciously planned to foster technology transfer and a maximum interface in related fields. The nine series are:

1. Environmental Health Effects Research
2. Environmental Protection Technology
3. Ecological Research
4. Environmental Monitoring
5. Socioeconomic Environmental Studies
6. Scientific and Technical Assessment Reports (STAR)
7. Interagency Energy-Environment Research and Development
8. "Special" Reports
9. Miscellaneous Reports

This report has been assigned to the SOCIOECONOMIC ENVIRONMENTAL STUDIES series. This series includes research on environmental management, economic analysis, ecological impacts, comprehensive planning and forecasting, and analysis methodologies. Included are tools for determining varying impacts of alternative policies; analyses of environmental planning techniques at the regional, state, and local levels; and approaches to measuring environmental quality perceptions, as well as analysis of ecological and economic impacts of environmental protection measures. Such topics as urban form, industrial mix, growth policies, control, and organizational structure are discussed in terms of optimal environmental performance. These interdisciplinary studies and systems analyses are presented in forms varying from quantitative relational analyses to management and policy-oriented reports.
This document is available to the public through the National Technical Information Service, Springfield, Virginia 22161.

CLASSIFICATION OF AMERICAN CITIES FOR CASE STUDY ANALYSIS
Volume I
Summary Report

by

Elizabeth Lake
Carol Blair
James Hudson
Richard Tabors

Urban Systems Research & Engineering Inc.
Cambridge, Massachusetts 02138

Contract No. 68-01-3299

Project Officer
Samuel Ratick
Office of Monitoring and Technical Support
Washington, DC 20460

DISCLAIMER

This report has been reviewed by the Office of Research and Development, U.S. Environmental Protection Agency, and approved for publication. Approval does not signify that the contents necessarily reflect the views and policies of the Environmental Protection Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.

ABSTRACT

Attempts to analyze and evaluate the impacts of federal programs have led to the extensive use of case studies of program impacts at selected sites. This project has developed a methodology for the systematic selection of representative case study sites and for generalizing the study results. The methodology, involving two-stage factor analysis and clustering, is applied to a specific program/policy problem: the selection of metropolitan areas for case studies in analyzing the impact of federal policies on general environmental quality.

The methodology begins with a data base on standard metropolitan statistical areas (SMSAs), including variables related to environmental quality, urban form, and household, industrial, and government activity. It analyzes these variables through a two-stage factor analysis technique which allows heuristic consideration of the significant characteristics.
Finally, it develops city clusters which group areas with similar attributes. Modal (or representative) cities are selected for each group and suggested as case study sites. These groups may be used to generalize the study results and to analyze the transferability of results between areas. The methodology is sufficiently flexible to consider a wide range of research hypotheses.

CONTENTS

Abstract
Figures
Tables
Executive Summary
1.0 Introduction
    1.1 Research Objectives
    1.2 Research Methodology
    1.3 Results
2.0 General Classification of Cities
    2.1 Research Design and Data Base
    2.2 Stage I Factor Analysis
    2.3 Stage II Factor Analysis
    2.4 SMSA Groupings and Representative Cities
3.0 Applicability to General Environmental Research Process
    3.1 Environmental Data Base
    3.2 Applicability to Environmental Research
4.0 Other Research Purposes
    4.1 Transportation Demonstration Project
    4.2 Potential Applications
Appendix A: Listing of Data Variables

TABLES

Number
1     Major SMSA Groupings for General Environmental Purposes
1-1   Major SMSA Groupings for General Environmental Purposes
2-1   SCS2 - Stage I - Factor Analysis of 15 Variables
2-2   SCS4 - Stage I - Factors
2-3   SCS4 - Stage I - Interpretation of Factors
2-4   SCS5 - Stage I - Final Factors
2-5   SCS6 - Stage I - Final Factors
2-6   Stage II Factors
2-7   Groups of Similar SMSAs
4-1   Variables Selected for the Transportation Classification

FIGURES

Number
1     Causation: Environmental Quality
1-1   Boston SMSA

EXECUTIVE SUMMARY

The research described in this report has developed two major products: one a direct output, and the other a methodology for further analysis. The first output has been a classification of Standard Metropolitan Statistical Areas (SMSAs) based on broad measures of environmental quality and other attributes.
This classification depends on a large-scale data base which includes 262 SMSAs and has data on activity in the industrial, demographic, and government sectors, on the attributes of the urban form and the physical environment, and on the pollutant residuals and ambient environmental quality resulting from these activities and attributes.

The second product is a methodology for developing alternative classifications, oriented towards specific policy or research issues and the urban characteristics related to them. The methodology can be directed at issues such as the choice of sites for case studies and demonstration projects, or the transfer of results from one case study area to other cities. The data base and methodology are being maintained by EPA for further applications in case study analyses.

The results of this study are focused on analytic needs involving general environmental quality or other subject areas. Environmental quality is determined by actors in the urban socioeconomic system and the physical environment together. A simplification of the interrelationships is illustrated in Figure 1. Note that, at least in this crude model, polluting residuals and ambient environmental quality are entirely endogenous to the system. Because these two aspects of the system are closely related to the attributes of the four other actors, and because available measures of environmental quality and residuals are less numerous and less reliable than most others, the major classification scheme of this research was developed from data on the public sector, households, industry, and the physical environment, and then evaluated by comparison with data on environmental quality and residuals. Because these four aspects are comprehensive in terms of the urban system and are not biased toward environmental quality, they may have far-reaching applications.
Each box in Figure 1 is represented in the data base by a group of variables called a significant characteristic set (SCS). Altogether, there are six SCSs, which together contain approximately 200 variables. The SCSs are:

1. Ambient Environmental Quality
2. Urban Form and the Physical Environment
3. Residuals
4. Demographic Characteristics
5. Government
6. Industry

Figure 1
CAUSATION: ENVIRONMENTAL QUALITY (First Order Effects)
[Diagram linking the public sector (abatement), households, and industry to residuals, and residuals together with urban form and the physical environment to ambient environmental quality.]

Table 1
MAJOR SMSA GROUPINGS FOR GENERAL ENVIRONMENTAL PURPOSES
(based on total SMSA sample)

Group No.   No. of Cities in Group   Representative City
1           36                       Little Rock-North Little Rock, AR
2           46                       Lake Charles, LA
3           27                       Williamsport, PA
4           48                       Albany-Schenectady-Troy, NY
5           18                       Dallas, TX

Cities Close to Modal City: Baton Rouge, LA; Corpus Christi, TX; Lafayette, LA; Midland, TX; Montgomery, AL; Odessa, TX; Tyler, TX; Spartanburg, NC; Parkersburg-Marietta, WV-OH; Davenport, IL; Evansville, IN-KY; Lawrence, MA; Peoria, IL; Appleton-Oshkosh, WI; New Britain, CT; Portland, OR; Charlotte, NC; Richmond, VA

Correlation between variables within an SCS was anticipated, as were relationships between SCSs. The methodology used to classify SMSAs takes advantage of these correlations to reduce the vast amounts of data through factor analysis.

The factor analysis technique was applied to the data in two stages. In Stage I, a small number of factors was extracted from the data in each SCS. These factors summarized the basic dimensions of the data available to describe relevant attributes of urban areas. In Stage II, the factors derived in Stage I were treated as variables. The factors derived in Stage II, then, reflect relationships both within the SCSs and between SCSs.
The four factors* from Stage II that explain the greatest amount of variance in the data base were taken to characterize the SMSAs for purposes of classification.

As the factors generated are linear combinations of the original variables, it is possible to estimate scores for each observation on each factor. These scores provide the location of each SMSA in the four-dimensional factor space derived in Stage II. The 262 SMSAs have been classified by applying a simple "nearest neighbor" clustering technique to these factor scores. Five major city groups were developed, including 175 of the 262 SMSAs (see Table 1).

These groups were tested with respect to their ability to discriminate between cities with different levels of environmental quality. A series of t- and F-tests (statistical comparisons of means) was performed using a select set of environmental quality measures. Testing revealed the groups to be significantly different from an environmental perspective. The groups appear to be useful for environmental research and may be tested in a similar manner to determine their applicability to any given area of research.

New classifications may be developed by the same method used here by modifying the data base. The availability of new and relevant data will often justify such an effort. For instance, land use is a valuable measure of a number of influences on environmental quality, yet little data measuring land use is available except that which comes from dispersed sources in various forms. Should a new body of uniform data on land use in a large number of SMSAs become available, a more enlightened classification of SMSAs might be developed.

Other classifications might be developed to satisfy a more specific emphasis. This research included such an effort for the Energy Research and Development Administration (ERDA), which was interested in potential energy savings through changes in transportation patterns.
The basis of the classification was limited to variables related to auto use: auto ownership, per capita vehicle-miles travelled, household size, urban density, etc. Comparison of the resulting groups with those previously developed indicates that a great deal is common to the two classifications.

The data bank, the methodology, the classification, and the modal cities will be valuable in a variety of applications related to urban development and environmental quality. Specific classification of this sort is applicable to a wide range of environmental and urban policy research problems, wherever detailed case studies are performed.

*The number of second stage factors used for clustering was arbitrarily limited to four. A larger number of factors represents more dimensions along which cities may differ, fragmenting city groups into a large number of small clusters.

1.0 INTRODUCTION

1.1 Research Objectives

The officials of the Environmental Protection Agency and other Government agencies are frequently faced with the task of evaluating the effects of programs or policies at the local and regional levels. For example, EPA officials may be concerned with the effects of parking restrictions on urban air quality. To analyze this, they may monitor air quality in every city where parking restrictions are imposed, or they may restrict their monitoring activities to a more limited representative sample of cities. The second alternative is clearly more economical; however, it requires an appropriate urban classification scheme.

The objective of this research project was the development of a flexible methodology for the classification of cities, one that would be appropriate for testing the effects of general environmental and other programs and would aid in assessing the impacts of specific environmental policies.
These typologies then group similar cities and identify modal, or representative, urban areas for each group, facilitating the generalization of case study results.

In this report the two terms, cities and SMSAs (Standard Metropolitan Statistical Areas), are used interchangeably. Cities normally make up parts of SMSAs, as demonstrated in Figure 1-1.

Figure 1-1
Boston SMSA
[Map showing the outer boundary of the Boston SMSA and the urbanized area, 1970.]

In the last few decades, a large number of city classification schemes have been developed. It may be appropriate, then, to ask why yet another methodology and typology were necessary. A brief review of past attempts to classify cities may answer this question.

Although considerable resources have been directed toward developing community typologies, few of the resulting classifications have been applied to further research or practical problems. One reason is that every potential use of a community typology has specific requirements in terms of the community characteristics considered and the universe of communities to be investigated. No classification, then, is useful in every case.

The majority of the earlier classification schemes did not include environmental characteristics. Recently there has been great interest in environmental quality and quality of life. Coughlin* performed factor analysis for 101 metropolitan areas on sixty indicators of environmental quality and quality of life. The analysis, however, was biased toward social and economic characteristics, since data on physical conditions were sparse.

*Robert E. Coughlin, Goal Attainment Levels in 101 Metropolitan Areas, RSRI Discussion Paper Series No. 41 (Philadelphia: Regional Science Research Institute, 1970).

The John Somers study* performed for EPA represents another recent example. This study utilized 1960-61 Census data for the most part; therefore, its results are somewhat obsolete.
More relevant is Berry's study,** Land Use, Urban Form and Environmental Quality, which provides a city classification based on social, economic, and environmental characteristics. Although there are weaknesses in both the data base and the methodology used for this research, Berry has made a beginning and provides inspiration, as well as a core data base, for future research.

A complete review of community classification studies may be found in Appendix B to Volume III. Most of these studies developed groupings for single research purposes (e.g., transportation analysis, environmental quality analysis, etc.); in many, the groupings are limited to a subset of U.S. metropolitan areas; and some of the data sets used are incomplete or out of date. In contrast, the research described here did not identify a single urban typology; rather, it developed a flexible methodology through which urban classifications may be developed for testing a variety of research hypotheses. Further, all 262 SMSAs are included in the analysis, which utilized an extensive data base, much of which has only recently become available.

1.2 Research Methodology

As the initial research objective specified case study site selection for environmental analysis, the data base was designed to include descriptors of ambient environmental quality as well as its causal variables. This data bank thus includes information on ambient air and water quality, on the types and quantities of residuals being discharged into the environment, on socioeconomic parameters, on the activities of the local government which affect environmental quality, and on variables describing the urban form, including land use, density, and so on. The data sources used include STORET, SEAS, the Bureau of the Census, the Department of Transportation, and the Department of Agriculture. Specific policy analyses would, of course, use only the relevant portions of the data.
Theoretically, it is possible to develop city groupings directly on the basis of the variables. However, the number of variables in the data bank represents too many "axes" along which cities may differ, making it impossible to develop consistent city groupings. We have used an iterative, two-stage factor analysis procedure for data reduction purposes. Factor analysis is an arithmetic means of reducing a complex and highly intercorrelated data set to a smaller number of underlying factors. For example, the research design may require information on the percent of families below the poverty level, the prevalence of substandard housing units, and unemployment statistics: three highly correlated variables. Factor analysis offers one method of combining such variables into a single dimension for further statistical analysis or for grouping communities. This procedure is described in Chapter 2, as well as in Volume III.

*John Somers, George B. Pidot, Jr., Modal Cities, prepared under Contract EPA-600/5-74-027 for the Office of Research and Development, U.S. Environmental Protection Agency.

**Brian J.L. Berry, et al., Land Use, Urban Form and Environmental Quality, Chicago: Department of Geography, University of Chicago, 1974.

The use of factor analysis for data reduction purposes and for the development of groupings has been widely critiqued. In a tongue-in-cheek study of the dangers of indiscriminate use of factor analysis, J. Scott Armstrong uses an example in which Tom Swift, the young analyst, must collect data of significance and then analyze a sample of metal blocks. Armstrong chose the variables in such a way that only five of the eleven variables are significant, the other six being only combinations of the first five.
The results are amusing, with the characteristics of the metal blocks identified as "intensity, shortness and compactness."* The point made in the Armstrong article is that the investigator must have some prior knowledge of the sample under study, first, to frame the hypotheses, and second, to interpret the results in light of reality.

*J. Scott Armstrong, "Derivation of a Theory by Means of Factor Analysis or Tom Swift and His Electric Factor Analysis Machine," The American Statistician (December 1967): 17-21.

In order to frame the research hypotheses, and to assist in the interpretation of the results, an iterative, or two-stage, factor analysis procedure was used. This approach also structured and facilitated the data collection efforts. The variables on which information was to be collected were separated into six categories, or significant characteristic sets (SCSs), each of which describes or affects ambient environmental quality. These sets are:

SCS 1. Ambient Environmental Quality
SCS 2. Urban Form and the Physical Environment
SCS 3. Residuals
SCS 4. Household Sector
SCS 5. Government Sector
SCS 6. Industrial Activity

Simple factor analyses were performed for each of these SCSs, with the resultant factors being inputs into the second stage factor analysis. City groupings were then developed on the basis of the second stage factors obtained. The objective here was to minimize the within-group variance and to maximize the between-group variance in terms of the dimensions defined by the second stage factor analysis. In other words, the objective was to form groups of cities similar to one another, but different from the cities of other groups.

Modal, or most representative, cities were then selected for each of the groups by simply identifying the city in each group which lies closest in the multidimensional space to the geometric centroid (center) of the group.
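The modal-city rule described above can be illustrated with a minimal Python sketch. It assumes that each city is represented by a vector of scores on the second stage factors and that closeness is measured by ordinary Euclidean distance; the function names and the score values are hypothetical and are not taken from the study data.

```python
import math

def centroid(points):
    """Return the coordinate-wise mean of a list of equal-length score vectors."""
    n = len(points)
    dims = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dims)]

def modal_city(group):
    """Return the name of the city whose factor scores lie closest
    (Euclidean distance, an assumption) to the centroid of its group."""
    center = centroid(list(group.values()))
    return min(group, key=lambda name: math.dist(group[name], center))

# Hypothetical scores on four second stage factors (illustrative values only).
example_group = {
    "City A": [0.8, -0.3, 0.0, 0.3],
    "City B": [1.2, 0.1, 0.4, 0.7],
    "City C": [1.0, -0.1, 0.2, 0.5],
    "City D": [1.1, 0.0, 0.1, 0.6],
}

print(modal_city(example_group))
```

The same function could be applied to each group in turn to list one candidate case study site per group.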
1.3 Results

This research project developed an urban typology and identified representative SMSAs appropriate for general environmental analysis. In addition, it developed a flexible capability for developing similar typologies and identifying representative SMSAs for testing alternative research hypotheses. Each of these will be described in turn.

1.3.1 SMSA Groupings for General Environmental Research

Five major city groups were identified for general environmental purposes; these five groups include 175 of the 262 SMSAs considered. The remaining cities are either outliers, single cities significantly different from the cities in the five major groups, or they are in minor groups comprised of a smaller number of cities, having different characteristics.

Table 1-1 describes the five major city groupings. The largest group contains 48 SMSAs, with the modal city being Albany-Schenectady-Troy, New York. Other cities in this group include Appleton-Oshkosh, WI; New Britain, CT; Portland, OR; etc. The second largest group contains 46 SMSAs, with Lake Charles, LA, being the modal city, while the smallest group contains 18 cities, with Dallas, TX, being its representative.

Table 1-1
Major SMSA Groupings for General Environmental Purposes

Group No.   No. of Cities   Representative City
1           36              Little Rock-North Little Rock, AR
2           46              Lake Charles, LA
3           27              Williamsport, PA
4           48              Albany-Schenectady-Troy, NY
5           18              Dallas, TX

Cities Close to Modal City: Baton Rouge, LA; Corpus Christi, TX; Lafayette, LA; Midland, TX; Montgomery, AL; Odessa, TX; Tyler, TX; Spartanburg, NC; Parkersburg-Marietta, WV-OH; Davenport, IL; Evansville, IN-KY; Lawrence, MA; Peoria, IL; Appleton-Oshkosh, WI; New Britain, CT; Portland, OR; Charlotte, NC; Richmond, VA

From this classification the single most representative city for environmental analysis in the United States is Louisville, Kentucky.
If one were limited to a single case study, or demonstration project, the results would suggest that it should be located in Louisville, KY. The study results could then be generalized to other cities. If resources allowed for a greater number of case studies or demonstration projects, say five, these should be located at Little Rock, AR; Lake Charles, LA; Williamsport, PA; Albany-Schenectady-Troy, NY; and Dallas, TX, with the results being appropriate for the other cities in each of the five groups.

To assess the effects of city size on city characteristics, the set of 262 cities was divided into small (less than 200,000 population), medium (between 200,000 and 500,000 population), and large (greater than 500,000 population) SMSAs. Two analyses were performed. First, the second stage factor analysis was repeated for each of the city size groups. Second, a separate clustering, similar to that described for the entire sample, was performed within each of these strata.

Second stage factors remained stable for the three size groups. For example, factor 1 (largest explanatory power) from the all-city analysis, indicating low income, low expenditures for sewerage, and low levels of manufacturing activity, showed up as factor 1 in the analysis for each group of cities.

The clusters within each stratum were based on a single set of second stage factors, the set used for the general classification. Separate classifications may be very useful where city size is of great importance. The classifications appear to provide similar results. Note from Table 2-7 that small, medium, and large SMSAs are distributed throughout the general classification. Clusters within city size strata were found to be similar to the general SMSA groups, and to clusters in the other size groups. For example, Group 1 of the small SMSAs and Group 1 of the medium SMSAs show similar characteristics, as do Group 3 of the medium size SMSAs and Group 4 of the large SMSAs. In other words, city size did not significantly affect the classification scheme.
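The size stratification used in this comparison can be expressed as a short Python sketch. The population thresholds are those quoted in the text; the treatment of SMSAs with exactly 200,000 or exactly 500,000 inhabitants as "medium" is an assumption, since the boundary handling is not stated. The function names and the example populations are hypothetical.

```python
def size_stratum(population):
    """Assign an SMSA to a size class using the thresholds quoted in the text.
    Treating exactly 200,000 and exactly 500,000 as "medium" is an assumption."""
    if population < 200_000:
        return "small"
    if population <= 500_000:
        return "medium"
    return "large"

def stratify(populations):
    """Group SMSA names by size class so that each stratum can be analyzed separately."""
    strata = {"small": [], "medium": [], "large": []}
    for name, population in populations.items():
        strata[size_stratum(population)].append(name)
    return strata

# Hypothetical populations (illustrative values only).
example_populations = {
    "SMSA A": 95_000,
    "SMSA B": 340_000,
    "SMSA C": 1_250_000,
    "SMSA D": 180_000,
}

strata = stratify(example_populations)
print(strata)
```

Each list in the result would then be passed separately to the factor analysis and clustering steps.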
1.3.2 Application to Alternative Research Hypotheses

The data collected in this research project, and the methodology developed, may be used to assist other environmental and urban research in three major ways: through the direct use of the data, through the identification of appropriate case study sites, and through facilitating the generalization of study results. Each of these will be described in turn.

The use of the data collected during this project for other research purposes is an obvious function. Although our data collection efforts were limited to secondary sources, some of the information contained in the data base was not easily accessible to the public. Information on ambient water and air quality, obtained from STORET and from the SEAS model, provides two such examples. Some of the descriptors of land use represent another case in point: the urbanized proportions of SMSAs and the land area devoted to outdoor recreation were obtained from USDA and the Bureau of Outdoor Recreation, respectively. This data collection effort need not be duplicated by other researchers; a comprehensive description of our data collection efforts, as well as a complete listing of data, may be found in Volume III of this series of reports.

As described above, a broad data base containing some 200 variables was collected, containing descriptors of ambient environmental quality and a diversity of other phenomena believed to affect environmental quality. Alternative research hypotheses may be described in terms of the variables contained in the data base, on the basis of which representative cities appropriate for case study or demonstration project siting may be selected. For example, a program analyst interested in the effects of the bottle bill on resource recovery may be interested in funding a limited number of demonstration projects.
The optimal sites for these may be identified by first specifying the variables believed to affect the outcome, then performing factor analysis, groupings, and the selection of representative cities as described above. If the number of variables is limited and the research hypothesis is well defined, a simple one-stage factor analysis may be appropriate. Additional variables of interest available from secondary sources may also be added. In addition, the universe of cities may be limited to fit the requirements of the particular research project; it may be limited to cities in a certain size range, to cities in a certain geographical region, or to cities possessing certain attributes, such as high unemployment.

Environmental deterioration and other problems frequently occur in SMSAs which are "outliers," that is, which do not fit into any of the groups. Although the studies analyzing these effects cannot be easily generalized to other cases, a limited extrapolation may be possible. An examination of the data for the outlying SMSAs will identify the factor axes or variables with extreme observations, which are in fact responsible for the outlier position of the city. If these variables are not crucial to the analysis, the city may be grouped with others in terms of the remaining variables. The study results can then be generalized to this group, although at a lower level of confidence.

This capability was tested during the course of the project in connection with siting potential demonstration projects by the Energy Research and Development Administration (ERDA). This agency is interested in a limited number of demonstration projects for electric cars; potential sites for these demonstration projects were identified by USR&E. The results of this application are described in more detail in Chapter 4 of this volume, as well as in Volume II.

Case studies or demonstration projects may not always be performed at their ideal sites; data limitations or other constraints may prevent this.
Alternatively, a researcher may be interested in generalizing the results of a previously performed study. The methodology developed in this project may facilitate this process as well. Variables are selected, factor analysis is performed, and groupings are developed in the same manner as described above. The cities of interest are then located within or outside the groups, indicating the degree to which the study results may be generalized.

The development of the general city typology, the data bases, the factor analytic techniques, and the clustering methods used are described in Chapter 2. The applicability of the groups and their modal cities to general environmental research is discussed in Chapter 3; the development of alternative urban typologies is summarized in Chapter 4.

2.0 GENERAL CLASSIFICATION OF SMSAs

2.1 Research Design and Data Base

Essential to the design of any classification methodology are (1) definition of the entities to be classified, (2) identification of attributes to be considered in the classification and the formulation of hypotheses concerning those attributes, and (3) selection of appropriate techniques by which the data can be used together to differentiate between observations. Each of these will be discussed below.

2.1.1 The Set of Localities to be Classified

For this research, the definition of entities to be classified was a simple matter. Standard Metropolitan Statistical Areas (SMSAs) have been defined by the Office of Management and Budget (OMB) to represent the areas in and around one or more cities that act as a center in which the activities form an integrated economic and social system. Although other definitions of U.S. metropolitan areas are available, none has been utilized as extensively as the SMSA for the collection of data. The use of SMSAs therefore yielded the greatest possible amount of data that are consistent across U.S. metropolitan areas.
As cities are constantly changing and OMB revises the definition of an SMSA periodically, it was necessary to perform the classification from the perspective of a single point in time. The set of SMSAs defined as of January 7, 1972 was arbitrarily selected as the set of SMSAs to be classified, and the data utilized in the classification were those which most closely represented each SMSA at that point in time.

2.1.2 The Data Base

Figure 1 in the Executive Summary presents a general diagram intended to identify the major determinants of environmental quality. Beginning at the bottom of the diagram, ambient environmental quality is shown to be a function of both residuals and the capacity of the environment to dilute and/or neutralize pollutants (urban form and physical environment). Urban form also influences residuals generation, particularly in the transportation sector. Residuals are also the net result of pollution generated by households and industry and abatement efforts by the public sector. The public sector also influences consumption through investment in public facilities. However, the three areas at the top are reduced to simple form by associating abatement with the public sector, consumption with households, and production with industry. Second order effects, such as the influence of environmental quality on the three actors at the top of the causal path, are not considered here. The primary areas of interest, then, are:

1. ambient environmental quality
2. urban form and the physical environment
3. residuals
4. demographic characteristics
5. government
6. industry

The variables in each of these categories comprise a significant characteristic set (SCS). Initial sets of variables used to represent the six SCSs are listed in Appendix A to this volume. These variables represent the data which are presently available; many items are indicators of relevant activity or surrogates for otherwise unquantified attributes.
For the general environmental classification of cities, only four SCSs were utilized. These are: urban form and the physical environment; demographic characteristics; government; and industry. These SCSs include the variables appropriate for a general urban classification scheme. Further, these variables describe some of the factors which determine ambient environmental quality, as well as the sources and levels of pollutants. The resulting typology should then be appropriate for urban research, as well as for general environmental policy analysis.

SCS2 (urban form) contains variables describing the distribution of activities within the SMSA, the density of the SMSA and its urbanized portions, the assimilative attributes of the city, as well as some transportation related measures. SCS4 is comprised of household descriptors, providing information on the demographic characteristics of the population, on housing quality and living conditions, on economic welfare, on population and income changes, and on the modal split in transportation. The public sector, SCS5, contains information on government expenditures for improving environmental quality, as well as general community concern and involvement. SCS6 is comprised of industry variables indicating the importance of industries critical to environmental quality* and the importance of manufacturing, and describing the industry mix in terms of 20 manufacturing categories, wholesale trade, retail trade, and selected services.

Because previous studies have found size and regional location to dominate groupings, the variables were "standardized" where appropriate, in that they were expressed in per capita or normalized terms. Regional biases were also eliminated where possible; variables that describe only regional location (e.g., latitude) were excluded.

2.1.3 Methodology

As described in Chapter 1, the data set contains too many variables to perform groupings directly, so a data reduction technique is necessary.
An iterative, two stage factor analysis was chosen in order to facilitate framing the research hypotheses and to aid in the interpretation of the resulting factors. Selected factors derived from simple factor analyses of the four individual SCSs served as the input to the second stage factor analysis; the second stage factors then formed the variables that were the basis for the clustering.

*Industries critical to environmental quality have been defined as the heaviest polluters and the industries where abatement is the most difficult.

The basic function of factor analysis is to ascertain and measure the fundamental dimensions or interrelationships of a set of variates. The transformation is made possible by moderate or high correlation between the variables which compose the original data base. When two or more variables are very highly correlated, a single "factor" may describe this related variable set. In geometric terms, correlations between the original variates indicate nonorthogonal dimensions. In most cases, resolution into orthogonal (independent) dimensions, or factors, will simplify the vector space.*

Many alternative techniques have been devised in the development of factor analysis. Although new methods often improve on the last for some analyses, no single method is regarded as better than all others. In fact, none is applicable to all research efforts. The methods differ in basic assumptions about the data base as well as in solution techniques. As a result, the value of any given application of factor analysis rests in the ability of the researcher to choose the method most appropriate to the research at hand. Alternative methods of performing factor analysis, including principal components analysis and common factor analysis, are described in Appendix A to Volume II. For purposes of this research, principal components analysis was selected as the preferred method, with common factor analysis being the principal alternative considered.
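The idea of a factor "loading" on correlated variables can be illustrated with a small sketch. It is not the program used in this study: it assumes a plain principal components extraction of the first component only, found by power iteration on the correlation matrix, and the six observations are hypothetical values invented for the illustration. The function names are likewise assumptions.

```python
import math

def correlation_matrix(rows):
    """Pearson correlations between the columns of `rows` (one row per city)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
            for a in range(p)]

def first_component_loadings(corr, iterations=500):
    """Loadings on the first principal component, found by power iteration."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    eigenvalue = 1.0
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        eigenvalue = math.sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    # A loading is the correlation between a variable and the component.
    return [x * math.sqrt(eigenvalue) for x in v]

# Hypothetical data for six cities: share of employment in the central city,
# share of population in the central city, and a weakly related third measure.
cities = [
    (0.80, 0.75, 12.0),
    (0.60, 0.65, 30.0),
    (0.40, 0.35, 18.0),
    (0.30, 0.30, 25.0),
    (0.70, 0.72, 22.0),
    (0.50, 0.45, 15.0),
]
loadings = first_component_loadings(correlation_matrix(cities))
```

In this made-up example the two central city measures are highly correlated, so both load heavily on the single first component, while the third measure does not.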
The results of the two techniques did not differ substantially, and principal components analysis was the less expensive alternative in terms of computer costs. The choice of this technique is described in greater detail in Section 4.5.3 of Volume II. Other aspects of refining the methodology include selecting the method of rotating the factors and the choice of the clustering technique. These are discussed in Chapters 5 and 6 of Volume II, respectively.

*Note, however, that the final (preferred) solution may include non-orthogonal factors.

2.2 Stage I Factor Analysis

For each SCS, the general approach to developing Stage I factors was to perform initial statistical descriptions and then repeat factor analyses to:

1. clear the data set of unrealistic and erroneous observations;
2. modify hypotheses and the variables in each SCS to reflect knowledge gained from initial analysis;
3. estimate values to replace missing observations;
4. complete the Stage I factor analysis.

The sections of this chapter describe, separately for each SCS, the course of analysis followed and the outcome of Stage I analysis.

2.2.1 SCS2: Urban Form and the Physical Environment

The analysis of SCS2 included a large number of factor analysis runs, each yielding new information about the measures involved. From the 30 variables originally included in this SCS, 15 were chosen for the final Stage I factors which best reflected the urban form concept of the original research hypothesis.

The remaining set of 15 variables was factor analyzed, and yielded the five factors shown in Table 2-1, which explain 65.7 percent of the variance in the variable set. Factor 1, for example, loads on all five central tendency measures, and thus presents a measure of the proportion of activity which takes place in the central city(s) of the SMSA. It is not surprising that this appears as the primary indicator of urban form.
Factor 3 has a high positive loading on the number of primary radial facilities and a positive loading on the number of major circumferential facilities (roads) in the urban area. The factor loads negatively on square miles per person in the SMSA. Together, these indicate a heavily urbanized area.

Factor 5 loads heavily on the number of population centers in the area (PCR) and the number of square miles per person in the central city(s) (SQMC). This indicates a dispersed pattern of urbanization involving more than one community in the core of the area.

Table 2-1
SCS2 - Stage I - Factor Analysis for 15 Variables

Variable (loadings shown for Factors 1 to 5)
  Portion of Employment in Central City
  Portion of Total Population in Central City
  Portion of Manufacturing Employment in the Central City
  Portion of Retail Sales in Central City
  Portion of Manufacturing in Central City
  Portion of Land Which is Urbanized
  Portion of Workers Working Outside SMSA
  Portion of Land Devoted to Outdoor Recreation
  Number of Radial Roadways
  Number of Major Circumferential Roadways
  Square Miles per Person
  Total Miles of Roadway per Capita
  Portion of Principal Arterials
  Number of Population Centers
  Square Miles per Person - Central City

Factor loadings are indicated as follows: .50 to .75 (+); .75 to 1.00 (++); -.50 to -.75 (-); -.75 to -1.00 (--). The factors are numbered in order according to the percent of variance explained. 65% of variance explained.

[The loading marks for individual variables are not legible in this copy of the table.]

2.2.2 SCS4: The Household Sector

Analysis of SCS4 data yielded factors which describe the socioeconomic character of an urban area. In this sense, the analysis of this SCS is more similar than any other to previous efforts in city classification. The variables used in this factor analysis and their factor loadings are presented in Table 2-2. The interpretation of the resulting seven factors is summarized in Table 2-3.
Table 2-2
SCS4 - Stage I - Factors

Variable (loadings shown for Factors 1 to 7)
  Movers into the SMSA
  Individuals Residing in the Same Dwelling for Five Years
  Population Change - SMSA
  Female/Male in 20-64 Age Group
  Population Change - Central City
  Employment/Population - Total
  Employment/Population - Male
  Employment/Population - Black
  Employment/Population - Female
  Income - Gini Coefficient
  Median Family Income
  Married Couples without Own Household
  Crowded Housing Units
  Per Capita Income
  Work Trip - Drivers
  Work Trip - Passengers
  Single Family Dwellings
  Households with One or More Automobiles
  Owner-Occupied Housing
  Units in Structure - More than Five
  Portion of Population which is Urban
  Units in Structure - More than Fifty
  Portion of Population which is of Foreign Stock
  Average Household Size
  Median Age
  Fertility Rate
  Portion of Population which is Black
  Infant Death Rate
  Relative Death Rate

73.3% of variance explained.

[The loading marks for individual variables are not legible in this copy of the table; the factor names are those given in Table 2-3.]

Table 2-3
SCS4 - Stage I - Interpretation of Factors

Factor                   Positive Score Indicates Some Combination of:

1. Growth                Immigration to the SMSA
                         Increasing population in SMSA and central city

2. Employment            High employment participation rates in all sectors of the population

3. Standard of Living    Relatively low level of income as indicated by median and per capita measures
                         Unequal distribution of income over the population
                         Crowded housing

4. Location-Density      High proportion of single unit and owner-occupied housing
                         Heavy dependence on the automobile
                         Relatively high level of income as indicated by a per capita measure

5. Cosmopolitan          Dense housing
                         Urban and foreign elements of the population relatively large

6. Family Structure      Crowded housing
                         Large proportion of young families

7. Health Standard       High infant death rate
                         Relatively high overall death rate given the age distribution of the population

2.2.3 SCS5: The Public Sector

As indicated in Table 2-4, eight variables have been used to describe the public sector. These yielded four factors.

Table 2-4
SCS5 - Stage I - Final Factors

Variable (loadings shown for Factors 1 to 4)
  Local Government General Revenue
  Local Government General Expenditures
  Local Government Employment
  Sewerage Expenditures - Total
  Sewerage Expenditures - Other than Capital
  Expenditures on Water Supply
  Expenditures on Parks & Recreation
  Expenditures on Sanitation Other than Sewerage

Factor labels shown in the table: large government; sewerage; sanitation; water and parks.

85.5% of variance explained.

[The loading marks for individual variables are not legible in this copy of the table.]

2.2.4 SCS6: The Industrial Sector

Results of the first stage factor analysis for this SCS are shown in Table 2-5. The eight factors described here explain 59 percent of the variance indicated by the 24 input variables. The eight factors appear to describe well the basic dimensions of industry mixes and could be easily titled as follows:

1. overall level of industrial activity
2. services and trade
3. textiles and apparel
4. miscellaneous manufacturing and instruments
5. fuel and chemicals
6. paper and allied products
7. leather and leather products
8. lumber and wood products.

The factors are listed in order according to the amount of variance explained by each; the latter factors, therefore, are the least valuable to the factor description of SMSAs.
Table 2-5
SCS6 - Stage I - Final Factors

Factor   Variables
1        VAM    Value added in manufacturing
         EM     Employment in manufacturing
         S34    Value added in fabricated metal products
         S35    Value added in machinery, except electrical
2        WHOL   Wholesale sales
         RETT   Retail sales
         SS     Selected services
         S27    Value added in printing and publishing
3        S22    Value added in textile mill products
         S23    Value added in apparel and other textile products
4        S39    Value added in miscellaneous manufacturing industries
         S38    Value added in instruments and related products
5        S28    Value added in chemicals and allied products
         S29    Value added in petroleum and coal products
6        S26    Value added in paper and allied products
7        S31    Value added in leather and leather products
8        S24    Value added in lumber and wood products

59% of variance explained.

NOTE: These factors did not load on the remaining 7 variables: S20, value added in food and kindred products; S25, value added in furniture and fixtures; S30, value added in rubber and plastic products; S32, value added in stone, clay and glass products; S33, value added in primary metal industries; S36, value added in electrical equipment and supplies; S37, value added in transportation equipment.

2.3 Stage II Factor Analysis

The two-stage application of the factor analytic technique was utilized to ensure the proper evaluation of research hypotheses. Stage I analysis involved separate factor analysis for each SCS to reveal the hypothesized underlying dimensions within each group of variables. The output from individual Stage I analyses was then combined and used as the input to Stage II. Stage II thus identifies the relationships between SCSs and the basic underlying attributes of U.S. metropolitan areas.

For a variety of reasons, both conceptual and statistical, a limited number of Stage I factors were chosen for input to Stage II. This set included the first four factors from each of four SCSs (2, 4, 5, and 6), with the exception of Factor 3 from SCS2.
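The selection rule just stated can be written out as a short sketch to show how many Stage I factors enter Stage II. The names used here (MAX_PER_SCS, DROPPED, stage2_inputs) are illustrative assumptions, not identifiers from the study's programs.

```python
# Illustrative names only; the rule itself is the one stated in the text.
MAX_PER_SCS = 4            # first four Stage I factors from each SCS
DROPPED = {("SCS2", 3)}    # Factor 3 of SCS2 is excluded

def stage2_inputs(scs_names, max_per_scs=MAX_PER_SCS, dropped=DROPPED):
    """List the (SCS, Stage I factor number) pairs used as Stage II input."""
    return [
        (scs, k)
        for scs in scs_names
        for k in range(1, max_per_scs + 1)
        if (scs, k) not in dropped
    ]

inputs = stage2_inputs(["SCS2", "SCS4", "SCS5", "SCS6"])
n_inputs = len(inputs)   # four SCSs times four factors, less the one dropped
```

Under this rule the Stage II input consists of fifteen Stage I factors.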
Stage I analysis yielded a different number of factors for each of the four SCSs to be pursued in Stage II. For SCSs 2, 4, 5, and 6, the number of factors was 4, 7, 4, and 8, respectively, as indicated in the previous section. To use this set of factors in Stage II would create a significant bias toward SCS4 (the household sector) and SCS6 (the industry sector). In addition, since Stage I factors from each SCS are mutually independent, excess factors in SCS4 and SCS6 will lead to the formation of additional separate factors beyond the primary factors indicated by interactions between the four SCSs.

With these considerations, the number of factors from Stage I to be included in Stage II was limited to four per SCS. The loss of explanatory power from the exclusion of these factors is significant but tolerable. The loss is 23.3 percent in SCS4 (73.3 percent explained by all seven factors, less 50.0 percent for the first four) and 21.0 percent in SCS6 (59 percent less 38.0 percent). Percent of variance explained by the first four factors in each SCS is as follows:

SCS2*   Urban Form         70.4 percent
SCS4    Household Sector   50.0 percent
SCS5    Public Sector      85.5 percent
SCS6    Industry Sector    38.0 percent

Although several alternative combinations of Stage I factors were tested, the even distribution of factors between SCSs yielded the most meaningful factors; therefore, the reduced set was accepted as the best input to Stage II.

Factor analysis of the 15 Stage I factors shown in Table 2-6 yielded 6 Stage II factors which, together, explain 63.1 percent of the variance in Stage I factors. These factors are intuitively satisfying as well as statistically valid in that each factor represents a set of characteristics which are likely to be encountered together in an urban area.

Factor 1 indicates a low standard of living, low government expenditures for sewerage, and a low level of total manufacturing activity; a high score on this factor would characterize an economically depressed area.

*Factor 3 was dropped from SCS2 because of data problems.
Factor 2 indicates a low level of total manufacturing activity, heavy growth in recent years, and a high concentration of population and economic activity in the core (central city) of the SMSA.

Factor 3 indicates manufacturing activity in the miscellaneous category, a compact core, and heavy expenditures for sanitation other than sewerage.

Factor 4 indicates high employment and services and trade activities.

Factor 5 indicates low residential densities, high auto dependence, little manufacturing in industries such as textiles and apparel, and heavy expenditures on water supply and recreation.

Finally, Factor 6 indicates a highly urbanized SMSA with a relatively large local government.

Table 2-6
Stage II Factors

SCS   Stage I Factor No.   Stage I Factor Name
4     3                    Low Standard of Living
5     2                    Expenditure on Sewerage
6     1                    Overall Level of Manufacturing
4     1                    Growth
2     1                    Central Tendency
6     4                    Miscellaneous Manufacturing & Instruments
2     4                    Sprawling Core
5     4                    Expenditure on Non-Sewerage Sanitation
4     2                    Employment
6     2                    Services & Trade
4     4                    Location-Density (many single-family homes, high auto dependence)
6     3                    Textiles & Apparel
5     3                    Expenditure on Water Supply & Recreation
5     1                    Large Government
2     2                    Highly Urbanized

[Loadings on Stage II Factors 1 to 6 are marked in the original table with +, ++, -, and -- symbols; the marks are not legible in this copy.]

Before proceeding to the next stage of the analysis, which identified groups of similar cities, a test was performed to determine the stability of Stage II factors between different size cities. The factor analysis was repeated for each of three groups of cities:

Group     Population
small     less than 200,000
medium    200,000-500,000
large     more than 500,000

The proportion of variance explained by primary factors is stable, varying only from 61.7 percent to 65.4 percent.

2.4 SMSA Groupings and Representative Cities

Groups of similar cities were identified through a simple geometrical clustering technique, in which each SMSA is initially considered a separate point.
The two groups separated by the smallest geometric distance are then located and combined to form a new group with its centroid (center) midway between the two points. Then, the two groups with the nearest centroids in the new set are combined and a new centroid located; the process can continue until only one group remains. The centroids are weighted by the number of SMSAs already in the group. Criteria for choosing a stopping point in the process include the size and number of groups, and the relationship between within-group variance and between-group variance. Once the set of groups is selected, modal cities are identified by simply determining which city in each group lies closest in the multidimensional space to the geometric center of the group.

*Computer program written by Howard Gilbert and Steve Chasen, Health Sciences Computing Facility, University of California, Los Angeles, California. Reference: R.R. Sokal and P.H.A. Sneath (1973) Numerical Taxonomy: The Principles and Practice of Numerical Classification (San Francisco: W.H. Freeman and Co.).

It would have been possible to develop SMSA groups directly from the initial variables using the geometrical clustering routine. In practical terms, however, the variables of the data base provide too many dimensions along which cities may differ; the additional descriptive information provided by the variables stresses the uniqueness of each city rather than the underlying basic characteristics which the cities have in common. Because the use of too many dimensions creates an unmanageable set of groups, input to the cluster analysis was arbitrarily limited to the first four Stage II factors.

Table 2-7 shows how the 262 SMSAs grouped together form basic classes of SMSAs. Five major groups of cities were identified; these include 175 of the 262 cities. The remaining cities grouped together as follows:

36 SMSAs in groups of five or more
34 SMSAs in groups of two to five
17 SMSAs in groups of one.
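The clustering and modal city steps described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the UCLA program cited in the footnote: it uses Euclidean distance, stops at a requested number of groups rather than applying the variance criteria, and runs on hypothetical factor scores for six unnamed cities labeled A to F.

```python
import math

def centroid_clustering(points, n_groups):
    """Merge the two groups with the nearest centroids until n_groups remain.

    Returns a list of (member_indices, centroid) pairs.
    """
    groups = [([i], list(p)) for i, p in enumerate(points)]
    while len(groups) > n_groups:
        # Locate the pair of groups whose centroids are closest together.
        _, a, b = min(
            (math.dist(groups[a][1], groups[b][1]), a, b)
            for a in range(len(groups))
            for b in range(a + 1, len(groups))
        )
        (members_a, centre_a), (members_b, centre_b) = groups[a], groups[b]
        n_a, n_b = len(members_a), len(members_b)
        # The new centroid is weighted by the number of cities in each group.
        merged_centre = [(n_a * x + n_b * y) / (n_a + n_b)
                         for x, y in zip(centre_a, centre_b)]
        groups[a] = (members_a + members_b, merged_centre)
        del groups[b]
    return groups

def modal_member(points, members, centroid):
    """Index of the group member lying closest to the group centroid."""
    return min(members, key=lambda i: math.dist(points[i], centroid))

# Hypothetical scores on two factors for six cities.
names = ["A", "B", "C", "D", "E", "F"]
scores = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (3.0, 3.0), (3.2, 2.9), (2.9, 3.3)]

groups = centroid_clustering(scores, n_groups=2)
modal_cities = [names[modal_member(scores, m, c)] for m, c in groups]
```

With these made-up scores the six cities fall into two groups of three, and the modal city of each group is the member nearest its weighted centroid.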
In Table 2-7 the double lines indicate division between groups and the dotted lines delineate subgroups. Subgroups within the major groups exhibit minor dissimilarities; the major groups are dramatically different.

Group I shows a low level of manufacturing activity, low income, and low expenditures on sewerage. This is tempered by moderate loadings on Stage II Factors 2 and 4, which signify a growing economy oriented more than the average toward services and trade. Little Rock, the modal city for this group, is very close to the group's centroid. In 1970, Little Rock was less wealthy than the average SMSA, as is indicated by the number of families below the low income level: 13.5 percent as compared to the national level of 8.5 percent. But the area is growing and, in 1970, enjoyed a low unemployment rate of 3.3 percent (the average for all SMSAs was 4.3 percent), and 34.7 percent of all housing units had been built since 1960 (the average for all SMSAs was 25.5 percent). Employment in manufacturing was below the national level: 20.1 percent as compared to 25.8 percent for all SMSAs in 1970. Other cities found near the center of this group include Baton Rouge, LA; Corpus Christi, TX; and Montgomery, AL.

Group II includes cities which are closer to the centroid of all cities than Group I. Factor scores indicate these cities have high unemployment and are not active in services and trade. In addition, they generally have slightly lower than average income levels and economic activity, and they may have experienced less than average growth. Lake Charles, LA, the modal city for this group, is still different from a hypothetical city at the centroid of the group, loading more heavily on Factor 1 and not at all on Factor 3. A high Factor 1 score is the result of a lower than average standard of living. Of all
families in Lake Charles, 16.6 percent have incomes below the low income level and 4.7 percent of all housing units lack some or all plumbing facilities (national averages for SMSAs are 8.5 and 2.9 percent, respectively). A large negative score on Factor 4 reflects the combined effect of slightly more than average activity in services and trade accompanied by very high unemployment (5.7 percent as opposed to the average of 4.3 percent). Other cities near the centroid of the group are Spartanburg, SC, and Parkersburg, WV. The group also includes the overall modal city of Louisville, KY.

Group III has high negative scores on Factors 2 and 3. Cities in this group, then, are expected to be small SMSAs (in area) with an industrial base and little recent growth. Williamsport, PA, the modal city for this group, includes only one county of moderate size; 42.6 percent of its employment is in manufacturing, and population growth in the decade ending 1970 was only 3.6 percent, as compared with the national average of 16.6 percent for SMSAs. Other cities near the centroid of this group include Davenport-Rock Island-Moline, IA-IL; Evansville, IN-KY; and Lawrence-Haverhill, MA-NH.

Group IV is even closer to the overall centroid than Group II. Scores are moderately negative for Factors 1 and 2, moderately positive for Factor 3, and almost zero for Factor 4. Thus, these cities are expected to rank about average on the dimensions defined by the factor analysis. The modal city, Albany-Schenectady-Troy, NY, has a more negative score than the centroid on Factor 1, perhaps the result of higher incomes and manufacturing activity. Cities very similar to the modal city include Appleton-Oshkosh, WI, and New Britain, CT.

Group V has high scores on Factors 2 and 4, describing large SMSAs which are prosperous, as indicated by growth and high employment, and which are active in services and trade rather than manufacturing. Dallas, TX, has been designated as the modal city for this group and appears to fit the factor description. A great deal of caution should be exercised in dealing with the modal city and groups of cities, since the group includes a large number of cities for which several of the values were estimated. Most of these estimated values are for descriptors of the industry mix, which is important to the grouping of these cities.

Table 2-7
Groups of Similar SMSAs*

[The multi-column layout of this table did not survive. Entries are given in the order in which they can be read, so a group heading does not necessarily govern every entry that follows it. A line of dashes marks a subgroup division where one is legible.]

GROUP 1
Abilene, TX; Lafayette, LA; San Angelo, TX; Midland, TX; Lubbock, TX; Tucson, AZ; Albany, GA; Knoxville, TN; West Palm Beach-Boca Raton, FL; Tampa, FL; Monroe, LA; Orlando, FL

GROUP 2
Huntington-Ashland, WV-KY-OH; Stockton, CA; Modesto, CA
----
Augusta, GA-SC; Pueblo, CO; Fresno, CA
----
Killeen-Temple, TX; Lewiston-Auburn, ME; Pine Bluff, AR; Spokane, WA; Owensboro, KY; Fort Myers, FL; Yakima, WA

MINOR GROUP
El Paso, TX; Tuscaloosa, AL; San Antonio, TX; Columbus, GA-AL

MINOR GROUP
Galveston-Texas City, TX; Manchester, NH; Santa Barbara, CA; Columbia, SC; Corpus Christi, TX; Shreveport, LA; Macon, GA; Texarkana, TX-AR; Wilmington, NC; Savannah, GA; Tyler, TX; Montgomery, AL; [illegible], MS; New Orleans, LA; Portland, ME; Waco, TX; Little Rock-N. Little Rock, AR**; Odessa, TX; Tulsa, OK; Baton Rouge, LA; St. Joseph, MO; Sioux City, IA-NB; Billings, MT; Boise City, ID; Huntsville, AL; Springfield, MO; Amarillo, TX; Florence, AL; Santa Rosa, CA; Pensacola, FL; Riverside-San Bernardino-Ontario, CA; Provo-Orem, UT; Charleston, SC; Salem, OR; Duluth-Superior, WI-MN
----
Altoona, PA; Gadsden, AL; Lake Charles, LA**; Mobile, AL; Bakersfield, CA; Chattanooga, TN-GA; Lakeland-Winter Haven, FL; Birmingham, AL

GROUP 3
Allentown-Bethlehem-Easton, PA-NJ; Harrisburg, PA; St. Louis, MO-IL; Greenville, SC; Reading, PA; York, PA; Lancaster, PA
----
Cedar Rapids, IA; Waterloo-Cedar Falls, IA; Fort Wayne, IN; Toledo, OH-MI; Rockford, IL; Erie, PA; Wheeling, WV-OH; Poughkeepsie, NY; Louisville, KY-IN; Richland-Kennewick, WA; Springfield, OH; Parkersburg-Marietta, WV-OH; Sacramento, CA
----
Davenport-Moline-Rock Island, IA-IL; Williamsport, PA**; Elmira, NY; Mansfield, OH; Lima, OH; Peoria, IL; Lawrence-Haverhill, MA-NH; Gastonia, NC; Salt Lake City, UT; Petersburg-Colonial Heights-Hopewell, VA; Charleston, WV; Wilkes-Barre-Hazleton, PA; Beaumont, TX; Spartanburg, SC; New Bedford, MA; Alexandria, LA; Indianapolis, IN; Springfield, IL; Wichita, KS; Decatur, IL; Terre Haute, IN; Anderson, IN

MINOR [heading partly illegible]
[illegible]; Hartford, CT; Minneapolis-St. Paul, MN-WI; San Jose, CA; Milwaukee, WI; Washington, DC-MD-VA; Norwalk, CT; Rochester, NY

GROUP 4
Bristol, CT; Meriden, CT; New London-Norwich, CT-RI; Baltimore, MD; Los Angeles, CA; Melbourne-Titusville-Cocoa Beach, FL; Daytona Beach, FL; Newark, NJ; Philadelphia, PA; Appleton-Oshkosh, WI; Syracuse, NY; Racine, WI; Scranton, PA; Lorain-Elyria, OH; Worcester, MA; South Bend, IN; Binghamton, NY-PA; New Brunswick-Perth Amboy-Sayreville, NJ; Wilmington, DE-NJ-MD; Cleveland, OH; Detroit, MI; Trenton, NJ; Dayton, OH; Cincinnati, OH; Brockton, MA; Pittsfield, MA; [illegible], MA-NH; [illegible], WI; Anaheim-Santa Ana-Garden Grove, CA; New Haven-West Haven, CT; Chicago, IL; Jersey City, NJ; Seattle-Everett, WA; San Francisco-Oakland, CA; Portland, OR-WA; New Britain, CT; Oxnard-Simi Valley-Ventura, CA; Nashville, TN; Green Bay, WI; La Crosse, WI; Dubuque, IA; Ogden, UT; Santa Cruz, CA; Hamilton-Middletown, OH; Albany-Schenectady-Troy, NY**

GROUP 5
Charlotte, NC; Dallas, TX**; Oklahoma City, OK; Raleigh, NC; Lexington, KY; Tallahassee, FL; Jacksonville, FL; Durham, NC; [illegible], TN-AR-MS; Des Moines, IA; Kansas City, KS-MO; Stamford, CT; Richmond, VA; Omaha, NB; Denver, CO; Columbus, OH; Houston, TX

MINOR GROUPS
Fort Lauderdale-Hollywood, FL; Phoenix, AZ; Roanoke, VA; Sarasota, FL; Fall River, MA-RI; Albuquerque, NM; Eugene-Springfield, OR; Fargo-Moorhead, ND-MN; Lincoln, NB; Lafayette-West Lafayette, IN; Rochester, MN; Topeka, KS; Bloomington-Normal, IL; Canton, OH; Youngstown-Warren, OH; Pittsburgh, PA; Paterson-Clifton-Passaic, NJ; Utica-Rome, NY; Steubenville-Weirton, OH-WV; Johnstown, PA; Long Branch-Asbury Park, NJ; Atlantic City, NJ; Vineland-Millville-Bridgeton, NJ; Fort Smith, AR-OK; Lynchburg, VA; Asheville, NC; Las Vegas, NV; New York, NY; Madison, WI; Bryan-College Station, TX; Gainesville, FL; Columbia, MO; Austin, TX

MINOR GROUPS
Greensboro-Winston-Salem-High Point, NC; Nashville-Davidson, TN; Sioux Falls, SD; Reno, NV; Atlanta, GA; Baltimore, MD; Jackson, MI; Muskegon-Muskegon Heights, MI; Gary-Hammond, IN; Saginaw, MI; Muncie, IN; Bay City, MI; Flint, MI; Lansing-East Lansing, MI; Kalamazoo-Portage, MI; Ann Arbor, MI; Fayetteville, NC; Lawton, OK; Newport News-Hampton, VA; Norfolk-Virginia Beach-Portsmouth, VA-NC; San Diego, CA; Champaign-Urbana-Rantoul, IL; Colorado Springs, CO; [illegible], WA; Salinas-Seaside-Monterey, CA; Vallejo-Fairfield-Napa, CA; Great Falls, MT; Biloxi-Gulfport, MS; Miami, FL; Providence-Warwick-Pawtucket, RI-MA; Waterbury, CT; Springfield-Chicopee-Holyoke, MA-CT; Boston, MA; Bridgeport, CT; Akron, OH; Laredo, TX; McAllen-Pharr-Edinburg, TX; Brownsville-Harlingen-San Benito, TX

*This table is to be used in conjunction with Table 6-5 as discussed below pertaining to Group V.
**Modal SMSA
3.0 APPLICABILITY TO GENERAL ENVIRONMENTAL RESEARCH PROBLEMS

As described in the previous chapter, the general city classification scheme excluded the SCSs describing ambient environmental quality and the residuals discharged into the environment. There were several reasons for this: a classification scheme based on the other four SCSs would result in city groupings useful for general urban research, and the data describing environmental quality had some limitations. Further, the general city groupings should reflect differences in the generation of residuals and in ambient environmental quality if the causal relationships hypothesized in our research design are true (see Figure 1).

In this chapter, the data bases contained in the ambient environmental quality SCS and in the residuals SCS are described first. Second, differentials in environmental quality are analyzed between the general city groupings.

3.1 Environmental Data Base

This data base consists of eleven ambient water quality indicators, two measures of air quality, a single subjective measure of perceived water quality, and eleven drinking water quality variables.

The water quality variables were obtained from STORET. For each SMSA, up to eleven longitude-latitude points were identified along the boundaries; information was retrieved from all STORET stations within the polygon defined by these longitude-latitude points. A simple average of the readings was then calculated for each variable and used to indicate water quality differences across SMSAs. These measures represented approximations at best because of the uneven distribution of sampling over time and over space. Because the motive for sampling varies, the parameters measured, the sampling methods, and the location of the STORET stations also vary. Missing values also represented a significant problem.
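The station selection and averaging just described can be sketched as below. This is only an illustration: the point-in-polygon test shown is the standard ray casting method, which is an assumption (the report does not say how stations were matched to the boundary polygon), and the boundary points and station readings are hypothetical.

```python
def inside_polygon(lon, lat, vertices):
    """Ray casting test: is the point (lon, lat) inside the polygon?"""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def smsa_average(stations, vertices):
    """Simple average of the readings from stations inside the polygon."""
    readings = [value for lon, lat, value in stations
                if inside_polygon(lon, lat, vertices)]
    return sum(readings) / len(readings) if readings else None

# Hypothetical SMSA boundary (longitude, latitude) and station readings
# (longitude, latitude, dissolved oxygen in mg/l).
boundary = [(-92.5, 34.5), (-92.0, 34.5), (-92.0, 35.0), (-92.5, 35.0)]
stations = [(-92.3, 34.7, 8.0), (-92.1, 34.9, 6.0), (-91.5, 34.7, 3.0)]

average_do = smsa_average(stations, boundary)
```

Only the first two hypothetical stations fall inside the boundary, so the third reading does not enter the average. Returning None when no station qualifies mirrors the missing value problem noted above.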
Suspended particulates and sulfur dioxide were the only two air quality parameters included in our data base; for other descriptors of air quality (oxidants, carbon monoxide, and nitrogen dioxide, for example), information was available for a limited number of SMSAs only. The SEAS data file was the source of the air quality information.

The PDI index is a subjective measure of the prevalence, duration, and intensity of water pollution, calculated by the Office of Water Programs in EPA.

Drinking water quality data have been obtained from the Water Supply Division of EPA. The ten drinking water quality parameters include information on the chemical content of the water supply, its alkalinity, hardness, and acidity.

Information on the quantities of residual pollutants discharged into the environment has been obtained from the SEAS data base. This data base includes information on the quantities of residuals discharged into the air (particulates, sulfur oxides, etc.) and into the water (BOD, suspended solids, etc.), and on the generation of solid wastes.

Data on residuals from the SEAS data bank are computed rather than directly measured. Industry coefficients are developed for approximately 400 pollution producing economic sectors and subsectors. These coefficients relate the generation of a specific pollutant by the particular industry to the output of that industry at the national level. The coefficient times the output of the industry in the given SMSA equals the total gross residual. A second coefficient estimating abatement by sector is applied to the gross residual at the SMSA level to obtain the net residual. The use of national coefficients for most sectors ignores regional differentials in the residuals generation process. Industry output at the SMSA level is measured by total economic value of production. The 1975 data used here are actually forecast by the SEAS model rather than measured.
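The residuals arithmetic described above can be written out as a short sketch. The gross residual follows the text directly (national coefficient times SMSA output). How the abatement coefficient is applied is not stated, so treating it as the fraction of the gross residual removed is an assumption, and the sector names and figures are hypothetical.

```python
def gross_residual(national_coefficient, smsa_output):
    """Gross residual: pollutant per unit of output times SMSA output."""
    return national_coefficient * smsa_output

def net_residual(gross, abatement_fraction):
    """Net residual, ASSUMING the abatement coefficient is the share removed."""
    return gross * (1.0 - abatement_fraction)

# Hypothetical sectors in one SMSA:
# (tons of pollutant per $1,000 of output, output in $1,000, abatement share)
sectors = {
    "sector A": (0.5, 2000.0, 0.25),
    "sector B": (0.25, 4000.0, 0.5),
}

net_by_sector = {
    name: net_residual(gross_residual(coef, output), abatement)
    for name, (coef, output, abatement) in sectors.items()
}
total_net = sum(net_by_sector.values())
```

Summing the sector results gives the SMSA total for the pollutant. Because the coefficients are national, two SMSAs with the same industry output receive the same computed residual regardless of local practice.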
The national forecast is shared out between SMSAs based on disaggregate forecasts prepared by the Bureau of Economic Analysis (OBERS), the Economic Information System (EIS) tapes, and other appropriate sources.

3.2 Applicability to Environmental Research

The simplest method of testing the applicability of the general SMSA groups to environmental policy analysis was to look at the variation in environmental measures between groups of SMSAs. Three approaches to comparison have been followed: regression analysis, factor analysis, and t- and F-tests (comparison of means).

Regression analysis was used to test the hypothesis that there is a significant relationship between environmental variables and Stage I factors derived from nonenvironmental data. The statistics support this hypothesis. For example, dissolved oxygen (DO) was found to rise with sewer expenditures (S5F2)*, although it is negatively related to other sanitation expenditures (S5F4). Growing cities have, on the average, lower DO than older centralized cities (S2F1, S2F2, S4F1), and low income also is correlated with low stream DO. A portion of these effects may be related to the fact that northern cities naturally have higher DO because of lower temperatures. However, the general relationship is that sewers, higher incomes, and slow growth all improve DO.

Similarly, significant relationships were found between the other environmental quality variables and the Stage I factors. For a comprehensive description of these results, see Chapter 6 of Volume II.

The second approach to testing the relationship between environmental characteristics and more basic urban attributes was to include Stage I factors and variables from SCS1 and SCS3 in the Stage II factor analysis. The second stage analysis was repeated with each of the eight variables indicating ambient environmental quality.
A close relationship between environmental and general attributes would be expected to cause the added measures to join Stage I factors from other SCSs to form factors similar to those which resulted from the basic fifteen factor set. If environmental attributes were not closely aligned with other attributes, the added measures would cause the factors to be restructured to some extent, perhaps forming an entirely new Stage II factor.

*S5F2 indicates factor number 2 from SCS5.

When SCS1 indicator variables were added, the result was as expected: the indicator variables appended themselves directly to factors derived from the basic fifteen factors. In no case was an additional factor developed.

The final approach to testing the suitability of the groups to environmental analysis involved t- and F-tests, comparisons of means to test whether the groups were significantly different in terms of the environmental quality variables. In the simplest case, testing whether Group A and Group B have significantly different values for a single variable, a t-test is used with the null hypothesis indicating equal means for the two groups. For each variable, then, each pair of groups was compared to generate t-statistics which indicate the magnitude of any difference in the means relative to the variance of the given variable. Significant differences between groups were found for all but one variable, the PDI index.

Groups can also be compared in terms of general environmental quality by performing F-tests between the groups using all eight indicator variables together. The null hypothesis is that no group has its own characteristics, that is, that the eight environmental quality variables do not show significant differences between the groups. If this were true for a pair of groups, they could be combined to form a single, unique group in terms of environmental quality.
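The pairwise comparison of group means described above can be sketched as follows. The report does not say which form of the t-test was used; this sketch assumes the classical two-sample statistic with a pooled variance, and the two groups of values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Two-sample t-statistic under the null hypothesis of equal means.

    Assumes equal variances in the two groups (pooled estimate); this is
    an assumption of the sketch, not a detail taken from the report.
    """
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    standard_error = sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical values of one environmental variable for two city groups.
group_a = [8.0, 9.0, 10.0]
group_b = [5.0, 6.0, 7.0]
t_stat = pooled_t(group_a, group_b)
```

The statistic is then compared with the t-distribution on `na + nb - 2` degrees of freedom; a large absolute value argues against combining the two groups for that variable.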
In every case, however, F-statistics indicate with at least 60 percent probability that the groups were significantly different given the eight environmental indicators.

4.0 OTHER RESEARCH PURPOSES

As discussed in Chapter 1, the data base and the methodology developed during this research project may be useful for other research purposes. In particular, the data may be used directly, city groupings and representative cities may be selected for testing alternative research hypotheses, and the results of studies performed in particular localities may be extrapolated to other areas. During the project, this site selection capacity was tested for ERDA, so that potential sites for electric car demonstration projects were identified. The example described in the following section is followed by other potential applications described in Section 4.2.

4.1 Transportation Demonstration Project

In the context of this project, a specific city classification scheme was developed in response to a problem proposed by the Energy Research and Development Administration (ERDA). ERDA is concerned with the potential for energy conservation which may be achieved through alternative transportation policies in urban areas, in particular through the use of electric cars. The development of an appropriate urban classification scheme and the identification of SMSAs for performing case studies and/or siting demonstration projects were the objectives for this task.

Given the more limited scope of this classification, the factor analysis was performed in a single stage. A set of thirteen variables was chosen from the assembled data base to reflect attributes of an urban area which are important in urban transportation analysis (see Table 4-1). The transportation analysis is particularly interesting because it has been performed for three city size strata (see page 6) as well as for the full sample of 262 SMSAs.
Table 4-1
Variables Selected for the Transportation Classification

      Variable   Description
  1.  H1U        Percent of families in single-unit housing
  2.  TD         Percent of workers commuting as auto drivers
  3.  AUTO       Percent of households with one or more cars
  4.  CENE       Percent of SMSA employment in central city
  5.  PCS        Percent population change, 1960-1970
  6.  2PF        Percent of women 18 or older who are employed
  7.  Y2i        Median household income
  8.  PBL        Percent of population which is Black
  9.  PPH        Persons/housing unit
 10.  SQMU       Square miles/person, urbanized area
 11.  RAD        Count of radial highways
 12.  CIR        Count of circumferential highways
 13.  VMT?       Vehicle miles travelled/capita-day

Four separate single stage factor analyses were performed: one for the 262 SMSAs and one for each of the three city size strata. Although a total of twelve factors were generated, three factors describing auto use, income/racial characteristics, and highways dominate all the runs. In other words, city size appears to have a limited effect on the factors describing urban transportation characteristics.

Four separate clustering procedures were performed: one for all cities, and one for each of the three size strata. The four most important factors were used for clustering in each case; the factors selected varied somewhat between the city size groups.

The 65 large cities formed two major groups, with Providence, RI, and Louisville, KY-IN, being their representatives. About a quarter of the large cities are outliers, indicating the wide divergence in characteristics shown by the large cities. The medium size cities also formed two major groups, with Little Rock, AR, and Tacoma, WA, being their modal SMSAs. Of the 87 cities in this group, only 9 were outliers. Within the 107 small cities, there are five major groups, and about 10 percent unclassified (outlier) cities. Modal cities were Sarasota, FL; Lincoln, NB; St. Joseph, MO; Parkersburg, WV; and Spartanburg, NC.
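One way to picture the choice of a modal city within a group is sketched below. This summary does not spell out the rule used to pick modal SMSAs, so the sketch assumes a simple one: the modal city is the member whose factor scores lie closest (in Euclidean distance) to the group centroid. The city labels and scores are hypothetical placeholders, not values from the data base.

```python
from math import dist


def group_centroid(scores):
    """Mean factor score on each axis over all cities in the group."""
    n_cities = len(scores)
    n_factors = len(next(iter(scores.values())))
    return tuple(
        sum(vector[i] for vector in scores.values()) / n_cities
        for i in range(n_factors)
    )


def modal_city(scores):
    """Return the city nearest the group centroid (ASSUMED selection rule)."""
    centroid = group_centroid(scores)
    return min(scores, key=lambda city: dist(scores[city], centroid))


# Hypothetical factor scores (two factors) for a three-city group.
group = {
    "City A": (0.0, 0.0),
    "City B": (1.0, 1.0),
    "City C": (2.0, 5.0),
}
representative = modal_city(group)
```

Under this assumption, a substitute site (chosen, for instance, because of data availability) could be ranked by the same distance to judge how far it departs from the group's typical profile.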
The results of this classification may be used for a variety of purposes. The modal cities suggest natural case study sites, or locations for demonstration projects; the results of these may then be generalized to other cities in their groups. Further considerations may lead to other choices; data availability represents a case in point. The results of these studies may also be generalized to a larger set of cities. In addition, factor scores may be used to compare cities along the urban transportation related dimensions defined by the factors.

Large city case studies, for example, should be located in Providence, RI, and Louisville, KY-IN, with Providence results being applicable to Akron, OH; Rochester; Pittsburgh; and so on, and the Louisville results being relevant for Atlanta, Baltimore, Omaha, and so on. Medium city case studies should ideally be located in Little Rock and in Tacoma.

Should study results be available for "outlier" cities such as Nashville, Hartford, or San Antonio, the factor scores for these cities should be examined individually. They may indicate that the SMSA (San Antonio, for example) is like no other city; in this case the results cannot be generalized. For other outliers, similarities may be discovered at least along some of the axes. For example, Nashville resembles groups 5 and 6 (of the large city groupings) but does not fall into them because of an extreme value on Factor 1. Thus, its results have at least limited relevance to Toledo, Norfolk, and other cities in these groups.

4.2 Potential Applications

The data base and the methodology developed during this research project may be applied to alternative research purposes in three main ways: the direct use of the data base, case study/demonstration project site selection, and the extrapolation of results of existing case studies to other sites.

The direct use of the data base does not merit extensive discussion.
Clearly, researchers requiring the information available in our data base should not duplicate our data collection efforts, particularly since some of the information has been obtained from unpublished secondary sources. A complete list of the variables included in the data base is given in Appendix A to this volume; a comprehensive description of the data sources, their strengths and weaknesses, as well as a listing of all data may be found in Volume

Case study or demonstration project sites may be selected for a variety of research purposes through the use of the data base and methodology developed in this project. The major constraint to developing an appropriate classification scheme is that the research hypotheses to be tested must be capable of being framed in terms of the variables included in the data base. Indeed, some subset of the data base may be adequate for that purpose. For example, if a program analyst were interested in studying the effects of an antipoverty program, environmental quality and urban form descriptors would be of peripheral interest for that research. Alternatively, if air quality maintenance programs represented the focal point of the inquiry, then water quality descriptors and some of the income variables may not be relevant.

The first step in developing appropriate city groupings is the specification of the relevant data set to be used for factor analysis and clustering. Secondly, the universe of SMSAs may be limited to fit the requirements of the research project. The program being tested may apply to a limited geographical area such as the South, or may be relevant for cities in a certain size category only. It is possible to identify cities with certain attributes to be excluded from or included in the analysis. Once the data set and the universe of SMSAs are delimited, factor analysis is performed.
If the data set is composed of a limited number of variables, and if the research hypotheses are relatively simple and well-defined, a one stage factor analysis may be adequate. A more complicated research design may call for a two stage factor analysis. Clustering is then performed on the basis of the first and second stage factors obtained, and representative cities are selected for each of the city groups. The case studies or demonstration projects should be sited at these representative cities to best ensure that their results can be generalized to the other SMSAs in the group.

In some cases, it may not be possible to perform a case study at an ideal location in the appropriate representative city because of costs, data limitations, or simply lack of cooperation. Alternative sites may then be chosen. Further, the site selection process may be random or based on less rational criteria than the ones described here. It is appropriate to ask whether the results of such studies may be extrapolated to other sites. The data base and methodology developed here may be used for this purpose as well. The data base and the universe of SMSAs must be delimited first; the factor analysis and clustering are then performed as described above. The sites of the case studies/demonstration projects are identified relative to the city groups, with the results being capable of generalization to the other cities in the groups. If the sites do not fall into any of the groups, then the study results may not easily be generalized to other cities.

A large number of case studies have been performed in outlier cities, such as New York, not because of a random site selection process but because the problem areas are frequently outliers. It is possible to analyze the data for such SMSAs and to determine the axes or variables with extreme observations, which are in fact responsible for the outlying position of the SMSA.
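The outlier diagnosis described above (finding the axes on which an SMSA takes extreme values) can be sketched as a simple screening step. The threshold of two standard deviations, the use of the grouped cities as the reference set, and all scores below are assumptions of this sketch rather than details taken from the report.

```python
from statistics import mean, pstdev


def extreme_axes(target, reference, threshold=2.0):
    """Return 1-based factor numbers on which `target` is extreme.

    An axis is flagged when the target's score lies more than `threshold`
    standard deviations from the mean of the reference cities on that
    axis. The threshold is a hypothetical choice, not a report value.
    """
    flagged = []
    for axis, value in enumerate(target):
        column = [scores[axis] for scores in reference]
        center, spread = mean(column), pstdev(column)
        if spread > 0 and abs(value - center) > threshold * spread:
            flagged.append(axis + 1)  # report factors as Factor 1, 2, ...
    return flagged


# Hypothetical factor scores for four grouped cities and one outlier.
grouped_cities = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (1.0, 2.0)]
outlier = (9.0, 2.0)
flagged = extreme_axes(outlier, grouped_cities)
```

In this illustration only Factor 1 is flagged, so the outlier could be compared with the grouped cities on the remaining axis, in the same spirit as the Nashville example in Section 4.1.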
If these variables are not crucial to the analysis, the SMSA may be grouped with other cities in terms of the remaining variables. Study results may be generalized to this group, although at a lower level of confidence.

APPENDIX A

LISTING OF DATA VARIABLES

ASSIGNED
TO SCS    CODE     STATISTICAL UNITS
1         BOD      Biochemical Oxygen Demand (5-day, 20° C)
1         FCOL     Fecal Coliforms, measured by membrane filter method
1         N        *Total Nitrogen (mg/l)
1         C        *Total Organic Carbon (mg/l)
1         TDSS     *Dissolved Solids (mg/l)
1         TSS      *Suspended Solids (mg/l)
1         TURB     *Turbidity (Jackson Candle Units)
1         AALK     *Alkalinity (mg/l as CaCO3)
1         PHS      *Acidity (standard units)
1         OAG      *Oil and Grease, soxhlet extraction (mg/l)
1         MBAS     *Methylene Blue Active Substance (mg/l)
1         SUS      Suspended Particulates (micro-g/cu.m.)
1         SO2      Sulfur Dioxide (micro-g/cu.m.)
1         PDI      PDI Index
1         CL       +Chloride (mg/l)
1         FL       +Fluoride (mg/l)
1         FE       +Iron (mg/l)
1         MG       +Manganese (mg/l)
1         NO3      +Nitrate (mg/l)
1         SO3      +Sulfate (mg/l)
1         ALK      +Alkalinity (mg/l as CaCO3)
1         HD       +Hardness (mg/l as CaCO3)
1         PH       +Acidity (pH standard units)
1         TDS      +Total Dissolved Solids (mg/l)
2         SQMS     Square Miles Per Person
2         SQMC     Square Miles Per Person - Central City
2         SQMU     Square Miles Per Person - Urban Places
2         CENP     Portion of Total Population in Central City
2         CENM     Portion of Manufacturing in Central City (by value added) (percent)
2         CENS     Portion of Retail Sales in Central City (percent)
2         CENE     Portion of Employment in Central City (percent)
2         CENME    Portion of Manufacturing Employment in the Central City (percent)
2         ARC      Arc of SMSA around the Center (quadrants)
2         RAD      Number of Major Radial Roadways
2         CIR      Number of Major Circumferential Roadways
2         PCR      Number of Population Centers
2         LU       Portion of Land Which is Urbanized (percent)
2         REC      Portion of Land Devoted to Outdoor Recreation (percent of total land area)
2         LA       Land in Farms (percent)
2         TCOM     Portion of Workers Working Outside SMSA (percent)
2         LAT      Latitude of SMSA (degrees)
2         LONG     Longitude of SMSA (degrees)
2         ALT      Altitude (feet)
2         PREC     Mean Annual Precipitation (inches)
2         SUN      Mean Annual Possible Sunshine (percent)
2         WIND     Mean Annual Wind Velocity (miles per hour)
2         HWPC     Total Miles of Roadway per Capita
2         HWPR     Portion of Principal Arterials (percent)
2         INV      Inversions (mean annual frequency)
2         WTMP     Water Temperature (ambient) (°C)
2         DO       Dissolved Oxygen in Water (ambient) (mg/l)
2         HARD     Hardness of Water (ambient) (mg/l as CaCO3)
2         WTR      Large Water Bodies (number)
3         RPAR     Particulates (tons per year per capita)
3         RSO      Sulfur Oxides (tons per year per capita)
3         RNOX     Nitrogen Oxides (tons per year per capita)
3         RHC      Hydrocarbons (tons per year per capita)
3         RCO      Carbon Monoxide (tons per year per capita)
3         RBOD     Biochemical Oxygen Demand (tons per year per capita)
3         RSS      Suspended Solids (tons per year per capita)
3         RDS      Dissolved Solids (tons per year per capita)
3         RNUT     Nutrients (tons per year per capita)
3         RWW      Wastewater (million gallons per year per capita)
3         RNSW     Noncombustible Solid Waste (tons per year per capita)
3         RIS      Industrial Sludges (tons per year per capita)
4         AGE      Median Age (years)
4         PP       Portion of Population Which is of Foreign Stock (percent)
4         PR       Fertility Rate (children ever born, per thousand women ever married)
4         NOH      Married Couples Without Own Household (percent)
4         PU       Portion of Population Which is Urban (percent)
4         NMOV     Individuals Residing in the Same Dwelling for Five Years (percent)
4         PPH      Average Household Size (persons per household)
4         PBL      Portion of Population Which is Black (percent)
4         MIGP     Movers into the SMSA (percent of individuals over five years of age)
4         FTM      Female/Male in 20-64 Age Group
4         PCC      Population Change - Central City (percent)
4         PCS      Population Change - SMSA (percent)
4         YMC      Change in Median Income (percent)
4         H1U      Single Family Dwellings (percent)
4         HOCC     Owner-Occupied Housing (percent)
4         HSU      Units in Structure - More than Five Units (percent)
4         H50U     Units in Structure - More than Fifty Units (percent)
4         GINI     Income - Gini Coefficient
4         YM       Median Family Income (dollars)
4         YP       Per Capita Income (dollars)
          GX       Local Government - General Expenditures (dollars per capita)
          GREV     Local Government - General Revenues (dollars per capita)
          GEMP     Local Government - Employment (full time equivalent per capita)
          EM       Total Employment in Manufacturing (percent)
          VAM      Value Added by All Manufacturing (dollars per capita)
          IFLT     Total Value of Production in Meat Animals and Other Livestock (dollars per capita)
          IOIL     Total Value of Production in Crude Petroleum, Natural Gas (dollars per capita)
          IMET     Total Value of Production in Meat Products (dollars per capita)
          ICTH     Total Value of Production in Broad and Narrow Fabrics (dollars per capita)
          145      Total Value of Production in Household Furniture (dollars per capita)
          IPLP     Total Value of Production in Pulp Mills (dollars per capita)
          IPPR     Total Value of Production in Paper and Paperboard Mills
          ICHM     Total Value of Production in Industrial Chemicals (dollars per capita)
          IPRT     Total Value of Production in Commercial Printing (dollars per capita)
          IFRT     Total Value of Production in Fertilizers (dollars per capita)
          IMCM     Total Value of Production in Miscellaneous Chemical Products (dollars per capita)
6         IPLA     Total Value of Production in Plastic Materials and Resin (dollars per capita)
6         IPNT     Total Value of Production in Paints (dollars per capita)
6         IFUL     Total Value of Production in Petroleum Refining (dollars per capita)
6         IASP     Total Value of Production in Paving and Asphalt (dollars per capita)
6         IGLS     Total Value of Production in Glass (dollars per capita)
6         ICLY     Total Value of Production in Structural Clay Products (dollars per capita)
6         ICMT     Total Value of Production in Cement, Concrete, Gypsum (dollars per capita)
6         ISTL     Total Value of Production in Steel (dollars per capita)
6         IALM     Total Value of Production in Aluminum (dollars per capita)
6         IAPL     Total Value of Production in Household Appliances (dollars per capita)
6         ICAR     Total Value of Production in Motor Vehicles (dollars per capita)
6         IELC     Total Value of Production in Electric Utilities (dollars per capita)
6         ICOL     Total Value of Production in Coal Mining (dollars per capita)
6         IVEG     Total Value of Production in Canned and Frozen Foods (dollars per capita)
6         IFBR     Total Value of Production in Cellulose Fibers (dollars per capita)
6         ITAN     Total Value of Production in Leather and Industrial Leather Products (dollars per capita)
6         IABS     Total Value of Production in Other Stone and Clay Products (dollars per capita)
6         ICU      Total Value of Production in Copper (dollars per capita)
6         IPB      Total Value of Production in Lead (dollars per capita)
6         IZN      Total Value of Production in Zinc (dollars per capita)
6         IMTL     Total Value of Production in Other Fabricated Metal Products (dollars per capita)
6         IWST     Total Value of Production in Wholesale Trade (dollars per capita)
6         RETT     Total Retail Sales ($000 per capita)
6         WHOL     Total Wholesale Sales ($000 per capita)
6         S20      Total Value Added in Food and Kindred Products (SIC 20) ($ millions per capita)
6         S22      Total Value Added in Textile Mill Products (SIC 22) ($ millions per capita)
6         S23      Total Value Added in Apparel and Other Textile Mill Products (SIC 23) ($ millions per capita)
6         SS       Selected Services ($000 per capita)
6         S24      Total Value Added in Lumber and Wood Products (SIC 24) ($ millions per capita)
6         S25      Total Value Added in Furniture and Fixtures (SIC 25) ($ millions per capita)
6         S26      Total Value Added in Paper and Allied Products (SIC 26) ($ millions per capita)
6         S27      Total Value Added in Printing and Publishing (SIC 27) ($ millions per capita)
6         S28      Total Value Added in Chemicals and Allied Products (SIC 28) ($ millions per capita)
6         S29      Total Value Added in Petroleum and Coal Products (SIC 29) ($ millions per capita)
6         S30      Total Value Added in Rubber and Plastic Products (SIC 30) ($ millions per capita)
6         S31      Total Value Added in Leather and Leather Products (SIC 31) ($ millions per capita)
6         S32      Total Value Added in Stone, Clay and Glass Products (SIC 32) ($ millions per capita)
6         S33      Total Value Added in Primary Metal Industries (SIC 33) ($ millions per capita)
6         S34      Total Value Added in Fabricated Metal Products (SIC 34) ($ millions per capita)
6         S35      Total Value Added in Machinery, Except Electrical (SIC 35) ($ millions per capita)
6         S36      Total Value Added in Electrical Equipment and Supplies (SIC 36) ($ millions per capita)
6         S37      Total Value Added in Transportation Equipment (SIC 37) ($ millions per capita)
6         S38      Total Value Added in Instruments and Related Products (SIC 38) ($ millions per capita)
6         S39      Total Value Added in Miscellaneous Manufacturing Industries (SIC 39) ($ millions per capita)

*Ambient Water Quality
+Drinking Water Quality

                   CODE     STATISTICAL UNITS
control data*      P        Population (in thousands)
control data*      HTOT     Total Housing Units (in thousands)
control data*      TALL     Total Commuters (hundreds)
control data*      LS       Total Land Area - SMSA (square miles)
control data*      LUBB     Total Land Area - Urbanized Portion (square miles)

*Control data were used for computing normalized variables.

TECHNICAL REPORT DATA

1. REPORT NO.: EPA-600/5-77-008a
3. RECIPIENT'S ACCESSION NO.:
4. TITLE AND SUBTITLE: Classification of American Cities for Case Study Analysis - Volume I, Summary Report
5. REPORT DATE: May 1977 (issuing date)
6. PERFORMING ORGANIZATION CODE:
7. AUTHOR(S): Elizabeth Lake, Carol Blair, James Hudson, Richard Tabors
8. PERFORMING ORGANIZATION REPORT NO.:
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Urban Systems Research & Engineering Inc., 1218 Massachusetts Avenue, Cambridge, Massachusetts 02138
10. PROGRAM ELEMENT NO.: 1HA091
11. CONTRACT/GRANT NO.: 68-01-3299
12. SPONSORING AGENCY NAME AND ADDRESS: Office of Monitoring & Technical Support - Wash., DC, Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C. 20460
13. TYPE OF REPORT AND PERIOD COVERED: Final
14. SPONSORING AGENCY CODE: EPA/600/19
15. SUPPLEMENTARY NOTES: Volumes II - Detailed Report and III - Documentation of Study are available from National
16. ABSTRACT: Attempts to analyze and evaluate the impacts of federal programs have led to the extensive use of case studies of program impacts at selected sites. This project has developed a methodology for the systematic selection of representative case study sites and for generalizing the study results. The methodology, involving two stage factor analysis and clustering, is applied to a specific program/policy problem, the selection of metropolitan areas for case studies in analyzing the impact of federal policies on general environmental quality. The methodology begins with a data base on standard metropolitan statistical areas, SMSAs, including variables related to environmental quality, urban form, and household, industrial, and government activity. It analyzes these variables through a two-stage factor analysis technique which allows heuristic consideration of the significant characteristics. Finally, it develops city clusters which group areas with similar attributes. Modal (or representative) cities are selected for each group and suggested as case study sites. These groups may be used to generalize the study results and to analyze the transferability of results between areas. The methodology is sufficiently flexible to consider a wide range of research hypotheses.
17. KEY WORDS AND DOCUMENT ANALYSIS (descriptors; identifiers/open ended terms): Economic Factors, Economic Surveys, Economic Geography, Census, Central City, Demographic Surveys, Populations, Socioeconomic Status, Urban Areas, Urban Sociology, Urban Geography, Factor Analysis, Modal Cities, City Classification
    COSATI Field/Group: 08F, 05C, 05K, 13B
18. DISTRIBUTION STATEMENT: Unlimited
19. SECURITY CLASS (This Report): Unlimited
20. SECURITY CLASS (This page): Unlimited
21. NO. OF PAGES: 48
22. PRICE:

EPA Form 2220-1 (9-73)

U.S. GOVERNMENT PRINTING OFFICE: 1977-757-056/6426 Region No. 5-11