United States Environmental Protection Agency
Office of Research and Development
Washington, D.C. 20460
EPA/600/R-01/078
October 2001

Guidance for Statistical Determination of Appropriate Percent Minority and Percent Poverty Distributional Cutoff Values Using Census Data for an EPA Region II Environmental Justice Project

Q: In a random location, can one determine the level of % minority and % poverty within 100 contiguous census block groups?
Q: Does the spatial distribution and nature of the census block groups dictate the clumping of the sample locations in a highly populated area?

by
M.S. Nash, G.T. Flatman, D.W. Ebert, and C.L. Cross
U.S. Environmental Protection Agency
Office of Research and Development
National Exposure Research Laboratory
Environmental Sciences Division
Las Vegas, Nevada

Notice

The U.S. Environmental Protection Agency (EPA), through its Office of Research and Development (ORD), funded and performed the research described here. It has been peer reviewed by the EPA and approved for publication. Mention of trade names or commercial products does not constitute endorsement or recommendation by EPA for use.

Preface

The purpose of this report is to assist Region II by providing a statistical analysis that identifies areas with minority and below-poverty populations, known as "Communities of Concern" (COC). The aim was to find a cutoff value that serves as a threshold for identifying a COC from demographic data. Other consultants were also engaged to provide similar information. Region II presented our method to the Senior Managers in June 2000, as a comparison with two other methods: cluster-based cutoff and state averages.
A decision was made to use the cluster-based cutoff and state average because they were easier to understand and to use at the community level. Although our method was not the preferred one, the authors put a significant amount of time and effort into developing the methodology, and we feel the technique is a valid one with possible future uses.

Table of Contents

Notice
Preface
List of Abbreviations
Section 1 - Introduction
Section 2 - Sampling and Decision Units
Section 3 - Distribution and Cutoff Value
    3.1 Decision Unit Is Census Block Group and Sampling Unit Is Census Block Group
    3.2 Decision Unit Is Census Tract and Sampling Unit Is Census Block Group
    3.3 Decision Unit Is County and Sampling Unit Is Census Block Group
Section 4 - Distribution and Cutoff for Re-Sampling
Section 5 - GIS Remediation
Section 6 - Summary and Conclusion
References

List of Tables and Figures

Table   Description
1       Summary of cutoff values associated with the three decision units from the "minority" and "below poverty" statistical analysis of the US EPA Region II (New York and New Jersey) Environmental Justice Study. In all cases, the sampling unit is the census block group. A "*" indicates the state cutoff values are not based on the 80th percentile; these are the values for state as decision unit.
2       Number of neighboring census block groups (No.) from random selection, and five percentiles for percent minority for New Jersey and New York. The 100th percentile is the maximum value.

Figure  Description
1       The Five Percentiles and Their Values for % Minority by Block Group
2       The Five Percentiles and Their Values for % Below Poverty by Block Group
3       The Five Percentiles and Their Values for % Minority by Tract
4       The Five Percentiles and Their Values for % Below Poverty by Tract
5       The Five Percentiles and Their Values for % Minority by County
6       The Five Percentiles and Their Values for % Below Poverty by County
7       Sample Locations (red circles) of the 100 Contiguous Census Block Groups

Appendices

Appendix  Description
1a        Percent minority in New Jersey. Values are from the sampling unit (a census block group)
1b        Percent minority in New York. Values are from the sampling unit (a census block group)
1c        Percent below poverty in New Jersey. Values are from the sampling unit (a census block group)
1d        Percent below poverty in New York. Values are from the sampling unit (a census block group)
2a        Example of error 1
2b        Example of error 2
2c        Example of error 3

List of Abbreviations

TotPop90   Total population in 1990; Universe: persons
Pov_univ   All persons for whom poverty status is determined
Bel_Pov    Below poverty level; Universe: persons for whom poverty status is determined
P_belPov   Percent below poverty level; Universe: persons for whom poverty status is determined; Calculation: (Bel_Pov / Pov_univ) * 100
NhispWht   Non-Hispanic White; Universe: persons
Nhispblk   Non-Hispanic Black; Universe: persons
Nhispnat   Non-Hispanic American Indian, Eskimo, or Aleut; Universe: persons
Nhispas    Non-Hispanic Asian or Pacific Islander; Universe: persons
Nhispoth   Non-Hispanic Other race; Universe: persons
Hisp_wht   Hispanic White; Universe: persons
His_blk    Hispanic Black; Universe: persons
Hisp_nat   Hispanic American Indian, Eskimo, or Aleut; Universe: persons
Hisp_as    Hispanic Asian or Pacific Islander; Universe: persons
Hisp_oth   Hispanic Other race; Universe: persons
Per_min    Percent minority; Universe: persons; Calculation: [(Hisp_wht + His_blk + Hisp_nat + Hisp_as + Hisp_oth + Nhispblk + Nhispnat + Nhispas + Nhispoth) / TotPop90] * 100

Section 1 - Introduction

The goal of this project is to identify a GIS and statistical procedure that will objectively, reproducibly, and statistically identify a "Community of Concern" (COC), defined as a community with a "minority" or "below-poverty" population. We demonstrate the procedure using the census data for the states of New Jersey and New York, located in EPA's Region II. This exercise in classification sounds straightforward and doable, but the choice of threshold or cutoff values and changes of scale (e.g., census block groups to counties) change the number and location of the COC, and may raise questions and criticism. An objective statistical algorithm is needed for identifying and locating the COC on the map of the Region.

This is a non-trivial statistical problem. Because the data have time and space dimensions and skewed probability distributions, hypothesis testing, confidence intervals, and ratios and proportions are inappropriate and hence have the potential to mislead decision-makers. Descriptive analysis of the probability distribution of the data, aggregated to the appropriate scale (census block group, census tract, town, township, county, state, or region), is an appropriate approach for the data and will give the desired quality for identification of a COC. Decisions will be made from the probability of the cutoff, not from an arbitrary cutoff.

In this context, it is important to define units and scale. The basic (indivisible) sampling unit of data or information is the census "block group." The decision unit changes (e.g., census block group, census tract, township, county, or state) and is chosen according to the specific question to be answered.
To change scale to a decision unit other than the census block group (the sampling unit), the counts of the characteristics of all sampling units spatially included in the new decision unit must be summed over that decision unit and the desired percentages recomputed. The counts or frequencies are additive, but the percentages or relative frequencies (probabilities) are not.

The probability distribution is a useful statistical tool to measure the population of all decision units of a given scale (e.g., census tract, township, county). By choosing the cutoff probability at the 80th percentile for the characteristic of "minority" and the characteristic of "below poverty" in the population of all census block group decision units, the cutoff values associated with the cutoff probability are 48% and 68% for minority and 12% and 22% for below poverty, for New Jersey and New York, respectively (Table 1; Figures 1 & 2). It is not obvious that these cutoff values have anything in common, and they sound arbitrary, but in the probability of the population distribution they are determined (back transformed) by equal probability (80th percentile).

It is important to note that the cutoff values associated with the equal probability decrease with a growth in area of the decision unit; this is to be expected from spatial statistics. It is also important to note that the cutoff values depend on the locations of the area where the samples were taken. The cutoff values for the same probability (80th percentile) for the distribution of census tract decision units are 56% and 77% for minority and 13% and 22% for below poverty, for New Jersey and New York, respectively (Table 1; Figures 3 & 4). The cutoff values for the same probability for the distribution of the county decision unit are 31% and 14% for minority and 10% and 13% for below poverty, for New Jersey and New York, respectively (Table 1; Figures 5 & 6). The commonality is their equal probability of the 80th percentile of their respective distributions. Thus the choice of COC will be based on a cutoff of "equal probability" instead of a cutoff of an arbitrary value (e.g., 50% minority or 50% below poverty).

In summary, equal probability, as measured by the chosen highest percentile of the distribution of the data aggregated to the decision unit, will give the COC areas without using arbitrary cutoff values or percentages of "minority" or "below poverty."

Table 1. Summary of cutoff values associated with the three decision units from the "minority" and "below poverty" statistical analysis of the US EPA Region II (New York and New Jersey) Environmental Justice Study. In all cases, the sampling unit is the census block group. A "*" indicates the state cutoff values are not based on the 80th percentile; these are the values for state as decision unit.

                        Minority Cutoff (%)        Below Poverty Cutoff (%)
Decision Unit           New Jersey   New York      New Jersey   New York
Census Block Group      48           68            12           22
Census Tract            56           77            13           22
County                  31           14            10           13
State*                  26           31            8            13

[Figure 1. The Five Percentiles and Their Values for % Minority by Block Group. Map legend classes: New York P1 (0-1.91), P2 (1.91-6.28), P3 (6.28-16.48), P4 (16.48-67.77), P5 (67.77-100); New Jersey P1 (0-2.88), P2 (2.88-7.58), P3 (7.58-16.09), P4 (16.09-48.08), P5 (48.08-100).]

[Figure 2. The Five Percentiles and Their Values for % Below Poverty by Block Group. Map legend classes: New York P1 (0-2.3), P2 (2.3-5.6), P3 (5.6-10.7), P4 (10.7-21.6), P5 (21.6-100); New Jersey P1 (0-1), P2 (1-2.9), P3 (2.9-5.7), P4 (5.7-11.7), P5 (11.7-100).]

[Figure 3. The Five Percentiles and Their Values for % Minority by Tract. Map legend classes: New York 0-3.384, 3.384-8.687, 8.687-23.409, 23.409-77.147, 77.147-100; New Jersey 0-5.002, 5.002-9.894, 9.894-19.173, 19.173-56.136, 56.136-100.]

[Figure 4. The Five Percentiles and Their Values for % Below Poverty by Tract. Map legend classes: New York P1 (0-3.5), P2 (3.501-6.781), P3 (6.781-11.666), P4 (11.666-21.794), P5 (21.794-100); New Jersey P1 (0-2.117), P2 (2.117-3.616), P3 (3.616-6.067), P4 (6.067-12.603), P5 (12.603-75).]

[Figure 5. The Five Percentiles and Their Values for % Minority by County. Map legend classes: New York P1 (1.023-2.702), P2 (2.702-4.948), P3 (4.948-7.775), P4 (7.775-14.217), P5 (14.217-77.054); New Jersey P1 (3.903-8.334), P2 (8.334-15.087), P3 (15.087-22.741), P4 (22.741-31.009), P5 (31.009-54.67).]

[Figure 6. The Five Percentiles and Their Values for % Below Poverty by County. Map legend classes: New York P1 (3.649-8.506), P2 (8.506-9.709), P3 (9.709-11.726), P4 (11.726-13.383), P5 (13.383-28.707); New Jersey P1 (2.569-3.912), P2 (3.912-5.438), P3 (5.438-7.45), P4 (7.45-10.261), P5 (10.261-14.84).]

Section 2 - Sampling and Decision Units

Two statistical units were identified: (1) decision units and (2) sampling units. These units were used to determine whether or not a community was a minority and/or below-poverty community. The sampling unit is the census block group, and the decision unit can be any unit that is equal to or larger than the census block group. For a preliminary attempt, we used census block group, tract, and county units as decision units.

We used three combinations of sampling and decision units to examine the relative frequency of minority and below poverty. The three combinations were:

1. Decision unit is census block group and sampling unit is census block group,
2. Decision unit is census tract and sampling unit is census block group, and
3. Decision unit is county and sampling unit is census block group.
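As an illustration of how an equal-probability cutoff might be read from the distribution of percentages over a chosen decision unit, the following minimal Python sketch computes the four quintile breaks and flags the units at or above the 80th percentile. The unit ids and values are hypothetical, the function names are ours, and the interpolation rule used by Python's `statistics.quantiles` is an assumption that may differ from the percentile definition used in the original analysis.

```python
from statistics import quantiles


def quintile_breaks(percentages):
    """Return the 20th, 40th, 60th, and 80th percentile values."""
    # n=5 splits the data into five equal-probability classes (P1..P5).
    return quantiles(percentages, n=5, method="inclusive")


def flag_coc(units, cutoff):
    """Return ids of decision units whose percentage is at or above the cutoff."""
    # Treating the cutoff itself as inside the top class is an assumption.
    return [uid for uid, pct in units.items() if pct >= cutoff]


# Hypothetical decision units (id -> percent minority); not data from the report.
units = {
    "A": 0.0, "B": 10.0, "C": 20.0, "D": 30.0, "E": 40.0, "F": 50.0,
    "G": 60.0, "H": 70.0, "I": 80.0, "J": 90.0, "K": 100.0,
}

breaks = quintile_breaks(list(units.values()))
cutoff = breaks[-1]          # 80th percentile of this distribution
coc = flag_coc(units, cutoff)
print(breaks, cutoff, coc)
```

The same two functions apply to any of the three combinations above; only the set of decision units whose percentages are passed in changes.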
Section 3 - Distribution and Cutoff Value

Initially, histograms were developed using block group percent minority (Per_min) and percent below poverty (P_belPov) for each county and state (Appendices 1a - 1d). We visually examined the distribution in each histogram and, together with the five equal-probability percentiles of the ArcView maps, defined a decision cutoff value. A separate cutoff value was determined for each of these two variables. The mathematical derivation of percent minority and percent below poverty for the decision units is explained below.

3.1 Decision Unit Is Census Block Group and Sampling Unit Is Census Block Group

For this we used the Per_min and P_belPov variables that were provided to us by Region II and subsequently verified and recalculated by scientists in Las Vegas prior to analysis (see "GIS Remediation" and Appendices 2a - 2c).

3.2 Decision Unit Is Census Tract and Sampling Unit Is Census Block Group

To calculate % minority and % below poverty at the tract level, counts must be used rather than census block group percentages. Counts of minority (summation of Hisp_wht, Hisp_blk, Hisp_nat, Hisp_as, Hisp_oth, Nhispblk, Nhispnat, Nhispas, and Nhispoth), TotPop90, Bel_Pov, and Pov_univ from each census block group were used. Relative frequencies for minority and below poverty at the level of the census tract were calculated. Tract percent minority and percent below poverty are the relative frequencies times 100. Calculations were done as follows:

a) Tract % minority:

\[
\text{Tract \% Minority} = \frac{\sum_{i=1}^{t} m_i}{\sum_{i=1}^{t} T_i} \times 100
\]

Where,
\sum = summation,
t = total number of block groups in a given census tract,
i = census block group (i = 1, 2, ..., t),
m = count of minority in each census block group, and
T = TotPop90 = count of total population in a census block group.

b) Tract % below poverty:

\[
\text{Tract \% Below Poverty} = \frac{\sum_{i=1}^{t} (BP)_i}{\sum_{i=1}^{t} P_i} \times 100
\]

Where,
\sum = summation,
t = total number of block groups in a given census tract,
i = census block group (i = 1, 2, ..., t),
BP = Bel_Pov = count of below poverty in each census block group, and
P = Pov_univ = count of all people who reported their income in each census block group.

3.3 Decision Unit Is County and Sampling Unit Is Census Block Group

To calculate % minority and % below poverty at the county level, counts must be used rather than percentages. Counts of minority (summation of Hisp_wht, Hisp_blk, Hisp_nat, Hisp_as, Hisp_oth, Nhispblk, Nhispnat, Nhispas, and Nhispoth), TotPop90, Bel_Pov, and Pov_univ from each census block group were used. Relative frequencies for minority and below poverty at the level of the county were calculated. County percent minority and percent below poverty are the relative frequencies times 100. Calculations were done as follows:

a) County % minority:

\[
\text{County \% Minority} = \frac{\sum_{i=1}^{c} m_i}{\sum_{i=1}^{c} T_i} \times 100
\]

Where,
\sum = summation,
c = total number of census block groups in a given county,
i = census block group (i = 1, 2, ..., c),
m = count of minority in each census block group, and
T = TotPop90 = count of total population in a census block group.

b) County % below poverty:

\[
\text{County \% Below Poverty} = \frac{\sum_{i=1}^{c} (BP)_i}{\sum_{i=1}^{c} P_i} \times 100
\]

Where,
\sum = summation,
c = total number of census block groups in a given county,
i = census block group (i = 1, 2, ..., c),
BP = Bel_Pov = count of below poverty in each census block group, and
P = Pov_univ = count of all people who reported their income in each census block group.

It is important to note that we excluded block groups with TotPop90 and Pov_univ values of zero prior to posting their five percentile values on maps. The same exclusion has to be applied in any other analyses, such as clusters and averages; otherwise, different analyses will produce non-comparable results.
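The tract and county formulas above share one pattern: sum the counts over the block groups in a decision unit, then recompute the percentage. The following minimal Python sketch illustrates that pattern under stated assumptions: the record layout, the `tract` key, and the pre-summed `minority` field are hypothetical, and the numbers are invented for illustration rather than taken from the census data.

```python
from collections import defaultdict


def aggregate_percent(block_groups, key, numerator, denominator):
    """Sum counts over each decision unit, then recompute the percentage."""
    num = defaultdict(float)
    den = defaultdict(float)
    for bg in block_groups:
        if bg[denominator] == 0:
            # Skip block groups with a zero universe, mirroring the exclusion
            # of zero TotPop90 / Pov_univ block groups described above.
            continue
        num[bg[key]] += bg[numerator]
        den[bg[key]] += bg[denominator]
    return {unit: 100.0 * num[unit] / den[unit] for unit in den}


# Hypothetical block-group records; "minority" stands for the sum of the nine
# minority count fields listed in the text.
block_groups = [
    {"tract": "T1", "minority": 90, "TotPop90": 100},
    {"tract": "T1", "minority": 100, "TotPop90": 900},
    {"tract": "T2", "minority": 0, "TotPop90": 0},
    {"tract": "T2", "minority": 50, "TotPop90": 200},
]

tract_pct = aggregate_percent(block_groups, "tract", "minority", "TotPop90")
print(tract_pct)
```

In this invented example, tract T1 comes out at 19% minority, whereas averaging its two block-group percentages (90% and about 11%) would give roughly 51%. That gap is why the counts, and not the percentages, are summed. Passing a county identifier as `key`, or `Bel_Pov` and `Pov_univ` as the count fields, would cover the other cases in the same way.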
[Histogram: frequency of census block groups versus Percent Minority (0 to 100).]
Appendix 1a. Percent minority in New Jersey. Values are from the sampling unit (a census block group).

[Histogram: frequency of census block groups versus Percent Minority (0 to 100).]
Appendix 1b. Percent minority in New York. Values are from the sampling unit (a census block group).

[Histogram: frequency of census block groups versus Percent Below Poverty (0 to 100).]
Appendix 1c. Percent below poverty in New Jersey. Values are from the sampling unit (a census block group).

[Histogram: frequency of census block groups versus Percent Below Poverty (0 to 100).]
Appendix 1d. Percent below poverty in New York. Values are from the sampling unit (a census block group).

Section 4 - Distribution and Cutoff for Re-Sampling

There was a need to demonstrate the application of the above analysis to randomly aggregated numbers of contiguous census block groups in each state. This was done to simulate the results for decision units larger than the census block group, such as townships, tracts, and/or counties. We generated 100 samples of contiguous census block groups at each of the following sizes: 50, 100, 150, 200, and 250 contiguous census block groups. The % minority and % below poverty were calculated for these simulated groups and their corresponding 80th percentiles were determined (Table 2). The overall trend was for cutoff values to decrease as the number of neighbors increased.

Table 2. Number of neighboring census block groups (No.) from random selection, and five percentiles for percent minority for New Jersey and New York. The 100th percentile is the maximum value.

State        No.    20th    40th    60th    80th    100th
New Jersey   50     9.35    15.24   30.71   57.03   93.92
             100    10.96   17.27   26.72   55.30   96.71
             150    10.96   17.16   29.26   55.62   93.40
             200    13.67   20.34   26.02   47.13   87.38
             250    13.52   22.05   26.70   41.40   82.74
New York     50     5.93    13.66   25.54   56.94   99.42
             100    8.18    13.14   27.53   61.41   98.80
             150    6.48    11.73   22.12   48.92   98.37
             200    6.57    14.43   24.63   60.12   98.55
             250    9.53    16.25   26.36   54.84   96.83

The locations of the central block group for the 100 samples of the 100 contiguous block group simulation for New Jersey and New York are shown in Figure 7. The apparent clumping of the sample locations in highly populated areas is due to the spatial distribution and nature of the block groups. In New York, sample locations were mostly in New York City and Buffalo, and in New Jersey, they were mostly in Jersey City, Newark, Staten Island, Hackensack and Camden (Figure 7). Block groups are drawn to include approximately equal numbers of people. Therefore, block groups in densely populated areas are smaller in size and occur in greater numbers than in rural areas. It follows that if 90% of the block groups occur in urban areas, then 90% of randomly selected groups will fall within these same areas.

[Figure 7. Sample Locations (red circles) of the 100 Contiguous Census Block Groups.]

Section 5 - GIS Remediation

When we began the statistical analyses, we found errors in the data. These errors were:

1) Numerous block groups comprised several polygons where only one was necessary (Appendix 2a),
2) Several polygons were missing from the block group coverage obtained from Region II (Appendix 2b), and
3) Several polygons had erroneous ID codes (Appendix 2c).

To remediate the errors so that both Region II and Las Vegas scientists could work on the same data set, the polygon data were downloaded from ESRI's ArcData Online site, internal boundaries between like block groups were dissolved, and the tabular demographic data supplied by Region II were joined to the polygons.
Results were visually inspected for correctness.

[Map image. Block group ID labels: 340258113021, 340255343021.]
Appendix 2a. Example of error 1.

[Map image.]
Appendix 2b. Example of error 2.

[Map image. Block group ID labels: 361190014033, 361190014034, 360050435009.]
Appendix 2c. Example of error 3.

Section 6 - Summary and Conclusion

We demonstrated a simple descriptive method, based on the probability distribution of census and random sampling data sets, that can be used to identify a COC from a cutoff value. The cutoff value used was the one associated with a cutoff probability at the 80th percentile of the population of decision units, for the characteristic of "minority" and the characteristic of "below poverty." For this analysis, it is important to define the sampling and decision units. The basic sampling unit was the census "block group." The decision unit may be equal to or larger than the sampling unit (e.g., county). If the decision unit is larger than the sampling unit, then all of the characteristics of the spatially included sampling units must be recomputed for the new decision unit.

The above analysis, therefore, offers an easy method to evaluate a cutoff value based on the spatial scale of the decision unit in order to determine whether that unit is a COC. The choice of scale depends on the level of detail required to answer a question and/or to make a managerial decision. In summary, this is one method that could be used to estimate distribution across the regional scale using census data.

References

SAS Institute Inc. 1990. SAS/STAT User's Guide (Version 6, 4th Ed.), Vol. 2. SAS Institute Inc., Cary, North Carolina, USA.