United States Environmental Protection Agency
Office of Research and Development

National Human Exposure Assessment Survey (NHEXAS)
Arizona Study
Quality Systems and Implementation Plan for Human Exposure Assessment

The University of Arizona
Tucson, Arizona 85721
Cooperative Agreement CR 821560

Standard Operating Procedure SOP-IIT-A-9.0
Title: Sampling Weight Calculation
Source: The University of Arizona

U.S. Environmental Protection Agency
Office of Research and Development
Human Exposure & Atmospheric Sciences Division
Human Exposure Research Branch

Notice: The U.S. Environmental Protection Agency (EPA), through its Office of Research and Development (ORD), partially funded and collaborated in the research described here. This protocol is part of the Quality Systems Implementation Plan (QSIP) that was reviewed by the EPA and approved for use in this demonstration/scoping study. Mention of trade names or commercial products does not constitute endorsement or recommendation by EPA for use.

SOP# IIT-A-9.0 Revision #0 February 27, 2001 Page: 1 of 8

STANDARD OPERATING PROCEDURE FOR SAMPLING WEIGHT CALCULATION

This Standard Operating Procedure (SOP) uses data that have been properly coded and certified with appropriate QA/QC procedures by the University of Arizona NHEXAS team.

Objective

Calculate sampling weights. The sampling weights are needed to obtain weighted statistics of the NHEXAS data.

Introduction

Three types of sampling weights are defined here:

1. Single-stage Weight: The inverse of the probability of selection of each sampled unit at a particular stage.

2. Total Weight: The total weight of a sampled unit at a particular stage is calculated in one of two ways, depending on the following conditions:

• If no weight adjustment has been made in the previous stages, the total weight is the product of the single-stage weights of that sampled unit at that stage and all previous stages.
• If sampling weights in the previous stages have been adjusted, the total weight is the product of the single-stage weight of that sampled unit at that stage and the adjusted total weight of that unit at the previous stage.

3. Adjusted Total Weight: The total weight after it has been adjusted to best represent the population of interest. This is the weight to be used in the calculation of weighted statistics of the data at each stage.

A diagram that explains the development of sampling weights at each survey stage of NHEXAS is presented in Table 6-1.

Table 6-1: Sampling Weight Diagram-Survey Stages

STAGE      ACTIVITY                                      SINGLE-STAGE WEIGHT   TOTAL WEIGHT
Survey 1   Select 50 Tracts from the State of Arizona    $W_{i,1}$
Survey 2   Select 250 Blocks from the 50 Tracts          $W_{i,2}$             $WT_{i,2} = (W_{i,1})(W_{i,2})$
Survey 3   Select 1225 HHs from the 250 Blocks           $W_{i,3}$             $WT_{i,3} = (WT_{i,2})(W_{i,3})$
Survey 4   Select 1 primary respondent per HH            $W_{i,4}$             WPDES = (WTDES¹)($W_{i,4}$)

¹ WTDES is the adjusted total weight of unit i at survey stage 3, $WT_{i,3}$. Sampling weight adjustments are explained in SOP#10.

Calculation of Single-stage Weights

In random sampling, each sample unit has an equal probability of selection. When n units are selected from a stratum that has N units, the probability of selection of each unit is:

\[ \frac{n}{N} \tag{6-1} \]

When the "Probability Proportional to Size" (PPS) design is applied, sample units that differ in size have unequal probabilities of selection. In NHEXAS, the number of occupied housing units (OHU) is the measure of size. When unit i is selected from a stratum that has N units, the probability of selection of that unit, $\pi_i$, is:

\[ \pi_i = \frac{\text{size of the unit}}{\text{total size of all } N \text{ units}} \tag{6-2} \]

When n units are selected, the probability of selection of each unit is multiplied by n and becomes the frequency of selection, $\pi_i^*$:
\[ \pi_i^* = \frac{(n)(\text{size of the } i^{\text{th}} \text{ unit})}{\text{total size of all } N \text{ units}} \tag{6-3} \]

The sampling weight of unit i, $W_i$, is the reciprocal of the probability of selection or the frequency of selection of that unit:

\[ W_i = 1/\pi_i \quad \text{or} \quad 1/\pi_i^* \tag{6-4} \]

The equations for the weight calculations at each survey stage in NHEXAS are as follows:

Stage 1: Selection of tracts from each county (PPS design)

\[ W_{i,1} = \frac{(\text{OHU/county})_i}{(\text{tract selected/county})_i \, (\text{OHU/tract 2})_i} \tag{6-5} \]

Stage 2: Selection of blocks from each tract (PPS design)

\[ W_{i,2} = \frac{(\text{OHU/tract 2})_i}{(\text{block selected/tract})_i \, (\text{OHU/block 2})_i} \tag{6-6} \]

Stage 3: Selection of households from each block (random sampling)

\[ W_{i,3} = \frac{(\text{OHU/block 2})_i}{(\text{OHU selected/block})_i} \tag{6-7} \]

Stage 4: Selection of one primary respondent from each household (random sampling)

\[ W_{i,4} = \frac{(\text{member/OHU})_i}{1} = (\text{member/OHU})_i \tag{6-8} \]

Calculation of Total Weights

In multi-stage sampling, weighted statistics of the samples at any particular stage can be calculated by using the total weight of each unit at that stage. That total weight is equal to the product of the single-stage weights from that stage and all the preceding stages. Mathematically:

\[ WT_{i,m} = \prod_{s=1}^{m} W_{i,s} \tag{6-9} \]

where $WT_{i,m}$ = total weight of unit i at stage m, and $W_{i,s}$ = single-stage weight of unit i at stage s.

Equivalently, the total weight of unit i at stage m can be considered as the product of the single-stage weight from that stage and the total weight of that unit at stage m-1. Mathematically:

\[ WT_{i,m} = (WT_{i,m-1})(W_{i,m}) \tag{6-10} \]

If the total weight at stage m-1 is adjusted for nonresponse or adjusted with other methods, the adjusted total weight will be used to calculate the total weight at the next stage. Therefore, the equation becomes:

\[ WT_{i,m} = (WT_{i,m-1,\text{adjusted}})(W_{i,m}) \tag{6-11} \]

Variable List

Variable               Description
COUNTY                 County I.D., according to the Census.
COUNTY 2               County I.D., after some changes are made (see details in Procedure).
TRACT                  Tract I.D., according to the Census.
TRACT 2                Tract I.D., after some changes are made (see details in Procedure).
BLOCK                  Block I.D., according to the Census.
BLOCK 2                Block I.D., after some changes are made (see details in Procedure).
HHID                   Household I.D., according to the Census.
RESPONSE               Response status (enrolled or refused to answer the Descriptive Questionnaire).
OHU/CNTY               Number of occupied housing units (OHU) per county.
OHU/TRACT              Number of OHU per tract.
OHU/TRACT 2            Number of OHU per tract, according to TRACT 2.
HH ENU/BLK             Number of households enumerated by "field truthing" per block.
VAC REP/BLK            Number of vacant households per block.
OHU/BLOCK              Number of OHU available for contact per block. This is equal to HH ENU/BLK minus VAC REP/BLK.
OHU/BLOCK 2            Number of households available for contact per block, according to BLOCK 2.
TRACT SELECTED/CNTY    Number of tracts selected per county, according to TRACT 2.
BLOCK SELECTED/TRACT   Number of blocks selected per tract, according to TRACT 2 and BLOCK 2.
OHU SELECTED/BLOCK     Number of OHU selected per block, according to BLOCK 2.
MEMBER/OHU             Number of members per OHU.
$W_{i,1}$              Single-stage weight of OHU i, resulting from sampling in stage 1.
$W_{i,2}$              Single-stage weight of OHU i, resulting from sampling in stage 2.
$W_{i,3}$              Single-stage weight of OHU i, resulting from sampling in stage 3.
$W_{i,4}$              Single-stage weight of the primary respondent in OHU i, resulting from sampling in stage 4.
$WT_{i,3}$             Total weight of OHU i at stage 3.

Procedure

1. In SPSS, open ORIGINAL DQX and delete all variables except the following: TRACT, BLOCK, HHID, and RESPONSE. The data will then be saved as a new file called WEIGHT STRUCTURE MAIN.

2.
In Excel, open WEIGHT STRUCTURE MAIN. The following variables will be added to the file and their values entered: COUNTY, OHU/CNTY, and OHU/TRACT. Data for these three variables will be obtained from the 1990 Census data file C_TRACTJ)AT sent to IIT by the UA research team. Next, the following variables will be added to the file and their values entered: HH ENU/BLK and VAC REP/BLK. Data for these two variables will be obtained from a document called "NHEXAS RECRUITMENT LOG SUMMARY", which was sent to IIT by the UA research team. Also, a variable called OHU/BLOCK will be created. Its value, for each household, is equal to the value in HH ENU/BLK minus the value in VAC REP/BLK.

3. Since some tracts and some blocks were combined, variables for the new tract and block i.d. will be created and called TRACT 2 and BLOCK 2, respectively. The combination data are obtained from the document "NHEXAS RECRUITMENT LOG SUMMARY".

For TRACT 2, all values will be the same as the original tract i.d. except in Santa Cruz county, where its two tracts (9962 and 9964) will be combined. They are considered as one tract with the tract i.d. "99629964".

For BLOCK 2, all single blocks will keep their original block i.d., while all combined blocks will have a new set of i.d. There is a total of 36 groups of combined blocks. Each group will be given a block combination number. The new block i.d. will then be equal to the block combination number plus 9900. For example, Pima county has 6 selected blocks: 101, 307, 113, 128, 119, and 201. The last two blocks are indicated as combined blocks. This combination was given a combination number equal to 1. Therefore, these two blocks will be considered as one block with the i.d. 9901.

Two important criteria for weight calculation purposes are:

a) A block cannot be considered as a single block and a component of combined blocks at the same time.
For example, tract 4 in Yavapai county, which has 4 single blocks (107, 229, 220, and 231) and a group of combined blocks (231+311+312), is considered as having 3 single blocks (107, 229, and 220) and a group of combined blocks (231+311+312).

b) A block cannot be in different groups of combined blocks at the same time. For example, tract 9611 in Navajo county has 2 single blocks (135 and 316) and 3 groups of combined blocks (147+218, 211+218, and 239+218). Since block 218 appears in all 3 groups, the groups must be combined. As a result, this tract is considered as having 2 single blocks (135 and 316) and a group of combined blocks (147+211+239+218).

The new tract and block i.d., which are identified by the variables TRACT 2 and BLOCK 2, will be used for the rest of the weight calculation procedure.

4. The SUDAAN program requires that the number of Secondary Sampling Units (SSU) selected from each Primary Sampling Unit (PSU) be more than one. In NHEXAS, the PSU is the county and the SSU is the tract. Therefore, any county that has only one tract selected must be combined with another county. Based on geographical attributes of the counties, combinations of counties were made and the new county i.d. are:

Combined Counties   New I.D. in COUNTY 2
1 and 17            1017
9 and 11            9011
15 and 25           15025
19 and 23           19023

Similarly, the new county i.d., which is identified by COUNTY 2, will be used for the rest of the weight calculation procedure.

5. In WEIGHT STRUCTURE MAIN, a new variable OHU/TRACT 2 will be created. Its values will be the same as OHU/TRACT except in tract 99629963, where the value is the sum of the OHU in tracts 9962 and 9963. In other words, this variable contains the number of OHU per tract corresponding to TRACT 2. Next, a new variable OHU/BLOCK 2 will be created. For single blocks, the variable will have the same values as OHU/BLOCK.
For groups of combined blocks, each group will have a value that is the sum of the OHU in each of its combined blocks. In other words, this variable contains the number of OHU per block corresponding to BLOCK 2. The summation will be done using available functions in Excel.

6. In WEIGHT STRUCTURE MAIN, using functions available in Excel, the following variables will be created: TRACT SELECTED/CNTY, BLOCK SELECTED/TRACT, and OHU SELECTED/BLOCK.

7. Finally, in WEIGHT STRUCTURE MAIN, calculate the following sampling weights:

• $W_{i,1}$, calculated by using Eq. 6-5.
• $W_{i,2}$, calculated by using Eq. 6-6.
• $W_{i,3}$, calculated by using Eq. 6-7.
• $W_{i,4}$, calculated by using Eq. 6-8.
• $WT_{i,3}$, calculated by using Eq. 6-9.

Spreadsheet Format

In WEIGHT STRUCTURE MAIN:

Column   Variable
1        COUNTY 2
2        TRACT 2
3        BLOCK 2
4        HHID
5        RESPONSE
6        OHU/CNTY2
7        OHU/TRACT 2
8        HH ENU/BLK
9        VAC REP/BLK
10       OHU/BLOCK, calculated from (HH ENU/BLK) - (VAC REP/BLK)
11       OHU/BLOCK 2, which is the values from OHU/BLOCK adjusted according to BLOCK 2
12       TRACT SELECTED/CNTY
13       BLOCK SELECTED/TRACT
14       OHU SELECTED/BLOCK
15       MEMBER/OHU
16       $W_{i,1}$, calculated from (OHU/CNTY) / [(OHU/TRACT 2) x (TRACT SELECTED/CNTY)]
17       $W_{i,2}$, calculated from (OHU/TRACT 2) / [(OHU/BLOCK 2) x (BLOCK SELECTED/TRACT)]
18       $W_{i,3}$, calculated from (OHU/BLOCK 2) / (OHU SELECTED/BLOCK)
19       $W_{i,4}$, which is equal to (MEMBER/OHU)
20       $WT_{i,3}$, calculated from $W_{i,1}$ x $W_{i,2}$ x $W_{i,3}$
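
The formulas listed for spreadsheet columns 16 through 20 can be sketched in Python as follows. This is only an illustration under stated assumptions and is not part of the SOP, which carries out these calculations in Excel. The class name `HouseholdRow`, its field names, the function names, and the example values are all hypothetical; each field stands in for the spreadsheet column named in its comment.

```python
from dataclasses import dataclass


@dataclass
class HouseholdRow:
    """One household row of the weight spreadsheet (hypothetical field names)."""
    ohu_cnty: float             # OHU/CNTY: occupied housing units in the county
    ohu_tract2: float           # OHU/TRACT 2: OHU in the (possibly combined) tract
    ohu_block2: float           # OHU/BLOCK 2: OHU in the (possibly combined) block
    tract_selected_cnty: int    # TRACT SELECTED/CNTY
    block_selected_tract: int   # BLOCK SELECTED/TRACT
    ohu_selected_block: int     # OHU SELECTED/BLOCK
    member_ohu: int             # MEMBER/OHU: members in the household


def single_stage_weights(row):
    """Return the four single-stage weights (stages 1 to 4) for one row."""
    # Stage 1 (PPS): tracts selected within a county, Eq. 6-5.
    w1 = row.ohu_cnty / (row.ohu_tract2 * row.tract_selected_cnty)
    # Stage 2 (PPS): blocks selected within a tract, Eq. 6-6.
    w2 = row.ohu_tract2 / (row.ohu_block2 * row.block_selected_tract)
    # Stage 3 (random sampling): households selected within a block, Eq. 6-7.
    w3 = row.ohu_block2 / row.ohu_selected_block
    # Stage 4 (random sampling): one respondent per household, Eq. 6-8.
    w4 = float(row.member_ohu)
    return w1, w2, w3, w4


def total_weight_stage3(row):
    """Return the unadjusted stage 3 total weight: the product of the first
    three single-stage weights (Eq. 6-9 with m = 3)."""
    w1, w2, w3, _ = single_stage_weights(row)
    return w1 * w2 * w3


if __name__ == "__main__":
    # Made-up values chosen only to make the arithmetic easy to follow.
    example = HouseholdRow(
        ohu_cnty=10000, ohu_tract2=2000, ohu_block2=100,
        tract_selected_cnty=2, block_selected_tract=5,
        ohu_selected_block=5, member_ohu=3,
    )
    print(single_stage_weights(example))  # (2.5, 4.0, 20.0, 3.0)
    print(total_weight_stage3(example))   # 200.0
```

In the stage 3 product, the OHU/TRACT 2 and OHU/BLOCK 2 terms cancel, so the unadjusted total weight reduces to (OHU/CNTY) divided by the product of the three selection counts. For the made-up row above that is 10000 / (2 x 5 x 5) = 200.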