EPA/600/B-18/302
The Human Exposure Model (HEM)
Residential Population Generator (RPGen) Module
Technical Manual
September 2018
U.S. Environmental Protection Agency, Office of Research and Development

-------
Prepared by
Kathie Dionisio1
Graham Glen2
Heidi Hubbard2
Jessica Levasseur2
Contributions from
Kristin Isaacs1
Paul Price1
Dan Vallero1
1U.S. Environmental Protection Agency, Office of Research and Development, National Exposure
Research Laboratory
2ICF

-------
Table of Contents
Acknowledgments and Disclaimer
1.	Introduction
1.1	Overview
1.2	Purpose of this Technical Manual
2.	Overview of Residential Population Generator
2.1	Inputs
2.2	Outputs
3.	Implementation
3.1	Generation of variables
Determination of household size and distribution of adults/children
Binning of households by age distribution and number of household members
Location
Household Income
House Type
3.2	Linking of Survey Data/Simulation of the Population
3.3	Generation of physiology data
Appendices
A.	Geographic region definitions
B.	Output files

-------
Acknowledgments and Disclaimer
The United States Environmental Protection Agency through its Office of Research and Development
funded and collaborated in the research and development of this software. This model and its default
data are currently under development; this material has been distributed for evaluation purposes only.
The model has not been cleared by the United States Environmental Protection Agency for general
distribution. While example input data have been provided, it is up to the user to verify that
appropriate input data are being used for a given application. This manual is draft documentation and
has not been cleared for publication.

-------
1. Introduction
1.1	Overview
The Residential Population Generator (RPGen) module generates a simulated population of individuals
that is representative of the U.S. population, along with each individual's personal and household
characteristics and a description of their residence. RPGen takes as input large, nationally
administered databases representing U.S. demographic, household, and housing patterns.
1.2	Purpose of this Technical Manual
This Technical Manual is intended for use by scientists to understand the logic and scientific rationale
implemented in RPGen.
2.	Overview of Residential Population Generator
2.1	Inputs
The RPGen module takes as input national surveys of individual, household, and housing characteristics
for the U.S. population. Though the various national surveys were conducted independently, the survey
data are linked on key characteristics within the RPGen module so that variables from all surveys can be
assigned in the output data set.
In the RPGen module, three national databases are linked: the Public Use Microdata Sample (PUMS),
American Housing Survey (AHS), and Residential Energy Consumption Survey (RECS). PUMS is produced
by the U.S. Census Bureau's American Community Survey. The version provided as a default input file for
RPGen is the 5-year sample format, covering the years 2012-2014 inclusive, with nearly 9 million data
records. The PUMS data include data on personal income for household members as well as other
population-level descriptors. These data are used to represent the demographic patterns of the U.S., to
be replicated in the simulated population output by RPGen. Data from both AHS and RECS are used to
provide additional information on the household level for the individuals being simulated. The AHS data
provided as default input for RPGen are from the 2013 survey, and cover housing type, housing size, and
housing age. The RECS data provided as default input for RPGen are from 2009 and focus on variables
related to heating, cooling, types of appliances, and other energy-consuming objects found in homes.
2.2	Outputs
Each time RPGen is run, a text file "pophouse.csv" is produced, which includes a detailed description of
the individual, their household members, and characteristics of their residence, for the desired
simulated population. For each primary individual in the simulated population, the pophouse.csv file
includes the age and gender of the primary individual plus each other person living in the household.
The output file also includes physiology related variables for the individuals.
A detailed data dictionary for pophouse.csv can be found in Appendix B.
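As an illustration of how the output file might be consumed, the following Python sketch reads rows into dictionaries. It assumes that pophouse.csv has a header row whose column names match the data dictionary in Appendix B; the function name read_pophouse and the two-column demonstration data are hypothetical.

```python
import csv
import io

def read_pophouse(handle):
    """Yield one dict per simulated primary individual."""
    for row in csv.DictReader(handle):
        # Assumption: the age_years column holds whole years as text.
        row["age_years"] = int(row["age_years"])
        yield row

# Stand-in for open("pophouse.csv", newline=""); only two columns for brevity.
demo = io.StringIO("gender,age_years\nFemale,34\nMale,7\n")
rows = list(read_pophouse(demo))
```

Reading through a file handle keeps the sketch independent of where the file is stored.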
3.	Implementation
3.1 Generation of variables
In many cases, use of consumer products by the primary product user, in addition to bystander exposure
in a household when products are used by other household members, will vary by life stage, and by
characteristics of the individual's housing situation. As such, we aim to define bins of household
composition and housing characteristics which are most likely to capture differences in an individual's
use of consumer products, as well as the background or bystander exposure due to product use in the
household. Additionally, we bin households to assist with linking the three input datasets based on the
common variables present in all input datasets. The PUMS, AHS, and RECS input datasets all include
variables pertaining to the household composition, specifically the total number of occupants of a
household, and additional variables which allow for the logical assignment of the number of adults and
children in the household.
The PUMS dataset is initially used to generate the simulated population. PUMS includes linked
population and housing data. Separate apartments in the same building are considered separate
housing units. Statistical sampling weights provided by PUMS were used to balance the selection
probability at the individual level.
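A minimal sketch of weight-proportional selection is given below. It is not the RPGen implementation: the record layout and the function name draw_persons are hypothetical, sampling with replacement is an assumption, and only the weight field name pwgtp is taken from Appendix B.

```python
import random

def draw_persons(records, n, seed=0):
    """Draw n person records with probability proportional to 'pwgtp'."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same draws
    weights = [r["pwgtp"] for r in records]
    # Assumption: sampling is with replacement, so one survey respondent
    # may stand in for several simulated individuals.
    return rng.choices(records, weights=weights, k=n)

# Toy records; real PUMS records carry many more fields.
people = [
    {"recno": 1, "pwgtp": 10},
    {"recno": 2, "pwgtp": 30},
    {"recno": 3, "pwgtp": 0},  # zero weight: never selected
]
sample = draw_persons(people, 1000)
```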
The PUMS survey includes one record per person, with the potential for multiple records per household.
RECS and AHS contain one record per household. The housing portion of the AHS dataset is a random
survey of potential dwellings, and thus includes empty houses. Due to the original survey design, the
dataset is no more likely to include a house with many occupants than a house with few (or no)
occupants. By assigning a compatible house to each person selected from PUMS, the RPGen sample
becomes representative of the overall population.
Please note that though variables generated in this Module and described below may relate to
estimation of a home's air exchange rate, the air exchange rate is not calculated or estimated in the
RPGen Module. For assignment of air exchange rate for each household, please see the Source-to-Dose
Module Technical Manual.
Determination of household size and distribution of adults/children
Each PUMS record has a variable indicating the total number of persons living in that housing unit.
The total number of persons living in a housing unit ranges from 0 to 20, with lower numbers being far
more common. For the purposes of the RPGen module, houses with no occupants were removed from
consideration. The PUMS survey includes variables indicating the number of children in the household
who are related to the head of household, and a categorical variable with four options: no children, at
least 1 child under 6 years, at least 1 child between 6-17 years, or at least 1 child under 6 years and at
least 1 child from 6-17 years. For the purposes of the RPGen module, the number of children in the
household was assumed to be either the number of children related to the head of household, or the
smallest number consistent with the categorical variable. A minimum of one adult is required in each
household. Adults were assumed to be individuals 18 years of age or older.
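The counting rule above can be sketched as follows. The text does not state how the two candidate child counts are reconciled, so this sketch assumes the larger of the two is used; that choice, the category labels, and the function name household_counts are all hypothetical.

```python
# Hypothetical labels for the four-option children variable, mapped to the
# smallest number of children consistent with each option.
MIN_CHILDREN = {
    "none": 0,
    "under6": 1,
    "6to17": 1,
    "both": 2,  # at least one child under 6 and at least one aged 6-17
}

def household_counts(total_persons, related_children, child_category):
    """Return (adults, children) for one occupied household."""
    if total_persons < 1:
        raise ValueError("unoccupied housing units are excluded")
    # Assumption: use the larger of the two available child counts.
    children = max(related_children, MIN_CHILDREN[child_category])
    # Every household keeps at least one adult (18 years or older).
    children = min(children, total_persons - 1)
    adults = total_persons - children
    return adults, children
```

For example, a four-person household with one related child and the "both" category yields two adults and two children under these assumptions.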
The AHS and RECS variables representing household composition were more straightforward, with RECS
including variables providing the total number of individuals in the household, and the binned age of
each member of the household. AHS included variables providing the total number of residents in the
household, the number of children (<18 years), number of adults (≥18 years), and number of elders (>65
years).

-------
Binning of households by age distribution and number of household members
Each survey provides information on the total number of household members and their age and gender
distribution, and product use may differ by age and gender. Survey records are therefore matched by
grouping households into 4 bins based on household composition.
Table 1. Household composition bins
Bin	Adults	Children
Bin 1	1	0
Bin 2	1	1+
Bin 3	2+	0
Bin 4	2+	1+
Statistical weighting variables and study design were taken into account to implement statistical
sampling on a per-person basis, though the population and housing data will be linked on a per-
household basis.
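The household composition bins in Table 1 can be expressed as a small lookup. The function name composition_bin is hypothetical; the bin numbering follows Table 1.

```python
def composition_bin(adults, children):
    """Return the Table 1 bin (1-4) for a household."""
    if adults < 1:
        raise ValueError("each household requires at least one adult")
    if children < 0:
        raise ValueError("number of children cannot be negative")
    if adults == 1:
        return 1 if children == 0 else 2  # one adult, without or with children
    return 3 if children == 0 else 4      # two or more adults
```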
Location
Due to the inability, given current data, to determine differences in product use by fine-scale geographic
regions (e.g., city or state), the RPGen module identifies location of an individual's home as being in one
of 4 geographic regions (Northeast, Midwest, South, and West), and as being in a rural or urban setting.
The PUMS data set from which individual demographic variables are sampled identifies the Public Use
Microdata Area (PUMA) in which each individual resides. Using the U.S. Census Topologically Integrated
Geographic Encoding and Referencing (TIGER) dataset, the population density of each PUMA was
determined. The RPGen module then classifies an individual's location of residence as urban if the
population density is >129.8 people/km², and rural for a lower population density. The four geographic
regions are coded as 1, 2, 3, or 4 (Northeast, Midwest, South, and West, respectively); the states in
each region are listed in Appendix A.
Both the AHS and the RECS surveys include variables providing data on region of the country and rural or
urban designation, to be used for binning and survey matching purposes.
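The urban/rural rule can be sketched as a simple threshold test. The function names and the population and area inputs are hypothetical; only the threshold of 129.8 people/km² and the region codes come from the text. A density exactly equal to the threshold is treated as rural here, which is an assumption.

```python
URBAN_DENSITY_THRESHOLD = 129.8  # people per square kilometre, from the text

REGION_NAMES = {1: "Northeast", 2: "Midwest", 3: "South", 4: "West"}

def puma_density(population, land_area_km2):
    """Population density of a PUMA in people per square kilometre."""
    return population / land_area_km2

def urban_rural(density):
    """Classify a PUMA as 'urban' or 'rural' from its population density."""
    # Assumption: a density exactly at the threshold counts as rural.
    return "urban" if density > URBAN_DENSITY_THRESHOLD else "rural"
```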
Household Income
Depending on the product category, use of consumer products is sometimes correlated with wealth. In the
RPGen module, household income is used as the indicator of wealth which may correspond to
product use. However, because purchasing power associated with income is relative to cost of living,
income was first ranked by region and urban/rural status, then assigned to bins corresponding to the
households with the top, middle, and bottom third of income within that region and urban or rural
designation. The number and size of bins related to purchasing power were chosen arbitrarily in the
absence of data indicating how consumer product use varies with income.
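The ranking into income thirds within each region and urban/rural group might look like the sketch below. It ignores survey weights and breaks ties by sort order, both of which are simplifying assumptions; the field names id, region, urban, and income and the function name income_bins are hypothetical.

```python
from collections import defaultdict

def income_bins(households):
    """Return {household id: bin}, where 1 = bottom, 2 = middle, 3 = top third."""
    groups = defaultdict(list)
    for h in households:
        # Rank income separately within each region and urban/rural group.
        groups[(h["region"], h["urban"])].append(h)
    bins = {}
    for members in groups.values():
        ranked = sorted(members, key=lambda h: h["income"])
        n = len(ranked)
        for rank, h in enumerate(ranked):
            # Integer arithmetic maps ranks 0..n-1 onto thirds 1..3.
            bins[h["id"]] = 1 + (3 * rank) // n
    return bins
```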
House Type
A variety of housing types are identified in the AHS and RECS datasets. Within the RPGen module,
housing types have been simplified and condensed as: single-unit (stand-alone) structures, multi-unit
structures, and other (mobile homes, boats, etc.). It is believed that the largest influence of house type
on exposure will be in the determination of air exchange rates, which influence indoor air
concentrations. Housing type also affects product use through the presence or absence of yard or
other outdoor maintenance, which is typically not required of residents who do not live in an
owned, stand-alone unit. Additional descriptive variables, such as whether the household owns or rents,
can affect this distinction as well.
3.2 Linking of Survey Data/Simulation of the Population
Using the above defined key demographic and household related variables which can be identified in all
datasets, the PUMS dataset providing demographic and household composition data was linked with
the AHS and RECS datasets which each provided various housing characteristics. Records in each of the 3
surveys were binned on the variables defined previously. To link survey data, records from the same
bins were matched from each of the surveys.
Surveys were linked based on 4 variables: location, household income, house type, and household
composition. Bins utilized for each of the 4 key variables are identified in Table 2.
Table 2. Linking variables and bins
Linking variable	Bins
Location	Northeast, Midwest, South, West; urban vs. rural
Household income	Top, middle, and bottom third of household income by geographic region and urban/rural designation
House type	Stand-alone structure, multi-unit structure, other
Household composition	Household composition as defined by number of adults and children in the household, by age group (see Table 1)
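The bins in Table 2 multiply out to 4 x 2 x 3 x 3 x 4 = 288 combinations, which agrees with the range of the pool variable in Appendix B. The sketch below shows one way such a pool index and a weighted within-pool match could be built. The ordering of the variables inside the index, the record fields pool and weight, and all function names are hypothetical; the actual RPGen encoding is not given in this manual.

```python
import random
from collections import defaultdict

N_URBAN, N_INCOME, N_HOUSE, N_COMP = 2, 3, 3, 4

def pool_id(region, urban, income_bin, house_type, comp_bin):
    """Map bins (region 1-4, urban 0/1, income 1-3, house type 1-3,
    composition 1-4) to a single index in 1..288 (hypothetical ordering)."""
    idx = region - 1
    idx = idx * N_URBAN + urban
    idx = idx * N_INCOME + (income_bin - 1)
    idx = idx * N_HOUSE + (house_type - 1)
    idx = idx * N_COMP + (comp_bin - 1)
    return idx + 1

def index_by_pool(records):
    """Group housing records by their pool index."""
    pools = defaultdict(list)
    for r in records:
        pools[r["pool"]].append(r)
    return pools

def pick_house(person_pool, pools, rng):
    """Pick one housing record from the person's pool, proportional to weight."""
    candidates = pools.get(person_pool, [])
    if not candidates:
        raise LookupError(f"no housing records in pool {person_pool}")
    weights = [c["weight"] for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Indexing the housing records once by pool keeps each per-person match to a single dictionary lookup followed by a weighted draw.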
3.3 Generation of physiology data
For the primary person in each household (randomly chosen, so it may be a child or infant), the httk R
package is used (modified slightly for random number reproducibility) to generate a set of physiologic
variables. These modifications include adding upper and lower bounds on variables such as height and
weight to ensure that physiologically realistic descriptions are generated for the simulated population.
Variables output in pophouse.csv include properties such as height, weight, skin area, organ masses, and
blood flows to each organ. The data are available for later use, for example, by a model which tracks the
chemical after it has entered the body, such as a physiologically-based pharmacokinetic (PBPK) model.
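As a rough illustration of bounded physiological variables, the sketch below derives body weight from the log-scale mean and residual listed in Appendix B and clamps it to a range. The exponential form, the use of clamping rather than resampling, the bound values, and the function names are all assumptions and are not taken from httk or RPGen.

```python
import math

def bounded(value, lower, upper):
    """Clamp value to the closed interval [lower, upper]."""
    return max(lower, min(upper, value))

def body_weight_kg(mean_logbw, logbw_resid, lower=1.0, upper=300.0):
    """Body weight from the log-scale mean and residual, kept within bounds.

    Assumption: weight = exp(mean + residual); the bounds are placeholders.
    """
    return bounded(math.exp(mean_logbw + logbw_resid), lower, upper)

def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 from weight in kg and height in cm."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)
```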

-------
Appendices
A. Geographic region definitions
Region code	State name
3	Alabama
4	Alaska
4	Arizona
3	Arkansas
4	California
4	Colorado
1	Connecticut
3	Delaware
3	District of Columbia
3	Florida
3	Georgia
4	Hawaii
4	Idaho
2	Illinois
2	Indiana
2	Iowa
2	Kansas
3	Kentucky
3	Louisiana
1	Maine
3	Maryland
1	Massachusetts
2	Michigan
2	Minnesota
3	Mississippi
2	Missouri
4	Montana
2	Nebraska
4	Nevada
1	New Hampshire
1	New Jersey
4	New Mexico
1	New York
3	North Carolina
2	North Dakota
2	Ohio
3	Oklahoma
4	Oregon
1	Pennsylvania
1	Rhode Island
3	South Carolina
2	South Dakota
3	Tennessee
3	Texas
4	Utah
1	Vermont
3	Virginia
4	Washington
3	West Virginia
2	Wisconsin
4	Wyoming
B. Output files
Data dictionary for pophouse.csv output file
Please note, not all variables listed are used in subsequent modules of HEM. Variables are maintained,
however, for potential future use in HEM, or in case they prove useful when using RPGen to generate
simulated populations for other modeling efforts beyond HEM. A [-] indicates the variable is unitless.
Variable Name	Description
Population Variables from PUMS
gender	gender of selected person (primary individual); Male or Female
reth	ethnic group (httkpop categories)
compid	7-digit code; first two digits = state FIPS, last 5 digits = 2010 PUMA
recno	PUMS record number
race	W=White, B=Black, N=Native American, A=Asian, P=Pacific Islander, O=Other, M=Multiple
ethnicity	M=Mexican Hispanic, O=other Hispanic, N=not Hispanic
age_years	age in full years, rounded down (range = 0 to 96)
pwgtp	statistical sampling weight
pool	combination of database matching variables family type, house type, income, census region, and urban/rural (range 1-288)
income	annual household income
ages	40-character string, each pair is age of one household member (range 00-96)
genders	20-character string, each is M or F
state	2-digit FIPS code for one of the 50 states or DC (range 01=Alabama to 56=Wyoming)
Housing Variables from AHS
afuel	fuel used for air conditioning; 1=electricity, 2=gas or propane, 3=other, -6=NA
baths	number of full bathrooms; 0-10 (capped at 10), -6=NA
bedrms	number of bedrooms; 0-10 (capped at 10), -6=NA
built	year house was built; each year for 1990+, rounded down to a multiple of 5 for 1970-1989, rounded down to a multiple of 10 for 1920-1969, earlier=1919
cars	number of cars; 0-5 (capped at 5), -6=NA
cellar	type of basement; 1=full basement, 2=partial basement, 3=crawl space, 4=slab, 5=other, -6=NA
hequip	main heating equipment; 1=forced air furnace, 2=steam radiators, 3=heat pump, 4=electric baseboard, 5-14=others
lot	square footage of lot; range is 200-999,997 square feet (almost 23 acres)
pwt	statistical weight within AHS; used for random selection
rooms	number of rooms; 1-21 (capped at 21)
sewdis	type of sewage disposal; 1=septic tank, 2=chemical toilet, 3=outhouse, 4=other, 5=none, -6=municipal system
unitsf	square footage of house (excl. garage, unfinished areas); 99-99,998 (minimum allowed=99, capped at 99,998)
water	source of water (for washing and bathing); 1=water system, 2=well, 3=spring, 4=cistern, 5=stream or lake, 6=bottled, 7=other
waterd	source of drinking water; 1=water system, 2=well, 3=spring, 4=cistern, 5=stream or lake, 6=bottled, 7=other
control	record number from full 2013 AHS database
Housing Variables from RECS
doeid	record number from 2009 RECS database
nweight	sampling weight
hdd30yr	average annual heating degree days; range 0-13346
cdd30yr	average annual cooling degree days; range 0-5357
kownrent	own or rent house; 1=owned, 2=rented, 3=stay without rent
condcoop	part of condo or coop; 1=condominium, 2=cooperative, -2=NA
naptflrs	number of floors in apartment; range 1-4, -2=not an apartment
stories	number of stories in single-family home; 10=one, 20=two, 31=three, 32=4+, 40=split-level, 50=other, -2=not a single family home
stoven	number of oven-cooktop combinations; range 0-10
stovenfuel	fuel used for stove; 1=gas, 2=propane, 5=electric, 21=other
stove	number of cooktops (not combined with ovens); range 1-10
stovefuel	fuel used for cooktop; 1=gas, 2=propane, 5=electric, 21=other
oven	number of ovens (not combined with cooktops); range 0-10
ovenfuel	fuel used for oven; 1=gas, 2=propane, 5=electric, 21=other
ovenuse	frequency of oven use; 0=not used, 1=3+ per day, 2=twice per day, 3=once per day, 4=few times per week, 5=1/week, 6=less
outgrill	outdoor grill used; 0=no, 1=yes
dishwash	dishwasher used in home; 0=no, 1=yes
cwasher	clothes washer used in home; 0=no, 1=yes
washload	frequency clothes washer used; 1=1/week or less, 2=2-4 per week, 3=5-9 per week, 4=10-15 per week, 5=16+ per week
dryer	clothes dryer used in home; 0=no, 1=yes
dryruse	frequency clothes dryer used; 1=every time clothes washed, 2=sometimes when clothes washed, 3=rarely, -2=NA
tvcolor	number of televisions used in home; range 0-15
computer	computer used at home; 0=no, 1=yes
numpc	number of computers; range 0-15
pcprint	number of printers used at home; range 0-9, -2=NA
moisture	humidifier used at home; 0=no, 1=yes
prkgplc1	have an attached garage; 0=no, 1=yes
prkgplc2	have a detached garage or carport; 0=no, 1=yes
cooltype	type of air conditioning system; 1=central, 2=window/wall, 3=both, -2=none
tempniteac	temperature setting at night (in warm weather); range 45-96, -2=no AC
numberac	number of window/wall AC units; range 1-15, -2=NA
numcfan	number of ceiling fans used; range 0-15
notmoist	dehumidifier used at home; 0=no, 1=yes
highceil	high ceilings in home; 0=no, 1=yes
windows	number of windows in heated areas of home; 0=none, 10=1-2, 20=3-5, 30=6-9, 41=10-15, 42=16-19, 50=20-29, 60=30+
adqinsul	level of insulation; 1=well insulated, 2=adequate, 3=poor, 4=none
drafty	home drafty in winter; 1=always, 2=mostly, 3=sometimes, 4=never
swim	swimming pool or hot tub; 0=none, 1=hot tub only, 2=pool only, 3=both
Physiological variables from httk
mean_logh	mean of log(height) for this age-gender group
mean_logbw	mean of log(body weight) for this age-gender group
weight	body weight in kilograms; calculated from mean_logbw and logbw_resid
height	height in centimeters; calculated from mean_logh and logh_resid
blood_mass	mass of blood (kg)
brain_mass	mass of brain (kg)
gonads_mass	mass of gonads (kg)
heart_mass	mass of heart (kg)
kidneys_mass	mass of kidneys (kg)
large_intestine_mass	mass of large intestines (kg)
liver_mass	mass of liver (kg)
lung_mass	mass of lungs (kg)
muscle_mass	mass of muscular tissue (kg)
pancreas_mass	mass of pancreas (kg)
skeleton_mass	bone mass (kg)
skin_mass	mass of skin tissue (kg)
small_intestine_mass	mass of small intestines (kg)
spleen_mass	mass of spleen (kg)
stomach_mass	mass of stomach (kg)
adipose_flow	blood flow to adipose tissue [-]
brain_flow	blood flow to brain [-]
CO	cardiac output (L/h)
gonads_flow	blood flow to gonads [-]
heart_flow	blood flow to heart muscle (not into heart) [-]
kidneys_flow	blood flow to kidneys [-]
large_intestine_flow	blood flow to large intestines [-]
liver_flow	blood flow to liver [-]
lung_flow	blood flow to lung tissue [-]
muscle_flow	blood flow to muscles [-]
pancreas_flow	blood flow to pancreas [-]
skeleton_flow	blood flow to bone tissue [-]
skin_flow	blood flow to skin tissue [-]
small_intestine_flow	blood flow to small intestines [-]
spleen_flow	blood flow to spleen [-]
stomach_flow	blood flow to stomach [-]
other_mass	mass of other tissues (kg)
adipose_mass	mass of adipose tissue (kg)
org_flow_check	relevant to httkpop R package
weight_adj	adjusted body weight (sum of organ masses) (kg)
BSA_adj	adjusted body surface area (cm²)
million.cells.per.gliver	hepatocellularity, million cells/g liver
hematocrit	percent volume of red blood cells in the blood
serum_creat	serum creatinine, mg/dL
gfr_est	estimated glomerular filtration rate, mL/min/1.73 m² BSA
bmi_adj	adjusted body mass index
bmi	body mass index
BSA	body surface area (cm²)
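The packed household fields in the data dictionary (ages, genders, and compid) can be unpacked as sketched below. How unused positions in the ages and genders strings are padded is not stated in this manual; the sketch assumes they hold characters other than digits and M/F (for example, spaces). The function names are hypothetical.

```python
def parse_household(ages, genders):
    """Return a list of (age, gender) tuples from the packed strings."""
    members = []
    for i, g in enumerate(genders):
        pair = ages[2 * i: 2 * i + 2]  # two characters per household member
        # Assumption: unused slots do not contain digits or M/F codes.
        if g not in ("M", "F") or not pair.isdigit():
            continue
        members.append((int(pair), g))
    return members

def split_compid(compid):
    """Split the 7-digit compid into (state FIPS, 2010 PUMA) strings."""
    text = str(compid).zfill(7)  # restore a leading zero lost in numeric form
    return text[:2], text[2:]
```

For example, a compid stored as the number 100100 is padded to "0100100" and split into state FIPS "01" and PUMA "00100".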

-------