Identifying Predictors of Exposure—Analyses of the
National Human Exposure Assessment Survey (NHEXAS)
Questionnaire and Measurement Data
Prepared For:
U.S. Environmental Protection Agency
National Exposure Research Laboratory—Human Exposure Research Branch
P.O. Box 93478
Las Vegas, Nevada 89193-3478
Prepared By:
Anteon Corporation
4220 South Maryland Parkway, Suite B-408
Las Vegas, Nevada 89119-7525
Under GSA Contract No. GS-09K-99-BHD-0001
Task No. 9T1Z004TMA
Task Leader: Carol B. Thompson, Anteon Corporation
Client Representative: James J. Quackenboss, U.S. EPA
Project Officer: Gary L. Robertson, U.S. EPA
August 2003

-------
Disclaimer
The United States Environmental Protection Agency (EPA) through its Office of Research and
Development funded and managed the research described here. It has been peer reviewed by the
EPA and approved for publication. Mention of trade names or commercial products does not
constitute endorsement or recommendation by EPA for use.

-------
TECHNICAL REPORT DATA
1. Report No. 2.
-
4. Title and Subtitle
"Identifying Predictors of Exposure—Analyses of the National
Human Exposure Assessment Survey (NHEXAS) Questionnaire
and Measurement Data"
5. Report Date
Prepared August, 2003
6. Performing Organization Code
7. Author(s)
Bloom, B.1, Crockett, P.W.2, Egeghy, P.P.3, Leiss, J.2,
Quackenboss, J.J.3, Richins, J.4, Riksford, J.C.4, Schwer, R.K.5,
Stanek, T.4, Stephan, P.M.4, Thompson, C.B.4, Tsang, A.M.4, and
Wilkes, C.R.1
1.	Wilkes Technologies, Inc., Bethesda, MD
2.	Constella Health Sciences, Durham, NC
3.	U.S. Environmental Protection Agency, Las Vegas, NV
4.	Anteon Corporation, Las Vegas, NV
5.	University of Nevada, Las Vegas, NV
8. Performing Organization
Report No.
9. Performing Organization Name and Address
Anteon Corporation
4220 South Maryland Parkway, Suite B-408
Las Vegas, Nevada 89119-7525
10. Program Element No.
8020IF, APG 33, APM 29
11. Contract/Grant No.
LAG #DW4793927501
12. Sponsoring Agency Name and Address
U.S. Environmental Protection Agency
ORD/National Exposure Research Laboratory
Exposure & Dose Research Branch
P.O. Box 93478
Las Vegas, Nevada 89193-3478
13. Type of Report and Period
Covered
14. Sponsoring Agency Code
15. Supplementary Notes

-------
TECHNICAL REPORT DATA
16. Abstract
The National Human Exposure Assessment Survey (NHEXAS) studies provide the basis for
identifying major predictors of exposure to understand how high exposures can be reduced and how
activities contribute to exposures. A systematic analysis of the questions used in NHEXAS relative to
environmental concentration and exposure measurements offers an opportunity to minimize
participant burden and costs for future exposure and health effects studies. As part of the Strategic
Analysis Plan for the NHEXAS study data, task P-01: Analysis and Comparison of NHEXAS
Exposure Data to Residential Pollutant Sources, Concentrations, and Activity Patterns was charged
with identifying questionnaire items and/or environmental and biological measures that are useful for
predicting human exposure to chemicals. Using data from the NHEXAS Region 5 and Arizona
studies, this project evaluated such potential relationships under three analysis objectives: modeling
and regression analysis, classification of individuals by their exposure level, and classification of
individuals with high-end exposure levels. Forty-eight model-based analyses for each of the three
objectives were performed. The topics of predictors that seem to be most universal across the two
studies and two chemical classes are air measurements, tobacco-related activities, air-exchange
activities, housing characteristics, and where time is spent.
17. KEY WORDS AND DOCUMENT ANALYSIS
A. Descriptors
National Human Exposure Assessment Survey
NHEXAS
B. Identifiers / Open Ended
Terms
C. COSATI
18. Distribution Statement
19. Security Class (This
Report)
Unclassified
20. Security Class (This
Page)
Unclassified
21. No. of Pages
22. Price

-------
Contributors
Bernard Bloom, MS, CIAQP
Wilkes Technologies, Inc., Bethesda, MD
Patrick W. Crockett, PhD
Constella Health Sciences, Durham, NC
Peter P. Egeghy, PhD
U.S. Environmental Protection Agency, Las Vegas, NV
Jack Leiss, PhD
Constella Health Sciences, Durham, NC
James J. Quackenboss
U.S. Environmental Protection Agency, Las Vegas, NV
Jennifer Richins
Anteon Corporation, Las Vegas, NV
Jana C. Riksford
Anteon Corporation, Las Vegas, NV
R. Keith Schwer, PhD
University of Nevada, Las Vegas, NV
Thomas Stanek
Anteon Corporation, Las Vegas, NV
Peter M. Stephan
Anteon Corporation, Las Vegas, NV
Carol B. Thompson
Anteon Corporation, Las Vegas, NV
Andy M. Tsang
Anteon Corporation, Las Vegas, NV
Charles R. Wilkes, PhD, PE
Wilkes Technologies, Inc., Bethesda, MD

-------
Abstract
The National Human Exposure Assessment Survey (NHEXAS) studies provide the basis for
identifying major predictors of exposure to understand how high exposures can be reduced and
how activities contribute to exposures. A systematic analysis of the questions used in NHEXAS
relative to environmental concentration and exposure measurements offers an opportunity to
minimize participant burden and costs for future exposure and health effects studies. As part of
the Strategic Analysis Plan for the NHEXAS study data, task P-01: Analysis and Comparison of
NHEXAS Exposure Data to Residential Pollutant Sources, Concentrations, and Activity Patterns
was charged with identifying questionnaire items and/or environmental and biological measures
that are useful for predicting human exposure to chemicals. Using data from the NHEXAS
Region 5 and Arizona studies, this project evaluated such potential relationships under three
analysis objectives: modeling and regression analysis, classification of individuals by their
exposure level, and classification of individuals with high-end exposure levels.
The analysis approach used in this project includes both non-traditional statistical techniques and
non-traditional uses of traditional parametric statistical techniques. It uses patterns of
relationships that exist in the data rather than subjective methods to select variables for
subsequent analyses. The analyses are then science-driven; that is, they are based on the
understanding of the types of relationships that are plausible, and are used to test hypotheses
based on that understanding of the science. The approach recognizes the limitations of the data
and thus balances exploratory and descriptive paths with inferential paths.
This approach to identifying primary predictors provided a systematic and consistent review of all
the questions used in the NHEXAS studies, whereas other analyses evaluating relationships
between questionnaire and measurement variables are usually limited to selected questions of
interest. In Phase 2, the approximately 600 available questionnaire variables were filtered with
respect to their relationships with other questionnaire variables using a combination of principal
component analysis and stepwise regression analysis. The questionnaire variables carried
forward to Phase 3 were then tested for potential relationships to concentration, exposure, and/or
biomarker measurements in analyses based on conceptual models from the environmental health
paradigm. Depending on the relationships evaluated in a model for a specific chemical,
questionnaire variables may have had multiple opportunities to be selected as a predictor. Phase
3 used various types of traditional and nonparametric regression analyses, and data mining
techniques such as CHAID to select the predictors for each conceptual model. The analyses
were performed for five metals and five volatile organic compounds which were measured in
both studies.
Overall, the approach seemed to provide reasonable and consistent results in selecting predictors
of exposure. Limitations of the number of available cases, available measurements, and
measurements below the detection limit impacted the breadth of conceptual models that could be
analyzed. Forty-eight model-based analyses for each of the three objectives were performed.
The topics of predictors that seem to be most universal across the two studies and two chemical
classes are air measurements, tobacco-related activities, air-exchange activities, housing
characteristics, and where time is spent. Predictors are also identified for chemical-specific
interests. Information about relationships between the questionnaire variables affords future
studies options for selecting questions based on ease of administration.

-------
Categorical Regression 	4-30
Transforming Response Outcomes 				4-31
Interpreting Categorical Regression Coefficients 	4-36
Stepwise Regression 					4-37
Cross-validation of Regression Analyses 	4-38
Final Selected Predictors			4-38
Evaluating the Results				 4-39
4.5.4	Objective 2--Classifying Subjects by Exposure 		4-40
Classification Overview			4-44
Growing the Tree	4-46
Form of Dependent Variables	4-46
Predictor Categories	4-46
Predictors with Imputed Measurements 			4-47
Stopping Rules			4-48
Refining the Tree			4-48
Characteristics of the Classification 	4-49
4.5.5	Objective 3--Classifying Subjects with High Exposure Levels	4-54
Logistic Regression and E-CHAID	4-54
Analysis Options 				4-55
Separation Issues			4-57
Analysis Results and Criteria	4-57
4.6 Quality Assurance 			4-58
5	Results and Discussion 			5-1
5.1	Introduction						 		5-1
5.2	Results for NHEXAS Region 5 Study	5-4
5.2.1	General Comments 						5-4
5.2.2	Metals 			 				5-4
5.2.2.1	Arsenic				5-4
5.2.2.2	Lead	5-11
5.2.3	VOCs	5-18
5.2.3.1	Benzene						5-18
5.2.3.2	Chloroform	5-23
5.2.3.3	Tetrachloroethylene	5-27
5.2.3.4	Trichloroethylene			5-30
5.3	Results for NHEXAS Arizona Study	5-33
5.3.1 General Comments	5-33
5.3.2 Metals	5-34
5.3.2.1	Arsenic 				5-34
5.3.2.2	Cadmium			 5-39
5.3.2.3	Chromium 			 5-45
5.3.2.4	Lead 					5-49
5.3.2.5	Nickel	5-52
5.3.3 VOCs 					 5-56
5.3.3.1	Benzene			5-56
5.3.3.2	Formaldehyde 			5-59
5.3.3.3	Toluene 							 5-60
5.4	Summary of Results			5-62
6	References 	6-1

-------
Appendix A	OMB Version of NHEXAS Questionnaires
Appendix B	Citations from NHEXAS Studies
Appendix C	NHEXAS-Related Entries in EIMS-HEDS
Appendix D	List of Questionnaire Variables from NHEXAS Studies Reviewed in Phase 1
Appendix E	Potential Sources and Health Effects of Chemicals Included in the Analyses
Appendix F	Description of Information in Appendices G and H: Examples and Explanations
Appendix G	Detailed Analysis Results for Region 5 Study
Appendix H	Detailed Analysis Results for Arizona Study
Appendix I	Background Information on NHEXAS Pilot Studies
Appendix J	Glossary
Appendix K	Information on Derived Variables Included in the Analyses
Appendix L	Naming Conventions and Sample Type Codes for Measurement Variables

-------
Tables
Table 2.1 Selected Predictors for Metals in the Region 5 Study Across the Phase 3 Analysis
Objectives 	2-6
Table 2.2 Selected Predictors for VOCs in the Region 5 Study Across the Phase 3 Analysis
Objectives 			2-10
Table 2.3 Selected Predictors for Metals in the Arizona Study Across the Phase 3 Analysis
Objectives 						 2-14
Table 2.4 Selected Predictors for VOCs in the Arizona Study Across the Phase 3 Analysis
Objectives 			2-18
Table 4-1. Primary Target Chemicals Analyzed in the NHEXAS Region 5 and Arizona
Studies	4-3
Table 4-2. Media Measured in the NHEXAS Region 5 and Arizona Studies	4-4
Table 4-3. Types of Questionnaires Used in the NHEXAS Region 5 and Arizona Studies 4-5
Table 4-4a. Example of Questionnaire Variables with Code Values Assigned for No Response
and Not Applicable: Exposure impact less likely with "No" response than "Yes"
response			4-12
Table 4-4b. Example of Questionnaire Variables with Code Values Assigned for No Response
and Not Applicable: Exposure impact less likely with fewer cigarettes smoked
per day	4-12
Table 4-5. Selected Principal Components (Rotated Matrix) Showing Absolute Loading
Values > 0.6 for the Region 5 Study's Health Status Group of Questionnaire
Variables (N=249)	 4-19
Table 4-6. Initial Set of Questionnaire Variables and VIF Values for the Third Rotated
Principal Component in Table 4-5			4-21
Table 4-7. Final Set of Questionnaire Variables and VIF Values for the Third Rotated
Principal Component in Table 4-5			4-21
Table 4-8. Summary of the Phase 2 Analysis on the Region 5 Study Health Status Group of
Questionnaire Variables			4-21
Table 4-9. Category and Final Scaling Values of Arsenic Concentration in Personal Air
(ng/m3) from CATREG Analysis of the Region 5 Study Example Model	4-32
Table 4-10a. Category and Final Scaling Values of Predictor B06 (Use Tobacco Products?)
from CATREG Analysis of Region 5 Study Example Model	4-33
Table 4-10b. Category and Final Scaling Values of Predictor B08B (# Minutes with Smoker at
Work) from CATREG Analysis of Region 5 Study Example Model	4-33
Table 4-10c. Category and Final Scaling Values of Predictor GEO (What state do you live in?)
from CATREG Analysis of Region 5 Study Example Model	4-34
Table 4-11. Partial Table of Regression Coefficients from CATREG Analysis of the Region 5
Study Example Model	4-37
Table 4-12. Predictors Selected from the 6-partition and 9-partition Scenarios for the Region 5
Study Example Model			4-38
Table 4-13. Selected Predictors and Analysis Criteria for the 6-partition and 9-partition
Regression Scenarios of the Region 5 Study Example Model 	4-39
Table 4-14. Criteria for Top Five Predictors Available for Splitting Node 0 in the Region 5
Study Example Model				 4-48
Table 4-15. Percent Change in Risk Estimate at Levels of Tree in Figure 4-7 	4-49
Table 4-16. Summary Statistics and Defining Characteristics of Terminal Nodes from Tree in
Figure 4-9		 4-52
Table 4-17. Logistic Regression Odds Ratios and Analysis Criteria for Region 5 Study
Example Model 			4-55

-------
Table 5.1	Guidelines for Discussing the Results from the Phase 3 Analyses	5-3
Table 5.2.2.1-CIA	Selected Predictors of Arsenic Concentration in Indoor Air (ng/m3) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=218)	5-4
Table 5.2.2.1-CSF	Selected Predictors of Arsenic Loading in Indoor Surface Dust (ng/cm2) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=247)	5-6
Table 5.2.2.1-EAR	Selected Predictors of Arsenic Concentration in Personal Air (ng/m3) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=169)
Table 5.2.2.1-EDT	Selected Predictors of Arsenic Intake in Food and Beverage from Duplicate Diet (ug/day) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=156)	5-9
Table 5.2.2.1-DOS	Selected Predictors of Arsenic Dose in Urine (ug/g Creatinine) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=197)	5-10
Table 5.2.2.2-CIA	Selected Predictors of Lead Concentration in Indoor Air (ng/m3) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=213)	5-11
Table 5.2.2.2-CSF	Selected Predictors of Lead Loading in Indoor Surface Dust (ng/cm2) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=245)	5-13
Table 5.2.2.2-EAR	Selected Predictors of Lead Concentration in Personal Air (ng/m3) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=167)	5-15
Table 5.2.2.2-EDT	Selected Predictors of Lead Intake in Food and Beverage from Duplicate Diet (ug/day) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=156)	5-16
Table 5.2.2.2-DOS	Selected Predictors of Lead Dose in Blood (ug/dL) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=165)	5-17
Table 5.2.3.1-CIA	Selected Predictors of Benzene Concentration in Indoor Air (ug/m3) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=248)	5-18
Table 5.2.3.1-EAR	Selected Predictors of Benzene Concentration in Personal Air (ug/m3) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=244)	5-20
Table 5.2.3.1-DOS	Selected Predictors of Benzene Dose in Blood (ug/L) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=143)	5-22
Table 5.2.3.2-CIA	Selected Predictors of Chloroform Concentration in Indoor Air (ug/m3) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=245)	5-23
Table 5.2.3.2-EAR	Selected Predictors of Chloroform Concentration in Personal Air (ug/m3) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=240)	5-25
Table 5.2.3.2-DOS	Selected Predictors of Chloroform Dose in Blood (ug/L) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=125)	5-26
Table 5.2.3.3-CIA	Selected Predictors of Tetrachloroethylene Concentration in Indoor Air (ug/m3) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=228)	5-27
Table 5.2.3.3-EAR	Selected Predictors of Tetrachloroethylene Concentration in Personal Air (ug/m3) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=228)	5-28

-------
Table 5.2.3.3-DOS	Selected Predictors of Tetrachloroethylene Dose in Blood (ug/L) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=147)	5-29
Table 5.2.3.4-CIA	Selected Predictors of Trichloroethylene Concentration in Indoor Air (ug/m3) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=236)	5-30
Table 5.2.3.4-EAR	Selected Predictors of Trichloroethylene Concentration in Personal Air (ug/m3) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=228)	5-31
Table 5.2.3.4-DOS	Selected Predictors of Trichloroethylene Dose in Blood (ug/L) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=149)
Table 5.3.2.1-CIA	Selected Predictors of Arsenic Concentration in Indoor Air (ng/m3) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=127)	5-34
Table 5.3.2.1-CSF	Selected Predictors of Arsenic Loading in Indoor Surface Dust (ug/m2) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=135)	5-35
Table 5.3.2.1-EDR	Selected Predictors of Arsenic Loading in Dermal (ug/m2) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=154)	5-36
Table 5.3.2.1-EDT	Selected Predictors of Arsenic Intake in Total Diet from Duplicate Diet (ug/day) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=158)	5-37
Table 5.3.2.1-DOS	Selected Predictors of Arsenic Dose in Urine (ug/g creatinine) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=166)	5-38
Table 5.3.2.2-CSF	Selected Predictors of Cadmium Loading in Indoor Surface Dust (ug/m2) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=128)	5-39
Table 5.3.2.2-EDR	Selected Predictors of Cadmium Loading in Dermal (ug/m2) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=134)	5-40
Table 5.3.2.2-EDT	Selected Predictors of Cadmium Intake in Total Diet from Duplicate Diet (ug/day) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=111)	5-41
Table 5.3.2.2-DOS-BLD	Selected Predictors of Cadmium Dose in Blood (ug/L) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=162)	5-42
Table 5.3.2.2-DOS-URN	Selected Predictors of Cadmium Dose in Urine (ug/g creatinine) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=171)	5-43
Table 5.3.2.3-CSF	Selected Predictors of Chromium Loading in Indoor Surface Dust (ug/m2) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=128)	5-45
Table 5.3.2.3-EDR	Selected Predictors of Chromium Loading in Dermal (ug/m2) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=134)	5-46
Table 5.3.2.3-EDT	Selected Predictors of Chromium Intake in Total Diet from Duplicate Diet (ug/day) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=120)	5-47
Table 5.3.2.3-DOS	Selected Predictors of Chromium Dose in Urine (ug/g creatinine) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=171)	5-48

-------
Table 5.3.2.4-CSF	Selected Predictors of Lead Loading in Indoor Surface Dust (ug/m2) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=128)	5-49
Table 5.3.2.4-EDT	Selected Predictors of Lead Intake in Total Diet from Duplicate Diet (ug/day) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=154)	5-50
Table 5.3.2.4-DOS	Selected Predictors of Lead Dose in Blood (ug/dL) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=162)	5-51
Table 5.3.2.5-CSF	Selected Predictors of Nickel Loading in Indoor Surface Dust (ug/m2) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=128)	5-52
Table 5.3.2.5-EDR	Selected Predictors of Nickel Loading in Dermal (ug/m2) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=134)	5-53
Table 5.3.2.5-EDT	Selected Predictors of Nickel Intake in Total Diet from Duplicate Diet (ug/day) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=149)	5-54
Table 5.3.2.5-DOS	Selected Predictors of Nickel Dose in Urine (ug/g creatinine) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=171)	5-55
Table 5.3.3.1-CIA	Selected Predictors of Benzene Concentration in Indoor Air (ug/m3) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=166)	5-56
Table 5.3.3.1-DOS	Selected Predictors of Benzene Dose in Blood (ug/L) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=112)	5-58
Table 5.3.3.2-CIA	Selected Predictors of Formaldehyde Concentration in Indoor Air (ug/m3) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=167)	5-59
Table 5.3.3.3-CIA	Selected Predictors of Toluene Concentration in Indoor Air (ug/m3) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=166)	5-60
Table 5.3.3.3-DOS	Selected Predictors of Toluene Dose in Blood (ug/L) and Analysis Criteria Across the Phase 3 Objectives in Arizona (N=114)	5-61

-------
Figures
Figure 4-1a. Histogram of Arsenic Concentration (ng/m3) in Personal Air, Region 5 Study	4-14
Figure 4-1b. Normal Q-Q Plot of Arsenic Concentration in Personal Air, Region 5 Study	4-15
Figure 4-2. Normal Q-Q Plot of Arsenic Concentration in Personal Air, Region 5 Study,
Transformed by (Log (y)) 				4-15
Figure 4-3. Normal Q-Q Plot of Arsenic Concentration in Personal Air, Region 5 Study,
Transformed by (1/(y+1)), the Associated Box-Cox Transformation	4-15
Figure 4-4. General Exposure Paradigm 			4-26
Figure 4-5. Plot of Category and Final Scaling Values of Arsenic Concentration in Personal
Air (ng/m3) from CATREG Analysis of the Region 5 Study Example Model 4-34
Figure 4-6a. Plot of Category and Final Scaling Values (Table 4-10a) for Predictor B06A
(Use Tobacco Products?) from CATREG Analysis of Region 5 Study Example
Model								4-35
Figure 4-6b. Plot of Category and Final Scaling Values (Table 4-10b) for Predictor B08A
(# Minutes with Smoker at Work) from CATREG Analysis of Region 5 Study
Example Model 						4-35
Figure 4-6c. Plot of Category and Final Scaling Values (Table 4-10c) for Predictor GEO
(What state do you live in?) from CATREG Analysis of Region 5 Study
Example Model 			4-36
Figure 4-7. Example of E-CHAID Tree for Region 5 Study Example Model		 4-42
Figure 4-8. Example of Using CART to Customize Categories for a First-Level Measurement
Predictor Producing More Than Five Nodes 				4-47
Figure 4-9. Final Tree for Region 5 Study Example Model, the Result of Pruning the Tree in
Figure 4-7			4-50

-------
1 Introduction
An important aspect of public health protection is the prevention or reduction of exposures to
hazardous chemical contaminants that contribute, either directly or indirectly, to increased rates
of premature death, diseases, discomfort or disability. Evaluating the risks posed by these
chemicals requires the ability to estimate the number of people exposed to the chemicals as well
as the magnitude and duration of the exposure. Exposure assessment is used in environmental
health studies such as risk assessment, risk management, status and trends analysis, and
epidemiology (Sexton 1995a) to identify, define, and quantify exposures that occur, or are
anticipated to occur, in human populations (WHO 2000). Environmental health studies also
enhance the understanding of the relationships of human exposure to environment and behavior
to identify how high exposures and any subsequent health effects can be reduced.
In many instances, measuring a person's exposure level entails the costs associated with the
collection and analysis of personal and possibly environmental samples. Finding cost-effective
ways of explaining the variability in, or classifying potential study participants by, their exposure
level would improve the extent of information that could be realized from these studies. The
purpose of this report is to present an approach, and its results, for identifying predictors of
human exposure to provide those involved in designing future environmental health studies with
information to minimize study costs. [Note: Words included in the glossary (Appendix J) are
shown in bold at their first use.]
National Human Exposure Assessment Survey (NHEXAS)
Measurements of human exposure play an important role in reflecting the actual distributions of
human exposure to these chemicals, and the National Human Exposure Assessment Survey
(NHEXAS) was initiated to enhance the tools and data available for human exposure assessment.
NHEXAS was a federal interagency research effort coordinated by the Environmental Protection
Agency (EPA), Office of Research and Development (ORD).
The objectives of NHEXAS are threefold: to document the occurrence, distribution,
and determinants of exposures to hazardous environmental agents, including
geographic and temporal trends, for the U.S. population; to understand the
determinants of exposures for potentially at-risk population subgroups, as a key
element in the development of cost-effective strategies to prevent and reduce
exposures (risks) deemed to be unacceptable; and to provide data and methods for
linking information on exposures, doses, and health outcomes that will improve
environmental health surveillance, enhance epidemiologic investigations, promote
development of predictive models, and ultimately lead to better decisions. (Sexton
1995c)

-------
Many previous studies focused on one type of exposure to one chemical. Phase I of NHEXAS
(hereafter referred to as NHEXAS) evaluated total human exposure to multiple chemicals on a
community and regional scale. It focused on the exposure of people to environmental pollutants
during their daily lives. Participants were selected from four areas of the country (Arizona,
EPA's Region 5, Maryland, and Minnesota) using population-based probability sampling
schemes. Between 1995 and 1997, NHEXAS scientists measured the levels of a suite of
chemicals to which these participants were exposed in the air, soil, and dust in and around their
homes, and through consumption of food, beverage, and water. Measurements were also made of
chemicals or their metabolites in biological samples (blood and urine) provided by the
participants. Finally, participants completed questionnaires to help identify possible sources of
exposure to chemicals and to characterize major activity patterns and conditions of the home
environment (Sexton 1995b, 1995c). Additional details about, and references for, the NHEXAS
studies are included in Appendices B, C, and I.
Predictors of Human Exposure
EPA's strategy for analyzing the NHEXAS study data is intended to enhance the efforts of
investigators involved with exposure assessment activities. Toward that end, EPA developed a
Strategic Analysis Plan (EPA 2000) to delineate and prioritize important types of analysis to be
performed using the NHEXAS data. In preparing the plan, EPA considered analysis projects
that would enhance, and not duplicate, the work already defined by the NHEXAS study
investigators. Identifying predictors of human exposure using the NHEXAS data became part of
this analysis plan as task P-01: Analysis and Comparison of NHEXAS Exposure Data to
Residential Pollutant Sources, Concentrations, and Activity Patterns. The goals of this task as
stated in the analysis plan are twofold:
•	To evaluate and identify hypotheses about those residential pollutant sources, housing
characteristics, residential concentrations (indoor and outdoor), and activity patterns
that contribute to human exposures, especially for high-end exposures.
•	To determine the value of questionnaires for understanding various aspects of
exposure, and the reliability and validity of the instruments used for ascertaining
these factors.
These goals, along with the characteristics of the available data and the resources required to
carry out various analysis options, were taken into consideration in determining how to
implement this task. The results of this task would assist the development of future exposure
studies by minimizing the number of questions to be asked, and informing the selection of
questions related to specific chemicals and measurements. The results would also assist
epidemiological studies by identifying questions or factors in terms useful for grouping the
population by exposure levels. Specifically, the charge of this analysis project is to identify
questionnaire items and/or environmental and biological measures that are useful for predicting
human exposure to chemicals. It will evaluate potential relationships between questionnaire
items and the environmental and biological measures under three analysis objectives: modeling
and regression analysis, grouping and classification analysis, and the classification of individuals
with high-end exposure levels.
The analysis approach is performed on the Arizona and Region 5 studies and consists of three
phases.

-------
•	Phase 1, Data Review and Cleanup, prepares the data for analysis by reviewing each
data set in the context of its study design and with respect to the requirements of the
analysis techniques.
•	Phase 2, Question Variable Reduction, sorts through the NHEXAS questionnaire
variables for each study to explore relationships between these variables and to
reduce the number of questionnaire variables used in the Phase 3 analyses.
•	Phase 3, Model-based Analysis, analyzes the questionnaire variables carried forward
from Phase 2 with measurement data for a specific chemical in the context of a
conceptual model based on the environmental health paradigm (Sexton 1995a).
The following sections describe, in some detail, the approach used, the results, and how to use
the results. The Conclusions and Recommendations highlight the overall findings of
this study. The Methodology section provides a detailed and reasonably non-technical
description of the three-phase analysis approach. The Results section presents a high-level view
of results from each NHEXAS study. There, for each chemical and conceptual model analyzed,
a summary of the questionnaire variables and measurements identified as predictors across the
three objectives is provided. Appendices G and H include detailed information for the analyses
in Phases 1, 2, and 3.

-------
2 Conclusions
As part of the Strategic Analysis Plan for the NHEXAS study data (EPA 2000), task P-01:
Analysis and Comparison of NHEXAS Exposure Data to Residential Pollutant Sources,
Concentrations, and Activity Patterns, was charged with identifying questionnaire items and/or
environmental and biological measures that are useful for predicting human exposure to
chemicals. Using the NHEXAS Region 5 and Arizona studies, this project evaluates such
potential relationships under three analysis objectives: modeling and regression analysis,
classification by exposure level, and the classification of individuals with high-end exposure
levels.
2.1 Analysis Approach
Several issues regarding the data to be analyzed and available analysis techniques were
considered in developing the analysis approach. They include:
1.	approximately 600 potential predictors from the questionnaires,
2.	a relatively small number of cases available for analysis (180 to 250, if no missing
data),
3.	predictors that are primarily categorical in nature,
4.	missing questionnaire and measurement data,
5.	data that may not meet the assumptions of traditional analyses, and
6.	measurement data with sometimes high percentages of below-detection limit values.
Given these issues, traditional analysis techniques, such as regression analysis, did not seem
suitable for the analysis.
The analysis approach used in this project includes both non-traditional statistical techniques and
non-traditional uses of traditional parametric statistical techniques. It uses patterns of
relationships that exist in the data rather than subjective methods to select variables for
subsequent analyses. The analyses are then science-driven; that is, they are based on the
understanding of the types of relationships that are plausible, and are used to test hypotheses
based on that understanding of the science. The approach recognizes the limitations of the data
and thus balances exploratory and descriptive paths with inferential paths.
The approach developed for this project includes three phases:
*	Phase 1, Data Review and Cleanup, prepares the data for analysis by reviewing each
data set in the context of its study design and with respect to the requirements of the
analysis techniques.
*	Phase 2, Question Variable Reduction, sorts through the NHEXAS questionnaire
variables for each study to explore relationships between these variables and to
reduce the number of questionnaire variables used in the Phase 3 analyses.
*	Phase 3, Model-based Analysis, analyzes the questionnaire variables carried forward
from Phase 2 with measurement data for a specific chemical in the context of a
conceptual model based on the environmental health paradigm.
The model-based analyses in Phase 3 cover three objectives. The objectives consider
relationships between questions and measurements from different perspectives or uses. The
analysis techniques selected provide different types of predictive information to address the
objectives and are as follows:
*	Modeling and Regression Analysis
	•	Categorical Regression Analysis
	•	Stepwise Regression Analysis
*	Classification of Individuals by Exposure Level
	•	CHAID (Chi-Square Automatic Interaction Detector)
	•	CART (Classification and Regression Trees)
*	Classification of Individuals with High-End Exposure Levels
	•	Logistic Regression Analysis.
This approach to identifying primary predictors provided a systematic and consistent review of all
the questions used in the NHEXAS studies, whereas other analyses evaluating relationships
between questionnaire and measurement variables are usually limited to selected questions of
interest. In Phase 2, the questionnaire variables were filtered with respect to their relationships
with other questionnaire variables. The questionnaire variables carried forward to Phase 3 were
then tested for potential relationships to concentration, exposure, and/or biomarker
measurements in analyses based on conceptual models from the environmental health paradigm.
Depending on the relationships evaluated in a model for a specific chemical, questionnaire
variables may have had multiple opportunities to be selected as a predictor.
2.2 Limitations and Strengths of the Analysis Approach
Each technique used in the approach had its strengths and limitations. The limitations offer
opportunities for refining the approach as described in the Recommendations (Section 3).
2.2.1	Phase 1
The approach developed for coding non-responses in questionnaire variables provided several
benefits to the analysis approach. It allowed the cases with non-response outcomes to be
included in the analyses which helped maintain a larger number of cases for the analyses. The
non-response outcomes did not have to be imputed or collapsed with other categories; they could
be used as categories in their own right. This also allowed possible differences between
respondents and non-respondents to be examined.
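The coding idea described above can be sketched as follows. This is only an illustration: the non-response codes (`None`, `"DK"`, `"REF"`) and the category labels are hypothetical, since this section does not list the actual coding scheme used in the project.

```python
from collections import Counter

# Hypothetical raw codes for non-response outcomes (assumed for illustration).
NONRESPONSE_LABELS = {None: "NO RESPONSE", "DK": "DON'T KNOW", "REF": "REFUSED"}


def code_response(raw):
    """Return an analysis category, keeping each non-response outcome as its own category."""
    if raw in NONRESPONSE_LABELS:
        return NONRESPONSE_LABELS[raw]
    return str(raw)


raw_answers = ["YES", "NO", None, "YES", "DK", "NO", "REF", "YES"]
coded = [code_response(answer) for answer in raw_answers]

# Every case is retained; none is dropped or imputed.
counts = Counter(coded)
print(counts)
```

Because no case is dropped, the number of coded cases equals the number of raw cases, and respondents can later be compared with non-respondents category by category.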
2.2.2	Phase 2
Questionnaire variables that were related to the same topic generally appeared in the same
principal component. This shows an internal consistency among the responses to the
questionnaire variables. Asking only a few questions regarding a particular topic may provide a
study with most of the information necessary on that topic for predicting exposure levels.
Principal component analysis (PCA) was used in an exploratory manner to evaluate relationships
between questionnaire variables. PCA is often used to generate composite variables that describe
the components; however, a composite variable might not be easily interpreted in the context of
a conceptual model, and would require that all of the questions be asked in order to generate the
variable. That would not yield the desired efficiency. Combining PCA with stepwise
regression analysis created a filtering process which allowed the analyses in Phase 3 to be based
on the original questionnaire variables. A variable selected for Phase 3 analysis, and
subsequently evaluated as a predictor for a model, may be one of several in a component that
would offer similar results. A researcher could then select questions within the topic, smoking
activities or a health condition for example, that offer better reliability for questionnaire
administration.
2.2.3	Phase 3
The categorical regression analysis allowed the original questionnaire variables to be used in the
analysis relative to the measurement data. In traditional regression analysis, using categorical
data would have required transforming the categorical variables into dummy variables which
inflates the number of independent variables used in a model. The conceptual models defined
in Appendices G and H include from 20 to over 100 independent variables, most of which are
categorical. The number of dummy variables required could exceed the number of cases,
making traditional regression analysis inappropriate, if possible at all. Categorical regression
analysis does not inflate the number of independent variables used in the model, and is used to
transform the response outcomes to better fit the assumptions of regression analysis without the
use of dummy variables.
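The dummy-variable inflation described above can be made concrete with a small count. The figures below are assumptions chosen for illustration (100 categorical predictors with 4 response categories each, and 250 cases); they are not taken from a specific NHEXAS model.

```python
def dummy_column_count(levels_per_variable, drop_reference=True):
    """Count dummy columns needed to encode categorical predictors.

    With a reference category dropped, a variable with k categories needs k - 1 dummies.
    """
    return sum(k - 1 if drop_reference else k for k in levels_per_variable)


# Hypothetical conceptual model: 100 categorical predictors, 4 categories each.
levels = [4] * 100
n_cases = 250  # upper end of the 180 to 250 cases cited in Section 2.1

n_dummies = dummy_column_count(levels)
print(n_dummies, "dummy variables for", n_cases, "cases")
```

Under these assumptions the design matrix would have more columns than rows, which is the situation the paragraph above describes.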
In categorical regression analysis, each variable, dependent or independent and numeric or
categorical, is transformed to a new scale. This makes interpretation of the categorical
regression coefficients more difficult and not directly translatable to the original scale.
CHAID and categorical regression analysis allowed non-response categories to be analyzed as
separate categories. The CHAID algorithm allowed them to combine with any other category in
the questionnaire variable or with any group of measurement values based on the relationship of
those cases with the dependent variable. In categorical regression analysis, the non-response
category in a numeric variable could be included as a nominal category. A transformed value for
the category was then estimated based on relationships in the data rather than having a value
assigned.
The CHAID type of analysis automatically identifies interaction effects in developing the
classification model, and filters out non-significant predictors in a manner similar to a forward
stepwise regression analysis. The number of available cases with respect to the variability, or
lack of it, in the data did not allow it to do an effective job of classifying by exposure level.
All measurement variables had some missing data, and for a specific conceptual model the
missing data usually did not occur on the same cases. The level of missing measurement data
ranged from 0 to 70%. Imputation of the missing values was not feasible because adequate
justifications to support the use of imputed values were not available to this project. Unlike
CHAID and the categorical regression analysis, the logistic regression analysis software was not
able to handle cases with missing measurement values, and excluding cases with missing measurement
values would significantly decrease the number of cases available for analysis. Consequently,
only questionnaire variables were included in the logistic regression analysis of the models.
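The cost of excluding cases with any missing measurement can be illustrated with a toy data set. The variable names and values are hypothetical; the point is only that when missing values fall on different cases, the number of complete cases is much smaller than the number available for any single variable.

```python
# Hypothetical measurement data; None marks a missing value.
cases = [
    {"indoor_air": 1.2, "dust": 0.8, "urine": None},
    {"indoor_air": None, "dust": 0.5, "urine": 2.1},
    {"indoor_air": 0.9, "dust": None, "urine": 1.7},
    {"indoor_air": 1.1, "dust": 0.6, "urine": 1.9},
    {"indoor_air": 1.4, "dust": None, "urine": None},
    {"indoor_air": 0.7, "dust": 0.4, "urine": 2.4},
]


def available_per_variable(rows):
    """Number of non-missing values for each measurement variable."""
    return {name: sum(row[name] is not None for row in rows) for name in rows[0]}


def complete_cases(rows):
    """Cases with no missing measurement values (listwise deletion)."""
    return [row for row in rows if all(v is not None for v in row.values())]


available = available_per_variable(cases)
n_complete = len(complete_cases(cases))
print(available, n_complete)
```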
The group of interest for objective 3, individuals with high-end exposure levels, was defined as the
upper 10 percent of the sampled population. Given the relatively small number of cases
available from the studies, the number of cases representing this group ranged from 10 to 25
depending on the model analyzed. The distribution of these cases across questionnaire variable
categories created a high number of separation issues in the logistic regression analysis. For
some conceptual models, no analysis could be finalized.
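A minimal sketch of why so few high-end cases lead to separation, assuming hypothetical data: the top 10 percent of cases are flagged, and any questionnaire category whose cases are all high-end or all not high-end is reported. Such categories perfectly predict the outcome, so a logistic regression coefficient for them cannot be estimated.

```python
def high_end_flags(values, top_fraction=0.10):
    """Flag the cases in the top `top_fraction` of measured values (ties at the cutoff included)."""
    n_high = max(1, round(len(values) * top_fraction))
    cutoff = sorted(values, reverse=True)[n_high - 1]
    return [v >= cutoff for v in values]


def separated_categories(categories, flags):
    """Categories in which every case has the same outcome (all high-end or none)."""
    counts = {}
    for category, flag in zip(categories, flags):
        high, total = counts.get(category, (0, 0))
        counts[category] = (high + int(flag), total + 1)
    return sorted(c for c, (high, total) in counts.items() if high == 0 or high == total)


# Hypothetical sample of 20 cases and one questionnaire variable.
values = list(range(1, 21))
smoker = ["no"] * 12 + ["nonresponse"] * 3 + ["yes"] * 5

flags = high_end_flags(values)
problem_categories = separated_categories(smoker, flags)
print(sum(flags), problem_categories)
```

Here only two cases are high-end, and both fall in one category, so the other two categories contain no high-end cases at all.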
The models that could be analyzed in Phase 3 were limited because there were not enough
measurement values for the dependent variable and/or the level of below detection limit values
was high. Similar issues affected the independent measurement variables that could be used in
an analysis. Surrogate questionnaire variables may have been selected as predictors because
the measurement variables were not included in the model.
In general, the approach offers some additional avenues for exploring the analysis of
relationships between categorical questionnaire variables and measurement data. Some
suggestions for refinements are offered in the Recommendations.
2.3 Predictors of Human Exposure
The breadth of models that could be analyzed for a particular chemical was limited by the
available data collected, analyzed, or containing useable information. There were six potential
models that could be analyzed for each chemical: concentration in indoor air, concentration in
indoor surface dust, inhalation exposure, dietary exposure, dermal exposure and dose (blood or
urine). The Region 5 study had measurement data on six primary target chemicals [two metals
and four volatile organic compounds (VOCs)]. The Arizona study had measurement data on ten
primary target chemicals (five metals and five VOCs), although two of the VOCs did not have
enough data for any analysis. Forty-eight conceptual models were analyzed, 22 for the Region 5
study and 26 for the Arizona study. Appendices G and H provide the most detailed information
on the model-based analyses for each of the three objectives. In the Results section, the selected
predictors are summarized by model to show the predictors selected across the three objectives.
These results would be valuable for identifying chemical-specific questions.
The summary tables in the Results section included a "Category" column which grouped
questionnaire or measurement variables. These categories were based on the predictors selected
across both studies and both chemical classes. The questionnaire variables and descriptions are
also included to clarify the categorization scheme. In this section, the selected predictors are
presented in Tables 2-1, 2-2, 2-3, and 2-4 by chemical class across all models and chemicals
analyzed for each study. The tables in this section provide a more general view of the selected
predictors and should be tempered with an understanding of the number, type, and breadth of
models on which they are based.
Each table lists the predictors selected in at least one model for one chemical from the chemical
class and study analyzed. The predictors are listed under the analysis objective columns for
which they were selected. Some predictors may be universal across the objectives, and others
will appear under a particular objective. Within a given category in a table, the predictors may
split across the objectives as in Table 2-1 for the Cleaning category. Reviewing the components
associated with each of the questions in the category will give the user an indication of whether
similar or different information is being collected by the questions. In each table, questionnaire
and measurement variables are identified for the analyses from Objective 1 (Modeling and
regression analysis), Objective 2 (Classification of individuals by exposure level), and Objective
3 (Classification of individuals with high-end exposure levels). A formal comparison of the
predictors selected between the studies is not included because differences between the studies
may affect the meaningfulness of such comparisons. The comments after each table, and the
predictor categorizations within each table give the user some information toward that end.
-------
Table 2.1 Selected Predictors for Metals in the Region 5 Study Across the Phase 3 Analysis Objectives
Type: M - Measurement; Q - Question
Modeling - Objective 1; Exposure Level - Objective 2; High Exposure - Objective 3
| Type | Description | Variable | Category | Modeling | Exposure Level | High Exposure |
| M | INDOOR AIR CONCENTRATION | CONC020 | Air Measurements | * | * | NI |
| M | OUTDOOR AIR CONCENTRATION | CONC030 | Air Measurements | * | * | NI |
| Q | # TIMES PAST WEEK VACUUMING | F03A2 | Cleaning (Dust/Vacuuming/Sweep) |  | * |  |
| Q | # DAYS PAST WEEK SINCE VACUUMING | F03A3 | Cleaning (Dust/Vacuuming/Sweep) | * |  |  |
| Q | # TIMES PAST WEEK SWEEP INDOORS | F03B2 | Cleaning (Dust/Vacuuming/Sweep) | * |  | * |
| Q | PAST WEEK DID YOURSELF: SWEEP INDOORS | F03B6 | Cleaning (Dust/Vacuuming/Sweep) | * |  |  |
| Q | # TIMES PAST WEEK DUSTING | F03C2 | Cleaning (Dust/Vacuuming/Sweep) |  | * | * |
| Q | # DAYS PAST WEEK SINCE DUSTING | F03C3 | Cleaning (Dust/Vacuuming/Sweep) |  | * |  |
| Q | # MIN PAST WEEK DUSTING | F03C4 | Cleaning (Dust/Vacuuming/Sweep) |  |  | * |
| Q | AV. MIN. TRAVELED ON ROADWAYS/HIGHWAYS | ATA19R | Commute Time/Distance | * |  |  |
| Q | PAST 6 MOS, COMMUTE BY CAR/TRUCK/VAN? | B19A | Commute Time/Distance | * |  |  |
| Q | MONTH START AIR CONDITIONING? | B29C1 | Cooling Season | * |  |  |
| Q | # MONTHS IN COOLING SEASON | B29R | Cooling Season | * |  |  |
| Q | PAST WEEK DIETING? | F10 | Diet | * |  |  |
| Q | PDAYS DIET DIFF DUE TO ILLNESS/MED COND | FD14CPCR | Diet | * |  |  |
| M | FOOD AND BEVERAGE INTAKE CONCENTRATION | CONC138 | Diet Measurements | * |  | NI |
| M | SURFACE DUST LOADING | CONC050 | Dust Measurements |  | * | NI |
| Q | PERFORMED VIGOROUS EXERCISE | AIA27R | Exercise |  | * |  |
| Q | AV. MIN. PERFORMED VIGOROUS EXERCISE | ATA27R | Exercise | * |  | * |
| Q | NO.DAYS LUNCH USUAL 1-3 TIMES/MO | FD06CNYR | Food Intake | * |  |  |
| Q | PDAYS LUNCH USUAL < ONCE/MO | FD06DPCR | Food Intake |  |  | * |
| Q | NO.DAYS AMT DUE TO OTHER | FD12HNYR | Food Intake |  | * |  |
| Q | NO.DAYS BREAKFAST PREP AT RESTAURANT | FD02BNYR | Food Preparation |  | * | * |
| Q | NO.DAYS DINNER PREP AT HOME | FD08ANYR | Food Preparation | * | * | * |
| Q | # MONTHS IN HEATING SEASON | B32R | Heating Season |  | * |  |
| Q | MONTH STOP HEATING DEVICES | B33B | Heating Season |  |  | * |
| Q | WAS HEATING ON DURING SAMPLING PERIOD? | HEAT | Heating Season |  | * |  |
| Q | EXTERIOR SIDING - ASBESTOS/ASPHALT | T06C6 | Housing Structure |  |  | * |
| Q | TYPES OF FOUNDATION: SLAB | T06J1 | Housing Structure |  | * |  |
| Q | TYPES OF FOUNDATION: FULL BASEMENT | T06J4 | Housing Structure |  |  | * |
| Q | FLOORS IN BUILDING | T01 | Housing Structure/Size |  | * |  |
| Q | FLOORS LIVED ON/MULTI-UNIT BLDG FR #1 | T02 | Housing Structure/Size | * |  |  |
| Q | IS THIS A MULTI-UNIT BUILDING? | T02MULTR | Housing Structure/Size | * | * |  |
| Q | IS THIS HOUSE OR APARTMENT [OWNERSHIP] | D09 | Housing/Ownership | * |  |  |
| Q | IS THIS HOUSE OR APARTMENT [OWNERSHIP] | D09_DESC | Housing/Ownership | * |  |  |
| Q | WHAT STATE DO YOU LIVE IN | STATE | Location | * | * |  |
| Q | SIZE OF COUNTY | GEO | Location/Characteristics | * | * | * |
| Q | GROUP QUARTERS? | D05R | Number of People in Home |  |  |  |
| Q | NUMBER IN HOUSEHOLD | HH_NUM_R | Number of People in Home | * |  |  |
| Q | SEX - PARTICIPANT | B02 | Participant Characteristics |  | * |  |
| Q | WEIGHT (POUNDS) - PARTICIPANT | B05AMD | Participant Characteristics |  |  | * |
| Q | # MOSTLY INDOOR HOUSE PETS? | B43B | Pets |  | * |  |
| Q | # MOSTLY OUTDOOR HOUSE PETS? | B43C | Pets | * |  |  |
| Q | # DAYS PAST WK SMOKE/FUMES-OIL FURNACE | F01H2 | Smoke/Fumes/Burned Food | * |  |  |
| M | YARD SOIL CONCENTRATION | CONC080 | Soil Measurements | * |  | NI |
| Q | WATER TREATMENT: REVERSE OSMOSIS? | B26EIII | Source of Water |  |  | * |
| Q | FISH FROM OCEAN? | BAA12B1R | Specific Foods |  |  |  |
| Q | AV. MIN. IN ENCLOSED WORKSHOP | ATA25R | Time at Home |  | * | * |
| Q | AV. DAILY HOURS INSIDE ELSEWHERE | ATEJR | Time Away From Home | * | * |  |
| Q | AV. DAILY HOURS OUTSIDE AT WORK/SCHOOL | ATEMR_E | Time Away From Home |  | * |  |
| Q | AV. DAILY HOURS OUTSIDE AT WORK/SCHOOL | ATEMRJ3 | Time Away From Home | * |  |  |
| Q | HOURS/WK CHILD AWAY FROM HOME | B18A | Time Away From Home | * |  | * |
| Q | SCHOOL/DAYCARE OUTSIDE HOME-PARTICIPANT | SCHLRJE | Time Away From Home |  | * |  |
| Q | AV. MIN. SAT/LAY ON CARPET/RUGS | ATA24R | Time on Rugs/Carpet |  | * |  |
| Q | # MINUTES WITH SMOKER AT HOME | B08A | Tobacco | * | * |  |
| Q | # MIN. WITH SMOKER IN OTHER ENCL. AREA | B08D | Tobacco |  |  | * |
| Q | TOBACCO SMOKING IN HOME? | B09A | Tobacco |  |  |  |
| Q | CENTRAL AIR CONDITIONER? | B29B1 | Ventilation System (AC/Heat) |  | * |  |
| Q | WALL/WINDOW AIR CONDITIONER? | B29B2 | Ventilation System (AC/Heat) |  |  | * |
| Q | PAST WEEK USED WINDOW/WALL AC | F01B | Ventilation System (AC/Heat) |  |  |  |
| Q | PAST WEEK USED PORTABLE/CEILING FAN | F01D | Ventilation System (AC/Heat) |  | * |  |
| Q | PAST WEEK USED AIR CENTRAL HEAT | F01L | Ventilation System (AC/Heat) | * |  |  |
| Q | # DAYS PAST WEEK USED AIR CENTRAL HEAT | F01L1 | Ventilation System (AC/Heat) |  | * |  |
| M | WATER CONCENTRATION | CONC090 | Water Measurements | * |  | NI |
| Q | # MIN PAST WEEK WOODWORKING | F03F4 | Wood Work |  |  | * |
| Q | AT-JOBS CONTACT WITH SAW DUST? | AC14G1R | Working Conditions | * |  |  |
| Q | AT-JOBS NO CONTACT WITH DUST? | AC14G9R | Working Conditions |  | * |  |
| Q | AT WORK- EXPOSURE TO METALS THRU FUMES | FMTXPOSR | Working Conditions | * |  |  |
* Predictor was selected for the objective.
NI - Measurement variable was not included in logistic regression analysis, but was selected in the same model for one of the other
objectives. Thus it may be a predictor for this objective.
For metals in the Region 5 study, the following categories were selected across all the objectives:
Air Measurements
Cleaning (Dust/Vacuuming/Sweep)
Dust Measurements (includes NI for High Exposure)
Exercise
Food Preparation or Intake
Housing Structure or Housing Structure/Size
Location or Location/Characteristics
Time Away From Home
Tobacco
Ventilation System (AC/Heat)
Working Conditions.
The following categories were selected only for the High Exposure objective:
Source of Water
Specific Foods
Wood Work.
The following categories were selected only for the Modeling objective:
Commute Time/Distance
Cooling Season
Diet
Housing Ownership
Smoke/Fumes/Burned Food.
The remaining categories were mixed across other combinations of the objectives.
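The grouping used in these category summaries can be sketched as a simple set operation: pool the objectives marked for every predictor in a category, then label the category by the pooled set. The rows below are a small subset transcribed from the Region 5 metals table; the function and label names are illustrative assumptions, not part of the original analysis.

```python
OBJECTIVES = ("Modeling", "Exposure Level", "High Exposure")

# (category, variable, objectives marked) for a few rows of the Region 5 metals table.
rows = [
    ("Commute Time/Distance", "ATA19R", {"Modeling"}),
    ("Commute Time/Distance", "B19A", {"Modeling"}),
    ("Food Preparation", "FD02BNYR", {"Exposure Level", "High Exposure"}),
    ("Food Preparation", "FD08ANYR", {"Modeling", "Exposure Level", "High Exposure"}),
    ("Source of Water", "B26EIII", {"High Exposure"}),
    ("Heating Season", "B32R", {"Exposure Level"}),
    ("Heating Season", "B33B", {"High Exposure"}),
]


def summarize_categories(table_rows):
    """Label each category by the set of objectives under which any of its predictors was selected."""
    pooled = {}
    for category, _variable, objectives in table_rows:
        pooled.setdefault(category, set()).update(objectives)
    summary = {}
    for category, objectives in pooled.items():
        if objectives == set(OBJECTIVES):
            summary[category] = "all objectives"
        elif len(objectives) == 1:
            summary[category] = "only " + next(iter(objectives))
        else:
            summary[category] = "mixed"
    return summary


summary = summarize_categories(rows)
print(summary)
```

For these rows the labels agree with the lists above: Commute Time/Distance appears only under Modeling, Source of Water only under High Exposure, and Heating Season falls among the mixed categories.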
-------
Table 2.2	Selected Predictors for VOCs in the Region 5 Study Across the Phase 3 Analysis Objectives
Type: M - Measurement; Q - Question
Modeling - Objective 1; Exposure Level - Objective 2; High Exposure - Objective 3
| Type | Description | Variable | Category | Modeling | Exposure Level | High Exposure |
| M | PERSONAL INDOOR AIR CONCENTRATION | CONC160 | Air Measurements | * | * | NI |
| M | INDOOR AIR CONCENTRATION | CONC180 | Air Measurements | * | * | NI |
| M | OUTDOOR AIR CONCENTRATION | CONC190 | Air Measurements | * | * | NI |
| Q | AV. MIN. USED CLEANING SUPPLIES | ATA23R | Cleaning Supply Usage | * |  |  |
| Q | PAST 6 MOS, COMMUTE BY BICYCLE? | B19E | Commute Time/Distance |  |  | * |
| Q | AIR COND ON DURING SAMPLING? | AC | Cooling Season | * | * | * |
| Q | MONTH START AIR CONDITIONING? | B29C1 | Cooling Season | * |  |  |
| Q | # MONTHS IN COOLING SEASON | B29R | Cooling Season | * | * | * |
| Q | PAST 6 MONTHS, DEODORIZERS USED | B42 | Deodorizer Usage |  | * |  |
| Q | AV. MIN. PERFORMED VIGOROUS EXERCISE | ATA27R | Exercise |  |  | * |
| Q | AV. MIN. PERFORMED MODERATE EXERCISE | ATA28R | Exercise |  |  | * |
| Q | NO. DAYS STARTED/TENDED FIRE | ATA07R | Fireplace/Wood Stove | * |  | * |
| Q | FREQ. OF FIREPLACE USE | B37C | Fireplace/Wood Stove | * |  |  |
| Q | # DAYS PAST WEEK USED WOOD/COAL STOVE | F01G1 | Fireplace/Wood Stove |  |  | * |
| Q | # DAYS PAST WEEK USED FIREPLACE | F01K1 | Fireplace/Wood Stove |  |  | * |
| Q | NO. DAYS IN ENCLOSED GARAGE WITH CAR | ATA03R | Garage Structure/Activity |  |  | * |
| Q | GARAGE LOCATION | B27B | Garage Structure/Activity |  | * |  |
| Q | PAST WEEK PARK CAR IN ? | F05 | Garage Structure/Activity | * |  |  |
| Q | NO. DAYS WITH YARD DIRT/SOIL ON SKIN | ATA04R | Gardening |  |  | * |
| Q | NO. DAYS PUMPED GAS | ATA01R | Gasoline Usage | * |  | * |
| Q | NO. DAYS GASOLINE ON SKIN | ATA02R | Gasoline Usage |  |  | * |
| Q | GAS-POWERED DEVICES STORED | B28 | Gasoline Usage |  | * | * |
| Q | # DAYS PAST WK SINCE USED GLUES | F02B3 | Glue Usage | * | * |  |
| Q | HEATING FUEL - ELECTRICITY? | B31C | Heating Fuel Usage | * | * |  |
| Q | HEATING FUEL-WOOD? | B31F | Heating Fuel Usage |  |  | * |
| Q | MONTH STOP HEATING DEVICES | B33B | Heating Season | * |  |  |
| Q | EXTERIOR SIDING - OTHER | T06C7 | Housing Structure |  |  | * |
| Q | TYPES OF FOUNDATION: CRAWL SPACE | T06J2 | Housing Structure | * |  |  |
| Q | PAST 6 MONTHS WALL ADDED OR REMOVED | B25B | Housing Structure/Remodeling |  |  |  |
| Q | PAST 6 MONTHS FLOORS REFINISHED | B25D | Housing Structure/Remodeling |  |  | * |
| Q | FLOORS IN BUILDING | T01 | Housing Structure/Size | * |  |  |
| Q | FLOORS LIVED ON/MULTI-UNIT BLDG FR #1 | T02 | Housing Structure/Size |  | * |  |
| Q | IS THIS A MULTI-UNIT BUILDING? | T02MULTR | Housing Structure/Size |  |  | * |
| Q | IS THIS HOUSE OR APARTMENT [OWNERSHIP] | D09 | Housing/Ownership | * |  |  |
| Q | TOOK BATH | AIA11R | Hygiene |  |  | * |
| Q | DAYS PAST 3-MO. USING LEAD SOLDER? | B11A | Lead Use | * |  |  |
| Q | DAYS PAST 3-MO. USE (LEAD) OIL PAINT | B11B | Lead Use |  |  | * |
| Q | WHAT STATE DO YOU LIVE IN | STATE | Location | * |  | * |
| Q | PROPERTY USED AS FARM OR RANCH? | B22 | Location/Characteristics |  |  | * |
| Q | SIZE OF COUNTY | GEO | Location/Characteristics | * | * |  |
| Q | # MIN PAST WEEK METAL WORKING | F03G4 | Metal Work |  |  | * |
| Q | 10+ PEOPLE AT ADDRESS? | D04 | Number of People in Home |  | * |  |
| Q | GROUP QUARTERS? | D05R | Number of People in Home |  |  | * |
| Q | NUMBER IN HOUSEHOLD | HH_NUM_R | Number of People in Home | * | * |  |
| Q | DAYS PAST MO. FREQ. PAINTING? | B10A | Paint Usage | * |  |  |
| Q | # TIMES PAST WEEK USED PAINT/SOLVENT | F02A2 | Paint Usage |  | * |  |
| Q | SEX - PARTICIPANT | B02 | Participant Characteristics | * |  |  |
| Q | HOUSEHOLD INCOME | B44 | Participant Characteristics | * |  | * |
| Q | # DAYS PAST WEEK SINCE BURN FOOD | F04B3 | Smoke/Fumes/Burned Food |  |  | * |
| Q | SOURCE OF RUNNING WATER - PRIVATE WELL? | B26B2 | Source of Water | * |  | * |
| Q | SOURCE OF COOKING WATER? | B26C | Source of Water | * |  |  |
| Q | WATER TREATMENT: OTHER? | B26EV | Source of Water |  |  | * |
| Q | FISH FROM GREAT LAKES? | BAA12B2R | Specific Foods |  | * |  |
| Q | # TIMES EAT BROC/CAULIF/BRUS SPROUTS | F09A2 | Specific Foods |  |  |  |
| Q | # TIMES EAT CABBAGE/SLAW/SAUERKRAUT | F09B2 | Specific Foods |  |  | * |
| Q | TOTAL HRS/WK WORKED AT HOME, BOTH JOBS | AC14AIR | Time at Home | * |  |  |
| Q | IN ENCLOSED WORKSHOP | AIA25R | Time at Home | * |  |  |
| Q | NO. DAYS TOBACCO SMOKED IN HOME | ATA09R | Tobacco | * |  |  |
| Q | AV. NUMBER CIGARETTES SMOKED | ATA15R | Tobacco | * |  | * |
| Q | USE TOBACCO PRODUCTS? | B06A | Tobacco |  |  | * |
| Q | MONTHS SINCE QUITTING TOBACCO USE | B06C | Tobacco | * |  | * |
| Q | # MINUTES WITH SMOKER AT HOME | B08A | Tobacco | * | * |  |
| Q | # MINUTES WITH SMOKER AT WORK | B08B | Tobacco |  |  | * |
| Q | TOBACCO SMOKING IN HOME? | B09A | Tobacco |  |  | * |
| Q | WALL/WINDOW AIR CONDITIONER? | B29B2 | Ventilation System (AC/Heat) | * |  |  |
| Q | PAST WEEK WINDOW/WALL AC SETTING | F01B1 | Ventilation System (AC/Heat) | * |  |  |
| Q | # DAYS PAST WEEK USED OTHER AIR FILTER | FQ101 | Ventilation System (AC/Heat)/Filters |  |  | * |
| Q | HOW OFTEN CHANGE FILTER IN DEVICE | F01P1R | Ventilation System (AC/Heat)/Filters | * | * | * |
| Q | DRANK WATER | AIA14R | Water Intake | * |  |  |
| Q | AV. NO. GLASSES OF WATER | ATA14R | Water Intake | * |  |  |
| M | TAP WATER CONCENTRATION | CONC200 | Water Measurements |  | * | NI |
| Q | AT-JOBS WEAR GLOVES? | AC14F1R | Working Conditions | * | * |  |
| Q | AT-JOBS CONTACT WITH SAW DUST? | AC14G1R | Working Conditions | * |  |  |
| Q | AT-JOBS CONTACT WITH OTHER DUST? | AC14G7R | Working Conditions |  |  | * |
* Predictor was selected for the objective.
NI - Measurement variable was not included in logistic regression analysis, but was selected in the same model for one of the other
objectives. Thus it may be a predictor for this objective.
For VOCs in the Region 5 study, the following categories were selected across all the objectives:
Air Measurements
Cooling Season
Garage Structure/Activity
Gasoline Usage
Heating Fuel Usage
Housing Structure/Size
Location or Location/Characteristics
Number of People in Home
Specific Foods
Tobacco
Ventilation System (AC/Heat)
Working Conditions.
The following categories were selected only for the High Exposure objective:
Commute Time/Distance
Exercise
Gardening
Housing Structure/Remodeling
Hygiene
Metal Work
Smoke/Fumes/Burned Food.
The following categories were selected only for the Modeling objective:
Cleaning Supply Usage
Heating Season
Housing/Ownership
Time at Home
Water Intake.
The remaining categories were mixed across other combinations of the objectives.
-------
Table 2.3	Selected Predictors for Metals in the Arizona Study Across the Phase 3 Analysis Objectives
Type: M - Measurement; Q - Question
Modeling - Objective 1; Exposure Level - Objective 2; High Exposure - Objective 3
| Type | Description | Variable | Category | Modeling | Exposure Level | High Exposure |
| M | INDOOR AIR CONCENTRATION | CCNC'03 | Air Measurements | * |  | NI |
| M | INDOOR AIR CONCENTRATION | CONC111 | Air Measurements | * |  | NI |
| Q | PAST WEEK DUSTING | F03C1 | Cleaning (Dust/Vacuuming/Sweep) |  | * |  |
| Q | PAST 6 MOS, COMMUTE BY OTHER MEANS? | B19G1 | Commute Time/Distance | * |  | * |
| Q | AIR CONDITIONING ON DURING SAMPLING? | AC | Cooling Season |  | * |  |
| Q | MONTH START AIR CONDITIONING? | B29C1 | Cooling Season | * | * |  |
| Q | PAST 6 MONTHS, DEODORIZERS USED | B42 | Deodorizer Usage |  | * |  |
| M | DERMAL LOADING | CONC1Q3 | Dermal Measurements | * | * | NI |
| Q | PAST WEEK ON DIABETIC DIET | F11G | Diet | * |  |  |
| Q | PDAYS DIET DIFF DUE TO TRVL/VACATION | FD14APCZ | Diet |  |  | * |
| Q | PDAYS DIET DIFF DUE TO WT CONTROL DIET | FD14BPCZ | Diet |  |  | * |
| Q | NO.DAYS REPORTED ON DIET CAUSE | FD14NDZ | Diet |  |  | * |
| M | FOOD AND BEVERAGE INTAKE CONCENTRATION | CONC130 | Diet Measurements | * |  | NI |
| Q | DOORS AND WINDOWS LEFT OPEN | AIA26Z | Doors/Window Open | * | * |  |
| Q | DRIP LINE LOCATION | T06G1 | Dripline |  | * | * |
| Q | DRIPLINE METERS FROM WALL | T06G2A | Dripline |  | * | * |
| Q | DUST LEVEL RATING | T04A | Dust Level |  |  |  |
| M | SURFACE DUST LOADING | CONC101 | Dust Measurements |  | * | NI |
| Q | PERFORMED VIGOROUS EXERCISE | AIA27Z | Exercise | * | * |  |
| Q | PERFORMED MODERATE EXERCISE | AIA28Z | Exercise | * | * | * |
| Q | FREQ. OF WOOD/COAL STOVE USE | B36C | Fireplace/Wood Stove | * |  |  |
| Q | PDAYS BREAKFAST FOOD/BEV NOT COLLECTED | FD10APCZ | Food Collection | * |  |  |
| Q | NO.DAYS REPORTED ON LUNCH COLLECTION | FD10BNDZ | Food Collection |  |  | * |
| Q | NO.DAYS REPORTED ON SNACK COLLECTION | FD10DNDZ | Food Collection |  | * |  |
| Q | PDAYS BREAKFAST USUAL 6-7 TIMES/WK | FD03APCZ | Food Intake | * |  |  |
| Q | PDAYS LUNCH EATEN | FD04PCZ | Food Intake | * | * |  |
| Q | PDAYS LUNCH EATEN AT HOME | FD05A1PZ | Food Intake |  | * |  |
| Q | PDAYS LUNCH EATEN AT WORK SITE | FD05C1PZ | Food Intake |  | * |  |
| Q | PDAYS DINNER USUAL 6-7 TIMES/WK | FD09APCZ | Food Intake |  | * |  |
| Q | PDAYS SNACK FOOD/BEV NOT COLLECTED | FD10DPCZ | Food Intake |  | * |  |
| Q | PDAYS BREAKFAST PREP AT WORK SITE | FD02CPCZ | Food Preparation |  | * |  |
| Q | PDAYS LUNCH PREP AT RESTAURANT | FD05BPCZ | Food Preparation |  | * |  |
| Q | PDAYS LUNCH PREP AT SCHOOL | FD05DPCZ | Food Preparation |  |  | * |
| Q | # TIMES PAST WEEK GARDENING | F03E2 | Gardening |  | * |  |
| Q | AGE INTESTINAL-BOWEL TROUBLE DIAGNOSED | B21N4 | Health Problems | * | * |  |
| Q | AGE KIDNEY TROUBLE DIAGNOSED | B21V4 | Health Problems | * |  |  |
| Q | MONTH START HEATING DEVICES | B33A | Heating Season | * |  |  |
| Q | WAS HEATING ON DURING SAMPLING PERIOD? | HEAT | Heating Season | * |  |  |
| Q | EXTERIOR SIDING - CONCRETE BLOCK | T06C4 | Housing Structure | * |  |  |
| Q | EXT PAINTING CHALKING/CHIPPING/PEELING | T06D | Housing Structure |  |  |  |
| Q | MATERIAL - ENTRANCE TO STRUCTURE: SOIL | T06F1 | Housing Structure | * |  |  |
| Q | YARD MATERIAL: WOOD/DECK | T06I5 | Housing Structure | * |  | * |
| Q | TYPES OF FOUNDATION: SLAB | T06J1 | Housing Structure |  | * |  |
| Q | PAST 6 MONTHS FLOORS REFINISHED | B25D | Housing Structure/Remodeling | * | * |  |
| Q | FLOORS IN BUILDING | T01 | Housing Structure/Size | * |  |  |
| Q | IS THIS HOUSE OR APARTMENT [OWNERSHIP] | D09 | Housing/Ownership | * |  |  |
| Q | AV. NO. TIMES WASHED HANDS | ATA18Z | Hygiene |  | * |  |
| Q | WHAT COUNTY DO YOU LIVE IN? | CNTY_Z | Location | * | * |  |
| Q | SURROUNDING AREA: COMMERCIAL | T06A3 | Location/Characteristics | * |  |  |
| Q | SURROUNDING AREA: INDUSTRIAL | T06A4 | Location/Characteristics |  |  | * |
| Q | PAST WEEK TAKE DIURETICS | F06A2 | Medications/Supplements |  |  | * |
| Q | PAST WEEK TAKE OTHER MEDICINE? | F06E2 | Medications/Supplements | * | * |  |
| Q | PAST WEEK TAKE CHROMIUM SUPPLEMENT? | F07C2 | Medications/Supplements | * |  |  |
| Q | DAYS PAST MO REMOVING PAINT (OTHER)? | B10C | Paint Usage |  |  | * |
| Q | SEX OF PARTICIPANT | B02 | Participant Characteristics | * | * |  |
| Q | HEIGHT (METERS) - PARTICIPANT | B04CMD | Participant Characteristics |  | * |  |
| Q | WEIGHT (KILOGRAMS) - PARTICIPANT | B05AMD | Participant Characteristics | * | * |  |
| Q | HOUSEHOLD INCOME | B44 | Participant Characteristics | * |  |  |
| Q | DO YOU HAVE HOUSE PETS? | B43A | Pets |  |  | * |
| Q | # MOSTLY OUTDOOR HOUSE PETS? | B43C | Pets | * |  |  |
| M | FOUNDATION SOIL CONCENTRATION | CONC122 | Soil Measurements | * | * | NI |
| Q | SOURCE OF RUNNING WATER-PUB/COMM SYSTEM | B26B1 | Source of Water | * |  |  |
| Q | SOURCE OF COOKING WATER? | B26C | Source of Water | * |  |  |
| Q | SOURCE OF DRINKING WATER | B26D | Source of Water | * | * |  |
| Q | WATER TREATMENT: REVERSE OSMOSIS? | B26EIII | Source of Water | * |  |  |
| Q | DAYS IN 3-MO. EAT HOME-GROWN CANNED CROP | B12B | Specific Foods | * |  |  |
| Q | TOTAL HRS/WK WORKED AT HOME, BOTH JOBS | AC14AIZ | Time at Home |  |  | * |
| Q | AV. DAILY HOURS INSIDE AT HOME | ATE_EZ | Time at Home | * |  |  |
| Q | TOTAL HRS/WK WORKED AT BOTH JOBS | AC14AZ | Time Away From Home | * |  |  |
| Q | AV. DAILY HOURS OUTSIDE ELSEWHERE | ATE_OZ | Time Away From Home |  | * |  |
| Q | AV. DAILY HOURS INSIDE AT WORK/SCHOOL | ATEGZJE | Time Away From Home | * | * |  |
| Q | AV. DAILY HOURS OUTSIDE AT WORK/SCHOOL | ATEMZ_0 | Time Away From Home |  | * | * |
| Q | SMOKED CIGARETTES | AIA15Z | Tobacco |  |  | * |
| Q | SMOKED CIGARS/PIPEFULS | AIA16Z | Tobacco |  |  | * |
| Q | AV. MIN. INDOORS WITH SMOKER | ATA20Z | Tobacco | * | * |  |
| Q | USE TOBACCO PRODUCTS? | B06A | Tobacco | * | * | * |
| Q | MONTHS SINCE QUITTING TOBACCO USE | B06C | Tobacco |  | * |  |
| Q | # CIGARETTES/DAY SMOKED [CATEGORIES] | B07A | Tobacco | * |  |  |
| Q | # CIGARS/DAY SMOKED | B07E | Tobacco |  |  | * |
| Q | # MINUTES WITH SMOKER AT HOME | B08A | Tobacco | * |  | * |
| Q | PAST WEEK USED AIR CENTRAL HEAT | F01L | Ventilation System (AC/Heat) | * | * |  |
| Q | # DAYS PAST WEEK USED AIR CENTRAL HEAT | F01L1 | Ventilation System (AC/Heat) |  |  | * |
| M | TAP WATER CONCENTRATION | CONC123 | Water Measurements | * | * | NI |
| Q | # TIMES PAST WEEK USED SANDER | F02F2 | Wood Work |  |  | * |
| Q | AT-JOBS CONTACT WITH SAW DUST? | AC14G1Z | Working Conditions | * |  |  |
| Q | AT-JOBS CONTACT WITH ROAD DUST? | AC14G2Z | Working Conditions |  | * |  |
| Q | AT-JOBS CONTACT WITH MINE DUST? | AC14G5Z | Working Conditions | * |  |  |
| Q | AT JOBS, CONTACT WITH UNKNOWN CHEMS? | AC14J12Z | Working Conditions | * |  |  |
* Predictor was selected for the objective.
NI - Measurement variable was not included in logistic regression analysis, but was selected in the same model for one of the other
objectives. Thus it may be a predictor for this objective.
For metals in the Arizona study, the following categories were selected across all the objectives:
Dermal Measurements (including NI for High Exposure)
Exercise
Food Collection/Intake/Preparation
Housing Structure/Remodeling/Size
Location or Location Characteristics
Medications/Supplements
Soil Measurements (including NI for High Exposure)
Time Away from Home
Tobacco
Ventilation System (AC/Heat)
Water Measurements (including NI for High Exposure)
Working Conditions.
The following categories were selected only for the High Exposure objective:
Diet
Paint Usage
Wood Work.
The following categories were selected only for the Modeling objective:
Dust Level
Fireplace/Wood Stove
Heating Season
Housing Ownership
Specific Foods.
The remaining categories were mixed across other combinations of the objectives.
-------
Table 2.4	Selected Predictors for VOCs in the Arizona Study Across the Phase 3 Analysis Objectives
Type: M - Measurement; Q - Question
Modeling - Objective 1; Exposure Level - Objective 2; High Exposure - Objective 3
Type | Description | Variable | Category | Modeling | Exposure Level | High Exposure
M | OUTDOOR AIR CONCENTRATION | CONC311 | Air Measurements | | * | NI
Q | DOORS AND WINDOWS LEFT OPEN | A1A26Z | Doors/Windows Open | * | * |
Q | CLEANED FIREPLACE/WOOD STOVE | AIA06Z | Fireplace/Wood Stove | * | * |
Q | FREQ. OF WOOD/COAL STOVE USE | B36C | Fireplace/Wood Stove | * | |
Q | IN ENCLOSED GARAGE WITH CAR | A1AQ3Z | Garage Structure/Activity | * | |
Q | DOORWAY FROM GARAGE TO LIVING QTRS? | B27C | Garage Structure/Activity | * | * |
Q | HEATING FUEL - ELECTRICITY? | B31C | Heating Fuel Usage | | * |
Q | FREQ. OF PORT./UNVENTED GAS HEATER USE | B35C | Heating Fuel Usage | * | |
Q | EXTERIOR SIDING - CONCRETE BLOCK | T06C4 | Housing Structure | * | |
Q | PAST 6 MONTHS WALL ADDED OR REMOVED | B25B | Housing Structure/Remodeling | * | |
Q | WHAT COUNTY DO YOU LIVE IN? | CNTY_Z | Location | | * |
Q | SURROUNDING AREA: INDUSTRIAL | T06A4 | Location/Characteristics | * | |
Q | DAYS PAST MO. STRIPPING PAINT (CHEM)? | B10B | Paint Usage | * | |
Q | INDOOR PESTICIDE, MONTH LAST USED | B38E | Pesticide Use | | |
Q | INDOOR PESTICIDE, WHO MIXED | B38H | Pesticide Use | * | |
Q | DAYS SINCE LAST USED-INSECTICIDES | F02G3 | Pesticide Use | * | |
Q | SOURCE OF DRINKING WATER | B26D | Source of Water | * | |
Q | DAYS IN 3-MO. EAT HOME-GROWN CANNED CROP | B12B | Specific Foods | * | |
Q | IN VEHICLE WITH SMOKER | AIA21Z | Tobacco | * | |
Q | USE TOBACCO PRODUCTS? | B06A | Tobacco | * | |
Q | # CIGARETTES/DAY SMOKED [CATEGORIES] | B07A | Tobacco | * | * |
Q | # MINUTES WITH SMOKER IN ENCL. VEHICLE | B08C | Tobacco | | * |
Q | WALL/WINDOW AIR CONDITIONER? | B29B2 | Ventilation System (AC/Heat) | * | * |
Q | PAST WEEK USED AIR CENTRAL HEAT | FOIL | Ventilation System (AC/Heat) | * | |
* Predictor was selected for the objective.
NI - Measurement variable was not included in logistic regression analysis, but was selected in the same model for one of the other
objectives. Thus it may be a predictor for this objective.

-------
For VOCs in the Arizona study, no predictors were selected for the High Exposure objective
because the separation issues prevented the analyses from being finalized. The following
categories were selected across the Modeling and Exposure Level objectives:
Doors/Windows Open
Fireplace/Wood Stove
Garage Structure/Activity
Heating Fuel Usage
Location or Location/Characteristics
Pesticide Use
Tobacco
Ventilation System (AC/Heat).
The following categories were selected only for the Modeling objective:
Housing Structure
Housing Structure/Remodeling
Paint Usage
Source of Water
Specific Foods.
2.4 General Comments
Overall the approach seemed to provide reasonable and consistent results in selecting primary
predictors and topics. The topics that seem to be most universal across the four tables (two
studies and two chemical classes) are air measurements, tobacco-related activities, air-exchange
activities, housing characteristics, and where time is spent. The results for a specific
chemical/model should be viewed in the context of the distribution of the dependent variable
with respect to range of values, values below detection limit, and available sample size.
Another consideration in evaluating the success of this approach relates to the issue of
identifying what might cause exposure. There may be actions or circumstances which put a
person more at risk for exposure, but these, in and of themselves, do not necessarily cause the
exposure. The exposed person must do something specific to complete the contact with the
chemical. Developing a scenario for exposure based on one or more highly exposed persons
may not be useful because there are many others who fit the same scenario but are not highly
exposed. The results of this report can be used as a starting point for further evaluation of
exposure scenarios based on the selected predictors.
-------
3 Recommendations
The analysis approach used for this project was developed using non-traditional analysis
techniques for this type of data, and traditional analysis techniques used in non-traditional
ways. This developmental approach afforded opportunities for analysis not available with
traditional techniques; however, adaptations of this approach could provide a better tool for
selecting predictors for future studies. The following recommendations are made for future
efforts:
•	Model-based analyses were restricted to dependent measurement variables that had
more than 50% of their values above detection limit. Other approaches for imputing
or estimating the values below the detection limit, such as fitting a distribution to the
values (Hornung 1990, Helsel 1990), may allow a better description of the existing
relationships.
•	In models where less than 50% of the dependent measurement values were above the
detection limit, the E-CHAID classification analysis might be performed using a
binary dependent variable with categories of below detection limit and above
detection limit. The categories for the binary split in the dependent variable could
also be chosen to fit other scenarios, depending on the nature of the measurement
variable's distribution.
•	The approach for screening questionnaire variables in Phase 2 excluded variables
without looking at potential relationships with dependent measurement variables.
This limitation was initiated because of the logistical implications from the number of
available measurement variables compared to the large number of questionnaire
variables. In some cases this leads to relationships identified in models that in and of
themselves would not be considered likely (Hand 1999). Development of a different
questionnaire variable screening approach could improve the relationships identified.
For example, an approach being used in another NHEXAS task performs one-way
analysis of variance to screen the questionnaire variables to assess their relationship
with the dependent measurement. The effectiveness may depend on the number of
measurements and the number of questionnaire variables involved in the assessment.
•	To analyze the questionnaire variables in a traditional regression analysis, the
approach for Objective 1 used categorical regression analysis to transform the values
assigned to the dependent and independent variables. These transformations
provided values that were more suitable for use in the traditional regression analysis.
The coefficients from the traditional regression analyses, however, are only
interpretable in terms of the transformed variables. When the predictors for a model
are selected, it would be useful to review the transformations from the categorical
regression analysis for a better understanding of the relationships between the
dependent measurement variable and the questionnaire variables as represented by
the transformations. This understanding could be useful to future studies for
questionnaire development.
•	The analyses performed usually included questionnaire variables and measurements
that were considered to have primary relationships with the dependent variable.
Alternative analysis techniques that might be considered to allow predictors expected
to have a secondary relationship to be included in the conceptual model are Path
Analysis and Structural Equations. Clayton (1999) performed analyses using these
techniques with a limited set of questions for the Region 5 study. These techniques
might be used in conjunction with categorical regression analysis as in Objective 1 to
make the categorical and non-response outcomes of the data more suitable for the
analyses.
•	An evaluation of the usefulness of the predictors selected in this project could be
performed by using the data from the Maryland study to determine how well the
concentration and exposure levels are predicted, especially for the predictors with
stronger relationships. Consideration would also need to be given to the impact of
differences in study design and area of the country.
•	A second type of evaluation would consider other questionnaire variables in the same
principal components as the selected predictors, and determine how well they might
predict the concentration and exposure levels for the Region 5 and Arizona study
data.
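The first two recommendations turn on the share of measurements above the detection limit. The sketch below is a minimal illustration only, not the project's SPSS or S-Plus procedure: the function names and the concentrations are hypothetical, and the 50% threshold is the one stated in the text. It computes the detection rate for a measurement variable, picks an analysis path, and recodes the variable as a binary above/below indicator.

```python
def fraction_detected(values, detection_limits):
    """Return the share of measurements strictly above their detection limit."""
    pairs = list(zip(values, detection_limits))
    detected = sum(1 for value, limit in pairs if value > limit)
    return detected / len(pairs)


def choose_analysis(values, detection_limits, threshold=0.5):
    """Pick an analysis path from the detection rate (hypothetical helper)."""
    if fraction_detected(values, detection_limits) > threshold:
        return "model-based"
    return "binary-classification"


def detect_indicator(values, detection_limits):
    """Recode each measurement as 1 (above detection limit) or 0 (at or below)."""
    return [1 if value > limit else 0
            for value, limit in zip(values, detection_limits)]


# Made-up concentrations; each measurement carries its own detection limit,
# since limits may differ within a study/medium/chemical combination.
concentrations = [0.8, 0.2, 1.5, 0.1, 0.3, 2.4]
limits = [0.5, 0.5, 0.5, 0.25, 0.5, 0.5]

print(fraction_detected(concentrations, limits))
print(choose_analysis(concentrations, limits))
print(detect_indicator(concentrations, limits))
```

In this made-up case exactly half of the values are detected, so the variable does not meet the "more than 50%" rule and would fall to the binary classification path.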

-------
4 Methodology
The methodology implemented in this project began with a review of the available data and the
development of an analysis approach. Section 4.1 describes the data available for analysis from
the NHEXAS studies. Section 4.2 identifies issues inherent in the data that led to the current
analysis approach. Sections 4.3, 4.4, and 4.5 provide detailed descriptions for each phase of the
approach with examples. Highlights of the Phase 3 analyses are presented in Section 5—Results,
and details of the analyses are presented in Appendices G and H. Analyses were performed
using SPSS (Base, Regression Models, and Categories) version 11.5, SPSS AnswerTree version
3.1, and S-Plus version 6. SPSS Base version 11.5 and SAS version 8 were used for data
manipulation and summary table preparation.
4.1 Data Sets Used for Analysis
4.1.1 Description of NHEXAS Study Data
The NHEXAS pilot studies provide a basis for this project. First and foremost, the studies
represent the type of study that could take advantage of the project's results, that is, a study
interested in exposure assessment and utilizing questionnaire and monitoring data. Second,
there is a breadth and commonality to the study data collected. The NHEXAS pilot studies
cover four areas of the country: EPA's Region 5, Arizona, Maryland, and Minnesota. The study
data sets also include several chemical classes with common chemicals, several sample
media allowing for a breadth of models, and a common base of questions asked of the
participants. The studies were designed to be demonstration/scoping studies and each had a set
of hypotheses that focused its efforts. Appendix I presents additional details of the four studies
and how they compared in terms of objectives, study design, and chemicals sampled.
The Region 5 and Arizona studies provide information on the general population and are the
basis for this project, P-01, "Analysis and Comparison of NHEXAS Exposure Data to Residential
Pollutant Sources, Concentrations, and Activity Patterns," in the NHEXAS Strategic Analysis
Plan (EPA 2000). The Maryland study is longitudinal and is analyzed using different techniques
under task ST-01 in the Analysis Plan. The Minnesota study is focused on pesticide exposure in
children. Neither of these two studies is included in this project.
The Region 5 study was conducted in EPA's Region 5 by a consortium consisting of the
Research Triangle Institute and the Environmental Occupational Health Sciences Institute. The
study covers the states of Illinois, Indiana, Michigan, Minnesota, Ohio, and Wisconsin, and
includes both questionnaires and personal exposure, residential concentration, and biomarker
measurements of metals and VOCs (Pellizzari 1995). This study covered three monitoring
periods called visits. In the first visit, all questionnaires were administered, and the greatest
breadth of monitoring data were collected over the period of a week. Additional samples were
collected and questionnaires administered in subsequent visits for a subset of visit 1 participants.
Questionnaire and monitoring data from visit 1 for the 249 participants and households were
analyzed for this report.
The Arizona study was conducted across the state of Arizona by a consortium consisting of the
University of Arizona, Battelle Memorial Institute, and the Illinois Institute of Technology. The
study includes both questionnaires and personal exposure, residential concentration, and
biomarker measurements of metals, pesticides, and VOCs (Lebowitz 1995). This study had
three monitoring periods called stages. The first stage was used to screen for candidate
households. The second stage collected questionnaire data and environmental samples from a
subset of the stage 1 households. In the third stage, a subset of stage 2 households were
reevaluated for metals, pesticides, and VOCs using methods with greater resolution and
reliability. Multiple individuals in each household were asked to complete questionnaires.
Questionnaire and monitoring data from the 179 stage 3 households and the primary participant in
each household were analyzed for this report.
Although there are many similarities between the Region 5 and Arizona studies, differences in
the locations, study objectives, and measurement techniques suggest that the data from each
study be analyzed and discussed separately. The following tables describe some of the
similarities and differences between the two studies and thus their influence on the analyses to be
performed. This project focuses on metals and volatile organic compounds (VOCs) which were
collected in both studies. Table 4-1 describes the primary target chemicals measured in each
study for these chemical classes. These chemicals were selected because they were known or
strongly suspected to be major environmental health risks; they occurred in more than one
environmental medium; and they had importance to EPA, other federal agencies, and state
interests. The collection and analytical techniques used by a NHEXAS study were optimized to
detect these chemicals. Only primary chemicals were analyzed for this report. Secondary target
chemicals are included in Table 4-1 only where the chemical is a primary target for the other
study.

-------
Table 4-1. Primary Target Chemicals Analyzed in the NHEXAS Region 5 and Arizona Studies
P = Primary
S = Secondary
Chemical | Region 5 Study | AZ Study
Metals | |
Arsenic | P | P
Cadmium | S | P
Chromium | S | P
Lead | P | P
Nickel | S | P
VOCs | |
Benzene | P | P
Chloroform | P | S
Tetrachloroethylene | P |
Toluene | S | P
Trichloroethylene | P | P
1,3-Butadiene | | P
Formaldehyde | | P
Table 4-2 describes the media sampled in the two studies. The chemical classes measured for
each medium are also noted. The media in the NHEXAS studies were selected to represent
major pathways of exposure to the chemicals in Table 4-1 for the general population. Each study
design defined the number of samples collected for a medium within the study's population. A
medium's inclusion in these analyses was determined by the level of sample availability and the
medium's relevance to the models analyzed.

-------
Table 4-2. Media Measured in the NHEXAS Region 5 and Arizona Studies
Medium | Region 5 Study | Arizona Study
Indoor Air | Metals, VOCs | Metals, VOCs
Outdoor Air | Metals, VOCs | Metals, VOCs
Personal Air | Metals, VOCs | Metals, VOCs
House Dust | Metals | Metals
Soil | Metals | Metals
Food | Metals | Metals
Beverage | Metals | Metals
Tap Water | Metals | Metals, VOCs
Drinking Water | Metals, VOCs | Metals, VOCs
Blood | Metals, VOCs | Metals, VOCs
Urine | Metals | Metals
Dermal Wipe | Metals | Metals
One set of questionnaires was developed for use in all NHEXAS studies. Rationales for
selecting the questions were based on the chemicals of interest, possible sources of exposure to
the chemicals, and activities that might contribute to the exposure. The questionnaires were
approved by the Office of Management and Budget (OMB) before their use in the field. Copies
of the OMB-approved version of the questionnaires are included in Appendix A. Each study
consortium selected questions from the OMB-approved version to determine their final
questionnaire collection forms. The studies generally used the same questions with a few
exceptions based on the location of the study, the chemicals of interest, and the goals of the
study. Table 4-3 describes the questionnaires used in the Arizona and Region 5 studies. Based
on the study design, a questionnaire may have been administered to an individual more than once
during the study. The Dietary Diaries collected information about the foods eaten by a
participant. The food codes and associated variables included in the diaries would require
information about the contamination of different food types to be analyzed in, for example, a
study of the total diet residue (Macintosh 1996) or with the Dietary Exposure Potential Model
(). Although the dietary diaries were not included
here, other diet-related questions as described in Appendix D were included.

-------
Table 4-3. Types of Questionnaires Used in the NHEXAS Region 5 and Arizona Studies
Questionnaire | Description
Descriptive | To enumerate individuals within a household for sampling purposes (basis for selection of the participant), to identify general characteristics of the living quarters and occupants, and to provide a basis for assessing potential bias due to refusals in subsequent steps.
Baseline | To provide more detailed information on the characteristics of the participant and housing, and on the usual frequency of activities over a longer time frame (i.e., last month or year) relative to persistence in environmental or biological media.
Follow-Up | To provide information on relatively infrequent (e.g., less than daily) activities during the monitoring period to explain variation in the sample (or differences between subgroups) for the monitoring results.
Food Follow-Up | To identify dietary patterns of food preparation and consumption during the period in which the duplicate diet samples were collected.
Time Activity Diary | For collecting data on detailed (daily) time and location information and activity patterns (for relatively frequent activities when recalling events over several days would be more burden on the respondent).
Technician Walk-Through | To identify and inventory the presence of pollutant sources and document physical characteristics of the building (completed by technician to minimize burden on respondents).
24-Hour Dietary Diary | For collecting information on actual daily consumption patterns from the participant for use in estimating dietary exposures. To test the need for making direct dietary measurements through comparisons with assessments using existing data and approaches.
The NHEXAS data have undergone many analyses by the study consortia to respond to the
hypotheses underlying their study designs, and to subsequent questions of interest. Some
examples of the analyses performed include:
•	Evaluations of metals, distributions and preliminary exposures for NHEXAS Arizona
(O'Rourke 1999)
•	Predictions of residential exposure to chlorpyrifos and diazinon for NHEXAS Arizona
(Moschandreas 2001)
•	Longitudinal investigation of dietary exposure to selected elements for NHEXAS
Maryland (Scanlon 1999)
•	Longitudinal exposure to selected pesticides in drinking water for NHEXAS Maryland
(Macintosh 1999)
•	Distributions and associations of lead, arsenic, and volatile organic compounds in
NHEXAS Region 5 (Clayton 1999)
•	Responses to NHEXAS Time/Activity Diary in NHEXAS Region 5 (Freeman 1999)
•	Predicting children's short-term exposure to pesticides in Minnesota Children's Pesticide
Exposure Study (Sexton 2003).

-------
A list of the citations published by the four study consortia is included in Appendix B.
Analyses of the NHEXAS data, as defined in the NHEXAS Strategic Analysis Plan (EPA 2000)
were specifically developed to not be duplicative of analyses already performed by the study
investigators. At this point, the work in Clayton (1999) is the only similar, but not as extensive,
analysis of questions. Thus comparisons of results from other NHEXAS analyses are not
included in this report.
4.1.2 Source of Data Used
The data analyzed in this report were obtained from EPA's Human Exposure Database System
(HEDS) at . HEDS provides web access to study
documents and to study data sets organized by questionnaire and by chemical class and medium.
Information on the studies, data sets, and documents is available through EPA's Environmental
Information Management System (EIMS) at . A First-
Time User's Guide and a Reference Manual for accessing NHEXAS-related entries using HEDS
and EIMS are available on the HEDS web site. Appendix C contains a list of the HEDS entries
available for the Arizona, Maryland, and Region 5 studies; it also includes the EIMS Entry ID
number for each data set and document available.
Data included in HEDS were obtained from the study consortia. Quality assurance of the data
was performed by the consortia and in the conversion to the HEDS format. Any changes made
to the data in the HEDS conversion were limited to data structure and coding consistencies.
Corrections for apparent inconsistencies in responses were made to the data with guidance from
the consortium lead. The HEDS web site includes a link for frequently asked questions about
the use of HEDS and the data included in HEDS. The notes section includes information
relating to questions or comments submitted to EPA regarding the data and corrections in the
data. Data used in these analyses were adjusted as needed to reflect the corrections described in
the HEDS notes.
4.2 Overview of Analysis Approach
Developing an analysis approach requires an understanding of the data to be analyzed and a
review of techniques that may be appropriate given the project objectives and the data. The
following issues were identified as important to defining this approach.
1.	A large number of potential predictors from the questionnaires, approximately 600
per study, must be evaluated for potential relationships. Thus the number of variables
to be analyzed needed to be reduced for meaningful analysis. The selection of
predictors was based on patterns in the data's relationships rather than subjective
methods.
2.	A relatively small number of cases is available for analysis. The maximum number of
usable cases for the Region 5 study is 249 and for the Arizona study is 179.
Techniques like Multiple Linear Regression require that the number of cases be larger
than the number of predictors analyzed. Depending on the variability in the
dependent variable, a small number of cases can also impact the effectiveness of a
technique to identify predictors that have strong associations with the dependent
variable.
3.	The majority of variables from the questionnaires are categorical, that is, they have a
small and limited number of response outcomes. Traditional parametric analyses,
such as regression analysis, were developed for use with continuous-valued variables.
In general, when categorical data are used as predictors, a transformation using
dummy or indicator variables is required. This increases the number of predictors
in an analysis. A large number of categorical variables, as may be analyzed for this
project, would produce situations with more predictors than cases available for
analysis. Other types of analyses also handle categorical variables; however, they
have limitations for dealing with a large number of such variables.
4.	The questionnaire data include non-response outcomes, for example, categories that
represent responses of missing, not applicable, and refused. Excluding cases with
non-response outcomes from an analysis with a large number of variables can
substantially decrease the number of available cases, which affects their representation
of the study population. In some instances, a questionnaire variable may have more
than one type of non-response outcome. Combining the non-response outcomes when
a variable has more than one type, e.g., "Missing" and "Not Applicable," may not
always be a suitable approach because the outcome categories may represent different
subpopulations.
5.	Environmental or personal samples were not collected for every medium/chemical
combination in a study. Missing or non-available measurements may occur because
of sampling design, issues with the field sampling process, or issues with the
analytical process. Lack of measurements across media for a given chemical can
impact the feasibility and/or suitability of the analysis of a conceptual model based on
the environmental health paradigm.
6.	Probability distributions of environmental measurement data tend to be skewed to the
right, since many measurements fall at the low level of the distribution. Traditional
parametric analyses require a normality assumption not necessarily consistent with
this type of skewness. In most instances, the measurement data are transformed by
the logarithmic or other mathematical function to better meet the assumptions of the
analysis.
7.	Environmental measurement data tend to include values that are at or below the
detection limit of an analytical procedure. Thus they are termed censored data. How
these values are handled, with respect to the distribution of the data that are above the
detection limit, can affect the results and interpretation of the analysis (Hornung
1990; Helsel 1990). In the NHEXAS data sets, the detection limit may differ across
measurements within a study/medium/chemical combination depending on how the
laboratory chose to provide the data.
8.	The Region 5 and Arizona studies selected their households and participants using
population-based probability sampling schemes. Using the weights associated with
the sampling designs in the analyses may affect the types of analyses performed and
the results.
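Issue 3 notes that coding categorical predictors as dummy or indicator variables increases the number of predictors. The sketch below illustrates that growth under stated assumptions: the helper names, the heating-fuel responses, and the figure of four response levels per variable are all hypothetical, while the counts of roughly 600 questionnaire variables and 179 usable Arizona cases come from issues 1 and 2.

```python
def indicator_columns(values):
    """Dummy-code one categorical variable, dropping the first level as the reference."""
    levels = sorted(set(values))
    return {
        f"is_{level}": [1 if value == level else 0 for value in values]
        for level in levels[1:]
    }


def count_indicator_predictors(level_counts):
    """Total indicator predictors when a k-level variable needs k - 1 columns."""
    return sum(k - 1 for k in level_counts)


# Hypothetical heating-fuel responses, including a non-response category.
fuel = ["gas", "electric", "gas", "wood", "missing"]
print(indicator_columns(fuel))

# Hypothetical case: 600 questionnaire variables with 4 response levels each,
# compared with the 179 usable Arizona cases cited in issue 2.
n_predictors = count_indicator_predictors([4] * 600)
print(n_predictors, n_predictors > 179)
```

Even with modest numbers of response levels, the indicator columns far outnumber the available cases, which is the situation the variable reduction phase is meant to avoid.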
The analysis approach selected for this project attempts to address these issues. Other
considerations in defining the approach are the project's objectives, the number of analyses that
would need to be performed, how resource-intensive certain analyses are, what results can
reasonably be derived from these data, and what information would be meaningful to readers.
This analysis approach includes both non-traditional statistical techniques and non-traditional
uses of traditional parametric statistical techniques. It uses patterns of relationships that exist in
the data rather than subjective methods to select variables for subsequent analyses. The analyses
are then science-driven; that is, they are based on the understanding of the types of relationships
that are plausible, and are used to test hypotheses based on that understanding of the science.
The approach recognizes the limitations of the data and thus balances exploratory and
descriptive paths with inferential paths.
Current literature suggests that this type of analysis approach might be considered "data mining."
This is a general name for an approach that searches through data for relationships which may or
may not be defined a priori. It uses a combination of machine learning, statistical analysis,
modeling techniques and database technology to find patterns and subtle relationships in data
and to infer rules that allow prediction of future results (Two Crows Website). The process is
essentially exploratory in nature in comparison to a confirmatory analysis, which is concerned
with determining whether a proposed conceptual model adequately explains the observed set of
data (Hand 1999).
Thus it is important that the approach used here distinguish, in the results presented, the
explainable unexpected from the by-chance unexpected.
Data mining is used in a broad spectrum of industries such as telecommunications, insurance,
medical applications, financial markets, retailers and pharmaceutical firms. Several current data
mining techniques include neural networks, decision trees, logistic regression, discriminant
analysis and generalized additive models (Two Crows 1999). Although it usually operates on
large data sets with thousands or millions of cases, it is used in this project because of the large
number of potential predictors to be analyzed. As Hand (1999) notes:
Related to the view of data mining as a process is the recognition of the novelty of
the results. Many data mining results are only what one would expect—in
retrospect. However, the fact that one can explain them does not detract from the
value of the data mining exercise in unearthing them. Without this exercise, it is
entirely possible that one would never have thought of them. Indeed, it is likely
that only those structures for which one can retrospectively formulate a plausible
explanation will be valuable. Those which still seem improbable, no matter how
one twists and turns the likely causal mechanism, may well not be real
phenomena at all, but simply chance artifacts of the particular data at hand.
The selected analysis approach consists of three phases. Section 4.3 describes the first phase,
Data Review and Preparation. In this phase, the data are evaluated in terms of what data are
available and can be used, and how the data may need to be reformatted for use in the analyses.
Section 4.4 describes the second phase, Questionnaire Variable Reduction, where the number of
potential predictors is reduced through an assessment of their interrelatedness. Section 4.5
describes the third phase, Model-Based Analysis, where relationships between measurements
and questionnaire variables are identified in the context of a model from the environmental
health paradigm.

-------
Unweighted Analyses
Weights associated with the sampling design for the Region 5 and Arizona studies are
available for the data sets from HEDS; however, unweighted analyses will be performed for two
reasons. First, options for using such weights in the selected analysis approach may not be available
or may be burdensome to implement. Second, the weights developed for the study data may
not be appropriate to use when cases are deleted for reasons pertinent to a specific analysis, or
when measurement variables with different weights are used in the same analysis. Using
unweighted analyses to look at relationships in the data is less likely to affect the end results of
selecting important predictors than for developing population estimates. It should be noted that
since the analyses are unweighted, some variables may appear to have relationships when they
are actually confounded with the sampling design parameters.
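To illustrate why sampling weights matter more for population estimates than for screening relationships, the sketch below compares an unweighted mean with a weighted mean. It is only an illustration: the concentrations and weights are made up and do not correspond to any NHEXAS weight variable.

```python
def unweighted_mean(values):
    """Simple average that treats every sampled case equally."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Average in which each case counts in proportion to its sampling weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical concentrations and sampling weights: the high value comes
# from a case that represents few people in the population.
values = [1.0, 2.0, 10.0]
weights = [100, 100, 10]

print(unweighted_mean(values))
print(weighted_mean(values, weights))
```

Here the unweighted mean is pulled upward by a case that carries little weight in the population, so the two estimates differ substantially.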
4.3 Phase 1—Data Review and Preparation
Phase 1 prepares the data for analysis in Phases 2 and 3. Data preparation includes a review of
the data in the context of its study to identify the cases to be included, the variables to be
analyzed, and any transformations of the data needed for more effective analysis of the data.
Section 4.3.1 describes how the study design is used to determine which cases fit the analysis
requirements. Sections 4.3.2 and 4.3.3 describe the review and revision process for the
questionnaire variables. Composite and summary questionnaire variables were created as
needed, and questionnaire variables were recoded to better handle conditional questions and non-
response outcomes. Finally, Section 4.3.4 describes how the measurement data were explored
using descriptive statistics to understand their probability distributions and any need for
transformations before the analyses.
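As an illustration of the kind of distribution check described for the measurement data, the sketch below computes a moment-based skewness before and after a logarithmic transformation. The concentrations are made up, and the skewness formula is one common choice assumed here, not necessarily the statistic reported by the software used in this project.

```python
import math


def skewness(values):
    """Moment-based sample skewness (population form, no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up right-skewed concentrations: many low values and a few high ones.
raw = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.8, 1.5, 4.0, 12.0]
logged = [math.log(v) for v in raw]

print(skewness(raw))
print(skewness(logged))
```

For these values the skewness of the log-transformed data is noticeably lower than that of the raw data, which is the pattern that motivates transforming the measurements before a parametric analysis.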
4.3.1	Case Inclusion
Since the Phase 3 analyses require both questionnaire and measurement data, the first step
identifies cases in the study with both types of data. For example, in the Region 5 study, 555
participants were interviewed using the Descriptive questionnaire, 326 of the 555 participants
were administered the Baseline questionnaire. In visit 1, only 249 of the 326 participants had
monitoring samples collected and were administered the remainder of the questionnaires. Thus
the cases selected for the Region 5 analyses were the 249 participants. Measurements were
collected at the household and participant levels. Hereafter, references to participants will refer
to the measurements from a household at both levels.
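The case inclusion step amounts to keeping the participants who appear in every required data source. A minimal sketch follows. The participant IDs are hypothetical and are sized only to match the Region 5 counts given above; the function name is also an assumption.

```python
def eligible_cases(*sources):
    """Return the sorted participant IDs that appear in every data source."""
    common = set(sources[0])
    for source in sources[1:]:
        common &= set(source)
    return sorted(common)


# Hypothetical participant IDs sized to match the Region 5 counts in the text.
descriptive_ids = range(1, 556)   # 555 interviewed with the Descriptive questionnaire
baseline_ids = range(1, 327)      # 326 administered the Baseline questionnaire
monitored_ids = range(1, 250)     # 249 with visit 1 monitoring samples

cases = eligible_cases(descriptive_ids, baseline_ids, monitored_ids)
print(len(cases))
```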
4.3.2	Questionnaire Variable Source and Inclusion
To provide organization to the analysis approach, the questionnaire variables were assigned to a
set of group categories: Demographic, Dietary, Exposure, Health Status, Housing
Characteristics, and Occupation, based on the original design of the NHEXAS questionnaires.
The questionnaire variables were also categorized according to the data type of the variable:
nominal, ordinal, or numeric. The group category, data type, and full description for each
questionnaire variable analyzed in this report are included in Appendix D.

-------
Questions Not Analyzed
Some questions from the study data sets were not analyzed. These included open-ended
questions (e.g., name of pesticide sprayed), and questions that were not pertinent to this work
(e.g., characteristics of household members other than the primary participant). The former type
of questions may contain useful information regarding potential relationships and can be part of
subsequent analysis projects. Derived variables were created for ease of analysis. For example,
the "Average number of minutes/day spent in transit" during the monitoring period is created
from the "Number of minutes/day spent in transit" noted on each day in the Time-Activity Diary.
The daily amount is easier to collect from the participant; the average amount is more useful for
analysis relative to the samples collected in three studies. Appendix D identifies the source of
each questionnaire variable as "Derived" or as "OMB," where the latter identifies questionnaire
variables used directly from the OMB-approved questionnaires in Appendix A. The Region 5
study developed many derived variables for the Baseline, Food Follow Up, and Time-Activity
questionnaires. In subsequent analyses, the derived variables replaced the OMB variables used
to create them. Derived variables similar to those in the Region 5 study were created for the
Arizona study where possible. The calculations of the derived variables were handled in a
comparable manner, accounting for study differences. For example, the Region 5 study
administered the Food Follow Up questionnaire on four days; the Arizona study administered it
on one day. A general list of derived variables is included in Appendix K.
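The derivation of an averaged variable can be sketched as follows. This is only an illustration under stated assumptions: the participant IDs and diary values are hypothetical, and skipping days with no diary entry is an assumed rule, not one documented for either study.

```python
from statistics import mean


def average_daily_minutes(daily_minutes):
    """Average the reported minutes/day over the monitoring period.

    Days with no diary entry are passed as None and skipped
    (an assumption made for this sketch).
    """
    reported = [m for m in daily_minutes if m is not None]
    return mean(reported) if reported else None


# Hypothetical Time-Activity Diary entries: minutes in transit on each monitored day.
diary = {
    "P001": [30, 45, None, 60, 15, 20, 40],
    "P002": [None, None],
}

# One derived value per participant replaces the daily values in later analyses.
derived = {pid: average_daily_minutes(days) for pid, days in diary.items()}
```

The same function handles a study with four reporting days and a study with one, which mirrors the way the derived variables were calculated comparably across studies.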
Additional derived variables were created to meet certain analytical needs. For example, the
variables AC and HEAT compare the month of the monitoring period to the responses for the
question pairs (B29C1 and B29C2) and (B33A and B33B), respectively, to identify the
participant's definition of a heating or cooling season. Another example is the variable F05.
This question was originally provided as three separate variables to describe where the car was
usually parked: in an attached garage, in a detached garage, or in an attached carport. This is
like a "Circle All That Apply" question where any combination of options can be selected and
each is tracked as a separate variable. Preliminary analyses using the three variables did not
allow the effects of the three parking situations to be clearly identified across the participant's
responses; thus a derived nominal variable was created to capture all category combinations that
occurred. Other "Circle All That Apply" type questions could be considered for regrouped
variables in future analyses to clarify interpretations of variability or differences.
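A minimal sketch of regrouping a "Circle All That Apply" question into one nominal variable is shown here. The function names, category labels, and integer codes are hypothetical; the actual coding of F05 is given in Appendix D.

```python
def parking_combination(attached_garage, detached_garage, attached_carport):
    """Collapse three yes/no flags into one label for the selected combination."""
    labels = []
    if attached_garage:
        labels.append("attached garage")
    if detached_garage:
        labels.append("detached garage")
    if attached_carport:
        labels.append("attached carport")
    return " + ".join(labels) if labels else "none selected"


def observed_categories(records):
    """Assign integer codes only to the combinations that occur in the data."""
    combos = sorted({parking_combination(*record) for record in records})
    return {combo: code for code, combo in enumerate(combos, start=1)}


# Hypothetical responses: (attached garage, detached garage, attached carport).
responses = [(1, 0, 0), (0, 1, 0), (1, 0, 1), (1, 0, 0)]
codes = observed_categories(responses)
recoded = [codes[parking_combination(*r)] for r in responses]
```

Coding only the combinations that occur keeps the number of categories small, which matters when the derived variable is later entered into analyses with a limited number of cases.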
The list of questionnaire variables analyzed in Phase 2 is presented in Appendix D.
4.3.3 Questionnaire Variable Recoding
After defining and selecting the set of questionnaire variables to be analyzed based on
questionnaire design and analysis needs, the variability and distribution of response outcomes
and levels of non-response outcomes for each variable were reviewed using frequency
distributions. Variables were excluded if they could not be used in the analyses, for example,
variables with no variability in response outcomes. Summary statistics for questionnaire
variables in the two studies are included in Tables G1-1 and H1-1.
Recoding Conditional Questions
Frequency distributions were also used to determine when response categories should be
recoded for more effective analysis. For example, questionnaires tend to use conditional
questions, that is, questions that are only asked of some participants, based on their response to
an earlier question. These questions are part of skip patterns in a questionnaire's administration.
An example of a conditional pairing in NHEXAS is question B06A (Do you currently use
tobacco products?) and B07A (# cigarettes smoked per day?). If the answer to B06A (the
condition question) is "No", then B07A (the conditional question) was not asked. To ensure that
response outcomes on conditional questions accurately and consistently reflected the response to
the condition question, response outcomes to the conditional questions that were skipped were
coded with a "Not Applicable (NA)" response outcome. This recoding has some additional
impacts on the analysis. For condition questions with a large number of "No" response
outcomes, the distribution for the conditional question will be heavily weighted with NA
response outcomes. In the Phase 2 analysis, the consistent coding between the condition and
conditional questions maintains the relatedness between the questions from the same subject
group.
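The recoding of a skipped conditional question can be sketched as follows. The record layout (a dictionary keyed by variable name) and the "NA" marker are assumptions made for illustration; only the B06A/B07A pairing comes from the questionnaire.

```python
NOT_APPLICABLE = "NA"  # hypothetical marker for a skipped conditional question


def recode_conditional(record, condition_var, conditional_var, skip_answer="No"):
    """Return a copy of the record with the conditional question set to NA
    whenever the answer to the condition question caused it to be skipped."""
    recoded = dict(record)
    if recoded.get(condition_var) == skip_answer:
        recoded[conditional_var] = NOT_APPLICABLE
    return recoded


# B06A: currently use tobacco products?  B07A: cigarettes smoked per day.
non_smoker = recode_conditional({"B06A": "No", "B07A": None}, "B06A", "B07A")
smoker = recode_conditional({"B06A": "Yes", "B07A": 3}, "B06A", "B07A")
```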
Options for Handling Non-Responses
Questionnaires usually include non-response categories to handle response outcomes such as
"Missing," "Refused," "Not Applicable," and "Don't Know." When analyzing several
questionnaire variables at a time, the existence of non-response outcomes either decreases the
number of cases with complete data to levels that are not necessarily representative of the
population sampled or leads to the elimination from the analysis of variables that have some,
but not complete, information on the cases. Imputation is sometimes used as a solution for
analyzing such incomplete data sets; however, it requires an a priori knowledge of relationships
between variables. It can also be resource-intensive even when using software packages such as
SPSS Missing Values which are designed to assist in the process. For these studies, the number
of non-response outcomes was usually small for any specific variable; however, the large
number of variables that would require imputation suggested that an alternate approach of
recoding non-response categories be considered. This type of recoding preserves the sample size
in an analysis and allows differences between respondents and non-respondents to be
investigated. A "No Response" category was created for the response outcomes of "Missing,"
and "Refused," which were considered equivalent for these analyses. The "Not Applicable"
category was not included with the "No Response" category so that situations resulting from
conditional questions could be easily recognized.
Recoding No Response and Not Applicable Categories
Although most of the questionnaire variables were nominal (e.g., yes-no), the response
categories could be relatively placed on a continuum of impact to exposure. Categories or values
from other questions, such as those representing counts, categories with an underlying ordinality,
or numeric values could be viewed in the same way. A coding scheme assigned values to the
"Not Applicable" and "No Response" categories for each question to be consistent with a
continuum of exposure impact. In each case, the "Not Applicable" category immediately follows
or precedes the category assumed to have the least impact on exposure because it implies no
potential exposure from the activity or environment. The "No Response" category immediately
follows or precedes the "Not Applicable" category. If a question is not conditional, and thus
does not include a "Not Applicable" category, the "No Response" category immediately follows or
precedes the category assumed to have the least impact on exposure. Tables 4-4a and 4-4b
illustrate this recoding scheme with two examples.
Table 4-4a. Example of Questionnaire Variables with Code Values Assigned for No Response and Not
Applicable: Exposure impact less likely with "No" response than "Yes" response
Code  Description
1     Yes
2     No
3     Not Applicable
4     No Response
Table 4-4b. Example of Questionnaire Variables with Code Values Assigned for No Response and Not
Applicable: Exposure impact less likely with fewer cigarettes smoked per day
Code  Description
-1    No Response
0     Not Applicable
1     None
2     Less than 1/2 pack/day
3     1/2 to 1 pack/day
4     1 to 1.5 packs/day
5     1.5 to 2 packs/day
6     2+ packs/day
In Table 4-4a, the response category with the least potential impact to exposure is 2, "No." Thus
the "Not Applicable" category is assigned a value of 3 and the "No Response" category is
assigned a value of 4. In Table 4-4b, the response category with the least potential impact to
exposure is 1, "None" (No cigarettes smoked per day). For this variable, "Not Applicable" and
"No Response" categories are assigned the values of 0 and -1, respectively. This coding scheme
has an implicit assumption that a "No Response" outcome can represent any of the question's
response categories and thus the participant's actual situation may not have less impact on
exposure. However, the scheme provides a consistent placement for the non-response
categories, and the statistical analyses take this assignment into account. Tables G1-1 and H1-1
provide descriptive information about the questionnaire variables and their levels of
non-response for the Region 5 and Arizona studies.
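The placement rule for the two non-response codes can be sketched as below. The function name is hypothetical, and the sketch assumes that the least-impact category sits at one end of the code range, as it does in both example tables.

```python
def assign_nonresponse_codes(codes, least_impact_code, conditional=True):
    """Place "Not Applicable" next to the least-impact category and
    "No Response" next to that, moving away from the substantive codes.

    Assumes the least-impact category is the lowest or highest code.
    For a non-conditional question there is no "Not Applicable" category,
    so "No Response" sits directly next to the least-impact category.
    """
    lowest, highest = min(codes), max(codes)
    if least_impact_code == highest:
        step = 1
    elif least_impact_code == lowest:
        step = -1
    else:
        raise ValueError("least-impact category must be at one end of the code range")
    assigned = {}
    next_code = least_impact_code + step
    if conditional:
        assigned["Not Applicable"] = next_code
        next_code += step
    assigned["No Response"] = next_code
    return assigned


table_4_4a = assign_nonresponse_codes([1, 2], least_impact_code=2)
table_4_4b = assign_nonresponse_codes([1, 2, 3, 4, 5, 6], least_impact_code=1)
```

Applied to the two examples, the rule gives codes 3 and 4 for Table 4-4a and codes 0 and -1 for Table 4-4b.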
4.3.4 Measurement Variable Definition and Transformation
Measurement data provide information about the concentration of a chemical in an
environmental or personal sample. An understanding of the characteristics of the measurement
data is important for appropriate and effective analyses. Thus the data were reviewed using
descriptive statistics and plots to identify the measurement variables that could be used in the
analyses, and to determine whether transformations of the variables were necessary to satisfy the
assumptions of subsequent statistical techniques. Each study defined quality indicators to flag
measurement data as "okay," "suspect but usable," and "unusable." Measurements flagged as
unusable were not included in HEDS for the Region 5 study and were eliminated from further
analysis for the Arizona study. The remaining measurements were summarized by chemical and
a sample type code that defined types of measurement variables (e.g., concentration of metals in
indoor air). The sample type codes are derived from the chemical class, medium, sampling
location, and measurement type (concentration, loading, intake, or adjusted concentration). The
schemes for defining sample type codes were study-specific. Appendix L contains a list of the
sample type codes used in this report.
Summary statistics for data in each target chemical and sample type code were evaluated to
determine which measurement data would be used in the Phase 3 analyses. A measurement
variable, that is, a chemical/sample type code combination, used as a dependent variable in an
analysis was required to have at least 50 percent of its measurements above the detection limit.
A measurement variable used as an independent variable in an analysis could have as little as 10
percent of its measurements above the detection limit. The criterion for independent variables
was less stringent to capture any potential associations with values at the high end of the
dependent variable distribution. A measurement variable's relevance in a conceptual model
based on the exposure paradigm was also a criterion for its inclusion in a model analysis. Tables
G1-1 and H1-1 provide the summary statistics for the measurements used in subsequent
analyses. Measurement data for additional chemicals and sample type codes are available from
HEDS as described in Section 4.1.2.
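The two detection-frequency criteria can be expressed as a simple screening check. This is a sketch only: the function names are hypothetical, each measurement is assumed to be paired with its own detection limit, and the conceptual-model relevance criterion is not represented.

```python
def percent_detected(values, detection_limits):
    """Percent of measurements that are above their own detection limit."""
    if not values:
        raise ValueError("no measurements supplied")
    detected = sum(1 for v, dl in zip(values, detection_limits) if v > dl)
    return 100.0 * detected / len(values)


def eligible_roles(values, detection_limits):
    """Roles a measurement variable may take under the two percent-detected criteria."""
    pct = percent_detected(values, detection_limits)
    roles = []
    if pct >= 50.0:
        roles.append("dependent")
    if pct >= 10.0:
        roles.append("independent")
    return roles


# Hypothetical measurements, each with a detection limit of 1.0.
mostly_detected = eligible_roles([0.5, 2.0, 3.0, 0.1], [1.0] * 4)
rarely_detected = eligible_roles([0.1] * 9 + [5.0], [1.0] * 10)
```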
Options for Handling Missing Measurement Data
Missing or non-available measurements may occur because of sampling design, issues with the
field sampling process, or issues with the analytical process. In the Region 5 study, for example,
soil and outdoor air measurements were only taken on a subset of the 249 households sampled.
Clayton (1999) describes an approach for imputing, where possible, the measurements missing
because of the sampling design.
To help alleviate missing data, values were imputed using temporally and
spatially related samples. For participants lacking outdoor air data, values from
other participants in the same PSU (usually a county) and having a close time
match with the participant's monitoring period were used. For those lacking soil
data, values from other participants in the same area segment within the PSU
were used.
The specific methods are described in the article. This imputation approach was implemented
for these Region 5 media.
In other instances where data were missing for a measurement variable, imputation was not
implemented because this project had no adequate justification for using estimated or imputed
values where no measurements were taken or samples were not analyzed. Some of the Phase 3
analysis techniques allowed missing
measurement data to be treated as separate categories and potential differences between missing
and non-missing cases to be reviewed. Where that option was not available, either the cases or
the measurement variables with missing measurement data were excluded from the analysis.
Such decisions are defined in the Phase 3 part of the approach.
Detection Limits
Each study, and potentially each analytical laboratory used by the NHEXAS studies, had a
methodology for defining and providing the detection limit and the measurement value. It was
unusual for all measurement values of a dependent variable to have one detection limit value.
When both the measurement and detection limit values were provided for a measurement
identified as below detection limit (BDL), the measurement value was used in the analyses. The
Region 5 study data contains the laboratory values, which were potentially adjusted for blanks
and calibration, and includes values that were BDL. When only the detection limit value was
provided for a measurement identified as BDL, as in the Arizona study data, one-half of the
detection limit value was used as the measurement in the analysis (EPA 1992).
Distributions and Transformations
Measurements of chemical levels tend to be skewed to the right and contain censored data, that
is, values that are BDL. The results of traditional parametric analyses can be affected when
these data are used as is because their normality assumptions may be violated. A transformation
of the measurement data, specifically for the dependent variable, can help reduce the skewness
and better satisfy the normality assumptions.
The logarithmic transformation is one of the typical transformations used on this type of
measurement data (EPA 1992, Gilbert 1987). However, the most suitable transformations tend
to be data driven (Millard 2002). Box and Cox developed an approach (Box 1964) for
determining the power transformation for a data set. Previous analyses (Clayton 1999) identified
non-normal distributions for some of the Region 5 measurements when using the log-normal
transformation. In theory, transforming the residuals with respect to a chosen model's fit is an
ideal solution. However, this model-dependent approach is very computationally intensive,
especially when considering all possible sub-models. As a result, the appropriate Box-Cox
transformation was applied to the dependent variable in a model before the analyses were
performed. Although many of the techniques used in Phase 3 are nonparametric in nature,
comparisons of the techniques using untransformed and transformed data showed more
reasonable results when using the latter. The Box-Cox transformation selected for each
measurement variable (Millard 2002) is included in Tables G2-1 and H2-1.
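For reference, the Box-Cox family transforms a positive value y to (y^lambda - 1)/lambda when lambda is not zero and to ln(y) when lambda is zero, so the logarithmic transformation is one member of the family. The sketch below picks lambda from a grid by maximizing the usual profile log-likelihood for a model with a constant mean. That selection criterion, the grid, and the function names are assumptions made for illustration; the criterion actually used with the Millard (2002) software may differ.

```python
import math


def box_cox(y, lam):
    """Box-Cox power transformation of a single positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def profile_log_likelihood(values, lam):
    """Profile log-likelihood of lambda under a normal model with constant mean."""
    n = len(values)
    z = [box_cox(v, lam) for v in values]
    mean_z = sum(z) / n
    var_z = sum((zi - mean_z) ** 2 for zi in z) / n  # maximum-likelihood variance
    return -0.5 * n * math.log(var_z) + (lam - 1.0) * sum(math.log(v) for v in values)


def best_lambda(values, grid=None):
    """Grid search for the lambda with the highest profile log-likelihood."""
    if grid is None:
        grid = [i / 10.0 for i in range(-20, 21)]  # -2.0 to 2.0 in steps of 0.1
    return max(grid, key=lambda lam: profile_log_likelihood(values, lam))


# Hypothetical right-skewed concentrations (not NHEXAS data).
concentrations = [0.4, 0.6, 0.7, 0.9, 1.1, 1.5, 2.2, 3.8, 9.5]
lam = best_lambda(concentrations)
transformed = [box_cox(c, lam) for c in concentrations]
```

Fitting lambda to the dependent variable alone, before any model is chosen, corresponds to the less computationally intensive option described above rather than to the model-dependent approach.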
The following figures display the distribution of the concentration of Arsenic in Personal Air
measurements from the Region 5 study. Figures 4-1a and 4-1b show the histogram and normal
Q-Q plot for the measurements in the original scale. Figure 4-2 shows the Q-Q plot using the
logarithmic transformation. Figure 4-3 shows the Q-Q plot using the Box-Cox transformation.
Figure 4-1a. Histogram of Arsenic Concentration (ng/m3) in Personal Air, Region 5 Study
Figure 4-1b. Normal Q-Q Plot of Arsenic Concentration in Personal Air, Region 5 Study
Figure 4-2. Normal Q-Q Plot of Arsenic Concentration in Personal Air, Region 5 Study, Transformed by
(Log (y))
Normal Q-Q Plot of CCONC010
-------
expected normal distribution. Additional reviews of the measurement data take place in Phase 3
in the context of the model-based analyses.
Derived Measurement Variables
Some derived measurement variables were created from the study data when the desired
measurement type was not available. For example, concentration values for the duplicate diet
food samples were provided for the Arizona study, and food intake values were derived from the
concentrations and the weights of the food consumed by a participant. Appendix K provides
information on the derived measurement variables. Appendix L describes the naming
conventions used for all measurement variables in this report.
4.4 Phase 2—Questionnaire Variable Reduction
Phase 1 identified the data to be analyzed in Phase 2 and prepared the data for the analyses.
Phase 2 sorted through, in a systematic way, the approximately 600 questionnaire variables
available for analysis in each study using a combination of two statistical techniques, Principal
Component Analysis (PCA) and Multiple Linear Regression (MLR), to explore relationships
among the questionnaire variables and to reduce the number of potential predictors carried
forward to Phase 3. The Phase 2 approach is descriptive, rather than inferential.
PCA defines components, or sets of related variables, in a data set that help explain the
variability in the data. PCA is used here to analyze the questionnaire variables from each of the
questionnaire groups, such as Housing Characteristics. Multiple linear regression is then used to
select the most informative variables, i.e. those explaining the most variability within each
component, for use in the Phase 3 analyses. The questionnaire variables that represent important
descriptors of a person's environment or activities are identified, and variables that carry
redundant information or that do not carry differentiating information are excluded. In
eliminating information redundancy in the response outcomes, the Phase 2 process makes the
number of questionnaire variables analyzed in Phase 3 more manageable. Relationships with
measurement data are not considered at this point in the approach.
The discussion of the statistical techniques in Sections 4.4 and 4.5 does not include an extensive
description of the techniques. Instead the sections present a summary of how the techniques are
used in the analysis approach. References with more extensive discussions of the techniques
include (Jolliffe 1986, Jolliffe 2002, Jackson 1991, Draper 1966, Neter 1996).
4.4.1 Finding Related Variables
PCA is one of the oldest and best known of the multivariate analysis techniques (Hotelling 1933,
Harris 1975, Jackson 1991, Jolliffe 1986, and Jolliffe 2002). Its central idea, "to reduce the
dimensionality of a data set in which there are a large number of interrelated variables, while
retaining as much as possible of the variation present in the data set" (Jolliffe, 1986) is consistent
with the intent of Phase 2. PCA was performed on the questionnaire variables within the
Dietary, Exposure, Health Status, Housing Characteristics, and Occupation groups. Analysis
was not performed on the variables in the Demographic group because there were so few
variables. Variables with zero variability were not included in the PCA because they would have
zero correlation or relationship with the other variables. Variables with low variability were
initially considered for exclusion; however, including these variables in some preliminary Phase
3 analyses did not show high sensitivity. Including these variables in the Phase 2 process
allowed them, if selected in Phase 2, to be evaluated for relationships with concentration or
exposure measurements.
PCA determines principal components (PCs) as linear combinations of the questionnaire
variables that maximally discriminate among the cases. PCA derives the PCs in an order based
on the magnitude of the eigenvalue. The PCs are designed to account for as high a percentage of
variation among the questionnaire variables with as few PCs as possible, and the dimensions in
the data described by the PCs are unrelated or orthogonal. The result of a PCA is a matrix of
loading values for each variable with respect to each PC that has been generated. A loading
value ranges from -1.0 to 1.0 and represents a variable's correlation with the PC. If the data are
standardized, the loading value is the coefficient for the variable in creating a PC score, and thus
represents the variable's relative weight in, or importance to, that PC.
Sensitivity of PCA to Alternate Uses
Traditionally PCA is performed on continuous or at least ordinal type variables; however, most
of the questionnaire variables in the NHEXAS studies are categorical. Jolliffe (1986, 2002)
confirms that when using PCA as a descriptive, rather than inferential, technique, assumptions
about the type of data used are not required. He also notes that although linear functions of
binary variables may be harder to interpret than linear functions of continuous variables, "the
basic objective of PCA, to summarize most of the 'variation' which is present in the original set
of p variables, using a smaller number of composite variables (i.e., PC scores) can be achieved
regardless of the nature of the original variables." PCA can be used by replacing actual values
with ranks or by using measures of dispersion or association more appropriate to discrete data in
place of variances and covariances. This expanded view of PCA, together with the previously
defined codings for non-response outcomes and conditional questions, which attempt to create a
monotonic relationship between the questionnaire variable and impact on exposure, makes PCA a
reasonable part of the reduction process.
Several test analyses were performed to understand the sensitivity of PCA to certain approaches
under consideration. The tests confirmed that changes in the coding values assigned to the "No
Response" and "Not Applicable" outcomes affected the sign of the loading values more than they
affected the PC to which the questionnaire variables were assigned. The tests also confirmed
that the inclusion or exclusion of variables with low variability did not change the structure of
the PCs significantly, because the process was able to sort out their usefulness, or lack thereof, in
explaining variability in the data sets. Lastly, the necessity of using dummy variables in place of
categorical variables was evaluated. The tests confirmed that using sets of dummy variables in
place of the categorical variables created more PCs, but only by subdividing the PCs defined by
the analysis with the categorical variables. Also, the use of dummy variables did not change the
underlying dimensions described by the PCs in comparison to those produced from the
categorical variables. This stems from the fact that categorical variables can be viewed as linear
combinations of the dummy variables, which is consistent with the derivation of the PCs.
Selecting Useful Components
Traditionally, PCs with an eigenvalue > 1.0 are considered meaningful for a given data set.
Jolliffe (1986) recommends considering PCs with an eigenvalue > 0.7 to account for sampling
variation in the eigenvalues based on the sample instead of the population. Several techniques
(Jolliffe 1972, Krzanowski 1987, Mansfield 1977) were evaluated to identify the PCs and thus
the variables to be included in subsequent analyses. These approaches were not deemed suitable
for this project because they were either resource-intensive for the number of variables being
analyzed or they excluded a large number of variables, and an alternative approach was
developed.
For each PCA, the resulting PCs were rotated using the Varimax option to provide an easier
interpretation of the dimensions in the data set (Jolliffe 1986). Hereafter, the result of the
Varimax rotation will be described as rotated PCs. "PC scores" will refer to the linear combinations
of participant scores and coefficients from the components. The rotated PCs remain orthogonal,
but the coefficients or loading values from the rotation have a simpler structure. The total
percent variability explained by the m rotated PCs, where m < ... Variables with an absolute
loading value > 0.6 for a PC were considered descriptive of that
component. This cutoff value is in a medium range of correlation and was selected to allow the
Phase 3 processes the opportunity to filter the questionnaire variables with respect to the
measurement data.
In summary, the PCA process consists of the following steps:
1.	Use the questionnaire variables assigned to one group category (e.g., Housing
Characteristics).
2.	Exclude variables with no variability.
3.	Run PCA using the correlation matrix option.
4.	Perform the varimax rotation on PCs with an eigenvalue > 0.7.
5.	Select variables within each rotated PC that have an absolute loading value > 0.6.
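Steps 2, 3, and 5 can be sketched in plain Python as shown below. This is an illustration under stated assumptions, not the procedure run for this report: the Varimax rotation of step 4 is omitted, so the loadings are unrotated, the eigen-decomposition uses a basic Jacobi method written for this sketch, and the variable names and data are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    centered = [[x - sum(c) / n for x in c] for c in columns]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    return [[sum(x * y for x, y in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def jacobi_eigen(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues and eigenvectors (as columns of v) of a symmetric matrix."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def select_variables(data, min_eigenvalue=0.7, min_loading=0.6):
    """Return (eigenvalue, {variable: loading}) for each retained component."""
    names = [name for name, col in data.items() if len(set(col)) > 1]  # step 2
    eigenvalues, vectors = jacobi_eigen(correlation_matrix([data[n] for n in names]))  # step 3
    result = []
    for i, eig in enumerate(eigenvalues):
        if eig <= min_eigenvalue:
            continue
        # For a correlation-matrix PCA, loading = eigenvector element * sqrt(eigenvalue).
        loadings = {name: vectors[j][i] * math.sqrt(eig) for j, name in enumerate(names)}
        result.append((eig, {k: l for k, l in loadings.items() if abs(l) > min_loading}))  # step 5
    return sorted(result, key=lambda item: item[0], reverse=True)


# Hypothetical coded responses: A and B move together, C and D move together, E never varies.
data = {
    "A": [-3, -1, 1, 3, -3, -1, 1, 3],
    "B": [-2.5, -1.5, 0.5, 3.5, -2.5, -1.5, 0.5, 3.5],
    "C": [-1, -1, -1, -1, 1, 1, 1, 1],
    "D": [-0.7, -1.3, -0.7, -1.3, 0.7, 1.3, 0.7, 1.3],
    "E": [1, 1, 1, 1, 1, 1, 1, 1],
}
components = select_variables(data)
```

In this constructed example the two groups of variables are uncorrelated with each other, so the unrotated components already separate them. With real questionnaire data the rotation step is what gives the loadings this kind of simple structure.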
Table 4-5 shows the first five rotated PCs and questionnaire variables from an analysis of the
Region 5 Study's Health Status group of 61 questionnaire variables for 249 cases based on the
steps described above. Only questionnaire variables selected from these five PCs are included in
the table.
Table 4-5. Selected Principal Components (Rotated Matrix) Showing Absolute Loading Values > 0.6 for the
Region 5 Study's Health Status Group of Questionnaire Variables (N=249)
Variable  Description                             Principal Component
                                                  1       2       3       4       5
B21A1     HAD DIABETES?                           0.971
B21A2     PROF-DIAGNOSED DIABETES?                0.988
B21A3     HAVE DIABETES NOW?                      0.988
B21A4     AGE DIABETES DIAGNOSED?                 -0.94
B21B1     HAD NEUROMUSCULAR DISABILITY?                           0.928
B21B2     PROF-DIAGNOSED NEUROMUSCULAR DISEASE?                   0.974
B21B3     HAVE NEUROMUSCULAR DISABILITY NOW?                      0.988
B21B4     AGE NEUROMUSCULAR DISABILITY DIAGNOSED                  -0.948
B21D1     HAD ULCER?                                      0.928
B21D2     PROF-DIAGNOSED ULCER?                           0.932
B21D3     HAVE ULCER NOW?                                 0.926
B21D4     AGE ULCER DIAGNOSED                             -0.846
B21H1     HAD STOMACH TROUBLE?                                                    0.941
B21H2     PROF-DIAGNOSED STOMACH TROUBLE?                                         0.959
B21H3     HAVE STOMACH TROUBLE NOW?                                               0.948
B21H4     AGE STOMACH TROUBLE DIAGNOSED                                           -0.852
B21U1     HAD KIDNEY STONES?                                              0.953
B21U2     PROF-DIAGNOSE KIDNEY STONES?                                    0.965
B21U3     HAVE KIDNEY STONES NOW?                                         0.928
B21U4     AGE KIDNEY STONES DIAGNOSED?                                    -0.897
The full rotated PC matrix for this group of 61 questionnaire variables has 19 PCs with
eigenvalues > 0.7 which accounted for 91 percent of the variability in the data set. The number
of variables used in a PCA, or the number of different subject areas covered by the variables
analyzed, affect the cumulative percent variability accounted for by components with
eigenvalues > 0.7. PCAs based on a smaller number of variables or subject areas tended to
account for a higher percent variability with fewer PCs because there is less initial variability in the data set.
One of the 61 variables in this group, F06C2, did not have an absolute loading value > 0.6 on any
of the 19 PCs and thus was excluded from further analysis.
4.4.2 Excluding Redundant Variables
PC scores are values obtained from the linear combinations of the data described by the PCs or
rotated PCs. Many applications of PCA use PC scores in subsequent analyses because PCA
reduces the dimensionality of, and collinearity within, the data set. This approach was not
considered suitable here for two reasons. First, PCA was used to reduce the number of questions that
would need to be asked in a study in order to identify subpopulations of interest; however,
analyses based on PC scores would still require asking all the questions used in the PCA to
determine the PC scores. Secondly, this project looks for relationships between the actual
questions and the measurements. Although the dimensions of the data described by the PCs are
meaningful, one question may not always be able to describe the dimension. Thus the next step
attempts to extract the important and non-redundant questionnaire variables from each PC.
Jolliffe (1986) notes that if a data set with p variables can be successfully described by q PCs,
where q < ...
original variables without losing very much of the information. Similarly, since each PC is
defined by variables that are strongly interrelated, that is, have high loading values, it is likely
that the underlying dimension described by a PC can be characterized by a subset of the
variables with high loadings. The second part of Phase 2 sorts through the questionnaire
variables assigned to each rotated PC to eliminate variables with redundant information. A
second evaluation step performs a multiple linear regression on each PC, using the component's
PC score as the dependent variable and the variables with an absolute loading > 0.6 on that PC as
independent variables.
Regression analysis is used here in a descriptive, rather than inferential, manner to identify
indicators of strong associations, i.e., higher explained variability. As such, it was not
considered necessary or desirable to test the significance of the coefficients in order to sort
through the variables. However, the variance inflation factor (VIF), a parameter produced by the
SPSS Regression procedure, was used to identify independent variables providing redundant
information in the PC (Belsley 1991, Neter 1996). The VIF for a specific independent variable is
defined as 1/(1 - R^2), where R is the multiple correlation coefficient between that variable and
the other independent variables in the regression analysis. Thus high VIFs indicate high levels
of collinearity. A VIF > 10 reflects an R^2 greater than 0.9. Neter (1996) and Belsley (1991)
suggest that VIFs > 10 greatly affect any statistics produced in the regression analysis and should
be resolved for more effective analysis. It should be noted that the selected variables from a
rotated PC may act as "surrogates" for other, weakly correlated, variables in the same dimension.
Interpretations of the relationships between selected questionnaire variables and measurements
should consider this potential substitution.
In summary, the Multiple Linear Regression process consists of the following steps:
1.	Perform Regression analysis for a PC using the PC score as the dependent variable
and the variables in the PC having absolute loading values > 0.6 as independent
variables.
2.	Review the VIF scores of the independent variables and exclude any variables with
VIF > 10.
3.	Rerun the Regression analysis with the remaining independent variables.
4.	If high levels of collinearity still exist, use other exploratory methods and reviews of
the variable subject matter to help identify variables that can be excluded.
5.	The process terminates when a rerun of the regression analysis shows no independent
variable with a VIF >10.
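The VIF screening loop can be sketched as follows. Two points are assumptions of the sketch rather than the report's procedure: the VIFs are read from the diagonal of the inverse of the correlation matrix of the independent variables, which is equivalent to 1/(1 - R^2), and the variable with the highest VIF is dropped one at a time, a common simplification that stands in for the additional exploratory methods and subject-matter review mentioned in step 4. The variable names and correlations are hypothetical.

```python
def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular: variables are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(corr, names):
    """VIF of each variable: the matching diagonal element of the inverse correlation matrix."""
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


def prune_by_vif(corr, names, threshold=10.0):
    """Drop the highest-VIF variable and rerun until no VIF exceeds the threshold."""
    names = list(names)
    corr = [row[:] for row in corr]
    while len(names) > 1:
        current = vifs(corr, names)
        worst = max(current, key=current.get)
        if current[worst] <= threshold:
            break
        k = names.index(worst)
        names.pop(k)
        corr = [[v for j, v in enumerate(row) if j != k]
                for i, row in enumerate(corr) if i != k]
    return names, vifs(corr, names)


# Hypothetical correlations among three variables that load on one rotated PC.
corr = [[1.00, 0.98, 0.50],
        [0.98, 1.00, 0.50],
        [0.50, 0.50, 1.00]]
kept, final_vifs = prune_by_vif(corr, ["Q1", "Q2", "Q3"])
```

When only two independent variables remain, both have the same VIF, 1/(1 - r^2), where r is their correlation. This is why a final set of two variables shows two identical VIF values.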
Table 4-6 shows the VIF values from step 1 for the third rotated PC in Table 4-5. All four
variables have VIF values > 10 and additional analyses were performed to identify B21B1 and
B21B2 as variables to be excluded. Table 4-7 shows the variables remaining from this
component after the process has been completed.
Table 4-6. Initial Set of Questionnaire Variables and VIF Values for the Third Rotated Principal
Component in Table 4-5
Variable  Description                             VIF
B21B1     HAD NEUROMUSCULAR DISABILITY?           82.038
B21B2     PROF-DIAGNOSED NEUROMUSCULAR DISEASE?   28.876
B21B3     HAVE NEUROMUSCULAR DISABILITY NOW?      174.43
B21B4     AGE NEUROMUSCULAR DISABILITY DIAGNOSED  50.679
Table 4-7. Final Set of Questionnaire Variables and VIF Values for the Third Rotated Principal
Component in Table 4-5
Variable  Description                             VIF
B21B3     HAVE NEUROMUSCULAR DISABILITY NOW?      8.313
B21B4     AGE NEUROMUSCULAR DISABILITY DIAGNOSED? 8.313
4.4.3 Results of Phase 2 Processing
This process is repeated for each rotated PC in each group of questionnaire variables. For
tracking purposes, the status of each variable is annotated. Table 4-8 illustrates the results of
processing the Region 5 Health Status group. Results for all the Phase 2 processing are provided
in Tables G3-1 through G3-6 and Tables H3-1 through H3-6. A value of "2" for a variable in
Table 4-8 indicates that the variable had an absolute loading value > 0.6, but was excluded from
further analysis through the regression evaluation process. A value of "3" for a variable
indicates that the variable had an absolute loading value > 0.6 and was carried forward for Phase
3 analysis. F06C2 did not have an absolute loading value > 0.6 on any of the 19 PCs for this
questionnaire group. This table also shows how groups of questions clustered in a PC. For the
most part, each PC identifies information related to a specific health condition. For example, all
questions relating to Diabetes represent the dimension of the first PC. Variables B21A1 and
B21A2 were excluded from further analysis because the information they carried was considered
to be redundant to the information contained in variables B21A3 and B21A4.
Table 4-8. Summary of the Phase 2 Analysis on the Region 5 Study Health Status Group of Questionnaire
Variables
Status Codes:
2 = Variable is included in the component with an absolute loading > 0.6, but is not carried forward to
Phase 3 analysis
3 = Variable is included in the component with an absolute loading > 0.6 and is carried forward to Phase 3
analysis
- = Variable did not have an absolute loading > 0.6 on any component
Variable  Description                             Principal Component  Status
B20       CURRENT HEALTH STATUS                   19                   3
B21A1     HAD DIABETES?                           1                    2
B21A2     PROF-DIAGNOSED DIABETES?                1                    2
B21A3     HAVE DIABETES NOW?                      1                    2
B21A4     AGE DIABETES DIAGNOSED?                 1                    3
B21B1     HAD NEUROMUSCULAR DISABILITY?           3                    2
B21B2     PROF-DIAGNOSED NEUROMUSCULAR DISEASE?   3                    2
B21B3     HAVE NEUROMUSCULAR DISABILITY NOW?      3                    3
B21B4     AGE NEUROMUSCULAR DISABILITY DIAGNOSED  3                    3
B21C1     HAD ASTHMA/ALLERGIES?                   12                   3
B21C2     PROF-DIAGNOSED ASTHMA/ALLERGIES?        12                   2
B21C3     HAVE ASTHMA/ALLERGIES NOW?              12                   3
B21C4     AGE ASTHMA/ALLERGIES DIAGNOSED          12                   3
B21D1     HAD ULCER?                              2                    3
B21D2     PROF-DIAGNOSED ULCER?                   2                    2
B21D3     HAVE ULCER NOW?                         2                    3
B21D4     AGE ULCER DIAGNOSED                     2                    3
B21F1     HAD GASTRITIS?                          9                    2
B21F2     PROF-DIAGNOSED GASTRITIS?               9                    2
B21F3     HAVE GASTRITIS?                         9                    2
B21F4     AGE GASTRITIS DIAGNOSED                 9                    3
B21G1     HAD FREQUENT INDIGESTION?               10                   3
B21G2     PROF-DIAGNOSED FREQUENT INDIGESTION?    10                   2
B21G3     HAVE FREQUENT INDIGESTION?              10                   2
B21G4     AGE FREQUENT INDIGESTION DIAGNOSED      10                   3
B21H1     HAD STOMACH TROUBLE?                    5                    2
B21H2     PROF-DIAGNOSED STOMACH TROUBLE?         5                    2
B21H3     HAVE STOMACH TROUBLE NOW?               5                    2
B21H4     AGE STOMACH TROUBLE DIAGNOSED           5                    3
B21M1     HAD FREQ CONSTIPATION?                  8                    2
B21M2     PROF-DIAGNOSED CONSTIPATION?            8                    2
B21M3     HAVE FREQUENT CONSTIPATION NOW?         8                    2
B21M4     AGE FREQUENT CONSTIPATION DIAGNOSED?    8                    3
B21P1     HAD FATTY LIVER?                        6                    3
B21P2     PROF-DIAGNOSED FATTY LIVER?             6                    2
B21P3     HAVE FATTY LIVER NOW?                   6                    2
B21P4     AGE FATTY LIVER DIAGNOSED?              6                    3
B21Q1     HAD HEPATITIS?                          11                   2
B21Q2     PROF-DIAGNOSED HEPATITIS?               11                   2
B21Q3     HAVE HEPATITIS NOW?                     11                   2
B21Q4     AGE HEPATITIS DIAGNOSED?                11                   3
B21T1     HAD NEPHRITIS?                          13                   2
B21T2     PROF-DIAGNOSED NEPHRITIS?               13                   2
B21T3     HAVE NEPHRITIS NOW?                     13                   3
B21T4     AGE NEPHRITIS DIAGNOSED?                15                   3
B21R1     HAD YELLOW JAUNDICE?                    7                    2
B21R2     PROF-DIAGNOSED YELLOW JAUNDICE?         7                    2
B21R3     HAVE YELLOW JAUNDICE NOW?               7                    2
B21R4     AGE YELLOW JAUNDICE DIAGNOSED?          7                    3
B21U1     HAD KIDNEY STONES?                      4                    2
B21U2     PROF-DIAGNOSE KIDNEY STONES?            4                    2
B21U3     HAVE KIDNEY STONES NOW?                 4                    3
B21U4     AGE KIDNEY STONES DIAGNOSED?            4                    3
B21W1     HAD DISEASE REQUIRING CHEMOTHERAPY?     16                   3
B21W2     TIME SINCE LAST CHEMOTHERAPY?           16                   3
F06A2     PAST WEEK TAKE DIURETICS                14                   3
F06B2     TAKE ANY CHELATING AGENT PAST WEEK?     14                   3
F06C2     PAST WEEK TAKE ANTACIDS                 -                    -
F06D2     PAST WEEK TAKE HORMONES                 17                   3
F06E2     PAST WEEK TAKE OTHER MEDICINE?          18                   3
F08       ARE YOU PREGNANT OR NURSING MOTHER?     17                   3
Questionnaire variables with a status of "3" are carried forward to the Phase 3 process. Phase 2
was intended to create a first evaluation of the questionnaire variables to make the analysis of
Phase 3 more manageable. This analysis was performed conservatively so that variables with
potentially meaningful relationships to the measurement data would not be eliminated. Groups
of similar variables generally stayed together in a rotated PC, which is an indicator of the
internal consistency of the questionnaire data. The Phase 2 process created about a 50 percent
reduction in the number of questionnaire variables. The process systematically allowed all
variables to play a role and let interrelationships within the data drive which variables would
move to Phase 3 instead of using preconceived notions or subjective rules to make the decisions.
In almost every instance of a set of related questions, for example, B21A1 - B21A4 (diabetes)
and B21B1 - B21B4 (neuromuscular disability), at least one variable was selected for use in
Phase 3. This reflects the usefulness of the question subject matter rather than the particular
question selected from the set. Some issues arise with "circle all that apply" type questions. In
these questionnaire sets, each option that can be circled is captured in a separate variable. This
allows each option to be evaluated individually; however, the analysis cannot take into account
the breadth of options used by a given participant as it affects relationships with other variables.
4.5 Phase 3—Model Analysis
In Phase 3, the relationships between the questionnaire variables selected in Phase 2 and the
chemical measurements are evaluated. A conceptual/mechanistic model of human exposure is
used to investigate hypothesized relationships between the questions and measurements.
Different analysis techniques were selected to address each of the project's objectives as follows:
Objective 1: Modeling and regression analysis
- Categorical Regression Analysis
- Stepwise Regression Analysis
Objective 2: Classification of individuals by their likely exposure
- CHAID (Chi-square Automatic Interaction Detector)
- CART (Classification and Regression Trees)
Objective 3: Classification of individuals with high exposure levels
- Logistic Regression Analysis
The objectives consider relationships between questions and measurements from different
perspectives or uses. Because the purpose of each objective is different, the analysis techniques
selected for it provide a different type of predictive information than the techniques selected
for the other objectives.
4.5.1 Human Exposure Models as the Basis for Analysis
Environmental Exposure Paradigm
Exposure assessment paradigms provide a tool for understanding and analyzing the pathways
and processes that result from a chemical release or existence in the environment, a chemical's
transformations and transport, and a person's contact with the chemical. A paradigm describing
these relationships is a useful tool for establishing relationships, causal or not, between factors
such as product use behaviors, activities, and physiological characteristics, and outcomes such as
concentrations in environmental media, exposures, and internal doses. A "General Exposure
Paradigm," as presented in the upper portion of Figure 4-4, has been used by many researchers
for this purpose (Sexton 1995, WHO 2000).
In this project, the paradigm presented in Figure 4-4 provides a framework for developing
hypotheses that test for relationships between questionnaire and measurement variables. The
"Measured Environmental/Personal Quantities" indicate measurements representative of each of
the four paradigm outcomes. The "Potential Underlying Predictors" identify the processes and
factors, represented by a response to the questionnaire variables, that may be used to predict the
values of the measurement variables.
The questionnaire variables available as predictors are described in Appendix D. The Phase 2
process reduced the large number of predictors by eliminating those that carried redundant
information. Tables G3-2 to G3-6 and Tables H3-2 to H3-6 identify the questionnaire variables
carried forward to the Phase 3 model analyses for the Region 5 and Arizona studies,
respectively. The potential predictors include: occupational exposures (e.g., exposed to metal
fumes at work?), type and location of the residential environment (e.g., multifamily or single
family building?), residential products use (e.g., how many times did you paint in the past
week?), activities in the home (e.g., how many times did you sweep in the past week? or how
many times used pesticide in past week?), dietary exposure (e.g., source of running water? or
how many times did you prepare dinner in the past week?), commuting exposure (e.g., do you
commute by car, subway, or bicycle?), outdoor activities (e.g., number of minutes swam in pool?
or number of days used outdoor grill?), modifying activities (e.g., number of days with gasoline
on the hand?), and health problems (e.g., age diabetes diagnosed? or age intestinal/bowel trouble
diagnosed?). Questions associated with each group of "Potential Underlying Predictors" are
included in the "Questionnaire Data Examples" in Figure 4-4.
For each "Measured Environmental/Personal Quantity," a set of potential predictors is identified,
in the context of the exposure paradigm, to evaluate their relationships to the measured quantity
under each of the Phase 3 objectives defined above. For example, one model for a specific
chemical may hypothesize that the value of exposure in air (EAR), as indicated by the
concentration in a personal air sample, is in part predicted by other measured concentration
values in indoor air (CIA), outdoor air (COA), surface dust (CSF), and soil (CSL), and by factors
describing locations, contact areas, breathing rates, etc.
[Figure 4-4 is a flow diagram; its text content is listed below.]
General Exposure Paradigm:
Emissions/External Sources -> Media Loadings/Concentrations -> Exposures/Potential Dose -> Internal Dose
Model steps:
Step 1: Concentration/Loading Model
Step 2: Exposure Model
Step 3: Dose Model
Potential Underlying Predictors:
- Product types and uses
- Ambient Env. Concentrations
- Locations
- Activities
- Contact
- Quantity
- Absorption processes
- Rates
- Pathways
Measured Environmental/Personal Quantities:
- CSL (concentration in soil)
- COA (concentration in outdoor air)
- EAR (concentration in personal air)
- EDT (intake from food)
- EDR (loading on skin)
- CIA (concentration in indoor air)
- CSF (concentration in surface dust)
- CDW (concentration in drinking water)
- BLD (concentration in blood)
- URN (concentration in urine)
Questionnaire Data Examples:
- Burning coal or wood?
- Number of cigarettes smoked?
- Source of cooking water?
- Heavy metals in soils?
- How close is garage to home?
- How many times did you shower?
- Number of months use AC?
- Number of days sweeping?
- Number of dinners at home?
- Number of min traveled on roadway?
- At job contact with dust?
- Number of min sat on carpet?
- Age kidney trouble?
- Age intestinal disease diagnosed?
- Smoker?
- Number of days gasoline on skin?
Figure 4-4. General Exposure Paradigm
Application of Paradigm to Concentration, Exposure, and Dose Models
In this project, models are defined for: (1) contamination loadings in the human environment; (2)
human exposures to such loadings; and (3) internal human doses generated by such exposures.
Models for emissions and external sources of contaminants are not evaluated because the
associated factors were not within the scope of the NHEXAS studies nor of this task. Models for
mass loadings and concentrations, for each major exposure pathway (inhalation, ingestion, and
dermal contact), and for dose represented by biomarkers were considered. Within a study,
models for a specific chemical were considered for analysis if the measured quantity was
available for a suitable number of cases with respect to the analysis.
For mass loadings/concentrations, Step 1 in Figure 4-4 describes the physical and human activity
factors that strongly influence the residential and outdoor environments in which people may be
exposed to a particular chemical. Essentially, this model component predicts residential mass
loadings, or concentrations, that people could be exposed to for various amounts of time and
under various conditions. These mass loadings represent potential, not actual, exposures.
The analysis includes as predictors direct measures such as outside air, residential soil, and water
contaminant concentrations. Indoor residential burdens of such contaminants, of course, are
highly influenced by numerous amplifying, filtering, and transport factors and many particular
human activities. For instance, while it is true that indoor air concentrations depend upon
proximity to contaminated ambient air, it is also true that the physical characteristics of homes
and how they are used greatly affect indoor air concentrations. The questionnaire variables are
used to provide information regarding many such factors that could have a plausible physical
basis for influencing indoor contaminant burdens. In the case of indoor air, these factors include,
for example: whether people smoke cigarettes indoors; whether they bring home contaminants
on their clothing from work; whether airborne contaminants are introduced into the home via use
of contaminated water in major home appliances; and the type and frequency of fuel and
household chemical use. Each of the questionnaire variables was scrutinized to establish a list of
such physically realistic activities, behaviors, and environmental factors that could be expected
to drive or modify indoor mass loadings by chemical class and chemical, if appropriate.
Step 1 defines models for indoor airborne concentrations and indoor dust loadings (µg
contaminant per unit area). Step 2 models actual human exposures for each pathway by
chemical starting with the measurements available from Step 1 and including questions
providing information on contact with each medium. The output of Step 2 is a set of
pathway-specific exposure models (Inhalation Exposure, Dermal Exposure, and Dietary
Exposure, which includes food and water ingestion) for each contaminant.
As in Step 1, many human activities influence the relationship between potential exposure (Mass
Contaminant Loading) and actual exposure (Exposures/Potential Dose) in Figure 4-4. Those
activities and factors that had physical plausibility for each contaminant type were identified and
included in the exposure models. For example, indoor carpets may be burdened by arsenic from
outdoor foot traffic through arsenic-bearing soils adjacent to a home. However, such carpet
dusts may or may not contribute to inhalation, dermal, or food contamination, depending on how
often such a carpet is vacuumed; how often people are in contact with the carpet; and whether
fans or air conditioners are turned on, possibly aerosolizing such dusts. The models that were
used to connect indoor mass burdens to actual exposure were based on a screening of the
available questions to identify potential relevant relationships.
Step 3 involves the relationship between human exposure to contaminants and actual internal
human dose. Starting with measured quantities of exposure from Step 2, these models include
additional factors that modulate relationships between exposure and uptake of a contaminant in
the body. For example, it is known that gasoline on the skin may influence the dermal sorption of
certain chlorinated hydrocarbons studied in this project. Also, the presence, duration, and extent
of certain human organ diseases suggest the ingestion of certain heavy metals. Information on
such factors was available from the questionnaires and was included in the dose models for blood
and urine as appropriate.
Using the context of the exposure paradigm described above, the data available from a study
were reviewed for each chemical to determine the models to be analyzed. Based on the
conceptual models described above, the questionnaire and measurement variables were reviewed
to identify reasonable candidates for testing hypothesized relationships between the measured
quantities and potential predictors. Tables G4-1, G4-2, H4-1, and H4-2 show the models and
the variables generally included in the models for each chemical class in the Arizona and Region
5 studies. A specific chemical/model combination was analyzed if the necessary data were
available and if the levels of measurement values above the detection limits were suitable for
analysis. Because of the difficulty in having enough cases for analysis, relationships between
more than two steps of the exposure paradigm were not considered. A multi-step model analysis
for the Region 5 data is presented, however, in Clayton (1999).
4.5.2 Preparation for Model Analysis
For the remainder of Section 4, a chemical/model combination to be analyzed will be referred to
as a model. Before a model was analyzed with respect to the objective, an assessment of
non-response outcomes and a review of the dependent variable's distribution were performed. In
Phases 1 and 2, only questionnaire variables were considered. A coding scheme was defined
that allowed the cases with non-response outcomes to be included in the Phase 2 analyses. In
Phase 3, non-response outcomes for measurements also needed to be considered. Cases where
the dependent measurement variable does not have a value are excluded from analysis. Cases
having no values for the independent measurement variables were not excluded for the
Modeling/Regression Analysis and Classification since options exist in the implemented
analysis techniques for handling them. The SPSS implementation of logistic regression analysis
did not include a similar option. Attempting to include the measurements as independent
variables at times decreased the number of available cases by 50 percent. Such results would not
be reflective of the populations analyzed under the other objectives. Imputation of the
measurement values was also not feasible as discussed in section 4.3.4. Consequently,
independent measurement variables were excluded from the Objective 3 analyses. This
exclusion carries with it the recognition that comparisons of predictors across the three
objectives need to be tempered since surrogate questionnaire variables may appear in the
logistic regression analysis in place of the excluded measurement variables.
The dependent measurement variable was transformed using the Box-Cox transformation
identified in Tables G2-1 and H2-1. The Box-Cox transformation was applied only to the
dependent measurement variable in a model and was used only for modeling/regression analysis
and classification. The binary dependent variable created for logistic regression analysis,
reflecting whether a case was above or below the 90th percentile, was based on the original scale
of the dependent variable.
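The two treatments of the dependent variable described above can be illustrated with a short sketch. It is a minimal Python example under stated assumptions, not the study's SPSS implementation: the power parameter `lam` is hypothetical (the values actually used are those identified in Tables G2-1 and H2-1), the function names are invented for illustration, and treating a value exactly equal to the 90th percentile as "below" is an assumption. The Box-Cox transformation used here is the standard form, (y^lam - 1) / lam for nonzero lam and log(y) for lam equal to zero.

```python
import numpy as np

def box_cox(y, lam):
    """Standard Box-Cox power transformation of strictly positive values."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    return np.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def high_exposure_flag(y, pct=90):
    """Flag cases above the given percentile on the ORIGINAL measurement scale.

    Assumption: values equal to the cutoff are coded 0.
    """
    y = np.asarray(y, dtype=float)
    cutoff = np.percentile(y, pct)
    return (y > cutoff).astype(int), cutoff
```

The transformed values would feed the modeling/regression and classification analyses, while the 0/1 flag, computed on the untransformed measurements, would serve as the dependent variable for logistic regression.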
Normal Q-Q plots were previously reviewed to determine the most suitable power
transformation of the dependent variable. The normal Q-Q plot for the Box-Cox transformed
measurement was reviewed at this point to determine if any adjustments were required because
of the percent of measurement values below the detection limit (BDL). None of the dependent
variables for the models to be analyzed required adjustment.
There were usually no more than a few cases with a "No Response" outcome in any
questionnaire variable. Cases where more than 50 percent of the questionnaire variables in a
model had a "No Response" outcome were considered unreliable and would be excluded from
the analysis for that model. No cases in any of the models analyzed were excluded. Variability
levels for questionnaire variables were reviewed in Phase 1 and Phase 2. However, those reviews did
not take into account the cases excluded from a model analysis because no value for the dependent
variable was available. This necessitated another review of variability levels for each model
before proceeding with the Phase 3 regression analyses for Objectives 1 and 3. It is important to
recognize that a different subset of the study's data is used in almost every model analysis
because of case deletions. This may lead to seeming inconsistencies in the results across models
for the same chemical.
Excluding Cases from Analysis
Because of the limited number of cases in each study, the exclusion of cases from an analysis
was only performed when no other reasonable option existed. In all models, a case was deleted
if it had no value for the model's dependent variable. This means that a slightly different subset
of cases from the study was analyzed in each model. This situation needs to be taken into
account when comparing model results, especially between models for the same chemical.
Imputation was not considered a reasonable or suitable option for the measurement data as
discussed in section 4.3.4. The software used for the Phase 3 analyses offered different options
for handling non-response outcomes from questionnaire variables and missing measurement data
for independent variables. These options are discussed with the specific methodology in the
following sections.
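The case-exclusion rules described in this section can be expressed as a simple mask. The sketch below is a hypothetical Python illustration: the numeric code used for a "No Response" outcome (`NO_RESPONSE = -9`), the use of NaN for a missing dependent measurement, and the function name are all assumptions, not details taken from the study's data files.

```python
import numpy as np

NO_RESPONSE = -9  # hypothetical code for a "No Response" outcome

def usable_cases(dependent, questionnaire, max_no_response=0.5):
    """Return a boolean mask of the cases retained for one model.

    A case is dropped if its dependent measurement is missing (NaN here) or if
    more than `max_no_response` of its questionnaire variables are "No Response".
    """
    dependent = np.asarray(dependent, dtype=float)
    q = np.asarray(questionnaire)
    has_dependent = ~np.isnan(dependent)
    frac_no_response = (q == NO_RESPONSE).mean(axis=1)
    reliable = frac_no_response <= max_no_response
    return has_dependent & reliable
```

Because the mask depends on the model's own dependent variable and questionnaire variables, each model is fit to a slightly different subset of cases, which is the reason for the caution about comparing results across models.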
4.5.3 Objective 1—Modeling/Regression
Objective 1 identifies questionnaire variables or measurements, within the conceptual framework
of a concentration or exposure model, that explain the variability about the mean in the model's
dependent variable from a descriptive, rather than inferential, perspective. The approach for
handling this objective combines the nonparametric categorical regression with the traditional
parametric Stepwise Regression, taking advantage of the strengths of each technique.
Categorical Regression
In this project, most of the questionnaire variables in a model are categorical. Traditional
regression analysis is not suitable for situations having a large number of categorical
independent variables for several reasons. For traditional regression analysis, categorical
variables are generally transformed into dummy variables. This approach would significantly
increase the number of variables and, if p (the number of variables) is much greater than n (the
number of cases), potentially impact the results by reducing the power of the sample size.
Depending on the analysis technique and implementation, dummy variables for a specific
questionnaire variable may be evaluated for the model individually or as a group. The former
does not give as useful a picture of the relationship between the dependent variable and the
original question. In traditional regression analysis, ordinal variables are also transformed into
dummy variables, which does not allow the underlying ordinality to be considered. These points
suggested that alternatives to traditional regression analysis be considered. Categorical
regression offers an approach for scaling the response outcomes to make the data more
appropriate for traditional regression assumptions.
Categorical regression is one type of optimal scaling regression which dates back to Kruskal
(1965) and has been developed along several avenues (Breiman 1985, Gifi 1990, Hastie 1994).
Most literature uses the term categorical regression to describe logistic regression because
logistic regression provides a methodology for analyzing categorical dependent variables
(Agresti 1990). The categorical regression, as used in this study, was initially described by
Young (1976). Categorical regression was performed using the CATREG procedure available in
SPSS's CATEGORIES module. A description of the algorithm can be found at the SPSS
website (SPSS 2003a). A white paper, "Optimal scaling methods for multivariate categorical
data analysis," describing optimal scaling techniques can be found at the SPSS website (SPSS
1998). Categorical regression is suitable for data sets containing a combination of continuous
and categorical variables and provides optimal transformations of the variables based on their
scaling level, that is, numerical, ordinal, or nominal.
Through an optimization process categorical regression replaces the category values, that is, the
original response outcomes, with category quantifications, a new set of response outcomes. The
mixed use of numerical, ordinal, and nominal scaling levels within the same analysis is a unique
feature of the CATREG procedure. If all variables are assumed to be numerical, the solution is
approximately identical to a multiple linear regression. If all predictors are assumed to be
nominal, the estimated multiple correlation coefficient is equivalent to a multiple linear
regression where the categorical predictors are replaced by dummy variables. Categorical
regression estimates a regression coefficient and a set of category quantifications for each
variable using an alternating least squares algorithm as described in Dusseldorp (2001).
Categorical regression does not expect or assume a linear relationship between the dependent
and independent variable. Also, the nature and structure of the variables are taken into account,
that is, ordinal variables are treated as having an underlying ordinal framework. Categorical
regression is one of a family of analytical techniques known as non-linear multivariate analysis
(Buja 1990, Gifi 1990, Van de Geer 1993a, Van de Geer 1993b).
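The idea of replacing category values with category quantifications can be illustrated for the simplest case. The sketch below is a deliberately reduced Python example, not the CATREG algorithm: it handles a single nominal predictor and leaves the dependent variable untransformed, whereas CATREG alternates over all variables. In this one-predictor case the least-squares quantification of each category is the mean of the dependent variable within that category (standardized here), and the resulting R-squared equals that of a regression on dummy variables. The function names and sample data are hypothetical.

```python
import numpy as np

def quantify_nominal(categories, y):
    """Quantify each nominal category by the standardized within-category mean of y."""
    categories = np.asarray(categories)
    y = np.asarray(y, dtype=float)
    levels = np.unique(categories)
    means = {lev: y[categories == lev].mean() for lev in levels}
    q = np.array([means[c] for c in categories])
    q = (q - q.mean()) / q.std()  # quantifications are reported on a standardized scale
    return q, levels

def r_squared(X, y):
    """R-squared of an ordinary least-squares fit of y on X plus an intercept."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

# Hypothetical data: a nominal predictor and a numeric dependent variable.
states = np.array(["MI", "MI", "IL", "IL", "WI", "WI", "IL"])
conc = np.array([0.2, 0.3, 0.9, 1.1, 0.1, 0.2, 1.0])

q, levels = quantify_nominal(states, conc)
dummies = (states[:, None] == levels[1:]).astype(float)  # one dummy per non-reference level
r2_quantified = r_squared(q, conc)
r2_dummies = r_squared(dummies, conc)
```

The single quantified variable carries the same explanatory information as the full set of dummy variables, which is what allows the original question to be evaluated as one predictor rather than as several.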
Transforming Response Outcomes
The implementation of CATREG for a given model starts with an initial scaling of all the
independent variables. The values for a numerical variable are translated into a set of discrete,
non-negative integer values obtained for these analyses by multiplying the standardized values
by 10, rounding, and adding a value such that the lowest value is 1 (MULTIPLYING option).
The values for a nominal or ordinal variable were increased by three to meet the non-negativity
requirement of the procedure. The SPSS CATREG algorithms then use these values to
determine the initial scaling values (SPSS 2003a). The selection of initial scaling values in any
iterative algorithm may affect the results; however, there was no user option available for testing
the sensitivity. At any iteration in the algorithm, the scaling attempts to create a linear
relationship between the values of each independent variable and the dependent variable. The
"No response" category, which may normally be excluded as a missing value, is assigned to an
"Extra Category." This option in the SPSS CATREG procedure allows the Extra Category to be
handled as a nominal category even if it belongs to a numerical variable. The optimization
algorithm determines a fixed value for the "No Response" category as it relates to the dependent
variable. Thus "No Responses" can be included in the analysis and their potential differences
with other categories or values for the variable can be reviewed.
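The initial discretization of a numerical variable described above (standardize, multiply by 10, round, then shift so that the lowest value is 1) can be sketched as follows. This is a hypothetical Python rendering of the description in the text, not SPSS code; in particular, the use of the sample standard deviation (`ddof=1`) and the function name are assumptions.

```python
import numpy as np

def discretize_multiplying(values):
    """Map numeric values to positive integer categories.

    Steps follow the text: standardize, multiply by 10, round, and add a
    constant so that the lowest category value is 1.
    Assumption: the sample standard deviation (ddof=1) is used.
    """
    v = np.asarray(values, dtype=float)
    z = (v - v.mean()) / v.std(ddof=1)
    k = np.rint(z * 10).astype(int)
    return k - k.min() + 1
```

Values that are close together on the original scale can fall into the same category, which is why some rows of Table 4-9 list a range of values for a single category value.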
The Region 5 Study Arsenic Inhalation model will be used to illustrate the approach for this
objective and will be subsequently referred to as the example model. Table 4-9 and Figure 4-5
show the initial and final scaling values for the dependent variable used in the example model,
that is, Arsenic concentration in personal air (ng/m3). Tables 4-10a, 4-10b, and 4-10c show the
response categories and final scaling values for three independent variables in this model, and
Figures 4-6a, 4-6b, and 4-6c show plots of the values in those tables. The original pre-CATREG
values are on the x-axis and the final scaling values (quantifications in CATREG terminology)
are on the y-axis. The final scaling values give an indication of which response categories are
similar and which are different as they relate to the dependent variable, arsenic concentration.
For example, in Table 4-10c and Figure 4-6c, the states Wisconsin and Michigan might be
considered similar to each other and different from Illinois because of their final scaling values.
In Table 4-10b and Figure 4-6b, the zero final scaling value seems to fall at about 280 minutes
and the slope of the transformation is about 0.001. The "No Response" category, which was
assigned to the Extra Category, has a final scaling value similar to that of the "1800 minutes"
outcome which shows the non-respondents to be within the distribution rather than beyond the
tails. In Table 4-10a and Figure 4-6a, the "Not Applicable" category has a final scaling value
very different from the values assigned to the other two categories. This situation likely reflects
differences in the participants responding to the condition question "Are you greater than 10
years of age?"
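The slope and zero crossing quoted above for predictor B08B can be checked directly from the values in Table 4-10b. The following Python sketch fits a straight line to the numeric categories (the Extra Category is left out because it is treated as nominal); the variable names are illustrative, and the unweighted least-squares fit is simply a convenient way to check the arithmetic, not a step performed in the study.

```python
import numpy as np

# Category values (minutes) and final scaling values from Table 4-10b,
# excluding the Extra Category ("No Response").
minutes = np.array([0, 10, 20, 60, 120, 180, 300, 420, 600, 960,
                    1800, 2100, 2400, 2700, 3600, 10080], dtype=float)
scaling = np.array([-0.27, -0.26, -0.25, -0.211, -0.153, -0.094, 0.023, 0.14,
                    0.315, 0.667, 1.486, 1.779, 2.071, 2.364, 3.242, 9.563])

# Straight-line fit: scaling = slope * minutes + intercept
slope, intercept = np.polyfit(minutes, scaling, 1)
zero_crossing = -intercept / slope
max_abs_residual = np.max(np.abs(scaling - (slope * minutes + intercept)))

print(f"slope = {slope:.6f} per minute")
print(f"zero final scaling value at about {zero_crossing:.0f} minutes")
print(f"largest deviation from the line = {max_abs_residual:.4f}")
```

The tabulated points lie very close to a single straight line, as expected for a variable scaled at the numerical level, so the transformation for this predictor is essentially a linear rescaling of the number of minutes.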
Table 4-9. Category and Final Scaling Values of Arsenic Concentration in Personal Air (ng/m3) from
CATREG Analysis of the Region 5 Study Example Model
Discretized Values*  Category Value*  Final Scaling Value  Frequency
0.07                 1                -2.373               1
.09 - .10            2                -2.274               2
0.1                  3                -2.174               1
0.14                 4                -2.075               1
0.16                 5                -1.975               1
.16 - .17            6                -1.876               2
.21 - .22            8                -1.677               2
.22 - .23            9                -1.578               4
.24 - .24            10               -1.479               2
.26 - .27            11               -1.379               3
0.28                 12               -1.28                1
0.3                  13               -1.181               1
.33 - .34            15               -0.982               3
.35 - .36            16               -0.882               4
.38 - .38            17               -0.783               3
.39 - .40            18               -0.684               2
.41 - .43            19               -0.584               9
.43 - .45            20               -0.485               7
.45 - .46            21               -0.386               5
.47 - .49            22               -0.286               5
.49 - .50            23               -0.187               6
.51 - .52            24               -0.088               6
.53 - .54            25               0.012                6
.55 - .56            26               0.111                9
.57 - .58            27               0.211                3
.58 - .60            28               0.31                 7
.61 - .62            29               0.409                2
.62 - .64            30               0.509                8
.64 - .65            31               0.608                2
.66 - .68            32               0.707                4
.68 - .70            33               0.807                8
.70 - .71            34               0.906                6
.72 - .73            35               1.005                5
.74 - .75            36               1.105                6
.76 - .77            37               1.204                3
.78 - .79            38               1.304                4
.87 - .89            43               1.8                  2
0.89                 44               1.9                  1
0.94                 46               2.098                1
1                    49               2.397                3
Total                                                      151
* Multiplying option used in SPSS CATREG to discretize dependent variable and to define category values.
Table 4-10a. Category and Final Scaling Values of Predictor B06 (Use Tobacco Products?) from CATREG
Analysis of Region 5 Study Example Model
Category Value      Final Scaling Value  Frequency
1 = YES             0.411                41
2 = NO              -0.671               91
3 = Not applicable  2.329                19
Table 4-10b. Category and Final Scaling Values of Predictor B08B (# Minutes with Smoker at Work) from
CATREG Analysis of Region 5 Study Example Model
Category Value   Final Scaling Value  Frequency
0                -0.27                121
10               -0.26                1
20               -0.25                1
60               -0.211               4
120              -0.153               4
180              -0.094               1
300              0.023                2
420              0.14                 1
600              0.315                2
960              0.667                1
1800             1.486                2
2100             1.779                1
2400             2.071                3
2700             2.364                2
3600             3.242                1
10080            9.563                1
Extra Category*  1.582                3
* Extra Category represents cases with a "No Response" outcome. It is considered a nominal category within the
numeric variable.
Table 4-10c. Category and Final Scaling Values of Predictor GEO (What state do you live in?) from
CATREG Analysis of Region 5 Study Example Model
Category Value  Final Scaling Value  Frequency
1 = MI          -1.163               33
2 = IL          1.247                41
3 = WI          -1.667               12
4 = MN          -0.311               24
5 = IN          0.117                14
6 = OH          0.501                27
[Plot not reproduced. X-axis: Category Value.]
Figure 4-5. Plot of Category and Final Scaling Values of Arsenic Concentration in Personal Air (ng/m3)
from CATREG Analysis of the Region 5 Study Example Model
[Plot not reproduced. X-axis: Category Value.]
Figure 4-6a. Plot of Category and Final Scaling Values (Table 4-10a) for Predictor B06A (Use Tobacco
Products?) from CATREG Analysis of Region 5 Study Example Model
[Plot not reproduced. X-axis: Category Value. Legend: # Minutes; No Response.]
Figure 4-6b. Plot of Category and Final Scaling Values (Table 4-10b) for Predictor B08A (# Minutes with
Smoker at Work) from CATREG Analysis of Region 5 Study Example Model ("No Response"
[Extra Category in Table 4-10b] was assigned an X-value = 12000 for plotting purposes only.)

-------

[Plot not reproduced. X-axis: Category Value; Y-axis: Final Scaling Value.]
Figure 4-6c. Plot of Category and Final Scaling Values (Table 4-10c) for Predictor GEO (What state do you
live in?) from CATREG Analysis of Region 5 Study Example Model
Interpreting Categorical Regression Coefficients
Table 4-11 shows the regression coefficient table from the CATREG analysis for the example
model. Note that the degrees of freedom differ depending on the type of independent variable.
The degrees of freedom for a continuous variable is 1. The degrees of freedom for a nominal or
ordinal variable is c-1, where c is the number of categories with response outcomes for the
variable in this analysis. The coefficients in Table 4-11 represent the relationship between the
final scaled values of both the dependent and independent variables. In this objective,
categorical regression is used only to obtain the transformed values of the variables; thus the
tests of coefficient significance were not considered. Care should be used in interpreting these
coefficients because they represent coefficients for the transformed response outcomes of the
independent variable which have been adjusted to form the best linear relationship with the
transformed dependent variable. The transformation of the independent variable may create a
scale that is not monotonic, that is, one that does not consistently increase with increases in the
original values, as in Figures 4-6a and 4-6c. The coefficients are shown here only as an example
of the output available from the CATREG analysis. Interpreting the coefficients would be difficult
without the details of the transformations on each variable, as in Tables 4-9 and 4-10a. Thus the results
of this objective focus only on identifying the important predictors and do not include traditional
coefficient information as part of the results.
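The caution about non-monotonic scales can be checked directly against the final scaling values. The following is a minimal Python sketch, not part of the SPSS CATREG output; the function name `is_monotonic` is illustrative, and the lists are the final scaling values from Tables 4-10a, 4-10b (numeric categories only), and 4-10c, listed in category-value order.

```python
def is_monotonic(scaling_values):
    """Return True if the values never decrease, or never increase, in category order."""
    pairs = list(zip(scaling_values, scaling_values[1:]))
    non_decreasing = all(a <= b for a, b in pairs)
    non_increasing = all(a >= b for a, b in pairs)
    return non_decreasing or non_increasing

# Final scaling values in category-value order.
b06_tobacco = [0.411, -0.671, 2.329]                        # Table 4-10a
geo_state = [-1.163, 1.247, -1.667, -0.311, 0.117, 0.501]   # Table 4-10c
b08b_minutes = [-0.27, -0.26, -0.25, -0.211, -0.153, -0.094, 0.023, 0.14,
                0.315, 0.667, 1.486, 1.779, 2.071, 2.364, 3.242, 9.563]  # Table 4-10b

for name, values in [("B06", b06_tobacco), ("GEO", geo_state), ("B08B", b08b_minutes)]:
    print(name, "monotonic" if is_monotonic(values) else "not monotonic")
```

Under this check the two nominal predictors come out non-monotonic while the numeric predictor keeps its ordering, which is why a single coefficient for a nominal predictor cannot be read as a change per unit of the original coding.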

-------
Table 4-11. Partial Table of Regression Coefficients from CATREG Analysis of the Region 5 Study Example
Model
                                                       Standardized Coefficients
Variable   Description                                 Beta      Std. Error   df   sig.
AC         WAS AIR COND. ON DURING SAMPLING PERIOD     0.307     0.164        2    0.047
B06A       USE TOBACCO PRODUCTS? (a)                   0.431     0.268        2    0.097
B08B       # MINUTES WITH SMOKER AT WORK (b)           0.162     0.117        1    0.180
B17A       STUDENT OUTSIDE OF HOME?                    0.517     0.229        1    0.034
B17B       CLASS HOURS/WK IN PAST MONTH                -0.391    0.159        1    0.022
B18A       HOURS/WK CHILD AWAY FROM HOME               -0.282    0.117        1    0.024
CONC020    INDOOR AIR CONCENTRATION                    -0.724    0.133        1    0.000
FMTXPOSR   AT WORK- EXPOSURE TO METALS THRU FUMES      0.270     0.142        3    0.028
SCHLRJD    SCHOOL/DAYCARE OUTSIDE HOME-PARTICIPANT     1.010     0.131        1    0.000
STATE      WHAT STATE DO YOU LIVE IN (c)               -0.321    0.148        5    0.004
(a) See Table 4-10a and Figure 4-6a.
(b) See Table 4-10b and Figure 4-6b.
(c) See Table 4-10c and Figure 4-6c.
Stepwise Regression
The categorical regression algorithm produces the new values for the response outcomes, using
relationships between all the variables. As such, collinearity among the independent variables
affects the scaling, coefficients, and significance testing in the regression analysis. The SPSS
CATREG procedure does not include an option for performing a stepwise regression analysis to
help identify and thus exclude collinear independent variables, and manually implementing a
stepwise process was considered too resource-intensive for the number of variables and models
to be analyzed. Instead, the selected approach builds on the transformation of the dependent and
independent variables from CATREG. Using the transformed values, as defined by the final
scaling values, a stepwise regression analysis is performed with all the independent variables
from the CATREG analysis, using a PIN value (nominal probability for a variable's inclusion in
the model) = 0.05 and a POUT value (nominal probability for a variable's removal from the
model) = 0.10. Because this combined regression process is ad hoc, the precise standard for
inclusion and removal corresponding to these PIN and POUT values may not be equivalent to
the standard corresponding to the PIN and POUT values in a more common application of
stepwise regression. However, this is a reasonable and practical method for selecting predictors
and there is no a priori reason to use other PIN or POUT values. The independent variables
identified in the final step of the stepwise regression are considered the important predictors for
the model.

-------
Cross-validation of Regression Analyses
To enhance the quality of the process for selecting the predictors for a model, many model
building efforts split the available data into training and testing subsets to measure the reliability
of the predictive model. The sample sizes available from the two NHEXAS studies are not,
however, adequate for such splits. An approach from data mining, ten-fold cross-validation
(Breiman 1984), is implemented as a validation process as follows. The cases from a model's
data set are randomly assigned without replacement to 10 mutually exclusive subsets. Ten data
sets, or partitions, are created by excluding a different subset from the full model data set for
each partition. The categorical regression procedure followed by stepwise regression on the
transformed values is performed on each of the ten partitions. The number of times each
independent variable appears in the final model of the stepwise regression analysis is then
tallied. Variables that appear in the ten runs at least six times (6-partition level) and at least
nine times (9-partition level) are carried forward to the next part of the analysis. These
scenarios were chosen to select predictors. The 9-partition level recognizes almost universally
important predictors across the ten partitions. The 6-partition level uses a less conservative
selection process by identifying predictors that appear in more than half the partitions. Table 4-
12 shows the predictors appearing at these two levels for the example model.
Table 4-12. Predictors Selected from the 6-partition and 9-partition Scenarios for the Region 5 Study
Example Model
Variable   Description                               6-Partition   9-Partition
ATA27R     AV. MIN. PERFORMED VIGOROUS EXERCISE      *
B18A       HOURS/WK CHILD AWAY FROM HOME             *
B19A       PAST 6 MOS. COMMUTE BY CAR/TRUCK/VAN?     *             *
B43C       # MOSTLY OUTDOOR HOUSE PETS?              *
CONC020    INDOOR AIR CONCENTRATION                  *             *
FMTXPOSR   AT WORK-EXPOSURE TO METALS THRU FUMES     *
* Predictor was selected for the scenario.
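The partitioning and tallying described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the random seed, and the example selection results are hypothetical, and the CATREG and stepwise regression runs on each partition are not reproduced here.

```python
import random
from collections import Counter

def ten_fold_partitions(case_ids, n_folds=10, seed=0):
    """Randomly assign cases to n_folds disjoint subsets, then build one
    partition per subset by leaving that subset out of the full data set."""
    shuffled = list(case_ids)
    random.Random(seed).shuffle(shuffled)
    subsets = [set(shuffled[i::n_folds]) for i in range(n_folds)]
    return [[c for c in shuffled if c not in held_out] for held_out in subsets]

def tally_selected(selected_per_partition, levels=(6, 9)):
    """Count how often each variable was selected and keep those that reach
    each partition level (at least 6 runs, at least 9 runs)."""
    counts = Counter(v for selected in selected_per_partition for v in set(selected))
    return {level: sorted(v for v, c in counts.items() if c >= level)
            for level in levels}

partitions = ten_fold_partitions(range(151))   # 151 cases, as in the example tables

# Hypothetical stepwise results for the ten partitions (not the report's actual runs).
runs = ([["CONC020", "B19A", "B18A"]] * 6
        + [["CONC020", "B19A"]] * 3
        + [["CONC020"]])
print(tally_selected(runs))
# {6: ['B18A', 'B19A', 'CONC020'], 9: ['B19A', 'CONC020']}
```

Each partition retains about 90 percent of the cases, so every case contributes to nine of the ten runs.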
Final Selected Predictors
An additional iteration of the combined categorical regression and stepwise regression analysis
is performed for variables identified in the two scenarios. This provides two final sets of selected
predictors for the model. Starting with the untransformed data set, the set of independent
variables identified in one of the scenarios, e.g., the variables in the 6-partition column of Table
4-12, is used in the CATREG procedure to obtain a new set of transformed values. At this point,
many of the non-significant predictors in the regression analysis have been culled out and
instances of collinearity have been minimized. This gives CATREG the opportunity to estimate
the transformations of these variables with fewer extraneous variables. The transformed values are
then used in a stepwise regression analysis as described above. The final sets of selected
predictors are presented for both scenarios in Table 4-13.

-------
Table 4-13. Selected Predictors and Analysis Criteria for the 6-partition and 9-partition Regression
Scenarios of the Region 5 Study Example Model
                                                       Significance Level
Description                               Variable     6-partition   9-partition
AV. MIN. PERFORMED VIGOROUS EXERCISE      ATA27R       ***           NA
HOURS/WK CHILD AWAY FROM HOME             B18A         ***           NA
PAST 6 MOS. COMMUTE BY CAR/TRUCK/VAN?     B19A         ***
# MOSTLY OUTDOOR HOUSE PETS?              B43C         *             NA
INDOOR AIR CONCENTRATION                  CONC020      ...           ...
AT WORK- EXPOSURE TO METALS THRU FUMES    FMTXPOSR     ***           NA
Analysis Criteria
ADJUSTED R-SQUARED                                     0.5           0.35
MALLOWS' PREDICTION CRITERION                          7             3
NA   The variable was not significant in 9 or more partitions.
NS   > 0.05
*    (0.01, 0.05]
**   (0.001, 0.01]
***  < 0.001
Evaluating the Results
As noted earlier, coefficients from the stepwise regression analyses are not included for these
analyses in Appendices G and H because they represent coefficients based on the transformed
values of the dependent and independent variables and would not be directly interpretable
without details of the transformation information. To assess how well the selected set of
predictors explains the variability in a model's dependent variable, two criteria are provided. The
adjusted R2 and Mallows' Cp are included in Table 4-13 for the two scenarios of the example
model.
Each criterion adjusts for the number of predictors included in the model as a way to allow for
cross-model comparisons. R2, or the coefficient of multiple determination, is based on the
square of the multiple correlation coefficient, and is a measure of the proportionate reduction of
total variation in the dependent variable associated with the set of independent variables. The
adjusted R2 takes into account the number of cases and predictors included in the model (Neter
1996). Unlike R2, the adjusted R2 may actually become smaller when another independent
variable is introduced into the model. In Table 4-13, the adjusted R2 values of .499 and .351
show a good fit for the two models. Mallows' Cp can compare the goodness of fit between
alternative models for the same data set (Mallows 1973). The closer Cp is to p+1, the number of
selected predictors (p) plus the constant, the better the fit of the model is considered to be.
Mallows' Cp can be used to compare the models from the 6- and 9-partition scenarios. In Table
4-13, the Mallows' Cp values of 7 and 3 are equal to the number of selected predictors including
the constant, so both models are considered good fits.
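For readers who want to reproduce the two criteria, the sketch below uses the standard textbook formulas for the adjusted R2 and Mallows' Cp. It is an illustration in Python under stated assumptions, not the SPSS computation: the function and argument names are hypothetical, and the numbers in the example calls are made up rather than taken from the NHEXAS data.

```python
def adjusted_r2(r2, n_cases, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
    for k predictors plus a constant fitted to n cases."""
    return 1.0 - (1.0 - r2) * (n_cases - 1) / (n_cases - n_predictors - 1)

def mallows_cp(sse_subset, mse_full, n_cases, n_predictors):
    """Cp = SSE_subset / MSE_full - (n - 2p), where p counts the selected
    predictors plus the constant and MSE_full comes from the full model."""
    p = n_predictors + 1
    return sse_subset / mse_full - (n_cases - 2 * p)

# Made-up inputs, for illustration only.
print(adjusted_r2(r2=0.5, n_cases=101, n_predictors=5))
print(mallows_cp(sse_subset=90.0, mse_full=2.0, n_cases=50, n_predictors=4))  # 5.0
```

In the second call the subset is the full model itself (SSE = 90 on 45 residual degrees of freedom gives MSE = 2), so Cp equals the number of predictors plus the constant, which is the benchmark the text describes.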
A large number of significance tests are performed through all the steps of this combined
approach, which could create spurious results. The cross-validation process helps mitigate some
of the potential by-chance occurrences of significance. Given some of the limitations previously
discussed and the number of tests performed, the results presented in Section 5, and in
Appendices G and H should be viewed as indicators of existing relationships based on the
statistical analyses. To allow the reader to make decisions about a predictor's significance
consistent with their objectives, the relative strength of the predictor's relationship with the
dependent variable is expressed in Appendices G and H as ranges of significance levels.
An additional note for consideration in reviewing the results is the impact of excluding variables
with low variability from the analysis as described in steps 2 and 4 below. Excluding cases with
no dependent measurement value can affect the variability level of questionnaire variables that
were previously evaluated and deemed acceptable for use. For example, in the Region 5 Study
Example Model, 19 of the 85 independent variables that could be included in the analysis were
deleted because of low variability. It is not possible to list all variables excluded for each model
in this report. This exclusion also extends to the partitions in step 4 below because another 10
percent of the cases are deleted.
In summary, the process using the SPSS CATREG procedure and stepwise regression analysis
consists of the following steps:
1.	Adjust response outcomes for all variables in the model to non-negative values.
2.	Review predictors for variability level and exclude any predictor for which one
category has more than 95 percent of the outcomes or for which there is no
variability.
3.	Define ten partitions of the model's data set for the ten-fold cross-validation process.
4.	Review each partition for any additional variables requiring exclusion because of
no/low variability issues.
5.	Run the CATREG procedure on each of the ten partitions.
6.	For each of the ten partitions, use the transformed data values from the CATREG
procedure as input to a stepwise regression analysis.
7.	Identify the significant predictors at the 6-partition and 9-partition levels from the ten
stepwise regression runs.
8.	Using the original untransformed data from step 1, run the CATREG procedure with
the variables identified in the 6- and 9-partition scenarios separately.
9.	For each scenario, use the transformed data from step 8 as input to a stepwise
regression analysis.
10.	Identify the selected predictors from each scenario.
The results of the analyses for this objective are found in Appendices G and H. The results in
each appendix are organized by chemical class, chemical within the class, and model for the
chemical. Results across the three objectives are presented and discussed in Section 5.
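The variability screen in steps 2 and 4 of the process above reduces to a simple frequency check. The Python sketch below is illustrative only: the function name is hypothetical, and treating missing responses (`None`) as excluded from the denominator is an assumption rather than something the report specifies.

```python
from collections import Counter

def has_low_variability(responses, max_share=0.95):
    """True if one category holds more than max_share of the observed outcomes.
    A predictor with a single category (no variability) is also flagged."""
    observed = [r for r in responses if r is not None]   # assumption: skip missing
    if not observed:
        return True
    top_count = Counter(observed).most_common(1)[0][1]
    return top_count / len(observed) > max_share

print(has_low_variability(["NO"] * 96 + ["YES"] * 4))   # True: 96 percent in one category
print(has_low_variability(["NO"] * 95 + ["YES"] * 5))   # False: exactly 95 percent
```

Because each partition drops another 10 percent of the cases, the same check has to be repeated per partition; a predictor that passes on the full data set can fail on a partition.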
4.5.4 Objective 2—Classifying Subjects by Exposure
This objective looks at the relationship between a dependent measurement and independent
variables from a different perspective in that it attempts to identify a model of predictors and
their interactions that optimally classify participants by their exposure level. In this objective,
the term exposure level is used generically to describe the dependent variable of the model
whether it is a concentration in air or a personal measurement such as blood. A classification
map or scheme defines groups of the sampled population that have different levels of the
dependent variable and provides characteristics of those groups in terms of the predictors'
outcomes. The techniques used in this approach come from the field of data mining and are used
considerably, though not exclusively, in consumer preference and health studies (Magidson
1993, Weitlisbath 1999, The Measurement Group Website). The mapping from this technique
is called a tree, and is similar to a decision tree diagram (Two Crows 1999). Breiman (1984)
describes such mappings as classification trees or regression trees, depending on whether the
dependent variable is categorical or continuous. Figure 4-7 shows an example of a tree
produced by the Exhaustive CHAID algorithm in SPSS Answer Tree for Arsenic Concentration
in the Region 5 Study example model. The mapping partitions the data into subgroups through
an iterative process. Potential predictors are considered for each subgroup of the data,
independent of the other subgroups. For example, in Figure 4-7, the boxes identified as Nodes 1,
2, 3, and 4 are evaluated individually to determine the next predictor for each of those nodes.
The goal of the classification technique is to define the best set of rules for identifying the class
to which a case belongs (Breiman 1984).

-------
[Tree diagram not reproduced. Root node (node 0): Arsenic Personal Air Concentration; first-level
predictor: Arsenic Indoor Air Concentration.]
Figure 4-7. Example of E-CHAID Tree for Region 5 Study Example Model
-------
The two techniques used in this approach are CHAID (Chi-square Automatic Interaction
Detector) and CART (Classification and Regression Tree). The analyses were implemented
using SPSS Answer Tree version 3.1 (SPSS 2001). Both CHAID and CART accept categorical
and continuous dependent variables and predictors. CART produces a tree with only binary
splits of predictors; CHAID can produce trees with multi-way splits of a predictor. Of the two
CHAID procedures available in SPSS Answer Tree, Exhaustive CHAID (E-CHAID) is used
because its algorithm finds more optimal splits for the predictors. An overview of CHAID,
E-CHAID, and CART can be found in a white paper, Answer Tree Algorithm Summary, on the
SPSS website (SPSS 1999a). The following sections provide an overview of the classification
process and of its implementation in this study. The term predictor is used in this section to be
consistent with the terminology used in SPSS Answer Tree.
Classification Overview
Figure 4-7 will be used to illustrate the steps of the classification process. Details of these steps
are as follows:
1.	Starting with the full data set, that is, node 0 or level 0 at the top of the tree, consider
each predictor in turn, and evaluate the relationship between the predictor and
dependent variable.
2.	Identify the predictor which splits the data into discrete groups and maximizes the
"distance" between the groups where the distance is based on deviations of the
dependent variable values within a node from the node's mean. Create nodes on
level 1 of the tree (the row below node 0) defined by each predictor category, or
group of categories that define the distinct groups. In Figure 4-7, the first predictor is
Arsenic in Indoor Air concentration and four nodes were created based on intervals of
the concentration values.
3.	For each node on level 1, consider again each predictor in turn and evaluate the
relationship between the predictor and dependent variable for that node's cases.
4.	Identify the predictor which splits the data into discrete groups and maximizes the
"distance" between the groups as in step 2. Create a node on level 2 (the third row) of
the tree defined by each predictor category, or groups of categories, that define the
distinct groups. In Figure 4-7, the predictor for splitting node 1 is "Performed
Vigorous Exercise" and two new nodes are created on level 2. The predictor for
splitting node 3 is "Av Daily Hours Outside at Work/School" and two new nodes are
created on level 2.
5.	This process continues until the stopping rules set by the user, and discussed below,
allow no further nodes to be created anywhere in the tree.
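The predictor search in steps 1 through 4 can be illustrated with a simplified stand-in. The Python sketch below picks, for one node, the predictor whose categories give the largest one-way ANOVA F statistic. It is not the E-CHAID algorithm as implemented in SPSS Answer Tree, which also merges categories and compares adjusted p-values; the function names, variable names, and data are all hypothetical.

```python
from collections import defaultdict

def f_statistic(groups):
    """One-way ANOVA F statistic for groups of dependent-variable values."""
    values = [y for g in groups for y in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def best_split(cases, dependent, predictors):
    """Return the predictor whose categories give the largest F statistic."""
    scores = {}
    for name in predictors:
        by_category = defaultdict(list)
        for case in cases:
            by_category[case[name]].append(case[dependent])
        if len(by_category) > 1:      # a single category cannot split the node
            scores[name] = f_statistic(list(by_category.values()))
    return max(scores, key=scores.get)

# Hypothetical cases in one node (not NHEXAS data).
node_cases = [
    {"personal": 0.10, "indoor_level": "low", "sweeps": "yes"},
    {"personal": 0.12, "indoor_level": "low", "sweeps": "no"},
    {"personal": 0.11, "indoor_level": "low", "sweeps": "yes"},
    {"personal": 0.50, "indoor_level": "high", "sweeps": "no"},
    {"personal": 0.52, "indoor_level": "high", "sweeps": "yes"},
    {"personal": 0.49, "indoor_level": "high", "sweeps": "no"},
]
print(best_split(node_cases, "personal", ["indoor_level", "sweeps"]))  # indoor_level
```

Applying `best_split` again to the cases in each child node mirrors the way the tree evaluates every node independently of its siblings.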
A node from which additional nodes are created is called the parent node. The nodes created
from the parent node are called child nodes. The nodes from which no additional child nodes
can be created based on the stopping rules used are called terminal nodes of the tree. These
nodes represent the final subgroups of the population defined by this tree. The group of child
nodes produced from a parent node is called a branch. In Figure 4-7, node 1 is a parent node to
the child nodes 5 and 6, and nodes 5 and 6 form a branch from node 1. Nodes 13, 14, 15, and 16
are some of the terminal nodes on level 3 (fourth row) of the tree.
The type of dependent variable, categorical or continuous, determines the type of statistical test
used for selecting predictors at a given node. The dependent variables in this report are
continuous; thus an F-test, similar to the test for a one-way analysis of variance (ANOVA), is
used. The predictor at level 1 in Figure 4-7 was selected because the F-test comparing the four
groups of Arsenic Indoor Air concentration, as described by nodes 1 through 4, had the lowest
significance level of all potential predictors. To assess a tree's predictive ability for
classification, a Risk Estimate (RE) is calculated. The predicted value for all observations
grouped into a node is the mean of the dependent variable for all cases in that node. For example,
the predicted value of node 1 is the mean, 0.6831, for the 31 cases in the node. The RE for the
tree in Figure 4-7 is the average of the squared deviations between the observed and predicted
(node mean) values based on all the terminal nodes (Breiman 1984). RE(0), the risk estimate of the
root node, is the maximum RE for the data set, and has a value of 0.036 for the tree in Figure 4-
7. Although there are no formal statistical criteria available for making this type of assessment,
REs can be used to evaluate how much of an improvement has occurred by adding or removing a
branch. The RE for the full tree in Figure 4-7 is 0.011. Smaller REs are better when comparing
different trees for the same data set.
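The Risk Estimate definition translates directly into code. The following Python sketch assumes each terminal node is given as a list of dependent-variable values; the function name and the example values are hypothetical and are not taken from Figure 4-7.

```python
def risk_estimate(terminal_nodes):
    """Average squared deviation of each case from its node mean,
    taken over all cases in all terminal nodes."""
    n_cases = sum(len(node) for node in terminal_nodes)
    squared_error = 0.0
    for node in terminal_nodes:
        node_mean = sum(node) / len(node)    # predicted value for every case in the node
        squared_error += sum((y - node_mean) ** 2 for y in node)
    return squared_error / n_cases

root_only = [[1.0, 3.0, 5.0, 7.0]]           # RE(0): every case predicted by the overall mean
two_nodes = [[1.0, 3.0], [5.0, 7.0]]         # the same cases after one split

print(risk_estimate(root_only))   # 5.0
print(risk_estimate(two_nodes))   # 1.0
```

Splitting can only hold the RE constant or lower it, which is why RE(0) is the maximum for a data set and why the drop in RE, rather than the RE itself, is used to judge whether a branch is worth keeping.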
In using these classification techniques, it is important to recognize the advantages and potential
disadvantages with respect to a user's particular analysis needs and interests.
Advantages of classification techniques are as follows:
1.	No distributional assumptions are required on the dependent or predictor variables.
2.	Dependent variables and predictors can be numerical or categorical.
3.	Predictors can include a mix of nominal, ordinal, and numerical variable types.
4.	Missing values in a predictor variable can be handled as a separate and floating
category, that is, the algorithm allows the missing value category to combine with
any other category regardless of the underlying category ordering. This provides a
better alternative to imputing missing values.
5.	Non-linear relationships can be identified.
6.	Major data trends can be identified in cases where a strong theory does not exist to
indicate the usefulness of predictors for a dependent variable.
7.	Depending on the criteria selected, major trends can be identified without
overcapitalizing on chance.
8.	Identifying potential interactions is an automatic and integral part of the process.
9.	Predictors can be forced in at any node.
10.	The process is analogous to a forward stepwise regression so that collinearity of the
predictors is not as much of an issue as in a simultaneous evaluation.
Potential disadvantages of classification techniques are as follows:
1.	The process is analogous to a forward stepwise regression and does not have the
option of eliminating, at each step, those predictors added to the model in an earlier
step that are no longer of benefit to the model when subsequent predictors are
included.
2.	Although software options are available for adjusting p-values based on multiple
comparisons, the algorithms are not set up to correct for the number of potential
predictor variables being considered.
3.	The model defined by the tree should be considered as suggestive of the underlying
relationships. Alternate models may be available that fit the data in a statistically or
theoretically acceptable manner. (The Measurement Group Website)

-------
Growing the Tree
Four aspects of the data were considered for growing a tree: (1) would the dependent variable be
used in a continuous or discrete form, (2) how would categories of continuous predictors be
handled, (3) what would be the impact of using imputed data for measurement predictors, and (4)
what stopping rules would be used. The considerations and adjustments to the approach are
described below.
Form of Dependent Variables
Since the intent of this objective is classification, consideration was given to whether the
analysis should be performed using a continuous dependent variable or using pre-defined
categories of the dependent variable. Test analyses using a categorical dependent variable were
performed for two scenarios: (1) predefined quartile categories and an extra category for values
that were below detection limit, and (2) predefined categories based on actual break points in
the distribution. In the first scenario, E-CHAID was not able to create reasonable distinctions
between the five dependent variable categories. There were no distinct breaks in the data close
to the quartile values, thus a case close to the quartile could be as easily assigned to one category
as to the neighboring category. In the second scenario, break points usually occurred only on the
upper tail of the distribution if at all. This left little room to create reasonable breakpoints for the
rest of the distribution. Predefining categories for classification also limits potential uses. As
techniques for analyzing samples are refined to give more precise measurements, the exposure
levels of interest will change. A more flexible and suitable approach uses the dependent variable
as a continuous measurement, where each terminal node can be described by the mean of the
cases in the node, and a user can identify exposure levels and associated populations by the mean
values.
Predictor Categories
Both CHAID and E-CHAID, as implemented in SPSS Answer Tree, expect grouped categories
of response outcomes for continuous predictors. The software can generate its own groups or the
user can customize the groups. The default option creates up to ten categories of approximately
equal cell sizes. Given the number of cases available for analysis, using more than five child
nodes seemed to be too fine a split. In general, the E-CHAID-generated groups are used for
these analyses. However, if the first predictor in the tree is a measurement variable, and if more
than five child nodes are produced for the predictor, customized categories are used. Contiguous
or neighboring categories can then be merged as needed for any branch to best reflect the
relationships with the dependent variable.
The category groups were customized using the Classification and Regression Tree (CART)
option in SPSS Answer Tree. CART does not use pre-defined categories of response outcomes
for continuous variables, and it allows only binary splits at any node. This option was not
selected for the classification approach because its binary algorithm tends to grow trees with
many levels and the trees do not present results efficiently when a predictor is split on successive
levels (SPSS 2001). Also, because E-CHAID and CART tend to select the same predictors in
the first levels, when E-CHAID produces too many child nodes from a parent node at the first
level (i.e., too many categories for a predictor), CART can be run first to predefine a potentially
more usable set of categories for E-CHAID to use for that predictor. Figure 4-8
shows an example of a CART analysis used to define the categories for Arsenic Concentration in
Indoor Air as the first predictor for Arsenic Concentration in Personal Air. In the initial E-
CHAID analysis for the model associated with Figure 4-7 (not shown), the first predictor level
produced ten nodes. Using the first two levels from the CART analysis (terminal nodes 3, 4, 5,
and 6), the tree produces four nodes for the Arsenic in Indoor Air Concentration. The CART
definitions for the four nodes are then used as the first level nodes for the E-CHAID analysis in
Figure 4-7.
[CART tree diagram not reproduced. Splits are on Arsenic Indoor Air Concentration.]
Figure 4-8. Example of Using CART to Customize Categories for a First-Level Measurement Predictor
Producing More Than Five Nodes
Predictors with Imputed Measurements
In the Region 5 study, more than 50 percent of the outdoor air and soil measurements were
imputed as previously described. The first level predictor is a driving force for the rest of the
tree's development and there was a concern about using an imputed variable at that level. In
situations where one of the imputed measurements was selected as a first level predictor, a
review of other potential predictors was made. Table 4-14 shows the five most significant
predictors for node 0 in Figure 4-7. The Adjusted Probability value tests the difference between
the groups created at the first level of the tree for the predictor, similar to a one-way ANOVA.
The table also shows the Risk Estimate (RE) and Standard Error of the RE for each tree grown
with the predictor at the first level. Although the top predictor, Arsenic Indoor Air
Concentration, did not contain imputed values, this model will be used to illustrate the approach.

-------
Table 4-14. Criteria for Top Five Predictors Available for Splitting Node 0 in the Region 5 Study Example
Model
                                                                  Adjusted      Risk       Standard
Variable   Predictor Description                F-value   df       Probability   Estimate   Error
CONC020    Arsenic Indoor Air Concentration     514.3     9, 156   < 1.0E-09     .00078     .00019
CONC030    Arsenic Outdoor Air Concentration    32.6      2, 163   < 1.0E-09     .00247     .00065
B19F       Past 6 Months, Commute by Walking?   20.8      1, 164   0.00001       .00135     .00039
CONC050    Arsenic Surface Dust Loading         9.6       3, 162   0.00040       .00717     .00025
GEO        What State Do You Live In?           12.2      2, 163   0.00042       .00188     .00288
The selected approach evaluates whether any of the other top predictors produces a tree with a
better Risk Estimate (RE) than the initial predictor chosen for the first level. To determine which
predictor will be used in the first split, a tree is grown forcing each of the other top predictors
with a P-value < 0.001 in at the first level. The RE and the Standard Error (SE) of the Risk
Estimate are reported for each of these trees as in Table 4-14. The average of the SEs is calculated
(0.0009 from Table 4-14) and used to judge differences between the REs. If any of the other
predictors has an RE that is better than the first predictor's by twice the average SE, that
predictor is forced to be the first level predictor. If no predictor meets that criterion (which is the
case in this example), the second highest predictor from the list of top predictors is used. If the
first predictor (the imputed measurement) still appears to be the best first level predictor, it is
used with a customized set of categories.
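The comparison rule can be traced with the Risk Estimate and Standard Error columns of Table 4-14. The Python sketch below is illustrative: the function name is hypothetical, and reading "better by twice the average SE" as an RE lower than the first predictor's RE by at least two average SEs is an assumption.

```python
# (variable, risk estimate, standard error) from Table 4-14; the first row is
# the predictor initially chosen for the first level.
table_4_14 = [
    ("CONC020", 0.00078, 0.00019),
    ("CONC030", 0.00247, 0.00065),
    ("B19F", 0.00135, 0.00039),
    ("CONC050", 0.00717, 0.00025),
    ("GEO", 0.00188, 0.00288),
]

def better_first_predictors(rows):
    """Return the average SE and the predictors whose RE is lower than the
    first predictor's RE by at least twice that average."""
    _, first_re, _ = rows[0]
    average_se = sum(se for _, _, se in rows) / len(rows)
    better = [name for name, re, _ in rows[1:] if first_re - re >= 2 * average_se]
    return average_se, better

average_se, better = better_first_predictors(table_4_14)
print(round(average_se, 4))   # 0.0009
print(better)                 # []
```

The empty list matches the statement in the text that no other predictor meets the criterion in this example.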
Stopping Rules
The classification process allows the user to specify stopping rules for growing the tree that
suit their objective. The stopping rules used in this approach took into account the
levels of available data in these analyses, and the issue of potentially overfitting the data. The
stopping rules used are: (1) a parent node is not split if it has fewer than ten cases, (2) a child node
is not created with fewer than five cases, and (3) the alpha level for selecting a predictor
to split a parent node is 0.05. The alpha level is applied to the adjusted p-value shown immediately
below the parent node of a split. In Figure 4-7, the p-value for the split of Arsenic Indoor Air
Concentration from node 0 is < 0.001. The p-value includes a Bonferroni adjustment (Miller
1981), based on the comparisons made to determine the number of nodes in the split. It does not
include an adjustment for the number of predictors evaluated for the split. A fourth stopping rule
limits the tree to ten levels.
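Taken together, the stopping rules act as a single gate on each proposed split. The Python sketch below is a hypothetical helper, not an SPSS Answer Tree function; the argument names are illustrative, and accepting a split when the adjusted p-value equals alpha exactly is an assumption.

```python
def may_split(parent_size, child_sizes, adjusted_p, parent_level,
              min_parent=10, min_child=5, alpha=0.05, max_levels=10):
    """Return True if a proposed split passes all of the stopping rules."""
    return (parent_size >= min_parent                          # parent has at least ten cases
            and all(size >= min_child for size in child_sizes) # each child has at least five
            and adjusted_p <= alpha                            # Bonferroni-adjusted p-value
            and parent_level < max_levels)                     # children stay within ten levels

print(may_split(31, [20, 11], adjusted_p=0.003, parent_level=1))   # True
print(may_split(9, [5, 4], adjusted_p=0.003, parent_level=1))      # False: parent too small
print(may_split(31, [27, 4], adjusted_p=0.003, parent_level=1))    # False: child too small
```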
Refining the Tree
The process for defining a model's final tree includes rules similar to the process for a backward
elimination regression. Given the stopping rules described above and the available sample sizes,
most initial trees do not contain more than six levels. The tree is allowed to grow more levels
then needed so that a pruning and review process can take place. The tree is pruned one level at
a time, that is, all branches for a level are eliminated by considering the percent change that has
occurred in the Risk Estimate because of the pruning step. This step reviews how much the RE is
improved by including the last level in the tree. The percent change in the RE is defined as
[RE_{v-1} - RE_v]/RE_0, where RE_0 is the risk estimate at the root node (node 0) and RE_v is the risk estimate
for level v. The level is slated for pruning if there is less than a ten percent change in the RE
when the level is excluded. Ten percent was used as the criterion after reviewing several
analyses and the levels of change occurring in the RE for these data sets. The branches from the
level to be pruned are then similarly reviewed. In the example data, none met the criterion to
remain on the tree. Table 4-15 shows the percent change in the risk estimates used to prune the
tree in Figure 4-7. The 4.4 percent change represents the reduction in the Risk Estimate,
as a percent of the root node RE, gained by keeping level 4 in the tree. This percent was below the
criterion of 10 percent; thus level 4 was pruned.
Table 4-15. Percent Change in Risk Estimate at Levels of Tree in Figure 4-7

Tree Level   Risk Estimate   Standard Error of RE   Percent Change
L4           0.01132         0.00143                NA
L3           0.01301         0.00178                4.4
L2           0.01879         0.00259                15.2
L1           0.02355         0.00282                12.5
L0           0.03805         0.00398                38.1
Figure 4-9 shows the pruned tree for Figure 4-7. The RE for the tree in Figure 4-7 is .011; the
RE for the tree in Figure 4-9 is .013. In this case, the pruning did not change the RE by very
much.
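The percent changes in Table 4-15 can be reproduced from the risk estimates alone. The Python sketch below uses the RE values from Table 4-15; the function names are hypothetical, and the level-by-level loop is a simplification that omits the branch-by-branch review described above.

```python
# Risk estimates by tree level, copied from Table 4-15.
RISK = {0: 0.03805, 1: 0.02355, 2: 0.01879, 3: 0.01301, 4: 0.01132}


def percent_change(risk, level):
    """Percent of the root RE gained by adding the level below `level`.

    Implements 100 * (RE_level - RE_{level+1}) / RE_0.
    """
    return 100.0 * (risk[level] - risk[level + 1]) / risk[0]


def deepest_level_kept(risk, threshold=10.0):
    """Prune upward from the deepest level while the gain is below the threshold."""
    deepest = max(risk)
    while deepest > 0 and percent_change(risk, deepest - 1) < threshold:
        deepest -= 1
    return deepest


for level in (3, 2, 1, 0):
    print(f"L{level}: {percent_change(RISK, level):.1f}")
print("deepest level kept:", deepest_level_kept(RISK))
```

With the ten percent criterion, only level 4 falls below the threshold, which matches the pruned tree in Figure 4-9.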
Characteristics of the Classification
Once the pruning is complete, summary statistics for the terminal nodes in the tree are prepared.
Table 4-16 shows the summary statistics and defining characteristics for the terminal nodes of
the tree in Figure 4-9. In many instances, the Box-Cox transformation used on the dependent
variable is a reciprocal function, and high exposure levels have smaller transformed values. The
nodes in Table 4-16 are sorted in order of the untransformed means to give a view more
consistent with the original values. The transformed means are included for cross-reference to
the tree. In Table 4-16 the node with the highest untransformed mean value is node 11 (the last
node in the table). The untransformed mean for the node is 6.92; the transformed mean
appearing in the tree is 0.18. Cases belonging to this node have the following characteristics:
"Arsenic Indoor Air Concentration" is not Missing and is > 0.9803, and
The questionnaire variable "Floors in the Building" is not Missing and has a value <= 1 or
"No Response."
The distribution of the seven cases in node 11 seems to be reasonably close-knit. Each node in
the tree includes a graph showing a relative distribution of the response outcomes in that node
and thus indicates how good the classification in the node is. Less spread within a node would
represent better classification. The coefficient of variation may also be useful in evaluating a
node's ability to classify.
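As an illustration of the coefficient of variation mentioned above, the Python sketch below divides the standard deviation by the mean for three terminal nodes, using the original-scale values reported in Table 4-16. Whether the coefficient should be computed on the original or the transformed scale is not stated here, so the choice of scale is an assumption, and the function name is hypothetical.

```python
def coefficient_of_variation(mean, st_dev):
    """Standard deviation expressed as a fraction of the mean."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return st_dev / mean


# (mean, st. dev.) on the original scale (ng/m3), from Table 4-16.
nodes = {11: (6.92, 4.43), 16: (1.08, 1.47), 14: (0.41, 0.15)}

for node, (mean, st_dev) in nodes.items():
    print(f"node {node}: CV = {coefficient_of_variation(mean, st_dev):.2f}")
```

A smaller value indicates less relative spread within the node; a value above 1, as for node 16, indicates that the spread exceeds the mean.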
[Figure 4-9, first page: the tree diagram is not recoverable from the source text. Root node (node 0): Arsenic Personal Air Conc., transformed mean 0.5425, split at the first level on Arsenic Indoor Air Conc. Lower-level splits shown on this page: Performed Vigorous Exercise; Past Week Used Portable/Ceiling Fan; Sex of Participant; At-Jobs No Contact with Dust?; Arsenic Surface Dust Load. Terminal node statistics are listed in Table 4-16.]
Figure 4-9. Final Tree for Region 5 Study Example Model, the Result of Pruning the Tree in Figure 4-7.
[Figure 4-9, continued: the tree diagram is not recoverable from the source text. Lower-level splits shown on this page, below the first-level split on Arsenic Indoor Air Conc.: Av. Daily Hours Outside at Work/School; Floors in Building; Av. Daily Hours Inside Elsewhere; School/Daycare Outside Home-Participant; Av. Min. Sat/Lay on Carpet/Rugs. Terminal node statistics are listed in Table 4-16.]
Figure 4-9, continued. Final Tree for Region 5 Study Example Model, the Result of Pruning the Tree in Figure 4-7.
Table 4-16. Summary Statistics and Defining Characteristics of Terminal Nodes from Tree in Figure 4-9

             Transformed Mean [1]  Original Scale (ng/m3)
Node   N     (shown in tree)       Mean [1]  St. Dev.   Median   Min     Max
(The rule for classifying the subpopulation in each node follows the node's statistics.)

15     17    0.77                  0.34      0.23       0.42     0.00    0.67
    Rule: IF (ARSENIC INDOOR AIR CONC. NOT MISSING AND (ARSENIC INDOOR AIR CONC. > 0.239 AND
    ARSENIC INDOOR AIR CONC. <= 0.5339)) AND (SEX OF PARTICIPANT = "2=FEMALE") AND
    (AT-JOBS NO CONTACT WITH DUST? = "3=NOT APPLICABLE" OR AT-JOBS NO CONTACT WITH DUST? = "1=YES")

6      19    0.74                  0.39      0.25       0.36     0.00    1.12
    Rule: IF (ARSENIC INDOOR AIR CONC. IS MISSING OR (ARSENIC INDOOR AIR CONC. <= 0.239)) AND
    (PERFORMED VIGOROUS EXERCISE != "1=YES")

14     5     0.72                  0.41      0.15       0.35     0.26    0.61
    Rule: IF (ARSENIC INDOOR AIR CONC. IS MISSING OR (ARSENIC INDOOR AIR CONC. <= 0.239)) AND
    (PERFORMED VIGOROUS EXERCISE = "1=YES") AND (PAST WEEK USED PORTABLE/CEILING FAN = "2=NO")

17     6     0.59                  0.71      0.25       0.72     0.44    1.02
    Rule: IF (ARSENIC INDOOR AIR CONC. NOT MISSING AND (ARSENIC INDOOR AIR CONC. > 0.239 AND
    ARSENIC INDOOR AIR CONC. <= 0.5339)) AND (SEX OF PARTICIPANT != "2=FEMALE") AND
    (ARSENIC SURFACE DUST LOAD. IS MISSING OR (ARSENIC SURFACE DUST LOAD. <= 0.16))

21     23    0.59                  0.73      0.25       0.69     0.40    1.28
    Rule: IF (ARSENIC INDOOR AIR CONC. NOT MISSING AND (ARSENIC INDOOR AIR CONC. > 0.5339 AND
    ARSENIC INDOOR AIR CONC. <= 0.9803)) AND (AV. DAILY HOURS OUTSIDE AT WORK/SCHOOL IS MISSING OR
    (AV. DAILY HOURS OUTSIDE AT WORK/SCHOOL <= 0.07)) AND (AV. DAILY HOURS INSIDE ELSEWHERE IS
    MISSING OR (AV. DAILY HOURS INSIDE ELSEWHERE <= 2))

20     6     0.60                  0.74      0.45       0.60     0.34    1.61
    Rule: IF (ARSENIC INDOOR AIR CONC. NOT MISSING AND (ARSENIC INDOOR AIR CONC. > 0.239 AND
    ARSENIC INDOOR AIR CONC. <= 0.5339)) AND (SEX OF PARTICIPANT != "2=FEMALE") AND
    (ARSENIC SURFACE DUST LOAD. NOT MISSING AND (ARSENIC SURFACE DUST LOAD. > 0.6767))

16     19    0.59                  1.08      1.47       0.76     0.28    6.39
    Rule: IF (ARSENIC INDOOR AIR CONC. NOT MISSING AND (ARSENIC INDOOR AIR CONC. > 0.239 AND
    ARSENIC INDOOR AIR CONC. <= 0.5339)) AND (SEX OF PARTICIPANT = "2=FEMALE") AND
    (AT-JOBS NO CONTACT WITH DUST? != "3=NOT APPLICABLE" AND AT-JOBS NO CONTACT WITH DUST? != "1=YES")

18     5     0.48                  1.12      0.35       1.22     0.57    1.44
    Rule: IF (ARSENIC INDOOR AIR CONC. NOT MISSING AND (ARSENIC INDOOR AIR CONC. > 0.239 AND
    ARSENIC INDOOR AIR CONC. <= 0.5339)) AND (SEX OF PARTICIPANT != "2=FEMALE") AND
    (ARSENIC SURFACE DUST LOAD. NOT MISSING AND (ARSENIC SURFACE DUST LOAD. > 0.16 AND
    ARSENIC SURFACE DUST LOAD. <= 0.3633))

13     7     0.51                  1.14      0.81       0.91     0.29    2.81
    Rule: IF (ARSENIC INDOOR AIR CONC. IS MISSING OR (ARSENIC INDOOR AIR CONC. <= 0.239)) AND
    (PERFORMED VIGOROUS EXERCISE = "1=YES") AND (PAST WEEK USED PORTABLE/CEILING FAN != "2=NO")

23     11    0.49                  1.16      0.53       0.98     0.57    2.01
    Rule: IF (ARSENIC INDOOR AIR CONC. NOT MISSING AND (ARSENIC INDOOR AIR CONC. > 0.5339 AND
    ARSENIC INDOOR AIR CONC. <= 0.9803)) AND (AV. DAILY HOURS OUTSIDE AT WORK/SCHOOL NOT MISSING AND
    (AV. DAILY HOURS OUTSIDE AT WORK/SCHOOL > 0.07)) AND
    (SCHOOL/DAYCARE OUTSIDE HOME-PARTICIPANT != "1=YES")

22     8     0.44                  1.34      0.54       1.24     0.79    2.33
    Rule: IF (ARSENIC INDOOR AIR CONC. NOT MISSING AND (ARSENIC INDOOR AIR CONC. > 0.5339 AND
    ARSENIC INDOOR AIR CONC. <= 0.9803)) AND (AV. DAILY HOURS OUTSIDE AT WORK/SCHOOL IS MISSING OR
    (AV. DAILY HOURS OUTSIDE AT WORK/SCHOOL <= 0.07)) AND (AV. DAILY HOURS INSIDE ELSEWHERE NOT
    MISSING AND (AV. DAILY HOURS INSIDE ELSEWHERE > 2))

25     16    0.41                  1.51      0.57       1.40     0.79    3.21
    Rule: IF (ARSENIC INDOOR AIR CONC. NOT MISSING AND (ARSENIC INDOOR AIR CONC. > 0.9803)) AND
    ((FLOORS IN BUILDING IS MISSING OR FLOORS IN BUILDING = "-1=NO RESPONSE") OR
    (FLOORS IN BUILDING > 1)) AND (AV. MIN. SAT/LAY ON CARPET/RUGS IS MISSING OR
    (AV. MIN. SAT/LAY ON CARPET/RUGS <= 17.26))

24     6     0.38                  1.79      0.86       1.46     1.25    3.52
    Rule: IF (ARSENIC INDOOR AIR CONC. NOT MISSING AND (ARSENIC INDOOR AIR CONC. > 0.5339 AND
    ARSENIC INDOOR AIR CONC. <= 0.9803)) AND (AV. DAILY HOURS OUTSIDE AT WORK/SCHOOL NOT MISSING AND
    (AV. DAILY HOURS OUTSIDE AT WORK/SCHOOL > 0.07)) AND
    (SCHOOL/DAYCARE OUTSIDE HOME-PARTICIPANT = "1=YES")

19     5     0.32                  2.30      0.93       1.65     1.56    3.34
    Rule: IF (ARSENIC INDOOR AIR CONC. NOT MISSING AND (ARSENIC INDOOR AIR CONC. > 0.239 AND
    ARSENIC INDOOR AIR CONC. <= 0.5339)) AND (SEX OF PARTICIPANT != "2=FEMALE") AND
    (ARSENIC SURFACE DUST LOAD. NOT MISSING AND (ARSENIC SURFACE DUST LOAD. > 0.3633 AND
    ARSENIC SURFACE DUST LOAD. <= 0.6767))

26     9     0.27                  3.50      2.34       2.90     1.05    8.90
    Rule: IF (ARSENIC INDOOR AIR CONC. NOT MISSING AND (ARSENIC INDOOR AIR CONC. > 0.9803)) AND
    ((FLOORS IN BUILDING IS MISSING OR FLOORS IN BUILDING = "-1=NO RESPONSE") OR
    (FLOORS IN BUILDING > 1)) AND (AV. MIN. SAT/LAY ON CARPET/RUGS NOT MISSING AND
    (AV. MIN. SAT/LAY ON CARPET/RUGS > 17.26))

11     7     0.18                  6.92      4.43       5.33     1.19    13.75
    Rule: IF (ARSENIC INDOOR AIR CONC. NOT MISSING AND (ARSENIC INDOOR AIR CONC. > 0.9803)) AND
    ((FLOORS IN BUILDING NOT MISSING AND FLOORS IN BUILDING != "-1=NO RESPONSE") AND
    (FLOORS IN BUILDING <= 1))

[1] Table is sorted by Mean in Original Scale. Because the transformation is an inverse form (1/(y+1)), the mean of the transformed values does not always appear
sorted in the Transformed Mean column, e.g., Nodes 21, 20, and 16.
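The footnote's point, that sorting by the original-scale mean does not guarantee a sorted transformed mean, follows from the transformation 1/(y+1) being nonlinear. The Python sketch below shows this with two made-up groups; the data values are hypothetical and are not taken from the study.

```python
def transform(y):
    """Inverse-form transformation named in the Table 4-16 footnote."""
    return 1.0 / (y + 1.0)


def mean(values):
    return sum(values) / len(values)


# Hypothetical groups chosen to show the effect.
group_a = [0.0, 2.0]   # spread-out values, original mean 1.0
group_b = [0.9, 0.9]   # tightly clustered values, original mean 0.9

mean_a, mean_b = mean(group_a), mean(group_b)
t_mean_a = mean([transform(y) for y in group_a])
t_mean_b = mean([transform(y) for y in group_b])

# For single values, a larger y always gives a smaller transformed value.
# For group means this can fail: here A is larger on both scales.
print(mean_a > mean_b, t_mean_a > t_mean_b)
```

The reversal depends on the spread within each group, which is why nodes with similar means but different standard deviations can appear out of order in the Transformed Mean column.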
In summary, the process for using E-CHAID is as follows:
1.	Run E-CHAID analysis.
2.	Grow the tree given the stopping rules described above.
3.	Review the first level predictor if it is a measurement variable.
4.	Redefine the categories of the first level predictor using CART if more than 5 child
nodes are generated.
5.	Review the first level predictor if it is an imputed measurement variable.
6.	Regrow the tree if needed.
7.	Prune the tree to obtain the final model using the rules specified above.
8.	Prepare summary statistics and defining characteristics for the terminal nodes.
4.5.5 Objective 3—Classifying Subjects with High Exposure Levels
Objective 3 has a narrower focus for classification than that of Objective 2, that is, to identify
predictors that classify people with high exposure levels of a chemical. High exposure, as used
here, is based on EPA's definition (EPA 1992), and includes participants whose measurement is
greater than or equal to the 90th percentile. As in Objective 2, the intent is to find descriptions of
the subpopulations that are more likely to have high exposure levels.
Logistic regression analysis is designed for situations where the dependent variable is binary,
that is, Y=1 (exposure is "high") and Y=0 (exposure is "not high"). It estimates the probability
that a case with a given set of predictors has a high exposure level. The SPSS procedure
LOGISTIC REGRESSION allows both continuous and categorical predictors in the analysis. A
categorical predictor with c categories is transformed into c-1 dummy variables, and the
remaining category is termed the reference category in SPSS. The odds of an event Y=1 are
defined as p/(1-p), that is, the ratio of p, the probability that Y=1, to 1-p, the probability that Y=0.
One of the benefits of using logistic regression for this objective is the ability to obtain an odds
ratio for each predictor. In models with no interaction terms, the odds ratio, or the exponential of
a predictor's regression coefficient, describes the factor by which the odds change with
either a one-unit change in a continuous predictor, or in comparison to the reference category for
a categorical predictor.
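The definitions of the odds and the odds ratio can be illustrated with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the 30-minute example simply applies the odds ratio of 1.01 reported for ATA27R in Table 4-17 to an assumed change in the predictor.

```python
import math


def odds(p):
    """Odds of the event Y=1 given its probability p."""
    return p / (1.0 - p)


def odds_ratio(coefficient, change_in_units=1.0):
    """Odds ratio for a change in a continuous predictor: exp(coefficient * change)."""
    return math.exp(coefficient * change_in_units)


def percent_change_in_odds(ratio):
    """Express an odds ratio as a percent change in the odds."""
    return 100.0 * (ratio - 1.0)


# Odds for a probability of 0.10 (the share above the 90th percentile).
print(odds(0.10))

# Assumed example: odds ratio of 1.01 per minute, applied to 30 extra minutes.
coefficient = math.log(1.01)
print(odds_ratio(coefficient, 30))

# An odds ratio below 1 corresponds to a decrease in the odds.
print(percent_change_in_odds(0.14))
```

Because the odds ratio is multiplicative, the effect of a multi-unit change is the one-unit odds ratio raised to that number of units, not the one-unit ratio times the number of units.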
Logistic Regression and E-CHAID
There are three differences between the logistic regression and E-CHAID analyses which should
be noted if comparisons are made between the two sets of results for a model. The first is that,
as part of its classification process, E-CHAID looks at interactions between predictors. Logistic
regression can consider interactions, but they must be explicitly specified. Without some
knowledge base for deciding on interactions to be included in an analysis, the number of
potential interactions would be voluminous. For example, without filtering, twenty predictors
would produce about 200 first-order interactions. Thus no interactions were included. The
second difference is that the focus of each analysis is different. E-CHAID makes its own
determination of dependent variable groups as defined by the terminal nodes of the tree.
Logistic regression specifically looks at the two subpopulations, those above and below the 90th
percentile of the dependent variable.
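The figure of about 200 first-order interactions for twenty predictors corresponds to the number of distinct pairs of predictors. A minimal Python check, with a hypothetical function name:

```python
import math


def first_order_interactions(n_predictors):
    """Number of distinct pairs of predictors, i.e. two-way interaction terms."""
    return math.comb(n_predictors, 2)


print(first_order_interactions(20))
```

This counts one term per pair of predictors; a categorical predictor entered as several dummy variables would contribute more terms than this count suggests.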
The third difference is that, in E-CHAID, missing values for measurement predictors are handled
as a separate nominal category, and thus do not require exclusion or imputation. In the
LOGISTIC REGRESSION procedure, such missing values can only be included in the analysis
if a constant or imputed value is assigned. The measurement variables have varying levels of
missing data, ranging as high as 70 percent. As previously discussed, there is no justifiable way
to assign a value for missing measurements, and as noted in section 4.3.4, imputation of the
measurement variables would require adequate justifications which were not available for most
of the measurement variables. Allowing cases with missing measurement values to be excluded
significantly decreased the number of cases available for the analysis. Thus independent
measurement variables were not included in the logistic regression analysis. With no
measurement predictors available, questionnaire variables may be selected as surrogate
predictors. Since the analyses for all three objectives are based on the same number of cases, it
is easier to make a comparison of the selected predictors in a model across the objectives as
presented in Section 5.
Analysis Options
Backwards elimination regression was initially considered as the analysis option in the
LOGISTIC REGRESSION procedure, since there is less risk of not finding a relationship when
one exists (Menard, 1995). However, the necessity of creating dummy variables to replace
categorical variables would increase the number of variables beyond the number of cases
available, and would not allow the regression to be performed. The forward stepwise approach
was used instead, with a PIN value (probability of inclusion in the model) = 0.10 and a POUT
(probability of removal from the model) = 0.15. The PIN and POUT values are different from
those used for the Modeling/Regression objective to allow more variables an opportunity to be
selected. When the LOGISTIC REGRESSION procedure translates the categorical variables
into dummy variables, one category is designated the reference category and is not assigned a
dummy variable. The default option, to assign the reference category to the response category
with the highest value, was taken. In Table 4-17 the month of June for variable B33B is the
reference category. Hosmer (2000) discusses many aspects of logistic regression analysis.
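The dummy-variable coding described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the SPSS implementation: the function name dummy_code is hypothetical, and the numeric month codes (3 = March through 6 = June) are assumed for the example.

```python
def dummy_code(values, reference=None):
    """Code a categorical variable as c-1 indicator (dummy) variables.

    By default the category with the highest value is the reference
    category and receives no dummy variable.
    """
    categories = sorted(set(values))
    if reference is None:
        reference = categories[-1]
    indicators = [c for c in categories if c != reference]
    rows = [[1 if v == c else 0 for c in indicators] for v in values]
    return indicators, reference, rows


# Assumed month codes for illustration: 3 = March, 4 = April, 5 = May, 6 = June.
months = [3, 4, 5, 6, 4]
indicators, reference, rows = dummy_code(months)
print(indicators, reference)
for month, row in zip(months, rows):
    print(month, row)
```

A case in the reference category has zeros for every dummy variable, so each dummy coefficient is read as a comparison with that category.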
Table 4-17. Logistic Regression Odds Ratios and Analysis Criteria for Region 5 Study Example Model

                                      For Nominal/Ordinal Variables           Significance  Odds    95% Confidence Interval
                                      Indicator     Reference                 Level         Ratio   for Odds Ratio
Description                           Category      Category        Variable                        Lower   Upper
AV. MIN. PERFORMED VIGOROUS EXERCISE                                ATA27R    ***           1.01    1.01    1.02
MONTH STOP HEATING DEVICES                                          B33B      *
MONTH STOP HEATING DEVICES            March         June            B33B(1)   NS            0.20    0.03    1.27
MONTH STOP HEATING DEVICES            April         June            B33B(2)   *             0.14    0.03    0.78
MONTH STOP HEATING DEVICES            May [a]       June            B33B(3)   **            0.03    0.004   0.26
HOURS/WK CHILD AWAY FROM HOME                                       B18A                    1.07    1.00    1.14

Analysis Criteria
NAGELKERKE R SQUARE                                                           0.26
PERCENT CORRECT CLASSIFICATION - HIGH EXPOSURE                                5.88

Significance levels:
NS   > 0.05
*    (0.01,0.05]
**   (0.001,0.01]
***  < 0.001
[a] Because of quasi-complete separation issues, "No Response" was combined with "May," the mode of this variable.
Separation Issues
Two situations that occur in logistic regression are reviewed in developing the final set of
predictors for the model: complete separation and quasi-complete separation (Allison 1999).
These separation issues are a problem because the software algorithm cannot calculate
the maximum likelihood estimates for the coefficients when there are small cell counts.
Complete separation occurs when an independent variable perfectly predicts the dependent
variable. An example of complete separation is when Y=1 if X < 3.5, and Y=0 otherwise. No
instances of complete separation occurred.
Instances of quasi-complete separation occur for a predictor when there is complete separation
except for a few response categories. These categories have cases with both values of the
dependent variable. Because the sample sizes available for analysis are not large, the number of
cases representing the high exposure level population ranges from about 10 to 20. Spreading this
small number of cases across several categories or across values of a continuous predictor makes
this situation more likely to occur. When instances of quasi-complete separation occurred,
categories of the variable were combined where possible, or the variable was deleted, before a
subsequent analysis. Several instances of quasi-complete separation occurred, and a list of the
categories combined or the variables deleted is included in a footnote with the analysis results
in Appendices G and H.
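For a categorical predictor, separation can be screened for by checking which categories contain only one value of the dependent variable. The Python sketch below is a simple screening heuristic under that assumption, not the check performed by the SPSS procedure; the function name and the example data are hypothetical.

```python
from collections import defaultdict


def separation_status(categories, outcomes):
    """Classify a categorical predictor by the outcomes seen in each category.

    Returns (status, one_sided), where one_sided lists the categories that
    contain only one value of the binary outcome.
    """
    if len(set(outcomes)) < 2:
        return "undefined", []          # the outcome itself does not vary
    seen = defaultdict(set)
    for category, y in zip(categories, outcomes):
        seen[category].add(y)
    one_sided = sorted(c for c, ys in seen.items() if len(ys) == 1)
    if len(one_sided) == len(seen):
        return "complete", one_sided    # every category predicts Y perfectly
    if one_sided:
        return "quasi-complete", one_sided
    return "none", []


# Hypothetical data: category "a" holds only Y=0 cases, "b" and "c" hold both.
cats = ["a", "a", "b", "b", "c", "c"]
ys = [0, 0, 0, 1, 1, 0]
print(separation_status(cats, ys))
```

The categories flagged as one-sided are the candidates for combining with another category, or the variable is a candidate for deletion, as described above.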
For many of the analyses, several adjustments of the variables to resolve the separation issues
did not lead to a model with reasonable estimates, and further analysis was discontinued.
Limited information about predictors for these analyses is provided in Appendices G and H.
Analysis Results and Criteria
Table 4-17 shows the final set of predictors for the example model with the odds ratio and
confidence interval of the odds ratio for each selected variable. The significance level is based on
the Wald statistic, and tests the hypothesis that the coefficient is zero as described in the SPSS
LOGISTIC REGRESSION algorithm (SPSS 2003b). For a categorical variable, all dummy
variables representing the variable are kept together in the model. The test using the Wald
statistic for a dummy variable is actually a test of the indicator category compared to the
reference category. In Table 4-17, the significance level for variable B33B(2) tests the
difference between the months April and June as predictors of the Arsenic concentration. The
significance level for variable B33B, the original categorical variable, represents a test across all
categories of the predictor. These two situations can be likened to the difference between the
F-test in a one-way ANOVA for B33B, and pairwise multiple comparisons for B33B(1), B33B(2),
B33B(3). Thus the coefficients for B33B(2) and B33B(3) are significant, but the coefficient for
B33B(1) is not. Overall, B33B is a significant predictor. Interpretations of the odds ratio for the
dummy variables depend on the reference category used.
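The relationship between an odds ratio, its confidence interval, and the Wald test can be checked approximately from the published values. The Python sketch below backs out the coefficient and its standard error from the rounded odds ratios and 95% intervals in Table 4-17 and computes a two-sided p-value from the normal distribution. It is an approximation because the table values are rounded; the function name is hypothetical, and the use of 1.96 assumes a normal-theory interval.

```python
import math

Z_95 = 1.96  # normal quantile assumed for a 95% confidence interval


def wald_from_interval(odds_ratio, lower, upper):
    """Approximate coefficient, standard error, Wald z, and two-sided p-value."""
    coefficient = math.log(odds_ratio)
    std_error = (math.log(upper) - math.log(lower)) / (2.0 * Z_95)
    z = coefficient / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return coefficient, std_error, z, p_value


# Rounded values from Table 4-17: (odds ratio, lower, upper).
table = {"B33B(1)": (0.20, 0.03, 1.27), "B33B(2)": (0.14, 0.03, 0.78)}

for name, (ratio, lower, upper) in table.items():
    b, se, z, p = wald_from_interval(ratio, lower, upper)
    print(f"{name}: b = {b:.2f}, SE = {se:.2f}, z = {z:.2f}, p = {p:.3f}")
```

The approximate p-values fall in the same significance bands as those reported in the table for these two dummy variables: above 0.05 for B33B(1) and between 0.01 and 0.05 for B33B(2).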
As mentioned previously, the default option for reference categories was used. For some
variables, it may be that using one particular category as the reference will produce more easily
interpreted pairwise comparisons. In such a case, it would be preferred to select that category as
the reference. Appendix F provides information on creating odds ratios for other comparisons in
a categorical predictor.
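One general property that helps when a different reference category is preferred: the odds ratio comparing two indicator categories equals the ratio of their odds ratios against the common reference. The Python sketch below applies this to the Table 4-17 values for April and March; it is not a reproduction of the Appendix F procedure, the function name is hypothetical, and a confidence interval for the new comparison cannot be derived from the table alone because it depends on the covariance of the coefficients.

```python
def odds_ratio_between(or_a_vs_ref, or_b_vs_ref):
    """Odds ratio of category A versus category B, given both versus one reference."""
    return or_a_vs_ref / or_b_vs_ref


# Table 4-17: April vs June = 0.14, March vs June = 0.20.
april_vs_march = odds_ratio_between(0.14, 0.20)
print(round(april_vs_march, 2))
```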
In summary, the process for using logistic regression is as follows:
1.	Run logistic regression analysis using the forward stepwise option.
2.	Review results for separation issues; exclude variables or combine categories as the issues
occur.
3.	Rerun the analysis until separation issues affecting the analysis are resolved (up to
three runs).
Two criteria are used to evaluate the logistic regression model. One criterion is Nagelkerke's R2
(Nagelkerke 1991). This criterion is based on a generalization of the conventional R2, and, in a
particular analysis, is rescaled by the upper bound of the R2 for that analysis. This rescaling
allows Nagelkerke's R2 to be interpreted as the proportion of variance explained by the
independent variables (Allison 1999).
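The rescaling described above can be written out from the log-likelihoods of the null (intercept-only) and fitted models. The Python sketch below uses the standard form in which the generalized R2, often called the Cox and Snell R2, is divided by its upper bound; the function name and the example log-likelihood values are hypothetical.

```python
import math


def nagelkerke_r2(ll_null, ll_model, n_cases):
    """Nagelkerke's R2 from the null and fitted model log-likelihoods."""
    generalized_r2 = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n_cases)
    upper_bound = 1.0 - math.exp(2.0 * ll_null / n_cases)
    return generalized_r2 / upper_bound


# Hypothetical example: 170 cases, 10 percent of them with Y=1.
n = 170
ll0 = n * (0.1 * math.log(0.1) + 0.9 * math.log(0.9))  # null model log-likelihood
print(nagelkerke_r2(ll0, ll0, n))        # fitted model no better than the null model
print(nagelkerke_r2(ll0, 0.0, n))        # perfectly fitting model
print(nagelkerke_r2(ll0, 0.8 * ll0, n))  # assumed intermediate fit
```

Dividing by the upper bound is what lets the measure reach 1 for a perfectly fitting model, which the unscaled version cannot do with a binary dependent variable.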
A second criterion is the percent correct classification. Although logistic regression is not a
classification procedure, the SPSS LOGISTIC REGRESSION procedure includes an option for
classifying cases based on a selected probability cutoff value (SPSS 2003b). In these analyses,
the cutoff value used was 0.5. Thus if, based on the predictors selected for a model, the
probability of a case having a high exposure level is > 0.5, the case would be classified as a high
exposure case. Usually both categories of the dependent variable would be included in this
measure. However, the category definitions are based on the 90th percentile, and the proportion
of cases between the categories is highly disproportionate. Any such measure across both
categories would be overwhelmed by the percent correct classification for the Y=0 category.
The correct classification of interest is for cases with high exposure levels, thus the percent
correct classification is based only on that category. In Table 4-17, both analysis criteria indicate
that this model is poor.
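The percent correct classification for the high exposure category can be sketched as follows in Python. The function name and the example probabilities are hypothetical; the example is constructed with 17 high exposure cases of which one exceeds the cutoff, which gives the same value as the 5.88 reported in Table 4-17, although the actual case counts behind that value are not stated here.

```python
def percent_correct_high(y_true, predicted_prob, cutoff=0.5):
    """Percent of Y=1 cases whose predicted probability exceeds the cutoff."""
    high = [p for y, p in zip(y_true, predicted_prob) if y == 1]
    if not high:
        raise ValueError("no high exposure cases to classify")
    hits = sum(1 for p in high if p > cutoff)
    return 100.0 * hits / len(high)


# Hypothetical data: 17 high exposure cases, only one predicted above 0.5,
# plus a few not-high cases that do not enter the measure.
y = [1] * 17 + [0] * 5
p_hat = [0.62] + [0.20] * 16 + [0.05] * 5
print(round(percent_correct_high(y, p_hat), 2))
```

Restricting the measure to the Y=1 cases keeps the large Y=0 group from dominating the result, which is the reason given above for not reporting an overall percent correct.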
4.6 Quality Assurance
In order to ensure the quality of the results provided in this report, quality assurance steps were
implemented. Some examples of these steps are as follows:
1.	Detailed steps of the analysis implementation were documented for use.
2.	Transformations of the data were checked with sample hand calculations, and/or
reviews of before and after data distributions, as appropriate to the transformation.
3.	A system was used to organize the files from individual work steps in separate
electronic and paper folders.
4.	Electronic file names included descriptive and date information and were created to
assist in identifying the correct files for use.
5.	Analyses were performed using syntax files which also included the names of the
input files and comments about the processing or analysis performed.
6.	Syntax and output file names were made similar for tracking purposes.
7.	Programs developed to generate syntax for a specific type of analysis were tested by
comparing the syntax and associated output to an original version of both.
8.	Checking steps were included in the analysis process.
9.	Checks were made in preparing the report at several points and by different people to
catch inconsistencies or potential errors.

-------
5 Results and Discussion
5.1 Introduction
The data from the Arizona and Region 5 studies were analyzed independently using as much of
the available questionnaire and measurement data in each study as possible. Similarities in chemicals,
measurements, and questionnaire items offered opportunities to make comparisons between the
studies. Differences in the types of data collected, regional locations, and the data available for
analysis can also confound some comparisons between the studies. Thus this report will present
the results for each study separately.
The results in this section are the culmination of the work from Phases 1, 2, and 3, and
specifically summarize the work from Phase 3. Details of the analyses conducted in the three
phases are included in Appendices G and H for the Region 5 and Arizona studies, respectively.
Each of the appendices includes:
•	Summary statistics on the questionnaire variables
•	Summary statistics on the measurement data
•	Results of the Phase 2 Questionnaire Variable Reduction Process
•	Lists of questionnaire variables and measurements included in analysis models
•	For each chemical/model analysis:
	•	Descriptive information on the dependent variable in the model
	•	Results of the Regression Analysis for Objective 1
	•	Results of the CHAID Analysis for Objective 2
	•	Results of the Logistic Regression Analysis for Objective 3
Appendix F includes an example of the tables listed above with an explanation of their contents.
The results in this section are organized by study. Analyses were performed by chemical/model
combination (hereafter referred to as model). For each model, the results in this section provide
a high-level view of the predictors found to have significant relationships with the dependent
variable under each of the three objectives. Within each study, the results are organized by
chemical class, chemical, and finally model. Models are included, or not, based on the
availability of measurement data from the study. Model names, as used in this report, were
created using the following codes: C for concentration, E for exposure, DOS for dose, IA for
indoor air, SF for surface dust, AR for air(personal), DR for dermal, and DT for diet. Potential
models for a chemical include:
•	Concentration in Indoor Air (CIA),
•	Concentration in Surface Dust (CSF),
•	Inhalation Exposure (EAR),
•	Dietary Exposure (EDT),
•	Dermal Exposure (EDR), and
•	Dose (DOS).
For each model in Section 5, more details from the analyses are included in Appendices G and H
where a reader can make additional evaluations of a predictor's usefulness with respect to their
needs. Appendix F is an important reference for understanding and evaluating the information in
the tables.
The results for each model presented in this section are shown in a two-part table. The first part
lists the questionnaire and measurement variables identified as having significant relationships
under each of the three objectives: modeling (regression analysis), classification (CHAID), and
high-end exposure levels (logistic regression analysis). The full description for each
questionnaire variable can be found in Appendix D by the variable name. For Objective 1
(Modeling/Regression Analysis), a predictor is included in the summary table with an "*" if it
appears in the final model of the stepwise regression analysis. For Objective 2 (Classification by
Exposure Levels), a predictor is included in the summary table with an "*" if the predictor
appears anywhere in the final E-CHAID tree. For Objective 3 (Classification of High-End
Exposure Levels), a predictor is included in the summary table with an "*" if it appears in the
final model of the logistic regression analysis. Some variables will appear in the table under all
three objectives and some are objective-specific. Such seeming inconsistencies reflect the
different criteria used by the statistical techniques to determine significant relationships.
Analyses for all three objectives use the same set of cases.
As discussed in Section 4.5.5, the effect of missing measurement data on the logistic regression
analysis was handled by excluding all independent measurement variables from the analysis.
For such analyses, "NI" is included in the Logistic Regression column if a measurement variable
appears as a variable selected by one of the other objectives. This is a reminder that the
measurement variable might have been selected or that surrogate variables may appear to take its
place. Also a Category column is included in the table as a way to group the selected variables
by topic to give a higher-level view of the type of information selected in the analyses for a
model. These categories will also be used in the summary tables in Section 2 (Conclusions).
The second part of the table provides criteria for assessing how well the variables with
significant relationships ("*") in the column explain the dependent variable under that objective.
The criteria are explained in Section 4; however, the following table is a tool used for discussing
the quality of the models' analysis results based on experience with these and other data sets.
The reader may choose different guidelines for evaluating the results based on their experience
or interests.
Table 5.1 Guidelines for Discussing the Results from the Phase 3 Analyses

Analysis Criteria                          Guideline
Adjusted R2                                For these sample sizes, a value of 0.3 - 0.4 is fair; a value > 0.4 is good. Maximum = 1
Mallows' Cp                                Value <= p+1 is good, where p is the number of predictors; value > p+1 is fair to poor depending on how large Cp is
Relative Risk Estimate                     Value of 0.5 - 0.6 is fair; value < 0.50 is good. Maximum usually = 1
% Change in Risk Estimate                  Value of 30% - 50% is fair; value > 50% is good.
% Correct Classification - High Exposure   Value > 50% is fair; value > 75% is good.
Nagelkerke R2                              For these sample sizes, a value of 0.3 - 0.4 is fair; value > 0.4 is good. Maximum = 1
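Two of the guidelines in Table 5.1 can be expressed as simple rating functions. The Python sketch below is illustrative only: the function names are hypothetical, values below the "fair" range are labeled "poor" as an assumption (Table 5.1 does not name that band), and the discussion of each model weighs both criteria for an objective rather than a single value.

```python
def rate_r2(value):
    """Rate an adjusted R2 or Nagelkerke R2 value using the Table 5.1 ranges."""
    if value > 0.4:
        return "good"
    if value >= 0.3:
        return "fair"
    return "poor"  # assumed label for values below the fair range


def rate_percent_correct(value):
    """Rate the percent correct classification for the high exposure category."""
    if value > 75.0:
        return "good"
    if value > 50.0:
        return "fair"
    return "poor"  # assumed label for values at or below 50 percent


# Hypothetical values for illustration.
print(rate_r2(0.45), rate_r2(0.35), rate_r2(0.20))
print(rate_percent_correct(80.0), rate_percent_correct(60.0), rate_percent_correct(10.0))
```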
5.2 Results for NHEXAS Region 5 Study
5.2.1	General Comments
Analysis of the Region 5 study considered questionnaire and measurement data from Visit 1.
Data from the following questionnaires were analyzed: Descriptive, Baseline, Follow up, Food
Follow up, Technician Walk-through, and Time-Activity Diary. Summary statistics on the
approximately 600 questionnaire variables following the Phase 1 process are found in Table G1-1.
The primary target metals analyzed were Arsenic and Lead; the primary target VOCs
analyzed were Benzene, Chloroform, Tetrachloroethylene, and Trichloroethylene. The
matrices or media sampled in the study included: air, dust, soil, water, food, blood, and urine.
Summary statistics for the measurement data used in the analyses are included in Table G2-1.
Results from twenty-two models across the six primary target chemicals are presented here.
Some models were only analyzed for Objective 3 by logistic regression analysis because more
than 50% of the values for the dependent variable were below detection limit. The results for
each objective's analysis will be discussed in terms of good, fair or poor based on the guidelines
in Table 5.1 and taking both criteria for each objective into account.
5.2.2	Metals
Appendix E describes the sources and human exposure routes for the primary metals analyzed in
the Region 5 study: Arsenic and Lead.
5.2.2.1 Arsenic
Table 5.2.2.1-CIA Selected Predictors of Arsenic Concentration in Indoor Air (ng/m3) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=218)
Type: M = Measurement, Q = Question

Selected Predictors
                                                                                           Regression   Regression           Logistic
Type  Description                               Variable  Category                         6-partition  9-partition  CHAID   Regression
M     OUTDOOR AIR CONCENTRATION                 CONC030   Air Measurements                 *            *            *       NI
M     SURFACE DUST LOADING                      CONC050   Dust Measurements                *            *            *       NI
Q     # MOSTLY INDOOR HOUSE PETS?               B43B      Pets                                                       *
Q     IS THIS HOUSE OR APARTMENT. [OWNERSHIP]   D09       Housing/Ownership                *
Q     PAST WEEK USED WINDOW/WALL AC             F01B      Ventilation System (AC/Heat)                                       *
Q     # DAYS PAST WEEK USED AIR CENTRAL HEAT    F01L1     Ventilation System (AC/Heat)
Q     PAST WEEK DID YOURSELF: SWEEP INDOORS     F03B6     Cleaning (Dust/Vacuuming/Sweep)
Q     # MIN PAST WEEK WOODWORKING               F03F4     Wood Work                                                          *
Q     SIZE OF COUNTY                            GEO       Location/Characteristics                                   *
Q     NUMBER IN HOUSEHOLD                       HH_NUM_R  Number of People in Home         *
Q     WHAT STATE DO YOU LIVE IN                 STATE     Location                         *
Q     FLOORS LIVED ON/MULTI-UNIT BLDG FR #1     T02       Housing Structure/Size                        *
Q     TYPES OF FOUNDATION: FULL BASEMENT        T06J4     Housing Structure                                                  *

                                            Regression    Regression              Logistic
Analysis Criteria                           6-partition   9-partition   CHAID     Regression
Adjusted R Square                           0.339         0.305
Mallows' Prediction Criterion               15.115        6
Relative Risk Estimate                                                  0.608
% Change in Risk Estimate                                               39.2
% Correct Classification - High Exposure                                          9.091
Nagelkerke R Square                                                               0.143

* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
•	The concentration values for the indoor air measurements ranged between 0 and 5.92 ng/m3.
•	The regression analysis for the 6-partition scenario is considered a poor-fair fit; the regression analysis
for the 9-partition scenario is considered a fair fit.
•	The CHAID analysis is a fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may not be useful.
•	The logistic regression analysis is a poor fit for classifying people with high exposure levels.
•	The predictors for this model across the three objectives include air and dust measurements, housing
structure, ventilation system, and household activities.
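The Relative Risk Estimate and % Change in Risk Estimate reported for the CHAID analyses in the arsenic tables of this section appear to be linked by a simple identity: the percent change equals 100 × (1 − relative risk). This relation is inferred from the tabulated numbers rather than stated in this section, so the sketch below should be read as a consistency check under that assumption; the function name is hypothetical.

```python
# Consistency check (assumed relation): % change in risk = 100 * (1 - relative risk).

def pct_change_in_risk(relative_risk):
    """Percent reduction in risk estimate implied by a relative risk estimate."""
    return round(100.0 * (1.0 - relative_risk), 1)

# (relative risk estimate, reported % change) pairs from the arsenic tables.
reported_pairs = [
    (0.608, 39.2),  # indoor air
    (0.717, 28.3),  # indoor surface dust
    (0.342, 65.8),  # personal air
    (0.899, 10.1),  # duplicate diet
]

for rr, reported_pct in reported_pairs:
    # Allow for rounding in the published figures.
    assert abs(pct_change_in_risk(rr) - reported_pct) < 0.05
```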

-------
Table 5.2.2.1-CSF Selected Predictors of Arsenic Loading in Indoor Surface Dust (ng/cm2) and Analysis
Criteria Across the Phase 3 Objectives in Region 5 (N=247)
Type: M = Measurement, Q = Question
Type | Description | Variable | Category | Regression 6-partition | Regression 9-partition | CHAID | Logistic Regression
Selected Predictors
M | INDOOR AIR CONCENTRATION | CONC020 | Air Measurements | * | * | * | NI
M | OUTDOOR AIR CONCENTRATION | CONC030 | Air Measurements |  |  | * | NI
M | YARD SOIL CONCENTRATION | CONC080 | Soil Measurements | * |  |  | NI
M | WATER CONCENTRATION | CONC090 | Water Measurements | * |  |  | NI
Q | # MINUTES WITH SMOKER AT HOME | B08A | Tobacco | * | * |  |
Q | # MONTHS IN COOLING SEASON | B29R | Cooling Season | * | * |  |
Q | IS THIS HOUSE OR APARTMENT. [OWNERSHIP] | D09J3ESC | Housing/Ownership | * |  |  |
Q | PAST WEEK USED AIR CENTRAL HEAT | F01L | Ventilation System (AC/Heat) | * |  |  |
Q | # DAYS PAST WEEK USED AIR CENTRAL HEAT | F01L1 | Ventilation System (AC/Heat) |  |  | * |
Q | # MIN PAST WEEK WOODWORKING | F03F4 | Wood Work |  |  |  | *
Q | WHAT STATE DO YOU LIVE IN | STATE | Location | * | * | * |
Q | IS THIS A MULTI-UNIT BUILDING? | T02MULTR | Housing Structure/Size | * |  | * |
Q | EXTERIOR SIDING - ASBESTOS/ASPHALT | T06C6 | Housing Structure |  |  |  | *
Analysis Criteria
| Adjusted R Square |  |  | 0.241 | 0.158 |  |
| Mallows' Prediction Criterion |  |  | 10 | 6 |  |
| Relative Risk Estimate |  |  |  |  | 0.717 |
| % Change in Risk Estimate |  |  |  |  | 28.3 |
| % Correct Classification - High Exposure |  |  |  |  |  | 4
| Nagelkerke R Square |  |  |  |  |  | 0.121
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
•	The loading values for the indoor surface dust measurements ranged between 0.02 and 6.03 ng/cm2.
•	The regression analysis for the 6-partition scenario is considered a fair fit; the regression analysis for
the 9-partition scenario is considered a fair fit.
•	The CHAID analysis is a poor-fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may not be useful, except for the node with the highest predicted value.
•	The logistic regression analysis is a poor fit for classifying people with high exposure levels.
•	The predictors for this model across the three objectives include air, soil and water measurements,
ventilation system, housing structure, and household activities.
Table 5.2.2.1-EAR Selected Predictors of Arsenic Concentration in Personal Air (ng/m3) and Analysis
Criteria Across the Phase 3 Objectives in Region 5 (N=169)
Type: M = Measurement, Q = Question
Type | Description | Variable | Category | Regression 6-partition | Regression 9-partition | CHAID | Logistic Regression
Selected Predictors
M | INDOOR AIR CONCENTRATION | CONC020 | Air Measurements | * | * | * | NI
M | SURFACE DUST LOADING | CONC050 | Dust Measurements |  |  | * | NI
Q | AT JOBS NO CONTACT WITH DUST? | AC14G9R | Working Conditions |  |  | * |
Q | PERFORMED VIGOROUS EXERCISE | AIA27R | Exercise |  |  | * |
Q | AV. MIN. SAT/LAY ON CARPET/RUGS | ATA24R | Time on Rugs/Carpet |  |  | * |
Q | AV. MIN. PERFORMED VIGOROUS EXERCISE | ATA27R | Exercise | * |  |  | *
Q | AV. DAILY HOURS INSIDE ELSEWHERE | ATEJR | Time Away From Home |  |  | * |
Q | AV. DAILY HOURS OUTSIDE AT WORK/SCHOOL | ATEMR_E | Time Away From Home |  |  |  |
Q | SEX - PARTICIPANT | B02 | Participants Characteristics |  |  | * |
Q | HOURS/WK CHILD AWAY FROM HOME | B18A | Time Away From Home | * |  |  | *
Q | PAST 6 MOS, COMMUTE BY CAR/TRUCK/VAN? | B19A | Commute Time/Distance | * | * |  |
Q | MONTH STOP HEATING DEVICES | B33B | Heating Season |  |  |  | *
Q | # MOSTLY OUTDOOR HOUSE PETS? | B43C | Pets | * |  |  |
Q | PAST WEEK USED PORTABLE/CEILING FAN | F01D | Ventilation System (AC/Heat) |  |  | * |
Q | AT WORK - EXPOSURE TO METALS THRU FUMES | FMTXPOSR | Working Conditions | * |  |  |
Q | SCHOOL/DAYCARE OUTSIDE HOME - PARTICIPANT | SCHLR_E | Time Away From Home |  |  | * |
Q | FLOORS IN BUILDING | T01 | Housing Structure/Size |  |  | * |
Analysis Criteria
| Adjusted R Square |  |  | 0.499 | 0.351 |  |
| Mallows' Prediction Criterion |  |  | 7 | 3 |  |
| Relative Risk Estimate |  |  |  |  | 0.342 |
| % Change in Risk Estimate |  |  |  |  | 65.8 |
| % Correct Classification - High Exposure |  |  |  |  |  | 5.882
| Nagelkerke R Square |  |  |  |  |  | 0.261
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
•	The concentration values for the personal air measurements ranged between 0 and 13.75 ng/m3.
•	The regression analysis for the 6-partition scenario is considered a good fit; the regression analysis for
the 9-partition scenario is considered a fair fit.
•	The CHAID analysis is a fair-good fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, there is some
reasonable differentiation for the nodes with high predicted values and for some of the other nodes.
•	The logistic regression analysis is a poor fit for classifying people with high exposure levels.
•	The predictors for this model across the three objectives include air and dust measurements, time away
from home, household and work characteristics, and personal activities.

-------
Table 5.2.2.1-EDT Selected Predictors of Arsenic Intake in Food and Beverage from Duplicate Diet
(ug/day) and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=156)
Type: M = Measurement, Q = Question
Type | Description | Variable | Category | Regression 6-partition | Regression 9-partition | CHAID | Logistic Regression
Selected Predictors
Q | FISH FROM OCEAN? | BAA12B1R | Specific Foods |  |  |  | *
Q | WEIGHT (POUNDS) - PARTICIPANT | B05AMD | Participants Characteristics |  |  |  | *
Q | WATER TREATMENT: REVERSE OSMOSIS? | B26EIII | Source of Water |  |  |  | *
Q | NO.DAYS BREAKFAST PREP AT RESTAURANT | FD02BNYR | Food Preparation |  |  |  | *
Q | NO.DAYS LUNCH USUAL 1-3 TIMES/MO | FD06CNYR | Food Intake | * | * |  |
Q | PDAYS LUNCH USUAL < ONCE/MO | FD06DFCR | Food Intake |  |  |  | *
Q | NO.DAYS DINNER PREP AT HOME | FD08ANYR | Food Preparation | * |  |  | *
Q | NO.DAYS AMT DUE TO OTHER | FD12HNYR | Food Intake |  |  | * |
Q | PDAYS DIET DIFF DUE TO ILLNESS/MED COND | FD14CPCR | Diet | * |  |  |
Analysis Criteria
| Adjusted R Square |  |  | 0.099 | 0.042 |  |
| Mallows' Prediction Criterion |  |  | 4 | 2 |  |
| Relative Risk Estimate |  |  |  |  | 0.899 |
| % Change in Risk Estimate |  |  |  |  | 10.1 |
| % Correct Classification - High Exposure |  |  |  |  |  | 43.75
| Nagelkerke R Square |  |  |  |  |  | 0.473
* Variable included in the final model.
•	The intake values for the duplicate diet measurements ranged between 1.08 and 130.1 ug/day.
•	The regression analysis for the 6-partition scenario is considered a poor fit; the regression analysis for
the 9-partition scenario is considered a poor fit.
•	The CHAID analysis is a poor fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may not be useful.
•	The logistic regression analysis is a fair-good fit for classifying people with high exposure levels.
•	The predictors for this model across the three objectives include food and diet-related activities.

-------
Table 5.2.2.1-DOS Selected Predictors of Arsenic Dose in Urine (ug/g Creatinine) and Analysis Criteria
Across the Phase 3 Objectives in Region 5 (N=197)
Type: M = Measurement, Q = Question
Type | Description | Variable | Category | Regression 6-partition* | Regression 9-partition* | CHAID | Logistic Regression
Selected Predictors
Analysis Criteria
| Adjusted R Square |  |  |  |  |  |
| Mallows' Prediction Criterion |  |  |  |  |  |
| Relative Risk Estimate |  |  |  |  |  |
| % Change in Risk Estimate |  |  |  |  |  |
| % Correct Classification - High Exposure |  |  |  |  |  | NA
| Nagelkerke R Square |  |  |  |  |  | NA
* Analysis not run because of high percentage of samples below detection limit.
NA - Analysis run, but no variables were selected.
•	The adjusted concentration values for the day 7 urine measurements ranged between 0.01 and 13.82
ug/g Creatinine.
•	No regression analysis was performed because more than 50% of the measurement values were below
the detection limit.
•	No CHAID analysis was performed because more than 50% of the measurement values were below
the detection limit.
•	The logistic regression analysis did not select any predictors. As a reminder, the logistic regression
analysis did not include any concentration or exposure measurements and no additional questions were
included in the model, used for all three objectives, to compensate for the exclusion. Thus it is not
known whether the adjusted urine concentration at high exposure levels could be predicted if the
measurements were included.
•	No predictors were selected for this model.

-------
5.2.2.2 Lead
Table 5.2.2.2-CIA Selected Predictors of Lead Concentration in Indoor Air (ng/m3) and Analysis Criteria
Across the Phase 3 Objectives in Region 5 (N=213)
Type: M = Measurement, Q = Question
Type | Description | Variable | Category | Regression 6-partition | Regression 9-partition | CHAID | Logistic Regression
Selected Predictors
M | OUTDOOR AIR CONCENTRATION | CONC030 | Air Measurements |  |  | * | NI
M | SURFACE DUST LOADING | CONC050 | Dust Measurements |  |  | * | NI
Q | # MINUTES WITH SMOKER AT HOME | B08A | Tobacco |  |  |  |
Q | TOBACCO SMOKING IN HOME? | B09A | Tobacco |  |  | * | *
Q | # TIMES PAST WEEK SWEEP INDOORS | F03B2 | Cleaning (Dust/Vacuuming/Sweep) |  |  |  |
Q | PAST WEEK DID YOURSELF: SWEEP INDOORS | F03B6 | Cleaning (Dust/Vacuuming/Sweep) | * |  |  |
Q | # DAYS PAST WEEK SINCE DUSTING | F03C3 | Cleaning (Dust/Vacuuming/Sweep) |  |  | * |
Q | SIZE OF COUNTY | GEO | Location/Characteristics | * | * |  | *
Q | WAS HEATING ON DURING SAMPLING PERIOD? | HEAT | Heating Season |  |  | * |
Q | EXTERIOR SIDING - ASBESTOS/ASPHALT | T06C6 | Housing Structure | * | * |  |
Analysis Criteria
| Adjusted R Square |  |  | 0.189 | 0.143 |  |
| Mallows' Prediction Criterion |  |  | 29.992 | 28.736 |  |
| Relative Risk Estimate |  |  |  |  | 0.579 |
| % Change in Risk Estimate |  |  |  |  | 42.1 |
| % Correct Classification - High Exposure |  |  |  |  |  | 27.273
| Nagelkerke R Square |  |  |  |  |  | 0.347
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
•	The concentration values for the indoor air measurements ranged between -0.9 and 293.5 ng/m3. The
Region 5 study provided concentration values as reported by the laboratory, which may reflect some
correction for blanks and calibration. Some of the values were below detection limit.
•	The regression analysis for the 6-partition scenario is considered a poor fit; the regression analysis for
the 9-partition scenario is considered a poor fit. The predictors for the regression analysis seem
reasonable for this model except for T06C6. This may be reasonable if there is an indirect connection
between lead paint dust inside and the economic level of the home, i.e., if older lower-priced homes
use this type of siding more and such homes also have heavier lead paint inside. Also, outdoor air
is a predictor for the inhalation exposure model, but is not a predictor here. This seems a little
unusual. The outdoor air concentrations were imputed for over 50% of the measurements as discussed
in section 4.3.4.
•	The CHAID analysis is a fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, there is some
reasonable differentiation of levels.
•	The logistic regression analysis is a poor-fair fit for classifying people with high exposure levels.
•	The predictors for this model across the three objectives include air and dust measurements, smoking
and cleaning activities, and household characteristics.
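The "% Correct Classification - High Exposure" criterion can be read as the share of truly high-exposure participants that the logistic model also classifies as high. The sketch below illustrates that reading only; the function name is hypothetical, and the group sizes in the usage note are made-up numbers, not counts taken from the report.

```python
# Hypothetical sketch: percent of actual high-exposure cases classified as high.

def pct_correct_high_exposure(actual_high, predicted_high):
    """actual_high / predicted_high are equal-length sequences of booleans."""
    if len(actual_high) != len(predicted_high):
        raise ValueError("sequences must have the same length")
    # Keep only the predictions made for participants who really are high exposure.
    flags = [bool(p) for a, p in zip(actual_high, predicted_high) if a]
    if not flags:
        raise ValueError("no high-exposure cases present")
    return 100.0 * sum(flags) / len(flags)
```

As a purely illustrative case, if 3 of 11 high-exposure participants were classified correctly, the function would return about 27.3%, which would be rated poor under the Table 5.1 guideline of more than 50% for a fair fit.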

-------
Table 5.2.2.2-CSF Selected Predictors of Lead Loading in Indoor Surface Dust (ng/cm2) and Analysis
Criteria Across the Phase 3 Objectives in Region 5 (N=245)
Type: M = Measurement, Q = Question
Type | Description | Variable | Category | Regression 6-partition | Regression 9-partition | CHAID | Logistic Regression
Selected Predictors
M | INDOOR AIR CONCENTRATION | CONC020 | Air Measurements |  |  | * | NI
M | OUTDOOR AIR CONCENTRATION | CONC030 | Air Measurements |  |  | * | NI
M | YARD SOIL CONCENTRATION | CONC080 | Soil Measurements | * |  |  | NI
Q | CENTRAL AIR CONDITIONER? | B29B1 | Ventilation System (AC/Heat) |  |  | * |
Q | WALL/WINDOW AIR CONDITIONER? | B29B2 | Ventilation System (AC/Heat) |  |  |  | *
Q | MONTH START AIR CONDITIONING? | B29C1 | Cooling Season | * |  |  |
Q | # MONTHS IN HEATING SEASON | B32R | Heating Season |  |  | * |
Q | GROUP QUARTERS? | D05R | Number of People in Home |  |  |  | *
Q | PAST WEEK USED PORTABLE/CEILING FAN | F01D | Ventilation System (AC/Heat) |  |  | * |
Q | # DAYS PAST WK SMOKE/FUMES-OIL FURNACE | F01H2 | Smoke/Fumes/Burned Food |  |  |  |
Q | # TIMES PAST WEEK VACUUMING | F03A2 | Cleaning (Dust/Vacuuming/Sweep) |  |  |  |
Q | # DAYS PAST WEEK SINCE VACUUMING | F03A3 | Cleaning (Dust/Vacuuming/Sweep) |  |  |  |
Q | # TIMES PAST WEEK DUSTING | F03C2 | Cleaning (Dust/Vacuuming/Sweep) |  |  |  |
Q | SIZE OF COUNTY | GEO | Location/Characteristics |  |  | * |
Q | FLOORS IN BUILDING | T01 | Housing Structure/Size |  |  | * |
Q | TYPES OF FOUNDATION: SWB | T06J1 | Housing Structure |  |  | * |
Analysis Criteria
| Adjusted R Square |  |  | 0.199 | 0.035 |  |
| Mallows' Prediction Criterion |  |  | 5 | 2 |  |
| Relative Risk Estimate |  |  |  |  | 0.593 |
| % Change in Risk Estimate |  |  |  |  | 40.7 |
| % Correct Classification - High Exposure |  |  |  |  |  | 28
| Nagelkerke R Square |  |  |  |  |  | 0.228
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.

-------
•	The loading values for the indoor surface dust measurements ranged between 0 and 113900 ng/cm2.
•	The regression analysis for the 6-partition scenario is considered a poor fit; the regression analysis for
the 9-partition scenario is considered a poor fit.
•	The CHAID analysis is a fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiations
for the high levels may be useful. The ability to distinguish the other levels is mixed.
•	The logistic regression analysis is a poor fit for classifying people with high exposure levels.
•	Although the maximum value is very high in comparison to the rest of the measurements, the Box-Cox
transformation used tempers it considerably.
•	The predictors for this model across the three objectives include air and soil measurements, ventilation
system, housing characteristics and activities in the household.
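The tempering effect of a Box-Cox transformation on an extreme maximum can be illustrated with a short sketch. The transformation parameter actually used in the study is not given in this section, so the lambda values below are hypothetical; the comparison value of 100 ng/cm2 is also made up for illustration. The standard Box-Cox form requires positive inputs, so zero loadings would need a shift or similar handling that is not shown here.

```python
import math

# Standard one-parameter Box-Cox transform (lambda is an assumed, illustrative value).
def box_cox(x, lam):
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0; shift zero values first")
    if lam == 0:
        return math.log(x)  # limiting case as lambda -> 0
    return (x ** lam - 1.0) / lam

reported_max = 113900.0   # maximum loading reported above, ng/cm2
typical = 100.0           # hypothetical mid-range loading, for comparison only

raw_ratio = reported_max / typical
log_ratio = box_cox(reported_max, 0) / box_cox(typical, 0)
# On the raw scale the maximum is over a thousand times the comparison value;
# after a log-type Box-Cox transform the two differ by less than a factor of 3.
```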

-------
Table 5.2.2.2-EAR Selected Predictors of Lead Concentration in Personal Air (ng/m3) and Analysis Criteria
Across the Phase 3 Objectives in Region 5 (N=167)
Type: M = Measurement, Q = Question
Type | Description | Variable | Category | Regression 6-partition | Regression 9-partition | CHAID | Logistic Regression
Selected Predictors
M | INDOOR AIR CONCENTRATION | CONC020 | Air Measurements | * | * | * | NI
M | OUTDOOR AIR CONCENTRATION | CONC030 | Air Measurements | * | * |  | NI
Q | AV. MIN. TRAVELED ON ROADWAYS/HIGHWAYS | ATA19R | Commute Time/Distance | * |  |  |
Q | AV. MIN. PERFORMED VIGOROUS EXERCISE | ATA27R | Exercise | * | * |  | *
Q | AV. DAILY HOURS INSIDE ELSEWHERE | ATEJR | Time Away From Home | * |  |  |
Q | AV. DAILY HOURS OUTSIDE AT WORK/SCHOOL | ATEMRJD | Time Away From Home |  |  |  |
Q | # MIN. WITH SMOKER IN OTHER ENCL. AREA | B08D | Tobacco |  |  |  | *
Q | # TIMES PAST WEEK SWEEP INDOORS | F03B2 | Cleaning (Dust/Vacuuming/Sweep) |  |  |  | *
Q | # MIN PAST WEEK DUSTING | F03C4 | Cleaning (Dust/Vacuuming/Sweep) |  |  |  | *
Analysis Criteria
| Adjusted R Square |  |  | 0.448 | 0.396 |  |
| Mallows' Prediction Criterion |  |  | 7 | 4 |  |
| Relative Risk Estimate |  |  |  |  | 0.643 |
| % Change in Risk Estimate |  |  |  |  | 35.7 |
| % Correct Classification - High Exposure |  |  |  |  |  | 47.059
| Nagelkerke R Square |  |  |  |  |  | 0.45
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
•	The concentration values for the personal air measurements ranged between -0.55 and 254.3 ng/m3.
The Region 5 study provided concentration values as reported by the laboratory, which may reflect
some correction for blanks and calibration. Some of the values were below detection limit.
•	The regression analysis for the 6-partition scenario is considered a good fit; the regression analysis for
the 9-partition scenario is considered a good fit. Outdoor air concentration is a predictor for this
model, but not for the indoor air model. This seems a bit unusual. Outdoor air concentrations were
imputed for about 50% of the measurements.
•	The CHAID analysis is a fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiations
for the higher levels may be useful.
•	The logistic regression analysis is a good fit for classifying people with high exposure levels.
•	The predictors for this model across the three objectives include air measurements, time away from
home, and smoking and household activities.

-------
Table 5.2.2.2-EDT Selected Predictors of Lead Intake in Food and Beverage from Duplicate Diet (ug/day)
and Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=156)
Type: M = Measurement, Q = Question
Type | Description | Variable | Category | Regression 6-partition | Regression 9-partition | CHAID | Logistic Regression
Selected Predictors
Q | AT JOBS NO CONTACT WITH DUST? | AC14G9R | Working Conditions |  |  | * |
Q | AV. DAILY HOURS OUTSIDE AT WORK/SCHOOL | ATEMRO | Time Away From Home | * |  |  |
Q | # TIMES PAST WEEK SWEEP INDOORS | F03B2 | Cleaning (Dust/Vacuuming/Sweep) |  |  |  |
Q | # TIMES PAST WEEK DUSTING | F03C2 | Cleaning (Dust/Vacuuming/Sweep) |  |  | * |
Q | PAST WEEK DIETING? | F10 | Diet | * |  |  |
Q | NO.DAYS BREAKFAST PREP AT RESTAURANT | FD02BNYR | Food Preparation |  |  | * |
Q | NO.DAYS DINNER PREP AT HOME | FD08ANYR | Food Preparation |  |  | * |
Q | NO.DAYS AMT DUE TO OTHER | FD12HNYR | Food Intake |  |  | * |
Q | WHAT STATE DO YOU LIVE IN | STATE | Location |  |  | * |
Analysis Criteria
| Adjusted R Square |  |  | 0.133 | 0.062 |  |
| Mallows' Prediction Criterion |  |  | 4.368 | 2 |  |
| Relative Risk Estimate |  |  |  |  | 0.656 |
| % Change in Risk Estimate |  |  |  |  | 34.4 |
| % Correct Classification - High Exposure |  |  |  |  |  | NALR
| Nagelkerke R Square |  |  |  |  |  | NALR
* Variable included in the final model.
NALR - Analysis was run, but not finalized because of separation issues.
•	The intake values for the duplicate diet measurements ranged between 0.95 and 222.9 ug/day.
•	The regression analysis for the 6-partition scenario is considered a poor fit; the regression analysis for
the 9-partition scenario is considered a poor fit.
•	The CHAID analysis is a fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may or may not be useful.
•	The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix G.
•	The predictors for this model across the three objectives include diet-related, household and work-
related activities.
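Separation in logistic regression arises when a predictor (or a combination of predictors) splits the outcome classes perfectly, so that the maximum-likelihood coefficient estimates do not converge to finite values. The report does not say which predictors caused the separation here; the sketch below only shows one simple way such a condition could be detected for a single numeric predictor, and the function name is hypothetical.

```python
# Hypothetical check for complete separation by one numeric predictor.
# y holds 0/1 outcomes (e.g. 1 = high exposure); x holds the predictor values.

def has_complete_separation(x, y):
    x_low = [xi for xi, yi in zip(x, y) if yi == 0]
    x_high = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x_low or not x_high:
        raise ValueError("both outcome classes must be present")
    # Separation: every value in one class lies strictly beyond every value in the other.
    return max(x_low) < min(x_high) or max(x_high) < min(x_low)
```

In practice separation often involves several predictors jointly, or a sparse category with only one outcome, which a single-variable check like this would not detect.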

-------
Table 5.2.2.2-DOS Selected Predictors of Lead Dose in Blood (ug/dL) and Analysis Criteria Across the
Phase 3 Objectives in Region 5 (N=165)
Type: M = Measurement, Q = Question
Type | Description | Variable | Category | Regression 6-partition | Regression 9-partition | CHAID | Logistic Regression
Selected Predictors
M | FOOD AND BEVERAGE INTAKE CONCENTRATION | CONC138 | Diet Measurements | * | * |  | NI
Q | AT-JOBS CONTACT WITH SAW DUST? | AC14G1R | Working Conditions | * |  |  |
Q | AV. MIN. IN ENCLOSED WORKSHOP | ATA25R | Time at Home |  |  | * |
Analysis Criteria
| Adjusted R Square |  |  | 0.04 | 0.016 |  |
| Mallows' Prediction Criterion |  |  | 2.975 | 1.527 |  |
| Relative Risk Estimate |  |  |  |  | 0.9 |
| % Change in Risk Estimate |  |  |  |  | 10 |
| % Correct Classification - High Exposure |  |  |  |  |  | 17.647
| Nagelkerke R Square |  |  |  |  |  | 0.184
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
•	The concentration values for the blood measurements ranged between 0.4 and 13.1 ug/dL.
•	The regression analysis for the 6-partition scenario is considered a poor fit; the regression analysis for
the 9-partition scenario is considered a poor fit.
•	The CHAID analysis is a poor fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may not be useful.
•	The logistic regression analysis is a poor fit for classifying people with high exposure levels. There
are two potential routes for exposure to lead, inhalation and hand-to-mouth. Personal air is a good
measure of the inhalation route, and surface dust would be a good measure for the hand-to-mouth
route. Neither measurement was included in the logistic regression analysis as explained in section
4.5.5.
•	The predictors for this model across the three objectives include dietary measurements, working
conditions, and time away from home.

-------
5.2.3 VOCs
Appendix E describes the sources and human exposure routes for the primary VOCs analyzed in
the Region 5 study: Benzene, Chloroform, Tetrachloroethene, and Trichloroethene.
5.2.3.1 Benzene
Table 5.2.3.1-CIA Selected Predictors of Benzene Concentration in Indoor Air (ug/m3) and Analysis
Criteria Across the Phase 3 Objectives in Region 5 (N=248)
Type: M = Measurement, Q = Question
Type | Description | Variable | Category | Regression 6-partition | Regression 9-partition | CHAID | Logistic Regression
Selected Predictors
M | OUTDOOR AIR CONCENTRATION | CONC190 | Air Measurements |  |  | * | NI
Q | AIR COND ON DURING SAMPLING? | AC | Cooling Season | * | * |  | *
Q | AV. NUMBER CIGARETTES SMOKED | ATA15R | Tobacco | * |  |  |
Q | USE TOBACCO PRODUCTS? | B06A | Tobacco |  |  |  | *
Q | # MINUTES WITH SMOKER AT HOME | B08A | Tobacco | * |  | * |
Q | GARAGE LOCATION | B27B | Garage Structure/Activity |  |  | * |
Q | GAS-POWERED DEVICES STORED | B28 | Gasoline Usage |  |  |  | *
Q | # MONTHS IN COOLING SEASON | B29R | Cooling Season |  |  | * | *
Q | HEATING FUEL - ELECTRICITY? | B31C | Heating Fuel Usage |  |  | * |
Q | HOUSEHOLD INCOME | B44 | Participant Characteristics | * |  |  |
Q | # DAYS PAST WEEK USED FIREPLACE | F01K1 | Fireplace/Wood Stove |  |  |  | *
Q | # DAYS PAST WEEK USED OTHER AIR FILTER | F01O1 | Ventilation System (AC/Heat)/Filters |  |  |  | *
Q | # TIMES PAST WEEK USED PAINT/SOLVENT | F02A2 | Paint Usage |  |  | * |
Q | SIZE OF COUNTY | GEO | Location/Characteristics | * | * |  |
Analysis Criteria
| Adjusted R Square |  |  | 0.232 | 0.043 |  |
| Mallows' Prediction Criterion |  |  | 8.007 | 9.218 |  |
| Relative Risk Estimate |  |  |  |  | 0.575 |
| % Change in Risk Estimate |  |  |  |  | 42.5 |
| % Correct Classification - High Exposure |  |  |  |  |  | 36
| Nagelkerke R Square |  |  |  |  |  | 0.391
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.

-------
•	The concentration values for the indoor air measurements ranged between 0.75 and 156.3 ug/m3.
•	The regression analysis for the 6-partition scenario is considered a poor fit; the regression analysis for
the 9-partition scenario is considered a poor fit.
•	The CHAID analysis is a fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, some differentiation
of levels may be useful.
•	The logistic regression analysis is a poor-fair fit for classifying people with high exposure levels.
•	The predictors for this model across the three objectives include air measurements, household
characteristics, and activities involving smoking, fuel, and paint.

-------
Table 5.2.3.1-EAR Selected Predictors of Benzene Concentration in Personal Air (ug/m3) and Analysis
Criteria Across the Phase 3 Objectives in Region 5 (N=244)
Type: M = Measurement, Q = Question
Type | Description | Variable | Category | Regression 6-partition | Regression 9-partition | CHAID | Logistic Regression
Selected Predictors
M | INDOOR AIR CONCENTRATION | CONC180 | Air Measurements |  |  | * | NI
Q | AT-JOBS CONTACT WITH OTHER DUST? | AC14G7R | Working Conditions |  |  |  | *
Q | NO. DAYS GASOLINE ON SKIN | ATA02R | Gasoline Usage |  |  |  | *
Q | NO. DAYS IN ENCLOSED GARAGE WITH CAR | ATA03R | Garage Structure/Activity |  |  |  | *
Q | AV. NUMBER CIGARETTES SMOKED | ATA15R | Tobacco |  |  |  | *
Q | AV. MIN. PERFORMED MODERATE EXERCISE | ATA28R | Exercise |  |  |  | *
Q | USE TOBACCO PRODUCTS? | B06A | Tobacco |  |  |  | *
Q | # MINUTES WITH SMOKER AT WORK | B08B | Tobacco |  |  |  |
Q | PAST 6 MOS, COMMUTE BY BICYCLE? | B19E | Commute Time/Distance |  |  |  | *
Q | PROPERTY USED AS FARM OR RANCH? | B22 | Location/Characteristics |  |  |  | *
Q | MONTH STOP HEATING DEVICES | B33B | Heating Season |  | * |  |
Q | # DAYS PAST WEEK USED FIREPLACE | F01K1 | Fireplace/Wood Stove |  |  |  | *
Q | NUMBER IN HOUSEHOLD | HH_NUM_R | Number of People in Home |  |  |  |
Q | WHAT STATE DO YOU LIVE IN | STATE | Location |  |  |  | *
Analysis Criteria
| Adjusted R Square |  |  | 0.043 | 0.041 |  |
| Mallows' Prediction Criterion |  |  | 8.961 | 2 |  |
| Relative Risk Estimate |  |  |  |  | 0.514 |
| % Change in Risk Estimate |  |  |  |  | 48.6 |
| % Correct Classification - High Exposure |  |  |  |  |  | 56
| Nagelkerke R Square |  |  |  |  |  | 0.534
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
•	The concentration values for the personal air measurements ranged between 0.92 and 106.5 ug/m3.
•	The regression analysis for the 6-partition scenario is considered a poor fit; the regression analysis for
the 9-partition scenario is considered a poor fit.
•	The CHAID analysis is a fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, there is some
reasonable differentiation for the nodes with high predicted values.
•	The logistic regression analysis is a fair-good fit for classifying people with high exposure levels and
has a higher percent correct classification for high exposure levels than most of the other models.
•	The predictors for this model across the three objectives include air measurements, household
characteristics, and activities involving smoking and fuel.
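The Nagelkerke R Square used to judge the logistic regression fits rescales the Cox and Snell R Square so that its maximum is 1. A minimal sketch of the standard formula is given below; the function name and arguments are hypothetical, and the log-likelihoods of the study's fitted models are not reported in this section, so no attempt is made to reproduce the tabulated values.

```python
import math

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke R Square from the null and fitted model log-likelihoods.

    ll_null  -- log-likelihood of the intercept-only model
    ll_model -- log-likelihood of the fitted model
    n        -- number of observations
    """
    # Cox and Snell R Square: 1 - (L_null / L_model)^(2/n)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Largest value Cox and Snell can reach for this null model
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell
```

A model that does no better than the intercept-only model gives 0, and a model that fits the data perfectly (log-likelihood of 0) gives 1, which matches the "Maximum = 1" noted in the Table 5.1 guidelines.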

-------
Table 5.2.3.1-DOS Selected Predictors of Benzene Dose in Blood (ug/L) and Analysis Criteria Across the
Phase 3 Objectives in Region 5 (N=143)
Type: M = Measurement, Q = Question
Type | Description | Variable | Category | Regression 6-partition | Regression 9-partition | CHAID | Logistic Regression
Selected Predictors
M | PERSONAL INDOOR AIR CONCENTRATION | CONC160 | Air Measurements | * | * | * | NI
Q | AT-JOBS WEAR GLOVES? | AC14F1R | Working Conditions | * |  |  |
Q | DRANK WATER | AIA14R | Water Intake | * | * |  |
Q | IN ENCLOSED WORKSHOP | AIA25R | Time at Home | * |  |  |
Q | NO. DAYS PUMPED GAS | ATA01R | Gasoline Usage | * |  |  |
Q | AV. MIN. USED CLEANING SUPPLIES | ATA23R | Cleaning Supply Usage | * | * |  |
Analysis Criteria
| Adjusted R Square |  |  | 0.32 | 0.23 |  |
| Mallows' Prediction Criterion |  |  | 7 | 4 |  |
| Relative Risk Estimate |  |  |  |  | 0.837 |
| % Change in Risk Estimate |  |  |  |  | 16.3 |
| % Correct Classification - High Exposure |  |  |  |  |  | NALR
| Nagelkerke R Square |  |  |  |  |  | NALR
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
NALR - Analysis was run, but not finalized because of separation issues.
•	The concentration values for the blood measurements ranged between 0.03 and 2.20 ug/L.
•	The regression analysis for the 6-partition scenario is considered a fair fit; the regression analysis for
the 9-partition scenario is considered a poor fit.
•	The CHAID analysis is a poor fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may not be useful.
•	The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix G.
•	The predictors for this model across the three objectives include air measurements and various types
of personal activities.

-------
5.2.3.2 Chloroform
Table 5.2.3,2-ClA Selected Predictors of Chloroform Concentration in Indoor Air (ug/m3) and Analysis
Criteria Across the Phase 3 Objectives in Region 5 (N=245)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






M
OUTDOOR AIR
CONCENTRATION
CONC190
Air Measurements
*
*

Nl
M
TAP WATFR
CONCENTRATION
CONC200
Water
Measurements


*
Nl
Q
AIR COND ON DURING
SAMPLING?
AC
Cooling Season


*
*
Q
NO. DAYS TOBACCO
SMOKED IN HOME
ATA08R
Tobacco


•

Q
MONTHS SINCE
QUITTING TOBACCO USE
B06C
Tobacco



*
Q
# MINUTES WITH
SMOKER AT WORK
B08B
Tobacco



*
Q
TOBACCO SMOKING IN
HOME?
B09A
Tobacco



*
Q
DAYS PAST 3-MO. USE
(LEAD) OIL PAINT
B11B
Lead Use



•
Q
SOURCE OF RUNNING
WATER - PRIVATE WELL?
B26B2
Source of Water



ik
Q
WATER TREATMENT:
OTHER?
B26EV
Source of Water



*
Q
GAS-POWERED DEVICES
STORED
B28
Gasoline Usage


*

Q
HEATING FUEL - WOOD?
B31F
Heating Fuel Usage



*
Q
PAST 6 MONTHS,
DEODORIZERS USED
B42
Deodorizer Usage


*

Q
10+ PEOPLE AT
ADDRESS?
D04
Number of People
in Home


*

Q
HOW OFTEN CHANGE
FILTER IN DEVICE
F01P1R
Ventilation System
(AC/Heat)/Filters
•

•
*
Q
# TIMES PAST WEEK
USED PAINT/SOLVENT
F02A2
Paint Usage


•

Q
# DAYS PAST WK SINCE
USED GLUES
F02B3
Glue Usage


A

Q
# DAYS PAST WEEK
SINCE BURN FOOD
F04B3
Smoke/Fumes/
Burned Food



*
Q
NUMBER IN HOUSEHOLD
HH NUM
R
Number of People
in Home


*

Q
FLOORS LIVED
ON/MULTI-UNIT BLDG FR
m
TQ2
Housing
Structure/Size


*

Q
EXTERIOR SIDING -
OTHER
T06C7
Housing Structure



*
Q
TYPES OF FOUNDATION:
CRAWL SPACE
T06J2
Housing Structure
*



5-23

-------

Analysis Criteria







Adjusted R Square


0.26
0.235



Mallows' Prediction
Criterion


5
3



Relative Risk Estimate




0.488


% Change in Risk Estimate




51.2


% Correct Classification -
High Exposure





32

Nagelkerke R Square





0.43
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
•	The concentration values for the indoor air measurements ranged between 0 and 34.02 ug/m3.
•	The regression analysis for the 6-partition scenario is considered a poor fit; the regression analysis for
the 9-partition scenario is considered a poor fit. The predictors for the regression analysis seem
reasonable for this model, except for outdoor air and drinking water concentrations.
•	The CHAID analysis is a fair-good fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, a few of the nodes
may be useful in differentiating levels.
•	The logistic regression analysis is a poor-fair fit for classifying people with high exposure levels.
•	The predictors for this model across the three objectives include air and water measurements, housing
characteristics, source of water, personal activities, and activities involving smoking and fuel.
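The paired values in the Analysis Criteria rows of the tables in this section are consistent with reading the % Change in Risk Estimate as 100 x (1 - Relative Risk Estimate); for example, 0.488 pairs with 51.2. The text here does not state that formula, so the Python sketch below treats it as an assumption inferred from the tabulated pairs, and the function name is hypothetical.

```python
def percent_change_in_risk(relative_risk: float) -> float:
    """Return 100 * (1 - relative risk), rounded to one decimal place.

    Assumption: this relationship is inferred from the tabulated pairs,
    not quoted from the report.
    """
    return round((1.0 - relative_risk) * 100.0, 1)


# Relative Risk Estimate -> % Change in Risk Estimate, as tabulated for the
# Chloroform CHAID models (Tables 5.2.3.2-CIA, -EAR, and -DOS).
tabulated = {0.488: 51.2, 0.443: 55.7, 0.841: 15.9}

for rr, reported in tabulated.items():
    # Each computed value should match the reported table entry.
    assert percent_change_in_risk(rr) == reported
```

Under this reading, a relative risk estimate closer to zero corresponds to a larger percent change, which matches the direction of the fit descriptions given in the bullets.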

-------
Table 5.2.3.2-EAR Selected Predictors of Chloroform Concentration in Personal Air (ug/m3) and Analysis
Criteria Across the Phase 3 Objectives in Region 5 (N=240)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






M
INDOOR AIR
CONCENTRATION
CONC180
Air Measurements
*
*
*
NI
M
OUTDOOR AIR
CONCENTRATION
CONC190
Air Measurements
*
*

NI
Q
TOTAL HRS/WK WORKED
AT HOME, BOTH JOBS
AC14AIR
Time at Home
*



Q
SOURCE OF RUNNING
WATER - PRIVATE WELL?
B26B2
Source of Water
*
*


Q
WHAT STATE DO YOU
LIVE IN
STATE
Location
*




Analysis Criteria







Adjusted R Square


0.435
0.395



Mallows' Prediction
Criterion


6
4



Relative Risk Estimate




0.443


% Change in Risk Estimate




55.7


% Correct Classification -
High Exposure





NALR

Nagelkerke R Square





NALR
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
NALR - Analysis was run, but not finalized because of separation issues.
•	The concentration values for the personal air measurements ranged between 0 and 26.17 ug/m3.
•	The regression analysis for the 6-partition scenario is considered a good fit; the regression analysis for
the 9-partition scenario is considered a good fit.
•	The CHAID analysis is a fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation for
the high levels may not be useful.
•	The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix G.
•	The predictors for this model across the three objectives include air measurements, time at home, and
source of water.

-------
Table 5.2.3.2-DOS Selected Predictors of Chloroform Dose in Blood (ug/L) and Analysis Criteria Across
the Phase 3 Objectives in Region 5 (N=125)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






M
PERSONAL INDOOR AIR
CONCENTRATION
CONC160
Air Measurements
*

*
NI
Q
AT-JOBS CONTACT WITH
SAW DUST?
AC14G1R
Working
Conditions
*



Q
# DAYS PAST WEEK
USED WOOD/COAL
STOVE
F01G1
Fireplace/Wood
Stove



*
Q
# MIN PAST WEEK METAL
WORKING
F03G4
Metal Work



*

Analysis Criteria







Adjusted R Square


0.066
NA



Mallows' Prediction
Criterion


3
NA



Relative Risk Estimate




0.841


% Change in Risk Estimate




15.9


% Correct Classification -
High Exposure





15.385

Nagelkerke R Square





0.138
* Variable included in the final model.
NA: The variable was not significant in 9 or more partitions.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
•	The concentration values for the blood measurements ranged between 0 and 2.6 ug/L.
•	The regression analysis for the 6-partition scenario is considered a poor fit; no variables appeared in at
least 9 partitions.
•	The CHAID analysis is a poor fit for classifying people by their exposure level.
•	The logistic regression analysis is a poor fit for classifying people with high exposure levels.
•	The predictors for this model across the three objectives include air measurements, working
conditions, metal work, and fireplace/wood stove use.

-------
5.2.3.3 Tetrachloroethylene
Table 5.2.3.3-CIA Selected Predictors of Tetrachloroethylene Concentration in Indoor Air (ug/m3) and
Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=228)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






M
OUTDOOR AIR
CONCENTRATION
CONC190
Air Measurements
*
*
*
NI
Q
NO. DAYS
STARTED/TENDED FIRE
ATA07R
Fireplace/Wood
Stove




Q
DAYS PAST 3-MO. USING
LEAD SOLDER?
B11A
Lead Use
*
*


Q
SOURCE OF COOKING
WATER?
B26C
Source of Water
*



Q
HEATING FUEL -
ELECTRICITY?
B31C
Heating Fuel
Usage
*



Q
PAST WEEK
WINDOW/WALL AC
SETTING
F01B1
Ventilation System
(AC/Heat)
*



Q
SIZE OF COUNTY
GEO
Location/
Characteristics


*


Analysis Criteria







Adjusted R Square


0.403
0.324



Mallows' Prediction
Criterion


7
3



Relative Risk Estimate




0.562


% Change in Risk Estimate




43.8


% Correct Classification -
High Exposure





NALR

Nagelkerke R Square





NALR
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
NALR - Analysis was run, but not finalized because of separation issues.
•	The concentration values for the indoor air measurements ranged between 0 and 659.6 ug/m3.
•	The regression analysis for the 6-partition scenario is considered a fair-good fit; the regression analysis
for the 9-partition scenario is considered a fair fit. The predictors in the 9-partition model seem
reasonable; there is some question about the additional predictors for the 6-partition model.
•	The CHAID analysis is a fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may be useful.
•	The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix G.
•	The predictors for this model across the three objectives include air measurements, source of water,
ventilation system, and activities involving fuel and lead.

-------
Table 5.2.3.3-EAR Selected Predictors of Tetrachloroethylene Concentration in Personal Air (ug/m3) and
Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=228)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






M
INDOOR AIR
CONCENTRATION
CONC180
Air Measurements
*
*
*
NI
M
OUTDOOR AIR
CONCENTRATION
CONC190
Air Measurements
*
*

NI
Q
AV. NO. GLASSES OF
WATER
ATA14R
Water Intake
*



Q
SEX - PARTICIPANT
B02
Participant
Characteristics
*



Q
DAYS PAST 3 MO. USING
LEAD SOLDER?
B11A
Lead Use
*



Q
SIZE OF COUNTY
GEO
Location/
Characteristics
*
*


Q
WHAT STATE DO YOU
LIVE IN
STATE
Location
*
*



Analysis Criteria







Adjusted R Square


0.435
0.392



Mallows' Prediction
Criterion


9.959
5



Relative Risk Estimate




0.402


% Change in Risk Estimate




59.8


% Correct Classification -
High Exposure





NALR

Nagelkerke R Square





NALR
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
NALR - Analysis was run, but not finalized because of separation issues.
•	The concentration values for the personal air measurements ranged between 0 and 986.9 ug/m3.
•	The regression analysis for the 6-partition scenario is considered a fair fit; the regression analysis for
the 9-partition scenario is considered a fair-good fit.
•	The CHAID analysis is a fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may be useful.
•	The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix G.
•	The predictors for this model across the three objectives include air measurements, household
characteristics, lead use, and water intake.

-------
Table 5.2.3.3-DOS Selected Predictors of Tetrachloroethylene Dose in Blood (ug/L) and Analysis Criteria
Across the Phase 3 Objectives in Region 5 (N=147)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






M
PERSONAL INDOOR AIR
CONCENTRATION
CONC160
Air Measurements
*
*
*
NI
Q
NO. DAYS PUMPED GAS
ATA01R
Gasoline Usage
-
*

*
Q
# DAYS PAST WEEK
USED WOOD/COAL
STOVE
F01G1
Fireplace/Wood
Stove




Q
# TIMES EAT
BROC/CAULIF/BRUS
SPROUTS
F09A2
Specific Foods




Q
# TIMES EAT
CABBAGE/SLAW/SAUERKRAUT
F09B2
Specific Foods





Analysis Criteria







Adjusted R Square


0.141
0.141



Mallows' Prediction
Criterion


3
3



Relative Risk Estimate




0.762


% Change in Risk Estimate




23.8


% Correct Classification -
High Exposure





20

Nagelkerke R Square





0.269
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
•	The concentration values for the blood measurements ranged between 0.04 and 20 ug/L.
•	The regression analysis for the 6-partition scenario is considered a poor fit; the regression analysis for
the 9-partition scenario is considered a poor fit.
•	The CHAID analysis is a poor fit for classifying people by their exposure level.
•	The logistic regression analysis is a poor fit for classifying people with high exposure levels.
•	The predictors for this model across the three objectives include air measurements, specific foods in
the diet, and activities involving fuel.

-------
5.2.3.4 Trichloroethylene
Table 5.2.3.4-CIA Selected Predictors of Trichloroethylene Concentration in Indoor Air (ug/m3) and
Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=236)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






M
OUTDOOR AIR
CONCENTRATION
CONC190
Air Measurements
*
*
*
NI
M
DRINKING WATER
CONCENTRATION
CONC200
Water
Measurements


*
NI
Q
MONTHS SINCE
QUITTING TOBACCO USE
B06C
Tobacco
*


*
Q
DAYS PAST MO. FREQ.
PAINTING?
B10A
Paint Usage
*
*


Q
PAST 6 MONTHS FLOORS
REFINISHED
B25D
Housing Structure/
Remodeling



*
Q
MONTH START AIR
CONDITIONING?
B29C1
Cooling Season




Q
# MONTHS IN COOLING
SEASON
B29R
Cooling Season
*



Q
HOUSEHOLD INCOME
B44
Participant
Characteristics
*


*
Q
IS THIS HOUSE OR
APARTMENT,
[OWNERSHIP]
D09
Housing/Ownership




Q
PAST WEEK PARK CAR
IN?
F05
Garage
Structure/Activity

*


Q
FLOORS IN BUILDING
T01
Housing
Structure/Size
*
*


Q
IS THIS A MULTI-UNIT
BUILDING?
T02MULTR
Housing
Structure/Size
*
*

*

Analysis Criteria







Adjusted R Square


0.361
0.254



Mallows' Prediction
Criterion


11
6



Relative Risk Estimate




0.416


% Change in Risk Estimate




58.4


% Correct Classification -
High Exposure





12.5

Nagelkerke R Square





0.219
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
•	The concentration values for the indoor air measurements ranged between 0 and 120.4 ug/m3.
•	The regression analysis for the 6-partition scenario is considered a fair fit; the regression analysis for
the 9-partition scenario is considered a fair fit.
•	The CHAID analysis is a fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may be useful.
•	The logistic regression analysis is a poor fit for classifying people with high exposure levels.
•	The predictors for this model across the three objectives include air and water measurements,
household characteristics, and activities involving smoking and paint.

-------
Table 5.2.3.4-EAR Selected Predictors of Trichloroethylene Concentration in Personal Air (ug/m3) and
Analysis Criteria Across the Phase 3 Objectives in Region 5 (N=228)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






M
INDOOR AIR
CONCENTRATION
CONC180
Air Measurements


*
NI
M
OUTDOOR AIR
CONCENTRATION
CONC190
Air Measurements
*
*

NI
Q
TOTAL HRS/WK WORKED
AT HOME, BOTH JOBS
AC14AIR
Time at Home
*



Q
AV. MIN. PERFORMED
VIGOROUS EXERCISE
ATA27R
Exercise




Q
PAST 6 MONTHS WALL
ADDED OR REMOVED
B25B
Housing Structure/
Remodeling




Q
WALL/WINDOW AIR
CONDITIONER?
B29B2
Ventilation System
(AC/Heat)
*



Q
HOUSEHOLD INCOME
B44
Participant
Characteristics
*



Q
GROUP QUARTERS?
D05R
Number of People
in Home



*
Q
# DAYS PAST WEEK
USED WOOD/COAL
STOVE
F01G1
Fireplace/Wood
Stove



*
Q
IS THIS A MULTI-UNIT
BUILDING?
T02MULTR
Housing
Structure/Size
*
*



Analysis Criteria







Adjusted R Square


0.185
0.136



Mallows' Prediction
Criterion


20.448
3



Relative Risk Estimate




0.44


% Change in Risk Estimate




56


% Correct Classification -
High Exposure





17.391

Nagelkerke R Square





0.214
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
•	The concentration values for the personal air measurements ranged between 0 and 168.02 ug/m3.
•	The regression analysis for the 6-partition scenario is considered a poor fit; the regression analysis for
the 9-partition scenario is considered a poor fit.
•	The CHAID analysis is a fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may be useful.
•	The logistic regression analysis is a poor fit for classifying people with high exposure levels.
•	The predictors for this model across the three objectives include air measurements, household
characteristics, ventilation system, and personal activities.

-------
Table 5.2.3.4-DOS Selected Predictors of Trichloroethylene Dose in Blood (ug/L) and Analysis Criteria
Across the Phase 3 Objectives in Region 5 (N=149)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






M
PERSONAL INDOOR AIR
CONCENTRATION
CONC160
Air Measurements
*
*
*
NI
Q
AT-JOBS WEAR
GLOVES?
AC14F1R
Working
Conditions
*
*
*

Q
NO. DAYS GASOLINE ON
SKIN
ATA02R
Gasoline Usage



*
Q
NO. DAYS WITH YARD
DIRT/SOIL ON SKIN
ATA04R
Gardening



*
Q
NO. DAYS
STARTED/TENDED FIRE
ATA07R
Fireplace/Wood
Stove



*
Q
TOOK BATH
AIA11R
Hygiene



*
Q
FREQ. OF FIREPLACE
USE
B37C
Fireplace/Wood
Stove
*
*


Q
FISH FROM GREAT
LAKES?
BAA12B2R
Specific Foods


*

Q
# DAYS PAST WK SINCE
USED GLUES
F02B3
Glue Usage
*




Analysis Criteria







Adjusted R Square


0.53
0.454



Mallows' Prediction
Criterion


8.029
4



Relative Risk Estimate




0.352


% Change in Risk Estimate




64.8


% Correct Classification -
High Exposure





45.455

Nagelkerke R Square





0.386
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
•	The concentration values for the blood measurements ranged between 0.01 and 0.52 ug/L.
•	The regression analysis for the 6-partition scenario is considered a good fit; the regression analysis for
the 9-partition scenario is considered a good fit.
•	The CHAID analysis is a good fit for classifying people by their exposure level. The differentiation of
levels may not be useful because of the range of measurements.
•	The logistic regression analysis is a fair fit for classifying people with high exposure levels.
•	The predictors for this model across the three objectives include air measurements, specific foods in
the diet, hygiene, working conditions, and activities involving fuel and glue.

-------
5.3 Results for NHEXAS Arizona Study
5.3.1 General Comments
Analysis of the Arizona study considered questionnaire and measurement data from Stage 3.
Data from the following questionnaires were analyzed: Descriptive, Baseline, Follow up, Food
Follow up, Technician Walk-through, and Time-Activity Diary. Summary statistics on the
approximately 600 questionnaire variables following the Phase 1 process are found in Table H1-1.
The primary target metals analyzed were: Arsenic, Cadmium, Chromium, Lead, and Nickel;
the primary target VOCs analyzed were: Benzene, 1,3-Butadiene, Formaldehyde, Toluene, and
Trichloroethylene. The matrices or media sampled in the study included: air, dust, soil, water,
food, blood, and urine. Summary statistics for the measurement data used in the analyses are
included in Table H2-1. Results from twenty-six models across eight of the primary target
chemicals are presented here. 1,3-Butadiene and Trichloroethylene had an insufficient number
of measurements for analysis. Some models were only analyzed for Objective 3 by logistic
regression analysis because more than 50% of the values for the dependent variable were below
the detection limit. The results for each objective's analysis will be discussed in terms of good, fair,
or poor based on the guidelines in Table 5.1 and taking both criteria for each objective into
account.
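The screening rule described above (only the logistic regression analysis is run when more than 50% of the dependent-variable values are below the detection limit) can be sketched in Python as follows. This is an illustration only: the function name, the strict less-than comparison against a single detection limit, and the example values are assumptions, not the procedure used in the study.

```python
from typing import List, Sequence


def analyses_to_run(values: Sequence[float],
                    detection_limit: float,
                    max_fraction_below: float = 0.5) -> List[str]:
    """Pick the analyses for one model from its dependent-variable values.

    Assumption: a value counts as "below detection limit" when it is
    strictly less than the (single, hypothetical) detection limit.
    """
    below = sum(1 for v in values if v < detection_limit)
    fraction_below = below / len(values)
    if fraction_below > max_fraction_below:
        # Too many non-detects: only the Objective 3 analysis is run.
        return ["logistic regression"]
    return ["regression", "CHAID", "logistic regression"]


# Hypothetical measurements: 3 of 4 values fall below a limit of 1.0.
mostly_nondetect = [0.1, 0.2, 0.3, 5.0]
mostly_detected = [0.1, 2.0, 3.0, 5.0]

print(analyses_to_run(mostly_nondetect, detection_limit=1.0))
print(analyses_to_run(mostly_detected, detection_limit=1.0))
```

In the tables that follow, models screened out in this way show regression and CHAID columns marked as not run.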

-------
5.3.2 Metals
Appendix E describes the sources and human exposure routes for the primary metals analyzed in
the Arizona study: Arsenic, Cadmium, Chromium, Lead, and Nickel.
5.3.2.1 Arsenic
Table 5.3.2.1-CIA Selected Predictors of Arsenic Concentration in Indoor Air (ng/m3) and Analysis
Criteria Across the Phase 3 Objectives in Arizona (N=127)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition*
Regression
9-partition*
CHAID*
Logistic
Regression

Selected Predictors















Analysis Criteria







Adjusted R Square







Mallows' Prediction
Criterion







Relative Risk Estimate







% Change in Risk Estimate







% Correct Classification -
High Exposure





NALR

Nagelkerke R Square





NALR
* Analysis not run because of high percentage of samples below detection limit.
NALR - Analysis was run, but not finalized because of separation issues.
•	The concentration values for the indoor air measurements ranged between 1.71 and 22.3 ng/m3.
•	No regression analysis was run, because more than 50% of the indoor air measurements were below
detection limit.
•	No CHAID analysis was run, because more than 50% of the indoor air measurements were below
detection limit.
•	The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix H.
•	No predictors were selected for this model.

-------
Table 5.3.2.1-CSF Selected Predictors of Arsenic Loading in Indoor Surface Dust (ug/m2) and Analysis
Criteria Across the Phase 3 Objectives in Arizona (N=135)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






M
INDOOR AIR
CONCENTRATION
CONC103
Air Measurements
*


NI
M
FOUNDATION SOIL
CONCENTRATION
CONC122
Soil
Measurements
*


NI
M
TAP WATER
CONCENTRATION
CONC123
Water
Measurements
*


NI
Q
TOTAL HRS/WK WORKED
AT BOTH JOBS
AC14A2
Time Away From
Home
*



Q
AV. MIN. INDOORS WITH
SMOKER
ATA20Z
Tobacco
*



Q
PAST 6 MONTHS FLOORS
REFINISHED
B25D
Housing Structure/
Remodeling
*

*

Q
MONTH START AIR
CONDITIONING?
B29C1
Cooling Season


*

Q
# MOSTLY OUTDOOR
HOUSE PETS?
B43C
Pets
*



Q
HOUSEHOLD INCOME
B44
Participant
Characteristics
*



Q
WHAT COUNTY DO YOU
LIVE IN?
CNTY_Z
Location
*

*

Q
EXTERIOR SIDING -
CONCRETE BLOCK
T06C4
Housing Structure
*




Analysis Criteria







Adjusted R Square


0.517
NA



Mallows' Prediction
Criterion


11
NA



Relative Risk Estimate




0.694

% Change in Risk Estimate




30.6


% Correct Classification -
High Exposure





NALR

Nagelkerke R Square



NALR
* Variable included in the final model.
NA: Analysis run, but no variables were selected.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
NALR - Analysis was run, but not finalized because of separation issues.
•	The loading values for the indoor surface dust measurements ranged between 0.06 and 22.9 ug/m2.
•	The regression analysis for the 6-partition scenario is considered a good fit; no variables appeared in at
least nine of the partitions.
•	The CHAID analysis is a poor-fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may not be useful.
•	The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix H.
•	The predictors for this model across the three objectives include air, soil and water measurements,
household characteristics, pets, smoking activities, and time away from home.

-------
Table 5.3.2.1-EDR Selected Predictors of Arsenic Loading in Dermal (ug/m2) and Analysis Criteria Across
the Phase 3 Objectives in Arizona (N=154)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






M
SURFACE DUST
LOADING
CONC101
Dust
Measurements


*
NI
M
TAP WATER
CONCENTRATION
CONC123
Water
Measurements
*
*
*
NI
Q
AIR CONDITIONING ON
DURING SAMPLING?
AC
Cooling Season


*

Q
SEX OF PARTICIPANT
B02
Participant
Characteristics


*

Q
USE TOBACCO
PRODUCTS?
B06A
Tobacco




Q
PAST WEEK DUSTING
F03C1
Cleaning
(Dust/Vacuuming/
Sweep)





Analysis Criteria







Adjusted R Square


0.053
0.025



Mallows' Prediction
Criterion


3
2



Relative Risk Estimate




0.569


% Change in Risk Estimate




43.1


% Correct Classification -
High Exposure





NALR

Nagelkerke R Square





NALR
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
NALR - Analysis was run, but not finalized because of separation issues.
•	The loading values for the dermal measurements ranged between 0.28 and 76.7 ug/m2.
•	The regression analysis for the 6-partition scenario is considered a poor fit; the regression analysis for
the 9-partition scenario is considered a poor fit.
•	The CHAID analysis is a fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may be useful.
•	The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix H.
•	The predictors for this model across the three objectives include dust and water measurements, and
smoking and personal activities.

-------
Table 5.3.2.1-EDT Selected Predictors of Arsenic Intake in Total Diet from Duplicate Diet (ug/day) and
Analysis Criteria Across the Phase 3 Objectives in Arizona (N=158)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






Q
AT-JOBS CONTACT WITH
ROAD DUST?
AC14G2Z
Working
Conditions


*

Q
HEIGHT (METERS) -
PARTICIPANT
B04CMD
Participant
Characteristics


*

Q
DAYS IN 3-MO. EAT
HOME-GROWN CANNED
CROP
B12B
Specific Foods




Q
SOURCE OF COOKING
WATER?
B26C
Source of Water
*



Q
SOURCE OF DRINKING
WATER
B26D
Source of Water
*
*
*

Q
WATER TREATMENT:
REVERSE OSMOSIS?
B26EIN
Source of Water
*
*


Q
PDAYS BREAKFAST
USUAL 6-7 TIMES/WK
FD03APCZ
Food Intake
*



Q
PDAYS DINNER USUAL 6-
7 TIMES/WK
FD09APCZ
Food Intake


*

Q
NO.DAYS REPORTED ON
SNACK COLLECTION
FD10DNDZ
Food Collection


*

Q
DUST LEVEL RATING
T04A
Dust Level
*



Q
EXT PAINTING
CHALKING/CHIPPING/PEELING
T06D
Housing
Structure





Analysis Criteria







Adjusted R Square


0.239
0.117



Mallows' Prediction
Criterion


8
4



Relative Risk Estimate




0.726


% Change in Risk Estimate




27.4


% Correct Classification -
High Exposure





NALR

Nagelkerke R Square





NALR
* Variable included in the final model.
NALR - Analysis was run, but not finalized because of separation issues.
•	The intake values for the duplicate diet measurements ranged between 0.05 and 71.92 ug/day.
•	The regression analysis for the 6-partition scenario is considered a poor-fair fit; the regression analysis
for the 9-partition scenario is considered a poor fit. The predictors for the regression analysis seem
reasonable for this model, except for T06D.
•	The CHAID analysis is a poor fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may be useful.
•	The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix H.
•	The predictors for this model across the three objectives include diet-related activities, source of water,
household characteristics, and working conditions.

-------
Table 5.3.2.1-DOS Selected Predictors of Arsenic Dose in Urine (ug/g creatinine) and Analysis Criteria
Across the Phase 3 Objectives in Arizona (N=166)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






M
INDOOR AIR
CONCENTRATION
CONC111
Air Measurements
*


NI
Q
PERFORMED VIGOROUS
EXERCISE
AIA27Z
Exercise


*

Q
AV. MIN. INDOORS WITH
SMOKER
ATA20Z
Tobacco


*

Q
AV. DAILY HOURS
OUTSIDE ELSEWHERE
ATE_OZ
Time Away From
Home


*

Q
PAST 6 MONTHS,
DEODORIZERS USED
B42
Deodorizer Usage


*

Q
WHAT COUNTY DO YOU
LIVE IN?
CNTY_Z
Location
*

*

Q
PAST WEEK USED AIR
CENTRAL HEAT
F01L
Ventilation System
(AC/Heat)


*

Q
DRIPLINE METERS FROM
WALL
T06G2A
Dripline


*


Analysis Criteria






Adjusted R Square


0.24
NA


Mallows' Prediction
Criterion


3.011
NA


Relative Risk Estimate




0.534


% Change in Risk Estimate




46.6


% Correct Classification -
High Exposure





NALR

Nagelkerke R Square





NALR
* Variable included in the final model.
NA: Analysis run, but no variables were selected.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
NALR - Analysis was run, but not finalized because of separation issues.
•	The adjusted concentration values for the urine measurements ranged between 0.47 and 400 ug/g
creatinine.
•	The regression analysis for the 6-partition scenario is considered a poor fit; no variables appeared in at
least nine of the partitions.
•	The CHAID analysis is a fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may be useful.
•	The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix H.
•	The predictors for this model across the three objectives include air measurements, household
characteristics, smoking and personal activities.

-------
5.3.2.2 Cadmium
Table 5.3.2.2-CSF Selected Predictors of Cadmium Loading in Indoor Surface Dust (ug/m2) and Analysis
Criteria Across the Phase 3 Objectives in Arizona (N=128)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






Q
AT-JOBS CONTACT WITH
MINE DUST?
AC14G5Z
Working
Conditions
*



Q
USE TOBACCO
PRODUCTS?
B06A
Tobacco
*



Q
# MINUTES WITH
SMOKER AT HOME
B08A
Tobacco



*
Q
# DAYS PAST WEEK
USED AIR CENTRAL
HEAT
F01L1
Ventilation System
(AC/Heat)




Q
WAS HEATING ON
DURING SAMPLING
PERIOD?
HEAT
Heating Season
*



Q
DRIPLINE METERS FROM
WALL
T06G2A
Dripline



*

Analysis Criteria







Adjusted R Square


0.126
NA



Mallows' Prediction
Criterion


4
NA



Relative Risk Estimate




NA


% Change in Risk Estimate




NA


% Correct Classification -
High Exposure





15.385

Nagelkerke R Square





0.232
* Variable included in the final model.
NA: Analysis run, but no variables were selected.
•	The loading values for the indoor surface dust measurements ranged between 0.1 and 11.1 ug/m2.
•	The regression analysis for the 6-partition scenario is considered a poor fit; no variables appeared in at
least nine of the partitions.
•	No predictors were identified for the CHAID analysis.
•	The logistic regression analysis is a poor fit for classifying people with high exposure levels.
•	The predictors for this model across the three objectives include smoking activities, working
conditions and household characteristics.

-------
Table 5.3.2.2-EDR Selected Predictors of Cadmium Loading in Dermal (ug/m2) and Analysis Criteria
Across the Phase 3 Objectives in Arizona (N=134)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition*
Regression
9-partition*
CHAID
Logistic
Regression

Selected Predictors






Q
SMOKED CIGARETTES
AIA15Z
Tobacco



*
Q
PERFORMED MODERATE
EXERCISE
A1A2BZ
Exercise



*
Q
AV. DAILY HOURS
OUTSIDE AT
WORK/SCHOOL
ATEMZjO
Time Away From
Home



*
Q
DAYS PAST MO.
REMOVING PAINT
(OTHER)?
B10C
Paint Usage



*
Q
# TIMES PAST WEEK
USED SANDER
F02F2
Wood Work



*

Analysis Criteria







Adjusted R Square







Mallows' Prediction
Criterion







Relative Risk Estimate







% Change in Risk Estimate







% Correct Classification -
High Exposure





35.714

Nagelkerke R Square





0.455
* Analysis not run because of high percentage of samples below detection limit.
•	The loading values for the dermal measurements ranged between 9.43 and 17170 ug/m2.
•	No regression analysis was run, because more than 50% of the dermal measurements were below
detection limit.
•	No CHAID analysis was run, because more than 50% of the dermal measurements were below
detection limit.
•	The logistic regression analysis is a fair fit for classifying people with high exposure levels.
•	The predictors for this model across the three objectives include time away from home, and smoking
and personal activities.
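For readers unfamiliar with the Nagelkerke R Square criterion reported for the logistic regression, the following Python sketch shows the standard definition. It is not taken from the NHEXAS analysis itself; the function name and the log-likelihood values are hypothetical and chosen only for illustration.

```python
import math


def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Nagelkerke R^2 from the null and fitted model log-likelihoods."""
    # Cox-Snell R^2: 1 - (L_null / L_model)^(2/n)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Largest value Cox-Snell R^2 can reach for this sample: 1 - L_null^(2/n)
    upper_bound = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / upper_bound


# Hypothetical log-likelihoods for a sample of 134 participants.
r2 = nagelkerke_r2(ll_null=-60.0, ll_model=-45.0, n=134)
print(f"Nagelkerke R Square: {r2:.3f}")
```

Rescaling by the upper bound is what lets the statistic reach 1 for a perfectly fitting model, which the unscaled Cox-Snell value cannot do.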
5-40

-------
Table 5.3.2.2-EDT Selected Predictors of Cadmium Intake in Total Diet from Duplicate Diet (ug/day) and
Analysis Criteria Across the Phase 3 Objectives in Arizona (N=111)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






Q
SEX OF PARTICIPANT
B02
Participant
Characteristics
*



Q
HEIGHT (METERS) -
PARTICIPANT
B04CMD
Participant
Characteristics


*

Q
WEIGHT (KILOGRAMS) -
PARTICIPANT
B05AMD
Participant
Characteristics
*

*

Q
MONTHS SINCE
QUITTING TOBACCO USE
B06C
Tobacco


*

Q
PAST WEEK ON
DIABETIC DIET
F11G
Diet
*



Q
PDAYS LUNCH EATEN AT
HOME
FD05A1PZ
Food Intake


*

Q
PDAYS LUNCH EATEN AT
WORK SITE
FD05C1P2
Food Intake


*


Analysis Criteria







Adjusted R Square


0.246
0.167



Mallows' Prediction
Criterion


4
2



Relative Risk Estimate




0.548


% Change in Risk Estimate




45.2


% Correct Classification -
High Exposure




NALR

Nagelkerke R Square



NALR
* Variable included in the final model.
NALR - Analysis was run, but not finalized because of separation issues.
•	The intake values for the duplicate diet measurements ranged between 0.03 and 1.71 ug/day.
•	The regression analysis for the 6-partition scenario is considered a poor fit; the regression analysis for
the 9-partition scenario is considered a poor fit.
The CHAID analysis is a fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may be useful, but this is a narrow range of values.
•	The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix H.
The predictors for this model across the three objectives include participant characteristics, and
activities involving diet and smoking.
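The two CHAID criteria in the table above appear to be linked by simple arithmetic: the tabulated % Change in Risk Estimate matches 100 × (1 − Relative Risk Estimate). This section does not state that formula, so the Python sketch below treats it as an inference from the tabulated values rather than the authors' documented definition; the function name is hypothetical.

```python
def percent_change_in_risk(relative_risk: float) -> float:
    """Percent reduction in risk implied by a relative risk estimate."""
    return round(100.0 * (1.0 - relative_risk), 1)


# (relative risk estimate, % change in risk estimate) pairs as tabulated
# for CHAID analyses in Section 5.3.2.
reported = [(0.548, 45.2), (0.672, 32.8), (0.695, 30.5), (0.753, 24.7)]

for rr, pct in reported:
    # Each tabulated pair is reproduced by the assumed relationship.
    assert abs(percent_change_in_risk(rr) - pct) < 1e-9
```

Under this reading, a relative risk estimate close to 1 means the CHAID tree does little better than having no predictors, which is consistent with the "poor fit" wording used for small percent changes.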
5-41

-------
Table 5.3.2.2-DOS-BLD Selected Predictors of Cadmium Dose in Blood (ug/L) and Analysis Criteria Across
the Phase 3 Objectives in Arizona (N=162)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






Q
USE TOBACCO
PRODUCTS?
B06A
Tobacco
*



Q
# MINUTES WITH
SMOKER AT HOME
B08A
Tobacco
*



Q
SOURCE OF RUNNING
WATER-PUB/COMM
SYSTEM
B26B1
Source of Water
*



Q
SOURCE OF COOKING
WATER?
B26C
Source of Water
*



Q
FREQ. OF WOOD/COAL
STOVE USE
B36C
Fireplace/Wood
Stove
*



Q
DUST LEVEL RATING
T04A
Dust Level
*



Q
MATERIAL - ENTRANCE
TO STRUCTURE: SOIL
T06F1
Housing Structure
*




Analysis Criteria







Adjusted R Square


0.23
NA



Mallows' Prediction
Criterion


8
NA



Relative Risk Estimate




NA


% Change in Risk Estimate




NA


% Correct Classification -
High Exposure





NALR

Nagelkerke R Square





NALR
* Variable included in the final model.
NA: Analysis run, but no variables were selected.
NALR - Analysis was run, but not finalized because of separation issues.
•	The concentration values for the blood measurements ranged between 0.1 and 4.3 ug/L.
•	The regression analysis for the 6-partition scenario is considered a poor-fair fit; no variables appeared
in at least nine of the partitions.
•	No predictors were selected for the CHAID analysis.
•	The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix H.
•	The predictors for this model across the three objectives include household characteristics, source of
water, and smoking activities.
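The "separation issues" that prevent a logistic regression from being finalized arise when some predictor (or combination of predictors) divides the high-exposure and other participants perfectly or nearly so, so that the maximum likelihood estimates do not converge. The report does not describe how separation was diagnosed; the Python sketch below is one simple, assumed check for a single categorical predictor, with hypothetical data and function names.

```python
from collections import defaultdict


def levels_with_one_outcome(predictor, outcome):
    """Return predictor levels whose observations all share a single outcome."""
    outcomes_by_level = defaultdict(set)
    for level, y in zip(predictor, outcome):
        outcomes_by_level[level].add(y)
    # A level observed with only one outcome class signals (quasi-)complete
    # separation for that predictor.
    return sorted(level for level, ys in outcomes_by_level.items() if len(ys) == 1)


# Hypothetical data: 1 = high exposure, 0 = not high exposure.
uses_tobacco = ["yes", "yes", "yes", "no", "no", "no", "no"]
high_exposure = [1, 1, 1, 0, 1, 0, 0]

flagged = levels_with_one_outcome(uses_tobacco, high_exposure)
print(flagged)  # every "yes" respondent is high exposure, so "yes" is flagged
```

With small samples and many questionnaire items, such levels are easy to encounter, which may explain why several of the logistic regression analyses in this section are marked NALR.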
5-42

-------
Table 5.3.2.2-DOS-URN Selected Predictors of Cadmium Dose in Urine (ug/g creatinine) and Analysis
Criteria Across the Phase 3 Objectives in Arizona (N=171)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






M
DERMAL LOADING
CONC109
Dermal
Measurements
*

*
NI
Q
AIR CONDITIONING ON
DURING SAMPLING?
AC
Cooling Season


*

Q
DOORS AND WINDOWS
LEFT OPEN
AIA26Z
Doors/Window
Open


*

Q
PERFORMED VIGOROUS
EXERCISE
AIA27Z
Exercise


*

Q
AV. NO. TIMES WASHED
HANDS
ATA18Z
Hygiene


*

Q
AV. DAILY HOURS INSIDE
AT HOME
ATE_EZ
Time at Home
*
*


Q
AV. DAILY HOURS INSIDE
AT WORK/SCHOOL
ATEGZ_E
Time Away From
Home
*
*
*

Q
# CIGARETTES/DAY
SMOKED [CATEGORIES]
B07A
Tobacco
*



Q
# CIGARS/DAY SMOKED
B07B
Tobacco



*
Q
PAST 6 MOS, COMMUTE
BY OTHER MEANS?
B19G1
Commute
Time/Distance
*


*
Q
HOUSEHOLD INCOME
B44
Participant
Characteristics
*



Q
WHAT COUNTY DO YOU
LIVE IN?
CNTY_Z
Location


*

Q
PAST WEEK USED AIR
CENTRAL HEAT
F01L
Ventilation System
(AC/Heat)
*



Q
# TIMES PAST WEEK
USED SANDER
F02F2
Wood Work



*
Q
SURROUNDING AREA:
COMMERCIAL
T06A3
Location/
Characteristics
*
*


Q
DRIPLINE LOCATION
T06G1
Dripline


*

Q
DRIPLINE METERS FROM
WALL
T06G2A
Dripline



*

Analysis Criteria







Adjusted R Square


0.346
0.094



Mallows' Prediction
Criterion


8
3



Relative Risk Estimate




0.672


% Change in Risk Estimate




32.8


% Correct Classification -
High Exposure





16.667

Nagelkerke R Square





0.23
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
• The adjusted concentration values for the urine measurements ranged between 0.02 and 40 ug/g
creatinine.
The regression analysis for the 6-partition scenario is considered a fair fit; the regression analysis for
the 9-partition scenario is considered a poor fit.
5-43

-------
• The CHAID analysis is a poor fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may not be useful.
The logistic regression analysis is a poor fit for classifying people with high exposure levels.
The predictors for this model across the three objectives include dermal measurements, household
characteristics, time at or away from home, smoking and personal activities.
5-44

-------
5.3.2.3 Chromium
Table 5.3.2.3-CSF Selected Predictors of Chromium Loading in Indoor Surface Dust (ug/m2) and Analysis
Criteria Across the Phase 3 Objectives in Arizona (N=128)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






M
FOUNDATION SOIL
CONCENTRATION
CONC122
Soil
Measurements
*
*

NI
Q
AT-JOBS CONTACT WITH
SAWDUST?
AC14G1Z
Working
Conditions
*



Q
MONTH START HEATING
DEVICES
B33A
Heating Season
*



Q
WHAT COUNTY DO YOU
LIVE IN?
CNTY_Z
Location
*
*


Q
IS THIS HOUSE OR
APARTMENT &
[OWNERSHIP]
D09
Housing/Ownership




Q
# DAYS PAST WEEK
USED AIR CENTRAL
HEAT
F01L1
Ventilation System
(AC/Heat)



*
Q
WAS HEATING ON
DURING SAMPLING
PERIOD?
HEAT
Heating Season
*



Q
YARD MATERIAL:
WOOD/DECK
T06I5
Housing Structure
*
*

*

Analysis Criteria







Adjusted R Square


0.391
0.171



Mallows' Prediction
Criterion


8
4



Relative Risk Estimate




NA


% Change in Risk Estimate




NA


% Correct Classification -
High Exposure





0

Nagelkerke R Square





0.109
* Variable included in the final model.
NA: Analysis run, but no variables were selected.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
•	The loading values for the indoor surface dust measurements ranged between 0.5 and 56.1 ug/m2.
•	The regression analysis for the 6-partition scenario is considered a fair-good fit; the regression analysis
for the 9-partition scenario is considered a poor fit.
•	No predictors were selected for the CHAID analysis.
•	The logistic regression analysis is a poor fit for classifying people with high exposure levels.
•	The predictors for this model across the three objectives include soil measurements, household
characteristics, and working conditions.
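The Adjusted R Square criterion penalizes the ordinary R Square for the number of predictors in the model, which matters here because the 6-partition models tend to retain more variables than the 9-partition models. The Python sketch below shows the standard formula; the unadjusted R Square and predictor count are hypothetical, since the table reports only the adjusted value.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for a linear model with n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    # Shrink the explained-variance fraction by the degrees-of-freedom ratio.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical example: R^2 of 0.43 with 8 predictors and 128 observations.
print(round(adjusted_r2(r2=0.43, n=128, p=8), 3))
```

Because the penalty grows with the number of predictors, a model that adds weak predictors can raise R Square while lowering Adjusted R Square.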
5-45

-------
Table 5.3.2.3-EDR Selected Predictors of Chromium Loading in Dermal (ug/m2) and Analysis Criteria
Across the Phase 3 Objectives in Arizona (N=134)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition*
Regression
9-partition*
CHAID*
Logistic
Regression

Selected Predictors















Analysis Criteria







Adjusted R Square







Mallows' Prediction
Criterion







Relative Risk Estimate







% Change in Risk Estimate







% Correct Classification -
High Exposure





NALR

Nagelkerke R Square





NALR
* Analysis not run because of high percentage of samples below detection limit.
NALR - Analysis was run, but not finalized because of separation issues.
•	The loading values for the dermal measurements ranged between 13.58 and 426.9 ug/m2.
•	No regression analysis was run, because more than 50% of the dermal measurements were below
detection limit.
•	No CHAID analysis was run, because more than 50% of the dermal measurements were below
detection limit.
•	The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix H.
•	No predictors were selected for this model.
5-46

-------
Table 5.3.2.3-EDT Selected Predictors of Chromium Intake in Total Diet from Duplicate Diet (ug/day) and
Analysis Criteria Across the Phase 3 Objectives in Arizona (N=120)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






Q
WEIGHT (KILOGRAMS)-
PARTICIPANT
B05AMD
Participant
Characteristics
*

*

Q
AGE KIDNEY TROUBLE
DIAGNOSED
B21V4
Health Problems
*
*


Q
PDAYS BREAKFAST
PREP AT WORK SITE
FD02CPCZ
Food Preparation


*

Q
PDAYS LUNCH PREP AT
RESTAURANT
FD05BPCZ
Food Preparation


*

Q
PDAYS BREAKFAST
FOOD/BEV NOT
COLLECTED
FD10APCZ
Food Collection




Q
PDAYS SNACK
FOOD/BEV NOT
COLLECTED
FD10DPCZ
Food Intake





Analysis Criteria







Adjusted R Square


0.134
0.051



Mallows' Prediction
Criterion


4
2



Relative Risk Estimate




0.695


% Change in Risk Estimate




30.5


% Correct Classification -
High Exposure





NALR

Nagelkerke R Square





NALR
* Variable included in the final model.
NALR - Analysis was run, but not finalized because of separation issues.
•	The intake values for the duplicate diet measurements ranged between 0.08 and 12.63 ug/day.
•	The regression analysis for the 6-partition scenario is considered a poor fit; the regression analysis for
the 9-partition scenario is considered a poor fit.
•	The CHAID analysis is a poor-fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may not be useful.
•	The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix H.
The predictors for this model across the three objectives include diet-related activities and health
condition.
5-47

-------
Table 5.3.2.3-DOS Selected Predictors of Chromium Dose in Urine (ug/g creatinine) and Analysis Criteria
Across the Phase 3 Objectives in Arizona (N=171)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






M
DERMAL
CONCENTRATION
CONC109
Dermal
Measurements
*


NI
Q
DOORS AND WINDOWS
LEFT OPEN
AIA26Z
Doors/Window
Open
*
*


Q
PERFORMED MODERATE
EXERCISE
AIA28Z
Exercise


*

Q
AV. DAILY HOURS
OUTSIDE AT
WORK/SCHOOL
ATEMZ_0
Time Away From
Home


*

Q
SEX OF PARTICIPANT
B02
Participant
Characteristics
*
*
*

Q
PAST 6 MOS, COMMUTE
BY OTHER MEANS?
B19G1
Commute
Time/Distance
*



Q
HOUSEHOLD INCOME
B44
Participant
Characteristics




Q
WHAT COUNTY DO YOU
LIVE IN?
CNTY_Z
Location
*

*

Q
PAST WEEK TAKE
CHROMIUM
SUPPLEMENT?
F07C2
Medications/Supplements

*


Q
DUST LEVEL RATING
T04A
Dust Level





Analysis Criteria







Adjusted R Square


0.407
0.086



Mallows' Prediction
Criterion


9
4



Relative Risk Estimate




0.753


% Change in Risk Estimate




24.7


% Correct Classification -
High Exposure





NALR

Nagelkerke R Square





NALR
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
NALR - Analysis was run, but not finalized because of separation issues.
• The adjusted concentration values for the urine measurements ranged between 0.07 and 40 ug/g
creatinine.
The regression analysis for the 6-partition scenario is considered a good fit; the regression analysis for
the 9-partition scenario is considered a poor fit. The predictors for the regression analysis seem
reasonable for this model, although a variable relating to tobacco use was expected.
The CHAID analysis is a poor fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may not be useful.
The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix H.
The predictors for this model across the three objectives include dermal measurements, household
characteristics, time away from home and personal activities.
5-48

-------
5.3.2.4 Lead
Table 5.3.2.4-CSF Selected Predictors of Lead Loading in Indoor Surface Dust (ug/m2) and Analysis
Criteria Across the Phase 3 Objectives in Arizona (N=128)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition*
Regression
9-partition*
CHAID
Logistic
Regression

Selected Predictors






Q
DO YOU HAVE HOUSE
PETS?
B43A
Pets



*
Q
SURROUNDING AREA:
INDUSTRIAL
T06A4
Location/
Characteristics



*
Q
EXT PAINTING
CHALKING/CHIPPING/PEELING
T06D
Housing Structure





Analysis Criteria







Adjusted R Square







Mallows' Prediction
Criterion







Relative Risk Estimate







% Change in Risk Estimate







% Correct Classification -
High Exposure





30.769

Nagelkerke R Square





0.278
* Analysis not run because of high percentage of samples below detection limit.
•	The loading values for the indoor surface dust measurements ranged between 1.90 and 355.8 ug/m2.
•	No regression analysis was run, because more than 50% of the indoor surface dust measurements were
below detection limit.
•	No CHAID analysis was run, because more than 50% of the indoor surface dust measurements were
below detection limit.
•	The logistic regression analysis is a poor-fair fit for classifying people with high exposure levels.
•	The predictors for this model across the three objectives include pets and household characteristics.
5-49

-------
Table 5.3.2.4-EDT Selected Predictors of Lead Intake in Total Diet from Duplicate Diet (ug/day) and
Analysis Criteria Across the Phase 3 Objectives in Arizona (N=154)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






Q
SOURCE OF COOKING
WATER?
B26C
Source of Water
*



Q
PDAYS LUNCH EATEN
FD04PCZ
Food Intake
*

*

Q
PDAYS DIET DIFF DUE
TO TRVL/VACATION
FD14APCZ
Diet



*
Q
NO.DAYS REPORTED ON
DIET CAUSE
FD14NDZ
Diet



*
Q
EXT PAINTING
CHALKING/CHIPPING/PEELING
T06D
Housing Structure





Analysis Criteria







Adjusted R Square


0.194
0.103



Mallows' Prediction
Criterion


4
2



Relative Risk Estimate




0.893


% Change in Risk Estimate




10.7


% Correct Classification -
High Exposure





6.25

Nagelkerke R Square



0.071
* Variable included in the final model.
•	The intake values for the duplicate diet measurements ranged between 0.04 and 9.21 ug/day.
•	The regression analysis for the 6-partition scenario is considered a poor fit; the regression analysis for
the 9-partition scenario is considered a poor fit.
The CHAID analysis is a poor fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may not be useful.
The logistic regression analysis is a poor fit for classifying people with high exposure levels.
The predictors for this model across the three objectives include diet-related activities, source of water,
and household characteristics.
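The "% Correct Classification - High Exposure" criterion can be read as the share of truly high-exposure participants that the logistic model also classifies as high exposure. That reading is an assumption, since this section does not define the criterion; the Python sketch below illustrates it with hypothetical counts and a hypothetical function name.

```python
def pct_correct_high_exposure(actual, predicted):
    """Percent of actual high-exposure cases (1) that are predicted as high (1)."""
    high_total = sum(1 for a in actual if a == 1)
    if high_total == 0:
        raise ValueError("no high-exposure cases to classify")
    hits = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    return 100.0 * hits / high_total


# Hypothetical example: 16 high-exposure participants, only 1 identified.
actual = [1] * 16 + [0] * 4
predicted = [1] + [0] * 19
print(pct_correct_high_exposure(actual, predicted))  # 6.25
```

A value this low would mean that almost all high-exposure participants are missed, which is in line with the "poor fit" assessment given for the logistic regression.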
5-50

-------
Table 5.3.2.4-DOS Selected Predictors of Lead Dose in Blood (ug/dL) and Analysis Criteria Across the
Phase 3 Objectives in Arizona (N=162)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






Q
AT JOBS, CONTACT WITH
UNKNOWN CHEMS?
AC14J12Z
Working
Conditions
*
*


Q
SEX OF PARTICIPANT
B02
Participant
Characteristics


*

Q
AGE INTESTINAL/BOWEL
TROUBLE DIAGNOSED
B21N4
Health Problems
*

*

Q
WHAT COUNTY DO YOU
LIVE IN?
CNTY_Z
Location


*

Q
# TIMES PAST WEEK
GARDENING
F03E2
Gardening


*

Q
FLOORS IN BUILDING
T01
Housing
Structure/Size
*



Q
TYPES OF FOUNDATION:
SLAB
T06J1
Housing Structure


*


Analysis Criteria







Adjusted R Square


0.181
0.075



Mallows' Prediction
Criterion


4
2



Relative Risk Estimate




0.541


% Change in Risk Estimate




45.9


% Correct Classification -
High Exposure





NALR

Nagelkerke R Square





NALR
* Variable included in the final model.
NALR - Analysis was run, but not finalized because of separation issues.
•	The concentration values for the blood measurements ranged between 0.3 and 18 ug/dL.
•	The regression analysis for the 6-partition scenario is considered a poor fit; the regression analysis for
the 9-partition scenario is considered a poor fit.
The CHAID analysis is a poor-fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may be useful.
The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix H.
The predictors for this model across the three objectives include household characteristics, health
condition, and working conditions.
5-51

-------
5.3.2.5 Nickel
Table 5.3.2.5-CSF Selected Predictors of Nickel Loading in Indoor Surface Dust (ug/m2) and Analysis
Criteria Across the Phase 3 Objectives in Arizona (N=128)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






M
FOUNDATION SOIL
CONCENTRATION
CONC122
Soil
Measurements
*

*
NI
Q
AT-JOBS CONTACT WITH
SAWDUST?
AC14G1Z
Working
Conditions
*



Q
SMOKED
CIGARS/PIPEFULS
AIA16Z
Tobacco



*
Q
MONTH START HEATING
DEVICES
B33A
Heating Season




Q
WHAT COUNTY DO YOU
LIVE IN?
CNTY_Z
Location
*
*


Q
# DAYS PAST WEEK
USED AIR CENTRAL
HEAT
F01L1
Ventilation System
(AC/Heat)




Q
WAS HEATING ON
DURING SAMPLING
PERIOD?
HEAT
Heating Season
*




Analysis Criteria







Adjusted R Square


0.292
0.122



Mallows' Prediction
Criterion


6
3



Relative Risk Estimate




0.875


% Change in Risk Estimate




12.5


% Correct Classification -
High Exposure





0

Nagelkerke R Square





0.132
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
•	The loading values for the indoor surface dust measurements ranged between 0.7 and 75.3 ug/m2.
•	The regression analysis for the 6-partition scenario is considered a poor-fair fit; the regression analysis
for the 9-partition scenario is considered a poor fit.
•	The CHAID analysis is a poor fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may not be useful.
The logistic regression analysis is a poor fit for classifying people with high exposure levels.
•	The predictors for this model across the three objectives include soil measurements, household
characteristics, smoking activities, and working conditions.
5-52

-------
Table 5.3.2.5-EDR Selected Predictors of Nickel Loading in Dermal (ug/m2) and Analysis Criteria Across
the Phase 3 Objectives in Arizona (N=134)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition*
Regression
9-partition*
CHAID*
Logistic
Regression

Selected Predictors















Analysis Criteria







Adjusted R Square







Mallows' Prediction
Criterion







Relative Risk Estimate







% Change in Risk Estimate







% Correct Classification -
High Exposure





NALR

Nagelkerke R Square





NALR
* Analysis not run because of high percentage of samples below detection limit.
NALR - Analysis was run, but not finalized because of separation issues.
•	The loading values for the dermal measurements ranged between 36.44 and 524.9 ug/m2.
•	No regression analysis was run, because more than 50% of the dermal measurements were below
detection limit.
•	No CHAID analysis was run, because more than 50% of the dermal measurements were below
detection limit.
•	The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix H.
•	No predictors were selected for this model.
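The screening rule stated in the notes above (no regression or CHAID analysis when more than 50% of the measurements fall below the detection limit) can be sketched in Python as follows. The function names, the detection limit, and the sample values are hypothetical; only the 50% threshold comes from the text.

```python
def fraction_below_detection_limit(values, detection_limit):
    """Fraction of measurements that fall below the detection limit."""
    if not values:
        raise ValueError("no measurements supplied")
    below = sum(1 for v in values if v < detection_limit)
    return below / len(values)


def analysis_is_run(values, detection_limit, max_fraction_below=0.5):
    """Apply the rule: skip the analysis when more than 50% are below the limit."""
    return fraction_below_detection_limit(values, detection_limit) <= max_fraction_below


# Hypothetical loadings (ug/m2) and detection limit.
mostly_detected = [12.0, 40.0, 55.0, 20.0, 90.0]   # 2 of 5 below the limit
mostly_censored = [10.0, 15.0, 20.0, 50.0]         # 3 of 4 below the limit
print(analysis_is_run(mostly_detected, detection_limit=36.0))
print(analysis_is_run(mostly_censored, detection_limit=36.0))
```

The logistic regression is still attempted in these cases because classifying participants as high exposure or not does not require a quantified value for every sample.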
5-53

-------
Table 5.3.2.5-EDT Selected Predictors of Nickel Intake in Total Diet from Duplicate Diet (ug/day) and
Analysis Criteria Across the Phase 3 Objectives in Arizona (N=149)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Important Variables






Q
AGE KIDNEY TROUBLE
DIAGNOSED
B21V4
Health Problems
*



Q
PDAYS LUNCH PREP AT
SCHOOL
FD05DPCZ
Food Preparation



*
Q
NO.DAYS REPORTED ON
LUNCH COLLECTION
FD10BNDZ
Food Collection



*
Q
PDAYS DIET DIFF DUE
TO WT CONTROL DIET
FD14BPCZ
Diet



*

Analysis Criteria







Adjusted R Square


0.045
NA



Mallows' Prediction
Criterion


2
NA



Relative Risk Estimate




NA


% Change in Risk Estimate




NA


% Correct Classification -
High Exposure





6.667

Nagelkerke R Square





0.275
* Variable included in the final model.
NA: Analysis run, but no variables were selected.
•	The intake values for the duplicate diet measurements ranged between 0.38 and 16.55 ug/day.
•	The regression analysis for the 6-partition scenario is considered a poor fit; no variables were selected
in at least nine of the partitions.
•	No predictors were selected for the CHAID analysis.
•	The logistic regression analysis is a poor fit for classifying people with high exposure levels.
•	The predictors for this model across the three objectives include diet-related activities and health
condition.
5-54

-------
Table 5.3.2.5-DOS Selected Predictors of Nickel Dose in Urine (ug/g creatinine) and Analysis Criteria
Across the Phase 3 Objectives in Arizona (N=171)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






M
FOOD AND BEVERAGE
INTAKE
CONCENTRATION
CONC130
Diet
Measurements



Nl
Q
TOTAL HRS/WK WORKED
AT HOME, BOTH JOBS
AC14AIZ
Time at Home



*
Q
PERFORMED VIGOROUS
EXERCISE
AIA27Z
Exercise
*
*
*

Q
PERFORMED MODERATE
EXERCISE
AIA28Z
Exercise
*



Q
USE TOBACCO
PRODUCTS?
B06A
Tobacco


*
*
Q
# TIMES PAST WEEK
USED SANDER
F02F2
Wood Work



*
Q
PAST WEEK TAKE
DIURETICS
F06A2
Medications/Supplements



*
Q
PAST WEEK TAKE
OTHER MEDICINE?
F06E2
Medications/Supplements
*

*

Q
DUST LEVEL RATING
T04A
Dust Level
*



Q
DRIPLINE LOCATION
T06G1
Dripline



*

Analysis Criteria







Adjusted R Square


0.211
0.108



Mallows' Prediction
Criterion


8.518
3



Relative Risk Estimate




0.829


% Change in Risk Estimate




17.1


% Correct Classification -
High Exposure





27.778

Nagelkerke R Square





0.321
* Variable included in the final model.
NI - Measurement variable was not included in this analysis; however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
•	The adjusted concentration values for the urine measurements ranged between 0.48 and 23.53 ug/g.
•	The regression analysis for the 6-partition scenario is considered a poor fit; the regression analysis for
the 9-partition scenario is considered a poor fit.
The CHAID analysis is a poor fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may not be useful.
•	The logistic regression analysis is a fair fit for classifying people with high exposure levels.
The predictors for this model across the three objectives include diet measurements, medication,
household characteristics, and smoking and personal activities.
5-55

-------
5.3.3 VOCs
Appendix E describes the sources and human exposure routes for the primary VOCs analyzed in
the Arizona study: Benzene, 1,3-Butadiene, Formaldehyde, and Toluene. (1,3-Butadiene was not
analyzed in this report.)
5.3.3.1 Benzene
Table 5.3.3.1-CIA Selected Predictors of Benzene Concentration in Indoor Air (ug/m3) and Analysis
Criteria Across the Phase 3 Objectives in Arizona (N=166)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






M
OUTDOOR AIR
CONCENTRATION
CONC311
Air Measurements


*
NI
Q
DOORS AND WINDOWS
LEFT OPEN
AIA26Z
Doors/Window
Open


*

Q
# CIGARETTES/DAY
SMOKED [CATEGORIES]
B07A
Tobacco
*



Q
# MINUTES WITH
SMOKER IN ENCL.
VEHICLE
B08C
Tobacco


*

Q
PAST 6 MONTHS WALL
ADDED OR REMOVED
B25B
Housing Structure/
Remodeling
*



Q
DOORWAY FROM
GARAGE TO LIVING
QTRS?
B27C
Garage
Structure/Activity


*

Q
WALL/WINDOW AIR
CONDITIONER?
B29B2
Ventilation System
(AC/Heat)


*

Q
HEATING FUEL -
ELECTRICITY?
B31C
Heating Fuel
Usage




Q
FREQ. OF WOOD/COAL
STOVE USE
B36C
Fireplace/Wood
Stove
*



Q
WHAT COUNTY DO YOU
LIVE IN?
CNTY_Z
Location


*

Q
SURROUNDING AREA:
INDUSTRIAL
T06A4
Location/
Characteristics
*
*


Q
EXTERIOR SIDING-
CONCRETE BLOCK
T06C4
Housing Structure
*




Analysis Criteria







Adjusted R Square


0.408
0.22



Mallows' Prediction
Criterion


7
3



Relative Risk Estimate




0.444


% Change in Risk Estimate




55.6


% Correct Classification -
High Exposure





NALR

Nagelkerke R Square





NALR
* Variable included in the final model.
NI - Measurement variable was not included in this analysis, however, its selection in the Regression or CHAID Analysis indicates
that it may be a predictor for this analysis.
NALR - Analysis was run, but not finalized because of separation issues.
5-56

-------
•	The concentration values for the indoor air measurements ranged between 0.67 and 2463 ug/m3.
The regression analysis for the 6-partition scenario is considered a good fit; the regression analysis for
the 9-partition scenario is considered a poor fit.
•	The CHAID analysis is a fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may not be useful.
•	The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix H.
•	The predictors for this model across the three objectives include air measurements, household
characteristics, ventilation system, and smoking activities.
5-57

-------
Table 5.3.3.1-DOS Selected Predictors of Benzene Dose in Blood (ug/L) and Analysis Criteria Across the
Phase 3 Objectives in Arizona (N=112)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






Q
USE TOBACCO
PRODUCTS?
B06A
Tobacco
*



Q
# CIGARETTES/DAY
SMOKED [CATEGORIES]
B07A
Tobacco


*

Q
DAYS IN 3-MO. EAT
HOME GROWN CANNED
CROP
B12B
Specific Foods





Analysis Criteria







Adjusted R Square


0.31
NA



Mallows' Prediction
Criterion


3
NA



Relative Risk Estimate




0.687


% Change in Risk Estimate




31.3


% Correct Classification -
High Exposure





NALR

Nagelkerke R Square





NALR
* Variable included in the final model.
NA: Analysis run, but no variables were selected.
NALR - Analysis was run, but not finalized because of separation issues.
•	The concentration values for the blood measurements ranged between 0.01 and 0.97 ug/L.
•	The regression analysis for the 6-partition scenario is considered a fair fit; no variables appeared in at
least nine of the partitions. The predictors for the regression analysis seem reasonable for this model
except for B12B.
•	The CHAID analysis is a poor fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may not be useful because of the narrow range of values.
The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix H.
The predictors for this model across the three objectives include specific foods in the diet and activities
involving smoking.
5-58

-------
5.3.3.2 Formaldehyde
Table 5.3.3.2-CIA Selected Predictors of Formaldehyde Concentration in Indoor Air (ug/m3) and Analysis
Criteria Across the Phase 3 Objectives in Arizona (N=167)
Type: M = Measurement, Q = Question
Type
Description
Variable
Category
Regression
6-partition
Regression
9-partition
CHAID
Logistic
Regression

Selected Predictors






Q
CLEANED
FIREPLACE, WOOD STOVE
AIA06Z
Fireplace/Wood
Stove
*
*
*

Q
SOURCE OF DRINKING
WATER
B26D
Source of Water
*



Q
WALL/WINDOW AIR
CONDITIONER?
B29B2
Ventilation System
(AC/Heat)
*



Q
FREQ. OF
PORT./UNVENTED GAS
HEATER USE
B35C
Heating Fuel
Usage




Q
INDOOR
PESTICIDE,MONTH LAST
USED
B38E
Pesticide Use




Q
DAYS SINCE LAST USED-
INSECTICIDES
F02G3
Pesticide Use
*




Analysis Criteria







Adjusted R Square


0.168
0.042



Mallows' Prediction
Criterion


6
2



Relative Risk Estimate




0.852


% Change in Risk Estimate




14.8


% Correct Classification -
High Exposure





NALR

Nagelkerke R Square





NALR
* Variable included in the final model.
NALR - Analysis was run, but not finalized because of separation issues.
The concentration values for the indoor air measurements ranged between 6.14 and 407.7 ug/m3.
•	The regression analysis for the 6-partition scenario is considered a poor fit; the regression analysis for
the 9-partition scenario is considered a poor fit.
•	The CHAID analysis is a poor fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may not be useful.
•	The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix H.
•	The predictors for this model across the three objectives include pesticide use, source of water,
ventilation system, and activities involving fuel.
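The "% Change in Risk Estimate" values reported for the CHAID analyses in Section 5.3.3 appear to equal 100 x (1 - relative risk estimate); for example, a relative risk estimate of 0.852 pairs with 14.8. The short Python sketch below reproduces that arithmetic. The formula is inferred from the tabulated pairs rather than quoted from the report, and the function name is illustrative only.

```python
def pct_change_in_risk(relative_risk):
    """Percent reduction in the risk estimate implied by a relative risk estimate.

    Assumption: formula inferred from the tabulated value pairs,
    not stated explicitly in the report.
    """
    return round(100.0 * (1.0 - relative_risk), 1)


# Relative risk estimates reported for the CHAID analyses in Section 5.3.3
reported = {
    "Formaldehyde, indoor air": 0.852,
    "Toluene, indoor air": 0.538,
    "Toluene, blood": 0.736,
}

for name, rr in reported.items():
    print(f"{name}: relative risk {rr} -> {pct_change_in_risk(rr)}% change")
```

Read this way, a relative risk estimate close to 1 means the tree reduces the risk estimate very little compared with no splitting at all, which is consistent with the "poor fit" assessments above.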
5-59

-------
5.3.3.3 Toluene
Table 5.3.3.3-CIA Selected Predictors of Toluene Concentration in Indoor Air (ug/m3) and Analysis
Criteria Across the Phase 3 Objectives in Arizona (N=166)
Type: M = Measurement, Q = Question

| Type | Description | Variable | Category | Regression 6-partition | Regression 9-partition | CHAID | Logistic Regression |
|---|---|---|---|---|---|---|---|
| | Selected Predictors | | | | | | |
| Q | DOORS AND WINDOWS LEFT OPEN | AIA26Z | Doors/Window Open | * | | | |
| Q | DOORWAY FROM GARAGE TO LIVING QTRS? | B27C | Garage Structure/Activity | | | * | |
| Q | INDOOR PESTICIDE, WHO MIXED | B38H | Pesticide Use | | | | |
| Q | PAST WEEK USED AIR CENTRAL HEAT | F01L | Ventilation System (AC/Heat) | | | | |
| Q | SURROUNDING AREA: INDUSTRIAL | T06A4 | Location/Characteristics | * | | | |
| | Analysis Criteria | | | | | | |
| | Adjusted R Square | | | 0.316 | 0.139 | | |
| | Mallows' Prediction Criterion | | | 6 | 2 | | |
| | Relative Risk Estimate | | | | | 0.538 | |
| | % Change in Risk Estimate | | | | | 46.2 | |
| | % Correct Classification - High Exposure | | | | | | NALR |
| | Nagelkerke R Square | | | | | | NALR |
* Variable included in the final model.
NALR - Analysis was run, but not finalized because of separation issues.
•	The concentration values for the indoor air measurements ranged between 0.76 and 368 ug/m3.
•	The regression analysis for the 6-partition scenario is considered a fair fit; the regression analysis for
the 9-partition scenario is considered a poor fit. Painting-related variables such as B10A, B10B, and
B10C were expected.
•	The CHAID analysis is a fair fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may not be useful.
•	The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix H.
•	The predictors for this model across the three objectives include pesticide use, household
characteristics, and ventilation system.
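The regression columns in these tables report an adjusted R square and Mallows' prediction criterion (Cp). As a reminder of what those two criteria measure, the sketch below implements the standard textbook definitions. The function names and all numeric inputs are hypothetical; they are not taken from the NHEXAS analyses, and the report's software may compute the criteria with different conventions.

```python
def adjusted_r_square(r_square, n_cases, n_predictors):
    """Adjusted R square: R square penalized for the number of predictors."""
    return 1.0 - (1.0 - r_square) * (n_cases - 1) / (n_cases - n_predictors - 1)


def mallows_cp(sse_subset, mse_full, n_cases, n_params):
    """Mallows' Cp for a subset model.

    sse_subset: error sum of squares of the subset model
    mse_full:   error mean square of the full (reference) model
    n_params:   number of parameters in the subset model, intercept included
    """
    return sse_subset / mse_full - (n_cases - 2 * n_params)


# Hypothetical inputs chosen only to exercise the formulas.
print(adjusted_r_square(r_square=0.35, n_cases=166, n_predictors=5))

# When the subset model fits as well as the full model, SSE/MSE = n - p,
# so Cp equals p, the number of parameters.
print(mallows_cp(sse_subset=320.0, mse_full=2.0, n_cases=166, n_params=6))
```

A Cp close to the number of parameters suggests little bias from the omitted predictors, while a low adjusted R square, as in several of the 9-partition models, indicates that the selected predictors explain little of the variation in the measurements.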
5-60

-------
Table 5.3.3.3-DOS Selected Predictors of Toluene Dose in Blood (ug/L) and Analysis Criteria Across the
Phase 3 Objectives in Arizona (N=114)
Type: M = Measurement, Q = Question

| Type | Description | Variable | Category | Regression 6-partition | Regression 9-partition | CHAID | Logistic Regression |
|---|---|---|---|---|---|---|---|
| | Selected Predictors | | | | | | |
| Q | IN ENCLOSED GARAGE WITH CAR | AIA03Z | Garage Structure/Activity | * | | | |
| Q | IN VEHICLE WITH SMOKER | AIA21Z | Tobacco | * | * | | |
| Q | USE TOBACCO PRODUCTS? | B06A | Tobacco | * | | | |
| Q | # CIGARETTES/DAY SMOKED [CATEGORIES] | B07A | Tobacco | * | | * | |
| Q | DAYS PAST MO. STRIPPING PAINT (CHEM)? | B10B | Paint Usage | | | | |
| | Analysis Criteria | | | | | | |
| | Adjusted R Square | | | 0.424 | 0.117 | | |
| | Mallows' Prediction Criterion | | | 6 | 2 | | |
| | Relative Risk Estimate | | | | | 0.736 | |
| | % Change in Risk Estimate | | | | | 26.4 | |
| | % Correct Classification - High Exposure | | | | | | NALR |
| | Nagelkerke R Square | | | | | | NALR |
* Variable included in the final model.
NALR - Analysis was run, but not finalized because of separation issues.
•	The concentration values for the blood measurements ranged between 0.01 and 2.6 ug/L.
•	The regression analysis for the 6-partition scenario is considered a good fit; the regression analysis for
the 9-partition scenario is considered a poor fit.
•	The CHAID analysis is a poor fit for classifying people by their exposure level. Looking at the
distributions within the nodes and the range of the predicted values of the nodes, the differentiation of
levels may not be useful.
•	The logistic regression analysis could not be finalized because of separation issues. Potential
predictors for this model are included in Appendix H.
•	The predictors for this model across the three objectives include personal activities and activities
involving smoking and paint.
5-61

-------
5.4 Summary of Results
The predictors selected as having strong relationships with the dependent measurement variables
for the Phase 3 objectives in general were reasonable for the conceptual models in which they
were evaluated. Although many of the model-based analyses could not be run because of limited
data, there were some models with fair to good fits and consistencies in the types of questions
selected as predictors. Tables 2.1, 2.2, 2.3, and 2.4 give a good high-level comparison of the
predictors selected for each objective by study and chemical class.
In Objective 1, Modeling and Regression Analysis, the combination of the categorical regression
and stepwise regression analyses was a little more successful than the analyses for the other two
objectives in terms of model-based analyses with fair to good fits. The categorical regression
analysis helped strengthen the visibility of the relationships by not requiring the relationship to
be linear as in traditional regression analysis. Although the precise relationships between the
dependent measurement variables and the selected predictors are not shown in this report,
additional review of the transformations in categorical regression can offer a better
understanding for future studies.
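To illustrate why relaxing the linearity requirement can make a relationship more visible, the sketch below replaces each category code of a predictor with the mean response in that category and compares the correlation before and after. This is a deliberately simplified, single-predictor stand-in for the idea of category quantification; it is not the categorical regression algorithm used in the report, and the data and function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def quantify_by_category_mean(codes, response):
    """Replace each category code with the mean response observed in that category."""
    groups = defaultdict(list)
    for code, value in zip(codes, response):
        groups[code].append(value)
    means = {code: sum(values) / len(values) for code, values in groups.items()}
    return [means[code] for code in codes]


# Hypothetical questionnaire item coded 1-3, with the highest measurements
# in the middle category (a non-linear, non-monotone pattern).
codes = [1, 1, 2, 2, 3, 3]
response = [1.0, 2.0, 8.0, 9.0, 2.0, 1.0]

quantified = quantify_by_category_mean(codes, response)
print("correlation using raw codes:      ", round(pearson(codes, response), 3))
print("correlation using quantifications:", round(pearson(quantified, response), 3))
```

With the raw codes treated as a linear scale the association is essentially invisible, whereas the quantified categories track the response closely. Reviewing the fitted transformations serves the same purpose in a full categorical regression.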
In Objective 2, Classifying Individuals by Their Exposure Level, the CHAID analysis was
probably the least successful of the techniques. In most cases, there was not much
differentiation in the dependent measurement values with which the procedure could create
distinct classifications nor was there an adequate number of cases to allow the technique to
mitigate the variability in the variables. In many instances, the categories of predictors selected
from the CHAID analysis were consistent with predictors selected for the other objectives.
Some of the model analyses can identify characteristics of groups with potentially high exposure
levels. Very little can be identified at lower levels of exposure.
In Objective 3, Classifying Individuals with High Exposure Levels, there were a few fair or good
models. The percent correct classification for high-exposure levels in these models might offer a
reasonable first-cut screening assessment. Many of the model-based analyses could not be run
because of unresolved separation issues related to the small number of cases for the high-
exposure level group. The logistic regression analysis was used for a few models where it did
not make sense to perform analyses for the other objectives because of the high percentage of
below detection limit values.
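Separation arises in logistic regression when some value of a predictor occurs only among high-exposure cases or only among the remaining cases, so that the corresponding maximum likelihood coefficient has no finite estimate. With few high-exposure cases this is easy to trigger. The sketch below shows one simple screening check for a single categorical predictor; the data and the function name are hypothetical, and the check is an illustration rather than the diagnostic procedure used in the report.

```python
from collections import Counter


def separated_categories(predictor, high_exposure):
    """Return predictor categories whose cases all fall in one outcome class.

    predictor:     category label for each case
    high_exposure: 1 if the case is in the high-exposure group, else 0
    """
    counts = Counter(zip(predictor, high_exposure))
    categories = sorted(set(predictor))
    return [c for c in categories
            if counts[(c, 0)] == 0 or counts[(c, 1)] == 0]


# Hypothetical data: every "Y" respondent is high exposure, which produces
# an empty cell in the predictor-by-outcome table.
predictor = ["Y", "Y", "N", "N", "N", "N", "N", "N"]
high_exposure = [1, 1, 0, 0, 0, 1, 0, 0]

print(separated_categories(predictor, high_exposure))
```

Any category returned by such a check marks an empty cell in the cross-tabulation. Collapsing categories or collecting more high-exposure cases are the usual remedies.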
5-62

-------
6 References
Adgate, J. L., D. B. Barr, C. A. Clayton, L. E. Eberly, N. C. G. Freeman, P. J. Lioy, L. L.
Needham, E. D. Pellizzari, J. J. Quackenboss, A. Roy, and K. Sexton. 2001.
Measurement of Children's Exposure to Pesticides: Analysis of Urinary Metabolite
Levels in a Probability-Based Sample. Environ Health Perspect 109:583-590.
Agency for Toxic Substances and Disease Registry (ATSDR). 1997. Toxicological profile for
nickel. Atlanta, Georgia: U.S. Department of Health and Human Services, Public Health
Service.
Agresti, A. 1990. Categorical Data Analysis. New York, NY: John Wiley & Sons, Inc.
Allison, Paul D. 1999. Logistic Regression, Using the SAS System, Theory and Application.
Cary, NC: SAS Institute Inc.
ATSDR. 1988. The nature and extent of lead poisoning in children in the United States: a report to
Congress. Atlanta, Georgia: Agency for Toxic Substances and Disease Registry.
ATSDR. 1999. Toxicological profile for lead. CAS# 7439-92-1. Atlanta, Georgia: Agency for
Toxic Substances and Disease Registry.
ATSDR Case Studies in Environmental Medicine Arsenic Toxicity.

ATSDR.	Chromium Toxic Profile.
ATSDR.	Fact sheet
ATSDR.	Fact sheet
ATSDR.	Fact sheet
ATSDR.	PCE fact sheet 
ATSDR.	Public Health Statement, July 1989
ATSDR.	Toxic Data Fact Sheet, 1997.
ATSDR. Who's at Risk: Arsenic Case Study.

6-1

-------
Belsley, D. A. 1991. Conditioning Diagnostics: Collinearity and Weak Data in Regression.
New York, NY: John Wiley & Sons, Inc.
Box, G. E. P., and D. R. Cox. 1964. An Analysis of Transformations (with Discussion).
Journal of the Royal Statistical Society, Series B 26(2): 211-52.
Breiman, Leo, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. 1984. Classification
and Regression Trees. Boca Raton, Florida: Chapman & Hall/CRC.
Breiman, L., and J. H. Friedman. 1985. Estimating Optimal Transformations for Multiple
Regression Analysis and Correlation. JASA. 80: 580-98.
Buja, A. 1990. Remarks on Functional Canonical Variates, Alternating Least Squares Method
and ACE. Ann Statist. 18: 1032-69.
Cadmium Website  (Environment/Section 2.2.1)
California OEHHA. Chronic Toxicity Summary. Reference Exposure Levels Adopted by
OEHHA as of September 2002. See 
California OEHHA. Determination of Noncancer Chronic Reference Exposure Levels,
December 2000
California OEHHA. Draft Nickel Goal. 
California Reference Exposure Levels: Chronic and Acute Toxicity Summaries for Chloroform.
California Reference Exposure Levels: Acute Toxicity Summaries for Chloroform
(citation therein of Calabrese and Kenyon, "Air Toxics and Risk Assessment", Lewis
Publishers, 1991)
California Reference Exposure Levels: Chronic Toxicity Summary for Benzene.
Canadian Occupational Health and Safety Resource.

Clayton, C. A., E. D. Pellizzari, R. W. Whitmore, R. L. Perritt, and J. J. Quackenboss. 1999.
National Human Exposure Assessment Survey (NHEXAS): Distributions and
Associations of Lead, Arsenic and Volatile Organic Compounds in EPA Region 5. J
Expo Anal Environ Epidemiol 9(5): 381-392.
Cook, M. E., and Morrow, H. 1995. Anthropogenic Sources of Cadmium in Canada. National
Workshop on Cadmium Transport Into Plants, Canadian Network of Toxicology Centers,
Ottawa, Ontario, Canada, June 20-21, 1995.
Cornell University Pesticide Management Program, Arsenic Acid Chemical Fact Sheet. 1986.

6-2

-------
Draper, N. R., and H. Smith. 1966. Applied Regression Analysis. New York, NY: John Wiley &
Sons, Inc.
Dusseldorp, E., and J. J. Meulman. 2001. Prediction in Medicine by Integrating Regression
Trees into Regression Analysis with Optimal Scaling. Methods Inf Med. 40: 403-9.
Echols, S. L., D. L. Macintosh, K. A. Hammerstrom, and P. B. Ryan. 1999. Temporal
Variability of Microenvironmental Time Budgets in Maryland. Journal of Exposure
Analysis and Environmental Epidemiology 9(5): 502-512.
Elinder, Carl-Gustaf. 1985. "Cadmium: Uses, Occurrence, and Intake." Cadmium and Health: A
Toxicological and Epidemiological Appraisal. Boca Raton, Florida: CRC Press, Inc.
Fernandez M., U.S.G.S. Chemical Contaminants to Tampa Bay.
tbeptech.org/waterbudget/AbstractsPostersBios/fernandez2.pdf
Freeman, N. C. G., P. J. Lioy, E. Pellizzari, H. Zelon, K. Thomas, C. Clayton, and J.
Quackenboss. 1999. Responses to the Region V NHEXAS Time/Activity Diary. J
Expo Anal Environ Epidemiol 9(5): 414-426.
Gifi, A. 1990. Nonlinear Multivariate Analysis. Chichester, England: John Wiley & Sons,
Inc.
Gilbert, Richard O. 1987. Statistical Methods for Environmental Pollution Monitoring. New
York, NY: Van Nostrand Reinhold.
Godish, T. 1991. Indoor Air Pollution Control, Chapter 3. Chelsea, Michigan: Lewis Publishers.
Hand, David J. 1999. Statistics and Data Mining: Intersecting Disciplines. SIGKDD
Explorations. 1(1): 16-9.
Harris, Richard J. 1975. A Primer of Multivariate Statistics. New York, NY: Academic Press.
Hathaway, G. J., N. H. Proctor, J. P. Hughes, and M. L. Fischman. 1991. Proctor and Hughes'
Chemical Hazards of the Workplace. 3rd ed. New York, NY: Van Nostrand Reinhold.
Hazardous Substances Data Bank: Chloroform. Bethesda, MD: National Library of Medicine.
Hastie, T., R. Tibshirani, and A. Buja. 1994. Flexible Discriminant Analysis by Optimal
Scoring. JASA. 89: 1255-70.
Helsel, D. R. 1990. Less than obvious: statistical treatment of data below the detection limit.
Environ Sci Technol. 24(12): 1766-1774.
Hornung, R. W., and L. D. Reed. 1990. Estimation of average concentration in the presence of
nondetectable values. Appl Occup Environ Hyg. 5(1): 46-51.
Hosmer, David W., and Stanley Lemeshow. 2000. Applied Logistic Regression, 2nd ed. New
York: Wiley.
6-3

-------
Hotelling, H. 1933. Analysis of a Complex of Statistical Variables into Principal Components.
Journal of Educational Psychology 24: 417-41, 498-520.
IARC monographs on the evaluation of carcinogenic risk of chemicals to man. Volume 20.
Lyon, France: World Health Organization, International Agency for Research on Cancer
[1979].
International Atomic Energy Agency. Trace and macro elements~Pb Values Assigned in
Anthropogenic Pollution Materials.

Jackson, J. Edward. 1991. A User's Guide To Principal Components. New York, NY: John
Wiley & Sons, Inc.
Jolliffe, I. T. 1972. Discarding Variables in a Principal Component Analysis. I: Artificial Data.
Applied Statistics. 21(2): 160-73.
Jolliffe, I. T. 1986. Principal Component Analysis. New York, NY: Springer-Verlag.
Jolliffe, I. T. 2002. Principal Component Analysis. New York, NY: Springer-Verlag.
Kruskal, J. B. 1965. Analysis of Factorial Experiments by Estimating Monotone
Transformation of the Data. Journal of the Royal Statistical Society, Series B 27: 251-63.
Krzanowski, W. J. 1987. Selection of Variables To Preserve Multivariate Data Structure, Using
Principal Components. Applied Statistics. 36(1): 22-33.
Lauwerys, R. R. 1986. Health Maintenance of Workers Exposed to Cadmium. New York, NY:
The Cadmium Council, Inc.
Lead Development Association International.

Lebowitz, M. D., M. K. O'Rourke, S. Gordon, D. J. Moschandreas, T. Buckley, and M.
Nishioka. 1995. Population-based exposure measurements in Arizona: a phase I field
study in support of the National Human Exposure Assessment Survey. J Expo Anal
Environ Epidemiol 5(3):297-325.
Macintosh DL, Spengler JD, Ozkaynak H, Tsai L, Ryan PB (1996). "Dietary exposures to
selected metals and pesticides." Environ Health Perspect. 104(2):202-9.
Macintosh, D.L., L.L. Needham, K.A. Hammerstrom, and P. B. Ryan. 1999. A Longitudinal
Investigation of Selected Pesticide Metabolites in Urine. Journal of Exposure Analysis
and Environmental Epidemiology 9(5): 494-501.
Magidson, J. 1993. The CHAID approach to segmentation modeling. In R. Bagozzi (Ed.),
Handbook of Marketing Research. Cambridge, Massachusetts: Blackwell.
6-4

-------
Mallows, C. L. 1973. Comments regarding Mallows' Cp. Technometrics pp. 661-667.
Mansfield, E. R., J. T. Webster, and R. F. Gunst. 1977. An Analytic Variable Selection
Technique for Principal Component Regression. Applied Statistics 26(1): 34-40.
Measurement Group Website, ,
Menard, Scott. 1995. Applied Logistic Regression Analysis. Sage Publications Series:
Quantitative Applications in the Social Sciences, No. 106. Thousand Oaks, California.
Millard, Steven P. 2002. Environmental Stats for S-Plus: User's Manual for Version 2.0,
Second Edition. New York, NY: Springer-Verlag.
Miller, Rupert G. 1981. Simultaneous Statistical Inference. Second Edition. New York, NY:
Springer-Verlag.
Moschandreas, D. J., S. Karuchit, Y. Kim, H. Ari, M. D. Lebowitz, M. K. O'Rourke, S.
Gordon, and G. Robertson. On predicting multi-route and multimedia residential
exposure to chlorpyrifos and diazinon. J Expo Anal Environ Epidemiol 2001;
11(1):56-65.
MSDS from American Sales Corporation.

Nagelkerke, N. J. D. 1991. A Note on a General Definition of the Coefficient of Determination.
Biometrika. 78: 691-2.
Neter, John, Michael H. Kutner, Christopher J. Nachtsheim, William Wasserman. 1996.
Applied Linear Statistical Models, Fourth Edition. Boston, Massachusetts: WCB
McGraw-Hill.
Nieboer, E., and J.O. Odland. 2001. "Toxicological Profile and Related Health Issues: Nickel."
McMaster University, April 2001.
Ontario Ministry of the Environment, .
Organization for Economic Co-operation and Development (OECD). 1994. Risk Reduction
Monograph No. 5: Cadmium. Paris, France: OECD Environment Directorate.
Organization for Economic Co-operation and Development (OECD). 1996. Report From
Session F, "Sources of Cadmium in Waste," Chairman's Report of The Cadmium
Workshop, ENV/MC/CHEM/RD(96)1, Stockholm, Sweden, October 1995.
O'Rourke, M. K., P. K. Van de Water, S. Jin, S. P. Rogan, A. D. Weiss, S. M. Gordon, D. M.
Moschandreas, and M. D. Lebowitz. Evaluations of Primary Metals from NHEXAS
Arizona: Distributions and Preliminary Exposures. J Expo Anal Environ Epidemiol
1999; 9(5): 435-445.
6-5

-------
Pellizzari, E., P. Lioy, J. Quackenboss, R. Whitmore, A. Clayton, N. Freeman, J. Waldman, K.
Thomas, C. Rodes, and T. Wilcosky. 1995. Population-based exposure measurements in
EPA region 5: a phase I field study in support of the National Human Exposure
Assessment Survey. J Expo Anal Environ Epidemiol 5(3):327-358.
"Public Health Goal for Chromium in Drinking Water." 1999. California Environmental
Protection Agency, Office of Environmental Health Hazard Assessment.
Quackenboss, J. J., E. E. Pellizzari, P. Shubat, R. W. Whitmore, J. L. Adgate, K. W. Thomas, C.
G. Freeman, C. Stroebel, P. J. Lioy, A. C. Clayton, and K. Sexton. Design Strategy for
Assessing Multi-Pathway Exposure for Children: the Minnesota Children's Pesticide
Exposure Study (MNCPES). J Expos Anal Environ Epidemiol 2000;10:145-158.
Ryan, P. B., N. Huet, and D.L. Macintosh. 2000. Longitudinal investigation of exposure to
arsenic, cadmium, and lead in drinking water. Environ Health Persp 108(8):731-5.
Scanlon, K.A., Macintosh, D.L, Hammerstrom, K.A., and Ryan, P.B. 1999. A Longitudinal
Investigation of Solid-Food based Dietary Exposure to Selected Elements. Journal of
Exposure Analysis and Environmental Epidemiology 9(5): 485-493.
Sexton, Ken, Michael A. Callahan, and Elizabeth F. Bryan. 1995a. Estimating Exposure and
Dose to Characterize Health Risks: The Role of Human Tissue Monitoring in Exposure
Assessment. Environmental Health Perspectives Suppl 103(3): 13-30.
Sexton, Ken, David E. Kleffman, and Michael A. Callahan. 1995b. An Introduction to the
National Human Exposure Assessment Survey (NHEXAS) and Related Phase I Field
Studies. Journal of Exposure Analysis and Environmental Epidemiology 5(3): 229-32.
Sexton, Ken, Michael A. Callahan, Elizabeth F. Bryan, Christopher G. Saint, and William P.
Wood. 1995c. Informed Decisions about Protecting and Promoting Public Health:
Rationale for a National Human Exposure Assessment Survey. Journal of Exposure
Analysis and Environmental Epidemiology. 5(3): 233-56.
Sexton, Ken, John L. Adgate, Lynn E. Eberly, C. Andrew Clayton, Roy W. Whitmore, Edo D.
Pellizzari, Paul J. Lioy, and James J. Quackenboss. 2003. Predicting Children's Short-
Term Exposure to Pesticides: Results of a Questionnaire Screening Approach.
Environmental Health Perspectives. 111(1): 123-8.
SPSS. 1998. SPSS White Paper: Optimal Scaling Methods for Multivariate Categorical Data
Analysis. Chicago, Illinois: SPSS, Inc.
SPSS. 1999a. SPSS White Paper: Answer Tree Algorithm Summary. Chicago, Illinois: SPSS,
Inc.
SPSS. 1999b. SPSS White Paper: Data Mining an Introduction. Chicago, Illinois: SPSS, Inc.
SPSS. 2001. Answer Tree 3.0 User's Guide. Chicago, IL: SPSS, Inc.
SPSS. 2003a. CATREG Algorithm. .
6-6

-------
SPSS. 2003b. Logistic Regression Algorithm.
.
Two Crows Website, .
Two Crows Corporation. 1999. Introduction to Data Mining and Knowledge Discovery. Third
Ed. Potomac, Maryland.
U.S. Department of Labor. .
U.S. Department of Labor.

U.S. Environmental Protection Agency, .
U.S. Environmental Protection Agency (EPA). 1992. Guidelines for Exposure Assessment.
Washington, D.C.: EPA/600/Z-92/011. FR 57: 22888-22938.
U.S. Environmental Protection Agency, Indoor Environments Division home page.
U.S. Environmental Protection Agency (EPA). 1999. Sociodemographic Data Used for
Identifying Potential Highly Exposed Populations. Washington, D.C.: EPA/600/R-
99/060.
U.S. Environmental Protection Agency Region 3 Risk Based Concentration Table.

U.S. Environmental Protection Agency. 2000. Strategic Plan for the Analysis of the National
Human Exposure Assessment Survey (NHEXAS) Pilot Study Data.

U.S. Geological Survey (USGS). Chemical Modeling and Thermodynamic Data Evaluation of
Major and Trace Elements in Acid Mine Waters and Ground Waters: Hexavalent
chromium in ground water, Mojave Desert, CA. ,
Van Assche, F. J. 1998. A Stepwise Model to Quantify the Relative Contribution of Different
Environmental Sources to Human Cadmium Exposure. Paper presented at NiCad '98,
Prague, Czech Republic, September 21-22, 1998.
Van de Geer, John P. 1993a. Multivariate Analysis of Categorical Data: Theory. Newbury
Park, California: Sage Publications.
Van de Geer, John P. 1993b. Multivariate Analysis of Categorical Data: Applications.
Newbury Park, California: Sage Publications.
Wietlisbach, V. (1999). Statistical Approaches in the Development of Clinical Practice
Guidelines From Expert Panels: The Case of Laminectomy in Sciatica Patients. Medical
Care 37(8): 785-797.
6-7

-------
World Bank Group. 1998. "Arsenic." Pollution Prevention and Abatement Handbook, 1998.

World Health Organization (WHO). 1992. Environmental Health Criteria 134—Cadmium.
International Programme on Chemical Safety (IPCS) Monograph.
World Health Organization (WHO) 2000. Environmental Health Criteria 214—Human Exposure
Assessment.
Xintaras, C. 1992. Impact of Lead-Contaminated Soil on Public Health. Agency for Toxic
Substances and Disease Registry, May 1992.
Young, F. W., J. De Leeuw, and Y. Takane. 1976. Regression with Qualitative and Quantitative
Variables: An Alternating Least Squares Method with Optimal Scaling Features.
Psychometrika 41: 505-28.
6-8

-------
Appendix A
OMB Version of NHEXAS Questionnaires

-------
This page intentionally left blank.

-------
OMB Clearance #:2080-0053
Expires: July 31, 1998
NATIONAL HUMAN EXPOSURE ASSESSMENT SURVEY
DESCRIPTIVE QUESTIONNAIRE
Participant Identification Number
[Place Label Here]
Public reporting burden for this collection of information is estimated to average	hours (or
minutes) per response, and to require	hours recordkeeping. This includes the time for reviewing
instructions, searching existing data sources, gathering and maintaining the data needed, and
completing and reviewing the collection of information. Send comments regarding this burden estimate
or any other aspect of this collection of information, including suggestions for reducing the burden, to
Chief, Information Policy Branch, 2136, U.S. Environmental Protection Agency, 401 M St., S.W.,
Washington, D.C. 20460; and to the Office of Information and Regulatory Affairs, Office of
Management and Budget, Washington, D.C. 20503.
INTERVIEWER/TECHNICIAN ID:
Date Completed:	/	/		July 14, 1995
A-1

-------
TABLE - RECORD OF CALLS

-------
[THIS PAGE WILL CONTAIN THE INFORMATION NECESSARY TO IDENTIFY THE
PARTICIPANT AND WILL BE DESIGNED BY EACH CONSORTIUM TO MEET ITS NEEDS.
THIS IS AN EXAMPLE OF THE INFORMATION THAT WILL BE RECORDED.]
NATIONAL HUMAN EXPOSURE ASSESSMENT SURVEY
DESCRIPTIVE QUESTIONNAIRE
LOCATION DATA (Technician Completed-address/ID label)
State	 County	
Census Tract	Block	
Street Address
City, Zip
GPS Reading: Latitude
Longitude
Apt/Space #
Zip code
INTERVIEWER/TECHNICIAN ID:		Date Completed:
A-3

-------
Hello. I'm (NAME) with (NAME OF CONSORTIA MEMBER). We are conducting a survey in cooperation
with the Environmental Protection Agency on exposures to substances in the environment in and around your
home. You have been selected at random to participate in this survey. We mailed a letter to this address that
explains the importance of your participation. Do you remember receiving this letter? (IF LETTER NOT
RECEIVED, HAND COPY TO RESPONDENT. ALLOW TIME FOR READING. ANSWER ANY
QUESTIONS.)
HOUSEHOLD ELIGIBILITY
D1. VERIFY ADDRESS ON LABEL. CIRCLE "1" OR RECORD CORRECTED ADDRESS BELOW.
ADDRESS ON LABEL IS CORRECT	GO TO QUESTION D2
NEW ADDRESS:	
Street/RFD	Apt. #
City
State
ZIP Code
D2. VERIFY THAT THE RESPONDENT IS A PERMANENT RESIDENT OF THE HOUSEHOLD
(NOT A VISITOR, BABY SITTER, HOUSE SITTER, ETC.), RESIDES WITH THE MEMBERS
OF THE HOUSEHOLD AT LEAST HALF THE YEAR, AND IS AT LEAST 18 YEARS OLD.
IF RESPONDENT IS NOT A RESIDENT OF THE HOUSEHOLD OR IS NOT 18 YEARS OR
OLDER, REQUEST TO SPEAK TO SOMEONE ELIGIBLE TO ANSWER FOR THE
HOUSEHOLD. IF AN ELIGIBLE SCREENING RESPONDENT IS OBTAINED, CIRCLE "1,"
IF NO ELIGIBLE SCREENING RESPONDENT IS AVAILABLE, CIRCLE "2."
ELIGIBLE SCREENING RESPONDENT 	 1 --> CONTINUE
NO ELIGIBLE SCREENING RESPONDENT	 2 --> STOP. ENTER
PENDING
CODE 02 ON
RECORD OF
CALLS AND
THANK
RESPONDENT.
A-4

-------
D3. Is this property your primary residence or is it a vacation home or second home where you live less
than half the year? (CIRCLE ONE.)
PRIMARY RESIDENCE	1 --> CONTINUE
VACATION/SECOND HOME	2 --> GO TO D9
D4. Do more than 10 people live at this address? (CIRCLE "Y" OR "N.")
YES	Y --> CONTINUE
NO	N --> GO TO D9
A-5

-------
HOUSEHOLD ROSTER
D5a. First, I would like to ask a few general questions about you and the other people who live here now.
Just to be sure I account for everyone, please tell me the first names of all the people who currently
live here. Let's begin with the person or persons who own the residence or pay the rent. (ENTER
FIRST NAMES IN COLUMN B OF THE ROSTER. ENTER RELATIONSHIP TO HEAD IF
FIRST NAMES ARE REFUSED.)
I have listed (NAMES). Is there anyone else living here now such as friends, roomers, or other
people we might have overlooked? (IF SO, ADD THEM TO THE ROSTER.)
ASK QUESTION D5b FOR EACH LISTED INDIVIDUAL.
D5b. Is (NAME) a full-time resident of this household, that is a person who lives in the residence year
round except for short periods of time?
YES	Y --> CONTINUE
NO	N --> DELETE FROM ROSTER AND CONTINUE WITH NEXT NAME.
ASK QUESTIONS D5c-k FOR EACH LISTED INDIVIDUAL. RECORD RESPONSE IN ROSTER.
D5c. CIRCLE THE SEX ("M" FOR MALE OR "F" FOR FEMALE) OF EACH PERSON IN COLUMN
C. ASK IF NOT OBVIOUS.
D5d. What is (NAME's) year of birth? (ENTER 2 DIGITS IN COLUMN D.)
D5e. What is (NAME's) race? (READ CHOICES AND CIRCLE ONE NUMBER IN COLUMN E.)
White	1
Black or African-American	2
American Indian	3
Eskimo or Aleut	4
Asian or Pacific Islander	5
Some other race (Specify:	)	6
DON'T KNOW	DK
REFUSED	RE
A-6

-------
D5f. Is (NAME) of Hispanic or Spanish origin? (CIRCLE RESPONSE IN COLUMN F.)
YES 					 Y
NO	 N
DON'T KNOW					DK
REFUSED							 RE
D5g. How much school has (NAME) completed? (READ CHOICES AND CIRCLE ONE NUMBER IN
COLUMN G FOR THE HIGHEST LEVEL COMPLETED OR DEGREE RECEIVED. IF
CURRENTLY ENROLLED, CIRCLE THE LEVEL OF THE PREVIOUS GRADE ATTENDED
OR HIGHEST DEGREE RECEIVED.)
No schooling completed or
kindergarten only		1
Primary or middle school
(Grade 1 through 8)		2
Some high school (Grade 9 through 11)		3
High school graduate (Grade 12 or GED) 		4
Some college or technical school		5
College graduate		6
Some post-college		7
DON'T KNOW		DK
D5h. Does (NAME) smoke tobacco products? (CIRCLE RESPONSE IN COLUMN H.)
YES	 Y —> CONTINUE
NO	 N —> GO TO D5j
DON'T KNOW	 DK—> GO TO D5j
D5i. Does (NAME) smoke inside the house? (CIRCLE RESPONSE IN COLUMN I.)
YES 			 Y
NO 	 N
DON'T KNOW	 DK
A-7

-------
D5j. Does (NAME) work outside the home? (CIRCLE RESPONSE IN COLUMN J.)
YES 							 Y
NO 	 N
DON'T KNOW	 DK
D5k. Does (NAME) attend school or daycare outside the home? (CIRCLE RESPONSE IN COLUMN K.)
YES 					 Y
NO 				 N
DON'T KNOW . 		 DK
A-8

-------
HOUSE CHARACTERISTICS
D6. I would now like to ask you a few questions about your home. Is your home... (READ CHOICES
AND CIRCLE ONE. INCLUDE ALL APARTMENTS, FLATS, ETC., EVEN IF VACANT.)
A mobile home or trailer		1
A one-family house detached from any
other house		2
A one-family house attached to one or
more houses		3
A building with 2 apartments		4
A building with 3 or 4 apartments 	 5
A building with 5 to 9 apartments		6
A building with 10 to 19 apartments		7
A building with 20 to 49 apartments	 8
A building with 50 or more apartments 	 9
Other (Specify:	) 	 10
D7. How many rooms are there in this house or apartment? Do NOT count bathrooms, porches,
balconies, foyers, or halls.
Rooms
D8. Is this house or apartment... (READ CHOICES AND CIRCLE ONE.)
Owned by you or someone in this household with
a mortgage or loan?				1
Owned by you or someone in this household free
and clear (without a mortgage)? 		2
Rented for cash rent? 				3
Occupied without payment of cash rent? 		4
DON'T KNOW . 					DK
A-9

-------
RESPONDENT SELECTION
D9. a. WHAT IS THE ROSTER LINE NUMBER OF THE SELECTED PARTICIPANT? ENTER
"00" IF NO ONE IS SELECTED AND GO TO QUESTION D10.
b. OBTAIN FULL NAME OF SELECTED PARTICIPANT.
[NOTE: INSTRUCTIONS SPECIFIC TO EACH CONSORTIUM FOR OBTAINING
INFORMED CONSENT, SETTING UP APPOINTMENTS AND COMPLETING THE
BASELINE QUESTIONNAIRE WILL BE ENTERED HERE.]
FULL NAME OF PARTICIPANT 	
IF PARTICIPANT IS UNDER 18, OBTAIN FULL NAME OF PARTICIPANT'S
GUARDIAN.
FULL NAME OF GUARDIAN 	
D10. IF NO ONE SELECTED: My supervisor needs to call some of the people I talk with in order to
verify my work.
IF PARTICIPANT SELECTED: We may need to call you to verify the appointments.
a. Do you have a telephone in this house or apartment?
YES	Y --> CONTINUE
NO	N --> GO TO D10c
DON'T KNOW	DK --> GO TO D10c
REFUSED	RE --> STOP. ENTER FINAL RESULT CODE AND THANK RESPONDENT.
b. What is the telephone number, starting with the area code?
--> GO TO D11
A-10

-------
c.	Is there a telephone on which you can receive calls?
YES	Y --> CONTINUE
NO	N --> STOP. ENTER FINAL RESULT CODE AND THANK RESPONDENT.
REFUSED	RE --> STOP. ENTER FINAL RESULT CODE AND THANK RESPONDENT.
d.	What is the telephone number, starting with the area code?
(	)-	-		 --> GO TO D11
D11. When would be a good time to call you?
ENTER FINAL RESULT CODE AND THANK RESPONDENT.
A-11

-------
HOUSEHOLD ROSTER
| A | B | C | D | E | F | G | H | I | J | K |
|---|---|---|---|---|---|---|---|---|---|---|
| Roster # | Name/Relation to Head | Sex | Year of Birth | Race | Hispanic? | School Completed? | Smoke? | Smoke Inside | Work Outside Home? | School/Daycare Outside Home? |
| 01 | | M F | | 1 2 3 4 5 6 DK RE | Y N DK RE | 1 2 3 4 5 6 7 DK | Y N DK | Y N DK | Y N DK | Y N DK |
| 02 | | M F | | 1 2 3 4 5 6 DK RE | Y N DK RE | 1 2 3 4 5 6 7 DK | Y N DK | Y N DK | Y N DK | Y N DK |
| 03 | | M F | | 1 2 3 4 5 6 DK RE | Y N DK RE | 1 2 3 4 5 6 7 DK | Y N DK | Y N DK | Y N DK | Y N DK |
| 04 | | M F | | 1 2 3 4 5 6 DK RE | Y N DK RE | 1 2 3 4 5 6 7 DK | Y N DK | Y N DK | Y N DK | Y N DK |
| 05 | | M F | | 1 2 3 4 5 6 DK RE | Y N DK RE | 1 2 3 4 5 6 7 DK | Y N DK | Y N DK | Y N DK | Y N DK |
| 06 | | M F | | 1 2 3 4 5 6 DK RE | Y N DK RE | 1 2 3 4 5 6 7 DK | Y N DK | Y N DK | Y N DK | Y N DK |
| 07 | | M F | | 1 2 3 4 5 6 DK RE | Y N DK RE | 1 2 3 4 5 6 7 DK | Y N DK | Y N DK | Y N DK | Y N DK |
| 08 | | M F | | 1 2 3 4 5 6 DK RE | Y N DK RE | 1 2 3 4 5 6 7 DK | Y N DK | Y N DK | Y N DK | Y N DK |
| 09 | | M F | | 1 2 3 4 5 6 DK RE | Y N DK RE | 1 2 3 4 5 6 7 DK | Y N DK | Y N DK | Y N DK | Y N DK |
| 10 | | M F | | 1 2 3 4 5 6 DK RE | Y N DK RE | 1 2 3 4 5 6 7 DK | Y N DK | Y N DK | Y N DK | Y N DK |
A-12

-------
OMB Clearance #: 2080-0053
Expires: July 31, 1998
NATIONAL HUMAN EXPOSURE ASSESSMENT SURVEY
BASELINE QUESTIONNAIRE
Participant Identification Number
[Place Label Here]
Public reporting burden for this collection of information is estimated to average 	hours (or
minutes) per response, and to require	hours recordkeeping. This includes the time for reviewing
instructions, searching existing data sources, gathering and maintaining the data needed, and
completing and reviewing the collection of information. Send comments regarding this burden estimate
or any other aspect of this collection of information, including suggestions for reducing the burden, to
Chief, Information Policy Branch, 2136, U.S. Environmental Protection Agency, 401 M St., S.W.,
Washington, D.C. 20460; and to the Office of Information and Regulatory Affairs, Office of
Management and Budget, Washington, D.C. 20503.
INTERVIEWER/TECHNICIAN ID:	
Date Completed:	/	/		July 14, 1995
A-13

-------
[THIS PAGE WILL CONTAIN THE INFORMATION NECESSARY TO IDENTIFY
THE PARTICIPANT AND WILL BE DESIGNED BY EACH CONSORTIUM TO MEET
ITS NEEDS. THIS IS AN EXAMPLE OF THE INFORMATION THAT WILL BE
RECORDED.]
NATIONAL HUMAN EXPOSURE ASSESSMENT SURVEY
BASELINE QUESTIONNAIRE
DESIGNATED PARTICIPANT
(If the participant is less than 10 years old, what is the name of the individual who is
providing the answers for the designated respondent?)
Name of Participant	
Completed by	(if other than participant)
Relation to participant	
Home Phone	 Date: / /
LOCATION DATA (Technician Completed-address/ID label)
State	County
Census Tract	Block
Street Address
Apt/Space #
City, Zip
Zip code
INTERVIEWER/TECHNICIAN ID:	Date Completed:	/	/
A-14

-------
DEMOGRAPHICS
These first questions ask about (you/this child). REMIND PARENT/GUARDIAN TO RESPOND FOR
CHILD.
B1. What is the highest level of school (you have/this child has) completed? (READ CHOICES
AND CIRCLE ONE.) IF CURRENTLY ENROLLED, MARK THE LEVEL OF PREVIOUS
GRADE ATTENDED OR HIGHEST DEGREE RECEIVED.
No school completed or Kindergarten only 		1
Primary or middle school (Grade 1-8) 					2
Some high school (Grade 9-11) 		3
High school graduate (Grade 12 or GED) 			4
Some college or technical school		5
College graduate . 							6
Some post college 		7
B2. CIRCLE SEX OF PARTICIPANT.
MALE	 1
FEMALE 		 		 2
B3. What is (your/his/her) date of birth? 	/	/__
Month Day Year
B4. How tall (are you/is he/she) without shoes? 	ft 	inches
B5. How much (do you/does he/she) weigh? 	pounds
ASK QUESTIONS B6 AND B7 ONLY IF RESPONDENT IS 10 YEARS OLD OR MORE. IF
RESPONDENT IS LESS THAN 10, GO TO B8.
B6a. (Do you/Does he/she) currently smoke tobacco products or use smokeless tobacco products?
(CIRCLE "Y" OR "N.")
YES 						Y --> GO TO B7a
NO					 . . N --> CONTINUE
DON'T KNOW	DK --> CONTINUE
B6b. (Have you/Has he/she) ever smoked tobacco products or used smokeless tobacco products?
(CIRCLE "Y" OR "N.")
YES	Y --> CONTINUE
NO	N --> GO TO B8
DON'T KNOW					DK --> GO TO B8
A-15

-------
B6c. How long ago did (you/he/she) stop using tobacco products? (ENTER NUMBER OR "DK" FOR
DON'T KNOW.)		 --> GO TO B8
B7a. On average, how many cigarettes (do you/does he/she) smoke per day? (READ CHOICES AND
CIRCLE ONE.)
None		1
Less than 1/2 pack						2
1/2 pack or more, but less than 1 pack	3
1 pack or more, but less than 1 1/2 packs	4
1 1/2 packs or more, but less than 2 packs	5
2 or more packs				6
Occasional (social smoker) 				7
DON'T KNOW					DK
B7b. On average, how many cigars (do you/does he/she) smoke per day? (ENTER NUMBER OR
"DK" FOR DON'T KNOW.) 	
B7c. On average, how many pipesful of tobacco (do you/does he/she) smoke per day? (ENTER
NUMBER OR "DK" FOR DON'T KNOW.) 	
B7d. On average, how many times per day (do you/does he/she) use smokeless tobacco products?
(ENTER NUMBER OR "DK" FOR DON'T KNOW.) 	
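The skip instructions printed with questions B6a through B7d can be read as a small routing table. The following Python sketch is one possible encoding for data-entry checks, not part of the survey instrument; the names `BRANCHES`, `FOLLOW_ON`, `next_question`, and `tobacco_route` are hypothetical, and it assumes that B7a through B7d are asked in printed order and then flow on to B8.

```python
from typing import Dict, List, Optional, Tuple

# Branches printed on the form: (question, circled answer) -> next question.
BRANCHES: Dict[Tuple[str, str], str] = {
    ("B6a", "Y"): "B7a",
    ("B6a", "N"): "B6b",
    ("B6a", "DK"): "B6b",
    ("B6b", "Y"): "B6c",
    ("B6b", "N"): "B8",
    ("B6b", "DK"): "B8",
}

# Questions that always lead to the same next question.
# Assumption: B7a-B7d are asked in printed order, then the interview moves to B8.
FOLLOW_ON: Dict[str, str] = {
    "B6c": "B8",
    "B7a": "B7b",
    "B7b": "B7c",
    "B7c": "B7d",
    "B7d": "B8",
}


def next_question(current: str, answer: Optional[str] = None) -> str:
    """Return the question that follows `current` given the circled answer."""
    if current in FOLLOW_ON:
        return FOLLOW_ON[current]
    try:
        return BRANCHES[(current, answer)]
    except KeyError:
        raise ValueError(f"no route from {current!r} with answer {answer!r}") from None


def tobacco_route(age_years: int, answers: Dict[str, str]) -> List[str]:
    """List the questions asked, in order, ending at B8.

    Respondents under 10 years old skip B6 and B7 entirely.
    """
    question = "B6a" if age_years >= 10 else "B8"
    route = [question]
    while question != "B8":
        question = next_question(question, answers.get(question))
        route.append(question)
    return route


if __name__ == "__main__":
    print(tobacco_route(35, {"B6a": "Y"}))
    print(tobacco_route(35, {"B6a": "N", "B6b": "Y"}))
    print(tobacco_route(8, {}))
```

A missing or unexpected answer raises `ValueError`, which is one way to flag a questionnaire where a skip instruction was not followed.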
A-16

-------
PERSONAL EXPOSURE ACTIVITIES
These next few questions are about things that happen at your home, on the job, or in school, and food
(you/he/she) eat(s) that might put (you/him/her) in touch with the chemicals we are studying. Some of
these questions ask about different periods of time. Some ask about the past month, some ask about the
past 3 months, and some ask about the past 6 months. In order to help make these time periods clear,
please think about something (you/he/she) did or which happened to (you/him/her) about 1 month ago, 3
months ago, and 6 months ago. For example, finished school, got married, had a baby. Please tell me
what each event was so that I can use them later. (RECORD EVENTS HERE AND USE AS NEEDED
DURING INTERVIEW.)
1 MONTH EVENT: 	
3 MONTH EVENT:
6 MONTH EVENT:
B8. On average for the past month, how many (hours/minutes) per week did (you/he/she) spend....?
(IF LESS THAN 1 HOUR, ROUND TO THE NEAREST QUARTER HOUR; IF BETWEEN 1
HOUR AND 10 HOURS, ROUND TO THE NEAREST HOUR; IF GREATER THAN 10
HOURS, ROUND TO THE NEAREST 10 HOURS; e.g., 10, 20, 30, 40, 50 HOURS. ENTER
NUMBER AND CIRCLE MINUTES OR HOURS.)
a.	Inside (your/his/her) home with someone who was smoking tobacco?	(min/hr)
b.	At work with someone who was smoking tobacco?	(min/hr)
c.	In a car, bus, van, or other enclosed vehicle with someone who was smoking tobacco?
	(min/hr)
d.	In any other indoor or enclosed location with someone who was smoking tobacco?
	(min/hr)
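The rounding instruction in B8 has three bands (under 1 hour, 1 to 10 hours, over 10 hours). A minimal Python sketch of that rule follows, for anyone checking recorded values against it. The function names are hypothetical, and two points are assumptions because the form does not state them: ties round up, and a value of exactly 10 hours falls in the nearest-hour band.

```python
import math


def to_hours(value: float, unit: str) -> float:
    """Convert a recorded value to hours; `unit` is the circled 'min' or 'hr'."""
    if unit == "hr":
        return float(value)
    if unit == "min":
        return value / 60.0
    raise ValueError(f"unit must be 'min' or 'hr', got {unit!r}")


def round_to_step(hours: float, step: float) -> float:
    """Round to the nearest multiple of `step` (assumption: ties round up)."""
    return math.floor(hours / step + 0.5) * step


def round_b8_hours(hours: float) -> float:
    """Apply the B8 rounding bands to a weekly time expressed in hours."""
    if hours < 0:
        raise ValueError("time cannot be negative")
    if hours < 1:
        step = 0.25   # less than 1 hour: nearest quarter hour
    elif hours <= 10:
        step = 1.0    # 1 to 10 hours: nearest hour
    else:
        step = 10.0   # more than 10 hours: nearest 10 hours
    return round_to_step(hours, step)


if __name__ == "__main__":
    for raw in (to_hours(20, "min"), 3.4, 12, 27):
        print(raw, "->", round_b8_hours(raw))
```

Questions B14a, B16a, B17b, and B18a use a two-band variant (nearest hour below 10 hours, nearest 10 hours above), which would need its own function rather than this one.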
B9a. During the past month, has anyone, including you, smoked inside your home? (CIRCLE ONE.)
YES	Y --> CONTINUE
NO	N --> GO TO B10a
DON'T KNOW				DK --> GO TO B10a
B9b. During the past month, how many people, including visitors, smoked tobacco inside your home?
(ENTER NUMBER OR "DK" FOR DON'T KNOW.) 	
A-17

-------
B10a. On average for the past month, on how many days did (you/he/she) paint walls, furniture, cars or
other objects? (READ CHOICES AND CIRCLE ONE.)
Never			1
1-3 days per month	2
1-2 days per week	3
3-6 days per week					 4
Daily			5
DON'T KNOW	.	DK
B10b. On average for the past month, on how many days did (you/he/she) use chemical paint strippers
to remove paint? (READ CHOICE AND CIRCLE ONE.)
Never	1
1-3 days per month					2
1-2 days per week	3
3-6 days per week 	4
Daily	5
DON'T KNOW				DK
B10c. On average for the past month, on how many days did (you/he/she) remove paint by other
methods such as scraping, heat gun or sanding? (READ CHOICE AND CIRCLE ONE.)
Never						 1
1-3 days per month 				2
1-2 days per week							 3
3-6 days per week 			4
Daily	5
DON'T KNOW				DK
B11a. During the past three months, on how many days (did you/did he/she) use lead solder to solder
pipes, do electronic repairs, or join pieces of stained glass? (READ CHOICES AND CIRCLE
ONE.)
Never	1
1-2 days	2
1-3 days per month							 3
1-2 days per week	4
3-7 days per week			5
DON'T KNOW 						DK
B11b. During the past three months, on how many days (did you/did he/she) use lead-based oil paint to
paint pictures or jewelry? (READ CHOICES AND CIRCLE ONE.)
Never	1
1-2 days			2
1-3 days per month			3
1-2 days per week	4
3-7 days per week 				5
DON'T KNOW			DK
A-18

-------
B11c. During the past three months, on how many days (did you/did he/she) mold lead into fishing
sinkers, bullets, or other objects? (READ CHOICES AND CIRCLE ONE.)
Never 					1
1-2 days 		2
1-3 days per month		3
1-2 days per week	4
3-7 days per week		 5
DON'T KNOW 				DK
B12a. During the past three months, on how many days (did you/did he/she) eat fresh fruits or
vegetables grown at your home? (READ CHOICES AND CIRCLE ONE.)
Never					1
1-2 days	2
1-3 days per month			3
1-2 days per week		 4
3-7 days per week 							5
DON'T KNOW	DK
B12b. During the past three months, on how many days (did you/did he/she) eat canned or preserved
fruits or vegetables that were grown at your home? (READ CHOICES AND CIRCLE ONE.)
Never	1
1-2 days	2
1-3 days per month			3
1-2 days per week		 4
3-7 days per week			5
DON'T KNOW			DK
B13. Do you currently work full time or part time at any location away from your home? (CIRCLE
"Y" OR "N." INCLUDE WORKING FOR OTHERS, SELF-EMPLOYED, AND VOLUNTEER
WORK. INCLUDE THOSE WHO WORK OUT OF A HOME OFFICE IF THEY WORK
PART OF THE TIME AWAY FROM HOME.)
YES	 Y --> CONTINUE
NO			 N --> GO TO B17a
B14a. On average for the past month, how many hours per week did (you/he/she) work at (your/his/her)
primary job? (INCLUDE WEEKS WHERE TIME WAS TAKEN OFF FOR VACATION,
SICKNESS, ETC. IF LESS THAN 10 HOURS, ROUND TO THE NEAREST HOUR; IF
GREATER THAN 10 HOURS, ROUND TO THE NEAREST 10 HOURS; e.g., 10, 20, 30, 40,
50 HOURS).
	hours/week
i. On average, how many of these hours were spent working at home?
hours/week
A-19

-------
B14b. What kind of business or industry is this? (For example, manufacturing, retail store,
government, farm, school.)
B14c. What is (your/his/her) job title? (For example, electrical engineer, stock clerk, typist, farmer.)
B14d. What activities (do you/does he/she) perform most often as part of (your/his/her) duties at that
job? (For example, typing, keeping account books, filing, selling cars, operating printing press,
finishing concrete.)
B14e. (Do you/Does he/she) wear protective clothing while at (your/his/her) primary job? (CIRCLE
"Y" OR "N.")
YES 				Y --> CONTINUE
NO			N --> GO TO B14g
B14f. Which types of protective clothing (do you/does he/she) wear while at (your/his/her) primary
job? (READ CHOICES AND CIRCLE ALL THAT APPLY.)
Gloves 				1
Overalls 	2
Overcoat (e.g. lab coat; smock) 			3
Respirator			4
Other (Specify:	) 	5
DON'T KNOW			DK
B14g. While at (your/his/her) primary job, (do you/does he/she) come into contact at least once a week
with? (READ CHOICES AND CIRCLE ALL THAT APPLY.)
Saw dust 		1
Road dust 				2
Fiberglass 		3
Silica (sand blasting)		4
Mine dust 						5
Surface dust in office, classroom, store 		6
Other known type of dust
(Specify:	) 		7
Unknown type of dust		8
No contact with dust 			9
A-20

-------
B14h. While at (your/his/her) primary job, (do you/does he/she) come into contact at least once a week
with? (READ CHOICES AND CIRCLE ALL THAT APPLY.)
Welding fumes 		1
Solder or flux fumes		2
Plastic fumes		3
Paint fumes (include varnish, shellac, etc.)		4
Gasoline or diesel fumes		5
Other known type of fumes, smoke, gas, or vapors
(Specify:	) 		6
Unknown type of fumes, smoke, gas, or vapors		7
No contact with fumes, smoke, gas, or vapors		8
B14i. (Do you/Does he/she) come into contact with chemicals used to kill insects, rodents, or weeds at
least once a week while at (your/his/her) primary job? (CIRCLE "Y" OR "N.")
YES	Y --> CONTINUE
NO			N --> GO TO B15
B14j. If yes, with which types of chemicals (do you/does he/she) come into contact while at
(your/his/her) primary job? (READ CHOICES AND CIRCLE ALL THAT APPLY.)
Raid/Black Flag 				1
Insect Repellents				2
Chlorpyrifos, Dursban, Lorsban, Trichlorpyrphos
Pyrinex, Dowco 179, Brodan		3
Malathion, Cythion, Chemathion, Malaspray
Zithiol		4
Diazinon; D-100; D-500 					5
Carbaryl, Sevin, Tricarnas, UC 7744 		6
Other known termiticides
(Specify:	)		7
Other known pesticides/insecticides
(Specify:	)		8
Atrazine, Aatrex, Vectal SC, Atratol,
Gesaprim, Primatol A		9
Other known herbicides
(Specify:	)		10
Fungicides					11
Unknown type of pesticide, insecticide,
herbicide, or fungicide 		DK
B15. Do you have a second job? (CIRCLE "Y" OR "N.")
YES			Y --> CONTINUE
NO	N --> GO TO B17a
A-21

-------
B16a. On average for the past month, how many hours per week did (you/he/she) work at
(your/his/her) second job? (INCLUDE WEEKS WHERE TIME WAS TAKEN OFF FOR
VACATION, SICKNESS, ETC. IF LESS THAN 10 HOURS, ROUND TO THE NEAREST
HOUR; IF GREATER THAN 10 HOURS, ROUND TO THE NEAREST 10 HOURS; e.g., 10,
20, 30, 40, 50 HOURS).
	hours/week
i. On average, how many of these hours were spent working at home?
hours/week
B16b. What kind of business or industry is this? (For example, manufacturing, retail store, government,
farm, school.)
B16c. What is (your/his/her) job title? (For example, electrical engineer, stock clerk, typist, farmer.)
B16d. What activities (do you/does he/she) perform most often as part of (your/his/her) duties at that
job? (For example, typing, keeping account books, filing, selling cars, operating printing press,
finishing concrete.)
B16e. (Do you/Does he/she) regularly wear protective clothing while at (your/his/her) second job?
(CIRCLE "Y" OR "N.")
YES 					Y --> CONTINUE
NO	N --> GO TO B16g
B16f. Which types of protective clothing (do you/does he/she) wear while at (your/his/her) second job?
(READ CHOICES AND CIRCLE ALL THAT APPLY.)
Gloves 	1
Overalls 	2
Overcoat (e.g., lab coat; smock)			3
Respirator		 4
Other (Specify:	) 	5
Don't know 	DK
A-22

-------
B16g. While at (your/his/her) second job, (do you/does he/she) come into contact at least once a week
with? (READ CHOICES AND CIRCLE ALL THAT APPLY.)
Saw dust 				1
Road dust 						2
Fiberglass 			3
Silica (sand blasting)		4
Mine dust 						5
Surface dust in office, classroom, store 		6
Other known type of dust
(Specify:					 )	7
Unknown type of dust				8
No contact with dust		9
B16h. While at (your/his/her) second job, (do you/does he/she) come into contact at least once a week
with? (READ CHOICES AND CIRCLE ALL THAT APPLY.)
Welding fumes 		1
Solder or flux fumes		2
Plastic fumes				3
Paint fumes (include varnish, shellac, etc.)		4
Gasoline or diesel fumes		5
Other known type of fumes, smoke, gas, or vapors
(Specify:	) 		6
Unknown type of fumes, smoke, gas, or vapors			7
No contact with fumes, smoke, gas, or vapors		8
B16i. (Do you/Does he/she) come into contact with chemicals used to kill pests, rodents, or weeds at
least once a week while at (your/his/her) second job? (CIRCLE "Y" OR "N.")
YES		 Y --> CONTINUE
NO			N --> GO TO B17a
A-23

-------
B16j. If yes, with which types of chemicals (do you/does he/she) come into contact while at
(your/his/her) second job? (READ CHOICES AND CIRCLE ALL THAT APPLY.)
Raid/Black Flag 			1
Off			2
Chlorpyrifos, Dursban, Lorsban, Trichlorpyrphos
Pyrinex, Dowco 179, Brodan	 3
Malathion, Cythion, Chemathion, Malaspray
Zithiol 		 4
Diazinon; D-100; D-500 		 5
Carbaryl, Sevin, Tricamas, UC 7744 			 6
Other known termiticides
(Specify:	)		 7
Other known pesticides/insecticides
(Specify:	)	 8
Atrazine, Aatrex, Vectal SC, Atratol,
Gesaprim, Primatol A	 9
Other known herbicides
(Specify:	)	 10
Fungicides	 11
Unknown type of pesticide, insecticide,
herbicide, or fungicide 	 DK
B17a. Do you attend classes as a student at any location away from your home? (CIRCLE "Y" OR "N."
INCLUDE ELEMENTARY AND SECONDARY SCHOOLS, COLLEGES AND
UNIVERSITIES, BUSINESS SCHOOL, TRADE AND VOCATIONAL SCHOOLS.)
YES	Y --> CONTINUE
NO					N --> GO TO B18
B17b. On average for the past month, how many hours per week did (you/he/she) attend classes as a
student? (INCLUDE WEEKS WHERE TIME WAS TAKEN OFF FOR VACATION,
SICKNESS, ETC. IF LESS THAN 10 HOURS, ROUND TO THE NEAREST HOUR; IF
GREATER THAN 10 HOURS, ROUND TO THE NEAREST 10 HOURS; e.g., 10, 20, 30, 40,
50 HOURS).
hours/week
B18. FOR CHILDREN LESS THAN 6 YEARS OF AGE, CONTINUE WITH QUESTION B18a.
OTHERWISE GO TO QUESTION B19.
B18a. On average, how many hours per week does (he/she) spend away from the home, for
example, at daycare, in a preschool, or at a neighbor's house? (IF LESS THAN 10
HOURS, ROUND TO THE NEAREST HOUR; IF GREATER THAN 10 HOURS,
ROUND TO THE NEAREST 10 HOURS; e.g., 10, 20, 30, 40, 50 HOURS).
	hours/week IF ZERO, GO TO B19
A-24

-------
B18b. Where does (he/she) spend this time away from home? (READ CHOICES AND
CIRCLE ALL THAT APPLY.)
Another home 					1
Daycare center, nursery school, or preschool		2
Other school 									3
Other (Specify)	 		4
B19. What methods of transportation did (you/he/she) use to go to work, school, or daycare in the past
six months? (READ CHOICES AND CIRCLE ALL THAT APPLY.)
Car, truck, van, or taxi cab 				 1
Bus, trolley bus, or trolley car			 2
Train, subway or elevated train		3
Motorcycle 				4
Bicycle 				5
Walk				6
Other method (Specify:	) 		7
A-25

-------
HEALTH STATUS
B20. Overall, how would you describe (your/his/her) current health? (READ CHOICES AND
CIRCLE ONE.)
Good			 1
Fair 			 2
Poor			 3
B21. (Have you/Has he/she) ever had any of the following? (READ CHOICES AND CIRCLE "Y"
OR "N." IF YES, PLEASE ASK REST OF QUESTION. IF PARTICIPANT IS UNCERTAIN,
CIRCLE "N" AND CONTINUE WITH THE NEXT CONDITION.)
	Condition	Ever had?	Were you told (you/he/she) had this by a doctor or nurse?	(Do you/Does he/she) have it now?	How old (were you/was he/she) when the doctor or nurse first told you?
a	Diabetes?	Y --> / N	Y --> / N	Y --> / N
b	Neuromuscular disease, such as Polio, Multiple Sclerosis, Muscular Dystrophy?	Y --> / N	Y --> / N	Y --> / N
c	Asthma, allergies?	Y --> / N	Y --> / N	Y --> / N
d	Ulcer?	Y --> / N	Y --> / N	Y --> / N
e	Any disease of the esophagus?	Y --> / N	Y --> / N	Y --> / N
f	Gastritis?	Y --> / N	Y --> / N	Y --> / N
g	FREQUENT indigestion?	Y --> / N	Y --> / N	Y --> / N
h	Any other stomach trouble?	Y --> / N	Y --> / N	Y --> / N
	IF YES, PLEASE SPECIFY:
i	Enteritis?	Y --> / N	Y --> / N	Y --> / N
j	Diverticulitis?	Y --> / N	Y --> / N	Y --> / N
k	Colitis?	Y --> / N	Y --> / N	Y --> / N
l	A spastic colon?	Y --> / N	Y --> / N	Y --> / N
A-26

-------
m	FREQUENT constipation?	Y --> / N	Y --> / N	Y --> / N
n	Any other intestinal or bowel trouble?	Y --> / N	Y --> / N	Y --> / N
	IF YES, PLEASE SPECIFY:
o	Cirrhosis of the liver?	Y --> / N	Y --> / N	Y --> / N
p	Fatty liver?	Y --> / N	Y --> / N	Y --> / N
q	Hepatitis?	Y --> / N	Y --> / N	Y --> / N
r	Yellow jaundice?	Y --> / N	Y --> / N	Y --> / N
s	Any other liver trouble?	Y --> / N	Y --> / N	Y --> / N
	IF YES, PLEASE SPECIFY:
t	Nephritis?	Y --> / N	Y --> / N	Y --> / N
u	Kidney stones?	Y --> / N	Y --> / N	Y --> / N
v	Any other kidney trouble?	Y --> / N	Y --> / N	Y --> / N
	IF YES, PLEASE SPECIFY:
w	Any disease requiring chemotherapy?	Y --> How long ago did (you/he/she) last have chemotherapy?		/ N
	IF YES, PLEASE SPECIFY DISEASE:
A-27

-------
BASIC HOUSING CHARACTERISTICS
These next questions are about your (house/apartment). Please feel free to ask another member of your
household for assistance if necessary.
B22. Is this property actively used as a farm or ranch? (CIRCLE "Y" OR "N.")
YES 			Y
NO 			 N
B23. About when was this building first built? (READ CHOICES AND CIRCLE ONE.)
1990 TO PRESENT 	1
1985 TO 1989 	2
1980 TO 1984 		3
1970 TO 1979 	4
1960 TO 1969 	5
1950 TO 1959 		6
1940 TO 1949 	7
1939	OR EARLIER 			8
DON'T KNOW	DK
B24. When did (you/he/she) move into this (house/apartment)? (READ CHOICES AND CIRCLE
ONE.)
1990 TO PRESENT 	1
1985 TO 1989 			2
1980 TO 1984 	3
1970 TO 1979 			4
1960 TO 1969 	5
1950 TO 1959 				 6
1940	TO 1949 			7
1939 OR EARLIER 	8
DON'T KNOW			 . DK
B25. In the last six months, have any of the following been performed in this home?	(CIRCLE "Y"
OR "N.")
YES	NO
Adding a room 			 Y	N
Putting up or taking down a wall 	 Y	N
Replacing windows	 Y	N
Refinishing floors	 Y	N
Exterior painting 		 Y	N
Interior painting 		 Y	N
B26a. Does this (house/apartment) have running water? (CIRCLE "Y" OR "N.")
A-28

-------
YES							Y --> CONTINUE
NO		N --> GO TO B26c
B26b. What is the source of the running water in your house/apartment? (READ CHOICES AND
CIRCLE ALL THAT APPLY.)
Public or commercial water system 	 1
NAME	
Private well	 2
Cistern							 3
Some other source		 4
DON'T KNOW	DK
B26c. Which water source is used most often (more than half the time) for cooking? (READ
CHOICES AND CIRCLE ONE.)
Tap water 						1
Bottled water	 2
Some other source 			 3
DON'T KNOW	 DK
B26d. Which water source is used most often (more than half the time) for drinking? (READ
CHOICES AND CIRCLE ONE.)
Tap water 	1
Bottled water 	2
Some other source 	3
DON'T KNOW	 DK
B26e. Do you use any of the following to treat your water at home? (CIRCLE "Y" or "N" FOR EACH
TREATMENT TYPE OR "DK" FOR DON'T KNOW.)
		YES	NO	DON'T KNOW
i.	Water Softener	Y	N	DK
ii.	Charcoal Filter	Y	N	DK
iii.	Reverse Osmosis	Y	N	DK
iv.	Distillation	Y	N	DK
v.	Other (Specify:	)	Y	N	DK
B27a. Is there an enclosed garage attached to this (house/apartment)? (CIRCLE "Y" OR "N.")
YES			 Y --> CONTINUE
NO	 N --> GO TO B28
B27b. Where is the attached garage? (READ CHOICES AND CIRCLE ONE.)
Underneath the main living quarters		1
Same level as the main living quarters 			2
Somewhere else; Specify:	 		3
A-29

-------
B27c. Is there a doorway leading directly from the garage into the living quarters? (CIRCLE "Y" OR
"N.")
YES 			 Y
NO					N
B27d. Are automobiles, vans, trucks or other motor vehicles parked in this attached garage? (CIRCLE
"Y" OR "N".)
YES	 Y
NO			 N
B28. Are any gas powered devices stored in any room, basement, or attached garage in this
(house/apartment)? (CIRCLE ONE. DO NOT INCLUDE CARS, VANS, OR TRUCKS. DO
INCLUDE MOTORCYCLES, GAS-POWERED LAWN MOWERS, TRIMMERS OR
BLOWERS, BOAT ENGINES, ETC.)
YES	 Y
NO	 N
DON'T KNOW 	DK
B29a. Is air conditioning (refrigeration) used to cool this (house/apartment)? (CIRCLE "Y" OR "N.")
YES				 Y --> CONTINUE
NO					 N --> GO TO B30
B29b. Which types of air conditioning units do you use? (READ CHOICES AND CIRCLE ALL
THAT APPLY.)
Central unit/units	 1
Window or wall unit/units 	 2
Portable unit/units	 3
B29c. During which month (do you usually/would you) start using air conditioning to cool this
(house/apartment)? During which month (do you usually/would you) stop using air
conditioning? (CIRCLE THE START AND STOP MONTHS.)
Start Month: Jan Feb Mar Apr May Jun Jul Aug Sept Oct Nov Dec
Stop Month: Jan Feb Mar Apr May Jun Jul Aug Sept Oct Nov Dec
A-30

-------
B30. QUESTION B30 SHOULD BE ASKED IN ARIZONA ONLY. OTHERS GO TO B31.
B30a. Is an evaporative (swamp) cooler used to cool this (house/apartment)? (CIRCLE "Y" OR "N")
YES			Y —> CONTINUE
NO 		N --> GO TO B31
B30b. Which types of evaporative (swamp) coolers are used ? (READ CHOICES AND CIRCLE ALL
THAT APPLY.)
Central unit/units	1
Window or wall unit/units 	2
Portable unit/units 		3
B30c. How often are the pads changed on the coolers? 				times/year.
Are they....
Changed during the summer	1
	times/summer.
Changed once at the beginning of the summer	2
Changed after	years of use	3
B30d. What types of pads are currently used in the coolers (READ CHOICES AND CIRCLE ALL
THAT APPLY).
Aspen pads			1
Paper pads			2
Synthetic pads	3
Other:					4
B30e. How often is the water drained and the cooler cleaned? 	times/year.
B30f. How often is water treatment added to the water?	times/year
B31. Which fuels are used for heating this (house/apartment)? (READ CHOICES AND CIRCLE ALL
THAT APPLY.)
Gas: from underground pipes serving
the neighborhood	 1
Gas: bottled, tank, or LP 					 2
Electricity 				 3
Fuel oil, kerosene, etc			 4
Coal or coke 	 5
Wood	 6
Solar energy 					 7
Other fuel (Specify:		) 	 8
No fuel used 		 9
Don't know								 DK
A-31

-------
B32. Does this (house/apartment) have a central heating system with ducts that blow air into most
rooms? (CIRCLE "Y" OR "N.")
YES 					 Y
NO			 N
B33. During which month (do you usually/would you) start using heating devices? During which
month (do you usually/would you) stop using heating devices? (CIRCLE THE START AND
STOP MONTH.)
Start month: Jan Feb Mar Apr May Jun Jul Aug Sept Oct Nov Dec
Stop month: Jan Feb Mar Apr May Jun Jul Aug Sept Oct Nov Dec
B34a. During the months identified in the last question, do you use portable kerosene heaters in this
(house/apartment)? (CIRCLE "Y" OR "N.")
YES			 Y --> CONTINUE
NO			 N --> GO TO B35a
B34b. How many kerosene heaters did you use last year?
B34c. How often do you use your kerosene heater during the heating season? (READ CHOICES
AND CIRCLE ONE.)
Less than one day a month				1
One to three days per month		2
One or two days a week 				3
3-5 days a week		 				4
More than 5 days a week 			5
B35a. During the heating season, is a portable or nonvented gas heater used in this (house/apartment)?
(CIRCLE "Y" OR "N.")
YES					 Y --> CONTINUE
NO	 N --> GO TO B36a
B35b. How many gas heaters?
B35c. How often is a portable or nonvented gas heater used? (READ CHOICES AND CIRCLE ONE.)
Less than one day a month						1
One to three days per month				2
One or two days a week 				3
3-5 days a week 							4
More than 5 days a week 		5
A-32

-------
B36a. During the heating season, is a wood-burning or coal-burning stove used in this
(house/apartment)? (CIRCLE "Y" OR "N.")
YES	 Y —> CONTINUE
NO	 N —> GO TO B37a
B36b. How many wood or coal-burning stoves?
B36c. How often is a wood-burning or coal-burning stove used during the heating season? (READ
CHOICES AND CIRCLE ONE.)
Less than one day a month				1
One to three days per month		2
One or two days a week 		3
3-5 days a week		4
More than 5 days a week 				5
B36d. What is burned in the stove? (READ CHOICES AND CIRCLE ALL THAT APPLY.)
Wood	 1
Coal		 2
Other: (Specify:	) 	 3
B37a. During the heating season, is a fireplace used in this (house/apartment)? (CIRCLE "Y" OR
"N.")
YES	 Y —> CONTINUE
NO	 N --> GO TO B38a
B37b. How many fireplaces?
B37c. How often is a fireplace used during the heating season? (READ CHOICES AND CIRCLE
ONE.)
Less than one day a month 			 1
One to three days per month	 2
One or two days a week 	 3
3-5 days a week 				4
More than 5 days a week 			 5
A-33

-------
B37d. What is burned in the fireplace? (READ CHOICES AND CIRCLE ALL THAT APPLY.)
Wood		1
Artificial logs 		2
Gas fire		3
Other (Specify:	)		4
B38a. In the past 6 months, were any chemicals for the control of termites, insects, rodents, or other
pests used inside this (house/apartment)? (CIRCLE ONE.)
YES 		 Y —> CONTINUE
NO 			 N --> GO TO B39a
DON'T KNOW 		DK --> GO TO B39a
B38b. In the past 6 months, what rooms in your home were treated with products for the control of
termites, insects, rodents, or other pests? (READ CHOICES AND CIRCLE ALL THAT
APPLY)
Living room 					1
Family room	2
Dining room				3
Kitchen	4
Bathroom(s) 		5
Bedroom(s) 		 6
Other rooms 			7
DON'T KNOW			DK
B38c. What areas within the room(s) were treated? (READ CHOICES AND CIRCLE ALL THAT
APPLY)
Floors 			 1
Baseboards 	2
Lower half of the walls	3
Upper half of the walls			4
Ceilings 						5
Cupboards with dishes, pots, and pans	6
Cupboards with food			7
Cabinets used for storage 			8
Closets			9
Other (Specify):	 			10
DON'T KNOW	DK
A-34

-------
B38d. In the past 6 months, how many times.... (ENTER NUMBER OR "DK" FOR DON'T KNOW.)
i.	did (you/he/she) personally apply these products inside this (house/apartment)?
ii.	did a professional exterminator apply these products inside this house
or apartment? 	
iii.	did someone else apply these products inside this (house/apartment)? 	
B38e. In what month were they last used inside this (house/apartment)? 	 (ENTER MONTH
OR "DK" FOR DON'T KNOW.)
B38f. What is(are) the name(s) of the product(s) last used inside this (house/apartment)? (IF
RESPONDENT DOES NOT KNOW, ASK TO SEE THE CONTAINERS. ENTER NAME OR
"DK" FOR DON'T KNOW) 	
B38g. The last time this product was used inside this (house/apartment) how was it prepared for
application? (READ CHOICES AND CIRCLE ONE.)
Mixed or diluted 		1 --> CONTINUE
Applied directly as purchased (no mixing)	2 --> GO TO B39a
DON'T KNOW		DK --> GO TO B39a
B38h. The last time this product was used inside this (house/apartment), who mixed the product?
(READ CHOICES AND CIRCLE ONE.)
Respondent 		1
Professional exterminator	2
Other					3
DON'T KNOW					DK
B38i. Where was it mixed? (ENTER LOCATION OR "DK" FOR DON'T KNOW.)
B39a. In the past 6 months, were any chemicals for the control of termites, insects, rodents, or other
pests used outside this (house/apartment)? (CIRCLE ONE.)
YES	Y —> CONTINUE
NO 								N --> GO TO B40a
DON'T KNOW							DK --> GO TO B40a
A-35

-------
B39b. In the past 6 months, how many times.... (ENTER NUMBER OR "DK" FOR DON'T KNOW.)
i. did (you/he/she) personally apply these products outside this (house/apartment)?
ii.	did a professional exterminator apply these products outside this house
or apartment? 	
iii.	did someone else apply these products outside this (house/apartment)? 	
B39c. In what month were they last used outside this (house/apartment)? (ENTER MONTH OR "DK"
FOR DON'T KNOW.)
B39d. What is (are) the name(s) of the product(s) last used outside this (house/apartment)? (IF
RESPONDENT DOES NOT KNOW, ASK TO SEE THE CONTAINER(S). ENTER NAME
OR "DK" FOR DON'T KNOW)
B39e. The last time this product was used outside this (house/apartment), how was it prepared for
application? (READ CHOICES AND CIRCLE ONE.)
Mixed or diluted			1 --> CONTINUE
Applied directly as purchased (no mixing)			2 --> GO TO B40a
DON'T KNOW	DK --> GO TO B40a
B39f. The last time this product was used outside your (house/apartment), who mixed the product?
(READ CHOICES AND CIRCLE ONE.)
Respondent 						1
Professional exterminator	2
Other			3
DON'T KNOW 		DK
B39g. Where was it mixed? (ENTER LOCATION OR "DK" FOR DON'T KNOW.)
B40a. In the past 6 months, have you had any regular lawn or yard treatments? (CIRCLE ONE.)
YES						 Y --> CONTINUE
NO 	 N --> GO TO B41
DON'T KNOW				 DK --> GO TO B41
A-36

-------
B40b. Who usually applies these treatments? (READ CHOICES AND CIRCLE ONE.)
Respondent 			 1
Professional 				2
Someone else 	3
B40c. Were the treatments applied wet or dry?
Wet 		1
Dry 			2
DON'T KNOW 			DK
B40d. In the past 6 months, how many of these lawn treatments contained weed control? (ENTER
NUMBER OR "DK" FOR DON'T KNOW.)
B40e. In the past 6 months, how many of these lawn treatments contained insect control? (ENTER
NUMBER OR "DK" FOR DON'T KNOW.)
B40f. In what month was the last treatment applied? (ENTER MONTH OR "DK" FOR DON'T
KNOW.)
B41. During the past six months have mothballs been used in this (house/apartment)? (CIRCLE "Y"
OR "N.")
YES			 Y
NO	 N
B42. During the past six months have room deodorizers been used in this (house/apartment)?
(CIRCLE "Y" OR "N.")
YES	 Y
NO			 N
B43a. Do you have house pets such as dogs, cats, gerbils, hamsters, rabbits, guinea pigs, birds?
(CIRCLE "Y" OR "N.")
YES 	 Y --> CONTINUE
NO 								 N --> GO TO B44
B43b. How many of these pets are kept indoors all the time?
A-37

-------
B43c. How many of these pets are kept outdoors all the time?
B43d. How many of these pets are kept both indoors AND outdoors?
B43e. Are any chemicals used on the pets to control fleas and ticks? (CIRCLE "Y" OR "N.")
YES			 Y --> CONTINUE
NO		 N --> GO TO B44
B43f. What is the name of the product last used on one of your pets to control fleas or ticks? (IF
RESPONDENT DOES NOT KNOW, ASK TO SEE THE CONTAINER(S). ENTER NAME
OR "DK" FOR DON'T KNOW)
A-38

-------
FAMILY INCOME
B44. Family income is often used in scientific studies to compare groups of people who are similar.
We do some analysis of the data using these groups. Please remember that all the data you
provide is held in strict confidence.
Approximately what is the gross annual income for all family members in this
household? (HAND CARD, PENCIL, AND ENVELOPE TO RESPONDENT.) Please
circle the number on this card and put the card in the envelope. Seal the envelope and
return it to me. (IF RESPONDENT PROVIDES ANSWER DIRECTLY, CIRCLE
NUMBER BELOW. IF RESPONDENT SEALS RESPONSE IN ENVELOPE, CIRCLE
"EN." IF RESPONDENT DOES BOTH, CIRCLE BOTH NUMBER AND "EN.")
Less than $9,999 		1
$ 10,000 - $ 19,999						2
$ 20,000 - $ 29,999			3
$ 30,000 - $ 39,999			4
$ 40,000 - $ 49,999			5
$ 50,000 - $ 74,999			6
$ 75,000 - $ 99,999			7
$100,000 or more 	 8
ANSWER IN ENVELOPE			EN
DON'T KNOW	DK
REFUSE	RE
A-39

-------
This page intentionally left blank.

-------
OMB Clearance #: 2080-0053
Expires: July 31, 1998
NATIONAL HUMAN EXPOSURE ASSESSMENT SURVEY
TIME DIARY AND ACTIVITY QUESTIONNAIRE
Participant Identification Number
[Place Label Here]
Public reporting burden for this collection of information is estimated to average	hours (or minutes)
per response, and to require	hours recordkeeping. This includes the time for reviewing instructions,
searching existing data sources, gathering and maintaining the data needed, and completing and
reviewing the collection of information. Send comments regarding this burden estimate or any other
aspect of this collection of information, including suggestions for reducing the burden, to Chief,
Information Policy Branch, 2136, U.S. Environmental Protection Agency, 401 M St., S.W., Washington,
D.C. 20460; and to the Office of Information and Regulatory Affairs, Office of Management and
Budget, Washington, D.C. 20503.	
INTERVIEWER/TECHNICIAN ID:	
Date Completed:	/	/		July 14, 1995
A-41

-------
[THIS PAGE WILL CONTAIN THE INFORMATION NECESSARY TO IDENTIFY THE
PARTICIPANT AND WILL BE DESIGNED BY EACH CONSORTIUM TO MEET ITS NEEDS.
THIS IS AN EXAMPLE OF THE INFORMATION THAT WILL BE RECORDED.]
NATIONAL HUMAN EXPOSURE ASSESSMENT SURVEY
TIME DIARY AND ACTIVITY QUESTIONNAIRE
DESIGNATED PARTICIPANT
(If the participant is less than 10 years old, what is the name of the individual who is providing
the answers for the designated respondent?)
Name of Participant	
Completed by	(if other than participant)
Relation to participant	
Home Phone		Date: / /
LOCATION DATA (Technician Completed—address/ID label)
State	 County	
Census Tract	Block	
Street Address	
Apt/Space #
City, Zip	
... Zip code
INTERVIEWER/TECHNICIAN ID:		Date Completed: / /
A-42

-------
NATIONAL HUMAN EXPOSURE ASSESSMENT SURVEY
TIME DIARY AND ACTIVITY QUESTIONNAIRE
At the end of each day, take a few minutes to record the time (you/your child) spent in each of
the seven listed locations. There is one box for each day of the study. The numbers in the box
stand for hours of the day. For example, 5 in the morning is 5:00 a.m. to 5:59 a.m. For each
hour of the day, place an X through the number for each location where (you/your child) spent
any time during the hour. Make sure there is at least one X for each hour of the day.
The terms used in the time diary are defined as follows:
•	Home: The house or apartment where (you live/your child lives); the location
where we are collecting samples.
•	Work: A place away from home where (you work/your child works).
•	School: A place away from home where (you attend/your child attends) school.
•	Transit: Any travel from one location to another, including all travel between such
places as home, school, and shopping centers, as well as all other travel on roads,
paths, or trails.
•	Other: All other places (you spend/your child spends) time besides home, work,
school, and in transit between locations.
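For anyone keying completed diaries into a data file, the rule that every hour of the day needs at least one X can be checked automatically. The sketch below is only an illustration and is not part of the NHEXAS instrument. The function name, the dictionary layout, and the use of a 24-hour clock are assumptions made for the example; the location names are taken from the diary pages that follow.

```python
# Hypothetical data-entry check; not part of the NHEXAS forms.
# The diary boxes run from 5:00 a.m. on the diary day through 4:59 a.m. the
# next morning (Morning 5-11, Afternoon 12-5, Evening 6-11, Early Morning 12-4).
DIARY_HOURS = [h % 24 for h in range(5, 29)]  # 24-hour clock: 5, 6, ..., 23, 0, ..., 4

LOCATIONS = [
    "In Transit",
    "Inside at Home", "Inside at Work and School", "Inside at Other",
    "Outside at Home", "Outside at Work and School", "Outside at Other",
]


def hours_without_mark(day):
    """Return the diary hours (24-hour clock) that have no X in any location.

    `day` maps a location name to the set of hours marked for that location.
    """
    marked = set()
    for location in LOCATIONS:
        marked |= set(day.get(location, ()))
    return [h for h in DIARY_HOURS if h not in marked]


# Example: at home overnight, at work during the day, in transit in between.
example_day = {
    "Inside at Home": set(range(0, 8)) | set(range(18, 24)),
    "In Transit": {8, 17},
    "Inside at Work and School": set(range(9, 17)),
}
print(hours_without_mark(example_day))  # → []
```

An empty list means every hour has at least one mark; any hours returned are the ones to query with the participant.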
A-43

-------
Day 1
Day of
Week
Location
Morning
Afternoon
Evening
Early Morning
(Night time)
DATE	/_/_
In Transit
5 6 7 8 9 10 11
12 1 2 3 4 5
6 7 8 9 10 11
12 1 2 3 4

Inside at Home
Inside at Work and School
Inside at Other
5 6 7 8 9 10 11
5 6 7 8 9 10 11
5 6 7 8 9 10 11
12 1 2 3 4 5
12 1 2 3 4 5
12 1 2 3 4 5
6 7 8 9 10 11
6 7 8 9 10 11
6 7 8 9 10 11
12 1 2 3 4
12 1 2 3 4
12 1 2 3 4

Outside at Home
Outside at Work and School
Outside at Other
5 6 7 8 9 10 11
5 6 7 8 9 10 11
5 6 7 8 9 10 11
12 1 2 3 4 5
12 1 2 3 4 5
12 1 2 3 4 5
6 7 8 9 10 11
6 7 8 9 10 11
6 7 8 9 10 11
12 1 2 3 4
12 1 2 3 4
12 1 2 3 4
PLEASE ALSO COMPLETE THE COLUMN FOR DAY 1 STARTING ON PAGE 6.
Day 2
Day of
Week
Location
Morning
Afternoon
Evening
Early Morning
(Night time)
DATE_/__/_
In Transit
5 6 7 8 9 10 11
12 1 2 3 4 5
6 7 8 9 10 11
12 1 2 3 4

Inside at Home
Inside at Work and School
Inside at Other
5 6 7 8 9 10 11
5 6 7 8 9 10 11
5 6 7 8 9 10 11
12 1 2 3 4 5
12 1 2 3 4 5
12 1 2 3 4 5
6 7 8 9 10 11
6 7 8 9 10 11
6 7 8 9 10 11
12 1 2 3 4
12 1 2 3 4
12 1 2 3 4

Outside at Home
Outside at Work and School
Outside at Other
5 6 7 8 9 10 11
5 6 7 8 9 10 11
5 6 7 8 9 10 11
12 1 2 3 4 5
12 1 2 3 4 5
12 1 2 3 4 5
6 7 8 9 10 11
6 7 8 9 10 11
6 7 8 9 10 11
12 1 2 3 4
12 1 2 3 4
12 1 2 3 4
PLEASE ALSO COMPLETE THE COLUMN FOR DAY 2 STARTING ON PAGE 6.
Day 3
Day of
Week
Location
Morning
Afternoon
Evening
Early Morning
(Night time)
DATE_/__/_
In Transit
5 6 7 8 9 10 11
12 1 2 3 4 5
6 7 8 9 10 11
12 1 2 3 4

Inside at Home
Inside at Work and School
Inside at Other
5 6 7 8 9 10 11
5 6 7 8 9 10 11
5 6 7 8 9 10 11
12 1 2 3 4 5
12 1 2 3 4 5
12 1 2 3 4 5
6 7 8 9 10 11
6 7 8 9 10 11
6 7 8 9 10 11
12 1 2 3 4
12 1 2 3 4
12 1 2 3 4

Outside at Home
Outside at Work and School
Outside at Other
5 6 7 8 9 10 11
5 6 7 8 9 10 11
5 6 7 8 9 10 11
12 1 2 3 4 5
12 1 2 3 4 5
12 1 2 3 4 5
6 7 8 9 10 11
6 7 8 9 10 11
6 7 8 9 10 11
12 1 2 3 4
12 1 2 3 4
12 1 2 3 4
PLEASE ALSO COMPLETE THE COLUMN FOR DAY 3 STARTING ON PAGE 6.
A-44

-------
Day 4
Day of
Week
Location
Morning
Afternoon
Evening
Early Morning
(Night time)
DATE__/__/_
In Transit
5 6 7 8 9 10 11
12 1 2 3 4 5
6 7 8 9 10 11
12 1 2 3 4

Inside at Home
Inside at Work and School
Inside at Other
5 6 7 8 9 10 11
5 6 7 8 9 10 11
5 6 7 8 9 10 11
12 1 2 3 4 5
12 1 2 3 4 5
12 1 2 3 4 5
6 7 8 9 10 11
6 7 8 9 10 11
6 7 8 9 10 11
12 1 2 3 4
12 1 2 3 4
12 1 2 3 4

Outside at Home
Outside at Work and School
Outside at Other
5 6 7 8 9 10 11
5 6 7 8 9 10 11
5 6 7 8 9 10 11
12 1 2 3 4 5
12 1 2 3 4 5
12 1 2 3 4 5
6 7 8 9 10 11
6 7 8 9 10 11
6 7 8 9 10 11
12 1 2 3 4
12 1 2 3 4
12 1 2 3 4
PLEASE ALSO COMPLETE THE COLUMN FOR DAY 4 STARTING ON PAGE 6.
Day 5
Day of
Week
Location
Morning
Afternoon
Evening
Early Morning
(Night time)
DATE_/__/_
In Transit
5 6 7 8 9 10 11
12 1 2 3 4 5
6 7 8 9 10 11
12 1 2 3 4

Inside at Home
Inside at Work and School
Inside at Other
5 6 7 8 9 10 11
5 6 7 8 9 10 11
5 6 7 8 9 10 11
12 1 2 3 4 5
12 1 2 3 4 5
12 1 2 3 4 5
6 7 8 9 10 11
6 7 8 9 10 11
6 7 8 9 10 11
12 1 2 3 4
12 1 2 3 4
12 1 2 3 4

Outside at Home
Outside at Work and School
Outside at Other
5 6 7 8 9 10 11
5 6 7 8 9 10 11
5 6 7 8 9 10 11
12 1 2 3 4 5
12 1 2 3 4 5
12 1 2 3 4 5
6 7 8 9 10 11
6 7 8 9 10 11
6 7 8 9 10 11
12 1 2 3 4
12 1 2 3 4
12 1 2 3 4
PLEASE ALSO COMPLETE THE COLUMN FOR DAY 5 STARTING ON PAGE 6.
Day 6
Day of
Week
Location
Morning
Afternoon
Evening
Early Morning
(Night time)
DATE_/_/_
In Transit
5 6 7 8 9 10 11
12 1 2 3 4 5
6 7 8 9 10 11
12 1 2 3 4

Inside at Home
Inside at Work and School
Inside at Other
5 6 7 8 9 10 11
5 6 7 8 9 10 11
5 6 7 8 9 10 11
12 1 2 3 4 5
12 1 2 3 4 5
12 1 2 3 4 5
6 7 8 9 10 11
6 7 8 9 10 11
6 7 8 9 10 11
12 1 2 3 4
12 1 2 3 4
12 1 2 3 4

Outside at Home
Outside at Work and School
Outside at Other
5 6 7 8 9 10 11
5 6 7 8 9 10 11
5 6 7 8 9 10 11
12 1 2 3 4 5
12 1 2 3 4 5
12 1 2 3 4 5
6 7 8 9 10 11
6 7 8 9 10 11
6 7 8 9 10 11
12 1 2 3 4
12 1 2 3 4
12 1 2 3 4
PLEASE ALSO COMPLETE THE COLUMN FOR DAY 6 STARTING ON PAGE 6.
A-45

-------
Day 7
Day of
Week
Location
Morning
Afternoon
Evening
Early Morning
(Night time)
DATE_/__/_
In Transit
5 6 7 8 9 10 11
12 1 2 3 4 5
6 7 8 9 10 11
12 1 2 3 4

Inside at Home
Inside at Work and School
Inside at Other
5 6 7 8 9 10 11
5 6 7 8 9 10 11
5 6 7 8 9 10 11
12 1 2 3 4 5
12 1 2 3 4 5
12 1 2 3 4 5
6 7 8 9 10 11
6 7 8 9 10 11
6 7 8 9 10 11
12 1 2 3 4
12 1 2 3 4
12 1 2 3 4

Outside at Home
Outside at Work and School
Outside at Other
5 6 7 8 9 10 11
5 6 7 8 9 10 11
5 6 7 8 9 10 11
12 1 2 3 4 5
12 1 2 3 4 5
12 1 2 3 4 5
6 7 8 9 10 11
6 7 8 9 10 11
6 7 8 9 10 11
12 1 2 3 4
12 1 2 3 4
12 1 2 3 4
PLEASE ALSO COMPLETE THE COLUMN FOR DAY 7 STARTING ON PAGE 6.
Day 8
Day of
Week
Location
Morning
Afternoon
Evening
Early Morning
(Night time)
DATE_/__/_
In Transit
5 6 7 8 9 10 11
12 1 2 3 4 5
6 7 8 9 10 11
12 1 2 3 4

Inside at Home
Inside at Work and School
Inside at Other
5 6 7 8 9 10 11
5 6 7 8 9 10 11
5 6 7 8 9 10 11
12 1 2 3 4 5
12 1 2 3 4 5
12 1 2 3 4 5
6 7 8 9 10 11
6 7 8 9 10 11
6 7 8 9 10 11
12 1 2 3 4
12 1 2 3 4
12 1 2 3 4

Outside at Home
Outside at Work and School
Outside at Other
5 6 7 8 9 10 11
5 6 7 8 9 10 11
5 6 7 8 9 10 11
12 1 2 3 4 5
12 1 2 3 4 5
12 1 2 3 4 5
6 7 8 9 10 11
6 7 8 9 10 11
6 7 8 9 10 11
12 1 2 3 4
12 1 2 3 4
12 1 2 3 4
PLEASE ALSO COMPLETE THE COLUMN FOR DAY 8 STARTING ON PAGE 6.
A-46

-------
DAILY ACTIVITY INFORMATION

1
2
3
4
5
6
7
8

Day
Day
Day
Day
Day
Day
Day
Day

Date
Date
/ /
Date
/ /
Date
/ /
Date
/ /
Date
/ /
Date
/ /
Date
/ /
Questions A1-A13: Please circle "Y" for Yes or "N" for No.




A1. Did (you/your child)
pump gas today?
Y N
Y N
Y N
Y N
Y N
Y N
Y N
Y N
A2. Did (you/your child)
spill gasoline on
(your/his/her) skin
today?
Y N
Y N
Y N
Y N
Y N
Y N
Y N
Y N
A3. Did (you/your child)
spend at least 15
minutes in an enclosed
garage with a parked
car today?
Y N
Y N
Y N
Y N
Y N
Y N
Y N
Y N
A4. Did (you/your child)
have soil or dirt from
your yard in contact
with the skin today?
Y N
Y N
Y N
Y N
Y N
Y N
Y N
Y N
A5. Did (you/your child)
have grass or leaves
from your yard in
contact with the skin
today?
Y N
Y N
Y N
Y N
Y N
Y N
Y N
Y N
A6. Did (you/your child)
clean a fireplace or
wood stove today?
Y N
Y N
Y N
Y N
Y N
Y N
Y N
Y N
A-47

-------

1
2
3
4
5
6
7
8

Day
Day
Day
Day
Day
Day
Day
Day

Date
/ /
Date
/ /
Date
/ /
Date
/ /
Date
/ /
Date
/ /
Date
/ /
Date
/ /
Questions A1-A13 (continued): Please circle "Y" for Yes or "N" for No.



A7. Did (you/your child)
start or tend a fire in a
fireplace or wood stove
today?
Y N
Y N
Y N
Y N
Y N
Y N
Y N
Y N
A8. Did (you/your child)
use an outdoor grill or
burn wood, leaves, or
trash today?
Y N
Y N
Y N
Y N
Y N
Y N
Y N
Y N
A9. Were any tobacco
products smoked in the
home today?
Y N
Y N
Y N
Y N
Y N
Y N
Y N
Y N
A10. Did (you/your child)
take a shower today?
Y N
Y N
Y N
Y N
Y N
Y N
Y N
Y N
A11. Did (you/your child)
take a bath today?
Y N
Y N
Y N
Y N
Y N
Y N
Y N
Y N
A12. Did (you/your child)
prepare (pour, mix)
pesticides,
insecticides, or
herbicides for use
today?
Y N
Y N
Y N
Y N
Y N
Y N
Y N
Y N
A13. Did (you/your child)
apply pesticides,
insecticides, or
herbicides today?
Y N
Y N
Y N
Y N
Y N
Y N
Y N
Y N
A-48

-------

1
2
3
4
5
6
7
8

Day
Day
Day
Day
Day
Day
Day
Day

Date
/ /
Date
/ /
Date
/ /
Date
/ /
Date
/ /
Date
/ /
Date
/ /
Date
/ /
Questions A14-A18: Please enter the number.
A14. How many glasses or
cups of water did
(you/your child)
drink today?
# drinks
# drinks
# drinks
# drinks
# drinks
# drinks
# drinks
# drinks
A15. How many cigarettes
did (you/ your child)
smoke today?
# cigarettes
# cigarettes
# cigarettes
# cigarettes
# cigarettes
# cigarettes
# cigarettes
# cigarettes
A16. How many cigars or
pipesful did
(you/your child)
smoke today?
# cigars/-
pipesful
# cigars/-
pipesful
# cigars/-
pipesful
# cigars/-
pipesful
# cigars/-
pipesful
# cigars/-
pipesful
# cigars/-
pipesful
# cigars/-
pipesful
A17. How many times did
(you/ your child) use
smokeless tobacco
today?
# times
# times
# times
# times
# times
# times
# times
# times
A18. How many times did
(you/your child)
wash (your/his/her)
hands today?
# times
# times
# times
# times
# times
# times
# times
# times
A-49

-------

1
2
3
4
5
6
7
8

Day
Day
Day
Day
Day
Day
Day
Day

Date
Date
Date
Date
Date
Date
Date
Date

/ /
/ /
/ /
/ /
/ /
/ /
/ /
/ /
Questions A19-A28: Please enter time spent. If the time was less than 1 hour, enter 15 min, 30 min, 45 min, or
1 hr, whichever is closest to time actually spent. If time was greater than 1 hour, round to the nearest hour.
Circle either min. or hr.
A19. (You/your child)
traveled on roadways
or highways today?
A20. (You/your child)
spent indoors with
someone who was
smoking?
A21. (You/your child)
spent in a vehicle
with someone who
was smoking?
A22. (You/your child)
spent swimming in
indoor or outdoor
pools today?
A23. (You/your child)
spent using cleaning
supplies (cleaners,
waxes, polishes)
today?
A24. (You/your child)
spent laying down or
sitting on the carpet
or rugs at home
		today?	
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
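The recording rule for Questions A19-A28 can be applied mechanically when a participant reports a duration in minutes. The sketch below is a minimal illustration under stated assumptions and is not part of the NHEXAS instrument. The function name is hypothetical, and the handling of zero and of exact ties (for example 90 minutes, which is equally close to 1 and 2 hours) is an assumption, since the form does not say how to resolve them.

```python
# Hypothetical helper; not part of the NHEXAS forms.
def record_time_spent(minutes):
    """Convert a reported duration in minutes to the (value, unit) pair entered on the form."""
    if minutes < 0:
        raise ValueError("duration cannot be negative")
    if minutes == 0:
        return (0, "min")  # assumption: no time spent is recorded as zero
    if minutes < 60:
        # Closest of 15, 30, 45 minutes or 1 hour; ties go to the larger value (assumption).
        choices = [15, 30, 45, 60]
        nearest = min(choices, key=lambda c: (abs(c - minutes), -c))
        return (1, "hr") if nearest == 60 else (nearest, "min")
    # One hour or more: round to the nearest whole hour, halves rounding up (assumption).
    return (int((minutes + 30) // 60), "hr")


print(record_time_spent(20))   # → (15, 'min')
print(record_time_spent(55))   # → (1, 'hr')
print(record_time_spent(140))  # → (2, 'hr')
```

Explicit integer arithmetic is used for the hour rounding because Python's built-in `round` resolves halves to the nearest even number, which would make 90 minutes round down and 150 minutes round up.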
A-50

-------

1
2
3
4
5
6
7
8

Day
Day
Day
Day
Day
Day
Day
Day

Date
Date
Date
Date
Date
Date
Date
Date

/ /
/ /
/ /
/ /
/ /
/ /
/ /
/ /
Questions A19-A28: Please enter time spent. If the time was less than 1 hour, enter 15 min, 30 min, 45 min, or 1 hr,
whichever is closest to time actually spent. If time was greater than 1 hour, round to the nearest hour. Circle either min.
or hr.








A25. (You/your child)
spent in an enclosed
workshop or garage
used as a workshop
today?
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
A26. Doors and windows
at your house were
left open for
ventilation today?
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
A27. (You/your child)
spent performing
vigorous exercise
like digging or other
heavy manual labor,
running, bicycling,
aerobic dancing,
playing basketball or
soccer today?
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
A28. (You/your child)
spent performing
moderate exercise
like walking,
gardening, working
while on your feet,
or playing softball or
golf today?
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
min or hr
For Technician Use Only
Comp. [ ]
Comp. [ ]
Comp. [ ]
Comp. [ ]
Comp. [ ]
Comp. [ ]
Comp. [ ]
Comp. [ ]

Asst. [ ]
Asst. [ ]
Asst. [ ]
Asst. [ ]
Asst. [ ]
Asst. [ ]
Asst. [ ]
Asst. [ ]

Do []
Do []
Do []
Do []
Do []
Do []
Do []
Do []
A-51

-------
This page intentionally left blank.
A-52

-------
EXTERIOR AND INTERIOR RESIDENTIAL CHARACTERISTICS
T6a. Surrounding area (within a quarter mile radius of this property): (CIRCLE ALL THAT APPLY.)
Residential 	1
Recreational 	2
Commercial 	3
Industrial 	4
Agricultural 	5
Other (specify): 	6
T6b. Distance to street (MEASURE THE DISTANCE FROM THE CURB TO THE PRIMARY
ENTRANCE TO THE RESIDENCE OR CHECK BOX IF DISTANCE IS ESTIMATED TO BE
GREATER THAN 300 FEET.):
	feet
	 >300 feet
T6c. Exterior siding material (including foundation): (CIRCLE ALL THAT APPLY.)
Wood 	1
Brick 	2
Vinyl/aluminum 	3
Concrete block 	4
Stucco 	5
Asbestos/asphalt 	6
Other (Specify:	) 	7
T6d. Is there paint on any exterior surface that is chalking, chipping or peeling?
YES 	1
NO 	2
NOT PAINTED 	3
T6e. Is there paint on any interior surface that is chalking, chipping or peeling?
YES 	1
NO 	2
NOT PAINTED 	3
A-56

-------
OMB Clearance #: 2080-0053
Expires: July 31, 1998
NATIONAL HUMAN EXPOSURE ASSESSMENT SURVEY
TECHNICIAN WALK-THROUGH QUESTIONNAIRE
Participant Identification Number
[Place Label Here]
Public reporting burden for this collection of information is estimated to average	hours (or minutes)
per response, and to require	hours recordkeeping. This includes the time for reviewing instructions,
searching existing data sources, gathering and maintaining the data needed, and completing and
reviewing the collection of information. Send comments regarding this burden estimate or any other
aspect of this collection of information, including suggestions for reducing the burden, to Chief,
Information Policy Branch, 2136 , U.S. Environmental Protection Agency, 401 M St., S.W., Washington,
D.C. 20460; and to the Office of Information and Regulatory Affairs, Office of Management and
Budget, Washington, D.C. 20503.
INTERVIEWER/TECHNICIAN ID: 	
Date:	/	/		July 14, 1995
A-53

-------
[THIS PAGE WILL CONTAIN THE INFORMATION NECESSARY TO IDENTIFY
THE PARTICIPANT AND WILL BE DESIGNED BY EACH CONSORTIUM TO MEET
ITS NEEDS. THIS IS AN EXAMPLE OF THE INFORMATION THAT WILL BE
RECORDED.]
NATIONAL HUMAN EXPOSURE ASSESSMENT SURVEY
TECHNICIAN WALK-THROUGH QUESTIONNAIRE
LOCATION DATA (Technician Completed—address/ID label)
State	 County	
Census Tract	Block	
Street Address	/	
Apt./Space #
City, Zip		___	/
Zip code
INTERVIEWER/TECHNICIAN ID:	 Date Completed: / /
A-54

-------
NATIONAL HUMAN EXPOSURE ASSESSMENT SURVEY
TECHNICIAN WALK-THROUGH QUESTIONNAIRE
COMPLETE THIS QUESTIONNAIRE BY OBSERVATION. YOU MAY ASK PARTICIPANT ANY
QUESTIONS THAT ARE NOT APPARENT.
T1. How many stories (floors) are in this building? (COUNT ONLY FLOORS WITH FINISHED
ROOMS FOR LIVING PURPOSES OR FINISHED BASEMENTS.)
	Floors
IF MULTI-FAMILY BUILDING , CONTINUE. ELSE, GO TO QUESTION T3.
T2. Which floor(s) do respondents live on? 	floor(s).
T3. Of these rooms, how many are carpeted or have rugs covering most (>50%) of their surface?
	Rooms
T4. Using the following statements, how would you rate the overall dust level within the residence?
(CIRCLE ONE.)
Very Dusty -- 				 1
Some Dust - obvious efforts to control dust 	 2
"No" Dust — extreme dust control, very clean 		 3
Additional Comments on dust control:		
T5. Indicate nearest major intersection:		
A-55

-------
T6f. Material around primary entrance to structure: (CIRCLE ALL THAT APPLY.)
Soil 					1
Grass 		2
Cement/asphalt/brick 		3
Gravel				4
Wood 						5
Other (Specify:	)		6
T6g. Dripline: (CIRCLE ONE.)
At wall 				1
Gutters — no dripline 			2
	feet from wall 					3
Other (Specify:	)		4
T6h. Roof type and composition: (CIRCLE ALL THAT APPLY.)
Tarred roof - petroleum base 	1
Sealed with roof protector 	2
Wood shakes/shingles 	3
Composition asphalt shingles 	4
Other (Specify:	) 	5
T6i. Yard material: (CIRCLE ALL THAT APPLY.)
Soil 	1
Grass 	2
Porch/balcony 	3
Cement 	4
Wood/deck 	5
Other (Specify:	) 	6
Not applicable 	7
T6j. Types of foundation: (CIRCLE ALL THAT APPLY.)
Slab 	1
Crawl space 	2
Combination crawl space/basement 	3
Full basement 	4
Other (Specify:	) 	5
DON'T KNOW 	DK
T7a. Does this residence have a swimming pool?
YES 	Y —> CONTINUE
NO 	N —> GO TO T8a
T7b. Where is the swimming pool located?
Inside 	1
Outside 	2
A-57

-------
T8a. Does this house or apartment have a hot tub or Jacuzzi?
YES		Y —> CONTINUE
NO					N —> STOP
T8b. Where is the hot tub or Jacuzzi located?
Inside							1
Outside 	 2
A-58

-------
T9. Subject Tracking (Arizona Only)
It is vital that the subject number is assigned correctly. Respondent numbers were assigned during the
initial contact. Prior to entering the field, record the preassigned respondent numbers and the first name of
the subject. Verify the previous information and record additional information. Record the names and
status of any previously absent or unreported household members. Assign additional household members a
respondent number.
Pre-Assigned
Respondent #
Legal
First Name
Date
of
Birth
Relationship to
Respondent
01
Bedroom
# (from
diagram)
Respondents #
During this
Visit Series
Changes in
Respondent
Status


































































































The Primary Respondent is Number
A-59

-------
T10. Household Diagram
1.	Overall dimensions of the portion of the house or apartment occupied by the residents:
Average length:	ft 		Width:	ft
Ceiling height:	ft.
2.	Diagram the house with appropriate dimension for each room. If present, label the living room (LR) or
family room (FR), the kitchen (KA), and other rooms (OR). In addition, label the main room (MR)
occupied (usually the living or family rooms). As a convention, label the bedrooms in order of size.
(B01=the largest, B02=the next largest, etc.). Bedrooms of equal size can be labeled arbitrarily.
Indicate room(s) where Indoor Samples are placed:
PM		Active VOC		
Carpet dust		Passive VOC	
Surface dust		Passive HCHO	
P1D		Personal Air (Respondent #)
Other:		Other:	
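The bedroom labeling convention in item 2 (B01 for the largest bedroom, B02 for the next largest, and so on) can be reproduced when room dimensions are entered electronically. The sketch below is a minimal illustration and is not part of the NHEXAS instrument. The function name, the working room names, and the use of floor area in square feet as the measure of size are assumptions made for the example.

```python
# Hypothetical helper; not part of the NHEXAS forms.
def label_bedrooms(areas_sq_ft):
    """Return labels B01, B02, ... assigned in order of decreasing floor area.

    `areas_sq_ft` maps a technician's working name for each bedroom to its area.
    Bedrooms of equal size keep their input order, which is one arbitrary choice.
    """
    ordered = sorted(areas_sq_ft.items(), key=lambda item: item[1], reverse=True)
    return {name: f"B{rank:02d}" for rank, (name, _) in enumerate(ordered, start=1)}


print(label_bedrooms({"north": 120, "south": 150, "east": 100}))
# → {'south': 'B01', 'north': 'B02', 'east': 'B03'}
```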
A-60

-------
T11. Characteristics of floor surfaces and cleaning utensils

Floor Surface
Cleaning Methods
ROOM/FLOOR #
Carpeted
1	Looped
2	Shag
3	Cut/Pile
4	Looped Cut
5	Other
Hard Surface
1	Concrete
2	Brick
3	Wood
4	Tile
5	Other
Other (Specify)
Scotch
Guard
Applied
Last Date and Method of
Carpet Cleaning (i.e.
Professional or Do-it-
yourself, Water, Steam, or
Chemicals)
Does Anyone
Frequently Occupy
the Floor of this
Room (Crawling, Sleeping,
Playing, Sitting)
GIVE NAME

1 2 3 4 5
1 2 3 4 5

Y N



1 2 3 4 5
1 2 3 4 5

Y N



1 2 3 4 5
1 2 3 4 5

Y N



1 2 3 4 5
1 2 3 4 5

Y N



1 2 3 4 5
1 2 3 4 5

Y N



1 2 3 4 5
1 2 3 4 5

Y N



1 2 3 4 5
1 2 3 4 5

Y N



1 2 3 4 5
1 2 3 4 5

Y N



1 2 3 4 5
1 2 3 4 5

Y N



1 2 3 4 5
1 2 3 4 5

Y N


A-61

-------
This page intentionally left blank.

-------
OMB Clearance #: 2080-0053
Expires: July 31, 1998
NATIONAL HUMAN EXPOSURE ASSESSMENT SURVEY
FOLLOWUP QUESTIONNAIRE
Participant Identification Number
[Place Label Here]
Public reporting burden for this collection of information is estimated to average	hours (or minutes)
per response, and to require	hours recordkeeping. This includes the time for reviewing instructions,
searching existing data sources, gathering and maintaining the data needed, and completing and
reviewing the collection of information. Send comments regarding this burden estimate or any other
aspect of this collection of information, including suggestions for reducing the burden, to Chief,
Information Policy Branch, 2136, U.S. Environmental Protection Agency, 401 M St., S.W., Washington,
D.C. 20460; and to the Office of Information and Regulatory Affairs, Office of Management and
Budget, Washington, D.C. 20503.	
INTERVIEWER/TECHNICIAN ID:
Date Completed	/	/	July 14, 1995
A-63

-------
[THIS PAGE WILL CONTAIN THE INFORMATION NECESSARY TO IDENTIFY
THE PARTICIPANT AND WILL BE DESIGNED BY EACH CONSORTIUM TO MEET
ITS NEEDS. THIS IS AN EXAMPLE OF THE INFORMATION THAT WILL BE
RECORDED.]
NATIONAL HUMAN EXPOSURE ASSESSMENT SURVEY
FOLLOWUP QUESTIONNAIRE
DESIGNATED PARTICIPANT
(If the participant is less than 10 years old, what is the name of the individual who is providing
the answers for the designated participant?)
Name of Participant	
Completed by	(if other than participant)
Relation to participant	
Home Phone	 Date: / /
LOCATION DATA (Technician Completed—address/ID label)
State	 County	
Census Tract	Block	
Street Address			
Apt./Space #
City, Zip	/	
Zip code
INTERVIEWER/TECHNICIAN ID: 	Date Completed / /
A-64

-------
These first questions are about things which may have happened in your home. They can be things (you do or see/he/she does or sees) or just normal
activities. Please think about only the past week, the time when you were taking part in this study.
F1. In the past week, were any of the following items used in your home? (READ CHOICES AND CIRCLE "N" FOR NO AND "Y" FOR YES.)
No Yes
a.	central air conditioner? 		N Y
b.	a window or wall air conditioning unit(s)? ... .	N Y —>	bl. Was it set to... (READ CHOICES AND
CIRCLE ONE.)
recirculate 	 1
outdoor air	 2
DON'T KNOW	DK
c.	an evaporative cooler? 				N Y
d.	a portable or ceiling fan?		N Y
e.	a window fan?		N Y
f.	an exhaust fan?		 N Y
A-65

-------


No
Yes
g.
a wood- or coal-burning stove or furnace?	
N
Y
h.
an oil-burning furnace? 	
N
Y
i.
a kerosene space heater? 	
N
Y
j.
a gas-fired space heater? 	
N
Y
k.
a fireplace?			
N
Y
l. forced-air central heat? (not oil, wood, or coal
burning)		N	Y
m. electrostatic precipitator?		N	Y
n. ultrasonic humidifier? 		N	Y
o. Other filtering device?			N	Y
Specify: 	
A-66
CIRCLE CORRECT
NUMBER
1 2 3 4 5 6 7	-->
1 2 3 4 5 6 7	-->
1 2 3 4 5 6 7	-->
1 2 3 4 5 6 7	-->
1 2 3 4 5 6 7	-->
When (you/he/she) used
(READ CHOICES), on
how many days, if any,
did you see or smell
unusually heavy smoke
or other fumes coming
into the room?
(ENTER 0 OR
NUMBER OF DAYS)
	days
_ days
	days
	days
_ days
1 2 3 4 5 6 7
1 2 3 4 5 6 7
1 2 3 4 5 6 7
1 2 3 4 5 6 7

-------
In the past week, did (you/he/she) spend any time....? or (were you/was he/she) near anyone who was....? (READ THE QUESTION FOR EACH ACTIVITY. IF
RESPONDENT ANSWERS YES FOR ANY ACTIVITY, CIRCLE "Y" AND ASK DETAILED SUB-QUESTIONS. CIRCLE "N" FOR NO. IF THE TIME USING
OR BEING NEAR THE USE OF A PRODUCT IS LESS THAN 1 HOUR, ENTER 15 MIN, 30 MIN, 45 MIN, OR 1 HR, WHICHEVER IS CLOSEST TO TIME
REPORTED. IF TIME IS GREATER THAN 1 HOUR, ROUND TO THE NEAREST HOUR. CIRCLE EITHER MIN. OR HR.)
No	Yes
a.	paints or solvents (thinners and removers)? ...	N	Y—>
b.	glues or adhesives, such as contact cements,
super glues, and aerosol adhesives, that contain
chemical solvents?				N	Y—>
c.	petroleum products (kerosene, fuel oil) (not
pumping gas)?				N	Y->
d.	gas-powered lawn mower? 		N	Y—>
e.	chain saw or other gas-powered equipment ....	N	Y—>
f.	sander?		N	Y->
g.	insecticides, pesticides, herbicides in any way,
including farming or gardening? 		N	Y—>
How many
times in the past
week?
times
times
_times
_times
times
times
Number of
days
since last
used?
	days
times
_days
_days
_days
_days
_days
days
How long
(were
you/was he/
she) using or
near use of:
CIRCLE
min OR hr
_min/hr
_min/hr
_min/hr
min/hr
_min/hr
min/hr
Did (you/
he/she)
handle them
(yourself/
himself/
herself)?
N
N
N
N
N
N
Did (you/
he/she)
wash
hands after
use?
Y->
Y->
Y ->
Y->
Y->
Y->
min/hr N
N
N
N
N
N
N
N
Y
Y
Y
Y
Y
Y
Did (you/
he/she) wear
gloves,
masks, or
other
protective
equipment?
N Y
N
N
N
N
N
N
Y
Y
Y
Y
g1. Did (you/he/she) mix the product (yourself/himself/herself)? (CIRCLE "Y" OR "N.")
Yes 	Y
No 	N
A-67

-------
In the past week, did (you/he/she) spend any time....? or (were you/was he/she) near anyone who was....? (READ THE QUESTION FOR EACH ACTIVITY. IF
RESPONDENT ANSWERS YES FOR ANY ACTIVITY, CIRCLE "Y" AND ASK DETAILED SUB-QUESTIONS. CIRCLE "N" FOR NO. IF THE TIME
SPENT IN AN ACTIVITY OR BEING NEAR SOMEONE ENGAGED IN AN ACTIVITY IS LESS THAN 1 HOUR, ENTER 15 MIN, 30 MIN, 45 MIN, OR 1
HR, WHICHEVER IS CLOSEST TO TIME REPORTED. IF TIME IS GREATER THAN 1 HOUR, ROUND TO THE NEAREST HOUR. CIRCLE EITHER
MIN. OR HR.)
a.	Vacuuming?
b.	Sweeping indoors?
c.	Dusting?
e.	Gardening?
f.	Woodworking?
g.	Metal working/welding?
No
N
N
N
N
N
N
How many
times
Yes in past week?
Y -->
Y—>
Y—>
Y—>
Y—>
Y—>
_ times
_ times
times
times
_ times
times
Number of
days since
last done?
_ days
_ days
_ days
_ days
_ days
_ days
How long did
(you/he/she) spend.... or
(were you /was he/she) near
someone else .... ?
CIRCLE min OR hr
min/hr
_ min/hr
min/hr
min/hr
min/hr
min/hr
Did you do this
yourself?
Yes
Y
Y
Y
Y
Y
Y
No
N
N
N
N
N
N
A-68

-------
In the past week, did (you/he/she) spend any time....? or (were you/was he/she) near anyone who was....? (READ THE QUESTION
FOR EACH ACTIVITY. IF RESPONDENT ANSWERS YES FOR ANY ACTIVITY, CIRCLE "Y" AND ASK DETAILED SUB-
QUESTIONS. CIRCLE "N" FOR NO.)
	No	Yes	How many times	Number of days since last done?	Did you do this yourself? Yes	No
a.	Broiling, smoking, grilling, or barbecuing food?	N	Y—>	times	days	Y	N
b.	Accidentally burning food while cooking?	N	Y—>	times	days	Y	N
c.	Grilling with charcoal or gas?	N	Y—>	times	days	Y	N
d.	Cooking with a wood-burning or coal-burning stove?	N	Y—>	times	days	Y	N
During the past week, did you or anyone else park a car or other motor vehicle in: (READ CHOICES AND CIRCLE "Y" OR "N.")


	YES	NO	NOT APPLICABLE
a.	a garage attached to your home?	Y	N	NA
b.	a detached garage?	Y	N	NA
c.	a carport attached to your home?	Y	N	NA
A-69

-------
The next questions are about the food (you/he/she) ate, any medicines (you/he/she) took, and other health
concerns. Again, we only want to know about the past week, while (you were/he/she was) taking part in
the study.
F6. Please tell me the names of any medications (you/he/she) took during the past week. Include
those drugs which a doctor prescribed, any (you choose/he/she chooses)
(yourself/himself/herself) "over the counter", and any herbal or home medications. (PROBE
FOR MEDICATIONS IN THE CATEGORIES LISTED. CIRCLE "N" FOR NO AND "Y" FOR
YES. IF RESPONDENT ANSWERS YES TO ANY CHOICE, LIST TYPES OF
MEDICATIONS, INCLUDING BRAND NAMES, IN FIRST COLUMN, AND ASK
DETAILED SUB-QUESTIONS. ASK TO SEE MEDICATION CONTAINERS AND FILL IN
PRESCRIBED OR RECOMMENDED DOSE IN MG. IF NO WRITTEN INFORMATION IS
AVAILABLE, PROBE FOR DOSE IN MG OR OTHER APPROPRIATE UNITS.)
Medication
No
Yes
How many
times in past
week?
Average
Dose
a. Diuretics?
N
Y ->
times


times


times

b. Chelating Agents (EDTA,
Calcium Disodium,
Versenate, Succimer, or
Chemet)?
N
Y -->
times


	times
times



c. Antacids (Tums, Rolaids)?
N
Y -->
times



times


	 times

d. Hormones (thyroid
medication, birth control
pills)?
N
Y —>
times


times


times

e. Other?
N
Y —>
times

	times
times




times


times


times




A-70

-------
F7. Please tell me whether (you/he/she) took any vitamins or mineral supplements during the past
week. (PROBE FOR MINERAL SUPPLEMENTS IN THE CATEGORIES LISTED AND ANY
OTHER VITAMINS OR MINERAL SUPPLEMENTS. CIRCLE "N" FOR NO AND "Y" FOR
YES. IF RESPONDENT ANSWERS YES TO ANY CHOICE, LIST TYPES OF VITAMINS
AND MINERALS, INCLUDING BRAND NAMES, IN FIRST COLUMN, AND ASK
DETAILED SUB-QUESTIONS. ASK TO SEE VITAMIN CONTAINERS AND FILL IN
PRESCRIBED OR RECOMMENDED DOSE IN MG. IF NO WRITTEN INFORMATION IS
AVAILABLE, PROBE FOR DOSE IN MG OR OTHER APPROPRIATE UNITS.)
Vitamin and mineral supplements:
No
Yes
How many
times in past
week?
Average
Dose
a. Calcium supplement?
N
Y —>
times

b. Selenium supplement?
N
Y —>
times

c. Chromium supplement?
N
Y —>
times

d. Multivitamins and all other
vitamin and mineral
supplements?
N
Y —>
times

times


times


times


times


times


times


times


times




ASK ONLY FOR FEMALES OVER 12. OTHERS GO TO F9.
F8. Are you currently expecting a baby or nursing a baby? (CIRCLE "Y" OR "N.")
Yes	Y
No 	N
A-71

-------
ASK F9 ONLY IF RESPONDENT IS NOT MAINTAINING A FOOD DIARY.
OTHERWISE GO TO QUESTION F10.
F9. Did (you/he/she) eat the following foods last week, that is, while (you were/he/she was)
participating in this study? (CIRCLE "N" FOR NO AND "Y" FOR YES. IF RESPONDENT
ANSWERS YES TO ANY CHOICE, ASK DETAILED SUB-QUESTIONS.)

NO
YES
How many times
in past week?
Average
Portion/Size
a. Broccoli, cauliflower, or Brussels
sprouts?
N
Y —>
times


# cups
b. Cabbage, cole slaw, or sauerkraut?
N
Y —>
times


# cups
c. Mustard greens, collards, or Swiss
chard?
N
Y —>
times


# cups
d. Turnips, or rutabagas?
N
Y -->
times


# cups
e. Grapefruit or grapefruit juice? (IF
RESPONDENT IS LESS THAN 16
YEARS OLD, GO TO g.)
N
Y —>
times


# ounces
f. Alcoholic drinks (beer, wine,
liquor)?
N
Y —>
times


# drinks
g. Any foods that have been grilled,
barbecued, flame broiled, smoked,
charred, or blackened by burning?
N
Y —>
times


# ounces
A-72

-------
F10. During the past week (were you/was he/she) on any kind of diet either to lose weight or for any
other reason? (CIRCLE "Y" OR "N.")
YES					Y --> CONTINUE
NO							N --> STOP
F11. What diet or diets (were you/was he/she) on? (READ CHOICES AND CIRCLE ALL THAT
APPLY.)
Weight loss or low calorie diet?		1
Low fat or cholesterol diet? 				2
Low salt or sodium diet?						3
Sugar free or low sugar diet? 			4
Low fiber diet? 		5
High fiber diet?					6
Diabetic diet? 						7
Any kind of vegetarian diet?		8
Other (Specify:		) ....	9
A-73

-------
This page intentionally left blank.

-------
OMB Clearance #: 2080-0053
Expires: July 31, 1998
NATIONAL HUMAN EXPOSURE ASSESSMENT SURVEY
24-HOUR FOOD DIARY
Participant Identification Number
[Place Label Here]
Public reporting burden for this collection of information is estimated to average	hours (or minutes)
per response, and to require	hours recordkeeping. This includes the time for reviewing instructions,
searching existing data sources, gathering and maintaining the data needed, and completing and
reviewing the collection of information. Send comments regarding this burden estimate or any other
aspect of this collection of information, including suggestions for reducing the burden, to Chief,
Information Policy Branch, 2136, U.S. Environmental Protection Agency, 401 M St., S.W., Washington,
D.C. 20460; and to the Office of Information and Regulatory Affairs, Office of Management and
Budget, Washington, D.C. 20503.	
INTERVIEWER/TECHNICIAN ID:	
Date Completed	/	/		July 14, 1995
A-75

-------
HOW TO USE THE 24-HOUR FOOD DIARY
FOR PARTICIPANTS LESS THAN 10 YEARS OLD, A PARENT OR GUARDIAN SHOULD
PROVIDE ASSISTANCE, AS NEEDED, IN COMPLETING THE FOOD DIARY.
INSTRUCTIONS
(1)	We want you to list all of the foods, beverages, or drinking water you or this child eat(s)
or drink(s) from midnight to midnight.
(2)	Every time you or this child eat(s), write down the name of the meal (breakfast, lunch,
dinner, snack).
(3)	Then write down on a separate line the (brand/generic) name of every food, or beverage
that you or this child eat(s) or drink(s).
(4)	For food mixtures such as stews or potpies, please write down the major kinds of foods in
the mixture. Use the lines immediately below the one on which the name of the mixture
is entered. In food mixtures, the component ingredients can be identified, for
example—the type of meat in a stew—beef, lamb, venison, etc.
(5)	For beverages (including water), write down how many cups or glasses that you or this
child drink(s). Estimate equivalent measures of water or other beverages taken from a
fountain or large container. Don't forget your second and third cups of coffee or tea, or
refills at a restaurant.
A-76

-------
NHEXAS FOOD DIARY FOLLOW-UP - DAY 1 [NOTE: THE 24-HOUR FOOD DIARY WILL
CONTAIN SIMILAR ENTRIES FOR EACH DAY ON WHICH DUPLICATE DIET SAMPLES
ARE COLLECTED.]
START DATE: TIME:
END DATE: TIME:

FOR
INTERVIEWER
USE ONLY
Meal
PLEASE LIST ALL FOODS, BEVERAGES, AND
VITAMINS THAT YOU OR THIS CHILD EAT(S) OR
DRINK(S) AND HOW MANY OF EACH ITEM
How
Many
Portion
Size
Cooking
Method
Non-
Retail
Source
Lunch
EXAMPLE: CHEESEBURGER
1




EXAMPLE: SALAD WITH LETTUCE AND
TOMATOES
1




EXAMPLE: WATER
2 glasses







































































































































CONTINUE ON BACK IF YOU HAVE MORE FOODS TO LIST.
A-77

-------
NHEXAS FOOD DIARY FOLLOW-UP - DAY 1

FOR
INTERVIEWER
USE ONLY
Meal
PLEASE LIST ALL FOODS, BEVERAGES, AND
VITAMINS THAT YOU OR THIS CHILD EAT(S)
OR DRINK(S) AND HOW MANY OF EACH ITEM
How
Many
Portion
Size
Cooking
Method
Non-Retail
Source








































































































































































A-78

-------
OMB Clearance #: 2080-00553
Expires: July 31, 1998
NATIONAL HUMAN EXPOSURE ASSESSMENT SURVEY
FOOD DIARY FOLLOW-UP
Participant Identification Number
[Place Label Here]
Public reporting burden for this collection of information is estimated to average ____ hours (or minutes)
per response, and to require ____ hours recordkeeping. This includes the time for reviewing instructions,
searching existing data sources, gathering and maintaining the data needed, and completing and
reviewing the collection of information. Send comments regarding this burden estimate or any other
aspect of this collection of information, including suggestions for reducing the burden, to Chief,
Information Policy Branch, 2136, U.S. Environmental Protection Agency, 401 M St., S.W., Washington,
D.C. 20460; and to the Office of Information and Regulatory Affairs, Office of Management and
Budget, Washington, D.C. 20503.	
INTERVIEWER/TECHNICIAN ID:
Date Completed: ___/___/___		July 14, 1995
A-79

-------
COMPLETE ON SAME DAY SAMPLES ARE COLLECTED
DAY:     1       2       3       4
DATE:    / /     / /     / /     / /
FD1. Was breakfast eaten? (OBSERVE FROM DIARY AND CIRCLE "Y" OR "N.")
         Y N     Y N     Y N     Y N
FD2. Where was (your/his/her) breakfast prepared and eaten? (READ CHOICES
AND CIRCLE "P" FOR PREPARED AND "E" FOR EATEN.)
a. Home                          P E     P E     P E     P E
b. Restaurant or cafeteria       P E     P E     P E     P E
c. Work site                     P E     P E     P E     P E
d. School or day care center     P E     P E     P E     P E
e. Other                         P E     P E     P E     P E
FD3. How often (do you/does he/does she) eat a breakfast like the one you
described in the diary? (READ CHOICES AND ENTER ONE RESPONSE
LETTER FOR EACH DAY OF FOOD COLLECTION.)
a.	6 or 7 times per week
b.	1 to 5 times per week
c.	Less than once a week
         ____    ____    ____    ____
FD4. Was lunch eaten? (OBSERVE FROM DIARY AND CIRCLE "Y" OR "N.")
         Y N     Y N     Y N     Y N
FD5. Where was (your/his/her) lunch prepared and eaten? (READ CHOICES
AND CIRCLE "P" FOR PREPARED AND "E" FOR EATEN.)
a. Home                          P E     P E     P E     P E
b. Restaurant or cafeteria       P E     P E     P E     P E
c. Work site                     P E     P E     P E     P E
d. School or day care            P E     P E     P E     P E
e. Other                         P E     P E     P E     P E
FD6. How often (do you/does he/does she) eat a lunch like the one you described
in the diary? (READ CHOICES AND ENTER ONE RESPONSE LETTER
FOR EACH DAY OF FOOD COLLECTION.)
a.	6 or 7 times per week
b.	1 to 5 times per week
c.	Less than once a week
         ____    ____    ____    ____
FD7. Was dinner eaten? (OBSERVE FROM DIARY AND CIRCLE "Y" OR "N.")
         Y N     Y N     Y N     Y N
FD8. Where was (your/his/her) dinner prepared and eaten? (READ CHOICES
AND CIRCLE "P" FOR PREPARED AND "E" FOR EATEN.)
a. Home                          P E     P E     P E     P E
b. Restaurant or cafeteria       P E     P E     P E     P E
c. Work site                     P E     P E     P E     P E
d. School or day care            P E     P E     P E     P E
e. Other                         P E     P E     P E     P E

FD9. How often (do you/does he/does she) eat a dinner like the one you
described in the diary? (READ CHOICES AND ENTER ONE RESPONSE
LETTER FOR EACH DAY OF FOOD COLLECTION.)
a.	6 or 7 times per week
b.	1 to 5 times per week
c.	Less than once a week
         ____    ____    ____    ____
A-80

-------
COMPLETE ON SAME DAY SAMPLES ARE COLLECTED
DAY:     1       2       3       4
DATE:    / /     / /     / /     / /
FD10. Please think back. Were there any foods or beverages that you could not or
did not collect for use? (LIST IDENTITY, SOURCE, AND AMOUNT OF
EACH MISSING FOOD AND THE DAY IT WAS NOT COLLECTED.)
a. At Breakfast                                             Y N     Y N     Y N     Y N
b. At Lunch                                                 Y N     Y N     Y N     Y N
c. At Dinner                                                Y N     Y N     Y N     Y N
d. For Snacks - include beverages such as coffee or tea     Y N     Y N     Y N     Y N
FD11. Did (you/he/she), for any reason, eat more or less food than usual? (READ
CHOICES AND ENTER a, b, OR c.)
a. More food than usual -> CONTINUE
b. Less food than usual -> CONTINUE
c. Same as usual -> GO TO FD13
FD12. Because of: (READ CHOICES AND CIRCLE ALL THAT APPLY.)
a. Travel or vacation                     a     a     a     a
b. Weight control diet                    b     b     b     b
c. Illness or medical condition           c     c     c     c
d. Work or school schedule                d     d     d     d
e. Entertainment or social occasion       e     e     e     e
f. Because of the food collection study   f     f     f     f
g. Ease/quickness of preparation          g     g     g     g
h. Other                                  h     h     h     h
   Day 1
   Day 2
   Day 3
   Day 4
FD13. Did (you/he/she), for any reason, eat different foods than (your/his/her) usual
diet? (CIRCLE "Y" OR "N.")
         Y N     Y N     Y N     Y N
FD14. If yes, was that because: (READ CHOICES AND CIRCLE ALL THAT
APPLY.)
a. Travel or vacation                     a     a     a     a
b. Weight control diet                    b     b     b     b
c. Illness or medical condition           c     c     c     c
d. Work or school schedule                d     d     d     d
e. Entertainment or social occasion       e     e     e     e
f. Because of the food collection study   f     f     f     f
g. Ease/quickness of preparation          g     g     g     g
h. Other                                  h     h     h     h
   Day 1
   Day 2
   Day 3
   Day 4
A-81

-------