Battelle
February 15, 2007
505 King Avenue
Columbus, Ohio 43201-2693
(614) 424-6424 Fax (614) 424-5263
Ms. Sineta Wooten
Project Officer
Program Assessment & Outreach Branch (7404)
OPPT, Room E827
U.S. Environmental Protection Agency
1200 Pennsylvania Avenue NW (7404T)
Washington, D.C. 20460
Dear Ms. Wooten:
Contract No. EP-W-04-021
Work Assignment 2-13
Final Quality Assurance Plan
Enclosed for your review is a revised version of the Quality Management Plan for Work
Assignment 2-13 on Targeting Elevated Blood-Lead Levels in Children. This report provides
information regarding the data sources that will be utilized in this work assignment, the statistical
analysis methods that will be used to develop predictive models, and quality assurance and
quality control measures taken to ensure quality and integrity of results. This is a deliverable
under Amendment 4 of the subject work assignment.
If you have any technical questions about either, please contact me at 614/424-4547, or
Warren Strauss at 614/424-4275.
Sincerely,
Bruce E. Buxton, Ph.D.
Vice President and Senior Program Manager
Statistics and Information Analysis
BEB/WS Inl
Enclosure
cc: Ms. Margaret Conomos, EPA WAM
Dr. Barry Nussbaum, EPA Technical Advisor
Mr. Ronald Morony, EPA Project Officer


-------
February 15, 2007
DRAFT QUALITY MANAGEMENT PLAN FOR THE
TARGETING ELEVATED BLOOD-LEAD LEVELS IN
CHILDREN PILOT STUDY
Prepared By
BATTELLE
505 King Avenue
Columbus, Ohio 43201
EPA Contract No. EP-W-04-021
Prepared For
Sineta Wooten, Project Officer
Margaret Conomos, Work Assignment Manager
Barry Nussbaum, Technical Advisor
Program Assessment and Outreach Branch
National Program Chemicals Division
Office of Pollution Prevention and Toxics
U.S. Environmental Protection Agency
1200 Pennsylvania Avenue NW (7404T)
Washington, D.C. 20460
Battelle
PROPRIETARY, PRIVILEGED, AND CONFIDENTIAL INFORMATION TO BE USED ONLY FOR GOVERNMENT PURPOSES

-------
REVISED QUALITY MANAGEMENT PLAN FOR
TARGETING ELEVATED BLOOD LEAD LEVELS IN CHILDREN PILOT STUDY
Version #1
February 15, 2007
Approval for Battelle
Principal Investigator	Project Manager
Warren Strauss	Bruce Buxton
Approval for U.S. Environmental Protection Agency
EPA Project Officer	Date	OPPT Quality Officer	Date
Sineta Wooten	John M. Dombrowski
EPA WAM	Date	EPA Technical Advisor	Date
Margaret Conomos	Barry Nussbaum

-------
Draft Quality Management Plan for the
Targeting Elevated Blood-Lead Levels in
Children Pilot Study
February 15, 2007
1.0 INTRODUCTION
Over the past 15 years, various childhood lead poisoning prevention programs (CLPPPs)
throughout the U.S. have conducted analyses of their screening data to develop "risk indices," or
mathematical models for predicting the prevalence of childhood lead poisoning in various
geographic areas within their regions of concern. These modeling efforts are generally
intended to both characterize the extent of the prevalence of childhood lead poisoning within
their geographic areas and support the development of targeted screening and outreach plans in
order to reach the 2010 goal of eliminating childhood lead poisoning throughout the U.S.1,2
To date, the majority of modeling efforts for risk assessment and targeting have focused
on combining screening information and data with demographic variables available from the
U.S. Census. Previous studies have combined childhood surveillance data (aggregated at the
zip-code or census tract level) with demographic predictor variables from the census for the
purposes of targeting geographic areas at higher risk of childhood lead poisoning
(Miranda, Dolinoy, and Overstreet 2002; Miranda et al. 2005; Strauss, Nahhas et al. 2001).
These studies have led to recommendations for using age of housing and percent of population
below the poverty line for targeting neighborhoods that may be at increased risk for childhood
lead poisoning (CDC 1997). Numerous studies have also documented the relationship
between children's blood-lead concentrations and measures of lead in residential environmental
media (dust, soil, air, water, and food) (HUD 1995; Lanphear et al. 1998; Strauss,
Carroll et al. 2001). These studies have contributed to EPA and HUD regulations and policies
for identifying and reducing residential childhood lead exposures (24 CFR Part 35, 40 CFR Part
745, 40 CFR Part 745, U.S. Department of Housing and Urban Development September 15,
1999). Other studies have combined blood-lead surveillance data with programmatic
information on housing units treated to determine the positive impact of housing-based
intervention programs (Strauss et al. 2006).
The goal of this study is to explore models in order to predict the number of children at risk of
elevated blood lead levels for a given geographic area based on a hierarchical combination of
demographic, environmental, and programmatic information sources. While the models will be
highly dependent on available data, it is expected that this study will provide an appropriate
statistical methodology that combines each data source in an appropriate manner, adjusting for
global and local trends over time. In doing so, the models will build upon concepts of
hierarchical modeling and longitudinal data analysis.
As EPA, CDC, and other federal and state agencies prepare to meet the 2010 goal of eliminating
childhood lead poisoning, our pilot study of integrating several different types of data sources
1	http://www.cdc.gov/nceh/lead/about/program.htm
2	http://www.epa.gov/lead/pubs/fedstrategy2000.pdf

-------
will hopefully improve upon the predictive power of models that rely on a single information source.
This will allow for more efficient targeting of those geographic areas that need the most help in
eliminating childhood lead poisoning.
1.1 MODEL DEVELOPMENT
This pilot study seeks to develop models to predict the number of children at risk of elevated
blood lead levels for a given geographic area based on a hierarchical combination of
demographic, environmental, and programmatic information sources. Doing so requires looking
at both the mechanisms of childhood lead risk assessment and control activities at the local level
as well as at broad trends across the U.S. The two main analysis goals correspond to developing
predictive models at two different levels of geographic specificity, and appear as follows:
1.	Broad Coverage (Lower Resolution) Model. This type of model is intended to
characterize broad trends over time in the prevalence of childhood lead
poisoning at the county level across the entire U.S. This model will be based on
quarterly county-level aggregated surveillance data from the CDC and augmented
with environmental, demographic, and programmatic (level of financial spending)
information.
2.	High Resolution Model. This type of model represents the effort to assess the relative
contribution of various exposure sources associated with elevated blood-lead
concentrations within select communities. This type of model reflects
the concept that exposures that contribute to childhood lead-poisoning are likely to
be community specific. Although we recognize that there are non-local, non-
environmental (including genetic susceptibility), and even non-domestic sources that
can also contribute to lead poisoning, the focus of this study is on the local factors
that can contribute to childhood lead poisoning (Spivey 2007). It is anticipated that
this analysis goal will be met through modeling efforts of Los Angeles case
management data and Massachusetts housing unit lead assessment and/or control
activities. These data sources will be augmented with other environmental and
demographic information such as air monitoring, toxics release inventory, age of
housing stock, poverty level, etc.
It is anticipated that synergies will be evident between the two models described above. While a
formal combination of the two geographical models may not be accomplished during the
timeframe of this current pilot study, better understanding of (1) the relative importance of
various exposure sources in addition to leaded paint in housing, and (2) the geographic areas
across the U.S. that remain at increased risk for childhood lead-poisoning, will set the stage for
doing so in the near future.
In order to meet the two main analysis goals above, a draft Statistical Analysis Plan is provided
in Section 3. Battelle has worked with the EPA Work Assignment Manager (WAM) to develop
a Statistical Analysis Plan that details the specific goals of the data analysis, and the statistical
methods that will be utilized to meet those goals, including descriptive methods used in
preparation for more sophisticated modeling efforts. Battelle will continue to work with the EPA

-------
WAM to amend and revise the plan, as necessary, and to accommodate additional data analysis
goals and methods as the project evolves past the March 2, 2007 date for this particular work
assignment. The Statistical Analysis Plan will become the basis for the final report.
1.2 DATA MANAGEMENT AND COMPUTING RESOURCES
The primary objective of this pilot study is to utilize combined information from different
sources at various levels of geographic and temporal specificity to more accurately target
geographic areas at high risk for not meeting the 2010 goal of eliminating childhood
lead-poisoning. As such, work on the study will require careful integration of a variety of data
sources with various characteristics and documentation.
Data to support this study will be gathered from a variety of sources, including federal, state and
local lead poisoning prevention programs, as well as publicly available data that can be
downloaded from the internet (e.g., Census data, EPA's Toxic Release Inventory, etc.). When
Battelle first receives each data source, we will review the data and supporting documentation to
gain knowledge on the structure, relationship and quality of the data. Battelle database managers
will work as appropriate with the project team (including collaborators providing data to the
project, as well as EPA) to determine the final format for each database, desired uses of the
databases, as well as the requirements for maintaining the databases. Based on this information,
Battelle will construct master databases for the national low-resolution model and for each of the
high-resolution models (Los Angeles and Massachusetts) that integrate the various
environmental, demographic, and programmatic variables, and facilitate statistical analyses of
the combined data. Battelle is prepared to construct these databases by combining data sent by
EPA to Battelle or acquired from national databases (U.S. Census, U.S. Geological Survey, etc.)
in a variety of formats including MS SQL Server, MS Access, Excel, ASCII, ArcView,
and SAS® electronic databases. In order to combine the various data sets, they will be merged
on key fields, including state, county, Census tract, and time period. The data being used for
analyses of a particular geographic level, e.g., county, are comparable because they are
representative of that geographic area.
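As an illustration of merging on key fields, the following is a minimal Python sketch, not the project's actual procedure (the plan names SAS and SQL tools for this work). All field names and values are hypothetical. It left-joins county-level surveillance records to census records on state and county and flags any record that found no match.

```python
# Hypothetical records; field names and values are illustrative assumptions only.
surveillance = [
    {"state": "OH", "county": "049", "period": "2005Q1", "n_screened": 1200, "n_elevated": 36},
    {"state": "OH", "county": "061", "period": "2005Q1", "n_screened": 900, "n_elevated": 18},
]
census = [
    {"state": "OH", "county": "049", "pct_pre1950_housing": 31.5},
    {"state": "OH", "county": "061", "pct_pre1950_housing": 38.2},
]


def merge_on_keys(left, right, keys):
    """Left-join two lists of dicts on the given key fields."""
    # Index the right-hand table by its key tuple for constant-time lookup.
    index = {tuple(row[k] for k in keys): row for row in right}
    merged = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        combined = dict(row)
        if match is not None:
            combined.update(match)
        # Keep a flag so unmatched records can be reviewed rather than silently dropped.
        combined["matched"] = match is not None
        merged.append(combined)
    return merged


# Census data do not vary by quarter here, so the join uses state and county only.
master = merge_on_keys(surveillance, census, keys=("state", "county"))
```

Keeping the `matched` flag makes it straightforward to list geographic areas that appear in one source but not the other before any modeling begins.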
Throughout the development process, Battelle will conduct checks for completeness on all study
databases, and will work with data-sharing collaborators and EPA to attempt to complete missing
data as necessary to complete the proposed statistical analyses. Any changes to the databases
(corrections, additions, deletions, etc.) will be fully documented in appropriate meta-data files,
and reported to EPA. As part of constructing and maintaining these databases, Battelle will
develop appropriate documentation of the combined master databases.
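A completeness check of the kind described above could be sketched as follows. This is only an assumed form of such a check: the field names are hypothetical, and treating `None` and empty strings as missing is an illustrative choice.

```python
def completeness_report(rows, required_fields):
    """Count, for each required field, how many records are missing a value."""
    report = {field: 0 for field in required_fields}
    for row in rows:
        for field in required_fields:
            # Treat absent keys, None, and empty strings as missing (an assumption).
            if row.get(field) in (None, ""):
                report[field] += 1
    return report


# Hypothetical county-level records with two gaps.
rows = [
    {"state": "OH", "county": "049", "period": "2005Q1", "n_screened": 1200},
    {"state": "OH", "county": "061", "period": "2005Q1", "n_screened": None},
    {"state": "OH", "county": "", "period": "2005Q2", "n_screened": 300},
]
missing = completeness_report(rows, ["state", "county", "period", "n_screened"])
```

A per-field count like this can be stored with the meta-data files so that later corrections are traceable.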
Battelle has Standard Operating Procedures (SOPs) in place that will be followed to ensure the
proper storage, backup, and retrieval of datasets created and analyzed for this study. Highlights
of these SOPs are summarized below, and in the Quality Management Plan (Section 6 of this
Quality Assurance Plan). The various databases are backed up to tape nightly via Battelle's
automated backup routines, and will only be accessible to members of the proposed Battelle
Project Team. CD-ROM backups also will be made on a regular basis to serve as a safeguard in
case the backup system fails for any reason.

-------
An additional important aspect of the proposed project beyond the acquisition of the appropriate
data sources is the processing and analysis of data. Battelle has all the necessary computing
software and hardware to accomplish the scope of work within this study. A brief overview of
Battelle's statistical and computing software and hardware is given here. Every technical staff
member in Battelle's Statistics and Information Analysis Product Line has the latest version of
SAS installed on their Windows® PC. The base system and procedure modules, Stat and Graph,
are standard on each computer. Supplementary modules, including IML, FSP, QC, Insight, and
AF, are available to all staff as needed.
Generally, the SAS System is used to translate and prepare the data for statistical or
mathematical analyses. Data sets can be converted from their native format to permanent SAS
datasets, or data can be seamlessly retrieved electronically using SAS tools for dynamic data
exchange (DDE) or Open DataBase Connectivity (ODBC) functionality. Our team is familiar
with the advantages and disadvantages of each of these data retrieval approaches as well as the
SAS code necessary to accomplish such tasks. In addition to SAS, we have available S-Plus,
C++, and nearly every other major software product for developing and applying statistical
routines.
Our technical staff have extensive knowledge and availability of SQL and all of its various
dialects including Access SQL, Transact SQL (Microsoft SQL Server), and SQL*Plus (Oracle).
In addition to SAS and SQL, Battelle staff have several other supporting software tools that
provide us diverse database capabilities, including Geographic Information Systems (GIS) such
as ArcView and ArcInfo, as well as World Wide Web delivery capabilities. Battelle staff can
access and update data in most major systems, ranging from Access and Paradox to SQL Server
and Oracle. One installation of SQL Server is maintained and used within the group. Our
Information Management department maintains other installations of SQL Server. We are
well-versed in converting data to and from various formats, including delimited text, dBase,
MS Access, Excel, FoxPro, Paradox, all versions of SAS, and S-Plus. We normally use
third-party tools to handle data conversion. For unusual types of conversions, we have written
utilities in MS Access, Visual Basic, C#, and Delphi. Our proposed team is familiar with the
AMP/350, AMP/370, and R-2 AIRS data formats, having used these data on several projects.

-------
2.0 DESCRIPTION OF DATA
The two main goals of the statistical analysis are to develop predictive model(s) to implement
targeted interventions based on a better understanding of (1) the relative importance of various
exposure sources in addition to leaded paint in housing and (2) the geographic areas across the
U.S. that remain at increased risk for childhood lead-poisoning. In doing so, blood-lead data
will be combined with various environmental, demographic and programmatic datasets at
different levels of geographic specificity and coverage. The details associated with each of these
various data sources as well as their anticipated inclusion in a high or low geographic resolution
model are provided below.
2.1 RESPONSE VARIABLE: MEASURES OF BLOOD-LEAD CONCENTRATION OF
CHILDREN
The statistical models are based upon blood-lead levels of children corresponding to the various
geographic areas studied. It is anticipated that the CDC Lead Poisoning Prevention Branch will
provide quarterly summary data from their National Surveillance database for children aged
6-36 months. These summary measures will include the number of children screened, the number of
children who exceeded certain blood-lead thresholds, and potentially a continuous summary
measure (geometric mean blood-lead concentration) for state/local grantees with a history of
universal screening and reporting. Careful attention will be paid to ensure that children with
multiple testing results over time are represented appropriately in the analysis dataset. Our
intention is to have the models reflect the annual prevalence of childhood lead poisoning over
time. As such, Battelle will develop an appropriate algorithm to select representative screening
test(s) for children with multiple results, with the objective of having each child represented in the
analysis dataset at most once a year.
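The selection algorithm itself is left for later development in this plan. Purely as an illustration, the Python sketch below keeps one record per child per calendar year, using the highest result in that year as the representative test. That rule, the field names, and the values are all assumptions made for the example; a different rule (first test, a confirmed venous test, etc.) would fit the same structure.

```python
from datetime import date

# Hypothetical de-identified test records (blood-lead in micrograms per deciliter).
tests = [
    {"child_id": "A", "test_date": date(2005, 2, 1), "pb_ug_dl": 4.0},
    {"child_id": "A", "test_date": date(2005, 9, 15), "pb_ug_dl": 11.0},
    {"child_id": "A", "test_date": date(2006, 3, 3), "pb_ug_dl": 6.0},
    {"child_id": "B", "test_date": date(2005, 5, 20), "pb_ug_dl": 3.0},
]


def one_test_per_child_year(records):
    """Keep a single record per (child, calendar year).

    Assumed rule for this sketch: retain the highest result within the year.
    """
    best = {}
    for rec in records:
        key = (rec["child_id"], rec["test_date"].year)
        if key not in best or rec["pb_ug_dl"] > best[key]["pb_ug_dl"]:
            best[key] = rec
    # Return records in a stable order: by child, then by year.
    return [best[key] for key in sorted(best)]


selected = one_test_per_child_year(tests)
```

With these inputs, child A contributes one record for 2005 and one for 2006, and child B contributes one record for 2005.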
Blood-lead surveillance data from Massachusetts and Los Angeles will be provided on specific
testing results for individual children (with confidential identification information excluded).
The Massachusetts blood-lead surveillance data will represent all children aged 6-36 months
tested, while the individual data records from Los Angeles will likely only represent children
with elevated blood-lead concentrations. Aggregate summary measures (at the census
tract-level) from the State of California will be utilized to establish the denominator of children
tested in each quarter in the City of Los Angeles, so that aggregate summary measures, similar to
those described above for the National Model, may be constructed to support the Los Angeles-
specific data analysis goals. Similar aggregate measures may also be constructed for the
Massachusetts data, to ensure consistency and comparability of results.
Due to selection bias, we expect that the CDC National Surveillance dataset as well as the local
data sources in Massachusetts and Los Angeles may show higher proportions of elevated
blood-lead concentrations than found in the general population. For this reason, we will compare
the proportion of children with elevated blood-lead concentrations as well as the distribution of
the potential continuous summary measure with those reported by the most recent six years of
available CDC National Health and Nutrition Examination Survey (NHANES) data. If the
Surveillance data differ greatly from the NHANES data, we may recommend methods for
calibrating the Surveillance data to better match the National Distribution of childhood
blood-lead concentrations as appropriate (Strauss, 2001) in subsequent work efforts.

-------
2.2 EXPLANATORY VARIABLES
The following subsections provide a description of the various explanatory variables that will be
utilized in the models developed to predict risk of childhood lead poisoning at the National and
local levels.
2.2.1 Demographic Data from the U.S. Census
Demographic information from the 2000 U.S. Census will be utilized in both high and low
resolution models, with data being acquired at the county level for the entire nation and at the
census tract level for Massachusetts and Los Angeles.
The data gathered by the census bureau includes over 1,000 variables. To narrow the scope of the
project, we will explore 50 variables within 10 general categories, most of which had been
previously used by Battelle in a CDC-sponsored study to predict risk of elevated blood-lead
concentrations at the census tract level (Strauss, 2001).
In many cases, these variables are constructed from counts or summary statistics published in the
detailed U.S. Census Tables. For example, within each geographic area, the census provided the
number of houses that were built before 1950 and the median income of all households.
However, in order for us to draw comparisons from tract to tract and/or county to county, the
census variables needed to be manipulated in a fashion that depended upon the format of the
variable. For example, count variables, such as the number of housing units built before 1950,
were changed to percentages as shown below.
\[
\text{Percent of Houses Built Before 1950} = \frac{\text{Number of Houses Built Before 1950}}{\text{Total Number of Houses Built in the Tract}}
\]
Table 1 supplies a list of the variables placed under investigation and the denominators used for
the percentage calculation.
Summary-statistic variables describing income, on the other hand, may be standardized within
state to adjust for between-state differences in the cost of living. Table 1 also notes which
variables were standardized.
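The two transformations described above can be sketched in Python as follows. The plan does not state the standardization formula, so the within-state z-score used here is an assumption, as is reporting percentages on a 0 to 100 scale. Field names and values are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical tract-level records; all values are illustrative only.
tracts = [
    {"state": "MA", "tract": "0001", "pre1950_units": 300, "total_units": 1000, "median_income": 40000},
    {"state": "MA", "tract": "0002", "pre1950_units": 150, "total_units": 600, "median_income": 60000},
    {"state": "CA", "tract": "0001", "pre1950_units": 50, "total_units": 500, "median_income": 50000},
    {"state": "CA", "tract": "0002", "pre1950_units": 200, "total_units": 800, "median_income": 70000},
]


def percent(count, denominator):
    """Convert a count to a percentage of its denominator (None if the denominator is zero)."""
    return 100.0 * count / denominator if denominator else None


def standardize_within_state(rows, field):
    """Add a within-state z-score for `field` to each row (assumed standardization)."""
    by_state = {}
    for row in rows:
        by_state.setdefault(row["state"], []).append(row[field])
    for row in rows:
        values = by_state[row["state"]]
        mu, sd = mean(values), pstdev(values)
        # A state with no spread gets 0.0 rather than a division by zero.
        row["std_" + field] = (row[field] - mu) / sd if sd else 0.0


for row in tracts:
    row["pct_pre1950"] = percent(row["pre1950_units"], row["total_units"])
standardize_within_state(tracts, "median_income")
```

After standardization, a tract's income is expressed relative to other tracts in the same state, which removes state-level differences in the cost of living from the comparison.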
2.2.1.1 Census Variable Descriptions
Density. Since both counties and census tracts vary with respect to spatial area and population,
and previous work suggests that risk of childhood lead poisoning differs between rural
and urban areas, we will utilize a population density variable as a potential explanatory
variable or effect modifier in the statistical models. We will investigate population
density in two ways. The first divides the number of people within the tract by
the amount of land area measured in 0.001 square kilometers. The second divides the
number of housing units by the amount of land area measured in 0.001 square kilometers.
Housing units include the following: a house, apartment, mobile home, group of rooms, or
single room that is occupied as separate living quarters.

-------
Race. The census bureau presents 5 general race groups: (1) white, (2) black, (3) Indian, Eskimo,
and Aleut, (4) Asian Pacific, and (5) Other. Additionally, we will create one additional
variable relating to race that describes the number of people that consider themselves
neither white nor black.
Age. The census bureau does not report the age of people directly. Instead, the agency reports
the total number of people that fall into various age categories. The variable "Pct le 6
years" will be created to identify the number of children within each geographic area less
than or equal to 6 years of age at the time of the 2000 Census. Additionally, we will
calculate the median age of the total population and of those less than or equal to 6 years
old by taking a weighted average of the midpoint of each age category (the counts are
used as the weights).
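The age calculation described above, a count-weighted average of category midpoints, can be sketched in Python as follows. The age categories and counts are hypothetical and do not reproduce the actual Census age brackets.

```python
# Hypothetical age categories as ((lower, upper), count); not the actual Census brackets.
age_bins = [((0, 5), 120), ((5, 18), 300), ((18, 65), 900), ((65, 85), 180)]


def weighted_midpoint_age(bins):
    """Average the midpoint of each age category, weighted by the category count."""
    total = sum(count for _, count in bins)
    if total == 0:
        return None
    weighted_sum = sum((lo + hi) / 2 * count for (lo, hi), count in bins)
    return weighted_sum / total


central_age = weighted_midpoint_age(age_bins)
```

Because only category counts are available, the result is an approximation whose accuracy depends on how narrow the age categories are.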
Family Structure. The census bureau does not supply a unique variable that indicates the
number of single parent households within a tract. Therefore, this variable will be
created by combining census variables as follows:
M = Number of Households with a male householder (no wife present) whose own
children are under 18 years old
F = Number of Households with a female householder (no husband present) whose own
children are under 18 years old
T = M + F + Number of married couples with own children under 18 years
\[
\text{Percent of Single Parent Households} = \frac{M + F}{T}
\]
Education. Four variables pertaining to the proportion of adults with various levels of education
will be created as follows:
L9 = Number of people older than 18 that have less than a 9th grade education
L12 = Number of people older than 18 that have 9th through 12th grade experience, but
do not have a high school diploma
12 = Number of people older than 18 that obtained a high school diploma or GED
C = Number of people older than 18 that have some college experience but did not
receive a college degree
T = Number of people that are older than 18 years
\[
\text{Percent less than 9th grade} = \frac{L9}{T}
\]
\[
\text{Percent no HS degree} = \frac{L9 + L12}{T}
\]
\[
\text{Percent no college experience} = \frac{L9 + L12 + 12}{T}
\]
\[
\text{Percent no college degree} = \frac{L9 + L12 + 12 + C}{T}
\]
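The family structure and education variables are cumulative ratios of census counts. The Python sketch below shows one way to compute them from the definitions of M, F, L9, L12, 12, C, and T given above, with the mapping of sums to variable names read from those definitions and from Table 1. The function and argument names are hypothetical, and the 0 to 100 percentage scale is an assumption.

```python
def pct(numerator, denominator):
    """Percentage on a 0-100 scale; None when the denominator is zero."""
    return 100.0 * numerator / denominator if denominator else None


def single_parent_pct(m, f, married):
    """Percent single-parent households: (M + F) / T, with T = M + F + married couples."""
    t = m + f + married
    return pct(m + f, t)


def education_pcts(l9, l12, hs, c, t):
    """Cumulative education percentages; `hs` stands for the count the text labels '12'."""
    return {
        "pct_lt_9th_grade": pct(l9, t),
        "pct_no_hs_degree": pct(l9 + l12, t),
        "pct_no_college": pct(l9 + l12 + hs, t),
        "pct_no_college_degree": pct(l9 + l12 + hs + c, t),
    }


# Hypothetical counts for a single census tract.
sp = single_parent_pct(m=50, f=150, married=300)
edu = education_pcts(l9=100, l12=200, hs=400, c=300, t=2000)
```

Each education variable includes all lower attainment levels, so the four percentages are non-decreasing by construction.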

-------
Income. We will calculate median income per household, family, and person. Additionally, we
will investigate the proportion of households that do not receive any wages, do not
receive any earnings, and do receive public assistance. The Census defines earnings and
wages as follows:
"Earnings" represent the amount of income received regularly before deductions for
personal income taxes, Social Security, bond purchases, union dues, Medicare
deductions, etc.
"Wages" include total money earnings received for work performed as an employee
during the calendar year 1999. It includes wages, salary, Armed Forces pay,
commissions, tips, piece-rate payments, and cash bonuses earned before deductions were
made for taxes, bonds, pensions, union dues, etc.
Poverty Level. Similar to the income variables described above, we will summarize the poverty
level of the people individually and the families as a whole within each geographic area
by creating the variables Percent Persons and Percent Families Below the Poverty Level.
In order to focus on the poverty level of the children within each tract, however, we
created the variables Percent Persons Five Years and Under and Percent Families with
Children Under Five Years Below Poverty Level. Note that in calculating the various
percentages for each of the variables the denominator changes.
Vacant. The remaining variables targeted for exploration within our analyses characterize the
housing stock. For example, noting the percent of housing units that are vacant
potentially indicates the level of care taken to maintain buildings within the tract.
Buildings that are not occupied are more likely to accumulate dust or debris to which the
children of the tract may be exposed upon reoccupancy.
Housing Year Built and Occupied Housing Year Built. During the 1950's the United States
started to become aware of the consequences associated with exposure to lead in
paint. Thus, the use of lead paint within homes began to decrease, and in 1977 the
use of lead paint in homes became illegal. The years during which the housing units
were built within each tract are therefore important to characterize: older homes are more likely to
contain lead paint than newer homes. Furthermore, occupied housing units are more
likely to have lead paint removed than vacant homes. Thus, the years during which
occupied housing units were built and the years during which all housing units were built
will be investigated within this report.
Rent and Value. The percent of occupied housing units that are rented, rather than owned, is
calculated by dividing the number of rented occupied housing units within the tract by the
total number of occupied housing units. Variables constructed to characterize the amount of
rent paid and the value of all housing units (owned and rented) will be
standardized to account for state-to-state differences in the cost of living. The median
rent variable utilized represents the median across an entire geographic area (e.g., county
or Census tract). As part of future work, geographic areas could be categorized as either
metropolitan or rural and grouped at broader geographic levels such as EPA region. A

-------
median of the broader area could be calculated and then a difference from the regional
median calculated for each specific county or Census tract. Note that this methodology
could be applied to income-related variables, as well.
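The regional-difference idea described above for future work could look like the Python sketch below. It approximates the median of the broader area by the median of the county-level medians, which is an assumption of this example; a true regional median would require unit-level data. County names, region codes, and rents are hypothetical.

```python
from statistics import median

# Hypothetical county records with an EPA region code and a median gross rent.
counties = [
    {"county": "A", "region": 5, "median_rent": 500},
    {"county": "B", "region": 5, "median_rent": 650},
    {"county": "C", "region": 5, "median_rent": 800},
    {"county": "D", "region": 9, "median_rent": 900},
    {"county": "E", "region": 9, "median_rent": 1100},
]


def add_regional_difference(rows, field, group="region"):
    """Add `<field>_diff`: each area's value minus the median of its group."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[group], []).append(row[field])
    # Median of area-level medians, used here as a stand-in for the regional median.
    regional = {g: median(values) for g, values in grouped.items()}
    for row in rows:
        row[field + "_diff"] = row[field] - regional[row[group]]
    return regional


regional_medians = add_regional_difference(counties, "median_rent")
```

The same function applies unchanged to income-related fields by passing a different `field` name.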
Table 1. Initial Variables for Analysis Created From the 2000 Census
Variable Group | Index | Census Variable* | Format | Calculation (Denominator) | Analyzed Variable
--- | --- | --- | --- | --- | ---
Density | 1 | Persons | Count | Land Area (Units = 0.001 km2) | Population Density
Density | 2 | Housing units | Count | Land Area (Units = 0.001 km2) | Housing Density
Race | 3 | White population | Count | Persons | Pct White
Race | 4 | Black population | Count | Persons | Pct Black
Race | 5 | Indian, Eskimo, and Aleut population | Count | Persons | Pct Indian, Eskimo, Aleut
Race | 6 | Asian Pacific population | Count | Persons | Pct Asian Pacific
Race | 7 | Other Race population | Count | Persons | Pct Other Race
Race | 8 | All races excluding white and black* = #5 + #6 + #7 | Count | Persons | Pct Non White and Non Black
Age | 9 | Children Less than or Equal to Six Years Old | Count | Persons | Pct le 6 years
Age | 10 | Median Age* | Statistic | | Median age of persons
Age | 11 | Median Age of Children Less than or Equal to Six Years Old* | Statistic | | Median age of persons le 6 years
Family Structure | 12 | Single Parent* = Single Male with Children + Single Female with Children | Count | Household with Children Less than or equal to 18 years old = Married Couple with children + Single Male with Children + Single Female with Children | Pct Single Parent
Education | 13 | Less than a 9th grade Education | Count | Persons 18 years old and over | Pct less than 9th grade
Education | 14 | Less than high school* = #13 + persons with 9th to 12th grade education without obtaining a high school diploma | Count | Persons 18 years old and over | Pct no HS degree
Education | 15 | Less than college* = #14 + Persons with high school diploma, but no college experience | Count | Persons 18 years old and over | Pct no college
Education | 16 | Less than college degree* = #15 + Persons that attended college without obtaining a college diploma | Count | Persons 18 years old and over | Pct no college degree

-------
Table 1. (continued) Initial Variables for Analysis Created From the 2000 Census
Variable Group | Index | Census Variable* | Format | Calculation (Denominator) | Analyzed Variable
--- | --- | --- | --- | --- | ---
Income | 17 | Household Median Income | Statistic | | Standardized Median Income for Households
Income | 18 | Family Median Income | Statistic | | Standardized Median Income of Families
Income | 19 | Per Capita Income | Statistic | | Standardized per capita income of persons
Income | 26 | Households without earnings | Count | Households | Pct No Earnings
Income | 27 | Households without wages | Count | Households | Pct No Wage or Salary
Income | 29 | Households that obtain public assistance | Count | Households | Pct With Public Assistance
Poverty Level | 30 | Persons below poverty level | Count | Persons for whom poverty status is determined | Pct Persons Below Poverty
Poverty Level | 31 | Persons who are less than or equal to five years old that are below poverty level* | Count | Persons who are less than or equal to five years old for whom poverty status is determined | Pct Persons Below Poverty of Age LE 5
Poverty Level | 32 | Families with total income below the poverty level | Count | Families | Pct Families Below Poverty
Poverty Level | 33 | Families with total income below the poverty level that have children under five years old | Count | Families with children under five years old | Pct Poverty of Families with Children LT 5
Housing Units | 34 | Vacant | Count | Housing Units | Pct Vacant
Housing Units | 35 | Housing Units Built before 1940 | Count | Housing Units | Pct Pre 1940 Housing
Housing Units | 36 | Housing Units Built before 1950 | Count | Housing Units | Pct Pre 1950 Housing
Housing Units | 37 | Housing Units Built before 1960 | Count | Housing Units | Pct Pre 1960 Housing
Housing Units | 38 | Housing Units Built before 1970 | Count | Housing Units | Pct Pre 1970 Housing
Housing Units | 39 | Housing Units Built before 1980 | Count | Housing Units | Pct Pre 1980 Housing
Housing Units | 40 | Median Year that Housing Units were Built | Statistic | | Median Year Built
Housing Units | 41 | Median Year that Housing Units were Built - Calculated by Battelle | Statistic | | Calculated Median Year Built

-------
Table 1. (continued) Initial Variables for Analysis Created From the 2000 Census
Variable Group | Index | Census Variable* | Format | Calculation | Analyzed Variable
--- | --- | --- | --- | --- | ---
Occupied Housing Units | 42 | Housing Units that are rented | Count | Occupied Housing Units | Pct Renter Occupied
Occupied Housing Units | 43 | Occupied Housing Units Built before 1940 | Count | Occupied Housing Units | Pct Pre 1940 Occupied Housing
Occupied Housing Units | 44 | Occupied Housing Units Built before 1950 | Count | Occupied Housing Units | Pct Pre 1950 Occupied Housing
Occupied Housing Units | 45 | Occupied Housing Units Built before 1960 | Count | Occupied Housing Units | Pct Pre 1960 Occupied Housing
Occupied Housing Units | 46 | Occupied Housing Units Built before 1970 | Count | Occupied Housing Units | Pct Pre 1970 Occupied Housing
Occupied Housing Units | 47 | Occupied Housing Units Built before 1980 | Count | Occupied Housing Units | Pct Pre 1980 Occupied Housing
Occupied Housing Units | 48 | Median Year that Occupied Housing Units were Built | Statistic | | Median Year Built - Occupied Only
Housing Value | 49 | Median Rent | Statistic | | Standardized Median Gross Rent
Housing Value | 50 | Value of Owner Occupied Housing Units | Statistic | | Standardized Median Housing Unit Value
*Variables that were created by combining different pieces of information from the 2000 Census
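As a minimal sketch of how a Count-format row in the table might be turned into its analyzed variable, the Python below assumes that each "Pct" variable is simply 100 times the census count divided by the denominator named in the Calculation column. The function name and the treatment of a zero denominator as missing are illustrative assumptions, not part of the plan.

```python
from typing import Optional


def pct_variable(count: float, denominator: float) -> Optional[float]:
    """Return 100 * count / denominator, or None when the denominator is zero."""
    if denominator == 0:
        # Assumption: an area with no denominator units is treated as missing.
        return None
    return 100.0 * count / denominator


# Hypothetical census tract: 250 of 1,000 housing units were built before 1950.
pct_pre_1950_housing = pct_variable(250, 1000)
```

The same helper would apply to any row whose Format is Count; rows whose Format is Statistic (such as the median year built) have no denominator and would be carried through unchanged.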
2.2.2 Environmental Data Sources
Environmental data acquired for this project will include air and groundwater monitoring data
aggregated at the county level for the low resolution model and at higher resolutions when
possible for the LA and MA analyses. In cases where the data are available for only a limited
number of air-monitoring stations or drinking water samples in the region(s) being
investigated, geospatial modeling techniques might be used as appropriate to develop predictions
across the entire region. Existence of industrial sources of lead within each county, as indicated
by the toxics release inventory, will also be included as an environmental data source. Each of
these data sources is discussed in further detail below. Given the time constraints on gathering
these data and conducting the analyses, a reasonable amount of environmental data was
obtained and integrated into analyses; however, it is likely that other relevant environmental data
sources are available and could be incorporated into these analyses in the future. For example,
while modeled air lead data are being used as part of the initial analyses because of their
availability at the county level, available air monitoring data could be modeled to interpolate
values for all geographic areas of interest and included in the analyses.
As with the other variable types, a child is linked to a particular geographic area based on their
address at the time of the blood test used in the analysis. They are then associated with the
various environmental, demographic, and other data associated with that geographic area.
Certainly there are cases where children move or spend significant time in other geographic areas
that will not be captured by this analysis. If a child spends a significant amount of time in other
locations outside their home but within the same county or Census tract, then all exposure data
associated with their home are relevant for those other locations. It is our assessment that this
movement is less of a problem in the national analyses, where county-to-county moves are less
frequent, than in the local analyses, where movement between Census tracts is more
frequent.
2.2.2.1 Concentrations of Lead in Air
EPA maintains a number of ongoing air monitoring programs that collect data over time on
concentrations of various criteria air pollutants, air toxics, constituents of particulate matter, and
other airborne chemicals. Each of these monitoring programs has multiple air monitoring
stations that are deployed throughout the country to meet various goals associated with the Clean
Air Act and other federal and state regulations and programs. For example, some of the
monitoring stations are placed in close proximity to industrial sources of pollution and major
population centers, while other stations are placed in remote areas to assess background chemical
concentrations. While many of these monitoring sites provide information on the concentration
of lead in air over time, a quick assessment of the spatial coverage of these monitoring networks
suggested that making use of these data would be problematic for this study due to time and
resource constraints. Lead concentrations in air from the monitoring networks are not available
in the majority of counties that will be covered in the low resolution model, or the census tracts
that will be covered in the high resolution models, as shown at the following EPA Website
(http://www.epa.gov/airtrends/lead.html).
Instead of utilizing air monitoring data as described above, we intend to make use of modeled
predictions of concentrations of lead in air from EPA's 1999 National Scale Air Toxics
Assessment, in which county and census tract level predictions are available throughout the
entire country based on the use of predictive models. Documentation for the 1999 National
Scale Air Toxics Assessment, as well as the predicted air concentration data, can be found at
http://www.epa.gov/ttn/atw/nata1999/tables.html. The predictions were generated using the
Assessment System for Population Exposure Nationwide, or ASPEN. This model is based on the
EPA's Industrial Source Complex Long Term model (ISCLT), which simulates the behavior of
the pollutants after they are emitted into the atmosphere. ASPEN uses estimates of toxic air
pollutant emissions and meteorological data from National Weather Service Stations to estimate
air toxics concentrations nationwide.
The ASPEN model takes into account important determinants of pollutant concentrations, such
as:
• rate of release
• location of release
• the height from which the pollutants are released
• wind speeds and directions from the meteorological stations nearest to the release
• breakdown of the pollutants in the atmosphere after being released (i.e., reactive decay)
• settling of pollutants out of the atmosphere (i.e., deposition)
• transformation of one pollutant into another (i.e., secondary formation)
The model estimates toxic air pollutant concentrations for every census tract in the continental
United States; however, these data are only available for 1999.
2.2.2.2 Toxic Release Inventory
EPA's Toxic Release Inventory catalogs various sources of lead, based on information provided
by industrial facilities. This data source is being used to generate county-level estimates across
the Nation (and census tract level estimates within Massachusetts and Los Angeles) of the total
amount of lead and/or lead containing compounds that are released by industrial facilities into
the environment via air, surface water, or underground injection.
The ASPEN modeling results described above are based on (airborne) emissions
data and how they would theoretically translate into average ambient air-lead concentrations at
the county and census tract levels. In contrast, the data from the Toxic Release Inventory that are
being managed for this project are available for multiple years and for other types of emissions
(such as surface water). Thus, this information may add predictive power to the models.
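The county-level totals described above amount to summing facility-level releases within each county and year. The Python below is a minimal sketch under assumed field names (`county`, `year`, `medium`, `pounds`); the actual layout of the Toxic Release Inventory extract used for this project is not specified here, and the records shown are invented for illustration.

```python
from collections import defaultdict


def total_releases(records):
    """Sum pounds of lead released per (county, year) across facilities and media."""
    totals = defaultdict(float)
    for rec in records:
        totals[(rec["county"], rec["year"])] += rec["pounds"]
    return dict(totals)


# Hypothetical facility-level records (not real data).
records = [
    {"county": "A", "year": 1999, "medium": "air", "pounds": 120.0},
    {"county": "A", "year": 1999, "medium": "surface water", "pounds": 30.0},
    {"county": "B", "year": 1999, "medium": "air", "pounds": 5.0},
]
county_totals = total_releases(records)
```

Keeping `medium` on each record would also allow separate totals by release type if those prove more predictive than the combined amount.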
2.2.2.3 Water Quality Data
The plumbing system inside a home and from the street to the home may contribute to drinking
water contamination. To address this potential source, EPA obtained data from their Drinking
Water Information System that includes tap water lead levels for public water systems. Public
water treatment plants are monitored over a period of years via random sampling of tap water
within their jurisdictions. The length of monitoring varies depending on the water quality, i.e.,
systems found to have higher water lead levels undergo monitoring for longer periods of time.
Data available from this monitoring program include 90th percentile water lead values.
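For readers unfamiliar with the summary statistic, the sketch below shows one common way to compute a 90th percentile from a set of tap water samples, using the nearest-rank definition. This is an illustrative assumption: the monitoring program may define the percentile differently, and the project receives the 90th percentile values already computed.

```python
import math


def percentile_90(values):
    """Nearest-rank 90th percentile: the value at rank ceil(0.9 * n) in sorted order."""
    ordered = sorted(values)
    rank = math.ceil(9 * len(ordered) / 10)
    return ordered[rank - 1]


# Hypothetical tap water lead results for one water system.
samples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
p90 = percentile_90(samples)
```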
2.2.3 Programmatic Data Sources
Most of the explanatory variables being explored in this project are considered risk factors for
childhood lead poisoning. We will also attempt to characterize a number of factors that might
mitigate these risks. We anticipate that the level and characteristics of programmatic support
from federal, state, or local sponsors will contribute towards meaningful reductions in the
prevalence of childhood lead poisoning. The level of financial support available within each
county will serve as a proxy for programmatic support in the low resolution (National) models.
In the high resolution models, the various characteristics of the programs (information from
housing inspections and case management services) will be explored within the statistical
models. The following sections detail the specific characteristics of the variables used within the
models.
2.2.3.1 Level of Support Within County - Financial
The goal of this variable is to construct a longitudinal history of current and cumulative
per-capita dollars allocated to each county to combat childhood lead poisoning. For the National
(low resolution) Model, we will concentrate on Federal Funding at the state and local levels.
Battelle has made contact with technical staff at HUD's Office of Healthy Homes and Lead
Hazard Control to gain access to their data on grants funded since the inception of the
Lead-Based Paint Hazard Control Grant Program in 1992. In addition, data were also obtained
from CDC's Lead Poisoning Prevention Branch on their program's grant funding. The EPA
WAM is taking responsibility for gaining access to similar data on EPA funding of childhood lead
poisoning prevention programs.
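A longitudinal history of current and cumulative per-capita dollars can be sketched as a running sum over time. The Python below assumes annual funding totals and a single fixed population figure per county; both are simplifying assumptions for illustration (the plan does not state the time step or the population denominator), and the dollar amounts are invented.

```python
from itertools import accumulate


def per_capita_history(dollars_by_year, population):
    """Map each year to (current, cumulative) per-capita dollars for one county."""
    years = sorted(dollars_by_year)
    current = [dollars_by_year[year] / population for year in years]
    cumulative = list(accumulate(current))
    return {year: (cur, cum) for year, cur, cum in zip(years, current, cumulative)}


# Hypothetical county with 500 children and grant funding in two of three years.
history = per_capita_history({1992: 1000.0, 1993: 0.0, 1994: 500.0}, population=500)
```

A year with no funding keeps the cumulative value from the prior year, which is what lets the cumulative series act as a measure of sustained programmatic support.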
For the high-resolution model in the Commonwealth of Massachusetts, we will also have access
to information on within-state funding.
To date, the only programmatic funding information that we have been able to access is
indicative of state and HUD funding of housing-based interventions in the Commonwealth of
Massachusetts. Battelle will continue to work with its contact at HUD to gain access to this
information; however, information not acquired by the middle of January 2007 is unlikely to be
integrated into the analyses.
2.2.3.2 Risk Assessment and Control Within Housing Units (Massachusetts)
The Commonwealth of Massachusetts maintains an extensive database on all lead-based paint
inspections conducted over time (dating back to the early 1990's). The Massachusetts
Department of Public Health is providing a database that contains a single record for each
inspection, with the following information: housing-unit id, census tract, date of inspection, and
result of inspection (whether the housing unit was found to be in compliance with Massachusetts
standards). The database contains records on over 200,000 housing units, with many housing
units having multiple inspections over time. Note that for units with multiple records, we
anticipate identifying time-periods in which the units were both in and out of compliance with
the Massachusetts Standards.
These data will be used in the Massachusetts high resolution models in two ways. First, we will
attempt to develop a longitudinal summary measure of the proportion of housing within each
census tract that is known to be in compliance with the Massachusetts Standards. We anticipate
that within a census tract, as this proportion increases over time, the risk of childhood lead
poisoning will decrease.
Second, because we have access to individual blood-lead records from Massachusetts
with linkable housing-unit identification variables, we can make a determination of whether a
housing unit was in compliance at the time of the blood-lead test for each child in the database
(with potential outcomes of the determination being yes, no, and unknown).
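One way the yes/no/unknown determination could be made is to look up the most recent inspection on or before the date of the blood-lead test. The Python below is a sketch under that assumption; the plan does not state the rule that will be used, and the function name, record layout, and dates are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def compliance_at_test(inspections, test_date):
    """Classify a housing unit as "yes", "no", or "unknown" on the test date.

    inspections: list of (inspection_date, passed) tuples for one housing unit.
    Assumption: the latest inspection on or before the test date governs.
    """
    ordered = sorted(inspections)
    dates = [inspection_date for inspection_date, _ in ordered]
    idx = bisect_right(dates, test_date)
    if idx == 0:
        return "unknown"  # no inspection on record by the test date
    return "yes" if ordered[idx - 1][1] else "no"


# Hypothetical unit: failed in 1995, found in compliance in 1997.
unit_inspections = [(date(1995, 3, 1), False), (date(1997, 6, 15), True)]
status_1996 = compliance_at_test(unit_inspections, date(1996, 1, 1))
status_1998 = compliance_at_test(unit_inspections, date(1998, 1, 1))
status_1994 = compliance_at_test(unit_inspections, date(1994, 1, 1))
```

Applying the same lookup to every unit in a census tract at the end of each quarter would give the counts needed for the tract-level proportion described in the first approach.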
The first approach described above is consistent with the methods for exploring aggregated
summary blood-lead information over time within each census tract (which is the primary
analysis focus within this work assignment). The second approach allows us to utilize some
predictive information at the individual child level. This information may help improve
prediction, and also help our analysis team assess what information might be lost when
transitioning from individual-level data to aggregate summary data in the analyses.
2.2.3.3 Case-Management Services (Los Angeles)
The Los Angeles (LA) data consist of detailed records on approximately 700 children that
received case management services in response to an elevated blood-lead concentration. These
records include an assessment by the case manager of the likely lead exposure sources that the
EBL child encountered, including residential sources of lead-based paint, parental occupation
and hobby related sources of exposure, use of glazed ceramic pottery, home remedies, and
cosmetics suspected of containing lead, and activity patterns of the child that might contribute
towards lead poisoning.
Some of these children may have repeated tests, and it is unclear whether the City of Los
Angeles will make these repeat tests available to EPA for this project. In addition to the data on
EBL children who received case management services in Los Angeles, EPA is working with
CDC to try to arrange for aggregate summary information at the census tract level of geographic
specificity on all children tested in the City of Los Angeles over the same period of time. These
data are controlled by the State of California Department of Health, and it is unclear
whether data transfer can be arranged in time to be utilized for this project.
Combining these two information sources will provide information on the percentage of elevated
blood-lead children who receive case management services. The case management data will be
useful for determining which sources of lead exposure are common among those receiving
follow-up, and perhaps which exposure sources are associated with the highest EBL cases.
3.0 STATISTICAL ANALYSIS
Battelle's general approach to data analysis involves translating study hypotheses into statistical
models, identifying potential confounders and effect modifiers, and considering how the
underlying assumptions of proposed statistical methods might affect their application. Our
statisticians are highly experienced in applying a wide variety of statistical modeling techniques,
including linear and non-linear regression modeling, analysis of variance techniques, generalized
linear models including logistic regression models, mixed models, random effects, generalized
estimating equations approaches for correlated data (including correlated binary or Poisson
distributed data), and hierarchical Bayesian modeling. The following text provides our technical
approach for conducting the descriptive analyses and statistical modeling work associated with
this task.
Descriptive analysis. Battelle starts every analysis with an assessment of the study sample,
i.e., the proportion of counties and census tracts in the sample with complete data for both the
response variable and the explanatory variables. In preparation for developing longitudinal
statistical models, we then generate univariate summaries of each variable as a function of time,
and make comparisons of these distributions using side-by-side box-plots for continuous data or
bar-charts for categorical data. This helps verify that the data are clean and ready for analysis
and identify cells with sparse data. Such descriptive analyses will be conducted on each
database, to characterize the distributions of all observed variables using frequency distributions
for categorical variables, and simple summary statistics (mean, median, mode, minimum,
maximum, and select percentiles) for continuous variables. Distributional assumptions may also
be explored for certain variables, as appropriate, in preparation for more sophisticated models.
For example, some environmental concentration data may depart from normality, and follow a
log-normal distribution. In these cases, we may additionally report the geometric mean and
geometric standard deviation as part of the simple descriptive summary. Battelle is prepared to
create and explore new variables (as appropriate functions of other variables in the study
databases) as part of the descriptive analyses, for later use in statistical modeling. These
variables will be added to the study databases, as appropriate, and will be integrated in all
appropriate files and documentation (metadata and data dictionary).
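For log-normally distributed concentration data, the geometric mean and geometric standard deviation are the exponentiated mean and standard deviation of the log-transformed values. The Python below is a minimal sketch of that calculation; the function name is illustrative, the sample standard deviation (n - 1 denominator) is an assumed choice, and the input values are invented.

```python
import math


def geometric_summary(values):
    """Return (geometric mean, geometric standard deviation) of positive values."""
    logs = [math.log(v) for v in values]
    n = len(logs)
    mean_log = sum(logs) / n
    # Sample variance of the logs (n - 1 denominator); requires at least two values.
    var_log = sum((x - mean_log) ** 2 for x in logs) / (n - 1)
    return math.exp(mean_log), math.exp(math.sqrt(var_log))


# Hypothetical concentrations spanning two orders of magnitude.
gm, gsd = geometric_summary([1.0, 10.0, 100.0])
```

For these three values the logs are evenly spaced around log(10), so both the geometric mean and the geometric standard deviation are approximately 10.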
The univariate descriptions are then followed by fitting a series of cross-sectional bivariate
relationships between the blood-lead response variable(s) and each candidate explanatory
variable. These cross-sectional relationships will be explored as a function of time to better
understand the stability of these relationships, and whether they change over time, so that they
can be modeled appropriately in the more sophisticated longitudinal analyses. These analyses
will also help identify which explanatory variables are most predictive of the blood-lead
response variable.
In preparation for more sophisticated statistical analyses, such as the Generalized Linear Mixed
Logistic Regression Model outlined below, we will also perform relevant stratified analyses to
investigate interactions discussed in the data analysis plan. For example, we may investigate the
population density variable in this manner, as density may serve as a surrogate to differentiate
between rural and urban geographic areas in the analyses, and exposure variables may be
different in these types of areas. Similarly, we can use EPA regions as a potential stratification
variable. If we do not observe variation in the measure of effect (e.g., odds ratios) across the
levels of a third variable, however, we can probably treat that third variable as a
potential confounder in the multivariate model, rather than as an effect modifier. If the odds
ratios differ markedly (e.g., the effect appears to be protective in one subgroup and hazardous in
another subgroup), we must consider the third variable as an effect modifier.
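The stratified comparison can be illustrated with stratum-specific odds ratios computed from 2x2 tables. The Python below is a sketch with invented counts; the strata (urban versus rural) and the exposure are hypothetical, and no formal test of homogeneity is shown.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Cross-product odds ratio for a single 2x2 table."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


# Hypothetical counts of (exposed EBL, exposed non-EBL, unexposed EBL, unexposed non-EBL).
strata = {
    "urban": (40, 60, 20, 80),
    "rural": (10, 90, 20, 80),
}
stratum_odds_ratios = {name: odds_ratio(*cells) for name, cells in strata.items()}
```

Here the urban odds ratio is above 1 and the rural odds ratio is below 1, the pattern that would point to the stratification variable being an effect modifier rather than a confounder.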
Statistical Models for Analysis Goal #1 (Broad Coverage - Lower Resolution Model)
This model will be used to characterize broad trends over time in the prevalence of childhood
lead poisoning across the entire U.S. The various surveillance, environmental sources,
demographic characteristics, and programmatic support will be aggregated to the county level for
those localities with universal screening and reporting. Data will be longitudinal in nature with
time intervals of one quarter of a year.
For the purposes of discussion, we will assume that the modeling approach will focus on a
logistic regression model for the proportion of children that have elevated blood-lead
concentrations (>10 µg/dL). The temporal nature of declining childhood lead poisoning will be
addressed via classic concepts of longitudinal data modeling of the low resolution data. Let
$Y_{ij}$ represent the number of children that were detected with blood-lead concentration above
10 µg/dL from the $i$th county and $j$th point in time (quarter),
$n_{ij}$ represent the number of children that had their blood-lead concentration tested from within the
$i$th county and $j$th point in time (quarter),
Please note that we expect that $n_{ij} \le N_{ij}$, where $N_{ij}$ represents the total population of
children in the $i$th county and $j$th point in time
$t_{ij}$ represent time (in years) corresponding to the $Y_{ij}$ response variable, and
$X_{ij}$ represent a series of predictor variables associated with the $Y_{ij}$ response variable. These
predictor variables will likely represent air monitoring data, groundwater data, census
demographic data, programmatic data on federal financial support for lead poisoning prevention,
and other related information as detailed above that can potentially help predict the prevalence of
lead poisoning at the county level.
We introduce the following as a potential baseline model:
$$\mathrm{logit}\left(E\left[Y_{ij}/n_{ij}\right]\right) = \beta_0 + \beta_1 t_{ij} + \beta_2 X_{ij} + \gamma_{0i} + \gamma_{1i} t_{ij}$$
where the beta parameters ($\beta$) represent a vector of fixed effects, and the gamma parameters ($\gamma$)
represent random effects that allow each county to have its own trend over time. In this model
it can be assumed that $\gamma_{0i}$ and $\gamma_{1i}$ jointly follow a multivariate normal distribution with mean zero
and covariance matrix
$$\Sigma = \begin{pmatrix} \sigma_{11}^2 & \sigma_{12}^2 \\ \sigma_{21}^2 & \sigma_{22}^2 \end{pmatrix}$$
More complex assumptions or parameters may be required to help integrate the complex
spatial/temporal correlation.
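To make the structure of the baseline model concrete, the Python below evaluates one reading of it: a logistic model whose linear predictor combines a fixed intercept, a fixed time trend, fixed covariate effects, and a county-specific random intercept and slope. The parameter names and all numeric values are hypothetical, and the sketch only evaluates a prediction for given parameters; it does not fit the model.

```python
import math


def predicted_prevalence(beta0, beta1, beta2, t, x, gamma0=0.0, gamma1=0.0):
    """Inverse-logit of beta0 + beta1*t + beta2.x + gamma0 + gamma1*t.

    beta2 and x are equal-length sequences of coefficients and covariate values;
    gamma0 and gamma1 are one county's random intercept and random slope.
    """
    linear_predictor = (
        beta0
        + beta1 * t
        + sum(coef * value for coef, value in zip(beta2, x))
        + gamma0
        + gamma1 * t
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical parameters: prevalence declining over time for an average county,
# and a second county whose random slope offsets the national decline.
average_county = predicted_prevalence(-2.0, -0.1, [0.5], t=5.0, x=[1.0])
slow_county = predicted_prevalence(-2.0, -0.1, [0.5], t=5.0, x=[1.0], gamma1=0.1)
```

With these values the second county's random slope cancels the fixed time trend, so its predicted prevalence stays at the baseline level while the average county's prediction declines.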
Counties with larger $\gamma_{1i}$ parameter estimates represent areas where lead-poisoning has not
significantly decreased over time. Similarly, the parameter estimates can be used to identify
those counties with the highest predicted prevalence of childhood lead poisoning at various time
points in the future.
The product of this effort will be a time-series of maps (or a movie) that spatially interpolates
risk of childhood lead poisoning as a function of various appropriate predictor variables, as well
as full documentation of the spatio-temporal model developed to meet this analysis goal.
Battelle will also spend time and resources on this work assignment modifying an existing
visualization tool for specific application to this project. The resulting executable that will be
delivered to EPA will allow users to interact with the modeling results at different levels of
temporal and geographic specificity. The tool will allow the user to select an appropriate
response variable (e.g., proportion of children with blood-lead concentrations above 5 µg/dL) and
play a movie that displays a time-series of maps showing how the predicted (or observed)
risk changes over time. The user will also be able to zoom in on a rectangular area, to see these
results with a higher degree of geographic specificity. The user will also be able to stop the
movie (or rewind, fast-forward) to isolate specific points in time. By using the mouse, the user
will also be able to select a specific county and the tool will then display the observed and
predicted data for that particular county in a separate window. Although data maps have
potential for misinterpretation and may present some political sensitivities, they are beneficial
because they provide a visual presentation of the results that can be relatively easily
comprehended by users at different technical levels.
If time and resources permit, the tool will also include some useful querying functionality (such
as being able to isolate the top 'n' counties with respect to predicted risk of childhood lead
poisoning, or rate of decline in prevalence). The visualization tool will be written in C++, and
will be built in a manner that will allow EPA to modify the model and for Battelle to quickly
import the resulting data from a modification into the tool.
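The top 'n' query mentioned above is a simple ranking over county-level predictions. The Python below sketches the logic only; the tool itself is to be written in C++, and the function name, county labels, and risk values here are hypothetical.

```python
import heapq


def top_n_counties(predicted_risk, n):
    """Return the n (county, risk) pairs with the highest predicted risk."""
    return heapq.nlargest(n, predicted_risk.items(), key=lambda item: item[1])


# Hypothetical predicted proportions of children with elevated blood-lead levels.
predicted_risk = {"County A": 0.02, "County B": 0.10, "County C": 0.05}
highest_two = top_n_counties(predicted_risk, 2)
```

Ranking by rate of decline would use the same function with the estimated county-specific slope in place of the predicted risk.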
The Battelle Team will fully document all analyses performed on the study datasets. This will
include keeping log files for each analysis run and providing useful comments in each analysis
program that describe exactly what is being done in each step. These efforts will ensure that, in
the future, EPA staff could replicate any of our analyses and achieve identical results.
Besides our programmers and analysts being able to write programs to perform innovative
analyses of any degree of complexity using existing statistical software packages (e.g., SAS, S+,
ArcView, SUDAAN, etc.), we have highly qualified staff who can create custom software
using many programming languages (e.g., Visual Basic, Access, C++, FORTRAN, etc.) to
perform specialized data management and analysis tasks.
Statistical Models for Analysis Goal #2 (High Resolution Models)
High-resolution models
will be utilized to identify the relative contribution of various types of exposure sources to
elevated risk for childhood lead-poisoning within a select community. These types of sources
include housing factors, broader environmental exposure, demographic composition, and
programmatic resources.
While this type of model recognizes that exposures contributing to childhood
lead-poisoning are likely community-specific, analysis of the high-resolution models may have
certain limitations including selection bias and generalizability to other geographic areas.
Consequently, analysis of high-resolution data may be primarily descriptive in nature. These
descriptions, however, may be the best way to identify the causal relationships and pathways to
reducing lead-poisoning risk within a neighborhood. The characteristics of these pathways will
be taken into account when building the low-resolution model for the entire U.S. For example,
exposure sources or influential demographic and programmatic variables that account for a
significant proportion of risk of childhood lead-poisoning within a community can be combined
with lower resolution data across the nation via concepts from hierarchical Bayesian modeling.
The following sections provide an overview of the statistical analyses that we are proposing for
the Massachusetts and Los Angeles data sources. Please note that as of the time that this QMP
was written, there were doubts as to whether EPA and Battelle would be able to gain timely
access to the Los Angeles data. We therefore outline the general approach that would be used
for these data in the event that they are received by mid-January.
Modeling Approach for Massachusetts
The Massachusetts Department of Public Health (MDPH) is entering into a limited use data
sharing agreement with Battelle, so that they can provide blood-lead testing results on individual
children (aged 6-36 months) and housing inspection data in a format that preserves linkages
through a housing unit identification variable. These data will be utilized in two different
modeling approaches. The first modeling approach will seek to develop census tract quarterly
summary measures similar to the National Model for blood-lead (e.g., exceedance proportions
and geometric means), as well as summary measures for the proportion of housing units in each
census tract that are known to be in (or out of) compliance with the Massachusetts standard of
care (for use as a potential explanatory variable). MDPH has also provided Battelle with
summary information regarding HUD and State funding of residential housing interventions
(lead hazard control and abatement), which will be used to develop a longitudinal summary of
current and cumulative per-capita spending on residential intervention within each census tract
(using various assumptions on the allocation of such dollars). Other explanatory variables, such
as the census, toxic release inventory, 1999 National Air Toxics Assessment, and water quality
data, will be available for use in these models.
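A census tract quarterly exceedance proportion can be sketched as a grouped count over individual test records. The Python below assumes each record carries a tract identifier, a test date, and a blood-lead value in µg/dL, and that calendar quarters are used; these field choices, the handling of repeat tests (every record is counted), and the sample records are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date


def quarterly_exceedance(tests, threshold=10.0):
    """Proportion of tests above the threshold per (tract, year, quarter).

    tests: iterable of (tract, test_date, blood_lead_ug_dl) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # [number elevated, number tested]
    for tract, test_date, level in tests:
        key = (tract, test_date.year, (test_date.month - 1) // 3 + 1)
        if level > threshold:
            counts[key][0] += 1
        counts[key][1] += 1
    return {key: elevated / tested for key, (elevated, tested) in counts.items()}


# Hypothetical records for one tract.
tests = [
    ("T1", date(2000, 1, 15), 12.0),
    ("T1", date(2000, 3, 30), 4.0),
    ("T1", date(2000, 4, 1), 11.0),
]
proportions = quarterly_exceedance(tests)
```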
These census-tract level summary data (both response variable and explanatory variables) will be
modeled using an approach similar to what is being proposed for the National (Low Resolution)
Model, except that the unit of clustering will be census tract rather than county. This model will be
fit, with appropriate exploratory analyses and model documentation being provided to EPA in
the technical report deliverable.
If time and resources permit, Battelle will conduct a series of analyses to compare and contrast
the modeling results for the Commonwealth of Massachusetts between the model described
above and the subset of Massachusetts specific data and results that can be extracted from the
National (Low Resolution) model. This would entail the following steps:
1. Fit a high-resolution model to the Massachusetts data at the census tract level using the
same type of summary measures that were used in the National Model (only at the
census tract level). This will be called the Baseline High Resolution Model for
Massachusetts.
2. Determine whether any improvement in prediction can be made in the baseline model by
first assessing other census or environmental variables that were available for use in the
National Model, and then secondly adding the Massachusetts specific information on
housing-based intervention activities. These activities will be used to develop an
Improved High Resolution Model for Massachusetts.
3. Aggregate the input data for the response variable and model predictions across census
tracts within each county in Massachusetts for both the Baseline and Improved High
Resolution Models, and compare these with the county-level input and model predictions
in Massachusetts from the National Model.
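The aggregation in step 3 can be sketched as summing tract-level counts within each county and quarter before forming the proportion, rather than averaging tract-level proportions (which would give small tracts the same weight as large ones). The Python below is an illustration with invented counts; the tuple layout and the county and quarter labels are assumptions.

```python
from collections import defaultdict


def aggregate_to_county(tract_rows):
    """Combine tract counts into county-level exceedance proportions.

    tract_rows: iterable of (county, quarter, n_elevated, n_tested) tuples.
    """
    sums = defaultdict(lambda: [0, 0])  # [number elevated, number tested]
    for county, quarter, n_elevated, n_tested in tract_rows:
        sums[(county, quarter)][0] += n_elevated
        sums[(county, quarter)][1] += n_tested
    return {key: elev / tested for key, (elev, tested) in sums.items() if tested > 0}


# Hypothetical tract-level counts for one quarter.
tract_rows = [
    ("County A", "2000Q1", 3, 50),
    ("County A", "2000Q1", 1, 50),
    ("County B", "2000Q1", 0, 20),
]
county_proportions = aggregate_to_county(tract_rows)
```

The same function could be applied to predicted counts (predicted proportion times number tested) so that observed and predicted values are aggregated on the same footing.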
Finally, if time and resources permit, we will develop a statistical model that depends on
individual data records for the response variable (rather than aggregate summary measures),
where we determine whether the specific unit where a child resides was in or out of compliance
with the Massachusetts standard of care at the time of a blood-lead test, and use this information
as part of the prediction. The other variables described above that are available at the census
tract level would also be utilized as predictor variables in this model, and similar exercises can
be used to assess the relative performance (and improvement in prediction) that occurs when
modeling these more specific data.
Modeling Approach for Los Angeles
The Los Angeles data consist of detailed records on approximately 700 children that received
case management services in response to an elevated blood-lead concentration. These records
include an assessment by the case manager of the likely lead exposure sources that the EBL child
encountered, including residential sources of lead-based paint, parental occupation and hobby
related sources of exposure, use of glazed ceramic pottery, home remedies, and cosmetics
suspected of containing lead, and activity patterns of the child that might contribute towards lead
poisoning.
While this information is highly specific, it is unfortunate that similar information was not
gathered across a small subsample of children without elevated blood-lead concentrations. Thus,
these data cannot be utilized to determine causal factors for EBL. Rather, they may be used to
describe what potential exposure factors children in Los Angeles with elevated blood-lead
concentrations have in common.
If made available to this project, summary data from the State of California that represent results
of all children tested in Los Angeles can also be used to determine the prevalence of childhood
lead poisoning in Los Angeles census tracts over a similar period of observation. These data can
be combined with the more specific case management records, to see if any additional predictive
power can be gleaned from these data.
4.0 SCHEDULE OF DELIVERABLES
Table 1 below provides a schedule proposed by Battelle for this project.
Table 1. Revised Proposed Schedule for EPA WA 2-13 Targeting Areas of Elevated Blood
Lead Levels (BLL) in Children
Date | Responsible Organization | Description of Activity
--- | --- | ---
10/16/06 | Team | Agreement on scope of work (see bullets below) and organizational responsibilities for the project
12/14/06 | CDC* | Transfer of data from Los Angeles and Massachusetts to Battelle to support analysis goal #1
12/14/06 | CDC* | Transfer of data from the National Surveillance Database to Battelle to support analysis goal #2
12/14/06 | EPA | Transfer of EPA data sources (Air Monitoring and Toxics Release Inventory) data to Battelle
12/14/06 | Battelle | Battelle assembles other data sources (programmatic, demographic, other environmental) to support analysis goal #2
1/10/07 | Battelle | Draft Quality Assurance Plan (including data analysis plan) for the Project
1/15/07 | Battelle | Integrated dataset to support analysis goals #1 and #2 (based on input received on 12/14/06 from CDC and EPA)
2/2/07 | Battelle | Draft report on Exploratory Analyses
2/16/07 | Battelle | Draft report on High Resolution Models (Massachusetts)
2/23/07 | Battelle | Draft report on Broad Coverage Model
3/2/07 | Battelle | Integration of Broad Coverage Modeling Results into Graphical User Interface (time series of maps) - Executable Visualization Tool to be delivered to EPA
* Note that the transfer of blood-lead records from CDC and Massachusetts has not yet occurred, but
that Battelle has conducted preliminary work to ensure that the data can be quickly integrated once
received.

-------
5.0 PERSONNEL QUALIFICATIONS AND ROLES
This work assignment and all data analysis efforts will be managed by Mr. Warren Strauss, a
senior-level biostatistician with extensive expertise in analyses of correlated binary outcomes and
childhood lead poisoning research. Dr. Bruce Buxton, program manager on the Battelle Contract
with EPA, will serve as the Quality Assurance Manager and Technical Advisor to the study.
They will be supported by various talented staff at Battelle, including:
• Jyothi Nagaraja - a Research Scientist statistician expert in developing logistic regression
  modeling approaches with previous experience on the CDC risk indices project,
• Michele Morara - a mathematician and computer software specialist who will be
  developing the visualization tool,
• Tim Pivetz - a senior statistician who has worked with blood-lead and housing inspection
  data from the Massachusetts Department of Public Health on previous research studies
  supported by HUD,
• Elizabeth Slone - a junior statistician expert in conducting exploratory analyses in
  advance of development of inferential models,
• Nicole Iroz-Elardo - a junior statistician expert in childhood exposure related research,
• Darlene Wells - a senior database management specialist who will lead efforts on
  working jointly with technical staff from CDC's Lead Poisoning Prevention Branch to
  create an appropriate longitudinal dataset of quarterly county summary blood-lead
  information,
• Rona Boehm - a database management specialist who will lead the integration of all
  datasets obtained for this work assignment,
• Jennifer Sawchuk - a database management specialist who will assist in the acquisition
  and integration of data from publicly available sources (e.g., census, TRI, ASPEN model),
• Michael Schlatt - a computer programmer who will work with Darlene Wells and CDC
  to develop appropriate code to extract the summary information from CDC's National
  Surveillance database, and
• Cyndi Reese - a word processing specialist who will assist in the preparation of reports
  and presentations.
Table 2 provides an overview of the roles and responsibilities of each member of our proposed
project team. Table 3 provides a breakdown of anticipated hour allocation to five project
activities: Task Management, Data Management, Low Resolution Model, High Resolution
Model, and Visualization Tool. Table 3 also provides a summary of total hours planned for each
member of the project team, as well as the number of hours already spent on the project as of
December 28, 2006.

-------
Table 2. Roles and Responsibilities of Proposed Staff

Staff Member        Field of Expertise                        Roles and Responsibilities
Warren Strauss      Biostatistics                             Work Assignment Leader
Bruce Buxton        Statistics                                Program Manager, QA Manager, Technical Advisor
Jyothi Nagaraja     Biostatistics                             Develop logistic regression and structural equations
                                                              models under the guidance of Warren Strauss
Michele Morara      Mathematics and Computer Science          Develop the Visualization Tool, assist in model
                                                              specification and Bayesian implementation if necessary
Tim Pivetz          Statistics and Lead Exposures             Assist in model development and documentation
Elizabeth Slone     Statistics / SAS Programming              Conduct exploratory analyses
Nicole Iroz-Elardo  Statistics / Exposure                     Help conceptualize modeling strategy
Darlene Wells       Data Management and Software Development  Assist in developing data management strategies and
                                                              gaining access to CDC National Surveillance Data
Rona Boehm          Data Management                           Data management leader for the project
Jennifer Sawchuk    Data Management                           Assist in accessing and managing publicly available data
Michael Schlatt     Software Development                      Develop SQL Server extraction tool to create summary
                                                              dataset from CDC's National Surveillance Dataset
Cyndi Reese         Secretarial Support                       Word processing and PowerPoint presentation support
Table 3. Allocation of Hours by Task

Staff Member        Task 1      Task 2      Task 3      Task 4      Task 5         Total    Total
                    Task        Data        High        Low         Visualization  Hours    Already
                    Management  Management  Resolution  Resolution  Tool           Planned  Spent
                                            Model       Model
Warren Strauss      80          10          75          75          20             260      140
Bruce Buxton        32          -           -           -           -              32       4
Jyothi Nagaraja     -           10          70          70          -              150      3
Michele Morara      -           -           25          60          230            315      115
Tim Pivetz          -           -           60          20          -              80       0
Elizabeth Slone     -           -           20          20          -              40       0
Nicole Iroz-Elardo  -           -           10          10          -              20       20
Darlene Wells       -           20          -           -           15             35       15
Rona Boehm          -           90          -           -           -              90       38
Jennifer Sawchuk    -           70          -           -           -              70       20
Michael Schlatt     -           35          -           -           -              35       18
Cyndi Reese         4           -           8           8           -              20       3
Totals              116         235         268         263         265            1147     376
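
The row and column totals in Table 3 can be cross-checked mechanically. The Python sketch below is illustrative only and is not part of Battelle's project tooling: the data layout and the names `hours`, `reported_totals`, and `mismatched_rows` are assumptions, while the figures are transcribed from Table 3. Cells with no hours allocated are entered as 0.

```python
# Per staff member: (hours for Tasks 1-5, total hours planned, hours already spent).
hours = {
    "Warren Strauss":     ([80, 10, 75, 75, 20], 260, 140),
    "Bruce Buxton":       ([32, 0, 0, 0, 0], 32, 4),
    "Jyothi Nagaraja":    ([0, 10, 70, 70, 0], 150, 3),
    "Michele Morara":     ([0, 0, 25, 60, 230], 315, 115),
    "Tim Pivetz":         ([0, 0, 60, 20, 0], 80, 0),
    "Elizabeth Slone":    ([0, 0, 20, 20, 0], 40, 0),
    "Nicole Iroz-Elardo": ([0, 0, 10, 10, 0], 20, 20),
    "Darlene Wells":      ([0, 20, 0, 0, 15], 35, 15),
    "Rona Boehm":         ([0, 90, 0, 0, 0], 90, 38),
    "Jennifer Sawchuk":   ([0, 70, 0, 0, 0], 70, 20),
    "Michael Schlatt":    ([0, 35, 0, 0, 0], 35, 18),
    "Cyndi Reese":        ([4, 0, 8, 8, 0], 20, 3),
}
# "Totals" row as printed in Table 3.
reported_totals = ([116, 235, 268, 263, 265], 1147, 376)

# Staff whose task hours do not add up to their planned total.
mismatched_rows = [
    name for name, (tasks, planned, _spent) in hours.items()
    if sum(tasks) != planned
]

# Recompute the column totals from the individual rows.
column_totals = [sum(tasks[i] for tasks, _, _ in hours.values()) for i in range(5)]
planned_total = sum(planned for _, planned, _ in hours.values())
spent_total = sum(spent for _, _, spent in hours.values())

print("Rows with inconsistent totals:", mismatched_rows)
print("Column totals match:", column_totals == reported_totals[0])
print("Planned total matches:", planned_total == reported_totals[1])
print("Spent total matches:", spent_total == reported_totals[2])
```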

-------
6.0 QUALITY ASSURANCE / QUALITY CONTROL PROGRAM
The primary focus of Battelle's Quality Assurance/Quality Control Program is to ensure that all
deliverables produced for EPA are of appropriate, known, and documented quality. To do this,
Battelle will apply a project-specific Quality Management Plan (QMP) to all work performed for
EPA, ensuring that all project activities that affect the quality and integrity of prepared
deliverables are planned, coordinated, and communicated to all project participants. The project
QMP also ensures that all technical services, final deliverables, and the supporting data
produced for EPA meet or exceed the quality standards that EPA has set for this project. The
following sections present our proposed QMP for this project.
6.1 PROJECT MANAGEMENT
On this project, EPA will utilize technical support from staff in Battelle's Statistics and
Information Analysis department under Work Assignment 2-13 of Contract No. EP-W-04-021,
with Ms. Margaret Conomos serving as EPA's Work Assignment Manager (WAM) and
Dr. Barry Nussbaum serving as the EPA technical advisor. Mr. Warren Strauss will serve as
Work Assignment Leader and will supervise all technical activities that Battelle performs in
support of this project. Dr. Bruce Buxton is Battelle's EPA Program Manager on this contract
and will serve as technical, quality assurance, and managerial reviewer of Battelle's deliverables
on this work assignment.
Quality assurance (QA) activities for this project involve reviewing and approving the Data
Management and Statistical Analysis Plan and any necessary amendments, implementing the
plan, reviewing project reports, performing any necessary audits to check for compliance with
the plan, and ensuring that corrective actions are taken when necessary. Mr. Strauss, as Work
Assignment Leader, will be responsible for implementation of the plan, day-to-day QA activities,
and submitting amendments in a timely fashion. Dr. Bruce Buxton, as QA Manager, will be
responsible for reviewing and approving the Statistical Analysis Plan and any amendments, and
for conducting audits as necessary. Reports prepared under this project will be reviewed by
Mr. Strauss and the EPA WAM, as well as by Dr. Buxton. Ensuring any necessary corrective
action will be the immediate responsibility of Mr. Strauss under supervision of Dr. Buxton.
Battelle has established procedures for monitoring the schedule and resource expenditures of
each Task within the project. As tasks proceed, Battelle will submit a monthly progress report to
EPA that summarizes progress made toward completing the milestones and deliverables,
problems encountered and how they were overcome, plans for the upcoming month and beyond,
and a financial summary of labor resource expenditures. These monthly progress reports, along
with frequent discussions between Battelle staff and EPA, will facilitate the identification and
resolution of any potential problems, and will serve to document progress.
One of Mr. Strauss' key responsibilities will be to ensure the technical quality of all work,
reports, and deliverables provided to EPA. Battelle staff members are experienced in identifying
and resolving small problems to prevent them from getting worse, and frequent communication
with our collaborators and clients provides the cornerstone of problem resolution. Battelle's
business practices include the application of a formal company-wide policy to assure our clients
that our work is of the highest quality. This policy includes the assignment of a senior Battelle
expert to provide technical review and concurrence support for each project deliverable. In
addition, Mr. Strauss and/or Dr. Buxton will technically review each report and deliverable
before it is submitted to EPA to ensure a quality deliverable.
6.2 PROJECT DESCRIPTION AND SCHEDULE
This project involves data management, statistical analysis, geographical information systems
support, and high-level statistical methods development for integrating data sources to predict
how risk of childhood lead poisoning changes over time as a function of a variety of
environmental exposures, demographic characteristics, and programmatic support. The original
period of performance for this project was 7.5 months from the date of award, with a specific
schedule of deliverables presented in an earlier workplan. However, this schedule has been
severely compressed, with the majority of activities being conducted in the last 10 weeks of the
project, due to multiple difficulties encountered in accessing the critical blood-lead response
variable information from CDC and local lead poisoning prevention programs.
6.3 QUALITY OBJECTIVES AND CRITERIA
To meet the project goals, the project has the following data quality objectives:
1. Manage the data from the multiple sources of information relating environmental
   exposure, demographic characteristics, and programmatic support to childhood
   lead-poisoning risk.
   1.1 Identify missing, incomplete, or error-prone data from electronic data sources
       supplied by EPA, and work with the EPA TOPO to resolve these issues.
   1.2 Develop separate study databases and master integrated databases for statistical
       analyses.
   1.3 Provide a complete, documented data dictionary for the study databases.
2. Conduct statistical analyses of data as directed by EPA.
   2.1 Provide descriptive statistics of all variables included in the study, and all created
       variables.
   2.2 Conduct statistical analyses using appropriate statistical software (e.g., SAS) or
       other software products approved by the EPA WAM. Provide diagnostics on all
       statistical models to characterize model fit and distributional assumptions, as
       appropriate.
3. Develop a visualization tool that will allow users to interact with the results of the
   broad-based national model.
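
The screening of missing, incomplete, or error-prone data named in objective 1.1 could be approached along the following lines. This Python sketch is a hedged illustration rather than the project's procedure (the plan names SAS as the primary analysis tool); the function `screen_records`, the field names, the two consistency rules, and the example records are all hypothetical.

```python
def screen_records(records, required, checks):
    """Return (record index, problem) pairs for records that fail screening.

    `required` lists fields that must be present and non-empty.
    `checks` maps a problem label to a function returning True when the
    record is acceptable; checks run only on records with no missing fields.
    """
    problems = []
    for i, rec in enumerate(records):
        missing = [name for name in required if rec.get(name) in (None, "")]
        for name in missing:
            problems.append((i, f"missing {name}"))
        if missing:
            continue  # consistency checks need a complete record
        for label, is_ok in checks.items():
            if not is_ok(rec):
                problems.append((i, label))
    return problems


# Hypothetical field names and rules for a county-by-quarter summary file.
required = ["county", "quarter", "n_tested", "n_elevated"]
checks = {
    "negative count": lambda r: r["n_tested"] >= 0 and r["n_elevated"] >= 0,
    "elevated exceeds tested": lambda r: r["n_elevated"] <= r["n_tested"],
}

# Made-up records for illustration only.
records = [
    {"county": "X", "quarter": "2005Q1", "n_tested": 40, "n_elevated": 3},
    {"county": "X", "quarter": "2005Q2", "n_tested": 35, "n_elevated": None},
    {"county": "Y", "quarter": "2005Q1", "n_tested": 10, "n_elevated": 12},
]
problems = screen_records(records, required, checks)
for index, problem in problems:
    print(f"record {index}: {problem}")
```

Returning a list of flagged records, rather than dropping them, keeps the decision about how to resolve each issue with the analyst and the data supplier.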
6.4 SPECIAL TRAINING
Staff with prior knowledge of programming in the STAT module within the SAS® System and
with prior training in statistical modeling, especially with regard to binary data, is desirable for
this project. Staff with prior knowledge of Geographic Information Systems, particularly
ArcView® and ArcInfo from ESRI, is also desirable for this project.

-------
6.5 DOCUMENTATION AND RECORDS
At EPA, the Work Assignment Manager will maintain a file for the project. The file will include,
but is not limited to, this QMP and statistical analysis plan (and all revisions), computer
programs and listings developed under this statistical analysis plan, result summaries, and
diskettes with the data. The data provided to the EPA WAM by Battelle will be stripped of all
potentially identifiable or confidential information, but will be organized in such a way that the
analyses performed by Battelle can be replicated.
Battelle's Project Manager will maintain a file for this project that contains the inputs required
for conducting Battelle's activities, as well as the outputs generated from conducting these
activities. Among the required inputs to be included in the project file are copies of this QMP
and copies of data provided to Battelle by various data sources (subject to provisions included in
limited data sharing agreements with Massachusetts and Los Angeles). The outputs generated by
Battelle's activities that will be included in the project file will include the fully documented
individual and combined master datasets, SAS programs and ArcView files written to perform
summaries and the statistical analyses on the EPA data, those portions of SAS and ArcView
outputs that are generated by executing these programs and which directly provide information
to be cited in the deliverables, the visualization tool executable, and all project deliverables.
Battelle will document progress made on this project within monthly progress reports to EPA
and will keep the EPA TOPO updated via e-mail and conference call as Battelle's activities
proceed.
6.6 DATA ACQUISITION AND MANAGEMENT
Battelle will maintain the data for this project in a manner that preserves the confidentiality of all
the data and prevents its release. Battelle will maintain the data received from EPA and other
sources (CDC, Massachusetts, Los Angeles) and any data files created by analysis programs on
either hard drive or removable diskette. Removable diskettes will be locked up when not in use,
any computer listings or data summaries shall be locked in a file when not in use, unneeded
listings or summaries shall be shredded, and the project will be discussed only with managers
and staff who are familiar with the project and its data confidentiality restrictions. When the
EPA TOPO sends the data to Battelle, the data will be sent by overnight delivery service, or
through a secure FTP transmission site established by Battelle. The data diskettes will be
enclosed in an envelope labeled "To be opened by addressee only." The Project Manager will
track the delivery of the data and verify with the Battelle Project Manager that the data have been
received.
Once data files are received from EPA, Battelle will handle the original data (e.g., data with
personal identifiers) as though the data were classified as confidential business information
(CBI) under the Toxic Substances Control Act (TSCA), even though EPA may not specifically
classify these data as "CBI." While they are in Battelle's possession, the data files will not be
shared with anyone outside of the project team, and the project team will not share files via
e-mail attachment. Until they are required to be returned to EPA, the data diskettes will reside
solely within Battelle's headquarters in Columbus, OH, which is a controlled access campus.

-------
When other Battelle technical staff are required to assist in the data analysis effort, Battelle's
Project Manager will brief them on all confidentiality requirements associated with the project
and the data and will ensure that they follow appropriate requirements when possessing the data
diskettes. Battelle will follow the TSCA CBI Security Manual when conducting data analyses
on computer systems and when establishing reporting and record keeping procedures on this
project.
When Battelle's Work Assignment Leader returns the data diskettes to the original source at the
end of the project, along with any analysis programs and output that may be required, the
diskettes will be enclosed in an envelope labeled "To be opened by addressee only" and will be
sent by overnight delivery service. Battelle's Work Assignment Leader will track the delivery of
the diskettes and verify that the appropriate source has received the diskettes.
6.7 SOFTWARE ACQUISITION & USAGE
Microsoft Access and SQL Server will be the primary software tools used for data management,
although some data or results may be transferred to or presented in spreadsheets readable by
software such as MS Excel, SAS, or ArcView, or in hard-copy data records. The SAS® System
will be the primary statistical data analysis tool used on this project, but may be supplemented by
S-Plus or other specialized statistical software as agreed upon by EPA. ArcView will be the
primary software tool used for GIS applications on this project. All required software presently
resides on personal computers at EPA and Battelle.
6.8 DATA VALIDATION AND USABILITY
Battelle will work directly with each information source (or review appropriate documentation
for publicly available sources) to ensure that the variables in the data files are understood. To
ensure complete and accurate transfer of data, record counts, variable lists, and summary
statistics will be compared between the original data sources and the data transferred to the study
database. Standard software packages (e.g., SAS® System, ArcView) will be used for the data
analyses, and the exploratory analyses will be used to identify any obvious problems with the
data (e.g., incomplete data, or data that are inconsistent with the documentation). Any major
deviations that are identified in these comparisons will be investigated to locate and understand
the rationale for the deviations. If EPA determines that the outcome of these comparisons is not
acceptable, then the Battelle Work Assignment Leader and the EPA Work Assignment Manager
will work jointly to get information that will explain the unacceptable outcome and to determine
whether data are available that meet the requirements of this project.
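
The transfer check described above (comparing record counts, variable lists, and summary statistics between an original source and the study database) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure Battelle will use: the functions `summarize` and `compare_transfer`, the field names, and the example records are hypothetical.

```python
from statistics import mean


def summarize(records):
    """Return record count, variable list, and simple per-variable summaries."""
    variables = sorted({key for rec in records for key in rec})
    stats = {}
    for var in variables:
        values = [rec.get(var) for rec in records]
        numeric = [v for v in values
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        stats[var] = {
            "missing": sum(v is None for v in values),
            "min": min(numeric) if numeric else None,
            "max": max(numeric) if numeric else None,
            "mean": round(mean(numeric), 6) if numeric else None,
        }
    return {"n_records": len(records), "variables": variables, "stats": stats}


def compare_transfer(source, loaded):
    """List discrepancies between the source data and the loaded copy."""
    a, b = summarize(source), summarize(loaded)
    issues = []
    if a["n_records"] != b["n_records"]:
        issues.append(f"record count: source {a['n_records']}, loaded {b['n_records']}")
    if a["variables"] != b["variables"]:
        issues.append(f"variable lists differ: {a['variables']} vs {b['variables']}")
    for var in sorted(set(a["variables"]) & set(b["variables"])):
        if a["stats"][var] != b["stats"][var]:
            issues.append(f"summary statistics differ for {var}")
    return issues


# Made-up records for illustration; the second copy has lost one record.
source = [
    {"county": "A", "quarter": "2005Q1", "n_tested": 40, "n_elevated": 3},
    {"county": "A", "quarter": "2005Q2", "n_tested": 35, "n_elevated": 2},
    {"county": "B", "quarter": "2005Q1", "n_tested": 10, "n_elevated": 1},
]
loaded_ok = [dict(rec) for rec in source]
loaded_bad = loaded_ok[:2]

issues_ok = compare_transfer(source, loaded_ok)
issues_bad = compare_transfer(source, loaded_bad)
print(issues_ok)
for issue in issues_bad:
    print(issue)
```

An empty list indicates that the three comparisons agree; any entries would be the "major deviations" to investigate with the data source.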
6.9 RECONCILIATION WITH USER OBJECTIVES
The goals of this project are to (1) perform data management services, (2) conduct exploratory
and sophisticated longitudinal statistical analyses, and (3) develop a visualization tool for a pilot
study to develop a method of identifying geographic areas with children at high risk of lead
poisoning by combining environmental exposure, demographic characteristics, and
programmatic level of support information. All EPA objectives are captured in the Quality
Management Plan developed for this project by Battelle.

-------
Databases developed in Task 1 will be stored in a format agreed upon by the Battelle Work
Assignment Leader and the EPA Work Assignment Manager.
The specific form of statistical analyses to be performed will be identified and determined in
discussions between Battelle and EPA technical staff. When necessary, the objectives and
methods associated with these analyses will be documented in e-mail messages between Battelle
and EPA, prior to formal integration into the Statistical Analysis Plan. The Statistical Analysis
Plan will be amended as necessary throughout the performance of this project, to reflect any
modification to the statistical analysis approach dictated by the course of research and outcomes
from the statistical modeling efforts.

-------
7.0 REFERENCES
24 CFR Part 35, 40 CFR Part 745, Lead; Requirements for Disclosure of Known Lead-Based
Paint and/or Lead-Based Paint Hazards in Housing; Final Rule (3/6/1996). Accessed at
http://www.leadsafehomes.info/pdfs/all_titleten_fulltext_english.pdf#search=%22HUD%
201018%20Rule%22
40 CFR Part 745, Lead; Identification of Dangerous Levels of Lead; Final Rule (1/5/2001).
Accessed at http://www.epa.gov/fedrgstr/EPA-TOX/2001/January/Day-05/t84.pdf
CDC. 1997. Screening Young Children for Lead Poisoning: Guidance for State and Local Public
Health Officials, edited by U.S. Department of Health and Human Services. Atlanta, GA:
Public Health Services, CDC.
HUD. 1995. The Relation of Lead Contaminated House Dust and Blood-Lead Levels Among
Urban Children. Washington, DC: U.S. Department of Housing and Urban Development.
Lanphear, BP, TD Matte, J Rogers, RP Clickner, B Dietz, RL Bornschein, P Succop, KR
Mahaffey, S Dixon, W Galke, M Rabinowitz, M Farfel, C Rohde, J Schwartz, P Ashley,
and DE Jacobs. 1998. The contribution of lead-contaminated house dust and residential
soil to children's blood lead levels: A pooled analysis of 12 epidemiologic studies.
Environ Res 79 (1): 51-68.
Miranda, ML, DC Dolinoy, and MA Overstreet. 2002. Mapping for Prevention: GIS Models for
Directing Childhood Lead Poisoning Prevention Programs. Environmental Health
Perspectives 110 (9): 947-53.
Miranda, ML, JM Silva, MA Overstreet Galeano, JP Brown, DS Campbell, E Coley, CS
Cowan, D Harvell, J Lassiter, JL Parks, and W Sandele. 2005. Building Geographic
Information System Capacity in Local Health Departments: Lessons from a North
Carolina Project. Am J Public Health 95 (12): 2180-5.
Spivey, Angela. The Weight of Lead: Effects Add Up in Adults. Environmental Health
Perspectives, Volume 115, Number 1, January 2007.
Strauss, Warren, R Carroll, Steve Bortnick, John Menkedick, and B Schultz. 2001. Combining
Datasets to Predict the Effects of Regulation of Environmental Lead Exposure in Housing
Stock. Biometrics 57: 203-210.
Strauss, Warren, Ramzi Nahhas, Leanna House, Amy Kurokawa, and Bradley Skarpness. 2001.
Development of Models to Predict Risk of Childhood Lead Poisoning at the Census Tract
Level. Columbus, OH: Technical Report to the U.S. Centers for Disease Control and
Prevention under Contract No. 200-98-0102.
Strauss, Warren, Tim Pivetz, P Ashley, John Menkedick, E Slone, and S Cameron. 2006.
Evaluation of Lead Hazard Control Treatments in Four Massachusetts Communities
through Analysis of Blood-lead Surveillance Data. Environmental Research 99
(2): 214-223.
U.S. Department of Housing and Urban Development. September 15, 1999. Final Rule,
Requirements for Notification, Evaluation and Reduction of Lead-Based Paint Hazards in
Federally Owned Residential Property and Housing Receiving Federal Assistance.
Washington, DC: Federal Register, 50140-50231.

-------