Battelle

February 15, 2007

505 King Avenue
Columbus, Ohio 43201-2693
(614) 424-6424 Fax (614) 424-5263

Ms. Sineta Wooten
Project Officer
Program Assessment & Outreach Branch (7404)
OPPT, Room E827
U.S. Environmental Protection Agency
1200 Pennsylvania Avenue NW (7404T)
Washington, D.C. 20460

Dear Ms. Wooten:

Contract No. EP-W-04-021
Work Assignment 2-13
Final Quality Assurance Plan

Enclosed for your review is a revised version of the Quality Management Plan for Work Assignment 2-13 on Targeting Elevated Blood-Lead Levels in Children. This report provides information regarding the data sources that will be used in this work assignment, the statistical analysis methods that will be used to develop predictive models, and the quality assurance and quality control measures taken to ensure the quality and integrity of results. This is a deliverable under Amendment 4 of the subject work assignment.

If you have any technical questions about the plan, please contact me at 614/424-4547, or Warren Strauss at 614/424-4275.

Sincerely,

Bruce E. Buxton, Ph.D.
Vice President and Senior Program Manager
Statistics and Information Analysis

BEB/WS Inl

Enclosure

cc: Ms. Margaret Conomos, EPA WAM
Dr. Barry Nussbaum, EPA Technical Advisor
Mr. Ronald Morony, EPA Project Officer

-------

February 15, 2007

DRAFT QUALITY MANAGEMENT PLAN FOR THE TARGETING ELEVATED BLOOD-LEAD LEVELS IN CHILDREN PILOT STUDY

Prepared By
BATTELLE
505 King Avenue
Columbus, Ohio 43201

EPA Contract No. EP-W-04-021

Prepared For
Sineta Wooten, Project Officer
Margaret Conomos, Work Assignment Manager
Barry Nussbaum, Technical Advisor
Program Assessment and Outreach Branch
National Program Chemicals Division
Office of Pollution Prevention and Toxics
U.S. Environmental Protection Agency
1200 Pennsylvania Avenue NW (7404T)
Washington, D.C.
20460

Battelle PROPRIETARY, PRIVILEGED, AND CONFIDENTIAL INFORMATION TO BE USED ONLY FOR GOVERNMENT PURPOSES

-------

REVISED QUALITY MANAGEMENT PLAN FOR TARGETING ELEVATED BLOOD LEAD LEVELS IN CHILDREN PILOT STUDY

Version #1
February 15, 2007

Approval for Battelle
Principal Investigator: Warren Strauss
Project Manager: Bruce Buxton

Approval for U.S. Environmental Protection Agency
EPA Project Officer: Sineta Wooten    Date:
OPPT Quality Officer: John M. Dombrowski    Date:
EPA WAM: Margaret Conomos    Date:
EPA Technical Advisor: Barry Nussbaum    Date:

-------

Draft Quality Management Plan for the Targeting Elevated Blood-Lead Levels in Children Pilot Study
February 15, 2007

1.0 INTRODUCTION

Over the past 15 years, various childhood lead poisoning prevention programs (CLPPPs) throughout the U.S. have analyzed their screening data to develop "risk indices," or mathematical models for predicting the prevalence of childhood lead poisoning in different geographic areas within their regions of concern. These modeling efforts are generally intended both to characterize the prevalence of childhood lead poisoning within their geographic areas and to support the development of targeted screening and outreach plans in order to reach the 2010 goal of eliminating childhood lead poisoning throughout the U.S. [1, 2] To date, most modeling efforts for risk assessment and targeting have combined screening data with demographic variables available from the U.S. Census. Previous studies have combined childhood surveillance data (aggregated at the zip-code or census tract level) with demographic predictor variables from the census for the purposes of targeting geographic areas at higher risk of childhood lead poisoning (Miranda, Dolinoy, and Overstreet 2002; Miranda et al. 2005; Strauss, Nahhas, et al. 2001).
These studies have led to recommendations for using age of housing and percent of population below the poverty line to target neighborhoods that may be at increased risk for childhood lead poisoning (CDC 1997). Numerous studies have also documented the relationship between children's blood-lead concentrations and measures of lead in residential environmental media (dust, soil, air, water, and food) (HUD 1995; Lanphear et al. 1998; Strauss, Carroll, et al. 2001). These studies have contributed to EPA and HUD regulations and policies for identifying and reducing residential childhood lead exposures (24 CFR Part 35; 40 CFR Part 745; U.S. Department of Housing and Urban Development, September 15, 1999). Other studies have combined blood-lead surveillance data with programmatic information on housing units treated to determine the positive impact of housing-based intervention programs (Strauss et al. 2006).

The goal of this study is to explore models that predict the number of children at risk of elevated blood lead levels for a given geographic area based on a hierarchical combination of demographic, environmental, and programmatic information sources. While the models will be highly dependent on available data, it is expected that this study will provide a statistical methodology that combines each data source in an appropriate manner, adjusting for global and local trends over time. In doing so, the models will build upon concepts of hierarchical modeling and longitudinal data analysis.

1 http://www.cdc.gov/nceh/lead/about/program.htm
2 http://www.epa.gov/lead/pubs/fedstrategy2000.pdf

As EPA, CDC, and other federal and state agencies prepare to meet the 2010 goal of eliminating childhood lead poisoning, our pilot study of integrating several different types of data sources will hopefully improve the predictive power
of models that rely on a single information source. This will allow for more efficient targeting of those geographic areas that need the most help in eliminating childhood lead poisoning.

1.1 MODEL DEVELOPMENT

This pilot study seeks to develop models to predict the number of children at risk of elevated blood lead levels for a given geographic area based on a hierarchical combination of demographic, environmental, and programmatic information sources. Doing so requires looking at both the mechanisms of childhood lead risk assessment and control activities at the local level and broad trends across the U.S. The two main analysis goals correspond to developing predictive models at two different levels of geographic specificity:

1. Broad Coverage (Lower Resolution) Model. This type of model is intended to characterize broad trends over time in the prevalence of childhood lead poisoning at the county level across the entire U.S. This model will be based on quarterly county-level aggregated surveillance data from the CDC, augmented with environmental, demographic, and programmatic (level of financial spending) information.

2. High Resolution Model. This type of model represents the effort to assess the relative contribution of various exposure sources associated with elevated blood-lead concentrations within select communities. It reflects the concept that exposures that contribute to childhood lead poisoning are likely to be community specific. Although we recognize that there are non-local, non-environmental (including genetic susceptibility), and even non-domestic sources that can also contribute to lead poisoning, the focus of this study is on the local factors that can contribute to childhood lead poisoning (Spivey 2007). It is anticipated that this analysis goal will be met through modeling efforts of Los Angeles case management data and Massachusetts housing unit lead assessment and/or control
activities. These data sources will be augmented with other environmental and demographic information such as air monitoring, toxics release inventory, age of housing stock, poverty level, etc.

It is anticipated that synergies will be evident between the two models described above. While a formal combination of the two geographical models may not be accomplished during the timeframe of this pilot study, a better understanding of (1) the relative importance of various exposure sources in addition to leaded paint in housing, and (2) the geographic areas across the U.S. that remain at increased risk for childhood lead poisoning, will set the stage for doing so in the near future.

In order to meet the two main analysis goals above, a draft Statistical Analysis Plan is provided in Section 3. Battelle has worked with the EPA Work Assignment Manager (WAM) to develop a Statistical Analysis Plan that details the specific goals of the data analysis and the statistical methods that will be used to meet those goals, including the descriptive methods used in preparation for more sophisticated modeling efforts. Battelle will continue to work with the EPA WAM to amend and revise the plan, as necessary, and to accommodate additional data analysis goals and methods as the project evolves past the March 2, 2007 date for this particular work assignment. The Statistical Analysis Plan will become the basis for the final report.

1.2 DATA MANAGEMENT AND COMPUTING RESOURCES

The primary objective of this pilot study is to use combined information from different sources at various levels of geographic and temporal specificity to more accurately target geographic areas at high risk for not meeting the 2010 goal of eliminating childhood lead poisoning. As such, work on the study will require careful integration of a variety of data sources with various characteristics and
documentation. Data to support this study will be gathered from a variety of sources, including federal, state, and local lead poisoning prevention programs, as well as publicly available data that can be downloaded from the internet (e.g., Census data, EPA's Toxic Release Inventory, etc.). When Battelle first receives each data source, we will review the data and supporting documentation to gain knowledge of the structure, relationships, and quality of the data. Battelle database managers will work as appropriate with the project team (including collaborators providing data to the project, as well as EPA) to determine the final format for each database, the desired uses of the databases, and the requirements for maintaining the databases. Based on this information, Battelle will construct master databases for the national low-resolution model and for each of the high-resolution models (Los Angeles and Massachusetts) that integrate the various environmental, demographic, and programmatic variables and facilitate statistical analyses of the combined data.

Battelle is prepared to construct these databases by combining data sent by EPA to Battelle or acquired from national databases (U.S. Census, U.S. Geological Survey, etc.) in a variety of formats, including MS SQL Server, MS Access, Excel, ASCII, ArcView, and SAS® electronic databases. In order to combine the various data sets, they will be merged on key fields, including state, county, Census tract, and time period. The data being used for analyses of a particular geographic level, e.g., county, are comparable because they are representative of that geographic area.

Throughout the development process, Battelle will conduct checks for completeness on all study databases, and will work with data-sharing collaborators and EPA to attempt to fill in missing data as necessary to complete the proposed statistical analyses. Any changes to the databases (corrections, additions, deletions, etc.) will be fully documented in
appropriate meta-data files, and reported to EPA. As part of constructing and maintaining these databases, Battelle will develop appropriate documentation of the combined master databases.

Battelle has Standard Operating Procedures (SOPs) in place that will be followed to ensure the proper storage, backup, and retrieval of datasets created and analyzed for this study. Highlights of these SOPs are summarized below, and in the Quality Management Plan (Section 6 of this Quality Assurance Plan). The various databases are backed up to tape nightly via Battelle's automated backup routines, and will only be accessible to members of the proposed Battelle Project Team. CD-ROM backups also will be made on a regular basis to serve as a safeguard in case the backup system fails for any reason.

An additional important aspect of the proposed project, beyond the acquisition of the appropriate data sources, is the processing and analysis of data. Battelle has all the necessary computing software and hardware to accomplish the scope of work within this study. A brief overview of Battelle's statistical and computing software and hardware is given here.

Every technical staff member in Battelle's Statistics and Information Analysis Product Line has the latest version of SAS installed on their Windows® PC. The base system and procedure modules, Stat and Graph, are standard on each computer. Supplementary modules, including IML, FSP, QC, Insight, and AF, are available to all staff as needed. Generally, the SAS System is used to translate and prepare the data for statistical or mathematical analyses. Data sets can be converted from their native format to permanent SAS datasets, or data can be retrieved electronically using SAS tools for dynamic data exchange (DDE) or Open DataBase Connectivity (ODBC) functionality. Our team is familiar with the advantages and disadvantages of each
of these data retrieval approaches as well as the SAS code necessary to accomplish such tasks. In addition to SAS, we have available S-Plus, C++, and nearly every other major software product for developing and applying statistical routines.

Our technical staff have extensive knowledge of, and access to, SQL and its various dialects, including Access SQL, Transact SQL (Microsoft SQL Server), and SQL*Plus (Oracle). In addition to SAS and SQL, Battelle staff have several other supporting software tools that provide diverse database capabilities, including Geographic Information Systems (GIS) such as ArcView and ArcInfo, as well as worldwide web delivery capabilities.

Battelle staff can access and update data in most major systems, ranging from Access and Paradox to SQL Server and Oracle. One installation of SQL Server is maintained and used within the group. Our Information Management department maintains other installations of SQL Server. We are well-versed in converting data to and from various formats, including delimited text, dBase, MS Access, Excel, FoxPro, Paradox, all versions of SAS, and S-Plus. We normally use third-party tools to handle data conversion. For unusual types of conversions, we have written utilities in MS Access, Visual Basic, C#, and Delphi. Our proposed team is familiar with the AMP/350, AMP/370, and R-2 AIRS data formats, having used these data on several projects.

2.0 DESCRIPTION OF DATA

The two main goals of the statistical analysis are to develop a predictive model(s) to implement targeted interventions based on a better understanding of (1) the relative importance of various exposure sources in addition to leaded paint in housing and (2) the geographic areas across the U.S. that remain at increased risk for childhood lead poisoning. In doing so, blood-lead data will be combined with various environmental, demographic and
programmatic datasets at different levels of geographic specificity and coverage. The details associated with each of these data sources, as well as their anticipated inclusion in a high or low geographic resolution model, are provided below.

2.1 RESPONSE VARIABLE: MEASURES OF BLOOD-LEAD CONCENTRATION OF CHILDREN

The statistical models are based upon blood-lead levels of children corresponding to the various geographic areas studied. It is anticipated that the CDC Lead Poisoning Prevention Branch will provide quarterly summary data from their National Surveillance database for children aged 6-36 months. These summary measures will include the number of children screened, the number of children who exceeded certain blood-lead thresholds, and potentially a continuous summary measure (geometric mean blood-lead concentration) for state/local grantees with a history of universal screening and reporting. Careful attention will be paid to ensure that children with multiple testing results over time are represented appropriately in the analysis dataset. Our intention is to have the models reflect the annual prevalence of childhood lead poisoning over time. As such, Battelle will develop an appropriate algorithm to select representative screening test(s) for children with multiple results, with the objective of having each child represented in the analysis dataset at most once a year.

Blood-lead surveillance data from Massachusetts and Los Angeles will be provided as specific testing results for individual children (with confidential identification information excluded). The Massachusetts blood-lead surveillance data will represent all children aged 6-36 months tested, while the individual data records from Los Angeles will likely only represent children with elevated blood-lead concentrations. Aggregate summary measures (at the census tract level) from the State of California will be used to establish the denominator of children tested in each quarter in the City of Los Angeles,
so that aggregate summary measures, similar to those described above for the National Model, may be constructed to support the Los Angeles-specific data analysis goals. Similar aggregate measures may also be constructed for the Massachusetts data, to ensure consistency and comparability of results.

Due to selection bias, we expect that the CDC National Surveillance dataset as well as the local data sources in Massachusetts and Los Angeles may show higher proportions of elevated blood-lead concentrations than found in the general population. For this reason, we will compare the proportion of children with elevated blood-lead concentrations, as well as the distribution of the potential continuous summary measure, with those reported by the most recent six years of available CDC National Health and Nutrition Examination Survey (NHANES) data. If the Surveillance data differ greatly from the NHANES data, we may recommend methods for calibrating the Surveillance data to better match the national distribution of childhood blood-lead concentrations as appropriate (Strauss, 2001) in subsequent work efforts.

2.2 EXPLANATORY VARIABLES

The following subsections describe the various explanatory variables that will be used in the models developed to predict risk of childhood lead poisoning at the national and local levels.

2.2.1 Demographic Data from the U.S. Census

Demographic information from the 2000 U.S. Census will be used in both high and low resolution models, with data being acquired at the county level for the entire nation and at the census tract level for Massachusetts and Los Angeles. The data gathered by the census bureau include over 1000 variables. To narrow the scope of the project, we will explore 50 variables within 10 general categories, most of which had been previously used by Battelle in a CDC-sponsored study to predict risk of elevated blood-lead concentrations at the census tract level (Strauss, 2001). In many cases, these variables are constructed from counts or summary statistics published in the detailed U.S. Census Tables. For example, within each geographic area, the census provides the number of houses that were built before 1950 and the median income of all households. However, in order for us to draw comparisons from tract to tract and/or county to county, the census variables need to be transformed in a manner that depends upon the format of the variable. For example, count variables, such as the number of housing units built before 1950, are changed to percentages as shown below.

\[
\frac{\text{Number of Houses Built Before 1950}}{\text{Total Number of Houses Built in the Tract}} = \text{Percent of Houses Built Before 1950}
\]

Table 1 supplies a list of the variables placed under investigation and the denominators used for the percentage calculation. Summary-statistic variables describing income, on the other hand, may be standardized within state to adjust for between-state differences in the cost of living. Table 1 also notes which variables were standardized.

2.2.1.1 Census Variable Descriptions

Density
Since both counties and census tracts vary with respect to spatial area and population, and previous work suggests that risk of childhood lead poisoning differs between rural and urban areas, we will use a population density variable as a potential explanatory variable or effect modifier in the statistical
models. We will investigate population density in two ways. The first divides the number of people within the tract by the amount of land area measured in 0.001 square kilometers. The second divides the number of housing units by the amount of land area measured in 0.001 square kilometers. Housing units include the following: a house, apartment, mobile home, group of rooms, or single room that is occupied as separate living quarters.

Race
The census bureau presents 5 general race groups: (1) white, (2) black, (3) Indian, Eskimo, and Aleut, (4) Asian Pacific, and (5) Other. We will create one additional variable relating to race that describes the number of people that consider themselves neither white nor black.

Age
The census bureau does not report the age of people directly. Instead, the agency reports the total number of people that fall into various age categories. The variable "Pct le 6 years" will be created to identify the number of children within each geographic area less than or equal to 6 years of age at the time of the 2000 Census. Additionally, we will calculate the median age of the total population and of those less than or equal to 6 years old by taking a weighted average of the midpoint of each age category (the counts are used as the weights).

Family Structure
The census bureau does not supply a unique variable that indicates the number of single parent households within a tract. Therefore, this variable will be created by combining census variables as follows:

M = Number of households with a male householder (no wife present) whose own children are under 18 years old
F = Number of households with a female householder (no husband present) whose own children are under 18 years old
T = M + F + Number of married couples with own children under 18 years

\[
\text{Percent of Single Parent Households} = \frac{M + F}{T}
\]

Education
Four variables pertaining to the
proportion of adults with various levels of education will be created as follows:

L9 = Number of people older than 18 that have less than a 9th grade education
L12 = Number of people older than 18 that have 9th through 12th grade experience, but do not have a high school diploma
12 = Number of people older than 18 that obtained a high school diploma or GED
C = Number of people older than 18 that have some college experience but did not receive a college degree
T = Number of people that are older than 18 years

\[
\text{Percent less than 9th grade} = \frac{\text{L9}}{\text{T}}
\qquad
\text{Percent no HS degree} = \frac{\text{L9} + \text{L12}}{\text{T}}
\]

\[
\text{Percent no college experience} = \frac{\text{L9} + \text{L12} + \text{12}}{\text{T}}
\qquad
\text{Percent no college degree} = \frac{\text{L9} + \text{L12} + \text{12} + \text{C}}{\text{T}}
\]

Income
We will calculate median income per household, family, and person. Additionally, we will investigate the proportion of households that do not receive any wages, do not receive any earnings, and do receive public assistance. The Census defines earnings and wages as follows. "Earnings" represent the amount of income received regularly before deductions for personal income taxes, Social Security, bond purchases, union dues, Medicare deductions, etc. "Wages" include total money earnings received for work performed as an employee during the calendar year 1999. This includes wages, salary, Armed Forces pay, commissions, tips, piece-rate payments, and cash bonuses earned before deductions were made for taxes, bonds, pensions, union dues, etc.

Poverty Level
Similar to the income variables described above, we will summarize the poverty level of individuals and of families as a whole within each geographic area by creating the variables Percent Persons and Percent Families Below the Poverty Level. In order to focus on the poverty level of the children within each tract, however, we will create the variables Percent Persons Five Years and Under and Percent Families with Children Under Five
Years Below Poverty Level. Note that the denominator changes in calculating the percentage for each of these variables.

Vacant
The remaining variables targeted for exploration within our analyses characterize the housing stock. For example, the percent of housing units that are vacant potentially indicates the level of care taken to maintain buildings within the tract. Buildings that are not occupied are more likely to accumulate dust or debris to which the children of the tract may be exposed upon reoccupancy.

Housing Year Built and Occupied Housing Year Built
During the 1950's the United States started to become aware of the consequences associated with exposure to lead in paint, and the use of lead paint within homes began to decrease. In 1977 the use of lead paint in homes became illegal. The years during which the housing units within each tract were built are therefore important to characterize, since older homes are more likely to contain lead paint than newer homes. Furthermore, occupied housing units are more likely to have lead paint removed than vacant homes. Thus, both the years during which occupied housing units were built and the years during which all housing units were built will be investigated within this report.

Rent and Value
The percent of occupied housing units that are rented, rather than owned, is calculated by dividing the number of rented occupied housing units within the tract by the total number of occupied housing units. Variables constructed to characterize the amount of rent paid and the value of all housing units (owned and rented) will be standardized to account for state-to-state differences in the cost of living. The median rent variable represents the median across an entire geographic area (e.g., county or Census tract). As part of future work, geographic areas could be categorized as either metropolitan or rural and grouped at broader geographic levels such as EPA region. A median of the broader area could be calculated, and then a difference from the regional median calculated for each specific county or Census tract. Note that this methodology could be applied to income-related variables as well.

Table 1. Initial Variables for Analysis Created From the 2000 Census

Variable Group | Index | Census Variable* | Format | Calculation (Denominator) | Analyzed Variable
Density | 1 | Persons | Count | Land Area (Units = 0.001 km2) | Population Density
Density | 2 | Housing units | Count | Land Area (Units = 0.001 km2) | Housing Density
Race | 3 | White population | Count | Persons | Pct White
Race | 4 | Black population | Count | Persons | Pct Black
Race | 5 | Indian, Eskimo, and Aleut population | Count | Persons | Pct Indian, Eskimo, Aleut
Race | 6 | Asian Pacific population | Count | Persons | Pct Asian Pacific
Race | 7 | Other Race population | Count | Persons | Pct Other Race
Race | 8 | All races excluding white and black* = #5 + #6 + #7 | Count | Persons | Pct Non White and Non Black
Age | 9 | Children Less than or Equal to Six Years Old | Count | Persons | Pct le 6 years
Age | 10 | Median Age* | Statistic | | Median age of persons
Age | 11 | Median Age of Children Less than or Equal to Six Years Old* | Statistic | | Median age of persons le 6 years
Family Structure | 12 | Single Parent* = Single Male with Children + Single Female with Children | Count | Household with Children Less than or equal to 18 years old = Married Couple with children + Single Male with Children + Single Female with Children | Pct Single Parent
Education | 13 | Less than a 9th grade Education | Count | Persons 18 years old and over | Pct less than 9th grade
Education | 14 | Less than high school* = #13 + persons with 9th to 12th grade education without obtaining a high school diploma | Count | Persons 18 years old and over | Pct no HS degree
Education | 15 | Less than college* = #14 + Persons with high school diploma, but no college experience | Count | Persons 18 years old and over | Pct no college
Education | 16 | Less than college degree* = #15 + Persons that attended college without obtaining a college diploma | Count | Persons 18 years old and over | Pct no college degree
Income | 17 | Household Median Income | Statistic | | Standardized Median Income for Households
Income | 18 | Family Median Income | Statistic | | Standardized Median Income of Families
Income | 19 | Per Capita Income | Statistic | | Standardized per capita income of persons
Income | 26 | Households without earnings | Count | Households | Pct No Earnings
Income | 27 | Households without wages | Count | Households | Pct No Wage or Salary
Income | 29 | Households that obtain public assistance | Count | Households | Pct With Public Assistance
Poverty Level | 30 | Persons below poverty level | Count | Persons for whom poverty status is determined | Pct Persons Below Poverty
Poverty Level | 31 | Persons who are less than or equal to five years old that are below poverty level* | Count | Persons who are less than or equal to five years old for whom poverty status is determined | Pct Persons Below Poverty of Age LE 5
Poverty Level | 32 | Families with total income below the poverty level | Count | Families | Pct Families Below Poverty
Poverty Level | 33 | Families with total income below the poverty level that have children under five years old | Count | Families with children under five years old | Pct Poverty of Families with Children LT 5
Housing Units | 34 | Vacant | Count | Housing Units | Pct Vacant
Housing Units | 35 | Housing Units Built before 1940 | Count | Housing Units | Pct Pre 1940 Housing
Housing Units | 36 | Housing Units Built before 1950 | Count | Housing Units | Pct Pre 1950 Housing
Housing Units | 37 | Housing Units Built before 1960 | Count | Housing Units | Pct Pre 1960 Housing
Housing Units | 38 | Housing Units Built before 1970 | Count | Housing Units | Pct Pre 1970 Housing
Housing Units | 39 | Housing Units Built before 1980 | Count | Housing Units | Pct Pre 1980 Housing
Housing Units | 40 | Median Year that Housing Units were Built | Statistic | | Median Year Built
Housing Units | 41 | Median Year that Housing Units were Built - Calculated by Battelle | Statistic | | Calculated Median Year Built
Occupied Housing Units | 42 | Housing Units that are rented | Count | Occupied Housing Units | Pct Renter Occupied
Occupied Housing Units | 43 | Occupied Housing Units Built before 1940 | Count | Occupied Housing Units | Pct Pre 1940 Occupied Housing
Occupied Housing Units | 44 | Occupied Housing Units Built before 1950 | Count | Occupied Housing Units | Pct Pre 1950 Occupied Housing
Occupied Housing Units | 45 | Occupied Housing Units Built before 1960 | Count | Occupied Housing Units | Pct Pre 1960 Occupied Housing
Occupied Housing Units | 46 | Occupied Housing Units Built before 1970 | Count | Occupied Housing Units | Pct Pre 1970 Occupied Housing
Occupied Housing Units | 47 | Occupied Housing Units Built before 1980 | Count | Occupied Housing Units | Pct Pre 1980 Occupied Housing
Occupied Housing Units | 48 | Median Year that Occupied Housing Units were Built | Statistic | | Median Year Built - Occupied Only
Housing Value | 49 | Median Rent | Statistic | | Standardized Median Gross Rent
Housing Value | 50 | Value of Owner Occupied Housing Units | Statistic | | Standardized Median Housing Unit Value

* Variables that were created by combining different pieces of information from the 2000 Census.

2.2.2 Environmental Data Sources

Environmental data acquired for this project will include air and groundwater monitoring data aggregated at the county level for the low resolution model and at higher resolutions when possible for the LA and MA analyses. In cases where data are available for only a limited number of air-monitoring stations or drinking water samples in the region(s) being investigated, geospatial modeling techniques might be used as appropriate to develop predictions across the entire region. Existence of industrial sources of lead within each county, as indicated by the toxics release inventory, will also be included as an environmental data source. Each of these data sources is discussed in further detail below.

Given
the time constraints on gathering these data and conducting the analyses, a reasonable amount of environmental data were obtained and integrated into the analyses; however, it is likely that other relevant environmental data sources are available and could be incorporated into these analyses in the future. For example, while modeled air lead data are being used as part of the initial analyses because of their availability at the county level, available air monitoring data could be modeled to interpolate values for all geographic areas of interest and included in the analyses.

As with the other variable types, a child is linked to a particular geographic area based on their address at the time of the blood test used in the analysis. The child is then associated with the various environmental, demographic, and other data for that geographic area. Certainly there are cases where children move or spend significant time in other geographic areas, and these will not be captured by this analysis. If a child spends a significant amount of time in other locations outside the home but within the same county or Census tract, then all exposure data associated with the home are relevant for those other locations. It is our assessment that this movement is less of a problem in the national analyses, where county-to-county moves are less frequent, than in the local analyses, where movement between Census tracts is more frequent.

2.2.2.1 Concentrations of Lead in Air

EPA maintains a number of ongoing air monitoring programs that collect data over time on concentrations of various criteria air pollutants, air toxics, constituents of particulate matter, and other airborne chemicals. Each of these monitoring programs has multiple air monitoring stations that are deployed throughout the country to meet various goals associated with the Clean Air Act and other federal and
state regulations and programs. For example, some of the monitoring stations are placed in close proximity to industrial sources of pollution and major population centers, while other stations are placed in remote areas to assess background chemical concentrations. While many of these monitoring sites provide information on the concentration of lead in air over time, a quick assessment of the spatial coverage of these monitoring networks suggested that making use of these data would be problematic for this study due to time and resource constraints. Lead concentrations in air from the monitoring networks are not available in the majority of counties that will be covered in the low-resolution model, or in the census tracts that will be covered in the high-resolution models, as shown at the following EPA website (http://www.epa.gov/airtrends/lead.html).

Instead of utilizing air monitoring data as described above, we intend to use modeled predictions of concentrations of lead in air from EPA's 1999 National-Scale Air Toxics Assessment, in which county and census tract level predictions are available throughout the entire country based on the use of predictive models. Documentation for the 1999 National-Scale Air Toxics Assessment, as well as the predicted air concentration data, can be found at http://www.epa.gov/ttn/atw/nata1999/tables.html.

The predictions were generated using the Assessment System for Population Exposure Nationwide, or ASPEN. This model is based on the EPA's Industrial Source Complex Long Term model (ISCLT), which simulates the behavior of the pollutants after they are emitted into the atmosphere. ASPEN uses estimates of toxic air pollutant emissions and meteorological data from National Weather Service stations to estimate air toxics concentrations nationwide. The ASPEN model takes into account important determinants of pollutant concentrations, such as:

• rate of release
• location of release
• the height from which the pollutants are released
• wind speeds and directions from the meteorological stations nearest to the release
• breakdown of the pollutants in the atmosphere after being released (i.e., reactive decay)
• settling of pollutants out of the atmosphere (i.e., deposition)
• transformation of one pollutant into another (i.e., secondary formation)

The model estimates toxic air pollutant concentrations for every census tract in the continental United States; however, these data are only available for 1999.

2.2.2.2 Toxics Release Inventory

EPA's Toxics Release Inventory catalogs various sources of lead, based on information provided by industrial facilities. This data source is being used to generate county-level estimates across the Nation (and census tract level estimates within Massachusetts and Los Angeles) of the total amount of lead and/or lead-containing compounds released by industrial facilities into the environment via air, surface water, or underground injection. Although the ASPEN modeling results described above are based on the (airborne) emissions data and how they would theoretically translate into average ambient air-lead concentrations at the county and census tract levels, the data from the Toxics Release Inventory that are being managed for this project are available for multiple years and for other types of emissions (such as surface water). Thus, this information may add predictive power to the models.

2.2.2.3 Water Quality Data

The plumbing system inside a home and from the street to the home may contribute to drinking water contamination. To address this potential source, EPA obtained data from its Drinking Water Information System that include tap water lead levels for public water systems. Public water treatment plants are monitored over a period of years via random sampling of tap water within their jurisdictions. The length of monitoring varies depending on the water
quality, i.e., systems found to have higher water lead levels undergo monitoring for longer periods of time. Data available from this monitoring program include 90th percentile water lead values.

2.2.3 Programmatic Data Sources

Most of the explanatory variables being explored in this project are considered risk factors for childhood lead poisoning. We will also attempt to characterize a number of factors that might mitigate these risks. We anticipate that the level and characteristics of programmatic support from federal, state, or local sponsors will contribute towards meaningful reductions in the prevalence of childhood lead poisoning. The level of financial support available within each county will serve as a proxy for programmatic support in the low-resolution (National) models. In the high-resolution models, the various characteristics of the programs (information from housing inspections and case management services) will be explored within the statistical models. The following sections detail the specific characteristics of the variables used within the models.

2.2.3.1 Level of Support Within County - Financial

The goal of this variable is to construct a longitudinal history of current and cumulative per-capita dollars allocated to each county to combat childhood lead poisoning. For the National (low-resolution) Model, we will concentrate on Federal funding at the state and local levels. Battelle has made contact with technical staff at HUD's Office of Healthy Homes and Lead Hazard Control to gain access to their data on grants funded since the inception of the Lead-Based Paint Hazard Control Grant Program in 1992. In addition, data were obtained from CDC's Lead Poisoning Prevention Branch on their program's grant funding. The EPA WAM is taking responsibility for gaining access to similar data on EPA funding of childhood lead poisoning prevention programs.
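As an illustration of the "current and cumulative per-capita dollars" history described in Section 2.2.3.1, the following is a minimal sketch, not the project's actual procedure. The function name `per_capita_history`, the `(county, period, dollars)` record layout, and the use of a single fixed population count per county are all assumptions made for this example; the plan does not specify how grant dollars are allocated to counties or which population denominator is used.

```python
from collections import defaultdict


def per_capita_history(grants, population, periods):
    """Return {county: [(period, current, cumulative), ...]} in per-capita dollars.

    grants     : iterable of (county, period, dollars) records (hypothetical layout)
    population : {county: population count}, held fixed over time (an assumption)
    periods    : ordered list of period labels, e.g. calendar quarters
    """
    # Sum all grant dollars awarded to a county within the same period.
    dollars = defaultdict(float)
    for county, period, amount in grants:
        dollars[(county, period)] += amount

    history = {}
    for county, pop in population.items():
        running = 0.0
        rows = []
        for period in periods:
            current = dollars.get((county, period), 0.0) / pop
            running += current  # cumulative per-capita dollars to date
            rows.append((period, current, running))
        history[county] = rows
    return history


# Illustrative records only; these are not real funding figures.
example_grants = [
    ("County A", "1992Q1", 1000.0),
    ("County A", "1992Q1", 500.0),
    ("County A", "1992Q3", 300.0),
]
example = per_capita_history(
    example_grants, {"County A": 100}, ["1992Q1", "1992Q2", "1992Q3"]
)
```

Periods with no award carry the cumulative value forward unchanged, which is what lets the cumulative series serve as a longitudinal predictor alongside the current-period value.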
For the high-resolution model in the Commonwealth of Massachusetts, we will also have access to information on within-state funding. To date, the only programmatic funding information that we have been able to access reflects state and HUD funding of housing-based interventions in the Commonwealth of Massachusetts. Battelle will continue to work with its contact at HUD to gain access to this information; however, information not acquired by the middle of January 2007 is unlikely to be integrated into the analyses.

2.2.3.2 Risk Assessment and Control Within Housing Units (Massachusetts)

The Commonwealth of Massachusetts maintains an extensive database on all lead-based paint inspections conducted over time (dating back to the early 1990s). The Massachusetts Department of Public Health is providing a database that contains a single record for each inspection, with the following information: housing-unit ID, census tract, date of inspection, and result of inspection (whether the housing unit was found to be in compliance with Massachusetts standards). The database contains records on over 200,000 housing units, with many housing units having multiple inspections over time. Note that for units with multiple records, we anticipate identifying time periods in which the units were both in and out of compliance with the Massachusetts standards.

These data will be used in the Massachusetts high-resolution models in two ways. First, we will attempt to develop a longitudinal summary measure of the proportion of housing within each census tract that is known to be in compliance with the Massachusetts standards. We anticipate that within a census tract, as this proportion increases over time, the risk of childhood lead poisoning will decrease. Second, because we have access to individual blood-lead records from Massachusetts with linkable housing-unit identification variables, we can determine whether a housing unit was in compliance at the time of the
blood-lead test for each child in the database (with potential outcomes of the determination being yes, no, and unknown). The first approach described above is consistent with the methods for exploring aggregated summary blood-lead information over time within each census tract (which is the primary analysis focus within this work assignment). The second approach allows us to utilize some predictive information at the individual child level. This information may help improve prediction, and also help our analysis team assess what information might be lost when transitioning from individual-level data to aggregate summary data in the analyses.

2.2.3.3 Case-Management Services (Los Angeles)

The Los Angeles (LA) data consist of detailed records on approximately 700 children who received case management services in response to an elevated blood-lead concentration. These records include an assessment by the case manager of the likely lead exposure sources that the elevated blood-lead (EBL) child encountered, including residential sources of lead-based paint; parental occupation and hobby-related sources of exposure; use of glazed ceramic pottery, home remedies, and cosmetics suspected of containing lead; and activity patterns of the child that might contribute towards lead poisoning. Some of these children may have repeated tests, and it is unclear whether the City of Los Angeles will make these repeat tests available to EPA for this project.

In addition to the data on EBL children who received case management services in Los Angeles, EPA is working with CDC to try to arrange for aggregate summary information at the census tract level on all children tested in the City of Los Angeles over the same period of time. These data are controlled by the State of California Department of Health, and it is unclear whether data transfer can be arranged in
time to be utilized for this project. Combining these two information sources will provide information on the percentage of elevated blood-lead children who receive case management services. The case management data will be useful for determining which sources of lead exposure are common among those receiving follow-up, and perhaps which exposure sources are associated with the highest EBL cases.

3.0 STATISTICAL ANALYSIS

Battelle's general approach to data analysis involves translating study hypotheses into statistical models, identifying potential confounders and effect modifiers, and considering how the underlying assumptions of proposed statistical methods might affect their application. Our statisticians are highly experienced in applying a wide variety of statistical modeling techniques, including linear and non-linear regression modeling, analysis of variance techniques, generalized linear models including logistic regression models, mixed models, random effects, generalized estimating equations approaches for correlated data (including correlated binary or Poisson-distributed data), and hierarchical Bayesian modeling. The following text provides our technical approach for conducting the descriptive analyses and statistical modeling work associated with this task.

Descriptive analysis.
Battelle starts every analysis with an assessment of the study sample, i.e., the proportion of counties and census tracts in the sample with complete data for both the response variable and the explanatory variables. In preparation for developing longitudinal statistical models, we then generate univariate summaries of each variable as a function of time, and make comparisons of these distributions using side-by-side box plots for continuous data or bar charts for categorical data. This helps verify that the data are clean and ready for analysis, and identify cells with sparse data. Such descriptive analyses will be conducted on each database to characterize the distributions of all observed variables, using frequency distributions for categorical variables and simple summary statistics (mean, median, mode, minimum, maximum, and select percentiles) for continuous variables. Distributional assumptions may also be explored for certain variables, as appropriate, in preparation for more sophisticated models. For example, some environmental concentration data may depart from normality and follow a log-normal distribution. In these cases, we may additionally report the geometric mean and geometric standard deviation as part of the simple descriptive summary.

Battelle is prepared to create and explore new variables (as appropriate functions of other variables in the study databases) as part of the descriptive analyses, for later use in statistical modeling. These variables will be added to the study databases, as appropriate, and will be integrated in all appropriate files and documentation (metadata and data dictionary).

The univariate descriptions are then followed by fitting a series of cross-sectional bivariate relationships between the blood-lead response variable(s) and each candidate explanatory variable. These cross-sectional relationships will be explored as a function of time to better understand the stability of these relationships, and whether they change over time, so
that they can be modeled appropriately in the more sophisticated longitudinal analyses. These analyses will also help identify which explanatory variables are most predictive of the blood-lead response variable.

In preparation for more sophisticated statistical analyses, such as the Generalized Linear Mixed Logistic Regression Model outlined below, we will also perform relevant stratified analyses to investigate interactions discussed in the data analysis plan. For example, we may investigate the population density variable in this manner, as density may serve as a surrogate to differentiate between rural and urban geographic areas in the analyses, and exposure variables may be different in these types of areas. Similarly, we can use EPA regions as a potential stratification variable. If we do not observe variation in the measure of effect (e.g., odds ratios) across the levels of a third variable, we can probably treat that third variable as a potential confounder in the multivariate model, rather than as an effect modifier. If the odds ratios differ markedly (e.g., the effect appears to be protective in one subgroup and hazardous in another subgroup), we must consider the third variable as an effect modifier.

Statistical Models for Analysis Goal #1 (Broad Coverage - Lower Resolution Model)

This model will be used to characterize broad trends over time in the prevalence of childhood lead poisoning across the entire U.S. The various surveillance, environmental source, demographic, and programmatic support data will be aggregated to the county level for those localities with universal screening and reporting. Data will be longitudinal in nature, with time intervals of one quarter of a year. For the purposes of discussion, we will assume that the modeling approach will focus on a logistic regression model for the proportion of
children that have elevated blood-lead concentrations (>10 µg/dL). The temporal nature of declining childhood lead poisoning will be addressed via classic concepts of longitudinal data modeling of the low-resolution data. Let

• $Y_{ij}$ represent the number of children that were detected with blood-lead concentration above 10 µg/dL from the $i$th county and $j$th point in time (quarter),
• $n_{ij}$ represent the number of children that had their blood-lead concentration tested from within the $i$th county and $j$th point in time (quarter) (please note that we expect that $n_{ij} \le N_{ij}$, where $N_{ij}$ represents the total population of children in the $i$th county and $j$th point in time),
• $t_{ij}$ represent time (in years) corresponding to the $Y_{ij}$ response variable, and
• $X_{ij}$ represent a series of predictor variables associated with the $Y_{ij}$ response variable. These predictor variables will likely represent air monitoring data, groundwater data, census demographic data, programmatic data on federal financial support for lead poisoning prevention, and other related information as detailed above that can potentially help predict the prevalence of lead poisoning at the county level.

We introduce the following as a potential baseline model:

$$\operatorname{logit}\left(E\left[\frac{Y_{ij}}{n_{ij}}\right]\right) = \beta_0 + \beta_1 t_{ij} + \beta_2 X_{ij} + \delta_{0i} + \delta_{1i} t_{ij}$$

where the beta parameters ($\beta$) represent a vector of fixed effects, and the delta parameters ($\delta$) represent random effects that allow each county to have its own trend over time. In this model it can be assumed that $\delta_{0i}$ and $\delta_{1i}$ jointly follow a multivariate normal distribution with mean zero and covariance matrix

$$\Sigma = \begin{pmatrix} \sigma^2_{11} & \sigma^2_{12} \\ \sigma^2_{21} & \sigma^2_{22} \end{pmatrix}.$$

More complex assumptions or parameters may be required to help integrate the complex spatial/temporal correlation.

Counties with larger $\delta_{1i}$ parameter estimates represent areas where lead poisoning has not significantly decreased over time. Similarly, the parameter estimates can be used to identify those counties
with the highest predicted prevalence of childhood lead poisoning at various time points in the future.

The product of this effort will be a time series of maps (or a movie) that spatially interpolates risk of childhood lead poisoning as a function of various appropriate predictor variables, as well as full documentation of the spatio-temporal model developed to meet this analysis goal. Battelle will also spend time and resources on this work assignment modifying an existing visualization tool for specific application to this project. The resulting executable that will be delivered to EPA will allow users to interact with the modeling results at different levels of temporal and geographic specificity. The tool will allow the user to select an appropriate response variable (e.g., proportion of children with blood-lead concentrations above 5 µg/dL) and play a movie that displays a time series of maps showing how the predicted (or observed) risk changes over time. The user will also be able to zoom in on a rectangular area to see these results with a higher degree of geographic specificity. The user will also be able to stop the movie (or rewind or fast-forward) to isolate specific points in time.
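The predicted risk that such maps display at each time point can be obtained by inverting the logit link of the baseline model. The sketch below is illustrative only: it assumes a linear predictor made up of a fixed intercept, a fixed time slope, covariate effects, and a county-specific random intercept and random time slope. The function names and all numeric values are hypothetical and are not estimates from this study.

```python
import math


def predicted_prevalence(beta0, beta1, beta2, t, x, delta0=0.0, delta1=0.0):
    """Predicted proportion of tested children above the blood-lead threshold.

    beta0, beta1   : fixed intercept and fixed time slope
    beta2, x       : covariate coefficients and covariate values (same length)
    delta0, delta1 : county-specific random intercept and random time slope
    """
    if len(beta2) != len(x):
        raise ValueError("beta2 and x must have the same length")
    eta = beta0 + beta1 * t + sum(b * v for b, v in zip(beta2, x))
    eta += delta0 + delta1 * t
    return 1.0 / (1.0 + math.exp(-eta))  # inverse logit


def trajectory(times, **kwargs):
    """Predicted prevalence at each time point, holding covariates fixed."""
    return [predicted_prevalence(t=t, **kwargs) for t in times]


# Hypothetical parameter values, for illustration only.
fixed = dict(beta0=-2.0, beta1=-0.3, beta2=[0.8], x=[0.5])
typical = trajectory([0, 1, 2, 3], **fixed)
slow_decline = trajectory([0, 1, 2, 3], delta1=0.25, **fixed)
```

Under these assumed values, the county with the positive random slope starts at the same predicted prevalence as the typical county but declines more slowly, which is the pattern a large county-level slope estimate is meant to flag.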
By using the mouse, the user will also be able to select a specific county, and the tool will then display the observed and predicted data for that particular county in a separate window. Although data maps have potential for misinterpretation and may present some political sensitivities, they are beneficial because they provide a visual presentation of the results that can be relatively easily comprehended by users at different technical levels. If time and resources permit, the tool will also include some useful querying functionality (such as being able to isolate the top 'n' counties with respect to predicted risk of childhood lead poisoning, or rate of decline in prevalence). The visualization tool will be written in C++, and will be built in a manner that will allow EPA to modify the model and Battelle to quickly import the resulting data from a modification into the tool.

The Battelle Team will fully document all analyses performed on the study datasets. This will include keeping log files for each analysis run and providing useful comments in each analysis program that describe exactly what is being done in each step. These efforts will ensure that, in the future, EPA staff could replicate any of our analyses and achieve identical results. Besides our programmers and analysts being able to write programs to perform innovative analyses of any degree of complexity using existing statistical software packages (e.g., SAS, S+, ArcView, SUDAAN, etc.), we have highly qualified staff who can create custom software using many programming languages (e.g., Visual Basic, Access, C++, FORTRAN, etc.) to perform specialized data management and analysis tasks.

Statistical Models for Analysis Goal #2 (High Resolution Models)

High-resolution models will be utilized to identify the relative contribution of various types of exposure sources to elevated risk for childhood lead poisoning within a select community. These types of sources include housing factors, broader
environmental exposure, demographic composition, and programmatic resources. While this type of model pays homage to the concept that exposures contributing to childhood lead poisoning are likely community-specific, analysis of the high-resolution models may have certain limitations, including selection bias and limited generalizability to other geographic areas. Consequently, analysis of high-resolution data may be primarily descriptive in nature. These descriptions, however, may be the best way to identify the causal relationships and pathways to reducing lead-poisoning risk within a neighborhood. The characteristics of these pathways will be taken into account when building the low-resolution model for the entire U.S. For example, exposure sources or influential demographic and programmatic variables that account for a significant proportion of risk of childhood lead poisoning within a community can be combined with lower-resolution data across the nation via concepts from hierarchical Bayesian modeling.

The following sections provide an overview of the statistical analyses that we are proposing for the Massachusetts and Los Angeles data sources. Please note that as of the time that this QMP was written, there were doubts as to whether EPA and Battelle would be able to gain timely access to the Los Angeles data. We therefore outline the general approach that would be used for these data in the event that they are received by mid-January.

Modeling Approach for Massachusetts

The Massachusetts Department of Public Health (MDPH) is entering into a limited-use data sharing agreement with Battelle, so that they can provide blood-lead testing results on individual children (aged 6-36 months) and housing inspection data in a format that preserves linkages through a housing unit identification variable. These data will be utilized in two different modeling approaches. The first
modeling approach will seek to develop census tract quarterly summary measures similar to the National Model for blood-lead (e.g., exceedance proportions and geometric means), as well as summary measures for the proportion of housing units in each census tract that are known to be in (or out of) compliance with the Massachusetts standard of care (for use as a potential explanatory variable). MDPH has also provided Battelle with summary information regarding HUD and State funding of residential housing interventions (lead hazard control and abatement), which will be used to develop a longitudinal summary of current and cumulative per-capita spending on residential intervention within each census tract (using various assumptions on the allocation of such dollars). Other explanatory variables, such as the census, Toxics Release Inventory, 1999 National-Scale Air Toxics Assessment, and water quality data, will be available for use in these models. These census-tract level summary data (both response variable and explanatory variables) will be modeled using an approach similar to the one proposed for the National (Low Resolution) Model, except that the unit of clustering will be census tract rather than county. This model will be fit, with appropriate exploratory analyses and model documentation being provided to EPA in the technical report deliverable.

If time and resources permit, Battelle will conduct a series of analyses to compare and contrast the modeling results for the Commonwealth of Massachusetts between the model described above and the subset of Massachusetts-specific data and results that can be extracted from the National (Low Resolution) Model. This would entail the following steps:

1. Fit a high-resolution model to the Massachusetts data at the census tract level using the same type of summary measures that were used in the National Model (only at the census tract level). This will be called the Baseline High Resolution Model for Massachusetts.
2. Determine whether any improvement in prediction can be made in the baseline model by first assessing other census or environmental variables that were available for use in the National Model, and then adding the Massachusetts-specific information on housing-based intervention activities. These activities will be used to develop an Improved High Resolution Model for Massachusetts.
3. Aggregate the input data for the response variable and the model predictions across census tracts within each county in Massachusetts for both the Baseline and Improved High Resolution Models, and compare these with the county-level input and model predictions in Massachusetts from the National Model.

Finally, if time and resources permit, we will develop a statistical model that depends on individual data records for the response variable (rather than aggregate summary measures), in which we determine whether the specific unit where a child resides was in or out of compliance with the Massachusetts standard of care at the time of a blood-lead test, and use this information as part of the prediction. The other variables described above that are available at the census tract level would also be utilized as predictor variables in this model, and similar exercises can be used to assess the relative performance (and improvement in prediction) that occurs when modeling these more specific data.

Modeling Approach for Los Angeles

The Los Angeles data consist of detailed records on approximately 700 children who received case management services in response to an elevated blood-lead concentration. These records include an assessment by the case manager of the likely lead exposure sources that the EBL child encountered, including residential sources of lead-based paint, parental occupation and hobby-related sources of exposure, use of glazed ceramic pottery, home remedies, and cosmetics suspected of containing
lead, and activity patterns of the child that might contribute towards lead poisoning. While this information is highly specific, it is unfortunate that similar information was not gathered across a small subsample of children without elevated blood-lead concentrations. Thus, these data cannot be utilized to determine causal factors for EBL. Rather, they may be used to describe what potential exposure factors children in Los Angeles with elevated blood-lead concentrations have in common.

If made available to this project, summary data from the State of California that represent results of all children tested in Los Angeles can also be used to determine the prevalence of childhood lead poisoning in Los Angeles census tracts over a similar period of observation. These data can be combined with the more specific case management records to see if any additional predictive power can be gleaned from these data.

4.0 SCHEDULE OF DELIVERABLES

Table 1 below provides a schedule proposed by Battelle for this project.

Table 1.
Revised Proposed Schedule for EPA WA 2-13 Targeting Areas of Elevated Blood Lead Levels (BLL) in Children

Date | Responsible Organization | Description of Activity
10/16/06 | Team | Agreement on scope of work (see bullets below) and organizational responsibilities for the project
12/14/06 | CDC* | Transfer of data from Los Angeles and Massachusetts to Battelle to support analysis goal #1; transfer of data from the National Surveillance Database to Battelle to support analysis goal #2
12/14/06 | EPA | Transfer of EPA data sources (Air Monitoring and Toxics Release Inventory) to Battelle
12/14/06 | Battelle | Battelle assembles other data sources (programmatic, demographic, other environmental) to support analysis goal #2
1/10/07 | Battelle | Draft Quality Assurance Plan (including data analysis plan) for the Project
1/15/07 | Battelle | Integrated dataset to support analysis goals #1 and #2 (based on input received on 12/14/06 from CDC and EPA)
2/2/07 | Battelle | Draft report on Exploratory Analyses
2/16/07 | Battelle | Draft report on High Resolution Models (Massachusetts)
2/23/07 | Battelle | Draft report on Broad Coverage Model
3/2/07 | Battelle | Integration of Broad Coverage Modeling Results into Graphical User Interface (time series of maps); Executable Visualization Tool to be delivered to EPA

* Note that the transfer of blood-lead records from CDC and Massachusetts has not yet occurred, but that Battelle has conducted preliminary work to ensure that the data can be quickly integrated once received.

5.0 PERSONNEL QUALIFICATIONS AND ROLES

This work assignment and all
data analysis efforts will be managed by Mr. Warren Strauss, a senior-level biostatistician with extensive expertise in analyses of correlated binary outcomes and childhood lead poisoning research. Dr. Bruce Buxton, program manager on the Battelle contract with EPA, will serve as the Quality Assurance Manager and Technical Advisor to the study. They will be supported by various talented staff at Battelle, including:

• Jyothi Nagaraja - a Research Scientist statistician expert in developing logistic regression modeling approaches, with previous experience on the CDC risk indices project,
• Michele Morara - a mathematician and computer software specialist who will be developing the visualization tool,
• Tim Pivetz - a senior statistician who has worked with blood-lead and housing inspection data from the Massachusetts Department of Public Health on previous research studies supported by HUD,
• Elizabeth Slone - a junior statistician expert in conducting exploratory analyses in advance of development of inferential models,
• Nicole Iroz-Elardo - a junior statistician expert in childhood exposure related research,
• Darlene Wells - a senior database management specialist who will lead efforts on working jointly with technical staff from CDC's Lead Poisoning Prevention Branch to create an appropriate longitudinal dataset of quarterly county summary blood-lead information,
• Rona Boehm - a database management specialist who will lead the integration of all datasets obtained for this work assignment,
• Jennifer Sawchuk - a database management specialist who will assist in the acquisition and integration of data from publicly available sources (e.g., census, TRI, ASPEN model),
• Michael Schlatt - a computer programmer who will work with Darlene Wells and CDC to develop appropriate code to extract the summary information from CDC's National Surveillance database, and
• Cyndi Reese - a word processing specialist who will assist in the preparation of reports and presentations.

Table 2 provides an overview of the roles and responsibilities of each member of our proposed project team. Table 3 provides a breakdown of anticipated hour allocation to five project activities: Task Management, Data Management, Low Resolution Model, High Resolution Model, and Visualization Tool. Table 3 also provides a summary of total hours planned for each member of the project team, as well as the number of hours already spent on the project as of December 28, 2006.

Table 2.
Roles and Responsibilities of Proposed Staff Staff Member Field of Expertise Roles and Responsibilities Warren Strauss Biostalistics Work Assignment Leader Bruce Buxton Statistics Program Manager, QA Manager, Technical Advisor Jyothi Nagaraja Biostalistics Develop logistic regression and structural equations models under the guidance of Warren Strauss Michele Morara Mathematics and Computer Science Develop the Visualization Tool, Assist in model specification and Bayesian implementation if necessary Tim Pivetz Statistics and Lead Exposures Assist in model development and documentation Elizabeth Slone Statistics / SAS Programming Conduct exploratory analyses Nicole Iroz-Elardo Statistics / Exposure Help conceptualize modeling strategy Darlene Wells Data Management and Software Development Assist in developing data management strategies, and gaining access to CDC National Surveillance Data Rona Boehm Data Management Data management leader for the project Jennifer Sawchuk Data Management Assist in accessing and managing publicly available data Michael Schlatt Software Development Develop SQL server extraction tool to create summary dataset from CDC's National Surveillance Dataset Cyndi Reese Secretarial Supporl Word processing and PowerPoint presentation support Table 3. 
Allocation of Hours by Task Staff Member Task 1 Task Management Task 2 Data Management Task 3 High Resolution Model Task 4 Low Resolution Model Task 5 Visualization Tool Total Hours Planned Total Already Spent Warren Strauss 80 10 75 75 20 260 140 Bruce Buxton 32 32 4 Jyothi Nagaraja 10 70 70 150 3 Michele Morara 25 60 230 315 115 Tim Pivetz 60 20 80 0 Elizabeth Slone 20 20 40 0 Nicole Iroz Elardo 10 10 20 20 Darlene Wells 20 15 35 15 Rona Boehm 90 90 38 Jennifer Sawchuk 70 70 20 Michael Schlatt 35 35 18 Cyndi Reese 4 8 8 20 3 Totals 116 235 268 263 265 1147 376 Battelle PROPRIETARY, PRIVILEGED, AND CONFIDENTIAL INFORMATION TO BE USED ONLY FOR GOVERNMENT PURPOSES 23 ------- 6.0 QUALITY ASSURANCE / QUALITY CONTROL PROGRAM The primary focus of Battelle's Quality Assurance/Quality Control Program is to ensure that the quality and integrity of all deliveiables produced for EPA are of appropriate, known, and documented quality To do this, Battelle will apply a project specific Quality Management Plan (QMP) to all work performed for EPA, ensuring that all project activities that affect the quality and integrity of prepared deliverables are planned, coordinated, and communicated to all project participants The project QMP also ensures that all technical services, final deliverables, and the supporting data produced for EPA meet or exceed the quality standards that EPA has set foi this project The following sections present our proposed QMP for this project 6.1 PROJECT MANAGEMENT On this project, EPA will utilize technical support from staff in Battelle's Statistics and Information Analysis department under Work Assignment 2-13 of Contract No EP-W-04-021, with Ms Margaret Conomos serving as EPA's Work Assignment Manager (WAM) and Dr Barry Nussbaum serving as the EPA technical advisor Mr Warren Strauss will serve as Work Assignment Leader and will supervise all technical activities that Battelle performs in support of this project. 
Dr. Bruce Buxton is Battelle's EPA Program Manager on this contract and will serve as technical, quality assurance, and managerial reviewer of Battelle's deliverables on this work assignment.

Quality assurance (QA) activities for this project involve reviewing and approving the Data Management and Statistical Analysis Plan and any necessary amendments, implementing the plan, reviewing project reports, performing any necessary audits to check for compliance with the plan, and ensuring that corrective actions are taken when necessary. Mr. Strauss, as Work Assignment Leader, will be responsible for implementation of the plan, day-to-day QA activities, and submitting amendments in a timely fashion. Dr. Bruce Buxton, as QA Manager, will be responsible for reviewing and approving the Statistical Analysis Plan and any amendments, and for conducting audits as necessary. Reports prepared under this project will be reviewed by Mr. Strauss and the EPA WAM, as well as by Dr. Buxton. Ensuring any necessary corrective action will be the immediate responsibility of Mr. Strauss under supervision of Dr. Buxton.

Battelle has established procedures for monitoring the schedule and resource expenditures of each Task within the project. As tasks proceed, Battelle will submit a monthly progress report to EPA that summarizes progress made toward completing the milestones and deliverables, problems encountered and how they were overcome, plans for the upcoming month and beyond, and a financial summary of labor resource expenditures. These monthly progress reports, along with frequent discussions between Battelle staff and EPA, will facilitate the identification and resolution of any potential problems, and will serve to document progress.

One of Mr. Strauss' key responsibilities will be to ensure the technical quality of all work, reports, and deliverables provided to EPA. Battelle staff members are experienced in identifying and resolving small problems to prevent them from getting worse - and frequent
communication with our collaborators and clients provides the cornerstone of problem resolution. Battelle's business practices include the application of a formal company-wide policy to assure our clients that our work is of the highest quality. This policy includes the assignment of a senior Battelle expert to provide technical review and concurrence support for each project deliverable. In addition, Mr. Strauss and/or Dr. Buxton will technically review each report and deliverable before it is submitted to EPA to ensure a quality deliverable.

6.2 PROJECT DESCRIPTION AND SCHEDULE

This project involves data management, statistical analysis, geographical information systems support, and high-level statistical methods development for integrating data sources to predict how risk of childhood lead poisoning changes over time as a function of a variety of environmental exposures, demographic characteristics, and programmatic support. The original period of performance for this project was 7.5 months from the date of award, with a specific schedule of deliverables presented in an earlier workplan. However, this schedule has been severely compressed, with the majority of activities being conducted in the last 10 weeks of the project, due to multiple difficulties encountered in accessing the critical blood-lead response variable information from CDC and local lead poisoning prevention programs.

6.3 QUALITY OBJECTIVES AND CRITERIA

To meet the project goals, the project has the following data quality objectives:

1. Manage the data from the multiple sources of information relating environmental exposure, demographic characteristics, and programmatic support to childhood lead-poisoning risk.
   1.1 Identify missing, incomplete, or error-prone data from electronic data sources supplied by EPA, and work with EPA TOPO to resolve these issues.
   1.2 Develop separate study databases and master integrated databases for statistical analyses.
   1.3 Provide a complete, documented data dictionary for the study databases.
2. Conduct statistical analyses of data as directed by EPA.
   2.1 Provide descriptive statistics of all variables included in the study, and all created variables.
   2.2 Conduct statistical analyses using appropriate statistical software (e.g., SAS) or other software products approved by the EPA WAM. Provide diagnostics on all statistical models to characterize model fit and distributional assumptions, as appropriate.
3. Develop a visualization tool that will allow users to interact with the results of the broad-based national model.

6.4 SPECIAL TRAINING

Staff with prior knowledge of programming in the STAT module within the SAS® System and with prior training in statistical modeling, especially with regard to binary data, is desirable for this project. Staff with prior knowledge of Geographic Information Systems, particularly ArcView® and ArcInfo from ESRI, is also desirable for this project.

6.5 DOCUMENTATION AND RECORDS

At EPA, the Work Assignment Manager will maintain a file for the project. The file will include, but is not limited to, this QMP and statistical analysis plan (and all revisions), computer programs and listings developed under this statistical analysis plan, result summaries, and diskettes with the data. The data provided to the EPA WAM by Battelle will be stripped of all potentially identifiable or confidential information, but will be organized in such a way that the analyses performed by Battelle can be replicated.

Battelle's Project Manager will maintain a file for this project that contains the inputs required for conducting Battelle's activities, as well as the outputs generated from conducting these activities. Among the required inputs to be included in the project file are copies of this QMP,
and copies of data provided to Battelle by various data sources (subject to provisions included in limited data sharing agreements with Massachusetts and Los Angeles). The outputs generated by Battelle's activities that will be included in the project file are the fully documented individual and combined master datasets; SAS programs and ArcView files written to perform summaries and the statistical analyses on the EPA data; those portions of SAS and ArcView outputs that are generated by executing these programs and that directly provide information to be cited in the deliverables; the visualization tool executable; and all project deliverables. Battelle will document progress made on this project within monthly progress reports to EPA and will keep the EPA TOPO updated via e-mail and conference call as Battelle's activities proceed.

6.6 DATA ACQUISITION AND MANAGEMENT

Battelle will maintain the data for this project in a manner that preserves the confidentiality of all the data and prevents its release. Battelle will maintain the data received from EPA and other sources (CDC, Massachusetts, Los Angeles) and any data files created by analysis programs on either hard drive or removable diskette. Removable diskettes will be locked up when not in use, any computer listings or data summaries shall be locked in a file when not in use, unneeded listings or summaries shall be shredded, and the project will be discussed only with managers and staff who are familiar with the project and its data confidentiality restrictions.

When the EPA TOPO sends the data to Battelle, the data will be sent by overnight delivery service, or through a secure FTP transmission site established by Battelle. The data diskettes will be enclosed in an envelope labeled "To be opened by addressee only." The Project Manager will track the delivery of the data and verify with the Battelle Project Manager that the data have been received. Once data files are received from EPA, Battelle will handle
the original data (e.g., data with personal identifiers) as though the data were classified as confidential business information (CBI) under the Toxic Substances Control Act (TSCA), even though EPA may not specifically classify these data as "CBI." While they are in Battelle's possession, the data files will not be shared with anyone outside of the project team, and the project team will not share files via e-mail attachment. Until they are required to be returned to EPA, the data diskettes will reside solely within Battelle's headquarters in Columbus, OH, which is a controlled access campus.

When other Battelle technical staff are required to assist in the data analysis effort, Battelle's Project Manager will brief them on all confidentiality requirements associated with the project and the data and will ensure that they follow appropriate requirements when possessing the data diskettes. Battelle will follow the TSCA CBI Security Manual when conducting data analyses on computer systems and when establishing reporting and record keeping procedures on this project.

When Battelle's Work Assignment Leader returns the data diskettes to the original source at the end of the project, along with any analysis programs and output that may be required, the diskettes will be enclosed in an envelope labeled "To be opened by addressee only" and will be sent by overnight delivery service. Battelle's Work Assignment Leader will track the delivery of the diskettes and verify that the appropriate source has received the diskettes.

6.7 SOFTWARE ACQUISITION & USAGE

Microsoft Access and SQL Server will be the primary software tools used for data management, although some data or results may be transferred to or presented in hard-copy data records or in spreadsheets readable by software such as MS Excel, SAS, or ArcView. The SAS® System will be the primary statistical data analysis tool
used on this project, but may be supplemented by S-Plus or other specialized statistical software as agreed upon by EPA. ArcView will be the primary software tool used for GIS applications on this project. All required software presently resides on personal computers at EPA and Battelle.

6.8 DATA VALIDATION AND USABILITY

Battelle will work directly with each information source (or review appropriate documentation for publicly available sources) to ensure that the variables in the data files are understood. To ensure complete and accurate transfer of data, record counts, variable lists, and summary statistics will be compared between the original data sources and the data transferred to the study database. Standard software packages (e.g., SAS® System, ArcView) will be used for the data analyses, and the exploratory analyses will be used to identify any obvious problems with the data (e.g., incomplete data, or data that are inconsistent with the documentation). Any major deviations that are identified in these comparisons will be investigated to locate and understand the rationale for the deviations. If EPA determines that the outcome of these comparisons is not acceptable, then the Battelle Work Assignment Leader and the EPA Work Assignment Manager will work jointly to get information that will explain the unacceptable outcome and to determine whether data are available that meet the requirements of this project.

6.9 RECONCILIATION WITH USER OBJECTIVES

The goals of this project are to (1) perform data management services, (2) conduct exploratory and sophisticated longitudinal statistical analyses, and (3) develop a visualization tool for a pilot study to develop a method of identifying geographic areas with children at high risk of lead poisoning by combining environmental exposure, demographic characteristics, and programmatic level of support information. All EPA objectives are captured in the Quality Management Plan developed for this project by Battelle.
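As an illustration of the transfer checks described in Section 6.8 (comparing record counts, variable lists, and summary statistics between an original source and the data loaded into the study database), a minimal sketch follows. It is written in Python for illustration only; the plan itself names SAS as the analysis tool. The function names, the tolerance, and the example county records are hypothetical and are not taken from the project data.

```python
from statistics import mean


def summarize(rows):
    """Return record count, variable list, and numeric summaries for a list of dict records."""
    variables = sorted({name for row in rows for name in row})
    stats = {}
    for name in variables:
        # Only numeric values are summarized; bool is excluded because it subclasses int.
        values = [row[name] for row in rows
                  if isinstance(row.get(name), (int, float))
                  and not isinstance(row.get(name), bool)]
        if values:
            stats[name] = {"n": len(values), "min": min(values),
                           "max": max(values), "mean": mean(values)}
    return {"records": len(rows), "variables": variables, "stats": stats}


def compare_transfer(source_rows, loaded_rows, tol=1e-9):
    """List discrepancies between the source data and the transferred copy (empty list = match)."""
    src, dst = summarize(source_rows), summarize(loaded_rows)
    problems = []
    if src["records"] != dst["records"]:
        problems.append(f"record count: source={src['records']} loaded={dst['records']}")
    if src["variables"] != dst["variables"]:
        missing = sorted(set(src["variables"]) - set(dst["variables"]))
        extra = sorted(set(dst["variables"]) - set(src["variables"]))
        problems.append(f"variable list: missing={missing} extra={extra}")
    for name, s in src["stats"].items():
        d = dst["stats"].get(name)
        if d is None:
            problems.append(f"{name}: no numeric values after transfer")
            continue
        for key in ("n", "min", "max", "mean"):
            if abs(s[key] - d[key]) > tol:
                problems.append(f"{name}.{key}: source={s[key]} loaded={d[key]}")
    return problems


# Hypothetical quarterly county summary records, invented for this example.
source = [
    {"county": "A", "quarter": 1, "n_tested": 120, "n_elevated": 6},
    {"county": "A", "quarter": 2, "n_tested": 135, "n_elevated": 4},
    {"county": "B", "quarter": 1, "n_tested": 80, "n_elevated": 9},
]
loaded_ok = [dict(row) for row in source]       # faithful transfer
loaded_bad = [dict(row) for row in source[:2]]  # one record lost in transfer

print(compare_transfer(source, loaded_ok))
for problem in compare_transfer(source, loaded_bad):
    print(problem)
```

An empty list indicates that the three comparisons agree. Any entries returned would correspond to the deviations that Section 6.8 says are to be investigated before analysis proceeds.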
Databases developed in Task 1 will be stored in a format agreed upon by the Battelle Work Assignment Leader and the EPA Work Assignment Manager. The specific form of statistical analyses to be performed will be identified and determined in discussions between Battelle and EPA technical staff. When necessary, the objectives and methods associated with these analyses will be documented in e-mail messages between Battelle and EPA, prior to formal integration into the Statistical Analysis Plan. The Statistical Analysis Plan will be amended as necessary throughout the performance of this project, to reflect any modification to the statistical analysis approach dictated by the course of research and outcomes from the statistical modeling efforts.

7.0 REFERENCES

24 CFR Part 35, 40 CFR Part 745, Lead; Requirements for Disclosure of Known Lead-Based Paint and/or Lead-Based Paint Hazards in Housing; Final Rule (3/6/1996). Accessed at http://www.leadsafehomes.info/pdfs/all_titleten_fulltext_english.pdf#search=%22HUD%201018%20Rule%22

40 CFR Part 745, Lead; Identification of Dangerous Levels of Lead; Final Rule (1/5/2001). Accessed at http://www.epa.gov/fedrgstr/EPA-TOX/2001/January/Day-05/t84.pdf

CDC. 1997. Screening Young Children for Lead Poisoning: Guidance for State and Local Public Health Officials, edited by U.S. Department of Health and Human Services. Atlanta, GA: Public Health Services, CDC.

HUD. 1995. The Relation of Lead Contaminated House Dust and Blood-Lead Levels Among Urban Children. Washington, DC: U.S. Department of Housing and Urban Development.

Lanphear, BP, TD Matte, J Rogers, RP Clickner, B Dietz, RL Bornschein, P Succop, KR Mahaffey, S Dixon, W Galke, M Rabinowitz, M Farfel, C Rohde, J Schwartz, P Ashley, and DE Jacobs. 1998. The contribution of lead-contaminated house dust and residential soil to children's blood lead levels: A pooled analysis of 12 epidemiologic studies. Environ Res 79 (1): 51-68.

Miranda, ML, DC Dolinoy, and MA Overstreet. 2002. Mapping for Prevention: GIS Models for Directing Childhood Lead Poisoning Prevention Programs. Environmental Health Perspectives 110 (9): 947-53.

Miranda, ML, JM Silva, MA Overstreet Galeano, JP Brown, DS Campbell, E Coley, CS Cowan, D Harvell, J Lassiter, JL Parks, and W Sandele. 2005. Building Geographic Information System Capacity in Local Health Departments: Lessons from a North Carolina Project. Am J Public Health 95 (12): 2180-5.

Spivey, Angela. The Weight of Lead: Effects Add Up in Adults. Environmental Health Perspectives Volume 115, Number 1, January 2007.

Strauss, Warren, R Carroll, Steve Bortnick, John Menkedick, and B Schultz. 2001. Combining Datasets to Predict the Effects of Regulation of Environmental Lead Exposure in Housing Stock. Biometrics 57: 203-210.

Strauss, Warren, Ramzi Nahhas, Leanna House, Amy Kurokawa, and Bradley Skarpness. 2001. Development of Models to Predict Risk of Childhood Lead Poisoning at the Census Tract Level. Columbus, OH: Technical Report to the U.S. Centers for Disease Control and Prevention under Contract No. 200-98-0102.

Strauss, Warren, Tim Pivetz, P Ashley, John Menkedick, E Slone, and S Cameron. 2006. Evaluation of Lead Hazard Control Treatments in Four Massachusetts Communities through Analysis of Blood-lead Surveillance Data. Environmental Research 99 (2): 214-223.

U.S. Department of Housing and Urban Development. September 15, 1999. Final Rule, Requirements for Notification, Evaluation and Reduction of Lead-Based Paint Hazards in Federally Owned Residential Property and Housing Receiving Federal Assistance. Washington, DC: Federal Register, 50140-50231.