United States Environmental Protection Agency
Science Advisory Board (1400)
Washington, DC
EPA-SAB-IHEC-ADV-96-004
September 1996

AN SAB REPORT: THE CUMULATIVE EXPOSURE PROJECT

REVIEW OF THE OFFICE OF POLICY, PLANNING, AND EVALUATION'S CUMULATIVE EXPOSURE PROJECT (PHASE 1) BY THE INTEGRATED HUMAN EXPOSURE COMMITTEE

September 30, 1996
EPA-SAB-IHEC-ADV-96-004

Honorable Carol M. Browner
Administrator
U.S. Environmental Protection Agency
401 M Street, S.W.
Washington, D.C. 20460

Subject: Science Advisory Board's review of the Office of Policy, Planning, and Evaluation's (OPPE) Cumulative Exposure Project (Phase 1)

Dear Ms. Browner:

The Office of Policy, Planning, and Evaluation (OPPE) Cumulative Exposure Project is intended to provide a national distribution of cumulative exposures to environmental pollutants, providing comparisons of exposures across communities, exposure pathways, and demographic groups. Its ultimate goal is to develop analyses of multiple exposures and multiple pollutants, providing EPA with the ability to identify the most significant environmentally mediated human health problems and the most impacted communities or demographic groups.

The project has been structured in two phases. In the first phase, cumulative exposures to chemicals on the list of hazardous air pollutants (HAPs), occurring through three separate pathways (inhalation, food ingestion, and drinking water ingestion), are being independently estimated for the U.S. population. The methodologies and databases used, and the approaches for estimating variability in exposure levels across geographic areas and demographic groups, are currently under development for the base year (1990). In the second phase, indoor air exposures to HAPs from various indoor sources and via various exposure pathways will be considered. Methods for combining inhalation and ingestion exposures for all pathways will be developed to provide estimates of total exposure to many chemicals. The Board was asked to provide an Advisory report for Phase 1 (which is the subject of this letter) and a Consultation for Phase 2 (notice of which was transmitted to you in our letter of August 6, 1996).

Consequently, the Integrated Human Exposure Committee (IHEC) met on June 26-27, 1996 and focused on the following issues incorporated in the formal Charge (see section 2.2 of the enclosed report for the full detailed Charge):

a) Underlying scientific basis for the project

b) Modeling methodology, including:
   1) proposed inventory of toxic emissions releases
   2) air dispersion modeling techniques
   3) methods for evaluating the performance of the model
   4) food/drinking water ingestion exposure estimates
   5) data sources for food/drinking water consumption amounts
   6) methods for assigning values to data samples below the limit of quantification
   7) treatment of uncertainty

The Committee wishes to commend the Agency on the quality of the documentation provided for the review. Although there are some technical caveats regarding the methodology (see below), the Committee believes that the overall conceptual framework and underlying scientific foundation for the Cumulative Exposure Project is sound. The project provides a strong basis for developing an integrated assessment of population exposures to toxic pollutants, and, ultimately, a means to compare exposures to multiple toxicants in all media across geographical and demographic groups.
It must be noted, however, that the project is very ambitious and suffers, at least in the near term, from limitations in some of the measurement data. An Agency commitment to support the procedures for validation of the results will also be required. Ultimately, however, the project should provide a more strategic means of evaluating exposures to toxic pollutants than does the chemical-by-chemical and medium-by-medium approach currently used by the Agency. As better databases are developed, the cumulative exposure framework should be able to provide a means for assessing differences in exposures across regional and demographic sub-groups, and eventually, for identifying sub-populations with very high exposures. When coupled with an understanding of the effects of such exposures, the Agency should be able to target its efforts to protect human health to those areas and population groups most at risk, including children.

In order to achieve fully the objectives of the Cumulative Exposure Project, certain critical additional data will be needed. The Cumulative Exposure Project, by its very nature, provides an excellent means of identifying the most critical empirical data needed to assess population exposures to toxic pollutants. Identification of the most important data gaps to be filled will require strong scientific collaborations between EPA modelers and measurement experts. Because of the nature of EPA's mission and organization, model development efforts tend to be separated from environmental measurement efforts. The scientific method requires a more integrated and iterative use of models and measurements: models are used to organize and interpret measurement data and then to design the next measurement experiments, and measured data are then used to test and further develop the models. The IHEC strongly urges the Agency to encourage and reward collaboration between the scientists who develop models and those who make environmental measurements. We also encourage the Agency to begin examining ways in which environmental data collected for regulatory purposes might be collected so as to make these data simultaneously useful for scientific purposes. With some thought, it should be possible to develop improved guidelines for the collection of some environmental data so that they can serve the dual purposes of assessing regulatory compliance and advancing environmental science, in order to improve the future protection of public health.

A major, and more long-term, challenge for the Agency in this project, and for the scientific community in general, will be to develop defensible means of combining the exposures to multiple toxics in a manner that provides meaningful ways of assessing potential health risks from the total exposures to many chemicals. In order to estimate exposures to more than one chemical, a health metric (e.g., toxicological potency) must be incorporated. A measure of potency is required to scale exposures to a common reference point so that total exposure is meaningful. This aspect of the project is one that will require time, thought, and scientific creativity. The Cumulative Exposure Project is very likely to provide a strong driving force to go beyond the current, oversimplified chemical-by-chemical approach to human health risk assessment, and to begin to address health risks in a way that recognizes that the human population is simultaneously exposed to multiple environmental pollutants.
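For illustration only (the Committee does not here prescribe any particular metric), such a potency-weighted aggregation might take the general form

    E_{total} = \sum_{i=1}^{n} w_i E_i

where E_i is the exposure to chemical i and w_i is a potency-based weight (for example, a toxic equivalency factor, or the reciprocal of a reference dose or unit risk) chosen to place all chemicals on a common scale. The scientific challenge lies in choosing and defending the weights w_i, not in the arithmetic.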
Specific technical issues are discussed in the body of our report, but certain overarching issues impact the entire effort. These issues include:

a) The need for EPA to make a strong commitment to providing the measurement resources that will be needed for the success of this project; we note that the National Human Exposure Assessment Survey (NHEXAS) Project will provide valuable measurement data for some of the chemicals of concern in this project.

b) A commitment to develop criteria and a strategic plan for determining which measurement data are the most important to collect, given limited resources.

c) A commitment to verify the performance of the model by comparing its predictions with "ground truth" data. The methods proposed for performance evaluation appear to be appropriate, but are restricted both in scope and in number. These restrictions limit the robustness of any conclusions derived regarding the overall reliability, accuracy, and precision of the exposure model predictions.

d) An effort by the Agency to begin examining means by which environmental data collected primarily for regulatory purposes might also be collected and recorded in databases in ways that would make such data simultaneously useful for scientific purposes, such as exposure analysis. For example, many of the environmental databases that are needed for this project report a large percentage of measurements in categories such as "below the limit of detection" (BDL) or "missing data." This greatly limits their use for other purposes such as exposure analysis. More sensitive analytical methods will be required to minimize the percentage of BDL data. Effective quality assurance protocols will also be required if environmental databases are to be used for dual purposes.

e) Coordination of this effort with other federal agencies that are generating databases that are important to the success of this project.

f) Inclusion, in the model evaluation process and report, of more discussion of the limitations and capabilities of the models being considered.

The proposed framework can appropriately be called an integrated multi-media framework, because it includes multiple environmental and exposure media (i.e., outdoor air, indoor air, water, food, etc.); multiple pathways of exposure; and multiple routes of contact (inhalation and ingestion). Nevertheless, as currently constructed, the framework cannot be used for prospective cross-media exposure assessments, because it is not designed to characterize the dynamic exchange of chemicals between various media. This issue should not be considered an error, given the objectives of the model, but should be discussed as an inherent limitation so that decision makers do not misinterpret model predictions.

Although the goals for the Cumulative Exposure Project are very ambitious and EPA's scientists will face many challenges in achieving these goals, this project will provide a more integrated approach for evaluating exposures to and risks from multiple environmental pollutants than the chemical-by-chemical approach that is now used. We appreciate the opportunity to review this document, and look forward to your response to the issues we have raised.

Sincerely yours,

Dr. Genevieve Matanoski
Chair, Science Advisory Board
Dr. Joan Daisey
Chair, Integrated Human Exposure Committee

ENCLOSURE

Distribution List

Administrator
Deputy Administrator
Assistant Administrators
Deputy Assistant Administrator for Pesticides and Toxic Substances
Deputy Assistant Administrator for Research and Development
Deputy Assistant Administrator for Water
EPA Regional Administrators
EPA Laboratory Directors
EPA Headquarters Library
EPA Regional Libraries
EPA Laboratory Libraries
Staff Director, Scientific Advisory Panel

NOTICE

This report has been written as a part of the activities of the Science Advisory Board, a public advisory group providing extramural scientific information and advice to the Administrator and other officials of the Environmental Protection Agency. The Board is structured to provide balanced, expert assessment of scientific matters relating to problems facing the Agency. This report has not been reviewed for approval by the Agency; therefore, its contents do not necessarily represent the views and policies of the Environmental Protection Agency or of other agencies in the Executive Branch of the Federal government, nor does mention of trade names or commercial products constitute a recommendation for use.

ROSTER

Chair
Dr. Joan Daisey, Lawrence Berkeley Laboratory, Berkeley, CA

Members
Dr. Paul Bailey, Mobil Business Resources Corporation, Paulsboro, NJ
Dr. Robert Hazen, State of New Jersey Department of Environmental Protection and Energy, Trenton, NJ
Dr. Timothy Larson, University of Washington, Seattle, WA
Dr. Paul Lioy, Robert Wood Johnson School of Medicine, Piscataway, NJ
Dr. Kai-Shen Liu, California Department of Health Services, Berkeley, CA
Dr. Thomas E. McKone, University of California, Berkeley, CA
Dr. Maria Morandi, University of Texas Health Science Center, Houston, TX
Dr. Jerome O. Nriagu, The University of Michigan, Ann Arbor, MI
Dr. Barbara Petersen, Technical Assessment Systems, Inc., Washington, DC
Mr. Ron White, American Lung Association, Washington, DC

Consultant
Dr. Robert A. Harley, University of California, Berkeley, Berkeley, CA

Science Advisory Board Staff
Mr. Samuel Rondberg, Designated Federal Official, U.S. Environmental Protection Agency, Science Advisory Board (1400F), 401 M Street, S.W., Washington, DC 20460

Staff Secretary
Mrs. Dorothy M. Clark, Staff Secretary, U.S. Environmental Protection Agency, Science Advisory Board (1400F), 401 M Street, S.W., Washington, DC 20460

ABSTRACT

The Committee believes that, with caveats, the Cumulative Exposure Project's conceptual framework is scientifically sound and provides a basis for an assessment of population exposures to toxicants, and, ultimately, a means to compare exposures to multiple toxicants across geographical and demographic groups. The project is very ambitious and suffers (at least in the near term) from limitations in the data. Also, the Agency and the scientific community need to develop defensible means of combining exposures to multiple toxic pollutants in order to assess health risks from combined exposures to many chemicals. Ultimately, the project should provide a more strategic means of evaluating exposures to toxicants than does the chemical-by-chemical, medium-by-medium approach currently used. We encourage the Agency to begin to examine ways in which environmental data collected for regulatory purposes might be collected in ways that would make these data simultaneously useful for scientific purposes.
Specific technical issues are discussed in the body of the report, but there are several overarching issues. These include: EPA's need to make a strong commitment to providing the measurement resources that will be needed for the success of this project; a commitment to develop criteria and strategic plans prioritizing collection of measurement data; a commitment to verify the performance of the model by comparing its predictions with "ground truth" data; an effort by the Agency to begin examining means by which environmental data collected primarily for regulatory purposes might also be collected and recorded in databases in ways that would make such data simultaneously useful for scientific purposes; coordination of this effort with other federal agencies that are generating databases important to the success of this project; and inclusion in the model evaluation process and report of more discussion of the limitations and capabilities of the models being considered.

KEYWORDS: multimedia exposure; exposure modeling; toxic pollutants; mixtures.

TABLE OF CONTENTS

1. EXECUTIVE SUMMARY
2. INTRODUCTION AND CHARGE
   2.1 Introduction
   2.2 Charge
3. DETAILED FINDINGS
   3.1 Proposed methods for modeling ambient levels of air toxics
   3.2 Proposed toxics emissions inventory
   3.3 Dispersion modeling techniques
   3.4 Proposed methods for model evaluation
       3.4.1 Reliability and validity
       3.4.2 Sensitivity and uncertainty analysis
       3.4.3 Concentration predictions
       3.4.4 Overall model evaluation process
   3.5 Proposed methodology for estimating food ingestion exposures
   3.6 Data sources for food consumption amounts and residue levels
   3.7 Methods for dealing with food samples below the limit of quantification
   3.8 Treatment of uncertainty in the food ingestion exposure estimates
   3.9 Methodology for estimating drinking water ingestion exposures
   3.10 Data sources for drinking water consumption amounts/drinking water contaminant levels
   3.11 Assigning values to drinking water data samples below the limit of quantification
   3.12 Treatment of uncertainty in the drinking water ingestion exposure estimates
4. CONCLUSIONS
   4.1 Science underlying the basic approaches and findings
   4.2 Research needs
5. REFERENCES
APPENDIX A

1. EXECUTIVE SUMMARY

The Committee wishes to commend the Agency on the quality of the documentation provided for the SAB review. Although there are some caveats regarding the content, the Committee believes that the overall conceptual framework for the Cumulative Exposure Project is scientifically sound and provides a strong basis for an integrated assessment of population exposures to toxic pollutants, and, ultimately, a means to compare exposures to multiple toxic pollutants in all media across geographical and demographic groups.

It must be noted, however, that the project is very ambitious and will be handicapped, at least in the near term, by limitations in the measurement data. Ultimately, the project should provide a more strategic means of evaluating exposures to toxic pollutants than does the chemical-by-chemical, medium-by-medium approach currently used by the Agency. As better databases are developed, the cumulative exposure framework should be able to provide a means for assessing differences in exposures across regional and demographic sub-groups, and ultimately, for identifying sub-populations with very high (as well as very low) exposures.
This will enable the Agency to target its efforts to those areas posing the greatest risk to the greatest number of people, and so protect human health more effectively than has been possible in the past.

In order to achieve fully the objectives of the Cumulative Exposure Project, certain critical additional data will be needed. EPA's National Human Exposure Assessment Survey (NHEXAS) will provide some valuable data of the type needed by the project, but these data will not be available until late 1997 or early 1998. The Cumulative Exposure Project, by its very nature, provides an excellent means of identifying the most critical data needed to assess accurately population exposures to toxic pollutants.

Because of the nature of EPA's mission and organization, model development efforts tend to be separated from environmental measurement efforts. The scientific method requires a more integrated and iterative use of models and measurements: models are used to organize and interpret measurement data and then to design the next measurement experiments, and measured data are then used to test and further develop the models. The IHEC strongly urges the Agency to encourage and reward collaboration between the scientists who develop models and those who make environmental measurements. We also encourage the Agency to begin to examine ways in which environmental data collected for regulatory purposes might be collected in ways that would make these data simultaneously useful for scientific purposes. With some thought, it should be possible to develop improved guidelines for collection of some environmental data so that they can be used for the dual purposes of assessing regulatory compliance and advancing environmental science to improve the future protection of public health.

A major, and longer-term, challenge for the Agency in this project, and for the scientific community in general, will be to develop defensible means of combining the exposures to multiple toxic pollutants in a manner that provides meaningful ways of assessing health risks from the combined exposures to many chemicals. In order to estimate exposures to more than one chemical, a health metric (e.g., toxicological potency) must be incorporated. A measure of potency is required to scale exposures to a common reference point so that total exposure is meaningful. This aspect of the project is one that will require time, thought, and scientific creativity. The Cumulative Exposure Project is very likely to provide a strong driving force to go beyond the current, oversimplified chemical-by-chemical approach to human health risk assessment, and to begin to address health risks in a way that recognizes that the population is simultaneously exposed to multiple environmental pollutants.

Specific technical issues are discussed in the body of our report, but certain overarching issues impact the entire effort. These include:

a) The need for EPA to make a strong commitment to providing the measurement resources that will be needed for the success of this project; we note that the NHEXAS Project will provide valuable measurement data for some of the chemicals of concern in this project.

b) A commitment to develop criteria and a strategic plan for determining which measurement data are the most important to collect, given limited resources.

c) A commitment to verify the performance of the model by comparing its predictions with "ground truth" data.
The methods proposed for performance evaluation appear to be appropriate, but are restricted both in scope and in number. These restrictions limit the robustness of any conclusions derived regarding the overall reliability, accuracy, and precision of the exposure model predictions.

d) An effort by the Agency to begin examining means by which environmental data collected primarily for regulatory purposes might also be collected and recorded in databases in ways that would make such data simultaneously useful for scientific purposes, such as exposure analysis. For example, many of the environmental databases that are needed for this project report a large percentage of measurements in categories such as "below the limit of detection" (BDL) or "missing data." This greatly limits their use for other purposes such as exposure analysis. More sensitive analytical methods will be required to minimize the percentage of BDL data. Effective quality assurance protocols will also be required if environmental databases are to be used for dual purposes.

e) Coordination of this effort with other federal agencies that are generating databases that are important to the success of this project.

f) Inclusion, in the model evaluation process and report, of more discussion of the limitations and capabilities of the models being considered.

The proposed framework can appropriately be called an integrated multi-media framework, because it includes multiple environmental and exposure media (i.e., outdoor air, indoor air, water, food, etc.); multiple pathways of exposure; and multiple routes of contact (inhalation and ingestion). Nevertheless, as currently constructed, the framework cannot be used for prospective cross-media exposure assessments, because it is not designed to characterize the dynamic exchange of chemicals between various media. This issue should not be considered an error, given the objectives of the model, but should be discussed as an inherent limitation so that decision makers do not misinterpret model predictions.

Although the goals for the Cumulative Exposure Project are very ambitious and EPA's scientists will face many challenges in achieving these goals, this project will provide a more integrated approach for evaluating exposures to and risks from multiple environmental pollutants than the chemical-by-chemical approach that is now used.

2. INTRODUCTION AND CHARGE

2.1 Introduction

The Office of Policy, Planning, and Evaluation (OPPE) Cumulative Exposure Project is intended to provide a national distribution of cumulative exposures to environmental pollutants, providing comparisons of exposures across communities, exposure pathways, and demographic groups. A substantial portion of EPA's exposure analyses and risk assessments are designed to support specific regulatory actions, and as a result they frequently focus on a single pollutant, a single source or source category, or a single medium. In reality, people tend to be exposed through multiple pathways to numerous pollutants originating from a variety of sources. OPPE's Cumulative Exposure Project is intended to develop analyses of multiple exposures which, in that Office's opinion, will support consideration of several important issues in environmental policy, such as targeting resources to the most significant problems and to the most impacted communities or demographic groups.

The project has been structured in phases.
In the first phase, cumulative exposures occurring through three separate pathways (inhalation, food ingestion, and drinking water ingestion) are independently estimated. Outputs of this phase include separate reports describing the development of the cumulative exposure estimates for each pathway, along with analyses of the distribution of estimated exposure levels across geographic areas and demographic groups. This phase also supports continuing efforts to develop data on environmental exposures by highlighting important data gaps and identifying areas for more in-depth and targeted analyses. In the second phase, methods for evaluating exposures to indoor sources of air pollution will be considered, and the results from the first phase will be examined to determine whether there are appropriate methods to combine the separate pathway analyses to develop estimates of multi-pathway cumulative exposure. The scope for this planned integration analysis may initially be limited to a select set of pollutants and a limited geographic area to evaluate the feasibility of combining data from the separate pathway analyses.

The following sections provide an overview of each of the three pathway analyses.

a) Air Toxics: Outdoor Concentrations and Inhalation Exposure

Output: Estimated outdoor concentrations and inhalation exposures for over 150 hazardous air pollutants (HAPs), by census tract. These estimates will be used for scoping analyses to identify geographic areas and subpopulations with high air toxics concentrations, and to assess the relative contributions to exposure from broad sectors of the economy, e.g., transportation, manufacturing, and waste management.

Description: This portion of the analysis will provide estimates of the distribution of HAP concentrations. The analysis consists of three sub-components:

Outdoor Concentration Modeling: By applying dispersion modeling to national inventories of emissions from both stationary and mobile sources, annual average ambient concentrations of over 150 toxic pollutants (based on the list of hazardous air pollutants in the Clean Air Act) will be estimated for each census tract in the continental U.S.

Indoor Concentrations: Both modeling and monitoring data on indoor concentrations of HAPs are being reviewed. Data for a variety of indoor sources of HAPs are being examined, including environmental tobacco smoke, volatilization of chlorinated organics from showers, consumer products, formaldehyde, and radon. Due to gaps in data on indoor sources and concentrations, the scope and outputs of this component of the analysis remain under consideration.

Exposure Modeling: The relative importance of outdoor and indoor concentrations of pollutants in determining actual levels of inhalation exposure largely depends on the amount of time individuals spend indoors and outdoors. Conceptually, this component of the project is intended to estimate cumulative inhalation exposures by weighting indoor and outdoor concentrations using activity pattern data. The scope and outputs of this component, however, will depend on the resolution of the approach for addressing indoor concentrations.

b) Drinking Water Ingestion

Output: Estimated exposure to pollutants in drinking water at the county level, and demographic characterization of the distribution of exposure levels.

Description: This portion of the analysis will evaluate exposures that result from ingestion of drinking water.
As with outdoor source air exposures, exposure to contaminants in drinking water is driven primarily by where a person lives. The intended approach is to estimate levels of contamination in drinking water at the county level, using actual measurements of approximately 30 pollutants in drinking water supplies. For counties with missing data, extrapolations will be made from other counties based on the size, location, and type of water system. Exposure estimates will be derived by combining the pollutant concentration estimates with estimated rates of drinking water ingestion.

c) Food Ingestion

Output: Estimated cumulative dietary intake of pesticides and industrial pollutants by demographic group and region.

Description: This portion of the analysis will estimate exposures that result from ingestion of food. In general, there are two primary mechanisms by which food becomes contaminated with pollutants. First, residues of pesticides from field application are frequently found on produce at the consumer level. Second, many foods are contaminated with pollutants such as dioxin, mercury, lead, and some pesticides which are persistent and widespread in the environment and which bioaccumulate in the food chain. Fish, meats, and dairy products are frequently contaminated with bioaccumulative pollutants.

The analysis of food ingestion exposures will use available measurements of approximately 40 pesticides and bioaccumulative pollutants in various foods. Because food tends to be distributed nationally, significant geographic distinctions in pollutant levels in many foods are not expected, other than for subsistence populations. Instead, the primary cause of variation in dietary exposure is variability in food consumption patterns. For this analysis, demographic distinctions in dietary profiles will be used to characterize exposure patterns. Dietary profiles for 22 demographic groups, distinguished by age, race, income, gender, and region, have been estimated using USDA survey data. These dietary profiles will then be combined with estimates of pollutant levels in 34 different raw and processed foods to estimate exposures.

d) Integration of Air, Water and Food Exposure

Output: Combined cumulative exposures via inhalation and ingestion by geographic area or subpopulation. Specific methods and outputs will be considered and evaluated as analyses of the three individual pathways progress.

2.2 Charge

a) Overarching Considerations: Are the basic approaches and findings supported by the underlying science?

b) Specific Issues:

1) Are the proposed methods for modeling ambient concentrations of air toxics appropriate?

2) Is the proposed inventory of toxics releases appropriate for the national dispersion modeling effort and for developing representative long-term concentrations of air toxics by census tract? How can the emissions inventory be improved? (Elements of the emissions inventory include: emissions rates, stack parameters, spatial assignments and allocations, and characterization of the uncertainty in all of the above. Different approaches and data sources are proposed for different categories of sources, e.g., point sources vs. area sources vs. mobile sources.)

3) Are the dispersion modeling techniques applied to the emissions inventory appropriate for developing representative, geographically differentiated long-term concentrations of air toxics?
(Elements of dispersion modeling include: spatial resolution, meteorological data, atmospheric processes, temporal resolution, and treatment of terrain.)

4) Are the methods proposed for evaluating the performance of the model appropriate? (Evaluation methods include comparison of outputs to monitoring data, and specific evaluation of emissions for tracts with unusually high concentrations.)

5) Is the proposed methodology for estimating food ingestion exposures appropriate?

6) Are the sources of data proposed for both food consumption amounts and residue levels for food the best or most appropriate to use?

7) Are the proposed methods for assigning values to samples below the limit of quantification for food ingestion data appropriate?

8) Is the proposed treatment of uncertainty in the food ingestion exposure estimates appropriate?

9) Is the proposed methodology for estimating drinking water ingestion exposures appropriate?

10) Are the sources of data proposed for both drinking water consumption amounts and drinking water contaminant levels the best or most appropriate to use?

11) Are the proposed methods for assigning values to samples below the limit of quantification for drinking water data appropriate? Are the extrapolations to areas where there are no drinking water monitoring results reasonable?

12) Is the proposed treatment of uncertainty in the drinking water ingestion exposure estimates appropriate?

3. DETAILED FINDINGS

3.1 Proposed methods for modeling ambient levels of air toxics

It is necessary to consider the reactivity/persistence of toxic substances in the environment before attempting to model their ambient air concentrations. The modeling approach already takes into account the reactivity of VOCs in the atmosphere, and includes appropriate "decay factors" for highly reactive pollutants such as 1,3-butadiene. The model is also appropriate for a large number of VOCs with medium reactivity (half-life up to several months). However, for the lowest-reactivity compounds (e.g., PCBs, trace metals), the current-year emission inventory may not be representative, because it does not account for re-emission/resuspension of historically emitted persistent chemicals already present in the environment. Therefore, the Committee believes that model results for environmentally persistent toxic substances may understate true inhalation exposures.

The analysis level of the ambient air model is the census tract. Various other analysis levels (e.g., census block, county, air basin, state) were discussed. The use of census tract centroids as the model receptors needs to be considered and explained carefully. Problems will arise in rural areas where the population may be clustered in several small centers, with large agricultural or unsettled areas in between.

Measured hourly ambient CO concentrations (approximately 500 fixed monitoring sites operate continuously in the U.S.) might be used to estimate ambient concentrations of other mobile-source-related pollutants. Ambient concentrations of CO and non-methane hydrocarbons were well correlated (R-squared = 0.75) in early morning samples collected in Los Angeles during the 1987 Southern California Air Quality Study (Lawson, 1990). It should be noted, however, that these ratios vary diurnally and seasonally, particularly for reactive compounds.
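If this suggestion were pursued, the scaling itself would be simple, as the following minimal Python sketch illustrates. The monitor values and the benzene/CO ratio below are hypothetical placeholders, not measurements:

    # Sketch: scale measured CO by an assumed pollutant/CO emission ratio
    # to estimate the concentration of a mobile-source pollutant.
    # All numbers are hypothetical placeholders.

    co_ppm = [1.8, 2.4, 0.9, 3.1]     # hourly CO at one fixed monitor
    benzene_per_co = 0.004            # assumed benzene/CO ratio (ppm per ppm)

    benzene_ppm = [c * benzene_per_co for c in co_ppm]
    print(sum(benzene_ppm) / len(benzene_ppm))   # long-term average estimate

In practice, the pollutant/CO ratio would itself need to vary by time of day and season, as noted above, and would be derived from speciated measurements or from the emissions inventory.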
In a broader sense, and for most other compounds, the basic approach adopted by the modeling project is reasonable; that is, the direct atmospheric pathway is the most important route of airborne exposure. There are numerous ways to model transport and transformation in the atmosphere. The approach chosen for this project uses modified Gaussian plume models to predict the spatial distribution of pollutants near the source. The advantages of this approach are that it is relatively simple and requires only modest computational resources; that it includes a reasonable approximation of the chemical fate of these materials in the environment; and that it provides adequate spatial resolution near major sources. The disadvantages of this approach are that the Gaussian models do not work well in certain geographical locations and under certain meteorological conditions, and that the models are not appropriate for predicting the fate of pollutants more than 50 kilometers away from the source. The limitations of location and meteorology are not fully discussed in the document. Some suggestions for further discussion of these issues are presented below in section 3.3.

3.2 Proposed toxics emissions inventory

The Committee has some doubts concerning the selection of 1990 as the base year for the inventory. Although EPA staff indicated that this year would provide an appropriate baseline for comparison with future-year exposures, the IHEC is concerned that, because of major changes to gasoline composition that occurred in 1992 (wintertime only, in approximately 40 urban areas with high CO levels) and in 1995 (in nine high-ozone areas), EPA's assessment may overstate exposures to benzene and aromatic hydrocarbons, and understate exposures to formaldehyde and MTBE. To complicate matters further, motor vehicle fuels continue to be reformulated in a variety of ways.

EPA staff indicated that census tract-level modeling was needed to meet project objectives, specifically to answer questions related to environmental justice (i.e., are minority populations experiencing higher exposure to hazardous air pollutants?). Lower-income populations are likely to drive older cars with higher emissions. Furthermore, these vehicles are likely to be higher-mileage, less well-maintained, lower-cost models more prone to malfunctions, which may result in increased air emissions. These differences in mobile source emissions by census tract need to be accounted for in the modeling of ambient air concentrations. In addition, some area source emissions may be lower if lower-income neighborhoods repaint their homes less frequently.

The data used to construct the inventory for incinerators capture only 400 "off-site" incineration facilities. Other incineration activities occurring "on-site" at cement kilns, etc., may not be included. Other smaller incinerator sources (e.g., medical waste incinerators at hospitals) may also be missing from the inventory.

New profiles for heavy-duty diesel engine exhaust have been published by Sagebiel et al. (1996), based on measurements in the Fort McHenry and Tuscarora tunnels. Existing profiles for heavy-duty diesel engine exhaust VOC speciation were characterized as uncertain in the preliminary modeling that the SAB reviewed.

3.3 Dispersion modeling techniques

As noted above in section 3.1, Gaussian models do not work well under certain meteorological conditions and at certain geographical locations.
Specifically, problems can occur at low wind speeds, during highly unstable or stable conditions, and when the source is in complex terrain and/or near a shoreline. Gaussian models are also an awkward formalism for the inclusion of particle deposition when the particles are large and their trajectories are determined as much by gravitational settling as by advection.

The failure of Gaussian models under meteorological extremes is well documented (Weiss, 1985). During strongly unstable conditions, the plumes from tall stacks descend quickly to the surface under the influence of large-scale convective eddies. The result is that actual ground-level concentrations are systematically higher (typically by factors of about 2) and occur closer to the source than those predicted by Gaussian models. The EPA/American Meteorological Society workshops have recognized this for some time, and have proposed alternative models for use in predicting short-term, peak, ground-level impacts from tall stacks during unstable conditions.

The Gaussian models of plumes from tall stacks can also under-predict the ground-level concentrations at night during stable and moderately stable conditions. The Gaussian model predicts that the plume, in effect, is isolated from the ground and that most of the pollution passes overhead and out of the modeling domain. However, this is frequently not true at night. Surface-induced wind shear can enhance mixing of the plume toward the ground, resulting in ground-level concentrations that can be orders of magnitude greater than those predicted by the Gaussian model. Under these circumstances, the models do not predict well.

Current dispersion models also do not generally include the processes of deposition, washout, or resuspension. This may be a particular problem for environmentally persistent hazardous air pollutants found in the particulate phase.

Finally, the fact that the Gaussian models assume superposition of steady-state plumes also leads to under-predictions near the source during very light, variable winds. This "sloshing" effect is not conceptually consistent with the superposition assumption; that is, a Gaussian plume is not allowed to turn back on itself during the modeling period.

The effect of these systematic under-predictions by Gaussian models of the plumes from tall stacks is offset to some extent by the fact that predictions of long-term averages include a high proportion of neutral or near-neutral conditions over the year. The fact that moderately stable conditions occur more frequently than highly unstable conditions and light wind conditions implies that the shear-induced turbulence effects are probably a greater confounder of the annual average predictions from Gaussian models than are the convective eddy effects or the "sloshing" effects.

Numerically based grid models, as contrasted with Gaussian plume models, can deal with the "sloshing" effects explicitly and also provide a flexible algorithm for dealing with particle deposition. In general, however, they tend to be more computer-intensive than the simple Gaussian models. One exception is the "Wyndvalley" model, which in its simplest form is somewhat limited in wind direction resolution, but can be implemented with only wind speed information. These uncertainties and their potential impacts on estimated annual averages should be carefully considered and evaluated (see, in particular, pages 5-2 through 5-4 of the document).
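For reference, the steady-state Gaussian plume model discussed above has the standard textbook form for a continuous point source of strength Q at effective stack height H, with reflection at the ground:

    C(x, y, z) = \frac{Q}{2\pi u \,\sigma_y(x)\,\sigma_z(x)}
                 \exp\!\left(-\frac{y^2}{2\sigma_y^2}\right)
                 \left[\exp\!\left(-\frac{(z-H)^2}{2\sigma_z^2}\right)
                     + \exp\!\left(-\frac{(z+H)^2}{2\sigma_z^2}\right)\right]

where u is the mean wind speed and σ_y, σ_z are empirical dispersion coefficients that depend on downwind distance and atmospheric stability. The failure modes described above (light variable winds, strong convection, shear-enhanced nighttime mixing) are precisely the conditions under which the steady-state and homogeneous-turbulence assumptions embedded in u, σ_y, and σ_z break down.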
3.4 Proposed methods for model evaluation

In the supplied documentation for the cumulative exposure model, the methods proposed for evaluating model performance were directed at answering three specific questions:

a) How reliable is the national HAPs inventory as an input to the cumulative exposure model?

b) How sensitive are exposure predictions to the manner in which the source data are mapped to a receptor region and integrated with the dispersion model?

c) How well do the concentration predictions of the ASPEN (Assessment System for Population Exposure Nationwide) dispersion and mapping modules compare to observed HAPs concentrations in areas where such measurements have been made?

The first question is addressed by comparing the cumulative exposure source inventory to other national and state inventories for particular HAPs and groups of HAPs. The second question is addressed using a sensitivity analysis to assess how alternate model formulations and variations of input data (other than emissions data) impact exposure predictions. The third question is addressed by comparing annual average observed concentrations at four different U.S. sites with ASPEN predictions.

3.4.1 Reliability and validity

In considering whether these methods are appropriate, the Committee addressed two questions: are the methods proposed appropriate, and are they sufficiently comprehensive to inform decision makers about the capabilities, limitations, and overall reliability of the cumulative exposure process? With regard to these questions and the characteristics of the proposed total exposure models, the methods proposed for performance evaluation appear to be appropriate. However, the proposed performance evaluation methods are restricted both in scope and in number. These restrictions limit the robustness of any conclusions derived regarding the overall reliability, accuracy, and precision of the exposure model predictions. In the following discussion, the IHEC provides examples to illustrate this summary finding.

In making these comments, the Committee recognizes that model evaluation is a tiered and interactive process that typically involves several steps, including verification, validation, sensitivity analysis, and uncertainty analysis. We consider the extent to which these steps have been incorporated in the proposed cumulative exposure methodology.

Verification is a model evaluation process that poses the question "does the model do what it is designed to do?" This process involves a careful audit of the model assumptions and algorithms and extensive model testing and de-bugging. The documentation provided to the Committee was not specific on how the model verification process has been, and will be, integrated into the overall model development process. In order to facilitate the verification process, it is important to keep models simple. A model should be no more complex than is necessary. Simple model constructs are more reliable and easier to evaluate and verify, although not necessarily more accurate or precise.

One approach that expedites the model verification process is to engage in model comparison or "round-robin" exercises. A useful example of a recent multimedia model comparison exercise is presented in a recent book (Cowan et al., 1995) describing a Society of Environmental Toxicology and Chemistry (SETAC) exercise in which several multimedia fate models were compared and verified against each other by having all the models applied to the same problem.
Such an exercise is feasible for the EPA's proposed cumulative exposure model for air toxics, since the Dutch government has recently issued a similar model (RIVM, 1994). This model, called the Uniform System for the Evaluation of Substances (USES), provides a single framework for comparing the potential risks of different chemical substances released to multiple media of the environment. It is an integrated modeling system that includes multiple environmental media and multiple human exposure pathways. The exposure assessment in USES starts with substance release rates to water, soil, and air during the various life-cycle stages of a substance and follows its subsequent distribution in the total environment. Establishing some contact with the RIVM group would be one way to evaluate and improve the reliability of the model-development process.

3.4.2 Sensitivity and uncertainty analysis

The ultimate goal of a sensitivity analysis is to rank the input parameters on the basis of their contribution to variance in the output. Sensitivity analyses can be either local or global with respect to the range of model outcome values. A local sensitivity analysis is used to examine the effects of small changes in parameter values at some defined point in the range of outcome values. A global sensitivity analysis quantifies the effects of variation in parameters over their entire range of outcome values and requires a quantitative uncertainty analysis as a starting point. The sensitivity analyses in the cumulative exposure methodology can be characterized as local sensitivity analyses, which are useful for assessing model performance about some single outcome value (such as a median) but do not characterize model sensitivity at the margins.

Describing uncertainty in an output variable Y, such as cumulative dose, involves determining the range of Y, its arithmetic mean value, the arithmetic or geometric standard deviation of Y, and the upper and lower quantiles of Y (such as the 5% lower bound and 95% upper bound), as well as how this range maps to the distribution of model input values. The uncertainty analysis provided in the methodology report is primarily qualitative rather than quantitative; that is, it is primarily a "local" sensitivity analysis. Thus, the uncertainties cannot be characterized in terms of statistical factors such as confidence intervals. The IHEC considers this appropriate at present. However, for the longer-term development of the cumulative exposure framework, EPA should consider how to include a more "global" sensitivity analysis.

3.4.3 Concentration predictions

Validation is the model evaluation process that poses the question of whether the model provides the "truth"; that is, does it make correct predictions? Since many environmental models used in a policy context are applied to make predictions that cannot or should not be validated (e.g., cancer risk predictions), full validation is rarely plausible or possible. However, the reliability of any model can be increased by comparing model predictions to an appropriate intermediate data set. That is, a model predicting variations in human dose may be very difficult to validate, but the intermediate concentrations predicted by the model can be audited. A comparison of outdoor concentrations predicted by ASPEN to HAPs measurements in four areas of the U.S. is provided in the methodology document. This exercise is commendable and on the right track.
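Such a predicted-versus-observed comparison is typically summarized with a few simple performance statistics. The following minimal Python sketch illustrates the kind of summary that could accompany the comparison; the paired values are hypothetical placeholders, not the four-city data:

    # Sketch: summary statistics for paired annual-average concentrations.
    # The observed/predicted pairs below are hypothetical placeholders.

    import math

    observed  = [1.2, 0.8, 2.5, 0.4]    # ug/m3 at monitoring sites
    predicted = [0.9, 1.1, 1.8, 0.7]    # model output at the same sites

    ratios = [p / o for p, o in zip(predicted, observed)]
    mean_bias = sum(p - o for p, o in zip(predicted, observed)) / len(observed)
    within_factor_of_2 = sum(0.5 <= r <= 2.0 for r in ratios) / len(ratios)
    geo_mean_ratio = math.exp(sum(math.log(r) for r in ratios) / len(ratios))

    print(mean_bias, within_factor_of_2, geo_mean_ratio)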
Nevertheless, as a comprehensive validation exercise this approach suffers from some important limitations. The process is not a validation of the overall multimedia exposure characterization; it serves only to validate the ASPEN model predictions of yearly average HAP air concentrations. Since this is a comparison of yearly average concentrations to yearly average predictions, there is little opportunity to evaluate the extent to which uncertainties and ignorance regarding factors such as emissions rates, speciation, dispersion, secondary formation, deposition, and re-emission from soil are responsible for any observed differences between observed and predicted concentrations. It is important to consider data sets that would make possible a more precise and defensible allocation of the sources of variance between prediction and observation.

A "check list" should be developed that can be displayed as part of the computer output. This list should contain specific caveats, including the following, all of which could contribute to variances between predicted and "true" values:

a) light, variable winds
b) location relative to bodies of water
c) proximity to complex terrain
d) occurrence of wind shear
e) existence of sharp land-use boundaries

If the given modeling domain included any of these conditions as distinctly important relative to the overall average location, an appropriate message would warn the user/evaluator of the possible limitations of the model predictions. Where data are available that indicate a systematic under- or over-prediction by the model for a given compound, that information could also be included.

3.4.4 Overall model evaluation process

A final issue that the Committee considered is that the model evaluation process and report should include more discussion of the limitations and capabilities of the models being considered. For example, the multimedia exposure analyses described in these reports are useful for retrospective analysis, but not for prospective trends analysis. The proposed framework can appropriately be called an integrated multi-media framework, because it includes multiple environmental and exposure media (i.e., outdoor air, indoor air, water, food, etc.); multiple pathways of exposure; and multiple routes of contact (inhalation and ingestion). Nevertheless, as currently constructed, the framework cannot be used for prospective cross-media exposure assessments, because it is not designed to characterize the dynamic exchange of chemicals between air and soil, between air/soil and vegetation or food webs, between air or soil and surface water, between soil and ground water, etc. This issue should not be considered an error, given the objectives of the model, but should be discussed as an inherent limitation so that decision makers do not misinterpret model predictions.

As currently presented, this framework does capture multiple exposures through air, water, food, etc. However, the framework is based on an integration of air modeling and exposure media measurements and thus cannot be used to capture the impact on human exposure of changes over time in air emission rates. Such changes can impact indirect exposures through air/vegetation, air/soil, air/water, etc. transfers in a way that cannot be tracked with this model system.
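To make the local/global distinction drawn in section 3.4.2 concrete, the following minimal Python sketch contrasts the two approaches on a deliberately toy dose model; all parameter values and ranges are hypothetical:

    # Sketch: local vs. global sensitivity analysis on a toy dose model,
    # dose = concentration * intake_rate / body_weight. Values hypothetical.

    import random

    def dose(conc, intake, bw):
        return conc * intake / bw

    base = {"conc": 10.0, "intake": 20.0, "bw": 70.0}

    # Local: perturb one input at a time by +10% around the base point.
    for name in base:
        x = dict(base)
        x[name] *= 1.10
        change = dose(**x) / dose(**base) - 1.0
        print(f"local sensitivity to {name}: {change:+.1%}")

    # Global: sample all inputs at once over their full assumed ranges,
    # then examine the spread of the resulting output distribution.
    random.seed(1)
    samples = sorted(
        dose(random.uniform(1, 20), random.uniform(5, 40), random.uniform(50, 90))
        for _ in range(10000)
    )
    print("5th/95th percentile dose:", samples[499], samples[9499])

The local analysis characterizes behavior near a single outcome value; only the global analysis says anything about the tails, which is why the Committee recommends it for the longer term.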
3.5 Proposed methodology for estimating food ingestion exposures

The Committee urges that a distinction be made between using available data as "case studies" or guides for testing methodology, and actually measuring exposure.

The model's methodology for food ingestion exposures should address the same exposure parameters as the other routes of exposure. We believe that these include:

a) estimates of the times of exposure
b) estimates of the duration of exposure
c) methodology to compute the probability of multiple sources of exposure (e.g., contributions via routes other than ingestion of food)

The proposed methods for estimating central tendency are appropriate. This estimate is intended as a long-term or "lifetime" exposure estimate.

The proposed methods for estimating the probability of exposure along the distribution will require more data and, in particular, appropriate handling of the food consumption data. These methods require incorporation of methodology to identify when exposure occurs, so that the probability of exposure from multiple foods can be computed. In attempting to do Monte Carlo analyses for food consumption, one needs to work directly with the raw data, because the consumption of different foods is not independent - e.g., a person who eats apples may not eat pears. Fortunately, the data exist in a form that allows this; that is, the food consumption data are available in a raw format that captures what foods each individual consumed on each of the three days of the survey. (A minimal sketch of this record-level approach is given in section 3.6 below.)

The IHEC suggests that EPA include water from the same USDA food consumption survey used for foods. It can be subjected to Monte Carlo analysis simply as though it were another food.

Some decisions need to be made as to the suitability of the data for subgroups of the population. If the number of subjects in a subgroup is small, it may not be appropriate to compute percentiles. This is particularly true when looking at ingestion of selected food groups by subgroups of the population. We recommend defining criteria for determining when a subgroup size is inadequate to allow calculations to be made. As was discussed at the public meeting, the USDA data permit each individual's body weight to be incorporated, so no assumptions are necessary regarding body weight.

Care will need to be taken in matching the form of the food for which consumption is reported to the appropriate residue concentration information. The Office of Pesticide Programs has processing studies for many of the pesticides, which may be useful in building an appropriate bridge between the food consumption and residue data.

Dealing with industrial contaminants and heavy metals is a more complex problem, since these agents can be introduced during processing. Where data are available for one set of processed foods, it is possible to assume that levels would be similar in other processed foods. For example, the FDA Total Diet Study (TDS) presents contaminant levels for 234 foods and identifies which foods from the USDA survey each food was assumed to represent. However, the TDS may be more useful as a "case study" than as a statistically representative survey of residues in the U.S. food supply.

In extracting data from existing monitoring databases, EPA should be sure that compounds detected by more than one method are not double counted.
The Agency also needs to develop consistent methods for handling samples that are below the limit of detection, and it would be desirable to estimate the impact of assumptions regarding concentrations below the limit of quantification on the estimates of exposure. For consistency, it will be important either to use the methods proposed in the EPA Guidelines for Exposure Assessment (U.S. EPA, 1992) or to note why other approaches were employed.

Limiting the number of food categories should not lead to an underestimation of exposure. In fact, if all the foods in a category are assumed to contain residues when in fact only a subset does, then exposure will be overestimated.

In dealing with multiple pollutants, the Committee recommends tackling either multiple pollutants or multiple sources first, since it is far too complex to do both at the same time. If multiple pollutants are evaluated, residue levels must be adjusted to reflect potency prior to combining residue levels for different compounds. We recommend that experts in toxicology be consulted to ensure that differences in toxicity and potency are correctly reflected in the calculations.

3.6 Data sources for food consumption amounts and residue levels

The Committee believes that OPPE has identified the most appropriate existing food consumption data and also the available residue information. In particular, the USDA Nationwide Food Consumption Surveys are the best available data for these types of analyses. This survey contains both food consumption information and water intake for about 5,000 individuals per year, and there are currently three survey years available using the same survey design, so they can be combined.

The EPA project staff correctly noted the difficulties in obtaining concentration levels for many compounds. We believe they have identified the existing data and are proposing to take maximum advantage of the available data. Study reports should continue to identify appropriate caveats, especially those situations where additional data are desirable.

In the materials supplied to the Committee (Memorandum: Methodology for the Baseline Cumulative Exposure Analysis for the Ingestion Pathway, June 5, 1995, page 11), it is noted that regional differences in pesticide levels on fruits and vegetables are not likely to occur because such produce is shipped across the country. It is, however, feasible to look at regional variation in food consumption and, in some instances, in residue levels. If regional assessments are of interest, it is possible to apply weighting factors to the pesticide residue data to generate regional estimates. Regional estimates are likely to be important for heavy metals; however, we agree that adequate data are not currently available. Given the limitations of the data, the IHEC suggests that an "example" calculation be included. Thus, although the data may limit the extent to which regional or seasonal changes may be considered, it is possible (for some compounds) to demonstrate the appropriate approaches and to evaluate the impact of variations in contaminant levels for both region and season. This approach would assist in determining the relative priorities for additional data.
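As noted in section 3.5, the record-level structure of the consumption data is what makes a defensible Monte Carlo analysis possible, because each person's exposure is computed from the combination of foods that person actually ate. The following minimal Python sketch illustrates the calculation; the food names, consumption records, and residue levels are hypothetical placeholders, not USDA or TDS values:

    # Sketch: per-person dietary exposure computed from raw consumption
    # records, preserving correlations among foods eaten by the same person.
    # All records and residue levels are hypothetical.

    residue_mg_per_kg = {"apples": 0.05, "pears": 0.02, "milk": 0.01}

    # Each record: foods eaten (kg/day) plus the person's own body weight (kg).
    people = [
        {"bw": 70.0, "intake": {"apples": 0.20, "milk": 0.50}},
        {"bw": 25.0, "intake": {"pears": 0.15, "milk": 0.70}},
    ]

    for i, person in enumerate(people):
        dose = sum(amount * residue_mg_per_kg[food]
                   for food, amount in person["intake"].items()) / person["bw"]
        print(f"person {i}: {dose:.2e} mg/kg-day")

Sampling whole individuals (rather than drawing each food's intake independently from its marginal distribution) is what preserves the dependence structure the Committee describes.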
We would note that the United States Department of Agriculture's Pesticide Data Program (USDA PDP) data are evenly distributed throughout the year; given that produce is not available in equal amounts throughout the year, it will be necessary to apply the USDA-developed statistical weights when conducting seasonal analyses.

Exposures to such contaminants can also result from consumption of home-grown fruits and vegetables. Unfortunately, there have been virtually no analyses of home-grown fruits and vegetables in the United States. In computing aggregate exposure, it may therefore be necessary to assume that home-grown foodstuffs have residues similar to those commercially grown in the same general area of the country. An additional exposure component would be exposure to the homeowner who treats his/her fruits and vegetables with pesticides. This exposure could be estimated from existing worker exposure estimates for the same pesticides, scaled to reflect the less frequent exposure of homeowners. (The Home and Garden Pesticide Use Survey should provide useful data to incorporate into this assessment.) There are frequent situations where EPA analysts would like to know potential exposures due to residues in food from the home garden. Acquiring data to fill this gap might be one area to which high priority should be assigned.

The cited water consumption data from Ershow and Cantor (1989) are quite old, and newer data are available in the same USDA survey that is proposed for use with food consumption data. That survey also contains some information about the consumption of bottled water.

3.7 Methods for dealing with food samples below the limit of quantification

The methods for assigning values to non-detects(1) on page 11 of the Memorandum of the "Methodology for Baseline Cumulative Exposure Analysis for the Ingestion Pathway" are given simply as "... assigning non-detects values of one-half the detection limit, zero, and the detection limit." No further description is given in the following text of how cases with non-detectable values will be assigned to one of the three possible values. It is hard to comment on methods which are not clearly defined.

(1) To be consistent with the OPPE documents, the terms "limit of quantification" and "limit of detection" are used here interchangeably, although, strictly speaking, they are not the same.

Assigning different values to non-detects will result in different estimates for population means of parent distributions. The performance of different estimates depends not only on the methods for assigning values to non-detects but also on other factors such as the type of parent distribution, sample size, and proportion of values under the limit of detection. To illustrate (for this report) how sample means vary under different conditions, a Member of the IHEC developed Monte Carlo simulations using three types of distributions, five sample sizes, and three detection limits. The three distributions employed for the simulation work were the normal, uniform, and exponential distributions. Sample sizes were 5, 10, 25, 50, and 100. The percentages of values at or below the limit of detection were set at 25%, 50%, and 75%. The results of these Monte Carlo simulations are shown in Appendix A. In addition to the three methods of assigning non-detects, sample means were also calculated by excluding all non-detects, a method cited in some publications as a procedure to treat non-detects. From Appendix A, the bias of the different estimates can be easily observed when sample means are compared with population means.
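A condensed sketch of that simulation design follows. It is an illustrative reconstruction, not the Member's original code: the detection limit is set at the empirical quantile giving each target censoring proportion, and sample means are computed under each treatment of non-detects.

    import random
    import statistics

    rng = random.Random(42)

    # Parent distributions matching the Appendix A design.
    distributions = {
        "normal":      lambda: rng.gauss(10.0, 3.0),
        "uniform":     lambda: rng.uniform(0.0, 1.0),
        "exponential": lambda: rng.expovariate(1.0),
    }

    def detection_limit(draw, censor_prop, n=100000):
        # Empirical quantile of the parent distribution used as the
        # detection limit that censors roughly censor_prop of values.
        values = sorted(draw() for _ in range(n))
        return values[int(censor_prop * n)]

    for name, draw in distributions.items():
        for prop in (0.25, 0.50, 0.75):
            dl = detection_limit(draw, prop)
            for size in (5, 10, 25, 50, 100):
                sample = [draw() for _ in range(size)]
                detected = [x for x in sample if x > dl]
                row = {
                    "true mean": statistics.fmean(sample),
                    "exclude":   statistics.fmean(detected) if detected else None,
                    "set to DL": statistics.fmean([x if x > dl else dl for x in sample]),
                    "half DL":   statistics.fmean([x if x > dl else dl / 2 for x in sample]),
                    "zero":      statistics.fmean([x if x > dl else 0.0 for x in sample]),
                }
                print(name, prop, size, row)

Run over many replications, a simulation of this form reproduces the qualitative pattern reported in Appendix A and discussed below: zero-substitution biases the mean downward, exclusion or DL-substitution biases it upward, and half-DL substitution generally falls in between.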
For all three distributions, assigning non-detects to zero has the effect of biasing the estimates (sample means) toward the lower ends of the distributions. Excluding non-detect values or setting non-detects to the detection limit has the effect of biasing the estimates toward the higher ends of the distributions. Replacing non-detects with one-half the detection limit (½ DL) has different effects on different distributions: the estimate is unbiased for a uniform distribution, biased toward the lower end for normal distributions, and biased toward the higher end for exponential distributions. Although ½ DL replacement results in biased estimates for normal and exponential distributions, the biases are relatively small compared with those of the other methods. When the parent distributions are unknown, assigning non-detects one-half the detection limit is probably not a bad choice.

As shown in the simulation results, the deviation of sample means from population means increases as the proportion of non-detects increases. It is logical that the more information we have about the distribution, the more closely we can estimate the central tendency. On the other hand, the less information we have, as in the case wherein 75% of the observed values fall into the non-detectable category, the less accurate the estimates become. Here again, assigning non-detects one-half the detection limit proves to give a reasonable estimate even with increased uncertainty.

When "93% of the food contamination database is comprised of samples with values below detection limits" (p. 3 of the Addendum to the Methodology for Baseline Cumulative Exposure Analysis for the Ingestion Pathway), any method deployed to estimate the central tendency will be highly uncertain. Without knowledge of the parent distributions of the various food contaminants, it is hard to judge the direction and magnitude of the biases. The simulation results illustrated here were based on simple assumptions about distributions and detection limits in order to demonstrate the effects of various factors. In real situations, the conditions are likely to be much more complex than in the simplified simulation presented here. It is not unusual to find a parent distribution that is a multimodal lognormal distribution, i.e., a mixture of two or more lognormal distributions. It is also not uncommon for the detection limit for a given food contaminant to vary from one data source to another, so that the combined data set comprises sample subsets with varying limits of detection. Sensitivity analysis may be useful in showing the range (upper and lower bounds) of estimates for central tendency, but it cannot improve the accuracy and precision of the estimates.

The uncertainty of the contamination data can be reduced with additional information from other sources. If a naturally occurring substance is ubiquitous and its concentration can be determined with better analytical methods, then zero should not be assigned to values reported to be below the detection limit. The same rule should be applied to stable chemicals that have been produced for decades and are distributed worldwide. Non-detects are not true zeros but the results of low-resolution measurements.
In contrast, if a pesticide has been applied only to certain fruits and vegetables in limited regions, it is not unreasonable to assign zero to non-detects for food categories from uncontaminated areas. It will also be helpful to determine the shape of the parent distribution if the type of distribution (e.g., lognormal) has been repeatedly observed in other studies. With supporting information from other sources, the same type of distribution can be assumed for the estimation of central tendency and for Monte Carlo simulations.

3.8 Treatment of uncertainty in the food ingestion exposure estimates

Analytic assumptions and data limitations are the major potential sources of uncertainty listed in Exhibit 8 on page 20 of the June 5, 1995 Memorandum cited earlier. Uncertainties associated with analytic assumptions are further considered for non-detectable contaminant values, consumption values for non-consumers, independence of food consumption values, independence of food contaminant values, and geographic and demographic uniformity of contaminant concentrations. Uncertainties associated with data limitations are discussed for food consumption data and food contamination data, respectively.

Uncertainty related to non-detectable contaminant values has been discussed extensively above and needs no further discussion here. The proposal to assign consumption values to non-consumers mentioned in the June 5 Memorandum (pp. 21-22) was dropped in the May 1996 Addendum (p. 2) attributed to Industrial Economics, Inc.; discussion of this issue is therefore unnecessary.

The assumptions about the independence of food consumption values and food contamination values can easily be tested using the food database. We should not be surprised to find that neither food consumption behavior nor food contaminant concentrations are independent. Vegetarians who consume non-meat products and lactose-intolerant people who avoid dairy products are good examples of people who have a tendency to select or avoid certain types of food. Since not every pesticide is applied nationwide to all fruits and vegetables, and not every food product is distributed evenly across all regions, it should be expected that contaminant concentrations in some kinds of foods are related. Actually, Monte Carlo simulations do not need the assumptions about independence of food consumption and contamination values as long as the surveyed individuals and food samples are independent.

The assumption of geographic and demographic uniformity of contaminant concentrations could be a more serious problem. For example, there are large concentrations of Hispanic and Asian Americans in California and in some other urban areas. They tend to live in city areas with many grocery stores that provide food items imported from Mexico, Central America, or Southeast Asia. Their consumption quantities and the contaminant concentrations of certain food items are very likely to be related. The assumption that contaminant concentrations are the same for different geographic and demographic subpopulations may lower the estimate of exposure level, and hence the associated risk, for certain subpopulations.

Uncertainties associated with food consumption data are addressed under reporting error and bias, non-response bias, and potential bias of three-day samples, but no treatments are suggested to correct the error or adjust the bias. Reporting error and bias from the U.S.
Department of Agriculture's Continuing Survey of Food Intakes by Individuals (USDA CSFII) are not likely to be corrected or adjusted unless an independent quality control activity can be carried out during the survey to check the validity of survey answers against actual food consumption behavior. The CSFII response rate was over 50 percent, a fairly good rate for a national survey of this scale, but not high enough to claim that it is representative. It is very likely that subpopulations from lower socio-economic classes are under-represented; in general, the poorly educated and those with incomes around the poverty level are less likely to respond. This bias can be adjusted by applying different weights to individuals from different subpopulations. The three-day sample of consumption behavior may not be a problem at all if these three days are a typical three-day segment, not different from any other three days in a year. Taking the nation as a whole, if food consumption patterns do not fluctuate significantly from day to day (with the exception of Christmas, Thanksgiving, and similar holidays, and possible variations from weekdays to weekends), any three-day sample should be equally representative.

Uncertainties associated with food contamination data are addressed under food processing factors and sampling error. Nothing is mentioned about laboratory variation, although sampling method, sample storage, preparation, and analysis by different laboratories will certainly add to the variation in final concentration values and may also contribute to variability in reported lower limits of detection. The proposal that processing factors would be developed and applied to the raw food contamination concentrations to estimate processed food concentrations (p. 11 of the Memorandum) was dropped in the Addendum (p. 2) due to the complexity of the task. The Committee suggests that existing raw and processed food concentration data be compiled and compared. Additional data should be collected, where possible, in order to provide a general idea of the range of food processing factors.

The problems of small sample size and over-sampling of suspect contaminants are mentioned in the Memorandum and Addendum. The possible consequences affecting the national estimates are identified, but no corrections are suggested. Insufficient and unrepresentative data are serious problems which cannot be solved by modeling efforts. Unless more nationally representative data for certain food items can be collected, any national estimates based on the limited data could be easily challenged.

3.9 Methodology for estimating drinking water ingestion exposures

The sources of uncertainty in the estimates of exposures via food have been appropriately recognized (Exhibit 8 of the June 5, 1995 Memorandum). Although estimation of some of these uncertainties may be attempted using Monte Carlo simulations, as proposed, others are more difficult to address because no data are available. Also, Monte Carlo simulations are of limited use when most of the data are below detection (93% in the case of foods); such data may not allow estimation of either the parameters of the distribution or the distribution of the underlying variables. If the range of uncertainty calculated for different assumptions about the detection limits is very large, it will not be possible to derive any conclusions about differences among geographic subgroups.
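The width of that uncertainty range can be gauged with a simple bounding calculation, sketched below; the sample values and detection limit are hypothetical.

    # A minimal sketch of the bounding exercise noted above: recompute
    # a mean concentration under the extreme assumptions for censored
    # values (non-detects = 0 versus non-detects = detection limit).
    detection_limit_ppb = 5.0

    # Reported values; None marks a sample below the detection limit.
    reported_ppb = [12.0, None, 7.5, None, None, 9.0, None, None]

    lower = [v if v is not None else 0.0 for v in reported_ppb]
    upper = [v if v is not None else detection_limit_ppb for v in reported_ppb]

    lo_mean = sum(lower) / len(lower)
    hi_mean = sum(upper) / len(upper)
    print(f"mean bounded between {lo_mean:.2f} and {hi_mean:.2f} ppb")

If the resulting interval is wide relative to the differences between subgroups, conclusions about those differences cannot be supported.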
The assumption of uniformity within racial groups may not be correct (the diet of Hispanics, for example, varies according to place of origin). The result could be a number of contaminants for which the consumption estimates are unreliable (i.e., the variability within a demographic category may be larger than the variability between categories).

3.10 Data sources for drinking water consumption amounts/drinking water contaminant levels

The proposed methodology will utilize water consumption data derived from the CSFII (which is more current than the data described initially in the documentation) and contaminant data reported by water distribution systems. The approach also adopts reasonable methods for accounting for systems/populations that cross county lines. In principle, this approach is justified, but it is limited by the available data. The contaminant data will probably incorporate concentrations of compounds present naturally in the water source (and not removed by treatment), as well as some residues and reaction products from water treatment chemicals. However, these data do not capture contaminants added or removed in the distribution system before the water reaches individual households. This is a data gap that needs to be addressed. Also, since the use of water treatment units at the household level is not rare, information on such units and their frequency of use may be needed. This information might be obtained from companies that sell these units. It is likely, for example, that in areas with hard water and with subpopulations of medium to high income, the use of water softening units is frequent.

Another area of concern is the attribution of contaminant levels from ground water systems serving communities of fewer than 100 households to individual wells (i.e., households whose water is derived from their own well). Given that many individual household wells have probably not been drilled and maintained appropriately (a problem that is less likely to occur for any ground water distribution system), these wells are likely to have higher levels of contamination. In some areas of the country, these individual wells are very common (e.g., the area along the border with Mexico). It might be possible to determine whether the health departments in such areas have performed sampling of individually-owned wells. It is likely that water from these wells will be at the high end of the contaminant concentrations for at least some agents.

The use of bottled water also should be considered, since it may be very common in certain areas of the country as well as for some population subgroups, perhaps depending on income level. Part of the reported difference between the U.S. population and the estimated number of water consumers may be due to individuals with second homes. In some counties (e.g., Galveston County, Texas), a very large number of the residences could be second homes.

The IHEC notes that the databases developed for estimating exposure to HAPs via the ingestion of drinking water might also be useful in phases of the project in which inhalation and dermal exposures to HAPs during recreational water contact are estimated.

3.11 Assigning values to drinking water data samples below the limit of quantification

It seems reasonable to assign one-half the detection limit to non-detects under the conditions specified on p. 30 of the June 5, 1995 Memorandum supplied to the Committee.
The performance of the various methods for assigning values to non-detects can best be evaluated with a good and complete actual data set which includes several counties. The methods can be considered reasonable when the detection limits are not very different from each other. Serious biases could occur if negative results are due to unreasonably high detection limits and not to low concentrations. Taking the detection limits of the arsenic concentrations on the following page (p. 31) as an example, the potential problem is easily recognizable. For a given county, if the detection limit is set at 50 ug/L, then even though the true average concentration is 25 ug/L, all water samples could be below detection. Another county with a detection limit of 1 ug/L could have an average concentration of 10 ug/L, and most water samples would be classed as above detection. The method of assigning zero to non-detectable values would then result in an average concentration of zero for the first county and around 10 ug/L for the second county.

It is a common practice to substitute an average or regressed value for a missing value and then adjust the degrees of freedom of the test statistics accordingly. Substituting missing values in geographic areas or categories is reasonable only if few areas or categories have no monitoring data. The method of substitution will be highly unreliable if most data for specific areas or categories are missing.

3.12 Treatment of uncertainty in the drinking water ingestion exposure estimates

It is clearly stated on p. 38 of the June 5, 1995 Memorandum cited earlier that "the results of the drinking water analysis are based on assumptions and statistics derived from incomplete, potentially inaccurate, and highly variable data." However, no uncertainty analysis has been conducted due (the Committee has been informed) to resource limitations. To reduce the uncertainties associated with the drinking water data, the first necessary step is data validation. State-reported data with coding errors, duplicate records, or only positive values reported should be treated appropriately before the data are analyzed. Substitution or extrapolation should be employed only when the exercise is reasonable. For national estimates, there is no need to fill in a concentration value for every one of the more than three thousand counties. A few hundred good and representative data points will enable EPA to do a reasonable job of estimating central tendency; additional data points from unreliable substitutions cannot increase the accuracy and precision of the estimate. It is reasonable and entirely acceptable to estimate central tendency based on good data and to address limitations and uncertainties along with the estimate.

4. CONCLUSIONS

4.1 Science underlying the basic approaches and findings

Although it is not, strictly speaking, a scientific issue, the Committee wishes to commend the Agency and the contractor staff involved for the quality of the documentation provided for the review. The materials submitted to the IHEC were well written and provided a good overview of the framework and the components of the project, as well as explicating the thinking and decisions made about the approaches taken in the various components, e.g., outdoor air toxics and indoor air toxics.
The Committee believes that the overall conceptual framework for the Cumulative Exposure Project is scientifically sound and provides a strong basis for a more integrated assessment of population exposures to toxic pollutants and the underlying contributors to exposure and, ultimately, a basis for comparisons of exposures to multiple pollutants in all media across geographical and demographic groups. Although the project is highly ambitious and some parts will be hampered in the near term by limitations in some of the measurement data, the overall direction and scope of the project will provide a more strategic means of evaluating exposures to toxic pollutants than does the chemical-by-chemical approach that is currently being used by the Agency. The framework is also broadly applicable for assessing human exposures to environmental pollutants in general.

The Agency and its contractors have made a good start on the project. They have systematically begun to gather and integrate the necessary databases and have made reasonable decisions on the approaches to be taken for various components of the project. They have recognized many of the key issues and problems to be addressed and have utilized existing information, expertise, and good judgment in their efforts to address these. In the near term, the project is likely to provide the first estimates of the mean exposures of the U.S. population to a range of toxic pollutants emitted into the environment. As better measurement databases are developed, the cumulative exposure framework should be able to provide a means for assessing differences in exposures across regional and demographic sub-groups and, ultimately, for identifying sub-populations with very high exposures. This will enable the Agency to target its efforts to protect human health in a more effective way than has been possible in the past.

In order to achieve fully the objectives of the Cumulative Exposure Project, certain critical additional measurement data will be needed. The Cumulative Exposure Project, by its very nature, provides an excellent means of identifying the most critical kinds of measurement data needed to assess population exposures to toxic pollutants. Because of the nature of EPA's mission and organization, model development efforts tend to be separated from environmental measurement efforts. The scientific method requires a more integrated and iterative use of models and measurements: models are used to organize and interpret measurement data and then to design the next measurement experiments, and measured data are then used to test and further develop the models. The IHEC strongly urges the Agency to commit the measurement resources that will be needed for this project and to encourage and reward collaboration between the scientists who develop models and those who make environmental measurements.

We also encourage the Agency to begin to examine ways in which environmental data collected for regulatory purposes might be gathered so as to make these data simultaneously useful for scientific purposes. Many of the environmental measurements that are currently made, at great expense, for regulatory compliance cannot be used to advance our understanding of environmental pollution because of missing data, large numbers of data points below the limits of detection, or the design of the sampling strategy.
With some thought, it should be possible to begin to develop improved guidelines for the collection of some environmental data so that they can be used for the dual purposes of regulatory compliance and advancement of environmental science.

A major, and more long-term, challenge for the Agency in this project, and for the scientific community in general, will be to develop scientifically defensible means of combining the exposures to multiple toxics in a manner that provides meaningful ways of assessing potential health risks from the combined exposures to many chemicals. In order to estimate exposures to more than one chemical, a health metric (e.g., toxicological potency) must be incorporated; a measure of potency is required to scale exposures to a common reference point so that total exposure is meaningful. This aspect of the project is one that will require time, thought, and scientific creativity. The Cumulative Exposure Project is very likely to provide a strong driving force to go beyond the current, oversimplified chemical-by-chemical approach to environmental health risk assessment, and to begin to address environmental health risks in a way that recognizes that the population is simultaneously exposed to multiple environmental pollutants and that there is variability in the exposure of the population.

4.2 Research needs

The goals set for the project are very high, and achieving even some of the objectives would be a very significant step toward integrating exposures across different media and various chemicals. Given that resources are limited and that many of the required databases are incomplete and highly variable, it is not surprising that the Committee's review of this project identified several areas that require additional research and/or additional data.

A key methodological question that requires further thought and research relates to the problem of attempting (ultimately) to establish priorities for remedial or protective actions based only on existing data captured in current sampling programs (e.g., in this effort, looking at exposure to compounds that are currently sampled in either the FDA program or the PDP). Reliance on such data cannot identify compounds that may be of potential concern but that have not been identified and included in a monitoring program. It appears that for most chemicals, the available data will not allow a realistic estimate of the distributions of exposure from multiple sources; in fact, we note that there are only three or four compounds for which residue data are available for all routes. Research to "fill in the blanks" for agents about which there is at least some information indicating concern would be a logical first priority, with data-gathering on a wide variety of other agents that might prove to be of concern as a secondary priority.

Finally, it is worth noting that many of the difficulties encountered by this project stem from the nature of the data sets acquired to carry out the analyses. State data sets are collected primarily to determine compliance with state environmental laws, which vary from state to state. It is not unusual that they also differ in reporting format and detection limits even if the methods of sampling and analysis are the same.
Although this is more of an operational issue than a specific research need, efforts should be made to coordinate states' activities on data collection and reporting so that state data are compatible and can be correctly and easily integrated for scientific purposes.

5. REFERENCES

Cowan, C.E., Mackay, D., Feijtel, T.C.J., Van de Meent, D., Di Guardo, A., Davis, J., and Mackay, N. 1995. The Multi-Media Fate Model: A Vital Tool for Predicting the Fate of Chemicals. Society of Environmental Toxicology and Chemistry, Pensacola, FL.

Ershow, A.G., and Cantor, K.P. 1989. Total water and tapwater intake in the United States: Population-based estimates of quantities and sources. Report prepared under National Cancer Institute Order #263-MD-810264. Life Sciences Research Office, Bethesda, MD.

Lawson, D.R. 1990. The Southern California Air Quality Study. J. Air Waste Manage. Assoc. 40:156 et seq.

RIVM. 1994. Uniform System for the Evaluation of Substances (USES), Version 1.0. National Institute of Public Health and Environmental Protection (RIVM), Ministry of Housing, Spatial Planning and Environment (VROM), and Ministry of Welfare, Health, and Cultural Affairs (WVC), The Hague, The Netherlands. VROM Distribution No. 11144/150.

Sagebiel et al. 1996. Real-world emissions and calculated reactivities of organic species from motor vehicles. Atmos. Environ. 30(12):2287-2296.

U.S. EPA. 1992. Guidelines for exposure assessment. U.S. Environmental Protection Agency, Office of Research and Development, Washington, DC. EPA/600/Z-92/001.

Weiss, J.C. 1985. Updating applied diffusion models. J. Climate Appl. Meteorol. 24(11):1111-1130.

APPENDIX A

Monte Carlo Simulations on Detection Limits

A. Design of Monte Carlo Simulations

Three distributions:
Normal distribution (mean = 10, standard deviation = 3)
Uniform distribution (mean = 0.5, standard deviation = 0.29)
Exponential distribution (mean = 1, standard deviation = 1)

Sample sizes: N = 5, 10, 25, 50, 100

Percentages of values under the detection limit: 25%, 50%, 75%

Methods of assigning non-detects:
Y - no detection limit is set
Y1 - non-detects are excluded
Y2 - value of non-detects = detection limit
Y3 - value of non-detects = one-half the detection limit
Y4 - value of non-detects = zero

B. Abbreviations of column labels:
POPUEX: population mean
POPUSD: population standard deviation
BLOCPROP: proportion of observations under the detection limit
SIZE: sample size set for the simulation work
YN, YEX: sample size and sample mean of Y
Y1N, Y1EX: sample size and sample mean of Y1
Y2N, Y2EX: sample size and sample mean of Y2
Y3N, Y3EX: sample size and sample mean of Y3
Y4N, Y4EX: sample size and sample mean of Y4
NORMAL (POPUEX = 10.00, POPUSD = 3.00)

BLOCPROP  SIZE    YEX   Y1N   Y1EX   Y2EX
  .25       5    9.86     3  12.72  10.82
  .25      10   10.24     9  11.17  10.85
  .25      25    9.96    16  11.79  10.41
  .25      50   10.28    41  11.23  10.65
  .25     100    9.41    70  10.82   9.97
  .50       5    9.86     2  14.29  11.71
  .50      10   10.24     7  11.82  11.28
  .50      25    9.96    14  12.30  11.29
  .50      50   10.28    26  12.45  11.28
  .50     100    9.41    42  12.01  10.84
  .75       5    9.86     2  14.29  12.93
  .75      10   10.24     3  12.75  12.24
  .75      25    9.96     7  13.67  12.49
  .75      50   10.28    16  13.57  12.52
  .75     100    9.41    16  13.57  12.27

(YN, Y2N, and Y3N equal SIZE in every row.)

UNIFORM (POPUEX = 0.5, POPUSD = 0.28868)

BLOCPROP  SIZE      YEX   Y1N     Y1EX     Y2EX
  .25       5   0.51444     3  0.78912  0.57347
  .25      10   0.46040     7  0.58413  0.48389
  .25      25   0.55898    20  0.65741  0.57593
  .25      50   0.51700    38  0.64986  0.55389
  .25     100   0.48490    79  0.57827  0.50933
  .50       5   0.51444     3  0.78912  0.67347
  .50      10   0.46040     5  0.65501  0.57751
  .50      25   0.55898    16  0.73974  0.65343
  .50      50   0.51700    26  0.77568  0.64335
  .50     100   0.48490    42  0.75345  0.60645
  .75       5   0.51444     2  0.92584  0.82034
  .75      10   0.46040     1  0.85061  0.76006
  .75      25   0.55898     8  0.89734  0.79715
  .75      50   0.51700    16  0.87910  0.79131
  .75     100   0.48490    21  0.89267  0.77996

(YN, Y2N, and Y3N equal SIZE in every row.)

EXPONENTIAL (POPUEX = 1, POPUSD = 1)

BLOCPROP  SIZE      YEX   Y1N     Y1EX     Y2EX
  .25       5   0.83826     4  1.00358  0.86040
  .25      10   0.84186     8  1.03312  0.88404
  .25      25   0.97578    20  1.19375  1.01254
  .25      50   1.03301    38  1.30930  1.06411
  .25     100   1.04594    85  1.20061  1.06367
  .50       5   0.83826     4  1.00358  0.94149
  .50      10   0.84186     4  1.63465  1.06975
  .50      25   0.97578    13  1.58887  1.15892
  .50      50   1.03301    30  1.53215  1.19655
  .50     100   1.04594    62  1.46806  1.17359
  .75       5   0.83826     0     -     1.38629
  .75      10   0.84186     2  2.07587  1.52421
  .75      25   0.97578     7  2.15047  1.60026
  .75      50   1.03301    11  2.45019  1.62035
  .75     100   1.04594    20  2.39901  1.58884

(YN, Y2N, and Y3N equal SIZE in every row.)