Publication No: 820-R-14-003

FINAL PEER REVIEW REPORT

External Peer Review of EPA's Draft Document
Fish Consumption Rates

Peer Reviewers:
Patricia M. Guenther, Ph.D., RD
Dale Hattis, Ph.D.
Kenneth M. Portier, Ph.D.
Janet A. Tooze, Ph.D., M.P.H.

Contract No. EP-C-13-010
Task Order 2013-05

February 1, 2014

TABLE OF CONTENTS

I. INTRODUCTION
II. CHARGE TO REVIEWERS
III. GENERAL IMPRESSIONS
IV. RESPONSE TO CHARGE QUESTIONS
    Charge Question 1
    Charge Question 2
    Charge Question 3
    Charge Question 4
    Charge Question 5
    Charge Question 6
    Charge Question 7
    Charge Question 8
    Charge Question 9
    Charge Question 10
V. INDIVIDUAL REVIEWER COMMENTS
    Patricia M. Guenther, Ph.D., RD
    Dale Hattis, Ph.D.
    Kenneth M. Portier, Ph.D.
    Janet A. Tooze, Ph.D., M.P.H.
Attachment A: EPA's Draft Document "Fish Consumption Rates"
Attachment B: Mark-up of Draft Document by Patricia M. Guenther, Ph.D., RD

I. INTRODUCTION

In October 2000, the Environmental Protection Agency's (EPA) Office of Water (OW) published a document titled, "Methodology for Deriving Ambient Water Quality Criteria for the Protection of Human Health." This document presented EPA's recommended methodology for developing human health ambient water quality criteria (HHAWQC) as required under Section 304(a) of the Clean Water Act (CWA). For each pollutant, chronic criteria were derived to reflect long-term consumption of fish and water. The fish consumption rate recommended for use in calculating human health criteria in the 2000 Methodology was derived from an analysis of the 1994 to 1996 data from the USDA's Continuing Survey of Food Intake by Individuals (CSFII) Survey.
As fish consumption may have changed over the past decade and new analytical methodologies have been developed, OW has conducted a new analysis of fish consumption rates (FCR). These new FCRs were estimated using data from the National Health and Nutrition Examination Survey (NHANES) 2003-2010. NHANES is a continuous survey designed to collect data on the health and nutritional status of the U.S. population. EPA's draft document (Attachment A) presents the methodologies used to extract fish consumption data from the NHANES datasets, including the habitat apportionment methodology, the trophic level assignment methodology, and the statistical methodology using a modified version of the NCI Method. EPA intends to use the analyses of the NHANES data described in the document for peer review to update the general population fish consumption rate recommendations in the 2000 Methodology. EPA also intends to provide the data publicly for use by states and tribes in generating more site-specific HHAWQC with more localized data.

The purpose of the requested letter review was for EPA to receive written comments from individual experts on the scientific merit of the document, appropriateness of the assumptions made, methods utilized, and quality and relevance of the data.

Peer Reviewers:

Patricia M. Guenther, Ph.D., RD
Guenther Consulting
Salt Lake City, UT

Dale Hattis, Ph.D.
Clark University
Worcester, MA

Kenneth M. Portier, Ph.D.
American Cancer Society
Atlanta, GA

Janet A. Tooze, Ph.D., M.P.H.
Wake Forest School of Medicine
Winston-Salem, NC

II. CHARGE TO REVIEWERS

The National Water Program Guidance for fiscal year (FY) 2011 describes how the Environmental Protection Agency (EPA), states, and tribal governments will work together to protect and improve the quality of the Nation's water, including wetlands, and ensure safe drinking water.
The Guidance describes the key actions needed to accomplish the public health and environmental goals proposed in the EPA 2010-2015 Strategic Plan. These goals are: Protect public health by improving the quality of drinking water, making fish and shellfish safer to eat, and assuring that recreational waters are safe for swimming.

Human health ambient water quality criteria (HHAWQC) for chemical pollutants are derived to establish ambient concentrations of pollutants which, if not exceeded, will protect the general population from chronic adverse health effects from those pollutants due to consumption of aquatic organisms and water. The procedures for calculating HHAWQC were described in the EPA's "Methodology for Deriving Ambient Water Quality Criteria for the Protection of Human Health" (2000 Methodology) EPA-822-B-00-004 (USEPA 2000). For each pollutant, chronic criteria are derived to reflect long-term consumption of fish and water.

The fish consumption rate recommended for use in calculating human health criteria in the 2000 Methodology was derived from an analysis of the 1994 to 1996 data from the USDA's CSFII Survey. The recommended fish consumption rate of 17.5 g/day represents the 90th percentile of the 1994-96 CSFII data for the adult population (Jacobs, et al). This value also represents the uncooked weight estimated from the CSFII data, and represents intake of freshwater and estuarine finfish and shellfish only.

The EPA believes that States and authorized Tribes should have the flexibility to develop criteria, on a site-specific basis, that provide additional protection appropriate for different or unique populations. The EPA is aware that exposure patterns in general, and fish consumption in particular, vary substantially. The EPA understands that unique or highly exposed populations may be widely distributed geographically throughout a given State or Tribal area.
The EPA recommends that priority be given to identifying and adequately protecting these unique and highly exposed populations.

The report submitted for peer review documents an analysis of the 2004-2010 NHANES data on fish consumption and the association of fish consumption with geography, age, sex, ethnic race, and income.

Charge Questions:

1. Is the document logical, clear and concise? Explain. If not, how could the document be improved?
2. Were scientific and statistical assumptions explained and are they appropriate? Explain.
3. Has appropriate literature been cited? Explain. Are there publicly available, peer-reviewed papers that should be included? Explain.
4. Is the methodology as presented and defined in the report scientifically appropriate for meeting the objectives of the project? Additionally and specifically:
   a. Please comment on methods for calculating fish consumption rates.
   b. Please comment on the means for combining fish frequency data.
   c. Please comment on the method used to apportion species.
5. Please comment on appropriateness of the models used for estimating fish consumption rates, focusing on both the "NCI method" and the "modified EPA method."
   a. Is the EPA method clearly described and supported? Explain.
   b. Are uncertainties in the EPA model identified and characterized? Explain.
6. Is the EPA method adequate for accomplishing the objective? Explain.
7. Specifically in regards to the analysis:
   a. Were sufficient information and explanations given that describe how the data were used and what criteria were used to determine the suitability of the data? Explain.
   b. Were these criteria adequate? Was the methodology appropriate? Explain. If not, how could the methodology be improved?
8. Are the results presented in the report understandable and appropriate for meeting the objectives of the project? Explain.
   If not, how could the presentation of the results be improved?
9. Are scientific uncertainties explained and are they appropriate? Explain.
10. The data used in the analysis have been subdivided based on demographic and geographical characteristics of the respondents. Are the subsets of data sufficiently robust to characterize fish consumption within the subgroups for the purposes stated in the report? Please provide your response for each of the major subgroup categories included in the main body of the report.

III. GENERAL IMPRESSIONS

Patricia M. Guenther, Ph.D., RD

In general, the methods and procedures should be clear enough so that they could be independently reproduced; this is not the case for how the dietary data were handled.

It is not possible to judge the accuracy of the information presented because it is impossible to know exactly what types of fish and the exact amounts of fish that were consumed by the survey participants. One must assume that the reports of 24-hour dietary intake were accurate, precise, and unbiased; and this should be stated in the report. The limitations of the standardized recipes used for mixed dishes were not mentioned. This probably is not an important factor because most fish are probably not consumed as part of a mixed dish; however, it should be mentioned.

It is not stated anywhere that the amounts presented in the tables are uncooked amounts of fish. How the cooked amounts reported by survey participants were converted to uncooked amounts is unclear. It is also unclear if the uncooked amounts are for the edible portion of fish or for the entire fish.

I leave it to the statisticians to decide if the statistical methods used are clear and sound; however, it does seem that the modified NCI method yielded results that are fit for use in terms of how close they are to estimates from the original NCI method.

Dale Hattis, Ph.D.
This is a very good piece of work, applying very sophisticated statistical methods to the available data. However, it could be improved by adding a discussion chapter that analyzes and summarizes the findings relevant to risk assessment. I have done some preliminary analysis of geometric means and geometric standard deviations for total fish consumption from probability plots of the percentile information (see the table below). Using this kind of analysis, the reader could be informed, for example, that among racial groups, the "other race" category stands out as having higher overall fish consumption than other races. I assume this is due to the inclusion of Native Americans in that group, some of whom are subsistence fishers and are particularly at risk for high consumption of locally-caught fish and shellfish. It is also of interest that women of child-bearing age have slightly smaller geometric mean consumption but a greater apparent interindividual variability in consumption than other age/sex groups.

Another aspect that could be improved would be to provide an additional set of data tables in which the dependent variable was not raw grams consumed per day per person, but grams consumed per kilogram of body weight. This could be readily done using the same methodology because the NHANES data include individual body weights.

Finally, I think it would be helpful to show calculations of geometric standard deviations by the various breakdowns in the detailed tables so that (1) the reader could appreciate which groups have more or less variability in fish consumption and (2) comparisons could be made to long-term biomarkers of fish consumption, such as methylmercury and PCB blood concentration distributions. These latter statistics may be in part available from other measurements in the NHANES data. In addition, I published some older data on these variables:

Hattis, D.
and Burmaster, D. E. "Assessment of Variability and Uncertainty Distributions for Practical Risk Analyses" Risk Analysis, Vol. 14, pp. 713-730, 1994.

Table of Results of Lognormal Fitting to the Consumption Percentiles for All Fish (Based on Data from Table 6a)

Group             Geom. Mean (g/day)   Geom. Std. Dev.
All adults        14.61                2.247
Males             17.02                2.216
Females           13.03                2.216
Women 13-49        9.66                2.512
21-35             11.56                2.498
35-<50            14.62                2.172
50-<65            20.33                2.025
65+ yrs           13.21                2.218
Non-Hisp White    13.67                2.231
Non-Hisp Black    16.78                2.090
Other Race        27.39                2.044

Kenneth M. Portier, Ph.D.

Overall, I find the report readable, on topic, and comprehensive. There are very few areas needing major revision, and the writing is clear and concise with very, very few spelling errors. This said, I do see an alternate way of reorganizing the information in Chapter 4 to improve flow and understanding (see responses to charge questions 1 and 3 specifically).

Janet A. Tooze, Ph.D., M.P.H.

I found the layout of this report to be presented in a logical, clear, and concise manner. The classification of the fish groupings from the 24-hour recall data appeared to be done appropriately using the NHANES data as well as other sources. Being able to obtain the information from NHANES on geographical region is a strength. The tables are clearly presented and are provided for a broad range of fish type and subgroup. The document demonstrated a sound understanding of the NCI method. However, there are serious concerns about the validity of the estimates produced by the modified EPA method. In particular, this method makes a number of approximations to the NCI method, but it does not fully explore the implications of each of these approximations, nor does it fully justify the approximations that are made.
Furthermore, details were lacking regarding some of the statistical methods including: validation of the modified EPA Method, construction of BRR weights, inclusion of covariates in models, and construction of subgroup estimates. From the report, it is not apparent that the time savings from making a number of approximations in the modified EPA method are worth the potential bias and loss of efficiency in the estimates produced. The dataset that was constructed of fish consumption for NHANES participants appeared to be developed making reasonable assumptions, and I have no concerns about the dataset used. I am concerned that the statistical methods utilized to estimate the distribution of usual fish intake are not well justified, and could lead to biased estimates.

IV. RESPONSE TO CHARGE QUESTIONS

Charge Question 1

Is the document logical, clear and concise? Explain. If not, how could the document be improved?

Patricia M. Guenther, Ph.D., RD

In general, yes; however, the dietary data processing needs to be described more clearly.

Dale Hattis, Ph.D.

Yes. However, it could go into more detail for the non-statistician on the choices of distributional methods. Overall these seem reasonable, and the comment that there is very little difference between log-logistic and lognormal distributions is helpful. It might also be helpful to explain, if it is true, that the logistic distributions were selected for modeling because of greater mathematical tractability than lognormals.

Kenneth M. Portier, Ph.D.

I found the document logically ordered and the writing clear and concise but confusing in a couple of places.

The document defines its objective in the Background section and identifies the major data source in Chapter 2.

Chapter 3 introduces the NCI method, which is again described in Sections 4.4.1 and 4.4.2.
I am not certain why one even needs Chapter 3, since the material in Chapter 3 might be better as a background section in Chapter 4 (or a new Statistical Methods Chapter).

Chapter 4 combines a number of "methods" that could very easily comprise their own chapters. The methods discussion around habitat apportionment (Section 4.1) and trophic level assignment (Section 4.2) could be combined in one chapter describing how fish-related characteristics are used in estimating (stratified) consumption rates. The specific comments to Question 2 suggest some ways that these Sections (or new Chapter) might be better organized. In particular, organizing the apportionment discussion around the "rules" and data sources used in apportionment would improve understanding.

Section 4.3 on "Extracting reported amounts of fish consumed" could be a part of Chapter 2 since it really describes how the FNDDS files were processed to find food codes containing finfish and shellfish; hence it tells us in more detail what NHANES data were actually used.

Section 4.4 (Statistical Methods) deserves its own chapter (called Statistical Methods) since it contains the key discussions of the NCI method for estimation of fish consumption and describes the modifications of this approach that constitute the "EPA method." This discussion could benefit from a short discussion relating sample size to estimate uncertainty to help answer the question of "How many observations are needed to estimate consumption to a specified level of precision?"

Chapter 5 (Results) can benefit from more discussion of model goodness of fit.

Overall, there is a need to standardize labels. In the report I find references to the "NCI method," the "NCI model," the "EPA model," the "EPA approach," the "EPA method," the "Modified NCI Method" (page 22) and in the Figures, the "Westat Modified NCI Method."
It initially was difficult to know how many "methods" were really under consideration. Was it two or three? Only after reading Chapter 4 does one realize there are only two "methods," with two "models" for each method, one for probability of fish consumption and one for amount of fish consumed. I will refer to the NCI and the EPA "methods" in my remarks. Occasionally, I will refer to the model for estimating the probability of fish consumption and the model for amount of fish consumed for specific methods. There are also two "methods" for simulating UFC based on the fitted NCI or EPA method estimated parameters and associated models.

Additional suggestions for report improvement can be found in my replies to the remaining questions.

Janet A. Tooze, Ph.D., M.P.H.

In general, the document is clear, logical, and concise. The document is logically organized in the order of presentation, and outlines all necessary sections of the study population, methods, and results. The results are clearly presented. There are some details lacking in the statistical analysis section (see questions 4 and 5).

Charge Question 2

Were scientific and statistical assumptions explained and are they appropriate? Explain.

Patricia M. Guenther, Ph.D., RD

Yes, the assumptions underlying the NCI method were well explained. However, the assumptions made about the standardized recipes in the FNDDS were not mentioned. A statement of the assumption that the reports of 24-hour dietary intake were accurate, precise, and unbiased is also missing.

Dale Hattis, Ph.D.

The statistical assumptions were described but the reasoning underlying them could have been more fully explained (see previous comment).

Kenneth M. Portier, Ph.D.

I did not find any specific sections discussing scientific or statistical assumptions in the report. Scientific and statistical assumptions seem to be discussed as needed throughout the document.
I think it is appropriate that it be done this way. Further, discussion of assumptions is needed in a number of places as outlined below.

Page 1: We are told that the current default fish consumption rates (FCRs) used by OW are the 90th and 99th percentile estimates from the freshwater and estuarine fish consumption distributions computed from the CSFII. When you get to the bottom of Page 2 you find that we will actually be provided with "the UFCR estimates and 95 % CI of the mean and the 25th, 50th, 75th, 90th, 95th, 97th, and 99th percentiles." There is no discussion (or justification) of why these particular percentiles were chosen (probably to illustrate the right tail of the consumption distribution, which is where risk assessment interest is greatest). Why not also provide 5%-tiles up to 95% and illustrate the whole distribution?

Page 1: It is stated that "As fish consumption may have changed over the past decade..." What is the evidence for this as a reasonable assumption on which to justify the effort of creating new estimates? [One or a couple of references to current studies, popular reports, NOAA landings values, etc. would satisfy this need.]

Page 1: Reference is made to the NCI method. Have other methods been proposed but rejected?

Page 1: It is stated that "The calculation using the NCI Method are very time consuming." It is assumed that either: 1) EPA does not have the time to make these calculations or 2) EPA cannot find the computational power to make these calculations in a reasonable amount of time. I don't find this discussed anywhere. Acceptance of this assumption is key to justifying the development and use of the EPA modified method.

Page 2: Estimates are desired for 18 different categories of fish. It is assumed that each category is important to some entity. Nowhere is there discussion as to why these categories specifically are chosen.
Page 3: Chapter 2 discusses the NHANES as a quality source of finfish and shellfish consumption for the general US population. Are we to assume that this is the only source of such data? A discussion of other potential sources for fish consumption data and why the NHANES was used is needed.

Page 5: The FNDDS is discussed in general (here, but in more detail in Section 4.3), but the "science" behind this database merits at least a paragraph. This database is used in the critical step of translating what is eaten (a menu item) to how much fish is consumed.

Page 7: The scientific and statistical assumptions of the NCI method are covered in Question 4.

Page 8: While "The assignments of species were completed by a fisheries biologist" it is not clear what assumptions and/or rules were employed in this assignment. If I were to employ a different fisheries biologist, would that individual come up with the same habitat apportionment? By providing insight into the assumptions and rules used by the fisheries biologist, we are better able to ensure repeatability (a scientific method characteristic) to this process. The "decisions" listed in the four bullets are actually some of the "rules" used by the fisheries biologist in the assignment. Are these all of the rules? It is clear that NOAA landings data factor into these "rules" (Section 4.1.2). In addition, the final rule is "that unspecified fish consumed was assigned the overall average habitat apportionment of all species reported consumed." Is this reasonable?

Page 11: The statement, "No species in a group was assigned 0 percent based on a 0 count in the files, because it may be reported in another NHANES cycle," requires additional clarification. What was the rule used to assign the value greater than zero?

Page 14: The fourth bullet on this page refers to "best professional judgment" and an example in catfish is described. Is catfish the only NHANES grouping that is impacted by this "rule"?
Table 3 might be modified to indicate which fish allocation is impacted by "best professional judgment." The scientific issue here is repeatability.

Pages 17-20: Assumptions for statistical methods presented in Question 5.

Page 20: (Section 4.4.3) It is not clear from the first sentence in this section whether the bulleted statements represent constraints on the NCI method estimates when used for simulating fish consumption or whether these statements are constraints under which the NCI method estimates are derived. I think these bullets are actually establishing the specific "reality" we are attempting to simulate using the information from the fitted NCI model.

Paragraphs 4 and 5 on page 54 (Section 5.4.2) initiate a discussion on model assumptions but do not really take it very far. In paragraph 4, you say "The validity of these assumptions can be discussed and, to some extent evaluated using data." but don't elaborate. Maybe a little elaboration is justified. At the bottom of page 54, you write "In our opinion, the NCI method makes reasonable assumptions and, given the assumptions, has adequate sample size to provide estimates with little bias relative to the confidence interval width." I personally tend to agree with the report on this, but I suggest giving the reader a little more, especially about the reasonableness of the method assumptions.

The issue of how fish never-consumers are handled is never addressed: One issue that is not addressed in the report impacts how the results of this study are used in population risk assessments when the population includes a fraction of individuals who, for personal reasons, never eat fish. Estimates of US residents who self-report as vegetarian or vegan (not fish consumers) range from a low of about 2.5% to a high of about 13.7% of the population (see http://en.wikipedia.org/wiki/Vegetarianism_in_the_United_States#USA for details).
The NCI and EPA methods seem to assume that every individual who provides data via the NHANES 24-hour or 30-day surveys has a positive probability of consuming fish over the covered time period. In statistical jargon, they assume an underlying continuous distribution of consumption. With this assumption, if we were able to effectively record consumption for a long enough period of time, every individual would be observed eating fish at least once in that time period. The reality is that the underlying fish consumption distribution is a mixture distribution with a positive probability of fish non-consumption (say, p of 0.025 to 0.137) and one minus this probability of consumption.

The problem lies in that the NHANES survey does not have a question that identifies individuals who would "never eat fish," hence it does not allow us to easily split out "fish consumers" from "fish non-consumers." The individuals who report no fish consumption are a mixture of "never consumers" and "low likelihood consumers." The NCI method estimate of the probability of fish consumption in a 24-hour period essentially uses one probability for the mixture.

This issue is not a problem at the estimation phase but does come up when the estimated model is used to simulate an individual's long-term probability of fish consumption. The equations on page 21 suggest that the long-term probability of fish consumption (Quj) will always be greater than zero (the distribution is assumed Logistic, a continuous distribution, and hence the probability of a single value (0) is zero). But this model uses the estimated 24-hour consumption probability (P, page 19) that includes the mixture. So, the problem is that the simulation is really about fish consumers, but one of the parameters used in the simulation (P, which affects the estimate of the other "pi's") represents both consumers and non-consumers.
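The mixture structure described above can be sketched numerically. The block below is only an illustration of a population made up of never-consumers plus consumers; it is not the NCI or EPA simulation. It assumes, purely for illustration, that usual intake among consumers is lognormal, borrows the "All adults" geometric mean and geometric standard deviation from Dr. Hattis's table as stand-in parameters, and uses the 2.5% and 13.7% never-consumer fractions quoted above. The function names are hypothetical.

```python
from math import exp, log
from statistics import NormalDist


def consumer_quantile(q, geo_mean, geo_sd):
    """Quantile q of an assumed lognormal usual-intake distribution (g/day)
    for people who do eat fish at least occasionally."""
    z = NormalDist().inv_cdf(q)
    return geo_mean * exp(z * log(geo_sd))


def mixture_quantile(q, p_never, geo_mean, geo_sd):
    """Quantile q for a whole population in which a fraction p_never never
    eats fish (usual intake exactly 0) and the rest follow the lognormal.

    The mixture CDF is F(x) = p_never + (1 - p_never) * F_consumer(x),
    so solving F(x) = q gives the rescaled quantile used below.
    """
    if q <= p_never:
        return 0.0
    q_among_consumers = (q - p_never) / (1.0 - p_never)
    return consumer_quantile(q_among_consumers, geo_mean, geo_sd)


if __name__ == "__main__":
    # Illustrative parameters only (assumption): "All adults" row of the
    # lognormal-fit table, and the low/high never-consumer fractions.
    gm, gsd = 14.61, 2.247
    for q in (0.50, 0.90, 0.99):
        all_consumers = consumer_quantile(q, gm, gsd)
        low = mixture_quantile(q, 0.025, gm, gsd)
        high = mixture_quantile(q, 0.137, gm, gsd)
        print(f"q={q:.2f}  consumers only: {all_consumers:6.1f}  "
              f"2.5% never: {low:6.1f}  13.7% never: {high:6.1f}")
```

Under these assumptions, every percentile for the whole population sits at or below the corresponding percentile for consumers alone, and the gap widens as the never-consumer fraction grows. How large the effect is in the actual NCI or EPA simulation depends on details of those models that this sketch does not reproduce.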
The ultimate result is that the percentiles for the fish consumption distribution are all likely to be overestimates, which conveniently adds a conservative lean to population risk assessments.

Janet A. Tooze, Ph.D., M.P.H.

The document demonstrates a thorough understanding of the NCI method and the assumptions that it makes. There are no concerns about the implementation of this method; however, it was only used to compare to the EPA method, not to make the estimates in the report. The authors used what they are calling the modified EPA method to produce the tables in this report, and the implications of the assumptions of this method are not as clearly described as the assumptions of the NCI method. In Section 5.4.2, the assumptions made and discussed are those of the NCI method; however, the modified NCI method is what is used in this report, and the assumptions that it makes should be addressed in this section, rather than those of the NCI method. It is not clear from this report that the authors understand what the implications of the assumptions of this modified EPA method are. For example, the authors compare their method to the NCI method, and show that in some cases it provides higher estimates than the NCI method. However, it is not clear from the report why this is so, and what assumptions of their method lead to this potential bias.

Charge Question 3

Has appropriate literature been cited? Explain. Are there publicly available, peer-reviewed papers that should be included? Explain.

Patricia M. Guenther, Ph.D., RD

For the most part, yes. The Freedman paper is irrelevant to this analysis and should be omitted.
It may be helpful to list Kipnis et al., 2009, "Modeling data with excess zeros and measurement error: application to evaluating relationships between episodically consumed foods and health outcomes," Biometrics 65, 1003-1010, because it demonstrates the usefulness of food frequency data as covariates (although for a different purpose).

Dale Hattis, Ph.D.

These might be cited for background and for the distributions of exposure to seafood-borne contaminants:

Hattis, D. and Burmaster, D. E. "Assessment of Variability and Uncertainty Distributions for Practical Risk Analyses" Risk Analysis, Vol. 14, pp. 713-730, 1994.

Hattis, D., "Using Indicator Information for Managing Risks," Chapter 14 in: Environmental Indicators and Shellfish Safety, C. R. Hackney and M. D. Pierson, eds., Chapman & Hall, New York, pp. 364-380, 1993.

Ahmed, F. E., Hattis, D., Wolke, R. E., and Steinman, D., "Human Health Risks Due to Consumption of Chemically Contaminated Fishery Products," Environ. Health Perspect., Vol. 101 (Suppl. 3), pp. 297-302, 1993.

Probably there are other more recent references that would be appropriate for similar reasons.

Kenneth M. Portier, Ph.D.

A number of places need citations.

Page 1: References to justify the statement that fish consumption rates have been changing. Reference to increasing NOAA landings values might suffice here, although looking at NMFS total landings data suggests decreased tonnage from 1993 to 2012 (4.6 MT in 1993 to 4.2 MT in 2012).

Page 8: The second bullet incorporates a quote but there is no indication where this quote comes from. (I assume this is part of the Clean Water Act, but I am not certain.) This statement also requires further clarification since the current sentence structure is complex, making it difficult to understand.

Page 13: The two references used for trophic level assignments are EPA technical reports from 2002 and 2003. Have these documents been examined recently to ensure they continue to describe "best science"?
Page 22: Section 4.4.4 - This section should be significantly increased. A reference for the method of computing confidence limits on the log scale and back transforming is provided below. The method for using full sample weights and replicate weights with NHANES data can be complicated for the uninitiated. I don't think the NHANES web site provides sufficient information for the reader of this report to understand how weights should be (are) used in the analysis. The design effects discussion in the NCHS 2005 reference given is inadequate for this. A reference or two here, and/or a short discussion in an appendix, would ensure that future readers are not confused on what was done here. The four steps for computing the CIs really need to be described in greater detail. Again, the issue here is ensuring that readers are able to replicate the report results (scientific validity).

Gilbert, Richard O., Statistical Methods for Environmental Pollution Monitoring, 1987, Van Nostrand Reinhold, NY, NY, Chapter 13 Characterizing Lognormal Populations, pp 164-176.

Page 22: A reference/web link for the MIXTRAN macro is needed.

Janet A. Tooze, Ph.D., M.P.H.

It appears that appropriate literature has been cited in this report, with the exception of the modified EPA method that is presented. No literature is cited to support the modified EPA method that is used for the estimates given in the table. It appears to be an ad hoc method that has not been peer reviewed.

Charge Question 4

Is the methodology as presented and defined in the report scientifically appropriate for meeting the objectives of the project? Additionally and specifically:
a. Please comment on methods for calculating fish consumption rates.
b. Please comment on the means for combining fish frequency data.
c. Please comment on the method used to apportion species.

Patricia M. Guenther, Ph.D., RD
a. Please comment on methods for calculating fish consumption rates.
The modifications made to the NCI method seem satisfactory, but I defer to the statisticians.
b. Please comment on the means for combining fish frequency data.
If this refers to Section 2.2.2, then the methodology is appropriate.
c. Please comment on the method used to apportion species.
Reasonable.

Dale Hattis, Ph.D.
Yes, except that, for understanding dosage distributions, I think it would be helpful to calculate fish consumption per unit body weight per day in addition to raw fish consumption per day.
a. Please comment on methods for calculating fish consumption rates.
These seem reasonable and generally appropriate.
b. Please comment on the means for combining fish frequency data.
As far as I could tell, the authors also seem to have made reasonable choices here.
c. Please comment on the method used to apportion species.
Seems OK.

Kenneth M. Portier, Ph.D.
I will assume that this question is asking about the methodology of processing the NHANES data to obtain short-term fish consumption likelihood and amount. Questions 5 and 6 ask for specific comments on the NCI and EPA modified methods for estimating long-term probability of fish consumption and amount consumed distributions.
The approach requires two broad steps. The first is obtaining food consumption data and, from these self-reported types and amounts, estimating the amount of fish consumed. These data also allow estimation of the short-term probability (likelihood) of fish consumption. The next is to model the likelihood and amount of fish consumed, as obtained from NHANES, in such a way that the parameters of interest for risk assessment, namely the long-term fish consumption probability or likelihood and the distribution of fish consumption (intake) given reported consumption, are estimable.
These two components are then used to estimate usual fish consumption (intake) as a long-term mean. This approach is practical and has historically been used by others.
a. Please comment on methods for calculating fish consumption rates.
The method used to estimate the amount of fish consumed using NHANES data and detailed recipe analysis is state of the science. There is no discussion in the report about the uncertainties associated with the fish proportions assigned to each food code presented in Appendix B. In addition, the uncertainties associated with percent moisture loss for each processing method in Table 4 are not discussed or provided. In the future, if someone were interested in understanding how variation in fish proportions in foods or moisture loss in processing methods affects the usual fish consumption estimate (e.g., sensitivity analysis), it would be beneficial to have published standard errors of these key proportions. I do not know if standard errors are available from the original sources of these data.
b. Please comment on the means for combining fish frequency data.
The data needed for the NCI and EPA modified models are the Aij, the amount of fish consumed, in grams, reported in a 24-hour dietary recall. This amount can represent all fish and/or shellfish, or can represent some subset of fish groupings, trophic class, or habitat class (defined in Chapter 4, Sections 4.1-4.2). In this report, the food codes (recipes) were decomposed to provide the fish proportion of the food and the multipliers which are used to calculate total fish and fish/shellfish subsets. This is all straightforward.
The text does not describe how the multipliers in Appendix B are actually used. I had to work through the following example to understand these. An example like this should be placed in the report somewhere to help the reader interpret the column headings and the values therein.
Let's examine the first line of Table B-1, "Shrimp dip, cream cheese base."
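The arithmetic behind this example can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the function and argument names are hypothetical, and the inputs (fish proportion 0.262, moisture loss 25%, marine proportion 0.176 for shrimp) are the values the reviewer quotes from the draft's tables.

```python
def fish_multipliers(fish_fraction, moisture_loss, habitat_share):
    """Return (total_multiplier, habitat_multiplier) per gram of prepared food.

    fish_fraction  -- grams of prepared fish per gram of the food as eaten
    moisture_loss  -- fraction of raw weight lost in processing (0.25 = 25%)
    habitat_share  -- fraction of the species assigned to a habitat (e.g. marine)

    All names are hypothetical; this is not the report's code.
    """
    if not 0.0 <= moisture_loss < 1.0:
        raise ValueError("moisture_loss must be in [0, 1)")
    # The prepared fish is what remains after moisture loss, so divide to
    # recover the pre-processed (raw) weight.
    total = fish_fraction / (1.0 - moisture_loss)
    # Apportion the raw weight to the habitat of interest.
    habitat = total * habitat_share
    return total, habitat


# Values quoted in the reviewer's example for "Shrimp dip, cream cheese base".
total, marine = fish_multipliers(0.262, 0.25, 0.176)
print(f"total fish multiplier:  {total:.3f}")
print(f"marine fish multiplier: {marine:.3f}")
```

With these rounded inputs the total multiplier matches the tabulated .349, while the marine product comes out near .061 rather than the tabulated .062; one possible explanation, offered here only as an assumption, is that the table was computed from unrounded proportions.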
Assuming one gram of this recipe, we would have 0.262 grams of fish or shellfish. To adjust for moisture loss (25%), the 0.262 grams would be assumed to be 75% of what was originally there. Hence, the pre-processed amount of fish would be 0.262/0.75 = 0.349 grams. This value (0.349) is identical to the "Multiplier for total fish," so this column identifies the amount of pre-processed fish in the recipe. I assume the 0.062 value for "Multiplier for marine fish" then indicates the amount of marine fish in the pre-processed recipe that produced one gram of final food. Since shrimp is the only fish in the recipe, we use the marine proportion for shrimp in Table 1 to assign 17.6% of the total fish to marine (0.349 x 0.176 = 0.062). And so on.
c. Please comment on the method used to apportion species.
Overall, I have little to say about the apportionment of species other than the comment in Question 4b above on the use of multipliers, and the Question 2 comments for pages 8 and 11 about replication of "professional judgment."

Janet A. Tooze, Ph.D., M.P.H.
The methodology for creating the dataset of fish consumption by individuals in NHANES from the FNDDS files for the 24-hour recall appears to be appropriate. The statistical methods used to estimate the distribution of the fish consumption dataset are not well validated and may produce biased estimates.
a. Please comment on methods for calculating fish consumption rates.
The methods for estimating the distribution of fish consumption are based on the modified EPA method. It appears that this method was created for this project to estimate the distribution of fish consumption in order to provide estimates of consumption more quickly than using the more time-consuming NCI method.
In order to do this, the authors made a number of simplifications to the method with respect to the transformation selected, the modeling of probability of consumption, the modeling of the consumption day amount, the simulation of the usual fish consumption, how subgroup estimates were derived, and the calculation of the confidence intervals. Although there are well-accepted methods for modeling repeated measures binomial data (including generalized linear mixed effects models, which the NCI method uses, and GEE), the report presents what appears to be an ad hoc approach that is neither cited in the statistical literature nor well validated in this report. Although this method saves computing time, it appears that it may lead in some cases to biased estimates of fish consumption rates for the US population.
b. Please comment on the means for combining fish frequency data.
The methodology for extracting the reported amounts of fish consumed from the 24-hour recall using the FNDDS files appeared to be appropriate. With respect to the statistical methodology, it is not clear if (and how) the 30-day fish consumption frequency data from the questionnaire were used as a covariate in statistical models; this is an appropriate way to use this information, but it is not clear if it was used in this manner.
c. Please comment on the method used to apportion species.
The method used to apportion species appears to be appropriate based on the food codes and the supporting data presented in the report.

Charge Question 5
Please comment on appropriateness of the models used for estimating fish consumption rates, focusing on both the "NCI method" and the "modified EPA method."
a. Is the EPA method clearly described and supported? Explain.
b. Are uncertainties in the EPA model identified and characterized? Explain.

Patricia M. Guenther, Ph.D., RD
a. Is the EPA method clearly described and supported? Explain.
Defer to the statisticians.
b. Are uncertainties in the EPA model identified and characterized? Explain.
I believe so, but defer to the statisticians.

Dale Hattis, Ph.D.
a. Is the EPA method clearly described and supported? Explain.
Yes. The comparisons indicating comparable results for the modified EPA method and the NCI method build confidence. However, aside from leaving out some specific variables, I was not clear on the exact differences between the methods.
b. Are uncertainties in the EPA model identified and characterized? Explain.
They seem to be reasonably well identified, although a clearer summary would be helpful. The assumption of normality in the transformed parameters seems a reasonable approximation, but the difference between the actual data and the distribution imposed by the normality assumption could be more explicitly shown to the reader to further build confidence in the method and results.

Kenneth M. Portier, Ph.D.
I would like to make a few remarks about the NCI method here, since the specific sub-questions focus on the EPA method. These comments relate to Section 4.4.2.
The first paragraph states that "The NCI method can be implemented using two SAS macros..." Does this mean that the reader can use this tool but for this report a different approach was used? Or does this mean that for this report the NCI method "was implemented" in SAS using two macros that can be obtained from the NCI? (But it doesn't tell me how to get them... do I write the Director?)
In the second paragraph:
• The limits on k are not defined. What are the covariates? Are they all continuous, all categorical, or mixed?
• It would be clearer if you specified that j = 1 for most individuals and only a few individuals have j = {1, 2}.
• Given that "The usual daily consumption is the weighted average of the weekday and weekend estimates" and given Friday is part of the weekend, the weights for this weighted average would be 4/7 x (Weekday average) + 3/7 x (Weekend average). Is this correct? Unclear.
• What are the default starting values that NLMIXED uses to initiate its search (provide in a table or define how computed)? Are the MIXTRAN and DISTRIB macros to be provided in the report so that an informed user could examine this code to determine this? (issue of repeatability)
• Cij is never defined (assumed to be "indicator of consumption").
• The λ is not defined (the Box-Cox transformation parameter).
• The πi are not defined as the person-level effects for likelihood of consuming fish.
• The αij are not defined as the person-level effects for amount of fish consumed at the jth 24-hour recall.
• The πxk are not identified as the coefficients that relate covariates to likelihood of fish consumption.
• The αxk are not identified as the coefficients that relate covariates to amount of fish consumption.
• Note it might be nice to indicate that in this model, Cij, αij and πi are all random effects; the rest of the parameters are fixed effects.
Note that Pij is the probability of consuming fish in a 24-hour period. According to this model, 0 < Pij < 1. Pij can never be 0 or 1 for any individual, which assumes there are no fish non-consumers in the fitted data. Since it is highly likely that this is not true, the model is not quite realistic for its given data.
a. Is the EPA method clearly described and supported? Explain.
The description of the EPA method begins at the bottom of page 18. It would be better if the EPA method had its own section separate from the description of the NCI method.
First, I think it is very important to state, in a way that the reader notices it, that from one method fit to one set of fish consumption data, all of the sub-population estimates are derived.
That is, all of Table C-1 comes from one fit of the EPA method applied to the total finfish and shellfish consumption data. The estimates of the model parameters obtained from the fit of the method to the data provide everything needed to compute all of these consumption distribution estimates. This tends to get lost in the report. This is important statistically because all of the data (for the fish subset being run) are used to estimate the model parameters and, hence, all of the data are factored into subsequent confidence intervals. You aren't running fits to smaller and smaller datasets for subpopulations, which would produce even wider confidence intervals.
The justification for simplifying the NCI method for parameter estimation is weak and I feel it should be discussed in more detail. Some statistics on run times for the NCI method, run on a current model PC and used to estimate one fish consumption scenario, would likely be justification enough. Is the NCI method amenable to distributed running on a computer grid (such as the World Community Grid - http://www.worldcommunitygrid.org/) where thousands of computers could be used to produce the needed results? If so, that weakens the need for a modified method.
The last paragraph on page 18 is actually a synopsis of the EPA method, used before you get into the formal details of the method. Rather than talk about what the SAS macro does, talk about the modification to the NCI method and then simply indicate that the approach has been implemented in a SAS macro called ??? (name never given) and available from ??? (location not provided).
You indicate the use of a "normal scores plot" (a q-norm plot, I assume) as an aid to determining the initial lambda* estimate (Box and Cox power transformation parameter). Exactly how is this done? Can you provide a reference to this approach?
A good discussion and references to estimating the Box and Cox transformation parameter can be found in:
Piegorsch, Walter W. and A. John Bailer, 1997, Statistics for Environmental Biology and Toxicology, Chapman & Hall, London, GB, pp. 130-131.
It might be clearer if you list the EPA modified procedure as a series of steps. (I did this to help me understand the method but suggest it might also help other readers.)
Step 1: Compute the four summary statistics for each individual.
Step 2: Fit the logistic regression model.
Step 3: Iteratively fit the constrained logit model to minimize a weighted Chi square statistic and estimate individual-level effects for the probability of fish consumption.
Step 4: Estimate the correlation between person-level random effects by using the residuals from the probability model as a predictor in the amount sub-model. Fit the amount sub-model using only records from the first 24-hour recall.
Step 5: Estimate the within-person variance component.
Step 6: Estimate the person-level random effect variance.
The four equations found on the fifth line of page 19 should be stacked to be consistent with other equations. If you list these statistics vertically, you can add their "labels" to the right and remove the next two lines. Since j can at most be equal to 1 or 2, you are only averaging, summing or counting for a few individuals.
The statement "The person-level random effect is included by assuming the predicted logit when excluding the random effect is proportional to the predicted logit when including the random effect." is not clear at all. It made more sense AFTER I looked at equation 4 on page 19.
OK, here is where I get confused. In equation 4 you have log(P/(1-P)) as the response in the logistic regression. But for this to work, shouldn't the P be Pi? But then in equation 5 you use Pi in the response and regress it against the logit of the Pi?
Is the critical element here that equation 4 is fit incorporating survey weights, whereas equation 5 does not use the weights? Please clarify.
Equation 5 basically says that the observed and (survey weighted) predicted Pi are proportional to each other and the residual is the individual-level effect. This is not a particularly intuitive relationship and seems to be the key to why the EPA method would work. I think it is really important to motivate this step. Why would you expect this to work? How do you know that this results in normally distributed individual-level effects?
You write that "Calculation of standard errors requires: 1) calculation of replicate weights consistent with the NHANES survey design and strata and PSU variables; 2) running the macros using the full-sample weight and each replicate weight; and 3) combining the results to estimate the standard errors." I assume this is true for both the NCI and EPA methods. I assume that these calculations occur each time SAS Proc SurveyReg is used. The reader needs to know or understand Proc SurveyReg to understand the importance of this quote. This is another place where a reference is needed.
b. Are uncertainties in the EPA model identified and characterized? Explain.
There is no place in the report where NCI or EPA method parameter estimates and their corresponding standard errors are displayed (uncertainty relates to parameter precision). Estimates and approximate standard errors must have been calculated for all model parameters - these would be required output from the statistical estimation routines. I am not sure most readers would be interested in seeing these estimates in the body of the report, but since these estimates are important for the simulation of UFC, these values should be available, either in an appendix or in an online file (repeatability issue again).
Nowhere is goodness of fit for either model discussed (prediction uncertainty). Do these models fit equally well for particular data?
Since the methods predict two outcomes, probability of fish consumption (logistic regression) and amount of fish consumption (regular regression), you would need two tables. An adequate (generally accepted) goodness of fit statistic like the R² for regular regression is not available for logistic regression. Reporting the final scaled deviance would allow comparison for the logistic regressions. Along with the number of parameters in the model, these statistics form the basis for many proposed goodness of fit statistics for generalized linear models and hence might be the minimum required fit information that would need reporting. There are similar issues with the Box and Cox transformed linear regression, since the R² statistic is actually a function of the lambda* estimate. Still, reporting R² values would allow some comparison.
Section 4.4.6 compares the predicted UFCR from the two fitted methods. This is not the same as the model fit, which compares predictions to actual values for a specific model and data set. Both the NCI and EPA methods might predict the observed data adequately and still differ in predicted UFCR values.

Janet A. Tooze, Ph.D., M.P.H.
a. Is the EPA method clearly described and supported? Explain.
As noted in the response to 4a, the EPA method is not well supported by the report. There are no citations to the statistical literature to support its use. There are no simulation studies to show that it will provide unbiased, efficient estimates of usual intake (under the assumption that the 24-hour recall is unbiased). It is described in the report, although some key details, such as how the BRR weights were created and used, are omitted, and the methods are not well justified.
b. Are uncertainties in the EPA model identified and characterized? Explain.
The statistical models used are described for estimating probability, amount, and the simulations.
The statistical methods regarding the calculation of confidence intervals and BRR weights are not well described in the report. The number of simulations used to estimate the distribution (N=5 vs N=100 for the NCI method) is not well justified. To fully identify and characterize this model would require a more extensive analysis with statistical simulations and comparison to the NCI method and other methods for estimating the distribution of usual intake for different scenarios of episodically consumed foods. Although estimating the SE for the percentiles is quite time consuming, taking 64 BRR runs per model, the percentile estimates (without SEs) are estimated from 1 run. The authors could have obtained these estimates for all the models and compared them to the point estimates from the modified method.

Charge Question 6
Is the EPA method adequate for accomplishing the objective? Explain.

Patricia M. Guenther, Ph.D., RD
It seems reasonable.

Dale Hattis, Ph.D.
Yes, it seems to be quite adequate based on the comparisons provided.

Kenneth M. Portier, Ph.D.
Adequacy here relates to the extent to which the EPA method suitably duplicates the NCI method results. Clearly the figures in the report indicate that, on a distributional basis, both methods seem to produce similar fish consumption distributions, so to this extent the EPA method is adequate.
I still worry about the issue of fish never-consumers and how they are handled by both methods. Of course, from a risk assessment point of view, fish never-consumers are never exposed to the contaminants that might be found in fish and hence might be considered not part of the risk picture. Still, when examining population risks, ignoring fish never-consumers in these methods results in risk being over-estimated (the risk distributions are shifted to the right).

Janet A. Tooze, Ph.D., M.P.H.
If the objective is to obtain an unbiased estimate of the distribution of the various types of fish consumption in the report, under the assumption that the 24-hour recall provides an unbiased estimate of fish consumption, then the EPA method does not appear to be adequate for accomplishing this objective. It is not fully validated, and the results in Section 5.3 indicate that it may be biased.

Charge Question 7
Specifically in regard to the analysis:
a. Were sufficient information and explanations given that describe how the data were used and what criteria were used to determine the suitability of the data? Explain.
b. Were these criteria adequate? Was the methodology appropriate? Explain. If not, how could the methodology be improved?

Patricia M. Guenther, Ph.D., RD
a. Were sufficient information and explanations given that describe how the data were used and what criteria were used to determine the suitability of the data? Explain.
Not really. As stated above, the handling of the dietary intake data is unclear.
b. Were these criteria adequate? Was the methodology appropriate? Explain. If not, how could the methodology be improved?
The procedures/methods for handling the dietary data are unclear.

Dale Hattis, Ph.D.
a. Were sufficient information and explanations given that describe how the data were used and what criteria were used to determine the suitability of the data? Explain.
The national representativeness of the NHANES data is fully described, as is the sampling protocol and the use of the population weights. All of this seems appropriate.
b. Were these criteria adequate? Was the methodology appropriate? Explain. If not, how could the methodology be improved?
Yes. My only suggestion is to introduce the body weight factor, to allow better representation of the distributions of consumption controlled for this major variable.

Kenneth M. Portier, Ph.D.
I assume in my reply below that this question is specifically about Section 5.3 (and indirectly the material in Section 4.4.3), where the NCI method is compared to the EPA method (which is referred to only in this section as the Modified NCI method). To me, this section represents an analysis of the EPA method.
a. Were sufficient information and explanations given that describe how the data were used and what criteria were used to determine the suitability of the data? Explain.
In Section 4.4.3, we are provided with the methodology for simulating UFC with the NCI and EPA methods. After reading this section, I had a number of unanswered questions.
The "modifications" listed in the three bullets at the bottom of page 20 really describe the objective of the simulation exercise - a desire to compare UFC for a "standard week" ignoring recall-to-recall and within-person variability. I get this, but I am not sure WHY you might want to limit the comparison this way. Is justification or motivation needed here?
Why 100 simulated values for each person? Optimal? Adequate? Just a number used for demonstration purposes (likely)?
You fail to mention that you will be simulating fish consumption for every individual for whom we have fish consumption data from NHANES. You could just as likely have created a synthetic cohort of fish consumers as the basis for the simulation.
Oh! There is that 3/7 weight, which just shows up here without explanation. See the third bullet in my response to Question 5.
You need to make clear that the model parameter estimates used for the NCI method simulation are different from the model parameter estimates used for the EPA method (another reason to report these estimates in a table somewhere). Similarly, the lambda values used in the back transformation, Bui, may be different for the NCI and EPA methods.
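As background for the back-transformation discussed in these comments, the sketch below shows the standard Box-Cox transform and its inverse. It assumes the textbook form g(x) = (x^λ - 1)/λ for λ ≠ 0 and log(x) for λ = 0; the report's own back-transformation, including any variance adjustment, may differ, and the function names and the λ value are hypothetical. The point of the sketch is that the inverse is only defined when λ·t + 1 > 0, which for positive λ means t > -1/λ.

```python
import math


def box_cox(x, lam):
    """Standard Box-Cox transform of a positive amount x (textbook form)."""
    if x <= 0:
        raise ValueError("x must be positive")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def inverse_box_cox(t, lam):
    """Back-transform t to the original scale; return None if out of domain."""
    if lam == 0:
        return math.exp(t)
    base = lam * t + 1.0
    if base <= 0:
        # For lam > 0 this is the case t <= -1/lam: no positive amount
        # maps to such a transformed value.
        return None
    return base ** (1.0 / lam)


lam = 0.25                      # illustrative value, not taken from the report
t = box_cox(50.0, lam)          # transform an amount of 50 grams
print(inverse_box_cox(t, lam))  # recovers approximately 50.0
print(inverse_box_cox(-1.0 / lam - 0.1, lam))  # None: below the -1/lam bound
```

A simulation that draws transformed amounts from a normal distribution can produce values below this bound, so any implementation has to decide how to handle them (for example by truncating or redrawing); which choice the report made is not stated in these comments.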
The statement "This equation includes an adjustment with the within person variance in the fish consumption amount (uf ). This adjustment makes the untransformed fish consumption essentially unbiased compared to the original mean across the 24-hour recalls." needs a reference at a minimum and maybe even some motivation for why this is even needed.
How often is a simulated Tui < -1/λ? Does this happen more often for the EPA method?
OK, so 100 Qui and Tui values are available for each individual. How do I interpret these values? Theoretically, an individual has only one true "long-term probability of fish consumption." The average of the 100 Qui values is an estimate of this true value. Does this mean that the variance of these Qui values is an estimate of the uncertainty in our estimate for individual i? Same for the Tui.
The first time I read the equation at the top of page 22 I thought that you were multiplying the mean Qui by the mean Tui to get the UFC for individual i, but actually you are computing 100 Uui values and then computing the mean (call it Ui) of these values to get the UFC for individual i. Is this correct?
Oh, wait, you use the NHANES survey weights in here, so clearly you are computing individual averages by method. So, you DO NOT compare the Ui(NCI) to the Ui(EPA) but instead compare the overall mean UFC(NCI) to UFC(EPA) and compare distributional quantiles with a quantile-quantile plot. I understand that, and to a certain extent it makes sense from a risk assessment point of view. What is important is that the methods simulate similar UFC distributions, overall and for strata.
Still, for a model goodness of fit assessment, I would also be interested to see statistics/graphics that compare the Ui(NCI) to the Ui(EPA). Doing this comparison will require some careful thought.
In particular, a randomly simulated individual effect (for either probability of consumption or amount of consumption) might be generated once and used in the appropriate place for both the NCI and EPA methods, so that the difference between the Ui(NCI) and the Ui(EPA) does not merely reflect differences in random effect values.
b. Were these criteria adequate? Was the methodology appropriate? Explain. If not, how could the methodology be improved?
I think the methodology used to compare the two methods is appropriate and makes sense for a tool focused on risk assessment. The methodology might be inadequate to aid understanding of whether the EPA method and NCI method produce similar estimated UFC for individuals with similar demographics.

Janet A. Tooze, Ph.D., M.P.H.
a. Were sufficient information and explanations given that describe how the data were used and what criteria were used to determine the suitability of the data? Explain.
Further information could be given about the predictors used in each model. In Section 4.4.5, the report states that "all significant predictors" were used, but no criterion for significance is given, and it is not clear which predictors were used in which models. Although Section 2.2.2 outlines that the 30-day fish consumption frequency data could be used in statistical models, it is not clear if these data were used in any models, as they are not included in the list of variables in Section 4.4.5. It is not clear if people were excluded if they were missing covariate data. Furthermore, the methodology for creating subgroup estimates by age, gender, geographic region, etc. is not described in the report. It is important to know if covariates were used to define subgroups, or if the models were stratified by subgroup.
b. Were these criteria adequate? Was the methodology appropriate? Explain. If not, how could the methodology be improved?
I think it is appropriate to include all plausible 24-hour recall data from NHANES for this analysis, as long as there are no apparent data entry or recipe errors. The report did not detail whether any type of data cleaning was done.

Charge Question 8
Are the results presented in the report understandable and appropriate for meeting the objectives of the project? Explain. If not, how could the presentation of the results be improved?

Patricia M. Guenther, Ph.D., RD
The text and tables need to state that the results are uncooked amounts and for the edible portion only, if that is the case.

Dale Hattis, Ph.D.
The results, as far as they go, are presented reasonably. As indicated above, I would like to see further analysis of parameters relevant for risk assessment and singling out of particularly important results for risk assessment implications.

Kenneth M. Portier, Ph.D.
My responses to all of the other questions contain suggestions for improving the presentation. There are places where the material is not clear and the writing should be improved. There are a couple of places where material that should appear together, such as the background for the NCI method and the discussion of the method itself, is in separate chapters where it might be better presented as one.

Janet A. Tooze, Ph.D., M.P.H.
The results presented in the report appear to be understandable and appropriate to the task. I believe that the authors of the report presented what they were asked to do; however, I have concerns about the validity of the estimates produced.

Charge Question 9
Are scientific uncertainties explained and are they appropriate? Explain.

Patricia M. Guenther, Ph.D., RD
For the most part, yes, except as described above.

Dale Hattis, Ph.D.
Generally, yes.
However, the key issue of within-person correlations of fish consumption appears to be based on just two days for each individual. This means that the degree of correlation of fish consumption on different days must be measured with some error. The degree of uncertainty in estimates of the within-person correlation probably should be discussed, as it may tend to produce uncertainties in the allocation of variance between person-to-person differences and within-person differences. In addition, the report explains that there are reports of habitual fish consumption over a prior month. It would be good to see some explicit analysis of these data, or at least a clearer explanation of how these data contributed to the overall analysis.

Kenneth M. Portier, Ph.D.
I assume that this question is directed at Section 5.4, and my reply is focused on this section.
Section 5.4.1: How might the results have changed if a different fisheries biologist had been used? Was the variability of NOAA landings from year to year incorporated in this analysis?
Section 5.4.2: The first sentence seems to imply that the largest portion of the uncertainty in the CI for the estimated distributional percentiles (from the NCI method) comes from uncertainty in estimation of the within- and between-person variance components. Is this correct? Was this determined via a sensitivity analysis? Or was this determined by looking at the standard errors for the variance component estimates? It might be useful to expand on this, since it has implications for future data needs (the need for more multi-day 24-hour recall records - something many EPA scientific review panels have asked for). When you say "The model," I suggest you use a more complete descriptor - "The NCI method."
Section 5.4.3 (page 55): This paragraph is difficult to read because the phrase "the weighting" may not be clear to the reader. All the information is here; just improve the writing so that the message is clearer.
An illustrative example of the issues at stake might help here. Section 5.4.4: The statement "However, they generally collect data in northern counties in the summer and southern counties in the winter." represents in my mind the biggest shortcoming of using these data for this analysis and the greatest potential for bias. I think this issue should also be discussed closer to the beginning of the report. Section 5.4.5: OK except the label "Modified NCI Method" should be standardized to the "EPA Method." 28 ------- External Peer Review of EPA's Draft Document Fish Consumption Rates Janet A. Tooze, Ph.D., M.P.H. Estimating usual fish consumption of specific species is a difficult task, and requires a number of assumptions in terms of data summary and analysis. The way in which the data were summarized appeared to be consistent with other studies and there was some discussion regarding the assumptions with respect to regions, seasonality, and habitat. From my knowledge of this area, these appeared to be appropriate. With respect to the statistical methodology, it appears that there are additional uncertainties that were not addressed to the degree that they could be (see my response to previous questions for details). It would be helpful to discuss the statistical methodology used in the previous report, to explain the discrepancies between the previous estimate of the 90th percentile of consumption compared to the new estimate. 29 ------- External Peer Review of EPA's Draft Document Fish Consumption Rates Charge Question 10 The data used in the analysis have been subdivided based on demographic and geographical characteristics of the respondents. Are the subsets of data sufficiently robust to characterize fish consumption within the subgroups for the purposes stated in the report? Please provide your response for each of the major subgroup categories included in the main body of the report. Patricia M. 
Guenther, Ph.D., RD
As stated in the report, it is preferable to estimate fish consumption for the subgroups using a statistical model, rather than the same fish consumption rates for everyone.

Dale Hattis, Ph.D.
I think so.

Kenneth M. Portier, Ph.D.
The stratification or subdivisions seem reasonable and justified. The categories seem to cover most of the fish consumption categories that would be needed for risk assessments.

Just a thought, not an action item: If I were to suggest one additional demographic factor it might be education level coded at two levels: "high school diploma/GED and below" and "some college and above." Education is highly correlated with income so most of the education effect is captured by the finer coded income factor.

The non-Hispanic White category has the highest sample size and I wonder if it might be possible to break out a category of "Asian and Pacific Islander" and/or "Native American/Alaskan Native." These two latter categories are likely to be higher consumers of fish but also, given the design of NHANES, are unlikely to be very well represented in the sample and not represented at all in many geographic regions.

Janet A. Tooze, Ph.D., M.P.H.
Table C-56 details the number reporting fish consumption on both 24-hour recalls by fish type. In general, one would want at least 50 participants per cell in order to estimate the variance components for between- and within-person variation. As mentioned previously, it is not clear exactly how the subgroup estimates were derived. If they were derived from covariates in one large model, it may be appropriate to assume the same ratio of between- to within-person variance holds for the smaller subgroup. However, if the models are stratified by subgroup (which I do not think they were, but it is not completely clear), then the sample size of some of these subgroups would not be of sufficient size to produce stable estimates of variance components.
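The concern about cell sizes raised in the last comment can be illustrated with a minimal sketch. It is not the NCI method or the EPA method: it uses a simple method-of-moments estimator on simulated, transformed-scale amounts for people with two recalls each, and every name and parameter value (the function names, a between-person SD of 1.0, a within-person SD of 2.0) is a hypothetical choice made only for illustration. The sketch shows how the sample-to-sample spread of the between-person variance estimate shrinks as the number of people with two recalls grows.

```python
import random
import statistics


def simulate_two_recalls(n_people, between_sd, within_sd, rng):
    """Simulate two transformed-scale amounts per person (hypothetical data)."""
    pairs = []
    for _ in range(n_people):
        person_mean = rng.gauss(0.0, between_sd)
        day1 = person_mean + rng.gauss(0.0, within_sd)
        day2 = person_mean + rng.gauss(0.0, within_sd)
        pairs.append((day1, day2))
    return pairs


def variance_components(pairs):
    """Method-of-moments estimates of (between, within) variance from paired days."""
    # Half the mean squared day-to-day difference estimates within-person variance.
    within = statistics.fmean((a - b) ** 2 for a, b in pairs) / 2.0
    # The variance of two-day person means equals between + within / 2.
    means = [(a + b) / 2.0 for a, b in pairs]
    between = statistics.variance(means) - within / 2.0
    # In small samples this estimate can be negative; it is left untruncated here.
    return between, within


def spread_of_between_estimate(n_people, reps=200, seed=1):
    """Standard deviation of the between-person estimate across simulated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        pairs = simulate_two_recalls(n_people, 1.0, 2.0, rng)
        between, _within = variance_components(pairs)
        estimates.append(between)
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Compare the stability of the estimate for small and large cells.
    for n in (20, 50, 500):
        print(n, round(spread_of_between_estimate(n), 3))
```

A stratified model fitted to a small subgroup faces the same instability, which is why borrowing a variance ratio from one large model, as the comment suggests, may be preferable when cells are small.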
V. INDIVIDUAL REVIEWER COMMENTS

Review By: Patricia M. Guenther, Ph.D., RD

Peer Review Comments on EPA's Draft Document Fish Consumption Rates
Patricia M. Guenther, Ph.D., RD
Guenther Consulting

I. GENERAL IMPRESSIONS

In general, the methods and procedures should be clear enough so that they could be independently reproduced; this is not the case for how the dietary data were handled.

It is not possible to judge the accuracy of the information presented because it is impossible to know exactly what types of fish and the exact amounts of fish that were consumed by the survey participants. One must assume that the reports of 24-hour dietary intake were accurate, precise, and unbiased; and this should be stated in the report.

The limitations of the standardized recipes used for mixed dishes were not mentioned. This probably is not an important factor because most fish are probably not consumed as part of a mixed dish; however, it should be mentioned.

It is not stated anywhere that the amounts presented in the tables are uncooked amounts of fish. How the cooked amounts reported by survey participants were converted to uncooked amounts is unclear. It is also unclear if the uncooked amounts are for the edible portion of fish or for the entire fish.

I leave it to the statisticians to decide if the statistical methods used are clear and sound; however, it does seem that the modified NCI method yielded results that are fit for use in terms of how close they are to estimates from the original NCI method.

II. RESPONSE TO CHARGE QUESTIONS

1. Is the document logical, clear and concise? Explain. If not, how could the document be improved?
In general, yes; however, the dietary data processing needs to be described more clearly.

2.
Were scientific and statistical assumptions explained and are they appropriate? Explain. Yes, the assumptions underlying the NCI method were well explained. However, the assumptions made about the standardized recipes in the FNDDS were not mentioned. A statement of the assumption that the reports of 24-hour dietary intake were accurate, precise, and unbiased is also missing. 3. Has appropriate literature been cited? Explain. Are there publicly available, peer-reviewed papers that should be included? Explain. For the most part, yes. The Freedman paper is irrelevant to this analysis and should be omitted. It may be helpful to list Kipnis et al., 2009, "Modeling data with excess zeros and measurement 33 ------- External Peer Review of EPA's Draft Document Fish Consumption Rates error: application to evaluating relationships between episodically consumed foods and health outcomes," Biometrics 65, 1003-1010, because it demonstrates the usefulness of food frequency data as covariates (although for a different purpose). 4. Is the methodology as presented and defined in the report scientifically appropriate for meeting the objectives of the project? Additionally and specifically: a. Please comment on methods for calculating fish consumption rates. The modifications made to the NCI method seem satisfactory, but I defer to the statisticians. b. Please comment on the means for combining fish frequency data. If this refers to Section 2.2.2, then the methodology is appropriate. c. Please comment on the method used to apportion species. Reasonable. 5. Please comment on appropriateness of the models used for estimating fish consumption rates, focusing on both the "NCI method" and the "modified EPA method. " a. Is the EPA method clearly described and supported? Explain. Defer to the statisticians. b. Are uncertainties in the EPA model identified and characterized? Explain. I believe so, but defer to the statisticians. 6. Is the EPA method adequate for accomplishing the objective? 
Explain. It seems reasonable. 7. Specifically in regards to the analysis: a. Were sufficient information and explanations given that describes how the data were used and what criteria were used to determine the suitability of the data? Explain. Not really. As stated above, the handling of the dietary intake data is unclear. b. Were these criteria adequate? Was the methodology appropriate? Explain. If not, how could the methodology could be improved? The procedures/methods for handling the dietary data are unclear. 34 ------- External Peer Review of EPA's Draft Document Fish Consumption Rates 8. Are the results presented in the report understandable and appropriate for meeting the objectives of the project? Explain. If not, how could the presentation of the results be improved? Need to state in the text and tables that the results are uncooked amounts and for edible portion only if that is the case. 9. Are scientific uncertainties explained and are they appropriate? Explain. For the most part, yes, except as described above. 10. The data used in the analysis have been subdivided based on demographic and geographical characteristics of the respondents. Are the subsets of data sufficiently robust to characterize fish consumption within the subgroups for the purposes stated in the report? Please provide your response for each of the major subgroup categories included in the main body of the report. As stated in the report, it is preferable to estimate fish consumption for the subgroups using a statistical model, rather than the same fish consumption rates for everyone. III. SPECIFIC OBSERVATIONS The following line numbers refer to the attached version of the report. It also includes editorial suggestions (track changes) for making the document clearer; suggestions made for tables apply to other tables in addition to where they appear. Line 237 [page 3]—Add "Survey participants are not asked to provide detailed recipes for mixed dishes. 
For those, standard, default recipes are used." This has implications since participants are not queried about the types of fish used in stews, sandwiches, etc. This is a limitation that should be acknowledged. Lines 303-307 [page 5]—This paragraph should be edited as follows: "The USDA Food and Nutrient Database for Dietary Studies (FNDDS) is the underlying database used to code dietary intakes for NHANES. It is a database of foods, their nutrient values, and gram weight equivalents for various amounts of foods. For each new version of FNDDS, foods, gram weights, and nutrient values are reviewed and updated to reflect the U.S. food supply by incorporating new foods based on what is reported in the survey and updating existing entries." The weights found in the FNDDS are not necessarily for "typical" portion sizes. Lines 316-319 [page 5]—It should be explained in detail earlier in the document that the FNDDS contains standard recipes. How those recipes were used in this analysis should also be described. Lines 433-436 [page 8]—These "groupings" are the unique food codes, right? Why not call them that? The term "food codes" is used elsewhere. Suggest instead, "When the raw 24-hr recall data are processed by NHANES, fish species reported are assigned food codes. The list below presents the food codes for fish that are specified in the FNDDS and the additional species that are included in each." 35 ------- External Peer Review of EPA's Draft Document Fish Consumption Rates Line 503 [page 10]—Smelt must have been reported before 2003; otherwise, the code would not exist. This should say instead, "[not reported in 2003-2010]." Lines 520-521 [page 11]—This is unclear. Would it be correct to say, "For these groups, we used raw (uncoded) 24-hour recall files from NHANES from 2007-08 (which are not publically available, and the only cycle made available to us) and counted the number of times a species was reported"? 
If so, the text should be revised accordingly; if not, the procedure should be described more clearly. Lines 614-625 [page 16]—This section is particularly unclear. It is unclear if the amounts of fish tabulated are cooked or uncooked. This should be specified. If they are uncooked, how were the cooked amounts from the NHANES data converted to uncooked amounts? These "adjustments" should be explained in detail. Furthermore, are these uncooked amounts of edible portion only, or are they uncooked amounts of whole fish? Do they include skin? Do they include bones? Lines 614-615 [page 16]—Some fish are prepared and cooked by the consumer. Please explain the differences between "pre-processing," commercial processing, and cooking by the consumer and how these were handled in the data processing. Lines 621-622 [page 16]—Adjustment factors were applied to the proportions of what? Shouldn't they be applied to the gram amounts? Lines 621-625 [page 16]—These factors are the percentages of moisture that is lost through processing. What is missing are the factors that were used to convert the cooked/processed fish, reported in NHANES, back to the uncooked/unprocessed form. Tables—What does "Inc Ref' mean? Because these are population estimates, the last two rows in the income section of the tables should be combined into something like "Income unknown." 36 ------- External Peer Review of EPA's Draft Document Fish Consumption Rates Review By: Dale Hattis, Ph.D. 37 ------- External Peer Review of EPA's Draft Document Fish Consumption Rates Peer Review Comments on EPA's Draft Document Fish Consumption Rates Dale Hattis, Ph.D. Clark University I. GENERAL IMPRESSIONS This is a very good piece of work, applying very sophisticated statistical methods to the available data. However, it could be improved by adding a discussion chapter that analyzes and summarizes the findings relevant to risk assessment. 
I have done some preliminary analysis of geometric means and geometric standard deviations for total fish consumption from probability plots of the percentile information (see table on the next page.) Using this kind of analysis, the reader could be informed, for example that among racial groups, the "other race" category stands out as having higher overall fish consumption than other races. I assume this is due to the inclusion of Native Americans in that group, some of whom are subsistence fishers and are particularly at risk for high consumption of locally-caught fish and shellfish. It is also of interest that women of child-bearing age have slightly smaller geometric mean consumption but a greater apparent interindividual variability in consumption than other age/sex groups. Another aspect that could be improved would be to provide an additional set of data tables in which the dependent variable was not raw grams consumed per day per person, but grams consumed per kilogram of body weight. This could be readily done using the same methodology because the NHANES data include individual body weights. Finally, I think it would be helpful to show calculations of geometric standard deviations by the various breakdowns in the detailed tables so that the reader could appreciate (1) which groups have more or less variability in fish consumption and (2) so that comparisons could be made to long-term biomarkers of fish consumption, such as methylmercury and PCB blood concentration distributions. These latter statistics may be in part available from other measurements in the NHANES data. In addition, I published some older data on these variables: Hattis, D. and Burmaster, D. E. "Assessment of Variability and Uncertainty Distributions for Practical Risk Analyses" Risk Analysis. Vol. 14, pp. 713-730, 1994. 
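The probability-plot fitting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the reviewer's actual calculation: it fits a least-squares line to the natural log of each percentile value against its standard normal z-score, then reads the geometric mean from the intercept and the geometric standard deviation from the slope. The function names are hypothetical, and the input percentiles are synthetic values generated from an assumed lognormal distribution, not the values in Table 6a of the draft report.

```python
import math
from statistics import NormalDist


def lognormal_percentile(gm, gsd, p):
    """Value at percentile p (0-100) of a lognormal with the given GM and GSD."""
    return gm * gsd ** NormalDist().inv_cdf(p / 100.0)


def fit_lognormal_from_percentiles(percentiles):
    """Fit ln(x) = ln(GM) + z * ln(GSD) by least squares.

    `percentiles` maps a percentile (e.g. 90) to a consumption value in g/day.
    Returns (geometric mean, geometric standard deviation).
    """
    std_normal = NormalDist()
    z = [std_normal.inv_cdf(p / 100.0) for p in percentiles]
    log_x = [math.log(x) for x in percentiles.values()]
    mean_z = sum(z) / len(z)
    mean_log_x = sum(log_x) / len(log_x)
    s_zz = sum((zi - mean_z) ** 2 for zi in z)
    s_zx = sum((zi - mean_z) * (yi - mean_log_x) for zi, yi in zip(z, log_x))
    slope = s_zx / s_zz
    intercept = mean_log_x - slope * mean_z
    return math.exp(intercept), math.exp(slope)


if __name__ == "__main__":
    # Synthetic percentiles from an assumed lognormal (GM 15 g/day, GSD 2.2).
    synthetic = {
        p: lognormal_percentile(15.0, 2.2, p) for p in (25, 50, 75, 90, 95, 97, 99)
    }
    gm, gsd = fit_lognormal_from_percentiles(synthetic)
    print(f"GM = {gm:.2f} g/day, GSD = {gsd:.3f}")
```

With real percentile estimates the points will not fall exactly on a line, and the size of the departures gives a rough visual check of how well a lognormal shape describes the estimated distribution.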
Table of Results of Lognormal Fitting to the Consumption Percentiles for All Fish (Based on Data from Table 6a)

Group             Geom. Mean (g/day)    Geom. Std. Dev.
All adults              14.61               2.247
Males                   17.02               2.216
Females                 13.03               2.216
Women 13-49              9.66               2.512
21-35                   11.56               2.498
35-<50                  14.62               2.172
50-<65                  20.33               2.025
65+ yrs                 13.21               2.218
Non-Hisp White          13.67               2.231
Non-Hisp Black          16.78               2.090
Other Race              27.39               2.044

II. RESPONSE TO CHARGE QUESTIONS

1. Is the document logical, clear and concise? Explain. If not, how could the document be improved?
Yes. However, it could go into more detail for the non-statistician on the choices of distributional methods. Overall these seem reasonable, and the comment that there is very little difference between log-logistic and lognormal distributions is helpful. It might also be helpful to explain, if it is true, that the logistic distributions were selected for modeling because of greater mathematical tractability than lognormals.

2. Were scientific and statistical assumptions explained and are they appropriate? Explain.
The statistical assumptions were described but the reasoning underlying them could have been more fully explained (see previous comment).

3. Has appropriate literature been cited? Explain. Are there publicly available, peer-reviewed papers that should be included? Explain.
These might be cited for background and for the distributions of exposure to seafood-borne contaminants:

Hattis, D. and Burmaster, D. E. "Assessment of Variability and Uncertainty Distributions for Practical Risk Analyses" Risk Analysis. Vol. 14, pp. 713-730, 1994.

Hattis, D., "Using Indicator Information for Managing Risks," Chapter 14 in: Environmental Indicators and Shellfish Safety. C. R. Hackney and M. D. Pierson, eds., Chapman & Hall, New York, pp. 364-380, 1993.

Ahmed, F. E., Hattis, D., Wolke, R.
E., and Steinman, D., "Human Health Risks Due to Consumption of Chemically Contaminated Fishery Products," Environ. Health Perspect., Vol. 101 (Suppl. 3), pp. 297-302, 1993 Probably there are other more recent references that would be appropriate for similar reasons. 4. Is the methodology as presented and defined in the report scientifically appropriate for meeting the objectives of the project? Additionally and specifically: Yes, except that for understanding dosage distributions. I think it would be helpful to calculate fish consumption per unit body weight per day in addition to raw fish consumption per day. a. Please comment on methods for calculating fish consumption rates. These seem reasonable and generally appropriate. b. Please comment on the means for combining fish frequency data. As far as I could tell, the authors also seem to have made reasonable choices here. c. Please comment on the method used to apportion species. Seems OK. 5. Please comment on appropriateness of the models used for estimating fish consumption rates, focusing on both the "NCI method" and the "modified EPA method. " a. Is the EPA method clearly described and supported? Explain. Yes. The comparisons indicating comparable results for the modified EPA method and the NCI method build confidence. However, aside from leaving out some specific variables, I was not clear on the exact differences between the methods. b. Are uncertainties in the EPA model identified and characterized? Explain. They seem to be reasonably well identified, although a clearer summary would be helpful. The assumption of normality in the transformed parameters seems a reasonable approximation but the difference between the actual data and the distribution imposed by the normality assumption could be more explicitly shown to the reader to further build confidence in the method and results. 6. Is the EPA method adequate for accomplishing the objective? Explain. 
Yes, it seems to be quite adequate based on the comparisons provided. 40 ------- External Peer Review of EPA's Draft Document Fish Consumption Rates 7. Specifically in regards to the analysis: a. Were sufficient information and explanations given that describes how the data were used and what criteria were used to determine the suitability of the data? Explain. The national representativeness of the NHANES data is fully described, as is the sampling protocol and the use of the population weights. All of this seems appropriate. b. Were these criteria adequate? Was the methodology appropriate? Explain. If not, how could the methodology could be improved? Yes. Only, I think in introducing the body weight factor to allow better representation of the distributions of consumption controlled for this major variable. 8. Are the results presented in the report understandable and appropriate for meeting the objectives of the project? Explain. If not, how could the presentation of the results be improved? The results as far as they go are presented reasonably. As indicated above, I would like to see further analysis of parameters relevant for risk assessment and singling out of particularly important results for risk assessment implications. 9. Are scientific uncertainties explained and are they appropriate? Explain. Generally, yes. However, the key issue of within-person correlations of fish consumption appears to be based on just two days for each individual. This means that the degree of correlation of fish consumption on different days must be measured with some error. The degree of uncertainty in estimates of the within-person correlation probably should be discussed as it may tend to produce uncertainties in the allocation of variance between person-to-person differences and within-person differences. In addition to this, the report explains that there are reports of habitual fish consumption over a prior month. 
It would be good to see some explicit analysis of these data, or at least a clearer explanation of how these data contributed to the overall analysis.

10. The data used in the analysis have been subdivided based on demographic and geographical characteristics of the respondents. Are the subsets of data sufficiently robust to characterize fish consumption within the subgroups for the purposes stated in the report? Please provide your response for each of the major subgroup categories included in the main body of the report.
I think so.

III. SPECIFIC OBSERVATIONS

Page 17, paragraph 2: I had to look up what a logit distribution was. A clearer mathematical description of what this is in general and why it was selected would be helpful. From Wikipedia, logit(P) = log(P/(1 - P)) = -log(1/P - 1).

Page 17, paragraph 3: Similarly, the Box-Cox distribution should be explained and the why of the choice of this transformation described. Also, nowhere is there a presentation of which lambdas (power numbers) were indicated by the data. This could be done in an appendix.

Page 20, 1st bullet: "The predicted values reflect a standard week (3 weekend days and 4 weekday days) rather than the distribution of weekday and weekend recalls in the data." It seems odd to describe a "standard week" in this way, rather than one with 2 weekend days and 5 weekday days. The why of this choice needs to be explained, and perhaps there should be a brief description of how much difference this makes in the results.

Page 23, paragraph 2: The description of the age groups in this paragraph makes no mention of the 1-<3 age group included in Table 5. It seems to me this age group should be added to the description or the reader will wonder why children under age 3 are not covered.

Page C-1: The tables in this appendix should give the units (g/day).

Review By: Kenneth M.
Portier, Ph.D.

Peer Review Comments on EPA's Draft Document Fish Consumption Rates
Kenneth M. Portier, Ph.D.
American Cancer Society

I. GENERAL IMPRESSIONS

Overall, I find the report readable, on topic, and comprehensive. There are very few areas needing major revision and the writing is clear and concise with very, very few spelling errors. This said, I do see an alternate way of reorganizing the information in Chapter 4 to improve flow and understanding (see responses to charge questions 1 and 3 specifically).

II. RESPONSE TO CHARGE QUESTIONS

1. Is the document logical, clear and concise? Explain. If not, how could the document be improved?
I found the document logically ordered and the writing clear and concise but confusing in a couple of places.

The document defines its objective in the Background section and identifies the major data source in Chapter 2. Chapter 3 introduces the NCI method, which is again described in Sections 4.4.1 and 4.4.2. I am not certain why one even needs Chapter 3, since the material in Chapter 3 might be better as a background section in Chapter 4 (or a new Statistical Methods chapter).

Chapter 4 combines a number of "methods" that could very easily comprise their own chapters. The methods discussion around habitat apportionment (Section 4.1) and trophic level assignment (Section 4.2) could be combined in one chapter describing how fish-related characteristics are used in estimating (stratified) consumption rates. The specific comments to Question 2 suggest some ways that these sections (or new chapter) might be better organized. In particular, organizing the apportionment discussion around the "rules" and data sources used in apportionment would improve understanding.
Section 4.3 on "Extracting reported amounts of fish consumed" could be a part of Chapter 2 since it really describes how the FNDDS files were processed to find food codes containing finfish and shellfish, hence it tells us in more detail what NHANES data were actually used. Section 4.4 (Statistical Methods) deserves its own chapter (called Statistical Methods) since it contains the key discussions of the NCI method for estimation of fish consumption and described the modifications of this approach that constitutes the "EPA method." This discussion could benefit from a short discussion relating sample size to estimate uncertainty to help answer the question of "How many observations are needed to estimate consumption to a specified level of precision?" 44 ------- External Peer Review of EPA's Draft Document Fish Consumption Rates Chapter 5 (Results) can benefit from more discussion of model goodness of fit. Overall, there is a need to standardize labels. In the report I find references to the "NCI method," the "NCI model," the "EPA model," the "EPA approach," the "EPA method," the "Modified NCI Method" (page 22) and in the Figures, the "Westat Modified NCI Method." It initially was difficult to know how many "methods" were really under consideration. Was it two or three? Only after one reads Chapter 4 do you realize there are only two "methods," with two "models" for each method, one for probability of fish consumption and one for amount of fish consumed. I will refer to the NCI and the EPA "methods" in my remarks. Occasionally, I will refer to the model for estimating the probability of fish consumption and the model for amount of fish consumed for specific methods. There are also two "methods" for simulating UFC based on the fitted NCI or EPA method estimated parameters and associated models. Additional suggestions for report improvement can be found in my replies to the remaining questions. 2. 
Were scientific and statistical assumptions explained and are they appropriate? Explain. I did not find any specific sections discussing scientific or statistical assumptions in the report. Scientific and statistical assumptions seem to be discussed as needed throughout the document. I think it is appropriate that it be done this way. Further, discussion of assumptions is needed in a number of places as outlined below. Page 1: We are told that the current default fish consumption rate (FCR) used by OW are the 90th and 99th percentile estimates from the freshwater and estuarine fish consumption distributions computed from the CSFII. When you get to the bottom of Page 2 you find that we will actually be provided with "the UFCR estimates and 95 % CI of the mean and the 25th, 50th, 75th, 90th, 95th, 97th, and 99th percentiles." There is no discussion (or justification) for why these particular percentiles (probably to illustrate the right tail of the consumption distribution which is where risk assessment interest is greatest). Why not also provide 5%-tiles up to 95% and illustrate the whole distribution? Page 1: It is stated that "As fish consumption may have changed over the past decade..." What is the evidence for this as a reasonable assumption on which to justify the effort of creating new estimates? [One or a couple of references to current studies, popular reports, NOAA landings values, etc. would satisfy this need.] Page 1: Reference is made to the NCI method. Have other methods been proposed but rejected? Page 1: It is stated that "The calculation using the NCI Method are very time consuming." It is assumed that either: 1) EPA does not have the time to make these calculations or 2) EPA cannot find the computational power to makes these calculations in a reasonable amount of time. I don't find this discussed anywhere. Acceptance of this assumption is key to justifying the development and use of the EPA modified method. 
45 ------- External Peer Review of EPA's Draft Document Fish Consumption Rates Page 2: Estimates are desired for 18 different categories of fish. It is assumed that each category is important to some entity. Nowhere is there discussion as to why these categories specifically are chosen. Page 3: Chapter 2 discusses the NHANES as a quality source of finfish and shellfish consumption for the general US population. Are we to assume that this is the only source of such data? A discussion of other potential sources for fish consumption data and why the NHANES was used is needed. Page 5: The FNDDS is discussed in general (here, but in more detail in Section 4.3), but the "science" behind this database merits at least a paragraph. This database is used in the critical step of translating what is eaten (a menu item) to how much fish is consumed. Page 7: The scientific and statistical assumptions of the NCI method are covered in Question 4. Page 8: While "The assignments of species were completed by a fisheries biologist" it is not clear what assumptions and/or rules were employed in this assignment. If I were to employ a different fisheries biologist, would that individual come up with the same habitat apportionment? By providing insight into the assumptions and rules used by the fisheries biologist, we are better able to ensure repeatability (a scientific method characteristic) to this process. The "decisions" listed in the four bullets are actually some of the "rules" used by the fisheries biologist in the assignment. Are these all of the rules? It is clear that NOAA landings data factor into these "rules" (Section 4.1.2). In addition, the final rule is "that unspecified fish consumed was assigned the overall average habitat apportionment of all species reported consumed." Is this reasonable? 
Page 11: The statement, "No species in a group was assigned 0 percent based on a 0 count in the files, because it may be reported in another NHANES cycle," requires additional clarification. What was the rule used to assign the value greater than zero? Page 14: The fourth bullet on this page refers to "best professional judgment" and an example in catfish is described. Is catfish the only NHANES grouping that is impacted by this "rule"? Table 3 might be modified to indicate which fish allocation is impacted by "best professional judgment." The scientific issue here is repeatability. Pages 17-20: Assumptions for statistical methods presented in Question 5. Page 20: (Section 4.4.3) It is not clear from the first sentence in this section whether the bulleted statements represent constraints on the NCI method estimates when used for simulating fish consumption or whether these statements are constraints under which the NCI method estimates are derived. I think these bullets are actually establishing the specific "reality" we are attempting to simulate using the information from the fitted NCI model. 46 ------- External Peer Review of EPA's Draft Document Fish Consumption Rates Paragraphs 4 and 5 on page 54 (Section 5.4.2) initiate a discussion on model assumptions but doesn't really take it very far. In paragraph 4, you say "The validity of these assumptions can be discussed and, to some extent evaluated using data." but don't elaborate. Maybe a little elaboration is justified. At the bottom of page 54, you write "In our opinion, the NCI method makes reasonable assumptions and, given the assumptions, has adequate sample size to provide estimates with little bias relative to the confidence interval width." I personally tend to agree with the report on this, but I suggest giving the reader a little more, especially about the reasonableness of the method assumptions. 
The issue of how fish never-consumers are handled is never addressed:

One issue that is not addressed in the report affects how the results of this study are used in population risk assessments when the population includes a fraction of individuals who, for personal reasons, never eat fish. Estimates of US residents who self-report as vegetarian or vegan (not fish consumers) range from a low of about 2.5% to a high of about 13.7% of the population (see http://en.wikipedia.org/wiki/Vegetarianism_in_the_United_States#USA for details). The NCI and EPA methods seem to assume that every individual who provides data via the NHANES 24-hour or 30-day surveys has a positive probability of consuming fish over the covered time period. In statistical jargon, they assume an underlying continuous distribution of consumption. Under this assumption, if we were able to record consumption for a long enough period of time, every individual would be observed eating fish at least once in that time period. The reality is that the underlying fish consumption distribution is a mixture distribution with a positive probability of fish non-consumption (of, say, p = .025 to .137) and one minus this probability of consumption. The problem is that the NHANES survey does not have a question that identifies individuals who would "never eat fish," hence it does not allow us to easily split out "fish consumers" from "fish non-consumers." The individuals who report no fish consumption are a mixture of "never consumers" and "low likelihood consumers." The NCI method estimate of the probability of fish consumption in a 24-hour period essentially uses one probability for the mixture. This issue is not a problem at the estimation phase but does come up when the estimated model is used to simulate an individual's long-term probability of fish consumption.
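The mixture described above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the lognormal shape for consumers and its parameters (median 10 g/day, log-scale standard deviation 1.0) are hypothetical, and the function name is mine; only the never-consumer fractions (.025 and .137) come from the range cited above. The sketch compares a percentile computed for consumers alone with the same percentile for a population that includes never-consumers.

```python
import math
from statistics import NormalDist


def mixture_quantile(q, p_never, median, sigma):
    """Quantile q of usual intake (g/day) for a population in which a
    fraction p_never never eats fish (intake exactly 0) and the rest
    follow a lognormal distribution (hypothetical shape and parameters)."""
    if q <= p_never:
        return 0.0  # this part of the population consists of never-consumers
    # Rescale q to the corresponding quantile of the consumer-only distribution.
    q_consumer = (q - p_never) / (1.0 - p_never)
    z = NormalDist().inv_cdf(q_consumer)
    return median * math.exp(sigma * z)


if __name__ == "__main__":
    # p_never = 0.0 reproduces the "everyone is a consumer" assumption.
    for p_never in (0.0, 0.025, 0.137):
        row = [round(mixture_quantile(q, p_never, 10.0, 1.0), 1)
               for q in (0.50, 0.90, 0.95)]
        print(p_never, row)
```

With p_never set to 0 the function returns the consumer-only percentile, so for any positive never-consumer fraction each percentile of the whole population is lower than its consumer-only counterpart.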
The equations on page 21 suggest that the long-term probability of fish consumption (Quj) will always be greater than zero (the distribution is assumed Logistic, a continuous distribution, and hence the probability of a single value (0) is zero). But this model uses the estimated 24-hour consumption probability (P, page 19) that includes the mixture. So, the problem is that the simulation is really about fish consumers, but one of the parameters used in the simulation (P, which affects the estimate of the other "pi s") represents both consumers and non-consumers. The ultimate result is that the percentiles for the fish consumption distribution are all likely to be overestimates, which conveniently adds a conservative lean to population risk assessments.

3. Has appropriate literature been cited? Explain. Are there publicly available, peer-reviewed papers that should be included? Explain.

A number of places need citations.

Page 1: References are needed to justify the statement that fish consumption rates have been changing. Reference to increasing NOAA landings values might suffice here, although NMFS total landings data suggest decreased tonnage from 1993 to 2012 (4.6 MT in 1993 to 4.2 MT in 2012).

Page 8: The second bullet incorporates a quote but there is no indication where this quote comes from. (I assume this is part of the Clean Water Act, but I am not certain.) This statement also requires further clarification, since the current sentence structure is complex and difficult to understand.

Page 13: The two references used for trophic level assignments are EPA technical reports from 2002 and 2003. Have these documents been examined recently to ensure they continue to describe "best science"?

Page 22: Section 4.4.4 - This section should be significantly expanded. A reference for the method of computing confidence limits on the log scale and back transforming is provided below (Gilbert, 1987).
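As a sketch of the log-scale approach, and not a reconstruction of the four steps used in the report, a confidence interval can be formed for the log of the estimate and then exponentiated. The function names, the delta-method approximation for the standard error on the log scale, the use of a normal quantile, and the example numbers are all my assumptions for illustration.

```python
import math
from statistics import NormalDist


def se_log_from_se(estimate, se):
    """Delta-method approximation: SE(log(estimate)) ~= SE(estimate) / estimate."""
    return se / estimate


def log_scale_ci(estimate, se_log, level=0.95):
    """Confidence limits computed on the log scale and back-transformed.

    Assumes log(estimate) is approximately normal with standard error se_log.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    center = math.log(estimate)
    return math.exp(center - z * se_log), math.exp(center + z * se_log)


if __name__ == "__main__":
    # Hypothetical numbers: an estimated percentile of 20 g/day with SE 2 g/day.
    lower, upper = log_scale_ci(20.0, se_log_from_se(20.0, 2.0))
    print(round(lower, 2), round(upper, 2))
```

An interval built this way is asymmetric around the estimate on the original scale and its lower limit can never fall below zero, which is the usual motivation for working on the log scale with consumption amounts.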
The method for using full sample weights and replicate weights with NHANES data can be complicated for the uninitiated. I don't think the NHANES web site provides sufficient information for the reader of this report to understand how weights should be (and are) used in the analysis. The design effects discussion in the NCHS 2005 reference given is inadequate for this. A reference or two here, and/or a short discussion in an appendix, would ensure that future readers are not confused about what was done here. The four steps for computing the CIs really need to be described in greater detail. Again, the issue here is ensuring that readers are able to replicate the report results (scientific validity).

Gilbert, Richard O., Statistical Methods for Environmental Pollution Monitoring, 1987, Van Nostrand Reinhold, NY, NY, Chapter 13 Characterizing Lognormal Populations, pp 164-176.

Page 22: A reference/web link for the MIXTRAN macro is needed.

4. Is the methodology as presented and defined in the report scientifically appropriate for meeting the objectives of the project? Additionally and specifically:

I will assume that this question is asking about the methodology of processing the NHANES data to obtain short-term fish consumption likelihood and amount. Questions 5 and 6 ask for specific comments on the NCI and EPA modified methods for estimating long-term probability of fish consumption and amount consumed distributions.

The approach requires two broad steps. First is obtaining food consumption data and, from these self-reported types and amounts, estimating the amount of fish consumed. These data also allow estimation of the short-term probability (likelihood) of fish consumption.
Next is to model the likelihood and amount of fish consumed, as obtained from NHANES, in such a way that the parameters of interest for risk assessment, the long-term fish consumption probability or likelihood, and the distribution of fish consumption (intake) given reported consumption, are estimable. These two components are then used to estimate usual fish consumption (intake) as a long-term mean. This approach is both practical and has historically been used by others.

a. Please comment on methods for calculating fish consumption rates.

The method used to estimate the amount of fish consumed using NHANES data and detailed recipe analysis is state of the science. There is no discussion in the report about the uncertainties associated with the fish proportions associated with each food code presented in Appendix B. In addition, the uncertainties associated with percent moisture loss for each processing method in Table 4 are not discussed or provided. In the future, if someone were interested in understanding how variation in fish proportions in foods or moisture loss in processing methods impacts the usual fish consumption estimate (e.g., a sensitivity analysis), it would be beneficial to have published standard errors of these key proportions. I do not know if standard errors are available from the original sources of these data.

b. Please comment on the means for combining fish frequency data.

The data needed for the NCI and EPA modified models is Aij, the amount of fish consumed, in grams, reported in a 24-hour dietary recall. This amount can represent all fish and/or shellfish, or can represent some subset of fish groupings, trophic class, or habitat class (defined in Chapter 4, Sections 4.1-4.2). In this report, the food codes (recipes) were decomposed to provide the fish proportion of the food and multipliers, which are used to calculate total fish and fish/shellfish subsets. This is all straightforward.
The text does not describe how the multipliers in Appendix B are actually used. I had to work through the following example to understand these. An example like this should be placed in the report somewhere to help the reader interpret the column headings and the values therein.

Let's examine the first line of Table B-1, "Shrimp dip, cream cheese base." Assuming one gram of this recipe, we would have .262 grams of fish or shellfish. To adjust for moisture loss (25%), the .262 grams would be assumed to be 75% of what was originally there. Hence, the pre-processed amount of fish would be .262/.75 = .349 grams. This value (.349) is identical to the "Multiplier for total fish," so this column identifies the amount of pre-processed fish in the recipe. I assume the .062 value for "Multiplier for marine fish" then indicates the amount of marine fish in the pre-processed recipe that produced one gram of final food. Since shrimp is the only fish in the recipe, we use the marine proportion for shrimp in Table 1 to assign 17.6% of the total fish to marine (.349 x .176 = .062). And so on...

c. Please comment on the method used to apportion species.

Overall, I have little to say about the apportionment of species other than the comment in Question 4b above on use of multipliers, and Question 2 comments for pages 8 and 11 about replication of "professional judgment."

5. Please comment on appropriateness of the models used for estimating fish consumption rates, focusing on both the "NCI method" and the "modified EPA method."

I would like to make a few remarks about the NCI method here since the specific sub questions focus on the EPA method. These comments relate to Section 4.4.2.

The first paragraph states that "The NCI method can be implemented using two SAS macros..." Does this mean that the reader can use this tool but for this report a different approach was used?
Or does this mean that for this report the NCI method "was implemented" in SAS using two macros that can be obtained from the NCI? (But it doesn't tell me how to get them... do I write the Director?)

In the second paragraph:
• The limits on k are not defined. What are the covariates? Are they all continuous, all categorical or mixed?
• It would be clearer if you specified that j=1 for most individuals and only a few individuals have j={1, 2}.
• Given that "The usual daily consumption is the weighted average of the weekday and weekend estimates" and given Friday is part of the weekend, the weights for this weighted average would be 4/7 x (Weekday average) + 3/7 x (Weekend average). Is this correct? Unclear.
• What are the default starting values that NLMIXED uses to initiate its search (provide in a table or define how computed)? Are the MIXTRAN and DISTRIB macros to be provided in the report so that an informed user could examine this code to determine this? (issue of repeatability)
• Cij is never defined (assumed to be "indicator of consumption").
• The λ is not defined (the Box and Cox transformation parameter).
• The πi are not defined as the person level effects for likelihood of consuming fish.
• The αij are not defined as the person level effects for amount of fish consumed at the jth 24-hour recall.
• The πxk are not identified as the coefficients that relate covariates to likelihood of fish consumption.
• The αxk are not identified as the coefficients that relate covariates to amount of fish consumption.
• Note it might be nice to indicate that in this model, Cij, αij and πi are all random effects; the rest of the parameters are fixed effects.
• Note that Pij is the probability of consuming fish in a 24-hour period. According to this model, 0 < Pij < 1. Pij can never be 0 or 1 for any individual, which assumes there are no fish non-consumers in the fitted data.
Since it is highly likely that this is not true, the model is not quite realistic for its given data.

a. Is the EPA method clearly described and supported? Explain.

The description of the EPA method begins at the bottom of page 18. It would be better if the EPA method had its own section separate from the description of the NCI method.

First, I think it is very important to state, in a way that the reader notices it, that all of the sub-population estimates are derived from one method fit to one set of fish consumption data. That is, all of Table C-1 comes from one fit of the EPA method applied to the total finfish and shellfish consumption data. The estimates of the model parameters obtained from the fit of the method to the data provide everything needed to compute all of these consumption distribution estimates. This tends to get lost in the report. This is important statistically because all of the data (for the fish subset being run) are used to estimate the model parameters and, hence, all of the data are factored into subsequent confidence intervals. You aren't running fits to smaller and smaller datasets for subpopulations, which would produce even wider confidence intervals.

The justification for simplifying the NCI method for parameter estimation is weak and I feel it should be discussed in more detail. Some statistics on run times for the NCI method, run on a current model PC and used to estimate one fish consumption scenario, would likely be justification enough. Is the NCI method amenable to running distributed on a computer grid (such as the World Community Grid - http://www.worldcommunitygrid.org/) where thousands of computers could be used to produce the needed results? If so, that weakens the need for a modified method.

The last paragraph on page 18 is actually a synopsis of the EPA method, used before you get into the formal details of the method.
Rather than talk about what the SAS macro does, talk about the modification to the NCI method and then simply indicate that the approach has been implemented in a SAS macro called ??? (name never given) and available from ??? (location not provided).

You indicate the use of a "normal scores plot" (a q-norm plot I assume) as an aid to determining the initial lambda* estimate (Box and Cox power transformation parameter). Exactly how is this done? Can you provide a reference to this approach? A good discussion of, and references for, estimating the Box and Cox transformation parameter can be found in:

Piegorsch, Walter W. and A. John Bailer, 1997, Statistics for Environmental Biology and Toxicology, Chapman & Hall, London, GB, Pages 130-131.

It might be clearer if you list the EPA modified procedure as a series of steps. (I did this to help me understand the method but suggest it might also help other readers.)

Step 1: compute the four summary statistics for each individual.
Step 2: fit the logistic regression model.
Step 3: iteratively fit the constrained logit model to minimize a weighted Chi square statistic and estimate individual level effects for the probability of fish consumption.
Step 4: estimate the correlation between person-level random effects by using the residuals from the probability model as a predictor in the amount sub-model. Fit the amount sub-model using only records from the first 24-hour recall.
Step 5: estimate the within-person variance component.
Step 6: estimate the person-level random effect variance.

The four equations found on the fifth line of page 19 should be stacked to be consistent with other equations. If you list these statistics vertically, you can add their "labels" to the right and remove the next two lines. Since j can at most be equal to 1 or 2, you are only averaging, summing or counting for a few individuals.
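To illustrate the kind of description I am asking for regarding the normal scores plot raised above, here is one common way such a plot can guide the choice of the transformation parameter; I do not know whether the report did it this way. For each candidate lambda, the positive amounts are Box-Cox transformed, sorted, and correlated with normal scores, and the lambda giving the straightest plot (highest correlation) is taken as the starting value. The function names, the Blom plotting positions, the grid of candidate values, and the simulated data are all assumptions made for this sketch.

```python
import math
import random
from statistics import NormalDist


def box_cox(x, lam):
    """Box-Cox power transformation of a positive value x."""
    return math.log(x) if abs(lam) < 1e-12 else (x ** lam - 1.0) / lam


def normal_scores(n):
    """Expected normal order statistics using Blom plotting positions."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def initial_lambda(amounts, grid):
    """Candidate lambda whose normal scores plot is closest to a straight line."""
    scores = normal_scores(len(amounts))

    def straightness(lam):
        transformed = sorted(box_cox(x, lam) for x in amounts)
        return pearson(transformed, scores)

    return max(grid, key=straightness)


if __name__ == "__main__":
    random.seed(1)
    # Simulated positive "amounts" (grams); lognormal, so lambda near 0 is expected.
    amounts = [random.lognormvariate(3.0, 0.8) for _ in range(500)]
    grid = [k / 20 for k in range(-20, 21)]  # candidate lambdas from -1 to 1
    print(initial_lambda(amounts, grid))
```

A grid search of this kind only supplies a starting value; the final estimate would still come from the model fit, and survey weights are ignored here.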
The statement "The person-level random effect is included by assuming the predicted logit when excluding the random effect is proportional to the predicted logit when including the random effect." is not clear at all. It made more sense AFTER I looked at equation 4 on page 19.

Ok, here is where I get confused. In equation 4 you have log(P/(1-P)) as the response in the logistic regression. But for this to work shouldn't the P be Pi? But then in equation 5 you use Pi in the response and regress it against the logit of the Pi? Is the critical element here that equation 4 is fit incorporating survey weights, whereas equation 5 does not use the weights? Please clarify.

Equation 5 basically says that the observed and (survey weighted) predicted Pi are proportional to each other and the residual is the individual level effects. This is not a particularly intuitive relationship and seems to be the key to why the EPA method would work. I think it is really important to motivate this step. Why would you expect this to work? How do you know that this results in normally distributed individual level effects?

You write that "Calculation of standard errors requires: 1) calculation of replicate weights consistent with the NHANES survey design and strata and PSU variables; 2) running the macros using the full-sample weight and each replicate weight; and 3) combining the results to estimate the standard errors." I assume this is true for both the NCI and EPA methods. I assume that these calculations occur each time SAS Proc SurveyReg is used. The reader needs to know or understand Proc SurveyReg to understand the importance of this quote. This is another place where a reference is needed.

b. Are uncertainties in the EPA model identified and characterized? Explain.

There is no place in the report where NCI or EPA method parameter estimates and their corresponding standard errors are displayed (uncertainty relates to parameter precision).
Estimates and approximate standard errors must have been calculated for all model parameters - these would be required output from the statistical estimation routines. I am not sure most readers would be interested in seeing these estimates in the body of the report, but since these estimates are important for the simulation of UFC these values should be available, either in an appendix or in an online file (repeatability issue again).

Nowhere is goodness of fit for either model discussed (prediction uncertainty). Do these models fit equally well for particular data? Since the methods predict two outcomes, probability of fish consumption (logistic regression) and amount of fish consumption (regular regression), you would need two tables. An adequate (generally accepted) goodness of fit statistic like the R² for regular regression is not available for logistic regression. Reporting the final scaled deviance would allow comparison for the logistic regressions. Along with the number of parameters in the model, these statistics form the basis for many proposed goodness of fit statistics for generalized linear models and hence might be the minimum required fit information that would need reporting. There are similar issues with the Box and Cox transformed linear regression since the R² statistic is actually a function of the lambda* estimate. Still, reporting R² values would allow some comparison.

Section 4.4.6 compares the predicted UFCR from the two fitted methods. This is not the same as model fit, which compares predicted to actual values for a specific model and data set. Both the NCI and EPA methods might predict the observed data adequately and still differ in predicted UFCR values.

6. Is the EPA method adequate for accomplishing the objective? Explain.

Adequacy here relates to the extent to which the EPA method suitably duplicates the NCI method results.
Clearly the figures in the report indicate that on a distributional basis both methods seem to produce similar fish consumption distributions, so to this extent the EPA method is adequate.

I still worry about the issue of fish never-consumers and how they are handled by both methods. Of course, from a risk assessment point of view, fish never-consumers are never exposed to the contaminants that might be found in fish and hence might be considered not part of the risk picture. Still, when examining population risks, ignoring fish never-consumers in these methods results in risk being over-estimated (the risk distributions are shifted to the right).

7. Specifically in regards to the analysis:

I assume in my reply below that this question is specifically about Section 5.3 (and indirectly the material in Section 4.4.3) where the NCI method is compared to the EPA method (which is referred to only in this section as the Modified NCI method). To me, this section represents an analysis of the EPA method.

a. Were sufficient information and explanations given that describes how the data were used and what criteria were used to determine the suitability of the data? Explain.

In Section 4.4.3, we are provided with the methodology for simulating UFC with the NCI and EPA methods. After reading this section, I had a number of unanswered questions.

The "modifications" listed in the three bullets at the bottom of page 20 really describe the objective of the simulation exercise - a desire to compare UFC for a "standard week" ignoring recall-to-recall and within person variability. I get this, but I am not sure WHY you might want to limit the comparison this way. Is justification or motivation needed here?

Why 100 simulated values for each person? Optimal? Adequate? Just a number used for demonstration purposes (likely)?
You fail to mention that you will be simulating fish consumption for every individual for whom we have fish consumption data from NHANES. You could just as likely have created a synthetic cohort of fish consumers as the basis for the simulation.

Oh! There is that 3/7 weight, which just shows up here without explanation. See bullet 3 under Question 5.

You need to make clear that the model parameter estimated values used for the NCI method simulation are different from the model parameter estimated values used for the EPA method (another reason to report these estimates in a table somewhere). Similarly, the lambda values used in the back transformation, Bui, values may be different for the NCI and EPA methods.

The statement "This equation includes an adjustment with the within person variance in the fish consumption amount ( |