EPA
United States Environmental Protection Agency
Office of Water 4304T
EPA-822-R-20-003
January 2020

EPA RESPONSE TO THE EXTERNAL PEER REVIEW REPORT ON THE EXPANDED MULTIPLE LINEAR REGRESSION BIOAVAILABILITY MODELS FOR ALUMINUM EFFECTS ON AQUATIC LIFE (2018)

U.S. ENVIRONMENTAL PROTECTION AGENCY
OFFICE OF WATER
OFFICE OF SCIENCE AND TECHNOLOGY
HEALTH AND ECOLOGICAL CRITERIA DIVISION
WASHINGTON, D.C.

Table of Contents
1 Introduction
1.1 Background
1.2 Peer Reviewers
1.3 Review Materials Provided
1.4 Charge Questions
2 External Peer Reviewer Comments and EPA Responses, Organized by Charge Question
2.1 General Impressions
2.2 Charge Question 1a
2.3 Charge Question 1b
2.4 Charge Question 1c
2.5 Charge Question 2a
2.6 Charge Question 2b
2.7 Charge Question 2c
2.8 Charge Question 2d
2.9 Charge Question 3a
2.10 Charge Question 3b
3 References Cited by Reviewers and EPA Responses

1 Introduction

EPA organized a contractor-led independent, external peer review of the 2018 revised multiple linear regression bioavailability models for aluminum developed by DeForest et al. (2018b). Two documents were provided to the external peer reviewers: 1) a Memorandum "Updated Aluminum Multiple Linear Regression Models for Ceriodaphnia dubia and Pimephales promelas" dated 8/24/18 (DeForest et al. 2018b) and 2) an earlier publication by DeForest et al. (DeForest, D.K., K.V. Brix, L.M. Tear and W.J. Adams. 2018a. Multiple linear regression models for predicting chronic aluminum toxicity to freshwater aquatic organisms and developing water quality guidelines. Environ. Toxicol. Chem. 37(1): 80-90).
Two criteria calculators developed by EPA, based on the DeForest et al. 2018 Memorandum, were also provided to the external peer reviewers: 1) MLR Model Individual Slopes Aluminum Criteria Calculator_8.29.18.xlsm and 2) MLR Model Pooled Slopes Aluminum Criteria Calculator_8.29.18.xlsm. The external peer review was completed on September 21, 2018. The external peer reviewers provided their independent responses to EPA's charge questions and general impressions of the multiple linear regression models. This report documents EPA's response to the external peer review comments provided to EPA. This report presents the nine peer review charge questions and the five individual reviewers' comments (verbatim), along with their general impressions, in Sections 2.1 through 2.10. New information (e.g., references) provided by reviewers is presented in Section 3. Each reviewer's comments were separated by charge question into distinct topics, and EPA responded to each topic individually.

1.1 Background

Section 304(a)(1) of the Clean Water Act, 33 U.S.C. § 1314(a)(1), directs the Administrator of EPA to publish water quality criteria that accurately reflect the latest scientific knowledge on the kind and extent of all identifiable effects on health and welfare that might be expected from the presence of pollutants in any body of water. In support of this mission, EPA is updating water quality criteria to protect aquatic life from the potential effects of aluminum in freshwater environments. EPA thus funded a contractor-led focused, objective evaluation of the 2018 revised multiple linear regression bioavailability models for aluminum to determine whether their quality was sufficient for EPA to use in aluminum criteria development. The publication on multiple linear regression bioavailability models for aluminum by DeForest et al. (2018a) was applied in the 2017 EPA draft Aluminum Aquatic Life Ambient Water Quality Criteria.
The 2017 datasets used to develop the DeForest et al. (2018a) aluminum bioavailability models were supplemented in 2018 with an additional nine C. dubia toxicity tests and nine P. promelas toxicity tests to expand the range of water chemistry conditions for model development (OSU 2018a,b,d). These data were used to develop the revised bioavailability models for aluminum described in the Memorandum that the external peer reviewers evaluated. As a result of this additional work, the individual (non-pooled) species MLR models were updated. Additionally, the authors were able to develop a pooled MLR model that incorporated both the invertebrate and fish toxicity data into one equation. EPA sought the expertise of external peer reviewers to provide an analysis of which model(s), the pooled model or the individual-species models, might be more appropriate to use in aluminum criteria development.

1.2 Peer Reviewers

An EPA contractor identified and selected five expert external reviewers who met the technical expertise criteria provided by EPA and who had no conflict of interest in performing this review. The EPA contractor provided reviewers with instructions, the review materials below, and the charge to reviewers prepared by EPA. Reviewers worked individually to develop written comments in response to the charge questions.

1.3 Review Materials Provided

• DeForest, D.K., K.V. Brix, L.M. Tear and W.J. Adams. 2018. Multiple linear regression models for predicting chronic aluminum toxicity to freshwater aquatic organisms and developing water quality guidelines. Environ. Toxicol. Chem. 37(1): 80-90.
• Memorandum "Updated Aluminum Multiple Linear Regression Models for Ceriodaphnia dubia and Pimephales promelas" dated 8/24/18
• MLR Model Individual Slopes Aluminum Criteria Calculator_8.29.18.xlsm
• MLR Model Pooled Slopes Aluminum Criteria Calculator_8.29.18.xlsm
• Appendix A 9-5-18.xlsx.
Appendix A is an Excel database that was provided to the peer reviewers to check models and answer questions for Charge Question 2: "Using the data provided in the Appendix A, please complete a side-by-side comparison of the results of the Non-pooled Aluminum Criteria Model and the Pooled Aluminum Criteria Model criteria derivations."

1.4 Charge Questions

1. Please review the DeForest et al. 2018 paper (DeForest, D.K., K.V. Brix, L.M. Tear and W.J. Adams. 2018. Multiple linear regression models for predicting chronic aluminum toxicity to freshwater aquatic organisms and developing water quality guidelines. Environ. Toxicol. Chem. 37(1): 80-90) and the Memorandum "Updated Aluminum Multiple Linear Regression Models for Ceriodaphnia dubia and Pimephales promelas" dated 8/24/18.
• Is it appropriate to integrate the new toxicity data into the MLR equations? If not, why not?
• Please comment on whether the pooled (fish and invertebrate captured in one equation) and non-pooled (fish and invertebrate captured by separate equations) MLRs are appropriately parameterized.
• Does the pooled model behave similarly as the non-pooled models?

2. Using the data provided in the Appendix A, please complete a side-by-side comparison of the results of the Non-pooled Aluminum Criteria Model and the Pooled Aluminum Criteria Model criteria derivations.
• Please draw conclusions regarding the differences in the values (CMC and CCC) generated and explain your rationale.
• Please evaluate the scientific appropriateness of using a pooled model vs. non-pooled model and explain the rationale of your opinion.
• Would the pooled MLR Aluminum Criteria Model be sufficiently robust and protective to use as the underlying basis for the aluminum aquatic life water quality criteria?
• Please provide suggestions of alternate approaches, if any.

3. Ease of Use:
• Please provide any suggestions of how to make an approach easier for a stakeholder (e.g., states) to use, such as improvements to user manual, better upfront input design, etc.?
• Do you have any other suggestions to improve the ease of use?

2 External Peer Reviewer Comments and EPA Responses, Organized by Charge Question

The following tables list the charge questions submitted to the external peer reviewers, the external peer reviewers' comments regarding those questions, broken into distinct topics, and EPA's responses to the external peer reviewers' comments.

2.1 General Impressions

Reviewer 1

Comment: Prior to agreeing to conduct this review, I have been working on an NAS panel on an update of the 2015 EPA Multi-Sector General Stormwater Permit (MSGP). Because aluminum is a stormwater benchmark monitoring requirement for some of the sectors in this permit, I have familiarized myself with the original aquatic life criteria developed for aluminum (1988). I have also briefly looked over the 2017 draft document. I therefore appreciate the difficulty of working with metal toxicity and risk assessments for aquatic ecosystems. As pointed out in the Deforest memorandum and other papers (see the special edition of ET&C 37(1) 2018 for a number of papers dealing with aluminum toxicity), including the 2017 draft, the editorial by Adams et al. 2018 (ET&C 37(1) 34-35, aluminum toxicity is dependent upon water quality characteristics (pH, hardness, DOC), not unlike other metals, including copper and zinc. The Biotic Ligand model has been used in the past but it is difficult to use. I found that the multiple linear regression (MLR) model approach outlined in the Deforest memorandum is well-thought out. I am particularly impressed with the Calculator as it produces excellent results and is easy to use.
The additional studies (new toxicity data since the original ALC in 1988) included in this document are of great value as they increased all of the R² values. The MLR model is a great improvement over past models because it incorporates pH, DOC, and hardness as these values relate to bioavailability and hence toxicity. The MLR can be used to normalize acute and chronic toxicity data to a set of predetermined water quality conditions. The MLR was also used to determine what water quality parameters are of value and which are not as important in terms of R². Furthermore, the authors determined that a pooled MLR model had higher adjusted and predicted R² values compared to the species-specific models. This conclusion was justified by the results of the individual and pooled models. I agree that the results of these models indicate that the pooled model should be used in place of individual models.

EPA Response: Thank you for your comment and support of the MLR approach for aluminum Ambient Water Quality Criteria (AWQC). EPA used additional statistical analysis beyond just R² to determine which MLR model, pooled versus individual, is the most appropriate to use.

Reviewer 2

Comment: I have reviewed the documents provided by Versar that are presented in the below Table. An updated version of the Memorandum was provided on September 12. The Al criteria presented in these documents was developed based on multiple linear regression model approach. Two MLR criteria models were developed. One is for individual species (non-pooled model) and the other is for a combination of 2 species of C. dubia and P. promelas (pooled model). The model development was clearly described in DeForest et al. 2018 paper. The Memorandum presented an update to the models of DeForest et al. 2018 at which, new data for C. dubia and P. promelas were used for calculation of the model coefficients (slopes). A pooled model that combined data for C. dubia and P. promelas was also presented in the Memorandum.

The provided scenarios of data that had a pH range of 5-9, a DOC range of 0.5-10 mg/L, and a hardness range of 25-400 mg/L as CaCO3 were used to run the models and calculate the CMC and CCC values. A relative site-by-site comparison of the CMC and CCC values of the pooled and non-pooled models was conducted by calculating the ratio of the CMC and CCC values predicted by the pooled model to those predicted by the non-pooled model (Fig A and B).

Below are some general comments for the model development and performance. Some of these comments will be further discussed and presented in the answers to the charge questions.

• The MLR model approach is for sure easier to use than the Biotic Ligand Model approach. However, the BLM takes metal speciation and bioavailability into account and can be applied for various environmental conditions. The MLR is a statistical approach and its application is logically limited -the range of environmental conditions that was used for model development. Most of the data used for the model development were coming from laboratory research that used formulated water which is cleaner and less extreme than field waters. Given the complicated chemistry of Al, especially in different pH conditions, I am not sure how well the MLR model prediction will represent the natural environment.
• The current data (including the addition of the new data set) don't seem to be strong for a multiple regression analysis that get involved with at least 3 variables and interaction terms between them including a quadratic term, such as for pH (pH*pH). When such regression models are developed, data of factorial design experiments are more suitable for use. The limitation of data used for the model development might end up with a model that is less representative and hence less accurate prediction, especially for cases that the data are outside or at the boundary of the current range and for other species rather than the two species used for the model calibration.
• There are advantages and disadvantages between the pooled and non-pooled models. The non-pooled model clearly distinguish the dependence of Al toxicity on water quality. For examples, quadric model for pH and P. subcapitata and C. dubia but linear for P. promelas. The pooled model combined C. dubia and P. promelas data and likely excluded the quadratic term. This might make the model be biased to P. promelas. Since data for other fish species are not sufficient and the dependence of Al toxicity on pH for other fish species is unknown, the current pooled model might not be representative.

The conclusion of using the pooled model instead of non-pooled model for predicting Al criteria is less convincing. The pooled model predictions are much higher than the non-pooled model predictions for low and high pH cases. This doesn't sound that the pooled model criteria is protective although it is more convenient and preclude the need to recalculate genus species distribution. Given the MLR criteria- a statistical approach, 95% confidence intervals can be used instead of the acceptable prediction of 2-fold above and below the perfect prediction that has been used by the BLM approach.

MLR Model Pooled Slopes Aluminum Criteria Calculator 8.29.18.xlsm: Pooled Slopes Aluminum Calculator
MLR Model Individual Slopes Aluminum Criteria Calculator 8.29.18.xlsm: Individual Slopes Aluminum Calculator
Appendix A 9-5-18.xlsx: Appendix A file is to be used to check models for charge question #2
DeForest_et_al-2018-Environmental_Toxicology_and_Chemistry.pdf: DeForest et al. 2018 Paper
DeForest Aluminum MLR Models Update Memo (2018-08-24).pdf: DeForest Memo to EPA

[Figure A: Pooled CMC, non-pooled CMC, and pooled/non-pooled ratio]

[Figure B: Pooled CCC, non-pooled CCC, and pooled/non-pooled ratio]

EPA Response: Thank you for your comment and analyses of the two approaches. Specific items are addressed below as they are further discussed in detail in your answers to other charge questions.

Reviewer 3

Comment: It is clear that the scope of this review is to evaluate different possible aluminum criteria calculators (excel spreadsheets) all based on multiple linear regression (MLR). The primary purpose of this review is to evaluate and provide written comments on EPA's Aluminum Criteria Calculator/Model and answer three charge questions. The focus of the review is on two Excel spreadsheets with multiple tabs that contain the aluminum model. A user's guide is included in the Excel spreadsheets as a ReadMe tab. The starting place for this MLR process is the recent DeForest et al. (2017) paper along with more recent data and revised MLR models (memo from DeForest et al., 2018). From these MLR models, which predict ECx concentrations as a function of pH, hardness and DOC, spreadsheets were built to predict effect concentrations as a function of those 3 water chemistry variables and convert them to CCC and Criterion Maximum Concentration (CMC) for use by stake holders. Spreadsheets were built using old and new data (the old data spreadsheet is already available online, the new spreadsheets are what are being evaluated here). The new data spreadsheets include either pooled or non-pooled versions.

The initial impression of the proposed Criteria Calculator is that it was a good choice to use the familiar Excel software platform. Essentially all potential end-users (scientists, consultants, permit writers, ...) will be familiar with Excel. This comfortable environment is a good choice for this tool. These models are designed for ease of use, using the common and familiar excel interface, and have been designed with the end user in mind. There is excellent transparency in how easy it is to find the underlying MLR equations within the spreadsheet, as well as seeing all the effects data that are used in the original MLR modelling. The information presented is accurate (the spreadsheets seem to apply the DeForest equations correctly) and for the most part presented clearly (see some exceptions below). In terms of soundness of conclusions, there were no conclusions to evaluate. Just the software tools.

EPA Response: Thank you for your comment and support of the Aluminum Criteria Calculator.

Reviewer 4

Comment: The use of multiple linear regression (MLRs) in metals criteria is an important step for translating the advances of biotic ligand modeling (BLMs) and related bioavailability research into functional criteria. Particularly with aluminum, they are a huge step forward from the old pH groups and can be both predictive of toxicity when exceeded, and protective of aquatic life uses when met. EPA has successfully used nonlinear regressions for many years with their ammonia criteria, and the educated public (i.e., dischargers, regulators) should have no problem working with these. The new toxicity dataset development and comprehensive data reduction and modeling are exemplary and hopefully harbingers for approaches with other outdated criteria. This review focused on comparing the performance of two MLR models. The outputs of the two models were often dissimilar, which was not expected. Comparisons with BLM outputs and other comparisons of MLR outputs with test calculations and natural waters suggested that the individual or "non-pooled" MLR models has the better performance of the two.
It was not clear that the pooled model would be as protective as intended by the guidelines for developing water quality criteria. Unfortunately, the severely compressed review schedule and my overlapping field work prevented a more in-depth review of the underlying math, and precluded taking time to ask the developers if I was interpreting and using the model correctly. Some of my criticisms could well be off the mark owing to the haste of this review. I did see the 12 September 2018 email that there was a correction to the memo and model, but with my overlapping field work and the long processing times to run the model, I did not have opportunity to go back and repeat my analyses before the 20 September 2018 deadline.

EPA Response: Thank you for your comment. EPA agrees that the use of MLRs in the aluminum criteria development is an important step forward in developing functional criteria that reflect the latest science.

Reviewer 5

Comment: The work is a very well-executed model development based on a highly-screened aquatic toxicity dataset that offers a significant advancement in environmental risk assessment of aluminum in freshwater. The authors of the DeForest et al. 2018 paper and the subsequent peer-reviewed citations represent experienced and qualified experts in the related fields. The enlarged dataset offered in the work of the OSU Aquatic Toxicology Lab has appropriately increased the value and usefulness of the MLR approach, and furthermore allows defendable pooled MLRs. The approach and dataset presented are peer-reviewed and represent our best available knowledge moving forward to update and improve the current three-decade-old approach to quantifying aluminum risk in aquatic ecosystems. The papers, data, and technical memorandum used in the supporting material present a convincing case for moving forward.
Although the actual model spreadsheet would be improved with better notation and comments fields for novice users, and a much better effort at user guidance, the overall MLR model appears well developed. The model spreadsheet supporting documentation needs work before general distribution since the user base is less than familiar with this approach. The Readme appears written by experts for an audience of users with similar expertise and that is most often not the case at the state regulatory level, especially in smaller states. General release of the criteria calculating model with its present level of documentation may lead to confusion and frustration with many users.

The guidance for this review was somewhat challenging as well. For example the use of "Non-pooled" and "Individual" for the same thing was confusing. The models pre-loaded with scenarios was also somewhat mysterious at first, because I would assume you want the user base to fill in water quality scenarios of concern and run the model for specific results related to their management concerns.

The Pooled Model does not appear to produce results consistent with the output of Non-pooled Model when comparing a side-by-side scenario data set. Hence, unless there is a reason for the rather large non-concordance of the two output sets, possibly due to user error, the Pooled Model would not be appropriate for use and appears to be generally overprotective.

EPA Response: Thank you for your comment and suggestions for improving the "Read Me" tab on the Aluminum Criteria Calculator. As noted in the 2018 final Aluminum Criteria document, EPA completed an analysis of the residuals (observed value minus the predicted value) for the two models (individual vs. pooled MLR) to determine if one model fit the data better. This analysis showed that the individual model's residuals had smaller standard deviations.
Additionally, the pooled model had some patterns in the residuals of the predictions relative to the independent variables (e.g., pH). There were no patterns in the residuals for either the C. dubia or P. promelas individual MLR models. EPA elected to use the individual, non-pooled fish and invertebrate models in the 2018 final recommended aluminum aquatic life AWQC, based on external peer reviewers' comments and EPA's own analyses. This modeling approach is also consistent with the approach in the draft 2017 aluminum criteria document. Analyses comparing the performance of the two model approaches (individual vs. pooled MLR) are presented in Appendix L of the final 2018 Aluminum Criteria document (EPA's MLR Model Comparison of DeForest et al. (2018b) Pooled and Individual-Species Model Options).

2.2 Charge Question 1a

1. Please review the DeForest et al. 2018 paper (DeForest, D.K., K.V. Brix, L.M. Tear and W.J. Adams. 2018. Multiple linear regression models for predicting chronic aluminum toxicity to freshwater aquatic organisms and developing water quality guidelines. Environ. Toxicol. Chem. 37(1): 80-90) and the Memorandum "Updated Aluminum Multiple Linear Regression Models for Ceriodaphnia dubia and Pimephales promelas" dated 8/24/18.

1a. Is it appropriate to integrate the new toxicity data into the MLR equations? If not, why not?

Reviewer 1

Comment: Yes. In fact, results of these MLR equations show that the addition of the new toxicity data improve the models.

EPA Response: Thank you for your comment. EPA agrees that the addition of the new toxicity data improves the models.

Reviewer 2

Comment: Yes, the MLR models developed by DeForest et al. 2018 are basically statistical models. Therefore, the models will be more confident if more data are used for model calibration. The Memorandum mentioned the improvement (higher R² values) when new data set was included.
In addition, the new data set covered a wider range of water quality parameters. Therefore, the updated models logically can be used to predict the toxicity of Al for a wider range of water quality, such as hardness, pH, and DOC.

EPA Response: Thank you for your comment. EPA agrees that additional data improves the MLR models, especially new toxicity tests that are outside the previously existing empirical range.

Reviewer 3

Comment: Yes it is appropriate to include the new toxicity data in the MLR equation. The original DeForest paper specifically mentions that data expanding the range of pH, DOC and hardness would be required to use the model for parameters outside the calibration range. A limitation of MLR models, because they are empirical, is that you cannot use them for waters outside the calibration range. Expanding the calibration range is exactly appropriate. Examination of Figures 1-4 in the DeForest memorandum clearly show that effect concentration predictions only negligibly change with this added data.

EPA Response: Thank you for your comment. EPA agrees that additional data improves the MLR models, especially new toxicity tests that are outside the previously existing empirical range.

Reviewer 4

Comment: Yes. The new toxicity data fills gaps in the tested water quality conditions that were lacking earlier.

EPA Response: Thank you for your comment. EPA agrees that additional data improves the MLR models, especially new toxicity tests that are outside the previously existing empirical range.

Reviewer 5

Comment: The DeForest et al. 2018 ETC paper is the most comprehensive attempt at developing a model of the aquatic toxicity of aluminum in three decades. The paper develops a multiple linear regression model based on DOC, pH, and hardness conditions that are derived from a robust, screened aquatic toxicity data set. The regression analysis was on data from P. subcapitata, C. dubia, and P. promelas.
The predictive MLR model demonstrated the ability to predict chronic toxicity with variable DOC, pH, and hardness conditions within a factor of two for 91% of the tests explored. There have been four citations of this paper in the very short period since its publication - achieving a highly cited notation. However, most of these have one of the authors as a co-author, and two contain the additional Al aquatic toxicity data of Gensemer et al. The additional co-authors on these papers as well as their publication in the leading journals in the field suggest the research is if the highest quality. The MLR approach thus demonstrates in this peer-reviewed paper, its viability for use in a regulatory science arena related to risk management of the freshwater aquatic toxicity of aluminum.

It is appropriate and necessary to integrate the new toxicity data into the MLR equations. The OSU Aquatic Toxicology Lab data completes and enhances the MLR robustness specifically because of the targeted test quality and range of water quality conditions of the data set. The regulatory science community is fortunate that this data set became available during the review phase of the 2017 Draft Aquatic Life Criteria for Aluminum in Freshwater. As demonstrated in the September 12, 2018, updated August 24, 2018, Memorandum, Updated Aluminum Multiple Linear Regression Models for Ceriodaphnia dubia and Pimephales promelas, the integration of the new toxicity data expands the DOC, pH and hardness ranges where the MLR can be reliably used.

EPA Response: Thank you for your comment. EPA agrees that additional data improves the MLR models, especially new toxicity tests that are outside the previously existing empirical range.

2.3 Charge Question 1b

1b. Please comment on whether the pooled (fish and invertebrate captured in one equation) and non-pooled (fish and invertebrate captured by separate equations) MLRs are appropriately parameterized.
Reviewer 1

Comment: All of the MLRs are appropriately parameterized. I would not add anything to the model inputs. However, it was interesting to me that the ln(DOC) x pH term was excluded in the C. dubia model but retained in the P. promelas model. As a modeler, I have encountered scenarios like this in the past. Sometimes, this is just a matter of inadequate data sets.

EPA Response: Thank you for your comment. EPA agrees that additional data would improve the MLR models developed. However, the models were developed with the best available data at this time.

Reviewer 2

Comment: The idea of combining fish and invertebrate data to develop a pooled model sounds reasonable because the model then can be used for predicting toxicity for both fish and invertebrate. However, it is not clear to me on how the sensitivity of each species was quantitatively taken into account. The Memorandum did mention that a species term and terms for each of the independent variables and their interactions were included in the pooled model but I don't see them in the results and conclusion. Equations 5 to 8 are separately for C. dubia and P. promelas. No slope for species term and intercept value was presented for the pooled models on page 6 of the Memorandum.

EPA Response: The species-specific intercepts are presented on page 5 of the Memorandum (for Equations 5 to 8). Note that for both of the EC20 models presented (Equations 5 to 8), all terms and slopes are the same except for these species-specific intercepts. If the pooled MLR model were to be used to develop aluminum criteria, these intercepts would not be used in the normalization equation, but all the other terms and slopes would be used.

Reviewer 3

Comment: The MLR method in the original DeForest paper is mathematically and scientifically sound. The parameters for both models were derived from this method so yes the parameters are sound.
It is a limitation of empirical models that there is no theoretical basis for the values of the parameters so there is no theory to compare the values to. For this approach it is sufficient that the data points are described by the MLR parameters in a statistically best sense.

EPA Response: Thank you for your comment.

Reviewer 4

Comment: It's hard to say with confidence. Certainly, in the DeForest and others' update memo, the pooled model performs very well fitting the Ceriodaphnia and fathead minnow data. However, in comparisons between the pooled model, the non-pooled model, and the aluminum BLM (Santore et al. 2018), the outputs were sometime quite different. Conceptually, these patterns should be similar between the models. They weren't. Unfortunately, in this type of comparison, while the comparisons are reassuring when they are similar, when they are dissimilar it is not obvious why or which model is more believable. However, some aspects of the pooled MLR do seem amiss, with the flat response for hardness and a much greater magnitude of change for the DOC than for the individual slopes MLR or the BLM. Generally, the performance looks better for the non-pooled model, but that would have to be weighed against any advantage of reduced complexity and possibly better response from stakeholders for the pooled model.

EPA Response: Thank you for your comment. EPA agrees about the performance of the individual, non-pooled model approach. EPA decided to use the non-pooled MLR model approach in the final aluminum criteria document, based on external peer reviewers' comments and EPA's own analyses. EPA's analyses comparing the performance of the two model approaches (individual, non-pooled vs. pooled MLR) are presented in Appendix L of the final 2018 Aluminum Criteria document.

Reviewer 5

Comment: The pooled (fish and invertebrate captured in one equation) and non-pooled (fish and invertebrate captured by separate equations) MLRs are appropriately parameterized.
The published DeForest et al. 2018 paper, and the subsequent works that cite this paper, develop a significant level of background in the peer-reviewed literature about the dominant water quality characteristics influencing aluminum aquatic toxicity. In the MLRs, ln(DOC), pH, and ln(Hard) are used in a common and defendable manner to define probability distributions in the scope of this risk assessment. The ground-truthing of the model with toxicity testing results suggests robustness. "... the updated dataset supported development of a pooled MLR model that had comparably high adjusted and predicted R² values compared to the species-specific MLR models. The pooled models also provided a similar level of accuracy in predicted EC10s and EC20s compared to the species-specific models."
Response to Comments: Thank you for your comment. EPA agrees that the MLRs are appropriately parameterized and that the toxicity testing suggests robustness. As noted in the 2018 final Aluminum Criteria document, EPA completed an analysis of the residuals (observed value minus the predicted value) for the two models (individual vs. pooled MLR) to determine if one model fit the data better. This analysis showed that the individual model's residuals had smaller standard deviations. Additionally, the pooled model had some patterns in the residuals of the predictions relative to the independent variables (e.g., pH). EPA elected to use the individual, non-pooled fish and invertebrate models in the 2018 final recommended aluminum aquatic life AWQC, based on external peer reviewers' comments and EPA's own analyses.

2.4 Charge Question 1c

1c. Does the pooled model behave similarly as the non-pooled models?

Reviewer 1
Reviewer Comments: Yes. The pooled model does behave similarly to the non-pooled models. In fact, the R² were somewhat higher of the pooled model compared to the individual models. A strong case is made by DeForest et al.
2018, for the use of the pooled model over the use of the individual models.
Response to Comments: Thank you for your comment. As noted in the 2018 final Aluminum Criteria document, EPA completed an analysis of the residuals (observed value minus the predicted value) for the two models (individual vs. pooled MLR) to determine if one model fit the data better. This analysis showed that the individual model's residuals had smaller standard deviations. Additionally, the pooled model had some patterns in the residuals of the predictions relative to the independent variables (e.g., pH). There were no patterns in the residuals for either the C. dubia or P. promelas individual MLR models. This modeling approach is also consistent with the approach in the draft 2017 aluminum criteria document.

Reviewer 2
Reviewer Comments: The predictions of the two models for various scenarios showed a similar trend (Fig A and B) but relatively the predictions of the two models at low and high pH are about 5 time different as discussed above.
Response to Comments: Thank you for your analysis. EPA agrees that the models show similar trends but that the predictions differ at low and high pH. Analyses comparing the performance of the two model approaches (individual vs. pooled MLR) are presented in Appendix L of the final 2018 Aluminum Criteria document (EPA's MLR Model Comparison of DeForest et al. (2018b) Pooled and Individual-Species Model Options).

Reviewer 3
Reviewer Comments: Yes. There are three attached figures at the end of this document that demonstrate the same behavior of the pooled and non-pooled models (Figures 1 to 3). The individual (non-pooled) model and the pooled model both show protection (increasing EC20) as DOC increases and hardness increases for all 3 pHs plotted. C. Dubia was used as the example for these calculations. There are differences between the two models. The pooled model tends to show lower effect concentrations but the relative differences are never more than a factor of 2 and this only occurs at extremely low hardness values. The differences tend to be much smaller than that. More significantly it can be seen that by plotting the data used to calibrate the model (blue dots on Figures 1-3) the data and the model agree, although the pooled data does not agree as well as the individual data. This is to be expected because the pooled data has to satisfy more points simultaneously. The agreement between pooled and individual ECx predictions is also clearly shown by the four figures in the DeForest memo as mentioned in comment 1(a) above.
Response to Comments: Thank you for your analysis. EPA agrees that the pooled model behaves similarly to the non-pooled model but that the EC20s show differences, including that the predictions differ at low and high pH. EPA elected to use the individual, non-pooled fish and invertebrate models in the 2018 final recommended aluminum aquatic life AWQC, based on external peer reviewers' comments and EPA's own analyses.

[Figure 1 panels: individual C. Dubia; pooled C. Dubia; % difference; relative difference. Axis label recovered: DOC.]

Figure 1. C. Dubia MLR predicted EC20 values at pH 6.3. The top left plot is determined using Equation 2 individual EC20 (EC20i) from the DeForest memo. The top right plot is determined using Equation 6 for pooled EC20 determinations (EC20p). The range of DOC and H were selected to match the calibration range of the MLR model. The blue dots correspond to chronic C. Dubia data from the chronic tab of the Criteria Calculator spreadsheet. The % difference plot corresponds to 100*(EC20i - EC20p)/EC20i and the relative difference is EC20i/EC20p.

[Figure panels: individual C. Dubia; pooled C. Dubia; % difference; relative difference. Axis label recovered: DOC.]