United States Environmental Protection Agency
Office of Water (4304)
EPA 823-P-99-001
December 1999

Response to Comments on 1998 Update of Ambient Water Quality Criteria for Ammonia

September 1999

U.S. Environmental Protection Agency
Office of Water
Office of Science and Technology
Washington, DC 20460

Office of Research and Development
Mid-Continent Ecology Division
Duluth, MN 55804

Notices

This document contains responses to public comments that EPA solicited through 63 FR 44256 (August 18, 1998), which announced the publication of the 1998 Update of Ambient Water Quality Criteria for Ammonia. Because that criterion was published as guidance and not regulation, neither the solicitation of public comment nor the response to public comment is required by law or regulation. Rather, EPA is publishing these responses to improve understanding of the technical issues involved in deriving a criterion for ammonia. For this reason, this document includes technical comments only. Although EPA considered or acted upon the few policy-related comments submitted, discussion of this material is not included here.

These technical responses are published in conjunction with a revision of that ammonia criterion, undertaken in response to public comment, and contained in the 1999 Update of Ambient Water Quality Criteria for Ammonia, which supersedes the 1998 document. It is important to note that, except where indicated otherwise, all comment citations to page, figure, and table numbers refer to those of the 1998 Update. Page, figure, and table numbers changed in the 1999 Update.

This Response to Comments document and the two above-mentioned criteria Update documents are not regulations, and cannot substitute for the Clean Water Act or EPA regulations. Thus, they cannot impose legally binding requirements on EPA, States, Tribes, or the regulated community.
The original comments are available in docket number W-98-20 at the Water Docket, Environmental Protection Agency, 401 M Street SW, Washington, DC 20460. The technical comments addressed in this document (in order of appearance) were submitted by (A) Jim Schmidt, Wisconsin Department of Natural Resources; (B) Tom Sinnott, New York State Department of Environmental Conservation; (C) Hall and Associates, Washington, DC, through several municipalities, wastewater authorities, or associations of such entities; (D) John Zambrano, New York State Department of Environmental Conservation; (E) John Sullivan, American Water Works Association; (F) David Fowler, Milwaukee Metropolitan Sewerage District; (G) Alan Anthony and Alex Barron, Virginia Department of Environmental Quality; and (H) John Hall, of Hall and Associates.

Acknowledgment

These responses were written by Charles Delos and Russ Erickson. Comments or questions on this material may be submitted to: Charles Delos, U.S. EPA, Mail Code 4304, Washington, DC 20460 (e-mail: delos.charles@epamail.epa.gov).

Comments and Responses

Comment A-01

My only concern is with the use of EC20 estimates for defining chronic values. The intent of this value is clear, namely to provide a better definition of the safe value used to define chronic toxicity and particularly to generate a value that can be used consistently between species. However, my question is whether or not the general EPA guidance (such as the GLWQI) needs to be revised to allow this estimate, since there is no mention of the use of EC20 in any of the Federal guidance from which Wisconsin’s standards are derived. The concern, therefore, is not with consistency within the ammonia database, but rather with any implications the use of EC20 might have for the chronic toxicity database and criteria of other substances.
We in Wisconsin would both suspect and expect that EC20 values should be able to be generated from the existing database on any compound, since it is just a matter of defining a dose-response relationship for each toxicity test result. However, it is not clear why this approach was not incorporated on more of an across-the-board basis, such as in the GLWQI. Our concern is more with consistency between substances than just for ammonia itself.

Another reason for Wisconsin’s concern is more internal, as Wisconsin and other Great Lakes states are using IC25 to define chronic toxicity in whole effluent toxicity tests. This matter may not be totally relevant when it comes to ammonia, especially since there are still on-going negotiations between the Great Lakes states and EPA over the whole effluent toxicity standards. However, the fact that states are using a 25th percentile to define chronic toxicity in the whole effluent toxicity test regime as opposed to the 20th percentile for ammonia is a consistency concern as well. Wisconsin is not really in a position at this time to suggest that the EPA chronic criteria for ammonia be based on an alternative level such as EC25 to be consistent with IC25, but our concern is mainly over any implication that this approach be used for all substances (besides ammonia) without going through a more general review and approval scenario such as GLWQI. This is because we see no reason why the EC20 approach cannot be used for other substances, and we just want to make sure that the EC20 approach gets its own review rather than face any implication that this is general guidance that is “hidden” within the ammonia criteria document.

Response

The regression approach for obtaining an ECx (effect concentration for “x” percent of individuals) from concentration-response data from chronic tests is consistent with the 1985 National Guidelines and the GLI Guidelines.
Neither guidance specifies the particular ECx or the particular statistical technique. For the 1998 and 1999 Ammonia Updates, EPA used the EC20 and a maximum likelihood nonlinear regression approach, using a weighting scheme to emphasize the data closer to the EC20. However, the document notes that there is an element of risk management (choice of level of protection) in such use. Consequently, states and tribes are free to use other response levels in setting criteria. States and tribes are also free to use other regression techniques, or to continue obtaining chronic values via the traditional hypothesis testing approach. A table of regression parameter values for all toxicity tests used in the calculation of the CCC has been added to the 1999 Update. This allows readers to calculate any ECx (e.g., the EC25) for each of the tests and to derive the corresponding CCC, as also discussed in the response to Comment B-02. EPA does not believe that there are substantial differences between levels of protection provided by the EC20, the EC25, and traditional hypothesis testing.

Comment B-01

In the development of the proposed chronic criterion for ammonia, the 1998 Update departs from the 1985 Guidelines by selecting the EC20 as the appropriate threshold for chronic toxic effects instead of using the geometric mean of the lowest observed adverse effects concentration and the highest no observed adverse effects concentration. It is unclear whether or not this change was made for solely statistical purposes or if it is the intention of the EPA to establish a precedent that a 20% impact to survival, growth, or reproduction is acceptable. No defense, justification, or explanation for the selection of the EC20 as the appropriate chronic effects threshold, beyond its statistical utility, was provided.
The New York State Division of Fish, Wildlife and Marine Resources (DFWMR) believes that the EC20 is an inappropriately high threshold that will not be protective of the aquatic community, particularly during periods of low flow when organisms are faced with multiple stressors. If regression analysis is used for determining chronic values, the appropriate threshold is the EC1, or X-intercept. The safety of aquatic communities should not be compromised because of the high variability of toxicity tests, particularly since contaminant concentrations at or near CCC levels are likely to be only one of a multitude of stressors that aquatic organisms are faced with during low flow conditions.

Response

Before addressing the comment, EPA notes that in oral conversation with EPA staff, the commenter withdrew this comment. However, EPA believes the issue is of general interest and merits discussion. As may be observed from concentration-response graphs in Appendix 6 of the document, the EC1 would occur in the flat portion of the concentration-response curve and therefore would involve great statistical uncertainty. That is, given the inherent random variability in chronic toxicity results, there could be a wide range of concentrations that might in fact cause a one percent effect. The percentage of control organisms affected is in fact often rather variable and thus uncertain.

Perhaps more important, however, is that the EC20 is generally reasonably close to the traditionally used chronic value, the geometric mean of the NOEL (which statistically is not significantly different from zero effect) and the LOEL (which statistically is significantly different from zero effect). EPA believes that experience has shown that its other criteria, falling between the NOEL and LOEL, are fully protective. Thus, EPA believes that criteria derived from the EC20 would likewise be fully protective. EPA did not intend any significant change in the level of protection inherent in the SMCVs or GMCVs.
EPA only intended to provide a better technique for obtaining chronic values: one that better considered all the data within a study. In conclusion, EPA does not believe that an effect level as low as one percent among sensitive species is generally necessary for the protection of aquatic life uses. This is particularly true when applied to chronic tests of early life stages that naturally undergo a high rate of mortality, dependent on population density.

Comment B-02

The 1998 Update document does state on page 71 that the selection of the EC20 as a chronic threshold is a risk management decision. The document does not, however, provide the necessary information for a risk manager to select a more appropriate effects concentration. The data points for each study used to calculate the regression equation, or the final regression equations for each study, were not provided. It would have been very simple for the document to include a table that listed the data points for each study normalized to ammonia nitrogen at pH 8 and adjusted for temperature. It would have been simpler still to include the regression equation for each study next to the corresponding graph in Appendix 6. Then risk managers would have the option of recalculating the ammonia criterion for a different effects concentration.

Response

EPA agrees. This information has been added to the 1999 Update. The reader may now use the values of the regression parameters to calculate ECs for other percentage responses. It should be noted, however, that the weighting of data in the regression technique was done with the intent of estimating the EC20. Consequently, estimates for ECs comparatively distant from the EC20 (e.g., EC5 or EC50) might not be the very best the test data are capable of supporting.
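The kind of recalculation described in the response to Comment B-02 can be sketched as follows. This is a minimal illustration only: it assumes a generic two-parameter log-logistic concentration-response curve, and the function name `ecx` and the parameter values are hypothetical. They are not taken from the 1999 Update, whose own model form and tabulated regression parameters would be needed for any real recalculation.

```python
def ecx(percent_effect, ec50, slope):
    """Concentration producing `percent_effect` percent reduction from control.

    ASSUMPTION (not from the Update): the response follows a log-logistic curve
        response = control / (1 + (conc / ec50) ** slope)
    Solving response / control = 1 - p for conc gives
        conc = ec50 * (p / (1 - p)) ** (1 / slope)
    """
    if not 0 < percent_effect < 100:
        raise ValueError("percent_effect must be strictly between 0 and 100")
    p = percent_effect / 100.0
    return ec50 * (p / (1.0 - p)) ** (1.0 / slope)


if __name__ == "__main__":
    # Hypothetical parameters for a single test (mg N/L at pH 8); illustration only.
    example_ec50, example_slope = 10.0, 2.0
    for level in (5, 20, 25, 50):
        print(f"EC{level}: {ecx(level, example_ec50, example_slope):.2f} mg N/L")
```

Under this assumed curve the EC20 and EC25 differ only modestly, while estimates far from the EC20 (such as the EC5) depend more heavily on the fitted slope, which is consistent with the caution in the response about ECs distant from the EC20.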
Comment B-03

The text of the 1998 Update document does not support the Office of Science and Technology Policy Recommendations that a cold-season ammonia criterion could be established that is as much as three-fold higher than the criterion applicable to the remainder of the year. Ostensibly, the same reasoning could be applied to any contaminant; that is, if early life stages are the most sensitive, then a less stringent standard can be applied during the periods of time when early life stages are not present. Furthermore, a state can propose a less stringent criterion any time for any substance, providing that it is scientifically justifiable. The data in the 1998 Update document explicitly do not provide adequate scientific justification for an arbitrarily-based three-fold increase in the chronic ammonia criterion during cold weather. The recommendation should either be withdrawn, or the 1998 Update document should be revised to provide explicit guidance as to how a cold-weather ammonia standard should be appropriately derived.

Response

EPA does not agree that the information presented in the 1998 Update does not support the cold-season relaxation of the chronic criterion where fish early life stages are absent. Nevertheless, EPA agrees that the discussion could be improved, and has substantially expanded the discussion and increased the level of analytical rigor for the 1999 Update. Data indicate that survival of juvenile and adult fish is a less sensitive endpoint than growth and survival of early life-stage fish. In addition, the data indicate that the sensitivity of invertebrates decreases with decreasing temperature. Consequently, it is appropriate for the chronic criterion to increase when fish early life stages are absent and water temperatures are low. The new discussion is contained in the 1999 Update sections on temperature dependency and seasonality of endpoints.
Comment B-04

The cold-season CCC assumed that the toxicity of total ammonia to fish is independent of temperature for each endpoint. The chronic survival tests conducted over a range of temperatures for juvenile fathead minnows show that they are more sensitive at colder temperatures (that is, 9.6 mg N/L at 6 °C versus 19.3 mg N/L at 25 °C and 15.9 mg N/L at 30 °C).

Response

EPA does not agree. Fathead minnow, with an EC20 of 9.6 mg N/L at 6 °C, is easily protected by the 1998 CCC with the 3X winter provision, and the ELS-absent 1999 CCC. Although EPA would be reluctant, based on the non-significant fathead minnow trend, to conclude that juveniles and adults of all fish species become more sensitive at low temperature, the seasonal adjustments would still be justified, even if all fish species exhibited the fathead minnow temperature trend. For 1999, EPA took the fathead minnow juvenile and adult GMCV to be 9.3 mg N/L. This is so far above the temperature-adjusted Hyalella GMCV that the fathead minnow GMCV could be cut in half without affecting the CCC calculation.

Comment B-05

If the lower limit of each range were taken as a point value, then three times the lower GMCV limit of 3 mg N/L would not exceed the 9 mg N/L lower limit of the LC20 range. However, if the upper limit of each range was taken as a point value, then three times the upper GMCV limit of 8 mg N/L (24 mg N/L) would exceed the 15 mg N/L upper limit of the LC20 range by a considerable degree. If the geometric means of the ranges are used, three times the GM of the GMCV range (4.9 mg N/L) exceeds the GM of the LC20 range (11.6 mg N/L). The very highest a cold-water ammonia criterion should be set at is only 2.4 times the year-round chronic criterion.

Response

If the summer CCC had been set at 3 mg N/L rather than 1.27 mg N/L, then there might be good reason to limit the winter adjustment to 2.4X rather than 3X.
However, application of the 2.4X ratio to the actual 1998 CCC (1.27 mg N/L) would not seem appropriate because the two most sensitive genera are not fish, but invertebrates. For invertebrates, the reason the winter criterion can be higher is that the data for invertebrates show them to be less sensitive at low temperature. Consideration of the available fish LC20s does not indicate that a 3X adjustment would yield a toxicity problem. Nevertheless, for 1999 EPA has redone the entire seasonal assessment to provide a genus-by-genus evaluation of chronic effect levels under winter conditions.

Comment B-06

Regarding invertebrates, the two most sensitive species in the ammonia chronic database are Hyalella azteca and Musculium transversum. All of the chronic tests were conducted at temperatures ranging between 23.5 and 25 °C. No data are presented to suggest that the cold-water chronic toxicity would be any different than the warm-water toxicity. On page 76, the document defends the concept that survival is less sensitive under cold weather conditions by using acute toxicity data. They cite one study that found the 96-hour acute LC50 for Musculium transversum was 1.9 times higher at 15 °C than at 21 °C, and 2.7 times higher at 5 °C than at 21 °C. The same study found that for an amphipod, Crangonyx pseudogracilis, the 96-hour LC50s were 6-fold higher at 12-13 °C and 8-fold higher at 4 °C than at 25 °C. If the assumption is made that chronic toxicity is similarly reduced by colder temperatures, then raising the CCC can be justified. However, acute toxicity is not the same as chronic toxicity. Ammonia is a metabolic waste product. No organism can live on its own waste; waste products must be excreted. Long term, continuous exposure to ammonia could have physiological impacts through different pathways than those that cause short term, acute toxicity.
Even if the toxicity pathways are the same, the document acknowledges on page 76 that: “The effect of temperature on the rate of biochemical processes might, however, affect the result of acute tests more than the results of chronic tests.” One could hypothesize a corollary to the finding of the amphipod toxicity data described above, that it would take the same concentration of ammonia 6-8 times longer to result in an LC50 at 5 °C than at 25 °C. Six to eight times 4 days (96 hours) is only 24-32 days, significantly less than a typical winter.

Response

EPA agrees that decreasing temperature would probably reduce chronic toxicity less than acute toxicity. Thus, in the 1999 Update the relationship between chronic toxicity and temperature has been taken to be less steep than the relationship between acute toxicity and temperature. EPA believes that the 1999 Update, in contrast to the 1998 Update, has evaluated the available data as rigorously as is feasible, and has accounted for the effect of temperature on kinetics. EPA does not believe, however, that it can endorse the comment’s speculation that the same concentration that causes toxicity in four days at 25 °C would cause toxicity in 24-32 days at 5 °C.

Comment B-07

The proposed chronic criterion for ammonia is underprotective because of the use of the “less than” values for Hyalella azteca and Musculium transversum in the chronic database. Because these species are the two lowest species in the chronic database, they have considerable influence over the slope of the regression that determines the FCV. One way in which the uncertainty of the SMCV for Musculium transversum could have been accommodated would have been to include the Zischke & Arthur (1987) field data, which showed a CV of 1. Inclusion of this data would result in a GMCV of 1.90 instead of 2.62, and the resulting FCV would in turn be <1.15 instead of <1.27.
While this approach is not mathematically or statistically precise, it does reduce the uncertainty of the Musculium transversum GMCV. Some type of uncertainty analysis should also be applied to Hyalella azteca.

Response

The GMCV for Hyalella is a less-than value because the lowest treatment concentration yielded more than 20 percent effect. This species is being retested. If the results are substantially different than those presented in the document, then EPA will recalculate the criterion.

The GMCV for fingernail clam is a less-than value because the relevant tests were of juvenile long-term survival. It was not an early life stage test capable of reflecting survival, growth, and reproductive endpoints. No test protocol of that type exists for fingernail clam. EPA believes that including the fingernail clam juvenile long-term survival test in the data set improves the reliability of the criterion and is to be preferred over simply rejecting such tests as unsuitable for deriving chronic criteria.

Lowering the fingernail clam GMCV by including field results does not substantially change the CCC. EPA has calculated the GMCVs from lab data because they are generally more precise than field data. EPA has no basis for using the field data for fingernail clam and not using the field data for other species. In contrast to fingernail clam, the field data for fathead minnow indicated higher effect concentrations than the lab data. EPA is also reluctant to modify its handling of fingernail clam data on the basis of factors related to the Hyalella test. Such attempts to balance unrelated potential errors tend to appear subjective and are difficult to explain.

Comment B-08

The document erroneously states that the fingernail-clam chronic value is already based on long-term survival of juveniles so it is a relevant end-point for cold-weather conditions. The chronic value was based on two 42-day studies during warm water conditions, and one “similar” study.
Forty-two days cannot be construed to be “long-term survival”, considering that the typical lifespan for the species is 12-18 months. There is no basis for assuming those tests in any way reflect survival during cold weather conditions. Pennak (1989) reports that when temperatures drop below 10 °C, freshwater mussels like Musculium transversum burrow into the substrate with only their siphons extended. Even though the animals are generally dormant, their siphons do open infrequently, so they would be exposed to ambient concentrations of ammonia during the winter.

Response

See response to Comment B-07 for discussion of the use of the fingernail clam tests. Also note that since the chronic criterion is applied as a 30-day average, 42-day average concentrations would necessarily have to be somewhat less than the 30-day average in real-world time series. EPA does not agree with the comment’s implication that the Update document assumed that the animals were dormant in winter. In fact, EPA assumed that they were not dormant; EPA intended to provide them with the same level of protection in winter as in summer, while recognizing the observed effect of temperature on sensitivity.

Comment B-09

Similarly, according to Pennak (1989), Hyalella azteca is a cold stenotherm. Although amphipods are typically shallow water organisms, some species can be found quite deep in the water and most are well-adapted to cold water conditions. There is no reason to believe that they are entirely dormant during cold weather. They can be present in very high abundances, and if active during winter months, they would be exposed to potentially toxic concentrations of ammonia.

Response

As with Comment B-08, EPA does not agree that the temperature adjustment assumes that the organisms are “entirely dormant” at cold temperature, any more than the pH adjustment assumes that the animals are dormant at low pH. The temperature relationship was derived from the available toxicity data.
Considerations of what the organisms might be doing in winter versus summer did not affect the Hyalella GMCV, in part because the available toxicity data do not allow distinguishing between survival and reproduction effect concentrations.

Comment C-01

The Update contains no overall evaluation of the level of conservatism. ... Such an analysis would have clearly demonstrated that the suggested criteria are much more restrictive than the laboratory data underlying the criteria calculation.

Response

In fact, the 1998 Update did provide such an evaluation, at least with respect to the criterion concentration values, in the form of Figures 9-12 (or Figures 11-14 in the 1999 Update). These figures compare all the acceptable acute and chronic toxicity test data to the criteria concentrations, and therefore would embody all aspects of the criteria process that affect the concentration calculations. There are issues, such as averaging periods, which are not embodied in these plots and which will be treated separately later, but these figures do demonstrate that the procedures used to derive the criteria did not result in criteria concentrations “much more restrictive than the laboratory data underlying the criteria calculation”.

Consider the 1998 Update’s Figure 10 regarding acute toxicity. The FAV is supposed to correspond to the fifth percentile most sensitive taxa, so only a small percentage of the test results should lie below the FAV. In the pH 7.5-8.5 range, several percent of the LC50s, for a variety of organisms, lie below the FAV. The criterion is two-fold lower than the FAV to provide a level of protection better than 50% mortality, but several LC50s are even below this concentration. Thus, to characterize the criterion as “much more restrictive” than the laboratory data is unjustified. At more extreme pH (<7.5 and >8.5), only a few LC50s are near or below the FAV and all are well above the criterion concentration.
This might be a reason for the contention that the criterion is too restrictive, and is certainly a basis for criticism of the pH relationship later in the Comments. However, this behavior is exactly what is expected because there are few data in these pH ranges. The FAV is intended to be at the fifth percentile, so no LC50s should lie below it in these pH ranges, in which there are not very many tests. The fact that there are some LC50s at or very near the FAV in these pH ranges actually indicates the FAV is appropriate. To demonstrate this point, a more detailed analysis, provided in Appendix I of this response document, shows what scatter pattern is expected if the pH relationship used here is absolutely correct. This analysis shows a pattern very similar to that in Figure 10.

For chronic toxicity, the 1998 Update’s Figure 12 provides a similar demonstration that the criterion is not “much more restrictive than the laboratory data”. There are several chronic values for a variety of species near or below the criterion line. Given the limited amount of data available, the location of the criterion relative to the data is reasonable and argues against the alleged severe compounding of conservative assumptions, at least with respect to the procedures used to derive the criterion concentrations. The fact that the data do not lie near the criterion concentration at high and low pH is again expected due to the limited amount of data and the relative sensitivity of the species involved.

As mentioned above, these figures do not encompass all factors which might relate to how conservative the criteria are, but do refute the contention that the “criteria are much more restrictive than the laboratory data underlying the criteria calculation”, at least with respect to the criterion concentrations. Conservatism due to other issues such as averaging periods is addressed later in response to specific comments.
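The sample-size argument in the response to Comment C-01 can be illustrated with a simple binomial calculation. This is only a sketch and is not the Appendix I analysis: it assumes, for illustration, that each test result independently has a 5% chance of falling below the FAV, and the test counts used are hypothetical.

```python
from math import comb


def prob_k_below(n_tests, k, p_below=0.05):
    """Binomial probability that exactly k of n_tests results fall below the FAV.

    ASSUMPTION: results are independent and each falls below the FAV
    with probability p_below (the fifth-percentile definition).
    """
    return comb(n_tests, k) * p_below ** k * (1.0 - p_below) ** (n_tests - k)


def expected_below(n_tests, p_below=0.05):
    """Expected number of results below the FAV under the same assumption."""
    return n_tests * p_below


if __name__ == "__main__":
    # Hypothetical test counts: a sparse pH range versus a well-studied one.
    for n in (10, 100):
        print(f"n={n}: expected below FAV = {expected_below(n):.1f}, "
              f"P(none below) = {prob_k_below(n, 0):.3f}")
```

Under these assumptions, a pH range with only about ten tests has roughly a 60% chance of showing no LC50 below the FAV at all, whereas a range with a hundred tests would almost certainly show several. Sparse data at extreme pH therefore say little about whether the FAV is too restrictive there.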
Comment C-02

The Update document also fails to correct the long-standing misapplication of chronic criteria under seven-day once in ten year flows.

Response

EPA has performed rigorous analysis of first-order serially correlated, log normally distributed time series of concentrations, with the intent of determining what percentage of grab samples or 24-hour composite samples would need to attain the CCC in order for the 30-day average not to exceed the CCC more than once in three years.

EPA has previously noted, for example in material supporting the Great Lakes Initiative, that it believes that the once-in-three-year goal is consistently sufficient to ensure protection, but not always necessary. Although the scope of the ammonia project provided for a careful assessment of the appropriate averaging period, it did not provide for a re-examination of the once-in-three-year goal. However, if it were assumed that the once-in-three-year goal was appropriate, the purpose of the time series analysis was to determine the grab or composite sample exceedance frequency (or the percentage of time exceeding) that would correspond to the 30-day average, once-in-three-year goal. EPA recognizes the incongruity of performing a sophisticated analysis on a rudimentary endpoint, but believes the results are still of interest, and relevant to addressing the comment.

The time-series analysis indicated that to attain the 30-day once-in-three-year goal, the percentage of grab or 24-hour composite samples that need to be below the CCC depends on (a) the degree of serial correlation, and (b) the amount of variability in the time series. Serial correlation is measured by the correlation coefficient between the logs of daily mean concentrations on adjacent days. High serial correlation means that the variations in concentration occur smoothly. Low serial correlation means that the concentration variations are abrupt and choppy.
Low serial correlation makes it more difficult to put together a sufficient number of daily exceedances to cause the 30-day averaging period to exceed the criterion.

Variability is measured by the standard deviation of the logs of concentrations. Because the analysis is simply relating the frequency of either grab or daily exceedances to the frequency of 30-day exceedances (not to the mean of the time series), it might not be expected that the standard deviation would be an important parameter. However, it should be noted that the time series is taken to be log normal, but the 30-day average is an arithmetic mean, not a logarithmic (or geometric) mean. The use of arithmetic means on log normally distributed values yields a dependency on the log variance. On a log scale the highs and lows are equidistant from the median in such a distribution, but on an arithmetic scale the highs are relatively further from the median than are the lows. Consequently, as the log standard deviation increases, it takes fewer peaks, because of their magnitude relative to the troughs on an arithmetic scale, to exceed the 30-day arithmetic mean.

The procedure for counting 30-day average exceedances is that used in the 1986 “Technical Guidance Manual for Performing Waste Load Allocations, Book VI, Chapter 1, Stream Design Flow...”. There are many different ways of counting multiple successive exceedances. The results are affected by the counting procedure used. EPA believes that this particular counting procedure is not unreasonable, even though it has no better relationship to the original selection of the three-year recurrence goal than other equally valid counting procedures.
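The type of time series described in this response can be sketched with a short simulation. This is a hedged illustration, not EPA's analysis: the function names, parameter values, and seed are hypothetical, and the simple fraction of overlapping 30-day windows computed here is a stand-in that does not reproduce the counting procedure of the 1986 guidance manual.

```python
import math
import random


def simulate_log_ar1(n_days, rho, log_sd, seed=0):
    """Daily concentrations whose natural logs follow a stationary AR(1) process.

    rho    : correlation between logs of concentrations on adjacent days
    log_sd : standard deviation of the logs of concentrations
    """
    rng = random.Random(seed)
    innovation_sd = log_sd * math.sqrt(1.0 - rho ** 2)  # keeps the variance stationary
    log_c = rng.gauss(0.0, log_sd)
    series = [math.exp(log_c)]
    for _ in range(n_days - 1):
        log_c = rho * log_c + rng.gauss(0.0, innovation_sd)
        series.append(math.exp(log_c))
    return series


def exceedance_fractions(conc, daily_attainment=0.95, window=30):
    """Return (fraction of days above CCC, fraction of 30-day arithmetic means above CCC).

    ASSUMPTION: the hypothetical CCC is placed at the empirical quantile of the
    daily values given by daily_attainment, i.e. 95% of daily samples attain it.
    """
    ranked = sorted(conc)
    ccc = ranked[round(daily_attainment * len(ranked)) - 1]
    daily = sum(c > ccc for c in conc) / len(conc)
    running = sum(conc[:window])
    over = int(running / window > ccc)
    for i in range(window, len(conc)):
        running += conc[i] - conc[i - window]  # slide the window by one day
        over += running / window > ccc
    return daily, over / (len(conc) - window + 1)


if __name__ == "__main__":
    conc = simulate_log_ar1(n_days=20000, rho=0.9, log_sd=0.6, seed=1)
    daily, monthly = exceedance_fractions(conc)
    print(f"daily exceedance fraction: {daily:.3f}")
    print(f"30-day mean exceedance fraction: {monthly:.3f}")
```

Rerunning the sketch with a larger `log_sd` or a larger `rho` should raise the 30-day exceedance fraction relative to the daily one, which is the qualitative dependence on variability and serial correlation that the response describes.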
The results of the analysis indicated that for log normally distributed concentrations with log serial correlation coefficient between 24-hour composites of 0.86-0.94 or lower, and log standard deviation of 0.5-0.8 or lower, attainment of the CCC in 95% of grab samples or 24-hour composites can be expected to allow attainment of the 30-day once-in-three-year goal. Available data suggest that the above degree of serial correlation and variability are reasonable for surface waters. What this means is that maintaining concentrations below the CCC 95% of the time will yield attainment of the 30-day once-in-three-year goal.

Comment C-03

The Update document, using acute data, concludes that fish (unlike invertebrates) are equally sensitive to total ammonia at low and high temperatures and therefore, constant total ammonia criteria are necessary. Although invertebrates could tolerate much higher ammonia levels at lower temperatures, no temperature adjustment was made in formulating the acute or chronic criteria because invertebrates were determined to be “insensitive” based on acute data. However, unlike the acute database, the two most chronically sensitive species were invertebrates, not fishes. Because the chronic criteria were based directly on invertebrate sensitivity to ammonia at high temperatures (25 °C), the Update should have allowed much less restrictive criteria at lower temperatures. This is a clear error in the chronic criteria derivation and has a major impact on the appropriate chronic criteria at various temperatures. The chronic criteria should be temperature dependent.

Response

EPA agrees that the 1998 Update was problematic in not having a temperature dependency. The analysis was revised for 1999, such that the CCC increases as temperature causes the sensitive invertebrate GMCVs to increase, irrespective of whether the fish ELS are present or absent.

Comment C-04
EPA improperly asserts that, depending upon fish spawning, either a 1.27 mg/L or 3.81 mg/L (total ammonia as N) chronic criterion is necessary throughout the winter months. This recommendation is not supported by any information on the sensitive species that drove the document (e.g., Hyalella and Musculium - fingernail clam). No data presented in the Update indicate that chronic toxicity criteria need to be more restrictive than 9.0 mg/L (as N) in the winter when sensitive organisms do not spawn and significant growth is not occurring. All available winter data on the four most sensitive organisms (including bluegill) indicate that a value of 9.0 mg/L indexed to pH of 8 should be acceptable in the winter.

Response
See response to C-03. EPA does not agree that the CCC could be 9 mg N/L. This would substantially exceed the likely effect concentrations for Hyalella. Refer also to responses to C-34 through C-42.

Comment C-05
In generating the pH relationship, the Update stated that it was “speculative to assign different relationships for different taxa” and therefore, EPA “used [the] average generic shape for the pH dependence” (Update @ 24 - 25). Contrary to these statements, the chronic pH relationship was “flattened out” (i.e., made more restrictive than the acute pH relationship) based on very limited data. This generated more stringent chronic ammonia criteria for pH values ranging from 7.0 - 7.7 even though a review of the very limited chronic data does not indicate that more restrictive criteria are necessary. Using test results below pH 7.0 (a relatively rare stream condition) to skew the criteria lower in the 7.0 - 7.7 range (a typical environmental condition) is inappropriate. This lowered the criteria by 30% @ pH 7.5 and 85% @ pH 7.0.

Response
EPA does not agree.
First, although there are limited chronic pH data, statistical tests on the available data demonstrated significant differences between chronic and acute relationships, which is also very evident from visual inspection of the data (see Appendix II of this response document). Most importantly, the statistical comparisons included acute and chronic tests from the same study, for which the ratios between acute and chronic effect concentrations differed by several-fold across the pH range. Such large differences should not be ignored. Second, there is no inconsistency here between the data standards applied to generating chronic relationships and those used to select a generic acute pH relationship. The Update recognized that acute pH relationships differ among taxa, and even considered the potential significance of the pH relationship for Hyalella on acute criteria at low pH and ion concentration. What “speculative” referred to was not whether there was a basis for concluding that certain taxa had different relationships, but rather which specific relationships, other than the pooled generic one, should be assigned to various taxa, especially those that were not tested for pH effects. Relationships between pH and toxicity were not available for the three most sensitive genera, and the fourth most sensitive genus (Oncorhynchus) has a relationship very close to the generic one. Establishing different pH relationships for different taxa was not justified due to the lack of certain data and the minimal effect such an effort would have on the criterion. This issue is thus much different than that regarding differences between acute and chronic toxicity, to which the comment drew a parallel. This issue of pH relationships is discussed in more detail in the responses to Comments C-24 through C-33.

Comment C-06
The pH relationship was developed based upon fish, under the claim that they are the most acutely sensitive species, but is then applied to the invertebrate data which represent the most chronically sensitive species. The invertebrate data do not indicate any consistent pattern of pH dependence (see, Hyalella acute data) calling into question the pH dependence assumption for chronic criteria.

Response
The first sentence of the comment does not correctly characterize the 1998 Update. First, the pH relationships were not “developed based upon fish”, but on a mix of fish and invertebrates. Second, a “claim that they (fish) are the most acutely sensitive species” had nothing to do with the development or application of the pH relationships. With regard to the invertebrate pH dependence, it is true that there is uncertainty, just as there is uncertainty in any relationships for criteria, including those proposed in the commenter’s Appendix A. The Update attempted to minimize uncertainty by using the average trends in data most relevant to each question. Nothing in this comment indicates that this was not the case, and the comment most definitely does not make the case for “conservative” assumptions. The legitimacy and uncertainty of the pH relationships is discussed further in later sections of this response document.

Comment C-07
The chronic database was skewed with an abundance of more sensitive species, causing the calculation of a lower chronic criteria than is justified by the Guidelines for Development of National Water Quality Criteria for the Protection of Aquatic Organisms and Their Uses (USEPA 1985) (hereinafter “National Guidelines”). The Update acknowledged that the acute criteria database was more balanced with a better representation of sensitive and less sensitive species (Update @ 71). While EPA recognized this fact, no action was taken to properly balance the chronic criteria calculation.
This assumption lowered the criteria by approximately 15 percent (15%).

Response
The above chronic database issue can be viewed from two standpoints: (a) relative to the Guidelines minimum database, and (b) relative to the acute database. The composition of the Update’s chronic database is in accordance with the National Guidelines, exhibiting the diversity specified for the minimum database, with the exception of one invertebrate, an insect. This shortfall was handled by assuming that, if tested in a full chronic test (in place of the available subchronic test), such an insect would have been tolerant. This particular assumption cannot be viewed as conservative. In addition, it may be noted that the chronic database included two genera from the tolerant family Daphnidae, and thus could have legitimately had even one fewer tolerant species. On the other hand, when compared to the acute database, the chronic database is dominated by taxa with low or mid-range acute LC50s, and has little representation from taxa having high LC50s. The Update document notes this fact in the context of discussing a variety of factors or uncertainties, some of which would raise the criterion, and some of which would lower the criterion. EPA does not feel that it has a basis for saying that the taxonomic representation in the acute database better corresponds to nature than does the taxonomic representation in the chronic database. Consequently, EPA does not feel that it has a good basis for undertaking what might well be criticized as a contrived adjustment of N, the number of tested species. In particular, EPA does not agree with the commenter’s implication that such an adjustment of N is called for by any provision in the Guidelines, although such an adjustment is not without precedent. Because N was already adjusted upward by one (for the tolerant insect), it is only three highly tolerant species shy of matching the acute data set.
Increasing N by three would yield less than a 10% increase in the CCC.

Comment C-08
Less restrictive acute and chronic criteria are justified under cool weather conditions based upon analysis of the studies that specifically evaluated temperature dependence for the more sensitive species. (See, Appendix A.) By “lumping” all of the data together in Figure 4, including those data and studies that were insufficient to assess whether a total ammonia/temperature dependence existed, EPA caused the criteria to be more stringent by a factor of 1.5 - 2.0 during the winter, even where sensitive fish species may spawn.

Response
EPA does not agree that the pooling method used in the Update caused the criterion to be more stringent at low temperature. This is discussed more in later responses (C-34 through C-42). Nevertheless, EPA agrees that the 1998 CCC was somewhat too low under certain conditions, particularly when fish ELS were present in cooler waters. This has been addressed in the 1999 Update. It might be noted, however, that under some other conditions, the 1999 CCC is more stringent than the 1998 CCC.

Comment C-09
The short term chronic criteria were arbitrarily reduced from an acceptable seven-day average value 2.5 times the thirty-day average to a four-day average 2.0 times the thirty-day average. There is no technical basis for this reduction in variability factor and averaging period.

Response
Some additional safety is provided by limiting the 4-day average to a factor of 2.0 times, rather than 2.5 times the 30-day average. However, EPA agrees that a 4-day average 2.5 times the 30-day average is supported by the data. The 1999 Update has incorporated this change into its recommendation. See also the response to Comment C-54.

Comment C-10
The chronic studies were generally long-term studies (significantly greater than thirty days).
The criteria are typically applied to environmental conditions that occur for periods of seven to thirty days once every five to ten years even though the document indicates that a thirty-day exceedance once in three years is acceptable. EPA has previously acknowledged that applying criteria in this manner is “very conservative.” Despite this acknowledgment, the Update makes no attempt to advise state authorities on appropriate ways to convert the criteria to effluent limits without imposing excess conservatism. (For example, a 30Q3 flow should be used for permit derivation with the chronic criteria. Use of this flow basis ensures that even minor exceedance of the criteria will not occur more frequently than once in three years when permit compliance is achieved.)

Response
See responses to Comments C-02 and C-59.

Comment C-11
When treatment plant design conservatism is considered (normally plants are designed to operate at 50% - 30% of the permit limit and typically better under low flow conditions), it is apparent that the standards to permits process will produce a further safety factor of two to three.

Response
EPA considers this to be outside the scope of the criterion. EPA does not believe that it is appropriate to adjust the criterion to counteract potential conservative biases of engineers designing sewage treatment facilities. EPA cannot assume that all permit limits result in the addition of new or upgraded treatment processes, in which a design engineer would have an opportunity to exercise a conservative bias.

Comment C-12
The Update asserts that acute criteria, based upon no-mortality 96-hour tests, should be applied as one-hour averages. There is no basis whatsoever in the Update to support this position, and this issue has major permitting consequences. The restrictive acute averaging period recommendation is cited by states to reduce allowable mixing zones even though no realistic acute threat is present.
At most a 24-hour averaging period should be applied to the acute criteria with a caution that pulse or batch discharges (a non-municipal discharge scenario) need to be evaluated separately considering the toxicological information in the Update.

Response
This issue is raised again in Comment C-60 and will be discussed in detail there. However, it should be noted here that the 1-hour averaging period is specified in the National Guidelines, which allow alternatives where appropriate. As explained later, available data do not demonstrate a basis for significant relief from this default averaging period, and certainly do not justify a 24-hour averaging period. This comment also seems to imply that short averaging periods create inappropriate conservatism due to how they are implemented. EPA recognizes the need for updating its mixing zone guidance, which is contained in the Technical Support Document for Water Quality-based Toxics Control (1991). The averaging period is an expression of how a concentration time series should be restricted to limit the possibility of transient exposures that can cause greater effects than intended by the criteria concentrations. Setting a longer averaging period to compensate for potential conservatism in how the criteria are applied is not appropriate.

Comment C-13
The recommendation to apply the ELS/invertebrate-based warm weather chronic criteria during the winter months violates the National Guidelines requirement that the criteria must be well supported and not have a significant likelihood of over- or under-protection. The Update acknowledges that essentially none of the critical chronic criteria development assumptions apply during the winter months (e.g., the presence of sensitive life stages of fish, the temperature adjustment for invertebrates, etc.).
The mixed database of sensitive invertebrates and fishes clearly indicated that non-ELS fish life stages are much less sensitive than ELS and that the invertebrate temperature dependency significantly affects the chronic criteria calculation. Therefore, EPA must state that the ELS chronic criteria do not apply during low temperature periods, and the Agency needs to recalculate a winter criterion using valid and applicable assumptions.

Response
The National Guidelines do state general goals of avoiding over- and under-protection, but also give specific tests and procedures for setting chronic criteria, including data needs for adjusting criteria for various water quality factors, which are followed by the Update. The Guidelines do not recommend adjusting for specific factors if there is not sufficient data to do so. EPA believes that the cold-season policy provision in the 1998 Update addressed the concern of the comment. Nevertheless, EPA recognizes the problem that a lack of a temperature-dependency caused, and has provided for this in the 1999 Update. EPA believes that the 1999 Update also fully addresses the concern of the comment.

Comment C-14
The use of Hyalella data from Borgmann (1994) (which were inserted by EPA into the criteria calculations after the Peer Review) is unauthorized because of high mortality occurring in the controls. Only 66% of the controls survived, and considerable variability was exhibited in the control reproduction (30 to 65 young per flask). The actual individual test results only reported 60% survival in two of the four tests! This indicates that (1) the stock was diseased, (2) some factor other than ammonia greatly influenced survival, or (3) there were problems in running the test. EPA would never accept such results for establishment of site-specific criteria if submitted by a permittee.
In any event, EPA should not use this data to establish the criteria, as it clearly fails acceptability guidelines (no greater than 20% mortality); thus, it should not be used to set a national criteria.

Response
EPA does not agree for the following reasons:

(a) Reproductive tests are generally quite variable, and variability per se is no measure of acceptability. The relevant issue is whether, with this variability, appropriate statistical tests demonstrate a significant effect of the chemical relative to the controls. This was clearly the case here. This is certainly not an inconsistency with the Guidelines.

(b) This comment does not give references or specifics about its cited “acceptability guideline”. In fact, no such guideline exists for this test. There is a guideline of 20% mortality for certain tests. This includes a 10-day test for Hyalella, but the test in question here was for 10 weeks. If 20% mortality is acceptable in 10 days, is it unreasonable to have 13% more mortality in another 60 days? These are relatively short-lived organisms and 10 weeks is a significant fraction of their life span. In culture units that are reproducing and growing vigorously, average adult mortality often reaches and exceeds 20% over a ten week period, and the tests in question included earlier life stages which would typically have even greater mortality.

(c) Several aspects of the tests argue against undue influence of disease or other factors. There was a clear, consistent, and steep dose-response with ammonia concentration. The data were combined from duplicate tests which were consistent with each other. Another test starting with adults (which did have only 20% control mortality) showed 68% reduction in reproduction over six weeks at the lowest ammonia concentration tested. This reduction is similar to the 78% reduction in reproduction over ten weeks observed at the same concentration in the tests starting with juvenile organisms.
(The juvenile test was used because it (a) included lower concentrations which better defined the effect concentration and (b) included more life stages). For the above reasons, using this study is not inconsistent with the National Guidelines, but is actually quite in accordance with them and with the use of the best available information.

Comment C-15
It should be noted that the Peer Review process concurred with EPA’s earlier recommendation that [the Borgmann (1994) Hyalella data] should not be used because of the high control mortality. It is inappropriate to use this data to derive more restrictive criteria in light of this prior position and the lack of subsequent Peer Review of this data.

Response
EPA does not agree. It is true that in the draft Update that went to peer review, EPA did not include the Hyalella data in the actual criteria calculations. But this was not because of the high control mortality. The primary reason was that the EC20 was below the lowest treatment concentration, which resulted in a large uncertainty in the EC20 and violated the prerequisites adopted for EC20 estimation. A secondary reason regarded uncertainty about possible impacts of pH variation during the test based on the pH ranges reported in the paper. Furthermore, the draft Update specifically stated that the calculated criterion should be evaluated with respect to whether it afforded protection to Hyalella based on this test. Therefore, although the draft Update did not use the EC20 of this test directly in the calculations, it still gave credence to this test. After the peer review, and in spite of one of the peer reviewers’ concerns about the suitability of the test, two things happened that changed the EPA position. Concerns about the pH variation were satisfied after receipt of more detailed data from the paper’s author. This allowed the pH associated with effects concentrations to be adequately defined.
More importantly, it was realized that, even if the EC20 was too uncertain to use in calculations, the lowest test concentration represented a concentration which almost certainly exceeded the EC20. Using this in the criteria calculations thus provided an “anti-conservative” (high) estimate for this organism and for the criterion. Continuing to not use this point would result in a criterion higher than what valid information for a sensitive organism indicated. To not use this information would be inconsistent with the National Guidelines and the use of the best available information. EPA does not agree that such changes between draft and final represent an improper response to or handling of the peer review process. Peer review is intended to elicit expert opinions which will point out errors or raise issues not fully addressed in the draft document and which should be considered in further changes to the document. It does not require that all peer opinions or suggestions be followed - this is often not appropriate or possible, in part because there is often a difference of opinion among the peers. In fact, there were a variety of peer comments which, if followed, would have made the criterion more restrictive, but which were not adopted in the Update. Nevertheless, although EPA is not bound to follow the course of actions recommended by peer reviews, EPA does consider it necessary to provide clear justifications for not following peer recommendations. Peer review also does not preclude other changes not raised in the peer process, even if the peer group endorses an approach which is then changed. Again, proper justification should be given if the change appears to be contrary to peer opinion. With respect to the Hyalella test, EPA believes that it has clearly documented sound reasons for the change in question.

Comment C-16
It should be noted that unpublished follow-up studies on Hyalella, not included in the record, indicated that increased ammonia levels were acceptable (between 2.5 mg/L and 3.5 mg/L [as N] nominal exposure). However, the measured ammonia levels were much less than nominal exposures (35-60% less) calling into question the validity of the test measurements and procedures. (Note: nominal and measured concentrations in Borgmann (1994) were virtually identical over the ten week test period.)

Response
EPA believes that the new range-finding test (from the Columbia Lab), not used in the Update, shows Hyalella to be quite sensitive, the effects concentrations being no more than a factor of two higher than in the test used in the Update, even based on nominals. If anything, this argues against the contention that the earlier Hyalella test was faulty in some way. Based on measured concentrations, this new test shows even greater sensitivity than the earlier test.

Comment C-17
The Update used organism reproduction as the sensitive endpoint in the chronic analysis for Hyalella. The Hyalella study authors concluded that organism reproduction should not be used for evaluation purposes because of low sample size and high variability between replicates. (Borgmann @ 332-333.) Courts have repeatedly ruled that it is per se arbitrary and capricious to use an expert’s test results in a manner contrary to the conclusions of the expert. Almay, Inc. v. Califano, 569 F.2d 674 (D.C. Cir. 1977). Thus, the use of the Borgmann study (if at all) should be limited to chronic mortality endpoint analysis, recognizing that growth is also not adversely affected at that level as stated by Borgmann.

Response
This comment misrepresents the cited statements by the author, which are as follows:

Reproduction was significantly reduced at 0.32 mM ammonia in the experiment with young amphipods. This was first observed in experiment 2 (the first 10 week experiment).
Concentrations of 0.10 and 0.18 were, therefore, added in the final 10 week experiment. Reproduction was also lower at these concentrations, but not significantly so because of the low sample size and high variability in reproduction between replicates. Reproduction in the experiment with adults was also reduced at 0.32 and 0.56 mM, although only the reduction at 0.56 mM ammonia was statistically significant. The mode of action of ammonia is, therefore, different from that of metals or PCBs, which do not cause significant reproductive impairment at concentrations below those causing chronic mortality (Borgmann et al., 1989, 1990, 1993). Chronic mortality is, therefore, not a reliable indicator for use in estimating safe concentrations of ammonia to Hyalella, unlike the other toxicants studied so far in our laboratory.

What the author actually states is that reproductive effects at the lowest concentrations tested were not significantly different from the controls based on the statistical tests that he employed, because of the low sample size and high variability among replicates. This in no way says that reproduction should not be used for evaluation purposes, and it certainly does not support the use of chronic mortality instead of reproduction. In fact, the author clearly concludes that reproductive effects are present at lower concentrations than mortality and explicitly states that it is mortality that is not reliable in estimating safe concentrations of ammonia. One problem here is that the statistical tests used by this author are insensitive and cannot confirm that 50-70% inhibition of reproduction is significant. This is not uncommon in the toxicity literature and more appropriate statistical evaluations show these reproductive effects to be real, as Borgmann clearly believes they are.
EPA also believes that it does not make sense to reject a study which shows an organism to respond to low concentrations merely because the author did not statistically demonstrate that the effects were present at even lower concentrations. But even if this comment were correct that reproductive effects should be ignored, what would be the consequence to the criterion? The EC20 for survival in these tests is at the lowest test concentration, the same concentration used in the Update. Thus, the criterion would be exactly the same, except that it would not be based on a “less than” value.

Comment C-18
In addition to basic data acceptability issues regarding the Hyalella test, there are a number of confounding factors that preclude use of this test. Both Borgmann studies (1994 and 1996) confirmed that a host of water quality factors, unrelated to ammonia, influence the toxicity of ammonia to this organism (potassium, hardness, bromide, and sodium). The 1996 acute tests demonstrated that water effects change organism sensitivity by at least a factor of ten. In mid-west and western streams where salt and hardness levels are typically high, this organism would be insensitive to ammonia at hardness greater than 200 mg/L. Thus, whether or not this organism is the “most sensitive tested” will vary from site to site. Given this information, the National Guidelines would require that the criteria be a function of these various parameters. The Update, however, concluded that essentially no water effect ratio is relevant to the criteria. Given the atypical response of Hyalella, using this organism to drive the national criteria calculation is inappropriate. Minimally, the Update should recalculate the criteria eliminating Hyalella for streams with hardness greater than 200 mg/L.

Response
This comment does not accurately reflect available data and relationships. It is true that certain ions do affect ammonia toxicity to Hyalella. Ankley et al.
(1995) reported that acute ammonia toxicity to Hyalella decreased with increasing hardness, and further noted that this effect was greater at low pH. At pH 8.5, the variation in LC50 was only 1.5-fold between soft water (50 mg/L CaCO3) and hard water (240 mg/L CaCO3), but appeared to be at least 10-fold different between soft and hard water at pH 6.5. Borgmann (1994) also showed a large difference in toxicity between waters with hardness of 14 mg/L and 140 mg/L as CaCO3 (although this effect was also confounded by pH, the effects of other ions were clear). Borgmann and Borgmann (1997) showed that these effects were due to sodium and potassium, not hardness (and also not bromide as this comment asserted). The effects of sodium and potassium appear to be on the toxicity of ammonium ion, not unionized ammonia, which is why the effects are most pronounced at low pH, where ammonium ion toxicity predominates. This implies the following:

(a) EPA does not agree that the criterion should be recalculated for hardness > 200 mg/L by excluding the Hyalella data for this range. First of all, hardness is not the factor of importance here, so it would be inappropriate to adjust the criterion in terms of hardness. Second, it would be inappropriate to simply exclude one species, because there is no reason to suspect that Hyalella is “atypical” as the comment suggests. These ion effects are consistent with mechanisms for ammonia toxicity that are likely true for other organisms as well, though not necessarily to the same degree. In fact, other authors have noted effects of such ions on ammonia toxicity to fish (e.g., Soderberg and Meade, 1992, J. Appl. Aquaculture 4:83).

(b) If there is some adjustment to be made, it should be based on sodium, and this adjustment should also vary with pH. Borgmann and Borgmann (1997)
propose a model for this, this model being a simple extension of the joint toxicity model already used for the pH relationship in the criteria, with the toxicity of ammonium ion being sodium dependent rather than constant. (Their model also includes a potassium dependence, but the sodium dependence will almost always predominate in natural waters.)

(c) The data and model of Borgmann and Borgmann indicate that increasing sodium concentrations above that already present in the Hyalella chronic reproductive test (0.6 mM, or about 14 mg/L) would have relatively little effect. Their data show, at pH 7.6, the LC50 at 10 mM sodium (230 mg/L) to be only 45% greater than at 23 mg/L. Their model indicates that if the chronic Hyalella test had been run at tenfold higher sodium levels (140 mg/L), the effect concentrations would only be about 30% greater. So high a sodium concentration is very unusual in receiving waters (the 14 mg/L in the test itself is already at the median sodium concentration for U.S. waters based on NASQAN monitoring, and less than 10% of waters have concentrations even over 100 mg/L). Very few waters would get relief, and what relief should be given is limited and uncertain.

(d) In contrast, applying this model to the Hyalella chronic data would usually result in greatly lower effects concentrations in waters with low sodium concentration. Borgmann and Borgmann (1997) showed 7-day total ammonia LC50s as low as 0.14 mg/L for low ion waters (2 mg/L sodium) with pH of 7.4-7.8. Furthermore, this model (and Ankley’s data) suggest the pH relationship should be more restrictive at low pH because these sodium effects are more pronounced there.

(e) However, these effects are only well established for Hyalella, and only for acute toxicity. The limited data with fish are hard to quantify. The key question would be what to assume about the sodium dependence for other species - the same as Hyalella, something different, or none at all?
There really is not sufficient information to address that issue, but whatever is assumed, the consequence might be more restrictive criteria under many conditions, with some modest relief when sodium concentrations are high.

The comment is correct that this is an issue of potential importance. It would be preferable to have this factor accounted for in the criterion, but the Update did not do so because sufficient data to reliably quantify the effect of dissolved ions on ammonia toxicity are lacking. It is inappropriate to characterize the Update as deviating from the Guidelines simply because it does not account for all factors known to affect toxicity. In fact, to adjust the criterion for this factor the Guidelines would require more data than is available in this case. The issue should be what uncertainties are present because this factor is not addressed and what uncertainties would be introduced by modeling the factor with inadequate information. Low dissolved oxygen and low levels of chlorine are also known to increase ammonia toxicity, which would be relevant to POTW discharges, but also were not accounted for because of uncertainties regarding modeling them.

Comment C-19
Use of the 1981 Sparks and Sandusky fingernail clam study is inappropriate. The study acknowledged numerous flaws and problems with the culturing of the organisms and the conduct of the tests as follows: (1) the authors had a history of problems with growth and reproduction in the lab; (2) bacterial slimes impacted the tests; and (3) growth of harvested organisms only lasted two weeks! With respect to the well water test used by the Update, the study authors stated that the clams in well water “didn’t grow at all” and were “starving” due to inadequate food supply. (Sparks and Sandusky @ 32-36.)
Given the authors’ clear statements that the tests were not run properly (one must feed organisms in a chronic test), it is apparent why these test results were significantly less than similar tests run at the same laboratory. This test result should be stricken from the criteria calculation.
Response While this comment is correct about some problems faced in the series of experiments by these authors, it misrepresents much about this study. The most significant errors involve the quote and reference to page 32-36 in the report and the implication that organisms were not fed. First of all, the organisms were fed. Second, the growth problems in this section referred to an experiment early in the study in which feeding was more limited than later in the study (when the ammonia tests were run). Third, for the experiments after which the feeding was modified (including the ammonia experiment) the organisms did grow in the control and this growth continued throughout the tests (it did not last only “two weeks”). Fourth, there was a dose dependent inhibition of growth correlated with ammonia exposure, including in the first two weeks, further substantiating that the control organisms were healthy enough to grow in the absence of ammonia stress. Admittedly, the control growth was still low compared to tests in river water and can probably be attributed to less food, but it is inappropriate to characterize this as “starving” and it is sheer speculation to suggest that the response to ammonia is due to anything other than the ammonia. Even if suboptimal nutrition did increase susceptibility, this does not invalidate the test - organisms in nature do not always exist under optimal conditions or show optimal growth either. EPA also does not agree that problems in the Sparks and Sandusky study were the reason “test results were significantly less than similar tests run at the same laboratory” (in the study of Anderson et al.).
There is no particular reason to consider the early tests to be more valid. They were conducted by the same organization using very similar methods. If anything, the later study improved feeding methods and showed better growth in the control organisms than the earlier study. Clearly, the nutritional status is not a likely reason for the differences between the two studies. Organism sensitivity can vary for a variety of reasons, and the fact that the earlier study had a higher effect concentration should not be treated as evidence that it is more valid. There is no convincing reason to conclude that the effects concentration from the Sparks and Sandusky study is inappropriately low and should not be used. While the control growth was low, there was growth, there was good control survival, and there was a consistent and large dose-response relationship with ammonia concentration. Such results should not be ignored, and neither should the results from the earlier study be ignored. The Update averaged results from both studies, so the value used in the criteria calculation represents a compromise, moderate sensitivity. The Update also considered other information to evaluate whether the sensitivity indicated in the Sparks and Sandusky study should be suspect and whether the value used for criteria calculations should be considered inappropriate. It was found that other available information suggests that this organism should be sensitive. The mesocosm study at Monticello discussed in the Update showed substantial effects on fingernail clams at concentrations near or even below that of the Sparks and Sandusky study, and in this study the control treatment showed high reproduction rates, indicating that the organisms were thriving.
The Update also considered that these studies in the laboratory did not include reproduction and early life stages, which have been shown to be more sensitive than juveniles to ammonia in studies with other clams, so if anything the effect concentrations would be expected to be too high. Another factor considered was that the study of Anderson et al. showed that ammonia quickly decreases ciliary motion in clam gills at low concentrations - such an endpoint is not one that can be used directly for criteria, but provides information on ammonia effects that increases the credibility of observed effects on growth and mortality. Finally, another recent study substantiates low effects concentrations for a similar clam species. Hickey and Martin (1998, Arch. Environ. Contam. Toxicol.) reported on 60 day tests of Sphaerium novaezelandiae, a New Zealand species closely related to the one used in the Update, and a genus also found in the United States. They reported total ammonia EC50s for survival of 3.8 mg N/L, for morbidity of 2.7 mg/L, and for reproduction of 0.8 mg/L. The pH in these tests varied between ammonia exposure levels and with time, so there is some uncertainty as to what pH these effects concentrations correspond to, but it is certainly less than 8.0 and perhaps as low as 7.5. In any event, these results indicate sensitivity as great or greater than the Sparks and Sandusky study and the Monticello study and the authors stated that “the use of the U.S. EPA criteria would provide minimal protection for S. novaezelandiae for chronic ammonia exposure”.
Comment C-20 The National Guidelines @ 43 require that the criteria accurately reflect factors that influence organism sensitivity. EPA stated that invertebrates exhibit a temperature dependence with total ammonia. Therefore, the Hyalella and fingernail clam data (the two most chronically sensitive organisms) should have been adjusted for temperature, yet no adjustment was made in developing the chronic criteria.
As discussed in more detail in the temperature section below, failure to adjust the chronic criteria to reflect changes in organism sensitivity had a significant impact on the chronic criteria calculation. Much less restrictive chronic criteria would have resulted for temperatures ranging 20-0°C, regardless of whether or not early life stages of fish were present. Therefore, the Agency needs to recalculate appropriate chronic criteria for various temperatures (25-0°C).
Response For 1999 the CCC has been modified to account for the expected temperature dependency of the sensitive invertebrates, while still protecting ELS of fish when present, or juvenile and adult fish when fish ELS are not present. EPA believes that the National Guidelines provide for the criteria derivation procedures used for the temperature and seasonally varying 1999 CCC, but in no way require them.
Comment C-21 The chronic data set fails to meet the minimum data guideline of eight species including salmonids and an insect. In addition, the exclusion of the extensive life cycle test trout database developed by Thurston (which would have resulted in the calculation of higher criteria) resulted in the calculation of unduly restrictive chronic criteria. Failure to use the life cycle tests simply because earlier, less extensive tests produced lower results is not valid. Such an approach would only allow new data to produce more restrictive criteria. If the minimum requirements are not met, EPA needs to use a different calculation methodology. Minimally, the chronic criteria should be recalculated including the salmonid data and adding several insect test results.
Response Regarding the salmonid data, it is true that these data were not used in the calculations, for reasons explained in the Update. The rainbow trout tests varied substantially among four different studies.
The National Guidelines would normally give primacy to the life cycle (LC) test, but this primacy rule is based on the presumption that life cycle tests are more likely to include all sensitive stages and be a more accurate reflection of risk. Because two of the early life stage (ELS) tests were substantially more sensitive than the life cycle test, automatically giving primacy to the life cycle test becomes questionable. This is not equivalent, as the comment asserts, to only allowing “new data to produce more restrictive criteria”, but rather recognizing that LC tests by their very nature usually are more sensitive and that when ELS tests are actually more sensitive it is legitimate to consider why. It should be noted here that the National Guidelines stress the importance of not simply adhering to the specific recommendations, but doing what is felt most scientifically sound given available data. There are various reasons that those ELS tests would show more sensitivity than the LC test, and some of these reasons would mean that the ELS tests should be treated on a par with the LC test, or even given primacy. Depending on how these data are handled, this species could range from the most sensitive to moderately tolerant. The situation was further complicated by some tests providing only upper or lower limits of effects concentrations. Nevertheless, by comparing the CCC to all the available salmonid data, viewed as a whole, a decision can be made as to whether the criterion should be considered adequately protective or not for salmonid waters, or whether the criteria should be modified for salmonids upon separate consideration of the rainbow trout tests. EPA does not agree that its handling of the salmonid data produced “unduly restrictive” criteria. In fact, it had very little effect on the criterion.
If the rainbow trout data (either the LC test result alone or the average of the values from all rainbow trout tests) were used in the calculations, the dataset would be one genus larger and this would increase the calculated FCV from 1.27 to 1.31 at pH=8, an increase of only 3%. Regarding the insect data, an acceptable chronic test was not available, so it is true that the minimum data set was not fully met. Nevertheless, based on the other insect data available, the Update assumed that, if an insect were tested, it would likely be more tolerant than the four most sensitive genera tested. The criterion was calculated based on these four genera, but increasing the total number of genera by one to account for the presumed tolerant insect. This is justified as follows. First, the assumption that an insect would be tolerant is reasonable, given their high acute tolerance and the fact that a subchronic test on an insect suggested it was fairly tolerant. There is no need to assume a specific tolerance, only that the insect is more tolerant than the fourth most sensitive genus. It should be noted that this assumption is not “conservative”, which the commenter was earlier concerned with, but the opposite of conservative. Second, calculation of the chronic criterion directly from chronic data is the preferred method in the Guidelines, although it is generally not done because of the lack of chronic data. Calculation by acute-chronic ratios is subject to more uncertainty, especially in this case where acute-chronic ratios are not available for some important taxa. To abandon direct calculation because the database does not include one element which probably is tolerant and will not be used directly in the calculations is contrary to trying to use procedures which will produce criteria most reflective of the available data. Finally, EPA can see no valid technical reason why the chronic criteria would need to be recalculated including “several insect test results”.
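The effect of adding one genus to the count N, as described in the response above, can be sketched with the four-lowest-values calculation from the 1985 National Guidelines, in which each of the four lowest genus mean values is assigned a cumulative probability P = R/(N+1). The sketch below is illustrative only: the function names are not from the Update, and the genus mean chronic values are hypothetical placeholders rather than the values in the Update's tables. Only the 1.27 and 1.31 figures in the last line come from the response itself.

```python
import math


def fifth_percentile_value(lowest_four, n_genera):
    """Estimate the 5th-percentile value from the four lowest genus means.

    Uses the four-lowest-values procedure of the 1985 National Guidelines,
    with cumulative probability P = rank / (N + 1) for ranks 1 to 4.
    """
    values = sorted(lowest_four)
    if len(values) != 4:
        raise ValueError("exactly four genus mean values are required")

    ln_v = [math.log(v) for v in values]
    p = [rank / (n_genera + 1) for rank in range(1, 5)]
    sqrt_p = [math.sqrt(x) for x in p]

    s_squared = (sum(x * x for x in ln_v) - sum(ln_v) ** 2 / 4) / (
        sum(p) - sum(sqrt_p) ** 2 / 4
    )
    slope = math.sqrt(max(s_squared, 0.0))  # guard against round-off
    intercept = (sum(ln_v) - slope * sum(sqrt_p)) / 4
    return math.exp(slope * math.sqrt(0.05) + intercept)


def percent_increase(old, new):
    """Percent change from old to new."""
    return 100.0 * (new - old) / old


# HYPOTHETICAL genus mean chronic values (mg N/L), for illustration only.
hypothetical_gmcvs = [1.4, 2.6, 3.2, 4.1]

# Same four sensitive genera, different assumed total number of genera N.
for n in (9, 10, 12):
    print(n, round(fifth_percentile_value(hypothetical_gmcvs, n), 3))

# Figures quoted in the response: FCV of 1.27 versus 1.31 at pH 8.
print(round(percent_increase(1.27, 1.31), 1))
```

Because the four lowest values are unchanged, raising N only lowers the percentile each of them is assumed to occupy, so the extrapolated 5th-percentile value rises. This is why counting a presumed tolerant insect is the opposite of a conservative assumption.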
Comment C-22 The Update correctly notes that the chronic criteria database is skewed by the inclusion of a preponderance of sensitive species. Therefore, a more reasonable estimate of the chronic criteria needs to be developed by utilizing a higher N value that offsets the skewed database. An N of 12 would appear reasonable based upon the acute data sets. (See, Exhibit 2 comparing the sensitivity of organisms contained in the acute and chronic databases.) This would produce a total ammonia chronic criteria (25°C at pH 8) in the range of 1.5 - 1.75 mg/L (as N) consistent with the reliable data sets (i.e., excluding Hyalella).
Response EPA does not believe that this comment correctly reflects what the Update stated. It is true that the Update did note that the acutely tolerant species are under-represented in the chronic database relative to the acute database. But this was one of several comments speculating on how different factors, changes, or options might increase or decrease the chronic criterion, and did not specify that the chronic database was in fact inappropriately skewed to sensitive species. The differences between the acute and chronic databases do not necessitate recalculating with larger N. There is no basis for deciding that the acute data set provides the preferred balance among species. Both data sets display a reasonable amount of diversity once the tolerant insect assumption is accounted for in the calculations. It cannot be known which better represents the range of sensitivity that would be found in an assemblage of taxa in the field, and for that reason the Update, while mentioning the possibility of further adjusting N, did not carry through with it. (See also C-07.)
Comment C-23 EPA stated that any chronic criteria should be limited by the bluegill test result from Smith (1984).
As indicated by the Peer Review comments (which EPA ignored), the sole bluegill study by Smith should not be used to arbitrarily decrease chronic criteria because it is a single unverified test, critical DO information was not available to ensure test reliability, significant pH variability occurred over the duration of the test, and bluegill were not confirmed to be a highly sensitive species based upon EPA’s field research. It is irrational to spend hundreds of thousands of dollars to investigate organism sensitivity, including closely related species sensitivity, and then to ignore all of the data by using one organism to set a national criteria. As stated in the Peer Review, absent confirmation that bluegill sensitivity is significantly different than closely related species, the genus mean chronic value should be used in the criteria derivation as, on balance, the data indicate that this species will be adequately protected.
Response This comment seems somewhat moot, since neither the 1998 nor 1999 criteria do this, but the Update does note that this issue could arise if modifications to the criterion would raise the criterion above the chronic effect concentrations for the bluegill. If this happened, reliance of the criteria on just one test on bluegill, having a result quite different from a related species (green sunfish) having two chronic tests, would be an important issue. EPA does not agree with several of the statements made in this comment: (a) EPA does not believe that the actual pH in this test (as opposed to the reported pH) was excessively variable. The investigators evaluated the mean and variability of pHs by first converting pHs to hydrogen ion concentrations, determining means and standard deviations, and then converting back.
Doing such calculations on a hydrogen ion basis is more appropriate for certain calculations, and during the era this test was done it was thought to be desirable for this type of data, although it is usually no longer considered so. But to provide a meaningful measure of the variability of pH is not just a matter of converting the hydrogen ion standard deviation back to the pH scale, which is what the authors attempted to do here. Thus, the mean pHs in this report are valid, but the standard deviations are not. The original data is not available, but it is now thought that the likely error in converting these values back to the hydrogen ion scale inflated the likely true low variability in the pH (probably around 0.2 units). These tests were run in exposure systems and water used for many similar tests, which almost always had low variability in the pH. (b) While the D.O. data are not available, various tests run with this or similar systems in the laboratory maintain D.O. levels well above values of concern. Given the low likelihood that D.O. was unacceptably low, the absence of such data is not a good reason for discounting this test. (c) The comparison by the Comments of these results to those for bluegill in the Monticello channel is not valid because the Monticello experiment only tested survival and growth of juvenile bluegills. The Monticello results showed a 40% reduction in growth of juveniles at the highest exposure concentration (ca. 7 mg N/L). No mortality was observed due to ammonia, which is consistent with laboratory tests which indicate juvenile bluegill mortality requires higher concentrations. (However, it should be noted that with such substantial sublethal effects, mortality in the Monticello streams would likely have occurred at concentrations not much higher.)
The chronic ELS test includes a more sensitive lifestage, and even that lifestage would be expected to show only small effects at the second highest (2 mg N/L) exposure level in the channels. Consequently, the Monticello experiment does not refute this experiment. The fact the bluegill juveniles are no more sensitive than many other fish species does not necessarily imply bluegill ELS can be no more sensitive than many other species. (d) Lastly, the invocation of the peer review is also of doubtful relevance here. Only one of five reviewers objected to using the test, and the only stated reason was that it was an unrepeated test that was more sensitive than other members in that genus. This comment is therefore misleading in its second sentence, which implies that the peer review comments raised all the listed concerns (including the D.O., pH, and field study issues discussed above) and that this was more than the opinion of one reviewer. The issue that does remain is whether one test is a sufficient basis for setting a criterion with the importance of ammonia, and with a data set such as is available for ammonia, where most species had multiple chronic tests available for defining their SMCVs. The use of an unrepeated test is in fact consistent with the Guidelines, as is the use of unrepeated tests for the four most sensitive genera, which can also have a significant effect on the criterion. Nevertheless, verification of tests that are critical to the criterion is a legitimate issue that should be considered if reliance on the single bluegill test became an issue. At lower temperatures the temperature-dependent 1999 CCC (with ELS present) does take on values above the bluegill ELS SMCV.
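Returning to point (a) above, the hazard of carrying a standard deviation computed on the hydrogen ion scale back to pH units can be illustrated with a small calculation. The readings below are hypothetical and are not data from the bluegill test, and the "naive" conversion shown is only one way such a back-conversion can go wrong; it is not a reconstruction of what the original investigators did. The first-order approximation is a standard delta-method estimate, offered here as an assumption about how a more meaningful figure could be obtained.

```python
import math
import statistics

# HYPOTHETICAL pH readings, for illustration only.
ph_readings = [7.6, 7.7, 7.8, 7.9, 8.0]

# Convert each reading to hydrogen ion concentration (mol/L).
h_conc = [10.0 ** -ph for ph in ph_readings]
mean_h = statistics.mean(h_conc)
sd_h = statistics.stdev(h_conc)

# Converting the MEAN back to the pH scale is meaningful.
mean_ph_from_h = -math.log10(mean_h)

# Standard deviation computed directly on the pH scale.
direct_sd_ph = statistics.stdev(ph_readings)

# Naive back-conversion of the hydrogen ion standard deviation:
# -log10(sd) is not a measure of spread in pH units at all.
naive_sd_ph = -math.log10(sd_h)

# First-order (delta-method) approximation of the pH standard deviation.
approx_sd_ph = sd_h / (mean_h * math.log(10))

print("mean pH from mean [H+]:", round(mean_ph_from_h, 3))
print("SD computed on pH scale:", round(direct_sd_ph, 3))
print("naive -log10(SD of [H+]):", round(naive_sd_ph, 3))
print("first-order approximation:", round(approx_sd_ph, 3))
```

In this sketch the mean survives the round trip, while the naively converted standard deviation is far larger than the spread of the readings themselves, which is the kind of inflation the response describes.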
Because bluegill early life stages prefer rather warm water (e.g., the ASTM chronic testing protocol calls for 28°C water), and because bluegill spawning generally occurs during only the warmest weeks or months of the year, the risks to bluegill are probably subdued, although not necessarily negligible if the Smith et al. results were to be accurate.
Comment C-24 The Update @ 24 acknowledges that the relationship appears most applicable to fishes while other organisms (i.e., invertebrates) do not appear to respond in the same manner. Consequently, the Update acknowledged uncertainty with the pH relationship and eschewed an empirical approach to the data that “uses the average generic shape for the pH dependence.”
Response In fact, no such statement or acknowledgment is made by the Update. The Update @ 24 notes variability among the taxa, especially at low pH, but does not make any distinction between fish and invertebrates. In fact, there is variation within both fish and invertebrates, and both groups contain species with pH dependence similar to, greater than, and less than the generic acute pH relationship. For chronic toxicity, the pH relationship is based on a fish and an invertebrate, both of which show a dependence similar to each other and to the chronic pH relationship used in the Update (see Appendix II of this response).
Comment C-25 Although the Update (@ 24) stated that individual test results should not be used to modify the pH relationship, the relationship was modified for chronic toxicity to be more restrictive in the range of pH typically relevant to municipal facilities discharging to low flow streams (pH 7.0-7.7).
Response EPA believes that this comment has misinterpreted the Update. What the Update actually stated was that it would be speculative to assign different relationships to different taxa.
Perhaps this was not explained as well as it could have been, but the issue it intended to address was whether different pH relationships could or should be assigned to each species or genus or larger taxonomic group, or whether a single average relationship should be used. This involved not just deciding what the pH relationship was for the species included in Figure 6 (1998) or Figure 8 (1999), but for other species as well. Three of the four most acutely sensitive genera did not have pH data and it was not evident what individual relationships in this figure would be most appropriate to them, other than an average one. The fourth genus already had a relationship very similar to the average relationship for all species. An average pH relationship therefore seemed most appropriate and relevant to the actual criteria calculations. Assigning different pH relationships for each taxon was “speculative” and served no useful purpose. However, this does not mean that these relationships could not be developed. Clearly, channel catfish exhibited a relationship different from most other fish, and a separate model could be developed for this species. But it would have no impact on the acute criterion because this species is not among the more sensitive species at any pH.
Comment C-26 [The difference between the acute and chronic pH relationships] was based primarily upon a single, unconfirmed test result using smallmouth bass.
Response In fact, the shape of the chronic pH relationship reflected both the smallmouth bass and daphnia studies, weighted roughly equally. Appendix II of this response provides graphs showing the similarity between the two studies. It is furthermore inappropriate to characterize this relationship as depending on a single test, because each study involved multiple tests, both acute and chronic, that showed consistent trends.
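What it means for one pH relationship to be steeper or flatter than another, and why that matters when extrapolating from tests run near pH 8, can be sketched with a generic two-term sigmoidal shape of the kind associated with joint toxicity models. The functional form is assumed here for illustration, and the parameter values (`r`, `ph_t`) are placeholders chosen only so that one curve is steeper than the other. They are not the fitted acute or chronic parameters from the Update.

```python
def ph_shape(ph, r, ph_t):
    """Generic two-term sigmoidal pH dependence (assumed form).

    r    : ratio of the high-pH limit to the low-pH limit (hypothetical)
    ph_t : transition pH between the two limits (hypothetical)
    """
    return r / (1.0 + 10.0 ** (ph_t - ph)) + 1.0 / (1.0 + 10.0 ** (ph - ph_t))


def relative_to_ph8(ph, r, ph_t):
    """Effect concentration at `ph` as a multiple of its value at pH 8."""
    return ph_shape(ph, r, ph_t) / ph_shape(8.0, r, ph_t)


# ILLUSTRATIVE parameter sets, not values from the Update.
STEEPER = {"r": 0.007, "ph_t": 7.2}  # stands in for an acute-type curve
FLATTER = {"r": 0.023, "ph_t": 7.7}  # stands in for a chronic-type curve

print("pH   steeper  flatter  steeper/flatter")
for ph in (8.0, 7.5, 7.0, 6.5):
    steep = relative_to_ph8(ph, **STEEPER)
    flat = relative_to_ph8(ph, **FLATTER)
    print(ph, round(steep, 2), round(flat, 2), round(steep / flat, 2))
```

Both curves agree at pH 8 by construction. Below pH 8 the steeper curve predicts a larger allowable concentration, so if the flatter curve is the true one, extrapolating a pH 8 result with the steeper curve overstates the effect concentration (understates toxicity). The last column behaves like an acute-chronic ratio that grows as pH falls.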
Comment C-27 No analysis was presented to assess (1) how well the acute toxicity/pH relationship fit the available chronic data, (2) the need for the reduction in light of the relative sensitivity of the organism, or (3) whether invertebrate data confirmed the need for a more restrictive approach.
Response Regarding item (1), the Update on page 26 reported that the regression analysis of the chronic data showed its relationship to deviate significantly from the acute relationship. A graphical presentation of the differences is provided in Appendix II of this response, as well as some additional information on the statistical results. Regarding item (2), EPA interprets this to refer to the fact that the taxa used for the chronic pH relationship were not among the four most sensitive genera. The argument being presented apparently is that, if the more restrictive flattening at low pH is based on relationships established for tolerant organisms, this restriction is not needed for criteria based on more sensitive organisms. While it would be preferable to have pH relationships for the most sensitive organisms, in the absence of such information “the need for the reduction” rests simply on what assumption is most appropriate regarding the sensitive taxa: (a) the relationship established for chronic toxicity in other organisms, even if they are more tolerant, or (b) the pH relationship that is based on even more tolerant, acute endpoints. The available chronic relationships provide the most relevant information, and accord with the National Guidelines regarding what data such relationships should be based on. It should also be noted that there are other options, possibly including basing the chronic pH relationship for Hyalella on the acute pH relationship for Hyalella.
But these acute data also show that the slope of the pH relationship should be more flat (than the average acute pH relationship) to extrapolate to low pH from pH 8, where the chronic Hyalella test was conducted. Thus, even based on the acute pH data, it can also be argued that there is a “need for the reduction”. Regarding item (3), the chronic pH relationship data included invertebrate data, which followed nearly the same trend as the fish chronic data. Again, Appendix II has figures demonstrating this.
Comment C-28 The chronic data presented in Figure 12 (Update @ 70) do not support a significant decrease in the slope of the pH relationship below pH = 7.7. EPA expended considerable effort to demonstrate that the overall pH relationship for acute toxicity was a good fit for the data trends (Update Figure 7 @ 27). Because of the conflicting information on the causes of ammonia toxicity (total versus un-ionized ammonia), the approach employed was “somewhat empirical” and designed to fit the available data. (Update @ 7, 21-29.) The decrease in slope used in the chronic pH relationship was based on toxicity tests for only two organisms (Update Figure 8 @ 28) and no comparison was made to demonstrate that the acute relationship was a “bad fit” of the chronic data. The data do however indicate that chronic toxicity may “flatten out” (e.g., become less restrictive) above pH = 8.5. EPA should revise the criteria to “fit the data” consistent with the methodology claimed to be used.
Response The Guidelines specifically note that if the acute-chronic ratio varies with the water quality characteristic, the acute relationship should not be used. This is the situation here, as discussed further in Appendix II. The Guidelines also specify that a chronic relationship should be developed from chronic data, if sufficient data exist for at least one species.
In fact, there were two data sets here that showed similar pH relationships for chronic toxicity data that differed significantly from the acute pH relationship based on regression analysis. Again, these differences are presented in more depth in Appendix II of this response. The smallmouth bass data set is particularly compelling because it includes parallel acute and chronic data sets which show acute-chronic ratios to increase consistently and significantly with decreasing pH. This is completely contrary to any assertion that these relationships should be the same. This does not preclude the possibility that, once tested, the chronic pH relationships for the sensitive organisms will be different, but in the absence of such information, the best assumption is that the relationship for chronic toxicity should have a less steep slope for extrapolating to lower pH. EPA does not agree that Figure 12 (1998) or Figure 14 (1999) supports a steeper slope at low pH and a shallower slope at high pH. This issue is further discussed in Appendix I of this response. Briefly, if data are plotted against a relationship such as in Figure 12 and most of the data are near the middle of the pH range, by simple probability the data in the middle are more likely to include values in the tails of the distribution and thus bulge further down. In contrast, the small number of tests at low and high pH would be less likely to produce results in the tails and the data would cluster closer to the overall mean among taxa. Focusing on the lower boundary of the data in each pH interval creates the appearance that slopes are greater at low pH and smaller at high pH than they actually are. This same appearance is evident in the acute data in Figure 10 (1998), but as the simulation in Appendix I demonstrates, this appearance is exactly what would be expected if the pH relationship were absolutely true for all the data.
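The sampling argument above can be illustrated with a minimal simulation. It is a sketch under stated assumptions and is not the simulation in Appendix I: residuals about the pH relationship are assumed to be standard normal on a log scale and identically distributed at every pH, and the test counts of 4 and 40 per pH interval are hypothetical. The function name and its parameters are likewise illustrative.

```python
import random
import statistics


def mean_lower_boundary(n_tests, trials=2000, seed=0):
    """Average of the lowest residual among n_tests draws, over many trials.

    Each residual is a standard normal deviate, standing in for the log
    deviation of one test result from a pH relationship that holds exactly.
    """
    rng = random.Random(seed)
    lowest = [
        min(rng.gauss(0.0, 1.0) for _ in range(n_tests))
        for _ in range(trials)
    ]
    return statistics.mean(lowest)


# HYPOTHETICAL test counts: few tests at extreme pH, many near mid-range.
few = mean_lower_boundary(n_tests=4)
many = mean_lower_boundary(n_tests=40)

print("typical lower boundary with 4 tests: ", round(few, 2))
print("typical lower boundary with 40 tests:", round(many, 2))
```

Even though every residual comes from the same distribution, the interval with more tests shows a lower minimum, so the lower edge of the data cloud bulges downward where the data are densest. Reading slopes off that lower edge therefore exaggerates the slope at low pH and understates it at high pH.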
For the chronic data in Figure 12 (1998), it is clear that the data at low pH and high pH are for more tolerant organisms. Thus, EPA believes that the appearance of greater slopes at low pH and smaller slopes at high pH are based on confounding comparisons between sensitive and tolerant organisms. The key issue here is what pH relationship the sensitive taxa should be assumed to follow. The Comments would seem to suggest that even though two chronic data sets are available that show similar relationships to each other and show clear differences from acute data, they are not to be believed and that the acute relationship, because of its larger data set, should be followed, even though it is clearly a worse fit to the available chronic data sets. EPA believes that it is better to use the available chronic relationships. Furthermore, if acute data were used to set the pH dependence for chronic toxicity, there is an acute dataset for the most chronically-sensitive organism, Hyalella. As these comments themselves note elsewhere, the pH dependence for Hyalella is less than the average for all species, which means that even if acute pH data were used, the criteria should still be more restrictive at low pH. To suggest that the average acute pH relationship is a better choice to apply to the sensitive chronic data is not supported by the available chronic data or the available acute data for Hyalella. This does not mean that, once tested, the chronic pH relationships of the sensitive species might not be found to be different than assumed, or even like the acute pH relationship, but lacking this information, EPA believes it is making the best use of the available data.
Comment C-29 The organisms that drove the chronic criteria (invertebrates) appear to be less affected by changes in pH than do fishes (Ankley 1995). Based upon Figure 6 in the Update (@ 25), the pH relationship will significantly under-predict the acceptable total ammonia for these organisms.
There is no basis to conclude that using the acute toxicity/pH relationship will under-predict toxicity to sensitive invertebrates.
Response This comment is a bit confusing, primarily because the example it brings up actually argues against rather than for using the acute pH relationship. EPA interprets the invertebrates referred to in the first sentence to include Hyalella, the species studied by Ankley et al., but also fingernail clam, the other invertebrate that drove the criterion. It is correct that Hyalella toxicity is less affected by pH than most fishes and than the average pH relationship. But this is not necessarily true for other invertebrates. Chironomus also is less affected by pH than the average, but Lumbriculus is more affected by pH. Daphnia and Macrobrachium seem to follow the average relationship very closely, with a little more flattening at low pH. What might be true for fingernail clams is completely unknown. But at least in the case of the most sensitive organism - Hyalella - the acute data do indicate a flatter pH relationship and it will be assumed that this organism serves as the example here. To state flatly, as this comment does, that the pH relationship under- or over-predicts toxicity is not entirely appropriate: either can occur depending how the relationship is used. (That is, if the slope is flatter than it should be, using it to go from low pH to high pH overpredicts toxicity, but using it to go from high pH to low pH underpredicts toxicity). Nevertheless, since the chronic criterion is heavily influenced by tests run near pH 8, EPA understands the commenter’s preference for using a steeper acute slope rather than the shallower chronic slope to extrapolate this data to lower pH, a pH region of interest to many municipal dischargers. But the above comment cites the even shallower slope Hyalella data in Figure 6 of the 1998 Update (or Figure 8 in 1999) as supporting such an action. In fact, the opposite is true.
As this comment itself notes, the acute Hyalella data on Figure 6 show less dependence on pH (lower slopes) than does the average acute pH relationship. Therefore, using the average acute pH slope to extrapolate to lower pH will under-predict toxicity (over-predict toxic concentrations) rather than over-predict toxicity (under-predict toxic concentrations) as the Comment suggests. This is shown in Figure 6. The solid lines denote the model fit to just the Hyalella data, while the dotted lines denote the average acute pH relationship. The dotted lines are near the observed data trends near pH 8, but are above the data at pH 6.5-7.5. This means that acute toxicity at low pH for Hyalella is under-predicted by the average acute pH relationship - toxic concentrations are over-predicted, not under-predicted as this comment claims. This is also evident in comparing the acute Hyalella data to the acute criterion equations - these data lie progressively closer to the criterion at lower pH (i.e., Hyalella is more tolerant than other organisms at higher pH, but becomes more sensitive relative to other organisms at lower pH). The fact that the acute pH relationship under-predicts relative toxicity for this organism was pointed out in the Update and it was noted that at low pH this organism is among the most sensitive acutely, at least at low sodium concentrations. Consequently, the facts presented in this comment actually argue against using the average acute pH relationship for describing the chronic toxicity of Hyalella.
Comment C-30 Winter, not summer, is the most important period to ensure that the criteria are properly applied. The Update acknowledged that low temperature chronic ammonia requirements should be based upon survival endpoints where ELS considerations are not present.
There is no technical basis presented to believe that the acute toxicity/pH dependency should differ from the chronic pH dependency when both are based on this same effect (i.e., mortality).

Response

Because survival is the endpoint of concern in cold-season situations where early life stages are absent, the question here is whether (a) the expected chronic pH relationship for protecting survival under winter conditions should be the same as (b) the measured chronic pH relationship (survival, growth, and reproduction endpoints), or (c) the measured acute pH relationship (survival/mortality endpoints only), both from warm temperature tests. Early life stage (ELS) mortality was the main response in the smallmouth bass chronic data set. Even based solely on ELS mortality, the chronic pH relationship is similar to that used in the Update and is substantially different from the acute pH relationship in the same study. Consequently, although the comment does not fully articulate it, the real question is whether (a) the juvenile and adult chronic mortality is closer to (b) ELS chronic mortality, or to (c) juvenile and adult acute mortality, with respect to the pH relationship.

Survival versus time curves for ammonia generally show a very rapid acute response which quickly tails off, followed by a gradual prolonged mortality. This pattern suggests two different modes of mortality - one acute and the other chronic. Chronic mortality can involve somewhat different, more systemic, disruptions than acute mortality, which might well be more closely related to growth and other sublethal effects. Consequently, EPA believes that in the absence of the needed studies of juveniles or adults to ascertain the pH relationship for their chronic survival, it is not unreasonable to link their pH relationship to that for ELS chronic survival and growth. Nevertheless, EPA acknowledges the uncertainty, which can only be reduced through additional testing.
See also the response to Comment D-05. With the formulation of the 1999 CCC, however, the issue is much less important, perhaps even moot. Where fish ELS are absent, the 1999 CCC is controlled by the Hyalella GMCV and its predicted temperature relationship. The juvenile and adult fish GMCVs are too high to directly affect the CCC. Consequently, it might be of greater interest to conduct testing to determine whether temperature affects the invertebrate pH relationship in any way.

Comment C-31

The chronic toxicity versus pH data for Ceriodaphnia and smallmouth bass presented in the Update @ 28 more closely fit the acute relationship between pH 7.0 - 8.5, indicating that modification of the toxicity/pH relationship was unjustified. Analysis of the pooled chronic data indicated that the acute pH relationship generally fits the chronic data as well as the suggested more restrictive chronic relationship (see, Exhibit 3). EPA needs to conduct additional tests before proposing to establish a more restrictive chronic criteria approach.

Response

First, EPA does not agree that the commenter’s Exhibit 3 presents an “analysis”. This exhibit is a graph comparing the chronic data to the acute pH relationship. It provides no visual or mathematical comparison of the two possible relationships: certainly nothing that demonstrates “that the acute pH relationship generally fits the chronic data as well as the suggested more restrictive chronic relationship”. Second, it is clear from this graph that the acute relationship greatly under-predicts toxicity at pH 7 and below. Even at pH>7, the acute relationship under-predicts toxicity for smallmouth bass by almost two-fold at pH 7.25, and comes this close only by over-predicting toxicity at pH 7.8 and above. For one of the C. dubia sets, similar errors occur. Third, it is true that if the analysis is confined to pH>7, the acute relationship fits the data better than it does for the whole pH range.
However, it still provides a worse fit than the chronic pH relationship in this range. Fourth, the criterion cannot restrict itself to pH>7. Lower pHs are common enough that they need to be accounted for in the relationship. Fifth, the data at pH<7 clearly demonstrate a flatter relationship than the acute relationship. This behavior not only reflects on what is happening at pH<7, but also what the relationship should be at higher pH, at least in the pH 7.0-7.5 range. In other words, the clear flattening at pH<7 indicates that some flattening should be present at pH>7. A good relationship tries to smoothly and appropriately account for all of the data. Appendix II of this response presents additional graphs and information that show the improved fit using the chronic pH relationship.

Comment C-32

The Update discussion on the pH relationship states that “it would be speculative to assign different slopes for different taxa” and that data from relatively insensitive organisms should not impact the criteria derivation. (Update @ 24, 26.) The smallmouth bass are not among the most acutely or chronically sensitive organisms. In numerous places throughout the Update, EPA stated that criteria adjustments should not be made based upon test results from less sensitive species. Therefore, imposition of the chronic toxicity/pH relationship based solely on smallmouth bass was not technically justified and adds levels of conservatism to an already conservative approach without demonstrated need.

Response

This comment misinterprets statements in the Update. The statement regarding assigning different slopes to different taxa being “speculative” was discussed in response to Comment C-05. What was “speculative” was assigning anything other than the average acute pH slope to describe acute toxicity for species for which pH relationships were not established, which was true for the most acutely sensitive genera.
What was done with the chronic pH relationship is compatible with EPA’s statement. It would be “speculative” to assign anything other than the average chronic relationship to the sensitive chronic species. It would be particularly “speculative” to assign the acute relationship to these species, given that the available data give clear indications against this.

The assertion that EPA stated that “data from relatively insensitive organisms should not impact the criterion derivations” is a misinterpretation. This statement referred to species that were observed to deviate from the average acute pH relationship at low pH, and simply noted that accounting for such deviations was not important because those particular species were tolerant and therefore not included in the four most sensitive genera used for the final criterion calculation. However, this statement was not trying to imply that data from tolerant species should not impact the criteria derivation in any way. In fact, the data from those species were used in the pH analysis to derive the average acute pH relationship. Again, this is consistent with what was done for the chronic pH relationship - the available data were used to derive an average relationship based on all organisms regardless of their tolerances, and the average relationship is assumed to apply to the more sensitive genera in the absence of contrary information.

EPA also does not believe that “in numerous places throughout the Update, EPA stated that criteria adjustments should not be made based upon test results from less sensitive species.” The comments have not cited particular statements in the Update relevant to such data usage. Finally, the comment is not correct in attributing the chronic pH relationship solely to smallmouth bass.

Comment C-33

In summary, the proposed more stringent chronic total ammonia criteria between pH 7.0 to 7.7 is not justified based on the available data and will lead to unnecessary nitrification requirements, particularly during the winter months when pH is naturally decreased. This is a critical pH range for municipal facilities, and a typical pH range encountered in surface waters throughout the country. The pH relationship should not be skewed to fit pH outside this range to the detriment of dischargers to waterbodies within the range. EPA should not claim that the pH dependent criteria have been well established based upon the acute criteria and then make further more restrictive adjustments based on limited and conflicting chronic data. A (1) single chronic study from a (2) relatively insensitive fish (smallmouth bass) at (3) atypical pH conditions that (4) did not drive the acute or chronic criteria calculation should not be used to modify the pH relationship that was developed from using over a dozen independently conducted tests. Moreover, the critical time period for application of the chronic criteria is in the winter, and there is no scientific basis to conclude that organism response to acute and chronic mortality endpoints exhibit a different pH profile during this period.

As indicated in the Update document, an empirical approach should be used to establish the pH dependency applied to the criteria given (1) the complex interactions being assessed, (2) the inconsistency of organism responses among the most sensitive organisms, and (3) the lack of sufficient data. In light of the inadequate and conflicting chronic database, the acute pH relationship should be applied for pH between 7.0 and 7.7 and more closely reflect the acceptable chronic exposures from Figure 12 in the Update. (See, Exhibit 4, Figure 12 with modified pH slope.)
The pH relationship above pH 8.0 should be flattened out consistent with the available data presented in the Update.

Response

This comment summarizes points made in the previous comments. To summarize the responses to those comments, EPA believes there is a good basis for using the chronic data indicating that chronic pH relationships differ significantly from acute pH relationships, even though the acute pH relationship is based on more data. Again, Appendix II provides some additional analysis regarding this, but these differences were discussed in the Update. It is conceivable that more chronic testing will produce different relationships, just as more acute testing would produce taxa-specific assessments for the sensitive organisms. Any relationship will have some uncertainties, and the best that can be done is to base relationships on the most relevant data, as the Update did. See also the response to Comment D-05.

Comment C-34

The discussion on temperature dependence focuses on the more recent research and concludes that “the temperature dependence is incompletely resolved and more research is needed, especially regarding chronic toxicity.” Update @ 11. This conclusion should have led the Update to acknowledge that the same concerns raised in the 1984 ammonia criteria have never been resolved. The Update should acknowledge that no data confirm that existing EPA-approved un-ionized ammonia approaches are, in fact, under-protective.

Response

The quoted statement is accurate. This statement, and others in the Update, do in fact acknowledge that concerns raised in the 1984 ammonia criteria are not yet fully resolved. But it is unclear what significance the comment places on this. Unresolved issues do not mean that no action should be taken, nor that the Update should not have been done to improve data treatment where possible.
At this point EPA is not committing to either agree or disagree that there are no data indicating that some previously approved un-ionized ammonia criteria could be under-protective of aquatic life uses. As guidance, it is beyond the scope of any criteria document, including the Update, to proscribe the range of acceptable regulatory criteria in different states. See also the response to Comment D-06.

Comment C-35

The temperature dependent discussion focuses on the 1987 DeGraeve study and concludes that un-ionized ammonia sensitivity increases with decreasing temperature. While these statements are basically accurate, the conclusion that both acute and chronic toxicity expressed as total ammonia is fixed regardless of temperature is clearly not accurate based on the DeGraeve acute data. Total ammonia LC50 was demonstrated to increase significantly from 30°C to 0°C for both channel catfish and fathead minnow. (See, Update Figure 3.) Moreover, the discussion on Arthur (1987) is misleading. That author basically concluded that he could not verify EPA’s assumed temperature relationship. (See, Appendix A which includes an analysis of the Arthur study which demonstrates there is a total ammonia toxicity-temperature dependence.)

Response

Regarding the DeGraeve study, EPA does not agree that the “total ammonia LC50 was demonstrated to increase significantly from 30°C to 0°C for both channel catfish and fathead minnow”. Although the LC50 increased somewhat with decreasing temperature, standard linear regression techniques conducted on the DeGraeve fathead data showed that the slopes could not be judged to be significantly different from zero at the 5% significance level, given the trends and uncertainty in the data. Any dataset, even if the real slope is zero, will generally have some trend up or down.
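
The statistical point made here, that data with no real temperature trend will still show some fitted slope and will occasionally pass a 5% significance test, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the analysis EPA performed: the test temperatures, the noise level `sigma`, the baseline value 1.5, and the function names are all hypothetical, and the critical value 3.182 is the standard two-sided 5% Student's t value for 3 degrees of freedom.

```python
import random

# Hypothetical design: five test temperatures (deg C) per simulated data set.
TEMPS = [5.0, 10.0, 15.0, 20.0, 25.0]
# Two-sided 5% critical value of Student's t with n - 2 = 3 degrees of freedom.
T_CRIT = 3.182


def slope_and_t(temps, ys):
    """Ordinary least-squares slope of ys on temps and its t statistic."""
    n = len(temps)
    t_mean = sum(temps) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in temps)
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(temps, ys)) / sxx
    intercept = y_mean - slope * t_mean
    sse = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(temps, ys))
    std_err = (sse / (n - 2) / sxx) ** 0.5
    return slope, slope / std_err


def fraction_significant(n_sets=2000, sigma=0.1, seed=1):
    """Fraction of zero-slope data sets whose fitted slope tests 'significant'."""
    rng = random.Random(seed)
    n_sig = 0
    for _ in range(n_sets):
        # log LC50 values with a true slope of exactly zero plus random noise.
        ys = [1.5 + rng.gauss(0.0, sigma) for _ in TEMPS]
        _, t_stat = slope_and_t(TEMPS, ys)
        if abs(t_stat) > T_CRIT:
            n_sig += 1
    return n_sig / n_sets


if __name__ == "__main__":
    print(fraction_significant())
```

Under these assumptions roughly one simulated data set in twenty is expected to show a "significant" slope even though none has a real trend, which is why a single significant result among many separately examined data sets carries limited weight.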
For the channel catfish data, there was a trend significant at the 5% level, but this was the only dataset of many which showed a significant trend, including others using this species. When separately examining many datasets, it is not unexpected that random variations will show “statistically significant” effects even if there was no real trend. In any event, trends should be based on all available data, not just selected data sets. This does not mean that there are no real trends with temperature, at least for some species, but rather that the data as a whole do not support the use of a particular trend for fish.

Regarding the Arthur study, the Update presented a factual summary of what the tests for each species showed, including recognition that only three of five fish followed the temperature relationship presented in the 1984 document and that the insect data showed a different temperature relationship. If any of this presentation was misleading, some specifics should have been given. Arthur et al. did not state any overall conclusion about EPA’s 1984/1985 temperature relationship.

Comment C-36

The purpose of the acute toxicity/temperature dependence regression was to evaluate temperature impacts on toxicity. Therefore, the data relevant to this evaluation required species testing over a sufficient temperature range to test the hypothesis. Other confounding factors (e.g., pH changes) needed to be minimized during the tests (i.e., only change one dependent variable). The acute criteria pooled data analysis (Update Figure 4) inappropriately included many studies that failed to conduct tests over a sufficient range of temperature to ensure that test variability was not a confounding factor. Therefore, ab initio, the analysis framework selected in the Update would be unable to determine a definitive trend. Numerous studies were conducted over less than a 10°C difference, and many exhibited excessive variability at identical temperatures.
(See, Thurston and Russo [1983] where acceptable total ammonia levels varied from 15 to 50 mg/l at 12°C.) At least a 20°C range in the testing with a number of intermediate exposures should have been required, and tests exhibiting excessive variability should have been screened out of the temperature effects analysis as likely indicative of other factors influencing test results.

Response

This comment, and the following two, address the analysis in the Update of the temperature dependence of acute ammonia toxicity to fish. This comment also presents a conception of what is desirable in data sets for temperature effects. This includes a wide temperature span, minimization of confounding variables, and low variability of results at any particular temperature. Qualitatively, these are all worthy attributes, and whether certain datasets with limited temperature range, high variability, and/or few data inappropriately influenced the results is a legitimate concern. However, addressing this question requires specific consideration of what the effects of various types of datasets probably are. The questions that should be asked are (a) whether a data set, even if it has a limited range or size, or has some variability, will still make a net positive contribution to the overall analysis, and (b) whether that contribution is appropriately weighted relative to other data sets.

EPA does not agree with the comment about what data sets are appropriate for inclusion in the overall analysis. Appendix III of this response presents analyses which show that the pooled analysis used in the Update is an appropriate framework for integrating diverse sets, that the type of data sets used in the Update provide positive, appropriate contributions to the analysis and should not be excluded, and that the comments are wrong that restricting analysis to a few sets with a broader range, and ignoring other data, gives better estimates.
In this comment, it is asserted that a data set should be included in the analysis only if it has a range of at least 20°C. No justification is given for this specific range and no specific rationale is given for why certain data sets used in the Update should not be used. It is suggested that tests should cover “a sufficient range of temperature to ensure that test variability was not a confounding factor.” EPA, however, believes that test variability per se is not a confounding factor. If the test variability is due to factors that are correlated with temperature, then these factors are confounding (unless temperature were viewed as a surrogate for them). This is the reason that pH correlations with temperature were addressed in the Update. Test variability will reduce the power of the test to detect effects, but this is entirely appropriate and in keeping with the nature of the data, and does not preclude data sets with a limited temperature range, such as that of Thurston and Russo, from contributing useful information. The analysis in Appendix III of this response shows that such sets make an appropriate contribution to the analysis.

Comment C-37

Use of regression analyses and then extrapolating beyond the domain of the experimental data (by pooling results) is not statistically appropriate when only a few closely spaced data points exist in the individual studies. The regression lines through the individual data sets are highly speculative given the limited data and limited temperature ranges. Pooling the data does not increase the certainty that no correlation with temperature has been demonstrated.

Response

This comment questions the practice of using data sets with few data points and a limited range as part of the temperature analysis. EPA agrees that the individual regression lines through such sets are uncertain, but disagrees that including such data sets in a pooled analysis is inappropriate.
The pooling that EPA did here is no different than the pooling the commenter did to get an average slope in his Appendix A. The fact that some of the data sets cover only a small part of the range being analyzed does not make their use inappropriate. Whatever its range, each data set provides an estimate for the slope, which will be weighted in accordance with its uncertainty, so that small data sets with limited ranges will not contribute much to the overall estimate for the slope. But they still make a positive net contribution to information and do on average improve the overall estimate of the average slope, and the comment is wrong to assert that pooling of such sets does not on average improve overall uncertainty. The analysis in Appendix III provides a demonstration that such sets should be included.

Comment C-38

Several tests used in EPA’s pooled analysis contained an excessive number of replicates that gave undue weight to the single test result (e.g., Thurston Fathead Minnow 1983). Other data such as Cary (1976) were so variable at identical temperatures as to render analysis of temperature effects impossible. With respect to the studies that were specifically designed to test for a temperature relationship (West and DeGraeve), it is apparent that the allowable level of total ammonia does increase with decreasing temperature. Appendix A presents an analysis of the available data that could reasonably be used to evaluate temperature/toxicity trends. This analysis supports the inclusion of a temperature adjustment to the acute (and therefore chronic) criteria. Thus, the conclusion that total ammonia toxicity should remain constant regardless of temperature appears misplaced. At least a 50% increase appears justified from the more extensive studies for temperatures ranging 25 - 0°C.

Response

This comment, which asserts that some data sets with many replicates are unduly weighted in the analysis, provides an opportunity to explain how data influence the pooled regression analysis, and why EPA believes that such influence is appropriate. It is true that data sets with more data will be weighted more in the analysis, but unless there is some independent objective criterion to weight sets differently, this is proper because more data provide more information on the parameters being estimated. However, in the pooled analysis the actual influence each data set has on the slope is not just a matter of number of data, but their range and variability. The Thurston fathead minnow study is one of the data sets that the comments earlier noted has a limited range and high variability. As such, the uncertainty in its slope is increased relative to data sets with greater range and/or less scatter, and thereby its weight is less than the number of data might indicate. As Appendix III demonstrates, the type of analysis and data sets used in the Update do combine into appropriately weighted estimates of the slope.

The factors of concern to the commenter, range and variability, do in fact affect the influence a study has on the pooled slope. Thus, the Cary data set, which the comment terms “impossible” to analyze because of its variability, can be analyzed by the same techniques used in the commenter’s Appendix A, and it validly assists in estimating the slope. The variability of the data might make its slope more uncertain than those of other data sets, but it still can make a legitimate contribution to the overall average slope, if weighted appropriately to its uncertainty. Besides, the variability of this set is not particularly great compared to general variability found in many toxicological evaluations, including the data sets of West et al. used in the commenter’s Appendix A.
Excluding one low point, the range of the data is only a factor of two once the data are adjusted for pH. Again, Appendix III includes examples that show that the overall certainty of the slope estimate will be improved by inclusion of such data sets.

This comment finally asserts that the data sets of West et al. (ultimately published as Arthur et al.) and of DeGraeve et al. constitute acceptable sets and refers to analysis in Appendix A which proposes temperature relationships based on these data (although oddly the fathead data from DeGraeve et al. are not used in this analysis). This Appendix is critiqued below. Briefly, this analysis does not provide appropriate data selection or treatment, and does not demonstrate that a temperature relationship with nonzero slope is justified for acute toxicity to fish. Additionally, the assertion that an acute adjustment automatically establishes the same chronic adjustment is unfounded. If the commenter asserts that the data of DeGraeve et al. establish a significant acute relationship that warrants relief at low temperature, then to be consistent the commenter should also accept that the chronic data of DeGraeve et al. indicate a need for a more restrictive relationship for chronic toxicity.

Comment C-39

Fathead minnow (identified as a sensitive species) repeatedly has been confirmed in acute tests to tolerate higher total ammonia levels at lower temperatures. (See, Thurston, DeGraeve, and West.) (DeGraeve’s chronic study of fathead minnow did not counter this position as two critical test results (15 and 20°C) did not produce an EC20 endpoint.) The available minnow data would support a factor of 1.5 to 2.0 increase in the total ammonia-based criteria as temperatures decrease to 0°C.

Response

EPA does not agree.
The comment cites Thurston’s work as demonstrating effects of temperature, yet when pH effects are accounted for (as the commenter earlier noted how important confounding effects might be), the slope of the data is virtually zero. This comment is also inconsistent with earlier assertions that this study did not have a sufficient temperature range. DeGraeve’s data show a slope for fathead minnows, but it is at most a factor of 1.25 over a 25°C range (not 1.5 to 2.0), and statistical tests do not show that this is significant. West’s data (published as Arthur et al.) are caveated by their own authors regarding a variety of confounding factors that might be responsible for the effects. The claim that the temperature effect is a factor of 1.5 to 2.0 rests almost solely on a single test in Arthur et al. at extremely low temperature which even the commenter acknowledges (in Appendix A) might inappropriately skew the trend.

EPA does not agree with the assertion that the chronic data of DeGraeve do not counter the commenter’s position. The fact that the tests at 15 and 20°C did not produce results does not prevent a comparison of the high (25 and 30°C) and low (6 and 10°C) temperature results. If the comment is correct that fathead minnows are more tolerant at low temperatures, this effect should be apparent in these data, regardless of the 15 and 20°C tests. In fact, the apparent effect is in the opposite direction, and EPA does not agree that an objective analysis can ignore these data. In fact, the temperature trend lines for both acute and chronic toxicity are not statistically significant and the Update treated both as no effect. Interestingly, the acute-chronic ratios do show more of an effect of temperature than either the acute or chronic data alone, reflecting the opposite trends of the acute and chronic data.
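
The way range and number of points govern a data set's influence on a pooled slope, as discussed in the responses to Comments C-36 through C-38, can be sketched with a common-slope regression of log LC50 on temperature that fits a separate intercept for each data set. This is an illustrative sketch only, not the analysis in the Update or in Appendix III: the function name, the two example data sets, and their values are hypothetical, and the sketch assumes equal residual variance in all sets, so only temperature spread and sample size drive the weights here. An analysis that also accounts for differing variability would further down-weight noisy sets.

```python
import math


def pooled_common_slope(datasets):
    """Common slope of log10(LC50) versus temperature, one intercept per set.

    datasets: list of data sets, each a list of (temperature_C, lc50) pairs.
    Returns (slope, weights), where weights[i] is the share of the pooled
    slope estimate contributed by data set i (the weights sum to 1).
    """
    sxx_list, sxy_list = [], []
    for data in datasets:
        temps = [t for t, _ in data]
        logs = [math.log10(c) for _, c in data]
        t_mean = sum(temps) / len(temps)
        y_mean = sum(logs) / len(logs)
        # Within-set sums of squares and cross-products about the set means.
        sxx_list.append(sum((t - t_mean) ** 2 for t in temps))
        sxy_list.append(
            sum((t - t_mean) * (y - y_mean) for t, y in zip(temps, logs))
        )
    total_sxx = sum(sxx_list)
    slope = sum(sxy_list) / total_sxx
    weights = [sxx / total_sxx for sxx in sxx_list]
    return slope, weights


# Hypothetical data: one set spanning 20 deg C, one noisy set spanning 2 deg C.
wide = [(5, 40.0), (10, 36.0), (15, 33.0), (20, 30.0), (25, 27.0)]
narrow = [(11, 15.0), (12, 50.0), (13, 20.0)]

if __name__ == "__main__":
    slope, weights = pooled_common_slope([wide, narrow])
    print(slope, weights)
```

In this hypothetical case the narrow-range set receives under 1% of the weight, so its scatter barely moves the pooled slope, yet it is not discarded and still contributes what information it has.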

Comment C-40

EPA’s conclusion that temperature effects need to be “sufficiently large” to be included in the criteria derivation is not an appropriate basis for decision-making. The 1984 ammonia criteria included a temperature effect even though it was relatively small (about a factor of 1.5). This amount of criteria adjustment is also the same range of effect encountered in dissolved versus total recoverable metals criteria. This level of increase is very significant to municipal dischargers and could easily mean the difference between the need to construct nitrification facilities and not having to do so (e.g., permit limit of 10 mg/l versus 15 mg/l in the winter). Because conservative assumptions have a multiplicative effect in the calculation of permit limits, it is essential that all relevant adjustments be included regardless of how small they seem individually.

Response

EPA did not conclude that temperature effects need to be “sufficiently large” to be included in the criteria derivation. The criterion for inclusion was whether a statistically significant effect could be demonstrated. At one point, EPA used the phrase “not particularly large” to simply describe in general terms the temperature dependence of some data sets, but this clearly did not relate to any decision criteria. The Update also noted the size of possible temperature effects when uncertainties represented by the adopted relationship were discussed, but again this was not the basis for any decisions. It should finally be noted that the temperature dependence for total ammonia in the 1984 criteria was at most a factor of 1.2 (not 1.5) over the temperature range from zero to the TCAP, and usually less.
Nevertheless, in response to comments EPA has re-evaluated the invertebrate temperature dependence, and has for 1999 produced a temperature dependent chronic criterion.

Comment C-41

The conclusion that the criteria should not be temperature dependent is based upon studies of fishes. The Update acknowledges that invertebrates do not follow this pattern and become less sensitive as temperature decreases (i.e., a fixed un-ionized ammonia level appears appropriate for invertebrates). The manifest problem with the Update is that two invertebrates, not fishes, controlled the chronic criteria derivation. Thus, using the acute fish data to claim that the invertebrate-driven chronic criteria could not be made less stringent as temperature decreases is plainly erroneous. The fish and invertebrate data sets should have included the relevant adjustments in calculating the criteria.

Response

EPA agrees that this comment raises legitimate concerns about temperature dependence. The available acute data for invertebrates do indicate decreased sensitivity at low temperature and the 1998 Update recognized this. The 1998 Update did not “claim that the invertebrate-driven chronic criteria could not be made less stringent as temperature decreases”, but rather clearly indicated the need to make modifications at lower temperatures and provided a policy statement encouraging this. Nevertheless, for the 1999 revision of the Update, EPA recognized the need for a more thorough analysis of the available data, and has in fact generated a temperature-dependent CCC, based in part on the invertebrate temperature relationship.

Comment C-42

Appendix A [of the commenter’s submission] contains a reevaluation of the relevant chronic criteria test results for the four most sensitive species, utilizing the available data on temperature impacts for the most sensitive organisms.
That analysis confirms that the recommended criteria should be adjusted for temperature between 25 - 15°C as the sensitivity of invertebrates to ammonia changes significantly over this range. For example, total ammonia sensitivity is expected to change by a factor of 3.3 over this range for the “most sensitive” amphipod (Hyalella) (assuming us used in the criteria derivation). At 15°C, Hyalella is not even among the four most sensitive species. Based upon a revised analysis that properly applies the temperature effect to the relevant species and uses a pH dependency based on the acute relationship, the total ammonia (as N) chronic criteria (N = 12, ELS present) is best described as follows:

Temp.     8.0 su       7.5 su
25°C      1.40 mg/l    3.30 mg/l
20°C      3.1 mg/l     7.4 mg/l
15°C      3.5 mg/l     8.5 mg/l

This analysis presumes that the questionable Hyalella data remain in the database. Thus, the above recalculation of the criteria should be considered conservative. In summary, the acute and chronic criteria, expressed as total ammonia, should increase as temperature decreases. The acute criteria should be adjusted upward at low temperatures by at least a factor of 2.0 (between 25 and 5°C), consistent with the 1984 criteria analysis and the most recent studies. The chronic criteria should be increased even more (without considering the early life stage issue) because invertebrates are much less sensitive as temperature decreases (factor of 7 between 25-0°C). Making all of these adjustments that are supported by the data would substantially increase the base criteria (25°C, pH 8.0, ELS present) under cold weather conditions even if ELS are present.

Response

Conceptually, this analysis has merit. It incorporates some of the considerations that the 1998 Update suggested for low temperature, but includes other considerations as well that make it applicable to all temperatures.
However, this analysis also has many shortcomings and questionable features, which will be discussed in detail subsequently. Consequently, EPA does not agree that the values presented in the comment are appropriate for its national criteria.

Comment C-43
[The commenter's] normalized data plots include a trend line in Cartesian coordinates, determined by least squares regression analysis.

Response
The commenter’s regression model presented on his page A-2 is appropriate and is the same as that used by EPA in the Update. The use of least-squares linear regression with this model (after log transforming it) and dividing by the estimated LC50 at 20°C to normalize the data after the initial individual regressions are also appropriate. Nevertheless, EPA notes the following concerns:

(a) No mention is made of the statistical significance of these regressions. Datasets with so few data will often have a substantial mean trend just from random variation. Appropriate statistical tests are needed to determine whether the mean trend is attributable to random chance.

(b) After conducting individual linear regressions of log LC50 versus temperature and normalizing the data based on these regressions, the commenter plotted data on linear axes, and the regression of the normalized data is of LC50 versus temperature, not log LC50 versus temperature. This raises several problems. First, it violates the assumption of homogeneity of variances required in least squares regression. In the absence of information to the contrary, the dependent variable should be log LC50. Second, the regression to determine a common slope for these data should be run on unnormalized data using standard pooled regression technique. Normalizing data based on initial regressions and then running another regression which builds on the earlier regressions can increase error. Third, a linear relationship of LC50 versus temperature would not seem to be a reasonable a priori assumption.
Temperature relationships in biology are more commonly power functions such as those used in the individual regressions. Furthermore, a linear relationship of LC50 versus temperature here results in very low LC50s just a couple of degrees above 25°C, going to negative LC50s at 29°C and above. This raises questions as to the appropriateness of the use of the arithmetic scale for LC50s. The arithmetic scale causes a steep two-fold change in LC50 between 20 and 25°C. Nevertheless, despite EPA’s preference for the log scale for the LC50s, EPA does recognize that within the range of values of the observed data the linear scale does provide a reasonably good fit for the data.

Comment C-44
[The commenter's] trend line for [three] invertebrate [species] shows sensitivity decreased by a factor of six from a temperature of 25°C to approximately 5°C.

Response
These and other data do in fact make a case for acute sensitivity of invertebrates becoming greater with increasing temperature, and this was so acknowledged in the 1998 Update. As noted in the previous comment, however, EPA believes a regression with log LC50 is a better approach than with LC50 (arithmetic scale). Also, the regression analysis to determine the mean trend should more properly be conducted on the un-normalized data. (Normalization should be just a method for better displaying the data.) The regression analysis should also take into account known uncertainties in the LC50 estimation if at all possible, not relying just on residual errors, which can underestimate true error. Beyond the regression technicalities, however, the key issue here is the implicit assumption used in the Comments that acute toxicity and chronic toxicity temperature relationships are the same. In fact, there are good reasons to expect less decrease in chronic toxicity with decreasing temperature.
First, in the DeGraeve fathead minnow data, the slope on a total ammonia basis is -0.0052 for acute toxicity (a factor of 1.35 higher at 5°C than at 30°C) but +0.0057 for chronic toxicity (a factor of 1.39 lower at 5°C than at 30°C). The difference between acute and chronic toxicity therefore increases by almost a factor of two over 25°C. Importantly, even though separately the trends in the acute and chronic data cannot be said to be different from zero with 95% confidence, the acute and chronic slopes can be said to differ from each other with 95% confidence. Second, that acute toxicity is more protected by low temperatures than chronic toxicity is expected based on the physiology of toxicity. Part (not all) of the effect of temperature will simply be to slow various processes down, delaying (beyond the end of the test) rather than eliminating toxicity. Such an effect will be greater for short duration tests, and thus most evident in acute LC50s. For these reasons, simply applying acute temperature relationships to chronic toxicity is questionable. Therefore, although EPA believes that invertebrates will be more tolerant on a chronic basis at lower temperatures, it does not believe that the trend is quite as strong as indicated in the commenter’s Figure 9. For the 1999 Update EPA used log LC50 versus temperature plots, and projected the chronic invertebrate temperature slope to be the invertebrate acute slope minus the fathead minnow ACR temperature slope. (In general, for any data set the acute slope minus the ACR slope will equal the chronic slope.) In doing that, the key assumption EPA is making is that the ACR slopes for fish and invertebrates are the same. The ACR is mathematically or numerically related to the kinetic coefficient describing the rate at which toxicity occurs. Kinetic coefficients vary with temperature; chemical texts often present a “rule of thumb” for estimating how a 10°C temperature change will affect rates of a variety of chemical processes.
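The slope arithmetic in this response can be checked with a short sketch. It assumes the quoted slopes are in log10 units per °C, a reading inferred here because it reproduces the quoted factors of 1.35 and 1.39. The function names and the example invertebrate acute slope of -0.03 are hypothetical and are not taken from the Update.

```python
# Illustrative sketch only. Slopes are the values quoted in the response for the
# DeGraeve fathead minnow data, read here (an assumption) as log10 units per degC.

def fold_change(slope_per_degc: float, delta_t_degc: float) -> float:
    """Factor by which an effect concentration changes over delta_t_degc
    when log10(concentration) is linear in temperature."""
    return 10 ** (slope_per_degc * delta_t_degc)

ACUTE_SLOPE = -0.0052    # log10(LC50) per degC, total ammonia basis
CHRONIC_SLOPE = 0.0057   # log10(chronic value) per degC, total ammonia basis

# log10(ACR) = log10(acute) - log10(chronic), so the slopes subtract as well.
ACR_SLOPE = ACUTE_SLOPE - CHRONIC_SLOPE

cooling = 5 - 30  # temperature change from 30 degC down to 5 degC

acute_factor = fold_change(ACUTE_SLOPE, cooling)      # about 1.35 (LC50 higher when cold)
chronic_factor = fold_change(CHRONIC_SLOPE, cooling)  # about 1/1.39 (chronic value lower when cold)
acr_factor = fold_change(ACR_SLOPE, cooling)          # about 1.87, "almost a factor of two"

def projected_chronic_slope(invertebrate_acute_slope: float,
                            acr_slope: float = ACR_SLOPE) -> float:
    """Chronic slope = acute slope - ACR slope, borrowing the fish ACR slope
    for invertebrates (the key assumption named in the response)."""
    return invertebrate_acute_slope - acr_slope

# Hypothetical invertebrate acute slope, for illustration only.
example_chronic_slope = projected_chronic_slope(-0.03)
```

Because the ACR slope is negative, subtracting it makes the projected chronic slope shallower than the acute slope, which is the direction of adjustment the response describes.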
Equating the fathead minnow and invertebrate ACR versus temperature slopes is thus akin to using the measured fathead minnow ACR slope for defining the rule of thumb on how temperature affects the relevant toxicokinetics for a variety of species.

Comment C-45
The [commenter's] trend line for vertebrates exhibited reduced temperature sensitivity as compared to invertebrate data, and the results were variable for various test species. Of the five test species, only Pimephales exhibited a large desensitivity at low temperature. The effect is relatively small when Pimephales is excluded.

Response
The comments do not report on the statistical significance of the slope in Figure 10. Additionally, this slope is affected by regressing LC50 rather than log LC50 against temperature, because points are not necessarily weighted appropriately. In any event, of these five fish species, four show essentially a flat relationship and one shows a relationship that depends almost entirely on one point at low temperature. Without that point, the pooled relationship is flat and with that point the relationship is still not statistically significant, with just the one large residual at low temperature suggesting the possibility of a temperature effect.

Comment C-46
DeGraeve et al. (1987) studies on channel catfish suggest that this species exhibits a two-fold decrease in ammonia sensitivity as temperature decreases from 30°C to 0°C.

Response
EPA does not agree. First, there is no analysis or presentation to demonstrate that there is a significant effect of temperature in these data, other than to assert a factor of two. Second, the comments do not specify the form and slope for whatever equation they propose to apply to channel catfish. Third, although the commenter previously emphasized the data of West/Arthur et al., the comments omit mention that West/Arthur’s slope for channel catfish is virtually zero in the figure appearing in the comments.
Fourth, the comment ignores the fact that, for fathead minnow, the DeGraeve study also had a slight negative slope with temperature for acute toxicity, but a positive slope for chronic toxicity. EPA does not see any justification for applying the acute channel catfish temperature relationship to chronic toxicity.

Comment C-47
[In adjusting for the influence of temperature on invertebrate sensitivity, the commenter made no adjustment for life stage differences.]

Response
EPA agrees that no adjustment for life stage or endpoints is needed. Regarding the two most sensitive invertebrates, the endpoints and life stages tested are arguably relevant to much lower temperatures. For the fingernail clam, the tests were of juvenile survival, which would be of concern for lower temperature seasons. For Hyalella, the most sensitive endpoint was reproduction, but the effect concentration used was also an EC20 for survival in ten-week tests starting with juveniles, and six-week tests starting with adults showed similar mortalities to juveniles at higher concentrations. Thus the effect concentration does relate to endpoints and life stages relevant to lower temperatures. This does not eliminate the need to correct for any effect of temperature on these chronic endpoints, but it does establish a basis for saying that a life stage correction is not necessary.

Comment C-48
The commenter has offered an analysis, which EPA has summarized below, including a few inferences where procedures were not clearly specified.

[(a) For all temperatures, invertebrate toxicity was adjusted for temperature using the relationship in Figure 9. Based on table values, slope is apparently -0.34 mg/L/°C for Hyalella, -0.39 mg/L/°C for Musculium, the steepness increasing with increasing tolerance.]
[(b) For temperatures with ELS present, fish chronic toxicity (except for channel catfish) was adjusted using the relationship in Figure 10 with the low temperature fathead minnow point excluded. Based on table values, slope is apparently -0.080 mg/L/°C for Lepomis, -0.0092 mg/L/°C for Pimephales, the steepness increasing with increasing tolerance.]

[(c) Channel catfish chronic toxicity was adjusted by an incompletely specified temperature relationship. Based on table values, relationship is of log LC50 vs temperature with slope of -0.010/°C.]

[(d) When early fish life stages are absent, life stage adjustments included setting the bluegill chronic value to 15 mg N/L, not adjusting the channel catfish at all, and setting the chronic value for other fish species to three-fold their early life stage value. No temperature relationship was apparently applied in this temperature range other than for channel catfish, but the multiplier was apparently applied to whatever temperatures the tests were at, not to some common temperature, which makes these values have inconsistent ratios relative to the temperature-dependent values with ELS present.]

[(e) The border between ELS present and absent was set at 15°C.]

[When fish early life stages are present, these procedures resulted in a chronic criterion at 25°C similar to the Update value, but increasing at 15°C to slightly more than twice the Update value. This increase is almost entirely due to the temperature relationship assumed for invertebrates, but there is also a slight effect due to the temperature dependence assigned to fish.]

[When fish early life stages are absent, these procedures result in a chronic criterion at 15°C about three times the Update value and at 0°C about six times the Update value. These increases are due to both the assumed temperature dependence for invertebrates and the assumed life stage dependence for fish.
At 15°C, these criteria match the three-fold relaxation suggested in the Federal Register, and are about twice as high at 0°C.]

Response
The fundamental outline of the framework is reasonable. Rules are adopted for the temperature dependence of different endpoints and for how relevant endpoints shift among different temperature ranges. These rules are used to calculate a set of chronic values at each temperature, which in turn are used to calculate the chronic criterion at that temperature. An abrupt change occurs at the transition temperature for ELS presence. However, as already discussed, many of the rules or relationships applied by the Comments are of questionable validity. The data do not support using any temperature adjustment for chronic endpoints for fish. The temperature relationship for chronic toxicity to invertebrates is likely exaggerated due to the use of a questionable regression model and analysis, and to not considering the applicability of acute temperature relationships to chronic toxicity. The estimated LC20 for chronic juvenile bluegill survival is likely high. EPA believes that with more reasonable relationships, the increases in criteria at lower temperatures would be less at most temperatures.

One particularly important issue is the transition temperature between fish early life stages being present and absent. The comments present no justification in terms of fisheries biology for the selection of 15°C, and EPA does not know of a rationale for this type of selection. For this reason the 1999 Update does not assume a fixed temperature threshold for delineating the presence or absence of fish ELS nationwide. Even among warm water fisheries this temperature may vary somewhat, depending on ecoregion.

Comment C-49
The National Guidelines require the calculation of non-ELS criteria given that the basic assumptions used to derive the criteria are inapplicable during the winter.
Hyalella and fingernail clam data tested at 25°C, which drove the chronic criteria calculation, are clearly not applicable to low receiving water temperatures.

Response
EPA does not agree with this interpretation of the National Guidelines. The National Guidelines specify that chronic criteria should be based on tests which include early life stages, but do not indicate that such criteria should be modified when early life stages are not present. Nevertheless, EPA agrees that the ammonia chronic criterion can and should take into account this factor. The 1998 and 1999 Updates do provide for less restrictive criteria at low temperatures.

Comment C-50
The applicability of modified criteria should not require a determination that no fish spawn in the winter. First, fish did not drive the chronic criteria calculations. Second, if non-sensitive species spawn in significant numbers in winter months, the available data confirm that less restrictive criteria should apply.

Response
EPA agrees that presence or absence of early fish life stages should not be the only factor affecting temperature-related adjustments. EPA has made changes such that the 1999 Update’s CCC provides for a temperature dependence even when ELS are present.

Comment C-51
States should not be prevented from adopting reasonable winter criteria based upon different interpretations of the data (which EPA readily admitted was incomplete). The Update indicated that up to a seven-fold increase in the criteria was supported by the test results to sensitive fish under low temperatures.

Response
EPA’s criteria documents present EPA’s recommended criterion. They do not set a binding norm. They do not prohibit other alternatives or define the acceptable range for such alternatives. In fact, the information provided in the 1998 and 1999 Updates was intended to assist in efforts to develop and evaluate alternatives.
With regard to the second sentence above, the Update did not indicate that such an increase in criteria was supported by the cited test results. Although the 1999 Update does not estimate any fish GMCV below 8.8 mg N/L when ELS are absent, the temperature-adjusted Hyalella GMCV remains well below this value. Furthermore, the National Guidelines generally set the criterion below the lowest GMCV in a data set of this size, as the commenter recognizes in Comment C-48.

Comment C-52
EPA should not place the burden of resolving criteria issues on states and municipal entities when EPA itself failed to conduct sufficient research to resolve the ammonia winter criteria issue (as it committed to do 14 years ago). The Agency's implementation policy constitutes an “unfunded mandate” for municipal entities to conduct long overdue federal research. Such a mandate must be reviewed pursuant to the federal Unfunded Mandates legislation. EPA's attempt to provide relief under winter conditions is, for all practical purposes, no relief at all. To allow any adjustment, a comprehensive biological study would be required, which is beyond the means of many small municipal entities. Rigorous analysis of theoretical impacts should not be required where it is apparent that (1) the data in the criteria document fully support less restrictive criteria, (2) the receiving water is generally incapable of sustaining sensitive aquatic life (e.g., seasonal ammonia discharge from a small lagoon to an intermittent stream), or (3) sensitive life stages will not exist.

Response
Because the Unfunded Mandates Reform Act (UMRA) applies only to rules and not to guidance documents, the UMRA does not apply here. The comment does raise legitimate issues about how much biological information should be needed under different circumstances.
For the 1999 Update EPA has dropped the recommendations about follow-up biological surveys, although EPA continues to encourage States and Tribes to use biological surveys for water quality assessment, including that for waters receiving ammonia discharges. Nevertheless, because the criterion has different values when fish ELS are present and absent, States and Tribes should reasonably be able to define these periods in order to appropriately use this provision. EPA believes that it is more efficient to approach this on an ecoregion-wide basis rather than a site-specific basis.

In the Federal Register Notice announcing its 1999 Update EPA is providing additional implementation guidance on this issue. EPA is not expecting States, Tribes, or municipalities to perform intricate analyses of voluminous data. EPA believes that much of the needed information on spawning temperatures or periods and subsequent development periods is already known by state fisheries biologists. EPA favors a straightforward approach using available data and expert opinion.

It should be noted that the 1998 Update recommended a 200 percent increase in the chronic criterion concentration during cold-season periods with fish ELS absent. In contrast, the 1999 Update, because the temperature dependency was built into its fish ELS-present formulation, provides no more than an additional 62 percent increase when fish ELS are absent. At temperatures greater than 7°C the percentage increase is less, and at temperatures greater than 14.5°C, the criteria concentrations for fish ELS present and absent are identical. Consequently, for properly implementing the fish ELS provision, it is most important that States and Tribes not misclassify fisheries with fall- or winter-spawning salmonids as warm-water fisheries. Of intermediate importance is the proper identification of temperatures or dates for the onset of spawning for cool-water spawning fish.
Relatively least important are the temperatures or dates for spawning of warm-water spawning fish, because the uncertainties here have relatively the least potential for substantially affecting the pollution control decision.

Comment C-53
In this draft Update, EPA confirms that thirty-day averaging is acceptable so long as the four-day average value is not more than two times the thirty-day value. In reaching this conclusion, EPA relied upon a “field study” conducted by the Agency’s Monticello Laboratory (actually artificial streams) with variable pH and temperature conditions, as well as limited laboratory data.

Response
The Monticello data (now appearing in Appendix 8 of the 1999 Update) were not used to set the four-day averaging period or how much higher the criterion could be for that averaging period. The Monticello data were used to determine at what concentrations, relative to criterion concentrations, effects were observed in these streams. As such, the averaging periods used in the criterion were used in summarizing the stream data to provide an appropriate basis for comparison. The Monticello data were cited as support that the 4-day 2.0-2.5 x CCC provision was desirable, but were not the basis for setting the values for this provision. Rather, the basis for this provision is discussed in the 1998 or 1999 Update section titled “CCC Averaging Period” or “Chronic Averaging Period.”

Comment C-54
Based upon this information, EPA indicated that seven-day averages could be 2.5 times the thirty-day no effect level without causing demonstrable impacts but, for “consistency,” EPA lowered both the averaging period (to four days) and the acceptable variability (to 2.0). (Update @ 78-79.)

Response
See the response to Comment C-09. The basic issue here is that averaging periods cannot be equated with test periods, which merits some further explanation to help responses to this and later comments.
Several general properties of laboratory toxicity tests and results should be recognized when considering how to apply them to field data and how to set averaging periods. First, laboratory tests are generally conducted under fairly stable conditions with regard to toxicant concentration and other environmental variables, at least relative to typical field conditions. Second, tests are also generally conducted for fixed, standard test periods and the endpoint is expressed relative to the end of that period. Third, many studies have demonstrated that if concentrations fluctuate significantly during the test period, effects are generally greater than if concentrations are held constant at the same mean value as the fluctuating concentrations. Fourth, many endpoints do not need the entire test period to be affected, so that the test could be much shorter and still elicit the endpoint and result in the same effects concentration. Fifth, when tests of different lengths are run, effects concentrations do not vary in inverse proportion to the test length; for example, if a 96-hr LC50 is 1 mg/L, for most toxicants the 24-hr LC50 would not be four times the 96-hr LC50, but generally much less, and in some cases no different than the 96-hr LC50.

The length of toxicity tests therefore should not be equated to an appropriate averaging period for concentrations derived from that test, simply because an averaging period equal to the length of the tests allows for concentration fluctuations which can elicit greater effects than the laboratory test, while still having the same average concentration as the laboratory test. For example, if a 96-hr LC50 is 1 mg/L, concentrations averaged over this period could fall below 1 mg/L, but contain periods with much higher concentrations, which would elicit greater effects than intended.
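The point of the preceding example can be illustrated numerically. The sketch below uses an invented hourly exposure series (72 hours at 0.4 mg/L followed by a 24-hour pulse at 2.5 mg/L); the values and function names are hypothetical and do not come from any test cited in the Update. It shows only the arithmetic: a 96-hr average can sit below 1 mg/L while the worst 24-hr average within it is well above 1 mg/L. It makes no claim about the effects such a pulse would produce.

```python
# Illustrative sketch only: the exposure series is invented for this example.

def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)

def worst_window_mean(hourly, window_hours):
    """Highest mean over any run of `window_hours` consecutive hourly values."""
    return max(
        mean(hourly[i:i + window_hours])
        for i in range(len(hourly) - window_hours + 1)
    )

# Hypothetical 96-hour exposure (mg/L): 72 h at 0.4, then a 24-h pulse at 2.5.
hourly = [0.4] * 72 + [2.5] * 24

avg_96h = mean(hourly)                     # 0.925 mg/L, below the 1 mg/L example value
worst_24h = worst_window_mean(hourly, 24)  # 2.5 mg/L during the pulse

print(f"96-h average: {avg_96h:.3f} mg/L")
print(f"worst 24-h average: {worst_24h:.3f} mg/L")
```

Shortening the averaging window limits how large such a pulse can be while the average still complies, which is the trade-off discussed in the text that follows.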
The averaging period should be set small enough so that concentration fluctuations expected within the averaging period are not great enough to cause undesirable effects. However, averaging periods should also not be made smaller than necessary, because they will restrict the long-term mean exposure, which should not be lower than necessary.

Another aspect of averaging periods that must be considered when comparing them to toxicity test lengths is that toxicity tests are for isolated exposures - with a beginning and an end - whereas field exposures typically are not. An averaging period does not circumscribe an isolated exposure, but rather pertains to the worst exposure in a longer time period, which would generally be preceded and followed by exposures below, but still near, the criterion concentration. Thus, in the example above, if the averaging period is made to be 24-hr rather than 96-hr, this is not the same as an isolated 24-hr exposure, but is an exposure that averages 1 mg/L over the worst 24-hr period and probably is a substantial fraction of 1 mg/L for a much longer period.

It should finally be noted that the derivation of an alternative averaging period is acknowledged by the Guidelines, although they do not set forth specific procedures for deriving either the default national value or alternative values for the averaging period. In its 1998 and 1999 Updates EPA is recommending a chronic averaging period 7.5 times the default Guidelines value. For the usual four-day averaging period they recommend a criteria concentration 2 or 2.5 times greater (in 1998 and 1999 respectively) than the CCC obtained through the Guidelines.

Comment C-55
Municipal facilities are required to receive thirty-day and seven-day average permit limits unless it is impracticable to derive such limits (see, 40 CFR § 122.45(d); accord, In Re: City of Ames, Iowa, NPDES Appeal No. 94-6, April 4, 1996).
Thus, use of a four-day average approach is not consistent with applicable NPDES rules.

Response
EPA does not agree that the criteria averaging period is to be delimited by the permit averaging period. NPDES rules are not germane here. The averaging period of criteria is and should be set for biological reasons, and can be and has been appropriately translated to other periods under various flow situations. The Technical Support Document for Water Quality-based Toxics Control covers this subject area. In addition, EPA can provide individual technical assistance to states and tribes that have questions about appropriate translation procedures.

Comment C-56
When long-term life cycle tests (sixty days or greater) are used to calculate “no effect” levels and then set as maximum thirty-day exposures, a safety factor is built into the criteria derivation process. (See, Update discussion @ 75 that Diamond’s 21-day and 14-day test results under-predict chronic impacts because longer exposures cause greater impacts for equivalent concentrations. This is basic dose/response toxicology.) Borgmann's Hyalella chronic toxicity study indicated that the four-week EC50 concentration was approximately nine times higher than the ten-week results. Adding a further variability limitation without quantifying the safety factor already incorporated into the thirty-day average chronic criteria is inappropriate.

Response
While it is true that effect concentrations will generally decrease with duration, EPA disagrees with the comment in several ways.

(a) The comparison using Borgmann’s data is inappropriate. The four-week LC50 is only slightly higher than the ten-week LC50 (0.95 versus 0.77 mM). They are both much higher than either the LC20 or the EC50 for reproduction, but those differences are a matter of endpoint, not duration of exposure. Therefore the cited factor is not germane to the issue at hand.
(b) Few of the tests in the chronic database in the Update are of durations of 60 days or longer, although the two most sensitive organisms have tests with this duration.

(c) The ammonia chronic averaging factors do not necessarily, or even likely, impose significant safety factors. Reducing averaging periods will reduce long-term allowed concentrations, but this does not necessarily provide a margin of safety with respect to the actual desired level of effect. In fact, as explained above, when concentrations fluctuate substantially, the averaging period needs to be smaller than the test period to prevent greater effects than are represented by the concentration derived from the laboratory tests. In this regard, a 30-day period relative to a 60-day period is not a large restriction. If concentrations do not fluctuate enough to be of toxicological concern, the issue is relatively moot, since 30-day averages will be near the 60-day averages, so again there is no significant safety factor. In fact, if exposures are relatively constant beyond 60 days, effects could possibly be even greater than intended because of the “basic dose/response toxicology” noted in this comment, and a 30-day averaging period provides no margin of safety at all.

Comment C-57
EPA has repeatedly stated that field results should not be used to generate national criteria because of the uncertainty associated with the actual exposures, variable pH, and other critical water quality factors (e.g., dissolved oxygen, temperature, predation, disease, etc.) that are impossible to accurately quantify. Relying on the Monticello laboratory result to generate criteria is inconsistent with EPA's own conclusion that such data are inherently unreliable.

Response
EPA does not agree.
In fact, the Guidelines do provide for consideration of field data, and such information has been used to set criteria, although the Update repeatedly states that Monticello data were not directly used because they were field data. Care must be taken in applying such data for the reasons given in this comment, but EPA has not made any absolute policy or conclusion as asserted here. Such an absolute policy would be irresponsible, since it would preclude use of data which could be relevant. Furthermore, as previously stated, the Monticello data were not used to set the averaging period. They were mentioned in the section regarding the chronic averaging period, but only as part of various data that might reflect on the consequences of different averaging periods. The averaging period and factor were based on certain laboratory data.

Comment C-58
National criteria recommendations are only supposed to be developed when sufficient reliable information is available. The Averaging Period section of the Update is replete with statements that insufficient data are available to provide a defensible short term averaging period recommendation (“Rigorous definition of this excursion restriction is not possible with the limited data available.” (Update @ 78)). This weakly supported national criteria recommendation should therefore be withdrawn.

Response
EPA does not agree. The Update did recognize where uncertainties exist and where conclusions about averaging period varied among different studies. This is only proper, because any actions taken should be evaluated in terms of possible uncertainties. However, just because uncertainty is present does not mean that some action is not justified. The quoted sentence should be judged in terms of the context in which it was made, which was after summarizing the 7-day fathead minnow tests which showed effect concentrations 2.5-fold higher than the 30-day tests.
The Update noted that these tests do provide the best indications of how much excursion above the 30-day test results might be permissible. However, the Update also noted that excursions of the exposure concentration should be restricted to preclude eliciting effects based on these 7-day tests. The quoted sentence merely pointed out that an exact, rigorous definition was not possible, because data for variable exposures within such 7-day tests are not available. But this uncertainty is not so great as to preclude any steps. It is certain, at least in the context of these tests, that the restrictions should be more stringent than using a 7-day averaging period with a 2.5-fold concentration factor, even if how much more so cannot be rigorously specified. It is also reasonably certain that the 4-day average with no factor specified by the Guidelines is unduly restrictive. This leads to a final point that should be made. If EPA considered the uncertainty in this restriction too great, EPA would not proceed with recommending the 30-day average period. EPA would not recommend the 30-day averaging period without corollary considerations of a shorter averaging period.

Comment C-59
EPA makes no statement regarding the appropriate stream design flow to use if a steady state approach is employed in permit development. This is a major departure from all prior criteria documents that included a steady state flow recommendation. In light of historical (mis)statements that the proper design flow for applying chronic ammonia criteria is a 7Q10 flow, this issue must be addressed in the Update to avoid widespread misapplication of the new criteria. A thirty-day once in three year exceedance frequency is equivalent to allowing instream concentrations to be above the criteria 2.8 percent of the time. EPA’s 1991 TSD recognizes that the design flow must properly reflect the allowable frequency and duration of criteria excursions. (See, 1991 TSD 79.
“The design flows used in steady state modeling should be reflective of the CCC and CMC durations and frequencies.”) Based upon derivation of “never-to-exceed” permit limitations (i.e., compliance is assured on a 99 percentile basis), there is no need to include further safety factors in selection of an appropriate chronic criteria design flow because compliance with the permit assures that instream concentrations will never be greater than the criteria, as long as stream flows exceed the selected dilution flow. A 30Q3 flow may be used because, by definition, a flow lower than this flow does not occur more frequently than once in three years or 2.8 percent of the time. (See, Exhibit 5, EPA opinion that for ammonia, at least a 30Q10 flow should be used to apply the chronic ammonia criteria.) Given the known safety factors that must be included in treatment plant design (e.g., any ammonia limit less than 10 mg/L essentially requires construction of full-scale nitrification, resulting in actual effluent quality of one-half to one-tenth of the allowable effluent quality), there is no reasonable basis to claim that a 7Q10 design flow should be selected for chronic criteria application in steady state model applications. It must be noted that plant performance under drought conditions is optimum (i.e., well below permitted levels) due to stable and reduced treatment plant flows, regardless of the duration of the low flow event. Therefore, the Update should recognize that proper application of the thirty-day chronic criteria requires use of a 30Q3 flow, unless specific information on the discharge indicates that a more frequent exceedance is likely to occur considering all relevant factors (e.g., plant performance under drought flow conditions, highly variable instream conditions, or intermittent discharge concerns).
Response EPA agrees that it is appropriate to clarify the appropriate design flow, in order to reduce confusion about the implications of the 30-day averaging period. If a design flow approach is used, then either a 30Q5 or a seasonal flow exceeded 95 percent of the time is appropriate.

EPA does not agree with the comment that a 30-day once-in-three-year exceedance goal ordinarily means 30/(365 x 3) = 0.028 = 2.8% excursion frequency. In time series having a realistic degree of serial correlation, which indicates the degree of smoothness in the day-to-day changes in concentration, the allowable excursion frequency is somewhat higher than this. Analysis of long time series indicates that if the correlation coefficient between the logs of daily composite samples is not more than 0.86-0.94, an observed range for samples from larger rivers, and the log standard deviation of grab or composite samples is not more than 0.5-0.8, also an observed range for ambient samples, then the 30-day once-in-three-year goal actually allows 24-hour composite samples to exceed the criterion approximately 5 percent of the time. This assumes that the criteria exceedances are counted in the manner used in the 1986 Technical Guidance Manual for Performing Waste Load Allocations, Book VI Design Conditions, Chapter 1 Stream Design Flow for Steady State Modeling. The 2.8% frequency presented in the comment might be valid if concentrations changed as 30-day wide step functions. However, in realistic time series, many excursions of the criterion concentration are not of sufficient duration and magnitude to cause the 30-day average to exceed the criterion. It is also worth noting that the protection of 95 percent of the species 95 percent of the time is a commonly used goal in ecological risk assessment and management, and is the recommendation of the 1998 SETAC expert workshop Reevaluation of the State of the Science for Water Quality Criteria Development.
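The contrast drawn in this response, between a step-function reading of the 30-day once-in-three-year goal and a serially correlated series, can be explored with a small simulation. The following is only a minimal sketch under stated assumptions: daily log concentrations are modeled as a first-order autoregressive (AR(1)) process, logs are taken as base 10, and the criterion level, seed, and function names are hypothetical. It does not reproduce the exceedance-counting method of the 1986 Technical Guidance Manual, so it should not be expected to return the 5 percent figure cited above.

```python
import math
import random


def naive_excursion_fraction(avg_days=30, return_years=3):
    """Fraction of time in excursion if concentrations changed as
    avg_days-wide step functions (the reading used in the comment)."""
    return avg_days / (365 * return_years)


def simulate_log_ar1(n_days, rho, log_sd, seed=0):
    """Daily log10 concentrations from a stationary AR(1) process, mean 0.
    ASSUMPTION: AR(1) is only one simple way to impose serial correlation."""
    rng = random.Random(seed)
    innovation_sd = log_sd * math.sqrt(1.0 - rho * rho)
    x = rng.gauss(0.0, log_sd)
    series = []
    for _ in range(n_days):
        series.append(x)
        x = rho * x + rng.gauss(0.0, innovation_sd)
    return series


def exceedance_fractions(log_series, log_criterion, window=30):
    """Return (fraction of days above the criterion,
    fraction of running `window`-day arithmetic means above the criterion)."""
    conc = [10.0 ** v for v in log_series]
    crit = 10.0 ** log_criterion
    daily = sum(c > crit for c in conc) / len(conc)
    running = sum(conc[:window])
    n_windows = len(conc) - window + 1
    above = 1 if running / window > crit else 0
    for i in range(window, len(conc)):
        running += conc[i] - conc[i - window]  # slide the window by one day
        if running / window > crit:
            above += 1
    return daily, above / n_windows


# Hypothetical inputs: rho and log_sd fall inside the ranges quoted in the
# response; the criterion level (0.8 in log10 units) is an arbitrary choice.
series = simulate_log_ar1(n_days=365 * 100, rho=0.9, log_sd=0.5, seed=42)
daily, windowed = exceedance_fractions(series, log_criterion=0.8)
print(f"step-function estimate:             {naive_excursion_fraction():.4f}")
print(f"daily exceedance fraction:          {daily:.4f}")
print(f"30-day-average exceedance fraction: {windowed:.4f}")
```

Varying `rho`, `log_sd`, and the criterion level shows how the relationship between daily exceedances and 30-day-average exceedances depends on the smoothness and spread of the series.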
While EPA recognizes that design flows are commonly used in deriving effluent limits, it should be noted that flow is only one of the important parameters. For the 1999 ammonia criterion, temperature and pH are also important. If a design condition approach is used, then appropriate seasonal values for these parameters also need to be selected. Time-variable modeling is a better way of dealing with multi-parameter variability and correlation. Where it is important to find the most cost-effective alternative for protecting the aquatic life use, then it is appropriate to account for seasonality and day-to-day fluctuations of effluent quality, streamflow, and pH, and the seasonality of temperature and fish early life stage presence/absence.

Comment C-60 EPA also continues to recommend that a stringent one-hour averaging period is necessary for application of acute criteria even though the acute criteria derivation has no relationship to a one-hour exposure period. Acute criteria are based on 96 hour no mortality test concentrations. No information presented in the Update supports the need for a one hour application of a 96 hour “safe” level. The stringent one-hour averaging period recommendation continues to confuse state authorities that develop NPDES permits (e.g., one California Regional Board has begun to include “one-hour” and “instantaneous maximum” permit limits to implement EPA’s acute criteria recommendations; Iowa, Kansas, and other states have implemented arbitrarily restrictive mixing zone policies even on small streams (ten feet wide) that completely mix rapidly; Minnesota regulations preclude greater than 1:1 mixing, even on the Mississippi River, to address acute toxicity “threats”). All of these policies refer to the “fast acting toxicant” assumption that was the basis for asserting a one-hour averaging period was needed.
Because there is no technical basis for this recommendation, it should be withdrawn and replaced with an appropriate averaging period that is reasonably related to the criteria derivation and not unduly conservative (e.g., 24-hour averaging period).

Response EPA notes that the comment did not cite any information indicating that ammonia was not fast-acting in acute exposures. EPA is receptive to evaluating any such information. The 1-hour averaging period was not addressed in the Update because of resource limitations and because the previous ammonia document had already noted that ammonia is fast acting, so that it was not an issue likely to result in a large departure from the Guidelines values.

Furthermore, as stated for other issues above, the criteria are meant to provide an expression of what exposures are of biological concern. Although EPA is aware that misapplications of criteria provisions may occur, as part of this project it is not prepared to address all such implementation issues. If these exposures are misapplied or misinterpreted, the remedy should not be to provide a less toxicologically appropriate expression of exposure, but rather to improve implementation. Averaging periods should not be considered as the sole determining factor in mixing zones, but one of several factors that determine mixing policy intended to provide a desired level of protection. This is a valid issue, but not one that criteria documents by themselves can solve.

EPA, nevertheless, does not believe that the ammonia acute averaging period should be as long as 24 hours. As discussed above regarding the chronic averaging period, averaging periods often need to be much shorter than the laboratory test duration because (a) test endpoints often do not need the entire test period to occur and (b) variable exposures will generally elicit greater effects than the relatively constant exposures typical of laboratory tests.
For acute ammonia toxicity to most fish, 24-hour LC50s typically are the same, or only slightly above, 96-hour LC50s. Thus, a 24-hour period applied to the CMC would still be vulnerable to greater-than-intended effects from any fluctuations within that period. In the field, such fluctuations can be more substantial than in the laboratory due to heterogeneous exposures and diel pH variations. That the averaging period should be much shorter than 24 hours is evident from several studies:

(1) Ball (1967, Water Research, 1:767-775) This study showed that LC50s for rainbow trout at 24 hours and beyond were the same; i.e., the threshold LC50 is closely approached within 24 hours. The 3-hr LC50 was only about 50% higher than the threshold LC50, the 4-hr LC50 only 30% higher, the 6-hr LC50 only 20% higher, and the 12-hr LC50 only 10% higher. These data do not account for the fact that mortality from short exposures can be somewhat delayed beyond the exposure. Because of this, it is likely that the LC50s at the short durations are even closer to the threshold values.

(2) McCormick et al. (1984, Environmental Pollution, Series A, 36:147-163) These workers reported that LC50s for green sunfish at 24 hours and beyond also were not significantly different: the 24-hr LC50 was 0% to 15% higher than the 96-hr LC50, depending on pH. The 3-hr LC50 was only 6-20% higher than the 24-hr value, the 6-hr LC50 only 5-15%, and the 12-hr value only 3-8% higher. Again, delayed mortality is not accounted for in these data.

(3) Lloyd (1961, Water and Waste Treatment Journal, 8:278-279) Relationships reported in this study indicated that the two-hour LC50 for rainbow trout is only 50% greater than the threshold LC50. Again, delayed mortality is not accounted for in this relationship, so that the short-duration LC50s are probably somewhat inflated relative to the threshold LC50.

(4) Thurston et al.
(1981, Water Research, 15:911-917) These workers reported that 96-hr LC50s for rainbow trout for a pulsed exposure which alternated 6-hours on/6-hours off were only 40% higher based on peak concentrations than for a continuous exposure. Therefore, with such fluctuations, the average concentration should be <70% of that for a continuous exposure. The averaging period would have to be no more than eight hours, and can only be that long because concentrations within the pulse are relatively constant.

(5) Bailey et al. (1985, Aquatic Toxicology and Hazard Assessment, 8th Symposium, ASTM, pp. 193-212) In this study, the threshold LC50 for bluegill was reached by 24 hours. The 8-hr LC50 was approximately 50% higher than the threshold and the 1-, 2-, and 4-hr LC50s were approximately threefold higher. This species therefore showed a slower response than the aforementioned studies. However, despite this slower response, pulsed exposures in this study resulted in LC50s approximately equal to the threshold LC50 when based on a two-hour averaging period and only about twice the threshold LC50 when based on a one-hour averaging period.

The continuous exposures among these studies all suggest that the averaging period for ammonia should not be longer than a few hours, even without corrections for delayed mortality; otherwise, variations within the averaging period could conceivably be great enough so that lethal conditions are reached. The Bailey et al. pulsed exposures furthermore directly indicate that the averaging period should be no longer than two hours even for a fish that does not react particularly fast.

An approach for better quantifying the averaging period is to take the inverse of the Mancini (1983, Water Research 10:1355-1362) kinetic constant. Mancini did report a constant for ammonia based on the study of Bailey et al. discussed above.
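Two pieces of arithmetic in this response are simple enough to write out. The sketch below is illustrative only: the function names are hypothetical, the rate constants are the ones quoted in this response, and the pulsed-exposure calculation assumes the concentration is zero during the "off" phase, which the response does not state explicitly.

```python
def averaging_period_hours(rate_constant_per_hour):
    """Inverse of a first-order kinetic constant (1/hr), giving a
    characteristic time in hours."""
    if rate_constant_per_hour <= 0:
        raise ValueError("rate constant must be positive")
    return 1.0 / rate_constant_per_hour


def pulsed_average_ratio(peak_ratio, hours_on, hours_off):
    """Time-weighted average of an on/off exposure, expressed relative to the
    continuous-exposure LC50.
    ASSUMPTION: zero concentration during the off phase."""
    return peak_ratio * hours_on / (hours_on + hours_off)


# Kinetic constants quoted in this response (per hour).
for k in (0.16, 0.4, 1.0):
    print(f"k = {k}/hr -> averaging period of about {averaging_period_hours(k):.2f} hr")

# Thurston et al.: 6 hours on / 6 hours off, peak LC50 40% above continuous.
ratio = pulsed_average_ratio(1.4, 6, 6)
print(f"average concentration = {ratio:.2f} x continuous-exposure LC50")
```

The first function gives 6.25, 2.5, and 1.0 hours for the three constants; the second gives 0.70 for the Thurston et al. exposure pattern.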
His estimated constant (0.16/hr) suggests an averaging period of 6 hours, again for an organism that is not particularly fast-responding and for data which does not account for all of the delays in mortality. The 3-hour data from McCormick et al. indicate that the constant is in the 0.4-1.0/hr range, suggesting a 1.0-2.5 hour averaging period is needed. Again, if delayed mortality was taken into account, this could be even shorter. Not all fish respond this fast, but enough do to indicate that the averaging period should be comparatively short.

Comment D-01 In allowing a threefold increase in the criterion when ELS are absent, EPA qualitatively assesses that invertebrates will be protected during cold water temperatures. The invertebrate chronic values, however, could be adjusted for temperature, and the final chronic value could be recalculated using the Guidelines. The result is a chronic value higher than EPA’s threefold increase. We recognize that there is uncertainty in applying the available invertebrate temperature relationships for chronic toxicity. Did EPA reject the quantitative approach to the invertebrate data because of the greater risk when the criteria become even less stringent? Or is there sufficient confidence in the temperature relationship to further increase the criteria? A discussion of the degree of risk of using the temperature relationship for adjusting chronic values would be helpful for our decision making.

Response EPA acknowledges the shortcomings of the qualitative approach of the 1998 Update. For the 1999 revision, a quantitative approach has been used. The key uncertainty is the potential difference between acute and chronic temperature relationships.
Based on kinetic considerations, which are reflected in the observed difference between acute and chronic temperature relationships for fathead minnow, there is an expectation that temperature reductions will have less influence on reducing chronic toxicity than acute toxicity. This is because it is expected that reducing temperature causes some reduction in acute toxicity simply by delaying the toxicity beyond the end of the 96-hour test duration, rather than causing reduction in long term sensitivity. As discussed in the 1999 Update, a limited amount of data on temperature versus ACR can be brought to bear on the question. See also the response to Comment D-06.

Comment D-02 In New York State, ELS will be present in April and perhaps late March when ambient water temperatures are between 5 and 18 degrees. Nitrification, however, is difficult or expensive to accomplish at that time. Accordingly, a discussion of the invertebrate temperature relationship should include its potential use when ELS are present.

Response EPA agrees and has added temperature considerations when ELS are present.

Comment D-03 We note that Hall’s comments also present a temperature relationship for fish. This relationship appears to have some merit, and we concur with the comment that the analysis of Figure 4 of the Update, where the data from multiple species are combined, can give a false impression of no temperature dependency.

Response EPA does not agree. See responses to comments C-34 through C-52 (in particular C-38 and C-39) and the discussion of Appendix III.

Comment D-04 Perhaps temperature should be restored to the basic criteria as was the case with the pre-Update criteria. An EPA discussion and response would be appreciated.

Response EPA agrees. The Update has been revised to incorporate temperature relationships, both for the case of ELS present and the case of ELS absent.
Comment D-05 It was suggested by Hall and Associates that the chronic pH relationship should have been based on the acute model because the acute model was derived with an extensive data base and the chronic model was based on only two species. It was shown by Hall in his comments that this limited chronic data reasonably fits the acute model, suggesting that the more extensively supported acute model is a better estimate of an applicable model for chronic effects. An EPA discussion and response would be appreciated.

Response EPA does not agree that it should have applied the acute pH relationship to chronic toxicity, given the chronic data available for smallmouth bass and Ceriodaphnia. See the responses to comments C-24 through C-33, and the discussion of Appendix 1. Nevertheless, considering the limited number of available chronic data points, EPA recognizes that the chronic slope would be sensitive to additional toxicity testing, were it to be done. EPA also recognizes that there may be some good ways of obtaining a chronic pH relationship that might use the extensive acute data base, while still accounting for the acute-chronic differences indicated by the smallmouth bass and Ceriodaphnia data. Finally, EPA recognizes that in the face of uncertainty, selecting among alternative approaches is a risk management decision, for which States retain flexibility.

Comment D-06 Lastly, it appears that ammonia criteria will continue to contain uncertainty as suggested in the Update document. Accordingly, EPA should allow flexibility for varying approaches, particularly with respect to temperature, when approving criteria at the state level.

Response EPA agrees. Nevertheless, EPA must be able to defend its approval of State and Tribal standards. Consequently, States and Tribes can enhance the flexibility available to them by carefully considering the available information, and clearly articulating the rationale for alternative approaches.
Comment E-01 The data presented for the consideration of the CCC being 3-fold higher in the cold season is based solely on the relationship between pollutant concentrations and effects on aquatic life and does not appear to take into account the effects of human health. Treatment of ambient waters for drinking water purposes needs to be considered for effects on public health and safety. Nutrient loading, such as ammonia, can lead to increased plant material in water bodies, which will increase the Total Organic Carbon levels. [After disinfection in potable water supplies] this will lead to higher levels of disinfection by-products. The treatability of ambient source water varies with temperature. The effectiveness of chlorine as a disinfectant is reduced in colder temperatures. Allowing levels of ammonia to increase in the cold season... would create an increased chlorine demand which would create an increased disinfectant load which would create an increase in the disinfection by-products produced. AWWA feels that the Agency should consider the potential to affect the quality of [drinking water] source waters, and more importantly public health in the evaluation of cold-season ammonia levels.

Response The comment is correct in noting that the aquatic life criteria derivation does not involve human health considerations. EPA’s water quality standards program has maintained separate lists of human health criteria and aquatic life criteria. Human health criteria are not derived with the intent of protecting aquatic life. Aquatic life criteria are not derived with the intent of protecting human health. States must adopt criteria to protect beneficial uses of their waters: protecting potable source waters, protecting human consumers of fish, and protecting the aquatic life itself. However, the criteria for these distinct uses differ from each other.
Where concentrations of a pollutant impair both drinking water uses and aquatic life uses, separate criteria should be adopted to protect these uses.

Not all waters are classified for all purposes. If EPA were to roll drinking water considerations into its aquatic life criteria, thereby creating a multi-purpose criterion, then it would have difficulty justifying the necessity for this criterion in waters that are not classified for drinking water use. Thus, EPA believes it better to tailor the criteria to the use. The concerns expressed in the comment should be addressed through human health criteria for protection of source waters.

Regarding one technical detail in the comment, certain points should be made about the concern about ammonia as a nutrient in aquatic systems. First, it should be noted that algal growth in nearly all fresh waters is more limited by phosphorus than by nitrogen. Consequently, increased nitrogen loads do not necessarily stimulate additional plant growth, because the nitrogen is already present in excess. Second, ammonia criteria have no effect on nutrient nitrogen loads or concentrations. The so-called “ammonia removal” from wastewater does not remove much nitrogen; it merely oxidizes ammonia to nitrate, which is then discharged.

Comment F-01 The EPA conclusion that “the CCC does not vary with the type of fish present,” is not adequately supported.

Response EPA did not intend to imply that site-specific recalculation of the criterion, based on species present and excluding species absent, would not change the criterion.
Rather, what EPA intended to convey was that for the coarse fisheries distinction sometimes recognized by national criteria, cold-water, salmonids present, versus warm-water, salmonids absent, the data did not support different values for the CCC, in part because the two most sensitive tested species were invertebrates, not fish, and in part because the chronic sensitivity of salmonids did not appear to differ substantially from some warm water fish.

Comment F-02 The EPA assumption that the species which lack chronic test data are those which are relatively tolerant of ammonia in the acute database is incorrect. In particular, invertebrate data are lacking, yet some invertebrates are among the organisms most sensitive to ammonia.

Response EPA does not believe it made the above assumption. EPA believes that the appropriateness of the ammonia criterion (and other aquatic life criteria) rests on the assumption that the tested species are representative of the large number of untested species that would be found in the field: that is, that the tested species are not biased toward either greater or lesser sensitivity than the multitude of untested species.

Comment F-03 It is inconsistent to conclude that ammonia toxicity “does not appear empirically to vary with temperature” (Notice, p. 44256) while simultaneously recognizing that, at least for some species, toxicity of ammonia appears to decrease with decreasing temperature (Notice, p. 44257).

Response EPA agrees. This inconsistency has been resolved by the 1999 Update, by quantifying (rather than merely acknowledging, as in 1998) the temperature dependency of ammonia toxicity on invertebrates. The 1999 CCC values are temperature dependent.

Comment G-01 The update identifies several issues that could be more fully addressed if additional data were available.
I urge EPA to conduct the needed research to further clarify those issues where the available data are meager, and where the uncertainty associated with those data are likely to have the greatest impact on the criteria. The following list recommends the research needed to more fully address the issues I believe are most important to Virginia.
1. Long term survival tests with sensitive juvenile and adult fishes and conducted at low temperatures for long enough durations to assess toxic effects under conditions relevant to establishing a cold season criteria.
2. Additional studies to better define the sensitivity of Hyalella, the bluegill and the fingernail clam as these appear to be among the most sensitive species.
3. Additional tests with sensitive species to better define the effect of pH on chronic toxicity.
4. Additional tests to determine what effect osmotic stress may have on chronic toxicity of sensitive species.
5. Chronic tests with freshwater clams in the family Unionidae to act as surrogates for endangered species in the same family.

Response EPA agrees that studies in the above areas would be of interest. In addition, during formulation of the 1999 CCC it has become apparent that additional data on the invertebrate temperature relationship would be of interest.

Comment H-01 EPA does not possess technical studies demonstrating that chronic effects of ammonia occur at less than 9.0 mg/L for warm water fisheries during non-spawning/low temperature periods (e.g., temperature less than 10°C).

Response EPA does not agree. For cold-weather survival of the invertebrate Hyalella, EPA estimates a pH=8 GMCV of 3.82 mg N/L at 10°C and of 4.63 mg N/L at <7°C. This is based on the Borgmann (1994) test at 25°C, adjusted by the 1999 Update’s invertebrate chronic temperature relationship. This temperature relationship was established from the Arthur et al.
(1987) invertebrate acute temperature-dependence data, modified by the acute-chronic ratio’s (ACR) temperature dependence, which was estimated from the DeGraeve et al. (1987) data for fathead minnow ACR temperature behavior. For survival of warm-water juvenile and adult fish (ELS absent), EPA in the 1999 Update has estimated pH=8 GMCVs in the range of 8.78 to 9.55 mg N/L for four genera, independent of temperature, and therefore applicable to the <10°C condition specified in the comment. Lastly, it should be pointed out that the period during which fish ELS are present is longer than the spawning period, because it includes the time needed for development of embryos and larvae into juveniles. The term “spawning period” should thus not be substituted for “fish early life stage present period.”

Comment H-02 There are no data demonstrating that a 1.27 mg/L chronic criterion (at pH 8.0) is required for either warm or cold water fishery protection when [fish] ELS are not present or temperatures are less than 15°C.

Response EPA believes the statement is correct. See the response to H-01. For the 1999 Update the CCC was set at 85 percent of the Hyalella GMCV, adjusted for temperature and pH. EPA believes that this should provide protection to a high percentage of taxa, tested and untested. In contrast to the 1998 Update, the 1999 Update CCC (pH=8) is substantially above 1.27 mg N/L at all temperatures below 15°C, thereby satisfying the concern of the comment.

Comment H-03 Invertebrate data support the use of a temperature adjustment for chronic criteria for temperatures less than 25°C, and it is acceptable to base that adjustment on the available acute temperature effects information.

Response EPA agrees with the first clause. A temperature adjustment has been added to the 1999 chronic criterion. The adjustment applies above 25°C as well as below.
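The relationship stated in the H-01 and H-02 responses, a CCC set at 85 percent of the Hyalella GMCV, can be checked against the two pH=8 GMCVs quoted there. This is only a rough sketch: the function names are hypothetical, the "<7°C" value is treated as applying at 7°C, and the slope computed below is merely what those two rounded GMCVs imply, not a value taken from the 1999 Update.

```python
import math

# pH=8 Hyalella GMCVs (mg N/L) quoted in the H-01 response.
# ASSUMPTION: the "<7 degrees C" value is treated here as applying at 7 degrees C.
GMCV_AT_10C = 3.82
GMCV_AT_7C = 4.63

CCC_FRACTION = 0.85  # "85 percent of the Hyalella GMCV" (H-02 response)


def ccc_from_gmcv(gmcv_mg_n_per_l):
    """CCC (mg N/L) as a fixed fraction of the Hyalella GMCV."""
    return CCC_FRACTION * gmcv_mg_n_per_l


def implied_log10_slope(gmcv_cold, temp_cold, gmcv_warm, temp_warm):
    """Change in log10(GMCV) per degree C of cooling implied by two points."""
    return math.log10(gmcv_cold / gmcv_warm) / (temp_warm - temp_cold)


print(f"CCC at 10 C (pH 8): {ccc_from_gmcv(GMCV_AT_10C):.2f} mg N/L")
print(f"CCC at <7 C (pH 8): {ccc_from_gmcv(GMCV_AT_7C):.2f} mg N/L")
slope = implied_log10_slope(GMCV_AT_7C, 7.0, GMCV_AT_10C, 10.0)
print(f"implied slope: {slope:.4f} log10 units per degree C")
```

The two CCC values come out near 3.25 and 3.94 mg N/L, both well above the 1.27 mg N/L figure discussed in Comment H-02.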
EPA based the chronic adjustment primarily on the invertebrate acute temperature dependency data (per the comment’s second clause), but modified this to account for the expected temperature dependency of the acute-chronic ratio (as stated in response to H-01).

Comment H-04 Available data do not demonstrate that existing warm water fishery un-ionized ammonia standards used by many states ranging 0.04 - 0.07 mg/L, typically applied at 7Q10 or 30Q10 flows, are not protective where pH is above 7.0 su.

Response All decisions about the protectiveness of proposed or existing State or Tribal standards are made on a case-by-case basis, considering the applicable laws and regulations, and the available technical information. This guidance does not pass judgement on State or Tribal standards. EPA thus declines here to relate test results to the protectiveness of such standards.

Nevertheless, it is within the scope of this document to address a related, purely technical question of whether there are any studies showing chronic effects below the particular concentrations specified in the comment. Although certain studies, Rice and Bailey (1980) with pink salmon and Broderius et al. (1985) with smallmouth bass, have shown chronic effects below 0.04 mg/L un-ionized ammonia, both of these tests were at pH<7, and are thus not applicable to the pH>7 condition stipulated in the comment. When expressed as un-ionized ammonia, effect concentrations decrease with decreasing pH. Thus, testing at lower pH increases the likelihood of observing an un-ionized effect concentration below some stated threshold. At pH>7, none of the data presented in the 1999 Update’s Figures 8, 10, 12, or 14, or Table 5 corresponds to effects at un-ionized concentrations <0.04 mg/L. Considering only the data in Table 5 of the 1999 Update (or Table 2 in 1998), two studies show effects below 0.07 mg/L. These are Sparks and Sandusky (1981) with fingernail clam, and Smith et al. (1984) with bluegill. Again, it must be emphasized that these facts do not by themselves indicate whether particular standards are or are not protective of aquatic life uses.

Comment H-05 Available data on salmonid fisheries do not indicate that cold weather, non-spawning periods require total ammonia chronic criteria less than 3-4 mg N/L (at pH 8.0).

Response At temperatures between 11°C and 0°C, the 1999 Update CCC (pH=8), with fish ELS absent, falls in the range 3.05 to 3.95 mg N/L. Consequently, if the term “fish ELS absent period” is substituted for “non-spawning periods” in the above comment statement (as discussed in H-01), then EPA agrees. The distinction between spawning period and fish ELS absent period is greatest in fall-spawning salmonid fisheries, where the salmonid ELS are present all winter.

Comment H-06 Available data for salmonid fisheries do not indicate that the 0.02 mg/L un-ionized ammonia chronic criterion used by several states is under-protective.

Response See H-04 for discussion about why EPA declines to respond here to questions about the protectiveness of State or Tribal standards. Nevertheless, parallel with H-04, EPA can address a related, purely technical question of whether any chronic effects on salmonids have been observed below 0.02 mg/L un-ionized ammonia. Rice and Bailey (1980), testing at 4°C and pH 6.4, found effects on pink salmon at un-ionized ammonia concentrations less than 0.02 mg/L. See H-04 for discussion of the significance of this low test pH. Parallel with H-04, this fact does not by itself indicate whether particular standards are or are not protective of coldwater fisheries uses.

Appendix I - Evaluating Adherence of pH Relationship to Data at Low and High pH

One assertion in the comments was that the pH relationship was overly conservative at low and high pH.
Part of the support for this assertion were observations regarding deviations of the data from the pH relationships in Figures 10 and 12 of the 1998 Update (Figures 12 and 14 in 1999). These observations included noting that the criterion fell further below available data at the extreme pHs than in the pH 7.5 to 8.5 range, that the data appeared to be less curved than the pH relationship used, and that the slope of the data at low pH was greater and at high pH was less than the pH relationship. This appendix demonstrates that such observations do not raise legitimate doubts about the pH relationship - in fact, they are what is expected if the pH relationship is perfectly valid.

[Figure: simulated acute LC50s plotted against pH (6 to 9), with the FAV (dotted line) and CMC (solid line).]

The figure at the side is a simulated dataset in which the pH relationship used in the Update is assumed to be true. The dotted and solid lines are the FAV and CMC, respectively, from Figure 10. The abscissas of the data points are the pHs of the acute data presented in Figure 10 of the Update, so that the pH distribution of the data is the same as presented in the Update. The ordinates of the data points were randomly drawn from a log-normal distribution with a log-mean 0.3 greater than the FAV and a log standard deviation of 0.25, which results in a scatter in the pH 7.5-8.0 range similar to that seen in Figure 10 of the 1998 Update (Figure 12 in 1999). (That is, plotted $\mathrm{LC50} = 10^{\log(2\,\mathrm{CMC}) + \mathrm{NRnd}}$, where NRnd is normally distributed with mean 0.3 and standard deviation 0.25, and where CMC is from Equation 12 of the 1998 Update or Equation 13 of the 1999 Update.)

In this simulation, it is given that the curved pH relationship is valid. Yet, although the plotted “data” have been calculated from the pH relationship (with added random variability), the appearance of the data has features that some comments argued to refute the pH relationship. It appears that the data can be described by a less curved line.
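The simulation described above can be sketched in a few lines. This is a hedged illustration, not the code used to produce the figure: the acute pH curve below is written from memory of the 1999 salmonids-present CMC form and should be treated as a placeholder to be replaced with the actual Equation 12 (1998) or Equation 13 (1999), the pH values are hypothetical rather than the pHs of the Figure 10 data, and the logarithm is taken as base 10.

```python
import math
import random


def cmc_placeholder(ph):
    """Stand-in acute pH curve (mg N/L).
    ASSUMPTION: recalled form of the 1999 salmonids-present CMC; substitute
    the Update's own equation before relying on any numbers."""
    return 0.275 / (1 + 10 ** (7.204 - ph)) + 39.0 / (1 + 10 ** (ph - 7.204))


def simulate_lc50s(ph_values, cmc, log_mean=0.3, log_sd=0.25, seed=1):
    """LC50 = 10^(log10(2*CMC) + NRnd), with NRnd ~ Normal(log_mean, log_sd).
    2*CMC is the FAV, so points scatter around a level 0.3 log units above it."""
    rng = random.Random(seed)
    return [10 ** (math.log10(2 * cmc(ph)) + rng.gauss(log_mean, log_sd))
            for ph in ph_values]


def count_below_fav(ph_values, lc50s, cmc):
    """Number of simulated LC50s that fall below the FAV (= 2*CMC)."""
    return sum(lc < 2 * cmc(ph) for ph, lc in zip(ph_values, lc50s))


# Hypothetical pH distribution: many tests near pH 7.5-8.5, few at the extremes.
central = [7.5 + i / 200 for i in range(200)]
extreme = [6.5, 6.6, 6.8, 9.0, 9.1, 9.2]

for label, phs in (("central pH", central), ("extreme pH", extreme)):
    lc50s = simulate_lc50s(phs, cmc_placeholder)
    below = count_below_fav(phs, lc50s, cmc_placeholder)
    print(f"{label}: {len(phs)} points, {below} below the FAV")
```

Changing `seed` changes the individual points, which is a direct way to see how much of the apparent pattern comes from sample size alone.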
Many of the data are below the FAV in the pH 7.5-8.5 range, but data are well above the FAV at the more extreme pH. This type of behavior is expected and is a matter of simple probabilities. The criterion should be farther below available data at the extreme pHs simply because there are fewer data there. For a pH range with hundreds of LC50s, many data points would be expected to be below the fifth percentile, but with few LC50s, none would be expected to be below it.

The simulated data set here is similar in appearance to the actual data in Figure 10 in the 1998 Update (Figure 12 in 1999). The exact appearance of the simulation data set varies depending on the values of the random numbers generated. (That is, different seeds for the random number generator yield somewhat different simulation plots.) Nevertheless, the basic characteristics of their appearance are the same, with the low values from the center of the pH distribution falling closest to the criteria line. This indicates that the 1998 Update’s Figure 10 data plot is consistent with rather than at odds with the acute pH relationship.

Appendix II - Differences Between Acute and Chronic pH Relationships

Some comments were critical that the Update uses a different pH relationship for chronic toxicity than for acute toxicity. Concerns include the fact that the chronic pH relationship is based on just two datasets, in contrast to 15 data sets for the acute pH relationship. In various places, the comments assert that such limited data provide an inadequate basis for adopting a different pH relationship for chronic toxicity than already established for acute toxicity. The comments also make various claims about the acute pH relationship being appropriate for chronic applications. The comments ignore that the Update did establish statistically significant differences between acute and chronic toxicity pH relationships, even with the limited data.
Various specific comments are addressed in the main body of this response. This appendix presents more details on the differences between acute and chronic toxicity than were provided in the Update, in order to better explain the basis for using different relationships.

Smallmouth Bass

One of the two studies used as the basis of the chronic pH relationship was that of Broderius et al. (1985), who conducted both early life stage chronic tests and acute tests on smallmouth bass at four different pHs. The acute data from this study are presented in 1998 Update Figure 6 and the chronic data in 1998 Update Figure 8 (or Figures 8 and 10 in 1999). The Update reported that the parameter estimates from the regression analysis of chronic data differed significantly from the parameter values determined from the regression analysis of the acute data, but provided no detailed comparisons of the acute and chronic data. Some further comparisons will be given here.

In the figure at the side, both the acute and chronic data from this study are plotted side by side. These figures show clear differences in trends with pH for acute and chronic toxicity. The differences in these trends are especially evident in the acute-chronic ratios also plotted on this figure, which vary by almost ten-fold over the pH range, in contrast to being constant if the acute and chronic pH relationships really were the same.

[Figure: Smallmouth Bass Acute versus Chronic Toxicity. Panels: Acute LC50, Chronic EC20, and Acute-Chronic Ratio, each plotted against pH (6 to 9).]

The figure below shows just the chronic data from this study, plotted with both the acute and chronic pH relationships from the Update, the intercepts in these relationships being adjusted to best fit this dataset.
For the chronic relationship, the residual mean squared error (log scale) of the data from the line is 0.0127, corresponding to an average deviation from the line of a factor of 1.30 (about 30%), in line with reasonable uncertainties for toxicity data. For the acute relationship, the residual mean squared error is 0.0695, more than five times larger than for the chronic relationship, which corresponds to a deviation factor of 1.84, an extremely large deviation relative to the uncertainty in the data. Even if the lowest pH data is ignored, the acute pH relationship gives a much worse fit.

[Figure: Smallmouth Bass Chronic EC20 plotted against pH (6 to 9), with the acute and chronic pH relationships.]

The contention in the comments that there is not a significant difference between acute and chronic pH relationships, and that the acute pH relationship provides an adequate fit to the chronic data, is not supported by such analysis of this data. Using the acute pH relationship results in much larger errors and ignores clear differences between chronic and acute toxicity from this study. It should be further noted that the lines used here are based on the average regression across all datasets. The differences between the acute and chronic data for smallmouth bass are actually greater than accounted for by these average relationships. To ignore these differences would not be an appropriate use of available data.

Finally, it should be noted that the differences here between acute and chronic toxicity are not due to different endpoints. Although the chronic toxicity EC20s here included effects on both survival and growth, the effects were mostly due to mortalities. Even if just mortality was used from the chronic study, the pH dependence of chronic toxicity would still differ markedly from that of acute toxicity.
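The deviation factors quoted for the smallmouth bass fits follow from the residual mean squared errors by taking the square root (giving a typical residual in log10 units) and raising 10 to that power. The short sketch below reproduces that arithmetic; the function name is an illustrative choice, and base-10 logarithms are assumed because they reproduce the quoted factors.

```python
import math

def deviation_factor(mse_log10):
    """Convert a residual mean squared error on the log10 scale
    into a multiplicative deviation factor."""
    return 10.0 ** math.sqrt(mse_log10)

chronic_mse = 0.0127  # chronic pH relationship, from the text
acute_mse = 0.0695    # acute pH relationship, from the text

chronic_factor = deviation_factor(chronic_mse)
acute_factor = deviation_factor(acute_mse)
mse_ratio = acute_mse / chronic_mse

print(f"chronic deviation factor: {chronic_factor:.2f}")
print(f"acute deviation factor:   {acute_factor:.2f}")
print(f"ratio of mean squared errors: {mse_ratio:.1f}")
```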
Ceriodaphnia dubia

The other dataset used in the evaluation of the pH dependence of chronic ammonia toxicity was that from Johnson (1995) using Ceriodaphnia dubia. This dataset consists of life cycle tests conducted at four different pHs and in waters of three different ion concentrations, with reproduction providing the most sensitive endpoint. As reported in the Update, these data show a similar pH dependence to that for smallmouth bass, being significantly flatter than the acute pH relationship from the Update. It should be noted, however, that the acute pH relationship for crustaceans also appears to be flatter at low pH than the average acute pH relationship in the Update, so that there is not as much pH dependence of acute-chronic ratios as observed above for smallmouth bass. But whatever the trends in these ratios, the issue of concern here is whether the data for this organism support using a flatter pH relationship for chronic toxicity than for acute toxicity.

Although regression analysis reported in the Update indicated this to be the case, Figure 8 of the 1998 Update did not provide a good visual presentation of these trends because the data are quite variable (reproductive endpoint) and because data and lines from the waters of different ion concentration were superimposed. The following figure shows the empirical trends more clearly by showing the geometric average and range, at each tested pH, of the effect concentrations for the three different ion concentrations. (Data are plotted at the average pH for each set of three tests.) The acute-chronic ratios are the geometric averages for each pH and are not corrected for the small differences in pH between acute and chronic tests. Although these ratios are variable, on average they do increase substantially at low pH, although the trend is not as great as for smallmouth bass.
[Figure: Ceriodaphnia dubia Acute versus Chronic Toxicity. Panels: Chronic EC20, Acute LC50, and AC Ratio, each plotted against pH (6 to 9).]

The figure on the next page shows the data normalized and averaged to better show the mean trends of toxicity with pH. For the regression analysis reported in the Update, the shape of the pH relationship was assumed to be the same among datasets and a pooled, average estimate for this shape was derived. Average sensitivity differences between datasets, including among the three water types used in the C. dubia study, were accounted for with separate estimates for the EC20 at pH=8. This estimate serves the role of an intercept, and this procedure is analogous to a pooled linear regression in which the slopes of different datasets are assumed to be the same, but the intercepts vary.

For the data presentation here, each EC20 was divided by the regression estimate of the EC20 at pH=8 for its water type to produce normalized EC20s. When so normalized, the points for all the water types can be superimposed onto the same normalized pH relationship. Such a normalization was used to summarize acute toxicity data in Figures 4 and 7 of the 1998 Update, and was also used in Appendix A of the Hall & Associates comments. Data so normalized still show the same residual variation around the regression curve as before normalization. To reduce that variation, the geometric average of the three normalized EC20s was calculated at each tested pH. Since each of these data is an estimate for the same point on the relationship, such averaging is legitimate, although it is made somewhat approximate due to pH variations among these points. This normalization and averaging is done only to provide a more informative visual presentation of the average pH trends present in this data.

[Figure: Ceriodaphnia dubia Chronic Toxicity. Normalized, averaged EC20s plotted against pH (6 to 9), with the Chronic pH Relationship and the Acute pH Relationship.]
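The normalization and geometric averaging described above can be illustrated with a small sketch. All numbers and water-type labels below are hypothetical placeholders chosen only to show the arithmetic; they are not values from Johnson (1995) or from the Update's regression.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def normalize(ec20, ec20_at_ph8):
    """Divide an EC20 by the regression estimate of the EC20 at pH 8
    for the same water type."""
    return ec20 / ec20_at_ph8

# HYPOTHETICAL regression estimates of the EC20 at pH 8, one per water type.
ec20_at_ph8 = {"water_1": 10.0, "water_2": 20.0, "water_3": 40.0}

# HYPOTHETICAL measured EC20s at two tested pHs, one test per water type.
ec20_by_ph = {
    6.5: {"water_1": 30.0, "water_2": 50.0, "water_3": 120.0},
    8.0: {"water_1": 9.0, "water_2": 22.0, "water_3": 40.0},
}

# One normalized, geometrically averaged point per tested pH.
normalized_mean = {
    ph: geometric_mean(
        [normalize(value, ec20_at_ph8[water]) for water, value in tests.items()]
    )
    for ph, tests in ec20_by_ph.items()
}

for ph, value in sorted(normalized_mean.items()):
    print(f"pH {ph}: normalized geometric-mean EC20 = {value:.2f}")
```

Because each water type is divided by its own pH 8 estimate, differences in overall sensitivity drop out and only the shape of the pH trend remains in the averaged points.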
Any statistical analysis and conclusions regarding parameter values and trends are still based on the original regression analysis.

When the data are so plotted on the above figure, it is seen that the mean pH relationship for this dataset is similar to that for smallmouth bass presented above. As for smallmouth bass, both the acute and chronic pH relationships from the Update are also plotted on this figure, with their intercepts adjusted to best fit the data. For the chronic relationship, the residual mean squared error is 0.0100, whereas for the acute pH relationship it is 0.0191, almost twice as much. This difference arises largely from the fact that the chronic data show more flattening at lower pH, which is not accommodated by the acute pH relationship. Although the disparity between this data set and the acute pH relationship is not as bad as was the case for smallmouth bass, the acute relationship still is substantially worse than the chronic pH relationship.

All Species Aggregated

All available chronic EC20s for all tested species are plotted against pH in Figure 12 of the 1998 Update (Figure 14 in 1999). Scatter in these data, produced by interspecies variability and interlaboratory variability, makes it difficult to infer from the plot exactly what the correct pH relationship should be.

Random number simulations of the type described in Appendix I indicate that the Figure 12 EC20s are consistent with the chronic pH relationship. The amount of scatter in the data and the sparsity of data at low pH make it difficult, however, to conclude that the Figure 12 data could not also be consistent with the acute pH curve. Thus, EPA’s preference for a chronic pH dependence distinct from the acute pH dependence is based on analysis of the above described Broderius et al.
(1985) smallmouth bass study and Johnson (1995) Ceriodaphnia study, and not on the aggregated data of Figure 12, which was presented to illustrate the general relationship between the chronic criterion and the available EC20s.

Appendix III - Issues Regarding Pooling of Temperature Data

Some comments were critical of the pooled analysis of temperature data used in the Update. The concern apparently was not with the general techniques used for pooled analysis, which were standard statistical techniques and were used in the same comments’ Appendix A. Rather, the concern regarded the inclusion of certain data sets in the pooled analysis which had a limited temperature range and/or a significant number of data with considerable variability. The contention appears to be that including such data sets introduced biases or uncertainties in the analysis that were responsible for the Update’s conclusion that there was no significant temperature dependence of acute ammonia toxicity (in fish). The comments further suggested that temperature dependence should be based only on data sets with at least a 20°C range, presumably so that the slope estimate from each data set has a better reliability than those from some of the data sets used in the pooled data set.

While it is desirable for each data set to have a broad temperature range, EPA believes that it is not correct to necessarily exclude less-than-desirable datasets. The question that should be posed is whether the less desirable datasets make a net positive contribution to the analysis, even given their uncertainties, and whether that contribution is appropriately weighted relative to the more desirable datasets. The following analyses will address specific concerns raised in the comments about the limitations of some datasets, and use statistical simulations to demonstrate that inclusion of such datasets is appropriate.
This does not preclude the possibility that a data set can contain misleading data and skew the results, but this is true of large data sets as well as small, and of data sets with large temperature ranges as well as small ranges. In the absence of some clear inconsistencies or confounding influences, including a data set in any analysis should be decided on the basis of whether, on average, it would be expected to improve the analysis or not. As the following discussion demonstrates, the policy of the Update to include even limited data sets is not expected to obscure underlying relationships, but would rather increase the likelihood of detecting them. Furthermore, the analysis procedures used in the Update recognize the relative uncertainties of slope estimates from the different datasets and weight them appropriately in developing pooled estimates for slopes, so that “high quality” datasets will have more impact than lower quality ones.

A major concern raised in the comments was that some data sets had a temperature range of 10°C or less. The comments characterized this as “extrapolations” since these data sets were being used to infer a relationship beyond their range. Such a characterization is inappropriate, because each data set is used to provide an estimate of the slope only within its own range, and the entire relationship is based collectively on data which do cover a broad range, so that no “extrapolations” are done, even if the overall relationship is pieced together from relationships covering different ranges. Nonetheless, although this characterization is wrong, the contribution of data sets with limited temperature ranges to the overall analysis is still a legitimate issue, because narrow temperature ranges result in more uncertain slope estimates.
However, the analysis used in the Update recognizes that these estimates still provide legitimate information about the slope, and, if appropriately weighted, will make a positive contribution to the overall analysis. Based on statistical simulations, the analysis here will address whether data sets with limited temperature ranges can appropriately contribute to inferences about relationships across a broader range.

The first simulations presented here will use two types of data sets. One data set will consist of six points equally spaced across a 0 to 30°C range (0, 6, 12, 18, 24, 30°C), representing a single “high quality” test series on an organism. The other data set will consist of three pairs of points, representing three separate sets of two tests, the first set being conducted at 0 and 6°C, the second at 12 and 18°C, and the third at 24 and 30°C. Thus, each data set has the same number of points at the same temperatures, but the first data set will provide one slope estimate based on six points, whereas the second data set will provide three, less certain, slope estimates based on two points each. These smaller sets are similar to the most limited ones used in the Update, whereas the larger set is similar to the data sets favored by the comments.

In the first set of simulations, the real temperature relationship is assumed to be:

log10(LC50) = -0.01 × (Temperature(°C) - 15)

The log slope of -0.01 corresponds to a factor of 2 change over a 30°C range (at 15°C, the true LC50 is 1.000, at 0°C it is 1.413, and at 30°C it is 0.707). The intercept in this equation is set to 15°C to put it in the middle of the data, rather than the extremes. Data points were randomly created based on this relationship, with a standard deviation of 0.1 for the log LC50s (the average standard deviation observed in the temperature data sets in the Update). A total of 10,000 samples were created for each type of data set and subjected to linear regression of log LC50 versus temperature.
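A minimal sketch of this kind of simulation, for the single set of six points, is given below. It is not the code used for the Update: the function names and the seed are illustrative assumptions, and ordinary least squares is assumed as the regression method. For a straight-line fit with known scatter, the slope's standard deviation is expected to be close to 0.1 divided by the square root of the sum of squared temperature deviations, which is about 0.0040 for these six temperatures.

```python
import random
import statistics

TEMPS = [0, 6, 12, 18, 24, 30]  # degrees C, from the text
TRUE_SLOPE = -0.01              # log10(LC50) per degree C, from the text
CENTER = 15.0                   # intercept is defined at 15 degrees C
LOG_SD = 0.1                    # SD of log10 LC50s, from the text

def fit_line(temps, ys, center=CENTER):
    """Least-squares slope and intercept, with the intercept taken at `center`."""
    n = len(temps)
    tbar = sum(temps) / n
    ybar = sum(ys) / n
    sxx = sum((t - tbar) ** 2 for t in temps)
    sxy = sum((t - tbar) * (y - ybar) for t, y in zip(temps, ys))
    slope = sxy / sxx
    intercept = ybar + slope * (center - tbar)
    return slope, intercept

def simulate(n_samples, rng):
    """Generate n_samples data sets and fit each one."""
    slopes, intercepts = [], []
    for _ in range(n_samples):
        ys = [TRUE_SLOPE * (t - CENTER) + rng.gauss(0.0, LOG_SD) for t in TEMPS]
        slope, intercept = fit_line(TEMPS, ys)
        slopes.append(slope)
        intercepts.append(intercept)
    return slopes, intercepts

rng = random.Random(0)  # arbitrary seed
slopes, intercepts = simulate(10_000, rng)
slope_mean = statistics.fmean(slopes)
slope_sd = statistics.stdev(slopes)
intercept_sd = statistics.stdev(intercepts)

print(f"mean slope   = {slope_mean:.4f}")
print(f"SD slope     = {slope_sd:.4f}")
print(f"SD intercept = {intercept_sd:.3f}")
```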
For the second data set type (with three sets of data pairs), the regression produced separate intercepts for each data pair, but a pooled estimate of slope. The mean and variability of the regression estimates for slopes and intercepts are summarized in the following table:

Data Set Type            Parameter                   Mean of Param Est   SD of Param Est
One Set of Six           Slope                       -0.0100             0.0040
(0,6,12,18,24,30°C)      Intercept                    0.000              0.041
Three Sets of Two        Pooled Slope                -0.0100             0.0136
(0,6°C) (12,18°C)        Intercept (0,6°C Pair)       0.000              0.178
(24,30°C)                Intercept (12,18°C Pair)     0.000              0.070
                         Intercept (24,30°C Pair)     0.000              0.178

As expected, the single data set of six points produced much more precise estimates of the parameters, but the three sets of data pairs still produced unbiased estimates for the true slope (-0.0100) and intercepts (0.000). These simulations also tracked how often the regressions produced a slope (a) greater than -0.0050 (i.e., less than half as steep) and (b) positive rather than negative. For the single data set of six, the slope was greater than -0.0050 10% of the time and positive 0.5% of the time. For the three sets of data pairs, the slope was greater than -0.0050 36% of the time and positive 23% of the time. Clearly, just three sets of paired data like this would produce imprecise estimates, and reliance shouldn’t be put on just such a limited data set. But just because such a data set by itself produces imprecise estimates of slopes does not mean that it cannot make a worthwhile contribution as part of a larger pooled analysis, so the next simulations were run to demonstrate this point.

Two simulations were run, each with a data set with a total of twelve points. In the first simulation, the data set consisted of two of the “high quality” sets of six evenly spaced points. In the second simulation, the data set consisted of one of the “high quality” sets and three of the sets of data pairs.
The results of the simulation are presented below:

Data Set Type            Parameter                   Mean of Estimates   SD of Estimates
Two Sets of Six          Pooled Slope                -0.0100             0.0028
                         Intercept (Set 1)            0.000              0.041
                         Intercept (Set 2)            0.000              0.041
One Set of Six and       Pooled Slope                -0.0100             0.0036
Three Sets of Two        Intercept (0,6°C Pair)       0.000              0.085
                         Intercept (12,18°C Pair)     0.000              0.070
                         Intercept (24,30°C Pair)     0.000              0.085
                         Intercept (Set of Six)       0.000              0.041

The first simulation produced the expected results. Using two sets of six increased the precision (reduced the standard deviation) of the slope estimate by a factor equal to the square root of two (=1.41), but had no effect on the precision of the intercept estimates since such centered intercepts rely just on the mean of the six points within each set. In the second simulation, using some “low quality” data along with one “high quality” data set also improved the precision of the slope estimate, but only slightly because the slope estimates from these small, narrow-range sets are uncertain. The precision of the intercept estimates from the low temperature and high temperature pairs improved markedly because the intercept is outside their temperature ranges and using the pooled slope to extrapolate to the intercept improves the estimation.

The point to emphasize here is that including this “low quality” data does result in net improvements to the overall slope estimate. The fact that this data is uncertain and does not cover the whole temperature range does not mean it will make the analysis worse; rather, it will on average improve it, although its impact will be relatively small because of its uncertainty. This type of pooled analysis recognizes the uncertainty in each data set, and weights the slopes from each data set in proportion to the relative uncertainties.

But in the simulations so far, each data set had the same “true” underlying slope. It is in fact likely that different species of organisms have different temperature relationships.
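The precision gains discussed above can also be checked analytically. The sketch below assumes an ordinary least-squares fit with a common slope and a separate intercept for each data set, with the same known scatter (0.1 log units) in every set; the function names are illustrative and not from the Update. Under those assumptions the pooled slope's standard deviation is the scatter divided by the square root of the summed within-set squared temperature deviations, and an intercept extrapolated to 15°C picks up extra uncertainty from the slope. Simulated values from 10,000 samples can differ slightly from these analytic ones.

```python
import math

SIGMA = 0.1    # SD of log10 LC50s, from the text
CENTER = 15.0  # temperature at which intercepts are defined

SIX = [0, 6, 12, 18, 24, 30]
PAIRS = [[0, 6], [12, 18], [24, 30]]

def sxx(temps):
    """Sum of squared deviations of temperatures from their own mean."""
    tbar = sum(temps) / len(temps)
    return sum((t - tbar) ** 2 for t in temps)

def pooled_slope_sd(groups, sigma=SIGMA):
    """SD of a common slope fitted with a separate intercept per group."""
    return sigma / math.sqrt(sum(sxx(g) for g in groups))

def intercept_sd(group, groups, sigma=SIGMA, center=CENTER):
    """SD of one group's intercept at `center`, using the pooled slope."""
    tbar = sum(group) / len(group)
    var_mean = sigma ** 2 / len(group)
    var_extrapolation = (center - tbar) ** 2 * pooled_slope_sd(groups, sigma) ** 2
    return math.sqrt(var_mean + var_extrapolation)

sd_one_six = pooled_slope_sd([SIX])
sd_two_six = pooled_slope_sd([SIX, SIX])
sd_pairs = pooled_slope_sd(PAIRS)
sd_mixed = pooled_slope_sd([SIX] + PAIRS)

low_pair_alone = intercept_sd([0, 6], PAIRS)
low_pair_mixed = intercept_sd([0, 6], [SIX] + PAIRS)

print(f"one set of six:          slope SD = {sd_one_six:.4f}")
print(f"two sets of six:         slope SD = {sd_two_six:.4f}")
print(f"three pairs:             slope SD = {sd_pairs:.4f}")
print(f"one six + three pairs:   slope SD = {sd_mixed:.4f}")
print(f"(0,6) pair intercept SD: alone = {low_pair_alone:.3f}, mixed = {low_pair_mixed:.3f}")
```

Adding the pairs to the set of six always increases the summed squared deviations, so under these assumptions the pooled slope can only become more precise, never less.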
In the absence of adequate data to define relationships for individual species, the Update evaluated whether there was any evidence for an overall trend in the available data for all fish species. If some of the species had different “true” slopes, how would the analysis respond to this? In fact, it would use whatever slope was present in each individual data set, and weight each according to its uncertainty to develop a pooled slope estimate, so that “high quality” data sets would affect the slope more than “low quality” ones. The following simulation results demonstrate this point. This repeats the previous two simulations, except that the “true” slope used to generate the data for the second half of the data was set to 0.00, rather than -0.01:

Data Set Type                   Parameter                   Mean of Estimates   SD of Estimates
One Set of Six w/ Slope=-0.01   Pooled Slope                -0.0050             0.0028
and One Set of Six              Intercept (Set 1)            0.000              0.041
w/ Slope=0.00                   Intercept (Set 2)            0.000              0.041
One Set of Six w/ Slope=-0.01   Pooled Slope                -0.0092             0.0036
and Three Sets of Two           Intercept (0,6°C Pair)      -0.111              0.085
w/ Slope=0.00                   Intercept (12,18°C Pair)     0.000              0.070
                                Intercept (24,30°C Pair)     0.111              0.085
                                Intercept (Set of Six)       0.000              0.041

In the first simulation, when both sets of data are of equal quality, the resulting pooled slope is the average of the two, as is appropriate. In the second simulation, the slope is heavily weighted toward the slope of the more certain data. Thus, this pooled analysis technique does not allow data sets which are inherently uncertain to unduly influence the analysis, as the comments imply.

Another concern raised in the comments was that some data sets not only had a limited range of temperatures, but also had a large number of data points, so that the analysis was unduly skewed to whatever slope that data set happened to have.
Additionally, it was noted that some of these data sets, such as that of Thurston et al., had substantial variability and that it would be inappropriate for such variable data to have a heavy influence on the pooled slope estimate. The comments further suggested that the large variability in some of the data sets indicated problems with the data that would also argue against it being given substantial weight in the analysis. The concerns of the comments were not justified on several counts:

(1) Part of the apparent variability of the Thurston et al. data is due to the large size of the data sets, which are more likely to show data in the tails of the distribution. In fact, the standard deviation of the Thurston et al. data in 1998 Update Figure 3 is less than that of the Reinbold and Pescitelli bluegill data and only slightly greater than some other data sets. It is also less than the fathead minnow, walleye, and rainbow trout data of Arthur et al., which the comments use as their preferred data sets.

(2) The variability noted in some of these data sets is not necessarily unusually large for toxicity data when tests are repeated enough times to actually produce good estimates of their variability. Actually, the variability around the regression line for some of the data sets is more deserving of attention. The residual error around several of the regression lines in 1998 Update Figures 2 and 3 is less than the uncertainty of LC50 estimates, indicating that, by random chance, the relationships look better than they really are.

(3) While the variability in the Thurston et al. studies may be higher than average, the comments present no basis for suggesting that this data does not contain legitimate information about the temperature relationship. Even if other factors contribute to this variability, only if these factors are confounded with temperature so as to obscure or enhance the temperature relationship should this data not be considered.
Except for pH, which was accounted for in the Update analysis, there is no indication that such confounding is of concern, and there is no apparent reason to think that this data set is more likely to be confounded by other variables than is any other data set.

(4) The greater number of points in the Thurston et al. data sets will increase their weight in the analysis, but this is appropriate, standard statistical procedure, since more data provide greater certainty in the slope estimate, other factors being equal. However, this effect is not proportional to the number of data, but rather to the square root of the number of data (or, for some regression parameters, less than the square root). Thus, if the Thurston et al. data set has four times as much data as another data set, it will not have four times the influence on the slope estimate, but rather it will have at most two times the influence.

(5) Finally, the limited temperature range and large variability in the Thurston et al. data sets make the slope estimate from these data sets less certain, and thus decrease their weight in the development of the pooled slope estimate. Again, these data sets do not have as much influence on the slopes as the comments suggest.

To suggest that variability per se should preclude the use of this data is contrary to basic statistical estimation procedures. The procedures used in the Update consider the temperature range of the data, the number of data, and the variability of the data and suitably weight them in the analysis. In order to further evaluate this issue of whether data sets with limited temperature range and with a large number of variable points should be included in the analysis, a series of simulations was run to demonstrate that such data sets provide an appropriate contribution to the pooled analysis.
This involved the use of a data set with a temperature range from 10 to 20°C, with a size of either 6 (to match earlier sets) or 28 (the size of the Thurston fathead minnow dataset), and with a log variability of either 0.10 or 0.15. The first set of simulations looked just at using such a data set, as shown in the following table:

Data Set Type        Parameter   Mean of Param Est   SD of Param Est
6 Pts 10-20°C        Slope       -0.0100             0.0120
Std Dev = 0.1        Intercept    0.000              0.041
28 Pts 10-20°C       Slope       -0.0100             0.0063
Std Dev = 0.1        Intercept    0.000              0.019
6 Pts 10-20°C        Slope       -0.0100             0.0180
Std Dev = 0.15       Intercept    0.000              0.061
28 Pts 10-20°C       Slope       -0.0100             0.0094
Std Dev = 0.15       Intercept    0.000              0.029

These simulations demonstrate how a restricted range will greatly increase the uncertainty of the slope estimate. The “high quality” data set with six points over a 30°C range had a standard deviation of 0.0040 for the slope estimates. Reducing the temperature range here by a factor of three increased the standard deviation by a factor of three, to 0.0120. Increasing the number of points to 28 reduced the uncertainty, but only by about a factor of 2, so that even with 28 data points the 10°C range data set here is less certain than the 30°C range data set with just 6 points. The above table also shows, as expected, that increasing the variability of the data points proportionately increases the uncertainty in the parameter estimates.

Simulations were then run in which the data sets included both a small “high quality” set (6 points, 30°C range, 0.1 log standard deviation) and a large “low quality” set (28 points, 10°C range, 0.15 log standard deviation). Furthermore, the high quality data was generated assuming a -0.01 slope and the low quality with a 0.00 slope, so that the resultant pooled slope estimate will more clearly indicate the relative weight given to the different data.
Data Set Type                    Parameter                    Mean of Param Est   SD of Param Est
6 Pts 0-30°C, Slope = -0.01,     Pooled Slope                 -0.0071             0.0038
Std Dev = 0.1                    Intercept (Set of 6 Pts)      0.000              0.041
28 Pts 10-20°C, Slope = 0.00,    Intercept (Set of 28 Pts)     0.000              0.0063
Std Dev = 0.15

The resultant slope is much closer to that used to generate the smaller group of points, because the analysis procedure recognized that, despite the larger number of points in the second group, the smaller temperature range and greater variability made its slope estimate less certain. But the second group still provides useful information on the slope, so some influence is apparent in the pooled slope.

Other simulations and analyses could provide further information on the utility of different types of data sets, but will not change the basic conclusion: data sets with limited temperature range, whether with few or many data and with small or large variability, are useful in this kind of analysis. Such data sets will not unduly influence the analysis so that relationships from “quality” data sets are obscured; rather, including all the data will improve the chances of detecting any true mean trends in the data as a whole. Consequently, the data analysis should not be restricted to a few selected data sets, especially since the data sets that the comments recommend suffer from various problems and do not themselves demonstrate statistically significant trends. As mentioned earlier, further collection of appropriate data may well demonstrate temperature dependence of ammonia toxicity for some fish, but the most appropriate analysis of all the currently available data does not support the use of any temperature dependence.
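The range, sample-size, and weighting effects reported in this appendix can be approximated with a short calculation. The sketch below rests on assumptions not stated in the source: an ordinary least-squares fit with a common slope and separate intercepts, and 28 points spaced evenly between 10 and 20°C (the actual placement of the 28 temperatures is not given). Function names are illustrative. Under those assumptions, the expected pooled slope is the average of each set's true slope weighted by that set's sum of squared temperature deviations.

```python
import math

def evenly_spaced(lo, hi, n):
    """n evenly spaced temperatures from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def sxx(temps):
    """Sum of squared deviations of temperatures from their own mean."""
    tbar = sum(temps) / len(temps)
    return sum((t - tbar) ** 2 for t in temps)

def slope_sd(temps, sigma):
    """SD of a least-squares slope for data with scatter sigma."""
    return sigma / math.sqrt(sxx(temps))

def expected_pooled_slope(groups):
    """Expected common slope; groups is a list of (temps, true_slope)."""
    total = sum(sxx(temps) for temps, _ in groups)
    return sum(sxx(temps) * slope for temps, slope in groups) / total

wide6 = evenly_spaced(0, 30, 6)
narrow6 = evenly_spaced(10, 20, 6)
narrow28 = evenly_spaced(10, 20, 28)  # ASSUMPTION: even spacing of the 28 points

sd_wide6 = slope_sd(wide6, 0.10)
sd_narrow6 = slope_sd(narrow6, 0.10)
sd_narrow28 = slope_sd(narrow28, 0.10)
sd_narrow28_noisy = slope_sd(narrow28, 0.15)

pairs = [[0, 6], [12, 18], [24, 30]]
mixed_pairs = expected_pooled_slope([(wide6, -0.01)] + [(p, 0.0) for p in pairs])
mixed_28 = expected_pooled_slope([(wide6, -0.01), (narrow28, 0.0)])

print(f"6 pts, 0-30 C:   slope SD = {sd_wide6:.4f}")
print(f"6 pts, 10-20 C:  slope SD = {sd_narrow6:.4f}")
print(f"28 pts, 10-20 C: slope SD = {sd_narrow28:.4f}")
print(f"28 pts, 10-20 C, SD 0.15: slope SD = {sd_narrow28_noisy:.4f}")
print(f"pooled slope, six + three pairs: {mixed_pairs:.4f}")
print(f"pooled slope, six + 28 narrow:   {mixed_28:.4f}")
```

The calculation shows why range matters more than count: cutting the range by three cuts the squared deviations by nine, whereas going from 6 to 28 points raises them by less than a factor of four.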