Report on the Benchmark Dose Peer Consultation Workshop


 United States
 Environmental Protection
 Agency
Office of Research and
Development
Washington DC 20460
November 1996
RISK ASSESSMENT FORUM

                                                         EPA/630/R-96/011
                                                           November 1996
                    REPORT ON THE

BENCHMARK DOSE PEER CONSULTATION WORKSHOP
                        Prepared by:

                  Eastern Research Group, Inc.
                     110 Hartwell Avenue
                     Lexington, MA 02173
                 EPA Contract No. 68-D5-0028
                    Risk Assessment Forum
              U.S. Environmental Protection Agency
                       Washington, DC
                                                     Printed on Recycled Paper

                                        NOTICE

       Mention of trade names or commercial products does not constitute endorsement or
recommendation for use. Statements are the individual views of each workshop participant; none
of the statements in this report represent analyses or positions of the Risk Assessment Forum or the
U.S. Environmental Protection Agency (EPA).

       This report was prepared by Eastern Research Group, Inc. (ERG), an EPA contractor, as
a general record of discussions during the Benchmark Dose Peer Consultation Workshop. As
requested by EPA, this report captures the main points and highlights of discussions and includes
brief summaries of discussion topic sessions. The report is not a complete record of all details
discussed, nor does it embellish, interpret, or enlarge upon matters that were incomplete or unclear.
In particular, each of the five discussion topic summaries was prepared at the workshop by individual
discussion topic leaders based on the panel members' discussions during the workshop. Thus, there
may be slight differences between the five topic leaders' summaries. ERG did not attempt to
harmonize the topic leaders' comments.

                                  CONTENTS

 Foreword

 SECTION ONE—INTRODUCTION

      Background
      Presentations
      Peer Consultation Workshop

 SECTION TWO—CHAIRPERSON'S SUMMARY OF THE WORKSHOP

      Dr. Rogene Henderson

 SECTION THREE—DISCUSSION TOPIC SUMMARIES

      Selection of Studies and Responses for Benchmark
      Dose/Concentration Analysis
        Dr. James Olson
      Selection of the Benchmark Response Level
        Dr. Elaine Faustman
      Model Selection and Fitting
        Dr. Colin Park
      Use of Confidence Limits
        Dr. Lorenz Rhomberg
      Selection of Benchmark Dose/Concentration to Use
      as the Point of Departure
        Dr. William Pease

 SECTION FOUR—OBSERVERS' COMMENTS

 APPENDIX A.       PEER CONSULTANTS/PRESENTERS
 APPENDIX B.       WORKSHOP AGENDA
 APPENDIX C.       CHARGE TO WORKSHOP PANEL MEMBERS
 APPENDIX D.       PREMEETING COMMENTS
 APPENDIX E.       FINAL OBSERVER LIST

FOREWORD
This report includes information and materials from a peer consultation workshop organized
by the U.S. Environmental Protection Agency's (EPA's) Risk Assessment Forum (RAF). The
meeting was held in Bethesda, Maryland, at the Holiday Inn Bethesda on September 10-11, 1996.
The subject of the peer consultation was the document entitled Benchmark Dose Technical Guidance
Document (External Review Draft, EPA/600/P-96/002A). A copy of this report can be obtained
through the Office of Research and Development's publications office, Technology Transfer and
Support Division, National Risk Management Research Laboratory, U.S. EPA, 26 West Martin
Luther King Drive, Cincinnati, Ohio 45268 (telephone: 513-569-7562; fax: 513-569-7566). The expert
panel was convened to independently comment on the draft guidance document and make
recommendations that will enhance the guidance development process as well as the ultimate
product.

Notice of the workshop was published in the Federal Register on August 28, 1996 (61 FR
44308). The notice invited members of the public to attend the workshop as observers and provided
logistical information to enable observers to preregister. About 40 observers attended the workshop,
including representatives from federal government, industry, trade organizations, and consulting
firms.

In outlining the scope of the peer consultation, EPA emphasized that the draft guidance
document is in a preliminary stage of development, should not be construed as a policy
statement, and therefore could benefit greatly from the comments and recommendations of outside
experts. EPA explained that the guidance is intended to be used in conjunction with other Agency
risk assessment guidance and to harmonize the methods used to conduct cancer and noncancer
quantitative risk assessments. EPA asked the expert peer consultants to concentrate their
review on technical issues concerning selection of studies and responses for benchmark
dose/concentration (BMD/C) analysis; selection of the benchmark response level (BMR); model
selection and fitting; use of confidence limits; and selection of the BMD/C to use as the point of
departure for cancer and noncancer health effects.

A balanced group of expert panel members was selected from academia, industry,
consulting, government, and environmental organizations. The selected panel members brought broad
experience and demonstrated scientific expertise in risk assessment, representing the
following disciplines: toxicology, biostatistics, risk assessment/risk management policy, and
mathematics. Appendix A lists the 18 panel members.

In workshop discussions, EPA sought comments from these scientific experts on the draft
guidance document. The draft guidance document presents a procedure that is intended to have
reasonable criteria and defaults to assist risk assessors in promoting consistency among analyses of
health effects data in the observable range. The procedure is also intended to be useful for
determining the point of departure that can be used as the basis for linear low dose extrapolation
for cancer, calculation of a margin of exposure, or application of uncertainty factors for calculating oral
reference doses (RfDs), inhalation reference concentrations (RfCs), or other exposure estimates for
human health risk assessment. EPA will use the expert panel members' comments and
recommendations drawn from this peer consultation workshop in considering revisions to the draft
guidance document.

       The workshop report is organized as follows. The report opens with a brief introduction that
covers the background of the benchmark dose guidance document, presentations on two ongoing
Agency initiatives on the benchmark dose approach, and the purpose of the workshop (section 1).
This is followed by the chairperson's summary (section 2) and then the five discussion topic leaders'
summaries (section 3). The last section of the report provides observers' comments (section 4).
Appendices to the workshop report include a list of panel members, the workshop agenda, the
charge to workshop panel members, premeeting comments, and a list of observers.
                                 William Wood, Ph.D.
                                 Executive Director
                                 Risk Assessment Forum

                                    SECTION ONE

                                  INTRODUCTION
       This report highlights issues and conclusions from an EPA Risk Assessment
 Forum-sponsored workshop on the Agency's Benchmark Dose Technical Guidance Document (External
 Review Draft, EPA/600/P-96/002A) published August 9, 1996 (61 FR 44308). The workshop was
 convened to gather information from scientific experts that will assist EPA in further developing the
 draft guidance document.
BACKGROUND

       EPA has followed distinct practices for evaluating the dose-response relationships of cancer
and noncancer-causing agents. For cancer, the linearized multistage procedure has been applied to
extrapolate risk as the 95-percent upper confidence limit; for noncancer health effects, dose-response
analyses have relied on the lowest-observed-adverse-effect-level (LOAEL) and the
no-observed-adverse-effect-level (NOAEL) approaches. In 1996, EPA published Proposed Guidelines for
Carcinogen Risk Assessment (61 FR 17960-18011), which present an approach that will begin to break
down the dichotomy between quantitative approaches for cancer and noncancer risks. The proposed
cancer risk assessment guidelines emphasize an agent's mode of action in producing tumors and the
need to model tumor data as well as other biological responses that might be important in the
carcinogenic process. The models can be used to estimate a point of departure for extrapolation
below the range of observable effects. The benchmark dose approach is one way of determining the
point of departure for linear low dose extrapolation of carcinogens, calculation of a margin of
exposure (MOE), or application of uncertainty factors for calculating oral reference doses (RfDs),
inhalation reference concentrations (RfCs), or other exposure estimates.
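
The first two uses of a point of departure named above can be illustrated with a short calculation. The sketch below is a minimal Python illustration, assuming (as one common default, not stated in this report) that linear low dose extrapolation means drawing a straight line from the point of departure at a 10 percent response down to the origin, and that the MOE is the ratio of the point of departure to a human exposure estimate. The function names and numeric inputs are hypothetical.

```python
def linear_low_dose_risk(dose: float, pod: float, bmr: float = 0.10) -> float:
    """Extra risk at `dose` read off a straight line from the point of
    departure (pod, bmr) to the origin (zero dose, zero extra risk).

    Assumption: this linear-to-origin default is illustrative only.
    """
    if pod <= 0:
        raise ValueError("point of departure must be positive")
    slope = bmr / pod  # extra risk per unit dose
    return slope * dose


def margin_of_exposure(pod: float, exposure: float) -> float:
    """Ratio of the point of departure to an estimated human exposure."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return pod / exposure


if __name__ == "__main__":
    # Hypothetical inputs: a point of departure of 5 mg/kg-day at a
    # 10 percent response and a human exposure of 0.001 mg/kg-day.
    pod, exposure = 5.0, 0.001
    print(f"extra risk at exposure: {linear_low_dose_risk(exposure, pod):.1e}")
    print(f"margin of exposure: {margin_of_exposure(pod, exposure):.0f}")
```

The same point of departure feeds both calculations; only the way it is carried below the observable range differs.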

      Following a 1990 colloquium recommendation, the Risk Assessment Forum took an active
role in promoting research and discussion on benchmark dose issues. A draft report was prepared
that outlined the technique and presented the major questions and decisions involved in applying
the benchmark dose method. This draft report was the subject of a 1993 Forum-sponsored
colloquium on applications of benchmark dose methods to noncancer risk assessment. Following
the colloquium, a Risk Assessment Forum technical panel published a background document on the
use of the benchmark dose/concentration (BMD/C) in health risk assessment (EPA/630/R-94/007).
In addition, several workshops and symposia have been held to discuss the benchmark dose
approach. Subsequent to the development of the background document, a Forum technical panel
authored the external review draft guidance document that served as the focus of the September 1996
peer consultation workshop.

In her introductory remarks at the workshop, Carole Kimmel, Ph.D., of EPA's National
Center for Environmental Assessment (NCEA), who chairs the Risk Assessment
Forum's technical panel on benchmark dose, explained that because of the Forum's involvement, the
draft guidance document is the result of an Agency-wide effort supported by the different EPA
offices represented on the technical panel. Dr. Kimmel announced that the Risk Assessment Forum
was in the process of developing a framework for health risk assessment that will harmonize
approaches for both cancer and noncancer effects. Mode of action data and precursor information
are being incorporated into the approach, thereby bringing the issues of the underlying basis for the
toxicity of cancer and noncancer health effects closer together.

Dr. Kimmel went on to explain that the benchmark dose document is a working draft that
has undergone one round of internal review. Several issues still remain, and EPA felt it was
appropriate at this stage in the development of the guidance to solicit comments and input from
outside experts on applying the benchmark dose approach to cancer and noncancer risk assessments.
Following this workshop and discussions within the Agency, EPA will revise the document, conduct
a peer review, and then publish a document under the auspices of the Risk Assessment Forum. The
final document will be used in conjunction with other EPA risk assessment guidance.

Rogene Henderson, Ph.D., a senior scientist at the Inhalation Toxicology Research Institute,
served as the chairperson of the workshop. In her introductory remarks, Dr. Henderson reviewed
the agenda for the workshop (see Appendix B) and the charge to workshop panel members
(Appendix C). Dr. Henderson explained that EPA's goals for the guidance document are to have
 a procedure that is usable; has reasonable criteria and defaults; can be used for cancer and
 noncancer assessments when endpoints are relevant to both; and informs our understanding of risk
 in the range of extrapolation. She then discussed the limitations of using the LOAEL/NOAEL
 approach, including:

       •      levels are dependent on study design (e.g., choice of doses, numbers of animals);
       •      the variability in the data is not taken into account;
       •      the slope of the dose-response curve is not taken into account; and
       •      an uncertainty factor is used to connect a LOAEL to a NOAEL.

 In contrast, the benchmark dose approach, as an alternative to the LOAEL/NOAEL approach,
 makes better use of the available data, including taking into account the slope of the dose-response
 curve and the variability of the data. The BMD/C is defined as the lower confidence limit for the
 dose that is estimated to produce a given level of change in response (i.e., the benchmark response
 [BMR]). BMD/C estimates are best when there are doses in the study near the range of the
 BMD/C, but the BMD/C does not have to be one of the experimental doses.
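
To make the definition concrete, the sketch below solves for the dose at which a fitted quantal model reaches a chosen BMR. It is a minimal Python illustration under stated assumptions: the model is a Weibull curve with hypothetical parameter values, the BMR is expressed as extra risk over background, and the result is the central (maximum likelihood) estimate. Under the definition above, the BMD/C itself would be the lower confidence limit on this dose, which requires a confidence-limit method that is not shown here.

```python
import math


def weibull_prob(dose, g, b, k):
    """Quantal Weibull model: P(d) = g + (1 - g) * (1 - exp(-b * d**k))."""
    return g + (1.0 - g) * (1.0 - math.exp(-b * dose ** k))


def extra_risk(dose, g, b, k):
    """Extra risk over background: (P(d) - P(0)) / (1 - P(0))."""
    p0 = weibull_prob(0.0, g, b, k)
    return (weibull_prob(dose, g, b, k) - p0) / (1.0 - p0)


def bmd_weibull(bmr, b, k):
    """Closed form for the Weibull model: extra risk = 1 - exp(-b d^k),
    so the dose giving extra risk `bmr` is (-ln(1 - bmr) / b) ** (1 / k)."""
    return (-math.log(1.0 - bmr) / b) ** (1.0 / k)


def bmd_bisect(risk_fn, bmr, lo=0.0, hi=100.0, tol=1e-10):
    """Generic bisection for any model whose extra risk rises with dose."""
    if risk_fn(hi) < bmr:
        raise ValueError("BMR is not reached within the dose range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if risk_fn(mid) < bmr:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    g, b, k = 0.05, 0.002, 1.5  # hypothetical fitted parameters
    closed = bmd_weibull(0.10, b, k)
    numeric = bmd_bisect(lambda d: extra_risk(d, g, b, k), 0.10)
    print(f"central BMD estimate at 10% extra risk: {closed:.2f} "
          f"({numeric:.2f} by bisection)")
```

The closed form exists only because the Weibull extra risk can be inverted analytically; the bisection routine covers models where it cannot, and it also shows that the BMD need not coincide with any experimental dose.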

       To help focus the group's efforts on addressing the charge to workshop panel members, Dr.
 Henderson reviewed the purpose and goals of the workshop. She reminded panel members that the
 objective was not to reach consensus on issues, but to identify and elucidate issues relevant to the
 draft guidance document.
PRESENTATIONS

       Prior to discussions by panel members, EPA scientists who were among the co-authors of the
draft guidance document presented information to workshop participants on two Agency-sponsored
initiatives to support the development of guidance on the benchmark dose approach.

Discussion of Simulation Studies
Woodrow Setzer, National Health and Environmental Effects Research Laboratory

Dr. Setzer presented the preliminary results of simulation studies that are being conducted
to determine the usefulness of the limit of detection (LOD) approach for setting the BMR. Dr.
Setzer began his presentation by expressing the opinion that the motivation for adopting the BMD
approach was not principally dissatisfaction with existing approaches. Dr. Setzer indicated that the
assumption is to use BMDs as plug-in replacements for NOAELs, with little or no change in the
structure of uncertainty factors. The recommended BMR of ED05 or ED10 (effective dose) is likely,
however, to result in substantially lower (sometimes higher) RfD/RfCs than the NOAEL approach.
The LOD is a methodology for specifying a BMR in such a way that, hopefully, the overall
conservatism of noncancer risk assessments would be similar to what is obtained when using
NOAELs (though not necessarily for individual dose-response assessments). The definition of LOD
is the magnitude of response just detectable in a two-group design (control and one treatment
group) using a one-sided test with a Type I error of 0.05 and a predetermined power. The draft
guidance document proposes that, in the absence of determining a "biologically significant" response,
the BMR should be set as the LOD of a typical "good" design for the species and endpoint
considered. For example, designs recommended in various testing guidelines would be considered
good designs. The goal of the LOD methodology is to have a well-designed bioassay where the
resulting BMD provides the same level of conservatism as the NOAEL.
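
The LOD definition above can be turned into a small calculation. The sketch below is a minimal Python illustration, not the method used in the simulation studies: it assumes a normal-approximation, one-sided two-proportion z-test for quantal endpoints and a one-sided two-sample z-test with known standard deviation (set from the coefficient of variation) for continuous endpoints. The function names and design values are hypothetical; an exact test would give somewhat different numbers, especially for small groups.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def quantal_power(p0, p1, n, alpha=0.05):
    """Approximate power of a one-sided two-proportion z-test comparing a
    control incidence p0 with a treated incidence p1 (n animals per group)."""
    z_a = STD_NORMAL.inv_cdf(1 - alpha)
    p_bar = (p0 + p1) / 2
    se_null = (2 * p_bar * (1 - p_bar) / n) ** 0.5
    se_alt = ((p0 * (1 - p0) + p1 * (1 - p1)) / n) ** 0.5
    return STD_NORMAL.cdf((p1 - p0 - z_a * se_null) / se_alt)


def quantal_lod(p0, n, power, alpha=0.05, step=1e-4):
    """Smallest increase over background p0 that reaches the target power,
    found by scanning treated incidences on a fine grid."""
    p1 = p0 + step
    while p1 < 1:
        if quantal_power(p0, p1, n, alpha) >= power:
            return p1 - p0
        p1 += step
    return None  # not detectable with this design


def continuous_lod(cv, n, power, alpha=0.05):
    """Detectable shift in the treated-group mean, as a fraction of the control
    mean, assuming a known SD equal to cv times the control mean."""
    z_a = STD_NORMAL.inv_cdf(1 - alpha)
    z_b = STD_NORMAL.inv_cdf(power)
    return (z_a + z_b) * cv * (2 / n) ** 0.5


if __name__ == "__main__":
    # Hypothetical design: 10 animals per group, background 0.05, CV 15 percent.
    for power in (0.10, 0.25, 0.50, 0.75, 0.90):
        q = quantal_lod(0.05, 10, power)
        c = continuous_lod(0.15, 10, power)
        print(f"power {power:.2f}: quantal LOD {q:.3f}, continuous LOD {c:.1%}")
```

Raising the required power raises the LOD, and therefore the BMR, which is why the choice of power governs how conservative the resulting BMD is relative to a NOAEL.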

Dr. Setzer emphasized that the simulation study is a pilot, and that it is currently incomplete.
The results and analyses presented, therefore, must be considered to be preliminary and subject to
update. The goals of the simulation study are to:

• determine whether using an approach based on power simplifies the specification of
a BMR with respect to maintaining the current level of conservatism in the RfD/C;
• estimate the power to be used in the LOD to maintain the current level of
conservatism; and
• explore the behavior of the BMD relative to NOAELs.

The structure of the simulations includes assembling a collection of quantal and continuous
dose responses over the dose range 0 to 100. In this study, four distinct quantal shapes and four distinct
continuous shapes are used. Each quantal shape is considered in conjunction with a background
incidence of either 0.05 or 0.15. Each continuous dose-response shape is considered in conjunction
with a coefficient of variation of either 15 or 30 percent. This makes a total of eight quantal models
and eight continuous models (a small sample of possible dose responses). Other components of the
simulation structure include:
• Consider experimental designs with either 10 or 20 animals per dose group and
either three or four total doses (i.e., four different designs).
• For each of the 32 model x design combinations for each kind of endpoint (quantal
and continuous), 100 random data sets are generated, using binomial random numbers for quantal
endpoints and lognormal random numbers for continuous endpoints.
• For each quantal data set, fit log-logistic and Weibull models. For each continuous
data set, fit linear, quadratic, and power models. Also, assess the effect of a threshold parameter.
Reject badly fitting models using the chi-squared goodness of fit test and select
among the rest of the models by taking the model with the lowest Akaike
Information Criterion (AIC), a measure of the deviance of the model fit adjusted for
the degrees of freedom.
• Calculate BMRs given sample size and background (quantal) or coefficient of
variation (continuous) for powers of 0.10, 0.25, 0.50, 0.75, and 0.90.
• Calculate the NOAEL using the NOSTASOT (no-statistical-significance-of-trend)
approach.
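
The model-selection step in the list above (discard poorly fitting models, then take the lowest AIC) can be sketched as follows. This is a minimal Python illustration assuming that each candidate model has already been fitted by maximum likelihood; the fit summaries, including log-likelihoods and goodness-of-fit p-values, are hypothetical, and the 0.05 rejection cutoff is an assumed default rather than one stated in the presentation. AIC is computed in its usual form: -2 times the maximized log-likelihood plus 2 times the number of estimated parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FitSummary:
    model: str
    log_likelihood: float  # maximized log-likelihood
    n_params: int          # number of estimated parameters
    gof_p_value: float     # chi-squared goodness-of-fit p-value


def aic(fit: FitSummary) -> float:
    """Akaike Information Criterion: -2 log-likelihood + 2 * parameters."""
    return -2.0 * fit.log_likelihood + 2.0 * fit.n_params


def select_model(fits, gof_alpha: float = 0.05) -> Optional[FitSummary]:
    """Drop models whose goodness-of-fit p-value is below gof_alpha, then
    return the remaining model with the lowest AIC (None if none remain)."""
    adequate = [f for f in fits if f.gof_p_value >= gof_alpha]
    if not adequate:
        return None
    return min(adequate, key=aic)


if __name__ == "__main__":
    # Hypothetical fits of the three continuous models to one data set.
    fits = [
        FitSummary("linear", -60.2, 2, 0.03),
        FitSummary("quadratic", -55.1, 3, 0.35),
        FitSummary("power", -55.4, 3, 0.30),
    ]
    best = select_model(fits)
    print(best.model, round(aic(best), 1))
```

Filtering on goodness of fit before ranking keeps a model with few parameters from winning on AIC when it does not describe the data adequately.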

Dr. Setzer reviewed the preliminary results of the pilot simulation study for quantal data
only. For each simulated quantal data set, the BMD was compared to the NOAEL. The
distribution of the median BMD:NOAEL ratio among the nine dose responses at each power level
for each of the designs showed that the ratio increased with increasing power levels. The results
indicate that the BMD is more stable than the NOAEL for the dose responses studied in the
simulation.
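
Comparing each BMD with a NOAEL requires a rule for identifying the NOAEL in each simulated data set. The sketch below shows one way to implement the NOSTASOT idea for quantal data in Python: test for a dose-related trend using all dose groups, drop the highest dose while the trend remains significant, and report the highest remaining dose as the NOAEL. The choice of the Cochran-Armitage trend test (normal approximation, doses used as scores, one-sided alpha of 0.05), the example data, and the BMD value are assumptions for illustration; the presentation did not specify the trend test.

```python
from statistics import NormalDist


def cochran_armitage_p(doses, n, responders):
    """One-sided p-value of the Cochran-Armitage test for an increasing trend
    in incidence, using the doses themselves as scores."""
    total_n = sum(n)
    p_bar = sum(responders) / total_n
    d_bar = sum(ni * di for ni, di in zip(n, doses)) / total_n
    # Trend statistic: responders weighted by centered dose scores.
    t = sum(ri * (di - d_bar) for ri, di in zip(responders, doses))
    var = p_bar * (1 - p_bar) * sum(ni * (di - d_bar) ** 2 for ni, di in zip(n, doses))
    if var == 0:
        return 1.0  # no variation in response or dose: no evidence of trend
    return 1 - NormalDist().cdf(t / var ** 0.5)


def nostasot_noael(doses, n, responders, alpha=0.05):
    """Highest dose at which the trend test over that dose and all lower doses
    is not significant; None if even the two lowest groups show a trend."""
    for k in range(len(doses), 1, -1):
        if cochran_armitage_p(doses[:k], n[:k], responders[:k]) >= alpha:
            return doses[k - 1]
    return None


if __name__ == "__main__":
    doses = [0, 10, 30, 100]  # hypothetical doses
    n = [10, 10, 10, 10]
    responders = [1, 1, 3, 8]
    noael = nostasot_noael(doses, n, responders)
    bmd = 12.0  # hypothetical BMD for the same data set
    print("NOAEL:", noael, "BMD:NOAEL ratio:", bmd / noael)
```

Because the NOAEL can only take one of the tested doses, it jumps between design points from one simulated data set to the next, whereas the BMD varies continuously.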

Development of Software for Benchmark Dose/Concentration Analysis

Daniel Guth, National Center for Environmental Assessment

Dr. Guth discussed EPA's work on developing software for BMD/C analysis. Because of the
limited choices in commercial software, the inflexibility of available software, and the need for
consistent methods and model outputs, the Risk Assessment Forum technical panel identified the
need for software to accompany the proposed guidance as a priority. The intended audience of the
software is toxicologists, risk assessors, and statisticians.

Dr. Guth described the following software design criteria:

• Freely distributable—Does not require license fees or other software.

• User-friendly—GUI-based and only allows models appropriate to data type.

• Accessible—Windows and Macintosh platforms; able to run on a 486 or better
machine; and includes on-line help with explanations of models and parameters.

• Flexible—Standard versus advanced user modes; data entry (direct or import
spreadsheet); multiple models selectable; various data types allowed; graphical
outputs; and batch operation available.

• Does not set policy—BMR entered by user; parameters are unconstrained; and more
models available than needed, with two exceptions (the method for calculating confidence
intervals and the exclusion of a "threshold" parameter).

The outline for the software capabilities includes:

• Input data—Import ASCII or spreadsheet files; enter data from the screen; modify
data structure (add, change, or delete variables); create a new data set as a subset
of an existing file; and generate random data for simulation.

• Data file management—Sort data; change or add data records; and transform or
compute a variable.

• Data analysis—Select data set (dependent and independent variables); select from
available models; save or execute requested analysis; and calculate BMD/C.

• Models available—Dichotomous data (Probit, Weibull, Logistic, Gamma Multi-Hit,
Quantal Linear, Quantal Quadratic, Quantal Polynomial [multistage]); nested
dichotomous data (Logistic, Rai and van Ryzin, National Center for Toxicological
Research [NCTR]); and continuous data (Linear, Polynomial, Power).

• Advanced mode—Specify parameter values; place constraints on parameter values;
specify model fitting options; and generate simulated data from a specified model and
parameters.

• Output—Parameter estimates; statistical report (goodness-of-fit measures and
diagnostics); and graphical displays (maximum likelihood estimate [MLE], confidence
interval, points).
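
The design criterion that the software "only allows models appropriate to data type," together with the model list above, can be pictured as a lookup from data type to permitted models. The Python sketch below is hypothetical and is not drawn from the EPA software; the class, dictionary, and function names are invented for illustration, and only the model names come from the presentation.

```python
from enum import Enum


class DataType(Enum):
    DICHOTOMOUS = "dichotomous"
    NESTED_DICHOTOMOUS = "nested dichotomous"
    CONTINUOUS = "continuous"


# Model menus as listed in the presentation; the structure is hypothetical.
MODELS_BY_TYPE = {
    DataType.DICHOTOMOUS: (
        "Probit", "Weibull", "Logistic", "Gamma Multi-Hit",
        "Quantal Linear", "Quantal Quadratic", "Quantal Polynomial",
    ),
    DataType.NESTED_DICHOTOMOUS: ("Logistic", "Rai and van Ryzin", "NCTR"),
    DataType.CONTINUOUS: ("Linear", "Polynomial", "Power"),
}


def available_models(data_type: DataType) -> tuple:
    """Models the user may choose from for this kind of data."""
    return MODELS_BY_TYPE[data_type]


def choose_model(data_type: DataType, model: str) -> str:
    """Accept a model choice only if it is offered for the data type."""
    if model not in MODELS_BY_TYPE[data_type]:
        raise ValueError(f"{model!r} is not offered for {data_type.value} data")
    return model


if __name__ == "__main__":
    print(available_models(DataType.CONTINUOUS))
    try:
        choose_model(DataType.CONTINUOUS, "Weibull")
    except ValueError as err:
        print(err)
```

Keeping the menu in one table means the interface can grey out inappropriate models without any per-model checks elsewhere.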
PEER CONSULTATION WORKSHOP


       To involve outside scientific experts in development of the draft guidance document, EPA's
Risk Assessment Forum sponsored a two-day workshop, which was held on September 10-11, 1996, at
the Holiday Inn in Bethesda, Maryland. The meeting gathered 18 experts (see Appendix A for a
list of workshop peer consultants/panel members) with the objectives of describing points of view
about issues outlined in the charge to workshop panel members (Appendix C), identifying and
elucidating other issues, and highlighting areas for further development.

       Prior to the workshop, EPA provided each expert with a copy of the external review draft
Benchmark Dose Technical Guidance Document. EPA asked workshop participants to review these
materials and respond to the following issues:


       •      the appropriate selection of studies and responses for BMD/C analysis;

       •      the use of biological significance or limit of detection for selection of the BMR;

       •      model selection and fitting;

       •      the use of the lower confidence limit as the BMD/C; and

       •      selection of the BMD/C to use as the point of departure for cancer and noncancer
              health effects.

These comments were assembled and sent to all panel members prior to the workshop. See
Appendix D for the workshop panel members' premeeting comments.

SECTION TWO

CHAIRPERSON'S SUMMARY OF THE WORKSHOP

Rogene Henderson, Chair
Inhalation Toxicology Research Institute
Albuquerque, NM

The major purpose of the workshop was to solicit the views of experts on the draft of EPA's
Benchmark Dose Technical Guidance Document. For this preliminary draft, input was sought on key
issues concerning the technicalities involved in use of the BMD/C approach so that EPA can
appropriately revise the guidance. The meeting was attended by the panel members, several
co-authors of the draft document, and public observers (see list of public observers in Appendix E).

The workshop was structured around the premeeting comments solicited from the panel
members. As background, however, two of the document co-authors gave informational
presentations about topics related to the guidance. Then the discussion leaders presented
summaries of the premeeting comments on each of five major issues regarding the calculation of the
BMD/C. This was followed by a general discussion of each issue by the panel. Authors of the
different sections of the document provided clarification of points as required. Observers were given
two opportunities to provide their comments during the meeting.
Informational Presentations

Dr. Woodrow Setzer of EPA presented information about the results of simulation studies
that are underway to determine the usefulness of the LOD method for setting the BMR. The panel
then discussed issues raised in the presentation. Panelists were in general agreement that biological
significance should be the primary factor in setting the BMR and not the LOD. The panel also
supported an approach in which biological significance is the first factor to be considered followed
by a test of statistical significance.

The simulation studies of Dr. Setzer, which were considered to be well done, indicated that
50-percent power yielded results closest to those of the NOAEL. Based on this information, the panel
discussed whether the NOAEL should be considered the "gold standard" for the BMD/C approach
or whether the two should be considered separately. Some panel members contended that there is
no need to change to the BMD/C method if concurrence with the NOAEL is the validity test for
the BMD/C numbers. One might just as well use the NOAEL to start with. Others held that some
comparisons of the BMD/C numbers with earlier NOAEL numbers are necessary to determine if the
new method is in the "ballpark" of numbers that had previously been considered to be protective of
human health. Some panel members strongly disagreed with Dr. Setzer's statement that "The
mandate is to use BMDs as plug-in replacements for NOAELs, with little or no change in the
structure of uncertainty factors."

The second presentation was given by Dr. Daniel Guth of EPA on a software package that
the Agency is developing for calculation of the BMD/C. In the panel discussion that followed, some
members expressed concern that the software might restrict some investigators from developing their
own software. In this context, the panel discussed the merits of prescriptive versus nonprescriptive
approaches to guidance on calculating the BMD/C. The workloads of many people who are doing
such calculations on hundreds of new compounds may prevent them from using anything but a
standardized, prescriptive approach to making the calculations.
Discussions of Major Issues

Each of the five major issues identified in the charge to the panel was discussed at length.
Details of these discussions are summarized in the reports of the individual discussion leaders (see
section 3). Regarding these major issues, the panel members reached consensus on only one point:
Biological significance should be the basis for the choice of a BMR rather than the LOD approach
as proposed in the document. On whether the central estimate of the BMD or the lower 95-percent
confidence value should be used for further calculations of risk, the panel engaged in a lengthy
discussion. A majority felt that the central estimate should be used, for reasons stated in the
premeeting comments. No consensus was reached on this recommendation, however.

In the discussion about the technical points involved in calculating a BMD/C, panel members
were generally in agreement that whatever model was chosen, the model and the data should be
graphed to help in determining goodness of fit. Also, panelists were in general agreement that
background responses should be included in the models; opinions were divided, however, on
assuming a threshold for a model. Many panelists did not consider dichotomizing continuous data
to be the best approach. One panel member described an approach in which continuous data
could be used to calculate a BMD/C without dichotomization, and this approach was well received
by the panel. Another panelist offered the aid of the American Industrial Health Council (AIHC)
in helping EPA with some of these difficult technical issues.
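
Dichotomizing continuous data means converting each animal's measurement into an affected/unaffected call before modeling. The short Python sketch below illustrates the operation and why it discards information: any value above a cutoff counts the same, however far above it lies, and any value just below it counts as unaffected. The cutoff rule (control mean plus two control standard deviations) and the measurements are hypothetical; the report does not specify a rule, and the alternative approach described by the panel member is not shown.

```python
from statistics import mean, stdev


def dichotomize(groups, control_dose=0, k_sd=2.0):
    """Convert continuous measurements to (affected, total) counts per dose.

    An animal is called 'affected' if its value exceeds the control mean
    plus k_sd control standard deviations (a hypothetical cutoff rule).
    """
    control = groups[control_dose]
    cutoff = mean(control) + k_sd * stdev(control)
    counts = {
        dose: (sum(1 for x in values if x > cutoff), len(values))
        for dose, values in groups.items()
    }
    return cutoff, counts


if __name__ == "__main__":
    # Hypothetical organ-weight measurements (g) for three dose groups.
    groups = {
        0: [10.1, 9.8, 10.4, 9.9, 10.0, 10.2],
        50: [10.3, 10.6, 10.9, 10.45, 11.2, 10.48],
        200: [11.0, 11.4, 10.9, 11.8, 11.1, 12.0],
    }
    cutoff, counts = dichotomize(groups)
    print(f"cutoff = {cutoff:.2f}")
    for dose, (affected, total) in counts.items():
        print(dose, affected, "/", total)
```

In the middle group, values of 10.45 and 10.48 fall just under the cutoff and are scored exactly like control animals, which is the kind of information loss the panelists were concerned about.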
General Comments on the Document as a Whole

The following general points summarize the panel's discussion about the document as a whole:
The panelists generally agreed that the guidance needs to be a "stand-alone"
document that can be understood by itself. In the main text, many references were
made to other EPA documents or to the appendices. Panelists suggested that the
document would be easier to use if excerpts from the cited documents were inserted
in the appropriate places and portions of text in the appendices were moved to the
main text.

A panelist suggested that a common nomenclature (rather than separate ED, TD, and
BMD designations) should be adopted for noncancer and cancer endpoints when the BMD/C
approach is used. The panel expressed a general concern that the draft document
represents "statistical overkill" because the statistical approaches were much more
elegant than the relatively coarse data that one often has to work with. A related
concern was that the methods proposed in the document should be transparent. A
panelist suggested that the document should be reviewed by a group of risk managers
to determine if the method is reasonably transparent to the group that must use the
results of the calculations. Also, the value of calculating a range of values rather
than a single point estimate was mentioned.

Panelists discussed use of toxicokinetic data to improve dose estimates and
consideration of patterns of responses as well as single endpoints. Authors of the draft
document pointed out that for any calculation of risk, the best scientific information
available should be used. The issue of using the best data for dose and for response
was not considered unique to the problem of calculating the BMD/C.

The panel expressed general concern about how the Agency might implement the
benchmark analyses in the regulatory arena. How would a benchmark dose be used
in a margin of exposure approach? What default options would be used? What
uncertainty factors would be used? The panel recommended that EPA explicitly
state that the process is an iterative one requiring sound scientific judgment.
Some panelists found the process described in the draft document to be too
prescriptive, while others pointed out that the people conducting risk assessments on
multiple compounds often only have time to follow a prescriptive approach.
General Issues Other Than the Five Main Issues

Additional issues arose during the course of the workshop and were addressed during
sessions on general considerations. One of these was whether the same uncertainty factors should
be used with the BMD/C numbers and with the NOAEL/LOAEL numbers. Because no LOAEL-to-
NOAEL conversion is involved in the benchmark approach, the uncertainty factor of 10 that is
normally used for this conversion was considered inappropriate for the benchmark approach. The
BMD/C, however, is associated with a stated level of response, such as the ED10. Thus, some
participants recommended using a factor of 10 to go from the risk associated with the ED10 to a lower
risk. Other uncertainty factors, such as a factor for animal-to-human extrapolation, for human
variability, and for differences in study duration would apply to the BMD/C as well as to the
NOAEL/LOAEL approach. Some panelists contended that the BMD10 should be considered a
LOAEL, but no consensus was reached on this point.
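
The arithmetic behind this discussion is a division of the point of departure by the product of the applicable uncertainty factors. The Python sketch below uses a hypothetical BMD10 and a hypothetical set of factors: it omits the LOAEL-to-NOAEL factor, as discussed above, and shows the effect of the additional factor of 10 that some participants proposed for moving from the ED10 to a lower risk. The panel did not reach consensus on that factor, so the example is illustrative only.

```python
from math import prod


def reference_dose(pod: float, factors: dict) -> float:
    """Divide a point of departure by the product of uncertainty factors."""
    if any(f <= 0 for f in factors.values()):
        raise ValueError("uncertainty factors must be positive")
    return pod / prod(factors.values())


if __name__ == "__main__":
    bmd10 = 5.0  # hypothetical BMD10, mg/kg-day
    # No LOAEL-to-NOAEL factor: the benchmark approach involves no such conversion.
    factors = {
        "animal-to-human": 10,
        "human variability": 10,
        "study duration": 10,
    }
    print(f"RfD without ED10 factor: {reference_dose(bmd10, factors):g} mg/kg-day")
    # Factor some participants proposed for going from the ED10 to a lower risk.
    factors["ED10 to lower risk"] = 10
    print(f"RfD with ED10 factor: {reference_dose(bmd10, factors):g} mg/kg-day")
```

Dropping the LOAEL-to-NOAEL factor and adding an ED10-to-lower-risk factor of the same size leave the overall divisor unchanged, which is one way to see why the choice between them matters less numerically than conceptually.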

The panel briefly discussed the issue of whether both cancer and noncancer adverse health
effects, as well as both acute and chronic noncancer effects, could be analyzed by the BMD/C
approach. Panelists found no real impediment to using the general approach in all these cases,
although some specifics of the analyses might differ for each type of health effect.

Panel members also discussed the objective in calculating a benchmark dose. Is it for
comparison across endpoints? to match NOAEL values? to retain the same level of conservatism
as a NOAEL? to avoid using NOAELs with high levels of risk?

As had been revealed during Dr. Setzer's presentation, the panel was of two opinions
regarding the attempt to match the BMDs to previously determined NOAELs. Some panelists
contended that such a course is necessary to determine if the level of conservatism of BMDs is
similar to that for the NOAELs, which previously had been thought to protect human health.
Others held that such an exercise is not necessary because the BMDs, which were developed to make
better use of all available scientific data in completing risk assessments, should be more valid than
the NOAELs. The panel did not reach consensus on this issue. The panelists did generally agree,
however, that BMDs should be valuable for comparison across endpoints.

The panel also discussed whether EPA should continue to move forward in developing a
benchmark approach. The general consensus was that the Agency should continue its work in this
area; there was agreement that the analysis of quantal data by this approach is further along than
the analysis of continuous data. In continuous data, one must be concerned with the severity of the
response. For analysis of continuous data, experts in the field of each type of endpoint measured
by such data would need to be gathered to determine what degree of change in an endpoint is
considered to be biologically significant as an adverse health effect. Such decisions in the many
fields of study concerning noncancer endpoints will involve a considerable investment of time and
money by the Agency.

                                  SECTION THREE

                        DISCUSSION TOPIC SUMMARIES

        Selection of Studies and Responses for Benchmark Dose/Concentration Analysis
                             James Olson, Discussion Leader
                        Department of Pharmacology and Toxicology
                          State University of New York at Buffalo
                                      Buffalo, NY
 General Comments
       The Introduction section of the document clearly presents the limitations of the current risk
assessment procedures that utilize LOAELs and NOAELs. While this presents a good background
on this issue, it would be helpful if the beginning of the document also presented a clear definition
of the BMD/C, the perceived benefits of this approach, and a brief discussion of how the EPA plans
to utilize the BMD/C for cancer and noncancer risk assessment. Page 17 may be too far into the
document to present a clear definition of BMD/C.

       The issue of selection of studies and responses is a critical first step in the process of
establishing a BMD/C. The document states that selection of the appropriate studies and endpoints
is discussed in  Appendix A and in various EPA publications (U.S. EPA, 1991a, 1994c, 1995f, 1996a
and b).  The panel members suggest that the clarity of the document would be greatly improved if
the Benchmark Dose Technical Guidance Document could be a stand-alone document. Citing other
documents and references is useful for identifying additional information, but whenever possible the
present  document should contain all necessary key information relevant to developing BMD/C
estimates.  For example, if only high quality, peer reviewed studies are to be considered for
evaluation, this needs to be stated directly in the document. If human studies are given more weight
than animal studies, this also needs to be clearly stated.  It is understood that the document cannot
discuss in detail all aspects of risk assessment that enter into the process of deriving a BMD/C.
However, it would be helpful for the document to acknowledge that it is the intention of the Agency
to address issues, such as exposure assessment, that are key to the process of risk assessment.
Pharmacokinetic considerations, including physiologically based pharmacokinetic (PBPK) models,
tissue dosimetry, body burden, and equivalent human dose, need to be identified in the document
as key issues in the process of selecting studies and endpoints for developing BMD/Cs.
Issues Related to Selection of Studies

The document clearly states that the first step in the process is a complete qualitative review
of the literature to identify and characterize the hazards related to a particular compound or
exposure situation (p. 18, lines 11 and 12). The document goes on to state that "the selection of the
appropriate studies is based on the human exposure situation that is being addressed, the quality of
the studies, and the relevance and reporting adequacy of the endpoints." Again, it needs to be stated
that only high quality, peer reviewed studies will be evaluated. It would also be helpful to include
material from Appendix A in the body of the document. The document states on p. 18, lines 21-23,
that "the process of selecting studies for benchmark analysis is intended to identify those studies for
which modeling is feasible, so that BMD/Cs can be calculated and used in risk assessment." Several
panelists commented that all studies should be evaluated, without consideration as to suitability for
modeling. A number of comments were made regarding the minimum data set for calculating a
BMD/C (see 3 bullets at bottom of p. 19):

• The statement on p. 19, line 19, that "at minimum, the number of dose groups and
subjects should be sufficient to allow determination of a LOAEL," is not clear; the
statement implies that one should not model data sets for which a LOAEL was not
actually observed. This criterion also makes a precise definition of LOAEL essential.
• There was some disagreement with the statement that "with only one responding
group, there is inadequate information" (p. 19, line 23). This statement needs further
justification.
• The existence of only high-level response data (criterion on p. 19, line 25) should not
necessarily preclude modeling.
• Dose-response modeling should be conducted only when there is evidence of the
shape of the dose-response relationship. Evidence suggesting the existence and general
shape of a dose-response relationship allows dose-response models to be fitted to the
data sets under consideration. A trend test on the doses that exhibit a response could be
used to determine whether there is sufficient information about a dose-response
relationship for modeling.
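The trend test suggested in the last bullet could take several forms. What follows is a minimal sketch that assumes the Cochran-Armitage test for an increasing trend in quantal responses. That test is our choice; the panel did not name one. The dose scores and group counts are hypothetical.

```python
import math
from statistics import NormalDist


def cochran_armitage(doses, n, x):
    """One-sided Cochran-Armitage test for an increasing trend in proportions.

    doses: dose score for each group
    n:     number of animals in each group
    x:     number of responding animals in each group
    Returns (z statistic, one-sided p-value).
    """
    total = sum(n)
    p_bar = sum(x) / total  # pooled response rate under "no trend"
    # T weights the excess of observed over expected responders by dose score
    t_stat = sum(d * (xi - ni * p_bar) for d, ni, xi in zip(doses, n, x))
    sum_nd = sum(ni * d for ni, d in zip(n, doses))
    sum_nd2 = sum(ni * d * d for ni, d in zip(n, doses))
    var_t = p_bar * (1 - p_bar) * (sum_nd2 - sum_nd ** 2 / total)
    if var_t == 0:
        return 0.0, 1.0  # no variation in response or dose: no evidence of a trend
    z = t_stat / math.sqrt(var_t)
    return z, 1 - NormalDist().cdf(z)


# Hypothetical study: four dose groups of 10 animals each
z, p = cochran_armitage([0, 1, 2, 3], [10, 10, 10, 10], [0, 1, 3, 6])
print(f"z = {z:.2f}, one-sided p = {p:.4f}")
```

Many acute studies use only five or six animals per group. With groups that small, a test like this may have little power, so a nonsignificant result is weak evidence against modeling.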

There was some concern regarding the requirement that preferred studies should always
contain dose-response data in the range of the BMD/C (see Appendix A in the document). Making it
a requirement that there be an experimental dose that gives a response about equal to the
BMR would be too restrictive. Data used in BMD/C calculations for acute toxicity (noncancer)
studies require some flexibility. These data often come from small groups (five or six animals
per group), so observing responses in the range of the BMR will not be possible for many of
these studies.

The reader should be cautioned regarding the statement on p. 18, lines 21-23, that for some
chemicals, use of a NOAEL from a high-quality study of a relevant, sensitive endpoint is
preferable to a BMD (which may be higher) from a study for which a BMD can be calculated.

Tree analyses may be a useful tool in organizing, presenting, and communicating BMDs for
each of the relevant studies and endpoints. A plausibility distribution could also be used to reflect
the relative likelihood of these calculations being relevant to humans.
Issues Related to Selection of Responses

Selection of the appropriate endpoints for the BMD/C analysis is the next important
consideration. The document (p. 19, line 9) was somewhat vague, stating that the endpoints to
model should focus on endpoints that are relevant or assumed relevant to humans and potentially
the "critical" effect (i.e., the most sensitive). The document indicates that multiple endpoints can
be modeled, but are there sensitive responses (biological vs. toxic) that are not appropriate to
model? It might be helpful to discuss the use of specific endpoints, such as an increase in liver
weight, increase in hepatic cytochrome P450 protein levels, etc., with regard to their suitability for
deriving BMD/C estimates. Perhaps some discussion of biomarkers of exposure/effect would be
helpful in the discussion regarding the selection of appropriate endpoints. The focus on endpoints
that are relevant to humans and the most sensitive effect is where extensive toxicological knowledge
is required on the part of the risk assessor. Considerable discussion and, hopefully, growing
consensus is needed on the identification of relevant endpoints. For example, the BMD approach
may not be well suited for neurotoxicity data sets. Further work is needed to address how data sets,
which are often unique to a specific endpoint, will be evaluated for modeling.

Endpoint selection should be based on the relevance of the endpoints and quality of the
study (good experimental protocol), without regard to the ability to derive BMD/C estimates. The
goodness of fit of the data ("smoothly increasing response") should not be a major factor in endpoint
selection. It will also be necessary to reduce the number of endpoints that need to be considered
in some cases (eliminate redundancy; consider issues such as representativeness and sensitivity).

Have there been any studies or work done to support the claim that having LOAELs
differing by a factor of 10 (p. 19, line 13) will ensure that the "critical" BMD/C will not be missed?
It appears inappropriate to make the statement that all endpoints whose LOAELs are within an
order of magnitude of the lowest LOAEL should be modeled (the critical effect will be selected as
simply the lowest BMD).
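The screening rule questioned above fits in a few lines. The sketch below uses hypothetical endpoint names and dose values (mg/kg-day) that do not come from the document. It shows that, under this rule, the critical effect is simply whichever modeled endpoint has the lowest BMD.

```python
def endpoints_to_model(loaels, factor=10.0):
    """Endpoints whose LOAEL is within `factor` of the lowest LOAEL."""
    lowest = min(loaels.values())
    return sorted(name for name, loael in loaels.items() if loael <= factor * lowest)


def critical_effect(bmds):
    """Under the rule as written, the critical effect is the endpoint with the lowest BMD."""
    return min(bmds, key=bmds.get)


# Hypothetical LOAELs (mg/kg-day) for three endpoints of one chemical
loaels = {"liver weight": 5.0, "fetal weight": 30.0, "kidney lesions": 80.0}
selected = endpoints_to_model(loaels)  # kidney lesions (80 > 10 x 5) is dropped
bmds = {"liver weight": 4.2, "fetal weight": 3.1}  # hypothetical BMDs for selected endpoints
print(selected, critical_effect(bmds))
```

Nothing in the rule guarantees that an endpoint excluded by the tenfold screen would not have had a lower BMD. That gap is the substance of the panel's question.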

The BMD/C should be only one of several tools available to the risk assessor. If the data
on the endpoint from the best quality study are not conducive to model fitting but provide an
adequate point of departure (i.e., an appropriate NOAEL) for extrapolation to derive an acceptable
exposure level, then there is no need to use alternative endpoints to determine a BMD/C.

The EPA should consider emphasizing "severity" of impact (in contrast with sensitivity) as
the principal consideration. It would be ideal if BMD starting points for different compounds could
be selected to be roughly comparable in terms of potential impact on human health. One alternative
would involve scoring observable endpoints in terms of their severity and attempting to select the
critical endpoint in terms of biological significance.
Issues Related to Selection of Studies and Endpoints for Cancer and Noncancer BMD/C Analysis

BMD/C analyses are intended to be used for a wide range of experimental data sets. Each
toxicological discipline has somewhat unique experimental protocols, generating data sets that vary
with regard to route of exposure, magnitude of dose (daily and cumulative), duration of the exposure
and study, type of data generated during and at completion of the study (dichotomous, continuous,
categorical), variability in the data, and potential health significance of the data collected. In
general, the panelists were supportive of attempting to model both cancer and noncancer data with
the goal of deriving BMD/Cs. However, the document needs to more directly address the inherent
differences in these study designs. Little attention was given to the issues of duration of exposure
and cumulative versus daily dose for noncancer endpoints. If possible, the document should attempt
to address these issues that relate to the use of cancer and noncancer endpoints in developing
BMD/Cs. Although separate sections of the document discuss the application of BMD/Cs for
cancer and noncancer risk assessment, it might be useful to have a separate section in the first part
of the document that addresses the special issues of deriving BMD/Cs from noncancer and cancer
studies.
Issues Related to Combining Data Sets

The panel members considered this to be an important issue that requires more clarification.
For example, more guidance is needed as to what constitutes biological and statistical compatibility.
The suggestion was made to include reference to the peer-reviewed publication by Allen et al.¹ One
example of an approach for determining the appropriateness of combining data sets for analysis is
given in Attachment B of Dr. Fowles's premeeting comments (see Appendix E of this report).
¹ Allen, B.C., Strong, P.L., Price, C.J., Hubbard, S.A., and Daston, G.P. 1996. Benchmark Dose
Analysis of Developmental Toxicity in Rats Exposed to Boric Acid. Fundam Appl Toxicol 32:194-204.

        Possible criteria for determining when studies could be combined are listed with the
premeeting comments provided by Dr. Naumann. They are:

        •     statistical evidence that the study attributes are not different (e.g., population
              variance, group mean response at similar dose levels);
        •     similarities in conducting studies (e.g., species, strain, group size, protocol,
              laboratory);
        •     similarities in endpoints and data reporting (e.g., individual values vs. summary
              statistics);
        •     congruence in modeling results between individual and combined data sets (i.e., does
              the combined model yield similar values for goodness-of-fit, MLE, and lower bound
              on dose at the BMR level); and
        •     ability to clearly state the rationale for combining studies.
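The "statistical evidence" and "congruence" criteria above can be made concrete with a likelihood ratio test. The test asks whether separate dose-response fits describe two studies significantly better than one combined fit. The sketch assumes a two-parameter logistic model and uses made-up data for three hypothetical studies. Neither the model nor the test is prescribed by the draft guidance or by Dr. Naumann's list.

```python
import math


def fit_logistic(doses, n, x, iters=100):
    """Fit P(d) = 1 / (1 + exp(-(a + b*d))) to grouped quantal data by Newton-Raphson.

    Returns (a, b, log-likelihood). The binomial coefficient is omitted from the
    log-likelihood because it cancels in the likelihood ratio.
    """
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for d, ni, xi in zip(doses, n, x):
            p = 1.0 / (1.0 + math.exp(-(a + b * d)))
            r, w = xi - ni * p, ni * p * (1.0 - p)
            g0, g1 = g0 + r, g1 + r * d  # score vector
            h00, h01, h11 = h00 + w, h01 + w * d, h11 + w * d * d  # information matrix
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    ll = 0.0
    for d, ni, xi in zip(doses, n, x):
        p = 1.0 / (1.0 + math.exp(-(a + b * d)))
        ll += xi * math.log(p) + (ni - xi) * math.log(1.0 - p)
    return a, b, ll


def combine_test(study1, study2):
    """Likelihood ratio test: separate fits (4 parameters) vs one combined fit (2).

    Each study is a (doses, n, x) tuple. Returns (LR statistic, p-value, df).
    """
    ll_separate = fit_logistic(*study1)[2] + fit_logistic(*study2)[2]
    pooled = [list(s1) + list(s2) for s1, s2 in zip(study1, study2)]
    ll_combined = fit_logistic(*pooled)[2]
    lr = max(0.0, 2.0 * (ll_separate - ll_combined))
    return lr, math.exp(-lr / 2.0), 2  # chi-square upper tail with 2 df is exp(-x/2)


doses, n = [0, 10, 30, 100], [20, 20, 20, 20]  # hypothetical design
study_a = (doses, n, [1, 3, 7, 15])
study_b = (doses, n, [2, 3, 8, 14])
study_c = (doses, n, [0, 1, 2, 4])
print("A vs B:", combine_test(study_a, study_b))
print("A vs C:", combine_test(study_a, study_c))
```

A small p-value, as for studies A and C here, argues against pooling. A large one, as for A and B, is consistent with pooling but does not by itself satisfy the biological criteria in the list.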

The combining and weighting of different endpoints within studies, as with the boron example (#3)
in Appendix D of the guidance document, should be approached with caution because, despite the
use of expert judgment, it is still subjective. There is a need to avoid any appearance of
manipulating the data and to be able to explain the rationale for weighting endpoints. Transparency
is very important to avoid the "black box" aspect that is inherent to the BMD approach (and
mathematical modeling in general).

                         Selection of the Benchmark Response Level
                            Elaine Faustman, Discussion Leader
                           Department of Environmental Health
                      School of Public Health and Community Medicine
                                 University of Washington
                                       Seattle, WA
       The panelists considered the three approaches (biological significance, limit of detection, and
default options) presented in the draft document for selecting the benchmark response level. The
panelists also considered Dr. Setzer's presentation at the meeting on LOD methods and his simulation
study results in making the following comments.

       The benchmark dose methodology based on biological significance generated the most
enthusiastic response; of the three approaches, it drew the most supportive comments. Many panel
members expressed the view that risk assessment methods require a scientific basis.

       There was also, however, strong support for clarification in the document and a call for more
research for all but developmental toxicity endpoints (some panelists felt more studies were
needed for this endpoint as well). In particular, the panel discussed neurotoxicity experiments where
patterns of effects in the functional observational battery (FOB) tests were more important than
individual responses. In fact, some comments indicated that the biological significance of single
responses would be unknown. No research evaluations were presented at the workshop to indicate
that EPA had applied this methodology to large numbers of neurotoxicity endpoints; however, some
limited examples were cited by panel members. For the biological significance approach, there was
general agreement that further investigation of neurotoxicity endpoints would be desirable before
wide-scale application for regulatory purposes.

       In part, the panel's request for clarification of biological significance could be addressed by
incorporating additional examples  of guidance information from the specific risk assessment
documents on developmental toxicity, reproductive toxicity (draft), neurotoxicity, and general acute
and chronic toxicity. Primarily, the panel sought clarification about the toxicology principles
underlying the response that would be pertinent for NOAEL or benchmark methods.

The need for additional information or illustration of how endpoint-specific guidance from
the referenced guidance documents would be used was very evident when the continuous endpoints
methodology was discussed. The panel spent considerable time discussing the significance of various
highly specific endpoints (e.g., what degree of change in cholinesterase level is considered
biologically significant, what decrease in fetal body weight is considered adverse). These issues are
not unique to the benchmark dose discussion and should be addressed for both the NOAEL and
benchmark approaches. With regard to setting appropriate, biologically defensible levels of change
for continuous endpoints, the panel did discuss various approaches. Several panel members discussed
specific approaches (either in writing or in verbal comments) that they had used with continuous
data. There was general agreement that quantalization of continuous data resulted in loss of
information; however, the panelists had differing opinions about how much loss of information would
occur. Numerous methods for evaluating the differences in response of treated groups versus
control groups were discussed. The need for the document to address in greater detail the body of
literature on these approaches was evident from this discussion. One could imagine the usefulness
of adding tables listing the various approaches, how the approaches have been applied, and what the
authors' comments were concerning the applicability for that endpoint. In general, these approaches
centered on using some comparison of the treated response groups with the distribution of the
control responses. The panel members noted the lack of evaluations specific for neurotoxicity study
designs.
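The approaches described above compare treated responses with the distribution of control responses. The sketch below shows one simple variant. It assumes a cutoff two standard deviations below the control mean and uses hypothetical fetal body weights in grams. It classifies each animal as affected or not, which makes concrete the quantalization, and the loss of information, that the panel discussed.

```python
from statistics import mean, stdev


def fraction_beyond_control(control, groups, k=2.0, direction="low"):
    """Fraction of animals in each group beyond a cutoff set from the control distribution.

    The cutoff is the control mean minus k standard deviations for decreases
    (direction="low") or plus k standard deviations for increases ("high").
    """
    mu, sd = mean(control), stdev(control)
    cutoff = mu - k * sd if direction == "low" else mu + k * sd

    def affected(value):
        return value < cutoff if direction == "low" else value > cutoff

    return cutoff, [sum(affected(v) for v in g) / len(g) for g in groups]


# Hypothetical fetal body weights (g): control and two dose groups
control = [10.0, 11.0, 9.0, 10.0, 10.0]
low_dose = [10.0, 9.5, 10.5, 11.0]
high_dose = [9.0, 8.0, 10.0, 7.0]
cutoff, fractions = fraction_beyond_control(control, [low_dose, high_dose])
print(round(cutoff, 3), fractions)
```

Once an animal is classed as affected or unaffected, how far it falls below the cutoff is discarded. That is one source of the information loss. The choice of k is itself a judgment about biological significance.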

Another issue for benchmark methods and the setting of a response level arose when the
panel discussed the ultimate use (goal) of the benchmark dose methodology. The panel discussed
whether the benchmark methods were being used to develop a common metric across diverse
endpoints of cancer and noncancer effects (holistic view) or were to make comparisons within
compounds across endpoints. The panel discussed the need to identify common points of departure
(responses) for the benchmark methods that might be defined as the same response level in terms
of impact (e.g., equally likely to produce death, equally likely to diminish life quality, equivalent
adversity). Some panel members discussed the need to use severity rather than abnormality as the
common basis for response comparisons. Although time was spent on this topic, no solutions were
identified. The panel recognized that this topic was better left for subsequent workshops and that
such considerations would be pertinent if either NOAEL or BMD approaches were to be used in
cross-endpoint comparisons, such as for cost-benefit analysis. The EPA representatives were asked
directly about this point, and they responded that the Agency would look to the specific disciplines
to define the significance of biological responses within each. The panel raised the possibility of
implementing the guidance document in phases, with use of an iterative process.

In regard to the response level issues, the panel introduced the discussion on how uncertainty
factors would be used. Panelists expressed the need to discuss uncertainty factors as part of the
discussion on response level. Protection of sensitive individuals would still be anticipated to be
accounted for in the use of an uncertainty factor approach.

A later discussion of whether the benchmark response should be viewed as a NOAEL or
a LOAEL is very pertinent to the response level discussion. The pros and cons of that approach are
discussed in that section.

The panelists spent a significant portion of their time at the meeting discussing general issues
that are critical for conducting risk assessment based on good science; however, most of these issues
were broader than the benchmark methodology and were just as important for calculation of
NOAEL values. It was clear from this discussion of science issues for good risk assessment that
most of the panel members were frustrated with the default approaches used in general risk
assessment but were wary of new methodologies that might "rock the boat." The panel contended that
more details are needed in the guidance document on how biological significance would be defined
and applied in benchmark dose methodologies.

Panel members also discussed other areas of needed clarification, such as whether additional
risk or extra risk would be used as the basic response. Additional discussion of potential differences
between cancer and noncancer risk and of the use of these two reference points is needed in the
document, and the definitions of these two terms need to be in the body of the document
rather than hidden in an appendix. The panel found it confusing that the document uses BMD
terminology for noncancer endpoints and ED terminology for cancer endpoints. A common
terminology is needed, and one panel member noted that the ED terminology is more intuitive with
regard to the use of confidence limits.
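For reference, the two risk measures are conventionally defined as follows. These are standard definitions, not text quoted from the draft document. Here P(d) is the probability of response at dose d and P(0) is the background response. The benchmark dose is the dose at which the chosen measure equals the benchmark response (BMR).

```latex
\begin{aligned}
\text{Additional (added) risk:}\quad & A(d) = P(d) - P(0) \\
\text{Extra risk:}\quad & E(d) = \frac{P(d) - P(0)}{1 - P(0)} \\
\text{Benchmark dose:}\quad & A(\mathrm{BMD}) = \mathrm{BMR} \quad \text{or} \quad E(\mathrm{BMD}) = \mathrm{BMR}
\end{aligned}
```

Because 1 - P(0) is at most 1, E(d) is at least A(d) whenever P(d) is at least P(0). For example, with P(0) = 0.2 and P(d) = 0.3, A(d) = 0.10 but E(d) = 0.125. For an increasing dose-response curve, the BMD based on extra risk is therefore never larger than the one based on added risk.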

Further, the document needs additional illustration of the potential use of individual versus
mean responses as well as patterns of response versus individual endpoint responses. Some of this
could be accomplished in an expanded section on selection of critical endpoints. The need for
the benchmark dose methodology to handle patterns of response, as opposed to individual
responses, seems especially important for endpoints such as neurotoxicity. This issue also affects
NOAEL methods. Two panel members reminded the panel not to overlook human epidemiology
data, not only in defining the biological significance of findings in rodent studies but also in defining
how we look at population responses.

The second approach that the panel discussed and commented on was the limit of detection
methods. Most panelists wanted clarification of this approach. The guidance document referred to
the presentation provided by Dr. Setzer at the workshop. Dr. Setzer's presentation illustrated the
low power of many study designs used in toxicology testing to detect effects. A problem that
the panelists had with the LOD approach was illustrated by a quote from Dr. Setzer's presentation:
"The draft guidelines propose that, in the absence of the determination of a 'biologically
significant' response, the BMR be set as the LOD of a typical 'good' design for the species and
endpoint considered. For example, designs recommended by various testing guidelines would be
considered good designs." One panelist noted the similarity between this LOD approach and the
LOD approach used in environmental monitoring. Abuses of the approach in that application
(e.g., remediation undertaken where there was no contamination, because the LOD for
environmental monitoring had been taken as a possible level of contamination and added to
other nondetects) were cited as an example of why this approach can cause problems. The concern
is specific to the BMD methods because they assign a level of response, whereas the NOAEL is
assigned no response. Panel members felt that under the LOD approach, a key advance of the
benchmark methodology, the assignment of a specific response level, would be lost.
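The low power Dr. Setzer described can be illustrated with a rough calculation of the smallest excess response a design can detect. The sketch assumes a one-sided two-sample comparison of proportions (normal approximation), a 5 percent significance level, 80 percent power, and a hypothetical 5 percent background rate. Dr. Setzer's simulations may have defined the LOD differently.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_two_proportions(p0, p1, n, alpha=0.05):
    """Approximate power of a one-sided test that a dosed group (n animals)
    has a higher response rate than a control group (n animals)."""
    z_alpha = Z.inv_cdf(1 - alpha)
    p_bar = (p0 + p1) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n)  # standard error if no effect
    se_alt = math.sqrt((p0 * (1 - p0) + p1 * (1 - p1)) / n)  # standard error under the effect
    return Z.cdf((p1 - p0 - z_alpha * se_null) / se_alt)


def detectable_excess(p0, n, alpha=0.05, power=0.80, step=0.001):
    """Smallest excess response p1 - p0 (on a grid of width `step`) detected with the target power."""
    p1 = p0
    while p1 + step < 1:
        p1 += step
        if power_two_proportions(p0, p1, n, alpha) >= power:
            return p1 - p0
    return None  # not detectable with this design


for group_size in (10, 20, 50):
    excess = detectable_excess(0.05, group_size)
    print(group_size, "animals/group: detectable excess ~", round(excess, 3))
```

Under these assumptions, the detectable excess shrinks only roughly with the square root of group size. That is the sense in which common designs have low power.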

Other panelists presented an alternative view: the LOD approach allows researchers to bound the
response. It would therefore be useful not only in showing researchers how low the power of
currently used biological test procedures is, but also in providing a reference point. Still other
panelists felt that the LOD approach would create disincentives for overcoming limitations in
experimental design: by accepting an LOD approach, we would not be able to overcome poor
experimental design.

Another concern that the panel members raised in regard to the LOD approach is that it would give
researchers the wrong incentive with respect to identifying increasingly sensitive biomarkers of
response. The document does not address how such sensitive, yet not clearly adverse, biomarkers of
effect would be handled. Would an LOD approach also be used with these types of studies?

The panel also discussed default procedures. It was clear from both the written comments
and the verbal comments at the meeting that the panel members were confused about the
application of the default procedures for continuous endpoints. Most panelists felt more
comfortable with default approaches for quantal data than with the proposed LOD approach.
Most of the discussion centered on responses of 5 to 10 percent.

The panel members had several key general comments:

• Several panelists stated that the document represented a statistical overkill and was
in danger of being too prescriptive.

• Some panel members noted that the document was largely silent on use of
biologically based models. There was some support for adding a section to the
document on how the BMD methods might fit logically with progression toward
these models.

• A number of panel members contended that common nomenclature is needed for
use of BMD-like methods across endpoints. Eliminate use of both ED and BMD
terms to convey the same concept and use one term consistently for application in
cancer and noncancer endpoints.

Model Selection and Fitting
Colin Park, Discussion Leader
The Dow Chemical Company
Midland, MI
Is the Order of Model Application for Continuous and Dichotomous Data Appropriate?

The main comment made here is that there is an apparent inconsistency between continuous
and quantal data. For continuous data, it is recommended that a linear model be fit first, whereas
for quantal data, more complex models are recommended (e.g., using a polynomial of degree k-1,
then reducing the number of terms as appropriate). This inconsistency is likely due in part
to historical practices of fitting models for these different types of data. There did not appear to
be any clear consensus on whether harmonization was important and if so, which way to go, although
the subject came up again under the next question.

Apart from this comment, the general consensus appeared to be that the guidelines in this
section were appropriate.
Should Other Models Be Considered, or Should the Number of Models Applied Be More
Restrictive?
Generally, the panelists contended that the set of models should not be made more restrictive. In
fact, some commenters responded that additional models should be allowed; probit and
Michaelis-Menten models were specifically recommended. There was, however, a difference in
philosophy concerning the complexity of modeling.

One general school of thought was that the most simple model that is consistent with the
data should be used; that is, start with a linear model (with a threshold if necessary), then check for
lack of fit and add more parameters as necessary. It was pointed out, however, that lack of fit tests
are quite insensitive. It was recommended by one panelist—with apparent general agreement during
the meeting—that a "soft" criterion be used for the goodness of fit test (e.g., p=0.20).
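The forward strategy with a "soft" goodness-of-fit criterion can be sketched as follows. The sketch assumes a Pearson chi-square test for quantal data and takes the fitted response probabilities as given. The two candidate models and their fitted values are made up for illustration, and the fitting itself is not shown.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df (closed forms)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    total = math.erfc(math.sqrt(y))
    for j in range(df // 2):
        total += math.exp((j + 0.5) * math.log(y) - y - math.lgamma(j + 1.5))
    return total


def pearson_chi2(x, n, probs):
    """Pearson goodness-of-fit statistic for grouped quantal data."""
    return sum((xi - ni * p) ** 2 / (ni * p * (1 - p)) for xi, ni, p in zip(x, n, probs))


def first_adequate_model(x, n, candidates, p_min=0.20):
    """Return the simplest candidate whose goodness-of-fit p-value is at least p_min.

    candidates: (name, number_of_parameters, fitted_probabilities), simplest first.
    """
    for name, k, probs in candidates:
        df = len(x) - k
        if df < 1:
            continue  # no degrees of freedom left to test fit
        p_value = chi2_sf(pearson_chi2(x, n, probs), df)
        if p_value >= p_min:
            return name, p_value
    return None, None


x, n = [1, 3, 7, 15], [20, 20, 20, 20]  # hypothetical responders / group sizes
candidates = [
    ("linear", 2, [0.10, 0.15, 0.25, 0.60]),  # hypothetical fitted probabilities
    ("quadratic", 3, [0.05, 0.14, 0.36, 0.74]),
]
print(first_adequate_model(x, n, candidates))  # soft criterion, p >= 0.20
print(first_adequate_model(x, n, candidates, p_min=0.10))  # a conventional cutoff
```

With these numbers, the linear model fails at p = 0.20 (its p-value is about 0.17) but would be accepted at 0.10. That is the practical effect of a soft criterion: a lack of fit that a conventional cutoff would tolerate leads to adding a parameter.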

Others felt that complex models could be fit, then parameters eliminated as appropriate
(backwards elimination) (e.g., start with a polynomial of degree k-1, then look for the most
parsimonious model). (Personal note: This is the same discussion that went on 20 to 30 years ago
as to whether stepwise forward or stepwise backward regression was the most appropriate). One
concern mentioned in the workshop is that there is something to be said for simplicity, given that
the output will be used by non-statisticians.

Another school of thought approached the question from the point of view that results
should be calculated from a number of models, then the range (or distribution) of the results
displayed or represented as a summary statistic (e.g., calculating the mean and applying
professional judgment).
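Carrying forward results from several models, rather than a single choice, might look like the following sketch. The model names and BMD values are hypothetical. The factor-of-3 check mirrors the draft requirement that the panel questioned later in this section; it is shown for illustration, not as a recommendation.

```python
from statistics import mean


def summarize_bmds(bmds, max_ratio=3.0):
    """Range, mean, and max/min ratio of BMD estimates from several fitted models."""
    values = list(bmds.values())
    low, high = min(values), max(values)
    return {
        "range": (low, high),
        "mean": mean(values),
        "ratio": high / low,
        "within_factor": high / low <= max_ratio,
    }


# Hypothetical BMDs (mg/kg-day) from three models fit to the same data set
summary = summarize_bmds({"linear": 12.0, "quadratic": 18.0, "Weibull": 15.0})
print(summary)
```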
Are the Parameters Proposed as Defaults for Model Structure Appropriate?
a. What should be the default approach for selecting the degree of the polynomial to
use?
A number of commenters responded that the models should be as parsimonious as possible
(see above on alpha levels for goodness of fit). A few thought that the best fit should be the
criterion, although it is not clear if they had considered the impact on confidence limits.

b. Is the default of not including a background parameter appropriate unless there is
some indication of a background response level?

The apparent consensus on this issue was no.

c. Is the use of extra risk as a default for quantal data appropriate?

Most commenters had no opinion or said that they had no strong reason for answering one
way or the other. One commenter did say, however, that the use of extra risk for estimation is
inconsistent with how program offices use risk estimates to calculate population cancer burden and
that added risk should be used instead.

One panelist suggested extra risk be used but said that the recommendation was based on
policy considerations of public health protection (i.e., using extra risk as the estimation procedure
results in higher potency/risk estimates).
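The effect of this choice can be seen with a simple model that has a background term. The sketch assumes a one-hit (quantal linear) model, P(d) = g + (1 - g)(1 - exp(-b d)), with hypothetical parameter values. Solving each risk definition for the dose that gives the BMR shows that extra risk yields the lower BMD, and hence the higher potency, whenever the background g is above zero.

```python
import math


def bmd_extra_risk(b, bmr):
    """Dose at which extra risk [P(d) - g] / (1 - g) equals bmr; for this model, 1 - exp(-b*d)."""
    return -math.log(1.0 - bmr) / b


def bmd_added_risk(g, b, bmr):
    """Dose at which added risk P(d) - g = (1 - g)(1 - exp(-b*d)) equals bmr."""
    if bmr >= 1.0 - g:
        raise ValueError("added risk cannot reach the BMR with this background")
    return -math.log(1.0 - bmr / (1.0 - g)) / b


g, b, bmr = 0.10, 0.05, 0.10  # hypothetical background, slope (per mg/kg-day), and BMR
print(f"extra risk BMD: {bmd_extra_risk(b, bmr):.3f}")
print(f"added risk BMD: {bmd_added_risk(g, b, bmr):.3f}")
```

Here the extra-risk BMD is about 2.11, compared with about 2.36 for added risk. With g = 0 the two coincide.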

d. Is the default of not including a threshold parameter appropriate?

No consensus was reached on this question, although the discussion at the workshop
appeared to provide some support for inclusion of a threshold parameter, particularly in the case of
linear models for continuous data. The comment was made that the estimated (threshold)
parameter had minimal biological interpretation relative to the existence of true thresholds. It was
also pointed out, however, that most of the parameters in the statistical models had little biological
relevance. It was suggested that the correct interpretation of estimates of the threshold parameter
is that it represents an apparent threshold in the observed data, and might be more appropriately
referred to as an intercept.
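To make the "apparent threshold" interpretation concrete, here is a sketch of a linear model with a threshold term, f(d) = b0 + b1 * max(0, d - t), fit by least squares. The grid search over the observed doses as candidate thresholds is an assumption for illustration, and so are the data. The estimated t marks where the observed responses start to change, not a demonstrated biological threshold.

```python
from statistics import mean


def fit_linear_threshold(doses, y, candidate_thresholds):
    """Least-squares fit of y = b0 + b1 * max(0, d - t) over a grid of thresholds t.

    Returns (t, b0, b1, sse) for the candidate with the smallest residual sum of squares.
    """
    best = None
    for t in candidate_thresholds:
        x = [max(0.0, d - t) for d in doses]
        mx, my = mean(x), mean(y)
        sxx = sum((xi - mx) ** 2 for xi in x)
        if sxx == 0:
            continue  # threshold at or above the top dose: slope not estimable
        b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
        b0 = my - b1 * mx
        sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
        if best is None or sse < best[3]:
            best = (t, b0, b1, sse)
    return best


doses = [0, 1, 2, 3, 4, 5]  # hypothetical doses
y = [10, 10, 10, 13, 16, 19]  # hypothetical group mean responses
print(fit_linear_threshold(doses, y, candidate_thresholds=doses))
```

A grid limited to the observed doses is coarse. With real data that change gradually, t is poorly determined, which supports the panel's caution about giving it biological meaning.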

e. Is the default of modeling continuous data as such appropriate?

A large majority agreed that continuous data should not be dichotomized, although it was
pointed out in this session, and in others, that there has been inadequate research into the operating
characteristics of different approaches to calculating benchmark doses from continuous data. For
example, how much sensitivity is lost by dichotomizing the data?

The issue of the need to first determine the biological significance of changes in many
continuous endpoints was raised in this session and continually through the workshop.
Is the Approach for Determining the Fit of the Model Appropriate? Are There Additional or
Alternate Criteria That Should Be Used?
See above on alpha levels.
It was mentioned, even by the statisticians, that more description of the Akaike information
criterion (AIC), and greater familiarity with it, were necessary.
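The AIC penalizes a model's maximized log-likelihood by its number of parameters, AIC = 2k - 2 ln L, and the model with the lower AIC is preferred. The sketch below applies this to grouped quantal data. It assumes binomial likelihoods and uses made-up fitted probabilities for two hypothetical models; in practice those probabilities would come from maximum likelihood fits.

```python
import math


def binomial_loglik(x, n, probs):
    """Log-likelihood of grouped quantal data (x responders of n per group) under fitted probabilities."""
    ll = 0.0
    for xi, ni, p in zip(x, n, probs):
        ll += math.lgamma(ni + 1) - math.lgamma(xi + 1) - math.lgamma(ni - xi + 1)
        ll += xi * math.log(p) + (ni - xi) * math.log(1.0 - p)
    return ll


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * loglik


x, n = [1, 3, 7, 15], [20, 20, 20, 20]  # hypothetical responders / group sizes
aic_linear = aic(binomial_loglik(x, n, [0.10, 0.15, 0.25, 0.60]), 2)
aic_quadratic = aic(binomial_loglik(x, n, [0.05, 0.14, 0.36, 0.74]), 3)
print(f"AIC linear: {aic_linear:.2f}, AIC quadratic: {aic_quadratic:.2f}")
```

Here the quadratic model's better fit more than pays for its extra parameter, giving an AIC about 1.6 lower. Unlike a goodness-of-fit p-value, AIC compares candidate models directly rather than testing each against the data.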

        •     It was mentioned by a number of panelists that software that included a graphical
              presentation of results was a good idea.

        •     There was discussion in this session, and later, on the issue of evaluating results from
              different models. The general feeling was that an arbitrary requirement that results
              from different models be within a factor of 3 was probably not very supportable.
              One suggested alternative was to carry forth all outputs in the form of a range or
              distribution. Another suggestion was that if different models give widely different
              results, this indicates that utilizing the BMD as a point of departure may not be a
              good idea; instead, the traditional NOAEL/LOAEL approach should be used.
              (Personal note: This would be okay for noncancer risk assessment, but what about
              cancer? It appears that EDxs will be more consistent from model to model than
              LEDxs, which is another argument in favor of using central estimates rather than
              upper bounds.)
Additional Comments

A case was made by some participants for keeping the process simpler than is currently
being proposed. It was held that calculating the limit of detection, fitting numerous and somewhat
complex models, and calculating confidence intervals are unnecessary. The idea of a BMD is to
calculate a point of departure that is more data-driven than NOAELs and LOAELs (noncancer)
and that more accurately reflects the limitations of the data than the linearized multistage (LMS)
model (cancer). The complexity being proposed, however, was inconsistent with some of the
objectives of the risk assessment process (e.g., transparency).

It was held that confidence intervals raise the following problems:

        •     From the point of view of non-statisticians, confidence intervals introduce a "black
              box" component into the process.

        •     The interpretation of the regulatory limits after the incorporation of uncertainty
              factors is not clear. For example, is 95 percent of the population protected? Or are
              we 95 percent sure that all of the population is protected? Neither of these
              interpretations is correct, nor are they implied in the methodology, but it is possible
              (likely) that these kinds of interpretations could be made.

        •     Using a lower bound rather than a best estimate adds complexity but has a very
              small effect relative to the magnitude of the uncertainty factors added on in the next
              step.

        •     There is more than one method for calculating the limits, and the methods give
              different answers, thereby adding confusion to the process.
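The last point can be illustrated by computing a lower bound on the dose for a 10-percent extra risk in two ways. This sketch assumes a one-hit model with no background fitted to a single hypothetical dose group. Under that assumption the lower bound on dose follows directly from an upper bound on response at that dose, which is also the link between a lower bound on dose and an upper bound on risk. The Wald and Wilson bounds are two standard constructions chosen here for illustration; they are not methods named at the workshop.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.95)  # one-sided 95 percent critical value


def wald_upper(x: int, n: int) -> float:
    """Asymptotic (Wald) one-sided upper bound on a binomial proportion."""
    p = x / n
    return min(1.0, p + Z95 * math.sqrt(p * (1 - p) / n))


def wilson_upper(x: int, n: int) -> float:
    """Wilson score one-sided upper bound on a binomial proportion."""
    p, z2 = x / n, Z95 * Z95
    centre = p + z2 / (2 * n)
    half = Z95 * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return min(1.0, (centre + half) / (1 + z2 / n))


def dose_for_extra_risk(p_at_dose: float, dose: float, bmr: float = 0.10) -> float:
    """Dose giving extra risk `bmr` on a one-hit curve through (dose, p_at_dose)."""
    if p_at_dose <= 0.0:
        return math.inf
    if p_at_dose >= 1.0:
        return 0.0
    return dose * math.log1p(-bmr) / math.log1p(-p_at_dose)


# Hypothetical single dose group: 5 of 50 animals respond at dose 10.
x, n, d = 5, 50, 10.0
bmd = dose_for_extra_risk(x / n, d)                   # central estimate
bmdl_wald = dose_for_extra_risk(wald_upper(x, n), d)  # upper bound on risk -> lower bound on dose
bmdl_wilson = dose_for_extra_risk(wilson_upper(x, n), d)
print(f"BMD {bmd:.2f}, BMDL (Wald) {bmdl_wald:.2f}, BMDL (Wilson) {bmdl_wilson:.2f}")
```

Both bounds fall below the central estimate, but they differ noticeably from each other (the Wilson-based bound is lower here), even though the data and the model are identical.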

        On the other hand, the use of confidence intervals does reward good experimentation,
although simulation results show the rewards to be small. There was consensus, however, that
LEDs should at least be calculated in conjunction with ED estimates. The question concerned
which one to use if a single value was used as a point of departure. It appeared to be generally felt
that reporting a range or distribution of the EDx would solve this problem. A very few appeared to
favor the LEDx as a single reported value.

       The suggestion was again made in a later session that the point of departure be calculated
as a range or distribution, reflecting the statistical variability of the EDx. There was minimal
discussion of this suggestion, with no apparent strong objections.

       There was discussion throughout the workshop on the need to further validate the
methodology, particularly for continuous data.

                                 Use of Confidence Limits
                           Lorenz Rhomberg, Discussion Leader
                             Harvard Center for Risk Analysis
                             Harvard School of Public Health
                                       Boston, MA
       The topic of use of confidence limits was the focus of a number of written premeeting
 comments as well as of lively discussion during  the workshop.  The charge to panel members
 included three questions:

       •     Should the lower confidence limit on dose be the definition of the BMD/C?
       •     Are the defaults for the method of confidence limit calculation appropriate?
       •     Is the default of the 95-percent confidence limit appropriate?

 These are best considered in reverse order, since the later ones presume answers to the earlier.


 Is the Default of the 95-Percent Confidence Limit Appropriate?

       That is, should we consider other percentiles? Note that the question is about the default;
 specifically, it applies to the BMD/C definition (presuming it is defined as the estimated lower limit
 on the dose producing the BMR).

       Two participants had  noteworthy written comments on this issue, which  they further
 discussed at the meeting.  One suggested "soft" limits (i.e., less than 95 percent) as a way to avoid
 the "linearization" of the confidence interval at lower dose levels.  Such soft limits were discussed
 elsewhere in regard  to the goodness-of-fit determination.   Here, however, they were aimed at
preserving the ability to track curvature in the data's dose-response pattern.  The value of this
consideration was debated, it not being clear to some why linearity of the lower limit with dose in
the lower dose range was to be considered problematic.  The lower bound on dose is required for
one dose only—that producing the BMR—and its behavior at other doses is not really at issue, in
this view.

Another panel member reminded us that the choice of 95 percent represents a tradeoff
between the costs of making the interval's coverage wider than necessary to include the target value
and the cost of missing the target value. The choice of coverage probability therefore has implicit
policy aspects. He also noted that confidence interval construction methods must be designed to
achieve their nominal coverage for all possible sets of parameter values; for particular data sets and
curve shapes, this nominal coverage may be achievable with narrower intervals, an advantage more
easily realized by bootstrapping methods than by other methods of confidence interval construction.
(It should be noted that neither panel member favored use of the confidence interval in the definition
of the BMD/C.)
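The coverage point can be checked exactly in a simple stand-in case. This sketch uses a binomial proportion rather than a BMD, and the true value (p = 0.10, n = 50) is hypothetical. It computes the actual coverage of two one-sided 95-percent upper bounds at that single parameter value: an asymptotic (Wald) bound and the exact Clopper-Pearson bound, which is constructed to reach nominal coverage at every p.

```python
import math
from statistics import NormalDist

ALPHA = 0.05
Z95 = NormalDist().inv_cdf(1 - ALPHA)  # one-sided critical value, about 1.645


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def wald_upper(x: int, n: int) -> float:
    """Asymptotic (Wald) one-sided upper bound on a binomial proportion."""
    p = x / n
    return p + Z95 * math.sqrt(p * (1 - p) / n)


def exact_upper(x: int, n: int) -> float:
    """Clopper-Pearson one-sided upper bound: solves P(X <= x; p) = ALPHA by bisection."""
    if x == n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > ALPHA:  # CDF falls as p rises, so the root lies above mid
            lo = mid
        else:
            hi = mid
    return hi


def coverage(upper_bound, p_true: float, n: int) -> float:
    """Exact probability, over all binomial outcomes, that the bound is at or above p_true."""
    return sum(
        math.comb(n, x) * p_true**x * (1 - p_true) ** (n - x)
        for x in range(n + 1)
        if upper_bound(x, n) >= p_true
    )


wald_cov = coverage(wald_upper, 0.10, 50)
exact_cov = coverage(exact_upper, 0.10, 50)
print(f"Wald: {wald_cov:.3f}  exact: {exact_cov:.3f}  nominal: {1 - ALPHA:.2f}")
```

At this parameter value the Wald bound falls short of 95-percent coverage, while the Clopper-Pearson bound exceeds it. That excess is the cost of guaranteeing coverage everywhere, and it is the slack that a method tuned to the particular data might recover.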
Are the Defaults for the Method of Confidence Limit Calculation Appropriate?

This was the second question in the charge to workshop panel members, and it should be
noted that the direct question is again about defaults. As a default, the present document suggests
reliance on asymptotic methods. The discussion presumed that confidence limits would be calculated
and presented even if the BMD/C definition did not rely on them.

There are several methods for calculating confidence intervals. These are based on different
approaches and invoke different assumptions. Moreover, intervals can be placed on various
aspects—estimated parameters, slopes of lines, the population of instances or the mean—that have
very different meaning and interpretation, distinctions that are sometimes lost in risk communication.
Complex models may have complex methods for confidence interval calculation. Certain models
may have parameters difficult to estimate (or to estimate independently), affecting the size of the
confidence interval.

There were some written comments on this question, but it received little discussion at the
workshop. One panel member preferred likelihood-based methods for confidence intervals, since
maximum likelihood is the preferred curve-fitting method. Another questioned use of asymptotic
methods for the typically small sample sizes of most toxicological studies.

Two panel members asked why bootstrap methods could not be considered. Bootstrapping introduces
fewer problems with model complexity, multiplicity of parameters, and small sample sizes. A
potential difficulty may be that risk managers using such assessments might be disturbed by the lack
of exactly repeatable interval calculations.
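The sketch below shows a minimal parametric bootstrap. It assumes a one-hit model without background, P(d) = 1 - exp(-b d), fitted to hypothetical data; neither the model nor the percentile method comes from the report. Fixing the random seed is one possible answer to the repeatability concern, since the same seed always reproduces the same BMDL.

```python
import math
import random

# Hypothetical quantal data (dose, animals per group, responders). All doses are
# positive because the one-hit model used here has no background term.
DOSES = [10.0, 30.0, 100.0]
NS = [50, 50, 50]
XS = [3, 9, 25]


def fit_one_hit(doses, ns, xs):
    """Maximum likelihood estimate of b in P(d) = 1 - exp(-b * d)."""
    if sum(xs) == 0:
        return 0.0          # no responses: the MLE of b is zero
    if all(x == n for x, n in zip(xs, ns)):
        return math.inf     # every animal responded: no finite MLE

    def score(b):
        # Derivative of the log-likelihood; it decreases monotonically in b.
        return sum(x * d / math.expm1(b * d) - (n - x) * d
                   for d, n, x in zip(doses, ns, xs))

    lo, hi = 1e-12, 700.0 / max(doses)  # hi keeps expm1 from overflowing
    for _ in range(200):
        mid = math.sqrt(lo * hi)        # bisection on a log scale
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def bmd_one_hit(b, bmr=0.10):
    """Dose giving extra risk `bmr`; with no background, extra risk equals P(d)."""
    return math.inf if b == 0 else -math.log1p(-bmr) / b


def bootstrap_bmdl(doses, ns, xs, reps=1000, alpha=0.05, seed=12345):
    """Parametric bootstrap lower bound on the BMD (simple percentile method)."""
    rng = random.Random(seed)           # a fixed seed makes the result repeatable
    probs = [-math.expm1(-fit_one_hit(doses, ns, xs) * d) for d in doses]
    boot = []
    for _ in range(reps):
        sim = [sum(rng.random() < p for _ in range(n)) for p, n in zip(probs, ns)]
        boot.append(bmd_one_hit(fit_one_hit(doses, ns, sim)))
    boot.sort()
    return boot[int(alpha * reps)]


bmd = bmd_one_hit(fit_one_hit(DOSES, NS, XS))
bmdl = bootstrap_bmdl(DOSES, NS, XS)
print(f"BMD {bmd:.1f}, bootstrap BMDL {bmdl:.1f}")
```

Changing the seed moves the BMDL slightly. Reporting the seed and the number of replicates along with the result keeps the calculation reproducible.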

One panel member wrote that confidence intervals do not replace a good uncertainty
analysis, which is what is needed to characterize uncertainty. Another noted, however, that
confidence intervals are a natural way to express the degree of uncertainty in the calculation of the
BMD from a set of experimental data.
Should the Lower Confidence Limit on Dose Be the Definition of the BMD/C?

This third question received the bulk of written comment and of discussion during the
workshop. The thrust of the question is whether the BMD/C definition should be based on the
statistical lower (95 percent) bound on the dose/concentration that is estimated to produce the BMR
(as proposed) or on a central estimate of that dose/concentration. Again, we presumed that
confidence intervals would be calculated and presented in any case, even if the BMD/C were to be
based on a central estimate of the dose producing the BMR.

The balance of opinion was for central estimates, although a significant group argued for a
definition of BMD/C based on the lower bound. Those favoring a lower bound-based BMD/C
definition cited reasons that were few and basic, while those arguing for central estimates gave
reasons that were many and varied.

The main argument of those defending the use of the lower bound in BMD/C definition is
that there is uncertainty in estimation of the BMD owing to experimental error in the particular
study used to estimate it. Given the aim of the risk assessment process to identify doses unlikely to
produce adverse effects, we should allow for this potential error to avoid underestimating the
intended BMD. At the workshop, one panel member noted that any choice of a point in the
distribution of estimates—be it an upper or lower bound or a central estimate—implies a particular
choice of weights given to some kinds of errors versus others. No stance is free of such values, and
so the particular stance adopted should give weights appropriate to the aim of the exercise—in this
case, not to underestimate risk. Others argued that central estimates are more appropriate for
making comparative choices and for conveying the most plausible interpretation of the data; any
desire to gauge the probability of underestimation should, in this view, be addressed by a separate
examination of confidence limits or uncertainty analysis, not in the definition of the BMD/C.

A second argument offered in favor of a BMD definition based on lower limits was that use
of a lower bound encourages good experimental design, since good design leads to tighter limits and
thus higher reliable estimates of the BMD. Several participants, in written comments and at the
workshop, raised doubts as to whether the amount of incentive for more powerful experiments was
large enough to be of practical value; they cited their simulations that suggested that practically
foreseeable changes in experimental design had but minor effect on narrowing the confidence
interval. They questioned whether this benefit was worth the shortcomings of a lower limit-based
BMD definition. Another panel member pointed out that most testing is done according to
approved protocols that are evidently felt to be sufficiently powerful given practical constraints, so
that real design flexibility may be limited in any case.

Several commenters noted that using the lower bound would produce BMD/Cs that are
protective of public health. One panel member argued that such use of the lower bound is similar
to the upper bound used in cancer risk assessment, and would be appropriate from the point of view
of the goal of harmonizing cancer and noncancer methods. Workshop discussion noted that
harmonizing on central estimates might be a better alternative, in the view of some.

There were several main arguments (plus a number of ancillary ones) presented in favor of
defining BMD/Cs in terms of central estimates of the dose/concentration producing the BMR.
These were discussed in written comments and during the workshop. It was noted by several
participants that at a 1994 workshop on BMD procedures, the use of lower bounds had already been
debated and the use of central estimates endorsed. Some questioned why this issue kept returning.

       The argument most often cited in one form or another was that a central estimate constitutes
 the single "best" interpretation of the data at hand.  It provides "an unbiased (not intentionally
 biased) starting point for risk assessment;" it constitutes "more precise use of experimental data;" it
 provides "our best understanding of the response and accurately portray[s] this to the risk manager;"
 lower bounds, on the other hand, indicate "where we think the BMDs might be—instead of where
 we think they are." At the workshop, some commenters felt that central estimates could also mislead
 by failing to emphasize the existence of a range of plausible lower dose estimates as causes of the
 BMR.

       A second argument is that the risk assessment process is "already conservative enough" and
 needs no special  accounting for experimental variability.   "There are significant conservative
 assumptions to make up for the animal variability;" "many other health-conservative steps are already
 built into the risk assessment process." Some commenters countered that uncertainty should be dealt
 with wherever it is found. Several discussants pointed out that the amount of uncertainty accounted
 for by the use of the lower bound is trivial compared to that acknowledged in the application of the
 several ten-fold uncertainty factors used in determining RfD/Cs. In this view, the slight numerical
 adjustment afforded by  the lower bound implies an unwarranted degree of precision in the risk
 assessment process.  Some workshop discussion questioned whether the amount of conservatism
 contributed by the lower bound on an effective dose was worth the "baggage" of accusations of
 overblown conservatism that would likely accompany its use.

       Third, it was argued that use of lower bound-based BMDs will hamper comparison among
 experiments and endpoints that differ in sample size and hence in the width of the confidence
 interval on dose producing the BMR.  A lower bound "confounds the evaluation of relative
 locations;" a central estimate, on the other hand, "facilitate[s] the comparison of critical effects for
 different endpoints." One panel member provided a hypothetical example showing how endpoints
 determined with poor dose-response resolution would often be chosen as critical effects if lower
bound-based BMDs were used for comparison, even if these endpoints did not appear critical in
terms of central estimates.   The comparability issue received considerable discussion at the
workshop.  It was  pointed out that the  desired consistency of central estimates would only be
achieved if a single, risk-based definition  of BMR were adhered to, and not if the proposed
definition of BMR on limit of detection were employed. It was widely agreed that comparisons
among experiments and endpoints, including the choice of critical effect, might be better made on
the basis of central estimates of BMD. Those favoring use of lower bounds, however, suggested that,
once critical effects were chosen, a BMD definition based on a lower bound would still be possible
and appropriate. In addition, a panel member cautioned that confidence intervals should also be
examined during comparison among studies so as to help gauge the probability that rankings and
comparisons are robust to the uncertain values of the central estimates.
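The panel member's hypothetical example can be sketched with two invented endpoints. One has a narrow confidence interval; the other has poor dose-response resolution and therefore a wide one. All numbers are hypothetical.

```python
# Hypothetical endpoints: (central BMD estimate, 95% lower bound BMDL), arbitrary dose units.
ENDPOINTS = {
    "endpoint A (well resolved)": (12.0, 10.0),
    "endpoint B (poorly resolved)": (30.0, 6.0),
}


def critical_effect(endpoints, use_lower_bound):
    """Endpoint with the smallest BMD, ranked either by BMDL or by the central estimate."""
    index = 1 if use_lower_bound else 0
    return min(endpoints, key=lambda name: endpoints[name][index])


for name, (bmd, bmdl) in ENDPOINTS.items():
    print(f"{name}: BMD {bmd}, BMDL {bmdl}, BMD/BMDL {bmd / bmdl:.1f}")
print("critical by central estimate:", critical_effect(ENDPOINTS, use_lower_bound=False))
print("critical by lower bound:     ", critical_effect(ENDPOINTS, use_lower_bound=True))
```

Ranking by BMDL selects the poorly resolved endpoint, even though its central estimate is more than twice as high as that of the well-resolved endpoint.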

A fourth argument cited in favor of central estimates is that confidence limits are widely
misinterpreted by users of risk assessments. Their introduction, therefore, hampers risk
communication. One panel member argued that central estimates were much simpler to grasp and
allowed unsophisticated users to conduct and interpret assessments without delving into statistical
arcana. Several commenters related accounts of misapprehensions among users regarding the nature
and meaning of confidence limits. A 95-percent bound may be believed to refer to 95 percent of
the population being free of the effect, or the confidence limit may be interpreted as addressing all
of the uncertainties in the risk assessment, not just the experimental error in the single critical study.
The connection of the lower bound on dose to produce a risk with the upper bound on risk at a
given dose has confused many users. Some commenters argued, however, that the potential for
misinterpretation is a risk communication challenge to be faced, not grounds for omitting valuable
analysis. The development of EPA software to conduct BMD/C analysis, if well documented and
explained, may obviate some of these concerns.

A fifth issue that was raised is that of the stability and robustness of the confidence interval
calculation in the face of variation in methods, models, and data sets. Several commenters said that
choice of mathematical model to fit to data had more influence on the estimated value of the lower
bound on dose than on the central estimate. There are several alternative statistical approaches to
calculating confidence limits (raising the issue named in the second question of the charge to
workshop panel members regarding preference among them); confidence limits are somewhat
dependent on which is chosen. One panelist noted that in the context of a given model, the
instability of maximum likelihood estimates, an issue for low dose extrapolation, was not a serious
concern for estimation in the range of the BMR. There was some concern expressed that reliance
on lower bounds might constrain the choice of models and parameterizations to those with well-
behaved, relatively narrow confidence limits. For example, the statistical difficulty of reliably
 estimating a threshold parameter may discourage consideration of modeling that might produce
 useful insight. (A panel member noted that the difficulty in estimating a value for a threshold
 parameter may be largely obviated by constraining the degree of high-dose extreme nonlinearity
 allowed.) Another panelist noted that confidence bounds are less sensitive to trend than central
 estimates. A panel member expressed concern that models with many parameters would, owing to
 their few degrees of freedom, have particularly wide confidence intervals. In response to the above
 issues, proponents of confidence intervals in BMD definition defended the value of addressing
 experimental uncertainty; many of the  issues could be addressed by appropriate specification of
 default procedures.

       A panelist noted that the NOAEL has no confidence interval or explicit allowance for
 uncertainty in its determination. In a sense, its uncertainty has been addressed in the form of the
 uncertainty factors, an element that now is being made more explicit analytically.

       Another panel member noted that central estimates of the BMD are usually in the
 experimental range, while lower limits on these estimates may not be.

       As discussed earlier under the choice of the 95-percent limit, one panel member argued that,
 since confidence limits become linear with low doses, use of a lower bound in the BMD definition
 amounts to adoption of a linear extrapolation assumption essentially the same as that in the LMS
 procedure of cancer low dose extrapolation.

       One panelist noted that central  estimates are appropriate for use in cost-benefit analysis.
 Others noted that current noncancer risk assessment methods do not make explicit risk estimates
 for different dose levels (as cost-benefit analysis would presumably demand) and that the uncertainty
 factors do not produce central estimates, even when a BMD is based on a central estimate.

       In the workshop discussion, it was suggested that the issue of experimental uncertainty in the
BMD determination could be addressed in an explicit uncertainty factor. This would preserve the
advantages of a central estimate, while allowing consideration of uncertainty in its proper realm—risk
management choices in the face of uncertainty.  A panel member pointed out that ED10s were
somewhat like LOAELs and ED05s somewhat like NOAELs (at least in dose magnitude)—the use
 of an uncertainty factor instead of a lower bound-based BMD is thus similar to the traditional
 LOAEL-to-NOAEL uncertainty factor. Several participants questioned the wisdom of using a crude
 and approximate uncertainty factor in place of a well-defined and statistically justified lower bound
 that addresses the specific uncertainty of the given experimental design and response levels.
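The two options can be compared with simple arithmetic. This sketch assumes the usual form of a reference dose, the point of departure divided by the product of the uncertainty factors. The BMD, the BMDL, and the size of the explicit estimation-error factor are all hypothetical; the workshop did not agree on how large such a factor should be.

```python
import math


def reference_dose(point_of_departure: float, *uncertainty_factors: float) -> float:
    """Point of departure divided by the product of the uncertainty factors."""
    return point_of_departure / math.prod(uncertainty_factors)


BMD, BMDL = 15.0, 10.0  # hypothetical central estimate and 95% lower bound

# Two conventional 10-fold factors (animal-to-human and human variability).
rfd_from_bmdl = reference_dose(BMDL, 10, 10)
# Same factors applied to the central estimate, plus a hypothetical explicit factor of 3
# standing in for experimental estimation error.
rfd_from_bmd = reference_dose(BMD, 10, 10, 3)
print(rfd_from_bmdl, rfd_from_bmd, BMD / BMDL)
```

In this example the lower bound shifts the point of departure by a factor of 1.5, which is small beside the combined factor of 100. Several discussants drew the same comparison.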

        There was considerable discussion of the idea that experimental uncertainty could be
 considered as an element separate from the BMD definition. One method for doing this is to make
 the uncertainty an explicit uncertainty factor, as suggested by a panel member. There was discussion
 of how big such a factor might be and how to make it address specifics of the data in particular
 instances. Several panel members stressed using central estimates for comparisons, choices, and
 estimations, and then examining confidence limits as a separate step to gain perspective. Another
 panelist (in his written comments) said that a full quantitative uncertainty analysis is possible and
 preferable to any attempt to fold particular experimental error concerns into any estimation
 procedure such as BMD definition. The workshop participants discussed the possibilities of
 expressing the BMD as a range or distribution, reflecting not just a central estimate or a single lower
 bound, but the full spectrum of tenable possibilities weighted by their relative support. This could
 enter into a distributional approach to other elements of the assessment, including distributions on
 exposure and on the values of the uncertainty factors used in extrapolating animal results to levels
 deemed safe for human exposure.   Some participants, however, doubted whether risk managers
 would welcome diffuse answers to safety questions.
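The following sketches the distributional idea. Purely for illustration, it assumes lognormal uncertainty distributions for the BMD and for the combined uncertainty factor; the medians and spreads are hypothetical. The result is a range of candidate reference doses rather than a single value.

```python
import math
import random
import statistics


def simulate_rfd(n: int = 20000, seed: int = 2024):
    """Monte Carlo distribution of BMD / (combined uncertainty factor).

    Both inputs get hypothetical lognormal distributions; the parameters
    below are illustrative only.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        bmd = rng.lognormvariate(math.log(15.0), 0.3)        # median 15 dose units
        uf_total = rng.lognormvariate(math.log(100.0), 0.8)  # median combined factor 100
        draws.append(bmd / uf_total)
    cuts = statistics.quantiles(draws, n=20)  # 19 cut points: 5th, 10th, ..., 95th percentile
    return cuts[0], statistics.median(draws), cuts[-1]


p5, median, p95 = simulate_rfd()
print(f"5th percentile {p5:.4f}, median {median:.4f}, 95th percentile {p95:.4f}")
```

Whether risk managers would accept such a range in place of a single number is the open question raised at the workshop. The sketch only shows that the calculation itself is straightforward.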
Summary

       After considerable discussion of the issues, there continued to be disagreement among the
workshop participants regarding whether BMD/Cs should be defined as the lower bound or the best
estimate of the level associated with the BMR.  Many workshop participants argued for use of
central estimates. Their arguments noted several properties of lower limit-based BMD definitions
they deemed undesirable, but they focused on the notion that best estimates were most useful for
comparison among endpoints and studies, while considerations of error in BMD estimation should
be considered separately in the risk assessment process (so that risk management choices could take
account of it as decision-makers see fit). There was no consensus on this point, however, there being
a substantial minority of participants arguing that a lower bound-based BMD definition was
appropriate given the purpose of its estimation: to help define a dose unlikely to cause human
toxicity.  All agreed that some consideration of the uncertainty in estimation of BMDs owing to
experimental error had to enter into the risk  assessment process in some way, and that central
estimates and lower bounds should always both be reported.

       If estimation error is to be considered separately, there was disagreement as to whether it
is best to do so in a separate uncertainty factor, with a full quantitative uncertainty analysis including
a distribution on the estimated BMD, or simply as a reported lower bound. The question is hard
to separate from that of how other sources of uncertainty in the risk analysis are to be handled. The
central issue is whether uncertainties associated with each element or step of the analysis are to be
somehow incorporated into the results reported for that step (as with a lower bound-based BMD
definition), or whether a separate exercise examines the uncertainties of all steps comprehensively.

          Selection of Benchmark Dose/Concentration to Use as the Point of Departure

                              William Pease, Discussion Leader
                                Environmental Defense Fund
                                       Oakland, CA

       The panel members supported EPA's general approach of establishing a series of decision
 points with defaults as a way of selecting a BMD/C from various model predictions to serve as the
 basis for deriving regulatory standards.  Panel members criticized several technical aspects of the
 Agency's default approach, however, and raised several general issues regarding the attributes that
 a "point of departure" for low dose risk assessment might exhibit. The following sections summarize
 the technical concerns of the panel and then present the range of opinions expressed about various
 attributes of a BMD/C that might be useful in a regulatory risk assessment context.
Comments Requested in Charge to Workshop Panel Members

       a.     The determination of equivalence of methods

       EPA proposes to assess the "equivalence" of different BMD model predictions using
statistical procedures (a goodness-of-fit test), expert judgment (based on visual examination of model
fits to observed data), and an arbitrary default definition of equivalence (if model estimates of the
BMD/C are within a factor of 3).
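The following is a minimal sketch of the factor-of-3 default. It assumes that "within a factor of 3" means the largest and smallest model estimates differ by no more than threefold, which is equivalent to every pair of estimates lying within a factor of 3 of each other. Model names and values are hypothetical. The function also returns the range itself, in keeping with the earlier suggestion to carry all outputs forward as a range.

```python
def equivalence_summary(bmds: dict, factor: float = 3.0):
    """Range of model BMD/C estimates and whether they fall within `factor` of each other."""
    low, high = min(bmds.values()), max(bmds.values())
    ratio = high / low
    return low, high, ratio, ratio <= factor


# Hypothetical BMD/C estimates from three models fitted to the same data set.
fits = {"model A": 12.0, "model B": 9.5, "model C": 5.0}
print(equivalence_summary(fits))                      # max/min = 2.4: treated as equivalent
print(equivalence_summary({**fits, "model D": 3.0}))  # max/min = 4.0: not equivalent
```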

       The  panel was in general  agreement that  this approach required revision and further
explanation. It was noted that goodness-of-fit tests need to be designed so that they evaluate models
in the low dose region of interest (i.e., fit in the area of the ED10) rather than across the entire
observed data range.  There was general support for the use of a visual assessment of model fit.
Most concerns raised regarded selection of a factor of 3 as the default definition of equivalence:
Opinions ranged from a statement that even this size factor could have substantial regulatory impact,
to a request for at least some rationale to support what must admittedly be an arbitrary criterion.

One panel member noted that the Agency should consider adopting different definitions of
equivalence based on whether the selection choice involved different BMD model estimates for the
same endpoint based on a single study's data or different BMD/Cs generated for different endpoints
from multiple studies.

b. Use of the Akaike Information Criterion for comparing the fit of models

For "equivalent" models, EPA proposes to select the final BMD/C based on application of
the Akaike Information Criterion (AIC).

Virtually all panel members requested additional description and references for this proposed
procedure. Statisticians on the panel raised several concerns about the AIC: It does not focus its
evaluation of model fit on the low dose region of interest, and it has generally been applied to select
among models in very data-rich situations (e.g., time-series data). Considerable concern was raised
about the ability of the method to usefully discriminate between model results based on typically
sparse dose-response data sets. Suggested alternatives to such heavy reliance on statistical techniques
to guide the decision process at this point included taking the geometric mean of equivalent model
results or selecting the lowest of the equivalent BMDs as a health protective default.
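For reference, the AIC for a model with k fitted parameters and maximized likelihood L is 2k - 2 ln L, and smaller values are preferred. The sketch below compares selection by AIC with the two alternatives raised by the panel (the geometric mean and the lowest BMD/C), using hypothetical fits. The AIC here reflects fit over the whole data range, which is the limitation the statisticians pointed out.

```python
import statistics


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L); smaller is better."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fits judged "equivalent": (maximized log-likelihood, parameters, BMD/C).
FITS = {
    "model A": (-60.2, 2, 12.0),
    "model B": (-59.8, 3, 9.0),
    "model C": (-61.0, 2, 8.0),
}

by_aic = min(FITS, key=lambda m: aic(FITS[m][0], FITS[m][1]))
bmds = [bmd for _, _, bmd in FITS.values()]
print("lowest AIC:", by_aic, FITS[by_aic][2])
print("geometric mean:", round(statistics.geometric_mean(bmds), 2))
print("lowest BMD/C:", min(bmds))
```

With sparse data the AIC values are close together (here within 2 units). In such cases the choice among the three rules can matter more than the AIC ranking itself.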

c. Is the default approach for selecting the BMD/C to use as the point of departure for
cancer and noncancer dose-response analysis appropriate?

For non-equivalent models that have passed statistical and visual goodness-of-fit testing, EPA
proposes to select the lowest estimated BMD/C as a health protective default.

Panel members acknowledged that some default procedure is required to select among results
when BMD estimates are clearly model dependent. There was considerable discussion about
whether the need to rely on such defaults could be reduced by altering the Agency's current
definition of BMD/C so that it is redefined as a model's maximum likelihood estimate rather than
a 95-percent lower confidence bound. Some members noted that central estimates of BMD/Cs
from various models are much less variable than lower bounds, so that an altered definition of the
BMD could produce fewer instances of non-equivalent results.

        In clear situations of model-dependent BMD/C estimates, most members of the panel
 generally agreed that a default approach of selecting the lowest estimate as a point of departure
 could be justified as a public health policy choice. It was noted that there would generally be no
 biological or statistical rationale available for selecting one model's result over another at this point.
 A panel member opposed to this approach recommended that whenever there is clear model
 dependence in BMD/C estimates, the BMD approach should be dropped and the Agency should
 shift back to using a NOAEL as a point of departure. Another panel member raised the possibility
 of carrying multiple model-dependent BMD estimates forward into the risk management process,
 rather than excluding some possible BMD values through the application of defaults.  Different
 model results (with their associated uncertainties and some estimate of their overall  plausibility)
 would be presented as part of a decision tree to risk managers.
 Comments on General Issues Regarding Selecting a Point of Departure

       The panel was unanimous in expressing concern that EPA had not clearly expressed its goals
 in adopting the BMD approach, and that this had prevented a thorough evaluation of the potential
 attributes that should be exhibited by BMD/Cs.  Through several presentations at the meeting, it
 became apparent that EPA primarily conceived of BMD/Cs as an improved replacement (a "plug-in")
 for NOAELs in noncancer risk assessment that should  generally not  affect the conventional
 application of uncertainty factors to derive reference doses. Panel members raised a number of
 concerns about this narrow conception of the BMD approach and identified three other desirable
 attributes for BMD/Cs  that the Agency should consider as it  proceeds with defining  and
 implementing the BMD approach.  The following sections summarize the discussions of the four
 potential attributes that BMD/Cs could be designed to support.

       a.      BMD/Cs should be a "plug-in" for NOAELs with minimum impact on uncertainty
              factors or  the RfD process

       In its BMD development effort, EPA has been motivated to a considerable degree by the goal of
 developing a new approach to noncancer risk assessment that addresses the widely acknowledged
 problems of NOAELs as a starting point, but that preserves the current "level of conservatism" associated with
 the conventional process. While this motivation is understandable from a political perspective (since
 it will result in minimal revisions to current standards and does not require altering standard
 uncertainty factors), panel members expressed considerable skepticism about this goal.  Members
 could reach no consensus on what the current "level of conservatism" provided by NOAEL-based
 reference doses was: Some argued it was clearly adequate, others that we have little or no empirical
 evidence about the degree of safety provided by most noncancer health standards.

       In the absence of a way of assessing health protectiveness, it appears that this approach of
 "aiming" the BMD to be as close to existing NOAELs as possible (i.e., designing the BMD approach
 to generate a BMD:NOAEL ratio of one over the complete set of compounds assessed) could
 actually lead the Agency to duplicate some of the problems associated with the NOAEL approach.
 For example, EPA proposes to use the LOD method to establish BMD/Cs rather than a fixed
 incidence level. This approach was developed to ensure that the BMD/C estimated from insensitive
 studies (e.g., some neurotoxicity assays) would generally be equivalent to NOAELs that could be
 estimated from such studies. Panel members generally rejected this approach because it rewards the
 existing detection limits of conventional testing protocols (e.g., by setting the BMD/C at a higher
 percentage response for neurotoxicity than for developmental toxicity) and provides no incentive
 to improve detection limits for inadequately assessed endpoints.
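 Why an LOD-based benchmark response tracks study sensitivity can be seen from a rough power
 calculation. The sketch below is an illustration only, not the guidance document's LOD procedure:
 it assumes a quantal endpoint with 5 percent background incidence, a one-sided two-proportion
 z-test at alpha = 0.05 (normal approximation), and a target power of 80 percent. The function
 names `power_two_prop` and `detectable_excess` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_prop(p0: float, p1: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a one-sided two-proportion z-test, n animals per group."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    p_bar = (p0 + p1) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n)           # pooled SE under H0
    se_alt = sqrt((p0 * (1 - p0) + p1 * (1 - p1)) / n)    # unpooled SE under H1
    return NormalDist().cdf(((p1 - p0) - z_alpha * se_null) / se_alt)


def detectable_excess(p0: float, n: int, power: float = 0.80,
                      alpha: float = 0.05, step: float = 0.001):
    """Smallest excess incidence (p1 - p0) detected with the requested power."""
    p1 = p0
    while p1 + step < 1:
        p1 += step
        if power_two_prop(p0, p1, n, alpha) >= power:
            return p1 - p0
    return None  # not detectable at this group size


for n in (20, 50, 200):
    excess = detectable_excess(p0=0.05, n=n)
    print(f"n = {n:3d} per group: detectable excess incidence ~ {excess:.2f}")
```

 Smaller groups can only detect larger excess incidences, so a benchmark response tied to the limit
 of detection rises as studies become less sensitive, which is the incentive problem the panel
 identified.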

       The panel generally agreed that it would be more appropriate for the Agency to conceive of
 BMD/Cs as aiming for a low effect level rather than a conventional NOAEL.  Particularly if the
 Agency redefines the BMD/C to  be  a central estimate of a dose associated  with  a 10-percent
 incidence of biologically significant adverse effects (ED10), it will  simply not be plausible to equate
 this with the "no observed adverse effect level" of conventional  noncancer risk assessment. The
 BMD/C will be, by definition, an effect level where something biologically significant is occurring.
 The panel emphasized, however, that this does not make the BMD/C equivalent to a LOAEL. (In
 conventional noncancer risk assessment, a LOAEL may serve as the point of departure for deriving a
 reference dose when a poor study has failed to identify a NOAEL; its use then requires application
 of an additional 10-fold LOAEL-to-NOAEL uncertainty factor.) In contrast to LOAELs, the ED10 will be
 associated with a defined level of adverse incidence, often lower than that of LOAELs from
 well-designed studies. Because it is an "effect level," however, several members of the panel
 recommended that EPA reexamine its current uncertainty factor practice. The Agency
 should consider whether a new uncertainty factor (to extrapolate to a no risk dose from a low
 effective dose to obtain a point of departure) is required and should also examine other impacts that
 the BMD methodology could have on conventional uncertainty factors.
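 For reference, the ED10 described above can be written in terms of extra risk. The Weibull form
 shown is only one illustrative choice among the quantal models the guidance considers; here P(d)
 is the probability of response at dose d, P(0) is the background probability, and beta > 0 and
 k > 0 are model parameters.

```latex
\[
\mathrm{ER}(d) = \frac{P(d) - P(0)}{1 - P(0)}, \qquad
\mathrm{ED}_{10} = \{\, d : \mathrm{ER}(d) = 0.10 \,\}.
\]
For a Weibull model $P(d) = P(0) + \bigl(1 - P(0)\bigr)\bigl(1 - e^{-\beta d^{k}}\bigr)$,
\[
\mathrm{ER}(d) = 1 - e^{-\beta d^{k}}
\quad\Longrightarrow\quad
\mathrm{ED}_{10} = \left( \frac{-\ln(0.90)}{\beta} \right)^{1/k}.
\]
```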

        The panel noted that the guidance document needs  to explicitly address under what
 conditions the Agency anticipates continuing to conduct noncancer risk assessment with NOAELs
 rather than BMD/Cs. Concerns were raised about the confusion that may arise if the two
 approaches are mixed (e.g., if BMD/Cs are used for quantal endpoints while their application to
 continuous or neurotoxic endpoints is delayed). The document would benefit from a clear statement of the limited conditions
 under which NOAELs will continue to serve as points of departure.

        b.     BMD/Cs should  provide  a consistent and comparable point of departure for
              calculating reference doses or margins of exposure

        As proposed, EPA's BMD/Cs can represent different incidence proportions (from a quantal
 default of 10 percent to as high as 40 to 50 percent for some continuous data with high limits of
 detection) of adverse impacts of widely varying severity.  Risk characterizations for different
 compounds (based on the margin of exposure between current exposure and the BMD) will be more
 comparable if they employ a common level of adverse impact as their point of departure.  This is
 a feature that is (at least rhetorically) possessed by the current NOAEL/uncertainty factor (UF)
 approach: Standard setting begins for all compounds from a level observed to have no adverse
 impacts.  Uncertainty factors are then applied to derive an  RfD that is likely to be below the
 population's threshold for any adverse impact. The hazard indices that are conventionally used to
 conduct noncancer risk assessment (the ratio of current exposure to RfD) are therefore interpreted
 as an exposed population's distance from a "safe" level of exposure. EPA's current BMD approach
complicates this interpretation of hazard indices because it replaces the NOAEL with a starting point
that represents doses associated with different percentage increases in responses of differing severity
for different compounds.   To  maintain the integrity of an exposure/RfD  ratio as  a risk
characterization tool,  it would be ideal if BMD starting points for different compounds could be
selected to be roughly comparable in terms of the potential impact on human health.
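 The comparability problem can be made concrete with a small calculation. The sketch below assumes
 the conventional definitions (RfD = point of departure divided by the product of uncertainty
 factors; hazard quotient = exposure / RfD; margin of exposure = point of departure / exposure).
 The `Assessment` class and both compounds are hypothetical, chosen only to show that identical
 hazard quotients can rest on starting points with very different incidence levels.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Assessment:
    name: str
    pod: float        # point of departure (BMD/C), mg/kg-day
    bmr: float        # incidence level the POD represents (e.g., 0.10)
    ufs: tuple        # uncertainty factors applied to the POD
    exposure: float   # estimated human exposure, mg/kg-day

    def rfd(self) -> float:
        return self.pod / prod(self.ufs)

    def hazard_quotient(self) -> float:
        return self.exposure / self.rfd()

    def margin_of_exposure(self) -> float:
        return self.pod / self.exposure


# Hypothetical compounds: same POD, UFs, and exposure, but different BMRs.
a = Assessment("compound A", pod=10.0, bmr=0.10, ufs=(10, 10), exposure=0.05)
b = Assessment("compound B", pod=10.0, bmr=0.40, ufs=(10, 10), exposure=0.05)

for x in (a, b):
    print(f"{x.name}: BMR={x.bmr:.0%} RfD={x.rfd():.2f} "
          f"HQ={x.hazard_quotient():.2f} MOE={x.margin_of_exposure():.0f}")
```

 Both compounds report the same hazard quotient, yet compound B's starting point already
 corresponds to a 40 percent incidence; a common BMR is what would make the two numbers comparable.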
                                          3-30

-------
        Several panel members emphasized that the BMR (percentage incidence) should be the same
 for most applications to allow for consistent interpretation of derived reference doses or margins of
 exposure.

        There was general agreement that EPA must address "severity" of impact in defining points
 of departure for noncancer risk assessment.  Panel members noted that it would be advisable to
 derive BMD/Cs based on incidence of adverse endpoints of comparable severity (either all BMDs
 could refer to some consistent minimal level of severity, or each BMD could be accompanied by a
 categorical indicator of the severity of the critical effect it is based on).  Either option would involve
 using biological significance as the organizing principle for defining the BMR. Expert
 judgment would be required to classify commonly observed adverse endpoints by severity (using
 different endpoint-specific measures of incidence and adversity to define a common severity scale).
 The panel noted that EPA will need to invest substantial effort in developing consensus definitions
 for biological significance (for many continuous endpoints) and that this effort could be extended
 to develop a common severity scale. Further effort should be devoted to issues raised by attempting
 to define a "common level of adverse impact" as a point of departure for standard setting.

       Another alternative approach to this problem involves addressing the variations in the
 seriousness of impacts observed at the point of departure with an additional uncertainty factor based
 on severity (the guidance does  acknowledge that the nature of the response  should be a
 consideration when evaluating the adequacy of calculated margins of exposure or hazard indices).

       c.      BMD/Cs should provide a suitable basis for low dose extrapolation

       The use of BMD/Cs as a point of departure for low dose extrapolation elicited a wide range
of expert opinions among panel members. Several supported the guidance document's  general
dismissal of this potential use, arguing that extrapolation beyond the range of observation using
curve-fitting models is not credible or appropriate for noncancer endpoints.  Other panel members
noted that limiting the BMD approach to replacing NOAELs in a conventional uncertainty factor-based
margin-of-exposure assessment would forgo exploring a much-needed improvement in
noncancer  risk  assessment: the ability  to  generate  quantitative estimates of the  incidence of
 noncancer effects at various exposure levels.  Such quantitative estimates of low dose risks for
 noncancer endpoints are required if these effects are to be considered in cost-benefit analyses.
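 As one illustration of what a quantitative low-dose estimate could look like, the sketch below
 applies straight-line extrapolation from the point of departure to zero extra risk at zero dose.
 This is a default borrowed from cancer practice, not something the guidance proposes for noncancer
 endpoints, and several panel members would regard such estimates as not credible without a
 biological basis. The function `linear_extra_risk` and the POD value are hypothetical.

```python
def linear_extra_risk(dose: float, pod: float, bmr: float) -> float:
    """Extra risk at `dose`, extrapolated linearly from (pod, bmr) to (0, 0).

    Intended only for doses at or below the point of departure (pod)."""
    if dose < 0 or pod <= 0:
        raise ValueError("dose must be non-negative and pod positive")
    if dose > pod:
        raise ValueError("linear extrapolation applies only below the POD")
    return bmr * dose / pod


# Hypothetical POD: a BMD10 of 5 mg/kg-day.
for exposure in (0.5, 0.05, 0.005):
    risk = linear_extra_risk(exposure, pod=5.0, bmr=0.10)
    print(f"exposure {exposure} mg/kg-day -> extra risk {risk:.0e}")
```

 Per-exposure incidence estimates of this kind are what a cost-benefit analysis would consume; how
 (or whether) they should be derived for noncancer endpoints is the policy question the panel left
 open.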

        Several panel members supported an intermediate position, noting that risk estimation within
 the experimental data range (of applied doses, not only of observed effects) was appropriate if it was
 constrained on an endpoint-specific basis (e.g., estimation down to an ED01 might be appropriate for
 cancer endpoints, given the power of cancer bioassays).  Low dose estimation outside this range
 would only be appropriate if analysts could provide a plausible theoretical or biological basis for use
 of specific models (as in the case of low dose cancer risk estimation).
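 The intermediate position can be checked mechanically: compute the effective dose for the chosen
 BMR from the fitted model and ask whether it lies inside the range of applied doses. The sketch
 assumes a log-logistic model whose extra risk is 1 / (1 + exp(-(a + b ln d))), which has a
 closed-form inverse; the function names, parameter values, and dose groups are hypothetical.

```python
from math import exp, log


def ed_loglogistic(bmr: float, a: float, b: float) -> float:
    """Dose giving extra risk `bmr` under a log-logistic model.

    Extra risk is 1 / (1 + exp(-(a + b*ln d))); solve for d."""
    if not 0 < bmr < 1 or b <= 0:
        raise ValueError("need 0 < bmr < 1 and slope b > 0")
    return exp((log(bmr / (1 - bmr)) - a) / b)


def within_applied_range(ed: float, doses: list) -> bool:
    """True if `ed` lies between the lowest nonzero and the highest applied dose."""
    positive = [d for d in doses if d > 0]
    return min(positive) <= ed <= max(positive)


doses = [0, 10, 30, 100]   # hypothetical design, mg/kg-day
a, b = -7.0, 1.5           # hypothetical fitted parameters

for bmr in (0.10, 0.05, 0.01):
    ed = ed_loglogistic(bmr, a, b)
    print(f"ED{round(bmr * 100):02d} = {ed:.2f} mg/kg-day, "
          f"within applied range: {within_applied_range(ed, doses)}")
```

 With these made-up parameters the ED10 and ED05 fall inside the tested range, while the ED01 falls
 below the lowest nonzero dose and, under the intermediate position, would need a theoretical or
 biological rationale for the model.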

        There was general panel agreement that the issues surrounding use of BMD/Cs as points of
 departure for low dose risk estimation required further discussion in the guidance document: Further
 guidance on risk estimation for exposure situations within the experimentally observed range is
 needed (noting, for example, how such estimation is currently  being applied to the premature
 morbidity and mortality associated with ozone and particulate matter); discussion of other peer-
 reviewed, low  dose risk estimation applications is  warranted; and a clear policy statement on
 potential uses of BMD  approaches for risk estimation is needed.

        d.      BMD/Cs should be derived to provide information useful for modifying uncertainty
               factors or evaluating margins of exposure

        Several panel members noted that the process of estimating a BMD/C provides information
 about the shape of a compound's dose-response curve in the low dose region that is very valuable
 to risk managers. Indications of steep or shallow slopes are helpful in evaluating whether margins of
 exposure equate to margins of protection for public health (e.g., a steep slope in the low dose region
 indicates that risks will decrease quickly in the low dose region, and that  the linear margin of
 exposure provided by standard uncertainty factor applications is likely to be protective). Information
 about the slope of the dose-response curve is also useful for evaluating the potential severity of
 impacts in the low dose region, which can be used  to classify endpoints to establish consistent
 starting points or to establish the appropriate magnitude for any new uncertainty  factor  aimed at
 extrapolating from low effective doses to no risk doses.  Information provided by the confidence
limits on a BMD/C could be used to help establish the appropriate magnitude of the conventional
modifying factor for data quality (if the lower confidence bound is omitted from the definition of
the BMD/C).
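 The point about slopes can be illustrated by dividing the same point of departure by a 10-fold
 factor and comparing the extra risk that remains under a steep and a shallow curve. The sketch
 assumes log-logistic extra-risk curves anchored so that both give 10 percent extra risk at the
 same BMD; the function name, BMD, and slope values are hypothetical.

```python
from math import exp, log


def extra_risk_loglogistic(dose: float, bmd: float, slope: float,
                           bmr: float = 0.10) -> float:
    """Log-logistic extra risk, anchored so that extra risk equals `bmr` at `bmd`."""
    intercept = log(bmr / (1 - bmr)) - slope * log(bmd)
    return 1 / (1 + exp(-(intercept + slope * log(dose))))


bmd = 2.0  # hypothetical BMD10, mg/kg-day
for slope in (4.0, 1.0):  # steep vs shallow
    residual = extra_risk_loglogistic(bmd / 10, bmd, slope)
    print(f"slope {slope}: extra risk at BMD/10 = {residual:.2e}")
```

 Under the steep curve the 10-fold reduction leaves an extra risk of about 1 in 90,000; under the
 shallow curve it leaves about 1 in 90. The same margin of exposure therefore implies very
 different degrees of protection depending on slope.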

       Another panel member noted that considerable important information is compiled during
the benchmark dose process that warrants being presented to decision-makers. BMD/Cs could be
conceived as ranges instead of single points of departure (to reflect the variety of model options as
well as choices between best estimates and confidence limits). While the risk management process
has a limited history with this type of detailed risk assessment  data,  adopting a distributional
approach to BMD/C development could combine with a distributional approach to uncertainty
factors to support more probabilistic risk assessment for noncancer endpoints.
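 A minimal sketch of the distributional idea: treat the point of departure and the composite
 uncertainty factor as uncertain quantities and propagate them by Monte Carlo to a distribution of
 candidate reference doses. The lognormal forms, their parameters, the function name
 `simulate_rfd`, and the percentiles reported are all assumptions made for illustration; the panel
 did not specify any of them.

```python
import random
from math import log
from statistics import median, quantiles


def simulate_rfd(n_draws: int = 10_000, seed: int = 1) -> list:
    """Monte Carlo draws of RfD = BMD / UF under assumed lognormal inputs."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Assumed BMD uncertainty: lognormal with median 5 mg/kg-day.
        bmd = rng.lognormvariate(log(5.0), 0.3)
        # Assumed composite uncertainty factor: lognormal with median 100.
        uf = rng.lognormvariate(log(100.0), 0.5)
        draws.append(bmd / uf)
    return draws


rfd = simulate_rfd()
p5 = quantiles(rfd, n=20)[0]  # 5th percentile
print(f"median RfD ~ {median(rfd):.3f}, 5th percentile ~ {p5:.3f} mg/kg-day")
```

 Reporting a central value alongside a lower percentile is one way to present the range-based
 BMD/C the panel member described, rather than a single point of departure.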
                                         3-33

-------

-------
SECTION FOUR

OBSERVERS' COMMENTS
Observers were given two formal opportunities to provide information and make public
statements during the workshop. Observers were asked to sign up if they intended to make a
statement. The following comments were made by observers at the workshop.

Arnold Kuzmack of EPA's Office of Water provided two comments. First, on the issue of
including the threshold parameter in models, Dr. Kuzmack expressed the opinion that it will be
extremely difficult to communicate to the regulatory and legal communities that the BMD is not a
true estimate of the biological threshold. Second, Dr. Kuzmack concurred with a comment from one
of the panel members about describing sets of comparable endpoints. Dr. Kuzmack suggested that
ways to describe this information need to be developed over time.

Lynne Haber of ICF/Clement, Inc., had one comment regarding the choice of the BMR. Ms.
Haber supported the use of biological information for selecting the BMR and expressed the opinion
that the draft guidance needs more information on how to select and use biological information (e.g.,
a 10-percent change in body weight).

Joseph Siglin of Exxon Biomedical Sciences, Inc., offered an opinion concerning experimental
design. He stated that discussions concerning better experimental designs are useful; however, there
needs to be recognition that experimental designs are often specified in regulatory guidelines (e.g.,
the number of groups, the number of animals per group). Dr. Siglin stated that the number and types
of uncertainty factors applied could be a function of the confidence limits generated by the
dose-response modeling (i.e., 95-percent confidence limits). Dr. Siglin asked the panel members whether
application of the BMD should be restricted to experiments that lack a clear NOAEL. He pointed
out that the BMD approach is not applicable to studies that show no effects at limit dose levels;
however, this issue has not been specifically addressed in the guidance. Dr. Siglin concluded by
supporting the concept of keeping the BMD guidance as simple as possible so that it can be easily
understood and applied by general toxicologists.

Amal Mahfouz of EPA's Office of Water expressed the opinion that the BMD is very useful
for certain endpoints.

Hugh Pettigrew of EPA's Office of Pesticide Programs responded to a comment made
during the panel discussions concerning the significance of selecting a factor of 3 for comparing
different BMD estimates. He noted that 3 is the largest integer that is less than the square root
of 10. EPA is always interested in orders of magnitude, he said; however, estimates that differ by
less than a factor of 3 differ by less than half an order of magnitude. Concerning the use of the
terms "extra risk" versus "additional risk," Dr. Pettigrew pointed out that EPA uses extra risk to
regulate cancer risk and will continue to do so under the new cancer risk assessment guidelines.
Dr. Pettigrew recommended that the authors of the draft guidance document make its terminology
compatible with other Agency guidance. Concerning the lower confidence limit on dose, Dr.
Pettigrew was under the impression that this was a one-sided lower confidence limit on dose or a
one-sided upper confidence limit on risk. He noted that existing software for these calculations
assumes a one-sided confidence limit; therefore, it does not make sense to present central estimates
together with upper and lower confidence limits, because there is a conceptual difference between a
lower one-sided confidence limit and the lower limit of a two-sided confidence interval.
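The quantitative points in Dr. Pettigrew's comments can be checked in a few lines. The sketch uses
the standard definitions of additional risk, P(d) - P(0), and extra risk,
[P(d) - P(0)] / [1 - P(0)]; the response probabilities are hypothetical. It also shows that a
one-sided 95-percent limit uses the same normal quantile as one end of a two-sided 90-percent
interval, not a two-sided 95-percent interval.

```python
from math import floor, log10, sqrt
from statistics import NormalDist

# Factor of 3: largest integer below sqrt(10); log10(3) is just under one half.
print(floor(sqrt(10)), round(log10(3), 3))  # 3 0.477


def additional_risk(p_d: float, p_0: float) -> float:
    return p_d - p_0


def extra_risk(p_d: float, p_0: float) -> float:
    return (p_d - p_0) / (1 - p_0)


# Hypothetical: 20 percent background, 30 percent response at dose d.
print(round(additional_risk(0.30, 0.20), 3), round(extra_risk(0.30, 0.20), 3))  # 0.1 0.125

z = NormalDist()
one_sided_95 = z.inv_cdf(0.95)   # also the quantile for a two-sided 90% interval
two_sided_95 = z.inv_cdf(0.975)
print(round(one_sided_95, 3), round(two_sided_95, 3))  # 1.645 1.96
```

When background response is nonzero, extra risk exceeds additional risk at the same dose, so the
choice of definition shifts the dose identified for a given BMR; this is one reason consistent
terminology across Agency guidance matters.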

William Marcus of EPA's Office of Science and Technology observed that EPA
has been discussing dose-response curves for many years and that this workshop's discussions were
nothing new. He also stated that workshop participants were discussing concepts without
an understanding of what is really being addressed. Dr. Marcus expressed the opinion that cancer
is not always the worst endpoint and that cancer as an endpoint may not differ from any other
toxicological endpoint. For example, lead is both dose related and dose responsive in bioassays,
yet EPA decided not to regulate lead based on cancer because other effects occurring at doses lower
than those that cause cancer (i.e., decreased mental aptitude in children) were more serious.
Dr. Marcus posed the following questions to panel members:

How do you measure decreased mental ability in children?

Do you want to measure enzyme changes that result in nice statistical numbers, but
may not have any biological significance?

How do you decide which endpoint should be measured?

Should you consider the biological significance, the statistical significance, or the
endpoint that might provide you with information that alerts you to an unknown
problem?
                            4-3

-------

-------
         APPENDIX A




PEER CONSULTANTS/PRESENTERS
            A-1

-------

-------
United States
Environmental Protection Agency
Risk Assessment Forum
    Benchmark  Dose  Peer
    Consultation Workshop
    Holiday Inn Bethesda
    Bethesda, MD
    September  10-11, 1996
    Peer Consultants/Presenters
    Bruce Allen
    Project Manager
    K.S. Crump Group
    ICF Kaiser Engineers, Inc.
    P.O. Box 14348
    Research Triangle Park, NC 27709
    919-547-1715
    Fax: 919-547-1710

    George Daston
    Miami Valley Laboratories
    The Procter & Gamble Company
    P.O. Box 398707
    Cincinnati, OH  45239-8707
    513-627-2886
    Fax: 513-627-1908

    Elaine Faustman
    Department of Environmental Health
    School of Public Health and Community Medicine
    University of Washington
    445 Roosevelt Way, NE - Suite 100
    Seattle, WA 98105
    206-685-2269
    Fax: 206-685-4696

    Jeff Fowles
    Office of Environmental Health  Hazard Assessment
    Air Toxicology & Epidemiology Section
    California Environmental Protection Agency
    2151 Berkeley Way - Annex 11
    Berkeley, CA 94704
    510-540-3324
    Fax: 510-540-2923
                          David Gaylor
                          Associate Director for Risk
                          Assessment Policy and Research
                          National Center for Toxicological Research
                          U.S. Food & Drug Administration
                          3900 NCTR Road (HFT-1)
                          Jefferson, AR  72079-9502
                          501-543-7001
                          Fax: 501-543-7576
                          E-mail: dgaylor@nctr.fda.gov

                          Daniel Guth*
                          National Center for Environmental Assessment
                          (MD-52)
                          U.S. Environmental Protection Agency
                          Research Triangle Park, NC 27711
                          919-541-4930
                          Fax: 919-541-0245

                          William Hartley
                          Associate Professor
                          Toxicology and Risk Assessment
                          School of Public Health and Tropical Medicine
                          Tulane University
                          1501 Canal Street
                          New Orleans, LA 70112
                          504-588-5374
                          Fax: 504-584-1726
                          E-mail: hartley@mailhost.tls.tulane.edu
                          *Presenter

-------
 Rogene Henderson (Chair)
 Senior Scientist
 Inhalation Toxicology Research Institute
 Building 9217-Area Y
 KAFB East
 Albuquerque, NM  87115
 505-845-1164
 Fax: 505-845-1198
 E-mail: rhenderson@lucy.tli.org

 Carole Kimmel*
 National Center for Environmental Assessment
 U.S. Environmental Protection Agency
 401 M Street, SW (8623)
 Washington, DC 20460
 202-260-7331
 Fax: 202-260-8719

 Abby Li
 Toxicology Manager
 Monsanto Company, Ceregen
 645 South Newstead Avenue
 St. Louis, MO 63110
 314-694-7933
 Fax: 314-694-7938
 E-mail: aali@monsanto.com

 Rashmi Nair
 Monsanto Company
 800 North Lindbergh Boulevard (A3NF)
 St. Louis, MO 63167
 314-694-8808
 Fax: 314-694-8808
 E-mail: rsnair@ccmail.monsanto.com

 Bruce Naumann
 Principal Toxicologist
 Merck & Company, Inc.
 One Merck Drive (WS2F-45)
Whitehouse Station, NJ 08889-0100
 908-423-7908
 Fax: 908-735-1496
 E-mail: bruce_naumann@merck.com

 James Olson
 Professor
 Department of Pharmacology & Toxicology
 State University of New York at Buffalo
 102 Farber Hall
3435 Main Street
Buffalo, NY  14214-3000
716-829-2319
 Fax: 716-829-2801

 Colin Park
 The Dow Chemical Company
 2030 Building
 Midland, MI 48640
 517-636-1159
 Fax: 517-636-6451

 William Pease
 Environmental Defense Fund
 Rockridge Market Hall
 5655 College Avenue
 Oakland, CA 94618
 510-658-8008
 Fax: 510-658-0630
 E-mail: pease@uclink4.berkeley.edu

 William Perry
 Health Scientist
 Directorate of Health  Standards Program
 Occupational Safety and Health Administration
 200 Constitution Avenue, NW - Room N3718
 Washington, DC 20210
 202-219-7111
 Fax: 202-219-7125

 Christopher Portier
 Acting Chief
 Laboratory of Quantitative
 and Computational Biology
 National Institute for Environmental
 Health Sciences
 P.O. Box 12233 (MD-A306)
 Research Triangle Park, NC 27709
 919-541-4999
 Fax: 919-541-1479
 E-mail: portier@niehs.nih.gov

 Lorenz Rhomberg
 Harvard Center for Risk Analysis
 Harvard School of Public Health
 718 Huntington Avenue
 Boston, MA 02115
 617-432-0095
 Fax: 617-432-0190
 E-mail: rhomberg@hsph.harvard.edu

 Woodrow Setzer*
 Mathematical Statistician
 Biometry Branch
 Research and Administrative Support Division
 National Health and Environmental
 Effects Research Laboratory (MD-55)
 U.S. Environmental Protection Agency
 Research Triangle Park, NC 27711
 919-541-0128
 Fax: 919-541-5394
 E-mail: setzer.woodrow@epamail.epa.gov
                                                  *Presenter

-------
Robert Sielken, Jr.
President
Sielken, Inc.
3833 Texas Avenue - Suite 230
Bryan, TX  77802
409-846-5175
Fax: 409-846-2671
E-mail: sielkeninc@aol.com

Thomas Starr
ENVIRON International Corporation
7500 Rainwater Road
Raleigh, NC  27615-3700
919-876-0203
Fax: 919-876-0201
E-mail: tbstarr@interramp.com

-------

-------
   APPENDIX B




WORKSHOP AGENDA
      B-1

-------

-------
United States
Environmental Protection Agency
Risk Assessment Forum
    Benchmark Dose  Peer

    Consultation  Workshop



    Holiday Inn Bethesda

    Bethesda,  MD

    September 10-11,  1996


    Agenda

    TUESDAY, SEPTEMBER  10

       8:00AM   Registration

       9:00AM   Welcome and Introduction	 Carole Kimmel
                                  National Center for Environmental Assessment (NCEA), EPA,
                                                               Washington, DC

       9:15AM   Workshop Structure and Objectives	Workshop Chair:
                                                            Rogene Henderson,
                                           Inhalation Toxicology Research Institute (ITRI),
                                                              Albuquerque, NM


       9:35AM   Discussion of Simulation Studies 	 Woodrow Setzer, Jr.
                               National Health and Environmental Effects Research Laboratory,
                                                EPA, Research Triangle Park (RTP), NC

       9:55AM   Benchmark Dose Software Development	 Daniel Guth
                                                           NCEA, EPA, RTP, NC

      10:15AM   BREAK

      10:30AM   Selection of Studies and Responses
              for Benchmark Dose/Concentration Analysis	 Discussion Leader:
                                                                James Olson,
                                                       State University of New York,
                                                                  Buffalo, NY

      11:30AM   Selection of the Benchmark Response Level	 Discussion Leader:
                                                              Elaine Faustman,
                                                         University of Washington,
                                                                  Seattle, WA
      12:30PM   LUNCH

              (Continued)
       1:30PM   Selection of the Benchmark Response Level 	 Discussion Leader:
                                                               Elaine Faustman

-------
TUESDAY,  SEPTEMBER  10 (CONT'D)
    2:15PM    Model Selection and Fitting  	  Discussion Leader:
                                                                                  Colin Park,
                                                                  The Dow Chemical Company,
                                                                                 Midland, MI
    3:15PM    BREAK

              (Continued)
    3:30PM    Model Selection and Fitting	  Discussion Leader:
                                                                                  Colin Park

    4:30PM    Observer Comment Period	  Facilitator: Rogene Henderson

    5:00PM    Day One Closing Remarks	Rogene Henderson

    5:15PM    ADJOURN


WEDNESDAY,  SEPTEMBER  11

    8:30AM    Use of Confidence Limits	 Discussion Leader:
                                                                           Lorenz Rhomberg,
                                                               Harvard School of Public Health,
                                                                                 Boston, MA

    9:30AM    Selection of BMD/C to Use
              as the Point of Departure	 Discussion Leader:
                                                                                  Bill Pease
                                                              Environmental Defense Fund and
                                                                       University of California,
                                                                                Oakland, CA
   10:30AM    BREAK

   11:00AM    General Issues	 Discussion Leader:
                                                                          Rogene Henderson

   12:00PM    Observer Comment Period	  Facilitator: Rogene Henderson

   12:30PM    LUNCH

              (Continued)
    1:30PM    General Issues	 Discussion Leader:
                                                                          Rogene Henderson

   2:30PM    Closing Remarks/Chair's Summary	Rogene Henderson

   2:45PM   ADJOURN

-------
            APPENDIX C




CHARGE TO WORKSHOP PANEL MEMBERS
              C-1

-------

-------
United States
Environmental Protection Agency
Risk Assessment Forum
Benchmark Dose Peer
Consultation Workshop
Holiday Inn Bethesda
Bethesda, MD
September 10-11, 1996
CHARGE TO REVIEWERS
Our overall goal in developing this document is to have a procedure that is usable, that has
reasonable criteria and defaults to avoid proliferation of analyses and model shopping, and that
promotes consistency among analyses. Ultimately, we are trying to move cancer and noncancer
assessments closer together, using precursor and mode of action data to extend and inform our
understanding of risk in the range of extrapolation. We would like to have in one package
something that is usable for cancer and noncancer assessments when endpoints are relevant to
both.

Please review the technical points below. As you are preparing your technical comments, we
would also like your advice on how best to achieve our goals as stated above. This should take
the form of further points to be developed in the document or issues that should be clarified.

In your review, please address the following issues and questions on the Benchmark Dose
Technical Guidance Document.

1. Selection of Studies and Responses for Benchmark Dose/C Analysis

a. Is the selection of studies and endpoints for the BMD/C appropriate? for
cancer? for noncancer?
b. Should these be the same for cancer and noncancer data?
c. Are there appropriate criteria for determining when data should be combined
for analysis?

2. Selection of the Benchmark Response Level

a. Is the use of biological significance or limit of detection an appropriate basis
for the selection of the BMR?
b. For the limit of detection, is the approach proposed in the document
appropriate?
c. Is information available to determine the appropriate power level? (Information
on current simulation studies will be presented at the workshop.)
d. Is the default for quantal and continuous data appropriate?

-------
3. Model Selection and Fitting

a. Is the order of model application for continuous and dichotomous data
appropriate?
b. Should other models be considered, or should the number of models applied
be more restrictive?
c. Are the parameters proposed as defaults for model structure appropriate?
i. What should be the default approach for selecting the degree of the
polynomial to use?
ii. Is the default of not including a background parameter appropriate unless
there is some indication of a background response level?
iii. Is the use of extra risk as a default for quantal data appropriate?
iv. Is the default of not including a threshold parameter appropriate?
v. Is the default of modeling continuous data as such appropriate?
d. Is the approach for determining the fit of the model appropriate? Are there
additional or alternate criteria that should be used?

4. Use of Confidence Limits

a. Should the lower confidence limit on dose be the definition of the BMD/C?
b. Are the defaults for the method of confidence limit calculation appropriate?
c. Is the default of 95 percent confidence limit appropriate?

5. Selection of the BMD/C To Use as the Point of Departure for Cancer and Noncancer
Health Effects

a. Comment on the determination of "equivalence" of models.
b. Comment on use of the Akaike Information Criterion for comparing the fit of
models.
c. Is the default approach for selecting the BMD/C to use as the point of
departure for cancer and noncancer dose-response analysis appropriate?

6. General Issues

a. The discussions concerning the use of BMD/C approach in cancer and
noncancer risk assessment.
b. How understandable the document is for the general toxicologist/risk assessor.
c. The overall organization of the document, further points to be developed or
needing clarification.
d. The examples of BMD/C analyses in Appendix D.
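For readers less familiar with charge question 5b, the Akaike Information Criterion is
AIC = 2k - 2 ln(L-hat), where k is the number of estimated parameters and L-hat is the maximized
likelihood; lower values indicate a better balance of fit and parsimony. The sketch below only
scores and ranks models: the maximized log-likelihoods and parameter counts are hypothetical
placeholders standing in for the output of model-fitting software, not results from any real data
set.

```python
def aic(max_log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L-hat)."""
    return 2 * n_params - 2 * max_log_likelihood


# Hypothetical maximized log-likelihoods and parameter counts for three
# models fitted to the same data set (illustrative values, not real fits).
fits = {
    "quantal linear": (-98.4, 2),
    "Weibull": (-96.9, 3),
    "log-logistic": (-96.7, 3),
}

scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
best = min(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:15s} AIC = {score:.1f}  (delta = {score - scores[best]:.1f})")
```

The three AIC values fall within about 1.5 units of one another, the kind of near-tie that charge
questions 5a and 5c ask the panel to address.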

-------
     APPENDIX D




PREMEETING COMMENTS
        D-1

-------

-------
United States
Environmental Protection Agency
Risk Assessment Forum
            Benchmark Dose Peer
            Consultation Workshop

            Holiday Inn Bethesda
            Bethesda, MD
            September 10-11, 1996

            Premeeting Comments

-------

-------
PREMEETING COMMENTS

-------

-------
                                    CONTENTS
Peer Consultants' Comments                                                    Page

Bruce Allen	  1
George Daston	  11
Elaine Faustman	  17
Jeff Fowles	  27
David Gaylor	  41
William Hartley 	  47
Abby Li	  53
Rashmi Nair	  59
Bruce Naumann 	  65
James Olson	  83
Colin Park	  89
William Pease	  95
William Perry	  105
Christopher Portier	  119
Lorenz Rhomberg	  127
Robert Sielken, Jr.	  143
Thomas Starr	  169

-------

-------
Bruce Allen

-------

-------
Bruce Allen
Review Comments
REVIEW COMMENTS ON EPA'S BENCHMARK DOSE
TECHNICAL GUIDANCE DOCUMENT
Bruce C. Allen
ICF Kaiser
The entire process of a risk assessment that potentially involves BMD calculation can be
summarized as follows:
Step 1: Selection of appropriate studies and endpoints for use in the risk assessment.
Step 2: Determination, for each selected endpoint, whether a BMD estimate can and
should be derived. If not, an alternative value may be determined.
Step 3: Calculation of the BMDs desired.
Step 4: Interpretation and use of the BMDs (or alternative values when BMDs have not
been calculated).

The Agency has laid out clearly and succinctly (bottom of p. 11) the reasons why one would want
to move away from complete reliance on NOAELs and LOAELs. In addition, the comment on p.
10 (lines 26-29), to the effect that a NOAEL and LOAEL characterize only one particular study
(and even then, only a relatively small portion of the study) is a very important consideration. The
guidance for application of the BMD approach should be judged in light of how well it appears to
promote BMD analyses that improve the process of risk assessment, i.e., how well it eliminates or
decreases the problems that have been identified with use of NOAELs.

In general, the guidance provides a reasonable and rational way of proceeding with BMD analyses.
There are some particular restrictions and default choices that I would not have imposed, and there
are some areas where more explicit guidance may need to be provided. These are presented and
more fully discussed below, but my overall impression is that significant thought has been given
and care has been taken in the development of the guidance.

My first concern relates to the comments in the introduction (p. 12 line 5) that a BMD/C that is
estimated will always (or should always) be in the observable range. Many situations will arise,
and I believe the guidance does not necessarily rule these out, where a meaningful and useful
BMD/C can be determined that is less than the doses used in the study from which it is derived. In
fact, the boric acid example in Appendix D shows a case (for Study A of that example) where a
BMD was less that the lowest positive dose (which happened to be a LOAEL by traditional
thinking) when fetal weight was considered. Since one point of that example should be that the use
of the BMD/C approach can obviate the need for additional testing when a NOAEL is not
obtained, the statement on p. 12 about the BMD/C being in the observable range appears to be
inappropriate.

Similarly, on p. 17, following the definition of the BMD/C, the "requirement" that "at least one
dose be near the range of the response level for the BMD/C" (lines 17-18) should not be a
requirement at all. Clearly, there will be less uncertainty about the value of the BMD/C when that
is the case (and that reduction in uncertainty will be reflected in tighter confidence limits used to
define the BMD/C), but to make it a requirement that there be an experimental dose that gives a
response about equal to the BMR would be too restrictive. In fact, if using a NOAEL or LOAEL
is the alternative when this requirement is not met, this will lead to less consistency among points
of departure, because in that case we know that there will not be a dose level to choose that will
approximate the response level of interest. The advantage provided by dose-response modeling and
estimation via that modeling of doses that are associated with some predefined level of response is
lost under this scenario.

Statements throughout the document (e.g., p. 17, line 12) that the BMD/C is not dependent on the
doses used in a study should be toned down. Although, generally speaking, the choice of dose
levels should have little effect on the estimation of the dose-response relationship overall, the choice
of the doses will have some effect on the calculation of the bounds. It is precisely because the
bounds reflect uncertainty about the dose level associated with a particular response, and that
uncertainty depends on what response levels have been observed, that the BMD/C approach using
lower bounds on dose is so powerful. Confidence limit calculations provide a natural way for one
to express uncertainty about the value of the parameter (BMD/C) of interest.
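To make the point concrete, the sketch below assumes the simplest possible setting (hypothetical, not taken from the guidance): a one-hit model P(d) = 1 - exp(-b*d) with zero background, a single dose group, and a one-sided Wilson score upper bound on the response probability converted into a lower bound on the BMD. Two designs with the same observed response rate give the same central BMD but different BMDLs, because the bound reflects how much information the design carries.

```python
import math

Z_95_ONE_SIDED = 1.645  # approximately the one-sided 95% standard normal quantile


def one_hit_bmd(p, dose, bmr=0.10):
    """Dose giving extra risk `bmr` under P(d) = 1 - exp(-b*d), with b chosen
    so that the model passes through response probability p at `dose`."""
    b = -math.log(1.0 - p) / dose
    return -math.log(1.0 - bmr) / b


def wilson_upper(x, n, z=Z_95_ONE_SIDED):
    """One-sided upper Wilson score bound on the binomial proportion x/n."""
    phat = x / n
    denom = 1.0 + z * z / n
    centre = phat + z * z / (2.0 * n)
    half = z * math.sqrt(phat * (1.0 - phat) / n + z * z / (4.0 * n * n))
    return (centre + half) / denom


def bmd_and_bmdl(x, n, dose, bmr=0.10):
    """Central BMD and a lower bound (BMDL) from x responders out of n at one dose."""
    if not 0 < x < n:
        raise ValueError("this simple sketch needs 0 < x < n")
    bmd = one_hit_bmd(x / n, dose, bmr)
    # A higher response probability implies a steeper slope b, hence a lower dose.
    bmdl = one_hit_bmd(wilson_upper(x, n), dose, bmr)
    return bmd, bmdl


for x, n in [(5, 20), (25, 100)]:
    bmd, bmdl = bmd_and_bmdl(x, n, dose=10.0)
    print(f"{x}/{n} responders: BMD = {bmd:.2f}, BMDL = {bmdl:.2f}")
```

Both designs observe a 25% response, so the central BMD is identical; the 100-animal group yields a BMDL closer to it than the 20-animal group does.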

With respect to the Data Array Analysis - Endpoint Selection process, more guidance may need to
be given for cases in which a large number of studies or endpoints are considered relevant.
In particular, how would one identify redundancy among endpoints (p. 18, line 29)? How does one
know when one endpoint "represents others for the same target organ" (p. 19, line 4)? Moreover,
it is not clear to me that a "smoothly increasing response" that allows good fit can or should be the
driving factor for endpoint selection at this stage. Dealing with fit problems is an important
consideration, but until the modeling is completed it may be difficult to determine whether good fits
can be obtained. It is not clear that fit is an important consideration at the stage of endpoint
selection.

Have there been any studies or work done to support the claim that having LOAELs differing by a
factor of 10 (p. 19, line 13) will ensure that the "critical" BMD/C will not be missed?

The subheading "1. Selection of Endpoints to be Modeled" can probably be eliminated. What
needs to be emphasized is that the first stage of selection should be based on relevance, good
experimental protocol, etc., without regard to the ability to derive BMD/C estimates. Secondarily,
one wants to reduce the number of endpoints that need to be considered (redundancy,
representativeness, sensitivity), and some of the endpoints chosen then may still not be amenable to
dose-response modeling. Finally, one must determine which of the endpoints that remain can be
modeled (data set requirements) and what to do with those that cannot be modeled but are
considered relevant, representative, and potentially sensitive.

So, the section on minimum data set requirements (p. 19) should, first of all, include material from
Appendix A about the data needs (or at least explicitly reference Appendix A here with a strong
encouragement to look closely at the needs). Then the caveats about when one should not do
dose-response modeling and estimate a BMD/C, even when the data needs are satisfied, can be
provided in this section as well.

However, that being said, I do not think that all of the restrictions or constraints imposed (bottom
of p. 19) are appropriate. I think a better way of characterizing the constraints would simply be to
state that dose-response modeling should be done only when there is evidence of the shape of the
dose-response relationship, if one exists. This would cover all of the really problematic cases that
should be included in the list of constraints, but it leaves open the possibility of (appropriately)
doing BMD/C estimation for some cases that would be excluded as the constraints are now stated.
For example:
The statement of the first constraint (line 19, p. 19) is not very clear. For one thing, does a
biologically but not statistically significant LOAEL count here? In general, what are the criteria
by which a LOAEL is determined to exist (author's statement, pairwise tests, trend tests?). If this
statement implies that one should not model data sets for which a LOAEL was not actually
observed (as opposed to the possibility that the number of doses and subjects could not have given
a LOAEL because of design limitations), then this is equivalent to saying there is no evidence of
dose-response for the data set under consideration. I would tend not to put as much emphasis on
the existence of a LOAEL per se (it is so dependent on sample size, for one thing), but rather on the
evaluation that something biologically and toxicologically "real" is happening. I can think of cases
where a LOAEL may exist but, because a large number of animals were tested, the differences
that are statistically significant are not biologically meaningful and may just reflect differences that
will inevitably exist among finite groups of "observations," even when those groups were drawn
from the same population. On the other hand, the lack of a LOAEL may be caused by a small
number of animals; in some cases statistical significance may be lacking even though there is
evidence of a dose-response relationship. I would explicitly allow
 consideration of other parts of the data base (other studies and/or other related endpoints for the
 chemical under consideration — perhaps ones that are not being considered for modeling because of
 basic data deficiencies but which still carry information about dose-related effects; I would even
 consider related chemicals with similar effects) in order to make a "holistic" appraisal of the
 existence of dose-response relationships.
        The consideration of the presence of dose-response relationships would also rule out
 modeling data sets with only one positive response level (lacking the ancillary evidence described in
 the preceding paragraph).  A single positive response level (even when other doses have been tested
 and have exhibited background-level responses) should not be considered to provide evidence about
 the shape of the dose-response relationship and would therefore not be modeled.
        On the other hand,  the existence of only high levels of response (criterion starting at line 25
 of p.  19) should not necessarily preclude modeling.  There may be many cases in which responses
 are all above 50% but one still gets a clear picture of a dose-response relationship (what about
 cases where background starts out near 50%?). I would consider modeling those — the uncertainty
 concerning the BMD/C corresponding to a lower level of response may be greater than might
 otherwise be the case, but that is covered by the calculation of confidence limits. Later (in the step
 when one interprets and chooses from among various BMD/Cs) this uncertainty may dictate that
 another BMD/C is used for regulatory purposes, but a BMD/C calculated from such data will
 carry information relevant to the final decisions to be made.  If all responses are above 90%,
 however, then less (perhaps next to nothing) might be  said about dose-response shape and such
 cases could be ruled out. The same would hold if there only existed a plateau of continuous
 responses (and no ancillary information).

 The bottom line here is that it is the presence of evidence suggesting the existence and general
 shape of dose-response relationships that allows one to feel comfortable fitting dose-response
 models to the data set(s) under consideration.  When that evidence is present, then the appropriate
modeling can be done.  When it is absent, then the constraints on the modeling are not well-defined
and modeling should not be pursued.

Although this may require some additional elaboration, one might consider using a trend test on the
positive doses (i.e., exclude the controls) as a way to determine if there exists sufficient information
about a dose-response relationship for modeling and BMD/C estimation to be done.
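The comment does not name a particular trend test. As one possibility (an assumption, not something the guidance specifies), the Cochran-Armitage test for trend in proportions could be applied to the dosed groups only. A minimal sketch using the normal approximation, with hypothetical data:

```python
import math
from statistics import NormalDist


def cochran_armitage(doses, responders, totals):
    """Cochran-Armitage trend test (normal approximation).

    Uses the doses themselves as scores and returns (z, one_sided_p) for an
    increasing trend in response probability with dose.
    """
    N = sum(totals)
    pbar = sum(responders) / N
    dbar = sum(n * d for d, n in zip(doses, totals)) / N  # group-size-weighted mean dose
    stat = sum(x * (d - dbar) for d, x in zip(doses, responders))
    var = pbar * (1 - pbar) * sum(n * (d - dbar) ** 2 for d, n in zip(doses, totals))
    z = stat / math.sqrt(var)
    return z, 1 - NormalDist().cdf(z)


# Hypothetical study: a control group plus three dosed groups of 20 animals each.
doses = [0, 10, 30, 100]
responders = [1, 2, 6, 14]
totals = [20, 20, 20, 20]

# Drop the control group (index 0) and test for trend among the dosed groups only.
z, p = cochran_armitage(doses[1:], responders[1:], totals[1:])
print(f"z = {z:.2f}, one-sided p = {p:.2g}")
```

The test is undefined when no animal (or every animal) in the dosed groups responds, since the variance term is then zero.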

There is a definite need to consider data combinations  (p. 20). However, more guidance  may be
required for users to know what constitutes biological  or statistical compatibility. Nevertheless,
the point (lines 5-7) that additional research can affect the BMD/C estimates (including increasing
them, unlike a NOAEL estimate) is an important one that deserves to be emphasized.

Concerning the selection of the BMR level, I have one peripheral comment first (and this relates to
many similar occurrences throughout the document). There is a discussion of biologically
significant changes, with body weight as an example. One needs to be very careful with such
examples, to be explicit about what it is that is being measured and considered to be biologically
significant. In the case of body weight, a 10% change is suggested as a biologically significant
change. Is this 10% change in the mean values of different groups (this interpretation is suggested
on p. 58, line 12) or an individual drop in body weight that is 10% below an average (unexposed)
level? To see what difference this could make, consider a test group that had 50 animals with 25
of them having body weights of 90g and 25 with body weights of 95g, whereas the controls
averaged 100g. As a group the treated animals average 92.5g (only 7.5% below the controls and
so not biologically significant?) whereas, individually, 50% of the animals exhibited a 10%
decrease in body weight. The latter appears to be a serious effect if 10% decrease on an individual
basis is important. The implications for setting BMRs are important also. If biological
significance is on a group average basis, then the BMR should be defined in terms of changes in
the mean value [(m(0) - m(d))/m(0) = 0.10]. On the other hand, one might want to set the BMR in
terms of a relatively low probability (10%) that individual animals will experience a body weight
that is 10% lower than background. In the former case (based on average change), the treated
group in the example would not appear to have reached the BMR level. In the latter case
(individual basis) the group would be considered to have greatly exceeded the BMR response (50%
of the group members had 10% lower body weight!) and the dose for that group would appear to
be well above the BMD (or even the MLE for the selected BMR).
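The arithmetic of the hypothetical body weight example can be reproduced directly. The sketch below is a worked illustration using the numbers and the 10% cut-off from the text; it computes both the group-average measure and the individual-based measure.

```python
from statistics import mean

control_mean = 100.0                 # g, mean of the unexposed controls
treated = [90.0] * 25 + [95.0] * 25  # g, the hypothetical treated group

# Group-average basis: relative change in the mean, (m(0) - m(d)) / m(0).
mean_change = (control_mean - mean(treated)) / control_mean

# Individual basis: fraction of animals at least 10% below the control mean.
frac_individual = sum(
    (control_mean - w) / control_mean >= 0.10 for w in treated
) / len(treated)

print(f"change in group mean: {mean_change:.1%}")
print(f"animals at least 10% below control: {frac_individual:.0%}")
```

The same data fall short of a 10% BMR on the group-average basis yet far exceed it on the individual basis, which is why the definition has to be stated explicitly.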

Secondly, if the BMD is intended to replace a NOAEL, then does this assignment of the BMR to
the biologically significant (LOAEL-like?) response tend to overestimate the NOAEL-like BMD?
The same concern might apply to the BMRs based on detection limits, except that the relatively
low power required (50%) might lessen the concern there.

It should be recognized (and perhaps explicitly stated somewhere in the document) that the decision
to base BMRs on detection limits carries with it some implicit acknowledgments. First, that
responses of a certain magnitude have no chance of being considered BMRs because of the sample
sizes, background rate (or variability), and power choices made. This implies (as it has always
done for NOAELs anyway) that one finds acceptable the high likelihood of "missing" changes of
such magnitude. Have people considered sufficiently the basis for the determination of the
standard sample sizes so that the Agency is willing to make such a statement? Second, when larger
sample sizes than the standard are used, it is quite possible that the BMDs that result will be
greater than LOAELs from those studies. The Agency needs to be willing to go on record as
supporting use of such BMD estimates and not defaulting back to a LOAEL or NOAEL just for
the sake of conservatism.

Because the manner in which the detection limits and BMRs are to be derived can only fix the
power (50% by policy choice) and the sample size (standard size) a priori but the background rate
may not be as well-determined, would the BMR be allowed to vary according to the observed
background rate in any particular case? Would it be species and strain dependent and would
historical control data be used to define it? More needs to be specified concerning the choices for
background rates in the BMR derivations.

A set of tables or graphs that gives detection limits as a function of background rate and sample
size might be very useful.
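As a rough sketch of how such a table might be generated, the code below makes several assumptions that are not the guidance's method: a one-sided two-sample comparison of proportions at alpha = 0.05, equal group sizes, and a normal approximation without continuity correction. The smallest additional risk detectable with a given power is found by bisection on the treated-group response probability.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(p0, p1, n, alpha=0.05):
    """Approximate power of a one-sided two-sample test of p1 > p0, n per group."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    pbar = (p0 + p1) / 2
    se_null = math.sqrt(2 * pbar * (1 - pbar) / n)
    se_alt = math.sqrt((p0 * (1 - p0) + p1 * (1 - p1)) / n)
    return STD_NORMAL.cdf(((p1 - p0) - z_alpha * se_null) / se_alt)


def detection_limit(p0, n, target_power=0.5, alpha=0.05):
    """Smallest additional risk p1 - p0 detectable with `target_power`."""
    lo, hi = p0, 1.0
    for _ in range(60):  # bisection on the treated-group response probability p1
        mid = (lo + hi) / 2
        if power(p0, mid, n, alpha) < target_power:
            lo = mid
        else:
            hi = mid
    return hi - p0


print("background   n=20   n=50  n=100")
for p0 in (0.01, 0.05, 0.10):
    row = "  ".join(f"{detection_limit(p0, n):.3f}" for n in (20, 50, 100))
    print(f"{p0:>10.2f}  {row}")
```

Raising `target_power` from 0.5 to 0.8 enlarges every entry, so the power fixed by policy directly shifts the resulting BMRs.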

Starting in section III.C.2 (p. 23), issues of model fit are emphasized. It is important, therefore, to
be clear about what constitutes good fit and how it will be measured. It is not clear that the
standard techniques for assessing fit for continuous variables (e.g., F-tests) are adequate. Some
computer-intensive (simulation) approaches can be used and might be recommended (or built into
EPA's software). Such an approach would also alleviate another problem that is sometimes
encountered, i.e., having no degrees of freedom available for formal, traditional tests. The
simulation-based fit assessments do not need spare degrees of freedom, and I do not see a strong
need to limit the flexibility of the models used to fit data just for the sake of getting those degrees
of freedom.
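A minimal sketch of a simulation-based fit check for dichotomous data follows; the specifics are assumptions, not EPA's software. The Pearson chi-square statistic for the observed counts is compared with its distribution in data sets simulated from the fitted response probabilities, so no spare degrees of freedom are needed. For brevity the model is not refitted to each simulated data set; a full parametric bootstrap would refit every replicate and thereby account for parameter estimation.

```python
import random


def pearson_chi2(x, n, p):
    """Pearson chi-square for counts x out of n given model probabilities p (0 < p < 1)."""
    return sum((xi - ni * pi) ** 2 / (ni * pi * (1 - pi)) for xi, ni, pi in zip(x, n, p))


def simulated_fit_pvalue(x, n, p_fit, reps=2000, seed=1):
    """Monte Carlo p-value: share of simulated data sets fitting at least as badly."""
    rng = random.Random(seed)
    observed = pearson_chi2(x, n, p_fit)
    worse = 0
    for _ in range(reps):
        # Draw a binomial count for each dose group from the fitted probabilities.
        sim = [sum(rng.random() < pi for _ in range(ni)) for ni, pi in zip(n, p_fit)]
        if pearson_chi2(sim, n, p_fit) >= observed:
            worse += 1
    return (worse + 1) / (reps + 1)


# Hypothetical fitted probabilities for four dose groups of 20 animals each.
n = [20, 20, 20, 20]
p_fit = [0.05, 0.10, 0.30, 0.60]
print(simulated_fit_pvalue([1, 2, 6, 12], n, p_fit))  # data matching the fit
print(simulated_fit_pvalue([12, 6, 2, 1], n, p_fit))  # data far from the fit
```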

Furthermore, the standard procedures for determining fit of continuous models look only at the
predictions of the means as compared to the observed means. They do not directly consider the
prediction of the variability. Yet, the estimates of variability are very important for BMD/C
estimation to the extent that the BMRs are based either on (1) changes in the mean relative to the
underlying variability or (2) a hybrid approach that depends on the predicted distributions around
the mean values to derive probabilities of response.

It appears that the Agency needs to do a bit more development and make some decisions about fit
issues so that the guidance can be clear and explicit about how good (or adequate) fit will be
determined.

Some questions/comments about the order of model application (p. 23, lines 12-20):
If a linear model is run first for continuous data, why not also for dichotomous data?
Why pick the polynomial model to run prior to the power model?
Do the choices for the continuous data models extend to the use of the hybrid approach? If
so, then the Weibull model (which has the added advantage that it is the same model that can be
applied to quantal data) should also be considered and explicitly listed.
Rather than picking an order for application, one might suggest the 2 or 3 models that
should be considered and that they all be run initially. This does not put such a burden on the
assessment of fit — a barely acceptable linear model might be substantially improved by adding
nonlinear terms, but this would not be determined in the step-wise procedure that is now specified.
The standard goodness of fit assessments are not really very good at discriminating between model
alternatives like that anyway.

With respect to the model structure (pp. 23-25), the following comments are offered:
When (and if) one allows exponents on dose (and related parameters) to be less than one,
some procedure should be described whereby the instabilities can be lessened. One way we have
investigated is to do the fitting first with unrestricted exponents. Then, the exponent that is
returned as the maximum likelihood estimate is used as the lower bound on the exponent for a
second iteration, and it is only in the second iteration that lower bounds on dose are derived. Even
this does not always eliminate some very small values for the lower bound.
I would recommend eliminating the restriction and discussion of the background
parameter. I can think of no good reason for making a zero background the default. It is much
easier to make the default be the inclusion of the background term and only allow it not to be
estimated if the biological or toxicological data suggest that that is appropriate.

Similarly, I would allow consideration of the threshold parameter, even though it does not
correspond to a biological threshold. Call it something else if desired, but there are instances
where its inclusion is essential for obtaining adequate representation of the dose-response
relationships. If alternative fit assessment procedures are implemented (see above) the loss of the
degree of freedom is not crucial. Moreover, by explicitly allowing that parameter, the class of
dose-response functions considered is increased. This is important because then the bounds that
are calculated and which constitute the BMDs represent the uncertainty in the dose estimation for
the larger family of curves. The resulting BMDs, even if they do not correspond to a curve that
includes a threshold parameter, were allowed to have it if the optimization dictated that it was
needed, and therefore one avoids criticism that the lower bounds are model-dependent in a way that
excludes "threshold-like" behavior.
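To illustrate what is at stake, the sketch below assumes (for illustration only; the form, names, and values are hypothetical) a quantal Weibull-type model with a background parameter g and an optional threshold-like shift d0: P(d) = g + (1 - g)(1 - exp(-b * max(d - d0, 0)^k)). Under extra risk the background cancels and the BMD has a closed form; setting d0 = 0 recovers the curve without the shift.

```python
import math


def weibull_response(d, g, b, k, d0=0.0):
    """P(d) = g + (1 - g) * (1 - exp(-b * max(d - d0, 0) ** k)), for k > 0."""
    return g + (1 - g) * (1 - math.exp(-b * max(d - d0, 0.0) ** k))


def extra_risk(d, g, b, k, d0=0.0):
    """(P(d) - P(0)) / (1 - P(0)); the background g cancels out."""
    p0 = weibull_response(0.0, g, b, k, d0)
    return (weibull_response(d, g, b, k, d0) - p0) / (1 - p0)


def weibull_bmd(bmr, b, k, d0=0.0):
    """Dose at which extra risk equals bmr: d0 + (-ln(1 - bmr) / b) ** (1 / k)."""
    return d0 + (-math.log(1 - bmr) / b) ** (1 / k)


for d0 in (0.0, 5.0):
    bmd = weibull_bmd(0.10, b=0.01, k=2.0, d0=d0)
    er = extra_risk(bmd, 0.05, 0.01, 2.0, d0)
    print(f"d0 = {d0}: BMD10 = {bmd:.2f}, extra risk at BMD = {er:.3f}")
```

If the fitting is allowed to estimate d0 (and g), the lower bound on the BMD is taken over this wider family of curves rather than only the curves with d0 fixed at zero.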

The section on selecting the BMD/C (pp. 26-27) needs to be substantially altered. First, there is a
mixture of issues here that is not clearly delineated. The first issue is what to do with different
BMD/Cs for the same endpoint (and study) resulting from different model predictions. The second
issue is what to do with different BMD/Cs from different endpoints and/or studies. The first issue
is adequately addressed by the first two bullet items in this section, although I would emphasize
examination of the MLEs much more than has been done. The second issue is not addressed at all,
except to the extent that one can infer from the discussion of the NOAEL/LOAEL and BMD/C
comparison that the lowest BMD/C would be selected.

Even more important, that discussion of what to do with NOAELs/LOAELs when they look to be
the most sensitive is not adequate. Especially because the guidance lays out several cases where
BMD/C derivation should not be done, it is important to rethink how NOAELs and LOAELs can
and should be used, and not just rely on old concepts. The whole idea is to get away from the
problems of the NOAEL, not to exaggerate them by mixing them up pell-mell with BMD/Cs. As
an example, if the critical effect was from a large study, and the LOAEL was a LOAEL because of
the large sample size even though the observed response rate was less than the BMR, why would
one choose that LOAEL (or the corresponding NOAEL) as the point of departure? The basic
problems associated with use of NOAELs still plague this guidance if the last bullet item on p. 27
is all that is said about use of non-BMD/C results.

I have serious doubts that BMD/C approaches will prove to be very useful for cost-benefit
assessments and I completely disagree with the statement that the BMD approach "provides a good
starting point to develop benefits estimates for non-carcinogens" (p. 42, lines 22-23). NOAELs are
no better, but this is not an improvement that is provided by BMD-like analyses.

The discussion on p. 57, lines 4-8 does not make sense to me. What is the point here?

On p. 59, line 1, do the authors mean "refined" or "defined"? The interpretation of the level of
effort needed before one can use the limit of detection approach for setting BMRs may depend on
that distinction.

I also think that the statements about the low background rate in the EPA-sponsored work on
developmental toxicity (p. 62, lines 1-3) are incorrect. For many of the endpoints that included
resorptions, the background rate was quite large. If one considers the analyses of the
developmental toxicity data sets that have been done to be the empirical equivalent of the limit of
detection calculations proposed, then it appears that the results of that analysis should be
interpreted as suggesting use of 5% additional risk when doing the recommended assessment of
such data. The Agency should explicitly state that, unless other comparable analyses of
developmental toxicity data become available, the current information supports using 5%
additional risk for developmental toxicity data, especially in light of statements that the choice of
extra or additional risk is unimportant when a limit of detection approach is used.
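For reference, the two risk definitions differ only in how the background is handled: with P(0) the background response, additional risk is P(d) - P(0) and extra risk is (P(d) - P(0)) / (1 - P(0)). A small sketch with hypothetical numbers shows how far apart they are as the background rises:

```python
def additional_risk(p_d, p_0):
    """Additional risk: P(d) - P(0)."""
    return p_d - p_0


def extra_risk(p_d, p_0):
    """Extra risk: (P(d) - P(0)) / (1 - P(0))."""
    return (p_d - p_0) / (1 - p_0)


for background in (0.0, 0.05, 0.25):
    p_d = background + 0.05  # a response 5 percentage points above background
    print(f"background {background:.2f}: "
          f"additional {additional_risk(p_d, background):.3f}, "
          f"extra {extra_risk(p_d, background):.3f}")
```

The two coincide at zero background, and extra risk is the larger of the two whenever the background is positive.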

The discussion on p. 68, starting with line 23 and continuing to p. 69, is not at all clear. Although
lack of independence is a problem that needs to be (and has been) addressed in certain instances,
what does the statement about "choice of the model form" mean?

On p. 72, lines 9-13, the discussion needs some attention. It is not that nonmonotonic data mean
that typical models cannot be used; it is only that the fits might suffer because of the
nonmonotonicity. Careful consideration of the reasons for nonmonotonicity should be
recommended. Even so, log-transforming doses will not do anything about nonmonotonic
responses, nor would it help much with abrupt increases in response.

The description of the figure on p. 79 may need some work. And why are the BMDs from the
figure different from any of those in Table 2?

Other general comments include the following:
There is insufficient attention paid to pharmacokinetics and the use of "delivered dose"
estimates in BMD analyses. The guidance should strongly encourage the use of such dose
estimates for BMD derivation and make note of the fact that the dose conversions (for the test
species) should be done prior to modeling, with "back-calculation" of human exposures associated
with delivered dose versions of BMDs completed after the modeling and BMD estimation. It has
been found (with vinyl chloride for example) that model fitting difficulties were largely resolved
when appropriate delivered dose estimates were used. [As a minor point, it might be noted that the
dose scalings referenced on p. 44, line 27 are for oral exposures; inhalation concentration scaling
is done using HEC considerations, right?]
The move to BMD approaches also offers an excellent opportunity to reassess the use of
uncertainty factors (UFs). UFs have been touched on in the document, but more explicit and
extensive discussion of how they might be considered and re-evaluated in light of the BMD/C
method could be added.
I think that the document should include a flow-chart showing how the various
considerations (mechanistic information, data set requirements, modeling constraints, etc.) come
into play and direct the course of a noncancer risk assessment. What this document should be
trying to do is describe the information inputs that point one in the direction of the type of analysis
(BMD or otherwise) that ought to be done. A BMD analysis using the standard set of models, the
default choices for BMRs, etc. may be the default of last resort, so to speak, in that one would do it
only if other options that include more chemical-specific and mechanistic inputs cannot be
implemented or do not appear to be justified. A flow-chart would be very useful in summarizing
that process.

                                                         George P. Daston
                                                          August 27, 1996

      Comments on EPA'S BENCHMARK DOSE TECHNICAL GUIDANCE
                               DOCUMENT

My overall impression of this document is that it is a useful how-to manual on the
application of BMD. It, along with the Risk Assessment Forum report entitled
"The Use of the Benchmark Dose Approach in Health Risk Assessment," should
be sufficient for toxicologists in the program offices to successfully and correctly
apply this method for risk assessment. It is also worth noting that this document
does a fine job of continuing EPA's efforts in harmonizing risk assessment for
cancer and other forms of toxicity. My most significant suggestion for the
application of BMD is that it be based on a central estimate of the dose at the BMR
instead of on a lower confidence limit (LCL). While I agree with the use of a confidence
limit in principle, its use becomes problematic when one tries to make
comparisons across different toxic endpoints that are evaluated using study
designs of varying group sizes and varying statistical power. Using a central
estimate will 1) make better use of the one area along the dose-response curve
that we can model with some precision; and 2) facilitate the comparison of critical
effects for different endpoints. Using the central estimate, it should be possible to
have a single default level of response for the BMR, rather than the sliding-scale
"limit of detection" approach. My suggestion is fleshed out in my response to
question 4. a below.

 My answers to the specific questions are:

1. a. The selection of studies and endpoints is, and should be, the same as for
NOAEL-based risk assessment. These studies, run according to regulatory
guidelines, are widely regarded as satisfactory apical tests to detect hazards of
all sorts. These studies should therefore be adequate bases for risk assessment,
regardless of the method employed. However, the advent of the BMD should
cause the Agency to suggest greater flexibility in study design, particularly in the
number of dose groups and animals/group. It may be possible to design studies
that define the shape of the dose-response curve, especially its lower end,
better than the standard 2-3 dose groups plus a control.

It is worth making explicit the distinction between endpoints that are dichotomous
or quantal by nature (e.g., alive or dead, 5 or 6 fingers) and those that are
quantal by fiat. The latter are best exemplified by the classifications of mild,
moderate, and severe that are widely used in histopathology. While these
classifications are useful in providing the opinion of experts as to the severity and
adversity of a finding, they obscure the fact that the observed responses are in
reality part of a continuum. There may be some utility in de-quantalizing these
types of data for the purposes of BMD-based risk assessment, particularly given
the Agency's interest in using precursor and mode-of-action data to help
understand the degree of risk in the range of extrapolation.

 1. b. There is no reason to make a distinction between cancer and non-cancer
endpoints.
1. c. These criteria are appropriate, and the example provided in Appendix D is a
good one.

Other comments on Selection of Studies and responses: It appears that the
guidance document implies that the critical effect is to be selected simply as the
one with the lowest BMD. This is connoted by the statement that all endpoints whose
LOAELs are within an order of magnitude of the lowest LOAEL should be
modeled. This seems to be inappropriate, as it does not take into account all of
the other information that is part of expert judgment, such as the plausibility of
the effect, its severity in comparison to other sensitive effects, etc., as well as the
other information, such as the slope of the dose-response curve, that comes along
with the calculation of the BMD and may be very informative as to which effect
should be selected as the basis for RfD calculation.

 The first of the three criteria for a minimum data set (bullet points on p. 4 and p.
 19) does not make sense to me. One of the real advantages of the BMD
 approach is that it would allow one to use a study that is statistically insufficient to
generate a credible NOAEL or LOAEL but still conveys enough information
that there is clear evidence of a hazard. The second criterion in this section is
also too restrictive. While it is true that the choice of a mathematical model for a
data set with only one positive response group is arbitrary, it is no more arbitrary
than the a priori choice of the dose level that ultimately becomes the NOAEL for
 the study. Given that the guidance for cancer risk assessment suggests a
 straight line as a default for low dose extrapolation, it seems appropriate to
 suggest a similar default for data within the experimental dose range.
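The straight-line default proposed here can be sketched as follows, under the assumption (a reviewer's proposal, not current guidance) that the response is interpolated linearly between the control group and the single positive group, with the BMR expressed as additional risk. The data and the function name are hypothetical.

```python
def linear_bmd(dose, p_control, p_dose, bmr=0.10):
    """BMD from a straight line through (0, p_control) and (dose, p_dose).

    Solves p_control + slope * d = p_control + bmr for d (additional risk),
    refusing to extrapolate beyond the tested dose.
    """
    slope = (p_dose - p_control) / dose
    if slope <= 0:
        raise ValueError("no increase in response; the line gives no BMD")
    bmd = bmr / slope
    if bmd > dose:
        raise ValueError("BMR exceeds the observed increase; would extrapolate")
    return bmd


# Hypothetical study: 2% response in controls, 22% at the only positive dose (50 mg/kg-day).
print(linear_bmd(50.0, 0.02, 0.22))
```

Restricting the result to doses at or below the positive dose keeps the default "within the experimental dose range," as the comment proposes.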

2. a. For continuous variables, the biological significance determination is
appropriate. The limit of detection approach is appropriate for those endpoints
for which there is general agreement that some level of effect on that parameter
is adverse, but there is insufficient information or consensus to pinpoint a specific
level. In those instances, it seems to me that an 80% statistical power would be
more comparable to currently employed limits of detection than a 50% level. For
those continuous variables for which there is no generally agreed upon
interpretation regarding adversity, it would be my opinion that these not be used
for risk assessment, although they may still be useful as auxiliary information.
For quantal endpoints, the decision as to whether something is adverse should
also be made a priori and should be contained in the guidance given in regulatory
risk assessment guidelines. I find a consistent level of response as the default
for these endpoints to be more appealing than the limit of detection method, as
this will facilitate comparison across endpoints.

2. b. It looks OK, although, as noted in the response to 2. a., it is not my preferred
option for quantal endpoints or some continuous ones.

 2.c. The  results of the simulation should prove useful.  Others have made
 calculations on limits of detection based on the CVs for various endpoints
 commonly measured in screening studies with sample sizes recommended by
 regulatory testing guidelines. These may also be a good source of information.

2. d. See my response to 2. a, particularly regarding the need to first determine
whether a response is adverse.

3. a. The order of model application is satisfactory. The log-logistic model for
quantal data has been demonstrated to be flexible enough to handle most
dose-response curves, and does not have the problems of the Weibull model in fitting
the lower end of dose-response curves with very steep slopes. The
recommendation that the curve from each model be graphically displayed and
critically reviewed for its relevance to the data, especially the lower end of the
dose-response curve, is appropriate and cannot be stressed too much.
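For concreteness, one common parameterization of the log-logistic model is assumed below (the guidance's exact form may differ): P(d) = g + (1 - g) / (1 + exp(-(a + b ln d))) for d > 0, with P(0) = g. Because the background cancels under extra risk, the BMD has a closed form. The parameter values are hypothetical.

```python
import math


def log_logistic(d, g, a, b):
    """P(d) = g + (1 - g) / (1 + exp(-(a + b * ln d))), with P(0) = g."""
    if d == 0:
        return g
    return g + (1 - g) / (1 + math.exp(-(a + b * math.log(d))))


def log_logistic_bmd(bmr, a, b):
    """Dose with extra risk bmr: solve a + b * ln(d) = ln(bmr / (1 - bmr))."""
    return math.exp((math.log(bmr / (1 - bmr)) - a) / b)


g, a, b = 0.05, -6.0, 1.5
bmd = log_logistic_bmd(0.10, a, b)
extra = (log_logistic(bmd, g, a, b) - g) / (1 - g)
print(f"BMD10 = {bmd:.1f}; extra risk at BMD = {extra:.3f}")
```

Plotting `log_logistic` over the tested doses alongside the observed proportions gives the graphical check recommended above.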

 3.b. The guidance document provides some flexibility in choosing additional
 models on an ad hoc basis as long as the choice is explicitly justified, so there is
 little need to include additional models. There is also no good reason to restrict
 further the number of models, at least until more experience is gained on the
 behavior of the models for a variety of toxic modalities.

 3.c. These defaults appear to be appropriate and are in line with what was
 recommended by an expert group at the EPA/AIHC/ILSI workshop on
 benchmark dose. The only point that is not supported by that working group is
 the choice of excess risk over additional risk, a point on which that group could
 not reach consensus. The explanation for the decision not to include a threshold
 term is very well put: there is no relationship between this arbitrary contrivance
 and a biological threshold.

 3. d. The approach is adequate, and as noted above, it is an excellent
 recommendation that the curve from each model be graphed. It should be stated
 in stronger terms that the exclusion of high dose data should be a last resort if
 none of the models fit. A preferred alternative would be to select other models
 that are not on the short list of recommended defaults. Furthermore, prior to
 excluding data, all of the data points should be graphed in a scattergram as an
 aid in determining the possible causes of lack of fit of any model (e.g., extreme
 non-monotonicity of the data).

 4. a. EPA should consider using central estimates instead of lower confidence
 limits in calculating the benchmark dose, especially for data from studies that are
 conducted  according to accepted regulatory guidelines.  There are several
 reasons for this recommendation. First, one of the main reasons for using an LCL
 was to penalize studies with low sample size or other statistical deficiency as
 compared to standard guideline studies, which are widely regarded as adequate
 to detect hazard. However, as long as the studies that are used for BMD
 calculation  meet the requirements of the guidelines, this reason for relying on
 confidence limits is obviated. Other means can be employed to handle
 substandard studies, such as reliance on confidence limits for those that do not
meet the minimum requirements of the regulatory guidelines, employing
additional uncertainty factors, or simply not considering them as the source of the
 critical effect for risk assessment. Second, use of the central estimate is a better,
 more precise use of the experimental data. These data represent the area of the
dose-response relationship of which we are the most certain. It is for this very
 reason that the potency comparisons in cancer risk assessment rely on central
estimates of the TD10 rather than an LCL. Why then would we wish to arbitrarily
 discard this small shred of certainty in an otherwise uncertain process? The
 third, very pragmatic reason for relying on a central estimate is that it greatly
 facilitates comparison of different endpoints. For a variety of reasons, regulatory
 guidelines for the detection of hazard of different endpoints rely on various
 numbers of animals per group and have different statistical power. A good
 example is the difference between developmental toxicity studies, where a 3-5%
 increase in risk for malformations or resorptions may be statistically discernible,
 vs. a neurotoxicity study where it may take a 30-40% decrement in a clinical
 parameter before it is discernible. It is clear that the consensus of the
 neurotoxicology community is that this design is satisfactory to detect a hazard;
 however, neither the NOAEL-based nor the BMD (calculated as an LCL) approach would
 be adequate to assess risk from these studies. The former would be far too
 insensitive, and the latter would be overly sensitive. Furthermore, neither would
 be easily comparable with the developmental toxicity results.  If we were to
 evaluate a set of chemicals that were equivalently neurotoxic and
 developmentally toxic using the NOAEL approach, all of the RfDs would probably
 be based on developmental toxicity; using the BMD, all would be based on
 neurotoxicity. This does not make sense. Therefore, I  recommend that the BMD
 be calculated as a central estimate, and that other steps be taken to account for
 study insufficiency, etc.  This recommendation also makes it easy to select a
 consistent level of response as the BMD for quantal endpoints, across all forms
 of toxicity. I find this to be preferable to the sliding-scale approach that is now
 being taken in the Guidance Document.

 4. b.  If confidence intervals must be calculated, these defaults seem OK, at least
 to this non-expert.

 4.c. As noted above, I do not advocate using confidence limits for studies that
 meet regulatory testing requirements. Should a confidence interval be used, I
 suggest that 1-1.5 standard deviations would be adequate.
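The sketch below shows numerically what the suggestion in 4.c implies. It computes one-sided lower limits 1, 1.5, and 1.645 standard errors below a BMD central estimate, where 1.645 corresponds to the conventional 95% one-sided limit. It assumes a Wald-type (asymptotic normality) limit on the log-dose scale. Both the central estimate and its standard error are hypothetical. Other constructions, such as profile-likelihood limits, would give different values.

```python
import math
from statistics import NormalDist


def lower_limit(bmd_central, se_log_bmd, k):
    """One-sided lower limit k standard errors below the central estimate,
    taken on the log-dose scale (Wald / asymptotic-normality assumption)."""
    return math.exp(math.log(bmd_central) - k * se_log_bmd)


bmd_central = 12.0  # hypothetical maximum likelihood estimate of the BMD
se_log_bmd = 0.35   # hypothetical standard error of ln(BMD)

nd = NormalDist()
for k in (1.0, 1.5, nd.inv_cdf(0.95)):
    level = nd.cdf(k)  # one-sided confidence level implied by k
    bmdl = lower_limit(bmd_central, se_log_bmd, k)
    print(f"k = {k:.3f} ({level:.1%} one-sided): lower limit = {bmdl:.2f}")
print(f"central estimate = {bmd_central:.2f}")
```

With these inputs the three limits are roughly 8.5, 7.1, and 6.7, compared with a central estimate of 12. Moving from 1.645 to 1 standard error narrows the gap but does not remove it.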

 5.a. These determinations of equivalency appear satisfactory.
 5. b. I have no knowledge of the AIC.
 5. c. Yes.

 6. a. The EPA is to be congratulated for its continuing attempts to harmonize
 cancer and non-cancer risk assessment.  The use of the BMD is one way of
 moving toward that goal. I think, however, that it should be recognized that the
 BMD is not a credible basis for low-dose extrapolation for either endpoint. While
 this is acknowledged for non-cancer endpoints, it is not for cancer. Statements
 like that on p. 11, line 18 need to be rethought and either qualified or removed.

 6. b. The document appears to be right on target for the intended audience.

 6. c. The organization is satisfactory.

 6.d. The examples in Appendix D are excellent. I suggest that the remainder of
the chemicals in IRIS for which the RfD was derived from a BMD also be
 included in Appendix D for illustrative purposes.

Elaine Faustman
      School of Public Health and Community Medicine
      Department of Environmental Health-Roosevelt
      University of Washington
      4225 Roosevelt Way NE, #100
      Seattle, WA 98105-6099
      Phone:(206)543-9711  FAX: (206) 685-4696

      Comments on the USEPA Draft Benchmark Dose Technical Guidance Document
      (EPA/600/P-96/002A).

      August 1996

      Page vii, line 4 & Page 3, lines 12-14. The first paragraph of the document
      states that it is to be used in conjunction with the EPA 1996c document. For
      usability, the more "stand-alone" this document is, the easier it will be to
      use.

      Pages 3-4, lines 28, 29 and 1. Has the EPA conducted studies to determine
      that only LOAELs within 10X of other LOAELs need to be evaluated for BMD
      analysis? Is it true that no BMD/Cs would be less than 10-fold from the
      LOAEL? Add references or detail in a later section.

      Page 4, lines 2-14. Other criteria that could be evaluated include
      guidance on what to do with non-monotonic dose-response relationships.

      Page 4, lines 11-14. USEPA should explain in detail how it determined the
      criteria used in bullet item 3, under question 2. This reviewer had
      difficulty in determining why curves with all responses above 50% would
      automatically be inappropriate to model. Additional criteria may be needed
      in this bullet item, for example, specifying the maximal percent
      response change per dose evaluated or ratios of dose spacing to
      response change.

      Pages 10-12. This reviewer would suggest that the paragraph starting on Page
      11 at line 21 should go on page 10 at line 30.

      Page 11, lines 13-16. Important concepts are discussed here, yet the
      rationale for choosing a linear default analysis was unclear to this
      reviewer. Will a further section answer this? More details need to be
      provided here, rather than just referring to other documents. [See comments
      on Page 16].

      Page 11, lines 12-13.  Please make the following changes  in these lines:
       Page 11, lines 11-13:
       "with appropriate curve-fitting models; and then (2) extrapolation
       below the range of observation is accomplished by modeling if there are
       sufficient mechanistic data or approaches or by a default procedure (linear,
       nonlinear, or both) if no such models or mechanistic approaches exist."

       Page 13, Figure 1. Is there a need to show the BMR and BMD at specific
       percentage response levels to clarify the figure? Also should the
       illustration show a BMR above the NOAEL, as well as below?  Should  the LOAEL
       be identified on this graph?  Should statistical significance of data points
       be shown?

       Page 14, lines 8-11. It is not user-friendly to constantly refer to other
       documents that should be used in conjunction with this document. Write one
       benchmark document and provide enough detail for it to be useful as a
       "stand-alone" document.

       Page 14, line 23. Add references to this sentence.

       Page 14, line 29. What studies are referred to in this sentence? Only those
       by Faustman et al., or is reference being made to the papers cited earlier
       in lines 26-27?

       Page 15, lines 12-14. These sentences give the impression that there was no
      biological rationale for evaluating reduced fetal weight. These sentences
      should be modified to include the rationale for choosing these studies.

      Page 15, line 14 & 17.  Replace  "cut off values" with response levels.

      Page 16, line 9. This reviewer feels that caution should be used when
      mentioning the Bayesian approaches because the only cited paper is not in a
      peer-reviewed journal. Please add additional specific references for this
      application or remove this concept.

       Page 16, lines 12-18. Insert details from page 11, first and second
      paragraphs (lines 1-20) here. A  brief paragraph without details
      (specifically details in lines 8-16) can be left on page 11 that discusses
      the loss of dichotomy between cancer and non-cancer approaches.

      Page 16, lines 26-29 and page 17, lines 1-3. This willingness to continually
      incorporate new improvements in these processes would carry more weight if the
      timing of the next re-evaluation, or at least the process for re-evaluation,
      were also specified. (See also comments for page 40.)

      Page 17, line 17.  How "near"?


      Page 19, lines 13-14. Please provide a rationale for looking at other
      endpoints only if their LOAEL is within 10-fold of the lowest LOAEL. Have
      studies been conducted that show that no other endpoints would result in
      lower BMD/Cs? For this reviewer, this point was not intrinsically obvious.
      (See earlier comments on Pages 3-4, lines 28, 29 and line 1.)

      Page 19, lines 22-24.  Specify what is done when only one responding group is
      present.  Does the risk assessor use a NOAEL approach?  Indicate what is
      done, not just what is not done.

      Page 19, lines 25-28.  This constraint needs to be explained.  Why was 50%
      response chosen?  This reviewer would suggest that more specific guidance
      could be given.  (See earlier comments for page 4, lines 11-14).

      Page 20, section 3 Combining data.  Add reference here to peer-reviewed
      publication by Allen et al.

      Page 21, lines 14-17.  The guidance document should provide a few more
      details on what would be evidence of "biological significance".  This
      reviewer would suggest listing example EPA risk assessment guidance for
      developmental toxicity here.  Also, perhaps, referencing groups such as MARTA
      that publish guidance information. Would EPA accept "biological
      significance" only after peer review or consensus workshop concurrence?

      Page 21, lines 18-22. Regardless of the findings of simulation studies, this
      section needs to be expanded. Is an example given in Appendix B for each of
      these approaches? Adding a few more details here could help in understanding
      this approach. This was very confusing for this reviewer.

      Page 22, lines 1-4.  Document should explain why "extra risk" should be used
      for BMR set on basis of biological significance.  Also explain why it does
      not matter for limit of detection approaches.  Need to add glossary so users
      truly understand extra versus additional  risks.
      Don't hide definitions in appendix.
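For reference, the two definitions requested here are usually written as follows, where P(0) is the background response rate:

- additional risk: P(d) - P(0)
- extra risk: [P(d) - P(0)] / [1 - P(0)]

The short sketch below uses hypothetical probabilities. It shows that the two measures coincide when background is zero and diverge as background rises, which is why the choice matters when a BMR is set.

```python
def additional_risk(p_dose, p_background):
    """Additional risk: absolute increase in response probability over background."""
    return p_dose - p_background


def extra_risk(p_dose, p_background):
    """Extra risk: increase over background divided by the fraction of
    animals not responding in the background."""
    return (p_dose - p_background) / (1 - p_background)


# Hypothetical background rates, each with the same 0.07 absolute increase at dose d
for p0 in (0.0, 0.10, 0.30):
    p_d = p0 + 0.07
    print(f"background {p0:.2f}: additional = {additional_risk(p_d, p0):.3f}, "
          f"extra = {extra_risk(p_d, p0):.3f}")
```

With the same 0.07 absolute increase, additional risk stays at 0.07. Extra risk is 0.07 at zero background, about 0.078 at 10% background, and 0.10 at 30% background.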

      Page 23, lines 7-8.  Explain what "curve fitting in a manner similar to the
      EPA software" means.  Reviewer needs additional details to understand these
      comments.

      Page 23, lines 12-20. Provide a few details to justify the order in which
      models are to be run on the data. This justification could be as simple as
      adding a few references or a few sentences that explain the order of
      model selection.

      Page 23, lines 22-24.  Again, this reviewer cautions the authors about
      referring to other documents for details that are needed in this document.
      Pull out key points and list here.

      Page 24, lines 3-9.  Add references to justify this approach.

      Page 24, line 18. Add some examples of what additional information or "work"
      is needed.

      Page 24, lines 19-20.  To be user friendly, add complete thoughts here rather
      than just referring to other places in the document.

      Page 24, lines 21-25.  This reviewer agreed with the approach delineated for
      the "threshold" intercept term.

      Page 25, line 25. Specify whether the EPA software will  include this
      approach.

      Page 25, lines 2-9. This reviewer would suggest that references to studies
      showing loss of statistical power should be added here. The reviewer would
      also suggest that a simulation study would probably show a "reward," i.e., a
      decrease in confidence limit and an increase in BMD/C, if data are kept as
      continuous data.

      Page 25, lines 10-15.  Authors should add a sentence or two that discusses
      likelihood theory.

      Page 25, section 5.  Where will the concept of "non convergence of models" be
      discussed? This reviewer suggests that this location might be appropriate.

      Page 25, lines 24-29.  This reviewer applauds the authors for their
      requirement of graphical displays of the data.

      Page 26, lines 10-15. This reviewer cautions against dropping "high doses"
      without providing some additional guidelines.

      Page 26, lines 25-28. Additional details on the Akaike Information Criterion
      are needed. Are these provided in the Appendix? If so, add a note. This
      reviewer was surprised that no reference was given for this method. Authors
      must add a peer-reviewed reference.
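As background for the requested detail, the Akaike Information Criterion is conventionally defined as AIC = -2 ln L + 2p. Here L is the maximized likelihood and p is the number of estimated parameters. Among models fitted to the same data a smaller AIC is preferred, so each extra parameter must buy a real improvement in fit.

The sketch below computes AIC for quantal data from fitted group probabilities. The data and both sets of fitted probabilities are hypothetical. The constant binomial-coefficient term is dropped because it cancels in any comparison between models.

```python
import math


def binomial_log_likelihood(n, affected, probs):
    """Log-likelihood of grouped quantal data given fitted group probabilities.

    The binomial-coefficient term is omitted; it is the same for every model
    fitted to the same data, so it cancels in AIC comparisons. Fitted
    probabilities must lie strictly between 0 and 1.
    """
    return sum(x * math.log(p) + (m - x) * math.log(1 - p)
               for m, x, p in zip(n, affected, probs))


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 ln L + 2p (smaller is preferred)."""
    return -2.0 * log_likelihood + 2 * n_params


# Hypothetical data: animals per group and number affected at four doses
n = [50, 50, 50, 50]
affected = [2, 5, 12, 30]
# Hypothetical fitted probabilities from two candidate models
candidates = {
    "model A (2 parameters)": ([0.045, 0.095, 0.23, 0.60], 2),
    "model B (3 parameters)": ([0.040, 0.100, 0.24, 0.60], 3),
}
for name, (probs, n_params) in candidates.items():
    ll = binomial_log_likelihood(n, affected, probs)
    print(f"{name}: ln L = {ll:.3f}, AIC = {aic(ll, n_params):.3f}")
```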

      Page  27, lines 5-8. Add a few details on how risk assessors could use
      evaluation of the MLEs to determine patterns or add reference to example in
      appendix.

      Page 27, lines 9-12. The authors should specify what the risk assessor should
      do when there is a mixture of BMD/Cs and NOAEL/LOAEL values and the critical
      effect is a BMD/C.

      Page 28, lines 15-25.  Good examples.

      Page 29, lines 3-5. Would using a BMD/C approach possibly change the use of
      an extra 10 fold for inadequate experimental design (if the study could still
      meet the earlier criteria for allowing BMD/C use)?

      Page 31, line 8. Insert the word "of" between the word "use" and the words
      "these approaches."

      Page 31, Section V. This reviewer feels strongly that consistent
      nomenclature should be used for all endpoints, whether they are non-cancer
      or cancer endpoints. Surely we can arrive at a consistent term for
      EDx, TDx, or BMDx.

      Page 32, line 1.  Please define lifetable methods and summary incidence
      methods.

      Page 32, lines 3-9. Authors must provide some additional details here. The
      document describes these approaches for BMD/C calculations and it should also
      provide similar details  for ED10s if this is really going to be a useful
      document.  Identify and explain where there are differences in these
      approaches.

      Page 33, line 18.  Correct typographical error.

      Page 34, lines 17-18. Add details and a reference to justify the statement that
      "There might be modification other than DNA reactivity (e.g., certain
      receptor-based mechanisms) that are better supported by the assumption of
      linearity."

      Page 35, lines 26-27. Authors need to illustrate how the MOE analysis
      considers steepness of the slope. If these points are illustrated in the
      appendix, then please add a reference here.

      Page 36, lines 7-9. Authors need to explain what is meant by the statement
      that "... tumor data might support a greater MOE than a more sensitive
      precursor response...". Please give examples to illustrate what is meant by
      "greater MOE".

      Page 36, lines 14-18. Authors again need to illustrate these points with
      examples and additional details.

       Page 36, lines 19-28.  Authors highlight problems in using different models
       for curve-fitting, yet they do not offer solutions.  Will the new software
       that is being developed include multistage models for use?  If so, cite here.
       How does the risk assessor resolve these differences between models?  Authors
       must provide better, specific guidance.

       Page 38, lines 21 and 22.  What does this sentence mean?  How can something
       be both more qualitative and quantitative? Explain.

       Page 40, lines 3-12. See earlier comments about delineating a process for
       updating (page 16).

       Pages 43-45. Authors do a good job at identifying research needs and
       inconsistencies that need to be addressed. Authors should describe a plan of
       how to address these critical needs.

       Page 55, lines 15-26.  Good discussion, but authors need to define what is
       meant by "poorest results".  Does this refer to comparisons with NOAEL
      values, size  of confidence levels, etc.?  Please specify.  Is this the
       reference that was used to set up criteria for acceptance of a BMD/C
       analysis?  If so, please provide a few more details to substantiate these
       criteria.

       Page 56, lines 14-17. Will the examples in the appendix show how continuous
       data are used when individual animal data are not available and only summary
       data with a measure of variability are available? Please do include this in
       the examples.

       Page  59, lines 3 and 4. Do authors mean exposure versus dose in this
      sentence?

      Page  59, lines 24-26. Why is the BMD/C : NOAEL ratio  of one set as a goal?
      Please justify.  Is the NOAEL set as a "gold standard" for comparison? This
      reviewer would suggest that this is inappropriate.

      Page  60, lines  8-14. Add these definitions to the text as well, not just in
      appendix.

      Page  64, Figure 3. Authors should prepare similar tables for standard
      experimental sizes for both cancer and non-cancer experiments.  This would be
      especially interesting for the neurotoxicity behavioral study designs.

      Page 65, lines  18-24. Are the "other requirements" for setting the BMR going
      to be  developed by EPA? This reviewer would certainly encourage some
      additional agency work on this topic.

      Page 66, lines 6-12.  Please specify what are parameters a and b. What
      parameter is background in your equation?

      Page 67, lines 4-6. Authors could give criteria for model selection.

      Page 70, lines 2-8. Authors need to provide additional details on how to
      handle the identification of the "best" formats.

      Page 71, line 9.  Authors should explain what  is a "correlation structure."

      Page 71, lines 17-18.  Will the EPA model package include a goodness of fit
      statistic program?  This should be included.
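As an illustration of the kind of goodness-of-fit statistic that could be packaged with the software, the sketch below computes the Pearson chi-square for grouped quantal data. It sums (observed - expected)^2 / [n p (1 - p)] over dose groups.

Two simplifying assumptions are made. The degrees of freedom are taken as the number of groups minus the number of fitted parameters. The p-value uses the Wilson-Hilferty approximation, so that only the Python standard library is needed. The data and fitted probabilities are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_chi_square(n, affected, probs):
    """Pearson goodness-of-fit statistic for grouped quantal data."""
    stat = 0.0
    for m, x, p in zip(n, affected, probs):
        expected = m * p
        stat += (x - expected) ** 2 / (m * p * (1 - p))
    return stat


def chi_square_sf_approx(x, df):
    """Approximate P(chi-square_df > x) using the Wilson-Hilferty
    cube-root normal approximation (adequate for a rough screen)."""
    if df < 1:
        raise ValueError("need at least one degree of freedom")
    mean = 1 - 2 / (9 * df)
    sd = math.sqrt(2 / (9 * df))
    z = ((x / df) ** (1 / 3) - mean) / sd
    return 1 - NormalDist().cdf(z)


# Hypothetical data and fitted probabilities from a 2-parameter model
n = [50, 50, 50, 50]
affected = [2, 5, 12, 30]
probs = [0.045, 0.095, 0.23, 0.60]
stat = pearson_chi_square(n, affected, probs)
df = len(n) - 2  # groups minus fitted parameters
print(f"Pearson chi-square = {stat:.3f}, df = {df}, "
      f"approximate p = {chi_square_sf_approx(stat, df):.3f}")
```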

      Page 71, line 23.  How large?

      Page 73, line 13.  Describe the likelihood ratio test and add a reference.
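For the requested description: the likelihood ratio test compares two nested models. Twice the difference in their maximized log-likelihoods is referred to a chi-square distribution whose degrees of freedom equal the difference in the number of parameters.

The sketch below covers the common case of one extra parameter. In that case the chi-square tail probability has an exact closed form through the complementary error function. The log-likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models that differ by one parameter.

    Returns (statistic, p-value), with statistic = 2 * (lnL_full - lnL_reduced)
    and p = P(chi-square_1 > statistic) = erfc(sqrt(statistic / 2)).
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)  # guard tiny negatives
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical maximized log-likelihoods for a model with and without
# one additional parameter
stat, p = likelihood_ratio_test_1df(loglik_reduced=-95.40, loglik_full=-93.10)
print(f"LR statistic = {stat:.2f}, p = {p:.3f}")
```

With these values the statistic is 4.6 and p is about 0.03. In this hypothetical case the extra parameter would be judged to improve the fit at the 5% level.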

      Page 73, lines 15-17. As this reviewer noted earlier, additional details and
      references on Akaike's Information Criterion are needed.

      Page 74, line 21.  Add a few more details about the asymptotic normality
      approach for constructing confidence  limits.

      Page 75, lines 13-18.  Could SAS macros be written and included  as part of
      the EPA software?

      Pages 76-78. Authors need to provide additional details on how individual
      versus group data were used in the model.

      Page 76, Figure 4. Was there a significant trend  test for these data? What
      was the GOF statistic? Please add these details.

      Page 78, lines 21-28 and Page 79, Figure 4. The explanation for Figure 4
      needs improvement. A key to identify line types is needed, and a larger
      range of line styles may be necessary to clarify responses. For example,
      lines 23 and 24 refer to the lower solid line, which this reviewer could
      not identify.

      Page 80, Table 1. Add what incidence is evaluated to the table heading.

      Page 81, lines 4-6. What provision is available in the guidelines?

      Page 81, Table 2. Were these values obtained using Fleiss, 1981 approach?
      If so, please cite this reference.

      Page 82, lines 6-8. How did the author  assess the "excellent model fit?"
      What are the GOF statistics?

      Page 83, Table 4. Authors should carry this assessment to a conclusion.
      Illustrate how the databases could be combined.

      Page 85, Table 1. What are the units listed for fetal wt.?

      Page 86, Table 2. What incidence is given in this table?  Please label.

      Page 91, Example 4.  It appears to this reviewer that a Fleiss, 1981 based
      table for N=50 is needed to obtain the values in Table 1. If this is so,
      please add.

      Page 92, line 14.  Authors state that the  data could not be adequately fit.
      How do the authors  know this? Was the GOF statistic rejected?

      Page 92, Example 4.  Authors need to give "bottom line".  What BMD/C will be
      used for risk management?

      Page 94, Appendix E. Model development looks great, but the text should
      indicate more clearly throughout which models and features will be included
      in the model package. Will any GEE approaches be included?

      Appendix:  In general, the examples were set-up well and this reviewer liked
      the inclusion of a summary  of the main  points that were to be illustrated in
      each example. Overall however, the examples need additional details, perhaps
      even example data input in a format that will be compatible with the model
      package that you are putting together. Also provide basic statistical
      information and trend analysis for each of these examples.  Include
      statistics for GOF and likelihood ratio tests if applied.  All examples
      should have NOAEL and LOAEL values given for comparison. It was not always
      clear to me that a risk assessor had enough details from these pages to identify
      and independently determine all of these values. Authors should ensure that
      each example stands alone and no other resources are needed to do any of
      these calculations.

      Glossary.  This technical document needs a glossary. Key words that need to
      be  included in the glossary (not all inclusive  list):

      Akaike Information Criterion
      Asymptotic Normality Approach
      Lifetable Methods
      Likelihood Theory
      Likelihood Ratio Statistics
      Goodness of Fit Statistics
      Summary Incidence Methods

Jeff Fowles

U.S. EPA Benchmark Dose Workshop Premeeting Comments
J. Fowles, page 1

Comments on the USEPA external review draft document: Benchmark Dose Technical Guidance
Document. August 9, 1996. EPA/600/P-96/002A.
1) Selection of Studies and Responses for Benchmark Dose/C Analysis.

It is important to acknowledge that, particularly for acute non-cancer toxicity studies, the data often
come from small groups (i.e., 5 or 6 animals per group). This means that observing responses in
the range of the BMR will not be possible for many of these studies. Therefore, the criteria for data
used in BMD/C calculations for acute toxicity studies seem to require some flexibility. This concern
would probably only apply to non-cancer data sets.

There are methods for determining appropriateness of combining data sets for analyses. One example of
such an approach is given in the appendix under "combining data sets".

2) Selection of Benchmark Response Level

The use of biological significance for continuous data seems a reasonable approach. This approach would
presumably supersede any lack of statistical significance if the data are expressed as dose-related changes
in the mean. The transformation of continuous to quantal data for the analyses is a straightforward and
logical process.
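The transformation described above can be done in a simple way if responses are assumed to be normally distributed with a common standard deviation in every group. An "adverse" cutoff is fixed relative to the control distribution, and the expected fraction of animals beyond it is computed for each dose group. The cutoff at two control standard deviations below the control mean, and all of the group means, are illustrative assumptions rather than values from the guidance document.

```python
from statistics import NormalDist


def fraction_below_cutoff(group_mean, sd, cutoff):
    """Expected fraction of animals whose response falls below `cutoff`,
    assuming normally distributed responses with standard deviation sd."""
    return NormalDist(group_mean, sd).cdf(cutoff)


control_mean, sd = 100.0, 10.0     # hypothetical control mean and common SD
cutoff = control_mean - 2 * sd     # assumed adversity cutoff: 2 SD below control
for group_mean in (100.0, 95.0, 90.0, 80.0):  # hypothetical dose-group means
    p = fraction_below_cutoff(group_mean, sd, cutoff)
    print(f"group mean {group_mean:.0f}: expected fraction affected = {p:.3f}")
```

Under these assumptions the expected fraction beyond the cutoff rises from about 2% in controls to about 16% when the group mean falls by one standard deviation. Those fractions could then be modeled like any other quantal response.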

The explanation of the limit of detection needs further detail. As it is currently written, there is little actual
guidance provided, and much is left to the risk assessor's understanding of relatively sophisticated
statistics. For example, according to the guidelines, a power level must be chosen for each species and
endpoint. The risk assessor must then decide on an appropriate incidence of detecting a difference from
background (50% is given as an example in the document; is this a default recommendation?). How
should one go about selecting this distinguishing incidence?
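For quantal data, this step can be made concrete by inverting a sample-size formula for comparing two proportions. The sketch below uses the continuity-corrected two-proportion formula as it is commonly quoted from Fleiss (1981). It searches for the smallest treated-group incidence that is detectable with a given number of animals per group.

The one-sided alpha of 0.05, the power of 0.80, the search grid, and the function names are assumptions for illustration only. They are not defaults of the guidance document, and the sketch does not settle which incidence the risk assessor should choose.

```python
import math
from statistics import NormalDist


def fleiss_n_per_group(p0, p1, alpha=0.05, power=0.80):
    """Animals per group needed to detect incidence p1 against background p0.

    Two-proportion formula with continuity correction as commonly quoted from
    Fleiss (1981); alpha is treated as one-sided here (use alpha/2 for two-sided).
    """
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha)
    z_b = nd.inv_cdf(power)
    p_bar = (p0 + p1) / 2
    delta = abs(p1 - p0)
    n_prime = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
               + z_b * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2 / delta ** 2
    # Continuity correction applied to the uncorrected size n_prime
    return n_prime / 4 * (1 + math.sqrt(1 + 4 / (n_prime * delta))) ** 2


def min_detectable_incidence(p0, n, alpha=0.05, power=0.80, step=0.001):
    """Smallest treated-group incidence, on a grid of width `step`, whose
    required group size does not exceed n. Returns None if none is found."""
    i = 1
    while p0 + i * step < 1:
        p1 = p0 + i * step
        if fleiss_n_per_group(p0, p1, alpha, power) <= n:
            return p1
        i += 1
    return None


for p0 in (0.0, 0.05):
    for n in (10, 20, 50):
        p1 = min_detectable_incidence(p0, n)
        label = "none below 100%" if p1 is None else f"{p1:.1%}"
        print(f"background {p0:.0%}, n = {n} per group: detectable incidence {label}")
```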

The authors should make some clarifications in certain other areas as well. On page 59, lines 25-26 there is
a statement that there is a goal of achieving an "average BMD/C:NOAEL ratio of one." This "goal" has
not been discussed in any previous portion of the document and should be explained.

The defaults for the quantal and continuous data, in general, seem reasonable. I did not understand the
percentages given on page 60, lines 22-23. It seems that the percentage for extra risk in the example
should be 50%, not 10%.

In our experience, there are arguments supported in this guideline for use of a 5% BMR and the log-normal
probit model for acute responses (single exposure studies of membrane irritation, neurological
disturbances, frank effects, or lethality). The data for this default are presented in the attachment provided.

3) Model selection and fitting:

The models presented appear to be adequate for the majority of chronic and developmental data sets.
However, the log-normal probit model has historically been the dominant model used in acute toxicity
studies, and our preliminary analysis indicates that the probit model compares favorably with the Weibull
model when proximity to NOAELs and distance between the maximum likelihood estimate and the 95% lower
confidence limit are used as evaluation criteria. Since the probit model is included in the software planned
for release (as noted on the last page of the document), it should be added as an acceptable default method
for acute toxicity data sets.
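For comparison with the other quantal models, the sketch below assumes a log-normal probit model with background, P(d) = gamma + (1 - gamma) Phi(alpha + beta ln d), where Phi is the standard normal distribution function. Extra risk is then Phi(alpha + beta ln d), so the benchmark dose for a 5% or 10% BMR follows directly from the normal quantile function. The parameters are hypothetical and are not estimates from any acute toxicity data set.

```python
import math
from statistics import NormalDist


def log_probit_prob(dose, gamma, alpha, beta):
    """Response probability under the assumed log-normal probit model."""
    if dose <= 0:
        return gamma
    return gamma + (1 - gamma) * NormalDist().cdf(alpha + beta * math.log(dose))


def log_probit_bmd(bmr, alpha, beta):
    """Dose at which extra risk, Phi(alpha + beta * ln d), equals bmr."""
    return math.exp((NormalDist().inv_cdf(bmr) - alpha) / beta)


# Hypothetical parameters for an acute-exposure data set (illustration only)
gamma, alpha, beta = 0.02, -3.0, 2.0
for bmr in (0.05, 0.10):
    print(f"BMR {bmr:.0%}: BMD = {log_probit_bmd(bmr, alpha, beta):.3f}")
```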

4) Use of confidence limits:

This section of the document seems complete.

5) Selection of the BMD/C to Use as the Point of Departure for Cancer and Non-cancer Health Effects

There should be a preliminary analysis comparing models in order to determine if the default factor of 3 for
differences in BMD/C results is appropriate for eliminating concerns about model selection.
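A preliminary comparison of this kind could start from a simple spread check: do the BMD/C estimates from all models judged to fit adequately lie within a stated factor of one another? The sketch below applies the factor of 3 mentioned above to hypothetical estimates from three model families named in these comments. Applied across many real data sets, such a check would show how often the default actually makes model choice immaterial.

```python
def within_factor(estimates, factor=3.0):
    """True if the largest estimate is no more than `factor` times the smallest."""
    values = list(estimates)
    if not values or min(values) <= 0:
        raise ValueError("need at least one positive estimate")
    return max(values) / min(values) <= factor


# Hypothetical BMD/C estimates from models that passed a goodness-of-fit screen
bmds = {"log-logistic": 4.1, "Weibull": 2.6, "quantal polynomial": 3.3}
ratio = max(bmds.values()) / min(bmds.values())
print(f"max/min = {ratio:.2f}; within a factor of 3: {within_factor(bmds.values())}")
```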

A discussion of the theory behind the Akaike Information Criterion, including its major assumptions, would
be helpful.

6) General comments

The document is written in an informal easy-to-read style, which is useful for the general readership.
There is a good deal of very useful information conveyed in the document. However, in certain areas, the
document relies on a good deal of implicit knowledge and contains unsupported default assumptions. The
decisions to use defaults in key areas are not clearly explained (e.g. why 10% is now the recommended
default for quantal data sets, when previously the benchmark dose workshop concluded 5% or 10% were
adequate (Barnes et al., 1995)). A more detailed discussion about the mathematical assumptions that are
integral in the different models would be helpful (e.g. Weibull versus quantal polynomial models, versus
probit model). Risk assessors need to be educated about the assumptions they are making when using these
models to avoid their reliance on "black-box" software outputs.

Other specific comments:

Section on uncertainty factors (page 29): The document states that the only change in uncertainty factors
with BMD/C methodology would be the absence of a LOAEL to NOAEL uncertainty factor. However, the
document also states that there is a "goal" of achieving a BMD/C to NOAEL ratio of 1, on average (page
59, lines 25-26). In other words, the BMD/C is simply trying to approximate as closely as possible, a
NOAEL. The problem with these two simultaneous statements is that the basic premise of conducting
BMD/C analyses is to improve the considerations of dose-response and sample size in our estimations of a
threshold. If we have confidence in our methodology, then a) it is not necessary to judge the results by
their proximity to the NOAEL, and b) uncertainty in the estimate has been reduced. We have proposed
that the uncertainty would be reduced in intra-animal variability in response, which likely has some bearing
on inter-individual variability.

Typographical and/or grammatical errors:

Page 27, line 5: "a" should be changed to "of".

[Attachment (J. Fowles): data table referred to in comment 2 above. Legible entries cite Verberk et al. (1977), Silver and McGrath, MacEwen and Vernot, Werner et al. (1943), and Kulle et al. (1987). The remaining table contents are not legible in the scanned original.]