NAC-0175_R0
Analysis of UCL Simulations
at the Lognormal Distribution
Performance of the Chebyshev UCL
Estimators and Improved
Recommendation Rules
6 January 2022
Prepared by
Neptune and Company, Inc.
1435 Garrison St, Suite 201, Lakewood, CO 80215
-------
Analysis of UCL Simulations at the Lognormal Distribution
1. Title: Analysis of UCL Simulations at the Lognormal Distribution
2. Filename: Analysis UCL lognormal sims report_Neptune formatted.docx
3. Description: The report analyzes the results of simulations of upper confidence limits
(UCL) for the mean of the lognormal distribution, using various UCL procedures, after
generated lognormal data sets were filtered through the goodness-of-fit (GoF) rules
recommended for ProUCL 5.2. The purpose was to get an approximate characterization of
the behavior of various UCLs calculated by ProUCL in data that have been identified by the
GoF rules as putatively lognormal. The analysis culminates with the identification of a
simple set of rules for recommending the choice of UCLs in data that the GoF rules have
identified as tentatively lognormal.
4. Originator: John H. Carson Jr., 6 Jan 2022
5. Reviewer: Paul Black, 4 Jan 2022
6. Remarks:
CONTENTS
FIGURES
TABLES
ACRONYMS AND ABBREVIATIONS
1.0 Introduction
2.0 Simulation
3.0 UCL Properties
3.1 Bounded Loss for UCLs
3.2 Unbounded Loss for UCLs
3.2.1 Unbounded Coverage Loss for UCLs
3.2.2 Unbounded Accuracy Loss for UCLs
3.2.3 Combined Unbounded Loss
4.0 UCL Plots
4.1 Discussion of UCL Plots
5.0 UCL Recommendation Rules
5.1 Algorithm for Recommendation Rules
5.2 Risk Profiles
5.3 UCL Recommendation Modeling
5.4 Tentative Lognormal UCL Recommendations
6.0 Conclusion
7.0 References
Appendix A: Detailed UCL Plots using Deciles of Log SD
Appendix B: Session Info
FIGURES
Figure 1. Weighted linear bounded loss profiles for various parameter values. Values of $b_+$ are chosen as: i) $b_+ = b_- = 1$; ii) $b_+ = 1.5$ matches the maximum loss for under- and overestimation when $a = 1$; iii) $b_+ = 1.67$ matches the maximum loss for under- and overestimation when $a = 0.5$; and iv) $b_+ = 1.91$ matches the maximum loss for under- and overestimation when $a = 0.1$
Figure 2. Weighted logit squared error loss examples for coverage. $c_-$ = under-coverage penalty coefficient. $c_+$ = over-coverage penalty coefficient. Two different values of $c_+$ are shown
Figure 3. Weighted probit squared error loss examples for coverage. $c_-$ = under-coverage penalty coefficient. $c_+$ = over-coverage penalty coefficient. Two different values of $c_+$ are shown
Figure 4. Weighted mean squared error loss examples for inaccuracy (the combination of bias and imprecision). $b_-$ = negative bias penalty coefficient. $b_+$ = positive bias penalty coefficient. Three different values of $b_+$ are shown, corresponding to conservative, intermediate, and accurate estimates
Figure 5. UCL summary for Lognormal with log SD in (0.0831,0.859]
Figure 6. UCL Bounded Loss for Lognormal with log SD in (0.0831,0.859]
Figure 7. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with log SD in (0.0831,0.859]
Figure 8. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with log SD in (0.0831,0.859]
Figure 9. UCL summary for Lognormal with log SD in (0.859,1.37]
Figure 10. UCL Bounded Loss for Lognormal with log SD in (0.859,1.37]
Figure 11. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with log SD in (0.859,1.37]
Figure 12. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with log SD in (0.859,1.37]
Figure 13. UCL summary for Lognormal with log SD in (1.37,1.81]
Figure 14. UCL Bounded Loss for Lognormal with log SD in (1.37,1.81]
Figure 15. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with log SD in (1.37,1.81]
Figure 16. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with log SD in (1.37,1.81]
Figure 17. UCL summary for Lognormal with log SD in (1.81,5.41]
Figure 18. UCL Bounded Loss for Lognormal with log SD in (1.81,5.41]
Figure 19. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with log SD in (1.81,5.41]
Figure 20. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with log SD in (1.81,5.41]
Figure 21. Average of eight unbounded loss functions for various UCLs by log-scale SD and sample size
Figure 22. Maximum of eight unbounded loss functions for various UCLs by log-scale SD and sample size
Figure 23. Decision tree for minimum average loss, pruned to two levels
Figure 24. UCL summary for Lognormal with Std Dev of Logs in (0.0831,0.576]
Figure 25. UCL Bounded Loss for Lognormal with Std Dev of Logs in (0.0831,0.576]
Figure 26. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev of Logs in (0.0831,0.576]
Figure 27. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std Dev of Logs in (0.0831,0.576]
Figure 28. UCL summary for Lognormal with Std Dev of Logs in (0.576,0.773]
Figure 29. UCL Bounded Loss for Lognormal with Std Dev of Logs in (0.576,0.773]
Figure 30. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev of Logs in (0.576,0.773]
Figure 31. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std Dev of Logs in (0.576,0.773]
Figure 32. UCL summary for Lognormal with Std Dev of Logs in (0.773,0.982]
Figure 33. UCL Bounded Loss for Lognormal with Std Dev of Logs in (0.773,0.982]
Figure 34. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev of Logs in (0.773,0.982]
Figure 35. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std Dev of Logs in (0.773,0.982]
Figure 36. UCL summary for Lognormal with Std Dev of Logs in (0.982,1.17]
Figure 37. UCL Bounded Loss for Lognormal with Std Dev of Logs in (0.982,1.17]
Figure 38. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev of Logs in (0.982,1.17]
Figure 39. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std Dev of Logs in (0.982,1.17]
Figure 40. UCL summary for Lognormal with Std Dev of Logs in (1.17,1.37]
Figure 41. UCL Bounded Loss for Lognormal with Std Dev of Logs in (1.17,1.37]
Figure 42. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev of Logs in (1.17,1.37]
Figure 43. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std Dev of Logs in (1.17,1.37]
Figure 44. UCL summary for Lognormal with Std Dev of Logs in (1.37,1.57]
Figure 45. UCL Bounded Loss for Lognormal with Std Dev of Logs in (1.37,1.57]
Figure 46. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev of Logs in (1.37,1.57]
Figure 47. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std Dev of Logs in (1.37,1.57]
Figure 48. UCL summary for Lognormal with Std Dev of Logs in (1.57,1.73]
Figure 49. UCL Bounded Loss for Lognormal with Std Dev of Logs in (1.57,1.73]
Figure 50. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev of Logs in (1.57,1.73]
Figure 51. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std Dev of Logs in (1.57,1.73]
Figure 52. UCL summary for Lognormal with Std Dev of Logs in (1.73,1.95]
Figure 53. UCL Bounded Loss for Lognormal with Std Dev of Logs in (1.73,1.95]
Figure 54. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev of Logs in (1.73,1.95]
Figure 55. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std Dev of Logs in (1.73,1.95]
Figure 56. UCL summary for Lognormal with Std Dev of Logs in (1.95,2.25]
Figure 57. UCL Bounded Loss for Lognormal with Std Dev of Logs in (1.95,2.25]
Figure 58. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev of Logs in (1.95,2.25]
Figure 59. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std Dev of Logs in (1.95,2.25]
Figure 60. UCL summary for Lognormal with Std Dev of Logs in (2.25,5.41]
Figure 61. UCL Bounded Loss for Lognormal with Std Dev of Logs in (2.25,5.41]
Figure 62. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev of Logs in (2.25,5.41]
Figure 63. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std Dev of Logs in (2.25,5.41]
TABLES
Table 1. Loss function parameters
Table 2. Decision rule output for UCL minimum average risk tree
ACRONYMS AND ABBREVIATIONS
CI        confidence interval
CoC       chemical of concern
CV        coefficient of variation
DQO       data quality objectives
GoF       goodness-of-fit
LOESS     locally estimated scatterplot smoothing
log SD    log-scale standard deviation
LOR       log of the odds ratio
MLE       maximum likelihood estimator
RelRMSE   relative Root Mean Squared Error
RPart     Recursive Partitioning
UCL       upper confidence limit
UCLmal    UCL type with minimum average loss
UCLmml    UCL type with minimax loss
1.0 Introduction
This report presents a preliminary analysis of the results of simulations of upper confidence
limits (UCL) for the mean of the lognormal distribution after data sets are filtered through the
goodness-of-fit (GoF) rules recommended for ProUCL 5.2. The purpose is to obtain an
approximate characterization of the behavior of various UCLs calculated by ProUCL in data that
have been identified by the GoF rules as putatively lognormal. The analysis culminates with the
identification of a simple set of rules for recommending the choice of UCLs in data that the GoF
rules have identified as tentatively lognormal.
A UCL for the mean is the upper end of a one-sided confidence interval (CI) for the mean. The
coverage of a CI is the frequency of the CI covering the true mean, synonymous with the
confidence level. For a UCL, coverage is the frequency of over-estimating the true mean.
Although a UCL is defined in terms of a one-sided CI, in environmental practice it is very often
used as a conservative estimator of the mean concentration of a chemical of concern (CoC),
which is intended to represent the consequence of "reasonable maximum exposure" to the CoC.
The word "reasonable" must be emphasized here. Summing risks that are each based on a UCL
can make a risk assessment exceedingly conservative, but that issue will not be
addressed here. UCLs for human health risk assessment are usually calculated for a confidence
level of 95% (significance level of 5%). Estimators with low bias and low variance are desirable.
In the risk assessment setting, value should be placed on:
• low expected frequency of underestimating the mean (roughly equal to 100% minus the
confidence level),
• small positive bias,
• small average estimation errors, and
• small probability of large overestimation.
UCLs for the mean are also used to assess compliance with environmental standards. Coverage
is perhaps more important in this setting. However, the consequences of exceeding a regulatory
standard by a large amount are generally deemed to be severe. This again emphasizes the
importance of ensuring that UCL estimators do not produce large overestimates or that they do
so very infrequently.
2.0 Simulation
The simulation used in this study generated 10,000 replicate data sets for each lognormal
distribution used. These distributions have a common mean of 100 and a wide range of
coefficients of variation (CVs) (25 values from 0.1 to 20) covering behavior from very slightly
skewed to highly skewed. Since these are all lognormal distributions, the CV determines the
standard deviation of logs of the values and vice versa. The CV is used as an index parameter for
the populations simulated in order to easily fit simulations for other distributions (and mixtures)
into the same framework.
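The one-to-one relation between the CV and the log-scale standard deviation can be sketched as follows. This is a minimal illustration using the standard lognormal moment identities; the function names are hypothetical and are not taken from the simulation code.

```python
import math

def lognormal_params(mean, cv):
    """Return (log-scale mean, log-scale SD) for a lognormal with the given mean and CV."""
    sigma_log = math.sqrt(math.log(1.0 + cv ** 2))   # log SD depends on the CV only
    mu_log = math.log(mean) - 0.5 * sigma_log ** 2   # chosen so the arithmetic mean equals `mean`
    return mu_log, sigma_log

def cv_from_log_sd(sigma_log):
    """Invert the relation: the CV implied by a given log-scale SD."""
    return math.sqrt(math.exp(sigma_log ** 2) - 1.0)

# Common mean of 100, as in the simulation; the three CVs are illustrative picks
# from the stated range of 0.1 to 20.
for cv in (0.1, 1.0, 20.0):
    mu_log, sigma_log = lognormal_params(100.0, cv)
    print(f"CV={cv:5.1f}  log-scale mean={mu_log:.4f}  log SD={sigma_log:.4f}")
```

Because the log SD is a function of the CV alone, fixing the mean at 100 and varying the CV sweeps the skewness of the population without changing its mean.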
The sample sizes of the simulated data sets range from 5 to 1,000 with 47 different values. For
each replicate, a sample of size 1,000 was generated as a parent sample from which samples of
the various sizes required were selected for computation of summary statistics and UCLs. The
UCLs simulated include the Chebyshev 95% UCL, the Chebyshev 90% UCL, the H-UCL, the t-UCL,
the skewed t-UCL, the adjusted Gamma UCL, Hall's bootstrap UCL, the bootstrap-t UCL
and the BCa bootstrap UCL. The UCLs had a target coverage level of 95%, except for the
Chebyshev 90% UCL. The results, which took several days to compute using parallel computing
with up to 50 CPU cores, give an accurate characterization of the behavior of the UCLs
calculated.
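As a rough illustration of how one of these estimators and its empirical coverage can be computed, the sketch below assumes the usual Chebyshev-inequality form of the UCL, the sample mean plus sqrt(1/alpha - 1) standard errors; the report does not state the formula, so that form is an assumption. The sketch also omits the GoF filtering and the parent-sample subsampling described above, and all function names are hypothetical.

```python
import math
import random
import statistics

def chebyshev_ucl(sample, confidence=0.95):
    """Chebyshev-inequality UCL (assumed form): mean + sqrt(1/alpha - 1) * s / sqrt(n)."""
    n = len(sample)
    alpha = 1.0 - confidence
    k = math.sqrt(1.0 / alpha - 1.0)
    return statistics.fmean(sample) + k * statistics.stdev(sample) / math.sqrt(n)

def simulate_coverage(true_mean, cv, n, reps, confidence=0.95, seed=1):
    """Fraction of simulated lognormal samples whose UCL is at or above the true mean."""
    rng = random.Random(seed)
    sigma_log = math.sqrt(math.log(1.0 + cv ** 2))
    mu_log = math.log(true_mean) - 0.5 * sigma_log ** 2
    covered = 0
    for _ in range(reps):
        sample = [rng.lognormvariate(mu_log, sigma_log) for _ in range(n)]
        if chebyshev_ucl(sample, confidence) >= true_mean:
            covered += 1
    return covered / reps

# Illustrative settings only; the study itself used 10,000 replicates per distribution.
for cv in (0.5, 2.0):
    print(cv, simulate_coverage(true_mean=100.0, cv=cv, n=20, reps=2000))
```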
The computations were performed in R 4.1.1. The H-UCL was computed using the EnvStats
library version 2.4.0.
The GoF selection rules, revised for ProUCL 5.2, are that data sets are treated as lognormal if:
• they are rejected as normal at level 0.01 by both the Shapiro-Wilk and Anderson-Darling
goodness-of-fit tests,
• they are rejected as being from a Gamma distribution at level 0.05 by both the
Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests, and
• they are not rejected as lognormal at level 0.1 by either the Shapiro-Wilk or the
Anderson-Darling goodness-of-fit test.
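The three rules above can be sketched as a single predicate on the six test p-values. This is only an illustration: the dictionary keys are hypothetical names, the p-values are assumed to have been computed elsewhere, and the wording of the third rule admits two readings (at least one lognormal test fails to reject, or neither rejects), so the choice is exposed as a flag rather than settled here.

```python
def treated_as_lognormal(p, lognormal_needs_both=False):
    """Apply the three GoF rules to a dict of p-values (hypothetical key names)."""
    # Rule 1: normality rejected at level 0.01 by both tests.
    rejected_normal = p["normal_sw"] < 0.01 and p["normal_ad"] < 0.01
    # Rule 2: Gamma rejected at level 0.05 by both tests.
    rejected_gamma = p["gamma_ad"] < 0.05 and p["gamma_ks"] < 0.05
    # Rule 3: lognormality not rejected at level 0.1 (reading controlled by the flag).
    passes = [p["lognormal_sw"] >= 0.10, p["lognormal_ad"] >= 0.10]
    lognormal_ok = all(passes) if lognormal_needs_both else any(passes)
    return rejected_normal and rejected_gamma and lognormal_ok

# Made-up p-values for illustration only.
example = {
    "normal_sw": 0.001, "normal_ad": 0.002,
    "gamma_ad": 0.010, "gamma_ks": 0.020,
    "lognormal_sw": 0.300, "lognormal_ad": 0.050,
}
print(treated_as_lognormal(example))
print(treated_as_lognormal(example, lognormal_needs_both=True))
```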
Data filtered according to these criteria were used to develop the performance measures and
recommendation rules for UCLs for designated lognormal data in ProUCL 5.2. The ultimate
objective is to model and improve the behavior of ProUCL 5.2, and later versions, when it is
given approximately lognormal data.
3.0 UCL Properties
A CI (including one-sided CIs) for a parameter is defined as a random interval that covers the
true (unknown) value of the parameter with specified probability (coverage probability or
confidence coefficient) from the point of view of the procedure generating the random interval
being applied over and over again to data generated by the same random process. In statistical
literature, the two most important considerations for a CI are coverage and accuracy. There is
some disagreement about coverage. Most authors (such as Mood, Graybill, Boes, Randles,
Wolfe, Hollander, C.R. Rao, Davison, and Hinkley) indicate the coverage probability should be
at least approximately equal to the nominal value (confidence level) for a valid CI. A minority of
authors (such as Bickel and Doksum) hold that any CI for which the coverage probability equals
or exceeds the confidence level is valid. All of the authors listed have well-known books, listed
in this report's References section, which discuss confidence intervals.
All of the authors listed above hold that accuracy is a very important consideration. Accuracy is
a measure of the average closeness of the confidence limits to the true value of the parameter. In
the case of two-sided confidence intervals, the average length of the CI is universally held to be a good
measure of accuracy. Generally, CIs are associated with the acceptance region of a statistical
hypothesis test constructed for the parameter in question. Most powerful tests are associated with
shortest length two-sided CIs and with most accurate one-sided confidence intervals. A UCL, U1,
is said to be more accurate than another UCL, U2, if both UCLs have the same coverage
probability, and on average U1
-------
compare. This suggests that estimation errors for UCLs be considered in terms of relative error,
$(U_i - \mu)/\mu$.
As far as Neptune and Company, Inc. (Neptune) is aware, accuracy of CIs has not previously (in
version 5.1 and earlier) been considered in selecting methodologies for UCL computation in
ProUCL nor in their recommendation in specific cases. Here are three examples that illustrate the
importance of considering accuracy as well as coverage:
1. Suppose that one is attempting to estimate the average height of Americans, and measures
the height of a few individual Americans. One could construct a UCL that is the observed
sample average plus one inch. Or one could construct a UCL that is the observed sample
average plus 10 feet. The first UCL is likely to be somewhat close to the true average, but its
coverage probability is unknown. The latter UCL will have exceedingly good coverage
probability (100%, as there are no individuals over the height of 10 feet). However, the latter
UCL is clearly a ridiculous overestimate. It may be clear that it is an overestimate in this
example, but were soil concentrations the measured variable instead, it might not be so
obvious that this was a ridiculous overestimate.
2. Consider a UCL for mean concentration that is constructed as follows: all data is ignored.
One instead uses a random number generator so that, 95% of the time, the UCL is chosen to
be one million parts per million, and, the other 5% of the time, the UCL is chosen to be zero
parts per million. This UCL guarantees exactly 95% coverage probability. However, when it
overestimates the mean, it is likely to be a gross overestimation, and when it underestimates
the mean, it is likely to be a gross underestimation. Few would consider this UCL to be a
good method, despite its ideal coverage probability.
3. Suppose there were a UCL method that could construct an estimate that was equal to exactly
0.9999999 times the true mean every single time. Such a UCL method would have 0%
coverage probability, and yet would lead to an extremely accurate estimate of the true mean.
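Examples 2 and 3 can be checked numerically with a short sketch. The true mean of 50 parts per million is a hypothetical value chosen only for illustration, and the helper name is not from the report.

```python
import random

def coverage_and_error(ucls, true_mean):
    """Return (empirical coverage, mean absolute relative error) for a list of UCLs."""
    n = len(ucls)
    coverage = sum(u >= true_mean for u in ucls) / n
    mean_abs_rel_error = sum(abs(u - true_mean) / true_mean for u in ucls) / n
    return coverage, mean_abs_rel_error

rng = random.Random(0)
true_mean = 50.0   # hypothetical true mean, in parts per million
reps = 20000

# Example 2: ignore the data; report one million ppm 95% of the time, otherwise zero.
coin_flip = [1e6 if rng.random() < 0.95 else 0.0 for _ in range(reps)]
# Example 3: always report 0.9999999 times the true mean.
near_miss = [0.9999999 * true_mean] * reps

print("coin flip:", coverage_and_error(coin_flip, true_mean))
print("near miss:", coverage_and_error(near_miss, true_mean))
```

The first procedure has coverage near the nominal level with an enormous relative error; the second has zero coverage with a negligible relative error.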
These examples suggest that coverage probability alone is a poor metric by which to judge a
UCL; accuracy should be considered as well. A slight underestimate is not necessarily worse
than a gross overestimate; some balance is needed. This would better align ProUCL with the data
quality objectives (DQO) process, which addresses both false positive and negative error rates.
When dealing with human health and ecological risks, there is a desire to be conservative, but
there are limits to this, as extreme conservatism could lead to expending great resources at a site
that does not require it, and preventing those resources from being utilized to better effect
elsewhere. Further, since risk is generally calculated as a linear function of the mean
concentration (represented by the UCL), slight overestimates or underestimates of the mean
concentration do not have a compounding effect on subsequent risk calculations.
Furthermore, the consequences of exceeding a regulatory standard are generally deemed to be
increasingly severe with increasing magnitude of the exceedance.
The considerations above can be formulated into loss functions, and UCL procedures can be
selected or ranked based on their risk (that is, their expected or long run average performance
with respect to the relevant loss functions). The use of loss functions in the face of uncertainty to
choose actions that minimize risk (expected loss) was developed in a branch of mathematics
known as Decision Theory. Decision Theory is used extensively in many fields of practice, such
as business, economics1, applied mathematics, statistics, computer science, and engineering.
According to Winkler (1972), writing from the decision theoretic point of view, "[a]n optimal
interval estimate ... minimizes the decision maker's expected loss." Lehmann (1986) defines a
uniformly most accurate lower confidence bound, and analogously a uniformly most accurate
upper confidence bound (limit), as the estimator that minimizes a specified estimation loss
function while satisfying the desired coverage.
Loss and risk functions can be fine-tuned to reflect the requirements of a very specific
application of a UCL. Conversely, a range of loss and risk functions can be used to show that a
UCL procedure is robust for a range of applications. This is the approach taken to evaluate UCL
performance in ProUCL and is the motivation for the loss functions described in the following
sections.
3.1 Bounded Loss for UCLs
Following Casella, Hwang, and Robert (1990), Casella and Hwang (1991), and Casella, Hwang,
and Robert (1993), bounded linear loss functions for UCLs are the sum of a size function that
penalizes inaccuracy and the 0-1 loss that penalizes lack of coverage. The average of the 0-1 loss
over a sample equals 1 minus the empirical coverage, so it is a penalty for lack of coverage. In
the case of one-sided intervals, specifically UCLs with 0 as a lower bound, the size of the
interval measured as its length equals the UCL. Therefore, it makes sense to consider an
alternative size measure that measures the closeness of the UCL to the true mean. An even
function of the relative difference of the UCL from the true mean works well. The associated risk
is the average of the loss function over a population or over a large sample, such as one
generated by simulation.
1 In business and economics, one maximizes expected utility, with utility being the negative of loss.
One pair of bounded linear loss and risk functions, using absolute relative error in the rational
size function defined in Casella, Hwang, and Robert (1990, page 10), is given by:
$$
\begin{aligned}
\mu &= \text{true mean} \\
U_i &= \mathrm{UCL}_i, \text{ for } i = 1, \ldots, N, \text{ are UCL estimates} \\
q_i &= I(U_i < \mu) = \begin{cases} 0, & U_i \ge \mu \\ 1, & U_i < \mu \end{cases} \\
\eta_i &= \frac{|U_i - \mu|}{\mu} \\
L_B(U_i \mid \mu, a) &= \frac{\eta_i}{a + \eta_i} + q_i \\
\bar{q} &= \frac{1}{N} \sum_{i=1}^{N} q_i \\
U &= (U_1, \ldots, U_N) \\
R_B(U \mid \mu, a) &= \frac{1}{N} \sum_{i=1}^{N} \frac{\eta_i}{a + \eta_i} + \bar{q}
\end{aligned}
$$
where $I(u < \mu)$ is the indicator function that equals 1 if $u < \mu$ and 0 otherwise, $q_i$ is the 0-1
loss, $\eta_i$ is the absolute relative error of the UCL estimate, $\bar{q}$ is the average of the 0-1 loss (an
estimate of the probability that the interval does not cover the true value), $L_B(U_i \mid \mu, a)$ is the loss
incurred by an individual UCL estimate, and $R_B(U \mid \mu, a)$ is the expected loss or the loss averaged
over a population or a large sample. The use of relative absolute error in the loss makes it
consistent and easy to compare for different values of the true mean.
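A minimal sketch of this loss and its sample-average risk follows, written directly from the definitions in this section (the 0-1 loss plus the rational size function of the absolute relative error). The function names are hypothetical.

```python
def bounded_loss(ucl, mu, a):
    """Bounded loss of one UCL estimate: eta / (a + eta) plus the 0-1 coverage loss."""
    eta = abs(ucl - mu) / mu          # absolute relative error
    q = 1.0 if ucl < mu else 0.0      # 0-1 loss: 1 when the UCL falls below the true mean
    return eta / (a + eta) + q

def bounded_risk(ucls, mu, a):
    """Average bounded loss over a collection of UCL estimates (an estimate of the risk)."""
    return sum(bounded_loss(u, mu, a) for u in ucls) / len(ucls)

# Illustrative UCL values for a true mean of 100.
print(bounded_loss(100.0, 100.0, a=1.0))   # UCL equal to the true mean
print(bounded_loss(0.0, 100.0, a=1.0))     # maximum underestimation
print(bounded_risk([90.0, 110.0, 150.0, 400.0], 100.0, a=1.0))
```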
This particular loss function has some interesting and useful theoretical properties when used for
estimating two-sided intervals, as discussed in several papers, including Casella, Hwang, and
Robert (1990), Casella and Hwang (1991), and Casella, Hwang, and Robert (1993). It is
important that the penalty for lack of coverage and the penalty for inaccuracy be approximately
balanced so that neither dominates the other. When these are not balanced, problems like
Berger's paradox can occur (for example, Casella, Hwang, and Robert (1993)). In the bounded
loss shown above, the loss is 0 when the UCL equals the true mean. The loss increases to
$1 + \frac{1}{1+a}$ as the UCL decreases from the true mean to 0 and increases to the limit of 1
as the UCL increases.
However, note that this loss does not penalize over-coverage. Given that many authors regard
substantial over-coverage as a negative (see Section 3.0), this is a less than desirable feature. An
alternative would be to have penalties for both under-coverage and over-coverage that are
appropriate for a given application. This will be explored further in the section on unbounded
loss (Section 3.2).
Since for typical applications we want a UCL to be a conservative estimator but not overly so,
that is, to have both good coverage and good accuracy, it makes sense to penalize underestimates
of the true mean more than overestimates. This gives weighted bounded linear loss and risk
functions:
$$
\begin{aligned}
L_{wB}(U_i \mid \mu, a, b_-, b_+) &= \left[ b_- q_i + b_+ (1 - q_i) \right] \frac{\eta_i}{a + \eta_i} + q_i \\
R_{wB}(U \mid \mu, a, b_-, b_+) &= \frac{1}{N} \sum_{i=1}^{N} \left[ b_- q_i + b_+ (1 - q_i) \right] \frac{\eta_i}{a + \eta_i} + \bar{q}
\end{aligned}
$$
where $b_-$ and $b_+$ are the low and high bias penalty coefficients and, as before, $\mu$ is the
true mean, $U_i$ ($i = 1, \ldots, N$) are the UCL estimates, $q_i = I(U_i < \mu)$, and
$\eta_i = |U_i - \mu| / \mu$. If $b_- = b_+ = 1$, then this is the same as the previous loss
function. As the UCL estimate goes to 0 from $\mu$, the loss increases from 0 to
$1 + \frac{b_-}{1+a}$, and as the UCL estimate increases from $\mu$, the loss increases from 0
toward the limit $b_+$. Setting $b_+ = 1 + \frac{b_-}{1+a}$ makes the losses equal in the cases of
maximum underestimation and maximum overestimation. This is the balanced case for a mean
that cannot be negative.
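The weighted loss and the balanced choice of the high bias penalty coefficient can be sketched as below. The code follows the definitions in this section; the function names are hypothetical.

```python
def balanced_b_plus(a, b_minus=1.0):
    """High bias penalty that equates the maximum under- and overestimation losses."""
    return 1.0 + b_minus / (1.0 + a)

def weighted_bounded_loss(ucl, mu, a, b_minus, b_plus):
    """Weighted bounded loss: bias-weighted size term plus the 0-1 coverage loss."""
    eta = abs(ucl - mu) / mu                     # absolute relative error
    q = 1.0 if ucl < mu else 0.0                 # 1 when the UCL underestimates the mean
    weight = b_minus * q + b_plus * (1.0 - q)    # b_minus below the mean, b_plus above
    return weight * eta / (a + eta) + q

for a in (1.0, 0.5, 0.1):
    b_plus = balanced_b_plus(a)
    at_zero = weighted_bounded_loss(0.0, 100.0, a, 1.0, b_plus)
    far_above = weighted_bounded_loss(1e9, 100.0, a, 1.0, b_plus)
    print(f"a={a}: balanced b_plus={b_plus:.2f}, loss at 0={at_zero:.4f}, loss far above={far_above:.4f}")
```

With a low bias penalty of 1, the balanced values round to 1.5, 1.67, and 1.91 for a = 1, 0.5, and 0.1, matching the values used in Figure 1.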
Figure 1 illustrates the weighted bounded linear loss function, $L_{wB}$, given above with the true
value $\mu = 100$, $b_- = 1$, and various values for $b_+$ and $a$. Plots labelled "balanced" have the
parameters balanced in the sense given in the previous paragraph.
[Figure 1 plots; x-axis: percent of true mean]
Figure 1. Weighted linear bounded loss profiles for various parameter values. Values of $b_+$
are chosen as: i) $b_+ = b_- = 1$; ii) $b_+ = 1.5$ matches the maximum loss for under- and
overestimation when $a = 1$; iii) $b_+ = 1.67$ matches the maximum loss for under- and
overestimation when $a = 0.5$; and iv) $b_+ = 1.91$ matches the maximum loss for
under- and overestimation when $a = 0.1$.
Review of the plots in the first row of Figure 1 leads to some interesting questions. When it
comes to assessment of risk or determining whether a site meets cleanup criteria, should a UCL
underestimating a mean by 1% be penalized more than a UCL overestimating the true mean by
200% or 500% or 1,000%? This will always be the case if $b_+ \le 1$, as in the first row of plots.
The balanced value of $b_+ = 1 + \frac{b_-}{1+a}$, seen in the plots labeled "balanced," ensures
that this can never happen.
To further explain the examples from the plots above, when $a = 0.1$, $b_- = 1$, $b_+ = 1.91$, a 1%
underestimate of the mean is penalized approximately the same as an 11% overestimate of the
mean. When $a = 0.5$, $b_- = 1$, $b_+ = 1.67$, a 1% underestimate of the mean is penalized
approximately the same as a 75% overestimate of the mean. When $a = 1$, $b_- = 1$, $b_+ = 1.5$, a
1% underestimate of the mean is penalized approximately the same as a 200% overestimate of
the mean. Therefore, minimizing the bounded linear loss with $a = 1$, $b_- = 1$, $b_+ = 1.5$ gives a
much more conservative (larger) UCL than does minimizing the bounded linear loss with
$a = 0.1$, $b_- = 1$, $b_+ = 1.91$, which gives more accurate estimates. Using $a = 0.5$, $b_- = 1$,
$b_+ = 1.67$ is a compromise between conservatism and accuracy. In Section 4.0, UCL Plots, these
losses are designated as "Loss_bnd_c," "Loss_bnd_a," and "Loss_bnd_m," with "c," "a," and "m"
indicating conservative, accurate, and intermediate, respectively.
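The break-even overestimates quoted above can be reproduced with a short calculation based on the weighted bounded loss of Section 3.1. The sketch solves for the relative overestimate whose loss equals that of a given relative underestimate. The quoted figures of 11%, 75%, and 200% agree with the limiting case of a vanishingly small underestimate, whose loss is essentially the coverage penalty of 1; for an underestimate of exactly 1% the solution is somewhat larger (roughly 13%, 78%, and 206%). The function name is hypothetical.

```python
import math

def break_even_overestimate(a, b_minus, b_plus, under=0.01):
    """Relative overestimate with the same weighted bounded loss as a relative underestimate."""
    target = 1.0 + b_minus * under / (a + under)   # loss of the underestimate
    if b_plus <= target:
        return math.inf                            # no overestimate is penalized as much
    # Solve b_plus * eta / (a + eta) = target for eta.
    return a * target / (b_plus - target)

for a, b_plus in ((0.1, 1.91), (0.5, 1.67), (1.0, 1.5)):
    limit = break_even_overestimate(a, 1.0, b_plus, under=0.0)
    one_percent = break_even_overestimate(a, 1.0, b_plus, under=0.01)
    print(f"a={a}, b_plus={b_plus}: limiting case {limit:.1%}, 1% underestimate {one_percent:.1%}")
```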
3.2 Unbounded Loss for UCLs
There are many possibilities for unbounded loss functions for confidence intervals, including
UCLs. One of the general advantages of unbounded loss functions over bounded loss functions is
that bounded loss functions for estimators which are unbounded (at least for practical purposes)
are forced to have the left and right sections of the loss function be concave, as may be seen in
Figure 1 above. On the other hand, unbounded loss functions can be convex. The graphs of
concave functions bend downward, and those of convex functions bend upward. Convex loss
functions generally have important properties for estimation procedures, as discussed at length
by Lehmann (1983) and specifically for interval estimates by Winkler (1972). Lehmann shows
that, under strictly convex loss functions, there is always an essentially unique estimator that
achieves minimum risk, and any estimator that does this is a function of the sufficient statistics2
of the distribution of the data. This is the Rao-Blackwell theorem. This extremely important
result does not hold for concave loss functions. The Gamma, t-, and H-UCL estimators in
ProUCL are functions of the sufficient statistics for the Gamma, normal, and lognormal
distributions, respectively. Many other results concerning convex loss functions in Lehmann
(1983) are useful but highly technical and will not be discussed further here.
Although many properties of optimal point estimators do not carry over to interval estimators,
Winkler (1972) shows that, when a strictly convex loss function is applied to the selection of an
interval estimator given a specific distribution, there is an optimal interval estimator, which is not
guaranteed for a concave loss function or a convex loss function that is not strictly convex.
Absolute error loss is an example of a convex loss function that is not strictly convex. While strict
convexity of the loss function does not guarantee optimality in the general situation of data from
an unknown distribution, it indicates that using strictly convex loss functions to evaluate
candidate UCL procedures is a productive approach.
2 The sufficient statistics are functions of the data that for a specific distribution summarize all relevant information
contained in the sample about the parameters of the distribution. As an example, the sample mean and variance are
the sufficient statistics for the normal distribution.
The unbounded loss functions used here for evaluating UCLs are composed of four parts. The
first two are penalties for under- and over-coverage. The last two are penalties for inaccuracy
(the combination of bias and imprecision), on the low side and on the high side. As with bounded
loss functions, it is important that the coverage and accuracy components of the loss be balanced
so that both contribute meaningfully. This balance is first provided by choosing component
losses that are 0 when the UCL equals the true mean and that increase without bound as the UCL
is further and further from the true value and as the UCL coverage goes to 0 or to 1. Secondly,
the rates of increase for the negative and positive elements of the component losses can be varied
to reflect the losses incurred by the respective errors.
3.2.1 Unbounded Coverage Loss for UCLs
There are a couple of very reasonable possibilities for the unbounded coverage loss. One is based
on the log of the odds ratio (LOR) of the expected coverage of the UCL estimator versus the
desired or target coverage level (95% in our case). The LOR is a very natural scaling to compare
probabilities (or coverages), and its absolute value is a natural distance metric between
probabilities. We use the square of the LOR as a loss function:
$$
\begin{aligned}
\gamma &= \text{desired coverage of UCL} \\
c_-, c_+ &= \text{low and high coverage penalty coefficients} \\
\operatorname{logit}(p) &= \log\left( \frac{p}{1-p} \right) \\
\mathrm{LOR}(p \mid \gamma) &= \log\left( \frac{p}{1-p} \cdot \frac{1-\gamma}{\gamma} \right) = \operatorname{logit}(p) - \operatorname{logit}(\gamma) \\
L_{LOR}(q \mid \gamma) &= \mathrm{LOR}(q \mid \gamma)^2 \\
L_{wLOR}(q \mid \gamma, c_-, c_+) &= \left[ c_- I(q < \gamma) + c_+ I(q > \gamma) \right] \mathrm{LOR}(q \mid \gamma)^2
\end{aligned}
$$
Here $L_{wLOR}$ is the weighted version of the LOR coverage loss function. Both are strictly convex
loss functions.
Another natural metric depends on the fact that the distribution of UCL estimators is
asymptotically normal. This is a result of a number of things coming into play: the use of
maximum likelihood estimators (MLEs) in computing UCLs, the asymptotic normality of MLEs
under regularity conditions, UCLs being continuous functions of MLEs, and various
convergence results (see Serfling (1980)). This is not difficult to show. This fact then suggests
that coverage probabilities of UCL estimators could usefully be compared to the desired
coverages in the probit scale, the inverse of the normal probability function:
$$
\begin{aligned}
L_{\Phi}(q \mid \gamma) &= 3.763 \left[ \Phi^{-1}(q) - \Phi^{-1}(\gamma) \right]^2 \\
L_{w\Phi}(q \mid \gamma, c_-, c_+) &= 3.763 \left[ c_- I(q < \gamma) + c_+ I(q > \gamma) \right] \left[ \Phi^{-1}(q) - \Phi^{-1}(\gamma) \right]^2
\end{aligned}
$$
where $\Phi^{-1}$ is the quantile function of the standard normal distribution and 3.763 is a scaling
factor chosen to match the LOR and probit losses for coverage of 0.8 versus the target level of
0.95. $L_{w\Phi}$ is the weighted form of the probit coverage loss. Both are strictly convex loss functions.
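The two coverage losses, and the origin of the scaling factor, can be sketched with the Python standard library. The function names and default arguments are hypothetical; the formulas follow the squared log-odds-ratio and squared probit-difference definitions in this section.

```python
import math
from statistics import NormalDist

def logit(p):
    return math.log(p / (1.0 - p))

def probit(p):
    return NormalDist().inv_cdf(p)   # quantile function of the standard normal

def lor_loss(q, gamma=0.95, c_minus=1.0, c_plus=1.0):
    """Weighted squared log-odds-ratio loss for coverage q against target gamma."""
    weight = c_minus if q < gamma else c_plus
    return weight * (logit(q) - logit(gamma)) ** 2

def probit_loss(q, gamma=0.95, c_minus=1.0, c_plus=1.0, scale=3.763):
    """Weighted, scaled squared probit-difference loss for coverage q against target gamma."""
    weight = c_minus if q < gamma else c_plus
    return scale * weight * (probit(q) - probit(gamma)) ** 2

# Ratio of the unscaled losses at coverage 0.8 versus target 0.95; this should
# reproduce the scaling factor quoted in the text to about three decimals.
scale_check = lor_loss(0.80) / probit_loss(0.80, scale=1.0)
print(f"implied scaling factor: {scale_check:.3f}")
```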
Note that, unlike the bounded (0-1) coverage loss, which can be computed from individual UCL
estimates, the unbounded coverage loss functions use the average empirical coverage from
simulation. If the coverage were computed from theoretical calculations or from a large
simulation, we would consider the loss calculated from the coverage to be an expected loss or
risk.
This coverage is converted to a score (logit or probit) for comparison to the desired coverage.
The target coverage (say 95%) is also converted to a logit (probit). Then we square the difference
between them. If the true coverage logit (probit) is less than the target coverage logit (probit),
this is under-coverage, and we weight the squared difference with a weight of c_. For
over-coverage, the squared difference is weighted by c+. Generally, we weight under-coverage errors
more than over-coverage errors but do give positive weight to over-coverage errors. This is
consistent with the views on CI coverage of most of the authors surveyed (see Section 3.0, UCL
Properties).
Also, larger coverage errors (in logit/probit scale) should be weighted much more than relatively
small coverage errors instead of the weight being proportional to the size of the error. This is
technically known as having a strictly convex loss function. Using the square of the error makes
this a strictly convex loss function. Weighting a convex loss according to whether errors are
under- or over-coverage errors, provided the weights are positive, also results in a convex loss
function.
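The weighted coverage losses described above can be sketched in a few lines of Python. This is only an illustration: the function and argument names are assumptions made for the example and are not taken from the study code, while the 95% target and the 3.763 scaling factor come from the text.

```python
import math
from statistics import NormalDist


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def probit(p):
    """Standard normal quantile of a probability p in (0, 1)."""
    return NormalDist().inv_cdf(p)


def weighted_coverage_loss(q, gamma=0.95, c_minus=1.0, c_plus=0.1, scale="logit"):
    """Weighted squared coverage error in the logit or probit scale.

    q is the coverage of the UCL estimator and gamma the target coverage.
    Under-coverage (q < gamma) is weighted by c_minus, over-coverage by c_plus.
    """
    if scale == "logit":
        diff = logit(q) - logit(gamma)
        factor = 1.0
    elif scale == "probit":
        diff = probit(q) - probit(gamma)
        factor = 3.763  # scaling factor quoted in the text
    else:
        raise ValueError("scale must be 'logit' or 'probit'")
    weight = c_minus if q < gamma else c_plus
    return factor * weight * diff ** 2


lor_at_80 = weighted_coverage_loss(0.80, scale="logit")
probit_at_80 = weighted_coverage_loss(0.80, scale="probit")
```

Under these assumptions the two losses agree closely at a coverage of 0.8, which is the matching point described for the scaling factor, and under-coverage is penalized more heavily than over-coverage of similar size in the logit scale.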
The shapes of the squared LOR and squared difference of probits coverage loss functions are
shown in Figure 2 and Figure 3. Note that, since the loss is unbounded on both sides of the target
coverage, y, the loss must increase very steeply if the x-axis is scaled in probability. In the plots
below, in which the x-axis is labelled as probability but scaled as a normal deviate (just as in a
normal Q-Q plot), the loss function would appear symmetric, except that we are weighting
under-coverage much more than over-coverage.
[Figure 2 plot: loss versus UCL coverage. Legend (penalty coefficients): c_ = 1, with c+ = 0.1
and c+ = 0.5.]
Figure 2. Weighted logit squared error loss examples for coverage. c_ = under-coverage
penalty coefficient. c+ = over-coverage penalty coefficient. Two different values of c+
are shown.
[Figure 3 plot: loss versus UCL coverage. Legend (penalty coefficients): c_ = 1, with c+ = 0.1
and c+ = 0.5.]
Figure 3. Weighted probit squared error loss examples for coverage. c_ = under-coverage
penalty coefficient. c+ = over-coverage penalty coefficient. Two different values of c+
are shown.
The weighted squared LOR and weighted probit squared error loss functions are clearly similar
but not identical.
3.2.2 Unbounded Accuracy Loss for UCLs
The penalty for lack of accuracy is relative mean squared error from the true mean value, which
in these simulations is 100. This allows accounting for both the relative bias and the relative
variance of the UCL as an estimator. For a UCL, negative deviations should be weighted more
than positive deviations, since our objective is a conservative estimate of the mean. Larger error
deviations should be penalized much more than smaller ones. Use of weighted squared error loss,
as discussed above, minimizes the worst behavior of an estimator by penalizing large errors by
much more than the magnitude of the error. This seems appropriate, because a major concern
with UCLs in environmental applications has been the fact that some UCL procedures can
produce wild overestimates of the mean under certain conditions. Having a very small penalty
for small overestimates and a very large penalty for large overestimates results from the
proposed weighted squared error loss and promotes our objective for a UCL that is a
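conservative, but not overly conservative, estimator of the mean. In symbols:
$$q_i = I(U_i < \mu)$$
$$v_i = \frac{U_i - \mu}{\mu}$$
$$L_{wMSE}(U_i \mid \mu, b_-, b_+) = \left[b_- q_i + b_+ (1 - q_i)\right] v_i^2$$
$$L_{wMSE}(U \mid \mu, b_-, b_+) = \frac{1}{N} \sum_{i=1}^{N} \left[b_- q_i + b_+ (1 - q_i)\right] v_i^2$$
where $U_i$ is the $i$-th of $N$ simulated UCLs, $\mu$ is the true mean, $q_i$ indicates that $U_i$
underestimates $\mu$, $v_i$ is the relative deviation of $U_i$ from $\mu$, and $b_-$ and $b_+$ are the
penalty coefficients for negative and positive deviations.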
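A minimal Python sketch of the weighted relative mean squared error described above follows. The function name, argument names, and the small input list are hypothetical and chosen only for illustration; the true mean of 100 matches the value used in the simulations.

```python
def weighted_rel_mse(ucls, true_mean, b_minus=1.0, b_plus=0.2):
    """Average weighted squared relative deviation of UCLs from the true mean.

    Deviations below the true mean are weighted by b_minus and
    deviations at or above it by b_plus.
    """
    total = 0.0
    for u in ucls:
        v = (u - true_mean) / true_mean          # relative deviation
        weight = b_minus if u < true_mean else b_plus
        total += weight * v * v
    return total / len(ucls)


# Three made-up UCL values around a true mean of 100.
example_conservative = weighted_rel_mse([90.0, 110.0, 200.0], 100.0, b_plus=0.2)
example_accurate = weighted_rel_mse([90.0, 110.0, 200.0], 100.0, b_plus=1.0)
```

With a small b+, the large overestimate of 200 still dominates the loss but is penalized far less than under the symmetric choice b+ = 1, which is the distinction between the "conservative" and "accurate" parameter settings.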
[Figure 4 plot: loss versus UCL as Percentage of True Mean (x-axis tick labels 0 to 300).]
Figure 4. Weighted mean squared error loss examples for inaccuracy (the combination of
bias and imprecision). b_ = negative bias penalty coefficient. b+ = positive bias
penalty coefficient. Three different values of b+ are shown, corresponding to
conservative, intermediate, and accurate estimates.
3.2.3 Combined Unbounded Loss
These components of the loss function, penalties for under- and over-coverage and for under-
and over-estimation, are added together to create the weighted linear unbounded loss function.
The formulas are:
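$$L_{wU,LOR}(U \mid \mu, \gamma, b_-, b_+, c_-, c_+) = L_{wMSE}(U \mid \mu, b_-, b_+) + L_{wLOR}(q \mid \gamma, c_-, c_+)$$
$$L_{wU,\Phi}(U \mid \mu, \gamma, b_-, b_+, c_-, c_+) = L_{wMSE}(U \mid \mu, b_-, b_+) + L_{w\Phi}(q \mid \gamma, c_-, c_+)$$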
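The combined loss can be illustrated with a short, self-contained Python sketch that adds the accuracy and coverage components for one set of simulated UCLs. The names are hypothetical. The clipping of the empirical coverage away from 0 and 1 (so that the logit and probit stay finite) is an assumption of this sketch and is not described in the report.

```python
import math
from statistics import NormalDist


def combined_unbounded_loss(ucls, true_mean, gamma=0.95,
                            b_minus=1.0, b_plus=0.2,
                            c_minus=1.0, c_plus=0.1,
                            coverage_scale="logit"):
    """Weighted relative MSE plus weighted squared coverage error."""
    n = len(ucls)

    # Accuracy component: weighted relative mean squared error.
    acc = 0.0
    for u in ucls:
        v = (u - true_mean) / true_mean
        acc += (b_minus if u < true_mean else b_plus) * v * v
    acc /= n

    # Coverage component: empirical coverage compared with the target.
    coverage = sum(1 for u in ucls if u >= true_mean) / n
    eps = 0.5 / n                      # assumption: keep the score finite
    q = min(max(coverage, eps), 1.0 - eps)
    if coverage_scale == "logit":
        diff = math.log(q / (1 - q)) - math.log(gamma / (1 - gamma))
        factor = 1.0
    else:
        nd = NormalDist()
        diff = nd.inv_cdf(q) - nd.inv_cdf(gamma)
        factor = 3.763                 # scaling factor quoted in the text
    cov = factor * (c_minus if q < gamma else c_plus) * diff ** 2

    return acc + cov


# Made-up UCL sets with 95% and 80% empirical coverage of a true mean of 100.
on_target = combined_unbounded_loss([120.0] * 19 + [90.0], 100.0)
under_covering = combined_unbounded_loss([120.0] * 16 + [90.0] * 4, 100.0)
```

In the first example the coverage term vanishes and only the accuracy term remains; in the second the under-coverage penalty dominates.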
4.0 UCL Plots
The most important UCL estimators (omitting the Gamma approximate UCL, the jackknife
UCL, and the percentile bootstrap UCL) computed in ProUCL are compared by looking at their
performance with respect to several measures of performance, including coverage, relative bias
(bias divided by the true value), variance of relative estimation error, relative Root Mean
Squared Error (RelRMSE), and various bounded and unbounded loss functions designed to focus
on different aspects of performance of the UCLs being compared.
The parameter values for each loss function in the plots below are summarized in Table 1. The
names of the loss functions briefly indicate their characteristics. As discussed in Section 3.1
above, "Loss_bnd_c," "Loss_bnd_m," and "Loss_bnd_a" indicate weighted bounded loss with
parameters chosen to be conservative, intermediate, and accurate, respectively.
Table 1. Loss function parameters
Name of Loss             Type of Loss   Type of Coverage Loss   a     b_    b+     c_    c+
Loss_bnd_c               Bounded        0-1                     1     1     1.5    1     0.0
Loss_bnd_m               Bounded        0-1                     0.5   1     1.67   1     0.0
Loss_bnd_a               Bounded        0-1                     0.1   1     1.91   1     0.0
Loss_LOR_c_RelMSE_c      Unbounded      LOR                     -     1     0.2    1     0.1
Loss_LOR_a_RelMSE_c      Unbounded      LOR                     -     1     0.2    1     0.5
Loss_LOR_c_RelMSE_a      Unbounded      LOR                     -     1     1.0    1     0.1
Loss_LOR_a_RelMSE_a      Unbounded      LOR                     -     1     1.0    1     0.5
Loss_probit_c_RelMSE_c   Unbounded      Probit                  -     1     0.2    1     0.1
Loss_probit_a_RelMSE_c   Unbounded      Probit                  -     1     0.2    1     0.5
Loss_probit_c_RelMSE_a   Unbounded      Probit                  -     1     1.0    1     0.1
Loss_probit_a_RelMSE_a   Unbounded      Probit                  -     1     1.0    1     0.5
For the unbounded losses, "Loss_LOR_c_RelMSE_c," "Loss_LOR_c_RelMSE_a,"
"Loss_probit_c_RelMSE_c," and "Loss_probit_c_RelMSE_a" have minimal over-coverage
penalty because their coverage loss is "conservative." The strings "LOR" and "probit" in the
unbounded loss names refer to using either LOR or probit losses for coverage. The loss names
ending in "c," like "Loss_LOR_c_RelMSE_c," indicate minimal loss for overestimation,
resulting in a more conservative estimate. The loss names ending in "a," like
"Loss_LOR_c_RelMSE_a," indicate symmetric loss for under- and overestimation, resulting in a
more accurate estimate.
The comparisons are plotted graphically in Figure 5 through Figure 20 below, and patterns in the
plots are explored and interpreted. The plots are organized by the log-scale standard deviation
(log SD) of the individual simulated data sets. Although the data sets are generated based on
specified values of the population CV (which is equivalent to specifying values of log SD, since
they are functions of each other) and for various sample sizes, the computed log SDs vary by
data set.
Furthermore, the GoF selection rules (Section 2.0) for filtering the generated data also somewhat
change the distribution of log SD by sample size from what was originally generated. Each plot
shows the features of the UCL estimators grouped by quartile of sample log SD (Figure 5
through Figure 20 in this section) and grouped by decile of the log SD (in Appendix A). The
plots in Appendix A show more detail of the UCL behavior, since the plots are presented over a
finer grid of sample log SD ranges.
Each plotted point represents one type of UCL estimate for a specified sample size and range of
sample log SD, and is computed as the average of a very large number of simulated UCL values,
since 10,000 UCL values were simulated for each type of UCL, sample size, and population CV.
To make the plots more readable, the curves for each UCL on each plot were smoothed using
locally estimated scatterplot smoothing (LOESS), a nonparametric local regression technique
developed for scatterplot smoothing (Fox and Weisberg, 2018).
Another and more important reason to smooth the points in each curve is that the smoothed
curves give an improved estimate of the expected value of each UCL loss function (risk of the
UCL estimator) or of the expected value of the performance measure.
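To make the smoothing step concrete, the following is a simplified LOESS-style smoother written in plain Python: each fitted value comes from a tricube-weighted straight-line fit to the nearest points. It is a sketch under stated assumptions, not the smoother used for the plots. The local linear fit, the span of 0.75, the use of log10 sample size as the x variable, and all names are choices made for this example.

```python
import math


def tricube(u):
    """Tricube kernel weight for a scaled distance u (zero when |u| >= 1)."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_smooth(xs, ys, span=0.75):
    """Smooth ys against xs with tricube-weighted local linear fits."""
    n = len(xs)
    k = min(n, max(3, math.ceil(span * n)))   # neighbours used per local fit
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12             # bandwidth: k-th nearest distance
        w = [tricube((x - x0) / h) for x in xs]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, xs)) / sw
        my = sum(wi * yi for wi, yi in zip(w, ys)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xs))
        sxy = sum(wi * (xi - mx) * (yi - my)
                  for wi, xi, yi in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Made-up noisy risk estimates at the sample sizes shown on the plot axes.
sample_sizes = [10, 20, 50, 100, 200, 500, 1000]
log_n = [math.log10(n) for n in sample_sizes]
noisy_risk = [1.9, 1.4, 1.1, 0.7, 0.6, 0.35, 0.3]
smoothed_risk = local_linear_smooth(log_n, noisy_risk)
```

Because each local fit is a straight line, data that are exactly linear in x pass through unchanged, while noise around a smooth trend is averaged out.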
For each range of sample log SD, four figures are plotted as a set. The first figure in each set
includes four plots that show the coverage, relative error variance, relative bias, and RelRMSE
over a dense grid of sample sizes covering a wide range. The RelRMSE can be thought of as an
accuracy measure that integrates relative bias and relative error variance into a single measure
that is on the scale of average relative deviations from the true value. The second figure in each
set is composed of three plots that show the risk of the UCLs with respect to the bounded loss
function (Section 3.1) with parameters in Table 1; namely, " Loss bnd c," "Lossbndm," and
"Lossbnda."
The third figure in each set has four plots that show the risk of the UCLs with respect to the
unbounded loss function with LOR coverage loss (Section 3.2.1) and relative accuracy loss for
the parameter values in Table 1. The fourth figure in each set is very similar to the third but
shows the risk of the UCLs with respect to the unbounded loss function with probit coverage loss
(Section 3.2.1) and relative accuracy loss for the parameter values in Table 1.
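The four summary measures in the first figure of each set can be computed from a vector of simulated UCLs as in the sketch below. The names are hypothetical, and the use of the divisor n for the relative error variance is an assumption made so that the decomposition RelRMSE^2 = (relative bias)^2 + (relative error variance) holds exactly.

```python
import math


def ucl_summary(ucls, true_mean):
    """Coverage, relative bias, relative error variance, and RelRMSE."""
    n = len(ucls)
    rel_err = [(u - true_mean) / true_mean for u in ucls]
    coverage = sum(1 for u in ucls if u >= true_mean) / n
    rel_bias = sum(rel_err) / n
    rel_var = sum((e - rel_bias) ** 2 for e in rel_err) / n   # divisor n (assumed)
    rel_rmse = math.sqrt(sum(e * e for e in rel_err) / n)
    return {
        "coverage": coverage,
        "rel_bias": rel_bias,
        "rel_var": rel_var,
        "rel_rmse": rel_rmse,
    }


# Four made-up UCL values around a true mean of 100.
summary = ucl_summary([90.0, 110.0, 130.0, 150.0], 100.0)
```

For this toy input the relative bias is 0.2, the relative error variance is 0.05, and RelRMSE is 0.3, so the squared RelRMSE (0.09) equals the squared bias (0.04) plus the variance (0.05).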
[Figures 5 through 8: plots not reproduced here. Each panel is plotted against Sample Size
(tick labels 10 to 1000). Legend (UCL type): BCa_UCL, boot_t_UCL, Chebyshev_90_UCL,
Chebyshev_UCL, Gamma_adj_UCL, H_UCL, Halls_UCL, skew_t_UCL, t_UCL. Panels in the unbounded
loss figures are titled by loss name (for example, Loss_LOR_a_RelMSE_a and
Loss_probit_c_RelMSE_c).]
Figure 5. UCL summary for Lognormal with log SD in (0.0831,0.859]
Figure 6. UCL Bounded Loss for Lognormal with log SD in (0.0831,0.859]
Figure 7. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with log SD in
(0.0831,0.859]
Figure 8. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with log SD
in (0.0831,0.859]
[Figures 9 through 12: plots not reproduced here. Each panel is plotted against Sample Size
(tick labels 10 to 1000). Legend (UCL type): BCa_UCL, boot_t_UCL, Chebyshev_90_UCL,
Chebyshev_UCL, Gamma_adj_UCL, H_UCL, Halls_UCL, skew_t_UCL, t_UCL.]
Figure 9. UCL summary for Lognormal with log SD in (0.859,1.37]
Figure 10. UCL Bounded Loss for Lognormal with log SD in (0.859,1.37]
Figure 11. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with log SD
in (0.859,1.37]
Figure 12. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with log SD
in (0.859,1.37]
[Figures 13 through 16: plots not reproduced here. Each panel is plotted against Sample Size
(tick labels 10 to 1000). Legend (UCL type): BCa_UCL, boot_t_UCL, Chebyshev_90_UCL,
Chebyshev_UCL, Gamma_adj_UCL, H_UCL, Halls_UCL, skew_t_UCL, t_UCL.]
Figure 13. UCL summary for Lognormal with log SD in (1.37,1.81]
Figure 14. UCL Bounded Loss for Lognormal with log SD in (1.37,1.81]
Figure 15. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with log SD
in (1.37,1.81]
Figure 16. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with log SD
in (1.37,1.81]
[Figures 17 through 20: plots not reproduced here. Each panel is plotted against Sample Size
(tick labels 10 to 1000). Legend (UCL type): BCa_UCL, boot_t_UCL, Chebyshev_90_UCL,
Chebyshev_UCL, Gamma_adj_UCL, H_UCL, Halls_UCL, skew_t_UCL, t_UCL.]
Figure 17. UCL summary for Lognormal with log SD in (1.81,5.41]
Figure 18. UCL Bounded Loss for Lognormal with log SD in (1.81,5.41]
Figure 19. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with log SD
in (1.81,5.41]
Figure 20. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with log SD
in (1.81,5.41]
4.1 Discussion of UCL Plots
As discussed in the preceding sections, the loss functions used were chosen to span a range of
penalties for coverage error and inaccuracy. The purpose of comparing UCLs against a variety of
relevant loss functions is to lend robustness to the conclusions of the comparisons.
Review of the plots above and of those in Appendix A shows a number of very interesting
things:
- The first is that the Chebyshev 90% and 95% UCLs perform quite poorly with respect to
  the accuracy measures relative bias and relative root MSE in the first figure in each set
  (Figures 5, 9, 13, and 17).
- Secondly, the Chebyshev 90% and 95% UCLs perform quite poorly with respect to the
  relative error variance measure in the first figure in each set (Figures 5, 9, 13, and 17).
  This is due to the fact that both of these multiply the sample standard deviation by
  relatively large factors.
- Thirdly, with respect to the varieties of bounded loss considered, the Chebyshev 90% and
  95% UCLs perform overall the most poorly of any of the UCLs considered over all sample
  sizes and ranges of sample log SD, as seen in the second figure in each set (Figures 6, 10,
  14, and 18).
- Fourthly, with respect to the varieties of unbounded loss considered, the Chebyshev 90%
  and 95% UCLs perform overall poorly compared to the H-UCL below sample size 250, as
  seen in the third and fourth figures in each set (Figures 7, 8, 11, 12, 15, 16, 19, and 20).
  In all the plots, the H-UCL behaves badly starting somewhere in the range of sample sizes
  250 to 300. This is due to a numerical issue in the code in the EnvStats R library used to
  compute the H-UCL.
- Except for small sample sizes or very large sample log SD, the H-UCL is by far the
  closest to the target coverage (95%) of all the UCLs considered (Figures 5, 9, 13, and 17).
- Hall's bootstrap UCL has by far the lowest coverage and the lowest relative bias
  (although still biased significantly high) of all the UCLs, except in the case of large
  sample log SD combined with large sample size, in which case the Gamma adjusted UCL
  has lower coverage and less relative bias.
- The risk profiles of UCLs under the unbounded loss functions with LOR and probit
  coverage losses are similar but not identical.
- For sample sizes 20 and below falling into the largest category of sample log SD (> 1.8),
  Hall's bootstrap UCL has the lowest risk under all of the unbounded loss functions
  considered.
- The bootstrap-t UCL becomes wild for all sample sizes for sample log SD above 1.3.
- For both bounded and unbounded losses over the ranges of sample sizes and sample log
  SD considered, the risk performances of the t-UCL and the skewed t-UCL are very
  similar.
- For sample sizes of approximately 20-25 and smaller with sample log SD between
  approximately 0.8 and 1.8, the t-UCL has the lowest risk among the unbounded losses
  considered.
- The behavior of the UCLs appears to be better separated and characterized by the
  unbounded loss functions than by the bounded loss functions.
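The second finding above attributes the large relative error variance of the Chebyshev UCLs to the size of the factor that multiplies the sample standard deviation. A minimal sketch follows, assuming the usual Chebyshev-inequality form of the UCL, mean + sqrt(1/alpha - 1) * s / sqrt(n). That formula is an assumption of this example (it is not restated in this section), and the function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def chebyshev_multiplier(confidence):
    """Factor applied to the standard error in the Chebyshev-inequality UCL."""
    alpha = 1.0 - confidence
    return math.sqrt(1.0 / alpha - 1.0)


def chebyshev_ucl(data, confidence=0.95):
    """Chebyshev-inequality UCL of the mean (assumed form, see lead-in)."""
    n = len(data)
    return mean(data) + chebyshev_multiplier(confidence) * stdev(data) / math.sqrt(n)


k95 = chebyshev_multiplier(0.95)        # about 4.36
k90 = chebyshev_multiplier(0.90)        # about 3.0
z95 = NormalDist().inv_cdf(0.95)        # about 1.645, for comparison
```

Under this assumption the 95% and 90% Chebyshev factors are roughly 4.4 and 3.0, compared with about 1.6 for a normal-theory 95% limit, so any sampling variability in the standard deviation is amplified correspondingly.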
These simulated UCLs and summary statistics will be an important source of data moving
forward to improve the lognormal UCL recommendation rules (that is, for data that has gone
through the GoF logic described in Section 2.0 and is deemed to be lognormal) for ProUCL 5.2
and to substantially improve them in ProUCL 6.0. Since the patterns illustrated in the plots above
are complex, decision rules for UCL recommendations can best be formulated with the help of
machine learning methods.
5.0 UCL Recommendation Rules
ProUCL uses many different UCL methods for estimation, and, for many data sets, uses an
underlying decision logic to recommend one of the estimated UCLs. This decision logic is based
in part on the results of GoF tests but in the past has only addressed coverage of a UCL and not
accuracy. For data sets that are considered approximately lognormal, it is common for ProUCL
5.1 to recommend a Chebyshev UCL. The objective of this simulation study and analysis is to
improve these rules for data which ProUCL 5.2 has determined to be approximately lognormal
based on the revised GoF rules described in Section 2.0.
5.1 Algorithm for Recommendation Rules
The new recommendation rules are determined using classification trees estimated using the
method of Recursive Partitioning (RPart) as implemented in the R library rpart (version 4.1-15).
The objective is to identify the UCL estimator that minimizes the aggregated risk measures for
various values of decision variables. The decision variables are the sample size (N) and the log
SD of the samples in each simulated data set.
The aggregated risk measures used are derived from the eight unbounded loss functions used in
the simulation study. The bounded loss functions are useful but not as informative as the
unbounded loss functions, since the bounded loss functions don't separate the UCL estimators as
well. Two types of aggregated risk measures are used. The first computes the average value
across loss functions for each combination of UCL type, N, and log SD category over all the
filtered simulated data sets. The second computes the maximum across the loss functions for
each combination of UCL type, N, and log SD category over all the filtered simulated data sets.
Since the values of log SD are continuous, in order to match the values up, the values of log SD
are binned into 100 categories using the min, max, and percentiles of the log SD value in the
simulated data. The use of several different loss functions, some more conservative (emphasizing
coverage and allowing more overestimation) and some more accurate (less extreme
overestimation but not necessarily as good coverage), to derive the aggregated risk measures
makes the conclusions derived from the study more robust.
For each type of aggregated risk and for each combination of values of N and log SD category,
the UCL estimator with the smallest risk is identified. The selected UCLs are assigned to
UCLmal (UCL type with minimum average loss) and UCLmml (UCL type with minimax loss)
which take a value (UCL name) for each combination of values of N and log SD category. The
former is the choice of UCL type that would perform best on average for a given value of N and
sample log SD. The latter is a minimax choice of UCL type for a given value of N and sample
log SD.
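The selection step described above can be sketched as follows. The data layout (a dictionary keyed by UCL type, N, and log SD category), the function name, and the risk values are all hypothetical and serve only to show how the minimum-average and minimax choices can differ for the same cell.

```python
def select_min_risk_ucls(risks):
    """Pick the UCL type with the smallest average and smallest maximum risk.

    risks maps (ucl_type, n, log_sd_bin) to a list of risk values,
    one per unbounded loss function.
    """
    best_avg = {}
    best_max = {}
    for (ucl_type, n, log_sd_bin), values in risks.items():
        cell = (n, log_sd_bin)
        avg_risk = sum(values) / len(values)
        max_risk = max(values)
        if cell not in best_avg or avg_risk < best_avg[cell][1]:
            best_avg[cell] = (ucl_type, avg_risk)
        if cell not in best_max or max_risk < best_max[cell][1]:
            best_max[cell] = (ucl_type, max_risk)
    ucl_mal = {cell: pair[0] for cell, pair in best_avg.items()}
    ucl_mml = {cell: pair[0] for cell, pair in best_max.items()}
    return ucl_mal, ucl_mml


# Made-up risks for one (N, log SD category) cell, two loss functions each.
toy_risks = {
    ("H_UCL", 20, 57): [0.10, 0.90],
    ("t_UCL", 20, 57): [0.55, 0.65],
}
ucl_mal, ucl_mml = select_min_risk_ucls(toy_risks)
```

In this toy cell the H-UCL has the lower average risk while the t-UCL has the lower worst-case risk, so the two aggregated measures select different estimators.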
Since the values of the average loss for each variety of loss function are estimates based on
simulation, so also are the risk values and the selected UCL types for UCLmal and UCLmml. The
assignments of UCLmal and UCLmml for the combinations of N and log SD do potentially
contain some errors. Therefore, a statistical classification method must be used to extract the
signal from the noise and to develop relatively simple recommendation rules that can be easily
implemented.
5.2 Risk Profiles
Figure 21 and Figure 22 below illustrate the features of the aggregated risk measures across
ranges of the sample log SD and for a range of sample sizes. These two figures effectively
summarize the eight unbounded loss figures in Section 4.0 and the 20 unbounded loss figures in
Appendix A. While the average risks are lower than the maximum risk levels, the patterns in
these plots are very similar.
Figure 21 shows the average risk profiles for the various UCL estimators averaged across the
various unbounded loss functions.
[Figure 21 plot not reproduced here. Panels are labeled by Log SD range (for example,
(0.838,1.34]) and are plotted against Sample Size (tick labels 10 to 1000). Legend (UCL type)
as recovered: BCa_UCL, boot_t_UCL, Chebyshev_90_UCL, Gamma_adj_UCL, H_UCL, Halls_UCL,
skew_t_UCL, t_UCL.]
Figure 21. Average of eight unbounded loss functions for various UCLs by log-scale SD and
sample size
Figure 22 shows the maximum risk profiles for the various UCL estimators, with the maximum
taken across the various unbounded loss functions.
[Figure 22 plot not reproduced here. Panels are labeled by Log SD range (for example,
(0.838,1.34]).]
5.3 UCL Recommendation Modeling
For estimating recommendation rules based on UCLmal, it is desirable to use a relatively simple
rule with no more than one split in N and one in log SD. The tree is therefore pruned until it has
only two levels. The tree (Figure 23) and associated decision rules for minimizing the average
risk (Table 2) are shown below. The UCL type shown at each node in the tree is the one most
likely to minimize the average risk, conditional on the predictor values being in the indicated
ranges.
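[Figure 23 tree diagram, as recovered from the plot:
  Root node: H_UCL, 1863/3400, 100% of cases; split on N < 28
    yes (N < 28): 345/1200, 35% of cases; split on log_SD < 1.5
      yes (log_SD < 1.5): H_UCL, 261/670, 20% of cases
      no (log_SD >= 1.5): t_UCL, 322/530, 16% of cases
    no (N >= 28): H_UCL, 1597/2200, 65% of cases]
Figure 23. Decision tree for minimum average loss, pruned to two levels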
Only the H-UCL and the t-UCL are recommended. The breakpoint in N is 28, and for smaller
samples the t-UCL is recommended for cases in which the log SD is greater than or equal to 1.5.
Table 2. Decision rule output for UCL minimum average risk tree
          ske  Gam  Che  Che  H_U  t_U  Hal  boo  BCa
H_UCL    [.06  .25  .22  .01  .39  .03  .00  .03  .01]   when N < 28 & log_SD < 1.5
H_UCL    [.03  .08  .03  .00  .73  .02  .01  .07  .03]   when N >= 28
t_UCL    [.04  .19  .00  .00  .01  .61  .15  .00  .00]   when N < 28 & log_SD >= 1.5
The column labels are abbreviated UCL type names, and the bracketed values are the proportions
of cases in each leaf for which each UCL type minimizes the average risk.
The decision tree for minimax risk, when pruned to two levels, gives a very similar result. The
only difference is that the decision point for sample log SD is 1.3 instead of 1.5.
5.4 Tentative Lognormal UCL Recommendations
It is reasonable to compromise between the minimum average risk rule and the minimax risk
rule by placing the decision point for sample log SD at 1.4, midway between 1.5 and 1.3.
Therefore, the following rule can be tentatively recommended: use the H-UCL when N >= 28, or
when N < 28 and the sample log SD is less than 1.4; otherwise use the t-UCL.
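Expressed as code, the tentative rule is a two-condition lookup. The sketch below is illustrative only: the function name and return labels are hypothetical, and treating N = 28 as belonging to the H-UCL branch is an assumption that follows the "N >= 28" split shown in the decision rule output of Table 2.

```python
def recommend_lognormal_ucl(n, log_sd, n_break=28, log_sd_break=1.4):
    """Tentative UCL recommendation for data treated as lognormal.

    n is the sample size and log_sd the sample standard deviation of the
    log-transformed data. The boundary case n == n_break is assigned to
    the H-UCL (assumption, following the N >= 28 split in Table 2).
    """
    if n >= n_break or log_sd < log_sd_break:
        return "H_UCL"
    return "t_UCL"


large_sample = recommend_lognormal_ucl(50, 2.0)     # H_UCL
small_low_sd = recommend_lognormal_ucl(15, 0.9)     # H_UCL
small_high_sd = recommend_lognormal_ucl(15, 1.6)    # t_UCL
```

The rule applies only to data that have already passed the GoF screening as lognormal; it says nothing about data classified as normal or Gamma.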
It must be strongly pointed out again that these recommendations are for data generated from a
lognormal distribution and that have passed through the GoF screening procedure described in
Section 2.0. Passing the GoF screening procedure means that ProUCL 5.2 is treating the data as
lognormal. Some of the lognormal data, especially that simulated with the smallest CV (0.01),
was screened out as normal (was not rejected by either test of normality at level 0.1). Some of
the lognormal data simulated with small to moderate CV was screened out as Gamma (was not
rejected by either Gamma GoF test at level 0.05). After these two filters, the remaining data was
accepted as lognormal only if it was not rejected by two tests of lognormality at level 0.1.
Although this recommendation is for data treated by ProUCL 5.2 as lognormal, it would be
appropriate to run further simulations with data from other right-skewed distributions, including
mixture distributions, to confirm these recommendations. It must also be pointed out that the
effects of detection limit censoring were not modeled in this simulation. It is suspected that data
sets with a large number of nondetects could create problems for the H-UCL. This should also be
explored.
6.0 Conclusion
The results of this study provide clear and convincing evidence that neither the Chebyshev 95%
UCL nor the Chebyshev 90% UCL is a useful procedure for constructing UCLs for data deemed
to be lognormal.
Furthermore, analysis of the lognormal UCL simulation study data using RPart classification
trees for risk minimization allows formulation of a simple tentative recommendation rule for
UCLs in data classified as lognormal by ProUCL 5.2. That rule may be simply stated as:
H-UCL when N >= 28 or log-scale SD < 1.4, and the t-UCL otherwise.
It is important that this recommendation for lognormal data be confirmed by simulations with
other right-skewed distributions and by accounting for the effects of detection limit censoring.
7.0 References
Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis, 2nd edition. Springer-
Verlag.
Bickel, P.J. and K.A. Doksum (1977). Mathematical Statistics: Basic Ideas and Selected Topics.
Holden-Day. San Francisco.
Casella, G., Hwang, J.T., and Robert, C. (1990). Loss Functions for Set Estimation. Biometrics
Unit Technical Report BU-999-M, Cornell University.
Casella, G. and Hwang, J. T. (1991). Evaluating Confidence Sets Using Loss Functions.
Statistica Sinica, 1(1), 159-173. URL: http://www.jstor.org/stable/24303998
Casella, G., Hwang, J. T. G., and Robert, C. (1993). A Paradox in Decision-Theoretic
Interval Estimation. Statistica Sinica, 3(1), 141-155.
URL: http://www.jstor.org/stable/24304942
Davison, A.C. and D.V. Hinkley (1997). Bootstrap methods and their application. Cambridge
Series in Statistical and Probabilistic Mathematics. Cambridge University Press.
Fox, J. and S. Weisberg (2018). "Appendix: Nonparametric Regression in R" (PDF). An R
Companion to Applied Regression (3rd ed.). SAGE. ISBN 978-1-5443-3645-9.
Friedman, J. H. (1977). A recursive partitioning decision rule for nonparametric classification.
IEEE Trans. Computers, 26(4), 404-408.
Hollander, M. and D.A. Wolfe (1973). Nonparametric Statistical Methods. John Wiley & Sons.
Lehmann, E.L. (1983). Theory of Point Estimation. Wadsworth & Brooks. Pacific Grove, CA.
Lehmann, E.L. (1986). Testing Statistical Hypotheses: Second Edition. John Wiley & Sons.
Meeden, G. and Vardeman, S. (1985). Bayes and admissible set estimation. J. Amer. Statist.
Assoc. 80, 465-471.
Mood, A.M., F.A. Graybill, and C.D. Boes (1974). Introduction to the Theory of Statistics.
McGraw-Hill.
Randles, R.H. and D.A. Wolfe (1991). Introduction to the Theory of Nonparametric Statistics.
Krieger Publishing Co., Malabar, Florida. 450 pp.
Rao, C.R. (2002). Linear Statistical Inference and Its Applications: 2nd Edition. John Wiley and
Sons.
Serfling, R. (1980). Approximation Theorems of Mathematical Statistics. John Wiley and Sons.
Therneau, T. and B. Atkinson (2019). rpart: Recursive Partitioning and Regression Trees. R
package version 4.1-15. https://CRAN.R-project.org/package=rpart
Winkler, R. L. (1972). A Decision-Theoretic Approach to Interval Estimation. J. Amer. Statist.
Assoc. 67, 187-191.
Appendix A: Detailed UCL Plots using Deciles of Log SD
The following plots are the same as in Section 4.0, except that there the UCL summary statistics
and loss function values are plotted in groups by quartile of the log SD of the data sets from
which the UCLs were computed. Here the data are grouped for plotting by decile of the log SD
of the data sets. This gives a more detailed view of the behavior of the UCLs.
6 Jan 2022
32
-------
Analysis of UCL Simulations at the Lognormal Distribution
50 100 200 5001000
Sample Size
50 100 200 500 1000
Sample Size
O
1.000-
0)
o
c
.52
CO
>
0.100-
0.010-
LU
0)
>
0)
cc
0.001 *
o 1,00-
o
111
CO
o£
0) 0.10-
>
ro
0
OH 0.03-
10
50 100 200
Sample Size
5001000
10
50 100 200
Sample Size
UCL type
BCaJJCL
boot_t_UCL
Chebyshev_90_UCL
ChebyshevJJCL
Gamma_adj_UCL
H_UCL
HalIs_UCL
skew_t_UCL
t UCL
Figure 24. UCL summary for Lognormal with Std Dev of Logs in (0.0831,0.576]
10
in
c
o
o
c
3
LL
W
w
o
0.3
O
3
"O
CD
"O
c
D
O
CD
0.1
UCL type
BCa_UCL
- boot_t_UCL
Chebyshev_90_UCL
Chebyshev_UCL
Gamma_adj_UCL
H_UCL
Halis_UCL
skew_t_UCL
t UCL
.Q xG cP ,C; 4> r& r© .-£> .ft .-il ,Ci r£> r& p&
v* »}> :~p >>> f <§> v rf.> ^
Sample Size
Figure 25. UCL Bounded Loss for Lognormal with Std Dev of Logs in (0.0831,0.576]
6 Jan 2022
33
-------
Analysis of UCL Simulations at the Lognormal Distribution
t/i
C
o
u
c
3
LL
W
tf)
O
O
3
TD
(D
~o
c
3
o
_Q
C
3
1e-02
Loss_tOR_a_RelMSE_a
%
Loss LOR i RelMSE c
"SUsk>" " '
1e-02
UCL type
BCa_UCL
» boot_t_UCL
tt Chebyshev_90_UCL
Chebyshev_UCL
1 Gamma_adj_UCL
H_UCL
Halls_UCL
skew_t_UCL
t UCL
10 20 50 100 200 500 1000 10 20 50 100 200 500 1000
Sample Size
Figure 26. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev
of Logs in (0.0831,0.576]
CO
C
0
1 1 e-02
LL
Cf>
tn
o
O
T3
~o
c
3
o
_q
c.
Lossjarobit_c_RelMSE_a
1e+02 -
1e+00 -
1 e-02 -
Loss_probit_c_RelMSE_c
4,
-C~ - *
UCL type
BCa_UCL
» boot_t_UCL
Chebyshev_90_UCL
Chebyshev_UCL
¦ - Gamma_adj_UCL
= H_UCL
Halls_UCL
skew_t_UCL
t UCL
10 20 50 100 200 500 1000 10 20 50 100 200 500 1000
Sample Size
Figure 27. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std
Dev of Logs in (0.0831,0.576]
Figure 28. UCL summary for Lognormal with Std Dev of Logs in (0.576,0.773]
[Plot not reproduced. x-axis: Sample Size. Legend: UCL type.]

Figure 29. UCL Bounded Loss for Lognormal with Std Dev of Logs in (0.576,0.773]
[Plot not reproduced. y-axis: Bounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]
Figure 30. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev of Logs in (0.576,0.773]
[Plot not reproduced. Panels: Loss_LOR_a_RelMSE_a, Loss_LOR_a_RelMSE_c, Loss_LOR_c_RelMSE_a, Loss_LOR_c_RelMSE_c. y-axis: Unbounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]

Figure 31. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std Dev of Logs in (0.576,0.773]
[Plot not reproduced. Panels: Loss_probit_a_RelMSE_a, Loss_probit_a_RelMSE_c, Loss_probit_c_RelMSE_a, Loss_probit_c_RelMSE_c. y-axis: Unbounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]
Figure 32. UCL summary for Lognormal with Std Dev of Logs in (0.773,0.982]
[Plot not reproduced. x-axis: Sample Size. Legend: UCL type.]

Figure 33. UCL Bounded Loss for Lognormal with Std Dev of Logs in (0.773,0.982]
[Plot not reproduced. y-axis: Bounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]
Figure 34. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev of Logs in (0.773,0.982]
[Plot not reproduced. Panels: Loss_LOR_a_RelMSE_a, Loss_LOR_a_RelMSE_c, Loss_LOR_c_RelMSE_a, Loss_LOR_c_RelMSE_c. y-axis: Unbounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]

Figure 35. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std Dev of Logs in (0.773,0.982]
[Plot not reproduced. Panels: Loss_probit_a_RelMSE_a, Loss_probit_a_RelMSE_c, Loss_probit_c_RelMSE_a, Loss_probit_c_RelMSE_c. y-axis: Unbounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]
Figure 36. UCL summary for Lognormal with Std Dev of Logs in (0.982,1.17]
[Plot not reproduced. x-axis: Sample Size. Legend: UCL type.]

Figure 37. UCL Bounded Loss for Lognormal with Std Dev of Logs in (0.982,1.17]
[Plot not reproduced. y-axis: Bounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]
Figure 38. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev of Logs in (0.982,1.17]
[Plot not reproduced. Panels: Loss_LOR_a_RelMSE_a, Loss_LOR_a_RelMSE_c, Loss_LOR_c_RelMSE_a, Loss_LOR_c_RelMSE_c. y-axis: Unbounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]

Figure 39. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std Dev of Logs in (0.982,1.17]
[Plot not reproduced. Panels: Loss_probit_a_RelMSE_a, Loss_probit_a_RelMSE_c, Loss_probit_c_RelMSE_a, Loss_probit_c_RelMSE_c. y-axis: Unbounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]
Figure 40. UCL summary for Lognormal with Std Dev of Logs in (1.17,1.37]
[Plot not reproduced. x-axis: Sample Size. Legend: UCL type.]

Figure 41. UCL Bounded Loss for Lognormal with Std Dev of Logs in (1.17,1.37]
[Plot not reproduced. y-axis: Bounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]
Figure 42. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev of Logs in (1.17,1.37]
[Plot not reproduced. Panels: Loss_LOR_a_RelMSE_a, Loss_LOR_a_RelMSE_c, Loss_LOR_c_RelMSE_a, Loss_LOR_c_RelMSE_c. y-axis: Unbounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]

Figure 43. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std Dev of Logs in (1.17,1.37]
[Plot not reproduced. Panels: Loss_probit_a_RelMSE_a, Loss_probit_a_RelMSE_c, Loss_probit_c_RelMSE_a, Loss_probit_c_RelMSE_c. y-axis: Unbounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]
Figure 44. UCL summary for Lognormal with Std Dev of Logs in (1.37,1.57]
[Plot not reproduced. x-axis: Sample Size. Legend: UCL type.]

Figure 45. UCL Bounded Loss for Lognormal with Std Dev of Logs in (1.37,1.57]
[Plot not reproduced. y-axis: Bounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]
Figure 46. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev of Logs in (1.37,1.57]
[Plot not reproduced. Panels: Loss_LOR_a_RelMSE_a, Loss_LOR_a_RelMSE_c, Loss_LOR_c_RelMSE_a, Loss_LOR_c_RelMSE_c. y-axis: Unbounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]

Figure 47. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std Dev of Logs in (1.37,1.57]
[Plot not reproduced. Panels: Loss_probit_a_RelMSE_a, Loss_probit_a_RelMSE_c, Loss_probit_c_RelMSE_a, Loss_probit_c_RelMSE_c. y-axis: Unbounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]
Figure 48. UCL summary for Lognormal with Std Dev of Logs in (1.57,1.73]
[Plot not reproduced. x-axis: Sample Size. Legend: UCL type.]

Figure 49. UCL Bounded Loss for Lognormal with Std Dev of Logs in (1.57,1.73]
[Plot not reproduced. y-axis: Bounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]
Figure 50. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev of Logs in (1.57,1.73]
[Plot not reproduced. Panels: Loss_LOR_a_RelMSE_a, Loss_LOR_a_RelMSE_c, Loss_LOR_c_RelMSE_a, Loss_LOR_c_RelMSE_c. y-axis: Unbounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]

Figure 51. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std Dev of Logs in (1.57,1.73]
[Plot not reproduced. Panels: Loss_probit_a_RelMSE_a, Loss_probit_a_RelMSE_c, Loss_probit_c_RelMSE_a, Loss_probit_c_RelMSE_c. y-axis: Unbounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]
Figure 52. UCL summary for Lognormal with Std Dev of Logs in (1.73,1.95]
[Plot not reproduced. x-axis: Sample Size. Legend: UCL type.]

Figure 53. UCL Bounded Loss for Lognormal with Std Dev of Logs in (1.73,1.95]
[Plot not reproduced. y-axis: Bounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]
Figure 54. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev of Logs in (1.73,1.95]
[Plot not reproduced. Panels: Loss_LOR_a_RelMSE_a, Loss_LOR_a_RelMSE_c, Loss_LOR_c_RelMSE_a, Loss_LOR_c_RelMSE_c. y-axis: Unbounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]

Figure 55. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std Dev of Logs in (1.73,1.95]
[Plot not reproduced. Panels: Loss_probit_a_RelMSE_a, Loss_probit_a_RelMSE_c, Loss_probit_c_RelMSE_a, Loss_probit_c_RelMSE_c. y-axis: Unbounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]
Figure 56. UCL summary for Lognormal with Std Dev of Logs in (1.95,2.25]
[Plot not reproduced. x-axis: Sample Size. Legend: UCL type.]

Figure 57. UCL Bounded Loss for Lognormal with Std Dev of Logs in (1.95,2.25]
[Plot not reproduced. Panels: Loss_bnd_a, Loss_bnd_m, Loss_bnd_c. y-axis: Bounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]
Figure 58. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev of Logs in (1.95,2.25]
[Plot not reproduced. Panels: Loss_LOR_a_RelMSE_a, Loss_LOR_a_RelMSE_c, Loss_LOR_c_RelMSE_a, Loss_LOR_c_RelMSE_c. y-axis: Unbounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]

Figure 59. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std Dev of Logs in (1.95,2.25]
[Plot not reproduced. Panels: Loss_probit_a_RelMSE_a, Loss_probit_a_RelMSE_c, Loss_probit_c_RelMSE_a, Loss_probit_c_RelMSE_c. y-axis: Unbounded UCL Loss Functions. x-axis: Sample Size. Legend: UCL type.]
Figure 60. UCL summary for Lognormal with Std Dev of Logs in (2.25,5.41]
[Plot not reproduced. x-axis: Sample Size. Legend: UCL type.]

Figure 61. UCL Bounded Loss for Lognormal with Std Dev of Logs in (2.25,5.41]
[Plot not reproduced. x-axis: Sample Size. Legend: UCL type.]
Figure 62. UCL Unbounded Loss with LOR loss for Coverage for Lognormal with Std Dev of Logs in (2.25,5.41]
[Plot not reproduced. Panels: Loss_LOR_a_RelMSE_a, Loss_LOR_a_RelMSE_c, Loss_LOR_c_RelMSE_a, Loss_LOR_c_RelMSE_c. x-axis: Sample Size. Legend: UCL type.]

Figure 63. UCL Unbounded Loss with Probit loss for Coverage for Lognormal with Std Dev of Logs in (2.25,5.41]
[Plot not reproduced. Panels: Loss_probit_a_RelMSE_a, Loss_probit_a_RelMSE_c, Loss_probit_c_RelMSE_a, Loss_probit_c_RelMSE_c. x-axis: Sample Size. Legend: UCL type.]
Appendix B: Session Info
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rpart.plot_3.1.0 rpart_4.1-15 ftExtra_0.2.0 flextable_0.6.10
## [5] captioner_2.2.3 scales_1.1.1 cowplot_1.1.1 reshape_0.8.8
## [9] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
## [13] readr_2.1.1 tidyr_1.1.4 tibble_3.1.6 ggplot2_3.3.5
## [17] tidyverse_1.3.1 install.load_1.2.3
##
## loaded via a namespace (and not attached):
## [1] httr_1.4.2 splines_4.1.2 jsonlite_1.7.2 modelr_0.1.8
## [5] assertthat_0.2.1 highr_0.9 cellranger_1.1.0 yaml_2.2.1
## [9] gdtools_0.2.3 lattice_0.20-45 pillar_1.6.4 backports_1.4.1
## [13] glue_1.6.0 uuid_1.0-3 digest_0.6.29 checkmate_2.0.0
## [17] rvest_1.0.2 colorspace_2.0-2 Matrix_1.4-0 htmltools_0.5.2
## [21] plyr_1.8.6 pkgconfig_2.0.3 broom_0.7.10 haven_2.4.3
## [25] officer_0.4.1 tzdb_0.2.0 mgcv_1.8-38 generics_0.1.1
## [29] farver_2.1.0 ellipsis_0.3.2 withr_2.4.3 cli_3.1.0
## [33] magrittr_2.0.1 crayon_1.4.2 readxl_1.3.1 evaluate_0.14
## [37] fs_1.5.2 fansi_0.5.0 nlme_3.1-153 xml2_1.3.3
## [41] tools_4.1.2 data.table_1.14.2 hms_1.1.1 lifecycle_1.0.1
## [45] munsell_0.5.0 reprex_2.0.1 zip_2.2.0 compiler_4.1.2
## [49] systemfonts_1.0.3 rlang_0.4.12 grid_4.1.2 rstudioapi_0.13
## [53] base64enc_0.1-3 labeling_0.4.2 rmarkdown_2.11.3 gtable_0.3.0
## [57] DBI_1.1.2 R6_2.5.1 lubridate_1.8.0 knitr_1.37
## [61] fastmap_1.1.0 utf8_1.2.2 fastmatch_1.1-3 stringi_1.7.6
## [65] Rcpp_1.0.7 vctrs_0.3.8 dbplyr_2.1.1 tidyselect_1.1.1
## [69] xfun_0.29
End of Report