Shape estimation in truncated Pareto in R

Question

library(VGAM)
library(fitdistrplus)

fitdist(u_NI$k_u, 'truncpareto',
         start = list(lower=1,
                      upper=42016,
                      shape=1)) -> fit.k_u

length(u_NI$k_u)  # 637594

I get this error:

<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data,     gr = gradient, ddistnam = ddistname, hessian = TRUE, method = meth,     lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
Error in fitdist(u_NI$k_u, "truncpareto", start = list(lower = 1, upper = 42016,  : 
  the function mle failed to estimate the parameters, 
                with the error code 100
In addition: Warning messages:
1: In fitdist(u_NI$k_u, "truncpareto", start = list(lower = 1, upper = 42016,  :
  The dtruncpareto function should return a zero-length vector when input has length zero
2: In fitdist(u_NI$k_u, "truncpareto", start = list(lower = 1, upper = 42016,  :
  The ptruncpareto function should return a zero-length vector when input has length zero

Is the problem the size of the data set, or the starting parameters?

Reproducible example:

library(VGAM)
library(fitdistrplus)

rtruncpareto(100,1,100,1.5) -> a
fitdist(a, "truncpareto",
        start = list(lower=1,
                     upper=100,
                     shape=1.5))

This does not work, and I don't understand why.

The problem seems to be here:

argument 'lower' must be positive

Answer 1

I'm not entirely sure, but this may be what you want.

library(fitdistrplus)
library(VGAM)
library(ggplot2)

u_NI <- data.frame("k_u" = rtruncpareto(10000,
                                        lower = 1,
                                        upper = 100,
                                        shape = 1.5))

fit <- vglm(k_u ~ 1,
            truncpareto(lower=1, upper=max(u_NI$k_u) + 1),
            data = u_NI,
            trace = TRUE)

x <- seq(1, max(u_NI$k_u), length.out = 10000)
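# vglm() fits the shape on the log scale by default (lshape = "loglink"),
# so the intercept has to be back-transformed with exp()
y <- dtruncpareto(x, shape = exp(coef(fit)[["(Intercept)"]]),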
                  lower=fit@extra$lower,
                  upper=fit@extra$upper)
pareto <- cbind.data.frame(x, y)

ggplot()+
  geom_density(data = u_NI, aes(k_u))+
  geom_line(data = pareto, aes(x = x, y = y, color = "truncpareto"),linetype = 5, size = 1.3)+
  theme(legend.position = c(.95, .95),
        legend.justification = c(1, 1),
        legend.title = element_blank())

########################## Edit:

# fix the lower and upper boundaries; the fit may still fail, depending on the start value of shape
fitdist(u_NI$k_u, 'truncpareto',
        method = "mle",
        start = list(shape=1),
        fix.arg=list(lower=1, upper=max(u_NI$k_u) + 0.01),
        control=list(trace=1, REPORT=1)) -> fit.k_u

# use different method to estimate parameter, mge seems to work
fitdist(u_NI$k_u, 'truncpareto',
        method = "mge",
        start = list(shape=1,
                     lower=1,
                     upper=max(u_NI$k_u) + 1),
        control=list(trace=1, REPORT=1)) -> fit.k_u

The remaining warnings refer to this:

> pnorm(numeric(), 1,10)
numeric(0)
> ptruncpareto(numeric(), 1,10,5)
[1] NA
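
One way to avoid that warning might be to wrap the VGAM functions so that zero-length input returns a zero-length vector. This is only a sketch: the names dtp and ptp are hypothetical, and I have not checked that fitdist accepts them without further complaints.

```r
# Hypothetical wrappers (dtp / ptp are made-up names, not part of VGAM).
# They return numeric(0) for zero-length input, which is what the
# fitdist warning asks for, and otherwise defer to VGAM.
dtp <- function(x, lower, upper, shape, log = FALSE) {
  if (length(x) == 0) return(numeric(0))
  VGAM::dtruncpareto(x, lower = lower, upper = upper, shape = shape, log = log)
}

ptp <- function(q, lower, upper, shape) {
  if (length(q) == 0) return(numeric(0))
  VGAM::ptruncpareto(q, lower = lower, upper = upper, shape = shape)
}

# Zero-length input now gives a zero-length result
length(dtp(numeric(0), lower = 1, upper = 10, shape = 5))
length(ptp(numeric(0), lower = 1, upper = 10, shape = 5))
```

With these defined, the distribution name passed to fitdist would presumably be "tp" instead of "truncpareto".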

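####### Edit 2: I think I found the original error, and it is a bit confusing. The mle method also has a lower argument, which is separate from the lower parameter of the distribution. It should restrict the parameter estimates to >= 0. So this should work, but it takes a long time, even with only 10,000 values: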

fitdist(u_NI$k_u, 'truncpareto',
        method = "mle",
        start = list(shape=1,
                     lower=1,
                     upper=max(u_NI$k_u) +1 ),
        control=list(trace=1, REPORT=1),
        lower = 0) -> fit.k_u

Answer 2

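Part of your problem is a more general one: for truncated distributions, the MLE of the boundary parameters is typically equal to the observed min/max of the data set. So fixing the lower/upper bounds at the observed min/max should always give a fit at least as good as any other choice of bounds (in my experience trying this, they have to be slightly below/above the observed bounds). (I also found that I had to set lower = 0 to stop the algorithm from trying negative values of the shape parameter.)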

library(VGAM); library(fitdistrplus)
set.seed(101)
rtruncpareto(100,1,100,1.5) -> a
eps <- 1e-8
fitdist(a, "truncpareto",
        start = list(shape=1.5),
        fix.arg = list(lower = min(a) - eps, upper = max(a) + eps),
        lower = 0)
Parameters:
      estimate Std. Error
shape 1.349885  0.1554436
Fixed parameters:
          value
lower  1.006844
upper 25.906577
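
To see why the bounds end up at the observed extremes, here is a rough base-R sketch of the truncated Pareto log-likelihood, using the density shape * lower^shape / (x^(shape + 1) * (1 - (lower/upper)^shape)). The helpers tp_loglik and rtp are written just for this illustration and are not VGAM functions.

```r
# Log-likelihood of the truncated Pareto, written out by hand.
# Returns -Inf if any observation falls outside [lower, upper].
tp_loglik <- function(x, lower, upper, shape) {
  if (any(x < lower) || any(x > upper)) return(-Inf)
  n <- length(x)
  n * log(shape) + n * shape * log(lower) - (shape + 1) * sum(log(x)) -
    n * log1p(-(lower / upper)^shape)
}

# Inverse-CDF sampler, so the sketch does not depend on VGAM
rtp <- function(n, lower, upper, shape) {
  u <- runif(n)
  lower * (1 - u * (1 - (lower / upper)^shape))^(-1 / shape)
}

set.seed(101)
x <- rtp(100, lower = 1, upper = 100, shape = 1.5)

# Same shape, two choices of bounds: the true ones and the observed extremes
ll_true  <- tp_loglik(x, lower = 1,      upper = 100,    shape = 1.5)
ll_tight <- tp_loglik(x, lower = min(x), upper = max(x), shape = 1.5)

ll_tight > ll_true   # tightening the bounds raises the likelihood

# Moving a bound inside the data range makes the likelihood drop to -Inf
tp_loglik(x, lower = min(x) + 0.01, upper = max(x), shape = 1.5)
```

For a fixed shape the log-likelihood increases in lower and decreases in upper, so it keeps improving until the bounds hit the sample minimum and maximum.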

An alternative to fitdist is bbmle:

library(bbmle)
m1 <- mle2(a ~ dtruncpareto(lower = min(a) - eps,
                            upper = max(a) + eps,
                            shape = exp(lshape)),
           start = list(lshape = 0),
           data = data.frame(a),
           method = "BFGS")
exp(coef(m1))
  lshape 
1.349884

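bbmle is slightly more flexible: it lets you fit the shape parameter on the log scale, which is generally more robust (and makes the standard error estimates more useful). method = "BFGS" is used here because the default Nelder-Mead algorithm does not work well for one-dimensional optimization; method = "Brent" would also work (and would be more efficient and more robust), but then you need to supply explicit lower and upper bounds for the lshape parameter.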
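
For illustration, a bounded one-dimensional search on the log scale can be sketched with base R's optimize(), which is what optim's "Brent" method uses internally. This is not the bbmle call itself: the log-likelihood is written out by hand, and the data, the sample size of 1000 and the search interval c(-5, 5) for lshape are all assumptions made for this example.

```r
# Hand-written truncated Pareto log-likelihood (illustrative helper, not from VGAM)
tp_loglik <- function(x, lower, upper, shape) {
  n <- length(x)
  n * log(shape) + n * shape * log(lower) - (shape + 1) * sum(log(x)) -
    n * log1p(-(lower / upper)^shape)
}

# Simulate data by inverse CDF with lower = 1, upper = 100, shape = 1.5
set.seed(101)
u <- runif(1000)
x <- 1 * (1 - u * (1 - (1 / 100)^1.5))^(-1 / 1.5)

# Fix the bounds just outside the observed range, as in the fitdist call above
eps <- 1e-8
lo <- min(x) - eps
hi <- max(x) + eps

# Maximize over lshape = log(shape) within explicit bounds
opt <- optimize(function(lshape) tp_loglik(x, lo, hi, exp(lshape)),
                interval = c(-5, 5), maximum = TRUE)

shape_hat <- exp(opt$maximum)  # back-transform to the shape scale
shape_hat
```

Searching over log(shape) keeps the shape positive without needing a separate lower = 0 constraint.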
