R中截断帕累托的形状估计
Shape estimation in truncated Pareto in R
library(VGAM)
library(fitdistrplus)
fitdist(u_NI$k_u, 'truncpareto',
start = list(lower=1,
upper=42016,
shape=1)) -> fit.k_u
length(u_NI$k_u) = 637594
我收到这个错误:
<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, ddistnam = ddistname, hessian = TRUE, method = meth, lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
Error in fitdist(u_NI$k_u, "truncpareto", start = list(lower = 1, upper = 42016, :
the function mle failed to estimate the parameters,
with the error code 100
In addition: Warning messages:
1: In fitdist(u_NI$k_u, "truncpareto", start = list(lower = 1, upper = 42016, :
The dtruncpareto function should return a zero-length vector when input has length zero
2: In fitdist(u_NI$k_u, "truncpareto", start = list(lower = 1, upper = 42016, :
The ptruncpareto function should return a zero-length vector when input has length zero
是数据集过大的问题还是起始参数的问题?
可重现的例子:
library(VGAM)
library(fitdistrplus)
rtruncpareto(100,1,100,1.5) -> a
fitdist(a, "truncpareto",
start = list(lower=1,
upper=100,
shape=1.5))
这行不通,我不明白为什么。
好像这里有问题:
argument 'lower' must be positive
我不太确定,但这可能是你想要的。
library(fitdistrplus)
library(VGAM)
library(ggplot2)
u_NI <- data.frame("k_u" = rtruncpareto(10000,
lower = 1,
upper = 100,
shape = 1.5))
fit <- vglm(k_u ~ 1,
truncpareto(lower=1, upper=max(u_NI$k_u) + 1),
data = u_NI,
trace = TRUE)
x <- seq(1, max(u_NI$k_u), length.out = 10000)
y <- dtruncpareto(x, shape = fit@coefficients[["(Intercept)"]],
lower=fit@extra$lower,
upper=fit@extra$upper)
pareto <- cbind.data.frame(x, y)
ggplot()+
geom_density(data = u_NI, aes(k_u))+
geom_line(data = pareto, aes(x = x, y = y, color = "truncpareto"),linetype = 5, size = 1.3)+
theme(legend.position = c(.95, .95),
legend.justification = c(1, 1),
legend.title = element_blank())
##########################
编辑:
# fix lower and upper boundary, estimate may still fail, depending on the start value of shape
fitdist(u_NI$k_u, 'truncpareto',
method = "mle",
start = list(shape=1),
fix.arg=list(lower=1, upper=max(u_NI$k_u) + 0.01),
control=list(trace=1, REPORT=1)) -> fit.k_u
# use different method to estimate parameter, mge seems to work
fitdist(u_NI$k_u, 'truncpareto',
method = "mge",
start = list(shape=1,
lower=1,
upper=max(u_NI$k_u) + 1),
control=list(trace=1, REPORT=1)) -> fit.k_u
剩下的错误是指这个:
> pnorm(numeric(), 1,10)
numeric(0)
> ptruncpareto(numeric(), 1,10,5)
[1] NA
####### 编辑 2:
我想我发现了最初的错误并且有点混乱。但是,mle 也有一个较低的参数,它与分布的较低参数分开。它应该将参数估计值限制在 >= 0。因此,这应该可以工作,但是需要很长时间,即使只有 10,000 个值:
fitdist(u_NI$k_u, 'truncpareto',
method = "mle",
start = list(shape=1,
lower=1,
upper=max(u_NI$k_u) +1 ),
control=list(trace=1, REPORT=1),
lower = 0) -> fit.k_u
您的部分问题是一个更普遍的问题,即对于截断分布,边界参数的 MLE 通常等于 observed min/max 的数据集。因此,通过将 lower/upper 边界的值设置为等于 min/max,您应该始终至少能够做到这一点(根据我尝试此操作的经验,它们必须 略微 below/above 观察到的边界)。 (我还发现我必须设置 lower = 0
来阻止算法尝试形状参数的负值。)
library(VGAM); library(fitdistrplus)
set.seed(101)
rtruncpareto(100,1,100,1.5) -> a
eps <- 1e-8
fitdist(a, "truncpareto",
start = list(shape=1.5),
fix.arg = list(lower = min(a) - eps, upper = max(a) + eps),
lower = 0)
Parameters:
estimate Std. Error
shape 1.349885 0.1554436
Fixed parameters:
value
lower 1.006844
upper 25.906577
fitdist
的替代方法是 bbmle
:
library(bbmle)
m1 <- mle2(a ~ dtruncpareto(lower = min(a) - eps,
upper = max(a) + eps,
shape = exp(lshape)),
start = list(lshape = 0),
data = data.frame(a),
method = "BFGS")
exp(coef(m1))
lshape
1.349884
bbmle
稍微灵活一点,允许您在对数刻度上拟合形状参数,这通常更稳健(并使标准偏差估计更有用)。这里使用 method = "BFGS"
是因为默认的 Nelder-Mead 算法对于一维优化效果不佳;也可以使用 method = "Brent"
(这会更有效和更健壮),但随后需要为 lshape
参数提供明确的下限和上限。
library(VGAM)
library(fitdistrplus)
fitdist(u_NI$k_u, 'truncpareto',
start = list(lower=1,
upper=42016,
shape=1)) -> fit.k_u
length(u_NI$k_u) = 637594
我收到这个错误:
<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, ddistnam = ddistname, hessian = TRUE, method = meth, lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
Error in fitdist(u_NI$k_u, "truncpareto", start = list(lower = 1, upper = 42016, :
the function mle failed to estimate the parameters,
with the error code 100
In addition: Warning messages:
1: In fitdist(u_NI$k_u, "truncpareto", start = list(lower = 1, upper = 42016, :
The dtruncpareto function should return a zero-length vector when input has length zero
2: In fitdist(u_NI$k_u, "truncpareto", start = list(lower = 1, upper = 42016, :
The ptruncpareto function should return a zero-length vector when input has length zero
是数据集过大的问题还是起始参数的问题?
可重现的例子:
library(VGAM)
library(fitdistrplus)
rtruncpareto(100,1,100,1.5) -> a
fitdist(a, "truncpareto",
start = list(lower=1,
upper=100,
shape=1.5))
这行不通,我不明白为什么。
好像这里有问题:
argument 'lower' must be positive
我不太确定,但这可能是你想要的。
library(fitdistrplus)
library(VGAM)
library(ggplot2)
u_NI <- data.frame("k_u" = rtruncpareto(10000,
lower = 1,
upper = 100,
shape = 1.5))
fit <- vglm(k_u ~ 1,
truncpareto(lower=1, upper=max(u_NI$k_u) + 1),
data = u_NI,
trace = TRUE)
x <- seq(1, max(u_NI$k_u), length.out = 10000)
y <- dtruncpareto(x, shape = fit@coefficients[["(Intercept)"]],
lower=fit@extra$lower,
upper=fit@extra$upper)
pareto <- cbind.data.frame(x, y)
ggplot()+
geom_density(data = u_NI, aes(k_u))+
geom_line(data = pareto, aes(x = x, y = y, color = "truncpareto"),linetype = 5, size = 1.3)+
theme(legend.position = c(.95, .95),
legend.justification = c(1, 1),
legend.title = element_blank())
########################## 编辑:
# fix lower and upper boundary, estimate may still fail, depending on the start value of shape
fitdist(u_NI$k_u, 'truncpareto',
method = "mle",
start = list(shape=1),
fix.arg=list(lower=1, upper=max(u_NI$k_u) + 0.01),
control=list(trace=1, REPORT=1)) -> fit.k_u
# use different method to estimate parameter, mge seems to work
fitdist(u_NI$k_u, 'truncpareto',
method = "mge",
start = list(shape=1,
lower=1,
upper=max(u_NI$k_u) + 1),
control=list(trace=1, REPORT=1)) -> fit.k_u
剩下的错误是指这个:
> pnorm(numeric(), 1,10)
numeric(0)
> ptruncpareto(numeric(), 1,10,5)
[1] NA
####### 编辑 2: 我想我发现了最初的错误并且有点混乱。但是,mle 也有一个较低的参数,它与分布的较低参数分开。它应该将参数估计值限制在 >= 0。因此,这应该可以工作,但是需要很长时间,即使只有 10,000 个值:
fitdist(u_NI$k_u, 'truncpareto',
method = "mle",
start = list(shape=1,
lower=1,
upper=max(u_NI$k_u) +1 ),
control=list(trace=1, REPORT=1),
lower = 0) -> fit.k_u
您的部分问题是一个更普遍的问题,即对于截断分布,边界参数的 MLE 通常等于 observed min/max 的数据集。因此,通过将 lower/upper 边界的值设置为等于 min/max,您应该始终至少能够做到这一点(根据我尝试此操作的经验,它们必须 略微 below/above 观察到的边界)。 (我还发现我必须设置 lower = 0
来阻止算法尝试形状参数的负值。)
library(VGAM); library(fitdistrplus)
set.seed(101)
rtruncpareto(100,1,100,1.5) -> a
eps <- 1e-8
fitdist(a, "truncpareto",
start = list(shape=1.5),
fix.arg = list(lower = min(a) - eps, upper = max(a) + eps),
lower = 0)
Parameters:
estimate Std. Error
shape 1.349885 0.1554436
Fixed parameters:
value
lower 1.006844
upper 25.906577
fitdist
的替代方法是 bbmle
:
library(bbmle)
m1 <- mle2(a ~ dtruncpareto(lower = min(a) - eps,
upper = max(a) + eps,
shape = exp(lshape)),
start = list(lshape = 0),
data = data.frame(a),
method = "BFGS")
exp(coef(m1))
lshape
1.349884
bbmle
稍微灵活一点,允许您在对数刻度上拟合形状参数,这通常更稳健(并使标准偏差估计更有用)。这里使用 method = "BFGS"
是因为默认的 Nelder-Mead 算法对于一维优化效果不佳;也可以使用 method = "Brent"
(这会更有效和更健壮),但随后需要为 lshape
参数提供明确的下限和上限。