Function for estimating Generalized Pareto Distribution parameters in R not working
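In this paper (https://www.jstor.org/stable/27867255?seq=1#metadata_info_tab_contents), the authors propose a method for estimating the parameters of the GPD based on the profile likelihood function and an empirical Bayes approach. The code they provide is as follows: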
# x is the sample data from the GPD
f <- function(x) {
  n <- length(x)
  x <- sort(x)
  lx <- function(b, x) {
    k <- -mean(log(1 - b * x))
    if (b == 0) {
      k - 1 - log(mean(x))
    } else {
      k - 1 + log(b / k)
    }
  }
  p <- (3:9) / 10
  xp <- x[round(n * (1 - p) + .5)]
  m <- 20 + round(n^.5)
  xq <- x[round(n * (1 - p * p) + .5)]
  k <- log(xq / xp - 1, p)
  a <- k * xp / (1 - p^k)
  a[k == 0] <- (-xp / log(p))[k == 0]
  k <- -1
  b <- w <- L <- (n - 1) / (n + 1) / n[n] - (1 - ((1:m - .5) / m)^k) / k / median(a) / 2
  for (i in 1:m) L[i] <- n * lx(b[i], x)
  for (i in 1:m) w[i] <- 1 / sum(exp(L - L[i]))
  b <- sum(b * w)
  k <- -mean(log(1 - b * x))
  list(sigma = k / b, k = k)
}
I simulated a vector x of size n = 100 from the GPD with shape parameter equal to 1 and scale parameter equal to -1. Then I get the error:
Error in if (condition) { : missing value where TRUE/FALSE needed
According to the question "Error in if/while (condition) {: missing Value where TRUE/FALSE needed", the error can be attributed to NA values in the b vector, but I don't understand why that would happen.
My x vector:
x
[1] 0.24264350 0.71418670 0.90929131 1.31011008 1.26467953 0.66491141 40.16132902
[8] 0.24815930 0.83783287 1.18879459 1.87167421 0.00396890 1.58125086 0.32042532
[15] 0.01330135 0.80800732 0.67832942 0.32623323 10.87447448 0.64779973 0.16550198
[22] 1.03595024 0.08463867 6.17820208 15.07862358 0.19188437 0.14925543 2.20031552
[29] 1.14093185 2.92184174 0.79279984 0.14736789 0.32954571 24.41728358 5.64925376
[36] 1.31141081 1.62897827 0.01491289 0.86641233 0.82289179 0.12105522 1.66701079
[43] 4.65817711 53.08764134 8.87696704 0.61560327 0.77817268 0.65148331 2.33976096
[50] 2.96629395 0.43783850 3.11005777 0.44923740 0.27073261 2.98854135 1.85962571
[57] 0.67065561 1.09661475 1.55934896 1.65683579 0.62373160 27.04125805 0.16137916
[64] 0.32925148 0.40487288 1.41714764 0.48798225 0.07624247 2.32993211 2.13228723
[71] 6.74938943 2.10121108 0.56772145 2.84504482 1.67119601 1.68640938 23.65633183
[78] 7.78195561 0.21317910 0.33639542 1.14508402 6.61466064 2.03818446 2.28166528
[85] 0.28095778 30.93008603 0.16512528 1.21974281 0.32121843 103.90277963 0.09672460
[92] 1.04882224 1.68444513 29.93956683 0.43961534 2.60623811 0.25039076 0.76208631
[99] 568.46662349 1.66133511
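I think there are some transcription errors in your function (to be fair, the code in the paper is not well formatted). In particular, the line defining b divides by n[n] rather than x[n]; since n has length one, n[n] is NA, so every element of b is NA and the test b == 0 inside lx fails with the error you see. I think this is what you are looking for: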
f <- function(x) {
  n <- length(x)
  x <- sort(x)
  lx <- function(b, x) {
    k <- -mean(log(1 - b * x))
    if (b == 0) k - 1 - log(mean(x)) else k - 1 + log(b / k)
  }
  p <- (3:9) / 10
  xp <- x[round(n * (1 - p) + 0.5)]
  m <- 20 + round(n^0.5)
  xq <- x[round(n * (1 - p * p) + 0.5)]
  k <- log(xq / xp - 1, p)
  a <- k * xp / (1 - p^k)
  a[k == 0] <- (-xp / log(p))[k == 0]
  k <- -1
  # x[n] is the sample maximum (x is sorted); the question's version had n[n] here
  b <- w <- L <- (n - 1) / (n + 1) / x[n] - (1 - ((1:m - 0.5) / m)^k) / k / median(a) / 2
  for (i in 1:m) L[i] <- n * lx(b[i], x)
  for (i in 1:m) w[i] <- 1 / sum(exp(L - L[i]))
  b <- sum(b * w)
  k <- -mean(log(1 - b * x))
  list(sigma = k / b, k = k)
}
With your data, this gives:
f(x)
#> $sigma
#> [1] 1.050348
#>
#> $k
#> [1] -1.070265
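Comparing the two listings, one difference is on the line that defines b: the question's version divides by n[n], while the version above divides by x[n]. The following minimal sketch (not part of the paper's code; the value 100 simply mirrors the sample size in the question) illustrates how that single expression is enough to produce the reported error:

```r
# n plays the role of length(x) in the question's function: a length-one vector.
n <- 100

# Indexing a length-one vector at position 100 is out of range, so R returns NA.
out_of_range <- n[n]

# Every quantity computed from it is NA as well, including the whole b vector.
b <- (n - 1) / (n + 1) / out_of_range

# An NA condition inside if() is an error; capture it instead of stopping.
err <- tryCatch(
  if (b == 0) "zero" else "nonzero",
  error = function(e) e
)

print(is.na(out_of_range))
print(is.na(b))
print(conditionMessage(err))
```

With x[n] instead, the expression picks the largest observation of the sorted sample, which is presumably what the paper intended.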
Data
x <- c(0.2426435, 0.7141867, 0.90929131, 1.31011008, 1.26467953, 0.66491141,
40.16132902, 0.2481593, 0.83783287, 1.18879459, 1.87167421, 0.0039689,
1.58125086, 0.32042532, 0.01330135, 0.80800732, 0.67832942, 0.32623323,
10.87447448, 0.64779973, 0.16550198, 1.03595024, 0.08463867,
6.17820208, 15.07862358, 0.19188437, 0.14925543, 2.20031552,
1.14093185, 2.92184174, 0.79279984, 0.14736789, 0.32954571, 24.41728358,
5.64925376, 1.31141081, 1.62897827, 0.01491289, 0.86641233, 0.82289179,
0.12105522, 1.66701079, 4.65817711, 53.08764134, 8.87696704,
0.61560327, 0.77817268, 0.65148331, 2.33976096, 2.96629395, 0.4378385,
3.11005777, 0.4492374, 0.27073261, 2.98854135, 1.85962571, 0.67065561,
1.09661475, 1.55934896, 1.65683579, 0.6237316, 27.04125805, 0.16137916,
0.32925148, 0.40487288, 1.41714764, 0.48798225, 0.07624247, 2.32993211,
2.13228723, 6.74938943, 2.10121108, 0.56772145, 2.84504482, 1.67119601,
1.68640938, 23.65633183, 7.78195561, 0.2131791, 0.33639542, 1.14508402,
6.61466064, 2.03818446, 2.28166528, 0.28095778, 30.93008603,
0.16512528, 1.21974281, 0.32121843, 103.90277963, 0.0967246,
1.04882224, 1.68444513, 29.93956683, 0.43961534, 2.60623811,
0.25039076, 0.76208631, 568.46662349, 1.66133511)
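As a rough sanity check, one could also run the corrected function on simulated data with known parameters. The sketch below assumes the parametrization that the function's return value suggests, namely F(x) = 1 - (1 - k x / sigma)^(1/k) with b = k / sigma; the helper rgpd_sketch, the seed, and the sample size are illustrative choices, not taken from the paper.

```r
# Corrected estimator, repeated here so the example is self-contained.
f <- function(x) {
  n <- length(x)
  x <- sort(x)
  lx <- function(b, x) {
    k <- -mean(log(1 - b * x))
    if (b == 0) k - 1 - log(mean(x)) else k - 1 + log(b / k)
  }
  p <- (3:9) / 10
  xp <- x[round(n * (1 - p) + 0.5)]
  m <- 20 + round(n^0.5)
  xq <- x[round(n * (1 - p * p) + 0.5)]
  k <- log(xq / xp - 1, p)
  a <- k * xp / (1 - p^k)
  a[k == 0] <- (-xp / log(p))[k == 0]
  k <- -1
  b <- w <- L <- (n - 1) / (n + 1) / x[n] - (1 - ((1:m - 0.5) / m)^k) / k / median(a) / 2
  for (i in 1:m) L[i] <- n * lx(b[i], x)
  for (i in 1:m) w[i] <- 1 / sum(exp(L - L[i]))
  b <- sum(b * w)
  k <- -mean(log(1 - b * x))
  list(sigma = k / b, k = k)
}

# Hypothetical helper: inverse-CDF sampling under the assumed parametrization
# (valid for k != 0). Solving 1 - F(x) = u gives x = sigma * (1 - u^k) / k.
rgpd_sketch <- function(n, sigma, k) {
  u <- runif(n)
  sigma * (1 - u^k) / k
}

set.seed(1)
x_sim <- rgpd_sketch(1000, sigma = 1, k = -1)
est <- f(x_sim)
print(est)
```

If the assumed parametrization is right, the estimates should land reasonably close to sigma = 1 and k = -1, in line with the values obtained from the data above.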