R bootstrap weighted mean by group with data table
I am trying to combine two approaches:
- Bootstrapping multiple columns in data.table in a scalable fashion
and
Here is some random data:
## Generate sample data
# Function to randomly generate weights
set.seed(7)
rtnorm <- function(n, mean, sd, a = -Inf, b = Inf){
qnorm(runif(n, pnorm(a, mean, sd), pnorm(b, mean, sd)), mean, sd)
}
# Generate variables
nps <- round(runif(3500, min=-1, max=1), 0) # nps value which takes 1, 0 or -1
group <- sample(letters[1:11], 3500, TRUE) # groups
weight <- rtnorm(n=3500, mean=1, sd=1, a=0.04, b=16) # weights between 0.04 and 16
# Build data frame
df = data.frame(group, nps, weight)
# The following packages are required:
library(data.table)
library(boot)
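Here is the code from the first post above for bootstrapping a weighted mean:
# d: the data (here the nps column), i: row indices resampled by boot(),
# j: the weights, forwarded through boot()'s ... argument
samplewmean <- function(d, i, j) {
d <- d[i, ]
w <- j[i, ]
return(weighted.mean(d, w))
}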
results_qsec <- boot(data= df[, 2, drop = FALSE],
statistic = samplewmean,
R=10000,
j = df[, 3 , drop = FALSE])
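For reference, boot() calls the statistic as `statistic(data, indices, ...)`: the second argument is a vector of resampled row indices generated by boot(), and extra named arguments such as `j` above are forwarded through `...`. A minimal sketch with small hypothetical vectors (`x`, `wt` and `wmean_stat` are illustrative names, not taken from either post):

```r
library(boot)

set.seed(1)
x  <- c(1, 0, -1, 1, 1, -1, 0, 1)         # hypothetical nps-like values
wt <- c(0.5, 1, 2, 1, 0.25, 3, 1.5, 0.8)  # hypothetical weights

# boot() calls statistic(data, indices, ...): `indices` is supplied by boot,
# and the extra named argument `wt` is forwarded through `...`.
wmean_stat <- function(d, indices, wt) {
  weighted.mean(d[indices], wt[indices])
}

res <- boot(data = x, statistic = wmean_stat, R = 200, wt = wt)

res$t0      # statistic on the unresampled data (indices = 1:n)
dim(res$t)  # one row per replicate, one column per returned value
```

A name such as `w` would be a poor choice for the extra argument: boot()'s formals that come before `...` are partially matched, so `w = ...` would be taken as boot's own `weights` argument instead of reaching the statistic.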
This works fine.
Here is the code from the second post above, which bootstraps the mean by group in a data.table:
dt = data.table(df)
stat <- function(x, i) {x[i, (m=mean(nps))]}
dt[, list(list(boot(.SD, stat, R = 100))), by = group]$V1
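Inside `dt[, ..., by = group]`, `.SD` holds only the rows of the current group and, by default, every column except the grouping column. A small sketch on a hypothetical five-row table makes this visible; it is worth keeping in mind whenever a full-table column such as `dt[, 3, drop = FALSE]` is passed alongside `.SD`, since the two have different row counts:

```r
library(data.table)

# Hypothetical small table with the same columns as the question's dt
dt <- data.table(group  = c("a", "a", "b", "b", "b"),
                 nps    = c(1, 0, -1, 1, 1),
                 weight = c(0.5, 1, 2, 1, 0.25))

# For each group, report how many rows .SD has and which columns it contains
info <- dt[, .(rows = nrow(.SD), cols = paste(names(.SD), collapse = ",")),
           by = group]
info
```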
This works fine too.
However, I cannot combine the two approaches:
Running …
dt[, list(list(boot(.SD, samplewmean, R = 5000, j = dt[, 3 , drop = FALSE]))), by = group]$V1
... gives the error message:
Error in weighted.mean.default(d, w) :
'x' and 'w' must have the same length
Running …
dt[, list(list(boot(dt[, 2 , drop = FALSE], samplewmean, R = 5000, j = dt[, 3 , drop = FALSE]))), by = group]$V1
… gives a different error:
Error in weighted.mean.default(d, w) :
(list) object cannot be coerced to type 'double'
I still cannot get my head around the arguments in data.table and how to combine functions running inside a data.table.
Any help is much appreciated.
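This has to do with how data.table subsetting behaves. Unlike the one-column data.frame in the first post, d is still a data.table inside samplewmean even after subsetting with i, whereas weighted.mean expects numeric vectors of values and weights. If you unlist before calling weighted.mean, you will be able to fix this error: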
Error in weighted.mean.default(d, w) :
(list) object cannot be coerced to type 'double'
Code that unlists before passing to weighted.mean:
samplewmean <- function(d, i, j) {
d <- d[i, ]
w <- j[i, ]
return(weighted.mean(unlist(d), unlist(w)))
}
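# use .SD so that each group is resampled from its own rows
# (dt[, 2] and dt[, 3] would bootstrap the whole table once per group)
dt[, list(list(boot(.SD[, "nps"], samplewmean, R = 5000, j = .SD[, "weight"]))), by = group]$V1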
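The underlying difference can be checked directly. In the first post the data is a one-column data.frame, and `[.data.frame` drops a single column to a plain vector after row subsetting, whereas a data.table keeps its class. A minimal sketch with three hypothetical rows (`df1`, `d_df`, `d_dt` are illustrative names):

```r
library(data.table)

df1 <- data.frame(nps = c(1, 0, -1), weight = c(0.5, 1, 2))  # hypothetical rows
i <- c(3, 1, 1)                                               # resampled row indices

d_df <- df1[, "nps", drop = FALSE]   # one-column data.frame, as in the first post
d_dt <- as.data.table(df1)[, "nps"]  # one-column data.table

class(d_df[i, ])  # "numeric": [.data.frame drops the single column to a vector
class(d_dt[i, ])  # "data.table" "data.frame": data.table does not drop

# unlist() (or [[ ]]) turns the data.table column back into a numeric vector
v <- unlist(d_dt[i, ], use.names = FALSE)
v
```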
A more data.table-like syntax (data.table version >= v1.10.2) would be roughly as follows:
# boot() passes the resampled row indices as the statistic's second argument.
# The mysterious `original` argument was receiving them; they must be used to
# subset d, otherwise every replicate is computed on the same, unresampled rows.
samplewmean <- function(d, i, valCol, wgtCol) {
d <- d[i]
weighted.mean(unlist(d[, ..valCol]), unlist(d[, ..wgtCol]))
}
dt[, list(list(boot(.SD, statistic=samplewmean, R=1000, valCol="nps", wgtCol="weight"))), by=group]$V1
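With the signature `function(d, valCol, wgtCol, original)`, `valCol` and `wgtCol` are matched by name, so boot()'s index vector is matched positionally to `original`. If the body never uses it, every replicate sees the same rows, and the code still runs without any error. A base-R sketch on a small hypothetical data.frame contrasting the two shapes (`ignores_idx` and `uses_idx` are illustrative names):

```r
library(boot)

set.seed(42)
d <- data.frame(nps    = c(1, 0, -1, 1, 1, -1, 0, 1),
                weight = c(0.5, 1, 2, 1, 0.25, 3, 1.5, 0.8))  # hypothetical data

# Index vector lands in `original` and is never used: no resampling happens.
ignores_idx <- function(d, valCol, wgtCol, original) {
  weighted.mean(d[[valCol]], d[[wgtCol]])
}

# Same statistic, but the index vector is used to resample rows.
uses_idx <- function(d, i, valCol, wgtCol) {
  d <- d[i, , drop = FALSE]
  weighted.mean(d[[valCol]], d[[wgtCol]])
}

bad  <- boot(d, ignores_idx, R = 200, valCol = "nps", wgtCol = "weight")
good <- boot(d, uses_idx,    R = 200, valCol = "nps", wgtCol = "weight")

length(unique(as.vector(bad$t)))  # 1: every replicate is identical
sd(good$t)                        # spread across genuine bootstrap replicates
```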
Or another possible syntax (see data.table FAQ 1.6):
samplewmean <- function(d, i, valCol, wgtCol) {
d <- d[i]  # i: row indices resampled by boot()
weighted.mean(unlist(d[, eval(substitute(valCol))]), unlist(d[, eval(substitute(wgtCol))]))
}
dt[, list(list(boot(.SD, statistic=samplewmean, R=1000, valCol=nps, wgtCol=weight))), by=group]$V1
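Putting the pieces together, one possible way to get a per-group point estimate and bootstrap standard error back as a tidy table is to run boot() inside a braced `j` expression and return a list. This is a sketch on hypothetical generated data (three groups, 300 rows), not the question's exact data; `wmean_stat` is an illustrative name, and `[[ ]]` is used instead of `unlist()` to extract each column as a vector:

```r
library(data.table)
library(boot)

set.seed(7)
# Hypothetical stand-in for the question's data: 3 groups, nps in {-1, 0, 1}
dt <- data.table(group  = sample(letters[1:3], 300, TRUE),
                 nps    = sample(c(-1, 0, 1), 300, TRUE),
                 weight = runif(300, 0.04, 16))

# boot() calls statistic(data, indices, ...); valCol/wgtCol arrive via `...`
wmean_stat <- function(d, i, valCol, wgtCol) {
  d <- d[i]                                # resample this group's rows
  weighted.mean(d[[valCol]], d[[wgtCol]])  # [[ ]] returns plain vectors
}

# One boot() run per group; keep only the estimate and the bootstrap SE
res <- dt[, {
  b <- boot(.SD, wmean_stat, R = 1000, valCol = "nps", wgtCol = "weight")
  list(estimate = b$t0, boot_se = sd(b$t[, 1]))
}, by = group]

res
```

Each group's `estimate` equals `weighted.mean(nps, weight)` on that group, because boot() evaluates the statistic once on the unresampled indices to produce `t0`.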