data.table size and datatable.alloccol option
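The dataset I'm working with is not very large, but it is wide. It currently has 10,854 columns, and I would like to add roughly 10,000 to 11,000 more. It has only 760 rows.
When I try to do so (by applying functions to a subset of the existing columns), I get the following: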
Warning message:
In `[.data.table`(setDT(Final), , `:=`(c(paste0(vars, ".xy_diff"), :
truelength (30854) is greater than 10,000 items over-allocated (length = 10854). See ?truelength. If you didn't set the datatable.alloccol option very large, please report to data.table issue tracker including the result of sessionInfo().
I tried using setalloccol, but got a similar result. For example:
setalloccol(Final, 40960)
Error in `[.data.table`(x, i, , ) :
getOption('datatable.alloccol') should be a number, by default 1024. But its type is 'language'.
In addition: Warning message:
In setalloccol(Final, 40960) :
tl (51894) is greater than 10,000 items over-allocated (l = 21174). If you didn't set the datatable.alloccol option to be very large, please report to data.table issue tracker including the result of sessionInfo().
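For context, both messages compare the number of allocated column slots (truelength) with the number of columns in use (length); in the first warning the gap is 30854 - 10854 = 20,000, which is above the 10,000 limit the message mentions. A minimal sketch of how those two numbers can be inspected, using a toy table DT that is only a stand-in for Final:

```r
library(data.table)

# Toy stand-in for the real table (hypothetical data, 2 columns).
DT <- data.table(a = 1:3, b = 4:6)

n_used  <- length(DT)       # columns currently in use
n_alloc <- truelength(DT)   # column pointer slots allocated
n_spare <- n_alloc - n_used # spare slots available for new columns

print(c(used = n_used, allocated = n_alloc, spare = n_spare))

# The session-wide setting that controls how many spare slots are requested.
print(getOption("datatable.alloccol"))
```

This only shows where the numbers in the warning come from; it does not by itself resolve the over-allocation check.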
Is there a way around this?
Thanks a lot.
Edit:
To answer Roland's comment, here is what I'm doing:
vars <- c(colnames(FinalTable_0)[271:290], colnames(FinalTable_0)[292:dim(FinalTable_0)[2]]) # <- variables I want to operate on
# FinalTable_0 is a previous table I use to collect the roots of the variables I want to work with
difference <- function(root) lapply(root, function(z) paste0("get('", z, ".x') - get('", z, ".y')"))
ratio <- function(root) lapply(root, function(z) paste0("get('", z, ".x') / get('", z, ".y')"))
# proceed to the computation
setDT(Final)[ , c(paste0(vars,".xy_diff"), paste0(vars,".xy_ratio")) := lapply(c(difference(vars), ratio(vars)), function(x) eval(parse(text = x)))]
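As an aside, the same differences and ratios can be computed without building expression strings for eval(parse()). The following is a minimal sketch on a toy table, assuming every root in vars has matching ".x" and ".y" columns; Final and vars here are small stand-ins, not the real data, and this form is not claimed to avoid the over-allocation warning on a very wide table.

```r
library(data.table)

# Toy stand-in for the real table: two roots, each with .x and .y columns.
Final <- data.table(a.x = c(4, 9), a.y = c(2, 3),
                    b.x = c(10, 20), b.y = c(5, 4))
vars <- c("a", "b")

# Add one difference column and one ratio column per root, by reference.
for (v in vars) {
  x <- Final[[paste0(v, ".x")]]
  y <- Final[[paste0(v, ".y")]]
  set(Final, j = paste0(v, ".xy_diff"),  value = x - y)
  set(Final, j = paste0(v, ".xy_ratio"), value = x / y)
}

print(Final)
```

Looking columns up by name with `[[` keeps the column names as plain strings, so no code has to be pasted together and parsed.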
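I tried the solution Roland proposed, but I'm not entirely satisfied with it. It works, but I don't like the idea of transposing my data.
In the end, I simply split the original data.table into several smaller ones, ran the computations on each separately, and merged the results back together. Quick and simple: no fiddling with the variables to work out which are ids and which are measures, and no melting and recasting. I just like it.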
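The split, compute, and merge approach could look roughly like the sketch below. It is only an illustration under stated assumptions: the table, the roots, and the chunk size of 2 are made up, and the actual code used on the real data is not shown in this post.

```r
library(data.table)

# Toy stand-in for the wide table (hypothetical data).
Final <- data.table(
  id  = 1:3,
  a.x = c(4, 9, 8),    a.y = c(2, 3, 4),
  b.x = c(10, 20, 30), b.y = c(5, 4, 3),
  c.x = c(1, 2, 3),    c.y = c(1, 1, 1)
)
roots <- c("a", "b", "c")

# Split the roots into chunks (a chunk size of 2 is arbitrary here).
chunks <- split(roots, ceiling(seq_along(roots) / 2))

# For each chunk, compute the new columns in a small standalone table.
pieces <- lapply(chunks, function(ch) {
  x <- Final[, paste0(ch, ".x"), with = FALSE]
  y <- Final[, paste0(ch, ".y"), with = FALSE]
  diffs  <- as.data.table(Map(`-`, x, y))
  ratios <- as.data.table(Map(`/`, x, y))
  setnames(diffs,  paste0(ch, ".xy_diff"))
  setnames(ratios, paste0(ch, ".xy_ratio"))
  cbind(diffs, ratios)
})

# Bind the original table and all pieces back together column-wise.
Result <- do.call(cbind, c(list(Final), unname(pieces)))
print(names(Result))
```

Because each piece is built as its own small table and only combined at the end, no single table has thousands of columns added to it by reference.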