R data.table usage in function call
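I want to run a data.table task over and over, so I'd like to wrap it in a function. My problem is similar to "Data.table and get() command (R)" or "pass column name in data.table using variable in R", but I can't get it to work.
Without a function call, this works fine: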
# Load data.table
library(data.table)
# Some data
set.seed(1)
dt <- data.table(type = factor(sample(c("A", "B", "C"), 10e3, replace = TRUE)),
                 weight = rnorm(n = 10e3, mean = 70, sd = 20))
# Decide the minimum frequency a level needs...
min.freq <- 3350
# Levels that don't meet the minimum frequency (using data.table)
fail.min.f <- dt[, .N, type][N < min.freq, type]
# Relabel all of these levels as "Other"
levels(dt$type)[fail.min.f] <- "Other"
But when I wrap it in a function like
reduceCategorical <- function(variableName, min.freq){
  fail.min.f <- dt[, .N, variableName][N < min.freq, variableName]
  levels(dt[, variableName][fail.min.f]) <- "Other"
}
I only get errors like:
reduceCategorical(dt$x, 3350)
Error in levels(df[, variableName][fail.min.f]) <- "Other" :
trying to set attribute of NULL value
and sometimes
Error is: number of levels differs
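You are referencing things slightly differently in your wrapper. The standalone code uses the column name type directly, whereas the wrapper uses the whole of variableName, which in your call is actually a vector of values (the same thing you take the levels of) rather than a column name, so variableName cannot be used directly inside the function the way type was.
The error occurs because fail.min.f ends up NULL as a result of this referencing.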
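To make the difference concrete, here is a minimal sketch of referring to a column through a character variable instead of passing the column vector. The table toy and the variable col_name are made up for illustration and are not part of the question's data.

```r
library(data.table)

# Hypothetical toy table, for illustration only
toy <- data.table(type   = factor(c("A", "A", "A", "B", "C", "C")),
                  weight = c(60, 72, 81, 55, 90, 68))

col_name <- "type"   # the column *name* as a string, not toy$type

# by= accepts a character vector of column names
counts <- toy[, .N, by = col_name]

# get(col_name) looks the column up by name inside [.data.table
rare <- counts[N < 2, as.character(get(col_name))]

# [[ ]] extracts the column itself, e.g. to read its levels
lv <- levels(toy[[col_name]])
```

Here rare holds the labels whose count is below the threshold, and lv holds the levels of the column named in col_name. The same three idioms (by = with a string, get(), and [[ ]]) are what the function versions in this answer rely on.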
One possibility is to define your own re-leveling function using data.table::setattr, which modifies dt in place. Something like
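DTsetlvls <- function(x, newl)
  # keep every label in its original position so the integer codes stay valid;
  # note that relabelling several levels leaves "other" in the levels more than once
  setattr(x, "levels", replace(levels(x), levels(x) %in% newl, "other"))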
Then use it within another predefined function
f <- function(variableName, min.freq){
  fail.min.f <- dt[, .N, by = variableName][N < min.freq, get(variableName)]
  dt[, DTsetlvls(get(variableName), fail.min.f)]
  invisible()
}
f("type", min.freq)
levels(dt$type)
# the levels that failed min.freq are now labelled "other"
Some other data.table alternatives
f <- function(var, min.freq) {
  fail.min.f <- dt[, .N, by = var][N < min.freq, get(var)]
  dt[get(var) %in% fail.min.f, (var) := "Other"]
  dt[, (var) := factor(get(var))]
}
Or using set/.I
f <- function(var, min.freq) {
  fail.min.f <- dt[, .I[.N < min.freq], by = var]$V1
  set(dt, fail.min.f, var, "other")
  set(dt, NULL, var, factor(dt[[var]]))
}
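To see which rows the .I expression in this version selects, here is a small sketch on a made-up character column. The names toy and min_n are illustrative only, and a character column is used so that no factor levels are involved.

```r
library(data.table)

# Hypothetical toy table, for illustration only
toy <- data.table(type = c("A", "A", "A", "B", "C", "C"))
min_n <- 2

# .I holds row numbers in the original table; .N is the size of the current group.
# A length-one TRUE keeps all of the group's row numbers, FALSE keeps none.
rare_rows <- toy[, .I[.N < min_n], by = type]$V1

# set() then overwrites only those rows, by reference
set(toy, i = rare_rows, j = "type", value = "other")
```

Only the single "B" row falls below min_n here, so rare_rows contains just that row number and the other rows are left as they were.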
Or combining with base R (doesn't modify the original data set)
f <- function(df, variableName, min.freq){
  fail.min.f <- df[, .N, by = variableName][N < min.freq, get(variableName)]
  # use the column that was passed in rather than a hard-coded df$type
  levels(df[[variableName]])[fail.min.f] <- "Other"
  df
}
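If you would rather make the "leave the input untouched" part explicit, one option is to work on a copy() of the table. This is only a sketch under assumptions: the helper name lumpRare, its other argument and the toy table are all hypothetical and not part of the answer above.

```r
library(data.table)

# Hypothetical helper (illustrative only): relabel rare levels on a copy
lumpRare <- function(DT, col, min.freq, other = "Other") {
  out  <- copy(DT)                          # deep copy, so DT itself is untouched
  vals <- as.character(out[[col]])          # work on plain strings
  keep <- names(which(table(vals) >= min.freq))
  vals[!vals %in% keep] <- other            # everything too rare becomes `other`
  set(out, j = col, value = factor(vals))   # replace the column in the copy only
  out
}

toy <- data.table(type = factor(c("A", "A", "A", "B", "C", "C")))
res <- lumpRare(toy, "type", min.freq = 2)
```

The result res carries the relabelled factor, while toy keeps its original levels. The trade-off is the memory cost of the copy, which the by-reference versions avoid.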
Or, we could stick with characters instead (if type is character), in which case you can simply do
f <- function(var, min.freq) dt[, (var) := if(.N < min.freq) "other", by = var]