Applying a function to all data.frames in the environment

I want to apply the cleanfunction below to every data.frame in my environment:

cleanfunction <- function(dataframe) {
  dataframe <- as.data.frame(dataframe)
  ## get mode of all vars
  var_mode <- sapply(dataframe, mode)
  ## produce error if complex or raw is found
  if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
  ## get class of all vars
  var_class <- sapply(dataframe, class)
  ## produce error if an "AsIs" object has "logical" or "character" mode
  if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
      stop("matrix variables with 'AsIs' class must be 'numeric'")
      }
  ## identify columns that need to be coerced to factors
  ind1 <- which(var_mode %in% c("logical", "character"))
  ## coerce logical / character to factor with `as.factor`
  dataframe[ind1] <- lapply(dataframe[ind1], as.factor)
  return(dataframe)
}

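library(data.table)

set.seed(10238)
DT = data.table(
  A = rep(1:3, each = 5L),
  B = rep(1:5, 3L),
  C = sample(15L),
  D = sample(15L)
)
DT_II <- copy(DT)
## keep only the names of data.frame-like objects (a bare ls() would also return "cleanfunction")
dfs <- Filter(function(nm) is.data.frame(get(nm)), ls())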

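Now I want to apply this function to every data frame in the environment. I have tried about ten different things, but I cannot get the syntax right: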

for (i in seq_along(dfs)) {
  get(dfs[i])[ , lapply(.SD, cleanfunction)]
}

Edit:

I found the approach below, but it does not store the results.

eapply(globalenv(), function(x) if (is.data.frame(x)) cleanfunction(x))

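How can I store the result back in each object?

get(dfs[i]) does return a reference to the data.table, but you are then lapply-ing over each column of that frame, whereas the function's argument, dataframe, suggests it expects a complete frame. A starting point might be: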

for (i in seq_along(dfs)) {
  get(dfs[i])[ , cleanfunction(.SD)]
}

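But realize that this operation returns a new frame; it does not use the canonical data.table mechanisms for updating the data in place. I suggest you update your function so that it always coerces its input to a data.table and works on it by reference.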

cleanfunction <- function(dataframe) {
  setDT(dataframe)
  ## get mode of all vars
  var_mode <- sapply(dataframe, mode)
  ## produce error if complex or raw is found
  if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
  ## get class of all vars
  var_class <- sapply(dataframe, class)
  ## produce error if an "AsIs" object has "logical" or "character" mode
  if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
      stop("matrix variables with 'AsIs' class must be 'numeric'")
      }
  ## identify columns that need to be coerced to factors
  ind1 <- which(var_mode %in% c("logical", "character"))
  ## coerce logical / character to factor with `as.factor`
  if (length(ind1)) dataframe[, c(ind1) := lapply(.SD, as.factor), .SDcols = ind1]
  return(dataframe)
}

Since your current data does not trigger any changes, I will add a column that does:

DT[,quux:="A"]
head(DT)
#        A     B     C     D   quux
#    <int> <int> <int> <int> <char>
# 1:     1     1    12    15      A
# 2:     1     2     4     6      A
# 3:     1     3     5     7      A
# 4:     1     4     9     1      A
# 5:     1     5     6    14      A
# 6:     2     1    15    13      A

for (i in seq_along(dfs)) cleanfunction(get(dfs[i]))
head(DT)
#        A     B     C     D   quux
#    <int> <int> <int> <int> <fctr>
# 1:     1     1    12    15      A
# 2:     1     2     4     6      A
# 3:     1     3     5     7      A
# 4:     1     4     9     1      A
# 5:     1     5     6    14      A
# 6:     2     1    15    13      A

Note that the for loop relies solely on the update by reference; the return value of cleanfunction is ignored here.
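To illustrate why the return value can be discarded, here is a minimal sketch of data.table's update-by-reference behaviour. The names toy and add_factor are hypothetical and exist only for this illustration; the sketch assumes the data.table package is installed.

```r
library(data.table)

# hypothetical toy table with one character column
toy <- data.table(id = 1:3, flag = c("a", "b", "a"))

# hypothetical helper: replaces a column in place with `:=`
add_factor <- function(x) {
  x[, flag := as.factor(flag)]  # modifies the caller's table by reference
  invisible(x)
}

addr_before <- address(toy)  # memory address of the table before the call
add_factor(toy)              # return value deliberately discarded

# the caller's object was changed even though nothing was assigned back
flag_is_factor <- is.factor(toy$flag)
same_object    <- identical(addr_before, address(toy))
print(c(flag_is_factor = flag_is_factor, same_object = same_object))
```

Because `:=` changes the table that the function argument points to, no reassignment is needed in the calling loop.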

This approach works entirely because of data.table's reference semantics; if you are using data.frame or tbl_df, you would likely need to wrap the call to cleanfunction(.) in assign(dfs[i], cleanfunction(..)).
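For plain data.frames, the assign() variant could be sketched as follows. This is only an illustration under assumptions: env stands in for the global environment, the toy frames are made up, and to_factor is a simplified, hypothetical stand-in for cleanfunction.

```r
# hypothetical environment standing in for globalenv()
env <- new.env()
env$df1 <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
env$df2 <- data.frame(z = c(TRUE, FALSE, TRUE))
env$not_a_frame <- 42

# simplified stand-in for cleanfunction:
# coerce character and logical columns to factor, return the new frame
to_factor <- function(df) {
  idx <- vapply(df, function(col) is.character(col) || is.logical(col), logical(1))
  if (any(idx)) df[idx] <- lapply(df[idx], as.factor)
  df
}

# data.frames are copied on modification, so the result must be written back by name
for (nm in ls(env)) {
  obj <- get(nm, envir = env)
  if (is.data.frame(obj)) assign(nm, to_factor(obj), envir = env)
}

str(env$df1)
```

In an interactive session the same loop would use the default environment, for example assign(nm, cleanfunction(get(nm))).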

Does this work for you?

# store all data frames from the environment in a list
dfs <- Filter(function(x) is(x, "data.frame"), mget(ls()))

# then apply your function; the cleaned frames are returned as a list
lapply(dfs, cleanfunction)
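The lapply() call returns a list and does not by itself overwrite the original objects. One possible way to write the results back, sketched here with base R's list2env(), is shown below. The environment env, the toy frames, and the helper to_factor (a simplified stand-in for cleanfunction) are all assumptions made for the illustration.

```r
# hypothetical environment standing in for globalenv()
env <- new.env()
env$a <- data.frame(x = c("u", "v"), stringsAsFactors = FALSE)
env$b <- data.frame(n = 1:2, ok = c(TRUE, FALSE))
env$k <- "not a data frame"

# simplified stand-in for cleanfunction
to_factor <- function(df) {
  idx <- vapply(df, function(col) is.character(col) || is.logical(col), logical(1))
  if (any(idx)) df[idx] <- lapply(df[idx], as.factor)
  df
}

# collect the data frames in a named list, clean them, then copy the
# list elements back into the environment under their original names
frames  <- Filter(is.data.frame, mget(ls(env), envir = env))
cleaned <- lapply(frames, to_factor)
invisible(list2env(cleaned, envir = env))

str(env$b)
```

Because mget() and lapply() keep the object names, list2env() can overwrite each original object with its cleaned version; objects that are not data frames are left alone.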