Split a data frame by multiple factor columns in a function call

Question

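I want to write a function that splits a df by several factor variables (one at a time) and then runs another function on the resulting list. However, I can't find the right way to refer to the factor inside base::split.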

This is what I have tried so far:

library (tidyverse)
fun_res  <- function (x,y) {
list_temp <- base::split (x, x$y, drop = FALSE) 

lapply (list_temp, another_fun) # does another function and returns results in a list
}

I would then like to run fun_res to split the df by various factor columns:

fun_res(df, factor_col1)
fun_res(df, factor_col2)

However, x$y produces the following error:

Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : group length is 0 but data length > 0

What is the correct way to do this?

Here is a short reprex:

library (tidyverse) 
data1 <- c(1,2,3,4,1,2,3,4)
data2 <- c(4,3,2,1,4,3,2,1)
factor1 <- c(rep(1,4), rep(2,4)) %>% as.factor ()
factor2 <- c(rep(1,5), rep(2,3)) %>% as.factor ()

df <- data.frame (data1, data2, factor1, factor2)

fun_res  <- function (x,y) {
  list_temp <- base::split (x, x$y, drop = FALSE) 
  
  lapply (list_temp, function (z){ # just a random function
    as.list(z) %>%
      return ()
  }) 
}

fun_res(df, factor1)
fun_res(df, factor2)

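The reason I want to call fun_res for each factor in turn is that, with my real data, the function inside lapply returns a list of statistical test results, and I want to print them by referring to each result list separately.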

Answer 1

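In base R, if we want to pass an unquoted argument, use substitute and deparse to convert it to a character string, then use [[ to subset the column. The original x$y fails because $ does not evaluate y: it looks for a column literally named "y", gets NULL, and split then has no grouping values.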

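fun_res <- function(x, y) {
  # Capture the unquoted argument and turn it into a column-name string
  y <- deparse(substitute(y))
  # [[ looks the column up by that string, unlike x$y
  list_temp <- base::split(x, x[[y]], drop = FALSE)

  list_temp
}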
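To see the difference between the two lookups in isolation, here is a minimal sketch. The helper show_lookup and the reduced data frame are hypothetical and only there for illustration; they are not part of the question's code.

```r
# Reduced data frame mirroring the question's example
df <- data.frame(
  data1   = c(1, 2, 3, 4, 1, 2, 3, 4),
  factor1 = factor(rep(1:2, each = 4))
)

# Hypothetical helper that returns both lookups side by side
show_lookup <- function(x, y) {
  name <- deparse(substitute(y))  # the unevaluated argument as a string
  list(
    dollar  = x$y,        # looks for a column literally named "y"
    name    = name,       # the captured column name
    bracket = x[[name]]   # looks the column up by the captured name
  )
}

res <- show_lookup(df, factor1)
is.null(res$dollar)  # TRUE: df has no column called "y"
res$name             # "factor1"
```

Because x$y is NULL, split receives an empty grouping variable, which is what the "group length is 0" message reports.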

Testing:

> fun_res(df, factor1)
$`1`
  data1 data2 factor1 factor2
1     1     4       1       1
2     2     3       1       1
3     3     2       1       1
4     4     1       1       1

$`2`
  data1 data2 factor1 factor2
5     1     4       2       1
6     2     3       2       2
7     3     2       2       2
8     4     1       2       2

> fun_res(df, factor2)
$`1`
  data1 data2 factor1 factor2
1     1     4       1       1
2     2     3       1       1
3     3     2       1       1
4     4     1       1       1
5     1     4       2       1

$`2`
  data1 data2 factor1 factor2
6     2     3       2       2
7     3     2       2       2
8     4     1       2       2
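If quoting the column names is acceptable, a possible variation is to pass them as strings and loop over all factor columns in one go, which also restores the lapply step from the question. This is only a sketch: split_apply and group_mean are hypothetical names, and group_mean merely stands in for the real statistical test.

```r
df <- data.frame(
  data1   = c(1, 2, 3, 4, 1, 2, 3, 4),
  data2   = c(4, 3, 2, 1, 4, 3, 2, 1),
  factor1 = factor(c(rep(1, 4), rep(2, 4))),
  factor2 = factor(c(rep(1, 5), rep(2, 3)))
)

# Hypothetical helper: `col` is a column name given as a string
split_apply <- function(x, col, fun) {
  pieces <- split(x, x[[col]], drop = FALSE)
  lapply(pieces, fun)
}

# Placeholder for the real test: mean of data1 within each group
group_mean <- function(z) mean(z$data1)

# Named vector, so the outer list is named after the factor columns
factor_cols <- c("factor1", "factor2")
results <- lapply(setNames(factor_cols, factor_cols), function(col) {
  split_apply(df, col, group_mean)
})

results$factor1      # one result per level of factor1
results$factor2$`2`  # result for level 2 of factor2
```

Each set of results can then be printed separately by name, for example results$factor1 or results$factor2.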

functional-programming

r

dataframe