Use list names inside purrr::map_dfr function

I'm trying something relatively simple but running into some difficulty. Suppose I have two data frames, df1 and df2:

df1:

id  expenditure
1    10
2    20
1    30
2    50

df2:

id  expenditure
1    30
2    50
1    60
2    10

I also add them to a named list:

table_list = list()
table_list[['a']] = df1
table_list[['b']] = df2
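For a reproducible setup, the two data frames and the named list can be built directly. `list(a = df1, b = df2)` gives the same named list as filling an empty list element by element. This is a minimal sketch that assumes only the values shown in the tables above.

```r
# Example data from the question (values copied from the tables above)
df1 <- data.frame(id = c(1L, 2L, 1L, 2L), expenditure = c(10L, 20L, 30L, 50L))
df2 <- data.frame(id = c(1L, 2L, 1L, 2L), expenditure = c(30L, 50L, 60L, 10L))

# Equivalent to assigning table_list[['a']] and table_list[['b']] one at a time
table_list <- list(a = df1, b = df2)

names(table_list)
# [1] "a" "b"
```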

Now I want to perform some summary operations through a function and then bind the rows:

get_summary = function(table){
   table %>% group_by(id) %>% summarise(total_expenditure = sum(expenditure))
}

and then apply it via map_dfr:

summary = table_list %>% map_dfr(get_summary, .id = 'origin_table')

This produces almost exactly what I'm looking for:

 origin_table   id   total_expenditure
      a          1       40
      a          2       70
      b          1       90
      b          2       60
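As far as I understand purrr's implementation, `map_dfr()` is essentially `map()` followed by `dplyr::bind_rows()`. The `.id` column is filled from the list names at the row-binding step, so the mapped function itself only ever receives each data frame, never its name. A minimal sketch comparing the two, assuming dplyr (1.0 or later, for `.groups`) and purrr are installed:

```r
library(dplyr)
library(purrr)

df1 <- data.frame(id = c(1L, 2L, 1L, 2L), expenditure = c(10L, 20L, 30L, 50L))
df2 <- data.frame(id = c(1L, 2L, 1L, 2L), expenditure = c(30L, 50L, 60L, 10L))
table_list <- list(a = df1, b = df2)

get_summary <- function(table) {
  table %>%
    group_by(id) %>%
    summarise(total_expenditure = sum(expenditure), .groups = "drop")
}

# map_dfr() calls get_summary() on each data frame (without its name) ...
via_map_dfr <- map_dfr(table_list, get_summary, .id = "origin_table")

# ... then row-binds; .id takes its values from names(table_list)
via_bind_rows <- bind_rows(map(table_list, get_summary), .id = "origin_table")

identical(via_map_dfr, via_bind_rows)
```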

However, what if I want to do something specific depending on which list element is being passed, like this:

get_summary = function(table, name){
   dummy_list = c(TRUE, FALSE)
   names(dummy_list) = c('a', 'b')

   final_table = table %>% group_by(id) %>% summarise(total_expenditure= sum(expenditure))

   is_true = dummy_list[[name]] # Want to use the original name to call another list

   if(is_true) final_table = final_table %>% mutate(total_expenditure = total_expenditure + 1) 

   return(final_table)

}

Which should give something like this:

 origin_table   id   total_expenditure
      a          1       41
      a          2       71
      b          1       90
      b          2       60

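So is there any way to use the list names inside my function? Or some way to identify which element of the list I'm currently working on? Maybe map_dfr is too restrictive and I have to use something else?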

EDIT: changed the example to make it closer to reality

I managed to do it by adding origin_table as a pre-existing column on the data frames:

df1 = df1 %>% mutate(origin_table = 'a')
df2 = df2 %>% mutate(origin_table = 'b')
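Instead of tagging each data frame by hand, the same column could be added to every list element in one step with `purrr::imap()`, which passes each element as `.x` and its name as `.y`. A minimal sketch, assuming the named `table_list` from the question:

```r
library(dplyr)
library(purrr)

df1 <- data.frame(id = c(1L, 2L, 1L, 2L), expenditure = c(10L, 20L, 30L, 50L))
df2 <- data.frame(id = c(1L, 2L, 1L, 2L), expenditure = c(30L, 50L, 60L, 10L))
table_list <- list(a = df1, b = df2)

# Add an origin_table column to every element; the value is the list name (.y)
table_list <- imap(table_list, ~ mutate(.x, origin_table = .y))

table_list$b$origin_table
# [1] "b" "b" "b" "b"
```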

Then I can extract the origin by doing the following:

get_summary = function(table){
   dummy_list = c(TRUE, FALSE)
   names(dummy_list) = c('a', 'b')

   origin = table %>% distinct(origin_table) %>% pull

   final_table = table %>% group_by(id) %>% summarise(total_expenditure= sum(expenditure))

   is_true = dummy_list[[origin]] # Want to use the original name to call another list

   if(is_true) final_table = final_table %>% mutate(total_expenditure = total_expenditure + 1) 

   return(final_table)

}
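A different route that avoids mapping altogether (swapped in here, not the approach used above): bind the list first with `bind_rows(.id = ...)`, summarise by origin and `id` together, and then look up the per-origin flag from the question's `dummy_list`. A minimal sketch, assuming dplyr 1.0 or later and the same example data:

```r
library(dplyr)

df1 <- data.frame(id = c(1L, 2L, 1L, 2L), expenditure = c(10L, 20L, 30L, 50L))
df2 <- data.frame(id = c(1L, 2L, 1L, 2L), expenditure = c(30L, 50L, 60L, 10L))
table_list <- list(a = df1, b = df2)

# Lookup vector from the question: TRUE means "add 1 for this origin"
dummy_list <- c(a = TRUE, b = FALSE)

summary_tbl <- bind_rows(table_list, .id = "origin_table") %>%  # origin_table = list names
  group_by(origin_table, id) %>%
  summarise(total_expenditure = sum(expenditure), .groups = "drop") %>%
  # Look up one TRUE/FALSE per row by origin name, then add 1 where TRUE
  mutate(total_expenditure = if_else(unname(dummy_list[origin_table]),
                                     total_expenditure + 1L,
                                     total_expenditure))
summary_tbl
```

This keeps the per-table rule in data (`dummy_list`) rather than in an `if` inside a mapped function, at the cost of doing the summary on one combined table.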
 

Instead of map, use imap, which passes the list names in as .y:
library(purrr)
library(dplyr)
get_summary = function(dat, name){
    dat %>%
        group_by(id) %>%
        summarise(total_expenditure = sum(expenditure, na.rm = TRUE),
                  .groups = "drop") %>%
        mutate(total_expenditure = if (name == 'a')
                   total_expenditure + 1 else total_expenditure)
}

-testing

> table_list %>% 
    imap_dfr(~ get_summary(.x, name = .y), .id = 'origin_table')
# A tibble: 4 × 3
  origin_table    id total_expenditure
  <chr>        <int>             <dbl>
1 a                1                41
2 a                2                71
3 b                1                90
4 b                2                60
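To make the mechanism explicit: per the purrr documentation, `imap(.x, .f)` is shorthand for `map2(.x, names(.x), .f)` (falling back to the positions `seq_along(.x)` when the list is unnamed), so `imap_dfr()` should match an explicit `map2_dfr()` call. A minimal sketch, assuming the answer's `get_summary()` and the example data:

```r
library(dplyr)
library(purrr)

df1 <- data.frame(id = c(1L, 2L, 1L, 2L), expenditure = c(10L, 20L, 30L, 50L))
df2 <- data.frame(id = c(1L, 2L, 1L, 2L), expenditure = c(30L, 50L, 60L, 10L))
table_list <- list(a = df1, b = df2)

# The answer's get_summary(), reproduced so the block runs on its own
get_summary <- function(dat, name) {
  dat %>%
    group_by(id) %>%
    summarise(total_expenditure = sum(expenditure, na.rm = TRUE), .groups = "drop") %>%
    mutate(total_expenditure = if (name == "a") total_expenditure + 1 else total_expenditure)
}

# imap_dfr() supplies names(table_list) as .y ...
via_imap <- imap_dfr(table_list, ~ get_summary(.x, name = .y), .id = "origin_table")

# ... which is the same as passing the names explicitly as map2_dfr()'s second input
via_map2 <- map2_dfr(table_list, names(table_list), get_summary, .id = "origin_table")

identical(via_imap, via_map2)
```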

data

table_list <- list(a = structure(list(id = c(1L, 2L, 1L, 2L), 
expenditure = c(10L, 
20L, 30L, 50L)), class = "data.frame", row.names = c(NA, -4L)), 
    b = structure(list(id = c(1L, 2L, 1L, 2L), expenditure = c(30L, 
    50L, 60L, 10L)), class = "data.frame", row.names = c(NA, 
    -4L)))
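With the same data, the question's original two-argument `get_summary(table, name)` should also work unchanged, because `imap_dfr()` calls the function with the list element as the first argument and its name as the second. A minimal sketch, assuming dplyr and purrr are loaded (the function is lightly reformatted from the question, and `result` is a name chosen here to avoid masking `base::summary`):

```r
library(dplyr)
library(purrr)

table_list <- list(
  a = data.frame(id = c(1L, 2L, 1L, 2L), expenditure = c(10L, 20L, 30L, 50L)),
  b = data.frame(id = c(1L, 2L, 1L, 2L), expenditure = c(30L, 50L, 60L, 10L))
)

# The question's two-argument function, with its dummy_list lookup
get_summary <- function(table, name) {
  dummy_list <- c(a = TRUE, b = FALSE)
  final_table <- table %>%
    group_by(id) %>%
    summarise(total_expenditure = sum(expenditure), .groups = "drop")
  if (dummy_list[[name]]) {
    final_table <- final_table %>% mutate(total_expenditure = total_expenditure + 1)
  }
  final_table
}

# imap_dfr() passes each data frame positionally as `table` and its name as `name`
result <- imap_dfr(table_list, get_summary, .id = "origin_table")
result
```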