Use list names inside purrr::map_dfr function
I'm trying to do something relatively simple, but I'm having some trouble. Say I have two data frames, df1 and df2:
df1:
id expenditure
1 10
2 20
1 30
2 50
df2:
id expenditure
1 30
2 50
1 60
2 10
I've also added them to a named list:
table_list = list()
table_list[['a']] = df1
table_list[['b']] = df2
Now I want to run some summary operations through a function and then bind the resulting rows:
get_summary = function(table){
final_table = table %>% group_by(id) %>% summarise(total_expenditure= sum(expenditure))
}
Then I apply it via map_dfr:
summary = table_list %>% map_dfr(get_summary, .id = 'origin_table')
This creates something that is almost what I'm looking for:
origin_table id total_expenditure
a 1 40
a 2 70
b 1 90
b 2 60
But what if I want to do something specific depending on which list element was passed, like this:
get_summary = function(table, name){
dummy_list = c(TRUE, FALSE)
names(dummy_list) = c('a', 'b')
final_table = table %>% group_by(id) %>% summarise(total_expenditure= sum(expenditure))
is_true = dummy_list[[name]] # Want to use the original name to call another list
if(is_true) final_table = final_table %>% mutate(total_expenditure = total_expenditure + 1)
return(final_table)
}
That would give something like this:
origin_table id total_expenditure
a 1 41
a 2 71
b 1 90
b 2 60
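So is there any way to use the list name inside my function? Or any way to identify which element of the list I'm currently working on? Maybe map_dfr is too restrictive and I have to use something else?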
Edit: changed the example to make it closer to reality
I managed to do it by adding origin_table to the data frames as a pre-existing column:
df1 = df1 %>% mutate(origin_table = 'a')
df2 = df2 %>% mutate(origin_table = 'b')
Then I can extract the origin by doing the following:
get_summary = function(table){
dummy_list = c(TRUE, FALSE)
names(dummy_list) = c('a', 'b')
origin = table %>% distinct(origin_table) %>% pull
final_table = table %>% group_by(id) %>% summarise(total_expenditure= sum(expenditure))
is_true = dummy_list[[origin ]] # Want to use the original name to call another list
if(is_true) final_table = final_table %>% mutate(total_expenditure = total_expenditure + 1)
return(final_table)
}
Instead of map, use imap, which makes the list names available as .y
library(purrr)
library(dplyr)
get_summary = function(dat, name){
  dat %>%
    group_by(id) %>%
    summarise(total_expenditure = sum(expenditure, na.rm = TRUE),
              .groups = "drop") %>%
    mutate(total_expenditure = if(name == 'a')
      total_expenditure + 1 else total_expenditure)
}
-testing
> table_list %>%
imap_dfr(~ get_summary(.x, name = .y), .id = 'origin_table')
# A tibble: 4 × 3
origin_table id total_expenditure
<chr> <int> <dbl>
1 a 1 41
2 a 2 71
3 b 1 90
4 b 2 60
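To see what imap hands to the function, here is a minimal sketch on a toy list (the list `x` and its values are made up for illustration and are not part of the question). As far as I know, .x is the element and .y is its name, or its integer position when the list has no names:

```r
library(purrr)

# Toy named list, used only for illustration
x <- list(a = 10, b = 20)

# .x is the element, .y is its name
named <- imap_chr(x, ~ paste0(.y, "=", .x))

# For an unnamed list, .y falls back to the position of the element
unnamed <- imap_chr(list(10, 20), ~ paste0(.y, "=", .x))

print(named)
print(unnamed)
```

So the same function can branch on the name without the name having to be stored inside each data frame.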
data
table_list <- list(a = structure(list(id = c(1L, 2L, 1L, 2L),
expenditure = c(10L,
20L, 30L, 50L)), class = "data.frame", row.names = c(NA, -4L)),
b = structure(list(id = c(1L, 2L, 1L, 2L), expenditure = c(30L,
50L, 60L, 10L)), class = "data.frame", row.names = c(NA,
-4L)))
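If the per-table behaviour should come from a lookup vector, as in the question's dummy_list, the same imap approach should work with `dummy_list[[name]]`. This is only a sketch under that assumption: the data is rebuilt with data.frame() so the block runs on its own, and the second variant assumes purrr >= 1.0.0, where the *_dfr helpers are superseded in favour of list_rbind().

```r
library(purrr)
library(dplyr)

# Same data as above, rebuilt so this block is self-contained
table_list <- list(
  a = data.frame(id = c(1L, 2L, 1L, 2L), expenditure = c(10L, 20L, 30L, 50L)),
  b = data.frame(id = c(1L, 2L, 1L, 2L), expenditure = c(30L, 50L, 60L, 10L))
)

# Named logical vector from the question: which tables get the +1 adjustment
dummy_list <- c(a = TRUE, b = FALSE)

get_summary <- function(dat, name) {
  out <- dat %>%
    group_by(id) %>%
    summarise(total_expenditure = sum(expenditure), .groups = "drop")
  # Look up the flag by the name of the list element
  if (dummy_list[[name]]) {
    out <- out %>% mutate(total_expenditure = total_expenditure + 1)
  }
  out
}

# imap calls get_summary(element, name), so no lambda is needed
result <- imap_dfr(table_list, get_summary, .id = "origin_table")

# Variant without the superseded helper (assumes purrr >= 1.0.0)
result2 <- table_list %>%
  imap(get_summary) %>%
  list_rbind(names_to = "origin_table")

print(result)
```

Keeping the flags in a named vector means adding a table only requires a new entry in dummy_list, not a change to the function body.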