Using purrr to iterate over two lists and then pipe into dplyr::filter
library(tidyverse)
library(purrr)
Using the sample data below, I can create the following function:
Funs <- function(DF, One, Two){
  One <- enquo(One)
  Two <- enquo(Two)
  DF %>% filter(School == (!!One) & Code == (!!Two)) %>%
    group_by(Code, School) %>%
    summarise(Count = sum(Question1))
}
I can then use that function to filter on two variables, School and Code, like this:
Funs(DF, "School1", "B344")
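This all works fine, but my real data has many variables, so rather than having to keep typing the "School" and "Code" values into the function, I would like to use the tidyverse and the purrr package to loop over two lists (one of schools, one of codes) and feed them to the filter. I would like the output to be a list of results.
To keep things simple, the two lists to feed to dplyr::filter have only two values each: School2 goes with S300 and School1 goes with B344, as in the example above.
Some of the things I have tried: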
map2(c("School2", "School1"),
c("S300", "B344"),
function(x,y) {
DF %>% filter(School == .x & Code == .y) %>%
group_by(Code, School) %>%
summarise(Count = sum(Question1))
}
And...
map2(c("School2", "School1")),
c("S300","B344"),
~filter(School == .x & Code == .y) %>%
group_by(Code, School)%>%
summarise(Count = sum(Question1))
And this...
list(c("School2", "School1"), c("S300", "B344")) %>%
map2( ~ filter(School == .x & Code == .y) %>%
group_by(Code, School) %>%
summarise(Count = sum(Question1)))
None of these seem to work, so any help would be appreciated!
Sample data:
Code <- c("B344","B555","S300","T220","B888","B888","B555","B344","B344","T220","B555","B555","S300","B555","S300","S300","S300","S300","B344","B344","B888","B888","B888")
School <- c("School1","School1","School2","School3","School4","School4","School1","School1","School3","School3","School4","School1","School1","School3","School2","School2","School4","School2","School3","School4","School3","School1","School2")
Question1 <- c(3,4,5,4,5,5,5,4,5,3,4,5,4,5,4,3,3,3,4,5,4,3,3)
Question2 <- c(5,4,3,4,3,5,4,3,2,3,4,5,4,5,4,3,4,4,5,4,3,3,4)
DF <- tibble(Code, School, Question1, Question2)  # tibble() replaces the deprecated data_frame()
Here are a few options, ordered from most like your code to most optimized:
library(tidyverse)
# tibble() replaces the deprecated data_frame()
DF <- tibble(Code = c("B344", "B555", "S300", "T220", "B888", "B888", "B555", "B344", "B344", "T220", "B555", "B555", "S300", "B555", "S300", "S300", "S300", "S300", "B344", "B344", "B888", "B888", "B888"),
             School = c("School1", "School1", "School2", "School3", "School4", "School4", "School1", "School1", "School3", "School3", "School4", "School1", "School1", "School3", "School2", "School2", "School4", "School2", "School3", "School4", "School3", "School1", "School2"),
             Question1 = c(3, 4, 5, 4, 5, 5, 5, 4, 5, 3, 4, 5, 4, 5, 4, 3, 3, 3, 4, 5, 4, 3, 3),
             Question2 = c(5, 4, 3, 4, 3, 5, 4, 3, 2, 3, 4, 5, 4, 5, 4, 3, 4, 4, 5, 4, 3, 3, 4))
# The School/Code pairs to keep, one pair per row
wanted <- tibble(School = c("School2", "School1"),
                 Code = c("S300", "B344"))
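For map2 to work, if you use the tilde (formula) notation the variables are named .x and .y; if you use regular function notation, you can call them whatever you like. Don't forget that the first parameter of filter is the data frame, which in your attempts was never passed in, so: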
map2_dfr(wanted$School, wanted$Code, ~filter(DF, School == .x, Code == .y)) %>%
  group_by(School, Code) %>%
  summarise_all(sum)
#> # A tibble: 2 x 4
#> # Groups: School [?]
#> School Code Question1 Question2
#> <chr> <chr> <dbl> <dbl>
#> 1 School1 B344 7.00 8.00
#> 2 School2 S300 15.0 14.0
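If you do want a list of results, as the question asks, here is a minimal sketch of the regular-function form of map2. The argument names school and code and the vectors schools and codes are illustrative names of mine, not from the original post, and the .groups argument assumes dplyr 1.0.0 or later:

```r
library(dplyr)
library(purrr)

# Same sample data as above
DF <- tibble(
  Code = c("B344", "B555", "S300", "T220", "B888", "B888", "B555", "B344", "B344", "T220", "B555", "B555", "S300", "B555", "S300", "S300", "S300", "S300", "B344", "B344", "B888", "B888", "B888"),
  School = c("School1", "School1", "School2", "School3", "School4", "School4", "School1", "School1", "School3", "School3", "School4", "School1", "School1", "School3", "School2", "School2", "School4", "School2", "School3", "School4", "School3", "School1", "School2"),
  Question1 = c(3, 4, 5, 4, 5, 5, 5, 4, 5, 3, 4, 5, 4, 5, 4, 3, 3, 3, 4, 5, 4, 3, 3),
  Question2 = c(5, 4, 3, 4, 3, 5, 4, 3, 2, 3, 4, 5, 4, 5, 4, 3, 4, 4, 5, 4, 3, 3, 4)
)

schools <- c("School2", "School1")  # illustrative name
codes   <- c("S300", "B344")        # illustrative name

# With a regular function the argument names are arbitrary;
# note that DF is passed explicitly at the start of the pipe.
results <- map2(schools, codes, function(school, code) {
  DF %>%
    filter(School == school, Code == code) %>%
    group_by(Code, School) %>%
    summarise(Count = sum(Question1), .groups = "drop")
})

# Name each list element after the school it was filtered on
names(results) <- schools
```

results is then a plain list with one summarised tibble per School/Code pair, which can be indexed as results[["School1"]].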
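Since I set wanted up as a data frame (the original lists would work too), you can use pmap instead. With two variables, the parameter names with pmap can actually be the same as with map2, but it is really a function with ... parameters, so it usually makes sense to handle them differently, e.g. with the ..1 notation: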
wanted %>%
  pmap_dfr(~filter(DF, School == ..1, Code == ..2)) %>%
  group_by(School, Code) %>%
  summarise_all(sum)
#> # A tibble: 2 x 4
#> # Groups: School [?]
#> School Code Question1 Question2
#> <chr> <chr> <dbl> <dbl>
#> 1 School1 B344 7.00 8.00
#> 2 School2 S300 15.0 14.0
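Because pmap passes each column of a data frame as a named argument, another option is a regular function whose argument names match the column names. The sketch below is my own variation rather than part of the original answer; it uses !! to force the function arguments, since a bare School inside filter would refer to the column:

```r
library(dplyr)
library(purrr)

# Same sample data as above
DF <- tibble(
  Code = c("B344", "B555", "S300", "T220", "B888", "B888", "B555", "B344", "B344", "T220", "B555", "B555", "S300", "B555", "S300", "S300", "S300", "S300", "B344", "B344", "B888", "B888", "B888"),
  School = c("School1", "School1", "School2", "School3", "School4", "School4", "School1", "School1", "School3", "School3", "School4", "School1", "School1", "School3", "School2", "School2", "School4", "School2", "School3", "School4", "School3", "School1", "School2"),
  Question1 = c(3, 4, 5, 4, 5, 5, 5, 4, 5, 3, 4, 5, 4, 5, 4, 3, 3, 3, 4, 5, 4, 3, 3),
  Question2 = c(5, 4, 3, 4, 3, 5, 4, 3, 2, 3, 4, 5, 4, 5, 4, 3, 4, 4, 5, 4, 3, 3, 4)
)
wanted <- tibble(School = c("School2", "School1"),
                 Code = c("S300", "B344"))

# pmap calls the function once per row of `wanted`, matching arguments
# to columns by name. `!!School` is the function argument (a string);
# the bare `School` on the left is the column of DF.
by_name <- pmap(wanted, function(School, Code) {
  filter(DF, School == !!School, Code == !!Code)
})
```

by_name is a list of filtered tibbles, one per row of wanted; swapping pmap for pmap_dfr would row-bind them as in the answer above.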
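The issue with both of the above techniques is that they will be slow at scale, because they run filter once for each row of wanted, which means every row of DF is re-tested multiple times. To keep the code similar, a somewhat hacky way to avoid the extra work is to combine the columns into one, e.g. with tidyr::unite: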
DF %>%
  unite(school_code, School, Code) %>%
  filter(school_code %in% paste(wanted$School, wanted$Code, sep = '_')) %>%    # or invoke(paste, wanted, sep = '_'), though invoke() is deprecated in recent purrr releases
  separate(school_code, c('School', 'Code')) %>%
  group_by(School, Code) %>%
  summarise_all(sum)
#> # A tibble: 2 x 4
#> # Groups: School [?]
#> School Code Question1 Question2
#> <chr> <chr> <dbl> <dbl>
#> 1 School1 B344 7.00 8.00
#> 2 School2 S300 15.0 14.0
...or combine them within filter itself:
DF %>%
  filter(paste(School, Code) %in% paste(wanted$School, wanted$Code)) %>%    # or invoke(paste, wanted)
  group_by(School, Code) %>%
  summarise_all(sum)
#> # A tibble: 2 x 4
#> # Groups: School [?]
#> School Code Question1 Question2
#> <chr> <chr> <dbl> <dbl>
#> 1 School1 B344 7.00 8.00
#> 2 School2 S300 15.0 14.0
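One caveat with the paste-based approaches, shown as a small sketch with made-up values (they are not from the sample data): if a value can contain the separator, two different pairs can paste to the same string and produce false matches, so it is safer to pick a separator that cannot occur in either column.

```r
# Two different (School, Code) pairs; the values are hypothetical
pair_a <- c(School = "School 1", Code = "B344")
pair_b <- c(School = "School",   Code = "1 B344")

# With the default separator (a space) both pairs give "School 1 B344"
collides <- identical(
  paste(pair_a[["School"]], pair_a[["Code"]]),
  paste(pair_b[["School"]], pair_b[["Code"]])
)

# With a separator that appears in neither value the keys stay distinct
safe <- identical(
  paste(pair_a[["School"]], pair_a[["Code"]], sep = "|"),
  paste(pair_b[["School"]], pair_b[["Code"]], sep = "|")
)
```

Here collides is TRUE and safe is FALSE. The sample data has no spaces or underscores inside the values, so the code above is unaffected.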
The best way to get the desired result may be more obvious now that I have set wanted up as a data frame: a join, which is designed to do exactly this job:
DF %>%
  inner_join(wanted) %>%
  group_by(School, Code) %>%
  summarise_all(sum)
#> Joining, by = c("Code", "School")
#> # A tibble: 2 x 4
#> # Groups: School [?]
#> School Code Question1 Question2
#> <chr> <chr> <dbl> <dbl>
#> 1 School1 B344 7.00 8.00
#> 2 School2 S300 15.0 14.0
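As a variation on the join (my own sketch, not part of the original answer), semi_join is a filtering join: it keeps the rows of DF that have a match in wanted without adding columns or duplicating rows. Spelling out by also avoids the "Joining, by" message. The across() and .groups usage assumes dplyr 1.0.0 or later:

```r
library(dplyr)

# Same sample data as above
DF <- tibble(
  Code = c("B344", "B555", "S300", "T220", "B888", "B888", "B555", "B344", "B344", "T220", "B555", "B555", "S300", "B555", "S300", "S300", "S300", "S300", "B344", "B344", "B888", "B888", "B888"),
  School = c("School1", "School1", "School2", "School3", "School4", "School4", "School1", "School1", "School3", "School3", "School4", "School1", "School1", "School3", "School2", "School2", "School4", "School2", "School3", "School4", "School3", "School1", "School2"),
  Question1 = c(3, 4, 5, 4, 5, 5, 5, 4, 5, 3, 4, 5, 4, 5, 4, 3, 3, 3, 4, 5, 4, 3, 3),
  Question2 = c(5, 4, 3, 4, 3, 5, 4, 3, 2, 3, 4, 5, 4, 5, 4, 3, 4, 4, 5, 4, 3, 3, 4)
)
wanted <- tibble(School = c("School2", "School1"),
                 Code = c("S300", "B344"))

result <- DF %>%
  # keep only rows whose (School, Code) pair appears in `wanted`
  semi_join(wanted, by = c("School", "Code")) %>%
  group_by(School, Code) %>%
  # sum both question columns within each pair
  summarise(across(c(Question1, Question2), sum), .groups = "drop")
```

With the sample data, result has the same two rows as the outputs above (School1/B344 and School2/S300).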