Save intermediate list output in a dplyr pipeline and map it back to another list further down the pipeline - R

Question

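I am running PCAs on groups in a dataset using a dplyr pipeline. I start with group_split(), so I am working with a list. To run prcomp(), each list element can contain only its numeric columns, but I want to bring the factor column back at the end for plotting. I tried saving the intermediate output partway through the pipeline with {. ->> temp}, but because it is a list, I don't know how to index the grouping column when plotting.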

library(tidyverse)
library(ggbiplot)

iris %>%
  group_split(Species, keep = T) %>% #group by species, one pca per species
  {. ->> temp} %>%  # save intermediate output to preserve species column for use in plotting later
  map(~.x %>% select_if(is.numeric) %>% select_if(~var(.) != 0) %>% 
        prcomp(scale. = TRUE))%>% #run pca on numeric columns only
  map(~ggbiplot(.x), label=temp$Species) # plot each pca, labeling points with species names from the temporary object

This produces one PCA plot for each species in the iris dataset, but because temp$Species is NULL, the points are not labeled.

Answer 1

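One option is to use split() and imap(). split() returns a list named by species, and imap() passes each element's name to the function as .y.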

library(tidyverse)
library(ggbiplot)
iris %>%
  split(.$Species) %>%  # named list of data frames, one per species
  map(~.x %>% select_if(is.numeric) %>% select_if(~var(.) != 0) %>%
        prcomp(scale. = TRUE)) %>%  # run pca on numeric, non-constant columns only
  imap(~ggbiplot(.x, labels = .y))  # .y is the list name, i.e. the species
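
To see why this works without saving an intermediate object, here is a minimal sketch that skips the plotting step (so it does not need ggbiplot) and only inspects the names. The object names by_species and pcas are just for illustration and are not part of the answer above.

```r
library(purrr)

# split() names each list element after its Species level
by_species <- split(iris, iris$Species)
names(by_species)

# Run one PCA per species on the numeric columns only;
# map() keeps the list names
pcas <- map(by_species, ~ prcomp(.x[sapply(.x, is.numeric)], scale. = TRUE))

# imap() passes the element as .x and its name as .y
imap_chr(pcas, ~ paste(.y, "has", ncol(.x$rotation), "principal components"))
```

Because the names travel with the list, the species label is still available at the plotting step even though the Species column was dropped before prcomp().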

Answer 2

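If you use map2() and pass the list of species as the .y argument, you get what I think you want. Note that in your original code the labels argument sat outside the ggbiplot() call, so it was ignored.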

library(tidyverse)
library(ggbiplot)

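iris %>%
  group_split(Species, .keep = TRUE) %>%  # list of tibbles, one per species
  {. ->> temp} %>%  # save the list so the Species column can be reused below
  map(~.x %>%
        select_if(is.numeric) %>%
        select_if(~var(.) != 0) %>%
        prcomp(scale. = TRUE)) %>%
  map2(map(temp, "Species"), ~ggbiplot(.x, labels = .y))  # labels now inside ggbiplot()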
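
The key step is map(temp, "Species"): temp is a list of tibbles rather than a data frame, so it has no Species column of its own, and the column has to be pulled out of each element. A small sketch of what that extraction returns, with no plotting involved (species_list is an illustrative name, not part of the answer):

```r
library(dplyr)
library(purrr)

# group_split() returns an unnamed list of tibbles, one per species
temp <- iris %>% group_split(Species)
names(temp)

# map() with a string extracts that column from every tibble in the list
species_list <- map(temp, "Species")

length(species_list)                    # one vector per group
lengths(species_list)                   # one label per row in each group
unique(as.character(species_list[[1]])) # species of the first group
```

Each element of species_list lines up with the PCA built from the same element of temp, which is what lets map2() pair them.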

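In response to your comment: if you want to add a third argument, you can use pmap() instead of map2(). In the example below, pmap() is passed a (nested) list holding the data for each ggbiplot() argument. Note that I have changed the new variable so that it is a factor that varies within groups, rather than a constant across groups.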

iris %>%
  mutate(new = factor(sample(1:3, 150, replace = TRUE))) %>%  # random factor, varies within species
  group_split(Species, .keep = TRUE) %>%
  {. ->> temp} %>%  # save the list so Species and new can be reused below
  map(~.x %>%
        select_if(is.numeric) %>%
        select_if(~var(.) != 0) %>%
        prcomp(scale. = TRUE)) %>%
  list(map(temp, "Species"), map(temp, "new")) %>%  # list(pcas, species, new)
  pmap(~ ggbiplot(pcobj = ..1, labels = ..2, groups = ..3))
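
As a rough illustration of how pmap() lines up ..1, ..2 and ..3, here is a toy sketch that uses made-up stand-in values instead of real PCA objects. The names pcs, labels, groups and described are hypothetical.

```r
library(purrr)

# Three parallel lists of equal length, standing in for the PCA objects,
# the label vectors, and the grouping vectors (toy values, not PCA output)
pcs    <- list("pca_a", "pca_b")
labels <- list(c("x", "y"), c("z", "w"))
groups <- list(c(1, 2), c(2, 1))

# pmap() walks the lists in parallel: ..1 comes from the first list,
# ..2 from the second, ..3 from the third
described <- pmap_chr(
  list(pcs, labels, groups),
  ~ paste0(..1, ": ", paste(..2, ..3, sep = "-", collapse = ", "))
)
described
```

The same pattern extends to any number of arguments, as long as every list passed to pmap() has one element per group.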

r

pca

ggbiplot

dplyr