Find the union of a column between matched rows of data frames of a list in R

Question

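I have a list of data frames. Where the values in the Name column match between data frames, I want to take the union of the value column across all of the data frames.

Here is some toy data: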

df1 <- data.frame(group = c("G1", "G1", "G1", "G1", "G1", "G2", "G2", "G2", "G1", "G1"),
                  Name  = c("B", "B", "B", "A", "A", "D", "D", "E", "C", "C"),
                  value = c(2, 4, 5, 2, 4, 7, 1, 2, 4, 1))
df2 <- data.frame(group = c("G1", "G1", "G1", "G1", "G2", "G2", "G2", "G2", "G1", "G1"),
                  Name  = c("B", "B", "A", "A", "D", "E", "E", "E", "C", "C"),
                  value = c(2, 3, 5, 1, 7, 2, 4, 8, 9, 1))
df <- rbind(df1, df2)

df.list <- split(df, f=df$group)

The desired output is as follows:

  B = 2,3,4,5
  A = 1,2,4,5
  D = 1,7
  E = 2,4,8
  C = 1,4,9

Answer 1

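I'll use the tidyverse to solve this, and I'll assume the desired output is a list of vectors. In the solution, I make sure to keep only the Name values that are common to both df1 and df2.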

library(tidyverse)
bind_rows(df1, df2) %>% 
  filter(Name %in% df1$Name, Name %in% df2$Name) %>%
  split(.$Name) %>% 
  map(~ sort(unique(.x$value)))

Output:

$A
[1] 1 2 4 5

$B
[1] 2 3 4 5

$C
[1] 1 4 9

$D
[1] 1 7

$E
[1] 2 4 8
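
In the toy data every Name happens to occur in both data frames, so the filter step removes nothing. A minimal sketch of what it does otherwise, using two small hypothetical data frames (a and b are made up for illustration and are not part of the question's data):

```r
library(dplyr)

# Hypothetical data: "F" occurs only in a, "G" occurs only in b
a <- data.frame(Name = c("A", "A", "F"), value = c(1, 2, 9))
b <- data.frame(Name = c("A", "G"), value = c(3, 8))

# Same filter as in the answer: keep rows whose Name occurs in both data frames
kept <- bind_rows(a, b) %>%
  filter(Name %in% a$Name, Name %in% b$Name)

# Only the rows for "A" survive; "F" and "G" are dropped
sort(unique(kept$value))
#> [1] 1 2 3
```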

If you have more than two data frames, you can put them all in a list and use the following solution, which works for any number of data frames.

library(tidyverse)
dfs = list(df1, df2)
# First identify the common names within the data frames
common_names = dfs %>%
  map(`[[`, "Name") %>%
  reduce(intersect)
common_names
#> [1] "B" "A" "D" "E" "C"

# Now we can do the same thing as earlier
dfs %>%
  reduce(bind_rows) %>%
  filter(Name %in% common_names) %>%
  split(.$Name) %>%
  map(~ sort(unique(.x$value)))
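
If loading the tidyverse is not an option, the same steps can be sketched in base R. This is an alternative I am assuming is equivalent for the toy data, not part of the original answer; the name result is introduced here for illustration.

```r
# Toy data from the question
df1 <- data.frame(group = c("G1", "G1", "G1", "G1", "G1", "G2", "G2", "G2", "G1", "G1"),
                  Name  = c("B", "B", "B", "A", "A", "D", "D", "E", "C", "C"),
                  value = c(2, 4, 5, 2, 4, 7, 1, 2, 4, 1))
df2 <- data.frame(group = c("G1", "G1", "G1", "G1", "G2", "G2", "G2", "G2", "G1", "G1"),
                  Name  = c("B", "B", "A", "A", "D", "E", "E", "E", "C", "C"),
                  value = c(2, 3, 5, 1, 7, 2, 4, 8, 9, 1))
dfs <- list(df1, df2)

# Names that occur in every data frame of the list
common_names <- Reduce(intersect, lapply(dfs, `[[`, "Name"))

# Stack the data frames and keep only rows with a common Name
combined <- do.call(rbind, dfs)
combined <- combined[combined$Name %in% common_names, ]

# For each Name, collect the sorted unique values
result <- lapply(split(combined$value, combined$Name),
                 function(v) sort(unique(v)))
result
```

Printing result should give the same list as the tidyverse version, with one sorted vector per Name.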
