Find the union of a column between matched rows of data frames in a list in R
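I have a list of data frames. Where the values in the Name column match between data frames, I want to take the union of the value column across all of them.
Here is some toy data: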
df1 <- data.frame(group = c("G1", "G1", "G1", "G1", "G1", "G2", "G2", "G2", "G1", "G1"),
Name = c("B", "B","B", "A", "A",'D',"D" , "E", "C", "C"), value = c(2,4,5,2,4,7, 1, 2,4,1))
df2 <- data.frame(group = c("G1", "G1", "G1", "G1", "G2", "G2", "G2", "G2" , "G1", "G1"),
Name = c("B", "B" , "A", "A", "D", "E", "E", "E", "C", "C"), value = c(2, 3, 5, 1, 7, 2, 4, 8, 9,1))
df <- rbind(df1, df2)
df.list <- split(df, f=df$group)
The desired output is:
B = 2,3,4,5
A = 1,2,4,5
D = 1,7
E = 2,4,8
C = 1,4,9
I will use the tidyverse and assume the desired output is a list of vectors. The solution keeps only the Name values that df1 and df2 have in common.
library(tidyverse)
bind_rows(df1, df2) %>%
filter(Name %in% df1$Name, Name %in% df2$Name) %>%
split(.$Name) %>%
map(~ sort(unique(.x$value)))
Output:
$A
[1] 1 2 4 5
$B
[1] 2 3 4 5
$C
[1] 1 4 9
$D
[1] 1 7
$E
[1] 2 4 8
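In the toy data every Name occurs in both data frames, so the filter step removes nothing. As a rough illustration of what it does when that is not the case, here is a minimal sketch with two hypothetical frames, df_a and df_b (not part of the question), where "X" appears in only one of them. stringsAsFactors = FALSE is set so the behaviour is the same on older R versions.

```r
library(dplyr)
library(purrr)

# Hypothetical toy frames for illustration: "X" appears only in df_a
df_a <- data.frame(Name = c("A", "A", "B", "X"), value = c(1, 2, 3, 99),
                   stringsAsFactors = FALSE)
df_b <- data.frame(Name = c("A", "B", "B"), value = c(2, 4, 3),
                   stringsAsFactors = FALSE)

shared <- bind_rows(df_a, df_b) %>%
  # keep only rows whose Name occurs in both frames
  filter(Name %in% df_a$Name, Name %in% df_b$Name) %>%
  split(.$Name) %>%
  map(~ sort(unique(.x$value)))

names(shared)
#> [1] "A" "B"
```

The rows for "X" are dropped before the split, so the result has entries for "A" and "B" only.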
If you have more than two data frames, put them all in a list and use the following version, which works for any number of data frames.
library(tidyverse)
dfs = list(df1, df2)
# First identify the common names within the data frames
common_names = dfs %>%
map(`[[`, "Name") %>%
reduce(intersect)
common_names
#> [1] "B" "A" "D" "E" "C"
# Now we can do the same thing as earlier
dfs %>%
reduce(bind_rows) %>%
filter(Name %in% common_names) %>%
split(.$Name) %>%
map(~ sort(unique(.x$value)))
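If you would rather not depend on the tidyverse, the same steps can be sketched in base R. This is one possible translation under the same assumption (the result is a list of sorted vectors), not a tested drop-in replacement; the names combined and result are introduced here for illustration.

```r
df1 <- data.frame(group = c("G1", "G1", "G1", "G1", "G1", "G2", "G2", "G2", "G1", "G1"),
                  Name = c("B", "B", "B", "A", "A", "D", "D", "E", "C", "C"),
                  value = c(2, 4, 5, 2, 4, 7, 1, 2, 4, 1))
df2 <- data.frame(group = c("G1", "G1", "G1", "G1", "G2", "G2", "G2", "G2", "G1", "G1"),
                  Name = c("B", "B", "A", "A", "D", "E", "E", "E", "C", "C"),
                  value = c(2, 3, 5, 1, 7, 2, 4, 8, 9, 1))
dfs <- list(df1, df2)

# Names present in every data frame of the list
common_names <- Reduce(intersect, lapply(dfs, `[[`, "Name"))

# Stack all frames, then keep only rows with a shared Name
combined <- do.call(rbind, dfs)
combined <- combined[combined$Name %in% common_names, ]

# Union of values per Name; drop = TRUE skips unused factor levels, if any
result <- lapply(split(combined$value, combined$Name, drop = TRUE),
                 function(v) sort(unique(v)))
result
```

Reduce() and lapply() stand in for purrr's reduce() and map(), and split() on the value column avoids carrying the whole data frame through the last step.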