How to intersect elements of one column along with group_by in R
Suppose my data looks like this:
group_id col1
1 1 A,B
2 1 B,C
3 2 A,C
4 2 B,D
5 3 A,D
6 3 A,B,C,D
I want to summarise/mutate col1 so that its elements are intersected within each group (grouped by group_id). The output I need looks like this (if summarised):
group_id col1
1 1 B
2 2 <NA>
3 3 A,D
or like this (if mutated):
group_id col1
1 1 B
2 1 B
3 2 <NA>
4 2 <NA>
5 3 A,D
6 3 A,D
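I can easily create a union by using the function toString, but I am racking my brain over how to keep only the common elements in the output. Basically, intersect needs at least two arguments, so it does not work here directly.
The dput(df) is as follows: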
df <- structure(list(group_id = c(1L, 1L, 2L, 2L, 3L, 3L), col1 = c("A,B",
"B,C", "A,C", "B,D", "A,D", "A,B,C,D")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
You can split col1 on commas and use Reduce + intersect to get the common values in each group_id.
library(dplyr)
df %>%
group_by(group_id) %>%
summarise(col1 = toString(Reduce(intersect, strsplit(col1, ','))))
# group_id col1
#* <int> <chr>
#1 1 "B"
#2 2 ""
#3 3 "A, D"
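The output above returns an empty string for group 2 and "A, D" (with a space) for group 3, while the question asks for NA and "A,D". The following is a minimal sketch of one way to close that gap with the same Reduce + intersect idea. The helper name common_elements is made up for illustration and is not part of dplyr; the sketch also assumes dplyr 1.0.0 or later for the .groups argument.

```r
library(dplyr)

df <- data.frame(
  group_id = c(1L, 1L, 2L, 2L, 3L, 3L),
  col1 = c("A,B", "B,C", "A,C", "B,D", "A,D", "A,B,C,D")
)

# Hypothetical helper: intersect a vector of comma-separated strings.
# Returns NA when the strings share no element.
common_elements <- function(x) {
  common <- Reduce(intersect, strsplit(x, ",", fixed = TRUE))
  if (length(common) == 0) NA_character_ else paste(common, collapse = ",")
}

# One row per group (the "summarise" output in the question)
summarised <- df %>%
  group_by(group_id) %>%
  summarise(col1 = common_elements(col1), .groups = "drop")

# Same number of rows as the input (the "mutate" output in the question);
# the length-one result is recycled within each group
mutated <- df %>%
  group_by(group_id) %>%
  mutate(col1 = common_elements(col1)) %>%
  ungroup()
```

Because Reduce folds intersect pairwise over the list, this should also work for groups with more than two rows, and a single-row group simply returns its own elements.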
Does this work:
library(dplyr)
library(tidyr)
df %>% separate_rows(col1) %>%
group_by(group_id, col1) %>% filter(n()>1) %>%
distinct() %>% group_by(group_id) %>% summarise(col1 = toString(col1)) %>%
right_join(df %>% select(group_id) %>% distinct()) %>%
arrange(group_id)
`summarise()` ungrouping output (override with `.groups` argument)
Joining, by = "group_id"
# A tibble: 3 x 2
group_id col1
<int> <chr>
1 1 B
2 2 NA
3 3 A, D
One option using dplyr and tidyr could be:
df %>%
separate_rows(col1) %>%
count(group_id, col1) %>%
group_by(group_id) %>%
summarise(col1 = if_else(all(n == 1), NA_character_, paste(col1[n == 2], collapse = ",")))
group_id col1
<int> <chr>
1 1 B
2 2 <NA>
3 3 A,D
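Note that the n == 2 test above relies on every group having exactly two rows. A possible generalisation, sketched here under the assumption that an element is "common" when it appears in every row of its group, compares each element's count to the number of rows in the group. The column names row_id, n_rows and n_hits are introduced only for this illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  group_id = c(1L, 1L, 2L, 2L, 3L, 3L),
  col1 = c("A,B", "B,C", "A,C", "B,D", "A,D", "A,B,C,D")
)

result <- df %>%
  mutate(row_id = row_number()) %>%                  # remember the source row
  add_count(group_id, name = "n_rows") %>%           # rows per group
  separate_rows(col1, sep = ",") %>%                 # one element per row
  distinct(group_id, row_id, col1, n_rows) %>%       # ignore repeats within a row
  count(group_id, n_rows, col1, name = "n_hits") %>% # rows containing each element
  group_by(group_id) %>%
  summarise(
    # keep elements found in every row of the group, NA if there are none
    col1 = if (any(n_hits == n_rows)) {
      paste(col1[n_hits == n_rows], collapse = ",")
    } else {
      NA_character_
    },
    .groups = "drop"
  )
```

A plain if/else is used instead of if_else here because the condition is a single value per group, so only the branch that is needed gets evaluated.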