How to find the common elements among different sized columns in R?
I have a data frame called animals whose columns have different lengths. Some elements appear in every column and some do not, as shown below:
v1     v2     v3     v4
Dog    Cat    Lion   Dog
Cat    Lion   Dog    Shark
Lion   Dog    Shark  Cat
Shark  Shark  Cat    Lion
       Whale  Seal   Moose
       Seal          Whale
                     Deer
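What I want is to identify the elements that appear in every column, drop the ones that do not, and combine the common elements into a single column, like this: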
Dog
Cat
Lion
Shark
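So far I have tried identifying the duplicated elements with duplicated(animals) and then extracting them with animals[duplicated(animals)], but this returns nothing. Does anyone have a better approach?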
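Why the duplicated() attempt comes back empty: on a data frame, duplicated() compares whole rows, and no row of animals repeats. Pooling all values and calling duplicated() on them does find repeats, but "repeated somewhere" is not the same as "present in every column". A minimal sketch illustrating both points (the names pooled and repeated are illustrative only):

```r
animals <- data.frame(
  v1 = c("Dog", "Cat", "Lion", "Shark", NA, NA, NA),
  v2 = c("Cat", "Lion", "Dog", "Shark", "Whale", "Seal", NA),
  v3 = c("Lion", "Dog", "Shark", "Cat", "Seal", NA, NA),
  v4 = c("Dog", "Shark", "Cat", "Lion", "Moose", "Whale", "Deer"),
  stringsAsFactors = FALSE
)

# duplicated() on a data frame works row by row; every row here is unique
row_dups <- duplicated(animals)
print(row_dups)

# Pool all non-NA values and keep those seen more than once
pooled <- unlist(animals, use.names = FALSE)
pooled <- pooled[!is.na(pooled)]
repeated <- unique(pooled[duplicated(pooled)])
# Seal and Whale show up too, although they are not in every column
print(repeated)
```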
We can apply intersect across all columns with Reduce:
Reduce(intersect, animals)
#[1] "Dog" "Cat" "Lion" "Shark"
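Reduce(intersect, animals) evaluates intersect(intersect(intersect(v1, v2), v3), v4). One caveat: match(), which intersect() relies on, treats NA as equal to NA, so NA survives every step in which both sides contain padding; it only drops out here because v4 has no NA. A minimal sketch that shows the partial results and strips the NA padding first (steps and no_na are illustrative names):

```r
animals <- data.frame(
  v1 = c("Dog", "Cat", "Lion", "Shark", NA, NA, NA),
  v2 = c("Cat", "Lion", "Dog", "Shark", "Whale", "Seal", NA),
  v3 = c("Lion", "Dog", "Shark", "Cat", "Seal", NA, NA),
  v4 = c("Dog", "Shark", "Cat", "Lion", "Moose", "Whale", "Deer"),
  stringsAsFactors = FALSE
)

# accumulate = TRUE keeps every partial intersection; the 2nd one still holds NA
steps <- Reduce(intersect, animals, accumulate = TRUE)
str(steps)

# Removing NA from each column first makes the result independent of padding
no_na <- lapply(animals, function(col) col[!is.na(col)])
common <- Reduce(intersect, no_na)
print(common)
```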
Or, using the tidyverse:
library(dplyr)
library(tidyr)
pivot_longer(animals, cols = everything(), values_drop_na = TRUE) %>%
group_by(value) %>%
filter(n_distinct(name) == ncol(animals)) %>%
ungroup %>%
distinct(value)
# A tibble: 4 x 1
# value
# <chr>
#1 Dog
#2 Cat
#3 Lion
#4 Shark
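The pipeline reshapes to long format, drops NA, and keeps each value whose number of distinct source columns equals ncol(animals). The same logic can be sketched in base R without extra packages, relying on unlist() flattening a data frame column by column; unlike the tibble above, the result comes back in sorted order (long and n_cols are illustrative names):

```r
animals <- data.frame(
  v1 = c("Dog", "Cat", "Lion", "Shark", NA, NA, NA),
  v2 = c("Cat", "Lion", "Dog", "Shark", "Whale", "Seal", NA),
  v3 = c("Lion", "Dog", "Shark", "Cat", "Seal", NA, NA),
  v4 = c("Dog", "Shark", "Cat", "Lion", "Moose", "Whale", "Deer"),
  stringsAsFactors = FALSE
)

# Long format: one row per (column name, value) pair, like pivot_longer()
long <- data.frame(
  name  = rep(names(animals), each = nrow(animals)),
  value = unlist(animals, use.names = FALSE),
  stringsAsFactors = FALSE
)
long <- long[!is.na(long$value), ]  # like values_drop_na = TRUE

# Distinct columns per value, like group_by(value) + n_distinct(name)
n_cols <- tapply(long$name, long$value, function(x) length(unique(x)))
common <- names(n_cols)[n_cols == ncol(animals)]
print(common)
```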
Data
animals <- structure(list(v1 = c("Dog", "Cat", "Lion", "Shark", NA, NA,
NA), v2 = c("Cat", "Lion", "Dog", "Shark", "Whale", "Seal", NA
), v3 = c("Lion", "Dog", "Shark", "Cat", "Seal", NA, NA), v4 = c("Dog",
"Shark", "Cat", "Lion", "Moose", "Whale", "Deer")),
class = "data.frame", row.names = c(NA,
-7L))
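The NA cells in this data are padding that lets columns of different lengths share one data frame. A minimal sketch of producing that shape from a plain list of unequal vectors, assuming the raw columns start out as a list (the name ragged is hypothetical); on the unpadded list, Reduce(intersect, ...) never sees NA at all:

```r
# Hypothetical starting point: the columns as unequal-length vectors
ragged <- list(
  v1 = c("Dog", "Cat", "Lion", "Shark"),
  v2 = c("Cat", "Lion", "Dog", "Shark", "Whale", "Seal"),
  v3 = c("Lion", "Dog", "Shark", "Cat", "Seal"),
  v4 = c("Dog", "Shark", "Cat", "Lion", "Moose", "Whale", "Deer")
)

# Setting length() to a larger value pads a vector with NA
padded <- lapply(ragged, `length<-`, max(lengths(ragged)))
animals <- as.data.frame(padded, stringsAsFactors = FALSE)

# Intersecting the raw list directly avoids the NA padding entirely
common <- Reduce(intersect, ragged)
print(common)
```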
Another base R option uses stack + table + rowSums:
> names(which(rowSums(table(na.omit(stack(animals)))) == ncol(animals)))
[1] "Cat" "Dog" "Lion" "Shark"
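This works because each animal appears at most once per column. If a value could repeat inside a column, summing raw counts would overcount it; counting only whether each cell of the table is non-zero avoids that. A minimal sketch with a hypothetical variant, animals_dup, in which Seal appears twice in v1 (so it is still missing from v4):

```r
# Hypothetical data: Seal repeated in v1, absent from v4
animals_dup <- data.frame(
  v1 = c("Dog", "Cat", "Lion", "Shark", "Seal", "Seal", NA),
  v2 = c("Cat", "Lion", "Dog", "Shark", "Whale", "Seal", NA),
  v3 = c("Lion", "Dog", "Shark", "Cat", "Seal", NA, NA),
  v4 = c("Dog", "Shark", "Cat", "Lion", "Moose", "Whale", "Deer"),
  stringsAsFactors = FALSE
)

tab <- table(na.omit(stack(animals_dup)))

# Raw counts: Seal reaches 2 + 1 + 1 + 0 = 4 and is wrongly kept
naive <- names(which(rowSums(tab) == ncol(animals_dup)))
# Presence per column (tab > 0): Seal reaches only 3 and is dropped
robust <- names(which(rowSums(tab > 0) == ncol(animals_dup)))
print(naive)
print(robust)
```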
Below, the code is broken down step by step:
> stack(animals)
values ind
1 Dog v1
2 Cat v1
3 Lion v1
4 Shark v1
5 <NA> v1
6 <NA> v1
7 <NA> v1
8 Cat v2
9 Lion v2
10 Dog v2
11 Shark v2
12 Whale v2
13 Seal v2
14 <NA> v2
15 Lion v3
16 Dog v3
17 Shark v3
18 Cat v3
19 Seal v3
20 <NA> v3
21 <NA> v3
22 Dog v4
23 Shark v4
24 Cat v4
25 Lion v4
26 Moose v4
27 Whale v4
28 Deer v4
> na.omit(stack(animals))
values ind
1 Dog v1
2 Cat v1
3 Lion v1
4 Shark v1
8 Cat v2
9 Lion v2
10 Dog v2
11 Shark v2
12 Whale v2
13 Seal v2
15 Lion v3
16 Dog v3
17 Shark v3
18 Cat v3
19 Seal v3
22 Dog v4
23 Shark v4
24 Cat v4
25 Lion v4
26 Moose v4
27 Whale v4
28 Deer v4
> table(na.omit(stack(animals)))
ind
values v1 v2 v3 v4
Cat 1 1 1 1
Deer 0 0 0 1
Dog 1 1 1 1
Lion 1 1 1 1
Moose 0 0 0 1
Seal 0 1 1 0
Shark 1 1 1 1
Whale 0 1 0 1
> rowSums(table(na.omit(stack(animals))))
Cat Deer Dog Lion Moose Seal Shark Whale
4 1 4 4 1 2 4 2
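The breakdown stops at the per-value counts. The remaining steps compare those counts with the number of columns and keep the matching names. A minimal sketch of those last steps, which also wraps the result as a one-column data frame since the question asks for a single column (the column name animal is a hypothetical choice):

```r
animals <- data.frame(
  v1 = c("Dog", "Cat", "Lion", "Shark", NA, NA, NA),
  v2 = c("Cat", "Lion", "Dog", "Shark", "Whale", "Seal", NA),
  v3 = c("Lion", "Dog", "Shark", "Cat", "Seal", NA, NA),
  v4 = c("Dog", "Shark", "Cat", "Lion", "Moose", "Whale", "Deer"),
  stringsAsFactors = FALSE
)

counts <- rowSums(table(na.omit(stack(animals))))
keep <- counts == ncol(animals)   # TRUE where the count equals the 4 columns
common <- names(which(keep))      # which() keeps the names of the TRUE entries

# One-column result, as requested in the question
result <- data.frame(animal = common, stringsAsFactors = FALSE)
print(result)
```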