How to find the common elements among different sized columns in R?

I have a data frame named animals whose columns have different lengths. The columns share some elements and differ in others, as shown below:

Dog     Cat      Lion     Dog
Cat     Lion     Dog      Shark
Lion    Dog      Shark    Cat
Shark   Shark    Cat      Lion
        Whale    Seal     Moose
        Seal              Whale
                          Deer

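What I want to do is identify the elements that are common to every column, exclude the ones that are not, and combine the common elements into a single column, like this: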

Dog
Cat
Lion
Shark

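So far I have tried identifying the repeated elements with duplicated(animals) and then extracting them with animals[duplicated(animals)], but this returned nothing. Does anyone have a better approach?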

We can use intersect together with Reduce, which intersects the columns one after another from left to right:

Reduce(intersect, animals)
#[1] "Dog"   "Cat"   "Lion"  "Shark"
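
One thing to watch for: intersect treats NA like any other value, so if every column were padded with NA, NA would show up as a "common element". It does not here only because v4 has no NA. The following is a minimal sketch, assuming the sample data from the Data section, that strips NA first and then puts the result into a one-column data frame as the question asks. The column name animal is a made-up name for illustration.

```r
# Same sample data as in the Data section
animals <- data.frame(
  v1 = c("Dog", "Cat", "Lion", "Shark", NA, NA, NA),
  v2 = c("Cat", "Lion", "Dog", "Shark", "Whale", "Seal", NA),
  v3 = c("Lion", "Dog", "Shark", "Cat", "Seal", NA, NA),
  v4 = c("Dog", "Shark", "Cat", "Lion", "Moose", "Whale", "Deer"),
  stringsAsFactors = FALSE
)

# Drop the NA padding from each column before intersecting, so NA can
# never be reported as common when every column happens to contain it
common <- Reduce(intersect, lapply(animals, function(x) x[!is.na(x)]))

# Collect the result into a one-column data frame
# ("animal" is a hypothetical column name)
result <- data.frame(animal = common, stringsAsFactors = FALSE)
print(result)
```

The order of the result follows the first column, because intersect keeps the order of its first argument.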

Alternatively, we can use the tidyverse: reshape to long format, then keep the values that appear in every column

library(dplyr)
library(tidyr)
pivot_longer(animals, cols = everything(), values_drop_na = TRUE) %>% 
     group_by(value) %>% 
     filter(n_distinct(name) == ncol(animals)) %>% 
     ungroup %>% 
     distinct(value)
# A tibble: 4 x 1
#  value
#  <chr>
#1 Dog  
#2 Cat  
#3 Lion 
#4 Shark

Data

animals <- structure(list(v1 = c("Dog", "Cat", "Lion", "Shark", NA, NA, 
NA), v2 = c("Cat", "Lion", "Dog", "Shark", "Whale", "Seal", NA
), v3 = c("Lion", "Dog", "Shark", "Cat", "Seal", NA, NA), v4 = c("Dog", 
"Shark", "Cat", "Lion", "Moose", "Whale", "Deer")), 
    class = "data.frame", row.names = c(NA, 
-7L))

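Another base R option uses stack + table + rowSums. Note that it counts occurrences, so it assumes no value is repeated within a single column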

> names(which(rowSums(table(na.omit(stack(animals)))) == ncol(animals)))
[1] "Cat"   "Dog"   "Lion"  "Shark"
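
If a value can occur more than once within a column, the raw counts can reach ncol(animals) without the value being present everywhere. A possible variant, sketched here under that assumption, compares the table against 0 first so that each column contributes at most 1 per value. The helper names common_by_count and common_by_presence and the toy data frame dup are hypothetical and only serve the illustration.

```r
# Original approach: sums raw occurrence counts per value
common_by_count <- function(df) {
  names(which(rowSums(table(na.omit(stack(df)))) == ncol(df)))
}

# Variant: turn counts into presence/absence before summing, so a value
# repeated inside one column is still counted once for that column
common_by_presence <- function(df) {
  names(which(rowSums(table(na.omit(stack(df))) > 0) == ncol(df)))
}

# Toy data (hypothetical): "y" occurs twice in column a and never in b
dup <- data.frame(
  a = c("x", "y", "y"),
  b = c("x", "z", NA),
  stringsAsFactors = FALSE
)

by_count    <- common_by_count(dup)     # includes "y" by mistake
by_presence <- common_by_presence(dup)  # only "x"
print(by_count)
print(by_presence)
```

On the animals data both helpers give the same answer, since no animal is repeated within a column.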

Below, the code is broken down into steps

> stack(animals)
   values ind
1     Dog  v1
2     Cat  v1
3    Lion  v1
4   Shark  v1
5    <NA>  v1
6    <NA>  v1
7    <NA>  v1
8     Cat  v2
9    Lion  v2
10    Dog  v2
11  Shark  v2
12  Whale  v2
13   Seal  v2
14   <NA>  v2
15   Lion  v3
16    Dog  v3
17  Shark  v3
18    Cat  v3
19   Seal  v3
20   <NA>  v3
21   <NA>  v3
22    Dog  v4
23  Shark  v4
24    Cat  v4
25   Lion  v4
26  Moose  v4
27  Whale  v4
28   Deer  v4

> na.omit(stack(animals))
   values ind
1     Dog  v1
2     Cat  v1
3    Lion  v1
4   Shark  v1
8     Cat  v2
9    Lion  v2
10    Dog  v2
11  Shark  v2
12  Whale  v2
13   Seal  v2
15   Lion  v3
16    Dog  v3
17  Shark  v3
18    Cat  v3
19   Seal  v3
22    Dog  v4
23  Shark  v4
24    Cat  v4
25   Lion  v4
26  Moose  v4
27  Whale  v4
28   Deer  v4

> table(na.omit(stack(animals)))
       ind
values  v1 v2 v3 v4
  Cat    1  1  1  1
  Deer   0  0  0  1
  Dog    1  1  1  1
  Lion   1  1  1  1
  Moose  0  0  0  1
  Seal   0  1  1  0
  Shark  1  1  1  1
  Whale  0  1  0  1

> rowSums(table(na.omit(stack(animals))))
  Cat  Deer   Dog  Lion Moose  Seal Shark Whale
    4     1     4     4     1     2     4     2
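
The walkthrough stops at the row sums. The remaining two steps are the comparison against ncol(animals) and the name lookup. As a small sketch, the counts below are typed in by hand from the output above rather than recomputed, and n_cols stands in for ncol(animals).

```r
# Row sums copied from the previous step
counts <- c(Cat = 4, Deer = 1, Dog = 4, Lion = 4,
            Moose = 1, Seal = 2, Shark = 4, Whale = 2)

# Number of columns in animals (v1 to v4)
n_cols <- 4

# TRUE for values whose count equals the number of columns
keep <- counts == n_cols

# which() keeps the names of the TRUE entries; names() extracts them
common <- names(which(keep))
print(common)
```

The names come back in alphabetical order because table sorts its row labels, which is why this result is ordered differently from the Reduce(intersect, animals) output.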