Unique case of finding duplicate values flexibly across columns in R
I have a dataset similar to the following:
df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
                 predation_type = c("eats", "eats", "eaten by", "eats"),
                 animal_2 = c("mouse", "squirrel", "cat", "nuts"))
> df
  animal_1 predation_type animal_2
1      cat           eats    mouse
2      dog           eats squirrel
3    mouse       eaten by      cat
4 squirrel           eats     nuts
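I'm looking for code that identifies rows 1 and 3 as duplicates, because they describe the same phenomenon (a cat eats a mouse, or a mouse is eaten by a cat). I'm not sure how to describe the kind of duplicate I'm looking for, so I'm hoping someone can help. I tried merging the text into one column (i.e. "catmouse", "dogsquirrel", etc.) and then reversing the letters, but that quickly proved too complicated.
Any help you can offer is greatly appreciated.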
You can sort() the two animal names within each row of the data frame; after that, duplicated() does the job.
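# Keep only the two animal columns; predation_type must not take part in the sort
newdf <- df[, c("animal_1", "animal_2")]
for (i in seq_len(nrow(newdf))) {
  # Sort the names within the row so that "mouse, cat" becomes "cat, mouse"
  newdf[i, ] <- sort(unlist(newdf[i, ]))
}
# duplicated() on a data frame compares whole rows, not each column separately
newdf[!duplicated(newdf), ]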
  animal_1 animal_2
1      cat    mouse
2      dog squirrel
4     nuts squirrel
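The row-by-row loop can also be avoided. What follows is only a minimal sketch, not part of the original answer: pmin() and pmax() compare two character vectors element-wise, so they can put each pair into alphabetical order in one step. It assumes the animal columns are character vectors (the default in R 4.0 and later), and the helper column names `first` and `second` are made up for illustration.

```r
df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
                 predation_type = c("eats", "eats", "eaten by", "eats"),
                 animal_2 = c("mouse", "squirrel", "cat", "nuts"))

# Order-independent version of each pair: the alphabetically first name, then the other.
# `first` and `second` are hypothetical helper columns, not part of the original data.
pairs <- data.frame(first  = pmin(df$animal_1, df$animal_2),
                    second = pmax(df$animal_1, df$animal_2))

# duplicated() on a data frame compares whole rows, so row 3 matches row 1
deduped <- df[!duplicated(pairs), ]
deduped
```

Because the filter is applied to df itself, this variant keeps the predation_type column and the original orientation of each remaining row (rows 1, 2 and 4).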
tidyverse
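df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
                 predation_type = c("eats", "eats", "eaten by", "eats"),
                 animal_2 = c("mouse", "squirrel", "cat", "nuts"))
library(tidyverse)
df %>%
  rowwise() %>%
  # Build an order-independent key per row, e.g. "cat|mouse" for rows 1 and 3;
  # the "|" separator keeps pairs such as ("ab", "c") and ("a", "bc") distinct
  mutate(duplicates = str_c(sort(c_across(c(animal_1, animal_2))), collapse = "|")) %>%
  group_by(duplicates) %>%
  # Replace the key with a flag: TRUE when the pair occurs more than once
  mutate(duplicates = n() > 1) %>%
  ungroup()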
#> # A tibble: 4 x 4
#>   animal_1 predation_type animal_2 duplicates
#>   <chr>    <chr>          <chr>    <lgl>
#> 1 cat      eats           mouse    TRUE
#> 2 dog      eats           squirrel FALSE
#> 3 mouse    eaten by       cat      TRUE
#> 4 squirrel eats           nuts     FALSE
Created on 2022-01-17 by the reprex package (v2.0.1)
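For readers who prefer to stay in base R, the same flagging can be sketched without any packages. This is an illustrative variant under stated assumptions rather than the answerer's method: it assumes character columns (R 4.0 or later), and the `key` vector and the "|" separator are arbitrary choices made for this example.

```r
df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
                 predation_type = c("eats", "eats", "eaten by", "eats"),
                 animal_2 = c("mouse", "squirrel", "cat", "nuts"))

# One order-independent key per row: alphabetically smaller name, "|", larger name
key <- paste(pmin(df$animal_1, df$animal_2),
             pmax(df$animal_1, df$animal_2),
             sep = "|")

# duplicated() alone marks only later occurrences; combining it with
# fromLast = TRUE also marks the first occurrence of each repeated pair
df$duplicates <- duplicated(key) | duplicated(key, fromLast = TRUE)
df
```

Rows 1 and 3 share the key "cat|mouse", so both are flagged, while rows 2 and 4 are not.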
Removing duplicates
library(tidyverse)
df %>%
  # Keep only the first row for each order-independent pair of animals
  filter(!duplicated(map2_chr(animal_1, animal_2, ~ str_c(sort(c(.x, .y)), collapse = "|"))))
#>   animal_1 predation_type animal_2
#> 1      cat           eats    mouse
#> 2      dog           eats squirrel
#> 3 squirrel           eats     nuts