Select from data.frame based on another data.frame in R

Question

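I have a very simple problem that I'm struggling to solve in R (I've found plenty of answers for other languages).

I have a data.frame with an ID field containing multiple IDs: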

> data_new <- data.frame(ID_ornitho = c("1344", "2364", "1111","2254"))
> data_new
  ID_ornitho
1       1344
2       2364
3       1111
4       2254

I have another data.frame containing the IDs that have already been used:

> data_old <- data.frame(ID_ornitho = c("2354", "2364", "2254","1354"))
> data_old
  ID_ornitho
1       2354
2       2364
3       2254
4       1354

What I want to do is remove the rows of data_new whose IDs have already been used in data_old, to get this:

> data_filtered
  ID_ornitho
1       1344
2       1111

It's so simple, yet I can't find a simple way to do it!

Answer 1

You can use dplyr to filter out the existing IDs:

library(dplyr)
data_old <- data.frame(ID_ornitho = c("2354", "2364", "2254","1354"))
data_new <- data.frame(ID_ornitho = c("1344", "2364", "1111","2254"))
data_new %>% filter(!(ID_ornitho %in% data_old$ID_ornitho))

This gives:

data_new %>% filter(!(ID_ornitho %in% data_old$ID_ornitho))
  ID_ornitho
1       1344
2       1111
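
To see why the filter works, it can help to look at the logical vector that %in% produces before it is negated. A minimal base R sketch using the data from the question; the intermediate name already_used is chosen here purely for illustration, and stringsAsFactors = FALSE is set explicitly so the IDs are character vectors on older R versions too:

```r
data_old <- data.frame(ID_ornitho = c("2354", "2364", "2254", "1354"),
                       stringsAsFactors = FALSE)
data_new <- data.frame(ID_ornitho = c("1344", "2364", "1111", "2254"),
                       stringsAsFactors = FALSE)

# TRUE where an ID from data_new also appears in data_old
already_used <- data_new$ID_ornitho %in% data_old$ID_ornitho
print(already_used)   # FALSE  TRUE FALSE  TRUE

# Negating the vector marks the IDs that are still free
print(data_new$ID_ornitho[!already_used])   # "1344" "1111"
```

filter() keeps exactly the rows where the negated vector is TRUE.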

Answer 2

Staying in base R, you can subset data_new with a logical vector, like so:

data.frame(ID_ornitho=
               data_new[!data_new$ID_ornitho %in% data_old$ID_ornitho, ])

See ?match for details and more examples.
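
As a possible variation on the subsetting above, passing drop = FALSE keeps the result a data.frame even though there is only one column, so the outer data.frame() call is not needed. The sketch below also resets the row names and shows the equivalent test written with match(); the name data_filtered is taken from the question, everything else is base R:

```r
data_old <- data.frame(ID_ornitho = c("2354", "2364", "2254", "1354"),
                       stringsAsFactors = FALSE)
data_new <- data.frame(ID_ornitho = c("1344", "2364", "1111", "2254"),
                       stringsAsFactors = FALSE)

# drop = FALSE prevents the single column from collapsing to a vector
data_filtered <- data_new[!data_new$ID_ornitho %in% data_old$ID_ornitho, ,
                          drop = FALSE]

# Subsetting keeps the original row names (1 and 3); reset them to 1, 2
rownames(data_filtered) <- NULL
print(data_filtered)

# The same selection expressed with match(): NA means "no match in data_old"
unused <- is.na(match(data_new$ID_ornitho, data_old$ID_ornitho))
print(unused)   # TRUE FALSE  TRUE FALSE
```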

Answer 3

A perfect use case for anti_join from the dplyr package:

library(dplyr)
anti_join(data_new, data_old, by = "ID_ornitho")

  ID_ornitho
1       1344
2       1111
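
One reason anti_join can be convenient is that it returns whole rows of the first data.frame, not just the key column. A small self-contained sketch, assuming dplyr is installed; the species column and its values are hypothetical and were added only to show that the other columns are carried along:

```r
library(dplyr)

data_old <- data.frame(ID_ornitho = c("2354", "2364", "2254", "1354"),
                       stringsAsFactors = FALSE)
# 'species' is a made-up extra column, not part of the original question
data_new <- data.frame(ID_ornitho = c("1344", "2364", "1111", "2254"),
                       species    = c("robin", "wren", "finch", "crow"),
                       stringsAsFactors = FALSE)

# Keep the rows of data_new that have no matching ID_ornitho in data_old
data_filtered <- anti_join(data_new, data_old, by = "ID_ornitho")
print(data_filtered)
```

The result keeps both columns of data_new and only the rows for IDs 1344 and 1111.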

Answer 4

You can easily do this with library(dplyr):

anti_join(data_new, data_old, by = "ID_ornitho", copy = FALSE)
