Removing rows when some values match and some do not

Question

ID Amount Previous 
1  10     15
1  10     13
2  20     18
2  20     24
3  5      7
3  5      6

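I would like to remove duplicate rows from the data frame above, where ID and Amount match but the values in the Previous column do not. When deciding which row to keep, I want the one with the higher value in Previous.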

The result would look like this:

ID Amount Previous 
1  10     15
2  20     24
3  5      7

Answer 1

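One option is distinct on the 'ID' and 'Amount' columns, applied after arranging the dataset so that the highest Previous comes first within each ID/Amount pair. Specifying .keep_all = TRUE keeps the remaining columns from the first row of each distinct combination.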

library(dplyr)
df1 %>% 
    arrange(ID, Amount, desc(Previous)) %>%
    distinct(ID, Amount, .keep_all = TRUE)
#   ID Amount Previous
#1  1     10       15
#2  2     20       24
#3  3      5        7
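
A variant of the same idea, as a minimal sketch: group by 'ID' and 'Amount' and keep the row with the largest Previous in each group. This assumes dplyr 1.0.0 or later, where slice_max is available; the name res is only illustrative. with_ties = FALSE keeps a single row per group even if the maximum is tied.

```r
library(dplyr)

# Sample data copied from the question
df1 <- data.frame(
  ID       = c(1L, 1L, 2L, 2L, 3L, 3L),
  Amount   = c(10L, 10L, 20L, 20L, 5L, 5L),
  Previous = c(15L, 13L, 18L, 24L, 7L, 6L)
)

# For each ID/Amount pair, keep only the row with the highest Previous
res <- df1 %>%
  group_by(ID, Amount) %>%
  slice_max(Previous, n = 1, with_ties = FALSE) %>%
  ungroup()

print(res)
```

Compared with arrange plus distinct, this states the "keep the maximum" rule directly, at the cost of returning a tibble rather than a plain data frame.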

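Alternatively, order the rows the same way, then apply duplicated from base R to 'ID' and 'Amount' to create a logical vector and use it to subset the rows of the dataset.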

df2 <- df1[with(df1, order(ID, Amount, -Previous)),]
df2[!duplicated(df2[c('ID', 'Amount')]),]
#  ID Amount Previous
#1  1     10       15
#4  2     20       24
#5  3      5        7
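
Another base R possibility, sketched here under the assumption that ties should all be kept: compute the per-group maximum of Previous with ave and keep the rows that equal it. The names key and res are illustrative only. Unlike the duplicated approach, this needs no sorting, but it returns more than one row for a group whose maximum is tied.

```r
# Sample data copied from the question
df1 <- data.frame(
  ID       = c(1L, 1L, 2L, 2L, 3L, 3L),
  Amount   = c(10L, 10L, 20L, 20L, 5L, 5L),
  Previous = c(15L, 13L, 18L, 24L, 7L, 6L)
)

# Build one grouping key per ID/Amount pair
key <- paste(df1$ID, df1$Amount, sep = "_")

# ave() returns, for every row, the maximum Previous within its group;
# keep the rows whose Previous equals that maximum
res <- df1[df1$Previous == ave(df1$Previous, key, FUN = max), ]

print(res)
```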

Data

df1 <- structure(list(ID = c(1L, 1L, 2L, 2L, 3L, 3L), Amount = c(10L, 
10L, 20L, 20L, 5L, 5L), Previous = c(15L, 13L, 18L, 24L, 7L, 
6L)), class = "data.frame", row.names = c(NA, -6L))
