How to find the distinct of one column based on other columns

Question

I have a data frame like the one below:

col1    col2    col3
A       Z       10
A       Y       8
A       Z       15
B       X       11
B       Z       7
C       Y       10
D       Z       11
D       Y       14
D       L       16

For each distinct col1, I need to select the col2 whose row has max(col3).

The output data frame should look like this:

col1    col2    col3
A       Z       15
B       X       11
C       Y       10
D       L       16

How can I do this in R or in SQL?

Thanks in advance.

Answer 1

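We can use data.table. We convert the 'data.frame' to a 'data.table' (setDT(df1)) and, grouped by 'col1', subset the data.table (.SD) at the index of the maximum value of 'col3'. Because which.max returns the index of the first maximum, each group yields exactly one row, even when there is a tie.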

library(data.table)
setDT(df1)[, .SD[which.max(col3)], col1]
#     col1 col2 col3
#1:    A    Z   15
#2:    B    X   11
#3:    C    Y   10
#4:    D    L   16

Or we can use top_n from dplyr after grouping by 'col1'.

library(dplyr)
df1 %>%
      group_by(col1) %>%
      top_n(1, col3)  # name the ordering column; without it top_n() falls back to the last column

Answer 2

SQL answer:

Use NOT EXISTS to return a row only if no other row with the same col1 value has a higher col3 value.

select *
from tablename t1
where not exists (select 1 from tablename t2
                  where t2.col1 = t1.col1
                    and t2.col3 > t1.col3)

If there is a tie for max(col3), both rows will be returned for that col1.
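To make the tie behaviour concrete, here is a minimal sketch that runs the NOT EXISTS query against the sample data from the question. Python's built-in sqlite3 module is used only as a convenient harness (an assumption; the original answer does not name a database), the ORDER BY is added so the output order is predictable, and the extra ("C", "W", 10) row is a hypothetical addition that creates a tie.

```python
import sqlite3

# Sample rows from the question: (col1, col2, col3).
ROWS = [
    ("A", "Z", 10), ("A", "Y", 8), ("A", "Z", 15),
    ("B", "X", 11), ("B", "Z", 7),
    ("C", "Y", 10),
    ("D", "Z", 11), ("D", "Y", 14), ("D", "L", 16),
]

# The query from the answer; ORDER BY is added here only for stable output.
NOT_EXISTS_QUERY = """
    SELECT t1.col1, t1.col2, t1.col3
    FROM tablename t1
    WHERE NOT EXISTS (SELECT 1 FROM tablename t2
                      WHERE t2.col1 = t1.col1
                        AND t2.col3 > t1.col3)
    ORDER BY t1.col1, t1.col2
"""


def groupwise_max(rows):
    """Load rows into an in-memory table and run the NOT EXISTS query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tablename (col1 TEXT, col2 TEXT, col3 INTEGER)"
        )
        conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", rows)
        return conn.execute(NOT_EXISTS_QUERY).fetchall()
    finally:
        conn.close()


# One row per col1 for the question's data.
result = groupwise_max(ROWS)

# Hypothetical extra row that ties the maximum for group C:
# both C rows now come back.
tied = groupwise_max(ROWS + [("C", "W", 10)])

for row in result:
    print(row)
```

If exactly one row per col1 is required even when values tie, the query needs an additional tie-breaking rule, which the original answer leaves open.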

Answer 3

Another way to do it in MySQL.

The query, also available as a SQLFiddle demo:

SELECT T1.*
FROM table_name T1
INNER JOIN
  (SELECT col1, MAX(col3) AS Max_col3
   FROM table_name
   GROUP BY col1) T2
  ON T1.`col1` = T2.`col1`
 AND T1.`col3` = T2.`Max_col3`

Hope this helps.
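As a rough illustration of how the join works, the sketch below first computes the per-group maximum and then joins it back to the table to recover the full rows. It runs on the question's sample data using Python's built-in sqlite3 module rather than MySQL (an assumption made purely so the example is self-contained); the backtick quoting is dropped and an ORDER BY is added for predictable output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (col1 TEXT, col2 TEXT, col3 INTEGER)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        ("A", "Z", 10), ("A", "Y", 8), ("A", "Z", 15),
        ("B", "X", 11), ("B", "Z", 7),
        ("C", "Y", 10),
        ("D", "Z", 11), ("D", "Y", 14), ("D", "L", 16),
    ],
)

# Step 1: the derived table T2, one row per col1 with that group's maximum.
group_max = conn.execute(
    "SELECT col1, MAX(col3) AS Max_col3 "
    "FROM table_name GROUP BY col1 ORDER BY col1"
).fetchall()

# Step 2: join T2 back on both col1 and the maximum to get the whole row,
# including col2, which GROUP BY alone cannot reliably return.
JOIN_QUERY = """
    SELECT T1.col1, T1.col2, T1.col3
    FROM table_name T1
    INNER JOIN
      (SELECT col1, MAX(col3) AS Max_col3
       FROM table_name
       GROUP BY col1) T2
      ON T1.col1 = T2.col1
     AND T1.col3 = T2.Max_col3
    ORDER BY T1.col1
"""
joined = conn.execute(JOIN_QUERY).fetchall()
conn.close()

for row in joined:
    print(row)
```

Like the NOT EXISTS version, this join returns every row that matches the group maximum, so tied rows would all appear.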


mysql

sql

r

groupwise-maximum