PostgreSQL remove duplicates by GROUP BY
I want to print each person's last message, so that every person appears only once, with their most recent message. I am using PostgreSQL 10.
+-----------+----------+--------------+
| name | body | created_at |
+-----------+----------+--------------+
| Maria | Test3 | 2017-07-07 |
| Paul | Test5 | 2017-06-01 |
+-----------+----------+--------------+
I tried the following SQL query. It returns the right rows, but unfortunately each person appears more than once.
SELECT * FROM messages
WHERE receive = 't'
GROUP BY name
ORDER BY MAX(created_at) DESC
+-----------+----------+--------------+
| name | body | created_at |
+-----------+----------+--------------+
| Maria | Test1 | 2016-06-01 |
| Maria | Test2 | 2016-11-01 |
| Maria | Test3 | 2017-07-07 |
| Paul | Test4 | 2017-01-01 |
| Paul | Test5 | 2017-06-01 |
+-----------+----------+--------------+
I tried to remove the duplicates with DISTINCT, but unfortunately I get this error message:
SELECT DISTINCT ON (name) * FROM messages
WHERE receive = 't'
GROUP BY name
ORDER BY MAX(created_at) DESC
ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: SELECT DISTINCT ON (name) * FROM messages
                            ^
Do you have any idea how to solve this?
You would use DISTINCT ON as follows:
SELECT DISTINCT ON (name) *
FROM messages
WHERE receive = 't'
ORDER BY name, created_at DESC
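That is:
- no GROUP BY clause is needed
- the column(s) listed in DISTINCT ON(...) must appear first in the ORDER BY clause
- ...followed by the column that decides which row of each group is kept (here, created_at)
Note that the result of a DISTINCT ON query is always sorted by the columns listed in the DISTINCT ON clause first (since that ordering is what determines which rows are kept).
If you want more control over the sort order, you can use a window function instead: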
SELECT *
FROM (
SELECT m.*, ROW_NUMBER() OVER(PARTITION BY name ORDER BY created_at DESC) rn
FROM messages m
WHERE receive = 't'
) t
WHERE rn = 1
ORDER BY created_at DESC
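If DISTINCT ON is preferred over the window function, one way around the ordering restriction is to keep DISTINCT ON in a subquery and sort in the outer query. Below is a minimal sketch using the sample rows from the question. The inline CTE, the column types, and the `latest` alias are assumptions, since the question does not show the table definition.

```sql
-- Sample rows from the question, inlined as a CTE so the query runs on its own.
-- Column types (text, date, boolean) are assumed, not taken from the question.
WITH messages (name, body, created_at, receive) AS (
    VALUES
        ('Maria', 'Test1', DATE '2016-06-01', true),
        ('Maria', 'Test2', DATE '2016-11-01', true),
        ('Maria', 'Test3', DATE '2017-07-07', true),
        ('Paul',  'Test4', DATE '2017-01-01', true),
        ('Paul',  'Test5', DATE '2017-06-01', true)
)
SELECT *
FROM (
    -- Inner query: one row per name, the newest one.
    SELECT DISTINCT ON (name) *
    FROM messages
    WHERE receive = 't'
    ORDER BY name, created_at DESC
) latest
-- Outer query: free to sort the surviving rows in any order.
ORDER BY created_at DESC;
```

With the sample data this should return Maria's Test3 row first, followed by Paul's Test5 row.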
Use DISTINCT ON, but with the correct ORDER BY:
SELECT DISTINCT ON (name) m.*
FROM messages m
WHERE receive = 't'
ORDER BY name, created_at DESC;
In general, you would not use DISTINCT ON together with GROUP BY. It is used with ORDER BY. It works by selecting the first row for each name according to the ORDER BY clause.
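Because DISTINCT ON keeps whichever row sorts first, two messages from the same name with an identical created_at leave the choice unpredictable unless the ORDER BY continues with a tiebreaker. A small sketch of that idea follows. The 'Test3b' row is hypothetical, added only to create a tie, and body is used as the tiebreaker purely for illustration (a unique id column, if the table has one, would likely be a better choice).

```sql
-- Inline sample data; 'Test3b' is a made-up row that ties with 'Test3'.
WITH messages (name, body, created_at, receive) AS (
    VALUES
        ('Maria', 'Test3',  DATE '2017-07-07', true),
        ('Maria', 'Test3b', DATE '2017-07-07', true),
        ('Paul',  'Test5',  DATE '2017-06-01', true)
)
SELECT DISTINCT ON (name) m.*
FROM messages m
WHERE receive = 't'
-- The third sort key only matters when created_at ties within a name;
-- it makes the kept row deterministic.
ORDER BY name, created_at DESC, body DESC;
```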
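You should not think of what you are doing as aggregation. You want to filter based on created_at. In many databases, you would express this with a correlated subquery: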
select m.*
from messages m
where m.created_at = (select max(m2.created_at)
from messages m2
where m2.name = m.name and m2.receive = 't'
) and
m.receive = 't'; -- this condition is probably not needed
SELECT *
FROM messages
WHERE receive = 't' and not exists (
select 1
from messages m
where m.receive = messages.receive and messages.name = m.name and m.created_at > messages.created_at
)
ORDER BY created_at DESC
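The query above finds the messages for which:
- receive is 't'
- no other message exists that
  - has the same receive value
  - has the same name
  - and is newer
Assuming no two messages with the same name share the same created_at, this should be enough. Another point: names may look identical yet differ if the values contain whitespace characters. So if the results of the query above show two records with seemingly the same name but different created_at values, it is very likely that whitespace is playing tricks on you.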
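One way to check the whitespace suspicion is to look for names that change when trimmed. The sketch below uses inline sample data in which the 'Maria ' row with a trailing space is hypothetical. Note that `btrim` with a single argument strips only spaces, so tabs or other invisible characters would need an explicit character list.

```sql
-- Inline sample data; the trailing space in 'Maria ' is a made-up example.
WITH messages (name, body, created_at, receive) AS (
    VALUES
        ('Maria',  'Test2', DATE '2016-11-01', true),
        ('Maria ', 'Test3', DATE '2017-07-07', true),
        ('Paul',   'Test5', DATE '2017-06-01', true)
)
SELECT name,
       length(name) AS name_length,   -- longer than it looks when padded
       btrim(name)  AS trimmed_name
FROM messages
-- Only rows whose name has leading or trailing spaces survive this filter.
WHERE name <> btrim(name);
```

If such rows turn up, comparing `btrim(m.name)` with `btrim(messages.name)` in the NOT EXISTS condition (or cleaning the stored values) should make the two spellings count as the same person.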