Postgres, get unique records per day from date range select

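I need to generate a report of logged-in users for a date range, without repeating a user within the same day (if someone logs in twice on the same day, they should be listed only once). Unfortunately, we keep the login information as JSON (yes, I can't change it to a separate table, and I don't know who designed this database). The query to see all logged-in users: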

select a.id, username, email, ah.modified as login_date
from accounts a join
     account_history ah
     on modified_acc_id = a.id
 where ah.data::jsonb->>'message' = 'Logon';

modified is a timestamp with time zone and is used as the login date.

I have only found examples that count distinct IDs per day, but I don't know how to modify them to return the distinct rows per day.

Sample data:

 id  | username | email               | login_date
-----+----------+---------------------+----------------------------
 102 | example  | example@example.com | 2018-12-06 09:30:10.573+00
 102 | example  | example@example.com | 2018-12-06 09:32:34.235+00
  42 | rafal    | rafal@example.com   | 2018-12-06 09:45:24.884+00
 576 | john     | john@example.com    | 2018-12-06 09:35:24.922+00
 576 | john     | john@example.com    | 2018-12-07 09:58:04.253+00

Desired data:

 id  | username | email               | login_date
-----+----------+---------------------+----------------------------
 102 | example  | example@example.com | 2018-12-06 09:30:10.573+00
  42 | rafal    | rafal@example.com   | 2018-12-06 09:45:24.884+00
 576 | john     | john@example.com    | 2018-12-06 09:35:24.922+00
 576 | john     | john@example.com    | 2018-12-07 09:58:04.253+00

As you can see, the second row (the repeated login of user 102 on 2018-12-06) is gone.

You seem to want the number of user-days within a period. If I understand correctly:

select count(*) as num_user_days_in_range
from (select a.username, date_trunc('day', ah.modified) as login_date
      from accounts a join
           account_history ah
           on modified_acc_id = a.id
      where ah.data::jsonb->>'message' = 'Logon'
      group by a.username, login_date
     ) u
where login_date >= $date1 and login_date < $date2

Use the window function row_number():

select id, username, email, login_date
from (
    select a.id, username, email, ah.modified as login_date,
           -- number each user's logins within a calendar day, earliest first;
           -- without the date in the partition only one row per user survives
           row_number() over (partition by a.id, ah.modified::date
                              order by ah.modified) as rn
    from accounts a join
         account_history ah
         on modified_acc_id = a.id
    where ah.data::jsonb->>'message' = 'Logon'
) t
where t.rn = 1
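
To see how the numbering behaves, here is a minimal sketch that runs row_number() over the question's sample rows, inlined as a VALUES list. The CTE name logins is made up for illustration and stands in for the join of accounts and account_history. For a once-per-day result the partition has to include the calendar day as well as the user; the cast to date uses the session time zone, which is assumed to be UTC here.

```sql
-- Hypothetical stand-in for the joined accounts/account_history rows.
WITH logins (id, username, email, login_date) AS (
    VALUES
        (102, 'example', 'example@example.com', timestamptz '2018-12-06 09:30:10.573+00'),
        (102, 'example', 'example@example.com', timestamptz '2018-12-06 09:32:34.235+00'),
        (42,  'rafal',   'rafal@example.com',   timestamptz '2018-12-06 09:45:24.884+00'),
        (576, 'john',    'john@example.com',    timestamptz '2018-12-06 09:35:24.922+00'),
        (576, 'john',    'john@example.com',    timestamptz '2018-12-07 09:58:04.253+00')
)
SELECT id, username, email, login_date,
       -- Numbering restarts for every (user, calendar day) pair.
       row_number() OVER (
           PARTITION BY id, login_date::date
           ORDER BY login_date
       ) AS rn
FROM logins
ORDER BY id, login_date;
```

Under that assumption, the second login of user 102 on 2018-12-06 is the only row that gets rn = 2, so it is the only row an outer rn = 1 filter would drop.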

DISTINCT ON gives you exactly the first row of each ordered group. In your example, the group is the id and the date part of the login_date timestamp:

SELECT DISTINCT ON (id, login_date::date)
    *
FROM (
    -- <your query>
) s
ORDER BY id, login_date::date, login_date

Explanation of the ORDER BY clause:

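You have to order by the DISTINCT ON columns first. But in your case you don't want to order by the date alone; the time part matters as well, because it decides which login of the day comes first. So after ordering by id and date (which is required because they are your DISTINCT ON columns), you also have to order by the full timestamp.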
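
As a rough illustration of that last point, the sketch below runs DISTINCT ON over a few of the question's sample rows, inlined as a VALUES list (the CTE name logins is hypothetical). It sorts the trailing timestamp in descending order, which keeps the latest login of each day instead of the earliest; dropping DESC gives the behaviour described above.

```sql
-- Hypothetical stand-in for the joined rows; only id and timestamp are needed here.
WITH logins (id, login_date) AS (
    VALUES
        (102, timestamptz '2018-12-06 09:30:10.573+00'),
        (102, timestamptz '2018-12-06 09:32:34.235+00'),
        (576, timestamptz '2018-12-06 09:35:24.922+00'),
        (576, timestamptz '2018-12-07 09:58:04.253+00')
)
SELECT DISTINCT ON (id, login_date::date)
       id, login_date
FROM logins
-- The first two ORDER BY expressions match the DISTINCT ON list;
-- the third decides which row of each (id, day) group is kept.
ORDER BY id, login_date::date, login_date DESC;
```

With the session time zone assumed to be UTC, user 102 ends up with the 09:32:34.235 row rather than the 09:30:10.573 one.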


So the whole query can be simplified to (without a subquery):

SELECT DISTINCT ON (a.id, ah.modified::date) 
    a.id, 
    username, 
    email, 
    ah.modified as login_date
FROM accounts a 
JOIN account_history ah
    ON modified_acc_id = a.id
WHERE ah.data::jsonb->>'message' = 'Logon'
ORDER BY a.id, ah.modified::date, ah.modified 
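
The question also asks for a date range, and modified is a timestamp with time zone, so the plain cast to date follows the session's time zone setting. The following is only a sketch of how both could be handled, reusing the tables and columns from the question: the range bounds are made-up example values, and pinning the day boundary to UTC is an assumption, not something stated in the question.

```sql
SELECT DISTINCT ON (a.id, (ah.modified AT TIME ZONE 'UTC')::date)
    a.id,
    username,
    email,
    ah.modified AS login_date
FROM accounts a
JOIN account_history ah
    ON modified_acc_id = a.id
WHERE ah.data::jsonb->>'message' = 'Logon'
  -- Hypothetical half-open range: 2018-12-06 inclusive to 2018-12-08 exclusive.
  AND ah.modified >= timestamptz '2018-12-06 00:00:00+00'
  AND ah.modified <  timestamptz '2018-12-08 00:00:00+00'
-- AT TIME ZONE 'UTC' makes the "day" independent of the session time zone (assumption).
ORDER BY a.id, (ah.modified AT TIME ZONE 'UTC')::date, ah.modified;
```

A half-open range (>= start, < end) avoids having to guess the last representable timestamp of the final day.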

It appears that when there are duplicates, you take the earliest login of the day. If so, would this work?

select
  a.id, username, email, min (ah.modified) as login_date
from accounts a join
     account_history ah
     on modified_acc_id = a.id
 where ah.data::jsonb->>'message' = 'Logon'
group by a.id, username, email, ah.modified::date