Get the count of distinct userids for last couple of days

Question

Assume the following table holds the last 7 days of data:

Userid   Download time
Rab01    2020-04-29 03:28
Klm01    2020-04-29 04:01
Klm01    2020-04-30 05:10
Rab01    2020-04-29 12:14
Osa_3    2020-04-25 09:01

Here is the desired output:

Count  Download_time
1      2020-04-25
2      2020-04-29
1      2020-04-30

Answer 1

You can use the date_trunc function to keep only the date part of the timestamp and then group by it.

The query could look like this:

SELECT 
    count(distinct Userid) as Count, -- count unique users
    to_char(date_trunc('day', Download_time), 'YYYY-MM-DD') AS Download_Day -- truncate the timestamp to its day
FROM tbl -- "table" is a reserved word, so use your real table name
WHERE DATE_PART('day', NOW() - Download_time) < 7 -- last 7 days
GROUP BY Download_Day; -- group by day
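To see what the grouping does with the sample rows from the question, here is a minimal Python sketch of the same logic: truncate each timestamp to its day, collect user IDs per day in a set, and count them. It assumes the timestamps are plain strings in the "YYYY-MM-DD HH:MM" format shown in the question; the function name distinct_users_per_day is hypothetical and only illustrates the semantics of count(DISTINCT ...) with GROUP BY.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows copied from the question: (userid, download_time)
ROWS = [
    ("Rab01", "2020-04-29 03:28"),
    ("Klm01", "2020-04-29 04:01"),
    ("Klm01", "2020-04-30 05:10"),
    ("Rab01", "2020-04-29 12:14"),
    ("Osa_3", "2020-04-25 09:01"),
]


def distinct_users_per_day(rows):
    """Mimic count(DISTINCT userid) ... GROUP BY <day of download_time>."""
    users_by_day = defaultdict(set)
    for userid, ts in rows:
        # Truncate the timestamp to its calendar day, like date_trunc('day', ...)
        day = datetime.strptime(ts, "%Y-%m-%d %H:%M").date()
        users_by_day[day].add(userid)  # a set keeps each user only once per day
    # Return (count, day) pairs sorted by day, in the shape of the desired output
    return [(len(users), day.isoformat()) for day, users in sorted(users_by_day.items())]


for count, day in distinct_users_per_day(ROWS):
    print(count, day)
```

Rab01 downloads twice on 2020-04-29 but is counted once, which is why that day shows 2 rather than 3.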


Answer 2

Tested with Postgres. You also tagged Redshift, which forked from Postgres 8.2 a long time ago. There may be differences.

Since you seem to be happy with the standard ISO format, a simple cast to date is the most efficient:

SELECT count(DISTINCT userid) AS "Count"
     , download_time::date AS "Download_Day"
FROM   tbl
WHERE  download_time >= CURRENT_DATE - 7
AND    download_time <  CURRENT_DATE
GROUP  BY 2;


CURRENT_DATE is standard SQL and works in both Postgres and Redshift. Related:

How do I determine the last day of the previous month using PostgreSQL?

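About "last 7 days": I took the last 7 whole days (excluding today, which is necessarily incomplete), with syntax that can use a plain index on (download_time). Related: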
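The WHERE clause in the second query is a half-open range: it includes midnight at the start of CURRENT_DATE - 7 and excludes everything from midnight of today onward. A minimal Python sketch of that boundary check follows, assuming a hypothetical "today" of 2020-04-30 and naive (time-zone-less) timestamps; the helper name in_last_7_whole_days is made up for illustration.

```python
from datetime import date, datetime, timedelta


def in_last_7_whole_days(ts: datetime, today: date) -> bool:
    """True if ts falls in [today - 7 days, today): the 7 complete days before today.

    Mirrors: download_time >= CURRENT_DATE - 7 AND download_time < CURRENT_DATE
    """
    start = datetime.combine(today - timedelta(days=7), datetime.min.time())
    end = datetime.combine(today, datetime.min.time())
    return start <= ts < end


today = date(2020, 4, 30)  # hypothetical value of CURRENT_DATE
print(in_last_7_whole_days(datetime(2020, 4, 23, 0, 0), today))    # True: first included day
print(in_last_7_whole_days(datetime(2020, 4, 29, 23, 59), today))  # True: last included day
print(in_last_7_whole_days(datetime(2020, 4, 30, 5, 10), today))   # False: today is excluded
print(in_last_7_whole_days(datetime(2020, 4, 22, 23, 59), today))  # False: too old
```

Comparing the raw column against two constant bounds, rather than wrapping download_time in a function, is what lets a plain index on that column be used.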

Get dates of a day of week in a date range

Ideally, you have a composite index on (download_time, userid) (and meet a few prerequisites) to get very fast index-only scans. See:

Is a composite index also good for queries on the first field?

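count(DISTINCT ...) is typically slow. For big tables with many duplicates, there are faster techniques. If you need to optimize performance, disclose your exact setup and cardinalities.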

If the actual data type is timestamptz rather than just timestamp, you also need to define the time zone that determines the day boundaries. See:
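Why the time zone matters: a timestamptz is an absolute instant, and which calendar day it belongs to depends on where you look at it from (in PostgreSQL that conversion is typically done with AT TIME ZONE). The Python sketch below assumes a hypothetical fixed offset of UTC-04:00 and takes the question's first row as if it were stored in UTC; it only illustrates the day-boundary shift, not any specific database setting.

```python
from datetime import datetime, timedelta, timezone

# A timestamptz value is an absolute instant; here it is expressed in UTC.
instant = datetime(2020, 4, 29, 3, 28, tzinfo=timezone.utc)

# Hypothetical fixed offset of UTC-04:00, used only for illustration.
utc_minus_4 = timezone(timedelta(hours=-4))

# The same instant falls on different calendar days in different time zones.
print(instant.astimezone(timezone.utc).date())  # 2020-04-29
print(instant.astimezone(utc_minus_4).date())   # 2020-04-28
```

So the same row could be counted under 2020-04-29 or 2020-04-28, depending on the time zone used to derive the date.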

Ignoring time zones altogether in Rails and PostgreSQL

About the optional short syntax GROUP BY 2:

Select first row in each GROUP BY group?

About the case of identifiers:

Are PostgreSQL column names case-sensitive?

Tags: sql, postgresql, distinct, aggregate-functions, amazon-redshift