Redshift query to count metrics by 10 minute windows

Question

Regarding the PostgreSQL tag: as you may know, Redshift is based on PostgreSQL.

Amazon Redshift is based on PostgreSQL 8.0.2. Amazon Redshift and PostgreSQL have a number of very important differences that you must be aware of as you design and develop your data warehouse applications.

I have a table that was created like this:

create table purchase (
  user_id int,
  item_id int,
  t timestamp
)
diststyle even
interleaved sortkey(user_id, item_id, t);

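I want to run a query that tells me the 3 most active users (the users with the most purchases) in each 10-minute window, as well as the 3 most purchased items in each 10-minute window.

So the result should look like this: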

+-item_id-|-user_id-|-window-+
| aaa     | xxx     | 0      |
+---------+---------+--------+
| bbb     | yyy     | 0      |
+---------+---------+--------+
| ccc     | zzz     | 0      |
+---------+---------+--------+
| ...     | ...     | 1      |
+---------+---------+--------+
| ...     | ...     | 1      |
+---------+---------+--------+
| ...     | ...     | 1      |
..............................
| ...     | ...     | 5      |
+---------+---------+--------+
| ...     | ...     | 5      |
+---------+---------+--------+
| ...     | ...     | 5      |
+---------+---------+--------+

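Here aaa is the most purchased item in the first 10-minute window, bbb is the second most purchased item in the first 10-minute window, and so on; xxx is the user with the most purchases in the first 10-minute window, yyy is the user with the second most purchases in the first 10-minute window, and so on. There are six 10-minute windows because I will be running this over a one-hour date range.

I'm new to Redshift, so unfortunately I don't have any existing SQL to show what I've tried.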

Answer 1

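My requirements changed slightly, but I was able to write a query that satisfies the new ones. The new requirement is simply to count all distinct item_ids and user_ids in each window: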

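-- Count distinct items and users per 10-minute window.
-- The FROM clause was missing; the purchase table from the question is assumed.
select
  count(distinct item_id) as item_id_count,
  count(distinct user_id) as user_id_count,
  substring(t, 0, 16) as window  -- start 0 with length 16 keeps the first 15 characters: yyyy-MM-dd hh:m
from purchase
group by window
order by window asc;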

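I'm not sure whether others will have the same date format, but mine is yyyy-MM-dd hh:mm:ss. To group by 10 minutes I only need the yyyy-MM-dd hh:m part of the string, that is, everything up to and including the tens digit of the minute, and then I group by that substring.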
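If t is stored as a real timestamp, as in the table definition from the question, rather than as a string, the same bucketing can probably be done with date functions instead of string slicing. The following is only a sketch under that assumption: the alias window_start is a name made up for illustration, and the query has not been run against the asker's data.

```sql
-- Sketch: bucket a TIMESTAMP column into 10-minute windows without substring().
-- Assumes the purchase table from the question, with t stored as a timestamp.
select
  dateadd(
    minute,
    (extract(minute from t)::int / 10) * 10,  -- integer division: 0, 10, 20, 30, 40 or 50
    date_trunc('hour', t)                     -- start of the hour containing t
  ) as window_start,                          -- hypothetical alias for the window's start time
  count(distinct item_id) as item_id_count,
  count(distinct user_id) as user_id_count
from purchase
group by 1
order by 1;
```

Each row then carries the start time of its 10-minute window as a timestamp, which sorts correctly and can be compared against a date range directly.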


postgresql

amazon-redshift