SQL: Selecting distinct users and grouping them based on a condition
I have a database table (Activity_date values are in DD-MM-YYYY format):
User_id | User Name | Join_date | Activity_date |
1 | abc | 01/01/2021 | 02-01-2021 |
2 | jay | 01/01/2021 | 03-01-2021 |
2 | jay | 01/01/2021 | 04-01-2021 |
1 | abc | 01/01/2021 | 09-01-2021 |
1 | abc | 01/01/2021 | 16-01-2021 |
2 | jay | 01/01/2021 | 05-01-2021 |
3 | xyz | 03/03/2021 | 12-03-2021 |
3 | xyz | 03/03/2021 | 30-03-2021 |
2 | jay | 01/01/2021 | 26-01-2021 |
I want to bucket users based on their Activity_date: for example, users who are active every day go into table1, users who are active weekly (6-7 days apart) go into table2, and all other users go into table3.
The output should look like this:
temporary_table1 (users who are active every day, i.e. on consecutive days):
User_id | User Name | Join_date | Activity_date |
2 | jay | 01/01/2021 | 03-01-2021 |
2 | jay | 01/01/2021 | 04-01-2021 |
2 | jay | 01/01/2021 | 05-01-2021 |
temporary_table_2 (users who are active every 6-7 days, counting from their first Activity_date):
User_id | User Name | Join_date | Activity_date |
1 | abc | 01/01/2021 | 02-01-2021 |
1 | abc | 01/01/2021 | 09-01-2021 |
1 | abc | 01/01/2021 | 16-01-2021 |
temporary_table_3 (user_id=2 appears here because more than 7 days passed between this user's previous activity and their most recent one):
User_id | User Name | Join_date | Activity_date |
3 | xyz | 03/03/2021 | 12-03-2021 |
3 | xyz | 03/03/2021 | 30-03-2021 |
2 | jay | 01/01/2021 | 26-01-2021 |
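Read row by row, the expected output suggests this rule: each activity row is bucketed by the number of days since the same user's previous activity, and a user's first row (which has no previous activity) takes the gap to the next one instead. The Python sketch below encodes that reading on the sample data. The first-row fallback and the exact thresholds (at most 1 day for table1, 6-7 days for table2, anything else for table3) are assumptions inferred from the sample, and the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime

# Sample rows from the question: (user_id, user_name, join_date, activity_date).
# Activity_date is a DD-MM-YYYY string, as in the question.
ROWS = [
    (1, "abc", "01/01/2021", "02-01-2021"),
    (2, "jay", "01/01/2021", "03-01-2021"),
    (2, "jay", "01/01/2021", "04-01-2021"),
    (1, "abc", "01/01/2021", "09-01-2021"),
    (1, "abc", "01/01/2021", "16-01-2021"),
    (2, "jay", "01/01/2021", "05-01-2021"),
    (3, "xyz", "03/03/2021", "12-03-2021"),
    (3, "xyz", "03/03/2021", "30-03-2021"),
    (2, "jay", "01/01/2021", "26-01-2021"),
]


def parse_day(text: str) -> date:
    """Parse a DD-MM-YYYY activity date."""
    return datetime.strptime(text, "%d-%m-%Y").date()


def bucket_for_gap(gap):
    """Map a gap in days to a bucket name (thresholds are an assumption)."""
    if gap is None:      # user with a single activity: no gap to judge
        return "table3"
    if gap <= 1:         # active on consecutive days
        return "table1"
    if 6 <= gap <= 7:    # active roughly weekly
        return "table2"
    return "table3"      # everything else


def bucket_rows(rows):
    """Assign each row by its gap to the same user's previous row;
    a user's first row uses the gap to the next row instead."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)

    buckets = defaultdict(list)
    for user_rows in by_user.values():
        user_rows.sort(key=lambda r: parse_day(r[3]))
        days = [parse_day(r[3]) for r in user_rows]
        for i, row in enumerate(user_rows):
            if i > 0:
                gap = (days[i] - days[i - 1]).days
            elif len(days) > 1:
                gap = (days[1] - days[0]).days
            else:
                gap = None
            buckets[bucket_for_gap(gap)].append(row)
    return buckets


if __name__ == "__main__":
    result = bucket_rows(ROWS)
    for name in ("table1", "table2", "table3"):
        print(name, result[name])
```

Under this reading, user 2 is split across table1 and table3, which matches the note on temporary_table_3.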
How can I achieve this in SQL (Redshift)?
You should be able to do something like this, and then group by date_diff:
SELECT
    user_id,
    user_name,
    join_date,
    activity_date,
    -- window function: the same user's previous activity date (NULL on their first row)
    LAG(activity_date) OVER (PARTITION BY user_id ORDER BY activity_date ASC) as day_before,
    -- difference in days between day_before and activity_date
    DATEDIFF(
        day,
        LAG(activity_date) OVER (PARTITION BY user_id ORDER BY activity_date ASC),
        activity_date
    ) as date_diff
FROM your_dataset;
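One way to turn date_diff into the three buckets is to also compute the gap to the next activity with LEAD, let each user's first row (where LAG is NULL) fall back to that next gap, and classify with a CASE expression. The sketch below runs such a query against an in-memory SQLite database through Python's standard sqlite3 module so it can be tried locally (SQLite 3.25 or newer is needed for window functions). It is not Redshift: SQLite has no DATEDIFF, so julianday() subtraction stands in for DATEDIFF(day, ...), and the dates are assumed to be stored as ISO YYYY-MM-DD strings. The thresholds (at most 1 day for table1, 6-7 days for table2) are an interpretation of the question's expected output, and BUCKET_QUERY and bucket_rows are hypothetical names.

```python
import sqlite3

# Sample data from the question, with dates rewritten as ISO YYYY-MM-DD
# strings so that SQLite's julianday() can parse them (an assumption).
ROWS = [
    (1, "abc", "2021-01-01", "2021-01-02"),
    (2, "jay", "2021-01-01", "2021-01-03"),
    (2, "jay", "2021-01-01", "2021-01-04"),
    (1, "abc", "2021-01-01", "2021-01-09"),
    (1, "abc", "2021-01-01", "2021-01-16"),
    (2, "jay", "2021-01-01", "2021-01-05"),
    (3, "xyz", "2021-03-03", "2021-03-12"),
    (3, "xyz", "2021-03-03", "2021-03-30"),
    (2, "jay", "2021-01-01", "2021-01-26"),
]

BUCKET_QUERY = """
WITH gaps AS (
    SELECT
        user_id, user_name, join_date, activity_date,
        -- days since this user's previous activity (NULL on the first row)
        CAST(julianday(activity_date) - julianday(LAG(activity_date)
             OVER (PARTITION BY user_id ORDER BY activity_date)) AS INTEGER) AS gap_prev,
        -- days until this user's next activity (NULL on the last row)
        CAST(julianday(LEAD(activity_date)
             OVER (PARTITION BY user_id ORDER BY activity_date)) - julianday(activity_date) AS INTEGER) AS gap_next
    FROM your_dataset
),
picked AS (
    -- first row of each user falls back to the gap to the next activity
    SELECT *, COALESCE(gap_prev, gap_next) AS gap FROM gaps
)
SELECT
    user_id, user_name, join_date, activity_date,
    CASE
        WHEN gap <= 1 THEN 'table1'
        WHEN gap BETWEEN 6 AND 7 THEN 'table2'
        ELSE 'table3'  -- includes users with a single activity (gap is NULL)
    END AS bucket
FROM picked
ORDER BY user_id, activity_date
"""


def bucket_rows(rows):
    """Load rows into an in-memory table and return them with a bucket column."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE your_dataset ("
            "user_id INTEGER, user_name TEXT, join_date TEXT, activity_date TEXT)"
        )
        conn.executemany("INSERT INTO your_dataset VALUES (?, ?, ?, ?)", rows)
        return conn.execute(BUCKET_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in bucket_rows(ROWS):
        print(row)
```

On Redshift, the same CASE logic should carry over with DATEDIFF(day, LAG(activity_date) OVER (PARTITION BY user_id ORDER BY activity_date), activity_date) in place of the julianday() arithmetic (wrapping the column in TO_DATE(activity_date, 'DD-MM-YYYY') first if it holds DD-MM-YYYY text). Each bucket could then be materialized with CREATE TEMP TABLE temporary_table1 AS a SELECT over this result filtered on bucket = 'table1'.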