How to combine Cross Join and String Agg in BigQuery with date time difference
I am trying to go from the following table
| user_id | touch      | Date       | Purchase Amount |
| 1       | Impression | 2020-09-12 | 0               |
| 1       | Impression | 2020-10-12 | 0               |
| 1       | Purchase   | 2020-10-13 | 125$            |
| 1       | Email      | 2020-10-14 | 0               |
| 1       | Impression | 2020-10-15 | 0               |
| 1       | Purchase   | 2020-10-30 | 122             |
| 2       | Impression | 2020-10-15 | 0               |
| 2       | Impression | 2020-10-16 | 0               |
| 2       | Email      | 2020-10-17 | 0               |
to
| user_id | path                           | Number of days between First Touch and Purchase | Purchase Amount |
| 1       | Impression,Impression,Purchase | 2020-10-13 (Purchase) - 2020-09-12 (Impression)  | 125$            |
| 1       | Email,Impression,Purchase      | 2020-10-30 (Purchase) - 2020-10-14 (Email)       | 122$            |
| 2       | Impression,Impression,Email    | 2020-12-31 (Fixed date) - 2020-10-15 (Impression) | 0$              |
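Essentially, I am trying to create a new row for each unique user in the table every time 'Purchase' is encountered in the comma-separated string.
I also want to compute the difference between the first touch and the first purchase for each unique user. When a new row is created, the same calculation is done again for that user, as shown in the example above.
From what I have gathered, I need a mix of a cross join and string aggregation. I tried using a case statement inside the string aggregation but could not get the desired result.
Is there a better way to do this in SQL (BigQuery)?
Thanks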
So you need a solution that starts a new row whenever a touch is a Purchase.
Use the following query:
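select user_id,
       string_agg(touch order by date) as path,  -- swap in whatever aggregation you need
       sum(purchase_amount) as purchase_amount
from
  (select t.*,
          -- count purchases strictly before this row, so each Purchase closes its own group
          countif(touch = 'Purchase') over (partition by user_id order by date
                                            rows between unbounded preceding and 1 preceding) as sm
   from t) t
group by user_id, sm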
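We can treat this as a gaps-and-islands problem, where each island ends with a purchase. How do we define the groups? By counting how many purchases lie ahead of each row (current row included), which is why the window in the query sorts in descending order.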
select user_id, string_agg(touch order by date) as path,
       min(date) as first_date, max(date) as max_date,
       date_diff(max(date), min(date), day) as cnt_days  -- BigQuery requires the date part
from (
    select t.*,
           countif(touch = 'Purchase') over(partition by user_id order by date desc) as grp
    from mytable t
) t
group by user_id, grp
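To see why counting purchases in descending date order yields one group per purchase, the window can be imitated outside BigQuery. The following is a minimal Python sketch, not part of the answer above; the helper name `purchases_ahead` is made up for illustration, and it assumes the default window frame, which includes the current row and any rows sharing its date.

```python
from datetime import date

# Sample rows for user 1 from the question: (touch, date).
rows = [
    ("Impression", date(2020, 9, 12)),
    ("Impression", date(2020, 10, 12)),
    ("Purchase",   date(2020, 10, 13)),
    ("Email",      date(2020, 10, 14)),
    ("Impression", date(2020, 10, 15)),
    ("Purchase",   date(2020, 10, 30)),
]

def purchases_ahead(rows):
    """For each row, count purchases dated on or after that row.

    Mirrors countif(touch = 'Purchase') over (order by date desc)
    for a single user (hypothetical helper, for illustration only).
    """
    return [
        sum(1 for touch, d in rows if touch == "Purchase" and d >= current)
        for _, current in rows
    ]

grp = purchases_ahead(rows)
print(grp)  # [2, 2, 2, 1, 1, 1]
```

Rows that share a count form one island, and the Purchase row is always the last row of its island. Touches after a user's final purchase would get a count of 0 and form their own trailing group.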
Below is for BigQuery Standard SQL
#standardSQL
select user_id,
string_agg(touch order by date) path,
date_diff(max(date), min(date), day) days,
sum(amount) amount
from (
select user_id, touch, date, amount,
countif(touch = 'Purchase') over win grp
from `project.dataset.table`
window win as (partition by user_id order by date rows between unbounded preceding and 1 preceding)
)
group by user_id, grp
If applied to the sample data in your question, the output is
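The grouping can also be cross-checked by hand outside BigQuery. The following is a minimal Python sketch of the same logic, not the query itself: each row's group is the number of purchases strictly before it, so every Purchase closes its group. The function name `build_paths` is made up for illustration, and amounts are assumed to be stored as plain numbers without the "$" suffix.

```python
from datetime import date
from itertools import groupby

# Sample data from the question as (user_id, touch, date, amount).
# Assumption: amounts are numeric (125 rather than "125$").
rows = [
    (1, "Impression", date(2020, 9, 12), 0),
    (1, "Impression", date(2020, 10, 12), 0),
    (1, "Purchase",   date(2020, 10, 13), 125),
    (1, "Email",      date(2020, 10, 14), 0),
    (1, "Impression", date(2020, 10, 15), 0),
    (1, "Purchase",   date(2020, 10, 30), 122),
    (2, "Impression", date(2020, 10, 15), 0),
    (2, "Impression", date(2020, 10, 16), 0),
    (2, "Email",      date(2020, 10, 17), 0),
]

def build_paths(rows):
    """Return (user_id, path, days, amount) per purchase-terminated group."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for user_id, user_rows in groupby(ordered, key=lambda r: r[0]):
        grp = 0       # purchases seen strictly before the current row
        groups = {}   # grp -> rows in that group, in date order
        for _, touch, d, amount in user_rows:
            groups.setdefault(grp, []).append((touch, d, amount))
            if touch == "Purchase":
                grp += 1  # rows after a purchase fall into the next group
        for members in groups.values():
            dates = [d for _, d, _ in members]
            result.append((
                user_id,
                ",".join(t for t, _, _ in members),
                (max(dates) - min(dates)).days,
                sum(a for _, _, a in members),
            ))
    return result

for row in build_paths(rows):
    print(row)
# (1, 'Impression,Impression,Purchase', 31, 125)
# (1, 'Email,Impression,Purchase', 16, 122)
# (2, 'Impression,Impression,Email', 2, 0)
```

Note that user 2, who never purchases, gets the span between first and last touch (2 days) here, not the fixed-date span shown in the question's expected table.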
One more change: if a path contains no Purchase, we calculate the number of days up to a fixed date we have set. How can I add this to the query above?
select user_id,
string_agg(touch order by date) path,
date_diff(if(countif(touch = 'Purchase') = 0, date '2020-12-31', max(date)), min(date), day) days,
sum(amount) amount
from (
select user_id, touch, date, amount,
countif(touch = 'Purchase') over win grp
from `project.dataset.table`
window win as (partition by user_id order by date rows between unbounded preceding and 1 preceding)
)
group by user_id, grp
with output
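The effect of the fixed-date fallback on the `days` column can be checked with a small calculation. This is a minimal Python sketch under the same assumptions as the query above (fixed date 2020-12-31); the names `FALLBACK_END` and `days_in_path` are made up for illustration.

```python
from datetime import date

FALLBACK_END = date(2020, 12, 31)  # the fixed date used in the query above

def days_in_path(touches, dates, fallback_end=FALLBACK_END):
    """Days from the first touch to the last touch of the path, or to
    fallback_end when the path contains no Purchase.

    Mirrors if(countif(touch = 'Purchase') = 0, fixed date, max(date)).
    """
    end = max(dates) if "Purchase" in touches else fallback_end
    return (end - min(dates)).days

# User 2 never purchases, so the fixed date is the end point.
print(days_in_path(
    ["Impression", "Impression", "Email"],
    [date(2020, 10, 15), date(2020, 10, 16), date(2020, 10, 17)],
))  # 77

# User 1's first path ends in a purchase, so the purchase date is used.
print(days_in_path(
    ["Impression", "Impression", "Purchase"],
    [date(2020, 9, 12), date(2020, 10, 12), date(2020, 10, 13)],
))  # 31
```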
You can create a value for each row that corresponds to the number of earlier rows where table.touch = 'Purchase', which can then be used for grouping:
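-- number each user's rows in date order so "earlier" is well defined
with r as (select row_number() over(partition by t1.user_id order by t1.date) rid, t1.* from `project.dataset.table` t1)
select t3.user_id, string_agg(t3.touch order by t3.date) path, sum(t3.amount) amount,
       date_diff(max(t3.date), min(t3.date), day) days
from (select
  -- c1 = number of this user's purchases strictly before the current row
  (select countif(r1.touch = 'Purchase' and r1.rid < r2.rid) from r r1 where r1.user_id = r2.user_id) c1, r2.* from r r2
) t3
group by t3.user_id, t3.c1;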