LIMIT / Filtering on LEFT JOIN
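I have two tables: one is a list of purchases with revenue, purchase_time, and a user ID; the other is a list of campaign clicks with campaign_id, user_id, and click_time. campaign_clicks basically logs every click from a campaign. There can be any number of clicks, or none, and they can occur at any time before or after a purchase. What I need to do is determine which campaign_id was the last campaign a given user clicked before purchasing, and how much total revenue is attributed to that campaign_id. I only want to attribute revenue to clicks that occurred within the 3 days before the purchase.
purchases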
date | user_id | revenue | purchase_time |
---|---|---|---|
2020/09/01 | 10 | 30.0 | 2020/09/01 10:10:00 am |
2020/09/01 | 20 | 15.0 | 2020/09/02 09:15:00 am |
2020/09/01 | 30 | 25.0 | 2020/09/02 08:15:00 am |
campaign_clicks
user_id | campaign_id | click_time |
---|---|---|
10 | 2 | 2020/09/01 10:01:00 am |
10 | 1 | 2020/09/01 10:05:00 am |
10 | 2 | 2020/09/01 10:20:00 am |
20 | 2 | 2020/09/01 10:10:00 am |
30 | 2 | 2020/09/01 07:30:00 am |
Desired result
date | campaign_id | revenue |
---|---|---|
2020/09/01 | 1 | 30.0 |
2020/09/01 | 2 | 25.0 |
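The purchase by user ID 20 should not be included because it happened before the click_time. User 10's revenue should be attributed to campaign 1, because that was the last click before the purchase.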
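My problem is that my join returns all of the clicks, which inflates the revenue. The SELECT inside the join has no knowledge of the purchase time, and I need some way to filter and narrow the clicks down to a single click, the last one. I have tried using ROW_NUMBER() to apply an index, but that doesn't let me filter out clicks that occurred after the purchase.
Here is where I am so far: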
SELECT
    date
    ,ROUND(SUM(revenue)) AS revenue
    ,clicks.campaign_id
FROM
    purchases
LEFT JOIN (
    SELECT
        campaign_id
        ,user_id
        ,click_time
    FROM
        campaign_clicks
    ORDER BY
        click_time DESC
) AS clicks ON clicks.user_id = purchases.user_id
WHERE
    -- only select campaign clicks that occurred before the purchase
    purchases.purchase_time > clicks.click_time
    -- only include clicks that occurred within 3 days of the purchase
    AND DATEDIFF(minutes, clicks.click_time, purchases.purchase_time) <= (60*24*3)
    -- PROBLEM HERE: several other clicks may still precede the purchase; I need to keep only the last one
GROUP BY
    date
    ,clicks.campaign_id
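You can achieve this with the query below. Basically, you perform an INNER JOIN and, in the ON clause itself, filter out clicks that fall more than 3 days before the purchase.
To restrict the result to the last clicked campaign, use the ROW_NUMBER function with the ordering set to click_time DESC. That way, the last click before the purchase gets sequence number 1. You can then wrap the result set in an outer query and filter out the rows whose row_num is greater than 1.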
-- Outer query keeps just the last click for any given purchase
SELECT * FROM (
    SELECT p.date, p.purchase_time, c.click_time, c.campaign_id, p.revenue,
        -- row number per purchase, clicks sorted newest first
        -- (assumes user_id + purchase_time identifies a purchase)
        ROW_NUMBER() OVER (PARTITION BY p.user_id, p.purchase_time ORDER BY c.click_time DESC) AS row_num
    FROM purchases p
    INNER JOIN campaign_clicks c
    ON (
        c.user_id = p.user_id
        -- only select clicks that occurred before the purchase
        AND c.click_time < p.purchase_time
        -- only select clicks within the 3 days before the purchase (mins * hours * days)
        AND TIMESTAMPDIFF(MINUTE, c.click_time, p.purchase_time) <= (60*24*3)
    )
) res WHERE res.row_num = 1
You can also view the results at the DB-Fiddle link.
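For anyone who wants to try the ROW_NUMBER approach end to end, here is a minimal sketch that also does the final SUM per date and campaign. It is not the answerer's exact query: it assumes SQLite (3.25 or newer, for window functions) driven from Python's sqlite3 as a stand-in database, ISO-8601 timestamp strings, and SQLite's datetime(..., '-3 days') in place of TIMESTAMPDIFF. The rows mirror the question's sample, except that user 20's click is moved to after the purchase (an assumption, made so the data matches the stated intent that this purchase is excluded). The function name attribute_revenue is hypothetical.

```python
import sqlite3

# In-memory stand-in for the two tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (date TEXT, user_id INTEGER, revenue REAL, purchase_time TEXT);
CREATE TABLE campaign_clicks (user_id INTEGER, campaign_id INTEGER, click_time TEXT);
INSERT INTO purchases VALUES
  ('2020-09-01', 10, 30.0, '2020-09-01 10:10:00'),
  ('2020-09-01', 20, 15.0, '2020-09-02 09:15:00'),
  ('2020-09-01', 30, 25.0, '2020-09-02 08:15:00');
INSERT INTO campaign_clicks VALUES
  (10, 2, '2020-09-01 10:01:00'),
  (10, 1, '2020-09-01 10:05:00'),
  (10, 2, '2020-09-01 10:20:00'),
  (20, 2, '2020-09-02 10:10:00'),  -- assumption: moved to after the purchase
  (30, 2, '2020-09-01 07:30:00');
""")

LAST_CLICK_SQL = """
SELECT date, campaign_id, SUM(revenue) AS revenue
FROM (
    SELECT p.date, p.revenue, c.campaign_id,
           -- number the qualifying clicks of each purchase, newest first
           ROW_NUMBER() OVER (
               PARTITION BY p.user_id, p.purchase_time
               ORDER BY c.click_time DESC
           ) AS row_num
    FROM purchases p
    INNER JOIN campaign_clicks c
        ON c.user_id = p.user_id
       AND c.click_time < p.purchase_time
       AND c.click_time >= datetime(p.purchase_time, '-3 days')
) ranked
WHERE row_num = 1          -- keep only the last click per purchase
GROUP BY date, campaign_id
ORDER BY campaign_id
"""


def attribute_revenue(connection):
    """Return (date, campaign_id, revenue) rows using last-click attribution."""
    return connection.execute(LAST_CLICK_SQL).fetchall()


rows = attribute_revenue(conn)
for row in rows:
    print(row)
```

Partitioning by the purchase rather than by the user alone matters once a user has more than one purchase: each purchase then gets its own row number 1, so each one keeps its own last click.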
Snowflake supports lateral joins, that is, joins on a function or a correlated subquery. This lets you join to a query that returns only one row per input row.
SELECT
    purchases.date
    ,purchases.revenue
    ,clicks.campaign_id
FROM
    purchases
LEFT JOIN LATERAL
(
    SELECT
        campaign_id
        ,user_id
        ,click_time
    FROM
        campaign_clicks
    WHERE
        user_id = purchases.user_id
        -- only select campaign clicks that occurred before the purchase
        AND click_time < purchases.purchase_time
        -- only include clicks that occurred within 3 days of the purchase
        AND click_time >= DATEADD(days, -3, purchases.purchase_time)
    ORDER BY
        click_time DESC
    LIMIT
        1
)
    AS clicks
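If you want to experiment with the same "one click per purchase" idea where LATERAL is not available, a correlated scalar subquery with ORDER BY ... LIMIT 1 gives a similar shape. The sketch below is a swapped-in technique, not the lateral join itself: it assumes SQLite driven from Python's sqlite3, ISO-8601 timestamp strings, and datetime(..., '-3 days') in place of DATEADD. As in a LEFT JOIN, purchases without a qualifying click are kept, with a NULL campaign_id. The sample rows follow the question, except that user 20's click is moved to after the purchase (an assumption, to match the stated intent). The function name revenue_by_campaign is hypothetical.

```python
import sqlite3

# In-memory stand-in for the two tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (date TEXT, user_id INTEGER, revenue REAL, purchase_time TEXT);
CREATE TABLE campaign_clicks (user_id INTEGER, campaign_id INTEGER, click_time TEXT);
INSERT INTO purchases VALUES
  ('2020-09-01', 10, 30.0, '2020-09-01 10:10:00'),
  ('2020-09-01', 20, 15.0, '2020-09-02 09:15:00'),
  ('2020-09-01', 30, 25.0, '2020-09-02 08:15:00');
INSERT INTO campaign_clicks VALUES
  (10, 2, '2020-09-01 10:01:00'),
  (10, 1, '2020-09-01 10:05:00'),
  (10, 2, '2020-09-01 10:20:00'),
  (20, 2, '2020-09-02 10:10:00'),  -- assumption: moved to after the purchase
  (30, 2, '2020-09-01 07:30:00');
""")

SCALAR_SUBQUERY_SQL = """
SELECT date, campaign_id, SUM(revenue) AS revenue
FROM (
    SELECT p.date, p.revenue,
           -- last qualifying click for this purchase, or NULL if there is none
           (SELECT c.campaign_id
            FROM campaign_clicks c
            WHERE c.user_id = p.user_id
              AND c.click_time < p.purchase_time
              AND c.click_time >= datetime(p.purchase_time, '-3 days')
            ORDER BY c.click_time DESC
            LIMIT 1) AS campaign_id
    FROM purchases p
) attributed
GROUP BY date, campaign_id
ORDER BY campaign_id
"""


def revenue_by_campaign(connection):
    """Return (date, campaign_id, revenue); campaign_id is None when unattributed."""
    return connection.execute(SCALAR_SUBQUERY_SQL).fetchall()


all_rows = revenue_by_campaign(conn)
# Drop unattributed purchases to match the desired result in the question.
attributed_rows = [row for row in all_rows if row[1] is not None]
for row in attributed_rows:
    print(row)
```

Keeping the NULL row around can be useful for checking how much revenue had no qualifying click; filtering on campaign_id IS NOT NULL in the SQL itself would work just as well as the Python filter.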