LIMIT / Filtering on LEFT JOIN

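I have two tables. One is a list of purchases with revenue, purchase_time and a user ID; the other is a list of campaign clicks with campaign_id, user_id and click_time. campaign_clicks records every click that comes from a campaign. A user can have any number of clicks, or none, and they can occur at any time before or after the purchase. What I need to do is determine which campaign_id was the last one a given user clicked before purchasing, and what the total revenue attributed to that campaign_id is. I only want to attribute revenue to clicks that occurred within the 3 days before the purchase.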

purchases

date        user_id  revenue  purchase_time
2020/09/01  10       30.0     2020/09/01 10:10:00 am
2020/09/01  20       15.0     2020/09/02 09:15:00 am
2020/09/01  30       25.0     2020/09/02 08:15:00 am

campaign_clicks

user_id  campaign_id  click_time
10       2            2020/09/01 10:01:00 am
10       1            2020/09/01 10:05:00 am
10       2            2020/09/01 10:20:00 am
20       2            2020/09/01 10:10:00 am
30       2            2020/09/01 07:30:00 am

Desired result

date        campaign_id  revenue
2020/09/01  1            30.0
2020/09/01  2            25.0

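The purchase by user ID 20 should not be included, because it happened before the click_time. The revenue from user 10 should be attributed to campaign 1, because that was the last click before the purchase.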

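My problem is that my join returns all of the clicks, which inflates the revenue. The SELECT inside the join knows nothing about the purchase time, and I need to somehow filter and narrow the clicks down to a single click, the last one. I've tried using ROW_NUMBER() to apply an index, but that doesn't let me filter out the clicks that occurred after the purchase.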

Here is where I am so far:

SELECT  
  date
  ,ROUND(sum(revenue)) as revenue
  ,clicks.campaign_id
FROM 
    purchases                    
       
        LEFT JOIN ( 

                   SELECT                                   
                        campaign_id 
                        ,user_id
                        ,click_time                       
                   FROM 
                      campaign_clicks            
                   ORDER BY         
                      click_time DESC                             
                  ) AS clicks ON clicks.user_id = purchases.user_id 
WHERE 
  -- only select campaign clicks that occurred before the purchase                
  purchases.purchase_time > clicks.click_time

  -- only include clicks that occurred within 3 days of the purchase               
  AND DATEDIFF(minutes, clicks.click_time,purchases.purchase_time) <= (60*24*3)

  -- PROBLEM HERE - there can be still a number of other clicks that occurred before the purchase I need to filter to only the last one 
GROUP BY 
   date
 ,clicks.campaign_id
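
The fan-out described above can be reproduced on a small scale. The following is only a sketch: it uses Python's built-in sqlite3 module rather than the asker's database, so `datetime(..., '-3 days')` stands in for the DATEDIFF check, timestamps are rewritten in ISO format, and user 20's click is moved to after the purchase (an assumption, made so that the rows match the stated intent that this purchase is excluded). The helper names `build_db` and `fan_out_rows` are hypothetical.

```python
import sqlite3


def build_db():
    """Create an in-memory database with rows adapted from the question."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE purchases (date TEXT, user_id INT, revenue REAL, purchase_time TEXT);
        CREATE TABLE campaign_clicks (user_id INT, campaign_id INT, click_time TEXT);
        INSERT INTO purchases VALUES
            ('2020-09-01', 10, 30.0, '2020-09-01 10:10:00'),
            ('2020-09-01', 20, 15.0, '2020-09-02 09:15:00'),
            ('2020-09-01', 30, 25.0, '2020-09-02 08:15:00');
        INSERT INTO campaign_clicks VALUES
            (10, 2, '2020-09-01 10:01:00'),
            (10, 1, '2020-09-01 10:05:00'),
            (10, 2, '2020-09-01 10:20:00'),
            (20, 2, '2020-09-02 10:10:00'),  -- assumption: moved to after the purchase
            (30, 2, '2020-09-01 07:30:00');
    """)
    return con


def fan_out_rows(con):
    """Join every qualifying click to its purchase, without keeping only the last one."""
    return con.execute("""
        SELECT p.date, c.campaign_id, SUM(p.revenue) AS revenue
        FROM purchases p
        JOIN campaign_clicks c ON c.user_id = p.user_id
        WHERE c.click_time < p.purchase_time
          AND c.click_time >= datetime(p.purchase_time, '-3 days')
        GROUP BY p.date, c.campaign_id
        ORDER BY c.campaign_id
    """).fetchall()


if __name__ == "__main__":
    for row in fan_out_rows(build_db()):
        print(row)
```

With these rows, user 10 has two clicks before the purchase, so the 30.0 is counted once for campaign 1 and again for campaign 2. Campaign 2 ends up with 55.0 instead of 25.0.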

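You can achieve this with the query below. Basically, you do an INNER JOIN and, in the ON clause itself, filter out the clicks that happened more than 3 days before the purchase.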

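Restricting the result to the last clicked campaign can then be done with the ROW_NUMBER function, ordering by click_time DESC. That way, the last click before the purchase gets a row number of 1. You can then filter out the records whose row number is greater than 1 by wrapping the result set in an outer query.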

-- Outer query to select just the last click for any given purchase
SELECT * FROM (
    SELECT p.date, p.purchase_time, c.click_time, c.campaign_id, p.revenue,
-- row number per purchase, with clicks sorted from latest to earliest;
-- partitioning by purchase (not only by user) keeps one row for each purchase
    ROW_NUMBER() OVER(PARTITION BY p.user_id, p.purchase_time ORDER BY c.click_time DESC) AS row_num
    FROM purchases p
    INNER JOIN campaign_clicks c
    ON ( 
       c.user_id = p.user_id 
      --- only select clicks that occurred before the purchase
      AND c.click_time<p.purchase_time
      --- only select clicks from the 3 days before the purchase (minutes * hours * days)
      AND TIMESTAMPDIFF(MINUTE, c.click_time, p.purchase_time) <= (60*24*3)
    )
) res WHERE res.row_num=1

You can also view the results at the DB-Fiddle link.
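
To get from the last click per purchase to revenue per campaign, the ranked rows still need to be aggregated. Below is a minimal sketch of that step, again run through Python's sqlite3 module (window functions need SQLite 3.25 or later). It assumes ISO timestamps, uses `datetime(..., '-3 days')` in place of TIMESTAMPDIFF, and moves user 20's click to after the purchase so the sample matches the desired result. It partitions by purchase so that a user with several purchases keeps one row for each. `build_db` and `last_click_revenue` are hypothetical names.

```python
import sqlite3


def build_db():
    """Create an in-memory database with rows adapted from the question."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE purchases (date TEXT, user_id INT, revenue REAL, purchase_time TEXT);
        CREATE TABLE campaign_clicks (user_id INT, campaign_id INT, click_time TEXT);
        INSERT INTO purchases VALUES
            ('2020-09-01', 10, 30.0, '2020-09-01 10:10:00'),
            ('2020-09-01', 20, 15.0, '2020-09-02 09:15:00'),
            ('2020-09-01', 30, 25.0, '2020-09-02 08:15:00');
        INSERT INTO campaign_clicks VALUES
            (10, 2, '2020-09-01 10:01:00'),
            (10, 1, '2020-09-01 10:05:00'),
            (10, 2, '2020-09-01 10:20:00'),
            (20, 2, '2020-09-02 10:10:00'),  -- assumption: moved to after the purchase
            (30, 2, '2020-09-01 07:30:00');
    """)
    return con


def last_click_revenue(con):
    """Rank qualifying clicks per purchase, keep the latest, then sum per campaign."""
    return con.execute("""
        SELECT date, campaign_id, SUM(revenue) AS revenue
        FROM (
            SELECT p.date, p.revenue, c.campaign_id,
                   ROW_NUMBER() OVER (
                       PARTITION BY p.user_id, p.purchase_time
                       ORDER BY c.click_time DESC
                   ) AS row_num
            FROM purchases p
            JOIN campaign_clicks c
              ON c.user_id = p.user_id
             AND c.click_time < p.purchase_time
             AND c.click_time >= datetime(p.purchase_time, '-3 days')
        ) ranked
        WHERE row_num = 1
        GROUP BY date, campaign_id
        ORDER BY campaign_id
    """).fetchall()


if __name__ == "__main__":
    for row in last_click_revenue(build_db()):
        print(row)
```

On the adapted rows this gives 30.0 for campaign 1 and 25.0 for campaign 2, which matches the desired result in the question.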

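Snowflake supports lateral joins, that is, joins on a function or on a correlated subquery. This lets you join to a query that returns only one row (per input row).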

SELECT  
  purchases.date
 ,purchases.revenue
 ,clicks.campaign_id
FROM 
  purchases    
LEFT JOIN LATERAL
(
  SELECT
    campaign_id 
   ,user_id
   ,click_time                       
  FROM 
    campaign_clicks
  WHERE
            user_id = purchases.user_id
    -- only select campaign clicks that occurred before the purchase                
    AND click_time <  purchases.purchase_time
    -- only include clicks that occurred within 3 days of the purchase               
    AND click_time >= DATEADD(days, -3, purchases.purchase_time)
  ORDER BY
    click_time DESC
  LIMIT
    1                        
)
  AS clicks
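
For trying the same "one row per purchase" idea without Snowflake, here is a rough sketch in Python's sqlite3 module. SQLite has no LATERAL, so a correlated scalar subquery with ORDER BY ... LIMIT 1 is swapped in; this is a different construct that plays the same role here, not the lateral join itself. The sketch also assumes ISO timestamps, uses `datetime(..., '-3 days')` instead of DATEADD, and moves user 20's click to after the purchase to match the stated intent. `build_db` and `attributed_revenue` are hypothetical names.

```python
import sqlite3


def build_db():
    """Create an in-memory database with rows adapted from the question."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE purchases (date TEXT, user_id INT, revenue REAL, purchase_time TEXT);
        CREATE TABLE campaign_clicks (user_id INT, campaign_id INT, click_time TEXT);
        INSERT INTO purchases VALUES
            ('2020-09-01', 10, 30.0, '2020-09-01 10:10:00'),
            ('2020-09-01', 20, 15.0, '2020-09-02 09:15:00'),
            ('2020-09-01', 30, 25.0, '2020-09-02 08:15:00');
        INSERT INTO campaign_clicks VALUES
            (10, 2, '2020-09-01 10:01:00'),
            (10, 1, '2020-09-01 10:05:00'),
            (10, 2, '2020-09-01 10:20:00'),
            (20, 2, '2020-09-02 10:10:00'),  -- assumption: moved to after the purchase
            (30, 2, '2020-09-01 07:30:00');
    """)
    return con


def attributed_revenue(con):
    """Look up the last qualifying click for each purchase, then sum per campaign."""
    return con.execute("""
        SELECT date, campaign_id, SUM(revenue) AS revenue
        FROM (
            SELECT p.date, p.revenue,
                   (SELECT c.campaign_id
                    FROM campaign_clicks c
                    WHERE c.user_id = p.user_id
                      AND c.click_time < p.purchase_time
                      AND c.click_time >= datetime(p.purchase_time, '-3 days')
                    ORDER BY c.click_time DESC
                    LIMIT 1) AS campaign_id
            FROM purchases p
        ) attributed
        WHERE campaign_id IS NOT NULL  -- drop purchases with no qualifying click
        GROUP BY date, campaign_id
        ORDER BY campaign_id
    """).fetchall()


if __name__ == "__main__":
    for row in attributed_revenue(build_db()):
        print(row)
```

Purchases with no qualifying click come back with a NULL campaign_id, much like the unmatched rows of a LEFT JOIN, and are dropped before aggregating.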