How to get a conditional calculation that I want with SQL

I have a table with more than 500,000 rows and the following columns:

  user_id | event_date | event_name    | version
 ================================================
  1543435 | 18092021   | first_open    | 113
  5476523 | 18092021   | session_start | 111
  7418529 | 18092021   | first_open    | 112
  1754821 | 18092021   | first_open    | 113
  9820011 | 18092021   | session_start | 114
  4568714 | 18092021   | session_start | 120

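An event_name of first_open means the user installed and opened the app for the first time, while session_start means the user had already installed and opened it before, i.e. it is not their first time.

user_id is unique to each user and does not change between logins.

We are only working with users on version 113.

I need to find the share of users who installed the app on the 18th (day 0, 18.09.2021) and opened it again on day 1 (19.09.2021), and likewise on day 3 (21.09.2021).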
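To pin down exactly what is being measured, here is a minimal Python sketch of the definition, independent of any database. The sample events and the retention helper are hypothetical; dates are held as datetime.date values so that "day 0 + k" is real calendar arithmetic.

```python
from datetime import date, timedelta

# Hypothetical events: (user_id, event_date, event_name, version)
EVENTS = [
    (1, date(2021, 9, 18), "first_open", 113),
    (1, date(2021, 9, 19), "session_start", 113),
    (1, date(2021, 9, 21), "session_start", 113),
    (2, date(2021, 9, 18), "first_open", 113),
    (2, date(2021, 9, 19), "session_start", 113),
    (3, date(2021, 9, 18), "first_open", 113),
]

def retention(events, day0, k, version=113):
    """Share of users with first_open on day0 who have a session_start on day0 + k."""
    # Day-0 cohort: users who installed on day0 with the given version.
    cohort = {u for u, d, e, v in events
              if d == day0 and e == "first_open" and v == version}
    target = day0 + timedelta(days=k)
    # Users who came back (session_start) exactly k days later.
    returned = {u for u, d, e, v in events
                if d == target and e == "session_start" and v == version}
    return len(cohort & returned) / len(cohort) if cohort else 0.0

print(retention(EVENTS, date(2021, 9, 18), 1))  # 0.6666666666666666
print(retention(EVENTS, date(2021, 9, 18), 3))  # 0.3333333333333333
```

The intersection with the cohort matters: a session_start from a user who did not install on day 0 must not be counted.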

Roadmap:

After a week of research and brainstorming, I wrote the following query:

SELECT
  -- day 0: users who installed (first_open) on 2021-09-18
  (SELECT COUNT(DISTINCT our_data.user_id)
   FROM our_data
   WHERE our_data.event_date = '20210918'
     AND our_data.event_name = 'first_open'
     AND our_data.version = '113')
  AS DAY_ZERO,

  -- day 1: of those, users with a session_start on the next day
  (SELECT COUNT(DISTINCT dayone.user_id)
   FROM our_data
   JOIN our_data AS dayone  -- WHERE filters on dayone, so LEFT JOIN would act as an inner join anyway
     ON our_data.user_id = dayone.user_id
   WHERE our_data.event_date = '20210918'
     AND dayone.event_date = our_data.event_date + 1  -- relies on string-to-number conversion; wrong at month ends
     AND our_data.event_name = 'first_open'
     AND dayone.event_name = 'session_start'
     AND our_data.version = '113'
     AND dayone.version = '113')
  AS DAY_ONE,

  -- day 3: of those, users with a session_start three days later
  (SELECT COUNT(DISTINCT our_data.user_id)
   FROM our_data
   JOIN our_data AS daythree
     ON our_data.user_id = daythree.user_id
   WHERE our_data.event_date = '20210918'
     AND daythree.event_date = our_data.event_date + 3  -- same caveat as + 1 above
     AND our_data.event_name = 'first_open'
     AND daythree.event_name = 'session_start'
     AND our_data.version = '113'
     AND daythree.version = '113')
  AS DAY_THREE
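The our_data.event_date + 1 (and + 3) comparison treats the date string as a number. The following minimal sketch uses Python's built-in sqlite3 module purely as a stand-in database to show what that does at a month boundary, next to one date-aware alternative. It assumes event_date stores YYYYMMDD text, as the '20210918' literal in the query suggests; other dialects have their own date functions (for example DATEADD in T-SQL).

```python
import sqlite3

con = sqlite3.connect(":memory:")

# String + integer: SQLite converts '20210930' to a number, so "+ 1" is plain
# integer addition and yields a date that does not exist.
naive = con.execute("SELECT '20210930' + 1").fetchone()[0]

# Date-aware alternative: rewrite YYYYMMDD as YYYY-MM-DD, shift it with a
# date modifier, then format the result back to YYYYMMDD.
SHIFT = """
SELECT strftime('%Y%m%d',
                substr(:d, 1, 4) || '-' || substr(:d, 5, 2) || '-' || substr(:d, 7, 2),
                :shift)
"""
proper = con.execute(SHIFT, {"d": "20210930", "shift": "+1 day"}).fetchone()[0]
con.close()

print(naive, proper)  # 20210931 20211001
```

For 2021-09-18 the numeric trick happens to give the right answer (20210919, 20210921); it only breaks once day 0 + k crosses into the next month.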

This query gave me these results:

  DAY_ZERO | DAY_ONE | DAY_THREE
 ================================
  14879    | 7850    | 949

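With these results, I can't do any further calculations within the same query. I need DAY_ONE/DAY_ZERO = DAY 1 RETENTION and DAY_THREE/DAY_ZERO = DAY 3 RETENTION. I also need to run the same calculation for other day-zero (install) dates in the same table, so it has to happen in a single query. How do you think I could do this?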
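Plugging the reported counts into those two ratios gives the values a combined query should return, which is a handy check on its output:

```latex
\begin{aligned}
\text{DAY 1 RETENTION} &= \frac{\text{DAY\_ONE}}{\text{DAY\_ZERO}} = \frac{7850}{14879} \approx 0.5276 \\
\text{DAY 3 RETENTION} &= \frac{\text{DAY\_THREE}}{\text{DAY\_ZERO}} = \frac{949}{14879} \approx 0.0638
\end{aligned}
```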

I don't have a SQL database at hand right now, but I think the solution should look something like this:

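select
    1.0 * DAY_ONE   / DAY_ZERO as DAY_1_RETENTION,  -- 1.0 * avoids integer division
    1.0 * DAY_THREE / DAY_ZERO as DAY_3_RETENTION
from (... your query ...) as counts                 -- many dialects require an alias on a derived table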

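I would have posted this as a comment, but the query seemed a bit too long for one. In short, the best answer depends somewhat on which SQL flavor you are actually using (T-SQL? PL/SQL? PL/pgSQL?), but the general approach is the same.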
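One dialect difference that matters for this calculation is division. In SQLite, SQL Server and PostgreSQL, dividing an integer by an integer truncates, and COUNT returns an integer, so a bare DAY_ONE / DAY_ZERO comes out as 0. The minimal sketch below wraps a one-row query in an outer SELECT, as the answer suggests, using Python's sqlite3 module as the database. The inner SELECT is a hypothetical placeholder that just returns the three counts reported in the question; in practice it would be the full query.

```python
import sqlite3

# Placeholder for "(... your query ...)": one row with the reported counts.
INNER = "SELECT 14879 AS DAY_ZERO, 7850 AS DAY_ONE, 949 AS DAY_THREE"

# Integer / integer: truncated to 0 in SQLite.
NAIVE = f"""
SELECT DAY_ONE / DAY_ZERO   AS DAY_1_RETENTION,
       DAY_THREE / DAY_ZERO AS DAY_3_RETENTION
FROM ({INNER}) AS counts
"""

# Multiplying by 1.0 first makes the division happen in floating point.
FIXED = f"""
SELECT 1.0 * DAY_ONE / DAY_ZERO   AS DAY_1_RETENTION,
       1.0 * DAY_THREE / DAY_ZERO AS DAY_3_RETENTION
FROM ({INNER}) AS counts
"""

con = sqlite3.connect(":memory:")
naive = con.execute(NAIVE).fetchone()
fixed = con.execute(FIXED).fetchone()
con.close()

print(naive)                              # (0, 0)
print(tuple(round(x, 4) for x in fixed))  # (0.5276, 0.0638)
```

CAST(DAY_ONE AS REAL) (or DECIMAL/FLOAT, depending on the dialect) works just as well as 1.0 *; the point is only that one operand must not be an integer.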

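You have already computed the values you need in the subqueries. I reused those subqueries, but instead of returning them as values in the select list, I put each one into a derived table with a unique column name (cnt0, cnt1, cnt3).

Now you have three tables with one row each. Cross-joining them (the comma-separated FROM list) gives a "single row" with three different columns, and you can then do the calculations you need on those columns: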
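Below is the same cross-join structure run end to end against a small hypothetical table, again with Python's sqlite3 module standing in for the real database. To keep it short and independent of date arithmetic, the day-1 and day-3 dates are written as literals ('20210919', '20210921'), and the divisions use 1.0 * so SQLite does not truncate them.

```python
import sqlite3

# Hypothetical events: (user_id, event_date as YYYYMMDD, event_name, version)
ROWS = [
    (1, "20210918", "first_open", "113"),
    (1, "20210919", "session_start", "113"),
    (1, "20210921", "session_start", "113"),
    (2, "20210918", "first_open", "113"),
    (2, "20210919", "session_start", "113"),
    (3, "20210918", "first_open", "113"),
    (4, "20210918", "first_open", "113"),
    (4, "20210921", "session_start", "114"),  # other version: not counted
    (5, "20210919", "session_start", "113"),  # no first_open on day 0
]

# Three one-row derived tables, cross-joined by the comma-separated FROM list.
QUERY = """
SELECT 1.0 * DAY_ONE.cnt1   / DAY_ZERO.cnt0 AS DAY_1_RETENTION,
       1.0 * DAY_THREE.cnt3 / DAY_ZERO.cnt0 AS DAY_3_RETENTION
FROM (SELECT COUNT(DISTINCT user_id) AS cnt0
      FROM our_data
      WHERE event_date = '20210918' AND event_name = 'first_open'
        AND version = '113') AS DAY_ZERO,
     (SELECT COUNT(DISTINCT d0.user_id) AS cnt1
      FROM our_data AS d0
      JOIN our_data AS dayone ON d0.user_id = dayone.user_id
      WHERE d0.event_date = '20210918' AND d0.event_name = 'first_open'
        AND d0.version = '113'
        AND dayone.event_date = '20210919'
        AND dayone.event_name = 'session_start' AND dayone.version = '113') AS DAY_ONE,
     (SELECT COUNT(DISTINCT d0.user_id) AS cnt3
      FROM our_data AS d0
      JOIN our_data AS daythree ON d0.user_id = daythree.user_id
      WHERE d0.event_date = '20210918' AND d0.event_name = 'first_open'
        AND d0.version = '113'
        AND daythree.event_date = '20210921'
        AND daythree.event_name = 'session_start' AND daythree.version = '113') AS DAY_THREE
"""

def day_retention(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE our_data "
                "(user_id INTEGER, event_date TEXT, event_name TEXT, version TEXT)")
    con.executemany("INSERT INTO our_data VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchone()
    con.close()
    return result

print(day_retention(ROWS))  # (0.5, 0.25)
```

Here the day-0 cohort is users 1 to 4; users 1 and 2 return on day 1 and only user 1 on day 3 (user 4's day-3 session is on version 114).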

select
  1.0 * DAY_ONE.cnt1   / DAY_ZERO.cnt0 as DAY_1_RETENTION,  -- 1.0 * avoids integer division
  1.0 * DAY_THREE.cnt3 / DAY_ZERO.cnt0 as DAY_3_RETENTION
FROM (SELECT COUNT(DISTINCT our_data.user_id) AS cnt0
      FROM our_data
      WHERE our_data.event_date = '20210918'
        AND our_data.event_name = 'first_open'
        AND our_data.version = '113')
     AS DAY_ZERO,

     (SELECT COUNT(DISTINCT dayone.user_id) AS cnt1
      FROM our_data
      JOIN our_data AS dayone
        ON our_data.user_id = dayone.user_id
      WHERE our_data.event_date = '20210918'
        AND dayone.event_date = our_data.event_date + 1
        AND our_data.event_name = 'first_open'
        AND dayone.event_name = 'session_start'
        AND our_data.version = '113'
        AND dayone.version = '113')
     AS DAY_ONE,

     (SELECT COUNT(DISTINCT our_data.user_id) AS cnt3
      FROM our_data
      JOIN our_data AS daythree
        ON our_data.user_id = daythree.user_id
      WHERE our_data.event_date = '20210918'
        AND daythree.event_date = our_data.event_date + 3
        AND our_data.event_name = 'first_open'
        AND daythree.event_name = 'session_start'
        AND our_data.version = '113'
        AND daythree.version = '113')
     AS DAY_THREE
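The question also asks for the same ratios for other install dates. The query above handles one hard-coded day 0 at a time; a different technique, not taken from the answer, is to group by install date and count returning users with conditional aggregation (COUNT(DISTINCT CASE ... END)). The sketch below shows this in SQLite via Python's sqlite3 module. The table contents are hypothetical, and the substr/date() conversion assumes YYYYMMDD strings and is SQLite-specific; other dialects would use their own date functions.

```python
import sqlite3

# Hypothetical events: (user_id, event_date as YYYYMMDD, event_name, version)
ROWS = [
    (1, "20210918", "first_open", "113"),
    (1, "20210919", "session_start", "113"),
    (1, "20210921", "session_start", "113"),
    (2, "20210918", "first_open", "113"),
    (2, "20210919", "session_start", "113"),
    (3, "20210918", "first_open", "113"),
    (4, "20210918", "first_open", "113"),
    (7, "20210930", "first_open", "113"),
    (7, "20211001", "session_start", "113"),  # day 1 crosses a month boundary
    (8, "20210930", "first_open", "113"),
    (8, "20211003", "session_start", "113"),  # day 3
]

COHORT_QUERY = """
WITH events AS (
    -- Convert YYYYMMDD text to an ISO date so date() arithmetic works.
    SELECT user_id, event_name, version,
           date(substr(event_date, 1, 4) || '-' || substr(event_date, 5, 2)
                || '-' || substr(event_date, 7, 2)) AS d
    FROM our_data
),
installs AS (   -- one row per user and install day (day 0)
    SELECT DISTINCT user_id, d AS day0
    FROM events
    WHERE event_name = 'first_open' AND version = '113'
),
sessions AS (   -- days on which a user came back
    SELECT DISTINCT user_id, d
    FROM events
    WHERE event_name = 'session_start' AND version = '113'
)
SELECT i.day0,
       COUNT(DISTINCT i.user_id) AS day_zero,
       1.0 * COUNT(DISTINCT CASE WHEN s.d = date(i.day0, '+1 day')
                                 THEN i.user_id END)
           / COUNT(DISTINCT i.user_id) AS day_1_retention,
       1.0 * COUNT(DISTINCT CASE WHEN s.d = date(i.day0, '+3 days')
                                 THEN i.user_id END)
           / COUNT(DISTINCT i.user_id) AS day_3_retention
FROM installs AS i
LEFT JOIN sessions AS s ON s.user_id = i.user_id
GROUP BY i.day0
ORDER BY i.day0
"""

def cohort_retention(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE our_data "
                "(user_id INTEGER, event_date TEXT, event_name TEXT, version TEXT)")
    con.executemany("INSERT INTO our_data VALUES (?, ?, ?, ?)", rows)
    result = con.execute(COHORT_QUERY).fetchall()
    con.close()
    return result

for row in cohort_retention(ROWS):
    print(row)
# ('2021-09-18', 4, 0.5, 0.25)
# ('2021-09-30', 2, 0.5, 0.5)
```

The LEFT JOIN keeps users who never came back in the day_zero count, and because the offsets go through date(), the 2021-09-30 cohort correctly looks at 2021-10-01 and 2021-10-03.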