sqlite count the number of consecutive matching rows while excluding them

Update

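Even though there is an accepted answer, I'm still open to other suggestions. I need this to work on SQLite back to at least version 3.19.4 (effectively Android 8). The most performant form of the accepted answer, the one using window functions, isn't available before SQLite 3.25, which is where window functions were introduced. When the queried table contains a few hundred rows, the fallback causes the device to stall and then crash, so I can't rely on it.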
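Window functions such as LAG and ROW_NUMBER first shipped in SQLite 3.25.0. An app that must also run on 3.19.4 would therefore have to choose between the two queries at runtime. Below is a minimal sketch of that check in Python's sqlite3 module, for illustration only. The helper name `supports_window_functions` is hypothetical. `SELECT sqlite_version()` is a core SQLite function, so the same query can be issued from any client, including Android's database API.

```python
import sqlite3


def supports_window_functions(conn):
    """Return True if the SQLite library behind `conn` is 3.25.0 or newer (hypothetical helper)."""
    (version,) = conn.execute("SELECT sqlite_version()").fetchone()
    # Release strings look like "3.19.4"; pad so short strings still compare correctly.
    parts = [int(p) for p in version.split(".")] + [0, 0]
    return tuple(parts[:3]) >= (3, 25, 0)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(sqlite3.sqlite_version, supports_window_functions(conn))
    conn.close()
```

The check runs against the library the connection actually uses rather than a compile-time constant. On Android, that library is the system SQLite.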


Original question

Suppose I have an SQLite table named messages with the following columns:

| id | type | text | time |
---------------------------

id is the primary key and is unique. Suppose I have 5 rows in the following order (shown as a JSON-style array for clarity):

[
  {
    id: 'first',
    type: 'random',
    text: 'hey there',
    time: '2022-02-15T01:47:25.581'
  },
  {
    id: 'second',
    type: 'new_item',
    text: 'new shoe',
    time: '2022-02-15T01:48:25.581'
  },
  {
    id: 'third',
    type: 'new_item',
    text: 'new socks',
    time: '2022-02-15T01:49:25.581'
  },
  {
    id: 'fourth',
    type: 'random',
    text: 'what time is it',
    time: '2022-02-15T01:50:25.581'
  },
  {
    id: 'fifth',
    type: 'new_item',
    text: 'new shoe',
    time: '2022-02-15T01:51:25.581'
  }
]

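I want to query these messages so that consecutive rows of type new_item are collapsed into a single row. Each collapsed row in the final output should carry the number of consecutive new_item rows it represents. Specifically, I want the output to capture the following information (it doesn't have to use the same schema; this is just an example of what I want):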

[
  {
    id: 'first',
    type: 'random',
    text: 'hey there',
    time: '2022-02-15T01:47:25.581'
  },
  {
    id: 'second',
    type: 'new_item',
    text: 'new shoe',
    time: '2022-02-15T01:48:25.581',
    numConsecutiveItems: 2
  },
  {
    id: 'fourth',
    type: 'random',
    text: 'what time is it',
    time: '2022-02-15T01:50:25.581'
  },
  {
    id: 'fifth',
    type: 'new_item',
    text: 'new shoe',
    time: '2022-02-15T01:51:25.581',
    numConsecutiveItems: 1
  }
]

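Note that the message with id third is not in the final output, because it has type new_item and immediately follows another message of type new_item. For the same reason, the message with id second has a numConsecutiveItems of 2. Conversely, the message with id fifth is present because it does not immediately follow another new_item message, and for the same reason its numConsecutiveItems is 1. Can I achieve this with a single query, ordered by the time column? That would be my strong preference, but if not, ideally no more than 2 queries. Thanks!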
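To pin down the rule precisely, here is a plain-Python sketch that turns the sample rows into the expected output. It is not SQL and is not part of the question or the answer; it only illustrates the rule. `itertools.groupby` splits the time-ordered rows into runs of consecutive rows that are or are not new_item. Each new_item run is then replaced by its first row plus the run length. The function name `collapse_new_items` is hypothetical.

```python
from itertools import groupby

# Sample rows from the question, already ordered by time.
MESSAGES = [
    {"id": "first", "type": "random", "text": "hey there", "time": "2022-02-15T01:47:25.581"},
    {"id": "second", "type": "new_item", "text": "new shoe", "time": "2022-02-15T01:48:25.581"},
    {"id": "third", "type": "new_item", "text": "new socks", "time": "2022-02-15T01:49:25.581"},
    {"id": "fourth", "type": "random", "text": "what time is it", "time": "2022-02-15T01:50:25.581"},
    {"id": "fifth", "type": "new_item", "text": "new shoe", "time": "2022-02-15T01:51:25.581"},
]


def collapse_new_items(rows):
    """Collapse each run of consecutive 'new_item' rows into its first row.

    `rows` must already be sorted by time. The kept row gets a
    numConsecutiveItems key holding the length of its run; all other rows
    pass through unchanged.
    """
    result = []
    for is_new_item, run in groupby(rows, key=lambda r: r["type"] == "new_item"):
        run = list(run)
        if is_new_item:
            head = dict(run[0])  # copy so the input dicts are not mutated
            head["numConsecutiveItems"] = len(run)
            result.append(head)
        else:
            result.extend(run)
    return result


if __name__ == "__main__":
    for message in collapse_new_items(MESSAGES):
        print(message)
```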

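Use window functions to group consecutive rows that share the same type, then count how many rows each 'new_item' group contains (tablename stands for your table, messages):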

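WITH cte AS (
  SELECT *,
         -- size of each group of consecutive same-type rows
         COUNT(*) OVER (PARTITION BY grp) count,
         -- position of the row inside its group (1 = earliest)
         ROW_NUMBER() OVER (PARTITION BY grp ORDER BY time) rn
  FROM (
    -- running total of flags: rows in the same run share a group number
    SELECT *, SUM(flag) OVER (ORDER BY time) grp
    FROM (
      -- flag = 1 when the type differs from the previous row's type
      SELECT *, (type <> LAG(type, 1, '') OVER (ORDER BY time)) flag
      FROM tablename
    )
  )
)
SELECT id, type, text, time,
       CASE WHEN type = 'new_item' THEN count END numConsecutiveItems
FROM cte
-- keep every non-new_item row and only the first row of each new_item run
WHERE numConsecutiveItems IS NULL OR rn = 1
ORDER BY time;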
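The harness below is a minimal sketch for checking the window-function query against the sample data, using Python's standard-library sqlite3 module. It assumes the Python build links SQLite 3.25 or newer. The schema (all TEXT columns, id as primary key) is also an assumption, because the question only lists column names. The query is copied from the answer, so it keeps the placeholder table name tablename.

```python
import sqlite3

# Sample rows from the question: (id, type, text, time).
ROWS = [
    ("first", "random", "hey there", "2022-02-15T01:47:25.581"),
    ("second", "new_item", "new shoe", "2022-02-15T01:48:25.581"),
    ("third", "new_item", "new socks", "2022-02-15T01:49:25.581"),
    ("fourth", "random", "what time is it", "2022-02-15T01:50:25.581"),
    ("fifth", "new_item", "new shoe", "2022-02-15T01:51:25.581"),
]

# The answer's window-function query (requires SQLite >= 3.25).
WINDOW_QUERY = """
WITH cte AS (
  SELECT *,
         COUNT(*) OVER (PARTITION BY grp) count,
         ROW_NUMBER() OVER (PARTITION BY grp ORDER BY time) rn
  FROM (
    SELECT *, SUM(flag) OVER (ORDER BY time) grp
    FROM (
      SELECT *, (type <> LAG(type, 1, '') OVER (ORDER BY time)) flag
      FROM tablename
    )
  )
)
SELECT id, type, text, time,
       CASE WHEN type = 'new_item' THEN count END numConsecutiveItems
FROM cte
WHERE numConsecutiveItems IS NULL OR rn = 1
ORDER BY time
"""


def run(query):
    """Load the sample rows into an in-memory table and return the query's rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tablename (id TEXT PRIMARY KEY, type TEXT, text TEXT, time TEXT)"
        )
        conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run(WINDOW_QUERY):
        print(row)
```

Run as a script, it should print the rows for first, second (numConsecutiveItems 2), fourth and fifth (numConsecutiveItems 1), with None for the two random rows. This matches the desired output in the question.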

For SQLite versions without window function support, emulate the window functions with self-joins, aggregation, and correlated subqueries:

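WITH
  -- flag = 1 when a row's type differs from the type of the row just before it;
  -- with a single MAX() aggregate, SQLite reads the bare column t2.type from the
  -- row holding MAX(t2.time), i.e. the immediately preceding row
  prev_types AS (
    SELECT t1.*, t1.type <> COALESCE(t2.type, '') flag, MAX(t2.time) max_time
    FROM tablename t1 LEFT JOIN tablename t2
    ON t2.time < t1.time
    GROUP BY t1.id
  ),
  -- running total of the flags: rows in the same run share a group number
  sum_flags AS (
    SELECT pt1.*, SUM(pt2.flag) grp
    FROM prev_types pt1 INNER JOIN prev_types pt2
    ON pt2.time <= pt1.time
    GROUP BY pt1.id
  ),
  -- count = size of the row's group; rn = the row's position within its group
  cte AS (
    SELECT sf1.*,
           (SELECT COUNT(*) FROM sum_flags sf2 WHERE sf2.grp = sf1.grp) count,
           (SELECT COUNT(*) FROM sum_flags sf2 WHERE sf2.grp = sf1.grp AND sf2.time <= sf1.time) rn
    FROM sum_flags sf1
  )
SELECT id, type, text, time,
       CASE WHEN type = 'new_item' THEN count END numConsecutiveItems
FROM cte
-- keep every non-new_item row and only the first row of each new_item run
WHERE numConsecutiveItems IS NULL OR rn = 1
ORDER BY time;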

See the demo.
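The fallback query can be checked the same way with a minimal Python sqlite3 harness. The all-TEXT schema is again an assumption, and the query is copied from the answer with its placeholder table name tablename. The fallback needs no window functions. However, the self-join `ON t2.time < t1.time` pairs each row with every earlier row, so the intermediate result grows roughly with n²/2 for n rows. This may explain the slowdown described in the update.

```python
import sqlite3

# Sample rows from the question: (id, type, text, time).
ROWS = [
    ("first", "random", "hey there", "2022-02-15T01:47:25.581"),
    ("second", "new_item", "new shoe", "2022-02-15T01:48:25.581"),
    ("third", "new_item", "new socks", "2022-02-15T01:49:25.581"),
    ("fourth", "random", "what time is it", "2022-02-15T01:50:25.581"),
    ("fifth", "new_item", "new shoe", "2022-02-15T01:51:25.581"),
]

# The answer's fallback query, which avoids window functions.
FALLBACK_QUERY = """
WITH
  prev_types AS (
    SELECT t1.*, t1.type <> COALESCE(t2.type, '') flag, MAX(t2.time) max_time
    FROM tablename t1 LEFT JOIN tablename t2
    ON t2.time < t1.time
    GROUP BY t1.id
  ),
  sum_flags AS (
    SELECT pt1.*, SUM(pt2.flag) grp
    FROM prev_types pt1 INNER JOIN prev_types pt2
    ON pt2.time <= pt1.time
    GROUP BY pt1.id
  ),
  cte AS (
    SELECT sf1.*,
           (SELECT COUNT(*) FROM sum_flags sf2 WHERE sf2.grp = sf1.grp) count,
           (SELECT COUNT(*) FROM sum_flags sf2
            WHERE sf2.grp = sf1.grp AND sf2.time <= sf1.time) rn
    FROM sum_flags sf1
  )
SELECT id, type, text, time,
       CASE WHEN type = 'new_item' THEN count END numConsecutiveItems
FROM cte
WHERE numConsecutiveItems IS NULL OR rn = 1
ORDER BY time
"""


def run(query):
    """Load the sample rows into an in-memory table and return the query's rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tablename (id TEXT PRIMARY KEY, type TEXT, text TEXT, time TEXT)"
        )
        conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run(FALLBACK_QUERY):
        print(row)
```

On the sample data, it should return the same four rows as the window-function version: first, second (2), fourth, and fifth (1).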