SQLite: count the number of consecutive matching rows while excluding them
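Update
Even though there is an accepted answer, I am still open to other suggestions. I need this to work on SQLite at least as far back as version 3.19.4 (effectively Android 8), and the most performant form of the accepted answer (the one using window functions) is not available before SQLite version 3.28. The fallback makes the device stall and then crash when the queried table contains a few hundred rows, so I can't rely on it.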
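Original question
Suppose I have a SQLite table named messages with the following columns:
| id | type | text | time |
---------------------------
id is the primary key and is unique. Suppose I have 5 rows, in the following order (represented as a JSON array for clarity):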
[
  {
    id: 'first',
    type: 'random',
    text: 'hey there',
    time: '2022-02-15T01:47:25.581'
  },
  {
    id: 'second',
    type: 'new_item',
    text: 'new shoe',
    time: '2022-02-15T01:48:25.581'
  },
  {
    id: 'third',
    type: 'new_item',
    text: 'new socks',
    time: '2022-02-15T01:49:25.581'
  },
  {
    id: 'fourth',
    type: 'random',
    text: 'what time is it',
    time: '2022-02-15T01:50:25.581'
  },
  {
    id: 'fifth',
    type: 'new_item',
    text: 'new shoe',
    time: '2022-02-15T01:51:25.581'
  }
]
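I want to query these messages so that consecutive rows of type new_item are represented as a single row, together with the number of consecutive rows in each such run of new_item rows. Specifically, I would like the output to give me the information captured below (it does not have to be the same schema; this is just an example of what I want):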
[
  {
    id: 'first',
    type: 'random',
    text: 'hey there',
    time: '2022-02-15T01:47:25.581'
  },
  {
    id: 'second',
    type: 'new_item',
    text: 'new shoe',
    time: '2022-02-15T01:48:25.581',
    numConsecutiveItems: 2
  },
  {
    id: 'fourth',
    type: 'random',
    text: 'what time is it',
    time: '2022-02-15T01:50:25.581'
  },
  {
    id: 'fifth',
    type: 'new_item',
    text: 'new shoe',
    time: '2022-02-15T01:51:25.581',
    numConsecutiveItems: 1
  }
]
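Note that the message with id third is not in the final output, because its type is new_item and it directly follows another message of type new_item; for the same reason, the message with id second has a numConsecutiveItems of 2. More importantly, the message with id fifth is present because it does not immediately follow another new_item message, and for the same reason its numConsecutiveItems value is 1. Can I achieve this with a single query, ordered by the time column? That would be my strong preference, but if not, then ideally no more than 2 queries. Thanks!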
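To pin down the expected behavior independently of any SQL dialect, here is a minimal reference sketch in Python. It assumes the rows have already been fetched ordered by time; the names collapse_runs and collapsed_type are hypothetical and only serve this illustration.

```python
from itertools import groupby

# Sample rows from the question.
messages = [
    {"id": "first", "type": "random", "text": "hey there", "time": "2022-02-15T01:47:25.581"},
    {"id": "second", "type": "new_item", "text": "new shoe", "time": "2022-02-15T01:48:25.581"},
    {"id": "third", "type": "new_item", "text": "new socks", "time": "2022-02-15T01:49:25.581"},
    {"id": "fourth", "type": "random", "text": "what time is it", "time": "2022-02-15T01:50:25.581"},
    {"id": "fifth", "type": "new_item", "text": "new shoe", "time": "2022-02-15T01:51:25.581"},
]


def collapse_runs(rows, collapsed_type="new_item"):
    """Keep only the first row of each run of `collapsed_type` rows and
    attach the run length; rows of any other type pass through unchanged."""
    out = []
    # groupby() groups *adjacent* rows with an equal key, i.e. one group per run.
    for row_type, run in groupby(rows, key=lambda r: r["type"]):
        run = list(run)
        if row_type == collapsed_type:
            out.append({**run[0], "numConsecutiveItems": len(run)})
        else:
            out.extend(run)
    return out


# ISO-8601 timestamps sort correctly as plain strings.
result = collapse_runs(sorted(messages, key=lambda r: r["time"]))
for row in result:
    print(row["id"], row.get("numConsecutiveItems"))
```

This is a single pass over the rows, so it may also serve as a client-side alternative where the SQL-only approach is too slow, at the cost of fetching every row.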
Use window functions to create groups of consecutive types and count how many 'new_item' rows are in each group:
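WITH cte AS (
  SELECT *,
         COUNT(*) OVER (PARTITION BY grp) count,               -- number of rows in the group
         ROW_NUMBER() OVER (PARTITION BY grp ORDER BY time) rn -- position of the row within its group
  FROM (
    -- the running sum of the flags gives every run of equal types its own group number
    SELECT *, SUM(flag) OVER (ORDER BY time) grp
    FROM (
      -- flag is 1 when the type differs from the previous row's type, else 0
      SELECT *, (type <> LAG(type, 1, '') OVER (ORDER BY time)) flag
      FROM tablename
    )
  )
)
SELECT id, type, text, time,
       CASE WHEN type = 'new_item' THEN count END numConsecutiveItems
FROM cte
WHERE numConsecutiveItems IS NULL OR rn = 1 -- keep all other rows, and only the first row of each 'new_item' run
ORDER BY time;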
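To make the intermediate columns of the query above easier to follow, the sketch below recomputes flag, grp, count and rn for the five sample rows in plain Python. It only mirrors the arithmetic of the window expressions and assumes that time values are distinct, so that the running sum advances one row at a time.

```python
from itertools import accumulate

# Types of the five sample rows, ordered by time.
types = ["random", "new_item", "new_item", "random", "new_item"]

# flag: 1 when the type differs from the previous row's type ('' before the
# first row), mirroring  type <> LAG(type, 1, '') OVER (ORDER BY time).
previous = [""] + types[:-1]
flags = [int(t != p) for t, p in zip(types, previous)]

# grp: running sum of flag, mirroring  SUM(flag) OVER (ORDER BY time).
grps = list(accumulate(flags))

# count: rows per group, mirroring  COUNT(*) OVER (PARTITION BY grp).
counts = [grps.count(g) for g in grps]

# rn: 1-based position inside the group, mirroring
# ROW_NUMBER() OVER (PARTITION BY grp ORDER BY time).
rns = [grps[: i + 1].count(g) for i, g in enumerate(grps)]

for row in zip(types, flags, grps, counts, rns):
    print(row)
```

The third row is the only one with rn greater than 1 inside a 'new_item' group, which is why the final WHERE clause drops it.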
For versions of SQLite that do not support window functions, use aggregation and correlated subqueries to simulate them:
WITH
  prev_types AS (
    -- with MAX(t2.time), SQLite takes the bare column t2.type from the row
    -- holding that maximum, i.e. from the immediately preceding row
    SELECT t1.*, t1.type <> COALESCE(t2.type, '') flag, MAX(t2.time) max_time
    FROM tablename t1 LEFT JOIN tablename t2
    ON t2.time < t1.time
    GROUP BY t1.id
  ),
  sum_flags AS (
    -- running sum of the flags, emulated with a self join
    SELECT pt1.*, SUM(pt2.flag) grp
    FROM prev_types pt1 INNER JOIN prev_types pt2
    ON pt2.time <= pt1.time
    GROUP BY pt1.id
  ),
  cte AS (
    SELECT sf1.*,
           (SELECT COUNT(*) FROM sum_flags sf2 WHERE sf2.grp = sf1.grp) count,
           (SELECT COUNT(*) FROM sum_flags sf2 WHERE sf2.grp = sf1.grp AND sf2.time <= sf1.time) rn
    FROM sum_flags sf1
  )
SELECT id, type, text, time,
       CASE WHEN type = 'new_item' THEN count END numConsecutiveItems
FROM cte
WHERE numConsecutiveItems IS NULL OR rn = 1
ORDER BY time;
See the demo.
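As a quick way to try the fallback query without a device, the sketch below runs it against an in-memory database through Python's standard sqlite3 module. The column types in the CREATE TABLE statement are an assumption (the question only names the columns), and the table is called tablename to match the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the question lists the columns but not their types.
conn.execute(
    "CREATE TABLE tablename (id TEXT PRIMARY KEY, type TEXT, text TEXT, time TEXT)"
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [
        ("first", "random", "hey there", "2022-02-15T01:47:25.581"),
        ("second", "new_item", "new shoe", "2022-02-15T01:48:25.581"),
        ("third", "new_item", "new socks", "2022-02-15T01:49:25.581"),
        ("fourth", "random", "what time is it", "2022-02-15T01:50:25.581"),
        ("fifth", "new_item", "new shoe", "2022-02-15T01:51:25.581"),
    ],
)

# The window-function-free query from the answer, unchanged in logic.
FALLBACK_QUERY = """
WITH
  prev_types AS (
    SELECT t1.*, t1.type <> COALESCE(t2.type, '') flag, MAX(t2.time) max_time
    FROM tablename t1 LEFT JOIN tablename t2
    ON t2.time < t1.time
    GROUP BY t1.id
  ),
  sum_flags AS (
    SELECT pt1.*, SUM(pt2.flag) grp
    FROM prev_types pt1 INNER JOIN prev_types pt2
    ON pt2.time <= pt1.time
    GROUP BY pt1.id
  ),
  cte AS (
    SELECT sf1.*,
           (SELECT COUNT(*) FROM sum_flags sf2 WHERE sf2.grp = sf1.grp) count,
           (SELECT COUNT(*) FROM sum_flags sf2
             WHERE sf2.grp = sf1.grp AND sf2.time <= sf1.time) rn
    FROM sum_flags sf1
  )
SELECT id, type, text, time,
       CASE WHEN type = 'new_item' THEN count END numConsecutiveItems
FROM cte
WHERE numConsecutiveItems IS NULL OR rn = 1
ORDER BY time;
"""

rows = conn.execute(FALLBACK_QUERY).fetchall()
for row in rows:
    print(row)
```

Both self joins compare every row with every earlier row, so the work grows roughly quadratically with the table size, which is consistent with the slowdown reported in the update.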