SQL: Check if n consecutive records are greater than some value
I have a table containing numbers. I need to find out whether there are n consecutive numbers that are greater than some threshold m.
For example:
id delta
---------------
1 10
4 15
11 22
23 23
46 21
57 9
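So here, if I want to know whether there are 3 consecutive records with values greater than 20, I should get True. When I check for 4 consecutive records, I should get False. Is that possible? This is on Apache Spark SQL. Thanks.
You can do this using lag():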
    -- delta_1 and delta_2 hold the values from the previous two rows (by id)
    select t.*
    from (select t.*,
                 lag(delta, 1) over (order by id) as delta_1,
                 lag(delta, 2) over (order by id) as delta_2
          from t
         ) t
    where delta > 20 and delta_1 > 20 and delta_2 > 20;
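Because lag() looks backwards, this returns the last row of each group of three consecutive rows above the threshold. If you just want true/false: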
    select (case when count(*) > 0 then 'true' else 'false' end)
    from (select t.*,
                 lag(delta, 1) over (order by id) as delta_1,
                 lag(delta, 2) over (order by id) as delta_2
          from t
         ) t
    where delta > 20 and delta_1 > 20 and delta_2 > 20;
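For larger n, writing one lag() per row gets unwieldy. A possible alternative, sketched here on the assumption that the question's table is named t with columns id and delta, and with n = 3 and m = 20 hard-coded, is to count the rows above the threshold inside a sliding frame of n rows:

```sql
-- Sketch only: assumes table t(id, delta); n = 3 and m = 20 are hard-coded.
-- hits = number of rows with delta > 20 among the current row and the
-- two rows before it (a frame of n = 3 rows, ordered by id).
select count(*) > 0 as has_run
from (select id,
             sum(case when delta > 20 then 1 else 0 end)
                 over (order by id
                       rows between 2 preceding and current row) as hits
      from t
     ) s
where hits = 3;  -- all n rows in the frame are above the threshold
```

To check a different n, change the frame to `n - 1 preceding` and the filter to `hits = n`. On the sample data the frame ending at id 46 has hits = 3, so this gives true; with a frame of 4 rows no frame reaches hits = 4, so it gives false.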
EDIT:
I missed the part about there being no more than 3 in a row. So you can tighten this up:
    -- true only if some run of rows above 20 is exactly 3 rows long
    select (case when count(*) > 0 then 'true' else 'false' end)
    from (select t.*,
                 lag(delta, 1) over (order by id) as delta_1,
                 lag(delta, 2) over (order by id) as delta_2,
                 lag(delta, 3) over (order by id) as delta_3,
                 lead(delta, 1) over (order by id) as delta_next_1
          from t
         ) t
    where (delta_3 <= 20 or delta_3 is null) and
          (delta_2 > 20 and delta_1 > 20 and delta > 20) and
          (delta_next_1 <= 20 or delta_next_1 is null);
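This is a bit tricky because the run can sit at the very beginning or end of the rows, where lag() and lead() return NULL; that is what the "is null" checks are for.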
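If n varies, the "exactly n" check can also be written without one lag() per row. This is a sketch using the usual difference-of-row-numbers grouping, not something from the question; it assumes the table is named t with columns id and delta, and hard-codes n = 3 and m = 20:

```sql
-- Sketch only: assumes table t(id, delta); n = 3 and m = 20 are hard-coded.
-- Rows above the threshold that are adjacent (by id) get the same grp value,
-- because both row numbers advance by 1 from one row to the next.
select count(*) > 0 as has_exact_run
from (select grp
      from (select id,
                   delta,
                   row_number() over (order by id)
                     - row_number() over (partition by case when delta > 20
                                                             then 1 else 0 end
                                          order by id) as grp
            from t
           ) x
      where delta > 20        -- keep only the rows above the threshold
      group by grp            -- one group per unbroken run
      having count(*) = 3     -- run length must be exactly n
     ) runs;
```

Changing `count(*) = 3` to `count(*) >= 3` turns this into the "at least n" check. On the sample data the rows with id 11, 23 and 46 form a single run of length 3, so the query gives true for n = 3 and false for n = 4.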