每次取第一行 window

Take the first row from every time window

我有 2 列:一列是时间列,另一列是某种布尔类型的列:

GMT  VAL
2010-08-01 10:59:32   1
2010-08-01 10:59:33   0
2010-08-01 10:59:34   1
2010-08-01 10:59:36   0
2010-08-01 10:59:38   1
2010-08-01 10:59:41   1
2010-08-01 10:59:43   0
2010-08-01 10:59:45   1
2010-08-01 10:59:47   0
2010-08-01 10:59:53   1

我想每 10 秒 window 取第一行。

GMT  VAL
2010-08-01 10:59:32   1
2010-08-01 10:59:43   0

我该怎么做?

您可以使用 row_number():

select t.*
from (select t.*,
             row_nubmer() over (partition by date_trunc('minute', gmt), floor(extract(seconds from gmt) / 6)
                                order by gmt) as seqnum
      from t
     ) t
where seqnum = 1;

您也可以转换为字符串:

select t.*
from (select t.*,
             row_nubmer() over (partition by left(to_char(gmt, 'YYYYMMDDHH24MMSS'), 13)
                                order by gmt) as seqnum
      from t
     ) t
where seqnum = 1;

或使用epoch:

select t.*
from (select t.*,
             row_nubmer() over (partition by floor(extract(epoch from gmt) / 10)                                   order by gmt) as seqnum
      from t
     ) t
where seqnum = 1;

Vertica 处于最佳状态 - 虽然您确实会得到一个时间序列捕捉到下一个 10 秒边界以开始它,因此如果您不修复它会返回不同的行。

如果您确实需要确切的原始时间戳,则将最小时间序列时间戳与下一个最小实际时间戳之间的差异添加到 10 秒快照时间片中 - 在这种特定情况下为 2 秒 -检查下面的 tbts 常用 Table 表达式。

WITH
-- your input ...
input(gmt,val) AS (
          SELECT TIMESTAMP '2010-08-01 10:59:32',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:33',0
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:34',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:36',0
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:38',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:41',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:43',0
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:45',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:47',0
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:53',1
)
,
-- create the timeseries - I decide to snap it to exact 10-second time slices
-- use the Vertica TIME SLICE function to create the limits of the time series
tm(tm) AS (
            SELECT MIN(TIME_SLICE(gmt,10,'SECOND','START')) AS tm FROM input
  UNION ALL SELECT MAX(TIME_SLICE(gmt,10,'SECOND','START')) AS tm FROM input
)
,
-- use Vertica's TIMESERIES clause to actually create the time series
-- which will be snapped to 10-second borders
tb(tb) AS (
  SELECT tb 
  FROM tm
  TIMESERIES tb AS '10 SECONDS' OVER(ORDER BY tm)
)
,
 -- add the difference between timeseries timestamp and actual timestamp
ts(ts) AS (
  SELECT 
    tb +( (SELECT MIN(gmt) FROM INPUT) - (SELECT MIN(tb) FROM tb) )
  FROM tb
)
-- finally, use the "Event Series Join"
-- - That's the INTERPOLATE PREVIOUS VALUE predicate - 
-- to apply an outer join
SELECT
  gmt
, ts AS control_ts
, val
FROM input
LEFT
JOIN ts
  ON gmt INTERPOLATE PREVIOUS VALUE ts
WHERE gmt IS NOT NULL
-- Vertica's Analytic Limit Clause
LIMIT 1 OVER(PARTITION BY ts ORDER BY gmt)
;

Returns:

         gmt         |     control_ts      | val 
---------------------+---------------------+-----
 2010-08-01 10:59:32 | 2010-08-01 10:59:32 |   1
 2010-08-01 10:59:43 | 2010-08-01 10:59:42 |   0
 2010-08-01 10:59:53 | 2010-08-01 10:59:52 |   1