Postgres group results with window function lag returns 0 rows

I am trying to run a query in which I want to ignore the first and last rows of the result. To do that, I tried a window function, which gave me the following query:

SELECT lag(timestamp_min)    OVER (ORDER BY timestamp_min) AS timestamp_min,
       lag(type)             OVER (ORDER BY timestamp_min) AS type,
       lag(sum_first_medium) OVER (ORDER BY timestamp_min)
FROM (SELECT to_timestamp(
                floor(
                   (extract('epoch' FROM TIMESTAMP) / 300)
                ) * 300
             ) AS timestamp_min,
             type,
             floor(sum(medium[1])) AS sum_first_medium
      FROM default_dataset
      WHERE type = 'ap_clients.wlan0'
        AND timestamp > current_timestamp - INTERVAL '85 minutes'
        AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638'
      GROUP BY timestamp_min, type) lagme
OFFSET 2;

The problem is that this query returns nothing:

ws_controller_hist=> SELECT lag(timestamp_min) OVER (ORDER BY timestamp_min) AS timestamp_min, lag(type) OVER (ORDER BY timestamp_min) AS type, lag(sum_first_medium) OVER (ORDER BY timestamp_min) FROM (SELECT to_timestamp(floor((extract('epoch' FROM TIMESTAMP) / 300)) * 300) AS timestamp_min, type, floor(sum(medium[1])) AS sum_first_medium FROM default_dataset WHERE type = 'ap_clients.wlan0' AND timestamp > current_timestamp - INTERVAL '85 minutes' AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638' GROUP BY timestamp_min, type) lagme OFFSET 2;
 timestamp_min | type | lag
---------------+------+-----
(0 rows)

But I do have data of type "ap_clients.wlan0":

ws_controller_hist=> select * from default_dataset where type ='ap_clients.wlan0' order by timestamp desc limit 3;
                  id                  |       timestamp        | agregation_period | medium | maximum | minimum | sum |       type       |              device_id               | network_id |           organiza
tion_id            |     labels
--------------------------------------+------------------------+-------------------+--------+---------+---------+-----+------------------+--------------------------------------+------------+-------------------
-------------------+----------------
 b3661dca-a459-43cd-a3c4-7609e36c18d5 | 2018-01-02 10:21:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
 abbca52d-f3f5-4a99-bd2f-41602964506e | 2018-01-02 10:16:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
 24e00926-bc6d-4025-8a6c-a8de9efacdad | 2018-01-02 10:11:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
(3 rows)

I need a query that retrieves the sum of all medium values over the last hour, grouped into 5-minute buckets.

My first approach was to skip the first record with OFFSET 1, and to skip the last record by excluding its id with a subquery ordered by timestamp DESC with LIMIT 1.

ws_controller_hist=>  
SELECT to_timestamp(floor((extract('epoch' FROM TIMESTAMP) / 300)) * 300) 
AS timestamp_min,
       TYPE,
       floor(sum(medium[1]))
FROM default_dataset
WHERE TYPE LIKE 'ap_clients.wlan0'
  AND TIMESTAMP > CURRENT_TIMESTAMP - interval '85 minutes'
  AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638'
  AND id NOT IN
    (SELECT id
     FROM default_dataset
     ORDER BY TIMESTAMP DESC
     LIMIT 1)
GROUP BY timestamp_min,
         TYPE
ORDER BY timestamp_min ASC
OFFSET 1;

     timestamp_min      |       type       | floor
------------------------+------------------+-------
 2017-12-19 14:20:00+00 | ap_clients.wlan0 |    38
 2017-12-19 14:25:00+00 | ap_clients.wlan0 |    37
 2017-12-19 14:30:00+00 | ap_clients.wlan0 |    39
 2017-12-19 14:35:00+00 | ap_clients.wlan0 |    42
 2017-12-19 14:40:00+00 | ap_clients.wlan0 |    43
 2017-12-19 14:45:00+00 | ap_clients.wlan0 |    44
 2017-12-19 14:50:00+00 | ap_clients.wlan0 |    45
 2017-12-19 14:55:00+00 | ap_clients.wlan0 |    45
 2017-12-19 15:00:00+00 | ap_clients.wlan0 |    43
 2017-12-19 15:05:00+00 | ap_clients.wlan0 |    43
 2017-12-19 15:10:00+00 | ap_clients.wlan0 |    50
 2017-12-19 15:15:00+00 | ap_clients.wlan0 |    52
 2017-12-19 15:20:00+00 | ap_clients.wlan0 |    50
 2017-12-19 15:25:00+00 | ap_clients.wlan0 |    53
 2017-12-19 15:30:00+00 | ap_clients.wlan0 |    49
 2017-12-19 15:35:00+00 | ap_clients.wlan0 |    39
 2017-12-19 15:40:00+00 | ap_clients.wlan0 |    16

But this last query does not skip the last record: I get the same rows as I do without the subquery "AND id NOT IN (SELECT id FROM default_dataset ORDER BY timestamp DESC LIMIT 1)".

If I run a query to look at the rows of type "ap_clients.wlan0", I get:

ws_controller_hist=> select * from default_dataset where organization_id='ce4b69af-bdce-4f1b-ba71-dd03544205d5' and type ='ap_clients.wlan0' order by timestamp desc limit 5;
                  id                  |       timestamp        | agregation_period | medium | maximum | minimum | sum |       type       |              device_id               | network_id |           organiza
tion_id            |     labels
--------------------------------------+------------------------+-------------------+--------+---------+---------+-----+------------------+--------------------------------------+------------+-------------------
-------------------+----------------
 b3661dca-a459-43cd-a3c4-7609e36c18d5 | 2018-01-02 10:21:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
 abbca52d-f3f5-4a99-bd2f-41602964506e | 2018-01-02 10:16:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
 24e00926-bc6d-4025-8a6c-a8de9efacdad | 2018-01-02 10:11:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
 e67baf28-6d5b-43a5-85e2-fcf2d04a0b2e | 2018-01-02 10:06:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
 c7ce16ce-9cda-423f-b32b-f4d6dce859e6 | 2018-01-02 10:01:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}

What can I do?

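A simple solution is to use the lag and lead window functions with an argument that can never be NULL. lag then returns NULL only for the first row and lead returns NULL only for the last row, so you can simply keep the rows where both are NOT NULL: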

SELECT
    t2.timestamp_min,
    t2.type,
    t2.sum_first_medium
FROM (
    SELECT
        t1.*,
        lead(1) OVER(ORDER BY t1.timestamp_min) AS lead,
        lag(1) OVER(ORDER BY t1.timestamp_min) AS lag
    FROM (
        SELECT
            to_timestamp(
              floor(
                (extract('epoch' FROM TIMESTAMP) / 300)
              ) * 300
            ) AS timestamp_min,
            type,
            floor(sum(medium[1])) AS sum_first_medium
        FROM default_dataset
        WHERE
            type = 'ap_clients.wlan0'
            AND timestamp > current_timestamp - INTERVAL '85 minutes'
            AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638'
        GROUP BY timestamp_min, type
    ) t1
) t2
WHERE
    t2.lag IS NOT NULL -- Only first row will return NULL, skip it
    AND t2.lead IS NOT NULL -- Only last row will return NULL, skip it
ORDER BY t2.timestamp_min

Note that I used lead(1) and lag(1) only because 1 is a non-NULL expression. You can use any non-NULL expression there, even a column (as long as it is guaranteed to be NOT NULL).
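
To see the filter in isolation, here is a minimal sketch that replaces the aggregated subquery with five hard-coded rows. The CTE name buckets and the aliases has_prev and has_next are made up for illustration; the sample values are copied from the output in the question.

```sql
-- Hypothetical stand-in for the grouped 5-minute buckets (t1 in the query above).
WITH buckets(timestamp_min, sum_first_medium) AS (
    VALUES
        (TIMESTAMPTZ '2017-12-19 14:20:00+00', 38),
        (TIMESTAMPTZ '2017-12-19 14:25:00+00', 37),
        (TIMESTAMPTZ '2017-12-19 14:30:00+00', 39),
        (TIMESTAMPTZ '2017-12-19 14:35:00+00', 42),
        (TIMESTAMPTZ '2017-12-19 14:40:00+00', 43)
)
SELECT marked.timestamp_min,
       marked.sum_first_medium
FROM (
    SELECT b.*,
           -- NULL only on the first row: there is no previous row to read from
           lag(1)  OVER (ORDER BY b.timestamp_min) AS has_prev,
           -- NULL only on the last row: there is no following row to read from
           lead(1) OVER (ORDER BY b.timestamp_min) AS has_next
    FROM buckets b
) marked
WHERE marked.has_prev IS NOT NULL
  AND marked.has_next IS NOT NULL
ORDER BY marked.timestamp_min;
```

With this sample, only the 14:25, 14:30 and 14:35 rows should remain. Note that the window functions have to be computed in a subquery, because a WHERE clause cannot reference a window function from the same query level.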

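Another possible solution is to apply two row_number() calls, one with ORDER BY timestamp_min ASC and the other with ORDER BY timestamp_min DESC, and then keep only the rows where both are <> 1. But that requires two sorts of the data set (one ASC, one DESC), whereas the lag/lead solution needs only one (although it may be harder to understand).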
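
A rough sketch of that row_number() variant, again on hard-coded sample rows rather than the real table. The names buckets, rn_asc and rn_desc are assumptions for illustration, and it assumes timestamp_min is unique per row, which holds after the GROUP BY when a single type is selected.

```sql
-- Hypothetical stand-in for the grouped 5-minute buckets.
WITH buckets(timestamp_min, sum_first_medium) AS (
    VALUES
        (TIMESTAMPTZ '2017-12-19 14:20:00+00', 38),
        (TIMESTAMPTZ '2017-12-19 14:25:00+00', 37),
        (TIMESTAMPTZ '2017-12-19 14:30:00+00', 39),
        (TIMESTAMPTZ '2017-12-19 14:35:00+00', 42),
        (TIMESTAMPTZ '2017-12-19 14:40:00+00', 43)
)
SELECT numbered.timestamp_min,
       numbered.sum_first_medium
FROM (
    SELECT b.*,
           -- 1 marks the earliest bucket
           row_number() OVER (ORDER BY b.timestamp_min ASC)  AS rn_asc,
           -- 1 marks the latest bucket
           row_number() OVER (ORDER BY b.timestamp_min DESC) AS rn_desc
    FROM buckets b
) numbered
WHERE numbered.rn_asc <> 1
  AND numbered.rn_desc <> 1
ORDER BY numbered.timestamp_min;
```

On the sample rows this should return the same three middle buckets as the lag/lead version.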