How to get a count of records by minute using a datetime column

I have a table with the following columns:

Customer Time_Start
A 01/20/2020 01:25:00
A 01/22/2020 14:15:00
A 01/20/2020 03:23:00
A 01/21/2020 20:37:00

I am trying to get a table that outputs the record count per minute (including zeros) for a given day.

Customer Time_Start Count
A 01/20/2020 00:01:00 5
A 01/20/2020 00:02:00 2
A 01/20/2020 00:03:00 0
A 01/20/2020 00:04:00 12

I want it to show only 1 day for 1 customer at a time.

This is what I have so far:

select 
customer,
cast(time_start as time) + ((cast(time_start as time) - cast('00:00:00' as time)) hour(2)) as TimeStampHour,
count(*) as count
from    Table_1
where   customer in ('A')
group by    customer, TimeStampHour
order by    TimeStampHour

First, we create a function that returns a table with a single column. Its rows start at the time_start value and increase in one-minute steps.

CREATE OR REPLACE FUNCTION get_dates(time_start timestamp without time zone)
 RETURNS TABLE(list_dates timestamp without time zone)
 LANGUAGE plpgsql
AS $function$
declare
    time_end timestamp;
begin
    time_end = time_start + interval '1 day';
    
    return query 
    SELECT t1.dates
    FROM   generate_series(time_start, time_end, interval '1 min') t1(dates);
        
END;
$function$
;
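
As a quick check of what this function produces, the same generate_series call can be run on its own. This is only a sketch with an arbitrary start value. Because generate_series includes both endpoints, a one-day range in one-minute steps yields 1441 rows, the last of which is midnight of the following day:

```sql
-- Same range as get_dates() builds internally, for an arbitrary example start time.
SELECT count(*)   AS minute_rows,   -- 1441: 1440 minutes of the day plus the end point
       min(dates) AS first_minute,  -- 2021-02-03 00:00:00
       max(dates) AS last_minute    -- 2021-02-04 00:00:00
FROM   generate_series(
           TIMESTAMP '2021-02-03 00:00:00',
           TIMESTAMP '2021-02-03 00:00:00' + INTERVAL '1 day',
           INTERVAL '1 min'
       ) AS t1(dates);
```

If exactly one calendar day is wanted, ending the series one minute earlier (for example `time_end - interval '1 min'`) would drop that extra row.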

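Note that the timestamps returned by this function carry an hour and a minute, but their seconds and milliseconds are always zero. We have to join this function's result to our customer table on the timestamp field. However, the timestamp field in the customer table holds full DateTime values, with non-zero seconds and milliseconds, so a plain equality join would not match. What we need is for every value of the customer table's timestamp field to have zero seconds and zero milliseconds so that the join works. For this, I created an immutable function: when you pass it a full DateTime such as 2021-02-03 18:24:51.203, it returns the timestamp with seconds and milliseconds zeroed, in this format: 2021-02-03 18:24:00.000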

CREATE OR REPLACE FUNCTION clear_seconds_from_datetime(dt timestamp without time zone)
 RETURNS timestamp without time zone
 LANGUAGE plpgsql
 IMMUTABLE
AS $function$
DECLARE
    str_date text;
    str_hour text; 
    str_min text;
    v_ret timestamp; 
begin

    str_date = (dt::date)::text; 
    str_hour = (extract(hour from dt))::text;
    str_min = (extract(min from dt))::text;

    str_date = str_date || ' ' || str_hour || ':' || str_min || ':' || '00';
    v_ret = str_date::timestamp; 
    return v_ret; 

END;
$function$
;
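
As a possible alternative to the hand-written function, PostgreSQL's built-in date_trunc performs the same truncation. For a `timestamp without time zone` argument it is immutable, so it can also be used in an expression index. The sketch below assumes `datestamp` is of that type. The index name is made up for illustration:

```sql
-- Truncate a full timestamp to the start of its minute.
SELECT date_trunc('minute', TIMESTAMP '2021-02-03 18:24:51.203') AS truncated;
-- truncated: 2021-02-03 18:24:00

-- Hypothetical expression index on the truncated value
-- (assumes customer.datestamp is timestamp without time zone).
CREATE INDEX customer_date_minute_idx
    ON customer USING btree ((date_trunc('minute', datestamp)));
```

An expression index is only considered when the query uses the same expression, so the join condition would then have to use `date_trunc('minute', cc.datestamp)` as well.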

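Why is our function immutable? For performance, I want to use a function-based (expression) index in PostgreSQL, and PostgreSQL only allows immutable functions in an index. We use this function in the join condition. Here is how to create the function-based index: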

CREATE INDEX customer_date_function_idx ON customer USING btree ((clear_seconds_from_datetime(datestamp)));

Now let's write the main query we need:

select 
    t1.list_dates, 
    cc.userid, 
    count(cc.id) as count_as
from 
    get_dates('2021-02-03 00:01:00'::timestamp) t1 (list_dates)
left join 
    customer cc on clear_seconds_from_datetime(cc.datestamp) = t1.list_dates
group by t1.list_dates, cc.userid

I inserted 30 million sample rows into the customer table for testing. Even so, this query ran in 33 milliseconds. Here is the query plan produced by explain analyze:

HashAggregate  (cost=309264.30..309432.30 rows=16800 width=20) (actual time=26.076..26.958 rows=2022 loops=1)
  Group Key: t1.list_dates, lg.userid
  ->  Nested Loop Left Join  (cost=0.68..253482.75 rows=7437540 width=16) (actual time=18.870..24.882 rows=2023 loops=1)
        ->  Function Scan on get_dates t1  (cost=0.25..10.25 rows=1000 width=8) (actual time=18.699..18.906 rows=1441 loops=1)
        ->  Index Scan using log_date_cast_idx on log lg  (cost=0.43..179.09 rows=7438 width=16) (actual time=0.003..0.003 rows=1 loops=1441)
              Index Cond: (clear_seconds_from_datetime(datestamp) = t1.list_dates)
Planning Time: 0.398 ms
JIT:
  Functions: 12
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 3.544 ms, Inlining 0.000 ms, Optimization 0.816 ms, Emission 16.709 ms, Total 21.069 ms
Execution Time: 31.429 ms

Result:

list_dates group_count
2021-02-03 09:41:00.000 1
2021-02-03 09:42:00.000 3
2021-02-03 09:43:00.000 1
2021-02-03 09:44:00.000 3
2021-02-03 09:45:00.000 1
2021-02-03 09:46:00.000 5
2021-02-03 09:47:00.000 2
2021-02-03 09:48:00.000 1
2021-02-03 09:49:00.000 1
2021-02-03 09:50:00.000 1
2021-02-03 09:51:00.000 4
2021-02-03 09:52:00.000 0
2021-02-03 09:53:00.000 0
2021-02-03 09:54:00.000 2
2021-02-03 09:55:00.000 1
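
The main query above counts rows for every user. For the single-customer case in the question, the filter would have to go into the join condition rather than a WHERE clause, otherwise the minutes without any rows disappear from the result. A minimal sketch, reusing get_dates and clear_seconds_from_datetime from above and assuming that `userid` is an integer identifying the customer (the value 1 is made up):

```sql
SELECT
    t1.list_dates,
    count(cc.id) AS count_as   -- count(cc.id) skips the NULLs of unmatched minutes, giving 0
FROM
    get_dates('2021-02-03 00:00:00'::timestamp) t1 (list_dates)
LEFT JOIN
    customer cc
    ON  clear_seconds_from_datetime(cc.datestamp) = t1.list_dates
    AND cc.userid = 1          -- hypothetical customer id; filter in ON, not in WHERE
GROUP BY t1.list_dates
ORDER BY t1.list_dates;
```

Putting `cc.userid = 1` into a WHERE clause would turn the left join into an inner join in effect, because the NULL userid of an empty minute never satisfies the filter.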

In Teradata 16.20, with the new time series aggregation, this would be a simple task:
SELECT 
   customer
  ,Cast(Begin($Td_TimeCode_Range) AS VARCHAR(16))
  ,Count(*)
FROM table_1
WHERE customer = 'A'
  AND time_start BETWEEN TIMESTAMP'2020-01-20 00:00:00' 
                     AND Prior(TIMESTAMP'2020-01-20 00:00:00' + INTERVAL '1' DAY)
GROUP BY TIME(Minutes(1) AND customer) 
         USING timecode(time_start)
         FILL (0)
;

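In earlier versions you have to implement it along the lines of Ramin Faracov's answer: first create a list of all minutes, then left join to it. But I prefer to do the aggregation before joining: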

WITH all_minutes AS
 ( -- create a list of all minutes
   SELECT 
      Begin(pd) AS bucket
   FROM 
    ( -- EXPAND ON requires FROM and TRUNC materializes the FROM avoiding error
      -- "9303. EXPAND ON clause must not be specified in a query expression with no table references."
      SELECT Cast(Trunc(DATE '2020-01-20') AS TIMESTAMP(0)) AS start_date
    ) AS dt
   EXPAND ON PERIOD(start_date, start_date + INTERVAL '1' DAY) AS pd
          BY INTERVAL '1' MINUTE  
 )
SELECT customer
  ,bucket
  ,Coalesce(Cnt, 0)
FROM all_minutes
LEFT JOIN
 (
   SELECT customer
     ,time_start
      - (Extract (SECOND From time_start) * INTERVAL '1' SECOND) AS time_minute
      ,Count(*) AS Cnt
   FROM table_1
   WHERE customer = 'A'
     AND time_start BETWEEN TIMESTAMP'2020-01-20 00:00:00' 
                        AND Prior(TIMESTAMP'2020-01-20 00:00:00' + INTERVAL '1' DAY)
   GROUP BY customer, time_minute
 ) AS counts
ON counts.time_minute = bucket 
ORDER BY bucket
;