timescaledb postgresql performance issue follow up 2

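This is a follow-up to my earlier question, "timescaledb postgresql performance issue", in which the chunking problem was resolved.

I need help with the following problem. I have a table with 67 million rows that looks like this: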

   SELECT * FROM db_009a005a_df_downloaded_grand order by "timestamp" desc limit 1000;

symbol       timestamp            volume              close    high     low      open
ANT_USDT     2021-08-31 19:55:00  13.198              5.111    5.123    5.11     5.123
FET_USDT     2021-08-31 19:55:00  26.443781800000004  0.7253   0.7255   0.7224   0.7246
ONC_USDT     2021-08-31 19:55:00  47.89               0.363    0.3633   0.3628   0.3633
FSN_USDT     2021-08-31 19:55:00  1044.8977859        0.5454   0.5509   0.5454   0.5499
PCX_USDT     2021-08-31 19:55:00  1158.901            3.926    3.934    3.913    3.925
PRQ_USDT     2021-08-31 19:55:00  681.83529405        0.6791   0.6807   0.6757   0.6805
CREDIT_USDT  2021-08-31 19:55:00  3045.81454624       0.10573  0.10662  0.10567  0.10567
JFI_USDT     2021-08-31 19:55:00  0.6434              47.08    47.1     46.91    46.91
ENJ_USDT     2021-08-31 19:55:00  2613.32204107       2.018    2.0315   2.018    2.0294

I have indexes that I believe were created correctly:

tablename: db_009a005a_df_downloaded_grand

indexname: db_009a005a_df_downloaded_grand2_symbol_timestamp_idx
indexdef:  CREATE INDEX db_009a005a_df_downloaded_grand2_symbol_timestamp_idx ON public.db_009a005a_df_downloaded_grand USING btree (symbol, "timestamp" DESC)

indexname: db_009a005a_df_downloaded_grand2_timestamp_idx
indexdef:  CREATE INDEX db_009a005a_df_downloaded_grand2_timestamp_idx ON public.db_009a005a_df_downloaded_grand USING btree ("timestamp" DESC)

A common query I run checks the latest timestamp for each symbol:

SELECT symbol, max("timestamp") FROM db_009a005a_df_downloaded_grand group by symbol;

symbol        max
100X_USDT     2021-08-31 19:55:00
10SET_USDT    2021-08-31 19:55:00
1INCH3L_USDT  2021-08-31 19:20:00
1INCH3S_USDT  2021-08-31 19:10:00
1INCH_USDT    2021-08-31 19:55:00
88MPH_USDT    2021-08-31 19:55:00
A5T_USDT      2021-08-31 19:55:00
AAVE3L_USDT   2021-08-31 19:55:00
AAVE3S_USDT   2021-08-31 19:30:00

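However, this takes more than 15 seconds to return 1000 rows... Is that speed reasonable? Is there any way I can make it faster? I am on a dedicated AWS instance with 8 cores and 32 GB of memory...

The EXPLAIN ANALYZE output is as follows: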

explain analyze
select symbol, max("timestamp") from public.db_009a005a_df_downloaded_grand db2 group by symbol;

QUERY PLAN
Finalize GroupAggregate  (cost=1102706.72..1103220.75 rows=1010 width=17) (actual time=6328.788..6385.198 rows=1028 loops=1)
  Group Key: db2_1.symbol
  ->  Gather Merge  (cost=1102706.72..1103190.45 rows=4040 width=17) (actual time=6328.770..6383.885 rows=4952 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        ->  Sort  (cost=1101706.66..1101709.19 rows=1010 width=17) (actual time=6286.324..6286.414 rows=990 loops=5)
              Sort Key: db2_1.symbol
              Sort Method: quicksort  Memory: 100kB
              Worker 0:  Sort Method: quicksort  Memory: 100kB
              Worker 1:  Sort Method: quicksort  Memory: 103kB
              Worker 2:  Sort Method: quicksort  Memory: 103kB
              Worker 3:  Sort Method: quicksort  Memory: 100kB
              ->  Partial HashAggregate  (cost=1101646.16..1101656.26 rows=1010 width=17) (actual time=6284.663..6284.886 rows=990 loops=5)
                    Group Key: db2_1.symbol
                    Batches: 1  Memory Usage: 193kB
                    Worker 0:  Batches: 1  Memory Usage: 193kB
                    Worker 1:  Batches: 1  Memory Usage: 193kB
                    Worker 2:  Batches: 1  Memory Usage: 193kB
                    Worker 3:  Batches: 1  Memory Usage: 193kB
                    ->  Parallel Append  (cost=0.00..1017326.87 rows=16863858 width=17) (actual time=190.496..3557.402 rows=13491444 loops=5)
                          ->  Parallel Seq Scan on _hyper_68_2367_chunk db2_1  (cost=0.00..772558.91 rows=13957491 width=17) (actual time=63.172..1792.108 rows=11166350 loops=5)
                          ->  Parallel Seq Scan on _hyper_68_2368_chunk db2_2  (cost=0.00..160448.67 rows=2906367 width=17) (actual time=212.293..858.803 rows=3875156 loops=3)
Planning Time: 0.230 ms
JIT:
  Functions: 48
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 7.376 ms, Inlining 338.267 ms, Optimization 394.122 ms, Emission 218.212 ms, Total 957.977 ms
Execution Time: 6387.027 ms

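You can use an index skip scan. Since PostgreSQL does not implement them natively, you can emulate one using a recursive CTE. This doesn't require timescaledb or partitioning at all (in fact, they might interfere with it; I don't know).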
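To make the idea concrete, here is a minimal sketch of the skip-scan emulation on a toy table. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so it can be run anywhere; the table name `candles`, the index name, and the sample rows are made up for illustration. SQLite does not accept the parenthesized `ORDER BY ... LIMIT 1` seed used in PostgreSQL, so the seed here is `min(symbol)` instead. The recursive part hops from one symbol to the next larger one, and a correlated subquery then fetches the newest timestamp per symbol.

```python
import sqlite3

# Hypothetical toy table standing in for the real 67M-row table.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE candles (symbol TEXT, "timestamp" TEXT, close REAL)')
conn.execute(
    'CREATE INDEX candles_symbol_timestamp_idx '
    'ON candles (symbol, "timestamp" DESC)'
)
conn.executemany(
    "INSERT INTO candles VALUES (?, ?, ?)",
    [
        ("ANT_USDT", "2021-08-31 19:50:00", 5.120),
        ("ANT_USDT", "2021-08-31 19:55:00", 5.111),
        ("ENJ_USDT", "2021-08-31 19:55:00", 2.018),
        ("FET_USDT", "2021-08-31 19:45:00", 0.7246),
        ("FET_USDT", "2021-08-31 19:55:00", 0.7253),
    ],
)

# Emulated skip scan: start at the smallest symbol, then repeatedly jump
# to the next symbol greater than the current one. The scalar subquery
# yields NULL once no larger symbol exists, which ends the recursion.
SKIP_SCAN = """
WITH RECURSIVE t(symbol) AS (
    SELECT min(symbol) FROM candles
    UNION ALL
    SELECT (SELECT symbol FROM candles
            WHERE symbol > t.symbol
            ORDER BY symbol LIMIT 1)
    FROM t
    WHERE t.symbol IS NOT NULL
)
SELECT symbol,
       (SELECT "timestamp" FROM candles c
        WHERE c.symbol = t.symbol
        ORDER BY "timestamp" DESC LIMIT 1) AS latest
FROM t
WHERE symbol IS NOT NULL
ORDER BY symbol
"""

# The straightforward query from the question, used as a reference result.
GROUP_BY = """
SELECT symbol, max("timestamp") FROM candles GROUP BY symbol ORDER BY symbol
"""

skip = conn.execute(SKIP_SCAN).fetchall()
baseline = conn.execute(GROUP_BY).fetchall()
print(skip)
print(skip == baseline)
```

On this toy data both queries return the same rows. The point of the rewrite is that each step is a single index lookup rather than a scan of every row, which should matter when there are about a thousand symbols and tens of millions of rows.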

Alternatively, if you have a table of all the symbols somewhere, you can do a lateral join against it.

 SELECT * FROM symbols left join lateral 
   (select "timestamp" from df where df.symbol=symbols.symbol order by "timestamp" desc limit 1) as latest on true;  -- subquery alias is required before PostgreSQL 16

Thanks to janes, the final solution is:

with symbol_set as (
    WITH RECURSIVE t AS (
       (SELECT symbol FROM db_009a005a_df_downloaded_grand ORDER BY symbol LIMIT 1)  -- parentheses required
       UNION ALL
       SELECT (SELECT symbol FROM db_009a005a_df_downloaded_grand WHERE symbol > t.symbol ORDER BY symbol LIMIT 1)
       FROM t
       WHERE t.symbol IS NOT NULL
       )
    SELECT symbol FROM t WHERE symbol IS NOT NULL
)
 SELECT * FROM symbol_set as db2 left join lateral 
   (select "timestamp" from db_009a005a_df_downloaded_grand as db3 where db2.symbol=db3.symbol order by db3."timestamp" desc limit 1) as db4 on true;

This takes 70 ms in total, compared with 10+ s before.