timescaledb postgresql performance issue follow up 2

Question

This question is a follow-up to my previous question, "timescaledb postgresql performance issue", in which I solved the chunking problem.

I need some help. I have a table with 67 million rows that looks like this:

   SELECT * FROM db_009a005a_df_downloaded_grand order by "timestamp" desc limit 1000;

symbol	timestamp	volume	close	high	low	open
ANT_USDT	2021-08-31 19:55:00	13.198	5.111	5.123	5.11	5.123
FET_USDT	2021-08-31 19:55:00	26.443781800000004	0.7253	0.7255	0.7224	0.7246
ONC_USDT	2021-08-31 19:55:00	47.89	0.363	0.3633	0.3628	0.3633
FSN_USDT	2021-08-31 19:55:00	1044.8977859	0.5454	0.5509	0.5454	0.5499
PCX_USDT	2021-08-31 19:55:00	1158.901	3.926	3.934	3.913	3.925
PRQ_USDT	2021-08-31 19:55:00	681.83529405	0.6791	0.6807	0.6757	0.6805
CREDIT_USDT	2021-08-31 19:55:00	3045.81454624	0.10573	0.10662	0.10567	0.10567
JFI_USDT	2021-08-31 19:55:00	0.6434	47.08	47.1	46.91	46.91
ENJ_USDT	2021-08-31 19:55:00	2613.32204107	2.018	2.0315	2.018	2.0294

I have indexes that I believe were created correctly:

tablename	indexname	indexdef
db_009a005a_df_downloaded_grand	db_009a005a_df_downloaded_grand2_symbol_timestamp_idx	CREATE INDEX db_009a005a_df_downloaded_grand2_symbol_timestamp_idx ON public.db_009a005a_df_downloaded_grand USING btree (symbol, "timestamp" DESC)
db_009a005a_df_downloaded_grand	db_009a005a_df_downloaded_grand2_timestamp_idx	CREATE INDEX db_009a005a_df_downloaded_grand2_timestamp_idx ON public.db_009a005a_df_downloaded_grand USING btree ("timestamp" DESC)

A common query is checking the latest timestamp for each symbol:

SELECT symbol, max("timestamp") FROM db_009a005a_df_downloaded_grand group by symbol;

symbol	max
100X_USDT	2021-08-31 19:55:00
10SET_USDT	2021-08-31 19:55:00
1INCH3L_USDT	2021-08-31 19:20:00
1INCH3S_USDT	2021-08-31 19:10:00
1INCH_USDT	2021-08-31 19:55:00
88MPH_USDT	2021-08-31 19:55:00
A5T_USDT	2021-08-31 19:55:00
AAVE3L_USDT	2021-08-31 19:55:00
AAVE3S_USDT	2021-08-31 19:30:00

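However, this takes more than 15 seconds to return about 1000 rows. Is that a reasonable speed? Is there any way I can make it faster? I'm on a dedicated AWS instance with 8 cores and 32 GB of RAM.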

The EXPLAIN ANALYZE output is:

explain analyze
select symbol , max("timestamp") from public.db_009a005a_df_downloaded_grand db2 group by symbol ;

QUERY PLAN
Finalize GroupAggregate (cost=1102706.72..1103220.75 rows=1010 width=17) (actual time=6328.788..6385.198 rows=1028 loops=1)
  Group Key: db2_1.symbol
  -> Gather Merge (cost=1102706.72..1103190.45 rows=4040 width=17) (actual time=6328.770..6383.885 rows=4952 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        -> Sort (cost=1101706.66..1101709.19 rows=1010 width=17) (actual time=6286.324..6286.414 rows=990 loops=5)
              Sort Key: db2_1.symbol
              Sort Method: quicksort Memory: 100kB
              Worker 0: Sort Method: quicksort Memory: 100kB
              Worker 1: Sort Method: quicksort Memory: 103kB
              Worker 2: Sort Method: quicksort Memory: 103kB
              Worker 3: Sort Method: quicksort Memory: 100kB
              -> Partial HashAggregate (cost=1101646.16..1101656.26 rows=1010 width=17) (actual time=6284.663..6284.886 rows=990 loops=5)
                    Group Key: db2_1.symbol
                    Batches: 1 Memory Usage: 193kB
                    Worker 0: Batches: 1 Memory Usage: 193kB
                    Worker 1: Batches: 1 Memory Usage: 193kB
                    Worker 2: Batches: 1 Memory Usage: 193kB
                    Worker 3: Batches: 1 Memory Usage: 193kB
                    -> Parallel Append (cost=0.00..1017326.87 rows=16863858 width=17) (actual time=190.496..3557.402 rows=13491444 loops=5)
                          -> Parallel Seq Scan on _hyper_68_2367_chunk db2_1 (cost=0.00..772558.91 rows=13957491 width=17) (actual time=63.172..1792.108 rows=11166350 loops=5)
                          -> Parallel Seq Scan on _hyper_68_2368_chunk db2_2 (cost=0.00..160448.67 rows=2906367 width=17) (actual time=212.293..858.803 rows=3875156 loops=3)
Planning Time: 0.230 ms
JIT:
  Functions: 48
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 7.376 ms, Inlining 338.267 ms, Optimization 394.122 ms, Emission 218.212 ms, Total 957.977 ms
Execution Time: 6387.027 ms

Answer 1

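You can use an index skip scan. Since PostgreSQL does not implement skip scans natively, you can emulate one with a recursive CTE. This doesn't need TimescaleDB or partitioning at all (in fact, they might interfere with it; I don't know).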

Alternatively, if you have a table of all symbols somewhere, you can do a lateral join against it:

 -- symbols: a table with one row per symbol; df: the data table
 SELECT * FROM symbols left join lateral 
   (select "timestamp" from df where df.symbol=symbols.symbol order by "timestamp" desc limit 1) as latest on true;
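If no such table exists yet, one could be built once and afterwards kept current by whatever code ingests new symbols. A minimal sketch, assuming a new table named `symbols` (hypothetical, not part of the original schema) and substituting the question's real table name for the placeholder `df`:

```sql
-- Hypothetical lookup table "symbols": one row per trading symbol.
-- CREATE TABLE AS copies the column type of "symbol" from the hypertable.
-- This one-time fill scans the whole hypertable, so expect it to be roughly
-- as slow as the original GROUP BY query.
CREATE TABLE symbols AS
SELECT DISTINCT symbol
FROM db_009a005a_df_downloaded_grand
WHERE symbol IS NOT NULL;

ALTER TABLE symbols ADD PRIMARY KEY (symbol);

-- Latest timestamp per symbol: one ORDER BY ... LIMIT 1 probe per symbol,
-- which can be served by the existing (symbol, "timestamp" DESC) index.
SELECT s.symbol, latest."timestamp"
FROM symbols AS s
LEFT JOIN LATERAL (
    SELECT d."timestamp"
    FROM db_009a005a_df_downloaded_grand AS d
    WHERE d.symbol = s.symbol
    ORDER BY d."timestamp" DESC
    LIMIT 1
) AS latest ON true
ORDER BY s.symbol;
```

The trade-off is that new symbols must be added as they appear, for example with `INSERT INTO symbols (symbol) VALUES ($1) ON CONFLICT DO NOTHING` issued from the ingestion path (`$1` being the symbol parameter); the recursive-CTE approach avoids that bookkeeping.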

Answer 2

Thanks to janes, the final solution was:

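with symbol_set as (
    -- Emulated index skip scan: walk the distinct symbols in order, so each
    -- step can be answered from the (symbol, "timestamp" DESC) index.
    WITH RECURSIVE t AS (
       -- seed: the smallest symbol
       (SELECT symbol FROM db_009a005a_df_downloaded_grand ORDER BY symbol LIMIT 1)  -- parentheses required
       UNION ALL
       -- step: the next symbol greater than the previous one (NULL when none is left)
       SELECT (SELECT symbol FROM db_009a005a_df_downloaded_grand WHERE symbol > t.symbol ORDER BY symbol LIMIT 1)
       FROM t
       WHERE t.symbol IS NOT NULL
       )
    -- drop the trailing NULL row that ends the recursion
    SELECT symbol FROM t WHERE symbol IS NOT NULL
)
 -- for each symbol, fetch only its latest timestamp
 SELECT * FROM symbol_set as db2 left join lateral 
   (select "timestamp" from db_009a005a_df_downloaded_grand as db3 where db2.symbol=db3.symbol order by db3."timestamp" desc limit 1) as db4 on true;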

It took 70 ms in total, compared with more than 10 s before.
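To see why the rewrite is so much faster, the per-symbol probe inside the LATERAL subquery can be inspected on its own. A sketch, using `ANT_USDT` from the sample rows as an arbitrary symbol: if the `(symbol, "timestamp" DESC)` index is being used, one would expect index scans on the chunks in place of the `Parallel Seq Scan` nodes from the GROUP BY plan, and far fewer buffers read.

```sql
-- Plan for a single per-symbol probe ('ANT_USDT' picked arbitrarily from the sample data).
-- BUFFERS reports the pages touched, to compare against the full scans in the GROUP BY plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT "timestamp"
FROM db_009a005a_df_downloaded_grand
WHERE symbol = 'ANT_USDT'
ORDER BY "timestamp" DESC
LIMIT 1;
```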
