TimescaleDB PostgreSQL performance issue follow-up 2
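This question is a follow-up to my earlier question, timescaledb postgresql performance issue, where the chunking problem was solved.
I need some help: I have a table with 67 million rows that looks like this: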
SELECT * FROM db_009a005a_df_downloaded_grand order by "timestamp" desc limit 1000;
symbol | timestamp | volume | close | high | low | open |
---|---|---|---|---|---|---|
ANT_USDT | 2021-08-31 19:55:00 | 13.198 | 5.111 | 5.123 | 5.11 | 5.123 |
FET_USDT | 2021-08-31 19:55:00 | 26.443781800000004 | 0.7253 | 0.7255 | 0.7224 | 0.7246 |
ONC_USDT | 2021-08-31 19:55:00 | 47.89 | 0.363 | 0.3633 | 0.3628 | 0.3633 |
FSN_USDT | 2021-08-31 19:55:00 | 1044.8977859 | 0.5454 | 0.5509 | 0.5454 | 0.5499 |
PCX_USDT | 2021-08-31 19:55:00 | 1158.901 | 3.926 | 3.934 | 3.913 | 3.925 |
PRQ_USDT | 2021-08-31 19:55:00 | 681.83529405 | 0.6791 | 0.6807 | 0.6757 | 0.6805 |
CREDIT_USDT | 2021-08-31 19:55:00 | 3045.81454624 | 0.10573 | 0.10662 | 0.10567 | 0.10567 |
JFI_USDT | 2021-08-31 19:55:00 | 0.6434 | 47.08 | 47.1 | 46.91 | 46.91 |
ENJ_USDT | 2021-08-31 19:55:00 | 2613.32204107 | 2.018 | 2.0315 | 2.018 | 2.0294 |
I have indexes that I believe were created correctly:
tablename | indexname | indexdef |
---|---|---|
db_009a005a_df_downloaded_grand | db_009a005a_df_downloaded_grand2_symbol_timestamp_idx | CREATE INDEX db_009a005a_df_downloaded_grand2_symbol_timestamp_idx ON public.db_009a005a_df_downloaded_grand USING btree (symbol, "timestamp" DESC) |
db_009a005a_df_downloaded_grand | db_009a005a_df_downloaded_grand2_timestamp_idx | CREATE INDEX db_009a005a_df_downloaded_grand2_timestamp_idx ON public.db_009a005a_df_downloaded_grand USING btree ("timestamp" DESC) |
A common query is checking the latest timestamp for each symbol:
SELECT symbol, max("timestamp") FROM db_009a005a_df_downloaded_grand group by symbol;
symbol | max |
---|---|
100X_USDT | 2021-08-31 19:55:00 |
10SET_USDT | 2021-08-31 19:55:00 |
1INCH3L_USDT | 2021-08-31 19:20:00 |
1INCH3S_USDT | 2021-08-31 19:10:00 |
1INCH_USDT | 2021-08-31 19:55:00 |
88MPH_USDT | 2021-08-31 19:55:00 |
A5T_USDT | 2021-08-31 19:55:00 |
AAVE3L_USDT | 2021-08-31 19:55:00 |
AAVE3S_USDT | 2021-08-31 19:30:00 |
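However, this takes more than 15 seconds to return about 1000 rows. Is that a reasonable speed? Is there any way to make it faster? I'm on a dedicated AWS instance with 8 cores and 32 GB of RAM.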
The EXPLAIN ANALYZE output is as follows:
explain analyze
select symbol , max("timestamp") from public.db_009a005a_df_downloaded_grand db2 group by symbol ;
QUERY PLAN
Finalize GroupAggregate (cost=1102706.72..1103220.75 rows=1010 width=17) (actual time=6328.788..6385.198 rows=1028 loops=1)
Group Key: db2_1.symbol
-> Gather Merge (cost=1102706.72..1103190.45 rows=4040 width=17) (actual time=6328.770..6383.885 rows=4952 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Sort (cost=1101706.66..1101709.19 rows=1010 width=17) (actual time=6286.324..6286.414 rows=990 loops=5)
Sort Key: db2_1.symbol
Sort Method: quicksort Memory: 100kB
Worker 0: Sort Method: quicksort Memory: 100kB
Worker 1: Sort Method: quicksort Memory: 103kB
Worker 2: Sort Method: quicksort Memory: 103kB
Worker 3: Sort Method: quicksort Memory: 100kB
-> Partial HashAggregate (cost=1101646.16..1101656.26 rows=1010 width=17) (actual time=6284.663..6284.886 rows=990 loops=5)
Group Key: db2_1.symbol
Batches: 1 Memory Usage: 193kB
Worker 0: Batches: 1 Memory Usage: 193kB
Worker 1: Batches: 1 Memory Usage: 193kB
Worker 2: Batches: 1 Memory Usage: 193kB
Worker 3: Batches: 1 Memory Usage: 193kB
-> Parallel Append (cost=0.00..1017326.87 rows=16863858 width=17) (actual time=190.496..3557.402 rows=13491444 loops=5)
-> Parallel Seq Scan on _hyper_68_2367_chunk db2_1 (cost=0.00..772558.91 rows=13957491 width=17) (actual time=63.172..1792.108 rows=11166350 loops=5)
-> Parallel Seq Scan on _hyper_68_2368_chunk db2_2 (cost=0.00..160448.67 rows=2906367 width=17) (actual time=212.293..858.803 rows=3875156 loops=3)
Planning Time: 0.230 ms
JIT:
Functions: 48
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 7.376 ms, Inlining 338.267 ms, Optimization 394.122 ms, Emission 218.212 ms, Total 957.977 ms
Execution Time: 6387.027 ms
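A quick sanity check on the plan above, with the numbers copied from the output: in EXPLAIN ANALYZE, the `rows` figure of a node with `loops` > 1 is a per-loop average, so multiplying it by `loops` approximates the node's total. Doing that shows the GROUP BY query reads essentially every row of the roughly 67-million-row table through sequential scans, so neither index is being used for it.

```python
# Per-loop "rows" and "loops" copied from the EXPLAIN ANALYZE output above.
PLAN_NODES = {
    "Parallel Append": (13_491_444, 5),
    "Parallel Seq Scan on _hyper_68_2367_chunk": (11_166_350, 5),
    "Parallel Seq Scan on _hyper_68_2368_chunk": (3_875_156, 3),
}


def total_rows(rows_per_loop: int, loops: int) -> int:
    """Approximate total rows a node produced: the per-loop average times loops."""
    return rows_per_loop * loops


append_total = total_rows(*PLAN_NODES["Parallel Append"])
chunk_total = sum(
    total_rows(*PLAN_NODES[name])
    for name in PLAN_NODES
    if name.startswith("Parallel Seq Scan")
)

print(f"Parallel Append total: {append_total:,}")  # Parallel Append total: 67,457,220
print(f"Sum over both chunks:  {chunk_total:,}")   # Sum over both chunks:  67,457,218
```

The two totals differ by 2 only because the per-loop figures are rounded averages.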
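You can use an index skip scan. Since PostgreSQL doesn't implement them natively, you can emulate one using a recursive CTE. This doesn't require TimescaleDB or partitioning at all (in fact, they might interfere with it; I'm not sure).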
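To see why a skip scan is cheap, here is a toy in-memory model. It is an illustration, not how PostgreSQL works internally: a sorted Python list stands in for the B-tree index on (symbol, "timestamp" DESC), and `bisect` stands in for a descent of that tree. The sample rows are made up. Each step jumps to the first entry of the next symbol, which is also that symbol's newest row, so the work grows with the number of distinct symbols (the plan above reports 1028 groups) rather than with the number of rows.

```python
import bisect

# Hypothetical sample data: (symbol, unix_timestamp) pairs, one row every
# 5 minutes per symbol. The symbol names are borrowed from the question.
SYMBOLS = ["1INCH_USDT", "AAVE3S_USDT", "ANT_USDT", "FET_USDT", "JFI_USDT"]
BASE_TS = 1_630_000_000
rows = [(sym, BASE_TS + 300 * k) for i, sym in enumerate(SYMBOLS) for k in range(1_000 + i)]

# Model of an index on (symbol, "timestamp" DESC): keys kept in index order.
# Negating the timestamp makes ascending order put the newest row first
# within each symbol, like the DESC column in the real index.
index = sorted((sym, -ts) for sym, ts in rows)


def latest_by_skip_scan(index):
    """Return ({symbol: latest_ts}, steps), taking one binary-search step per symbol."""
    latest = {}
    steps = 0
    pos = 0
    while pos < len(index):
        steps += 1
        symbol, neg_ts = index[pos]  # first key of a symbol group is its newest row
        latest[symbol] = -neg_ts
        # Jump to the first key of the next symbol: (symbol, inf) sorts after
        # every (symbol, -ts) key, so bisect_right lands just past this group.
        pos = bisect.bisect_right(index, (symbol, float("inf")))
    return latest, steps


def latest_by_full_scan(rows):
    """Return {symbol: latest_ts} by visiting every row, like the GROUP BY plan."""
    latest = {}
    for sym, ts in rows:
        if sym not in latest or ts > latest[sym]:
            latest[sym] = ts
    return latest


skip, steps = latest_by_skip_scan(index)
full = latest_by_full_scan(rows)
print(f"{len(index)} index entries, {steps} skip-scan steps")  # 5010 index entries, 5 skip-scan steps
print(skip == full)  # True
```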
Alternatively, if you have a table of all symbols somewhere, you can do a lateral join against it:
SELECT * FROM symbols LEFT JOIN LATERAL
(SELECT "timestamp" FROM df WHERE df.symbol = symbols.symbol ORDER BY "timestamp" DESC LIMIT 1) AS latest ON true;  -- a FROM-clause subquery needs an alias before PostgreSQL 16
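The lateral-join alternative can be tried out with Python's built-in `sqlite3` module. SQLite does not support LATERAL, so this sketch swaps in a correlated scalar subquery in the select list, which for a single returned column gives the same rows as `LEFT JOIN LATERAL (... LIMIT 1) ON true`. The `symbols` and `df` tables follow the answer's naming; the data, including the symbol NEW_USDT that has no rows at all, is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE df (symbol TEXT NOT NULL, "timestamp" TEXT NOT NULL);
    CREATE INDEX df_symbol_timestamp_idx ON df (symbol, "timestamp" DESC);
    CREATE TABLE symbols (symbol TEXT PRIMARY KEY);
""")
# Hypothetical price rows; ISO-formatted text timestamps sort chronologically.
conn.executemany("INSERT INTO df VALUES (?, ?)", [
    ("ANT_USDT", "2021-08-31 19:50:00"),
    ("ANT_USDT", "2021-08-31 19:55:00"),
    ("1INCH3S_USDT", "2021-08-31 19:05:00"),
    ("1INCH3S_USDT", "2021-08-31 19:10:00"),
])
# NEW_USDT is a hypothetical symbol with no rows in df.
conn.executemany("INSERT INTO symbols VALUES (?)",
                 [("ANT_USDT",), ("1INCH3S_USDT",), ("NEW_USDT",)])

# One index-backed "newest row" lookup per symbol, standing in for LATERAL.
query = """
    SELECT s.symbol,
           (SELECT df."timestamp" FROM df
             WHERE df.symbol = s.symbol
             ORDER BY df."timestamp" DESC
             LIMIT 1) AS latest
      FROM symbols AS s
     ORDER BY s.symbol
"""
result = conn.execute(query).fetchall()
print(result)
```

NEW_USDT comes back with NULL (`None` in Python), which is what the LEFT JOIN preserves; a plain `JOIN LATERAL ... ON true` would drop it.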
Thanks to janes, the final solution is:
with symbol_set as (
WITH RECURSIVE t AS (
(SELECT symbol FROM db_009a005a_df_downloaded_grand ORDER BY symbol LIMIT 1) -- parentheses required
UNION ALL
SELECT (SELECT symbol FROM db_009a005a_df_downloaded_grand WHERE symbol > t.symbol ORDER BY symbol LIMIT 1)
FROM t
WHERE t.symbol IS NOT NULL
)
SELECT symbol FROM t WHERE symbol IS NOT NULL
)
SELECT * FROM symbol_set as db2 left join lateral
(select "timestamp" from db_009a005a_df_downloaded_grand as db3 where db2.symbol=db3.symbol order by db3."timestamp" desc limit 1) as db4 on true;
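To check that the final query returns the same rows as the original GROUP BY, the sketch below runs an adaptation of it on a small made-up table in SQLite through Python's `sqlite3` module. Two changes are needed because SQLite's dialect differs from PostgreSQL's: the anchor of the recursive CTE is written as a scalar subquery (SQLite does not allow a parenthesized SELECT inside UNION ALL), and the LATERAL join is replaced with a correlated scalar subquery. This checks only the logic; SQLite timings say nothing about PostgreSQL performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE db_009a005a_df_downloaded_grand (
        symbol TEXT NOT NULL,
        "timestamp" TEXT NOT NULL
    );
    CREATE INDEX grand_symbol_timestamp_idx
        ON db_009a005a_df_downloaded_grand (symbol, "timestamp" DESC);
""")
# Hypothetical rows modeled on the question's output.
conn.executemany("INSERT INTO db_009a005a_df_downloaded_grand VALUES (?, ?)", [
    ("100X_USDT", "2021-08-31 19:50:00"),
    ("100X_USDT", "2021-08-31 19:55:00"),
    ("1INCH3L_USDT", "2021-08-31 19:15:00"),
    ("1INCH3L_USDT", "2021-08-31 19:20:00"),
    ("AAVE3S_USDT", "2021-08-31 19:30:00"),
])

SKIP_SCAN = """
WITH RECURSIVE t(symbol) AS (
    -- Anchor: the smallest symbol, as a scalar subquery (SQLite adaptation).
    SELECT (SELECT symbol FROM db_009a005a_df_downloaded_grand ORDER BY symbol LIMIT 1)
    UNION ALL
    -- Step: the next symbol after the previous one, NULL once none is left.
    SELECT (SELECT symbol FROM db_009a005a_df_downloaded_grand
             WHERE symbol > t.symbol ORDER BY symbol LIMIT 1)
      FROM t
     WHERE t.symbol IS NOT NULL
)
SELECT s.symbol,
       -- Correlated subquery standing in for LEFT JOIN LATERAL (... LIMIT 1).
       (SELECT db3."timestamp" FROM db_009a005a_df_downloaded_grand AS db3
         WHERE db3.symbol = s.symbol
         ORDER BY db3."timestamp" DESC LIMIT 1) AS "timestamp"
  FROM t AS s
 WHERE s.symbol IS NOT NULL
 ORDER BY s.symbol
"""
GROUP_BY = """
SELECT symbol, max("timestamp") FROM db_009a005a_df_downloaded_grand
 GROUP BY symbol ORDER BY symbol
"""

skip_scan = conn.execute(SKIP_SCAN).fetchall()
group_by = conn.execute(GROUP_BY).fetchall()
print(skip_scan)
print(skip_scan == group_by)  # True
```

On PostgreSQL, both the recursive step and the per-symbol lookup can be served by the (symbol, "timestamp" DESC) index, which is presumably where the drop to 70 ms comes from.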
This took 70 ms in total, compared with 10+ s before.