慢 WHERE IN 查询结束
Slow WHERE IN End of Query
来自我的示例数据 table test_table
:
date symbol value created_time
2010-01-09 symbol1 101 3847474847
2010-01-10 symbol1 102 3847474847
2010-01-10 symbol1 102.5 3847475500
2010-01-10 symbol2 204 3847474847
2010-01-11 symbol1 109 3847474847
2010-01-12 symbol1 105 3847474847
2010-01-12 symbol2 206 3847474847
下面是我目前在 table 上使用的查询,其中包含大约 3k 个唯一符号和大约 1100 万行。它仍然需要一段时间(超过 80% 的查询时间花在最后的子查询扫描上(我认为这是查询末尾的 WHERE IN 子句。有什么办法可以加快这部分的速度(想象我在那个 WHERE IN 查找中有很多符号。
select date, symbol, value, created_time
from (select *,
rank() over (partition by date, symbol order by created_time desc) as rownum
from test_table) x
where rownum = 1 and symbol in ('symbol1', 'symbol2', 'symbol5', ...)
如果有帮助,下面是 EXPLAIN ANALYZE 输出。
QUERY PLAN
Subquery Scan on x (cost=281573.35..282473.76 rows=129 width=37) (actual time=2874.389..3037.008 rows=32393 loops=1)
Filter: (x.rownum = 1)
Rows Removed by Filter: 183
-> WindowAgg (cost=281573.35..282152.19 rows=25726 width=37) (actual time=2874.363..2980.848 rows=32576 loops=1)
-> Sort (cost=281573.35..281637.67 rows=25726 width=37) (actual time=2874.340..2901.443 rows=32576 loops=1)
Sort Key: "test_table".date, "test_table".symbol, "test_table".created_time DESC
Sort Method: quicksort Memory: 3314kB
-> Seq Scan on "test_table" (cost=0.00..279688.80 rows=25726 width=37) (actual time=118.338..2693.767 rows=32576 loops=1)
Filter: (symbol = ANY ('{symbol5,symbol8,symbol15,symbol98,symbol43,symbol908}'::text[]))
Rows Removed by Filter: 10649132
Planning time: 0.999 ms
Execution time: 3064.496 ms
尝试使用 distinct on
代替:
select distinct on (symbol, date) date, symbol, value, created_time
from test_table
where symbol in ('symbol1', 'symbol2', 'symbol5', ...)
order by symbol, date, created_time desc;
对于此查询,您需要 test_table(symbol, date, created_time desc)
上的索引。
来自我的示例数据 table test_table
:
date symbol value created_time
2010-01-09 symbol1 101 3847474847
2010-01-10 symbol1 102 3847474847
2010-01-10 symbol1 102.5 3847475500
2010-01-10 symbol2 204 3847474847
2010-01-11 symbol1 109 3847474847
2010-01-12 symbol1 105 3847474847
2010-01-12 symbol2 206 3847474847
下面是我目前在 table 上使用的查询,其中包含大约 3k 个唯一符号和大约 1100 万行。它仍然需要一段时间(超过 80% 的查询时间花在最后的子查询扫描上(我认为这是查询末尾的 WHERE IN 子句。有什么办法可以加快这部分的速度(想象我在那个 WHERE IN 查找中有很多符号。
select date, symbol, value, created_time
from (select *,
rank() over (partition by date, symbol order by created_time desc) as rownum
from test_table) x
where rownum = 1 and symbol in ('symbol1', 'symbol2', 'symbol5', ...)
如果有帮助,下面是 EXPLAIN ANALYZE 输出。
QUERY PLAN
Subquery Scan on x (cost=281573.35..282473.76 rows=129 width=37) (actual time=2874.389..3037.008 rows=32393 loops=1)
Filter: (x.rownum = 1)
Rows Removed by Filter: 183
-> WindowAgg (cost=281573.35..282152.19 rows=25726 width=37) (actual time=2874.363..2980.848 rows=32576 loops=1)
-> Sort (cost=281573.35..281637.67 rows=25726 width=37) (actual time=2874.340..2901.443 rows=32576 loops=1)
Sort Key: "test_table".date, "test_table".symbol, "test_table".created_time DESC
Sort Method: quicksort Memory: 3314kB
-> Seq Scan on "test_table" (cost=0.00..279688.80 rows=25726 width=37) (actual time=118.338..2693.767 rows=32576 loops=1)
Filter: (symbol = ANY ('{symbol5,symbol8,symbol15,symbol98,symbol43,symbol908}'::text[]))
Rows Removed by Filter: 10649132
Planning time: 0.999 ms
Execution time: 3064.496 ms
尝试使用 distinct on
代替:
select distinct on (symbol, date) date, symbol, value, created_time
from test_table
where symbol in ('symbol1', 'symbol2', 'symbol5', ...)
order by symbol, date, created_time desc;
对于此查询,您需要 test_table(symbol, date, created_time desc)
上的索引。