Why is this pg query so slow? How can I make it faster?
Here is the query:
(SELECT *
FROM url
WHERE domain = 'youtube.com'
AND timestamp > NOW() - INTERVAL '24 hours'
ORDER BY likes DESC LIMIT 10)
UNION
(SELECT *
FROM url
WHERE domain = 'twitter.com'
AND timestamp > NOW() - INTERVAL '24 hours'
ORDER BY likes DESC LIMIT 10)
UNION
(SELECT *
FROM url
WHERE domain = 'reddit.com'
AND timestamp > NOW() - INTERVAL '24 hours'
ORDER BY likes DESC LIMIT 10)
ORDER BY timestamp DESC
Here is the EXPLAIN ANALYZE output:
Sort  (cost=20460.17..20460.25 rows=30 width=497) (actual time=5161.013..5161.015 rows=30 loops=1)
  Sort Key: url."timestamp" DESC
  Sort Method: quicksort  Memory: 53kB
  ->  HashAggregate  (cost=20459.14..20459.44 rows=30 width=497) (actual time=5160.709..5160.738 rows=30 loops=1)
        Group Key: url.url, url.domain, url.title, url.views, url.likes, url.dislikes, url.comments, url.shares, url.links_to_url, url."user", url.thumbnail_url, url.is_collection, url.image_url, url.video_url, url.audio_url, url.width, url.height, url.body, url.source, url."timestamp", url.created_at, url.updated_at, url.duration_seconds, url.tags, url.channel
        ->  Append  (cost=0.43..20457.26 rows=30 width=497) (actual time=0.514..5160.073 rows=30 loops=1)
              ->  Limit  (cost=0.43..18150.71 rows=10 width=1177) (actual time=0.513..28.599 rows=10 loops=1)
                    ->  Index Scan Backward using "url-likes-index" on url  (cost=0.43..816763.00 rows=450 width=1177) (actual time=0.511..28.594 rows=10 loops=1)
                          Filter: (((domain)::text = 'youtube.com'::text) AND ("timestamp" > (now() - '24:00:00'::interval)))
                          Rows Removed by Filter: 11106
              ->  Limit  (cost=0.43..859.82 rows=10 width=1177) (actual time=2330.390..5033.214 rows=10 loops=1)
                    ->  Index Scan Backward using "url-likes-index" on url url_1  (cost=0.43..816763.00 rows=9504 width=1177) (actual time=2330.388..5033.200 rows=10 loops=1)
                          Filter: (((domain)::text = 'twitter.com'::text) AND ("timestamp" > (now() - '24:00:00'::interval)))
                          Rows Removed by Filter: 1667422
              ->  Limit  (cost=0.43..1446.28 rows=10 width=1177) (actual time=64.748..98.228 rows=10 loops=1)
                    ->  Index Scan Backward using "url-likes-index" on url url_2  (cost=0.43..816763.00 rows=5649 width=1177) (actual time=64.745..98.220 rows=10 loops=1)
                          Filter: (((domain)::text = 'reddit.com'::text) AND ("timestamp" > (now() - '24:00:00'::interval)))
                          Rows Removed by Filter: 26739
Planning Time: 3.006 ms
Execution Time: 5162.201 ms
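If you are interested in running it yourself, go to this link.
I see that more than a million Twitter rows are filtered out (1,667,422 in the plan), but I'm not sure how to avoid that. I have an index on timestamp, and I was hoping it could be used instead of ordering by likes and scanning the whole index. Does this mean I need a composite index? Is there a way to get the planner to use two indexes instead of creating another one?
P.S. I think I messed up by making url the primary key. It makes the indexes unnecessarily large.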
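PostgreSQL tries to use the index on likes to avoid a sort when fetching the top 10 results, but it has to discard many rows to get there.
Maybe that execution plan is the best one, maybe not.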
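To see where the time goes, one option is to run a single branch on its own under EXPLAIN. The sketch below uses the twitter.com branch from the question; the BUFFERS option is an addition here, not something the question used.

```sql
-- Run the slowest branch alone. BUFFERS adds shared-buffer hit/read counts,
-- which show how much of the scan was served from cache versus disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM url
WHERE domain = 'twitter.com'
  AND "timestamp" > NOW() - INTERVAL '24 hours'
ORDER BY likes DESC
LIMIT 10;
```

A large "Rows Removed by Filter" count in the output means the scan walked far down the likes index before finding ten matching rows.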
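Try the following steps:
- Run ANALYZE on your table and see if that fixes the problem.
- If not, create an index on (domain, timestamp) (in this order!) and see if that improves matters.
- If that is not enough, either
  - drop the index on likes (if you can)
  or
  - change ORDER BY likes to ORDER BY likes + 0.
If none of this makes it better, your original query plan was the best one, and all you can do is use more RAM in the hope that more of the data ends up in cache.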
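The steps above might look like this in SQL. This is a sketch, not a tested recipe: the index name url_domain_timestamp_idx is made up for illustration, "url-likes-index" is the name shown in the plan in the question, and only the twitter.com branch is shown for the rewritten ORDER BY.

```sql
-- Step 1: refresh the planner statistics for the table.
ANALYZE url;

-- Step 2: composite index with the equality column first and the
-- range column second. The index name is hypothetical.
CREATE INDEX url_domain_timestamp_idx ON url (domain, "timestamp");

-- Step 3, option A: drop the index on likes (only if nothing else needs it).
-- DROP INDEX "url-likes-index";

-- Step 3, option B: sort by an expression instead of the bare column,
-- so the index on likes no longer matches the ORDER BY.
SELECT *
FROM url
WHERE domain = 'twitter.com'
  AND "timestamp" > NOW() - INTERVAL '24 hours'
ORDER BY likes + 0 DESC
LIMIT 10;
```

Re-running EXPLAIN ANALYZE after each step shows whether the plan actually changed. On a table that is being written to, CREATE INDEX CONCURRENTLY avoids blocking writes while the index builds.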
I would suggest writing the query like this:
SELECT ufiltered.*
FROM (SELECT url.*,
ROW_NUMBER() OVER (PARTITION BY domain ORDER BY likes DESC) AS seqnum
FROM url
WHERE domain IN ('youtube.com', 'twitter.com', 'reddit.com') AND
timestamp > NOW() - INTERVAL '24 hours'
) AS ufiltered
WHERE seqnum <= 10
ORDER BY timestamp DESC
For this, I would suggest an index on url(timestamp, domain, likes).
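A minimal sketch of that index, assuming the column names from the question; the index name is hypothetical.

```sql
-- Column order follows the suggestion above: timestamp, then domain, then likes.
-- The index name is made up for illustration.
CREATE INDEX url_timestamp_domain_likes_idx
    ON url ("timestamp", domain, likes);
```

Whether the planner picks this index for the window-function query depends on the data, so it is worth checking with EXPLAIN ANALYZE.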