How to write an efficient UPDATE-SELECT SQL statement
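I wrote an SQL statement for a table with about 50,000,000 users. The query is taking much longer than I expected: it has been running for about 23 hours and has not finished.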
UPDATE users
SET building_id = B.id
FROM (
SELECT *
FROM buildings B
) AS B
WHERE B.city = address_city
AND B.town = address_town
AND B.neighbourhood = address_neighbourhood
AND B.street = address_street
AND B.no = address_building_no
The idea behind this SQL is to remove the building/address information from users and have each user reference the buildings table instead.
EXPLAIN
Update on users  (cost=22226900.43..22548054.14 rows=15212 width=166)
  ->  Merge Join  (cost=22226900.43..22548054.14 rows=15212 width=166)
        Merge Cond: (((users.address_city)::text = (b.city)::text) AND ((users.address_town)::text = (b.town)::text) AND ((users.address_neighbourhood)::text = (b.neighbourhood)::text) AND ((users.address_street)::text = (b.street)::text) AND ((users.address_building_no)::text = (b.no)::text))
        ->  Sort  (cost=21352886.76..21401078.96 rows=96384398 width=156)
              Sort Key: users.address_city, users.address_town, users.address_neighbourhood, users.address_street, users.address_building_no
              ->  Seq Scan on users  (cost=0.00..2559921.19 rows=96384398 width=156)
        ->  Materialize  (cost=874013.68..883606.86 rows=9593179 width=63)
              ->  Sort  (cost=874013.68..878810.27 rows=9593179 width=63)
                    Sort Key: b.city, b.town, b.neighbourhood, b.street, b.no
                    ->  Seq Scan on buildings b  (cost=0.00..136253.54 rows=9593179 width=63)
(10 rows)
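I can't tell whether this statement runs the inner SELECT once per user or caches its result. And if it does cache it, does the cached temporary result still use the table's indexes?
I cannot write the SQL like this: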
FROM (
SELECT *
FROM buildings B
WHERE B.city = users.address_city
AND B.town = users.address_town
AND B.neighbourhood = users.address_neighbourhood
AND B.street = users.address_street
AND B.no = users.address_building_no
)
It says that users cannot be accessed from the inner select. Do you have any suggestions on how to access the buildings in the inner SQL statement?
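On the scoping error itself: a subquery in the FROM list of an UPDATE cannot refer to the table being updated, but a correlated subquery in SET can. What follows is a minimal sketch, assuming PostgreSQL and the column names from the question; it has not been run against this data. Unlike the join form, it sets building_id to NULL for users with no matching building, and it raises an error if more than one building matches a user.

```sql
-- Sketch only: a correlated subquery in SET may reference users.
-- Caveat 1: users without a matching building get building_id = NULL.
-- Caveat 2: the statement fails if the subquery returns more than one row.
UPDATE users
SET building_id = (
    SELECT b.id
    FROM buildings b
    WHERE b.city = users.address_city
      AND b.town = users.address_town
      AND b.neighbourhood = users.address_neighbourhood
      AND b.street = users.address_street
      AND b.no = users.address_building_no
);
```

Because the subquery is evaluated for each user row, this form is unlikely to beat the join unless buildings has an index covering the five compared columns.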
I'm not sure, but wouldn't this be faster (at least slightly, if not considerably)?
UPDATE users
SET building_id = B.id
FROM buildings B
WHERE B.city = address_city
AND B.town = address_town
AND B.neighbourhood = address_neighbourhood
AND B.street = address_street
AND B.no = address_building_no
If nothing else, it would not need the Materialize stage shown in the EXPLAIN output above.
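One way to check this before committing to another long run is to ask the planner for the plan of the simplified statement and compare it with the one in the question. A minimal sketch, assuming PostgreSQL: plain EXPLAIN (without ANALYZE) only plans the statement and does not execute the UPDATE. The planner may flatten the trivial subquery on its own, in which case the two plans could come out the same.

```sql
-- Plan only: plain EXPLAIN does not run the UPDATE, so no rows change.
EXPLAIN
UPDATE users
SET building_id = B.id
FROM buildings B
WHERE B.city = address_city
  AND B.town = address_town
  AND B.neighbourhood = address_neighbourhood
  AND B.street = address_street
  AND B.no = address_building_no;
```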
I think
create table t as select column_list from a join b on column=column;
alter table t rename to users;
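would be faster and would hold a lock for only microseconds... Of course, that only applies if the table is not being edited at the moment and there is enough space in temp_tablespace.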
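Spelled out for the tables in the question, the rebuild-and-rename idea might look roughly like the sketch below. It assumes PostgreSQL; users.id and users.name are hypothetical columns standing in for the real column list, and users_new and users_old are made-up table names. The existing users table has to be renamed out of the way first, since the new table cannot take a name that is still in use.

```sql
-- Sketch only. u.id and u.name are hypothetical placeholders for the
-- columns to keep; users_new and users_old are made-up names.
CREATE TABLE users_new AS
SELECT u.id,
       u.name,
       b.id AS building_id
FROM users u
LEFT JOIN buildings b   -- LEFT JOIN keeps users that match no building
       ON b.city = u.address_city
      AND b.town = u.address_town
      AND b.neighbourhood = u.address_neighbourhood
      AND b.street = u.address_street
      AND b.no = u.address_building_no;

-- Swap the two tables inside one transaction so that other sessions
-- never see a moment where users does not exist.
BEGIN;
ALTER TABLE users RENAME TO users_old;
ALTER TABLE users_new RENAME TO users;
COMMIT;
```

CREATE TABLE AS copies only the data, so indexes, constraints, and defaults would need to be recreated on the new table, and any rows written to users while the copy is being built would not appear in it.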