SQL UPDATE with Joined tables: Out of Memory Error
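I'm trying to update a table in PostgreSQL and am running into various memory/execution errors.
Oddly, the SELECT query behind the update is very fast. I'm sure I just don't understand what's happening under the hood.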
Some context.
Relevant tables
address_book:
loan_id,
county,
zip
---
loan:
id
---
loan_property:
loan_id,
property_id
---
property:
id,
zip,
county
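Goal
The goal is to update the zip and county values in the property table from address_book. address_book has a loan_id, which is the link to property (through loan and loan_property).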
SQL
Let's look at a simple SELECT:
WITH ab AS (
SELECT DISTINCT
left(ab.loan_id, 6) AS loan_id,
ab.zip AS zip,
ab.county AS county
FROM
address_book ab
WHERE
ab.address IS NOT NULL
)
SELECT ab.county, p.name
FROM property p
INNER JOIN loan_property lp ON lp.property_id = p.id
INNER JOIN loan l ON lp.loan_id = l.id
INNER JOIN ab ON ab.loan_id = l.id
WHERE
l.id = ab.loan_id
This works well and is very fast (about 10k records in 0.4 s).
Now let's run the same thing as an UPDATE:
WITH ab AS (
SELECT DISTINCT
left(ab.loan_id, 6) AS loan_id,
ab.zip AS zip,
ab.county AS county
FROM
address_book ab
WHERE
ab.address IS NOT NULL
)
UPDATE property
SET zip=ab.zip, county=ab.county
FROM property p
INNER JOIN loan_property lp ON lp.property_id = p.id
INNER JOIN loan l ON lp.loan_id = l.id
INNER JOIN ab ON ab.loan_id = l.id
WHERE
l.id = ab.loan_id
This update runs for 2 minutes and then usually fails with:
SQL Error [53200]: ERROR: out of memory
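Is there a more efficient way to run this update? Even if I have to batch it with LIMIT/OFFSET, or save the SELECT results to a table and then run the update directly from that table, what is a way to run this update without hitting the out-of-memory error?
Thanks so much, everyone!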
My guess is that your properties have many loans. Run:
select property_id, count(*)
from loan_property
group by property_id
order by count(*) desc;
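The question is which of those loans you want to take the information from.
It's also possible that your loans have many addresses. The select distinct
there is suspicious.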
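Both suspicions can be checked at once by counting how many distinct (zip, county) pairs each property would be offered. This is a minimal sketch on hypothetical throwaway temp tables (demo_loan_property, demo_address_book); it joins loan_property straight to address_book, assuming loan_id values already match, so the loan table and the question's left(ab.loan_id, 6) normalisation are left out. PostgreSQL only uses one of the joined rows to update a target row and which one is not predictable, so any property returned here would get an arbitrary address.

```sql
-- Hypothetical toy tables standing in for loan_property and address_book.
CREATE TEMP TABLE demo_loan_property (loan_id int, property_id int);
CREATE TEMP TABLE demo_address_book (loan_id int, zip text, county text, address text);

-- Property 100 has two loans whose addresses disagree; property 200 has one loan.
INSERT INTO demo_loan_property VALUES (1, 100), (2, 100), (3, 200);
INSERT INTO demo_address_book VALUES
  (1, '10001', 'Kings',  '1 Main St'),
  (2, '10002', 'Queens', '2 Oak Ave'),
  (3, '20001', 'Bronx',  '3 Elm St');

-- Properties that would receive more than one distinct (zip, county) pair.
SELECT lp.property_id,
       count(DISTINCT (ab.zip, ab.county)) AS candidate_addresses
FROM demo_loan_property lp
JOIN demo_address_book ab ON ab.loan_id = lp.loan_id
WHERE ab.address IS NOT NULL
GROUP BY lp.property_id
HAVING count(DISTINCT (ab.zip, ab.county)) > 1
ORDER BY candidate_addresses DESC;
-- expected: one row, property_id 100 with candidate_addresses 2
```

If this returns rows on the real data, decide on a rule for picking one address per property before running the UPDATE.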
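As documented in the manual, do not repeat the target table in the FROM clause of an UPDATE statement; with nothing linking the two copies, every row of property is paired with every row of the FROM result: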
...
UPDATE property p
SET zip = ab.zip,
county = ab.county
FROM loan_property lp
JOIN loan l ON lp.loan_id = l.id
JOIN ab ON ab.loan_id = l.id
WHERE lp.property_id = p.id
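The manual's rule is easy to see on a tiny dataset. In this sketch (hypothetical temp tables demo_property and demo_address, best run in a scratch session), the first UPDATE repeats the target table in FROM without tying it to the target, so the target is cross-joined with the whole join result and even property 3, which has no new address, is overwritten. The second UPDATE aliases the target and correlates the FROM list with it in WHERE.

```sql
-- Hypothetical toy tables; names chosen so they do not shadow real ones.
CREATE TEMP TABLE demo_property (id int PRIMARY KEY, zip text, county text);
CREATE TEMP TABLE demo_address (property_id int PRIMARY KEY, zip text, county text);
INSERT INTO demo_property (id) VALUES (1), (2), (3);
INSERT INTO demo_address VALUES (1, '10001', 'Kings'), (2, '10002', 'Queens');

-- Anti-pattern: "demo_property p" is a second, independent copy of the table.
-- Nothing links it to the UPDATE target, so each of the 3 target rows is
-- paired with both rows of (p JOIN a): all 3 rows are overwritten,
-- each with an unpredictable one of the two addresses.
UPDATE demo_property
SET zip = a.zip, county = a.county
FROM demo_property p
JOIN demo_address a ON a.property_id = p.id;

-- Reset before trying the correct form.
UPDATE demo_property SET zip = NULL, county = NULL;

-- Fix: alias the target itself and correlate the FROM list with it in WHERE.
UPDATE demo_property p
SET zip = a.zip, county = a.county
FROM demo_address a
WHERE a.property_id = p.id;

-- expected: 1 -> 10001/Kings, 2 -> 10002/Queens, 3 keeps NULL zip and county
SELECT id, zip, county FROM demo_property ORDER BY id;
```

On the real tables the anti-pattern multiplies every property row by the roughly 10k rows of the join, which is consistent with the long run time and the out-of-memory error in the question.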
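I think my answer is similar to @a_horse_with_no_name's: something odd happens when the target table is referenced again.
I moved the joins from the UPDATE's FROM clause into a separate aliased SELECT (a CTE), like this: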
WITH ab as (
SELECT distinct
p.id as p_id,
ab.county as county,
ab.zip as zip
FROM
address_book ab
inner join loan l on ab.loan_id = l.id
inner join loan_property lp on lp.loan_id = l.id
inner join property p on lp.property_id = p.id
WHERE
ab.address IS NOT null
and l.id = ab.loan_id
)
UPDATE property
SET county__c=ab.county, zip_code__c=ab.zip
FROM ab
WHERE ab.p_id = property.id
Moving the joins (especially the one on the target table) out of the UPDATE's FROM clause solved the problem.
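The question also asked how to save the SELECT results to a table and update from it in batches. Batching is a secondary measure here (removing the repeated target table is what fixed it), but it can be sketched as follows. Assumptions: hypothetical temp tables with a batch_ prefix stand in for the real ones; loan is left out because the question's joins only use it to match loan_property.loan_id to address_book.loan_id; loan_id values already match; and the batch size of 5 ids is arbitrary. DISTINCT ON keeps exactly one (zip, county) pair per property, so no target row matches twice, and batches walk primary-key ranges instead of LIMIT/OFFSET, because OFFSET re-reads all skipped rows on every batch.

```sql
-- Hypothetical toy data standing in for the question's tables.
CREATE TEMP TABLE batch_property (id int PRIMARY KEY, zip text, county text);
CREATE TEMP TABLE batch_loan_property (loan_id int, property_id int);
CREATE TEMP TABLE batch_address_book (loan_id int, zip text, county text, address text);
INSERT INTO batch_property (id) SELECT g FROM generate_series(1, 10) g;
INSERT INTO batch_loan_property SELECT g, g FROM generate_series(1, 10) g;
INSERT INTO batch_address_book
SELECT g, lpad(g::text, 5, '0'), 'County ' || g::text, g::text || ' Main St'
FROM generate_series(1, 10) g;

-- 1. Materialise one source row per property (the CTE above, saved to a table).
CREATE TEMP TABLE batch_staging AS
SELECT DISTINCT ON (lp.property_id)
       lp.property_id, ab.zip, ab.county
FROM batch_address_book ab
JOIN batch_loan_property lp ON lp.loan_id = ab.loan_id
WHERE ab.address IS NOT NULL
ORDER BY lp.property_id, ab.zip;   -- tie-break: keep the lowest zip

CREATE INDEX ON batch_staging (property_id);
ANALYZE batch_staging;

-- 2. Update one primary-key range per statement; each can run in its own transaction.
UPDATE batch_property p
SET zip = s.zip, county = s.county
FROM batch_staging s
WHERE s.property_id = p.id
  AND p.id BETWEEN 1 AND 5;

UPDATE batch_property p
SET zip = s.zip, county = s.county
FROM batch_staging s
WHERE s.property_id = p.id
  AND p.id BETWEEN 6 AND 10;
```

On real data the range bounds would come from min(id)/max(id) of property, stepping by whatever batch size keeps each statement short.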
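The best approach is to group, sort and limit the records first and then use that same set in your update statement. I suspect a large number of redundant rows is the main reason you run out of memory: duplicates in the join make the update do more work and take longer. The grouping SELECT on its own is already fast, so there is no need to optimize it further; update from its result instead. Try something like the example below: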
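WITH xyz AS (
  -- assumes the question's ab CTE is defined earlier in this WITH list
  -- one (zip, county) row per property; the lowest zip wins ties
  SELECT DISTINCT ON (lp.property_id)
         lp.property_id, ab.zip, ab.county
  FROM loan_property lp
  INNER JOIN loan l ON lp.loan_id = l.id
  INNER JOIN ab ON ab.loan_id = l.id
  ORDER BY lp.property_id, ab.zip
)
UPDATE property p
SET zip = xyz.zip, county = xyz.county
FROM xyz
WHERE xyz.property_id = p.id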