SQL UPDATE with Joined tables: Out of Memory Error

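I'm trying to update a table in pSQL and am running into various memory/execution errors.

Strangely, the SELECT query that backs the update is very fast. I'm sure I just don't understand what's happening under the hood.

Some context.


Relevant tables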

address_book:
loan_id,
county,
zip
---
loan:
id
---
loan_property:
loan_id,
property_id
---
property:
id,
zip,
county

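Goal

The goal is to update zip & county in the property table with the values from address_book. address_book has a loan_id, which is the join path to property (through loan and loan_property).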

SQL

Let's look at a simple SELECT

WITH ab AS (
SELECT DISTINCT
    left(ab.loan_id, 6) AS loan_id,
    ab.zip AS zip,
    ab.county AS county
FROM 
    address_book ab
WHERE
    ab.address IS NOT NULL
)

SELECT ab.county, p.name

FROM property p
INNER JOIN loan_property lp ON lp.property_id = p.id
INNER JOIN loan           l ON lp.loan_id     = l.id
INNER JOIN               ab ON ab.loan_id     = l.id
WHERE 
    l.id = ab.loan_id

This works great and is very fast (~10k records in 0.4 seconds).

Let's run the above as an update:

WITH ab AS (
SELECT DISTINCT
    left(ab.loan_id, 6) AS loan_id,
    ab.zip AS zip,
    ab.county AS county
FROM 
    address_book ab
WHERE
    ab.address IS NOT NULL
)

UPDATE property
SET zip=ab.zip, county=ab.county

FROM property p
INNER JOIN loan_property lp ON lp.property_id = p.id
INNER JOIN loan           l ON lp.loan_id     = l.id
INNER JOIN               ab ON ab.loan_id     = l.id
WHERE 
    l.id = ab.loan_id

This update runs for about 2 minutes and then usually fails with

SQL Error [53200]: ERROR: out of memory

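Is there a more optimal way to run this update? Even if I have to batch by LIMIT/OFFSET, or save the SELECT results to a table and then run the update directly from that table - what is the way to run this update without hitting memory errors?

Thanks a lot, everyone!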

I'm guessing that your properties have many loans. Run:

select property_id, count(*)
from loan_property
group by property_id
order by count(*) desc;

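The question is then which loan you want to take the information from.

It is also possible that your loans have many addresses. The select distinct there is suspicious.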
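A similar check for the address side might look like the sketch below. It assumes the same key and filter as the CTE in the question (left(ab.loan_id, 6) and address IS NOT NULL) and lists the loans that carry more than one distinct zip/county pair:

```sql
-- Loans whose address_book rows disagree on (zip, county).
-- The row constructor (ab.zip, ab.county) lets count(DISTINCT ...) treat the pair as one value.
SELECT left(ab.loan_id, 6)                 AS loan_id,
       count(DISTINCT (ab.zip, ab.county)) AS address_variants
FROM address_book ab
WHERE ab.address IS NOT NULL
GROUP BY left(ab.loan_id, 6)
HAVING count(DISTINCT (ab.zip, ab.county)) > 1
ORDER BY address_variants DESC;
```

If this returns rows, the update has more than one candidate value per property, and you need a rule for choosing between them.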

As documented in the manual, do not repeat the target table in the FROM clause of the UPDATE statement:

...
UPDATE property
   SET zip = ab.zip, 
       county = ab.county
FROM loan_property lp
   JOIN loan l ON lp.loan_id = l.id
   JOIN ab ON ab.loan_id = l.id
WHERE lp.property_id = property.id
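To spell out why this matters: in the question's statement, the property p in the FROM clause is a second, independent reference to the table. Nothing ties it to the row being updated, so every row of property gets paired with every row the join produces. One way to inspect the plan without executing anything is a plain EXPLAIN. The sketch below assembles the CTE from the question with the UPDATE form above, referencing the target table by name in the WHERE clause, and assumes the table and column names given in the question:

```sql
-- EXPLAIN without ANALYZE only plans the statement; no rows are updated.
EXPLAIN
WITH ab AS (
    SELECT DISTINCT
        left(ab.loan_id, 6) AS loan_id,
        ab.zip              AS zip,
        ab.county           AS county
    FROM address_book ab
    WHERE ab.address IS NOT NULL
)
UPDATE property
   SET zip = ab.zip,
       county = ab.county
FROM loan_property lp
   JOIN loan l ON lp.loan_id = l.id
   JOIN ab ON ab.loan_id = l.id
-- The target table is tied to the joined rows here, not repeated in FROM.
WHERE lp.property_id = property.id;
```

Running the same EXPLAIN on the original statement should make the difference in estimated row counts visible.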

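I think my answer is similar to @a_horse_with_no_name's - something odd happens when the target table is referenced again.

I actually folded the joins from the update's FROM clause into another aliased SELECT, like so: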

WITH ab as (
SELECT distinct
    p.id as p_id, 
    ab.county as county, 
    ab.zip as zip
FROM 
    address_book ab
inner join loan l on ab.loan_id = l.id
inner join loan_property lp on lp.loan_id = l.id
inner join property p on lp.property_id = p.id
WHERE
    ab.address IS NOT null
    and l.id = ab.loan_id
)

UPDATE property
SET county__c=ab.county, zip_code__c=ab.zip
FROM ab
WHERE ab.p_id = id

Moving the joins (especially the one on the target table) out of the update's FROM clause solved the problem.
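For the other route raised in the question, saving the SELECT result to a table first, a rough sketch might look like the following. The temp table name property_address is made up for illustration, and the SET columns use the zip and county names from the table listing in the question, so adjust them to the real column names:

```sql
BEGIN;

-- Stage one row per (property, county, zip) combination.
-- property_address is a hypothetical name; ON COMMIT DROP removes it at COMMIT.
CREATE TEMP TABLE property_address ON COMMIT DROP AS
SELECT DISTINCT
    p.id      AS p_id,
    ab.county AS county,
    ab.zip    AS zip
FROM address_book ab
INNER JOIN loan l           ON ab.loan_id = l.id
INNER JOIN loan_property lp ON lp.loan_id = l.id
INNER JOIN property p       ON lp.property_id = p.id
WHERE ab.address IS NOT NULL;

-- Give the planner an index and fresh statistics for the staged rows.
CREATE INDEX ON property_address (p_id);
ANALYZE property_address;

-- The target table appears only once; the staged table is the sole FROM item.
UPDATE property
SET county = pa.county,
    zip    = pa.zip
FROM property_address pa
WHERE pa.p_id = property.id;

COMMIT;
```

If a property still has several staged rows, PostgreSQL uses only one of them for the update, and which one is not predictable, so it is worth deduplicating per p_id first.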

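The best approach is to group, order and limit the records first, and then use that reduced set in your update statement. I suspect there are many redundant rows, and that this is the main reason you run out of memory. Because the update works row by row, redundant data makes the update do more work and makes the run time worse. The select statement already takes little time, so there is no need to optimize it; the gain comes from updating from the grouped records. Try something like the example below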

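WITH xyz AS (
    -- Keep one (zip, county) row per property. The ORDER BY tie-break
    -- is a placeholder: pick whichever address should win.
    SELECT DISTINCT ON (p.id)
        p.id AS property_id, ab.zip, ab.county
    FROM property p
    INNER JOIN loan_property lp ON lp.property_id = p.id
    INNER JOIN loan           l ON lp.loan_id     = l.id
    INNER JOIN address_book  ab ON left(ab.loan_id, 6) = l.id
    WHERE ab.address IS NOT NULL
    ORDER BY p.id, ab.zip
)
-- A CTE cannot be the target of UPDATE, so update property from xyz.
UPDATE property
SET zip = xyz.zip, county = xyz.county
FROM xyz
WHERE property.id = xyz.property_id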