How to efficiently UPDATE a column in a large PostgreSQL table using Python / psycopg2?

Question

I have a large table with approximately 10 million rows in a PostgreSQL 9.4 database. It looks somewhat like this:

gid | number1 | random | result | ...
 1  |    2    |  NULL  |  NULL  | ...
 2  |   15    |  NULL  |  NULL  | ...
... |   ...   |   ...  |  ...   | ...

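Now I would like to update the columns random and result as a function of number1. That means that at least random needs to be produced in a script outside of the database. Since I have limited RAM, I wonder how I could do that efficiently using psycopg2. I believe I face two problems: how to fetch the data without using too much RAM, and how to write it back. The simple-minded approach would look like this: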

curs.execute("""SELECT gid, number1 FROM my_table;""")
data = curs.fetchall()

results = []
for i in data:
    results.append((create_random(i[1]), i[0]))
curs.executemany("""UPDATE my_table
                    SET random = %s
                    WHERE gid = %s;""",
                 results)
curs.execute("""UPDATE my_table
                SET result = number1 * random;""")

However, this will certainly deplete all my memory quickly and take forever to UPDATE my_table.

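What would be a smarter strategy? The database is being accessed exclusively and can be locked. Unfortunately, the PostgreSQL random function is not suitable for my case.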

Answer 1

Pass the data as an array and unnest it, so that the whole update runs in a single statement:

from random import random

def create_random(i):
    return random() * i

curs.execute("select gid, number from t;")
data = curs.fetchall()

results = []
for i in data:
    results.append((create_random(i[1]), i[0]))

curs.execute("""
    update t
    set
        rnd = s.rnd,
        result = number * s.rnd
    from unnest(%s) s(rnd numeric, gid integer)
    where t.gid = s.gid;
""", (results,))

con.commit()

Table t:

create table t (
    gid integer,
    number integer,
    rnd float,
    result float
);
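
The code above still calls fetchall(), which pulls every row into client memory. The following is a minimal sketch of one way to bound memory, not a definitive implementation: read through a psycopg2 named (server-side) cursor and apply the same unnest update one batch at a time. The helper names batched and update_in_batches, the cursor name read_t, and the batch size of 10000 are illustrative assumptions rather than part of the answer, and con is assumed to be an open psycopg2 connection to a database containing table t.

```python
from itertools import islice
from random import random


def create_random(number):
    # Stand-in for the script-side generator, as in the answer above.
    return random() * number


def batched(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Same statement as in the answer; it is executed once per batch.
UPDATE_SQL = """
    update t
    set
        rnd = s.rnd,
        result = number * s.rnd
    from unnest(%s) s(rnd numeric, gid integer)
    where t.gid = s.gid;
"""


def update_in_batches(con, batch_size=10000):
    """Stream (gid, number) rows and write them back batch by batch.

    `con` is assumed to be an open psycopg2 connection.
    Returns the number of rows processed.
    """
    # Passing a name makes psycopg2 create a server-side cursor, so rows
    # are fetched in chunks of `itersize` instead of all at once.
    read_curs = con.cursor(name="read_t")  # hypothetical cursor name
    read_curs.itersize = batch_size
    write_curs = con.cursor()

    read_curs.execute("select gid, number from t;")
    total = 0
    for batch in batched(read_curs, batch_size):
        results = [(create_random(number), gid) for gid, number in batch]
        write_curs.execute(UPDATE_SQL, (results,))
        total += len(results)

    read_curs.close()
    write_curs.close()
    # Commit once at the end: a named cursor created without
    # withhold=True does not survive a commit.
    con.commit()
    return total
```

With an open connection this would be called as update_in_batches(con). Everything runs in one transaction, which fits the exclusive access described in the question; a suitable batch size depends on the available RAM and would need to be found by experiment.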


python

postgresql

psycopg2

sql-update