Will psycopg2 cursor.fetchmany() see concurrent committed transactions?

Consider the following code:

import psycopg2

conn = psycopg2.connect(**credentials)
cur = conn.cursor()
cur.execute('select * from some_table')  # Imagine some_table to be a very big table
while True:
    rows = cur.fetchmany(1000)
    if not rows:
        break
    do_some_processing(rows)

cur.close()
conn.commit()

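Question 1: If a concurrent transaction inserts new rows into some_table while the loop is running, will the new rows be fetched if the transaction isolation level is set to "read committed"?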

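Question 2: If a concurrent transaction updates some rows in some_table while the loop is running, will the updated rows be fetched if the transaction isolation level is set to "read committed"?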

According to the Postgres documentation:

Read Committed is the default isolation level in PostgreSQL. When a transaction uses this isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions. In effect, a SELECT query sees a snapshot of the database as of the instant the query begins to run. However, SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed. Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes after the first SELECT starts and before the second SELECT starts.

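In the code above there is only one SELECT query in the transaction, so there are no "successive SELECT commands", and my assumption is that the cursor will not see any new inserts or updates. Is that correct? If so, how does the cursor "remember" the old state of the database the whole time? What if the loop runs for several hours or days? Could such a scenario lead to MVCC-related disk bloat or similar problems?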

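Your cursor will see whatever records were present at the time the SELECT statement began, no matter how long the cursor lasts. Database servers are very good at keeping multiple "generations" of a table apart. For instance, if you use BEGIN TRANSACTION, no one other than you will see the changes you have made until a COMMIT TRANSACTION occurs.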
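One detail that may matter here: as far as I know, a plain conn.cursor() in psycopg2 is a client-side cursor, so execute() transfers the whole result set and fetchmany() only slices rows that are already in client memory. Either way the loop cannot see later commits. If the goal is to have the server hold the cursor open and stream the rows, a named (server-side) cursor is the usual route. A minimal sketch, assuming conn is an open psycopg2 connection; the function name, the cursor name and the handle_rows callback are hypothetical:

```python
def process_in_batches(conn, query, handle_rows, batch_size=1000):
    """Stream `query` through a psycopg2 server-side (named) cursor.

    `conn` is assumed to be an open psycopg2 connection. `handle_rows` and
    the cursor name "big_table_scan" are made up for this illustration.
    """
    total = 0
    # Passing a name asks psycopg2 for a server-side cursor, so rows are
    # fetched from the server batch by batch instead of all at once.
    cur = conn.cursor(name="big_table_scan")
    cur.execute(query)
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        handle_rows(rows)
        total += len(rows)
    cur.close()
    # Ending the transaction lets the server release the cursor's snapshot.
    conn.commit()
    return total
```

With the question's names this would be called as process_in_batches(conn, 'select * from some_table', do_some_processing). Every batch still comes from the snapshot taken when the query started.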

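Old row versions are retained for as long as the cursor is open. Yes, this means that holding a cursor open for days risks bloat, because the now-obsolete tuples that were valid when the cursor was opened cannot be removed.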

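You could argue that this only needs to apply to the tables the cursor is defined over, since no new tables can be added to the query later (in read committed mode). But that reasoning does not work for functions with dynamic SQL, which could introduce new tables at any time. In any case, it would make the accounting much more complex, so it is not done. As a result, tuples from the cursor's era cannot be removed anywhere in the database until the cursor is closed.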
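To check whether a long-lived cursor is what is holding back cleanup, one option is to look at which sessions hold the oldest snapshots. The sketch below assumes PostgreSQL 9.4 or later, where pg_stat_activity exposes backend_xmin, and an open psycopg2 connection conn with enough privileges to see other sessions. The helper name and the constant are hypothetical:

```python
# Sessions ordered by how old their snapshot horizon (backend_xmin) is.
# A backend with a long-open cursor should show up near the top.
OLDEST_SNAPSHOTS_SQL = """
    SELECT pid, state, xact_start, backend_xmin, left(query, 60) AS query
    FROM pg_stat_activity
    WHERE backend_xmin IS NOT NULL
    ORDER BY age(backend_xmin) DESC
    LIMIT %s
"""


def oldest_snapshots(conn, limit=5):
    """Return up to `limit` rows describing the sessions with the oldest xmin.

    `conn` is assumed to be an open psycopg2 connection; this helper is an
    illustration, not part of psycopg2.
    """
    with conn.cursor() as cur:
        cur.execute(OLDEST_SNAPSHOTS_SQL, (limit,))
        return cur.fetchall()
```

If the session running the big SELECT stays at the top of this list for hours, vacuum cannot remove tuples that became dead after its snapshot was taken, which is the bloat risk described above.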