In 'r+', why does writing to a text file after reading a single line write at the end, instead of at the `f.tell()` position?

Question

Given a text file like this:

line one
line two
line three

and running the following code:

with open('file', 'r+') as f:
    print(f.tell())
    print(f.readline().strip())
    print(f.tell())
    # f.seek(f.tell())
    f.write('Hello')
    print(f.tell())

the word "Hello" ends up written at the end of the file:

line one
line two
line threeHello

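I expected the write to start at the position of the last character read (right after line one), but that is not the case unless I uncomment f.seek(f.tell()). I am probably missing something basic, but I could not find anything in the Python documentation that explains in depth how this works. What is happening here, and what makes it write the word there? And why does this not happen if I start writing without reading first?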

The printed values of f.tell() are:

0
9
39

Answer 1

This looks like a problem in the interaction between io.TextIOWrapper (the class that open returns in text mode) and io.BufferedRandom (the class it wraps in + mode).

If you change your test case to run in binary mode:

with open('file', 'rb+') as f:
    print(f.tell())
    print(f.readline().strip())
    print(f.tell())
    # f.seek(f.tell())
    f.write(b'Hello')
    print(f.tell())

the behavior is the same whether or not the superfluous f.seek(f.tell()) is included.
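As a rough check of the binary-mode claim, the sketch below runs both variants against a scratch copy of the question's file and compares the results. The temporary path and the helper name write_after_readline are made up for this illustration, and the file is assumed to have no trailing newline, as in the question's output.

```python
import os
import tempfile

# Hypothetical scratch file holding the same three lines as in the question.
path = os.path.join(tempfile.mkdtemp(), "file")


def write_after_readline(do_seek):
    """Read one line in 'rb+' mode, optionally resync, write, return the file bytes."""
    # Recreate the file so both runs start from identical contents.
    with open(path, "wb") as f:
        f.write(b"line one\nline two\nline three")
    with open(path, "rb+") as f:
        f.readline()
        if do_seek:
            f.seek(f.tell())
        f.write(b"Hello")
    with open(path, "rb") as f:
        return f.read()


without_seek = write_after_readline(do_seek=False)
with_seek = write_after_readline(do_seek=True)
print(without_seek)
print(with_seek)
```

In both runs the write lands right after the first line, overwriting the start of the second line, which is what the text-mode version was expected to do.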

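The problem appears to be caused by the multiple layers of buffering involved. What you get is an io.TextIOWrapper wrapping an io.BufferedRandom (which in turn wraps an io.FileIO). The TextIOWrapper reads from the io.BufferedRandom in chunks to amortize the cost of decoding bytes to text, so when you call readline, it actually consumes and decodes your entire file (which is small enough to fit in a single chunk), leaving the BufferedRandom positioned at the end of the file (even though logically it should only be partway through, and TextIOWrapper.tell reports the position corresponding to that logical position).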
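The layering described above can be inspected directly through the buffer and raw attributes of the file object. This is a minimal sketch assuming CPython, an ASCII file with the question's three lines and no trailing newline, and a scratch path invented for the example.

```python
import os
import tempfile

# Hypothetical scratch file with the question's contents (28 bytes, no trailing newline).
path = os.path.join(tempfile.mkdtemp(), "file")
with open(path, "wb") as f:
    f.write(b"line one\nline two\nline three")

with open(path, "r+", encoding="ascii") as f:
    # The three layers: text wrapper -> buffered binary -> raw file.
    layers = (type(f).__name__, type(f.buffer).__name__, type(f.buffer.raw).__name__)

    f.readline()
    text_pos = f.tell()                # logical position reported by the text layer
    buffered_pos = f.buffer.tell()     # where the buffered layer thinks it is
    raw_pos = f.buffer.raw.tell()      # where the OS-level file pointer is

print(layers)
print(text_pos, buffered_pos, raw_pos)
```

For this small file the text layer reports 9 (just past the first line), while both lower layers report 28, the end of the file, because the whole file was pulled in as one chunk.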

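When you turn around and write, the TextIOWrapper encodes the data and passes it to the BufferedRandom, which still thinks it is at the end of the file; since the TextIOWrapper does not correct this, the data is appended at the end. The seemingly no-op f.seek(f.tell()) resynchronizes the TextIOWrapper with the underlying BufferedRandom to get the expected behavior. It should not really be necessary (I would suggest filing a bug to ensure writes go to the logical tell position, as I can't find an existing bug, though "Python 3 f.tell() gets out of sync with file pointer in binary append+read mode" is superficially similar), but at least the workaround is relatively simple.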
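The workaround can be sketched as follows, again against a scratch copy of the question's file. The path is invented for the example, and the file is assumed to be ASCII with no trailing newline.

```python
import os
import tempfile

# Hypothetical scratch file with the question's contents.
path = os.path.join(tempfile.mkdtemp(), "file")
with open(path, "wb") as f:
    f.write(b"line one\nline two\nline three")

with open(path, "r+", encoding="ascii") as f:
    f.readline()
    # Looks like a no-op, but it realigns the underlying buffered stream
    # with the text layer's logical position before the write.
    f.seek(f.tell())
    f.write("Hello")

with open(path, "rb") as f:
    content = f.read()
print(content)
```

With the explicit seek, "Hello" overwrites the first five bytes of the second line instead of being appended after the last line.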

Answer 2

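The problem is related to buffered IO.

The open() function appears to open a buffered file handle.

So in practice, whenever something is read from the file, at least a whole buffer is read, which on my machine appears to be about 8k (8192) bytes. This is done to optimize performance.

So readline reads one block, returns the first line, and keeps the rest in a buffer for future reads.

f.tell() gives you the position relative to the bytes that readline() has already returned.

You can force the write pointer to where you want it with f.seek(f.tell()). Without an explicit seek, you will write after the buffered data.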
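The gap between the two positions can be observed without writing anything, by comparing f.tell() with the position of the underlying buffered stream after a single readline(). This is only a sketch: the scratch path is made up, the file is created in binary mode so that each line is exactly 80 bytes on every platform, and the exact read-ahead amount is an implementation detail that may vary between Python versions.

```python
import io
import os
import tempfile

# Hypothetical scratch file: 1000 lines of 79 asterisks plus a newline (80 bytes each).
path = os.path.join(tempfile.mkdtemp(), "file")
with open(path, "wb") as f:
    f.write((b"*" * 79 + b"\n") * 1000)

with open(path, "r+", encoding="ascii") as f:
    f.readline()
    logical_pos = f.tell()          # bytes accounted for by what readline() returned
    buffered_pos = f.buffer.tell()  # how far the underlying stream has been read

print("default buffer size:", io.DEFAULT_BUFFER_SIZE)
print("logical position:", logical_pos)
print("underlying position:", buffered_pos)
```

The logical position is 80, one full line, while the underlying position is well past it, which is where an unsynchronized write would land.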

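Use the following script to illustrate this and look at the output.

As you will see, I tried playing with the buffering parameter. According to the documentation, 1 means line buffering, but I did not see any change in behavior.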

with open("file", "w") as f:
    f.write(("*" * 79 +"\n") * 1000)

with open('file', 'r+', buffering=1) as f:
    print(f.tell())
    print(f.readline().strip())
    print(f.tell())
    # f.seek(f.tell())
    f.write('Hello')
    print(f.tell())

print("----------- file contents")
with open("file", "r") as f:
    print(f.read())
print("----------- END")

So if you write after readline(), the new data is written after the buffer that was read in.

f.tell(), on the other hand, tells you how many bytes have already been returned.

The output will be:

0
*******************************************************************************
80
8197
8202
----------- file contents
*******************************************************************************
*******************************************************************************
...
*******************************************************************************
********************************HelloHello*************************************
*******************************************************************************
*******************************************************************************
*******************************************************************************
*******************************************************************************
...
