Truncating a text file does not change the file

When a newbie (like me) asks about reading/processing a text file in Python, he often gets an answer like:

with open("input.txt", 'r') as f:
    for line in f:
        pass  # do your stuff

Now I would like to truncate everything in the file I'm reading after a special line. After modifying the example above, I use:

with open("input.txt", 'r+') as file:
    for line in file:
        print line.rstrip("\n\r") #for debug
        if line.rstrip("\n\r")=="CC":
           print "truncating!"  #for debug
           file.truncate();
           break;

and expect it to throw away everything after the first "CC" it sees. Running this code on input.txt:

AA
CC
DD

the following is printed on the console (as expected):

AA
CC
truncating!

but the file "input.txt" stays unchanged!?!?

How can that be? What am I doing wrong?

Edit: After the operation I want the file to contain:

AA
CC

It looks like you're falling victim to a read-ahead buffer used internally by Python. From the documentation for the file.next() method:

A file object is its own iterator, for example iter(f) returns f (unless f is closed). When a file is used as an iterator, typically in a for loop (for example, for line in f: print line.strip()), the next() method is called repeatedly. This method returns the next input line, or raises StopIteration when EOF is hit when the file is open for reading (behavior is undefined when the file is open for writing). In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right. However, using seek() to reposition the file to an absolute position will flush the read-ahead buffer.

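The upshot is that the file's position is not where you would expect it to be when you truncate. One way around this is to use readline to loop over the file, rather than the iterator: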

line = file.readline()
while line:
    ...
    line = file.readline()
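
To see the difference for yourself in Python 3, here is a minimal sketch. The helper names (make_sample_file, tell_during_iteration, tell_after_readline) are made up for illustration, and it assumes a UTF-8 text file with "\n" line endings. On a text file, tell() is refused inside a for loop, but works after readline():

```python
import os
import tempfile


def make_sample_file():
    """Create a temporary file holding the three sample lines; return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write(b"AA\nCC\nDD\n")
    return path


def tell_during_iteration(path):
    """Return True if tell() is refused while iterating with a for loop."""
    with open(path, "r+", encoding="utf-8") as f:
        for _ in f:
            try:
                f.tell()
            except OSError:
                # Python 3 disables tell() while next() is in use.
                return True
            return False
    return False


def tell_after_readline(path):
    """Return the position reported by tell() after one readline() call."""
    with open(path, "r+", encoding="utf-8") as f:
        f.readline()  # consumes "AA\n"
        return f.tell()
```

With the sample file, tell_after_readline() reports the position right after the first line, which is exactly the kind of value truncate() can work with.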

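In addition to glibdud's answer: truncate() accepts the size at which the content is cut off (it is optional and defaults to the current position, but passing it explicitly makes the intent clear). You can get the current position in the file with tell(). As he mentioned, in a for loop next() disables calls such as tell(). In the suggested while loop, however, you can truncate at the current tell() position. So the complete code looks like this: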

Python 3:

with open("test.txt", 'r+') as file:
    line = file.readline()
    while line:
        print(line.strip())
        if line.strip() == "CC":
            print("truncating")
            # cut the file off right after the line just read
            file.truncate(file.tell())
            break
        line = file.readline()
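
If you want to check the result without touching your real input, the same loop can be wrapped in a function and run against a temporary file. This is only a sketch: truncate_after_line and demo are names I made up, and it assumes a UTF-8 file. It compares with rstrip("\r\n") like the question does, rather than strip().

```python
import os
import tempfile


def truncate_after_line(path, marker):
    """Cut the file off right after the first line equal to marker.

    Returns True if the marker was found, False otherwise.
    """
    with open(path, "r+", encoding="utf-8") as f:
        line = f.readline()
        while line:
            if line.rstrip("\r\n") == marker:
                # tell() is valid here because readline() was used, not next().
                f.truncate(f.tell())
                return True
            line = f.readline()
    return False


def demo():
    """Run the truncation on a temporary copy of the sample input."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"AA\nCC\nDD\n")
        found = truncate_after_line(path, "CC")
        with open(path, "rb") as f:
            return found, f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Reading the file back in binary mode shows the raw bytes, so you can see that the "DD" line is really gone from disk.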