频繁地在文件对象上调用 write() 并添加少量内容是不是很糟糕？

Question

我正在重构一个可怕的 python 脚本，它是生成 lua 绑定的 polycode 项目的一部分。

我正在考虑将生成的 lua 行写成块。

但我的问题是，快速写入文件的 detriments/caveats 是什么？

举个例子：

persistent_file = open('/tmp/demo.txt')

for i in range(1000000):
    persistent_file.write(str(i)*80 + '\n')


for i in range(2000):
    persistent_file.write(str(i)*20 + '\n')


for i in range(1000000):
    persistent_file.write(str(i)*100 + '\n')


persistent_file.close()

这只是一种尽可能快地向文件大量写入的简单方法。我真的不希望这样做会遇到任何实际问题，但我确实想知道，为一次大写入缓存是否有利？

Answer 1

来自 open 函数的文档：

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None) -> file object

...

buffering is an optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size of a fixed-size chunk buffer. When no buffering argument is given, the default buffering policy works as follows:

Binary files are buffered in fixed-size chunks; the size of the buffer is chosen using a heuristic trying to determine the underlying device's "block size" and falling back on io.DEFAULT_BUFFER_SIZE. On many systems, the buffer will typically be 4096 or 8192 bytes long.

"Interactive" text files (files for which isatty() returns True) use line buffering. Other text files use the policy described above for binary files.

换句话说，在大多数情况下，频繁调用 write() 的唯一开销是函数调用的开销。

频繁地在文件对象上调用 write() 并添加少量内容是不是很糟糕？

Is it bad to call write() on a file object frequently with small additions?

python

performance

file-io

file

stress-testing