file.write() sometimes (but not always) writes text to file

Question

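I am using file.write() to add numeric data to a text file. However, after 516159 characters, something odd happens: about half of the times I run my code, the last 7k characters are missing from the file. The other half, it works fine. Here is some code: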

#Create or open file (it strangely couldn't create the file without using mode='x')
try:
  corpus_txt = open("corpus.txt", mode = "x")
except:
  corpus_txt = open("corpus.txt", mode = "w")

corpus_txt.truncate(0)#delete contents

content_length = 0

#X_train is a 2D array of integers
for sentence in X_train:
  for word in sentence:

    corpus_txt.write(str(word)+" ")
    content_length += len(str(word)+" ")

  corpus_txt.write("\n")
  content_length += 1

corpus_txt = open("corpus.txt")
content = corpus_txt.read()
corpus_txt.close()

print("FILE LENGTH (chars):", len(content))
print("TOTAL LENGTH OF TEXT ADDED TO FILE:", content_length)

When I run this repeatedly with my data:

"content_length" always equals 523379
"len(content)" alternates between the values 516247 and 523379

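Some other information:

The missing text is at the end of the data (the last 7k characters)
The difference is not caused by the content_length increment at the line breaks
My data is not modified while this code runs
I am using Google Colab
I get 516k slightly more often than 523k
There is no particular pattern to the switching
This should not be related to how the read() method formats the text because, again, only the last 7k characters are missing

Any help or explanation would be much appreciated. Thanks!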

Answer 1

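You need to close() the file after writing to it; otherwise the data is not guaranteed to be flushed to disk, and a subsequent open() will not "see" the writes you made. Using the context manager syntax (with open(...) as ...:) is considered best practice because it makes this kind of mistake almost impossible.

This should work: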

with open("corpus.txt", mode="w") as corpus_txt:

    # opening with "w" automatically overwrites previous contents
    content_length = 0

    #X_train is a 2D array of integers
    for sentence in X_train:
        for word in sentence:
            corpus_txt.write(str(word)+" ")
            content_length += len(str(word)+" ")
        corpus_txt.write("\n")
        content_length += 1

with open("corpus.txt") as corpus_txt:
    content = corpus_txt.read()

print("FILE LENGTH (chars):", len(content))
print("TOTAL LENGTH OF TEXT ADDED TO FILE:", content_length)
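To see the buffering effect in isolation, here is a minimal sketch. It assumes CPython with default buffering; the file name demo.txt and the helper visible_length are made up for illustration. It writes a few characters, then checks how many a second handle can read before flush(), after flush(), and after close():

```python
import os
import tempfile


def visible_length(path):
    """Return how many characters a fresh read handle can see (hypothetical helper)."""
    with open(path) as reader:
        return len(reader.read())


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    writer = open(path, mode="w")
    writer.write("1 2 3\n")  # 6 characters, still sitting in Python's write buffer
    before_flush = visible_length(path)

    writer.flush()  # hand the buffered text over to the operating system
    after_flush = visible_length(path)

    writer.close()  # close() flushes too, then releases the handle
    after_close = visible_length(path)

print("before flush:", before_flush)
print("after flush: ", after_flush)
print("after close: ", after_close)
```

With a write this small, the count before flush() is typically 0, and it is 6 after flush() and after close(). With a large write, as in the question, the buffer is emptied each time it fills up, so only the tail of the data is still pending when the file is reopened. That matches the missing last few thousand characters.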

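Unrelated to the file-writing issue: I would suggest simplifying this by building content as a string up front (since it is evidently small enough to fit in memory), so you don't need all the extra bookkeeping to work out how long it is: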

with open("corpus.txt", mode="w") as corpus_txt:
    content = "\n".join(
        " ".join(str(word) for word in sentence)
        for sentence in X_train
    ) + "\n"
    corpus_txt.write(content)
print(f"File length as written: {len(content)}")

with open("corpus.txt") as corpus_txt:
    content = corpus_txt.read()
print(f"File length as read: {len(content)}")

python

google-colaboratory