Python readline with custom delimiter

Question

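Newbie here. I am trying to read lines from a file, but one line in the .txt file has a \n somewhere in the middle, and when I read that line with .readline, Python cuts it at that point and outputs it as two lines.

When I copy and paste the line into this window, it shows up as two lines, so I uploaded the file here: https://ufile.io/npt3n
I have also added a screenshot of the file as it appears in the txt file.
This is a group chat history exported from WhatsApp, in case you are wondering.
Please help me read one line of the txt file as a whole.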


f= open("f.txt",mode='r',encoding='utf8')

for i in range(4):
    lineText=f.readline()
    print(lineText)

f.close()

Answer 1

Instead of using the readline function, you can read the whole content and split it into lines with a regular expression:

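import re

with open("txt", "r") as f:
    content = f.read()
    # remove end line characters
    content = content.replace("\n", "")
    # split on the "[...]" prefixes; the capturing group keeps them in the result
    # (raw string so that \[ and \] are passed to the regex engine unchanged)
    lines = re.compile(r"(\[[0-9/, :\]]+)").split(content)
    # clean "" elements
    lines = [x for x in lines if x != ""]
# join by pairs: each prefix with the text that follows it
lines = [i + j for i, j in zip(lines[::2], lines[1::2])]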

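If every entry starts the same way, with [...], you can split on that prefix and then clean up the parts by dropping the "" elements. After that you can rejoin each prefix with its text using the zip function.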
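To make the split-and-rejoin step easier to follow, here is a minimal sketch that runs the same idea on an in-memory string instead of a file. The sample chat text is made up for illustration, and the sketch assumes every message has non-empty text that does not itself begin with digits or punctuation from the character class; otherwise the pairing would go out of step.

```python
import re

# Hypothetical chat export: the second message contains a stray newline.
content = (
    "[01/01/2018, 10:00:00] A: hello\n"
    "[01/01/2018, 10:01:00] B: first part\nsecond part\n"
)

# Drop all newlines, as in the answer above.
flat = content.replace("\n", "")

# The capturing group makes re.split keep the "[...]" prefixes in the result.
parts = re.split(r"(\[[0-9/, :\]]+)", flat)
parts = [p for p in parts if p != ""]

# Re-attach each prefix to the message text that follows it.
messages = [prefix + text for prefix, text in zip(parts[::2], parts[1::2])]

for m in messages:
    print(m)
```

Note that removing "\n" glues the two halves of the broken message together without a space; replacing it with " " instead may read better, depending on the data.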

Answer 2

Python 3 lets you define the newline for a specific file. It is seldom used because the default universal newlines mode is very tolerant:

When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller.

So here you should state explicitly that only '\r\n' is an end of line:

f= open("f.txt",mode='r',encoding='utf8', newline='\r\n')

# use enumerate to show that second line is read as a whole
for i, line in enumerate(f):
    print(i, line)
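
For anyone who wants to check this without the original file, the following is a small self-contained sketch. It assumes the export terminates records with '\r\n' and that the stray break is a lone '\n'; the sample bytes are invented for the demonstration.

```python
import os
import tempfile

# Hypothetical sample: records end in '\r\n'; the second record
# contains a lone '\n' in the middle of the message.
sample = (
    b"[01/01/2018, 10:00:00] A: hello\r\n"
    b"[01/01/2018, 10:01:00] B: first part\nsecond part\r\n"
)

# Write the bytes unchanged to a temporary file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Default universal newlines mode: the lone '\n' also ends a line.
    with open(path, mode="r", encoding="utf8") as f:
        default_lines = f.readlines()
    # newline='\r\n': only '\r\n' ends a line, and it is returned untranslated.
    with open(path, mode="r", encoding="utf8", newline="\r\n") as f:
        crlf_lines = f.readlines()
finally:
    os.remove(path)

print(len(default_lines))  # 3
print(len(crlf_lines))     # 2
```

With newline='\r\n' each returned line still ends in '\r\n', so you may want to call rstrip('\r\n') before printing.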
