Python Write Replaces "\n" With "\r\n" in Windows

After looking into my earlier question, I found out that it was caused by a simpler problem.

When I write "\n" to a file, I expect to read "\n" back from the file. This is not always the case in Windows.

In [1]: with open("out", "w") as file:
   ...:     file.write("\n")
   ...:

In [2]: with open("out", "r") as file:
   ...:     s = file.read()
   ...:

In [3]: s  # I expect "\n" and I get it
Out[3]: '\n'

In [4]: with open("out", "rb") as file:
   ...:     b = file.read()
   ...:

In [5]: b  # I expect b"\n"... Uh-oh
Out[5]: b'\r\n'

In [6]: with open("out", "wb") as file:
   ...:     file.write(b"\n")
   ...:

In [7]: with open("out", "r") as file:
   ...:     s = file.read()
   ...:

In [8]: s  # I expect "\n" and I get it
Out[8]: '\n'

In [9]: with open("out", "rb") as file:
   ...:     b = file.read()
   ...:

In [10]: b  # I expect b"\n" and I get it
Out[10]: b'\n'

In a more organized fashion:

| Method of Writing | Method of Reading | "\n" Turns Into |
|-------------------|-------------------|-----------------|
| "w"               | "r"               | "\n"            |
| "w"               | "rb"              | b"\r\n"         |
| "wb"              | "r"               | "\n"            |
| "wb"              | "rb"              | b"\n"           |
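
The table above can be reproduced with a short script. This is only a sketch: the helper name `round_trip` and the use of a temporary directory are assumptions for illustration, not part of the original session. On Windows the `"w"`/`"rb"` row should come back as `b'\r\n'`; on Linux it comes back as `b'\n'`, because the translation target is `os.linesep`.

```python
import os
import tempfile


def round_trip(write_mode, read_mode):
    """Write a single newline with write_mode, then read it back with read_mode."""
    # Binary modes need bytes, text modes need str.
    payload = b"\n" if "b" in write_mode else "\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out")
        with open(path, write_mode) as f:
            f.write(payload)
        with open(path, read_mode) as f:
            return f.read()


# One entry per row of the table: (write mode, read mode) -> what came back.
results = {
    (w, r): round_trip(w, r)
    for w in ("w", "wb")
    for r in ("r", "rb")
}

for (w, r), value in results.items():
    print(f"{w!r:5} {r!r:5} {value!r}")
```

Only the text-mode write followed by a binary read depends on the platform; the other three combinations give the same result everywhere.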

When I try this on my Linux VM, it always returns \n. How can I do this in Windows?

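Edit: This is especially problematic with the pandas library, which appears to write DataFrames to csv with "w" and read csvs with "rb". See the question linked at the top for an example of this.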

From the documentation:

newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. It works as follows:

[...]

  • When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
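
So, to write "\n" without any translation, pass newline='' (or newline='\n') when opening the file:

open(..., 'w', newline='')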
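
As a quick check of the documented behavior, the sketch below writes text with `newline=""` and then inspects the raw bytes. The file name and the temporary directory are assumptions made so the example is self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out")

    # newline="" disables translation on write, so each "\n" stays a single byte.
    with open(path, "w", newline="") as f:
        f.write("first\nsecond\n")

    # Binary mode shows exactly what ended up on disk.
    with open(path, "rb") as f:
        raw = f.read()

print(raw)  # b'first\nsecond\n' on Windows and Linux alike
```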

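Since you're using Python 3, you're in luck. When you open the file for writing, just specify newline='\n' to ensure that it writes '\n' instead of the system default, which is \r\n on Windows. From the docs: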

When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.

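The reason that you think you are "sometimes" seeing the two-character output is that when you open the file in binary mode, no conversion is done at all. Byte arrays are just displayed in ASCII as a convenience. Don't think of them as real strings until they have been decoded. The binary output you show is the true contents of the file in all your examples.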
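
To see the difference between the bytes on disk and the decoded text on any operating system, the sketch below forces the Windows-style translation with `newline="\r\n"` rather than relying on the platform default. The file name and temporary directory are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out")

    # newline="\r\n" makes every written "\n" become "\r\n",
    # which is what the default does on Windows.
    with open(path, "w", newline="\r\n") as f:
        f.write("\n")

    # Binary mode: no conversion, these are the real file contents.
    with open(path, "rb") as f:
        raw = f.read()

    # Default text mode: "\r\n" is translated back to "\n" while reading.
    with open(path, "r") as f:
        text = f.read()

print(raw, list(raw))  # b'\r\n' [13, 10]
print(repr(text))      # '\n'
```

The file holds two bytes in every case; only the text-mode read hides the carriage return.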

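When you open the file for reading in the default text mode, the newline parameter works similarly to the way it does for writing. By default, all \r\n in the file will be converted to just \n after the characters are decoded. This is very nice when your code travels between OSes but your files do not, since you can use the exact same code that relies only on \n. If your files travel as well, you should stick to the relatively portable newline='\n' for at least the output.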
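
The read side can be illustrated the same way. In this sketch (file name and sample contents are assumptions), a file with Windows line endings is read once with the default `newline=None` and once with `newline=""`, which turns translation off.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out")

    # Write Windows-style line endings byte for byte.
    with open(path, "wb") as f:
        f.write(b"a\r\nb\r\n")

    # Default (newline=None): "\r\n" is translated to "\n" on read.
    with open(path, "r") as f:
        translated = f.read()

    # newline="": line endings are returned untranslated.
    with open(path, "r", newline="") as f:
        untranslated = f.read()

print(repr(translated))    # 'a\nb\n'
print(repr(untranslated))  # 'a\r\nb\r\n'
```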

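The encoding of files is generally system dependent. As the answer above mentions, we can hardcode the newline option to '\n' if that works for us. However, this approach won't work when you get files or data from the cloud, since they usually come with restricted access and are parsed into lightweight portable file formats. So the best way to get rid of the default binary or any other encoding is to call the decode() method on any encoded data, including the output of file.read(). For example, in your case: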

In [1]: with open("out", "w") as file:
   ...:     file.write("\n")

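In [q]: with open("out", "rb") as file:  # binary mode, so read() returns bytes that decode() can turn into str
   ...:     s = file.read().decode()  # note: decode() does not translate "\r\n" to "\n"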

#--------------------------- OR ---------------------------

In [q']: with open(..., newline='delimiter of your choice') as file:  # newline must be None, '', '\n', '\r', or '\r\n'
   ...:     s = file.read()
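
When the data arrives as bytes rather than as a local file, decoding and newline handling are two separate steps. The sketch below uses a hypothetical `payload` standing in for bytes fetched from a remote source, and wraps it in `io.TextIOWrapper` to get the same newline translation that `open()` applies in text mode. UTF-8 is assumed as the encoding.

```python
import io

# Hypothetical payload, standing in for bytes fetched from a remote source.
payload = "name,value\r\nalpha,1\r\n".encode("utf-8")

# decode() alone converts bytes to str but keeps the line endings as they are.
decoded = payload.decode("utf-8")

# A text wrapper around the bytes applies the same newline handling as open():
# with newline=None, "\r\n" is translated to "\n" while reading.
with io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8", newline=None) as stream:
    normalized = stream.read()

print(repr(decoded))     # 'name,value\r\nalpha,1\r\n'
print(repr(normalized))  # 'name,value\nalpha,1\n'
```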