Prevent Python print() from automatically converting newlines to CRLF on Windows

I'd like to pipe text with unix-like EOL (LF) from Python via the Windows CMD (console). However, Python seems to automatically convert single newlines into Windows-style end-of-line (EOL) characters (i.e. \r\n, <CR><LF>, 0D 0A, 13 10):

#!python3
#coding=utf-8
import sys
print(sys.version)
print("one\ntwo")
# run as py t.py > t.txt

Result

3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)]
one
two

or in hex ... 6F 6E 65 0D 0A 74 77 6F 0D 0A
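To check which bytes actually ended up in the redirected file, it can be read back in binary mode and dumped as hex. This is only a small sketch; hex_dump is a hypothetical helper name, not something from the standard library.

```python
def hex_dump(data: bytes) -> str:
    """Return the bytes as space-separated two-digit uppercase hex values."""
    return " ".join(f"{byte:02X}" for byte in data)


# The CRLF-terminated output from the question, as bytes.
print(hex_dump(b"one\r\ntwo\r\n"))  # 6F 6E 65 0D 0A 74 77 6F 0D 0A
```

For the file produced above, something like hex_dump(open("t.txt", "rb").read()) would show whether 0D 0A or a bare 0A was written.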

The second EOL is there because print() defaults to end='\n', but it undergoes the conversion as well.

print() has no newline argument or property like open() does, so how can this be controlled?

See this answer:


print() usually writes to sys.stdout. The following are excerpts from the documentation for non-interactive mode:

  • stdout is used for the output of print()

  • sys.stdout: File object used by the interpreter for standard ... output

  • These streams are regular text files like those returned by the open() function.

  • character encoding on Windows is ANSI

  • standard streams are ... block-buffered like regular text files.

  • Note
    To write or read binary data from/to the standard streams, use the underlying binary buffer object. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').

Let's try this direct approach first:

import sys
print("one\ntwo")
sys.stdout.write('three\nfour')
sys.stdout.buffer.write(b'five\nsix')

Result

five\n
sixone\r\n
two\r\n
three\r\n
four

The buffer write seems to work as expected, although it is "messing" with the output order.
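The reordering can be reproduced without a console. In this sketch an in-memory io.BytesIO stands in for sys.stdout.buffer (an assumption made for illustration), and interleave is a hypothetical helper name. The text layer keeps small writes pending until it is flushed, while bytes written to the underlying object land immediately.

```python
import io


def interleave(flush_first: bool) -> bytes:
    """Mix a text-layer write with a direct byte write and return the raw result."""
    raw = io.BytesIO()  # stands in for sys.stdout.buffer
    # newline="\r\n" mimics the translation seen on Windows.
    text = io.TextIOWrapper(raw, encoding="ascii", newline="\r\n")

    text.write("one\ntwo\n")      # stays pending inside the text layer
    if flush_first:
        text.flush()              # hand the pending text down before the raw write
    raw.write(b"five\nsix")       # bypasses the text layer, no translation
    text.flush()

    data = raw.getvalue()
    text.detach()                 # keep the wrapper from closing raw later
    return data


print(interleave(flush_first=False))
print(interleave(flush_first=True))
```

Without the early flush the raw bytes overtake the pending text; with it the order matches the order of the calls.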

Flushing before writing directly to the buffer helps:

import sys
print("one\ntwo")
sys.stdout.write('three\nfour')
sys.stdout.flush()
sys.stdout.buffer.write(b'five\nsix')

Result

one\r\n
two\r\n
three\r\n
fourfive\n
six
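The flush-then-write-bytes pattern can be wrapped in a small print-like helper. This is a minimal sketch, not a replacement for print(): print_lf and its stream parameter are hypothetical names, and it assumes the target stream exposes .buffer, .encoding and .errors the way a regular sys.stdout does.

```python
import io
import sys


def print_lf(*args, sep=" ", end="\n", stream=None):
    """Write the arguments with untranslated newlines via the binary buffer."""
    stream = sys.stdout if stream is None else stream
    text = sep.join(str(arg) for arg in args) + end
    stream.flush()  # push out pending text first so the order is preserved
    stream.buffer.write(
        text.encode(stream.encoding or "utf-8", errors=stream.errors or "strict")
    )
    stream.buffer.flush()


# Demonstration on an in-memory stream that translates "\n" to "\r\n".
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="utf-8", newline="\r\n")
print("one\ntwo", file=fake_stdout)      # goes through the text layer
print_lf("three\nfour", stream=fake_stdout)  # bypasses newline translation
print(raw.getvalue())
```

Only the lines written through the text layer get CRLF; the helper's output keeps its bare LF.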

But it's still not "fixing" print(). Back to the file object / stream / text file (there is brief information about I/O objects in the Python data model):

https://docs.python.org/3/glossary.html#term-text-file

A file object able to read and write str objects. Often, a text file actually accesses a byte-oriented datastream and handles the text encoding automatically. Examples of text files are files opened in text mode ('r' or 'w'), sys.stdin, sys.stdout, and instances of io.StringIO.

So (how) can the sys.stdout file be reconfigured or reopened to control the newline behavior? And what exactly is it?

>>> import sys
>>> type(sys.stdout)
<class '_io.TextIOWrapper'>

Docs: class io.TextIOWrapper(buffer, encoding=None, errors=None, newline=None, line_buffering=False, write_through=False)

newline controls how line endings are handled. It can be None, '', '\n', '\r', and '\r\n'.
It works as follows:
When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller.
If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated.
If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.

When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep.
If newline is '' or '\n', no translation takes place.
If newline is any of the other legal values, any '\n' characters written are translated to the given string.
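The write-side rules quoted above can be tried out on any platform by wrapping an in-memory byte stream. This is just a sketch; written_bytes is a hypothetical helper name. newline=None is left out of the loop because its result depends on the platform's default line separator.

```python
import io


def written_bytes(text: str, newline) -> bytes:
    """Write text through a TextIOWrapper and return the bytes that come out."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # keep the wrapper from closing raw when it is discarded
    return data


for nl in ("", "\n", "\r", "\r\n"):
    print(repr(nl), written_bytes("one\ntwo\n", nl))
```

With '' and '\n' the bytes come out unchanged, while '\r' and '\r\n' replace every '\n' that is written.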

Let's see:

>>> sys.stdout.newline = "\n"
>>>

OK, and what about

import sys
sys.stdout.newline = '\n'
print("one\ntwo")

It does not work:

one\r\n
two\r\n

because the attribute does not exist:

>>> sys.stdout.newline
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: '_io.TextIOWrapper' object has no attribute 'newline'

I should have checked earlier..

>>> vars(sys.stdout)
{'mode': 'w'}

So really, there is no newline attribute for us to redefine.

Any useful methods?

>>> dir(sys.stdout)
['_CHUNK_SIZE', '__class__', '__del__', '__delattr__', '__dict__', 
'__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', 
'__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', 
'__init__', '__init_subclass__', '__iter__', '__le__', '__lt__',
'__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', 
'__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 
'_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', 
'_finalizing', 'buffer', 'close', 'closed', 'detach', 'encoding', 
'errors', 'fileno', 'flush', 'isatty', 'line_buffering', 'mode', 
'name', 'newlines', 'read', 'readable', 'readline', 'readlines',
'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 
'writelines']

Not really.
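That holds for the 3.6 interpreter used here. Assuming Python 3.7 or later, io.TextIOWrapper has a reconfigure() method that accepts a newline argument, so the existing stream can be switched in place. The sketch below uses a hypothetical helper name (force_lf) and demonstrates it on an in-memory stream rather than on the real sys.stdout.

```python
import io


def force_lf(stream) -> bool:
    """Switch a text stream to untranslated LF output if reconfigure() exists."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False  # e.g. Python 3.6 or a stream type without the method
    reconfigure(newline="\n")
    return True


# In-memory stand-in for a Windows stdout that would translate "\n" to "\r\n".
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="utf-8", newline="\r\n")

changed = force_lf(fake_stdout)
print("one\ntwo", file=fake_stdout)
fake_stdout.flush()
print(changed, raw.getvalue())
```

In a script this would be called as force_lf(sys.stdout) before the first print(). On 3.6 the method is not available, so the rest of this answer still applies there.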

But we can at least replace the default interface to the buffer and specify the required newline character(s):

import sys, io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, newline='\n' )
print("one\ntwo")

which finally results in

one\n
two\n

To restore, just reassign sys.stdout to a copy you made. Or, apparently not recommended, use the internally kept sys.__stdout__ to do that.
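The swap and restore can be bundled into a context manager. This is a sketch under a few assumptions: lf_stdout is a hypothetical name, and sys.stdout is expected to be a regular text stream exposing .buffer, .encoding and .errors. One pitfall I'm aware of is that a TextIOWrapper closes its underlying buffer when it is garbage-collected, so the temporary wrapper is detached before it is dropped.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def lf_stdout():
    """Temporarily route sys.stdout through a wrapper that keeps '\\n' as is."""
    original = sys.stdout
    original.flush()  # empty the old text layer before sharing its buffer
    wrapper = io.TextIOWrapper(
        original.buffer,
        encoding=original.encoding,
        errors=original.errors,
        newline="\n",          # no translation on output
        write_through=True,    # do not hold text back in the new layer
    )
    sys.stdout = wrapper
    try:
        yield wrapper
    finally:
        wrapper.flush()
        wrapper.detach()       # so discarding the wrapper cannot close the buffer
        sys.stdout = original  # restore the saved stream
```

Usage would look like: with lf_stdout(): print("one\ntwo"). After the block, print() goes through the original stream again.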

Warning: see the comment below, this requires some care. Use his solution instead (link below):


It also seems possible to reopen the file; the linked answers provide inspiration and an implementation.


If you want to take a closer look, check out the Python (CPython) sources: https://github.com/python/cpython/blob/master/Modules/_io/textio.c


There is also os.linesep; let's see whether it really is "\r\n" on Windows:

>>> import os
>>> os.linesep
'\r\n'
>>> ",".join([f'0x{ord(c):X}' for c in os.linesep])
'0xD,0xA'

Can this be redefined?

#!python3
#coding=utf-8
import sys, os
saved = os.linesep
os.linesep = '\n'
print(os.linesep)
print("one\ntwo")
os.linesep = saved

It can in interactive mode, but apparently not otherwise:

\r\n
\r\n
one\r\n
two\r\n
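My reading of this result (an assumption, not something the documentation spells out) is that a text wrapper settles its output line separator when it is constructed, so assigning os.linesep later has no effect on a stream that already exists, such as sys.stdout. The sketch below checks that on an in-memory stream; it temporarily sets os.linesep to a different value and restores it afterwards.

```python
import io
import os

raw = io.BytesIO()
# newline=None is the default translation mode, as used for sys.stdout.
wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=None)

saved = os.linesep
os.linesep = "\r\n" if saved == "\n" else "\n"  # deliberately change the value
try:
    wrapper.write("one\ntwo\n")
    wrapper.flush()
finally:
    os.linesep = saved  # always put the real value back

data = raw.getvalue()
expected = "one\ntwo\n".replace("\n", saved).encode("ascii")
print(data == expected)  # the wrapper still used the platform's separator
```

If that reading is right, the newline behavior has to be chosen on the wrapper itself rather than through os.linesep.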