Why is there a unicode error only when piping the output of a python script?

Question

I am using a Python script written for Python 2.7 (seafile-cli, from Seafile, a file synchronization solution).

I know that unicode is problematic in Python 2, but fortunately file names with diacritics are handled correctly when I run the script:

$ seaf-cli status
# Name  Status  Progress
photos  downloading     0/0, 0.0KB/s
Ma bibliothèque downloading     566/1770, 1745.7KB/s
videos  downloading     28/1203, 5088.0KB/s
dev-perso       downloading     0/0, 0.0KB/s
dev-pro downloading     0/0, 0.0KB/s

To my surprise, when the output is piped, the Python script crashes with a UnicodeEncodeError:

$ seaf-cli status | cat -
# Name  Status  Progress
photos  downloading     0/0, 0.0KB/s
Traceback (most recent call last):
  File "/usr/bin/seaf-cli", line 845, in <module>
    main()
  File "/usr/bin/seaf-cli", line 841, in main
    args.func(args)
  File "/usr/bin/seaf-cli", line 649, in seaf_status
    tx_task.rate/1024.0)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 11: ordinal not in range(128)

I could understand it having a problem with Ma bibliothèque in the first place (but it does not), so why does piping trigger the traceback?

Shouldn't this be the shell's problem? By that point the output has already "left" the script.

EDIT: the answer is in another question. Marked as duplicate.

Answer 1

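Python knows how to handle the encoding when your program prints to a terminal, because it uses whatever encoding the terminal application is using.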

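When you send (pipe) your output, it needs to be encoded. This is because a pipe actually sends a stream of bytes between applications. Each pipe is a one-way channel: one side writes data and the other side reads it.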

With a pipe or a redirection, you are sending data to a fd (file descriptor), which is then read by another application.
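To see the difference between the two cases for yourself, something like the following might help. It is only a small sketch: `describe_stdout` is a name I made up, and the comment about `None` describes Python 2 behaviour (Python 3 always picks an encoding).

```python
import sys


def describe_stdout(stream=None):
    """Return (is_terminal, encoding) for a stream (default: sys.stdout)."""
    stream = sys.stdout if stream is None else stream
    is_terminal = stream.isatty()
    # On Python 2 the encoding is None when stdout is a pipe or a file;
    # Python 3 always reports some encoding here.
    encoding = getattr(stream, "encoding", None)
    return is_terminal, encoding


if __name__ == "__main__":
    # Report on stderr so the message stays visible when stdout is piped.
    sys.stderr.write("isatty=%r encoding=%r\n" % describe_stdout())
```

Saving this to a file and running it once directly and once with `| cat -` appended should show `isatty` flipping from True to False.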

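So you need to make sure Python encodes the data correctly before sending it, and the receiving program then needs to decode it before processing it.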
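One way to do the encoding yourself is sketched below. This is not what seaf-cli does; `encode_for_output` and the UTF-8 fallback are my own assumptions, and you would pick whatever encoding the program on the other side of the pipe expects.

```python
import sys


def encode_for_output(text, encoding=None, fallback="utf-8"):
    """Encode unicode text to bytes before it is written to a pipe.

    Uses the explicit encoding if given, else the one reported by
    sys.stdout, else the fallback (an assumption: UTF-8).
    """
    if encoding is None:
        encoding = getattr(sys.stdout, "encoding", None) or fallback
    return text.encode(encoding)


# The same text becomes different bytes depending on the encoding chosen.
utf8_bytes = encode_for_output(u"biblioth\xe8que", encoding="utf-8")
latin1_bytes = encode_for_output(u"biblioth\xe8que", encoding="latin-1")
```

The reader at the other end has to decode with the same encoding, otherwise it sees different characters than the ones you sent.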

You may also find this question useful.

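UPDATE: I will try to elaborate a bit more on encoding. What I meant in the first line of my answer is that, because your Python interpreter uses a specific encoding, it knows how to translate the hex values (the actual bytes) into symbols.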

My interpreter does not; if I try to create a string from your text, I get an error:

>>> s = 'bibliothèque'
Unsupported characters in input

This is because I am using a different encoding in my interpreter.

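When the output is piped, Python can no longer take the encoding from your terminal. When Python 2 sends data out of your program in that situation, it uses the default encoding: ASCII. It cannot translate your special character (shown by the hex value \xe8) using ASCII. Therefore, you have to specify which encoding Python should use to send it.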
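The failure from the traceback can be reproduced without any pipe by asking for ASCII explicitly. This is just an illustration; `try_ascii` is a helper name I made up.

```python
name = u"Ma biblioth\xe8que"


def try_ascii(text):
    """Return (encoded_bytes, None) on success or (None, error) on failure."""
    try:
        return text.encode("ascii"), None
    except UnicodeEncodeError as exc:
        return None, exc


encoded, error = try_ascii(name)
# error.start is the index of the first character ASCII cannot represent.
position = error.start

# With an encoding that covers the character, the same text encodes fine.
as_utf8 = name.encode("utf-8")
```

The offending character sits at index 11 of the string, the same position the traceback in the question reports.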

If you change your shell encoding you may be able to overcome this; check this question on SO.
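A related option, which may or may not be what that question suggests, is the `PYTHONIOENCODING` environment variable: Python reads it at startup and uses it for stdin/stdout/stderr even when they are pipes. The sketch below starts a child interpreter with its stdout piped; `run_piped` is a hypothetical helper name.

```python
import os
import subprocess
import sys


def run_piped(code, io_encoding):
    """Run `code` in a child Python whose stdout is a pipe, not a terminal."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.check_output([sys.executable, "-c", code], env=env)


# The child prints a non-ASCII character into the pipe; with the variable
# set, it is encoded as UTF-8 instead of raising UnicodeEncodeError.
raw = run_piped(r"print(u'biblioth\xe8que')", "utf-8")
text = raw.decode("utf-8").strip()
```

From the shell the equivalent would be along the lines of `PYTHONIOENCODING=utf-8 seaf-cli status | cat -`, assuming the script does not override stdout itself.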

PS - Ned Batchelder has a great video on YouTube about Unicode; maybe it will shed more light on the subject.

