Python encoding/decoding problem in armv7l Linux

Question

I am writing some Python code on an e-book reader device and cannot solve an encoding/decoding problem. My environment is as follows:

Device: Kobo Aura One
OS: Linux (none) 3.0.35+ #5030 PREEMPT Wed Oct 25 10:25:24 CST 2017 armv7l GNU/Linux
Python 3.4.1

Test code:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import os
import sys
import locale

# Check the encoding and locale.
print('stderr:', sys.stderr, 'stdout:', sys.stdout)
print('filesystem encoding:', sys.getfilesystemencoding())
print('default locale:', locale.getdefaultlocale())
print('preferred encoding:', locale.getpreferredencoding())

path = '/mnt/onboard/Library'
for sub in os.listdir(path):
  print(sub)

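The directory in path contains some files with non-alphabetic (Korean) names. Files with alphabetic names print fine, but the non-alphabetic ones raise an exception.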

Result:

stderr: <_io.TextIOWrapper name='<stderr>' mode='w' encoding='ANSI_X3.4-1968'> stdout: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='ANSI_X3.4-1968'>
filesystem encoding: ascii
default locale: (None, None)
preferred encoding: ANSI_X3.4-1968
Traceback (most recent call last):
  File "./total.py", line 16, in <module>
    print(sub)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

So I tried setting the environment:

export PYTHONIOENCODING='UTF-8'
export LANG='C.UTF-8'
export LC_ALL='C.UTF-8'

Result:

stderr: <_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'> stdout: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
filesystem encoding: ascii
default locale: ('C', 'UTF-8')
preferred encoding: ANSI_X3.4-1968
Traceback (most recent call last):
  File "./total.py", line 16, in <module>
    print(sub)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udced' in position 0: surrogates not allowed

This time I modified the code as follows, and the file names printed fine:

    print(sub.encode('utf-8', 'surrogateescape').decode('utf-8'))

But then I ran into another problem: I cannot access the files. For example, after adding this line,

    print(os.path.exists(sub))

the result is False. Several alternatives gave the same result or raised exceptions.

print(os.path.exists(sub))                                                     # False
print(os.path.exists(sub.encode('utf-8', 'surrogateescape').decode('utf-8')))  # False
print(os.path.exists(sub.encode('utf-8', 'surrogateescape')))                  # False
print(os.path.exists(sub.encode('ascii')))                                     # Exception
print(os.path.exists(sub.encode('ascii').decode('ascii')))                     # Exception
print(os.path.exists(sub.encode('utf-8', 'surrogateescape').decode('ascii')))  # Exception

Now I am out of ideas. What can I do?

Answer 1

"Result, False. Several alternatives were same or caused exceptions."

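When you re-encode sub, you are actually changing its value, even though it "looks" the same. Once you re-encode it with UTF-8 and an error-handling option, the result is no longer the same value as the path returned by os.listdir.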

If you want to both display the file name and access the file's contents, you need to keep both values.
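A minimal sketch of keeping both values, assuming the names on disk are UTF-8 and the filesystem encoding is ASCII, as in the question. The helper display_name and the sample file name are hypothetical, and the os.listdir result is simulated with plain string operations so the snippet runs on any machine.

```python
def display_name(raw_name, fs_encoding="ascii"):
    """Return a printable form of a name as returned by os.listdir."""
    # Undo the surrogateescape decoding to recover the original bytes.
    raw_bytes = raw_name.encode(fs_encoding, "surrogateescape")
    # Assumption: the bytes on disk are UTF-8; undecodable bytes become U+FFFD.
    return raw_bytes.decode("utf-8", "replace")


# Simulate what os.listdir yields when the filesystem encoding is ASCII:
# every non-ASCII byte is smuggled into the str as a lone surrogate.
on_disk = "한글.txt".encode("utf-8")
raw = on_disk.decode("ascii", "surrogateescape")

# Keep both values: raw for file access, shown for printing.
shown = display_name(raw)
entry = {"raw": raw, "shown": shown}
```

Here entry["raw"] is what should be passed back to functions such as os.path.join or open, while entry["shown"] is only for output. The two strings are not equal, which is why re-encoded names fail lookups.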

print(os.path.exists(sub))

# False

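Try os.path.exists(os.path.join(path, sub)). os.listdir returns bare entry names rather than full paths, so sub on its own is resolved against the current working directory.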
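As a rough illustration of the suggestion above, the following sketch joins each name from os.listdir with its directory before checking it. The function name existing_entries and the temporary directory with book.txt are made up for the example; on the device, the directory would be the path from the question.

```python
import os
import tempfile


def existing_entries(directory):
    """Return (name, full_path, exists) for every entry in directory."""
    results = []
    for name in os.listdir(directory):
        # Use the unmodified name from os.listdir to build the full path.
        full_path = os.path.join(directory, name)
        results.append((name, full_path, os.path.exists(full_path)))
    return results


# Hypothetical stand-in for /mnt/onboard/Library.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "book.txt"), "w").close()
    results = existing_entries(tmp)
```

The name used for the join is the one os.listdir returned, not a re-encoded copy, so the original bytes of the file name survive the round trip to the filesystem.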
