Python encoding/decoding problem in armv7l Linux

I wrote some Python code on an e-book reader device and cannot solve an encoding/decoding problem. My environment is as follows:

Test code:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import os
import sys
import locale

# Check the encoding and locale.
print('stderr:', sys.stderr, 'stdout:', sys.stdout)
print('filesystem encoding:', sys.getfilesystemencoding())
print('default locale:', locale.getdefaultlocale())
print('preferred encoding:', locale.getpreferredencoding())

path = '/mnt/onboard/Library'
for sub in os.listdir(path):
  print(sub)

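The directory in path contains some files with non-alphabetic (Korean) names. Files with alphabetic names print fine, but the non-alphabetic names raise an exception.

Result: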

stderr: <_io.TextIOWrapper name='<stderr>' mode='w' encoding='ANSI_X3.4-1968'> stdout: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='ANSI_X3.4-1968'>
filesystem encoding: ascii
default locale: (None, None)
preferred encoding: ANSI_X3.4-1968
Traceback (most recent call last):
  File "./total.py", line 16, in <module>
    print(sub)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

So I tried setting the environment:

export PYTHONIOENCODING='UTF-8'
export LANG='C.UTF-8'
export LC_ALL='C.UTF-8'

Result:

stderr: <_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'> stdout: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
filesystem encoding: ascii
default locale: ('C', 'UTF-8')
preferred encoding: ANSI_X3.4-1968
Traceback (most recent call last):
  File "./total.py", line 16, in <module>
    print(sub)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udced' in position 0: surrogates not allowed

This time I modified the code as follows, and the filenames print fine:

    print(sub.encode('utf-8', 'surrogateescape').decode('utf-8'))

However, I ran into another problem: I cannot access these files. For example, adding this line,

    print(os.path.exists(sub))

prints False. Several alternatives gave the same result or raised exceptions.

print(os.path.exists(sub))                                                     # False
print(os.path.exists(sub.encode('utf-8', 'surrogateescape').decode('utf-8')))  # False
print(os.path.exists(sub.encode('utf-8', 'surrogateescape')))                  # False
print(os.path.exists(sub.encode('ascii')))                                     # Exception
print(os.path.exists(sub.encode('ascii').decode('ascii')))                     # Exception
print(os.path.exists(sub.encode('utf-8', 'surrogateescape').decode('ascii')))  # Exception

Now I am out of ideas. What can I do?

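When you re-encode sub, you are actually changing its value, even if it "looks" the same. Once you re-encode with UTF-8 and an error-handler option, the result is no longer the same value that os.listdir returned.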
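To make the difference concrete, here is a minimal sketch that imitates what probably happens on your device. It assumes os.listdir decodes the raw filename bytes as ASCII with the surrogateescape handler (which matches the '\udced' in your traceback); the sample name '한글' and both helper functions are hypothetical, chosen only for illustration.

```python
def simulate_listdir_name(raw_bytes):
    """Decode raw filename bytes the way an ASCII-locale os.listdir presumably would."""
    # Every byte >= 0x80 becomes a lone surrogate in the range U+DC80..U+DCFF.
    return raw_bytes.decode('ascii', 'surrogateescape')


def to_display(name):
    """Turn the surrogates back into the original bytes, then decode them as UTF-8."""
    return name.encode('utf-8', 'surrogateescape').decode('utf-8')


raw = '한글'.encode('utf-8')      # hypothetical on-disk name: 6 bytes of UTF-8
sub = simulate_listdir_name(raw)  # the value os.listdir would hand back
shown = to_display(sub)           # the value that prints nicely

print(len(sub), len(shown))       # the two strings have different lengths
print(sub == shown)               # and they are not equal
```

The string in sub is the one that maps back to the bytes on disk; shown is a different string that only happens to be readable.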

If you want to both display the filename and access its contents, you need to keep both values.
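One possible way to keep both values is sketched below. The helper name list_entries is hypothetical, and the sketch assumes the names on disk are UTF-8 bytes; os.fsencode is used to recover those bytes from whatever os.listdir returned, and undecodable bytes are replaced in the display string rather than raising.

```python
import os


def list_entries(directory):
    """Return (access_path, display_name) pairs for every entry in directory."""
    entries = []
    for raw_name in os.listdir(directory):
        # Keep the name exactly as os.listdir returned it for file access.
        access_path = os.path.join(directory, raw_name)
        # Recover the original bytes and decode them as UTF-8 for display only.
        display_name = os.fsencode(raw_name).decode('utf-8', 'replace')
        entries.append((access_path, display_name))
    return entries
```

You would then print display_name and pass access_path to os.path.exists, open, and so on, never the other way round.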

print(os.path.exists(sub))

# False

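Try os.path.exists(os.path.join(path, sub)) instead. os.listdir returns bare names, so os.path.exists(sub) looks for the file in the current working directory rather than in path.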
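A small sketch of the difference, in case it helps; check_entries is a hypothetical helper name, and the sketch simply compares the bare-name lookup with the joined-path lookup for each entry.

```python
import os


def check_entries(directory):
    """Map each name in directory to (exists_as_bare_name, exists_when_joined)."""
    results = {}
    for name in os.listdir(directory):
        # A bare name is resolved against the current working directory.
        bare = os.path.exists(name)
        # A joined path is resolved against the directory that was listed.
        joined = os.path.exists(os.path.join(directory, name))
        results[name] = (bare, joined)
    return results
```

Unless the script happens to run from inside the listed directory, the first value should be False and the second True for every entry.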