Python encoding/decoding problem in armv7l Linux
I wrote some Python code on an e-book reader device and cannot solve an encoding/decoding problem.
My environment is as follows:
- Device: Kobo Aura One
- OS: Linux (none) 3.0.35+ #5030 PREEMPT Wed Oct 25 10:25:24 CST 2017 armv7l GNU/Linux
- Python 3.4.1
Test code:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import os
import sys
import locale
# Check the encoding and locale.
print('stderr:', sys.stderr, 'stdout:', sys.stdout)
print('filesystem encoding:', sys.getfilesystemencoding())
print('default locale:', locale.getdefaultlocale())
print('preferred encoding:', locale.getpreferredencoding())
path = '/mnt/onboard/Library'
for sub in os.listdir(path):
    print(sub)
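The directory in path contains some files with non-alphabetic (Korean) names. Files with alphabetic names print fine, but the non-alphabetic ones raise an exception.
Result: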
stderr: <_io.TextIOWrapper name='<stderr>' mode='w' encoding='ANSI_X3.4-1968'> stdout: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='ANSI_X3.4-1968'>
filesystem encoding: ascii
default locale: (None, None)
preferred encoding: ANSI_X3.4-1968
Traceback (most recent call last):
  File "./total.py", line 16, in <module>
    print(sub)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)
So I tried setting the environment:
export PYTHONIOENCODING='UTF-8'
export LANG='C.UTF-8'
export LC_ALL='C.UTF-8'
Result:
stderr: <_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'> stdout: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
filesystem encoding: ascii
default locale: ('C', 'UTF-8')
preferred encoding: ANSI_X3.4-1968
Traceback (most recent call last):
  File "./total.py", line 16, in <module>
    print(sub)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udced' in position 0: surrogates not allowed
This time I modified the code, and the file names printed fine:
print(sub.encode('utf-8', 'surrogateescape').decode('utf-8'))
But then I ran into another problem: I cannot access these files. For example, I added this line:
print(os.path.exists(sub))
The result is False. Several alternatives gave the same result or raised exceptions:
print(os.path.exists(sub)) # False
print(os.path.exists(sub.encode('utf-8', 'surrogateescape').decode('utf-8'))) # False
print(os.path.exists(sub.encode('utf-8', 'surrogateescape'))) # False
print(os.path.exists(sub.encode('ascii'))) # Exception
print(os.path.exists(sub.encode('ascii').decode('ascii'))) # Exception
print(os.path.exists(sub.encode('utf-8', 'surrogateescape').decode('ascii'))) # Exception
Now I am out of ideas.
What can I do?
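When you re-encode sub, you are actually changing its value, even if it "looks" the same. Once you re-encode it as UTF-8 with an error-handling option, the result is no longer the same value as the path returned by os.listdir.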
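To make that concrete, here is a small sketch that simulates what os.listdir hands back when the filesystem encoding is ASCII. It assumes the names on disk are stored as UTF-8; the sample file name and the variable names are made up for illustration.

```python
# Hypothetical Korean file name, written with escapes so the script
# itself stays pure ASCII.
original = '\ud55c\uae00.txt'

# The bytes actually stored in the directory entry (assumed UTF-8).
raw = original.encode('utf-8')

# With an ASCII filesystem encoding, Python decodes names using
# 'surrogateescape': each non-ASCII byte becomes a lone surrogate.
listed = raw.decode('ascii', 'surrogateescape')

# The re-encoded form from the question, which is what prints nicely.
pretty = listed.encode('utf-8', 'surrogateescape').decode('utf-8')

print(ascii(listed))      # starts with '\udced', as in the traceback
print(listed == pretty)   # False: a different string
print(listed.encode('ascii', 'surrogateescape') == raw)   # True
```

Only listed maps back to the original bytes under the ASCII filesystem encoding, so that is the value the os functions need.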
If you want to both display the file name and access its contents, you need to store both values.
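One possible way to keep both values is sketched below. The helper name list_with_display_names is hypothetical, and the sketch assumes the names on disk are UTF-8; os.fsencode is the standard call that recovers the stored bytes from a name returned by os.listdir.

```python
import os


def list_with_display_names(path):
    """Return (raw_name, display_name) pairs for the entries in path.

    raw_name is exactly what os.listdir() returned and is the value to
    pass back to os functions. display_name is only meant for printing.
    """
    pairs = []
    for raw_name in os.listdir(path):
        # os.fsencode() reverses the surrogateescape decoding and gives
        # back the bytes stored in the directory entry.
        name_bytes = os.fsencode(raw_name)
        # Assumption: names are UTF-8. Bytes that are not valid UTF-8
        # are shown as U+FFFD instead of raising an exception.
        display_name = name_bytes.decode('utf-8', 'replace')
        pairs.append((raw_name, display_name))
    return pairs
```

In the loop from the question you would then print display_name and call something like os.path.exists(os.path.join(path, raw_name)) for the actual file access.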
print(os.path.exists(sub))
# False
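os.listdir returns bare file names, so os.path.exists(sub) looks for the file in the current working directory rather than in path. Try os.path.exists(os.path.join(path, sub)).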
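A minimal sketch of the difference, using a temporary directory as a stand-in for /mnt/onboard/Library (the directory and the file name book.epub are made up for the example):

```python
import os
import tempfile

# Hypothetical stand-in for the library directory.
path = tempfile.mkdtemp()
open(os.path.join(path, 'book.epub'), 'w').close()

for sub in os.listdir(path):
    # A bare name is resolved relative to os.getcwd(), not to path.
    bare = os.path.exists(sub)
    # Joining with the listed directory gives a path that can be found.
    full = os.path.exists(os.path.join(path, sub))
    print(sub, bare, full)
```

Unless the script happens to run from inside that directory, only the joined path is found.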