Python 3 not aware of Windows filename encodings?
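The following code runs fine on Win7 until it crashes in the last print(f). It does so when it finds some "exotic" characters in the file names, such as the French "œ" in œuvre and the Č in Karel Čapek. The program crashes with an encoding error saying that character x in the file name is not a valid utf-8 character.
Shouldn't Python 3 be aware of the utf-16 encoding of Windows 7 paths?
How should I modify my code?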
import os

rootDir = '.'
extensions = ['mobi', 'lit', 'prc', 'azw', 'rtf', 'odt', 'lrf', 'fb2', 'azw3']
files = []
for dirName, subdirList, fileList in os.walk(rootDir):
    files.extend(os.path.join(dirName, fn) for fn in fileList
                 if any(fn.endswith(ext) for ext in extensions))
for f in files:
    print(f)  # the crash happens here, on file names with non-ASCII characters
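eryksun answered my question in the comments. I am copying his answer here so that the thread is not considered unanswered. The win-unicode-console module solved the problem: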
Python 3's raw FileIO class forces binary mode, which precludes using
a UTF-16 text mode for the Windows console. Thus the default setup is
limited to using an OEM/ANSI codepage. To avoid raising an exception,
you'd have to use a less-strict 'replace' or 'backslashreplace' mode
for sys.stdout. Switching to codepage 65001 (UTF-8) seems like it
should be the answer, but the console host (conhost.exe) has problems
with multibyte encodings. That leaves the UTF-16 wide-character API,
such as via the win-unicode-console module.
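
For anyone who cannot install an extra module, here is a minimal sketch of the less-strict 'backslashreplace' approach mentioned in the quote. The helper name console_safe, the sample file name, and the choice of cp437 as the console code page are assumptions made for illustration only; a real console may report a different code page in sys.stdout.encoding.

```python
def console_safe(text, encoding="cp437", errors="backslashreplace"):
    """Escape characters that the given console code page cannot encode.

    cp437 is only an illustrative OEM code page (an assumption); check
    sys.stdout.encoding to see what a particular console actually uses.
    """
    return text.encode(encoding, errors).decode(encoding)


# Hypothetical file name containing characters outside cp437.
name = "Karel Čapek - œuvre.mobi"

# Strict encoding, which is what print() applies by default, raises.
try:
    name.encode("cp437")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Escaping instead of raising lets the loop over the files continue.
print(console_safe(name))
```

In the loop from the question this would be used as print(console_safe(f, sys.stdout.encoding)), with sys imported. On Python 3.7 or later, calling sys.stdout.reconfigure(errors="backslashreplace") once should give the same effect for every print() call. Neither variant displays the real characters; for that, the wide-character route from the quote is still needed, which, as far as I can tell from the module's README, amounts to calling win_unicode_console.enable() at startup.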