Python: converting file names from iso-8859-5 to utf-8

Question

I have about 3500 files whose names and contents are both encoded in 'iso-8859-5'. This is how they look in the Linux console and in 7-Zip:

I am trying to write a script that converts them to 'UTF-8':

# -*- coding: utf-8 -*-
import os
# Example
#                   what it should look like
#iso-8859-5     ==> utf-8
#НјБ_ФШРУ_Г99   ==> ЭМС_диаг_У99

path = r"C://Users//Kamel//Desktop//работа//macros"
obj = os.scandir(path)

for entry in obj:
    if entry.is_dir() or entry.is_file():
        command = entry.name
        print(command, end="\t\t")
        file_name = command.encode('iso-8859-5').decode('UTF-8')
        print(command)

I get this error:

C:\Python\Python310\python.exe D:/PycharmProjects/pythonProject3/ansi_to_utf.py
Traceback (most recent call last):
  File "D:\PycharmProjects\pythonProject3\ansi_to_utf.py", line 15, in <module>
    file_name = command.encode('iso-8859-5').decode('UTF-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 11: invalid start byte
BE_BEF      BE_BEF
BE_BEF_IMP_0        BE_BEF_IMP_0
BE_BEF_IMP_1        BE_BEF_IMP_1
BE_BEF_IMP_6        BE_BEF_IMP_6
BE_BEF_IMP_7        BE_BEF_IMP_7
BE_BEF_IMP_8        BE_BEF_IMP_8
BE_BEF_IMP_K        BE_BEF_IMP_K
BE_BEF_IMP_T        BE_BEF_IMP_T
BE_BEF_IMP_В        
Process finished with exit code 1

Answer 1

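This is a case of mojibake: the names appear to have been stored as iso-8859-5 bytes but decoded as cp1251, so encoding them back to cp1251 recovers the original bytes, which can then be decoded correctly. Your example НјБ_ФШРУ_Г99 ==> ЭМС_диаг_У99 can be reproduced with: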

'НјБ_ФШРУ_Г99'.encode('cp1251').decode('iso-8859-5')
# 'ЭМС_диаг_У99'

or, alternatively, with:

'НјБ_ФШРУ_Г99'.encode('ptcp154').decode('iso-8859-5')
# 'ЭМС_диаг_У99'

Your failing example (… can't decode byte 0xb2 in position 11):

'BE_BEF_IMP_В'.encode('iso-8859-5')
# b'BE_BEF_IMP_\xb2'
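To see why the original script stops at this name, the failing step can be reproduced on its own. This is only a small illustration: the variable names `raw` and `error` are mine, not from the question. The byte 0xb2 has the bit pattern 10xxxxxx, which UTF-8 reserves for continuation bytes, so it can never start a character.

```python
# Encode the name the way the question's script does.
raw = 'BE_BEF_IMP_В'.encode('iso-8859-5')  # b'BE_BEF_IMP_\xb2'

try:
    raw.decode('utf-8')
    error = None
except UnicodeDecodeError as exc:
    error = exc

# error.start is the offset of the offending byte, error.reason the cause.
print(error.start, error.reason)  # 11 invalid start byte
```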

It is solved with the same mechanism:

'BE_BEF_IMP_В'.encode('cp1251').decode('iso-8859-5')
# 'BE_BEF_IMP_Т'
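Applied to the whole directory, the same round trip could look roughly like the sketch below. It is not a tested solution for your data: the helper names `fix_name`, `rename_all` and `convert_content` are hypothetical, the cp1251/iso-8859-5 pairing is assumed from the examples above, and the file contents are assumed to be plain iso-8859-5 text as stated in the question. Python strings are already Unicode, so a file name needs no separate "convert to UTF-8" step once it has been repaired.

```python
import os


def fix_name(name: str) -> str:
    """Undo the assumed cp1251/iso-8859-5 mix-up for one name.

    Names containing characters that cp1251 cannot encode are
    returned unchanged.
    """
    try:
        return name.encode("cp1251").decode("iso-8859-5")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name


def rename_all(root: str, dry_run: bool = True) -> list[tuple[str, str]]:
    """Repair names under root; return the (old_path, new_path) pairs."""
    changes = []
    # topdown=False yields children before their parent directory, so
    # renaming a directory does not invalidate paths still to be visited.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            new_name = fix_name(name)
            if new_name != name:
                old_path = os.path.join(dirpath, name)
                new_path = os.path.join(dirpath, new_name)
                changes.append((old_path, new_path))
                if not dry_run:
                    os.rename(old_path, new_path)
    return changes


def convert_content(path: str) -> None:
    """Re-save one file as UTF-8, assuming it holds iso-8859-5 text."""
    with open(path, "rb") as f:
        text = f.read().decode("iso-8859-5")
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))
```

Calling `rename_all(path)` with the default `dry_run=True` only lists the planned renames, which makes it possible to inspect them before running `rename_all(path, dry_run=False)`. The repair should be run only once: a name that is already correct would be scrambled by a second pass. Working on a copy of the directory first is advisable.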
