Encoding issue when reading an Excel file in Python
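I am using read_excel from the pandas library to read the contents of an Excel file and convert them to JSON. I am struggling with an encoding issue: non-English characters come out as "\u652f\u63f4\u8cc7\u8a0a".
How can I solve this?
I tried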
wb = xlrd.open_workbook(excel_filePath, encoding_override='ISO-8859-1')
new_data = pd.read_excel(wb)
and also
with open(excel_filePath, mode="r", encoding="utf-8") as file:
    new_data = pd.read_excel(excel_filePath)
I tried several encodings this way: utf-8, utf-16, latin1...
From the docs of the json module:
The RFC requires that JSON be represented using either UTF-8, UTF-16, or UTF-32, with UTF-8 being the recommended default for maximum interoperability.
As permitted, though not required, by the RFC, this module’s serializer sets ensure_ascii=True by default, thus escaping the output so that the resulting strings only contain ASCII characters.
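Perhaps surprisingly in this day and age, the module escapes non-ASCII characters by default (probably for backward compatibility). The data is read correctly; the escaping happens when the JSON is written, so just override that behavior with ensure_ascii=False: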
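import json
# Open the output as UTF-8 explicitly so non-ASCII text is written safely on any platform
with open(json_filePath, 'w', encoding='utf-8') as f:
    json.dump(new_json, f, ensure_ascii=False)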
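To see the difference in isolation, here is a minimal sketch that uses only the standard library. The record dictionary is a hypothetical stand-in for one row of the spreadsheet and is not taken from the question's actual data.

```python
import json

# Hypothetical sample record standing in for one row read from the spreadsheet.
record = {"title": "支援資訊"}

# Default behaviour (ensure_ascii=True): non-ASCII characters are escaped.
escaped = json.dumps(record)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)   # {"title": "\u652f\u63f4\u8cc7\u8a0a"}
print(readable)  # {"title": "支援資訊"}

# Both strings are valid JSON and decode to the same Python object.
assert json.loads(escaped) == json.loads(readable) == record
```

Note that the escaped form is not corrupted: any JSON parser turns it back into the original text. ensure_ascii=False only matters if the file itself needs to be human-readable.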