Encoding issue during reading excel file in Python

Question

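I am using read_excel from the pandas library to read the contents of an Excel file and convert them to JSON. I am struggling with an encoding issue: non-English characters come out encoded as "\u652f\u63f4\u8cc7\u8a0a". How can I fix this? I tried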

wb = xlrd.open_workbook(excel_filePath, encoding_override='ISO-8859-1')
new_data = pd.read_excel(wb)

and also

with open(excel_filePath, mode="r", encoding="utf-8") as file:
  new_data = pd.read_excel(excel_filePath)

I tried this code with encodings such as utf-8, utf-16, utf-16, latin1...

Answer 1

From the docs of the json module:

The RFC requires that JSON be represented using either UTF-8, UTF-16, or UTF-32, with UTF-8 being the recommended default for maximum interoperability.

As permitted, though not required, by the RFC, this module’s serializer sets ensure_ascii=True by default, thus escaping the output so that the resulting strings only contain ASCII characters.

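It is perhaps surprising that, in this day and age, the module escapes non-ASCII characters by default (probably for backward compatibility). The escaped output is still valid JSON and decodes to the same text, so the Excel data was read correctly. To get readable characters in the file, simply override that behaviour with ensure_ascii=False: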

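import json

with open(json_filePath, 'w', encoding='utf-8') as f:  # write UTF-8 regardless of the platform default
    json.dump(new_json, f, ensure_ascii=False)  # keep non-ASCII characters unescaped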
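
To check that the two forms really carry the same data, here is a minimal sketch using only the standard library. The helper names dump_both and roundtrip_utf8 and the sample dictionary are made up for illustration; the sample string is the escaped value from the question.

```python
import json
import os
import tempfile


def dump_both(obj):
    """Return (escaped, readable) JSON text for the same object."""
    escaped = json.dumps(obj)                       # default: ensure_ascii=True
    readable = json.dumps(obj, ensure_ascii=False)  # non-ASCII characters kept as-is
    return escaped, readable


def roundtrip_utf8(obj):
    """Write obj as unescaped UTF-8 JSON to a temporary file and read it back."""
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(obj, f, ensure_ascii=False)
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    finally:
        os.remove(path)


# Hypothetical sample: the value from the question, written with escapes
# so that this snippet itself stays pure ASCII.
sample = {"text": "\u652f\u63f4\u8cc7\u8a0a"}
escaped, readable = dump_both(sample)

# Both strings decode to the same data; only the text representation differs.
assert json.loads(escaped) == json.loads(readable) == sample
assert roundtrip_utf8(sample) == sample
```

If the JSON is produced with pandas' DataFrame.to_json rather than the json module, the analogous switch should be force_ascii=False, since that method also escapes non-ASCII characters by default.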

python

excel

encoding

json

pandas