Reading Excel file to Python fails due to number formatted as text

Question

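I have a large batch of Excel files, each with one column in which the numbers are formatted as text. Excel flags this with the error "the number in this cell is formatted as text or preceded by an apostrophe" (see the third column, where the cells have a green triangle).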

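My goal is to open all of these files in Pandas without manually opening each one and converting the column to numbers. However, pd.read_excel() fails with the following xlrd error: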

XLRDError: ZIP file contents not a known type of workbook

Unsurprisingly, I get the same error when I use xlrd directly: wb = xlrd.open_workbook(filename)

I also tried openpyxl: wb = openpyxl.load_workbook(filename), which gives me this:

KeyError: "There is no item named 'xl/_rels/workbook.xml.rels' in the archive"
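
The KeyError above is the message Python's zipfile module raises when the requested member is not in the archive, so listing the archive's contents shows what the file really holds. A minimal sketch using only the standard library follows; the function name missing_parts and the EXPECTED_PARTS list are illustrative assumptions (a subset of the parts an Excel-written .xlsx typically contains), not part of xlrd, openpyxl, or pandas.

```python
import zipfile

# Parts an Excel-written .xlsx file typically contains (illustrative subset,
# chosen for this sketch; not an exhaustive or authoritative list).
EXPECTED_PARTS = (
    "[Content_Types].xml",
    "xl/workbook.xml",
    "xl/_rels/workbook.xml.rels",
)


def missing_parts(source):
    """Return the expected parts that are absent from the ZIP archive `source`.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        names = set(archive.namelist())
    return [part for part in EXPECTED_PARTS if part not in names]
```

Calling missing_parts(filename) on one failing file and on one re-saved file, and comparing the results, should show whether the two differ in package layout.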

I have confirmed that if I manually convert the column to numbers in Excel and re-save the workbook, the file can be opened by both pandas (xlrd) and openpyxl.

Does anyone have any ideas?

Answer 1

Use "converters" when reading the Excel file.
See the pandas read_excel docs.

For example:

df = pd.read_excel('yourfile.xlsx', sheet_name='sheetname', header=0, converters={"% Chg": str})
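
Since the question involves many files, the same call can be wrapped in a loop. The following is a minimal sketch, not a definitive implementation: the function name read_folder, the "*.xlsx" pattern, and the follow-up conversion with pd.to_numeric are assumptions added for illustration, and it assumes every file contains the named column. The converter only takes effect once the workbook can be opened, so it does not by itself address the open errors shown in the question.

```python
from pathlib import Path

import pandas as pd


def read_folder(folder, text_column, sheet_name=0):
    """Read every .xlsx file in `folder` and return {file name: DataFrame}.

    `read_folder` is a hypothetical helper written for this example.
    """
    frames = {}
    for path in sorted(Path(folder).glob("*.xlsx")):
        # Read the text-formatted column as strings so pandas does not
        # guess a type for it.
        df = pd.read_excel(
            path,
            sheet_name=sheet_name,
            header=0,
            converters={text_column: str},
        )
        # Convert to numbers afterwards; values that cannot be parsed
        # become NaN instead of raising.
        df[text_column] = pd.to_numeric(df[text_column], errors="coerce")
        frames[path.name] = df
    return frames
```

For instance, frames = read_folder("reports", "% Chg") would return one DataFrame per workbook in a hypothetical "reports" directory, keyed by file name.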
