Pandas open_excel() fails with xlrd.biffh.XLRDError: Can't find workbook in OLE2 compound document

Pandas open_excel() fails with xlrd.biffh.XLRDError: Can't find workbook in OLE2 compound document

我正在尝试使用 pandas 来解析 .xlsm 文档。我的代码与给我的示例文件完美配合,但是一旦我获得其余文档,它就会因上述错误而失败。这是有问题的堆栈跟踪:

Traceback (most recent call last):
  File "@@@@@@@@/UnsupervisedCAM.py", line 9, in <module>
    info_dict = read_excel_to_dict('files/' + filename)
  File "@@@@@@@@\readCAM.py", line 7, in read_excel_to_dict
    df = pandas.read_excel(filename, parse_cols='E,G,I,K,Q,O')
  File "@@@@@@@@\Anaconda3\envs\tensorflow\lib\site-packages\pandas\io\excel.py", line 191, in read_excel
    io = ExcelFile(io, engine=engine)
  File "@@@@@@@@\Anaconda3\envs\tensorflow\lib\site-packages\pandas\io\excel.py", line 249, in __init__
    self.book = xlrd.open_workbook(io)
  File "@@@@@@@@\Anaconda3\envs\tensorflow\lib\site-packages\xlrd\__init__.py", line 441, in open_workbook
    ragged_rows=ragged_rows,
  File "@@@@@@@@\Anaconda3\envs\tensorflow\lib\site-packages\xlrd\book.py", line 87, in open_workbook_xls
    ragged_rows=ragged_rows,
  File "@@@@@@@@\Anaconda3\envs\tensorflow\lib\site-packages\xlrd\book.py", line 595, in biff2_8_load
    raise XLRDError("Can't find workbook in OLE2 compound document")
xlrd.biffh.XLRDError: Can't find workbook in OLE2 compound document

我什至不知道从哪里开始...还没有在网上找到任何有用的东西。

经过大量搜索,我发现唯一的方法是打开并保存所有 excel 文档,这些文档似乎 'strip' 它们的 OLE2 格式。我使用以下 vbs 脚本自动化了该过程:

Dim objFSO, objFolder, objFile
Dim objExcel, objWB
Set objExcel = CreateObject("Excel.Application")
Set objFSO = CreateObject("scripting.filesystemobject")
   MyFolder = "<PATH/TO/FILES"
Set objFolder = objfso.getfolder(myfolder)
For Each objFile In objfolder.Files
If Right(objFile.Name,4) = "<EXTENSION>" Then
Set objWB = objExcel.Workbooks.Open(objFile)
objWB.save
objWB.close
End If
Next
objExcel.Quit
Set objExcel = Nothing
Set objFSO = Nothing
Wscript.Echo "Done"

确保更改文件夹路径和扩展名。

如果您像我在搜索错误时一样在 Jupyter notebook 上遇到这个问题,您只需重新启动内核即可解决问题。

我收到了同样的错误消息,可以通过删除 xlsx 文件的密码保护来解决。 (并不是说这是错误的唯一原因,但值得检查!)