Pandas and xlrd error while reading excel files

Question

I have been working on a Python script that creates Pandas DataFrames from Excel files. For the past few days, everything worked perfectly with the usual pd.read_excel() method.

Today I tried to run the same code and ran into an error. I tried the following code on a small test document (just two columns, 5 rows of simple integers):

import pandas as pd

pd.read_excel("tstr.xlsx")

I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\util\_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
    io = ExcelFile(io, engine=engine)
  File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 867, in __init__
    self._reader = self._engines[engine](self._io)
  File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_xlrd.py", line 22, in __init__
    super().__init__(filepath_or_buffer)
  File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_base.py", line 353, in __init__
    self.book = self.load_workbook(filepath_or_buffer)
  File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\excel\_xlrd.py", line 37, in load_workbook
    return open_workbook(filepath_or_buffer)
  File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\xlrd\__init__.py", line 130, in open_workbook
    bk = xlsx.open_workbook_2007_xml(
  File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\xlrd\xlsx.py", line 812, in open_workbook_2007_xml
    x12book.process_stream(zflo, 'Workbook')
  File "C:\Users\micro\AppData\Local\Programs\Python\Python39\lib\site-packages\xlrd\xlsx.py", line 266, in process_stream
    for elem in self.tree.iter() if Element_has_iter else self.tree.getiterator():
AttributeError: 'ElementTree' object has no attribute 'getiterator'

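I get exactly the same problem when I try to load the Excel file with xlrd directly. I have tried several different Excel files, and all of my pip installs are up to date.

I have not made any changes to my system since pd.read_excel last worked perfectly (I did reboot, but that did not involve any updates). I am on a Windows 10 machine, if that is relevant.

Has anyone else run into this issue? Any advice on how to proceed?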

Answer 1

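There can be many different reasons for this error, but you should try adding engine='xlrd' or another possible value (mainly 'openpyxl') to the read_excel call. It may solve your problem, since it depends more on the Excel file than on your code.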

Also, try passing the full path of the file instead of a relative path.
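
One way to narrow down the cause is to look at the environment first. The traceback in the question ends in a call to ElementTree.getiterator(), a method that was removed from the standard library in Python 3.9, so the Python version and the installed xlrd version are worth checking. A minimal sketch, where environment_report is a hypothetical helper name and not part of pandas or xlrd:

```python
import sys
from importlib import metadata
from xml.etree import ElementTree


def environment_report(packages=("pandas", "xlrd", "openpyxl")):
    """Collect version details relevant to the getiterator error.

    Hypothetical helper for illustration only.
    """
    report = {
        "python": sys.version_info[:3],
        # False on Python 3.9+, where ElementTree.getiterator() was removed.
        "has_getiterator": hasattr(ElementTree.ElementTree, "getiterator"),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # package is not installed
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If has_getiterator comes back False and the file is an .xlsx, passing engine='openpyxl' is a reasonable first thing to try.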

Answer 2

openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support the old .xls file format, please use xlrd to read this file, or convert it to the more recent .xlsx file format.

So for me, the arguments are:

engine="xlrd" for .xls files
engine="openpyxl" for .xlsx files
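
The mapping above could be wrapped in a small helper that picks the engine from the file extension. This is only a sketch: pick_engine and read_excel_auto are hypothetical names, and the mapping covers just the two formats mentioned in this answer.

```python
from pathlib import Path

# Extension-to-engine mapping taken from the answer above.
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",
    ".xlsx": "openpyxl",
}


def pick_engine(path):
    """Return the pandas engine name for an Excel file path (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no engine configured for {suffix!r} files") from None


def read_excel_auto(path, **kwargs):
    """Call pd.read_excel with the engine chosen from the extension."""
    import pandas as pd  # imported lazily so pick_engine works without pandas

    return pd.read_excel(path, engine=pick_engine(path), **kwargs)
```

With this in place, read_excel_auto("tstr.xlsx") would call pd.read_excel with engine="openpyxl", which requires openpyxl to be installed.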

Answer 3

This worked for me:

# Back at the Linux prompt, install openpyxl

pip install openpyxl

# Then pass engine='openpyxl' to read_excel in the Python code

data = pd.read_excel(path, sheet_name='Sheet1', parse_dates=True, engine='openpyxl')
