How to read multiple tables from an .xls file in Python?

Question

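I need to read multiple tables from a single sheet of an Excel file using Python. The sheet looks like this:

I want to get one Python object containing the information from First_Table, and likewise one for Second_Table. I tried this with pandas and DataFrame.iloc: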

import pandas as pd
xls = pd.ExcelFile('path_to_xls_file')
df = pd.read_excel(xls, "sheet_1")
# first table
df1 = df.iloc[2:12,0:6]

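But I do not get the expected cells from First_Table. Am I getting the row and column ranges wrong? Do I have to specify the exact row and column indices, or is there a more efficient and elegant way to do this?

Thanks in advance!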

Answer 1

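Use the usecols parameter to select the columns you want to read from the Excel file. pandas will then pick up the rows accordingly.

You should also make sure that the first column is not used as the index. In read_excel this is controlled by index_col; passing index_col=None (the default) keeps a plain integer index.

Here is sample code for your task:

df = pd.read_excel(path, usecols=range(1, 6), index_col=None)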

You can find more information in the documentation.

Answer 2

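The approach is correct, but probably not optimal. You are not getting the right table because the indices are off: judging by your screenshot, df1 = df.iloc[1:12,1:6] should do the job.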
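To make the indexing rule concrete, here is a minimal sketch. The frame `raw` is a made-up stand-in for the sheet (the real layout is only visible in the screenshot), with every cell labelled by its zero-based position. It illustrates two things that commonly cause off-by-one slices: iloc ranges exclude their end, and with the default header=0 the first sheet row becomes the column labels rather than row 0.

```python
import pandas as pd

# Hypothetical stand-in for the sheet, as if read with header=None:
# 13 rows x 7 columns, each cell labelled with its zero-based position.
raw = pd.DataFrame([[f"r{r}c{c}" for c in range(7)] for r in range(13)])

# iloc is purely positional and zero-based, and each range excludes its end,
# so 1:12 selects rows 1..11 and 1:6 selects columns 1..5.
block = raw.iloc[1:12, 1:6]
print(block.shape)         # (11, 5)
print(block.iloc[0, 0])    # r1c1
print(block.iloc[-1, -1])  # r11c5

# With the default header=0, read_excel consumes the first sheet row as
# column labels, so position 0 of the resulting frame is the sheet's
# second row. Simulated here by promoting row 0 of `raw` to the header.
df = raw.iloc[1:].reset_index(drop=True)
df.columns = raw.iloc[0].tolist()
print(df.iloc[0, 0])       # r1c0
```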

A better solution is to set the header and usecols parameters of pd.read_excel():

header : int, list of ints, default 0

Row (0-indexed) to use for the column labels of the parsed DataFrame. If a list of integers is passed those row positions will be combined into a MultiIndex. Use None if there is no header.

usecols : int or list, default None

If None then parse all columns,

If int then indicates last column to be parsed

If list of ints then indicates list of column numbers to be parsed

If string then indicates comma separated list of Excel column letters and column ranges (e.g. “A:E” or “A,C,E:F”). Ranges are inclusive of both sides.

Retrieved from: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html
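As a rough sketch of how these parameters combine, the helper below reads one rectangular table per call. The name `read_table`, the row numbers, and the column letters in the usage comment are assumptions for illustration, not values taken from the question's sheet. It also uses nrows, another read_excel parameter, to stop before the next table begins. Reading .xls files needs the xlrd engine installed, and .xlsx files need openpyxl.

```python
import pandas as pd


def read_table(path, sheet_name, header_row, usecols, nrows):
    """Read one rectangular table from a sheet (hypothetical helper)."""
    return pd.read_excel(
        path,
        sheet_name=sheet_name,
        header=header_row,  # 0-indexed sheet row holding the column labels
        usecols=usecols,    # Excel column letters, e.g. "B:F" (inclusive)
        nrows=nrows,        # number of data rows to read below the header
    )


# Usage with an assumed layout (adjust to the real sheet):
# first = read_table("path_to_xls_file", "sheet_1",
#                    header_row=1, usecols="B:F", nrows=10)
# second = read_table("path_to_xls_file", "sheet_1",
#                     header_row=14, usecols="B:F", nrows=10)
```

Calling read_excel once per table keeps each table's header and dtypes separate, at the cost of opening the file several times; passing a pd.ExcelFile object instead of a path avoids reparsing the workbook.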

Also, there may be packages designed for reading multiple tables from a single sheet, but I am not aware of any.
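Without a dedicated package, one possible approach is to read the whole sheet with header=None and split it yourself. The sketch below assumes the tables are separated by at least one completely empty row, which may not hold for every sheet; `split_on_blank_rows` is a hypothetical helper, and the sample frame stands in for the result of pd.read_excel(path, sheet_name=..., header=None).

```python
import pandas as pd


def split_on_blank_rows(raw):
    """Split a header-less sheet dump into blocks separated by empty rows.

    Assumption: tables never contain a fully empty row themselves.
    """
    blank = raw.isna().all(axis=1).tolist()
    tables = []
    start = None  # position where the current block began, if any
    for pos, is_blank in enumerate(blank):
        if not is_blank and start is None:
            start = pos
        elif is_blank and start is not None:
            tables.append(raw.iloc[start:pos])
            start = None
    if start is not None:
        tables.append(raw.iloc[start:])
    # Drop columns that are empty within each block.
    return [t.dropna(axis=1, how="all") for t in tables]


# Made-up sheet contents: two small tables with one blank row between them.
raw = pd.DataFrame([
    ["First_Table", None],
    ["x", "y"],
    [1, 2],
    [None, None],
    ["Second_Table", None],
    ["u", "v"],
    [3, 4],
])
first, second = split_on_blank_rows(raw)
print(first.shape, second.shape)  # (3, 2) (3, 2)
```

Each block still carries its title and header rows as data, so a follow-up step would promote the appropriate row to column labels.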


python

excel

xlrd

pandas