Reading an Excel file with merged cells in Python

Question

I have an Excel table of the following type (the problem described below is caused by the presence of merged cells).

I am reading it with read_excel from pandas.

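What I want: to use the values in the first column as the index, and to combine the values in the third column that belong to the same merged cell into a single cell, for example like here.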

What I get by applying read_excel directly can be seen here.

In case it is needed, here is the code I use to read the file (I read it from Google Drive in Google Colab):

import pandas as pd

path = '/content/drive/MyDrive/ExampleFile.xlsx'
pd.read_excel(path, header=0, index_col=0)

Can you help? Please let me know if anything in the question is unclear.

Answer 1

Here is one way to do it. I created an xls similar to yours, with sno as the header of the first column.

# fill the null values with values from previous rows
df=df.ffill()

# combine the rows where class is the same and create a new column
df=df.assign(comb=df.groupby(['class'])['type'].transform(lambda x: ','.join(x)))

# drop the duplicated rows
df2=df.drop_duplicates(subset=['class','comb'])[['class','comb']]

    class   comb
0   fruit   apple,orange
2   toys    car,truck,train
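
For anyone who wants to try the steps above without an Excel file, here is a minimal self-contained sketch. It assumes the sheet has three columns named sno, class and type (sno and class come from the answer; the sample rows are made up to match the output shown), and that read_excel left NaN in every row of a merged block except the first. It also moves sno into the index, which is what the question asked for. The names raw, filled and result are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for what read_excel returns on a sheet with merged
# cells: only the first row of each merged block keeps its value.
raw = pd.DataFrame(
    {
        "sno": [1, np.nan, 2, np.nan, np.nan],
        "class": ["fruit", np.nan, "toys", np.nan, np.nan],
        "type": ["apple", "orange", "car", "truck", "train"],
    }
)

# Forward-fill only the columns that came from merged cells, so genuine
# gaps in other columns are not overwritten.
filled = raw.copy()
filled[["sno", "class"]] = filled[["sno", "class"]].ffill()
filled["sno"] = filled["sno"].astype(int)  # NaN had forced this column to float

# Collapse each (sno, class) block into one row, joining the type values,
# then use sno as the index.
result = (
    filled.groupby(["sno", "class"], sort=False)["type"]
    .agg(",".join)
    .reset_index()
    .set_index("sno")
)

print(result)
```

With a real file, raw would instead come from pd.read_excel(path, header=0) without index_col, so that the first column can still be forward-filled before it becomes the index.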
