Read Excel Rows Using Pandas like the Xlrd Module

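In this example I show two ways of reading an Excel file and printing its data. Rather than using both, I would like to use the pandas module to output the data the same way I do with the xlrd module. I want to loop through the columns/rows and append the values to arrays (Col_B3, Col_C3, Col_D3). How can I do that?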

import xlrd
import pandas as pd
import io

path = r'C:\Temp Files\My_Excel_File.xlsx'

''' USING XLRD '''
# open the workbook (note: xlrd 2.0+ reads only .xls files, so an .xlsx path needs xlrd < 2.0)
inputWorkbook = xlrd.open_workbook(path)
# open the first sheet
Sheet = inputWorkbook.sheet_by_index(0)

Col_B3 = []
Col_C3 = []
Col_D3 = []

# start at row index 2 (the third sheet row, which holds the header labels)
for row in range(2, Sheet.nrows):
    Col_B3.append(Sheet.cell_value(row, 1))
    Col_C3.append(Sheet.cell_value(row, 2))
    Col_D3.append(Sheet.cell_value(row, 3))

print(Col_B3)
print(Col_C3)
print(Col_D3)


''' USING PANDAS '''
df = pd.read_excel(path)

print(df)

XLRD OUTPUT

['Col_B3', 1.0, 2.0, 3.0, 4.0]
['Col_C3', 'Jack', 'Jill', 'Peter', 'Jade']
['Col_D3', 1200.0, 875.0, 120.0, 4230.0]

PANDAS OUTPUT

   Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3
0         NaN        NaN        NaN        NaN
1         NaN     Col_B3     Col_C3     Col_D3
2         NaN          1       Jack       1200
3         NaN          2       Jill        875
4         NaN          3      Peter        120
5         NaN          4       Jade       4230

Try this; I have recreated your data in a spreadsheet.

import pandas as pd
import numpy as np

df = pd.read_excel(r"C:\Temp Files\My_Excel_File.xlsx")
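# drop the empty first column
df = df.drop(["Unnamed: 0"], axis=1)
# drop rows that contain NaN (removes the empty first row)
df = df.dropna(axis=0)

# keep only rows with at least one numeric value (this drops the header row);
# axis must be passed as a keyword in pandas 2.x, and applymap is deprecated
# since pandas 2.1 in favor of DataFrame.map
df = df[df.applymap(np.isreal).any(axis=1)]

# map column names
df.columns = ["Col_B3", "Col_C3", "Col_D3"]

# print results
print(df)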

  Col_B3 Col_C3 Col_D3
2      1   Jack   1200
3      2   Jill    875
4      3  Peter    120
5      4   Jade   4230
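
If the goal is the same three Python lists that the xlrd loop builds, one possible sketch is to walk the cleaned frame row by row with `itertuples` and append each value. The `df` below is a hand-built stand-in for the cleaned frame (an assumption, using the values from the question), so the snippet runs without the workbook.

```python
import pandas as pd

# Stand-in for the cleaned DataFrame (assumed values, copied from the question).
df = pd.DataFrame(
    {
        "Col_B3": [1, 2, 3, 4],
        "Col_C3": ["Jack", "Jill", "Peter", "Jade"],
        "Col_D3": [1200, 875, 120, 4230],
    },
    index=[2, 3, 4, 5],
)

Col_B3 = []
Col_C3 = []
Col_D3 = []

# itertuples yields one namedtuple per row; index=False leaves the row label out.
for row in df.itertuples(index=False):
    Col_B3.append(row.Col_B3)
    Col_C3.append(row.Col_C3)
    Col_D3.append(row.Col_D3)

print(Col_B3)
print(Col_C3)
print(Col_D3)
```

Unlike the xlrd output in the question, these lists hold only the data, not the header label in the first position.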

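With xlrd you have more control over how the data is handled as it is read in. Pandas reads the data as is: your first column is empty, and so is the first row. Your data is also laid out in columns, so Pandas reads it in by column.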
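
Because Pandas takes the sheet as is, another option is to tell `read_excel` where the table actually sits. The sketch below assumes the layout described in the question (labels in the third sheet row, data in columns B to D) and uses the `header` and `usecols` parameters of `pd.read_excel`. So that it runs on its own, it first writes a small stand-in workbook to a temporary folder; that file, its location, and the need for `openpyxl` are assumptions of this example, not part of the original setup.

```python
import os
import tempfile

import pandas as pd

# Stand-in workbook (assumption): labels in sheet row 3, data in columns B:D.
# Writing and reading .xlsx with pandas requires openpyxl to be installed.
sample = pd.DataFrame(
    {
        "Col_B3": [1, 2, 3, 4],
        "Col_C3": ["Jack", "Jill", "Peter", "Jade"],
        "Col_D3": [1200, 875, 120, 4230],
    }
)
path = os.path.join(tempfile.mkdtemp(), "My_Excel_File.xlsx")
sample.to_excel(path, index=False, startrow=2, startcol=1)

# header=2     -> the third sheet row (0-based index 2) holds the column names
# usecols="B:D" -> read only columns B to D, skipping the empty column A
df = pd.read_excel(path, header=2, usecols="B:D")

# One plain Python list per column (data only, without the header label).
Col_B3 = df["Col_B3"].tolist()
Col_C3 = df["Col_C3"].tolist()
Col_D3 = df["Col_D3"].tolist()

print(Col_B3)
print(Col_C3)
print(Col_D3)
```

With the real file, only the `pd.read_excel` call and the three `tolist()` lines would be needed.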

You can use Pandas to reshape it into list form:

res = (df.dropna(how='all')          # remove completely empty rows
       .dropna(how='all', axis=1)    # remove completely empty columns
       .T                            # flip columns into row position
       .to_numpy()                   # convert to a NumPy array
       .tolist()                     # then to nested Python lists
       )

print(res)

[['Col_B3', '1', '2', '3', '4'],
 ['Col_C3', 'Jack', 'Jill', 'Peter', 'Jade'],
 ['Col_D3', '1200', '875', '120', '4230']]
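
To get from the nested list back to the three named lists in the question, one possible sketch is to key each inner list by its first element (the label) and keep the rest as data. The `raw` frame below is a hand-built stand-in for what `pd.read_excel` returned in the question (an assumption, so the snippet runs without the workbook), and the `columns` dictionary is a name introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the frame pd.read_excel produced in the question (assumed values).
raw = pd.DataFrame(
    [
        [np.nan, np.nan, np.nan, np.nan],
        [np.nan, "Col_B3", "Col_C3", "Col_D3"],
        [np.nan, 1, "Jack", 1200],
        [np.nan, 2, "Jill", 875],
        [np.nan, 3, "Peter", 120],
        [np.nan, 4, "Jade", 4230],
    ],
    columns=["Unnamed: 0", "Unnamed: 1", "Unnamed: 2", "Unnamed: 3"],
)

# Same reshaping as above: drop empty rows and columns, transpose, listify.
res = (
    raw.dropna(how="all")
    .dropna(how="all", axis=1)
    .T.to_numpy()
    .tolist()
)

# The first element of each inner list is the label, the rest is the data.
columns = {row[0]: row[1:] for row in res}

Col_B3 = columns["Col_B3"]
Col_C3 = columns["Col_C3"]
Col_D3 = columns["Col_D3"]

print(Col_B3)
print(Col_C3)
print(Col_D3)
```

Keeping `row` whole instead of `row[1:]` would reproduce the xlrd output, where each list starts with its label.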