Openpyxl, Pandas or both

Question

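I am trying to process an Excel file so that I can later use each row and each column for specific operations.

My problem is as follows:

Openpyxl makes it easier for me to load the file and iterate over the rows: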

    from openpyxl import load_workbook

    # Read the Excel file
    path = r'Datasets/Chapter 1/Table B1.1.xlsx'
    wb = load_workbook(path)  # load the workbook
    ws = wb.active  # grab the active worksheet

    # First row: the headers
    for h in ws.iter_rows(max_row=1, values_only=True):
        header = list(h)

    # Second row: the sub-headers
    for sh in ws.iter_rows(min_row=2, max_row=2, values_only=True):
        sub_header = list(sh)

    # Remove all of the None values
    header = list(filter(None, header))
    sub_header = list(filter(None, sub_header))

    # Build a list of all the rows in the Excel file
    row_list = []
    for row in ws.iter_rows(min_row=3):  # start at the third row; the first two are headers
        row = [cell.value for cell in row]  # list of the row's values
        row = list(filter(None, row))  # remove the None values from the row
        row_list.append(row)

    # Build a list of all the columns, again skipping the two header rows
    colm = []
    for col in ws.iter_cols(min_row=3, min_col=1):
        col = [cell.value for cell in col]  # list of the column's values
        col = list(filter(None, col))  # remove the None values from the column
        colm.append(col)

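But at the same time (from what I read in the documentation), I can't visualize the data or operate directly on the rows or columns.

Pandas is more efficient for direct operations on rows and columns, but I have read that iterating over a DataFrame to get the rows into lists is not recommended. Even doing it with df.iloc[2:] does not give me the same result (each row saved in its own list), because the headers will always be there. Unlike Openpyxl, though, pandas makes it much easier to operate directly on columns by name, with something like df[col1]-df[col2], which is what I need to do. (Just putting all the column values into a list won't do it for me.)

So my question is whether there is a solution that does what I want using only one of them, or whether using both together is not that bad, keeping in mind that I would have to load the Excel file twice.

Thanks in advance!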

Answer 1

There is no problem with reading the Excel file once using openpyxl and then loading the rows into pandas:

    pandas.DataFrame(row_list, columns=header)
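
A minimal sketch of that approach follows. To keep it self-contained, the workbook is built in memory with made-up columns `a`, `b` and `c` (these names and values are assumptions, not taken from your file); with a real file you would call `load_workbook(path)` instead. The rows are deliberately left unfiltered so that every value stays under its own column.

```python
import pandas as pd
from openpyxl import Workbook

# Hypothetical stand-in for the real file: one header row plus two data rows.
wb = Workbook()
ws = wb.active
ws.append(["a", "b", "c"])
ws.append([1, None, 3])
ws.append([4, 5, 6])

# Read the sheet once; values_only=True yields tuples of plain cell values.
rows = ws.iter_rows(values_only=True)
header = list(next(rows))            # first row holds the column names
row_list = [list(r) for r in rows]   # remaining rows, empty cells kept as None

df = pd.DataFrame(row_list, columns=header)

# Column arithmetic by name now works directly on the DataFrame.
df["diff"] = df["c"] - df["a"]
print(df)
```

This way the file is read only once: `row_list` still gives you each row as a list, and `df` gives you the named columns.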

You are right that iterating over a DataFrame by index is very slow, but you have other options: apply(), iterrows(), itertuples()

Link: Different ways to iterate over rows in pandas DataFrame
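
To make these options concrete, here is a small sketch on a made-up DataFrame (the column names `col1` and `col2` and the values are assumptions for illustration). It also shows one way to get each row as a plain list without the headers, which is what the question asks for.

```python
import pandas as pd

df = pd.DataFrame({"col1": [10, 20, 30], "col2": [1, 2, 3]})  # made-up data

# Column arithmetic by name: no explicit loop over the rows is needed.
diff = df["col1"] - df["col2"]

# itertuples(): index=False drops the index and name=None yields regular
# tuples, so each row becomes a plain list of values without the headers.
rows_as_lists = [list(t) for t in df.itertuples(index=False, name=None)]

# iterrows(): yields (index, Series) pairs, so values are accessed by name.
sums = [row["col1"] + row["col2"] for _, row in df.iterrows()]

# apply() with axis=1 calls the function once per row.
products = df.apply(lambda row: row["col1"] * row["col2"], axis=1)

print(rows_as_lists)
```

When the operation can be written on whole columns, as with `diff` above, that form is usually preferable to any of the row-by-row loops.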

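I would also like to point out that your code probably does not do what you want.

list(filter(None, header)) filters out not only None but all falsy values, such as 0 or "".
Filtering like this shifts the columns. For example, you have the row [1, None, 3] and the columns ['a', 'b', 'c']. By filtering out None you get [1, 3], which will then line up with columns 'a' and 'b'.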
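
Both effects can be seen in a few lines of plain Python. The sample row and column names below are made up for illustration.

```python
row = [1, None, 0, "", 3]

# filter(None, ...) drops every falsy value, so 0 and "" are lost as well.
falsy_removed = list(filter(None, row))            # [1, 3]

# An explicit identity check removes only None.
none_removed = [v for v in row if v is not None]   # [1, 0, '', 3]

# Removing items shifts the remaining values to the left. Keeping the None
# placeholders preserves the position of each value under its column.
columns = ["a", "b", "c"]
aligned = dict(zip(columns, [1, None, 3]))                 # {'a': 1, 'b': None, 'c': 3}
shifted = dict(zip(columns, filter(None, [1, None, 3])))   # {'a': 1, 'b': 3}

print(falsy_removed, none_removed, aligned, shifted)
```

If the rows are going into a DataFrame anyway, leaving the None values in place is likely the safer choice, since pandas treats them as missing data.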

Tags: python, pandas, dataframe, performance, openpyxl