Openpyxl, Pandas or both

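I am trying to process an Excel file so that I can later perform specific operations on each row and each column.

My issue is as follows. With openpyxl I can load the file and loop over the rows and columns: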

    from openpyxl import load_workbook

    # Read the Excel file
    path = r'Datasets/Chapter 1/Table B1.1.xlsx'
    wb = load_workbook(path)  # load the workbook
    ws = wb.active  # grab the active worksheet

    # The first row holds the headers, the second row the sub-headers
    header = list(next(ws.iter_rows(max_row=1, values_only=True)))
    sub_header = list(next(ws.iter_rows(min_row=2, max_row=2, values_only=True)))

    # Drop the empty cells (note: filter(None, ...) removes every falsy value)
    header = list(filter(None, header))
    sub_header = list(filter(None, sub_header))

    # Collect every data row, starting from the third row
    # (the first two rows are the headers)
    row_list = []
    for row in ws.iter_rows(min_row=3):
        row = [cell.value for cell in row]  # cell values of this row
        row = list(filter(None, row))  # drop the empty cells
        row_list.append(row)

    # Collect every column, again skipping the two header rows
    colm = []
    for col in ws.iter_cols(min_row=3, min_col=1):
        col = [cell.value for cell in col]  # cell values of this column
        col = list(filter(None, col))  # drop the empty cells
        colm.append(col)

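At the same time (as far as I can tell from the documentation), openpyxl does not let me visualize the data or operate directly on the rows or columns.

So my question is: is there a way to do what I want using only one of the two libraries, or is it acceptable to use both, keeping in mind that I would then have to load the Excel file twice?

Thanks in advance!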

There is nothing wrong with reading the Excel file once with openpyxl and then loading the rows into pandas:

    pandas.DataFrame(row_list, columns=header)
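
As a minimal sketch of that single-read approach: the workbook below is built in memory only so that the example runs without the original file, and the column names and values are made up for illustration. With a real file you would call `load_workbook(path)` instead.

```python
import pandas as pd
from openpyxl import Workbook

# Hypothetical stand-in for the real workbook, built in memory
wb = Workbook()
ws = wb.active
ws.append(["name", "qty", "price"])  # header row (made-up names)
ws.append(["apple", 3, 0.5])
ws.append(["pear", None, 0.75])      # an empty cell is read back as None

# Read the sheet once: the first row becomes the header, the rest the data
rows = ws.iter_rows(values_only=True)  # yields one tuple of values per row
header = list(next(rows))
df = pd.DataFrame(list(rows), columns=header)

print(df)
```

Keeping the `None` cells in place means every value stays under its own column; pandas then represents them as missing values.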

You are right that iterating over a DataFrame by index is very slow, but you have other options: apply(), iterrows(), and itertuples().
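
A small sketch of those three options on a made-up DataFrame (the `qty` and `price` columns are assumptions for illustration); each variant computes the same per-row product:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"qty": [3, 2], "price": [0.5, 0.75]})

# itertuples(): each row is a namedtuple, fields accessed as attributes
totals_tuples = [row.qty * row.price for row in df.itertuples(index=False)]

# iterrows(): each row is an (index, Series) pair
totals_rows = [row["qty"] * row["price"] for _, row in df.iterrows()]

# apply() with axis=1: the function is called once per row
totals_apply = df.apply(lambda row: row["qty"] * row["price"], axis=1).tolist()

print(totals_tuples, totals_rows, totals_apply)
```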

Link: Different ways to iterate over rows in pandas DataFrame

I would also like to point out that your code may not do what you want.

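  1. list(filter(None, header)) filters out not only None but every falsy value, such as 0 or "".
  2. Filtering like this shifts the columns. Say you have the row [1, None, 3] and the columns ['a', 'b', 'c']. After filtering out None you get [1, 3], which will line up with columns 'a' and 'b', so the 3 ends up under 'b' instead of 'c'.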
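
Both pitfalls can be reproduced in plain Python. This is only an illustrative sketch; `row_with_zero` and the other variable names are made up:

```python
columns = ["a", "b", "c"]
row = [1, None, 3]

# Pitfall 2: dropping None shortens the row, so the values shift left
filtered = list(filter(None, row))
shifted = dict(zip(columns, filtered))  # the 3 now sits under 'b'

# Pitfall 1: filter(None, ...) drops every falsy value, not just None
row_with_zero = [0, "", None, 5]
falsy_dropped = list(filter(None, row_with_zero))

# Testing explicitly for None keeps 0 and "" (the row still gets shorter)
none_dropped = [v for v in row_with_zero if v is not None]

print(shifted, falsy_dropped, none_dropped)
```

If column alignment matters, one option is to leave the None values in place and let pandas treat them as missing data.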