Is there a way to export a list of 100+ dataframes to Excel?

This is a bit of a strange one, but I'm new to Python and I'm committed to seeing my first Python project through.

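I am reading about 100 .xlsx files in from a file path. I then trim each file and send only the important information to a list, as an individual and unique dataframe. So now I have a list of 100 unique dataframes, but iterating through the list and writing to Excel just overwrites the data in the file. I want to append to the end of the .xlsx file instead. The biggest catch is that I can only use Excel 2010; I do not have any other version of the application. The openpyxl library seems to have some interesting features, and I have tried something like this: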

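from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

wb = load_workbook(outfile_path)
ws = wb.active

for frame in main_df_list:
    for r in dataframe_to_rows(frame, index=True, header=True):
        ws.append(r)

#nothing is written to disk until the workbook is saved
wb.save(outfile_path)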

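Note: In another post I was told that reading dataframes row by row with loops is not best practice, but I didn't know that when I started. I am committed to this monster, however.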
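For reference, here is a minimal sketch of the append idea from the snippet above, using two made-up frames and a fresh in-memory workbook. The helper name `append_frames`, the sample data, and the output file name are assumptions for illustration only, and `index=False` is chosen here so the pandas row labels stay out of the sheet. Each frame is written directly below the previous one because `ws.append` always adds a row after the last used row.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows


def append_frames(ws, frames):
    """Append every frame to the worksheet, one below the other."""
    for frame in frames:
        # index=False keeps the pandas row labels out of the sheet
        for row in dataframe_to_rows(frame, index=False, header=True):
            ws.append(row)
    return ws


# Made-up stand-ins for two of the trimmed dataframes
frames = [
    pd.DataFrame({"part": ["bolt", "nut"], "size": ["M6", "M6"]}),
    pd.DataFrame({"tool": ["drill", "tap"], "vendor": ["A", "B"]}),
]

wb = Workbook()  # fresh in-memory workbook; load_workbook(path) works the same way
ws = append_frames(wb.active, frames)

# Nothing reaches disk until save() is called, for example:
# wb.save("combined.xlsx")  # hypothetical output path
```

Each frame contributes one header row plus its data rows, so the two frames above fill six rows of the sheet.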

Edit after reading the comments

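My code grabs the .xlsx files and, based on a keyword comparison, stores specific data into dataframes. These dataframes are stored in a list. I will list the entire program below, so hopefully I can explain what is in my head. Also, feel free to roast my code, because I have no idea what is actually good Python practice and what isn't.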

import os
import pandas as pd
from openpyxl import load_workbook

#the file path I want to pull from
in_path = r'W:\R1_Manufacturing\Parts List Project\Tool_scraping\Excel'
#the file path where row search items are stored
search_parameters = r'W:\R1_Manufacturing\Parts List Project\search_params.xlsx'
#the file I will write the dataframes to
outfile_path = r'W:\R1_Manufacturing\Parts List Project\xlsx_reader.xlsx'

#establishing my list that I will store looped data into
file_list = []
main_df = []
master_list = []

#open the file path to store the directory in files
files = os.listdir(in_path)

#database with terms that I want to track
search = pd.read_excel(search_parameters)
search_size = search.index

#searching only for files that end with .xlsx
for file in files:
    if file.endswith('.xlsx'):
        file_list.append(os.path.join(in_path, file))

#read in the files to a dataframe, main loop the files will be manipulated in
for current_file in file_list:
    df = pd.read_excel(current_file)
    
    #get columns headers and a range for total rows
    columns = df.columns
    total_rows = df.index
    
    #adding to store where headers are stored in DF
    row_list = []
    column_list = []
    header_list = []

    for name in columns:
        for number in total_rows:
            cell = df.at[number, name]
            #skip anything that is not a non-empty string
            if not isinstance(cell, str) or cell == '':
                continue
            for place in search_size:
                search_loop = search.at[place, 'Parameters']
                #main compare, if str and matches search params, then do...
                #insensitive_compare is a helper defined outside this listing
                if insensitive_compare(search_loop, cell):
                    if cell not in header_list:
                        header_list.append(cell) #store data headers
                        row_list.append(number)  #store row number where it is in that data frame
                        column_list.append(name) #store column name where it is in that data frame
    
    for thing in column_list:
        df = pd.concat([df, pd.DataFrame(0, columns=[thing], index = range(2))], ignore_index = True)

    #turns the dataframe into a set of booleans where its true if 
    #theres something there
    na_finder = df.notna()
    
    #create a new dataframe to write the output to
    outdf = pd.DataFrame(columns = header_list)


    for i in range(len(row_list)):
        k = 0
        #I turn the dataframe into booleans and read until False
        while na_finder.at[row_list[i] + k, column_list[i]]:
            #Store actual dataframe into my output dataframe, outdf
            if df.at[row_list[i] + k, column_list[i]] not in header_list:
                outdf.at[k, header_list[i]] = df.at[row_list[i] + k, column_list[i]]
            k += 1
            
    main_df.append(outdf)
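The program above calls insensitive_compare, but its definition is not included in the post. A minimal sketch of what such a helper might look like, assuming it is meant to test two strings for equality while ignoring case and surrounding whitespace (the original may well behave differently, for example by matching substrings):

```python
def insensitive_compare(left, right):
    """Return True when two strings match, ignoring case and outer whitespace."""
    # Non-string cells (numbers, NaN) can never match a search term
    if not isinstance(left, str) or not isinstance(right, str):
        return False
    # casefold() is a more aggressive lower() intended for caseless matching
    return left.strip().casefold() == right.strip().casefold()


print(insensitive_compare("Part Number", "  part number "))  # True
print(insensitive_compare("Part Number", "Part No."))        # False
```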

So main_df is a list that holds 100+ dataframes. For this example I will only use 2 of them. I would like them written to Excel like this:

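Ashish's comment really helped me. All of the dataframes had different column headers, so my 100+ dataframes ended up concatenated into a single 569 x 52 dataframe. Here is the code I used. I dropped openpyxl completely, because once I was able to concatenate all of the dataframes together, I only needed pandas to export it: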
# what I want to do here is grab all the data in the same column as each 
# header, then move to the next column
for i in range(len(row_list)):
    k = 0
    while na_finder.at[row_list[i] + k, column_list[i]] == True:
        if(df.at[row_list[i] + k, column_list[i]] not in header_list):
            outdf.at[k, header_list[i]] = df.at[row_list[i] + k, column_list[i]]
        k += 1
            
main_df.append(outdf)

#concatenate every frame in one call instead of growing a dataframe in a loop
to_xlsx_df = pd.concat(main_df)

to_xlsx_df.to_excel(outfile_path)

The output in Excel ended up looking like this:

Hopefully this helps someone else too.
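For anyone wondering why concatenation works even though the column headers differ between frames: pd.concat takes the union of the columns by default and fills the gaps with NaN. A small sketch with two made-up frames (the column names and values are illustrative only, not taken from my data):

```python
import pandas as pd

# Two toy frames that share only the "part" column
a = pd.DataFrame({"part": ["bolt", "nut"], "qty": [4, 8]})
b = pd.DataFrame({"part": ["gear", "shaft"], "vendor": ["A", "B"]})

# Rows are stacked; columns missing from a frame are filled with NaN
combined = pd.concat([a, b], ignore_index=True)

print(combined.shape)          # (4, 3)
print(list(combined.columns))  # ['part', 'qty', 'vendor']
```

This is how 100+ frames with mostly different headers can end up as one wide frame, which a single to_excel call then writes out as one sheet.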