Is there a way to export a list of 100+ dataframes to Excel?
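So this is a bit of an odd one, but I'm new to Python and I'm committed to finishing my first Python project.
I am reading about 100 .xlsx files from a file path. I then trim each file and send only the important information to a list, as a separate and unique dataframe. So now I have a list of 100 unique dataframes, but iterating through the list and writing to Excel just overwrites the data in the file. I want to append to the end of the .xlsx file instead. The biggest catch in all of this is that I can only use Excel 2010; I do not have any other version of the application. The openpyxl library seems to have some interesting stuff, and I've tried something like this: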
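    from openpyxl import load_workbook
    from openpyxl.utils.dataframe import dataframe_to_rows

    wb = load_workbook(outfile_path)
    ws = wb.active

    for frame in main_df_list:
        for r in dataframe_to_rows(frame, index=True, header=True):
            ws.append(r)

    wb.save(outfile_path)  # changes only reach the file once the workbook is saved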
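For anyone who wants to try the openpyxl route end to end, here is a minimal sketch of appending a list of dataframes below whatever is already in the sheet. The helper name `append_frames`, the sample frames, and the temporary file are all made up for illustration; only `load_workbook`, `Workbook`, `ws.append`, `wb.save` and `dataframe_to_rows` come from openpyxl itself.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows


def append_frames(path, frames):
    """Append every frame in `frames` below the existing rows of the active sheet."""
    # Reuse the workbook if the file already exists, otherwise start a new one.
    wb = load_workbook(path) if os.path.exists(path) else Workbook()
    ws = wb.active
    for frame in frames:
        # index=False keeps the dataframe index out of the sheet.
        for row in dataframe_to_rows(frame, index=False, header=True):
            ws.append(row)
    # Nothing is written to disk until save() is called.
    wb.save(path)


# Hypothetical sample data standing in for main_df_list.
frames = [
    pd.DataFrame({"Part": ["A1", "A2"], "Qty": [3, 5]}),
    pd.DataFrame({"Tool": ["T9"], "Size": [12]}),
]
demo_path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
append_frames(demo_path, frames)
append_frames(demo_path, frames)  # second call adds rows instead of overwriting
```

Each frame is written with its own header row, so frames with different columns simply stack on top of each other in the same sheet.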
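Note: in another post I was told that reading dataframes row by row with loops is not best practice, but I didn't know that when I started. I am, however, committed to this monstrosity.
Edit after reading the comments
So my code scrapes .xlsx files and stores specific data into a dataframe based on a keyword comparison. These dataframes are stored in a list. I will list the entire program below so that hopefully I can explain what's in my head. Also, feel free to critique my code, because I have no idea what is actually good Python practice versus what is not.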
    import os
    import pandas as pd
    from openpyxl import load_workbook

    # the file path I want to pull from
    in_path = r'W:\R1_Manufacturing\Parts List Project\Tool_scraping\Excel'
    # the file path where row search items are stored
    search_parameters = r'W:\R1_Manufacturing\Parts List Project\search_params.xlsx'
    # the file I will write the dataframes to
    outfile_path = r'W:\R1_Manufacturing\Parts List Project\xlsx_reader.xlsx'

    # establishing the lists that I will store looped data into
    file_list = []
    main_df = []
    master_list = []

    # list the contents of the input directory
    files = os.listdir(in_path)

    # database with terms that I want to track
    search = pd.read_excel(search_parameters)
    search_size = search.index

    # searching only for files that end with .xlsx
    for file in files:
        if file.endswith('.xlsx'):
            file_list.append(os.path.join(in_path, file))

    # read the files into a dataframe; main loop the files will be manipulated in
    for current_file in file_list:
        df = pd.read_excel(current_file)

        # get column headers and a range for total rows
        columns = df.columns
        total_rows = df.index

        # lists to store where headers are located in the dataframe
        row_list = []
        column_list = []
        header_list = []

        for name in columns:
            for number in total_rows:
                cell = df.at[number, name]
                # skip anything that is not a non-empty string
                if not isinstance(cell, str) or cell == '':
                    continue
                for place in search_size:
                    search_loop = search.at[place, 'Parameters']
                    # main compare: if str and matches search params, then do...
                    # (insensitive_compare is a helper whose definition is not shown in this post)
                    if insensitive_compare(search_loop, cell) and cell not in header_list:
                        header_list.append(cell)  # store data headers
                        row_list.append(number)   # store row label where it is in that dataframe
                        column_list.append(name)  # store column label where it is in that dataframe

        # pad the dataframe with two extra rows per matched column
        for thing in column_list:
            df = pd.concat([df, pd.DataFrame(0, columns=[thing], index=range(2))], ignore_index=True)

        # turns the dataframe into a set of booleans where it's True if
        # there's something there
        na_finder = df.notna()

        # create a new dataframe to write the output to
        outdf = pd.DataFrame(columns=header_list)

        for i in range(len(row_list)):
            k = 0
            # read down the column until the boolean dataframe says False
            while na_finder.at[row_list[i] + k, column_list[i]]:
                if df.at[row_list[i] + k, column_list[i]] not in header_list:
                    # store the actual value in my output dataframe, outdf
                    outdf.at[k, header_list[i]] = df.at[row_list[i] + k, column_list[i]]
                k += 1

        main_df.append(outdf)
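The program calls `insensitive_compare`, but its definition is not included in the post, so the code will not run as pasted. Purely as an assumption about what it does, a stand-in could look like the sketch below: an exact match that ignores case and surrounding whitespace. The real helper may behave differently (for example, it might do a substring match).

```python
def insensitive_compare(search_term, cell):
    """Hypothetical stand-in: True if both strings match, ignoring case and outer whitespace."""
    # Non-string inputs (e.g. NaN from an empty Excel cell) never match.
    if not isinstance(search_term, str) or not isinstance(cell, str):
        return False
    return search_term.strip().casefold() == cell.strip().casefold()


print(insensitive_compare("Part Number", "  part number "))
print(insensitive_compare("Part Number", "Quantity"))
```

`casefold()` is used rather than `lower()` because it is the more aggressive normalisation intended for caseless comparison.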
So main_df is a list that holds 100+ dataframes. For this example I will only use 2 of them. I would like them to print out to Excel like:
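So Ashish's comment really helped me. All of the dataframes have different column headers, so my 100+ dataframes ended up being concatenated into a single 569x52 dataframe. Here is the code I used. I dropped openpyxl entirely, because once I was able to concatenate all of the dataframes together, I only needed pandas to export it: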
        # (still inside the `for current_file in file_list:` loop)
        # what I want to do here is grab all the data in the same column as each
        # header, then move to the next column
        for i in range(len(row_list)):
            k = 0
            while na_finder.at[row_list[i] + k, column_list[i]] == True:
                if(df.at[row_list[i] + k, column_list[i]] not in header_list):
                    outdf.at[k, header_list[i]] = df.at[row_list[i] + k, column_list[i]]
                k += 1

        main_df.append(outdf)

    # after the loop: stack every frame into one dataframe and export it
    to_xlsx_df = pd.DataFrame()
    for frame in main_df:
        to_xlsx_df = pd.concat([to_xlsx_df, frame])
    to_xlsx_df.to_excel(outfile_path)
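As a side note, `pd.concat` accepts the whole list at once, so the accumulation loop can be collapsed into a single call. The sketch below uses two small made-up frames in place of the real `main_df` contents. Note that `ignore_index=True` renumbers the rows from 0, which differs from the loop above, where each frame keeps its own index.

```python
import pandas as pd

# Hypothetical stand-ins for two of the frames in main_df.
frame_a = pd.DataFrame({"Part": ["A1", "A2"], "Qty": [3, 5]})
frame_b = pd.DataFrame({"Tool": ["T9"], "Size": [12]})
main_df = [frame_a, frame_b]

# One concat call over the whole list; columns are unioned and gaps become NaN.
combined = pd.concat(main_df, ignore_index=True)
print(combined.shape)

# combined.to_excel(outfile_path, index=False)  # outfile_path as defined earlier in the post
```

Because the frames have different column headers, the result has one column per distinct header, which is how 100+ small frames can end up as one wide dataframe.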
Hope this helps someone else too.