Cleaning multiple documents and saving them to one workbook with a loop in Pandas
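I have a list of nearly 30 xlsx files, all in the same format. I have working data-cleaning code, and I want to clean each file and save it to its own sheet in a single workbook. I think a loop is the best fit for the job, but something is missing. I have seen functions that save multiple sheets to a workbook, but I want to read_excel, clean the data frame, save it to a sheet, and delete the data frame. What actually happens is that the new Excel document contains only the sheet for the last item in the list.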
BOX = [
    "aa1",
    "aa2",
    "aa3"]
for B in BOX:
    filename = B + ".xls"
    # create data frame
    # (a raw string cannot end in a backslash, so the separator is added separately)
    BDF = pd.read_excel(r'C:\Projects\BOXES' + '\\' + filename)
    # clean data frame
    BDF = BDF.dropna(how="all")
    BDF['Total Cost'] = BDF['Total Cost'].str.replace('.', '')
    BDF.columns = ['LVL', 'PN', 'Leadtime', 'Description', 'Ext QTY']
    BDF.PN = BDF.PN.str.strip()
    sheetname = B
    # save to sheet
    with pd.ExcelWriter(r'C:\Projects\BOXES\BOXED.xlsx') as writer:
        BDF.to_excel(writer, sheet_name=B, index=False)
    # delete data frame before repeating
    del(BDF)
    del(B)
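You should put the with statement outside the loop. As written, the file is opened and closed on every iteration of the for loop, and an ExcelWriter opened in its default write mode starts a fresh file each time, so only the last sheet survives. The following should work: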
import pandas as pd

BOX = [
    "aa1",
    "aa2",
    "aa3"]
# open the workbook once so every sheet lands in the same file
with pd.ExcelWriter(r'C:\Projects\BOXES\BOXED.xlsx') as writer:
    for B in BOX:
        filename = B + ".xls"
        # create data frame
        # (a raw string cannot end in a backslash, so the separator is added separately)
        BDF = pd.read_excel(r'C:\Projects\BOXES' + '\\' + filename)
        # clean data frame
        BDF = BDF.dropna(how="all")
        # regex=False treats '.' as a literal dot rather than a pattern
        BDF['Total Cost'] = BDF['Total Cost'].str.replace('.', '', regex=False)
        BDF.columns = ['LVL', 'PN', 'Leadtime', 'Description', 'Ext QTY']
        BDF.PN = BDF.PN.str.strip()
        # save to sheet
        BDF.to_excel(writer, sheet_name=B, index=False)
        # delete data frame before repeating
        del BDF
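If you want to convince yourself that opening the writer once is what makes the difference, here is a minimal sketch you can run without the real files. It assumes an Excel engine such as openpyxl is installed. The helper names clean_box and write_book, the sample frames, and the temporary output path are all made up for illustration; the cleaning step is a cut-down stand-in for the code above, not a replacement for it.

```python
import tempfile
from pathlib import Path

import pandas as pd


def clean_box(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step: drop empty rows and strip the PN column."""
    df = df.dropna(how="all").copy()
    df["PN"] = df["PN"].str.strip()
    return df


def write_book(frames: dict[str, pd.DataFrame], out_path: Path) -> None:
    """Write every frame to its own sheet of a single workbook."""
    # The writer is opened once, outside the loop, so sheets accumulate
    # instead of the file being recreated on every iteration.
    with pd.ExcelWriter(out_path) as writer:
        for name, df in frames.items():
            clean_box(df).to_excel(writer, sheet_name=name, index=False)


# Made-up sample data standing in for the real aa1/aa2/aa3 files.
frames = {
    name: pd.DataFrame({"LVL": [1, None], "PN": [f" {name}-001 ", None]})
    for name in ["aa1", "aa2", "aa3"]
}

with tempfile.TemporaryDirectory() as tmp:
    book = Path(tmp) / "BOXED.xlsx"
    write_book(frames, book)
    # Read the workbook back to see which sheets it contains.
    with pd.ExcelFile(book) as xls:
        sheet_names = xls.sheet_names
        first = xls.parse("aa1")

print(sheet_names)
```

The printed list should contain one sheet per entry in frames. Moving the with statement back inside the for loop in write_book should reproduce the original symptom, where only the last sheet is left.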