How do I concatenate files that have multiple sheets with the same column headers, but in a different order, in Python/Pandas?
I have 3 xls files, each with 3 sheets.
All sheets have the same column headers, but in a different order, as shown below:
1.xls
Name Address Date City State Zip
2.xls
Address Date City Zip Name State
3.xls
City Zip Name Address Date State
I want my final xls file to concatenate all 3 files and their sheets:
Output.xls
Name Address Date City State Zip RowNumber SheetName
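RowNumber should be the row number within the file and sheet the data came from, before concatenation. SheetName should be the name of the sheet in the xls file that the row came from.
My attempt: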
import os
import pandas as pd

#set src directory
os.chdir('C:/Users/hhh/Desktop/python/Concat')

def read_sheets(filename):
    result = []
    sheets = pd.read_excel(filename, sheet_name=None)
    for name, sheet in sheets.items():
        sheet['Sheetname'] = name
        sheet['Row'] = sheet.index
        result.append(sheet)
    return pd.concat(result, ignore_index=True)

files = [file for file in os.listdir(folder_path) if file.endswith(".xls")]
dfoo = read_sheets(files)
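But nothing happens; I just get an assertion error: assert content_or_path is not None. Is this because the column order doesn't match? Is there a workaround? The number of columns is the same in all files and sheets. Within each file, the sheets have the same column order. However, if you compare the sheets of 1.xls with those of 2.xls, the column order differs, as you can see in my reprex above.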
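I take your question to be asking how to take 9 different sheets (3 in each of 3 different .xls files) and combine them into a single sheet in a new spreadsheet, Output.xls.
A few comments to start:
- Different column orders in the different input files should not be a problem.
- You may want to consider making the output file an .xlsx file instead of .xls, because the xlwt package needed to write .xls files raises a warning: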
FutureWarning:
As the xlwt package is no longer maintained, the xlwt engine will be removed in a future version of pandas.
This is the only engine in pandas that supports writing in the xls format.
Install openpyxl and write to an xlsx file instead.
You can set the option io.excel.xls.writer to 'xlwt' to silence this warning.
While this option is deprecated and will also raise a warning, it can be globally set and the warning suppressed.
writer = pd.ExcelWriter('Output.xls')
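- The sample code in your question passes a list of files to the function read_sheets(), so that function needs to be changed to expect a list rather than a single file.
- The code needs to iterate over the input files, and then over the sheets in each file.
Here is a modification of your code that does what you ask (with a different argument to os.chdir() to match my test environment):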
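On the first point, pd.concat matches columns by label rather than by position, which is why the shuffled headers are harmless. A minimal sketch with made-up in-memory frames (the names and values are hypothetical, not read from the question's files):

```python
import pandas as pd

# Two hypothetical frames with the same headers in a different order.
a = pd.DataFrame({"Name": ["Alice"], "Address": ["1 Main St"], "Zip": [11111]})
b = pd.DataFrame({"Zip": [22222], "Name": ["Bob"], "Address": ["2 Main St"]})

# concat aligns on column labels; the result follows the column order
# of the first frame, so Bob's values land under the right headers.
combined = pd.concat([a, b], ignore_index=True)
print(combined)
```
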
import os
import pandas as pd

#set src directory
#os.chdir('C:/Users/hhh/Desktop/python/Concat')
os.chdir('./Concat')

def read_sheets(files):
    result = []
    for filename in files:
        # sheet_name=None returns a dict of {sheet name: DataFrame}
        sheets = pd.read_excel(filename, sheet_name=None)
        for name, sheet in sheets.items():
            sheet['Sheetname'] = name
            sheet['Row'] = sheet.index
            result.append(sheet)
    return pd.concat(result, ignore_index=True)

folder_path = '.'
# Skip Output.xls so a rerun does not read its own previous output
files = [file for file in os.listdir(folder_path)
         if file.endswith(".xls") and file != 'Output.xls']
dfCombined = read_sheets(files)

# The with block saves and closes the writer on exit
with pd.ExcelWriter('Output.xls') as writer:
    dfCombined.to_excel(writer, index=False, sheet_name='Combined')
Sample output looks like this:
Name     Address           Date  City        State         Zip    Sheetname  Row
Alice    1 Main St         11    Nome        Alaska        11111  Sheet1     0
Bob      1 Main St         12    Providence  Rhode Island  22222  Sheet1     1
Candace  1 Main St         13    Denver      Colorado      33333  Sheet1     2
Dirk     1 Main St         14    Wilmington  Delaware      44444  Sheet1     3
Edward   1 Marvin Gardens  11    Nome        Alaska        11111  Sheet2     0
Fran     1 Marvin Gardens  12    Providence  Rhode Island  22222  Sheet2     1
George   1 Marvin Gardens  13    Denver      Colorado      33333  Sheet2     2
Hannah   1 Marvin Gardens  14    Wilmington  Delaware      44444  Sheet2     3
Irvin    1 St Marks Place  11    Nome        Alaska        11111  Sheet3     0
Jasmine  1 St Marks Place  12    Providence  Rhode Island  22222  Sheet3     1
Kirk     1 St Marks Place  13    Denver      Colorado      33333  Sheet3     2
Lana     1 St Marks Place  14    Wilmington  Delaware      44444  Sheet3     3
Alice    2 Main St         11    Nome        Alaska        11111  Sheet1     0
Bob      2 Main St         12    Providence  Rhode Island  22222  Sheet1     1
Candace  2 Main St         13    Denver      Colorado      33333  Sheet1     2
Dirk     2 Main St         14    Wilmington  Delaware      44444  Sheet1     3
Edward   2 Marvin Gardens  11    Nome        Alaska        11111  Sheet2     0
Fran     2 Marvin Gardens  12    Providence  Rhode Island  22222  Sheet2     1
George   2 Marvin Gardens  13    Denver      Colorado      33333  Sheet2     2
Hannah   2 Marvin Gardens  14    Wilmington  Delaware      44444  Sheet2     3
Irvin    2 St Marks Place  11    Nome        Alaska        11111  Sheet3     0
Jasmine  2 St Marks Place  12    Providence  Rhode Island  22222  Sheet3     1
Kirk     2 St Marks Place  13    Denver      Colorado      33333  Sheet3     2
Lana     2 St Marks Place  14    Wilmington  Delaware      44444  Sheet3     3
Alice    3 Main St         11    Nome        Alaska        11111  Sheet1     0
Bob      3 Main St         12    Providence  Rhode Island  22222  Sheet1     1
Candace  3 Main St         13    Denver      Colorado      33333  Sheet1     2
Dirk     3 Main St         14    Wilmington  Delaware      44444  Sheet1     3
Edward   3 Marvin Gardens  11    Nome        Alaska        11111  Sheet2     0
Fran     3 Marvin Gardens  12    Providence  Rhode Island  22222  Sheet2     1
George   3 Marvin Gardens  13    Denver      Colorado      33333  Sheet2     2
Hannah   3 Marvin Gardens  14    Wilmington  Delaware      44444  Sheet2     3
Irvin    3 St Marks Place  11    Nome        Alaska        11111  Sheet3     0
Jasmine  3 St Marks Place  12    Providence  Rhode Island  22222  Sheet3     1
Kirk     3 St Marks Place  13    Denver      Colorado      33333  Sheet3     2
Lana     3 St Marks Place  14    Wilmington  Delaware      44444  Sheet3     3
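In the output above, rows from different files cannot be told apart when sheet names repeat. One possible variation, sketched here under assumptions, also records the source file. The FileName column and the helper names combine_workbooks and read_workbooks are hypothetical additions, and the demo uses in-memory frames instead of real .xls files:

```python
import pandas as pd

def combine_workbooks(workbooks):
    """Combine {file name: {sheet name: DataFrame}} into one DataFrame."""
    frames = []
    for filename, sheets in workbooks.items():
        for sheet_name, sheet in sheets.items():
            sheet = sheet.copy()                    # leave the input untouched
            sheet["FileName"] = filename            # hypothetical extra column
            sheet["SheetName"] = sheet_name
            sheet["RowNumber"] = range(len(sheet))  # 0-based row within the sheet
            frames.append(sheet)
    return pd.concat(frames, ignore_index=True)

def read_workbooks(paths):
    # sheet_name=None makes read_excel return {sheet name: DataFrame}
    return {path: pd.read_excel(path, sheet_name=None) for path in paths}

# In-memory stand-ins for two workbooks with shuffled headers (made-up data).
demo = {
    "1.xls": {"Sheet1": pd.DataFrame({"Name": ["Alice"], "Zip": [11111]})},
    "2.xls": {"Sheet1": pd.DataFrame({"Zip": [22222], "Name": ["Bob"]})},
}
combined = combine_workbooks(demo)
print(combined)
```

With real files, something like combine_workbooks(read_workbooks(files)).to_excel('Output.xlsx', index=False) should write an .xlsx file instead, assuming openpyxl is installed, which avoids the xlwt warning.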