在 Python 或 R 中合并具有不同 headers 的数据

Merge data with varying headers in Python or R

我有一个 excel 文件,其中包含多个需要合并的工作表。但是,headers 列彼此不同。目前数据是这样的。

Sheet 1
+-------------+--------------+----------+--------+---------+---------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header | Header1 | Header2 |
+-------------+--------------+----------+--------+---------+---------+
|          17 | Data         | Data     |      0 |       0 |       0 |
|          17 | Data         | Data     |      0 |       0 |       0 |
+-------------+--------------+----------+--------+---------+---------+

Sheet 2
+-------------+--------------+----------+---------+---------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header3 | Header2 |
+-------------+--------------+----------+---------+---------+
|          15 | Data         | Data     |       0 |       0 |
|          15 | Data         | Data     |       0 |       0 |
+-------------+--------------+----------+---------+---------+

Sheet 3
+-------------+--------------+----------+---------+---------+---------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header4 | Header1 | Header3 |
+-------------+--------------+----------+---------+---------+---------+
|          16 | Data         | Data     |       0 |       0 |       0 |
|          16 | Data         | Data     |       0 |       0 |       0 |
+-------------+--------------+----------+---------+---------+---------+

OUTPUT
+-------------+--------------+----------+--------+---------+---------+---------+---------+-----------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header | Header1 | Header2 | Header3 | Header4 | SheetName |
+-------------+--------------+----------+--------+---------+---------+---------+---------+-----------+
|          17 | Data         | Data     | 0      | 0       | 0       | null    | null    | Sheet1    |
|          17 | Data         | Data     | 0      | 0       | 0       | null    | null    | Sheet1    |
|          15 | Data         | Data     | null   | null    | 0       | 0       | null    | Sheet2    |
|          15 | Data         | Data     | null   | null    | 0       | 0       | null    | Sheet2    |
|          16 | Data         | Data     | null   | 0       | null    | 0       | 0       | Sheet3    |
|          16 | Data         | Data     | null   | 0       | null    | 0       | 0       | Sheet3    |
+-------------+--------------+----------+--------+---------+---------+---------+---------+-----------+

我对 Python 比较陌生。我用过 Pandas 和 numpy。 我有多达 60 张纸要工作。谁能帮助我了解如何实现这一目标?如果不是 python,我应该使用其他 tool/method 吗?我真的可以使用代码示例开始。

非常感谢您的帮助。提前谢谢你

使用 R,这很容易做到。

library(openxlsx) # to read xlsx files
library(purrr)    # for the "map" function

wb <- loadWorkbook("path/filename.xlsx")
all_sheets <- names(wb)

merged_data <- map_df(all_sheets, ~ read.xlsx(wb, sheet = .x)

在 R 中使用 for 循环和 rbind :

for (i in file.list) {
    data <- rbind(data, read.xlsx(i, sheetIndex = 1))
}

rbind 用法:要垂直连接两个数据框(数据集),请使用 rbind 函数。两个数据框必须有相同的变量,但不必顺序相同。

total <- rbind(data frameA, data frameB) 
import pandas as pd

filepath = r"filePath here"
sheets_dict = pd.read_excel(filepath, sheet_name=None)

full_table = pd.DataFrame()

#loop through sheets
for name, sheet in sheets_dict.items():
    sheet['sheet'] = name
    #sheet = sheet.rename(columns=lambda x: x.split('\n')[-1])
    full_table = full_table.append (sheet)

full_table.reset_index (inplace=True, drop=True)

#Write to Excel
writer = pd.ExcelWriter('consolidated_TB1.xlsx', engine='xlsxwriter')
full_table.to_excel(writer,'Sheet1')

# Close the Pandas Excel writer and output the Excel file.
writer.save()