How to concat multiple spreadsheets in Excel workbooks into pandas dataframe?
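I have multiple folders and subfolders containing Excel workbooks with multiple tabs. How do I concatenate all of the information into one pandas DataFrame?
Here is my code so far: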
from pathlib import Path
import os
import pandas as pd
import glob

p = Path(r'C:\Users\user1\Downloads\key_folder')
globbed_files = p.glob('**/**/*.xlsx')
df = []
for file in globbed_files:
    frame = pd.read_excel(file, sheet_name = None, ignore_index=True)
    frame['File Path'] = os.path.basename(file)
    df.append(frame)
# df = pd.concat([d.values() for d in df], axis = 0, ignore_index=True)
df = pd.concat(df, axis=0, ignore_index = True)
This produces the following error:
cannot concatenate object of type "<class 'collections.OrderedDict'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
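When I run pd.DataFrame(df), I see that each Excel spreadsheet tab is a separate column. Each cell contains the data and headers as text, forming one very long string.
Any help is appreciated. Thanks!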
Here is the final code:
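from pathlib import Path
import pandas as pd
import xlrd

p = Path('path here')
# Recursively find every .xlsx workbook under p
globbed_files = p.glob('**/**/*.xlsx')
list_dfs = []
for file in globbed_files:
    # Note: xlrd 2.0 and later can only open .xls files, so this call needs xlrd < 2.0
    xls = xlrd.open_workbook(file, on_demand=True)
    for sheet_name in xls.sheet_names():
        # Read one sheet at a time and record which sheet it came from
        df = pd.read_excel(file, sheet_name)
        df['Sheet Name'] = sheet_name
        list_dfs.append(df)
# Stack all sheets into a single DataFrame and write it out
dfs = pd.concat(list_dfs, axis=0)
dfs.to_excel('merged spreadsheet.xlsx')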
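The original error arises because pd.read_excel with sheet_name=None returns a dict that maps each sheet name to its own DataFrame, and pd.concat was handed a list of those dicts rather than a list of DataFrames. As a possible alternative that avoids xlrd, the sketch below flattens each dict before concatenating. The helper names combine_sheets and read_workbooks and the 'File Name' column are illustrative assumptions, not part of the original code, and reading .xlsx files this way assumes an engine such as openpyxl is installed.

```python
from pathlib import Path

import pandas as pd


def combine_sheets(sheets, file_name):
    """Stack a {sheet_name: DataFrame} dict into one labelled DataFrame.

    Hypothetical helper: adds 'Sheet Name' and 'File Name' columns so each
    row can be traced back to where it came from.
    """
    labelled = [
        frame.assign(**{"Sheet Name": sheet_name, "File Name": file_name})
        for sheet_name, frame in sheets.items()
    ]
    if not labelled:
        return pd.DataFrame()
    return pd.concat(labelled, ignore_index=True)


def read_workbooks(root):
    """Read every sheet of every .xlsx file under root (hypothetical helper)."""
    frames = []
    for file in sorted(Path(root).rglob("*.xlsx")):
        # sheet_name=None returns a dict mapping sheet names to DataFrames
        sheets = pd.read_excel(file, sheet_name=None)
        frames.append(combine_sheets(sheets, file.name))
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Calling read_workbooks(r'C:\Users\user1\Downloads\key_folder') should then give a single DataFrame covering every tab of every workbook. Sheets with different headers are still stacked, with missing columns filled with NaN.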