Concatenating multiple .xls files with unknown number of columns
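Using Python, I want to merge all the .xls files in a directory into a single DataFrame and save it as a new, concatenated .xls file. The .xls files have an unknown number of columns and inconsistent headers.
I used other suggestions on this forum and ended up with this: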
import os
import pandas as pd
path = os.getcwd()
files = os.listdir(path)
files_xls = [f for f in files if f[-3:] == 'xls']
df = pd.DataFrame()
for f in files_xls:
    data = pd.read_excel(f for f in files_xls)  # I don't understand what to add in the parentheses here.
    df = df.append(data)
df
I get this error:
File "<ipython-input-17-bb67a423cf40>", line 14, in <module>
data = pd.read_excel(f for f in files_xls)
File "C:\Users\xxxx\Anaconda2\lib\site-packages\pandas\io\excel.py", line 170, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Users\xxxx\Anaconda2\lib\site-packages\pandas\io\excel.py", line 229, in __init__
raise ValueError('Must explicitly set engine if not passing in'
ValueError: Must explicitly set engine if not passing in buffer or path for io.
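The error is caused by the argument itself: f for f in files_xls inside the call is a generator expression, so read_excel receives a single generator object rather than a filename. As the traceback shows, the pandas version in use accepts roughly a path string or a file-like object with a read() method, and raises this ValueError for anything else. The small sketch below (the filenames are hypothetical stand-ins for os.listdir() output) shows what is actually being passed:

```python
# Hypothetical filenames standing in for the result of os.listdir().
files_xls = ["a.xls", "b.xls"]

# This is what the original call hands to read_excel: one generator object.
arg = (f for f in files_xls)
print(type(arg).__name__)      # generator
print(isinstance(arg, str))    # False: it is not a path string
print(hasattr(arg, "read"))    # False: it is not a file-like buffer either

# Inside the loop, the loop variable f already holds one filename per pass,
# so read_excel(f) is the call that was intended.
for f in files_xls:
    print(repr(f))             # 'a.xls', then 'b.xls'
```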
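Try this, bro: pass each filename to read_excel on its own, collect the resulting frames in a list, and concatenate them once at the end.
import os
import pandas as pd

files_xls = [f for f in os.listdir(os.getcwd()) if f.endswith('.xls')]
df = []  # list of DataFrames, one per file
for f in files_xls:
    data = pd.read_excel(f)  # one file path per call
    df.append(data)          # list.append works in place and returns None, so don't reassign
mydf = pd.concat(df, axis=0)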
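Because the files have different columns, it helps to know how pd.concat treats them. By default it aligns columns by header name and fills a column that is missing from one file with NaN for that file's rows. If the headers are meaningless and the columns should simply be stacked by position, the names can be replaced with 0, 1, 2, ... first (reading with pd.read_excel(f, header=None) has a similar effect, but it also treats the header row as data). The sketch below uses small in-memory frames as hypothetical stand-ins for two .xls files, and the helper positional_columns is a name introduced here for illustration:

```python
import pandas as pd

# Hypothetical data standing in for two .xls files with different headers.
a = pd.DataFrame({"id": [1, 2], "price": [9.5, 3.0]})
b = pd.DataFrame({"id": [3], "qty": [7]})

# Default behaviour: columns are matched by name; "price" is NaN for b's row
# and "qty" is NaN for a's rows.
by_name = pd.concat([a, b], axis=0, ignore_index=True, sort=False)
print(by_name)


def positional_columns(df):
    """Hypothetical helper: replace the header with 0..n-1 so concat stacks by position."""
    out = df.copy()
    out.columns = range(out.shape[1])
    return out


# Alternative: ignore the header names and stack column 0 under column 0, etc.
positional = pd.concat([positional_columns(d) for d in (a, b)], axis=0, ignore_index=True)
print(positional)
```

To save the combined frame, mydf.to_excel('combined.xlsx', index=False) works when an Excel writer such as openpyxl is installed. Note that pandas 2.0 and later removed the xlwt engine, so writing legacy .xls files is no longer supported there.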