Concatenating multiple .xls files with an unknown number of columns

Question

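Using Python, I want to combine all the .xls files in a directory into one DataFrame and save it as a new, concatenated .xls file. The .xls files will have an unknown number of columns and inconsistent headers.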
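On the "unknown number of columns and inconsistent headers" part: `pd.concat` stacks frames row-wise and aligns them by column name, taking the union of all headers and filling the gaps with NaN. A minimal sketch with two small in-memory frames standing in for two sheets (the data is hypothetical):

```python
import pandas as pd

# Two hypothetical frames whose headers only partly overlap,
# standing in for sheets read from two different .xls files.
df_a = pd.DataFrame({"id": [1, 2], "name": ["x", "y"]})
df_b = pd.DataFrame({"id": [3], "price": [9.5], "qty": [4]})

# concat aligns columns by header name; a column missing from one frame
# is filled with NaN for that frame's rows. ignore_index renumbers rows 0..n-1.
combined = pd.concat([df_a, df_b], axis=0, ignore_index=True, sort=False)
print(combined)
```

So differing column counts are not a problem in themselves; only columns that mean the same thing but are spelled differently would need renaming before the concat.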

I used other suggestions on this forum and ended up with this:

import os
import pandas as pd

path = os.getcwd()
files = os.listdir(path)

files_xls = [f for f in files if f[-3:] == 'xls']

df = pd.DataFrame()

for f in files_xls:
    # I don't understand what to put in the parentheses here.
    data = pd.read_excel(f for f in files_xls)
    df = df.append(data)
    df

I'm getting these errors:

File "<ipython-input-17-bb67a423cf40>", line 14, in <module>
  data = pd.read_excel(f for f in files_xls)

File "C:\Users\xxxx\Anaconda2\lib\site-packages\pandas\io\excel.py", line 170, in read_excel
  io = ExcelFile(io, engine=engine)

File "C:\Users\xxxx\Anaconda2\lib\site-packages\pandas\io\excel.py", line 229, in __init__
  raise ValueError('Must explicitly set engine if not passing in'

ValueError: Must explicitly set engine if not passing in buffer or path for io.
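The error comes from the argument itself: written as the only argument, `f for f in files_xls` is a generator expression, so `read_excel` receives a single generator object rather than a path, which is why pandas complains that it was not given "a buffer or path". A minimal plain-Python sketch of the difference; the `describe` helper and the filenames are hypothetical:

```python
files_xls = ["a.xls", "b.xls"]  # hypothetical filenames


def describe(arg):
    """Hypothetical helper: report the type of whatever was passed in."""
    return type(arg).__name__


# As the sole argument, `f for f in files_xls` is a generator expression:
# the function gets one generator object, not a filename string.
print(describe(f for f in files_xls))  # generator

# Inside the loop, `f` already holds one filename, so that is what to pass.
for f in files_xls:
    print(describe(f))  # str
```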

Answer 1

Try this, bro:

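import os
import pandas as pd

# Same file list as in the question: names in the current directory ending in .xls
files_xls = [f for f in os.listdir(os.getcwd()) if f.endswith('.xls')]

frames = []
for f in files_xls:
    data = pd.read_excel(f)  # pass one filename per call
    frames.append(data)      # list.append modifies the list in place and returns None

mydf = pd.concat(frames, axis=0, ignore_index=True)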
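A slightly fuller sketch of the same approach, assuming pandas with the xlrd package installed (needed to read .xls). The function name `concat_excel_dir`, its `reader` parameter, and the `source_file` column are hypothetical additions for illustration, not part of the answer above. Using `glob` with `os.path.join` means the files don't have to sit in the current working directory.

```python
import glob
import os

import pandas as pd


def concat_excel_dir(directory, pattern="*.xls", reader=pd.read_excel):
    """Hypothetical helper: read every file matching `pattern` and stack them.

    `reader` defaults to pd.read_excel; it is a parameter only so the
    function can be exercised without real Excel files.
    """
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    frames = []
    for path in paths:
        frame = reader(path)
        # Record where each row came from, since headers differ between files.
        frame["source_file"] = os.path.basename(path)
        frames.append(frame)
    if not frames:
        return pd.DataFrame()
    # Outer alignment on column names: columns absent from a file become NaN.
    return pd.concat(frames, axis=0, ignore_index=True, sort=False)
```

For the saving step the question asks about: pandas 2.0 removed the xlwt-based .xls writer, so on current versions writing .xlsx (which needs openpyxl) is the practical route, e.g. `concat_excel_dir(os.getcwd()).to_excel("combined.xlsx", index=False)`.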
