Merge data from different file types in a folder into a pandas DataFrame

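I have a folder in which some files are .csv and some are .xls. I want to load the data from all of them into a single DataFrame. The good part is that all the files have the same data fields.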

I am using the following code:

import os
import glob
import pandas as pd
os.chdir("/content/")

extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

new_extend = 'xls'
all_files = [i for i in glob.glob('*.{}'.format(extension))]
all_filenames.append(all_files)

#combine all files in the list
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
#export to csv
combined_csv.to_csv( "combined_csv.csv", index=False, encoding='utf-8-sig')

But I get the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-7-747e8d68cec8> in <module>()
      1 #combine all files in the list
----> 2 combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
      3 #export to csv
      4 combined_csv.to_csv( "combined_csv.csv", index=False, encoding='utf-8-sig')

3 frames
/usr/local/lib/python3.7/dist-packages/pandas/io/common.py in get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode, storage_options)
    241     if not is_file_like(filepath_or_buffer):
    242         msg = f"Invalid file path or buffer object type: {type(filepath_or_buffer)}"
--> 243         raise ValueError(msg)
    244 
    245     return filepath_or_buffer, None, compression, False

ValueError: Invalid file path or buffer object type: <class 'list'>

Can someone tell me what I am doing wrong?

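I can't test your approach because I don't have access to your files, but I have tested the code below with some sample files, which you can download here in case you need them.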

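Try running the code below, changing only "mypath" to the folder where your files are located. The script writes the final DataFrame ("1.Final_DF.xlsx") to that same folder: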

import glob
import os

import pandas as pd

mypath = r"F:\yourFolder"
os.chdir(mypath)

# Collect every .xls and .csv file in the folder
target_files = glob.glob("*.xls") + glob.glob("*.csv")

frames = []
for file in target_files:
    file_path = os.path.join(mypath, file)
    # Pick the reader from the file extension
    # (reading .xls needs the xlrd package installed)
    if file.lower().endswith(".xls"):
        df_raw = pd.read_excel(file_path)
    else:
        df_raw = pd.read_csv(file_path)
    frames.append(df_raw)

# DataFrame.append was removed in pandas 2.0, so concatenate once instead
df = pd.concat(frames, ignore_index=True)

# Writing .xlsx needs the openpyxl package installed
df.to_excel("1.Final_DF.xlsx", index=False)
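
As for the error in your original code: as far as I can tell, it comes from `all_filenames.append(all_files)`. `list.append` adds the whole second list as a single element, so `pd.read_csv` ends up being handed a list instead of a path, which is exactly what the ValueError complains about. `list.extend` adds the individual names instead. Your second `glob` call also formats the pattern with `extension` rather than `new_extend`, so it searches for .csv files a second time. Here is a minimal sketch of the difference, with made-up file names (`a.csv`, `b.csv` and `c.xls` are placeholders standing in for your glob results):

```python
# Placeholder file names, standing in for the results of the two glob calls
csv_files = ["a.csv", "b.csv"]
xls_files = ["c.xls"]

# append() adds the whole list as ONE element, giving a nested list
nested = list(csv_files)
nested.append(xls_files)
print(nested)  # ['a.csv', 'b.csv', ['c.xls']]

# extend() adds each name individually, giving a flat list of paths
flat = list(csv_files)
flat.extend(xls_files)
print(flat)  # ['a.csv', 'b.csv', 'c.xls']
```

Even with a flat list, the .xls files still have to go through `pd.read_excel` rather than `pd.read_csv`, which is why the loop above chooses the reader per file.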

Let me know how it goes ;-)