Merge data from different file types in a folder into a pandas DataFrame
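I have a folder in which some files are .csv and some are .xls. I want to load the data from all of them into a single DataFrame. The good part is that all the files have the same data fields.
I am using the following code: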
import os
import glob
import pandas as pd
os.chdir("/content/")
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
new_extend = 'xls'
all_files = [i for i in glob.glob('*.{}'.format(extension))]
all_filenames.append(all_files)
#combine all files in the list
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
#export to csv
combined_csv.to_csv( "combined_csv.csv", index=False, encoding='utf-8-sig')
But I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-7-747e8d68cec8> in <module>()
1 #combine all files in the list
----> 2 combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
3 #export to csv
4 combined_csv.to_csv( "combined_csv.csv", index=False, encoding='utf-8-sig')
3 frames
/usr/local/lib/python3.7/dist-packages/pandas/io/common.py in get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode, storage_options)
241 if not is_file_like(filepath_or_buffer):
242 msg = f"Invalid file path or buffer object type: {type(filepath_or_buffer)}"
--> 243 raise ValueError(msg)
244
245 return filepath_or_buffer, None, compression, False
ValueError: Invalid file path or buffer object type: <class 'list'>
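Can someone tell me what I am doing wrong?
I can't test your approach because I don't have access to your files, but I tested the code below with some sample files, which you can download here if you need them.
Try running the code below, changing only "mypath" to the folder that contains your files. The script writes the final DataFrame ("1.Final_DF.xlsx") to that same folder: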
import glob
import os

import pandas as pd

mypath = r"F:\yourFolder"  # change this to the folder that holds your files
os.chdir(mypath)

# Collect the Excel files first, then the CSV files.
target_files = glob.glob("*.xls") + glob.glob("*.csv")

frames = []
for file in target_files:
    file_path = os.path.join(mypath, file)
    # Pick the reader that matches the file extension.
    if file.lower().endswith(".xls"):
        df_raw = pd.read_excel(file_path)
    else:
        df_raw = pd.read_csv(file_path)
    frames.append(df_raw)

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
df = pd.concat(frames, ignore_index=True)
df.to_excel("1.Final_DF.xlsx", index=False)
Let me know how it goes ;-)
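As for why the code in the question fails: `all_filenames.append(all_files)` adds the whole second list as one element, so `pd.read_csv` eventually receives a list instead of a path, which is exactly what the `ValueError` reports. The second `glob` call also reuses `extension` rather than `new_extend`, so no .xls files are ever collected, and .xls files would need `pd.read_excel` rather than `pd.read_csv` anyway. Here is a minimal sketch of one way to handle all three points. The function name `combine_folder` is my own, and I am assuming every file shares the same columns, as stated in the question:

```python
from pathlib import Path

import pandas as pd


def combine_folder(folder):
    """Read every .csv and .xls file in `folder` into one DataFrame.

    `combine_folder` is a hypothetical helper name, not a pandas function.
    Assumes all files share the same columns.
    """
    folder = Path(folder)
    # Adding the two lists keeps a flat list of paths;
    # list.append would nest one list inside the other.
    paths = sorted(folder.glob("*.csv")) + sorted(folder.glob("*.xls"))

    frames = []
    for path in paths:
        if path.suffix.lower() == ".xls":
            # Reading .xls requires the optional xlrd package.
            frames.append(pd.read_excel(path))
        else:
            frames.append(pd.read_csv(path))

    if not frames:
        raise FileNotFoundError(f"no .csv or .xls files found in {folder}")

    # ignore_index=True renumbers the rows from 0 in the combined frame.
    return pd.concat(frames, ignore_index=True)
```

You would call it as `combined = combine_folder("/content/")` and then export with `combined.to_csv("combined_csv.csv", index=False, encoding="utf-8-sig")`. If you write the output into the same folder, it will be picked up as an input on the next run, so it may be safer to save it somewhere else.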