Opening multiple pickle files from a Jupyter notebook folder doesn't work

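I'm using Jupyter Notebook on a server (the folder is not on my computer). I have a folder with 30 dataframes that all have exactly the same columns. They are all saved under the following path: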

Reut/folder_no_one/here_the_files_located

I want to open all of them and concatenate them. I know I can do something like this:

df1=pd.read_pickle('table1')
df2=pd.read_pickle('table2')
df3=pd.read_pickle('table3')
...
#and then concat

But I'm sure there is a better, smarter way to do this. I tried to open all the files and store each one separately, as follows:

num=list(range(1, 33)) #number of tables I have in the folder
path_to_files=r'Reut/here_the_files_located'
Path=r'Reut/folder_no_one/here_the_files_located'

{f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}

But I get this error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 {f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}

TypeError: 'str' object is not callable

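I tried playing around with different versions of the path, and also leaving the path out (since my notebook is located where those files are), but I always get the same error.

*It's worth mentioning that when the notebook is in that folder too, I can open these files without specifying the path.

My end goal is to open all these tables automatically and concatenate them into one big table.

Edit: I also tried this: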

path = r'file_name/file_location_with_all_pickles'
all_files = glob.glob(path + "/*.pkl")

li = []

for filename in all_files:
    df = pd.read_pickle(filename)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

and also

path_to_files = r'file_name/file_location_with_all_pickles'
tables = []
for table in pathlib.Path(path_to_files).glob("*.pkl"):
    print(table)
    tables.append(pd.read_pickle(table))

But in both cases I get the error

ValueError: No objects to concatenate

when I try to concat. Also, when I tell it to print the filename/table, it prints nothing, and if I print an ordinary string inside the loop (like print('hello')), nothing happens either. It seems like there is a problem with the path, but when I open one specific pickle like this:

pd.read_pickle(r'file_name/file_location_with_all_pickles/specific_table.pkl')

it opens.

Update:

This is what eventually worked for me:

import pandas as pd
import glob

path = r'folder' # use your path
all_files = glob.glob(path + "/*.pkl")

li = []

for filename in all_files:
    df = pd.read_pickle(filename)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

How about:

path_to_files = r'Reut/here_the_files_located'
df = pd.concat([pd.read_pickle(f'{path_to_files}/table{num}.pickle') for num in range(1, 33)])

This is equivalent to:

path_to_files = r'Reut/here_the_files_located'
tables = []
for num in range(1, 33):
    filename = f'{path_to_files}/table{num}.pickle'
    print(filename)
    tables.append(pd.read_pickle(filename))

df = pd.concat(tables)

Output:

Reut/here_the_files_located/table1.pickle
Reut/here_the_files_located/table2.pickle
Reut/here_the_files_located/table3.pickle
Reut/here_the_files_located/table4.pickle
Reut/here_the_files_located/table5.pickle
Reut/here_the_files_located/table6.pickle
Reut/here_the_files_located/table7.pickle
Reut/here_the_files_located/table8.pickle
Reut/here_the_files_located/table9.pickle
Reut/here_the_files_located/table10.pickle
Reut/here_the_files_located/table11.pickle
Reut/here_the_files_located/table12.pickle
Reut/here_the_files_located/table13.pickle
Reut/here_the_files_located/table14.pickle
Reut/here_the_files_located/table15.pickle
Reut/here_the_files_located/table16.pickle
Reut/here_the_files_located/table17.pickle
Reut/here_the_files_located/table18.pickle
Reut/here_the_files_located/table19.pickle
Reut/here_the_files_located/table20.pickle
Reut/here_the_files_located/table21.pickle
Reut/here_the_files_located/table22.pickle
Reut/here_the_files_located/table23.pickle
Reut/here_the_files_located/table24.pickle
Reut/here_the_files_located/table25.pickle
Reut/here_the_files_located/table26.pickle
Reut/here_the_files_located/table27.pickle
Reut/here_the_files_located/table28.pickle
Reut/here_the_files_located/table29.pickle
Reut/here_the_files_located/table30.pickle
Reut/here_the_files_located/table31.pickle
Reut/here_the_files_located/table32.pickle

A few comments on your code:

num=list(range(1, 33)) #number of tables I have in the folder
path_to_files=r'Reut/here_the_files_located'
Path=r'Reut/folder_no_one/here_the_files_located'

{f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}

num=list(range(1, 33)) #number of tables I have in the folder

There's no need to create a list from range. Using range directly in a for loop or in a list/dictionary comprehension works just fine.
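
As a small illustration of that point (the 'table{n}.pickle' names below are only placeholders, not the real files), range can be consumed directly by a comprehension without wrapping it in list first:

```python
# range is already iterable, so no list(...) wrapper is needed.
# The 'table{n}.pickle' names are illustrative placeholders.
filenames = [f"table{n}.pickle" for n in range(1, 4)]
print(filenames)

# The same works in a dictionary comprehension.
name_by_key = {f"df{n}": f"table{n}.pickle" for n in range(1, 4)}
print(name_by_key)
```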

Path=r'Reut/folder_no_one/here_the_files_located'

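I guess you had previously imported the Path class from pathlib. If you want to call Path as usual, you need to choose another name for that variable. That is why you get the error TypeError: 'str' object is not callable.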
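
A minimal sketch of what is probably going on, assuming Path was imported from pathlib earlier in the notebook (the variable name folder is just a suggestion):

```python
from pathlib import Path

path_to_files = "Reut/here_the_files_located"

# Rebinding the name Path to a string hides the pathlib class.
Path = "Reut/folder_no_one/here_the_files_located"
try:
    Path(path_to_files)  # this now "calls" a str
except TypeError as err:
    message = str(err)
    print(message)

# Fix: re-import the class and keep the string under a different name.
from pathlib import Path
folder = "Reut/folder_no_one/here_the_files_located"  # hypothetical variable name
folder_path = Path(folder)
print(folder_path.name)
```

In a notebook the shadowing survives across cells, so after renaming the variable you still need to re-run the import (or restart the kernel) before Path is the class again.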

Is there any way to use it if the table names are not the same? E.g. if one is table1 and another is dataframe3, just to read them regardless of their names?

Sure. Assuming the filenames of all saved tables end with .pickle, you can use the glob method as in your first attempt. Don't forget to import pathlib.

import pathlib
path_to_files = r'Reut/here_the_files_located'
tables = []
for table in pathlib.Path(path_to_files).glob("*.pickle"):
    tables.append(pd.read_pickle(table))

df = pd.concat(tables)
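
As for the ValueError: No objects to concatenate in the question's edit: pd.concat raises it when it is given an empty list, which is what you end up with when the glob pattern matches no files. Possible causes are a "*.pkl" pattern used against files ending in .pickle, or a relative path resolved from a different working directory than expected. The sketch below wraps the loop above in a helper that reports where it actually looked. The function name load_all_pickles and its parameters are made up for illustration and are not part of pandas.

```python
import pathlib
import tempfile

import pandas as pd


def load_all_pickles(folder, pattern="*.pickle"):
    """Read every pickle matching `pattern` in `folder` and concatenate them.

    Hypothetical helper for illustration; not part of pandas.
    """
    folder = pathlib.Path(folder)
    # sorted() makes the read order reproducible; glob itself promises no order.
    files = sorted(folder.glob(pattern))
    if not files:
        # Fail early with a message that shows where we actually looked.
        raise FileNotFoundError(
            f"No files matching {pattern!r} in {folder.resolve()} "
            f"(current working directory: {pathlib.Path.cwd()})"
        )
    return pd.concat([pd.read_pickle(f) for f in files], ignore_index=True)


# Usage example with throwaway data in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"a": [1, 2]}).to_pickle(pathlib.Path(tmp) / "table1.pickle")
    pd.DataFrame({"a": [3]}).to_pickle(pathlib.Path(tmp) / "dataframe3.pickle")
    combined = load_all_pickles(tmp)
    print(combined)
```

If the helper raises, the printed working directory tells you what a relative path such as 'Reut/here_the_files_located' is being resolved against.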