How to load many pickled files into a dataframe?

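I have many files that were created with "pickle". I want to load them into a dataframe, compute the mean of each one (from the second row to the last), multiply it by 1000, and round it to two decimal places.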

So far I have managed to do this with 1 pickle file.

import pandas as pd

df = pd.read_pickle(r'C:\Users\file_inference_time')
df = pd.DataFrame(df)
df.rename(columns={0: 'MobileNet'}, inplace=True)

# mean of the rows from position 2 onward, times 1000, rounded to 2 decimals
df_mean = (df.iloc[2:, :].mean() * 1000).round(decimals=2)
df_mean2 = pd.DataFrame(df_mean)
df_mean2

This gives me the result for 1 file.
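
One detail worth checking is which rows the slice covers: positions are zero-based, so `iloc[2:]` starts at the third row, while `iloc[1:]` starts at the second. A minimal sketch on made-up timings (the values below are hypothetical, not taken from the real files) shows the difference:

```python
import pandas as pd

# Hypothetical timings; not taken from the original pickle files.
df = pd.DataFrame({'MobileNet': [0.9, 0.5, 0.003, 0.004, 0.005]})

# iloc[2:] drops positions 0 and 1, so the mean covers the third row onward.
mean_from_third = (df.iloc[2:, :].mean() * 1000).round(2)

# iloc[1:] drops only position 0, so the mean covers the second row onward.
mean_from_second = (df.iloc[1:, :].mean() * 1000).round(2)

print(mean_from_third)
print(mean_from_second)
```

Which of the two is wanted depends on how many leading rows should be left out of the mean.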

These are the files I need to read ("pickle")

Edit: this is the error I get when running the second option
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-b72e45d8bcfc> in <module>
     16 
     17 
---> 18 df_mean_all = pd.concat(df_mean_list).reset_index(drop=True)
     19 
     20 print(df_mean_all)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    253         verify_integrity=verify_integrity,
    254         copy=copy,
--> 255         sort=sort,
    256     )
    257 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\concat.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    302 
    303         if len(objs) == 0:
--> 304             raise ValueError("No objects to concatenate")
    305 
    306         if keys is None:

ValueError: No objects to concatenate

This is a plot of the results

Get a dict of dataframes

  • Save the computed mean of each file to a dict
import pandas as pd
from pathlib import Path

dir_path = Path(r'C:\Users\path_to_files')
files = dir_path.glob('**/file_inference_time*')  # get all pkl files in main dir and subdirectories

df_mean_dict = dict()

for i, file in enumerate(files):
    df = pd.DataFrame(pd.read_pickle(file))
    df.rename(columns={0:'MobileNet'}, inplace=True)

    df_mean_dict[i] = pd.DataFrame((df.iloc[2::,:].mean()* 1000).round(decimals=2))

    # if all the file names are unique, the dict key can be the file name (w/o the file extension)
    # df_mean_dict[file.stem] = pd.DataFrame((df.iloc[2::,:].mean()* 1000).round(decimals=2))
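
If the dict later needs to become one table, `pd.concat` also accepts a dict and uses its keys as the outer index level. The following is only a sketch: the two entries stand in for the real `df_mean_dict`, and the column names `file`, `model` and `mean` are chosen here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df_mean_dict: each value is a one-row dataframe
# shaped like the ones built in the loop (index 'MobileNet', single column 0).
df_mean_dict = {
    'file1': pd.DataFrame(pd.Series({'MobileNet': 3.24})),
    'file2': pd.DataFrame(pd.Series({'MobileNet': 2.34})),
}

# Passing a dict to pd.concat uses the dict keys as the outer index level.
combined = pd.concat(df_mean_dict)

# Turn both index levels into ordinary columns and give them readable names.
combined = combined.reset_index()
combined.columns = ['file', 'model', 'mean']

print(combined)
```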

Get a single dataframe - this is what I would do

  • The result, df_mean_all, will be a 2-column dataframe.
    • Column 0 will be MobileNet
    • Column 1 will be file
dir_path = Path(r'C:\Users\path_to_files')
files = dir_path.glob('**/file_inference_time*')   # get all pkl files in main dir and subdirectories

# check that the files are found
# if an empty list prints, no files were found
files = list(files)
print(files[:5])

df_mean_list = list()

for file in files:
    df = pd.DataFrame(pd.read_pickle(file))

    df_mean = pd.DataFrame((df.iloc[2::,:].mean()* 1000).round(decimals=2)).reset_index(drop=True).rename(columns={0: 'MobileNet'})
    df_mean['file'] = file  # or file.stem for just the file name

    df_mean_list.append(df_mean)

# df_mean_list is a list of dataframes, pd.concat combines them all into one dataframe
df_mean_all = pd.concat(df_mean_list).reset_index(drop=True)

print(df_mean_all)

   MobileNet                                    file
0       3.24  C:\Users\file_inference_time\file1.pkl
1       2.34  C:\Users\file_inference_time\file2.pkl
2       4.23  C:\Users\file_inference_time\file3.pkl
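
The "No objects to concatenate" error from the edit appears when `pd.concat` receives an empty list, which here most likely means the glob pattern matched nothing. A minimal sketch of the same steps wrapped in a function that fails earlier with a clearer message is shown below; the function name `collect_means` and its `pattern` parameter are made up for this example, and it assumes each pickle holds a single column of values.

```python
from pathlib import Path

import pandas as pd


def collect_means(dir_path, pattern='**/file_inference_time*'):
    """Return one dataframe with a rounded mean per matching pickle file."""
    # glob can also match directories, so keep regular files only
    files = sorted(p for p in Path(dir_path).glob(pattern) if p.is_file())
    if not files:
        # pd.concat would raise "No objects to concatenate" on an empty list,
        # so stop here with a message that names the likely cause.
        raise FileNotFoundError(f'no files match {pattern!r} under {dir_path}')

    rows = []
    for file in files:
        df = pd.DataFrame(pd.read_pickle(file))
        # same calculation as above: rows from position 2 onward
        mean = (df.iloc[2:, :].mean() * 1000).round(2)
        rows.append({'MobileNet': mean.iloc[0], 'file': file.stem})
    return pd.DataFrame(rows)
```

Calling `collect_means(r'C:\Users\path_to_files')` would then either return the combined dataframe or report that the pattern found no files.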