Group by a pandas df and create a column with a nested dictionary
Given this df:
dim_date_id closing_type r_d variable value rolling cusum_sample sample_type
1330 1995-10-27 low 1 low 9.699377 0.039688 1 [sh_dummy_0.5, sh_dummy_1]
1331 1995-10-27 low 1 close 10.340971 0.044784 1 [sh_dummy_0.5, sh_dummy_1]
1330 1995-10-27 high 1 high 10.529675 0.062868 1 [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1331 1995-10-27 high 1 close 10.340971 0.044784 1 [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1330 1995-10-27 low 5 low 9.699377 0.132976 1 [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1331 1995-10-27 low 5 close 10.340971 0.188179 1 [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1330 1995-10-27 high 5 high 10.529675 0.184475 1 [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
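I want to group it by variable and create a nested dictionary in the sample_type column (or in some other column; I don't much care which). As output, I'd like a df that looks like this: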
dim_date_id variable value sample_type
1330 1995-10-27 low 9.699377 {'r_d':1,'closing_type':'low','rolling':0.039688,'sample':[sh_dummy_0.5, sh_dummy_1]},
{'r_d':5,'closing_type':'low','rolling':0.132976,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]}
1331 1995-10-27 close 10.340971 {'r_d':1,'closing_type':'low','rolling':0.044784,'sample':[sh_dummy_0.5, sh_dummy_1]},
{'r_d':1,'closing_type':'high','rolling':0.062868,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]},
{'r_d':5,'closing_type':'low','rolling':0.188179,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]}
1330 1995-10-27 high 10.529675 {'r_d':1,'closing_type':'high','rolling':0.062868,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]},
{'r_d':5,'closing_type':'high','rolling':0.184475,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]}
It has to be as flexible as possible, because the sample_type column can sometimes hold 'n' different variables.
Try this:
new_df = df.groupby(['dim_date_id','variable','value']).apply(lambda x: x.to_dict()).reset_index(name='sample_type')
Output:
>>> new_df
dim_date_id variable value sample_type
0 1995-10-27 close 10.340971 {'dim_date_id': {1331: '1995-10-27'}, 'closing...
1 1995-10-27 high 10.529675 {'dim_date_id': {1330: '1995-10-27'}, 'closing...
2 1995-10-27 low 9.699377 {'dim_date_id': {1330: '1995-10-27'}, 'closing...
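Note that x.to_dict() with its default orientation returns one dict per column, keyed by index label, which is why the output above starts with {'dim_date_id': {1331: ...}}. If one dict per original row is closer to what the question asks for, a variant along the following lines may work. It is only a sketch: the sample frame is rebuilt by hand from the question, dim_date_id is assumed to be a plain string, and the names key_cols and detail_cols are introduced here for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (dim_date_id assumed to be a string).
s2 = ["sh_dummy_0.5", "sh_dummy_1"]
s3 = ["sh_dummy_0.5", "sh_dummy_1", "sh_dummy_2"]
df = pd.DataFrame(
    {
        "dim_date_id": ["1995-10-27"] * 7,
        "closing_type": ["low", "low", "high", "high", "low", "low", "high"],
        "r_d": [1, 1, 1, 1, 5, 5, 5],
        "variable": ["low", "close", "high", "close", "low", "close", "high"],
        "value": [9.699377, 10.340971, 10.529675, 10.340971,
                  9.699377, 10.340971, 10.529675],
        "rolling": [0.039688, 0.044784, 0.062868, 0.044784,
                    0.132976, 0.188179, 0.184475],
        "cusum_sample": [1] * 7,
        "sample_type": [s2, s2, s3, s3, s3, s3, s3],
    },
    index=[1330, 1331, 1330, 1331, 1330, 1331, 1330],
)

# Hypothetical names: columns that identify a group, and columns to nest.
key_cols = ["dim_date_id", "variable", "value"]
detail_cols = ["r_d", "closing_type", "rolling", "sample_type"]

# One output row per group; each group's rows become a list of dicts.
rows = [
    (*keys, group[detail_cols].to_dict("records"))
    for keys, group in df.groupby(key_cols, sort=False)
]
new_df = pd.DataFrame(rows, columns=key_cols + ["sample_type"])
```

Because the nested columns are listed in detail_cols rather than hard-coded in the loop, adding or removing a column only means editing that list. The lists stored in sample_type can hold any number of entries, since each one is carried over as a single object.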