Pandas 数据重塑,将索引相同但值不同的多行根据发生率转换为多列
Pandas data reshaping, turn multiple rows with the same index but different values into many columns based on incidence
我在 pandas
中有以下 table
user_id idaggregate_info num_events num_lark_convo_events num_meals_logged num_breakfasts num_lunches num_dinners num_snacks total_activity sleep_duration num_activity_events num_weights num_notifs idusermission completed mission_delta
0 0 406 94 20 7 2 2 2 1 4456 47738 72 0 18 1426 0 NaT
1 1 1247 121 48 26 8 7 2 9 48695 37560 53 14 48 1379 1 7 days 10:04:28
2 1 1247 121 48 26 8 7 2 9 48695 37560 53 14 48 1379 1 NaT
3 2 2088 356 32 15 6 6 1 2 41598 184113 314 1 21 967 1 8 days 00:03:05
4 2 2088 356 32 15 6 6 1 2 41598 184113 314 1 21 967 1 NaT
有些 user_ids 有多行相同的行,只是它们的 mission_delta 值不同。我如何将其转换为每个 id 的一行,其中一列名为 "mission_delta_1"、"mission_delta_2"(它们的数量各不相同,可能是每个 user_id 1 个到每个 [= 5 个) 46=] 所以命名必须是 iterative_ 等等 所以输出将是:
user_id idaggregate_info num_events num_lark_convo_events num_meals_logged num_breakfasts num_lunches num_dinners num_snacks total_activity sleep_duration num_activity_events num_weights num_notifs idusermission completed mission_delta_1 mission_delta_2
0 0 406 94 20 7 2 2 2 1 4456 47738 72 0 18 1426 0 NaT
1 1 1247 121 48 26 8 7 2 9 48695 37560 53 14 48 1379 1 7 days 10:04:28 NaT
2 2 2088 356 32 15 6 6 1 2 41598 184113 314 1 21 967 1 8 days 00:03:05 NaT
由于这些地址会展开所有列,因此只有一列需要取消堆叠。重复 link 中提供的解决方案失败:
df.groupby(level=0).apply(lambda x: pd.Series(x.values.flatten()))
生成与原始标签相同的 df,但标签不同
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
0 0 406 94 20 7 2 2 2 1 4456 47738 72 0 18 1426 0 NaT
1 1 1247 121 48 26 8 7 2 9 48695 37560 53 14 48 1379 1 7 days 10:04:28
2 1 1247 121 48 26 8 7 2 9 48695 37560 53 14 48 1379 1 NaT
3 2 2088 356 32 15 6 6 1 2 41598 184113 314 1 21 967 1 8 days 00:03:05
下一个选项:
result2.groupby(level=0).apply(lambda x: pd.Series(x.stack().values))
产生:
0 0 0
1 406
2 94
3 20
4 7
和
df.groupby(level=0).apply(lambda x: x.values.ravel()).apply(pd.Series)
生成原始数据帧:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
0 0 406 94 20 7 2 2 2 1 4456 47738 72 0 18 1426 0 NaT
1 1 1247 121 48 26 8 7 2 9 48695 37560 53 14 48 1379 1 7 days 10:04:28
2 1 1247 121 48 26 8 7 2 9 48695 37560 53 14 48 1379 1 NaT
3 2 2088 356 32 15 6 6 1 2 41598 184113 314 1 21 967 1 8 days 00:03:05
本质上我想转个df:
id mission_delta
0 NaT
1 1 day
1 2 days
1 1 day
2 5 days
2 NaT
进入
id mission_delta1 mission_delta_2 mission_delta_3
0 NaT NaT NaT
1 1 day 2 days 1 day
2 5 days NaT NaT
你可以试试这个;
grp = df.groupby('id')
df_res = grp['mission_delta'].apply(lambda x: pd.Series(x.values)).unstack().fillna('NaT')
df_res = df_res.rename(columns={i: 'mission_delta_{}'.format(i + 1) for i in range(len(df_res))})
print(df_res)
mission_delta_1 mission_delta_2 mission_delta_3
id
0 NaT NaT NaT
1 1 day 2 days 1 day
2 5 days NaT NaT
我在 pandas
中有以下 tableuser_id idaggregate_info num_events num_lark_convo_events num_meals_logged num_breakfasts num_lunches num_dinners num_snacks total_activity sleep_duration num_activity_events num_weights num_notifs idusermission completed mission_delta
0 0 406 94 20 7 2 2 2 1 4456 47738 72 0 18 1426 0 NaT
1 1 1247 121 48 26 8 7 2 9 48695 37560 53 14 48 1379 1 7 days 10:04:28
2 1 1247 121 48 26 8 7 2 9 48695 37560 53 14 48 1379 1 NaT
3 2 2088 356 32 15 6 6 1 2 41598 184113 314 1 21 967 1 8 days 00:03:05
4 2 2088 356 32 15 6 6 1 2 41598 184113 314 1 21 967 1 NaT
有些 user_ids 有多行相同的行,只是它们的 mission_delta 值不同。我如何将其转换为每个 id 的一行,其中一列名为 "mission_delta_1"、"mission_delta_2"(它们的数量各不相同,可能是每个 user_id 1 个到每个 [= 5 个) 46=] 所以命名必须是 iterative_ 等等 所以输出将是:
user_id idaggregate_info num_events num_lark_convo_events num_meals_logged num_breakfasts num_lunches num_dinners num_snacks total_activity sleep_duration num_activity_events num_weights num_notifs idusermission completed mission_delta_1 mission_delta_2
0 0 406 94 20 7 2 2 2 1 4456 47738 72 0 18 1426 0 NaT
1 1 1247 121 48 26 8 7 2 9 48695 37560 53 14 48 1379 1 7 days 10:04:28 NaT
2 2 2088 356 32 15 6 6 1 2 41598 184113 314 1 21 967 1 8 days 00:03:05 NaT
df.groupby(level=0).apply(lambda x: pd.Series(x.values.flatten()))
生成与原始标签相同的 df,但标签不同
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
0 0 406 94 20 7 2 2 2 1 4456 47738 72 0 18 1426 0 NaT
1 1 1247 121 48 26 8 7 2 9 48695 37560 53 14 48 1379 1 7 days 10:04:28
2 1 1247 121 48 26 8 7 2 9 48695 37560 53 14 48 1379 1 NaT
3 2 2088 356 32 15 6 6 1 2 41598 184113 314 1 21 967 1 8 days 00:03:05
下一个选项:
result2.groupby(level=0).apply(lambda x: pd.Series(x.stack().values))
产生:
0 0 0
1 406
2 94
3 20
4 7
和
df.groupby(level=0).apply(lambda x: x.values.ravel()).apply(pd.Series)
生成原始数据帧:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
0 0 406 94 20 7 2 2 2 1 4456 47738 72 0 18 1426 0 NaT
1 1 1247 121 48 26 8 7 2 9 48695 37560 53 14 48 1379 1 7 days 10:04:28
2 1 1247 121 48 26 8 7 2 9 48695 37560 53 14 48 1379 1 NaT
3 2 2088 356 32 15 6 6 1 2 41598 184113 314 1 21 967 1 8 days 00:03:05
本质上我想转个df:
id mission_delta
0 NaT
1 1 day
1 2 days
1 1 day
2 5 days
2 NaT
进入
id mission_delta1 mission_delta_2 mission_delta_3
0 NaT NaT NaT
1 1 day 2 days 1 day
2 5 days NaT NaT
你可以试试这个;
grp = df.groupby('id')
df_res = grp['mission_delta'].apply(lambda x: pd.Series(x.values)).unstack().fillna('NaT')
df_res = df_res.rename(columns={i: 'mission_delta_{}'.format(i + 1) for i in range(len(df_res))})
print(df_res)
mission_delta_1 mission_delta_2 mission_delta_3
id
0 NaT NaT NaT
1 1 day 2 days 1 day
2 5 days NaT NaT