Stack rows as columns based on another rank column
I have a DataFrame like this:
                           offer_id   hurdle  hurdle_lvl  reward_value
0  5c0c1545a944456aa28dcf578e0cbdd2  35000.0           1         500.0
1  5c0c1545a944456aa28dcf578e0cbdd2  40000.0           2        1500.0
2  5c0c1545a944456aa28dcf578e0cbdd2  45000.0           3        3000.0
3  f21306541ae046edbdf0a79daea3a005    500.0           1          25.0
4  f21306541ae046edbdf0a79daea3a005    750.0           2         100.0
5  f21306541ae046edbdf0a79daea3a005  25000.0           2        1500.0
I need to reshape it into this:
                           offer_id  hurdle_1  hurdle_2  hurdle_3  reward_1  reward_2  reward_3
0  5c0c1545a944456aa28dcf578e0cbdd2   35000.0   40000.0   45000.0     500.0    1500.0    3000.0
1  f21306541ae046edbdf0a79daea3a005     500.0     750.0   25000.0      25.0     100.0    1500.0
In other words, the hurdle and reward rows should be stacked as columns based on the hurdle_lvl column. Any help is much appreciated.
So I tried pivot_table:
y.pivot_table(index=y.groupby('hurdle_lvl').cumcount(), columns='hurdle_lvl', values=['hurdle','reward_value'])
But this gives me a DataFrame like the following:
             hurdle                    reward_value
hurdle_lvl        1        2        3             1       2       3
0           35000.0  40000.0  45000.0         500.0  1500.0  3000.0
1             500.0    750.0  30000.0          25.0   100.0  1500.0
The problem is that I lose the offer_id mapping. Is there any way to keep it together with the pivot table?
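Use pivot_table with offer_id as the index, and sum the values that share the same offer_id and hurdle_lvl. Levels that are missing for an offer are filled with 0.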
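# Cast hurdle_lvl to str so the column tuples can be joined with '_' below
out = df.astype({'hurdle_lvl': str}) \
        .pivot_table(['hurdle', 'reward_value'], 'offer_id', 'hurdle_lvl',
                     aggfunc='sum', fill_value=0)
# Flatten the (value, level) MultiIndex columns into names like 'hurdle_1'
out.columns = out.columns.to_flat_index().str.join('_')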
Output:
>>> out
                                  hurdle_1  hurdle_2  hurdle_3  reward_value_1  reward_value_2  reward_value_3
offer_id
5c0c1545a944456aa28dcf578e0cbdd2     35000     40000     45000             500            1500            3000
f21306541ae046edbdf0a79daea3a005       500     25750         0              25            1600               0
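Note that the second offer has hurdle_lvl 2 twice in the sample data, which is why 750 and 25000 get summed into 25750 above. If that repeat is not intended and each row should get its own column, as in the desired output in the question, one option is to derive the level from the row order within each offer_id instead of summing. This is only a minimal sketch, assuming the rows are already sorted by level within each offer; the helper column lvl and the rename of reward_value to reward are my own choices for illustration, not something from the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "offer_id": ["5c0c1545a944456aa28dcf578e0cbdd2"] * 3
                + ["f21306541ae046edbdf0a79daea3a005"] * 3,
    "hurdle": [35000.0, 40000.0, 45000.0, 500.0, 750.0, 25000.0],
    "hurdle_lvl": [1, 2, 3, 1, 2, 2],
    "reward_value": [500.0, 1500.0, 3000.0, 25.0, 100.0, 1500.0],
})

# Assumption: rows are ordered by level within each offer, so the
# position inside the group (1, 2, 3, ...) can stand in for hurdle_lvl.
tmp = (
    df.rename(columns={"reward_value": "reward"})
      .assign(lvl=(df.groupby("offer_id").cumcount() + 1).astype(str))
)

# pivot (not pivot_table) does no aggregation; it would raise if an
# (offer_id, lvl) pair appeared more than once.
wide = tmp.pivot(index="offer_id", columns="lvl", values=["hurdle", "reward"])

# Flatten the (value, level) column tuples into names like 'hurdle_1'
wide.columns = ["_".join(col) for col in wide.columns]
wide = wide.reset_index()
print(wide)
```

Because offer_id goes in as the index and is restored with reset_index, the mapping from offer to values is kept, and nothing is summed.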