Pivot Data with Uneven Groups in Python Pandas
I have the following table, which I want to reshape using pandas in Python 3.x:
Group | Assessment | Review |
---|---|---|
GroupA | No team spirit | Negative |
GroupA | Good players | Positive |
GroupA | They scored well | Positive |
GroupB | Goal failed | Negative |
GroupB | Bad weather | Negative |
GroupB | Resilience | Positive |
GroupB | Growth potential | Positive |
GroupB | Bad technique | Negative |
The resulting table should be:
Group | Positive | Negative |
---|---|---|
GroupA | Good players | No team spirit |
GroupA | They scored well | NaN |
GroupB | Resilience | Goal failed |
GroupB | Growth potential | Bad weather |
GroupB | NaN | Bad technique |
Is there a concise, Pythonic way to do this with pandas or some other approach?
You can't pivot the data as is, but you can add a group number with `groupby` + `cumcount`, then `pivot` using the newly created number as part of the index:
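    # Number the rows within each (Group, Review) pair so every pivot cell is unique
    out = (df.assign(num=df.groupby(['Group', 'Review']).cumcount())
             # pandas >= 2.0 requires keyword arguments for pivot
             .pivot(index=['num', 'Group'], columns='Review', values='Assessment')
             # a stable sort keeps the rows of each group in num order
             .droplevel(0).sort_index(kind='stable')
             [['Positive', 'Negative']]
             .reset_index()
             .rename_axis(columns=[None]))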
Output:
        Group          Positive        Negative
    0  GroupA      Good players  No team spirit
    1  GroupA  They scored well             NaN
    2  GroupB        Resilience     Goal failed
    3  GroupB  Growth potential     Bad weather
    4  GroupB               NaN   Bad technique
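To see why the extra counter is needed, here is a minimal, self-contained sketch built from the question's sample data. It first attempts a plain pivot, which fails because some (Group, Review) pairs occur more than once, and then shows the counter that `cumcount` produces. The names `plain_pivot_ok` and `wide` are illustrative only and do not appear in the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Group": ["GroupA"] * 3 + ["GroupB"] * 5,
    "Assessment": ["No team spirit", "Good players", "They scored well",
                   "Goal failed", "Bad weather", "Resilience",
                   "Growth potential", "Bad technique"],
    "Review": ["Negative", "Positive", "Positive", "Negative", "Negative",
               "Positive", "Positive", "Negative"],
})

# A plain pivot fails: (GroupA, Positive), for example, occurs twice.
try:
    df.pivot(index="Group", columns="Review", values="Assessment")
    plain_pivot_ok = True
except ValueError:
    plain_pivot_ok = False

# cumcount numbers the rows within each (Group, Review) pair: 0, 1, 2, ...
num = df.groupby(["Group", "Review"]).cumcount()

# With num added, every (Group, num, Review) combination is unique,
# so the pivot succeeds and missing cells are filled with NaN.
wide = (df.assign(num=num)
          .pivot(index=["Group", "num"], columns="Review", values="Assessment"))

print(plain_pivot_ok)   # False
print(num.tolist())     # [0, 0, 1, 0, 1, 0, 1, 2]
print(wide)
```

Because GroupB has three Negative rows but only two Positive ones, the row with `num == 2` has no Positive entry, which is where the `NaN` in the result comes from.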
This isn't exactly what you asked for, but this table structure seems to make more sense:
    # Collect the assessments of each (Group, Review) pair into a list,
    # then move Review into the columns
    res = (df.groupby(['Group', 'Review'])['Assessment']
             .agg(list)
             .unstack())
    print(res)
    '''
    Review                                   Negative                          Positive
    Group
    GroupA                           [No team spirit]  [Good players, They scored well]
    GroupB  [Goal failed, Bad weather, Bad technique]    [Resilience, Growth potential]
    '''
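For anyone who wants to try this list-based layout directly, the following is a small runnable sketch using the question's sample data. It aggregates the `Assessment` column with `agg(list)`, which is one possible formulation and is assumed here to be acceptable in place of a row-wise `apply`. Each cell then holds an ordinary Python list.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Group": ["GroupA"] * 3 + ["GroupB"] * 5,
    "Assessment": ["No team spirit", "Good players", "They scored well",
                   "Goal failed", "Bad weather", "Resilience",
                   "Growth potential", "Bad technique"],
    "Review": ["Negative", "Positive", "Positive", "Negative", "Negative",
               "Positive", "Positive", "Negative"],
})

# One row per Group, one column per Review, each cell a list of assessments.
res = (df.groupby(["Group", "Review"])["Assessment"]
         .agg(list)
         .unstack())

# Individual cells can be looked up by group and review label.
group_b_negative = res.loc["GroupB", "Negative"]
print(group_b_negative)       # ['Goal failed', 'Bad weather', 'Bad technique']
print(len(group_b_negative))  # 3
```

This keeps one row per group regardless of how uneven the Positive and Negative counts are, at the cost of storing lists inside cells.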