Pivoting without numerical aggregation / a numerical column

I have a DataFrame that looks like this:

import pandas as pd

d = {'Name': ['Sally', 'Sally', 'Sally', 'James', 'James', 'James'], 'Sports': ['Tennis', 'Track & field', 'Dance', 'Dance', 'MMA', 'Crosscountry']}
df = pd.DataFrame(data=d)
Name Sports
Sally Tennis
Sally Track & field
Sally Dance
James Dance
James MMA
James Crosscountry

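It seems that pandas' pivot_table only allows reshaping with a numerical aggregation, but I want to reshape this to wide format so that the strings end up in the "values":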

Name First_sport Second_sport Third_sport
Sally Tennis Track & field Dance
James Dance MMA Crosscountry

Is there a method in pandas that can help me do this? Thanks!

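You can do this with .pivot() if each index/column combination is unique, or with .pivot_table() by supplying an aggregation function that also works on strings, such as 'first'. Either way, you first need a helper column that numbers each person's sports: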

>>> df['Sport_num'] = 'Sport ' + df.groupby('Name').cumcount().astype(str)
>>> df
    Name         Sports Sport_num
0  Sally         Tennis   Sport 0
1  Sally  Track & field   Sport 1
2  Sally          Dance   Sport 2
3  James          Dance   Sport 0
4  James            MMA   Sport 1
5  James   Crosscountry   Sport 2
>>> df.pivot(index='Name', values='Sports', columns='Sport_num')
Sport_num Sport 0        Sport 1       Sport 2
Name                                          
James       Dance            MMA  Crosscountry
Sally      Tennis  Track & field         Dance
>>> df.pivot_table(index='Name', values='Sports', columns='Sport_num', aggfunc='first')
Sport_num Sport 0        Sport 1       Sport 2
Name                                          
James       Dance            MMA  Crosscountry
Sally      Tennis  Track & field         Dance
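
To illustrate the difference between the two calls, here is a minimal sketch using a small made-up frame (`dup` and its values are hypothetical, not from the question) in which one Name/Sport_num pair occurs twice. Under that assumption .pivot() raises, while .pivot_table() with aggfunc='first' keeps the first value it sees:

```python
import pandas as pd

# Hypothetical data: Sally has two rows labelled 'Sport 0', so the
# (Name, Sport_num) pairs are no longer unique.
dup = pd.DataFrame({
    'Name': ['Sally', 'Sally', 'James'],
    'Sport_num': ['Sport 0', 'Sport 0', 'Sport 0'],
    'Sports': ['Tennis', 'Dance', 'MMA'],
})

# .pivot() cannot decide which value belongs in the cell and raises.
try:
    dup.pivot(index='Name', columns='Sport_num', values='Sports')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# .pivot_table() resolves the clash with the aggregation function;
# 'first' keeps the first value encountered in each group.
wide = dup.pivot_table(index='Name', columns='Sport_num',
                       values='Sports', aggfunc='first')
```

With the cumcount-based Sport_num column from the answer, every pair is unique by construction, so both calls give the same result.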

Another solution:

print(
    df.groupby("Name")
    .agg(list)["Sports"]
    .apply(pd.Series)
    .rename(columns={0: "First", 1: "Second", 2: "Third"})
    .add_suffix("_sport")
    .reset_index()
)

Prints:

    Name First_sport   Second_sport   Third_sport
0  James       Dance            MMA  Crosscountry
1  Sally      Tennis  Track & field         Dance
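
A possible variation on the same idea, offered only as a sketch: build the wide frame directly from the lists rather than through .apply(pd.Series), and pass sort=False so the names stay in their original order (Sally, then James), as in the question's desired output. The names `lists` and `wide` are illustrative, and the hard-coded column labels assume exactly three sports per name:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sally', 'Sally', 'Sally', 'James', 'James', 'James'],
    'Sports': ['Tennis', 'Track & field', 'Dance',
               'Dance', 'MMA', 'Crosscountry'],
})

# sort=False keeps the groups in order of first appearance.
lists = df.groupby('Name', sort=False)['Sports'].agg(list)

# One row per name, one column per list position.
wide = pd.DataFrame(lists.tolist(), index=lists.index)

# Assumption: every name has exactly three sports.
wide.columns = ['First_sport', 'Second_sport', 'Third_sport']
wide = wide.reset_index()
```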

We can also use groupby cumcount in conjunction with set_index + unstack:

new_df = df.set_index(['Name', df.groupby('Name').cumcount()]).unstack()

new_df:

       Sports                             
            0              1             2
Name                                      
James   Dance            MMA  Crosscountry
Sally  Tennis  Track & field         Dance
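
To see why this works, it may help to look at the intermediate pieces. The sketch below rebuilds the question's df; `position` and `wide` are illustrative names only. Selecting the 'Sports' column before unstacking is an optional tweak that avoids the two-level column header:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sally', 'Sally', 'Sally', 'James', 'James', 'James'],
    'Sports': ['Tennis', 'Track & field', 'Dance',
               'Dance', 'MMA', 'Crosscountry'],
})

# cumcount numbers the rows within each Name group, starting at 0.
position = df.groupby('Name').cumcount()
print(position.tolist())  # [0, 1, 2, 0, 1, 2]

# (Name, position) is unique per row, so it can serve as an index;
# unstack then moves the position level into the columns.
wide = df.set_index(['Name', position])['Sports'].unstack()
print(wide)
```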

We can do some extra cleanup by renaming and collapsing the MultiIndex:

new_df = (
    df.set_index(['Name', df.groupby('Name').cumcount()])
        .unstack()
        .rename(columns={0: "First", 1: "Second", 2: "Third",
                         'Sports': 'Sport'})
)
new_df.columns = new_df.columns.swaplevel().map('_'.join)
new_df = new_df.reset_index()

new_df:

    Name First_Sport   Second_Sport   Third_Sport
0  James       Dance            MMA  Crosscountry
1  Sally      Tennis  Track & field         Dance

If a programmatic conversion from integers to ordinal words is wanted, we can use something like inflect:

import inflect

new_df = df.set_index([
    'Name', df.groupby('Name').cumcount().add(1)
]).unstack()
# Collapse MultiIndex
p = inflect.engine()
new_df.columns = new_df.columns.map(
    # Convert to Ordinal Word and Column to singular noun
    lambda c: f'{p.number_to_words(p.ordinal(c[1])).capitalize()}_'
              f'{p.singular_noun(c[0])}'
)
new_df = new_df.reset_index()

new_df:

    Name First_Sport   Second_Sport   Third_Sport
0  James       Dance            MMA  Crosscountry
1  Sally      Tennis  Track & field         Dance