Pivoting without numerical aggregation / a numerical column
I have a dataframe that looks like this:
import pandas as pd

d = {'Name': ['Sally', 'Sally', 'Sally', 'James', 'James', 'James'], 'Sports': ['Tennis', 'Track & field', 'Dance', 'Dance', 'MMA', 'Crosscountry']}
df = pd.DataFrame(data=d)
Name | Sports |
---|---|
Sally | Tennis |
Sally | Track & field |
Sally | Dance |
James | Dance |
James | MMA |
James | Crosscountry |
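It seems that pandas' pivot_table only allows reshaping with a numerical aggregation, but I want to reshape the data to wide format so that the strings end up in the "values":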
Name | First_sport | Second_sport | Third_sport |
---|---|---|---|
Sally | Tennis | Track & field | Dance |
James | Dance | MMA | Crosscountry |
Is there anything in pandas that can help me do this? Thanks!
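You can do this with .pivot() if every (index, column) pair is unique, or with .pivot_table() by supplying an aggregation function that also works on strings, such as 'first'. In both cases, first number each person's sports with groupby().cumcount() so that each sport gets its own column label: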
>>> df['Sport_num'] = 'Sport ' + df.groupby('Name').cumcount().astype(str)
>>> df
Name Sports Sport_num
0 Sally Tennis Sport 0
1 Sally Track & field Sport 1
2 Sally Dance Sport 2
3 James Dance Sport 0
4 James MMA Sport 1
5 James Crosscountry Sport 2
>>> df.pivot(index='Name', values='Sports', columns='Sport_num')
Sport_num Sport 0 Sport 1 Sport 2
Name
James Dance MMA Crosscountry
Sally Tennis Track & field Dance
>>> df.pivot_table(index='Name', values='Sports', columns='Sport_num', aggfunc='first')
Sport_num Sport 0 Sport 1 Sport 2
Name
James Dance MMA Crosscountry
Sally Tennis Track & field Dance
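To see why the counter column matters, here is a small self-contained sketch. The constant key column is a hypothetical helper added only for this demonstration: without a counter, .pivot() rejects the duplicate (Name, key) pairs, .pivot_table(aggfunc='first') silently keeps only each person's first sport, and with the counter .pivot() keeps all of them.

```python
import pandas as pd

d = {'Name': ['Sally', 'Sally', 'Sally', 'James', 'James', 'James'],
     'Sports': ['Tennis', 'Track & field', 'Dance', 'Dance', 'MMA', 'Crosscountry']}
df = pd.DataFrame(data=d)

# Hypothetical constant column: every row of a person now shares the same
# (Name, key) pair, which is exactly the "not unique" case .pivot() cannot handle.
flat = df.assign(key='sport')
try:
    flat.pivot(index='Name', columns='key', values='Sports')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# .pivot_table() resolves the duplicates with the aggregation function instead,
# so 'first' keeps each person's first sport and drops the rest.
first_only = flat.pivot_table(index='Name', columns='key', values='Sports',
                              aggfunc='first')

# A per-person counter makes every (Name, Sport_num) pair unique,
# so .pivot() works and keeps all the sports.
df['Sport_num'] = df.groupby('Name').cumcount()
wide = df.pivot(index='Name', columns='Sport_num', values='Sports')

print(pivot_failed)
print(first_only)
print(wide)
```

The 'first' aggregation is only a tie-breaker here; it is the counter that actually spreads the strings across columns.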
Another solution:
print(
df.groupby("Name")
.agg(list)["Sports"]
.apply(pd.Series)
.rename(columns={0: "First", 1: "Second", 2: "Third"})
.add_suffix("_sport")
.reset_index()
)
Prints:
Name First_sport Second_sport Third_sport
0 James Dance MMA Crosscountry
1 Sally Tennis Track & field Dance
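One caveat with the agg(list) approach is that the rename dictionary only covers three positions. The sketch below avoids hard-coding the count. It uses hypothetical data in which James has only two sports, builds the frame directly from the lists (a common alternative to apply(pd.Series), not what the answer above uses), and generates the column names from the positions. The Sport_1, Sport_2, ... naming is an arbitrary choice.

```python
import pandas as pd

# Hypothetical data: Sally has three sports, James only two.
d = {'Name': ['Sally', 'Sally', 'Sally', 'James', 'James'],
     'Sports': ['Tennis', 'Track & field', 'Dance', 'Dance', 'MMA']}
df = pd.DataFrame(data=d)

# One list of sports per person; groupby sorts the names (James, Sally).
lists = df.groupby('Name')['Sports'].agg(list)

# Build the wide frame straight from the lists; shorter lists are padded
# with missing values instead of raising an error.
wide = pd.DataFrame(lists.tolist(), index=lists.index)

# Derive column names from the positions 0, 1, 2, ... instead of a fixed dict.
wide.columns = [f'Sport_{i + 1}' for i in wide.columns]
wide = wide.reset_index()
print(wide)
```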
We can also use groupby cumcount in conjunction with set_index + unstack:
new_df = df.set_index(['Name', df.groupby('Name').cumcount()]).unstack()
new_df:
Sports
0 1 2
Name
James Dance MMA Crosscountry
Sally Tennis Track & field Dance
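If the outer Sports level is not wanted, one option is to select the column before unstacking, so that unstack() operates on a Series and returns single-level columns. The sketch below also checks that this produces the same values as the .pivot() route with a counter column (the name n for that column is an arbitrary choice).

```python
import pandas as pd

d = {'Name': ['Sally', 'Sally', 'Sally', 'James', 'James', 'James'],
     'Sports': ['Tennis', 'Track & field', 'Dance', 'Dance', 'MMA', 'Crosscountry']}
df = pd.DataFrame(data=d)

# Position of each row within its Name group: 0, 1, 2, ...
pos = df.groupby('Name').cumcount()

# Selecting 'Sports' first yields a Series, so unstack() gives plain
# columns 0, 1, 2 rather than a ('Sports', n) MultiIndex.
wide = df.set_index(['Name', pos])['Sports'].unstack()

# The same table via .pivot() with an explicit counter column.
via_pivot = df.assign(n=pos).pivot(index='Name', columns='n', values='Sports')

print(wide)
```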
We can do some additional cleanup by renaming the labels and collapsing the MultiIndex:
new_df = (
df.set_index(['Name', df.groupby('Name').cumcount()])
.unstack()
.rename(columns={0: "First", 1: "Second", 2: "Third",
'Sports': 'Sport'})
)
new_df.columns = new_df.columns.swaplevel().map('_'.join)
new_df = new_df.reset_index()
new_df:
Name First_Sport Second_Sport Third_Sport
0 James Dance MMA Crosscountry
1 Sally Tennis Track & field Dance
If we want a programmatic conversion from integers to ordinal words, we can use something like inflect:
import inflect
new_df = df.set_index([
'Name', df.groupby('Name').cumcount().add(1)
]).unstack()
# Collapse MultiIndex
p = inflect.engine()
new_df.columns = new_df.columns.map(
# Convert to Ordinal Word and Column to singular noun
lambda c: f'{p.number_to_words(p.ordinal(c[1])).capitalize()}_'
f'{p.singular_noun(c[0])}'
)
new_df = new_df.reset_index()
new_df:
Name First_Sport Second_Sport Third_Sport
0 James Dance MMA Crosscountry
1 Sally Tennis Track & field Dance
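If adding inflect as a dependency is not an option, a small lookup table can do the same job for a bounded number of positions. This is a sketch under the assumption that nobody has more than ten sports; ORDINALS is a hypothetical name, not part of pandas or inflect.

```python
import pandas as pd

# Hypothetical lookup table; extend it if a person can have more than ten sports.
ORDINALS = ['First', 'Second', 'Third', 'Fourth', 'Fifth',
            'Sixth', 'Seventh', 'Eighth', 'Ninth', 'Tenth']

d = {'Name': ['Sally', 'Sally', 'Sally', 'James', 'James', 'James'],
     'Sports': ['Tennis', 'Track & field', 'Dance', 'Dance', 'MMA', 'Crosscountry']}
df = pd.DataFrame(data=d)

# Wide table with positional columns 0, 1, 2 (one row per person).
new_df = df.set_index(['Name', df.groupby('Name').cumcount()])['Sports'].unstack()

# Map positions 0, 1, 2 to 'First_Sport', 'Second_Sport', ...
new_df.columns = [f'{ORDINALS[i]}_Sport' for i in new_df.columns]
new_df = new_df.reset_index()
print(new_df)
```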