Pandas: Pivoting With Duplicate Values
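I want to pivot a dataframe that has duplicate values in one column, so that the associated values are displayed in new columns, as in the example below. I can't work out from the Pandas documentation how to get there from here...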
name car model
rob mazda 626
rob bmw 328
james audi a4
james VW golf
tom audi a6
tom ford focus
Into this...
name car_1 model_1 car_2 model_2
rob mazda 626 bmw 328
james audi a4 VW golf
tom audi a6 ford focus
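To run the answers, the sample table first has to exist as a DataFrame named df. Here is a minimal sketch of that setup. Storing the model column as strings is an assumption, because values such as 626 and golf share a column.

```python
import pandas as pd

# Sample data from the question: each name appears exactly twice.
df = pd.DataFrame({
    'name': ['rob', 'rob', 'james', 'james', 'tom', 'tom'],
    'car': ['mazda', 'bmw', 'audi', 'VW', 'audi', 'ford'],
    'model': ['626', '328', 'a4', 'golf', 'a6', 'focus'],  # assumed strings
})
print(df)
```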
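import pandas as pd

# Rebuild each name's rows as a fresh DataFrame (index 0, 1, ...),
# then unstack that index into the columns.
x = df.groupby('name')[['car', 'model']] \
      .apply(lambda x: pd.DataFrame(x.values.tolist(),
                                    columns=['car', 'model'])) \
      .unstack()
# Flatten the (column, counter) MultiIndex into names like car_0.
x.columns = ['{0[0]}_{0[1]}'.format(tup) for tup in x.columns]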
Result:
In [152]: x
Out[152]:
car_0 car_1 model_0 model_1
name
james audi VW a4 golf
rob mazda bmw 626 328
tom audi ford a6 focus
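The answer above numbers the columns from 0, but the question asks for suffixes that start at 1. The following sketch uses the same groupby/apply approach and shifts the counter by one. It assumes the sample data is typed in as shown, with the model values stored as strings.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['rob', 'rob', 'james', 'james', 'tom', 'tom'],
    'car': ['mazda', 'bmw', 'audi', 'VW', 'audi', 'ford'],
    'model': ['626', '328', 'a4', 'golf', 'a6', 'focus'],
})

# Rebuild each group with a 0-based RangeIndex, then unstack that counter.
x = (df.groupby('name')[['car', 'model']]
       .apply(lambda g: pd.DataFrame(g.values.tolist(), columns=['car', 'model']))
       .unstack())

# Shift the counter by one so the suffixes start at 1.
x.columns = [f'{col}_{n + 1}' for col, n in x.columns]
print(x)
```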
To reorder the columns so that columns sharing a numeric suffix sit together:
In [157]: x.loc[:, x.columns.str[::-1].sort_values().str[::-1]]
Out[157]:
model_0 car_0 model_1 car_1
name
james a4 audi golf VW
rob 626 mazda 328 bmw
tom a6 audi focus ford
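The string-reversal trick above puts model_N before car_N and compares the suffixes as text. The sketch below sorts by an integer suffix instead, which keeps car_10 after car_9; a purely string-based sort does not guarantee that. It assumes every flattened name ends in an underscore followed by a number. The one-row frame and the suffix_key function are hypothetical.

```python
import pandas as pd

# Hypothetical frame with the flattened column names from the answer above.
x = pd.DataFrame(
    [['audi', 'VW', 'a4', 'golf']],
    index=pd.Index(['james'], name='name'),
    columns=['car_0', 'car_1', 'model_0', 'model_1'],
)

def suffix_key(col):
    # Split 'car_0' into ('car', '0'); sort by the integer suffix first,
    # then by the base name, so car_N precedes model_N.
    base, num = col.rsplit('_', 1)
    return int(num), base

x = x[sorted(x.columns, key=suffix_key)]
print(x.columns.tolist())
```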
We can use groupby and cumcount to build a per-name counter and set it as part of the index:
i = df.groupby('name').cumcount() + 1  # 1, 2, ... within each name
df.set_index(['name', i]).unstack()
car model
1 2 1 2
name
james audi VW a4 golf
rob mazda bmw 626 328
tom audi ford a6 focus
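The question asks about pivoting, so the cumcount counter can also go straight into DataFrame.pivot in place of set_index plus unstack. This is a sketch, assuming the sample data is typed in as shown. The arguments are passed by keyword, which pandas 2.x requires. The helper column n is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['rob', 'rob', 'james', 'james', 'tom', 'tom'],
    'car': ['mazda', 'bmw', 'audi', 'VW', 'audi', 'ford'],
    'model': ['626', '328', 'a4', 'golf', 'a6', 'focus'],
})

# Number the rows within each name (1, 2, ...) and pivot on that counter.
wide = (df.assign(n=df.groupby('name').cumcount() + 1)
          .pivot(index='name', columns='n', values=['car', 'model']))
print(wide)
```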
Or we can collapse the pd.MultiIndex columns:
i = df.groupby('name').cumcount() + 1
# Sort by the counter level so car_1, model_1 come before car_2, model_2.
d1 = df.set_index(['name', i]).unstack().sort_index(axis=1, level=1)
d1.columns = d1.columns.to_series().map('{0[0]}_{0[1]}'.format)
d1
car_1 model_1 car_2 model_2
name
james audi a4 VW golf
rob mazda 626 bmw 328
tom audi a6 ford focus
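The cumcount approach also copes with names that have different numbers of rows, because unstack fills the missing slots with NaN. The sketch below wraps the last answer in a hypothetical helper, pivot_duplicates, and tries it on made-up uneven data that is not from the question.

```python
import pandas as pd

def pivot_duplicates(df, key):
    """Hypothetical helper: spread the repeated rows of each key value
    into numbered columns (col_1, col_2, ...), filling gaps with NaN."""
    n = df.groupby(key).cumcount() + 1
    wide = df.set_index([key, n]).unstack()
    # Group the columns by counter: car_1, model_1, car_2, model_2, ...
    wide = wide.sort_index(axis=1, level=1)
    wide.columns = [f'{col}_{i}' for col, i in wide.columns]
    return wide

# Made-up uneven example: 'ann' has three cars, 'tom' only one.
df = pd.DataFrame({
    'name': ['ann', 'ann', 'ann', 'tom'],
    'car': ['audi', 'bmw', 'fiat', 'ford'],
    'model': ['a4', '328', 'punto', 'focus'],
})
result = pivot_duplicates(df, 'name')
print(result)
```

The suffixes come from cumcount, so within each name they follow the original row order.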