Take multiple columns and put them into the same index with pandas
I have a bunch of ugly customer data that I'm trying to normalize. It basically looks like this:
Customer Order1 Order2 Order3 ... OrderN
John This That The ... Other
Shelly Thing1 Thing2 Thing3 ... ThingN
. . . . .
. . . . .
So I want to change it to look like this:
Customer Order
John This
John That
John The
Shelly Thing1
Shelly Thing2
and so on.
I have no idea how to go about this, though.
Any help would be great!
pd.melt is what you're looking for:
# Assuming all the other columns are orders except for the Customer column
value_list = [col for col in df.columns if col != 'Customer']
pd.melt(df, id_vars=['Customer'], value_vars=value_list,
value_name='Order').drop('variable', axis=1)
Customer Order
0 John this
1 Shelly thing1
2 John that
3 Shelly thing2
4 John the
5 Shelly thing3
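Note that melt returns the rows grouped by source column (all of Order1, then all of Order2, and so on), not by customer as in the desired output. A minimal sketch of one way to get customer-grouped rows, assuming a small sample frame shaped like the one in the question and pandas 1.1 or later (for the `ignore_index` parameter of melt); the sample values and the name `long_df` are illustrative only:

```python
import pandas as pd

# Sample frame mirroring the question's layout (values are illustrative).
df = pd.DataFrame({
    "Customer": ["John", "Shelly"],
    "Order1": ["This", "Thing1"],
    "Order2": ["That", "Thing2"],
    "Order3": ["The", "Thing3"],
})

long_df = (
    # ignore_index=False keeps each row's original index label (0 for John, 1 for Shelly).
    df.melt(id_vars="Customer", value_name="Order", ignore_index=False)
      .drop(columns="variable")
      # A stable sort on that label groups rows per customer while
      # preserving the Order1, Order2, ... order within each customer.
      .sort_index(kind="mergesort")
      .reset_index(drop=True)
)
print(long_df)
```

The stable sort is the design choice that matters here: an unstable sort could shuffle the orders within a customer.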
I think using stack is slightly better:
df.set_index('Customer').stack().reset_index(level=0)
Out[1219]:
Customer 0
Order1 John This
Order2 John That
Order3 John The
OrderN John Other
Order1 Shelly Thing1
Order2 Shelly Thing2
Order3 Shelly Thing3
OrderN Shelly ThingN
Exactly one stack and two reset_index calls:
df
Customer Order1 Order2 Order3 OrderN
0 John This That The Other
1 Shelly Thing1 Thing2 Thing3 ThingN
(df.set_index('Customer')
.stack()
.reset_index(level=1, drop=True)
.reset_index(name='Order')
)
Customer Order
0 John This
1 John That
2 John The
3 John Other
4 Shelly Thing1
5 Shelly Thing2
6 Shelly Thing3
7 Shelly ThingN
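If some customers have fewer orders than others, the unused order cells will be missing values. Whether stack drops those on its own depends on the pandas version, so a sketch that should behave the same across versions calls dropna explicitly. The sample frame below (with one missing order for Shelly) is an assumption for illustration, not data from the question:

```python
import numpy as np
import pandas as pd

# Illustrative ragged data: Shelly has only one order.
df = pd.DataFrame({
    "Customer": ["John", "Shelly"],
    "Order1": ["This", "Thing1"],
    "Order2": ["That", np.nan],
})

long_df = (
    df.set_index("Customer")
      .stack()
      .dropna()                          # remove missing orders regardless of stack's own NaN handling
      .reset_index(level=1, drop=True)   # discard the Order1/Order2 labels
      .reset_index(name="Order")         # turn Customer back into a column
)
print(long_df)
```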
Using a comprehension:
pd.DataFrame(
[[c, o] for c, *O in df.values for o in O],
columns=['Customer', 'Order']
)
Customer Order
0 John This
1 John That
2 John The
3 John Other
4 Shelly Thing1
5 Shelly Thing2
6 Shelly Thing3
7 Shelly ThingN
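As a quick sanity check, the three approaches in this thread can be run side by side. This is only a sketch on an assumed two-order sample frame; the helper `rows` is a hypothetical name introduced here to compare results independent of row order. Note that the comprehension relies on Customer being the first column.

```python
import pandas as pd

# Illustrative sample frame; Customer must be the first column for the comprehension.
df = pd.DataFrame({
    "Customer": ["John", "Shelly"],
    "Order1": ["This", "Thing1"],
    "Order2": ["That", "Thing2"],
})

melted = df.melt(id_vars="Customer", value_name="Order").drop(columns="variable")

stacked = (
    df.set_index("Customer")
      .stack()
      .reset_index(level=1, drop=True)
      .reset_index(name="Order")
)

comprehended = pd.DataFrame(
    [[c, o] for c, *orders in df.values for o in orders],
    columns=["Customer", "Order"],
)

def rows(frame):
    """Return the (Customer, Order) pairs of a frame as a sorted list of tuples."""
    return sorted(map(tuple, frame[["Customer", "Order"]].to_numpy().tolist()))

# All three hold the same pairs; only melt orders them differently.
print(rows(melted) == rows(stacked) == rows(comprehended))
```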