Take multiple columns and put them under the same index with pandas

I have a bunch of ugly customer data that I'm trying to normalize. It basically looks like this:

Customer   Order1   Order2   Order3 ... OrderN

  John      This     That     The   ...  Other
 Shelly    Thing1   Thing2   Thing3 ... ThingN
   .         .        .        .          .
   .         .        .        .          .

So I'd like to change it to something like this:

Customer   Order

 John      This
 John      That
 John      The
 Shelly    Thing1
 Shelly    Thing2

and so on.

I'm not sure how to do that, though.

Any help would be great!

pd.melt is what you're looking for:

# Assuming all the other columns are orders except for the Customer column
value_list = [col for col in df.columns if col != 'Customer']

pd.melt(df, id_vars=['Customer'], value_vars=value_list,
        value_name='Order').drop('variable', axis=1)

  Customer   Order
0     John    This
1   Shelly  Thing1
2     John    That
3   Shelly  Thing2
4     John     The
5   Shelly  Thing3
6     John   Other
7   Shelly  ThingN
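
Note that melt emits the rows column by column (every Order1 value first, then every Order2 value, and so on), whereas the question groups them by customer. Here is a minimal sketch of one way to get that grouping. It assumes a small sample frame rebuilt by hand from the question (the values are placeholders). It also leaves out value_vars, in which case melt unpivots every column not named in id_vars.

```python
import pandas as pd

# Sample frame rebuilt from the question; the values are placeholders.
df = pd.DataFrame({
    'Customer': ['John', 'Shelly'],
    'Order1': ['This', 'Thing1'],
    'Order2': ['That', 'Thing2'],
    'Order3': ['The', 'Thing3'],
})

long_df = (
    # With value_vars omitted, every non-id column is melted.
    df.melt(id_vars='Customer', value_name='Order')
      .drop(columns='variable')
      # A stable sort keeps Order1, Order2, ... in sequence within each customer.
      .sort_values('Customer', kind='stable')
      .reset_index(drop=True)
)
print(long_df)
```

The stable sort matters here: an unstable sort could group the customers correctly but shuffle the order of each customer's items.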

I think stack is slightly nicer here:

df.set_index('Customer').stack().reset_index(level=0)
Out[1219]: 
       Customer       0
Order1     John    This
Order2     John    That
Order3     John     The
OrderN     John   Other
Order1   Shelly  Thing1
Order2   Shelly  Thing2
Order3   Shelly  Thing3
OrderN   Shelly  ThingN

Precisely one stack and two reset_index calls:

df
  Customer  Order1  Order2  Order3  OrderN
0     John    This    That     The   Other
1   Shelly  Thing1  Thing2  Thing3  ThingN

(df.set_index('Customer')
   .stack()
   .reset_index(level=1, drop=True)
   .reset_index(name='Order')
)

  Customer   Order
0     John    This
1     John    That
2     John     The
3     John   Other
4   Shelly  Thing1
5   Shelly  Thing2
6   Shelly  Thing3
7   Shelly  ThingN
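
Real customer data is often ragged, with some customers having fewer orders than there are order columns. The sketch below assumes that case, which the question does not show; the missing value for Shelly is hypothetical. Whether stack itself discards missing values depends on the pandas version and options in use, so this version calls dropna explicitly rather than relying on that behaviour.

```python
import numpy as np
import pandas as pd

# Hypothetical ragged data: Shelly has only two orders.
df = pd.DataFrame({
    'Customer': ['John', 'Shelly'],
    'Order1': ['This', 'Thing1'],
    'Order2': ['That', 'Thing2'],
    'Order3': ['The', np.nan],
})

long_df = (
    df.set_index('Customer')
      .stack()
      # Explicit, so the result does not depend on how stack treats NaN.
      .dropna()
      # Discard the Order1/Order2/... labels, keeping Customer as the index.
      .reset_index(level=1, drop=True)
      # Turn the Series back into a two-column frame.
      .reset_index(name='Order')
)
print(long_df)
```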

Using a list comprehension:

pd.DataFrame(
    [[c, o] for c, *O in df.values for o in O],
    columns=['Customer', 'Order']
)

  Customer   Order
0     John    This
1     John    That
2     John     The
3     John   Other
4   Shelly  Thing1
5   Shelly  Thing2
6   Shelly  Thing3
7   Shelly  ThingN
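
The comprehension above unpacks each row positionally (c, *O), so it depends on Customer being the first column. As a rough sketch, the same idea can select columns by name instead. The sample frame here is hypothetical, with the columns deliberately shuffled to show the difference.

```python
import pandas as pd

# Hypothetical frame where Customer is not the first column.
df = pd.DataFrame({
    'Order1': ['This', 'Thing1'],
    'Customer': ['John', 'Shelly'],
    'Order2': ['That', 'Thing2'],
})

# Every column other than Customer is treated as an order column.
order_cols = [col for col in df.columns if col != 'Customer']

long_df = pd.DataFrame(
    [
        (customer, order)
        # Pair each customer with that row's order values.
        for customer, orders in zip(df['Customer'], df[order_cols].to_numpy())
        for order in orders
    ],
    columns=['Customer', 'Order'],
)
print(long_df)
```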