Pandas: reshaping and multi-index

Question

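I have a pandas dataframe with these columns:

itemid
15/01/2015 status
15/01/2015 location
15/02/2015 status
15/02/2015 location
etc.

How can I do these two things?

Create multi-index columns, where the first level is the month and the second level is the metric I am tracking (status, location)
Stack the columns so that the table looks like this: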

+--------+-----------+----------+--------+
| itemid |  mymonth  | location | status |
+--------+-----------+----------+--------+
| A      | 15/1/2015 | North    | Good   |
| A      | 15/2/2015 | South    | Bad    |
+--------+-----------+----------+--------+

starting from an input that looks like this:

+--------+-------------------+---------------------+-------------------+---------------------+
| itemid | 15/01/2015 status | 15/01/2015 location | 15/02/2015 status | 15/02/2015 location |
+--------+-------------------+---------------------+-------------------+---------------------+
| A      | Good              | North               | Bad               | South               |
+--------+-------------------+---------------------+-------------------+---------------------+

The input can be recreated with:

import pandas as pd
df=pd.DataFrame()
df['itemid']=['A']
df['15/01/2015 status'] = ['Good']
df['15/01/2015 location'] = ['North']
df['15/02/2015 status'] = ['Bad']
df['15/02/2015 location'] = ['South']

I have been looking at how to use melt, but I am not sure it applies in this case.

Answer 1

You can use stack with str.split, and finally pivot_table with rename_axis (new in pandas 0.18.0):

df1 = df.set_index('itemid').stack().reset_index()
df1.columns = ['itemid','mymonth', 'd']

# split "15/01/2015 status" into the date and the metric name
df1[['mymonth','c']] = df1.mymonth.str.split(r'\s+', expand=True)
print(df1)
  itemid     mymonth      d         c
0      A  15/01/2015   Good    status
1      A  15/01/2015  North  location
2      A  15/02/2015    Bad    status
3      A  15/02/2015  South  location

print(df1.pivot_table(index=['itemid', 'mymonth'], columns='c', values='d', aggfunc='first')
         .reset_index()
         .rename_axis(None, axis=1))

  itemid     mymonth location status
0      A  15/01/2015    North   Good
1      A  15/02/2015    South    Bad
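For the first part of the question (multi-index columns), one possible route is to build the column MultiIndex directly and then stack the date level. This is only a sketch, assuming every column label has the form "date metric" with a single space between the two parts; the names wide, tidy and the level name metric are made up for illustration.

```python
import pandas as pd

# Same sample frame as in the question.
df = pd.DataFrame({
    'itemid': ['A'],
    '15/01/2015 status': ['Good'],
    '15/01/2015 location': ['North'],
    '15/02/2015 status': ['Bad'],
    '15/02/2015 location': ['South'],
})

wide = df.set_index('itemid')

# Turn each "date metric" label into a (date, metric) tuple.
# Assumption: exactly one space separates the date from the metric name.
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(label.split(' ', 1)) for label in wide.columns],
    names=['mymonth', 'metric'],
)

# Move the date level from the columns into the row index,
# leaving one column per metric.
tidy = (wide.stack(level='mymonth')
            .reset_index()
            .rename_axis(None, axis=1))

# Column order after stack can differ between pandas versions,
# so select the columns explicitly.
tidy = tidy[['itemid', 'mymonth', 'location', 'status']]
print(tidy)
```

With the sample data this should give one row per itemid and date, with location and status as ordinary columns.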

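EDIT:

I think that if you aggregate with first, you can sometimes lose data, because only the first value is kept (when the columns that form the new index contain duplicates) and the other values are dropped.

So if the values are strings, you can aggregate with join instead. No data is lost; the values are just joined and separated by ', ':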

print(df1.pivot_table(index=['itemid', 'mymonth'], columns='c', values='d', aggfunc=', '.join)
         .reset_index()
         .rename_axis(None, axis=1))
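To make the difference between the two aggregations concrete, here is a small sketch with made-up data in which the same itemid and month carry two status values. The rows are hypothetical and only meant to trigger the duplicate case described above.

```python
import pandas as pd

# Hypothetical long-format data: item A has two 'status' values for one month.
df1 = pd.DataFrame({
    'itemid':  ['A', 'A', 'A'],
    'mymonth': ['15/01/2015', '15/01/2015', '15/01/2015'],
    'c':       ['status', 'status', 'location'],
    'd':       ['Good', 'Bad', 'North'],
})

keys = ['itemid', 'mymonth']

# 'first' keeps only the first value of each (itemid, mymonth, c) group.
first = df1.pivot_table(index=keys, columns='c', values='d', aggfunc='first')

# ', '.join concatenates every string in the group instead.
joined = df1.pivot_table(index=keys, columns='c', values='d', aggfunc=', '.join)

print(first.loc[('A', '15/01/2015'), 'status'])   # Good
print(joined.loc[('A', '15/01/2015'), 'status'])  # Good, Bad
```

Note that join only works when every value in d is a string; mixed or numeric values would need converting first.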
