Pandas: reshaping and multi-index
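I have a pandas dataframe with these columns:
- itemid
- 15/01/2015 status
- 15/01/2015 location
- 15/02/2015 status
- 15/02/2015 location
- etc.
How can I do these two things?
- Create multi-index columns, where the first level is the month and the second level is the metric I am tracking (status, location)
- Stack the columns so that the table looks like this: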
+--------+-----------+----------+--------+
| itemid | mymonth   | location | status |
+--------+-----------+----------+--------+
| A      | 15/1/2015 | North    | Good   |
| A      | 15/2/2015 | South    | Bad    |
+--------+-----------+----------+--------+
Starting from an input that looks like this:
+--------+-------------------+---------------------+-------------------+---------------------+
| itemid | 15/01/2015 status | 15/01/2015 location | 15/02/2015 status | 15/02/2015 location |
+--------+-------------------+---------------------+-------------------+---------------------+
| A      | Good              | North               | Bad               | South               |
+--------+-------------------+---------------------+-------------------+---------------------+
The input can be recreated with:
import pandas as pd
df=pd.DataFrame()
df['itemid']=['A']
df['15/01/2015 status'] = ['Good']
df['15/01/2015 location'] = ['North']
df['15/02/2015 status'] = ['Bad']
df['15/02/2015 location'] = ['South']
I have been wondering how to use melt, but I am not sure whether it applies in this context.
You can use stack with str.split, and finally pivot_table with rename_axis (new in pandas 0.18.0):
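# Move the wide columns into rows: one row per (itemid, original column label)
df1 = df.set_index('itemid').stack().reset_index()
df1.columns = ['itemid', 'mymonth', 'd']
# Split labels such as '15/01/2015 status' into the date and the metric name
df1[['mymonth', 'c']] = df1['mymonth'].str.split(r'\s+', expand=True)
print(df1)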
itemid mymonth d c
0 A 15/01/2015 Good status
1 A 15/01/2015 North location
2 A 15/02/2015 Bad status
3 A 15/02/2015 South location
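Since the question asks about melt: the same long intermediate frame can probably also be built with melt instead of stack. The following is only a sketch under that assumption; the names `long`, `label` and `parts` are illustrative and not part of the original answer.

```python
import pandas as pd

# Input frame from the question
df = pd.DataFrame({
    'itemid': ['A'],
    '15/01/2015 status': ['Good'],
    '15/01/2015 location': ['North'],
    '15/02/2015 status': ['Bad'],
    '15/02/2015 location': ['South'],
})

# melt keeps itemid as an identifier and turns every other column into a row
long = df.melt(id_vars='itemid', var_name='label', value_name='d')

# Split labels such as '15/01/2015 status' at the first space
parts = long['label'].str.split(' ', n=1, expand=True)
long['mymonth'] = parts[0]
long['c'] = parts[1]
long = long.drop(columns='label')

print(long)
```

The resulting frame has the same columns as df1 (itemid, mymonth, d, c), so the pivot_table step should apply to it unchanged.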
print(
    df1.pivot_table(index=['itemid', 'mymonth'], columns='c', values='d', aggfunc='first')
       .reset_index()
       .rename_axis(None, axis=1)
)
itemid mymonth location status
0 A 15/01/2015 North Good
1 A 15/02/2015 South Bad
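The first part of the question (multi-index columns) is not covered by the steps above. One possible way to get there, as a minimal sketch: split each column label into a (month, metric) tuple, build a MultiIndex from those tuples, then stack the month level into the rows. The names `wide`, `long` and the level name `metric` are assumptions made for illustration.

```python
import pandas as pd

# Input frame from the question
df = pd.DataFrame({
    'itemid': ['A'],
    '15/01/2015 status': ['Good'],
    '15/01/2015 location': ['North'],
    '15/02/2015 status': ['Bad'],
    '15/02/2015 location': ['South'],
})

wide = df.set_index('itemid')

# Turn labels such as '15/01/2015 status' into ('15/01/2015', 'status')
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(label.split(' ', 1)) for label in wide.columns],
    names=['mymonth', 'metric'],
)

# Move the month level from the columns into the rows
long = wide.stack(level='mymonth').reset_index().rename_axis(None, axis=1)

print(wide)
print(long)
```

Here `wide` is the frame with two-level columns, and `long` is the stacked table with one row per itemid and month. The order of the location and status columns may differ between pandas versions.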
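EDIT:
I think that aggregating with first can sometimes lose data: when the columns that form the new index contain duplicates, only the first value is kept and the others are dropped.
So if you are aggregating strings, you can use join. No data is lost; the values are just joined and separated by ', ':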
print(
    df1.pivot_table(index=['itemid', 'mymonth'], columns='c', values='d', aggfunc=', '.join)
       .reset_index()
       .rename_axis(None, axis=1)
)
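To make the difference between the two aggregations concrete, here is a small sketch on made-up data in which the same item, month and metric appear twice. The frame `dup` and its values are invented for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical long-format data: 'status' is recorded twice for the same month
dup = pd.DataFrame({
    'itemid': ['A', 'A', 'A'],
    'mymonth': ['15/01/2015', '15/01/2015', '15/01/2015'],
    'c': ['status', 'status', 'location'],
    'd': ['Good', 'Bad', 'North'],
})

# 'first' keeps only the first value per (itemid, mymonth, c) group
first = dup.pivot_table(index=['itemid', 'mymonth'], columns='c',
                        values='d', aggfunc='first')

# ', '.join concatenates every value in the group, so nothing is dropped
joined = dup.pivot_table(index=['itemid', 'mymonth'], columns='c',
                         values='d', aggfunc=', '.join)

print(first)
print(joined)
```

With first, the second status value ('Bad') disappears; with join, both values end up in the same cell as a single string.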