多索引数据框以 table 为新列

multiindex dataframe to pivot table with a new column

我有一个带有 multindex 的数据框,我想将它转换为一个数据透视表 table,对列进行总结,数据是:

import random
import pandas as pd
arrays = [[2,2,3,3,3,4,4,4,4,5,5,7,7],
      [1,2,1,2,3,1,2,3,4,1,3,1,4]]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names = ['first','second'])
data = pd.Series(random.sample(range(1,100),13), index = index)
data

first  second
2      1         28
       2         20
3      1          7
       2          6
       3         86
4      1         10
       2         30
       3          8
       4         44
5      1         74
       3         65
7      1         12
       4         72
dtype: int64

我想将其转换为(内部值是列值的总和):

      second==1    second > 1
first
2      28          20
3      7           92
4      10          38
5      74          65
7      1           72

有没有一种优雅的方式来做到这一点?

谢谢!

设置

import random
import pandas as pd
random.seed(314)

arrays = [[2,2,3,3,3,4,4,4,4,5,5,7,7],
      [1,2,1,2,3,1,2,3,4,1,3,1,4]]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names = ['first','second'])
data = pd.Series(random.sample(range(1,100),13), index = index)
data

first  second
2      1         20
       2         12
3      1          1
       2         63
       3         24
4      1         21
       2         55
       3         45
       4         18
5      1         11
       3         25
7      1          3
       4         26
dtype: int64

解决方案

def eq_one(x):
    values = [x.ix[1], x.sum() - x.ix[1]]
    index = ['second==1', 'second > 1']
    return pd.Series(values, index=index)

data.unstack().apply(eq_one, axis=1)

       second==1  second > 1
first                       
2           20.0        12.0
3            1.0        87.0
4           21.0       118.0
5           11.0        25.0
7            3.0        26.0

使用 groupby 修改您的特定级别索引并重组数据。 set_levels()get_level_values() 在您修改 pandas multiIndex 的特定级别索引时很有用。

data.index = data.index.set_levels(data.index.get_level_values(1).map(lambda x: 'second = 1' if x == 1 else 'second > 1'), level=1)
print data.unstack().fillna(0).groupby(axis=1, level=0).sum()

second  second = 1  second > 1
first                         
2             44.0        46.0
3            110.0        31.0
4             63.0       150.0
5             74.0         0.0
7              7.0        86.0