如何将计算列添加到多索引数据框

Question

这可能已经在几个地方得到了回答，但如果不借助几行代码，我仍然无法让我的示例工作。

我有一个包含多索引列的数据框，我想向每个 0 级组添加一个计算列。

arrays = [
    np.array(["bar", "bar", "baz", "baz", "foo", "foo"]),
    np.array(["one", "two", "one", "two", "one", "two"]),
]

df = pd.DataFrame(np.linspace(0, 4*6-1, 4*6).reshape(4,6), columns=arrays)
print(df)

  bar     baz     foo    
  one two one two one two
0   0   1   2   3   4   5
1   6   7   8   9  10  11
2  12  13  14  15  16  17
3  18  19  20  21  22  23

具体来说，我想在 bar、baz 和 foo 中添加一列 division，它按元素划分 one 和 two.

我现在的做法是这样的：

# divide the numbers of the two dataframes (have to drop the first level)
idx = pd.IndexSlice
div_res = df.loc[:, idx[:, "one"]].droplevel(1, axis=1) / df.loc[
    :, idx[:, "two"]
].droplevel(1, axis=1)
div_res.columns = pd.MultiIndex.from_tuples([(y, "division") for y in div_res.columns])

# and concatenate the result again
result = (
    pd.concat([df, div_res], axis=1).sort_index(
        level=[0, 1],
        axis=1,
    )
)
print(result)

       bar                baz                foo          
  division  one  two division  one  two division  one  two
0      0.0  0.0  1.0      0.7  2.0  3.0      0.8  4.0  5.0
1      0.9  6.0  7.0      0.9  8.0  9.0      0.9 10.0 11.0
2      0.9 12.0 13.0      0.9 14.0 15.0      0.9 16.0 17.0
3      0.9 18.0 19.0      1.0 20.0 21.0      1.0 22.0 23.0

这行得通，但在我看来应该有更有效的方法来做到这一点。

非常感谢！

作为奖励，我还没有弄清楚如何对第二列级别进行排序：one、two、division。

Answer 1

您可以 groupby 在列轴上并应用您的转换：

(df.groupby(level=0, axis=1) # groupby first level of columns
   .apply(lambda d: (d.xs('one', level=1, axis=1)        # select "one"
                      .div(d.xs('two', level=1, axis=1)) # divide by "two" 
                     ).rename(columns=lambda x: 'division')) # col name
   .join(df)  # join with original data
   .sort_index(level=0, axis=1)
)

输出：

        bar                   baz                   foo            
   division   one   two  division   one   two  division   one   two
0  0.000000   0.0   1.0  0.666667   2.0   3.0  0.800000   4.0   5.0
1  0.857143   6.0   7.0  0.888889   8.0   9.0  0.909091  10.0  11.0
2  0.923077  12.0  13.0  0.933333  14.0  15.0  0.941176  16.0  17.0
3  0.947368  18.0  19.0  0.952381  20.0  21.0  0.956522  22.0  23.0

Answer 2

让我们尝试使用 stack 和 unstack

进行整形

s = df.stack(0)
s['div'] = s['one'].div(s['two'])
s.stack().unstack([1, 2])

    bar                   baz                   foo                
    one   two       div   one   two       div   one   two       div
0   0.0   1.0  0.000000   2.0   3.0  0.666667   4.0   5.0  0.800000
1   6.0   7.0  0.857143   8.0   9.0  0.888889  10.0  11.0  0.909091
2  12.0  13.0  0.923077  14.0  15.0  0.933333  16.0  17.0  0.941176
3  18.0  19.0  0.947368  20.0  21.0  0.952381  22.0  23.0  0.956522

如何将计算列添加到多索引数据框

How add calculated column to multiindex dataframe

python

multi-index

pandas