Functional programming question: How to apply a function to the same column in a dataframe that has multiindex columns?

Question

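Consider the dataframe below. How can I apply a function (for example np.log) to the two e columns? I'm fairly sure I need to use apply or map, but I have never done this with a dataframe that has MultiIndex columns.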

import numpy as np
import pandas as pd

df = pd.DataFrame([[1,2,3,4], [5,6,7,8], [9,10,11,12], [13,14,15,16]])
df.columns = pd.MultiIndex.from_tuples((("a", "d"), ("a", "e"), ("b", "d"), ("b","e")))
df

    a       b
    d   e   d   e
0   1   2   3   4
1   5   6   7   8
2   9  10  11  12
3  13  14  15  16

Desired output:

df.loc[:, (slice(None), ['e'])] = df.loc[:, (slice(None), ['e'])].apply(np.log).round(2)

df
    a         b
    d     e   d     e
0   1  0.69   3  1.39
1   5  1.79   7  2.08
2   9  2.30  11  2.48
3  13  2.64  15  2.77

I realize I can use the inelegant approach above, but I would like to make use of functional programming techniques.

Answer 1

You can use swaplevel to move the e column labels to the top level, so that they can be accessed directly by indexing:

df = df.swaplevel(axis=1).assign(e=df.swaplevel(axis=1)['e'].apply(np.log).round(2)).swaplevel(axis=1)

Output:

>>> df
    a         b      
    d     e   d     e
0   1  0.69   3  1.39
1   5  1.79   7  2.08
2   9  2.30  11  2.48
3  13  2.64  15  2.77
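
To make the one-liner above easier to follow, here is a minimal sketch of its intermediate steps. The names swapped, e_block and logged are only illustrative and do not appear in the answer; the final assign/swaplevel step is left out.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
df.columns = pd.MultiIndex.from_tuples((("a", "d"), ("a", "e"), ("b", "d"), ("b", "e")))

# Step 1: swap the two column levels so that d/e become the outer level.
swapped = df.swaplevel(axis=1)

# Step 2: selecting the outer label "e" returns a plain frame with columns a and b.
e_block = swapped["e"]

# Step 3: apply the function to that block only.
logged = e_block.apply(np.log).round(2)
```

The original df is not modified by any of these steps; each one returns a new object.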

Answer 2

Use loc and select along the column axis:

# for this case, transform and apply are interchangeable
df.loc(axis=1)[:, 'e'] = df.loc(axis=1)[:, 'e'].transform(np.log).round(2)

df
    a         b
    d     e   d     e
0   1  0.69   3  1.39
1   5  1.79   7  2.08
2   9  2.30  11  2.48
3  13  2.64  15  2.77
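
As a side note, the column selector used here can be spelled in a few ways that should be equivalent. The sketch below compares them on the example frame; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
df.columns = pd.MultiIndex.from_tuples((("a", "d"), ("a", "e"), ("b", "d"), ("b", "e")))

idx = pd.IndexSlice

# Three spellings of "every outer label, inner label e".
via_axis = df.loc(axis=1)[:, "e"]
via_tuple = df.loc[:, (slice(None), "e")]
via_index_slice = df.loc[:, idx[:, "e"]]
```

pd.IndexSlice simply builds the same tuple of slices and labels, so which form to use is mostly a matter of readability.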

In response to your comment, you could use a dictionary comprehension (though in my opinion it offers no gain in performance or clarity):

result = {key : ent.transform(np.log).round(2) 
          for key, ent in df.loc(axis=1)[:, 'e'].items()}

df.update(result)
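
If the goal is a side-effect-free style, the same dictionary comprehension idea can be wrapped in a small function that returns a new frame instead of updating df in place. This is only a sketch: apply_to_inner_label is a hypothetical helper name, not a pandas function, and it assumes the columns have exactly two levels.

```python
import numpy as np
import pandas as pd

def apply_to_inner_label(frame, label, func):
    # Hypothetical helper: rebuild the frame column by column, applying func
    # only where the inner (second) column level equals label.
    return pd.DataFrame(
        {col: func(series) if col[1] == label else series
         for col, series in frame.items()}
    )

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
df.columns = pd.MultiIndex.from_tuples((("a", "d"), ("a", "e"), ("b", "d"), ("b", "e")))

# pipe passes df as the first argument, so the call can be chained with other steps.
result = df.pipe(apply_to_inner_label, "e", lambda s: np.log(s).round(2))
```

Because the tuple keys of the dictionary become MultiIndex columns again, result keeps the original column layout while df itself stays unchanged.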

functional-programming

numpy

python-3.x

pandas