Python dataframe: return column index of all max values as list

Question

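I am looking for the dataframe column(s) that hold the maximum value in each row, and I want to assign those column names to a new variable. A similar example here does not answer this question in a dataframe setting. See the following example: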

import pandas as pd

data = {'A': [1, 2, 2, 0], 'B':[2, 0, 2, 1]}
df = pd.DataFrame(data)

I want to create a variable df['C'] = [B, A, [A, B], B].

Answer 1

You could split this over several lines, but I think this does it:

df["C"] = df.apply(lambda x: "A, B" if x.A == x.B == max(x.A, x.B) else "A" if x.A == max(x.A, x.B) else "B", axis=1)

This gives you:

   A  B     C
0  1  2     B
1  2  0     A
2  2  2  A, B
3  0  1     B
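
The one-liner above names columns A and B explicitly. As a rough sketch of the same row-wise idea for any number of columns, the comparison can be made against the row maximum instead. The helper name `max_columns` is hypothetical, and the sketch assumes numeric columns without missing values:

```python
import pandas as pd

data = {'A': [1, 2, 2, 0], 'B': [2, 0, 2, 1]}
df = pd.DataFrame(data)

def max_columns(row):
    """Join the labels of all columns whose value equals the row maximum."""
    is_max = (row == row.max()).to_numpy()
    return ", ".join(map(str, row.index[is_max]))

# apply the helper to each row (axis=1)
df["C"] = df.apply(max_columns, axis=1)
print(df)
```

Like the original one-liner, this produces strings such as "A, B" rather than lists.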

Answer 2

Use max on the second axis (axis=1) and reshape the dataframe to select the columns that match the maximum of each row:

# get max value per row and identify matching cells
m = df.eq(df.max(axis=1), axis=0)
# mask and reshape to 1D (removes the non matches)
s = m.where(m).stack()
# aggregate to produce the final result
df['C'] = (s.index.get_level_values(1)
            .to_series()
            .groupby(s.index.get_level_values(0))
            .apply(list)
           )

Output:

   A  B       C
0  1  2     [B]
1  2  0     [A]
2  2  2  [A, B]
3  0  1     [B]
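
If the reshape step feels heavy, the same boolean mask can also be turned into lists directly. This is only a sketch of an alternative, not part of the answer above; it assumes numeric columns without missing values, and the names `mask` and `cols` are illustrative:

```python
import pandas as pd

data = {'A': [1, 2, 2, 0], 'B': [2, 0, 2, 1]}
df = pd.DataFrame(data)

# boolean array: True where a cell equals its row maximum
mask = df.eq(df.max(axis=1), axis=0).to_numpy()
# column labels as an array so they can be indexed with a boolean row
cols = df.columns.to_numpy()

# one list of matching column labels per row, aligned on the original index
df['C'] = pd.Series([cols[row].tolist() for row in mask], index=df.index)
print(df)
```

The comparison is still vectorized; only the final collection into lists loops over the rows.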

