Is there a function similar to idxmax() in py-polars for groupby?

Question

import polars as pl
import pandas as pd


A = ['a','a','a','a','a','a','a','b','b','b','b','b','b','b']
B = [1,2,3,4,5,6,7,8,9,10,11,12,13,14]


df = pl.DataFrame({'cola':A,
                   'colb':B})


df_pd = df.to_pandas()

index = df_pd.groupby('cola')['colb'].idxmax()
df_pd.loc[index,'top'] = 1

In pandas, I can use idxmax() to build the 'top' column.

However, in polars

I use arg_max()

index = df[pl.col('colb').arg_max().over('cola').flatten()]

It doesn't seem to give me what I want.

Is there a way to generate a 'top' column in polars?

Thanks a lot!

Answer 1

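In Polars, window functions (.over()) perform an aggregation + self-join (see https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.Expr.over.html?highlight=over#polars.Expr.over), which means you cannot return a unique value per row, which is what you are after.

One way to compute the 'top' column is to use apply: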

df.groupby("cola").apply(lambda x: x.with_columns([pl.col("colb"), (pl.col("colb")==pl.col("colb").max()).alias("top")]))

python

dataframe

python-polars