pandas groupby per-group value

I have this data:

df = pd.DataFrame({
    "dim1":   [ "aaa", "aaa", "aaa", "aaa", "aaa", "aaa" ],
    "dim2":   [ "xxx", "xxx", "xxx", "yyy", "yyy", "yyy" ],
    "iter":   [     0,     1,     2,     0,     1,     2 ],
    "value1": [   100,   101,    99,   500,   490,   510 ],
    "value2": [ 10000, 10100,  9900, 50000, 49000, 51000 ],
})

Then I group by dim1/dim2 and, across all iterations, keep the value1/value2 row that has the smallest value1:

df = df.groupby(["dim1", "dim2"], group_keys=False) \
    .apply(lambda x: x.sort_values("value1").head(1)).drop(columns=["iter"])

which returns:

dim1    dim2    value1  value2
 aaa    xxx         99    9900
 aaa    yyy        490   49000

My question: how do I add a new column containing the minimum value1 for each dim1 group, like this:

dim1    dim2    value1  value2     new_col
 aaa    xxx         99    9900          99
 aaa    yyy        490   49000          99

I tried something like this, but it didn't work:

df["new_col"] = df.groupby(["dim1"], group_keys=False) \
    .apply(lambda x: x.value1.head(1))

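IIUC, you can use .groupby + .transform afterwards. transform("min") returns one value per original row (the group minimum), so the result lines up with the existing index and can be assigned directly as a column: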

df["new_col"] = df.groupby("dim1")["value1"].transform("min")
print(df)

Prints:

  dim1 dim2  value1  value2  new_col
2  aaa  xxx      99    9900       99
4  aaa  yyy     490   49000       99
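
For reference, here is a self-contained sketch of the whole pipeline. It is not the code from the question: as an assumption on my part, it swaps the apply/sort_values/head(1) step for idxmin, which should select the same rows for this data, and it uses a hypothetical name `best` for the reduced frame instead of overwriting df.

```python
import pandas as pd

df = pd.DataFrame({
    "dim1":   ["aaa", "aaa", "aaa", "aaa", "aaa", "aaa"],
    "dim2":   ["xxx", "xxx", "xxx", "yyy", "yyy", "yyy"],
    "iter":   [0, 1, 2, 0, 1, 2],
    "value1": [100, 101, 99, 500, 490, 510],
    "value2": [10000, 10100, 9900, 50000, 49000, 51000],
})

# For each (dim1, dim2) group, idxmin gives the index label of the row
# with the smallest value1; .loc then pulls those rows out of df.
# `best` is a hypothetical name, not one used in the question.
best = df.loc[df.groupby(["dim1", "dim2"])["value1"].idxmin()]
best = best.drop(columns=["iter"])

# transform("min") broadcasts each dim1 group's minimum back onto every
# row of that group, so the result has the same index as `best`.
best["new_col"] = best.groupby("dim1")["value1"].transform("min")

print(best)
```

The difference from agg("min") is that agg would return one row per dim1 group, which cannot be assigned back as a column without a merge, while transform keeps the original row count.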