pandas groupby per-group value
I have this data:
import pandas as pd

df = pd.DataFrame({
"dim1": [ "aaa", "aaa", "aaa", "aaa", "aaa", "aaa" ],
"dim2": [ "xxx", "xxx", "xxx", "yyy", "yyy", "yyy" ],
"iter": [ 0, 1, 2, 0, 1, 2 ],
"value1": [ 100, 101, 99, 500, 490, 510 ],
"value2": [ 10000, 10100, 9900, 50000, 49000, 51000 ],
})
Then I group by dim1/dim2 and, out of all iterations, pick the value1/value2 of the row with the minimum value1:
df = df.groupby(["dim1", "dim2"], group_keys=False) \
.apply(lambda x: x.sort_values("value1").head(1)).drop(columns=["iter"])
which returns:
dim1 dim2 value1 value2
aaa xxx 99 9900
aaa yyy 490 49000
My question: how can I add a new column containing the minimum value1 of each dim1 group:
dim1 dim2 value1 value2 new_col
aaa xxx 99 9900 99
aaa yyy 490 49000 99
I tried something like this, but it didn't work:
df["new_col"] = df.groupby(["dim1"], group_keys=False) \
.apply(lambda x: x.value1.head(1))
IIUC, you can use .groupby + .transform afterwards:
df["new_col"] = df.groupby("dim1")["value1"].transform("min")
print(df)
Prints:
dim1 dim2 value1 value2 new_col
2 aaa xxx 99 9900 99
4 aaa yyy 490 49000 99
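For reference, here is a self-contained sketch of the whole flow on the data from the question. It is only one possible way to write it: the per-(dim1, dim2) reduction uses idxmin instead of the sort_values/head(1) step from the question (a swapped-in technique, not what the asker wrote), and the name best is just an illustrative variable.

```python
import pandas as pd

df = pd.DataFrame({
    "dim1": ["aaa", "aaa", "aaa", "aaa", "aaa", "aaa"],
    "dim2": ["xxx", "xxx", "xxx", "yyy", "yyy", "yyy"],
    "iter": [0, 1, 2, 0, 1, 2],
    "value1": [100, 101, 99, 500, 490, 510],
    "value2": [10000, 10100, 9900, 50000, 49000, 51000],
})

# Keep the row with the smallest value1 in each (dim1, dim2) group.
# Assumption: idxmin stands in for the sort_values/head(1) step of the question.
best = df.loc[df.groupby(["dim1", "dim2"])["value1"].idxmin()].drop(columns=["iter"])

# transform("min") returns a Series aligned to best's index, so every row
# receives the minimum value1 of its own dim1 group.
best["new_col"] = best.groupby("dim1")["value1"].transform("min")
print(best)
```

The point of transform here is that its result has the same index as the frame it was computed from, so the column assignment lines up row by row without any manual merging.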