Pandas: Normalizing all the values in one column between 0 and 10 based on groups present in another column

Question

Suppose I have a dataframe like this:

    Group  Values
0       1       1
1       1       4
2       1       2
3       1       7
4       1       3
5       2       4
6       2       1
7       2       5
8       2      12
9       2       4
10      2      10
11      3       2
12      3       6
13      3      20
14      3      15

MRE:

df = pd.DataFrame({'Group': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3], 'Values': [1, 4, 2, 7, 3, 4, 1, 5, 12, 4, 10, 2, 6, 20, 15]})

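Note that the maximum value in the dataframe is 7 for group 1, 12 for group 2, and 20 for group 3. I now want to normalize the Values of each Group so that the upper bound is 10.

I tried using the pd.groupby method, but I don't know how to proceed. I also know I could use a for loop, but that would be very inefficient, since the data I am trying to process has about 20k samples.

Is there a sweet and subtle way to do this?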

Answer 1

You can use groupby and transform for this. There is a similar example in the docs of transform():

import pandas as pd

df = pd.DataFrame({'Group': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3], 'Values': [1, 4, 2, 7, 3, 4, 1, 5, 12, 4, 10, 2, 6, 20, 15]})

# Select 'Values' explicitly so transform returns a Series, even if df has other columns
df['normal'] = df.groupby('Group')['Values'].transform(lambda x: (x / x.max()) * 10)

print(df)

Prints:

    Group  Values     normal
0       1       1   1.428571
1       1       4   5.714286
2       1       2   2.857143
3       1       7  10.000000
4       1       3   4.285714
5       2       4   3.333333
6       2       1   0.833333
7       2       5   4.166667
8       2      12  10.000000
9       2       4   3.333333
10      2      10   8.333333
11      3       2   1.000000
12      3       6   3.000000
13      3      20  10.000000
14      3      15   7.500000
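
As a possible alternative for larger data, the sketch below avoids the Python-level lambda by passing the built-in aggregation names 'max' and 'min' to transform and doing the arithmetic on whole columns. The 'normal' column should match the output above. The 'minmax' column is an assumption about what "between 0 and 10" might mean (each group's minimum mapped to 0 as well); it is not something the question asks for explicitly.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3],
    'Values': [1, 4, 2, 7, 3, 4, 1, 5, 12, 4, 10, 2, 6, 20, 15],
})

grouped = df.groupby('Group')['Values']

# Per-row group maximum and minimum, aligned to df's index
group_max = grouped.transform('max')
group_min = grouped.transform('min')

# Scale by the group maximum: each group's largest value maps to 10
df['normal'] = df['Values'] / group_max * 10

# Assumed variant (not in the original question): min-max scaling,
# so each group's smallest value maps to 0 and its largest to 10.
# A group whose values are all equal would produce NaN here (0 / 0).
df['minmax'] = (df['Values'] - group_min) / (group_max - group_min) * 10

print(df)
```

With about 20k rows either version should be fast; the difference is more likely to matter when there are many groups, since the lambda is called once per group.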
