Performing normalization of a pandas column grouped by the values of another column

Question

I would like to apply z-score normalization as described here (link), which is essentially given by x_normalized = (x - x_mean) / x_std. I have the following data frame:

import pandas as pd

country = ['US', 'US', 'US', 'UK', 'UK', 'Canada', 'Canada', 'Mexico']
rating = [0, 2, 1, 4, 3, 1, 0, 1]

df = pd.DataFrame(list(zip(country, rating)),
                  columns=['country', 'rating'])

which gives

    country     rating
0   US            0
1   US            2
2   US            1
3   UK            4
4   UK            3
5   Canada        1
6   Canada        0
7   Mexico        1

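Now I would like to z-score normalize the rating column separately for each distinct value of the country column. That is, the ratings 0, 2, 1 would be normalized together for US, the ratings 4, 3 for UK, and so on. How can I do this?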

Answer 1

Try groupby and transform:

>>> df.groupby("country")["rating"].transform(lambda x: (x-x.mean())/x.std())
0   -1.000000
1    1.000000
2    0.000000
3    0.707107
4   -0.707107
5    0.707107
6   -0.707107
7         NaN
Name: rating, dtype: float64
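
The NaN in row 7 appears because Mexico has only one row, and pandas' std() computes the sample standard deviation (ddof=1) by default, which is undefined for a single observation. Below is a minimal sketch of a variant that avoids the NaN. It rests on two assumptions that are not part of the answer above: that the population standard deviation (ddof=0) is acceptable, and that a group with zero spread should get a z-score of 0. The column name rating_z is only illustrative.

```python
import pandas as pd

country = ['US', 'US', 'US', 'UK', 'UK', 'Canada', 'Canada', 'Mexico']
rating = [0, 2, 1, 4, 3, 1, 0, 1]
df = pd.DataFrame({'country': country, 'rating': rating})

grouped = df.groupby('country')['rating']

# Per-group mean, broadcast back to the shape of the original column.
mean = grouped.transform('mean')

# Assumption: population standard deviation (ddof=0), which is defined
# (and equal to 0) even for a group with a single row.
std = grouped.transform(lambda x: x.std(ddof=0))

# Assumption: groups with zero spread get a z-score of 0 instead of NaN.
# std.where(std != 0) masks zero deviations to NaN before dividing.
df['rating_z'] = ((df['rating'] - mean) / std.where(std != 0)).fillna(0.0)

print(df)
```

With ddof=0 the values differ from the output above: the two-row groups (UK, Canada) come out as 1 and -1 rather than ±0.707107. If you need to match the default sample standard deviation, keep x.std() and decide separately how to treat single-row groups.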

