How to normalize variables to the range 0 to 1 in a pandas DataFrame
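I want to normalize the following dataset within each group using the formula
(x - min(x)) / (max(x) - min(x))
where min(x) and max(x) are taken per group. How can I do this in a pandas DataFrame? I need to normalize both price and size. Thanks.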
import pandas as pd

data = [['Group 1',10,100],
        ['Group 1',20,80],
        ['Group 1',15,60],
        ['Group 1',10,120],
        ['Group 2',10,120],
        ['Group 2',20,130],
        ['Group 2',30,200],
        ['Group 2',40,250],
        ['Group 2',50,300]]
df = pd.DataFrame(data, columns=['Group','price','size'])
Use GroupBy.apply with a custom function:
cols = ['price','size']
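# group_keys=False keeps the original row index, so the result aligns with df on assignment
# (needed on pandas 2.0+, where group_keys defaults to True)
df[cols] = df.groupby('Group', group_keys=False)[cols].apply(lambda x: (x-x.min())/(x.max()-x.min()))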
print (df)
Group price size
0 Group 1 0.00 0.666667
1 Group 1 1.00 0.333333
2 Group 1 0.50 0.000000
3 Group 1 0.00 1.000000
4 Group 2 0.00 0.000000
5 Group 2 0.25 0.055556
6 Group 2 0.50 0.444444
7 Group 2 0.75 0.722222
8 Group 2 1.00 1.000000
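Alternatively, compute the per-group minimum and maximum with GroupBy.transform and join the normalized values as new columns (starting again from the original df):
cols = ['price','size']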
g = df.groupby('Group')[cols]
min1 = g.transform('min')
max1 = g.transform('max')
df1 = df.join(df[cols].sub(min1).div(max1 - min1).add_suffix('_norm'))
print (df1)
Group price size price_norm size_norm
0 Group 1 10 100 0.00 0.666667
1 Group 1 20 80 1.00 0.333333
2 Group 1 15 60 0.50 0.000000
3 Group 1 10 120 0.00 1.000000
4 Group 2 10 120 0.00 0.000000
5 Group 2 20 130 0.25 0.055556
6 Group 2 30 200 0.50 0.444444
7 Group 2 40 250 0.75 0.722222
8 Group 2 50 300 1.00 1.000000
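A quick way to sanity-check the result is to confirm that every group's normalized columns run from 0 to 1. The sketch below rebuilds the question's data and the transform-based answer, then aggregates the per-group minimum and maximum; the name `check` is only for illustration.

```python
import pandas as pd

data = [['Group 1', 10, 100], ['Group 1', 20, 80], ['Group 1', 15, 60],
        ['Group 1', 10, 120], ['Group 2', 10, 120], ['Group 2', 20, 130],
        ['Group 2', 30, 200], ['Group 2', 40, 250], ['Group 2', 50, 300]]
df = pd.DataFrame(data, columns=['Group', 'price', 'size'])

cols = ['price', 'size']
g = df.groupby('Group')[cols]
min1 = g.transform('min')   # per-group minimum, broadcast back to every row
max1 = g.transform('max')   # per-group maximum, broadcast back to every row
df1 = df.join(df[cols].sub(min1).div(max1 - min1).add_suffix('_norm'))

# Per-group min and max of the normalized columns; each should be 0 and 1
check = df1.groupby('Group')[['price_norm', 'size_norm']].agg(['min', 'max'])
print(check)
```

If any group reports something other than 0 and 1 here, the normalization was probably computed over the whole column rather than per group.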
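Another option is GroupBy.transform with a lambda, writing the result to new columns (again starting from the original df):
df[['normalized_price', 'normalized_size']] = df.groupby('Group').transform(lambda x: (x - x.min()) / (x.max() - x.min()))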
df
Group price size normalized_price normalized_size
0 Group 1 10 100 0.00 0.666667
1 Group 1 20 80 1.00 0.333333
2 Group 1 15 60 0.50 0.000000
3 Group 1 10 120 0.00 1.000000
4 Group 2 10 120 0.00 0.000000
5 Group 2 20 130 0.25 0.055556
6 Group 2 30 200 0.50 0.444444
7 Group 2 40 250 0.75 0.722222
8 Group 2 50 300 1.00 1.000000
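One edge case the answers above do not cover: if every value in a group is identical, max(x) - min(x) is 0 and the formula gives NaN for that group. A minimal sketch of one way to handle this follows. The helper name `minmax_by_group`, its `fill` parameter, and the choice to map constant groups to 0.0 are assumptions made for illustration, not part of the original answers.

```python
import pandas as pd

def minmax_by_group(df, group_col, cols, fill=0.0):
    """Min-max scale `cols` within each group of `group_col`.

    Hypothetical helper: rows in groups with zero range get `fill`
    instead of NaN.
    """
    g = df.groupby(group_col)[cols]
    lo = g.transform('min')             # per-group minimum for every row
    span = g.transform('max') - lo      # per-group range for every row
    scaled = (df[cols] - lo) / span     # 0/0 gives NaN for constant groups
    # Keep scaled values where the range is non-zero, otherwise use `fill`
    return scaled.where(span != 0, fill)

# Group A is constant, group B is not
demo = pd.DataFrame({'Group': ['A', 'A', 'B', 'B'], 'price': [5, 5, 1, 3]})
out = minmax_by_group(demo, 'Group', ['price'])
print(out)
```

Whether a constant group should map to 0, 0.5, or stay NaN depends on how the normalized values are used downstream, so `fill` is left as a parameter.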