Min-max normalization in R, setting groups of min and max based on another column

Question

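I am trying to apply min-max normalization to a column in R, but the minimum and maximum need to be taken within groups defined by another column rather than over all values in the column.

Consider this example: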

x <- c(0, 0.5, 1, 2.5, 0.2, 0.3, 0.5, 0, 0, 0.1, 0.7)
y <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3)

df <- data.frame(x, y)

df

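For y = 1, min(x) = 0 and max(x) = 2.5. For y = 2, min(x) = 0.2 and max(x) = 0.5, and so on.

The normalization should then use these per-group minimum and maximum values, i.e. (x - min) / (max - min) computed within each group.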
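
To make the target concrete, here is a minimal base R sketch that lists the per-group range and works through the y = 1 group by hand. The names `group_range`, `x1`, and `x1_norm` are illustrative only and do not appear in the original post.

```r
# Example data from the question
x <- c(0, 0.5, 1, 2.5, 0.2, 0.3, 0.5, 0, 0, 0.1, 0.7)
y <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3)

# Range of x within each group of y: one column per group,
# first row is the minimum, second row is the maximum
group_range <- sapply(split(x, y), range)
group_range

# Hand calculation for the y = 1 group: (x - min) / (max - min)
x1 <- x[y == 1]
x1_norm <- (x1 - min(x1)) / (max(x1) - min(x1))
x1_norm
```

For the y = 1 group this gives 0, 0.2, 0.4 and 1, which is the result wanted for every group.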

I found a similar question for Python, but it did not help me much.

Answer 1

I am not sure whether something like the following is what you need:

dfout <- within(df, xnorm <- ave(x, y, FUN = function(v) (v - min(v)) / diff(range(v))))

which gives

> dfout
     x y     xnorm
1  0.0 1 0.0000000
2  0.5 1 0.2000000
3  1.0 1 0.4000000
4  2.5 1 1.0000000
5  0.2 2 0.0000000
6  0.3 2 0.3333333
7  0.5 2 1.0000000
8  0.0 3 0.0000000
9  0.0 3 0.0000000
10 0.1 3 0.1428571
11 0.7 3 1.0000000
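
One caveat with this formula: if all values in a group are equal, max - min is 0 and the division yields NaN. The following is a minimal sketch of one way to guard against that. `normalize_minmax` is a hypothetical helper name, `df2` is made-up data for illustration, and returning 0 for a constant group is an assumed convention, not something specified in the question.

```r
# Min-max normalization that guards against constant groups.
# `normalize_minmax` is a hypothetical helper, not part of base R.
normalize_minmax <- function(v) {
  rng <- range(v, na.rm = TRUE)
  span <- rng[2] - rng[1]
  if (span == 0) {
    # All values in the group are equal: return 0 by convention (an assumption),
    # keeping NA where the input was NA
    return(ifelse(is.na(v), NA_real_, 0))
  }
  (v - rng[1]) / span
}

# Illustrative data: group 2 is constant
df2 <- data.frame(
  x = c(0, 0.5, 1, 2.5, 4, 4, 4),
  y = c(1, 1, 1, 1, 2, 2, 2)
)

# ave() applies the function within each group of y and keeps row order
df2$xnorm <- ave(df2$x, df2$y, FUN = normalize_minmax)
df2
```

With the unguarded formula, the three rows of group 2 would come out as NaN instead of 0.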

Answer 2

library(tidyverse)

df %>%
  group_by(y) %>%
  mutate(xnorm = (x - min(x)) / (max(x) - min(x))) %>%
  ungroup()

Output:

# A tibble: 11 x 3
       x     y xnorm
   <dbl> <dbl> <dbl>
 1   0       1 0    
 2   0.5     1 0.2  
 3   1       1 0.4  
 4   2.5     1 1    
 5   0.2     2 0    
 6   0.3     2 0.333
 7   0.5     2 1    
 8   0       3 0    
 9   0       3 0    
10   0.1     3 0.143
11   0.7     3 1

Alternatively, inside the mutate() call you can write xnorm = scales::rescale(x).

Answer 3

You can use the aggregate function to get the minimum and maximum of each group:

aggregate(x, list(y), min)
  Group.1   x
1       1 0.0
2       2 0.2
3       3 0.0
aggregate(x, list(y), max)
  Group.1   x
1       1 2.5
2       2 0.5
3       3 0.7

# You can create your own function like this
myFun <- function (u) {
    c(min(u), mean(u), max(u))
} 
# and pass myFun to aggregate
aggregate(x, list(y), myFun)
  Group.1       x.1       x.2       x.3
1       1 0.0000000 1.0000000 2.5000000
2       2 0.2000000 0.3333333 0.5000000
3       3 0.0000000 0.2000000 0.7000000

# An alternative is by(), which returns the same results in a different output format
by(x, list(y), myFun)
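
The aggregate results above give the per-group statistics but not yet the normalized column. As a rough sketch of how they could be carried back to the original rows, the following uses the formula interface of aggregate together with merge and match. The names `stats`, `idx`, `xmin`, and `xmax` are illustrative choices, not part of the original answer.

```r
# Example data from the question
x <- c(0, 0.5, 1, 2.5, 0.2, 0.3, 0.5, 0, 0, 0.1, 0.7)
y <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
df <- data.frame(x, y)

# Per-group min and max in one data frame with columns y, xmin, xmax
stats <- merge(
  aggregate(x ~ y, data = df, FUN = min),
  aggregate(x ~ y, data = df, FUN = max),
  by = "y", suffixes = c("min", "max")
)

# Look up each row's group statistics by matching on y (preserves row order)
idx <- match(df$y, stats$y)
df$xnorm <- (df$x - stats$xmin[idx]) / (stats$xmax[idx] - stats$xmin[idx])
df
```

Using match instead of merging `stats` directly into `df` keeps the original row order, since merge sorts its result by the key column by default.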
