R: z-score normalization

I want to z-score normalize each row of a matrix in R. I am using the normalize function from the som package, which works well for this:

library(som)

training <- matrix(seq(1:20), ncol = 10)
training
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    3    5    7    9   11   13   15   17    19
[2,]    2    4    6    8   10   12   14   16   18    20
training_zscore <- normalize(training, byrow=TRUE)
training_zscore
          [,1]      [,2]       [,3]       [,4]       [,5]      [,6]      [,7]      [,8]     [,9]    [,10]
[1,] -1.486301 -1.156012 -0.8257228 -0.4954337 -0.1651446 0.1651446 0.4954337 0.8257228 1.156012 1.486301
[2,] -1.486301 -1.156012 -0.8257228 -0.4954337 -0.1651446 0.1651446 0.4954337 0.8257228 1.156012 1.486301

Now suppose I have a second matrix, such as the following:

validation <- matrix(seq(1:20)*2, ncol = 10)
validation
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    2    6   10   14   18   22   26   30   34    38
[2,]    4    8   12   16   20   24   28   32   36    40

I also want to z-score transform this new matrix. However, the scaling should be the same as the one used for the training matrix. How can I do that?

If I simply run a separate z-score normalization, I get the following output:

validation_zscore <- normalize(validation, byrow=TRUE)
validation_zscore
          [,1]      [,2]       [,3]       [,4]       [,5]      [,6]      [,7]      [,8]     [,9]    [,10]
[1,] -1.486301 -1.156012 -0.8257228 -0.4954337 -0.1651446 0.1651446 0.4954337 0.8257228 1.156012 1.486301
[2,] -1.486301 -1.156012 -0.8257228 -0.4954337 -0.1651446 0.1651446 0.4954337 0.8257228 1.156012 1.486301

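But this is not what I want. For example, in the training matrix the value "10" is converted to a z-score of "-0.1651446". The same should hold in the validation matrix (here, however, 10 is converted to a z-score of "-0.8257228").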

Thanks for your help!

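It is not entirely clear, but I assume you want each row of validation to be normalized using training as the "reference". If so, you can use base::scale and pass the means and standard deviations explicitly. In any case, what is the point of using som::normalize?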

training <- matrix(seq(1:20), ncol = 10)
training_zscore <- t(scale(t(training)))
training_zscore
#           [,1]      [,2]       [,3]       [,4]       [,5]      [,6]      [,7]      [,8]     [,9]    [,10]
# [1,] -1.486301 -1.156012 -0.8257228 -0.4954337 -0.1651446 0.1651446 0.4954337 0.8257228 1.156012 1.486301
# [2,] -1.486301 -1.156012 -0.8257228 -0.4954337 -0.1651446 0.1651446 0.4954337 0.8257228 1.156012 1.486301
# attr(,"scaled:center")
# [1] 10 11
# attr(,"scaled:scale")
# [1] 6.055301 6.055301

validation <- matrix(seq(1:20)*2, ncol = 10)
# Reuse the row means and row standard deviations of training
validation_zscore <- t(scale(t(validation), center = rowMeans(training),
                             scale = apply(training, 1, sd)))
validation_zscore
#           [,1]       [,2]      [,3]      [,4]     [,5]     [,6]     [,7]     [,8]     [,9]    [,10]
# [1,] -1.321157 -0.6605783 0.0000000 0.6605783 1.321157 1.981735 2.642313 3.302891 3.963470 4.624048
# [2,] -1.156012 -0.4954337 0.1651446 0.8257228 1.486301 2.146879 2.807458 3.468036 4.128614 4.789192
# attr(,"scaled:center")
# [1] 10 11
# attr(,"scaled:scale")
# [1] 6.055301 6.055301
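
If you need to apply the training parameters to several matrices, it may be convenient to separate "learn the parameters" from "apply them". The following is only a sketch in base R under that assumption; fit_row_zscore and apply_row_zscore are hypothetical helper names made up for this example, not functions from som or any other package. It uses sweep instead of the double transpose, which should give the same values as the scale call above.

```r
# Hypothetical helpers (illustrative names, not part of any package).

# Learn the per-row mean and standard deviation from a reference matrix.
fit_row_zscore <- function(reference) {
  list(center = rowMeans(reference),
       scale  = apply(reference, 1, sd))
}

# Apply stored row-wise parameters to a matrix with the same number of rows.
apply_row_zscore <- function(x, params) {
  stopifnot(nrow(x) == length(params$center))
  centered <- sweep(x, 1, params$center, "-")  # subtract each row's mean
  sweep(centered, 1, params$scale, "/")        # divide by each row's sd
}

training   <- matrix(1:20, ncol = 10)
validation <- matrix((1:20) * 2, ncol = 10)

params            <- fit_row_zscore(training)
training_zscore   <- apply_row_zscore(training, params)
validation_zscore <- apply_row_zscore(validation, params)
```

Keep in mind that with row-wise parameters the z-score of a value depends on the row it sits in. The value 10 is in row 2 of training (row mean 11), so it maps to -0.1651446 there, but it is in row 1 of validation (row mean 10), so it maps to 0. If that is not what you are after, the row-wise reference may not be the right choice.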