How to normalize a model.matrix?
# first, create your data.frame
mydf <- data.frame(a = c(1,2,3), b = c(1,2,3), c = c(1,2,3))
# then, create your model.matrix
mym <- model.matrix(as.formula("~ a + b + c"), mydf)
# how can I normalize the model.matrix?
Currently, I have to convert my model.matrix back to a data.frame in order to run my normalization function:
normalize <- function(x) { return ((x - min(x)) / (max(x) - min(x))) }
m.norm <- as.data.frame(lapply(as.data.frame(mym), normalize))
Is it possible to avoid this step by normalizing the model.matrix directly?
You can use the apply function to normalize each column without converting to a data frame:
apply(mym, 2, normalize)
# (Intercept) a b c
# 1 NaN 0.0 0.0 0.0
# 2 NaN 0.5 0.5 0.5
# 3 NaN 1.0 1.0 1.0
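The NaN values in the intercept column come from dividing 0 by 0: that column is constant, so max(x) - min(x) is 0. As a minimal sketch, a guarded variant could return constant columns unchanged. The name normalize_safe is hypothetical and not part of the original answer.

```r
# Hypothetical helper: min-max scaling that leaves constant columns unchanged
normalize_safe <- function(x) {
  rng <- max(x) - min(x)
  if (rng == 0) return(x)   # constant column (e.g. the intercept): nothing to rescale
  (x - min(x)) / rng
}

mydf <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3), c = c(1, 2, 3))
mym  <- model.matrix(~ a + b + c, mydf)

# Every column goes through the same function; the intercept stays at 1
res <- apply(mym, 2, normalize_safe)
res
```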
You probably want to leave the intercept unchanged, for example:
cbind(mym[,1,drop=FALSE], apply(mym[,-1], 2, normalize))
# (Intercept) a b c
# 1 1 0.0 0.0 0.0
# 2 1 0.5 0.5 0.5
# 3 1 1.0 1.0 1.0
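Dropping column 1 assumes the intercept is always the first column. A sketch of an alternative, assuming you would rather select columns through the "assign" attribute that model.matrix attaches (0 marks the intercept); the data here are made up for illustration:

```r
# Hypothetical data with different ranges per column
mydf <- data.frame(a = c(1, 2, 3), b = c(2, 4, 8))
mym  <- model.matrix(~ a + b, mydf)
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

# attr(, "assign") is 0 for the intercept column and 1, 2, ... for the model terms
is_term <- attr(mym, "assign") != 0

# Rescale only the term columns; the intercept and the attributes are kept
out <- mym
out[, is_term] <- apply(mym[, is_term, drop = FALSE], 2, normalize)
out
```

This also works for a formula without an intercept (for example ~ 0 + a + b), where no column has assign equal to 0.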
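Another option is to vectorize it with the very useful matrixStats package (although, TBH, apply is usually quite efficient too when applied to a matrix by column). This way you also keep the original data structure: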
library(matrixStats)
Max <- colMaxs(mym[, -1])
Min <- colMins(mym[, -1])
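# sweep() aligns Min and Max with the columns; plain `-` and `/` would recycle them down the rows
mym[, -1] <- sweep(sweep(mym[, -1], 2, Min, "-"), 2, Max - Min, "/")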
mym
# (Intercept) a b c
# 1 1 0.0 0.0 0.0
# 2 1 0.5 0.5 0.5
# 3 1 1.0 1.0 1.0
# attr(,"assign")
# [1] 0 1 2 3
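One thing to watch when combining a matrix with a vector of per-column statistics: R recycles the vector down the columns, so plain arithmetic only lines up when every column shares the same minimum and maximum, as in the example data. The sketch below uses a hypothetical 2 x 3 matrix and base R only (apply instead of colMins/colMaxs) to show the difference.

```r
# Hypothetical 2 x 3 matrix whose columns have different ranges
x <- matrix(c(1, 3, 10, 30, 100, 300), nrow = 2,
            dimnames = list(NULL, c("a", "b", "c")))
Min <- apply(x, 2, min)
Max <- apply(x, 2, max)

# sweep() applies each statistic along the chosen margin (2 = columns)
by_col <- sweep(sweep(x, 2, Min, "-"), 2, Max - Min, "/")

# Plain arithmetic recycles Min and Max element by element in column-major order,
# so the statistics no longer match their columns
recycled <- (x - Min) / (Max - Min)

by_col
recycled
```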
If you want to "normalize" in the statistical sense, you can use the scale function, which centers each column and sets its standard deviation to 1.
> scale( mym )
(Intercept) a b c
1 NaN -1 -1 -1
2 NaN 0 0 0
3 NaN 1 1 1
attr(,"assign")
[1] 0 1 2 3
attr(,"scaled:center")
(Intercept) a b c
1 2 2 2
attr(,"scaled:scale")
(Intercept) a b c
0 1 1 1
> mym
(Intercept) a b c
1 1 1 1 1
2 1 2 2 2
3 1 3 3 3
attr(,"assign")
[1] 0 1 2 3
As you can see, when an "Intercept" term is present it makes no sense to "normalize" the whole model matrix. So you can do this instead:
> mym[ , -1 ] <- scale( mym[,-1] )
> mym
(Intercept) a b c
1 1 -1 -1 -1
2 1 0 0 0
3 1 1 1 1
attr(,"assign")
[1] 0 1 2 3
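This is actually the model matrix you would get if your default contrasts option were set to "contr.sum" and the columns were factors. Doing this as an operation inside model.matrix is only accepted if the variables to be "normalized" are factors: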
> mym <- model.matrix(as.formula("~ a + b + c"), mydf, contrasts.arg=list(a="contr.sum"))
Error in `contrasts<-`(`*tmp*`, value = contrasts.arg[[nn]]) :
contrasts apply only to factors
> mydf <- data.frame(a = factor(c(1,2,3)), b = c(1,2,3), c = c(1,2,3))
> mym <- model.matrix(as.formula("~ a + b + c"), mydf, contrasts.arg=list(a="contr.sum"))
> mym
(Intercept) a1 a2 b c
1 1 1 0 1 1
2 1 0 1 2 2
3 1 -1 -1 3 3
attr(,"assign")
[1] 0 1 1 2 3
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.sum"
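Returning to the scale approach: scale stores the column means and standard deviations it used in the "scaled:center" and "scaled:scale" attributes, as the output earlier shows. A sketch of reusing them to put a second model matrix on the same scale, assuming hypothetical train and new data that are not part of the original question:

```r
# Hypothetical training and new data
train <- model.matrix(~ a + b, data.frame(a = c(1, 2, 3), b = c(10, 20, 30)))
new   <- model.matrix(~ a + b, data.frame(a = c(2, 4), b = c(20, 40)))

# Scale the non-intercept columns and keep the statistics that were used
scaled <- scale(train[, -1])
center <- attr(scaled, "scaled:center")
spread <- attr(scaled, "scaled:scale")

# Apply the training statistics to the new matrix, leaving its intercept alone
new_scaled <- new
new_scaled[, -1] <- scale(new[, -1], center = center, scale = spread)
new_scaled
```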