How to use apply() to normalize a data matrix with respect to specific columns
I am trying to normalize the values in a matrix relative to my controls (R247, R235, R241).
My coldata is:
Condition Tank
R235 Control T6
R236 LowExposure T6
R239 HighExposure T6
R241 Control T8
R242 LowExposure T8
R245 HighExposure T8
R247 Control T14_3
R248 LowExposure T14_3
R250 HighExposure T14_3
and my matrix mydata is:
R235 R236 R239 R241 R242 R245 R247 R248 R250
ENSDARG00000033160 11.91873 10.899929 10.831388 12.092478 11.564555 10.908011 11.67680 11.168115 10.414632
ENSDARG00000013522 12.39036 11.692673 11.439107 12.440952 11.841307 11.118888 12.13594 11.634806 11.336330
ENSDARG00000103295 10.54697 10.004169 8.753556 10.659075 9.980232 8.511240 11.11711 10.690518 9.240825
ENSDARG00000056765 9.18106 8.488917 7.431641 9.440119 8.830816 7.901337 10.39879 9.899546 8.142807
ENSDARG00000087303 11.07447 10.765197 11.682291 11.010172 10.380666 11.487207 11.05384 10.526109 11.962465
ENSDARG00000018478 11.51562 11.000702 10.382845 11.597848 11.218944 10.185381 11.61043 11.214280 10.614338
I extract the control samples with:
x <- which(coldata$Condition %in% "Control")
control <- row.names(coldata[x,])
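Similar to a z-score transformation, I want to transform the dataset using the mean and sd, but computed from the control group only, i.e. (x - mean[control]) / sd[control], something like: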
function(x){
(x - rowMeans[,control])/apply(matrix[,control],1,sd)
}
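and then run it on mydata with apply(), like: apply(mydata, 1, function(x))
But I don't know how to write it properly as a function to be used through apply(). Any help is greatly appreciated. Thanks!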
Maybe you can try the code below
ctr<-c(mydata[,control])
mydata <- (mydata - mean(ctr))/sd(ctr)
which gives
> mydata
R235 R236 R239 R241 R242 R245 R247
ENSDARG00000033160 0.7649126 -0.3416566 -0.4161023 0.9536287 0.380225962 -0.3328783 0.5021407
ENSDARG00000013522 1.2771728 0.5193811 0.2439708 1.3321233 0.680819738 -0.1038346 1.0008349
ENSDARG00000103295 -0.7250225 -1.3145850 -2.6729365 -0.6032598 -1.340584125 -2.9361276 -0.1057658
ENSDARG00000056765 -2.2086036 -2.9603737 -4.1087325 -1.9272271 -2.589020615 -3.5985729 -0.8859680
ENSDARG00000087303 -0.1520791 -0.4879955 0.5081047 -0.2219163 -0.905653327 0.2962145 -0.1744864
ENSDARG00000018478 0.3270753 -0.2322021 -0.9032866 0.4163871 0.004841085 -1.1177618 0.4300530
R248 R250
ENSDARG00000033160 -0.0503667586 -0.8687612
ENSDARG00000013522 0.4565289817 0.1323397
ENSDARG00000103295 -0.5691080347 -2.1436899
ENSDARG00000056765 -1.4282211043 -3.3363006
ENSDARG00000087303 -0.7476806273 0.8124153
ENSDARG00000018478 -0.0002247121 -0.6518508
Data
coldata <- structure(list(Condition = c("Control", "LowExposure", "HighExposure",
"Control", "LowExposure", "HighExposure", "Control", "LowExposure",
"HighExposure"), Tank = c("T6", "T6", "T6", "T8", "T8", "T8",
"T14_3", "T14_3", "T14_3")), class = "data.frame", row.names = c("R235",
"R236", "R239", "R241", "R242", "R245", "R247", "R248", "R250"
))
mydata <- structure(c(0.764912614946124, 1.27717284283513, -0.725022482925137,
-2.20860361193701, -0.152079137057719, 0.327075283851119, -0.341656586405671,
0.519381138288775, -1.31458498734254, -2.96037370907202, -0.487995549202655,
-0.232202141300273, -0.416102292318654, 0.243970801913007, -2.67293645010139,
-4.10873247484514, 0.508104744323286, -0.90328660925782, 0.95362874851536,
1.3321232689093, -0.603259802757428, -1.92722706172457, -0.221916314787733,
0.416387104598013, 0.380225961822725, 0.680819737852109, -1.34058412453691,
-2.58902061521661, -0.905653326889375, 0.00484108465005354, -0.332878334043016,
-0.103834591964374, -2.93612761559372, -3.59857285819956, 0.296214545867932,
-1.11776184119785, 0.502140702783651, 1.00083493562074, -0.105765764038217,
-0.885967971059041, -0.174486381086619, 0.430053025314037, -0.0503667586240605,
0.456528981709989, -0.569108034749636, -1.42822110426138, -0.747680627262844,
-0.000224712061085116, -0.868761206158128, 0.132339715167575,
-2.14368994546163, -3.33630057435765, 0.812415320598279, -0.651850829229599
), .Dim = c(6L, 9L), .Dimnames = list(c("ENSDARG00000033160",
"ENSDARG00000013522", "ENSDARG00000103295", "ENSDARG00000056765",
"ENSDARG00000087303", "ENSDARG00000018478"), c("R235", "R236",
"R239", "R241", "R242", "R245", "R247", "R248", "R250")))
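Note that the snippet above pools every control value into a single mean and sd for the whole matrix. If the goal is the per-gene version written in the question, (x - mean[control]) / sd[control] computed row by row, a minimal sketch could look like the following. It uses the raw values of the first two genes from the question's table; `mat`, `ctr_mean`, `ctr_sd` and `z` are names made up for illustration.

```r
# Raw values for two genes, copied from the question's table
mat <- matrix(
  c(11.91873, 10.899929, 10.831388, 12.092478, 11.564555,
    10.908011, 11.67680, 11.168115, 10.414632,
    12.39036, 11.692673, 11.439107, 12.440952, 11.841307,
    11.118888, 12.13594, 11.634806, 11.336330),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("ENSDARG00000033160", "ENSDARG00000013522"),
    c("R235", "R236", "R239", "R241", "R242", "R245", "R247", "R248", "R250")
  )
)
control <- c("R235", "R241", "R247")

ctr_mean <- rowMeans(mat[, control])       # per-gene mean of the control samples
ctr_sd   <- apply(mat[, control], 1, sd)   # per-gene sd of the control samples

# sweep() with MARGIN = 1 applies one statistic per row:
# first subtract the control mean, then divide by the control sd
z <- sweep(sweep(mat, 1, ctr_mean, "-"), 1, ctr_sd, "/")
```

After this, the control columns of each row of `z` have mean 0 and sd 1, while the exposure columns are expressed in units of the control sd for that gene.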
@ThomasIsCoding Thank you very much for your help! I sorted it out as follows:
ctrM <- apply(mydata[,control], 1, FUN = mean)
Sd <- apply(mydata, 1, FUN = sd)
new <- (mydata - ctrM)/Sd #centering around ctrM and scaling with Sd
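Sorry for not providing information about my data structure, but it works now :)
I also realized that, to scale the rows, I need to use the overall Sd of the whole row rather than only that of the controls. So I center on the mean of the controls and scale by the Sd of the row. Now it works.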
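For anyone who still wants the "function passed through apply()" form from the question, here is a rough sketch on made-up toy data (`m`, `ctrl` and `znorm` are hypothetical names, not from the thread). Two things are easy to trip over: inside apply(X, 1, f) each row arrives as a named vector, so the controls are picked with x[ctrl], and the result comes back with one column per row, so it needs t() to restore the original orientation.

```r
set.seed(1)  # toy data only, not the data from the question
m <- matrix(rnorm(15, mean = 10), nrow = 3,
            dimnames = list(paste0("gene", 1:3),
                            c("c1", "c2", "c3", "t1", "t2")))
ctrl <- c("c1", "c2", "c3")  # names of the control columns

# x is one row of m, carrying the column names as element names
znorm <- function(x, ctrl) (x - mean(x[ctrl])) / sd(x[ctrl])

# apply() over rows returns a samples-by-genes matrix, so transpose it back
z_apply <- t(apply(m, 1, znorm, ctrl = ctrl))

# Vectorized equivalent: a vector of length nrow(m) is recycled down each
# column, so element [i, j] is combined with the statistic of row i
z_vec <- (m - rowMeans(m[, ctrl])) / apply(m[, ctrl], 1, sd)

same <- isTRUE(all.equal(z_apply, z_vec))
```

Swapping `sd(x[ctrl])` for `sd(x)` in `znorm` would give the variant described above, centered on the control mean but scaled by the sd of the whole row.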