How to get the p-values for the correlation between two datasets in R?
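I have two datasets in two data frames of different sizes. I am trying to compute the correlations between the columns of the two data frames.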
dput(data1)
structure(list(Samples = c("Sample1", "Sample2", "Sample3", "Sample4",
"Sample5", "Sample6"), Gene1 = c(4.671302252, 4.047214831, 5.466179936,
5.283893539, 4.755404471, 2.469735829), Gene2 = c(-2.597625581,
-0.545400583, -1.760948089, -0.025914727, -4.506701651, -1.92244677
), Gene3 = c(4.518177333, 4.383227672, 5.175808561, 4.798211945,
5.381003512, 3.755721129), Gene4 = c(6.176159142, 5.851062234,
4.001055798, 4.923729009, 5.438633666, 5.398964894), Gene5 = c(5.292719633,
5.118130958, 4.161460436, 3.802601359, 5.364577948, 4.084908402
), Gene6 = c(3.018843066, 4.250663843, 3.559959674, 5.120952707,
3.00470538, 3.478294452), Gene7 = c(5.523144074, 5.213819135,
6.910541399, 7.76544151, 6.71295831, 4.135405512), Gene8 = c(4.280905818,
4.894985917, 5.691758253, 5.801298455, 5.801366783, 4.914515168
)), class = "data.frame", row.names = c(NA, -6L))
data2 looks like this:
dput(data2)
structure(list(Samples = c("Sample1", "Sample2", "Sample3", "Sample4",
"Sample5", "Sample6"), B.cell = c(0.077235161, 0.083169074, 0.131076839,
0.091987739, 0.104246383, 0.137797284), T.cell.CD4. = c(0.089704844,
0.116399012, 0.179227307, 0.134534096, 0.143346971, 0.125001186
), T.cell.CD8. = c(0.19430495, 0.203269704, 0.211582459, 0.198538867,
0.232369158, 0.165748405), Neutrophil = c(0.119542496, 0.130890002,
0.130346653, 0.126705533, 0.119481998, 0.158515447), Macrophage = c(0.013270172,
0.049023848, 0.036012432, 0.056938321, 0.149161503, 0.024486122
), Myeloid.dendritic.cell = c(0.437531368, 0.478689299, 0.493095004,
0.503436218, 0.517459595, 0.550177096)), class = "data.frame", row.names = c(NA,
-6L))
I got the correlation coefficients with cor.
data1 <- data.frame(data1[,-1], row.names = data1[,1])
data2 <- data.frame(data2[,-1], row.names = data2[,1])
cor(data1, data2, method="spearman")
But this does not give any p-values. Searching on Google, I found cor.test, which gives me an error.
cor.test(data1, data2, method = "spearman")
Error in cor.test.default(data1, data2, method = "spearman") :
'x' and 'y' must have the same length
Is there a way to get p-values from the cor function? Or how can I apply cor.test to this data to get both the correlation coefficients and the p-values?
Edit: As suggested, I tried rcorr, but this is what I get:
res2 <- rcorr(as.matrix(data1, data2), type="spearman")
Warning message:
In if (rownames.force %in% FALSE) NULL else if (rownames.force %in% :
the condition has length > 1 and only the first element will be used
The output is not what I need, however: I want the correlations between the two data frames.
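You can use rcorr (from the Hmisc package), but it returns more than you need, so extract only the part you want. Note that as.matrix(data1, data2) in your attempt passes data2 as the rownames.force argument of as.matrix, which is what triggers the warning; pass the two matrices to rcorr as separate arguments instead.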
out <- rcorr(as.matrix(data1), as.matrix(data2), type="spearman")
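out is a list of three matrices, r, n, and P, holding the correlations, the numbers of observations, and the p-values for every pair of columns across both matrices. rcorr combines the two inputs column-wise, so the gene columns come first (1 to 8), followed by the cell-type columns (9 to 14). To get what you want, extract only the rows and columns you need: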
# Rows 4:8 are Gene4..Gene8; columns 12:14 are the last three cell types.
# Use out$r[1:8, 9:14] for every gene against every cell type.
out$r[4:8, 12:14]
#       Neutrophil Macrophage Myeloid.dendritic.cell
# Gene4    -0.2571   -0.25714               -0.48571
# Gene5    -0.6000    0.08571               -0.31429
# Gene6     0.4857    0.14286               -0.14286
# Gene7    -0.5429    0.48571               -0.08571
# Gene8    -0.3714    0.82857                0.65714
out$P[4:8, 12:14]
#       Neutrophil Macrophage Myeloid.dendritic.cell
# Gene4     0.6228    0.62279                 0.3287
# Gene5     0.2080    0.87174                 0.5441
# Gene6     0.3287    0.78717                 0.7872
# Gene7     0.2657    0.32872                 0.8717
# Gene8     0.4685    0.04156                 0.1562
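If you would rather stay in base R, cor.test can also be applied to one pair of columns at a time. The following is only a minimal sketch: pairwise_cor_test is a hypothetical helper name, not part of any package, and genes and cells are small matrices rebuilt from two columns each of the data in the question. The coefficients should agree with rcorr, but the p-values may differ somewhat, because for small samples without ties cor.test computes an exact Spearman p-value while rcorr relies on an approximation.

```r
# Small numeric matrices rebuilt from the question's data
# (two genes and two cell types only, to keep the sketch short).
genes <- cbind(
  Gene7 = c(5.523144074, 5.213819135, 6.910541399,
            7.76544151, 6.71295831, 4.135405512),
  Gene8 = c(4.280905818, 4.894985917, 5.691758253,
            5.801298455, 5.801366783, 4.914515168)
)
cells <- cbind(
  Neutrophil = c(0.119542496, 0.130890002, 0.130346653,
                 0.126705533, 0.119481998, 0.158515447),
  Macrophage = c(0.013270172, 0.049023848, 0.036012432,
                 0.056938321, 0.149161503, 0.024486122)
)

# Hypothetical helper: run cor.test() on every column of x against
# every column of y and collect the estimates and p-values.
pairwise_cor_test <- function(x, y, method = "spearman") {
  r <- matrix(NA_real_, nrow = ncol(x), ncol = ncol(y),
              dimnames = list(colnames(x), colnames(y)))
  p <- r
  for (i in seq_len(ncol(x))) {
    for (j in seq_len(ncol(y))) {
      ct <- cor.test(x[, i], y[, j], method = method)
      r[i, j] <- unname(ct$estimate)  # correlation coefficient
      p[i, j] <- ct$p.value           # its p-value
    }
  }
  list(r = r, P = p)
}

res <- pairwise_cor_test(genes, cells)
res$r  # genes in rows, cell types in columns
res$P
```

The same helper should also accept the full data frames once the Samples column has been dropped, e.g. pairwise_cor_test(data1, data2), since it only relies on ncol, colnames, and column indexing.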