How to get the p-value for the correlation between two datasets in R?

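I have two datasets in two data frames of different lengths (they have the same samples in rows but different numbers of columns). I am trying to calculate the correlation between the two data frames.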

dput(data1)

structure(list(Samples = c("Sample1", "Sample2", "Sample3", "Sample4", 
"Sample5", "Sample6"), Gene1 = c(4.671302252, 4.047214831, 5.466179936, 
5.283893539, 4.755404471, 2.469735829), Gene2 = c(-2.597625581, 
-0.545400583, -1.760948089, -0.025914727, -4.506701651, -1.92244677
), Gene3 = c(4.518177333, 4.383227672, 5.175808561, 4.798211945, 
5.381003512, 3.755721129), Gene4 = c(6.176159142, 5.851062234, 
4.001055798, 4.923729009, 5.438633666, 5.398964894), Gene5 = c(5.292719633, 
5.118130958, 4.161460436, 3.802601359, 5.364577948, 4.084908402
), Gene6 = c(3.018843066, 4.250663843, 3.559959674, 5.120952707, 
3.00470538, 3.478294452), Gene7 = c(5.523144074, 5.213819135, 
6.910541399, 7.76544151, 6.71295831, 4.135405512), Gene8 = c(4.280905818, 
4.894985917, 5.691758253, 5.801298455, 5.801366783, 4.914515168
)), class = "data.frame", row.names = c(NA, -6L))

data2 looks like this:

dput(data2)

structure(list(Samples = c("Sample1", "Sample2", "Sample3", "Sample4", 
"Sample5", "Sample6"), B.cell = c(0.077235161, 0.083169074, 0.131076839, 
0.091987739, 0.104246383, 0.137797284), T.cell.CD4. = c(0.089704844, 
0.116399012, 0.179227307, 0.134534096, 0.143346971, 0.125001186
), T.cell.CD8. = c(0.19430495, 0.203269704, 0.211582459, 0.198538867, 
0.232369158, 0.165748405), Neutrophil = c(0.119542496, 0.130890002, 
0.130346653, 0.126705533, 0.119481998, 0.158515447), Macrophage = c(0.013270172, 
0.049023848, 0.036012432, 0.056938321, 0.149161503, 0.024486122
), Myeloid.dendritic.cell = c(0.437531368, 0.478689299, 0.493095004, 
0.503436218, 0.517459595, 0.550177096)), class = "data.frame", row.names = c(NA, 
-6L))

I got the correlation coefficients with cor.

data1 <- data.frame(data1[,-1], row.names = data1[,1])
data2 <- data.frame(data2[,-1], row.names = data2[,1])
cor(data1, data2, method="spearman")

But this doesn't give any p-values. By googling, I found cor.test, but it gives me an error.

cor.test(data1, data2, method = "spearman")
Error in cor.test.default(data1, data2, method = "spearman") : 
  'x' and 'y' must have the same length

Is there a way to get p-values from the cor function? Or how can I apply cor.test to this data to get both the correlation coefficients and the p-values?

Edit: As suggested, I tried rcorr, but the result looks like this:

res2 <- rcorr(as.matrix(data1, data2), type="spearman")
Warning message:
In if (rownames.force %in% FALSE) NULL else if (rownames.force %in%  :
  the condition has length > 1 and only the first element will be used

And the output:

However, I want the correlation between the two data frames.

You can use rcorr (from the Hmisc package), but it returns more than you need. Just extract the part you want.

library(Hmisc)  # provides rcorr()
out <- rcorr(as.matrix(data1), as.matrix(data2), type = "spearman")

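out is a list containing three matrices, r, n, and P, which hold the correlations, the number of observations, and the p-values for every pair of columns across both matrices. To get what you want, just extract the rows and columns you need: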

# In the combined matrix, columns 1:8 are the genes and 9:14 the cell types.
# Rows 4:8 are Gene4..Gene8; columns 12:14 are the last three cell types.
out$r[4:8, 12:14]
#       Neutrophil Macrophage Myeloid.dendritic.cell
# Gene4    -0.2571   -0.25714               -0.48571
# Gene5    -0.6000    0.08571               -0.31429
# Gene6     0.4857    0.14286               -0.14286
# Gene7    -0.5429    0.48571               -0.08571
# Gene8    -0.3714    0.82857                0.65714
out$P[4:8, 12:14]
#       Neutrophil Macrophage Myeloid.dendritic.cell
# Gene4     0.6228    0.62279                 0.3287
# Gene5     0.2080    0.87174                 0.5441
# Gene6     0.3287    0.78717                 0.7872
# Gene7     0.2657    0.32872                 0.8717
# Gene8     0.4685    0.04156                 0.1562
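
If you would rather stay with cor.test, which only accepts two vectors of equal length, one option is to call it once per pair of columns. The following is a minimal base-R sketch, not a definitive implementation: pairwise_cor_test is a hypothetical helper name, and the two small data frames reuse only a couple of columns from data1 and data2 so the example runs on its own.

```r
# Subset of the question's data: two gene columns and two cell-type columns.
genes <- data.frame(
  Gene4 = c(6.176159142, 5.851062234, 4.001055798, 4.923729009, 5.438633666, 5.398964894),
  Gene8 = c(4.280905818, 4.894985917, 5.691758253, 5.801298455, 5.801366783, 4.914515168)
)
cells <- data.frame(
  Neutrophil = c(0.119542496, 0.130890002, 0.130346653, 0.126705533, 0.119481998, 0.158515447),
  Macrophage = c(0.013270172, 0.049023848, 0.036012432, 0.056938321, 0.149161503, 0.024486122)
)

# Hypothetical helper (not part of base R or Hmisc): runs cor.test() on every
# (column of x, column of y) pair and collects estimates and p-values.
pairwise_cor_test <- function(x, y, method = "spearman") {
  stopifnot(nrow(x) == nrow(y))  # rows must be the same samples in the same order
  r <- matrix(NA_real_, nrow = ncol(x), ncol = ncol(y),
              dimnames = list(colnames(x), colnames(y)))
  P <- r
  for (i in seq_len(ncol(x))) {
    for (j in seq_len(ncol(y))) {
      ct <- cor.test(x[[i]], y[[j]], method = method)
      r[i, j] <- unname(ct$estimate)
      P[i, j] <- ct$p.value
    }
  }
  list(r = r, P = P)
}

res <- pairwise_cor_test(genes, cells)
res$r["Gene8", "Macrophage"]  # about 0.829, the same coefficient rcorr reports
res$P                         # one p-value per gene / cell-type pair
```

The result already contains only the gene-by-cell-type block, indexed by name, so no positional subsetting is needed. The coefficients agree with rcorr, but the p-values need not: for small samples without ties cor.test computes an exact Spearman p-value by default, whereas rcorr uses a t-distribution approximation.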