How to run cor.test() on two different data frames

Question

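I want to run cor.test() on two separate data frames, but I am not sure how to go about it.

I have two example data frames with the same columns (patients) but different rows (bacteria and genes, respectively):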

	1C	1L	2C	2L
Staphylococcus	10	400	20	600
Enterococcus	15	607	39	800

	1C	1L	2C	2L
IL4	60	300	90	450
IL8	30	600	54	750
TNFA	89	450	96	600

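I want to run a Spearman correlation test between the two data frames to find out whether bacterial counts (abundance) are associated with increased gene expression. So basically I want to test every bacterium against every gene.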

I tried running:

cor.test(df1, df2, method = "spearman", alternative = c("two.sided"))

But I get this error:

Error in cor.test.default(df1, df2, method = "spearman",  : 
  'x' and 'y' must have the same length

Answer 1

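I think the problem is that you are passing whole data frames, each holding several variables, while cor.test() expects x and y to be numeric vectors of the same length.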
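To make the point about vectors concrete, here is a minimal sketch that tests a single bacterium against a single gene. It assumes the question's tables are stored as data frames named df1 and df2 with the organism and gene names as row names; that layout is an assumption, since the question does not show how the data were read in.

```r
# Assumed layout: patients in columns, bacteria / genes as row names
df1 <- data.frame(`1C` = c(10, 15), `1L` = c(400, 607),
                  `2C` = c(20, 39), `2L` = c(600, 800),
                  row.names = c("Staphylococcus", "Enterococcus"),
                  check.names = FALSE)
df2 <- data.frame(`1C` = c(60, 30, 89), `1L` = c(300, 600, 450),
                  `2C` = c(90, 54, 96), `2L` = c(450, 750, 600),
                  row.names = c("IL4", "IL8", "TNFA"),
                  check.names = FALSE)

# Pull one row from each data frame as a plain numeric vector
x <- unlist(df1["Staphylococcus", ])
y <- unlist(df2["IL4", ])

# Both vectors have one value per patient, so cor.test() accepts them
single_test <- cor.test(x, y, method = "spearman", alternative = "two.sided")
single_test$estimate
single_test$p.value
```

This only covers one pair; testing every bacterium against every gene means repeating the call for each combination of rows.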

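To compare all genes against all bacterial counts across subjects, you have to reshape the data into a long format that the function can use. You can use pivot_longer() from tidyr for this, then merge the two tables to join them on subject.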

library(tidyr)  # provides pivot_longer()

Bacteria <- data.frame(name=c("Staph", "Enter"), C1=c(10,15), L1=c(400,607), C2=c(20,39), L2=c(600, 800))
Genes <- data.frame(name=c("IL4", "IL8", "TNFA"), C1=c(60,30,89), L1=c(300,600,450), C2=c(90,54,96), L2=c(450,750,600))

# Reshape to long format: one row per name and subject
Bacteria <- pivot_longer(Bacteria, -1, names_to = "Subject", values_to="Counts")
Genes <- pivot_longer(Genes, -1, names_to = "Subject", values_to="Counts")

# Pair every bacterium with every gene within each subject
FullSet <- merge(Bacteria, Genes, by="Subject", suffixes = c(".Bac", ".Gene"))

# A single test pooled over all bacterium-gene pairs
cor.test(FullSet$Counts.Bac, FullSet$Counts.Gene, method="spearman", alternative="two.sided")
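It may help to check what the merge produces before testing. The sketch below rebuilds the long tables in base R with a hypothetical helper, to_long(), which stands in for pivot_longer() so the example runs without tidyr. It then counts how many rows each bacterium-gene pair contributes.

```r
Bacteria <- data.frame(name = c("Staph", "Enter"), C1 = c(10, 15),
                       L1 = c(400, 607), C2 = c(20, 39), L2 = c(600, 800))
Genes <- data.frame(name = c("IL4", "IL8", "TNFA"), C1 = c(60, 30, 89),
                    L1 = c(300, 600, 450), C2 = c(90, 54, 96),
                    L2 = c(450, 750, 600))

# Hypothetical base-R stand-in for pivot_longer(df, -1, ...)
to_long <- function(df) {
  data.frame(
    name    = rep(df$name, times = ncol(df) - 1),    # names repeat for each subject
    Subject = rep(names(df)[-1], each = nrow(df)),   # one block of rows per subject
    Counts  = unlist(df[-1], use.names = FALSE)      # values read column by column
  )
}

FullSet <- merge(to_long(Bacteria), to_long(Genes),
                 by = "Subject", suffixes = c(".Bac", ".Gene"))

# 2 bacteria x 3 genes x 4 subjects
nrow(FullSet)

# Each bacterium-gene pair appears once per subject
pair_counts <- table(FullSet$name.Bac, FullSet$name.Gene)
pair_counts
```

Because every pair is stacked into the same two columns, a single cor.test() call on FullSet pools all pairs into one estimate. A separate result per bacterium and gene requires splitting the table by pair first.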

Edited to use a p-value matrix

This creates a nice corrplot:

library(tidyverse)  # loads tidyr (pivot_wider) and dplyr (bind_rows)
library(corrplot)   # needed for corrplot()

# Run one Spearman test for a single bacterium-gene pair
MakeStats <- function(x) {
  result <- cor.test(x$Counts.Bac, x$Counts.Gene, method="spearman", alternative="two.sided")
  data.frame(Bacteria=x$name.Bac[1], Gene=x$name.Gene[1],
             Estimate=result$estimate[1], PValue=result$p.value, row.names=NULL)
}

# Split the merged table into one data frame per bacterium-gene pair
ListOfTests <- split(FullSet, list(FullSet$name.Bac, FullSet$name.Gene), drop=TRUE)
Results <- bind_rows(lapply(ListOfTests, MakeStats))

# Reshape to wide tables: one row per gene, one column per bacterium
PValues <- Results[,-3]
Estimates <- Results[,-4]
Estimates <- pivot_wider(Estimates, id_cols="Gene", names_from="Bacteria", values_from="Estimate")
PValues <- pivot_wider(PValues, id_cols="Gene", names_from="Bacteria", values_from="PValue")

# Convert to numeric matrices with genes as row names
EstMatrix <- as.matrix(data.frame(Estimates[-1], row.names = Estimates$Gene))
PMatrix <- as.matrix(data.frame(PValues[-1], row.names = PValues$Gene))

# Plot the estimates; pairs that are not significant are marked with pch=8
corrplot(EstMatrix, method="square", p.mat = PMatrix, pch=8)
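As an alternative to the split-and-reshape steps above, the same two matrices can be built in base R. This is a minimal sketch, not the answer's method: it assumes the data are held as numeric matrices named bac and gene (hypothetical names) with patients in the same column order.

```r
patients <- c("1C", "1L", "2C", "2L")
bac <- matrix(c(10, 400, 20, 600,
                15, 607, 39, 800),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Staphylococcus", "Enterococcus"), patients))
gene <- matrix(c(60, 300, 90, 450,
                 30, 600, 54, 750,
                 89, 450, 96, 600),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("IL4", "IL8", "TNFA"), patients))

# The patient columns must line up before rows are compared
stopifnot(identical(colnames(bac), colnames(gene)))

# cor() correlates columns, so transpose to put patients in rows
rho <- cor(t(bac), t(gene), method = "spearman")

# cor.test() handles one pair at a time, so fill the p-values in a loop
pval <- matrix(NA_real_, nrow = nrow(bac), ncol = nrow(gene),
               dimnames = dimnames(rho))
for (b in rownames(bac)) {
  for (g in rownames(gene)) {
    pval[b, g] <- cor.test(bac[b, ], gene[g, ], method = "spearman")$p.value
  }
}

rho
pval
```

Here bacteria are in rows and genes in columns, the transpose of EstMatrix and PMatrix above. With only four patients per pair, the p-values are coarse, so the example data are mainly useful for checking that the code runs.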
