R package huge - why does taking the partial correlation change the sign of the correlation?
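I have been using the R package huge to build a Gaussian graphical model from log-transformed gene expression data (I have 367 genes and 150 samples). However, if I compute the correlation between two genes before applying huge, I find a positive correlation, and when I look at the partial correlations output by huge, the sign is reversed. That is, if two genes show a positive marginal correlation, I see a negative partial correlation from huge. I understand that taking the partial correlation should reduce the correlation, but I'm confused about how it can flip the sign entirely. Does anyone know why this might happen, or can you point me to a different package?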
library(huge)
library(igraph)  # provides graph.adjacency
# The TCGA pancreatic Illumina sequencing data, with patients as columns
# and genes as rows
rnaseq.read = as.matrix(read.table("PAAD_HiSeqV2.gz", header = T, row.names = 1, sep="\t"))
# This gives a list of the 360 genes we're interested in
gene_list = read.table("DE_gene_high_v_low.txt", header=T, sep="\t")
# Assumption: the gene names are in the first column of gene_list
rnaseq = rnaseq.read[(rownames(rnaseq.read) %in% gene_list[, 1]),]
# An example being RARG, which has been shown to bind to a KCNN4 promoter,
# likely leading to an increase in KCNN4
# When I plot this I see a positive slope, as expected
# (genes are rows here, so index by row name)
plot(rnaseq["RARG",], rnaseq["KCNN4",])
# huge expects samples as rows and variables (genes) as columns, hence t()
PDAC.all = huge(t(rnaseq), method="glasso")
PDAC.all.select = huge.select(PDAC.all, criterion="stars")
PDAC_icov = as.matrix(PDAC.all.select$opt.icov)
ig_PDAC = graph.adjacency(PDAC_icov, mode="undirected", weighted=TRUE, add.colnames = "name")
# However, when I look at the partial correlation matrix, the correlation is now -0.201
PDAC_icov["RARG","KCNN4"]
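I expected the partial correlation to remain positive, but it is negative. I see this in other pairs as well: pairs with a positive marginal correlation have a negative partial correlation, and vice versa.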
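The "glasso" method returns the precision matrix in the variable opt.icov. To obtain partial correlation estimates, you need to convert the precision matrix to a correlation matrix and change its sign (see the Wikipedia page for more details).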
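For reference, the standard relation between a precision matrix and partial correlations can be written as follows. The symbols are notation introduced here for illustration, not taken from the huge documentation: \Omega = \Sigma^{-1} is the precision matrix with entries \omega_{ij}, and \rho_{ij \mid \text{rest}} is the partial correlation of variables i and j given all the others.

```latex
\rho_{ij \mid \text{rest}} = -\frac{\omega_{ij}}{\sqrt{\omega_{ii}\,\omega_{jj}}}, \qquad i \neq j
```

Because the diagonal entries \omega_{ii} are positive, an off-diagonal precision entry always has the opposite sign to the corresponding partial correlation. A value of -0.201 read directly from opt.icov therefore corresponds to a positive partial correlation.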
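In your case, PDAC.all.select$opt.icov is the inverse covariance (precision) matrix. To get the partial correlations, change its sign and divide each entry by the product of the square roots of the corresponding diagonal elements: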
partial.corr = -cov2cor(PDAC.all.select$opt.icov)
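To see the sign relationship without the TCGA data, here is a minimal sketch on a small made-up precision matrix. The matrix Omega and the variable names A, B, C are hypothetical and chosen only for illustration. It uses base R only (solve and cov2cor).

```r
# Synthetic illustration (not the TCGA data): three variables with a known
# precision matrix, to show how the signs relate.
Omega <- matrix(c( 2.0, -0.8,  0.0,
                  -0.8,  2.0, -0.8,
                   0.0, -0.8,  2.0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

Sigma    <- solve(Omega)      # covariance matrix implied by the precision matrix
marginal <- cov2cor(Sigma)    # marginal correlations
partial  <- -cov2cor(Omega)   # partial correlations: scaled precision with flipped sign
diag(partial) <- 1            # -cov2cor() leaves -1 on the diagonal; reset by convention

marginal["A", "B"]   # positive marginal correlation
Omega["A", "B"]      # negative raw precision entry
partial["A", "B"]    # positive partial correlation
```

In this toy case the raw precision entry for A and B is negative while both the marginal and the partial correlation are positive, which matches the pattern described in the question. If opt.icov is not a plain numeric matrix in your version of huge, wrapping it in as.matrix() before calling cov2cor(), as the question already does, may be needed.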