Calculating average pairwise Pearson correlation coefficients from a data frame in R
Suppose I have the following vectors:
IDs_Complex_1 <- c("orangutan", "panda", "sloth", "mountain_gorilla", "dolphin", "snake")
IDs_Complex_2 <- c("bat", "penguin", "goat", "elephant", "tiger")
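For each vector, I want to calculate the pairwise Pearson correlation coefficients (PCC) between the tissue columns of the data frame below, taking the values vertically (down each column) for the animals in that vector. I then want the average PCC over all possible column pairs.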
Complex_ID Tissue_X Tissue_Y Tissue_Z
orangutan 5 6 7
panda 6 7 8
sloth 7 8 9
mountain_gorilla 100 60 50
dolphin 115 62 51
snake 130 59 67
bat 2 6 7
penguin 15 11 12
goat 22 23 86
elephant 14 22 109
tiger 0 1 7
To illustrate with Complex 1, I want to calculate:
PCC_1 <- PCC of (5, 6, 7, 100, 115, 130) and (6, 7, 8, 60, 62, 59)
PCC_2 <- PCC of (5, 6, 7, 100, 115, 130) and (7, 8, 9, 50, 51, 67)
PCC_3 <- PCC of (6, 7, 8, 60, 62, 59) and (7, 8, 9, 50, 51, 67)
and then the mean of (PCC_1, PCC_2, PCC_3) = ?
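For Complex 1, base R's `cor()` (which uses Pearson's method by default) can compute the three coefficients and their average directly. Below is a minimal sketch with the values typed in from the table. The variable names `tissue_x`, `tissue_y`, `tissue_z` and `mean_PCC` are illustrative.

```r
# Tissue values for the six Complex 1 animals, copied from the table above
tissue_x <- c(5, 6, 7, 100, 115, 130)
tissue_y <- c(6, 7, 8, 60, 62, 59)
tissue_z <- c(7, 8, 9, 50, 51, 67)

# cor() defaults to method = "pearson"
PCC_1 <- cor(tissue_x, tissue_y)
PCC_2 <- cor(tissue_x, tissue_z)
PCC_3 <- cor(tissue_y, tissue_z)

# Average of the three pairwise coefficients
mean_PCC <- mean(c(PCC_1, PCC_2, PCC_3))
mean_PCC
```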
However, with about 20 tissue columns there would be 20!/(2!·18!) = 190 pairwise correlation coefficients (unordered pairs, no repeats). How would I code this?
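The sketch below shows one way to scale this up. It is not the only approach. `combn()` enumerates every unordered pair of column indices, which for 20 columns gives the 190 pairs counted above. The helper name `mean_pairwise_pcc` is hypothetical, and the simulated matrix uses made-up values for illustration only.

```r
# Hypothetical helper (name chosen for illustration): mean of all pairwise
# Pearson correlations between the columns of a numeric matrix or data frame
mean_pairwise_pcc <- function(tissues) {
  tissues <- as.matrix(tissues)
  # combn(n, 2) returns a 2-row matrix with one column per unordered pair
  pairs <- combn(ncol(tissues), 2)
  pccs <- apply(pairs, 2, function(p) cor(tissues[, p[1]], tissues[, p[2]]))
  mean(pccs)
}

choose(20, 2)        # 190 unordered pairs among 20 columns
ncol(combn(20, 2))   # also 190: one column per pair

# Usage with simulated data: 6 animals x 20 tissue columns (made-up values)
set.seed(42)
sim <- matrix(rnorm(6 * 20), nrow = 6)
mean_pairwise_pcc(sim)
```

The correlation matrix is symmetric, so `cm <- cor(tissues); mean(cm[upper.tri(cm)])` gives the same average without an explicit loop over pairs.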
Many thanks!
Abigail
If df is your data.frame:
df = structure(list(Complex_ID = structure(c(6L, 7L, 9L, 5L, 2L, 10L,
1L, 8L, 4L, 3L, 11L), .Label = c("bat", "dolphin", "elephant",
"goat", "mountain_gorilla", "orangutan", "panda", "penguin",
"sloth", "snake", "tiger"), class = "factor"), Tissue_X = c(5L,
6L, 7L, 100L, 115L, 130L, 2L, 15L, 22L, 14L, 0L), Tissue_Y = c(6L,
7L, 8L, 60L, 62L, 59L, 6L, 11L, 23L, 22L, 1L), Tissue_Z = c(7L,
8L, 9L, 50L, 51L, 67L, 7L, 12L, 86L, 109L, 7L)), class = "data.frame", row.names = c(NA,
-11L))
you can do:
cor(df[,-1])
Tissue_X Tissue_Y Tissue_Z
Tissue_X 1.0000000 0.9748668 0.4119840
Tissue_Y 0.9748668 1.0000000 0.5440719
Tissue_Z 0.4119840 0.5440719 1.0000000
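Note that `cor(df[,-1])` above pools all eleven animals, whereas the question asks for one average per complex. Below is a minimal sketch of the per-complex version. It assumes the ID vectors from the question and rebuilds `df` with `data.frame()` so the block runs on its own. The approach is to subset the rows with `%in%`, compute the correlation matrix, and average its upper triangle, which counts each unordered pair once and excludes the diagonal. The names `complexes` and `mean_pcc_by_complex` are illustrative.

```r
# Same data as the dput() above, written out with data.frame()
df <- data.frame(
  Complex_ID = c("orangutan", "panda", "sloth", "mountain_gorilla", "dolphin",
                 "snake", "bat", "penguin", "goat", "elephant", "tiger"),
  Tissue_X = c(5, 6, 7, 100, 115, 130, 2, 15, 22, 14, 0),
  Tissue_Y = c(6, 7, 8, 60, 62, 59, 6, 11, 23, 22, 1),
  Tissue_Z = c(7, 8, 9, 50, 51, 67, 7, 12, 86, 109, 7)
)

# ID vectors from the question, collected into a named list
IDs_Complex_1 <- c("orangutan", "panda", "sloth", "mountain_gorilla", "dolphin", "snake")
IDs_Complex_2 <- c("bat", "penguin", "goat", "elephant", "tiger")
complexes <- list(Complex_1 = IDs_Complex_1, Complex_2 = IDs_Complex_2)

# For each complex: keep its rows, correlate the tissue columns,
# then average the upper triangle (each unordered pair once, no diagonal)
mean_pcc_by_complex <- sapply(complexes, function(ids) {
  rows <- df$Complex_ID %in% ids
  cm <- cor(df[rows, -1])
  mean(cm[upper.tri(cm)])
})
mean_pcc_by_complex
```

Adding more tissue columns needs no code change, because `cor()` and `upper.tri()` handle any number of columns. With 20 tissues, the upper triangle holds the 190 pairs.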