Extract certain values out of a correlation matrix
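Is there a way to pull individual correlation coefficients out of a correlation matrix?
Say I have a dataset with 3 variables (a, b, c) and I want to calculate the correlations among them.
With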
df <- data.frame(a = c(2, 3, 3, 5, 6, 9, 14, 15, 19, 21, 22, 23),
                 b = c(23, 24, 24, 23, 17, 28, 38, 34, 35, 39, 41, 43),
                 c = c(13, 14, 14, 14, 15, 17, 18, 19, 22, 20, 24, 26),
                 d = c(6, 6, 7, 8, 8, 8, 7, 6, 5, 3, 3, 2))
and
cor(df[, c('a', 'b', 'c')])
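I get a correlation matrix:
          a         b         c
a 1.0000000 0.9279869 0.9604329
b 0.9279869 1.0000000 0.8942139
c 0.9604329 0.8942139 1.0000000
Is there a way to display the results like this?
- The correlation between a and b is: 0.9279869.
- The correlation between a and c is: 0.9604329.
- The correlation between b and c is: 0.8942139.
My actual correlation matrix is considerably larger (about 300 entries), and I need a way to pull out only the values that matter to me.
Thanks.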
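You can do this:
# Flatten the correlation matrix into one row per (Var1, Var2) pair
df1 = cor(df[, c('a', 'b', 'c')])
df1 = as.data.frame(as.table(df1))
df1$Freq = round(df1$Freq, 2)
# Drop the diagonal, i.e. each variable paired with itself
df2 = subset(df1, (as.character(df1$Var1) != as.character(df1$Var2)))
df2$res = paste('Correlation between', df2$Var1, 'and', df2$Var2, 'is', df2$Freq)
df2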
  Var1 Var2 Freq                                 res
2    b    a 0.93 Correlation between b and a is 0.93
3    c    a 0.96 Correlation between c and a is 0.96
4    a    b 0.93 Correlation between a and b is 0.93
6    c    b 0.89 Correlation between c and b is 0.89
7    a    c 0.96 Correlation between a and c is 0.96
8    b    c 0.89 Correlation between b and c is 0.89
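If the goal is the sentence-style output from the question, with each pair listed only once, something along these lines might work in base R. This is only a sketch: it assumes the columns of df are actually named a, b and c (built with `a = ...` rather than `a <- ...`), and idx and sentences are just illustrative names.

```r
# Example data from the question, restricted to the three variables of interest.
df <- data.frame(
  a = c(2, 3, 3, 5, 6, 9, 14, 15, 19, 21, 22, 23),
  b = c(23, 24, 24, 23, 17, 28, 38, 34, 35, 39, 41, 43),
  c = c(13, 14, 14, 14, 15, 17, 18, 19, 22, 20, 24, 26)
)
m <- cor(df)

# Row/column positions of the cells above the diagonal:
# exactly one cell per unordered pair of variables.
idx <- which(upper.tri(m), arr.ind = TRUE)

# Build one sentence per pair from the row name, column name and coefficient.
sentences <- sprintf(
  "The correlation between %s and %s is: %.7f.",
  rownames(m)[idx[, "row"]],
  colnames(m)[idx[, "col"]],
  m[idx]
)
writeLines(sentences)
```

Because upper.tri() skips both the diagonal and the mirrored lower triangle, there is no need to filter out the "b and a" style repeats afterwards.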
Here is another idea: reshape to long format, i.e.
tidyr::pivot_longer(tibble::rownames_to_column(as.data.frame(cor(df[, c('a', 'b', 'c')])), var = 'rn'), -1)
# A tibble: 9 x 3
  rn    name  value
  <chr> <chr> <dbl>
1 a     a     1
2 a     b     0.928
3 a     c     0.960
4 b     a     0.928
5 b     b     1
6 b     c     0.894
7 c     a     0.960
8 c     b     0.894
9 c     c     1
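The long format still contains the diagonal and both orderings of every pair. One possible way to trim it, assuming the variable names are unique, is to keep only the rows where the first name sorts before the second. The names long and pairs below are made up for this sketch, and it needs the tidyr and tibble packages installed.

```r
library(tidyr)
library(tibble)

# Example data from the question, with properly named columns.
df <- data.frame(
  a = c(2, 3, 3, 5, 6, 9, 14, 15, 19, 21, 22, 23),
  b = c(23, 24, 24, 23, 17, 28, 38, 34, 35, 39, 41, 43),
  c = c(13, 14, 14, 14, 15, 17, 18, 19, 22, 20, 24, 26)
)

# Same reshaping as above: one row per (rn, name) combination.
long <- pivot_longer(
  rownames_to_column(as.data.frame(cor(df)), var = "rn"),
  cols = -rn
)

# For any two distinct names exactly one ordering satisfies rn < name,
# so this drops the diagonal and keeps each pair once.
pairs <- long[long$rn < long$name, ]
pairs
```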
Using reshape2 and melt
df <- data.frame("a" = c(2, 3, 3, 5, 6, 9, 14, 15, 19, 21, 22, 23),
"b" = c(23, 24, 24, 23, 17, 28, 38, 34, 35, 39, 41, 43),
"c" = c(13, 14, 14, 14, 15, 17, 18, 19, 22, 20, 24, 26),
"d" = c(6, 6, 7, 8, 8, 8, 7, 6, 5, 3, 3, 2))
tmp=cor(df[, c('a', 'b', 'c')])
tmp[lower.tri(tmp)]=NA
diag(tmp)=NA
library(reshape2)
na.omit(melt(tmp))
which results in
  Var1 Var2     value
4    a    b 0.9279869
7    a    c 0.9604329
8    b    c 0.8942139
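Since the question mentions a much larger matrix and only a few values of interest, it may also help to look pairs up by name instead of reshaping everything. The sketch below relies on base R indexing a matrix with a two-column character matrix, which is matched against the dimnames. The wanted pairs are purely hypothetical; replace them with the ones you care about.

```r
# Example data from the question, with properly named columns.
df <- data.frame(
  a = c(2, 3, 3, 5, 6, 9, 14, 15, 19, 21, 22, 23),
  b = c(23, 24, 24, 23, 17, 28, 38, 34, 35, 39, 41, 43),
  c = c(13, 14, 14, 14, 15, 17, 18, 19, 22, 20, 24, 26),
  d = c(6, 6, 7, 8, 8, 8, 7, 6, 5, 3, 3, 2)
)
m <- cor(df)

# Hypothetical list of the pairs that matter: one row per pair.
wanted <- rbind(
  c("a", "b"),
  c("b", "c"),
  c("a", "d")
)

# A two-column character matrix used as an index is matched against the
# row names (first column) and column names (second column) of m.
picked <- data.frame(
  var1 = wanted[, 1],
  var2 = wanted[, 2],
  r    = m[wanted]
)
picked
```

This never touches the rest of the matrix, so it scales to a few hundred variables without building the full long table first.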
Perhaps you can try as.table + as.data.frame
> as.data.frame(as.table(cor(df[, c("a", "b", "c")])))
  Var1 Var2      Freq
1    a    a 1.0000000
2    b    a 0.9279869
3    c    a 0.9604329
4    a    b 0.9279869
5    b    b 1.0000000
6    c    b 0.8942139
7    a    c 0.9604329
8    b    c 0.8942139
9    c    c 1.0000000
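If "the values that matter" means the strongest correlations, the same table can be filtered and sorted afterwards. That reading is an assumption on my part, and the cutoff of 0.9 is an arbitrary illustrative value. A rough sketch:

```r
# Example data from the question, with properly named columns.
df <- data.frame(
  a = c(2, 3, 3, 5, 6, 9, 14, 15, 19, 21, 22, 23),
  b = c(23, 24, 24, 23, 17, 28, 38, 34, 35, 39, 41, 43),
  c = c(13, 14, 14, 14, 15, 17, 18, 19, 22, 20, 24, 26),
  d = c(6, 6, 7, 8, 8, 8, 7, 6, 5, 3, 3, 2)
)
res <- as.data.frame(as.table(cor(df)))

# Var1 and Var2 are factors with the same level order, so comparing their
# integer codes keeps each pair once and drops the diagonal.
res <- res[as.integer(res$Var1) < as.integer(res$Var2), ]

# Hypothetical cutoff: keep pairs whose absolute correlation is at least 0.9.
cutoff <- 0.9
strong <- res[abs(res$Freq) >= cutoff, ]

# Strongest relationships first, regardless of sign.
strong <- strong[order(-abs(strong$Freq)), ]
strong
```

Using abs() keeps strong negative correlations as well, which matters here because d moves in the opposite direction to the other variables for most of its range.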