调整 ggpairs() 或相关矩阵的更好解决方案
Tweaking ggpairs() or a better solution to a correlation matrix
我正在尝试在我的 X 和 Y 变量之间创建一个相关矩阵,并以漂亮的图形显示此信息。我目前正在使用 GGally
包中的 ggpairs()
,但如果有更好的方法,那么我很乐意尝试新的东西。数字应该是:
-在 X 和 Y 变量之间拟合线性回归模型(使用 lm
)
-显示带有回归线的散点图
-显示测定系数(R2)
-按组映射 points/lines/R2 个值的颜色
我已经能够做到大部分,但是 ggpairs
只显示相关系数 (r) 而不是确定系数 (R2)。我能够使用 的建议,但不幸的是该解决方案没有按组显示 R2 值。
到目前为止:
library(GGally)
library(ggplot2)
cars <- mtcars
cars$group <- factor(c(rep("A", 16), rep("B", 16))) #adding grouping variable
#function to return R2 (coefficient of determination) and not just r (Coefficient of correlation) in the top portion of the figure
upper_fn <- function(data, mapping, ndp=2, ...){
# Extract the relevant columns as data
x <- eval_data_col(data, mapping$x)
y <- eval_data_col(data, mapping$y)
# Calculate the r^2 & format output
m <- summary(lm(y ~ x))
lbl <- paste("r^2: ", formatC(m$r.squared, digits=ndp, format="f"))
# Write out label which is centered at x&y position
ggplot(data=data, mapping=mapping) +
annotate("text", x=mean(x, na.rm=TRUE), y=mean(y, na.rm=TRUE), label=lbl, parse=TRUE, ...)+
theme(panel.grid = element_blank())
}
#lower function basically fits a linear model and displays points
lower_fn <- function(data, mapping, ...){
p <- ggplot(data = data, mapping = mapping) +
geom_point(alpha = 0.7) +
geom_smooth(method=lm, fill="blue", se = F, ...)
p
}
#The actual figure
ggpairs(cars,
columns = c(1:11),
mapping = ggplot2::aes(color = group),
upper = list(continuous = "cor", size = 15),
diag = list(continuous = "densityDiag", alpha=0.5),
lower = list(continuous = lower_fn))
基于 ,下面是一些入门代码。
想法是您需要 1.split
aes
thetic 变量(假定为 colour
)的数据,2.运行对每个数据子集进行回归并提取 r^2,3. 快速计算放置 r^2 标签的位置,4. 绘图。有些功能还有待完成。
upper_fn <- function(data, mapping, ndp=2, ...){
# Extract the relevant columns as data
x <- eval_data_col(data, mapping$x)
y <- eval_data_col(data, mapping$y)
col <- eval_data_col(data, mapping$colour)
# if no colour mapping run over full data
if(is.null(col)) {
## add something here
}
# if colour aesthetic, split data and run `lm` over each group
if(!is.null(col)) {
idx <- split(seq_len(nrow(data)), col)
r2 <- unlist(lapply(idx, function(i) summary(lm(y[i] ~ x[i]))$r.squared))
lvs <- if(is.character(col)) sort(unique(col)) else levels(col)
cuts <- seq(min(y, na.rm=TRUE), max(y, na.rm=TRUE), length=length(idx)+1L)
pos <- (head(cuts, -1) + tail(cuts, -1))/2
p <- ggplot(data=data, mapping=mapping, ...) +
geom_blank() +
theme_void() +
# you could map colours to each level
annotate("text", x=mean(x), y=pos, label=paste(lvs, ": ", formatC(r2, digits=ndp, format="f")))
}
return(p)
}
我正在尝试在我的 X 和 Y 变量之间创建一个相关矩阵,并以漂亮的图形显示此信息。我目前正在使用 GGally
包中的 ggpairs()
,但如果有更好的方法,那么我很乐意尝试新的东西。数字应该是:
-在 X 和 Y 变量之间拟合线性回归模型(使用 lm
)
-显示带有回归线的散点图
-显示测定系数(R2)
-按组映射 points/lines/R2 个值的颜色
我已经能够做到大部分,但是 ggpairs
只显示相关系数 (r) 而不是确定系数 (R2)。我能够使用
到目前为止:
library(GGally)
library(ggplot2)
cars <- mtcars
cars$group <- factor(c(rep("A", 16), rep("B", 16))) #adding grouping variable
#function to return R2 (coefficient of determination) and not just r (Coefficient of correlation) in the top portion of the figure
upper_fn <- function(data, mapping, ndp=2, ...){
# Extract the relevant columns as data
x <- eval_data_col(data, mapping$x)
y <- eval_data_col(data, mapping$y)
# Calculate the r^2 & format output
m <- summary(lm(y ~ x))
lbl <- paste("r^2: ", formatC(m$r.squared, digits=ndp, format="f"))
# Write out label which is centered at x&y position
ggplot(data=data, mapping=mapping) +
annotate("text", x=mean(x, na.rm=TRUE), y=mean(y, na.rm=TRUE), label=lbl, parse=TRUE, ...)+
theme(panel.grid = element_blank())
}
#lower function basically fits a linear model and displays points
lower_fn <- function(data, mapping, ...){
p <- ggplot(data = data, mapping = mapping) +
geom_point(alpha = 0.7) +
geom_smooth(method=lm, fill="blue", se = F, ...)
p
}
#The actual figure
ggpairs(cars,
columns = c(1:11),
mapping = ggplot2::aes(color = group),
upper = list(continuous = "cor", size = 15),
diag = list(continuous = "densityDiag", alpha=0.5),
lower = list(continuous = lower_fn))
基于
想法是您需要 1.split
aes
thetic 变量(假定为 colour
)的数据,2.运行对每个数据子集进行回归并提取 r^2,3. 快速计算放置 r^2 标签的位置,4. 绘图。有些功能还有待完成。
upper_fn <- function(data, mapping, ndp=2, ...){
# Extract the relevant columns as data
x <- eval_data_col(data, mapping$x)
y <- eval_data_col(data, mapping$y)
col <- eval_data_col(data, mapping$colour)
# if no colour mapping run over full data
if(is.null(col)) {
## add something here
}
# if colour aesthetic, split data and run `lm` over each group
if(!is.null(col)) {
idx <- split(seq_len(nrow(data)), col)
r2 <- unlist(lapply(idx, function(i) summary(lm(y[i] ~ x[i]))$r.squared))
lvs <- if(is.character(col)) sort(unique(col)) else levels(col)
cuts <- seq(min(y, na.rm=TRUE), max(y, na.rm=TRUE), length=length(idx)+1L)
pos <- (head(cuts, -1) + tail(cuts, -1))/2
p <- ggplot(data=data, mapping=mapping, ...) +
geom_blank() +
theme_void() +
# you could map colours to each level
annotate("text", x=mean(x), y=pos, label=paste(lvs, ": ", formatC(r2, digits=ndp, format="f")))
}
return(p)
}