Grouping factor levels to avoid overlap in ggplot2
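I am making a kind of number-line plot with ggplot2 and the text labels overlap each other. I already use geom_text_repel from the ggrepel package to keep the labels apart, but the plot gets messier as more factor levels have adjacent mean scores. The data and the code I use are given below.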
Category Dimension1
AcademicWriting -0.7
Brd.Discussions 0.6
Brd.Interviews -2.4
Brd.News 8.3
Brd.Talks 0
BusinessLetters 2.4
ClassLessons 0.2
Commentaries -12.9
Comments -1.2
CreativeWriting 1.4
Documentaries -1.4
F2FConversations -1.8
FBGroups 0.4
FBSt.Updates -1
Ind.Blogs 0.1
Inst.Writing 0.9
NBrd.Talks -0.1
NewsBlogs 0.4
NewsReports 7.1
Pol.Debates -1.4
PopularWriting 0.5
PressEditorials 1.8
SocialLetters 0.6
Speeches 3
StudentWriting -2
TechBlogs 1.7
ThesesPresentations -0.8
Tweets -2.8
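For anyone who wants to reproduce the plot, the table above can be read into a data frame as follows. This is only a sketch: the name d1 is taken from the code in this post, and pasting the table as inline text is an assumption about how the data is loaded.

```r
# Read the table above into a data frame called d1 (the name used in the code below).
d1 <- read.table(text = "
Category Dimension1
AcademicWriting -0.7
Brd.Discussions 0.6
Brd.Interviews -2.4
Brd.News 8.3
Brd.Talks 0
BusinessLetters 2.4
ClassLessons 0.2
Commentaries -12.9
Comments -1.2
CreativeWriting 1.4
Documentaries -1.4
F2FConversations -1.8
FBGroups 0.4
FBSt.Updates -1
Ind.Blogs 0.1
Inst.Writing 0.9
NBrd.Talks -0.1
NewsBlogs 0.4
NewsReports 7.1
Pol.Debates -1.4
PopularWriting 0.5
PressEditorials 1.8
SocialLetters 0.6
Speeches 3
StudentWriting -2
TechBlogs 1.7
ThesesPresentations -0.8
Tweets -2.8
", header = TRUE, stringsAsFactors = FALSE)

# Quick check of the structure: 28 rows, one character and one numeric column
str(d1)
```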
Code:
library(ggplot2)
library(ggrepel)
library(extrafont)
loadfonts(device = "win")
plot_graph <- function(d1, label_below = "", label_above = "")
{
  # Sort by score (descending) and put every point on the vertical line x = 0
  d1 <- d1[order(-d1[, 2]), ]
  d1$X <- 0
  # attach()/detach() are not needed: aes() looks the columns up in `data`
  plot1 <- ggplot(data = d1, aes(x = X, y = Dimension1, label = Category)) +
    geom_point() +
    geom_text_repel(aes(label = Category), direction = "x", family = "Times New Roman", size = 4, max.iter = 2e2) +
    theme_bw() +
    theme(axis.text.x = element_text(colour = "black"), axis.text.y = element_text(colour = "black")) +
    theme(text = element_text(family = "Times New Roman"),
          panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank(),
          panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank(),
          axis.title.x = element_blank(), axis.title.y = element_blank(),
          axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
    geom_vline(xintercept = 0, linetype = 1) +
    coord_cartesian(xlim = c(-3, 3)) +
    # Double-headed arrow on the left, with optional labels near its two ends
    geom_segment(aes(x = -2, y = 5 + min(Dimension1), xend = -2, yend = max(Dimension1) - 5), arrow = arrow(ends = "both"), alpha = 0.5, size = 0.5) +
    geom_text(aes(x = -2, y = 6 + min(Dimension1), label = label_below)) +
    geom_text(aes(x = -2, y = max(Dimension1) - 4, label = label_above))
  plot1
}
plot4 <- plot_graph(d1 = d1, label_below = "", label_above = "")
plot4
The result is shown in the figure below:
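I have read several similar threads and am not sure whether this can be solved. My own idea is to group the factor levels (that is, the labels) by adjacent mean scores. For example, AcademicWriting and FBSt.Updates (the 1st and 14th factor levels in the data above) could be grouped together after rounding their mean scores to -1, and then shown on the same horizontal line, separated by commas. However, I cannot work out how to do the grouping. That is why I am asking for your help with this, or for any other way to solve the overlap.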
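The rounding idea described above can be sketched in base R. This is only an illustration on a handful of rows from the data: the names d, rounded and labels are hypothetical, and round() to the nearest integer is just one possible grouping rule.

```r
# A few rows from the data above, enough to show the grouping
d <- data.frame(
  Category   = c("AcademicWriting", "FBSt.Updates", "ThesesPresentations",
                 "Comments", "Brd.Talks", "Ind.Blogs", "ClassLessons"),
  Dimension1 = c(-0.7, -1, -0.8, -1.2, 0, 0.1, 0.2),
  stringsAsFactors = FALSE
)

# Group key: the mean score rounded to the nearest integer
d$rounded <- round(d$Dimension1)

# One comma-separated label per rounded score
labels <- sapply(split(d$Category, d$rounded), paste, collapse = ", ")
labels
```

Each element of labels could then be drawn once, at the rounded score or at the mean score of its group, instead of drawing every category separately.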
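Here is an idea:
Cut the Dimension1 column into an arbitrary number of groups, group by the resulting cut variable, paste the category names together, and compute the y coordinate for each group. I mapped the text and the points to the same color, but that is not necessary.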
library(tidyverse)

d1 %>%
  arrange(desc(Dimension1)) %>%
  # Bin the scores into 32 equal-width intervals; all points sit at x = 0
  mutate(cut = cut(Dimension1, 32),
         X = 0) %>%
  group_by(cut) %>%
  # One comma-separated label per bin, placed at the bin's mean score;
  # label2 keeps the label on the first row of each bin only
  mutate(label = paste(Category, collapse = ", "),
         coord = mean(Dimension1),
         label2 = ifelse(duplicated(label), NA, label)) %>%
  ungroup() %>%
  ggplot(aes(x = X, y = Dimension1, label = Category, color = label)) +
  geom_segment(aes(x = -0.25, y = 5 + min(Dimension1), xend = -0.25, yend = max(Dimension1) - 5), arrow = arrow(ends = "both"), alpha = 0.5, size = 0.5) +
  geom_point() +
  geom_text(aes(label = label2, x = X + 0.05, y = coord, color = label), family = "Times New Roman", size = 4, hjust = 0) +
  theme_bw() +
  theme(axis.text.x = element_text(colour = "black"),
        axis.text.y = element_text(colour = "black")) +
  theme(text = element_text(family = "Times New Roman"),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        legend.position = "none") +
  geom_vline(xintercept = 0, linetype = 1) +
  coord_cartesian(xlim = c(-0.5, 3))
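When cut() is given a single number such as 32, it splits the observed range of the scores into that many equal-width intervals, so the bin width depends on the most extreme values. If a fixed width in score units is preferred, explicit breaks can be passed instead. The base-R sketch below assumes a bin width of 1; the names scores, bins and labels are hypothetical, and only a few rows of the data are used.

```r
# A few scores from the data above, as a named vector
scores <- c(AcademicWriting = -0.7, FBSt.Updates = -1, Comments = -1.2,
            Brd.News = 8.3, NewsReports = 7.1)

# Fixed-width bins of one score unit; intervals are right-closed, e.g. (-2,-1]
bins <- cut(scores, breaks = seq(-13, 9, by = 1))

# Paste together the categories that fall into the same bin,
# dropping the bins that contain no category
labels <- sapply(split(names(scores), bins, drop = TRUE), paste, collapse = ", ")
labels
```

With these breaks, FBSt.Updates (-1) and Comments (-1.2) share a bin while AcademicWriting (-0.7) falls into the next one, which shows how the choice of breaks decides which labels are merged.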