With ggplot2, what code creates bars made of individual words and their counts?
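To convey the relative frequency of key words, I would like each "bar" in a plot to consist of a word repeated vertically according to its frequency. The ggplot code below removes the outline and fill of the bars, but how can I create a "stack" of words as (or in) each bar, based on the word's frequency? Thus "global" would start at the x-axis and repeat vertically three times, at 1, 2 and 3 on the y-axis; "local" would stack five times, and so on.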
# a toy data frame
words <- c("global", "local", "firm")
freq <- c(3, 5, 6)
# build the columns directly; data.frame(cbind(words, freq)) would coerce freq to character
df <- data.frame(words, freq)
library("ggplot2")
library("ggthemes")
# a very unimpressive and uninformative plot
ggplot(df, aes(x = words, y = freq)) +
  geom_bar(stat = "identity", fill = "transparent", colour = "white") +
  theme_tufte()
I tried using annotation_custom() with textGrob, but couldn't figure out how to repeat each word according to its frequency.
Thank you for any guidance.
Here's a quick hack that might meet your needs (though I'd bet there's a better way to do this):
library(ggplot2)
library(ggthemes)
library(dplyr)
# Data frame with each word appearing a number of times equal to its frequency
# (uses the `words` and `freq` vectors defined in the question)
df.freq = data.frame(words = rep(words, freq))
# Add a counter from 1 to freq for each word.
# This will become the `y` value in the graph.
df.freq = df.freq %>%
  group_by(words) %>%
  mutate(counter = row_number())
# Graph the words as if they were points in a scatterplot
# (counter - 0.5 centers each copy of the word within its unit interval)
p1 = ggplot(df.freq, aes(words, counter - 0.5)) +
  geom_text(aes(label = words), size = 12) +
  scale_y_continuous(limits = c(0, max(df.freq$counter))) +
  labs(x = "Words", y = "Freq") +
  theme_tufte(base_size = 20) +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
# Save the plot, adjusting the aspect ratio so that the words stack nicely
# without large gaps between each copy of the word
pdf("word stack.pdf", 6, 3.5)
print(p1)
dev.off()
Here's a PNG version, since SO doesn't display PDF files.
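The expanded data frame can also be built without dplyr. Below is a minimal base R sketch, assuming the same `words` and `freq` vectors as in the question (the name `df.freq2` is only illustrative): `rep()` repeats each word and `sequence()` generates the 1-to-freq counter for each one.

```r
# Toy data from the question
words <- c("global", "local", "firm")
freq <- c(3, 5, 6)

# One row per copy of each word; sequence(freq) yields 1..3, 1..5, 1..6
df.freq2 <- data.frame(
  words = rep(words, freq),
  counter = sequence(freq)
)
```

`df.freq2` should be usable in place of `df.freq` in the ggplot call, since it has the same `words` and `counter` columns.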
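If you're not set on using a stack of words, another option is to stick with a bar plot and add the word to the middle of each bar. For example: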
# a toy data frame
words <- c("global", "local", "firm")
freq <- c(3, 5, 6)
df <- data.frame(words, freq)
ggplot(df, aes(words, freq)) +
  geom_bar(stat = "identity", fill = hcl(195, 100, 65)) +
  geom_text(aes(label = words, y = freq * 0.5), colour = "white", size = 10) +
  theme_tufte(base_size = 20) +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
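As a possible variation on the bar version, ggplot2's `position_stack(vjust = 0.5)` can place each label halfway up its bar, so the `y = freq * 0.5` mapping isn't needed. This is only a sketch on the same toy data: the name `p2` is illustrative, and `theme_minimal()` stands in for `theme_tufte()` purely to avoid the ggthemes dependency.

```r
library(ggplot2)

# Toy data from the question
words <- c("global", "local", "firm")
freq <- c(3, 5, 6)
df <- data.frame(words, freq)

p2 <- ggplot(df, aes(words, freq)) +
  # geom_col() draws bars whose heights are the y values
  geom_col(fill = hcl(195, 100, 65)) +
  # vjust = 0.5 positions each label at the midpoint of its bar
  geom_text(aes(label = words),
            position = position_stack(vjust = 0.5),
            colour = "white", size = 10) +
  theme_minimal(base_size = 20) +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
```

Printing `p2` draws the plot. With a single bar per word the result should match the `freq * 0.5` approach; the stacking position would matter more if each bar were split into several segments.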