数据点可以在带状图中标记吗?

Can data points be labeled in stripcharts?

我找不到在 stripchart 中标记数据点的方法。使用 text 函数,如本 question 中所建议,当点堆叠或抖动时会崩溃。

我有 4 个类别(第 2-5 列)的数值数据,我想用首字母(第 1 列)标记每个数据点。

这是我的数据和我试过的代码:

initials,total,interest,slides,presentation
CU,1.6,1.7,1.5,1.6
DS,1.6,1.7,1.5,1.7
VA,1.7,1.5,1.5,2.1
MB,2.3,2.0,2.1,2.9
HS,1.2,1.3,1.4,1.0
LS,1.8,1.8,1.5,2.0

stripchart(CTscores[-1], method = "stack", las = 1)
text(CTscores$total + 0.05, 1, labels = CTscores$name, cex = 0.5)

下面的情节是我迄今为止最好的情节。如您所见,数据点标签重叠。此外,最长的 y 标签被切断。

可以在条形图中标记点吗?或者我必须用另一个命令显示它以允许标记吗?

如何使用标签作为点标记,而不是使用单独的标签?这是一个使用 ggplot2 而不是基本图形的示例。

为了避免重叠,我们直接设置重复值的垂直偏移量,而不是随机抖动。为此,我们需要分配数值 y-values(以便我们可以添加偏移量),然后将数值轴标签替换为适当的文本标签。

library(ggplot2)
library(reshape2)
library(dplyr)

# Convert data from "wide" to "long" format
CTscores.m = melt(CTscores, id.var="initials")

# Create an offset that we'll use for vertically separating the repeated values
CTscores.m = CTscores.m %>% group_by(variable, value) %>%
  mutate(repeats = ifelse(n()>1, 1,0),
         offset = ifelse(repeats==0, 0, seq(-n()/25, n()/25, length.out=n())))

ggplot(CTscores.m, aes(label=initials, x=value, y=as.numeric(variable) + offset,
                       color=initials)) +
  geom_text() +
  scale_y_continuous(labels=sort(unique(CTscores.m$variable))) +
  theme_bw(base_size=15) +
  labs(y="", x="") +
  guides(color=FALSE)

为了完整起见,以下是如何为重复值创建具有抖动而不是特定偏移量的图形:

# Convert data from "wide" to "long" format
CTscores.m = melt(CTscores, id.var="initials")

# Mark repeated values (so we can selectively jitter them later)
CTscores.m = CTscores.m %>% group_by(variable, value) %>%
  mutate(repeats = ifelse(n()>1, 1,0))

# Jitter only the points with repeated values
set.seed(13)
ggplot() +
  geom_text(data=CTscores.m[CTscores.m$repeats==1,], 
            aes(label=initials, x=value, y=variable, color=initials),
            position=position_jitter(height=0.25, width=0)) +
  geom_text(data=CTscores.m[CTscores.m$repeats==0,], 
            aes(label=initials, x=value, y=variable, color=initials)) +
  theme_bw(base_size=15) +
  guides(color=FALSE)

这里有一个替代方法,允许您向条形图添加颜色以识别首字母:

library(ggplot2)
library(reshape2)
library(gtable)
library(gridExtra)

# Gets default ggplot colors
gg_color_hue <- function(n) {
  hues = seq(15, 375, length=n+1)
  hcl(h=hues, l=65, c=100)[1:n]}

# Transform to long format
CTscores.m = melt(CTscores, id.var="initials")

# Create a vector of colors with keys for the initials
colvals <- gg_color_hue(nrow(CTscores))
names(colvals) <- sort(CTscores$initials)

# This color vector needs to be the same length as the melted dataset
cols <- rep(colvals,ncol(CTscores)-1)

# Create a basic plot that will have a legend with the desired attributes
g1 <- ggplot(CTscores.m, aes(x=variable, y=value, fill=initials)) +
  geom_dotplot(color=NA)+theme_bw()+coord_flip()+scale_fill_manual(values=colvals)

# Extract the legend
fill.legend <- gtable_filter(ggplot_gtable(ggplot_build(g1)), "guide-box") 
legGrob <- grobTree(fill.legend)

# Create the plot we want without the legend
g2 <- ggplot(CTscores.m, aes(x=variable, y=value)) +
  geom_dotplot(binaxis="y", stackdir="up",binwidth=0.03,fill=cols,color=NA) +
  theme_bw()+coord_flip()

# Create the plot with the legend
grid.arrange(g2, legGrob, ncol=2, widths=c(10, 1))