ggplot: define color for point overlaps

Using ggplot2, I want to plot two vectors (vec1_num, vec2_num) against each other in 2D and color the points by a grouping variable (vec3_char). Some of the data points overlap.

library(ggplot2)
vec1_num = c(1,2,3,4,1,3,4,5,5,5)
vec2_num = c(1,2,3,4,1,3,4,5,5,5)
vec3_char = c("A", "B", "C", "A", "B", "C", "C", "A", "B", "C")

# plot 1
ggplot(data = NULL) +
  geom_point(aes(x=vec1_num, y=vec2_num, colour=vec3_char), alpha=0.4, size=4) +
  scale_colour_manual(values=c("A"="darkblue", "B"="darkred", "C"="orange")) +
  theme(panel.grid = element_blank())

I know I can make the overlaps less of a problem by lowering alpha or by adding a little noise with geom_jitter, like this:

# plot 2
ggplot(data = NULL) +
  geom_jitter(aes(x=vec1_num, y=vec2_num, colour=vec3_char), alpha=0.4, size=4, width = 0.1) +
  scale_colour_manual(values=c("A"="darkblue", "B"="darkred", "C"="orange")) +
  theme(panel.grid = element_blank())

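But is it possible to use plot 1 and color the overlapping points differently? E.g., "A" = "darkblue", "AB" = "black", "ABC" = "grey", "B" = "darkred", "BC" = "pink", "C" = "orange"? Could I additionally add a small Venn diagram (as a legend) to visualize the color choices for the point overlaps?

Thanks!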

I would first create a data frame. Then, for each x-y combination (list(df$vec1_num, df$vec2_num)), I would extract the characters that are present (...unique(xy_i$vec3_char)...). Like this:

df <- data.frame(vec1_num, vec2_num, vec3_char)
# For each (x, y) position, collapse the distinct groups found there into one label
df_new <- do.call("rbind.data.frame", by(df, list(df$vec1_num, df$vec2_num), function(xy_i) {
  chars_i <- paste0(sort(unique(xy_i$vec3_char)), collapse = "")
  xy_i$chars_comb <- factor(chars_i, levels = c("A", "AB", "AC", "ABC", "B", "BC", "C"))
  xy_i
}))

If you make the plot now, it shows which characters overlap at which point.

ggplot(data = df_new) +
  geom_point(aes(x=vec1_num, y=vec2_num, colour=chars_comb), alpha=0.4, size=4) +
  scale_colour_manual(values=c("A" = "darkblue", "AB" = "black", "ABC" = "grey", "B" = "darkred", "BC" = "pink", "C" = "orange", "AC" = "red")) +
  theme(panel.grid = element_blank())
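
For what it's worth, the per-position labelling step can also be sketched in base R with ave() instead of by() plus rbind. This is only a sketch, assuming that pasting x and y together gives one key per position; the name key is made up for illustration and is not part of the answer above.

```r
# Same example data as in the question
vec1_num  <- c(1, 2, 3, 4, 1, 3, 4, 5, 5, 5)
vec2_num  <- c(1, 2, 3, 4, 1, 3, 4, 5, 5, 5)
vec3_char <- c("A", "B", "C", "A", "B", "C", "C", "A", "B", "C")
df <- data.frame(vec1_num, vec2_num, vec3_char, stringsAsFactors = FALSE)

# Hypothetical helper: one key per (x, y) position
key <- paste(df$vec1_num, df$vec2_num)

# For every row, the sorted set of groups found at that row's position
df$chars_comb <- ave(df$vec3_char, key,
                     FUN = function(g) paste(sort(unique(g)), collapse = ""))

# Overlap label per position, one row each
unique(df[, c("vec1_num", "vec2_num", "chars_comb")])
```

df could then be passed to the ggplot() call above in place of df_new; note that chars_comb is a plain character column here rather than a factor.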

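The way I would do this is to convert the letters to numbers, add them up, and convert the sums back to letters.

NB: one complication is that the letters have to be A, B, D, H, ... (positions 1, 2, 4, 8, ... in the alphabet), so that each sum can be formed in only one way. There may be a way to start from A, B, C, ... and encode those as unique values, though.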

library(tidyverse)
vec1_num = c(1,2,3,4,1,3,4,5,5,5)
vec2_num = c(1,2,3,4,1,3,4,5,5,5)
vec3_char = c("A", "B", "D", "A", "B", "D", "D", "A", "B", "D")

removeDup <- function(str) paste(rle(strsplit(str, "")[[1]])$values, collapse="") # Function to remove duplicated values in a string

data <- data.frame(x = vec1_num, y = vec2_num, col = match(vec3_char, LETTERS))

data <- data %>% 
  group_by(x) %>%
  mutate(colour = glue::glue_collapse(col, sep = "")) %>%
  select(-col) %>% 
  distinct(x, y, .keep_all = TRUE) %>% 
  mutate(colour = removeDup(colour)) %>%
  mutate(colour = sapply(str_extract_all(colour, '\\d'), function(x) sum(as.integer(x)))) %>% 
  mutate(colour = case_when(
    colour == 1 ~ "A",
    colour == 2 ~ "B",
    colour == 3 ~ "AB",
    colour == 4 ~ "D",
    colour == 5 ~ "AD",
    colour == 6 ~ "BD",
    colour == 7 ~ "ABD"
  ))

# plot 1
ggplot(data) +
  geom_point(aes(x=x, y=y, colour = as_factor(colour)), alpha=0.4, size=4) +
  geom_text(aes(x = x, y = y, label = colour), vjust = 2) +
  scale_colour_manual(values=c("A"="darkblue", "B"="darkred", "AB"="orange", "D" = "green", "AD" = "black", "BD" = "orange", "ABD" = "purple"), name = "Colour") +
  theme(panel.grid = element_blank())
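
Regarding the NB above: one possible way to keep the original A, B, C labels is to give the k-th distinct letter the weight 2^(k-1), so that every sum still corresponds to exactly one combination of letters. A rough base R sketch of that idea follows; weights, groups, codes and decode are names invented for this illustration, and it swaps the string-based pipeline above for plain arithmetic.

```r
# Original example data from the question (letters A, B, C)
vec1_num  <- c(1, 2, 3, 4, 1, 3, 4, 5, 5, 5)
vec2_num  <- c(1, 2, 3, 4, 1, 3, 4, 5, 5, 5)
vec3_char <- c("A", "B", "C", "A", "B", "C", "C", "A", "B", "C")

# Assumption: the k-th distinct letter gets weight 2^(k-1), i.e. A = 1, B = 2, C = 4
letters_used <- sort(unique(vec3_char))
weights <- setNames(2^(seq_along(letters_used) - 1), letters_used)

# Group the letters by (x, y) position and sum the weights of the distinct ones
groups <- split(vec3_char, paste(vec1_num, vec2_num))
codes  <- vapply(groups, function(g) sum(weights[unique(g)]), numeric(1))

# Decode a sum back into letters: a letter is present if its binary digit is 1
decode <- function(code) {
  paste(letters_used[(code %/% weights) %% 2 == 1], collapse = "")
}
labels <- vapply(codes, decode, character(1))
labels
```

Because the weights are powers of two, no case_when() lookup table is needed for decoding, and the same code should work unchanged if more groups are added.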