基于 R 中 x-axis 的相似模式对变量 geom_point 进行排序

Question

我怎样才能对变量进行排序，以便将它们绘制成热量 map/where 它们显示出相似的模式，即：在顶部 A 和 D，然后是 B、C 和底部 E。想要避免手动操作，因为真实数据有更多变量。

Variable1 <- c(rep("A",7), rep("B",7),rep("C",7), rep("D",7), rep("E",7))
Variable2 <- c(rep(1:7, 5))
value <- c(15, 16, 11, 12, 13, 11, 12, 4, 3, 6, 5, 4, 3, 2, 3, 3, 2, 3, 3, 4, 3, 18, 17, 15, 2, 3, 4, 5, 2, 3, 4, 5, 6, 10, 18)
dff <- data.frame(Variable1, Variable2, value)

library(dplyr)
dff <- dff %>%group_by(Variable1)%>%
  mutate(scaled_val = scale(value)) %>%
  ungroup()

dff$Variable <- factor(dff$Variable1,levels=rev(unique(dff$Variable1)))

ggplot(dff, aes(x = Variable2, y = Variable1, label=NA)) +
  geom_point(aes(size = scaled_val, colour = value)) +
  geom_point(aes(size = scaled_val, colour = value), shape=21, colour="black") +
  geom_text(hjust = 1, size = 2) +
  theme_bw()+
  scale_color_gradient(low = "lightblue", high = "darkblue")+
  scale_x_discrete(expand=c(1,0))+
  coord_fixed(ratio=4)

并且想要：

如果您通过相似性查看具有聚类行的热图，例如：https://3.bp.blogspot.com/-AI2dxe95VHk/TgTJtEkoBgI/AAAAAAAAC5w/XCyBw3qViGA/s400/heatmap_cluster2.png 您会在顶部看到其模式首先是 x-axis 个时间点的行，然后是更高的最后 x-axis 个时间点..

要做：所以我想知道是否使用缩放值，我们可以这样做顶部是 Variable2 (1:2) 中具有较高平均值的那些，然后是 Variable2 (3:5) 中较高平均值的那些，然后是 Variable2 (6 :7).让我知道如果我在这里不清楚并且可以更好地解释。

Answer 1

听起来您想根据 A-E 的平均值来排列组。您可以通过将 Variable1 转换为具有自定义级别的因子来做到这一点：

lvls <- names(sort(by(dff$value, dff$Variable1, mean)))
dff$Variable1 <- factor(dff$Variable1, levels = lvls)

这是一个按 which.max 对组进行排序的解决方案：

peaks <- c(by(dff$value, dff$Variable1, which.max))
lvls <- names(sort(peaks))
dff$Variable1 <- factor(dff$Variable1, levels = lvls)

基于 R 中 x-axis 的相似模式对变量 geom_point 进行排序

Order variables geom_point based on similar pattern across x-axis in R

r

ggplot2

dplyr

geom-point