Label or Highlight Specific Rows in ggplot2
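I have a nice-looking geom_tile plot, but I need a way to highlight specific rows, or to label specific rows, based on a binary value.
Here is a small sample of the data in wide format, with the resulting output: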
df <- structure(list(bin_level = c(0,1), sequence = c("L19088.1", "chr1_43580199_43586187"), X236 = c("G", "."), X237 = c("G", "."), X238 = c("A", "a"),
X239 = c("T", "C"), X240 = c("A", "c"), X241 = c("G", "G"
)), class = "data.frame", row.names = 1:2)
> df
  bin_level               sequence X236 X237 X238 X239 X240 X241
1         0               L19088.1    G    G    A    T    A    G
2         1 chr1_43580199_43586187    .    .    a    C    c    G
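The actual dataset is much larger: 1045 observations of 3096 variables.
My goal is to plot this large dataset as a heatmap with a different colour for each nucleotide AND to be able to tell rows with bin_level 0 apart from rows with bin_level 1.
The code below makes a nice plot, but it doesn't show the bin_level difference I need to see. If bin_level is 1, I'd like to highlight the entire row, but I haven't found any information on how to do that. I'm already using nucleotide for the fill aesthetic, so I need something else. The best option I've come up with so far is to colour the row labels. I tried an ifelse statement to colour them by the bin_level variable.
The biggest problems here are:
- The row axis labels are too long and too numerous to look good.
- Only 53 rows have bin_level 1 (out of 1045 rows), so why does the plot look much redder than it should?
- I want the red labels (bin_level = 1) at the top of the plot. The mix of black and red makes me think my arrange(bin_level) isn't working properly.
If you know a better way to do what I'm trying to accomplish, or can help my code work better than it currently does, please let me know. Thanks!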
df %>%
  ## reshape to long table
  ## (one column each for sequence, position and nucleotide):
  pivot_longer(-c("sequence", "bin_level"), ## stack all columns *except* sequence and bin_level
               names_to = 'position',
               values_to = 'nucleotide'
  ) %>%
  arrange(bin_level) %>%
  ## create the plot:
  ggplot() +
  geom_tile(aes(x = position, y = sequence, fill = nucleotide),
            height = 1 ## adjust to visually separate sequences
  ) +
  scale_fill_manual(values = c('a'='#ea0064', 'c'='#008a3f', 'g'='#116eff',
                               't'='#cf00dc', '\U00B7'='#000000', 'X' ='#ffffff'
                    )
  ) +
  labs(x = 'x-axis-title', y='Sequence') +
  ## remove x-axis (=position) elements: they'll probably be too dense:
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.ticks.y = element_blank(),
        axis.text.y = element_text(colour = ifelse(levels(df$bin_level)==1, "red", "black"))
  )
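Although you arrange the data by bin_level before feeding it to ggplot, the vertical arrangement of the plot follows the y values (i.e. the sequence). You can create a combination of bin_level and sequence to arrange and plot the data by: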
df %>%
  ...
  ## reformat bin_level to a three-digit character, so that
  ## 002 properly precedes 011 (otherwise 11 would come before 2)
  mutate(dummy = paste(sprintf('%03.0f', bin_level),
                       sequence, sep = '_')) %>%
  arrange(dummy) %>%
  ...
  ## ggplot instructions:
  ggplot() + ... +
  geom_tile(aes(y = dummy, ...)) +
  ## remove only the leading bin_level prefix ('00x_') for labelling;
  ## a greedy '.*_' would also eat underscores inside the sequence name:
  scale_y_discrete(labels = function(x) sub('^[0-9]{3}_', '', x)) +
  ... +
  theme(axis.text.y = element_text(
    ## note: df$bin_level NOT levels(df$bin_level);
    ## sorted so the colours follow the axis order (all 0s, then all 1s),
    ## assuming one row per sequence in the wide df
    colour = ifelse(sort(df$bin_level) == 1, "red", "black"))
  )
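The remark about df$bin_level versus levels(df$bin_level) can be checked without drawing anything. The following is only a sketch on made-up values (bin_level and bin_factor are hypothetical stand-ins for the real column): levels() is defined for factors only, so on a numeric column it returns NULL and the ifelse() produces no colours at all, while on a factor it produces just one colour per level.

```r
# Hypothetical stand-in for the bin_level column (made-up values).
bin_level <- c(0, 1, 0, 1)

# levels() only applies to factors; a numeric vector has none.
lv <- levels(bin_level)

# Comparing NULL with 1 yields a zero-length result: no colours come out.
from_levels <- ifelse(lv == 1, "red", "black")

# Comparing the values themselves yields one colour per row.
from_values <- ifelse(bin_level == 1, "red", "black")

# Assumption: if the real column were a factor, levels() would give
# "0" and "1", i.e. only two colours regardless of the number of rows.
bin_factor <- factor(bin_level)
from_factor <- ifelse(levels(bin_factor) == 1, "red", "black")

print(length(from_levels))
print(from_values)
print(from_factor)
```

If the real bin_level is a factor, a two-colour vector reused along the axis would alternate black and red, which may be why the plot looks redder than 53 flagged rows would suggest.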
Note that colouring labels by passing a vector to element_text might not work in the future:
Vectorized input to element_text() is not officially supported. Results may be unexpected or may change in future versions of ggplot2. (console warning)
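While passing a vector of colours to element_text() is a quick option in some cases, IMHO it is error-prone in more general cases and requires close attention to how your data are ordered. Instead, I would suggest having a look at the ggtext package, which introduces the theme element element_markdown and allows text to be styled with some HTML, CSS and markdown.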
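Besides the issue already pointed out by @I_O, another problem is that you put the data manipulation steps and the plotting code in one pipeline. As a result, although you arrange the data by bin_level, the colour assignment still uses the original, unprocessed and unarranged dataset df, which, by the way, is still in wide format. That's why I personally always recommend keeping data wrangling and plotting separate, except for very simple cases.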
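Finally, although you arrange your data by bin_level, what really matters is the order of sequence, i.e. you have to set the order of sequence after arranging, for which I use forcats::fct_inorder.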
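To see why the order of sequence matters more than the row order, here is a minimal base R sketch on a hypothetical three-row data frame (toy and its values are made up for illustration). It uses order() in place of arrange() and factor(x, levels = unique(x)) in place of forcats::fct_inorder(), which orders levels by first appearance in the same way.

```r
# Hypothetical toy data: three sequences, two of them flagged.
toy <- data.frame(
  sequence  = c("seq_c", "seq_a", "seq_b"),
  bin_level = c(1, 0, 1)
)

# Sort the rows by bin_level, as arrange(bin_level) would.
arranged <- toy[order(toy$bin_level), ]

# A plain character column turned into a factor gets alphabetical levels,
# so the arranged row order is ignored on a discrete axis.
default_levels <- levels(factor(arranged$sequence))

# Setting the levels by first appearance keeps the arranged order.
inorder_levels <- levels(
  factor(arranged$sequence, levels = unique(arranged$sequence))
)

print(default_levels)
print(inorder_levels)
```

On a discrete y axis the first level is drawn at the bottom, so with rows sorted by ascending bin_level the flagged sequences end up at the top of the plot.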
Note: To make your example a bit more realistic, I duplicated your dataset to add two more rows.
library(tidyr)
library(dplyr)
library(ggplot2)

df_long <- df %>%
  pivot_longer(-c("sequence", "bin_level"),
    names_to = "position",
    values_to = "nucleotide"
  ) %>%
  arrange(bin_level) %>%
  mutate(
    sequence = if_else(bin_level == 1, paste0("<span style='color: red'>", sequence, "</span>"), sequence),
    sequence = forcats::fct_inorder(sequence)
  )

ggplot(df_long) +
  geom_tile(aes(x = position, y = sequence, fill = nucleotide),
    height = 1
  ) +
  scale_fill_manual(values = c(
    "a" = "#ea0064", "c" = "#008a3f", "g" = "#116eff",
    "t" = "#cf00dc", "\U00B7" = "#000000", "X" = "#ffffff"
  )) +
  labs(x = "x-axis-title", y = "Sequence") +
  theme(
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.ticks.y = element_blank(),
    axis.text.y = ggtext::element_markdown()
  )
Data
df <- structure(list(
bin_level = c(0, 1), sequence = c("L19088.1", "chr1_43580199_43586187"), X236 = c("G", "."), X237 = c("G", "."), X238 = c("A", "a"),
X239 = c("T", "C"), X240 = c("A", "c"), X241 = c("G", "G")
), class = "data.frame", row.names = 1:2)
df1 <- structure(list(
bin_level = c(0, 1), sequence = c("L19088.2", "chr1_43580199_43586187.2"), X236 = c("G", "."), X237 = c("G", "."), X238 = c("A", "a"),
X239 = c("T", "C"), X240 = c("A", "c"), X241 = c("G", "G")
), class = "data.frame", row.names = 1:2)
df <- dplyr::bind_rows(df, df1)