Drawing a contour line around connected cells in a heatmap in R
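I have data with two time axes and a measurement for each cell. From this I create a heatmap. I also know, for each cell, whether the measurement is significant.
My problem is drawing a contour line around all significant cells. If cells form a cluster with the same significance value, I need the outline drawn around the cluster rather than around each individual cell.
The data are in the following format: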
   x_time y_time    metric signif
1       1      1 0.3422285  FALSE
2       2      1 0.6114085  FALSE
3       3      1 0.5381621  FALSE
4       4      1 0.5175120  FALSE
5       1      2 0.6997991  FALSE
6       2      2 0.3054885  FALSE
7       3      2 0.8353888   TRUE
8       4      2 0.3991566   TRUE
9       1      3 0.7522728   TRUE
10      2      3 0.5311418   TRUE
11      3      3 0.4972816   TRUE
12      4      3 0.4330033   TRUE
13      1      4 0.5157972   TRUE
14      2      4 0.6324151   TRUE
15      3      4 0.4734126   TRUE
16      4      4 0.4315119   TRUE
The code below generates data of this form, where the measurements are random (dt$metric) and the significance is a logical (dt$signif).
# data example
dt <- data.frame(x_time = rep(seq(1, 4), 4),
                 y_time = rep(seq(1, 4), each = 4),
                 metric = rnorm(16, 0.5, 0.2),
                 signif = c(rep(FALSE, 6), rep(TRUE, 10)))
The heatmap alone can be generated with ggplot2's geom_tile:
# Generate heatmap using ggplot2's geom_tile
library(ggplot2)
p <- ggplot(data = dt, aes(x = x_time, y = y_time))
p <- p + geom_tile(aes(fill = metric))
Based on this question, I managed to draw differently coloured outlines around each cell according to its significance value.
# Heatmap with lines around each significant cell
p <- ggplot(data = dt, aes(x = x_time, y = y_time))
p <- p + geom_tile(aes(fill = metric, color = signif), size = 2)
p <- p + scale_color_manual(values = c("black", "white"))
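However, this approach does not group adjacent significant cells together by drawing a single outline around the whole group (as is also discussed in the question I linked to).
As shown in this question, boxes can be drawn around specified areas, but I don't think this extends to all possible clusters of cells.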
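Of course, this gets a bit tedious if you are creating many heatmaps (even though a data frame with the necessary values could be built from your data), but otherwise you can use geom_segments: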
p + geom_segment(aes(x = .5, xend = 4.5, y = 4.5, yend = 4.5), colour = "white", size = 2) +
geom_segment(aes(x = .5, xend = 2.5, y = 2.5, yend = 2.5), colour = "white", size = 2) +
geom_segment(aes(x = 2.5, xend = 4.5, y = 1.5, yend = 1.5), colour = "white", size = 2) +
geom_segment(aes(x = .5, xend = .5, y = 2.5, yend = 4.5), colour = "white", size = 2) +
geom_segment(aes(x = 2.5, xend = 2.5, y = 1.5, yend = 2.5), colour = "white", size = 2) +
geom_segment(aes(x = 4.5, xend = 4.5, y = 1.5, yend = 4.5), colour = "white", size = 2)
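The data frame of segment coordinates mentioned above can also be derived from the data instead of typed by hand. Here is a minimal sketch, assuming a regular grid with unit spacing and whole-number coordinates; outline_segments and segs are hypothetical names for illustration, not part of ggplot2 or of the original answer. For every significant cell it emits one unit-length segment per side whose neighbour is missing or not significant, so edges shared by two significant cells are dropped and only the cluster outline remains.

```r
# Example data in the question's format (the seed is only for reproducibility)
set.seed(1)
dt <- data.frame(x_time = rep(seq(1, 4), 4),
                 y_time = rep(seq(1, 4), each = 4),
                 metric = rnorm(16, 0.5, 0.2),
                 signif = c(rep(FALSE, 6), rep(TRUE, 10)))

# Hypothetical helper: outline segments for significant cells on a unit grid
outline_segments <- function(x, y, signif) {
  xs <- x[signif]
  ys <- y[signif]
  sig_keys <- paste(xs, ys)

  # TRUE where a cell exists at (xx, yy) and is significant
  is_sig <- function(xx, yy) paste(xx, yy) %in% sig_keys

  # keep only the edges flagged in `keep`
  edge <- function(keep, x0, x1, y0, y1) {
    data.frame(x = x0[keep], xend = x1[keep], y = y0[keep], yend = y1[keep])
  }

  rbind(
    edge(!is_sig(xs - 1, ys), xs - 0.5, xs - 0.5, ys - 0.5, ys + 0.5),  # left
    edge(!is_sig(xs + 1, ys), xs + 0.5, xs + 0.5, ys - 0.5, ys + 0.5),  # right
    edge(!is_sig(xs, ys - 1), xs - 0.5, xs + 0.5, ys - 0.5, ys - 0.5),  # bottom
    edge(!is_sig(xs, ys + 1), xs - 0.5, xs + 0.5, ys + 0.5, ys + 0.5)   # top
  )
}

segs <- outline_segments(dt$x_time, dt$y_time, dt$signif)

# Draw the heatmap with the outline on top (only if ggplot2 is installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_outline <- ggplot(dt, aes(x = x_time, y = y_time)) +
    geom_tile(aes(fill = metric)) +
    # size follows the original answer; recent ggplot2 versions prefer linewidth
    geom_segment(data = segs,
                 aes(x = x, xend = xend, y = y, yend = yend),
                 inherit.aes = FALSE, colour = "white", size = 2)
}
```

For the example data this gives 14 unit-length segments, which together trace the same outline as the six hand-written geom_segment calls. Because each segment is drawn separately, the corners may look slightly notched at large line widths; lineend = "round" in geom_segment can soften that.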
This answer is based on a linked post (the link target is not preserved here).
library(data.table)
library(raster)
Also note that clump requires the igraph package to be installed, and dissolve = TRUE in rasterToPolygons requires rgeos.
# convert data.frame to data.table
# not strictly necessary, but enables use of convenient functions: dcast and rbindlist.
setDT(d)
# reshape to wide
d2 <- dcast(d, y ~ x, value.var = "sig")
# reverse order of rows to match raster order
# remove first column
# convert to matrix and then to raster
# extent: one unit per cell; the first column of d2 is y, so there are
# ncol(d2) - 1 cells along x and nrow(d2) cells along y
r <- raster(as.matrix(d2[ , .SD[.N:1, -1]]),
            xmn = 0, xmx = ncol(d2) - 1, ymn = 0, ymx = nrow(d2))
# detect clumps of connected cells of the value TRUE
# convert raster to polygons
# dissolve polygons into multi-polygons
polys <- rasterToPolygons(clump(r), dissolve = TRUE)
# grab coordinates of individual polygons and convert to a data.table
# use idcol = TRUE to enable grouping of paths when plotting
d_poly <- rbindlist(lapply(polys@polygons,
                           function(x) as.data.table(x@Polygons[[1]]@coords)),
                    idcol = TRUE)
# plot an outline around each 'patch of significant values' using geom_path
ggplot(d, aes(x = x, y = y)) +
  geom_tile(aes(fill = z)) +
  geom_path(data = d_poly, aes(x = x + 0.5, y = y + 0.5, group = .id),
            size = 2, color = "red")
Data (define d with this before running the code above):
d <- structure(list(x = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L),
y = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
sig = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE,
TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
z = c(0.96, 0.76, 0.14, 0.93, 0.39, 0.06, 0.99, 0.77,
0.7, 0.72, 0.08, 0.94, 0.98, 0.83, 0.12, 0.42)),
row.names = c(NA, -16L), class = "data.frame")
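To make the clump step less of a black box: it labels groups of connected non-background cells, and by default it also treats diagonal neighbours as connected (directions = 8). Below is a rough base-R sketch of that idea, assuming a logical matrix without NAs; label_clumps is a hypothetical helper for illustration only and is not how raster implements clump.

```r
# Significance values from the example data, as a matrix (rows = y, columns = x)
sig <- c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE,
         TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
m <- matrix(sig, nrow = 4)

# Hypothetical helper: give every connected group of TRUE cells its own id
label_clumps <- function(m, directions = 8) {
  # neighbour offsets: all 8 surrounding cells, or only the 4 edge neighbours
  offs <- expand.grid(dr = -1:1, dc = -1:1)
  offs <- offs[!(offs$dr == 0 & offs$dc == 0), ]
  if (directions == 4) offs <- offs[abs(offs$dr) + abs(offs$dc) == 1, ]

  lab <- matrix(NA_integer_, nrow(m), ncol(m))
  id <- 0L
  for (start in which(m)) {
    if (!is.na(lab[start])) next        # already part of an earlier clump
    id <- id + 1L
    lab[start] <- id
    stack <- start                      # cells whose neighbours are pending
    while (length(stack) > 0) {
      cur <- stack[length(stack)]
      stack <- stack[-length(stack)]
      ri <- (cur - 1) %% nrow(m) + 1    # row of the linear index
      ci <- (cur - 1) %/% nrow(m) + 1   # column of the linear index
      for (k in seq_len(nrow(offs))) {
        rr <- ri + offs$dr[k]
        cc <- ci + offs$dc[k]
        if (rr < 1 || rr > nrow(m) || cc < 1 || cc > ncol(m)) next
        if (m[rr, cc] && is.na(lab[rr, cc])) {
          lab[rr, cc] <- id
          stack <- c(stack, (cc - 1) * nrow(m) + rr)
        }
      }
    }
  }
  lab
}

lab <- label_clumps(m)
```

On the example data this finds three groups, one per outline that the plot should show. Cells that are not significant stay NA in the result.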