How to create a time series plot in the style of a horizontal stacked bar plot in R

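I would like to create a horizontal "stacked bar" type of plot, where dates run along the x-axis and my samples are displayed as bars on the y-axis. In the simple example below I have three samples (a, b, c), each containing three possible values (0, 1, 2). I would like each horizontal bar to be colored according to the value at each time step along the x-axis, so that I end up with three horizontal bars (one per sample), each running from my first to my last time point and made up of a series of blocks whose colors correspond to the different values.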

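For example, say I want value 0 to be blue, value 1 to be yellow and value 2 to be red: for sample a, the first two days of the trace would be blue, the next two days yellow, then blue, and so on...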

Example data:

df <- structure(list(date = c("30/04/2011", "01/05/2011", "02/05/2011", "03/05/2011", "04/05/2011", "05/05/2011", "06/05/2011", "07/05/2011", "08/05/2011", "09/05/2011", "10/05/2011", "11/05/2011", "12/05/2011", "13/05/2011", "14/05/2011", "15/05/2011", "16/05/2011", "17/05/2011", "18/05/2011", "19/05/2011", "20/05/2011", "21/05/2011", "22/05/2011", "23/05/2011", "24/05/2011", "25/05/2011", "26/05/2011", "27/05/2011", "28/05/2011", "29/05/2011", "30/05/2011", "31/05/2011", "01/06/2011", "02/06/2011", "03/06/2011", "04/06/2011", "05/06/2011", "06/06/2011", "07/06/2011", "08/06/2011", "09/06/2011", "10/06/2011", "11/06/2011", "12/06/2011", "13/06/2011", "14/06/2011", "15/06/2011", "16/06/2011", "17/06/2011", "18/06/2011", "19/06/2011", "20/06/2011", "21/06/2011", "22/06/2011", "23/06/2011", "24/06/2011", "25/06/2011", "26/06/2011", "27/06/2011", "28/06/2011", "29/06/2011", "30/06/2011", "01/07/2011", "02/07/2011", "03/07/2011", "04/07/2011", "05/07/2011", "06/07/2011", "07/07/2011", "08/07/2011", "09/07/2011", "10/07/2011", "11/07/2011", "12/07/2011", "13/07/2011", "14/07/2011", "15/07/2011", "16/07/2011", "17/07/2011", "18/07/2011", "19/07/2011", "20/07/2011", "21/07/2011", "22/07/2011", "23/07/2011", "24/07/2011"), a = c(0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), b = c(0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L), c = c(1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L)), .Names = c("date", "a", "b", "c"), class = "data.frame", row.names = c(NA, -86L))

head(df)
#         date a b c
# 1 30/04/2011 0 0 1
# 2 01/05/2011 0 1 1
# 3 02/05/2011 1 1 0
# 4 03/05/2011 1 0 0
# 5 04/05/2011 0 0 0
# 6 05/05/2011 1 0 1

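This must be something really simple to achieve, but I can't get my head around it (i.e. barplot doesn't seem to work for this). Any help would be much appreciated. Thanks!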

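This is very manual, but I think it answers your question. As far as I know there is no function that does this for you, but I could well be wrong. I simply use polygon() to draw a box for each group. Note: you need to convert the date field to the Date class.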

dat <- df  # the code below works on a copy of the question's df, named dat
dat$date <- as.Date(dat$date, "%d/%m/%Y")

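# empty plot that only sets up the date x-axis; the boxes are added afterwards
plot(dat$a~dat$date, type = "n", yaxt = "n", ylab = "", 
     xlab = "", bty = "n", ylim = c(0, 4))
# draw a filled rectangle of height h, centred on y, spanning x1 to x2
draw.box <- function(y, x1, x2, h, col) {
  polygon(x = c(x1, x1, x2, x2), 
          y = c(y - h/2, y + h/2, y + h/2, y - h/2),
          col = col, border = col)
}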

for (j in c("a", "b", "c")) {
  for (i in 2:nrow(dat)) {
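    # colour set by the value at the start of each one-day interval
    # (0 = blue, 1 = yellow, 2 = red, as requested in the question)
    bcol <- switch(as.character(dat[(i - 1), j]),
                   "0" = "blue",
                   "1" = "yellow",
                   "2" = "red")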
    yloc <- switch(j,
                   "a" = 3,
                   "b" = 2,
                   "c" = 1)
    draw.box(y = yloc, 
             h = 0.75, 
             col = bcol, 
             x1 = dat[(i - 1), "date"], 
             x2 = dat[i, "date"])
  }
}

axis(side = 2, at = 3:1, labels = c("A", "B", "C"), 
     tick = FALSE, las = 2)

The last value is not drawn here, because there is no "end date" to bound its box.
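A sketch of the same box-drawing idea, assuming it is acceptable to let each value cover one full day: encoding each sample with rle() lets a single vectorised rect() call draw all of a sample's runs, and ending every run one day after its last date also gives the final value a box. The helper runs_for() is hypothetical, and the data are only the first eight rows of the question's df.

```r
# First eight rows of the question's df (a small subset for illustration)
df <- data.frame(
  date = c("30/04/2011", "01/05/2011", "02/05/2011", "03/05/2011",
           "04/05/2011", "05/05/2011", "06/05/2011", "07/05/2011"),
  a = c(0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L),
  b = c(0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L),
  c = c(1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L),
  stringsAsFactors = FALSE
)
dates <- as.Date(df$date, "%d/%m/%Y")
pal <- c("blue", "yellow", "red")      # colours for the values 0, 1, 2

# Hypothetical helper: one row per run of identical values, holding the
# run's first day, the day after its last day, and the value itself
runs_for <- function(v, dates) {
  r <- rle(v)
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  data.frame(start = dates[starts], end = dates[ends] + 1, value = r$values)
}

samples <- c("a", "b", "c")
# empty plot that only sets up the date x-axis and three bar slots
plot(range(dates) + c(0, 1), c(0.5, 3.5), type = "n", yaxt = "n",
     xlab = "", ylab = "", bty = "n")
for (k in seq_along(samples)) {
  rr <- runs_for(df[[samples[k]]], dates)
  y <- length(samples) + 1 - k         # a on top, c at the bottom
  # one vectorised rect() call draws every run of this sample
  rect(as.numeric(rr$start), y - 0.375, as.numeric(rr$end), y + 0.375,
       col = pal[rr$value + 1], border = NA)
}
axis(2, at = 3:1, labels = toupper(samples), tick = FALSE, las = 2)
```

Because each run ends exactly where the next one starts, adjacent boxes tile the bar without gaps, and the number of rect() calls drops from one per day to one per sample.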

I was able to get barplot() to work here, but man, I had to jump through some hoops.

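First, barplot() requires a matrix of bar segment lengths, which means we have to derive those lengths from your input data (note: see the end of this answer for an alternative that treats each data point as a separate segment). We also need to capture which color applies to each run length. Fortunately, rle() is a perfect fit, since it captures both the run lengths and the values in a two-component list.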
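To make the rle() step concrete, here is what it returns for the first 14 values of sample a from the question; the lengths and values match the first eight rows of pd$a printed further down.

```r
# The first 14 values of sample a from the question
v <- c(0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L)
r <- rle(v)
r$lengths          # 2 2 1 1 3 1 3 1
r$values           # 0 1 0 1 0 1 0 1
# the answer binds the two components into a two-column matrix per sample
m <- do.call(cbind, r)
m
# inverse.rle() expands the runs back into the original vector
identical(inverse.rle(r), v)   # TRUE
```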

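Second, barplot() has an unfortunate limitation in how it colors stacked bars. Namely, if you pass the height argument a normal-looking, intuitively structured matrix containing two or more stacked bars (meaning two or more columns), and you want to color each stacked bar with a color sequence that differs from the other stacked bars, you won't be able to. At least, not with that matrix structure.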

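This is because the col argument can only accept a vector of colors; it cannot accept a matrix, a list of vectors, or anything else that corresponds to the main matrix passed to the height argument. If you try to supply an overly long color vector, barplot() simply ignores the excess.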

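Based on "Stacked bar plot with different combinations of colors in R", the solution is to offset each bar within the matrix, setting all the other columns to zero, which lets you give every segment of every bar its own color.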
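A toy sketch of the offset trick, with two made-up bars rather than the question's data: each segment gets its own row, the other bar's cell in that row is 0, and so the per-row col vector effectively becomes a per-segment color list.

```r
# Two made-up bars, each with its own run lengths and colours
len1 <- c(2, 3); col1 <- c("blue", "yellow")
len2 <- c(4, 1); col2 <- c("red", "blue")

# Offset layout: rows 1-2 belong to bar 1, rows 3-4 to bar 2;
# the zero cells become invisible zero-length segments
height <- rbind(cbind(len1, 0), cbind(0, len2))
colnames(height) <- c("bar1", "bar2")
col <- c(col1, col2)   # exactly one colour per row of height

barplot(height, col = col, horiz = TRUE)
```

Bar 1 is drawn blue then yellow and bar 2 red then blue, even though col is a single vector, because each row is non-zero in only one bar.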

Getting the data into the required shape was not easy, but with the help of @akrun's answer to a question I had just asked, we can do it all as follows:

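# run-length encode each sample: one (lengths, values) matrix per column
pd <- lapply(df[-1],function(v) do.call(cbind,rle(v)));
# offset matrix: one row per run, with the run length in its own sample's column
height <- as.matrix(setNames(reshape(cbind(id=1:sum(sapply(pd,nrow)),stack(lapply(pd,function(x) x[,'lengths']))),direction='wide',timevar='ind')[-1],names(pd)));
# cells belonging to the other samples come out as NA; make them zero-length segments
height[is.na(height)] <- 0;
# one colour per row of height, looked up from the run's value (0/1/2)
col <- c('blue','yellow','red')[do.call(c,sapply(pd,function(x) x[,'values']))+1];
# reverse the column order so that sample a ends up as the top bar
barplot(t(apply(height,1,rev)),col=col,horiz=TRUE,axes=FALSE);
axis(1,0:(nrow(df)-1),labels=df$date);
title('Horizontal Stacked Bar Plot');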
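If the reshape() one-liner is hard to follow, the same offset matrix can probably be built with an explicit loop that writes each sample's run lengths into its own block of rows. A sketch, run here on the first eight rows of the question's df only; height[, ncol(height):1] is used as a simpler equivalent of t(apply(height, 1, rev)).

```r
# First eight rows of the question's df (a small subset for illustration)
df <- data.frame(
  date = c("30/04/2011", "01/05/2011", "02/05/2011", "03/05/2011",
           "04/05/2011", "05/05/2011", "06/05/2011", "07/05/2011"),
  a = c(0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L),
  b = c(0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L),
  c = c(1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# run-length encode each sample, as in the answer
pd <- lapply(df[-1], function(v) do.call(cbind, rle(v)))

n_runs <- vapply(pd, nrow, integer(1))   # number of runs per sample
ends <- cumsum(n_runs)                   # last row of each sample's block
starts <- ends - n_runs + 1              # first row of each sample's block

# offset matrix: each run gets its own row, every other column stays 0
height <- matrix(0, nrow = sum(n_runs), ncol = length(pd),
                 dimnames = list(NULL, names(pd)))
for (k in seq_along(pd)) {
  height[starts[k]:ends[k], k] <- pd[[k]][, "lengths"]
}

# one colour per row of height, looked up from the run's value (0/1/2)
col <- c("blue", "yellow", "red")[unlist(lapply(pd, function(x) x[, "values"])) + 1]

# reversing the columns puts sample a on the top bar
barplot(height[, ncol(height):1], col = col, horiz = TRUE, axes = FALSE)
axis(1, 0:(nrow(df) - 1), labels = df$date)
```

Using unlist(lapply(...)) instead of do.call(c, sapply(...)) also avoids a corner case: if every sample happened to have the same number of runs, sapply() would return a matrix rather than a list.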

For reference, here are the intermediate objects:

pd;
## $a
##       lengths values
##  [1,]       2      0
##  [2,]       2      1
##  [3,]       1      0
##  [4,]       1      1
##  [5,]       3      0
##  [6,]       1      1
##  [7,]       3      0
##  [8,]       1      1
##  [9,]      13      0
## [10,]      22      2
## [11,]      12      0
## [12,]       4      1
## [13,]       3      0
## [14,]       2      1
## [15,]       3      0
## [16,]       2      1
## [17,]       1      0
## [18,]       1      1
## [19,]       8      0
## [20,]       1      1
##
## $b
##       lengths values
##  [1,]       1      0
##  [2,]       2      1
##  [3,]       4      0
##  [4,]       2      1
##  [5,]       3      0
##  [6,]       1      1
##  [7,]       9      0
##  [8,]      22      2
##  [9,]       3      0
## [10,]       1      1
## [11,]      10      0
## [12,]       1      1
## [13,]       7      0
## [14,]       3      1
## [15,]       5      0
## [16,]       2      1
## [17,]       5      0
## [18,]       5      1
##
## $c
##       lengths values
##  [1,]       2      1
##  [2,]       3      0
##  [3,]       1      1
##  [4,]       1      0
##  [5,]       1      1
##  [6,]       1      0
##  [7,]       1      1
##  [8,]       1      0
##  [9,]       1      1
## [10,]      13      0
## [11,]      30      2
## [12,]      16      0
## [13,]       1      1
## [14,]       7      0
## [15,]       3      1
## [16,]       4      0
##
height;
##     a  b  c
## 1   2  0  0
## 2   2  0  0
## 3   1  0  0
## 4   1  0  0
## 5   3  0  0
## 6   1  0  0
## 7   3  0  0
## 8   1  0  0
## 9  13  0  0
## 10 22  0  0
## 11 12  0  0
## 12  4  0  0
## 13  3  0  0
## 14  2  0  0
## 15  3  0  0
## 16  2  0  0
## 17  1  0  0
## 18  1  0  0
## 19  8  0  0
## 20  1  0  0
## 21  0  1  0
## 22  0  2  0
## 23  0  4  0
## 24  0  2  0
## 25  0  3  0
## 26  0  1  0
## 27  0  9  0
## 28  0 22  0
## 29  0  3  0
## 30  0  1  0
## 31  0 10  0
## 32  0  1  0
## 33  0  7  0
## 34  0  3  0
## 35  0  5  0
## 36  0  2  0
## 37  0  5  0
## 38  0  5  0
## 39  0  0  2
## 40  0  0  3
## 41  0  0  1
## 42  0  0  1
## 43  0  0  1
## 44  0  0  1
## 45  0  0  1
## 46  0  0  1
## 47  0  0  1
## 48  0  0 13
## 49  0  0 30
## 50  0  0 16
## 51  0  0  1
## 52  0  0  7
## 53  0  0  3
## 54  0  0  4
col;
##  [1] "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "blue"   "red"    "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "blue"
## [24] "yellow" "blue"   "yellow" "blue"   "red"    "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "yellow" "blue"   "yellow" "blue"   "yellow" "blue"   "yellow" "blue"
## [47] "yellow" "blue"   "red"    "blue"   "yellow" "blue"   "yellow" "blue"

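Finally, I did try building the plot without the run-length step, treating each data point as its own segment instead. This works (although you still have to do the offset trick), but it is probably not what you want.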

Here is the code, if you prefer this version:

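# every data point becomes a unit-length segment
pd <- lapply(df[-1],function(v) rep(1,length(v)));
# same offset trick: one row per data point, zeros in the other samples' columns
height <- as.matrix(setNames(reshape(cbind(id=1:sum(sapply(pd,length)),stack(lapply(pd,function(x) x))),direction='wide',timevar='ind')[-1],names(pd)));
height[is.na(height)] <- 0;
# colour each unit segment directly from its value (0/1/2)
col <- c('blue','yellow','red')[do.call(c,df[-1]+1)];
barplot(t(apply(height,1,rev)),col=col,horiz=TRUE,axes=FALSE);
axis(1,0:(nrow(df)-1),labels=df$date);
title('Horizontal Stacked Bar Plot');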
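For the per-point version, the offset matrix is simply block-diagonal: a column of ones for each sample, stacked diagonally. So it can likely be built with kronecker() instead of reshape(). A sketch on the first eight rows of the question's df:

```r
# First eight rows of the question's df (a small subset for illustration)
df <- data.frame(
  date = c("30/04/2011", "01/05/2011", "02/05/2011", "03/05/2011",
           "04/05/2011", "05/05/2011", "06/05/2011", "07/05/2011"),
  a = c(0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L),
  b = c(0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L),
  c = c(1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L),
  stringsAsFactors = FALSE
)
n <- nrow(df)
k <- ncol(df) - 1                      # number of samples (a, b, c)

# rows 1..n belong to a, rows n+1..2n to b, and so on: block-diagonal ones
height <- kronecker(diag(k), matrix(1, nrow = n, ncol = 1))
colnames(height) <- names(df)[-1]

# colours read column by column, which matches the row order of height
col <- c("blue", "yellow", "red")[unlist(df[-1]) + 1]

barplot(height[, k:1], col = col, horiz = TRUE, axes = FALSE)
axis(1, 0:(n - 1), labels = df$date)
```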

For a ggplot2 plot, first convert df to long format (using melt() from the reshape2 package), convert the date column to the "Date" class and the value column to a factor, and then use geom_tile():

library(ggplot2)
library(reshape2)

long <- melt(df, measure.vars = 2:4)   # one row per (date, sample) pair
long <- transform(long, date = as.Date(long$date, "%d/%m/%Y"), value = factor(value))

ggplot(long, aes(date, variable)) + 
   geom_tile(aes(fill = value)) + 
   scale_fill_manual(values = c("blue", "yellow", "red"))
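reshape2 is not strictly required for the reshaping step. A sketch that builds the same long format with base R's stack(), naming the columns to match melt()'s output, and only plotting if ggplot2 is installed. It runs on the first eight rows of the question's df, which contain no 2s, so only the blue and yellow levels appear in the legend.

```r
# First eight rows of the question's df (a small subset for illustration)
df <- data.frame(
  date = c("30/04/2011", "01/05/2011", "02/05/2011", "03/05/2011",
           "04/05/2011", "05/05/2011", "06/05/2011", "07/05/2011"),
  a = c(0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L),
  b = c(0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L),
  c = c(1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L),
  stringsAsFactors = FALSE
)
samples <- c("a", "b", "c")

# stack() returns columns "values" and "ind", sample a first, then b, then c,
# so the dates are simply repeated once per sample
long <- data.frame(
  date = as.Date(rep(df$date, times = length(samples)), "%d/%m/%Y"),
  stack(df[samples])
)
names(long) <- c("date", "value", "variable")   # same names as melt()
long$value <- factor(long$value, levels = 0:2)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(date, variable)) +
    geom_tile(aes(fill = value)) +
    scale_fill_manual(values = c("0" = "blue", "1" = "yellow", "2" = "red"))
  print(p)
}
```

Naming the values in scale_fill_manual() ties each color to its level explicitly, rather than relying on the order of the factor levels.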