GGPlot: combining/overlaying column and line (Gantt) charts

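I would like to overlay rainfall data (columns) on a Gantt chart that shows 'suggested sowing windows' and the actual sowing dates. From the data set I can create each plot separately, but not both on one chart. Any pointers would be much appreciated.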

## plot Gantt chart with suggested sowing dates and actual sowing dates
sowdate.df$Element <- factor(sowdate.df$Element,levels=c("SOWING DATE","Dart","Spitfire","Suntop","Beckom","Flanker","Lancer","Sunmax","Kittyhawk"))
ggplot(sowdate.df, aes(Date1, Element, color = Category, group = Item)) +
  geom_line(size = 10)

## plot rainfall
ggplot(sowdate.df, aes(Date1, rain)) + geom_col()


## combine Gantt and rainfall
ggplot(sowdate.df) + 
  geom_col(aes(Date1, rain), size = 1, color = "darkblue", fill = "white") +
  geom_line(aes(Date1, Element, color = Category, group = Item), size = 1.5, color = "red", group = 1)



      Item     Element    Category Start-End      Date1 rain
1     1      Beckom     Variety     Start 2018-05-07   NA
2     2        Dart     Variety     Start 2018-06-01   NA
3     3     Flanker     Variety     Start 2018-05-01   NA
4     4   Kittyhawk     Variety     Start 2018-04-01   NA
5     5      Lancer     Variety     Start 2018-05-01   NA
6     6 SOWING DATE Sowing date     Start 2018-06-06   NA
7     7 SOWING DATE Sowing date     Start 2018-06-26   NA
8     8 SOWING DATE Sowing date     Start 2018-07-03   NA
9     9 SOWING DATE Sowing date     Start 2018-07-12   NA
10   10    Spitfire     Variety     Start 2018-05-21   NA
11   11      Sunmax     Variety     Start 2018-04-15   NA
12   12      Suntop     Variety     Start 2018-05-07   NA
13    1      Beckom     Variety       End 2018-05-31   NA
14    2        Dart     Variety       End 2018-06-30   NA
15    3     Flanker     Variety       End 2018-05-21   NA
16    4   Kittyhawk     Variety       End 2018-05-07   NA
17    5      Lancer     Variety       End 2018-05-21   NA
18    6 SOWING DATE Sowing date       End 2018-06-07   NA
19    7 SOWING DATE Sowing date       End 2018-06-27   NA
20    8 SOWING DATE Sowing date       End 2018-07-04   NA
21    9 SOWING DATE Sowing date       End 2018-07-13   NA
22   10    Spitfire     Variety       End 2018-06-21   NA
23   11      Sunmax     Variety       End 2018-05-07   NA
24   12      Suntop     Variety       End 2018-06-07   NA
25   13        <NA>    Rainfall      <NA> 2018-04-14  3.0
26   14        <NA>    Rainfall      <NA> 2018-03-30  7.0
27   15        <NA>    Rainfall      <NA> 2018-06-10  3.5
28   16        <NA>    Rainfall      <NA> 2018-06-18  4.0
29   17        <NA>    Rainfall      <NA> 2018-06-28 13.5
30   18        <NA>    Rainfall      <NA> 2018-07-23  3.0
31   19        <NA>    Rainfall      <NA> 2018-08-05  6.0
32   20        <NA>    Rainfall      <NA> 2018-08-25 23.0
33   21        <NA>    Rainfall      <NA> 2018-09-10  5.0

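As you can see in the image you posted, the plot shown is simply two plots laid on top of each other. Although that can also be done with ggplot2, I find it rather inelegant, and it can get very tricky, because you need to find the exact position of both plots for the result to look tidy.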

Your workaround of using geom_line with your factor levels as y values is interesting, but I am not sure it is advisable.

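In any case, this is probably the core of your problem. You are mixing different y measures, and they have different classes: factor levels for one plot, numeric/integer values for the other. This is problematic. I would not try to force them onto a single y-axis; I would rather create two plots and combine them with a plot-composition package (for example patchwork), as shown below.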

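I have renamed your columns, I am using the read.so package from GitHub user @alisdaire47 to read your data, and I have also changed some columns to make plotting possible. The key is using the right classes: dates as dates, numbers as numbers.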
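To illustrate the point about classes, here is a minimal base-R sketch. The `toy` data frame and its values are made up for illustration (only the column names mirror the question). It shows that text pasted from a post typically arrives as character, and how each column can be converted to the class its axis needs: a date for x, a number for rainfall, a factor for the Gantt rows.

```r
# Hypothetical toy data; column names mirror the question, values are made up.
toy <- data.frame(
  Element = c("Dart", "Dart", NA),
  Date1   = c("2018-06-01", "2018-06-30", "2018-06-10"),
  rain    = c("NA", "NA", "3.5"),
  stringsAsFactors = FALSE
)

# Everything is character at this point.
sapply(toy, class)

# Convert each column to the class its scale needs.
toy$date      <- as.Date(toy$Date1)                      # continuous date scale
toy$rain      <- suppressWarnings(as.numeric(toy$rain))  # "NA" strings become NA
toy$element_f <- factor(toy$Element)                     # discrete scale

sapply(toy[c("date", "rain", "element_f")], class)
```

A factor maps to a discrete y scale and a numeric column to a continuous one, which is why the two cannot simply share one y-axis.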

First, read in your data:

sowdate.df <- read.so::read_so('Item     Element    Category Start_End      Date1 rain
1     1      Beckom     Variety     Start 2018-05-07   NA
2     2        Dart     Variety     Start 2018-06-01   NA
3     3     Flanker     Variety     Start 2018-05-01   NA
4     4   Kittyhawk     Variety     Start 2018-04-01   NA
5     5      Lancer     Variety     Start 2018-05-01   NA
6     6 SOWING DATE Sowing date     Start 2018-06-06   NA
7     7 SOWING DATE Sowing date     Start 2018-06-26   NA
8     8 SOWING DATE Sowing date     Start 2018-07-03   NA
9     9 SOWING DATE Sowing date     Start 2018-07-12   NA
10   10    Spitfire     Variety     Start 2018-05-21   NA
11   11      Sunmax     Variety     Start 2018-04-15   NA
12   12      Suntop     Variety     Start 2018-05-07   NA
13    1      Beckom     Variety       End 2018-05-31   NA
14    2        Dart     Variety       End 2018-06-30   NA
15    3     Flanker     Variety       End 2018-05-21   NA
16    4   Kittyhawk     Variety       End 2018-05-07   NA
17    5      Lancer     Variety       End 2018-05-21   NA
18    6 SOWING DATE Sowing date       End 2018-06-07   NA
19    7 SOWING DATE Sowing date       End 2018-06-27   NA
20    8 SOWING DATE Sowing date       End 2018-07-04   NA
21    9 SOWING DATE Sowing date       End 2018-07-13   NA
22   10    Spitfire     Variety       End 2018-06-21   NA
23   11      Sunmax     Variety       End 2018-05-07   NA
24   12      Suntop     Variety       End 2018-06-07   NA
25   13        <NA>    Rainfall      <NA> 2018-04-14  3.0
26   14        <NA>    Rainfall      <NA> 2018-03-30  7.0
27   15        <NA>    Rainfall      <NA> 2018-06-10  3.5
28   16        <NA>    Rainfall      <NA> 2018-06-18  4.0
29   17        <NA>    Rainfall      <NA> 2018-06-28 13.5
30   18        <NA>    Rainfall      <NA> 2018-07-23  3.0
31   19        <NA>    Rainfall      <NA> 2018-08-05  6.0
32   20        <NA>    Rainfall      <NA> 2018-08-25 23.0
33   21        <NA>    Rainfall      <NA> 2018-09-10  5.0')
#> Warning: 8 parsing failures.
#> row col  expected    actual         file
#>   6  -- 6 columns 8 columns literal data
#>   7  -- 6 columns 8 columns literal data
#>   8  -- 6 columns 8 columns literal data
#>   9  -- 6 columns 8 columns literal data
#>  18  -- 6 columns 8 columns literal data
#> ... ... ......... ......... ............
#> See problems(...) for more details.

Now the plots

library(tidyverse)
library(patchwork)

Prepare the data (the messiness comes from scaling the rain values to your factor levels)

sowdate <- sowdate.df %>%
  mutate(
    element_f = factor(Element, levels = c("SOWING DATE", "Dart", "Spitfire", "Suntop", "Beckom", "Flanker", "Lancer", "Sunmax", "Kittyhawk")),
    date = as.Date(Date1),
    rain = as.numeric(rain),
    # scale rain so that the largest value equals the number of factor levels
    rain_scaled = rain * nlevels(element_f) / max(rain, na.rm = TRUE)
  )
#> Warning: NAs introduced by coercion
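The `rain_scaled` column is only needed for Method 2. As a rough check of the arithmetic, the sketch below repeats the scaling on a plain vector, using the nine rainfall values from the data and the nine factor levels. The names `n_levels` and `rain_back` are introduced here for illustration only.

```r
# Rainfall values copied from the data above; 9 is the number of factor levels.
rain     <- c(3, 7, 3.5, 4, 13.5, 3, 6, 23, 5)
n_levels <- 9
max_rain <- max(rain, na.rm = TRUE)

# Forward: squeeze rainfall into the numeric range occupied by the factor levels.
rain_scaled <- rain * n_levels / max_rain

# Backward: the transformation a secondary axis has to apply
# to show the original rainfall values again.
rain_back <- rain_scaled * max_rain / n_levels

range(rain_scaled)          # tallest column reaches the top factor level
all.equal(rain, rain_back)  # the round trip recovers the input
```

The forward step decides how tall the columns are drawn; the backward step is what goes into the secondary axis, so the two must be exact inverses of each other.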

Method 1: combine the plots using patchwork. I recommend this, so that you do not mix different classes into one y.

p1 <- ggplot(sowdate, aes(date, element_f, color = Category, group = Item)) +
  geom_line(size = 10) +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = margin(b = 0))
p2 <- ggplot(sowdate) +
  geom_col(aes(date, rain)) +
  theme(plot.margin = margin(t = 0))
p1 + p2 + plot_layout(nrow = 2)
#> Warning: Removed 8 rows containing missing values (geom_path).
#> Warning: Removed 24 rows containing missing values (position_stack).

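I removed the axis text, title and ticks from the first plot, as well as the bottom and top plot margins, to bring the plots closer together.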
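A possible variation on the same idea, sketched on made-up toy data rather than the question's data set: if the Gantt rows and the rainfall live in separate data frames, giving both panels the same x limits keeps the dates aligned, and `plot_layout(heights = ...)` lets the Gantt panel take more room than the rainfall panel. The data frames `gantt` and `rainfall`, the limits, and the 3:1 height ratio are all assumptions chosen for illustration.

```r
library(ggplot2)
library(patchwork)

# Hypothetical toy data, not the question's data set.
gantt <- data.frame(
  date    = as.Date(c("2018-05-01", "2018-05-21", "2018-06-01", "2018-06-30")),
  element = factor(c("Flanker", "Flanker", "Dart", "Dart")),
  item    = c(1, 1, 2, 2)
)
rainfall <- data.frame(
  date = as.Date(c("2018-05-10", "2018-06-10", "2018-06-28")),
  rain = c(3, 3.5, 13.5)
)

# Shared x limits keep the two panels aligned when they use different data.
x_limits <- as.Date(c("2018-04-25", "2018-07-05"))

# `size` matches the answer's code; ggplot2 3.4.0 and later prefer `linewidth`.
top <- ggplot(gantt, aes(date, element, group = item)) +
  geom_line(size = 10) +
  scale_x_date(limits = x_limits) +
  theme(axis.title.x = element_blank(),
        axis.text.x  = element_blank(),
        axis.ticks.x = element_blank())

bottom <- ggplot(rainfall, aes(date, rain)) +
  geom_col() +
  scale_x_date(limits = x_limits)

# `/` stacks vertically; heights gives the Gantt panel three times the space.
combined <- top / bottom + plot_layout(heights = c(3, 1))
combined
```

When both panels are built from one data frame, as in the code above, the x ranges should already agree, so the explicit limits mainly matter once the data are split.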

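Method 2: combining the different variable classes (I do not recommend this; it gets messy, as you can see above and below). You need to scale the rain values to your factor levels so that the columns overlap the bars and do not become too tall. This then requires a second y-axis. To get one, you have to convert the factor levels to numbers, then create breaks and labels for the left y-axis, then re-transform the rain values back to their actual values for the secondary axis and hope that the breaks work out. I do not think the second y-axis really helps in reading the chart.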


max_rain <- max(sowdate$rain, na.rm = TRUE)
n_levels <- nlevels(sowdate$element_f)
# one break per factor level, labelled with the level names
breaks_ax <- seq_len(n_levels)
labels_ax <- levels(sowdate$element_f)

ggplot(sowdate, aes(date, as.numeric(element_f), color = Category, group = Item)) +
  geom_line(size = 10) +
  geom_col(aes(date, rain_scaled)) +
  scale_y_continuous(breaks = breaks_ax, labels = labels_ax,
                     # secondary axis converts scaled values back to rainfall
                     sec.axis = sec_axis(~ . * max_rain / n_levels)) +
  labs(y = 'Element')
#> Warning: Removed 24 rows containing missing values (position_stack).
#> Warning: Removed 17 rows containing missing values (geom_path).

Created on 2020-01-22 by the reprex package (v0.3.0)