GGPlot combining/overlaying column and line (Gantt) charts
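I would like to overlay rainfall data (columns) on a Gantt chart that contains 'suggested sowing windows' and actual sowing dates. From the dataset I can create each chart separately, but not both on one chart. Any pointers would be much appreciated.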
## plot Gantt chart with suggested sowing dates and actual sowing dates
sowdate.df$Element <- factor(sowdate.df$Element, levels = c("SOWING DATE", "Dart", "Spitfire", "Suntop", "Beckom", "Flanker", "Lancer", "Sunmax", "Kittyhawk"))
ggplot(sowdate.df, aes(Date1, Element, colour = Category, group = Item)) +
  geom_line(size = 10)
## plot rainfall
ggplot(sowdate.df, aes(Date1, rain)) + geom_col()
## combine Gantt and rainfall
ggplot(sowdate.df) +
  geom_col(aes(Date1, rain), size = 1, color = "darkblue", fill = "white") +
  geom_line(aes(Date1, Element, colour = Category, group = Item), size = 1.5, color = "red", group = 1)
Item Element Category Start-End Date1 rain
1 1 Beckom Variety Start 2018-05-07 NA
2 2 Dart Variety Start 2018-06-01 NA
3 3 Flanker Variety Start 2018-05-01 NA
4 4 Kittyhawk Variety Start 2018-04-01 NA
5 5 Lancer Variety Start 2018-05-01 NA
6 6 SOWING DATE Sowing date Start 2018-06-06 NA
7 7 SOWING DATE Sowing date Start 2018-06-26 NA
8 8 SOWING DATE Sowing date Start 2018-07-03 NA
9 9 SOWING DATE Sowing date Start 2018-07-12 NA
10 10 Spitfire Variety Start 2018-05-21 NA
11 11 Sunmax Variety Start 2018-04-15 NA
12 12 Suntop Variety Start 2018-05-07 NA
13 1 Beckom Variety End 2018-05-31 NA
14 2 Dart Variety End 2018-06-30 NA
15 3 Flanker Variety End 2018-05-21 NA
16 4 Kittyhawk Variety End 2018-05-07 NA
17 5 Lancer Variety End 2018-05-21 NA
18 6 SOWING DATE Sowing date End 2018-06-07 NA
19 7 SOWING DATE Sowing date End 2018-06-27 NA
20 8 SOWING DATE Sowing date End 2018-07-04 NA
21 9 SOWING DATE Sowing date End 2018-07-13 NA
22 10 Spitfire Variety End 2018-06-21 NA
23 11 Sunmax Variety End 2018-05-07 NA
24 12 Suntop Variety End 2018-06-07 NA
25 13 <NA> Rainfall <NA> 2018-04-14 3.0
26 14 <NA> Rainfall <NA> 2018-03-30 7.0
27 15 <NA> Rainfall <NA> 2018-06-10 3.5
28 16 <NA> Rainfall <NA> 2018-06-18 4.0
29 17 <NA> Rainfall <NA> 2018-06-28 13.5
30 18 <NA> Rainfall <NA> 2018-07-23 3.0
31 19 <NA> Rainfall <NA> 2018-08-05 6.0
32 20 <NA> Rainfall <NA> 2018-08-25 23.0
33 21 <NA> Rainfall <NA> 2018-09-10 5.0
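As you can see in the image you posted, the graph shown is simply an overlay of two plots. This would be possible with ggplot2 as well, but I don't find it very elegant, and it can be tricky: you need to find the exact positions of both plots for the result to look neat.
Your workaround of using geom_line with your factor levels as y values is interesting, but I am not sure it is advisable.
In any case, this is probably the core of your problem. You are mixing different y measures, and they have different classes: factor levels for one plot, numeric/integer for the other. This is problematic. I would not struggle to force them onto one y-axis. I would rather create two plots and combine them with one of the plot-combining packages (e.g. patchwork). Like so.
I have renamed your columns, I am using the read.so package from GitHub user @alistaire47 to read your data, and I have also changed some columns to make plotting possible. The key is using the right classes: dates as dates, numbers as numbers.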
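To make the point about classes concrete, here is a minimal base R sketch. It uses a few values copied from the question's data; the vector names (`date_chr`, `rain_chr`) are made up for illustration and are not part of the original code.

```r
# Values as they might arrive after reading plain text: everything is character
date_chr <- c("2018-05-07", "2018-06-01", "2018-04-14")
rain_chr <- c(NA, NA, "3.0")

class(date_chr)  # "character"

# Convert to the classes needed for a date x-axis and a numeric y-axis
date <- as.Date(date_chr)
rain <- as.numeric(rain_chr)

class(date)  # "Date"
class(rain)  # "numeric"

# The Gantt y variable is a factor, which ggplot2 maps to a discrete scale ...
element_f <- factor(c("Beckom", "Dart", NA), levels = c("Dart", "Beckom"))
class(element_f)  # "factor"
# ... so it cannot share one y-axis with the continuous rain values as-is
```

Checking `class()` on each column before plotting is a quick way to spot why a layer refuses to combine with another.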
First, read your data:
sowdate.df <- read.so::read_so('Item Element Category Start_End Date1 rain
1 1 Beckom Variety Start 2018-05-07 NA
2 2 Dart Variety Start 2018-06-01 NA
3 3 Flanker Variety Start 2018-05-01 NA
4 4 Kittyhawk Variety Start 2018-04-01 NA
5 5 Lancer Variety Start 2018-05-01 NA
6 6 SOWING DATE Sowing date Start 2018-06-06 NA
7 7 SOWING DATE Sowing date Start 2018-06-26 NA
8 8 SOWING DATE Sowing date Start 2018-07-03 NA
9 9 SOWING DATE Sowing date Start 2018-07-12 NA
10 10 Spitfire Variety Start 2018-05-21 NA
11 11 Sunmax Variety Start 2018-04-15 NA
12 12 Suntop Variety Start 2018-05-07 NA
13 1 Beckom Variety End 2018-05-31 NA
14 2 Dart Variety End 2018-06-30 NA
15 3 Flanker Variety End 2018-05-21 NA
16 4 Kittyhawk Variety End 2018-05-07 NA
17 5 Lancer Variety End 2018-05-21 NA
18 6 SOWING DATE Sowing date End 2018-06-07 NA
19 7 SOWING DATE Sowing date End 2018-06-27 NA
20 8 SOWING DATE Sowing date End 2018-07-04 NA
21 9 SOWING DATE Sowing date End 2018-07-13 NA
22 10 Spitfire Variety End 2018-06-21 NA
23 11 Sunmax Variety End 2018-05-07 NA
24 12 Suntop Variety End 2018-06-07 NA
25 13 <NA> Rainfall <NA> 2018-04-14 3.0
26 14 <NA> Rainfall <NA> 2018-03-30 7.0
27 15 <NA> Rainfall <NA> 2018-06-10 3.5
28 16 <NA> Rainfall <NA> 2018-06-18 4.0
29 17 <NA> Rainfall <NA> 2018-06-28 13.5
30 18 <NA> Rainfall <NA> 2018-07-23 3.0
31 19 <NA> Rainfall <NA> 2018-08-05 6.0
32 20 <NA> Rainfall <NA> 2018-08-25 23.0
33 21 <NA> Rainfall <NA> 2018-09-10 5.0')
#> Warning: 8 parsing failures.
#> row col expected actual file
#> 6 -- 6 columns 8 columns literal data
#> 7 -- 6 columns 8 columns literal data
#> 8 -- 6 columns 8 columns literal data
#> 9 -- 6 columns 8 columns literal data
#> 18 -- 6 columns 8 columns literal data
#> ... ... ......... ......... ............
#> See problems(...) for more details.
Now plot
library(tidyverse)
library(patchwork)
Prepare the data (the mess is due to scaling the values to your factor levels)
sowdate <- sowdate.df %>%
  mutate(element_f = factor(Element, levels = c("SOWING DATE", "Dart", "Spitfire", "Suntop", "Beckom", "Flanker", "Lancer", "Sunmax", "Kittyhawk")),
         date = as.Date(Date1),
         rain = as.numeric(rain),
         # scale rain so that the largest value equals the number of factor levels
         rain_scaled = rain * nlevels(element_f) / max(rain, na.rm = TRUE))
#> Warning: NAs introduced by coercion
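The scaling in `rain_scaled` can be checked by hand. The sketch below repeats the arithmetic in base R with the rainfall values from the data set; `scale_factor` and `rain_back` are illustrative names, and the level count of 9 is taken from the `levels` vector above.

```r
rain <- c(3, 7, 3.5, 4, 13.5, 3, 6, 23, 5)  # rainfall values from the data
n_levels <- 9                               # number of levels in element_f

# Multiply so that the largest rainfall value lands on the highest factor level
scale_factor <- n_levels / max(rain, na.rm = TRUE)
rain_scaled <- rain * scale_factor

# The tallest column now reaches the top Gantt row
all.equal(max(rain_scaled), n_levels)

# Dividing by the same factor recovers the original values;
# this is the inverse that a secondary axis has to apply
rain_back <- rain_scaled / scale_factor
all.equal(rain_back, rain)
```

The inverse step is the reason the secondary axis in the second approach multiplies by `max_rain` and divides by the number of levels.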
Method 1 - combine the plots with patchwork. I recommend this, to avoid mixing different classes on one y-axis.
p1 <- ggplot(sowdate, aes(date, element_f, colour = Category, group = Item)) +
  geom_line(size = 10) +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = margin(b = 0))
p2 <- ggplot(sowdate) +
  geom_col(aes(date, rain)) +
  theme(plot.margin = margin(t = 0))
p1 + p2 + plot_layout(nrow = 2)
#> Warning: Removed 8 rows containing missing values (geom_path).
#> Warning: Removed 24 rows containing missing values (position_stack).
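I removed the axis text, title and ticks from the first plot, as well as the bottom margin of the upper plot and the top margin of the lower plot, to bring the two closer together.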
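If the Gantt data and the rainfall data live in two separate data frames, the two panels no longer share a date range automatically. The following is a sketch under that assumption, not part of the original answer: the cut-down data frames `gantt` and `rain_df` are hypothetical, and `coord_cartesian(xlim = ...)` is used to give both panels the same x range without dropping any bars. It also uses patchwork's `/` operator for stacking and `plot_layout(heights = ...)` for the relative panel heights.

```r
library(ggplot2)
library(patchwork)

# Hypothetical, cut-down data in the shape of the question's data
gantt <- data.frame(
  Item = c(1, 1, 2, 2),
  element_f = factor(c("Beckom", "Beckom", "Dart", "Dart"),
                     levels = c("Dart", "Beckom")),
  date = as.Date(c("2018-05-07", "2018-05-31", "2018-06-01", "2018-06-30"))
)
rain_df <- data.frame(
  date = as.Date(c("2018-04-14", "2018-06-10", "2018-06-28")),
  rain = c(3, 3.5, 13.5)
)

# One shared date range so that both panels line up
x_limits <- range(c(gantt$date, rain_df$date))

# size is used to match the code above; ggplot2 >= 3.4.0 prefers linewidth
p_gantt <- ggplot(gantt, aes(date, element_f, group = Item)) +
  geom_line(size = 10) +
  coord_cartesian(xlim = x_limits) +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank())

p_rain <- ggplot(rain_df, aes(date, rain)) +
  geom_col() +
  coord_cartesian(xlim = x_limits)

# Stack the panels and give the Gantt chart twice the height of the rain panel
combined <- p_gantt / p_rain + plot_layout(heights = c(2, 1))
combined
```

Zooming with `coord_cartesian()` rather than setting scale limits keeps bars that sit at the edge of the date range, which scale limits would remove.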
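Method 2 - combine the different variable classes on one axis (I don't recommend this. It gets messy, as you can see above and below).
You need to scale the rainfall values to your factor levels, so that the columns overlap the Gantt rows and don't get too long.
This then requires a second y-axis. For that, you have to convert the factor levels to numbers, then create breaks and labels for the left y-axis, then re-transform the rain values back to their actual values, and hope that the breaks work out. I don't think the second y-axis really helps with reading the chart.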
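The factor-to-number step can be seen in isolation in base R. This is a small sketch with a handful of made-up rows; only the `levels` vector is taken from the data preparation above.

```r
lv <- c("SOWING DATE", "Dart", "Spitfire", "Suntop", "Beckom",
        "Flanker", "Lancer", "Sunmax", "Kittyhawk")
element_f <- factor(c("Dart", "Kittyhawk", NA, "SOWING DATE"), levels = lv)

# Each value becomes the integer position of its level: 2 9 NA 1
y_num <- as.numeric(element_f)
y_num

# The left axis then needs one break per level, labelled with the level name
breaks_ax <- seq_len(nlevels(element_f))
labels_ax <- levels(element_f)
data.frame(breaks_ax, labels_ax)
```

Rows without an element (the rainfall rows) get `NA` here, which is why the line layer reports removed rows when the combined plot is drawn.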
max_rain <- max(sowdate$rain, na.rm = TRUE)
n_levels <- nlevels(sowdate$element_f)
breaks_ax <- seq_len(n_levels)          # one break per factor level
labels_ax <- levels(sowdate$element_f)  # label each break with its level name
ggplot(sowdate, aes(date, as.numeric(element_f), colour = Category, group = Item)) +
  geom_line(size = 10) +
  geom_col(aes(date, rain_scaled)) +
  scale_y_continuous(breaks = breaks_ax, labels = labels_ax,
                     # second axis: undo the scaling to show real rainfall values
                     sec.axis = sec_axis(~ . * max_rain / n_levels)) +
  labs(y = 'Element')
#> Warning: Removed 24 rows containing missing values (position_stack).
#> Warning: Removed 17 rows containing missing values (geom_path).
Created on 2020-01-22 by the reprex package (v0.3.0)