Stacked Area Plot with ggplot in R: How to use only the highest y per corresponding x?
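I'm trying to create a stacked area plot, but it looks awful (see the link below).
Below is my data. date should be on the x-axis and cases on the y-axis. However, the same date occurs several times with different numbers of cases. When that happens, I want the date to be represented once, with the sum of the cases for that specific date (and that specific type).
Also note that the stacked area plot must be split into 3 types (the "type" column in the data below).
My data looks like this: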
# Groups:   type [3]
   Province.State Country.Region   Lat  Long date       cases type      loc    total cumsum
   <chr>          <chr>          <dbl> <dbl> <date>     <int> <chr>     <chr>  <int>  <int>
 1 ""             France            47     2 2020-01-24     2 confirmed Europe     2      2
 2 ""             France            47     2 2020-01-25     1 confirmed Europe     1      3
 3 ""             Germany           51     9 2020-01-27     1 confirmed Europe     1      4
 4 ""             France            47     2 2020-01-28     1 confirmed Europe     4      5
 5 ""             Germany           51     9 2020-01-28     3 confirmed Europe     4      8
 6 ""             Finland           64    26 2020-01-29     1 confirmed Europe     2      9
 7 ""             France            47     2 2020-01-29     1 confirmed Europe     2     10
 8 ""             Germany           51     9 2020-01-31     1 confirmed Europe     6     11
 9 ""             Italy             43    12 2020-01-31     2 confirmed Europe     6     13
10 ""             Sweden            63    16 2020-01-31     1 confirmed Europe     6     14
# ... with 378 more rows
The plot so far looks like this:
[Image: Ugly stacked area plot so far]
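Based on the example data given and the description of the desired plot, two remarks:
- For type = "death" I simply duplicated the given data, just to have an example.
- From the description it is not entirely clear what the final plot should look like, e.g. do you want to show the different countries or regions?
Therefore I just made a stacked plot of cumulative cases by date and type. Try this: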
library(ggplot2)
library(dplyr)
dataset <- structure(list(
id = c(
"1", "2", "3", "4", "5", "6", "7", "8",
"9", "10", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"
),
Province.State = c(
"\"\"", "\"\"", "\"\"", "\"\"", "\"\"",
"\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\"",
"\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\""
),
Country.Region = c(
"France", "France", "Germany", "France",
"Germany", "Finland", "France", "Germany", "Italy", "Sweden",
"France", "France", "Germany", "France", "Germany", "Finland",
"France", "Germany", "Italy", "Sweden"
), Lat = c(
47L, 47L,
51L, 47L, 51L, 64L, 47L, 51L, 43L, 63L, 47L, 47L, 51L, 47L,
51L, 64L, 47L, 51L, 43L, 63L
), Long = c(
2L, 2L, 9L, 2L, 9L,
26L, 2L, 9L, 12L, 16L, 2L, 2L, 9L, 2L, 9L, 26L, 2L, 9L, 12L,
16L
), date = structure(c(
18285, 18286, 18288, 18289, 18289,
18290, 18290, 18292, 18292, 18292, 18285, 18286, 18288, 18289,
18289, 18290, 18290, 18292, 18292, 18292
), class = "Date"),
cases = c(
2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 1L, 2L, 1L,
1L, 1L, 3L, 1L, 1L, 1L, 2L, 1L
), type = c(
"confirmed", "confirmed",
"confirmed", "confirmed", "confirmed", "confirmed", "confirmed",
"confirmed", "confirmed", "confirmed", "death", "death",
"death", "death", "death", "death", "death", "death", "death",
"death"
), loc = c(
"Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe"
), total = c(
2L, 1L,
1L, 4L, 4L, 2L, 2L, 6L, 6L, 6L, 2L, 1L, 1L, 4L, 4L, 2L, 2L,
6L, 6L, 6L
), cumsum = c(
2L, 3L, 4L, 5L, 8L, 9L, 10L, 11L,
13L, 14L, 2L, 3L, 4L, 5L, 8L, 9L, 10L, 11L, 13L, 14L
)
), class = c(
"tbl_df",
"tbl", "data.frame"
), row.names = c(NA, -20L))
dataset_plot <- dataset %>%
  # Number of cases by date, type
  count(date, type, wt = cases, name = "cases") %>%
  # Cumulative sum over time by type
  group_by(type) %>%
  arrange(date) %>%
  mutate(cumsum = cumsum(cases))
ggplot(dataset_plot, aes(date, cumsum, fill = type)) +
  geom_area()
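A side note on the count() step above, since it is the part that collapses repeated dates into one row per date and type. The sketch below runs the same aggregation on a small made-up data frame (`toy` is hypothetical, not the asker's data), once with group_by() and summarise() and once with the weighted count() shortcut.

```r
library(dplyr)

# Hypothetical toy data: 2020-01-28 appears twice for "confirmed"
toy <- tibble(
  date  = as.Date(c("2020-01-27", "2020-01-28", "2020-01-28", "2020-01-28")),
  type  = c("confirmed", "confirmed", "confirmed", "death"),
  cases = c(1L, 1L, 3L, 2L)
)

# One row per date and type, with the cases summed
daily <- toy %>%
  group_by(date, type) %>%
  summarise(cases = sum(cases)) %>%
  ungroup()

# The same aggregation via the weighted count() shortcut
daily_count <- toy %>%
  count(date, type, wt = cases, name = "cases")
```

If the non-cumulative daily totals are what should be plotted, `daily` could be passed to ggplot() with `aes(date, cases, fill = type)` instead of the cumulative sum.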
Created on 2020-03-18 by the reprex package (v0.3.0)
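One thing that may matter with the full data: the three types will not necessarily have a row for every date, and depending on the ggplot2 version, stacked areas whose x values do not line up can come out distorted. The following is only a sketch under that assumption. It uses tidyr::complete(), an extra dependency that the code above does not use, to add zero-case rows for missing date/type pairs before taking the cumulative sum. The `toy` data frame is hypothetical.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data: "death" only has a row on the last date
toy <- tibble(
  date  = as.Date(c("2020-01-24", "2020-01-25", "2020-01-27", "2020-01-27")),
  type  = c("confirmed", "confirmed", "confirmed", "death"),
  cases = c(2L, 1L, 1L, 1L)
)

toy_plot <- toy %>%
  # Sum the cases per date and type
  count(date, type, wt = cases, name = "cases") %>%
  # Add a zero-case row for every date/type pair that is missing
  complete(date, type, fill = list(cases = 0L)) %>%
  # Cumulative sum over time within each type
  group_by(type) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(cumsum = cumsum(cases)) %>%
  ungroup()

ggplot(toy_plot, aes(date, cumsum, fill = type)) +
  geom_area()
```

Note that complete() only fills in dates that already occur for at least one type. Calendar days with no rows at all stay absent, which is fine for an area plot because the line is simply drawn across the gap.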