Stacked Area Plot with ggplot in R: How to only use the highest y per corresponding x?

I'm trying to create a stacked area plot, but it looks terrible (see the link below).

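Below is my data. The date should go on the x-axis and the cases on the y-axis. However, the same date appears several times with different numbers of cases. When that happens, I want the date to appear only once, represented by the sum of the cases for that particular date (and that particular type).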

Also note that the stacked area plot has to be split into 3 types (the "type" column in the data below).
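As a minimal sketch of the rule described above (one row per date and type, holding the summed cases), assuming a hypothetical four-row subset of the data with only the `date`, `type` and `cases` columns, dplyr's `group_by()` plus `summarise()` collapses the repeated dates:

```r
library(dplyr)

# Hypothetical subset of the data: two dates, each appearing twice, one type only
df <- data.frame(
  date  = as.Date(c("2020-01-28", "2020-01-28", "2020-01-29", "2020-01-29")),
  type  = "confirmed",
  cases = c(1L, 3L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Collapse repeated dates into one row per (date, type) holding the summed cases
daily <- df %>%
  group_by(date, type) %>%
  summarise(cases = sum(cases)) %>%
  ungroup()

print(daily)
```

Here 2020-01-28 becomes a single row with 4 cases and 2020-01-29 a single row with 2 cases.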

My data looks like this:

# Groups:   type [3]
   Province.State Country.Region   Lat  Long date       cases type      loc    total cumsum
   <chr>          <chr>          <dbl> <dbl> <date>     <int> <chr>     <chr>  <int>  <int>
 1 ""             France            47     2 2020-01-24     2 confirmed Europe     2      2
 2 ""             France            47     2 2020-01-25     1 confirmed Europe     1      3
 3 ""             Germany           51     9 2020-01-27     1 confirmed Europe     1      4
 4 ""             France            47     2 2020-01-28     1 confirmed Europe     4      5
 5 ""             Germany           51     9 2020-01-28     3 confirmed Europe     4      8
 6 ""             Finland           64    26 2020-01-29     1 confirmed Europe     2      9
 7 ""             France            47     2 2020-01-29     1 confirmed Europe     2     10
 8 ""             Germany           51     9 2020-01-31     1 confirmed Europe     6     11
 9 ""             Italy             43    12 2020-01-31     2 confirmed Europe     6     13
10 ""             Sweden            63    16 2020-01-31     1 confirmed Europe     6     14
# ... with 378 more rows

The plot so far looks like this:

Ugly stacked area plot so far

Based on the example data given and the description of the desired plot...

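  1. For type = "death" I simply duplicated the given data, just as an example.
  2. From the description it is not entirely clear what the final plot should look like, e.g. would you show different countries or regions?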
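Point 1 amounts to something like the following sketch. `confirmed` is a hypothetical three-row stand-in for the real data; its rows are copied with `bind_rows()` and relabelled, so the "death" values are placeholders, not real figures:

```r
library(dplyr)

# Hypothetical stand-in for the "confirmed" rows of the question's data
confirmed <- data.frame(
  date  = as.Date(c("2020-01-24", "2020-01-25", "2020-01-27")),
  cases = c(2L, 1L, 1L),
  type  = "confirmed",
  stringsAsFactors = FALSE
)

# Copy the same rows and relabel them as "death", purely as placeholder data
dataset_demo <- bind_rows(confirmed, mutate(confirmed, type = "death"))

table(dataset_demo$type)
```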

So I just made a stacked plot of the cumulative cases by date and type. Try this:

library(ggplot2)
library(dplyr)

dataset <- structure(list(
  id = c(
    "1", "2", "3", "4", "5", "6", "7", "8",
    "9", "10", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"
  ),
  Province.State = c(
    "\"\"", "\"\"", "\"\"", "\"\"", "\"\"",
    "\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\"",
    "\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\""
  ),
  Country.Region = c(
    "France", "France", "Germany", "France",
    "Germany", "Finland", "France", "Germany", "Italy", "Sweden",
    "France", "France", "Germany", "France", "Germany", "Finland",
    "France", "Germany", "Italy", "Sweden"
  ), Lat = c(
    47L, 47L,
    51L, 47L, 51L, 64L, 47L, 51L, 43L, 63L, 47L, 47L, 51L, 47L,
    51L, 64L, 47L, 51L, 43L, 63L
  ), Long = c(
    2L, 2L, 9L, 2L, 9L,
    26L, 2L, 9L, 12L, 16L, 2L, 2L, 9L, 2L, 9L, 26L, 2L, 9L, 12L,
    16L
  ), date = structure(c(
    18285, 18286, 18288, 18289, 18289,
    18290, 18290, 18292, 18292, 18292, 18285, 18286, 18288, 18289,
    18289, 18290, 18290, 18292, 18292, 18292
  ), class = "Date"),
  cases = c(
    2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 1L, 2L, 1L,
    1L, 1L, 3L, 1L, 1L, 1L, 2L, 1L
  ), type = c(
    "confirmed", "confirmed",
    "confirmed", "confirmed", "confirmed", "confirmed", "confirmed",
    "confirmed", "confirmed", "confirmed", "death", "death",
    "death", "death", "death", "death", "death", "death", "death",
    "death"
  ), loc = c(
    "Europe", "Europe", "Europe", "Europe",
    "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
    "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
    "Europe", "Europe", "Europe", "Europe"
  ), total = c(
    2L, 1L,
    1L, 4L, 4L, 2L, 2L, 6L, 6L, 6L, 2L, 1L, 1L, 4L, 4L, 2L, 2L,
    6L, 6L, 6L
  ), cumsum = c(
    2L, 3L, 4L, 5L, 8L, 9L, 10L, 11L,
    13L, 14L, 2L, 3L, 4L, 5L, 8L, 9L, 10L, 11L, 13L, 14L
  )
), class = c(
  "tbl_df",
  "tbl", "data.frame"
), row.names = c(NA, -20L))

dataset_plot <- dataset %>%
  # Number of cases by date, type
  count(date, type, wt = cases, name = "cases") %>%
  # Cumulated sum over time by type
  group_by(type) %>%
  arrange(date) %>%
  mutate(cumsum = cumsum(cases))

# geom_area() stacks the areas by fill group (position = "stack" is the default)
ggplot(dataset_plot, aes(date, cumsum, fill = type)) +
  geom_area()

Created on 2020-03-18 by the reprex package (v0.3.0)
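The title asks for the highest y per x, while the body asks for the sum per date. If the question's existing `cumsum` column is a running total in date order within each type (an assumption based on the printed sample), both readings give the same curve: the largest running total on a given date equals the running total of the daily sums. The sketch below uses the ten printed "confirmed" rows as hypothetical input, computes both versions and checks that they agree:

```r
library(dplyr)

# The ten printed "confirmed" rows; assumes `cumsum` is already a running
# total in date order within each type
df <- data.frame(
  date   = as.Date(c("2020-01-24", "2020-01-25", "2020-01-27", "2020-01-28",
                     "2020-01-28", "2020-01-29", "2020-01-29", "2020-01-31",
                     "2020-01-31", "2020-01-31")),
  cases  = c(2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 1L),
  type   = "confirmed",
  cumsum = c(2L, 3L, 4L, 5L, 8L, 9L, 10L, 11L, 13L, 14L),
  stringsAsFactors = FALSE
)

# Title reading: keep only the highest existing running total per date and type
by_max <- df %>%
  group_by(date, type) %>%
  summarise(cumsum = max(cumsum)) %>%
  ungroup()

# Body reading: sum cases per date and type, then recompute the running total
by_sum <- df %>%
  count(date, type, wt = cases, name = "cases") %>%
  group_by(type) %>%
  arrange(date) %>%
  mutate(cumsum = cumsum(cases)) %>%
  ungroup()

print(by_max$cumsum)
print(by_sum$cumsum)
```

On these rows both give 2, 3, 4, 8, 10 and 14. The `max()` route is only correct if the existing `cumsum` column was built in the right order; recomputing the running total from `cases` does not depend on that.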