Plotting multiple lines over time in ggplot2; hoping to better distinguish the lines

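I'm posting mainly because I really think I'm overcomplicating this. I'm creating a plot of 12 different lines over time. I'd like every day to be shown on the x-axis, with the "title" underneath each date.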

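I have tried a few solutions and what I have "works", but not very well. Ignore the placeholders I have in there. I would like some points to augment the lines and show more clearly where people are. My code seems a bit long-winded; maybe there is a better way to do this.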

riddle_log <- structure(list(date = structure(c(1559779200, 1559865600, 1560124800, 
1560211200, 1560297600, 1560384000, 1560470400, 1560470400, 1560470400, 
1560729600, 1560729600, 1560816000, 1560902400, 1560988800, 1561075200, 
1561334400), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    title = c("The Midget", "Bowling Balls", "Poisonous Ice", 
    "Dog Crosses River", "Camel Race", "Two Masked Men", "The Cabin", 
    "Black Truck", "Burglary", "Japanese Ship", "Haunted Floor", 
    "East and West", "Filling the Room", "Untied", "Window Jumper", 
    "Window Faller"), Brigid = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0), Carly = c(0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 
    3, 3, 3, 3, 3, 3), Christian = c(1, 1, 1, 1, 1, 1, 1, 1, 
    2, 2, 3, 3, 3, 3, 4, 4), Daniel = c(0, 0, 0, 0, 0, 1, 1, 
    2, 2, 2, 2, 3, 3, 3, 3, 3.5), Jess = c(0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Luke = c(0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Mara = c(0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Marcus = c(0, 0, 0, 0, 0, 
    0, 0, 0, 0, 1, 2, 2, 3, 3, 3, 3.5), Nassim = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Nathalie = c(0, 0, 1, 
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Neil = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
-16L), class = c("tbl_df", "tbl", "data.frame"))

library(tidyverse)
library(ggthemes)

line1 <- riddle_log %>% 
  select(date, Brigid)

line2 <- riddle_log %>% 
  select(date, Carly)

line3 <- riddle_log %>% 
  select(date, Christian)

line4 <- riddle_log %>% 
  select(date, Daniel)

line5 <- riddle_log %>% 
  select(date, Jess)

line6 <- riddle_log %>% 
  select(date, Luke)

line7 <- riddle_log %>% 
  select(date, Mara)

line8 <- riddle_log %>% 
  select(date, Marcus)

line9 <- riddle_log %>% 
  select(date, Nassim)

line10 <- riddle_log %>% 
  select(date, Nathalie)

line11 <- riddle_log %>% 
  select(date, Neil)

ggplot() + 
  geom_line(data = line1, aes(x = date, y = Brigid, color = "a")) +
  geom_line(data = line2, aes(x = date, y = Carly, color = "b")) +
  geom_line(data = line3, aes(x = date, y = Christian, color = "c")) +
  geom_line(data = line4, aes(x = date, y = Daniel, color = "d")) +
  geom_line(data = line5, aes(x = date, y = Jess, color = "e")) +
  geom_line(data = line6, aes(x = date, y = Luke, color = "f")) +
  geom_line(data = line7, aes(x = date, y = Mara, color = "g")) +
  geom_line(data = line8, aes(x = date, y = Marcus, color = "h")) +
  geom_line(data = line9, aes(x = date, y = Nassim, color = "i")) +
  geom_line(data = line10, aes(x = date, y = Nathalie, color = "j")) +
  geom_line(data = line11, aes(x = date, y = Neil, color = "k")) +
  scale_color_manual(name = "Analysts", 
                     values = c("a" = "blue", "b" = "red", "c" = "orange", "d" = "black",
                                "e" = "steelblue", "f" = "blue", "g" = "blue", "h" = "blue",
                                "i" = "blue", "j" = "blue", "k" = "blue")) +
  xlab('Date') +
  ylab('Wins') +
  ggtitle(" NAME ") 

#+
 # scale_x_date(breaks = as.Date(c("2019-05-01", "2019-08-15")))



 # scale_x_discrete(name, breaks, labels, limits)

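In summary, I would like to add four things:
- All dates shown on the x-axis. Weekends are skipped in the data, but I don't want gaps for them in the plot; the days should be treated as consecutive.
- If it is possible to incorporate the titles somehow, that would be cool, although I'm struggling to think how, since some days have multiple titles.
- A more distinct way to see the progress of all the lines, rather than the bad overlap happening here.
- Points.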

If there are any themes better suited to this kind of problem, I'm open to them.

Here is an example of converting to "long" data, which makes things easier in ggplot. I also added a geom_jitter layer to make the overlapping days easier to see.

riddle_log %>%
  tidyr::gather(Analyst, Wins, -c(date, title)) %>%
  ggplot(aes(x = date, y = Wins, color = Analyst)) +
  geom_line() +
  geom_jitter( width = 0, shape = 21, alpha = 0.7) + # one way to show daily overlap
  scale_color_manual(name = "Analysts", 
                     values = c("Brigid" = "blue", "Carly" = "red", 
                                "Christian" = "orange", "Daniel" = "black",
                                "Jess" = "steelblue", "Luke" = "blue", 
                                "Mara" = "blue", "Marcus" = "blue",
                                "Nassim" = "blue", "Nathalie" = "blue", 
                                "Neil" = "blue"))

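First of all, you are right that your code is "a little long winded". To take advantage of ggplot, you should keep your data in tidy ("tall") format, with one variable for the "person" and another for that person's score. This is easy to do with gather() from the tidyr package: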

riddle_log2 <- riddle_log %>%
  tidyr::gather("Analyst", "Wins", Brigid:Neil)
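As a side note, newer versions of tidyr (1.0.0 and later) also offer pivot_longer() for the same reshaping. Below is a minimal sketch on a small made-up stand-in for riddle_log; the `toy` data frame and its values are hypothetical and only there to keep the example self-contained.

```r
library(tidyr)

# Toy stand-in for riddle_log (hypothetical values, two analysts only)
toy <- data.frame(
  date  = as.Date(c("2019-06-06", "2019-06-07")),
  title = c("Riddle A", "Riddle B"),
  Carly = c(0, 1),
  Neil  = c(0, 0)
)

# One row per date/analyst pair; the analyst columns are stacked into
# a name column ("Analyst") and a value column ("Wins")
toy_long <- pivot_longer(toy,
                         cols      = Carly:Neil,
                         names_to  = "Analyst",
                         values_to = "Wins")
toy_long
```

The rows come out in a different order than with gather() (grouped by original row rather than by column), which does not matter for ggplot but is a reason to sort explicitly before any lag-based calculation.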

Now that the data are in the format ggplot prefers, we can plot them more easily, like this:

ggplot(riddle_log2, aes(x = date, y = Wins, color = Analyst)) + 
  geom_line(size = 2)

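However, many of the lines lie on top of each other. We can try to improve the plot by drawing the first persons (who are drawn first and so end up behind the other lines) with thicker lines, for example: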

ggplot(riddle_log2, aes(x = date, y = Wins, color = Analyst)) + 
  geom_line(aes(size = Analyst)) +
  scale_size_manual(values = seq(4, 1, length = 11))

Now, this is a bit better. Next, we can improve the colors. R has a huge number of palettes available. In cases like this I often use the palettes of Paul Tol:

tol_colors = c("#332288", "#6699CC", "#88CCEE", "#44AA99", "#117733", "#999933",   
               "#DDCC77", "#661100", "#CC6677", "#882255", "#AA4499")
ggplot(riddle_log2) + 
  geom_line(aes(x = date, y = Wins, color = Analyst, size = Analyst)) +
  scale_size_manual(values = seq(5, 1, length = 11)) +
  scale_color_manual(values = tol_colors)

Now, this isn't perfect, but it is an improvement. What you should consider is splitting the plot into a bunch of subplots using facet_wrap():

gg <- ggplot(riddle_log2, aes(x = date, y = Wins, color = Analyst)) + 
  geom_line(size = 2) +
  scale_color_manual(values = tol_colors) + 
  facet_wrap(~Analyst) 
gg

I think that is a better option in this case.

Next, you also wanted the x axis to show all dates. There is a bit too little space to label every day, so here I show a label for every second day:

gg + 
  scale_x_datetime(breaks = "2 day", date_labels = "%d. %b") +
  theme(axis.text.x = element_text(hjust = 0, angle = -45))

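As you can see, formatting the labels is not simple, but it is very flexible. The codes that control how the time/date is displayed are especially cryptic; in this case, %d means day of the month and %b means abbreviated month name. The other codes can be found by running ?strptime.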
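The same format codes work with format(), which also suggests one possible way to get rid of the weekend gaps you asked about: turn the formatted dates into a factor so the x axis becomes discrete. This is only a sketch on made-up data (the `toy` data frame is hypothetical), and it assumes the rows are already sorted by date.

```r
library(ggplot2)

# Toy data (hypothetical): a Friday, then the following Monday and Tuesday
toy <- data.frame(
  date = as.POSIXct(c("2019-06-07", "2019-06-10", "2019-06-11"), tz = "UTC"),
  Wins = c(0, 1, 2)
)

# format() understands the same codes as date_labels
format(toy$date, "%d. %b")   # the month abbreviation depends on the locale

# A factor of formatted dates gives a discrete x axis, so days without
# data (the weekend here) take up no space on the axis
day_labels <- format(toy$date, "%d. %b")
toy$day <- factor(day_labels, levels = unique(day_labels))

# group = 1 is needed so geom_line() connects points across factor levels
p <- ggplot(toy, aes(x = day, y = Wins, group = 1)) +
  geom_line() +
  geom_point()
p
```

With the long data from above you would map group = Analyst instead of group = 1. The trade-off is that the axis no longer reflects real elapsed time.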

Finally, we add the "title" of the day whenever the "Wins" score increases. We first add a variable 'Wins_increase' that holds the increase in wins:

riddle_log2 <- riddle_log2 %>%
  arrange(Analyst, date) %>%                # Make sure the sorting is correct
  group_by(Analyst) %>%                     # 'Wins_increase' will be calculated for every Analyst 
  mutate(Wins_increase = Wins - lag(Wins))  # How much 'Wins' have increased since last day
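One detail worth knowing: lag() returns NA for the first row of each group, so an analyst who already has a win on the very first day gets no increase recorded there. If that matters for your plot, one option is the `default` argument of dplyr's lag(). The sketch below uses made-up data (the `toy` data frame and the column names `inc_na` and `inc_zero` are hypothetical) to show the difference.

```r
library(dplyr)

# Toy data (hypothetical): analyst "A" already has a win on the first day
toy <- data.frame(
  Analyst = c("A", "A", "A", "B", "B"),
  Wins    = c(1, 1, 2, 0, 1)
)

toy_inc <- toy %>%
  group_by(Analyst) %>%
  mutate(
    inc_na   = Wins - lag(Wins),               # first row per analyst is NA
    inc_zero = Wins - lag(Wins, default = 0)   # first row is compared against 0
  ) %>%
  ungroup()
toy_inc
```

Since filter() drops rows where the condition is NA, the first variant silently skips a first-day win, while the second keeps it.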

Then we use geom_text() to add rotated labels:

gg + scale_x_datetime(breaks = "2 day", date_labels = "%d. %b") +  # as before
  theme(axis.text.x = element_text(hjust = 0, angle = -45)) +      # as before
  geom_text(data = riddle_log2 %>% filter(Wins_increase > 0),      # Pick only where "Wins" is increasing
            aes(y = Wins + 0.3, label = title),                    # We add 0.3 to lift the labels a bit
            hjust = 0, angle = 90, size = 2)                       # Left-adjust and rotate labels

The next issue to solve is the overlap between Marcus's labels (because he won twice on the same day). This can be fixed using the ggrepel package.
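A minimal sketch of what that could look like, assuming ggrepel is installed. The `labels` data frame is a small hypothetical extract with two titles on the same day, not the full data set.

```r
library(ggplot2)
library(ggrepel)   # assumption: installed, e.g. via install.packages("ggrepel")

# Toy data (hypothetical extract): two titles fall on the same day
labels <- data.frame(
  date  = as.POSIXct(c("2019-06-17", "2019-06-17", "2019-06-18"), tz = "UTC"),
  Wins  = c(1, 2, 2),
  title = c("Japanese Ship", "Haunted Floor", "East and West")
)

# geom_text_repel() nudges labels apart so they do not overlap;
# seed makes the label placement reproducible between runs
p <- ggplot(labels, aes(x = date, y = Wins)) +
  geom_line() +
  geom_point() +
  geom_text_repel(aes(label = title), size = 3, seed = 1)
p
```

In the faceted plot above, the idea would be to swap geom_text() for geom_text_repel() and keep the same filtered data and label mapping.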