使用具有重复 y 因子的 ggplot 在基于 geom_segment 的甘特图中排序条形图

Question

问题

我有一个数据集，se.df（问题底部的数据），我通过使用 ggplot 和 facet_grid 将其可视化为分解甘特图。但是，y 标签没有按照我指定的顺序排列 aes

library(ggplot2)
base <- ggplot(
  se.df,
  aes(
    x = Start.Date, reorder(Action,Start.Date), color = Comms.Type
  ))
base + geom_segment(aes(
  xend = End.Date,ystart = Action, yend = Action
), size = 5) + 
facet_grid(Source ~ .,scale = "free_y",space = "free_y", drop = TRUE)

在这张详细图片中，您可以看到有以下条形：

未显示在 Start.Date 订单中
Action 未订购。为澄清起见，柱状图应按 Start.Date 排序，然后按字母顺序 Action

如何根据 Start.Date 然后按 Action 对每个因素内的条形图进行排序？

更新

@heathobrien 提供了一个解决方案，解决了我按 Start.Date 订购金条的问题，而不是重复因素引起的问题——这是我的实际数据所具有的问题。

Action 中有两个 "Inform colleges" 的实例，这导致 @heathobrien 的以下代码顺序错误，在图像中用红色虚线椭圆突出显示：

se.df <-se.df[order(se.df$Start.Date,se.df$Action),]
se.df$Action <- factor(se.df$Action, levels=unique(se.df$Action))
ggplot(se.df, aes(x = Start.Date, color = Comms.Type)) +
  geom_segment(aes(xend = End.Date, y = Action, yend = Action), size = 5) +
  facet_grid(Source ~ .,scale = "free_y",space = "free_y", drop = TRUE)

如何将此 data.frame 提供给 ggplot，以便每个 facet_grid 中的顺序一致？

数据

se.df <- structure(list(Source = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L), .Label = c("a", "b", "c"), class = c("ordered", 
"factor")), Action = structure(c(21L, 30L, 19L, 27L, 16L, 17L, 
18L, 13L, 12L, 3L, 1L, 8L, 4L, 21L, 20L, 27L, 15L, 17L, 18L, 
14L, 26L, 2L, 8L, 5L, 22L, 26L, 2L, 8L, 5L, 22L, 22L, 11L, 7L, 
24L, 29L, 6L, 23L, 25L, 25L, 10L, 28L, 9L), .Label = c("Add OA \"Act on Acceptance\" to websites", 
"Add RDM liaison presece to divisional and departmental websites", 
"All-staff message from VC and/or Pro-VC (Research)", "Arrange OA Briefing for every department", 
"Arrange RDM Briefing for every department", "Brief Communication Officers Network", 
"Brief Conference of Colleges", "Brief divisional board/commitees", 
"Brief Faculty IT Officers", "Brief Research Committee", "Brief Senior Tutors", 
"Brief/mobilise internal comms officers", "Brief/mobilise ORFN", 
"Brief/mobilise Subject Librarians", "Ceate template slides for colleagues to use in delivering RDM Briefings", 
"Create template slides for colleagues to use in delivering OA Briefings", 
"Create template text & icon for use on websites", "Draft material for use in staff induction", 
"Ensure webpages for ORA & Symplectic Elements  are updated & consistent", 
"Ensure webpages for ORA-Data are updated & consistent", "Finalise key messages and draft campaign text", 
"Inform colleges ", "Inform Heads of Departments and Research Directors", 
"Present at Departmental Administrator's Meeting", "Present at HAF meeting", 
"Present at UAS Conference", "Produce hard copy materials to promote message ", 
"Update Divisional Board", "Update Library Committee (CLIPS)", 
"Update OAO website content for HEFCE/REF"), class = "factor"), 
    Start.Date = structure(c(1435705200, 1435705200, 1438383600, 
    1441062000, 1441062000, 1441062000, 1441062000, 1444518000, 
    1444518000, 1425168000, 1420070400, 1444518000, 1444518000, 
    1441062000, 1441062000, 1441062000, 1441062000, 1441062000, 
    1441062000, 1438383600, 1441062000, 1420070400, 1444518000, 
    1444518000, 1443654000, 1441062000, 1420070400, 1444518000, 
    1444518000, 1443654000, 1441062000, 1444518000, 1449273600, 
    1444518000, 1444518000, 1445036400, 1441062000, 1443740400, 
    1443740400, 1443740400, 1447459200, 1443740400), class = c("POSIXct", 
    "POSIXt"), tzone = ""), End.Date = structure(c(1440975600, 
    1440975600, 1443567600, 1443567600, 1443567600, 1443567600, 
    1443567600, 1449273600, 1449273600, 1430348400, 1446249600, 
    1449273600, 1449273600, 1446249600, 1446249600, 1443567600, 
    1443567600, 1443567600, 1443567600, 1443567600, 1443567600, 
    1443567600, 1449014400, 1449014400, 1451520000, 1443567600, 
    1443567600, 1449014400, 1449014400, 1451520000, 1443567600, 
    1449014400, 1449619200, 1449014400, 1449014400, 1446249600, 
    1446249600, 1449792000, 1449792000, 1449792000, 1447804800, 
    1449792000), class = c("POSIXct", "POSIXt"), tzone = ""), 
    Comms.Type = structure(c(3L, 7L, 7L, 6L, 5L, 7L, 8L, 4L, 
    4L, 2L, 7L, 1L, 1L, 3L, 7L, 6L, 5L, 7L, 8L, 4L, 5L, 7L, 1L, 
    1L, 5L, 5L, 7L, 1L, 1L, 5L, 1L, 1L, 1L, 5L, 3L, 1L, 1L, 5L, 
    5L, 1L, 1L, 1L), .Label = c("Briefing", "Email", "Mixed Media", 
    "Mobilisation", "Presentations", "Printed Materials", "Website", 
    "Workshop"), class = "factor")), .Names = c("Source", "Action", 
"Start.Date", "End.Date", "Comms.Type"), row.names = c(NA, -42L
), class = c("tbl_df", "tbl", "data.frame"))

Answer 1

按照您想要的顺序对数据框进行排序后，您应该能够将其用作因子的级别：

se.df <-se.df[order(se.df$Start.Date,se.df$Action),]
se.df$Action <- factor(se.df$Action, levels=unique(se.df$Action))
ggplot(se.df, aes(x = Start.Date, color = Comms.Type)) +
  geom_segment(aes(xend = End.Date, y = Action, yend = Action), size = 5) +
  facet_grid(Source ~ .,scale = "free_y",space = "free_y", drop = TRUE)

Answer 2

我认为这就是 OP 正在寻找的：

我不得不创建一个合成的 taskID 来传递订单（通过增加 Start.Date，alpha by Action）。顺便说一句，如果您想按 Action 的字母顺序排序，则需要更改因子的顺序或转换为字符。

# first let's order the DF the way we want it to appear 
#    (higher taskID's first)

# dplyr-free version
se.df$Action <- as.character(se.df$Action)  
se.df <- se.df[order(se.df$Start.Date, se.df$Action), ]
se.df$taskID <- as.factor(nrow(se.df):1)


library(ggplot2)
ggplot(se.df, aes(x = Start.Date, y=taskID, color = Comms.Type)) +
  scale_y_discrete(breaks=se.df$taskID, labels = se.df$Action) + 
  geom_segment(aes(xend = End.Date, y = taskID, yend = taskID), size = 5) +
  facet_grid(Source ~ .,scale = "free_y",space = "free_y", drop = TRUE)

Answer 3

这个问题是在 tidyverse 和优秀的 forcats 库出现之前的 2015 年提出的。

这是问题的 tidyverse 解决方案：

使用arrange按Start.Date和Action排序数据，然后使用row_number()

创建task_id

library("tidyverse")
se.df <- se.df %>%
  arrange(desc(Start.Date), Action) %>%
  mutate(task_id = row_number())

使用fct_reorder将Action转换为按task_id排序的因子。

se.df <- se.df %>%
  mutate(
    Action = fct_reorder(Action, task_id),
    Action = fct_rev(Action)
  )

现在我们可以绘制此数据而无需替换轴标签：

se.df %>%
  ggplot(aes(x = Start.Date, y = Action, color = Comms.Type)) +
  geom_segment(aes(xend = End.Date, y = Action, yend = Action), size = 5) +
  facet_grid(Source ~ ., scale = "free_y", space = "free_y", drop = TRUE)

使用具有重复 y 因子的 ggplot 在基于 geom_segment 的甘特图中排序条形图

Ordering bars in geom_segment based gantt chart using ggplot with duplicate y-factors

visualization

r

ggplot2

dataframe

问题

更新

更多细节

数据