Data Manipulation/Imputation for Timeline using R

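I am trying to create a timeline using the vistime package in R. The problem I am running into is creating rows for the periods where no data exists, so that the timeline is continuous.

Doing this by hand can be very tedious, so I would like a way to automatically fill the periods that have no data with a default label.

Here is a sample of the data and the current output: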

library(vistime)

syst <- data.frame(Position = rep(c("DOWN"), each = 5),
                   Name = c("SYS2", "SYS2", "SYS4", "SYS4", "SYS6"),
                   start = c("2018-10-16", "2018-12-06", "2018-10-24", "2018-12-05", "2018-11-09"),
                   end = c("2018-11-26", "2018-12-31", "2018-11-23", "2018-12-31", "2018-12-31"),
                   color = rep(c("#FF0000"), each = 5),
                   fontcolor = rep(c("white"), each = 5))

vistime(syst, events = "Position", groups = "Name")

Desired output:

syst2 <- data.frame(Position = rep(c("UP", "DOWN"), 5),
                    Name = rep(c("SYS2", "SYS2", "SYS4", "SYS4", "SYS6"), each = 2),
                    start = c("2018-10-01", "2018-10-16", "2018-11-26", "2018-12-06",
                              "2018-10-01", "2018-10-24", "2018-11-23", "2018-12-05",
                              "2018-10-01", "2018-11-09"),
                    end = c("2018-10-16", "2018-11-26", "2018-12-06", "2018-12-31",
                            "2018-10-24", "2018-11-23", "2018-12-05", "2018-12-31",
                            "2018-11-09", "2018-12-31"),
                    color = rep(c("#008000", "#FF0000"), 5),
                    fontcolor = rep(c("white"), 10))


vistime(syst2, events = "Position", groups = "Name")

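We can do this as follows. First let

rng <- c("2018-10-01", "2018-12-31")

be the vector holding the start and end dates of the period you are considering. In addition, I added stringsAsFactors = FALSE to the definition of syst, to avoid problems when adding new dates.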
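
For concreteness, the adjusted definition of syst might look like the sketch below. It reuses the data from the question and changes only the stringsAsFactors argument. Note that since R 4.0.0 data.frame() already defaults to stringsAsFactors = FALSE, so the explicit argument mainly matters on older versions of R.

```r
# Same data as in the question, with the date columns kept as character
# vectors rather than factors, so new date strings can be appended later.
syst <- data.frame(Position = rep("DOWN", 5),
                   Name = c("SYS2", "SYS2", "SYS4", "SYS4", "SYS6"),
                   start = c("2018-10-16", "2018-12-06", "2018-10-24", "2018-12-05", "2018-11-09"),
                   end = c("2018-11-26", "2018-12-31", "2018-11-23", "2018-12-31", "2018-12-31"),
                   color = rep("#FF0000", 5),
                   fontcolor = rep("white", 5),
                   stringsAsFactors = FALSE)

# Overall period that the timeline should cover
rng <- c("2018-10-01", "2018-12-31")
```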

Then we have

library(tidyverse)
syst2 <- syst %>% group_by(Name) %>% 
  do({bind_rows(., data.frame(Position = "UP", Name = .$Name[1], 
                              start = c(rng[1], .$end),
                              end = c(.$start, rng[2]), 
                              color = "#008000", 
                              fontcolor = "white", 
                              stringsAsFactors = FALSE))}) %>%
  filter(start != end)
vistime(syst2, events = "Position", groups = "Name")

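So we group by Name and, for each group, bind the existing rows to a new data frame in which everything is specified as expected; the only trick is in how start and end are built. Finally, I filter out the rows whose start and end dates coincide.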
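
To make the start/end trick more concrete, here is a small base R sketch for a single group (SYS2), using the dates from the question. The variable name gaps is introduced only for illustration. The sketch assumes that the rows within a group are sorted by start date and that the DOWN intervals do not overlap.

```r
# Boundaries for one group (SYS2), taken from the question's data
rng   <- c("2018-10-01", "2018-12-31")
start <- c("2018-10-16", "2018-12-06")
end   <- c("2018-11-26", "2018-12-31")

# Each candidate "UP" interval runs from the previous end (or the start of
# the overall range) to the next start (or the end of the overall range).
gaps <- data.frame(start = c(rng[1], end),
                   end   = c(start, rng[2]),
                   stringsAsFactors = FALSE)

# The last candidate is empty (it starts and ends on 2018-12-31), so drop it,
# mirroring the filter(start != end) step.
gaps <- gaps[gaps$start != gaps$end, ]
print(gaps)
```

The two remaining rows are the UP periods for SYS2: 2018-10-01 to 2018-10-16 and 2018-11-26 to 2018-12-06, which match the desired output in the question.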