Group and rename based on two conditions in R (dplyr)

I have a dataset, df:

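Ultimately I would like to group the data into 'chunks' where the Folder column contains the string 'Out', taking into account the DATE and the empty Message values associated with it. Is there a way to create a chunk for each instance where 'Out' occurs together with empty Message rows, while also calculating its duration?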

  Folder               DATE                         Message
  Outdata              9/9/2019 5:46:00                   
  Outdata              9/9/2019 5:46:01
  Outdata              9/9/2019 5:46:02
  In                   9/9/2019 5:46:03            hello
  In                   9/9/2019 5:46:04            hello
  Outdata              9/10/2019 6:00:01
  Outdata              9/10/2019 6:00:02
  In                   9/11/2019 7:50:00           hello
  In                   9/11/2019 7:50:01           hello

I would like this output:

 New Variable        Duration        Message
 Outdata1              2 sec
 Outdata2              1 sec

I have included the dput output:

dput(sample)
structure(list(Folder = structure(c(2L, 2L, 2L, 1L, 1L, 2L, 2L, 
1L, 1L), .Label = c("In", "Outdata"), class = "factor"), Date = structure(c(5L, 
6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L), .Label = c("9/10/2019 6:00:01 AM", 
"9/10/2019 6:00:02 AM", "9/11/2019 7:50:00 AM", "9/11/2019 7:50:01 AM", 
"9/9/2019 5:46:00 AM", "9/9/2019 5:46:01 AM", "9/9/2019 5:46:02 AM", 
"9/9/2019 5:46:03 AM", "9/9/2019 5:46:04 AM"), class = "factor"), 
Message = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("", 
"hello"), class = "factor")), class = "data.frame", row.names = c(NA, 
-9L))

This is what I have tried. It works well; I just need to also account for the Message value being empty.

  library(dplyr)

  df %>%
    # parse the timestamps and start a new group id whenever Folder changes
    mutate(DATE = as.POSIXct(DATE, format = "%m/%d/%Y %I:%M:%S %p"),
           gr = cumsum(Folder != lag(Folder, default = first(Folder)))) %>%
    # the Folder values in the data are "Outdata", not "Out"
    filter(Folder == "Outdata") %>%
    arrange(gr, DATE) %>%
    group_by(gr) %>%
    summarise(Duration = difftime(last(DATE), first(DATE), units = "secs")) %>%
    mutate(gr = paste0('Out', row_number()))

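The code above works, but I am not sure how to also satisfy the condition that the Message value in the row is empty (Message == "").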

You could perhaps just paste the Message values together into one string.

library(dplyr)

sample  %>%
  mutate(DATE = as.POSIXct(Date, format = "%m/%d/%Y %I:%M:%S %p"), 
         gr = cumsum(Folder != lag(Folder, default = first(Folder)))) %>%  # default has the same type as Folder
  filter(Folder == "Outdata") %>%
  arrange(gr, DATE) %>%
  group_by(gr) %>%
  summarise(Duration = difftime(last(DATE), first(DATE), units = "secs"), 
            Message = paste0(Message, collapse = "")) %>%
  mutate(gr = paste0('Out', row_number()))
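
If the empty-Message condition should be enforced rather than only displayed, one possible sketch is to build the grouping flag from both conditions at once. This assumes a row belongs to a chunk only when Folder contains "Out" and Message is "". The `is_chunk` column is a hypothetical helper name, the sample data is rebuilt by hand from the dput above with character columns instead of factors, and `tz = "UTC"` is added only to keep the parsing reproducible.

```r
library(dplyr)

# Sample data rebuilt from the dput() above, using plain character columns.
sample <- data.frame(
  Folder  = c("Outdata", "Outdata", "Outdata", "In", "In",
              "Outdata", "Outdata", "In", "In"),
  Date    = c("9/9/2019 5:46:00 AM", "9/9/2019 5:46:01 AM",
              "9/9/2019 5:46:02 AM", "9/9/2019 5:46:03 AM",
              "9/9/2019 5:46:04 AM", "9/10/2019 6:00:01 AM",
              "9/10/2019 6:00:02 AM", "9/11/2019 7:50:00 AM",
              "9/11/2019 7:50:01 AM"),
  Message = c("", "", "", "hello", "hello", "", "", "hello", "hello"),
  stringsAsFactors = FALSE
)

result <- sample %>%
  mutate(
    DATE = as.POSIXct(Date, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC"),
    # Assumption: a row is part of a chunk only if Folder contains "Out"
    # AND its Message is empty. `is_chunk` is a hypothetical helper column.
    is_chunk = grepl("Out", Folder, fixed = TRUE) & Message == "",
    # Start a new group id every time is_chunk flips between TRUE and FALSE.
    gr = cumsum(is_chunk != lag(is_chunk, default = first(is_chunk)))
  ) %>%
  filter(is_chunk) %>%
  group_by(gr) %>%
  summarise(
    Duration = difftime(max(DATE), min(DATE), units = "secs"),
    Message  = first(Message)
  ) %>%
  mutate(gr = paste0("Outdata", row_number()))

print(result)
```

On the sample data this should give two rows, Outdata1 with a duration of 2 secs and Outdata2 with 1 sec, both with an empty Message. An 'Outdata' row that carries a non-empty Message would end the current chunk instead of being merged into it.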