Merging two rows within a dataframe - start and end time

I hope I'm doing this right, as this is my first time posting here! I currently have a dataset that looks like this (160k entries in total):

Geocode Barrier.ID Device.ID City Date Time State.code
603 7 392 Por 31/01/2021 10:39:10 Deactivated
603 7 392 Por 31/01/2021 10:54:18 Deactivated
603 7 392 Por 31/01/2021 11:10:38 Activated
603 7 392 Por 31/01/2021 11:11:37 Deactivated
603 7 392 Por 31/01/2021 11:12:18 Activated
603 7 392 Por 31/01/2021 11:13:37 Deactivated
603 7 392 Por 31/01/2021 11:17:38 Activated
603 7 392 Por 31/01/2021 11:19:37 Deactivated
603 7 392 Por 31/01/2021 11:26:25 Activated
603 7 392 Por 31/01/2021 11:29:37 Deactivated
603 7 392 Por 31/01/2021 11:40:38 Activated
603 7 392 Por 31/01/2021 11:45:38 Activated
603 7 392 Por 31/01/2021 11:49:38 Deactivated

Raw data input:

df <- structure(list(Geocode = c("603", "603", "603", "603", "603", "603", "603", "603", "603", "603", "603", "603", "603"),
Barrier.ID = c("7",  "7", "7", "7", "7", "7", "7", "7", "7", "7", "7", "7", "7"), 
Device.ID = c("392","392", "392", "392", "392","392", "392", "392", "392", "392", "392","392", "392"), 
City = c("Por", "Por", "Por", "Por", "Por", "Por", "Por", "Por", "Por", "Por", "Por", "Por", "Por"),
Date = c("31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021"), 
Time = c("10:39:10", "10:54:18", "11:10:38", "11:11:37", "11:12:18", "11:13:37", 
"11:17:38", "11:19:37", "11:26:25", "11:29:37", "11:40:38", "11:45:38", "11:49:38"), 
State.code = c("Deactivated", "Deactivated", "Activated", "Deactivated", 
"Activated", "Deactivated", "Activated", "Deactivated", "Activated", 
"Deactivated", "Activated", "Activated", "Deactivated")), 
row.names = c(NA, 13L), 
class = "data.frame")

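I would like to create something like the table below. Note that the first two rows of the first table have been dropped, because an event must always start with an "Activated" state code. Row 11 of the table above has also been dropped, because it is an incomplete event: there is only an "Activated" entry, with no matching "Deactivated" entry.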

Geocode Barrier.ID Device.ID City Date State.code Activated Deactivated
603 7 392 Por 31/01/2021 Activated 11:10:38 11:11:37
603 7 392 Por 31/01/2021 Activated 11:12:18 11:13:37
603 7 392 Por 31/01/2021 Activated 11:17:38 11:19:37
603 7 392 Por 31/01/2021 Activated 11:26:25 11:29:37
603 7 392 Por 31/01/2021 Activated 11:45:38 11:49:38

Depending on the value found in State.code, the time is either an "Activated" or a "Deactivated" time. I managed to split the times using:

df$Activated <- ifelse(df$State.code == "Activated", df$Time, NA)
df$Deactivated <- ifelse(df$State.code == "Deactivated", df$Time, NA)

Which gave me:

Geocode Barrier.ID Device.ID City Date Time State.code Activated Deactivated
603 7 392 Por 31/01/2021 10:39:10 Deactivated NA 10:39:10
603 7 392 Por 31/01/2021 10:54:18 Deactivated NA 10:54:18
603 7 392 Por 31/01/2021 11:10:38 Activated 11:10:38 NA
603 7 392 Por 31/01/2021 11:11:37 Deactivated NA 11:11:37
603 7 392 Por 31/01/2021 11:12:18 Activated 11:12:18 NA
603 7 392 Por 31/01/2021 11:13:37 Deactivated NA 11:13:37
603 7 392 Por 31/01/2021 11:17:38 Activated 11:17:38 NA
603 7 392 Por 31/01/2021 11:19:37 Deactivated NA 11:19:37
603 7 392 Por 31/01/2021 11:26:25 Activated 11:26:25 NA
603 7 392 Por 31/01/2021 11:29:37 Deactivated NA 11:29:37
603 7 392 Por 31/01/2021 11:40:38 Activated 11:40:38 NA
603 7 392 Por 31/01/2021 11:45:38 Activated 11:45:38 NA
603 7 392 Por 31/01/2021 11:49:38 Deactivated NA 11:49:38

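But then I got stuck, and I don't know how to proceed. So my questions are:

  1. How can I merge these rows so that the "Activated" and "Deactivated" times appear in a single row of the data frame (as in the second table)?

  2. How can I exclude "incomplete" events (cases where an event is missing its corresponding "Activated" or "Deactivated" time)?

I thought about using cbind to merge the two rows, but because of the incomplete events (i.e. a "Deactivated" entry without a matching "Activated" entry), that wouldn't work, right?

I'd really appreciate it if anyone could help me further!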

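The following works even if the data are not sorted.

Notes:

  • I have combined the Date and Time columns into a single timestamp, so that events that start on the calendar day before they are deactivated are handled correctly.
  • I have assumed that if several "Activated" states occur in a row, only the last one counts.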
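library(dplyr)
library(tidyr)

df %>% group_by(Geocode, Barrier.ID, Device.ID, City) %>%
  # combine Date and Time into one timestamp; flag the rows that start an event
  mutate(Date = as.POSIXct(paste(Date, Time), format = "%d/%m/%Y %H:%M:%S"),
         code = State.code == "Activated") %>%
  select(-Time) %>%
  arrange(Date) %>%
  # running count of activations per device, used as an event id
  mutate(code = cumsum(code)) %>%
  # code 0 marks rows that come before the first activation
  filter(code != 0) %>%
  group_by(Geocode, Barrier.ID, Device.ID, City, code) %>%
  # keep only complete events (one Activated row plus one Deactivated row)
  filter(n() == 2) %>%
  pivot_wider(id_cols = c(Geocode, Barrier.ID, Device.ID, City, code), names_from = State.code, values_from = Date) %>%
  # code is still a grouping variable here, so select() keeps it in the output;
  # put ungroup() before this line to really drop it
  select(-code)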

# A tibble: 5 x 7
# Groups:   Geocode, Barrier.ID, Device.ID, City, code [5]
   code Geocode Barrier.ID Device.ID City  Activated           Deactivated        
  <int> <chr>   <chr>      <chr>     <chr> <dttm>              <dttm>             
1     1 603     7          392       Por   2021-01-31 11:10:38 2021-01-31 11:11:37
2     2 603     7          392       Por   2021-01-31 11:12:18 2021-01-31 11:13:37
3     3 603     7          392       Por   2021-01-31 11:17:38 2021-01-31 11:19:37
4     4 603     7          392       Por   2021-01-31 11:26:25 2021-01-31 11:29:37
5     6 603     7          392       Por   2021-01-31 11:45:38 2021-01-31 11:49:38
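
To make the grouping step easier to follow, here is a minimal sketch of the cumsum idea on its own. The `state` vector is a made-up toy sequence (not the full dataset), and the names `event_id`, `rows_per_event` and `complete_ids` are only illustrative.

```r
# Hypothetical toy sequence of state codes, assumed to be sorted by time already.
state <- c("Deactivated", "Activated", "Deactivated",
           "Activated", "Activated", "Deactivated")

# Each "Activated" row opens a new candidate event, so the running count of
# activations works as an event id. Rows before the first activation get 0.
event_id <- cumsum(state == "Activated")
event_id
# [1] 0 1 1 2 3 3

# Count rows per event id. An id with exactly two rows holds one activation
# followed by one deactivation, i.e. a complete event.
rows_per_event <- table(event_id)
complete_ids <- as.integer(names(rows_per_event)[rows_per_event == 2])

# Id 0 never contains an activation, so it is excluded explicitly.
complete_ids <- setdiff(complete_ids, 0L)
complete_ids
# [1] 1 3
```

Event id 2 has a single row (an activation that is immediately superseded by another activation), which is why it is discarded; this mirrors the gap between code 4 and code 6 in the output above.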
  • However, if you also have several "Deactivated" states in a row, the following strategy will work:
library(data.table)  # for rleid()
df %>% group_by(Geocode, Barrier.ID, Device.ID, City) %>%
  mutate(Date = as.POSIXct(paste(Date, Time), format = "%d/%m/%Y %H:%M:%S"),
         code = State.code == "Activated") %>%
  select(-Time) %>%
  arrange(Date) %>%
  # code: event id (running count of activations)
  # code2: id of each run of consecutive identical states
  mutate(code = cumsum(code),
         code2 = rleid(State.code)) %>%
  filter(code != 0) %>%
  group_by(Geocode, Barrier.ID, Device.ID, City, code) %>%
  # drop activations that have no other row in their event
  filter(n() != 1) %>%
  # within each run of repeated states, keep only the last row
  group_by(code2) %>% slice_tail() %>%
  pivot_wider(id_cols = c(Geocode, Barrier.ID, Device.ID, City, code), names_from = State.code, values_from = Date) %>%
  select(-code)

  Geocode Barrier.ID Device.ID City  Activated           Deactivated        
  <chr>   <chr>      <chr>     <chr> <dttm>              <dttm>             
1 603     7          392       Por   2021-01-31 11:10:38 2021-01-31 11:11:37
2 603     7          392       Por   2021-01-31 11:13:37 2021-01-31 11:19:37
3 603     7          392       Por   2021-01-31 11:26:25 2021-01-31 11:29:37
4 603     7          392       Por   2021-01-31 11:45:38 2021-01-31 11:49:38
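
As a small illustration of what rleid() contributes here, the sketch below labels runs of identical states in a made-up vector and then keeps the last element of each run. The vector and the variable names are assumptions for the example; the base R lines are just one equivalent way to get the same run ids.

```r
library(data.table)

# Hypothetical toy sequence with repeated states.
state <- c("Activated", "Activated", "Deactivated", "Deactivated", "Activated")

# rleid() gives every run of consecutive identical values its own id.
run_id <- rleid(state)
run_id
# [1] 1 1 2 2 3

# The same ids built with base R's rle(), for comparison.
runs <- rle(state)
base_run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Keeping only the last element of each run corresponds to
# group_by(code2) %>% slice_tail() in the pipeline above.
last_of_run <- !duplicated(run_id, fromLast = TRUE)
which(last_of_run)
# [1] 2 4 5
```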

The modified df used for this second example (rows 6 and 7 differ from the data in the question):

> df
   Geocode Barrier.ID Device.ID City       Date     Time  State.code
1      603          7       392  Por 31/01/2021 10:39:10 Deactivated
2      603          7       392  Por 31/01/2021 10:54:18 Deactivated
3      603          7       392  Por 31/01/2021 11:10:38   Activated
4      603          7       392  Por 31/01/2021 11:11:37 Deactivated
5      603          7       392  Por 31/01/2021 11:12:18   Activated
6      603          7       392  Por 31/01/2021 11:13:37   Activated
7      603          7       392  Por 31/01/2021 11:17:38 Deactivated
8      603          7       392  Por 31/01/2021 11:19:37 Deactivated
9      603          7       392  Por 31/01/2021 11:26:25   Activated
10     603          7       392  Por 31/01/2021 11:29:37 Deactivated
11     603          7       392  Por 31/01/2021 11:40:38   Activated
12     603          7       392  Por 31/01/2021 11:45:38   Activated
13     603          7       392  Por 31/01/2021 11:49:38 Deactivated
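
For comparison, here is a rough sketch of a different pairing rule, not the approach used above: each "Deactivated" row is matched with the row directly before it, and the pair is kept only if that previous row is "Activated". The helper name `pair_events`, the `Stamp` and `prev_state` columns, the `tz = "UTC"` setting and the six-row `events` sample (a subset of the rows in the question) are all assumptions made for this illustration. Note that with several "Deactivated" rows in a row this rule keeps the first one, whereas the second strategy above keeps the last.

```r
library(dplyr)

# Small sample built from rows 1, 3, 4, 11, 12 and 13 of the question's data.
events <- data.frame(
  Geocode = "603", Barrier.ID = "7", Device.ID = "392", City = "Por",
  Date = "31/01/2021",
  Time = c("10:39:10", "11:10:38", "11:11:37", "11:40:38", "11:45:38", "11:49:38"),
  State.code = c("Deactivated", "Activated", "Deactivated",
                 "Activated", "Activated", "Deactivated")
)

# Hypothetical helper: pair every deactivation with the activation right before it.
pair_events <- function(df) {
  df %>%
    # tz = "UTC" is an assumption; use the time zone the devices log in
    mutate(Stamp = as.POSIXct(paste(Date, Time),
                              format = "%d/%m/%Y %H:%M:%S", tz = "UTC")) %>%
    group_by(Geocode, Barrier.ID, Device.ID, City) %>%
    arrange(Stamp, .by_group = TRUE) %>%
    # look one row back within each device
    mutate(prev_state = lag(State.code), Activated = lag(Stamp)) %>%
    # keep deactivations whose previous row is an activation;
    # the first row of a device has prev_state NA and is dropped by filter()
    filter(State.code == "Deactivated", prev_state == "Activated") %>%
    ungroup() %>%
    select(Geocode, Barrier.ID, Device.ID, City, Activated, Deactivated = Stamp)
}

paired <- pair_events(events)
format(paired$Activated, "%H:%M:%S")
# [1] "11:10:38" "11:45:38"
format(paired$Deactivated, "%H:%M:%S")
# [1] "11:11:37" "11:49:38"
```

The leading "Deactivated" row and the unanswered 11:40:38 activation both fall away without a separate filtering step, because neither takes part in an Activated/Deactivated pair of adjacent rows.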