Merging two rows within a dataframe - start and end time

Question

I hope I'm doing this right, since this is my first post here! I currently have a dataset that looks like this (160k entries in total):

Geocode	Barrier.ID	Device.ID	City	Date	Time	State.code
603	7	392	Por	31/01/2021	10:39:10	Deactivated
603	7	392	Por	31/01/2021	10:54:18	Deactivated
603	7	392	Por	31/01/2021	11:10:38	Activated
603	7	392	Por	31/01/2021	11:11:37	Deactivated
603	7	392	Por	31/01/2021	11:12:18	Activated
603	7	392	Por	31/01/2021	11:13:37	Deactivated
603	7	392	Por	31/01/2021	11:17:38	Activated
603	7	392	Por	31/01/2021	11:19:37	Deactivated
603	7	392	Por	31/01/2021	11:26:25	Activated
603	7	392	Por	31/01/2021	11:29:37	Deactivated
603	7	392	Por	31/01/2021	11:40:38	Activated
603	7	392	Por	31/01/2021	11:45:38	Activated
603	7	392	Por	31/01/2021	11:49:38	Deactivated

Raw data input:

df <- structure(list(Geocode = c("603", "603", "603", "603", "603", "603", "603", "603", "603", "603", "603", "603", "603"), 
Barrier.ID = c("7",  "7", "7", "7", "7", "7", "7", "7", "7", "7", "7", "7", "7"), 
Device.ID = c("392","392", "392", "392", "392","392", "392", "392", "392", "392", "392","392", "392"), 
City = c("Por", "Por", "Por", "Por", "Por", "Por", "Por", "Por", "Por", "Por", "Por", "Por", "Por"),
Date = c("31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021"), 
Time = c("10:39:10", "10:54:18", "11:10:38", "11:11:37", "11:12:18", "11:13:37", 
"11:17:38", "11:19:37", "11:26:25", "11:29:37", "11:40:38", "11:45:38", "11:49:38"), 
State.code = c("Deactivated", "Deactivated", "Activated", "Deactivated", 
"Activated", "Deactivated", "Activated", "Deactivated", "Activated", 
"Deactivated", "Activated", "Activated", "Deactivated")), 
row.names = c(NA, 13L), 
class = "data.frame")

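I would like to create something like the table below. Note that the first two rows of the first table have been dropped, because an event must always start with an "Activated" state code. Row 11 of the table above has also been dropped, because it is an incomplete event: there is an "Activated" entry with no matching "Deactivated" entry.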

Geocode	Barrier.ID	Device.ID	City	Date	State.code	Activated	Deactivated
603	7	392	Por	31/01/2021	Activated	11:10:38	11:11:37
603	7	392	Por	31/01/2021	Activated	11:12:18	11:13:37
603	7	392	Por	31/01/2021	Activated	11:17:38	11:19:37
603	7	392	Por	31/01/2021	Activated	11:26:25	11:29:37
603	7	392	Por	31/01/2021	Activated	11:45:38	11:49:38

Depending on the value found in State.code, the time is either an "Activated" or a "Deactivated" time. I managed to split the times using:

df$Activated <- ifelse(df$State.code == "Activated", df$Time, NA)
df$Deactivated <- ifelse(df$State.code == "Deactivated", df$Time, NA)

Which gives me:

Geocode	Barrier.ID	Device.ID	City	Date	Time	State.code	Activated	Deactivated
603	7	392	Por	31/01/2021	10:39:10	Deactivated		10:39:10
603	7	392	Por	31/01/2021	10:54:18	Deactivated		10:54:18
603	7	392	Por	31/01/2021	11:10:38	Activated	11:10:38
603	7	392	Por	31/01/2021	11:11:37	Deactivated		11:11:37
603	7	392	Por	31/01/2021	11:12:18	Activated	11:12:18
603	7	392	Por	31/01/2021	11:13:37	Deactivated		11:13:37
603	7	392	Por	31/01/2021	11:17:38	Activated	11:17:38
603	7	392	Por	31/01/2021	11:19:37	Deactivated		11:19:37
603	7	392	Por	31/01/2021	11:26:25	Activated	11:26:25
603	7	392	Por	31/01/2021	11:29:37	Deactivated		11:29:37
603	7	392	Por	31/01/2021	11:40:38	Activated	11:40:38
603	7	392	Por	31/01/2021	11:45:38	Activated	11:45:38
603	7	392	Por	31/01/2021	11:49:38	Deactivated		11:49:38

At this point I am stuck and do not know how to proceed. So my questions are:

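How can I merge these rows so that the "Activated" and "Deactivated" times appear in a single row of the data frame (as in the second table)?
How can I exclude "incomplete" events (cases where an event is missing its matching "Activated" or "Deactivated" time)?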

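I considered combining the two kinds of rows with cbind, but that would not work because of the incomplete events (for example, a "Deactivated" event with no preceding "Activated" event), right?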

I would really appreciate it if someone could help me further!

Answer 1

The following works even on unsorted data.

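Notes:

I combined the Date and Time columns so that events which start on the calendar day before their deactivation are still handled correctly.
I assumed that if several Activated states occur in a row, only the last one counts.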
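To illustrate the first note, here is a minimal base R sketch. The two timestamps are made up for illustration (they are not in the sample data), and tz = "UTC" is only an assumption to keep the result reproducible. It shows why the time strings alone are not enough once an event crosses midnight.

```r
# Hypothetical event that starts just before midnight and ends just after it
date <- c("31/01/2021", "01/02/2021")
time <- c("23:58:00", "00:03:00")

# Comparing the time strings alone puts the end before the start
end_after_start_by_time <- time[2] > time[1]

# Parsing date and time together keeps the true order
stamp <- as.POSIXct(paste(date, time), format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
end_after_start_by_stamp <- stamp[2] > stamp[1]

# Duration of the event in minutes
duration_min <- as.numeric(difftime(stamp[2], stamp[1], units = "mins"))
```

With the combined timestamp, sorting and pairing rows no longer depends on which calendar day each row falls on.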

library(dplyr)
library(tidyr)

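df %>% group_by(Geocode, Barrier.ID, Device.ID, City) %>%
  # parse date and time into one timestamp; flag the rows that start an event
  mutate(Date = as.POSIXct(paste(Date, Time), format = "%d/%m/%Y %H:%M:%S"),
         code = State.code == "Activated") %>%
  select(-Time) %>%
  arrange(Date) %>%
  # running count of activations = event id (0 means before the first activation)
  mutate(code = cumsum(code)) %>%
  filter(code != 0) %>%
  group_by(Geocode, Barrier.ID, Device.ID, City, code) %>%
  # keep only event ids with exactly two rows (one Activated, one Deactivated)
  filter(n() == 2) %>%
  pivot_wider(id_cols = c(Geocode, Barrier.ID, Device.ID, City, code), names_from = State.code, values_from = Date) %>%
  # code is a grouping variable, so it stays in the output; ungroup() first to drop it
  select(-code)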

# A tibble: 5 x 7
# Groups:   Geocode, Barrier.ID, Device.ID, City, code [5]
   code Geocode Barrier.ID Device.ID City  Activated           Deactivated        
  <int> <chr>   <chr>      <chr>     <chr> <dttm>              <dttm>             
1     1 603     7          392       Por   2021-01-31 11:10:38 2021-01-31 11:11:37
2     2 603     7          392       Por   2021-01-31 11:12:18 2021-01-31 11:13:37
3     3 603     7          392       Por   2021-01-31 11:17:38 2021-01-31 11:19:37
4     4 603     7          392       Por   2021-01-31 11:26:25 2021-01-31 11:29:37
5     6 603     7          392       Por   2021-01-31 11:45:38 2021-01-31 11:49:38
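To see where the code values 1, 2, 3, 4 and 6 come from, the grouping step can be reproduced in base R on the State.code column alone. This is only a sketch of the idea behind the pipeline, not a replacement for it; the variable names are mine.

```r
# State.code column of the sample data, already in time order
state <- c("Deactivated", "Deactivated", "Activated", "Deactivated",
           "Activated", "Deactivated", "Activated", "Deactivated",
           "Activated", "Deactivated", "Activated", "Activated", "Deactivated")

# Every "Activated" row starts a new event; rows before the first one get id 0
event_id <- cumsum(state == "Activated")

# Number of rows sharing each event id, repeated for every row
n_rows <- ave(seq_along(state), event_id, FUN = length)

# Complete events: id is not 0 and the event has exactly two rows
keep <- event_id != 0 & n_rows == 2
complete_ids <- unique(event_id[keep])
```

Event 5 (the activation at 11:40:38) has only one row because the next activation starts event 6 before any deactivation arrives, so it is filtered out.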

However, if there can also be several Deactivated states in a row, the following strategy works:

library(data.table)  # for rleid()
df %>% group_by(Geocode, Barrier.ID, Device.ID, City) %>%
  mutate(Date = as.POSIXct(paste(Date, Time), format = "%d/%m/%Y %H:%M:%S"),
         code = State.code == "Activated") %>%
  select(-Time) %>%
  arrange(Date) %>%
  # code: event id; code2: id of each run of identical consecutive states
  mutate(code = cumsum(code),
         code2 = rleid(State.code)) %>%
  filter(code != 0) %>%
  group_by(Geocode, Barrier.ID, Device.ID, City, code) %>%
  # drop activations that are not followed by a deactivation
  filter(n() != 1) %>%
  # keep only the last row of each run
  group_by(code2) %>% slice_tail() %>%
  pivot_wider(id_cols = c(Geocode, Barrier.ID, Device.ID, City, code), names_from = State.code, values_from = Date) %>%
  select(-code)

  Geocode Barrier.ID Device.ID City  Activated           Deactivated        
  <chr>   <chr>      <chr>     <chr> <dttm>              <dttm>             
1 603     7          392       Por   2021-01-31 11:10:38 2021-01-31 11:11:37
2 603     7          392       Por   2021-01-31 11:13:37 2021-01-31 11:19:37
3 603     7          392       Por   2021-01-31 11:26:25 2021-01-31 11:29:37
4 603     7          392       Por   2021-01-31 11:45:38 2021-01-31 11:49:38

The modified df used for this second example:
> df
   Geocode Barrier.ID Device.ID City       Date     Time  State.code
1      603          7       392  Por 31/01/2021 10:39:10 Deactivated
2      603          7       392  Por 31/01/2021 10:54:18 Deactivated
3      603          7       392  Por 31/01/2021 11:10:38   Activated
4      603          7       392  Por 31/01/2021 11:11:37 Deactivated
5      603          7       392  Por 31/01/2021 11:12:18   Activated
6      603          7       392  Por 31/01/2021 11:13:37   Activated
7      603          7       392  Por 31/01/2021 11:17:38 Deactivated
8      603          7       392  Por 31/01/2021 11:19:37 Deactivated
9      603          7       392  Por 31/01/2021 11:26:25   Activated
10     603          7       392  Por 31/01/2021 11:29:37 Deactivated
11     603          7       392  Por 31/01/2021 11:40:38   Activated
12     603          7       392  Por 31/01/2021 11:45:38   Activated
13     603          7       392  Por 31/01/2021 11:49:38 Deactivated
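For readers without data.table, the run ids that rleid() produces can be sketched with base R's rle(). This only illustrates the "last row of each run" idea on the State.code column of the modified data; the variable names are mine, and the sketch leaves out the other filters of the pipeline.

```r
# State.code column of the modified data, in time order
state <- c("Deactivated", "Deactivated", "Activated", "Deactivated",
           "Activated", "Activated", "Deactivated", "Deactivated",
           "Activated", "Deactivated", "Activated", "Activated", "Deactivated")

# One id per run of identical consecutive values (what rleid() returns)
runs <- rle(state)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Row numbers of the last entry in each run
last_rows <- which(!duplicated(run_id, fromLast = TRUE))
```

Rows 6 and 8 are the last entries of their runs, which is why the second result row pairs 11:13:37 with 11:19:37. Row 2 is also the last of a run, but it precedes the first activation and is removed by the code != 0 filter.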
