Merging two rows within a dataframe - start and end time
I hope I'm doing this right, since this is my first post here! I currently have a dataset that looks like this (160k entries in total):
Geocode | Barrier.ID | Device.ID | City | Date | Time | State.code |
---|---|---|---|---|---|---|
603 | 7 | 392 | Por | 31/01/2021 | 10:39:10 | Deactivated |
603 | 7 | 392 | Por | 31/01/2021 | 10:54:18 | Deactivated |
603 | 7 | 392 | Por | 31/01/2021 | 11:10:38 | Activated |
603 | 7 | 392 | Por | 31/01/2021 | 11:11:37 | Deactivated |
603 | 7 | 392 | Por | 31/01/2021 | 11:12:18 | Activated |
603 | 7 | 392 | Por | 31/01/2021 | 11:13:37 | Deactivated |
603 | 7 | 392 | Por | 31/01/2021 | 11:17:38 | Activated |
603 | 7 | 392 | Por | 31/01/2021 | 11:19:37 | Deactivated |
603 | 7 | 392 | Por | 31/01/2021 | 11:26:25 | Activated |
603 | 7 | 392 | Por | 31/01/2021 | 11:29:37 | Deactivated |
603 | 7 | 392 | Por | 31/01/2021 | 11:40:38 | Activated |
603 | 7 | 392 | Por | 31/01/2021 | 11:45:38 | Activated |
603 | 7 | 392 | Por | 31/01/2021 | 11:49:38 | Deactivated |
Raw data input:
df <- structure(list(Geocode = c("603", "603", "603", "603", "603", "603", "603", "603", "603", "603", "603", "603", "603"),
Barrier.ID = c("7", "7", "7", "7", "7", "7", "7", "7", "7", "7", "7", "7", "7"),
Device.ID = c("392","392", "392", "392", "392","392", "392", "392", "392", "392", "392","392", "392"),
City = c("Por", "Por", "Por", "Por", "Por", "Por", "Por", "Por", "Por", "Por", "Por", "Por", "Por"),
Date = c("31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021", "31/01/2021"),
Time = c("10:39:10", "10:54:18", "11:10:38", "11:11:37", "11:12:18", "11:13:37",
"11:17:38", "11:19:37", "11:26:25", "11:29:37", "11:40:38", "11:45:38", "11:49:38"),
State.code = c("Deactivated", "Deactivated", "Activated", "Deactivated",
"Activated", "Deactivated", "Activated", "Deactivated", "Activated",
"Deactivated", "Activated", "Activated", "Deactivated")),
row.names = c(NA, 13L),
class = "data.frame")
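I would like to create something like the table below. Note that the first two rows of the first table have been removed, because an event must always start with an "Activated" state code. Row 11 of the table above has also been removed because it is an incomplete event: there is only an "Activated" event, with no matching "Deactivated" event.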
Geocode | Barrier.ID | Device.ID | City | Date | State.code | Activated | Deactivated |
---|---|---|---|---|---|---|---|
603 | 7 | 392 | Por | 31/01/2021 | Activated | 11:10:38 | 11:11:37 |
603 | 7 | 392 | Por | 31/01/2021 | Activated | 11:12:18 | 11:13:37 |
603 | 7 | 392 | Por | 31/01/2021 | Activated | 11:17:38 | 11:19:37 |
603 | 7 | 392 | Por | 31/01/2021 | Activated | 11:26:25 | 11:29:37 |
603 | 7 | 392 | Por | 31/01/2021 | Activated | 11:45:38 | 11:49:38 |
Depending on the value found in State.code, the time is either an "Activated" or a "Deactivated" time. I managed to split the times using:
df$Activated <- ifelse(df$State.code == "Activated", df$Time, NA)
df$Deactivated <- ifelse(df$State.code == "Deactivated", df$Time, NA)
Which gave me:
Geocode | Barrier.ID | Device.ID | City | Date | Time | State.code | Activated | Deactivated |
---|---|---|---|---|---|---|---|---|
603 | 7 | 392 | Por | 31/01/2021 | 10:39:10 | Deactivated | | 10:39:10 |
603 | 7 | 392 | Por | 31/01/2021 | 10:54:18 | Deactivated | | 10:54:18 |
603 | 7 | 392 | Por | 31/01/2021 | 11:10:38 | Activated | 11:10:38 | |
603 | 7 | 392 | Por | 31/01/2021 | 11:11:37 | Deactivated | | 11:11:37 |
603 | 7 | 392 | Por | 31/01/2021 | 11:12:18 | Activated | 11:12:18 | |
603 | 7 | 392 | Por | 31/01/2021 | 11:13:37 | Deactivated | | 11:13:37 |
603 | 7 | 392 | Por | 31/01/2021 | 11:17:38 | Activated | 11:17:38 | |
603 | 7 | 392 | Por | 31/01/2021 | 11:19:37 | Deactivated | | 11:19:37 |
603 | 7 | 392 | Por | 31/01/2021 | 11:26:25 | Activated | 11:26:25 | |
603 | 7 | 392 | Por | 31/01/2021 | 11:29:37 | Deactivated | | 11:29:37 |
603 | 7 | 392 | Por | 31/01/2021 | 11:40:38 | Activated | 11:40:38 | |
603 | 7 | 392 | Por | 31/01/2021 | 11:45:38 | Activated | 11:45:38 | |
603 | 7 | 392 | Por | 31/01/2021 | 11:49:38 | Deactivated | | 11:49:38 |
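That is where I got stuck, and I don't know how to proceed. So my questions are:
How can I merge these rows so that the "Activated" and "Deactivated" times appear in a single row of the dataframe (as shown in the second table)?
How can I exclude "incomplete" events (cases where an event is missing its corresponding "Activated" or "Deactivated" time)?
I thought about using cbind to merge the two rows, but because of the incomplete events (i.e. "Deactivated" events with no matching "Activated" event), that wouldn't work, right?
If anyone could help me further, I would be very grateful!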
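The following works even if the data is not sorted.
Notes:
- I combined the Date and Time columns so that events which start on the calendar day before they are deactivated are handled correctly.
- I assumed that if several Activated states occur in a row, only the last one counts.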
library(dplyr)
library(tidyr)
df %>%
  group_by(Geocode, Barrier.ID, Device.ID, City) %>%
  # Combine Date and Time into one date-time so events can span midnight
  mutate(Date = as.POSIXct(paste(Date, Time), format = "%d/%m/%Y %H:%M:%S"),
         code = State.code == "Activated") %>%
  select(-Time) %>%
  arrange(Date) %>%
  # Each activation starts a new event, so the running count is the event id
  mutate(code = cumsum(code)) %>%
  # Drop deactivations that occur before the first activation
  filter(code != 0) %>%
  group_by(Geocode, Barrier.ID, Device.ID, City, code) %>%
  # Keep only complete events: one Activated row plus one Deactivated row
  filter(n() == 2) %>%
  # Ungroup first; select() cannot drop `code` while it is a grouping column
  ungroup() %>%
  pivot_wider(id_cols = c(Geocode, Barrier.ID, Device.ID, City, code), names_from = State.code, values_from = Date) %>%
  select(-code)
# A tibble: 5 x 6
  Geocode Barrier.ID Device.ID City  Activated           Deactivated
  <chr>   <chr>      <chr>     <chr> <dttm>              <dttm>
1 603     7          392       Por   2021-01-31 11:10:38 2021-01-31 11:11:37
2 603     7          392       Por   2021-01-31 11:12:18 2021-01-31 11:13:37
3 603     7          392       Por   2021-01-31 11:17:38 2021-01-31 11:19:37
4 603     7          392       Por   2021-01-31 11:26:25 2021-01-31 11:29:37
5 603     7          392       Por   2021-01-31 11:45:38 2021-01-31 11:49:38
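To see why the running count of activations identifies events, here is a minimal base R sketch using only the State.code values from the question. The names state, event_id, ids, sizes and complete_ids are illustrative and not part of the answer's code.

```r
# State.code values from the question, already in time order.
state <- c("Deactivated", "Deactivated", "Activated", "Deactivated",
           "Activated", "Deactivated", "Activated", "Deactivated",
           "Activated", "Deactivated", "Activated", "Activated",
           "Deactivated")

# Every "Activated" row starts a new event, so the running count of
# activations acts as an event id. Rows before the first activation get 0.
event_id <- cumsum(state == "Activated")
event_id
# 0 0 1 1 2 2 3 3 4 4 5 6 6

# Count the rows in each event (ignoring id 0) and keep the ids that have
# exactly two rows: one Activated followed by one Deactivated.
ids <- unique(event_id[event_id > 0])
sizes <- vapply(ids, function(i) sum(event_id == i), integer(1))
complete_ids <- ids[sizes == 2]
complete_ids
# 1 2 3 4 6
```

Event 5 is the lone activation at 11:40:38, which is why it drops out of the result above.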
- However, if there are also several consecutive Deactivated states, the following strategy will work:
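library(data.table)  # for rleid()
df %>%
  group_by(Geocode, Barrier.ID, Device.ID, City) %>%
  mutate(Date = as.POSIXct(paste(Date, Time), format = "%d/%m/%Y %H:%M:%S"),
         code = State.code == "Activated") %>%
  select(-Time) %>%
  arrange(Date) %>%
  # code: event id (running count of activations)
  # code2: id of each run of identical consecutive states
  mutate(code = cumsum(code),
         code2 = rleid(State.code)) %>%
  filter(code != 0) %>%
  group_by(Geocode, Barrier.ID, Device.ID, City, code) %>%
  # Drop events that consist of a single row (no matching Deactivated)
  filter(n() != 1) %>%
  # Keep the last row of each run; the device columns stay in the grouping
  # so that run ids from different devices are not mixed together
  group_by(Geocode, Barrier.ID, Device.ID, City, code2) %>%
  slice_tail(n = 1) %>%
  ungroup() %>%
  pivot_wider(id_cols = c(Geocode, Barrier.ID, Device.ID, City, code), names_from = State.code, values_from = Date) %>%
  select(-code)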
Geocode Barrier.ID Device.ID City Activated Deactivated
<chr> <chr> <chr> <chr> <dttm> <dttm>
1 603 7 392 Por 2021-01-31 11:10:38 2021-01-31 11:11:37
2 603 7 392 Por 2021-01-31 11:13:37 2021-01-31 11:19:37
3 603 7 392 Por 2021-01-31 11:26:25 2021-01-31 11:29:37
4 603 7 392 Por 2021-01-31 11:45:38 2021-01-31 11:49:38
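The run ids that rleid() assigns can be reproduced in base R with rle(). The sketch below swaps in rle() purely for illustration and uses the state sequence of the modified df listed below. It only shows how run ids are formed and how the last row of each run is picked; it does not apply the answer's other filters. The names state, runs, run_id and keep are illustrative.

```r
# State sequence of the modified df: it contains repeated Activated rows
# and repeated Deactivated rows.
state <- c("Deactivated", "Deactivated", "Activated", "Deactivated",
           "Activated", "Activated", "Deactivated", "Deactivated",
           "Activated", "Deactivated", "Activated", "Activated",
           "Deactivated")

# rle() gives the length of each run of identical consecutive values;
# repeating the run index by its length yields one run id per row.
runs <- rle(state)
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id
# 1 1 2 3 4 4 5 5 6 7 8 8 9

# Keep only the last row of every run (what slice_tail() does per run).
keep <- !duplicated(run_id, fromLast = TRUE)
which(keep)
# 2 3 4 6 8 9 10 12 13
```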
The df used for this second example:
> df
Geocode Barrier.ID Device.ID City Date Time State.code
1 603 7 392 Por 31/01/2021 10:39:10 Deactivated
2 603 7 392 Por 31/01/2021 10:54:18 Deactivated
3 603 7 392 Por 31/01/2021 11:10:38 Activated
4 603 7 392 Por 31/01/2021 11:11:37 Deactivated
5 603 7 392 Por 31/01/2021 11:12:18 Activated
6 603 7 392 Por 31/01/2021 11:13:37 Activated
7 603 7 392 Por 31/01/2021 11:17:38 Deactivated
8 603 7 392 Por 31/01/2021 11:19:37 Deactivated
9 603 7 392 Por 31/01/2021 11:26:25 Activated
10 603 7 392 Por 31/01/2021 11:29:37 Deactivated
11 603 7 392 Por 31/01/2021 11:40:38 Activated
12 603 7 392 Por 31/01/2021 11:45:38 Activated
13 603 7 392 Por 31/01/2021 11:49:38 Deactivated