Split the rows in a data frame based on timestamp using R
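I have the following unstructured ticket dataset with work note updates. Each ticket has multiple work notes, each starting with a timestamp. I need to split the Worknotes column so that each timestamped line and its corresponding updates form one row, similar to what is shown in **Expected Output**: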
I.NO Ticket No: Worknotes
0 198822 2015-06-19 01:57:11 -Account Service
1 198822 Event closed
2 198822 Acknowledged
3 198822 2015-06-19 01:58:33- Lawrence David
4 198822 Data unavialable and hence ticket closed
5 198824 2015-06-19 02:07:01- Account Service
6 198824 User requested for database information
7 198824 2015-06-19 02:07:34- Cecilia Trandau
8 198824 Backup in progress. Under discusion
9 198824 2015-06-20 02:07:01- Account Service
10 198824 Auto closed
########## Edited **Output of dput**
structure(list(I.NO = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10), `Ticket No:` = c(198822,
198822, 198822, 198822, 198822, 198824, 198824, 198824, 198824,
198824, 198824), Worknotes = c("2015-06-19 01:57:11 -Account Service",
"Event closed", "Acknowledged", "2015-06-19 01:58:33- Lawrence David",
"Data unavialable and hence ticket closed", "2015-06-19 02:07:01- Account Service",
"User requested for database information", "2015-06-19 02:07:34- Cecilia Trandau",
"Backup in progress. Under discusion", "2015-06-20 02:07:01- Account Service",
"Auto closed")), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"))
# A tibble: 6 x 3
I.NO `Ticket No:` Worknotes
<dbl> <dbl> <chr>
1 0 198822 2015-06-19 01:57:11 -Account Service
2 1 198822 Event closed
3 2 198822 Acknowledged
4 3 198822 2015-06-19 01:58:33- Lawrence David
5 4 198822 Data unavialable and hence ticket closed
6 5 198824 2015-06-19 02:07:01- Account Service
###########################
**Expected Output**
**Ticket No:** **Worknotes**
198822 2015-06-19 01:57:11 -Account Service
Event closed
Acknowledged
198822 2015-06-19 01:58:33- Lawrence David
Data unavailable and hence ticket closed
198824 2015-06-19 02:07:01- Account Service
User requested for database information
198824 2015-06-19 02:07:34- Cecilia Trandau
Backup in progress. Under discusion
198824 2015-06-20 02:07:01- Account Service
Auto closed
Here is an approach that builds groups with `cumsum` and `str_detect`:
library(tidyverse)
# `data` is the tibble from the dput() output above
data %>%
  mutate(grouper = cumsum(str_detect(Worknotes, "^[0-9\\-]{10}")))
# A tibble: 11 x 4
I.NO `Ticket No:` Worknotes grouper
<dbl> <dbl> <chr> <int>
1 0 198822 2015-06-19 01:57:11 -Account Service 1
2 1 198822 Event closed 1
3 2 198822 Acknowledged 1
4 3 198822 2015-06-19 01:58:33- Lawrence David 2
5 4 198822 Data unavialable and hence ticket closed 2
6 5 198824 2015-06-19 02:07:01- Account Service 3
7 6 198824 User requested for database information 3
8 7 198824 2015-06-19 02:07:34- Cecilia Trandau 4
9 8 198824 Backup in progress. Under discusion 4
10 9 198824 2015-06-20 02:07:01- Account Service 5
11 10 198824 Auto closed 5
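The `grouper` column works because `str_detect()` returns `TRUE` only on lines that start with a timestamp, and `cumsum()` adds 1 at each `TRUE`, so every timestamp line opens a new group that the lines after it share. A minimal base-R sketch of just that step, using `grepl()` in place of `str_detect()` on the first five Worknotes values (the names `notes` and `is_header` are only for illustration):

```r
# First five Worknotes values from the question's data
notes <- c("2015-06-19 01:57:11 -Account Service",
           "Event closed",
           "Acknowledged",
           "2015-06-19 01:58:33- Lawrence David",
           "Data unavialable and hence ticket closed")

# TRUE on lines that begin with 10 digits/hyphens, i.e. a "YYYY-MM-DD" date
is_header <- grepl("^[0-9-]{10}", notes)

# Each TRUE bumps the running total, so each timestamp line starts a new
# group and the following lines inherit its group number
grouper <- cumsum(is_header)
print(grouper)
```

Here `print(grouper)` shows `[1] 1 1 1 2 2`, matching the first five rows of the `grouper` column above.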
From here, we can `group_by`, `summarise` and `paste`:
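data %>%
  # a new group starts at every line that begins with a timestamp
  mutate(grouper = cumsum(str_detect(Worknotes, "^[0-9\\-]{10}"))) %>%
  group_by(`Ticket No:`, grouper) %>%
  # join each group's lines into one string, separated by newlines
  summarise(Worknotes = paste(Worknotes, collapse = "\n"), .groups = "drop") %>%
  select(-grouper) -> result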
result
`Ticket No:` Worknotes
<dbl> <chr>
1 198822 "2015-06-19 01:57:11 -Account Service\nEvent closed\nAcknowledged"
2 198822 "2015-06-19 01:58:33- Lawrence David\nData unavialable and hence ticket closed"
3 198824 "2015-06-19 02:07:01- Account Service\nUser requested for database information"
4 198824 "2015-06-19 02:07:34- Cecilia Trandau\nBackup in progress. Under discusion"
5 198824 "2015-06-20 02:07:01- Account Service\nAuto closed"
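If you would rather avoid the tidyverse, the same grouping can be sketched in base R with `split()` and `vapply()`. This sketch assumes, as in the question's data, that every ticket's notes begin with a timestamp line; unlike the `group_by()` version above, it does not additionally split on `Ticket No:`. The names `grouper`, `collapsed`, `tickets` and `result_base` are illustrative.

```r
# The question's data, rebuilt as a plain data.frame
# (check.names = FALSE keeps the "Ticket No:" column name intact)
data <- data.frame(
  `Ticket No:` = c(198822, 198822, 198822, 198822, 198822,
                   198824, 198824, 198824, 198824, 198824, 198824),
  Worknotes = c("2015-06-19 01:57:11 -Account Service", "Event closed",
                "Acknowledged", "2015-06-19 01:58:33- Lawrence David",
                "Data unavialable and hence ticket closed",
                "2015-06-19 02:07:01- Account Service",
                "User requested for database information",
                "2015-06-19 02:07:34- Cecilia Trandau",
                "Backup in progress. Under discusion",
                "2015-06-20 02:07:01- Account Service", "Auto closed"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Same idea as above: a new group starts at each timestamp line
grouper <- cumsum(grepl("^[0-9-]{10}", data$Worknotes))

# Paste each group's lines into one newline-separated string
collapsed <- vapply(split(data$Worknotes, grouper), paste, character(1),
                    collapse = "\n")

# Take the ticket number from the first (timestamp) row of each group
tickets <- data$`Ticket No:`[!duplicated(grouper)]

result_base <- data.frame(`Ticket No:` = tickets,
                          Worknotes = unname(collapsed),
                          check.names = FALSE, stringsAsFactors = FALSE)
print(result_base)
```

Because `grouper` only ever increases, `split()` returns the groups in their original order, so the rows line up with `tickets`.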
Note that the `\n` is not rendered as a line break when R displays the result with `print()`, but it is with `cat()`:
cat(as.matrix(result[1,2]))
2015-06-19 01:57:11 -Account Service
Event closed
Acknowledged
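The `cat()` call above shows a single cell. To display every collapsed note with its line breaks, one option is `writeLines()`, which also writes the embedded `\n` characters as real newlines. A small self-contained sketch on two strings shaped like `result$Worknotes` (the vector name `collapsed` is illustrative); `sep = "\n\n"` leaves a blank line between notes:

```r
# Two strings in the same shape as result$Worknotes
collapsed <- c(
  "2015-06-19 01:57:11 -Account Service\nEvent closed\nAcknowledged",
  "2015-06-19 01:58:33- Lawrence David\nData unavialable and hence ticket closed"
)

# print() shows the escaped "\n" inside quotes
print(collapsed)

# writeLines() outputs the newline characters as line breaks,
# with a blank line between consecutive notes
writeLines(collapsed, sep = "\n\n")
```

With the tidyverse result, the equivalent call would be `writeLines(result$Worknotes, sep = "\n\n")`.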