How to grepl search for the max and min timings in a string?
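I have a dataset in which one column contains the opening and closing times of various shops.
The times are strings in the format Opening time - Closing time,
for example: 17:00 - 21:00 | 11:30 - 14:30 | 11:30 - 14:30
I want to extract the minimum opening time from the string above, i.e. 11:30, and the maximum closing time, i.e. 21:00. How can I do this in R?
Sample data (dput output):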
structure(list(head.timings_remapping.Opening.And.Closing.Time..40. = c("15:30 - 21:30",
"12:00 - 00:00", "11:00 - 15:00 | 16:30 - 20:45", "12:00 - 22:30",
"17:00 - 21:30", "17:00 - 21:30", "16:30 - 00:00", "16:00 - 21:15",
"16:30 - 20:30", "17:00 - 20:00", "16:00 - 23:30", "16:30 - 21:30",
"17:00 - 22:00", "17:00 - 22:00", "17:00 - 21:30", "17:00 - 21:30",
"16:00 - 00:00", "16:30 - 23:59", "11:30 - 22:30", "11:30 - 23:59",
"17:00 - 20:30", "07:30 - 12:50", "16:15 - 23:00", "09:00 - 21:00",
"10:00 - 21:00", "11:00 - 22:00", "07:00 - 12:00 | 07:00 - 13:30 | 12:00 - 13:30",
"07:00 - 13:00 | 10:00 - 15:00", "10:00 - 02:00", "00:00 - 23:59",
"00:00 - 23:59", "11:00 - 20:00", "11:00 - 20:00", NA, "12:00 - 03:30 | 11:00 - 00:00",
"05:30 - 15:00", "07:00 - 16:00", "08:30 - 13:30", "17:00 - 21:00 | 11:30 - 14:30 | 11:30 - 14:30",
"12:00 - 01:00")), class = "data.frame", row.names = c(NA, -40L
))
The final output should have two columns, "Opening time" and "Closing time".
Does this help:
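library(dplyr)
library(tidyr)
# Note: this keeps only the first interval of each row; anything after the first '|' is dropped
df %>%
  # Split off the first interval; the backslash must be doubled inside an R string
  separate(col = head.timings_remapping.Opening.And.Closing.Time..40., into = c('Open_Close', 'A'), sep = '\\|', extra = 'merge', fill = 'right') %>%
  # Split the first interval into opening and closing time
  separate(col = Open_Close, into = c('Opening Time', 'Closing Time'), sep = ' - ') %>%
  # Remove stray whitespace and drop the helper column
  mutate(`Opening Time` = trimws(`Opening Time`), `Closing Time` = trimws(`Closing Time`)) %>%
  select(-A)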
Opening Time Closing Time
1 15:30 21:30
2 12:00 00:00
3 11:00 15:00
4 12:00 22:30
5 17:00 21:30
6 17:00 21:30
7 16:30 00:00
8 16:00 21:15
9 16:30 20:30
10 17:00 20:00
11 16:00 23:30
12 16:30 21:30
13 17:00 22:00
14 17:00 22:00
15 17:00 21:30
16 17:00 21:30
17 16:00 00:00
18 16:30 23:59
19 11:30 22:30
20 11:30 23:59
21 17:00 20:30
22 07:30 12:50
23 16:15 23:00
24 09:00 21:00
25 10:00 21:00
26 11:00 22:00
27 07:00 12:00
28 07:00 13:00
29 10:00 02:00
30 00:00 23:59
31 00:00 23:59
32 11:00 20:00
33 11:00 20:00
34 <NA> <NA>
35 12:00 03:30
36 05:30 15:00
37 07:00 16:00
38 08:30 13:30
39 17:00 21:00
40 12:00 01:00
Using the dplyr and tidyr libraries, you can do:
library(dplyr)
library(tidyr)
# Rename the long column name to something shorter
names(df)[1] <- 'Time'
df %>%
  # Create a row index
  mutate(row = row_number()) %>%
  # Split the data into separate rows on '|' (backslashes are doubled inside R strings)
  separate_rows(Time, sep = '\\s*\\|\\s*') %>%
  # Split each interval on '-'
  separate(Time, c("Opening_Time", "Closing_time"), sep = '\\s*-\\s*') %>%
  # Convert the times to POSIXct
  mutate(across(c(Opening_Time, Closing_time), ~ as.POSIXct(.x, format = '%H:%M'))) %>%
  # For each original row
  group_by(row) %>%
  # Get the minimum opening time and maximum closing time
  # and convert back to the required format
  summarise(Opening_Time = format(min(Opening_Time), "%H:%M"),
            Closing_time = format(max(Closing_time), "%H:%M")) %>%
  # Drop the row column
  select(-row)
This returns
# Opening_Time Closing_time
# <chr> <chr>
# 1 15:30 21:30
# 2 12:00 00:00
# 3 11:00 20:45
# 4 12:00 22:30
# 5 17:00 21:30
# 6 17:00 21:30
# 7 16:30 00:00
# 8 16:00 21:15
# 9 16:30 20:30
#10 17:00 20:00
# … with 30 more rows
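For comparison, here is a minimal base R sketch of the same idea that avoids the date-time conversion. It assumes every time is zero-padded HH:MM, so plain string comparison orders the times correctly. The function name extract_open_close is hypothetical and not part of either answer above.

```r
# Hypothetical helper: minimum opening and maximum closing time per string.
# Assumes zero-padded "HH:MM" times, so character comparison matches time order.
extract_open_close <- function(x) {
  # Split each string into its "HH:MM - HH:MM" intervals
  intervals <- strsplit(trimws(x), "\\s*\\|\\s*")
  res <- lapply(intervals, function(iv) {
    # A missing input stays missing in both output columns
    if (length(iv) == 1 && is.na(iv)) {
      return(c(NA_character_, NA_character_))
    }
    # Split every interval into its opening and closing part
    parts <- strsplit(iv, "\\s*-\\s*")
    opening <- vapply(parts, `[`, character(1), 1)
    closing <- vapply(parts, `[`, character(1), 2)
    c(min(opening), max(closing))
  })
  out <- do.call(rbind, res)
  data.frame(Opening_Time = out[, 1], Closing_Time = out[, 2],
             stringsAsFactors = FALSE)
}

x <- c("17:00 - 21:00 | 11:30 - 14:30 | 11:30 - 14:30",
       NA,
       "11:00 - 15:00 | 16:30 - 20:45")
extract_open_close(x)
```

Like the pipeline above, this sketch treats all times as falling on the same day, so a closing time past midnight such as 02:00 compares as earlier than 23:00.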