How to grepl search for the max and min timings in a string?

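I have a dataset in which one column contains the opening and closing times of various stores. The times are strings in the format Opening time - Closing time, for example: 17:00 - 21:00 | 11:30 - 14:30 | 11:30 - 14:30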

I want to extract the minimum opening time from the string above, i.e. 11:30, and the maximum closing time, i.e. 21:00. How can I do this in R?

Sample data (dput output):

 structure(list(head.timings_remapping.Opening.And.Closing.Time..40. = c("15:30 - 21:30", 
"12:00 - 00:00", "11:00 - 15:00 | 16:30 - 20:45", "12:00 - 22:30", 
"17:00 - 21:30", "17:00 - 21:30", "16:30 - 00:00", "16:00 - 21:15", 
"16:30 - 20:30", "17:00 - 20:00", "16:00 - 23:30", "16:30 - 21:30", 
"17:00 - 22:00", "17:00 - 22:00", "17:00 - 21:30", "17:00 - 21:30", 
"16:00 - 00:00", "16:30 - 23:59", "11:30 - 22:30", "11:30 - 23:59", 
"17:00 - 20:30", "07:30 - 12:50", "16:15 - 23:00", "09:00 - 21:00", 
"10:00 - 21:00", "11:00 - 22:00", "07:00 - 12:00 | 07:00 - 13:30 | 12:00 - 13:30", 
"07:00 - 13:00 | 10:00 - 15:00", "10:00 - 02:00", "00:00 - 23:59", 
"00:00 - 23:59", "11:00 - 20:00", "11:00 - 20:00", NA, "12:00 - 03:30 | 11:00 - 00:00", 
"05:30 - 15:00", "07:00 - 16:00", "08:30 - 13:30", "17:00 - 21:00 | 11:30 - 14:30 | 11:30 - 14:30", 
"12:00 - 01:00")), class = "data.frame", row.names = c(NA, -40L
))

The final output should have two columns, "Opening time" and "Closing time".

Does this work:

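library(dplyr)
library(tidyr)
df %>% 
   # Split on '|' (escaped as '\\|' in an R string); only the first interval is kept in Open_Close
   separate(col = head.timings_remapping.Opening.And.Closing.Time..40., into = c('Open_Close','A'), sep = '\\|') %>% 
   # Split the first interval into opening and closing time
   separate(col = Open_Close, into = c('Opening Time','Closing Time'), sep = ' - ') %>% 
   # Trim stray whitespace and drop the helper column A
   mutate(`Opening Time` = trimws(`Opening Time`), `Closing Time` = trimws(`Closing Time`)) %>% select(-A)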
   Opening Time Closing Time
1         15:30        21:30
2         12:00        00:00
3         11:00        15:00
4         12:00        22:30
5         17:00        21:30
6         17:00        21:30
7         16:30        00:00
8         16:00        21:15
9         16:30        20:30
10        17:00        20:00
11        16:00        23:30
12        16:30        21:30
13        17:00        22:00
14        17:00        22:00
15        17:00        21:30
16        17:00        21:30
17        16:00        00:00
18        16:30        23:59
19        11:30        22:30
20        11:30        23:59
21        17:00        20:30
22        07:30        12:50
23        16:15        23:00
24        09:00        21:00
25        10:00        21:00
26        11:00        22:00
27        07:00        12:00
28        07:00        13:00
29        10:00        02:00
30        00:00        23:59
31        00:00        23:59
32        11:00        20:00
33        11:00        20:00
34         <NA>         <NA>
35        12:00        03:30
36        05:30        15:00
37        07:00        16:00
38        08:30        13:30
39        17:00        21:00
40        12:00        01:00
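
Note that the output above reflects only the first interval of each row, so row 39 shows an opening time of 17:00 rather than the desired 11:30. Below is a minimal base R sketch of taking the minimum and maximum over all intervals in a string. It assumes every time is zero-padded HH:MM, so that plain string comparison matches chronological order; the helper name `open_close_range` is made up for illustration.

```r
# Hypothetical helper: earliest opening and latest closing time found in one
# string such as "17:00 - 21:00 | 11:30 - 14:30".
# Assumes zero-padded "HH:MM" values, so string comparison orders them correctly.
open_close_range <- function(x) {
  empty <- c(Opening_Time = NA_character_, Closing_Time = NA_character_)
  if (is.na(x)) {
    return(empty)
  }
  # Pull out every HH:MM token: odd positions are openings, even are closings
  times <- regmatches(x, gregexpr("[0-9]{2}:[0-9]{2}", x))[[1]]
  if (length(times) < 2) {
    return(empty)
  }
  opening <- times[seq(1, length(times), by = 2)]
  closing <- times[seq(2, length(times), by = 2)]
  c(Opening_Time = min(opening), Closing_Time = max(closing))
}

# Small example vector taken from the sample data
x <- c("15:30 - 21:30", "11:00 - 15:00 | 16:30 - 20:45", NA,
       "17:00 - 21:00 | 11:30 - 14:30 | 11:30 - 14:30")
res <- as.data.frame(t(sapply(x, open_close_range, USE.NAMES = FALSE)))
res
```

The same call should work on the full column by replacing `x` with `df[[1]]`.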
 

Using the dplyr and tidyr libraries, you can do:

library(dplyr)
library(tidyr)

#Rename the long column name to something smaller
names(df)[1] <- 'Time'

df %>%
  #Create a row index
  mutate(row = row_number()) %>%
  #Split the data in different rows on '|'
  separate_rows(Time, sep = '\\s*\\|\\s*') %>%
  #split the data on '-'
  separate(Time, c("Opening_Time", "Closing_time"), sep = '\\s*-\\s*') %>%
  #Change the time to POSIXct format
  mutate(across(c(Opening_Time, Closing_time), as.POSIXct, format = '%H:%M')) %>%
  #For each row
  group_by(row) %>%
  #Get minimum opening time and maximum closing time 
  #and change into required format
  summarise(Opening_Time = format(min(Opening_Time), "%H:%M"), 
            Closing_time = format(max(Closing_time), "%H:%M")) %>%
  #Drop row column
  select(-row)

This returns

#  Opening_Time Closing_time
#   <chr>        <chr>       
# 1 15:30        21:30       
# 2 12:00        00:00       
# 3 11:00        20:45       
# 4 12:00        22:30       
# 5 17:00        21:30       
# 6 17:00        21:30       
# 7 16:30        00:00       
# 8 16:00        21:15       
# 9 16:30        20:30       
#10 17:00        20:00       
# … with 30 more rows
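
One thing to keep in mind: the comparison here is on clock time only, so a closing time after midnight (such as "10:00 - 02:00" in the sample data) is treated as earlier than any same-day closing time. If that matters for your data, one possible approach is sketched below. It rests on the assumption that a closing time at or before its opening time belongs to the next day; the helper names `to_minutes` and `latest_closing` are hypothetical.

```r
# Convert "HH:MM" strings to minutes since midnight
to_minutes <- function(hhmm) {
  parts <- strsplit(hhmm, ":", fixed = TRUE)
  vapply(parts, function(p) as.integer(p[1]) * 60L + as.integer(p[2]), integer(1))
}

# Hypothetical helper: latest closing time among several intervals,
# given parallel vectors of opening and closing times
latest_closing <- function(opening, closing) {
  open_min <- to_minutes(opening)
  close_min <- to_minutes(closing)
  # Assumption: a closing time at or before the opening time is on the next day
  overnight <- close_min <= open_min
  close_min[overnight] <- close_min[overnight] + 24L * 60L
  closing[which.max(close_min)]
}

# Made-up intervals "12:00 - 01:00" and "17:00 - 22:00"
latest_closing(c("12:00", "17:00"), c("01:00", "22:00"))
```

For these two intervals a plain `max()` on clock time would pick 22:00, while the sketch picks 01:00 because it counts as the following day.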