Extracting time from middle of string in R when string changes c

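I have a dataset containing training data for several athletes on different days/times. One column contains the date and start time of each session. I only want to keep the start time in this column, i.e. I want to remove "2020/01/05" and "UTC". How can I remove everything before and after the time (there are 4 million rows of dates/times)?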

 start.time
1 2020/01/05 21:30:04 UTC 
2 2020/01/05 21:30:04 UTC 
3 2020/01/05 21:30:04 UTC 
4 2020/01/05 21:30:04 UTC 
5 2020/01/05 21:30:04 UTC 
6 2020/01/05 21:30:04 UTC 

Sorry if this has already been answered somewhere.

Thanks

There are several ways to do this:

1) Using a regular expression

df$time <- sub('.*\\s+(.*) UTC', '\\1', df$start.time)
df
#               start.time     time
#1 2020/01/05 21:30:04 UTC 21:30:04
#2 2020/01/05 21:30:04 UTC 21:30:04
#3 2020/01/05 21:30:04 UTC 21:30:04
#4 2020/01/05 21:30:04 UTC 21:30:04
#5 2020/01/05 21:30:04 UTC 21:30:04
#6 2020/01/05 21:30:04 UTC 21:30:04

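Here we capture everything between the whitespace and "UTC". \\1 (a backslash has to be doubled inside an R string) is a backreference that puts the captured value in as the replacement.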
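As a quick check of how the pattern behaves, here is a minimal sketch that applies it to a small character vector. The vector `x` and its second value are made up for illustration and are not part of the question's data.

```r
# Hypothetical sample values in the same layout as start.time
x <- c("2020/01/05 21:30:04 UTC", "2020/01/06 07:15:59 UTC")

# ".*\\s+" consumes the date and the whitespace after it,
# "(.*)"   captures the time,
# " UTC"   matches the trailing zone label so it is dropped as well.
times <- sub(".*\\s+(.*) UTC", "\\1", x)
print(times)
```

Because `sub` is vectorised, the same call works unchanged on a full column.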


2) Convert to POSIXct and then format

This can be done in base R:

format(as.POSIXct(df$start.time, format = "%Y/%m/%d %T"), "%T")

Or using lubridate:

format(lubridate::ymd_hms(df$start.time), "%T")
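One detail that may matter for the base R version: without a `tz` argument, `as.POSIXct` interprets the string in the session's local time zone, so a time that falls in a local daylight-saving gap may be shifted or come back as `NA`, depending on the platform. A minimal sketch, assuming the values really are UTC, passes `tz = "UTC"` for both parsing and formatting. The sample vector `x` is hypothetical.

```r
# Hypothetical sample values; the second one falls in a DST gap
# in some local time zones, but is a valid time in UTC.
x <- c("2020/01/05 21:30:04 UTC", "2020/03/08 02:30:00 UTC")

# Parse as UTC; the trailing " UTC" text is not covered by the format
# string and is ignored by the parser.
parsed <- as.POSIXct(x, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

# Format back to a character vector holding only the time part.
times <- format(parsed, "%H:%M:%S", tz = "UTC")
print(times)
```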

Data

df <- structure(list(start.time = structure(c(1L, 1L, 1L, 1L, 1L, 1L
), .Label = "2020/01/05 21:30:04 UTC", class = "factor")), 
class = "data.frame", row.names = c(NA,-6L))

We can use anytime from the anytime package:

library(anytime)
format(anytime(df$start.time), "%T")

Or as.ITime from data.table:

library(data.table)
as.ITime(df$start.time)
#[1] "21:30:04" "21:30:04" "21:30:04" "21:30:04" "21:30:04" "21:30:04"
