Extracting time from middle of string in R when string changes c
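I have a dataset containing training data for several athletes on different days/times. One column holds the date and start time of each session. I only want to keep the start time in this column, i.e. I want to remove "2020/01/05" and "UTC". How can I remove everything before and after the time (there are 4 million rows of dates/times)?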
start.time
1 2020/01/05 21:30:04 UTC
2 2020/01/05 21:30:04 UTC
3 2020/01/05 21:30:04 UTC
4 2020/01/05 21:30:04 UTC
5 2020/01/05 21:30:04 UTC
6 2020/01/05 21:30:04 UTC
Sorry if this has already been answered somewhere.
Thanks
There are several ways to do this:
1) Using a regular expression
df$time <- sub('.*\\s+(.*) UTC', '\\1', df$start.time)
df
# start.time time
#1 2020/01/05 21:30:04 UTC 21:30:04
#2 2020/01/05 21:30:04 UTC 21:30:04
#3 2020/01/05 21:30:04 UTC 21:30:04
#4 2020/01/05 21:30:04 UTC 21:30:04
#5 2020/01/05 21:30:04 UTC 21:30:04
#6 2020/01/05 21:30:04 UTC 21:30:04
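Here we capture everything between the whitespace and " UTC". \1 is a backreference that returns the captured value (inside an R string literal it has to be written as "\\1", just as \s has to be written as "\\s").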
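If every value follows the layout shown in the question, a few base R variants avoid relying on greedy .* backtracking. This is a sketch under that assumption; x is only an illustrative stand-in for as.character(df$start.time), and the second row is made-up sample data.

```r
# Illustrative stand-in for as.character(df$start.time); assumes every value
# looks like "YYYY/MM/DD HH:MM:SS UTC"
x <- c("2020/01/05 21:30:04 UTC", "2020/01/06 07:05:59 UTC")

# Anchored pattern: capture exactly HH:MM:SS after the first space,
# then replace the whole string with that capture
t1 <- sub("^[^ ]+ ([0-9]{2}:[0-9]{2}:[0-9]{2}).*$", "\\1", x)

# Extract the first HH:MM:SS match; note that regmatches() drops
# non-matching elements, so the result can be shorter than x
t2 <- regmatches(x, regexpr("[0-9]{2}:[0-9]{2}:[0-9]{2}", x))

# Fixed-width shortcut: characters 12 to 19 hold the time
# (only valid if the layout never varies)
t3 <- substr(x, 12, 19)

t1
# [1] "21:30:04" "07:05:59"
```

With 4 million rows, substr() is likely the cheapest of the three, but it silently returns wrong values if the layout ever changes, whereas sub() returns a non-matching string unchanged, which makes such rows easy to spot.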
2) Convert to POSIXct, then format
This can be done in base R:
format(as.POSIXct(df$start.time, format = "%Y/%m/%d %T", tz = "UTC"), "%T")
Or using lubridate:
format(lubridate::ymd_hms(df$start.time), "%T")
Data
df <- structure(list(start.time = structure(c(1L, 1L, 1L, 1L, 1L, 1L
), .Label = "2020/01/05 21:30:04 UTC", class = "factor")),
class = "data.frame", row.names = c(NA,-6L))
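Because start.time is stored as a factor in the data above, and 4 million timestamps probably contain far fewer distinct values, one option is to parse only the unique levels once and then map the results back to the rows. A sketch, assuming the "%Y/%m/%d %H:%M:%S" layout from the question; f, lv_time and the second timestamp are illustrative, not from the original data.

```r
# Illustrative stand-in for df$start.time: a factor, as in the data above
f <- factor(c("2020/01/05 21:30:04 UTC", "2020/01/05 21:30:04 UTC",
              "2020/01/06 07:05:59 UTC"))

# Parse each distinct level once; tz = "UTC" avoids local daylight-saving
# gaps turning some clock times into NA or shifting them
lv_time <- format(as.POSIXct(levels(f), format = "%Y/%m/%d %H:%M:%S", tz = "UTC"),
                  "%H:%M:%S")

# Expand back to one value per row using the factor's integer codes
time <- lv_time[as.integer(f)]
time
# [1] "21:30:04" "21:30:04" "07:05:59"
```

If start.time is a character column instead, the same idea works with u <- unique(x) and lv_time[match(x, u)].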
We can use anytime from the anytime package:
library(anytime)
format(anytime(df$start.time), "%T")
Or as.ITime from data.table:
library(data.table)
as.ITime(df$start.time)
#[1] "21:30:04" "21:30:04" "21:30:04" "21:30:04" "21:30:04" "21:30:04"
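as.ITime() stores a time of day as an integer number of seconds since midnight, which keeps it compact for millions of rows and easy to compare or subtract. A rough base R sketch of the same representation, assuming well-formed, zero-padded "HH:MM:SS" strings such as those extracted above; to_seconds() and from_seconds() are hypothetical helpers, not part of data.table.

```r
# Hypothetical helper: zero-padded "HH:MM:SS" -> integer seconds since midnight
# (vectorised with substr(), so it avoids a per-row loop)
to_seconds <- function(hms) {
  as.integer(substr(hms, 1, 2)) * 3600L +
    as.integer(substr(hms, 4, 5)) * 60L +
    as.integer(substr(hms, 7, 8))
}

# Hypothetical helper: integer seconds since midnight -> "HH:MM:SS"
from_seconds <- function(s) {
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

secs <- to_seconds(c("21:30:04", "07:05:59"))
secs
# [1] 77404 25559
from_seconds(secs)
# [1] "21:30:04" "07:05:59"
```

Storing integers rather than strings also makes comparisons such as secs > to_seconds("12:00:00") straightforward.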
Data
df <- structure(list(start.time = structure(c(1L, 1L, 1L, 1L, 1L, 1L
), .Label = "2020/01/05 21:30:04 UTC", class = "factor")),
class = "data.frame", row.names = c(NA,-6L))