Calculate character string "days, hours, minutes, seconds" to numeric total days

I've seen a lot of questions about formatting times, but none that deal with the particular import format I have:

Time <- c(
"22 hours 3 minutes 22 seconds", 
"170 hours 15 minutes 20 seconds", 
"39 seconds", 
"2 days 6 hours 44 minutes 17 seconds", 
"9 hours 54 minutes 36 seconds", 
"357 hours 23 minutes 28 seconds", 
"464 hours 30 minutes 7 seconds", 
"51 seconds", 
"31 hours 39 minutes 2 seconds", 
"355 hours 29 minutes 10 seconds")

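Some elements contain only "seconds", while others contain "minutes and seconds", "days, hours, minutes and seconds", "days and seconds", and so on. There are also NA values that I need to preserve. How can I convert this character vector into a numeric total number of days (i.e. add up the days, hours, minutes and seconds)?

For example: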

Time
8.10
19.3
0.68
2.28
48.1
0.00
0.70
0.1
3.2
13.9

Thanks!

Edit

This is an old question, but a simple lubridate call now does the trick:

(period_to_seconds(period(Time)) / 86400) %>% round(2)

Apart from needing %>% (from magrittr) for readability, this also does the job without any other packages:

library(magrittr)  # only for %>%

Time_vec <- mapply(function(tt, to_days) {
  # take the number in front of each unit name, or 0 if the unit is absent
  ifelse(grepl(tt, Time), gsub(paste0("^.*?(\\d+) ", tt, ".*$"), "\\1", Time), 0) %>%
    as.numeric() / to_days
    },
  c("day", "hour", "minute", "second"),
  c(1, 24, 1440, 86400)
) %>%
  apply(1, sum) %>% 
  round(2)

In my actual data, only one value differed from the lubridate solution: 0.96 vs 0.97.
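As a quick check of the arithmetic behind the divisors 1, 24, 1440 and 86400 used above, here is a minimal sketch for one string from the question, "2 days 6 hours 44 minutes 17 seconds". The names `per_day`, `parts` and `total_days` are made up for this illustration.

```r
# How many of each unit fit into one day (illustrative names)
per_day <- c(day = 1, hour = 24, minute = 1440, second = 86400)

# Components of "2 days 6 hours 44 minutes 17 seconds"
parts <- c(day = 2, hour = 6, minute = 44, second = 17)

# Dividing each component by its divisor expresses it in days
total_days <- sum(parts / per_day)
round(total_days, 2)
# 2.28
```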

I suggest you install the stringr package. Then do this:

library(stringr)
options(digits=7)
returndays <- function(alist){
        n <- length(alist)
        day <- numeric(n)
        for (i in seq_len(n)){
                # keep missing values as NA
                if (is.na(alist[i])) {
                        day[i] <- NA
                        next
                }
                # number in front of each unit; str_match() gives NA if the unit is absent
                d <- as.numeric(str_match(alist[i], "([0-9]+) days?")[,2])
                h <- as.numeric(str_match(alist[i], "([0-9]+) hours?")[,2])/24
                m <- as.numeric(str_match(alist[i], "([0-9]+) minutes?")[,2])/1440
                s <- as.numeric(str_match(alist[i], "([0-9]+) seconds?")[,2])/86400

                # sum only the parts that are present; sum(h+m+s, na.rm = TRUE)
                # would return 0 whenever any single part is missing
                day[i] <- sum(c(d, h, m, s), na.rm = TRUE)
        }

        return(day)

}

days <-returndays(Time)

days

[1] 9.190046e-01 7.093981e+00 4.513889e-04 2.280752e+00 4.129167e-01 1.489130e+01 1.935425e+01
[8] 5.902778e-04 1.318773e+00 1.481192e+01

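lubridate provides the function period(), which makes it easy to turn hours, minutes, seconds, etc. into a Period object, which in turn is easy to convert to seconds: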

library(lubridate)
period(days = 3, hours = 10, minutes = 3, seconds = 37)
## [1] "3d 10H 3M 37S"

I use this function to convert your strings:

to_days <- function(hms_char) {

   # keep missing values as NA
   if (is.na(hms_char)) return(NA_real_)
   # split string
   v <- strsplit(hms_char, " ")[[1]]
   # get numbers (every odd element) and make them numeric
   idx <- seq(1, by = 2, length.out = length(v)/2)
   nums <- as.list(as.numeric(v[idx]))
   # get units and use them as names
   names(nums) <- v[-idx]
   # build a period, then convert to seconds and to days
   duration <- do.call(period, nums)
   days <- period_to_seconds(duration)/86400

   return(days)
}

It works on a single string, so you need sapply to convert the full Time vector:

sapply(Time, to_days, USE.NAMES = FALSE)
## [1] 9.190046e-01 7.093981e+00 4.513889e-04 2.280752e+00 4.129167e-01 1.489130e+01 1.935425e+01
## [8] 5.902778e-04 1.318773e+00 1.481192e+01

Similarly, with no packages and a bit of regex:

Time <- c(
  "22 hours 3 minutes 22 seconds", 
  "170 hours 15 minutes 20 seconds", 
  "39 seconds", 
  "6 hours 44 minutes 17 seconds", 
  "9 hours 54 minutes 36 seconds", 
  "357 hours 23 minutes 28 seconds", 
  "464 hours 30 minutes 7 seconds", 
  "51 seconds", 
  "31 hours 39 minutes 2 seconds", 
  "355 hours 29 minutes 10 seconds")

pat <- '(?:(\\d+) hours )?(?:(\\d+) minutes )?(?:(\\d+) seconds)?'
m <- regexpr(pat, Time, perl = TRUE)

m_st <- attr(m, 'capture.start')
m_ln <- attr(m, 'capture.length')

(mm <- mapply(function(x, y) as.numeric(substr(Time, x, y)),
              data.frame(m_st), data.frame(m_st + m_ln - 1)))

(dd <- setNames(data.frame(mm), c('h','m','s')))
#      h  m  s
# 1   22  3 22
# 2  170 15 20
# 3   NA NA 39
# 4    6 44 17
# 5    9 54 36
# 6  357 23 28
# 7  464 30  7
# 8   NA NA 51
# 9   31 39  2
# 10 355 29 10

round(rowSums(dd / data.frame(h = rep(24, nrow(dd)), m = 24 * 60, s = 24 * 60 * 60),
        na.rm = TRUE), 3)
# [1]  0.919  7.094  0.000  0.281  0.413 14.891 19.354  0.001  1.319 14.812
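
The pattern above has no "days" group and turns NA into 0. If both matter, as in the question, a base R sketch along the following lines might work. `to_total_days` is a hypothetical helper written for this illustration, not part of any answer here, and it assumes every number is followed by a single space and then its unit name.

```r
# Hypothetical helper: total days from strings such as
# "2 days 6 hours 44 minutes 17 seconds", keeping NA as NA
to_total_days <- function(x) {
  per_day <- c(day = 1, hour = 24, minute = 1440, second = 86400)
  total <- numeric(length(x))
  for (unit in names(per_day)) {
    has_unit <- grepl(paste0("\\d+ ", unit), x, perl = TRUE)
    # Capture the number directly in front of the unit name
    value <- sub(paste0("^.*?(\\d+) ", unit, ".*$"), "\\1", x, perl = TRUE)
    # Units that do not occur contribute 0
    value <- ifelse(has_unit, value, "0")
    total <- total + as.numeric(value) / per_day[[unit]]
  }
  # Restore the missing values last
  total[is.na(x)] <- NA_real_
  total
}

round(to_total_days(c("39 seconds", NA, "2 days 6 hours 44 minutes 17 seconds")), 2)
# 0.00 NA 2.28
```

Looping over the four units keeps each regex small, at the cost of scanning the vector four times.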

lubridate is useful here. hms automatically extracts the hours, minutes and seconds (saving you some regex), and time_length converts the result to days.

> library(lubridate)
> time_length(hms(Time), 'day')
estimate only: convert periods to intervals for accuracy
 [1]  0.9190046  7.0939815         NA  0.2807523  0.4129167 14.8912963 19.3542477         NA
 [9]  1.3187731 14.8119213

However, hms fails to parse a string unless it contains three numbers, so a little pre-cleaning helps:

> library(stringr)
> Time2 <- sapply(Time, function(x){paste(paste(rep(0, 3 - str_count(x, '[0-9]+')), collapse = ' '), x)})
> time_length(hms(Time2), 'day')
estimate only: convert periods to intervals for accuracy
 [1] 9.190046e-01 7.093981e+00 4.513889e-04 2.807523e-01 4.129167e-01 1.489130e+01 1.935425e+01
 [8] 5.902778e-04 1.318773e+00 1.481192e+01
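
To make the padding step easier to follow, here is a rough base R equivalent of the one-liner above. `pad_to_hms` is a name made up for this sketch. It assumes there are no NA values and that the strings contain no "days" part, since more than three numbers cannot be squeezed into hours, minutes and seconds.

```r
# Hypothetical helper: prepend zeros so every string contains three numbers
pad_to_hms <- function(x) {
  # Count the runs of digits in each string
  n_numbers <- lengths(regmatches(x, gregexpr("[0-9]+", x)))
  # Number of leading zeros still needed (never negative)
  n_missing <- pmax(3 - n_numbers, 0)
  zeros <- vapply(n_missing,
                  function(k) paste(rep("0", k), collapse = " "),
                  character(1))
  # trimws() drops the leading space left when no zeros were added
  trimws(paste(zeros, x))
}

pad_to_hms(c("39 seconds", "3 minutes 22 seconds", "22 hours 3 minutes 22 seconds"))
# "0 0 39 seconds" "0 3 minutes 22 seconds" "22 hours 3 minutes 22 seconds"
```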