Best way to create multiple time difference variables

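I want to create several variables, each showing the time difference between one of my variables and a single reference variable (V0). I want the absolute difference (i.e., ignoring the sign). All of my variables are in date format.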

I have the code below, which works, but I suspect there is a neater/better way to do this in fewer lines of code. I have tried a few things without luck.

df$V1_timediff <- (abs(as.numeric(difftime(df$V0, df$V1, units = "days"))))

df$V2_timediff <- (abs(as.numeric(difftime(df$V0, df$V2, units = "days"))))

df$V3_timediff <- (abs(as.numeric(difftime(df$V0, df$V3, units = "days"))))

df$V4_timediff <- (abs(as.numeric(difftime(df$V0, df$V4, units = "days"))))

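I'll demonstrate using mtcars. Since it has no POSIXt objects, I'll use a simple -; this works unchanged in your case as well, so difftime is not technically required and the results should be the same. The premise of both solutions below still works if adapted to use difftime.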

dplyr

library(dplyr)
mtcars %>%
  mutate(across(vs:carb, list(timediff = ~ abs(as.numeric(cyl - ., units = "days"))))) %>%
  head()
#                    mpg cyl disp  hp drat    wt  qsec vs am gear carb vs_timediff am_timediff gear_timediff carb_timediff
# Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4           6           5             2             2
# Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4           6           5             2             2
# Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1           3           3             0             3
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1           5           6             3             5
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2           8           8             5             6
# Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1           5           6             3             5
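
Adapted to real date columns and difftime, the same across() pattern might look like the sketch below. The data frame df and its values are made up for illustration; only the column names V0, V1, V2 and the _timediff suffix follow the question.

```r
library(dplyr)

# Hypothetical data: V0 is the reference date, V1 and V2 are compared against it
df <- data.frame(
  V0 = as.Date(c("2020-01-10", "2020-02-01")),
  V1 = as.Date(c("2020-01-01", "2020-02-11")),
  V2 = as.Date(c("2020-01-15", "2020-01-02"))
)

# One new <column>_timediff column per selected column, in days, sign ignored
res <- df %>%
  mutate(across(V1:V2,
                ~ abs(as.numeric(difftime(V0, .x, units = "days"))),
                .names = "{.col}_timediff"))
res
```

Using .names instead of a named list of functions is only a stylistic choice here; both produce the same V1_timediff, V2_timediff column names.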

Base R

tmp <- lapply(mtcars$cyl - subset(mtcars, select = vs:carb),
              function(z) abs(as.numeric(z, units = "days")))
names(tmp) <- paste0(names(tmp), "_timediff")
head(cbind(mtcars, tmp))
#                    mpg cyl disp  hp drat    wt  qsec vs am gear carb vs_timediff am_timediff gear_timediff carb_timediff
# Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4           6           5             2             2
# Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4           6           5             2             2
# Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1           3           3             0             3
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1           5           6             3             5
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2           8           8             5             6
# Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1           5           6             3             5
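
For date columns, a base R variant with difftime could be sketched as follows. Again, df and its values are hypothetical, and the targets vector is just an illustrative name for the columns to compare against V0.

```r
# Hypothetical data: V0 is the reference date, V1 and V2 are compared against it
df <- data.frame(
  V0 = as.Date(c("2020-01-10", "2020-02-01")),
  V1 = as.Date(c("2020-01-01", "2020-02-11")),
  V2 = as.Date(c("2020-01-15", "2020-01-02"))
)

targets <- c("V1", "V2")  # columns to compare against V0

# Assign all new columns at once; lapply returns one numeric vector per target
df[paste0(targets, "_timediff")] <- lapply(
  df[targets],
  function(z) abs(as.numeric(difftime(df$V0, z, units = "days")))
)
df
```

Assigning into df directly avoids the separate names() and cbind() steps, at the cost of modifying df in place.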

In base R we can define a user-defined function (UDF) and loop over the columns:

time_diff <- function(df, v0, vn) {
  abs(as.numeric(difftime(df[[v0]], df[[vn]], units = "days")))
}

lapply(c("t2", "t3"), function(tn) time_diff(test, "t1", tn))
#> [[1]]
#> [1] 0.05208333 0.05208333 0.05208333 0.05208333 0.05208333
#> 
#> [[2]]
#> [1] 0.1041667 0.1041667 0.1041667 0.1041667 0.1041667
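
The lapply call above returns an unnamed list. If the goal is to end up with named columns as in the question, one possible way is to name the list and bind it back, as sketched here. The two-row data frame smp, the cols vector, and the _timediff suffix are assumptions made for illustration.

```r
# Same helper as above
time_diff <- function(df, v0, vn) {
  abs(as.numeric(difftime(df[[v0]], df[[vn]], units = "days")))
}

# Hypothetical two-row sample with POSIXct columns
smp <- data.frame(
  t1 = as.POSIXct(c("2002-02-28 18:00:00", "2002-02-28 18:15:00"), tz = "UTC"),
  t2 = as.POSIXct(c("2002-02-28 19:15:00", "2002-02-28 19:30:00"), tz = "UTC"),
  t3 = as.POSIXct(c("2002-02-28 20:30:00", "2002-02-28 20:45:00"), tz = "UTC")
)

cols <- c("t2", "t3")

# Name each result after its source column, then append to the data frame
diffs <- setNames(lapply(cols, function(tn) time_diff(smp, "t1", tn)),
                  paste0(cols, "_timediff"))
out <- cbind(smp, diffs)
out
```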

Data:

test <- structure(list(t1 = structure(c(1014919200, 1014920100, 1014921000,
                                1014921900, 1014922800), 
                    class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
                t2 = structure(c(1014923700, 1014924600, 1014925500, 
                                 1014926400, 1014927300),       
                    class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
                t3 = structure(c(1014928200, 1014929100, 1014930000, 
                                 1014930900, 1014931800), 
                    class = c("POSIXct", "POSIXt"), tzone = "UTC")), 
                class = "data.frame", row.names = c(NA, -5L))