R lead and lag (shift) with times
I am trying to use lag on a column of a data frame, but it does not work when times are involved. I have tried shift, lag and tlag.
Example:
y = strptime(sprintf("%s:%s:%s", 4, 20, 10), "%H:%M:%S")
yy = strptime(sprintf("%s:%s:%s", 10, 20, 10), "%H:%M:%S")
lag(c(y,yy))
Error in format.POSIXlt(x, usetz = usetz) :
invalid component [[10]] in "POSIXlt" should be 'zone'
tlag(c(y,yy))
Error in n_distinct_multi(list(...), na.rm) :
argument "time" is missing, with no default
shift(c(y,yy))
[[1]]
[1] NA 10
[[2]]
[1] NA 20
[[3]]
[1] NA 4
[[4]]
[1] NA 4
[[5]]
[1] NA 6
[[6]]
[1] NA 117
[[7]]
[1] NA 2
[[8]]
[1] NA 184
[[9]]
[1] NA 1
[[10]]
[1] NA "BST"
[[11]]
[1] NA 3600
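I don't want any time differences. I just want the value from the previous row of my data frame, which is what I thought lag does: "Lead and lag are useful for comparing values offset by a constant (e.g. the previous or next value)".
The time itself doesn't even matter; it should simply pick whatever numeric/character/time value sits in the previous position. How can I fix this, or is there a different function that does what I want? I don't want any loops involved, because speed matters and the data frame is large.
Example from my data frame: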
structure(list(sec = c(52, 53, 54, 55, 56, 57, 58, 59, 0, 1),
min = c(50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 51L, 51L),
hour = c(11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L
), mday = c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), mon = c(6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), year = c(117L, 117L,
117L, 117L, 117L, 117L, 117L, 117L, 117L, 117L), wday = c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), yday = c(184L, 184L,
184L, 184L, 184L, 184L, 184L, 184L, 184L, 184L), isdst = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), zone = c("BST", "BST",
"BST", "BST", "BST", "BST", "BST", "BST", "BST", "BST"),
gmtoff = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_)), .Names = c("sec", "min", "hour", "mday", "mon",
"year", "wday", "yday", "isdst", "zone", "gmtoff"), class = c("POSIXlt",
"POSIXt"))
For a data.frame like the following
index time
1 1 2017-07-04 04:20:10
2 2 2017-07-04 10:20:10
you can use dplyr
dplyr::lag(df$time, 1)
[1] NA "2017-07-04 04:20:10 CEST"
dplyr::lead(df$time, 1)
[1] "2017-07-04 10:20:10 CEST" NA
To add lead/lag columns to the data.frame, you can use
dplyr::mutate(df, lead_1 = dplyr::lead(time, 1), lag_1 = dplyr::lag(time, 1))
index time lead_1 lag_1
1 1 2017-07-04 04:20:10 2017-07-04 10:20:10 <NA>
2 2 2017-07-04 10:20:10 <NA> 2017-07-04 04:20:10
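The shift() output in the question has 11 elements because strptime() returns a POSIXlt object, which is stored as a list of components (sec, min, hour, ...), so list-aware functions shift each component separately. A likely fix is to convert the times to POSIXct, which is a plain atomic vector, before lagging. The following is a minimal base-R sketch under that assumption; the date, the UTC time zone, and the helper names lag_vec and lead_vec are illustrative and not from the original post.

```r
# Assumption: the date and the UTC time zone are chosen for illustration only.
fmt <- "%Y-%m-%d %H:%M:%S"
lt  <- strptime(c("2017-07-04 04:20:10", "2017-07-04 10:20:10"), fmt, tz = "UTC")

# strptime() gives POSIXlt (a list of components); POSIXct is an atomic vector.
times <- as.POSIXct(lt)

# Hypothetical helpers: shift by position using NA indices, with no loops.
# Indexing with NA_integer_ yields NA while keeping the class of x.
lag_vec <- function(x, n = 1L) {
  stopifnot(n >= 0, n <= length(x))
  x[c(rep(NA_integer_, n), seq_len(length(x) - n))]
}
lead_vec <- function(x, n = 1L) {
  stopifnot(n >= 0, n <= length(x))
  x[c(seq_len(length(x) - n) + n, rep(NA_integer_, n))]
}

df <- data.frame(index = seq_along(times), time = times)
df$lag_1  <- lag_vec(df$time)   # previous row's value, NA in the first row
df$lead_1 <- lead_vec(df$time)  # next row's value, NA in the last row
print(df)
```

The same helpers work on numeric and character vectors, since they only rearrange positions. With dplyr installed, dplyr::lag(times) and dplyr::lead(times) should give the same values once the column is POSIXct.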