R: Return value of next occurrence by group AND from a subset of data (conditional lead/lag)
My question is a continuation/modification of this question:
I am working with the following data:
df<-structure(list(firm = c("A", "A", "B", "B", "B", "B", "B", "C",
"C", "C"), datetime = structure(c(1514793600, 1514799000, 1514793600,
1514797200, 1514800800, 1514804100, 1514804400, 1514800800, 1514802600,
1514802900), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
employee = c(1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-10L))
For every row where employee equals 1, I want to return the next datetime at which the same firm appears in the data with employee equal to 0.
df_expected<-structure(list(firm = c("A", "A", "B", "B", "B", "B", "B", "C",
"C", "C"), datetime = structure(c(1514793600, 1514799000, 1514793600,
1514797200, 1514800800, 1514804100, 1514804400, 1514800800, 1514802600,
1514802900), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
employee = c(1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L), NextTime = structure(c(1514799000,
NA, 1514797200, NA, 1514804400, 1514804400,
NA, NA, NA, NA), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -10L), class = "data.frame")
I tried with dplyr, but this only works when each firm has no more than one row with employee == 0:
df %>%
group_by(firm) %>%
mutate(nextTime=datetime[employee==0])
...or this, which only works if each firm has no more than one row with employee == 1:
df %>%
group_by(firm) %>%
mutate(nextTime=lead(datetime))
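I tried multiple "combinations" of the code snippets above, as well as the data.table answer to the original question, but to no avail. Any help is much appreciated!
Try this: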
library(dplyr)
df_expected %>%
group_by(firm) %>%
mutate(NextTime2 = if_else(lead(employee == 0), lead(datetime), datetime[NA])) %>%
tidyr::fill(NextTime2, .direction = "up") %>%
mutate(NextTime2 = if_else(employee == 0, NextTime2[NA], NextTime2)) %>%
ungroup()
# # A tibble: 10 x 5
#    firm  datetime            employee NextTime            NextTime2
#    <chr> <dttm>                 <int> <dttm>              <dttm>
#  1 A     2018-01-01 08:00:00        1 2018-01-01 09:30:00 2018-01-01 09:30:00
#  2 A     2018-01-01 09:30:00        0 NA                  NA
#  3 B     2018-01-01 08:00:00        1 2018-01-01 09:00:00 2018-01-01 09:00:00
#  4 B     2018-01-01 09:00:00        0 NA                  NA
#  5 B     2018-01-01 10:00:00        1 2018-01-01 11:00:00 2018-01-01 11:00:00
#  6 B     2018-01-01 10:55:00        1 2018-01-01 11:00:00 2018-01-01 11:00:00
#  7 B     2018-01-01 11:00:00        0 NA                  NA
#  8 C     2018-01-01 10:00:00        1 NA                  NA
#  9 C     2018-01-01 10:30:00        1 NA                  NA
# 10 C     2018-01-01 10:35:00        1 NA                  NA
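For comparison, here is a minimal base R sketch of the same idea, in case a version without dplyr/tidyr is useful. The helper name next_zero_time is made up for this illustration and is not part of any package. It assumes the rows are already sorted by datetime within each firm and that employee has no missing values.

```r
# Sample data from the question, rebuilt compactly.
df <- data.frame(
  firm = c("A", "A", "B", "B", "B", "B", "B", "C", "C", "C"),
  datetime = as.POSIXct(
    c(1514793600, 1514799000, 1514793600, 1514797200, 1514800800,
      1514804100, 1514804400, 1514800800, 1514802600, 1514802900),
    origin = "1970-01-01", tz = "UTC"
  ),
  employee = c(1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for one firm's rows, return the next datetime at which
# employee == 0 for every employee == 1 row, and NA everywhere else.
# Assumes rows are in chronological order and employee contains no NA.
next_zero_time <- function(datetime, employee) {
  out <- datetime[rep(NA_integer_, length(datetime))]  # typed NA, keeps POSIXct class
  upcoming <- datetime[NA_integer_]                    # no later zero seen yet
  # Walk backwards so that `upcoming` always holds the nearest later zero.
  for (i in rev(seq_along(datetime))) {
    if (employee[i] == 0) upcoming <- datetime[i] else out[i] <- upcoming
  }
  out
}

# Apply the helper firm by firm.
df$NextTime <- df$datetime[rep(NA_integer_, nrow(df))]
for (f in unique(df$firm)) {
  idx <- which(df$firm == f)
  df$NextTime[idx] <- next_zero_time(df$datetime[idx], df$employee[idx])
}
df
```

The backward loop plays the role of fill(.direction = "up") in the pipeline above: each employee == 1 row picks up the closest later employee == 0 datetime within its firm.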
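FYI: the [NA] indexing (as in datetime[NA]) is a trick to ensure that the true= and false= vectors have the same class. If I just use NA, it fails (at least in the dplyr version used here), because NA is of class logical: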
if_else(TRUE, 1, NA)
# Error in `if_else()`:
# ! `false` must be a double vector, not a logical vector.
By indexing with [NA], we guarantee that it will be of the appropriate class (there are more than 6 different types of NA):
(1:3)[NA]
# [1] NA NA NA
class( (1:3)[NA] )
# [1] "integer"
#### many types of `NA`
class(NA)
# [1] "logical"
class( (seq(1,3,by=0.5))[NA] )
# [1] "numeric"
class( letters[NA] )
# [1] "character"
class( Sys.time()[NA] )
# [1] "POSIXct" "POSIXt"
class( Sys.Date()[NA] )
# [1] "Date"
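As a side note, base R also provides typed NA constants for the atomic types, and the kind of NA used as the index affects the length of the result. The short sketch below illustrates both points; the variable names x and na_time are just for this example.

```r
# Typed NA constants for atomic vectors.
class(NA_integer_)
class(NA_real_)
class(NA_character_)

# There is no such constant for POSIXct, so indexing an existing vector is handy.
x <- as.POSIXct(c(1514793600, 1514799000), origin = "1970-01-01", tz = "UTC")
na_time <- x[NA_integer_]
class(na_time)          # keeps the POSIXct class
attr(na_time, "tzone")  # and the time zone attribute

# A logical NA index is recycled to the length of x,
# while an integer NA index selects exactly one (missing) element.
length(x[NA])
length(x[NA_integer_])
```

This is why datetime[NA] in the pipeline above yields an NA vector as long as the group, whereas datetime[NA_integer_] would yield a single NA.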