R: searching for the minimum date across columns based on a condition
I have the following data frame with several date variables.
x <- structure(list(id = c(1, 2, 3, 4), date = structure(c(18611,
16801, 16801, 17532), class = "Date"), s1 = c(0, 1, 1, NA), date1 = structure(c(17880,
16450, 16416, NA), class = "Date"), s2 = c(0, 0, 1, NA), date2 = structure(c(17880,
NA, 15869, NA), class = "Date"), DN = structure(c(18611, 15869,
15869, NA), class = "Date")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L))
I want to compare date1 and date2 and, based on the conditions in case_when, generate DN as the minimum of the two dates. I am currently using this code:
x <- x %>%
  mutate(date = as.Date(date),
         date1 = as.Date(date1),
         date2 = as.Date(date2),
         DN = case_when(
           s1 == 1 | s2 == 1 ~ min(date1, date2, na.rm = T),
           s1 == 0 | s2 == 0 ~ date,
           is.na(s1) & is.na(s2) ~ NA_real_
         ))
However, I get a strange result: for id = 2, the value of DN is taken from id = 3, and I can't understand why.
Any ideas?
Thanks in advance.
If you add rowwise() (i.e. group by row), you will get the desired row-wise minimum:
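library(dplyr)

x %>%
  rowwise() %>%
  mutate(date = as.Date(date),
         date1 = as.Date(date1),
         date2 = as.Date(date2),
         DN = case_when(
           s1 == 1 | s2 == 1 ~ pmin(date1, date2, na.rm = TRUE),
           s1 == 0 | s2 == 0 ~ date,
           # as.Date(NA) keeps every branch a Date; NA_real_ is a plain
           # double and may be rejected by case_when()'s type checks
           is.na(s1) & is.na(s2) ~ as.Date(NA)
         ))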
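To see why the original min() call gave id = 2 a date that belongs to id = 3, it may help to compare min() and pmin() directly in base R. This is only a small sketch; the vectors below are copied from the question's date1 and date2 columns, and the names overall and per_row are made up for illustration.

```r
# Values copied from the question's date1 and date2 columns
date1 <- as.Date(c("2018-12-15", "2015-01-15", "2014-12-12", NA))
date2 <- as.Date(c("2018-12-15", NA, "2013-06-13", NA))

# min() pools both vectors and returns one overall minimum,
# which case_when() then recycles to every matching row
overall <- min(date1, date2, na.rm = TRUE)
print(overall)

# pmin() compares the vectors position by position,
# so each row gets its own minimum
per_row <- pmin(date1, date2, na.rm = TRUE)
print(per_row)
```

The single value returned by min() is the date2 entry of id = 3, which is why it showed up in the row for id = 2 as well.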
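You can use pmin to pick the earliest date across the given columns. pmin is vectorized, so it returns the element-wise (per-row) minimum. You can use the following code: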
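library(dplyr)

x %>%
  mutate(DN = case_when(
    s1 == 1 | s2 == 1 ~ pmin(date1, date2, na.rm = TRUE),
    s1 == 0 | s2 == 0 ~ date,
    # as.Date(NA) keeps every branch a Date; NA_real_ is a plain
    # double and may be rejected by case_when()'s type checks
    is.na(s1) & is.na(s2) ~ as.Date(NA)
  ))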
Output:
# A tibble: 4 × 7
     id date          s1 date1         s2 date2      DN
  <dbl> <date>     <dbl> <date>     <dbl> <date>     <date>
1     1 2020-12-15     0 2018-12-15     0 2018-12-15 2020-12-15
2     2 2016-01-01     1 2015-01-15     0 NA         2015-01-15
3     3 2016-01-01     1 2014-12-12     1 2013-06-13 2013-06-13
4     4 2018-01-01    NA NA            NA NA         NA
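For readers who want to cross-check the result without dplyr, the same logic can be sketched in base R. This is an assumption-laden illustration rather than part of the answer above: the data frame is rebuilt by hand from the printed dates, and the helper names any_one, any_zero and row_min are hypothetical.

```r
# Data rebuilt by hand from the values shown in the output above
x <- data.frame(
  id    = 1:4,
  date  = as.Date(c("2020-12-15", "2016-01-01", "2016-01-01", "2018-01-01")),
  s1    = c(0, 1, 1, NA),
  date1 = as.Date(c("2018-12-15", "2015-01-15", "2014-12-12", NA)),
  s2    = c(0, 0, 1, NA),
  date2 = as.Date(c("2018-12-15", NA, "2013-06-13", NA))
)

# %in% TRUE turns NA comparison results into FALSE
any_one  <- (x$s1 == 1 | x$s2 == 1) %in% TRUE
any_zero <- (x$s1 == 0 | x$s2 == 0) %in% TRUE

# Per-row minimum of the two candidate dates
row_min <- pmin(x$date1, x$date2, na.rm = TRUE)

# Start with NA dates, fill the lower-priority rule first,
# then let the "any flag equals 1" rule overwrite it,
# mirroring case_when()'s first-match-wins order
x$DN <- as.Date(NA)
x$DN[any_zero] <- x$date[any_zero]
x$DN[any_one]  <- row_min[any_one]

print(x)
```

The rules are applied in reverse priority so that the first case_when condition wins whenever both conditions hold, as for id = 2.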