How to skip the datediff calculation when the adjacent row has the same status
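I have a problem in R with dplyr. I calculate the date difference between consecutive rows and store it in a new column, but I want to skip the calculation when a row has the same status as the adjacent row of the same serial (in the example below, the row directly above it). In that case the result should be NA or 0.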
serial       status            date        days  days2
312313124    Good Stock        20/01/2021  0     0
312313124    Under Assessment  29/01/2021  9     9
312313124    In Repair         03/02/2021  4     4
312313124    Under Assessment  06/02/2021  3     3      <- correct, because another status sits between the two Under Assessment rows
70453423040  Under Assessment  18/03/2021  0     0
70453423040  In Repair         25/03/2021  7     0
70453423040  In Repair         28/03/2021  3     0      <- should be NA or 0, because within the same serial the status one line above is the same
12131231     Good Stock        03/04/2021  6
I tried regrouping my data, but it did not work. This is the code I used:
df2 <- df %>%
  distinct() %>%
  group_by(Serial) %>%
  mutate(Days = c(NA, as.numeric(diff(Exported), units='days'))) %>%
  ungroup() %>%
  group_by(Serial, Status, Date) %>%
  mutate(Days2 = if_else(row_number() > 1 , NA, Days)) %>%
  ungroup()
I also tried the following, but when there are duplicates it flags them even when the gap between the records is greater than 2.
df3 <- df %>%
  group_by(Serial, Status) %>%
  mutate(Days2 = +duplicated(paste(Serial, Status)))
You should be able to do this with lag / lead. Something like the following:
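df2 <- df %>%
  distinct() %>%
  group_by(Serial) %>%
  # look ahead to the next row (ordered by date) within each serial
  mutate(next_date = lead(date, 1, order_by = date),
         next_status = lead(status, 1, order_by = date)) %>%
  # assumes `date` is already a Date; gives 0 when the next row has the same status
  mutate(Days2 = ifelse(status != next_status, next_date - date, 0))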
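If the goal is to blank a row only when the row directly above it (same serial) has the same status, a lag-based variant may be closer to the expected output. The following is a minimal sketch under several assumptions: lowercase column names as in the sample table, dates stored as dd/mm/yyyy strings, and NA (rather than 0) as the "skipped" value. The sample rows are retyped from the question, and the first row of each serial comes out as NA because it has no previous row.

```r
library(dplyr)

# Sample rows retyped from the question; serial kept as character (assumption).
df <- data.frame(
  serial = c(rep("312313124", 4), rep("70453423040", 3), "12131231"),
  status = c("Good Stock", "Under Assessment", "In Repair", "Under Assessment",
             "Under Assessment", "In Repair", "In Repair", "Good Stock"),
  date   = c("20/01/2021", "29/01/2021", "03/02/2021", "06/02/2021",
             "18/03/2021", "25/03/2021", "28/03/2021", "03/04/2021"),
  stringsAsFactors = FALSE
)

result <- df %>%
  distinct() %>%
  # parse the dd/mm/yyyy strings so that subtraction yields days
  mutate(date = as.Date(date, format = "%d/%m/%Y")) %>%
  arrange(serial, date) %>%
  group_by(serial) %>%
  mutate(
    # days since the previous row of the same serial (NA for the first row)
    days  = as.numeric(date - lag(date)),
    # NA when the previous row has the same status, otherwise keep days
    days2 = if_else(status == lag(status), NA_real_, days)
  ) %>%
  ungroup()

print(result)
```

To get 0 instead of NA, wrapping the column as `coalesce(days2, 0)` should work. Note that `arrange()` reorders the serials, so the row order can differ from the input.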