Take differences between dates within id over a sequence (different rows) in R

Question

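For each id (cpf), I want to compute the difference in months between a row's hire_date and the previous row's sep_date. For example, I want the difference between the hire_date associated with Order 2 and the sep_date associated with Order 1 (and likewise for ids with more than two Order values).

Not every id has only two Order values; some have more. How can I write code that handles this? When an id has more than two rows, I need to compute more than one difference.

I always want the difference between a given hire_date (e.g. Order 2) and the previous sep_date (Order 1), and so on. For more than two rows: hire_date (Order 3) - sep_date (Order 2); hire_date (Order 2) - sep_date (Order 1) ...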

structure(list(cpf = c(234L, 234L, 245L, 245L, 245L, 555L, 555L
), hire_date = c("10-11-29", "13-7-29", "11-10-19", "13-3-20", 
"13-5-20", "10-02-18", "13-11-21"), sep_date = c("13-4-18", "13-8-29", 
"13-2-15", "13-4-20", NA, "13-10-20", NA), Order = c(1L, 2L, 
1L, 2L, 3L, 1L, 2L)), class = "data.frame", row.names = c(NA, 
-7L))

  cpf hire_date sep_date Order
1 234  10-11-29  13-4-18     1
2 234   13-7-29  13-8-29     2
3 245  11-10-19  13-2-15     1
4 245   13-3-20  13-4-20     2
5 245   13-5-20     <NA>     3
6 555  10-02-18 13-10-20     1
7 555  13-11-21     <NA>     2

Any help would be appreciated!

Answer 1

We can convert the date columns to Date class, then group by cpf and take the difftime between each hire_date and the lagged sep_date.

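library(dplyr)
library(lubridate)
df1 %>%
  # the sample dates are year-month-day with two-digit years, so parse with ymd
  mutate(across(hire_date:sep_date, ymd)) %>%
  # make sure lag() sees the rows in Order sequence within each cpf
  arrange(cpf, Order) %>%
  group_by(cpf) %>%
  # gap from the previous row's sep_date; weeks / 4 only approximates months
  mutate(Month = as.numeric(difftime(hire_date,
     lag(sep_date), units = "weeks"))/4) %>%
  ungroup()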
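
For comparison, here is a minimal base R sketch of the same lagged difference that needs no extra packages. It assumes the dates are in year-month-day order with two-digit years (which is what the sample data suggests). The column names gap_days and gap_months are illustrative, not from the question. The month figure divides the day count by an average month length of 365.25 / 12 days, so it is an approximation rather than an exact calendar-month count.

```r
# Sample data from the question
df1 <- data.frame(
  cpf = c(234L, 234L, 245L, 245L, 245L, 555L, 555L),
  hire_date = c("10-11-29", "13-7-29", "11-10-19", "13-3-20",
                "13-5-20", "10-02-18", "13-11-21"),
  sep_date = c("13-4-18", "13-8-29", "13-2-15", "13-4-20",
               NA, "13-10-20", NA),
  Order = c(1L, 2L, 1L, 2L, 3L, 1L, 2L)
)

# Parse the strings as year-month-day (assumption: %y means 20xx here)
df1$hire_date <- as.Date(df1$hire_date, format = "%y-%m-%d")
df1$sep_date  <- as.Date(df1$sep_date,  format = "%y-%m-%d")

# Sort so that "previous row" means the previous Order within each cpf
df1 <- df1[order(df1$cpf, df1$Order), ]

# Previous sep_date within each cpf, as days since 1970-01-01;
# the first row of every cpf has no previous row and gets NA
prev_sep <- ave(as.numeric(df1$sep_date), df1$cpf,
                FUN = function(x) c(NA, head(x, -1)))

# Gap in days, then in approximate months (hypothetical column names)
df1$gap_days   <- as.numeric(df1$hire_date) - prev_sep
df1$gap_months <- df1$gap_days / (365.25 / 12)

print(df1)
```

The first row of each cpf comes out as NA because there is no earlier sep_date to subtract. If exact calendar months are needed, the day count can be replaced by a calendar-aware calculation instead of the average month length.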


r

date

sequence

difference