Calculate difference in dates (in days) between group A and the row above it for each id

This is my df (a data.frame):

      id   group     date
[1]    1     B    2000-01-01
[2]    1     B    2001-02-11  
[3]    1     A    2001-04-06   
[4]    2     C    2000-02-01
[5]    2     A    2001-01-01
[6]    2     B    2004-11-12 
    ...

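The data.frame is already sorted by id and date. For each id, I want to calculate the difference in dates (in days) between the group A row and the row directly above it. In my data, every group A row has a row with the same id above it.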

The result I am interested in looks like this:

      id       days
[1]    1        54  
[2]    2       335
    ...

Please advise.

Thanks.

Here is an idea using dplyr:

library(dplyr)

#make sure "date" has the appropriate class
df$date <- as.POSIXct(df$date, format = '%Y-%m-%d')

df %>% 
 group_by(id) %>%
 # days since the previous row within each id (NA for the first row of an id)
 mutate(diff1 = round(as.numeric(difftime(date, lag(date), units = 'days')))) %>% 
 filter(group == 'A') %>%
 select(id, diff1)

#Source: local data frame [2 x 2]
#Groups: id [2]

#     id diff1
#  <int> <dbl>
#1     1    54
#2     2   335

Since the data is already sorted, you can do this:

# assumes date is of class Date, e.g. df$date <- as.Date(df$date)
df %>%
  group_by(id) %>%
  mutate(diff_days = difftime(date, lag(date), units = "days")) %>%
  filter(group == "A") %>%
  select(id, diff_days)

which gives:

     id diff_days
  <int>    <time>
1     1   54 days
2     2  335 days

We can use data.table:

library(data.table)
# shift() returns the previous date within each id; then keep only the group A rows
setDT(df)[, diff1 := round(as.numeric(difftime(date, shift(date),
    units = 'days'))), by = id][group == "A", .(id, diff1)]
#   id diff1
#1:  1    54
#2:  2   335
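
For comparison, the same idea can be sketched in base R without any packages. This is only a minimal sketch: the data frame below is rebuilt from the six rows shown in the question (the elided rows are not included), the column name days follows the desired output, and prev_date and result are names chosen here for illustration. It relies, like the answers above, on the data already being sorted by id and date.

```r
# Sample data copied from the six rows shown in the question
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  group = c("B", "B", "A", "C", "A", "B"),
  date  = as.Date(c("2000-01-01", "2001-02-11", "2001-04-06",
                    "2000-02-01", "2001-01-01", "2004-11-12"))
)

# Previous date within each id, as days since the epoch;
# NA for the first row of an id (assumption: df is sorted by id and date)
prev_date <- ave(as.numeric(df$date), df$id,
                 FUN = function(x) c(NA, head(x, -1)))

# Days between each row and the row above it
df$days <- as.numeric(df$date) - prev_date

# Keep only the group A rows and the two columns of interest
result <- df[df$group == "A", c("id", "days")]
rownames(result) <- NULL
result
```

For the rows shown in the question this should give 54 days for id 1 and 335 days for id 2, matching the desired output. Storing date as Date rather than POSIXct keeps the differences in whole days, so no rounding is needed.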