Calculate the difference in dates (in days) between each group A row and the row above it, for each id
Here is my df (data.frame):
id group date
[1] 1 B 2000-01-01
[2] 1 B 2001-02-11
[3] 1 A 2001-04-06
[4] 2 C 2000-02-01
[5] 2 A 2001-01-01
[6] 2 B 2004-11-12
...
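The data.frame is already sorted by id and date.
For each id, I want to calculate the difference in dates (in days) between the group A row and the row directly above it. In my data, every group A row has a row with the same id above it.
The result I am interested in looks like this: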
id days
[1] 1 54
[2] 2 335
...
Please advise.
Thanks.
Here is an idea using dplyr:
library(dplyr)
#make sure "date" has the appropriate class
df$date <- as.POSIXct(df$date, format = '%Y-%m-%d')
df %>%
group_by(id) %>%
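# diff() on date-times returns a difftime; convert it to whole days, NA for the first row of each id
mutate(diff1 = c(NA, round(as.numeric(diff(date), units = 'days')))) %>%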
filter(group == 'A') %>%
select(id, diff1)
#Source: local data frame [2 x 2]
#Groups: id [2]
# id diff1
# <int> <dbl>
#1 1 54
#2 2 335
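For comparison, the same per-id lag difference can be sketched in base R. This is only a minimal sketch: it rebuilds the six sample rows from the question by hand and assumes `date` is stored as `Date` rather than `POSIXct`, so the differences are whole days and no rounding is needed. The column name `days` and the object `result` are illustrative.

```r
# Sample rows copied from the question; storing `date` as Date is an assumption
df <- data.frame(
  id    = c(1L, 1L, 1L, 2L, 2L, 2L),
  group = c("B", "B", "A", "C", "A", "B"),
  date  = as.Date(c("2000-01-01", "2001-02-11", "2001-04-06",
                    "2000-02-01", "2001-01-01", "2004-11-12"))
)

# Days since the previous row within each id (NA for the first row of an id).
# as.numeric() on a Date gives days since 1970-01-01, so diff() is in days.
# ave() applies the function per id and keeps the original row order.
df$days <- ave(as.numeric(df$date), df$id,
               FUN = function(x) c(NA, diff(x)))

# Keep only the group A rows
result <- df[df$group == "A", c("id", "days")]
print(result)
```

For the sample rows this gives 54 days for id 1 and 335 days for id 2, matching the output requested in the question.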
Since it is already sorted, you can do this:
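library(dplyr)
df %>%
group_by(id) %>%
# difference from the previous row within the same id, always reported in days
mutate(diff_days = difftime(date, lag(date), units = "days")) %>%
filter(group == "A") %>%
select(id, diff_days)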
which gives:
id diff_days
<int> <time>
1 1 54 days
2 2 335 days
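One detail worth checking with `difftime()` is the unit of the result. When `units` is left at its default of `"auto"`, R picks the unit from the smallest absolute difference, so data containing gaps shorter than a day would not be reported in days. The small sketch below illustrates this with made-up timestamps (they are not from the question) in UTC.

```r
# Hypothetical timestamps, chosen only to show the unit selection
d <- as.POSIXct(c("2001-01-01 00:00:00",
                  "2001-01-01 06:00:00",
                  "2001-04-06 00:00:00"), tz = "UTC")

auto_gap  <- difftime(d[2], d[1])                   # default units = "auto"
fixed_gap <- difftime(d[2], d[1], units = "days")   # units pinned to days
long_gap  <- difftime(d[3], d[1], units = "days")

print(units(auto_gap))      # "hours", because the gap is shorter than one day
print(as.numeric(fixed_gap))  # 0.25
print(as.numeric(long_gap))   # 95
```

Passing `units = "days"` explicitly, or wrapping the result in `as.numeric(x, units = "days")`, keeps the column in days regardless of the data.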
We can use data.table:
library(data.table)
# per-id difference to the previous row in whole days, then keep only the group A rows
setDT(df)[, diff1 := c(NA, round(as.numeric(diff(date), units = 'days'))),
by = id][group == "A", .(id, diff1)]
# id diff1
#1: 1 54
#2: 2 335
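A variant of the data.table approach uses `shift()`, which returns the previous value within each group and so avoids padding with `NA` by hand. This is a sketch under a few assumptions: the sample rows from the question are rebuilt manually, `date` is stored as `Date`, and the names `dt` and `res` are illustrative.

```r
library(data.table)

# Sample rows copied from the question; storing `date` as Date is an assumption
dt <- data.table(
  id    = c(1L, 1L, 1L, 2L, 2L, 2L),
  group = c("B", "B", "A", "C", "A", "B"),
  date  = as.Date(c("2000-01-01", "2001-02-11", "2001-04-06",
                    "2000-02-01", "2001-01-01", "2004-11-12"))
)

# shift() lags `date` by one row within each id (NA for the first row);
# subtracting two Date vectors gives a difftime in days
dt[, diff1 := as.numeric(date - shift(date)), by = id]

# Keep only the group A rows and the two columns of interest
res <- dt[group == "A", .(id, diff1)]
print(res)
```

As with the other answers, this relies on the rows already being ordered by id and date.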