Finding the difference between dates in long format
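I have a dataset in long format with start and end dates. Each ID can have multiple start and end dates.
Within each ID, I need the difference between one row's end date and the next row's start date (for example, the first end date and the second start date). I am not sure how to use two rows to calculate the difference. Any help is appreciated.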
df <- data.frame(
  id    = c(1, 2, 2, 2, 3, 4, 4),
  start = as.Date(c("2010-10-01", "2009-09-01", "2014-01-01", "2014-02-01",
                    "2009-01-01", "2013-03-01", "2014-03-01")),
  end   = as.Date(c("2016-04-30", "2013-12-31", "2014-01-31", "2016-04-30",
                    "2014-02-28", "2013-05-01", "2014-08-31"))
)
My desired output looks like this:
df$diff=c(NA,1,1,NA,NA,304, NA)
Here is a base R attempt that I think does what you want:
df$diff <- NA
split(df$diff, df$id) <- by(df, df$id, FUN=function(SD) c(SD$start[-1], NA) - SD$end)
df
# id start end diff
#1 1 2010-10-01 2016-04-30 NA
#2 2 2009-09-01 2013-12-31 1
#3 2 2014-01-01 2014-01-31 1
#4 2 2014-02-01 2016-04-30 NA
#5 3 2009-01-01 2014-02-28 NA
#6 4 2013-03-01 2013-05-01 304
#7 4 2014-03-01 2014-08-31 NA
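The same lead-and-subtract idea can also be sketched with `ave()`, which avoids the `split<-` assignment. This is only an illustrative variant, not the method used above: the names `next_start` and `gap_days` are made up here, and it assumes the rows are already ordered by start date within each id.

```r
# Rebuild the example data from the question.
df <- data.frame(
  id    = c(1, 2, 2, 2, 3, 4, 4),
  start = as.Date(c("2010-10-01", "2009-09-01", "2014-01-01", "2014-02-01",
                    "2009-01-01", "2013-03-01", "2014-03-01")),
  end   = as.Date(c("2016-04-30", "2013-12-31", "2014-01-31", "2016-04-30",
                    "2014-02-28", "2013-05-01", "2014-08-31"))
)

# Within each id, shift the start dates up by one row (a "lead").
# Working on the underlying day counts keeps ave() on plain numbers.
next_start <- ave(as.numeric(df$start), df$id,
                  FUN = function(x) c(x[-1], NA))

# Gap in days between this row's end and the next row's start;
# the last row of each id has no next row, so it stays NA.
df$gap_days <- next_start - as.numeric(df$end)
```

Because the result is a plain numeric column, no `as.numeric` conversion is needed afterwards.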
Or, in data.table, it would be:
library(data.table)
setDT(df)[, diff := shift(start, n = 1, type = "lead") - end, by = id]
Here is an alternative using the popular dplyr package:
library(dplyr)
df %>%
group_by(id) %>%
mutate(diff = difftime(lead(start), end, units = "days"))
# id start end diff
# (dbl) (date) (date) (dfft)
# 1 1 2010-10-01 2016-04-30 NA days
# 2 2 2009-09-01 2013-12-31 1 days
# 3 2 2014-01-01 2014-01-31 1 days
# 4 2 2014-02-01 2016-04-30 NA days
# 5 3 2009-01-01 2014-02-28 NA days
# 6 4 2013-03-01 2013-05-01 304 days
# 7 4 2014-03-01 2014-08-31 NA days
If needed, you can wrap diff in as.numeric to get a plain number instead of a difftime.
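As a small sketch of that conversion in base R (the variable names `gap`, `gap_days`, and `gap_weeks` are illustrative only), `as.numeric` accepts a `units` argument for difftime values, which makes the intended unit explicit:

```r
# A difftime value like the ones in the diff column above.
gap <- difftime(as.Date("2014-03-01"), as.Date("2013-05-01"), units = "days")

# as.numeric() drops the difftime class and keeps the value;
# passing units states which unit the number should be expressed in.
gap_days  <- as.numeric(gap, units = "days")
gap_weeks <- as.numeric(gap, units = "weeks")
```

Stating the unit is a defensive choice: a difftime created with automatic units may be stored in hours or minutes, and a bare `as.numeric` would then return the value in whatever unit was picked.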
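Using base R again, you can get the number of days between start and end within each row. Note that this is the length of each interval, not the gap to the next row's start: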
df$noofdays <- as.numeric(as.difftime(df$end-df$start, units=c("days"), format="%Y-%m-%d"))