Create single year variables corresponding to a range of dates in R
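I would like to take two variables that represent a start date and an end date, and create variables indicating which years fall within the range of those two dates.
What I have: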
df1 <- data.frame(ID = c("A", "B", "C"),
                  Start_Date = c("3/5/2004", "8/22/2005", "4/8/2008"),
                  End_Date = c("6/25/2009", "11/2/2006", "6/9/2011"))
What I want:
df2 <- data.frame(ID = c("A", "B", "C"),
                  Start_Date = c("3/5/2004", "8/22/2005", "4/8/2008"),
                  End_Date = c("6/25/2009", "11/2/2006", "6/9/2011"),
                  y2004 = c(1, 0, 0),
                  y2005 = c(1, 1, 0),
                  y2006 = c(1, 1, 0),
                  y2007 = c(1, 0, 0),
                  y2008 = c(1, 0, 1),
                  y2009 = c(1, 0, 1),
                  y2010 = c(0, 0, 1),
                  y2011 = c(0, 0, 1))
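As shown above, each new year variable indicates whether that year falls within the range defined by the two date variables "Start_Date" and "End_Date".
Any ideas would be greatly appreciated. Thanks in advance.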
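One approach is to reshape to 'long' format, convert the values to Date class, extract the year part, build the sequence (:) from the first year to the last grouped by 'ID', reshape back to 'wide', and then join with the original data by 'ID'.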
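library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)
df1 %>%
  # long format: one row per ID and date column (Start_Date, then End_Date)
  pivot_longer(cols = -ID) %>%
  group_by(ID) %>%
  # one row per year from the start year to the end year;
  # dplyr >= 1.1.0 prefers reframe() over summarise() for multi-row results
  summarise(year = str_c('y', year(mdy(value)[1]):year(mdy(value)[2])),
            n = 1, .groups = 'drop') %>%
  # back to wide: one indicator column per year, 0 where the year is absent
  pivot_wider(names_from = year, values_from = n, values_fill = 0) %>%
  # attach the indicators to the original columns
  left_join(df1, ., by = 'ID')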
-output
#  ID Start_Date  End_Date y2004 y2005 y2006 y2007 y2008 y2009 y2010 y2011
#1  A   3/5/2004 6/25/2009     1     1     1     1     1     1     0     0
#2  B  8/22/2005 11/2/2006     0     1     1     0     0     0     0     0
#3  C   4/8/2008  6/9/2011     0     0     0     0     1     1     1     1
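For comparison, the same indicators can probably be built in base R without reshaping. This is only a sketch under the assumption that every date is a month/day/year string as in the question. The names start_year, end_year, all_years, flags and result are made up for illustration and are not part of the answer above.

```r
# Sample data from the question
df1 <- data.frame(ID = c("A", "B", "C"),
                  Start_Date = c("3/5/2004", "8/22/2005", "4/8/2008"),
                  End_Date = c("6/25/2009", "11/2/2006", "6/9/2011"))

# Parse the month/day/year strings and keep only the year as an integer
# (assumes every value follows the %m/%d/%Y layout)
start_year <- as.integer(format(as.Date(df1$Start_Date, format = "%m/%d/%Y"), "%Y"))
end_year   <- as.integer(format(as.Date(df1$End_Date,   format = "%m/%d/%Y"), "%Y"))

# Every calendar year between the earliest start and the latest end
all_years <- min(start_year):max(end_year)

# Rows are IDs, columns are years: TRUE where start_year <= year <= end_year
in_range <- outer(start_year, all_years, "<=") & outer(end_year, all_years, ">=")

# Turn the logical matrix into 0/1 integers and name the columns y2004, y2005, ...
flags <- in_range * 1L
colnames(flags) <- paste0("y", all_years)

# Bind the indicator columns to the original data
result <- cbind(df1, flags)
result
```

One difference worth noting: this sketch creates a column for every year between the overall minimum and maximum, so a year covered by no row would still appear as an all-zero column, whereas the pivot_wider approach only creates columns for years that occur in at least one range.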