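Expand dataframe with sequential dates based on a column of dates in R
I would like to expand my data frame based on my date column so that new rows are added for the dates that fall between my existing dates, in chronological order. My date column is already in chronological order, spans 5 years, and contains duplicate dates, which I want to ignore. For the new rows, I want the corresponding Group and Draw values to be NA.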
zz <- "Date Group Draw
1 2006-05-11 bb T
2 2006-05-11 bb F
3 2006-05-14 aa T
4 2006-05-16 aa T
5 2006-05-20 cc F
6 2006-05-20 bb F
7 2006-05-21 aa T"
Data <- read.table(text=zz, header = TRUE)
So I would like my new data frame to look like this:
xx <- "Date Group Draw
1 2006-05-11 bb T
2 2006-05-11 bb F
3 2006-05-12 NA NA
4 2006-05-13 NA NA
5 2006-05-14 aa T
6 2006-05-15 NA NA
7 2006-05-16 aa T
8 2006-05-17 NA NA
9 2006-05-18 NA NA
10 2006-05-19 NA NA
11 2006-05-20 cc F
12 2006-05-20 bb F
13 2006-05-21 aa T"
Output <- read.table(text=xx, header = TRUE)
Any help would be greatly appreciated. I am new to R and have been trying to do this manually.
I think this should work:
merge(
  x = data.frame(
    Date = seq.Date(min(df$Date), max(df$Date), by = "day")
  ),
  y = df,
  all.x = TRUE
)
# Date Group Draw
# 1 2006-05-11 bb TRUE
# 2 2006-05-11 bb FALSE
# 3 2006-05-12 <NA> NA
# 4 2006-05-13 <NA> NA
# 5 2006-05-14 aa TRUE
# 6 2006-05-15 <NA> NA
# 7 2006-05-16 aa TRUE
# 8 2006-05-17 <NA> NA
# 9 2006-05-18 <NA> NA
# 10 2006-05-19 <NA> NA
# 11 2006-05-20 cc FALSE
# 12 2006-05-20 bb FALSE
# 13 2006-05-21 aa TRUE
All this does is create a sequence of dates spanning the range of the actual data and then perform a left join.
The same idea, using data.table:
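One detail worth spelling out: the `Data` object in the question comes straight from `read.table`, which leaves `Date` as text rather than a `Date` object, so it has to be converted before `seq.Date` will accept it. A minimal base R sketch of the two steps, assuming the sample data from the question (the names `all_days` and `expanded` are made up for illustration):

```r
# Sample data from the question; the first field on each row is a row name.
zz <- "Date Group Draw
1 2006-05-11 bb T
2 2006-05-11 bb F
3 2006-05-14 aa T
4 2006-05-16 aa T
5 2006-05-20 cc F
6 2006-05-20 bb F
7 2006-05-21 aa T"
Data <- read.table(text = zz, header = TRUE, stringsAsFactors = FALSE)

# read.table leaves Date as text; convert it so seq.Date() can use it.
Data$Date <- as.Date(Data$Date)

# Step 1: one row per calendar day across the observed range.
all_days <- data.frame(
  Date = seq.Date(min(Data$Date), max(Data$Date), by = "day")
)

# Step 2: left join; days with no match get NA in Group and Draw,
# and days that appear twice in Data keep both of their rows.
expanded <- merge(all_days, Data, by = "Date", all.x = TRUE)

nrow(expanded)              # 13
sum(is.na(expanded$Group))  # 6
```

The 13 rows are the 11 calendar days plus one extra row for each of the two duplicated dates, and the 6 `NA` rows are the days that had no observation.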
dt[dt[,.(Date = seq.Date(min(Date), max(Date), by = "day"))], on = .(Date)]
# Date Group Draw
# 1: 2006-05-11 bb TRUE
# 2: 2006-05-11 bb FALSE
# 3: 2006-05-12 NA NA
# 4: 2006-05-13 NA NA
# 5: 2006-05-14 aa TRUE
# 6: 2006-05-15 NA NA
# 7: 2006-05-16 aa TRUE
# 8: 2006-05-17 NA NA
# 9: 2006-05-18 NA NA
# 10: 2006-05-19 NA NA
# 11: 2006-05-20 cc FALSE
# 12: 2006-05-20 bb FALSE
# 13: 2006-05-21 aa TRUE
Data used for `df` and `dt` above:
zz <- "Date Group Draw
1 2006-05-11 bb T
2 2006-05-11 bb F
3 2006-05-14 aa T
4 2006-05-16 aa T
5 2006-05-20 cc F
6 2006-05-20 bb F
7 2006-05-21 aa T"
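df <- read.table(
  text = zz,
  header = TRUE
)
# Convert the text column to Date so that seq.Date() and the join work.
df$Date <- as.Date(df$Date)
library(data.table)
# The same data as a data.table, with Date converted by reference.
dt <- data.table(read.table(text = zz, header = TRUE))[, Date := as.Date(Date)]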
I may not have understood your question correctly, but here is my rough take:
# Build the full run of days, then format them as "YYYY-MM-DD" strings.
date <- format(
  seq.Date(from = as.Date("2006-05-11"), to = as.Date("2006-05-21"), by = "day"),
  "%Y-%m-%d"
)
The above generates the list of dates. You can then left join `date` to your data.table.
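The join described above could look like the following sketch. It uses base R's `merge` rather than a data.table join, and it assumes the `Date` column is still text as returned by `read.table`, which matches the character strings that `format` produces. The names `dates` and `joined` are illustrative only.

```r
# Sample data from the question, with Date left as text.
zz <- "Date Group Draw
1 2006-05-11 bb T
2 2006-05-11 bb F
3 2006-05-14 aa T
4 2006-05-16 aa T
5 2006-05-20 cc F
6 2006-05-20 bb F
7 2006-05-21 aa T"
Data <- read.table(text = zz, header = TRUE, stringsAsFactors = FALSE)

# Character dates, one per day, in the same "YYYY-MM-DD" form as Data$Date.
dates <- format(
  seq.Date(as.Date("2006-05-11"), as.Date("2006-05-21"), by = "day"),
  "%Y-%m-%d"
)

# Left join on the text key; ISO-formatted dates sort chronologically as strings.
joined <- merge(
  data.frame(Date = dates, stringsAsFactors = FALSE),
  Data,
  by = "Date",
  all.x = TRUE
)

# Convert afterwards if Date arithmetic is needed later on.
joined$Date <- as.Date(joined$Date)
```

Both sides of the join need the same key type: if `Date` has already been converted with `as.Date`, skip the `format` step and join on `Date` objects instead.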
Using the data from @nrussell's post, another option is `complete` from tidyr:
library(tidyr)
complete(df, Date = full_seq(Date, 1))
## A tibble: 13 × 3
# Date Group Draw
# <date> <fctr> <lgl>
#1 2006-05-11 bb TRUE
#2 2006-05-11 bb FALSE
#3 2006-05-12 NA NA
#4 2006-05-13 NA NA
#5 2006-05-14 aa TRUE
#6 2006-05-15 NA NA
#7 2006-05-16 aa TRUE
#8 2006-05-17 NA NA
#9 2006-05-18 NA NA
#10 2006-05-19 NA NA
#11 2006-05-20 cc FALSE
#12 2006-05-20 bb FALSE
#13 2006-05-21 aa TRUE