Insert and fill rows of missing dates with NA in a list of data frames in R
I currently have multiple data frames in a list, in the following format:
datetime precip code
1 2015-04-15 00:00:00 NA M
2 2015-04-15 01:00:00 NA M
3 2015-04-15 02:00:00 NA M
4 2015-04-15 03:00:00 NA M
5 2015-04-15 04:00:00 NA M
6 2015-04-15 05:00:00 NA M
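Each data frame has a different start and end date, but I want every data frame to run from 2015-04-01 0:00:00 to 2015-11-30 23:59:59. I would like to generate rows for the missing dates in datetime in each data frame and fill the precip column with NA, so that each data frame holds a continuous time series with nrow=5856.
Ignore the code column. If precip already has values, leave them unchanged and fill only the added datetime rows with NA.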
My attempt so far produces an error:
library(dplyr)
dates <- seq.POSIXt(as.POSIXlt("2015-04-01 0:00:00"), as.POSIXlt("2015-11-30 23:59:59"), by="hour",tz="GMT")
ts <- format.POSIXct(dates,"%Y/%m/%d %H:%M")
df <- data.frame(datetime=ts)
dat=mylist
final_list <- lapply(dat, function(x) full_join(df,dat$precip))
Error in UseMethod("tbl_vars") :
no applicable method for 'tbl_vars' applied to an object of class "c('double', 'numeric')"
link to sample file in case it is needed
Thanks for any suggestions.
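As vitor pointed out above, you can only join two data.frames, not a data.frame and a vector. dplyr also plays nicely with POSIXct but not with POSIXlt (Hadley has a preference), so if you store your data as actual times it will be easier to join on and more useful.
Also, inside lapply you need to use the variable of the function you create (here x); otherwise you will just repeat the same thing for every element. If you want to join data.frames, don't subset them either; you need a column in each with the same name and data type.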
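To illustrate the data-type point: the question's format.POSIXct call turns the timestamps into character strings, which will not match a POSIXct datetime column in a join. A minimal base R sketch of converting such strings back into actual times; the vector ts here is a made-up two-element example and parsed is a hypothetical name:

```r
# Hypothetical example: timestamps stored as formatted strings,
# in the same "%Y/%m/%d %H:%M" layout the question produces
ts <- c("2015/04/15 00:00", "2015/04/15 01:00")

# Parse the strings back into POSIXct so they share a type with
# the datetime column of the data frames to be joined
parsed <- as.POSIXct(ts, format = "%Y/%m/%d %H:%M", tz = "GMT")

class(ts)      # character
class(parsed)  # POSIXct, POSIXt
```

With both sides stored as POSIXct in the same time zone, the join keys can be compared directly.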
All together, you need something like this:
library(dplyr)
df$datetime <- as.POSIXct(df$datetime, tz = "GMT")
df <- as_tibble(df)    # not necessary, but prints nicely (replaces the deprecated tbl_df())
list_df <- list(df, df)    # fake list of data.frames
# make a data.frame of the hourly sequence to join on (tibble() replaces the deprecated data_frame())
seq_df <- tibble(datetime = seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = 'GMT'),
                                       as.POSIXct("2015-11-30 23:59:59", tz = 'GMT'),
                                       by="hour",tz="GMT"))
# use the function argument x, so each data.frame in the list is joined in turn
lapply(list_df, function(x){full_join(x, seq_df)})
# Joining by: "datetime"
# Joining by: "datetime"
# [[1]]
# Source: local data frame [5,857 x 3]
#
# datetime precip code
# (POSI) (lgl) (fctr)
# 1 2015-04-15 00:00:00 NA M
# 2 2015-04-15 01:00:00 NA M
# 3 2015-04-15 02:00:00 NA M
# 4 2015-04-15 03:00:00 NA M
# 5 2015-04-15 04:00:00 NA M
# 6 2015-04-15 05:00:00 NA M
# 7 2015-04-01 04:00:00 NA NA
# 8 2015-04-01 05:00:00 NA NA
# 9 2015-04-01 06:00:00 NA NA
# 10 2015-04-01 07:00:00 NA NA
# .. ... ... ...
#
# [[2]]
# Source: local data frame [5,857 x 3]
#
# datetime precip code
# (POSI) (lgl) (fctr)
# 1 2015-04-15 00:00:00 NA M
# 2 2015-04-15 01:00:00 NA M
# 3 2015-04-15 02:00:00 NA M
# 4 2015-04-15 03:00:00 NA M
# 5 2015-04-15 04:00:00 NA M
# 6 2015-04-15 05:00:00 NA M
# 7 2015-04-01 04:00:00 NA NA
# 8 2015-04-01 05:00:00 NA NA
# 9 2015-04-01 06:00:00 NA NA
# 10 2015-04-01 07:00:00 NA NA
# .. ... ... ...
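In the output above, the rows added from the sequence appear after the original rows, so the result is not in chronological order. As a variation (not part of the answer's code), one could start from the hourly sequence and left_join each data frame onto it, which keeps exactly one row per hour of the window. The sketch below is a minimal example under stated assumptions: the sample df is rebuilt by hand, final_list borrows its name from the question, and any rows of x that fall outside the window would be dropped by this approach.

```r
library(dplyr)

# Hypothetical sample input: six hourly rows, as in the question
df <- tibble(
  datetime = seq(as.POSIXct("2015-04-15 00:00:00", tz = "GMT"),
                 by = "hour", length.out = 6),
  precip   = NA_real_,
  code     = factor("M")
)
list_df <- list(df, df)  # stand-in for the real list of data frames

# One row per hour from 2015-04-01 00:00 through 2015-11-30 23:00 (GMT)
seq_df <- tibble(
  datetime = seq(as.POSIXct("2015-04-01 00:00:00", tz = "GMT"),
                 as.POSIXct("2015-11-30 23:00:00", tz = "GMT"),
                 by = "hour")
)

# Keep every hour of the window; existing precip values are carried over,
# hours with no match get NA in precip and code
final_list <- lapply(list_df, function(x) {
  seq_df %>%
    left_join(x, by = "datetime") %>%
    arrange(datetime)  # safeguard: keep rows in chronological order
})

sapply(final_list, nrow)
```

Passing by = "datetime" explicitly also suppresses the "Joining by" message.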
Data:
df <- structure(list(datetime = structure(c(1429056000, 1429059600, 1429063200, 1429066800,
1429070400, 1429074000), class = c("POSIXct", "POSIXt"), tzone = "GMT"), precip = c(NA,
NA, NA, NA, NA, NA), code = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "M",
class = "factor")), .Names = c("datetime", "precip", "code"), row.names = c("1",
"2", "3", "4", "5", "6"), class = c("tbl_df", "tbl", "data.frame"))