R: how to keep legitimate NAs in a merged zoo object

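I have multiple time series objects recorded at regular five-minute intervals, but they can have different start and end times. They may also log at different offsets, not necessarily on minutes 5, 10, 15, and so on.

I want to merge these objects while keeping the legitimate NAs intact. For example, if one object starts recording later, the NAs at its beginning are legitimate. If an object stops recording earlier, the NAs at its end are legitimate.

However, na.locf has no option to keep both the leading and the trailing NAs intact.

Here is an example of my problem: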

library(zoo)

lines1="Index,x1
2014-01-01 00:00:00,73.06
2014-01-01 00:05:00,73.11
2014-01-01 00:10:00,73.16
2014-01-01 00:15:00,73.22"

lines2="Index,x2
2014-01-01 00:11:00,71.11
2014-01-01 00:16:00,70.12
2014-01-01 00:21:00,70.16
2014-01-01 00:26:00,70.19
2014-01-01 00:31:00,69.16"

lines3="Index,x3
2014-01-01 00:23:00,0
2014-01-01 00:28:00,1
2014-01-01 00:33:00,1
2014-01-01 00:38:00,0
2014-01-01 00:43:00,0"

df1=read.table(text = lines1, header = TRUE, sep = ",")
df2=read.table(text = lines2, header = TRUE, sep = ",")
df3=read.table(text = lines3, header = TRUE, sep = ",")

z1 = zoo(df1$x1, as.POSIXct(df1$Index))
z2 = zoo(df2$x2, as.POSIXct(df2$Index))
z3 = zoo(df3$x3, as.POSIXct(df3$Index))

z = merge(z1,z2,z3)
z

z.na.locf = na.locf(z)
z.na.locf

timesteps = seq(as.POSIXct("2014-01-01 00:00:00"), 
                as.POSIXct("2014-01-01 01:00:00"),
                by = "5 min")

z.timesteps = na.locf(z, xout=timesteps)
z.timesteps

The merged object looks like this:

> z
                       z1    z2 z3
2014-01-01 00:00:00 73.06    NA NA
2014-01-01 00:05:00 73.11    NA NA
2014-01-01 00:10:00 73.16    NA NA
2014-01-01 00:11:00    NA 71.11 NA
2014-01-01 00:15:00 73.22    NA NA
2014-01-01 00:16:00    NA 70.12 NA
2014-01-01 00:21:00    NA 70.16 NA
2014-01-01 00:23:00    NA    NA  0
2014-01-01 00:26:00    NA 70.19 NA
2014-01-01 00:28:00    NA    NA  1
2014-01-01 00:31:00    NA 69.16 NA
2014-01-01 00:33:00    NA    NA  1
2014-01-01 00:38:00    NA    NA  0
2014-01-01 00:43:00    NA    NA  0

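Note that the NAs at the end of z1 are legitimate, as are those at the beginning of z3 and at both the beginning and the end of z2. The NAs that need to be replaced are the ones in the middle of the data. The problem is that when I try to fill the missing values in the middle of the data, the legitimate NAs disappear as well: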

> z.na.locf
                       z1    z2 z3
2014-01-01 00:00:00 73.06    NA NA
2014-01-01 00:05:00 73.11    NA NA
2014-01-01 00:10:00 73.16    NA NA
2014-01-01 00:11:00 73.16 71.11 NA
2014-01-01 00:15:00 73.22 71.11 NA
2014-01-01 00:16:00 73.22 70.12 NA
2014-01-01 00:21:00 73.22 70.16 NA
2014-01-01 00:23:00 73.22 70.16  0
2014-01-01 00:26:00 73.22 70.19  0
2014-01-01 00:28:00 73.22 70.19  1
2014-01-01 00:31:00 73.22 69.16  1
2014-01-01 00:33:00 73.22 69.16  1
2014-01-01 00:38:00 73.22 69.16  0
2014-01-01 00:43:00 73.22 69.16  0

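Note that for z1 and z2 the legitimate trailing NAs are gone.

Furthermore, if I resample the data onto a common regular timestamp grid, the NAs at both the beginning and the end disappear: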

> z.timesteps
                       z1    z2 z3
2014-01-01 00:00:00 73.06 71.11  0
2014-01-01 00:05:00 73.11 71.11  0
2014-01-01 00:10:00 73.16 71.11  0
2014-01-01 00:15:00 73.22 71.11  0
2014-01-01 00:20:00 73.22 70.12  0
2014-01-01 00:25:00 73.22 70.16  0
2014-01-01 00:30:00 73.22 70.19  1
2014-01-01 00:35:00 73.22 69.16  1
2014-01-01 00:40:00 73.22 69.16  0
2014-01-01 00:45:00 73.22 69.16  0
2014-01-01 00:50:00 73.22 69.16  0
2014-01-01 00:55:00 73.22 69.16  0
2014-01-01 01:00:00 73.22 69.16  0

Is there any way to achieve what I want? Thanks for your help.

na.fill can help. The following line of code preserves the runs of NAs at the start and end of each series but fills the remaining NAs using na.locf:

zz <- na.locf(z, na.rm = FALSE) + 0 * na.fill(z, fill = c(NA, 0, NA))

giving:

> zz
                       z1    z2 z3
2014-01-01 00:00:00 73.06    NA NA
2014-01-01 00:05:00 73.11    NA NA
2014-01-01 00:10:00 73.16    NA NA
2014-01-01 00:11:00 73.16 71.11 NA
2014-01-01 00:15:00 73.22 71.11 NA
2014-01-01 00:16:00    NA 70.12 NA
2014-01-01 00:21:00    NA 70.16 NA
2014-01-01 00:23:00    NA 70.16  0
2014-01-01 00:26:00    NA 70.19  0
2014-01-01 00:28:00    NA 70.19  1
2014-01-01 00:31:00    NA 69.16  1
2014-01-01 00:33:00    NA    NA  1
2014-01-01 00:38:00    NA    NA  0
2014-01-01 00:43:00    NA    NA  0
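
To see why the one-liner works, it may help to split it into steps on a toy series. This is only an illustrative sketch: the series x and the names filled, mask and result are made up for the example and do not appear in the question.

```r
library(zoo)

# Toy series (hypothetical): one leading NA, one interior NA, one trailing NA.
x <- zoo(c(NA, 1, NA, 3, NA), 1:5)

# Step 1: carry the last observation forward. With na.rm = FALSE the leading
# NA is kept, but the trailing NA is filled as well.
filled <- na.locf(x, na.rm = FALSE)

# Step 2: build a mask that is NA on the leading and trailing runs and 0
# everywhere else. fill = c(NA, 0, NA) means (left, interior, right).
mask <- 0 * na.fill(x, fill = c(NA, 0, NA))

# Step 3: adding the mask leaves filled values unchanged (+ 0) and turns the
# leading and trailing positions back into NA (+ NA).
result <- filled + mask
result
```

Only the interior NA at index 3 ends up filled; the NAs at indices 1 and 5 survive because anything plus NA is NA.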

Note 1: We could reduce the read.table / zoo lines to three lines of the form:

z1 <- read.zoo(text = lines1, header = TRUE, sep = ",", tz = "")

Note 2: Perhaps what you want to do next is:

timesteps <- seq(start(zz), start(zz) + 3600, by = "5 min")
m <- merge(zz, zoo(, timesteps))
m.na <- na.locf(m, na.rm = FALSE) + 0 * na.fill(m, fill = c(NA, 0, NA))
window(m.na, timesteps)
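
Since the same expression is applied twice (once to z and once to m), it could be wrapped in a small helper. The function name locf_interior and the toy series a and b below are hypothetical, chosen only for illustration; the body is the same expression as above. This sketch assumes every column has at least one non-NA value.

```r
library(zoo)

# Hypothetical helper: fill interior NAs by LOCF, keep leading/trailing NA runs.
locf_interior <- function(x) {
  na.locf(x, na.rm = FALSE) + 0 * na.fill(x, fill = c(NA, 0, NA))
}

# Toy two-column example (made-up data).
a <- zoo(c(1, NA, 2, NA), 1:4)   # interior NA at 2, trailing NA at 4
b <- zoo(c(NA, 5, NA, 6), 1:4)   # leading NA at 1, interior NA at 3
m <- merge(a, b)

res <- locf_interior(m)
res
```

With such a helper, the two steps in the answer would read zz <- locf_interior(z) and m.na <- locf_interior(m).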