R: how to keep legitimate NAs in a merged zoo object
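I have multiple time series objects at regular five-minute intervals, but they can have different start and end times. They can also be recorded at different offsets, not necessarily on minute 5, 10, 15, etc.
I want to merge these objects but keep the legitimate NAs intact. For example, if an object starts recording later, its leading NAs are legitimate; if an object stops recording earlier, its trailing NAs are legitimate.
However, na.locf has no option that keeps both the leading and the trailing NAs intact.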
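To see the limitation on something small, here is a minimal sketch on a made-up toy series `x` (not from the question): with `na.rm = FALSE`, `na.locf` keeps the leading NA but fills the trailing one, while `fromLast = TRUE` does the opposite and also fills the interior gap with the *next* value instead of the previous one.

```r
library(zoo)

# Made-up toy series: one leading, one interior and one trailing NA
x <- zoo(c(NA, 1, NA, 3, NA))

# Carry forward: the leading NA survives (nothing to carry), the trailing NA is filled
fwd <- na.locf(x, na.rm = FALSE)
print(coredata(fwd))

# Carry backward: the trailing NA survives, the leading NA is filled,
# and the interior NA takes the following value rather than the preceding one
bwd <- na.locf(x, na.rm = FALSE, fromLast = TRUE)
print(coredata(bwd))
```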
Here is an example of my problem:
library(zoo)

lines1="Index,x1
2014-01-01 00:00:00,73.06
2014-01-01 00:05:00,73.11
2014-01-01 00:10:00,73.16
2014-01-01 00:15:00,73.22"
lines2="Index,x2
2014-01-01 00:11:00,71.11
2014-01-01 00:16:00,70.12
2014-01-01 00:21:00,70.16
2014-01-01 00:26:00,70.19
2014-01-01 00:31:00,69.16"
lines3="Index,x3
2014-01-01 00:23:00,0
2014-01-01 00:28:00,1
2014-01-01 00:33:00,1
2014-01-01 00:38:00,0
2014-01-01 00:43:00,0"
df1=read.table(text = lines1, header = TRUE, sep = ",")
df2=read.table(text = lines2, header = TRUE, sep = ",")
df3=read.table(text = lines3, header = TRUE, sep = ",")
z1 = zoo(df1$x1, as.POSIXct(df1$Index))
z2 = zoo(df2$x2, as.POSIXct(df2$Index))
z3 = zoo(df3$x3, as.POSIXct(df3$Index))
z = merge(z1,z2,z3)
z
z.na.locf = na.locf(z)
z.na.locf
timesteps = seq(as.POSIXct("2014-01-01 00:00:00"),
                as.POSIXct("2014-01-01 01:00:00"),
                by = "5 min")
z.timesteps = na.locf(z, xout=timesteps)
z.timesteps
The merged object looks like this:
> z
                       z1    z2 z3
2014-01-01 00:00:00 73.06    NA NA
2014-01-01 00:05:00 73.11    NA NA
2014-01-01 00:10:00 73.16    NA NA
2014-01-01 00:11:00    NA 71.11 NA
2014-01-01 00:15:00 73.22    NA NA
2014-01-01 00:16:00    NA 70.12 NA
2014-01-01 00:21:00    NA 70.16 NA
2014-01-01 00:23:00    NA    NA  0
2014-01-01 00:26:00    NA 70.19 NA
2014-01-01 00:28:00    NA    NA  1
2014-01-01 00:31:00    NA 69.16 NA
2014-01-01 00:33:00    NA    NA  1
2014-01-01 00:38:00    NA    NA  0
2014-01-01 00:43:00    NA    NA  0
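Note that the NAs at the end of z1 are legitimate, as are those at the beginning of z3 and those at both the beginning and the end of z2. The NAs that need to be replaced are the ones in the middle of the data. The problem is that if I try to fill the missing values in the middle, the legitimate NAs disappear too: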
> z.na.locf
                       z1    z2 z3
2014-01-01 00:00:00 73.06    NA NA
2014-01-01 00:05:00 73.11    NA NA
2014-01-01 00:10:00 73.16    NA NA
2014-01-01 00:11:00 73.16 71.11 NA
2014-01-01 00:15:00 73.22 71.11 NA
2014-01-01 00:16:00 73.22 70.12 NA
2014-01-01 00:21:00 73.22 70.16 NA
2014-01-01 00:23:00 73.22 70.16  0
2014-01-01 00:26:00 73.22 70.19  0
2014-01-01 00:28:00 73.22 70.19  1
2014-01-01 00:31:00 73.22 69.16  1
2014-01-01 00:33:00 73.22 69.16  1
2014-01-01 00:38:00 73.22 69.16  0
2014-01-01 00:43:00 73.22 69.16  0
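Note that the legitimate trailing NAs of z1 and z2 are gone.
Also, if I resample the data onto a common set of regular timestamps, the leading and trailing NAs disappear as well: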
> z.timesteps
                       z1    z2 z3
2014-01-01 00:00:00 73.06 71.11  0
2014-01-01 00:05:00 73.11 71.11  0
2014-01-01 00:10:00 73.16 71.11  0
2014-01-01 00:15:00 73.22 71.11  0
2014-01-01 00:20:00 73.22 70.12  0
2014-01-01 00:25:00 73.22 70.16  0
2014-01-01 00:30:00 73.22 70.19  1
2014-01-01 00:35:00 73.22 69.16  1
2014-01-01 00:40:00 73.22 69.16  0
2014-01-01 00:45:00 73.22 69.16  0
2014-01-01 00:50:00 73.22 69.16  0
2014-01-01 00:55:00 73.22 69.16  0
2014-01-01 01:00:00 73.22 69.16  0
Is there any way to achieve what I need? Thanks for your help.
na.fill can help. The following line keeps the runs of NAs at the beginning and end of each column, but fills the remaining NAs using na.locf:
zz <- na.locf(z, na.rm = FALSE) + 0 * na.fill(z, fill = c(NA, 0, NA))
giving:
> zz
                       z1    z2 z3
2014-01-01 00:00:00 73.06    NA NA
2014-01-01 00:05:00 73.11    NA NA
2014-01-01 00:10:00 73.16    NA NA
2014-01-01 00:11:00 73.16 71.11 NA
2014-01-01 00:15:00 73.22 71.11 NA
2014-01-01 00:16:00    NA 70.12 NA
2014-01-01 00:21:00    NA 70.16 NA
2014-01-01 00:23:00    NA 70.16  0
2014-01-01 00:26:00    NA 70.19  0
2014-01-01 00:28:00    NA 70.19  1
2014-01-01 00:31:00    NA 69.16  1
2014-01-01 00:33:00    NA    NA  1
2014-01-01 00:38:00    NA    NA  0
2014-01-01 00:43:00    NA    NA  0
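The one-liner works because of how its two terms combine. With `fill = c(NA, 0, NA)`, na.fill replaces leading NAs with NA, interior NAs with 0 and trailing NAs with NA; multiplying by 0 turns every value into 0 except those leading/trailing runs, which stay NA because `0 * NA` is `NA` in R. Adding that mask to the na.locf result leaves the carried-forward interior values unchanged and puts the legitimate NAs back. A minimal sketch of the steps on a single made-up vector `y`:

```r
library(zoo)

# Made-up series with leading, interior and trailing NAs
y <- zoo(c(NA, NA, 5, NA, 7, NA, NA))

filled <- na.locf(y, na.rm = FALSE)            # NA NA 5 5 7 7 7
mask   <- 0 * na.fill(y, fill = c(NA, 0, NA))  # NA NA 0 0 0 NA NA
result <- filled + mask                        # NA NA 5 5 7 NA NA

# Show the intermediate steps side by side
print(cbind(y, filled, mask, result))
```

Because the mask is computed from the same series as the na.locf result, the two always share an index, so the addition lines up row by row.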
Note 1: The read.table / zoo lines can be reduced to three lines of this form:
z1 <- read.zoo(text = lines1, header = TRUE, sep = ",", tz = "")
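Going one step further, the three read.zoo calls and the merge could be collapsed by looping over the input strings. This is a sketch, not part of the original answer; it repeats the question's `lines1`, `lines2` and `lines3` so that it runs on its own, and it relies on the list names becoming the column names of the merged object.

```r
library(zoo)

# Input strings copied from the question
lines1 <- "Index,x1
2014-01-01 00:00:00,73.06
2014-01-01 00:05:00,73.11
2014-01-01 00:10:00,73.16
2014-01-01 00:15:00,73.22"
lines2 <- "Index,x2
2014-01-01 00:11:00,71.11
2014-01-01 00:16:00,70.12
2014-01-01 00:21:00,70.16
2014-01-01 00:26:00,70.19
2014-01-01 00:31:00,69.16"
lines3 <- "Index,x3
2014-01-01 00:23:00,0
2014-01-01 00:28:00,1
2014-01-01 00:33:00,1
2014-01-01 00:38:00,0
2014-01-01 00:43:00,0"

# read.zoo parses the first column as POSIXct (tz = "" means local time)
# and returns the second column as a univariate zoo series
zs <- lapply(list(z1 = lines1, z2 = lines2, z3 = lines3),
             function(txt) read.zoo(text = txt, header = TRUE, sep = ",", tz = ""))

# merge() over the named list: the names z1, z2, z3 become the column names
z <- do.call(merge, zs)
print(z)
```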
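Note 2: Perhaps what you want to do next is to put the result onto a regular 5-minute grid, again keeping the legitimate NAs: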
timesteps <- seq(start(zz), start(zz) + 3600, by = "5 min")
m <- merge(zz, zoo(, timesteps))
m.na <- na.locf(m, na.rm = FALSE) + 0 * na.fill(m, fill = c(NA, 0, NA))
window(m.na, timesteps)
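If this is needed for many series, the two steps of Note 2 could be wrapped in small functions. The names `locf_inner` and `resample_inner` below are hypothetical helpers, not part of zoo; the sketch assumes the series has a POSIXct index and that the grid is a POSIXct vector in the same time zone. The usage part runs it on a made-up series observed at minutes 11, 16 and 21.

```r
library(zoo)

# Hypothetical helper: carry the last observation forward into interior gaps
# only, leaving leading and trailing runs of NA untouched (the answer's trick)
locf_inner <- function(x) {
  na.locf(x, na.rm = FALSE) + 0 * na.fill(x, fill = c(NA, 0, NA))
}

# Hypothetical helper: add the grid times to the index, fill interior gaps,
# then keep only the grid times (the steps of Note 2)
resample_inner <- function(x, grid) {
  m <- merge(x, zoo(, grid))
  window(locf_inner(m), grid)
}

# Usage on a made-up series that starts at 00:11 and stops after 00:21
t0   <- as.POSIXct("2014-01-01 00:00:00", tz = "UTC")
s    <- zoo(c(71.11, 70.12, 70.16), t0 + 60 * c(11, 16, 21))
grid <- seq(t0, by = "5 min", length.out = 7)
r    <- resample_inner(s, grid)
print(r)
```

Grid points before 00:11 and after 00:21 stay NA, while 00:15 and 00:20 take the most recent earlier observation.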