Interpolate data for irregular time series

Question

I am trying to interpolate this meterValue. Here is the full CSV: https://drive.google.com/open?id=18cwtw-chAB-FqqCesXZJ-6NB6eHFJlgQ

localminute,dataid,meter_value
2015-10-03 09:51:53,6578,157806
2015-10-13 13:41:49,6578,158086
:
:
2016-01-17 16:00:33,6578,164544  #end of meter_value data for ID=6578

Following @G. Grothendieck's suggestion, I get an error at z.interpolated (the step that merges the data):

library(zoo)
library(xts)  # to.daily() comes from the xts package

D6578z <- read.csv.zoo("test_6578.csv")[, 2]
D6578zd <- to.daily(D6578z)[, 4]
# Warning messages:
# 1: In zoo(xx, order.by = index(x), ...) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
# 2: In zoo(rval, index(x)[i]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique

test_6578t <- time(D6578zd)

plot(D6578zd, type = "p", xaxt = "n", pch = 19, col = "blue", cex = 1.5)

diff(test_6578t)

t.daily6578 <- seq(from = min(test_6578t), to = max(test_6578t), by = "1 day")

dummy6578 <- zoo(, t.daily6578)

z.interpolated <- merge(D6578zd, dummy6578, all = TRUE)
# Error in merge.zoo(D6578zd, dummy6578, all = TRUE) :
#   series cannot be merged with non-unique index entries in a series
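The error says the index of D6578zd contains repeated entries, as the warnings above also indicate, and merge.zoo refuses to merge such a series. One way to guarantee a single entry per day before merging is to keep only the last reading of each day. A minimal base-R sketch of that check and fix, using a small hypothetical data frame rather than the linked CSV, and assuming the rows are already in time order:

```r
# Hypothetical toy readings (not taken from the linked CSV); two fall on 2015-10-03
readings <- data.frame(
  localminute = c("2015-10-03 09:51:53", "2015-10-03 18:02:10", "2015-10-13 13:41:49"),
  meter_value = c(100, 105, 130)
)

# Drop the time of day so readings can be grouped by calendar date
day <- as.Date(readings$localminute)

# anyDuplicated() returns the position of the first repeated day (0 if none)
print(anyDuplicated(day))

# Keep the last reading of each day (assumes rows are sorted by time)
keep <- !duplicated(day, fromLast = TRUE)
daily <- data.frame(day = day[keep], meter_value = readings$meter_value[keep])
print(daily)
```

With one row per day, the index is unique and a merge against a full daily grid no longer has repeated entries to complain about.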

Below is the R code @G. Grothendieck provided as a solution for interpolating the data at a one-hour time step.

Hi @G. Grothendieck, thanks for the solution code. I have a few questions about it that I would like you to clarify.

# line1
to.hour <- function(x) as.POSIXct(trunc(as.POSIXct(x, origin = "1970-01-01"), "hour"))
# line2
z <- read.csv.zoo("test_6578.csv", FUN = to.hour, aggregate = function(x) tail(x, 1))
# line3
zz <- na.approx(as.zoo(as.ts(z)))
# line4
time(zz) <- as.POSIXct(time(zz), origin = "1970-01-01")

In line1, why is as.POSIXct wrapped around trunc(as.POSIXct(x, origin = "1970-01-01"), "hour")?
I understand that the trunc function rounds the date-time value down to the hour.
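A small base-R check of what line1 does to one timestamp from the question (interpreted as UTC here, an assumption made only so the output is reproducible): trunc() on a date-time drops the minutes and seconds, so it rounds down, and it returns a POSIXlt object. The outer as.POSIXct() turns that back into POSIXct, a plain count of seconds, which is generally the more convenient class for a zoo index than the list-based POSIXlt. The origin argument only comes into play when x is numeric.

```r
# One timestamp from the question, interpreted as UTC (an assumption for reproducibility)
x <- as.POSIXct("2015-10-03 09:51:53", tz = "UTC")

# trunc() drops minutes and seconds (it rounds down) and returns a POSIXlt object
lt <- trunc(x, "hour")
print(class(lt))

# The outer as.POSIXct() converts the POSIXlt result back to POSIXct
ct <- as.POSIXct(lt)
print(format(ct, "%Y-%m-%d %H:%M:%S"))
```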

In line2, what does FUN = to.hour, aggregate = function(x) tail(x, 1) mean, and how does it work?

I could not understand what tail(x, 1) does. When I exported z to a CSV file, I saw that read.csv.zoo only produces the dataid and meter_value columns.
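tail(x, 1) simply returns the last element of x. In read.csv.zoo the first column (here localminute) becomes the index rather than a data column, which is why only dataid and meter_value appear as columns. When FUN maps several timestamps to the same hour, the aggregate function is applied to the values that share that index, so tail(x, 1) keeps the last reading of each hour. A base-R sketch of the same grouping idea with tapply(), using hypothetical values:

```r
# tail(x, 1) is just the last element of a vector
print(tail(c(3, 8, 5), 1))

# Hypothetical readings: two in the 09:00 hour, one in the 11:00 hour
stamps <- as.POSIXct(c("2015-10-03 09:05:00", "2015-10-03 09:51:53",
                       "2015-10-03 11:20:00"), tz = "UTC")
value <- c(1, 2, 3)

# Label each reading with its hour, then keep the last value per label
hour_key <- format(stamps, "%Y-%m-%d %H:00", tz = "UTC")
last_per_hour <- tapply(value, hour_key, function(x) tail(x, 1))
print(last_per_hour)
```

Note that no entry is created for the empty 10:00 hour; aggregation only collapses repeats, it does not fill gaps.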

In line3, I understand that zz holds the interpolated data, but I don't fully understand na.approx(as.zoo(as.ts(z))). Since z is already a zoo series after read.csv.zoo, why do we still need as.zoo and as.ts inside the na.approx call?
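A small zoo example of why the as.ts() round trip matters, using hypothetical daily values with two missing days. z itself contains no NA entries, so na.approx(z) alone has nothing to fill. as.ts() builds a regular series with an NA for every missing time step, as.zoo() converts it back, and na.approx() then fills those NAs by linear interpolation against the time index.

```r
library(zoo)

# Hypothetical daily readings; 2016-01-03 and 2016-01-04 are missing
z <- zoo(c(10, 20, 50), as.Date(c("2016-01-01", "2016-01-02", "2016-01-05")))

# No NAs yet, so na.approx() leaves the series unchanged (still 3 points)
print(length(na.approx(z)))

# as.ts() builds a regular daily series and inserts NA for the missing days
tt <- as.ts(z)
print(tt)

# Back to zoo, then linear interpolation fills the NAs
zz <- na.approx(as.zoo(tt))
print(coredata(zz))
```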

What is the difference between zoo and zooreg series?
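Roughly, a zooreg object is a zoo object that also stores a frequency attribute, recording that its index lies on a regular grid (one step per day in this sketch); a plain zoo object carries no such attribute. Neither class inserts rows for missing times; that only happens on conversion to ts. A small sketch with hypothetical values:

```r
library(zoo)

# Hypothetical series with one missing day (2016-01-03)
z <- zoo(c(1, 2, 4), as.Date(c("2016-01-01", "2016-01-02", "2016-01-04")))

# Plain zoo: no stored frequency attribute
print(attr(z, "frequency"))
# Not strictly regular (spacing varies), but the gaps are whole multiples of one day
print(is.regular(z))
print(is.regular(z, strict = TRUE))

# zooreg: same data and index, plus a stored frequency; still only 3 points
zr <- as.zooreg(z)
print(class(zr))
print(frequency(zr))

# Converting a ts back with as.zoo() also yields a zooreg object
print(class(as.zoo(as.ts(z))))
```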

In line4, is time(zz) the index of the zz series?
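For zoo objects, time() returns the index, the same thing index() returns, and time<- replaces it. In the hourly version the ts round trip leaves a numeric index (seconds since 1970-01-01), which line4 swaps for the equivalent POSIXct values. A minimal sketch with a hypothetical two-point series, assuming UTC:

```r
library(zoo)

# Hypothetical series whose index is numeric seconds since 1970-01-01 UTC,
# the kind of index an hourly ts round trip leaves behind
secs <- as.numeric(as.POSIXct(c("2015-10-03 09:00:00", "2015-10-03 10:00:00"), tz = "UTC"))
zz <- zoo(c(1, 2), secs)

# time() and index() return the same thing
print(identical(time(zz), index(zz)))

# Replace the numeric index with the equivalent POSIXct date-times
time(zz) <- as.POSIXct(time(zz), origin = "1970-01-01", tz = "UTC")
print(zz)
```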

Thanks in advance for your explanation.

I am able to plot the interpolated data at a 1-hour time step.

Answer 1

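Read the file with read.csv.zoo, converting the index to Date class and aggregating duplicate dates so that the last value for each date is used. Then convert to ts and back to zoo, which fills in the empty days with NAs. Now use na.approx to fill in the NA values. Since ts cannot represent the Date class, the resulting series has numbers representing the dates, so convert those back.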
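For comparison, the same fill-in-the-missing-days idea can be sketched in base R with approx(), which interpolates linearly onto a chosen grid of x values. This swaps base R in for zoo and is not the method used in this answer; the values are hypothetical, and each day is assumed to already have at most one reading.

```r
# Hypothetical one-reading-per-day data with 2016-01-03 and 2016-01-04 missing
day <- as.Date(c("2016-01-01", "2016-01-02", "2016-01-05"))
value <- c(10, 20, 50)

# Regular daily grid from the first to the last observed day
grid <- seq(min(day), max(day), by = "day")

# approx() works on numbers, so pass the dates as day counts
filled <- approx(as.numeric(day), value, xout = as.numeric(grid))$y

result <- data.frame(day = grid, value = filled)
print(result)
```

The zoo route keeps the index attached to the data throughout, whereas here the grid and the interpolated values have to be paired up by hand.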

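library(zoo)
# Index -> Date; when a date repeats, keep its last value
z <- read.csv.zoo("test_6578.csv", FUN = as.Date, aggregate = function(x) tail(x, 1))
# The ts round trip inserts NA for the missing days; na.approx() fills them in
zz <- na.approx(as.zoo(as.ts(z)))
# ts cannot hold Date, so the times come back as day numbers; convert them back
time(zz) <- as.Date(time(zz), origin = "1970-01-01")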

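A comment claimed that there are holes in the output, but that is not the case: the difference between successive times is always 1, and there are no NAs.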

table(diff(time(zz)))
##   1 
## 106 

any(is.na(zz)) 
## [1] FALSE

any(is.na(time(zz)))
## [1] FALSE

Here is an example that does this with a one-hour rather than a one-day spacing.

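# Truncate each timestamp to its hour; trunc() returns POSIXlt, so convert back to POSIXct
to.hour <- function(x) as.POSIXct(trunc(as.POSIXct(x, origin = "1970-01-01"), "hour"))
# Keep the last reading within each hour
z <- read.csv.zoo("test_6578.csv", FUN = to.hour, aggregate = function(x) tail(x, 1))
# Fill the missing hours with NA (ts round trip), then interpolate them
zz <- na.approx(as.zoo(as.ts(z)))
# The times come back as seconds since 1970-01-01; convert them to POSIXct
time(zz) <- as.POSIXct(time(zz), origin = "1970-01-01")

# Column 2 is meter_value
plot(zz[, 2], type = "p", pch = ".")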

Tags: r, time-series, forecasting