Interpolate data for irregular time series
I am trying to interpolate this meter_value data; the full CSV is here: https://drive.google.com/open?id=18cwtw-chAB-FqqCesXZJ-6NB6eHFJlgQ
```
localminute,dataid,meter_value
2015-10-03 09:51:53,6578,157806
2015-10-13 13:41:49,6578,158086
:
:
2016-01-17 16:00:33,6578,164544 #end of meter_value data for ID=6578
```
Based on what @G. Grothendieck suggested, I get an error at `z.interpolated` (the merge step):
```r
library(xts)  # to.daily() comes from xts, which also loads zoo
D6578z <- read.csv.zoo("test_6578.csv")[,2]
D6578zd <- to.daily(D6578z)[,4]
#Warning messages:
#1: In zoo(xx, order.by = index(x), ...) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
#2: In zoo(rval, index(x)[i]) :some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
test_6578t <- time(D6578zd)
plot(D6578zd,type="p",xaxt="n", pch=19, col="blue",cex=1.5)
diff(test_6578t)
t.daily6578 <- seq(from =min(test_6578t),to=max(test_6578t),by="1 day")
dummy6578 <- zoo(,t.daily6578)
z.interpolated <- merge(D6578zd,dummy6578,all=TRUE)
#Error in merge.zoo(D6578zd, dummy6578, all = TRUE) : series cannot be merged with non-unique index entries in a series
```
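The warnings indicate that some timestamps occur more than once in the index, and `merge.zoo` refuses to merge a series whose index has repeated values. A minimal base-R check for repeated timestamps is sketched below; the three timestamps are hypothetical, with one deliberately repeated, and are treated as UTC.

```r
# Hypothetical timestamps; the second value appears twice on purpose
stamps <- as.POSIXct(c("2015-10-03 09:51:53",
                       "2015-10-13 13:41:49",
                       "2015-10-13 13:41:49"), tz = "UTC")

anyDuplicated(stamps)    # 3: position of the first repeated value (0 would mean none)
sum(duplicated(stamps))  # 1: number of entries that repeat an earlier one
```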
The R code provided by @G. Grothendieck for interpolating the data at a one-hour time step is as follows.
Hi @G. Grothendieck, thanks for the solution code. I have a few questions about your code that I would like to clarify with you.
```r
# line1
to.hour <- function(x) as.POSIXct(trunc(as.POSIXct(x, origin = "1970-01-01"), "hour"))
# line2
z <- read.csv.zoo("test_6578.csv", FUN = to.hour, aggregate = function(x) tail(x, 1))
# line3
zz <- na.approx(as.zoo(as.ts(z)))
# line4
time(zz) <- as.POSIXct(time(zz), origin = "1970-01-01")
```
In line1, why is `as.POSIXct` applied to the result of `trunc(as.POSIXct(x, origin = "1970-01-01"), "hour")`? I understand that `trunc` truncates the date-time value to the hour.
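On line1: in base R, `trunc()` applied to a date-time returns a `POSIXlt` object (a list of date-time components), not a `POSIXct`, so the outer `as.POSIXct()` converts it back to the compact numeric form normally used as a zoo time index. `trunc()` rounds down to the start of the hour. A small base-R check, assuming UTC:

```r
x <- as.POSIXct("2015-10-03 09:51:53", tz = "UTC")

lt <- trunc(x, units = "hours")   # cut down to 09:00:00; the result is POSIXlt
ct <- as.POSIXct(lt)              # back to POSIXct (seconds since 1970-01-01)

class(lt)                                      # "POSIXlt" "POSIXt"
class(ct)                                      # "POSIXct" "POSIXt"
format(ct, "%Y-%m-%d %H:%M:%S", tz = "UTC")    # "2015-10-03 09:00:00"
```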
In line2, how does `FUN = to.hour, aggregate = function(x) tail(x, 1)` work? I could not understand what `tail(x, 1)` does. I also wrote `z` out to a CSV file and observed that only the dataid and meter_value columns are produced when `read.csv.zoo` is used.
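On line2: `tail(x, 1)` returns the last element of `x`. Passed as `aggregate` to `read.zoo`, it means that when several rows map to the same index value after `FUN` is applied (here, the same truncated hour), only the last of their values is kept. Separately, `read.zoo` uses the first column as the index by default (`index.column = 1`), which is why only dataid and meter_value remain as data columns. The base-R sketch below imitates both effects on hypothetical readings; it uses `aggregate()` from the stats package rather than zoo.

```r
tail(c(157806, 157810, 157815), 1)   # 157815: the last element

# Hypothetical readings: two fall in the 09:00 hour, one in the 10:00 hour
readings <- data.frame(
  localminute = c("2015-10-03 09:10:00", "2015-10-03 09:51:53", "2015-10-03 10:05:00"),
  dataid      = 6578,
  meter_value = c(157800, 157806, 157815)
)

# The first column plays the role of the index; the remaining columns are the data
hour <- format(as.POSIXct(readings[[1]], tz = "UTC"), "%Y-%m-%d %H:00")
data_cols <- readings[-1]
names(data_cols)   # "dataid" "meter_value"

# Keep the last meter_value within each hour, mirroring aggregate = function(x) tail(x, 1)
last_per_hour <- aggregate(data_cols["meter_value"], by = list(hour = hour),
                           FUN = function(x) tail(x, 1))
last_per_hour$meter_value   # 157806 157815
```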
In line3, I understand that `zz` contains the interpolated data, but I don't fully understand the code `na.approx(as.zoo(as.ts(z)))`. Since `z` is already a zoo series after `read.csv.zoo`, why do we still have to use `as.zoo` and `as.ts` in the `na.approx` line? What is the difference between zoo and zooreg series?
In line4, is `time(zz)` the index of `zz`?
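On line4: `time(zz)` returns the index of `zz`, and `time(zz) <- ...` replaces it. After the round trip through `ts`, the index is plain numbers: days since 1970-01-01 in the daily version and seconds since 1970-01-01 in the hourly one, so they must be converted back. The numbers below correspond to the first sample date (2015-10-03) and its 09:00 hour, assuming UTC:

```r
# Numeric index values, as a ts round trip would leave them
day_num  <- 16711          # days since 1970-01-01
secs_num <- 1443862800     # seconds since 1970-01-01 00:00:00 UTC

as.Date(day_num, origin = "1970-01-01")                 # "2015-10-03"
hour_ct <- as.POSIXct(secs_num, origin = "1970-01-01", tz = "UTC")
format(hour_ct, "%Y-%m-%d %H:%M:%S", tz = "UTC")        # "2015-10-03 09:00:00"
```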
Thanks in advance for your explanation.
I can plot the interpolated data with a 1-hour time step.
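Read the file using `read.csv.zoo`, converting to `Date` class and aggregating duplicate dates so that the last one is used. Then convert to `ts` and back to `zoo`, which fills the empty days with `NA`. Now use `na.approx` to fill in the `NA` values. Since `ts` cannot represent the `Date` class, the resulting series will have numbers representing the dates, so convert them back.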
```r
library(zoo)
z <- read.csv.zoo("test_6578.csv", FUN = as.Date, aggregate = function(x) tail(x, 1))
zz <- na.approx(as.zoo(as.ts(z)))
time(zz) <- as.Date(time(zz))
```
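For comparison, here is a base-R sketch of the same daily steps (keep the last reading per date, lay out a complete daily grid, interpolate linearly), using `approx()` in place of `na.approx()`. It uses only the three sample rows shown in the question, not the full file, and treats the timestamps as UTC.

```r
csv <- "localminute,dataid,meter_value
2015-10-03 09:51:53,6578,157806
2015-10-13 13:41:49,6578,158086
2016-01-17 16:00:33,6578,164544"
df <- read.csv(text = csv)

# Convert each timestamp to its date (the role of FUN = as.Date)
day <- as.Date(as.POSIXct(df$localminute, tz = "UTC"))

# Keep the last reading of each date (the role of aggregate = function(x) tail(x, 1))
keep  <- !duplicated(day, fromLast = TRUE)
day   <- day[keep]
value <- df$meter_value[keep]

# Complete daily grid from the first to the last date
grid <- seq(min(day), max(day), by = "1 day")

# Linear interpolation onto the grid (the role na.approx plays in the answer)
filled <- approx(x = as.numeric(day), y = value, xout = as.numeric(grid))$y

length(grid)             # 107 dates, i.e. 106 one-day steps
all(diff(grid) == 1)     # TRUE
any(is.na(filled))       # FALSE
filled[6]                # 157946 on 2015-10-08, halfway between the first two readings
```

`approx()` accepts irregularly spaced x values directly, so this sketch skips the NA-padded intermediate series that the `as.ts()`/`as.zoo()` round trip creates.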
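A claim was made in the comments that there are holes in the output, but that is not the case: the difference between successive times is always 1, and there are no NAs.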
```r
table(diff(time(zz)))
## 1
## 106
any(is.na(zz))
## [1] FALSE
any(is.na(time(zz)))
## [1] FALSE
```
Here is an example that does this with a difference of one hour instead of one day.
```r
to.hour <- function(x) as.POSIXct(trunc(as.POSIXct(x, origin = "1970-01-01"), "hour"))
z <- read.csv.zoo("test_6578.csv", FUN = to.hour, aggregate = function(x) tail(x, 1))
zz <- na.approx(as.zoo(as.ts(z)))
time(zz) <- as.POSIXct(time(zz), origin = "1970-01-01")
plot(zz[, 2], type = "p", pch = ".")
```
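An hourly analogue in base R, again a sketch that uses only the three sample rows and treats the timestamps as UTC; it truncates with `trunc()` as `to.hour()` does and interpolates with `approx()` rather than zoo's `na.approx()`.

```r
csv <- "localminute,dataid,meter_value
2015-10-03 09:51:53,6578,157806
2015-10-13 13:41:49,6578,158086
2016-01-17 16:00:33,6578,164544"
df <- read.csv(text = csv)

# Truncate each timestamp to its hour, like to.hour() in the answer
hour <- as.POSIXct(trunc(as.POSIXct(df$localminute, tz = "UTC"), units = "hours"))

# Keep the last reading within each hour
keep  <- !duplicated(hour, fromLast = TRUE)
hour  <- hour[keep]
value <- df$meter_value[keep]

# Complete hourly grid and linear interpolation onto it
grid   <- seq(min(hour), max(hour), by = "hour")
filled <- approx(x = as.numeric(hour), y = value, xout = as.numeric(grid))$y

length(grid)   # 2552 hourly points, 2015-10-03 09:00 to 2016-01-17 16:00 UTC
# plot(grid, filled, pch = ".")   # same kind of plot as in the answer
```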