Interpolation of constrained gaps

Following on from an earlier question:

I have the following table:

Lines <- "D1,Diff
1,20/11/2014 16:00,0.01
2,20/11/2014 17:00,0.02
3,20/11/2014 19:00,0.03 <-- Gap I
4,21/11/2014 16:00,0.04
5,21/11/2014 17:00,0.06 <-- Gap II
6,21/11/2014 20:00,0.10"

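As can be seen, 18:00 is missing on 20 November 2014 (Gap I), and both 18:00 and 19:00 are missing on 21 November 2014 (Gap II). There is also a much longer gap between 20/11/2014 19:00 and 21/11/2014 16:00. I would like to interpolate (fill in) values only where the gap between rows is at most 3 hours. The required result should look like this (in data frame format):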

Lines <- "D1,Diff
1,20/11/2014 16:00,0.01
2,20/11/2014 17:00,0.02
3,20/11/2014 18:00,0.025 <-- Added line
4,20/11/2014 19:00,0.03
5,21/11/2014 16:00,0.04
6,21/11/2014 17:00,0.06
7,21/11/2014 18:00,0.073 <-- Added line
8,21/11/2014 19:00,0.086 <-- Added line
9,21/11/2014 20:00,0.10"
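For reference, the two values added on 21/11/2014 follow from plain linear interpolation between 17:00 (0.06) and 20:00 (0.10). The following is a minimal sketch in base R; representing the hours as plain numbers and the variable names are assumptions made only for illustration.

```r
# Known points on 21/11/2014, with the hour of day as a plain number
known_hours   <- c(17, 20)
known_values  <- c(0.06, 0.10)
missing_hours <- c(18, 19)

# approx() from base R performs linear interpolation at the requested points
filled <- approx(x = known_hours, y = known_values, xout = missing_hours)$y

# The same calculation by hand: y0 + (y1 - y0) * (t - t0) / (t1 - t0)
by_hand <- 0.06 + (0.10 - 0.06) * (missing_hours - 17) / (20 - 17)

print(round(filled, 5))
print(isTRUE(all.equal(filled, by_hand)))
```

The exact values are 0.0733... and 0.0866..., so the 0.073 and 0.086 shown in the desired result are truncated.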

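This is the code I currently use, which also fills gaps longer than 3 hours (including the gap that spans into the next day):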

library(zoo)
z <- read.zoo(text = Lines, tz = "", format = "%d/%m/%Y %H:%M", sep = ",")
interpolated1 <- na.approx(z, xout = seq(start(z), end(z), "hours"))
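To make the problem reproducible, here is a hedged sketch of the same approach on a cleaned copy of the data. The name `Lines_clean`, the removal of the `<--` annotations, the explicit `header = TRUE`, and `tz = "UTC"` are assumptions made here so that the text parses and the result does not depend on the local time zone.

```r
library(zoo)

# Same six observations as in the question, without the "<--" annotations
Lines_clean <- "D1,Diff
1,20/11/2014 16:00,0.01
2,20/11/2014 17:00,0.02
3,20/11/2014 19:00,0.03
4,21/11/2014 16:00,0.04
5,21/11/2014 17:00,0.06
6,21/11/2014 20:00,0.10"

# The header has one field fewer than the data rows, so the first column
# is read as row names and D1 becomes the time index
z <- read.zoo(text = Lines_clean, header = TRUE, sep = ",",
              tz = "UTC", format = "%d/%m/%Y %H:%M")

# Interpolating onto a full hourly grid without a gap limit
hourly <- seq(start(z), end(z), by = "hour")
filled_all <- na.approx(z, xout = hourly)

print(length(z))          # number of original observations
print(length(filled_all)) # one value per hour, long gap included
```

Because no gap limit is given, every hour between the first and last observation receives a value, which is exactly the behaviour the question wants to avoid.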

Source 1: Creating a specific sequence of date/times in R. Answered by mnel on 13 September 2012, edited by Matt Dowle on 13 September 2012.

&

Source 2: Creating regular 15-minute time-series from irregular time-series. Answered by mnel on 13 September 2012, edited by Dirk Eddelbuettel on 3 May 2012.

library(zoo)
library(xts)
library(data.table)
library(devtools)
devtools::install_github("iembry-USGS/ie2misc")
library(ie2misc)
# iembry released a version of ie2misc so you should be able to install
# the package now
# `na.interp1` is a function that combines zoo's `na.approx` and pracma's
# `interp1`

The rest of the code starts after the z zoo object has been created.

## Source 1 begins
startdate <- as.character((start(z)))
# set the start date/time as the 1st entry in the time series and make
# this a character vector.

start <- as.POSIXct(startdate)
# transform the character vector to a POSIXct object

enddate <- as.character((end(z)))
# set the end date/time as the last entry in the time series and make   
# this a character vector.

end <- as.POSIXct(enddate)
# transform the character vector to a POSIXct object

gridtime <- seq(from = start, by = 3600, to = end)
# create a sequence beginning with the start date/time with a 60 minute 
# interval ending at the end date/time
## Source 1 ends

## Source 2 begins
timeframe <- data.frame(rep(NA, length(gridtime)))
# create 1 NA column spaced out by the gridtime to complement the single 
# column of z

timelength <- xts(timeframe, order.by = gridtime)
# create a xts time series object using timeframe and gridtime

zDate <- merge(timelength, z)
# merge the z zoo object and the timelength xts object  
## Source 2 ends
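The effect of the two source snippets above is to place the observations on a regular hourly grid, with NA wherever no observation exists. As a rough illustration of that idea only, here is a base-R sketch that does not use xts or zoo; the names `obs_time`, `obs_diff`, `grid`, and `aligned`, and the use of `tz = "UTC"`, are assumptions for this example.

```r
# Three of the observations from 20/11/2014 (18:00 is missing)
obs_time <- as.POSIXct(c("2014-11-20 16:00", "2014-11-20 17:00",
                         "2014-11-20 19:00"), tz = "UTC")
obs_diff <- c(0.01, 0.02, 0.03)

# Regular grid in steps of 3600 seconds, from the first to the last time
grid <- seq(from = min(obs_time), to = max(obs_time), by = 3600)

# match() looks up each grid time among the observed times; grid times
# without an observation get NA
pos <- match(as.numeric(grid), as.numeric(obs_time))
aligned <- data.frame(D1 = grid, diff = obs_diff[pos])

print(aligned)
```

The row for 18:00 carries NA, which is the value that the interpolation step then has to fill.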

The next steps interpolate the data as requested.

Lines <- as.data.frame(zDate)
# to data.frame from zoo

Lines[, "D1"] <- rownames(Lines)
# create column named D1

Lines <- setDT(Lines)
# create data.table out of data.frame

setcolorder(Lines, c(3, 2, 1))
# set the column order as the 3rd column followed by the 2nd and 1st 
# columns

Lines <- Lines[, 3 := NULL]
# remove the 3rd column

setnames(Lines, 2, "diff")
# change the name of the 2nd column to diff

Lines <- setDF(Lines)
# return to data.frame

rowsinterps1 <- which(is.na(Lines$diff))
# index of rows of Lines that have NA (to be interpolated)

xi <- as.numeric(index(zDate))[rowsinterps1]
# the Date-Times for diff to be interpolated in numeric format; taken from
# the index of zDate because D1 holds character strings

interps1 <- na.interp1(as.numeric(index(zDate)), Lines$diff, xi = xi,
na.rm = FALSE, maxgap = 3)
# the interpolated values where only gaps of up to 3 consecutive NAs are
# filled (Lines has no Time column, so the zDate index supplies x)

Lines[rowsinterps1, 2] <- interps1
# replace the NAs in diff with the interpolated diff values

Lines <- na.omit(Lines) # remove rows with NAs
Lines

This is the resulting Lines data.frame:

Lines
                D1       diff
1  2014-11-20 16:00:00 0.01000000
2  2014-11-20 17:00:00 0.02000000
3  2014-11-20 18:00:00 0.02500000
4  2014-11-20 19:00:00 0.03000000
25 2014-11-21 16:00:00 0.04000000
26 2014-11-21 17:00:00 0.06000000
27 2014-11-21 18:00:00 0.07333333
28 2014-11-21 19:00:00 0.08666667
29 2014-11-21 20:00:00 0.10000000
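The `maxgap = 3` argument used above limits filling to runs of at most three consecutive NAs. The base-R sketch below approximates that rule with run-length encoding. The helper `fillable_na` is hypothetical (it is not part of zoo or ie2misc), and it ignores edge cases such as leading or trailing NAs.

```r
# Hypothetical helper: TRUE for every NA that belongs to a run of at most
# `maxgap` consecutive NAs, FALSE otherwise
fillable_na <- function(y, maxgap) {
  runs <- rle(is.na(y))
  ok <- runs$values & runs$lengths <= maxgap
  rep(ok, times = runs$lengths)
}

# Toy hourly series: a 1-NA gap, a 4-NA gap, and a 2-NA gap
y <- c(0.02, NA, 0.03, NA, NA, NA, NA, 0.04, 0.06, NA, NA, 0.10)
x <- seq_along(y)

# Interpolate everything first, then keep only the short gaps
full <- approx(x[!is.na(y)], y[!is.na(y)], xout = x)$y
keep <- !is.na(y) | fillable_na(y, maxgap = 3)
result <- ifelse(keep, full, NA)

print(result)
```

The gap of four NAs stays empty, while the shorter gaps are filled by linear interpolation.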

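We can merge z with a zero-width zoo series z0 that is based on an hourly grid. This turns z into an hourly series with NAs. Then use the maxgap argument of na.approx, as shown below, to fill only the desired gaps. This still leaves NAs in the longer gaps, so use na.omit to remove them.

fortify.zoo(z3) would convert the result to a data frame. However, z3, the resulting series in which only gaps of up to 3 are filled, is a time series, so converting it is probably not a good idea. It would be better to leave it as a zoo object so that you can use all the facilities of zoo.

No packages other than zoo are used.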

z0 <- zoo(, seq(start(z), end(z), "hours"))
z3 <- na.omit(na.approx(merge(z, z0), maxgap = 3))

giving:

> z3
2014-11-20 16:00:00 2014-11-20 17:00:00 2014-11-20 18:00:00 2014-11-20 19:00:00 
         0.01000000          0.02000000          0.02500000          0.03000000 
2014-11-21 16:00:00 2014-11-21 17:00:00 2014-11-21 18:00:00 2014-11-21 19:00:00 
         0.04000000          0.06000000          0.07333333          0.08666667 
2014-11-21 20:00:00 
         0.10000000
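If a data frame really is needed, as the question requests, one possible way to build it from the zoo result is sketched below. It is self-contained and uses only the 17:00 and 20:00 observations from 21/11/2014. The column names `D1` and `Diff` are copied from the question, while `tz = "UTC"` and the use of `time()` and `coredata()` instead of `fortify.zoo()` are choices made here for illustration.

```r
library(zoo)

# Two observations from 21/11/2014 with a two-hour hole between them
tt <- as.POSIXct(c("2014-11-21 17:00", "2014-11-21 20:00"), tz = "UTC")
z  <- zoo(c(0.06, 0.10), tt)

# Same steps as above: hourly zero-width series, merge, fill gaps of up to 3
z0 <- zoo(, seq(start(z), end(z), by = "hour"))
z3 <- na.omit(na.approx(merge(z, z0), maxgap = 3))

# time() extracts the index and coredata() the values of a zoo object
df <- data.frame(D1 = time(z3), Diff = coredata(z3))
print(df)
```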