Interpolation of constrained gaps
Following up on an earlier question:
I have the following table:
Lines <- "D1,Diff
1,20/11/2014 16:00,0.01
2,20/11/2014 17:00,0.02
3,20/11/2014 19:00,0.03 <-- Gap I
4,21/11/2014 16:00,0.04
5,21/11/2014 17:00,0.06 <-- Gap II
6,21/11/2014 20:00,0.10"
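As can be seen, there is one gap on 20/11/2014 (18:00 is missing) and two gaps on 21/11/2014 (18:00 and 19:00 are missing).
There is also a much longer gap between 20/11/2014 19:00 and 21/11/2014 16:00.
I would like to interpolate (fill in) values only where the gap between rows is at most 3 hours.
The desired result should look like this (in data frame format):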
Lines <- "D1,Diff
1,20/11/2014 16:00,0.01
2,20/11/2014 17:00,0.02
3,20/11/2014 18:00,0.025<-- Added lines
4,20/11/2014 19:00,0.03
5,21/11/2014 16:00,0.04
6,21/11/2014 17:00,0.06
7,21/11/2014 18:00,0.073 <--
8,21/11/2014 19:00,0.086 <--
9,21/11/2014 20:00,0.10"
Here is the code I use at the moment, which also fills the day-long gap of more than 3 hours:
library(zoo)
z <- read.zoo(text = Lines, tz = "", format = "%d/%m/%Y %H:%M", sep = ",")
interpolated1 <- na.approx(z, xout = seq(start(z), end(z), "hours"))
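The call above interpolates at every hour in xout, however far apart the neighbouring observations are, which is why the long overnight gap gets filled as well. Measuring the gaps first shows which ones fall under the 3-hour limit. This is a minimal base R sketch; the names times, gap_hours and fillable are introduced here for illustration, and the time zone is pinned to UTC as an assumption.

```r
# Timestamps from the table in the question (day/month/year format)
times <- as.POSIXct(
  c("20/11/2014 16:00", "20/11/2014 17:00", "20/11/2014 19:00",
    "21/11/2014 16:00", "21/11/2014 17:00", "21/11/2014 20:00"),
  format = "%d/%m/%Y %H:%M", tz = "UTC"
)

# Hours between each row and the next one
gap_hours <- as.numeric(diff(times), units = "hours")
gap_hours

# Gaps that skip at least one hourly value but span at most 3 hours
fillable <- gap_hours > 1 & gap_hours <= 3
fillable
```

Only the second and fifth intervals (17:00 to 19:00 and 17:00 to 20:00) qualify; the 21-hour interval does not.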
Source 1: Creating a specific sequence of date/times in R. Answered by mnel on Sep 13, 2012; edited by Matt Dowle on Sep 13, 2012.
and
Source 2: Creating regular 15-minute time-series from irregular time-series. Answered by mnel on Sep 13, 2012; edited by Dirk Eddelbuettel on May 3, 2012.
library(zoo)
library(xts)
library(data.table)
library(devtools)
devtools::install_github("iembry-USGS/ie2misc")
library(ie2misc)
# iembry released a version of ie2misc so you should be able to install
# the package now
# `na.interp1` is a function that combines zoo's `na.approx` and pracma's
# `interp1`
The rest of the code starts after the z zoo object has been created.
## Source 1 begins
startdate <- as.character((start(z)))
# set the start date/time as the 1st entry in the time series and make
# this a character vector.
start <- as.POSIXct(startdate)
# transform the character vector to a POSIXct object
enddate <- as.character((end(z)))
# set the end date/time as the last entry in the time series and make
# this a character vector.
end <- as.POSIXct(enddate)
# transform the character vector to a POSIXct object
gridtime <- seq(from = start, by = 3600, to = end)
# create a sequence beginning with the start date/time with a 60 minute
# interval ending at the end date/time
## Source 1 ends
## Source 2 begins
timeframe <- data.frame(rep(NA, length(gridtime)))
# create 1 NA column spaced out by the gridtime to complement the single
# column of z
timelength <- xts(timeframe, order.by = gridtime)
# create a xts time series object using timeframe and gridtime
zDate <- merge(timelength, z)
# merge the z zoo object and the timelength xts object
## Source 2 ends
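As a quick check on the grid built under Source 1, the following base R sketch rebuilds the hourly sequence for this data and counts the hours that have no observation. The names start_time, end_time, grid, observed and missing_times are introduced here for illustration, and the times are pinned to UTC as an assumption so the result does not depend on the local time zone.

```r
# First and last timestamps of the series
start_time <- as.POSIXct("2014-11-20 16:00", tz = "UTC")
end_time   <- as.POSIXct("2014-11-21 20:00", tz = "UTC")

# Hourly grid, both ends included
grid <- seq(from = start_time, to = end_time, by = "hour")
length(grid)

# The six observed timestamps
observed <- as.POSIXct(
  c("2014-11-20 16:00", "2014-11-20 17:00", "2014-11-20 19:00",
    "2014-11-21 16:00", "2014-11-21 17:00", "2014-11-21 20:00"),
  tz = "UTC"
)

# Grid points with no observation; these become NA rows after the merge
missing_times <- grid[!(as.numeric(grid) %in% as.numeric(observed))]
length(missing_times)
```

The grid has 29 points, 23 of them without data, which is why the merged object has 29 rows and the final row names run up to 29.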
The next steps interpolate the data as requested.
Lines <- as.data.frame(zDate)
# to data.frame from zoo
Lines[, "D1"] <- rownames(Lines)
# create column named D1
Lines <- setDT(Lines)
# create data.table out of data.frame
setcolorder(Lines, c(3, 2, 1))
# set the column order as the 3rd column followed by the 2nd and 1st
# columns
Lines <- Lines[, 3 := NULL]
# remove the 3rd column
setnames(Lines, 2, "diff")
# change the name of the 2nd column to diff
Lines <- setDF(Lines)
# return to data.frame
rowsinterps1 <- which(is.na(Lines$diff))
# index of rows of Lines that have NA (to be interpolated)
xi <- as.numeric(gridtime)[rowsinterps1]
# the Date-Times for diff to be interpolated in numeric format
# (Lines has no Time column, only D1 and diff, so the numeric times are
# taken from gridtime, which has one entry per row of Lines)
interps1 <- na.interp1(as.numeric(gridtime), Lines$diff, xi = xi,
                       na.rm = FALSE, maxgap = 3)
# the interpolated values where only gaps of at most 3 NAs are filled
Lines[rowsinterps1, 2] <- interps1
# replace the NAs in diff with the interpolated diff values
Lines <- na.omit(Lines) # remove rows with NAs
Lines
This is the resulting Lines data.frame:
Lines
                    D1       diff
1  2014-11-20 16:00:00 0.01000000
2  2014-11-20 17:00:00 0.02000000
3  2014-11-20 18:00:00 0.02500000
4  2014-11-20 19:00:00 0.03000000
25 2014-11-21 16:00:00 0.04000000
26 2014-11-21 17:00:00 0.06000000
27 2014-11-21 18:00:00 0.07333333
28 2014-11-21 19:00:00 0.08666667
29 2014-11-21 20:00:00 0.10000000
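We can merge z with a zero-width zoo series z0 based on an hourly grid. This turns z into an hourly series with NAs. Then use the maxgap argument of na.approx, as shown below, to fill in only the desired gaps. This still leaves NAs in the longer gaps, so use na.omit to remove them.
fortify.zoo(z3) would convert the result to a data frame, but since z3 (the resulting series, with only gaps of length 3 or less filled in) is a time series, that is probably not a good idea. It would be better to keep it as a zoo object so that you can use all the facilities of zoo.
No packages other than zoo are used.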
z0 <- zoo(, seq(start(z), end(z), "hours"))
z3 <- na.omit(na.approx(merge(z, z0), maxgap = 3))
giving:
> z3
2014-11-20 16:00:00 2014-11-20 17:00:00 2014-11-20 18:00:00 2014-11-20 19:00:00
         0.01000000          0.02000000          0.02500000          0.03000000
2014-11-21 16:00:00 2014-11-21 17:00:00 2014-11-21 18:00:00 2014-11-21 19:00:00
         0.04000000          0.06000000          0.07333333          0.08666667
2014-11-21 20:00:00
         0.10000000
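To see what maxgap = 3 does on the hourly grid, here is a base R sketch of the same idea: interpolate everything, then put NA back wherever a run of consecutive NAs was longer than the limit. It is an illustration under stated assumptions, not zoo's implementation. The helper fill_short_gaps and the variables hours, obs, res and keep are hypothetical names, and time is expressed as hours since the first row to keep the example free of time zone handling.

```r
# Hypothetical helper: linear interpolation that only fills runs of
# consecutive NAs no longer than `maxgap`
fill_short_gaps <- function(x, y, maxgap) {
  ok <- !is.na(y)
  filled <- approx(x[ok], y[ok], xout = x)$y   # interpolate every grid point
  runs <- rle(is.na(y))                        # runs of NA and non-NA values
  too_long <- rep(runs$values & runs$lengths > maxgap, runs$lengths)
  filled[too_long] <- NA                       # restore NA in the long runs
  filled
}

# Hourly grid from 20/11/2014 16:00 (hour 0) to 21/11/2014 20:00 (hour 28)
hours <- 0:28
obs <- rep(NA_real_, length(hours))
obs[c(0, 1, 3, 24, 25, 28) + 1] <- c(0.01, 0.02, 0.03, 0.04, 0.06, 0.10)

res <- fill_short_gaps(hours, obs, maxgap = 3)
keep <- !is.na(res)                            # same effect as na.omit
data.frame(hour = hours[keep], Diff = res[keep])
```

Note that maxgap counts consecutive NAs rather than elapsed hours, so on an hourly grid maxgap = 3 also fills a run of three missing values, which is a 4-hour span between observations. If the limit is meant as 3 hours between rows, maxgap = 2 would express that; with the data here both settings give the same result.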