Populating missing Date and Time in time-series data in R, with zoo package

I have frequency data recorded at quarter-hourly (15-minute) intervals.

sasan<-read.csv("sasanhz.csv", header = TRUE)

head(sasan)
               Timestamp Avg.Hz
1 12/27/2017 12:15:00 AM  50.05
2 12/27/2017 12:30:00 AM  49.99
3 12/27/2017 12:45:00 AM  49.98
4 12/27/2017 01:00:00 AM  50.01
5 12/27/2017 01:15:00 AM  49.97
6 12/27/2017 01:30:00 AM  49.98

str(sasan)
'data.frame':   5501 obs. of  2 variables:
 $ Timestamp: Factor w/ 5501 levels "01/01/2018 00:00:00 AM",..: 5112 5114 5116 5023 5025 
                                 5027 5029 5031 5033 5035 ...
 $ Avg.Hz   : num  50 50 50 50 50 ...

# convert Timestamp to POSIXct

sasan$Timestamp<-as.POSIXct(sasan$Timestamp, format="%m/%d/%Y %I:%M:%S %p")

In this time series, some date-time values are missing in the "Timestamp" column, and I want to impute them. I tried zoo.

    z<-zoo(sasan)
    > head(z[1489:1497])
     Timestamp           Avg.Hz
1489 2018-01-11 12:15:00 50.02 
1490 2018-01-11 12:30:00 49.99 
1491 2018-01-11 12:45:00 49.94 
1492 <NA>                49.98 
1493 <NA>                50.02 
1494 <NA>                49.95

When I use the "na.locf" function from the zoo package to fill in the NA date-time values, I get the following error.

 sasan_mis<-seq(start(z), end(z), by = times("00:15:00"))
> na.locf(z, xout = sasan_mis)
Error in approx(x[!na], y[!na], xout, ...) : zero non-NA points
In addition: Warning message:
In xy.coords(x, y, setLab = FALSE) : NAs introduced by coercion

How can I get past this error, and how can I impute the missing date-times? Any suggestions are appreciated.

dput(head(z))
structure(c("2017-12-27 00:15:00", "2017-12-27 00:30:00", "2017-12-27 00:45:00", 
"2017-12-27 01:00:00", "2017-12-27 01:15:00", "2017-12-27 01:30:00", 
"50.05", "49.99", "49.98", "50.01", "49.97", "49.98"), .Dim = c(6L, 
2L), .Dimnames = list(NULL, c("Timestamp", "Avg.Hz")), index = 1:6, class = "zoo")

The packages I have loaded are:

library(ggplot2)
library(forecast)
library(tseries)
library(xts)
library(zoo)
library(dplyr)

I am assuming the OP has missing values in the Timestamp variable and is looking for a way to fill them in.

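na.approx from the zoo package is very handy in this case. It linearly interpolates the underlying numeric values of the timestamps (seconds since 1970-01-01), so a gap in an evenly spaced series is filled with evenly spaced times.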

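library(zoo)

# (uses the sample data defined further down)
# na.approx from zoo to populate missing values of Timestamp;
# na.rm = FALSE keeps any leading/trailing NAs so the length still matches the data frame
sasan$Timestamp <- as.POSIXct(na.approx(as.numeric(sasan$Timestamp), na.rm = FALSE), origin = "1970-01-01")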
sasan
# 1  2017-12-27 00:15:00  50.05
# 2  2017-12-27 00:30:00  49.99
# 3  2017-12-27 00:45:00  49.98
# 4  2017-12-27 01:00:00  50.01
# 5  2017-12-27 01:15:00  49.97
# 6  2017-12-27 01:30:00  49.98
# 7  2017-12-27 01:45:00  49.98
# 8  2017-12-27 02:00:00  50.02
# 9  2017-12-27 02:15:00  49.95
# 10 2017-12-27 02:30:00  49.98
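
To make the interpolation step above more concrete, here is a minimal base-R sketch of the same idea without zoo, using stats::approx over the row positions. The toy vector `ts_chr`, the helper names (`secs`, `idx`, `ok`, `filled`) and the fixed `tz = "UTC"` are assumptions made for illustration; they are not part of the OP's data or of the answer's code.

```r
# Hypothetical toy data: three timestamps missing from a regular 15-minute series
ts_chr <- c("12/27/2017 01:15:00 AM", "12/27/2017 01:30:00 AM",
            NA, NA, NA,
            "12/27/2017 02:30:00 AM")

# tz is fixed to UTC here only to keep the example reproducible (an assumption)
ts <- as.POSIXct(ts_chr, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# POSIXct stores seconds since 1970-01-01, so interpolate on the numeric values
secs <- as.numeric(ts)
idx  <- seq_along(secs)   # row position serves as the x-axis
ok   <- !is.na(secs)      # rows whose timestamp is known

# Linear interpolation over row position; rule = 1 leaves positions
# outside the known range (leading/trailing gaps) as NA
filled_secs <- approx(x = idx[ok], y = secs[ok], xout = idx, rule = 1)$y

# Convert the interpolated seconds back to POSIXct
filled <- as.POSIXct(filled_secs, origin = "1970-01-01", tz = "UTC")
print(filled)
```

The interpolated values land on the 15-minute grid only when every row inside the gap is present, as it is here (three NA rows between 01:30 and 02:30). If rows are missing altogether, the interpolated times will be evenly spaced but will not fall on quarter-hour marks.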

Data

# OP's data has been slightly modified to include NAs
sasan <- read.table(text = 
"Timestamp           Avg.Hz
1 '12/27/2017 12:15:00 AM'  50.05
2 '12/27/2017 12:30:00 AM'  49.99
3 '12/27/2017 12:45:00 AM'  49.98
4 '12/27/2017 01:00:00 AM'  50.01
5 '12/27/2017 01:15:00 AM'  49.97
6 '12/27/2017 01:30:00 AM'  49.98
7 <NA>                      49.98 
8 <NA>                      50.02 
9 <NA>                      49.95
10 '12/27/2017 02:30:00 AM'  49.98", 
header = TRUE, stringsAsFactors = FALSE)

# convert to POSIXct 
sasan$Timestamp<-as.POSIXct(sasan$Timestamp, format="%m/%d/%Y %I:%M:%S %p")
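
A related situation, which the `seq()` call in the question hints at, is when whole rows are absent rather than present with an NA timestamp. A rough sketch for that case, in base R: build the complete 15-minute grid and left-join the observations onto it. The data frame `obs`, the names `grid` and `full`, and the fixed `tz = "UTC"` are hypothetical and chosen only for illustration.

```r
# Hypothetical observations with two 15-minute rows missing entirely (00:45 and 01:00)
obs <- data.frame(
  Timestamp = as.POSIXct(c("2017-12-27 00:15:00",
                           "2017-12-27 00:30:00",
                           "2017-12-27 01:15:00"), tz = "UTC"),
  Avg.Hz = c(50.05, 49.99, 49.97)
)

# Complete sequence of expected timestamps at 15-minute steps
grid <- data.frame(
  Timestamp = seq(min(obs$Timestamp), max(obs$Timestamp), by = "15 min")
)

# Left join: every grid timestamp is kept, Avg.Hz is NA where no reading exists
full <- merge(grid, obs, by = "Timestamp", all.x = TRUE)
print(full)
```

The resulting NA values in Avg.Hz can then be handled separately, for example with na.locf or na.approx from zoo, depending on what is appropriate for the measurements.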