R xts object - subset data points for 5 consecutive seconds

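I have a large xts object and want to subset it based on the seconds in its time index, keeping only observations that belong to a run of at least 5 consecutive seconds. I have up to 8 data points per second (these should not count as 5 consecutive points, because they were measured within the same second).

And_sub_xts is my xts object: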

> str(And_sub_xts)
An ‘xts’ object on 2010-04-09 20:32:56/2010-04-26 06:56:57 containing:
 Data: chr [1:164421, 1:11] "0.255416" "0.168836" "0.212126" "0.229442" "0.238100" "0.212126" "0.168836" ...
- attr(*, "dimnames")=List of 2
 ..$ : NULL
 ..$ : chr [1:11] "CalSurge" "CalSway" "CalHeave" "Stat_Surge" ...
 Indexed by objects of class: [POSIXct,POSIXt] TZ: 
 xts Attributes:  
NULL

and here are the first 100 values of

abs(diff(.indexsec(And_sub_xts)))

56 8 23 34 40 40 41 42 25 27 34 35 38 38 40 40 41 56 59 59 19 19 20 20 20 20 22 22 23 23 24 24 24 25 25 26 27 27 27 27 27 28 28 30 30 30 37 38 40 40 41 44 44 46 46 47 48 51 52 54 54 54 54 55 56 59 1 4 4 4 6 6 6 6 7 7 11 12 12 14 14 15 16 16 17 18 18 19 19 21 21 22 22 23 23 25 25 26 26 26

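I marked the values to keep in bold, so the subset should contain only those data points.

I just realized that, in theory, some data points could be distributed like this: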

2010-04-09 20:32:20
2010-04-09 20:32:20
2010-04-09 20:32:21
2010-04-09 20:32:22
2010-04-09 20:32:22
2010-04-09 20:40:22
2010-04-09 22:52:23
2010-04-10 20:52:24

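That would not be 5 consecutive seconds, but the .indexsec command cannot detect this. Maybe someone knows a way to solve this as well.

Thanks for your help!

Here is one way to do it. x is sample data whose index values have seconds equal to your first 100 values.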

require(xts)
# sample data
s <- c(56, 8, 23, 34, 40, 40, 41, 42, 25, 27, 34, 35, 38, 38, 40, 
40, 41, 56, 59, 59, 19, 19, 20, 20, 20, 20, 22, 22, 23, 23, 24, 
24, 24, 25, 25, 26, 27, 27, 27, 27, 27, 28, 28, 30, 30, 30, 37, 
38, 40, 40, 41, 44, 44, 46, 46, 47, 48, 51, 52, 54, 54, 54, 54, 
55, 56, 59, 1, 4, 4, 4, 6, 6, 6, 6, 7, 7, 11, 12, 12, 14, 14, 
15, 16, 16, 17, 18, 18, 19, 19, 21, 21, 22, 22, 23, 23, 25, 25, 
26, 26, 26)
S <- cumsum(ifelse(c(0, diff(s)) < 0, 1, 0)) * 60 + s
x <- .xts(seq_along(S), S, tzone="UTC")
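
To see what the cumsum line is doing, here is a tiny illustration on a made-up vector (the names sec, rollovers and mono are hypothetical and not part of the question). It assumes that every drop in the seconds value marks exactly one new minute, which is good enough for building sample data but would not hold for real timestamps with longer gaps.

```r
# hypothetical seconds-of-minute values; a drop means the minute rolled over
sec <- c(56, 59, 59, 1, 4, 4, 2)
# count the rollovers seen so far (assumption: each drop is exactly one minute)
rollovers <- cumsum(diff(c(sec[1], sec)) < 0)
# add 60 seconds per rollover so the values never decrease across a rollover
mono <- rollovers * 60 + sec
mono
```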

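The basic idea is to aggregate the data to 1-second resolution, so that you can use rle (run-length encoding) to find runs of 5 consecutive one-second observations. Then find the first and last timestamp of each such run in the aggregated data, and locate those timestamps in the original data. Finally, use the positions of those timestamps in the original data to build the index sequences that subset the groups of 5 consecutive seconds.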

# aggregate data to 1-second resolution
oneSec <- period.apply(x, endpoints(x, 'seconds'), identity) 
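# find the runs of 5 or more consecutive one-second increments
consec <- rle(diff(.index(oneSec)))
# a run only counts if the increment is exactly 1 second
gte5s <- consec$values == 1 & consec$lengths >= 5
# get the location of the first obs of each run in the 1-second data
# (drop the final cumsum element so the vector lines up with 'gte5s')
begLoc <- head(cumsum(c(1, consec$lengths)), -1)[gte5s]
endLoc <- begLoc + consec$lengths[gte5s]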
# get the timestamp of the first and last obs from the original data
beg <- lapply(index(oneSec)[begLoc], function(i) first(x[i, which.i=TRUE]))
end <- lapply(index(oneSec)[endLoc], function(i) last(x[i, which.i=TRUE]))
# create index vector between each value in 'beg' and 'end'
loc <- unlist(mapply(seq, beg, end))
# subset original object using index vector
X <- x[loc,]
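
For the edge case raised in the question, where adjacent seconds values fall in different minutes or days, comparing full epoch timestamps (what .index() returns for an xts object) instead of .indexsec() avoids the problem. The following is a minimal base-R sketch of that idea and not part of the approach above. The helper name keep_consecutive and its min_secs argument are hypothetical, it assumes sorted timestamps with whole-second resolution, and it counts distinct seconds (5 timestamps, i.e. 4 one-second increments).

```r
# hypothetical helper: TRUE for observations that belong to a run of at least
# `min_secs` distinct, consecutive seconds (assumes sorted, whole-second times)
keep_consecutive <- function(times, min_secs = 5) {
  if (length(times) == 0) return(logical(0))
  secs <- as.numeric(times)   # epoch seconds, unique across minutes and days
  usec <- unique(secs)        # one entry per distinct second
  # start a new run whenever the gap to the previous distinct second is not 1
  run_id <- cumsum(c(TRUE, diff(usec) != 1))
  # number of distinct seconds in the run each distinct second belongs to
  run_len <- ave(usec, run_id, FUN = length)
  # keep every observation whose second lies in a long enough run
  secs %in% usec[run_len >= min_secs]
}

# the timestamps from the question: the seconds values run 20..24, but the
# observations are minutes or days apart, so nothing should be kept
tt <- as.POSIXct(c("2010-04-09 20:32:20", "2010-04-09 20:32:20",
                   "2010-04-09 20:32:21", "2010-04-09 20:32:22",
                   "2010-04-09 20:32:22", "2010-04-09 20:40:22",
                   "2010-04-09 22:52:23", "2010-04-10 20:52:24"), tz = "UTC")
keep_consecutive(tt)
```

With an xts object, the resulting logical vector could presumably be used directly as a row index, for example x[keep_consecutive(index(x)), ].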