How can we remove rows from an xts object based on a seconds criterion

Question

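I have a problem. I want to remove some rows from a large data set. My data arrive every 30 seconds, but I only want one observation per minute, so I want to delete the rows whose timestamp falls on the 30-second mark. To make this clearer, here is a sample, followed by the result I expect.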

time                        value
2021-11-04 05:57:00         0.0
2021-11-04 05:57:30         0.0
2021-11-04 05:58:00         0.0
2021-11-04 05:58:30         0.0
2021-11-04 05:59:00         0.0
2021-11-04 05:59:30         0.0
2021-11-04 06:00:00         0.0
2021-11-04 06:00:30         0.0
2021-11-04 06:01:00         0.0
2021-11-04 06:01:30         0.0
2021-11-04 06:02:00         0.0
2021-11-04 06:02:30         0.0
2021-11-04 06:03:00         0.0
2021-11-04 06:03:30         0.0
2021-11-04 06:04:00         0.0
2021-11-04 06:04:30         0.0
2021-11-04 06:05:00         0.0
2021-11-04 06:05:30         0.0
2021-11-04 06:06:00         0.0
2021-11-04 06:06:30         0.0
2021-11-04 06:07:00         0.0
2021-11-04 06:07:30         0.0

I want it to look like this:

time                        value
2021-11-04 05:57:00         0.0
2021-11-04 05:58:00         0.0
2021-11-04 05:59:00         0.0
2021-11-04 06:00:00         0.0
2021-11-04 06:01:00         0.0
2021-11-04 06:02:00         0.0
2021-11-04 06:03:00         0.0
2021-11-04 06:04:00         0.0
2021-11-04 06:05:00         0.0
2021-11-04 06:06:00         0.0
2021-11-04 06:07:00         0.0

Every row whose timestamp falls on the 30-second mark should be removed from the data set.

Answer 1

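You can first truncate the times to the minute and then remove the duplicates. After truncation the 30-second rows share a timestamp with another row, so they are no longer unique and get dropped: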

library(xts)
xts3 <- xts(x=rnorm(10), order.by=as.POSIXct(strptime("2021-11-04 05:57:00", "%Y-%m-%d %H:%M:%S")+1:10*30), born=as.POSIXct("1899-05-08"))

# Truncate each timestamp down to the start of its minute
index(xts3) <- as.POSIXct(trunc(index(xts3), units="mins"))

# Remove duplicated timestamps, keeping one row per minute
xts3_dup <- make.index.unique(xts3, drop = TRUE)

> xts3
2021-11-04 05:57:00 -0.19766541
2021-11-04 05:58:00 -0.00902353
2021-11-04 05:58:00 -2.56173420
2021-11-04 05:59:00  0.64355622
2021-11-04 05:59:00 -0.18794658
2021-11-04 06:00:00  0.03005718
2021-11-04 06:00:00  0.64367384
2021-11-04 06:01:00  0.74716446
2021-11-04 06:01:00 -0.29986731
2021-11-04 06:02:00 -0.57503711

> xts3_dup
                           [,1]
2021-11-04 05:57:00 -0.19766541
2021-11-04 05:58:00 -0.00902353
2021-11-04 05:59:00  0.64355622
2021-11-04 06:00:00  0.03005718
2021-11-04 06:01:00  0.74716446
2021-11-04 06:02:00 -0.57503711
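
As a small deterministic sketch of the same truncate-then-deduplicate idea, the snippet below uses a hypothetical series named `demo` (not part of the answer) that starts on a whole minute and has values 1 to 6. With `drop = TRUE`, `make.index.unique` keeps the earliest observation for each timestamp, so in this setup the :00 values are the ones that survive. The `tz = "UTC"` setting is only an assumption to make the example reproducible.

```r
library(xts)

# Hypothetical toy data: six observations, 30 seconds apart, starting on a whole minute
times <- as.POSIXct("2021-11-04 05:57:00", tz = "UTC") + 0:5 * 30
demo <- xts(1:6, order.by = times)

# Truncate every timestamp down to the start of its minute
index(demo) <- as.POSIXct(trunc(index(demo), units = "mins"))

# Drop later duplicates; the earliest observation per timestamp is kept
demo_min <- make.index.unique(demo, drop = TRUE)
```

If the series instead starts on a :30 timestamp, as in the random sample above, the first row kept for that first minute is the :30 observation, so it is worth checking which row comes first within each minute.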

Answer 2

Assuming the input x shown in the Note at the end,

1) Use to.minutes like this. If you want to keep the :00 values rather than the :30 values, use 1 instead of 4.

to.minutes(x, indexAt = "startof")[, 4]

giving:

                    x.Close
2021-11-04 05:57:00       2
2021-11-04 05:58:00       4
2021-11-04 05:59:00       6
2021-11-04 06:00:00       8
2021-11-04 06:01:00      10
2021-11-04 06:02:00      12
2021-11-04 06:03:00      14
2021-11-04 06:04:00      16
2021-11-04 06:05:00      18
2021-11-04 06:06:00      20
2021-11-04 06:07:00      22

2) Another possibility is to grep the times ending in 30 and then shift each of those times down by 30 seconds.

xx <- x[grep("30$", time(x))]
time(xx) <- time(xx) - 30

If you want the values corresponding to :00 instead, use this:

x[grep("00$", time(x))]
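
The pattern match in 2) relies on how the timestamps are converted to character strings. A variant that asks for the seconds field explicitly with `format(..., "%S")` might be sketched as follows; `x_demo`, `on_minute` and `on_half` are hypothetical names used only for this illustration, and the UTC time zone is an assumption for reproducibility.

```r
library(xts)

# Hypothetical toy series: 30-second spacing, values 1..6
times <- as.POSIXct("2021-11-04 05:57:00", tz = "UTC") + 0:5 * 30
x_demo <- xts(1:6, order.by = times)

# Extract the seconds field of each timestamp as a two-character string
secs <- format(index(x_demo), "%S")

# Keep only the rows stamped on the whole minute
on_minute <- x_demo[secs == "00"]

# Or keep the :30 rows and relabel them to the start of their minute
on_half <- x_demo[secs == "30"]
index(on_half) <- index(on_half) - 30
```

Both results end up with the same whole-minute timestamps; they differ only in which of the two observations per minute is retained.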

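3) Another approach is to use aggregate.zoo and then convert back to xts. If you want the :00 values, use head instead of tail; if you want the mean over each one-minute period, use mean instead of head or tail and omit the 1.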

as.xts(aggregate(x, as.POSIXct(sub("30$", "00", time(x))), tail, 1))
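
The mean variant mentioned in 3) could look roughly like this. As an assumption for the sketch, the grouping key is built with `trunc()` rather than `sub()`, and `x_demo` and `per_min_mean` are hypothetical names that do not appear in the answer.

```r
library(xts)

# Hypothetical toy series: 30-second spacing, values 1..6
times <- as.POSIXct("2021-11-04 05:57:00", tz = "UTC") + 0:5 * 30
x_demo <- xts(1:6, order.by = times)

# Grouping key: each timestamp truncated to the start of its minute
minute_key <- as.POSIXct(trunc(index(x_demo), units = "mins"))

# Average the observations within each minute, then convert back to xts
per_min_mean <- as.xts(aggregate(x_demo, minute_key, mean))
```

With two observations per minute, each output row is the average of the :00 and :30 values for that minute.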

4) We can take every other element and then shift the times down by 30 seconds as before.

xx <- x[rep(c(FALSE, TRUE), length = nrow(x))]
time(xx) <- time(xx) - 30

If you want the :00 times instead, swap TRUE and FALSE and omit the time adjustment.

x[rep(c(TRUE, FALSE), length = nrow(x))]
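
Selecting every other row by position only gives the intended result when the series alternates strictly between :00 and :30 with no missing observations. A hedged sanity check before applying 4) might be sketched like this; `x_demo`, `gaps`, `regular`, `starts_on_minute` and `every_minute` are hypothetical names for illustration.

```r
library(xts)

# Hypothetical toy series: 30-second spacing, values 1..6
times <- as.POSIXct("2021-11-04 05:57:00", tz = "UTC") + 0:5 * 30
x_demo <- xts(1:6, order.by = times)

# Gaps between consecutive timestamps, in seconds
gaps <- diff(as.numeric(index(x_demo)))
regular <- all(gaps == 30)

# Does the first observation fall on a whole minute?
starts_on_minute <- format(index(x_demo)[1], "%S") == "00"

# Only take every other row when the positional assumption holds
if (regular && starts_on_minute) {
  every_minute <- x_demo[rep(c(TRUE, FALSE), length.out = nrow(x_demo))]
}
```

If either check fails, one of the timestamp-based approaches above is likely the safer choice.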

Note

library(xts)
x <- structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 
16, 17, 18, 19, 20, 21, 22), .Dim = c(22L, 1L), index = structure(c(1636019820, 
1636019850, 1636019880, 1636019910, 1636019940, 1636019970, 1636020000, 
1636020030, 1636020060, 1636020090, 1636020120, 1636020150, 1636020180, 
1636020210, 1636020240, 1636020270, 1636020300, 1636020330, 1636020360, 
1636020390, 1636020420, 1636020450), tzone = "", tclass = c("POSIXct", 
"POSIXt")), class = c("xts", "zoo"))

> x
                    [,1]
2021-11-04 05:57:00    1
2021-11-04 05:57:30    2
2021-11-04 05:58:00    3
2021-11-04 05:58:30    4
2021-11-04 05:59:00    5
2021-11-04 05:59:30    6
2021-11-04 06:00:00    7
2021-11-04 06:00:30    8
2021-11-04 06:01:00    9
2021-11-04 06:01:30   10
2021-11-04 06:02:00   11
2021-11-04 06:02:30   12
2021-11-04 06:03:00   13
2021-11-04 06:03:30   14
2021-11-04 06:04:00   15
2021-11-04 06:04:30   16
2021-11-04 06:05:00   17
2021-11-04 06:05:30   18
2021-11-04 06:06:00   19
2021-11-04 06:06:30   20
2021-11-04 06:07:00   21
2021-11-04 06:07:30   22
