Filter unchanging price portions preceding and/or following {quantmod} series

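I have a large number of stock price series from {BatchGetSymbols}, which I believe is a {quantmod} wrapper, and I would like to filter out long stretches of unchanging price at the start and/or end of each series. Not all tickers have these unchanging portions. In the example below, I want to drop the unchanging $10 prices at the start and the unchanging $5 prices at the end, keeping only the last $10, the first $5, and the varying prices in between.

The unchanging prices are not known in advance and differ from case to case. The first date in a series always carries the first unchanging price, and the last date carries the second unchanging price. I have almost 5 million rows, so I would like a data.table solution.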

``` r
# Data
library(data.table)
data <- 
  data.table::data.table(
    ticker = "stockA",
    date = seq.Date(
      from = as.Date("2017-6-30"),
      to = as.Date("2017-7-19"),
      by = 1
      ),
    price =  c(rep(10, 5), rnorm(10, 8, 1), rep(5, 5))
  )

# Plot showing unchanging portion at start and end
plot(data$price)
```

DESIRED RESULT:

``` r
new_data <- 
  rbind(
    data[!price %in% c(10, 5)],
    data.table(
      ticker = "stockA",
      date = c(as.Date("2017-07-04"), as.Date("2017-07-15")),
      price = c(10, 5)
    ))[order(date)]

new_data
#>     ticker       date     price
#>  1: stockA 2017-07-04 10.000000
#>  2: stockA 2017-07-05  6.890370
#>  3: stockA 2017-07-06  8.137852
#>  4: stockA 2017-07-07  7.759324
#>  5: stockA 2017-07-08  8.861941
#>  6: stockA 2017-07-09  8.250837
#>  7: stockA 2017-07-10  8.570328
#>  8: stockA 2017-07-11  8.826646
#>  9: stockA 2017-07-12  7.872192
#> 10: stockA 2017-07-13  7.755318
#> 11: stockA 2017-07-14  9.731524
#> 12: stockA 2017-07-15  5.000000
```

Created on 2021-07-22 by the reprex package (v2.0.0)

You can use `rleid` to create an id for each run of consecutive equal values, then drop the rows that belong to the first and last runs.

``` r
library(data.table)

# Give each run of consecutive equal prices its own id
data[, id := rleid(price)]
# Drop the leading run (id == 1) and the trailing run (id == max(id))
data[!(price == first(price) & id == 1 | price == last(price) & id == max(id))]

#    ticker       date     price id
# 1: stockA 2017-07-05 9.1267303  2
# 2: stockA 2017-07-06 7.8969750  3
# 3: stockA 2017-07-07 6.4109158  4
# 4: stockA 2017-07-08 7.1900800  5
# 5: stockA 2017-07-09 9.6342601  6
# 6: stockA 2017-07-10 9.4615477  7
# 7: stockA 2017-07-11 9.4091043  8
# 8: stockA 2017-07-12 8.5279983  9
# 9: stockA 2017-07-13 7.7585034 10
#10: stockA 2017-07-14 8.1477831 11
```
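To see why the first and last run ids pick out the flat portions, here is a small sketch of what `rleid` returns. The toy `price` vector is made up for illustration and is not the question's data.

``` r
library(data.table)

# Toy price vector: flat at 10, varying in the middle, flat at 5
price <- c(10, 10, 10, 8.2, 7.9, 5, 5)

# rleid() starts at 1 and increments each time the value changes
id <- rleid(price)
id
#> [1] 1 1 1 2 3 4 4

# The leading flat run is id == 1, the trailing one is id == max(id)
price[id != 1 & id != max(id)]
#> [1] 8.2 7.9
```

Because the id is based only on changes between neighbouring values, the flat price never has to be known in advance.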

To keep the last of the leading prices and the first of the trailing prices as well, use:

``` r
# Keep the last row of the first run, the first row of the last run,
# and every row in between
data[!duplicated(id, fromLast = TRUE) & id == 1 | 
     !duplicated(id) & id == max(id) | between(id, 2, max(id) - 1)]

#    ticker       date      price id
# 1: stockA 2017-07-04 10.0000000  1
# 2: stockA 2017-07-05  9.1267303  2
# 3: stockA 2017-07-06  7.8969750  3
# 4: stockA 2017-07-07  6.4109158  4
# 5: stockA 2017-07-08  7.1900800  5
# 6: stockA 2017-07-09  9.6342601  6
# 7: stockA 2017-07-10  9.4615477  7
# 8: stockA 2017-07-11  9.4091043  8
# 9: stockA 2017-07-12  8.5279983  9
#10: stockA 2017-07-13  7.7585034 10
#11: stockA 2017-07-14  8.1477831 11
#12: stockA 2017-07-15  5.0000000 12
```
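The question mentions many tickers, some of which have no flat portion at all. One way to extend the same idea, offered as a sketch rather than a tested solution for the full 5 million rows, is to compute the run ids per ticker and build a logical flag. The two-ticker table `dt`, the `keep` column, and the prices below are hypothetical and only serve the illustration.

``` r
library(data.table)

# Hypothetical data: stockA has flat ends, stockB has none
dt <- data.table(
  ticker = rep(c("stockA", "stockB"), each = 8),
  date   = rep(seq(as.Date("2017-06-30"), by = 1, length.out = 8), 2),
  price  = c(10, 10, 10, 8.1, 7.4, 5, 5, 5,
             9.3, 8.8, 9.1, 8.5, 8.9, 9.4, 9.0, 8.7)
)

# Run ids are only meaningful if rows are in date order within each ticker
setorder(dt, ticker, date)

# Flag the rows to keep, restarting the run ids for every ticker
dt[, keep := {
  id <- rleid(price)
  last_id <- id[.N]
  (id == 1L & !duplicated(id, fromLast = TRUE)) |  # last row of first run
    (id == last_id & !duplicated(id)) |            # first row of last run
    (id > 1L & id < last_id)                       # everything in between
}, by = ticker]

trimmed <- dt[(keep), .(ticker, date, price)]

# stockA is reduced to 4 rows; stockB keeps all 8 rows
trimmed[, .N, by = ticker]
```

When a ticker has no flat portion, its first and last runs have length one, so the same conditions keep every row instead of dropping the endpoints.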