In R, what is an efficient way of finding the indices of the start and finish of sequences of 1 or more numbers that increase by 1?

Question

I have a vector of numbers:
SampleVector <- c(2,4,7,8,9,12,14,16,17,19,23,24,25,26,27,29)
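I want to find the indices of the elements at the start and finish of each sequence that increases by 1, but I also want the indices of the elements that are not part of any sequence.
Put another way: I want the indices of all elements that are not in the interior of a single-step sequence.
For SampleVector, the indices I want are: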
DesiredIndices <- c(1,2,3,5,6,7,8,9,10,11,15,16)
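That is, every element except the number 8 (because it is inside the 7:9 sequence) and the numbers 24, 25 and 26 (because they are inside the 23:27 sequence).
My best attempt so far is: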

SequenceStartAndEndIndices <- function(vector){
  DifferenceVector          <- diff(vector)
  DiffRunLength             <- rle(DifferenceVector)
  IndicesOfSingleElements   <- which(DifferenceVector > 1) + 1
  IndicesOfEndOfSequences   <- cumsum(DiffRunLength$lengths)[which((DiffRunLength$lengths * DiffRunLength$values) == DiffRunLength$lengths)] + 1
  IndicesOfStartsOfSequences<- c(1,head(IndicesOfEndOfSequences+1,-1))
  UniqueIndices             <- unique(c(IndicesOfStartsOfSequences,IndicesOfEndOfSequences,IndicesOfSingleElements))
  SortedIndices             <- UniqueIndices[order(UniqueIndices)]
  return(SortedIndices)
}

This function gives the right answer:

> SequenceStartAndEndIndices(vector = SampleVector)
 [1]  1  2  3  5  6  7  8  9 10 11 15 16

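...but it is nearly impossible to follow, and it is not clear how generally applicable it is. Is there a better way, or an existing function in some package?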

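For background, the purpose is to help turn long vectors of distance markers into something human-readable. For example, instead of "At km: 1, 8, 9, 10, 11, 13" I would be able to write "At km: 1, 8 to 11 and 13".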

Answer 1

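This should work, because the index of a value is left out only when both conditions hold: 1) the value is 1 greater than the previous value; and 2) it is 1 less than the next value.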

> x <- diff(SampleVector)
> seq_along(SampleVector)[!(c(0, x) == 1 & c(x, 0) == 1)]
 [1]  1  2  3  5  6  7  8  9 10 11 15 16
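
As a rough sketch of the same idea, the one-liner can be wrapped in a function with the two conditions named separately. The function name `boundary_indices` and the empty-input guard are illustrative additions, not part of the answer above.

```r
# Hypothetical helper (the name is illustrative): returns the indices of all
# elements that are NOT in the interior of a run increasing by exactly 1.
boundary_indices <- function(v) {
  if (length(v) == 0) return(integer(0))  # guard added for empty input
  d <- diff(v)
  follows_prev  <- c(FALSE, d == 1)  # TRUE where v[i] == v[i - 1] + 1
  precedes_next <- c(d == 1, FALSE)  # TRUE where v[i + 1] == v[i] + 1
  # An interior element satisfies both conditions; keep everything else.
  which(!(follows_prev & precedes_next))
}

SampleVector <- c(2,4,7,8,9,12,14,16,17,19,23,24,25,26,27,29)
result <- boundary_indices(SampleVector)
print(result)
```

Padding with `FALSE` instead of `0` has the same effect as in the one-liner: the first element never counts as following a predecessor and the last never counts as preceding a successor, so both are always kept.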

Answer 2

You can try using tapply in base R to create groups of consecutive numbers.

SampleVector <- c(2,4,7,8,9,12,14,16,17,19,23,24,25,26,27,29)

toString(tapply(SampleVector, 
         cumsum(c(TRUE, diff(SampleVector) > 1)), function(x) {
          if(length(x) == 1) x else paste(x[1], x[length(x)], sep = ' to ')
}))

#[1] "2, 4, 7 to 9, 12, 14, 16 to 17, 19, 23 to 27, 29"
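
To make the grouping step easier to see, here is a minimal sketch applied to the kilometre example from the question. It assumes the input is sorted, strictly increasing whole numbers; the function name `collapse_runs` is hypothetical, and `split` with `vapply` is used here in place of `tapply` purely for illustration.

```r
# Hypothetical helper (the name is illustrative). Assumes v is sorted,
# strictly increasing whole numbers, as in the question.
collapse_runs <- function(v) {
  if (length(v) == 0) return("")
  # A new group starts at the first element and wherever the gap exceeds 1.
  group_id <- cumsum(c(TRUE, diff(v) > 1))
  labels <- vapply(split(v, group_id), function(g) {
    if (length(g) == 1) as.character(g)
    else paste(g[1], g[length(g)], sep = " to ")
  }, character(1))
  toString(labels)
}

km <- c(1, 8, 9, 10, 11, 13)
group_ids <- cumsum(c(TRUE, diff(km) > 1))  # group label for each element
km_text <- collapse_runs(km)
print(group_ids)
print(km_text)
```

Each `TRUE` in `c(TRUE, diff(km) > 1)` marks the start of a new group, so the cumulative sum assigns the same label to every member of a consecutive run.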

r

sequence