使用 Collections.binarySearch() 进行谓词搜索（即不完全匹配）

Question

我有一个按升序排序的时间戳列表：

List<Instant> timestamps = ...; // note: sorted in ascending order

现在，给定输入时间戳 Instant inputTs，我想在 timestamps 中找到满足 t.isBefore(inputTs) && inputTs.isBefore(t.plusMillis(SOME_CONSTANT)) 的条目 t，即，我只是在寻找一个t 这样 inputTs 位于从 t 开始的某个固定长度持续时间的范围内。我承认理论上可以有多个这样的t，所以允许搜索在这些之间任意选择。

Collections.binarySearch(...) 重载需要一个键，表明 common/intended 用例是搜索 "complete match"/相同的条目（抱歉，缺少更好的词）。但是，在我的例子中，inputTs 与 timestamps 中存在的条目不同，因为 inputTs 预计是某个时间点之后不久timestamps.

中的条目 t

我的想法是简单地让我提供给Collections.binarySearch(...) return的Comparator<Instant>当谓词成立时为0：

public class TimestampFinder {
    private static final long SOME_CONSTANT = 10_000;
    private List<Instant> timestamps = ... ; // initialize and sort

    public Instant find(Instant inputTs) {
        int index = Collections.binarySearch(timestamps, inputTs, (t, key) -> {
            if (t.isBefore(key) && key.isBefore(t.plusMillis(SOME_CONSTANT))) {
                // inputTs is part of the duration after t
                // return 0 to indicate that we've found a match
                return 0;
            }
            // inputTs not in interval
            // use Instant's implementation of Comparator to indicate to binarySearch if it should continue the search in the upper or lower half of the list
            return t.compareTo(key);
        });
        return index >= 0 ? timestamps.get(index) : null;
    }
}

这是解决此问题的正确（有效）方法，还是有我忽略的更好的替代方法？请注意，对 find(Instant) 的调用次数将大大超过 timestamps 中元素的数量，这就是为什么我认为排序 timestamps 所产生的开销是合理的。

Answer 1

Collections.binarySearch 没有用于精确匹配。如文档中所述，如果未找到完全匹配项，它将 return -1 - i 其中 i 是列表中下一个更高元素的索引。

只需按自然顺序搜索 inputTs。如果没有找到，那么你可以从 inputTs 推导出下一个更高的 Instant 的索引（只需做 -1 - resultOfBinarySearch ）。如果该索引处的 Instant 是 before(inputTs.plusMillis(CONSTANT))，那么您就完成了，否则，不存在这样的 Instant。

我认为您现有的解决方案在某些方面滥用了 binarySearch，这是值得的。

使用 Collections.binarySearch() 进行谓词搜索（即不完全匹配）

Using Collections.binarySearch() for predicate search (i.e., not complete match)

java

collections

binary-search

java-time