Given a list of times, how to group them in such a way that close times are in the same group and distant ones are not?

Question

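Suppose I have a list of timestamps of type ZonedDateTime, all on the same date. Instead of printing every one of them, I would like to group them somehow and print only the intervals, e.g.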

07:41:05 - 07:55:46
08:21:35 - 08:45:42  //first being the first elem of the group, second being the last
etc

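My idea is to convert them all to milliseconds first, sort the timestamps, and then pick a value such as 100000 ms as a separator: if two consecutive timestamps are less than 100000 ms apart, I treat them as part of the same group.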
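The sort-and-split idea described above could be sketched roughly as follows. This is only an illustration under assumptions: the class name GapGrouping, the method groupByGap, and the sample times and date are made up for the example, and Duration is used in place of raw millisecond values.

```java
import java.time.Duration;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class GapGrouping {

    // Sorts the timestamps and starts a new group whenever the gap to the
    // previous timestamp is at least maxGap (gaps below maxGap stay together).
    public static List<List<ZonedDateTime>> groupByGap(List<ZonedDateTime> times, Duration maxGap) {
        List<ZonedDateTime> sorted = new ArrayList<>(times);
        sorted.sort(Comparator.comparing(ZonedDateTime::toInstant));

        List<List<ZonedDateTime>> groups = new ArrayList<>();
        List<ZonedDateTime> current = null;
        ZonedDateTime previous = null;
        for (ZonedDateTime t : sorted) {
            if (previous == null || Duration.between(previous, t).compareTo(maxGap) >= 0) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(t);
            previous = t;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: an arbitrary day, deliberately unsorted.
        ZonedDateTime day = ZonedDateTime.of(2020, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC);
        List<ZonedDateTime> times = List.of(
                day.plusHours(8).plusMinutes(21).plusSeconds(35),
                day.plusHours(7).plusMinutes(41).plusSeconds(5),
                day.plusHours(7).plusMinutes(43).plusSeconds(10),
                day.plusHours(8).plusMinutes(22).plusSeconds(40),
                day.plusHours(7).plusMinutes(42));

        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("HH:mm:ss");
        for (List<ZonedDateTime> g : groupByGap(times, Duration.ofMillis(100_000))) {
            // Each group is sorted, so its first and last elements bound the interval.
            System.out.println(g.get(0).format(fmt) + " - " + g.get(g.size() - 1).format(fmt));
        }
        // prints:
        // 07:41:05 - 07:43:10
        // 08:21:35 - 08:22:40
    }
}
```

Because only consecutive gaps are compared, a long chain of closely spaced timestamps still ends up in a single group, however far apart its first and last elements are.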

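In the worst case, every timestamp is within this distance of its neighbour once sorted, and I end up with one giant group whose first and last elements are hours apart, but I hope this is unlikely for the given dataset.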

Is there a better way to do this? The question has not been answered yet.

Answer 1

Using k-means:

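// imports used by this snippet (EKmeans comes from a third-party k-means
// library; its import is not shown in the original)
import static java.time.format.DateTimeFormatter.ISO_LOCAL_TIME;
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;

import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

// sample data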
List<ZonedDateTime> xs = IntStream.range(0, 10).mapToObj(n ->
        ZonedDateTime.now().truncatedTo(ChronoUnit.DAYS)
                .plus(ThreadLocalRandom.current().nextInt(0, 24 * 60), ChronoUnit.MINUTES))
        .collect(toList());

// assume xs is not empty
ZonedDateTime day = xs.get(0).truncatedTo(ChronoUnit.DAYS);

final int WINDOWS = 3;

System.out.printf("== fixed windows (millis precision) using k-means%n");
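// key = whole seconds since the start of the day (the integer division by 1000
// drops the milliseconds before the cast to double)
Map<Double, List<ZonedDateTime>> points = xs.stream()
        .collect(groupingBy(x -> (double) ((x.toInstant().toEpochMilli() - day.toInstant().toEpochMilli()) / 1000), toList()));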
Double[] keys = points.keySet().stream().sorted().toArray(Double[]::new);
double[][] kpoints = new double[keys.length][2];
// put keys along f(x)=0 line
for (int i = 0; i < keys.length; i++) {
    kpoints[i][0] = keys[i];
    kpoints[i][1] = 0;
}
double[][] centroids = new double[WINDOWS][2];
for (int i = 0; i < WINDOWS; i++) {
    centroids[i][0] = ThreadLocalRandom.current().nextDouble(keys[0], keys[keys.length - 1]);
    centroids[i][1] = 0;
}
final EKmeans eKmeans = new EKmeans(centroids, kpoints);
eKmeans.run();
// regroup
int[] igroup = eKmeans.getAssignments();
Map<Integer, List<ZonedDateTime>> groups =
        IntStream.range(0, igroup.length).boxed()
                .collect(groupingBy(i -> igroup[i], collectingAndThen(toList(),
                        rs -> rs.stream().flatMap(r -> points.get(keys[r]).stream()).collect(toList()))));
groups.forEach((k, rs) -> {
    System.out.printf("  - group %d%n", k);
    rs.forEach(r -> System.out.printf("   %s%n", r.format(ISO_LOCAL_TIME)));
});

with output

== fixed windows (millis precision) using k-means
  - group 0
   03:09:00
   03:22:00
   05:22:00
   05:38:00
   07:34:00
  - group 1
   16:30:00
   18:25:00
  - group 2
   11:23:00
   11:48:00
   14:07:00
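To get from groups like these to the "start - end" form asked for in the question, each group can be reduced to its earliest and latest element. The following is a minimal sketch, not part of the original answer: the class GroupIntervals and the method toIntervals are hypothetical names, the date is arbitrary, and the map simply mirrors the shape of the groups variable with the times from the output above.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class GroupIntervals {

    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HH:mm:ss");

    // Reduces every non-empty group to "earliest - latest", ordered by start time.
    public static List<String> toIntervals(Map<Integer, List<ZonedDateTime>> groups) {
        Comparator<ZonedDateTime> byInstant = Comparator.comparing(ZonedDateTime::toInstant);
        List<ZonedDateTime[]> bounds = new ArrayList<>();
        for (List<ZonedDateTime> g : groups.values()) {
            if (g.isEmpty()) {
                continue; // a cluster may end up with no points assigned
            }
            bounds.add(new ZonedDateTime[] {
                    Collections.min(g, byInstant), Collections.max(g, byInstant) });
        }
        // Cluster ids carry no ordering, so sort the intervals by their start.
        bounds.sort((a, b) -> a[0].toInstant().compareTo(b[0].toInstant()));

        List<String> lines = new ArrayList<>();
        for (ZonedDateTime[] b : bounds) {
            lines.add(b[0].format(TIME) + " - " + b[1].format(TIME));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Arbitrary date (an assumption); times copied from the sample output.
        ZonedDateTime day = ZonedDateTime.of(2020, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC);
        Map<Integer, List<ZonedDateTime>> groups = Map.of(
                0, List.of(day.plusHours(3).plusMinutes(9), day.plusHours(3).plusMinutes(22),
                        day.plusHours(5).plusMinutes(22), day.plusHours(5).plusMinutes(38),
                        day.plusHours(7).plusMinutes(34)),
                1, List.of(day.plusHours(16).plusMinutes(30), day.plusHours(18).plusMinutes(25)),
                2, List.of(day.plusHours(11).plusMinutes(23), day.plusHours(11).plusMinutes(48),
                        day.plusHours(14).plusMinutes(7)));

        toIntervals(groups).forEach(System.out::println);
        // prints:
        // 03:09:00 - 07:34:00
        // 11:23:00 - 14:07:00
        // 16:30:00 - 18:25:00
    }
}
```

Sorting by start time is a design choice for readability; the cluster numbers returned by a k-means run do not follow chronological order, as group 1 and group 2 in the output above show.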

java

algorithm

time

grouping

zoneddatetime