Given a list of times, how can I group them so that close times are in the same group and distant ones are not?
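Suppose I have a list of timestamps on the same date, of type ZonedDateTime.
Instead of printing them all out, I would like to group them somehow and print only the intervals, e.g.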
07:41:05 - 07:55:46
08:21:35 - 08:45:42 //first being the first elem of the group, second being the last
etc
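My idea is to convert them all to milliseconds first, sort the timestamps, and pick a value such as 100000 ms as a separator: if two consecutive timestamps are less than 100000 ms apart, I treat them as part of the same group.
In the worst case, every consecutive pair of sorted timestamps is within that distance, and I end up with one giant group whose first and last elements are hours apart, but I hope that is unlikely for the given dataset.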
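A minimal sketch of the sort-and-split idea described above, assuming a 100000 ms threshold. The class and method names (GapGrouping, intervals) and the sample times are made up for illustration.

```java
import java.time.Duration;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class GapGrouping {

    // Sorts the timestamps and starts a new group whenever two consecutive
    // timestamps are maxGap or more apart. Each group is returned as {first, last}.
    static List<ZonedDateTime[]> intervals(List<ZonedDateTime> times, Duration maxGap) {
        List<ZonedDateTime> sorted = new ArrayList<>(times);
        sorted.sort(Comparator.comparing(ZonedDateTime::toInstant));

        List<ZonedDateTime[]> result = new ArrayList<>();
        ZonedDateTime start = null;
        ZonedDateTime prev = null;
        for (ZonedDateTime t : sorted) {
            if (prev == null) {
                start = t; // first element opens the first group
            } else if (Duration.between(prev, t).compareTo(maxGap) >= 0) {
                result.add(new ZonedDateTime[] {start, prev}); // close the current group
                start = t;
            }
            prev = t;
        }
        if (prev != null) {
            result.add(new ZonedDateTime[] {start, prev}); // close the last group
        }
        return result;
    }

    // Hypothetical helper: a time on an arbitrary fixed day in UTC.
    static ZonedDateTime at(int hour, int minute, int second) {
        return ZonedDateTime.parse("2020-01-01T00:00:00Z")
                .withHour(hour).withMinute(minute).withSecond(second);
    }

    public static void main(String[] args) {
        List<ZonedDateTime> times = List.of(
                at(8, 22, 50), at(7, 41, 5), at(7, 43, 10), at(8, 21, 35), at(7, 42, 0));
        DateTimeFormatter f = DateTimeFormatter.ISO_LOCAL_TIME;
        for (ZonedDateTime[] iv : intervals(times, Duration.ofMillis(100_000))) {
            System.out.println(iv[0].format(f) + " - " + iv[1].format(f));
        }
        // prints:
        // 07:41:05 - 07:43:10
        // 08:21:35 - 08:22:50
    }
}
```

This is a single pass after sorting, so the cost is dominated by the sort. It has exactly the chaining behaviour mentioned above: a long run of closely spaced timestamps collapses into one interval.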
Is there a better way?
Using k-means:
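// imports assumed by this snippet (EKmeans is a third-party k-means class, not part of the JDK)
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;
import static java.time.format.DateTimeFormatter.ISO_LOCAL_TIME;
import static java.util.stream.Collectors.*;

// sample data: 10 random minute-aligned times within today
List<ZonedDateTime> xs = IntStream.range(0, 10).mapToObj(n ->
        ZonedDateTime.now().truncatedTo(ChronoUnit.DAYS)
            .plus(ThreadLocalRandom.current().nextInt(0, 24 * 60), ChronoUnit.MINUTES))
    .collect(toList());
// assume xs is not empty
ZonedDateTime day = xs.get(0).truncatedTo(ChronoUnit.DAYS);
final int WINDOWS = 3; // number of groups, fixed up front
System.out.printf("== fixed windows (millis precision) using k-means%n");
// key = whole seconds since the start of the day (the integer division drops the millis)
Map<Double, List<ZonedDateTime>> points = xs.stream()
    .collect(groupingBy(x -> (double) ((x.toInstant().toEpochMilli() - day.toInstant().toEpochMilli()) / 1000), toList()));
Double[] keys = points.keySet().stream().sorted().toArray(Double[]::new);
double[][] kpoints = new double[keys.length][2];
// put keys along f(x)=0 line
for (int i = 0; i < keys.length; i++) {
    kpoints[i][0] = keys[i];
    kpoints[i][1] = 0;
}
// random initial centroids between the smallest and largest key
// (nextDouble requires keys[0] < keys[keys.length - 1], i.e. at least two distinct keys)
double[][] centroids = new double[WINDOWS][2];
for (int i = 0; i < WINDOWS; i++) {
    centroids[i][0] = ThreadLocalRandom.current().nextDouble(keys[0], keys[keys.length - 1]);
    centroids[i][1] = 0;
}
final EKmeans eKmeans = new EKmeans(centroids, kpoints);
eKmeans.run();
// regroup: map each point index back to its original timestamps
int[] igroup = eKmeans.getAssignments();
Map<Integer, List<ZonedDateTime>> groups =
    IntStream.range(0, igroup.length).boxed()
        .collect(groupingBy(i -> igroup[i], collectingAndThen(toList(),
            rs -> rs.stream().flatMap(r -> points.get(keys[r]).stream()).collect(toList()))));
groups.forEach((k, rs) -> {
    System.out.printf(" - group %d%n", k);
    rs.forEach(r -> System.out.printf(" %s%n", r.format(ISO_LOCAL_TIME)));
});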
With output:
== fixed windows (millis precision) using k-means
- group 0
03:09:00
03:22:00
05:22:00
05:38:00
07:34:00
- group 1
16:30:00
18:25:00
- group 2
11:23:00
11:48:00
14:07:00
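The question asks for "first - last" intervals rather than the full group contents. As a rough sketch, a map shaped like groups above could be reduced to intervals as follows; the class name GroupIntervals, the method toIntervals, and the helper at are hypothetical, and the sample values are copied from the output above onto an arbitrary date.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupIntervals {

    // Reduces every non-empty group to "earliest - latest" and orders the
    // resulting intervals by their start time.
    static List<String> toIntervals(Map<Integer, List<ZonedDateTime>> groups) {
        DateTimeFormatter f = DateTimeFormatter.ISO_LOCAL_TIME;
        Comparator<ZonedDateTime> byInstant = Comparator.comparing(ZonedDateTime::toInstant);
        Comparator<ZonedDateTime[]> byStart = Comparator.comparing(p -> p[0].toInstant());
        return groups.values().stream()
                .filter(g -> !g.isEmpty())
                .map(g -> new ZonedDateTime[] {
                        Collections.min(g, byInstant), Collections.max(g, byInstant)})
                .sorted(byStart)
                .map(p -> p[0].format(f) + " - " + p[1].format(f))
                .collect(Collectors.toList());
    }

    // Hypothetical helper: hour and minute on an arbitrary fixed day in UTC.
    static ZonedDateTime at(int hour, int minute) {
        return ZonedDateTime.of(2020, 1, 1, hour, minute, 0, 0, ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        Map<Integer, List<ZonedDateTime>> groups = Map.of(
                0, List.of(at(3, 9), at(3, 22), at(5, 22), at(5, 38), at(7, 34)),
                1, List.of(at(16, 30), at(18, 25)),
                2, List.of(at(11, 23), at(11, 48), at(14, 7)));
        toIntervals(groups).forEach(System.out::println);
        // prints:
        // 03:09:00 - 07:34:00
        // 11:23:00 - 14:07:00
        // 16:30:00 - 18:25:00
    }
}
```

Sorting by start time matters because the cluster indices carry no ordering: in the output above, group 1 holds the latest times and group 2 the midday ones.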