Why is Hazelcast IMap journal not producing all expected events?
The minimal working example below generates events at a fast rate, which update an IMap. The IMap in turn produces update events from its journal.
import com.hazelcast.core.IMap;
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.SourceBuilder;
import com.hazelcast.jet.pipeline.Sources;
import com.hazelcast.jet.pipeline.StreamSource;

import java.util.concurrent.atomic.AtomicLong;

import static com.hazelcast.jet.pipeline.JournalInitialPosition.START_FROM_OLDEST;

public class FastIMapExample {

    private static final int NUMBER_OF_GROUPS = 10;
    private static final int NUMBER_OF_EVENTS = 1000;

    public static void main(String[] args) {
        JetInstance jet = Jet.newJetInstance();
        IMap<Long, Long> groups = jet.getMap("groups");

        Pipeline p1 = Pipeline.create();
        p1.readFrom(fastStreamOfLongs(NUMBER_OF_EVENTS))
          .withoutTimestamps()
          .writeTo(Sinks.mapWithUpdating(groups,
                  event -> event % NUMBER_OF_GROUPS,
                  (oldState, event) -> increment(oldState)
          ));

        Pipeline p2 = Pipeline.create();
        p2.readFrom(Sources.mapJournal(groups, START_FROM_OLDEST))
          .withIngestionTimestamps()
          .map(x -> x.getKey() + " -> " + x.getValue())
          .writeTo(Sinks.logger());

        jet.newJob(p2);
        jet.newJob(p1).join();
    }

    private static StreamSource<Long> fastStreamOfLongs(int numberOfEvents) {
        return SourceBuilder
                .stream("fast-longs", ctx -> new AtomicLong(0))
                .<Long>fillBufferFn((num, buf) -> {
                    long val = num.getAndIncrement();
                    if (val < numberOfEvents) buf.add(val);
                })
                .build();
    }

    private static long increment(Long x) {
        return x == null ? 1 : x + 1;
    }
}
Example output:
3 -> 7
3 -> 50
3 -> 79
7 -> 42
...
6 -> 100
0 -> 82
9 -> 41
9 -> 100
I expected to see exactly 1000 events, one describing each update. Instead, I see roughly 50-80 events. (The output seems to contain each group's final update (i.e. "-> 100"), but beyond that it is a random subset.)
When NUMBER_OF_GROUPS equals NUMBER_OF_EVENTS (or when event generation is artificially slowed down), I receive all 1000 updates.
Is this behavior expected? Is it possible to receive all update events from a fast source?
Sinks.mapWithUpdating uses batching, so some updates are applied locally before the actual updating entry processor is sent. You need to use Sinks.mapWithEntryProcessor, which submits a separate entry processor for each item.
From the JavaDoc of Sinks.mapWithEntryProcessor:
* As opposed to {@link #mapWithUpdating} and {@link #mapWithMerging},
* this sink does not use batching and submits a separate entry processor
* for each received item. For use cases that are efficiently solvable
* using those sinks, this one will perform worse. It should be used only
* when they are not applicable.
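As a sketch, p1's sink in the example above could be rewritten along these lines. This assumes the IMDG 3.x EntryProcessor API (com.hazelcast.map.AbstractEntryProcessor); the 4.x EntryProcessor signature differs.

```java
// Sketch: replace Sinks.mapWithUpdating with Sinks.mapWithEntryProcessor so
// that every event submits its own entry processor and no update is
// coalesced locally before reaching the map (and hence the journal).
p1.readFrom(fastStreamOfLongs(NUMBER_OF_EVENTS))
  .withoutTimestamps()
  .writeTo(Sinks.mapWithEntryProcessor(groups,
          event -> event % NUMBER_OF_GROUPS,           // key extractor
          event -> new AbstractEntryProcessor<Long, Long>() {
              @Override
              public Object process(Map.Entry<Long, Long> entry) {
                  // same increment logic as before, applied on the member
                  Long old = entry.getValue();
                  entry.setValue(old == null ? 1 : old + 1);
                  return null;
              }
          }));
```

As the JavaDoc warns, this trades throughput for completeness: one entry processor per item is slower than the batched sink, so use it only when every journal event matters.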
Keep in mind that the event journal's default capacity is 10K, which with the default partition count of 271 works out to roughly 36 events per partition, not enough to store all the updates at once. In your case, with the default partition count, you'd need to set the capacity to 271K or higher to store all the updates.
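For illustration, with the Jet 3.x / IMDG 3.x config API the journal for the "groups" map could be enabled and sized like this (in IMDG 4.x the journal is configured on the MapConfig instead). The 271_000 figure comes from 1000 events per partition times the default 271 partitions.

```java
// Sketch (Jet 3.x / IMDG 3.x config API): size the event journal of the
// "groups" map so that each partition can hold all 1000 updates.
JetConfig jetConfig = new JetConfig();
jetConfig.getHazelcastConfig().addEventJournalConfig(
        new EventJournalConfig()
                .setMapName("groups")
                .setEnabled(true)
                .setCapacity(271_000)); // capacity is split across 271 partitions
JetInstance jet = Jet.newJetInstance(jetConfig);
```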