Apache Storm 和 Flink 的区别

Question

我正在使用这两个实时数据流框架处理器。我到处搜索，但我找不到这两个框架之间的大区别。特别是我想知道它们如何根据数据大小或拓扑等工作。

Answer 1

区别主要在于处理数据流的抽象级别。

Apache Storm 的级别更低一些，处理连接在一起的数据源 (Spouts) 和处理器 (Bolts)，以反应方式对单个消息执行转换和聚合。

有一个 Trident API 从这个低级消息驱动的视图中抽象出一点，变成更聚合的查询，如构造，这使得事情更容易集成在一起。（还有一个类似 SQL 的接口用于查询数据流，但它仍被标记为实验性的。）

来自文档：

TridentState wordCounts =
     topology.newStream("spout1", spout)
       .each(new Fields("sentence"), new Split(), new Fields("word"))
       .groupBy(new Fields("word"))
       .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))                
       .parallelismHint(6);

Apache Flink 有一个更像函数式的接口来处理事件。如果您习惯了 Java 8 风格的流处理（或其他功能风格的语言，如 Scala 或 Kotlin），这看起来会很熟悉。它还有一个很好的基于网络的监控工具。它的好处是它有内置的结构，可以按时间 windows 等进行聚合。（在 Storm 中，你也可以使用 Trident 来做到这一点）。

来自文档：

 DataStream<WordWithCount> windowCounts = text
            .flatMap(new FlatMapFunction<String, WordWithCount>() {
                @Override
                public void flatMap(String value, Collector<WordWithCount> out) {
                    for (String word : value.split("\s")) {
                        out.collect(new WordWithCount(word, 1L));
                    }
                }
            })
            .keyBy("word")
            .timeWindow(Time.seconds(5), Time.seconds(1))
            .reduce(new ReduceFunction<WordWithCount>() {
                @Override
                public WordWithCount reduce(WordWithCount a, WordWithCount b) {
                    return new WordWithCount(a.word, a.count + b.count);
                }
            });

当我评估两者时，我选择了 Flink，只是因为当时感觉它的文档更完备，而且我更容易上手。风暴稍微模糊一些。有一个course on Udacity让我对它有了更深的理解，但最终还是觉得Flink更符合我的需求

您可能还想看看这个，虽然有点旧，所以这两个项目一定是从那时起发展的。

Apache Storm 和 Flink 的区别

Difference between Apache Storm and Flink

cluster-computing

bigdata

apache-storm

apache-flink