Apache Apex 与 Apache Storm 有何不同?
How Apache Apex is different from Apache Storm?
Apache Apex looks similar to Apache Storm。
- 用户在两个平台上构建 application/topology 作为有向无环图 (DAG)。 Apex 使用 operators/streams 而 Storm 使用 spouts/streams/bolts。
- 它们都实时处理数据,而不是批处理。
- 两者似乎都具有高吞吐量和低延迟
因此,乍一看,两者看起来很相似,但我不太明白其中的区别。有人可以解释一下主要区别是什么吗?换句话说,我什么时候应该使用一个而不是另一个?
体系结构存在根本差异,这使得每个平台在延迟、扩展和状态管理方面都大不相同。
在最基本的层面上,
- Apache Storm 使用记录确认来保证消息传递。
- Apache Apex 使用检查点来保证消息传递。
您可以在以下博客中了解更多差异,其中还包括其他主流流处理平台。
https://databaseline.wordpress.com/2016/03/12/an-overview-of-apache-streaming-technologies/
架构和功能
+-------------------+---------------------------+---------------------+
| | Storm | Apex |
+-------------------+---------------------------+---------------------+
| Model | Native Streaming | Native Streaming |
| | Micro batch (Trident | |
+-------------------+---------------------------+---------------------+
| Language | Java. | Java (Scala) |
| | Ability to use non | |
| | JVM languages support | |
+-------------------+---------------------------+---------------------+
| API | Compositional | Compositional (DAG) |
| | Declarative (Trident) | Declarative |
| | Limited SQL | |
| | support (Trident) | |
+-------------------+---------------------------+---------------------+
| Locality | Data Locality | Advance Processing |
+-------------------+---------------------------+---------------------+
| Latency | Low | Very Low |
| | High (Trident) | |
+-------------------+---------------------------+---------------------+
| Throughput | Limited in Ack mode | Very high |
+-------------------+---------------------------+---------------------+
| Scalibility | Limited due to Ack | Horizontal |
+-------------------+---------------------------+---------------------+
| Partitioning | Standard | Advance |
| | Set parallelism at work, | Parallel pipes, |
| | executor and task level | unifiers |
+-------------------+---------------------------+---------------------+
| Connector Library | Limited (certification) | Rich library of |
| | | connectors in |
| | | Apex Malhar |
+-------------------+---------------------------+---------------------+
可操作性
+------------+--------------------------+---------------------+
| | Storm | Apex |
+------------+--------------------------+---------------------+
| State | External store | Checkpointing |
| Management | Limited checkpointing | Local checkpointing |
| | Difficult to exploit | |
| | local state | |
+------------+--------------------------+---------------------+
| Recovery | Cumbersome API to | Incremental |
| | store and retrieve state | (buffer server) |
| | Require user code | |
+------------+--------------------------+---------------------+
| Processing | At least once | |
| Semantic | Exactly once require | At least once |
| | user code and affect | End to end |
| | latency | |
| | | exactly once |
+------------+--------------------------+---------------------+
| Back | Watermark on queue | Automatic |
| Pressure | size for spout and bolt | Buffer server |
| | Does not scale | memory and disk |
+------------+--------------------------+---------------------+
| Elasticity | Through CLI only | Yes w/ full user |
| | | control |
+------------+--------------------------+---------------------+
| Dynamic | No | Yes |
| topology | | |
+------------+--------------------------+---------------------+
| Security | Kerberos | Kerberos, RBAC, |
| | | LDAP |
+------------+--------------------------+---------------------+
| Multi | Mesos, RAS - memory, | YARN |
| Tenancy | CPU, YARN | full isolation |
+------------+--------------------------+---------------------+
| DevOps | REST API | REST API |
| Tools | Basic UI | DataTorrent RTS |
+------------+--------------------------+---------------------+
来源:
网络研讨会:Apache Apex(下一代 Hadoop)与 Storm - 比较和迁移大纲 https://www.youtube.com/watch?v=sPjyo2HfD_I
Apache Apex looks similar to Apache Storm。
- 用户在两个平台上构建 application/topology 作为有向无环图 (DAG)。 Apex 使用 operators/streams 而 Storm 使用 spouts/streams/bolts。
- 它们都实时处理数据,而不是批处理。
- 两者似乎都具有高吞吐量和低延迟
因此,乍一看,两者看起来很相似,但我不太明白其中的区别。有人可以解释一下主要区别是什么吗?换句话说,我什么时候应该使用一个而不是另一个?
体系结构存在根本差异,这使得每个平台在延迟、扩展和状态管理方面都大不相同。
在最基本的层面上,
- Apache Storm 使用记录确认来保证消息传递。
- Apache Apex 使用检查点来保证消息传递。
您可以在以下博客中了解更多差异,其中还包括其他主流流处理平台。
https://databaseline.wordpress.com/2016/03/12/an-overview-of-apache-streaming-technologies/
架构和功能
+-------------------+---------------------------+---------------------+
| | Storm | Apex |
+-------------------+---------------------------+---------------------+
| Model | Native Streaming | Native Streaming |
| | Micro batch (Trident | |
+-------------------+---------------------------+---------------------+
| Language | Java. | Java (Scala) |
| | Ability to use non | |
| | JVM languages support | |
+-------------------+---------------------------+---------------------+
| API | Compositional | Compositional (DAG) |
| | Declarative (Trident) | Declarative |
| | Limited SQL | |
| | support (Trident) | |
+-------------------+---------------------------+---------------------+
| Locality | Data Locality | Advance Processing |
+-------------------+---------------------------+---------------------+
| Latency | Low | Very Low |
| | High (Trident) | |
+-------------------+---------------------------+---------------------+
| Throughput | Limited in Ack mode | Very high |
+-------------------+---------------------------+---------------------+
| Scalibility | Limited due to Ack | Horizontal |
+-------------------+---------------------------+---------------------+
| Partitioning | Standard | Advance |
| | Set parallelism at work, | Parallel pipes, |
| | executor and task level | unifiers |
+-------------------+---------------------------+---------------------+
| Connector Library | Limited (certification) | Rich library of |
| | | connectors in |
| | | Apex Malhar |
+-------------------+---------------------------+---------------------+
可操作性
+------------+--------------------------+---------------------+
| | Storm | Apex |
+------------+--------------------------+---------------------+
| State | External store | Checkpointing |
| Management | Limited checkpointing | Local checkpointing |
| | Difficult to exploit | |
| | local state | |
+------------+--------------------------+---------------------+
| Recovery | Cumbersome API to | Incremental |
| | store and retrieve state | (buffer server) |
| | Require user code | |
+------------+--------------------------+---------------------+
| Processing | At least once | |
| Semantic | Exactly once require | At least once |
| | user code and affect | End to end |
| | latency | |
| | | exactly once |
+------------+--------------------------+---------------------+
| Back | Watermark on queue | Automatic |
| Pressure | size for spout and bolt | Buffer server |
| | Does not scale | memory and disk |
+------------+--------------------------+---------------------+
| Elasticity | Through CLI only | Yes w/ full user |
| | | control |
+------------+--------------------------+---------------------+
| Dynamic | No | Yes |
| topology | | |
+------------+--------------------------+---------------------+
| Security | Kerberos | Kerberos, RBAC, |
| | | LDAP |
+------------+--------------------------+---------------------+
| Multi | Mesos, RAS - memory, | YARN |
| Tenancy | CPU, YARN | full isolation |
+------------+--------------------------+---------------------+
| DevOps | REST API | REST API |
| Tools | Basic UI | DataTorrent RTS |
+------------+--------------------------+---------------------+
来源: 网络研讨会:Apache Apex(下一代 Hadoop)与 Storm - 比较和迁移大纲 https://www.youtube.com/watch?v=sPjyo2HfD_I