Is the Hadoop framework really not suitable for real-time operation?
I read in a blog:

Hadoop is batch-processing centric, ideal for the discovery, exploration and analysis of large amounts of multi-structured data that doesn't fit nicely into tables, and not suitable for real-time operations.

So, could anyone help me with a better explanation of this, such as why it is not suitable for real-time operations? Thanks.
Hadoop MapReduce is not suited to real-time processing.

But things are changing now: Storm and Spark, for example, offer near-real-time processing.
Spark uses in-memory computation for faster processing; its in-memory abstraction is the RDD (Resilient Distributed Dataset).
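As a rough illustration of the idea (plain Python, not the actual Spark API — `ToyRDD` is an invented stand-in), an RDD-style pipeline keeps every intermediate dataset in memory and chains transformations, instead of writing each stage back to disk the way MapReduce does:

```python
from functools import reduce as _reduce

# Toy sketch of RDD-style in-memory transformations.
# This is plain Python, NOT the Spark API; ToyRDD is invented here
# purely to illustrate chaining map/filter/reduce over in-memory data.

class ToyRDD:
    def __init__(self, data):
        self.data = list(data)  # dataset held in memory

    def map(self, fn):
        return ToyRDD(fn(x) for x in self.data)

    def filter(self, pred):
        return ToyRDD(x for x in self.data if pred(x))

    def reduce(self, fn):
        return _reduce(fn, self.data)

events = ToyRDD([1, 2, 3, 4, 5])
total = (events.map(lambda x: x * x)       # squares stay in memory
               .filter(lambda x: x > 4)    # no disk round-trip between stages
               .reduce(lambda a, b: a + b))
print(total)  # 9 + 16 + 25 = 50
```

Because intermediate results never hit disk, a chain of transformations like this runs far faster than the equivalent sequence of MapReduce jobs.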
Storm uses a DAG of spouts (sources) and bolts (sinks). This is called a topology, and a topology keeps running: it takes data from spouts and feeds it to bolts. Bolts can write the data to a database or make it available to users. This reduces processing time.
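The spout-to-bolt flow can be sketched roughly like this (plain Python, not the Storm API — `Spout`, `Bolt`, and `run_topology` are invented stand-ins): tuples stream out of a source and are processed one at a time as they arrive, rather than waiting for a batch job.

```python
# Toy sketch of the spout -> bolt flow in a Storm-like topology.
# Plain Python, NOT the Storm API; Spout and Bolt here are invented
# stand-ins showing tuples streaming from a source through stages.

class Spout:
    """Emits tuples one at a time (a source)."""
    def __init__(self, records):
        self.records = records

    def next_tuple(self):
        yield from self.records

class Bolt:
    """Processes each tuple as it arrives (a processing stage / sink)."""
    def __init__(self, fn):
        self.fn = fn

    def execute(self, tup):
        return self.fn(tup)

def run_topology(spout, bolts):
    out = []
    for tup in spout.next_tuple():   # tuples flow through continuously
        for bolt in bolts:           # each bolt transforms the tuple
            tup = bolt.execute(tup)
        out.append(tup)              # e.g. write to a database
    return out

spout = Spout(["click", "view", "click"])
count_chars = Bolt(lambda t: (t, len(t)))
results = run_topology(spout, [count_chars])
print(results)  # [('click', 5), ('view', 4), ('click', 5)]
```

In real Storm the topology never terminates and bolts run in parallel across a cluster; the point here is only that each tuple is processed the moment it arrives.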
For real-time access, you have HBase, which is part of the Hadoop ecosystem:
Apache HBase is the Hadoop database, a distributed, scalable, big data store.

When Would I Use Apache HBase?

Use Apache HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
Features
- Linear and modular scalability.
- Strictly consistent reads and writes.
- Automatic and configurable sharding of tables
- Automatic failover support between RegionServers.
- Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
- Easy to use Java API for client access.
- Block cache and Bloom Filters for real-time queries.
- Query predicate push down via server side Filters
- Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options
- Extensible jruby-based (JIRB) shell
- Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX
It also supports atomic counters, one of HBase's strongest points. With careful, planned row-key and schema design, counters can help you reduce the need for large analytics jobs.
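To see why atomic counters reduce the need for batch analytics, here is a toy sketch (plain Python with a lock, not the HBase API — HBase exposes the same idea through its counter-increment operation): totals are maintained atomically at write time, so reading one is a single lookup instead of a scan-and-aggregate job run later.

```python
import threading

# Toy sketch of write-time aggregation with atomic counters.
# Plain Python, NOT the HBase API; CounterStore is an invented stand-in.
# The point: totals are updated as events arrive, so no batch job is
# needed afterwards to compute them.

class CounterStore:
    def __init__(self):
        self._lock = threading.Lock()
        self._counters = {}

    def increment(self, row_key, amount=1):
        # Atomic read-modify-write, like an HBase counter column.
        with self._lock:
            self._counters[row_key] = self._counters.get(row_key, 0) + amount
            return self._counters[row_key]

    def get(self, row_key):
        # O(1) lookup -- no scan over raw events required.
        return self._counters.get(row_key, 0)

store = CounterStore()
for event in ["page:home", "page:about", "page:home"]:
    store.increment(event)       # aggregate at write time

print(store.get("page:home"))    # 2
```

The row-key design matters because it determines what you can count cheaply: if the key encodes, say, page and day, per-page-per-day totals are available instantly without any MapReduce pass.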