Is the Hadoop framework really unsuitable for real-time operations?

I read the following in a blog:

Hadoop is batch processing centric ideal for the discovery, exploration and analysis of large amounts of multi-structured data that doesn’t fit nicely into table, and not suitable for real-time operations.

Can anyone give me a better explanation of why it is not suitable for real-time operations? Thanks.

Hadoop MapReduce is not suitable for real-time processing.
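To make the reason concrete, here is a toy latency model. It is only a sketch: the function names, the 60-second batch window, the 30-second job runtime and the 0.05-second per-event cost are all made-up assumptions, not measurements of any real cluster. The point is that in a batch model a result only becomes visible after the whole window has closed and the job has finished, so latency is bounded below by the job runtime no matter how small the individual event is.

```python
# Toy latency model; every number here is an illustrative assumption.

def batch_latencies(arrivals, batch_interval, job_runtime):
    """Seconds until each event's result is visible under periodic batch jobs.

    An event waits for its window [k*I, (k+1)*I) to close, and then for the
    whole job over that window to finish.
    """
    latencies = []
    for t in arrivals:
        window_end = (t // batch_interval + 1) * batch_interval
        latencies.append(window_end + job_runtime - t)
    return latencies


def streaming_latencies(arrivals, per_event_cost):
    """Seconds until each event's result is visible when events are handled
    one at a time on arrival (queueing delays are ignored in this sketch)."""
    return [per_event_cost for _ in arrivals]


arrivals = [0, 10, 59, 60, 119]            # hypothetical arrival times (s)
print(batch_latencies(arrivals, 60, 30))   # [90, 80, 31, 90, 31]
print(streaming_latencies(arrivals, 0.05))
```

Even the luckiest event in the batch case (arriving just before the window closes) still waits for the full job runtime.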

But things are changing now. For example, Storm and Spark provide near-real-time processing.

Spark uses in-memory computation to achieve faster processing. It uses the RDD (Resilient Distributed Dataset) as its in-memory abstraction.
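The idea behind the RDD can be illustrated with a small single-process toy. This is not the Spark API: `ToyRDD` and its methods are hypothetical names that only mimic the shape of the real thing (lazy transformations, an action that triggers computation, and an optional in-memory cache so later actions skip re-reading the source). Distribution and fault recovery are left out entirely.

```python
# Toy stand-in for the RDD idea; NOT the Spark API.

class ToyRDD:
    def __init__(self, compute):
        self._compute = compute      # zero-arg callable: how to (re)build the data
        self._cache_enabled = False
        self._cached = None          # in-memory copy, filled by the first action

    def map(self, fn):
        # Lazy: nothing runs until collect() is called on the result.
        return ToyRDD(lambda: [fn(x) for x in self.collect()])

    def filter(self, fn):
        return ToyRDD(lambda: [x for x in self.collect() if fn(x)])

    def cache(self):
        self._cache_enabled = True
        return self

    def collect(self):
        """Action: compute the data, or serve it from memory if cached."""
        if self._cached is not None:
            return list(self._cached)
        data = list(self._compute())
        if self._cache_enabled:
            self._cached = data
        return list(data)


source_reads = []

def load_numbers():
    source_reads.append(1)           # count how often the "slow" source is read
    return range(10)

even_squares = (ToyRDD(load_numbers)
                .map(lambda x: x * x)
                .filter(lambda x: x % 2 == 0)
                .cache())
```

Calling `even_squares.collect()` twice reads the source only once; the second call is answered from memory, which is where the speed-up over re-reading from disk for every job comes from.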

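Storm uses a DAG of spouts (sources) and bolts (processing steps, which can also act as sinks). This DAG is called a topology, and a topology keeps running once it is started. That is, it takes data from the spouts and feeds it to the bolts. Bolts can write this data to a database or make it available to users. Because each tuple is processed as it arrives rather than waiting for a batch, processing time is reduced.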
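A rough single-process sketch of the spout-to-bolt flow, using plain Python generators. This is not the Storm API; the function names are made up for illustration, and a real topology runs continuously over an unbounded stream on a cluster, whereas this toy stops when its finite input list is exhausted.

```python
from collections import Counter

# Toy spout/bolt pipeline; NOT the Storm API.

def sentence_spout(sentences):
    """Spout: the source that emits tuples into the topology."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream):
    """Bolt: splits each incoming sentence into lower-cased words."""
    for sentence in stream:
        for word in sentence.split():
            yield word.lower()


def count_bolt(stream, store):
    """Bolt acting as a sink: updates a running count per word in `store`
    (standing in for a database) and emits the updated (word, count)."""
    for word in stream:
        store[word] += 1
        yield word, store[word]


def run_topology(sentences):
    """Wire spout -> split bolt -> count bolt and drain the stream."""
    store = Counter()
    updates = list(count_bolt(split_bolt(sentence_spout(sentences)), store))
    return store, updates


store, updates = run_topology(["the cat", "The dog"])
print(updates)
```

Each word flows through both bolts before the next sentence is even read, so the counts in `store` are current after every tuple instead of after a whole job.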

For real-time processing you have HBase, which is part of the Hadoop ecosystem:

http://hbase.apache.org/

Apache HBase is the Hadoop database, a distributed, scalable, big data store.

When Would I Use Apache HBase?

Use Apache HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

Features

  • Linear and modular scalability.
  • Strictly consistent reads and writes.
  • Automatic and configurable sharding of tables
  • Automatic failover support between RegionServers.
  • Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
  • Easy to use Java API for client access.
  • Block cache and Bloom Filters for real-time queries.
  • Query predicate push down via server side Filters
  • Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options
  • Extensible jruby-based (JIRB) shell
  • Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX

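It also supports atomic counters, which are one of HBase's strongest points and can help you reduce the need for large analytics jobs (through careful, planned row key and schema design).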
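As a rough illustration of the counter idea, here is an in-memory toy, not HBase itself. `ToyCounterTable`, `record_page_view`, the `page#day` row key layout and the `stats:views` column are all hypothetical choices made for this sketch. In the real Java client the corresponding operation is `Table.incrementColumnValue`, which performs the read-modify-write on the server as one atomic step.

```python
import threading
from collections import defaultdict

# In-memory toy that mimics atomic counters; NOT the HBase client API.

class ToyCounterTable:
    def __init__(self):
        self._lock = threading.Lock()
        self._cells = defaultdict(int)   # (row_key, column) -> counter value

    def increment(self, row_key, column, amount=1):
        """Atomically add `amount` to a cell and return the new value."""
        with self._lock:
            self._cells[(row_key, column)] += amount
            return self._cells[(row_key, column)]

    def get(self, row_key, column):
        with self._lock:
            return self._cells.get((row_key, column), 0)


def record_page_view(table, page, day):
    # Hypothetical row key design: one row per page per day, so a daily
    # total is a single-cell read instead of a scan over raw events.
    row_key = f"{page}#{day}"
    return table.increment(row_key, "stats:views")


table = ToyCounterTable()
workers = [
    threading.Thread(
        target=lambda: [record_page_view(table, "/home", "2024-01-01")
                        for _ in range(1000)]
    )
    for _ in range(8)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(table.get("/home#2024-01-01", "stats:views"))  # 8000
```

Because the aggregate is maintained at write time, the "views per page per day" question is answered by reading one cell rather than by running a batch job over the event log.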