Would an HBase Scan perform better with multiple Column Families or a single Column Family?

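I want to store an object (the payload) along with some metadata in HBase.

I then want to run queries against the table and pull out the payload based on the metadata.

For example, say I have the following column qualifiers:

P: the payload (larger than M1 + M2)
M1: metadata 1
M2: metadata 2

I would then run a query such as:

Fetch all payloads where M1='search-key1' && M2='search-key2'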

Does it make sense to:

1. Keep M1 and M2 in one column family and P in another? Would the scan be faster?
2. Keep all 3 columns in the same column family?
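
To make the query and the two layouts concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the HBase client API: the family names `m`, `p` and `d`, the row keys, and the helpers `scan_payloads` and `make_rows` are all hypothetical and exist only for illustration.

```python
# Toy in-memory model of the query; this is NOT the HBase client API.
# A row maps (column_family, qualifier) -> value, loosely mirroring HBase cells.

def scan_payloads(rows, meta_family, payload_family, key1, key2):
    """Return {row_key: payload} for rows whose M1 and M2 match the search keys."""
    matches = {}
    for row_key, cells in sorted(rows.items()):  # iterate in row-key order
        if (cells.get((meta_family, "M1")) == key1
                and cells.get((meta_family, "M2")) == key2):
            matches[row_key] = cells.get((payload_family, "P"))
    return matches


def make_rows(meta_family, payload_family):
    """Build the same three hypothetical rows under a given family layout."""
    data = [
        ("row1", "search-key1", "search-key2", b"payload-1"),
        ("row2", "search-key1", "other", b"payload-2"),
        ("row3", "search-key1", "search-key2", b"payload-3"),
    ]
    return {
        key: {
            (meta_family, "M1"): m1,
            (meta_family, "M2"): m2,
            (payload_family, "P"): payload,
        }
        for key, m1, m2, payload in data
    }


# Option 1: metadata in family "m", payload in family "p" (hypothetical names).
two_families = scan_payloads(
    make_rows("m", "p"), "m", "p", "search-key1", "search-key2"
)
# Option 2: everything in a single family "d" (hypothetical name).
one_family = scan_payloads(
    make_rows("d", "d"), "d", "d", "search-key1", "search-key2"
)
```

Both layouts return the same rows, so the choice affects only how much data the server has to read, which this toy model does not measure. Answering that part would still take a test against a real cluster.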

Normally I would run a spike to find out (and I may still need to), but I thought I'd ask first.

I would try to follow the advice given in the HBase Reference and go with option #2 (keep all 3 columns in the same column family):

Try to make do with one column family if you can in your schemas. Only introduce a second and third column family in the case where data access is usually column scoped; i.e. you query one column family or the other but usually not both at the one time.

hbase

mapreduce