HBase number of regions keeps growing

Question

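We are using HBase version 1.1.4. The database has around 40 tables, and each table's data has a TimeToLive specified. It is deployed on a 5-node cluster, and the following is the hbase-site.xml: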

<property>
<name>phoenix.query.threadPoolSize</name>
<value>2048</value>
</property>

<property>
<name>hbase.hregion.max.filesize</name>
<value>21474836480</value>
</property>

<property>
<name>hbase.hregion.memstore.block.multiplier</name>
<value>4</value>
</property>
<!-- default is 64MB 67108864 -->
<property>
<name>hbase.hregion.memstore.flush.size</name>
<value>536870912</value>
</property>
<!-- default is 7, should be at least 2x compactionThreshold -->
<property>
<name>hbase.hstore.blockingStoreFiles</name>
<value>240</value>
</property>
<property>
<name>hbase.client.scanner.caching</name>
<value>10000</value>
</property>

<property>
<name>hbase.bucketcache.ioengine</name>
<value>offheap</value>
</property>
<property>
<name>hbase.bucketcache.size</name>
<value>40960</value>
</property>

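The problem is that the number of regions on each region server keeps growing. Currently we only merge regions manually, using merge_region in the hbase shell.

Is there any way to have only a fixed number of regions on each server, or an automated way to merge the regions?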

Answer 1

Is there any way to have only a fixed number of regions, on each server, or an automated way to merge the regions?

One way I have implemented this is by creating the table with pre-split regions. For example:

create 'test_table', 'f1', SPLITS=> ['1', '2', '3', '4', '5', '6', '7', '8', '9']

Then design a good rowkey that starts with 1-9.

You can use the Guava murmur hash as shown below.

import com.google.common.base.Charsets;
import com.google.common.hash.HashCode;
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;

/**
 * Returns the 128-bit murmur3 hash of the given string.
 *
 * @param content the original row key as a string
 * @return HashCode
 */
public static HashCode getMurmurHash(String content) {
    final HashFunction hf = Hashing.murmur3_128();
    return hf.newHasher().putString(content, Charsets.UTF_8).hash();
}

// rowKey is your original row key as a String
final long hash = getMurmurHash(rowKey).asLong();
// hash % 9 is in [-8, 8]; abs plus 1 maps it to 1..9, matching the split points above
final int prefix = (int) Math.abs(hash % 9) + 1;

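Now prepend this prefix to your rowkey.

For example:

1rowkey1 // goes to the region that starts at '1'
2rowkey2 // goes to the region that starts at '2'
3rowkey3 // goes to the region that starts at '3'
...
9rowkey9 // goes to the region that starts at '9'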
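To make the mapping from prefix to region concrete, here is a minimal sketch in plain Java. It does not talk to HBase; it only mimics how a sorted list of split points divides the key space. Note that the nine split points in the create statement define ten key ranges, the first of which (keys below '1') stays empty when every key starts with 1-9. The class name RegionLookup and the method regionIndex are hypothetical, and the comparison assumes ASCII row keys.

```java
import java.util.Arrays;
import java.util.List;

public class RegionLookup {
    // Split points from the create statement; nine points define ten key ranges.
    static final List<String> SPLITS =
            Arrays.asList("1", "2", "3", "4", "5", "6", "7", "8", "9");

    /**
     * Returns the 0-based index of the key range containing rowKey.
     * Index 0 is the range below the first split point; index i (i >= 1)
     * is the range that starts at SPLITS.get(i - 1).
     */
    static int regionIndex(String rowKey) {
        int index = 0;
        for (String split : SPLITS) {
            // Assumption: ASCII keys, so String order matches byte order.
            if (rowKey.compareTo(split) >= 0) {
                index++;
            } else {
                break;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"0abc", "1rowkey1", "5rowkey5", "9rowkey9"}) {
            System.out.println(key + " -> range " + regionIndex(key));
        }
    }
}
```

With a uniform hash behind the prefix, each of the nine populated ranges should receive roughly the same share of rows.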

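If you are pre-splitting and want to manage region splits manually, you can also disable automatic splitting by setting hbase.hregion.max.filesize to a high number and setting the split policy to ConstantSizeRegionSplitPolicy. However, you should use a safeguard value such as 100GB, so that regions do not grow beyond a region server's capabilities. You can consider disabling automatic splitting and relying on the initial set of regions from pre-splitting if, for example, you use uniform hashes for your key prefixes and can ensure that both the read/write load on each region and its size are uniform across the regions of the table.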
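Since hbase.hregion.max.filesize is given in bytes, it may help to see the numbers. The small sketch below converts sizes to bytes, assuming "GB" here means GiB (1024^3 bytes); the class name MaxFileSize and the helper gib are hypothetical. The value in the question, 21474836480, corresponds to 20 GiB.

```java
public class MaxFileSize {
    /** Converts a size in GiB to bytes (assumption: 1 GiB = 1024^3 bytes). */
    static long gib(long n) {
        return n * 1024L * 1024L * 1024L;
    }

    public static void main(String[] args) {
        // Value currently used in the question's hbase-site.xml
        System.out.println("20 GiB  = " + gib(20));
        // Safeguard value suggested when automatic splitting is effectively disabled
        System.out.println("100 GiB = " + gib(100));
    }
}
```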


Answer 2

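Well, this mostly depends on your data: how it is distributed across keys. Assuming all keys have values of roughly the same size, you can use partitioning.

For example, if your table key is a String and you want 100 regions, use this: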

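// Needs org.apache.hadoop.hbase.util.Bytes and org.apache.commons.lang.StringUtils.
// ZERO_BYTE was not defined in the original snippet; a single 0x00 separator is assumed.
private static final byte[] ZERO_BYTE = new byte[] {0};

public static byte[] hashKey(String key) {
    // key.hashCode() % 100 is in [-99, 99], so Math.abs is safe here
    int partition = Math.abs(key.hashCode() % 100);
    String prefix = partitionPrefix(partition);
    // Result: two-digit prefix, separator byte, original key
    return Bytes.add(Bytes.toBytes(prefix), ZERO_BYTE, Bytes.toBytes(key));
}

public static String partitionPrefix(int partition) {
    // Left-pad to two digits: 7 -> "07"
    return StringUtils.leftPad(String.valueOf(partition), 2, '0');
}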

In this case, all your keys will be prefixed with the numbers 00-99, so you have 100 partitions for 100 regions. Now you can disable region splits:

HTableDescriptor td = new HTableDescriptor(TableName.valueOf("myTable"));
td.setRegionSplitPolicyClassName("org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy");

or via the shell:

alter 'myTable', {TABLE_ATTRIBUTES => {METADATA => {'SPLIT_POLICY' => 'org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy'}}}
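The answer does not show how the 100 regions themselves come into existence. One plausible approach, assuming the table is pre-split on the two-digit prefixes at creation time, is to generate the 99 split keys "01" through "99" and pass them to the create call. The sketch below is plain Java without HBase dependencies; the class name PrefixSplits and both method names are hypothetical, and the bucket computation simply mirrors the Math.abs(hashCode % 100) logic from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PrefixSplits {
    /** Two-digit bucket prefix for a key, using the same formula as hashKey. */
    static String bucketPrefix(String key, int buckets) {
        int bucket = Math.abs(key.hashCode() % buckets);
        return String.format(Locale.ROOT, "%02d", bucket);
    }

    /**
     * Split keys for the given number of buckets (at most 100 here, because
     * of the two-digit format). N regions need N - 1 split keys: "01".."99"
     * for 100 buckets.
     */
    static List<String> splitKeys(int buckets) {
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < buckets; i++) {
            splits.add(String.format(Locale.ROOT, "%02d", i));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> splits = splitKeys(100);
        System.out.println(splits.size() + " split keys, from "
                + splits.get(0) + " to " + splits.get(splits.size() - 1));
        System.out.println("prefix for \"user-42\": " + bucketPrefix("user-42", 100));
    }
}
```

The resulting strings could then be converted to bytes and supplied as split keys when the table is created, so that the region count is fixed from the start and the disabled split policy keeps it that way.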


hadoop

hbase

apache-spark