How to count all rows on an HBase table using Scala

Question

We can count all rows with the HBase shell using this command: count 'table_name', INTERVAL => 1, or simply count 'table_name'.

But how can I do this programmatically in Scala?

Answer 1

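Although my experience is with the HBase Java client, I researched this and found the following. A Java code snippet:

Assuming an open table handle, you can use KeyOnlyFilter() so that only keys (no cell values) are fetched, then loop over the scanner like this: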

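    ResultScanner scanner = table.getScanner(new Scan().setFilter(new KeyOnlyFilter())); // keys only, no values
    long number = 0;
    for (Result rs = scanner.next(); rs != null; rs = scanner.next()) {
        number++; // each Result is one row
    }
    scanner.close();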
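Scala has no C-style for loop, so the Java idiom above (call next() until it returns null) is usually written with Iterator.continually and takeWhile. This is a minimal, HBase-independent sketch of that translation; the object ScanLoop and the method countUntilNull are hypothetical names:

```scala
object ScanLoop {
  // Calls `next` repeatedly and counts the results until `next` returns null,
  // equivalent to the Java loop:
  //   for (Result rs = scanner.next(); rs != null; rs = scanner.next()) number++;
  def countUntilNull[T <: AnyRef](next: () => T): Long =
    Iterator
      .continually(next())
      .takeWhile(_ != null)
      .foldLeft(0L)((count, _) => count + 1)
}
```

With HBase you would pass () => scanner.next() for an open ResultScanner. Since ResultScanner is also a java.lang.Iterable of Result, converting its iterator() to a Scala Iterator is another option.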

In the same way, you can adapt the Scala HBase example below. Please look at the Java API; adapting it to Scala should be relatively easy. The example below shows part of the sample Java code adapted to Scala:

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
    import org.apache.hadoop.hbase.util.Bytes

    // HBase 1.0+ client API (new HBaseAdmin/new HTable are deprecated or removed)
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    val admin = connection.getAdmin()

    // list the tables
    admin.listTableNames().foreach(println)

    // let's insert some data in 'mytable' and get the row
    val table = connection.getTable(TableName.valueOf("mytable"))

    val thePut = new Put(Bytes.toBytes("rowkey1"))
    thePut.addColumn(Bytes.toBytes("ids"), Bytes.toBytes("id1"), Bytes.toBytes("one"))
    table.put(thePut)

    val theGet = new Get(Bytes.toBytes("rowkey1"))
    val result = table.get(theGet)
    println(Bytes.toString(result.value()))

    // release client resources
    table.close()
    admin.close()
    connection.close()
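The example above writes and reads a single row but does not count anything. Here is a minimal sketch of counting rows with the same connection API and the KeyOnlyFilter loop from the Java snippet. It assumes an HBase 1.x or 2.x hbase-client on the classpath and a reachable cluster; the object CountRows, its methods, and the default table name mytable are illustrative:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory, Scan}
import org.apache.hadoop.hbase.filter.KeyOnlyFilter

object CountRows {
  // A scan whose cells come back with empty values, so mostly keys are transferred.
  def keyOnlyScan(): Scan = {
    val scan = new Scan()
    scan.setFilter(new KeyOnlyFilter())
    scan
  }

  // Counts rows by iterating the scanner; each Result returned is one row.
  def countRows(connection: Connection, tableName: String): Long = {
    val table = connection.getTable(TableName.valueOf(tableName))
    try {
      val scanner = table.getScanner(keyOnlyScan())
      try {
        var number = 0L
        var rs = scanner.next()
        while (rs != null) {
          number += 1
          rs = scanner.next()
        }
        number
      } finally scanner.close()
    } finally table.close()
  }

  def main(args: Array[String]): Unit = {
    val tableName = if (args.nonEmpty) args(0) else "mytable"
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    try {
      println(s"$tableName has ${countRows(connection, tableName)} rows")
    } finally {
      connection.close()
    }
  }
}
```

This scans the whole table from a single client, so it takes time proportional to the table size.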

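However, as additional information (and a better way than either Java or Scala client code), see below.

RowCounter is a MapReduce job that counts all the rows of a table. It is a good utility to use as a sanity check to make sure HBase can read all the blocks of a table if there are any concerns about metadata inconsistency. It runs the MapReduce job in a single local process, but it will run faster if you have a MapReduce cluster in place for it to use.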

$ hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename>

Usage: RowCounter [options] 
    <tablename> [          
        --starttime=[start] 
        --endtime=[end] 
        [--range=[startKey],[endKey]] 
        [<column1> <column2>...]
    ]
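To launch RowCounter from Scala instead of from the shell, one option is Hadoop's ToolRunner. This sketch assumes an HBase version in which RowCounter implements Hadoop's Tool interface (as far as I know this holds for 1.x and later; in 2.x the class ships in the hbase-mapreduce module) and that the HBase MapReduce jars are on the classpath; RunRowCounter is a hypothetical name:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.RowCounter
import org.apache.hadoop.util.ToolRunner

object RunRowCounter {
  def main(args: Array[String]): Unit = {
    // Same arguments as the command line: <tablename> [--starttime=...] [--range=...] ...
    val rowCounterArgs = if (args.nonEmpty) args else Array("mytable")
    // ToolRunner handles generic Hadoop options, then calls RowCounter's run(...)
    val exitCode = ToolRunner.run(HBaseConfiguration.create(), new RowCounter(), rowCounterArgs)
    sys.exit(exitCode)
  }
}
```

The row count is reported in the job counters printed when the job finishes (the ROWS counter), not returned by ToolRunner.run, which only returns the exit code.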

Answer 2

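With the Java client, you can scan the whole table efficiently using a FirstKeyOnlyFilter. This way essentially only the row keys, rather than the full row data, are transferred to your client code, so it is much faster. This is also what count 'tablename' does in the shell.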
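A sketch of the same shell-style approach in Scala, assuming HBase 1.x/2.x client jars and Scala 2.13+ (for scala.jdk.CollectionConverters). FirstKeyOnlyFilter returns at most one cell per row, and progress is reported every interval rows, similar to the shell's INTERVAL option. The defaults (10-row scanner caching, progress every 1000 rows) follow the defaults described in the shell's count help; ShellStyleCount and its methods are hypothetical names:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory, Scan}
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter
import scala.jdk.CollectionConverters._

object ShellStyleCount {
  // One cell per row, scanner caching enabled, no block-cache pollution.
  def shellStyleScan(caching: Int): Scan = {
    val scan = new Scan()
    scan.setFilter(new FirstKeyOnlyFilter())
    scan.setCaching(caching)
    scan.setCacheBlocks(false)
    scan
  }

  // Counts items, calling `report` every `interval` items (like INTERVAL => n).
  def countWithProgress[A](rows: Iterator[A], interval: Long)(report: Long => Unit): Long =
    rows.foldLeft(0L) { (n, _) =>
      val next = n + 1
      if (next % interval == 0) report(next)
      next
    }

  def count(connection: Connection, tableName: String, interval: Long = 1000L): Long = {
    val table = connection.getTable(TableName.valueOf(tableName))
    try {
      val scanner = table.getScanner(shellStyleScan(caching = 10))
      try {
        countWithProgress(scanner.iterator().asScala, interval)(n => println(s"Current count: $n"))
      } finally {
        scanner.close()
      }
    } finally {
      table.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    try {
      println(s"${count(connection, args.headOption.getOrElse("mytable"))} row(s)")
    } finally {
      connection.close()
    }
  }
}
```

On Scala 2.12, scala.collection.JavaConverters._ provides the same asScala conversion.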
