Iterating over an Accumulo table with Scala
I have a table in Accumulo named records. Each row_id has several column families and qualifiers. In the Accumulo shell it looks like this:
michaelp@accumulo records> scan
2016-10-17 16:27:55,359 [Shell.audit] INFO : michaelp@accumulo records> scan
E001 department:sales [] 0
E001 hire_date:20160101 [] 0
E001 name:bob [] 0
E001 name:jerry [] 0
E002 department:marketing [] 0
E002 hire_date:20160202 [] 0
E002 name:sarah [] 0
E003 department:engineering [] 0
E003 hire_date:20160303 [] 0
E003 name:joe [] 0
I want to be able to scan through these rows using the Scala connector. After the required imports, my code looks like this:
var opts = new ClientOnRequiredTable()
var bsOpts = new BatchScannerOpts()
opts.parseArgs("test", Array("-t", "records","-u", "michaelp", "-p", "****", "-z", "zookeeper:2181", "-i", "accumulo"), bsOpts)
var connector = opts.getConnector()
var batchReader = connector.createBatchScanner("records", opts.auths, bsOpts.scanThreads)
batchReader.setTimeout(bsOpts.scanTimeout, TimeUnit.MILLISECONDS)
var x = new Range()
var y = new LinkedList[Range]
y.add(x)
batchReader.setRanges(y)
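I pass in an empty range to get every row in the table. The problem shows up when I try to iterate over the results: it gets stuck on the first entry.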
scala> while (batchReader.iterator.hasNext()) {println(batchReader.iterator.next.getKey().toString())}
E001 department:sales [] 1476720996135 false
E001 department:sales [] 1476720996135 false
E001 department:sales [] 1476720996135 false
E001 department:sales [] 1476720996135 false
E001 department:sales [] 1476720996135 false
E001 department:sales [] 1476720996135 false
E001 department:sales [] 1476720996135 false
E001 department:sales [] 1476720996135 false
E001 department:sales [] 1476720996135 false
E001 department:sales [] 1476720996135 false
E001 department:sales [] 1476720996135 false
E001 department:sales [] 1476720996135 false
E001 department:sales [] 1476720996135 false
E001 department:sales [] 1476720996135 false
E001 department:sales [] 1476720996135 false
E001 department:sales [] 1476720996135 false
...
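So why doesn't the iterator advance?
Because every call to batchReader.iterator creates a new iterator. Your loop calls it twice per pass, once for hasNext and once for next, so each call works on a fresh iterator that starts again at the first entry. Instead, get the iterator once and reuse it: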
val iterator = batchReader.iterator
while (iterator.hasNext) {
  println(iterator.next.getKey().toString())
}
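To see the effect without a running Accumulo instance, here is a minimal sketch that reproduces it with a plain java.lang.Iterable. FakeScanner, buggy and fixed are hypothetical names made up for this illustration and are not part of the Accumulo API. The only assumption carried over is that iterator() hands back a fresh iterator on every call, as described above. The buggy loop is capped with a limit so it terminates.

```scala
import java.lang.{Iterable => JIterable}
import java.util.{ArrayList => JArrayList, Iterator => JIterator}
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for a scanner: every call to iterator()
// starts a new pass over the data from the first element.
class FakeScanner(rows: String*) extends JIterable[String] {
  private val data = new JArrayList[String]()
  rows.foreach(r => data.add(r))

  override def iterator(): JIterator[String] = data.iterator()
}

object IteratorDemo {
  // Buggy pattern: iterator() is re-evaluated for hasNext and for next,
  // so the loop never moves past the first element. `limit` caps the loop.
  def buggy(scanner: FakeScanner, limit: Int): List[String] = {
    val seen = ListBuffer[String]()
    while (scanner.iterator().hasNext && seen.size < limit) {
      seen += scanner.iterator().next()
    }
    seen.toList
  }

  // Fixed pattern: obtain the iterator once and keep advancing it.
  def fixed(scanner: FakeScanner): List[String] = {
    val seen = ListBuffer[String]()
    val it = scanner.iterator()
    while (it.hasNext) {
      seen += it.next()
    }
    seen.toList
  }

  def main(args: Array[String]): Unit = {
    val scanner = new FakeScanner("E001", "E002", "E003")
    println(buggy(scanner, 4)) // List(E001, E001, E001, E001)
    println(fixed(scanner))    // List(E001, E002, E003)
  }
}
```

The same rule applies to any Java Iterable used from Scala: bind the iterator to a val before the loop rather than calling iterator inside the loop condition and body.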