How to read data from Bigtable in Google Cloud Dataproc
I am trying to read data from Bigtable in Google Cloud Dataproc.
Below is the code I am using to read data from Bigtable.
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
options.setRunner(BlockingDataflowPipelineRunner.class);

Scan scan = new Scan();
scan.setFilter(new FirstKeyOnlyFilter());

Pipeline p = Pipeline.create(options);
p.apply(Read.from(CloudBigtableIO.read(new CloudBigtableScanConfiguration.Builder()
        .withProjectId("xxxxxxxx")
        .withZoneId("xxxxxxx")
        .withClusterId("xxxxxx")
        .withTableId("xxxxx")
        .withScan(scan)
        .build())))
 .apply(ParDo.named("Reading data from big table").of(new DoFn<Result, Mutation>() {
     @Override
     public void processElement(DoFn<Result, Mutation>.ProcessContext arg0) throws Exception {
         System.out.println("Inside printing");
         if (arg0 == null) {
             System.out.println("arg0 is null");
         } else {
             System.out.println("arg0 is not null");
             System.out.println(arg0.element());
         }
     }
 }));
p.run();
Whenever I call arg0.element() inside my method, I get the following error:
2017-03-21T12:29:28.884Z: Error: (deec5a839a59cbca): java.lang.ArrayIndexOutOfBoundsException: 12338
at org.apache.hadoop.hbase.KeyValue.keyToString(KeyValue.java:1231)
at org.apache.hadoop.hbase.KeyValue.keyToString(KeyValue.java:1190)
at com.google.bigtable.repackaged.com.google.cloud.hbase.adapters.read.RowCell.toString(RowCell.java:234)
at org.apache.hadoop.hbase.client.Result.toString(Result.java:804)
at java.lang.String.valueOf(String.java:2994)
at java.io.PrintStream.println(PrintStream.java:821)
at com.slb.StarterPipeline.processElement(StarterPipeline.java:102)
Can anyone tell me what I am doing wrong?
Unfortunately, this is a known issue. We have fixed the underlying implementation and hope to release a new version of the client in the next week or so. In the meantime, I would suggest changing this line:
System.out.println(arg0.element());
to something like:
System.out.println(Bytes.toStringBinary(arg0.element().getRow()));
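In context, the processElement body would then look something like the sketch below (Bytes here is the HBase utility class org.apache.hadoop.hbase.util.Bytes, which the bigtable-hbase-dataflow dependency already provides):

.apply(ParDo.named("Reading data from big table").of(new DoFn<Result, Mutation>() {
    @Override
    public void processElement(DoFn<Result, Mutation>.ProcessContext arg0) throws Exception {
        Result row = arg0.element();
        // Print only the row key; Bytes.toStringBinary renders arbitrary
        // bytes safely and avoids the broken Result.toString() code path.
        System.out.println(Bytes.toStringBinary(row.getRow()));
    }
}));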
Sorry for the trouble.