Using the phoenix-spark plugin to insert an ARRAY type

Question

I have a problem. I have a Spark RDD that I need to store in an HBase table. We use the Apache Phoenix layer to talk to the database. One column of the table is defined as UNSIGNED_SMALLINT ARRAY:

CREATE TABLE EXAMPLE (..., Col10 UNSIGNED_SMALLINT ARRAY, ...);

As stated in the Phoenix documentation, the ARRAY data type is backed by java.sql.Array.

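I am using the phoenix-spark plugin to save the RDD's data to the table. The problem is that I don't know how to create an instance of java.sql.Array without any kind of Connection object. An excerpt of the code follows (the code is in Scala):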

// Map RDD into an RDD of sequences or tuples
rdd.map {
  value =>
    (/* ... */
     value.getArray(),   // Array of Int to convert into a java.sql.Array
     /* ... */
    )
}.saveToPhoenix("EXAMPLE", Seq(/* ... */, "Col10", /* ... */), conf, zkUrl)

What is the correct way to proceed? Is there any way to do what I need?

Answer 1

The Phoenix folks answered the question above by email. I am reporting their answer here for the benefit of anyone who comes across this later.

For saving arrays, you can use the plain old scala Array type. You can see the tests for an example: https://github.com/apache/phoenix/blob/master/phoenix-spark/src/it/scala/org/apache/phoenix/spark/PhoenixSparkIT.scala#L408-L427

Note that saving arrays is only supported in Phoenix 4.5.0, although the patch is quite small if you need to apply it yourself: https://issues.apache.org/jira/browse/PHOENIX-1968
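To make the reply concrete, here is a minimal sketch of saving a plain Scala Array through phoenix-spark, modeled on the linked integration test. Several things in it are assumptions and not part of the original question: the "ID" key column, the sample rows, the ZooKeeper address "localhost:2181", and the conversion of Int elements to Short (UNSIGNED_SMALLINT maps to java.lang.Short in Phoenix, but whether the plugin would coerce Int elements on its own may depend on the Phoenix version).

```scala
import org.apache.phoenix.spark._   // brings saveToPhoenix into scope for RDDs of tuples
import org.apache.spark.{SparkConf, SparkContext}

object SaveArrayToPhoenix {
  def main(args: Array[String]): Unit = {
    // The master URL is expected to be supplied by spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("phoenix-array-sketch"))

    // Hypothetical sample data: (ID, values destined for Col10).
    val rows = Seq(
      (1L, Array(1, 2, 3)),
      (2L, Array(10, 20))
    )

    sc.parallelize(rows)
      // Assumption: convert Int elements to Short to match UNSIGNED_SMALLINT.
      .map { case (id, ints) => (id, ints.map(_.toShort)) }
      // No java.sql.Array and no Connection needed: the tuple carries a plain Scala Array.
      .saveToPhoenix(
        "EXAMPLE",
        Seq("ID", "Col10"),               // "ID" is a hypothetical key column
        zkUrl = Some("localhost:2181")    // hypothetical ZooKeeper quorum
      )

    sc.stop()
  }
}
```

The tuple elements must appear in the same order as the column names passed to saveToPhoenix. Running this requires a reachable Phoenix cluster and a Phoenix version that includes the PHOENIX-1968 change.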

Great answer. Thanks to the Phoenix folks.
