Importing blob data from RDBMS (Sybase) to Cassandra

I am trying to import a large amount of blob data (about 10 TB) from an RDBMS (Sybase ASE) into Cassandra, using DataStax Enterprise (DSE) 5.0.

Is Sqoop still the recommended approach in DSE 5.0? According to the release notes (http://docs.datastax.com/en/latest-dse/datastax_enterprise/RNdse.html):

Hadoop and Sqoop are deprecated. Use Spark instead. (DSP-7848)

So should I use Spark SQL with the JDBC data source to load the data from Sybase, and then save the DataFrame to a Cassandra table?
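A minimal PySpark sketch of that approach, for illustration only. It assumes: the Sybase jConnect JDBC driver jar is on the Spark classpath (the URL format and the driver class `com.sybase.jdbc4.jdbc.SybDriver` are jConnect conventions; adjust them if you use another driver such as jTDS), the target Cassandra table already exists with matching column names, and the source table has a numeric key usable for splitting the read. All host, table, keyspace and column names are hypothetical. Keep in mind that the DSE documentation quoted in the question lists writing to blob columns from Spark as unsupported, so validate this on a small sample before committing to 10 TB.

```python
from typing import Dict


def sybase_jdbc_options(host: str, port: int, database: str, table: str,
                        user: str, password: str, partition_column: str,
                        lower_bound: int, upper_bound: int,
                        num_partitions: int) -> Dict[str, str]:
    """Build Spark JDBC data source options for a parallel read from Sybase ASE."""
    return {
        # jConnect-style URL (assumption: jConnect driver, not jTDS).
        "url": f"jdbc:sybase:Tds:{host}:{port}/{database}",
        "driver": "com.sybase.jdbc4.jdbc.SybDriver",  # assumed jConnect 7 class
        "dbtable": table,
        "user": user,
        "password": password,
        # These four options make Spark issue num_partitions range queries
        # over partition_column in parallel instead of a single query.
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
    }


def copy_table(sql_context, jdbc_options: Dict[str, str],
               keyspace: str, cassandra_table: str) -> None:
    """Read one table over JDBC and append it to an existing Cassandra table."""
    # Works with a Spark 1.6 SQLContext or a Spark 2.x SparkSession:
    # both expose .read, and DataFrames expose .write.
    df = sql_context.read.format("jdbc").options(**jdbc_options).load()
    # Data source name registered by the DataStax Spark Cassandra Connector.
    # Append mode writes into the existing table instead of relying on the
    # default ErrorIfExists save mode.
    (df.write.format("org.apache.spark.sql.cassandra")
       .options(keyspace=keyspace, table=cassandra_table)
       .mode("append")
       .save())


def main() -> None:
    # Imported lazily so the helpers above can be used without PySpark installed.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="sybase-to-cassandra")
    opts = sybase_jdbc_options("sybase-host", 5000, "mydb", "dbo.documents",
                               "etl_user", "secret", "doc_id", 1, 10000000, 64)
    copy_table(SQLContext(sc), opts, "my_keyspace", "documents")
    sc.stop()
```

To run it as a job, add a call to `main()` at the bottom of the script and submit it with `dse spark-submit`, passing the path to the JDBC driver jar via `--jars`. `numPartitions` controls how many concurrent queries hit Sybase, so it is a trade-off between load throughput and pressure on the source database.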

Is there a better way? Any help or suggestions would be appreciated.

Edit: According to the DSE documentation (http://docs.datastax.com/en/latest-dse/datastax_enterprise/spark/sparkIntro.html), writing to blob columns from Spark is not supported.

The following Spark features and APIs are not supported:

Writing to blob columns from Spark

Reading columns of all types is supported; however, you must convert collections of blobs to byte arrays before serialising.

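Spark is the preferred tool for ETL of large data sets because it performs distributed ingestion. Oracle data can be loaded into a Spark RDD or DataFrame and then simply written out with saveToCassandra(keyspace, tablename). At Cassandra Summit 2016, Jim Hatcher gave a talk, Using Spark to Load Oracle Data into Cassandra, that discusses this topic in depth and provides examples.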
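To make "distributed ingestion" concrete: a JDBC read in Spark is parallelized by giving each task its own WHERE clause over a key column. The function below is a hypothetical illustration of that idea (it follows the same open-ended first/last range approach Spark uses for `partitionColumn`/`lowerBound`/`upperBound`, but it is not Spark's code). It assumes an integer key whose range is roughly known.

```python
from typing import List


def partition_predicates(column: str, lower: int, upper: int,
                         num_partitions: int) -> List[str]:
    """Split the key range [lower, upper) into one WHERE clause per Spark task.

    The first range also takes NULL keys and keys below `lower`, and the last
    range is open-ended, so no source row is dropped if the bounds are only
    estimates.
    """
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    # Never create more ranges than there are integer key values in the range.
    n = max(1, min(num_partitions, upper - lower))
    if n == 1:
        return ["1 = 1"]  # a single task reads the whole table
    stride = (upper - lower) // n
    predicates = []
    for i in range(n):
        start = lower + i * stride
        end = start + stride
        if i == 0:
            predicates.append(f"{column} < {end} OR {column} IS NULL")
        elif i == n - 1:
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {end}")
    return predicates
```

For example, `partition_predicates("id", 0, 100, 4)` yields four clauses covering `< 25` (plus NULLs), `[25, 50)`, `[50, 75)` and `>= 75`. PySpark's `DataFrameReader.jdbc(url, table, predicates=..., properties=...)` accepts such a list directly, which can help when the key is unevenly distributed and equal-width ranges would leave some tasks with far more blob data than others.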

Sqoop has been deprecated, but it should still work in DSE 5.0. If this is a one-time load and you are already comfortable with Sqoop, give it a try.

cassandra

datastax-enterprise

datastax