为什么 Cassandra TableWriter 写入 0 条记录以及如何解决?

Why Cassandra TableWriter writing 0 records and how to fix it?

我正在尝试将 RDD 写入 Cassandra table。 如下图TableWriter多次写入0行,最后写入Cassandra。

18/10/22 07:15:50 INFO TableWriter: Wrote 0 rows to log_by_date in 0.171 s.
18/10/22 07:15:50 INFO Executor: Finished task 4.0 in stage 0.0 (TID 4). 622 bytes result sent to driver
18/10/22 07:15:50 INFO TableWriter: Wrote 0 rows to log_by_date in 0.220 s.
18/10/22 07:15:50 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 665 bytes result sent to driver
18/10/22 07:15:50 INFO TableWriter: Wrote 0 rows to log_by_date in 0.194 s.
18/10/22 07:15:50 INFO TableWriter: Wrote 0 rows to log_by_date in 0.224 s.
18/10/22 07:15:50 INFO Executor: Finished task 6.0 in stage 0.0 (TID 6). 708 bytes result sent to driver
18/10/22 07:15:50 INFO TableWriter: Wrote 0 rows to log_by_date in 0.231 s.
18/10/22 07:15:50 INFO Executor: Finished task 5.0 in stage 0.0 (TID 5). 622 bytes result sent to driver
18/10/22 07:15:50 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 622 bytes result sent to driver
18/10/22 07:15:50 INFO TableWriter: Wrote 0 rows to log_by_date in 0.246 s.
18/10/22 07:15:50 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 708 bytes result sent to driver
18/10/22 07:15:50 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 418 ms on localhost (executor driver) (1/8)
18/10/22 07:15:50 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 433 ms on localhost (executor driver) (2/8)
18/10/22 07:15:50 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 426 ms on localhost (executor driver) (3/8)
18/10/22 07:15:50 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 433 ms on localhost (executor driver) (4/8)
18/10/22 07:15:50 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 456 ms on localhost (executor driver) (5/8)
18/10/22 07:15:50 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 436 ms on localhost (executor driver) (6/8)
18/10/22 07:15:50 INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 424 ms on localhost (executor driver) (7/8)
18/10/22 07:15:50 INFO **TableWriter: Wrote 1 rows to log_by_date in 0.342 s.**

为什么之前多次保存失败,如何调整它以适应生产?

这不是 user10465355 指出的故障。当 Spark 将作业分解为任务时,工作可能没有均匀分布,或者没有足够的工作让每个任务都有工作要做。这会导致某些任务为空,因此当它们由 Spark Cassandra Connector 处理时,它们会写入 0 行。

例如说;

  1. 您将 100 条记录读入 10 个 Spark Partitions/Tasks
  2. 你做了一个过滤器,用过滤器消除了值,所以现在 5 个任务中只剩下 30 条记录。其他5个是空的。
  3. 当您写入时,您现在只会看到为 5 个任务写入的记录,并且 5 个任务将报告它们没有写入行。