spark-cassandra-connector configuration: concurrent.reads vs input.reads_per_sec

I'm confused after reading https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md#read-tuning-parameters

concurrent.reads: Sets read parallelism for joinWithCassandra tables.

input.reads_per_sec: Sets max requests per core per second for joinWithCassandraTable.

A description of concurrent.reads from an SDE at DataStax: https://groups.google.com/a/lists.datastax.com/d/msg/spark-connector-user/PaQm1LT7Qlk/h41WLnHfBAAJ

Concurrent reads set to 4 means in a 4 core spark executor means, 16 requests will run MAX at the same time.

It looks like concurrent.reads and input.reads_per_sec do the same thing.

What is the real difference between them?

They are not the same, but they can be considered related:

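  • concurrent.reads defines how many requests each core may have outstanding at the same time (so-called in-flight requests). In some cases you may want to lower it from the default so that the Cassandra nodes are not overloaded by processing too many requests in parallel;
  • input.reads_per_sec defines how many requests each core may issue per second, regardless of how many are in flight at any given moment.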
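
To make the relationship concrete, here is a rough back-of-the-envelope model in Python. It is not connector code: the function names and the `avg_latency_ms` parameter are made up for illustration, and it assumes every request takes the same time. Under that assumption the in-flight cap implies its own throughput ceiling (requests in flight divided by latency), and the effective per-core rate is the lower of that ceiling and input.reads_per_sec.

```python
def per_core_limits(concurrent_reads, reads_per_sec, avg_latency_ms):
    """Return (max in-flight requests, max sustained requests/sec) for one core.

    Simplified model: assumes every request takes exactly avg_latency_ms.
    """
    # With at most `concurrent_reads` requests outstanding, each taking
    # avg_latency_ms, one core cannot complete more than this many per second.
    concurrency_bound = concurrent_reads * 1000 / avg_latency_ms
    # The rate limit applies independently; the tighter bound wins.
    return concurrent_reads, min(reads_per_sec, concurrency_bound)


def per_executor_limits(cores, concurrent_reads, reads_per_sec, avg_latency_ms):
    """Scale the per-core limits to an executor with `cores` cores."""
    in_flight, rate = per_core_limits(concurrent_reads, reads_per_sec, avg_latency_ms)
    return cores * in_flight, cores * rate


# 4 cores with concurrent.reads = 4: at most 16 requests in flight,
# which matches the quoted explanation from the mailing list.
print(per_executor_limits(cores=4, concurrent_reads=4,
                          reads_per_sec=1000, avg_latency_ms=10))
```

With a 10 ms latency the in-flight cap is the binding limit in this model (400 requests per second per core), so raising input.reads_per_sec further would change nothing. Set input.reads_per_sec to 100 instead and the rate limit becomes the binding one, even though up to 16 requests may still be in flight on the executor.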