Spark SQL Configurations

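What kinds of parameters can be set through Spark SQL? My assumption is that Spark accepts parameters prefixed with spark.sql and ignores any that do not start with spark.sql, and that other parameters can only be supplied when the Spark session is created.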

For example, spark.sql.autoBroadcastJoinThreshold, spark.sql.broadcastTimeout, and so on are accepted, while spark.maxRemoteBlockSizeFetchToMem, spark.driver.memory, and so on are ignored. Please let me know if my understanding is incorrect.

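Spark SQL has both static and runtime configurations. You can consult the online docs to check whether a particular configuration has context, session, or query scope.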

Runtime SQL configurations are per-session, mutable Spark SQL configurations. They can be set with initial values by the config file and command-line options with --conf/-c prefixed, or by setting SparkConf that are used to create SparkSession. Also, they can be set and queried by SET commands and reset to their initial values by RESET command, or by SparkSession.conf’s setter and getter methods at runtime.
:
:
Static SQL configurations are cross-session, immutable Spark SQL configurations. They can be set with final values by the config file and command-line options with --conf/-c prefixed, or by setting SparkConf that are used to create SparkSession. External users can query the static sql config values via SparkSession.conf or via set command, e.g. SET spark.sql.extensions;, but cannot set/unset them.