Spark 3.x on HDP 3.1 in headless mode with Hive - Hive tables not found

Question

How can I configure Spark 3.x on HDP 3.1 to interact with Hive when using the headless (https://spark.apache.org/docs/latest/hadoop-provided.html) build of Spark?

First, I downloaded and unpacked the headless Spark 3.x build:

cd ~/development/software/spark-3.0.0-bin-without-hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf/
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export SPARK_DIST_CLASSPATH=$(hadoop --config /usr/hdp/current/spark2-client/conf classpath)
 
ls /usr/hdp # note the version and use it below in place of 3.1.x.x-xxx

./bin/spark-shell --master yarn --queue myqueue --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.hadoop.metastore.catalog.default=hive --files /usr/hdp/current/hive-client/conf/hive-site.xml
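
The manual step of reading the version from /usr/hdp could be scripted. The following is only a sketch: the helper name detect_hdp_version is hypothetical, and it assumes that the version directories under /usr/hdp start with a digit and sit next to a "current" directory, and that the lexicographically last one is the one wanted.

```shell
# Hypothetical helper (not part of HDP or Spark): print the name of the
# version directory found under an HDP root such as /usr/hdp.
# Assumption: version directories start with a digit; "current" is skipped.
detect_hdp_version() {
  hdp_root="${1:-/usr/hdp}"
  for dir in "$hdp_root"/*/; do
    name=$(basename "$dir")
    case "$name" in
      current) ;;                          # link directory, not a version
      [0-9]*) printf '%s\n' "$name" ;;     # looks like a version directory
    esac
  done | sort | tail -n 1                  # pick the lexicographically last one
}

# Possible usage (assumption: exactly one HDP version is installed):
#   HDP_VERSION=$(detect_hdp_version)
#   ... --conf spark.driver.extraJavaOptions="-Dhdp.version=${HDP_VERSION}" ...
```

If several versions are installed side by side, the value should be checked by hand before it is passed to -Dhdp.version.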

spark.sql("show databases").show
// only showing default namespace, existing hive tables are missing
+---------+
|namespace|
+---------+
|  default|
+---------+

spark.conf.get("spark.sql.catalogImplementation")
res2: String = in-memory # I want to see hive here - how? How to add hive jars onto the classpath?
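
As far as I understand, spark-shell only selects the hive catalog when Spark's own Hive integration classes are on the classpath, and otherwise falls back to in-memory. One quick check is whether the unpacked distribution ships a spark-hive JAR at all. This is a minimal sketch: the function name has_spark_hive_jars is hypothetical, and it assumes the usual layout in which a distribution keeps its JARs in a jars/ subdirectory with names like spark-hive_<scala-version>-<spark-version>.jar.

```shell
# Hypothetical helper: succeed if the given Spark distribution directory
# contains a spark-hive JAR in its jars/ subdirectory.
has_spark_hive_jars() {
  spark_home="$1"
  for jar in "$spark_home"/jars/spark-hive_*.jar; do
    # If the glob matched nothing it stays literal and -e fails.
    [ -e "$jar" ] && return 0
  done
  return 1
}

# Possible usage from inside the unpacked distribution:
#   has_spark_hive_jars "$PWD" && echo "spark-hive present" || echo "spark-hive missing"
```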

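Note

This is an updated version, for Spark 3.x on HDP 3.1, of the question "custom spark does not find hive databases when running on yarn".

Also: I am aware of the problems with ACID Hive tables in Spark. For now, I just want to be able to see the existing databases.

Edit

We have to get the Hive JARs onto the classpath. I tried the following: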

 export SPARK_DIST_CLASSPATH="/usr/hdp/current/hive-client/lib*:${SPARK_DIST_CLASSPATH}"

Now using spark-sql:

./bin/spark-sql --master yarn --queue myqueue --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.hadoop.metastore.catalog.default=hive --files /usr/hdp/current/hive-client/conf/hive-site.xml

This fails with:

Error: Failed to load class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.
Failed to load main class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.

That is, the line export SPARK_DIST_CLASSPATH="/usr/hdp/current/hive-client/lib*:${SPARK_DIST_CLASSPATH}" has no effect (the same problem occurs when it is not set).
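
When a classpath export seems to have no effect, it can help to check which entries actually resolve. Inside double quotes the shell does not expand a pattern such as lib*, and the JVM only expands entries of the form dir/*, so an entry ending in lib* is looked up literally. Separately, the class named in the error, SparkSQLCLIDriver, belongs to Spark's own hive-thriftserver module rather than to Hive's lib directory, so Hive's JARs alone would probably not provide it. The sketch below uses a hypothetical helper name, check_classpath_entries, and only tests for existence; it does not inspect JAR contents.

```shell
# Hypothetical helper: for each entry of a colon-separated classpath,
# report whether it resolves to something that exists.
# The function body runs in a subshell so IFS and 'set -f' do not leak out.
check_classpath_entries() (
  IFS=:
  set -f                                   # keep '*' literal while splitting
  for entry in $1; do
    [ -n "$entry" ] || continue            # skip empty entries
    case "$entry" in
      */\*)                                # JVM wildcard form: dir/*
        if [ -d "${entry%/\*}" ]; then
          printf 'ok %s\n' "$entry"
        else
          printf 'missing %s\n' "$entry"
        fi ;;
      *)                                   # plain file or directory
        if [ -e "$entry" ]; then
          printf 'ok %s\n' "$entry"
        else
          printf 'missing %s\n' "$entry"
        fi ;;
    esac
  done
)

# Possible usage:
#   check_classpath_entries "$SPARK_DIST_CLASSPATH" | grep '^missing'
```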

Answer 1

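As mentioned above and in "custom spark does not find hive databases when running on yarn", the Hive JARs are required. They are not shipped with the headless build.

I was not able to retrofit them.

Solution: don't bother. Simply use the Spark build for Hadoop 3.2 (on HDP 3.1).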


