Spark 3.x on HDP 3.1 in headless mode with Hive: Hive tables not found
How do I configure Spark 3.x on HDP 3.1 to interact with Hive when using the headless (https://spark.apache.org/docs/latest/hadoop-provided.html) version of Spark?
First, I downloaded and unpacked headless Spark 3.x:
cd ~/development/software/spark-3.0.0-bin-without-hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf/
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export SPARK_DIST_CLASSPATH=$(hadoop --config /usr/hdp/current/spark2-client/conf classpath)
ls /usr/hdp  # note the version and replace 3.1.x.x-xxx below with it
./bin/spark-shell --master yarn --queue myqueue \
  --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' \
  --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' \
  --conf spark.hadoop.metastore.catalog.default=hive \
  --files /usr/hdp/current/hive-client/conf/hive-site.xml
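Instead of substituting the `3.1.x.x-xxx` placeholder by hand in two places, the version can be read from the directory listing once and reused. A minimal sketch, assuming the HDP version directories under `/usr/hdp` start with a digit (the `current` directory does not); the function name `detect_hdp_version` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper (assumption: version dirs look like 3.1.x.x-xxx).
# Prints the newest HDP version directory under a root (default /usr/hdp),
# ignoring non-version entries such as "current".
detect_hdp_version() {
  local root="${1:-/usr/hdp}"
  # Keep only names starting with digits and a dot; sort -V orders them by version.
  ls -1 "$root" | grep -E '^[0-9]+\.[0-9]+' | sort -V | tail -n 1
}
```

Usage: set `HDP_VERSION=$(detect_hdp_version)` and then pass `--conf spark.driver.extraJavaOptions="-Dhdp.version=${HDP_VERSION}"` (and the same for `spark.yarn.am.extraJavaOptions`). Double quotes are needed here, because the single quotes used above would stop the shell from expanding the variable.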
spark.sql("show databases").show
// only the default namespace is shown; the existing Hive databases are missing
+---------+
|namespace|
+---------+
|  default|
+---------+
spark.conf.get("spark.sql.catalogImplementation")
res2: String = in-memory // I want to see hive here. How? How do I add the Hive JARs to the classpath?
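The `in-memory` value is consistent with how `spark-shell` chooses its catalog: as far as I know, it only enables Hive support when Spark's own Hive integration classes (the `spark-hive` module, e.g. `org.apache.spark.sql.hive.HiveSessionStateBuilder`) are on the classpath, and otherwise falls back to `in-memory` without raising an error. Likewise, `org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver`, the main class of `spark-sql`, belongs to Spark's `spark-hive-thriftserver` module. Both ship as Spark jars, not as part of HDP's Hive client. A minimal sketch for checking a distribution's `jars/` directory; the function name `check_spark_hive_jars` is hypothetical, and the file-name pattern (`spark-hive_<scala-version>-<spark-version>.jar`) is an assumption based on the prebuilt Spark packages:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which Spark Hive modules are present in a
# Spark distribution's jars/ directory (default: ./jars).
# Assumes jar names like spark-hive_2.12-3.0.0.jar and
# spark-hive-thriftserver_2.12-3.0.0.jar. Returns 1 if any module is missing.
check_spark_hive_jars() {
  local jars_dir="${1:-./jars}"
  local module jar found missing=0
  for module in spark-hive spark-hive-thriftserver; do
    found=""
    # An unmatched glob stays literal, so check that the path really exists.
    for jar in "${jars_dir}/${module}"_*.jar; do
      if [ -e "$jar" ]; then found="${jar##*/}"; break; fi
    done
    if [ -n "$found" ]; then
      echo "present: ${module} (${found})"
    else
      echo "missing: ${module}"
      missing=1
    fi
  done
  return "$missing"
}
```

Run from the unpacked `spark-3.0.0-bin-without-hadoop` directory as `check_spark_hive_jars jars`. If both modules are reported missing, adding HDP's Hive client jars alone would not be expected to change the catalog.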
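Note
This is an updated version of "custom spark does not find hive databases when running on yarn" for Spark 3.x on HDP 3.1.
Also: I am aware of the problems with ACID Hive tables in Spark. For now, I simply want to be able to see the existing databases.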
Edit
We have to get the Hive JARs onto the classpath. Trying the following:
export SPARK_DIST_CLASSPATH="/usr/hdp/current/hive-client/lib*:${SPARK_DIST_CLASSPATH}"
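One detail worth checking in this line: Java only treats a classpath entry as a wildcard when its last path component is exactly `*` (for example `/usr/hdp/current/hive-client/lib/*`, which stands for every `.jar` file in that directory); an entry such as `.../lib*` is taken literally. Even with the wildcard corrected, this would only add Hive's own jars, not Spark's `spark-hive` modules. A minimal sketch of a checker for a colon-separated classpath such as `SPARK_DIST_CLASSPATH`; the function name `check_classpath` is hypothetical, and it only tests whether each entry (or wildcard directory) exists, not which classes it provides:

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag classpath entries that resolve to nothing.
# Prints one line per problem entry and returns 1 if any were found.
check_classpath() {
  local cp="$1" entry dir jar has_jar status=0
  local -a entries
  # Split on ':' without glob expansion (read -a does not expand wildcards).
  IFS=':' read -r -a entries <<< "$cp"
  for entry in "${entries[@]}"; do
    [ -z "$entry" ] && continue
    if [ "${entry##*/}" = "*" ]; then
      # Java wildcard entry: "dir/*" means every .jar file directly in dir.
      dir="${entry%/*}"
      [ "$entry" = "*" ] && dir="."
      has_jar=0
      for jar in "$dir"/*.jar; do
        if [ -e "$jar" ]; then has_jar=1; break; fi
      done
      if [ "$has_jar" -eq 0 ]; then
        echo "no jars: $entry"
        status=1
      fi
    elif [ ! -e "$entry" ]; then
      # Plain entry (directory or jar) is taken literally; ".../lib*" ends up here.
      echo "missing: $entry"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_classpath "$SPARK_DIST_CLASSPATH"` lists any entries that do not resolve on the local machine.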
Now, using spark-sql:
./bin/spark-sql --master yarn --queue myqueue \
  --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' \
  --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' \
  --conf spark.hadoop.metastore.catalog.default=hive \
  --files /usr/hdp/current/hive-client/conf/hive-site.xml
fails with:
Error: Failed to load class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.
Failed to load main class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.
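That is, the line
export SPARK_DIST_CLASSPATH="/usr/hdp/current/hive-client/lib*:${SPARK_DIST_CLASSPATH}"
has no effect (the same problem occurs if it is not set).
As noted in "custom spark does not find hive databases when running on yarn", the Hive JARs are required. They are not shipped with the headless version.
I was not able to retrofit them.
Solution: don't bother; simply use the Spark build with Hadoop 3.2 (on HDP 3.1).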
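A sketch of that workaround, assuming the Apache archive layout for the prebuilt `spark-3.0.0-bin-hadoop3.2` package, which, unlike the headless build, bundles Spark's Hive modules and its own Hadoop client jars. The helper name `spark_dist_url` is hypothetical, and the launch options in the comments are copied from the question:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Apache archive URL of a prebuilt Spark package.
# Arguments: Spark version (e.g. 3.0.0) and package flavour (e.g. hadoop3.2).
spark_dist_url() {
  local spark_version="$1" package="$2"
  echo "https://archive.apache.org/dist/spark/spark-${spark_version}/spark-${spark_version}-bin-${package}.tgz"
}

# Usage (not executed here):
#   curl -fLO "$(spark_dist_url 3.0.0 hadoop3.2)"
#   tar -xzf spark-3.0.0-bin-hadoop3.2.tgz
#   cd spark-3.0.0-bin-hadoop3.2
#   # presumably without exporting SPARK_DIST_CLASSPATH, since Hadoop jars are bundled
#   ./bin/spark-shell --master yarn --queue myqueue \
#     --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' \
#     --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' \
#     --conf spark.hadoop.metastore.catalog.default=hive \
#     --files /usr/hdp/current/hive-client/conf/hive-site.xml
```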