Cannot load HiveContext in Spark Zeppelin

Question

I have installed Zeppelin. Everything works except when I try to create a HiveContext.

My configuration in Zeppelin:

System.getenv().get("MASTER")
System.getenv().get("SPARK_YARN_JAR")
System.getenv().get("HADOOP_CONF_DIR")
System.getenv().get("JAVA_HOME")
System.getenv().get("SPARK_HOME")
System.getenv().get("PYSPARK_PYTHON")
System.getenv().get("PYTHONPATH")
System.getenv().get("ZEPPELIN_JAVA_OPTS")

res0: String = yarn-client
res1: String = /home/centos/zeppelin-R-rscala/interpreter/spark/zeppelin-spark-0.6.0-incubating-SNAPSHOT.jar
res2: String = /etc/hadoop/conf
res3: String = /usr/jdk64/jdk1.8.0_60
res4: String = /usr/hdp/2.3.4.0-3485/spark
res5: String = null
res6: String = /usr/hdp/current/spark-client/python/lib/py4j-0.8.2.1-src.zip:/usr/hdp/current/spark-client/python/:
res7: String = -Dhdp.version=2.3.4.0-3485

What I am trying to do:

%spark
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

The error I get:

 java.lang.NoClassDefFoundError: org/apache/tez/dag/api/SessionNotRunning
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:529)
    at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:193)
    at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:164)
    at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:162)
    at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:415)

Note that everything works fine if I launch Spark directly from the shell.

Thanks

Answer 1

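This probably means you have Tez set as Hive's execution engine, so Hive tries to load Tez classes that are not on Zeppelin's classpath. You should change this property in the hive-site.xml that Spark uses: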

<property>
  <name>hive.execution.engine</name>
  <value>mr</value>
</property>

In my case I had to change it through Ambari, but that depends on your setup.
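
To confirm which engine Spark's copy of hive-site.xml actually selects, you could read the property from a Zeppelin paragraph before creating the HiveContext. This is only a minimal sketch: the file path is an assumption based on the HDP paths shown in the question, and `readProperty` and `childText` are hypothetical helper names, not part of Spark or Hive.

```scala
import java.io.File
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Text of the first child element with the given tag, if any.
def childText(parent: Element, tag: String): Option[String] = {
  val nodes = parent.getElementsByTagName(tag)
  if (nodes.getLength > 0) Some(nodes.item(0).getTextContent.trim) else None
}

// Value of a named <property> in a Hadoop-style XML config file, if present.
def readProperty(file: File, name: String): Option[String] = {
  val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file)
  val props = doc.getElementsByTagName("property")
  (0 until props.getLength)
    .map(i => props.item(i).asInstanceOf[Element])
    .find(p => childText(p, "name").exists(_ == name))
    .flatMap(p => childText(p, "value"))
}

// Assumed location of the hive-site.xml that Spark reads; adjust for your install.
val hiveSite = new File("/usr/hdp/current/spark-client/conf/hive-site.xml")
if (hiveSite.exists) {
  println(readProperty(hiveSite, "hive.execution.engine"))
} else {
  println("hive-site.xml not found at " + hiveSite.getPath)
}
```

If this prints `Some(tez)`, the property above is the likely culprit. If it prints `None`, the property is not set in that file, and the engine setting is probably coming from a different configuration file.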
