spark-sql : Error in session initiation NoClassDefFoundError: org/apache/tez/dag/api/SessionNotRunning
I am facing an issue while launching a spark-sql session.
Initially, when I started a spark session, only the default database was visible (not the Hive default database, but Spark's own default database).
To see the Hive databases, I copied hive-site.xml from the hive conf directory to the spark conf directory. After I copied hive-site.xml, I got the following error.
$ spark-sql
WARN HiveConf: HiveConf of name hive.tez.cartesian-product.enabled does not exist
WARN HiveConf: HiveConf of name hive.metastore.warehouse.external.dir does not exist
WARN HiveConf: HiveConf of name hive.server2.webui.use.ssl does not exist
WARN HiveConf: HiveConf of name hive.heapsize does not exist
WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
WARN HiveConf: HiveConf of name hive.materializedview.rewriting.incremental does not exist
WARN HiveConf: HiveConf of name hive.server2.webui.cors.allowed.headers does not exist
WARN HiveConf: HiveConf of name hive.driver.parallel.compilation does not exist
WARN HiveConf: HiveConf of name hive.tez.bucket.pruning does not exist
WARN HiveConf: HiveConf of name hive.hook.proto.base-directory does not exist
WARN HiveConf: HiveConf of name hive.load.data.owner does not exist
WARN HiveConf: HiveConf of name hive.execution.mode does not exist
WARN HiveConf: HiveConf of name hive.service.metrics.codahale.reporter.classes does not exist
WARN HiveConf: HiveConf of name hive.strict.managed.tables does not exist
WARN HiveConf: HiveConf of name hive.create.as.insert.only does not exist
WARN HiveConf: HiveConf of name hive.optimize.dynamic.partition.hashjoin does not exist
WARN HiveConf: HiveConf of name hive.server2.webui.enable.cors does not exist
WARN HiveConf: HiveConf of name hive.metastore.db.type does not exist
WARN HiveConf: HiveConf of name hive.txn.strict.locking.mode does not exist
WARN HiveConf: HiveConf of name hive.metastore.transactional.event.listeners does not exist
WARN HiveConf: HiveConf of name hive.tez.input.generate.consistent.splits does not exist
INFO metastore: Trying to connect to metastore with URI thrift://<host-name>:9083
INFO metastore: Connected to metastore.
INFO SessionState: Created local directory: /tmp/7b9d5455-e71a-4bd5-aa4b-385758b575a8_resources
INFO SessionState: Created HDFS directory: /tmp/hive/spark/7b9d5455-e71a-4bd5-aa4b-385758b575a8
INFO SessionState: Created local directory: /tmp/spark/7b9d5455-e71a-4bd5-aa4b-385758b575a8
INFO SessionState: Created HDFS directory: /tmp/hive/spark/7b9d5455-e71a-4bd5-aa4b-385758b575a8/_tmp_space.db
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/tez/dag/api/SessionNotRunning
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:529)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:133)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.tez.dag.api.SessionNotRunning
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 13 more
INFO ShutdownHookManager: Shutdown hook called
INFO ShutdownHookManager: Deleting directory /tmp/spark-911cc8f5-f53b-4ae6-add3-0c745581bead
$
I am able to run pyspark and spark-shell sessions successfully, and I can see the Hive databases from those sessions.
The error is Tez-related, and I have confirmed that the Tez service is running fine. I am also able to access the Hive tables through hive2.
I am using HDP 3.0, where the Hive execution engine is Tez (MapReduce has been removed).
The issue occurred because I copied hive-site.xml from /etc/hive/conf to /etc/spark/conf in order to see the databases from both Spark and Hive 3. That hive-site.xml contains an extensive list of newly added properties that are incompatible with spark-sql.
- Stack versions:
HDP 3.0
Spark 2.3
Hive 3.1
So, after copying hive-site.xml to /etc/spark/conf, I removed the following properties from /etc/spark/conf/hive-site.xml:
hive.tez.cartesian-product.enabled
hive.metastore.warehouse.external.dir
hive.server2.webui.use.ssl
hive.heapsize
hive.server2.webui.port
hive.materializedview.rewriting.incremental
hive.server2.webui.cors.allowed.headers
hive.driver.parallel.compilation
hive.tez.bucket.pruning
hive.hook.proto.base-directory
hive.load.data.owner
hive.execution.mode
hive.service.metrics.codahale.reporter.classes
hive.strict.managed.tables
hive.create.as.insert.only
hive.optimize.dynamic.partition.hashjoin
hive.server2.webui.enable.cors
hive.metastore.db.type
hive.txn.strict.locking.mode
hive.metastore.transactional.event.listeners
hive.tez.input.generate.consistent.splits
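Rather than deleting each `<property>` block by hand, the cleanup can be scripted. The following is a minimal sketch, assuming hive-site.xml uses the standard Hadoop `<configuration><property>` layout; the function name and the example paths are illustrative, not part of the original answer:

```python
# Strip the spark-sql-incompatible properties from a copied hive-site.xml.
import xml.etree.ElementTree as ET

# Properties listed above that Spark 2.3's bundled HiveConf does not know.
INCOMPATIBLE = {
    "hive.tez.cartesian-product.enabled",
    "hive.metastore.warehouse.external.dir",
    "hive.server2.webui.use.ssl",
    "hive.heapsize",
    "hive.server2.webui.port",
    "hive.materializedview.rewriting.incremental",
    "hive.server2.webui.cors.allowed.headers",
    "hive.driver.parallel.compilation",
    "hive.tez.bucket.pruning",
    "hive.hook.proto.base-directory",
    "hive.load.data.owner",
    "hive.execution.mode",
    "hive.service.metrics.codahale.reporter.classes",
    "hive.strict.managed.tables",
    "hive.create.as.insert.only",
    "hive.optimize.dynamic.partition.hashjoin",
    "hive.server2.webui.enable.cors",
    "hive.metastore.db.type",
    "hive.txn.strict.locking.mode",
    "hive.metastore.transactional.event.listeners",
    "hive.tez.input.generate.consistent.splits",
}

def strip_properties(src, dst, names=INCOMPATIBLE):
    """Rewrite a Hadoop-style XML config with the given properties removed."""
    tree = ET.parse(src)
    root = tree.getroot()  # the <configuration> element
    for prop in list(root.findall("property")):
        if prop.findtext("name", "") in names:
            root.remove(prop)
    tree.write(dst, encoding="utf-8", xml_declaration=True)

# Example (paths as in the answer above; keep a backup of the original file):
# strip_properties("/etc/spark/conf/hive-site.xml", "/etc/spark/conf/hive-site.xml")
```

After removing these properties and restarting spark-sql, `show databases;` should list the Hive databases without the NoClassDefFoundError.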