hivethriftserver built into spark3.0.0 is throwing error "org.postgresql.Driver" was not found in the CLASSPATH
I am migrating from Spark 2.3.2 with an external Hive server to Spark 3.0.0 with the built-in thrift hive server, but I cannot get the thriftserver bundled with spark to find the postgresql client libraries it needs to connect to an external metastore.
In Spark 2.3.3 I simply set the $HIVE_HOME/conf/hive-site.xml options for the metastore, added the jars to $HIVE_HOME/lib, and everything worked. In Spark 3.0.0 I declared the location of the jars in $SPARK_HOME/conf/hive-site.xml like this..
<property>
<name>spark.sql.hive.metastore.jars</name>
<value>/usr/lib/postgresql</value>
<description></description>
</property>
But this didn't work. I build spark from source, so I also tried adding the postgresql dependency to the spark pom.xml and the thriftserver pom.xml files so that maven could pull in the library during compilation. That didn't work either, so I'm not sure what to try next.
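For reference, a minimal sketch of the kind of dependency entry this refers to, using the standard PostgreSQL JDBC driver coordinates (the version number is an assumption, pick whichever matches your server):
<!-- hypothetical pom.xml entry; the version is an assumption -->
<dependency>
  <groupId>org.postgresql</groupId>
  <artifactId>postgresql</artifactId>
  <version>42.2.14</version>
</dependency>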
Here is the thriftserver log showing the error....
20/06/23 03:54:54 INFO SharedState: loading hive config file: file:/opt/spark-3.0.0/conf/hive-site.xml
20/06/23 03:54:54 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/opt/spark-3.0.0/spark-warehouse').
20/06/23 03:54:54 INFO SharedState: Warehouse path is 'file:/opt/spark-3.0.0/spark-warehouse'.
20/06/23 03:54:55 INFO HiveUtils: Initializing HiveMetastoreConnection version 2.3.7 using Spark classes.
20/06/23 03:54:55 INFO HiveConf: Found configuration file file:/opt/spark-3.0.0/conf/hive-site.xml
20/06/23 03:54:56 INFO SessionState: Created HDFS directory: /tmp/hive/ubuntu/c3454b03-bce5-4e0e-8a8b-3d7532470b3c
20/06/23 03:54:56 INFO SessionState: Created local directory: /tmp/ubuntu/c3454b03-bce5-4e0e-8a8b-3d7532470b3c
20/06/23 03:54:56 INFO SessionState: Created HDFS directory: /tmp/hive/ubuntu/c3454b03-bce5-4e0e-8a8b-3d7532470b3c/_tmp_space.db
20/06/23 03:54:56 INFO HiveClientImpl: Warehouse location for Hive client (version 2.3.7) is file:/opt/spark-3.0.0/spark-warehouse
20/06/23 03:54:57 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
20/06/23 03:54:57 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
20/06/23 03:54:57 INFO HiveMetaStore: 0: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
20/06/23 03:54:57 INFO ObjectStore: ObjectStore, initialize called
20/06/23 03:54:57 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
20/06/23 03:54:57 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
20/06/23 03:54:57 ERROR Datastore: Exception thrown creating StoreManager. See the nested exception
Error creating transactional connection factory
org.datanucleus.exceptions.NucleusException: Error creating transactional connection factory
at org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:214)
....
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("org.postgresql.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:232)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:117)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:82)
... 100 more
Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("org.postgresql.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
at org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:58)
at org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:213)
Here is my $SPARK_HOME/conf/hive-site.xml file
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://xxxxxxx:54321/db?sslmode=require</value>
<description></description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.postgresql.Driver</value>
<description></description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>xxxxx</value>
<description>db user</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>xxxxx</value>
<description>db password</description>
</property>
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
<property>
<name>spark.master</name>
<value>spark://127.0.1.1:7077</value>
</property>
<property>
<name>spark.eventLog.enabled</name>
<value>true</value>
</property>
<property>
<name>spark.eventLog.dir</name>
<value>/var/log/spark</value>
</property>
<property>
<name>spark.executor.memory</name>
<value>2048m</value>
</property>
<property>
<name>spark.serializer</name>
<value>org.apache.spark.serializer.KryoSerializer</value>
</property>
<property>
<name>hive.server2.use.SSL</name>
<value>false</value>
<description></description>
</property>
<property>
<name>spark.sql.hive.metastore.jars</name>
<value>builtin</value>
<description></description>
</property>
</configuration>
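Note that the Spark documentation describes spark.sql.hive.metastore.jars as accepting builtin, maven, or a classpath in the standard Hive/Hadoop format, so pointing it at a directory of jars would presumably need a classpath-style value, roughly like this sketch (the directory and wildcard are assumptions):
<property>
<name>spark.sql.hive.metastore.jars</name>
<value>/usr/lib/postgresql/*</value>
</property>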
In .bashrc I set the following environment variables
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_251
export SPARK_HOME=/opt/spark
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
When starting the thriftserver I use the provided start-thriftserver.sh command.
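Since start-thriftserver.sh passes its arguments through to spark-submit, one way to get the driver onto the classpath without rebuilding Spark would be to hand the jar to the launcher explicitly, roughly like this (the jar path and version are assumptions):
$SPARK_HOME/sbin/start-thriftserver.sh \
  --jars /usr/lib/postgresql/postgresql-42.2.14.jar \
  --driver-class-path /usr/lib/postgresql/postgresql-42.2.14.jar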
The problem with the maven dependencies seems to be related to incremental maven builds not pulling in any new dependencies. An ugly workaround is to force the downloads and a complete rebuild like this...
./build/mvn dependency:purge-local-repository -Pyarn -Dhadoop.version=2.8.5 -Phive -Phive-thriftserver -DskipTests clean package
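After the full rebuild, a quick sanity check is to confirm the driver actually landed in the packaged jars directory, for example (assuming the standard artifact name):
ls $SPARK_HOME/jars | grep -i postgresql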