Facing ClassNotFoundException while reading a Snowflake table using Spark
I am trying to read a Snowflake table from the Spark shell. To do this, I ran the following:
pyspark --jars spark-snowflake_2.11-2.8.0-spark_2.4.jar,jackson-dataformat-xml-2.10.3.jar
Using Python version 2.7.5 (default, Feb 20 2018 09:19:12)
SparkSession available as 'spark'.
>>> from pyspark import SparkConf, SparkContext
>>> from pyspark.sql import SQLContext
>>> from pyspark.sql.types import *
>>> from pyspark import SparkConf, SparkContext
>>> sc = SparkContext("local", "Simple App")
>>> spark = SQLContext(sc)
>>> spark_conf = SparkConf().setMaster('local').setAppName('CHECK')
>>> sfOptions = {
... "sfURL" : "url",
... "sfAccount" : "acntname",
... "sfUser" : 'username',
... "sfPassword" : 'pwd',
... "sfRole" : 'role',
... "sfDatabase" : 'dbname',
... "sfSchema" : 'schema',
... "sfWarehouse" : 'warehousename'
... }
>>> SNOWFLAKE_SOURCE = 'net.snowflake.spark.snowflake'
>>> df = spark.read.format(SNOWFLAKE_SOURCE).options(**sfOptions).option("query","select column from schema.table limit 1").load()
As soon as I run the load statement, I hit the following ClassNotFoundException:
Caused by: java.lang.ClassNotFoundException: net.snowflake.client.jdbc.internal.fasterxml.jackson.databind.ObjectMapper
In the steps above I am doing nothing but reading a Snowflake table, and per the documentation I passed the required jar files when launching the Spark shell.
The Spark version loaded when I start pyspark is 2.3.2.3.1.5.37-1.
I have tried multiple versions of the Snowflake connector (2.3/2.4/2.8/3.0), and I also passed the jar file jackson-dataformat-xml-2.10.3.jar, but I still see the same exception.
Can anyone tell me what I am doing wrong here, and how I can correct it?
You should run:
pyspark --jars spark-snowflake_2.11-2.8.0-spark_2.4.jar,snowflake-jdbc-3.12.5.jar
The equivalent configuration in code:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .config("spark.jars", "<pathto>/snowflake-jdbc-3.12.5.jar,<pathto>/spark-snowflake_2.11-2.7.1-spark_2.4.jar") \
    .config("spark.repl.local.jars",
            "<pathto>/snowflake-jdbc-3.12.5.jar,<pathto>/spark-snowflake_2.11-2.7.1-spark_2.4.jar") \
    .config("spark.sql.catalogImplementation", "in-memory") \
    .getOrCreate()
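The package name in the stack trace (`net.snowflake.client.jdbc.internal.fasterxml.jackson...`) shows the missing class is Jackson as relocated (shaded) inside the Snowflake JDBC driver, so passing a standalone jackson-dataformat-xml jar can never satisfy it; only snowflake-jdbc provides that class. Since a jar is just a zip archive, you can verify this locally with a small Python sketch (the jar path below is a placeholder for wherever your driver lives):

```python
import zipfile

def jar_contains(jar_path, class_entry):
    """Return True if the given .class entry exists in the jar.

    Jar files are plain zip archives, so we can inspect their
    entry list without any Java tooling.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return class_entry in jar.namelist()

# Hypothetical path; point this at your downloaded driver jar.
# jar_contains(
#     "snowflake-jdbc-3.12.5.jar",
#     "net/snowflake/client/jdbc/internal/fasterxml/jackson/databind/ObjectMapper.class",
# )
```

If that check returns True for your snowflake-jdbc jar, adding it via `--jars` (or `spark.jars`) resolves the ClassNotFoundException.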