How to properly configure the gcs-connector in a local environment
I am trying to configure the gcs-connector in my Scala project, but I always get java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
Here is my project configuration:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val sparkConf = new SparkConf()
  .set("spark.executor.memory", "4g")
  .set("spark.executor.cores", "2")
  .set("spark.driver.memory", "4g")
  .set("temporaryGcsBucket", "some-bucket")

val spark = SparkSession.builder()
  .config(sparkConf)
  .master("spark://spark-master:7077")
  .getOrCreate()

val hadoopConfig = spark.sparkContext.hadoopConfiguration
hadoopConfig.set("fs.gs.auth.service.account.enable", "true")
hadoopConfig.set("fs.gs.auth.service.account.json.keyfile", "./path-to-key-file.json")
hadoopConfig.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
hadoopConfig.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
I tried to set up the gcs-connector in the following two ways:
.set("spark.jars.packages", "com.google.cloud.bigdataoss:gcs-connector:hadoop2-2.1.6")
.set("spark.driver.extraClassPath", ":/home/celsomarques/Desktop/gcs-connector-hadoop2-2.1.6.jar")
But neither of them loaded the specified class onto the classpath.
Can you point out what I am doing wrong?
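For reference, a third route that is often used instead of runtime classpath settings is to declare the connector as a build dependency, so it is on the driver classpath before the JVM starts. A minimal build.sbt sketch using the same version as above; the "shaded" classifier is an assumption worth verifying against the published artifacts:

// build.sbt (sketch): pull in the gcs-connector as a regular dependency.
// The "shaded" classifier (bundled transitive dependencies) is an assumption here;
// drop it if the plain artifact resolves cleanly in your setup.
libraryDependencies += "com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop2-2.1.6" classifier "shaded"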
The following configuration works:
val sparkConf = new SparkConf()
  .set("spark.executor.memory", "4g")
  .set("spark.executor.cores", "2")
  .set("spark.driver.memory", "4g")

val spark = SparkSession.builder()
  .config(sparkConf)
  .master("local")
  .getOrCreate()
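Putting the two pieces together, a minimal sketch of what the local setup could look like once the connector jar is actually on the classpath (for example via a build dependency as sketched above); the bucket name and keyfile path are the placeholders from the question:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Sketch only: assumes the gcs-connector jar is already on the driver classpath.
val sparkConf = new SparkConf()
  .set("spark.executor.memory", "4g")
  .set("spark.executor.cores", "2")
  .set("spark.driver.memory", "4g")
  .set("temporaryGcsBucket", "some-bucket") // placeholder bucket from the question

val spark = SparkSession.builder()
  .config(sparkConf)
  .master("local")
  .getOrCreate()

// Register the GCS filesystem implementation and service-account credentials,
// using the same settings as in the question.
val hadoopConfig = spark.sparkContext.hadoopConfiguration
hadoopConfig.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
hadoopConfig.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
hadoopConfig.set("fs.gs.auth.service.account.enable", "true")
hadoopConfig.set("fs.gs.auth.service.account.json.keyfile", "./path-to-key-file.json")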