pyspark.sql.utils.IllegalArgumentException: requirement failed: Temporary GCS path has not been set

Question

On Google Cloud Platform, I am trying to submit a PySpark job that writes a DataFrame to BigQuery. The code that performs the write is:

finalDF.write.format("bigquery")\
.mode('overwrite')\
.option("table","[PROJECT_ID].dataset.table")\
.save()

I get the error in the title. How do I set the temporary GCS path?

Answer 1

As stated in the GitHub repository of spark-bigquery-connector, you can specify the bucket when writing:

df.write.format("bigquery")\
.option("temporaryGcsBucket","some-bucket")\
.save("dataset.table")

Or set it globally:

spark.conf.set("temporaryGcsBucket","some-bucket")
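
A minimal sketch of the global approach inside a PySpark job follows. The helper name `set_temporary_gcs_bucket` and the empty-name check are illustrative additions, not part of the connector; `"some-bucket"` is a placeholder for a bucket the job can write to.

```python
def set_temporary_gcs_bucket(spark, bucket):
    """Set the connector's temporary bucket once for the whole session.

    `spark` is an existing SparkSession; `bucket` is a bucket name such as
    "some-bucket" (placeholder). Hypothetical helper, for illustration only.
    """
    if not bucket:
        # Fail early with a readable message instead of at write time.
        raise ValueError("temporaryGcsBucket must be a non-empty bucket name")
    spark.conf.set("temporaryGcsBucket", bucket)
    return spark


# Usage inside a job (needs pyspark and the spark-bigquery-connector jar):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("bq-write").getOrCreate()
#   set_temporary_gcs_bucket(spark, "some-bucket")
#   finalDF.write.format("bigquery").mode("overwrite").save("dataset.table")
```

With the value set on the session, later writes in the same job do not need to repeat the option.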

Answer 2

The "temporaryGcsBucket" property must be set either when writing the DataFrame or when creating the SparkSession.

.option("temporaryGcsBucket","some-bucket")

or, with an optional path:

.option("temporaryGcsBucket","some-bucket/optional_path")

For example:

finalDF.write.format("bigquery")\
.mode('overwrite')\
.option("temporaryGcsBucket","some-bucket")\
.option("table","[PROJECT_ID].dataset.table")\
.save()
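
The per-write variant can also be wrapped in a small function so the bucket is never forgotten. This is only a sketch: `write_to_bigquery` is a hypothetical name, and the table and bucket values passed to it are placeholders.

```python
def write_to_bigquery(df, table, temp_bucket, mode="overwrite"):
    """Write `df` to BigQuery, staging the data through `temp_bucket`.

    `df` is a Spark DataFrame, `table` is e.g. "dataset.table", and
    `temp_bucket` is a GCS bucket name. Hypothetical helper, for illustration.
    """
    (
        df.write.format("bigquery")
        .option("temporaryGcsBucket", temp_bucket)  # applies to this write only
        .mode(mode)
        .save(table)
    )


# Usage (needs pyspark and the spark-bigquery-connector jar):
#   write_to_bigquery(finalDF, "dataset.table", "some-bucket")
```

Passing the bucket as a required argument keeps the setting next to the write it affects, which is convenient when different jobs stage through different buckets.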

google-bigquery

google-cloud-dataproc