Error debugging PySpark after upgrading cluster's Databricks Runtime

I have updated my Azure Databricks cluster from runtime 5.5 LTS to 7.3 LTS. Now I get an error while debugging in VSCode. I have updated my Anaconda environment to match, like this:

> conda create --name dbconnect python=3.7
> conda activate dbconnect
> pip uninstall pyspark
> pip install -U databricks-connect==7.3.*
> databricks-connect configure
> databricks-connect test

So far so good, but now I am trying to debug the following:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
setting = spark.conf.get("spark.master")

if "local" in setting:
    from pyspark.dbutils import DBUtils
    dbutils = DBUtils(spark.sparkContext)

An exception is thrown on dbutils = DBUtils(spark.sparkContext):

Exception has occurred: AttributeError 'SparkContext' object has no attribute 'conf'

I have already tried creating the conf myself:

from pyspark.dbutils import DBUtils
import pyspark
conf = pyspark.SparkConf()
pyspark.SparkContext.getOrCreate(conf=conf)
dbutils = DBUtils(spark.sparkContext)

But I still get the same error. Can someone tell me what I am doing wrong?

According to the documentation, Access DBUtils, you need to pass the SparkSession spark, not the SparkContext. The error message already points at the cause: DBUtils reads the .conf attribute of the object it is handed, and a SparkSession has .conf while a SparkContext does not. That is also why creating a SparkConf did not help; you were still passing spark.sparkContext into DBUtils:

from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils

spark = SparkSession.builder.getOrCreate()

# Pass the SparkSession itself, not spark.sparkContext
dbutils = DBUtils(spark)
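
For a script that has to run both locally via databricks-connect and inside a Databricks notebook (where dbutils already exists), the same Access DBUtils page suggests wrapping this in a helper that falls back to the dbutils in the notebook's IPython namespace. A minimal sketch of that pattern (the helper name get_dbutils is just a convention, not an API):

from pyspark.sql import SparkSession

def get_dbutils(spark):
    try:
        # Available under Databricks Connect; takes the SparkSession.
        from pyspark.dbutils import DBUtils
        return DBUtils(spark)
    except ImportError:
        # Inside a Databricks notebook, dbutils is already defined
        # in the IPython user namespace.
        import IPython
        return IPython.get_ipython().user_ns["dbutils"]

spark = SparkSession.builder.getOrCreate()
dbutils = get_dbutils(spark)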