Variable structure in a JSON data source

Question

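Thank you for your time.

I have a DataFrame in PySpark on Databricks that reads JSON. The data from the source does not always have the same structure; sometimes the 'emailAddress' field is absent, which gives me the error "org.apache.spark.sql.AnalysisException: cannot resolve...".

I tried to solve it by wrapping the select in a try/except like this: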

try:
  df_json = df_json.select("responseID", "surveyID", "surveyName","timestamp", "customVariables.Id_Cliente", "timestamp", "responseSet", "emailAddress")

except ValueError:
  None

But this does not work for me; it returns the same error I mentioned.

I even tried another approach, without success:

if 'Id_Cliente' in s_fields:
    try:
        df_json = df_json.select("responseID", "surveyID", "surveyName", "timestamp", "customVariables.Id_Cliente", "timestamp", "responseSet", "emailAddress")
    except ValueError:
        df_json = df_json.select("responseID", "surveyID", "surveyName", "timestamp", "customVariables.Id_Cliente", "timestamp", "responseSet")

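Could you give me some ideas for handling this situation? I need to stop the notebook's execution when the field is not found in the structure; otherwise (when the emailAddress field is found), processing should continue.

Thank you very much in advance.

Regards.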

Answer 1

You are catching ValueError, but the exception raised is AnalysisException, which is why it does not work.

from pyspark.sql.utils import AnalysisException

try:
    df.select('xyz')
except AnalysisException:
    print(123)
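
Since the question asks to stop the notebook when the field is missing, an alternative to catching the exception is to check the column list before running the select. Below is a minimal sketch, assuming emailAddress is a top-level field (so it appears in df_json.columns). The helper names missing_columns and require_columns are hypothetical and not part of PySpark.

```python
def missing_columns(available, required):
    """Return the names in `required` that are not in `available`."""
    present = set(available)
    return [name for name in required if name not in present]


def require_columns(available, required):
    """Raise RuntimeError if any required column name is absent."""
    missing = missing_columns(available, required)
    if missing:
        # An uncaught exception stops the notebook run at this cell.
        raise RuntimeError(f"Missing expected column(s): {missing}")


# Hypothetical usage in the notebook, before the select:
#   require_columns(df_json.columns, ["emailAddress"])
```

If the check passes, the select that includes "emailAddress" can run as before. On Databricks, calling dbutils.notebook.exit with a message instead of raising may be preferable if the notebook should end without being marked as failed.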

python

pyspark

databricks