How to create an Apache Spark DataFrame from JSON, a dictionary, or key-value pairs in Databricks

The key:value pairs from my dictionary are as follows:

result.items()

dict_items([('subjectArea', 'Work'), ('txn-timestamp', '2022-01-05 11:31:10'), ('foundation', {'schema': 'AZ_FH_ELLIPSE', 'table': 'AZ_FND_MSF620', 'keys': [{'key': 'DSTRCT_CODE', 'value': 'RTK1'}, {'key': 'WORKORDER', 'value': '11358186'}], 'dataMart': {'dependencies': [{'schema': 'AZ_DM_WORK', 'table': 'DIM_WORK_ORDER'}, {'schema': 'AZ_DM_WORK', 'table': 'FACT_WORK_ITEM'}]}})])

Can someone tell me whether the above can be converted to a Spark DataFrame? Sorry, I'm not sure how to add line breaks to make the code look tidier.

Here is the PySpark code:

from pyspark.sql.types import *
items1 = [{'subjectArea': 'Work', 'txn-timestamp': '2022-01-05 11:31:10', 'foundation': {'schema': 'AZ_FH_ELLIPSE', 'table': 'AZ_FND_MSF620', 'keys': [{'key': 'DSTRCT_CODE', 'value': 'RTK1'}, {'key': 'WORKORDER', 'value': '11358186'}], 'dataMart': {'dependencies': [{'schema': 'AZ_DM_WORK', 'table': 'DIM_WORK_ORDER'}, {'schema': 'AZ_DM_WORK', 'table': 'FACT_WORK_ITEM'}]}}}]

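# 'keys' and 'dependencies' hold Python lists of dicts, so each must be an
# ArrayType of StructType; a bare StructType there fails schema verification.
key_type = StructType([StructField('key', StringType()), StructField('value', StringType())])
dep_type = StructType([StructField('schema', StringType()), StructField('table', StringType())])

schema1 = StructType([
    StructField('subjectArea', StringType()),
    StructField('txn-timestamp', StringType()),
    StructField('foundation', StructType([
        StructField('schema', StringType()),
        StructField('table', StringType()),
        StructField('keys', ArrayType(key_type)),
        StructField('dataMart', StructType([
            StructField('dependencies', ArrayType(dep_type)),
        ])),
    ])),
])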

ddf = spark.createDataFrame(items1, schema1)
ddf.printSchema()
ddf.show()
# Use the field name exactly as declared in the schema ('dataMart').
ddf.select(ddf['foundation']['dataMart']).show(truncate=False)
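
If writing the schema by hand is too tedious, one alternative is to serialize the dictionary to JSON and let Spark infer the nested schema. The following is only a minimal sketch: the helper names `to_json_line`, `read_inferred` and `explode_keys` are made up for illustration, `spark` is assumed to be the active SparkSession of a Databricks notebook, and `spark.sparkContext` is assumed to be accessible (it may not be on clusters that only expose Spark Connect).

```python
import json


def to_json_line(record):
    """Serialize one nested dict to a single-line JSON string."""
    # json.dumps without `indent` emits no line breaks, which is the
    # one-document-per-line layout that spark.read.json expects.
    return json.dumps(record)


def read_inferred(spark, record):
    """Build a one-row DataFrame and let Spark infer the nested schema."""
    # Assumption: `spark` is an active SparkSession with an accessible SparkContext.
    rdd = spark.sparkContext.parallelize([to_json_line(record)])
    return spark.read.json(rdd)


def explode_keys(df):
    """Return one row per entry of foundation.keys, with key and value as columns."""
    # Imported lazily so to_json_line stays usable without pyspark installed.
    from pyspark.sql import functions as F

    exploded = df.select("subjectArea", F.explode("foundation.keys").alias("k"))
    return exploded.select("subjectArea", "k.key", "k.value")
```

In a notebook this could be tried with `explode_keys(read_inferred(spark, items1[0])).show(truncate=False)`. With schema inference, lists of dictionaries are read as arrays of structs, so `keys` and `dependencies` do not need to be declared by hand.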