How to Create an Apache Spark DataFrame from JSON, Dictionary, or Key-Value Pairs Format in Databricks
The key-value pairs from the dictionary are shown below:
result.items()
dict_items([('subjectArea', 'Work'), ('txn-timestamp', '2022-01-05 11:31:10'), ('foundation', {'schema': 'AZ_FH_ELLIPSE', 'table': 'AZ_FND_MSF620', 'keys': [{'key': 'DSTRCT_CODE', 'value': 'RTK1'}, {'key': 'WORKORDER', 'value': '11358186'}], 'dataMart': {'dependencies': [{'schema': 'AZ_DM_WORK', 'table': 'DIM_WORK_ORDER'}, {'schema': 'AZ_DM_WORK', 'table': 'FACT_WORK_ITEM'}]}})])
Can someone tell me whether the above can be converted into a Spark DataFrame?
Apologies, I am not sure how to add line breaks to make the code look tidier.
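Yes, this is possible. Assuming result is the plain Python dictionary behind the result.items() view shown above, it can be wrapped into the one-element list of dictionaries that spark.createDataFrame expects; the same dictionary is simply written out literally as items1 in the code that follows.

# Sketch only: rebuild the list-of-one-dict input from the dict_items view
items1 = [dict(result.items())]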
Below is the PySpark code:
from pyspark.sql.types import StructType, StructField, StringType, ArrayType

items1 = [{'subjectArea': 'Work', 'txn-timestamp': '2022-01-05 11:31:10', 'foundation': {'schema': 'AZ_FH_ELLIPSE', 'table': 'AZ_FND_MSF620', 'keys': [{'key': 'DSTRCT_CODE', 'value': 'RTK1'}, {'key': 'WORKORDER', 'value': '11358186'}], 'dataMart': {'dependencies': [{'schema': 'AZ_DM_WORK', 'table': 'DIM_WORK_ORDER'}, {'schema': 'AZ_DM_WORK', 'table': 'FACT_WORK_ITEM'}]}}}]

# 'keys' and 'dependencies' hold lists of records in the data above,
# so they must be declared as ArrayType of StructType, not as a bare StructType.
schema1 = StructType([
    StructField('subjectArea', StringType()),
    StructField('txn-timestamp', StringType()),
    StructField('foundation', StructType([
        StructField('schema', StringType()),
        StructField('table', StringType()),
        StructField('keys', ArrayType(StructType([
            StructField('key', StringType()), StructField('value', StringType())]))),
        StructField('dataMart', StructType([
            StructField('dependencies', ArrayType(StructType([
                StructField('schema', StringType()), StructField('table', StringType())])))]))]))])

ddf = spark.createDataFrame(items1, schema1)
ddf.printSchema()
ddf.show()
ddf.select(ddf['foundation'].dataMart).show(truncate=False)  # dot notation drills into the nested struct
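As a rough alternative sketch, the nested schema can also be inferred automatically by serializing the dictionary to a JSON string and reading it back with spark.read.json; this assumes the Databricks notebook's SparkSession spark and its SparkContext are available.

import json

# Let Spark infer the nested schema, including the keys and dependencies arrays
json_rdd = spark.sparkContext.parallelize([json.dumps(items1[0])])
ddf2 = spark.read.json(json_rdd)
ddf2.printSchema()
ddf2.select('foundation.dataMart').show(truncate=False)

The trade-off is that inferred field types may not match a hand-written schema exactly, so the explicit schema1 above remains the safer option.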