Pyspark - Convert values from two columns into dict
I have this dataframe:
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([StructField("clusterID", StringType(), True),
                     StructField("segment", StringType(), True)])
arr = [("cluster_comp_444", "Home equipment & interior design"),
       ("cluster_comp_1160", "Going Out & shows"),
       ("cluster_comp_217576624", "Healthcare & medicine"),
       ("cluster_comp_465", "Good deals")]
df = spark.createDataFrame(arr, schema)
I want to create a dictionary with this structure:
{"cluster_comp_444": "Home equipment & interior design",
 "cluster_comp_1160": "Going Out & shows",
 "cluster_comp_217576624": "Healthcare & medicine",
 "cluster_comp_465": "Good deals"}
I tried this line, but it's not what I need:
import pyspark.sql.functions as fu

df.withColumn("json", fu.to_json(fu.struct("clusterID", "segment"))).show(truncate=False)
You can use df.collect(), which returns a list of Row objects, and take the first and second element of each row in a dict comprehension to build the dictionary.
arr = [("cluster_comp_444", "Home equipment & interior design"),
       ("cluster_comp_1160", "Going Out & shows"),
       ("cluster_comp_217576624", "Healthcare & medicine"),
       ("cluster_comp_465", "Good deals")]
df = spark.createDataFrame(arr)
dic = {row[0]: row[1] for row in df.collect()}
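The Row objects returned by collect() support positional indexing just like tuples, so the comprehension can be sanity-checked without a Spark session by running it over the same data as plain tuples:

```python
# Rows from df.collect() behave like tuples, so the same dict
# comprehension works on plain tuples for a quick local check.
rows = [("cluster_comp_444", "Home equipment & interior design"),
        ("cluster_comp_1160", "Going Out & shows"),
        ("cluster_comp_217576624", "Healthcare & medicine"),
        ("cluster_comp_465", "Good deals")]
dic = {row[0]: row[1] for row in rows}
print(dic["cluster_comp_444"])  # Home equipment & interior design
```

As an alternative on the Spark side, `df.rdd.map(lambda r: (r[0], r[1])).collectAsMap()` produces the same dictionary in one step.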