How to map each i-th element of a dataframe to a key from another dataframe defined by ranges in PySpark
What I want to do
Transform the input df0 into the desired output df2, based on the cluster definitions in df1.
What I have
df0 = spark.createDataFrame(
[('A',0.05),('B',0.01),('C',0.75),('D',1.05),('E',0.00),('F',0.95),('G',0.34), ('H',0.13)],
("items","quotient")
)
df1 = spark.createDataFrame(
[('C0',0.00,0.00),('C1',0.01,0.05),('C2',0.06,0.10), ('C3',0.11,0.30), ('C4',0.31,0.50), ('C5',0.51,99.99)],
("cluster","from","to")
)
What I want
df2 = spark.createDataFrame(
[('A',0.05,'C1'),('B',0.01,'C1'),('C',0.75,'C5'),('D',1.05,'C5'),('E',0.00,'C0'),('F',0.95,'C5'),('G',0.34,'C4'), ('H',0.13,'C3')],
("items","quotient","cluster")
)
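Notes
The coding environment is PySpark within Palantir.
To simplify the coding, the structure and content of DataFrame df1 can be adjusted: df1 specifies which cluster each item in df0 should be linked to.
Thanks in advance for your time and feedback!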
This is a simple left join problem: join df0 to df1 on the condition that quotient falls between from and to.
df0.join(df1, df0['quotient'].between(df1['from'], df1['to']), "left") \
.select(*df0.columns, df1['cluster']).show()
+-----+--------+-------+
|items|quotient|cluster|
+-----+--------+-------+
|    A|    0.05|     C1|
|    B|    0.01|     C1|
|    C|    0.75|     C5|
|    D|    1.05|     C5|
|    E|     0.0|     C0|
|    F|    0.95|     C5|
|    G|    0.34|     C4|
|    H|    0.13|     C3|
+-----+--------+-------+
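One thing to watch with between: the ranges in df1 are closed on both ends and leave gaps (for example between 0.05 and 0.06), so a quotient such as 0.055 would match no row and the left join would give it a null cluster. Since the question says df1 can be restructured, one option is to keep only the lower bound of each cluster and let each range run up to, but not including, the next lower bound. The sketch below illustrates that lookup rule in plain Python (no Spark session needed), so the boundary behaviour can be checked quickly. The names LOWER_BOUNDS, CLUSTERS and cluster_for are made up for this illustration, and the half-open interpretation is an assumption, not something stated in the question. Note that under this assumption a value like 0.005 lands in C0, whereas the original df1 reserves C0 for exactly 0.00.

```python
from bisect import bisect_right

# Lower bound of each cluster, sorted ascending (the "from" column of df1).
# Assumption: each cluster covers [its lower bound, next lower bound),
# and the last cluster is open-ended.
LOWER_BOUNDS = [0.00, 0.01, 0.06, 0.11, 0.31, 0.51]
CLUSTERS = ["C0", "C1", "C2", "C3", "C4", "C5"]


def cluster_for(quotient):
    """Return the cluster whose half-open range contains quotient.

    Returns None when quotient is below the smallest lower bound.
    """
    # bisect_right counts how many lower bounds are <= quotient;
    # the last of those bounds identifies the matching cluster.
    idx = bisect_right(LOWER_BOUNDS, quotient) - 1
    return CLUSTERS[idx] if idx >= 0 else None


# Same items as df0 in the question.
items = [("A", 0.05), ("B", 0.01), ("C", 0.75), ("D", 1.05),
         ("E", 0.00), ("F", 0.95), ("G", 0.34), ("H", 0.13)]
result = [(name, q, cluster_for(q)) for name, q in items]

for row in result:
    print(row)
```

In PySpark the same rule would correspond to a half-open join condition such as (df0['quotient'] >= df1['from']) & (df0['quotient'] < df1['to']), with each to value in df1 set to the next cluster's from value; that restructuring of df1 is part of the assumption above.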